[Hero image: racehorses sprinting under stadium lights, overlaid with probability formulas, charts, and icons for DuckDB, Parquet, modeling, backtesting, and API deployment.]

Quantitative Horse Racing with R: Calibration, Backtesting, and Deployment

R
DuckDB
Parquet
Calibration
Ranking
Bayesian
Odds TS
Backtesting

Racing analytics as an inference-and-decision system

Thoroughbred flat racing is not a binary classification problem. It is a multi-competitor outcome process with hierarchy (horse / trainer / jockey / track), time dependence (form cycles, market moves), and decision layers (how you act on probabilities).

This post is deliberately code-heavy. The goal is to show a canonical, reproducible R workflow you can adapt: from a data contract to modeling to evaluation to deployment.

Design principle: treat predictive skill and wagering decisions as separate layers.
  • Model layer: calibrated probabilities and uncertainty.
  • Decision layer: explicit assumptions, risk controls, stress tests.

If you do only one thing, do this: never judge models by ROI alone. Use proper scoring rules and calibration first.
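To see why proper scoring rules matter, here is a minimal self-contained sketch (simulated outcomes, not racing data): outcomes are drawn with a true win rate of 0.3, and a forecaster who honestly says 0.3 beats one who systematically says 0.6 under both log loss and Brier score, even though both "lean the same way."

```r
# Proper scoring rules penalize systematically-off probabilities.
set.seed(7)
n <- 10000
y <- rbinom(n, 1, 0.3)  # simulated wins with true rate 0.3

log_loss <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))
brier    <- function(y, p) mean((p - y)^2)

log_loss(y, rep(0.3, n))  # honest forecast: close to the Bernoulli entropy
log_loss(y, rep(0.6, n))  # overconfident forecast: strictly worse
brier(y, rep(0.3, n))
brier(y, rep(0.6, n))
```

ROI over a finite sample cannot make this distinction: a lucky streak can reward the overconfident forecaster, but a proper scoring rule cannot be gamed in expectation.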

1) Canonical data contract: race → runner → odds snapshots

The fastest way to avoid analytics chaos is to standardize your storage model. A minimal schema lets every chapter, notebook, or experiment start from the same tables.

DuckDB schema (SQL)

-- sql/schema.sql
CREATE TABLE IF NOT EXISTS races (
  race_id     VARCHAR PRIMARY KEY,
  race_date   DATE,
  region      VARCHAR,
  track       VARCHAR,
  surface     VARCHAR,
  going       VARCHAR,
  distance_m  INTEGER,
  race_class  VARCHAR,
  purse       DOUBLE,
  field_size  INTEGER
);

CREATE TABLE IF NOT EXISTS runners (
  race_id        VARCHAR,
  runner_id      VARCHAR,
  horse_id       VARCHAR,
  jockey_id      VARCHAR,
  trainer_id     VARCHAR,
  draw           INTEGER,
  weight_kg      DOUBLE,
  age            INTEGER,
  sex            VARCHAR,
  odds_decimal   DOUBLE,
  finish_pos     INTEGER,
  finish_time_s  DOUBLE,
  win            INTEGER,
  PRIMARY KEY (race_id, runner_id)
);

CREATE TABLE IF NOT EXISTS odds_snapshots (
  race_id      VARCHAR,
  runner_id    VARCHAR,
  ts           TIMESTAMP,
  odds_decimal DOUBLE,
  traded_vol   DOUBLE,
  PRIMARY KEY (race_id, runner_id, ts)
);
Why DuckDB?
It queries Parquet locally with high performance (predicate pushdown, parallel scans) and avoids running a server database for most use-cases.
Why Parquet?
Columnar storage + compression + fast scans. Perfect for large runner-level tables and time-series snapshots.

2) Generate a reproducible simulated dataset

High-quality racing data is often license-restricted. A robust approach is to make every lab runnable on a simulator that matches your canonical schema, then provide adapters to ingest real data only when access is lawful.

Race simulator (R)

# R/simulate.R
library(dplyr)
library(tidyr)

simulate_races <- function(
  n_races = 200, n_horses = 600, n_jockeys = 200, n_trainers = 150,
  min_field = 6, max_field = 14, seed = 1
) {
  set.seed(seed)

  horses  <- tibble(horse_id  = sprintf("H%04d", 1:n_horses),
                    ability   = rnorm(n_horses, 0, 1))
  jockeys <- tibble(jockey_id = sprintf("J%03d", 1:n_jockeys),
                    skill     = rnorm(n_jockeys, 0, 0.4))
  trainers<- tibble(trainer_id= sprintf("T%03d", 1:n_trainers),
                    skill     = rnorm(n_trainers, 0, 0.5))

  surfaces <- c("Turf","Dirt","AllWeather")
  goings   <- c("Firm","Good","Gd-Fm","Gd-Sft","Soft","Heavy","Standard","Fast","Sloppy")

  races <- tibble(
    race_id    = sprintf("R%05d", 1:n_races),
    race_date  = as.Date("2025-01-01") + sample(0:365, n_races, TRUE),
    region     = sample(c("GB","IE","US","AU","FR"), n_races, TRUE),
    track      = sample(c("TRACK_A","TRACK_B","TRACK_C"), n_races, TRUE),
    surface    = sample(surfaces, n_races, TRUE, prob = c(0.45,0.35,0.20)),
    going      = sample(goings, n_races, TRUE),
    distance_m = sample(c(1000,1200,1400,1600,1800,2000,2400,2800), n_races, TRUE),
    race_class = sample(c("G1","G2","G3","HCP","ALW","CLM"), n_races, TRUE),
    purse      = round(exp(rnorm(n_races, log(25000), 0.7))),
    field_size = sample(min_field:max_field, n_races, TRUE)
  )

  runners <- races |>
    group_split(race_id) |>
    lapply(function(r) {
      k <- r$field_size

      df <- tibble(
        race_id    = r$race_id,
        runner_id  = paste0(r$race_id, "_", seq_len(k)),
        horse_id   = sample(horses$horse_id, k, FALSE),
        jockey_id  = sample(jockeys$jockey_id, k, TRUE),
        trainer_id = sample(trainers$trainer_id, k, TRUE),
        draw       = sample(1:k, k, FALSE),
        weight_kg  = pmax(45, rnorm(k, 55, 3)),
        age        = sample(2:7, k, TRUE),
        sex        = sample(c("C","F","G","H"), k, TRUE, prob = c(0.2,0.35,0.35,0.1))
      ) |>
        left_join(horses, by="horse_id") |>
        left_join(jockeys, by="jockey_id", suffix=c("","_j")) |>
        left_join(trainers, by="trainer_id", suffix=c("","_t")) |>
        mutate(
          draw_effect    = ifelse(k >= 12, (k + 1 - draw)/k, 0),
          surface_effect = case_when(r$surface=="Turf" ~ 0.15,
                                     r$surface=="AllWeather" ~ 0.05,
                                     TRUE ~ 0),
          weight_penalty = -0.03*(weight_kg - 55),
          utility        = ability + skill + skill_t + 0.12*draw_effect +
                           surface_effect + weight_penalty + rnorm(k, 0, 0.25)
        )

      # Sequential proportional-to-exp(utility) finish ordering
      remaining <- seq_len(k)
      w <- exp(df$utility - max(df$utility))
      order_idx <- integer(0)
      for (pos in seq_len(k)) {
        probs <- w[remaining]/sum(w[remaining])
        # sample.int avoids the sample() pitfall when one horse remains
        pick <- remaining[sample.int(length(remaining), 1, prob = probs)]
        order_idx <- c(order_idx, pick)
        remaining <- setdiff(remaining, pick)
      }
      df$finish_pos <- match(seq_len(k), order_idx)
      df$win <- as.integer(df$finish_pos == 1)

      base_time <- r$distance_m/16
      df$finish_time_s <- base_time - 0.8*df$ability + rnorm(k, 0, 1.2)

      # Market-like odds with an overround
      p <- exp(df$utility - max(df$utility)); p <- p/sum(p)
      overround <- 1.18
      df$odds_decimal <- pmin(200, pmax(1.01, 1/(overround*p) * exp(rnorm(k, 0, 0.06))))

      df |>
        select(race_id, runner_id, horse_id, jockey_id, trainer_id,
               draw, weight_kg, age, sex, odds_decimal,
               finish_pos, finish_time_s, win)
    }) |>
    bind_rows()

  list(races = races, runners = runners)
}

sim <- simulate_races(seed = 42)

Quick EDA sanity checks

library(ggplot2)

ggplot(sim$races, aes(field_size)) +
  geom_histogram(bins = 12) +
  labs(x = "Runners per race", y = "Count")

ggplot(sim$runners, aes(odds_decimal)) +
  geom_histogram(bins = 50) +
  scale_x_log10() +
  labs(x = "Decimal odds (log scale)", y = "Count")

3) Persist to Parquet and query with DuckDB

A clean pattern is: write Parquet (Arrow) → create tables in DuckDB from read_parquet(). This yields a fast, local warehouse without heavyweight infrastructure.

library(arrow)
library(DBI)
library(duckdb)

dir.create("data/derived",   recursive = TRUE, showWarnings = FALSE)
dir.create("data/warehouse", recursive = TRUE, showWarnings = FALSE)
write_parquet(sim$races,   "data/derived/races.parquet")
write_parquet(sim$runners, "data/derived/runners.parquet")

con <- dbConnect(duckdb(), "data/warehouse/racing.duckdb")
dbExecute(con, "INSTALL parquet; LOAD parquet;")

dbExecute(con, "
  CREATE OR REPLACE TABLE races AS
  SELECT * FROM read_parquet('data/derived/races.parquet');
")

dbExecute(con, "
  CREATE OR REPLACE TABLE runners AS
  SELECT * FROM read_parquet('data/derived/runners.parquet');
")

dbGetQuery(con, "SELECT COUNT(*) AS n_races FROM races;")
dbGetQuery(con, "SELECT COUNT(*) AS n_runners FROM runners;")

dbDisconnect(con, shutdown = TRUE)

Query patterns you will use constantly

-- Average odds and win rate by surface
SELECT
  surface,
  AVG(odds_decimal) AS avg_odds,
  AVG(win)          AS win_rate,
  COUNT(*)          AS n
FROM runners r
JOIN races   x USING (race_id)
GROUP BY surface
ORDER BY n DESC;

-- Within-race normalization of implied probability
WITH base AS (
  SELECT
    race_id,
    runner_id,
    odds_decimal,
    1.0 / odds_decimal AS implied_raw
  FROM runners
)
SELECT
  race_id,
  runner_id,
  implied_raw / SUM(implied_raw) OVER (PARTITION BY race_id) AS implied_p_norm
FROM base;

4) Baselines: probabilistic win modeling + calibration

In racing analytics, your model output is typically a probability. That means evaluation should prioritize proper scoring rules and calibration, not just AUC.

Feature store (leakage-safe)

library(dplyr)

features <- sim$runners |>
  left_join(sim$races, by = "race_id") |>
  transmute(
    race_id, runner_id,
    win,
    # pre-race covariates only
    draw, weight_kg, age, sex,
    surface, going, distance_m, race_class, purse,
    odds_decimal,
    log_odds = log(odds_decimal),
    implied_p_raw = 1/odds_decimal
  )

stopifnot(all(is.finite(features$log_odds)))
stopifnot(all(features$odds_decimal >= 1.01))

tidymodels GLM baseline + log loss / Brier / AUC

library(tidymodels)

set.seed(123)

df <- features |>
  group_by(race_id) |>
  mutate(implied_p = implied_p_raw / sum(implied_p_raw)) |>
  ungroup() |>
  mutate(win = factor(win, levels = c(0,1), labels = c("no","yes")))

# Split by race so runners from the same race never straddle train/test
spl   <- group_initial_split(df, group = race_id)
train <- training(spl)
test  <- testing(spl)

rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
                distance_m + odds_decimal, data = train) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors()) |>
  step_normalize(all_numeric_predictors())

mod <- logistic_reg() |> set_engine("glm")
wf  <- workflow() |> add_recipe(rec) |> add_model(mod)

fit  <- fit(wf, train)
pred <- predict(fit, test, type = "prob") |>
  bind_cols(test |> select(win))

# "yes" is the second factor level, so tell yardstick which level is the event
metric_set(mn_log_loss, brier_class, roc_auc)(pred, truth = win, .pred_yes,
                                              event_level = "second")

Calibration plot

library(probably)

# Binned calibration plot
cal_plot_breaks(pred, truth = win, estimate = .pred_yes,
                num_breaks = 10, event_level = "second")
Calibration mindset: a model can be “accurate” in ranking yet systematically overconfident. In betting or pricing contexts, miscalibration is usually more dangerous than a small drop in AUC.
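A reliability table makes this concrete without any plotting machinery. A minimal sketch (the `p_hat`/`win` column names and the simulated, perfectly calibrated data are assumptions for illustration): bin the predictions, then compare the mean predicted probability to the observed win rate in each bin.

```r
library(dplyr)

# Simulated predictions that are calibrated by construction:
# win is drawn with exactly probability p_hat.
set.seed(11)
df_cal <- tibble(p_hat = runif(50000, 0.01, 0.6)) |>
  mutate(win = rbinom(n(), 1, p_hat))

df_cal |>
  mutate(bin = cut(p_hat, breaks = seq(0, 0.6, by = 0.06),
                   include.lowest = TRUE)) |>
  group_by(bin) |>
  summarise(mean_pred = mean(p_hat),   # average forecast in the bin
            obs_rate  = mean(win),     # what actually happened
            n         = n(), .groups = "drop")
```

For a calibrated model, mean_pred and obs_rate track each other per bin; an overconfident model shows obs_rate flatter than mean_pred at the extremes.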

5) Multi-runner outcome modeling: rankings, not just winners

Binary win models throw away a lot of structure. Racing naturally produces a ranking: finish positions across multiple runners. Ranking models let you target that structure directly.

Plackett–Luce (direct ranking likelihood)

library(PlackettLuce)

set.seed(99)
items  <- paste0("H", 1:12)
n_races <- 40
ability <- rnorm(length(items)); names(ability) <- items

Rmat <- matrix(0, nrow = n_races, ncol = length(items))
colnames(Rmat) <- items

for (r in 1:n_races) {
  field <- sample(items, size = sample(6:10, 1), replace = FALSE)
  util  <- ability[field] + rnorm(length(field), 0, 0.3)
  ord   <- field[order(util, decreasing = TRUE)]
  Rmat[r, ord] <- seq_along(ord) # 1 = best; 0 = absent
}

rankings <- as.rankings(Rmat)
pl <- PlackettLuce(rankings)

summary(pl)
coef(pl)  # worth parameters

Bradley–Terry via pairwise decomposition

library(dplyr)
library(BradleyTerry2)

pairwise <- tibble()
for (r in 1:n_races) {
  in_race <- names(which(Rmat[r, ] > 0))
  ranks   <- Rmat[r, in_race]
  for (i in 1:(length(in_race) - 1)) {
    for (j in (i + 1):length(in_race)) {
      h1 <- in_race[i]; h2 <- in_race[j]
      win1 <- as.integer(ranks[h1] < ranks[h2])
      pairwise <- bind_rows(pairwise, tibble(h1=h1, h2=h2, win1=win1, win2=1-win1))
    }
  }
}

agg <- pairwise |>
  group_by(h1, h2) |>
  summarise(win1 = sum(win1), win2 = sum(win2), .groups = "drop") |>
  mutate(
    # BTm requires factors with identical levels on both sides
    h1 = factor(h1, levels = items),
    h2 = factor(h2, levels = items)
  )

bt <- BTm(cbind(win1, win2), h1, h2, data = agg, id = "horse")
BTabilities(bt)
Interpretation
Both families estimate “ability-like” latent parameters, but they do it with different data representations: full rankings vs aggregated pairwise outcomes.
When rankings matter
If you care about place markets, exactas/quinellas, or modeling finish times/positions beyond “win,” ranking-aware methods often align better with the objective.
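As a concrete example of moving beyond the win market, the Harville formula turns a vector of win probabilities into exacta (first-second) probabilities by removing the winner and renormalizing. It is a baseline, not a finished model: it ignores the well-documented distortions in place dynamics, so treat the sketch below as a starting point.

```r
# Harville exacta: P(i first, j second) = p_i * p_j / (1 - p_i).
# Assumes the field behaves like "remove the winner and renormalize".
harville_exacta <- function(p) {
  m <- outer(p, p) / (1 - p)  # element [i, j] = p_i * p_j / (1 - p_i)
  diag(m) <- 0                # a runner cannot finish first and second
  m
}

p  <- c(0.40, 0.30, 0.20, 0.10)  # hypothetical four-runner field
ex <- harville_exacta(p)

ex[1, 2]  # P(runner 1 beats runner 2 into second) = 0.4 * 0.3 / 0.6 = 0.2
sum(ex)   # ordered pairs sum to 1 when the win probabilities sum to 1
```

The same recursion extends to trifectas by removing the first two finishers and renormalizing again.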

6) Hierarchy: GLMMs and Bayesian multilevel models

Racing is fundamentally hierarchical: repeated horses, trainers, jockeys, tracks, regions. Fixed effects can overfit; multilevel models provide shrinkage and better uncertainty accounting.

GLMM with trainer/jockey random effects (lme4)

library(lme4)
library(dplyr)

sim2 <- simulate_races(n_races=500, n_horses=800, n_jockeys=120, n_trainers=100, seed=202)
df2  <- sim2$runners |> left_join(sim2$races, by="race_id")

m_glmm <- glmer(
  win ~ scale(weight_kg) + scale(draw) + surface + going +
    (1 | trainer_id) + (1 | jockey_id),
  data   = df2,
  family = binomial()
)

summary(m_glmm)

Bayesian multilevel model (brms)

library(brms)

priors <- c(
  prior(normal(0, 1), class = "b"),
  prior(exponential(1), class = "sd")
)

fit_bayes <- brm(
  win ~ scale(weight_kg) + scale(draw) + surface + going +
    (1 | trainer_id) + (1 | jockey_id),
  data   = df2,
  family = bernoulli(),
  prior  = priors,
  chains = 2, iter = 1000, cores = 2, seed = 202
)

summary(fit_bayes)
pp_check(fit_bayes)
Why Bayesian here? You get a principled way to encode priors (e.g., smaller effects are more likely than huge ones), propagate uncertainty through derived probabilities, and do posterior predictive checks.
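Here is a sketch of what "propagate uncertainty" means in practice. The posterior draws below are simulated stand-ins so the snippet runs on its own; with a fitted model you would pull real draws from fit_bayes (e.g. via as_draws_df) instead, and the coefficient values are assumptions, not estimates from this post's data.

```r
# Push coefficient uncertainty through to a derived win probability.
set.seed(202)
draws <- data.frame(
  b0       = rnorm(4000, -2.2, 0.15),  # intercept draws (assumed values)
  b_weight = rnorm(4000, -0.10, 0.04)  # weight-effect draws (assumed values)
)

x_weight <- 1.2  # standardized weight for a hypothetical runner
p_draws  <- plogis(draws$b0 + draws$b_weight * x_weight)

quantile(p_draws, c(0.05, 0.50, 0.95))  # a credible interval, not a point estimate
```

The payoff downstream: a staking rule can shrink toward zero when the interval is wide, instead of betting the posterior mean as if it were certain.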

7) Time-to-event: survival and frailty modeling

Some questions in racing are naturally time-to-event: time to first win, time to peak performance, time to retirement, time between races. Survival analysis gives you the right likelihood and handles censoring.

Cox proportional hazards

library(dplyr)
library(survival)

set.seed(1)
n <- 1000

horses <- tibble(
  horse_id  = sprintf("H%04d", 1:n),
  talent    = rnorm(n),
  trainer_id= sample(sprintf("T%03d", 1:80), n, TRUE),
  debut_age = pmax(2, rnorm(n, 3.0, 0.6))
)

base_rate <- 0.002
linpred   <- 0.65 * horses$talent - 0.25 * (horses$debut_age - 3)
rate      <- base_rate * exp(linpred)

time_days   <- rexp(n, rate = rate)
censor_days <- runif(n, 60, 700)

status   <- as.integer(time_days <= censor_days)
time_obs <- pmin(time_days, censor_days)

df_surv <- horses |> mutate(time_days = time_obs, status = status)

cox <- coxph(Surv(time_days, status) ~ scale(talent) + scale(debut_age), data = df_surv)
summary(cox)
cox.zph(cox)

Frailty term (trainer-level heterogeneity)

cox_f <- coxph(
  Surv(time_days, status) ~ scale(talent) + scale(debut_age) + frailty(trainer_id),
  data = df_surv
)

summary(cox_f)

8) Modern ML: XGBoost in tidymodels and mlr3

Gradient boosting can capture non-linearities and interactions without manual feature engineering. But if you don’t evaluate probabilistically (log loss, calibration), you can accidentally ship a model that “ranks well” yet prices poorly.

tidymodels: tuned XGBoost

library(tidymodels)
library(dplyr)

set.seed(314)
sim3 <- simulate_races(n_races = 500, n_horses = 1200, seed = 314)

df3 <- sim3$runners |>
  left_join(sim3$races, by="race_id") |>
  mutate(win = factor(win, levels=c(0,1), labels=c("no","yes")))

# Group by race so runners from the same field stay on one side of the split
spl   <- group_initial_split(df3, group = race_id)
train <- training(spl)
test  <- testing(spl)

rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
               distance_m + odds_decimal, data=train) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors())

# Resample by race so folds never split a field across analysis/assessment
folds <- group_vfold_cv(train, group = race_id, v = 5)

xgb_spec <- boost_tree(
  trees = 800,
  tree_depth     = tune(),
  learn_rate     = tune(),
  loss_reduction = tune(),
  sample_size    = tune(),
  mtry           = tune()
) |>
  set_engine("xgboost") |>
  set_mode("classification")

wf <- workflow() |> add_recipe(rec) |> add_model(xgb_spec)

grid <- grid_space_filling(
  tree_depth(),
  learn_rate(),
  loss_reduction(),
  sample_size = sample_prop(),
  mtry(range = c(5, 15)),  # stay within the ~18 predictors left after dummying
  size = 20
)

res <- tune_grid(
  wf,
  resamples = folds,
  grid = grid,
  metrics = metric_set(mn_log_loss, roc_auc)
)

best <- select_best(res, metric = "mn_log_loss")
final_fit <- finalize_workflow(wf, best) |> fit(train)

pred <- predict(final_fit, test, type="prob") |> bind_cols(test |> select(win))
metric_set(mn_log_loss, brier_class, roc_auc)(pred, truth = win, .pred_yes,
                                              event_level = "second")

mlr3: comparable workflow

library(data.table)
library(mlr3)
library(mlr3learners)

set.seed(99)
sim4 <- simulate_races(n_races = 400, n_horses = 900, seed = 99)

df4 <- as.data.table(merge(sim4$runners, sim4$races, by = "race_id"))
df4[, win := factor(ifelse(win == 1, "yes", "no"))]

# Drop post-race outcomes (leakage) and the raw date; make characters factors
df4[, c("finish_pos", "finish_time_s", "race_date") := NULL]
char_cols <- setdiff(names(df4)[sapply(df4, is.character)],
                     c("race_id","runner_id","horse_id","jockey_id","trainer_id"))
df4[, (char_cols) := lapply(.SD, factor), .SDcols = char_cols]

task <- as_task_classif(df4, target = "win", id = "win_task")

# Keep identifier columns out of the feature set
task$set_col_roles(
  c("race_id","runner_id","horse_id","jockey_id","trainer_id"),
  remove_from = "feature"
)

rsmp <- rsmp("cv", folds = 5)

lrn_glm <- lrn("classif.log_reg", predict_type = "prob")
rr_glm  <- resample(task, lrn_glm, rsmp)
rr_glm$aggregate(msr("classif.logloss"))
rr_glm$aggregate(msr("classif.auc"))

# xgboost cannot consume factors directly; one-hot encode in a pipeline
library(mlr3pipelines)
lrn_xgb <- as_learner(
  po("encode") %>>%
    lrn("classif.xgboost", predict_type = "prob",
        nrounds = 300, eta = 0.05, max_depth = 4)
)
lrn_xgb$predict_type <- "prob"
rr_xgb <- resample(task, lrn_xgb, rsmp)
rr_xgb$aggregate(msr("classif.logloss"))
rr_xgb$aggregate(msr("classif.auc"))

9) Odds time series: from snapshots to features

Exchange markets produce time-stamped odds and volumes. Treat them as time series and engineer features like drift, volatility, late steam, and liquidity. Keep it honest: market microstructure is noisy, and backtests require strong assumptions.

Simulated snapshots as a tsibble + feature extraction

library(dplyr)
library(tsibble)
library(feasts)
library(ggplot2)

set.seed(1)
snap <- tibble(
  runner_id = "R00001_01",
  ts = as.POSIXct("2025-06-01 14:00:00", tz = "UTC") + seq(0, 60*60, by = 60),
  odds_decimal = pmax(1.01, 6 + cumsum(rnorm(61, 0, 0.05))),
  traded_vol = cumsum(pmax(0, rnorm(61, 10, 3)))
)

tsbl <- snap |> as_tsibble(index = ts, key = runner_id)

feat <- tsbl |>
  features(odds_decimal, list(
    mean  = ~ mean(.x),
    sd    = ~ sd(.x),
    min   = ~ min(.x),
    max   = ~ max(.x),
    slope = ~ unname(coef(lm(.x ~ seq_along(.x)))[2])
  ))

feat

ggplot(tsbl, aes(ts, odds_decimal)) +
  geom_line() +
  labs(x = "Time", y = "Decimal odds")
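One feature named above but not extracted by the block is "late steam". A minimal sketch, assuming minute-spaced snapshots that end at the off (as in the simulated snap; the 10-minute window is an arbitrary choice):

```r
# "Late steam": log-return of the price over the final 10 minutes.
# Negative values mean the price shortened late (money arriving near the off).
set.seed(2)
odds <- pmax(1.01, 5 + cumsum(rnorm(61, 0, 0.04)))  # minute-spaced, like `snap`

late_steam <- log(odds[length(odds)] / odds[length(odds) - 10])
late_steam
```

In a real pipeline you would compute this per (race_id, runner_id) group from odds_snapshots, after checking the final snapshot actually lands at the off.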

Schema normalization: snapshots table

# Example transformation skeleton for real exchange data:
# 1) parse raw feed messages
# 2) map to normalized columns: race_id, runner_id, ts, odds_decimal, traded_vol
# 3) write Parquet and load into DuckDB

# write_parquet(odds_snapshots, "data/derived/odds_snapshots.parquet")
# dbExecute(con, "CREATE OR REPLACE TABLE odds_snapshots AS
#                 SELECT * FROM read_parquet('data/derived/odds_snapshots.parquet');")

10) Backtesting as a decision layer (fractional Kelly + guardrails)

Once you have calibrated probabilities, you can add a decision policy. One common framework is fractional Kelly sizing, but it must be capped and stress tested. The same probabilities can produce very different outcomes depending on execution constraints, market impact, limits, and selection filters.

Simulated fractional Kelly example with risk caps

library(dplyr)
library(ggplot2)

sim5 <- simulate_races(n_races = 180, n_horses = 500, seed = 9)
runners <- sim5$runners |>
  group_by(race_id) |>
  mutate(
    p_market = (1/odds_decimal) / sum(1/odds_decimal),
    # toy "model": market + noise (for demonstration)
    p_model  = pmin(0.99, pmax(0.001, p_market + rnorm(n(), 0, 0.02))),
    edge     = p_model - p_market
  ) |>
  ungroup()

fractional_kelly <- function(p, odds_dec, fraction = 0.25, cap = 0.05) {
  b <- odds_dec - 1
  f_star <- (b*p - (1 - p)) / b
  f <- pmax(0, f_star) * fraction
  pmin(f, cap)
}

bankroll <- 1
max_drawdown <- 0.25
peak <- bankroll
ledger <- tibble()  # avoid shadowing base::hist

for (rid in unique(runners$race_id)) {
  if (bankroll < (1 - max_drawdown) * peak) break  # guardrail: stop

  race <- runners |> filter(race_id == rid)
  pick <- race |> arrange(desc(edge)) |> slice(1)

  if (pick$edge > 0.01 && pick$odds_decimal <= 30) {
    f <- fractional_kelly(pick$p_model, pick$odds_decimal, fraction = 0.25, cap = 0.05)
    stake <- bankroll * f
    pnl <- if (pick$win == 1) stake * (pick$odds_decimal - 1) else -stake
    bankroll <- bankroll + pnl
    peak <- max(peak, bankroll)
  }

  ledger <- bind_rows(ledger, tibble(race_id = rid, bankroll = bankroll, peak = peak))
}

ledger <- ledger |> mutate(drawdown = 1 - bankroll/peak)

ggplot(ledger, aes(seq_along(bankroll), bankroll)) + geom_line() +
  labs(x = "Race index", y = "Bankroll")

ggplot(ledger, aes(seq_along(drawdown), drawdown)) + geom_line() +
  labs(x = "Race index", y = "Drawdown")
Practical warning: this example is intentionally simplified. Real backtests require: commissions, slippage, bet limits, late price changes, liquidity constraints, and selection bias checks.
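To make one of those frictions concrete: exchange commission on net winnings can be folded in by shrinking the effective price before sizing. The 5% rate below is an assumption for illustration; use your actual commission schedule.

```r
# Commission on net winnings shrinks the effective decimal price.
effective_odds <- function(odds_dec, commission = 0.05) {
  1 + (odds_dec - 1) * (1 - commission)
}

# Full Kelly fraction for a back bet at decimal odds (0 if no edge).
kelly_fraction <- function(p, odds_dec) {
  b <- odds_dec - 1
  pmax(0, (b * p - (1 - p)) / b)
}

kelly_fraction(0.20, 6.0)                  # 0.04 at the quoted price
kelly_fraction(0.20, effective_odds(6.0))  # smaller once commission bites
```

Note how a thin edge at the quoted price can vanish entirely after commission; this is why backtests that ignore it flatter the strategy.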

11) Deployment: versioned models + HTTP prediction API

If you want your work to be reusable, build a deployable artifact. A clean R-native pattern is: pins for versioning + vetiver for model packaging + plumber for the API server.

library(tidymodels)
library(pins)
library(vetiver)
library(plumber)
library(dplyr)

sim6 <- simulate_races(n_races = 250, n_horses = 700, seed = 101)

df6 <- sim6$runners |>
  left_join(sim6$races, by="race_id") |>
  mutate(win = factor(win, levels=c(0,1), labels=c("no","yes")))

rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
                distance_m + odds_decimal, data = df6) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors())

fit <- workflow() |>
  add_recipe(rec) |>
  add_model(logistic_reg() |> set_engine("glm")) |>
  fit(df6)

board <- board_folder("pins", versioned = TRUE)
v <- vetiver_model(fit, model_name = "win_prob_glm")

vetiver_pin_write(board, v)

api <- pr() |> vetiver_api(v)

# Run locally:
# api |> pr_run(host = "0.0.0.0", port = 8080)

api

Minimal request payload shape (JSON array of rows, one object per runner)

[
  {
    "draw": 7,
    "weight_kg": 55.5,
    "age": 4,
    "sex": "G",
    "surface": "Turf",
    "going": "Good",
    "distance_m": 1600,
    "odds_decimal": 6.2
  }
]

12) Reproducibility mechanics: renv + deterministic builds

Reproducibility is not vibes. Lock dependencies, record versions, and standardize builds. This is especially important when your analytics becomes a book, a report, or a product artifact.

renv workflow

# Initialize (once)
# renv::init()

# Snapshot current library into renv.lock
renv::snapshot()

# Restore exact package versions from renv.lock
renv::restore()

Dockerfile for deterministic PDF builds (Quarto)

FROM rocker/verse:4.4.2
WORKDIR /book

COPY renv.lock renv.lock
COPY renv/ renv/
COPY .Rprofile .Rprofile

RUN R -q -e "install.packages('renv'); renv::restore(prompt = FALSE)"

COPY . .
CMD ["quarto", "render", "book", "--to", "pdf"]

CI skeleton (GitHub Actions)

name: Render PDF
on:
  push:
    branches: [ main ]
  workflow_dispatch:

jobs:
  pdf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: quarto-dev/quarto-actions/setup@v2
        with:
          tinytex: true
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-renv@v2
      - name: Render Quarto book
        run: quarto render book --to pdf

13) An “engineering checklist” you can actually follow

Data contract
  • Canonical schema
  • Parquet + DuckDB
  • Adapters, not redistribution
Modeling
  • GLM/GLMM baselines
  • Ranking likelihoods
  • Bayesian uncertainty
Evaluation
  • Log loss / Brier
  • Calibration curves
  • Stress-tested backtests
Time dependence
  • Odds snapshots
  • Drift/volatility features
  • Liquidity constraints
Decision layer
  • Explicit assumptions
  • Fractional Kelly caps
  • Drawdown guardrails
Deployment
  • pins versioning
  • vetiver packaging
  • plumber API

One more code snippet: within-race probability normalization

Many racing tables store “odds” per runner, but for inference you often want a normalized probability within a field. Here’s a standard pattern that also highlights overround issues.

library(dplyr)

normalize_field_probs <- function(df, odds_col = odds_decimal) {
  df |>
    group_by(race_id) |>
    mutate(
      implied_raw = 1 / {{ odds_col }},
      overround   = sum(implied_raw),
      p_market    = implied_raw / overround
    ) |>
    ungroup()
}

df_norm <- normalize_field_probs(sim$runners)

df_norm |>
  summarise(
    avg_overround = mean(overround),
    min_overround = min(overround),
    max_overround = max(overround)
  )
Interpretation: overround > 1 implies the market’s implied probabilities sum above 1 (built-in margin). Separating “market implied probability” from “model probability” is essential if you want honest evaluation.
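One way to make that separation operational: always score the normalized market probability itself as a baseline. A self-contained sketch (simulated fields, not real markets; the noise scale on the "market" is an assumption):

```r
# Score an oracle (true probabilities) against a noisy "market" baseline.
set.seed(3)
n_races <- 4000; k <- 6
log_loss <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))

rows <- lapply(seq_len(n_races), function(i) {
  u <- rnorm(k)
  p_true <- exp(u) / sum(exp(u))                 # true win probabilities
  winner <- sample.int(k, 1, prob = p_true)
  m <- p_true * exp(rnorm(k, 0, 0.3))            # market = true + noise
  m <- m / sum(m)                                # renormalized within the field
  data.frame(win = as.integer(seq_len(k) == winner),
             p_true = p_true, p_market = m)
})
df_base <- do.call(rbind, rows)

log_loss(df_base$win, df_base$p_true)    # oracle: the floor
log_loss(df_base$win, df_base$p_market)  # the baseline your model must beat
```

If your model cannot beat the de-margined market on log loss out of sample, the "edge" in a backtest is almost certainly noise or leakage.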
