Hierarchical Bayesian modeling (also called multilevel modeling) is one of the most reliable ways to build predictive and inferential models when your data has natural grouping—teams, players, seasons, leagues, referees, venues, or even game states. In sports analytics, that grouping is unavoidable. In R, hierarchical Bayesian models are commonly implemented via brms (Stan), rstanarm, or cmdstanr.

This tutorial focuses on partial pooling (a.k.a. shrinkage) and why it’s the default choice for rigorous, production-grade modeling: it reduces overfitting, improves out-of-sample performance, and produces honest uncertainty quantification. We will use a sports dataset as a concrete example, but the modeling principles generalize to many domains (education, marketing, medicine, A/B testing, and more).

Core keywords covered: hierarchical Bayesian models in R, partial pooling, multilevel modeling, Bayesian shrinkage, Bayesian inference in Stan, brms tutorial, posterior predictive checks, priors, model calibration, sports analytics in R.


Pooling Strategies: No Pooling, Complete Pooling, Partial Pooling

Suppose we want to estimate team strength. There are three classic approaches:

  • No pooling: estimate a separate parameter for each team using only that team’s data. This can be high-variance (overfits small samples).
  • Complete pooling: ignore teams and estimate one global parameter. This is low-variance but high-bias (misses real differences).
  • Partial pooling: estimate team parameters while sharing information via a population-level distribution. This is the hierarchical Bayesian compromise that tends to dominate in practice.

In partial pooling, each group effect is “pulled” toward the global mean in proportion to how much information that group has. Teams with few matches shrink more; teams with many matches shrink less. This is not a trick—it’s what the posterior implies under a reasonable hierarchical prior structure.
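The mechanics of that "pulling" can be seen in a toy normal-normal calculation. This is only a sketch with made-up values for the within-team sd sigma and between-team sd tau; in a real hierarchical fit both are estimated from the data.

```r
# Posterior mean of a team effect under a normal-normal model with known
# variances: a precision-weighted average of the team's own mean and the
# global mean. More data for a team => more weight on its own mean.
sigma <- 1.0  # within-team (observation) sd -- assumed known here
tau   <- 0.5  # between-team sd -- assumed known here
mu    <- 0.0  # global mean

shrink_est <- function(ybar, n) {
  w <- (n / sigma^2) / (n / sigma^2 + 1 / tau^2)  # weight on the team's own data
  w * ybar + (1 - w) * mu
}

shrink_est(ybar = 1.2, n = 2)   # few matches: pulled strongly toward mu (~0.40)
shrink_est(ybar = 1.2, n = 50)  # many matches: barely shrunk (~1.11)
```

The weight w is exactly "information the group has" relative to "information the population prior has", which is why small-sample teams shrink the most.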


R Setup

We will fit models using brms, which compiles Bayesian models to Stan and provides a high-level formula interface. The workflow below emphasizes reproducibility and good Bayesian practice: priors, diagnostics, posterior predictive checks, and principled comparison against simpler baselines.

# Core modeling stack
install.packages(c("brms", "tidyverse", "posterior", "bayesplot", "tidybayes"))
# Optional but recommended for speed (CmdStan backend).
# cmdstanr is not on CRAN; install it from the Stan R-universe repository:
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev", getOption("repos")))

library(brms)
library(tidyverse)
library(posterior)
library(bayesplot)
library(tidybayes)

# If using cmdstanr (recommended), set backend once:
# cmdstanr::install_cmdstan()
options(brms.backend = "cmdstanr")

Example Data: Match Outcomes with Team Effects

To keep this tutorial self-contained, we’ll simulate a dataset that resembles soccer-style goal scoring. The same structure appears in hockey, handball, futsal, and other low-to-moderate scoring sports. The key is that each match has two teams and outcomes depend on latent team strength.

set.seed(123)

n_teams <- 20
teams   <- paste0("Team_", seq_len(n_teams))

# True latent parameters (unknown in real life)
true_attack  <- rnorm(n_teams, 0, 0.35)
true_defense <- rnorm(n_teams, 0, 0.35)
true_home_adv <- 0.20

# Schedule: random pairings
n_matches <- 600
home_id <- sample(seq_len(n_teams), n_matches, replace = TRUE)
away_id <- sample(seq_len(n_teams), n_matches, replace = TRUE)
# Avoid self-matches
same <- home_id == away_id
while (any(same)) {
  away_id[same] <- sample(seq_len(n_teams), sum(same), replace = TRUE)
  same <- home_id == away_id
}

home_team <- teams[home_id]
away_team <- teams[away_id]

# Poisson rates for goals
log_lambda_home <- true_home_adv + true_attack[home_id] - true_defense[away_id]
log_lambda_away <-               true_attack[away_id] - true_defense[home_id]

lambda_home <- exp(log_lambda_home)
lambda_away <- exp(log_lambda_away)

home_goals <- rpois(n_matches, lambda_home)
away_goals <- rpois(n_matches, lambda_away)

matches <- tibble(
  match_id   = seq_len(n_matches),
  home_team  = factor(home_team),
  away_team  = factor(away_team),
  home_goals = home_goals,
  away_goals = away_goals
)

matches %>% glimpse()

In a real sports analytics pipeline, you would replace simulation with data ingestion (CSV/APIs), feature engineering (rest days, injuries, xG proxies, travel, Elo, etc.), and train/test splits. The hierarchical core remains: group-level effects with partial pooling.
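As a tiny illustration of the train/test step, here is a time-ordered holdout split. This assumes match_id reflects chronological order, which is true of the simulated schedule only by construction; with real data you would split on actual match dates.

```r
# Hold out the most recent 20% of matches for predictive evaluation
train_matches <- matches %>% filter(match_id <= 480)
test_matches  <- matches %>% filter(match_id >  480)
```

Fitting on train_matches and scoring on test_matches keeps the evaluation honest for forecasting tasks, where random splits leak future information.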


Baseline: Complete Pooling (No Team Effects)

As a baseline, model home goals using only a global intercept. This is intentionally too simple: it cannot learn that some teams are stronger than others, and home advantage is simply absorbed into the intercept because every response here is a home score.

m_pool <- brm(
  home_goals ~ 1,
  data   = matches,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept")
  ),
  chains = 4, cores = 4, iter = 2000, seed = 1
)

summary(m_pool)

Complete pooling often underfits: it produces well-behaved uncertainty but misses systematic structure. Next we move to no pooling and then partial pooling.


No Pooling: Separate Team Parameters (High Variance)

No pooling estimates a fixed effect for each team without sharing strength across teams. This can be unstable when some teams have fewer observations or unbalanced schedules.

m_nopool <- brm(
  home_goals ~ 1 + home_team + away_team,
  data   = matches,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept"),
    prior(normal(0, 0.5), class = "b")  # regularization, but still no pooling
  ),
  chains = 4, cores = 4, iter = 2000, seed = 2
)

summary(m_nopool)

Even with regularizing priors, no pooling is typically noisier than hierarchical partial pooling, especially in small samples. The multilevel model handles this in a more principled way.


Partial Pooling: Hierarchical (Multilevel) Team Effects

A standard approach is to model goals with random effects for home and away team. In brms syntax, (1 | home_team) means each home team gets its own intercept drawn from a common distribution. The distribution’s standard deviation is learned from data and controls the amount of shrinkage.

Below, we fit a model for home goals with hierarchical intercepts for both home and away teams. This captures: home attacking propensity (home team effect) and away defensive vulnerability (away team effect), albeit in a simplified way.

priors_hier <- c(
  prior(normal(0, 1.0), class = "Intercept"),
  # SD priors control how much team-to-team variation is plausible
  prior(exponential(1.0), class = "sd")
)

m_hier <- brm(
  home_goals ~ 1 + (1 | home_team) + (1 | away_team),
  data   = matches,
  family = poisson(),
  prior  = priors_hier,
  chains = 4, cores = 4, iter = 2500, seed = 3
)

summary(m_hier)

Interpretation: the model estimates a global goal rate (intercept) plus deviations for each team, but those deviations are partially pooled. This is the practical meaning of partial pooling: group-level parameters borrow statistical strength.
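One way to quantify how much pooling the model applies is to inspect the posterior of the group-level standard deviations: the smaller the SD, the stronger the shrinkage toward the global mean. A sketch using the posterior package; the variable names follow brms's internal sd_<group>__Intercept convention.

```r
# Posterior summaries of the group-level SDs that govern shrinkage
m_hier %>%
  as_draws_df() %>%
  posterior::subset_draws(variable = "sd_", regex = TRUE) %>%
  posterior::summarise_draws()
```

If these SDs were estimated near zero, the model would behave like complete pooling; very large SDs would approach no pooling.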


Modeling Both Scores Jointly

A more realistic sports modeling setup estimates both home and away goals and separates attack/defense structure. brms supports multivariate models. Here’s a simple approach: fit two Poisson responses with correlated random effects (conceptually similar to attack/defense components). The `|h|` and `|a|` ID terms tell brms to correlate a grouping factor’s effects wherever the same ID appears across formulas.

bf_home <- bf(home_goals ~ 1 + (1 | h | home_team) + (1 | a | away_team))
bf_away <- bf(away_goals ~ 1 + (1 | a | away_team) + (1 | h | home_team))

m_mv <- brm(
  bf_home + bf_away + set_rescor(FALSE),
  data   = matches,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept", resp = "homegoals"),
    prior(normal(0, 1.0), class = "Intercept", resp = "awaygoals"),
    prior(exponential(1.0), class = "sd", resp = "homegoals"),
    prior(exponential(1.0), class = "sd", resp = "awaygoals")
  ),
  chains = 4, cores = 4, iter = 3000, seed = 4
)

summary(m_mv)

This multivariate model is already a meaningful step toward academic-grade sports inference: it produces posterior distributions for team effects with coherent uncertainty, and it avoids overreacting to noisy short-term form.


Diagnostics and Posterior Predictive Checks (Academic Essentials)

A Bayesian model is only as credible as its diagnostics. At minimum, check: R-hat (convergence), effective sample size, and posterior predictive fit.

# Convergence diagnostics
m_hier %>% summary()

# Posterior predictive checks: do simulated goals resemble observed?
pp_check(m_hier, type = "hist", ndraws = 100)

# Another useful view: grouped checks of observed vs simulated counts by team
pp_check(m_hier, type = "bars_grouped", group = "home_team", ndraws = 50)

Posterior predictive checks help detect mis-specification: too many zeros, heavy tails, or systematic under/over-dispersion. If your sport has extra dispersion, consider a negative binomial likelihood: family = negbinomial().

m_hier_nb <- brm(
  home_goals ~ 1 + (1 | home_team) + (1 | away_team),
  data   = matches,
  family = negbinomial(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept"),
    prior(exponential(1.0), class = "sd"),
    prior(exponential(1.0), class = "shape")  # NB dispersion
  ),
  chains = 4, cores = 4, iter = 2500, seed = 5
)

pp_check(m_hier_nb, type = "hist", ndraws = 100)

Visualizing Partial Pooling (Shrinkage) in R

The cleanest way to “see” partial pooling is to compare group estimates under: (a) no pooling and (b) hierarchical pooling. Teams with limited data will shrink toward the global mean in the hierarchical model.

# Extract team effects from both models
re_hier <- ranef(m_hier)$home_team[, , "Intercept"] %>%
  as_tibble(rownames = "team") %>%
  rename(estimate = Estimate, est_error = Est.Error, q2.5 = Q2.5, q97.5 = Q97.5)

# No pooling fixed effects: home_team coefficients (approximate comparison;
# the reference team is absorbed into the intercept, so it appears as NA below)
fix_nopool <- fixef(m_nopool) %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(str_starts(term, "home_team")) %>%
  mutate(team = str_remove(term, "home_team")) %>%
  transmute(team, estimate = Estimate, q2.5 = Q2.5, q97.5 = Q97.5)

# Join and compare
comp <- re_hier %>%
  left_join(fix_nopool, by = "team", suffix = c("_hier", "_nopool"))

comp %>%
  ggplot(aes(x = estimate_nopool, y = estimate_hier)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +
  labs(
    x = "No pooling estimate (fixed effects)",
    y = "Partial pooling estimate (hierarchical)",
    title = "Shrinkage: hierarchical estimates pull extreme values toward the mean"
  )

In applied work, this shrinkage is a feature, not a bug: it protects you from chasing noise, especially early in a season or when some teams have faced unusually strong/weak opponents.


Posterior Predictions: From Parameters to Predictive Distributions

Hierarchical Bayesian models shine when you need uncertainty-aware predictions. Instead of a single number, you get a full posterior predictive distribution—useful for forecast intervals, simulations, and decision analysis.

# Create a small set of future fixtures (example)
new_matches <- tibble(
  home_team = factor(c("Team_1", "Team_2", "Team_3"), levels = levels(matches$home_team)),
  away_team = factor(c("Team_4", "Team_5", "Team_6"), levels = levels(matches$away_team))
)

# Posterior expected goals (lambda) for home_goals model
epred <- posterior_epred(m_hier, newdata = new_matches, ndraws = 2000)
# epred is draws x rows. Summarize mean and interval per match:
pred_summary <- apply(epred, 2, function(x) {
  c(mean = mean(x),
    q10  = unname(quantile(x, 0.10)),  # unname() keeps clean column names
    q90  = unname(quantile(x, 0.90)))
}) %>% t() %>% as_tibble()

bind_cols(new_matches, pred_summary)

If you need simulated goal counts (not just expected value), use posterior_predict().

yrep <- posterior_predict(m_hier, newdata = new_matches, ndraws = 2000)

sim_summary <- apply(yrep, 2, function(x) {
  c(mean = mean(x),
    q10  = unname(quantile(x, 0.10)),  # unname() keeps clean column names
    q90  = unname(quantile(x, 0.90)))
}) %>% t() %>% as_tibble()

bind_cols(new_matches, sim_summary)

Model Comparison: Why Hierarchical Often Wins

A common academic question: does partial pooling actually improve predictive accuracy? One principled approach is approximate leave-one-out cross-validation (LOO) using loo().

# loo ships as a brms dependency; attach it for loo_compare()
library(loo)

loo_pool   <- loo(m_pool)
loo_nopool <- loo(m_nopool)
loo_hier   <- loo(m_hier)

loo_compare(loo_pool, loo_nopool, loo_hier)

In many real datasets, hierarchical models dominate no-pooling models because they reduce variance without imposing the high bias of complete pooling. When the sample is large and balanced, no-pooling can catch up, but it remains more fragile under distribution shift (new season, roster changes, schedule imbalance).


Priors for Multilevel Sports Models: Practical Guidance

Priors are not “optional decoration”—they formalize regularization and encode plausible scales. For sports scoring, consider:

  • Intercept prior: reflects typical goal rate (log scale for Poisson).
  • Group SD priors: control how much teams can vary from the league mean.
  • Likelihood choice: Poisson vs negative binomial for overdispersion.

# Example: informative-ish intercept prior based on typical goals per match
# If average home goals ~ 1.4, then log(1.4) ~ 0.336
priors_sporty <- c(
  prior(normal(log(1.4), 0.5), class = "Intercept"),
  prior(exponential(1.0), class = "sd")
)

m_hier2 <- brm(
  home_goals ~ 1 + (1 | home_team) + (1 | away_team),
  data   = matches,
  family = poisson(),
  prior  = priors_sporty,
  chains = 4, cores = 4, iter = 2500, seed = 6
)

If you publish academic-style modeling content, a short prior sensitivity check is a strong credibility signal: refit with slightly wider SD priors and confirm conclusions are stable.
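A minimal version of that sensitivity check, refitting m_hier2 via update() with a wider group-SD prior (exponential(0.5) has prior mean 2 instead of 1):

```r
# Wider SD prior: allows more team-to-team variation a priori
priors_wide <- c(
  prior(normal(log(1.4), 0.5), class = "Intercept"),
  prior(exponential(0.5), class = "sd")  # wider than exponential(1.0)
)

m_hier2_wide <- update(m_hier2, prior = priors_wide, seed = 6)

# Conclusions are stable if the group-level SD estimates barely move
VarCorr(m_hier2)
VarCorr(m_hier2_wide)
```

If the posterior group SDs (and the resulting team rankings) barely change, the data, not the prior, is driving the conclusions.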


Practical Notes for Real Sports Data

  • Unbalanced schedules: hierarchical structure helps stabilize estimates when teams face different opponent quality.
  • Small samples: early season or new leagues are where partial pooling is most valuable.
  • Covariates: add rest days, travel, injuries, Elo, or rolling form as fixed effects, but keep team effects hierarchical.
  • Time dynamics: for long seasons, consider random walks or season-by-season hierarchical layers.

# Add a covariate example (simulated here):
matches2 <- matches %>%
  mutate(rest_diff = rnorm(n(), 0, 1))  # placeholder for real engineered feature

m_cov <- brm(
  home_goals ~ 1 + rest_diff + (1 | home_team) + (1 | away_team),
  data   = matches2,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept"),
    prior(normal(0, 0.3), class = "b"),   # effect size prior for covariate
    prior(exponential(1.0), class = "sd")
  ),
  chains = 4, cores = 4, iter = 2500, seed = 7
)

summary(m_cov)
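
The "time dynamics" bullet above can be sketched with a season-within-team layer. The simulated data has no real seasons, so the season labels here are purely hypothetical (first vs second half of the schedule) and serve only to show the syntax.

```r
# Hypothetical season labels for illustration only
matches3 <- matches2 %>%
  mutate(season = factor(if_else(match_id <= 300, "S1", "S2")))

# Persistent team effects plus season-specific deviations on top
m_season <- brm(
  home_goals ~ 1 + rest_diff + (1 | home_team) + (1 | season:home_team),
  data   = matches3,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept"),
    prior(normal(0, 0.3), class = "b"),
    prior(exponential(1.0), class = "sd")
  ),
  chains = 4, cores = 4, iter = 2500, seed = 8
)

summary(m_season)
```

The season:team SD tells you how much team strength drifts between seasons relative to its persistent component.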

Further Reading and Deeper Projects (Optional)

If you want to extend this tutorial into a full sports analytics workflow (data acquisition, feature engineering, predictive evaluation, and model deployment), longer-form references on Bayesian workflow and applied sports modeling make useful project companions.

Treat them as optional deep-dives; the core multilevel concepts in this post stand on their own and transfer cleanly to non-sports hierarchical modeling problems.


Conclusion

Partial pooling is the practical heart of hierarchical Bayesian modeling. In R, multilevel models with brms/Stan give you: (1) stable group estimates, (2) principled regularization, and (3) posterior predictive uncertainty that supports simulation, forecasting, and evaluation. If your data has groups—and most real-world data does—hierarchical Bayes is often the most defensible baseline you can set.