Predicting NBA Games with R and hoopR (Win Probabilities + Heatmaps)

If you love sports analytics, this post will show you how to predict NBA matchups using real ESPN data and the R ecosystem: hoopR for the data and the tidyverse (including ggplot2) for wrangling and plotting.

We’ll use a simple independent Poisson model to simulate 100,000 possible scores, compute win probabilities, and visualize expected outcomes for any matchup, all inside R.


🏀 What you’ll build

A fully reproducible NBA prediction pipeline that:

  • Aggregates player box scores into team-level points for/against
  • Estimates expected points using recent games and a Poisson model
  • Simulates 100,000 possible scorelines
  • Produces three key visuals:
    1. Win probabilities
    2. Scoreline heatmap
    3. Expected team points
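
The first step in the list, turning player rows into team-level points for and against, can be sketched on a toy table before touching real data. The rows below are hypothetical (they are not ESPN data), and the `relationship` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy box score: one game, two teams, two players each.
toy_box <- tibble(
  game_id = 1L,
  team_abbreviation = c("BOS", "BOS", "DEN", "DEN"),
  points = c(30, 25, 28, 20)
)

# Step 1: sum player points within each team and game.
toy_team <- toy_box %>%
  group_by(game_id, team_abbreviation) %>%
  summarise(pts_for = sum(points), .groups = "drop")

# Step 2: join the table to itself on game_id and keep the rows where the
# two sides differ, so each team row also carries its opponent's points.
# relationship = "many-to-many" (dplyr >= 1.1.0) silences the join warning.
toy_games <- toy_team %>%
  inner_join(toy_team, by = "game_id", suffix = c("", "_opp"),
             relationship = "many-to-many") %>%
  filter(team_abbreviation != team_abbreviation_opp) %>%
  transmute(game_id, team_abbreviation,
            pts_for, pts_against = pts_for_opp)

toy_games
```

Each game ends up with two rows, one per team, which is the shape the recent-form averages need.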

⚙️ Why this approach?

  • Transparent – You can inspect every assumption and adjust parameters.
  • Fast – Pulls ESPN game data directly with hoopR.
  • Flexible – Works for any season or teams with minimal edits.

💻 Full R code

You can copy and run this entire block in RStudio.
Change the HOME_TEAM and AWAY_TEAM codes to simulate any matchup.

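library(tidyverse)  # dplyr, tibble, and friends
library(hoopR)      # ESPN NBA data loaders
library(ggplot2)    # already attached by tidyverse; kept for clarity

# Matchup settings
SEASON <- 2024      # hoopR labels a season by the year it ends (2023-24)
HOME_TEAM <- "BOS"
AWAY_TEAM <- "DEN"
N_GAMES <- 15       # number of recent games to average over
HOME_ADV <- 0.06    # multiplicative boost to the home team's expected points
MAX_POINTS <- 150   # upper bound of the scoreline heatmap axes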

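# Player-level box scores for the season (one row per player per game)
pb <- hoopR::load_nba_player_box(seasons = SEASON)

# Drop rows flagged as not having played. NA and FALSE both count as "played",
# so the filter still works if the flag columns are logical rather than empty.
dnp_cols <- intersect(c("did_not_play", "did_not_dress", "not_with_team"), names(pb))
if (length(dnp_cols)) {
  pb <- pb %>% filter(if_all(all_of(dnp_cols), ~ !(.x %in% TRUE)))
}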

# Team points per game: sum player points within each team and game
team_pts <- pb %>%
  group_by(game_id, game_date, season_type, team_abbreviation) %>%
  summarise(pts_for = sum(points, na.rm = TRUE), .groups = "drop")

# Self-join on game_id so each team row also carries its opponent's points
team_games <- team_pts %>%
  inner_join(team_pts, by = "game_id", suffix = c("", "_opp")) %>%
  filter(team_abbreviation != team_abbreviation_opp) %>%
  group_by(game_id, team_abbreviation) %>% slice_head(n = 1) %>% ungroup() %>%
  transmute(
    game_id, game_date, season_type, team_abbreviation,
    opponent_abbreviation = team_abbreviation_opp,
    pts_for, pts_against = pts_for_opp
  ) %>%
  filter(season_type == 2)  # 2 = regular season in ESPN data

# Mean points scored and allowed over a team's last n games
team_recent_means <- function(df, team, n = 15) {
  df %>%
    filter(team_abbreviation == team) %>%
    arrange(game_date) %>%
    slice_tail(n = n) %>%
    summarise(off_avg = mean(pts_for, na.rm = TRUE),
              def_avg = mean(pts_against, na.rm = TRUE))
}

home_stats <- team_recent_means(team_games, HOME_TEAM, N_GAMES)
away_stats <- team_recent_means(team_games, AWAY_TEAM, N_GAMES)
# Fallback when a team has no games: assume 113 points for and against
home_stats[is.na(home_stats)] <- 113
away_stats[is.na(away_stats)] <- 113

# Expected points: own offense scaled by the opponent's defense relative to
# the average of the two defenses, then a home-court boost
def_base <- mean(c(home_stats$def_avg, away_stats$def_avg))
lambda_home <- home_stats$off_avg * (away_stats$def_avg / def_base)
lambda_away <- away_stats$off_avg * (home_stats$def_avg / def_base)
lambda_home <- lambda_home * (1 + HOME_ADV)
# Floor both means at 95 points
lambda_home <- max(lambda_home, 95)
lambda_away <- max(lambda_away, 95)

# Simulate S games with independent Poisson scores
set.seed(42)
S <- 100000
h <- rpois(S, lambda_home)
a <- rpois(S, lambda_away)
p_home <- mean(h > a)
p_away <- mean(h < a)
p_tie <- mean(h == a)  # tied simulations; not shown in the chart

# Chart 1: win probabilities
probs <- tibble(
  outcome = factor(c("Home", "Away"), levels = c("Home", "Away")),
  prob = c(p_home, p_away),
  label = scales::percent(c(p_home, p_away), accuracy = 0.1)
)

p1 <- ggplot(probs, aes(outcome, prob, fill = outcome)) +
  geom_col(width = 0.6, color = "black") +
  geom_text(aes(label = label), vjust = -0.35, size = 5) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                     expand = expansion(mult = c(0, .08))) +
  scale_fill_manual(values = c("Home" = "#1B9E77", "Away" = "#D95F02")) +
  labs(title = sprintf("%s vs %s: Win Probabilities", HOME_TEAM, AWAY_TEAM),
       subtitle = sprintf("Recent %d games • Independent Poisson model", N_GAMES),
       x = NULL, y = "Probability",
       caption = "Data: ESPN via hoopR") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

# Chart 2: joint probability of each scoreline under independence
grid <- expand.grid(H = 70:MAX_POINTS, A = 70:MAX_POINTS)
grid$prob <- dpois(grid$H, lambda_home) * dpois(grid$A, lambda_away)
p2 <- ggplot(grid, aes(H, A, fill = prob)) +
  geom_tile() +
  scale_fill_viridis_c(option = "C", trans = "sqrt") +
  labs(title = sprintf("%s vs %s: Scoreline Heatmap (Prob.)", HOME_TEAM, AWAY_TEAM),
       x = sprintf("Home points: %s", HOME_TEAM),
       y = sprintf("Away points: %s", AWAY_TEAM),
       fill = "Prob.") +
  theme_minimal(base_size = 14)

# Chart 3: expected points (the Poisson means)
expected_pts <- tibble(team = c(HOME_TEAM, AWAY_TEAM),
                       exp_pts = c(lambda_home, lambda_away))
p3 <- ggplot(expected_pts, aes(reorder(team, exp_pts), exp_pts, fill = team)) +
  geom_col(width = 0.6, color = "black") +
  geom_text(aes(label = round(exp_pts, 1)), vjust = -0.35, size = 5) +
  coord_flip() +
  scale_fill_manual(values = c("#1B9E77", "#D95F02")) +
  labs(title = "Expected Team Points (Poisson λ)", x = "Team", y = "Points") +
  theme_minimal(base_size = 14) + theme(legend.position = "none")

print(p1); print(p2); print(p3)
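
To see what the expected-points step in the code above does, it can help to run the arithmetic by hand. This is a small worked example with hypothetical averages (not real team numbers); the variable names are illustrative only.

```r
# Hypothetical recent averages (not real team data).
home_off <- 118; home_def <- 110   # home team: points scored / allowed
away_off <- 114; away_def <- 112   # away team: points scored / allowed
home_adv <- 0.06                   # same multiplier as HOME_ADV above

# Baseline: the average of the two teams' points allowed.
def_base <- mean(c(home_def, away_def))   # 111

# Each offense is scaled by how leaky the opposing defense is
# relative to that baseline; the home side then gets its boost.
lambda_home <- home_off * (away_def / def_base) * (1 + home_adv)
lambda_away <- away_off * (home_def / def_base)

round(c(home = lambda_home, away = lambda_away), 1)
# home ≈ 126.2, away ≈ 113.0
```

At this scoring level a 6% multiplier adds about 7 points to the home side, so HOME_ADV is one of the first parameters worth adjusting if the home win probability looks too generous.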

📊 How to read the charts

Win Probabilities
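Each bar shows the share of simulated games that each team wins, given the expected points derived from both teams' recent offensive and defensive averages. Simulated ties are left out, so the two bars sum to slightly less than 100%.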
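
Because both scores are independent Poisson draws, the same probabilities can also be computed exactly from the probability mass function instead of by simulation. The sketch below uses hypothetical λ values and one possible way of handling ties (splitting them in proportion to each side's win share); the tie rule is an assumption, not part of the original pipeline.

```r
# Hypothetical expected points (not real team data).
lambda_home <- 118
lambda_away <- 113

# Joint probability of every scoreline from 0 to 250 points per team.
# Rows index the home score, columns index the away score.
pts <- 0:250
joint <- outer(dpois(pts, lambda_home), dpois(pts, lambda_away))

p_home <- sum(joint[lower.tri(joint)])  # row > column: home outscores away
p_away <- sum(joint[upper.tri(joint)])  # column > row: away outscores home
p_tie  <- sum(diag(joint))              # equal scores

# NBA games cannot end tied, so one option (an assumption here) is to
# redistribute the tie mass in proportion to each side's win share.
p_home_adj <- p_home / (p_home + p_away)
p_away_adj <- p_away / (p_home + p_away)

c(home = p_home_adj, away = p_away_adj, tie_before_adjust = p_tie)
```

The exact values should sit very close to the simulated ones; any gap is Monte Carlo noise from the finite number of draws.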

Scoreline Heatmap
Brighter cells mark the more likely combinations of final scores. The fill uses a square-root scale so that less likely scorelines stay visible.
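
Two quick checks make the heatmap easier to trust: how much of the total probability the plotted grid covers, and which single scoreline is the most likely. The sketch below rebuilds the grid with hypothetical λ values; the 70 to 150 range mirrors the one used in the code above.

```r
# Hypothetical expected points (not real team data).
lambda_home <- 118.4
lambda_away <- 112.7

# Same construction as the heatmap grid: independent Poisson probabilities.
grid <- expand.grid(H = 70:150, A = 70:150)
grid$prob <- dpois(grid$H, lambda_home) * dpois(grid$A, lambda_away)

# Share of all probability mass that falls inside the plotted range.
coverage <- sum(grid$prob)

# Most likely single scoreline (the brightest cell).
top_cell <- grid[which.max(grid$prob), ]

coverage
top_cell
```

If `coverage` falls noticeably below 1 for a high-scoring matchup, widening the axis range (for example by raising MAX_POINTS) keeps the heatmap from clipping plausible scores.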

Expected Points (λ)
The Poisson mean (λ) for each team, which is the model's predicted score and a reasonable baseline for betting or fantasy analysis.
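
A Poisson mean also implies a spread: the standard deviation is the square root of λ, so a point estimate can be turned into a rough range. A minimal sketch with a hypothetical λ:

```r
# Hypothetical expected points for one team (not real team data).
lambda <- 115

# Under a Poisson model the standard deviation is sqrt(lambda).
score_sd <- sqrt(lambda)

# Central 90% range of scores implied by the model.
score_range <- qpois(c(0.05, 0.95), lambda)

score_sd
score_range
```

This range reflects only the Poisson assumption; it does not account for uncertainty in λ itself, which comes from averaging a small number of recent games.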


⚠️ Limitations

  • Assumes independent Poisson scoring (ignores game pace, possessions, and correlations).
  • No adjustments for player injuries or rest days.
  • A simple model — great for demonstration, not gambling advice.
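
One way to probe the Poisson assumption is to compare the variance of a team's scores with their mean, since a Poisson distribution forces the two to be equal. The helper below, `dispersion_ratio`, is a hypothetical name introduced for illustration, and the scores are simulated rather than real NBA results.

```r
# Ratio of sample variance to sample mean; close to 1 for Poisson data.
dispersion_ratio <- function(x) var(x) / mean(x)

set.seed(1)
# Simulated scores only, for illustration (not real NBA results).
pois_scores <- rpois(1e5, lambda = 113)
# Negative binomial with the same mean: variance = mu + mu^2 / size.
nb_scores <- rnbinom(1e5, size = 200, mu = 113)

ratio_pois <- dispersion_ratio(pois_scores)  # expected to be near 1
ratio_nb   <- dispersion_ratio(nb_scores)    # expected to be above 1

c(poisson = ratio_pois, neg_binomial = ratio_nb)
```

Applied to a team's `pts_for` column, a ratio well above 1 would suggest that scores vary more than the Poisson model allows, and that a wider distribution such as the negative binomial might be worth trying.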
