
From Data to Victory: Advanced Basketball Analytics with R

A hands-on tutorial with reproducible R code for play-by-play analysis, shot charts, and data-driven decision making.

This post accompanies and expands on the book listed here: Basketball Analytics with R (product page)

Table of Contents

  1. Setup & Data Ingestion
  2. Cleaning and Tidy Structures
  3. Lineups and On/Off Impact
  4. Shot Charts and Efficiency Maps
  5. Pace, Four Factors, and Possessions
  6. Win Probability and In-Game Leverage
  7. Scouting Reports with Reproducible R Markdown
  8. Where to Go Next

1) Setup & Data Ingestion

We will use hoopR (part of the SportsDataverse) to access ESPN-based schedules, box scores, and play-by-play (pbp) data, along with the tidyverse for wrangling and ggplot2 for visuals.


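# install.packages(c("tidyverse", "lubridate", "ggplot2", "ggforce", "gt"))
# install.packages("hoopR") # or: remotes::install_github("sportsdataverse/hoopR")

library(hoopR)      # ESPN schedules, box scores, and play-by-play
library(tidyverse)  # dplyr, tidyr, ggplot2, ...
library(lubridate)
library(ggplot2)
library(ggforce)    # geom_circle() for court markings
library(gt)         # formatted tables for reports

# One season of men's college basketball play-by-play
# (downloads data; adjust `seasons` as needed)
pbp_raw <- load_mbb_pbp(seasons = 2024)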
      

2) Cleaning and Tidy Structures

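Play-by-play streams usually need standardizing before analysis: consistent team labels, seconds remaining, possession inference, and categorical event tags. The helper below covers the clock and shot tags. It assumes an `event_type` column already recoded to `made2`, `made3`, `miss2`, and `miss3`; hoopR's raw feed describes plays with different columns, so that recode is a prior step.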


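# Expects `clock_display_value` ("MM:SS", or plain seconds such as "45.3"
# if your feed switches format in the last minute) and a pre-coded
# `event_type`. Free throws are not valued here.
clean_pbp <- function(df){
  df %>%
    mutate(
      # Seconds left in the current period
      sec_left = dplyr::if_else(
        grepl(":", clock_display_value),
        as.numeric(sub(":.*", "", clock_display_value)) * 60 +
          as.numeric(sub(".*:", "", clock_display_value)),
        suppressWarnings(as.numeric(clock_display_value))
      ),
      is_shot  = event_type %in% c("made2", "made3", "miss2", "miss3"),
      is_make  = event_type %in% c("made2", "made3"),
      is_three = event_type %in% c("made3", "miss3"),
      # Points produced by the field-goal attempt
      value = dplyr::case_when(
        event_type == "made3" ~ 3L,
        event_type == "made2" ~ 2L,
        TRUE ~ 0L
      )
    )
}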
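Possession inference is not handled by `clean_pbp()`. One rough approach, sketched here under stated assumptions, numbers possessions by watching for a change in the team with the ball. It assumes a hypothetical `off_team` column (the team on offense for each event) plus `game_id` and `period`; deriving `off_team` from raw event text is its own step. Under this rule an offensive rebound correctly continues the same possession.

```r
library(dplyr)

# Hypothetical helper: number possessions within each game.
# A new possession starts when the offensive team changes or a new period begins.
number_possessions <- function(pbp) {
  pbp %>%
    group_by(game_id) %>%
    mutate(
      new_poss = off_team != lag(off_team, default = "") |
        period != lag(period, default = -1L),
      poss_id = cumsum(new_poss)
    ) %>%
    ungroup() %>%
    select(-new_poss)
}

# Toy example: A misses, grabs the offensive rebound, scores; then B; then A
toy <- tibble(
  game_id  = 1L,
  period   = c(1L, 1L, 1L, 1L, 1L),
  off_team = c("A", "A", "A", "B", "A")
)
number_possessions(toy)$poss_id
```

Keying on a clean offensive-team column keeps the rule simple; the hard part in real data is assigning `off_team` for defensive events such as steals and blocks.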
      

3) Lineups and On/Off Impact

To study lineup effectiveness, aggregate by stint (a continuous stretch of play with the same five players on the floor). Then compute net rating as points scored minus points allowed per 100 possessions.


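# Outline: build_stints_from_subs() is a placeholder for your own
# substitution-tracking step; it should return one row per stint with
# team_id, lineup_id, value_for, value_against, and possessions.
# stints <- build_stints_from_subs(pbp_clean)

lineup_net_ratings <- function(stints){
  stints %>%
    group_by(team_id, lineup_id) %>%
    summarise(pts_for     = sum(value_for),
              pts_against = sum(value_against),
              poss        = sum(possessions),
              .groups     = "drop") %>%
    mutate(net_rating = 100 * (pts_for - pts_against) / poss)
}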
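If each event already carries a lineup key, stints fall out of run-length grouping: a new stint starts whenever the lineup changes. This is a minimal sketch on made-up rows; the columns `lineup_id`, `pts_for`, `pts_against`, `off_end`, and `def_end` (1 when the event ends an offensive or defensive possession for this team) are assumptions, and possessions are averaged over offense and defense to match the box-score possession estimate.

```r
library(dplyr)

# Hypothetical per-event table for one team in one game
events <- tibble(
  lineup_id   = c("L1", "L1", "L1", "L2", "L2", "L1"),
  pts_for     = c(2, 0, 3, 0, 2, 0),
  pts_against = c(0, 2, 0, 3, 0, 0),
  off_end     = c(1, 0, 1, 0, 1, 0),
  def_end     = c(0, 1, 0, 1, 0, 1)
)

# A new stint begins every time the lineup key changes
stints <- events %>%
  mutate(stint_id = cumsum(lineup_id != lag(lineup_id, default = ""))) %>%
  group_by(stint_id, lineup_id) %>%
  summarise(pts_for     = sum(pts_for),
            pts_against = sum(pts_against),
            # average of offensive and defensive possessions in the stint
            poss        = (sum(off_end) + sum(def_end)) / 2,
            .groups     = "drop")

# Roll stints up to lineups and scale to 100 possessions
lineups <- stints %>%
  group_by(lineup_id) %>%
  summarise(pts_for = sum(pts_for), pts_against = sum(pts_against),
            poss = sum(poss), .groups = "drop") %>%
  mutate(net_rating = 100 * (pts_for - pts_against) / poss)

lineups
```

In this toy table lineup L1 plays two separate stints (rows 1 to 3 and row 6), which the lineup-level roll-up recombines.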
      

4) Shot Charts and Efficiency Maps

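With shot x/y coordinates in feet, you can render shot charts and hex-efficiency maps in ggplot2. The court helper below assumes x runs from -25 to 25 across the court, y = 0 is the baseline, and the rim is centered at (0, 5.25); rescale your feed's coordinates to match. `pbp_shots` stands for your shot-level data with `x`, `y`, and a 0/1 `shot_made` column.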


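draw_half_court <- function(col = "#2dd4bf"){
  list(
    # Half court: 50 ft wide, 47 ft from baseline to midcourt
    annotate("rect", xmin = -25, xmax = 25, ymin = 0, ymax = 47,
             fill = NA, color = col),
    # Rim: 18-inch diameter, centered 5.25 ft from the baseline
    ggforce::geom_circle(data = data.frame(x0 = 0, y0 = 5.25, r = 0.75),
                         aes(x0 = x0, y0 = y0, r = r),
                         color = col, inherit.aes = FALSE),
    # Backboard: 6 ft wide, 4 ft from the baseline
    annotate("segment", x = -3, xend = 3, y = 4, yend = 4, color = col),
    coord_fixed()  # keep feet on both axes to the same scale
  )
}

ggplot(pbp_shots, aes(x = x, y = y)) +
  draw_half_court() +
  geom_point(aes(color = factor(shot_made)))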
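Before binning into hexagons, a coarser zone table is often enough to see where a team's efficient shots come from. This sketch assumes the same coordinate frame as the court helper (rim at (0, 5.25)), a 0/1 `shot_made`, and an `is_three` flag like the one `clean_pbp()` creates. The zone cutoffs (4 ft for the rim, 14 ft for short mid-range) are illustrative choices, not a standard, and the shot rows are made up.

```r
library(dplyr)

# Classify shots into coarse zones by distance from the rim (feet)
shot_zones <- function(shots) {
  shots %>%
    mutate(
      dist = sqrt(x^2 + (y - 5.25)^2),
      zone = case_when(
        is_three   ~ "three",
        dist <= 4  ~ "rim",
        dist <= 14 ~ "short mid",
        TRUE       ~ "long two"
      ),
      pts = shot_made * ifelse(is_three, 3, 2)
    ) %>%
    group_by(zone) %>%
    summarise(fga = n(), fg_pct = mean(shot_made),
              pts_per_shot = mean(pts), .groups = "drop")
}

# Toy shots (made-up coordinates for illustration only)
shots <- tibble(
  x         = c(0, 1, -8, 10, 22, -22),
  y         = c(6, 5, 15, 20, 8, 8),
  shot_made = c(1, 1, 0, 1, 1, 0),
  is_three  = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)
z <- shot_zones(shots)
z
```

Points per shot puts twos and threes on one scale, which is what an efficiency map ultimately colors.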
      

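5) Pace, Four Factors, and Possessions

Box-score totals become comparable across teams once they are normalized by possessions. The function below estimates possessions as the average of the team's and the opponent's FGA + 0.44 * FTA + TOV - ORB, then computes pace (possessions per 40 minutes) and Dean Oliver's Four Factors: effective field-goal percentage (eFG), turnover rate (TOVR), offensive rebounding rate (ORR), and free-throw rate (FTR).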

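# team_box: one row per team-game with the team's and opponent's totals.
# MinutesPlayed is game length in minutes (40 in regulation, 45 with one OT);
# if your box score sums player minutes (200), divide it by 5 first.
four_factors <- function(team_box){
  team_box %>%
    mutate(
      # Possession estimate averaged over both teams
      poss = 0.5 * ((FGA + 0.44 * FTA + TOV - ORB) +
                    (OppFGA + 0.44 * OppFTA + OppTOV - OppORB)),
      pace = 40 * poss / MinutesPlayed,   # possessions per 40 minutes
      eFG  = (FGM + 0.5 * `3PM`) / FGA,   # effective FG%
      TOVR = TOV / poss,                  # turnovers per possession
      ORR  = ORB / (ORB + OppDRB),        # offensive rebound rate
      FTR  = FTA / FGA                    # free-throw rate
    )
}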
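As a worked check of the possession formula, this sketch plugs in one made-up game (the numbers are illustrative, not real data) and converts points into offensive, defensive, and net rating per 100 possessions, the same scale used for lineups in section 3.

```r
# Hypothetical single-game totals (illustrative numbers only)
team <- list(FGA = 60, FTA = 20, TOV = 12, ORB = 10, PTS = 75)
opp  <- list(FGA = 58, FTA = 18, TOV = 14, ORB = 9,  PTS = 68)

# Possession estimate for one side: FGA + 0.44 * FTA + TOV - ORB
side_poss <- function(s) s$FGA + 0.44 * s$FTA + s$TOV - s$ORB

poss    <- 0.5 * (side_poss(team) + side_poss(opp))  # (70.8 + 70.92) / 2 = 70.86
off_rtg <- 100 * team$PTS / poss                     # points scored per 100
def_rtg <- 100 * opp$PTS / poss                      # points allowed per 100
net_rtg <- off_rtg - def_rtg

round(c(poss = poss, off = off_rtg, def = def_rtg, net = net_rtg), 1)
```

A 7-point win over roughly 71 possessions works out to a net rating near +10, which is why per-100 figures are easier to compare than raw margins across fast and slow games.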
      

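6) Win Probability and In-Game Leverage

A simple win-probability (WP) model regresses the final result on the game state. Here `states` is assumed to have one row per game state with `win` (1 if the team of interest won), `margin` (its lead), `possession` (1 if it has the ball), and `sec_left` measured to the end of the game, not the per-period clock that clean_pbp() produces.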

# margin:sec_left lets a lead matter more as time runs out
wp_fit <- glm(win ~ margin + possession + poly(sec_left, 2) + margin:sec_left,
              data = states, family = binomial())

states <- states %>%
  mutate(wp = predict(wp_fit, newdata = states, type = "response"))
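A WP model becomes a leverage tool once you ask how much a single scoring play could move it. In this minimal sketch the game states are simulated purely so the example runs (the data-generating formula is an assumption), and leverage is measured as the WP swing between a made two and an empty possession from the same state. It swaps in a time-scaled margin, `margin / sqrt(sec_left / 60 + 1)`, one common way to let a lead count for more late in games.

```r
set.seed(42)

# Synthetic game states (simulated only to make the sketch runnable)
n <- 20000
sim <- data.frame(
  margin     = round(rnorm(n, 0, 8)),
  possession = rbinom(n, 1, 0.5),
  sec_left   = runif(n, 0, 2400)
)
# Assumed data-generating process: a lead matters more as time runs out
sim$win <- rbinom(n, 1, plogis(0.35 * sim$margin / sqrt(sim$sec_left / 60 + 1) +
                                 0.2 * sim$possession))

wp_fit <- glm(win ~ I(margin / sqrt(sec_left / 60 + 1)) + possession,
              data = sim, family = binomial())

# Leverage: WP after a made two minus WP after an empty possession.
# Both outcomes hand the ball to the opponent; offensive rebounds and
# clock runoff during the possession are ignored.
leverage <- function(fit, margin, sec_left) {
  made  <- data.frame(margin = margin + 2, possession = 0, sec_left = sec_left)
  empty <- data.frame(margin = margin,     possession = 0, sec_left = sec_left)
  predict(fit, newdata = made, type = "response") -
    predict(fit, newdata = empty, type = "response")
}

late  <- leverage(wp_fit, margin = 0, sec_left = 30)
early <- leverage(wp_fit, margin = 0, sec_left = 2000)
c(late = unname(late), early = unname(early))
```

The same function applied to every row of real states would flag the possessions where timeouts, fouling decisions, and lineup choices carry the most weight.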
      

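7) Scouting Reports with Reproducible R Markdown

Quarto, the successor to R Markdown, supports the same parameterized-report workflow: declare `params` in the YAML header, reference them in code as `params$team`, and re-render one template per opponent.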

---
# Example YAML header (Quarto)
title: "Opponent Scouting Report"
params:
  team: "Example University"
  date_from: "2025-01-01"
  date_to: "2025-02-01"
format: pdf
---
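Inside the report, the parameters arrive as the list `params`. This sketch stands in for that list and for a hypothetical `games` table (columns `team`, `date`, `pts_for`, `pts_against`, with made-up rows) so it runs on its own. The commented loop shows one way to render the template per opponent, assuming the quarto R package and the Quarto CLI are installed and the template is saved as `scouting.qmd` (a placeholder name).

```r
library(dplyr)

# Stand-in for the `params` list Quarto provides when rendering
params <- list(team = "Example University",
               date_from = "2025-01-01", date_to = "2025-02-01")

# Hypothetical game log (made-up rows for illustration)
games <- tibble(
  team        = c("Example University", "Example University", "Other College"),
  date        = as.Date(c("2024-12-28", "2025-01-15", "2025-01-20")),
  pts_for     = c(70, 81, 65),
  pts_against = c(66, 74, 60)
)

# Keep only the scouted team's games inside the report window
report_games <- games %>%
  filter(team == params$team,
         date >= as.Date(params$date_from),
         date <= as.Date(params$date_to))

report_games %>%
  summarise(games = n(), avg_margin = mean(pts_for - pts_against))

# Rendering one report per opponent (run outside the document):
# for (tm in c("Example University", "Other College")) {
#   quarto::quarto_render("scouting.qmd",
#                         execute_params = list(team = tm),
#                         output_file = paste0("scouting-", tm, ".pdf"))
# }
```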
      

8) Where to Go Next

If you enjoyed this tutorial, check out the full resource here:

Basketball Analytics with R: From hoopR Data to Winning Insights
