worldfootballR: The Complete Guide for Soccer Data in R

worldfootballR Guide (2026): FBref, Understat & Transfermarkt in R

A practical, code-first guide to collecting soccer data, building tidy datasets, and producing shareable analysis in R using worldfootballR.


Want the full end-to-end project template (clean folders, reusable functions, and advanced modeling examples)?

📘 Mastering Football Data with worldfootballR — practical pipelines with examples from FBref, Transfermarkt, and Understat.

What is worldfootballR?

worldfootballR is an open-source R package that helps you collect and analyze soccer data from popular sources like FBref, Transfermarkt, and Understat. Instead of manually copying tables or scraping pages by hand, you can use a consistent set of functions to pull structured data and immediately start analysis.

If you’re doing sports analytics in R—player scouting, team profiling, match modeling, xG-based analysis—this package can save you hours.

Install & Setup

You can install from CRAN (stable) or GitHub (latest development version).

Install from CRAN

install.packages("worldfootballR")

Install the latest version from GitHub

install.packages("devtools")
devtools::install_github("JaseZiv/worldfootballR")

Load the package

library(worldfootballR)

Tip: If you’re starting fresh, install a small “analytics toolkit” too:

install.packages(c("dplyr","tidyr","purrr","stringr","readr","ggplot2","janitor"))

Data Sources: FBref vs Transfermarkt vs Understat

worldfootballR supports different sources. Here’s how to think about them:

  • FBref: rich tables for teams/players, season stats, match logs, advanced metrics.
  • Transfermarkt: squads, player market values, transfer histories, staff details.
  • Understat: expected goals (xG), shot-level data, shot locations, league and team data.

Most practical workflows combine sources: FBref for structured season stats, Understat for xG/shot detail, and Transfermarkt for market context.

A Reproducible Workflow (Recommended Project Structure)

The biggest difference between “a script that works once” and “analysis you can trust” is a reproducible workflow. Here’s a simple structure you can use:

soccer-project/
  data_raw/
  data_clean/
  R/
    01_download.R
    02_clean.R
    03_visualize.R
  outputs/
  README.md

The key ideas: cache data, clean consistently, and separate download from analysis.
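As a minimal sketch, the folder layout above could be created from R itself. The `project_root` path is a placeholder (a temporary directory here, so the example has no side effects); point it at your own location in practice.

```r
# Create the project skeleton shown in the tree above.
# `project_root` is a placeholder path chosen for this example.
project_root <- file.path(tempdir(), "soccer-project")

subdirs <- c("data_raw", "data_clean", "R", "outputs")
for (d in subdirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Empty script stubs, one per pipeline stage, plus a README
scripts <- c("01_download.R", "02_clean.R", "03_visualize.R")
invisible(file.create(file.path(project_root, "R", scripts)))
invisible(file.create(file.path(project_root, "README.md")))

# Inspect what was created
list.files(project_root, recursive = TRUE, include.dirs = TRUE)
```

Keeping download, cleaning, and plotting in separate numbered scripts means you can re-run the later stages without touching the network.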

FBref: Team & Player Stats

Example: download Premier League team shooting stats for the 2024/25 season (season_end_year = 2025).

library(worldfootballR)
library(dplyr)

epl_shooting <- fb_season_team_stats(
  country = "ENG",
  gender = "M",
  season_end_year = 2025,
  tier = "1st",
  stat_type = "shooting"
)

dplyr::glimpse(epl_shooting)
head(epl_shooting)

FBref outputs are usually already table-like, but you’ll still want to standardize names and types before modeling or plotting. We’ll do that in the Clean & Tidy section.

Transfermarkt: Squads & Transfers

Example: get player URLs for a team and fetch transfer history.

library(worldfootballR)

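# Collect the profile URL of every player in the squad
team_players <- tm_team_player_urls(
  "https://www.transfermarkt.com/fc-bayern-munchen/startseite/verein/27"
)

# Each player URL is looked up separately, so a full squad takes a while.
# While testing, pass a subset such as head(team_players, 5).
transfers <- tm_player_transfer_history(player_urls = team_players)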

head(transfers)

Transfermarkt data is great for contextual features: player age, market value, transfer fees (where available), squad churn, and career moves that might affect performance.

Understat: xG, Shots, and Match Data

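Example: download EPL shots for the 2024/25 season (season_start_year = 2024).

library(worldfootballR)

# understat_league_season_shots() accepts a season.
# load_understat_league_shots() takes only `league` and returns all available seasons.
epl_shots <- understat_league_season_shots(
  league = "EPL",
  season_start_year = 2024
)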

dplyr::glimpse(epl_shots)
head(epl_shots)

Shot-level data enables stronger analysis: xG trends, shot maps, finishing vs expected, and features for match prediction.
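To make "finishing vs expected" concrete, here is a small sketch on a toy data frame. The column names `player`, `result`, and `xG`, and the value "Goal", are assumptions about how the shot table is laid out; run names(epl_shots) and adjust before applying this to real data. It uses base R only, so it runs without extra packages.

```r
# Toy shot table standing in for a real download (column names are assumed)
shots <- data.frame(
  player = c("Player A", "Player A", "Player A", "Player B", "Player B"),
  result = c("Goal", "MissedShots", "Goal", "SavedShot", "Goal"),
  xG     = c(0.30, 0.10, 0.45, 0.20, 0.05),
  stringsAsFactors = FALSE
)

# Flag goals and make sure xG is numeric (harmless if it already is)
shots$goal <- as.integer(shots$result == "Goal")
shots$xG   <- as.numeric(shots$xG)

# Sum goals and xG per player, then add the shot count
finishing <- aggregate(cbind(goal, xG) ~ player, data = shots, FUN = sum)
finishing$shots <- as.vector(table(shots$player)[finishing$player])

# Positive values mean more goals than the shots were "worth"
finishing$goals_minus_xg <- finishing$goal - finishing$xG

finishing[order(-finishing$goals_minus_xg), ]
```

With only a handful of shots per player this difference is mostly noise, so treat it as a descriptive summary rather than a measure of skill.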

Clean & Tidy Your Data

Even when data looks “ready,” it’s worth standardizing: consistent names, numeric types, missing values, and team identifiers.

library(dplyr)
library(janitor)

epl_shooting_clean <- epl_shooting %>%
  janitor::clean_names() %>%
  mutate(across(where(is.character), ~ trimws(.)))

If you plan to join multiple datasets (FBref + Understat + Transfermarkt), decide early how you will match teams and players (names, IDs, URLs). Consistent keys prevent headaches later.
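One possible way to build such keys is sketched below. The helper `make_team_key()` and the alias table are hypothetical (they are not part of worldfootballR), and the two small data frames are invented stand-ins for real downloads. The idea is to normalize names and then apply a hand-maintained alias lookup for the cases that normalization cannot catch.

```r
# Hypothetical helper: turn a display name into a normalized join key
make_team_key <- function(x, aliases = character()) {
  key <- tolower(trimws(x))
  key <- gsub("[[:punct:]]", " ", key)                    # drop punctuation
  key <- gsub("\\b(fc|afc)\\b", " ", key, perl = TRUE)    # drop common club tags
  key <- gsub("\\s+", " ", trimws(key))                   # collapse whitespace
  hit <- key %in% names(aliases)
  key[hit] <- unname(aliases[key[hit]])                   # apply manual aliases
  key
}

# Invented example tables with differently spelled team names
source_a <- data.frame(squad = c("Arsenal", "Manchester Utd"), shots = c(15, 12))
source_b <- data.frame(team = c("Arsenal FC", "Manchester United"), xg = c(1.9, 1.4))

# Hand-maintained alias map: normalized variant -> canonical key
aliases <- c("manchester utd" = "manchester united")

source_a$team_key <- make_team_key(source_a$squad, aliases)
source_b$team_key <- make_team_key(source_b$team, aliases)

# A full join keeps unmatched rows visible so naming gaps are easy to spot
joined <- merge(source_a, source_b, by = "team_key", all = TRUE)
joined
```

After a join like this, rows with missing values on either side point to names that still need an alias entry.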

Visualize Insights (ggplot2 + optional ggsoccer)

Example: a simple team ranking plot (adjust column names to your table).

library(ggplot2)
library(dplyr)

# Replace 'shots_total' with a numeric column that exists in your table:
# names(epl_shooting_clean)

# ggplot(epl_shooting_clean, aes(x = reorder(squad, shots_total), y = shots_total)) +
#   geom_col() +
#   coord_flip() +
#   labs(title = "Top Teams by Shots (EPL 2024/25)", x = "", y = "Shots")

ggsoccer is optional. If you want pitch plots such as shot maps, add it on top of ggplot2; everything else in this guide needs only worldfootballR and the packages loaded above.

Mini Project: Build an EPL Team Snapshot

This mini workflow downloads FBref team shooting stats, cleans them, and creates a compact “team snapshot” table. It’s a realistic deliverable for analysts.

library(worldfootballR)
library(dplyr)
library(janitor)

# 1) Download
epl_shooting <- fb_season_team_stats(
  country = "ENG",
  gender = "M",
  season_end_year = 2025,
  tier = "1st",
  stat_type = "shooting"
)

# 2) Clean
df <- epl_shooting %>%
  clean_names() %>%
  mutate(across(where(is.character), ~ trimws(.)))

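# 3) Snapshot (adapt columns to what you have)
# Run names(df) to see available columns. After clean_names(), a source column
# such as xG_Expected becomes x_g_expected, so the pattern covers both spellings.
snapshot <- df %>%
  select(squad, matches("shots|^sh_|so_t|xg|x_g|goals|gls")) %>%
  head(10)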

snapshot

From here you can extend into rolling trends, opponent-adjusted metrics, and match prediction pipelines.

Best Practices (Caching, Rate Limits, Reliability)

  • Respect rate limits: add pauses between requests; avoid aggressive scraping.
  • Cache locally: save raw pulls to disk so you don’t re-download every time.
  • Expect HTML changes: scraping-based tools can break when sites change layouts.
  • Separate download vs analysis: it makes debugging and reproducibility easier.
  • Document your versions: keep session info and package versions for long projects.

# Simple caching idea:
# saveRDS(epl_shooting, "data_raw/epl_shooting_2024_25.rds")
# epl_shooting <- readRDS("data_raw/epl_shooting_2024_25.rds")
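The caching idea can be wrapped in a small helper so every download goes through the same path. This is a sketch, not part of worldfootballR: `cached_pull()` and `fake_fetch()` are hypothetical names, and the fake fetch function stands in for a real call such as fb_season_team_stats(...) so the example runs offline.

```r
# Hypothetical helper: read from disk if cached, otherwise fetch and save
cached_pull <- function(path, fetch_fn, refresh = FALSE) {
  if (file.exists(path) && !refresh) {
    return(readRDS(path))
  }
  data <- fetch_fn()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(data, path)
  data
}

# Stand-in for a real download; counts how often it is called
n_calls <- 0
fake_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(squad = c("Team A", "Team B"), shots = c(10, 12))
}

cache_file <- file.path(tempdir(), "data_raw", "epl_shooting_2024_25.rds")
unlink(cache_file)  # start from a clean state for this demo

first  <- cached_pull(cache_file, fake_fetch)  # fetches and writes the file
second <- cached_pull(cache_file, fake_fetch)  # reads from disk, no new request
n_calls
```

Passing refresh = TRUE forces a new download, which is useful mid-season when the source tables are still changing.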

Troubleshooting

1) “Function not found” or errors after install

  • Restart the R session, then run library(worldfootballR) again.
  • Update packages: update.packages(ask = FALSE).
  • Try the GitHub version if CRAN is behind.

2) Empty outputs / missing seasons

  • Double-check season_end_year and league tier.
  • Try a known completed season first.
  • Sources sometimes change table availability mid-season.
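
A cheap guard against silently empty pulls is to validate each result before saving it. The helper below is a hypothetical sketch; the column names in the example are illustrative, so substitute the ones your table should contain.

```r
# Hypothetical helper: fail loudly on empty results or missing columns
check_pull <- function(df, required_cols = character(), label = "download") {
  if (!is.data.frame(df) || nrow(df) == 0) {
    stop(label, ": returned no rows; check season, tier, and source availability.",
         call. = FALSE)
  }
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop(label, ": missing expected columns: ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# A well-formed toy result passes through unchanged
ok <- data.frame(Squad = "Team A", Season_End_Year = 2025)
check_pull(ok, required_cols = c("Squad", "Season_End_Year"))

# An empty result raises an error; capture the message to show what it looks like
empty_msg <- tryCatch(
  check_pull(ok[0, ], label = "epl_shooting"),
  error = function(e) conditionMessage(e)
)
empty_msg
```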

3) Rate limit / blocked requests

  • Slow down requests and cache results.
  • Avoid large loops without delays.
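
As a sketch of a loop with delays, the helper below pauses between requests and skips failures instead of aborting the whole run. `pull_politely()` and `fake_player_fetch()` are hypothetical names, the URLs are placeholders, and the default delay is an arbitrary starting point rather than a documented limit for any site.

```r
# Hypothetical helper: fetch a set of URLs one by one with a pause in between
pull_politely <- function(urls, fetch_fn, delay_sec = 5) {
  results <- vector("list", length(urls))
  names(results) <- urls
  for (i in seq_along(urls)) {
    res <- tryCatch(
      fetch_fn(urls[[i]]),
      error = function(e) {
        message("Skipping ", urls[[i]], ": ", conditionMessage(e))
        NULL
      }
    )
    # Only store successes; assigning NULL with [[<- would drop the slot
    if (!is.null(res)) results[[i]] <- res
    if (i < length(urls)) Sys.sleep(delay_sec)
  }
  # Keep only the requests that succeeded
  results[!vapply(results, is.null, logical(1))]
}

# Stand-in for a real scraper call so the sketch runs offline
fake_player_fetch <- function(url) {
  if (grepl("broken", url)) stop("request failed")
  data.frame(url = url, n_rows = 1)
}

urls <- c(
  "https://example.com/player-1",
  "https://example.com/broken",
  "https://example.com/player-3"
)

pulled   <- pull_politely(urls, fake_player_fetch, delay_sec = 0.1)
combined <- do.call(rbind, pulled)
combined
```

Combining this with a local cache means a failed run can be resumed without repeating requests that already succeeded.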

FAQ

Is worldfootballR on CRAN?

Yes. You can install worldfootballR from CRAN. For the newest features, use the GitHub version.

What’s the best data source for modeling?

FBref is great for structured season stats; Understat is best for xG and shot-level detail. Many projects combine both.

Can I use this for betting models?

You can use the data to build predictive models, but outcomes are uncertain and data sources can change. Focus on reproducible evaluation (backtesting), and respect each site’s terms.

How do I match teams/players across sources?

Create consistent keys early (team names + season + league) and be careful with naming variations. When possible, use stable identifiers such as URLs.

Next Steps

If you want to go beyond “data pulling” into real analytics projects, do this next:

  1. Build a reproducible pipeline (raw → clean → outputs).
  2. Create 2–3 reusable functions (download, clean, plot).
  3. Add one simple baseline model and evaluate it properly.
  4. Then iterate: features, priors (Bayesian), and backtesting.

📘 If you want a structured, step-by-step playbook: Mastering Football Data with worldfootballR.


Last updated: 2026. If a source changes its site structure, some functions may need updates.