The healthcare sector is undergoing a data revolution. Adoption of electronic health records grew from roughly 9 % of U.S. hospitals in 2008 to over 75 % by 2014, while advances in wearables, genomics and digital therapeutics generate vast streams of medical information. Yet most resources for analysing these data are locked behind expensive courses or focus on SAS or Python. This book fills that gap by teaching readers how to harness the full power of the R programming language for medical data science.
Beginning with an overview of healthcare data types (clinical records, clinical trials, sensor outputs, multi‑omics and claims), the text shows how to import, clean and wrangle data with the tidyverse. It covers handling missing values with multivariate imputation by chained equations: the mice package can impute mixtures of continuous, binary and categorical variables and offers diagnostic plots to check imputation quality. Exploratory visualisation with ggplot2 and Shiny leads naturally into statistical modelling. The book introduces regression, hypothesis testing, causal inference and advanced multivariate methods, and then delves into survival analysis with the survival package, a cornerstone for time‑to‑event data. Extensions include parametric models, competing‑risk and multi‑state models, and joint models linking longitudinal biomarker trajectories with event outcomes.
The machine‑learning section demonstrates how to build classification and regression models using tidymodels, including random forests and gradient boosting for survival outcomes. Epidemiological modelling chapters explain how to estimate time‑varying reproduction numbers using EpiEstim and how to simulate infectious‑disease dynamics with differential‑equation solvers. Real‑world case studies illustrate applications such as predicting hospital readmission risk, analysing oncology survival data and forecasting epidemic curves. Privacy and ethics receive dedicated attention: the book describes de‑identification practices like those applied to the MIMIC‑III database, from which all eighteen HIPAA identifier categories were removed and in which dates were shifted while preserving clinical intervals. It also discusses regulatory frameworks (HIPAA, GDPR) and the importance of fairness in AI.
This comprehensive guide is designed for clinicians, public‑health researchers, data scientists and students who want to turn raw clinical data into evidence‑based decisions. With clear explanations, reproducible R code and curated datasets, it bridges the gap between statistics, machine learning and medical practice.
Pages: 38





