R is a versatile programming language designed for statistical computing and graphics. It can act as a calculator, provide numerical and graphical summaries of data and handle a variety of specific analyses. Whether you’re exploring data, running statistical tests or building predictive models, R offers built-in functions and packages to streamline the workflow. The summary() function, for example, gives a quick overview of each variable’s distribution, and the lm() function fits linear models and returns coefficients and diagnostics. The following sections present 200 practical prompts and code snippets organized by topic, from basic data operations to advanced modeling and visualization.
Data Input and Output
Prompt 1: Read a CSV file
Use `read.csv()` to import a comma‑separated values file and store it in a data frame.
data <- read.csv('path/to/file.csv', header = TRUE, stringsAsFactors = FALSE)
Prompt 2: Read an Excel file
Load spreadsheet data from Excel files with the **readxl** package.
library(readxl)
data <- read_excel('path/to/file.xlsx', sheet = 1)
Prompt 3: Import data from a remote CSV
Read a CSV file from a web URL directly without downloading it first.
url <- 'https://example.com/data.csv'
data <- read.csv(url, header = TRUE)
Prompt 4: Read a JSON file
Parse JSON data into an R list or data frame using **jsonlite**.
library(jsonlite)
json_data <- fromJSON('data.json')
data <- as.data.frame(json_data)
Prompt 5: Read data from a SQL database
Connect to an SQLite database and query data via **DBI** and **RSQLite**.
library(DBI)
con <- dbConnect(RSQLite::SQLite(), 'database.sqlite')
query <- 'SELECT * FROM my_table'
data <- dbGetQuery(con, query)
dbDisconnect(con)
Prompt 6: Read and write RDS files
Load and save objects in R’s native serialized format with `saveRDS()` and `readRDS()`.
saveRDS(data, 'mydata.rds')
loaded_data <- readRDS('mydata.rds')
Prompt 7: Write a data frame to CSV
Export a data frame to a CSV file using `write.csv()`.
write.csv(data, 'output.csv', row.names = FALSE)
Prompt 8: Write data to Excel
Save a data frame to an Excel workbook with **writexl**.
library(writexl)
write_xlsx(list(Sheet1 = data), 'output.xlsx')
Prompt 9: Save workspace to an .RData file
Persist your current R environment by saving all objects to a file.
save.image('workspace.RData')
Prompt 10: Load a .RData file
Restore a previously saved R workspace.
load('workspace.RData')
Data Cleaning
Prompt 11: Inspect data structure
Use `str()` to display the internal structure of an R object and see its variables and types.
str(data)
Prompt 12: Identify missing values
Find missing (NA) values in your dataset using `is.na()` and count them.
sum(is.na(data))
Prompt 13: Replace missing values with the mean
Impute missing numeric values by replacing them with the column mean.
data$variable[is.na(data$variable)] <- mean(data$variable, na.rm = TRUE)
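As a worked illustration of the same idea, the sketch below imputes every numeric column of a small data frame. The data frame `df` and its columns are made up for the example; non-numeric columns are left untouched.

```r
# Hypothetical toy data: one numeric column with gaps, one character column
df <- data.frame(
  score = c(10, NA, 30, NA, 50),
  label = c('a', 'b', NA, 'd', 'e')
)

# Names of the numeric columns only
num_cols <- names(df)[sapply(df, is.numeric)]

# Replace each NA with the mean of the observed values in that column
for (col in num_cols) {
  col_mean <- mean(df[[col]], na.rm = TRUE)
  df[[col]][is.na(df[[col]])] <- col_mean
}

df$score  # 10 30 30 30 50
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so it is best treated as a quick fix rather than a general solution.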
Prompt 14: Remove rows with missing values
Omit any rows containing missing values using `na.omit()`.
clean_data <- na.omit(data)
Prompt 15: Convert characters to factors
Transform character columns into factors for categorical analysis.
data$category <- as.factor(data$category)
Prompt 16: Rename columns
Change column names using `names()` or `dplyr::rename()`.
names(data)[names(data) == 'old_name'] <- 'new_name'
# Or with dplyr
library(dplyr)
data <- rename(data, new_name = old_name)
Prompt 17: Reorder columns
Rearrange the order of columns in a data frame.
data <- data[, c('col3', 'col1', 'col2', setdiff(names(data), c('col1','col2','col3')))]
Prompt 18: Filter rows based on a condition
Select only the rows that meet a logical criterion.
subset_data <- subset(data, variable > 10)
Prompt 19: Sort data frame by a column
Order data by ascending or descending values using `order()`.
sorted_data <- data[order(data$variable, decreasing = FALSE), ]
Prompt 20: Remove duplicate rows
Keep only unique rows using `unique()` or `distinct()`.
unique_data <- unique(data)
# Or with dplyr
library(dplyr)
unique_data <- distinct(data)
Data Transformation with dplyr
Prompt 21: Select specific columns
Use `select()` to choose a subset of columns from a data frame.
library(dplyr)
subset_data <- select(data, column1, column2, column3)
Prompt 22: Filter rows by criteria
Extract rows that satisfy given conditions using `filter()`.
filtered_data <- filter(data, column1 == 'A', column2 > 5)
Prompt 23: Mutate new columns
Create new variables derived from existing columns using `mutate()`.
library(dplyr)
data <- mutate(data, ratio = column1 / column2, log_value = log(column3))
Prompt 24: Summarize data by groups
Group data and compute summary statistics with `group_by()` and `summarise()`.
library(dplyr)
summ <- data %>% group_by(category) %>% summarise(mean_value = mean(value, na.rm = TRUE), count = n())
Prompt 25: Group by multiple variables
Group data by more than one variable for multi‑level summaries.
group_summary <- data %>% group_by(category, subgroup) %>% summarise(total = sum(value), .groups = 'drop')  # avoids masking summary() and drops the leftover grouping
Prompt 26: Arrange data
Sort data within the pipeline using `arrange()`.
sorted <- data %>% arrange(desc(value))
Prompt 27: Count occurrences
Tabulate the number of observations per category with `count()`.
library(dplyr)
counts <- data %>% count(category)
Prompt 28: Select distinct rows
Extract unique rows for specified columns using `distinct()`.
unique_rows <- data %>% distinct(category, value)
Prompt 29: Join two data frames
Perform an inner join on two tables with common keys using `inner_join()`.
merged_data <- inner_join(df1, df2, by = 'id')
Prompt 30: Pivot data longer or wider
Reshape data using `pivot_longer()` and `pivot_wider()` from **tidyr**.
library(tidyr)
long <- pivot_longer(data, cols = starts_with('Q'), names_to = 'question', values_to = 'score')
wide <- pivot_wider(long, names_from = question, values_from = score)
Exploratory Data Analysis
Prompt 31: Compute summary statistics
Get an overview of each variable with the `summary()` function, which reports minimum, quartiles, median, mean and maximum.
summary(data)
Prompt 32: Compute quantiles
Calculate specific quantiles (e.g., 25th and 75th percentiles) using `quantile()`.
quantiles <- quantile(data$variable, probs = c(0.25, 0.5, 0.75))
Prompt 33: Create a frequency table
Tabulate counts of unique values in a vector using `table()`.
freq <- table(data$category)
Prompt 34: Create a cross‑tabulation
Produce contingency tables for two categorical variables with `xtabs()` or `table()`.
crosstab <- table(data$category, data$group)
Prompt 35: Compute a correlation matrix
Calculate pairwise correlations for numeric variables using `cor()`.
cor_matrix <- cor(data[, sapply(data, is.numeric)], use = 'complete.obs')
Prompt 36: Compute covariance matrix
Assess covariance between variables with `cov()`.
cov_matrix <- cov(data[, sapply(data, is.numeric)], use = 'complete.obs')
Prompt 37: Plot a histogram
Visualize the distribution of a continuous variable with `hist()`.
hist(data$variable, breaks = 30, col = 'steelblue', main = 'Histogram', xlab = 'Value')
Prompt 38: Create a boxplot
Display the distribution and outliers of a variable using `boxplot()`.
boxplot(data$variable ~ data$group, main = 'Boxplot', xlab = 'Group', ylab = 'Variable')
Prompt 39: Create a scatter plot
Plot two numeric variables against each other with `plot()`.
plot(data$variable1, data$variable2, main = 'Scatter Plot', xlab = 'Variable 1', ylab = 'Variable 2', pch = 19)
Prompt 40: Make a pairwise scatterplot matrix
Explore relationships between multiple variables using `pairs()`.
pairs(data[, 1:4], main = 'Pairs Plot')
Basic Statistics
Prompt 41: Compute the mean
Calculate the arithmetic mean of a numeric vector with `mean()`.
avg <- mean(data$variable, na.rm = TRUE)
Prompt 42: Compute the median
Find the median (50th percentile) of a numeric vector.
med <- median(data$variable, na.rm = TRUE)
Prompt 43: Compute the standard deviation
Measure the spread of values around the mean using `sd()`.
std_dev <- sd(data$variable, na.rm = TRUE)
Prompt 44: Compute the variance
Compute the sample variance of a numeric vector using `var()`.
var_value <- var(data$variable, na.rm = TRUE)
Prompt 45: Compute the range
Get the minimum and maximum values with `range()`.
range_values <- range(data$variable, na.rm = TRUE)
Prompt 46: Compute the interquartile range
Calculate the IQR (difference between 75th and 25th percentiles) using `IQR()`.
iqr_value <- IQR(data$variable, na.rm = TRUE)
Prompt 47: Generate a random sample from a normal distribution
Create a vector of random numbers drawn from a normal distribution using `rnorm()`.
set.seed(123)
random_values <- rnorm(100, mean = 0, sd = 1)
Prompt 48: Compute summary of an entire data frame
Summarize all variables at once using `summary()`.
summary_stats <- summary(data)
Prompt 49: Compute covariance matrix
Calculate variances and covariances for numeric variables using `cov()`.
cov_mat <- cov(data[, sapply(data, is.numeric)])
Prompt 50: Compute correlation coefficient
Measure linear relationships between two variables using `cor()`.
correlation <- cor(data$variable1, data$variable2, use = 'complete.obs')
Statistical Tests
Prompt 51: One‑sample t‑test
Test whether the mean of a sample differs from a hypothesized value.
t.test(data$variable, mu = 0)
Prompt 52: Two‑sample t‑test
Compare means of two independent groups using a two‑sample t‑test.
t.test(variable ~ group, data = data)
Prompt 53: Paired t‑test
Compare means of paired observations, such as before‑after measurements.
t.test(data$pre, data$post, paired = TRUE)
Prompt 54: Chi‑square test of independence
Assess the association between two categorical variables.
chisq.test(table(data$category, data$group))
Prompt 55: Shapiro–Wilk normality test
Check normality of a numeric variable using `shapiro.test()`.
shapiro.test(data$variable)
Prompt 56: Correlation test
Test whether the correlation coefficient differs from zero.
cor.test(data$variable1, data$variable2, method = 'pearson')
Prompt 57: One‑way ANOVA
Compare means across more than two groups using analysis of variance.
anova_result <- aov(variable ~ group, data = data)
summary(anova_result)
Prompt 58: Wilcoxon rank‑sum test
Perform a non‑parametric test to compare two independent samples.
wilcox.test(variable ~ group, data = data)
Prompt 59: Kruskal–Wallis test
Non‑parametric alternative to one‑way ANOVA for more than two groups.
kruskal.test(variable ~ group, data = data)
Prompt 60: Proportion test
Test equality of proportions for two samples using `prop.test()`.
prop.test(x = c(40, 50), n = c(100, 120))
Linear and Generalized Linear Models
Prompt 61: Fit a simple linear regression
Model the relationship between a response and a single predictor using `lm()`.
lm_fit <- lm(y ~ x, data = data)
summary(lm_fit)
Prompt 62: Fit a multiple linear regression
Include multiple predictors in a linear model.
lm_fit <- lm(y ~ x1 + x2 + x3, data = data)
summary(lm_fit)
Prompt 63: Fit a polynomial regression
Use higher‑order terms to capture non‑linear relationships.
poly_fit <- lm(y ~ poly(x, degree = 2, raw = TRUE), data = data)
summary(poly_fit)
Prompt 64: Fit a logistic regression
Model a binary response variable using `glm()` with the binomial family.
logit_fit <- glm(y ~ x1 + x2, data = data, family = binomial)
summary(logit_fit)
Prompt 65: Fit a Poisson regression
Model count data using `glm()` with the Poisson family.
pois_fit <- glm(count ~ x1 + offset(log(exposure)), data = data, family = poisson)
summary(pois_fit)
Prompt 66: Fit a linear model without an intercept
Suppress the intercept term by adding `0 +` in the formula.
lm_no_intercept <- lm(y ~ 0 + x1 + x2, data = data)
coef(lm_no_intercept)
Prompt 67: Extract coefficients from a model
Obtain estimated coefficients from an `lm` or `glm` object using `coef()`.
coefficients <- coef(lm_fit)
Prompt 68: Predict new values
Generate predictions on new data using `predict()`.
new_data <- data.frame(x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 6))  # must contain every predictor used in lm_fit
predictions <- predict(lm_fit, newdata = new_data)
Prompt 69: Plot diagnostic plots
Inspect model diagnostics such as residuals and fitted values using `plot(lm_fit)`.
par(mfrow = c(2, 2))
plot(lm_fit)
Prompt 70: Summarize model output
View summary statistics, coefficients and diagnostic metrics with `summary()`.
summary(lm_fit)
Advanced Modeling and Machine Learning
Prompt 71: Split data into training and testing sets
Randomly partition your dataset into training and testing subsets.
set.seed(42)
train_index <- sample(seq_len(nrow(data)), size = floor(0.7 * nrow(data)))
train <- data[train_index, ]
test <- data[-train_index, ]
Prompt 72: Perform k‑fold cross‑validation
Use the **caret** package to perform k‑fold cross‑validation when training models.
library(caret)
control <- trainControl(method = 'cv', number = 5)
cv_model <- train(y ~ ., data = data, method = 'lm', trControl = control)
Prompt 73: Fit a decision tree
Create a classification or regression tree using **rpart**.
library(rpart)
tree_model <- rpart(Species ~ ., data = iris, method = 'class')
printcp(tree_model)
Prompt 74: Fit a random forest
Train an ensemble of decision trees using **randomForest**.
library(randomForest)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 500)
print(rf_model)
Prompt 75: Fit a gradient boosting machine
Use the **xgboost** package for gradient boosting on numeric matrices.
library(xgboost)
# prepare matrices
label <- as.numeric(iris$Species) - 1
train_matrix <- xgb.DMatrix(data = as.matrix(iris[, -5]), label = label)
params <- list(objective = 'multi:softprob', num_class = 3)
model <- xgb.train(params, train_matrix, nrounds = 50)
Prompt 76: Fit a support vector machine
Use **e1071** or **kernlab** to build an SVM classifier.
library(e1071)
svm_model <- svm(Species ~ ., data = iris, kernel = 'radial')
summary(svm_model)
Prompt 77: Fit k‑nearest neighbors
Implement KNN classification via the **class** package or **caret**.
library(class)
k <- 3
train_x <- iris[ , -5]
train_y <- iris$Species
pred <- knn(train_x, train_x, train_y, k = k)
Prompt 78: Fit a Naive Bayes classifier
Use **e1071** to fit a naive Bayes classifier; numeric predictors, as in `iris`, are modelled as Gaussian within each class.
library(e1071)
nb_model <- naiveBayes(Species ~ ., data = iris)
nb_pred <- predict(nb_model, iris)
Prompt 79: Fit a ridge regression
Apply regularization to linear models using **glmnet** with alpha = 0 for ridge.
library(glmnet)
x <- model.matrix(y ~ ., data)[, -1]
y_vec <- data$y
ridge_fit <- cv.glmnet(x, y_vec, alpha = 0)
coef(ridge_fit, s = 'lambda.min')
Prompt 80: Fit a lasso regression
Use **glmnet** with alpha = 1 to perform lasso penalization.
lasso_fit <- cv.glmnet(x, y_vec, alpha = 1)
coef(lasso_fit, s = 'lambda.min')
Clustering and Unsupervised Learning
Prompt 81: Perform k‑means clustering
Partition observations into k clusters using `kmeans()`.
set.seed(123)
km <- kmeans(iris[, -5], centers = 3)
km$cluster
Prompt 82: Determine optimal number of clusters (elbow)
Plot total within‑cluster sum of squares for various k values to choose the optimal number.
wss <- sapply(1:10, function(k) {
kmeans(iris[, -5], centers = k, nstart = 10)$tot.withinss
})
plot(1:10, wss, type = 'b', pch = 19, frame = FALSE, xlab = 'k', ylab = 'Total Within Sum of Squares')
Prompt 83: Perform hierarchical clustering
Use `hclust()` on a distance matrix to build a dendrogram.
d <- dist(iris[, -5])
hc <- hclust(d, method = 'complete')
plot(hc, labels = iris$Species)
Prompt 84: Plot a dendrogram
Visualize hierarchical clustering results with a dendrogram.
plot(as.dendrogram(hc), main = 'Hierarchical Clustering Dendrogram')
Prompt 85: Standardize data before clustering
Scale variables to have mean 0 and unit variance using `scale()`.
scaled_data <- scale(iris[, -5])
km_scaled <- kmeans(scaled_data, centers = 3)
Prompt 86: Perform principal component analysis
Reduce dimensionality of numeric data using `prcomp()`.
pca <- prcomp(iris[, -5], scale. = TRUE)
summary(pca)
Prompt 87: Plot a PCA biplot
Visualize principal components and variable loadings.
biplot(pca, scale = 0, main = 'PCA Biplot')
Prompt 88: Perform t‑SNE
Apply t‑distributed stochastic neighbor embedding via **Rtsne** for high‑dimensional data.
library(Rtsne)
tsne_out <- Rtsne(as.matrix(iris[, -5]), dims = 2, perplexity = 30, check_duplicates = FALSE)  # iris contains duplicate rows, which Rtsne rejects by default
plot(tsne_out$Y, col = as.numeric(iris$Species), pch = 19)
Prompt 89: Perform DBSCAN clustering
Density‑based clustering with **dbscan** package.
library(dbscan)
cl <- dbscan(iris[, -5], eps = 0.5, minPts = 5)
cl$cluster
Prompt 90: Perform factor analysis
Identify latent variables influencing observed data using `factanal()`.
fa <- factanal(iris[, -5], factors = 1)  # four variables support at most one factor; two raises an error
fa
Time Series Analysis
Prompt 91: Create a time series object
Use `ts()` to convert a numeric vector into a time series object.
my_ts <- ts(data$variable, start = c(2020, 1), frequency = 12)
Prompt 92: Plot a time series
Visualize a time series object using `plot()`.
plot(my_ts, main = 'Time Series Plot', ylab = 'Value', xlab = 'Time')
Prompt 93: Decompose a time series
Break down a series into trend, seasonal and irregular components using `decompose()`.
components <- decompose(my_ts, type = 'additive')
plot(components)
Prompt 94: Check stationarity (ADF test)
Use the Augmented Dickey–Fuller test from **tseries** to test for stationarity.
library(tseries)
adf.test(my_ts)
Prompt 95: Fit an ARIMA model
Automatically select and fit an ARIMA model using **forecast**.
library(forecast)
fit <- auto.arima(my_ts)
fit
Prompt 96: Forecast future values
Predict future observations with the fitted model and plot the forecast.
fc <- forecast(fit, h = 12)
plot(fc)
Prompt 97: Plot forecast results
Visualize the predicted values and prediction intervals.
library(ggplot2)  # provides ggtitle()
autoplot(fc) + ggtitle('Forecast')
Prompt 98: Fit exponential smoothing (Holt–Winters)
Use `HoltWinters()` for exponential smoothing.
hw <- HoltWinters(my_ts)
plot(hw)
Prompt 99: Perform STL decomposition
Apply Seasonal-Trend decomposition using Loess.
stl_fit <- stl(my_ts, s.window = 'periodic')
plot(stl_fit)
Prompt 100: Evaluate forecast accuracy
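Compute accuracy metrics such as MAE and RMSE using `accuracy()`. Called on the forecast object alone, it reports training-set measures; pass held-out observations as a second argument to score the forecast itself.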
acc <- accuracy(fc)
acc
Data Visualization with ggplot2
Prompt 101: Create a scatter plot
Use `ggplot()` with `geom_point()` to visualize the relationship between two variables.
library(ggplot2)
ggplot(data, aes(x = variable1, y = variable2)) + geom_point() + ggtitle('Scatter Plot')
Prompt 102: Create a line plot
Plot time series or ordered data with `geom_line()`.
ggplot(data, aes(x = time, y = value)) + geom_line(color = 'blue') + labs(title = 'Line Plot', x = 'Time', y = 'Value')
Prompt 103: Create a bar chart
Represent categorical data as bars using `geom_bar()`.
ggplot(data, aes(x = category)) + geom_bar(fill = 'tomato') + labs(title = 'Bar Chart', x = 'Category', y = 'Count')
Prompt 104: Create a histogram
Visualize the distribution of a continuous variable with `geom_histogram()`.
ggplot(data, aes(x = variable)) + geom_histogram(binwidth = 1, fill = 'skyblue', color = 'black')
Prompt 105: Add facets
Use `facet_wrap()` or `facet_grid()` to create trellis plots.
ggplot(data, aes(x = value)) + geom_histogram(binwidth = 1) + facet_wrap(~ group) + labs(title = 'Faceted Histograms')
Prompt 106: Customize colors and themes
Apply a color palette and theme to improve plot aesthetics.
ggplot(data, aes(x = variable1, y = variable2, color = group)) + geom_point() + theme_minimal() + scale_color_brewer(palette = 'Set2')
Prompt 107: Add a smooth line
Overlay a regression or loess smooth using `geom_smooth()`.
ggplot(data, aes(x = x, y = y)) + geom_point() + geom_smooth(method = 'lm', se = FALSE) + labs(title = 'Scatter with Regression Line')
Prompt 108: Create a boxplot
Use `geom_boxplot()` to display distributions for multiple groups.
ggplot(data, aes(x = group, y = value)) + geom_boxplot(fill = 'lightgreen') + labs(title = 'Boxplot')
Prompt 109: Create a density plot
Plot a density estimate with `geom_density()`.
ggplot(data, aes(x = variable, fill = group)) + geom_density(alpha = 0.5) + labs(title = 'Density Plot')
Prompt 110: Save a plot to file
Export plots to PNG, PDF or other formats using `ggsave()`.
p <- ggplot(data, aes(x = variable1, y = variable2)) + geom_point()
ggsave('scatter_plot.png', p, width = 6, height = 4)
Model Evaluation and Validation
Prompt 111: Compute a confusion matrix
Use `table()` or **caret** to generate a confusion matrix for classification results.
pred <- predict(rf_model, iris)
conf_matrix <- table(Predicted = pred, Actual = iris$Species)
Prompt 112: Compute accuracy, precision and recall
Calculate evaluation metrics from the confusion matrix.
tp <- conf_matrix[1, 1]; fp <- sum(conf_matrix[1, -1]); fn <- sum(conf_matrix[-1, 1])  # counts for the first class; rows are predictions, columns are actual labels
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
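With more than two classes, precision and recall are defined per class. The sketch below is a minimal base R illustration, assuming predictions are in the rows of the table and actual labels in the columns; the `actual` and `predicted` vectors are made up for the example.

```r
# Hypothetical labels for ten observations across three classes
lv <- c('a', 'b', 'c')
actual    <- factor(c('a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c'), levels = lv)
predicted <- factor(c('a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'a', 'c'), levels = lv)

# Rows are predictions, columns are actual labels
cm <- table(Predicted = predicted, Actual = actual)

# Correct predictions sit on the diagonal
accuracy  <- sum(diag(cm)) / sum(cm)

# Precision: correct predictions divided by everything predicted as that class
precision <- diag(cm) / rowSums(cm)

# Recall: correct predictions divided by everything that truly is that class
recall    <- diag(cm) / colSums(cm)

round(rbind(precision, recall), 2)
```

If the table is built the other way round (actual labels in rows), the roles of `rowSums()` and `colSums()` swap.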
Prompt 113: Plot a ROC curve and compute AUC
Visualize trade‑offs between true positive and false positive rates using **pROC**.
library(pROC)
prob <- predict(rf_model, iris, type = 'prob')[,1]
roc_obj <- roc(as.integer(iris$Species == 'setosa'), prob)  # ROC needs a binary outcome: setosa (1) against the other species (0)
plot(roc_obj)
auc_value <- auc(roc_obj)
Prompt 114: Calculate R‑squared
Retrieve the coefficient of determination for regression models.
rsq <- summary(lm_fit)$r.squared
Prompt 115: Compute mean squared error (MSE)
Evaluate regression performance by averaging squared residuals.
pred <- predict(lm_fit, newdata = data)
mse <- mean((data$y - pred)^2)
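A self-contained sketch of the same calculation on the built-in `mtcars` data, with RMSE and MAE alongside MSE. The model `mpg ~ wt` is chosen only for illustration, and the errors here are in-sample.

```r
# Fit a simple regression on a built-in dataset (illustrative model choice)
fit <- lm(mpg ~ wt, data = mtcars)

# In-sample predictions and prediction errors
pred   <- predict(fit, newdata = mtcars)
errors <- mtcars$mpg - pred

mse  <- mean(errors^2)       # mean squared error
rmse <- sqrt(mse)            # same units as the response
mae  <- mean(abs(errors))    # less sensitive to large errors than RMSE

c(MSE = mse, RMSE = rmse, MAE = mae)
```

For an honest estimate of predictive performance, the same three lines would be applied to predictions on held-out data rather than the training rows.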
Prompt 116: Perform cross‑validation with caret
Run resampling techniques in **caret** for general models.
set.seed(123)
control <- trainControl(method = 'repeatedcv', number = 10, repeats = 3)
cv_result <- train(y ~ ., data = data, method = 'lm', trControl = control)
Prompt 117: Plot residuals vs fitted values
Assess homoscedasticity by plotting residuals against fitted values.
residuals <- resid(lm_fit)
fitted_vals <- fitted(lm_fit)
plot(fitted_vals, residuals, xlab = 'Fitted Values', ylab = 'Residuals', main = 'Residuals vs Fitted')
abline(h = 0, col = 'red')
Prompt 118: Identify influential points
Calculate Cook’s distance to detect influential observations.
cooks <- cooks.distance(lm_fit)
plot(cooks, type = 'h', main = "Cook's distance")
Prompt 119: Perform stepwise model selection
Use `step()` to carry out stepwise selection based on AIC.
lm_full <- lm(y ~ ., data = data)  # start from the model with all candidate predictors
step_model <- step(lm_full, direction = 'both')
Prompt 120: Validate model assumptions
Check assumptions like normality of residuals, independence and linearity.
par(mfrow = c(2, 2))
plot(lm_fit)
shapiro.test(residuals(lm_fit))
Data Structures
Prompt 121: Create a vector
Combine elements into an atomic vector using the `c()` function.
v <- c(1, 2, 3, 4, 5)
Prompt 122: Create a matrix
Define a two‑dimensional matrix with `matrix()`.
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
Prompt 123: Create a list
Store heterogeneous objects in a list.
lst <- list(numbers = 1:5, letters = letters[1:3], data = data.frame(x = 1:3, y = 4:6))
Prompt 124: Create a data frame
Construct a data frame from vectors of equal length.
df <- data.frame(name = c('A','B','C'), score = c(90, 85, 88))
Prompt 125: Create a tibble
Use the **tibble** package to create a modern tibble with enhanced printing.
library(tibble)
tbl <- tibble(name = c('A','B'), value = c(1, 2))
Prompt 126: Convert data frame to tibble
Transform a base data frame into a tibble.
tbl <- as_tibble(df)
Prompt 127: Access elements by index
Refer to individual elements in vectors, lists or data frames using bracket notation.
third_element <- v[3]
list_item <- lst$numbers[2]
cell <- df[1, 'score']
Prompt 128: Combine objects by rows and columns
Use `rbind()` and `cbind()` to merge data structures.
combined_rows <- rbind(df, data.frame(name = 'D', score = 92))
combined_cols <- cbind(df, grade = c('A', 'B', 'A'))
Prompt 129: Reshape an array
Create and manipulate multidimensional arrays with `array()`.
arr <- array(1:24, dim = c(2, 3, 4))
arr[1, , 2]
Prompt 130: Convert factors
Convert factor variables to numeric or character to suit analysis.
data$factor_var <- as.numeric(as.character(data$factor_var))
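The intermediate `as.character()` step matters because calling `as.numeric()` directly on a factor returns the internal level codes rather than the values. A small sketch with a made-up factor `f`:

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c('10', '20', '5', '20'))

# Direct conversion returns the level codes (levels sort as '10', '20', '5')
codes <- as.numeric(f)
codes   # 1 2 3 2

# Going through character recovers the original numbers
values <- as.numeric(as.character(f))
values  # 10 20 5 20
```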
Functions and Programming Constructs
Prompt 131: Define a simple function
Encapsulate reusable code by defining a function using `function()`.
square <- function(x) {
x^2
}
square(4)
Prompt 132: Use conditional statements
Control flow with `if`, `else if`, and `else`.
check_number <- function(x) {
if (x > 0) {
'positive'
} else if (x < 0) {
'negative'
} else {
'zero'
}
}
check_number(-5)
Prompt 133: Use a for loop
Iterate over a sequence of values using a `for` loop.
total <- 0  # named total to avoid masking the built-in sum()
for (i in 1:10) {
total <- total + i
}
total
Prompt 134: Use a while loop
Execute a block repeatedly while a condition is TRUE.
count <- 1
while (count <= 5) {
print(count)
count <- count + 1
}
Prompt 135: Use a repeat loop
Create a loop that runs indefinitely until it is exited explicitly with the `break` statement.
x <- 1
repeat {
if (x > 3) break
print(x)
x <- x + 1
}
Prompt 136: Apply a function with `apply()`
Apply a function across rows or columns of a matrix or array.
m <- matrix(1:9, nrow = 3)
row_sums <- apply(m, 1, sum)
col_means <- apply(m, 2, mean)
Prompt 137: Use `lapply()`
Apply a function to each element of a list and return a list.
result <- lapply(lst$numbers, function(x) x * 2)
Prompt 138: Use `sapply()`
Simplify the result of `lapply()` to a vector or matrix when possible.
result_vec <- sapply(lst$numbers, function(x) x * 2)
Prompt 139: Use `map` from purrr
Iterate over lists or vectors with functional programming using **purrr**.
library(purrr)
map_dbl(1:5, ~ .x^2)
Prompt 140: Use anonymous functions
Write functions without naming them for short, disposable operations.
sapply(1:5, function(x) x * 3)
String Manipulation
Prompt 141: Calculate string length
Use `nchar()` to obtain the number of characters in a string.
nchar('R programming')
Prompt 142: Convert to upper and lower case
Change the case of characters with `toupper()` and `tolower()`.
toupper('hello'); tolower('WORLD')
Prompt 143: Concatenate strings
Combine multiple strings using `paste()` or `paste0()`.
full <- paste('Hello', 'world', sep = ', ')
no_space <- paste0('R', 'Stats')
Prompt 144: Split a string
Divide a string into parts based on a delimiter using `strsplit()`.
parts <- strsplit('apple,banana,cherry', split = ',')[[1]]
Prompt 145: Find and replace patterns
Replace text patterns with `gsub()` or `sub()`.
text <- 'R is great'
new_text <- gsub('great', 'awesome', text)
Prompt 146: Extract substrings
Select a portion of a string using `substr()` or `substring()`.
substr('statistics', start = 1, stop = 4)
Prompt 147: Detect pattern presence
Check if a pattern exists in a string with `grepl()`.
grepl('data', 'big data analysis')
Prompt 148: Use regular expressions to match patterns
Match complex string patterns using regular expressions via `gregexpr()`.
matches <- gregexpr('[0-9]+', 'Room 101, Floor 2')
regmatches('Room 101, Floor 2', matches)
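`regmatches()` returns a list with one element per input string, so the matches for a single string are taken with `[[1]]`. A minimal sketch that also converts them to integers (the variable names are illustrative):

```r
txt <- 'Room 101, Floor 2'

# Positions of every run of digits in the string
matches <- gregexpr('[0-9]+', txt)

# Extract the matched substrings for the first (and only) input string
digits <- regmatches(txt, matches)[[1]]

# Convert the character matches to integers
numbers <- as.integer(digits)
numbers  # 101 2
```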
Prompt 149: Remove whitespace
Trim leading and trailing whitespace with `trimws()`.
trimws(' hello ')
Prompt 150: Convert factors to character strings
Transform factor variables into strings for text processing.
char_var <- as.character(factor_var)
Date and Time Handling
Prompt 151: Convert string to Date
Parse a character string into a Date object using `as.Date()`.
dates <- as.Date(c('2025-10-08', '2025-12-31'))
Prompt 152: Parse date‑time with lubridate
Use **lubridate** to handle date‑time formats more flexibly.
library(lubridate)
dt <- ymd_hms('2025-10-08 14:30:00')
Prompt 153: Extract year, month and day
Retrieve components of a date or date‑time object.
yr <- year(dt); mon <- month(dt); dy <- day(dt)  # distinct names avoid masking lubridate's year(), month() and day()
Prompt 154: Compute difference between dates
Calculate the time interval between two dates using `difftime()`.
start <- as.Date('2025-01-01')
end <- as.Date('2025-12-31')
interval <- difftime(end, start, units = 'days')
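The result of `difftime()` carries its units with it. To use the value in arithmetic, it can be converted to a plain number, optionally in different units, as in this short sketch (variable names are illustrative):

```r
start <- as.Date('2025-01-01')
end   <- as.Date('2025-12-31')

# A difftime object measured in days
interval <- difftime(end, start, units = 'days')

# Plain numeric values, in days and in weeks
n_days  <- as.numeric(interval)                   # 364
n_weeks <- as.numeric(interval, units = 'weeks')  # 52
```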
Prompt 155: Add or subtract time durations
Use lubridate to add or subtract periods and durations.
new_date <- dt + days(7) - hours(2)
Prompt 156: Round dates to the nearest unit
Round date‑times to the nearest unit, such as hour, week or month.
rounded <- round_date(dt, unit = 'hour')
Prompt 157: Format dates for display
Customize date output with `format()` or lubridate’s `stamp()`.
formatted <- format(dt, '%d-%b-%Y %H:%M')
Prompt 158: Handle time zones
Specify and convert between time zones using `with_tz()` and `force_tz()`.
dt_local <- ymd_hms('2025-10-08 14:30:00', tz = 'Europe/Madrid')
dt_utc <- with_tz(dt_local, 'UTC')
Prompt 159: Create a sequence of dates
Generate regularly spaced dates with `seq()`, which dispatches to `seq.Date()` for Date inputs.
date_seq <- seq(as.Date('2025-01-01'), as.Date('2025-01-10'), by = 'day')
Prompt 160: Group data by date components
Aggregate data by year, month or day using `floor_date()` and dplyr.
library(dplyr)  # provides %>%, group_by() and summarise()
library(lubridate)
monthly_summary <- data %>% group_by(month = floor_date(date, 'month')) %>% summarise(total = sum(value))
Randomization and Simulation
Prompt 161: Set a random seed
Ensure reproducible results by setting a seed with `set.seed()`.
set.seed(123)
Prompt 162: Sample from a vector
Randomly sample elements from a vector using `sample()`.
sample_vec <- sample(1:100, size = 10, replace = FALSE)
Prompt 163: Generate random numbers from distributions
Use functions like `rnorm()`, `runif()`, and `rbinom()` to sample from normal, uniform and binomial distributions.
norm_samples <- rnorm(10, mean = 0, sd = 1)
unif_samples <- runif(10, min = 0, max = 1)
binom_samples <- rbinom(10, size = 20, prob = 0.5)
Prompt 164: Simulate dice rolls
Roll a fair six‑sided die multiple times using `sample()`.
dice_rolls <- sample(1:6, size = 20, replace = TRUE)
Prompt 165: Perform bootstrap sampling
Generate bootstrap samples to estimate variability of statistics.
set.seed(123)
boot_means <- replicate(1000, mean(sample(data$variable, replace = TRUE)))
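The bootstrap means can then be summarised into a standard error and a percentile confidence interval. A self-contained sketch, using a simulated sample `x` as a stand-in for real data:

```r
set.seed(123)

# Hypothetical sample standing in for data$variable
x <- rnorm(50, mean = 10, sd = 2)

# Resample with replacement and record the mean of each resample
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# Bootstrap standard error of the mean
se_boot <- sd(boot_means)

# 95% percentile interval for the mean
ci <- quantile(boot_means, probs = c(0.025, 0.975))

se_boot
ci
```

The percentile interval is the simplest bootstrap interval; it is shown here as one common option, not the only one.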
Prompt 166: Run a Monte Carlo simulation
Use repeated random sampling to estimate a quantity.
set.seed(42)
monte_carlo <- function(n_sim) {
successes <- 0
for (i in 1:n_sim) {
x <- runif(1)
y <- runif(1)
if (x^2 + y^2 <= 1) successes <- successes + 1
}
4 * successes / n_sim
}
p_estimate <- monte_carlo(10000)
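The simulation estimates π: the fraction of uniform points in the unit square that fall inside the quarter circle approaches π/4. The loop can also be written in vectorized form, which is usually faster in R. A sketch under the same setup (the name `pi_estimate` and the sample size are illustrative):

```r
set.seed(42)
n_sim <- 100000

# Draw all points at once instead of one per loop iteration
x <- runif(n_sim)
y <- runif(n_sim)

# TRUE for points inside the quarter circle of radius 1
inside <- x^2 + y^2 <= 1

# The share of points inside approximates pi / 4
pi_estimate <- 4 * mean(inside)
pi_estimate
```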
Prompt 167: Generate a random permutation
Shuffle the order of elements in a vector.
perm <- sample(1:10)
Prompt 168: Randomly split a dataset
Divide data into random subsets (e.g., training and testing).
set.seed(123)
indices <- sample(seq_len(nrow(data)))
train_indices <- indices[1:floor(0.8 * length(indices))]
test_indices <- setdiff(indices, train_indices)  # the shuffled row numbers not used for training
train_set <- data[train_indices, ]
test_set <- data[test_indices, ]
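A quick sanity check for any random split is that the two index sets do not overlap and together cover every row. The sketch below builds the split by position in the shuffled vector and verifies both properties; the toy data frame `df` is made up for the example.

```r
set.seed(123)

# Hypothetical data frame with ten rows
df <- data.frame(id = 1:10, value = rnorm(10))

# Shuffle the row numbers, then cut the shuffled vector by position
indices <- sample(seq_len(nrow(df)))
n_train <- floor(0.8 * nrow(df))
train_indices <- indices[seq_len(n_train)]
test_indices  <- indices[-seq_len(n_train)]

# No row appears in both sets, and every row appears in one of them
stopifnot(length(intersect(train_indices, test_indices)) == 0)
stopifnot(setequal(c(train_indices, test_indices), seq_len(nrow(df))))

train_set <- df[train_indices, ]
test_set  <- df[test_indices, ]
```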
Prompt 169: Simulate a Poisson process
Generate arrival times from a Poisson process with rate λ.
lambda <- 2
n <- 100
interarrival <- rexp(n, rate = lambda)
arrival_times <- cumsum(interarrival)
Prompt 170: Simulate a Markov chain
Model transitions between states with a transition matrix.
states <- c('A','B','C')
transition <- matrix(c(0.5,0.3,0.2,
0.4,0.4,0.2,
0.2,0.5,0.3),
nrow = 3, byrow = TRUE, dimnames = list(states, states))
set.seed(123)
current <- 'A'
chain <- character(10)
chain[1] <- current
for (i in 2:10) {
current <- sample(states, 1, prob = transition[current, ])
chain[i] <- current
}
File and Directory Management
Prompt 171: List files in a directory
Use `list.files()` to get a vector of file names.
files <- list.files(path = '.', pattern = '\\.csv$', full.names = TRUE)  # pattern is a regular expression, not a shell glob
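The `pattern` argument of `list.files()` is interpreted as a regular expression rather than a shell-style wildcard. The sketch below shows how `glob2rx()` translates a glob into the equivalent regular expression, tested against a few made-up file names with `grepl()`.

```r
# Translate a shell glob into a regular expression
pattern <- glob2rx('*.csv')
pattern  # "^.*\\.csv$"

# Hypothetical file names to test the pattern against
file_names <- c('a.csv', 'b.csv.bak', 'notes_csv.txt', 'c.CSV')

# Case-sensitive match on names ending in .csv
matches_glob <- grepl(pattern, file_names)

# Hand-written equivalent that also accepts upper-case extensions
matches_any_case <- grepl('\\.csv$', file_names, ignore.case = TRUE)
```

Either form can be passed as `pattern` to `list.files()`, which also has its own `ignore.case` argument.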
Prompt 172: Check if a file exists
Verify whether a file exists using `file.exists()`.
file_found <- file.exists('myfile.csv')  # named file_found to avoid masking the built-in exists()
Prompt 173: Read multiple files and combine
Use `lapply()` to read multiple files and `bind_rows()` to merge them.
library(readr)
file_list <- list.files(pattern = '\\.csv$')  # regular expression matching names that end in .csv
all_data <- dplyr::bind_rows(lapply(file_list, read_csv))
Prompt 174: Create a new directory
Create directories with `dir.create()`.
dir.create('new_folder')
Prompt 175: Delete a file or directory
Remove files or directories using `file.remove()` and `unlink()`.
file.remove('old_file.csv')
unlink('old_folder', recursive = TRUE)
Prompt 176: Copy a file
Duplicate a file using `file.copy()`.
file.copy('source.txt', 'destination.txt')
Prompt 177: Move or rename a file
Change the name or location of a file using `file.rename()`.
file.rename('old_name.txt', 'new_name.txt')
Prompt 178: Write data to a compressed file
Compress data when writing to disk using `gzfile()`.
gz_con <- gzfile('data.csv.gz', 'w')
write.csv(data, gz_con)
close(gz_con)
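Reading the compressed file back needs no extra step, because `read.csv()` detects gzip compression when given a file path. A round-trip sketch using a temporary file and a small made-up data frame (`scores` and `restored` are illustrative names):

```r
gz_path <- tempfile(fileext = '.csv.gz')
scores <- data.frame(id = 1:3, score = c(0.5, 1.5, 2.5))   # made-up example data

gz_con <- gzfile(gz_path, 'w')
write.csv(scores, gz_con, row.names = FALSE)   # omit row names so the round trip is exact
close(gz_con)

restored <- read.csv(gz_path)   # gzip compression is detected automatically
```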
Prompt 179: Get file information
Retrieve metadata such as size and modification time using `file.info()`.
info <- file.info('data.csv')
size <- info$size; modified <- info$mtime
Prompt 180: Redirect output to a file
Capture printed output using `sink()` to write to a text file.
sink('log.txt')
print(summary(data))
sink()
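If an error occurs between the two `sink()` calls, output stays redirected until the sink is closed by hand. One alternative is `capture.output()`, which returns the printed lines as a character vector that `writeLines()` can then save. The sketch below uses a temporary log path and the built-in `cars` data set for illustration.

```r
log_path <- tempfile(fileext = '.txt')

# capture.output() returns the printed lines instead of redirecting the console
summary_lines <- capture.output(print(summary(cars)))
writeLines(summary_lines, log_path)

logged <- readLines(log_path)
```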
Web Data and API
Prompt 181: Download a file from a URL
Use `download.file()` to retrieve files from the web.
download.file('https://example.com/data.csv', destfile = 'downloaded.csv')
Prompt 182: Read an HTML table
Use **rvest** to scrape tables from web pages.
library(rvest)
page <- read_html('https://example.com')
first_table <- html_table(html_nodes(page, 'table')[[1]])
Prompt 183: Extract nodes with CSS selectors
Select HTML elements using CSS selectors with **rvest**.
titles <- html_text(html_nodes(page, 'h2.title'))
Prompt 184: Parse an XML file
Use **xml2** to read and parse XML data.
library(xml2)
xml <- read_xml('file.xml')
values <- xml_text(xml_find_all(xml, '//tag'))
Prompt 185: Fetch JSON data from an API
Use **httr** and **jsonlite** to request and parse JSON.
library(httr); library(jsonlite)
response <- GET('https://api.example.com/data')
stop_for_status(response)  # raise an R error on HTTP 4xx/5xx responses
data_json <- content(response, as = 'text', encoding = 'UTF-8')
data <- fromJSON(data_json)
Prompt 186: Send an HTTP GET request
Retrieve data using `GET()` with query parameters.
res <- GET('https://api.example.com/search', query = list(q = 'R programming'))
content(res, 'text')
Prompt 187: Convert API data to a data frame
Transform JSON or list structures into tidy data frames.
df <- as.data.frame(data)
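`as.data.frame()` works when the parsed result is already rectangular, but API responses are often lists of records. A minimal base R sketch, assuming each record is a flat named list with the same fields (the `records` object below is made up for illustration):

```r
# Made-up example of a parsed API response: a list of flat records
records <- list(
  list(id = 1, name = 'alpha', active = TRUE),
  list(id = 2, name = 'beta',  active = FALSE)
)

# Convert each record to a one-row data frame, then stack the rows
rows <- lapply(records, function(rec) as.data.frame(rec, stringsAsFactors = FALSE))
records_df <- do.call(rbind, rows)
```

Nested fields or records with differing field names would need extra handling before this step.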
Prompt 188: Scrape multiple pages
Loop through multiple pages to collect data in a single data set.
base_url <- 'https://example.com/page='
results <- list()
for (i in 1:5) {
  page <- read_html(paste0(base_url, i))
  results[[i]] <- html_text(html_nodes(page, '.item'))
}
items <- unlist(results)
Prompt 189: Extract meta information from a web page
Use **rvest** to extract metadata such as title and description.
meta_title <- html_text(html_nodes(page, 'title'))
meta_desc <- html_attr(html_nodes(page, "meta[name='description']"), 'content')
Prompt 190: Handle API authentication
Send authenticated requests using headers or tokens with **httr**.
token <- 'your_token_here'
res <- GET('https://api.example.com/protected', add_headers(Authorization = paste('Bearer', token)))
content(res, 'text')
Miscellaneous Tips and Tricks
Prompt 191: Install and load packages
Install packages with `install.packages()` and load them with `library()`.
install.packages('dplyr')
library(dplyr)
Prompt 192: Check installed packages
List packages currently installed on your system.
installed <- installed.packages()
head(installed[, 'Package'])
Prompt 193: Remove objects from workspace
Clean up memory by removing objects with `rm()` and using `gc()`.
rm(list = ls())
gc()
Prompt 194: Set working directory
Change the working directory using `setwd()` and verify with `getwd()`.
setwd('/path/to/project')
getwd()
Prompt 195: Use pipes (`%>%`)
Make code more readable by chaining operations with the pipe operator from **magrittr** or **dplyr**.
library(dplyr)
result <- data %>% filter(value > 0) %>% summarise(mean = mean(value))
Prompt 196: Profile code performance
Measure execution time of expressions using `system.time()`.
execution <- system.time({
  Sys.sleep(1)
})
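`system.time()` returns a named vector of user, system, and elapsed times. A short sketch of extracting the elapsed (wall-clock) time; the workload and the names `timing` and `elapsed_seconds` are arbitrary choices for illustration.

```r
timing <- system.time({
  total <- sum(sqrt(seq_len(1e6)))   # arbitrary workload
})

elapsed_seconds <- timing[['elapsed']]   # wall-clock time in seconds
```

A single run can be noisy, so repeating the measurement gives a more reliable comparison between alternatives.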
Prompt 197: Vectorize operations
Avoid explicit loops by performing vectorized arithmetic for speed.
v <- 1:100
squared <- v^2
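To see that the vectorized form matches an explicit loop, the sketch below computes the squares both ways and compares the results (`squared_loop` and `squared_vec` are illustrative names):

```r
v <- 1:100

# Explicit loop: fill a preallocated result one element at a time
squared_loop <- numeric(length(v))
for (i in seq_along(v)) {
  squared_loop[i] <- v[i]^2
}

# Vectorized form: one expression over the whole vector
squared_vec <- v^2

same_result <- all.equal(squared_loop, squared_vec)
```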
Prompt 198: Use parallel computing
Leverage multiple cores with the **parallel** package.
library(parallel)
n_cores <- max(1, detectCores() - 1, na.rm = TRUE)  # keep at least one worker
cl <- makeCluster(n_cores)
squares <- parSapply(cl, 1:4, function(x) x^2)
stopCluster(cl)
Prompt 199: Document functions with roxygen2
Use roxygen2 comments to document your functions for package development.
#' Calculate the square
#'
#' @param x A numeric value
#' @return The square of x
square <- function(x) { x^2 }
Prompt 200: Create reproducible analysis
Ensure reproducibility by setting seeds and recording session information.
set.seed(123)
# analysis code here
info <- sessionInfo()
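A short sketch of both ideas: resetting the seed replays the same random draws, and the session details can be saved alongside the results. The temporary output path and the variable names are illustrative.

```r
set.seed(123)
first_run <- rnorm(5)

set.seed(123)            # resetting the seed replays the same random stream
second_run <- rnorm(5)

# Save the session details to a text file (temporary path for illustration)
session_path <- tempfile(fileext = '.txt')
writeLines(capture.output(sessionInfo()), session_path)
```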

