
200 R Programming Prompts & Code Snippets for Data Analysis and Modeling

R is a versatile programming language designed for statistical computing and graphics. It can act as a calculator, provide numerical and graphical summaries of data and handle a variety of specific analyses. Whether you’re exploring data, running statistical tests or building predictive models, R offers built-in functions and packages to streamline the workflow. The `summary()` function, for example, gives a quick overview of each variable’s distribution, and the `lm()` function fits linear models and returns coefficients and diagnostics. The following sections present 200 practical prompts and code snippets organized by topic, from basic data operations to advanced modeling and visualization.

Data Input and Output

Prompt 1: Read a CSV file

Use `read.csv()` to import a comma‑separated values file and store it in a data frame.

data <- read.csv('path/to/file.csv', header = TRUE, stringsAsFactors = FALSE)

Prompt 2: Read an Excel file

Load spreadsheet data from Excel files with the **readxl** package.

library(readxl)
data <- read_excel('path/to/file.xlsx', sheet = 1)

Prompt 3: Import data from a remote CSV

Read a CSV file from a web URL directly without downloading it first.

url <- 'https://example.com/data.csv'
data <- read.csv(url, header = TRUE)

Prompt 4: Read a JSON file

Parse JSON data into an R list or data frame using **jsonlite**.

library(jsonlite)
json_data <- fromJSON('data.json')
data <- as.data.frame(json_data)

Prompt 5: Read data from a SQL database

Connect to an SQLite database and query data via **DBI** and **RSQLite**.

library(DBI)
con <- dbConnect(RSQLite::SQLite(), 'database.sqlite')
query <- 'SELECT * FROM my_table'
data <- dbGetQuery(con, query)
dbDisconnect(con)

Prompt 6: Read and write RDS files

Load and save objects in R’s native serialized format with `saveRDS()` and `readRDS()`.

saveRDS(data, 'mydata.rds')
loaded_data <- readRDS('mydata.rds')

Prompt 7: Write a data frame to CSV

Export a data frame to a CSV file using `write.csv()`.

write.csv(data, 'output.csv', row.names = FALSE)

Prompt 8: Write data to Excel

Save a data frame to an Excel workbook with **writexl**.

library(writexl)
write_xlsx(list(Sheet1 = data), 'output.xlsx')

Prompt 9: Save workspace to an .RData file

Persist your current R environment by saving all objects to a file.

save.image('workspace.RData')

Prompt 10: Load a .RData file

Restore a previously saved R workspace.

load('workspace.RData')

Data Cleaning

Prompt 11: Inspect data structure

Use `str()` to display the internal structure of an R object and see its variables and types.

str(data)

Prompt 12: Identify missing values

Find missing (NA) values in your dataset using `is.na()` and count them.

sum(is.na(data))
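
A per-column breakdown is often more useful than a single total. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `age` and `city`) to count missing values column by column and to locate the affected rows.

```r
# Hypothetical example data with a few missing entries
toy <- data.frame(
  age  = c(25, NA, 31, NA),
  city = c('Lyon', 'Oslo', NA, 'Kyiv'),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix; colSums() counts TRUEs per column
na_per_column <- colSums(is.na(toy))
na_per_column

# Rows that contain at least one missing value
rows_with_na <- which(!complete.cases(toy))
rows_with_na
```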

Prompt 13: Replace missing values with the mean

Impute missing numeric values by replacing them with the column mean.

data$variable[is.na(data$variable)] <- mean(data$variable, na.rm = TRUE)
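
To impute several columns at once, the same idea can be wrapped in a helper. This is a minimal sketch on made-up data; `toy` and `impute_mean()` are illustrative names, not part of any package.

```r
# Hypothetical example data
toy <- data.frame(score = c(10, NA, 30), weight = c(NA, 2, 4))

# Replace NAs in a numeric vector with the mean of its observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply the helper to every numeric column and write the results back
numeric_cols <- sapply(toy, is.numeric)
toy[numeric_cols] <- lapply(toy[numeric_cols], impute_mean)
toy
```

Mean imputation shrinks the variance of the imputed column, so it is best treated as a quick fix rather than a default.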

Prompt 14: Remove rows with missing values

Omit any rows containing missing values using `na.omit()`.

clean_data <- na.omit(data)

Prompt 15: Convert characters to factors

Transform character columns into factors for categorical analysis.

data$category <- as.factor(data$category)

Prompt 16: Rename columns

Change column names using `names()` or `dplyr::rename()`.

names(data)[names(data) == 'old_name'] <- 'new_name'
# Or with dplyr
library(dplyr)
data <- rename(data, new_name = old_name)

Prompt 17: Reorder columns

Rearrange the order of columns in a data frame.

data <- data[, c('col3', 'col1', 'col2', setdiff(names(data), c('col1','col2','col3')))]

Prompt 18: Filter rows based on a condition

Select only the rows that meet a logical criterion.

subset_data <- subset(data, variable > 10)

Prompt 19: Sort data frame by a column

Order data by ascending or descending values using `order()`.

sorted_data <- data[order(data$variable, decreasing = FALSE), ]

Prompt 20: Remove duplicate rows

Keep only unique rows using `unique()` or `distinct()`.

unique_data <- unique(data)
# Or with dplyr
library(dplyr)
unique_data <- distinct(data)

Data Transformation with dplyr

Prompt 21: Select specific columns

Use `select()` to choose a subset of columns from a data frame.

library(dplyr)
subset_data <- select(data, column1, column2, column3)

Prompt 22: Filter rows by criteria

Extract rows that satisfy given conditions using `filter()`.

filtered_data <- filter(data, column1 == 'A', column2 > 5)

Prompt 23: Mutate new columns

Create new variables derived from existing columns using `mutate()`.

library(dplyr)
data <- mutate(data, ratio = column1 / column2, log_value = log(column3))

Prompt 24: Summarize data by groups

Group data and compute summary statistics with `group_by()` and `summarise()`.

library(dplyr)
summ <- data %>% group_by(category) %>% summarise(mean_value = mean(value, na.rm = TRUE), count = n())

Prompt 25: Group by multiple variables

Group data by more than one variable for multi‑level summaries.

group_summary <- data %>% group_by(category, subgroup) %>% summarise(total = sum(value), .groups = 'drop')

Prompt 26: Arrange data

Sort data within the pipeline using `arrange()`.

sorted <- data %>% arrange(desc(value))

Prompt 27: Count occurrences

Tabulate the number of observations per category with `count()`.

library(dplyr)
counts <- data %>% count(category)

Prompt 28: Select distinct rows

Extract unique rows for specified columns using `distinct()`.

unique_rows <- data %>% distinct(category, value)

Prompt 29: Join two data frames

Perform an inner join on two tables with common keys using `inner_join()`.

merged_data <- inner_join(df1, df2, by = 'id')

Prompt 30: Pivot data longer or wider

Reshape data using `pivot_longer()` and `pivot_wider()` from **tidyr**.

library(tidyr)
long <- pivot_longer(data, cols = starts_with('Q'), names_to = 'question', values_to = 'score')
wide <- pivot_wider(long, names_from = question, values_from = score)

Exploratory Data Analysis

Prompt 31: Compute summary statistics

Get an overview of each variable with the `summary()` function, which reports minimum, quartiles, median, mean and maximum.

summary(data)

Prompt 32: Compute quantiles

Calculate specific quantiles (e.g., the 25th, 50th and 75th percentiles) using `quantile()`.

quantiles <- quantile(data$variable, probs = c(0.25, 0.5, 0.75))

Prompt 33: Create a frequency table

Tabulate counts of unique values in a vector using `table()`.

freq <- table(data$category)

Prompt 34: Create a cross‑tabulation

Produce contingency tables for two categorical variables with `xtabs()` or `table()`.

crosstab <- table(data$category, data$group)

Prompt 35: Compute a correlation matrix

Calculate pairwise correlations for numeric variables using `cor()`.

cor_matrix <- cor(data[, sapply(data, is.numeric)], use = 'complete.obs')

Prompt 36: Compute covariance matrix

Assess covariance between variables with `cov()`.

cov_matrix <- cov(data[, sapply(data, is.numeric)], use = 'complete.obs')

Prompt 37: Plot a histogram

Visualize the distribution of a continuous variable with `hist()`.

hist(data$variable, breaks = 30, col = 'steelblue', main = 'Histogram', xlab = 'Value')

Prompt 38: Create a boxplot

Display the distribution and outliers of a variable using `boxplot()`.

boxplot(data$variable ~ data$group, main = 'Boxplot', xlab = 'Group', ylab = 'Variable')

Prompt 39: Create a scatter plot

Plot two numeric variables against each other with `plot()`.

plot(data$variable1, data$variable2, main = 'Scatter Plot', xlab = 'Variable 1', ylab = 'Variable 2', pch = 19)

Prompt 40: Make a pairwise scatterplot matrix

Explore relationships between multiple variables using `pairs()`.

pairs(data[, 1:4], main = 'Pairs Plot')

Basic Statistics

Prompt 41: Compute the mean

Calculate the arithmetic mean of a numeric vector with `mean()`.

avg <- mean(data$variable, na.rm = TRUE)

Prompt 42: Compute the median

Find the median (50th percentile) of a numeric vector.

med <- median(data$variable, na.rm = TRUE)

Prompt 43: Compute the standard deviation

Measure the spread of values around the mean using `sd()`.

std_dev <- sd(data$variable, na.rm = TRUE)

Prompt 44: Compute the variance

Compute the sample variance of a numeric vector using `var()`.

var_value <- var(data$variable, na.rm = TRUE)

Prompt 45: Compute the range

Get the minimum and maximum values with `range()`.

range_values <- range(data$variable, na.rm = TRUE)

Prompt 46: Compute the interquartile range

Calculate the IQR (difference between 75th and 25th percentiles) using `IQR()`.

iqr_value <- IQR(data$variable, na.rm = TRUE)

Prompt 47: Generate a random sample from a normal distribution

Create a vector of random numbers drawn from a normal distribution using `rnorm()`.

set.seed(123)
random_values <- rnorm(100, mean = 0, sd = 1)

Prompt 48: Compute summary of an entire data frame

Summarize all variables at once using `summary()`.

summary_stats <- summary(data)

Prompt 49: Compute covariance matrix

Calculate variances and covariances for numeric variables using `cov()`.

cov_mat <- cov(data[, sapply(data, is.numeric)])

Prompt 50: Compute correlation coefficient

Measure linear relationships between two variables using `cor()`.

correlation <- cor(data$variable1, data$variable2, use = 'complete.obs')

Statistical Tests

Prompt 51: One‑sample t‑test

Test whether the mean of a sample differs from a hypothesized value.

t.test(data$variable, mu = 0)

Prompt 52: Two‑sample t‑test

Compare means of two independent groups using a two‑sample t‑test.

t.test(variable ~ group, data = data)

Prompt 53: Paired t‑test

Compare means of paired observations, such as before‑after measurements.

t.test(data$pre, data$post, paired = TRUE)

Prompt 54: Chi‑square test of independence

Assess the association between two categorical variables.

chisq.test(table(data$category, data$group))

Prompt 55: Shapiro–Wilk normality test

Check normality of a numeric variable using `shapiro.test()`.

shapiro.test(data$variable)

Prompt 56: Correlation test

Test whether the correlation coefficient differs from zero.

cor.test(data$variable1, data$variable2, method = 'pearson')

Prompt 57: One‑way ANOVA

Compare means across more than two groups using analysis of variance.

anova_result <- aov(variable ~ group, data = data)
summary(anova_result)

Prompt 58: Wilcoxon rank‑sum test

Perform a non‑parametric test to compare two independent samples.

wilcox.test(variable ~ group, data = data)

Prompt 59: Kruskal–Wallis test

Non‑parametric alternative to one‑way ANOVA for more than two groups.

kruskal.test(variable ~ group, data = data)

Prompt 60: Proportion test

Test equality of proportions for two samples using `prop.test()`.

prop.test(x = c(40, 50), n = c(100, 120))

Linear and Generalized Linear Models

Prompt 61: Fit a simple linear regression

Model the relationship between a response and a single predictor using `lm()`.

lm_fit <- lm(y ~ x, data = data)
summary(lm_fit)

Prompt 62: Fit a multiple linear regression

Include multiple predictors in a linear model.

lm_fit <- lm(y ~ x1 + x2 + x3, data = data)
summary(lm_fit)

Prompt 63: Fit a polynomial regression

Use higher‑order terms to capture non‑linear relationships.

poly_fit <- lm(y ~ poly(x, degree = 2, raw = TRUE), data = data)
summary(poly_fit)

Prompt 64: Fit a logistic regression

Model a binary response variable using `glm()` with the binomial family.

logit_fit <- glm(y ~ x1 + x2, data = data, family = binomial)
summary(logit_fit)

Prompt 65: Fit a Poisson regression

Model count data using `glm()` with the Poisson family.

pois_fit <- glm(count ~ x1 + offset(log(exposure)), data = data, family = poisson)
summary(pois_fit)

Prompt 66: Fit a linear model without an intercept

Suppress the intercept term by adding `0 +` in the formula.

lm_no_intercept <- lm(y ~ 0 + x1 + x2, data = data)
coef(lm_no_intercept)

Prompt 67: Extract coefficients from a model

Obtain estimated coefficients from an `lm` or `glm` object using `coef()`.

coefficients <- coef(lm_fit)

Prompt 68: Predict new values

Generate predictions on new data using `predict()`.

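# newdata needs every predictor in the model; lm_fit here is the three-predictor fit from Prompt 62
new_data <- data.frame(x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 6))
predictions <- predict(lm_fit, newdata = new_data)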
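
As a self-contained illustration, the sketch below fits a model on the built-in `mtcars` data and predicts for two made-up cars. The model formula and the new values are assumptions chosen for the example; `interval = 'prediction'` additionally returns lower and upper bounds.

```r
# mtcars ships with base R; mpg is modeled from weight (wt) and horsepower (hp)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# newdata must contain every predictor used in the formula, under the same names
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))

# interval = 'prediction' adds bounds for individual new observations
pred <- predict(fit, newdata = new_cars, interval = 'prediction', level = 0.95)
pred
```

The result is a matrix with columns `fit`, `lwr` and `upr`, one row per row of `new_cars`.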

Prompt 69: Plot diagnostic plots

Inspect model diagnostics such as residuals and fitted values using `plot(lm_fit)`.

par(mfrow = c(2, 2))
plot(lm_fit)

Prompt 70: Summarize model output

View summary statistics, coefficients and diagnostic metrics with `summary()`.

summary(lm_fit)

Advanced Modeling and Machine Learning

Prompt 71: Split data into training and testing sets

Randomly partition your dataset into training and testing subsets.

set.seed(42)
train_index <- sample(seq_len(nrow(data)), size = 0.7 * nrow(data))
train <- data[train_index, ]
test <- data[-train_index, ]

Prompt 72: Perform k‑fold cross‑validation

Use the **caret** package to perform k‑fold cross‑validation when training models.

library(caret)
control <- trainControl(method = 'cv', number = 5)
cv_model <- train(y ~ ., data = data, method = 'lm', trControl = control)

Prompt 73: Fit a decision tree

Create a classification or regression tree using **rpart**.

library(rpart)
tree_model <- rpart(Species ~ ., data = iris, method = 'class')
printcp(tree_model)

Prompt 74: Fit a random forest

Train an ensemble of decision trees using **randomForest**.

library(randomForest)
rf_model <- randomForest(Species ~ ., data = iris, ntree = 500)
print(rf_model)

Prompt 75: Fit a gradient boosting machine

Use the **xgboost** package for gradient boosting on numeric matrices.

library(xgboost)
# prepare matrices
label <- as.numeric(iris$Species) - 1
train_matrix <- xgb.DMatrix(data = as.matrix(iris[, -5]), label = label)
params <- list(objective = 'multi:softprob', num_class = 3)
model <- xgb.train(params, train_matrix, nrounds = 50)

Prompt 76: Fit a support vector machine

Use **e1071** or **kernlab** to build an SVM classifier.

library(e1071)
svm_model <- svm(Species ~ ., data = iris, kernel = 'radial')
summary(svm_model)

Prompt 77: Fit k‑nearest neighbors

Implement KNN classification via the **class** package or **caret**.

library(class)
k <- 3
train_x <- iris[ , -5]
train_y <- iris$Species
pred <- knn(train_x, train_x, train_y, k = k)

Prompt 78: Fit a Naive Bayes classifier

Use **e1071** to fit a naive Bayes classifier; numeric predictors such as those in `iris` are modeled as Gaussian within each class.

library(e1071)
nb_model <- naiveBayes(Species ~ ., data = iris)
nb_pred <- predict(nb_model, iris)

Prompt 79: Fit a ridge regression

Apply regularization to linear models using **glmnet** with alpha = 0 for ridge.

library(glmnet)
x <- model.matrix(y ~ ., data)[, -1]
y_vec <- data$y
ridge_fit <- cv.glmnet(x, y_vec, alpha = 0)
coef(ridge_fit, s = 'lambda.min')

Prompt 80: Fit a lasso regression

Use **glmnet** with alpha = 1 to perform lasso penalization.

library(glmnet)
# x and y_vec are the predictor matrix and response built in Prompt 79
lasso_fit <- cv.glmnet(x, y_vec, alpha = 1)
coef(lasso_fit, s = 'lambda.min')

Clustering and Unsupervised Learning

Prompt 81: Perform k‑means clustering

Partition observations into k clusters using `kmeans()`.

set.seed(123)
km <- kmeans(iris[, -5], centers = 3)
km$cluster

Prompt 82: Determine optimal number of clusters (elbow)

Plot total within‑cluster sum of squares for various k values to choose the optimal number.

wss <- sapply(1:10, function(k) {
  kmeans(iris[, -5], centers = k, nstart = 10)$tot.withinss
})
plot(1:10, wss, type = 'b', pch = 19, frame = FALSE, xlab = 'k', ylab = 'Total Within Sum of Squares')

Prompt 83: Perform hierarchical clustering

Use `hclust()` on a distance matrix to build a dendrogram.

d <- dist(iris[, -5])
hc <- hclust(d, method = 'complete')
plot(hc, labels = iris$Species)

Prompt 84: Plot a dendrogram

Visualize hierarchical clustering results with a dendrogram.

plot(as.dendrogram(hc), main = 'Hierarchical Clustering Dendrogram')

Prompt 85: Standardize data before clustering

Scale variables to have mean 0 and unit variance using `scale()`.

scaled_data <- scale(iris[, -5])
km_scaled <- kmeans(scaled_data, centers = 3)

Prompt 86: Perform principal component analysis

Reduce dimensionality of numeric data using `prcomp()`.

pca <- prcomp(iris[, -5], scale. = TRUE)
summary(pca)

Prompt 87: Plot a PCA biplot

Visualize principal components and variable loadings.

biplot(pca, scale = 0, main = 'PCA Biplot')

Prompt 88: Perform t‑SNE

Apply t‑distributed stochastic neighbor embedding via **Rtsne** for high‑dimensional data.

library(Rtsne)
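# iris contains duplicate rows, which Rtsne rejects unless check_duplicates = FALSE
tsne_out <- Rtsne(as.matrix(iris[, -5]), dims = 2, perplexity = 30, check_duplicates = FALSE)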
plot(tsne_out$Y, col = as.numeric(iris$Species), pch = 19)

Prompt 89: Perform DBSCAN clustering

Density‑based clustering with **dbscan** package.

library(dbscan)
cl <- dbscan(iris[, -5], eps = 0.5, minPts = 5)
cl$cluster

Prompt 90: Perform factor analysis

Identify latent variables influencing observed data using `factanal()`.

# With only four variables, factanal() can fit at most one factor (rotation needs two or more)
fa <- factanal(iris[, -5], factors = 1)
fa

Time Series Analysis

Prompt 91: Create a time series object

Use `ts()` to convert a numeric vector into a time series object.

my_ts <- ts(data$variable, start = c(2020, 1), frequency = 12)

Prompt 92: Plot a time series

Visualize a time series object using `plot()`.

plot(my_ts, main = 'Time Series Plot', ylab = 'Value', xlab = 'Time')

Prompt 93: Decompose a time series

Break down a series into trend, seasonal and irregular components using `decompose()`.

components <- decompose(my_ts, type = 'additive')
plot(components)

Prompt 94: Check stationarity (ADF test)

Use the Augmented Dickey–Fuller test from **tseries** to test for stationarity.

library(tseries)
adf.test(my_ts)

Prompt 95: Fit an ARIMA model

Automatically select and fit an ARIMA model using **forecast**.

library(forecast)
fit <- auto.arima(my_ts)
fit

Prompt 96: Forecast future values

Predict future observations with the fitted model and plot the forecast.

fc <- forecast(fit, h = 12)
plot(fc)

Prompt 97: Plot forecast results

Visualize the predicted values and prediction intervals.

library(ggplot2)  # provides ggtitle(); forecast supplies the autoplot() method
autoplot(fc) + ggtitle('Forecast')

Prompt 98: Fit exponential smoothing (Holt–Winters)

Use `HoltWinters()` for exponential smoothing.

hw <- HoltWinters(my_ts)
plot(hw)

Prompt 99: Perform STL decomposition

Apply Seasonal-Trend decomposition using Loess.

stl_fit <- stl(my_ts, s.window = 'periodic')
plot(stl_fit)

Prompt 100: Evaluate forecast accuracy

Compute accuracy metrics such as MAE and RMSE using `accuracy()`.

acc <- accuracy(fc)
acc

Data Visualization with ggplot2

Prompt 101: Create a scatter plot

Use `ggplot()` with `geom_point()` to visualize the relationship between two variables.

library(ggplot2)
ggplot(data, aes(x = variable1, y = variable2)) + geom_point() + ggtitle('Scatter Plot')

Prompt 102: Create a line plot

Plot time series or ordered data with `geom_line()`.

ggplot(data, aes(x = time, y = value)) + geom_line(color = 'blue') + labs(title = 'Line Plot', x = 'Time', y = 'Value')

Prompt 103: Create a bar chart

Represent categorical data as bars using `geom_bar()`.

ggplot(data, aes(x = category)) + geom_bar(fill = 'tomato') + labs(title = 'Bar Chart', x = 'Category', y = 'Count')

Prompt 104: Create a histogram

Visualize the distribution of a continuous variable with `geom_histogram()`.

ggplot(data, aes(x = variable)) + geom_histogram(binwidth = 1, fill = 'skyblue', color = 'black')

Prompt 105: Add facets

Use `facet_wrap()` or `facet_grid()` to create trellis plots.

ggplot(data, aes(x = value)) + geom_histogram(binwidth = 1) + facet_wrap(~ group) + labs(title = 'Faceted Histograms')

Prompt 106: Customize colors and themes

Apply a color palette and theme to improve plot aesthetics.

ggplot(data, aes(x = variable1, y = variable2, color = group)) + geom_point() + theme_minimal() + scale_color_brewer(palette = 'Set2')

Prompt 107: Add a smooth line

Overlay a regression or loess smooth using `geom_smooth()`.

ggplot(data, aes(x = x, y = y)) + geom_point() + geom_smooth(method = 'lm', se = FALSE) + labs(title = 'Scatter with Regression Line')

Prompt 108: Create a boxplot

Use `geom_boxplot()` to display distributions for multiple groups.

ggplot(data, aes(x = group, y = value)) + geom_boxplot(fill = 'lightgreen') + labs(title = 'Boxplot')

Prompt 109: Create a density plot

Plot a density estimate with `geom_density()`.

ggplot(data, aes(x = variable, fill = group)) + geom_density(alpha = 0.5) + labs(title = 'Density Plot')

Prompt 110: Save a plot to file

Export plots to PNG, PDF or other formats using `ggsave()`.

p <- ggplot(data, aes(x = variable1, y = variable2)) + geom_point()
ggsave('scatter_plot.png', p, width = 6, height = 4)

Model Evaluation and Validation

Prompt 111: Compute a confusion matrix

Use `table()` or **caret** to generate a confusion matrix for classification results.

pred <- predict(rf_model, iris)
conf_matrix <- table(Predicted = pred, Actual = iris$Species)

Prompt 112: Compute accuracy, precision and recall

Calculate evaluation metrics from the confusion matrix.

# conf_matrix has predictions in rows and actual labels in columns
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
precision <- diag(conf_matrix) / rowSums(conf_matrix)  # per class: TP / (TP + FP)
recall <- diag(conf_matrix) / colSums(conf_matrix)     # per class: TP / (TP + FN)
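
For a two-class problem the four cells can be named directly. The following is a minimal sketch with made-up labels, assuming rows hold predictions, columns hold true labels, and `'yes'` is the positive class.

```r
# Hypothetical labels for a binary problem
actual    <- factor(c('yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'yes'), levels = c('yes', 'no'))
predicted <- factor(c('yes', 'no', 'yes', 'no', 'yes', 'no', 'no', 'yes'), levels = c('yes', 'no'))

# Rows are predictions, columns are true labels
cm <- table(Predicted = predicted, Actual = actual)

tp <- cm['yes', 'yes']  # predicted yes, actually yes
fp <- cm['yes', 'no']   # predicted yes, actually no
fn <- cm['no', 'yes']   # predicted no, actually yes
tn <- cm['no', 'no']    # predicted no, actually no

accuracy  <- (tp + tn) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
c(accuracy = accuracy, precision = precision, recall = recall)
```

Indexing the table by level name rather than by position makes it harder to mix up false positives and false negatives.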

Prompt 113: Plot a ROC curve and compute AUC

Visualize trade‑offs between true positive and false positive rates using **pROC**.

library(pROC)
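# ROC analysis is binary, so score one class (setosa) against the rest
prob <- predict(rf_model, iris, type = 'prob')[, 'setosa']
roc_obj <- roc(response = as.integer(iris$Species == 'setosa'), predictor = prob)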
plot(roc_obj)
auc_value <- auc(roc_obj)

Prompt 114: Calculate R‑squared

Retrieve the coefficient of determination for regression models.

rsq <- summary(lm_fit)$r.squared

Prompt 115: Compute mean squared error (MSE)

Evaluate regression performance by averaging squared residuals.

pred <- predict(lm_fit, newdata = data)
mse <- mean((data$y - pred)^2)

Prompt 116: Perform cross‑validation with caret

Run resampling techniques in **caret** for general models.

set.seed(123)
control <- trainControl(method = 'repeatedcv', number = 10, repeats = 3)
cv_result <- train(y ~ ., data = data, method = 'lm', trControl = control)

Prompt 117: Plot residuals vs fitted values

Assess homoscedasticity by plotting residuals against fitted values.

residuals <- resid(lm_fit)
fitted_vals <- fitted(lm_fit)
plot(fitted_vals, residuals, xlab = 'Fitted Values', ylab = 'Residuals', main = 'Residuals vs Fitted')
abline(h = 0, col = 'red')

Prompt 118: Identify influential points

Calculate Cook’s distance to detect influential observations.

cooks <- cooks.distance(lm_fit)
plot(cooks, type = 'h', main = "Cook's distance")

Prompt 119: Perform stepwise model selection

Use `step()` to carry out stepwise selection based on AIC.

lm_full <- lm(y ~ ., data = data)  # full model with all candidate predictors
step_model <- step(lm_full, direction = 'both')

Prompt 120: Validate model assumptions

Check assumptions like normality of residuals, independence and linearity.

par(mfrow = c(2, 2))
plot(lm_fit)
shapiro.test(residuals(lm_fit))

Data Structures

Prompt 121: Create a vector

Combine elements into an atomic vector using the `c()` function.

v <- c(1, 2, 3, 4, 5)

Prompt 122: Create a matrix

Define a two‑dimensional matrix with `matrix()`.

m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

Prompt 123: Create a list

Store heterogeneous objects in a list.

lst <- list(numbers = 1:5, letters = letters[1:3], data = data.frame(x = 1:3, y = 4:6))

Prompt 124: Create a data frame

Construct a data frame from vectors of equal length.

df <- data.frame(name = c('A','B','C'), score = c(90, 85, 88))

Prompt 125: Create a tibble

Use the **tibble** package to create a modern tibble with enhanced printing.

library(tibble)
tbl <- tibble(name = c('A','B'), value = c(1, 2))

Prompt 126: Convert data frame to tibble

Transform a base data frame into a tibble.

tbl <- as_tibble(df)

Prompt 127: Access elements by index

Refer to individual elements in vectors, lists or data frames using bracket notation.

third_element <- v[3]
list_item <- lst$numbers[2]
cell <- df[1, 'score']

Prompt 128: Combine objects by rows and columns

Use `rbind()` and `cbind()` to merge data structures.

combined_rows <- rbind(df, data.frame(name = 'D', score = 92))
combined_cols <- cbind(df, grade = c('A', 'B', 'A'))

Prompt 129: Reshape an array

Create and manipulate multidimensional arrays with `array()`.

arr <- array(1:24, dim = c(2, 3, 4))
arr[1, , 2]

Prompt 130: Convert factors

Convert factor variables to numeric or character to suit analysis.

data$factor_var <- as.numeric(as.character(data$factor_var))
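
The intermediate `as.character()` step matters because a factor stores integer level codes internally. A short sketch with a made-up factor `f` shows the difference:

```r
f <- factor(c('10', '20', '10', '5'))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)

# Converting to character first recovers the original values
values <- as.numeric(as.character(f))

codes   # 1 2 1 3
values  # 10 20 10 5
```

The levels are sorted as text (`'10'`, `'20'`, `'5'`), which is why `'5'` receives code 3.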

Functions and Programming Constructs

Prompt 131: Define a simple function

Encapsulate reusable code by defining a function using `function()`.

square <- function(x) {
  x^2
}
square(4)

Prompt 132: Use conditional statements

Control flow with `if`, `else if`, and `else`.

check_number <- function(x) {
  if (x > 0) {
    'positive'
  } else if (x < 0) {
    'negative'
  } else {
    'zero'
  }
}
check_number(-5)

Prompt 133: Use a for loop

Iterate over a sequence of values using a `for` loop.

total <- 0  # named total so the built-in sum() is not shadowed
for (i in 1:10) {
  total <- total + i
}
total

Prompt 134: Use a while loop

Execute a block repeatedly while a condition is TRUE.

count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}

Prompt 135: Use a repeat loop

Create a loop with no built-in stopping condition; exit it explicitly with the `break` statement.

x <- 1
repeat {
  if (x > 3) break
  print(x)
  x <- x + 1
}

Prompt 136: Apply a function with `apply()`

Apply a function across rows or columns of a matrix or array.

m <- matrix(1:9, nrow = 3)
row_sums <- apply(m, 1, sum)
col_means <- apply(m, 2, mean)
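
The second argument of `apply()` selects the margin: 1 for rows, 2 for columns. A small sketch on a made-up matrix, including a custom function and a comparison with the dedicated `rowSums()` helper:

```r
m <- matrix(1:6, nrow = 2)  # filled column-wise: rows are (1, 3, 5) and (2, 4, 6)

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column
row_totals <- apply(m, 1, sum)
col_ranges <- apply(m, 2, function(col) max(col) - min(col))

# For plain sums, rowSums() gives the same result
same_as_rowsums <- all(row_totals == rowSums(m))

row_totals  # 9 12
col_ranges  # 1 1 1
```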

Prompt 137: Use `lapply()`

Apply a function to each element of a list or vector and return a list.

result <- lapply(lst$numbers, function(x) x * 2)

Prompt 138: Use `sapply()`

Simplify the result of `lapply()` to a vector or matrix when possible.

result_vec <- sapply(lst$numbers, function(x) x * 2)

Prompt 139: Use `map` from purrr

Iterate over lists or vectors with functional programming using **purrr**.

library(purrr)
map_dbl(1:5, ~ .x^2)

Prompt 140: Use anonymous functions

Write functions without naming them for short, disposable operations.

sapply(1:5, function(x) x * 3)

String Manipulation

Prompt 141: Calculate string length

Use `nchar()` to obtain the number of characters in a string.

nchar('R programming')

Prompt 142: Convert to upper and lower case

Change the case of characters with `toupper()` and `tolower()`.

toupper('hello'); tolower('WORLD')

Prompt 143: Concatenate strings

Combine multiple strings using `paste()` or `paste0()`.

full <- paste('Hello', 'world', sep = ', ')
no_space <- paste0('R', 'Stats')

Prompt 144: Split a string

Divide a string into parts based on a delimiter using `strsplit()`.

parts <- strsplit('apple,banana,cherry', split = ',')[[1]]

Prompt 145: Find and replace patterns

Replace text patterns with `gsub()` or `sub()`.

text <- 'R is great'
new_text <- gsub('great', 'awesome', text)

Prompt 146: Extract substrings

Select a portion of a string using `substr()` or `substring()`.

substr('statistics', start = 1, stop = 4)

Prompt 147: Detect pattern presence

Check if a pattern exists in a string with `grepl()`.

grepl('data', 'big data analysis')

Prompt 148: Use regular expressions to match patterns

Match complex string patterns using regular expressions via `gregexpr()`.

matches <- gregexpr('[0-9]+', 'Room 101, Floor 2')
regmatches('Room 101, Floor 2', matches)

Prompt 149: Remove whitespace

Trim leading and trailing whitespace with `trimws()`.

trimws('  hello  ')

Prompt 150: Convert factors to character strings

Transform factor variables into strings for text processing.

char_var <- as.character(factor_var)

Date and Time Handling

Prompt 151: Convert string to Date

Parse a character string into a Date object using `as.Date()`.

dates <- as.Date(c('2025-10-08', '2025-12-31'))

Prompt 152: Parse date‑time with lubridate

Use **lubridate** to handle date‑time formats more flexibly.

library(lubridate)
dt <- ymd_hms('2025-10-08 14:30:00')

Prompt 153: Extract year, month and day

Retrieve components of a date or date‑time object.

yr <- year(dt); mon <- month(dt); dy <- day(dt)

Prompt 154: Compute difference between dates

Calculate the time interval between two dates using `difftime()`.

start <- as.Date('2025-01-01')
end <- as.Date('2025-12-31')
interval <- difftime(end, start, units = 'days')

Prompt 155: Add or subtract time durations

Use lubridate to add or subtract periods and durations.

new_date <- dt + days(7) - hours(2)

Prompt 156: Round dates to the nearest unit

Round date‑times to a specified unit such as hour, week or month.

rounded <- round_date(dt, unit = 'hour')

Prompt 157: Format dates for display

Customize date output with `format()` or lubridate’s `stamp()`.

formatted <- format(dt, '%d-%b-%Y %H:%M')

Prompt 158: Handle time zones

Specify and convert between time zones using `with_tz()` and `force_tz()`.

dt_local <- ymd_hms('2025-10-08 14:30:00', tz = 'Europe/Madrid')
dt_utc <- with_tz(dt_local, 'UTC')

Prompt 159: Create a sequence of dates

Generate regularly spaced dates with `seq()`, which dispatches to `seq.Date()` for Date inputs.

date_seq <- seq(as.Date('2025-01-01'), as.Date('2025-01-10'), by = 'day')

Prompt 160: Group data by date components

Aggregate data by year, month or day using `floor_date()` and dplyr.

library(dplyr)
library(lubridate)
monthly_summary <- data %>% group_by(month = floor_date(date, 'month')) %>% summarise(total = sum(value))

Randomization and Simulation

Prompt 161: Set a random seed

Ensure reproducible results by setting a seed with `set.seed()`.

set.seed(123)

Prompt 162: Sample from a vector

Randomly sample elements from a vector using `sample()`.

sample_vec <- sample(1:100, size = 10, replace = FALSE)

Prompt 163: Generate random numbers from distributions

Use functions like `rnorm()`, `runif()`, and `rbinom()` to sample from normal, uniform and binomial distributions.

norm_samples <- rnorm(10, mean = 0, sd = 1)
unif_samples <- runif(10, min = 0, max = 1)
binom_samples <- rbinom(10, size = 20, prob = 0.5)

Prompt 164: Simulate dice rolls

Roll a fair six‑sided die multiple times using `sample()`.

dice_rolls <- sample(1:6, size = 20, replace = TRUE)

Prompt 165: Perform bootstrap sampling

Generate bootstrap samples to estimate variability of statistics.

set.seed(123)
boot_means <- replicate(1000, mean(sample(data$variable, replace = TRUE)))
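
The bootstrap distribution can then be summarized as a standard error and a percentile interval. The sketch below is self-contained and uses a simulated vector `x` as a stand-in for `data$variable`.

```r
set.seed(123)
# Simulated stand-in for data$variable
x <- rnorm(200, mean = 50, sd = 10)

# Resample with replacement and record the mean each time
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# Bootstrap standard error and a 95% percentile interval for the mean
boot_se <- sd(boot_means)
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_se
boot_ci
```

For a sample mean, `boot_se` should land close to the analytic value `sd(x) / sqrt(length(x))`.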

Prompt 166: Run a Monte Carlo simulation

Use repeated random sampling to estimate a quantity.

set.seed(42)
monte_carlo <- function(n_sim) {
  successes <- 0
  for (i in 1:n_sim) {
    x <- runif(1)
    y <- runif(1)
    if (x^2 + y^2 <= 1) successes <- successes + 1
  }
  4 * successes / n_sim
}
p_estimate <- monte_carlo(10000)
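
The function above estimates π as four times the share of random points in the unit square that fall inside the quarter circle. The same estimate can be written without a loop; this is a vectorized sketch of that idea, with `n_sim` and `pi_estimate` as illustrative names.

```r
set.seed(42)
n_sim <- 100000

# Draw all points at once instead of looping
x <- runif(n_sim)
y <- runif(n_sim)

# The mean of a logical vector is the proportion of TRUE values
pi_estimate <- 4 * mean(x^2 + y^2 <= 1)
pi_estimate
```

Vectorized code of this kind is usually much faster in R than an explicit loop over simulations.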

Prompt 167: Generate a random permutation

Shuffle the order of elements in a vector.

perm <- sample(1:10)

Prompt 168: Randomly split a dataset

Divide data into random subsets (e.g., training and testing).

set.seed(123)
indices <- sample(seq_len(nrow(data)))
n_train <- floor(0.8 * length(indices))
train_indices <- indices[seq_len(n_train)]
test_indices <- indices[-seq_len(n_train)]  # drop by position, not by row number
train_set <- data[train_indices, ]
test_set <- data[test_indices, ]
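
A split is only useful if the two subsets are disjoint and together cover every row. The following sketch checks both properties, using the built-in `mtcars` data (32 rows) as a stand-in for `data`; the variable names are illustrative.

```r
set.seed(123)
# Shuffle the row numbers, then take the first 80% for training
shuffled <- sample(seq_len(nrow(mtcars)))
n_train  <- floor(0.8 * length(shuffled))

train_rows <- shuffled[seq_len(n_train)]
test_rows  <- setdiff(shuffled, train_rows)

train_set <- mtcars[train_rows, ]
test_set  <- mtcars[test_rows, ]

# Sanity checks: no overlap, and every row is used exactly once
no_overlap <- length(intersect(train_rows, test_rows)) == 0
full_cover <- nrow(train_set) + nrow(test_set) == nrow(mtcars)
c(no_overlap = no_overlap, full_cover = full_cover)
```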

Prompt 169: Simulate a Poisson process

Generate arrival times from a Poisson process with rate λ.

lambda <- 2
n <- 100
interarrival <- rexp(n, rate = lambda)
arrival_times <- cumsum(interarrival)
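
Two properties make a quick check of such a simulation: the mean gap between arrivals should be close to 1/λ, and the number of arrivals per unit of time should average about λ. This is a sketch with a larger `n` than above so the averages are stable; `horizon` and `counts` are illustrative names.

```r
set.seed(1)
lambda <- 2
n <- 10000

interarrival  <- rexp(n, rate = lambda)
arrival_times <- cumsum(interarrival)

# Mean gap between arrivals; should be close to 1 / lambda
mean_gap <- mean(interarrival)

# Count arrivals in consecutive unit-length windows up to the last full window
horizon <- floor(max(arrival_times))
in_range <- arrival_times[arrival_times < horizon]
counts  <- tabulate(floor(in_range) + 1, nbins = horizon)

# Mean count per window; should be close to lambda
mean_count <- mean(counts)
c(mean_gap = mean_gap, mean_count = mean_count)
```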

Prompt 170: Simulate a Markov chain

Model transitions between states with a transition matrix.

states <- c('A','B','C')
transition <- matrix(c(0.5,0.3,0.2,
                      0.4,0.4,0.2,
                      0.2,0.5,0.3),
                    nrow = 3, byrow = TRUE, dimnames = list(states, states))
set.seed(123)
current <- 'A'
chain <- character(10)
chain[1] <- current
for (i in 2:10) {
  current <- sample(states, 1, prob = transition[current, ])
  chain[i] <- current
}
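
With a longer chain, the observed transition frequencies should approach the rows of the transition matrix. This is a rough sketch of that check, where the chain length `n_steps` and the name `empirical` are assumptions made for illustration.

```r
states <- c('A', 'B', 'C')
transition <- matrix(c(0.5, 0.3, 0.2,
                       0.4, 0.4, 0.2,
                       0.2, 0.5, 0.3),
                     nrow = 3, byrow = TRUE, dimnames = list(states, states))

set.seed(123)
n_steps <- 5000
chain <- character(n_steps)
chain[1] <- 'A'
for (i in 2:n_steps) {
  # The next state depends only on the current state's row
  chain[i] <- sample(states, 1, prob = transition[chain[i - 1], ])
}

# Cross-tabulate each state against the state that followed it
counts <- table(from = factor(head(chain, -1), levels = states),
                to = factor(tail(chain, -1), levels = states))

# Convert counts to row-wise proportions
empirical <- prop.table(counts, margin = 1)
```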

File and Directory Management

Prompt 171: List files in a directory

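Use `list.files()` to get a vector of file names. The `pattern` argument is a regular expression rather than a shell glob, so the `.csv` extension is matched with the R string `'\\.csv$'`.

files <- list.files(path = '.', pattern = '\\.csv$', full.names = TRUE)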

Prompt 172: Check if a file exists

Verify whether a file exists using `file.exists()`.

exists <- file.exists('myfile.csv')

Prompt 173: Read multiple files and combine

Use `lapply()` to read multiple files and `bind_rows()` to merge them.

library(readr)
file_list <- list.files(pattern = '\\.csv$')
all_data <- dplyr::bind_rows(lapply(file_list, read_csv))
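
When combining files it often helps to record which file each row came from. The sketch below is a base R variant that assumes every file has the same columns. The temporary folder, the two example files and the helper `read_one()` are all hypothetical.

```r
# Write two small example files to a temporary folder (illustrative data)
tmp <- file.path(tempdir(), 'csv_demo')
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(tmp, 'a.csv'), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(tmp, 'b.csv'), row.names = FALSE)

# The pattern is a regular expression: match names ending in .csv
file_list <- list.files(tmp, pattern = '\\.csv$', full.names = TRUE)

# Read one file and tag its rows with the file name
read_one <- function(path) {
  df <- read.csv(path)
  df$source_file <- basename(path)
  df
}

# Stack the per-file data frames row-wise
all_data <- do.call(rbind, lapply(file_list, read_one))
```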

Prompt 174: Create a new directory

Create directories with `dir.create()`.

dir.create('new_folder')

Prompt 175: Delete a file or directory

Remove files or directories using `file.remove()` and `unlink()`.

file.remove('old_file.csv')
unlink('old_folder', recursive = TRUE)

Prompt 176: Copy a file

Duplicate a file using `file.copy()`.

file.copy('source.txt', 'destination.txt')

Prompt 177: Move or rename a file

Change the name or location of a file using `file.rename()`.

file.rename('old_name.txt', 'new_name.txt')

Prompt 178: Write data to a compressed file

Compress data when writing to disk using `gzfile()`.

gz_con <- gzfile('data.csv.gz', 'w')
write.csv(data, gz_con)
close(gz_con)
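
A compressed file written this way can be read back without decompressing it by hand. The sketch below does a round trip, assuming a small made-up data frame `df` and a path in the temporary directory.

```r
df <- data.frame(id = 1:3, score = c(0.5, 1.5, 2.5))  # illustrative data
gz_path <- file.path(tempdir(), 'data.csv.gz')

# Write through a gzip connection
gz_con <- gzfile(gz_path, 'w')
write.csv(df, gz_con, row.names = FALSE)
close(gz_con)

# read.csv() opens the path with file(), which detects gzip compression
restored <- read.csv(gz_path)
```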

Prompt 179: Get file information

Retrieve metadata such as size and modification time using `file.info()`.

info <- file.info('data.csv')
size <- info$size
modified <- info$mtime

Prompt 180: Redirect output to a file

Capture printed output using `sink()` to write to a text file.

sink('log.txt')
print(summary(data))
sink()
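
If an error occurs between the two `sink()` calls, output stays redirected until `sink()` is called again. A possible alternative is `capture.output()`, which returns the printed text as a character vector and restores normal output on exit. The sketch below uses the built-in `mtcars` data and a temporary file path as stand-ins.

```r
log_path <- file.path(tempdir(), 'log.txt')

# capture.output() returns what would have been printed, one string per line
log_lines <- capture.output(print(summary(mtcars$mpg)))

# Write the captured lines to the log file
writeLines(log_lines, log_path)
```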

Web Data and API

Prompt 181: Download a file from a URL

Use `download.file()` to retrieve files from the web.

download.file('https://example.com/data.csv', destfile = 'downloaded.csv')

Prompt 182: Read an HTML table

Use **rvest** to scrape tables from web pages.

library(rvest)
page <- read_html('https://example.com')
table <- html_table(html_nodes(page, 'table')[[1]])

Prompt 183: Extract nodes with CSS selectors

Select HTML elements using CSS selectors with **rvest**.

titles <- html_text(html_nodes(page, 'h2.title'))

Prompt 184: Parse an XML file

Use **xml2** to read and parse XML data.

library(xml2)
xml <- read_xml('file.xml')
values <- xml_text(xml_find_all(xml, '//tag'))

Prompt 185: Fetch JSON data from an API

Use **httr** and **jsonlite** to request and parse JSON.

library(httr); library(jsonlite)
response <- GET('https://api.example.com/data')
data_json <- content(response, as = 'text', encoding = 'UTF-8')
data <- fromJSON(data_json)

Prompt 186: Send an HTTP GET request

Retrieve data using `GET()` with query parameters.

res <- GET('https://api.example.com/search', query = list(q = 'R programming'))
content(res, 'text')

Prompt 187: Convert API data to a data frame

Transform JSON or list structures into tidy data frames.

df <- as.data.frame(data)
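
`as.data.frame()` works directly only when the parsed result is already rectangular. When an API returns a list of records, each record can be converted to a one-row data frame and the rows stacked. This is a minimal sketch with a made-up `records` list standing in for a parsed response.

```r
# Hypothetical parsed API response: a list of records
records <- list(
  list(id = 1, name = 'alpha', score = 0.9),
  list(id = 2, name = 'beta', score = 0.7)
)

# Turn each record into a one-row data frame
rows <- lapply(records, function(r) as.data.frame(r, stringsAsFactors = FALSE))

# Stack the rows into a single data frame
df <- do.call(rbind, rows)
```

This assumes every record has the same fields; records with missing or nested fields would need extra handling first.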

Prompt 188: Scrape multiple pages

Loop through multiple pages to collect data in a single data set.

base_url <- 'https://example.com/page='
results <- list()
for (i in 1:5) {
  page <- read_html(paste0(base_url, i))
  results[[i]] <- html_text(html_nodes(page, '.item'))
}
items <- unlist(results)

Prompt 189: Extract meta information from a web page

Use **rvest** to extract metadata such as title and description.

meta_title <- html_text(html_nodes(page, 'title'))
meta_desc <- html_attr(html_nodes(page, "meta[name='description']"), 'content')

Prompt 190: Handle API authentication

Send authenticated requests using headers or tokens with **httr**.

token <- 'your_token_here'
res <- GET('https://api.example.com/protected', add_headers(Authorization = paste('Bearer', token)))
content(res, 'text')

Miscellaneous Tips and Tricks

Prompt 191: Install and load packages

Install packages with `install.packages()` and load them with `library()`.

install.packages('dplyr')
library(dplyr)

Prompt 192: Check installed packages

List packages currently installed on your system.

installed <- installed.packages()
head(installed[, 'Package'])

Prompt 193: Remove objects from workspace

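Free memory by removing objects with `rm()` and then calling `gc()` to trigger garbage collection. Note that `rm(list = ls())` removes every object in the current environment.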

rm(list = ls())
gc()

Prompt 194: Set working directory

Change the working directory using `setwd()` and verify with `getwd()`.

setwd('/path/to/project')
getwd()

Prompt 195: Use pipes (`%>%`)

Make code more readable by chaining operations with the `%>%` pipe operator, which comes from **magrittr** and is re-exported by **dplyr**.

library(dplyr)
result <- data %>% filter(value > 0) %>% summarise(mean = mean(value))

Prompt 196: Profile code performance

Measure execution time of expressions using `system.time()`.

execution <- system.time({
  Sys.sleep(1)
})

Prompt 197: Vectorize operations

Avoid explicit loops by performing vectorized arithmetic for speed.

v <- 1:100
squared <- v^2
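
To see the difference, the loop and vectorized versions can be timed side by side with `system.time()`. This is a sketch only: the vector length and the names `squared_loop`, `squared_vec`, `loop_time` and `vec_time` are chosen for illustration, and the timings depend on the machine.

```r
v <- 1:1e5

# Loop version: preallocate the result, then fill it element by element
squared_loop <- numeric(length(v))
loop_time <- system.time(
  for (i in seq_along(v)) squared_loop[i] <- v[i]^2
)

# Vectorized version: one expression operates on the whole vector
vec_time <- system.time(squared_vec <- v^2)

# Both approaches give the same values
all.equal(squared_loop, squared_vec)

loop_time['elapsed']
vec_time['elapsed']
```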

Prompt 198: Use parallel computing

Leverage multiple cores with the **parallel** package.

library(parallel)
# Leave one core free, but always start at least one worker
cl <- makeCluster(max(1, detectCores() - 1, na.rm = TRUE))
squares <- parSapply(cl, 1:4, function(x) x^2)
stopCluster(cl)
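
Worker processes start with an empty workspace, so variables defined in the main session must be sent to them explicitly. The sketch below assumes a fixed two-worker cluster and a hypothetical variable `offset`. It exports the variable with `clusterExport()` and stops the cluster even if the computation fails.

```r
library(parallel)

n_workers <- 2  # small fixed number, chosen for illustration
cl <- makeCluster(n_workers)

offset <- 10
# Copy 'offset' from this session to every worker
clusterExport(cl, 'offset', envir = environment())

# tryCatch(..., finally = ) ensures the workers are shut down
res <- tryCatch(
  parSapply(cl, 1:4, function(x) x^2 + offset),
  finally = stopCluster(cl)
)
```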

Prompt 199: Document functions with roxygen2

Use roxygen2 comments to document your functions for package development.

#' Calculate the square
#'
#' @param x A numeric value
#' @return The square of x
square <- function(x) { x^2 }

Prompt 200: Create reproducible analysis

Ensure reproducibility by setting seeds and recording session information.

set.seed(123)
# analysis code here
info <- sessionInfo()
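
As a quick illustration of what the seed does, resetting it to the same value before a random draw reproduces that draw exactly. The names `first_run`, `second_run` and `r_version` below are illustrative.

```r
set.seed(123)
first_run <- rnorm(5)

# Resetting the seed restarts the random number stream
set.seed(123)
second_run <- rnorm(5)

identical(first_run, second_run)

# The R version string is worth recording alongside the results
r_version <- R.version.string
```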
