Advanced

caret & mlr3

Explore alternative ML frameworks in R: caret's unified interface and mlr3's modern object-oriented approach.

caret Package Overview

caret (Classification And REgression Training) was the dominant ML framework in R for over a decade. It provides a unified interface to 200+ models.

Note: caret is in maintenance mode. For new projects, tidymodels is recommended. However, you will encounter caret in existing codebases and tutorials, so it is worth understanding.

The train() Function

R
library(caret)

# Train a random forest with cross-validation
set.seed(42)
ctrl <- trainControl(method = "cv", number = 10)

model <- train(
  Species ~ .,
  data = iris,
  method = "rf",          # Random forest
  trControl = ctrl,
  tuneLength = 5          # Up to 5 candidate values for mtry (iris has 4 predictors, so fewer are tried)
)

print(model)
plot(model)

# Predictions on the training data (accuracy measured this way is optimistic)
preds <- predict(model, newdata = iris)
confusionMatrix(preds, iris$Species)
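Because the confusion matrix above is computed on the same rows the model was trained on, a held-out test set gives a more honest estimate. The following is a minimal sketch, assuming an 80/20 stratified split and swapping in method = "knn" (which needs no extra model package) in place of the random forest; the names train_set, test_set, and fit are illustrative.

```r
library(caret)

set.seed(42)
# Stratified 80/20 split on the outcome (row indices of the training set)
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# 5-fold CV on the training rows only, used to pick k
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(
  Species ~ .,
  data = train_set,
  method = "knn",       # assumption: knn instead of rf to avoid extra dependencies
  trControl = ctrl,
  tuneLength = 5
)

# Evaluate on rows the model has never seen
test_preds <- predict(fit, newdata = test_set)
cm <- confusionMatrix(test_preds, test_set$Species)
cm$overall["Accuracy"]
```

Cross-validation inside train() is used here only for tuning; the test set stays untouched until the final evaluation.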

Preprocessing with caret

R
# preProcess for data transformation
pp <- preProcess(iris[, 1:4], method = c("center", "scale"))
scaled <- predict(pp, iris[, 1:4])

# Or include preprocessing in train()
model <- train(
  Species ~ .,
  data = iris,
  method = "knn",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneGrid = data.frame(k = c(3, 5, 7, 9, 11))
)
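When preProcess() is used on its own, the centering and scaling parameters should be estimated from the training rows and then reused for new data. A short sketch of that pattern, assuming an 80/20 split (the variable names are illustrative):

```r
library(caret)

set.seed(42)
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_x <- iris[idx, 1:4]
test_x  <- iris[-idx, 1:4]

# Estimate means and standard deviations from the training rows only
pp <- preProcess(train_x, method = c("center", "scale"))

train_scaled <- predict(pp, train_x)
test_scaled  <- predict(pp, test_x)   # reuses the training means and SDs

round(colMeans(train_scaled), 10)     # zero by construction
round(colMeans(test_scaled), 2)       # near zero, but generally not exactly zero
```

Passing preProcess = c("center", "scale") to train() applies the same idea inside each resampling iteration, which is why that form is usually the safer choice.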

Comparing Models in caret

R
# Train multiple models; resetting the seed before each call gives every model the same CV folds
set.seed(42)
rf_model <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl)
set.seed(42)
svm_model <- train(Species ~ ., data = iris, method = "svmRadial", trControl = ctrl)
set.seed(42)
knn_model <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

# Compare
results <- resamples(list(RF = rf_model, SVM = svm_model, KNN = knn_model))
summary(results)
bwplot(results)
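Comparisons through resamples() are most meaningful when every model is evaluated on identical folds. One way to guarantee that is to build the folds once and pass them to trainControl() through its index argument. The sketch below assumes two lightweight models ("knn" and "rpart") rather than the three above, and the names folds, shared_ctrl, and paired are illustrative.

```r
library(caret)

set.seed(42)
# One set of 10 training-row index vectors, reused by every model
folds <- createFolds(iris$Species, k = 10, returnTrain = TRUE)
shared_ctrl <- trainControl(method = "cv", index = folds)

knn_fit  <- train(Species ~ ., data = iris, method = "knn",   trControl = shared_ctrl)
tree_fit <- train(Species ~ ., data = iris, method = "rpart", trControl = shared_ctrl)

paired <- resamples(list(KNN = knn_fit, Tree = tree_fit))

# Fold-by-fold differences between the two models, with paired t-tests
summary(diff(paired))
```

Since both models see the same folds, diff() compares them fold by fold instead of comparing two unrelated sets of scores.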

mlr3 Framework

mlr3 is a modern, object-oriented ML framework for R. It uses R6 classes and a task/learner/resampling paradigm.

R
library(mlr3)
library(mlr3learners)

# Define a task
task <- TaskClassif$new(id = "iris", backend = iris, target = "Species")

# Define a learner
learner <- lrn("classif.ranger", num.trees = 500)

# Define resampling
resampling <- rsmp("cv", folds = 10)

# Run the experiment
rr <- resample(task, learner, resampling)
rr$aggregate(msr("classif.acc"))
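The same task and learner objects can also be used for a single train/predict round, which makes the R6 style more concrete: methods such as $train() modify the learner in place. A minimal sketch, assuming the built-in tsk("iris") task and the classif.rpart learner (which ships with mlr3) instead of ranger:

```r
library(mlr3)

task <- tsk("iris")                    # predefined version of the iris task
learner <- lrn("classif.rpart")        # assumption: decision tree instead of ranger

set.seed(42)
split <- partition(task, ratio = 0.8)  # list with $train and $test row ids

# Train on the training rows; the fitted model is stored inside the learner object
learner$train(task, row_ids = split$train)

# Predict on the held-out rows and score the result
prediction <- learner$predict(task, row_ids = split$test)
acc <- prediction$score(msr("classif.acc"))
acc
prediction$confusion
```

Note that learner$train() returns the learner itself; nothing needs to be reassigned because R6 objects have reference semantics.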

mlr3 Pipelines

R
library(mlr3)
library(mlr3learners)   # provides classif.ranger
library(mlr3pipelines)

# Build a pipeline: scale -> PCA -> random forest
graph <- po("scale") %>>%
  po("pca") %>>%
  po("learner", lrn("classif.ranger"))

# Convert to a learner
graph_learner <- as_learner(graph)
graph_learner$train(task)
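Because a graph learner behaves like any other learner, it can be resampled or benchmarked against a plain model. The sketch below compares a scale -> PCA -> tree pipeline with a bare decision tree using benchmark(); it assumes classif.rpart in place of ranger and 5-fold cross-validation, and the names pipe_learner, design, and bmr are illustrative.

```r
library(mlr3)
library(mlr3pipelines)

task <- tsk("iris")

# Pipeline wrapped as a single learner: scale -> PCA -> decision tree
pipe_learner <- as_learner(
  po("scale") %>>% po("pca") %>>% po("learner", lrn("classif.rpart"))
)

set.seed(42)
# Every learner is evaluated on the same instantiated CV folds
design <- benchmark_grid(
  tasks = task,
  learners = list(lrn("classif.rpart"), pipe_learner),
  resamplings = rsmp("cv", folds = 5)
)

bmr <- benchmark(design)
scores <- bmr$aggregate(msr("classif.acc"))
scores[, c("learner_id", "classif.acc")]
```

Scaling and PCA are refit inside each fold here, so the preprocessing never sees the held-out rows.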

When to Use Which Framework

| Framework | Best For | Strengths |
| --- | --- | --- |
| tidymodels | New projects, tidyverse users | Tidy syntax, active development, great documentation |
| caret | Quick prototyping, legacy code | 200+ models, simple API, one function does it all |
| mlr3 | Complex experiments, benchmarking | Flexible pipelines, R6 classes, extensible |

Recommendation: Start with tidymodels for new projects. It has the best documentation, integrates with the tidyverse, and is under active development by the Posit team. Learn caret and mlr3 when you encounter them in existing codebases.