Intermediate

Regression

Predict continuous values with linear models, decision trees, random forests, and gradient boosting. Learn evaluation metrics and feature importance.

Linear Regression

Python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Ridge Regression (L2 regularization)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Lasso Regression (L1 regularization - feature selection)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
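The practical difference between the two penalties: L1 can drive uninformative coefficients exactly to zero, while L2 only shrinks them. A minimal sketch on synthetic data (the data and alpha values here are illustrative, not a recommendation):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first 3 of 10 features actually matter
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso zeroes out the noise features; Ridge keeps all 10 (just smaller)
print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
print("Ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)))
```

This is why Lasso doubles as a rough feature-selection step: the surviving non-zero coefficients point at the features worth keeping.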

Polynomial Regression

Python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

poly_model = Pipeline([
    ("poly", PolynomialFeatures(degree=3)),
    ("linear", LinearRegression())
])
poly_model.fit(X_train, y_train)
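A quick sanity check that the pipeline actually captures non-linear structure: on synthetic cubic data (illustrative, generated here), a degree-3 pipeline should fit almost perfectly where a plain line cannot.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

# Illustrative data with a cubic relationship plus small noise
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.1, size=300)

poly_model = Pipeline([
    ("poly", PolynomialFeatures(degree=3)),
    ("linear", LinearRegression()),
])
poly_model.fit(X, y)
print("R^2 on training data:", poly_model.score(X, y))
```

For higher degrees, feature magnitudes explode quickly; adding a scaling step or swapping LinearRegression for Ridge in the pipeline helps keep the fit stable.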

Tree-Based Regression

Python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Decision Tree
dt = DecisionTreeRegressor(max_depth=5, random_state=42)
dt.fit(X_train, y_train)

# Random Forest
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
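One property worth keeping in mind when tuning max_depth: a regression tree predicts a piecewise-constant function, so a depth-d tree can output at most 2**d distinct values. A small sketch on illustrative data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative data: a smooth target the tree approximates with steps
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

dt = DecisionTreeRegressor(max_depth=5, random_state=42).fit(X, y)
preds = dt.predict(X)

# A depth-5 tree has at most 2**5 = 32 leaves, hence at most 32 distinct outputs
print("Distinct predicted values:", len(np.unique(preds)))
```

A random forest averages many such step functions, which is why its predictions look smoother than a single tree's.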

Gradient Boosting

Python
import xgboost as xgb
import lightgbm as lgb

# XGBoost
xgb_model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.1,
                              max_depth=6, random_state=42)
xgb_model.fit(X_train, y_train)

# LightGBM
lgb_model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.1,
                               random_state=42)
lgb_model.fit(X_train, y_train)

Regression Metrics

Python
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)        # Mean Squared Error
rmse = mean_squared_error(y_test, y_pred) ** 0.5  # RMSE (squared=False was removed in newer scikit-learn)
mae = mean_absolute_error(y_test, y_pred)        # Mean Absolute Error
r2 = r2_score(y_test, y_pred)                    # R² Score

print(f"RMSE: {rmse:.4f}, MAE: {mae:.4f}, R²: {r2:.4f}")
Metric | Formula                 | Interpretation
MSE    | Mean of squared errors  | Penalizes large errors heavily
RMSE   | Square root of MSE      | Same units as target variable
MAE    | Mean of absolute errors | Robust to outliers
R²     | 1 - SS_res/SS_tot       | Proportion of variance explained (1.0 = perfect)
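A single train/test split gives a noisy estimate of any of these metrics. Cross-validation averages the metric over several splits; scikit-learn exposes RMSE as the scorer "neg_root_mean_squared_error" (negated because its scorers follow a higher-is-better convention). A sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Illustrative data with noise standard deviation 10
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Negate the scores to get RMSE per fold
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_root_mean_squared_error", cv=5)
rmse_per_fold = -scores
print(f"CV RMSE: {rmse_per_fold.mean():.2f} +/- {rmse_per_fold.std():.2f}")
```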

Feature Importance

Python
import pandas as pd
import matplotlib.pyplot as plt

# Tree-based feature importance (feature_names = your dataset's column names)
importance = pd.Series(rf.feature_importances_, index=feature_names)
importance.nlargest(10).plot(kind="barh")
plt.title("Top 10 Feature Importances")
plt.show()
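Impurity-based importances (feature_importances_) are computed on training data and can be biased toward high-cardinality features. A model-agnostic alternative is permutation importance: shuffle one feature at a time on held-out data and measure how much the score drops. A sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative data: 6 features, only 3 carry signal
X, y = make_regression(n_samples=400, n_features=6, n_informative=3,
                       random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shuffle each feature on the test set; the score drop is its importance
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
print("Mean importance per feature:", result.importances_mean.round(3))
```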

Start simple. Always begin with a basic linear regression as your baseline, and move to complex models (random forest, XGBoost) only when the simple model falls short: complex models are harder to interpret and debug.
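"Sufficient" is easiest to judge against a trivial baseline. scikit-learn's DummyRegressor always predicts the training mean, which pins R² near zero on held-out data; anything you train should clear it comfortably. A sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Illustrative data
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: always predict the training-set mean
dummy = DummyRegressor(strategy="mean").fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

print(f"Baseline R^2: {dummy.score(X_test, y_test):.3f}, "
      f"Linear R^2: {linear.score(X_test, y_test):.3f}")
```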