Intermediate
Regression
Predict continuous values with linear models, decision trees, random forests, and gradient boosting. Learn evaluation metrics and feature importance.
Linear Regression
Python
```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Ridge Regression (L2 regularization)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Lasso Regression (L1 regularization - feature selection)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
```
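The comment above notes that L1 regularization performs feature selection. A minimal sketch on synthetic data (scikit-learn's `make_regression`, used here purely for illustration) shows Lasso driving the coefficients of uninformative features exactly to zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which carry signal
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

lasso = Lasso(alpha=5.0)
lasso.fit(X, y)

n_zero = int(np.sum(lasso.coef_ == 0))
print(f"Coefficients driven exactly to zero: {n_zero} of {lasso.coef_.size}")
```

Larger `alpha` values zero out more coefficients; Ridge, by contrast, shrinks coefficients toward zero but almost never makes them exactly zero.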
Polynomial Regression
Python
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

poly_model = Pipeline([
    ("poly", PolynomialFeatures(degree=3)),
    ("linear", LinearRegression())
])
poly_model.fit(X_train, y_train)
```
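The degree is a hyperparameter: too low underfits, too high overfits. One common way to pick it is cross-validation over a few candidate degrees. A sketch on synthetic data with a known cubic ground truth (illustrative assumption, not from the original):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(0, 2, size=200)  # cubic + noise

# Mean cross-validated R² for each candidate degree
scores = {}
for degree in (1, 2, 3, 5):
    model = Pipeline([
        ("poly", PolynomialFeatures(degree=degree)),
        ("linear", LinearRegression()),
    ])
    scores[degree] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

best = max(scores, key=scores.get)
print(f"Best degree by cross-validation: {best}")
```

A degree-1 model cannot capture the cubic term, so its CV score lags the higher-degree fits; degree 5 can match degree 3 on this data but gains nothing for the extra parameters.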
Tree-Based Regression
Python
```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Decision Tree
dt = DecisionTreeRegressor(max_depth=5, random_state=42)
dt.fit(X_train, y_train)

# Random Forest
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
```
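A random forest averages many trees trained on bootstrap samples, which reduces the variance of a single tree. A quick cross-validated comparison on synthetic data (illustrative setup, not from the original) makes the difference visible:

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# Mean 5-fold R² for a single tree vs. an ensemble of 100 trees
dt_score = cross_val_score(
    DecisionTreeRegressor(max_depth=5, random_state=42),
    X, y, cv=5, scoring="r2").mean()
rf_score = cross_val_score(
    RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42),
    X, y, cv=5, scoring="r2").mean()

print(f"Decision tree R²: {dt_score:.3f}")
print(f"Random forest R²: {rf_score:.3f}")
```

On most datasets the ensemble scores noticeably higher, at the cost of longer training and reduced interpretability.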
Gradient Boosting
Python
```python
import xgboost as xgb
import lightgbm as lgb

# XGBoost
xgb_model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.1,
                             max_depth=6, random_state=42)
xgb_model.fit(X_train, y_train)

# LightGBM
lgb_model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.1,
                              random_state=42)
lgb_model.fit(X_train, y_train)
```
Regression Metrics
Python
```python
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)   # Mean Squared Error
rmse = mse ** 0.5                          # RMSE (the squared=False argument was removed in scikit-learn 1.6)
mae = mean_absolute_error(y_test, y_pred)  # Mean Absolute Error
r2 = r2_score(y_test, y_pred)              # R² Score
print(f"RMSE: {rmse:.4f}, MAE: {mae:.4f}, R²: {r2:.4f}")
```
| Metric | Formula | Interpretation |
|---|---|---|
| MSE | Mean of squared errors | Penalizes large errors heavily |
| RMSE | Square root of MSE | Same units as target variable |
| MAE | Mean of absolute errors | Robust to outliers |
| R² | 1 - SS_res/SS_tot | Proportion of variance explained (1.0 = perfect) |
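The table's formulas can be verified by hand against scikit-learn on a tiny made-up example (the numbers below are illustrative only):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.0, 9.5])

mse = np.mean((y_true - y_hat) ** 2)        # mean of squared errors
rmse = np.sqrt(mse)                         # same units as the target
mae = np.mean(np.abs(y_true - y_hat))       # mean of absolute errors
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                    # proportion of variance explained

print(f"MSE={mse}, MAE={mae}, R²={r2}")     # MSE=0.4375, MAE=0.625, R²=0.9125
```

These match `mean_squared_error`, `mean_absolute_error`, and `r2_score` exactly; note that R² can be negative on test data when the model predicts worse than the mean of `y_test`.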
Feature Importance
Python
```python
import pandas as pd
import matplotlib.pyplot as plt

# Tree-based feature importance
importance = pd.Series(rf.feature_importances_, index=feature_names)
importance.nlargest(10).plot(kind="barh")
plt.title("Top 10 Feature Importances")
plt.show()
```
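Impurity-based `feature_importances_` can be biased toward high-cardinality features. A model-agnostic complement is permutation importance, which measures how much the test score drops when one feature's values are shuffled. A sketch on synthetic data (illustrative, not from the original):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Shuffle each feature 10 times on held-out data and average the score drop
result = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                random_state=42)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```

Because it is computed on held-out data, permutation importance reflects what the model actually uses for generalization, not just how often a feature was split on.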
Start simple. Always begin with a basic linear regression as your baseline, and move to complex models (random forest, XGBoost) only if the baseline is not accurate enough. Complex models are harder to interpret and debug.
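An even cheaper baseline than linear regression is scikit-learn's `DummyRegressor`, which ignores the features entirely; any real model should beat it. A minimal sketch on synthetic data (illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=200, n_features=5, noise=15.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Always predicts the mean of y_train, no matter the input
dummy = DummyRegressor(strategy="mean").fit(X_train, y_train)
lr = LinearRegression().fit(X_train, y_train)

dummy_r2 = r2_score(y_test, dummy.predict(X_test))
lr_r2 = r2_score(y_test, lr.predict(X_test))
print(f"Dummy R²: {dummy_r2:.3f}, Linear R²: {lr_r2:.3f}")
```

If a complex model barely beats the dummy, the features probably carry little signal, and reaching for XGBoost will not fix that.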
Lilly Tech Systems