Beginner

Logistic Regression Deep Dive

Despite its name, Logistic Regression is a classification algorithm — the go-to method for predicting categories like spam/not-spam, sick/healthy, or buy/don't-buy.

Why Linear Regression Fails for Classification

If you try to use Linear Regression for a binary classification problem (predicting 0 or 1), you run into several issues:

  • Unbounded output: Linear Regression can predict values like -3.5 or 7.2, but probabilities must be between 0 and 1.
  • Sensitive to outliers: A single extreme data point can shift the decision boundary dramatically.
  • No probability interpretation: Predictions don't represent meaningful probabilities.
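A quick numpy sketch of the first failure mode, using made-up toy data: fitting ordinary least squares to 0/1 labels produces "probabilities" outside [0, 1].

```python
import numpy as np

# Toy binary labels with one large feature value (made-up data)
x = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
y = np.array([0, 0, 1, 1, 1])

# Ordinary least-squares fit: y_hat = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)

pred_at_10 = slope * 10 + intercept
print(f"Prediction at x=10: {pred_at_10:.3f}")  # > 1 -- not a valid probability
```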
💡 The solution: wrap the linear model's output in a sigmoid function that squashes any real number into the range (0, 1). This gives us a valid probability that we can threshold to make classification decisions.

The Sigmoid Function

The sigmoid (logistic) function is the heart of Logistic Regression:

sigmoid(z) = 1 / (1 + e^(-z))

Where z = W^T * X + b  (the linear combination, same as Linear Regression)

Properties:
  - Output is always between 0 and 1
  - sigmoid(0) = 0.5 (the decision boundary)
  - sigmoid(large positive) approaches 1
  - sigmoid(large negative) approaches 0
  - S-shaped curve, smooth and differentiable

Interpretation:
  P(y=1|X) = sigmoid(W^T * X + b)
  P(y=0|X) = 1 - P(y=1|X)
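The properties above can be verified directly. A minimal numpy sketch:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- the decision boundary
print(sigmoid(10))   # ~0.99995, approaches 1
print(sigmoid(-10))  # ~0.00005, approaches 0

# Complementary probabilities: sigmoid(-z) equals 1 - sigmoid(z)
z = 1.7
print(sigmoid(z) + sigmoid(-z))  # ~1.0
```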

Log-Odds and Decision Boundary

The log-odds (logit) gives us a linear model inside the sigmoid:

log(p / (1-p)) = W^T * X + b

Where:
  p = probability of class 1
  p/(1-p) = odds
  log(p/(1-p)) = log-odds (logit)

Decision Boundary:
  - Predict class 1 if P(y=1|X) >= 0.5  (i.e., z >= 0)
  - Predict class 0 if P(y=1|X) < 0.5   (i.e., z < 0)
  - The boundary is where W^T * X + b = 0
  - This forms a linear boundary (line in 2D, plane in 3D)
  - You can adjust the threshold (e.g., 0.3) for imbalanced classes
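Thresholding in code, with illustrative made-up probabilities:

```python
import numpy as np

probs = np.array([0.15, 0.42, 0.55, 0.91])  # hypothetical P(y=1|X) values

# Default 0.5 threshold: equivalent to checking z >= 0
print((probs >= 0.5).astype(int))  # [0 0 1 1]

# Lower threshold, e.g. for an imbalanced, recall-sensitive problem
print((probs >= 0.3).astype(int))  # [0 1 1 1]
```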

Cost Function: Binary Cross-Entropy

MSE is a poor fit for Logistic Regression: composing it with the sigmoid yields a non-convex cost surface (with flat plateaus and potential local minima), so gradient descent has no guarantee of reaching the global optimum. Instead, we use Binary Cross-Entropy (Log Loss):

Cost = -(1/n) * SUM[y_i * log(p_i) + (1-y_i) * log(1-p_i)]

Intuition:
  When y=1: Cost = -log(p)    -> high cost if p is near 0
  When y=0: Cost = -log(1-p)  -> high cost if p is near 1

Properties:
  - Convex: guaranteed to find global minimum
  - Penalizes confident wrong predictions heavily
  - Equivalent to maximizing likelihood (MLE)
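The intuition above can be checked with a numpy sketch on made-up predictions: confident wrong answers cost far more than confident right ones.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Log loss; eps guards against log(0)."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])

good = np.array([0.9, 0.1, 0.8, 0.2])  # confident and correct
bad  = np.array([0.1, 0.9, 0.2, 0.8])  # confident and wrong

print(binary_cross_entropy(y, good))  # small (~0.16)
print(binary_cross_entropy(y, bad))   # large (~1.96)
```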

Multi-Class Classification

Logistic Regression extends beyond binary classification using two strategies:

Softmax (Multinomial)

P(y=k|X) = e^(z_k) / SUM(e^(z_j)) for all classes j

# Each class gets its own weight vector
# Outputs sum to 1 (valid probability distribution)
# Used when classes are mutually exclusive
# sklearn: LogisticRegression(multi_class='multinomial')
#   (note: multi_class is deprecated in scikit-learn >= 1.5, where
#    multinomial is the default behavior for multi-class problems)
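Softmax itself is a few lines of numpy; the class scores z_k here are made up for illustration:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # one score z_k per class
probs = softmax(scores)

print(probs)        # largest score -> largest probability
print(probs.sum())  # ~1.0 (valid probability distribution)
```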

One-vs-Rest (OvR)

# Train K separate binary classifiers:
#   Classifier 1: class 1 vs. all others
#   Classifier 2: class 2 vs. all others
#   ...
#   Classifier K: class K vs. all others
# Predict: class with highest probability

# sklearn: LogisticRegression(multi_class='ovr')
#   (deprecated in scikit-learn >= 1.5; wrap the estimator with
#    sklearn.multiclass.OneVsRestClassifier instead)

Regularization

Just like Linear Regression, Logistic Regression benefits from regularization to prevent overfitting:

  • L2 (Ridge): Default in sklearn. Shrinks coefficients, keeps all features. penalty='l2'
  • L1 (Lasso): Produces sparse models, automatic feature selection. penalty='l1', solver='liblinear'
  • ElasticNet: Combines L1 and L2. penalty='elasticnet', solver='saga', l1_ratio=0.5
  • C parameter: Inverse of regularization strength. Smaller C = stronger regularization. Default: C=1.0
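The effect of C can be seen directly. This is a minimal gradient-descent sketch of L2-penalized logistic regression on made-up synthetic data (no intercept, for brevity; sklearn's solvers differ, but the shrinkage behavior is the same):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([2.0, -1.0, 0.5]) > 0).astype(float)  # synthetic labels

def fit_logreg(X, y, C, lr=0.1, steps=2000):
    """Gradient descent on BCE plus an L2 penalty scaled by 1/C."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # sigmoid predictions
        grad = X.T @ (p - y) / n + w / (C * n)  # BCE gradient + L2 term
        w -= lr * grad
    return w

w_weak = fit_logreg(X, y, C=100.0)   # weak regularization
w_strong = fit_logreg(X, y, C=0.01)  # strong regularization

# Smaller C -> stronger penalty -> smaller coefficients
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```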

Evaluation Metrics for Classification

Metric     | Formula                         | Use When
-----------|---------------------------------|---------------------------------------------------
Accuracy   | (TP+TN) / Total                 | Balanced classes; quick overview
Precision  | TP / (TP+FP)                    | Cost of false positives is high (spam filter)
Recall     | TP / (TP+FN)                    | Cost of false negatives is high (disease detection)
F1 Score   | 2 * (Prec * Rec) / (Prec + Rec) | Need balance between precision and recall
ROC-AUC    | Area under ROC curve            | Overall ranking ability; threshold-independent

Confusion Matrix Quick Guide:
TP (True Positive): Predicted positive, actually positive
TN (True Negative): Predicted negative, actually negative
FP (False Positive): Predicted positive, actually negative (Type I error)
FN (False Negative): Predicted negative, actually positive (Type II error)
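All four threshold-based metrics follow directly from those counts. A sketch with made-up counts:

```python
# Hypothetical confusion-matrix counts
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)

print(accuracy)   # 0.85
print(precision)  # ~0.889
print(recall)     # 0.8
print(f1)         # ~0.842
```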

Python Implementation

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score, confusion_matrix,
    classification_report, roc_curve
)
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Load real dataset: Breast Cancer Wisconsin
data = load_breast_cancer()
X, y = data.data, data.target
print(f"Dataset: {X.shape[0]} samples, {X.shape[1]} features")
print(f"Classes: {data.target_names}")  # ['malignant', 'benign']
print(f"Class distribution: {np.bincount(y)}")

# Preprocessing: scale features (important for Logistic Regression)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y
)

# Train Logistic Regression
model = LogisticRegression(C=1.0, penalty='l2', max_iter=1000)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1

# --- Evaluation Metrics ---
print(f"\n--- Classification Report ---")
print(classification_report(y_test, y_pred, target_names=data.target_names))

print(f"Accuracy:   {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision:  {precision_score(y_test, y_pred):.4f}")
print(f"Recall:     {recall_score(y_test, y_pred):.4f}")
print(f"F1 Score:   {f1_score(y_test, y_pred):.4f}")
print(f"ROC-AUC:    {roc_auc_score(y_test, y_prob):.4f}")

# --- Confusion Matrix ---
cm = confusion_matrix(y_test, y_pred)
print(f"\nConfusion Matrix:")
print(f"  TN={cm[0][0]}  FP={cm[0][1]}")
print(f"  FN={cm[1][0]}  TP={cm[1][1]}")

# --- ROC Curve Visualization ---
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, 'b-', linewidth=2,
         label=f'Logistic Regression (AUC = {roc_auc_score(y_test, y_prob):.3f})')
plt.plot([0, 1], [0, 1], 'r--', label='Random Classifier')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve - Breast Cancer Classification')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# --- Top Features ---
importance = np.abs(model.coef_[0])
top_features = np.argsort(importance)[::-1][:10]
print(f"\nTop 10 Most Important Features:")
for i, idx in enumerate(top_features):
    print(f"  {i+1}. {data.feature_names[idx]:<25} coef={model.coef_[0][idx]:+.4f}")

💡 Don't forget to scale! Logistic Regression with regularization is sensitive to feature scales. Always standardize your features (zero mean, unit variance) using StandardScaler. Without scaling, features with large ranges will dominate both the gradient updates and the regularization penalty.
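What StandardScaler computes, by hand (numpy sketch with made-up data):

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])  # two features on very different scales

# Per-column standardization: the same transform StandardScaler fits
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # [1, 1]
```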