Classification
Predict categories with logistic regression, SVM, KNN, decision trees, random forests, and gradient boosting. Master metrics like precision, recall, F1, and ROC-AUC.
Classification Algorithms
```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100),
    "Naive Bayes": GaussianNB(),
}

# Fit each model and report its test-set accuracy
for name, model in models.items():
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    print(f"{name}: {score:.4f}")
```
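The loop above assumes `X_train`, `X_test`, `y_train`, and `y_test` already exist from an earlier split. A minimal sketch that creates them from a synthetic dataset (the `make_classification` parameters here are illustrative, not part of the original example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data: 500 samples, 10 features
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out 20% for testing; stratify to preserve the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # → (400, 10) (100, 10)
```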
Classification Metrics
```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, classification_report,
                             confusion_matrix)

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print(f"Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"F1:        {f1_score(y_test, y_pred):.4f}")
print(f"ROC-AUC:   {roc_auc_score(y_test, y_proba):.4f}")
print(classification_report(y_test, y_pred))
```
Confusion Matrix
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
```
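The four cells of a binary confusion matrix are exactly what precision and recall are built from. A small hand-checkable sketch (the label arrays are made up for illustration):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# ravel() flattens the 2x2 matrix to (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of predicted positives, how many are correct
recall = tp / (tp + fn)     # of actual positives, how many were found
print(precision, recall)    # → 0.75 0.75
```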
| Metric | Best For | Description |
|---|---|---|
| Accuracy | Balanced classes | Fraction of correct predictions |
| Precision | Minimizing false positives | Of predicted positives, how many are correct? |
| Recall | Minimizing false negatives | Of actual positives, how many were found? |
| F1 Score | Imbalanced classes | Harmonic mean of precision and recall |
| ROC-AUC | Ranking quality | Area under the ROC curve (0.5 = random, 1.0 = perfect) |
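As the table notes, ROC-AUC measures ranking quality: it depends only on whether positive examples score higher than negative ones, not on any threshold. A toy illustration with made-up scores:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]

# Every positive scores above every negative: perfect ranking
perfect = roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9])

# Every positive scores below every negative: worst possible ranking
inverted = roc_auc_score(y_true, [0.9, 0.8, 0.2, 0.1])

print(perfect, inverted)  # → 1.0 0.0
```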
Multi-Class Classification
```python
# Multi-class metrics: per-class precision, recall, and F1
print(classification_report(y_test, y_pred, target_names=class_names))

# Multi-class ROC-AUC: one-vs-rest, weighted by class support
roc_auc_score(y_test, y_proba, multi_class="ovr", average="weighted")
```
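For the multi-class case, `y_proba` must be the full `predict_proba` matrix with one column per class, not a single column. A runnable sketch on the iris dataset (the classifier choice here is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, 3): one column per class

auc = roc_auc_score(y_test, proba, multi_class="ovr", average="weighted")
print(f"{auc:.3f}")
```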
Never rely on accuracy alone for imbalanced datasets. If 95% of emails are not spam, a model that always predicts "not spam" gets 95% accuracy but catches zero spam. Use precision, recall, F1, and ROC-AUC instead.
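The spam scenario above can be reproduced directly with scikit-learn's `DummyClassifier`, which always predicts the majority class (the 95/5 split and single dummy feature are illustrative):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# 95% "not spam" (0), 5% spam (1)
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))  # features are irrelevant to this baseline

majority = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = majority.predict(X)

print(accuracy_score(y, y_pred))  # → 0.95 — looks impressive
print(recall_score(y, y_pred))    # → 0.0 — catches zero spam
```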
Lilly Tech Systems