Intermediate

Classification

Predict categories with logistic regression, SVM, KNN, decision trees, random forests, and gradient boosting. Master metrics like precision, recall, F1, and ROC-AUC.

Classification Algorithms

Python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100),
    "Naive Bayes": GaussianNB()
}

for name, model in models.items():
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    print(f"{name}: {score:.4f}")
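The loop above assumes `X_train`, `X_test`, `y_train`, and `y_test` already exist. One way to produce them, sketched here with synthetic data (the `make_classification` dataset is a hypothetical stand-in for your own):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary dataset for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Stratify so train and test keep the same class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```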

Classification Metrics

Python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                               f1_score, roc_auc_score, classification_report,
                               confusion_matrix)

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print(f"Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"F1:        {f1_score(y_test, y_pred):.4f}")
print(f"ROC-AUC:   {roc_auc_score(y_test, y_proba):.4f}")
print(classification_report(y_test, y_pred))
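Because `predict_proba` returns scores rather than hard labels, you can also trade precision against recall by moving the decision threshold away from the default 0.5. A self-contained sketch (synthetic imbalanced data, random forest chosen arbitrarily):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # positive-class probabilities

for threshold in (0.3, 0.5, 0.7):
    pred = (proba >= threshold).astype(int)
    print(f"t={threshold}: precision={precision_score(y_test, pred):.2f} "
          f"recall={recall_score(y_test, pred):.2f}")
```

Lowering the threshold flags more examples as positive, which can only raise recall (at the likely cost of precision); raising it does the reverse.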

Confusion Matrix

Python
import seaborn as sns
import matplotlib.pyplot as plt

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
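When class counts differ, a row-normalized matrix (each row of actual labels sums to 1) is often easier to read than raw counts. A minimal sketch with hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 1, 0]

# normalize="true" divides each row by that class's total,
# so cell (i, j) reads as "fraction of class i predicted as j"
cm = confusion_matrix(y_true, y_pred, normalize="true")
print(cm)
```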
| Metric    | Best For                   | Description                                            |
|-----------|----------------------------|--------------------------------------------------------|
| Accuracy  | Balanced classes           | Fraction of correct predictions                        |
| Precision | Minimizing false positives | Of predicted positives, how many are correct?          |
| Recall    | Minimizing false negatives | Of actual positives, how many were found?              |
| F1 Score  | Imbalanced classes         | Harmonic mean of precision and recall                  |
| ROC-AUC   | Ranking quality            | Area under the ROC curve (0.5 = random, 1.0 = perfect) |
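To make the table concrete, here is the arithmetic on a small confusion matrix with hypothetical counts:

```python
# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 130 true negatives
TP, FP, FN, TN = 40, 10, 20, 130

accuracy  = (TP + TN) / (TP + FP + FN + TN)  # 170/200 = 0.85
precision = TP / (TP + FP)                   # 40/50  = 0.80
recall    = TP / (TP + FN)                   # 40/60  ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 3))
```

Note how the high accuracy (0.85) hides the weaker recall (0.667): a third of actual positives are missed.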

Multi-Class Classification

Python
# Multi-class metrics
print(classification_report(y_test, y_pred, target_names=class_names))

# Multi-class ROC-AUC needs the full (n_samples, n_classes) probability
# matrix, not just one column as in the binary case
y_proba = model.predict_proba(X_test)
roc_auc_score(y_test, y_proba, multi_class="ovr", average="weighted")
Never rely on accuracy alone for imbalanced datasets. If 95% of emails are not spam, a model that always predicts "not spam" gets 95% accuracy but catches zero spam. Use precision, recall, F1, and ROC-AUC instead.
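The spam example can be reproduced with scikit-learn's `DummyClassifier` (a sketch with synthetic 95/5 labels; the features are deliberately irrelevant):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # features the model never uses
y = np.array([0] * 950 + [1] * 50)      # 95% "not spam", 5% "spam"

# Always predicts the majority class
dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = dummy.predict(X)

print(f"Accuracy: {accuracy_score(y, pred):.2f}")  # 0.95
print(f"Recall:   {recall_score(y, pred):.2f}")    # 0.00 -- catches no spam
```

95% accuracy, zero spam caught: exactly the trap the paragraph above describes.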