Classification
Predict categories with logistic regression, SVM, KNN, decision trees, random forests, and gradient boosting. Master metrics like precision, recall, F1, and ROC-AUC.
Classification Algorithms
```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100),
    "Naive Bayes": GaussianNB(),
}

# Fit each model and report its test-set accuracy
for name, model in models.items():
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    print(f"{name}: {score:.4f}")
```
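The loop above assumes `X_train`, `X_test`, `y_train`, and `y_test` already exist from an earlier split. A minimal sketch that creates them from a synthetic dataset (the `make_classification` parameters here are illustrative, not part of the original example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data: 500 samples, 10 features
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out 20% for testing; stratify to preserve the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # → (400, 10) (100, 10)
```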
Classification Metrics
```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, classification_report,
                             confusion_matrix)

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print(f"Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"F1:        {f1_score(y_test, y_pred):.4f}")
print(f"ROC-AUC:   {roc_auc_score(y_test, y_proba):.4f}")
print(classification_report(y_test, y_pred))
```
Confusion Matrix
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
```
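The four cells of a binary confusion matrix are exactly what precision and recall are built from. A small hand-checkable sketch (the label arrays are made up for illustration):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# ravel() flattens the 2x2 matrix to (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of predicted positives, how many are correct
recall = tp / (tp + fn)     # of actual positives, how many were found
print(precision, recall)    # → 0.75 0.75
```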
| Metric | Best For | Description |
|---|---|---|
| Accuracy | Balanced classes | Fraction of correct predictions |
| Precision | Minimizing false positives | Of predicted positives, how many are correct? |
| Recall | Minimizing false negatives | Of actual positives, how many were found? |
| F1 Score | Imbalanced classes | Harmonic mean of precision and recall |
| ROC-AUC | Ranking quality | Area under the ROC curve (0.5 = random, 1.0 = perfect) |
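As the table notes, ROC-AUC measures ranking quality: it depends only on whether positive examples score higher than negative ones, not on any threshold. A toy illustration with made-up scores:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]

# Every positive scores above every negative: perfect ranking
perfect = roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9])

# Every positive scores below every negative: worst possible ranking
inverted = roc_auc_score(y_true, [0.9, 0.8, 0.2, 0.1])

print(perfect, inverted)  # → 1.0 0.0
```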
Multi-Class Classification
```python
# Multi-class metrics: per-class precision, recall, and F1
print(classification_report(y_test, y_pred, target_names=class_names))

# Multi-class ROC-AUC: one-vs-rest, weighted by class support
roc_auc_score(y_test, y_proba, multi_class="ovr", average="weighted")
```
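For the multi-class case, `y_proba` must be the full `predict_proba` matrix with one column per class, not a single column. A runnable sketch on the iris dataset (the classifier choice here is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, 3): one column per class

auc = roc_auc_score(y_test, proba, multi_class="ovr", average="weighted")
print(f"{auc:.3f}")
```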
Never rely on accuracy alone for imbalanced datasets. If 95% of emails are not spam, a model that always predicts "not spam" gets 95% accuracy but catches zero spam. Use precision, recall, F1, and ROC-AUC instead.
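The spam scenario above can be reproduced directly with scikit-learn's `DummyClassifier`, which always predicts the majority class (the 95/5 split and single dummy feature are illustrative):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# 95% "not spam" (0), 5% spam (1)
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))  # features are irrelevant to this baseline

majority = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = majority.predict(X)

print(accuracy_score(y, y_pred))  # → 0.95 — looks impressive
print(recall_score(y, y_pred))    # → 0.0 — catches zero spam
```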
Lilly Tech Systems