Other Algorithms & Master Comparison
Anomaly detection, sequence labeling, association rules, and more
This final section covers important algorithms that do not fit neatly into the previous categories: probabilistic graphical models, anomaly detection methods, association rule mining, and self-organizing maps. The section concludes with the master comparison table of all 100+ algorithms.
1. Hidden Markov Model (HMM)
Description: A probabilistic model for sequential data with hidden (latent) states. Assumes the system transitions between hidden states according to a Markov chain (transition probabilities), and each state emits an observable output (emission probabilities). Three key problems: evaluation (forward algorithm), decoding (Viterbi), and learning (Baum-Welch / EM).
Use Cases: Speech recognition, part-of-speech tagging, gene sequence analysis, handwriting recognition, financial regime detection.
from hmmlearn.hmm import GaussianHMM
import numpy as np
# Generate sequential data
np.random.seed(42)
n_samples = 300
# Simulate two hidden states (e.g., bull/bear market)
state_seq = np.random.choice([0, 1], size=n_samples, p=[0.6, 0.4])
observations = np.where(
    state_seq == 0,
    np.random.normal(0.05, 0.02, n_samples),
    np.random.normal(-0.03, 0.04, n_samples)
).reshape(-1, 1)
model = GaussianHMM(
    n_components=2,        # Number of hidden states
    covariance_type='full',
    n_iter=100,
    random_state=42
)
model.fit(observations)
# Decode: find most likely hidden state sequence
hidden_states = model.predict(observations)
print(f"Transition matrix:\n{model.transmat_.round(3)}")
print(f"Means: {model.means_.flatten().round(4)}")
print(f"Score (log-likelihood): {model.score(observations):.2f}")
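The evaluation problem mentioned above (computing the likelihood of an observation sequence) can be illustrated without a library. The sketch below implements the forward algorithm for a toy two-state HMM with discrete emissions; the transition, emission, and initial probabilities are made-up numbers for illustration, not fitted values:

```python
import numpy as np

# Toy 2-state HMM with discrete observations (hypothetical parameters)
A = np.array([[0.7, 0.3],    # transition probabilities between hidden states
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # emission probabilities: P(observation | state)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

def forward(obs):
    """Evaluation problem: P(observation sequence) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]            # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate through A, weight by emission
    return alpha.sum()                   # marginalize over the final hidden state

print(f"P(obs=[0, 1, 0]) = {forward([0, 1, 0]):.4f}")
```

The same quantity could be obtained by summing over all 2^T hidden state paths; the forward recursion reduces that exponential sum to O(T * n_states^2).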
2. Conditional Random Field (CRF)
Description: A discriminative probabilistic model for labeling sequential data. Unlike HMMs (generative), CRFs directly model the conditional probability P(labels | observations), which lets them use arbitrary, overlapping features of the input. Because CRFs normalize globally over the entire label sequence rather than per step, they avoid the label bias problem of locally normalized models such as MEMMs.
Use Cases: Named entity recognition (NER), POS tagging, information extraction, image segmentation.
import sklearn_crfsuite
from sklearn_crfsuite import metrics
# CRF for sequence labeling (e.g., NER)
def word_to_features(sentence, i):
    word = sentence[i]
    features = {
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
    }
    if i > 0:
        features['prev_word'] = sentence[i-1].lower()
    if i < len(sentence) - 1:
        features['next_word'] = sentence[i+1].lower()
    return features
# Example training data
X_train = [[word_to_features(s, i) for i in range(len(s))]
           for s in [["John", "lives", "in", "New", "York"]]]
y_train = [["B-PER", "O", "O", "B-LOC", "I-LOC"]]
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,               # L1 regularization
    c2=0.1,               # L2 regularization
    max_iterations=100
)
crf.fit(X_train, y_train)
print(f"Labels: {crf.classes_}")
3. Isolation Forest
Description: An anomaly detection algorithm based on the principle that anomalies are "few and different" and therefore easier to isolate. Randomly selects a feature and split value to partition data; anomalies require fewer splits to be isolated (shorter path length in the tree).
Use Cases: Fraud detection, network intrusion detection, manufacturing defect detection, any unsupervised anomaly detection task.
from sklearn.ensemble import IsolationForest
import numpy as np
# Normal data + anomalies
np.random.seed(42)
X_normal = np.random.randn(200, 2)
X_anomaly = np.random.uniform(-4, 4, (20, 2))
X = np.vstack([X_normal, X_anomaly])
model = IsolationForest(
    n_estimators=100,
    contamination=0.1,    # Expected proportion of anomalies
    max_features=1.0,
    random_state=42
)
predictions = model.fit_predict(X)
n_anomalies = (predictions == -1).sum()
print(f"Detected anomalies: {n_anomalies} / {len(X)}")
print(f"Anomaly scores (first 5 normal): {model.score_samples(X[:5]).round(3)}")
print(f"Anomaly scores (first 5 anomaly): {model.score_samples(X[200:205]).round(3)}")
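The "fewer splits to isolate" principle can be checked directly. The sketch below grows single random isolation trees by hand (a deliberately simplified version of what `IsolationForest` does internally) and compares the average isolation depth of a cluster point with that of an obvious outlier; the data, depth cap, and `isolation_depth` helper are illustrative choices, not part of any library API:

```python
import numpy as np

rng = np.random.default_rng(42)

def isolation_depth(point, data, depth=0):
    """Number of random splits needed to isolate `point` in one random tree."""
    if len(data) <= 1 or depth >= 20:    # isolated, or depth cap reached
        return depth
    feat = rng.integers(data.shape[1])   # pick a random feature
    lo, hi = data[:, feat].min(), data[:, feat].max()
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)          # pick a random split value
    # Recurse into the partition that still contains the point
    side = data[:, feat] < split if point[feat] < split else data[:, feat] >= split
    return isolation_depth(point, data[side], depth + 1)

# A Gaussian cluster plus one far-away outlier
data = np.vstack([rng.normal(0, 1, (200, 2)), [[6.0, 6.0]]])
inlier_depths = [isolation_depth(data[0], data) for _ in range(200)]
outlier_depths = [isolation_depth(data[-1], data) for _ in range(200)]
print(f"Mean isolation depth, inlier:  {np.mean(inlier_depths):.1f}")
print(f"Mean isolation depth, outlier: {np.mean(outlier_depths):.1f}")
```

Averaged over many random trees, the outlier consistently needs fewer splits, which is exactly the quantity the Isolation Forest anomaly score normalizes.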
4. Local Outlier Factor (LOF)
Description: Detects anomalies by measuring the local density deviation of a point relative to its neighbors. A point with substantially lower density than its neighbors is considered an outlier. The LOF score quantifies how much more (or less) dense a point's neighborhood is compared to its neighbors' neighborhoods.
Use Cases: Outlier detection in datasets with varying densities, fraud detection, sensor data cleaning.
from sklearn.neighbors import LocalOutlierFactor
model = LocalOutlierFactor(
    n_neighbors=20,
    contamination=0.1,
    metric='euclidean',
    novelty=False         # True for prediction on new data
)
predictions = model.fit_predict(X)
n_outliers = (predictions == -1).sum()
print(f"Outliers detected: {n_outliers}")
print(f"LOF scores (first 5): {-model.negative_outlier_factor_[:5].round(3)}")
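The LOF score itself follows directly from the definition above. The sketch below computes it in plain NumPy from the textbook quantities (k-distance, reachability distance, local reachability density); the dataset, `k`, and the `lof_scores` helper are illustrative, not library API:

```python
import numpy as np

def lof_scores(X, k=5):
    """Local Outlier Factor from the textbook definition (no library)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                          # exclude self-distances
    knn = np.argsort(D, axis=1)[:, :k]                   # k nearest neighbor indices
    knn_d = np.take_along_axis(D, knn, axis=1)           # distances to those neighbors
    k_dist = knn_d[:, -1]                                # distance to the k-th neighbor
    # Reachability distance: max(k-dist(o), d(p, o)) for each neighbor o of p
    reach = np.maximum(k_dist[knn], knn_d)
    lrd = 1.0 / reach.mean(axis=1)                       # local reachability density
    return lrd[knn].mean(axis=1) / lrd                   # neighbors' density vs own

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), [[5.0, 5.0]]])  # cluster + one outlier
scores = lof_scores(X)
print(f"Median LOF (cluster points): {np.median(scores[:50]):.2f}")  # near 1
print(f"LOF of the outlier:          {scores[-1]:.2f}")              # well above 1
```

Scores near 1 mean a point is about as dense as its neighborhood; scores much greater than 1 flag points whose neighbors are far denser than they are.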
5. One-Class SVM
Description: An unsupervised anomaly detection variant of SVM. Learns a decision boundary that encloses most of the training data in feature space. Points outside this boundary are classified as anomalies. Works well in high-dimensional spaces.
Use Cases: Novelty detection, when only "normal" data is available for training, high-dimensional anomaly detection.
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
model = make_pipeline(
    StandardScaler(),
    OneClassSVM(
        kernel='rbf',
        gamma='scale',
        nu=0.1            # Upper bound on fraction of outliers
    )
)
model.fit(X_normal) # Train on normal data only
# Predict on all data
predictions = model.predict(X)
n_anomalies = (predictions == -1).sum()
print(f"Anomalies detected: {n_anomalies} / {len(X)}")
6. Self-Organizing Map (SOM)
Description: An unsupervised neural network that produces a low-dimensional (typically 2D) discretized representation of the input space. Neurons are arranged in a grid, and each neuron has a weight vector. Training uses competitive learning: the best-matching neuron and its neighbors update their weights towards the input, preserving topological relationships.
Use Cases: Visualization of high-dimensional data, customer segmentation, document organization, exploratory data analysis.
from minisom import MiniSom
import numpy as np
# Create and train SOM
np.random.seed(42)
data = np.random.rand(500, 4) # 500 samples, 4 features
som = MiniSom(
    x=10, y=10,           # 10x10 grid
    input_len=4,
    sigma=1.0,            # Neighborhood radius
    learning_rate=0.5,
    random_seed=42
)
som.random_weights_init(data)
som.train_random(data, num_iteration=1000)
# Find best matching unit for each sample
winners = [som.winner(d) for d in data[:5]]
print(f"Grid size: 10x10 = 100 neurons")
print(f"First 5 sample mappings: {winners}")
print(f"Quantization error: {som.quantization_error(data):.4f}")
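The competitive-learning update described above is small enough to write out by hand. This is a minimal NumPy sketch of a single training step (grid size, sigma, and learning rate mirror the MiniSom call above; `som_step` is a hypothetical helper, not part of MiniSom):

```python
import numpy as np

rng = np.random.default_rng(42)
grid_x, grid_y, dim = 10, 10, 4
weights = rng.random((grid_x, grid_y, dim))   # one weight vector per neuron
# Grid coordinates of each neuron, used for neighborhood distances
coords = np.stack(np.meshgrid(np.arange(grid_x), np.arange(grid_y),
                              indexing='ij'), axis=-1)

def som_step(x, weights, sigma=1.0, lr=0.5):
    """One competitive-learning update: find the BMU, pull it and its
    grid neighbors toward the input, weighted by a Gaussian neighborhood."""
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(dists.argmin(), dists.shape)      # best matching unit
    grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)   # grid distance to BMU
    h = np.exp(-grid_d2 / (2 * sigma ** 2))                  # neighborhood function
    return weights + lr * h[..., None] * (x - weights)

x = rng.random(dim)
before = np.linalg.norm(weights - x, axis=-1).min()
weights = som_step(x, weights)
after = np.linalg.norm(weights - x, axis=-1).min()
print(f"BMU distance to input before/after one step: {before:.3f} / {after:.3f}")
```

Because the neighborhood weight is 1 at the BMU, one step with lr=0.5 moves the BMU halfway toward the input; nearby neurons move less, which is what preserves the grid's topology.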
7. Restricted Boltzmann Machine (RBM)
Description: A two-layer stochastic neural network with visible and hidden units, with no connections within a layer (the "restricted" part). Learns a probability distribution over the input using contrastive divergence. Can be stacked to form Deep Belief Networks.
Use Cases: Feature learning, dimensionality reduction, collaborative filtering, pretraining deep networks (historically).
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
X, y = load_digits(return_X_y=True)
X = X / 16.0 # Scale to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# RBM for feature extraction + Logistic Regression for classification
rbm = BernoulliRBM(
    n_components=100,
    learning_rate=0.01,
    n_iter=20,
    random_state=42,
    verbose=0
)
rbm.fit(X_train)
# Transform features and classify on the RBM representation
X_train_rbm = rbm.transform(X_train)
X_test_rbm = rbm.transform(X_test)
clf = LogisticRegression(max_iter=1000).fit(X_train_rbm, y_train)
print(f"Original features: {X_train.shape[1]}")
print(f"RBM features: {X_train_rbm.shape[1]}")
print(f"Test accuracy with RBM features: {clf.score(X_test_rbm, y_test):.3f}")
8. K-Medoids (PAM)
Description: Similar to K-Means but uses actual data points (medoids) as cluster centers instead of means. This makes it more robust to outliers and works with any distance metric (not just Euclidean). Uses Partitioning Around Medoids (PAM) algorithm.
Use Cases: When cluster centers should be actual data points, when using non-Euclidean distances, outlier-robust clustering.
from sklearn_extra.cluster import KMedoids
from sklearn.metrics import silhouette_score
model = KMedoids(
    n_clusters=4,
    metric='euclidean',   # Works with any metric
    method='pam',         # 'pam' or 'alternate'
    init='k-medoids++',
    random_state=42
)
labels = model.fit_predict(X_normal)
print(f"Medoid indices: {model.medoid_indices_}")
print(f"Inertia: {model.inertia_:.2f}")
print(f"Silhouette: {silhouette_score(X_normal, labels):.4f}")
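The PAM swap phase that `method='pam'` refers to can be sketched in a few lines. This is a naive greedy version on synthetic data: try replacing each medoid with each non-medoid and keep any swap that lowers total cost (the real implementation adds a BUILD phase and much faster incremental cost updates; `total_cost` and the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(0, 1, (60, 2))   # illustrative synthetic data

def total_cost(X, medoid_idx):
    """Sum of distances from every point to its nearest medoid."""
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=-1)
    return d.min(axis=1).sum()

medoids = list(rng.choice(len(X), size=3, replace=False))
improved = True
while improved:                              # repeat until no swap helps
    improved = False
    for m in range(len(medoids)):
        for cand in range(len(X)):
            if cand in medoids:
                continue
            trial = medoids.copy()
            trial[m] = cand                  # candidate swap: medoid m -> cand
            if total_cost(X, trial) < total_cost(X, medoids):
                medoids, improved = trial, True
print(f"Medoid indices: {sorted(medoids)}, total cost: {total_cost(X, medoids):.2f}")
```

Each accepted swap strictly lowers the cost over a finite set of configurations, so the loop terminates at a local optimum where no single medoid swap improves the clustering.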
9. Apriori Algorithm
Description: An association rule mining algorithm that finds frequent itemsets in transactional databases. Uses a bottom-up approach: first finds frequent individual items, then extends to pairs, triples, etc. The key principle: any subset of a frequent itemset must also be frequent (anti-monotone property), enabling efficient pruning.
Use Cases: Market basket analysis ("customers who bought X also bought Y"), cross-selling, web usage mining, medical diagnosis co-occurrences.
from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd
# Transaction data (one-hot encoded)
data = {
    'bread':  [1, 1, 0, 1, 1, 0, 1, 1],
    'butter': [0, 1, 1, 1, 1, 0, 1, 0],
    'milk':   [1, 0, 1, 1, 0, 1, 1, 1],
    'eggs':   [0, 1, 0, 0, 1, 1, 1, 0],
    'cheese': [0, 0, 1, 1, 0, 1, 0, 1],
}
df = pd.DataFrame(data)
# Find frequent itemsets (min support = 40%)
frequent = apriori(df, min_support=0.4, use_colnames=True)
print("Frequent itemsets:")
print(frequent)
# Generate association rules
rules = association_rules(frequent, metric='confidence', min_threshold=0.6)
print("\nAssociation rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
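The anti-monotone pruning described above is visible in a minimal bottom-up implementation. The sketch below runs on the same eight transactions as the one-hot table and only extends itemsets whose every subset is already frequent (`support` and the set-based transaction encoding are illustrative, not mlxtend API):

```python
from itertools import combinations

# The same eight transactions as the one-hot DataFrame above, as item sets
transactions = [
    {'bread', 'milk'}, {'bread', 'butter', 'eggs'},
    {'butter', 'milk', 'cheese'}, {'bread', 'butter', 'milk', 'cheese'},
    {'bread', 'butter', 'eggs'}, {'milk', 'eggs', 'cheese'},
    {'bread', 'butter', 'milk', 'eggs'}, {'bread', 'milk', 'cheese'},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

min_sup = 0.4
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= min_sup]
level = frequent
while level:
    # Join frequent k-itemsets into (k+1)-candidates ...
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    # ... and prune any candidate with an infrequent subset (anti-monotone property)
    candidates = {c for c in candidates
                  if all(frozenset(s) in set(frequent)
                         for s in combinations(c, len(c) - 1))}
    level = [c for c in candidates if support(c) >= min_sup]
    frequent += level
print(f"{len(frequent)} frequent itemsets at min support {min_sup}")
```

The subset check discards candidates like {bread, butter, milk} without ever counting their support, because {butter, milk} already failed the threshold; that pruning is what keeps Apriori tractable.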
10. FP-Growth
Description: An improved alternative to Apriori that avoids the expensive candidate generation step. Compresses the database into a Frequent Pattern tree (FP-tree), then extracts frequent itemsets directly from this compact structure. Significantly faster than Apriori for large datasets.
Use Cases: Same as Apriori but for larger datasets, real-time association mining, when Apriori is too slow.
from mlxtend.frequent_patterns import fpgrowth, association_rules
# FP-Growth (same interface as Apriori but faster)
frequent_fp = fpgrowth(df, min_support=0.4, use_colnames=True)
print("FP-Growth frequent itemsets:")
print(frequent_fp)
# Generate rules
rules_fp = association_rules(frequent_fp, metric='lift', min_threshold=1.0)
print(f"\nRules with lift > 1.0: {len(rules_fp)}")
for _, row in rules_fp.iterrows():
    print(f"  {set(row['antecedents'])} => {set(row['consequents'])} "
          f"(conf={row['confidence']:.2f}, lift={row['lift']:.2f})")
Master Comparison Table: All 100+ Algorithms
The complete reference table of every algorithm in this directory, organized by category.
Regression (15)
| # | Algorithm | Type | Interpretability | Scalability | Key Library |
|---|---|---|---|---|---|
| 1 | Linear Regression | Supervised | High | High | sklearn |
| 2 | Polynomial Regression | Supervised | Medium | Medium | sklearn |
| 3 | Ridge Regression | Supervised | High | High | sklearn |
| 4 | Lasso Regression | Supervised | High | High | sklearn |
| 5 | Elastic Net | Supervised | High | High | sklearn |
| 6 | Bayesian Linear Regression | Supervised | High | Medium | sklearn |
| 7 | SVR | Supervised | Low | Low | sklearn |
| 8 | Decision Tree Regression | Supervised | High | Medium | sklearn |
| 9 | Random Forest Regression | Supervised | Medium | High | sklearn |
| 10 | Gradient Boosting Regression | Supervised | Low | Medium | sklearn |
| 11 | XGBoost Regression | Supervised | Low | High | xgboost |
| 12 | LightGBM Regression | Supervised | Low | Very High | lightgbm |
| 13 | CatBoost Regression | Supervised | Low | High | catboost |
| 14 | Quantile Regression | Supervised | High | Medium | sklearn |
| 15 | Poisson Regression | Supervised | High | High | sklearn |
Classification (17)
| # | Algorithm | Type | Interpretability | Scalability | Key Library |
|---|---|---|---|---|---|
| 16 | Logistic Regression | Supervised | High | High | sklearn |
| 17 | KNN | Supervised | Medium | Low | sklearn |
| 18 | SVM | Supervised | Low | Low | sklearn |
| 19 | Decision Tree Classifier | Supervised | High | Medium | sklearn |
| 20 | Random Forest Classifier | Supervised | Medium | High | sklearn |
| 21 | Gaussian Naive Bayes | Supervised | High | Very High | sklearn |
| 22 | Bernoulli Naive Bayes | Supervised | High | Very High | sklearn |
| 23 | Multinomial Naive Bayes | Supervised | High | Very High | sklearn |
| 24 | Gradient Boosting Classifier | Supervised | Low | Medium | sklearn |
| 25 | AdaBoost | Supervised | Medium | Medium | sklearn |
| 26 | XGBoost Classifier | Supervised | Low | High | xgboost |
| 27 | LightGBM Classifier | Supervised | Low | Very High | lightgbm |
| 28 | CatBoost Classifier | Supervised | Low | High | catboost |
| 29 | SGD Classifier | Supervised | High | Very High | sklearn |
| 30 | Perceptron | Supervised | High | Very High | sklearn |
| 31 | Passive Aggressive | Supervised | Medium | Very High | sklearn |
| 32 | Naive Bayes (General) | Supervised | High | Very High | sklearn |
Clustering (11)
| # | Algorithm | Type | Requires k? | Scalability | Key Library |
|---|---|---|---|---|---|
| 33 | K-Means | Unsupervised | Yes | High | sklearn |
| 34 | Mini-Batch K-Means | Unsupervised | Yes | Very High | sklearn |
| 35 | Hierarchical Clustering | Unsupervised | Optional | Low | scipy |
| 36 | Agglomerative Clustering | Unsupervised | Yes | Medium | sklearn |
| 37 | DBSCAN | Unsupervised | No | Medium | sklearn |
| 38 | OPTICS | Unsupervised | No | Medium | sklearn |
| 39 | Mean Shift | Unsupervised | No | Low | sklearn |
| 40 | Spectral Clustering | Unsupervised | Yes | Low | sklearn |
| 41 | GMM | Unsupervised | Yes | Medium | sklearn |
| 42 | BIRCH | Unsupervised | Yes | Very High | sklearn |
| 43 | Affinity Propagation | Unsupervised | No | Low | sklearn |
Dimensionality Reduction (10)
| # | Algorithm | Type | Linear? | Scalability | Key Library |
|---|---|---|---|---|---|
| 44 | PCA | Unsupervised | Yes | High | sklearn |
| 45 | Kernel PCA | Unsupervised | No | Medium | sklearn |
| 46 | LDA | Supervised | Yes | High | sklearn |
| 47 | t-SNE | Unsupervised | No | Low | sklearn |
| 48 | UMAP | Unsupervised | No | High | umap-learn |
| 49 | ICA | Unsupervised | Yes | Medium | sklearn |
| 50 | Factor Analysis | Unsupervised | Yes | Medium | sklearn |
| 51 | NMF | Unsupervised | Yes | Medium | sklearn |
| 52 | Isomap | Unsupervised | No | Low | sklearn |
| 53 | LLE | Unsupervised | No | Low | sklearn |
Ensemble Methods (7)
| # | Algorithm | Strategy | Reduces | Scalability | Key Library |
|---|---|---|---|---|---|
| 54 | Bagging | Parallel | Variance | High | sklearn |
| 55 | Boosting | Sequential | Bias | Medium | sklearn |
| 56 | Random Forest | Bagging + feature random | Variance | High | sklearn |
| 57 | Gradient Boosting | Sequential | Both | Medium | sklearn |
| 58 | AdaBoost | Sequential | Bias | Medium | sklearn |
| 59 | Stacking | Meta-learning | Both | Low | sklearn |
| 60 | Voting | Aggregation | Variance | Medium | sklearn |
Reinforcement Learning (14)
| # | Algorithm | Category | Action Space | On/Off Policy | Key Library |
|---|---|---|---|---|---|
| 61 | Q-Learning | Value-based | Discrete | Off | Custom |
| 62 | SARSA | Value-based | Discrete | On | Custom |
| 63 | DQN | Value-based | Discrete | Off | PyTorch/TF |
| 64 | Double DQN | Value-based | Discrete | Off | PyTorch/TF |
| 65 | Dueling DQN | Value-based | Discrete | Off | PyTorch/TF |
| 66 | Policy Gradient | Policy-based | Both | On | PyTorch/TF |
| 67 | REINFORCE | Policy-based | Both | On | PyTorch/TF |
| 68 | Actor-Critic | Actor-Critic | Both | Both | PyTorch/TF |
| 69 | A3C | Actor-Critic | Both | On | PyTorch/TF |
| 70 | PPO | Actor-Critic | Both | On | stable-baselines3 |
| 71 | TRPO | Actor-Critic | Both | On | sb3-contrib |
| 72 | DDPG | Actor-Critic | Continuous | Off | stable-baselines3 |
| 73 | TD3 | Actor-Critic | Continuous | Off | stable-baselines3 |
| 74 | SAC | Actor-Critic | Continuous | Off | stable-baselines3 |
Neural Networks & Deep Learning (14)
| # | Architecture | Data Type | Year | Key Library |
|---|---|---|---|---|
| 75 | ANN | Tabular | 1943 | PyTorch/TF |
| 76 | Feedforward NN | Tabular | 1986 | sklearn/PyTorch |
| 77 | MLP | Tabular | 1986 | sklearn/PyTorch |
| 78 | CNN | Images | 1989 | PyTorch/TF |
| 79 | RNN | Sequential | 1986 | PyTorch/TF |
| 80 | LSTM | Sequential | 1997 | PyTorch/TF |
| 81 | GRU | Sequential | 2014 | PyTorch/TF |
| 82 | Transformer | Sequential/Any | 2017 | PyTorch/TF |
| 83 | GNN | Graphs | 2009 | PyG/DGL |
| 84 | GCN | Graphs | 2017 | PyG/DGL |
| 85 | GAT | Graphs | 2018 | PyG/DGL |
| 86 | Autoencoder | Any | 1986 | PyTorch/TF |
| 87 | VAE | Any | 2013 | PyTorch/TF |
| 88 | GAN | Any | 2014 | PyTorch/TF |
Time Series (5) & Recommendation (4)
| # | Algorithm | Category | Scalability | Key Library |
|---|---|---|---|---|
| 89 | ARIMA | Time Series | Medium | statsmodels |
| 90 | SARIMA | Time Series | Medium | statsmodels |
| 91 | Prophet | Time Series | High | prophet |
| 92 | Holt-Winters | Time Series | High | statsmodels |
| 93 | State Space Models | Time Series | Medium | statsmodels |
| 94 | Collaborative Filtering | Recommendation | Medium | surprise |
| 95 | Content-Based Filtering | Recommendation | High | sklearn |
| 96 | Matrix Factorization | Recommendation | High | surprise/custom |
| 97 | Factorization Machines | Recommendation | High | PyTorch/xlearn |
Other Algorithms (10)
| # | Algorithm | Category | Scalability | Key Library |
|---|---|---|---|---|
| 98 | HMM | Probabilistic / Sequence | Medium | hmmlearn |
| 99 | CRF | Probabilistic / Sequence | Medium | sklearn-crfsuite |
| 100 | Isolation Forest | Anomaly Detection | High | sklearn |
| 101 | Local Outlier Factor | Anomaly Detection | Medium | sklearn |
| 102 | One-Class SVM | Anomaly Detection | Low | sklearn |
| 103 | Self-Organizing Map | Unsupervised / Visualization | Medium | minisom |
| 104 | RBM | Unsupervised / Generative | Medium | sklearn |
| 105 | K-Medoids | Clustering | Medium | sklearn-extra |
| 106 | Apriori | Association Rules | Low | mlxtend |
| 107 | FP-Growth | Association Rules | Medium | mlxtend |
Total: 107 algorithms across 10 categories. This directory provides a comprehensive reference for selecting, understanding, and implementing machine learning algorithms. Use the sidebar to navigate to any category, and refer to the Overview page for the selection guide.