Other Algorithms & Master Comparison

Anomaly detection, sequence labeling, association rules, and more

This final section covers important algorithms that do not fit neatly into the previous categories: probabilistic graphical models, anomaly detection methods, association rule mining, and self-organizing maps. The section concludes with the master comparison table of all 100+ algorithms.

1. Hidden Markov Model (HMM)

Description: A probabilistic model for sequential data with hidden (latent) states. Assumes the system transitions between hidden states according to a Markov chain (transition probabilities), and each state emits an observable output (emission probabilities). Three key problems: evaluation (forward algorithm), decoding (Viterbi), and learning (Baum-Welch / EM).

Use Cases: Speech recognition, part-of-speech tagging, gene sequence analysis, handwriting recognition, financial regime detection.

from hmmlearn.hmm import GaussianHMM
import numpy as np

# Generate sequential data
np.random.seed(42)
n_samples = 300
# Simulate two hidden regimes (e.g., bull/bear market); for simplicity the
# states are drawn i.i.d. here rather than from a true Markov chain
state_seq = np.random.choice([0, 1], size=n_samples, p=[0.6, 0.4])
observations = np.where(state_seq == 0,
    np.random.normal(0.05, 0.02, n_samples),
    np.random.normal(-0.03, 0.04, n_samples)
).reshape(-1, 1)

model = GaussianHMM(
    n_components=2,           # Number of hidden states
    covariance_type='full',
    n_iter=100,
    random_state=42
)
model.fit(observations)

# Decode: find most likely hidden state sequence
hidden_states = model.predict(observations)
print(f"Transition matrix:\n{model.transmat_.round(3)}")
print(f"Means: {model.means_.flatten().round(4)}")
print(f"Score (log-likelihood): {model.score(observations):.2f}")
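The Viterbi decoding that `model.predict` performs can be sketched directly. This is a minimal NumPy implementation for a toy two-state HMM with discrete emissions; the parameters are made up for illustration and are not the fitted model above:

```python
import numpy as np

# Toy 2-state HMM with discrete emissions (illustrative parameters)
start = np.log(np.array([0.6, 0.4]))          # initial state log-probs
trans = np.log(np.array([[0.7, 0.3],
                         [0.4, 0.6]]))        # transition log-probs
emit = np.log(np.array([[0.9, 0.1],           # P(symbol | state)
                        [0.2, 0.8]]))

def viterbi(obs, start, trans, emit):
    """Most likely hidden state sequence for an observed symbol sequence."""
    n_states, T = len(start), len(obs)
    delta = np.zeros((T, n_states))           # best log-prob ending in state
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    delta[0] = start + emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans  # shape (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + emit[:, obs[t]]
    # Backtrack from the best final state
    path = [delta[-1].argmax()]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    return path[::-1]

print(viterbi([0, 0, 1, 1, 0], start, trans, emit))
```

The same dynamic-programming table with `max` replaced by `sum` gives the forward algorithm for the evaluation problem.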

2. Conditional Random Field (CRF)

Description: A discriminative probabilistic model for labeling sequential data. Unlike HMMs (generative), CRFs directly model the conditional probability P(labels | observations), allowing them to use arbitrary features of the input. CRFs define a global normalization factor, avoiding the label bias problem of MEMMs.

Use Cases: Named entity recognition (NER), POS tagging, information extraction, image segmentation.

import sklearn_crfsuite
from sklearn_crfsuite import metrics

# CRF for sequence labeling (e.g., NER)
def word_to_features(sentence, i):
    word = sentence[i]
    features = {
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
    }
    if i > 0:
        features['prev_word'] = sentence[i-1].lower()
    if i < len(sentence) - 1:
        features['next_word'] = sentence[i+1].lower()
    return features

# Example training data
X_train = [[word_to_features(s, i) for i in range(len(s))]
           for s in [["John", "lives", "in", "New", "York"]]]
y_train = [["B-PER", "O", "O", "B-LOC", "I-LOC"]]

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,                   # L1 regularization
    c2=0.1,                   # L2 regularization
    max_iterations=100
)
crf.fit(X_train, y_train)
print(f"Labels: {crf.classes_}")
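The global normalization mentioned above can be made concrete for a linear-chain CRF: a label sequence's score is a sum of emission and transition scores, and the partition function Z sums exp-scores over every possible label sequence. This toy NumPy sketch (arbitrary random scores, not sklearn-crfsuite internals) checks the brute-force Z against the O(T·L²) forward recursion:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
T, L = 3, 2
emission = rng.normal(size=(T, L))     # score of label l at position t
transition = rng.normal(size=(L, L))   # score of label i -> label j

def sequence_score(labels):
    s = emission[0, labels[0]]
    for t in range(1, T):
        s += transition[labels[t - 1], labels[t]] + emission[t, labels[t]]
    return s

# Brute force: sum exp-scores over all L**T label sequences
Z = sum(np.exp(sequence_score(seq)) for seq in product(range(L), repeat=T))

# Forward recursion computes the same partition function in O(T * L^2)
alpha = np.exp(emission[0])
for t in range(1, T):
    alpha = np.exp(emission[t]) * (alpha @ np.exp(transition))
Z_forward = alpha.sum()

p = np.exp(sequence_score((0, 1, 0))) / Z  # a globally normalized probability
print(Z, Z_forward, p)
```

Because Z is computed over whole sequences rather than per position, no state can hog probability mass locally, which is exactly how CRFs avoid the label bias problem.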

3. Isolation Forest

Description: An anomaly detection algorithm based on the principle that anomalies are "few and different" and therefore easier to isolate. Randomly selects a feature and split value to partition data; anomalies require fewer splits to be isolated (shorter path length in the tree).

Use Cases: Fraud detection, network intrusion detection, manufacturing defect detection, any unsupervised anomaly detection task.

from sklearn.ensemble import IsolationForest
import numpy as np

# Normal data + anomalies
np.random.seed(42)
X_normal = np.random.randn(200, 2)
X_anomaly = np.random.uniform(-4, 4, (20, 2))
X = np.vstack([X_normal, X_anomaly])

model = IsolationForest(
    n_estimators=100,
    contamination=0.1,        # Expected proportion of anomalies
    max_features=1.0,
    random_state=42
)
predictions = model.fit_predict(X)

n_anomalies = (predictions == -1).sum()
print(f"Detected anomalies: {n_anomalies} / {len(X)}")
print(f"Anomaly scores (first 5 normal): {model.score_samples(X[:5]).round(3)}")
print(f"Anomaly scores (first 5 anomaly): {model.score_samples(X[200:205]).round(3)}")
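The path-length principle turns into a score as follows: an average path length E[h(x)] over the trees is normalized by c(n), the expected path length of an unsuccessful BST search over n points, giving s(x, n) = 2^(−E[h(x)]/c(n)). A short sketch of that normalization (sklearn's `score_samples` reports a negated variant of this quantity):

```python
import numpy as np

def c(n):
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    harmonic = np.log(n - 1) + 0.5772156649  # H(n-1) via Euler-Mascheroni
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """s(x, n) = 2^(-E[h(x)] / c(n)): near 1 -> anomaly, well below 0.5 -> normal."""
    return 2.0 ** (-avg_path_length / c(n))

n = 256  # typical subsample size per tree
print(round(anomaly_score(4.0, n), 3))   # short average path -> anomalous
print(round(anomaly_score(12.0, n), 3))  # long average path -> normal
```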

4. Local Outlier Factor (LOF)

Description: Detects anomalies by measuring the local density deviation of a point relative to its neighbors. A point with substantially lower density than its neighbors is considered an outlier. The LOF score quantifies how much more (or less) dense a point's neighborhood is compared to its neighbors' neighborhoods.

Use Cases: Outlier detection in datasets with varying densities, fraud detection, sensor data cleaning.

from sklearn.neighbors import LocalOutlierFactor

model = LocalOutlierFactor(
    n_neighbors=20,
    contamination=0.1,
    metric='euclidean',
    novelty=False             # True for prediction on new data
)
predictions = model.fit_predict(X)  # X from the Isolation Forest example above

n_outliers = (predictions == -1).sum()
print(f"Outliers detected: {n_outliers}")
print(f"LOF scores (first 5): {-model.negative_outlier_factor_[:5].round(3)}")
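The LOF score itself can be computed from the definition: k-distances, reachability distances reach(a, b) = max(k_dist(b), d(a, b)), local reachability densities, and finally the ratio of neighbor density to own density. A from-scratch sketch on a tiny dataset (a direct, unoptimized reading of the definition, not sklearn's implementation):

```python
import numpy as np

def lof_scores(X, k):
    """Local Outlier Factor computed directly from its definition."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # k nearest neighbors of each point (excluding itself)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]
    k_dist = D[np.arange(n), knn[:, -1]]      # distance to k-th neighbor
    # Local reachability density: inverse mean reachability distance,
    # where reach(a, b) = max(k_dist(b), d(a, b))
    lrd = np.empty(n)
    for a in range(n):
        reach = np.maximum(k_dist[knn[a]], D[a, knn[a]])
        lrd[a] = 1.0 / reach.mean()
    # LOF: average neighbor density relative to the point's own density
    return np.array([lrd[knn[a]].mean() / lrd[a] for a in range(n)])

X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0]])  # last point is isolated
scores = lof_scores(X, k=2)
print(scores.round(2))
```

Points inside the tight cluster score close to 1 (their density matches their neighbors'), while the isolated point scores far above 1.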

5. One-Class SVM

Description: An unsupervised anomaly detection variant of SVM. Learns a decision boundary that encloses most of the training data in feature space. Points outside this boundary are classified as anomalies. Works well in high-dimensional spaces.

Use Cases: Novelty detection, when only "normal" data is available for training, high-dimensional anomaly detection.

from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    OneClassSVM(
        kernel='rbf',
        gamma='scale',
        nu=0.1                # Upper bound on fraction of outliers
    )
)
model.fit(X_normal)  # Train on normal data only

# Predict on all data
predictions = model.predict(X)
n_anomalies = (predictions == -1).sum()
print(f"Anomalies detected: {n_anomalies} / {len(X)}")

6. Self-Organizing Map (SOM)

Description: An unsupervised neural network that produces a low-dimensional (typically 2D) discretized representation of the input space. Neurons are arranged in a grid, and each neuron has a weight vector. Training uses competitive learning: the best-matching neuron and its neighbors update their weights towards the input, preserving topological relationships.

Use Cases: Visualization of high-dimensional data, customer segmentation, document organization, exploratory data analysis.

from minisom import MiniSom
import numpy as np

# Create and train SOM
np.random.seed(42)
data = np.random.rand(500, 4)  # 500 samples, 4 features

som = MiniSom(
    x=10, y=10,               # 10x10 grid
    input_len=4,
    sigma=1.0,                 # Neighborhood radius
    learning_rate=0.5,
    random_seed=42
)
som.random_weights_init(data)
som.train_random(data, num_iteration=1000)

# Find best matching unit for each sample
winners = [som.winner(d) for d in data[:5]]
print(f"Grid size: 10x10 = 100 neurons")
print(f"First 5 sample mappings: {winners}")
print(f"Quantization error: {som.quantization_error(data):.4f}")

7. Restricted Boltzmann Machine (RBM)

Description: A two-layer stochastic neural network with visible and hidden units, with no connections within a layer (the "restricted" part). Learns a probability distribution over the input using contrastive divergence. Can be stacked to form Deep Belief Networks.

Use Cases: Feature learning, dimensionality reduction, collaborative filtering, pretraining deep networks (historically).

from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0  # Scale to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# RBM for feature extraction + Logistic Regression for classification
rbm = BernoulliRBM(
    n_components=100,
    learning_rate=0.01,
    n_iter=20,
    random_state=42,
    verbose=0
)
rbm.fit(X_train)

# Transform features
X_train_rbm = rbm.transform(X_train)
X_test_rbm = rbm.transform(X_test)
print(f"Original features: {X_train.shape[1]}")
print(f"RBM features: {X_train_rbm.shape[1]}")
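The contrastive divergence training mentioned above can be sketched as a CD-1 step: a positive phase on the data, one Gibbs step to get a reconstruction, and a gradient update from the difference of the two correlation terms. A toy NumPy version on random binary data (an illustrative sketch, not `BernoulliRBM`'s internals):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 64, 100
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
b_v = np.zeros(n_visible)           # visible-unit biases
b_h = np.zeros(n_hidden)            # hidden-unit biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, lr=0.05):
    """One CD-1 update (in place) on a batch; returns reconstruction MSE."""
    # Positive phase: hidden probabilities and a binary sample given the data
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Gradient approximation: <v h>_data - <v h>_reconstruction
    batch = len(v0)
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return np.mean((v0 - p_v1) ** 2)

v = (rng.random((200, n_visible)) < 0.3).astype(float)  # toy binary data
errs = [cd1_step(v, W, b_v, b_h) for _ in range(50)]
print(f"Reconstruction error: {errs[0]:.4f} -> {errs[-1]:.4f}")
```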

8. K-Medoids (PAM)

Description: Similar to K-Means but uses actual data points (medoids) as cluster centers instead of means. This makes it more robust to outliers and lets it work with any distance metric (not just Euclidean). The classic implementation is the Partitioning Around Medoids (PAM) algorithm.

Use Cases: When cluster centers should be actual data points, when using non-Euclidean distances, outlier-robust clustering.

from sklearn_extra.cluster import KMedoids
from sklearn.metrics import silhouette_score

model = KMedoids(
    n_clusters=4,
    metric='euclidean',       # Works with any metric
    method='pam',             # 'pam' or 'alternate'
    init='k-medoids++',
    random_state=42
)
labels = model.fit_predict(X_normal)  # X_normal from the anomaly examples above

print(f"Medoid indices: {model.medoid_indices_}")
print(f"Inertia: {model.inertia_:.2f}")
print(f"Silhouette: {silhouette_score(X_normal, labels):.4f}")
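The simpler 'alternate' variant can be sketched from scratch: assign each point to its nearest medoid, then re-pick each medoid as the cluster member minimizing total within-cluster distance (full PAM instead evaluates swaps between medoids and non-medoids). An illustrative NumPy sketch on two synthetic blobs:

```python
import numpy as np

def k_medoids(X, k, n_iter=10, seed=0):
    """Alternating K-Medoids: assign to nearest medoid, then re-pick medoids."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)        # assignment step
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # Update step: member minimizing total distance to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                medoids[j] = members[within.argmin()]
    return medoids, D[:, medoids].argmin(axis=1)

# Two well-separated blobs; one medoid should land in each
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.3, (30, 2)),
               np.random.default_rng(2).normal(5.0, 0.3, (30, 2))])
medoids, labels = k_medoids(X, k=2)
print(f"Medoid indices: {medoids}, cluster sizes: {np.bincount(labels)}")
```

Because only the precomputed distance matrix D is used, swapping in any other metric (Manhattan, cosine, edit distance) requires no change to the algorithm itself.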

9. Apriori Algorithm

Description: An association rule mining algorithm that finds frequent itemsets in transactional databases. Uses a bottom-up approach: first finds frequent individual items, then extends to pairs, triples, etc. The key principle: any subset of a frequent itemset must also be frequent (anti-monotone property), enabling efficient pruning.

Use Cases: Market basket analysis ("customers who bought X also bought Y"), cross-selling, web usage mining, medical diagnosis co-occurrences.

from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd

# Transaction data (one-hot encoded)
data = {
    'bread':  [1, 1, 0, 1, 1, 0, 1, 1],
    'butter': [0, 1, 1, 1, 1, 0, 1, 0],
    'milk':   [1, 0, 1, 1, 0, 1, 1, 1],
    'eggs':   [0, 1, 0, 0, 1, 1, 1, 0],
    'cheese': [0, 0, 1, 1, 0, 1, 0, 1],
}
df = pd.DataFrame(data).astype(bool)  # mlxtend expects boolean one-hot data

# Find frequent itemsets (min support = 40%)
frequent = apriori(df, min_support=0.4, use_colnames=True)
print("Frequent itemsets:")
print(frequent)

# Generate association rules
rules = association_rules(frequent, metric='confidence', min_threshold=0.6)
print("\nAssociation rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
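The level-wise search with anti-monotone pruning can be written out in pure Python. The transactions below are the same eight baskets as the one-hot table above, expressed as sets; this is an illustrative, unoptimized sketch rather than mlxtend's implementation:

```python
from itertools import combinations

transactions = [
    {'bread', 'milk'}, {'bread', 'butter', 'eggs'},
    {'butter', 'milk', 'cheese'}, {'bread', 'butter', 'milk', 'cheese'},
    {'bread', 'butter', 'eggs'}, {'milk', 'eggs', 'cheese'},
    {'bread', 'butter', 'milk', 'eggs'}, {'bread', 'milk', 'cheese'},
]

def apriori_sets(transactions, min_support):
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    # Level 1: frequent individual items
    singles = {i for t in transactions for i in t}
    levels = [{frozenset([i]) for i in singles
               if support(frozenset([i])) >= min_support}]
    k = 2
    while levels[-1]:
        # Candidate generation: unions of frequent (k-1)-itemsets ...
        candidates = {a | b for a in levels[-1] for b in levels[-1]
                      if len(a | b) == k}
        # ... pruned by the anti-monotone property: every (k-1)-subset
        # of a frequent k-itemset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in levels[-1]
                             for s in combinations(c, k - 1))}
        levels.append({c for c in candidates if support(c) >= min_support})
        k += 1
    return [s for level in levels for s in level]

frequent = apriori_sets(transactions, min_support=0.4)
print(sorted(map(sorted, frequent)))
```

Note how pruning pays off: {bread, butter, milk} is never counted against the database because its subset {butter, milk} already failed the support threshold.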

10. FP-Growth

Description: An improved alternative to Apriori that avoids the expensive candidate generation step. Compresses the database into a Frequent Pattern tree (FP-tree), then extracts frequent itemsets directly from this compact structure. Significantly faster than Apriori for large datasets.

Use Cases: Same as Apriori but for larger datasets, real-time association mining, when Apriori is too slow.

from mlxtend.frequent_patterns import fpgrowth, association_rules

# FP-Growth (same interface as Apriori but faster)
frequent_fp = fpgrowth(df, min_support=0.4, use_colnames=True)
print("FP-Growth frequent itemsets:")
print(frequent_fp)

# Generate rules
rules_fp = association_rules(frequent_fp, metric='lift', min_threshold=1.0)
print(f"\nRules with lift > 1.0: {len(rules_fp)}")
for _, row in rules_fp.iterrows():
    print(f"  {set(row['antecedents'])} => {set(row['consequents'])} "
          f"(conf={row['confidence']:.2f}, lift={row['lift']:.2f})")
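The confidence and lift columns printed by `association_rules` reduce to simple support ratios: confidence = supp(A ∪ B) / supp(A) and lift = confidence / supp(B). A pure-Python sketch on the same eight baskets as the Apriori example:

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent => consequent."""
    n = len(transactions)

    def supp(items):
        return sum(items <= t for t in transactions) / n

    both = supp(antecedent | consequent)
    confidence = both / supp(antecedent)   # P(consequent | antecedent)
    lift = confidence / supp(consequent)   # > 1 means positive association
    return both, confidence, lift

transactions = [
    {'bread', 'milk'}, {'bread', 'butter', 'eggs'},
    {'butter', 'milk', 'cheese'}, {'bread', 'butter', 'milk', 'cheese'},
    {'bread', 'butter', 'eggs'}, {'milk', 'eggs', 'cheese'},
    {'bread', 'butter', 'milk', 'eggs'}, {'bread', 'milk', 'cheese'},
]
s, c, l = rule_metrics(transactions, {'milk'}, {'cheese'})
print(f"support={s:.3f}, confidence={c:.3f}, lift={l:.3f}")
# -> support=0.500, confidence=0.667, lift=1.333
```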

Master Comparison Table: All 100+ Algorithms

The complete reference table of every algorithm in this directory, organized by category.

Regression (15)

| # | Algorithm | Type | Interpretability | Scalability | Key Library |
|---|---|---|---|---|---|
| 1 | Linear Regression | Supervised | High | High | sklearn |
| 2 | Polynomial Regression | Supervised | Medium | Medium | sklearn |
| 3 | Ridge Regression | Supervised | High | High | sklearn |
| 4 | Lasso Regression | Supervised | High | High | sklearn |
| 5 | Elastic Net | Supervised | High | High | sklearn |
| 6 | Bayesian Linear Regression | Supervised | High | Medium | sklearn |
| 7 | SVR | Supervised | Low | Low | sklearn |
| 8 | Decision Tree Regression | Supervised | High | Medium | sklearn |
| 9 | Random Forest Regression | Supervised | Medium | High | sklearn |
| 10 | Gradient Boosting Regression | Supervised | Low | Medium | sklearn |
| 11 | XGBoost Regression | Supervised | Low | High | xgboost |
| 12 | LightGBM Regression | Supervised | Low | Very High | lightgbm |
| 13 | CatBoost Regression | Supervised | Low | High | catboost |
| 14 | Quantile Regression | Supervised | High | Medium | sklearn |
| 15 | Poisson Regression | Supervised | High | High | sklearn |

Classification (17)

| # | Algorithm | Type | Interpretability | Scalability | Key Library |
|---|---|---|---|---|---|
| 16 | Logistic Regression | Supervised | High | High | sklearn |
| 17 | KNN | Supervised | Medium | Low | sklearn |
| 18 | SVM | Supervised | Low | Low | sklearn |
| 19 | Decision Tree Classifier | Supervised | High | Medium | sklearn |
| 20 | Random Forest Classifier | Supervised | Medium | High | sklearn |
| 21 | Gaussian Naive Bayes | Supervised | High | Very High | sklearn |
| 22 | Bernoulli Naive Bayes | Supervised | High | Very High | sklearn |
| 23 | Multinomial Naive Bayes | Supervised | High | Very High | sklearn |
| 24 | Gradient Boosting Classifier | Supervised | Low | Medium | sklearn |
| 25 | AdaBoost | Supervised | Medium | Medium | sklearn |
| 26 | XGBoost Classifier | Supervised | Low | High | xgboost |
| 27 | LightGBM Classifier | Supervised | Low | Very High | lightgbm |
| 28 | CatBoost Classifier | Supervised | Low | High | catboost |
| 29 | SGD Classifier | Supervised | High | Very High | sklearn |
| 30 | Perceptron | Supervised | High | Very High | sklearn |
| 31 | Passive Aggressive | Supervised | Medium | Very High | sklearn |
| 32 | Naive Bayes (General) | Supervised | High | Very High | sklearn |

Clustering (11)

| # | Algorithm | Type | Requires k? | Scalability | Key Library |
|---|---|---|---|---|---|
| 33 | K-Means | Unsupervised | Yes | High | sklearn |
| 34 | Mini-Batch K-Means | Unsupervised | Yes | Very High | sklearn |
| 35 | Hierarchical Clustering | Unsupervised | Optional | Low | scipy |
| 36 | Agglomerative Clustering | Unsupervised | Yes | Medium | sklearn |
| 37 | DBSCAN | Unsupervised | No | Medium | sklearn |
| 38 | OPTICS | Unsupervised | No | Medium | sklearn |
| 39 | Mean Shift | Unsupervised | No | Low | sklearn |
| 40 | Spectral Clustering | Unsupervised | Yes | Low | sklearn |
| 41 | GMM | Unsupervised | Yes | Medium | sklearn |
| 42 | BIRCH | Unsupervised | Yes | Very High | sklearn |
| 43 | Affinity Propagation | Unsupervised | No | Low | sklearn |

Dimensionality Reduction (10)

| # | Algorithm | Type | Linear? | Scalability | Key Library |
|---|---|---|---|---|---|
| 44 | PCA | Unsupervised | Yes | High | sklearn |
| 45 | Kernel PCA | Unsupervised | No | Medium | sklearn |
| 46 | LDA | Supervised | Yes | High | sklearn |
| 47 | t-SNE | Unsupervised | No | Low | sklearn |
| 48 | UMAP | Unsupervised | No | High | umap-learn |
| 49 | ICA | Unsupervised | Yes | Medium | sklearn |
| 50 | Factor Analysis | Unsupervised | Yes | Medium | sklearn |
| 51 | NMF | Unsupervised | Yes | Medium | sklearn |
| 52 | Isomap | Unsupervised | No | Low | sklearn |
| 53 | LLE | Unsupervised | No | Low | sklearn |

Ensemble Methods (7)

| # | Algorithm | Strategy | Reduces | Scalability | Key Library |
|---|---|---|---|---|---|
| 54 | Bagging | Parallel | Variance | High | sklearn |
| 55 | Boosting | Sequential | Bias | Medium | sklearn |
| 56 | Random Forest | Bagging + feature random | Variance | High | sklearn |
| 57 | Gradient Boosting | Sequential | Both | Medium | sklearn |
| 58 | AdaBoost | Sequential | Bias | Medium | sklearn |
| 59 | Stacking | Meta-learning | Both | Low | sklearn |
| 60 | Voting | Aggregation | Variance | Medium | sklearn |

Reinforcement Learning (14)

| # | Algorithm | Category | Action Space | On/Off Policy | Key Library |
|---|---|---|---|---|---|
| 61 | Q-Learning | Value-based | Discrete | Off | Custom |
| 62 | SARSA | Value-based | Discrete | On | Custom |
| 63 | DQN | Value-based | Discrete | Off | PyTorch/TF |
| 64 | Double DQN | Value-based | Discrete | Off | PyTorch/TF |
| 65 | Dueling DQN | Value-based | Discrete | Off | PyTorch/TF |
| 66 | Policy Gradient | Policy-based | Both | On | PyTorch/TF |
| 67 | REINFORCE | Policy-based | Both | On | PyTorch/TF |
| 68 | Actor-Critic | Actor-Critic | Both | Both | PyTorch/TF |
| 69 | A3C | Actor-Critic | Both | On | PyTorch/TF |
| 70 | PPO | Actor-Critic | Both | On | stable-baselines3 |
| 71 | TRPO | Actor-Critic | Both | On | sb3-contrib |
| 72 | DDPG | Actor-Critic | Continuous | Off | stable-baselines3 |
| 73 | TD3 | Actor-Critic | Continuous | Off | stable-baselines3 |
| 74 | SAC | Actor-Critic | Continuous | Off | stable-baselines3 |

Neural Networks & Deep Learning (14)

| # | Architecture | Data Type | Year | Key Library |
|---|---|---|---|---|
| 75 | ANN | Tabular | 1943 | PyTorch/TF |
| 76 | Feedforward NN | Tabular | 1986 | sklearn/PyTorch |
| 77 | MLP | Tabular | 1986 | sklearn/PyTorch |
| 78 | CNN | Images | 1989 | PyTorch/TF |
| 79 | RNN | Sequential | 1986 | PyTorch/TF |
| 80 | LSTM | Sequential | 1997 | PyTorch/TF |
| 81 | GRU | Sequential | 2014 | PyTorch/TF |
| 82 | Transformer | Sequential/Any | 2017 | PyTorch/TF |
| 83 | GNN | Graphs | 2009 | PyG/DGL |
| 84 | GCN | Graphs | 2017 | PyG/DGL |
| 85 | GAT | Graphs | 2018 | PyG/DGL |
| 86 | Autoencoder | Any | 1986 | PyTorch/TF |
| 87 | VAE | Any | 2013 | PyTorch/TF |
| 88 | GAN | Any | 2014 | PyTorch/TF |

Time Series (5) & Recommendation (4)

| # | Algorithm | Category | Scalability | Key Library |
|---|---|---|---|---|
| 89 | ARIMA | Time Series | Medium | statsmodels |
| 90 | SARIMA | Time Series | Medium | statsmodels |
| 91 | Prophet | Time Series | High | prophet |
| 92 | Holt-Winters | Time Series | High | statsmodels |
| 93 | State Space Models | Time Series | Medium | statsmodels |
| 94 | Collaborative Filtering | Recommendation | Medium | surprise |
| 95 | Content-Based Filtering | Recommendation | High | sklearn |
| 96 | Matrix Factorization | Recommendation | High | surprise/custom |
| 97 | Factorization Machines | Recommendation | High | PyTorch/xlearn |

Other Algorithms (10)

| # | Algorithm | Category | Scalability | Key Library |
|---|---|---|---|---|
| 98 | HMM | Probabilistic / Sequence | Medium | hmmlearn |
| 99 | CRF | Probabilistic / Sequence | Medium | sklearn-crfsuite |
| 100 | Isolation Forest | Anomaly Detection | High | sklearn |
| 101 | Local Outlier Factor | Anomaly Detection | Medium | sklearn |
| 102 | One-Class SVM | Anomaly Detection | Low | sklearn |
| 103 | Self-Organizing Map | Unsupervised / Visualization | Medium | minisom |
| 104 | RBM | Unsupervised / Generative | Medium | sklearn |
| 105 | K-Medoids | Clustering | Medium | sklearn-extra |
| 106 | Apriori | Association Rules | Low | mlxtend |
| 107 | FP-Growth | Association Rules | Medium | mlxtend |

Total: 107 algorithms across 10 categories. This directory provides a comprehensive reference for selecting, understanding, and implementing machine learning algorithms. Use the sidebar to navigate to any category, and refer to the Overview page for the selection guide.