Implementing ML From Scratch

The "implement X from scratch" question is the gold standard of ML coding interviews. Companies like Google, Meta, Apple, and Amazon use it to test whether you truly understand the algorithms you use daily — or whether you just call sklearn.fit(). This lesson explains why, what interviewers evaluate, and the NumPy-only approach we use throughout this course.

Why Interviewers Ask This

When an interviewer says "implement linear regression from scratch without using scikit-learn," they are testing three things simultaneously:

Mathematical Understanding

Can you write down the loss function, derive the gradient, and translate the math into code? This separates engineers who understand ML from those who only use APIs.

Coding Ability

Can you write clean, correct, vectorized NumPy code? Handle edge cases? Structure the code with a fit/predict interface? This tests real engineering skills.

Debugging Intuition

When the model doesn't converge, can you diagnose whether it's a learning rate issue, a gradient bug, or a numerical stability problem? This requires deep understanding.

What Interviewers Evaluate

Based on publicly shared interview experiences, here is the rubric interviewers typically use:

# ML From-Scratch Interview Evaluation Rubric

evaluation = {
    "mathematical_correctness": {
        "weight": "CRITICAL",
        "pass": "Correct loss function, correct gradient derivation",
        "fail": "Wrong gradient, missing regularization term",
        "example": "Knows that dL/dw = (1/n) * X.T @ (X @ w - y)"
    },
    "vectorized_implementation": {
        "weight": "HIGH",
        "pass": "Matrix operations, no loops over samples",
        "fail": "For-loop computing gradient one sample at a time",
        "example": "gradient = X.T @ (predictions - y) / n"
    },
    "numerical_stability": {
        "weight": "HIGH",
        "pass": "Log-sum-exp trick, gradient clipping, epsilon in log",
        "fail": "np.exp(x) overflow, log(0) = -inf",
        "example": "np.log(np.clip(probs, 1e-15, 1 - 1e-15))"
    },
    "code_organization": {
        "weight": "MEDIUM",
        "pass": "Class with fit(), predict(), clean interface",
        "fail": "Single monolithic function, global variables",
        "example": "class LinearRegression: def fit(self, X, y): ..."
    },
    "convergence_awareness": {
        "weight": "MEDIUM",
        "pass": "Tracks loss history, knows when to stop",
        "fail": "Fixed number of iterations, no loss monitoring",
        "example": "if abs(prev_loss - loss) < tol: break"
    }
}
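
To make the "vectorized_implementation" criterion concrete, here is a minimal sketch (variable names are illustrative) comparing the per-sample loop the rubric flags as a fail with the matrix version it rewards — both compute the same MSE gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = np.zeros(d)

predictions = X @ w

# Loop version: one sample at a time (the rubric's "fail" pattern)
grad_loop = np.zeros(d)
for i in range(n):
    grad_loop += X[i] * (predictions[i] - y[i])
grad_loop /= n

# Vectorized version: one matrix product (the rubric's "pass" pattern)
grad_vec = X.T @ (predictions - y) / n

assert np.allclose(grad_loop, grad_vec)
```

The two are numerically identical; the vectorized form is simply the same sum expressed as `X.T @ residual`, which NumPy executes in optimized C rather than a Python loop.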

The NumPy-Only Approach

Throughout this course, we implement every algorithm using only NumPy. No scikit-learn, no PyTorch, no TensorFlow. This is exactly the constraint you face in interviews.

import numpy as np

# This is ALL you get in the interview:
# - np.array, np.zeros, np.ones, np.eye
# - np.dot, @, np.linalg.norm, np.linalg.inv
# - np.exp, np.log, np.maximum, np.clip
# - np.mean, np.sum, np.std, np.argmax
# - np.random for data generation

# The standard pattern for every algorithm:

class MLAlgorithm:
    """Base pattern for from-scratch implementations."""

    def __init__(self, hyperparameters):
        # Store hyperparameters (learning_rate, n_iterations, etc.)
        self.lr = hyperparameters.get('learning_rate', 0.01)
        self.n_iter = hyperparameters.get('n_iterations', 1000)
        self.weights = None
        self.bias = None
        self.loss_history = []

    def fit(self, X, y):
        """Train the model on data X with labels y."""
        n_samples, n_features = X.shape

        # 1. Initialize parameters (reset state so fit() can be called again)
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.loss_history = []

        # 2. Optimization loop
        for i in range(self.n_iter):
            # Forward pass: compute predictions
            predictions = self._forward(X)

            # Compute loss
            loss = self._compute_loss(predictions, y)
            self.loss_history.append(loss)

            # Backward pass: compute gradients
            dw, db = self._compute_gradients(X, y, predictions)

            # Update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

        return self

    def predict(self, X):
        """Make predictions on new data."""
        return self._forward(X)

    def _forward(self, X):
        raise NotImplementedError

    def _compute_loss(self, predictions, y):
        raise NotImplementedError

    def _compute_gradients(self, X, y, predictions):
        raise NotImplementedError

💡
Interview tip: Start by writing the class skeleton with __init__, fit, and predict methods before filling in the math. This shows the interviewer you think in terms of clean interfaces, not just raw math. It also gives you a structure to fill in incrementally.
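
To show the skeleton in action, here is a minimal, self-contained linear regression following the same fit/predict pattern (hyperparameter defaults and the synthetic data are illustrative, not from an actual interview):

```python
import numpy as np

class LinearRegressionGD:
    """Minimal gradient-descent linear regression in the fit/predict pattern."""

    def __init__(self, learning_rate=0.1, n_iterations=500):
        self.lr = learning_rate
        self.n_iter = n_iterations
        self.weights = None
        self.bias = 0.0
        self.loss_history = []

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.loss_history = []
        for _ in range(self.n_iter):
            predictions = X @ self.weights + self.bias      # forward pass
            error = predictions - y
            self.loss_history.append(np.mean(error ** 2))   # MSE loss
            dw = X.T @ error / n_samples                    # gradients
            db = np.mean(error)
            self.weights -= self.lr * dw                    # update step
            self.bias -= self.lr * db
        return self

    def predict(self, X):
        return X @ self.weights + self.bias

# Quick check on synthetic data: y = 2x + 1 plus small noise
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
y = 2 * X[:, 0] + 1 + 0.01 * rng.standard_normal(200)
model = LinearRegressionGD().fit(X, y)
```

The recovered `model.weights` and `model.bias` land close to the true 2 and 1, and `model.loss_history` decreases monotonically — exactly the convergence evidence the rubric rewards.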

Common Patterns Across All Algorithms

Every ML algorithm you implement from scratch shares these fundamental patterns:

Parameter initialization
  Set initial weights, often zeros or small random values
  Example: W = np.random.randn(d) * 0.01

Forward pass
  Compute predictions from inputs and current parameters
  Example: y_hat = X @ W + b

Loss computation
  Measure how wrong the predictions are
  Example: loss = np.mean((y_hat - y) ** 2)

Gradient computation
  Compute the direction to update parameters
  Example: dW = X.T @ (y_hat - y) / n

Parameter update
  Move parameters in the negative gradient direction
  Example: W -= lr * dW

Convergence check
  Stop when the loss stops decreasing
  Example: if abs(loss - prev) < tol: break
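
These patterns compose into one short loop. Here is a sketch (all names illustrative) that runs forward pass, loss, gradient, and update, and stops early via the tolerance-based convergence check:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
y = X @ np.array([3.0, -1.0])   # noiseless data with known weights

W = np.zeros(2)                 # parameter initialization
lr, tol = 0.1, 1e-8
prev = np.inf
n = len(y)
for step in range(10_000):
    y_hat = X @ W                       # forward pass
    loss = np.mean((y_hat - y) ** 2)    # loss computation
    if abs(prev - loss) < tol:          # convergence check
        break
    prev = loss
    W -= lr * (X.T @ (y_hat - y) / n)   # gradient + parameter update
```

On this noiseless problem the loop breaks well before the iteration cap, with `W` essentially at the true `[3.0, -1.0]`.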

Algorithms We Will Implement

Here is a preview of every algorithm in this course, with the key mathematical concept behind each:

Course Roadmap:

Lesson 2: Linear Regression
  Math: min ||Xw - y||^2, gradient = X.T(Xw - y)/n
  Variants: Normal equation, Ridge (L2), Lasso (L1), Polynomial

Lesson 3: Logistic Regression
  Math: min -[y*log(p) + (1-y)*log(1-p)], sigmoid(z) = 1/(1+e^-z)
  Variants: Binary, multi-class (softmax), regularized

Lesson 4: Decision Trees
  Math: Information gain = H(parent) - weighted_avg(H(children))
  Variants: Gini impurity, pruning, random forest

Lesson 5: K-Means & KNN
  Math: min sum||x_i - mu_k||^2 (K-means), majority vote (KNN)
  Variants: K-means++, weighted KNN, DBSCAN

Lesson 6: Neural Networks
  Math: Chain rule: dL/dW = dL/da * da/dz * dz/dW
  Variants: MLP, ReLU/sigmoid/tanh, SGD/Adam

Lesson 7: PCA & SVD
  Math: Eigendecomposition of covariance matrix
  Variants: PCA, SVD, explained variance, t-SNE basics

Lesson 8: Implementation Tips
  Common mistakes, numerical stability, testing
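
As a preview of the numerical-stability theme (Lessons 3 and 8), here is a sketch of a sigmoid that avoids `np.exp` overflow and a log-loss guarded by the epsilon clip from the rubric — the function names and piecewise formulation are one common approach, not the only one:

```python
import numpy as np

def sigmoid(z):
    """Overflow-safe sigmoid: never exponentiates a large positive number."""
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))   # safe: exp of a negative value
    ez = np.exp(z[~pos])                        # safe: z is negative here
    out[~pos] = ez / (1.0 + ez)
    return out

def log_loss(y, p):
    """Binary cross-entropy with an epsilon clip to avoid log(0) = -inf."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

z = np.array([-1000.0, 0.0, 1000.0])
p = sigmoid(z)                     # no overflow warnings, even at |z| = 1000
loss = log_loss(np.array([0.0, 1.0, 1.0]), p)   # finite thanks to the clip
```

A naive `1 / (1 + np.exp(-z))` would emit an overflow warning at `z = -1000`, and an unclipped `np.log(p)` would return `-inf` at `p = 0` — both are the exact failure modes listed under "numerical_stability" in the rubric.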

Setting Up Your Environment

The only dependency you need is NumPy. We also use matplotlib for visualization in some examples, but it is never required for the implementations themselves.

# Install NumPy (if not already installed)
# pip install numpy

import numpy as np

# Verify your setup
print(f"NumPy version: {np.__version__}")

# Generate some test data to use throughout the course
np.random.seed(42)

# Regression data: y = 3x1 + 5x2 + noise
n_samples = 100
X = np.random.randn(n_samples, 2)
y = 3 * X[:, 0] + 5 * X[:, 1] + np.random.randn(n_samples) * 0.5

print(f"X shape: {X.shape}")   # (100, 2)
print(f"y shape: {y.shape}")   # (100,)

# Classification data: two clusters
from_class_0 = np.random.randn(50, 2) + np.array([2, 2])
from_class_1 = np.random.randn(50, 2) + np.array([-2, -2])
X_cls = np.vstack([from_class_0, from_class_1])
y_cls = np.array([0]*50 + [1]*50)

print(f"X_cls shape: {X_cls.shape}")   # (100, 2)
print(f"y_cls shape: {y_cls.shape}")   # (100,)
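
As a quick sanity check that the regression data is recoverable, here is a sketch solving for the weights directly with the normal equation (covered properly in Lesson 2; the intercept is omitted since the true bias is zero):

```python
import numpy as np

# Regenerate the same regression data as above
np.random.seed(42)
n_samples = 100
X = np.random.randn(n_samples, 2)
y = 3 * X[:, 0] + 5 * X[:, 1] + np.random.randn(n_samples) * 0.5

# Normal equation: w = (X^T X)^{-1} X^T y, via solve() instead of an
# explicit inverse for better numerical behavior
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)   # close to the true coefficients [3, 5]
```

If your own from-scratch implementations converge to roughly `[3, 5]` on this data, they are working.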

Key Takeaways

💡
  • "Implement X from scratch" is the most common ML coding interview question format
  • Interviewers evaluate mathematical correctness, vectorized code, numerical stability, and code organization
  • Every algorithm follows the same pattern: initialize, forward pass, compute loss, compute gradient, update parameters
  • Use only NumPy — no scikit-learn, no PyTorch, no TensorFlow
  • Start with a clean class skeleton (fit/predict) before filling in the math
  • Track loss history to demonstrate convergence awareness