Implementing ML From Scratch
The "implement X from scratch" question is the gold standard of ML coding interviews. Companies like Google, Meta, Apple, and Amazon use it to test whether you truly understand the algorithms you use daily — or whether you just call sklearn.fit(). This lesson explains why interviewers ask this question, what they evaluate, and the NumPy-only approach we use throughout this course.
Why Interviewers Ask This
When an interviewer says "implement linear regression from scratch without using scikit-learn," they are testing three things simultaneously:
Mathematical Understanding
Can you write down the loss function, derive the gradient, and translate the math into code? This separates engineers who understand ML from those who only use APIs.
Coding Ability
Can you write clean, correct, vectorized NumPy code? Handle edge cases? Structure the code with a fit/predict interface? This tests real engineering skills.
Debugging Intuition
When the model doesn't converge, can you diagnose whether it's a learning rate issue, a gradient bug, or a numerical stability problem? This requires deep understanding.
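To make that diagnostic skill concrete, here is a small illustrative sketch (the data and numbers are my own, not from the course) comparing loss trajectories under two learning rates on a one-feature least-squares problem. A loss that grows every iteration is the classic signature of a learning rate that is too high, as opposed to a gradient bug, which typically produces a loss that stagnates or moves in the wrong direction even at tiny learning rates.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(200, 1)
y = 4.0 * X[:, 0] + np.random.randn(200) * 0.1

def gd_losses(lr, n_iter=20):
    """Run plain gradient descent on MSE and return the loss history."""
    w = np.zeros(1)
    losses = []
    for _ in range(n_iter):
        pred = X @ w
        losses.append(np.mean((pred - y) ** 2))
        grad = X.T @ (pred - y) / len(y)
        w -= lr * grad
    return losses

good = gd_losses(lr=0.1)   # loss shrinks steadily
bad = gd_losses(lr=2.5)    # loss blows up: learning rate too high
print(good[-1] < good[0])  # True: converging
print(bad[-1] > bad[0])    # True: diverging
```

Plotting (or simply printing) the two loss histories side by side is usually enough to tell the two failure modes apart in an interview.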
What Interviewers Evaluate
Based on publicly shared interview experiences, here is the rubric interviewers typically use:
# ML From-Scratch Interview Evaluation Rubric
evaluation = {
    "mathematical_correctness": {
        "weight": "CRITICAL",
        "pass": "Correct loss function, correct gradient derivation",
        "fail": "Wrong gradient, missing regularization term",
        "example": "Knows that dL/dw = (1/n) * X.T @ (X @ w - y)"
    },
    "vectorized_implementation": {
        "weight": "HIGH",
        "pass": "Matrix operations, no loops over samples",
        "fail": "For-loop computing gradient one sample at a time",
        "example": "gradient = X.T @ (predictions - y) / n"
    },
    "numerical_stability": {
        "weight": "HIGH",
        "pass": "Log-sum-exp trick, gradient clipping, epsilon in log",
        "fail": "np.exp(x) overflow, log(0) = -inf",
        "example": "np.log(np.clip(probs, 1e-15, 1 - 1e-15))"
    },
    "code_organization": {
        "weight": "MEDIUM",
        "pass": "Class with fit(), predict(), clean interface",
        "fail": "Single monolithic function, global variables",
        "example": "class LinearRegression: def fit(self, X, y): ..."
    },
    "convergence_awareness": {
        "weight": "MEDIUM",
        "pass": "Tracks loss history, knows when to stop",
        "fail": "Fixed number of iterations, no loss monitoring",
        "example": "if abs(prev_loss - loss) < tol: break"
    }
}
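The numerical-stability row deserves a concrete illustration. Here is a minimal sketch of two of the tricks it mentions — a piecewise sigmoid that never overflows np.exp, and clipping probabilities away from 0 and 1 before taking the log (the function names are my own):

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in np.exp for large |z|."""
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    # For z >= 0, e^{-z} cannot overflow; for z < 0, e^{z} cannot.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def safe_log_loss(y, probs):
    """Binary cross-entropy with probabilities clipped away from 0 and 1."""
    p = np.clip(probs, 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

z = np.array([-1000.0, 0.0, 1000.0])
print(stable_sigmoid(z))  # no overflow warning for extreme inputs
print(safe_log_loss(np.array([1.0]), np.array([0.0])))  # finite, not inf
```

A naive `1 / (1 + np.exp(-z))` would emit an overflow warning at z = -1000, and `np.log(0)` would return -inf and poison every downstream computation.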
The NumPy-Only Approach
Throughout this course, we implement every algorithm using only NumPy. No scikit-learn, no PyTorch, no TensorFlow. This is exactly the constraint you face in interviews.
import numpy as np

# This is ALL you get in the interview:
# - np.array, np.zeros, np.ones, np.eye
# - np.dot, @, np.linalg.norm, np.linalg.inv
# - np.exp, np.log, np.maximum, np.clip
# - np.mean, np.sum, np.std, np.argmax
# - np.random for data generation

# The standard pattern for every algorithm:
class MLAlgorithm:
    """Base pattern for from-scratch implementations."""

    def __init__(self, hyperparameters):
        # Store hyperparameters (learning_rate, n_iterations, etc.)
        self.lr = hyperparameters.get('learning_rate', 0.01)
        self.n_iter = hyperparameters.get('n_iterations', 1000)
        self.weights = None
        self.bias = None
        self.loss_history = []

    def fit(self, X, y):
        """Train the model on data X with labels y."""
        n_samples, n_features = X.shape

        # 1. Initialize parameters
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        # 2. Optimization loop
        for i in range(self.n_iter):
            # Forward pass: compute predictions
            predictions = self._forward(X)

            # Compute loss
            loss = self._compute_loss(predictions, y)
            self.loss_history.append(loss)

            # Backward pass: compute gradients
            dw, db = self._compute_gradients(X, y, predictions)

            # Update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

        return self

    def predict(self, X):
        """Make predictions on new data."""
        return self._forward(X)

    def _forward(self, X):
        raise NotImplementedError

    def _compute_loss(self, predictions, y):
        raise NotImplementedError

    def _compute_gradients(self, X, y, predictions):
        raise NotImplementedError
Interview tip: write the __init__, fit, and predict methods first, before filling in the math. This shows the interviewer you think in terms of clean interfaces, not just raw math. It also gives you a structure to fill in incrementally.
Common Patterns Across All Algorithms
Every ML algorithm you implement from scratch shares these fundamental patterns:
| Pattern | Description | Example |
|---|---|---|
| Parameter initialization | Set initial weights, often zeros or small random values | W = np.random.randn(d) * 0.01 |
| Forward pass | Compute predictions from inputs and current parameters | y_hat = X @ W + b |
| Loss computation | Measure how wrong the predictions are | loss = np.mean((y_hat - y) ** 2) |
| Gradient computation | Compute direction to update parameters | dW = X.T @ (y_hat - y) / n |
| Parameter update | Move parameters in the negative gradient direction | W -= lr * dW |
| Convergence check | Stop when loss stops decreasing | if abs(loss - prev) < tol: break |
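The convergence check is the one pattern the base skeleton above runs without, so here is a hedged sketch of all six patterns in a single loop with tol-based early stopping (the data, learning rate, and tolerance are illustrative choices):

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(300, 3)
y = X @ np.array([1.0, -2.0, 0.5])  # noiseless, so the loss can reach ~0

w = np.zeros(3)                 # parameter initialization
lr, tol = 0.1, 1e-10
prev_loss = np.inf
for i in range(10_000):
    y_hat = X @ w                        # forward pass
    loss = np.mean((y_hat - y) ** 2)     # loss computation
    if abs(prev_loss - loss) < tol:      # convergence check
        break
    prev_loss = loss
    dW = X.T @ (y_hat - y) / len(y)      # gradient computation
    w -= lr * dW                         # parameter update

print(f"stopped after {i} iterations")   # well before the 10_000 cap
print(np.round(w, 3))
```

Mentioning (and implementing) the early-stopping branch is an easy way to hit the "convergence awareness" row of the rubric instead of silently burning a fixed iteration budget.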
Algorithms We Will Implement
Here is a preview of every algorithm in this course, with the key mathematical concept behind each:
Course Roadmap:
Lesson 2: Linear Regression
Math: min ||Xw - y||^2, gradient = X.T(Xw - y)/n
Variants: Normal equation, Ridge (L2), Lasso (L1), Polynomial
Lesson 3: Logistic Regression
Math: min -[y*log(p) + (1-y)*log(1-p)], sigmoid(z) = 1/(1+e^-z)
Variants: Binary, multi-class (softmax), regularized
Lesson 4: Decision Trees
Math: Information gain = H(parent) - weighted_avg(H(children))
Variants: Gini impurity, pruning, random forest
Lesson 5: K-Means & KNN
Math: min sum||x_i - mu_k||^2 (K-means), majority vote (KNN)
Variants: K-means++, weighted KNN, DBSCAN
Lesson 6: Neural Networks
Math: Chain rule: dL/dW = dL/da * da/dz * dz/dW
Variants: MLP, ReLU/sigmoid/tanh, SGD/Adam
Lesson 7: PCA & SVD
Math: Eigendecomposition of covariance matrix
Variants: PCA, SVD, explained variance, t-SNE basics
Lesson 8: Implementation Tips
Common mistakes, numerical stability, testing
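As a small taste of the math column, the Lesson 2 objective min ||Xw - y||^2 also has a closed-form solution, the normal equation w = (X^T X)^{-1} X^T y. A quick sketch on synthetic data (covered properly in Lesson 2; this is only a preview):

```python
import numpy as np

np.random.seed(42)
X = np.random.randn(100, 2)
true_w = np.array([3.0, 5.0])
y = X @ true_w + np.random.randn(100) * 0.5

# Normal equation: solve (X^T X) w = X^T y.
# np.linalg.solve is preferred over explicitly inverting X^T X,
# for both numerical stability and speed.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(w, 1))  # close to [3. 5.]
```

Interviewers often ask for both routes: the closed form as a warm-up, then gradient descent to show you can handle the iterative case where no closed form exists.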
Setting Up Your Environment
The only dependency you need is NumPy. We also use matplotlib for visualization in some examples, but it is never required for the implementations themselves.
# Install NumPy (if not already installed)
# pip install numpy
import numpy as np
# Verify your setup
print(f"NumPy version: {np.__version__}")
# Generate some test data to use throughout the course
np.random.seed(42)
# Regression data: y = 3x1 + 5x2 + noise
n_samples = 100
X = np.random.randn(n_samples, 2)
y = 3 * X[:, 0] + 5 * X[:, 1] + np.random.randn(n_samples) * 0.5
print(f"X shape: {X.shape}") # (100, 2)
print(f"y shape: {y.shape}") # (100,)
# Classification data: two clusters
from_class_0 = np.random.randn(50, 2) + np.array([2, 2])
from_class_1 = np.random.randn(50, 2) + np.array([-2, -2])
X_cls = np.vstack([from_class_0, from_class_1])
y_cls = np.array([0]*50 + [1]*50)
print(f"X_cls shape: {X_cls.shape}") # (100, 2)
print(f"y_cls shape: {y_cls.shape}") # (100,)
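As an optional sanity check that the two clusters really are separable, here is a nearest-centroid classifier sketch in the style of Lesson 5 (it regenerates similar classification data under its own seed so it runs standalone):

```python
import numpy as np

np.random.seed(0)  # separate seed; same shapes as the setup above
class_0 = np.random.randn(50, 2) + np.array([2, 2])
class_1 = np.random.randn(50, 2) + np.array([-2, -2])
X_cls = np.vstack([class_0, class_1])
y_cls = np.array([0] * 50 + [1] * 50)

# Nearest-centroid classification: assign each point to the closer class mean
centroids = np.array([X_cls[y_cls == k].mean(axis=0) for k in (0, 1)])
dists = np.linalg.norm(X_cls[:, None, :] - centroids[None, :, :], axis=2)
preds = np.argmin(dists, axis=1)  # index of the nearest centroid
accuracy = np.mean(preds == y_cls)
print(accuracy)  # high: the centroids are far apart relative to the noise
```

The cluster centers are separated by sqrt(32) ≈ 5.7 while each dimension has unit standard deviation, so even this trivial classifier should score near 100% — a useful baseline to keep in mind before reaching for anything fancier.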
Key Takeaways
- "Implement X from scratch" is the most common ML coding interview question format
- Interviewers evaluate mathematical correctness, vectorized code, numerical stability, and code organization
- Every algorithm follows the same pattern: initialize, forward pass, compute loss, compute gradient, update parameters
- Use only NumPy — no scikit-learn, no PyTorch, no TensorFlow
- Start with a clean class skeleton (fit/predict) before filling in the math
- Track loss history to demonstrate convergence awareness
Lilly Tech Systems