Advanced Optimization
Optimization is where calculus meets practice. Every trained ML model is the result of an optimization algorithm minimizing a loss function. Understanding optimization helps you choose the right algorithm, set learning rates, and diagnose training issues.
Gradient Descent
The simplest optimization algorithm: repeatedly step in the direction of the negative gradient.
```python
import numpy as np

def gradient_descent(gradient_fn, x0, lr=0.01, n_steps=100):
    """Minimize a function by following its negative gradient."""
    x = x0.copy()
    history = [x.copy()]
    for _ in range(n_steps):
        grad = gradient_fn(x)
        x = x - lr * grad
        history.append(x.copy())
    return x, history

# Minimize f(x, y) = x^2 + 2y^2
grad_fn = lambda w: np.array([2 * w[0], 4 * w[1]])
result, _ = gradient_descent(grad_fn, np.array([5.0, 3.0]), lr=0.1)
print("Minimum at:", result)  # Close to [0, 0]
```
Gradient Descent Variants
| Variant | Batch Size | Pros | Cons |
|---|---|---|---|
| Batch GD | Full dataset | Stable convergence | Slow for large datasets |
| Stochastic GD | 1 sample | Fast updates, escapes local minima | Noisy, unstable |
| Mini-batch GD | 32-512 samples | Best of both worlds | Batch size is a hyperparameter |
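To make the mini-batch variant concrete, here is a minimal sketch (not from the article; the toy linear-regression data and the `minibatch_sgd` helper are made up for illustration). Each epoch reshuffles the data and updates the weights once per batch:

```python
import numpy as np

# Toy data: y = X @ w_true + noise, so the optimum is near w_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
w_true = np.array([3.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=1000)

def minibatch_sgd(X, y, lr=0.05, batch_size=64, n_epochs=20):
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_epochs):
        idx = rng.permutation(n)                 # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # MSE gradient on the batch
            w -= lr * grad
    return w

w = minibatch_sgd(X, y)
print("Learned weights:", w)  # close to [3.0, -1.0]
```

Note the trade-off from the table: each update sees only 64 samples, so the gradient is noisy, but the weights still converge close to the true values.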
Learning Rate
The learning rate is the most important hyperparameter in optimization:
Learning Rate Effects:
- Too large: Overshoots the minimum, loss diverges
- Too small: Converges too slowly, may get stuck
- Just right: Steady convergence to a good minimum
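The three regimes above can be verified analytically on the 1-D function f(x) = x², where the update x ← x − lr · 2x = (1 − 2·lr)·x converges exactly when |1 − 2·lr| < 1. A small sketch (the `run` helper is hypothetical, for illustration only):

```python
def run(lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = x^2 (gradient 2x) and return final |x|."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return abs(x)

print(run(0.45))   # just right: |1 - 0.9| = 0.1 per step, rapid convergence to 0
print(run(0.001))  # too small: |1 - 0.002| = 0.998 per step, barely moved
print(run(1.1))    # too large: |1 - 2.2| = 1.2 per step, |x| grows each step
```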
Challenges in Optimization
- Local minima: Non-convex loss surfaces have many local minima. SGD noise helps escape shallow ones.
- Saddle points: In high dimensions, saddle points are more common than local minima. Momentum helps pass through them.
- Plateaus: Flat regions where gradients are near zero. Adaptive methods like Adam handle these well.
- Ill-conditioning: When the loss surface is much steeper in some directions than others. Preconditioning or adaptive rates help.
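The last two points can be illustrated together with a small sketch (the `descend` helper and the specific quadratic are assumptions for illustration). On the ill-conditioned quadratic f(x, y) = x² + 100y², plain GD must keep the learning rate small to stay stable along the steep y direction, so it crawls along the shallow x direction; momentum accumulates velocity there and finishes much closer to the optimum:

```python
import numpy as np

def descend(grad_fn, x0, lr, beta=0.0, n_steps=200):
    """Gradient descent with heavy-ball momentum; beta=0 recovers plain GD."""
    x = x0.copy()
    v = np.zeros_like(x)
    for _ in range(n_steps):
        v = beta * v + grad_fn(x)  # accumulate velocity
        x = x - lr * v
    return x

# f(x, y) = x^2 + 100*y^2: curvatures 2 and 200, condition number 100.
grad = lambda w: np.array([2 * w[0], 200 * w[1]])
x0 = np.array([10.0, 1.0])

plain = descend(grad, x0, lr=0.009)            # lr limited by the steep direction
momentum = descend(grad, x0, lr=0.009, beta=0.9)
print("plain GD distance:  ", np.linalg.norm(plain))
print("momentum distance:  ", np.linalg.norm(momentum))  # much smaller
```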
Next Up: Best Practices
Learn practical tips for gradient checking, choosing learning rates, and avoiding common calculus pitfalls in ML.