Convex Optimization (Intermediate)
Convex optimization is the gold standard of optimization: when your problem is convex, any local minimum is also a global minimum, and gradient descent (with a suitable step size) is guaranteed to converge to it. Many classical ML models (linear regression, logistic regression, SVMs) have convex training objectives. Understanding convexity helps you know when optimization is "easy" and when it is fundamentally hard.
What is Convexity?
A function f is convex if the line segment between any two points on its graph lies on or above the graph: f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for all x, y and t in [0, 1]. Intuitively, a convex function is "bowl-shaped", with no bumps or separate valleys:
```python
import numpy as np

# Convex functions (bowl-shaped, one minimum)
f_convex_1 = lambda x: x**2        # Quadratic
f_convex_2 = lambda x: np.abs(x)   # Absolute value
f_convex_3 = lambda x: np.exp(x)   # Exponential

# Non-convex function (multiple minima)
f_nonconvex = lambda x: np.sin(x) + x**2 / 10  # Has local minima

# Numerically check the convexity inequality along the segment [x, y]:
# f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for all t in [0, 1]
def check_convex(f, x, y, n_points=100):
    for t in np.linspace(0, 1, n_points):
        lhs = f(t * x + (1 - t) * y)
        rhs = t * f(x) + (1 - t) * f(y)
        if lhs > rhs + 1e-10:
            return False
    return True
```
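A quick way to exercise the helper above (a sketch; the interval endpoints below are arbitrary test choices, not anything prescribed by the lesson):

```python
# Sanity-check the functions defined above on a few segments
print(check_convex(f_convex_1, -3.0, 5.0))   # True  -- x^2 is convex
print(check_convex(f_convex_2, -2.0, 4.0))   # True  -- |x| is convex
print(check_convex(f_nonconvex, 0.5, 2.5))   # False -- the curve rises above the chord here
```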
Convex ML Problems
| Algorithm | Convex? | Implication |
|---|---|---|
| Linear Regression (MSE) | Yes | Closed-form solution exists, GD always converges |
| Logistic Regression | Yes | Global optimum guaranteed |
| SVM (linear) | Yes | Unique solution via quadratic programming |
| Lasso / Ridge | Yes | Regularized + convex = well-behaved |
| Neural Networks | No | Multiple local minima, no convergence guarantee |
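To make the table's first row concrete, here is a minimal sketch (the toy data and the weight-space segment are made up purely for illustration, and it reuses the `check_convex` helper from the earlier snippet): the MSE loss of linear regression is convex in the weights, and the normal equations give the closed-form minimizer.

```python
import numpy as np

# Toy regression data (assumption: any small synthetic set works here)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# MSE loss as a function of the weight vector w -- convex in w
mse = lambda w: np.mean((X @ w - y) ** 2)

# Closed-form (normal-equation) solution the table refers to
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Convexity holds along any segment in weight space; check one segment,
# reusing check_convex from the earlier snippet
print(check_convex(mse, np.zeros(3), np.ones(3)))  # True
print(w_closed)                                    # close to [1.0, -2.0, 0.5]
```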
Constrained Optimization
Many ML problems have constraints (e.g., the SVM margin conditions, or probabilities that must lie on the simplex). Lagrange multipliers fold the constraints into the objective, so a constrained problem can be attacked with unconstrained machinery; in practice, you can also hand the constraints directly to a numerical solver:
```python
from scipy.optimize import minimize

# Constrained optimization: minimize f(x) subject to constraints
# Minimize:   x1^2 + x2^2
# Subject to: x1 + x2 = 1   (equality constraint)
result = minimize(
    fun=lambda x: x[0]**2 + x[1]**2,
    x0=[0.5, 0.5],
    constraints={'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1},
)
print("Solution:", result.x)  # [0.5, 0.5]
```
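For the same small problem, the Lagrange-multiplier route can also be written out symbolically. This is a sketch assuming SymPy is available; `lam` is just the variable name chosen here for the multiplier λ:

```python
import sympy as sp

# Lagrangian for: minimize x1^2 + x2^2  subject to  x1 + x2 = 1
x1, x2, lam = sp.symbols('x1 x2 lam')
L = x1**2 + x2**2 + lam * (x1 + x2 - 1)

# Stationarity: all partial derivatives of the Lagrangian vanish
solution = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam])
print(solution)  # {x1: 1/2, x2: 1/2, lam: -1}
```

Setting the partial derivatives to zero recovers exactly the point SciPy found numerically, with the multiplier value as a by-product.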
Next Up: Hyperparameter Tuning
Learn systematic methods for finding the best learning rates, batch sizes, and model configurations.