Convex Optimization (Intermediate)

Convex optimization is the gold standard of optimization: when your problem is convex, any local minimum is also a global minimum, and gradient descent with a suitable step size is guaranteed to converge to it. Many classical ML algorithms (linear regression, logistic regression, SVMs) solve convex problems. Understanding convexity helps you know when optimization is "easy" and when it is fundamentally hard.

What is Convexity?

A function f is convex if the line segment between any two points on its graph lies on or above the graph. Intuitively, a convex function is "bowl-shaped", with no bumps or secondary valleys:

Python
import numpy as np

# Convex functions (bowl-shaped, one minimum)
f_convex_1 = lambda x: x**2            # Quadratic
f_convex_2 = lambda x: np.abs(x)       # Absolute value
f_convex_3 = lambda x: np.exp(x)       # Exponential

# Non-convex functions (multiple minima)
f_nonconvex = lambda x: np.sin(x) + x**2/10  # Has local minima

# Numerically spot-check convexity along the segment from x to y:
# f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) must hold for all t in [0, 1].
# Note: this only tests one segment, so True is evidence of convexity,
# not a proof.
def check_convex(f, x, y, n_points=100):
    for t in np.linspace(0, 1, n_points):
        lhs = f(t * x + (1 - t) * y)
        rhs = t * f(x) + (1 - t) * f(y)
        if lhs > rhs + 1e-10:  # small tolerance for floating-point error
            return False
    return True
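As a quick sanity check, the segment test agrees with the shapes above (functions re-defined here so the snippet is self-contained; the endpoints are illustrative choices):

```python
import numpy as np

f_convex = lambda x: x**2
f_nonconvex = lambda x: np.sin(x) + x**2 / 10

def check_convex(f, x, y, n_points=100):
    # Test f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) along the segment
    for t in np.linspace(0, 1, n_points):
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + 1e-10:
            return False
    return True

print(check_convex(f_convex, -3.0, 5.0))      # True
print(check_convex(f_nonconvex, 0.0, np.pi))  # False: inequality fails mid-segment
```

The non-convex example fails near t = 0.5, where the graph of sin(x) + x²/10 bulges above the chord from 0 to π.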

Convex ML Problems

| Algorithm | Convex? | Implication |
|---|---|---|
| Linear Regression (MSE) | Yes | Closed-form solution exists; GD always converges |
| Logistic Regression | Yes | Global optimum guaranteed |
| SVM (linear) | Yes | Unique solution via quadratic programming |
| Lasso / Ridge | Yes | Regularized + convex = well-behaved |
| Neural Networks | No | Multiple local minima; no convergence guarantee |
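Because the MSE loss is convex, gradient descent and the closed-form normal equations should land on the same weights. A minimal sketch on synthetic data (the step size and iteration count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

# Closed-form solution: solve the normal equations (X^T X) w = X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the convex MSE loss
w = np.zeros(3)
lr = 0.01
for _ in range(5000):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad

print(np.allclose(w, w_closed, atol=1e-4))  # True: both reach the global optimum
```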

Constrained Optimization

Many ML problems have constraints (e.g., the SVM margin, or weights on a probability simplex). Lagrange multipliers turn a constrained problem into finding a stationary point of an unconstrained Lagrangian:

Python
from scipy.optimize import minimize

# Constrained optimization: minimize f(x) subject to constraints
# Minimize: x1^2 + x2^2
# Subject to: x1 + x2 = 1 (equality constraint)

result = minimize(
    fun=lambda x: x[0]**2 + x[1]**2,
    x0=[0.5, 0.5],
    constraints={'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1}
)
print("Solution:", result.x)  # ≈ [0.5, 0.5]
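The same problem can also be solved by hand: setting the gradient of the Lagrangian L(x, λ) = x1² + x2² − λ(x1 + x2 − 1) to zero, together with the constraint, gives a linear system. A sketch of solving it directly with NumPy:

```python
import numpy as np

# Stationarity of L(x, lam) = x1^2 + x2^2 - lam*(x1 + x2 - 1):
#   2*x1        - lam = 0
#          2*x2 - lam = 0
#   x1 + x2           = 1   (the original constraint)
A = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)  # 0.5 0.5 1.0
```

This matches scipy's answer, as it should: for a convex objective with a linear equality constraint, the stationary point of the Lagrangian is the global constrained minimum.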
Why Does Non-Convex Work?

Neural networks are non-convex, yet they train well in practice. Research suggests that in high dimensions, most critical points are saddle points rather than local minima, and the local minima that do exist tend to have similar loss values. The noise in SGD also helps escape bad regions.

Next Up: Hyperparameter Tuning

Learn systematic methods for finding the best learning rates, batch sizes, and model configurations.
