Certified Adversarial Robustness Advanced
Empirical defenses like adversarial training provide no mathematical guarantees: a stronger attack could always be found. Certified defenses, by contrast, provide provable robustness guarantees: a mathematical proof that no perturbation within a specified bound can change the model's prediction. This lesson covers the leading approaches to certified robustness.
Why Certification Matters
Empirical robustness evaluation has a fundamental limitation: you can only test against known attacks. A model that resists FGSM, PGD, and C&W might still be vulnerable to a novel attack. Certified defenses solve this by providing guarantees that hold against any attack within the certified radius.
| Approach | Guarantee Type | Scalability | Certified Radius |
|---|---|---|---|
| Randomized Smoothing | Probabilistic (L2) | Scales to ImageNet | Moderate |
| Interval Bound Propagation | Deterministic (L-inf) | Moderate (small networks) | Small-Moderate |
| Lipschitz Networks | Deterministic (L2) | Good | Moderate |
| Formal Verification | Exact | Limited (very small networks) | Exact |
Randomized Smoothing
Randomized smoothing (Cohen et al., 2019) is the most scalable certified defense. It creates a smoothed classifier by averaging predictions over Gaussian noise added to the input:
```python
import torch
from scipy.stats import norm


class SmoothedClassifier:
    """Randomized smoothing for certified robustness."""

    def __init__(self, base_model, sigma, num_samples=1000):
        self.model = base_model
        self.sigma = sigma
        self.num_samples = num_samples

    def predict_and_certify(self, x):
        """Predict the smoothed class and compute a certified L2 radius."""
        with torch.no_grad():
            # Sample noisy versions of the input
            noise = torch.randn(self.num_samples, *x.shape) * self.sigma
            noisy_inputs = x.unsqueeze(0) + noise

            # Count predictions for each class
            predictions = self.model(noisy_inputs).argmax(dim=1)
            counts = torch.bincount(predictions)

        # Top class and its empirical probability
        top_class = counts.argmax().item()
        p_a = counts[top_class].item() / self.num_samples

        # Certified radius (L2 norm): sigma * Phi^{-1}(p_a) when p_a > 0.5.
        # Note: this simplified version uses the empirical p_a directly;
        # the full procedure of Cohen et al. (2019) uses a lower confidence
        # bound on p_a, estimated from a separate set of noise samples.
        if p_a > 0.5:
            radius = self.sigma * norm.ppf(p_a)
        else:
            radius = 0.0  # Abstain: cannot certify
        return top_class, radius
```
The certified radius tells you: "No L2 perturbation smaller than this radius can change the predicted class." This is a mathematically proven guarantee, not just an empirical observation.
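To make the guarantee concrete, here is a small worked example of the radius formula R = sigma * Phi^{-1}(p_A). The noise level and top-class probability below are illustrative values, not results from a real model:

```python
from scipy.stats import norm

# Certified L2 radius from randomized smoothing (Cohen et al., 2019):
# R = sigma * Phi^{-1}(p_A), valid only when p_A > 0.5.
sigma = 0.5   # Gaussian noise level used for smoothing (example value)
p_a = 0.99    # estimated probability of the top class (example value)

radius = sigma * norm.ppf(p_a)
print(f"certified L2 radius: {radius:.3f}")  # about 1.163
```

Note the trade-off the formula encodes: a larger sigma yields larger radii, but heavier noise also lowers the base model's accuracy and hence p_A.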
Interval Bound Propagation (IBP)
IBP propagates intervals (lower and upper bounds) through each layer of the network to compute guaranteed output bounds for any input within an epsilon-ball:
- Start with the input range [x - epsilon, x + epsilon]
- Propagate bounds through each layer (linear, ReLU, etc.)
- If the lower bound of the true class's logit exceeds the upper bounds of all other classes' logits, the prediction is certified for the entire epsilon-ball
- Train with verified bounds to improve certification rates
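The steps above can be sketched for a small fully connected network. This is a minimal illustration, not a production verifier; the function names and the two-layer architecture are assumptions for the example:

```python
import torch


def ibp_linear(l, u, weight, bias):
    """Propagate interval bounds [l, u] through y = x @ weight.T + bias."""
    w_pos = weight.clamp(min=0)  # positive weights keep bound order
    w_neg = weight.clamp(max=0)  # negative weights swap the bounds
    new_l = l @ w_pos.T + u @ w_neg.T + bias
    new_u = u @ w_pos.T + l @ w_neg.T + bias
    return new_l, new_u


def ibp_relu(l, u):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return l.clamp(min=0), u.clamp(min=0)


def certify(x, epsilon, w1, b1, w2, b2, true_class):
    """Certify a two-layer ReLU network on the L-inf ball around x."""
    l, u = x - epsilon, x + epsilon                # input interval
    l, u = ibp_relu(*ibp_linear(l, u, w1, b1))     # hidden layer
    l, u = ibp_linear(l, u, w2, b2)                # output logits
    # Certified iff the true logit's lower bound beats every other upper bound
    others = [u[c] for c in range(len(u)) if c != true_class]
    return all(l[true_class] > o for o in others)
```

Because the bounds are propagated layer by layer, they are sound but loose; IBP-trained networks are specifically optimized so that these intervals stay tight enough to certify.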
Lipschitz-Constrained Networks
Constrain the Lipschitz constant of the network so that small input changes produce bounded output changes. If the Lipschitz constant is L, then for any perturbation delta: ||f(x + delta) - f(x)|| ≤ L * ||delta||. This directly limits how much an adversary can change the output, and turns the margin between the top two logits into a certified radius.
Robustness Benchmarks
| Benchmark | Dataset | Metric |
|---|---|---|
| RobustBench | CIFAR-10, CIFAR-100, ImageNet | AutoAttack accuracy at various epsilon |
| AutoAttack | Any classification dataset | Ensemble of strong attacks for reliable evaluation |
| Certified Accuracy | CIFAR-10, ImageNet | Percentage of correctly classified and certified samples |
Ready for Best Practices?
The final lesson brings everything together with evaluation protocols, benchmarking guidelines, and practical advice for deploying robust models.
Next: Best Practices →
Lilly Tech Systems