Advanced Side-Channel Attacks
Even when API outputs are restricted, attackers can extract model information through indirect signals like timing, cache behavior, power consumption, and statistical analysis of responses.
Timing Side Channels
Response time variations can reveal model architecture and decision paths:
- Architecture inference: Different model architectures (CNN vs. Transformer) have characteristic processing time profiles
- Depth estimation: Deeper models take longer, and timing differences between inputs reveal computational paths
- Early exit detection: Models with early-exit mechanisms show faster responses for "easy" inputs
- Batch processing: Timing can reveal batch sizes and queue depths, aiding extraction planning
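The early-exit signal above can be illustrated with a toy experiment. The latency model below is an assumption for demonstration (simulated per-layer cost plus Gaussian network jitter), not a measurement of any real system:

```python
import random
import statistics

def simulated_response_time(uses_early_exit: bool, rng: random.Random) -> float:
    """Toy latency model (milliseconds): a full forward pass through 12
    layers vs. an early exit after 4 layers, plus network jitter."""
    layers = 4 if uses_early_exit else 12
    return layers * 2.0 + rng.gauss(0.0, 0.5)

def mean_latency(uses_early_exit: bool, n: int = 200, seed: int = 0) -> float:
    """Average many queries so jitter washes out of the estimate."""
    rng = random.Random(seed)
    return statistics.mean(
        simulated_response_time(uses_early_exit, rng) for _ in range(n)
    )

easy = mean_latency(True)   # "easy" inputs take the early-exit path
hard = mean_latency(False)  # "hard" inputs run every layer
gap = hard - easy           # a large, stable gap reveals the exit mechanism
```

Averaging over repeated queries is the standard trick: per-query jitter is large, but the mean converges quickly, so even small architectural timing differences become statistically visible.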
Membership Inference Attacks
Determine whether a specific data point was used to train the target model:
Python - Membership Inference Attack
```python
class MembershipInferenceAttack:
    def __init__(self, target_model_api, shadow_models):
        self.target = target_model_api
        self.shadows = shadow_models
        self.attack_model = self.train_attack_model()

    def train_attack_model(self):
        """Train binary classifier: member vs non-member."""
        features, labels = [], []
        for shadow in self.shadows:
            # Members: training data of shadow model
            for x in shadow.train_data:
                pred = shadow.predict_proba(x)
                features.append(self.extract_features(pred))
                labels.append(1)  # member
            # Non-members: held-out data
            for x in shadow.test_data:
                pred = shadow.predict_proba(x)
                features.append(self.extract_features(pred))
                labels.append(0)  # non-member
        return train_classifier(features, labels)

    def infer(self, sample) -> bool:
        """Was this sample in the target model's training data?"""
        pred = self.target.predict_proba(sample)
        features = self.extract_features(pred)
        return self.attack_model.predict(features) == 1
```
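The attack above calls two helpers it does not define. A minimal stand-in sketch, assuming confidence-and-entropy features and a single-threshold decision rule (a real attack would train a proper classifier, e.g. logistic regression, on these features):

```python
import math

def extract_features(pred):
    """Summarize a probability vector into membership signals: members
    tend to get sharper (high-confidence, low-entropy) predictions."""
    top = max(pred)
    entropy = -sum(p * math.log(p + 1e-12) for p in pred)
    return [top, entropy]

def train_classifier(features, labels):
    """Toy 'attack model': pick the top-confidence threshold that best
    separates member (1) from non-member (0) shadow examples."""
    pairs = sorted(zip((f[0] for f in features), labels))
    best_t, best_acc = 0.5, 0.0
    for t, _ in pairs:
        acc = sum((conf >= t) == bool(y) for conf, y in pairs) / len(pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc

    class ThresholdClassifier:
        def __init__(self, t):
            self.t = t
        def predict(self, feats):
            return int(feats[0] >= self.t)

    return ThresholdClassifier(best_t)
```

The intuition is the core of membership inference: overfitted models are systematically more confident on data they memorized, and that confidence gap is learnable from shadow models.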
Model Inversion
Model inversion reconstructs representative training data from model outputs. This is especially concerning for models trained on sensitive data:
| Attack Type | Target | Privacy Risk |
|---|---|---|
| Feature reconstruction | Image classifiers | Reconstruct faces from a facial recognition model |
| Attribute inference | Tabular models | Infer sensitive attributes (income, health) from predictions |
| Text memorization | Language models | Extract training data verbatim (phone numbers, addresses) |
| Gradient leakage | Federated learning | Reconstruct training images from shared gradients |
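Feature reconstruction can be sketched in its simplest white-box form: gradient ascent on the target class score, in the spirit of Fredrikson-style inversion. Everything here (`invert_class`, the linear target, all hyperparameters) is an illustrative assumption, far simpler than attacks on real classifiers:

```python
import numpy as np

def invert_class(grad_logit, dim, steps=200, lr=0.1, lam=0.01):
    """Gradient-ascent model inversion: find an input that maximizes the
    target class score, i.e. a 'typical' member of that class.
    lam is a small weight-decay term that keeps the input bounded."""
    x = np.zeros(dim)
    for _ in range(steps):
        x += lr * (grad_logit(x) - lam * x)  # ascend score, stay small
    return x

# Toy target: a linear scorer whose weights ARE the class template,
# so a successful inversion should recover the direction of w.
w = np.array([1.0, -2.0, 0.5])
recovered = invert_class(lambda x: w, dim=3)
```

For this linear toy the recovered input is exactly proportional to `w`, which is why the table flags facial-recognition models: the "class template" there is a recognizable face.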
Hyperparameter Extraction
Attackers can infer model hyperparameters through carefully designed queries:
- Learning rate: Analyzed through model update behavior in online learning systems
- Regularization strength: Inferred from the model's generalization behavior on synthetic data
- Architecture type: Determined by analyzing response patterns to structured inputs
- Input preprocessing: Revealed by how the model handles edge cases, special characters, and out-of-range values
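The last point can be probed concretely with out-of-range inputs. A toy black-box sketch, assuming two hypothetical pipelines (`clip_model`, `scale_model`) whose valid range is [0, 1]; the decision rule is simplistic but shows the principle:

```python
import numpy as np

def probe_preprocessing(predict, out_of_range=1e6):
    """Black-box probe: if an absurdly large input yields the same output
    as the valid-range maximum, the pipeline likely clips its inputs;
    if the output still changes, it likely rescales them instead."""
    hi = predict(np.array([out_of_range]))
    edge = predict(np.array([1.0]))  # assumed valid-range maximum
    return "clipping" if np.allclose(hi, edge) else "scaling"

# Two toy pipelines over the same linear model
w = 2.0
clip_model = lambda x: w * np.clip(x, 0.0, 1.0)
scale_model = lambda x: w * (x / (1.0 + np.abs(x)))
```

A handful of such edge-case queries (NaN, negatives, huge values, special characters for text models) often distinguishes clipping, normalization, and rejection behavior without any documentation.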
Defense implication: Side-channel attacks highlight that security cannot focus solely on direct API outputs. Response timing must be normalized, detailed error messages suppressed, and confidence scores rounded or perturbed to limit information leakage.
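The score-rounding-and-perturbation defense can be sketched in a few lines. `harden_output` and its parameters are assumptions for illustration, not a specific library API:

```python
import numpy as np

def harden_output(probs, decimals=2, noise_scale=0.01, rng=None):
    """Defensive post-processing for confidence scores: add small noise,
    then round, so each response leaks far fewer bits to an attacker
    aggregating many queries."""
    rng = rng or np.random.default_rng(0)
    noisy = probs + rng.normal(0.0, noise_scale, size=len(probs))
    noisy = np.clip(noisy, 0.0, None)
    noisy = noisy / noisy.sum()       # re-normalize to a distribution
    return np.round(noisy, decimals)

hardened = harden_output(np.array([0.8731, 0.0912, 0.0357]))
```

The same idea applies to the other channels: pad every response to a fixed latency budget rather than returning as soon as computation finishes, and replace detailed error messages with a single generic failure response.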
Lilly Tech Systems