Advanced

Side-Channel Attacks

Even when API outputs are restricted, attackers can extract model information through indirect signals like timing, cache behavior, power consumption, and statistical analysis of responses.

Timing Side Channels

Response time variations can reveal model architecture and decision paths:

  • Architecture inference: Different model architectures (CNN vs. Transformer) have characteristic processing time profiles
  • Depth estimation: Deeper models take longer, and timing differences between inputs reveal computational paths
  • Early exit detection: Models with early-exit mechanisms show faster responses for "easy" inputs
  • Batch processing: Timing can reveal batch sizes and queue depths, aiding extraction planning
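The early-exit signal in particular can be probed with nothing more than repeated queries and a stopwatch. A minimal sketch, assuming only black-box access; `toy_model` is a hypothetical stand-in for the remote API, with an artificial input-dependent depth:

```python
import statistics
import time

def measure_latency(model_fn, x, trials=30):
    """Query repeatedly and return the median latency, which is more
    robust to scheduler noise than the mean."""
    samples = []
    for _ in range(trials):
        t0 = time.perf_counter()
        model_fn(x)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

def likely_early_exit(model_fn, easy_inputs, hard_inputs, ratio=0.7):
    """Heuristic: if 'easy' inputs are consistently much faster, the model
    probably has an early-exit (input-dependent) execution path."""
    easy = statistics.median(measure_latency(model_fn, x) for x in easy_inputs)
    hard = statistics.median(measure_latency(model_fn, x) for x in hard_inputs)
    return easy < ratio * hard

# Hypothetical target whose effective depth depends on input "difficulty".
def toy_model(x):
    layers = 2 if x < 0.5 else 20  # early exit for easy inputs
    acc = x
    for _ in range(layers * 2000):
        acc = (acc * 1.000001) % 1.0
    return acc
```

With the toy model above, `likely_early_exit(toy_model, [0.1, 0.2, 0.3], [0.6, 0.7, 0.8])` flags the timing gap; against a real API the same logic applies, only with network jitter forcing more trials.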

Membership Inference Attacks

Determine whether a specific data point was used to train the target model:

Python - Membership Inference Attack
from sklearn.linear_model import LogisticRegression

class MembershipInferenceAttack:
    def __init__(self, target_model_api, shadow_models):
        self.target = target_model_api
        self.shadows = shadow_models
        self.attack_model = self.train_attack_model()

    @staticmethod
    def extract_features(pred):
        """Sort the probability vector: the attack keys on confidence
        shape, not on which class was predicted."""
        return sorted(pred, reverse=True)

    def train_attack_model(self):
        """Train a binary classifier: member vs. non-member."""
        features, labels = [], []
        for shadow in self.shadows:
            # Members: the shadow model's own training data
            for x in shadow.train_data:
                features.append(self.extract_features(shadow.predict_proba(x)))
                labels.append(1)  # member
            # Non-members: held-out data the shadow never saw
            for x in shadow.test_data:
                features.append(self.extract_features(shadow.predict_proba(x)))
                labels.append(0)  # non-member
        return LogisticRegression().fit(features, labels)

    def infer(self, sample) -> bool:
        """Was this sample in the target model's training data?"""
        features = self.extract_features(self.target.predict_proba(sample))
        return self.attack_model.predict([features])[0] == 1
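
Shadow models are the heavyweight version of the attack. A much simpler baseline thresholds the target's confidence on the sample's true label, exploiting the fact that overfit models are systematically more confident on training members. A sketch; `fake_proba` is a hypothetical stand-in for a target that has memorized one sample, and the 0.9 threshold is an assumption that a real attack would calibrate on shadow models:

```python
def confidence_attack(predict_proba, sample, true_label, threshold=0.9):
    """Guess 'member' when the model is unusually confident on the
    sample's true label."""
    return predict_proba(sample)[true_label] >= threshold

# Hypothetical target that has memorized one training point.
_MEMORIZED = {(1, 2): [0.01, 0.98, 0.01]}

def fake_proba(x):
    return _MEMORIZED.get(tuple(x), [0.40, 0.35, 0.25])
```

The baseline needs no attack-model training at all, which is why it is often run first as a sanity check before investing in shadow models.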

Model Inversion

Model inversion reconstructs representative training data from model outputs. This is especially concerning for models trained on sensitive data:

Attack Type              Target               Privacy Risk
-----------              ------               ------------
Feature reconstruction   Image classifiers    Reconstruct faces from a facial recognition model
Attribute inference      Tabular models       Infer sensitive attributes (income, health) from predictions
Text memorization        Language models      Extract training data verbatim (phone numbers, addresses)
Gradient leakage         Federated learning   Reconstruct training images from shared gradients
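
The feature-reconstruction row can be sketched end to end: start from a neutral input and climb the model's confidence surface for the target class by gradient ascent. The snippet below uses finite differences so it needs no ML framework; `toy_conf` and `TEMPLATE` are hypothetical stand-ins for the attacked model and the sensitive training datum it has absorbed:

```python
import math

def invert(model_conf, dims, target_class, steps=200, lr=0.5, eps=1e-4):
    """Reconstruct an input that maximizes model_conf(x, target_class).
    Finite-difference gradients keep this framework-free; real attacks
    use autograd on a surrogate model for speed."""
    x = [0.5] * dims  # neutral starting point
    for _ in range(steps):
        grad = []
        for i in range(dims):
            xp = x[:]
            xp[i] += eps
            grad.append((model_conf(xp, target_class)
                         - model_conf(x, target_class)) / eps)
        # Ascend, clamping to the valid input range [0, 1]
        x = [min(1.0, max(0.0, xi + lr * g)) for xi, g in zip(x, grad)]
    return x

# Hypothetical target: confidence peaks at a secret training "template",
# standing in for, e.g., the average face of one enrolled identity.
TEMPLATE = [0.8, 0.2, 0.6]

def toy_conf(x, target_class):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, TEMPLATE)))
```

Running `invert(toy_conf, 3, target_class=0)` recovers an input close to `TEMPLATE`, which is exactly the privacy failure the table describes: the "reconstruction" is whatever the model's confidence surface remembers about its training data.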

Hyperparameter Extraction

Attackers can infer model hyperparameters through carefully designed queries:

  • Learning rate: Analyzed through model update behavior in online learning systems
  • Regularization strength: Inferred from the model's generalization behavior on synthetic data
  • Architecture type: Determined by analyzing response patterns to structured inputs
  • Input preprocessing: Revealed by how the model handles edge cases, special characters, and out-of-range values
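
The last bullet is straightforward to demonstrate: a handful of edge-case probes reveals whether inputs are lowercased, truncated, normalized, or rejected outright, and each behavior narrows down the preprocessing pipeline. The probe set and the `query` wrapper below are illustrative assumptions, not a standard tool:

```python
def probe_preprocessing(query):
    """Send edge-case payloads and record the raw outcome of each.
    `query` is a hypothetical callable wrapping the target API."""
    probes = {
        "empty": "",
        "overlong": "a" * 100_000,
        "control_chars": "hello\x00world",
        "unicode": "caf\u00e9 \u2603",
        "mixed_case": "HeLLo",
    }
    findings = {}
    for name, payload in probes.items():
        try:
            findings[name] = query(payload)
        except Exception as exc:
            findings[name] = f"error: {type(exc).__name__}"
    return findings
```

If, say, `mixed_case` comes back lowercased and `overlong` comes back at exactly 512 characters, the attacker has learned the tokenizer's casing policy and the service's truncation limit without any privileged access.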
Defense implication: Side-channel attacks highlight that security cannot focus solely on direct API outputs. Response timing must be normalized, detailed error messages suppressed, and confidence scores rounded or perturbed to limit information leakage.
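
Two of those mitigations fit in a few lines. A minimal sketch of a constant-time response wrapper and coarsened confidence scores; the names and the 0.25-second floor are assumptions, and the floor must exceed the model's worst-case latency to actually hide timing:

```python
import time

def constant_time_wrapper(handler, floor_seconds=0.25):
    """Pad every response to a fixed floor so latency no longer tracks
    the model's computational path."""
    def wrapped(x):
        t0 = time.perf_counter()
        result = handler(x)
        remaining = floor_seconds - (time.perf_counter() - t0)
        if remaining > 0:
            time.sleep(remaining)
        return result
    return wrapped

def round_scores(probs, decimals=2):
    """Coarsen confidence scores, then renormalize so they still sum
    to 1; fewer significant digits means less leakage per query."""
    rounded = [round(p, decimals) for p in probs]
    total = sum(rounded) or 1.0
    return [p / total for p in rounded]
```

Neither measure is free: padding adds latency for every caller, and rounding degrades legitimate uses of calibrated confidence, so both are typically tuned against the sensitivity of the deployed model.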