Advanced

Side-Channel Attacks

Even when API outputs are restricted, attackers can extract model information through indirect signals like timing, cache behavior, power consumption, and statistical analysis of responses.

Timing Side Channels

Response time variations can reveal model architecture and decision paths:

  • Architecture inference: Different model architectures (CNN vs. Transformer) have characteristic processing time profiles
  • Depth estimation: Deeper models take longer, and timing differences between inputs reveal computational paths
  • Early exit detection: Models with early-exit mechanisms show faster responses for "easy" inputs
  • Batch processing: Timing can reveal batch sizes and queue depths, aiding extraction planning
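The early-exit signal in particular can be probed with nothing more than repeated queries and a stopwatch. A minimal sketch, assuming only black-box access; `toy_model` is a hypothetical stand-in for the remote API, with an artificial input-dependent depth:

```python
import statistics
import time

def measure_latency(model_fn, x, trials=30):
    """Query repeatedly and return the median latency, which is more
    robust to scheduler noise than the mean."""
    samples = []
    for _ in range(trials):
        t0 = time.perf_counter()
        model_fn(x)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

def likely_early_exit(model_fn, easy_inputs, hard_inputs, ratio=0.7):
    """Heuristic: if 'easy' inputs are consistently much faster, the model
    probably has an early-exit (input-dependent) execution path."""
    easy = statistics.median(measure_latency(model_fn, x) for x in easy_inputs)
    hard = statistics.median(measure_latency(model_fn, x) for x in hard_inputs)
    return easy < ratio * hard

# Hypothetical target whose effective depth depends on input "difficulty".
def toy_model(x):
    layers = 2 if x < 0.5 else 20  # early exit for easy inputs
    acc = x
    for _ in range(layers * 2000):
        acc = (acc * 1.000001) % 1.0
    return acc
```

With the toy model above, `likely_early_exit(toy_model, [0.1, 0.2, 0.3], [0.6, 0.7, 0.8])` flags the timing gap; against a real API the same logic applies, only with network jitter forcing more trials.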

Membership Inference Attacks

Determine whether a specific data point was used to train the target model:

Python - Membership Inference Attack
from sklearn.linear_model import LogisticRegression

class MembershipInferenceAttack:
    def __init__(self, target_model_api, shadow_models):
        self.target = target_model_api
        self.shadows = shadow_models
        self.attack_model = self.train_attack_model()

    @staticmethod
    def extract_features(pred):
        """Sort the probability vector: the attack keys on confidence
        shape, not on which class was predicted."""
        return sorted(pred, reverse=True)

    def train_attack_model(self):
        """Train a binary classifier: member vs. non-member."""
        features, labels = [], []
        for shadow in self.shadows:
            # Members: the shadow model's own training data
            for x in shadow.train_data:
                features.append(self.extract_features(shadow.predict_proba(x)))
                labels.append(1)  # member
            # Non-members: held-out data the shadow never saw
            for x in shadow.test_data:
                features.append(self.extract_features(shadow.predict_proba(x)))
                labels.append(0)  # non-member
        return LogisticRegression().fit(features, labels)

    def infer(self, sample) -> bool:
        """Was this sample in the target model's training data?"""
        features = self.extract_features(self.target.predict_proba(sample))
        return self.attack_model.predict([features])[0] == 1
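
Shadow models are the heavyweight version of the attack. A much simpler baseline thresholds the target's confidence on the sample's true label, exploiting the fact that overfit models are systematically more confident on training members. A sketch; `fake_proba` is a hypothetical stand-in for a target that has memorized one sample, and the 0.9 threshold is an assumption that a real attack would calibrate on shadow models:

```python
def confidence_attack(predict_proba, sample, true_label, threshold=0.9):
    """Guess 'member' when the model is unusually confident on the
    sample's true label."""
    return predict_proba(sample)[true_label] >= threshold

# Hypothetical target that has memorized one training point.
_MEMORIZED = {(1, 2): [0.01, 0.98, 0.01]}

def fake_proba(x):
    return _MEMORIZED.get(tuple(x), [0.40, 0.35, 0.25])
```

The baseline needs no attack-model training at all, which is why it is often run first as a sanity check before investing in shadow models.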

Model Inversion

Model inversion reconstructs representative training data from model outputs. This is especially concerning for models trained on sensitive data:

Attack Type              Target               Privacy Risk
-----------              ------               ------------
Feature reconstruction   Image classifiers    Reconstruct faces from a facial recognition model
Attribute inference      Tabular models       Infer sensitive attributes (income, health) from predictions
Text memorization        Language models      Extract training data verbatim (phone numbers, addresses)
Gradient leakage         Federated learning   Reconstruct training images from shared gradients
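
The feature-reconstruction row can be sketched end to end: start from a neutral input and climb the model's confidence surface for the target class by gradient ascent. The snippet below uses finite differences so it needs no ML framework; `toy_conf` and `TEMPLATE` are hypothetical stand-ins for the attacked model and the sensitive training datum it has absorbed:

```python
import math

def invert(model_conf, dims, target_class, steps=200, lr=0.5, eps=1e-4):
    """Reconstruct an input that maximizes model_conf(x, target_class).
    Finite-difference gradients keep this framework-free; real attacks
    use autograd on a surrogate model for speed."""
    x = [0.5] * dims  # neutral starting point
    for _ in range(steps):
        grad = []
        for i in range(dims):
            xp = x[:]
            xp[i] += eps
            grad.append((model_conf(xp, target_class)
                         - model_conf(x, target_class)) / eps)
        # Ascend, clamping to the valid input range [0, 1]
        x = [min(1.0, max(0.0, xi + lr * g)) for xi, g in zip(x, grad)]
    return x

# Hypothetical target: confidence peaks at a secret training "template",
# standing in for, e.g., the average face of one enrolled identity.
TEMPLATE = [0.8, 0.2, 0.6]

def toy_conf(x, target_class):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, TEMPLATE)))
```

Running `invert(toy_conf, 3, target_class=0)` recovers an input close to `TEMPLATE`, which is exactly the privacy failure the table describes: the "reconstruction" is whatever the model's confidence surface remembers about its training data.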

Hyperparameter Extraction

Attackers can infer model hyperparameters through carefully designed queries:

  • Learning rate: Analyzed through model update behavior in online learning systems
  • Regularization strength: Inferred from the model's generalization behavior on synthetic data
  • Architecture type: Determined by analyzing response patterns to structured inputs
  • Input preprocessing: Revealed by how the model handles edge cases, special characters, and out-of-range values
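
The last bullet is straightforward to demonstrate: a handful of edge-case probes reveals whether inputs are lowercased, truncated, normalized, or rejected outright, and each behavior narrows down the preprocessing pipeline. The probe set and the `query` wrapper below are illustrative assumptions, not a standard tool:

```python
def probe_preprocessing(query):
    """Send edge-case payloads and record the raw outcome of each.
    `query` is a hypothetical callable wrapping the target API."""
    probes = {
        "empty": "",
        "overlong": "a" * 100_000,
        "control_chars": "hello\x00world",
        "unicode": "caf\u00e9 \u2603",
        "mixed_case": "HeLLo",
    }
    findings = {}
    for name, payload in probes.items():
        try:
            findings[name] = query(payload)
        except Exception as exc:
            findings[name] = f"error: {type(exc).__name__}"
    return findings
```

If, say, `mixed_case` comes back lowercased and `overlong` comes back at exactly 512 characters, the attacker has learned the tokenizer's casing policy and the service's truncation limit without any privileged access.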
Defense implication: Side-channel attacks highlight that security cannot focus solely on direct API outputs. Response timing must be normalized, detailed error messages suppressed, and confidence scores rounded or perturbed to limit information leakage.
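
Two of those mitigations fit in a few lines. A minimal sketch of a constant-time response wrapper and coarsened confidence scores; the names and the 0.25-second floor are assumptions, and the floor must exceed the model's worst-case latency to actually hide timing:

```python
import time

def constant_time_wrapper(handler, floor_seconds=0.25):
    """Pad every response to a fixed floor so latency no longer tracks
    the model's computational path."""
    def wrapped(x):
        t0 = time.perf_counter()
        result = handler(x)
        remaining = floor_seconds - (time.perf_counter() - t0)
        if remaining > 0:
            time.sleep(remaining)
        return result
    return wrapped

def round_scores(probs, decimals=2):
    """Coarsen confidence scores, then renormalize so they still sum
    to 1; fewer significant digits means less leakage per query."""
    rounded = [round(p, decimals) for p in probs]
    total = sum(rounded) or 1.0
    return [p / total for p in rounded]
```

Neither measure is free: padding adds latency for every caller, and rounding degrades legitimate uses of calibrated confidence, so both are typically tuned against the sensitivity of the deployed model.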