Defense in Depth for AI

Lesson 5 of 7 in the AI Security Fundamentals course.

Layered Defense Strategies for AI Systems

Defense in depth is a security strategy that deploys multiple layers of protection so that if one defense fails, others remain in place. For AI systems, this means securing every phase of the ML lifecycle with overlapping controls that provide redundant protection against the diverse threats we identified in the attack surface analysis.

The Defense Layers

A comprehensive defense-in-depth strategy for AI includes these layers, from outermost to innermost:

  1. Network and infrastructure security: Firewalls, network segmentation, and secure compute environments
  2. Access control and authentication: Identity management, RBAC, and API security
  3. Data security: Encryption, integrity verification, and privacy-preserving techniques
  4. Model security: Adversarial training, robustness testing, and model hardening
  5. Application security: Input validation, output sanitization, and rate limiting
  6. Monitoring and detection: Anomaly detection, drift monitoring, and alerting
  7. Response and recovery: Incident response plans, model rollback, and forensics

Layer 1: Infrastructure Security

The foundation of AI security is the infrastructure on which models are trained and served:

  • Network segmentation: Isolate ML training clusters from serving infrastructure and from general corporate networks
  • GPU security: Ensure GPU memory is cleared between workloads to prevent data leakage in shared environments
  • Container hardening: Use minimal base images for ML containers, scan for vulnerabilities, and enforce read-only file systems where possible
  • Secrets management: Store API keys, model registry credentials, and database passwords in a secrets manager, never in code or config files
YAML - Kubernetes Pod Security for ML Workloads
apiVersion: v1
kind: Pod
metadata:
  name: ml-inference-server
  labels:
    app: model-serving
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: model-server
    image: ml-serving:v2.1.0@sha256:abc123...
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    resources:
      limits:
        nvidia.com/gpu: 1
        memory: "8Gi"
        cpu: "4"
    volumeMounts:
    - name: model-artifacts
      mountPath: /models
      readOnly: true
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: model-artifacts
    persistentVolumeClaim:
      claimName: verified-models
      readOnly: true
  - name: tmp
    emptyDir:
      sizeLimit: 500Mi

Layer 2: Data Security

Protecting the data that flows through ML systems is critical:

  • Encryption at rest: All training data, model artifacts, and feature stores should be encrypted using AES-256 or equivalent
  • Encryption in transit: All data movement between components should use TLS 1.3
  • Data integrity: Hash training datasets and verify integrity before each training run to detect tampering
  • Data lineage: Track the origin, transformations, and usage of every piece of training data
💡
Best practice: Implement a data integrity pipeline that computes and stores SHA-256 hashes of training datasets at each processing stage. Before training begins, verify the hash chain to ensure no data has been modified.
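A minimal sketch of such a hash pipeline, assuming datasets are stored as files (the function names and the manifest format are illustrative, not part of any standard tool):

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hash of a file, streaming in chunks so large
    datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_stage(manifest: dict, stage: str, path: Path) -> None:
    """Record the dataset hash for one processing stage in a manifest."""
    manifest[stage] = hash_file(path)


def verify_stage(manifest: dict, stage: str, path: Path) -> bool:
    """Re-hash the dataset and compare against the recorded value."""
    return manifest.get(stage) == hash_file(path)
```

Before training begins, `verify_stage` would be called for every recorded stage, and the run aborted on any mismatch. The manifest itself should live in tamper-resistant storage (e.g. signed, or append-only), since an attacker who can rewrite both data and hashes defeats the check.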

Layer 3: Model Hardening

Making models more resistant to attacks through training and architecture choices:

  • Adversarial training: Include adversarial examples in training to improve robustness
  • Ensemble methods: Use multiple models and aggregate predictions to reduce vulnerability to single-model attacks
  • Randomized smoothing: Add controlled noise to inputs during inference to provide certified robustness guarantees
  • Gradient masking: Techniques that make it harder for attackers to compute useful gradients for crafting adversarial examples; use with caution, as gradient masking on its own is widely regarded as a weak defense that adaptive attacks can often circumvent
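Randomized smoothing, for example, can be sketched in a few lines. The `classify` callback, noise scale, and sample count below are placeholders; a production deployment would choose them following the certified-radius analysis from the randomized smoothing literature:

```python
import numpy as np


def smoothed_predict(classify, x: np.ndarray, sigma: float = 0.25,
                     n_samples: int = 100, seed: int = 0) -> int:
    """Randomized smoothing: classify many Gaussian-noised copies of the
    input and return the majority-vote class. Larger sigma trades clean
    accuracy for robustness to small input perturbations."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    votes = np.array([classify(x + n) for n in noise])
    classes, counts = np.unique(votes, return_counts=True)
    return int(classes[np.argmax(counts)])
```

The majority vote makes the smoothed classifier's decision stable under small input changes, which is what yields the certified guarantee for the underlying (noise-trained) model.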

Layer 4: Application Security

Securing the application layer where models interact with users and other systems:

Python
import time
from collections import defaultdict, deque
from typing import Any, Dict, Optional

import numpy as np


class RateLimiter:
    """Per-user sliding-window rate limiter. In-memory sketch; a shared
    store such as Redis would be needed across multiple replicas."""

    def __init__(self, max_per_minute: int):
        self.max_per_minute = max_per_minute
        self._requests: Dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        window = self._requests[user_id]
        # Drop requests older than the one-minute window
        while window and now - window[0] > 60.0:
            window.popleft()
        if len(window) >= self.max_per_minute:
            return False
        window.append(now)
        return True


class AnomalyDetector:
    """Flags inputs far from the training distribution. Simple z-score
    sketch; a production detector would be fit on training data."""

    def __init__(self, threshold: float, mean: float = 0.0, std: float = 1.0):
        self.threshold = threshold
        self.mean = mean
        self.std = std

    def is_anomalous(self, features: np.ndarray) -> bool:
        z_scores = np.abs((features - self.mean) / self.std)
        return bool(np.max(z_scores) > self.threshold)


class DefenseInDepthPipeline:
    """Multi-layer defense pipeline for model inference."""

    def __init__(self, model, config: Dict[str, Any]):
        self.model = model
        self.config = config
        self.anomaly_detector = AnomalyDetector(config.get("anomaly_threshold", 3.0))
        self.rate_limiter = RateLimiter(config.get("max_requests_per_minute", 60))

    def predict(self, input_data: Dict, user_id: str) -> Dict:
        # Layer 1: Rate limiting
        if not self.rate_limiter.allow(user_id):
            return {"error": "Rate limit exceeded", "code": 429}

        # Layer 2: Input validation
        validated = self._validate_input(input_data)
        if validated is None:
            return {"error": "Invalid input", "code": 400}

        # Layer 3: Anomaly detection on input
        if self.anomaly_detector.is_anomalous(validated["features"]):
            self._alert("Anomalous input detected", user_id, validated)
            return {"error": "Input rejected by security filter", "code": 422}

        # Layer 4: Model prediction with confidence check
        prediction = self.model.predict(validated["features"])
        confidence = float(prediction["confidence"])

        if confidence < self.config.get("min_confidence", 0.5):
            return {"result": "uncertain", "requires_review": True}

        # Layer 5: Output sanitization
        sanitized = self._sanitize_output(prediction)

        # Layer 6: Audit logging
        self._audit_log(user_id, validated, sanitized)

        return sanitized

    def _validate_input(self, data: Dict) -> Optional[Dict]:
        # Check required fields, types, ranges
        if "features" not in data:
            return None
        try:
            features = np.asarray(data["features"], dtype=float)
        except (TypeError, ValueError):
            return None
        if features.shape != (self.config["expected_dimensions"],):
            return None
        if np.any(np.isnan(features)) or np.any(np.isinf(features)):
            return None
        return {"features": features}

    def _sanitize_output(self, prediction: Dict) -> Dict:
        # Remove internal details, round confidence
        return {
            "prediction": prediction["class"],
            "confidence": round(float(prediction["confidence"]), 2)
        }

    def _audit_log(self, user_id, input_data, output):
        pass  # Send to logging infrastructure

    def _alert(self, message, user_id, data):
        pass  # Send to security monitoring

Layer 5: Monitoring and Detection

Continuous monitoring serves as the early warning system for detecting attacks in progress:

  • Data drift monitoring: Detect when input distributions shift, which may indicate poisoning or adversarial campaigns
  • Performance monitoring: Track accuracy, latency, and error rates for sudden changes that may indicate compromise
  • Query pattern analysis: Detect systematic querying patterns that suggest model extraction attempts
  • Prediction distribution monitoring: Alert when the distribution of model outputs changes unexpectedly
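Data drift monitoring is often implemented with the Population Stability Index (PSI). A minimal per-feature sketch follows; the bin count and the alert thresholds in the docstring are conventional rules of thumb, not fixed standards:

```python
import numpy as np


def population_stability_index(reference: np.ndarray, current: np.ndarray,
                               n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index (PSI) between a reference feature
    distribution and a current serving window. Common rules of thumb:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift."""
    # Bin edges from the reference quantiles, so each reference bin
    # holds roughly equal mass.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))

    def fractions(data: np.ndarray) -> np.ndarray:
        # Assign each value to a bin; values outside the reference
        # range fall into the first or last bin.
        idx = np.clip(np.searchsorted(edges, data, side="right") - 1,
                      0, n_bins - 1)
        return np.bincount(idx, minlength=n_bins) / len(data)

    # Clip to avoid log(0) for empty bins.
    ref_frac = np.clip(fractions(reference), eps, None)
    cur_frac = np.clip(fractions(current), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```

In practice the PSI would be computed per feature over a sliding window of serving traffic, with an alert raised when any feature crosses the significant-shift threshold.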
Warning: Defense in depth is not about adding complexity — it is about adding the right layers of defense. Each layer should address specific threats identified in your attack surface analysis. Unnecessary layers add operational burden without improving security.

Summary

Defense in depth for AI requires security controls at every layer from infrastructure to monitoring. The key is ensuring that layers are complementary and overlapping, so that a failure at any single layer does not leave the system exposed. In the next lesson, we explore industry security frameworks and standards that provide structured approaches to implementing these defenses.