Defense in Depth for AI
Lesson 5 of 7 in the AI Security Fundamentals course.
Layered Defense Strategies for AI Systems
Defense in depth is a security strategy that deploys multiple layers of protection so that if one defense fails, others remain in place. For AI systems, this means securing every phase of the ML lifecycle with overlapping controls that provide redundant protection against the diverse threats we identified in the attack surface analysis.
The Defense Layers
A comprehensive defense-in-depth strategy for AI includes these layers, from outermost to innermost:
- Network and infrastructure security: Firewalls, network segmentation, and secure compute environments
- Access control and authentication: Identity management, RBAC, and API security
- Data security: Encryption, integrity verification, and privacy-preserving techniques
- Model security: Adversarial training, robustness testing, and model hardening
- Application security: Input validation, output sanitization, and rate limiting
- Monitoring and detection: Anomaly detection, drift monitoring, and alerting
- Response and recovery: Incident response plans, model rollback, and forensics
Layer 1: Infrastructure Security
The foundation of AI security starts with securing the infrastructure:
- Network segmentation: Isolate ML training clusters from serving infrastructure and from general corporate networks
- GPU security: Ensure GPU memory is cleared between workloads to prevent data leakage in shared environments
- Container hardening: Use minimal base images for ML containers, scan for vulnerabilities, and enforce read-only file systems where possible
- Secrets management: Store API keys, model registry credentials, and database passwords in a secrets manager, never in code or config files
These controls can be enforced at the pod level. For example, a hardened Kubernetes Pod spec for model serving runs as a non-root user, pins the image by digest, drops all capabilities, uses a read-only root filesystem, and mounts model artifacts read-only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-inference-server
  labels:
    app: model-serving
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: model-server
    image: ml-serving:v2.1.0@sha256:abc123...
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    resources:
      limits:
        nvidia.com/gpu: 1
        memory: "8Gi"
        cpu: "4"
    volumeMounts:
    - name: model-artifacts
      mountPath: /models
      readOnly: true
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: model-artifacts
    persistentVolumeClaim:
      claimName: verified-models
      readOnly: true
  - name: tmp
    emptyDir:
      sizeLimit: 500Mi
```
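The secrets-management point above can be sketched in application code as well. This is a minimal, hypothetical loader that reads credentials from a mounted secrets volume (as Kubernetes provides) or from the environment, so nothing sensitive ever lives in source or config files; the mount path and secret names are assumptions, not a standard:

```python
import os
from pathlib import Path


def load_secret(name: str, mount_dir: str = "/var/run/secrets/app") -> str:
    """Read a secret from a mounted secrets volume, falling back to an
    environment variable. Never hardcode credentials in code or config.

    `mount_dir` is a hypothetical mount point for illustration.
    """
    secret_file = Path(mount_dir) / name
    if secret_file.exists():
        return secret_file.read_text().strip()
    value = os.environ.get(name.upper())
    if value is None:
        raise RuntimeError(f"Secret {name!r} not found in {mount_dir} or environment")
    return value
```

In a hardened pod like the one above, the secrets volume would be mounted read-only alongside the model artifacts.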
Layer 2: Data Security
Protecting the data that flows through ML systems is critical:
- Encryption at rest: All training data, model artifacts, and feature stores should be encrypted using AES-256 or equivalent
- Encryption in transit: All data movement between components should use TLS 1.3
- Data integrity: Hash training datasets and verify integrity before each training run to detect tampering
- Data lineage: Track the origin, transformations, and usage of every piece of training data
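The data-integrity control above can be sketched as a fingerprint that is recorded when a dataset is approved and re-checked before every training run; the function names here are illustrative, not from any particular library:

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(paths: list) -> str:
    """Compute a SHA-256 fingerprint over a set of dataset files.

    Files are hashed in sorted order so the fingerprint is stable no
    matter how the paths are listed, and each file's name is mixed in
    so content cannot be silently moved between files.
    """
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def verify_dataset(paths: list, expected: str) -> bool:
    """Return True only if the current fingerprint matches the recorded one."""
    return dataset_fingerprint(paths) == expected
```

A training pipeline would refuse to start when `verify_dataset` returns False, turning silent tampering into a hard, auditable failure. For very large files, the same idea applies with chunked reads instead of `read_bytes`.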
Layer 3: Model Hardening
Making models more resistant to attacks through training and architecture choices:
- Adversarial training: Include adversarial examples in training to improve robustness
- Ensemble methods: Use multiple models and aggregate predictions to reduce vulnerability to single-model attacks
- Randomized smoothing: Add controlled noise to inputs during inference to provide certified robustness guarantees
- Gradient masking: Techniques that obscure gradients so attackers cannot easily compute them when crafting adversarial examples. Use with caution: gradient masking is routinely bypassed by adaptive attacks, so it should supplement the defenses above, never replace them
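Randomized smoothing from the list above can be sketched in a few lines: classify many noise-perturbed copies of the input and return the majority vote, so a small adversarial perturbation is drowned out by the noise. This is a toy sketch, the `predict_fn` base classifier is a stand-in for any model, and a real deployment would use a formal certification procedure on top of the vote:

```python
import numpy as np


def smoothed_predict(predict_fn, x: np.ndarray, sigma: float = 0.25,
                     n_samples: int = 100, rng=None) -> int:
    """Randomized-smoothing inference: classify `n_samples` Gaussian-noised
    copies of `x` and return the majority-vote class index.

    `predict_fn` maps a feature vector to a class index (a hypothetical
    base model, not part of any specific library).
    """
    rng = np.random.default_rng(rng)
    noisy = x + rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    votes = np.array([predict_fn(row) for row in noisy])
    classes, counts = np.unique(votes, return_counts=True)
    return int(classes[np.argmax(counts)])
```

The cost is `n_samples` forward passes per prediction, which is the usual trade-off: certified robustness in exchange for inference latency.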
Layer 4: Application Security
Securing the application layer where models interact with users and other systems:
```python
import numpy as np
from typing import Dict, Any, Optional


class DefenseInDepthPipeline:
    """Multi-layer defense pipeline for model inference."""

    def __init__(self, model, config: Dict[str, Any]):
        self.model = model
        self.config = config
        # AnomalyDetector and RateLimiter are assumed to be provided elsewhere
        self.anomaly_detector = AnomalyDetector(config.get("anomaly_threshold", 3.0))
        self.rate_limiter = RateLimiter(config.get("max_requests_per_minute", 60))

    def predict(self, input_data: Dict, user_id: str) -> Dict:
        # Layer 1: Rate limiting
        if not self.rate_limiter.allow(user_id):
            return {"error": "Rate limit exceeded", "code": 429}

        # Layer 2: Input validation
        validated = self._validate_input(input_data)
        if validated is None:
            return {"error": "Invalid input", "code": 400}

        # Layer 3: Anomaly detection on input
        if self.anomaly_detector.is_anomalous(validated["features"]):
            self._alert("Anomalous input detected", user_id, validated)
            return {"error": "Input rejected by security filter", "code": 422}

        # Layer 4: Model prediction with confidence check
        prediction = self.model.predict(validated["features"])
        confidence = float(prediction["confidence"])
        if confidence < self.config.get("min_confidence", 0.5):
            return {"result": "uncertain", "requires_review": True}

        # Layer 5: Output sanitization
        sanitized = self._sanitize_output(prediction)

        # Layer 6: Audit logging
        self._audit_log(user_id, validated, sanitized)
        return sanitized

    def _validate_input(self, data: Dict) -> Optional[Dict]:
        # Check required fields, types, and ranges
        if "features" not in data:
            return None
        try:
            features = np.asarray(data["features"], dtype=float)
        except (TypeError, ValueError):
            return None
        if features.shape != (self.config["expected_dimensions"],):
            return None
        if np.any(np.isnan(features)) or np.any(np.isinf(features)):
            return None
        return {"features": features}

    def _sanitize_output(self, prediction: Dict) -> Dict:
        # Expose only the class and a rounded confidence; strip internal details
        return {
            "prediction": prediction["class"],
            "confidence": round(float(prediction["confidence"]), 2),
        }

    def _audit_log(self, user_id, input_data, output):
        pass  # Send to logging infrastructure

    def _alert(self, message, user_id, data):
        pass  # Send to security monitoring
```
Layer 5: Monitoring and Detection
Continuous monitoring provides the early warning system for detecting attacks in progress:
- Data drift monitoring: Detect when input distributions shift, which may indicate poisoning or adversarial campaigns
- Performance monitoring: Track accuracy, latency, and error rates for sudden changes that may indicate compromise
- Query pattern analysis: Detect systematic querying patterns that suggest model extraction attempts
- Prediction distribution monitoring: Alert when the distribution of model outputs changes unexpectedly
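The drift-monitoring bullet above is often implemented with the Population Stability Index (PSI), which compares a window of recent inputs against the distribution the model was trained on. A minimal sketch, assuming a single numeric feature; the common rule of thumb is PSI below 0.1 is stable, 0.1 to 0.25 is a moderate shift, and above 0.25 is drift worth alerting on:

```python
import numpy as np


def population_stability_index(baseline: np.ndarray, current: np.ndarray,
                               n_bins: int = 10) -> float:
    """PSI between a baseline feature distribution and recent inputs."""
    # Bin edges from baseline quantiles, so each bin starts roughly equally populated
    edges = np.quantile(baseline, np.linspace(0.0, 1.0, n_bins + 1))
    # Widen the outer edges so out-of-range values in `current` still land in a bin
    edges[0] = min(edges[0], current.min()) - 1e-9
    edges[-1] = max(edges[-1], current.max()) + 1e-9
    expected = np.histogram(baseline, bins=edges)[0] / len(baseline)
    actual = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to a small epsilon to avoid division by zero and log(0) in empty bins
    eps = 1e-6
    expected = np.clip(expected, eps, None)
    actual = np.clip(actual, eps, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))
```

Run per feature on a sliding window of production traffic; a sustained PSI spike is exactly the early-warning signal that feeds the response-and-recovery layer.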
Summary
Defense in depth for AI requires security controls at every layer from infrastructure to monitoring. The key is ensuring that layers are complementary and overlapping, so that a failure at any single layer does not leave the system exposed. In the next lesson, we explore industry security frameworks and standards that provide structured approaches to implementing these defenses.
Lilly Tech Systems