Containment
Containment is about stopping the bleeding. The goal is to limit harm while preserving evidence for root cause analysis, balancing speed of response with service continuity.
Rollback Strategies
Rolling back an AI model is more complex than reverting a code deployment. You must consider model state, feature pipeline compatibility, and downstream dependencies:
| Strategy | When to Use | Considerations |
|---|---|---|
| Model Version Rollback | The previous model version worked correctly | Ensure feature schema compatibility; may lose recent improvements |
| Shadow Mode | Need to investigate while minimizing user impact | Serve users from a fallback; keep running the suspect model on the same traffic and log its outputs for analysis |
| Feature Flag Disable | Issue is isolated to a specific AI-powered feature | Disable the feature; show graceful degradation message |
| Rule-based Fallback | No safe model version available | Replace ML model with deterministic rules temporarily |
| Traffic Throttling | Issue affects only some inputs | Reduce traffic to affected model; investigate specific input patterns |
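Two of the considerations in the table can be illustrated in code: checking feature schema compatibility before a version rollback, and shadow mode. The Python sketch below is a minimal illustration, not a production router. It assumes a model is any callable that maps a feature dictionary to an output label, and the names `schema_compatible`, `ShadowRouter`, `previous_model`, and `suspect_model` are hypothetical. For simplicity the suspect model runs synchronously after the fallback; a real deployment would more likely mirror traffic asynchronously so the suspect model cannot add latency for users.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a model is any callable from a feature dict to an output label.
Model = Callable[[Dict[str, float]], str]


def schema_compatible(pipeline_features: set, model_features: set) -> bool:
    """A rollback target is only safe if the live feature pipeline still
    produces every feature that the older model version expects."""
    return model_features <= pipeline_features


@dataclass
class ShadowRouter:
    """Shadow mode: users always get the fallback model's answer, while the
    suspect model runs on the same input and its output is only logged."""
    fallback: Model
    suspect: Model
    shadow_log: List[dict] = field(default_factory=list)

    def handle(self, features: Dict[str, float]) -> str:
        served = self.fallback(features)
        try:
            suspect_output = self.suspect(features)
        except Exception as exc:  # a failing suspect model must never affect users
            suspect_output = f"<error: {exc}>"
        self.shadow_log.append(
            {"input": features, "served": served, "suspect": suspect_output}
        )
        return served


def previous_model(features: Dict[str, float]) -> str:
    """Hypothetical last known-good version."""
    return "approve" if features["score"] > 0.5 else "review"


def suspect_model(features: Dict[str, float]) -> str:
    """Hypothetical regressed version that approves everything."""
    return "approve"


print(schema_compatible({"score", "tenure"}, {"score"}))  # True
router = ShadowRouter(fallback=previous_model, suspect=suspect_model)
print(router.handle({"score": 0.2}))    # review
print(router.shadow_log[0]["suspect"])  # approve
```

The shadow log pairs what users actually saw with what the suspect model would have returned, which is the evidence the investigation needs without exposing anyone to the faulty behavior.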
Model Isolation Techniques
- Network-Level Isolation: Remove the compromised model endpoint from load balancers. Update DNS or service mesh routing to direct all traffic to a safe fallback model or cached responses.
- Input Filtering: If the issue is triggered by specific inputs, deploy emergency input filters to block known-malicious patterns while allowing normal traffic through.
- Output Guardrails: Add or tighten output filtering to catch problematic responses before they reach users. This can be deployed rapidly without model changes.
- Rate Limiting: Implement aggressive rate limiting if the incident involves active exploitation, to slow down attackers while the investigation proceeds.
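Input filtering and rate limiting can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the blocked pattern is a hypothetical prompt-injection phrase, the filter is a plain case-insensitive regex match, and rate limiting uses a per-client token bucket, which is one common technique among several (fixed windows and sliding logs are alternatives). The class names `EmergencyInputFilter` and `TokenBucket` are illustrative and not part of any library.

```python
import re
import time
from typing import Dict, List, Optional


class EmergencyInputFilter:
    """Block inputs matching known-malicious regex patterns; allow everything else."""

    def __init__(self, block_patterns: List[str]):
        self._patterns = [re.compile(p, re.IGNORECASE) for p in block_patterns]

    def allows(self, prompt: str) -> bool:
        return not any(p.search(prompt) for p in self._patterns)


class TokenBucket:
    """Per-client token bucket: tokens refill at `rate` per second, up to `capacity`.
    Each request spends one token; a client with no tokens left is rejected."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self._tokens: Dict[str, float] = {}
        self._last: Dict[str, float] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens = self._tokens.get(client_id, self.capacity)  # new clients start full
        elapsed = now - self._last.get(client_id, now)
        tokens = min(self.capacity, tokens + elapsed * self.rate)
        self._last[client_id] = now
        if tokens >= 1:
            self._tokens[client_id] = tokens - 1
            return True
        self._tokens[client_id] = tokens
        return False


# Hypothetical emergency pattern for a prompt-injection style attack.
input_filter = EmergencyInputFilter([r"ignore (all )?previous instructions"])
print(input_filter.allows("Summarize this report"))               # True
print(input_filter.allows("Please IGNORE previous instructions"))  # False

# Aggressive limit: bursts of 2, then one request every 2 seconds per client.
limiter = TokenBucket(rate=0.5, capacity=2)
print([limiter.allow("client-a", now=0.0) for _ in range(3)])  # [True, True, False]
print(limiter.allow("client-a", now=2.0))                      # True
```

Passing `now` explicitly keeps the example deterministic; in a live service the default `time.monotonic()` clock would be used.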
Limiting Blast Radius
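A single configuration can capture the containment decisions together: roll back to the previous version, move all traffic off the suspect model, tighten emergency filters, and log everything for later analysis.

```python
# Emergency containment configuration
containment_config = {
    "model_id": "production-llm-v3.2",        # suspect model under investigation
    "action": "rollback",
    "target_version": "production-llm-v3.1",  # version to roll back to
    "rollback_strategy": "blue_green",
    "traffic_routing": {
        "suspect_model": 0,   # 0% traffic
        "safe_model": 100,    # 100% traffic
    },
    "emergency_filters": {
        "block_patterns": ["regex_pattern_1", "regex_pattern_2"],  # placeholders
        "max_output_length": 500,
        "enable_toxicity_filter": True,
        "toxicity_threshold": 0.3,  # Stricter than normal
    },
    "logging": {
        "capture_all_io": True,    # record every input and output as evidence
        "preserve_blocked": True,  # keep blocked requests/responses for root cause analysis
    },
}
```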
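The sketch below shows one way a service might act on the `traffic_routing` and `emergency_filters` sections of a configuration like the one above. It is hypothetical throughout: `pick_model`, `apply_output_guardrails`, and the keyword-based `fake_toxicity_scorer` are illustrative stand-ins (a real deployment would call an actual toxicity classifier), and the block pattern is invented. It also assumes that exceeding `max_output_length` blocks a response rather than truncating it, and that a score at or above `toxicity_threshold` counts as toxic. Blocked responses are appended to a list, loosely mirroring `preserve_blocked`.

```python
import random
import re
from typing import Callable, Dict, List, Optional


def pick_model(routing: Dict[str, int], rng: random.Random) -> str:
    """Pick a model key in proportion to its traffic percentage; 0% models are never chosen."""
    if sum(routing.values()) != 100:
        raise ValueError("traffic_routing percentages must sum to 100")
    names = [name for name, pct in routing.items() if pct > 0]
    weights = [routing[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def apply_output_guardrails(text: str, filters: dict,
                            score_toxicity: Callable[[str], float],
                            blocked_log: List[str]) -> Optional[str]:
    """Return `text` if it passes every emergency filter; otherwise record it and return None."""
    too_long = len(text) > filters["max_output_length"]  # assumption: block, don't truncate
    matches = any(re.search(p, text) for p in filters["block_patterns"])
    toxic = (filters["enable_toxicity_filter"]
             and score_toxicity(text) >= filters["toxicity_threshold"])
    if too_long or matches or toxic:
        blocked_log.append(text)  # keep blocked output as evidence for root cause analysis
        return None
    return text


def fake_toxicity_scorer(text: str) -> float:
    """Hypothetical stand-in for a real classifier: flags a placeholder keyword."""
    return 0.9 if "BADWORD" in text else 0.0


# Trimmed copy of the emergency configuration; the block pattern is hypothetical.
demo_filters = {
    "block_patterns": [r"(?i)internal use only"],
    "max_output_length": 500,
    "enable_toxicity_filter": True,
    "toxicity_threshold": 0.3,  # stricter than normal
}
demo_routing = {"suspect_model": 0, "safe_model": 100}

print(pick_model(demo_routing, random.Random(0)))  # safe_model
blocked: List[str] = []
print(apply_output_guardrails("Here is your summary.", demo_filters, fake_toxicity_scorer, blocked))
print(apply_output_guardrails("You said BADWORD.", demo_filters, fake_toxicity_scorer, blocked))  # None
print(blocked)  # ['You said BADWORD.']
```

Validating that the routing percentages sum to 100 catches a typo in an emergency config before it silently sends traffic to the wrong place.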
Lilly Tech Systems