Output Security
Secure AI API responses with PII detection, content safety classification, response sanitization, and data leakage prevention techniques.
Why Output Security Matters
AI models can generate outputs that contain personally identifiable information (PII), harmful content, leaked system prompts, or executable code. Without output validation, your API becomes a liability even if input validation is perfect.
PII Detection and Filtering
```python
import re


class PIIFilter:
    """Regex-based detection and redaction of common PII patterns."""

    PATTERNS = {
        "email": re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b'),
        "phone": re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'),
        "ssn": re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
        "credit_card": re.compile(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'),
        "api_key": re.compile(r'\b(sk|pk|api)[_-][a-zA-Z0-9]{20,}\b'),
    }

    def filter_response(self, text):
        """Replace PII with redaction markers and report what was found."""
        filtered = text
        detections = []
        for pii_type, pattern in self.PATTERNS.items():
            matches = pattern.findall(filtered)
            if matches:
                detections.append({"type": pii_type, "count": len(matches)})
                filtered = pattern.sub(
                    f"[REDACTED_{pii_type.upper()}]", filtered
                )
        return filtered, detections
```
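The `credit_card` regex will match any 16-digit sequence, including order numbers and tracking IDs. A Luhn checksum pass can cut those false positives before redacting. A minimal sketch; `luhn_valid` is an illustrative helper, not part of the filter above:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum
    used by real card numbers. Non-digit separators are ignored."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # shortest valid card length
        return False
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0
```

Only redact a `credit_card` regex match when `luhn_valid` also returns True; random 16-digit strings fail the checksum roughly 90% of the time.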
Content Safety Classification
Run model outputs through a content safety classifier before returning them to users:
| Category | Action | Example |
|---|---|---|
| Safe | Return as-is | Normal informational response |
| Sensitive | Add disclaimer, log for review | Medical or legal information |
| Harmful | Block and return generic error | Instructions for illegal activities |
| PII Detected | Redact PII, log incident | Response contains email addresses |
| System Leak | Block, alert security team | Response contains system prompt text |
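The policy table above can be encoded as a small dispatch function. A sketch assuming a classifier (not shown) has already produced the category; the `Category` enum and `apply_policy` helper are illustrative names:

```python
from enum import Enum


class Category(Enum):
    SAFE = "safe"
    SENSITIVE = "sensitive"
    HARMFUL = "harmful"
    PII_DETECTED = "pii_detected"
    SYSTEM_LEAK = "system_leak"


DISCLAIMER = "\n\nNote: this is general information, not professional advice."
GENERIC_ERROR = "Sorry, this response could not be completed."


def apply_policy(category: Category, text: str, redacted: str = "") -> dict:
    """Map a safety classification to the action in the table above."""
    if category is Category.SAFE:
        return {"response": text, "log": False, "alert": False}
    if category is Category.SENSITIVE:
        # Return the answer, but flag it for human review.
        return {"response": text + DISCLAIMER, "log": True, "alert": False}
    if category is Category.PII_DETECTED:
        # Serve the pre-redacted version and record the incident.
        return {"response": redacted, "log": True, "alert": False}
    # Harmful content and system-prompt leaks are blocked outright;
    # leaks additionally alert the security team.
    return {"response": GENERIC_ERROR, "log": True,
            "alert": category is Category.SYSTEM_LEAK}
```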
Preventing Data Leakage
- System prompt protection: Monitor outputs for fragments of your system prompt. If detected, block the response and log the attempt.
- Training data extraction: Detect when outputs appear to reproduce verbatim text from known training data sources.
- Context window leakage: In multi-turn conversations, ensure responses do not reveal information from other users' sessions.
- Tool output sanitization: When models call tools (APIs, databases), validate that tool outputs are sanitized before being included in responses.
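A simple way to catch system prompt leakage is to check whether any contiguous run of words from the prompt appears verbatim in the output. A minimal sketch; `ngram_leak` and the 5-word window are illustrative choices, not a standard API:

```python
def ngram_leak(response: str, system_prompt: str, n: int = 5) -> bool:
    """Return True if any n-word sequence from the system prompt
    appears verbatim (case-insensitively) in the response."""
    words = system_prompt.lower().split()
    if len(words) < n:
        return False
    grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    # Normalize whitespace so line breaks don't hide a match.
    normalized = " ".join(response.lower().split())
    return any(g in normalized for g in grams)
```

Smaller windows catch more paraphrased leaks at the cost of false positives on common phrases; blocking plus logging on a match keeps the failure mode safe.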
Response Validation Pipeline
1. Generate Response: The AI model produces its output based on the validated input.
2. PII Scan: Run regex- and NER-based PII detection and redact any matches.
3. Content Safety Check: Classify the response for harmful content; block responses above the safety threshold.
4. System Leak Detection: Check the output for system prompt fragments, API keys, or internal configuration.
5. Format Validation: Ensure the response matches the expected format (JSON schema, length limits, character encoding).
6. Return to Client: Only after passing every check is the response returned to the API consumer.
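The pipeline can be implemented as a chain of check functions, each of which may rewrite (e.g. redact) or block the response, with any failure short-circuiting the rest. A minimal sketch; the two example checks are hypothetical stand-ins for the stages above:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    ok: bool          # False means the response is blocked
    text: str         # possibly rewritten (e.g. redacted) response
    reason: str = ""  # why the response was blocked, for logging


def run_pipeline(text: str,
                 checks: List[Callable[[str], CheckResult]]) -> CheckResult:
    """Run each validation step in order; the first failure wins."""
    for check in checks:
        result = check(text)
        if not result.ok:
            return CheckResult(False, "", result.reason)
        text = result.text  # later checks see earlier redactions
    return CheckResult(True, text)


# Hypothetical checks standing in for the PII-scan and format stages.
def pii_scan(text: str) -> CheckResult:
    return CheckResult(True, text.replace("user@example.com",
                                          "[REDACTED_EMAIL]"))


def length_limit(text: str) -> CheckResult:
    if len(text) > 10_000:
        return CheckResult(False, "", "response too long")
    return CheckResult(True, text)


result = run_pipeline("Contact user@example.com", [pii_scan, length_limit])
```

Ordering matters: redaction runs before format validation so length limits apply to the text the client will actually receive.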