
Output Security

Secure AI API responses with PII detection, content safety classification, response sanitization, and data leakage prevention techniques.

Why Output Security Matters

AI models can generate outputs that contain personally identifiable information (PII), harmful content, leaked system prompts, or executable code. Without output validation, your API becomes a liability even if input validation is perfect.

Real risk: LLMs can memorize and reproduce training data including email addresses, phone numbers, and code snippets. Even without prompt injection, a model may spontaneously output PII from its training data in response to benign queries.

PII Detection and Filtering

Python - PII Output Filter
import re

class PIIFilter:
    PATTERNS = {
        "email": re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b'),
        "phone": re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'),
        "ssn": re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
        "credit_card": re.compile(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'),
        "api_key": re.compile(r'\b(sk|pk|api)[_-][a-zA-Z0-9]{20,}\b'),
    }

    def filter_response(self, text):
        """Replace PII with redaction markers."""
        filtered = text
        detections = []

        for pii_type, pattern in self.PATTERNS.items():
            matches = pattern.findall(filtered)
            if matches:
                detections.append({"type": pii_type,
                                   "count": len(matches)})
                filtered = pattern.sub(
                    f"[REDACTED_{pii_type.upper()}]", filtered
                )

        return filtered, detections

Content Safety Classification

Run model outputs through a content safety classifier before returning them to users:

Category        Action                           Example
Safe            Return as-is                     Normal informational response
Sensitive       Add disclaimer, log for review   Medical or legal information
Harmful         Block and return generic error   Instructions for illegal activities
PII Detected    Redact PII, log incident         Response contains email addresses
System Leak     Block, alert security team       Response contains system prompt text
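The table above can be sketched as a simple dispatch layer. The classifier itself is out of scope here, so `classify_safety` is a hypothetical stub standing in for a real moderation model or API; only the routing policy comes from the table:

```python
def classify_safety(text):
    """Stub classifier -- a real implementation would call a content
    safety model. Included only so the routing below is runnable."""
    if "system prompt:" in text.lower():
        return "system_leak"
    return "safe"

def route_response(text):
    """Apply the category -> action policy from the table above."""
    category = classify_safety(text)
    if category == "safe":
        return {"status": 200, "body": text}
    if category == "sensitive":
        # Returned, but flagged for human review.
        return {"status": 200, "log": "review",
                "body": text + "\n\n[Informational only; not professional advice.]"}
    if category == "pii_detected":
        # Redaction would reuse a PII filter; the incident is logged.
        return {"status": 200, "body": "[REDACTED]", "log": "pii_incident"}
    if category == "system_leak":
        return {"status": 403, "body": "Request could not be completed.",
                "alert": "security_team"}
    # "harmful" and anything unrecognized fail closed.
    return {"status": 403, "body": "Request could not be completed."}
```

Note that unknown categories fall through to a block: failing closed is safer than returning unclassified output.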

Preventing Data Leakage

  • System prompt protection: Monitor outputs for fragments of your system prompt. If detected, block the response and log the attempt.
  • Training data extraction: Detect when outputs appear to reproduce verbatim text from known training data sources.
  • Context window leakage: In multi-turn conversations, ensure responses do not reveal information from other users' sessions.
  • Tool output sanitization: When models call tools (APIs, databases), validate that tool outputs are sanitized before being included in responses.
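As a concrete example of the first bullet, system prompt leakage can be caught with word n-gram overlap between the output and the known prompt, which catches partial leaks rather than only exact copies. The `n=5` and `threshold=0.2` values are illustrative assumptions, not recommended settings:

```python
def _ngrams(text, n=5):
    """Lowercased word n-grams, used as fuzzy fingerprints of the prompt."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def leaks_system_prompt(output, system_prompt, n=5, threshold=0.2):
    """True if a notable fraction of the prompt's n-grams appear
    verbatim in the model output."""
    prompt_grams = _ngrams(system_prompt, n)
    if not prompt_grams:
        return False
    overlap = prompt_grams & _ngrams(output, n)
    return len(overlap) / len(prompt_grams) >= threshold
```

On detection, the bullet above applies: block the response and log the attempt rather than returning a partially redacted leak.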

Response Validation Pipeline

  1. Generate Response

    The AI model produces its output based on the validated input.

  2. PII Scan

    Run regex and NER-based PII detection. Redact any found PII.

  3. Content Safety Check

    Classify the response for harmful content. Block responses above the safety threshold.

  4. System Leak Detection

    Check for system prompt fragments, API keys, or internal configuration in the output.

  5. Format Validation

    Ensure the response matches the expected format (JSON schema, length limits, character encoding).

  6. Return to Client

    Only after passing all checks is the response returned to the API consumer.
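The pipeline can be sketched as a chain of stages, each of which may transform the response or reject it outright. The stage internals below are deliberately minimal stubs (real checks would use the PII filter and classifiers discussed earlier); only the chaining structure follows the steps above:

```python
import re

class ResponseRejected(Exception):
    """Raised when any pipeline stage blocks the response."""

def pii_scan(text):
    # Stage 2 (stub): redact anything that looks like an email address.
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b", "[REDACTED_EMAIL]", text)

def safety_check(text):
    # Stage 3 (stub): a real check would call a safety classifier.
    if "harmful" in text:
        raise ResponseRejected("content safety")
    return text

def leak_check(text):
    # Stage 4 (stub): block obvious key material.
    if "sk_" in text:
        raise ResponseRejected("possible API key leak")
    return text

def format_check(text, max_len=4000):
    # Stage 5: enforce a length limit before returning.
    if len(text) > max_len:
        raise ResponseRejected("response too long")
    return text

def validate_response(text):
    """Run stages 2-5 in order; any stage may redact or reject."""
    for stage in (pii_scan, safety_check, leak_check, format_check):
        text = stage(text)
    return text  # stage 6: safe to return to the client
```

Ordering matters: redaction runs before the safety classifier so the classifier never sees (or logs) raw PII.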

Performance tip: Output validation adds latency. For streaming responses, apply lightweight checks (regex PII scan) per chunk and heavier checks (classifier-based safety) on the complete response. Consider running safety checks in parallel with generation.
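One way to implement the per-chunk idea: scan each streamed chunk with regexes, but hold back the trailing partial token between chunks, since a PII match can be split across chunk boundaries and a half-emitted email would escape redaction. This sketch considers only spaces as safe flush points, which works here because the email pattern contains no whitespace:

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def stream_with_redaction(chunks):
    """Yield redacted text incrementally. Everything up to the last space
    is flushed; the trailing partial token is buffered so a match split
    across chunks is still caught when it completes."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        cut = buffer.rfind(" ")
        if cut >= 0:
            safe, buffer = buffer[:cut + 1], buffer[cut + 1:]
            yield EMAIL.sub("[REDACTED_EMAIL]", safe)
    # Flush whatever remains once the stream ends.
    yield EMAIL.sub("[REDACTED_EMAIL]", buffer)
```

A production version would buffer on any whitespace and bound the buffer size, but the principle is the same: never emit bytes that a later fragment could turn into unredacted PII.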