
Output Security

Secure AI API responses with PII detection, content safety classification, response sanitization, and data leakage prevention techniques.

Why Output Security Matters

AI models can generate outputs that contain personally identifiable information (PII), harmful content, leaked system prompts, or executable code. Without output validation, your API becomes a liability even if input validation is perfect.

Real risk: LLMs can memorize and reproduce training data including email addresses, phone numbers, and code snippets. Even without prompt injection, a model may spontaneously output PII from its training data in response to benign queries.

PII Detection and Filtering

Python - PII Output Filter
import re

class PIIFilter:
    PATTERNS = {
        "email": re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b'),
        "phone": re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'),
        "ssn": re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
        "credit_card": re.compile(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'),
        "api_key": re.compile(r'\b(sk|pk|api)[_-][a-zA-Z0-9]{20,}\b'),
    }

    def filter_response(self, text):
        """Replace PII with redaction markers."""
        filtered = text
        detections = []

        for pii_type, pattern in self.PATTERNS.items():
            matches = pattern.findall(filtered)
            if matches:
                detections.append({"type": pii_type,
                                   "count": len(matches)})
                filtered = pattern.sub(
                    f"[REDACTED_{pii_type.upper()}]", filtered
                )

        return filtered, detections

Content Safety Classification

Run model outputs through a content safety classifier before returning them to users:

Category        Action                           Example
Safe            Return as-is                     Normal informational response
Sensitive       Add disclaimer, log for review   Medical or legal information
Harmful         Block and return generic error   Instructions for illegal activities
PII Detected    Redact PII, log incident         Response contains email addresses
System Leak     Block, alert security team       Response contains system prompt text
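The table above can be sketched as a simple dispatch layer. The classifier itself is out of scope here, so `classify_safety` is a hypothetical stub standing in for a real moderation model or API; only the routing policy comes from the table:

```python
def classify_safety(text):
    """Stub classifier -- a real implementation would call a content
    safety model. Included only so the routing below is runnable."""
    if "system prompt:" in text.lower():
        return "system_leak"
    return "safe"

def route_response(text):
    """Apply the category -> action policy from the table above."""
    category = classify_safety(text)
    if category == "safe":
        return {"status": 200, "body": text}
    if category == "sensitive":
        # Returned, but flagged for human review.
        return {"status": 200, "log": "review",
                "body": text + "\n\n[Informational only; not professional advice.]"}
    if category == "pii_detected":
        # Redaction would reuse a PII filter; the incident is logged.
        return {"status": 200, "body": "[REDACTED]", "log": "pii_incident"}
    if category == "system_leak":
        return {"status": 403, "body": "Request could not be completed.",
                "alert": "security_team"}
    # "harmful" and anything unrecognized fail closed.
    return {"status": 403, "body": "Request could not be completed."}
```

Note that unknown categories fall through to a block: failing closed is safer than returning unclassified output.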

Preventing Data Leakage

  • System prompt protection: Monitor outputs for fragments of your system prompt. If detected, block the response and log the attempt.
  • Training data extraction: Detect when outputs appear to reproduce verbatim text from known training data sources.
  • Context window leakage: In multi-turn conversations, ensure responses do not reveal information from other users' sessions.
  • Tool output sanitization: When models call tools (APIs, databases), validate that tool outputs are sanitized before being included in responses.
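As a concrete example of the first bullet, system prompt leakage can be caught with word n-gram overlap between the output and the known prompt, which catches partial leaks rather than only exact copies. The `n=5` and `threshold=0.2` values are illustrative assumptions, not recommended settings:

```python
def _ngrams(text, n=5):
    """Lowercased word n-grams, used as fuzzy fingerprints of the prompt."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def leaks_system_prompt(output, system_prompt, n=5, threshold=0.2):
    """True if a notable fraction of the prompt's n-grams appear
    verbatim in the model output."""
    prompt_grams = _ngrams(system_prompt, n)
    if not prompt_grams:
        return False
    overlap = prompt_grams & _ngrams(output, n)
    return len(overlap) / len(prompt_grams) >= threshold
```

On detection, the bullet above applies: block the response and log the attempt rather than returning a partially redacted leak.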

Response Validation Pipeline

  1. Generate Response

    The AI model produces its output based on the validated input.

  2. PII Scan

    Run regex and NER-based PII detection. Redact any found PII.

  3. Content Safety Check

    Classify the response for harmful content. Block responses above the safety threshold.

  4. System Leak Detection

    Check for system prompt fragments, API keys, or internal configuration in the output.

  5. Format Validation

    Ensure the response matches the expected format (JSON schema, length limits, character encoding).

  6. Return to Client

    Only after passing all checks is the response returned to the API consumer.
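The pipeline can be sketched as a chain of stages, each of which may transform the response or reject it outright. The stage internals below are deliberately minimal stubs (real checks would use the PII filter and classifiers discussed earlier); only the chaining structure follows the steps above:

```python
import re

class ResponseRejected(Exception):
    """Raised when any pipeline stage blocks the response."""

def pii_scan(text):
    # Stage 2 (stub): redact anything that looks like an email address.
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b", "[REDACTED_EMAIL]", text)

def safety_check(text):
    # Stage 3 (stub): a real check would call a safety classifier.
    if "harmful" in text:
        raise ResponseRejected("content safety")
    return text

def leak_check(text):
    # Stage 4 (stub): block obvious key material.
    if "sk_" in text:
        raise ResponseRejected("possible API key leak")
    return text

def format_check(text, max_len=4000):
    # Stage 5: enforce a length limit before returning.
    if len(text) > max_len:
        raise ResponseRejected("response too long")
    return text

def validate_response(text):
    """Run stages 2-5 in order; any stage may redact or reject."""
    for stage in (pii_scan, safety_check, leak_check, format_check):
        text = stage(text)
    return text  # stage 6: safe to return to the client
```

Ordering matters: redaction runs before the safety classifier so the classifier never sees (or logs) raw PII.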

Performance tip: Output validation adds latency. For streaming responses, apply lightweight checks (regex PII scan) per chunk and heavier checks (classifier-based safety) on the complete response. Consider running safety checks in parallel with generation.
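One way to implement the per-chunk idea: scan each streamed chunk with regexes, but hold back the trailing partial token between chunks, since a PII match can be split across chunk boundaries and a half-emitted email would escape redaction. This sketch considers only spaces as safe flush points, which works here because the email pattern contains no whitespace:

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def stream_with_redaction(chunks):
    """Yield redacted text incrementally. Everything up to the last space
    is flushed; the trailing partial token is buffered so a match split
    across chunks is still caught when it completes."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        cut = buffer.rfind(" ")
        if cut >= 0:
            safe, buffer = buffer[:cut + 1], buffer[cut + 1:]
            yield EMAIL.sub("[REDACTED_EMAIL]", safe)
    # Flush whatever remains once the stream ends.
    yield EMAIL.sub("[REDACTED_EMAIL]", buffer)
```

A production version would buffer on any whitespace and bound the buffer size, but the principle is the same: never emit bytes that a later fragment could turn into unredacted PII.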