Intermediate

Step 2: Email Classification

In this lesson, you will build an LLM-powered classification engine that analyzes every incoming email and assigns a priority level (urgent, high, normal, low), a category tag (meeting, task, question, FYI, personal, newsletter, notification, sales), and a sentiment label (positive, neutral, negative, urgent), along with a confidence score and a one-sentence summary. The classifier uses OpenAI's JSON output mode so that results are easy to parse and validate.

Classification Strategy

We use a single LLM call per email to extract all classification signals at once. This is more efficient than making separate calls for priority, category, and sentiment. The prompt instructs the model to return structured JSON, which we parse and validate.

Email Input
    |
    v
[Subject + Sender + Body (truncated)]
    |
    v
[OpenAI gpt-4o-mini]
    |
    v
{
  "priority": "high",
  "category": "task",
  "sentiment": "neutral",
  "confidence": 0.92,
  "summary": "Manager requesting Q3 report by Friday"
}
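
The output contract in the diagram can also be written down as a small typed structure. The sketch below is only an illustration: `ClassificationResult` and `parse_result` are hypothetical names that do not appear in the lesson's codebase, and the field names are taken from the JSON shown above.

```python
import json
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """Hypothetical container mirroring the JSON fields in the diagram."""
    priority: str
    category: str
    sentiment: str
    confidence: float
    summary: str


def parse_result(raw: str) -> ClassificationResult:
    """Parse the model's JSON text into a ClassificationResult.

    Raises json.JSONDecodeError on malformed JSON, and KeyError, TypeError,
    or ValueError when a field is missing or has an unusable value.
    """
    data = json.loads(raw)
    return ClassificationResult(
        priority=str(data["priority"]),
        category=str(data["category"]),
        sentiment=str(data["sentiment"]),
        confidence=float(data["confidence"]),
        summary=str(data["summary"]),
    )


# The example payload from the diagram above
example = (
    '{"priority": "high", "category": "task", "sentiment": "neutral", '
    '"confidence": 0.92, "summary": "Manager requesting Q3 report by Friday"}'
)
result = parse_result(example)
print(result.priority, result.confidence)
```

A strict parser like this fails loudly on bad output; the lesson's classifier instead falls back to safe defaults, which is usually the better choice for a batch job.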

The Classification Prompt

The prompt is the most important part. It defines the taxonomy, describes what each label means, and specifies the JSON output format:

# app/ai/classifier.py
"""LLM-powered email classification engine."""
import json
from openai import OpenAI

from app.config import config
from app.database import SessionLocal, Email, Classification

client = OpenAI(api_key=config.openai_api_key)

CLASSIFICATION_PROMPT = """You are an email classification assistant. Analyze the email below and return a JSON object with these fields:

1. "priority": One of "urgent", "high", "normal", "low"
   - urgent: Requires immediate action (deadlines today, emergencies, critical issues)
   - high: Important and time-sensitive (deadlines this week, manager requests, client emails)
   - normal: Regular business communication (meetings, updates, questions)
   - low: Informational only (newsletters, notifications, automated emails)

2. "category": One of "meeting", "task", "question", "fyi", "personal", "newsletter", "notification", "sales"
   - meeting: Calendar invites, scheduling, meeting follow-ups
   - task: Action items, assignments, requests for deliverables
   - question: Direct questions requiring a response
   - fyi: Informational updates, status reports, announcements
   - personal: Non-work communication
   - newsletter: Bulk email, subscriptions, marketing
   - notification: Automated system notifications (GitHub, Jira, etc.)
   - sales: Sales pitches, cold outreach, vendor proposals

3. "sentiment": One of "positive", "neutral", "negative", "urgent"
   - positive: Praise, gratitude, good news, approval
   - neutral: Standard business tone, factual
   - negative: Complaints, criticism, problems, frustration
   - urgent: Panic, deadline pressure, escalation

4. "confidence": A float between 0.0 and 1.0 indicating your confidence

5. "summary": A one-sentence summary of the email (max 100 characters)

Return ONLY the JSON object, no markdown, no explanation.

--- EMAIL ---
From: {sender}
Subject: {subject}
Date: {date}

{body}
--- END EMAIL ---"""


def classify_email(email: Email) -> dict:
    """
    Classify a single email using the LLM.

    Args:
        email: Email ORM object from the database.

    Returns:
        Dict with priority, category, sentiment, confidence, summary.
    """
    # Truncate body to avoid token limits
    body = (email.body_text or "")[:3000]

    prompt = CLASSIFICATION_PROMPT.format(
        # sender_name may be missing; avoid rendering the literal string "None"
        sender=f"{email.sender_name or ''} <{email.sender}>".strip(),
        subject=email.subject or "(no subject)",
        date=email.date.strftime("%Y-%m-%d %H:%M") if email.date else "Unknown",
        body=body,
    )

    response = client.chat.completions.create(
        model=config.openai_model,
        messages=[
            {
                "role": "system",
                "content": "You are a precise email classifier. Always return valid JSON."
            },
            {"role": "user", "content": prompt}
        ],
        temperature=0.1,  # Low temperature for consistent results
        max_tokens=200,
        response_format={"type": "json_object"},
    )

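    # message.content is optional in the SDK, so guard against None before stripping
    raw = (response.choices[0].message.content or "").strip()

    try:
        result = json.loads(raw)
        if not isinstance(result, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError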
        # Fallback if JSON parsing fails
        result = {
            "priority": "normal",
            "category": "fyi",
            "sentiment": "neutral",
            "confidence": 0.5,
            "summary": email.snippet[:100] if email.snippet else "",
        }

    # Validate and normalize
    result["priority"] = _validate_enum(
        result.get("priority", "normal"),
        ["urgent", "high", "normal", "low"],
        "normal"
    )
    result["category"] = _validate_enum(
        result.get("category", "fyi"),
        ["meeting", "task", "question", "fyi", "personal",
         "newsletter", "notification", "sales"],
        "fyi"
    )
    result["sentiment"] = _validate_enum(
        result.get("sentiment", "neutral"),
        ["positive", "neutral", "negative", "urgent"],
        "neutral"
    )
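    # A non-numeric confidence (e.g. "high" or null) falls back to 0.5
    try:
        confidence = float(result.get("confidence", 0.5))
    except (TypeError, ValueError):
        confidence = 0.5
    result["confidence"] = min(1.0, max(0.0, confidence))
    # The prompt asks for at most 100 characters; allow some slack before cutting
    result["summary"] = str(result.get("summary") or "")[:200]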

    return result


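def _validate_enum(value, allowed: list, default: str) -> str:
    """Return the normalized value if it is in the allowed list, else the default."""
    # str() guards against non-string values such as None or numbers
    value = str(value).lower().strip()
    return value if value in allowed else default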
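
One failure mode worth checking explicitly is truncation: with max_tokens=200, an unusually long summary could cut the JSON off mid-object. The Chat Completions API reports this by setting finish_reason to "length" on the choice. The sketch below shows one way to detect it; `was_truncated` is a hypothetical helper, and the SimpleNamespace object is a stand-in for a real response so the example runs without an API key.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if the first choice stopped because it hit max_tokens."""
    return response.choices[0].finish_reason == "length"


# Stand-in for an OpenAI response object (assumption: only the attributes
# read above are modelled here).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content='{"priority": "hi'),
        )
    ]
)

if was_truncated(fake_response):
    print("Output hit max_tokens; the JSON is probably incomplete")
```

In the classifier, such a check could sit just before the JSON parsing step to log the event or retry with a larger max_tokens, rather than silently using the fallback values.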

Batch Classification Pipeline

Now build the pipeline that classifies all unclassified emails in the database:

# Add to app/ai/classifier.py

def classify_unprocessed_emails(limit: int = 50) -> list:
    """
    Classify all emails that do not have a classification yet.

    Args:
        limit: Maximum number of emails to classify in one batch.

    Returns:
        List of Classification objects created.
    """
    session = SessionLocal()
    new_classifications = []

    try:
        # Find emails without classifications
        unclassified = (
            session.query(Email)
            .outerjoin(Classification)
            .filter(Classification.id.is_(None))
            .order_by(Email.date.desc())
            .limit(limit)
            .all()
        )

        print(f"Found {len(unclassified)} unclassified emails")

        for email in unclassified:
            try:
                result = classify_email(email)

                classification = Classification(
                    email_id=email.id,
                    priority=result["priority"],
                    category=result["category"],
                    sentiment=result["sentiment"],
                    confidence=result["confidence"],
                    summary=result["summary"],
                )
                session.add(classification)
                new_classifications.append(classification)

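                # subject can be None, so substitute a placeholder before slicing
                subject = email.subject or "(no subject)"
                print(
                    f"  [{result['priority'].upper()}] "
                    f"[{result['category']}] "
                    f"{subject[:50]}"
                )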

            except Exception as e:
                print(f"  Error classifying '{email.subject}': {e}")
                continue

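        session.commit()
        # commit() expires loaded attributes by default; refresh them so the
        # returned objects stay readable after the session is closed
        for classification in new_classifications:
            session.refresh(classification)
        print(f"Classified {len(new_classifications)} emails")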

    except Exception as e:
        session.rollback()
        print(f"Classification pipeline error: {e}")
        raise
    finally:
        session.close()

    return new_classifications


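# Requires SQLAlchemy's case(); move this import to the top of the module
from sqlalchemy import case


def get_priority_inbox(priority: str = None, limit: int = 50) -> list:
    """
    Get emails sorted by priority for the inbox view.

    Args:
        priority: Filter by priority level (optional).
        limit: Maximum results.

    Returns:
        List of (Email, Classification) tuples.
    """
    # Rank the string labels explicitly; sorting the column in descending
    # order would be alphabetical (urgent, normal, low, high)
    priority_rank = case(
        {"urgent": 0, "high": 1, "normal": 2, "low": 3},
        value=Classification.priority,
        else_=4,
    )

    session = SessionLocal()
    try:
        query = (
            session.query(Email, Classification)
            .join(Classification)
            .order_by(priority_rank, Email.date.desc())
        )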
        if priority:
            query = query.filter(Classification.priority == priority)

        results = query.limit(limit).all()
        return results
    finally:
        session.close()
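
Because priority is stored as a string, sorting on the label alone orders it alphabetically, which does not match urgency. The plain-Python sketch below shows ranking through an explicit mapping instead. `PRIORITY_RANK` and `sort_by_priority` are hypothetical names, and the rows are made-up sample data.

```python
PRIORITY_RANK = {"urgent": 0, "high": 1, "normal": 2, "low": 3}


def sort_by_priority(rows: list[dict]) -> list[dict]:
    """Sort rows most urgent first, then newest first within each priority."""
    # Python's sort is stable, so sorting by date first and by rank second
    # keeps the date order inside each priority group.
    newest_first = sorted(rows, key=lambda r: r["date"], reverse=True)
    return sorted(
        newest_first,
        # Unknown labels sort after all known ones
        key=lambda r: PRIORITY_RANK.get(r["priority"], len(PRIORITY_RANK)),
    )


# Made-up sample rows for illustration
rows = [
    {"subject": "Weekly digest", "priority": "low", "date": "2024-05-03"},
    {"subject": "Q3 report", "priority": "high", "date": "2024-05-01"},
    {"subject": "Server down", "priority": "urgent", "date": "2024-05-02"},
    {"subject": "Standup notes", "priority": "normal", "date": "2024-05-03"},
]

for row in sort_by_priority(rows):
    print(row["priority"], row["subject"])
```

A plain descending string sort of the same labels would give urgent, normal, low, high, which pushes "high" to the bottom of the inbox.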

Testing the Classifier

# Test classification on your inbox
python -c "
from app.ai.classifier import classify_unprocessed_emails

results = classify_unprocessed_emails(limit=5)
print(f'\nClassified {len(results)} emails')
for c in results:
    print(f'  Priority: {c.priority} | Category: {c.category} | {c.summary}')
"
💡
Cost Control: With gpt-4o-mini at $0.15 per 1M input tokens and $0.60 per 1M output tokens (prices may change), classifying 100 emails costs roughly $0.02. The low temperature (0.1) and response_format={"type": "json_object"} make results more consistent and keep the output parseable in nearly all calls; the fallback handles the rest.
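
The estimate can be reproduced with simple arithmetic. The sketch below assumes about 1,200 input tokens per email (the prompt plus a body truncated to 3,000 characters) and 60 output tokens. Both token counts are illustrative assumptions rather than measurements, and the prices are the ones quoted above. `estimate_cost` is a hypothetical helper.

```python
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (figure quoted above)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (figure quoted above)


def estimate_cost(n_emails: int, input_tokens: int = 1200, output_tokens: int = 60) -> float:
    """Estimate the USD cost of classifying n_emails.

    input_tokens and output_tokens are assumed per-email averages.
    """
    input_cost = n_emails * input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = n_emails * output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


print(f"100 emails: ${estimate_cost(100):.4f}")
```

Under these assumptions, input tokens account for most of the cost, so tightening the body truncation limit would have more effect than shortening the summary.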

Handling Edge Cases

Real email data is messy. Here are the edge cases our classifier handles:

  • Empty body: Falls back to classifying based on subject line and sender.
  • Long emails: Truncated to 3,000 characters to keep token usage and cost bounded.
  • Non-English emails: gpt-4o-mini handles many widely used languages well. The prompt is in English, but classification generally works across languages; accuracy may drop for less common ones.
  • Automated/system emails: The "notification" and "newsletter" categories capture these. GitHub notifications, Jira updates, and marketing emails are usually easy for the model to detect.
  • Invalid JSON response: The fallback assigns "normal" priority and "fyi" category.
  • Enum validation: Any unexpected values from the LLM are mapped to safe defaults.
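
The enum validation idea in the last bullet can be sketched as a standalone function, which makes it testable without calling the API. `normalize_label` and `ALLOWED_PRIORITIES` are hypothetical names; the function coerces the value to a string first so that a null or numeric value from the model also maps to the default.

```python
ALLOWED_PRIORITIES = ("urgent", "high", "normal", "low")


def normalize_label(value, allowed, default):
    """Lower-case and trim the value; return default if it is not allowed."""
    text = str(value).strip().lower()
    return text if text in allowed else default


# Values an LLM might plausibly return for "priority"
for candidate in [" HIGH ", "critical", None, 3]:
    print(repr(candidate), "->", normalize_label(candidate, ALLOWED_PRIORITIES, "normal"))
```

Mapping unknown values to a mid-level default such as "normal" keeps a misclassified email visible in the inbox instead of burying it under "low".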

Key Takeaways

  • A single LLM call extracts priority, category, sentiment, confidence, and summary simultaneously — efficient and cost-effective.
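  • response_format={"type": "json_object"} (JSON mode) constrains the model to emit valid JSON, which removes most parsing failures. Output can still be cut off if it reaches max_tokens, and JSON mode does not enforce field names or values, so the fallback and validation steps remain necessary.
  • Low temperature (0.1) produces more consistent classifications, although identical inputs are not guaranteed to produce identical outputs.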
  • Validation and fallbacks make the pipeline robust against unexpected LLM outputs.
  • The batch pipeline processes only unclassified emails, making it safe to run repeatedly.

What Is Next

In the next lesson, you will build the draft generation engine — creating context-aware reply drafts that match your writing tone, with a template system for common responses.