Step 2: Email Classification
In this lesson, you will build an LLM-powered classification engine that analyzes every incoming email and assigns a priority level (urgent, high, normal, low), a category tag (such as meeting, task, FYI, personal, or newsletter), a sentiment label, a confidence score, and a one-sentence summary. The classifier uses structured JSON output from OpenAI for reliable, parseable results.
Classification Strategy
We use a single LLM call per email to extract all classification signals at once. This is more efficient than making separate calls for priority, category, and sentiment. The prompt instructs the model to return structured JSON, which we parse and validate.
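The efficiency argument can be made concrete with some rough arithmetic. The sketch below compares one combined call per email against three separate calls. The per-token prices are the gpt-4o-mini figures quoted later in this lesson; the token counts per call and the `batch_cost` helper are assumptions for illustration, so substitute numbers measured on your own mail.

```python
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (figure quoted in this lesson)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (figure quoted in this lesson)


def batch_cost(n_emails: int, calls_per_email: int,
               input_tokens_per_call: int, output_tokens_per_call: int) -> float:
    """Estimate the USD cost of classifying n_emails (hypothetical helper)."""
    calls = n_emails * calls_per_email
    input_cost = calls * input_tokens_per_call * INPUT_PRICE_PER_M
    output_cost = calls * output_tokens_per_call * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / 1_000_000


# Assumed: ~1,000 input tokens (instructions plus truncated body) and
# ~80 output tokens for the single combined call
single = batch_cost(100, 1, 1000, 80)

# Assumed: each separate call resends the body with a shorter prompt
# (~800 input tokens) and returns a shorter answer (~30 output tokens)
separate = batch_cost(100, 3, 800, 30)

print(f"single call:    ${single:.4f}")    # $0.0198
print(f"separate calls: ${separate:.4f}")  # $0.0414
```

Under these assumptions the separate-call design costs about twice as much, mostly because the email body is sent three times. It also adds two extra network round trips per email.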
Email Input
|
v
[Subject + Sender + Body (truncated)]
|
v
[OpenAI gpt-4o-mini]
|
v
{
  "priority": "high",
  "category": "task",
  "sentiment": "neutral",
  "confidence": 0.92,
  "summary": "Manager requesting Q3 report by Friday"
}
The Classification Prompt
The prompt is the most important part. It defines the taxonomy, describes what each label means, and asks for JSON-only output:
# app/ai/classifier.py
"""LLM-powered email classification engine."""
import json
from openai import OpenAI
from app.config import config
from app.database import SessionLocal, Email, Classification
client = OpenAI(api_key=config.openai_api_key)
CLASSIFICATION_PROMPT = """You are an email classification assistant. Analyze the email below and return a JSON object with these fields:
1. "priority": One of "urgent", "high", "normal", "low"
- urgent: Requires immediate action (deadlines today, emergencies, critical issues)
- high: Important and time-sensitive (deadlines this week, manager requests, client emails)
- normal: Regular business communication (meetings, updates, questions)
- low: Informational only (newsletters, notifications, automated emails)
2. "category": One of "meeting", "task", "question", "fyi", "personal", "newsletter", "notification", "sales"
- meeting: Calendar invites, scheduling, meeting follow-ups
- task: Action items, assignments, requests for deliverables
- question: Direct questions requiring a response
- fyi: Informational updates, status reports, announcements
- personal: Non-work communication
- newsletter: Bulk email, subscriptions, marketing
- notification: Automated system notifications (GitHub, Jira, etc.)
- sales: Sales pitches, cold outreach, vendor proposals
3. "sentiment": One of "positive", "neutral", "negative", "urgent"
- positive: Praise, gratitude, good news, approval
- neutral: Standard business tone, factual
- negative: Complaints, criticism, problems, frustration
- urgent: Panic, deadline pressure, escalation
4. "confidence": A float between 0.0 and 1.0 indicating your confidence
5. "summary": A one-sentence summary of the email (max 100 characters)
Return ONLY the JSON object, no markdown, no explanation.
--- EMAIL ---
From: {sender}
Subject: {subject}
Date: {date}
{body}
--- END EMAIL ---"""
def classify_email(email: Email) -> dict:
    """
    Classify a single email using the LLM.
    Args:
        email: Email ORM object from the database.
    Returns:
        Dict with priority, category, sentiment, confidence, summary.
    """
    # Truncate body to avoid token limits
    body = (email.body_text or "")[:3000]
    prompt = CLASSIFICATION_PROMPT.format(
        # sender_name may be missing; avoid sending the literal string "None"
        sender=f"{email.sender_name or ''} <{email.sender}>".strip(),
        subject=email.subject or "(no subject)",
        date=email.date.strftime("%Y-%m-%d %H:%M") if email.date else "Unknown",
        body=body,
    )
    response = client.chat.completions.create(
        model=config.openai_model,
        messages=[
            {
                "role": "system",
                "content": "You are a precise email classifier. Always return valid JSON.",
            },
            {"role": "user", "content": prompt},
        ],
        temperature=0.1,  # Low temperature for consistent results
        max_tokens=200,
        response_format={"type": "json_object"},
    )
    # content can be None, so guard before calling strip()
    raw = (response.choices[0].message.content or "").strip()
    try:
        result = json.loads(raw)
        if not isinstance(result, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Fallback if JSON parsing fails
        result = {
            "priority": "normal",
            "category": "fyi",
            "sentiment": "neutral",
            "confidence": 0.5,
            "summary": email.snippet[:100] if email.snippet else "",
        }
    # Validate and normalize
    result["priority"] = _validate_enum(
        result.get("priority", "normal"),
        ["urgent", "high", "normal", "low"],
        "normal",
    )
    result["category"] = _validate_enum(
        result.get("category", "fyi"),
        ["meeting", "task", "question", "fyi", "personal",
         "newsletter", "notification", "sales"],
        "fyi",
    )
    result["sentiment"] = _validate_enum(
        result.get("sentiment", "neutral"),
        ["positive", "neutral", "negative", "urgent"],
        "neutral",
    )
    # A non-numeric confidence (e.g. "high" or null) falls back to 0.5
    try:
        confidence = float(result.get("confidence", 0.5))
    except (TypeError, ValueError):
        confidence = 0.5
    result["confidence"] = min(1.0, max(0.0, confidence))
    # The prompt asks for at most 100 characters; 200 is the hard cap
    result["summary"] = str(result.get("summary", ""))[:200]
    return result


def _validate_enum(value: object, allowed: list, default: str) -> str:
    """Validate a value is in the allowed list."""
    # str() keeps this safe when the model returns null or a number
    value = str(value).lower().strip()
    return value if value in allowed else default
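The validation step can be checked without spending any API calls by feeding it deliberately malformed model output. The sketch below re-implements that step as a standalone function. `normalize_result` and `ALLOWED` are hypothetical names that are not part of the lesson's code, and the function is written defensively so that a null label or a non-numeric confidence also falls back to the defaults.

```python
# Allowed labels and the default for each field, copied from the prompt taxonomy
ALLOWED = {
    "priority": (["urgent", "high", "normal", "low"], "normal"),
    "category": (["meeting", "task", "question", "fyi", "personal",
                  "newsletter", "notification", "sales"], "fyi"),
    "sentiment": (["positive", "neutral", "negative", "urgent"], "neutral"),
}


def normalize_result(result: dict) -> dict:
    """Return a cleaned copy of result with every field coerced to a safe value."""
    clean = {}
    for field, (allowed, default) in ALLOWED.items():
        # str() handles None and numbers; unknown labels map to the default
        value = str(result.get(field, default)).lower().strip()
        clean[field] = value if value in allowed else default
    try:
        confidence = float(result.get("confidence", 0.5))
    except (TypeError, ValueError):
        confidence = 0.5
    clean["confidence"] = min(1.0, max(0.0, confidence))  # clamp to [0, 1]
    clean["summary"] = str(result.get("summary", ""))[:200]
    return clean


# Stray whitespace and case, an unknown category, a null sentiment,
# an out-of-range confidence given as a string, and a non-string summary
messy = {"priority": " URGENT ", "category": "spam", "sentiment": None,
         "confidence": "1.7", "summary": 42}
print(normalize_result(messy))
# {'priority': 'urgent', 'category': 'fyi', 'sentiment': 'neutral', 'confidence': 1.0, 'summary': '42'}
```

Keeping this logic in a pure function makes it easy to cover with unit tests that run offline.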
Batch Classification Pipeline
Now build the pipeline that classifies all unclassified emails in the database:
# Add to app/ai/classifier.py
def classify_unprocessed_emails(limit: int = 50) -> list:
    """
    Classify up to `limit` emails that do not have a classification yet.
    Args:
        limit: Maximum number of emails to classify in one batch.
    Returns:
        List of Classification objects created.
    """
    session = SessionLocal()
    new_classifications = []
    try:
        # Find emails without classifications
        unclassified = (
            session.query(Email)
            .outerjoin(Classification)
            .filter(Classification.id.is_(None))
            .order_by(Email.date.desc())
            .limit(limit)
            .all()
        )
        print(f"Found {len(unclassified)} unclassified emails")
        for email in unclassified:
            # subject can be None; slicing None would raise after session.add()
            subject = email.subject or "(no subject)"
            try:
                result = classify_email(email)
                classification = Classification(
                    email_id=email.id,
                    priority=result["priority"],
                    category=result["category"],
                    sentiment=result["sentiment"],
                    confidence=result["confidence"],
                    summary=result["summary"],
                )
                session.add(classification)
                new_classifications.append(classification)
                print(
                    f"  [{result['priority'].upper()}] "
                    f"[{result['category']}] "
                    f"{subject[:50]}"
                )
            except Exception as e:
                print(f"  Error classifying '{subject}': {e}")
                continue
        session.commit()
        # commit() expires loaded attributes by default; reload them so the
        # returned objects stay readable after the session is closed
        for classification in new_classifications:
            session.refresh(classification)
        print(f"Classified {len(new_classifications)} emails")
    except Exception as e:
        session.rollback()
        print(f"Classification pipeline error: {e}")
        raise
    finally:
        session.close()
    return new_classifications
from sqlalchemy import case


def get_priority_inbox(priority: str = None, limit: int = 50) -> list:
    """
    Get emails sorted by priority for the inbox view.
    Args:
        priority: Filter by priority level (optional).
        limit: Maximum results.
    Returns:
        List of (Email, Classification) tuples.
    """
    # Custom priority ordering: rank the labels explicitly, because sorting
    # the text column alphabetically would place "high" after "low"
    priority_rank = case(
        {"urgent": 0, "high": 1, "normal": 2, "low": 3},
        value=Classification.priority,
        else_=4,
    )
    session = SessionLocal()
    try:
        query = (
            session.query(Email, Classification)
            .join(Classification)
            .order_by(priority_rank, Email.date.desc())
        )
        if priority:
            query = query.filter(Classification.priority == priority)
        return query.limit(limit).all()
    finally:
        session.close()
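The inbox view is meant to list urgent mail first, then high, normal, and low. Because those labels are text, the order has to be defined explicitly: sorting them as strings (assuming priority is stored as a plain text column) puts "high" last. A minimal sketch in plain Python, using a hypothetical `PRIORITY_RANK` mapping, shows the difference:

```python
# Hypothetical rank table: a lower number means more important
PRIORITY_RANK = {"urgent": 0, "high": 1, "normal": 2, "low": 3}

labels = ["low", "urgent", "normal", "high"]

# Descending string order, which is what sorting the raw text values gives
alphabetical = sorted(labels, reverse=True)

# Explicit rank lookup; labels missing from the table sort last
by_rank = sorted(labels, key=lambda p: PRIORITY_RANK.get(p, len(PRIORITY_RANK)))

print(alphabetical)  # ['urgent', 'normal', 'low', 'high']
print(by_rank)       # ['urgent', 'high', 'normal', 'low']
```

The same idea carries over to the database query: order by a numeric rank derived from the label rather than by the label itself.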
Testing the Classifier
# Test classification on your inbox
python -c "
from app.ai.classifier import classify_unprocessed_emails
results = classify_unprocessed_emails(limit=5)
print(f'\nClassified {len(results)} emails')
for c in results:
    print(f'  Priority: {c.priority} | Category: {c.category} | {c.summary}')
"
With gpt-4o-mini priced at $0.15 per 1M input tokens and $0.60 per 1M output tokens, classifying 100 emails costs roughly $0.02. The low temperature (0.1) and response_format={"type": "json_object"} keep results consistent and parseable in nearly all calls; the validation and fallback code covers the rest.
Handling Edge Cases
Real email data is messy. Here are the edge cases our classifier handles:
- Empty body: Falls back to classifying based on subject line and sender.
- Long emails: Truncated to 3,000 characters to stay within token limits.
- Non-English emails: GPT-4o-mini handles most languages well. The prompt is in English but classification works across languages.
- Automated/system emails: The "notification" and "newsletter" categories capture these. GitHub notifications, Jira updates, and marketing emails are reliably detected.
- Invalid JSON response: The fallback assigns "normal" priority and "fyi" category.
- Enum validation: Any unexpected values from the LLM are mapped to safe defaults.
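Two of the cases above, an unusable response and an unexpected shape, can be isolated in a small parsing guard. The sketch below is one possible way to do it, not the lesson's implementation: `parse_model_output` and `FALLBACK` are hypothetical names, and the `finish_reason` argument is assumed to be taken from `response.choices[0].finish_reason`. A value of "length" means the reply hit max_tokens, so the JSON may be cut off; this sketch conservatively treats that as unusable.

```python
import json
from typing import Optional

# Safe defaults, matching the fallback values used in this lesson
FALLBACK = {"priority": "normal", "category": "fyi", "sentiment": "neutral",
            "confidence": 0.5, "summary": ""}


def parse_model_output(raw: Optional[str], finish_reason: str = "stop") -> dict:
    """Parse the model's reply, returning a copy of FALLBACK when it is unusable."""
    # Empty or missing content, or a reply truncated by max_tokens
    if not raw or finish_reason == "length":
        return dict(FALLBACK)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    # Valid JSON that is not an object (a list, a bare string) is also unusable
    return parsed if isinstance(parsed, dict) else dict(FALLBACK)


print(parse_model_output('{"priority": "high"}'))   # {'priority': 'high'}
print(parse_model_output(None)["priority"])         # normal
print(parse_model_output('{"priority": "hi', finish_reason="length")["priority"])  # normal
print(parse_model_output('["urgent"]')["category"])  # fyi
```

Returning a fresh copy of the defaults each time prevents one email's result from being mutated into the next one's fallback.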
Key Takeaways
- A single LLM call extracts priority, category, sentiment, confidence, and summary simultaneously — efficient and cost-effective.
- response_format={"type": "json_object"} constrains the model to return valid JSON, which removes most parsing failures. The fallback covers the rest, such as a response cut off by max_tokens.
- Low temperature (0.1) produces consistent classifications. The same email will almost always get the same priority.
- Validation and fallbacks make the pipeline robust against unexpected LLM outputs.
- The batch pipeline processes only unclassified emails, making it safe to run repeatedly.
What Is Next
In the next lesson, you will build the draft generation engine — creating context-aware reply drafts that match your writing tone, with a template system for common responses.