Reusable Patterns & Tips
After working through six ML system design problems, you have seen the same architectural patterns emerge repeatedly. This lesson crystallizes those patterns, provides a communication framework for your interview, and answers the most frequently asked questions.
Pattern 1: Two-Stage Retrieval + Ranking
This pattern appears in nearly every ML system that must select items from a large candidate pool.
| System | Retrieval Stage | Ranking Stage |
|---|---|---|
| News Feed | Collaborative filtering + friend posts (~10K) | Multi-task DNN (~50 shown) |
| Search Autocomplete | Trie prefix matching (~1K) | LambdaMART or two-tower (~8 shown) |
| Ad Click | Targeting + budget filter (~10K) | DCN v2 (~5 shown) |
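The shape of this pattern can be sketched in a few lines. This is a toy illustration, not any production system: the item pool, the `popularity` and `affinity` signals, and both scoring functions are made up stand-ins for a cheap retrieval heuristic and an expensive ranking model.

```python
from typing import Callable

def retrieve(items: list[dict], cheap_score: Callable[[dict], float], k: int) -> list[dict]:
    # Stage 1: score the full pool with a cheap heuristic, keep only the top-k
    return sorted(items, key=cheap_score, reverse=True)[:k]

def rank(candidates: list[dict], costly_score: Callable[[dict], float], n: int) -> list[dict]:
    # Stage 2: run the expensive model only on the small shortlist
    return sorted(candidates, key=costly_score, reverse=True)[:n]

# Toy corpus of 1,000 items with fabricated signals
items = [{"id": i, "popularity": i % 97, "affinity": (i * 13) % 31} for i in range(1000)]

candidates = retrieve(items, lambda x: x["popularity"], k=100)                        # 1000 -> 100
feed = rank(candidates, lambda x: 0.6 * x["affinity"] + 0.4 * x["popularity"], n=10)  # 100 -> 10
```

The point is the cost asymmetry: the cheap function touches 1,000 items, while the costly one touches only 100, which is why the second stage can afford a much heavier model.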
Pattern 2: Cascade Architecture
Layer simple, fast models before complex, expensive ones. Each layer filters out easy cases so the next layer handles fewer, harder cases.
```
# Cascade pattern
#
# Layer 1: Rules/heuristics     --> Handle 60% of cases (negligible compute cost)
# Layer 2: Lightweight ML model --> Handle 25% of cases (cheap compute)
# Layer 3: Deep learning model  --> Handle 10% of cases (expensive compute)
# Layer 4: Human review         --> Handle  5% of cases (most expensive)
#
# Used in:
# - Spam detection: blocklist -> metadata model -> content model -> review
# - Content moderation: hash match -> keyword rules -> classifier -> review
# - Search ranking: exact match -> BM25 -> neural ranker
```
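A minimal sketch of the cascade control flow, using the spam-detection layering above. Every rule and threshold here is invented for illustration (the blocklist domain, the exclamation-mark heuristic, the length check); real layers would be trained models and curated lists.

```python
# Each layer returns a verdict or None, where None means "not confident, pass on".

def blocklist_layer(msg: str):
    # hypothetical exact-match blocklist (cheapest check first)
    return "spam" if "known-bad-link.example" in msg else None

def metadata_model(msg: str):
    # stand-in for a lightweight model: crude punctuation heuristic
    return "spam" if msg.count("!") > 3 else None

def content_model(msg: str):
    # stand-in for a deep content model: confident "ok" on short, plain text
    return "ok" if len(msg) < 200 and "$" not in msg else None

def cascade(msg: str, layers):
    for name, layer in layers:
        verdict = layer(msg)
        if verdict is not None:
            return verdict, name          # resolved early by a cheap layer
    return "needs_review", "human"        # no layer was confident: escalate

LAYERS = [("blocklist", blocklist_layer),
          ("metadata", metadata_model),
          ("content", content_model)]
```

Because the loop returns on the first confident layer, the expensive layers only ever see the traffic the cheap ones could not resolve.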
Pattern 3: Multi-Task Learning
When you need to predict multiple related outcomes, use a shared backbone with task-specific heads.
```
# Multi-task pattern
#
# Shared layers (learn general patterns)
#                |
#      +---------+---------+---------+
#      | Head 1  | Head 2  | Head 3  |
#      | P(click)| P(like) | P(share)|
#      +---------+---------+---------+
#
# Benefits:
# - More data for shared layers (all tasks contribute)
# - Single forward pass for all predictions (faster serving)
# - Regularization effect (prevents overfitting to any single task)
#
# Used in: News feed ranking, ad click + conversion, content moderation
```
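The diagram above can be sketched numerically: one shared forward pass feeds all task heads. This is a toy with randomly initialized weights (no training loop), and the dimensions `D` and `H` are arbitrary; it only demonstrates the shared-backbone / per-task-head structure.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 8                                     # input dim, shared hidden dim (arbitrary)

W_shared = 0.1 * rng.normal(size=(D, H))         # shared backbone weights
heads = {task: 0.1 * rng.normal(size=H)          # one linear head per task
         for task in ("click", "like", "share")}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    h = np.maximum(0.0, x @ W_shared)            # ONE shared forward pass...
    return {task: float(sigmoid(h @ w))          # ...reused by every head
            for task, w in heads.items()}

probs = predict(rng.normal(size=D))              # {"click": p, "like": p, "share": p}
```

Serving cost is dominated by the shared layers, so predicting three outcomes is barely more expensive than predicting one.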
Pattern 4: Feature Store Architecture
Separate feature computation from model training and serving to ensure consistency.
| Component | Description | Example Features |
|---|---|---|
| Offline Feature Store | Batch-computed features (updated hourly/daily) | User demographics, 30-day engagement stats, item popularity |
| Online Feature Store | Real-time features (updated per-event) | Session activity, last click timestamp, current location |
| Feature Registry | Metadata about features (schema, owner, freshness) | Feature documentation, lineage, SLA |
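The consistency argument can be made concrete with a sketch. The dicts below are in-memory stand-ins for the offline and online stores, and the feature names are invented; the point is that training and serving share one lookup path, with online values overriding stale batch values.

```python
# In-memory stand-ins; keys are (entity_type, entity_id) pairs.
offline_store = {("user", 42): {"age_bucket": "25-34", "ctr_30d": 0.021}}  # batch, hourly/daily
online_store  = {("user", 42): {"session_clicks": 3}}                      # per-event updates

def get_features(entity, defaults):
    # Single lookup path used by BOTH training and serving --
    # sharing this code is what prevents training-serving skew.
    feats = dict(defaults)                        # fallbacks for cold-start entities
    feats.update(offline_store.get(entity, {}))   # stale-but-rich batch features
    feats.update(online_store.get(entity, {}))    # freshest signals win on conflict
    return feats

row = get_features(("user", 42), defaults={"ctr_30d": 0.0, "session_clicks": 0})
```

An unseen entity simply falls through to the defaults, which doubles as a cold-start policy.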
Pattern 5: Online Experimentation (A/B Testing)
Every ML system needs a rigorous experimentation framework.
```
# A/B test design checklist for ML systems
#
# 1. Define hypothesis: "New model will increase CTR by 2%"
# 2. Choose metrics:
#    - Primary: CTR (what we want to improve)
#    - Guardrails: User satisfaction, revenue, latency (must not regress)
# 3. Power analysis: How many users/days needed for significance?
#    - Effect size: 2% relative CTR lift
#    - Significance level: alpha = 0.05 (two-sided)
#    - Power: 80%
#    - Typically: 1M+ users per arm, 7 days minimum
# 4. Randomization: Hash user ID to assign treatment/control
# 5. Ramp-up: 1% -> 5% -> 20% -> 50% -> 100% (monitor at each stage)
# 6. Analysis: Check for novelty effects, segment-level impacts
```
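Steps 3 and 4 of the checklist can be sketched directly: deterministic hash-based assignment, plus the standard two-proportion sample-size formula. The experiment name and CTR numbers are illustrative; the z-values are the usual constants for a two-sided alpha of 0.05 and 80% power.

```python
import hashlib
import math

def assign(user_id: int, experiment: str, treatment_pct: int) -> str:
    # Step 4: hash (experiment, user) into a 0-99 bucket -- deterministic,
    # so a user always sees the same arm, and independent across experiments.
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_pct else "control"

def users_per_arm(p_base: float, rel_lift: float,
                  z_alpha: float = 1.960, z_beta: float = 0.842) -> int:
    # Step 3: two-proportion sample size (alpha=0.05 two-sided, power=80%).
    p_new = p_base * (1.0 + rel_lift)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2)

n = users_per_arm(p_base=0.02, rel_lift=0.02)   # detect a 2% lift on a 2% CTR
```

Running the arithmetic shows why small relative lifts on rare events are expensive to measure: detecting a 2% lift on a 2% baseline CTR needs on the order of millions of users per arm.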
Pattern 6: Feedback Loop Design
ML systems that improve over time have well-designed feedback loops.
Positive Feedback Loop
Model predicts well → users interact more → more training data → model improves. Seen in recommendation systems. Risk: popular items get more popular (rich-get-richer).
Negative Feedback Loop
Model blocks spam → spammers adapt → model accuracy drops → need retraining. Seen in adversarial settings (spam, fraud). Mitigation: continuous retraining.
Delayed Feedback
Action now, outcome later. Ad click happens immediately, but conversion may take days. Solution: use click as proxy label now, update with conversion label later.
Selection Bias
Model only sees outcomes for items it showed. If you never show an ad, you never know if the user would have clicked. Solution: exploration (epsilon-greedy, Thompson sampling).
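Epsilon-greedy exploration is small enough to sketch inline. The ad list and predicted-CTR scores below are fabricated; the essential logic is the coin flip between a random item (unbiased feedback) and the model's top pick.

```python
import random

def select_ad(ads, model_score, epsilon, rng):
    # Explore with probability epsilon: a uniformly random ad gives
    # feedback on items the model would otherwise never show.
    if rng.random() < epsilon:
        return rng.choice(ads)
    # Exploit otherwise: show the model's highest-scoring ad.
    return max(ads, key=model_score)

ads = [{"id": "a", "pred_ctr": 0.011},
       {"id": "b", "pred_ctr": 0.034},
       {"id": "c", "pred_ctr": 0.006}]
score = lambda ad: ad["pred_ctr"]

greedy = select_ad(ads, score, epsilon=0.0, rng=random.Random(0))   # always exploits
```

In practice epsilon is kept small (for example, a few percent) so exploration's cost to short-term metrics stays bounded while it steadily de-biases the training data.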
Communication Framework for the Interview
How you communicate is almost as important as what you design. Use this framework:
The STAR-ML Method
| Step | What to Do | Time | Example Phrases |
|---|---|---|---|
| Scope | Clarify requirements, define ML objective | 5 min | “Before I design, let me make sure I understand the requirements…” |
| Topology | Draw system architecture, data flow | 8 min | “Here is how data flows through the system…” |
| Architect | Deep dive into features, model, metrics | 18 min | “Let me go deep on the ranking model. I would use… because…” |
| Refine | Trade-offs, extensions, what you would do next | 6 min | “The key trade-off here is… I chose X over Y because…” |
| ML | Tie everything back to ML-specific concerns | Throughout | “This matters because of training-serving skew…” |
Phrases That Impress Interviewers
- “I would start with a simple baseline — logistic regression with hand-crafted features — and iterate from there.”
- “The trade-off here is between X and Y. Given our latency constraints, I would choose X and revisit Y in V2.”
- “We need both offline metrics (AUC for ranking quality) and online metrics (CTR for business impact) to evaluate this.”
- “One risk I want to flag is training-serving skew in the feature pipeline. Here is how I would prevent it.”
- “For V1, I would use batch predictions. For V2, we could move to online inference if the latency budget allows.”
- “This design assumes we have labeled data. If not, we could bootstrap with rule-based labels or use semi-supervised learning.”
Common Mistakes to Avoid
| # | Mistake | What to Do Instead |
|---|---|---|
| 1 | Jumping to model architecture without clarifying requirements | Spend 5 minutes asking questions first |
| 2 | Designing only the ML model, ignoring the system around it | Cover data pipeline, feature store, serving, monitoring |
| 3 | Not defining metrics until asked | Proactively define offline and online metrics |
| 4 | Choosing a complex model without justification | Start simple, explain why complexity is needed |
| 5 | Ignoring training-serving skew | Mention feature store and consistency guarantees |
| 6 | Not discussing trade-offs | Every decision has alternatives; discuss at least one |
| 7 | Treating ML as a black box | Explain features, loss function, and training strategy |
| 8 | Forgetting about data collection and labeling | Discuss where labels come from and data quality |
| 9 | Not considering edge cases and failure modes | What happens when the model is wrong? Cold start? Data outage? |
| 10 | Running out of time without covering trade-offs | Watch the clock; move to trade-offs by minute 35 |
Quick Reference: ML System Design Cheat Sheet
Feature Categories (Use for Any System)
```
# Universal feature categories
#
# 1. User features: Demographics, historical behavior, embeddings
# 2. Item features: Content, metadata, popularity, age
# 3. Context features: Time, device, location, session history
# 4. Cross features: User-item interactions, similarity scores
# 5. Social features: Friend activity, social proof signals
# 6. Real-time features: Current session behavior, trending signals
```
Model Selection Quick Guide
| Scenario | Recommended Model | Why |
|---|---|---|
| V1 / Baseline | Logistic Regression or XGBoost | Fast to train, easy to debug, good baseline |
| Ranking with feature interactions | DCN v2 or DeepFM | Captures cross features automatically |
| Low-latency serving with precomputation | Two-Tower Network | User tower cached, fast dot-product scoring |
| Text understanding | Fine-tuned BERT / DistilBERT | Strong text classification; DistilBERT trades a small accuracy drop for much lower serving latency |
| Multi-modal (text + image) | CLIP-based architecture | Shared embedding space for multiple modalities |
| Spatio-temporal data | GNN or Transformer with positional encoding | Captures graph structure and time patterns |
Frequently Asked Questions
How detailed should my system architecture diagram be?
Draw 5–8 major components with arrows showing data flow. Include the data pipeline, feature store, training pipeline, model serving, and monitoring. Do not draw individual microservices or database schemas — keep it at the system level. Spend 8–10 minutes on this, then go deep on the ML components.
Should I always start with a simple model and upgrade?
Yes. Mention that V1 would use logistic regression or gradient boosted trees as a baseline. Then explain how V2 would use a deep learning model and why the added complexity is justified. This shows engineering maturity. Exception: if the interviewer explicitly asks about a specific deep learning architecture, go there directly.
How do I handle "I don't know" moments?
Be honest and pivot constructively. Say: “I have not worked directly with graph neural networks, but based on my understanding, the key idea is message passing between neighboring nodes. For this use case, I would start with a simpler approach like segment-level features and move to GNN if needed.” Interviewers value honesty over faking expertise.
What if the interviewer asks me to go deeper than I know?
Acknowledge the limits of your knowledge and reason from first principles. “I know the general approach but not the specific implementation detail. From first principles, I would expect that X because Y.” Then ask: “Is this the right direction?” Interviewers often test how you think under uncertainty — your reasoning process matters more than the specific answer.
How important are actual numbers (latency, QPS, data volume)?
Very important for demonstrating practical experience. You do not need exact numbers, but ballpark estimates show you have built real systems. Know these ranges: ML model inference (1–100ms), feature store lookup (1–10ms), typical CTR (1–5%), training data size (millions to billions of examples), and A/B test duration (1–4 weeks).
Should I mention ethical considerations?
Yes, especially for senior roles. Brief mentions of fairness (bias in training data), privacy (GDPR compliance, data retention), and societal impact (filter bubbles, misinformation) demonstrate product awareness. Do not spend more than 1–2 minutes on ethics unless the interviewer goes deeper. Frame it as a trade-off: “We need to balance personalization quality with the risk of creating filter bubbles.”
How do I practice ML system design without a partner?
Set a 45-minute timer and design a system on a whiteboard or document. Record yourself explaining the design. Practice these six systems from this course, then try new ones: recommendation system for e-commerce, fraud detection for payments, voice assistant intent classification, autonomous driving perception pipeline. The patterns you learned here transfer directly.
What is the biggest difference between ML system design and regular system design?
Regular system design focuses on data storage, APIs, and scaling. ML system design adds: feature engineering (most impactful), model selection with justification, offline/online metrics, training pipeline and data flywheel, model serving (latency vs. accuracy trade-off), and monitoring for model-specific issues like data drift and concept drift. You need both skills, but the ML-specific components are what differentiate your answer.
Lilly Tech Systems