Reusable Patterns & Tips
After working through six ML system design problems, you have seen the same architectural patterns emerge repeatedly. This lesson crystallizes those patterns, provides a communication framework for your interview, and answers the most frequently asked questions.
Pattern 1: Two-Stage Retrieval + Ranking
This pattern appears in nearly every ML system that must select items from a large candidate pool.
| System | Retrieval Stage | Ranking Stage |
|---|---|---|
| News Feed | Collaborative filtering + friend posts (~10K) | Multi-task DNN (~50 shown) |
| Search Autocomplete | Trie prefix matching (~1K) | LambdaMART or two-tower (~8 shown) |
| Ad Click | Targeting + budget filter (~10K) | DCN v2 (~5 shown) |
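The shape of this pattern can be sketched in a few lines. This is a toy illustration, not any production system: the item pool, the `popularity` and `affinity` signals, and both scoring functions are made up stand-ins for a cheap retrieval heuristic and an expensive ranking model.

```python
from typing import Callable

def retrieve(items: list[dict], cheap_score: Callable[[dict], float], k: int) -> list[dict]:
    # Stage 1: score the full pool with a cheap heuristic, keep only the top-k
    return sorted(items, key=cheap_score, reverse=True)[:k]

def rank(candidates: list[dict], costly_score: Callable[[dict], float], n: int) -> list[dict]:
    # Stage 2: run the expensive model only on the small shortlist
    return sorted(candidates, key=costly_score, reverse=True)[:n]

# Toy corpus of 1,000 items with fabricated signals
items = [{"id": i, "popularity": i % 97, "affinity": (i * 13) % 31} for i in range(1000)]

candidates = retrieve(items, lambda x: x["popularity"], k=100)                        # 1000 -> 100
feed = rank(candidates, lambda x: 0.6 * x["affinity"] + 0.4 * x["popularity"], n=10)  # 100 -> 10
```

The point is the cost asymmetry: the cheap function touches 1,000 items, while the costly one touches only 100, which is why the second stage can afford a much heavier model.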
Pattern 2: Cascade Architecture
Layer simple, fast models before complex, expensive ones. Each layer filters out easy cases so the next layer handles fewer, harder cases.
```
# Cascade pattern
#
# Layer 1: Rules/heuristics     --> Handle 60% of cases (negligible compute cost)
# Layer 2: Lightweight ML model --> Handle 25% of cases (cheap compute)
# Layer 3: Deep learning model  --> Handle 10% of cases (expensive compute)
# Layer 4: Human review         --> Handle  5% of cases (most expensive)
#
# Used in:
# - Spam detection: blocklist -> metadata model -> content model -> review
# - Content moderation: hash match -> keyword rules -> classifier -> review
# - Search ranking: exact match -> BM25 -> neural ranker
```
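A minimal sketch of the cascade control flow, using the spam-detection layering above. Every rule and threshold here is invented for illustration (the blocklist domain, the exclamation-mark heuristic, the length check); real layers would be trained models and curated lists.

```python
# Each layer returns a verdict or None, where None means "not confident, pass on".

def blocklist_layer(msg: str):
    # hypothetical exact-match blocklist (cheapest check first)
    return "spam" if "known-bad-link.example" in msg else None

def metadata_model(msg: str):
    # stand-in for a lightweight model: crude punctuation heuristic
    return "spam" if msg.count("!") > 3 else None

def content_model(msg: str):
    # stand-in for a deep content model: confident "ok" on short, plain text
    return "ok" if len(msg) < 200 and "$" not in msg else None

def cascade(msg: str, layers):
    for name, layer in layers:
        verdict = layer(msg)
        if verdict is not None:
            return verdict, name          # resolved early by a cheap layer
    return "needs_review", "human"        # no layer was confident: escalate

LAYERS = [("blocklist", blocklist_layer),
          ("metadata", metadata_model),
          ("content", content_model)]
```

Because the loop returns on the first confident layer, the expensive layers only ever see the traffic the cheap ones could not resolve.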
Pattern 3: Multi-Task Learning
When you need to predict multiple related outcomes, use a shared backbone with task-specific heads.
```
# Multi-task pattern
#
# Shared layers (learn general patterns)
#                |
#      +---------+---------+---------+
#      | Head 1  | Head 2  | Head 3  |
#      | P(click)| P(like) | P(share)|
#      +---------+---------+---------+
#
# Benefits:
# - More data for shared layers (all tasks contribute)
# - Single forward pass for all predictions (faster serving)
# - Regularization effect (prevents overfitting to any single task)
#
# Used in: News feed ranking, ad click + conversion, content moderation
```
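The diagram above can be sketched numerically: one shared forward pass feeds all task heads. This is a toy with randomly initialized weights (no training loop), and the dimensions `D` and `H` are arbitrary; it only demonstrates the shared-backbone / per-task-head structure.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 8                                     # input dim, shared hidden dim (arbitrary)

W_shared = 0.1 * rng.normal(size=(D, H))         # shared backbone weights
heads = {task: 0.1 * rng.normal(size=H)          # one linear head per task
         for task in ("click", "like", "share")}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    h = np.maximum(0.0, x @ W_shared)            # ONE shared forward pass...
    return {task: float(sigmoid(h @ w))          # ...reused by every head
            for task, w in heads.items()}

probs = predict(rng.normal(size=D))              # {"click": p, "like": p, "share": p}
```

Serving cost is dominated by the shared layers, so predicting three outcomes is barely more expensive than predicting one.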
Pattern 4: Feature Store Architecture
Separate feature computation from model training and serving to ensure consistency.
| Component | Description | Example Features |
|---|---|---|
| Offline Feature Store | Batch-computed features (updated hourly/daily) | User demographics, 30-day engagement stats, item popularity |
| Online Feature Store | Real-time features (updated per-event) | Session activity, last click timestamp, current location |
| Feature Registry | Metadata about features (schema, owner, freshness) | Feature documentation, lineage, SLA |
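The consistency argument can be made concrete with a sketch. The dicts below are in-memory stand-ins for the offline and online stores, and the feature names are invented; the point is that training and serving share one lookup path, with online values overriding stale batch values.

```python
# In-memory stand-ins; keys are (entity_type, entity_id) pairs.
offline_store = {("user", 42): {"age_bucket": "25-34", "ctr_30d": 0.021}}  # batch, hourly/daily
online_store  = {("user", 42): {"session_clicks": 3}}                      # per-event updates

def get_features(entity, defaults):
    # Single lookup path used by BOTH training and serving --
    # sharing this code is what prevents training-serving skew.
    feats = dict(defaults)                        # fallbacks for cold-start entities
    feats.update(offline_store.get(entity, {}))   # stale-but-rich batch features
    feats.update(online_store.get(entity, {}))    # freshest signals win on conflict
    return feats

row = get_features(("user", 42), defaults={"ctr_30d": 0.0, "session_clicks": 0})
```

An unseen entity simply falls through to the defaults, which doubles as a cold-start policy.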
Pattern 5: Online Experimentation (A/B Testing)
Every ML system needs a rigorous experimentation framework.
```
# A/B test design checklist for ML systems
#
# 1. Define hypothesis: "New model will increase CTR by 2%"
# 2. Choose metrics:
#    - Primary: CTR (what we want to improve)
#    - Guardrails: User satisfaction, revenue, latency (must not regress)
# 3. Power analysis: How many users/days needed for significance?
#    - Effect size: 2% relative CTR lift
#    - Significance level: alpha = 0.05 (two-sided)
#    - Power: 80%
#    - Typically: 1M+ users per arm, 7 days minimum
# 4. Randomization: Hash user ID to assign treatment/control
# 5. Ramp-up: 1% -> 5% -> 20% -> 50% -> 100% (monitor at each stage)
# 6. Analysis: Check for novelty effects, segment-level impacts
```
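Steps 3 and 4 of the checklist can be sketched directly: deterministic hash-based assignment, plus the standard two-proportion sample-size formula. The experiment name and CTR numbers are illustrative; the z-values are the usual constants for a two-sided alpha of 0.05 and 80% power.

```python
import hashlib
import math

def assign(user_id: int, experiment: str, treatment_pct: int) -> str:
    # Step 4: hash (experiment, user) into a 0-99 bucket -- deterministic,
    # so a user always sees the same arm, and independent across experiments.
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_pct else "control"

def users_per_arm(p_base: float, rel_lift: float,
                  z_alpha: float = 1.960, z_beta: float = 0.842) -> int:
    # Step 3: two-proportion sample size (alpha=0.05 two-sided, power=80%).
    p_new = p_base * (1.0 + rel_lift)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2)

n = users_per_arm(p_base=0.02, rel_lift=0.02)   # detect a 2% lift on a 2% CTR
```

Running the arithmetic shows why small relative lifts on rare events are expensive to measure: detecting a 2% lift on a 2% baseline CTR needs on the order of millions of users per arm.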
Pattern 6: Feedback Loop Design
ML systems that improve over time have well-designed feedback loops.
Positive Feedback Loop
Model predicts well → users interact more → more training data → model improves. Seen in recommendation systems. Risk: popular items get more popular (rich-get-richer).
Negative Feedback Loop
Model blocks spam → spammers adapt → model accuracy drops → need retraining. Seen in adversarial settings (spam, fraud). Mitigation: continuous retraining.
Delayed Feedback
Action now, outcome later. Ad click happens immediately, but conversion may take days. Solution: use click as proxy label now, update with conversion label later.
Selection Bias
Model only sees outcomes for items it showed. If you never show an ad, you never know if the user would have clicked. Solution: exploration (epsilon-greedy, Thompson sampling).
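Epsilon-greedy exploration is small enough to sketch inline. The ad list and predicted-CTR scores below are fabricated; the essential logic is the coin flip between a random item (unbiased feedback) and the model's top pick.

```python
import random

def select_ad(ads, model_score, epsilon, rng):
    # Explore with probability epsilon: a uniformly random ad gives
    # feedback on items the model would otherwise never show.
    if rng.random() < epsilon:
        return rng.choice(ads)
    # Exploit otherwise: show the model's highest-scoring ad.
    return max(ads, key=model_score)

ads = [{"id": "a", "pred_ctr": 0.011},
       {"id": "b", "pred_ctr": 0.034},
       {"id": "c", "pred_ctr": 0.006}]
score = lambda ad: ad["pred_ctr"]

greedy = select_ad(ads, score, epsilon=0.0, rng=random.Random(0))   # always exploits
```

In practice epsilon is kept small (for example, a few percent) so exploration's cost to short-term metrics stays bounded while it steadily de-biases the training data.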
Communication Framework for the Interview
How you communicate is almost as important as what you design. Use this framework:
The STAR-ML Method
| Step | What to Do | Time | Example Phrases |
|---|---|---|---|
| Scope | Clarify requirements, define ML objective | 5 min | “Before I design, let me make sure I understand the requirements…” |
| Topology | Draw system architecture, data flow | 8 min | “Here is how data flows through the system…” |
| Architect | Deep dive into features, model, metrics | 18 min | “Let me go deep on the ranking model. I would use… because…” |
| Refine | Trade-offs, extensions, what you would do next | 6 min | “The key trade-off here is… I chose X over Y because…” |
| ML | Tie everything back to ML-specific concerns | Throughout | “This matters because of training-serving skew…” |
Phrases That Impress Interviewers
- “I would start with a simple baseline — logistic regression with hand-crafted features — and iterate from there.”
- “The trade-off here is between X and Y. Given our latency constraints, I would choose X and revisit Y in V2.”
- “We need both offline metrics (AUC for ranking quality) and online metrics (CTR for business impact) to evaluate this.”
- “One risk I want to flag is training-serving skew in the feature pipeline. Here is how I would prevent it.”
- “For V1, I would use batch predictions. For V2, we could move to online inference if the latency budget allows.”
- “This design assumes we have labeled data. If not, we could bootstrap with rule-based labels or use semi-supervised learning.”
Common Mistakes to Avoid
| # | Mistake | What to Do Instead |
|---|---|---|
| 1 | Jumping to model architecture without clarifying requirements | Spend 5 minutes asking questions first |
| 2 | Designing only the ML model, ignoring the system around it | Cover data pipeline, feature store, serving, monitoring |
| 3 | Not defining metrics until asked | Proactively define offline and online metrics |
| 4 | Choosing a complex model without justification | Start simple, explain why complexity is needed |
| 5 | Ignoring training-serving skew | Mention feature store and consistency guarantees |
| 6 | Not discussing trade-offs | Every decision has alternatives; discuss at least one |
| 7 | Treating ML as a black box | Explain features, loss function, and training strategy |
| 8 | Forgetting about data collection and labeling | Discuss where labels come from and data quality |
| 9 | Not considering edge cases and failure modes | What happens when the model is wrong? Cold start? Data outage? |
| 10 | Running out of time without covering trade-offs | Watch the clock; move to trade-offs by minute 35 |
Quick Reference: ML System Design Cheat Sheet
Feature Categories (Use for Any System)
```
# Universal feature categories
#
# 1. User features: Demographics, historical behavior, embeddings
# 2. Item features: Content, metadata, popularity, age
# 3. Context features: Time, device, location, session history
# 4. Cross features: User-item interactions, similarity scores
# 5. Social features: Friend activity, social proof signals
# 6. Real-time features: Current session behavior, trending signals
```
Model Selection Quick Guide
| Scenario | Recommended Model | Why |
|---|---|---|
| V1 / Baseline | Logistic Regression or XGBoost | Fast to train, easy to debug, good baseline |
| Ranking with feature interactions | DCN v2 or DeepFM | Captures cross features automatically |
| Low-latency serving with precomputation | Two-Tower Network | User tower cached, fast dot-product scoring |
| Text understanding | Fine-tuned BERT / DistilBERT | Strong text classification; DistilBERT trades a small accuracy drop for much lower serving latency |
| Multi-modal (text + image) | CLIP-based architecture | Shared embedding space for multiple modalities |
| Spatio-temporal data | GNN or Transformer with positional encoding | Captures graph structure and time patterns |
Frequently Asked Questions
How detailed should my system architecture diagram be?
Draw 5–8 major components with arrows showing data flow. Include the data pipeline, feature store, training pipeline, model serving, and monitoring. Do not draw individual microservices or database schemas — keep it at the system level. Spend 8–10 minutes on this, then go deep on the ML components.
Should I always start with a simple model and upgrade?
Yes. Mention that V1 would use logistic regression or gradient boosted trees as a baseline. Then explain how V2 would use a deep learning model and why the added complexity is justified. This shows engineering maturity. Exception: if the interviewer explicitly asks about a specific deep learning architecture, go there directly.
How do I handle "I don't know" moments?
Be honest and pivot constructively. Say: “I have not worked directly with graph neural networks, but based on my understanding, the key idea is message passing between neighboring nodes. For this use case, I would start with a simpler approach like segment-level features and move to GNN if needed.” Interviewers value honesty over faking expertise.
What if the interviewer asks me to go deeper than I know?
Acknowledge the limits of your knowledge and reason from first principles. “I know the general approach but not the specific implementation detail. From first principles, I would expect that X because Y.” Then ask: “Is this the right direction?” Interviewers often test how you think under uncertainty — your reasoning process matters more than the specific answer.
How important are actual numbers (latency, QPS, data volume)?
Very important for demonstrating practical experience. You do not need exact numbers, but ballpark estimates show you have built real systems. Know these ranges: ML model inference (1–100ms), feature store lookup (1–10ms), typical CTR (1–5%), training data size (millions to billions of examples), and A/B test duration (1–4 weeks).
Should I mention ethical considerations?
Yes, especially for senior roles. Brief mentions of fairness (bias in training data), privacy (GDPR compliance, data retention), and societal impact (filter bubbles, misinformation) demonstrate product awareness. Do not spend more than 1–2 minutes on ethics unless the interviewer goes deeper. Frame it as a trade-off: “We need to balance personalization quality with the risk of creating filter bubbles.”
How do I practice ML system design without a partner?
Set a 45-minute timer and design a system on a whiteboard or document. Record yourself explaining the design. Practice these six systems from this course, then try new ones: recommendation system for e-commerce, fraud detection for payments, voice assistant intent classification, autonomous driving perception pipeline. The patterns you learned here transfer directly.
What is the biggest difference between ML system design and regular system design?
Regular system design focuses on data storage, APIs, and scaling. ML system design adds: feature engineering (most impactful), model selection with justification, offline/online metrics, training pipeline and data flywheel, model serving (latency vs. accuracy trade-off), and monitoring for model-specific issues like data drift and concept drift. You need both skills, but the ML-specific components are what differentiate your answer.
Lilly Tech Systems