Multi-Model Recommendation & Personalization
Modern recommendation systems combine multiple model types — embedding models for retrieval, ranking models for relevance, and LLMs for explanation and conversational recommendations — to deliver highly personalized user experiences at scale.
Modern Recommendation Architecture
Today's recommendation systems are multi-stage pipelines. Each stage uses a different model optimized for its specific task: fast retrieval from millions of candidates, precise ranking of hundreds, and rich explanation of the final results.
```
# Multi-Stage Recommendation Pipeline
#
# User Data (behavior, profile, context)
#   → Embedding Model (user & item embeddings)
#   → Candidate Retrieval (ANN search, ~1000 candidates from millions)
#   → Ranking Model (cross-encoder, pointwise/pairwise scoring)
#   → Business Rules (filtering, diversity, freshness)
#   → LLM (explanation generation, personalized descriptions)
#   → Final Recommendations (top 10–50 with explanations)
#
# Models Used at Each Stage:
# 1. Embedding: Two-Tower model, sentence-transformers, OpenAI ada-002
# 2. Retrieval: FAISS, Pinecone, Weaviate (ANN index)
# 3. Ranking: XGBoost, LightGBM, cross-encoder transformer
# 4. Explanation: Claude, GPT-4, Llama (LLM)
```
Combining Multiple Model Types
The power of modern recommenders comes from combining different signal types, each captured by a different model:
- Collaborative filtering: Learns from user-item interaction patterns (users who bought X also bought Y). Implemented as matrix factorization or neural collaborative filtering.
- Content-based embeddings: Encode item features (text descriptions, images, categories) into dense vectors. Uses sentence-transformers, CLIP, or domain-specific models.
- Behavioral signals: Click sequences, dwell time, purchase history encoded as sequential embeddings (transformers, GRU4Rec).
- Contextual features: Time of day, device, location, session intent — fed as features to the ranking model.
- LLM-generated metadata: Use LLMs to enrich item descriptions, extract tags, generate summaries that improve content-based matching.
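The collaborative-filtering signal from the first bullet can be made concrete with a minimal matrix-factorization sketch: observed user-item ratings are factored into low-dimensional user and item vectors via SGD, and the dot product of those vectors predicts scores for unseen pairs. The `factorize` function, learning rates, and the toy rating matrix below are illustrative choices, not part of any production system.

```python
import numpy as np

def factorize(ratings: np.ndarray, k: int = 2, lr: float = 0.01,
              reg: float = 0.01, epochs: int = 3000, seed: int = 0):
    """Factor a (users x items) rating matrix into k-dim user/item factors via SGD.

    Zeros in `ratings` are treated as unobserved and skipped during training.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    observed = [(u, i) for u in range(n_users)
                for i in range(n_items) if ratings[u, i] > 0]
    for _ in range(epochs):
        for u, i in observed:
            err = ratings[u, i] - U[u] @ V[i]
            # Gradient step with L2 regularization on both factor vectors
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

# Toy matrix: rows = users, cols = items, 0 = unobserved
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 1.0, 5.0]])
U, V = factorize(R)
pred = U @ V.T  # predicted score for every (user, item) pair, including unobserved ones
```

The unobserved entries of `pred` are the collaborative-filtering recommendations: scores inferred purely from interaction patterns, with no item content involved.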
Two-Tower Retrieval + Cross-Encoder Reranking
The two-tower architecture is the industry standard for large-scale retrieval. A user tower and an item tower independently produce embeddings, enabling precomputation of item embeddings and fast approximate nearest neighbor (ANN) search at query time.
```python
import torch
import torch.nn as nn

class TwoTowerModel(nn.Module):
    """Two-tower retrieval model for candidate generation."""

    def __init__(self, user_features_dim, item_features_dim, embedding_dim=128):
        super().__init__()
        # User tower: maps user features to embedding space
        self.user_tower = nn.Sequential(
            nn.Linear(user_features_dim, 256), nn.ReLU(), nn.BatchNorm1d(256),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, embedding_dim),
        )
        # Item tower: maps item features to the same embedding space
        self.item_tower = nn.Sequential(
            nn.Linear(item_features_dim, 256), nn.ReLU(), nn.BatchNorm1d(256),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, embedding_dim),
        )

    def encode_user(self, user_features):
        # L2-normalize so the dot product equals cosine similarity
        emb = self.user_tower(user_features)
        return nn.functional.normalize(emb, dim=-1)

    def encode_item(self, item_features):
        emb = self.item_tower(item_features)
        return nn.functional.normalize(emb, dim=-1)

    def forward(self, user_features, item_features):
        user_emb = self.encode_user(user_features)
        item_emb = self.encode_item(item_features)
        # Cosine similarity as relevance score
        return torch.sum(user_emb * item_emb, dim=-1)

# Training with contrastive loss
model = TwoTowerModel(user_features_dim=64, item_features_dim=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# In-batch negatives: positives on diagonal, negatives off-diagonal
def contrastive_loss(user_embs, item_embs, temperature=0.07):
    similarity = torch.matmul(user_embs, item_embs.T) / temperature
    labels = torch.arange(similarity.shape[0], device=similarity.device)
    return nn.CrossEntropyLoss()(similarity, labels)
```
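At serving time, the payoff of the two-tower split is that item embeddings are encoded once and stored in an index, and each query only runs the user tower. A brute-force NumPy version of that retrieval step is sketched below; at scale the `index @ user_emb` scan would be replaced by an ANN library such as FAISS (`IndexFlatIP` or an approximate variant). The function names here are illustrative.

```python
import numpy as np

def build_item_index(item_embs: np.ndarray) -> np.ndarray:
    """Precompute L2-normalized item embeddings (stand-in for a FAISS index)."""
    return item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)

def retrieve(user_emb: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Exact top-k by cosine similarity; an ANN index makes this sublinear at scale."""
    user_emb = user_emb / np.linalg.norm(user_emb)
    scores = index @ user_emb          # cosine similarity against every item
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
index = build_item_index(rng.normal(size=(1000, 128)))  # 1000 precomputed item embeddings
user = rng.normal(size=128)                             # output of the user tower
top = retrieve(user, index, k=3)
```

Because both towers normalize their outputs, cosine similarity reduces to a dot product, which is exactly the metric inner-product ANN indexes accelerate.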
LLM-Enhanced Recommendations
LLMs add a powerful new dimension to recommendation systems: they can explain why an item was recommended, generate personalized descriptions, and enable conversational recommendation experiences.
Generating Recommendation Explanations
```python
import json

import anthropic
import numpy as np
from sentence_transformers import SentenceTransformer

class SmartRecommender:
    def __init__(self):
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")
        self.llm = anthropic.Anthropic()
        self.product_db = {}          # product_id -> {name, description, category, ...}
        self.product_embeddings = {}  # product_id -> np.array

    def index_products(self, products: list[dict]):
        """Encode all products into the embedding space."""
        for product in products:
            pid = product["id"]
            self.product_db[pid] = product
            # Combine name + description + category for a rich embedding
            text = f"{product['name']}. {product['description']}. Category: {product['category']}"
            self.product_embeddings[pid] = self.encoder.encode(text)

    def get_user_embedding(self, user_history: list[str]) -> np.ndarray:
        """Create a user embedding from their interaction history."""
        # Average embeddings of interacted products (weighted by recency)
        embeddings = []
        weights = []
        for i, pid in enumerate(user_history):
            if pid in self.product_embeddings:
                embeddings.append(self.product_embeddings[pid])
                weights.append(1.0 + i * 0.1)  # Later in history = more recent = higher weight
        weights = np.array(weights) / sum(weights)
        return np.average(embeddings, axis=0, weights=weights)

    def retrieve_candidates(self, user_embedding: np.ndarray, k: int = 20) -> list:
        """Retrieve top-k similar products using cosine similarity."""
        scores = {}
        for pid, emb in self.product_embeddings.items():
            similarity = np.dot(user_embedding, emb) / (
                np.linalg.norm(user_embedding) * np.linalg.norm(emb)
            )
            scores[pid] = float(similarity)
        top_pids = sorted(scores, key=scores.get, reverse=True)[:k]
        return [(pid, scores[pid]) for pid in top_pids]

    def generate_explanations(self, user_history: list[str],
                              recommendations: list[tuple]) -> list[dict]:
        """Use an LLM to generate personalized explanations for each recommendation."""
        # Build context about the user's history
        history_items = [self.product_db[pid]["name"]
                         for pid in user_history if pid in self.product_db]
        rec_items = [
            f"- {self.product_db[pid]['name']}: {self.product_db[pid]['description']}"
            for pid, score in recommendations[:5]
        ]
        response = self.llm.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"""A user has previously interacted with these products:
{', '.join(history_items[-5:])}

We are recommending these products:
{chr(10).join(rec_items)}

For each recommended product, write a brief 1-sentence personalized explanation
of why this user would like it, based on their history.
Format as JSON array: [{{"product": "name", "explanation": "Because you liked X, ..."}}]"""
            }]
        )
        explanations = json.loads(response.content[0].text)
        results = []
        for i, (pid, score) in enumerate(recommendations[:5]):
            product = self.product_db[pid]
            results.append({
                "product": product,
                "score": score,
                "explanation": explanations[i]["explanation"] if i < len(explanations) else ""
            })
        return results

    def recommend(self, user_history: list[str]) -> list[dict]:
        """Full recommendation pipeline: embed → retrieve → explain."""
        user_emb = self.get_user_embedding(user_history)
        candidates = self.retrieve_candidates(user_emb)
        # Filter out already-seen products
        candidates = [(pid, s) for pid, s in candidates if pid not in user_history]
        return self.generate_explanations(user_history, candidates)

# Usage
recommender = SmartRecommender()
recommender.index_products(product_catalog)
results = recommender.recommend(user_history=["prod_1", "prod_42", "prod_7"])
for r in results:
    print(f"{r['product']['name']} (score: {r['score']:.3f})")
    print(f"  {r['explanation']}")
```
Personalized Search with Embeddings + LLM Reranking
Personalized search combines the user's query embedding with their profile embedding and uses an LLM to rerank results for maximum relevance and personalization.
```python
import json

import anthropic
import numpy as np
from sentence_transformers import SentenceTransformer

class PersonalizedSearch:
    def __init__(self):
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")
        self.llm = anthropic.Anthropic()

    def search(self, query: str, user_profile: dict,
               items: list[dict], top_k: int = 10) -> list[dict]:
        """Search with personalized reranking."""
        # Step 1: Semantic search with query embedding
        query_emb = self.encoder.encode(query)
        item_scores = []
        for item in items:
            item_emb = self.encoder.encode(item["description"])
            score = float(np.dot(query_emb, item_emb) / (
                np.linalg.norm(query_emb) * np.linalg.norm(item_emb)
            ))
            item_scores.append((item, score))

        # Step 2: Get top candidates from embedding search
        candidates = sorted(item_scores, key=lambda x: x[1], reverse=True)[:top_k * 2]

        # Step 3: LLM reranking with user profile context
        items_text = "\n".join(
            f"[{i}] {item['name']}: {item['description'][:100]}"
            for i, (item, _) in enumerate(candidates)
        )
        response = self.llm.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Rerank these search results for the query "{query}"
considering this user profile:
- Preferences: {user_profile.get('preferences', 'none')}
- Past purchases: {user_profile.get('past_categories', 'none')}
- Budget: {user_profile.get('budget', 'any')}

Items:
{items_text}

Return the indices in order of best match as JSON: [0, 3, 1, ...]"""
            }]
        )
        reranked_indices = json.loads(response.content[0].text)
        return [candidates[i][0] for i in reranked_indices[:top_k]]
```
Recommendation Architecture by Scale
| Scale | Items | Users | Retrieval | Ranking | Infrastructure |
|---|---|---|---|---|---|
| Startup | < 10K | < 100K | Simple embedding similarity (NumPy) | LLM reranking or rule-based | Single server, SQLite/Postgres |
| Growth | 10K–1M | 100K–10M | FAISS or Pinecone (ANN) | XGBoost/LightGBM ranker | Managed vector DB, Redis cache |
| Enterprise | 1M+ | 10M+ | Two-tower model + distributed ANN | Deep cross-encoder + multi-objective | Kubernetes, feature store, A/B platform |
| Hyperscale | 100M+ | 1B+ | Multi-stage retrieval (hash + ANN) | Mixture of experts, real-time training | Custom infra, thousands of GPUs |
Real-Time Feature Computation
Production recommendation systems need features computed in real time — a user's last click, trending items in the past hour, or current session intent. This requires a feature store architecture:
```python
import json
from datetime import datetime, timedelta

import redis

class RealtimeFeatureStore:
    """Compute and serve real-time features for recommendations."""

    def __init__(self):
        self.redis = redis.Redis(host="localhost", port=6379, db=0)

    def record_interaction(self, user_id: str, item_id: str, event_type: str):
        """Record a user interaction for real-time feature updates."""
        timestamp = datetime.now().isoformat()
        event = json.dumps({
            "item_id": item_id,
            "event": event_type,
            "timestamp": timestamp
        })
        # User's recent interactions (sliding window of the last 100 events)
        self.redis.lpush(f"user:{user_id}:recent", event)
        self.redis.ltrim(f"user:{user_id}:recent", 0, 99)
        # Item popularity counter (hourly bucket, expires after 24h)
        hour_key = datetime.now().strftime("%Y%m%d%H")
        self.redis.hincrby(f"trending:{hour_key}", item_id, 1)
        self.redis.expire(f"trending:{hour_key}", 86400)

    def get_user_features(self, user_id: str) -> dict:
        """Get real-time user features for recommendation scoring."""
        recent = self.redis.lrange(f"user:{user_id}:recent", 0, 9)
        recent_items = [json.loads(e)["item_id"] for e in recent]
        return {
            "recent_items": recent_items,
            "session_length": len(recent),
            "last_event_type": json.loads(recent[0])["event"] if recent else None
        }

    def get_trending_items(self, hours: int = 24, top_k: int = 50) -> list:
        """Get trending items over the past N hours."""
        counts = {}
        now = datetime.now()
        for h in range(hours):
            hour_key = (now - timedelta(hours=h)).strftime("%Y%m%d%H")
            hour_counts = self.redis.hgetall(f"trending:{hour_key}")
            for item_id, count in hour_counts.items():
                item_id = item_id.decode()
                counts[item_id] = counts.get(item_id, 0) + int(count)
        return sorted(counts.items(), key=lambda x: x[1], reverse=True)[:top_k]
```
A/B Testing and Online Evaluation
Recommendation systems require rigorous A/B testing because offline metrics (precision, recall, NDCG) often do not correlate perfectly with online business metrics (revenue, engagement, retention).
- Offline metrics: Precision@K, Recall@K, NDCG, Mean Reciprocal Rank (MRR), catalog coverage, diversity
- Online metrics: Click-through rate (CTR), conversion rate, revenue per session, time on site, return visits
- Interleaving experiments: Mix results from two models in a single list and measure which model's items get more clicks. Requires fewer users than traditional A/B tests.
- Multi-armed bandits: Dynamically allocate more traffic to better-performing models using Thompson Sampling or UCB. Reduces regret during experiments.
- Long-term effects: Measure user retention and lifetime value, not just immediate clicks. Some models optimize short-term engagement at the cost of long-term satisfaction.
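The offline metrics above can be computed directly from logged interactions. As a concrete instance, here is a small NDCG@K implementation using linear gain (many systems use the 2^rel − 1 gain variant instead; the function name is illustrative):

```python
import numpy as np

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@K: DCG of the list as ranked, divided by DCG of the ideal ordering.

    `relevances` holds the relevance of each item in the order the model ranked them.
    """
    rel = np.asarray(relevances, dtype=float)[:k]
    # Positions 1..n discounted by 1/log2(position + 1)
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    # Ideal DCG: same relevances sorted best-first
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered list scores 1.0, while burying the only relevant item at position 3 (`[0, 0, 3]`) scores 0.5, which is what makes NDCG sensitive to rank position rather than just hit count.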
Cold Start Solutions with LLM Content Understanding
The cold start problem — making recommendations for new users or new items with no interaction history — is where LLMs provide a significant advantage:
```python
import json

import anthropic

def cold_start_item_enrichment(item: dict) -> dict:
    """Use an LLM to generate rich features for new items with no interaction data."""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Analyze this product for a recommendation system.

Product: {item['name']}
Description: {item['description']}
Category: {item['category']}
Price: ${item.get('price', 'N/A')}

Generate a JSON object with:
- "target_audience": ["audience segment 1", ...],
- "use_cases": ["use case 1", ...],
- "similar_to": ["comparable product types"],
- "keywords": ["semantic keywords for matching"],
- "appeal_factors": ["what makes this appealing"],
- "complementary_categories": ["categories often bought together"]"""
        }]
    )
    enrichment = json.loads(response.content[0].text)
    item["llm_features"] = enrichment
    return item

def cold_start_user_onboarding(user_preferences: str) -> list[str]:
    """Convert a new user's stated preferences into recommendation signals."""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"""A new user described their preferences as: "{user_preferences}"

Generate a JSON array of 10 search queries that would match products this user
would likely enjoy. Be specific and diverse.
Example: ["wireless noise-canceling headphones under $200", ...]"""
        }]
    )
    return json.loads(response.content[0].text)
```
Use Cases
- E-commerce: Product recommendations with “because you bought X” explanations, personalized homepage, complementary items at checkout
- Streaming (Netflix, Spotify): Content recommendations using multi-signal ranking — viewing history, explicit ratings, time-of-day patterns, and social graphs
- News and content feeds: Personalized article ranking balancing relevance, recency, diversity, and avoiding filter bubbles
- Education (course recommendation): Suggest next courses based on skill gaps, learning pace, career goals, and peer learning paths
- Job matching: Match candidates to jobs using resume embeddings, skill extraction, experience matching, and culture fit scoring
Lilly Tech Systems