Problem Statement & Requirements
Why Recommendation Engines Matter
By widely cited figures, recommendation engines drive roughly 35% of Amazon's revenue, 80% of Netflix watch time, and a majority of YouTube views. They are among the most impactful ML systems in production, directly translating into billions of dollars of engagement and revenue.
Think of a recommendation engine like a knowledgeable friend who knows your tastes perfectly. When you walk into a bookstore, this friend immediately pulls books from the shelves that you will love — some because they are similar to books you have enjoyed, others because people with similar tastes loved them, and a few surprises to keep things interesting.
Functional Requirements
- Personalized recommendations — Generate ranked lists of items tailored to each user
- Multiple recommendation types — "Because you watched X", "Trending", "Similar items"
- Real-time updates — Recommendations reflect recent user behavior within minutes
- Content-based + collaborative filtering — Combine multiple signals for relevance
- A/B testing — Support concurrent experiments on recommendation algorithms
- Explainability — Provide reasons for each recommendation
Non-Functional Requirements
- Low latency — Serve recommendations in <200ms (p99)
- High throughput — Handle 100M+ daily active users
- Freshness — New items surfaced within hours of being added
- Availability — 99.99% uptime; fallback to popular items if models fail
Back-of-Envelope Estimation
| Parameter | Estimate |
|---|---|
| Daily Active Users | 100M |
| Total items in catalog | 10M |
| User interactions/day | 1B (clicks, views, purchases) |
| Embedding dimension | 256 floats per user/item |
| User embedding storage | 100M × 256 × 4B = ~100 GB |
| Item embedding storage | 10M × 256 × 4B = ~10 GB |
| Recommendation QPS | ~50K (peak: 150K) |
| Candidate generation latency budget | <50ms |
| Total latency budget | <200ms |
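The embedding-storage rows in the table above can be checked with quick arithmetic (assuming 32-bit floats, 4 bytes each):

```python
DIM = 256            # embedding dimension from the table
BYTES_PER_FLOAT = 4  # float32

def embedding_storage_gb(n_entities: int) -> float:
    """Raw storage for n embedding vectors, in GB (1 GB = 1e9 bytes)."""
    return n_entities * DIM * BYTES_PER_FLOAT / 1e9

print(embedding_storage_gb(100_000_000))  # users: 102.4 GB (~100 GB)
print(embedding_storage_gb(10_000_000))   # items: 10.24 GB (~10 GB)
```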
System API Design
```
# Get personalized recommendations for a user
GET /api/v1/recommendations/{user_id}
    ?type=personalized|similar|trending
    &limit=20
    &context=homepage|product_page
    &exclude=[item_ids]

# Response
{
  "recommendations": [
    {
      "item_id": "item_12345",
      "score": 0.94,
      "reason": "Because you watched Inception",
      "model_version": "v3.2"
    }
  ],
  "experiment_id": "exp_ab_2024_q1"
}

# Record user interaction event
POST /api/v1/events
{
  "user_id": "user_789",
  "item_id": "item_12345",
  "event_type": "click|view|purchase|rating",
  "timestamp": "2024-01-15T10:30:00Z",
  "context": { "source": "homepage_carousel" }
}

# Get similar items
GET /api/v1/similar/{item_id}?limit=10
```
Data Model
```sql
-- User profiles with preferences
CREATE TABLE users (
    user_id     BIGINT PRIMARY KEY,
    created_at  TIMESTAMP,
    country     VARCHAR(2),
    preferences JSONB  -- explicit preferences
);

-- Item catalog
CREATE TABLE items (
    item_id    BIGINT PRIMARY KEY,
    title      TEXT,
    category   VARCHAR(50),
    tags       TEXT[],
    created_at TIMESTAMP,
    popularity FLOAT  -- rolling popularity score
);

-- Interaction events (append-only, partitioned by date)
CREATE TABLE interactions (
    event_id   BIGINT,
    user_id    BIGINT,
    item_id    BIGINT,
    event_type VARCHAR(20),
    timestamp  TIMESTAMP,
    context    JSONB
) PARTITION BY RANGE (timestamp);

-- Precomputed embeddings
CREATE TABLE embeddings (
    entity_id   BIGINT,
    entity_type VARCHAR(10),  -- 'user' or 'item'
    vector      FLOAT[256],
    model_ver   VARCHAR(10),
    updated_at  TIMESTAMP
);
```
High-Level Architecture
The recommendation pipeline follows a multi-stage funnel pattern used at Netflix, YouTube, and Pinterest:
Stage 1: Candidate Generation
Quickly retrieve ~1,000 candidates from millions of items using cheap models (embedding similarity, co-occurrence). Multiple candidate generators run in parallel: collaborative filtering, content-based, trending, and exploration.
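A minimal sketch of fanning out to several generators and merging their deduplicated output (the generator names and the 1,000-candidate cap are illustrative):

```python
from typing import Callable

CandidateGen = Callable[[str], list[str]]  # user_id -> ranked item_ids

def generate_candidates(user_id: str,
                        generators: dict[str, CandidateGen],
                        limit: int = 1000) -> dict[str, str]:
    """Run each generator and merge results, deduplicating by item.

    Returns item_id -> name of the first generator that produced it,
    which can later feed the 'reason' field in the API response.
    """
    merged: dict[str, str] = {}
    for name, gen in generators.items():
        for item_id in gen(user_id):
            merged.setdefault(item_id, name)
            if len(merged) >= limit:
                return merged
    return merged

# Toy generators standing in for CF, content-based, and trending retrieval.
gens = {
    "collaborative": lambda u: ["i1", "i2", "i3"],
    "content":       lambda u: ["i2", "i4"],
    "trending":      lambda u: ["i5"],
}
print(generate_candidates("user_789", gens))
# {'i1': 'collaborative', 'i2': 'collaborative', 'i3': 'collaborative',
#  'i4': 'content', 'i5': 'trending'}
```

In production the generators would be remote calls executed concurrently; here they run sequentially for clarity.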
Stage 2: Scoring & Ranking
Apply an expensive, feature-rich model to rank the ~1,000 candidates. Uses user features, item features, context (time of day, device), and cross-features. Outputs a score per item.
Stage 3: Re-Ranking & Business Rules
Apply diversity rules (no 3 items from same genre in a row), freshness boosts, promotional slots, and content policy filters. This stage ensures the final list is balanced and business-aligned.
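The genre-diversity rule above ("no 3 items from the same genre in a row") can be sketched as a greedy re-ranker (field names are illustrative):

```python
def rerank_with_diversity(ranked, genre_of, max_run=2):
    """Greedily pick the highest-scored item that would not create a run
    of more than max_run consecutive items from the same genre."""
    remaining = list(ranked)  # assumed sorted by score, best first
    out = []
    while remaining:
        for idx, item in enumerate(remaining):
            run = 0  # length of the same-genre run this item would extend
            for prev in reversed(out):
                if genre_of[prev] == genre_of[item]:
                    run += 1
                else:
                    break
            if run < max_run:
                out.append(remaining.pop(idx))
                break
        else:
            # Every remaining item would violate the rule; relax it.
            out.append(remaining.pop(0))
    return out

genres = {"a": "scifi", "b": "scifi", "c": "scifi", "d": "drama"}
print(rerank_with_diversity(["a", "b", "c", "d"], genres))
# ['a', 'b', 'd', 'c'] -- 'd' breaks up the three-sci-fi run
```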
Online vs. Offline Split
Offline: Train models, compute embeddings, build ANN indices (runs hourly/daily).
Nearline: Update user features from recent events via streaming (Kafka → Flink).
Online: Serve candidates, score, re-rank in real-time (<200ms).
Deep Dive: Core Components
Collaborative Filtering
Collaborative filtering finds patterns in user-item interaction matrices. Two main approaches:
```python
import numpy as np
from scipy.sparse import csr_matrix

class ALSModel:
    def __init__(self, n_factors=256, reg=0.01, epochs=15):
        self.n_factors = n_factors
        self.reg = reg
        self.epochs = epochs

    def fit(self, interactions: csr_matrix):
        """Alternating Least Squares on the user-item matrix."""
        n_users, n_items = interactions.shape
        # Initialize user and item factor matrices
        self.user_factors = np.random.normal(
            0, 0.01, (n_users, self.n_factors))
        self.item_factors = np.random.normal(
            0, 0.01, (n_items, self.n_factors))
        for epoch in range(self.epochs):
            # Fix items, solve for users
            self._solve(interactions, self.user_factors, self.item_factors)
            # Fix users, solve for items
            self._solve(interactions.T, self.item_factors, self.user_factors)

    def _solve(self, interactions, target, fixed):
        """Regularized least squares for one side, the other held fixed.
        (Unweighted ALS for clarity; production implicit-feedback ALS adds
        per-observation confidence weights.)"""
        interactions = interactions.tocsr()
        A = fixed.T @ fixed + self.reg * np.eye(self.n_factors)
        for i in range(target.shape[0]):
            row = np.asarray(interactions[i].todense()).ravel()
            target[i] = np.linalg.solve(A, fixed.T @ row)

    def recommend(self, user_id, n=20):
        """Score all items for one user: user_vector @ item_factors.T."""
        scores = self.user_factors[user_id] @ self.item_factors.T
        top_items = np.argsort(scores)[::-1][:n]
        return top_items, scores[top_items]
```
Content-Based Filtering
Uses item metadata (genre, tags, description) to find similar items. Works well for new items with no interaction history (cold start).
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def compute_item_embeddings(items):
    """Generate embeddings from item titles and descriptions."""
    texts = [f"{item.title} {item.description}" for item in items]
    embeddings = model.encode(texts, batch_size=256)
    # Note: this model outputs 384-dim vectors, vs. the 256-dim
    # assumed in the capacity estimates above.
    return embeddings  # shape: (n_items, 384)

def find_similar(query_embedding, index, k=50):
    """ANN search using a FAISS index for fast retrieval."""
    distances, indices = index.search(query_embedding.reshape(1, -1), k)
    return indices[0], distances[0]
```
Hybrid Approaches
Production systems combine multiple signals. The ranking model takes features from all sources:
- Collaborative signals: User/item embedding similarity scores
- Content signals: Genre match, tag overlap, text similarity
- Behavioral signals: Click-through rate, watch completion, time spent
- Context signals: Time of day, device type, session history
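A sketch of how these signal families might be assembled into one feature vector for the ranking model (the specific feature names are illustrative, not a fixed schema):

```python
import numpy as np

def build_ranking_features(user_emb, item_emb, item, ctx):
    """Concatenate collaborative, content, behavioral, and context
    signals into a single feature vector for the ranker."""
    cf_score = float(user_emb @ item_emb)  # collaborative: embedding dot product
    genre_match = float(item["genre"] in ctx["preferred_genres"])  # content
    return np.array([
        cf_score,
        genre_match,
        item["ctr"],                 # behavioral: click-through rate
        item["avg_watch_pct"],       # behavioral: watch completion
        ctx["hour_of_day"] / 23.0,   # context: scaled to [0, 1]
        float(ctx["device"] == "mobile"),
    ])

user_emb = np.ones(4) * 0.5
item_emb = np.ones(4) * 0.5
item = {"genre": "scifi", "ctr": 0.12, "avg_watch_pct": 0.8}
ctx = {"preferred_genres": {"scifi"}, "hour_of_day": 20, "device": "mobile"}
print(build_ranking_features(user_emb, item_emb, item, ctx))
```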
Cold Start Strategies
Handling New Users & Items
New users: Start with popularity-based recs, then use onboarding quiz preferences, then transition to personalized as interactions accumulate (>10 events).
New items: Use content-based features (metadata, embeddings) for initial placement. Boost exploration probability. Use multi-armed bandits to efficiently learn item quality.
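A minimal epsilon-greedy bandit sketch for learning new-item quality from clicks (the 10% exploration rate and optimistic initialization are assumptions):

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Track per-item click rates; mostly exploit, sometimes explore."""

    def __init__(self, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = defaultdict(int)
        self.clicks = defaultdict(int)

    def select(self, item_ids):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(item_ids)  # explore: random item
        # Exploit: highest observed click rate (optimistic 1.0 if unseen,
        # so every new item gets tried at least once).
        return max(item_ids, key=lambda i:
                   self.clicks[i] / self.shows[i] if self.shows[i] else 1.0)

    def update(self, item_id, clicked):
        self.shows[item_id] += 1
        self.clicks[item_id] += int(clicked)

bandit = EpsilonGreedyBandit(epsilon=0.1, seed=42)
choice = bandit.select(["new_a", "new_b"])
bandit.update(choice, clicked=True)
```

Production systems often prefer Thompson sampling for better regret bounds, but the feedback loop (select, serve, update) is the same.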
Scaling & Optimization
Approximate Nearest Neighbor (ANN)
Exact brute-force search over 10M item embeddings is too slow at this QPS. Use ANN indices (the throughput and recall figures below are rough and hardware-dependent):
| Library | Index Type | QPS (10M items) | Recall@10 |
|---|---|---|---|
| FAISS | IVF + PQ | ~50,000 | ~95% |
| ScaNN | Anisotropic quantization | ~80,000 | ~97% |
| hnswlib | HNSW (graph-based) | ~30,000 | ~99% |
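The inverted-file (IVF) idea behind these indices can be illustrated in pure NumPy: cluster vectors into coarse cells, then search only the few cells closest to the query (a toy sketch of the concept, not a FAISS API example):

```python
import numpy as np

def build_ivf(vectors, n_cells=16, iters=10, seed=0):
    """Toy IVF index: k-means centroids plus inverted lists of vector ids."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_cells, replace=False)]
    for _ in range(iters):  # plain k-means
        assign = np.argmin(
            ((vectors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(n_cells):
            if (assign == c).any():
                centroids[c] = vectors[assign == c].mean(0)
    # Final assignment against the updated centroids
    assign = np.argmin(
        ((vectors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    lists = {c: np.where(assign == c)[0] for c in range(n_cells)}
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe=4, k=5):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    cells = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    ids = np.concatenate([lists[c] for c in cells])
    dists = ((vectors[ids] - query) ** 2).sum(-1)
    order = np.argsort(dists)[:k]
    return ids[order], dists[order]

vecs = np.random.default_rng(1).normal(size=(1000, 32)).astype(np.float32)
ids, dists = ivf_search(vecs[0], vecs, *build_ivf(vecs))
```

With `nprobe=4` of 16 cells, only ~25% of vectors are scanned; real libraries add product quantization (PQ) on top to compress the vectors themselves.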
Feature Caching
Cache hot user features in Redis with TTL. User embeddings updated every 15 minutes via streaming pipeline. Item features cached at CDN edge for popular items.
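The read-through pattern can be sketched without a real Redis instance (a dict stands in for the cache here; the 15-minute TTL matches the embedding refresh cadence above):

```python
import time

class TTLFeatureCache:
    """Read-through cache: serve hot user features, refetch after TTL."""

    def __init__(self, fetch_fn, ttl_seconds=900):  # 900s = 15 min
        self.fetch_fn = fetch_fn  # loads features from the feature store
        self.ttl = ttl_seconds
        self._store = {}          # user_id -> (expires_at, features)

    def get(self, user_id):
        entry = self._store.get(user_id)
        if entry and entry[0] > time.monotonic():
            return entry[1]                    # cache hit, still fresh
        features = self.fetch_fn(user_id)      # miss or stale: refetch
        self._store[user_id] = (time.monotonic() + self.ttl, features)
        return features

calls = []
cache = TTLFeatureCache(lambda uid: calls.append(uid) or {"uid": uid})
cache.get("user_789")
cache.get("user_789")
print(len(calls))  # 1 -- fetched once, second get served from cache
```

With Redis the same pattern uses `SET key value EX 900` on the miss path, so expiry is enforced server-side.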
A/B Testing Framework
Route users deterministically to experiment buckets using a stable hash of user_id modulo 100. Log the experiment assignment with every recommendation served. Consider interleaving (team-draft) to reach statistical significance faster than a traditional A/B split.
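Note that Python's built-in `hash()` is randomized per process, so deterministic bucketing needs a stable hash. A sketch:

```python
import hashlib

def assign_bucket(user_id: str, experiment_id: str,
                  n_buckets: int = 100) -> int:
    """Deterministic bucket in [0, n_buckets): the same user + experiment
    always lands in the same bucket, across processes and restarts."""
    key = f"{experiment_id}:{user_id}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

b = assign_bucket("user_789", "exp_ab_2024_q1")
assert b == assign_bucket("user_789", "exp_ab_2024_q1")  # stable
treatment = b < 50  # e.g. a 50/50 split
```

Salting the hash with the experiment id keeps bucket assignments independent across concurrent experiments.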
Practice Problems
Practice 1: Cold Start
A new streaming platform launches with 10,000 titles but zero user data. Design a recommendation strategy for the first 30 days. How do you bootstrap collaborative filtering?
Practice 2: Real-Time Personalization
A user watches 3 episodes of a sci-fi series in one sitting. How quickly should recommendations update? Design the nearline pipeline to handle session-based context.
Practice 3: Diversity vs. Relevance
Your recommendation model achieves great click-through rate but users complain of "filter bubbles." Design a re-ranking strategy that balances relevance, diversity, and serendipity.
Quick Reference
| Component | Technology | Purpose |
|---|---|---|
| Candidate Generation | FAISS / ScaNN | Fast ANN retrieval from embeddings |
| Feature Store | Redis + Feast | Low-latency user/item features |
| Ranking Model | XGBoost / Deep&Wide | Score and rank candidates |
| Event Streaming | Kafka + Flink | Real-time interaction processing |
| Model Training | Spark + PyTorch | Offline model retraining |
| Experiment Platform | Custom / Statsig | A/B test management |
| Embedding Storage | Milvus / Pinecone | Vector database for embeddings |
Key Takeaways
- Use a two-stage funnel: cheap candidate generation → expensive ranking
- Combine collaborative + content-based signals for robustness
- Separate offline training from online serving with nearline feature updates
- Use ANN indices (FAISS/ScaNN) for millisecond-scale retrieval over millions of embeddings
- Always have fallback strategies (popular items, trending) for cold start and failures