Design a Recommendation Engine

Hard · 35 min read

Problem Statement & Requirements

Why Recommendation Engines Matter

Recommendation engines drive 35% of Amazon's revenue, 80% of Netflix watch time, and 60% of YouTube views. They are one of the most impactful ML systems in production, directly translating to billions of dollars in engagement and revenue.

Think of a recommendation engine like a knowledgeable friend who knows your tastes perfectly. When you walk into a bookstore, this friend immediately pulls books from the shelves that you will love — some because they are similar to books you have enjoyed, others because people with similar tastes loved them, and a few surprises to keep things interesting.

Functional Requirements

Non-Functional Requirements

Back-of-Envelope Estimation

Parameter                           | Estimate
------------------------------------|----------------------------------
Daily Active Users                  | 100M
Total items in catalog              | 10M
User interactions/day               | 1B (clicks, views, purchases)
Embedding dimension                 | 256 floats per user/item
User embedding storage              | 100M × 256 × 4B = ~100 GB
Item embedding storage              | 10M × 256 × 4B = ~10 GB
Recommendation QPS                  | ~50K (peak: 150K)
Candidate generation latency budget | <50ms
Total latency budget                | <200ms
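The embedding storage rows can be sanity-checked with quick arithmetic (no assumptions beyond the table's own figures of 256-dim float32 vectors):

```python
# Back-of-envelope check for the embedding storage estimates.
FLOAT_BYTES = 4   # float32
DIM = 256         # embedding dimension from the table

def embedding_storage_gb(n_entities: int) -> float:
    """Raw size of n_entities embeddings in GB (1 GB = 1e9 bytes)."""
    return n_entities * DIM * FLOAT_BYTES / 1e9

print(embedding_storage_gb(100_000_000))  # users: ~102.4 GB
print(embedding_storage_gb(10_000_000))   # items: ~10.24 GB
```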

System API Design

Recommendation API
# Get personalized recommendations for a user
GET /api/v1/recommendations/{user_id}
  ?type=personalized|similar|trending
  &limit=20
  &context=homepage|product_page
  &exclude=[item_ids]

# Response
{
  "recommendations": [
    {
      "item_id": "item_12345",
      "score": 0.94,
      "reason": "Because you watched Inception",
      "model_version": "v3.2"
    }
  ],
  "experiment_id": "exp_ab_2024_q1"
}

# Record user interaction event
POST /api/v1/events
{
  "user_id": "user_789",
  "item_id": "item_12345",
  "event_type": "click|view|purchase|rating",
  "timestamp": "2024-01-15T10:30:00Z",
  "context": { "source": "homepage_carousel" }
}

# Get similar items
GET /api/v1/similar/{item_id}?limit=10
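A minimal sketch of how a handler might assemble the response payload above. The field names follow the example JSON; `build_response` and its `ranked_items` input are illustrative stand-ins, not a prescribed interface:

```python
from typing import Any, Dict, List, Tuple

def build_response(ranked_items: List[Tuple[str, float, str]],
                   model_version: str,
                   experiment_id: str) -> Dict[str, Any]:
    """Assemble the recommendation payload shown above.

    ranked_items: (item_id, score, reason) tuples, assumed to be
    already scored and re-ranked by the upstream pipeline.
    """
    return {
        "recommendations": [
            {"item_id": iid, "score": round(score, 4),
             "reason": reason, "model_version": model_version}
            for iid, score, reason in ranked_items
        ],
        "experiment_id": experiment_id,
    }

resp = build_response(
    [("item_12345", 0.9412, "Because you watched Inception")],
    model_version="v3.2", experiment_id="exp_ab_2024_q1")
```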

Data Model

Core Schema
-- User profiles with preferences
CREATE TABLE users (
    user_id     BIGINT PRIMARY KEY,
    created_at  TIMESTAMP,
    country     VARCHAR(2),
    preferences JSONB        -- explicit preferences
);

-- Item catalog
CREATE TABLE items (
    item_id     BIGINT PRIMARY KEY,
    title       TEXT,
    category    VARCHAR(50),
    tags        TEXT[],
    created_at  TIMESTAMP,
    popularity  FLOAT         -- rolling popularity score
);

-- Interaction events (append-only, partitioned by date)
CREATE TABLE interactions (
    event_id    BIGINT,
    user_id     BIGINT,
    item_id     BIGINT,
    event_type  VARCHAR(20),
    timestamp   TIMESTAMP,
    context     JSONB
) PARTITION BY RANGE (timestamp);

-- Precomputed embeddings
CREATE TABLE embeddings (
    entity_id   BIGINT,
    entity_type VARCHAR(10),  -- 'user' or 'item'
    vector      FLOAT[256],
    model_ver   VARCHAR(10),
    updated_at  TIMESTAMP
);

High-Level Architecture

The recommendation pipeline follows a multi-stage funnel pattern (candidate generation, ranking, re-ranking) used at Netflix, YouTube, and Pinterest:

Stage 1: Candidate Generation

Quickly retrieve ~1,000 candidates from millions of items using cheap models (embedding similarity, co-occurrence). Multiple candidate generators run in parallel: collaborative filtering, content-based, trending, and exploration.
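One of the cheap generators above, item co-occurrence ("users who interacted with X also interacted with Y"), can be sketched as a counting pass over sessions. The session format here (lists of item ids) is an assumption for illustration:

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(sessions):
    """Count how often each item pair appears in the same session."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in sessions:
        for a, b in combinations(set(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def candidates_for(item_id, counts, k=1000):
    """Top-k co-occurring items, used as cheap candidates."""
    co = counts.get(item_id, {})
    return sorted(co, key=co.get, reverse=True)[:k]

cooc = build_cooccurrence([["a", "b", "c"], ["a", "b"], ["b", "d"]])
```

At scale this counting runs offline (e.g. a Spark job), with only the precomputed top-k lists loaded into the serving path.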

Stage 2: Scoring & Ranking

Apply an expensive, feature-rich model to rank the ~1,000 candidates. Uses user features, item features, context (time of day, device), and cross-features. Outputs a score per item.

Stage 3: Re-Ranking & Business Rules

Apply diversity rules (no three items from the same genre in a row), freshness boosts, promotional slots, and content policy filters. This stage ensures the final list is balanced and business-aligned.
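The same-genre run cap (at most two same-genre items in a row) can be enforced with a single greedy pass. A sketch; the item representation and `genre_of` lookup are assumptions:

```python
def rerank_with_diversity(ranked, genre_of, max_run=2):
    """Greedy re-rank: take the best-scored item that would not
    extend a same-genre run past max_run; fall back to the best
    remaining item if every candidate would violate the rule."""
    out, pool = [], list(ranked)   # ranked: best-first item ids
    while pool:
        run_genre = genre_of(out[-1]) if out else None
        run_len = 0
        for item in reversed(out):
            if genre_of(item) == run_genre:
                run_len += 1
            else:
                break
        pick = next((i for i in pool
                     if genre_of(i) != run_genre or run_len < max_run),
                    pool[0])
        out.append(pick)
        pool.remove(pick)
    return out

order = rerank_with_diversity(
    ["a1", "a2", "a3", "b1"],
    genre_of={"a1": "g1", "a2": "g1", "a3": "g1", "b1": "g2"}.get)
```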

Online vs. Offline Split

Offline: Train models, compute embeddings, build ANN indices (runs hourly/daily).
Nearline: Update user features from recent events via streaming (Kafka → Flink).
Online: Serve candidates, score, re-rank in real-time (<200ms).
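Put together, the online path is a small orchestration: fan out to candidate generators, merge, score, re-rank. A skeletal sketch where all component callables are stand-ins for the real models and rule engines:

```python
def serve_recommendations(user_id, generators, score_fn, rerank_fn,
                          n_candidates=1000, n_results=20):
    """Online path: candidate generation -> scoring -> re-ranking.

    generators: callables returning candidate item ids (stand-ins).
    score_fn:   ranking model, maps (user_id, item_id) -> float.
    rerank_fn:  business-rule pass over a best-first list.
    """
    # Stage 1: merge candidates from all generators (dedup, cap size)
    candidates, seen = [], set()
    for gen in generators:
        for item in gen(user_id):
            if item not in seen:
                seen.add(item)
                candidates.append(item)
    candidates = candidates[:n_candidates]

    # Stage 2: score every candidate with the expensive model
    scored = sorted(candidates,
                    key=lambda i: score_fn(user_id, i), reverse=True)

    # Stage 3: apply business rules, then truncate to the page size
    return rerank_fn(scored)[:n_results]

recs = serve_recommendations(
    "u1",
    generators=[lambda u: ["a", "b"], lambda u: ["b", "c"]],
    score_fn=lambda u, i: {"a": 0.1, "b": 0.9, "c": 0.5}[i],
    rerank_fn=lambda items: items,
    n_results=3)
```

In production the generators run in parallel with a per-generator timeout so one slow source cannot blow the <200ms budget.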

Deep Dive: Core Components

Collaborative Filtering

Collaborative filtering finds patterns in user-item interaction matrices. Two main approaches:

Matrix Factorization (ALS)
import numpy as np
from scipy.sparse import csr_matrix

class ALSModel:
    def __init__(self, n_factors=256, reg=0.01, epochs=15):
        self.n_factors = n_factors
        self.reg = reg
        self.epochs = epochs

    def fit(self, interactions: csr_matrix):
        """Alternating Least Squares on the user-item matrix."""
        n_users, n_items = interactions.shape
        # Initialize user and item factor matrices
        self.user_factors = np.random.normal(
            0, 0.01, (n_users, self.n_factors)
        )
        self.item_factors = np.random.normal(
            0, 0.01, (n_items, self.n_factors)
        )

        for epoch in range(self.epochs):
            # Fix items, solve for users
            self._solve(interactions, self.user_factors,
                        self.item_factors)
            # Fix users, solve for items (transpose -> item-major rows)
            self._solve(interactions.T.tocsr(), self.item_factors,
                        self.user_factors)

    def _solve(self, interactions, X, Y):
        """Regularized least squares for each row of X with Y fixed:
        X[i] = (Y_i^T Y_i + reg*I)^-1 Y_i^T r_i, where Y_i holds the
        factors of the entities row i interacted with."""
        reg_eye = self.reg * np.eye(self.n_factors)
        for i in range(X.shape[0]):
            row = interactions[i]
            idx, vals = row.indices, row.data
            if len(idx) == 0:
                continue   # no interactions: keep current factors
            Y_i = Y[idx]
            A = Y_i.T @ Y_i + reg_eye
            b = Y_i.T @ vals
            X[i] = np.linalg.solve(A, b)

    def recommend(self, user_id, n=20):
        """Score all items: user_vector @ item_factors^T."""
        scores = self.user_factors[user_id] @ self.item_factors.T
        top_items = np.argsort(scores)[::-1][:n]
        return top_items, scores[top_items]

Content-Based Filtering

Uses item metadata (genre, tags, description) to find similar items. Works well for new items with no interaction history (cold start).

Embedding-Based Similarity
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def compute_item_embeddings(items):
    """Generate embeddings from item descriptions."""
    texts = [f"{item.title} {item.description}"
             for item in items]
    embeddings = model.encode(texts, batch_size=256)
    return embeddings  # shape: (n_items, 384)

def find_similar(query_embedding, index, k=50):
    """ANN search using FAISS for fast retrieval."""
    distances, indices = index.search(
        query_embedding.reshape(1, -1), k
    )
    return indices[0], distances[0]
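Before reaching for an ANN index, an exact cosine-similarity search is worth keeping around as the ground truth that ANN recall numbers are measured against. A pure-NumPy sketch:

```python
import numpy as np

def exact_top_k(query: np.ndarray, items: np.ndarray, k: int = 50):
    """Exact cosine-similarity search over all item embeddings.

    Too slow for 10M items in the serving path, but it defines the
    true top-k that ANN recall (e.g. Recall@10) is computed against.
    """
    q = query / np.linalg.norm(query)
    m = items / np.linalg.norm(items, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarity per item
    idx = np.argsort(sims)[::-1][:k]   # best-first indices
    return idx, sims[idx]

idx, sims = exact_top_k(np.array([1.0, 0.1, 0.0]), np.eye(3), k=2)
```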

Hybrid Approaches

Production systems combine multiple signals. The ranking model takes features from all sources: collaborative-filtering scores, content similarity, popularity, and request context.

Cold Start Strategies

Handling New Users & Items

New users: Start with popularity-based recs, then use onboarding quiz preferences, then transition to personalized as interactions accumulate (>10 events).
New items: Use content-based features (metadata, embeddings) for initial placement. Boost exploration probability. Use multi-armed bandits to efficiently learn item quality.
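The multi-armed bandit idea for new items can be sketched with epsilon-greedy, the simplest variant (production systems often prefer Thompson sampling; the class below is a minimal illustration):

```python
import random

class EpsilonGreedyExplorer:
    """Epsilon-greedy bandit over new items: mostly exploit the
    best observed CTR, explore uniformly with probability eps."""
    def __init__(self, item_ids, eps=0.1, seed=None):
        self.eps = eps
        self.rng = random.Random(seed)
        self.shows = {i: 0 for i in item_ids}
        self.clicks = {i: 0 for i in item_ids}

    def pick(self):
        if self.rng.random() < self.eps:
            return self.rng.choice(list(self.shows))   # explore
        # exploit: item with the highest observed click-through rate
        return max(self.shows,
                   key=lambda i: self.clicks[i] / max(self.shows[i], 1))

    def record(self, item_id, clicked):
        self.shows[item_id] += 1
        self.clicks[item_id] += int(clicked)

bandit = EpsilonGreedyExplorer(["new_a", "new_b"], eps=0.0)
bandit.record("new_b", clicked=True)
bandit.record("new_a", clicked=False)
```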

Scaling & Optimization

Approximate Nearest Neighbor (ANN)

Exact brute-force search over 10M item embeddings is too slow. Use ANN indices:

Library | Index Type               | QPS (10M items) | Recall@10
--------|--------------------------|-----------------|----------
FAISS   | IVF + PQ                 | ~50,000         | 95%
ScaNN   | Anisotropic quantization | ~80,000         | 97%
HNSW    | Graph-based              | ~30,000         | 99%

Feature Caching

Cache hot user features in Redis with TTL. User embeddings updated every 15 minutes via streaming pipeline. Item features cached at CDN edge for popular items.
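The hot-feature cache amounts to a TTL'd key-value lookup; in production this is Redis (SETEX-style expiry), but the pattern itself is small enough to sketch in memory. The `clock` parameter is injected here only to make expiry testable:

```python
import time

class TTLFeatureCache:
    """In-memory stand-in for the Redis feature cache: values
    expire ttl_s seconds after being written."""
    def __init__(self, ttl_s=900.0, clock=time.monotonic):
        self.ttl_s = ttl_s     # 900s matches the 15-min refresh above
        self.clock = clock
        self._store = {}

    def put(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl_s)

    def get(self, key, loader=None):
        entry = self._store.get(key)
        if entry and entry[1] > self.clock():
            return entry[0]            # fresh hit
        if loader is None:
            return None                # expired or missing
        value = loader(key)            # miss: recompute and cache
        self.put(key, value)
        return value

now = [0.0]
cache = TTLFeatureCache(ttl_s=900, clock=lambda: now[0])
cache.put("user:42", [0.1, 0.2])
```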

A/B Testing Framework

Route users deterministically to experiment buckets using hash(user_id) % 100. Log experiment assignment with every recommendation served. Use interleaving (team-draft) for faster statistical significance than traditional A/B.
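One caveat when implementing this in Python: the built-in hash() is salted per process, so deterministic bucketing needs a stable hash. A sketch using MD5, salted by experiment name so bucket assignments are independent across experiments:

```python
import hashlib

def experiment_bucket(user_id: str, experiment: str,
                      n_buckets: int = 100) -> int:
    """Deterministic bucket in [0, n_buckets): stable across
    processes and restarts, independent per experiment."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

bucket = experiment_bucket("user_789", "exp_ab_2024_q1")
```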

Practice Problems

Practice 1: Cold Start

A new streaming platform launches with 10,000 titles but zero user data. Design a recommendation strategy for the first 30 days. How do you bootstrap collaborative filtering?

Practice 2: Real-Time Personalization

A user watches 3 episodes of a sci-fi series in one sitting. How quickly should recommendations update? Design the nearline pipeline to handle session-based context.

Practice 3: Diversity vs. Relevance

Your recommendation model achieves great click-through rate but users complain of "filter bubbles." Design a re-ranking strategy that balances relevance, diversity, and serendipity.

Quick Reference

Component            | Technology            | Purpose
---------------------|-----------------------|-----------------------------------
Candidate Generation | FAISS / ScaNN         | Fast ANN retrieval from embeddings
Feature Store        | Redis + Feast         | Low-latency user/item features
Ranking Model        | XGBoost / Wide & Deep | Score and rank candidates
Event Streaming      | Kafka + Flink         | Real-time interaction processing
Model Training       | Spark + PyTorch       | Offline model retraining
Experiment Platform  | Custom / Statsig      | A/B test management
Embedding Storage    | Milvus / Pinecone     | Vector database for embeddings

Key Takeaways

  • Use a two-stage funnel: cheap candidate generation → expensive ranking
  • Combine collaborative + content-based signals for robustness
  • Separate offline training from online serving with nearline feature updates
  • Use ANN indices (FAISS/ScaNN) for sub-millisecond embedding lookup
  • Always have fallback strategies (popular items, trending) for cold start and failures