Design a Distributed Cache


Problem Statement & Requirements

Why Distributed Caching Is Fundamental

The Problem: Databases are slow. A typical database read takes 1-10ms. For systems handling millions of requests per second, this creates unacceptable latency and overwhelms database resources.

The Solution: A distributed cache stores frequently accessed data in memory across multiple nodes, delivering sub-millisecond reads at massive scale.

Real Impact: Twitter caches 30+ TB of timeline data in Redis. Facebook's Memcached cluster handles billions of requests per second across thousands of servers.

Real-World Analogy

Think of a distributed cache like a chain of libraries across a city:

  • Single cache = One library downtown -- fast if you live nearby, slow otherwise
  • Distributed cache = Branch libraries in every neighborhood
  • Consistent hashing = The system that decides which branch stores which books
  • Eviction policy = Rules for which books to remove when shelves are full
  • Replication = Popular books copied to multiple branches

Requirements

Sub-Millisecond Reads

GET operations must complete in under 1ms for p99 latency. This is 10-100x faster than database reads.

Horizontal Scalability

Add nodes to increase capacity linearly. Support 100K+ operations per second per node.

High Availability

Cache should remain available during node failures. No single point of failure.

Configurable Eviction

Support multiple eviction policies (LRU, LFU, TTL) to optimize for different workloads.

Cache Architecture

Multi-Tier Cache Architecture
  • L1: Local cache -- in-process memory on the app server, ~100 microseconds, ~100 MB per node
  • L2: Distributed cache -- Redis/Memcached nodes, ~1 ms, ~100 GB total
  • L3: Database -- MySQL/Postgres, ~5-50 ms, terabytes
  • Read path: check L1, then L2; on a miss, fall through to the database
  • Target hit rate: 95-99%
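The tiered read path above can be sketched as a read-through lookup. This is a minimal sketch: `l2_get`, `l2_set`, and `db_get` are hypothetical injected callables standing in for a Redis client and a database query, and L1 uses crude FIFO eviction for brevity.

```python
class TieredCache:
    """Read-through lookup across L1 (local) -> L2 (distributed) -> DB.

    l2_get/l2_set and db_get are injected callables standing in for a
    Redis client and a database query (hypothetical names). Negative
    caching (missing keys) is ignored in this sketch.
    """

    def __init__(self, l2_get, l2_set, db_get, l1_capacity=1000):
        self.l1 = {}
        self.l1_capacity = l1_capacity
        self.l2_get, self.l2_set, self.db_get = l2_get, l2_set, db_get

    def get(self, key):
        if key in self.l1:                    # L1 hit: ~100 us
            return self.l1[key]
        value = self.l2_get(key)              # L2 hit: ~1 ms
        if value is None:
            value = self.db_get(key)          # miss: ~5-50 ms
            self.l2_set(key, value)           # populate L2
        if len(self.l1) >= self.l1_capacity:
            self.l1.pop(next(iter(self.l1)))  # crude FIFO eviction
        self.l1[key] = value                  # populate L1
        return value
```

A real L1 would use the LRU structure discussed later in this section rather than FIFO.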

Consistent Hashing for Cache

Why Not Simple Modular Hashing?

With simple hashing (hash(key) % N), adding or removing a node remaps almost all keys, causing a cache stampede. Consistent hashing ensures that only K/N keys are remapped (K = total keys, N = nodes), making scaling smooth.

Consistent Hashing Ring with Virtual Nodes
  • Physical nodes (A, B, C, D) are placed on a hash ring, each represented by 150-200 virtual nodes
  • A cache key maps to the next node clockwise on the ring (e.g. key1 -> A, key2 -> B, key3 -> D)
  • Adding a node only remaps ~1/N of all keys, so scaling stays smooth
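The ring can be implemented with a sorted list and binary search. This is a minimal sketch (MD5 for hashing, no replication awareness); production rings also track which nodes hold replicas.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing ring with virtual nodes (minimal sketch)."""

    def __init__(self, nodes, vnodes=150):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # Each physical node gets `vnodes` points, smoothing the key spread
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wraps around)
        idx = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]
```

Removing a node only moves the keys that were mapped to it; every other key keeps its assignment, which is exactly the property that prevents a stampede when scaling.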

Eviction Policies

LRU Cache Eviction Visualization
  • Cache (capacity = 4), most to least recent: A (MRU), B, C, D (LRU)
  • Insert E when full: evict D (the LRU entry), insert E at the MRU position
  • Implementation: HashMap + doubly linked list -- GET moves the entry to the head, PUT inserts at the head, EVICT removes the tail; all operations are O(1)
lru_cache.py
from collections import OrderedDict

class LRUCache:
    """
    LRU Cache using OrderedDict.

    - GET: O(1) - move accessed key to end (most recent)
    - PUT: O(1) - insert at end, evict from front if full
    - Memory: O(capacity)
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str):
        """Retrieve value and mark as recently used."""
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # Mark as MRU
            return self.cache[key]
        self.misses += 1
        return None

    def put(self, key: str, value) -> str | None:
        """Insert or update. Returns evicted key if any."""
        evicted = None
        if key in self.cache:
            self.cache.move_to_end(key)
        else:
            if len(self.cache) >= self.capacity:
                # Evict the least recently used (first item)
                evicted = self.cache.popitem(last=False)[0]
        self.cache[key] = value
        return evicted

    @property
    def hit_rate(self) -> float:
        """Cache hit ratio."""
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0.0


# Example usage
cache = LRUCache(capacity=3)
cache.put("user:1", {"name": "Alice"})
cache.put("user:2", {"name": "Bob"})
cache.put("user:3", {"name": "Carol"})
cache.get("user:1")    # Moves user:1 to MRU
cache.put("user:4", {"name": "Dave"})  # Evicts user:2 (LRU)
print(f"Hit rate: {cache.hit_rate:.1%}")

Eviction Policy Comparison

  • LRU -- evicts the least recently used entry. Best for general-purpose workloads with temporal locality; weakness: full-table scans pollute the cache.
  • LFU -- evicts the least frequently used entry. Best for frequency-skewed workloads; weakness: slow to adapt to changing access patterns.
  • FIFO -- evicts the oldest entry. Simple and predictable; weakness: ignores access patterns entirely.
  • TTL -- evicts expired entries first. Best for time-sensitive data such as sessions; weakness: doesn't handle memory pressure.
  • Random -- evicts a random entry. Works for uniform access patterns; weakness: may evict hot data.
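To contrast with the LRU implementation above, here is a minimal LFU sketch. Eviction scans all counts (O(n)); production LFU designs use an O(1) frequency-list structure, but the policy is the same.

```python
class LFUCache:
    """Minimal LFU cache sketch: evicts the least frequently used key.

    Ties are broken by insertion order (Python dicts preserve it). A
    production LFU would evict in O(1) via frequency-bucket lists.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}
        self.freq = {}

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1  # every access raises the frequency
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # O(n) scan for the least frequently used victim
            victim = min(self.freq, key=self.freq.get)
            del self.data[victim], self.freq[victim]
        self.data[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1
```

Note the weakness from the table: a key that was hot last week keeps a high count and resists eviction; real systems decay counts over time (e.g. Redis's approximate LFU with counter decay).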

Cache Replication

Replication Strategies

  • Primary-Replica: Writes go to primary, replicated async to replicas. Replicas serve reads. Used by Redis Cluster.
  • Multi-Primary: Any node can accept writes. Conflict resolution needed. Higher availability but more complex.
  • Leaderless: Read/write quorums (W + R > N). Used by Dynamo-style systems. Tunable consistency.
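The leaderless quorum condition can be made concrete with a toy model. This is a sketch under simplifying assumptions: replicas are plain dicts, versions are a global counter, and conflicts resolve by last-write-wins; real systems use vector clocks or timestamps and tolerate node failures.

```python
class QuorumStore:
    """Leaderless replication sketch: W + R > N guarantees that every
    read quorum overlaps every write quorum in at least one replica."""

    def __init__(self, n=3, w=2, r=2):
        assert w + r > n, "quorum overlap requires W + R > N"
        self.replicas = [{} for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # Wait for acks from W replicas (here: the first W)
        for replica in self.replicas[: self.w]:
            replica[key] = (self.version, value)

    def read(self, key):
        # Query R replicas (here: the last R); highest version wins
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(answers)[1] if answers else None
```

With N=3, W=2, R=2, the write set {0, 1} and read set {1, 2} always share replica 1, so a read sees the latest acknowledged write even though no replica set is identical.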

Cache Invalidation: The Two Hard Problems

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

  • Write-Through: Write to cache and DB simultaneously. Consistent but slower writes.
  • Write-Behind: Write to cache, async write to DB. Fast but risk of data loss.
  • Cache-Aside: Application manages cache reads/writes. Most flexible and common pattern.
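The cache-aside pattern can be sketched in a few lines. Plain dicts stand in for the cache and database here; the point is who does what: the application reads through to the DB on a miss, and invalidates (rather than updates) the cached entry on write.

```python
class CacheAside:
    """Cache-aside pattern sketch with plain dicts for cache and DB.

    The application owns both sides: reads populate the cache lazily,
    writes go to the DB and *delete* the cached entry so the next read
    repopulates it from the source of truth.
    """

    def __init__(self):
        self.cache = {}
        self.db = {}

    def read(self, key):
        if key in self.cache:        # cache hit
            return self.cache[key]
        value = self.db.get(key)     # miss: fall through to the DB
        if value is not None:
            self.cache[key] = value  # populate for subsequent reads
        return value

    def write(self, key, value):
        self.db[key] = value         # DB is the source of truth
        self.cache.pop(key, None)    # invalidate, don't update
```

Deleting instead of updating on write sidesteps the race where two writers update the cache out of order, at the cost of one extra DB read after each write.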

Cache Warming & Thundering Herd

The Thundering Herd Problem

Scenario: A popular cache key expires. 1,000 concurrent requests all get a cache miss simultaneously and all hit the database at once, potentially crashing it.

Solutions:

  • Locking: Only the first request fetches from DB. Others wait for the cache to be repopulated.
  • Probabilistic Early Expiration: Randomly refresh the cache before the actual TTL expires.
  • Stale-While-Revalidate: Serve stale data immediately while refreshing in the background.
thundering_herd_protection.py
import redis
import time
import random

class CacheWithProtection:
    """Cache client with thundering herd protection."""

    def __init__(self):
        self.redis = redis.Redis()
        self.LOCK_TTL = 5  # Lock expires after 5 seconds

    def get_with_lock(self, key: str, fetch_fn, ttl: int = 300):
        """
        Cache-aside with distributed lock to prevent thundering herd.

        1. Try cache
        2. On miss: acquire lock
        3. If lock acquired: fetch from DB, populate cache
        4. If lock not acquired: wait and retry cache
        """
        # Step 1: Try cache
        value = self.redis.get(key)
        if value is not None:
            return value

        # Step 2: Try to acquire lock
        lock_key = f"lock:{key}"
        acquired = self.redis.set(
            lock_key, "1",
            nx=True,  # Only set if not exists
            ex=self.LOCK_TTL
        )

        if acquired:
            # Step 3: We hold the lock - fetch and cache
            try:
                value = fetch_fn()
                self.redis.setex(key, ttl, value)
                return value
            finally:
                self.redis.delete(lock_key)
        else:
            # Step 4: Another request is fetching - wait and retry
            time.sleep(0.05)  # 50ms backoff
            return self.get_with_lock(key, fetch_fn, ttl)

    def get_with_early_refresh(self, key: str, fetch_fn,
                               ttl: int = 300, beta: float = 1.0,
                               delta: float = 1.0):
        """
        Probabilistic early expiration (the XFetch algorithm).

        Each reader independently refreshes early with a probability
        that rises as expiry approaches, so one request renews the key
        instead of many expiring on it at once. `delta` estimates the
        recompute time in seconds; beta > 1 refreshes more eagerly.
        """
        value = self.redis.get(key)
        remaining_ttl = self.redis.ttl(key)

        if value is not None and remaining_ttl > 0:
            # XFetch: refresh when delta * beta * -ln(rand) >= time left.
            # random.expovariate(1.0) samples -ln(rand) directly.
            if delta * beta * random.expovariate(1.0) >= remaining_ttl:
                refresh_async(key, fetch_fn, ttl)  # background refresh helper (assumed)
            return value

        # Cache miss: fetch synchronously
        value = fetch_fn()
        self.redis.setex(key, ttl, value)
        return value

Memcached vs Redis Deep Dive

  • Data types: Memcached -- strings only; Redis -- strings, lists, sets, sorted sets, hashes, streams
  • Threading: Memcached -- multi-threaded; Redis -- single-threaded command execution (I/O threads since 6.0)
  • Persistence: Memcached -- none (pure cache); Redis -- RDB snapshots + AOF log
  • Clustering: Memcached -- client-side sharding; Redis -- built-in Redis Cluster
  • Replication: Memcached -- none built-in; Redis -- primary-replica with automatic failover
  • Max value size: Memcached -- 1 MB (default); Redis -- 512 MB
  • Pub/Sub: Memcached -- no; Redis -- yes
  • Lua scripting: Memcached -- no; Redis -- yes (atomic server-side operations)
  • Best for: Memcached -- simple key-value caching at massive scale; Redis -- feature-rich caching, sessions, queues, leaderboards
redis_cluster_config.py
from redis.cluster import RedisCluster, ClusterNode

# Redis Cluster configuration
# 6 nodes: 3 primaries + 3 replicas
startup_nodes = [
    ClusterNode("redis-node-1", 6379),
    ClusterNode("redis-node-2", 6379),
    ClusterNode("redis-node-3", 6379),
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    read_from_replicas=True,  # allow reads to be served by replicas
    # Redis Cluster uses 16,384 hash slots
    # distributed across primary nodes
)

# Operations work transparently across the cluster
rc.set("user:1001", "Alice")  # Routes to the shard owning the key's slot
rc.get("user:1001")           # Reads from the primary or a replica

# Pipeline for batch operations (keys must share a hash slot)
pipe = rc.pipeline()
pipe.set("{user:1001}:name", "Alice")             # Hash tag {user:1001}
pipe.set("{user:1001}:email", "[email protected]")  # ensures the same slot
pipe.execute()

Practice Problems

Practice Problem (Medium): Cache Warming Strategy

Design a cache warming strategy for a newly deployed cache cluster. How do you populate the cache without overwhelming the database?

  1. When should cache warming happen?
  2. How do you identify the "hot" keys to pre-load?
  3. How do you throttle warming to protect the database?

Analyze access logs to identify hot keys. Use a gradual warm-up with rate limiting. Consider a "shadow" cache that mirrors production traffic before cutover.

# Cache Warming Strategy
# 1. Analyze access logs (last 24h) for the top 10K keys
# 2. Sort by access frequency
# 3. Warm in batches of 100, with a 100ms delay between batches
# 4. Monitor DB load during warming

import asyncio

async def warm_cache(hot_keys, batch_size=100, delay=0.1):
    # Assumes async `db` and `redis` clients are available in scope
    for i in range(0, len(hot_keys), batch_size):
        batch = hot_keys[i:i + batch_size]
        values = await db.multi_get(batch)
        pipe = redis.pipeline()
        for key, val in zip(batch, values):
            pipe.setex(key, 3600, val)
        await pipe.execute()
        await asyncio.sleep(delay)  # Throttle to protect the DB

# Alternative: Shadow traffic replay
# Record production reads, replay against new cache

Practice Problem (Hard): Cache Consistency

In a system where the database is the source of truth, how do you keep the cache consistent after database updates? Design a strategy that handles concurrent reads and writes.

  1. Should you update the cache or invalidate it on write?
  2. What happens with concurrent read-modify-write operations?
  3. How do you handle network partitions between cache and DB?

Cache invalidation (delete) is safer than cache update because it avoids race conditions. Consider the "delete-then-write" vs "write-then-delete" ordering carefully. CDC (Change Data Capture) from the database can provide reliable invalidation.

# Recommended: Cache-Aside with Delete on Write

def update_user(user_id, new_data):
    # 1. Update database (source of truth)
    db.update("users", user_id, new_data)
    # 2. Invalidate cache (don't update!)
    cache.delete(f"user:{user_id}")
    # 3. Next read will populate cache from DB

# Why delete instead of update?
# Race condition with update:
# T1: Read DB (old value)
# T2: Write DB (new value)
# T2: Update cache (new value)
# T1: Update cache (old value!) -- STALE!

# With delete, worst case is an extra DB read

# For strongest consistency: CDC pipeline
# DB binlog -> Kafka -> Cache invalidation consumer

Practice Problem (Hard): Multi-Region Cache

Design a cache architecture for a globally distributed application serving users from US, EU, and Asia regions. How do you handle cache coherence across regions?

  1. Should each region have its own independent cache?
  2. How do you handle writes that need to invalidate caches in other regions?
  3. What consistency guarantees can you provide?

Independent regional caches with cross-region invalidation via a message bus. Accept eventual consistency (200-500ms cross-region propagation). Consider which data needs strong consistency vs eventual.

# Multi-Region Cache Architecture

# Each region has independent Redis cluster
# Cross-region invalidation via Kafka

# Write path:
# 1. App writes to regional DB primary
# 2. Invalidate local cache immediately
# 3. Publish invalidation event to Kafka
# 4. Other regions consume and invalidate

# Consistency: eventual (~200ms cross-region)
# For strong consistency: route writes to one region
# For read-your-writes: sticky sessions or version tokens

# Example: User updates profile in US-East
# US-East cache: invalidated immediately
# EU-West cache: invalidated in ~150ms
# AP-Southeast cache: invalidated in ~200ms

Quick Reference

Distributed Cache Design Summary

  • Cache engine: Redis Cluster -- rich data types, persistence, replication
  • Distribution: Consistent hashing -- smooth scaling, minimal remapping
  • Eviction: LRU by default, LFU for skewed workloads -- general-purpose, adapts to access patterns
  • Replication: Primary-replica (1:1) -- HA with automatic failover
  • Invalidation: Cache-aside + CDC -- reliable invalidation, minimal stale-data window
  • Herd protection: Distributed locks + probabilistic early refresh -- prevents DB stampedes

Key Takeaways

Interview Tips

  • Start with "why cache?" -- quantify the latency and throughput benefits
  • Consistent hashing is essential for distributed caches -- explain virtual nodes
  • Always discuss eviction policies and their tradeoffs for the specific workload
  • Cache invalidation is the hardest part -- prefer delete over update
  • Address the thundering herd problem with locking or probabilistic refresh
  • Know when to choose Memcached (simple, multi-threaded) vs Redis (feature-rich)