Design a Distributed Cache


Problem Statement & Requirements

Why Distributed Caching Is Fundamental

The Problem: Databases are slow. A typical database read takes 1-10ms. For systems handling millions of requests per second, this creates unacceptable latency and overwhelms database resources.

The Solution: A distributed cache stores frequently accessed data in memory across multiple nodes, delivering sub-millisecond reads at massive scale.

Real Impact: Twitter caches 30+ TB of timeline data in Redis. Facebook's Memcached cluster handles billions of requests per second across thousands of servers.

Real-World Analogy

Think of a distributed cache like a chain of libraries across a city:

  • Single cache = One library downtown -- fast if you live nearby, slow otherwise
  • Distributed cache = Branch libraries in every neighborhood
  • Consistent hashing = The system that decides which branch stores which books
  • Eviction policy = Rules for which books to remove when shelves are full
  • Replication = Popular books copied to multiple branches

Requirements

Sub-Millisecond Reads

GET operations must complete in under 1ms for p99 latency. This is 10-100x faster than database reads.

Horizontal Scalability

Add nodes to increase capacity linearly. Support 100K+ operations per second per node.

High Availability

Cache should remain available during node failures. No single point of failure.

Configurable Eviction

Support multiple eviction policies (LRU, LFU, TTL) to optimize for different workloads.

Cache Architecture

Multi-Tier Cache Architecture
  • L1: Local cache -- in-process memory on the app server, ~100 microseconds, ~100 MB per node
  • L2: Distributed cache -- Redis/Memcached nodes, ~1 ms, ~100 GB total
  • L3: Database -- MySQL/Postgres, ~5-50 ms, terabytes
  • Read path: check L1, then L2; on a miss, fall through to the database
  • Target hit rate: 95-99%
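The tiered read path above can be sketched as a read-through lookup. This is a minimal sketch: `l2_get`, `l2_set`, and `db_get` are hypothetical injected callables standing in for a Redis client and a database query, and L1 uses crude FIFO eviction for brevity.

```python
class TieredCache:
    """Read-through lookup across L1 (local) -> L2 (distributed) -> DB.

    l2_get/l2_set and db_get are injected callables standing in for a
    Redis client and a database query (hypothetical names). Negative
    caching (missing keys) is ignored in this sketch.
    """

    def __init__(self, l2_get, l2_set, db_get, l1_capacity=1000):
        self.l1 = {}
        self.l1_capacity = l1_capacity
        self.l2_get, self.l2_set, self.db_get = l2_get, l2_set, db_get

    def get(self, key):
        if key in self.l1:                    # L1 hit: ~100 us
            return self.l1[key]
        value = self.l2_get(key)              # L2 hit: ~1 ms
        if value is None:
            value = self.db_get(key)          # miss: ~5-50 ms
            self.l2_set(key, value)           # populate L2
        if len(self.l1) >= self.l1_capacity:
            self.l1.pop(next(iter(self.l1)))  # crude FIFO eviction
        self.l1[key] = value                  # populate L1
        return value
```

A real L1 would use the LRU structure discussed later in this section rather than FIFO.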

Consistent Hashing for Cache

Why Not Simple Modular Hashing?

With simple hashing (hash(key) % N), adding or removing a node remaps almost all keys, causing a cache stampede. Consistent hashing ensures that only K/N keys are remapped (K = total keys, N = nodes), making scaling smooth.

Consistent Hashing Ring with Virtual Nodes
  • Physical nodes (A, B, C, D) are placed on a hash ring, each represented by 150-200 virtual nodes
  • A cache key maps to the next node clockwise on the ring (e.g. key1 -> A, key2 -> B, key3 -> D)
  • Adding a node only remaps ~1/N of all keys, so scaling stays smooth
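The ring can be implemented with a sorted list and binary search. This is a minimal sketch (MD5 for hashing, no replication awareness); production rings also track which nodes hold replicas.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing ring with virtual nodes (minimal sketch)."""

    def __init__(self, nodes, vnodes=150):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # Each physical node gets `vnodes` points, smoothing the key spread
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wraps around)
        idx = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]
```

Removing a node only moves the keys that were mapped to it; every other key keeps its assignment, which is exactly the property that prevents a stampede when scaling.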

Eviction Policies

LRU Cache Eviction Visualization
  • Cache (capacity = 4), most to least recent: A (MRU), B, C, D (LRU)
  • Insert E when full: evict D (the LRU entry), insert E at the MRU position
  • Implementation: HashMap + doubly linked list -- GET moves the entry to the head, PUT inserts at the head, EVICT removes the tail; all operations are O(1)
lru_cache.py
from collections import OrderedDict

class LRUCache:
    """
    LRU Cache using OrderedDict.

    - GET: O(1) - move accessed key to end (most recent)
    - PUT: O(1) - insert at end, evict from front if full
    - Memory: O(capacity)
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str):
        """Retrieve value and mark as recently used."""
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # Mark as MRU
            return self.cache[key]
        self.misses += 1
        return None

    def put(self, key: str, value) -> str | None:
        """Insert or update. Returns evicted key if any."""
        evicted = None
        if key in self.cache:
            self.cache.move_to_end(key)
        else:
            if len(self.cache) >= self.capacity:
                # Evict the least recently used (first item)
                evicted = self.cache.popitem(last=False)[0]
        self.cache[key] = value
        return evicted

    @property
    def hit_rate(self) -> float:
        """Cache hit ratio."""
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0.0


# Example usage
cache = LRUCache(capacity=3)
cache.put("user:1", {"name": "Alice"})
cache.put("user:2", {"name": "Bob"})
cache.put("user:3", {"name": "Carol"})
cache.get("user:1")    # Moves user:1 to MRU
cache.put("user:4", {"name": "Dave"})  # Evicts user:2 (LRU)
print(f"Hit rate: {cache.hit_rate:.1%}")

Eviction Policy Comparison

  • LRU -- evicts the least recently used entry. Best for general-purpose workloads with temporal locality; weakness: full-table scans pollute the cache.
  • LFU -- evicts the least frequently used entry. Best for frequency-skewed workloads; weakness: slow to adapt to changing access patterns.
  • FIFO -- evicts the oldest entry. Simple and predictable; weakness: ignores access patterns entirely.
  • TTL -- evicts expired entries first. Best for time-sensitive data such as sessions; weakness: doesn't handle memory pressure.
  • Random -- evicts a random entry. Works for uniform access patterns; weakness: may evict hot data.
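To contrast with the LRU implementation above, here is a minimal LFU sketch. Eviction scans all counts (O(n)); production LFU designs use an O(1) frequency-list structure, but the policy is the same.

```python
class LFUCache:
    """Minimal LFU cache sketch: evicts the least frequently used key.

    Ties are broken by insertion order (Python dicts preserve it). A
    production LFU would evict in O(1) via frequency-bucket lists.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}
        self.freq = {}

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1  # every access raises the frequency
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # O(n) scan for the least frequently used victim
            victim = min(self.freq, key=self.freq.get)
            del self.data[victim], self.freq[victim]
        self.data[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1
```

Note the weakness from the table: a key that was hot last week keeps a high count and resists eviction; real systems decay counts over time (e.g. Redis's approximate LFU with counter decay).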

Cache Replication

Replication Strategies

  • Primary-Replica: Writes go to primary, replicated async to replicas. Replicas serve reads. Used by Redis Cluster.
  • Multi-Primary: Any node can accept writes. Conflict resolution needed. Higher availability but more complex.
  • Leaderless: Read/write quorums (W + R > N). Used by Dynamo-style systems. Tunable consistency.
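The leaderless quorum condition can be made concrete with a toy model. This is a sketch under simplifying assumptions: replicas are plain dicts, versions are a global counter, and conflicts resolve by last-write-wins; real systems use vector clocks or timestamps and tolerate node failures.

```python
class QuorumStore:
    """Leaderless replication sketch: W + R > N guarantees that every
    read quorum overlaps every write quorum in at least one replica."""

    def __init__(self, n=3, w=2, r=2):
        assert w + r > n, "quorum overlap requires W + R > N"
        self.replicas = [{} for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # Wait for acks from W replicas (here: the first W)
        for replica in self.replicas[: self.w]:
            replica[key] = (self.version, value)

    def read(self, key):
        # Query R replicas (here: the last R); highest version wins
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(answers)[1] if answers else None
```

With N=3, W=2, R=2, the write set {0, 1} and read set {1, 2} always share replica 1, so a read sees the latest acknowledged write even though no replica set is identical.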

Cache Invalidation: The Two Hard Problems

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

  • Write-Through: Write to cache and DB simultaneously. Consistent but slower writes.
  • Write-Behind: Write to cache, async write to DB. Fast but risk of data loss.
  • Cache-Aside: Application manages cache reads/writes. Most flexible and common pattern.
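The cache-aside pattern can be sketched in a few lines. Plain dicts stand in for the cache and database here; the point is who does what: the application reads through to the DB on a miss, and invalidates (rather than updates) the cached entry on write.

```python
class CacheAside:
    """Cache-aside pattern sketch with plain dicts for cache and DB.

    The application owns both sides: reads populate the cache lazily,
    writes go to the DB and *delete* the cached entry so the next read
    repopulates it from the source of truth.
    """

    def __init__(self):
        self.cache = {}
        self.db = {}

    def read(self, key):
        if key in self.cache:        # cache hit
            return self.cache[key]
        value = self.db.get(key)     # miss: fall through to the DB
        if value is not None:
            self.cache[key] = value  # populate for subsequent reads
        return value

    def write(self, key, value):
        self.db[key] = value         # DB is the source of truth
        self.cache.pop(key, None)    # invalidate, don't update
```

Deleting instead of updating on write sidesteps the race where two writers update the cache out of order, at the cost of one extra DB read after each write.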

Cache Warming & Thundering Herd

The Thundering Herd Problem

Scenario: A popular cache key expires. 1,000 concurrent requests all get a cache miss simultaneously and all hit the database at once, potentially crashing it.

Solutions:

  • Locking: Only the first request fetches from DB. Others wait for the cache to be repopulated.
  • Probabilistic Early Expiration: Randomly refresh the cache before the actual TTL expires.
  • Stale-While-Revalidate: Serve stale data immediately while refreshing in the background.
thundering_herd_protection.py
import redis
import time
import random

class CacheWithProtection:
    """Cache client with thundering herd protection."""

    def __init__(self):
        self.redis = redis.Redis()
        self.LOCK_TTL = 5  # Lock expires after 5 seconds

    def get_with_lock(self, key: str, fetch_fn, ttl: int = 300):
        """
        Cache-aside with distributed lock to prevent thundering herd.

        1. Try cache
        2. On miss: acquire lock
        3. If lock acquired: fetch from DB, populate cache
        4. If lock not acquired: wait and retry cache
        """
        # Step 1: Try cache
        value = self.redis.get(key)
        if value is not None:
            return value

        # Step 2: Try to acquire lock
        lock_key = f"lock:{key}"
        acquired = self.redis.set(
            lock_key, "1",
            nx=True,  # Only set if not exists
            ex=self.LOCK_TTL
        )

        if acquired:
            # Step 3: We hold the lock - fetch and cache
            try:
                value = fetch_fn()
                self.redis.setex(key, ttl, value)
                return value
            finally:
                self.redis.delete(lock_key)
        else:
            # Step 4: Another request is fetching - wait and retry
            time.sleep(0.05)  # 50ms backoff
            return self.get_with_lock(key, fetch_fn, ttl)

    def get_with_early_refresh(self, key: str, fetch_fn,
                               ttl: int = 300, beta: float = 1.0,
                               delta: float = 1.0):
        """
        Probabilistic early expiration (the XFetch algorithm).

        Each reader independently refreshes early with a probability
        that rises as expiry approaches, so one request renews the key
        instead of many expiring on it at once. `delta` estimates the
        recompute time in seconds; beta > 1 refreshes more eagerly.
        """
        value = self.redis.get(key)
        remaining_ttl = self.redis.ttl(key)

        if value is not None and remaining_ttl > 0:
            # XFetch: refresh when delta * beta * -ln(rand) >= time left.
            # random.expovariate(1.0) samples -ln(rand) directly.
            if delta * beta * random.expovariate(1.0) >= remaining_ttl:
                refresh_async(key, fetch_fn, ttl)  # background refresh helper (assumed)
            return value

        # Cache miss: fetch synchronously
        value = fetch_fn()
        self.redis.setex(key, ttl, value)
        return value

Memcached vs Redis Deep Dive

  • Data types: Memcached -- strings only; Redis -- strings, lists, sets, sorted sets, hashes, streams
  • Threading: Memcached -- multi-threaded; Redis -- single-threaded command execution (I/O threads since 6.0)
  • Persistence: Memcached -- none (pure cache); Redis -- RDB snapshots + AOF log
  • Clustering: Memcached -- client-side sharding; Redis -- built-in Redis Cluster
  • Replication: Memcached -- none built-in; Redis -- primary-replica with automatic failover
  • Max value size: Memcached -- 1 MB (default); Redis -- 512 MB
  • Pub/Sub: Memcached -- no; Redis -- yes
  • Lua scripting: Memcached -- no; Redis -- yes (atomic server-side operations)
  • Best for: Memcached -- simple key-value caching at massive scale; Redis -- feature-rich caching, sessions, queues, leaderboards
redis_cluster_config.py
from redis.cluster import RedisCluster, ClusterNode

# Redis Cluster configuration
# 6 nodes: 3 primaries + 3 replicas
startup_nodes = [
    ClusterNode("redis-node-1", 6379),
    ClusterNode("redis-node-2", 6379),
    ClusterNode("redis-node-3", 6379),
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    read_from_replicas=True,  # allow reads to be served by replicas
    # Redis Cluster uses 16,384 hash slots
    # distributed across primary nodes
)

# Operations work transparently across the cluster
rc.set("user:1001", "Alice")  # Routes to the shard owning the key's slot
rc.get("user:1001")           # Reads from the primary or a replica

# Pipeline for batch operations (keys must share a hash slot)
pipe = rc.pipeline()
pipe.set("{user:1001}:name", "Alice")             # Hash tag {user:1001}
pipe.set("{user:1001}:email", "[email protected]")  # ensures the same slot
pipe.execute()

Practice Problems

Practice Problem (Medium): Cache Warming Strategy

Design a cache warming strategy for a newly deployed cache cluster. How do you populate the cache without overwhelming the database?

  1. When should cache warming happen?
  2. How do you identify the "hot" keys to pre-load?
  3. How do you throttle warming to protect the database?

Analyze access logs to identify hot keys. Use a gradual warm-up with rate limiting. Consider a "shadow" cache that mirrors production traffic before cutover.

# Cache Warming Strategy
# 1. Analyze access logs (last 24h) for the top 10K keys
# 2. Sort by access frequency
# 3. Warm in batches of 100, with a 100ms delay between batches
# 4. Monitor DB load during warming

import asyncio

async def warm_cache(hot_keys, batch_size=100, delay=0.1):
    # Assumes async `db` and `redis` clients are available in scope
    for i in range(0, len(hot_keys), batch_size):
        batch = hot_keys[i:i + batch_size]
        values = await db.multi_get(batch)
        pipe = redis.pipeline()
        for key, val in zip(batch, values):
            pipe.setex(key, 3600, val)
        await pipe.execute()
        await asyncio.sleep(delay)  # Throttle to protect the DB

# Alternative: Shadow traffic replay
# Record production reads, replay against new cache

Practice Problem (Hard): Cache Consistency

In a system where the database is the source of truth, how do you keep the cache consistent after database updates? Design a strategy that handles concurrent reads and writes.

  1. Should you update the cache or invalidate it on write?
  2. What happens with concurrent read-modify-write operations?
  3. How do you handle network partitions between cache and DB?

Cache invalidation (delete) is safer than cache update because it avoids race conditions. Consider the "delete-then-write" vs "write-then-delete" ordering carefully. CDC (Change Data Capture) from the database can provide reliable invalidation.

# Recommended: Cache-Aside with Delete on Write

def update_user(user_id, new_data):
    # 1. Update database (source of truth)
    db.update("users", user_id, new_data)
    # 2. Invalidate cache (don't update!)
    cache.delete(f"user:{user_id}")
    # 3. Next read will populate cache from DB

# Why delete instead of update?
# Race condition with update:
# T1: Read DB (old value)
# T2: Write DB (new value)
# T2: Update cache (new value)
# T1: Update cache (old value!) -- STALE!

# With delete, worst case is an extra DB read

# For strongest consistency: CDC pipeline
# DB binlog -> Kafka -> Cache invalidation consumer

Practice Problem (Hard): Multi-Region Cache

Design a cache architecture for a globally distributed application serving users from US, EU, and Asia regions. How do you handle cache coherence across regions?

  1. Should each region have its own independent cache?
  2. How do you handle writes that need to invalidate caches in other regions?
  3. What consistency guarantees can you provide?

Independent regional caches with cross-region invalidation via a message bus. Accept eventual consistency (200-500ms cross-region propagation). Consider which data needs strong consistency vs eventual.

# Multi-Region Cache Architecture

# Each region has independent Redis cluster
# Cross-region invalidation via Kafka

# Write path:
# 1. App writes to regional DB primary
# 2. Invalidate local cache immediately
# 3. Publish invalidation event to Kafka
# 4. Other regions consume and invalidate

# Consistency: eventual (~200ms cross-region)
# For strong consistency: route writes to one region
# For read-your-writes: sticky sessions or version tokens

# Example: User updates profile in US-East
# US-East cache: invalidated immediately
# EU-West cache: invalidated in ~150ms
# AP-Southeast cache: invalidated in ~200ms

Quick Reference

Distributed Cache Design Summary

  • Cache engine: Redis Cluster -- rich data types, persistence, replication
  • Distribution: Consistent hashing -- smooth scaling, minimal remapping
  • Eviction: LRU by default, LFU for skewed workloads -- general-purpose, adapts to access patterns
  • Replication: Primary-replica (1:1) -- HA with automatic failover
  • Invalidation: Cache-aside + CDC -- reliable invalidation, minimal stale-data window
  • Herd protection: Distributed locks + probabilistic early refresh -- prevents DB stampedes

Key Takeaways

Interview Tips

  • Start with "why cache?" -- quantify the latency and throughput benefits
  • Consistent hashing is essential for distributed caches -- explain virtual nodes
  • Always discuss eviction policies and their tradeoffs for the specific workload
  • Cache invalidation is the hardest part -- prefer delete over update
  • Address the thundering herd problem with locking or probabilistic refresh
  • Know when to choose Memcached (simple, multi-threaded) vs Redis (feature-rich)