Problem Statement & Requirements
Why Distributed Caching Is Fundamental
The Problem: Databases are slow. A typical database read takes 1-10ms. For systems handling millions of requests per second, this creates unacceptable latency and overwhelms database resources.
The Solution: A distributed cache stores frequently accessed data in memory across multiple nodes, delivering sub-millisecond reads at massive scale.
Real Impact: Twitter caches 30+ TB of timeline data in Redis. Facebook's Memcached cluster handles billions of requests per second across thousands of servers.
Real-World Analogy
Think of a distributed cache like a chain of libraries across a city:
- Single cache = One library downtown -- fast if you live nearby, slow otherwise
- Distributed cache = Branch libraries in every neighborhood
- Consistent hashing = The system that decides which branch stores which books
- Eviction policy = Rules for which books to remove when shelves are full
- Replication = Popular books copied to multiple branches
Requirements
Sub-Millisecond Reads
GET operations must complete in under 1ms for p99 latency. This is 10-100x faster than database reads.
Horizontal Scalability
Add nodes to increase capacity linearly. Support 100K+ operations per second per node.
High Availability
Cache should remain available during node failures. No single point of failure.
Configurable Eviction
Support multiple eviction policies (LRU, LFU, TTL) to optimize for different workloads.
Cache Architecture
Consistent Hashing for Cache
Why Not Simple Modular Hashing?
With simple hashing (hash(key) % N), adding or removing a node remaps almost all keys, causing a cache stampede. Consistent hashing ensures that only K/N keys are remapped (K = total keys, N = nodes), making scaling smooth.
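The remapping argument above can be sketched directly. Below is a minimal hash ring with virtual nodes; the node names, vnode count, and MD5 choice are illustrative assumptions, not a production recommendation:

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent hash ring with virtual nodes."""

    def __init__(self, nodes=None, vnodes=100):
        self.vnodes = vnodes      # virtual nodes per physical node
        self.ring = {}            # hash position -> node name
        self.sorted_hashes = []
        for node in (nodes or []):
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            self.ring[h] = node
            bisect.insort(self.sorted_hashes, h)

    def remove_node(self, node: str):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            del self.ring[h]
            self.sorted_hashes.remove(h)

    def get_node(self, key: str) -> str:
        """Walk clockwise to the first virtual node at or after the key."""
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_hashes, h) % len(self.sorted_hashes)
        return self.ring[self.sorted_hashes[idx]]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {f"key:{i}": ring.get_node(f"key:{i}") for i in range(1000)}
ring.add_node("cache-d")
after = {k: ring.get_node(k) for k in before}
moved = sum(1 for k in before if before[k] != after[k])
print(f"Keys remapped after adding a node: {moved}/1000")
```

With four nodes, only roughly K/N = 250 of the 1,000 keys move to the new node; under simple modular hashing, nearly all of them would.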
Eviction Policies
```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """
    LRU Cache using OrderedDict.
    - GET: O(1) - move accessed key to end (most recent)
    - PUT: O(1) - insert at end, evict from front if full
    - Memory: O(capacity)
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str):
        """Retrieve value and mark as recently used."""
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # Mark as MRU
            return self.cache[key]
        self.misses += 1
        return None

    def put(self, key: str, value) -> Optional[str]:
        """Insert or update. Returns the evicted key, if any."""
        evicted = None
        if key in self.cache:
            self.cache.move_to_end(key)
        elif len(self.cache) >= self.capacity:
            # Evict the least recently used (first item)
            evicted = self.cache.popitem(last=False)[0]
        self.cache[key] = value
        return evicted

    @property
    def hit_rate(self) -> float:
        """Cache hit ratio."""
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0.0


# Example usage
cache = LRUCache(capacity=3)
cache.put("user:1", {"name": "Alice"})
cache.put("user:2", {"name": "Bob"})
cache.put("user:3", {"name": "Carol"})
cache.get("user:1")                    # Moves user:1 to MRU
cache.put("user:4", {"name": "Dave"})  # Evicts user:2 (LRU)
print(f"Hit rate: {cache.hit_rate:.1%}")
```
Eviction Policy Comparison
| Policy | Evicts | Best For | Weakness |
|---|---|---|---|
| LRU | Least Recently Used | General-purpose, temporal locality | Full-table scans pollute cache |
| LFU | Least Frequently Used | Frequency-based workloads | Slow to adapt to changing access patterns |
| FIFO | First In, First Out | Simple, predictable | Ignores access patterns entirely |
| TTL | Expired entries first | Time-sensitive data (sessions) | Doesn't handle memory pressure |
| Random | Random entry | Uniform access patterns | May evict hot data |
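To contrast with the LRU implementation above, here is a minimal LFU sketch. The O(n) eviction scan is for clarity only; production LFU designs use frequency-bucket lists for O(1) eviction:

```python
from collections import defaultdict


class LFUCache:
    """Minimal LFU cache: evicts the least frequently used key.
    O(n) eviction scan kept simple for illustration."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values = {}
        self.freq = defaultdict(int)

    def get(self, key):
        if key not in self.values:
            return None
        self.freq[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the lowest-frequency key
            victim = min(self.values, key=lambda k: self.freq[k])
            del self.values[victim]
            del self.freq[victim]
        self.values[key] = value
        self.freq[key] += 1


lfu = LFUCache(capacity=2)
lfu.put("a", 1)
lfu.put("b", 2)
lfu.get("a")          # "a" now has a higher access count
lfu.put("c", 3)       # evicts "b" (least frequently used)
print(lfu.get("b"))   # None
print(lfu.get("a"))   # 1
```

Note the weakness called out in the table: once "a" accumulates a high count, it survives eviction even if it stops being accessed, which is why LFU adapts slowly to shifting access patterns.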
Cache Replication
Replication Strategies
- Primary-Replica: Writes go to primary, replicated async to replicas. Replicas serve reads. Used by Redis Cluster.
- Multi-Primary: Any node can accept writes. Conflict resolution needed. Higher availability but more complex.
- Leaderless: Read/write quorums (W + R > N). Used by Dynamo-style systems. Tunable consistency.
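The quorum condition in the leaderless strategy can be checked by brute force: every write set of size W must share at least one replica with every read set of size R, which holds exactly when W + R > N. A small illustrative check:

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every write quorum of size w intersects every read
    quorum of size r among n replicas (guaranteed iff w + r > n)."""
    nodes = range(n)
    return all(
        set(wq) & set(rq)
        for wq in combinations(nodes, w)
        for rq in combinations(nodes, r)
    )


# N=3: W=2, R=2 overlaps (2+2 > 3); W=1, R=1 does not
print(quorums_overlap(3, 2, 2))  # True
print(quorums_overlap(3, 1, 1))  # False
```

This is the "tunable" part: raising W and R buys stronger read-your-writes behavior at the cost of latency and availability.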
Cache Invalidation: One of the Two Hard Problems
"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton
- Write-Through: Write to cache and DB simultaneously. Consistent but slower writes.
- Write-Behind: Write to cache, async write to DB. Fast but risk of data loss.
- Cache-Aside: Application manages cache reads/writes. Most flexible and common pattern.
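The cache-aside read path can be sketched in a few lines; a plain dict stands in for the cache and `db_read` is a hypothetical database call:

```python
# Minimal cache-aside sketch: a dict stands in for Redis,
# and db_read is a hypothetical database call.
cache_store = {}
db_calls = 0


def db_read(key):
    global db_calls
    db_calls += 1
    return f"value-for-{key}"


def get_cache_aside(key):
    """Read path: check cache first, fall back to DB, then populate."""
    if key in cache_store:
        return cache_store[key]   # cache hit
    value = db_read(key)          # cache miss: go to the DB
    cache_store[key] = value      # populate for the next reader
    return value


get_cache_aside("user:7")   # miss -> one DB call
get_cache_aside("user:7")   # hit  -> no DB call
print(db_calls)             # 1
```

The write path for cache-aside (update the DB, then invalidate the cached key) is covered in the consistency practice problem later in this section.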
Cache Warming & Thundering Herd
The Thundering Herd Problem
Scenario: A popular cache key expires. 1,000 concurrent requests all get a cache miss simultaneously and all hit the database at once, potentially crashing it.
Solutions:
- Locking: Only the first request fetches from DB. Others wait for the cache to be repopulated.
- Probabilistic Early Expiration: Randomly refresh the cache before the actual TTL expires.
- Stale-While-Revalidate: Serve stale data immediately while refreshing in the background.
```python
import math
import random
import threading
import time

import redis


class CacheWithProtection:
    """Cache client with thundering herd protection."""

    def __init__(self):
        self.redis = redis.Redis()
        self.LOCK_TTL = 5  # Lock expires after 5 seconds

    def get_with_lock(self, key: str, fetch_fn, ttl: int = 300):
        """
        Cache-aside with a distributed lock to prevent thundering herd.
        1. Try cache
        2. On miss: try to acquire the lock
        3. If lock acquired: fetch from DB, populate cache
        4. If lock not acquired: back off, then retry the cache
        """
        while True:
            # Step 1: Try cache
            value = self.redis.get(key)
            if value is not None:
                return value

            # Step 2: Try to acquire the lock
            lock_key = f"lock:{key}"
            acquired = self.redis.set(
                lock_key, "1",
                nx=True,          # Only set if not exists
                ex=self.LOCK_TTL,
            )
            if acquired:
                # Step 3: We hold the lock - fetch and cache
                try:
                    value = fetch_fn()
                    self.redis.setex(key, ttl, value)
                    return value
                finally:
                    self.redis.delete(lock_key)

            # Step 4: Another request is fetching - back off, then retry
            time.sleep(0.05)  # 50ms backoff

    def get_with_early_refresh(self, key: str, fetch_fn,
                               ttl: int = 300, beta: float = 1.0,
                               delta: float = 1.0):
        """
        Probabilistic early expiration (the XFetch algorithm).
        Randomly refresh before the TTL expires so concurrent clients
        don't all see the key expire at the same instant.
        `delta` estimates the recompute cost in seconds.
        """
        value = self.redis.get(key)
        remaining_ttl = self.redis.ttl(key)
        if value is not None and remaining_ttl > 0:
            # XFetch: refresh when -delta * beta * ln(rand()) exceeds the
            # time left before expiry; the closer to expiry, the more
            # likely a refresh.
            if -delta * beta * math.log(random.random()) >= remaining_ttl:
                # Refresh in the background
                threading.Thread(
                    target=lambda: self.redis.setex(key, ttl, fetch_fn()),
                    daemon=True,
                ).start()
            return value

        # Cache miss: fetch synchronously
        value = fetch_fn()
        self.redis.setex(key, ttl, value)
        return value
```
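The third protection listed above, stale-while-revalidate, can be sketched as follows. Plain dicts stand in for a real cache client, and the soft-TTL bookkeeping is an illustrative assumption:

```python
import threading
import time

# Stale-while-revalidate sketch: each entry carries a "soft" expiry.
# After it passes, we keep serving the stale value while at most one
# background thread refreshes it.
_store = {}          # key -> (value, soft_expiry_timestamp)
_refreshing = set()  # keys with a refresh in flight
_lock = threading.Lock()


def get_swr(key, fetch_fn, soft_ttl=60):
    now = time.time()
    entry = _store.get(key)
    if entry is None:
        # Cold miss: fetch synchronously
        value = fetch_fn()
        _store[key] = (value, now + soft_ttl)
        return value

    value, soft_expiry = entry
    if now >= soft_expiry:
        with _lock:
            needs_refresh = key not in _refreshing
            if needs_refresh:
                _refreshing.add(key)
        if needs_refresh:
            def _refresh():
                try:
                    _store[key] = (fetch_fn(), time.time() + soft_ttl)
                finally:
                    _refreshing.discard(key)
            threading.Thread(target=_refresh, daemon=True).start()
    return value  # Possibly stale, but returned immediately
```

The caller never waits on a refresh after the first fetch, which trades a brief window of staleness for flat tail latency.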
Memcached vs Redis Deep Dive
| Feature | Memcached | Redis |
|---|---|---|
| Data types | String only | Strings, Lists, Sets, Sorted Sets, Hashes, Streams |
| Threading | Multi-threaded | Single-threaded (with I/O threads in 6.0+) |
| Persistence | None (pure cache) | RDB snapshots + AOF log |
| Clustering | Client-side sharding | Built-in Redis Cluster |
| Replication | None built-in | Primary-Replica (automatic failover via Sentinel or Cluster) |
| Max value size | 1 MB | 512 MB |
| Pub/Sub | No | Yes |
| Lua scripting | No | Yes (atomic operations) |
| Best for | Simple key-value caching at massive scale | Feature-rich caching, sessions, queues, leaderboards |
```python
from redis.cluster import ClusterNode, RedisCluster

# Redis Cluster configuration
# 6 nodes: 3 primaries + 3 replicas
startup_nodes = [
    ClusterNode("redis-node-1", 6379),
    ClusterNode("redis-node-2", 6379),
    ClusterNode("redis-node-3", 6379),
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    # Redis Cluster uses 16,384 hash slots
    # distributed across primary nodes
)

# Operations work transparently across the cluster
rc.set("user:1001", "Alice")  # Routes to the correct shard
rc.get("user:1001")           # Reads from the primary (or replica, if enabled)

# Pipeline for batch operations (same hash slot)
pipe = rc.pipeline()
pipe.set("{user:1001}:name", "Alice")              # Hash tag {user:1001}
pipe.set("{user:1001}:email", "[email protected]")  # ensures same slot
pipe.execute()
```
Practice Problems
Cache Warming Strategy (Medium)
Design a cache warming strategy for a newly deployed cache cluster. How do you populate the cache without overwhelming the database?
- When should cache warming happen?
- How do you identify the "hot" keys to pre-load?
- How do you throttle warming to protect the database?
Analyze access logs to identify hot keys. Use a gradual warm-up with rate limiting. Consider a "shadow" cache that mirrors production traffic before cutover.
```python
import asyncio

# Cache Warming Strategy
# 1. Analyze access logs (last 24h) for the top 10K keys
# 2. Sort by access frequency
# 3. Warm in batches of 100, with a 100ms delay between batches
# 4. Monitor DB load during warming

# Assumes `db` and `redis` are async clients provided elsewhere.
async def warm_cache(hot_keys, batch_size=100, delay=0.1):
    for i in range(0, len(hot_keys), batch_size):
        batch = hot_keys[i:i + batch_size]
        values = await db.multi_get(batch)
        pipe = redis.pipeline()
        for key, val in zip(batch, values):
            pipe.setex(key, 3600, val)
        await pipe.execute()
        await asyncio.sleep(delay)  # Throttle

# Alternative: Shadow traffic replay
# Record production reads, replay against the new cache
```
Cache Consistency (Hard)
In a system where the database is the source of truth, how do you keep the cache consistent after database updates? Design a strategy that handles concurrent reads and writes.
- Should you update the cache or invalidate it on write?
- What happens with concurrent read-modify-write operations?
- How do you handle network partitions between cache and DB?
Cache invalidation (delete) is safer than cache update because it avoids race conditions. Consider the "delete-then-write" vs "write-then-delete" ordering carefully. CDC (Change Data Capture) from the database can provide reliable invalidation.
```python
# Recommended: Cache-Aside with Delete on Write
def update_user(user_id, new_data):
    # 1. Update the database (source of truth)
    db.update("users", user_id, new_data)
    # 2. Invalidate the cache (don't update!)
    cache.delete(f"user:{user_id}")
    # 3. The next read repopulates the cache from the DB

# Why delete instead of update?
# Race condition with update:
#   T1: Read DB (old value)
#   T2: Write DB (new value)
#   T2: Update cache (new value)
#   T1: Update cache (old value!) -- STALE!
# With delete, the worst case is one extra DB read

# For the strongest consistency: CDC pipeline
# DB binlog -> Kafka -> cache invalidation consumer
```
Multi-Region Cache (Hard)
Design a cache architecture for a globally distributed application serving users from US, EU, and Asia regions. How do you handle cache coherence across regions?
- Should each region have its own independent cache?
- How do you handle writes that need to invalidate caches in other regions?
- What consistency guarantees can you provide?
Independent regional caches with cross-region invalidation via a message bus. Accept eventual consistency (200-500ms cross-region propagation). Consider which data needs strong consistency vs eventual.
```python
# Multi-Region Cache Architecture
# Each region has an independent Redis cluster
# Cross-region invalidation via Kafka
#
# Write path:
#   1. App writes to the regional DB primary
#   2. Invalidate the local cache immediately
#   3. Publish an invalidation event to Kafka
#   4. Other regions consume and invalidate
#
# Consistency: eventual (~200ms cross-region)
# For strong consistency: route writes to one region
# For read-your-writes: sticky sessions or version tokens
#
# Example: User updates profile in US-East
#   US-East cache:       invalidated immediately
#   EU-West cache:       invalidated in ~150ms
#   AP-Southeast cache:  invalidated in ~200ms
```
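The consume-and-invalidate step of that write path can be sketched with an in-memory queue standing in for Kafka; the event shape and region names are illustrative assumptions:

```python
import queue

# In-memory stand-ins: a Queue for the Kafka invalidation topic,
# and a dict for one region's cache.
invalidation_bus = queue.Queue()
regional_cache = {"user:1001": "Alice"}


def publish_invalidation(key, origin_region):
    invalidation_bus.put({"key": key, "origin": origin_region})


def run_invalidation_consumer(local_region, stop_after=1):
    """Consume invalidation events and delete the key from the local
    regional cache. Events from our own region are skipped: the writer
    already invalidated locally before publishing."""
    for _ in range(stop_after):
        event = invalidation_bus.get()
        if event["origin"] != local_region:
            regional_cache.pop(event["key"], None)


# US-East writes and publishes; EU-West consumes and invalidates
publish_invalidation("user:1001", origin_region="us-east-1")
run_invalidation_consumer(local_region="eu-west-1")
print("user:1001" in regional_cache)  # False
```

Skipping self-originated events avoids a redundant second delete; the eventual-consistency window is the publish-to-consume latency of the bus.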
Quick Reference
Distributed Cache Design Summary
| Component | Choice | Rationale |
|---|---|---|
| Cache Engine | Redis Cluster | Rich data types, persistence, replication |
| Distribution | Consistent Hashing | Smooth scaling, minimal remapping |
| Eviction | LRU (default), LFU for skewed | General-purpose, adapts to access patterns |
| Replication | Primary-Replica (1:1) | HA with automatic failover |
| Invalidation | Cache-Aside + CDC | Reliable invalidation, no stale data |
| Herd Protection | Distributed locks + early refresh | Prevents DB stampedes |
Key Takeaways
Interview Tips
- Start with "why cache?" -- quantify the latency and throughput benefits
- Consistent hashing is essential for distributed caches -- explain virtual nodes
- Always discuss eviction policies and their tradeoffs for the specific workload
- Cache invalidation is the hardest part -- prefer delete over update
- Address the thundering herd problem with locking or probabilistic refresh
- Know when to choose Memcached (simple, multi-threaded) vs Redis (feature-rich)