Why Caching Matters
The Problem: Database queries and API calls are slow and expensive. If 1,000 users request the same product page, fetching it from the database 1,000 times is wasteful.
The Solution: Store frequently accessed data in fast, in-memory storage (cache). Serve subsequent requests from the cache instead of the slow data source.
Real Impact: Facebook uses Memcached to handle billions of requests. A cache hit takes ~1ms vs ~50ms for a database query -- a 50x improvement.
Real-World Analogy
Think of caching like a chef's prep station:
- Database = The walk-in refrigerator in the back (lots of storage, slow to access)
- Cache = The prep counter right in front of the chef (limited space, instant access)
- Cache hit = Ingredient is already on the prep counter -- grab it immediately
- Cache miss = Need to walk to the fridge, get it, bring it to the counter
- Eviction = Counter is full -- remove something to make space for new ingredients
Caching Strategies
There are several patterns for how your application interacts with the cache. The right choice depends on your read/write ratio and consistency requirements.
Cache-Aside (Lazy Loading)
Application checks cache first. On miss, fetches from DB and stores in cache. Most common pattern. Simple to implement. Cache only contains requested data.
Write-Through
Every write goes to both cache and DB synchronously. Cache is always consistent with DB. Higher write latency but no stale data. Good for read-heavy workloads.
Write-Back (Write-Behind)
Writes go to cache only, then asynchronously flushed to DB. Very fast writes but risk of data loss if cache crashes before flush. Good for write-heavy workloads.
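A minimal write-back sketch, using an in-memory dict as the cache and a callable stand-in for the real database write (both are assumptions for illustration, not a production design):

```python
import threading

class WriteBackCache:
    """Write-back sketch: writes land in memory; a flush later drains them in batches."""
    def __init__(self, flush_to_db):
        self._data = {}          # cached values
        self._dirty = set()      # keys written but not yet persisted
        self._lock = threading.Lock()
        self._flush_to_db = flush_to_db  # callable(batch_dict) -- stand-in for the DB write

    def set(self, key, value):
        # Fast path: memory only; persistence is deferred
        with self._lock:
            self._data[key] = value
            self._dirty.add(key)

    def get(self, key):
        return self._data.get(key)

    def flush(self):
        # Normally triggered by a timer or buffer size. Anything written since
        # the last flush is lost if the process crashes before this runs --
        # that is the write-back trade-off.
        with self._lock:
            batch = {k: self._data[k] for k in self._dirty}
            self._dirty.clear()
        if batch:
            self._flush_to_db(batch)
        return len(batch)
```

For example, `c = WriteBackCache(persisted.update); c.set("a", 1)` touches only memory; the `persisted` dict sees the value only after `c.flush()`.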
Write-Around
Writes go directly to DB, bypassing cache. Cache only populated on reads (cache-aside for reads). Prevents cache pollution from data that is written but rarely read.
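The same idea in a few lines, with plain dicts standing in for the database and cache (names and signatures here are illustrative, not a real API):

```python
def save_report(report_id, payload, db, cache):
    # Write-around: persist to the database only; do not populate the cache
    db[report_id] = payload       # dict stands in for the real DB
    cache.pop(report_id, None)    # drop any stale cached copy

def load_report(report_id, db, cache):
    # Reads still use cache-aside, so only data that is actually read gets cached
    if report_id in cache:
        return cache[report_id]
    value = db[report_id]
    cache[report_id] = value
    return value
```

A report written once and never read stays out of the cache entirely, which is the point of write-around.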
Cache Eviction Policies
Caches have limited memory. When the cache is full and a new item needs to be stored, an eviction policy decides which existing item to remove.
| Policy | Strategy | Best For | Complexity |
|---|---|---|---|
| LRU | Least Recently Used -- evict the item accessed longest ago | General purpose, most common | O(1) |
| LFU | Least Frequently Used -- evict the item accessed least often | Workloads with a stable hot set | O(log n) |
| FIFO | First In, First Out -- evict oldest item | Simple, time-based data | O(1) |
| TTL | Time To Live -- expire after fixed duration | Session data, tokens | O(1) |
| Random | Remove a random item | Surprisingly effective, very simple | O(1) |
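LRU, the most common policy above, can be sketched in a few lines with Python's `OrderedDict`, which keeps entries in insertion order and supports O(1) reordering:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction in O(1) per operation (oldest entry sits at the front)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None                    # cache miss
        self._items.move_to_end(key)       # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used item
```

With capacity 2, inserting a third key evicts whichever of the first two was touched least recently.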
Distributed Caching with Redis
Redis is the most popular distributed caching solution. It is an in-memory data store that supports strings, hashes, lists, sets, and sorted sets with built-in TTL support.
The example below uses the `redis-py` client; `db` is a placeholder for your database access layer.

```python
import redis
import json
import time

# Connect to Redis (assumes a local Redis server on the default port)
cache = redis.Redis(host="localhost", port=6379, db=0)

def get_user_profile(user_id):
    """Cache-aside pattern for user profiles."""
    cache_key = f"user:{user_id}"

    # Step 1: Check cache
    cached = cache.get(cache_key)
    if cached:
        print("Cache HIT")
        return json.loads(cached)

    # Step 2: Cache miss - fetch from database
    print("Cache MISS - querying database")
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # Step 3: Store in cache with 5 minute TTL
    cache.setex(
        cache_key,
        300,  # TTL in seconds (5 minutes)
        json.dumps(user)
    )
    return user

def update_user_profile(user_id, new_data):
    """Write-through: update DB and cache together."""
    # Update database first
    db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s",
        (new_data["name"], new_data["email"], user_id)
    )
    # Invalidate cache (or update it)
    cache_key = f"user:{user_id}"
    cache.delete(cache_key)  # Invalidation approach
    # OR: cache.setex(cache_key, 300, json.dumps(new_data))

def get_leaderboard(limit=10):
    """Use a Redis sorted set for a real-time leaderboard."""
    # Add scores (zadd upserts members with their scores)
    cache.zadd("leaderboard", {"alice": 1500, "bob": 2100, "carol": 1800})
    # Get top N players (descending by score)
    top_players = cache.zrevrange("leaderboard", 0, limit - 1, withscores=True)
    return [(name.decode(), score) for name, score in top_players]

# Measure cache performance
start = time.time()
profile = get_user_profile(42)  # Miss: ~10ms
print(f"First call: {(time.time()-start)*1000:.1f}ms")

start = time.time()
profile = get_user_profile(42)  # Hit: ~0.1ms
print(f"Second call: {(time.time()-start)*1000:.1f}ms")
```
Common Pitfall: Cache Stampede
Problem: A popular cache key expires. Hundreds of requests simultaneously hit the database to regenerate it.
Solutions:
- Locking: Only one request fetches from DB; others wait for cache to be populated
- Early refresh: Refresh the cache before it expires (e.g., at 80% of TTL)
- Jittered TTL: Add random variance to TTL so keys do not expire simultaneously
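The locking and jitter ideas above can be sketched as follows, with a plain dict standing in for the cache and per-key `threading.Lock`s standing in for a distributed lock (in a multi-server setup you would use something like Redis `SET NX` instead):

```python
import random
import threading

_locks = {}                      # per-key locks (single-process sketch only)
_locks_guard = threading.Lock()

def jittered_ttl(base_seconds, jitter=0.1):
    # Spread expirations +/-10% so hot keys don't all expire at the same instant
    return int(base_seconds * (1 + random.uniform(-jitter, jitter)))

def get_or_load(cache, key, load_from_db):
    """Cache-aside with a per-key lock: only one caller regenerates a missing key."""
    value = cache.get(key)
    if value is not None:
        return value
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another request may have repopulated while we waited
        value = cache.get(key)
        if value is None:
            value = load_from_db()  # only the lock holder hits the DB
            cache[key] = value
    return value
```

Concurrent callers for the same missing key queue on the lock; all but the first find the value already cached when they re-check.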
Practice Problems
Easy: Choose the Caching Strategy
For each scenario, which caching strategy is best and why?
- A product catalog page (read 99%, write 1%)
- A logging service writing millions of events/second
- A user session store
Consider the read/write ratio and consistency needs. Cache-aside works for reads, write-back for writes, write-through for consistency.
```python
# 1. Product catalog: Cache-Aside
#    - Read-heavy (99% reads)
#    - Stale data for a few minutes is OK
#    - Set TTL to 5-10 minutes
#    - Products rarely change

# 2. Logging service: Write-Back
#    - Extremely write-heavy
#    - Buffer writes in cache, flush to DB in batches
#    - Acceptable to lose a few seconds of logs
#    - Reduces DB write pressure by 100x+

# 3. Session store: Write-Through with TTL
#    - Every session update must be persisted
#    - Reads need to be fast (every request)
#    - TTL = session expiry time (e.g., 30 min)
#    - Redis is perfect for this use case
```
Medium: Cache Hit Ratio Optimization
Your cache has a 60% hit ratio. Each cache hit costs 1ms and each miss costs 50ms (cache check + DB query). With 10,000 requests/second:
- What is the average response time?
- If you increase the hit ratio to 90%, what is the new average?
- What strategies would you use to improve from 60% to 90%?
Average = (hit_ratio * hit_time) + (miss_ratio * miss_time). To improve hit ratio: increase cache size, better TTL, pre-warm cache, analyze access patterns.
```python
# 1. Average with 60% hit ratio:
avg_60 = (0.60 * 1) + (0.40 * 50)
print(f"60% hit: {avg_60}ms avg")  # 20.6ms

# 2. Average with 90% hit ratio:
avg_90 = (0.90 * 1) + (0.10 * 50)
print(f"90% hit: {avg_90}ms avg")  # 5.9ms
# That's a 3.5x improvement!

# 3. Strategies to improve hit ratio:
#    a) Increase cache size (more items fit)
#    b) Tune TTL (too short = unnecessary misses)
#    c) Pre-warm cache on startup with popular items
#    d) Use cache-aside + write-through together
#    e) Analyze access patterns, cache hot data only
#    f) Use consistent hashing for distributed cache
```
Medium: Design a Caching Layer
Design a multi-level caching layer for a news feed that serves 1M active users:
- What would you cache at each level?
- What TTL for each cache level?
- How do you handle cache invalidation when a new post is created?
Use CDN for static assets, Redis for feed data, local memory for hot users. Invalidation can use pub/sub to notify cache nodes.
```python
# Multi-level caching for news feed

# Level 1: CDN (static assets)
#    - Profile images, CSS, JS, video thumbnails
#    - TTL: 24 hours (versioned URLs)
#    - Invalidate via CDN purge API

# Level 2: Redis Cluster (feed data)
#    - Pre-computed feed: user_id -> [post_ids]
#    - Post content: post_id -> {title, body, author}
#    - TTL: 5 minutes for feeds, 1 hour for posts

# Level 3: App Server Memory (hot data)
#    - Top 100 trending posts (shared across requests)
#    - TTL: 30 seconds

# Cache invalidation on new post:
# 1. Write post to database
# 2. Publish event to Redis Pub/Sub channel
# 3. Feed workers subscribe to channel:
#    - Update cached feeds for post author's followers
#    - Use fan-out-on-write for users with <10K followers
#    - Use fan-out-on-read for celebrities (>10K followers)
```
Quick Reference
Caching Strategy Selection Guide
| Strategy | Read Perf | Write Perf | Consistency | Best For |
|---|---|---|---|---|
| Cache-Aside | High (after warm-up) | N/A (reads only) | Eventual | Read-heavy, general purpose |
| Write-Through | High | Lower (dual write) | Strong | Read-heavy with consistency |
| Write-Back | High | Very High | Eventual | Write-heavy workloads |
| Write-Around | High (for hot data) | High | Eventual | Write-once, read-later data |
Key Takeaways
- Cache-aside is the most common and safest starting point
- Always set a TTL -- unbounded caches lead to stale data and memory issues
- LRU is the default eviction policy for most use cases
- Cache hit ratio above 90% is the target; below 80% investigate your access patterns
- Guard against cache stampede, especially for popular keys