Caching Strategies


Why Caching Matters

The Problem: Database queries and API calls are slow and expensive. If 1,000 users request the same product page, fetching it from the database 1,000 times is wasteful.

The Solution: Store frequently accessed data in fast, in-memory storage (cache). Serve subsequent requests from the cache instead of the slow data source.

Real Impact: Facebook uses Memcached to handle billions of requests. A cache hit takes ~1ms vs ~50ms for a database query -- a 50x improvement.

Real-World Analogy

Think of caching like a chef's prep station:

  • Database = The walk-in refrigerator in the back (lots of storage, slow to access)
  • Cache = The prep counter right in front of the chef (limited space, instant access)
  • Cache hit = Ingredient is already on the prep counter -- grab it immediately
  • Cache miss = Need to walk to the fridge, get it, bring it to the counter
  • Eviction = Counter is full -- remove something to make space for new ingredients

Cache Hierarchy

The Cache Hierarchy (Fastest to Slowest)
  • L1 CPU Cache: ~0.5 ns | 64 KB
  • L2 CPU Cache: ~7 ns | 256 KB
  • L3 CPU Cache: ~30 ns | 8-32 MB
  • Application Cache (Redis / Memcached): ~0.1 ms | GBs of RAM
  • CDN Cache (Edge Servers): ~10-50 ms | TBs distributed globally
  • Database (Disk Storage): ~1-100 ms | TBs-PBs on disk

Caching Strategies

There are several patterns for how your application interacts with the cache. The right choice depends on your read/write ratio and consistency requirements.

Cache-Aside (Lazy Loading)

The application checks the cache first; on a miss, it fetches from the database and stores the result in the cache. This is the most common pattern: simple to implement, and the cache only holds data that has actually been requested.

Write-Through

Every write goes to both cache and DB synchronously. Cache is always consistent with DB. Higher write latency but no stale data. Good for read-heavy workloads.

Write-Back (Write-Behind)

Writes go to cache only, then asynchronously flushed to DB. Very fast writes but risk of data loss if cache crashes before flush. Good for write-heavy workloads.
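
As a rough sketch of the idea, a minimal in-process write-back buffer might look like this (the `WriteBackCache` class and `flush_to_db` callback are illustrative names, not a real library API):

```python
import threading

class WriteBackCache:
    """Buffer writes in memory; flush them to the DB in batches."""

    def __init__(self, flush_to_db, batch_size=100):
        self.data = {}              # in-memory cache of current values
        self.dirty = set()          # keys written but not yet persisted
        self.flush_to_db = flush_to_db
        self.batch_size = batch_size
        self.lock = threading.Lock()

    def write(self, key, value):
        with self.lock:
            self.data[key] = value  # fast: memory only, no DB round-trip
            self.dirty.add(key)
            if len(self.dirty) >= self.batch_size:
                self._flush()       # one batched DB write, not many small ones

    def _flush(self):
        batch = {k: self.data[k] for k in self.dirty}
        self.flush_to_db(batch)
        self.dirty.clear()

# Demo: collect flushed batches in a list instead of a real DB
flushed = []
cache = WriteBackCache(flushed.append, batch_size=3)
for i in range(3):
    cache.write(f"event:{i}", i)
print(len(flushed[0]))  # the 3 writes arrive as a single batch
```

Note the trade-off the pattern describes: keys sitting in `dirty` are lost if the process crashes before `_flush` runs.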

Write-Around

Writes go directly to DB, bypassing cache. Cache only populated on reads (cache-aside for reads). Prevents cache pollution from data that is written but rarely read.
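
Write-around needs little machinery: writes skip the cache (optionally invalidating any stale copy), and reads use cache-aside. A sketch with plain dicts standing in for the database and cache:

```python
db = {}     # stand-in for the database
cache = {}  # stand-in for Redis

def write_around(key, value):
    db[key] = value
    cache.pop(key, None)    # invalidate any stale cached copy

def read(key):
    if key in cache:        # cache-aside read path
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value  # cache is populated only on reads
    return value

write_around("post:1", "hello")  # goes to the DB, not the cache
assert "post:1" not in cache
print(read("post:1"))            # first read populates the cache
```

Data that is written once and never read back never occupies cache space, which is the pattern's whole point.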

Cache-Aside Pattern Flow
  Application → Cache (Redis, fast: ~0.1 ms) → Database (slow: ~10 ms)

  1. Check the cache
  2. Cache miss: query the database
  3. Store the result in the cache for next time

Next request for the same data: step 1 returns a cache hit (fast!)

Cache Eviction Policies

Caches have limited memory. When the cache is full and a new item needs to be stored, an eviction policy decides which existing item to remove.

Policy | Strategy                                                          | Best For                            | Complexity
LRU    | Least Recently Used -- evict the least recently accessed item     | General purpose, most common        | O(1)
LFU    | Least Frequently Used -- evict the least frequently accessed item | Workloads with a stable hot set     | O(log n)
FIFO   | First In, First Out -- evict the oldest inserted item             | Simple, time-based data             | O(1)
TTL    | Time To Live -- expire after a fixed duration                     | Session data, tokens                | O(1)
Random | Evict a random item                                               | Surprisingly effective, very simple | O(1)
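
As a sketch, LRU's O(1) behavior can be demonstrated with Python's `OrderedDict`, whose `move_to_end` and `popitem` operations run in constant time:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: O(1) get/put via an ordered hash map."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # evicts "b", the least recently used key
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

Redis implements an approximate version of this (`maxmemory-policy allkeys-lru`) by sampling keys rather than maintaining a full ordering.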

Distributed Caching with Redis

Redis is the most popular distributed caching solution. It is an in-memory data store that supports strings, hashes, lists, sets, and sorted sets with built-in TTL support.

redis_caching.py
import redis
import json
import time

# Connect to Redis
cache = redis.Redis(host="localhost", port=6379, db=0)

def get_user_profile(user_id):
    """Cache-aside pattern for user profiles."""
    cache_key = f"user:{user_id}"

    # Step 1: Check cache
    cached = cache.get(cache_key)
    if cached:
        print("Cache HIT")
        return json.loads(cached)

    # Step 2: Cache miss - fetch from database
    # (`db` is a placeholder for your database client)
    print("Cache MISS - querying database")
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # Step 3: Store in cache with 5 minute TTL
    cache.setex(
        cache_key,
        300,  # TTL in seconds (5 minutes)
        json.dumps(user)
    )

    return user

def update_user_profile(user_id, new_data):
    """Write-through: update DB and cache together."""
    # Update database first
    db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s",
        (new_data["name"], new_data["email"], user_id)
    )

    # Invalidate cache (or update it)
    cache_key = f"user:{user_id}"
    cache.delete(cache_key)  # Invalidation approach
    # OR: cache.setex(cache_key, 300, json.dumps(new_data))

def get_leaderboard(limit=10):
    """Use Redis sorted set for a real-time leaderboard."""
    # Add scores
    cache.zadd("leaderboard", {"alice": 1500, "bob": 2100, "carol": 1800})

    # Get top N players (descending)
    top_players = cache.zrevrange("leaderboard", 0, limit - 1, withscores=True)
    return [(name.decode(), score) for name, score in top_players]

# Measure cache performance
start = time.time()
profile = get_user_profile(42)  # Miss: ~10ms
print(f"First call: {(time.time()-start)*1000:.1f}ms")

start = time.time()
profile = get_user_profile(42)  # Hit: ~0.1ms
print(f"Second call: {(time.time()-start)*1000:.1f}ms")

Common Pitfall: Cache Stampede

Problem: A popular cache key expires. Hundreds of requests simultaneously hit the database to regenerate it.

Solutions:

  • Locking: Only one request fetches from DB; others wait for cache to be populated
  • Early refresh: Refresh the cache before it expires (e.g., at 80% of TTL)
  • Jittered TTL: Add random variance to TTL so keys do not expire simultaneously

Practice Problems

Easy Choose the Caching Strategy

For each scenario, which caching strategy is best and why?

  1. A product catalog page (read 99%, write 1%)
  2. A logging service writing millions of events/second
  3. A user session store

Consider the read/write ratio and consistency needs. Cache-aside works for reads, write-back for writes, write-through for consistency.

# 1. Product catalog: Cache-Aside
#    - Read-heavy (99% reads)
#    - Stale data for a few minutes is OK
#    - Set TTL to 5-10 minutes
#    - Products rarely change

# 2. Logging service: Write-Back
#    - Extremely write-heavy
#    - Buffer writes in cache, flush to DB in batches
#    - Acceptable to lose a few seconds of logs
#    - Reduces DB write pressure by 100x+

# 3. Session store: Write-Through with TTL
#    - Every session update must be persisted
#    - Reads need to be fast (every request)
#    - TTL = session expiry time (e.g., 30 min)
#    - Redis is perfect for this use case

Medium Cache Hit Ratio Optimization

Your cache has a 60% hit ratio. Each cache hit costs 1ms and each miss costs 50ms (cache check + DB query). With 10,000 requests/second:

  1. What is the average response time?
  2. If you increase the hit ratio to 90%, what is the new average?
  3. What strategies would you use to improve from 60% to 90%?

Average = (hit_ratio * hit_time) + (miss_ratio * miss_time). To improve hit ratio: increase cache size, better TTL, pre-warm cache, analyze access patterns.

# 1. Average with 60% hit ratio:
avg_60 = (0.60 * 1) + (0.40 * 50)
print(f"60% hit: {avg_60}ms avg")  # 20.6ms

# 2. Average with 90% hit ratio:
avg_90 = (0.90 * 1) + (0.10 * 50)
print(f"90% hit: {avg_90}ms avg")  # 5.9ms

# That's a 3.5x improvement!

# 3. Strategies to improve hit ratio:
# a) Increase cache size (more items fit)
# b) Tune TTL (too short = unnecessary misses)
# c) Pre-warm cache on startup with popular items
# d) Use cache-aside + write-through together
# e) Analyze access patterns, cache hot data only
# f) Use consistent hashing for distributed cache

Medium Design a Caching Layer

Design a multi-level caching layer for a news feed that serves 1M active users:

  1. What would you cache at each level?
  2. What TTL for each cache level?
  3. How do you handle cache invalidation when a new post is created?

Use CDN for static assets, Redis for feed data, local memory for hot users. Invalidation can use pub/sub to notify cache nodes.

# Multi-level caching for news feed

# Level 1: CDN (static assets)
# - Profile images, CSS, JS, video thumbnails
# - TTL: 24 hours (versioned URLs)
# - Invalidate via CDN purge API

# Level 2: Redis Cluster (feed data)
# - Pre-computed feed: user_id -> [post_ids]
# - Post content: post_id -> {title, body, author}
# - TTL: 5 minutes for feeds, 1 hour for posts

# Level 3: App Server Memory (hot data)
# - Top 100 trending posts (shared across requests)
# - TTL: 30 seconds

# Cache invalidation on new post:
# 1. Write post to database
# 2. Publish event to Redis Pub/Sub channel
# 3. Feed workers subscribe to channel:
#    - Update cached feeds for post author's followers
#    - Use fan-out-on-write for users with <10K followers
#    - Use fan-out-on-read for celebrities (>10K followers)

Quick Reference

Caching Strategy Selection Guide

Strategy      | Read Perf            | Write Perf         | Consistency | Best For
Cache-Aside   | High (after warm-up) | N/A (reads only)   | Eventual    | Read-heavy, general purpose
Write-Through | High                 | Lower (dual write) | Strong      | Read-heavy with consistency
Write-Back    | High                 | Very High          | Eventual    | Write-heavy workloads
Write-Around  | High (for hot data)  | High               | Eventual    | Write-once, read-later data

Key Takeaways

  • Cache-aside is the most common and safest starting point
  • Always set a TTL -- unbounded caches lead to stale data and memory issues
  • LRU is the default eviction policy for most use cases
  • Cache hit ratio above 90% is the target; below 80% investigate your access patterns
  • Guard against cache stampede, especially for popular keys