# Why Consistent Hashing Matters
**The Problem:** When you distribute data across N servers using `hash(key) % N`, adding or removing a server remaps almost every key. If you go from 10 to 11 servers, ~91% of keys move to a different server -- invalidating caches and triggering massive data migration.

**The Solution:** Consistent hashing maps both keys and servers onto a circular hash space. Adding or removing a server only remaps ~1/N of the keys (~9% when growing from 10 to 11 servers), minimizing disruption.

**Real Impact:** Amazon DynamoDB, Apache Cassandra, Akamai CDN, and Discord all use consistent hashing to distribute data across nodes with minimal reshuffling during scaling events.
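The remap figure is easy to verify empirically. A quick sketch (MD5 stands in for whatever hash the system uses; the user keys are made up):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Naive modular placement: stable hash of the key, mod server count."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(bucket(k, 10) != bucket(k, 11) for k in keys)
print(f"{moved / len(keys):.1%} of keys remapped")  # ~90.9%
```

A key survives the 10 -> 11 transition only when its hash has the same residue mod 10 and mod 11, which happens for just 10 of every 110 residues -- hence the ~91%.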
## Real-World Analogy
Imagine students in a circular seating arrangement:
- Hash ring = A round table with seats numbered 0 to 360
- Servers = Students sitting at specific seats (say seats 90, 180, 270)
- Data keys = Papers that land on random seat numbers
- Assignment rule = Each paper goes to the next student clockwise
- Adding a student = Only papers between the new student and the previous one move
## The Problem with Naive Hashing
| Scenario | Naive hash(key) % N | Consistent Hashing |
|---|---|---|
| 10 servers, add 1 | ~91% keys remapped | ~9% keys remapped |
| 10 servers, remove 1 | ~90% keys remapped | ~10% keys remapped |
| 100 servers, add 1 | ~99% keys remapped | ~1% keys remapped |
| Cache invalidation | Near-total cache miss storm | Minimal cache disruption |
## The Cache Stampede Problem
**Problem:** With naive hashing, adding a server remaps most keys. Every remapped key is a cache miss. If thousands of keys miss simultaneously, they all hit the database, potentially crashing it.

**Solution:** Consistent hashing limits remapping to ~1/N of keys. Combine with request coalescing (only one request fetches the data; others wait) to further reduce database load during node changes.
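Request coalescing can be sketched with a per-key in-flight marker: on a miss, one caller becomes the leader and fetches; concurrent callers wait for its result. A minimal thread-based version (the class name and fetch callback are illustrative, not a real library API):

```python
import threading
import time

class CoalescingCache:
    """Sketch of request coalescing: on a miss, only the leader fetches
    from the backing store; concurrent callers wait for its result."""

    def __init__(self, fetch_fn):
        self._fetch = fetch_fn    # loads a value from the database
        self._cache = {}
        self._inflight = {}       # key -> Event, signalled when fetch completes
        self._lock = threading.Lock()

    def get(self, key):
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._inflight.get(key)
                if event is None:
                    # No fetch in flight: this caller becomes the leader
                    event = self._inflight[key] = threading.Event()
                    is_leader = True
                else:
                    is_leader = False
            if is_leader:
                value = self._fetch(key)       # the single database hit
                with self._lock:
                    self._cache[key] = value
                    del self._inflight[key]
                event.set()
                return value
            event.wait()   # follower: loop back and read the cached value

# Demo: 10 concurrent reads of a missing key trigger exactly one fetch
fetch_count = [0]
def slow_db_fetch(key):
    time.sleep(0.05)
    fetch_count[0] += 1
    return f"value-for-{key}"

cache = CoalescingCache(slow_db_fetch)
threads = [threading.Thread(target=cache.get, args=("hot-key",)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(fetch_count[0])  # 1
```

Production implementations add timeouts, error propagation, and TTL eviction; Go's `golang.org/x/sync/singleflight` is a well-known library built around this same pattern.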
## How Consistent Hashing Works

### Algorithm Steps
- Step 1: Hash each server name to a position on a circular ring (0 to 2^32 - 1)
- Step 2: Hash each data key to a position on the same ring
- Step 3: Walk clockwise from the key's position until you find a server -- that server owns the key
- Step 4: When adding a server, only keys between the new server and its predecessor are remapped
- Step 5: When removing a server, its keys are reassigned to the next server clockwise
### Virtual Nodes
With only a few physical servers on the ring, data distribution can be uneven. Virtual nodes solve this by placing each physical server at multiple positions on the ring. A server with 150 virtual nodes gets 150 positions, resulting in much more even data distribution.
- **Better balance:** With 3 physical nodes and no virtual nodes, one server might own 50% of the ring. With 150 virtual nodes each, distribution approaches a uniform 33% per server.
- **Heterogeneous hardware:** A powerful server can carry more virtual nodes than a weaker one. A server with 32GB RAM gets 200 vnodes; one with 16GB gets 100 vnodes -- proportional to capacity.
- **Smoother rebalancing:** When a node leaves, its virtual nodes are scattered around the ring, so its load spreads evenly across many surviving nodes instead of doubling the load on one neighbor.
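The balance effect of virtual nodes can be checked with a short simulation, using arc length on an MD5-based ring as a proxy for load (server names are made up):

```python
import hashlib

RING_SIZE = 2**128  # MD5 output space

def ring_hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def ownership(nodes, vnodes):
    """Fraction of the ring each physical node owns.

    A point at position p owns the arc from the previous point up to p,
    since keys in that arc find p as their first clockwise successor."""
    points = sorted((ring_hash(f"{n}-vnode-{i}"), n)
                    for n in nodes for i in range(vnodes))
    share = dict.fromkeys(nodes, 0)
    prev = points[-1][0] - RING_SIZE   # wrap-around arc before the first point
    for pos, node in points:
        share[node] += pos - prev
        prev = pos
    return {n: s / RING_SIZE for n, s in share.items()}

for v in (1, 150):
    shares = ownership(["server-1", "server-2", "server-3"], v)
    spread = max(shares.values()) - min(shares.values())
    print(f"{v:3d} vnode(s): max-min ownership gap = {spread:.1%}")
```

With a single point per server the three arcs are typically lopsided; with 150 points each, every server's share converges toward the uniform 33%.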
A compact Python implementation:

```python
import hashlib
from bisect import bisect_right

class ConsistentHash:
    """Consistent hashing with virtual nodes."""

    def __init__(self, num_vnodes=150):
        self.num_vnodes = num_vnodes
        self.ring = {}          # hash_value -> node_name
        self.sorted_keys = []   # sorted hash values on the ring

    def _hash(self, key):
        """Generate a consistent hash for a key."""
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16)

    def add_node(self, node):
        """Add a node with virtual nodes to the ring."""
        for i in range(self.num_vnodes):
            vnode_key = f"{node}-vnode-{i}"
            hash_val = self._hash(vnode_key)
            self.ring[hash_val] = node
            self.sorted_keys.append(hash_val)
        self.sorted_keys.sort()

    def remove_node(self, node):
        """Remove a node and all its virtual nodes."""
        for i in range(self.num_vnodes):
            vnode_key = f"{node}-vnode-{i}"
            hash_val = self._hash(vnode_key)
            del self.ring[hash_val]
            self.sorted_keys.remove(hash_val)

    def get_node(self, key):
        """Find which node owns a given key."""
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # Find the first node clockwise on the ring
        idx = bisect_right(self.sorted_keys, hash_val)
        if idx == len(self.sorted_keys):
            idx = 0  # Wrap around to start of ring
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(num_vnodes=150)
ch.add_node("server-1")
ch.add_node("server-2")
ch.add_node("server-3")

print(ch.get_node("user:1001"))  # e.g., "server-2"
print(ch.get_node("user:1002"))  # e.g., "server-1"

# Add a new server -- only ~25% of keys remap (1/(N+1) with N = 3)
ch.add_node("server-4")
print(ch.get_node("user:1001"))  # most keys keep their owner
```
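With N = 3 existing servers, adding one should remap about 1/(N+1) = 25% of keys. This can be measured directly with a compact standalone ring (same MD5-and-vnodes scheme, re-declared here so the snippet runs on its own):

```python
import hashlib
from bisect import bisect_right

def ring_hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=150):
    """Sorted ring positions plus a parallel list of owning node names."""
    points = sorted((ring_hash(f"{n}-vnode-{i}"), n)
                    for n in nodes for i in range(vnodes))
    return [p for p, _ in points], [n for _, n in points]

def owner(ring, key):
    positions, names = ring
    idx = bisect_right(positions, ring_hash(key)) % len(positions)
    return names[idx]

keys = [f"user:{i}" for i in range(20_000)]
before = build_ring(["server-1", "server-2", "server-3"])
after = build_ring(["server-1", "server-2", "server-3", "server-4"])
moved = sum(owner(before, k) != owner(after, k) for k in keys)
print(f"{moved / len(keys):.1%} of keys remapped")  # close to the predicted 25%
```

Only keys whose clockwise successor is now a server-4 vnode change hands; ownership among the original three servers is untouched.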
## Data Partitioning Strategies
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Hash Partitioning | hash(key) determines partition | Even distribution, good for point lookups | Range queries require scatter-gather |
| Range Partitioning | Key ranges assigned to partitions (A-M, N-Z) | Efficient range queries, sorted data | Hot spots if keys are not uniformly distributed |
| Consistent Hashing | Hash ring with minimal remapping | Smooth scaling, minimal data movement | Slightly more complex implementation |
| Geographic | Data partitioned by region/location | Low latency for local users, data sovereignty | Cross-region queries are expensive |
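The trade-off in the first two rows can be made concrete. A sketch with arbitrary split points (the boundaries "h" and "q" are made up):

```python
import hashlib
from bisect import bisect_right

SPLITS = ["h", "q"]   # 3 partitions: [.., "h"), ["h", "q"), ["q", ..)

def range_partition(key: str) -> int:
    """Range partitioning: binary-search the key against split points."""
    return bisect_right(SPLITS, key)

def hash_partition(key: str, n: int = 3) -> int:
    """Hash partitioning: a stable hash scatters keys evenly."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

# A prefix scan like "all keys starting with 'a'" stays on one partition
# under range partitioning, but must scatter-gather under hash partitioning.
names = ["alice", "amir", "anna"]
print({k: range_partition(k) for k in names})  # all land in partition 0
print({k: hash_partition(k) for k in names})   # bucket depends on the hash, not the prefix
```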
## Consistent Hashing in Practice

### Amazon DynamoDB
Uses consistent hashing to partition data across storage nodes. Virtual nodes ensure even distribution. When a node fails, its token ranges are redistributed to surviving nodes automatically.

### Apache Cassandra
Each node owns a set of token ranges on the ring. The Murmur3 partitioner hashes partition keys to tokens. Adding a node requires streaming only the relevant token ranges.

### Memcached / Redis Cluster
Client libraries use consistent hashing to determine which cache server holds a key. Adding a server only invalidates ~1/N of cached items instead of all of them.

### Akamai CDN
Consistent hashing determines which edge server caches a particular URL. This ensures the same URL is always routed to the same server, maximizing cache hit rates.
## Practice Problems

### Cache Cluster Scaling (Medium)
You run a 10-node Memcached cluster using consistent hashing. During Black Friday, you need to add 5 more nodes:
- Calculate the percentage of keys that will be remapped
- Design a warm-up strategy to minimize cache miss impact
- Choose the optimal number of virtual nodes
With consistent hashing, adding K nodes to N existing nodes remaps approximately K/(N+K) of the keys. For the warm-up, pre-populate new nodes by copying remapped keys from their old owners before switching traffic.
```python
# 1. Keys remapped: K/(N+K) = 5/(10+5) = ~33%
#    About a third of keys will map to new nodes (cache misses)

# 2. Warm-up strategy (get_node here returns a client object with get/set):
def warm_up_new_nodes(old_ring, new_ring, keys_sample):
    for key in keys_sample:
        old_node = old_ring.get_node(key)
        new_node = new_ring.get_node(key)
        if old_node != new_node:
            # Copy data from old node to new node before cutover
            value = old_node.get(key)
            if value:
                new_node.set(key, value)

# 3. Virtual nodes: 100-200 per physical node is typical
#    With 150 vnodes and 15 physical nodes = 2250 points on the ring
#    Standard deviation of per-node load: ~5% (good enough)
```
### Replication on the Hash Ring (Hard)
Design a replication strategy for a distributed key-value store using consistent hashing:
- Each key should be replicated on 3 distinct physical nodes
- Handle the case where virtual nodes of the same physical server are adjacent on the ring
- Implement a read repair mechanism
Walk clockwise from the key's position and pick the next 3 distinct physical nodes (skip virtual nodes of already-chosen physical servers). For read repair, compare versions from all replicas and update stale ones.
```python
from bisect import bisect_right

def get_replica_nodes(ring, key, num_replicas=3):
    """Get N distinct physical nodes for replication."""
    hash_val = ring._hash(key)
    idx = bisect_right(ring.sorted_keys, hash_val)
    replicas = []
    seen_physical = set()
    for _ in range(len(ring.sorted_keys)):
        pos = idx % len(ring.sorted_keys)
        node = ring.ring[ring.sorted_keys[pos]]
        if node not in seen_physical:
            replicas.append(node)
            seen_physical.add(node)
            if len(replicas) == num_replicas:
                break
        idx += 1
    return replicas

# Read repair: compare versions from replicas
# (assumes node objects whose get() returns a response with a .version field)
def read_with_repair(key, replicas):
    responses = [node.get(key) for node in replicas]
    latest = max(responses, key=lambda r: r.version)
    for node, resp in zip(replicas, responses):
        if resp.version < latest.version:
            node.put(key, latest)  # Repair stale replica
    return latest.value
```
### Hot Key Mitigation (Hard)
A celebrity post goes viral and one key receives 100x normal traffic, overloading its assigned node:
- Explain why consistent hashing alone does not solve hot keys
- Design a solution that distributes hot key reads across multiple nodes
- Ensure writes still go to a single authoritative node
Append a random suffix to the key for reads (key:shard_0, key:shard_1, etc.) to spread across multiple nodes. Writes go to the primary; a background process fans out to read replicas.
```python
import random

NUM_READ_SHARDS = 10

def write_hot_key(ring, key, value):
    """Write to primary + fan out to read shards."""
    primary = ring.get_node(key)
    primary.put(key, value)
    # Fan-out to read shards (done asynchronously in practice;
    # shown synchronously here for clarity)
    for i in range(NUM_READ_SHARDS):
        shard_key = f"{key}:shard_{i}"
        node = ring.get_node(shard_key)
        node.put(shard_key, value)

def read_hot_key(ring, key):
    """Read from a random shard to distribute load."""
    shard = random.randint(0, NUM_READ_SHARDS - 1)
    shard_key = f"{key}:shard_{shard}"
    node = ring.get_node(shard_key)
    return node.get(shard_key)
```
## Quick Reference
| Operation | Time Complexity | Notes |
|---|---|---|
| Lookup key | O(log N) with binary search | N = total virtual nodes on ring |
| Add node | O(V log N) where V = vnodes | Insert V entries into sorted ring |
| Remove node | O(V log N) | Remove V entries from sorted ring |
| Keys remapped on add | ~K/(N+K) of total keys | K = nodes added, N = existing nodes |
## Key Takeaways
- Naive modular hashing remaps ~90%+ of keys when nodes change; consistent hashing remaps only ~1/N
- Virtual nodes solve the uneven distribution problem with few physical servers
- Replication walks clockwise and picks distinct physical nodes
- Hot keys require application-level sharding, not just consistent hashing
- Used in DynamoDB, Cassandra, Memcached, Akamai, Discord, and many more