Why Load Balancing Matters
The Problem: A single server can only handle a limited number of requests. When traffic exceeds that capacity, users experience slow responses or errors.
The Solution: A load balancer distributes incoming traffic across multiple servers, ensuring no single server becomes a bottleneck.
Real Impact: AWS Elastic Load Balancing handles millions of requests per second for companies like Netflix, Airbnb, and Slack, automatically routing traffic to healthy instances.
Load Balancing Algorithms
Round Robin
Distribute requests sequentially: Server 1, Server 2, Server 3, Server 1, ... Simple and fair when servers have equal capacity. The default for most load balancers.
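As a minimal sketch (the server names are placeholders, not from any real deployment), round robin is just a cycling pointer over the pool:

```python
from itertools import cycle

# Hypothetical pool of three equal-capacity servers
servers = ["server-1", "server-2", "server-3"]
pool = cycle(servers)

# Each incoming request takes the next server in sequence, wrapping around
assignments = [next(pool) for _ in range(5)]
print(assignments)  # ['server-1', 'server-2', 'server-3', 'server-1', 'server-2']
```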
Weighted Round Robin
Like round robin, but servers with more capacity get proportionally more requests. A server with weight 3 gets 3x the traffic of weight 1.
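A naive way to sketch this (server names and weights are hypothetical) is to repeat each server in the rotation in proportion to its weight; production balancers use smoother interleavings, but the traffic ratio is the same:

```python
from itertools import cycle

# Hypothetical weights: big-server has 3x the capacity of small-server
weights = {"big-server": 3, "small-server": 1}

# Naive weighted round robin: repeat each server 'weight' times in the schedule
schedule = [s for s, w in weights.items() for _ in range(w)]
pool = cycle(schedule)

assignments = [next(pool) for _ in range(8)]
# Over 8 requests: big-server gets 6, small-server gets 2 -- a 3:1 ratio
print(assignments.count("big-server"), assignments.count("small-server"))  # 6 2
```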
Least Connections
Send each new request to the server with the fewest active connections. Naturally adapts to servers with different processing speeds.
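A toy version (hypothetical server names; a real balancer tracks these counters inside the proxy) keeps a connection count per server and picks the minimum:

```python
# Track active connections per server; pick the minimum on each new request
active = {"server-1": 0, "server-2": 0, "server-3": 0}

def pick_server():
    # min() over the dict keys finds the server with the fewest active connections
    server = min(active, key=active.get)
    active[server] += 1  # the new request opens a connection
    return server

def finish(server):
    active[server] -= 1  # the request completed; connection closed

first = pick_server()   # all tied at 0 -> 'server-1'
second = pick_server()  # 'server-2' now has the fewest
finish(first)           # 'server-1' drops back to 0 connections
third = pick_server()   # 'server-1' again
```

A slow server accumulates connections and automatically stops receiving new ones, which is why this adapts to uneven processing speeds.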
IP Hash
Hash the client's IP address to determine the server. Same client always goes to the same server. Useful for session persistence (sticky sessions).
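The idea can be sketched as hash-then-modulo (server names are placeholders):

```python
import hashlib

servers = ["server-1", "server-2", "server-3"]

def server_for(client_ip: str) -> str:
    # Stable hash of the client IP, reduced modulo the pool size
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP always maps to the same server
assert server_for("203.0.113.7") == server_for("203.0.113.7")
```

One caveat of the naive modulo approach: adding or removing a server remaps most clients. Consistent hashing limits that reshuffling, which is why many balancers use it instead.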
| Algorithm | Complexity | Session Sticky? | Best For |
|---|---|---|---|
| Round Robin | O(1) | No | Stateless services, equal servers |
| Weighted Round Robin | O(1) | No | Heterogeneous server fleet |
| Least Connections | O(log n) | No | Variable request durations |
| IP Hash | O(1) | Yes | Session persistence needed |
| Least Response Time | O(n) | No | Latency-sensitive workloads |
| Random | O(1) | No | Large server pools, simple impl |
L4 vs L7 Load Balancers
L4 Load Balancer (Transport Layer)
Routes based on IP address and TCP/UDP port. Cannot inspect HTTP headers or content. Very fast (operates at network level). Examples: AWS NLB, HAProxy (TCP mode).
L7 Load Balancer (Application Layer)
Routes based on HTTP headers, URL path, cookies, or content. Can do SSL termination, compression, and caching. Slower but much more flexible. Examples: Nginx, AWS ALB, Envoy.
When to Use Which?
- L4: Raw TCP/UDP traffic, gaming servers, database connections, maximum performance needed
- L7: Web applications, API gateways, microservices routing (route /api/users to user service, /api/orders to order service)
- Both: Many architectures use L4 at the edge for raw speed, then L7 internally for intelligent routing
Health Checks
Load balancers must know which servers are healthy. They learn this through health checks: either active checks, where the balancer periodically probes each server (e.g., an HTTP GET to a /health endpoint), or passive checks, where it marks a server as failed after repeated request errors. Note that open-source Nginx only supports passive checks (max_fails/fail_timeout); active probing requires NGINX Plus or an external checker.
# Nginx load balancer configuration
upstream backend {
    # Least-connections algorithm
    least_conn;

    # Backend servers with weights; each is marked failed after
    # 3 failed attempts (max_fails) and retried after 30 seconds (fail_timeout)
    server 10.0.1.1:8080 weight=3 max_fails=3 fail_timeout=30s;  # 3x traffic
    server 10.0.1.2:8080 weight=2 max_fails=3 fail_timeout=30s;  # 2x traffic
    server 10.0.1.3:8080 weight=1 max_fails=3 fail_timeout=30s;  # 1x traffic
    server 10.0.1.4:8080 backup;  # receives traffic only if the others fail
}
server {
    listen 443 ssl;
    server_name api.example.com;

    # SSL termination at the load balancer
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Connection timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
    }

    # Expose the backend's health endpoint
    location /health {
        proxy_pass http://backend/health;
    }
}
from flask import Flask, jsonify
import psycopg2
import redis

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379)

@app.route("/health")
def health_check():
    """Health endpoint for the load balancer to probe."""
    checks = {}

    # Check database connectivity
    try:
        conn = psycopg2.connect(host="localhost", database="mydb")
        conn.cursor().execute("SELECT 1")
        conn.close()
        checks["database"] = "healthy"
    except Exception:
        checks["database"] = "unhealthy"

    # Check Redis connectivity
    try:
        cache.ping()
        checks["cache"] = "healthy"
    except Exception:
        checks["cache"] = "unhealthy"

    # Overall status: a 503 tells the load balancer to stop sending traffic
    all_healthy = all(v == "healthy" for v in checks.values())
    status_code = 200 if all_healthy else 503
    return jsonify({
        "status": "healthy" if all_healthy else "degraded",
        "checks": checks
    }), status_code
Global vs Local Load Balancing
Local Load Balancing
Distributes traffic within a single datacenter or region. Handles server-level routing. Uses Nginx, HAProxy, or cloud ALB/NLB. Most common type.
Global Load Balancing (GSLB)
Routes users to the nearest datacenter based on geographic location, health, and latency. Uses DNS-based routing or anycast. AWS Route 53, Cloudflare, Google Cloud LB.
Global Load Balancing Architecture
- User in New York -> DNS resolves to US-East datacenter
- User in Tokyo -> DNS resolves to AP-Northeast datacenter
- User in London -> DNS resolves to EU-West datacenter
- If US-East is down, traffic automatically shifts to US-West (failover)
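The routing decision above can be sketched as a preference list with health-aware failover (all region names, IPs, and health flags here are hypothetical; a real GSLB such as Route 53 makes this decision at the DNS layer using measured latency and health checks):

```python
# Hypothetical datacenter table: one IP per region, with a health flag
datacenters = {
    "us-east": {"ip": "192.0.2.10", "healthy": False},  # simulated outage
    "us-west": {"ip": "192.0.2.20", "healthy": True},
    "eu-west": {"ip": "192.0.2.30", "healthy": True},
}

# Preference order per user location: nearest datacenter first, then failovers
preference = {
    "new-york": ["us-east", "us-west", "eu-west"],
    "london": ["eu-west", "us-east", "us-west"],
}

def resolve(user_region: str) -> str:
    # Return the IP of the first healthy datacenter in preference order
    for dc in preference[user_region]:
        if datacenters[dc]["healthy"]:
            return datacenters[dc]["ip"]
    raise RuntimeError("no healthy datacenter")

# us-east is down, so a New York user fails over to us-west
print(resolve("new-york"))  # 192.0.2.20
```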
Practice Problems
Easy: Choose the Algorithm
Which load balancing algorithm would you use for each scenario?
- A stateless REST API with identical servers
- A WebSocket-based chat application
- A fleet with 3 powerful servers and 2 smaller ones
Consider: Are servers identical? Does the client need to return to the same server? Are connection durations variable?
# 1. Stateless REST API: Round Robin
# - Servers are identical (stateless)
# - No session affinity needed
# - Simple, even distribution
# 2. WebSocket chat app: IP Hash or Least Connections
# - IP Hash: same user always connects to same server
# - Important for maintaining WebSocket connections
# - Alt: Least Connections if using external session store
# 3. Mixed server fleet: Weighted Round Robin
# - Powerful servers: weight=3
# - Smaller servers: weight=1
# - Distributes proportionally to capacity
Medium: Design High Availability
Your load balancer is a single point of failure. Design a highly available load balancing setup:
- How do you make the load balancer itself redundant?
- How do you handle load balancer failover?
- What is the role of DNS in this setup?
Use active-passive or active-active LB pairs. Virtual IP (VIP) with keepalived for failover. DNS with multiple A records.
# Highly Available Load Balancer Design
# 1. Redundant load balancers:
# Active-Passive pair with shared Virtual IP (VIP)
# - LB1 (Active): handles all traffic
# - LB2 (Passive): monitors LB1 via heartbeat
# - Both share VIP: 10.0.0.100
# 2. Failover mechanism:
#    - Use keepalived (VRRP protocol)
#    - LB1 (active) sends VRRP heartbeats every 1 second
#    - If LB2 misses 3 consecutive heartbeats, it takes over the VIP
#    - Failover time: ~3 seconds
# - OR: Active-Active with DNS round robin
# 3. DNS role:
# - DNS A records point to multiple LB VIPs
# - api.example.com -> 10.0.0.100 (Region 1)
# - api.example.com -> 10.0.1.100 (Region 2)
# - Health-checked DNS (Route 53) removes unhealthy
# - Client retries with next IP on failure
Medium: Microservices Routing
Design L7 load balancing for a microservices architecture with these services: User Service, Order Service, Payment Service, and Notification Service.
- How would you route requests based on URL path?
- How do you handle different scaling needs per service?
- How do you implement canary deployments?
Use L7 path-based routing. Each service has its own upstream pool with independent scaling. Canary: route a percentage of traffic to the new version.
# L7 Path-Based Routing (Nginx config)
# 1. Route by URL path:
# /api/users/* -> User Service (3 instances)
# /api/orders/* -> Order Service (5 instances)
# /api/payments/* -> Payment Service (2 instances)
# 2. Independent scaling:
# upstream user_service {
# server user-1:8080;
# server user-2:8080;
# server user-3:8080;
# }
# upstream order_service {
# server order-1:8080; # ... 5 servers
# }
# 3. Canary deployment (10% to new version):
#    upstream user_service {
#        server user-v1-1:8080 weight=9;  # 9/20 of traffic
#        server user-v1-2:8080 weight=9;  # 9/20 of traffic
#        server user-v2-1:8080 weight=2;  # 2/20 = 10% canary
#    }
# Monitor error rates on v2, if good: increase weight
Quick Reference
Load Balancer Comparison
| Load Balancer | Type | Layer | Best For |
|---|---|---|---|
| Nginx | Software | L7 (L4 capable) | Web apps, reverse proxy, API gateway |
| HAProxy | Software | L4 / L7 | High performance, TCP/HTTP |
| AWS ALB | Cloud Managed | L7 | AWS HTTP/HTTPS workloads |
| AWS NLB | Cloud Managed | L4 | AWS TCP/UDP, ultra-low latency |
| Envoy | Software | L7 | Service mesh, microservices |
| Cloudflare LB | Cloud Managed | L7 + DNS | Global load balancing, DDoS protection |
Key Takeaways
- Round robin is the simplest and works well for stateless services
- Least connections adapts better when request durations vary
- L7 load balancers give you path-based routing and SSL termination
- Always implement health checks -- never send traffic to unhealthy servers
- Make the load balancer itself redundant (active-passive or active-active)
- Use global load balancing (DNS-based) for multi-region deployments