Load Balancing

Difficulty: Medium · 22 min read

Why Load Balancing Matters

The Problem: A single server can only handle a limited number of requests. When traffic exceeds that capacity, users experience slow responses or errors.

The Solution: A load balancer distributes incoming traffic across multiple servers, ensuring no single server becomes a bottleneck.

Real Impact: AWS Elastic Load Balancer handles millions of requests per second for companies like Netflix, Airbnb, and Slack, automatically routing traffic to healthy instances.

Load Balancer Architecture
[Diagram: Clients 1..N send requests to a load balancer, which distributes traffic, runs health checks, and performs SSL termination across a pool of backend servers (Server 1 at 45% CPU, Server 2 at 62%, Server 3 at 38%, all healthy; Server 4 unhealthy and removed). Unhealthy servers are automatically removed from the pool.]

Load Balancing Algorithms

Round Robin

Distribute requests sequentially: Server 1, Server 2, Server 3, Server 1, ... Simple and fair when servers have equal capacity. The default for most load balancers.

Weighted Round Robin

Like round robin, but servers with more capacity get proportionally more requests. A server with weight 3 gets 3x the traffic of weight 1.

Least Connections

Send each new request to the server with the fewest active connections. Naturally adapts to servers with different processing speeds.

IP Hash

Hash the client's IP address to determine the server. Same client always goes to the same server. Useful for session persistence (sticky sessions).
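The four strategies above can be sketched in a few lines of Python. This is a toy model, not a production balancer; the server names, weights, and connection counts are made up for illustration:

```python
import hashlib
from itertools import cycle

servers = ["s1", "s2", "s3"]

# Round robin: cycle through the pool in order, wrapping around.
rr = cycle(servers)
def round_robin():
    return next(rr)

# Weighted round robin: repeat each server in the cycle by its weight,
# so a weight-3 server receives 3x the requests of a weight-1 server.
weights = {"s1": 3, "s2": 2, "s3": 1}
wrr = cycle([s for s, w in weights.items() for _ in range(w)])
def weighted_round_robin():
    return next(wrr)

# Least connections: pick the server with the fewest active connections.
active = {"s1": 12, "s2": 4, "s3": 9}
def least_connections():
    return min(active, key=active.get)

# IP hash: hash the client IP, so the same client always maps
# to the same server (sticky sessions).
def ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(least_connections())  # s2: fewest active connections
```

Note that this naive weighted round robin is "bursty" (it sends 3 requests to s1 in a row); real balancers such as Nginx interleave the picks with a smooth weighted round robin, but the long-run proportions are the same.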

Algorithm             Complexity  Session Sticky?  Best For
Round Robin           O(1)        No               Stateless services, equal servers
Weighted Round Robin  O(1)        No               Heterogeneous server fleet
Least Connections     O(log n)    No               Variable request durations
IP Hash               O(1)        Yes              Session persistence needed
Least Response Time   O(n)        No               Latency-sensitive workloads
Random                O(1)        No               Large server pools, simple impl

L4 vs L7 Load Balancers

L4 Load Balancer (Transport Layer)

Routes based on IP address and TCP/UDP port. Cannot inspect HTTP headers or content. Very fast (operates at network level). Examples: AWS NLB, HAProxy (TCP mode).

L7 Load Balancer (Application Layer)

Routes based on HTTP headers, URL path, cookies, or content. Can do SSL termination, compression, and caching. Slower but much more flexible. Examples: Nginx, AWS ALB, Envoy.

When to Use Which?

  • L4: Raw TCP/UDP traffic, gaming servers, database connections, maximum performance needed
  • L7: Web applications, API gateways, microservices routing (route /api/users to user service, /api/orders to order service)
  • Both: Many architectures use L4 at the edge for raw speed, then L7 internally for intelligent routing

Health Checks

Load balancers must know which servers are healthy. They do this by periodically sending health check requests to each server.

nginx_load_balancer.conf
# Nginx Load Balancer Configuration

upstream backend {
    # Least connections algorithm
    least_conn;

    # Backend servers with weights; passive health checks mark a
    # server failed after 3 errors and retry it after 30 seconds
    server 10.0.1.1:8080 weight=3 max_fails=3 fail_timeout=30s;  # 3x traffic
    server 10.0.1.2:8080 weight=2 max_fails=3 fail_timeout=30s;  # 2x traffic
    server 10.0.1.3:8080 weight=1 max_fails=3 fail_timeout=30s;  # 1x traffic
    server 10.0.1.4:8080 backup;    # receives traffic only if others fail
}

server {
    listen 443 ssl;
    server_name api.example.com;

    # SSL termination at load balancer
    ssl_certificate     /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Connection timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout    60s;
    }

    # Health check endpoint
    location /health {
        proxy_pass http://backend/health;
    }
}
health_endpoint.py
from flask import Flask, jsonify
import psycopg2
import redis

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379)

@app.route("/health")
def health_check():
    """Health endpoint for load balancer to probe."""
    checks = {}

    # Check database connectivity
    try:
        conn = psycopg2.connect(host="localhost", database="mydb")
        conn.cursor().execute("SELECT 1")
        conn.close()
        checks["database"] = "healthy"
    except psycopg2.Error:
        checks["database"] = "unhealthy"

    # Check Redis connectivity
    try:
        cache.ping()
        checks["cache"] = "healthy"
    except redis.RedisError:
        checks["cache"] = "unhealthy"

    # Overall status
    all_healthy = all(v == "healthy" for v in checks.values())
    status_code = 200 if all_healthy else 503

    return jsonify({
        "status": "healthy" if all_healthy else "degraded",
        "checks": checks
    }), status_code

Global vs Local Load Balancing

Local Load Balancing

Distributes traffic within a single datacenter or region. Handles server-level routing. Uses Nginx, HAProxy, or cloud ALB/NLB. Most common type.

Global Load Balancing (GSLB)

Routes users to the nearest datacenter based on geographic location, health, and latency. Uses DNS-based routing or anycast. Examples: AWS Route 53, Cloudflare, Google Cloud Load Balancing.

Global Load Balancing Architecture

  • User in New York -> DNS resolves to US-East datacenter
  • User in Tokyo -> DNS resolves to AP-Northeast datacenter
  • User in London -> DNS resolves to EU-West datacenter
  • If US-East is down, traffic automatically shifts to US-West (failover)

Practice Problems

Easy Choose the Algorithm

Which load balancing algorithm would you use for each scenario?

  1. A stateless REST API with identical servers
  2. A WebSocket-based chat application
  3. A fleet with 3 powerful servers and 2 smaller ones

Consider: Are servers identical? Does the client need to return to the same server? Are connection durations variable?

# 1. Stateless REST API: Round Robin
#    - Servers are identical (stateless)
#    - No session affinity needed
#    - Simple, even distribution

# 2. WebSocket chat app: IP Hash or Least Connections
#    - IP Hash: same user always connects to same server
#    - Important for maintaining WebSocket connections
#    - Alt: Least Connections if using external session store

# 3. Mixed server fleet: Weighted Round Robin
#    - Powerful servers: weight=3
#    - Smaller servers: weight=1
#    - Distributes proportionally to capacity

Medium Design High Availability

Your load balancer is a single point of failure. Design a highly available load balancing setup:

  1. How do you make the load balancer itself redundant?
  2. How do you handle load balancer failover?
  3. What is the role of DNS in this setup?

Use active-passive or active-active LB pairs. Virtual IP (VIP) with keepalived for failover. DNS with multiple A records.

# Highly Available Load Balancer Design

# 1. Redundant load balancers:
#    Active-Passive pair with shared Virtual IP (VIP)
#    - LB1 (Active):  handles all traffic
#    - LB2 (Passive): monitors LB1 via heartbeat
#    - Both share VIP: 10.0.0.100

# 2. Failover mechanism:
#    - Use keepalived/VRRP protocol
#    - LB2 sends heartbeats to LB1 every 1 second
#    - If 3 heartbeats fail, LB2 takes over the VIP
#    - Failover time: ~3 seconds
#    - OR: Active-Active with DNS round robin

# 3. DNS role:
#    - DNS A records point to multiple LB VIPs
#    - api.example.com -> 10.0.0.100 (Region 1)
#    - api.example.com -> 10.0.1.100 (Region 2)
#    - Health-checked DNS (Route 53) removes unhealthy
#    - Client retries with next IP on failure
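The failover mechanism in step 2 can be simulated in a few lines. This is a toy model of keepalived-style VIP takeover, not real VRRP; the class and field names are invented for illustration:

```python
MISSED_LIMIT = 3  # consecutive missed heartbeats before takeover

class PassiveLB:
    """Standby load balancer that claims the VIP when the active peer goes silent."""
    def __init__(self):
        self.missed = 0
        self.owns_vip = False

    def on_heartbeat_interval(self, heartbeat_received):
        if heartbeat_received:
            self.missed = 0          # active peer is alive, reset the counter
        else:
            self.missed += 1
            if self.missed >= MISSED_LIMIT:
                self.owns_vip = True  # take over the Virtual IP

standby = PassiveLB()
for beat in [True, True, False, False, False]:  # active LB dies mid-sequence
    standby.on_heartbeat_interval(beat)

print(standby.owns_vip)   # True: standby now answers on the VIP
```

With a 1-second heartbeat interval this gives the ~3-second failover time noted above; clients see at most a few seconds of errors before the VIP moves.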

Medium Microservices Routing

Design L7 load balancing for a microservices architecture with these services: User Service, Order Service, Payment Service, and Notification Service.

  1. How would you route requests based on URL path?
  2. How do you handle different scaling needs per service?
  3. How do you implement canary deployments?

Use L7 path-based routing. Each service has its own upstream pool with independent scaling. Canary: route a percentage of traffic to the new version.

# L7 Path-Based Routing (Nginx config)

# 1. Route by URL path:
# /api/users/*   -> User Service (3 instances)
# /api/orders/*  -> Order Service (5 instances)
# /api/payments/* -> Payment Service (2 instances)

# 2. Independent scaling:
# upstream user_service {
#     server user-1:8080;
#     server user-2:8080;
#     server user-3:8080;
# }
# upstream order_service {
#     server order-1:8080; # ... 5 servers
# }

# 3. Canary deployment (10% to new version):
# upstream user_service {
#     server user-v1-1:8080 weight=9;  # 90%
#     server user-v1-2:8080 weight=9;
#     server user-v2-1:8080 weight=1;  # 10% canary
# }
# Monitor error rates on v2, if good: increase weight

Quick Reference

Load Balancer Comparison

Load Balancer  Type           Layer            Best For
Nginx          Software       L7 (L4 capable)  Web apps, reverse proxy, API gateway
HAProxy        Software       L4 / L7          High performance, TCP/HTTP
AWS ALB        Cloud Managed  L7               AWS HTTP/HTTPS workloads
AWS NLB        Cloud Managed  L4               AWS TCP/UDP, ultra-low latency
Envoy          Software       L7               Service mesh, microservices
Cloudflare LB  Cloud Managed  L7 + DNS         Global load balancing, DDoS protection

Key Takeaways

  • Round robin is the simplest and works well for stateless services
  • Least connections adapts better when request durations vary
  • L7 load balancers give you path-based routing and SSL termination
  • Always implement health checks -- never send traffic to unhealthy servers
  • Make the load balancer itself redundant (active-passive or active-active)
  • Use global load balancing (DNS-based) for multi-region deployments