Load Balancing

Difficulty: Medium · 22 min read

Why Load Balancing Matters

The Problem: A single server can only handle a limited number of requests. When traffic exceeds that capacity, users experience slow responses or errors.

The Solution: A load balancer distributes incoming traffic across multiple servers, ensuring no single server becomes a bottleneck.

Real Impact: AWS Elastic Load Balancer handles millions of requests per second for companies like Netflix, Airbnb, and Slack, automatically routing traffic to healthy instances.

Load Balancer Architecture
[Diagram: Clients 1..N send requests to a load balancer, which distributes traffic, runs health checks, and performs SSL termination across a pool of backend servers (Server 1 at 45% CPU, Server 2 at 62%, Server 3 at 38%, all healthy; Server 4 unhealthy and removed). Unhealthy servers are automatically removed from the pool.]

Load Balancing Algorithms

Round Robin

Distribute requests sequentially: Server 1, Server 2, Server 3, Server 1, ... Simple and fair when servers have equal capacity. The default for most load balancers.

Weighted Round Robin

Like round robin, but servers with more capacity get proportionally more requests. A server with weight 3 gets 3x the traffic of weight 1.

Least Connections

Send each new request to the server with the fewest active connections. Naturally adapts to servers with different processing speeds.

IP Hash

Hash the client's IP address to determine the server. Same client always goes to the same server. Useful for session persistence (sticky sessions).
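The four strategies above can be sketched in a few lines of Python. This is a toy model, not a production balancer; the server names, weights, and connection counts are made up for illustration:

```python
import hashlib
from itertools import cycle

servers = ["s1", "s2", "s3"]

# Round robin: cycle through the pool in order, wrapping around.
rr = cycle(servers)
def round_robin():
    return next(rr)

# Weighted round robin: repeat each server in the cycle by its weight,
# so a weight-3 server receives 3x the requests of a weight-1 server.
weights = {"s1": 3, "s2": 2, "s3": 1}
wrr = cycle([s for s, w in weights.items() for _ in range(w)])
def weighted_round_robin():
    return next(wrr)

# Least connections: pick the server with the fewest active connections.
active = {"s1": 12, "s2": 4, "s3": 9}
def least_connections():
    return min(active, key=active.get)

# IP hash: hash the client IP, so the same client always maps
# to the same server (sticky sessions).
def ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(least_connections())  # s2: fewest active connections
```

Note that this naive weighted round robin is "bursty" (it sends 3 requests to s1 in a row); real balancers such as Nginx interleave the picks with a smooth weighted round robin, but the long-run proportions are the same.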

Algorithm             Complexity  Session Sticky?  Best For
Round Robin           O(1)        No               Stateless services, equal servers
Weighted Round Robin  O(1)        No               Heterogeneous server fleet
Least Connections     O(log n)    No               Variable request durations
IP Hash               O(1)        Yes              Session persistence needed
Least Response Time   O(n)        No               Latency-sensitive workloads
Random                O(1)        No               Large server pools, simple impl

L4 vs L7 Load Balancers

L4 Load Balancer (Transport Layer)

Routes based on IP address and TCP/UDP port. Cannot inspect HTTP headers or content. Very fast (operates at network level). Examples: AWS NLB, HAProxy (TCP mode).

L7 Load Balancer (Application Layer)

Routes based on HTTP headers, URL path, cookies, or content. Can do SSL termination, compression, and caching. Slower but much more flexible. Examples: Nginx, AWS ALB, Envoy.

When to Use Which?

  • L4: Raw TCP/UDP traffic, gaming servers, database connections, maximum performance needed
  • L7: Web applications, API gateways, microservices routing (route /api/users to user service, /api/orders to order service)
  • Both: Many architectures use L4 at the edge for raw speed, then L7 internally for intelligent routing

Health Checks

Load balancers must know which servers are healthy. They do this by periodically sending health check requests to each server.

nginx_load_balancer.conf
# Nginx Load Balancer Configuration

upstream backend {
    # Least connections algorithm
    least_conn;

    # Backend servers with weights; passive health checks mark a
    # server failed after 3 errors and retry it after 30 seconds
    server 10.0.1.1:8080 weight=3 max_fails=3 fail_timeout=30s;  # 3x traffic
    server 10.0.1.2:8080 weight=2 max_fails=3 fail_timeout=30s;  # 2x traffic
    server 10.0.1.3:8080 weight=1 max_fails=3 fail_timeout=30s;  # 1x traffic
    server 10.0.1.4:8080 backup;    # receives traffic only if others fail
}

server {
    listen 443 ssl;
    server_name api.example.com;

    # SSL termination at load balancer
    ssl_certificate     /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Connection timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout    60s;
    }

    # Health check endpoint
    location /health {
        proxy_pass http://backend/health;
    }
}
health_endpoint.py
from flask import Flask, jsonify
import psycopg2
import redis

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379)

@app.route("/health")
def health_check():
    """Health endpoint for load balancer to probe."""
    checks = {}

    # Check database connectivity
    try:
        conn = psycopg2.connect(host="localhost", database="mydb")
        conn.cursor().execute("SELECT 1")
        conn.close()
        checks["database"] = "healthy"
    except psycopg2.Error:
        checks["database"] = "unhealthy"

    # Check Redis connectivity
    try:
        cache.ping()
        checks["cache"] = "healthy"
    except redis.RedisError:
        checks["cache"] = "unhealthy"

    # Overall status
    all_healthy = all(v == "healthy" for v in checks.values())
    status_code = 200 if all_healthy else 503

    return jsonify({
        "status": "healthy" if all_healthy else "degraded",
        "checks": checks
    }), status_code

Global vs Local Load Balancing

Local Load Balancing

Distributes traffic within a single datacenter or region. Handles server-level routing. Uses Nginx, HAProxy, or cloud ALB/NLB. Most common type.

Global Load Balancing (GSLB)

Routes users to the nearest datacenter based on geographic location, health, and latency. Uses DNS-based routing or anycast. Examples: AWS Route 53, Cloudflare, Google Cloud Load Balancing.

Global Load Balancing Architecture

  • User in New York -> DNS resolves to US-East datacenter
  • User in Tokyo -> DNS resolves to AP-Northeast datacenter
  • User in London -> DNS resolves to EU-West datacenter
  • If US-East is down, traffic automatically shifts to US-West (failover)

Practice Problems

Easy Choose the Algorithm

Which load balancing algorithm would you use for each scenario?

  1. A stateless REST API with identical servers
  2. A WebSocket-based chat application
  3. A fleet with 3 powerful servers and 2 smaller ones

Consider: Are servers identical? Does the client need to return to the same server? Are connection durations variable?

# 1. Stateless REST API: Round Robin
#    - Servers are identical (stateless)
#    - No session affinity needed
#    - Simple, even distribution

# 2. WebSocket chat app: IP Hash or Least Connections
#    - IP Hash: same user always connects to same server
#    - Important for maintaining WebSocket connections
#    - Alt: Least Connections if using external session store

# 3. Mixed server fleet: Weighted Round Robin
#    - Powerful servers: weight=3
#    - Smaller servers: weight=1
#    - Distributes proportionally to capacity

Medium Design High Availability

Your load balancer is a single point of failure. Design a highly available load balancing setup:

  1. How do you make the load balancer itself redundant?
  2. How do you handle load balancer failover?
  3. What is the role of DNS in this setup?

Use active-passive or active-active LB pairs. Virtual IP (VIP) with keepalived for failover. DNS with multiple A records.

# Highly Available Load Balancer Design

# 1. Redundant load balancers:
#    Active-Passive pair with shared Virtual IP (VIP)
#    - LB1 (Active):  handles all traffic
#    - LB2 (Passive): monitors LB1 via heartbeat
#    - Both share VIP: 10.0.0.100

# 2. Failover mechanism:
#    - Use keepalived/VRRP protocol
#    - LB2 sends heartbeats to LB1 every 1 second
#    - If 3 heartbeats fail, LB2 takes over the VIP
#    - Failover time: ~3 seconds
#    - OR: Active-Active with DNS round robin

# 3. DNS role:
#    - DNS A records point to multiple LB VIPs
#    - api.example.com -> 10.0.0.100 (Region 1)
#    - api.example.com -> 10.0.1.100 (Region 2)
#    - Health-checked DNS (Route 53) removes unhealthy
#    - Client retries with next IP on failure
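The failover mechanism in step 2 can be simulated in a few lines. This is a toy model of keepalived-style VIP takeover, not real VRRP; the class and field names are invented for illustration:

```python
MISSED_LIMIT = 3  # consecutive missed heartbeats before takeover

class PassiveLB:
    """Standby load balancer that claims the VIP when the active peer goes silent."""
    def __init__(self):
        self.missed = 0
        self.owns_vip = False

    def on_heartbeat_interval(self, heartbeat_received):
        if heartbeat_received:
            self.missed = 0          # active peer is alive, reset the counter
        else:
            self.missed += 1
            if self.missed >= MISSED_LIMIT:
                self.owns_vip = True  # take over the Virtual IP

standby = PassiveLB()
for beat in [True, True, False, False, False]:  # active LB dies mid-sequence
    standby.on_heartbeat_interval(beat)

print(standby.owns_vip)   # True: standby now answers on the VIP
```

With a 1-second heartbeat interval this gives the ~3-second failover time noted above; clients see at most a few seconds of errors before the VIP moves.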

Medium Microservices Routing

Design L7 load balancing for a microservices architecture with these services: User Service, Order Service, Payment Service, and Notification Service.

  1. How would you route requests based on URL path?
  2. How do you handle different scaling needs per service?
  3. How do you implement canary deployments?

Use L7 path-based routing. Each service has its own upstream pool with independent scaling. Canary: route a percentage of traffic to the new version.

# L7 Path-Based Routing (Nginx config)

# 1. Route by URL path:
# /api/users/*   -> User Service (3 instances)
# /api/orders/*  -> Order Service (5 instances)
# /api/payments/* -> Payment Service (2 instances)

# 2. Independent scaling:
# upstream user_service {
#     server user-1:8080;
#     server user-2:8080;
#     server user-3:8080;
# }
# upstream order_service {
#     server order-1:8080; # ... 5 servers
# }

# 3. Canary deployment (10% to new version):
# upstream user_service {
#     server user-v1-1:8080 weight=9;  # 90%
#     server user-v1-2:8080 weight=9;
#     server user-v2-1:8080 weight=1;  # 10% canary
# }
# Monitor error rates on v2, if good: increase weight

Quick Reference

Load Balancer Comparison

Load Balancer  Type           Layer            Best For
Nginx          Software       L7 (L4 capable)  Web apps, reverse proxy, API gateway
HAProxy        Software       L4 / L7          High performance, TCP/HTTP
AWS ALB        Cloud Managed  L7               AWS HTTP/HTTPS workloads
AWS NLB        Cloud Managed  L4               AWS TCP/UDP, ultra-low latency
Envoy          Software       L7               Service mesh, microservices
Cloudflare LB  Cloud Managed  L7 + DNS         Global load balancing, DDoS protection

Key Takeaways

  • Round robin is the simplest and works well for stateless services
  • Least connections adapts better when request durations vary
  • L7 load balancers give you path-based routing and SSL termination
  • Always implement health checks -- never send traffic to unhealthy servers
  • Make the load balancer itself redundant (active-passive or active-active)
  • Use global load balancing (DNS-based) for multi-region deployments