Why Proxies Matter
The Problem: As your architecture grows from one monolith to dozens of microservices, clients need a single entry point. Without one, each client must know every service's address, handle authentication, rate limiting, and retries independently.
The Solution: Proxies and API gateways provide a unified entry point that handles cross-cutting concerns: routing, auth, rate limiting, load balancing, and observability -- all in one place.
Real Impact: Netflix's Zuul gateway processes over 50 billion requests per day, routing traffic to hundreds of backend microservices while enforcing security and resiliency policies.
Real-World Analogy
Think of a proxy like a hotel concierge:
- Forward proxy = A travel agent who books on your behalf (hides your identity from the hotel)
- Reverse proxy = The hotel concierge (a single point of contact for all hotel services)
- API gateway = The concierge desk with a menu of services, security badge checks, and rate limits
- Service discovery = The concierge's directory of internal extensions and room numbers
Forward vs Reverse Proxy
| Feature | Forward Proxy | Reverse Proxy |
|---|---|---|
| Position | Client side | Server side |
| Purpose | Hide client identity, filter content, bypass restrictions | Load balancing, SSL termination, caching, security |
| Who knows | Client knows about proxy; server does not | Server knows about proxy; client does not |
| Examples | Squid, corporate web filters, VPNs | Nginx, HAProxy, Envoy, Cloudflare |
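The table's "who knows" row can be made concrete in a few lines of Python: the forward-proxy half configures the client to route through a proxy (the proxy address is hypothetical), and the reverse-proxy half shows how a server recovers the real client IP from the `X-Forwarded-For` header that reverse proxies append.

```python
import urllib.request

# Forward proxy: the CLIENT is configured to send traffic through the proxy.
# The proxy address below is hypothetical.
proxy = urllib.request.ProxyHandler({
    "http": "http://proxy.corp.example:3128",
    "https": "http://proxy.corp.example:3128",
})
opener = urllib.request.build_opener(proxy)
# opener.open("http://example.com") would now go via the proxy;
# the destination server sees the proxy's IP, not the client's.

# Reverse proxy: clients talk to one public address; the original client IP
# arrives at the backend in the X-Forwarded-For header set by the proxy.
def client_ip(headers: dict) -> str:
    """Recover the real client IP behind a reverse proxy (first hop in XFF)."""
    xff = headers.get("X-Forwarded-For")
    return xff.split(",")[0].strip() if xff else headers.get("Remote-Addr", "")
```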
API Gateway Pattern
An API gateway is a reverse proxy that acts as the single entry point for all client requests. Beyond simple routing, it handles authentication, rate limiting, request/response transformation, protocol translation, and aggregation of responses from multiple backend services.
# Nginx as a reverse proxy and API gateway
# Note: limit_req_zone is only valid in the http context, outside any
# server block. Rate limiting: 10 requests per second per IP.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

upstream user_service {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

upstream order_service {
    server 10.0.2.10:8080;
    server 10.0.2.11:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    # SSL termination at the gateway
    ssl_certificate     /etc/ssl/api.example.com.crt;
    ssl_certificate_key /etc/ssl/api.example.com.key;

    # Route /api/users to user_service
    location /api/users {
        limit_req zone=api burst=20;
        proxy_pass http://user_service;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    # Route /api/orders to order_service
    location /api/orders {
        limit_req zone=api burst=20;
        proxy_pass http://order_service;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Service Discovery
In dynamic environments where services scale up and down constantly (like Kubernetes), hardcoding IP addresses is impractical. Service discovery automatically tracks which instances are available and where they are running.
Client-Side Discovery
The client queries the service registry and picks an instance, so the client itself handles load balancing. Used by Netflix Eureka (registry) with Ribbon (client-side load balancer).
Server-Side Discovery
The client sends requests to a load balancer that queries the registry. Simpler for clients. Used by AWS ALB, Kubernetes Services.
DNS-Based Discovery
Services register DNS records. Clients resolve hostnames. Simple but DNS TTLs can cause stale routing. Used by Consul DNS and CoreDNS.
Service Mesh (Sidecar)
A sidecar proxy (like Envoy) runs next to each service instance. Handles discovery, load balancing, retries, and mTLS transparently. Used by Istio and Linkerd.
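A minimal sketch of the client-side style: the client asks a registry for live instances and round-robins between them itself. An in-memory dict stands in for a real registry such as Eureka or Consul; the service name and addresses are illustrative.

```python
import itertools

# Client-side discovery sketch: the registry tracks instances per service,
# and resolve() does client-side round-robin load balancing.
class ServiceRegistry:
    def __init__(self):
        self._services = {}   # name -> list of "host:port" strings
        self._cursors = {}    # name -> round-robin iterator

    def register(self, name, address):
        self._services.setdefault(name, []).append(address)
        self._cursors[name] = itertools.cycle(self._services[name])

    def deregister(self, name, address):
        # Health checks would call this when an instance stops responding
        self._services[name].remove(address)
        self._cursors[name] = itertools.cycle(self._services[name])

    def resolve(self, name):
        # Client-side load balancing: round-robin over live instances
        return next(self._cursors[name])

registry = ServiceRegistry()
registry.register("user-service", "10.0.1.10:8080")
registry.register("user-service", "10.0.1.11:8080")
print(registry.resolve("user-service"))  # alternates between the two
```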
Circuit Breaker Pattern
When a downstream service is failing, continuing to send requests wastes resources and increases latency. The circuit breaker pattern detects failures and "opens the circuit" -- rejecting requests immediately rather than waiting for timeouts.
| State | Behavior | Transition |
|---|---|---|
| Closed | Normal operation. Requests pass through. Failures are counted. | Opens when failure threshold is exceeded |
| Open | All requests immediately fail with a fallback response. | After timeout, transitions to Half-Open |
| Half-Open | A limited number of test requests are sent through. | If test succeeds, closes. If fails, re-opens. |
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitOpenError(Exception):
    """Raised when the circuit is open and calls are rejected immediately."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = State.CLOSED
        self.failure_count = 0
        self.last_failure_time = 0

    def call(self, func, *args, **kwargs):
        if self.state == State.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                # Recovery window elapsed: let a test request through
                self.state = State.HALF_OPEN
            else:
                raise CircuitOpenError("Circuit is open")
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = State.OPEN

# Usage (call_payment_service is your downstream client function)
breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=30)
try:
    result = breaker.call(call_payment_service, order_id=123)
except CircuitOpenError:
    # Return cached or default response
    result = {"status": "pending", "message": "Service temporarily unavailable"}
Common Pitfall: Cascading Failures
Problem: Service A calls Service B, which calls Service C. If C is slow, B's threads are exhausted waiting, then A's threads are exhausted too. One slow service takes down the entire chain.
Solution: Use circuit breakers at every service boundary, set aggressive timeouts, and implement bulkheads (isolated thread pools per dependency) to contain failures.
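The bulkhead half of the solution can be sketched with one bounded semaphore per dependency: a slow downstream can only consume its own pool, and callers fail fast once that pool is exhausted. Service names and concurrency limits here are illustrative.

```python
import asyncio

# Bulkhead sketch: a bounded semaphore per dependency caps how much
# concurrency one slow downstream can consume.
class Bulkhead:
    def __init__(self, max_concurrent):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def run(self, coro_func, *args):
        if self._sem.locked():        # pool exhausted: fail fast
            raise RuntimeError("bulkhead full")
        async with self._sem:
            return await coro_func(*args)

bulkheads = {
    "payment": Bulkhead(max_concurrent=10),
    "analytics": Bulkhead(max_concurrent=2),  # less critical, smaller pool
}

async def fetch_report(user_id):
    await asyncio.sleep(0)  # stand-in for a network call
    return {"user": user_id}

async def main():
    return await bulkheads["analytics"].run(fetch_report, 42)

print(asyncio.run(main()))  # {'user': 42}
```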
Practice Problems
Medium API Gateway Design
Design an API gateway for a ride-sharing app with these requirements:
- Route requests to 5 backend services (user, ride, payment, notification, analytics)
- Different rate limits for free vs premium users
- Aggregate driver location + ride status into a single response for the mobile app
Use path-based routing (/api/rides, /api/users, etc.). For rate limiting, extract the user tier from the JWT. For aggregation, make parallel calls to ride and location services and merge responses.
import asyncio

# Gateway routing configuration
routes = {
    '/api/users/**': 'user-service:8080',
    '/api/rides/**': 'ride-service:8080',
    '/api/payments/**': 'payment-service:8080',
}

# Tiered rate limiting (tier extracted from the JWT's claims)
rate_limits = {
    'free': 60,       # 60 requests/min
    'premium': 600,   # 600 requests/min
}

# Response aggregation endpoint
async def get_ride_status(ride_id):
    # Parallel calls to multiple services
    ride, location = await asyncio.gather(
        ride_service.get(ride_id),
        location_service.get_driver(ride_id),
    )
    return {
        'ride': ride,
        'driver_location': location,
        'eta': calculate_eta(location, ride['destination']),
    }
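The tiered limits above can be enforced with a token bucket per user. This sketch assumes the tier has already been extracted from a verified JWT claim and is passed in directly; the limits mirror the rate_limits table (requests per minute).

```python
import time

RATE_LIMITS = {"free": 60, "premium": 600}  # requests per minute

class TokenBucket:
    def __init__(self, per_minute):
        self.capacity = per_minute
        self.tokens = float(per_minute)
        self.rate = per_minute / 60.0       # tokens refilled per second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # user_id -> TokenBucket

def check_rate_limit(user_id, tier):
    bucket = buckets.setdefault(user_id, TokenBucket(RATE_LIMITS[tier]))
    return bucket.allow()
```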
Medium Service Discovery Migration
Your monolith is being split into microservices. Design the service discovery approach:
- Choose between client-side vs server-side discovery
- Handle service instances scaling from 2 to 50 during peak hours
- Ensure zero-downtime during deployments
Server-side discovery (Kubernetes Services or AWS ALB) is simpler for most teams. Use health checks to deregister unhealthy instances. Rolling deployments with readiness probes ensure zero downtime.
# Kubernetes-based service discovery (recommended)
# 1. Each microservice gets a Kubernetes Service
# 2. DNS resolves service-name.namespace.svc.cluster.local
# 3. kube-proxy handles load balancing across pods

# Service definition
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 8080
---
# Readiness probe for zero-downtime deploys. Note: this goes in the
# container spec of the Deployment, not in the Service above.
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
Hard Resilient Gateway
Design a highly available API gateway that handles:
- 50,000 requests per second across 3 regions
- Automatic failover if one region's gateway goes down
- Circuit breakers, retries with exponential backoff, and bulkheads
Use DNS-based global load balancing (Route53 latency routing) across regions. Each region runs multiple gateway instances behind a local load balancer. Implement circuit breakers per-service and bulkheads (separate thread pools) per dependency.
# Multi-region gateway architecture
# 1. Global DNS (Route53) with health checks
# 2. Each region: 3+ gateway instances behind NLB
# 3. Per-service circuit breakers + bulkheads

import asyncio
from concurrent.futures import ThreadPoolExecutor

class ServiceUnavailable(Exception):
    pass

class ResilientGateway:
    def __init__(self):
        # Separate circuit breakers per service (stricter for payments)
        self.breakers = {
            'user': CircuitBreaker(failure_threshold=5, recovery_timeout=30),
            'order': CircuitBreaker(failure_threshold=5, recovery_timeout=30),
            'payment': CircuitBreaker(failure_threshold=3, recovery_timeout=60),
        }
        # Bulkheads: separate thread pools per service, used by _forward
        # to contain blocking I/O per dependency
        self.pools = {
            'user': ThreadPoolExecutor(max_workers=50),
            'order': ThreadPoolExecutor(max_workers=50),
            'payment': ThreadPoolExecutor(max_workers=20),
        }

    async def route(self, service, request):
        # Assumes an async-aware CircuitBreaker.call; the synchronous
        # example earlier would need an awaitable variant here
        breaker = self.breakers[service]
        return await breaker.call(
            self._call_with_retry, service, request
        )

    async def _call_with_retry(self, service, request, retries=3):
        for attempt in range(retries):
            try:
                return await asyncio.wait_for(
                    self._forward(service, request),
                    timeout=5.0,
                )
            except asyncio.TimeoutError:
                # Exponential backoff: 0.1s, 0.2s, 0.4s
                await asyncio.sleep(0.1 * (2 ** attempt))
        raise ServiceUnavailable(service)
Quick Reference
| Tool | Type | Best For |
|---|---|---|
| Nginx | Reverse proxy / LB | Static content, SSL termination, basic routing |
| HAProxy | TCP/HTTP load balancer | High-performance L4/L7 load balancing |
| Envoy | Service proxy | Service mesh sidecar, advanced observability |
| Kong | API gateway | Plugin ecosystem, authentication, rate limiting |
| AWS API Gateway | Managed gateway | Serverless APIs, Lambda integration |
| Traefik | Cloud-native proxy | Kubernetes ingress, auto-discovery |
Key Takeaways
- Forward proxies protect clients; reverse proxies protect servers
- API gateways consolidate cross-cutting concerns (auth, rate limiting, routing)
- Service discovery replaces hardcoded addresses with dynamic resolution
- Circuit breakers prevent cascading failures in distributed systems
- Bulkheads isolate failures by limiting resources per dependency
- Use the Backend-for-Frontend (BFF) pattern for platform-specific gateways