Why Proxies Matter
The Problem: As your architecture grows from one monolith to dozens of microservices, clients need a single entry point. Without one, each client must know every service's address, handle authentication, rate limiting, and retries independently.
The Solution: Proxies and API gateways provide a unified entry point that handles cross-cutting concerns: routing, auth, rate limiting, load balancing, and observability -- all in one place.
Real Impact: Netflix's Zuul gateway processes over 50 billion requests per day, routing traffic to hundreds of backend microservices while enforcing security and resiliency policies.
Real-World Analogy
Think of a proxy like a hotel concierge:
- Forward proxy = A travel agent who books on your behalf (hides your identity from the hotel)
- Reverse proxy = The hotel concierge (a single point of contact for all hotel services)
- API gateway = The concierge desk with a menu of services, security badge checks, and rate limits
- Service discovery = The concierge's directory of internal extensions and room numbers
Forward vs Reverse Proxy
| Feature | Forward Proxy | Reverse Proxy |
|---|---|---|
| Position | Client side | Server side |
| Purpose | Hide client identity, filter content, bypass restrictions | Load balancing, SSL termination, caching, security |
| Who knows | Client knows about proxy; server does not | Server knows about proxy; client does not |
| Examples | Squid, corporate web filters, VPNs | Nginx, HAProxy, Envoy, Cloudflare |
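The table's "who knows" row can be made concrete in a few lines of Python: the forward-proxy half configures the client to route through a proxy (the proxy address is hypothetical), and the reverse-proxy half shows how a server recovers the real client IP from the `X-Forwarded-For` header that reverse proxies append.

```python
import urllib.request

# Forward proxy: the CLIENT is configured to send traffic through the proxy.
# The proxy address below is hypothetical.
proxy = urllib.request.ProxyHandler({
    "http": "http://proxy.corp.example:3128",
    "https": "http://proxy.corp.example:3128",
})
opener = urllib.request.build_opener(proxy)
# opener.open("http://example.com") would now go via the proxy;
# the destination server sees the proxy's IP, not the client's.

# Reverse proxy: clients talk to one public address; the original client IP
# arrives at the backend in the X-Forwarded-For header set by the proxy.
def client_ip(headers: dict) -> str:
    """Recover the real client IP behind a reverse proxy (first hop in XFF)."""
    xff = headers.get("X-Forwarded-For")
    return xff.split(",")[0].strip() if xff else headers.get("Remote-Addr", "")
```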
API Gateway Pattern
An API gateway is a reverse proxy that acts as the single entry point for all client requests. Beyond simple routing, it handles authentication, rate limiting, request/response transformation, protocol translation, and aggregation of responses from multiple backend services.
# Nginx as a reverse proxy and API gateway
# Note: limit_req_zone is only valid in the http context, outside any
# server block. Rate limiting: 10 requests per second per IP.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

upstream user_service {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

upstream order_service {
    server 10.0.2.10:8080;
    server 10.0.2.11:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    # SSL termination at the gateway
    ssl_certificate     /etc/ssl/api.example.com.crt;
    ssl_certificate_key /etc/ssl/api.example.com.key;

    # Route /api/users to user_service
    location /api/users {
        limit_req zone=api burst=20;
        proxy_pass http://user_service;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    # Route /api/orders to order_service
    location /api/orders {
        limit_req zone=api burst=20;
        proxy_pass http://order_service;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Service Discovery
In dynamic environments where services scale up and down constantly (like Kubernetes), hardcoding IP addresses is impractical. Service discovery automatically tracks which instances are available and where they are running.
Client-Side Discovery
The client queries the service registry and picks an instance, so the client itself handles load balancing. Used by Netflix Eureka (registry) with Ribbon (client-side load balancer).
Server-Side Discovery
The client sends requests to a load balancer that queries the registry. Simpler for clients. Used by AWS ALB, Kubernetes Services.
DNS-Based Discovery
Services register DNS records. Clients resolve hostnames. Simple but DNS TTLs can cause stale routing. Used by Consul DNS and CoreDNS.
Service Mesh (Sidecar)
A sidecar proxy (like Envoy) runs next to each service instance. Handles discovery, load balancing, retries, and mTLS transparently. Used by Istio and Linkerd.
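A minimal sketch of the client-side style: the client asks a registry for live instances and round-robins between them itself. An in-memory dict stands in for a real registry such as Eureka or Consul; the service name and addresses are illustrative.

```python
import itertools

# Client-side discovery sketch: the registry tracks instances per service,
# and resolve() does client-side round-robin load balancing.
class ServiceRegistry:
    def __init__(self):
        self._services = {}   # name -> list of "host:port" strings
        self._cursors = {}    # name -> round-robin iterator

    def register(self, name, address):
        self._services.setdefault(name, []).append(address)
        self._cursors[name] = itertools.cycle(self._services[name])

    def deregister(self, name, address):
        # Health checks would call this when an instance stops responding
        self._services[name].remove(address)
        self._cursors[name] = itertools.cycle(self._services[name])

    def resolve(self, name):
        # Client-side load balancing: round-robin over live instances
        return next(self._cursors[name])

registry = ServiceRegistry()
registry.register("user-service", "10.0.1.10:8080")
registry.register("user-service", "10.0.1.11:8080")
print(registry.resolve("user-service"))  # alternates between the two
```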
Circuit Breaker Pattern
When a downstream service is failing, continuing to send requests wastes resources and increases latency. The circuit breaker pattern detects failures and "opens the circuit" -- rejecting requests immediately rather than waiting for timeouts.
| State | Behavior | Transition |
|---|---|---|
| Closed | Normal operation. Requests pass through. Failures are counted. | Opens when failure threshold is exceeded |
| Open | All requests immediately fail with a fallback response. | After timeout, transitions to Half-Open |
| Half-Open | A limited number of test requests are sent through. | If test succeeds, closes. If fails, re-opens. |
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitOpenError(Exception):
    """Raised when the circuit is open and calls are rejected immediately."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = State.CLOSED
        self.failure_count = 0
        self.last_failure_time = 0

    def call(self, func, *args, **kwargs):
        if self.state == State.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                # Recovery window elapsed: let a test request through
                self.state = State.HALF_OPEN
            else:
                raise CircuitOpenError("Circuit is open")
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = State.OPEN

# Usage (call_payment_service is your downstream client function)
breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=30)
try:
    result = breaker.call(call_payment_service, order_id=123)
except CircuitOpenError:
    # Return cached or default response
    result = {"status": "pending", "message": "Service temporarily unavailable"}
Common Pitfall: Cascading Failures
Problem: Service A calls Service B, which calls Service C. If C is slow, B's threads are exhausted waiting, then A's threads are exhausted too. One slow service takes down the entire chain.
Solution: Use circuit breakers at every service boundary, set aggressive timeouts, and implement bulkheads (isolated thread pools per dependency) to contain failures.
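The bulkhead half of the solution can be sketched with one bounded semaphore per dependency: a slow downstream can only consume its own pool, and callers fail fast once that pool is exhausted. Service names and concurrency limits here are illustrative.

```python
import asyncio

# Bulkhead sketch: a bounded semaphore per dependency caps how much
# concurrency one slow downstream can consume.
class Bulkhead:
    def __init__(self, max_concurrent):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def run(self, coro_func, *args):
        if self._sem.locked():        # pool exhausted: fail fast
            raise RuntimeError("bulkhead full")
        async with self._sem:
            return await coro_func(*args)

bulkheads = {
    "payment": Bulkhead(max_concurrent=10),
    "analytics": Bulkhead(max_concurrent=2),  # less critical, smaller pool
}

async def fetch_report(user_id):
    await asyncio.sleep(0)  # stand-in for a network call
    return {"user": user_id}

async def main():
    return await bulkheads["analytics"].run(fetch_report, 42)

print(asyncio.run(main()))  # {'user': 42}
```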
Practice Problems
Medium API Gateway Design
Design an API gateway for a ride-sharing app with these requirements:
- Route requests to 5 backend services (user, ride, payment, notification, analytics)
- Different rate limits for free vs premium users
- Aggregate driver location + ride status into a single response for the mobile app
Use path-based routing (/api/rides, /api/users, etc.). For rate limiting, extract the user tier from the JWT. For aggregation, make parallel calls to ride and location services and merge responses.
import asyncio

# Gateway routing configuration
routes = {
    '/api/users/**': 'user-service:8080',
    '/api/rides/**': 'ride-service:8080',
    '/api/payments/**': 'payment-service:8080',
}

# Tiered rate limiting (tier extracted from the JWT's claims)
rate_limits = {
    'free': 60,       # 60 requests/min
    'premium': 600,   # 600 requests/min
}

# Response aggregation endpoint
async def get_ride_status(ride_id):
    # Parallel calls to multiple services
    ride, location = await asyncio.gather(
        ride_service.get(ride_id),
        location_service.get_driver(ride_id),
    )
    return {
        'ride': ride,
        'driver_location': location,
        'eta': calculate_eta(location, ride['destination']),
    }
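The tiered limits above can be enforced with a token bucket per user. This sketch assumes the tier has already been extracted from a verified JWT claim and is passed in directly; the limits mirror the rate_limits table (requests per minute).

```python
import time

RATE_LIMITS = {"free": 60, "premium": 600}  # requests per minute

class TokenBucket:
    def __init__(self, per_minute):
        self.capacity = per_minute
        self.tokens = float(per_minute)
        self.rate = per_minute / 60.0       # tokens refilled per second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # user_id -> TokenBucket

def check_rate_limit(user_id, tier):
    bucket = buckets.setdefault(user_id, TokenBucket(RATE_LIMITS[tier]))
    return bucket.allow()
```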
Medium Service Discovery Migration
Your monolith is being split into microservices. Design the service discovery approach:
- Choose between client-side vs server-side discovery
- Handle service instances scaling from 2 to 50 during peak hours
- Ensure zero-downtime during deployments
Server-side discovery (Kubernetes Services or AWS ALB) is simpler for most teams. Use health checks to deregister unhealthy instances. Rolling deployments with readiness probes ensure zero downtime.
# Kubernetes-based service discovery (recommended)
# 1. Each microservice gets a Kubernetes Service
# 2. DNS resolves service-name.namespace.svc.cluster.local
# 3. kube-proxy handles load balancing across pods

# Service definition
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 8080
---
# Readiness probe for zero-downtime deploys. Note: this goes in the
# container spec of the Deployment, not in the Service above.
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
Hard Resilient Gateway
Design a highly available API gateway that handles:
- 50,000 requests per second across 3 regions
- Automatic failover if one region's gateway goes down
- Circuit breakers, retries with exponential backoff, and bulkheads
Use DNS-based global load balancing (Route53 latency routing) across regions. Each region runs multiple gateway instances behind a local load balancer. Implement circuit breakers per-service and bulkheads (separate thread pools) per dependency.
# Multi-region gateway architecture
# 1. Global DNS (Route53) with health checks
# 2. Each region: 3+ gateway instances behind NLB
# 3. Per-service circuit breakers + bulkheads

import asyncio
from concurrent.futures import ThreadPoolExecutor

class ServiceUnavailable(Exception):
    pass

class ResilientGateway:
    def __init__(self):
        # Separate circuit breakers per service (stricter for payments)
        self.breakers = {
            'user': CircuitBreaker(failure_threshold=5, recovery_timeout=30),
            'order': CircuitBreaker(failure_threshold=5, recovery_timeout=30),
            'payment': CircuitBreaker(failure_threshold=3, recovery_timeout=60),
        }
        # Bulkheads: separate thread pools per service, used by _forward
        # to contain blocking I/O per dependency
        self.pools = {
            'user': ThreadPoolExecutor(max_workers=50),
            'order': ThreadPoolExecutor(max_workers=50),
            'payment': ThreadPoolExecutor(max_workers=20),
        }

    async def route(self, service, request):
        # Assumes an async-aware CircuitBreaker.call; the synchronous
        # example earlier would need an awaitable variant here
        breaker = self.breakers[service]
        return await breaker.call(
            self._call_with_retry, service, request
        )

    async def _call_with_retry(self, service, request, retries=3):
        for attempt in range(retries):
            try:
                return await asyncio.wait_for(
                    self._forward(service, request),
                    timeout=5.0,
                )
            except asyncio.TimeoutError:
                # Exponential backoff: 0.1s, 0.2s, 0.4s
                await asyncio.sleep(0.1 * (2 ** attempt))
        raise ServiceUnavailable(service)
Quick Reference
| Tool | Type | Best For |
|---|---|---|
| Nginx | Reverse proxy / LB | Static content, SSL termination, basic routing |
| HAProxy | TCP/HTTP load balancer | High-performance L4/L7 load balancing |
| Envoy | Service proxy | Service mesh sidecar, advanced observability |
| Kong | API gateway | Plugin ecosystem, authentication, rate limiting |
| AWS API Gateway | Managed gateway | Serverless APIs, Lambda integration |
| Traefik | Cloud-native proxy | Kubernetes ingress, auto-discovery |
Key Takeaways
- Forward proxies protect clients; reverse proxies protect servers
- API gateways consolidate cross-cutting concerns (auth, rate limiting, routing)
- Service discovery replaces hardcoded addresses with dynamic resolution
- Circuit breakers prevent cascading failures in distributed systems
- Bulkheads isolate failures by limiting resources per dependency
- Use the Backend-for-Frontend (BFF) pattern for platform-specific gateways