Proxies & API Gateways


Why Proxies Matter

The Problem: As your architecture grows from one monolith to dozens of microservices, clients need a single entry point. Without one, each client must know every service's address, handle authentication, rate limiting, and retries independently.

The Solution: Proxies and API gateways provide a unified entry point that handles cross-cutting concerns: routing, auth, rate limiting, load balancing, and observability -- all in one place.

Real Impact: Netflix's Zuul gateway processes over 50 billion requests per day, routing traffic to hundreds of backend microservices while enforcing security and resiliency policies.

Real-World Analogy

Think of a proxy like a hotel concierge:

  • Forward proxy = A travel agent who books on your behalf (hides your identity from the hotel)
  • Reverse proxy = The hotel concierge (a single point of contact for all hotel services)
  • API gateway = The concierge desk with a menu of services, security badge checks, and rate limits
  • Service discovery = The concierge's directory of internal extensions and room numbers

Forward vs Reverse Proxy

Diagram: a forward proxy sits between clients and the internet, hiding the clients from the server; a reverse proxy sits between the client and the backend servers, hiding the servers. Forward proxy protects clients; reverse proxy protects servers.
Feature   | Forward Proxy                                              | Reverse Proxy
Position  | Client side                                                | Server side
Purpose   | Hide client identity, filter content, bypass restrictions  | Load balancing, SSL termination, caching, security
Who knows | Client knows about the proxy; the server does not          | Server knows about the proxy; the client does not
Examples  | Squid, corporate web filters, VPNs                         | Nginx, HAProxy, Envoy, Cloudflare
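The "who knows" row is the key distinction, and it shows up directly in client code. Here is a minimal standard-library sketch of opting into a forward proxy (the proxy address is a placeholder, not a real host):

```python
import urllib.request

# A forward proxy is configured on the CLIENT side: the client knows the
# proxy's address and explicitly routes traffic through it.
# (proxy.corp.example:3128 is a placeholder, not a real proxy.)
proxy_handler = urllib.request.ProxyHandler({
    "http":  "http://proxy.corp.example:3128",
    "https": "http://proxy.corp.example:3128",
})
opener = urllib.request.build_opener(proxy_handler)

# opener.open("http://example.com") would now go via the forward proxy;
# the destination server sees the proxy's IP, not the client's.
# A reverse proxy, by contrast, needs no client configuration: the client
# just calls https://api.example.com and never learns which backend responded.
```

A reverse proxy is invisible at this layer, which is exactly why it can be swapped, scaled, or reconfigured without touching any client.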

API Gateway Pattern

An API gateway is a reverse proxy that acts as the single entry point for all client requests. Beyond simple routing, it handles authentication, rate limiting, request/response transformation, protocol translation, and aggregation of responses from multiple backend services.

API Gateway Architecture
Diagram: web, mobile, and third-party clients call the API gateway, which performs auth and JWT validation, rate limiting, request routing, response aggregation, protocol translation, logging and metrics, and circuit breaking, then forwards to the User, Order, Product, and Payment services via a service registry.
nginx-reverse-proxy.conf
# Nginx as a reverse proxy and API gateway

upstream user_service {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

upstream order_service {
    server 10.0.2.10:8080;
    server 10.0.2.11:8080;
}

# Rate limiting: 10 requests per second per client IP
# (limit_req_zone must be declared in the http context, outside server blocks)
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    listen 443 ssl;
    server_name api.example.com;

    # SSL termination at the gateway
    ssl_certificate     /etc/ssl/api.example.com.crt;
    ssl_certificate_key /etc/ssl/api.example.com.key;

    # Route /api/users to user_service
    location /api/users {
        limit_req zone=api burst=20;
        proxy_pass http://user_service;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    # Route /api/orders to order_service
    location /api/orders {
        limit_req zone=api burst=20;
        proxy_pass http://order_service;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Service Discovery

In dynamic environments where services scale up and down constantly (like Kubernetes), hardcoding IP addresses is impractical. Service discovery automatically tracks which instances are available and where they are running.

Client-Side Discovery

The client queries the service registry and picks an instance. The client handles load balancing. Used by Netflix Eureka and Ribbon.
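To make this concrete, here is a minimal sketch of client-side discovery; the in-memory registry and instance addresses are made up for illustration, standing in for a real registry such as Eureka or Consul queried over HTTP:

```python
import itertools

# Hypothetical in-memory registry; in practice the client would fetch
# this list from Eureka, Consul, or etcd and refresh it periodically.
REGISTRY = {
    "user-service": ["10.0.1.10:8080", "10.0.1.11:8080", "10.0.1.12:8080"],
}

class ClientSideDiscovery:
    """The client looks up instances and load-balances across them itself."""

    def __init__(self, registry):
        self.registry = registry
        self._cursors = {}

    def pick_instance(self, service_name):
        instances = self.registry[service_name]
        # Round-robin: cycle through the registered instances.
        cursor = self._cursors.setdefault(
            service_name, itertools.cycle(instances))
        return next(cursor)

disco = ClientSideDiscovery(REGISTRY)
print(disco.pick_instance("user-service"))  # 10.0.1.10:8080
print(disco.pick_instance("user-service"))  # 10.0.1.11:8080
```

The trade-off is visible in the code: the client gains control over the balancing policy but takes on the burden of keeping the registry view fresh.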

Server-Side Discovery

The client sends requests to a load balancer that queries the registry. Simpler for clients. Used by AWS ALB, Kubernetes Services.

DNS-Based Discovery

Services register DNS records. Clients resolve hostnames. Simple but DNS TTLs can cause stale routing. Used by Consul DNS and CoreDNS.
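DNS-based discovery needs nothing beyond the standard resolver; in this sketch `localhost` stands in for a real service hostname such as one served by Consul DNS or CoreDNS:

```python
import socket

def resolve_service(hostname, port):
    """Resolve a service hostname to its registered addresses.

    With Consul DNS or CoreDNS, each healthy instance registers a
    record, so one lookup can return several addresses. The stale-routing
    caveat applies here: resolvers may cache results for the record's TTL.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Deduplicate addresses while preserving resolver order
    return list(dict.fromkeys(info[4][0] for info in infos))

# "localhost" stands in for a service name like user-service.service.consul
print(resolve_service("localhost", 8080))
```

Because the lookup is ordinary DNS, any client in any language participates for free; the cost is that load balancing and failover are only as responsive as the TTL allows.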

Service Mesh (Sidecar)

A sidecar proxy (like Envoy) runs next to each service instance. Handles discovery, load balancing, retries, and mTLS transparently. Used by Istio and Linkerd.

Circuit Breaker Pattern

When a downstream service is failing, continuing to send requests wastes resources and increases latency. The circuit breaker pattern detects failures and "opens the circuit" -- rejecting requests immediately rather than waiting for timeouts.

State     | Behavior                                                        | Transition
Closed    | Normal operation. Requests pass through; failures are counted.  | Opens when the failure threshold is exceeded
Open      | All requests immediately fail with a fallback response.         | After the recovery timeout, transitions to Half-Open
Half-Open | A limited number of test requests are sent through.             | Closes if the tests succeed; re-opens if they fail
circuit_breaker.py
import time
from enum import Enum

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""
    pass

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = State.CLOSED
        self.failure_count = 0
        self.last_failure_time = 0

    def call(self, func, *args, **kwargs):
        if self.state == State.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = State.HALF_OPEN
            else:
                raise CircuitOpenError("Circuit is open")

        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = State.OPEN

# Usage (call_payment_service stands in for your downstream client function)
breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=30)
try:
    result = breaker.call(call_payment_service, order_id=123)
except CircuitOpenError:
    # Return cached or default response
    result = {"status": "pending", "message": "Service temporarily unavailable"}

Common Pitfall: Cascading Failures

Problem: Service A calls Service B, which calls Service C. If C is slow, B's threads are exhausted waiting, then A's threads are exhausted too. One slow service takes down the entire chain.

Solution: Use circuit breakers at every service boundary, set aggressive timeouts, and implement bulkheads (isolated thread pools per dependency) to contain failures.
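A bulkhead can be as simple as a bounded semaphore per dependency. This sketch (service names and pool sizes are illustrative) caps concurrent in-flight calls to each downstream so one slow service cannot exhaust the caller's entire capacity:

```python
import threading

class Bulkhead:
    """Caps concurrent in-flight calls to a single dependency."""

    def __init__(self, max_concurrent):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Fail fast instead of queueing: if all of this dependency's
        # slots are in use, reject immediately rather than letting
        # callers pile up waiting on a slow service.
        if not self._sem.acquire(blocking=False):
            raise RuntimeError("bulkhead full: dependency saturated")
        try:
            return func(*args, **kwargs)
        finally:
            self._sem.release()

# One isolated bulkhead per downstream dependency (sizes illustrative)
bulkheads = {
    "payment": Bulkhead(max_concurrent=20),
    "user":    Bulkhead(max_concurrent=50),
}

print(bulkheads["payment"].call(lambda: "charged"))  # charged
```

Rejected calls surface as immediate errors the caller can handle with a fallback, which is exactly the containment behavior that stops the A-to-B-to-C cascade described above.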

Practice Problems

Medium API Gateway Design

Design an API gateway for a ride-sharing app with these requirements:

  1. Route requests to 5 backend services (user, ride, payment, notification, analytics)
  2. Different rate limits for free vs premium users
  3. Aggregate driver location + ride status into a single response for the mobile app

Use path-based routing (/api/rides, /api/users, etc.). For rate limiting, extract the user tier from the JWT. For aggregation, make parallel calls to ride and location services and merge responses.

# Gateway routing configuration
routes = {
    '/api/users/**':    'user-service:8080',
    '/api/rides/**':    'ride-service:8080',
    '/api/payments/**': 'payment-service:8080',
}

# Tiered rate limiting
rate_limits = {
    'free': 60,      # 60 requests/min
    'premium': 600,  # 600 requests/min
}

# Response aggregation endpoint (asyncio assumed)
import asyncio

async def get_ride_status(ride_id):
    # Parallel calls to multiple services
    ride, location = await asyncio.gather(
        ride_service.get(ride_id),
        location_service.get_driver(ride_id),
    )
    return {
        'ride': ride,
        'driver_location': location,
        'eta': calculate_eta(location, ride['destination'])
    }

Medium Service Discovery Migration

Your monolith is being split into microservices. Design the service discovery approach:

  1. Choose between client-side vs server-side discovery
  2. Handle service instances scaling from 2 to 50 during peak hours
  3. Ensure zero-downtime during deployments

Server-side discovery (Kubernetes Services or AWS ALB) is simpler for most teams. Use health checks to deregister unhealthy instances. Rolling deployments with readiness probes ensure zero downtime.

# Kubernetes-based service discovery (recommended)
# 1. Each microservice gets a Kubernetes Service
# 2. DNS resolves service-name.namespace.svc.cluster.local
# 3. kube-proxy handles load balancing across pods

# Service definition
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 8080

# Readiness probe (goes in the Deployment's pod spec) for zero-downtime deploys
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

Hard Resilient Gateway

Design a highly available API gateway that handles:

  1. 50,000 requests per second across 3 regions
  2. Automatic failover if one region's gateway goes down
  3. Circuit breakers, retries with exponential backoff, and bulkheads

Use DNS-based global load balancing (Route53 latency routing) across regions. Each region runs multiple gateway instances behind a local load balancer. Implement circuit breakers per-service and bulkheads (separate thread pools) per dependency.

# Multi-region gateway architecture
# 1. Global DNS (Route53) with health checks
# 2. Each region: 3+ gateway instances behind NLB
# 3. Per-service circuit breakers + bulkheads

class ResilientGateway:
    def __init__(self):
        # Separate circuit breakers per service
        # (assumes an async-capable CircuitBreaker variant)
        self.breakers = {
            'user': CircuitBreaker(failure_threshold=5, recovery_timeout=30),
            'order': CircuitBreaker(failure_threshold=5, recovery_timeout=30),
            'payment': CircuitBreaker(failure_threshold=3, recovery_timeout=60),
        }
        # Bulkheads: separate thread pools per service
        self.pools = {
            'user': ThreadPoolExecutor(max_workers=50),
            'order': ThreadPoolExecutor(max_workers=50),
            'payment': ThreadPoolExecutor(max_workers=20),
        }

    async def route(self, service, request):
        breaker = self.breakers[service]
        return await breaker.call(
            self._call_with_retry, service, request
        )

    async def _call_with_retry(self, service, request, retries=3):
        for attempt in range(retries):
            try:
                return await asyncio.wait_for(
                    self._forward(service, request),
                    timeout=5.0
                )
            except asyncio.TimeoutError:
                if attempt == retries - 1:
                    break  # no point sleeping after the final attempt
                # Exponential backoff: 100ms, 200ms, 400ms, ...
                await asyncio.sleep(0.1 * (2 ** attempt))
        raise ServiceUnavailable(service)

Quick Reference

Tool            | Type                    | Best For
Nginx           | Reverse proxy / LB      | Static content, SSL termination, basic routing
HAProxy         | TCP/HTTP load balancer  | High-performance L4/L7 load balancing
Envoy           | Service proxy           | Service mesh sidecar, advanced observability
Kong            | API gateway             | Plugin ecosystem, authentication, rate limiting
AWS API Gateway | Managed gateway         | Serverless APIs, Lambda integration
Traefik         | Cloud-native proxy      | Kubernetes ingress, auto-discovery

Key Takeaways

  • Forward proxies protect clients; reverse proxies protect servers
  • API gateways consolidate cross-cutting concerns (auth, rate limiting, routing)
  • Service discovery replaces hardcoded addresses with dynamic resolution
  • Circuit breakers prevent cascading failures in distributed systems
  • Bulkheads isolate failures by limiting resources per dependency
  • Use the Backend-for-Frontend (BFF) pattern for platform-specific gateways