Microservices Architecture

Medium 30 min read

Why Microservices Matter

The Problem: As applications grow, monolithic codebases become unwieldy. A single change can require redeploying the entire application, and scaling means duplicating everything.

The Solution: Microservices decompose an application into small, independently deployable services that each own their data and logic.

Real Impact: Netflix reportedly runs over 1,000 microservices, which lets it deploy hundreds of times per day while serving 230+ million subscribers worldwide.

Real-World Analogy

Think of microservices like a food court versus a single restaurant:

  • Monolith = One restaurant that serves everything: pizza, sushi, burgers, desserts. If the pizza oven breaks, the whole restaurant might shut down.
  • Microservices = A food court with specialized stalls. Each stall operates independently, has its own kitchen, and can scale on its own. If the pizza stall is overwhelmed, it can add more workers without affecting the sushi stall.
  • API Gateway = The food court directory that helps customers find the right stall.
  • Message Queue = The order ticket system connecting front counters to kitchens.

Core Benefits

Independent Deployment

Each service can be deployed, updated, and scaled independently. No need to redeploy the entire application for a single change.

Technology Diversity

Each team can choose the best language, framework, and database for its service: for example, Python for ML, Go for networking, and Node.js for real-time features.

Fault Isolation

A failure in one service need not bring down the entire system. Patterns such as timeouts and circuit breakers stop a failing dependency from causing cascading failures.

Team Autonomy

Small teams own their services end-to-end. They can move fast without coordinating deployments across the entire organization.

Monolith vs Microservices

Figure: Monolith vs Microservices Architecture
Monolith (single deployment unit): user interface and business logic (Auth, Orders, Payment, Inventory) on top of a single database with a shared data layer.
Microservices (independent services): an API gateway routes to the Auth, Order, Payment, and Inventory services, each with its own database; a message bus (Kafka / RabbitMQ) and a service mesh / monitoring layer connect them.

Aspect | Monolith | Microservices
Deployment | All-or-nothing deployment | Independent per service
Scaling | Scale entire application | Scale individual services
Tech Stack | Single language/framework | Polyglot (different per service)
Failure Impact | One bug can crash everything | Failures are isolated
Data | Shared database | Database per service
Complexity | Simple at first, hard to maintain | Complex upfront, easier long-term
Team Size | Works for small teams (<10) | Best for larger organizations

Common Pitfall: Premature Microservices

Problem: Many teams adopt microservices too early, before they understand their domain boundaries.

Solution: Start with a well-structured monolith, the approach Martin Fowler calls "Monolith First". Extract services only when you have clear bounded contexts and the team is large enough to justify the operational overhead.

Service Communication

Microservices need to talk to each other. There are two fundamental patterns: synchronous (request/response) and asynchronous (event-driven).

Figure: Synchronous vs Asynchronous Communication

  • Synchronous (REST, gRPC, GraphQL): Service A sends a request and waits for Service B to respond. Use it when an immediate response is needed. Simple, but it couples the services tightly and carries a risk of cascading failures.
  • Asynchronous (Kafka, RabbitMQ, SQS): Service A publishes to a queue and Service B consumes from it; A does not wait for B. Fire-and-forget with eventual consistency. Loosely coupled and more fault tolerant, but more complex.
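
To make the asynchronous pattern concrete, here is a minimal sketch that uses Python's standard-library queue and a worker thread as a stand-in for a real broker such as RabbitMQ or Kafka. The names order_events, place_order, and notification_worker are hypothetical and chosen only for illustration.

```python
import queue
import threading

# In-process stand-in for a message broker (assumption: a real system
# would use RabbitMQ, Kafka, SQS, or similar).
order_events = queue.Queue()
sent_emails = []


def notification_worker():
    """Consumer: handles events whenever they arrive."""
    while True:
        event = order_events.get()
        if event is None:  # sentinel value tells the worker to stop
            order_events.task_done()
            break
        sent_emails.append(f"Order {event['order_id']} confirmed")
        order_events.task_done()


def place_order(order_id):
    """Producer: publishes an event and returns without waiting for the consumer."""
    order_events.put({"type": "order_confirmed", "order_id": order_id})
    return {"order_id": order_id, "status": "accepted"}


worker = threading.Thread(target=notification_worker, daemon=True)
worker.start()

response = place_order("ORD-001")
print(response)

# Only for this demo: stop the worker and wait until the queue is drained.
order_events.put(None)
order_events.join()
print(sent_emails)
```

The producer returns as soon as the event is queued, so a slow or temporarily unavailable consumer does not block the caller. The price is that the caller cannot know from the response whether the email was actually sent.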

gRPC vs REST

Feature | REST | gRPC
Protocol | HTTP/1.1 (JSON) | HTTP/2 (Protocol Buffers)
Performance | Slower (text-based) | Faster (binary, streaming)
Contract | OpenAPI/Swagger (optional) | Strict .proto files (required)
Best For | Public APIs, web clients | Internal service-to-service
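
The text-versus-binary row can be illustrated with a rough sketch. It does not use Protocol Buffers; the standard-library struct module stands in for a schema-based binary format, and the reading message and its field layout are assumptions made for this example.

```python
import json
import struct

# Hypothetical message exchanged between two services.
reading = {"sensor_id": 42, "temperature": 21.5, "ok": True}

# Text encoding: the field names travel with every message.
as_json = json.dumps(reading).encode("utf-8")

# Binary encoding: both sides agree on the layout in advance, much like a schema.
# "<If?" = little-endian unsigned 32-bit int, 32-bit float, 1-byte bool.
LAYOUT = "<If?"
as_binary = struct.pack(
    LAYOUT, reading["sensor_id"], reading["temperature"], reading["ok"]
)

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as packed binary")

# Decoding requires the same layout on the receiving side.
sensor_id, temperature, ok = struct.unpack(LAYOUT, as_binary)
print(sensor_id, temperature, ok)
```

The binary form is smaller because the structure lives in the shared layout rather than in each message, which is also why a strict contract is required on both sides.
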
order_service.py
# A simple microservice using Flask
from flask import Flask, jsonify, request
import requests
import os

app = Flask(__name__)

# Each service has its own configuration
INVENTORY_SERVICE_URL = os.getenv("INVENTORY_URL", "http://inventory-service:5001")
PAYMENT_SERVICE_URL = os.getenv("PAYMENT_URL", "http://payment-service:5002")

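# Fail fast instead of hanging when a downstream service is slow.
# The value is an assumption; tune it per dependency.
REQUEST_TIMEOUT_SECONDS = 3

class OrderService:
    def create_order(self, user_id, items):
        try:
            # Step 1: Check inventory (synchronous call)
            inventory_resp = requests.post(
                f"{INVENTORY_SERVICE_URL}/check",
                json={"items": items},
                timeout=REQUEST_TIMEOUT_SECONDS,
            )
            inventory_resp.raise_for_status()
            if not inventory_resp.json()["available"]:
                return {"error": "Items not available"}, 400

            # Step 2: Process payment (synchronous call)
            total = sum(item["price"] * item["qty"] for item in items)
            payment_resp = requests.post(
                f"{PAYMENT_SERVICE_URL}/charge",
                json={"user_id": user_id, "amount": total},
                timeout=REQUEST_TIMEOUT_SECONDS,
            )
        except requests.RequestException:
            # Timeouts, connection errors, and error responses from the
            # inventory service all end up here.
            return {"error": "Downstream service unavailable"}, 503

        if payment_resp.status_code != 200:
            return {"error": "Payment failed"}, 402

        # Step 3: Create order record
        order = {
            "user_id": user_id,
            "items": items,
            "total": total,
            "status": "confirmed"
        }
        return order, 201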

order_svc = OrderService()

@app.route("/orders", methods=["POST"])
def create_order():
    data = request.get_json()
    result, status = order_svc.create_order(data["user_id"], data["items"])
    return jsonify(result), status

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Service Discovery

In a microservices architecture, services need to find each other. Unlike monoliths where everything shares a process, microservices run on different hosts and ports that can change dynamically.

Service Discovery Patterns

  • Client-Side Discovery: The client queries a service registry (e.g., Netflix Eureka) and picks an instance. The client handles load balancing.
  • Server-Side Discovery: The client sends a request to a load balancer (e.g., AWS ALB, Kubernetes Services), which queries the registry and routes the request.
  • DNS-Based: Services register DNS records. Simple but has TTL caching issues.
  • Service Mesh: A sidecar proxy (e.g., Envoy in Istio) handles discovery transparently. The application code does not need to know about service discovery at all.
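
As a rough illustration of client-side discovery, the sketch below keeps the registry in memory and lets the client do its own round-robin load balancing. ServiceRegistry and DiscoveringClient are hypothetical names; a real deployment would query something like Eureka or Consul over the network.

```python
import itertools


class ServiceRegistry:
    """In-memory stand-in for a service registry (illustrative only)."""

    def __init__(self):
        self._instances = {}  # service name -> list of "host:port" strings

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def deregister(self, service, address):
        self._instances.get(service, []).remove(address)

    def lookup(self, service):
        return list(self._instances.get(service, []))


class DiscoveringClient:
    """Client-side discovery: the caller looks up instances and balances itself."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def pick(self, service):
        instances = self._registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        # Round-robin over whatever is registered right now.
        return instances[next(self._counter) % len(instances)]


registry = ServiceRegistry()
registry.register("inventory", "10.0.0.1:5001")
registry.register("inventory", "10.0.0.2:5001")

client = DiscoveringClient(registry)
picks = [client.pick("inventory") for _ in range(3)]
print(picks)
```

With server-side discovery the same lookup and selection logic would live in the load balancer, and the client would only know the balancer's address.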

Service Registry

A database of available service instances. Services register on startup and deregister on shutdown. Examples: Consul, etcd, ZooKeeper, Kubernetes DNS.

Health Checks

The registry periodically pings services to verify they are healthy. Unhealthy instances are removed from the pool automatically.
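
A minimal sketch of the health-check loop might look like the following. The probe is injected as a plain function so the example runs without a network; the max_failures threshold and the class name HealthCheckedRegistry are assumptions, not part of any particular registry's API.

```python
class HealthCheckedRegistry:
    """Removes instances after several consecutive failed probes (illustrative)."""

    def __init__(self, probe, max_failures=2):
        self._probe = probe              # callable(address) -> bool
        self._max_failures = max_failures
        self._failures = {}              # address -> consecutive failed probes

    def register(self, address):
        self._failures[address] = 0

    def run_checks(self):
        """One health-check round; a scheduler would call this periodically."""
        for address in list(self._failures):
            if self._probe(address):
                self._failures[address] = 0
            else:
                self._failures[address] += 1
                if self._failures[address] >= self._max_failures:
                    del self._failures[address]

    def instances(self):
        return sorted(self._failures)


# Simulated outage: the second instance stops answering probes.
down = {"10.0.0.2:5001"}
registry = HealthCheckedRegistry(probe=lambda address: address not in down)
registry.register("10.0.0.1:5001")
registry.register("10.0.0.2:5001")

registry.run_checks()
after_first = registry.instances()   # one failure is tolerated
registry.run_checks()
after_second = registry.instances()  # second failure evicts the instance
print(after_first)
print(after_second)
```

Requiring more than one failed probe before eviction is a common way to avoid removing an instance because of a single dropped packet.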

Load Balancing

Once discovered, requests must be distributed across instances. Common strategies: round-robin, least connections, consistent hashing.
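
Of the strategies listed, consistent hashing is the least obvious, so here is a small sketch of one common way to build it: a sorted ring of hashed virtual nodes. The HashRing class, the replica count, and the node names are illustrative assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring many times to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(1000)]

before_ring = HashRing(["A", "B", "C"])
before = {k: before_ring.node_for(k) for k in keys}

# Add a fourth instance and see which keys change owner.
after_ring = HashRing(["A", "B", "C", "D"])
after = {k: after_ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

The point of the technique is that adding an instance only moves keys onto the new instance, while a plain hash(key) % n scheme would reshuffle most keys. That matters when instances hold per-key caches or sessions.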

Data Management in Microservices

The Database-per-Service Pattern

Each microservice owns its data and exposes it only through its API. No direct database access from other services. This ensures loose coupling but introduces challenges around data consistency and joins across services.
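
One way to handle a cross-service join is to compose the result in memory from each service's API. The sketch below uses two in-memory classes as stand-ins for remote services; OrderServiceAPI, CatalogServiceAPI, and their methods are hypothetical and would be HTTP or gRPC calls in practice.

```python
class OrderServiceAPI:
    """Stand-in for the Order Service's public API (illustrative)."""

    _orders = [
        {"order_id": "ORD-1", "user_id": "u1", "product_id": "p1", "qty": 2},
        {"order_id": "ORD-2", "user_id": "u1", "product_id": "p2", "qty": 1},
        {"order_id": "ORD-3", "user_id": "u2", "product_id": "p1", "qty": 5},
    ]

    def orders_for_user(self, user_id):
        return [o for o in self._orders if o["user_id"] == user_id]


class CatalogServiceAPI:
    """Stand-in for the Catalog Service's public API (illustrative)."""

    _products = {"p1": {"name": "Widget"}, "p2": {"name": "Gadget"}}

    def products_by_id(self, product_ids):
        return {pid: self._products[pid] for pid in product_ids}


def compose_order_history(user_id, orders_api, catalog_api):
    """Join orders with product names without touching either database."""
    orders = orders_api.orders_for_user(user_id)
    # One batched catalog call instead of one call per order.
    products = catalog_api.products_by_id({o["product_id"] for o in orders})
    return [
        {
            "order_id": o["order_id"],
            "product_name": products[o["product_id"]]["name"],
            "qty": o["qty"],
        }
        for o in orders
    ]


history = compose_order_history("u1", OrderServiceAPI(), CatalogServiceAPI())
print(history)
```

The join happens in the composer's memory, so the result reflects two separate reads with no transactional guarantee between them.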

Strategies for Cross-Service Queries

Pattern | How It Works | Trade-offs
API Composition | A composer service calls multiple services and joins results in memory | Simple but increases latency; no transactional guarantees
CQRS | Separate read and write models. Write to service DB, publish events, build read-optimized views | Great for read-heavy workloads; eventual consistency
Event Sourcing | Store events instead of current state. Rebuild state by replaying events | Full audit trail; complex to implement
Saga Pattern | Coordinate multi-service transactions through a sequence of local transactions and compensating actions | Handles distributed transactions without 2PC
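
The Event Sourcing row can be sketched in a few lines: state is never stored directly, only derived by replaying an append-only log. The Event type, the event kinds, and the stock example are assumptions chosen to match the inventory theme of this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # "stock_added" or "stock_reserved" (hypothetical event kinds)
    sku: str
    qty: int


def apply(state, event):
    """Return the new state after one event; the input state is not mutated."""
    current = state.get(event.sku, 0)
    if event.kind == "stock_added":
        current += event.qty
    elif event.kind == "stock_reserved":
        current -= event.qty
    else:
        raise ValueError(f"unknown event kind: {event.kind}")
    return {**state, event.sku: current}


def replay(events):
    """Rebuild current state from the full event history."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


# The append-only log is the source of truth.
log = [
    Event("stock_added", "widget", 10),
    Event("stock_reserved", "widget", 3),
    Event("stock_added", "gadget", 5),
]

current_stock = replay(log)
stock_after_first_event = replay(log[:1])  # state at any earlier point in time
print(current_stock)
print(stock_after_first_event)
```

The dictionary returned by replay is effectively a read-optimized view built from events, which is the same idea CQRS uses for its read model.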

Saga Pattern

A transaction that spans several microservices cannot rely on a single ACID database transaction, because each service owns its own database. The Saga pattern breaks such a transaction into a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.

Two Saga Implementations

  • Choreography: Each service publishes events that trigger the next step. No central coordinator. Simple but hard to track for complex workflows.
  • Orchestration: A central orchestrator service tells each participant what to do. Easier to understand and debug, but the orchestrator can become a bottleneck.
saga_orchestrator.py
# Saga Orchestrator for Order Processing
from enum import Enum
from dataclasses import dataclass
from typing import List, Callable

class SagaStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"

@dataclass
class SagaStep:
    name: str
    action: Callable       # The forward action
    compensate: Callable   # The rollback action

class SagaOrchestrator:
    def __init__(self, steps: List[SagaStep]):
        self.steps = steps
        self.completed_steps = []
        self.status = SagaStatus.PENDING

    def execute(self, context: dict) -> dict:
        """Execute all saga steps in order."""
        for step in self.steps:
            try:
                print(f"Executing: {step.name}")
                result = step.action(context)
                context.update(result or {})
                self.completed_steps.append(step)
            except Exception as e:
                print(f"Failed at: {step.name} - {e}")
                self._compensate(context)
                return {"status": "failed", "failed_at": step.name}

        self.status = SagaStatus.COMPLETED
        return {"status": "completed", "context": context}

    def _compensate(self, context: dict):
        """Roll back completed steps in reverse order."""
        self.status = SagaStatus.COMPENSATING
        for step in reversed(self.completed_steps):
            try:
                print(f"Compensating: {step.name}")
                step.compensate(context)
            except Exception as e:
                print(f"Compensation failed: {step.name} - {e}")
        self.status = SagaStatus.FAILED

# Example usage: Order processing saga
def reserve_inventory(ctx):
    print(f"  Reserving items: {ctx['items']}")
    return {"reservation_id": "RSV-001"}

def release_inventory(ctx):
    print(f"  Releasing reservation: {ctx['reservation_id']}")

def process_payment(ctx):
    print(f"  Charging ${ctx['total']}")
    return {"payment_id": "PAY-001"}

def refund_payment(ctx):
    print(f"  Refunding payment: {ctx['payment_id']}")

def confirm_order(ctx):
    print("  Order confirmed!")
    return {"order_id": "ORD-001"}

def cancel_order(ctx):
    print(f"  Cancelling order: {ctx.get('order_id')}")

saga = SagaOrchestrator([
    SagaStep("Reserve Inventory", reserve_inventory, release_inventory),
    SagaStep("Process Payment", process_payment, refund_payment),
    SagaStep("Confirm Order", confirm_order, cancel_order),
])

result = saga.execute({"items": ["widget"], "total": 29.99})
print(result)
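
For contrast with the orchestrator, a choreographed saga has no coordinator: each service reacts to events and publishes its own. The sketch below uses a synchronous in-memory EventBus as a stand-in for a broker; the event names, handlers, and the card_limit field are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """In-memory publish/subscribe bus (stand-in for Kafka or RabbitMQ)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.log = []  # event types in the order they were published

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
reserved = set()  # order IDs with stock currently reserved


def inventory_on_order_created(order):
    reserved.add(order["order_id"])
    bus.publish("inventory_reserved", order)


def payment_on_inventory_reserved(order):
    if order["total"] > order["card_limit"]:
        bus.publish("payment_failed", order)
    else:
        bus.publish("payment_completed", order)


def inventory_on_payment_failed(order):
    # Compensating action: undo the earlier reservation.
    reserved.discard(order["order_id"])
    bus.publish("inventory_released", order)


bus.subscribe("order_created", inventory_on_order_created)
bus.subscribe("inventory_reserved", payment_on_inventory_reserved)
bus.subscribe("payment_failed", inventory_on_payment_failed)

# A payment that exceeds the card limit triggers the compensation path.
bus.publish("order_created", {"order_id": "ORD-1", "total": 50, "card_limit": 20})
print(bus.log)
print(reserved)
```

No single place in this code describes the whole workflow, which is the trade-off named above: the services stay decoupled, but the overall flow has to be reconstructed from the event log.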

Docker Compose for Local Development

docker-compose.yml
# Run the full microservices stack locally
version: "3.8"

services:
  api-gateway:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - order-service
      - inventory-service
      - payment-service

  order-service:
    build: ./services/orders
    environment:
      - INVENTORY_URL=http://inventory-service:5001
      - PAYMENT_URL=http://payment-service:5002
      - DATABASE_URL=postgres://orders-db:5432/orders
    depends_on:
      - orders-db
      - rabbitmq

  inventory-service:
    build: ./services/inventory
    environment:
      - DATABASE_URL=postgres://inventory-db:5432/inventory
    depends_on:
      - inventory-db

  payment-service:
    build: ./services/payments
    environment:
      - DATABASE_URL=postgres://payments-db:5432/payments
      - STRIPE_KEY=${STRIPE_KEY}
    depends_on:
      - payments-db

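  # Each service gets its own database.
  # The official postgres image will not start without a superuser password.
  # POSTGRES_PASSWORD below is read from your shell or .env file (placeholder setup).
  orders-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: orders
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - orders_data:/var/lib/postgresql/data

  inventory-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: inventory
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - inventory_data:/var/lib/postgresql/data

  payments-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: payments
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - payments_data:/var/lib/postgresql/data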

  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "15672:15672"

volumes:
  orders_data:
  inventory_data:
  payments_data:

Practice Problems

Medium Design a Service Boundary

You are building an e-commerce platform. Identify the microservice boundaries:

  1. List at least 5 services you would create
  2. Define which data each service owns
  3. Identify the synchronous vs asynchronous communication patterns between them
  4. Explain how a user placing an order would flow through the services

Think about bounded contexts from Domain-Driven Design. Each service should own a single business capability. Consider: User, Catalog, Cart, Order, Payment, Inventory, Notification, Shipping.

# Microservice Boundaries for E-Commerce

# 1. User Service - owns user profiles, auth
#    DB: users, addresses, preferences

# 2. Catalog Service - owns product info
#    DB: products, categories, reviews

# 3. Cart Service - owns shopping carts
#    DB: carts (Redis for speed)

# 4. Order Service - owns order lifecycle
#    DB: orders, order_items, order_status

# 5. Payment Service - owns transactions
#    DB: payments, refunds

# 6. Inventory Service - owns stock levels
#    DB: stock, warehouses, reservations

# 7. Notification Service - sends emails/SMS
#    DB: templates, notification_log

# 8. Shipping Service - owns shipments
#    DB: shipments

# Order Flow:
# User clicks "Buy" -> Cart Service (sync)
#   -> Order Service creates order (sync)
#   -> Inventory Service reserves stock (sync)
#   -> Payment Service charges card (sync)
#   -> Order confirmed (event published)
#   -> Notification Service sends email (async)
#   -> Shipping Service creates shipment (async)

Medium Circuit Breaker Implementation

Implement a circuit breaker pattern that:

  1. Tracks failure counts for an external service call
  2. Opens the circuit after 5 consecutive failures
  3. Returns a fallback response while the circuit is open
  4. Attempts to close the circuit after a timeout period

Use three states: CLOSED (normal), OPEN (failing fast), HALF_OPEN (testing). Track failure count and last failure time. In OPEN state, check if enough time has passed before allowing a test request.

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5,
                 timeout=30):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure = None
        self.state = CircuitState.CLOSED

    def call(self, func, fallback, *args):
        if self.state == CircuitState.OPEN:
            if self._timeout_expired():
                self.state = CircuitState.HALF_OPEN
            else:
                return fallback()

        try:
            result = func(*args)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            return fallback()

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        # Monotonic clock: unaffected by system clock adjustments
        self.last_failure = time.monotonic()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def _timeout_expired(self):
        return (time.monotonic() - self.last_failure
                > self.timeout)

Hard Saga with Compensation

Design a saga for a travel booking system that books a flight, hotel, and car rental. If any step fails, all previous steps must be compensated (cancelled).

  1. Define the saga steps and their compensating actions
  2. Handle the case where the hotel booking fails after the flight is booked
  3. Handle the case where a compensation itself fails (idempotent retries)

Each step must be idempotent. Store saga state in a database so it can be resumed after crashes. Use unique transaction IDs to ensure compensations can be retried safely.

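# Travel Booking Saga with idempotent compensation
import uuid

class TravelBookingSaga:
    def __init__(self):
        self.saga_id = str(uuid.uuid4())
        self.state = {}  # booking references; persist to a DB in production

    def execute(self, trip):
        steps = [
            ("flight", self._book_flight, self._cancel_flight),
            ("hotel", self._book_hotel, self._cancel_hotel),
            ("car", self._book_car, self._cancel_car),
        ]
        completed = []
        for name, action, compensate in steps:
            try:
                ref = action(trip)
                self.state[name] = ref
                completed.append((name, compensate))
            except Exception as e:
                print(f"Failed: {name} - {e}")
                # Undo completed steps in reverse order
                for cname, cfn in reversed(completed):
                    self._safe_compensate(cname, cfn, trip)
                return False
        return True

    def _safe_compensate(self, name, fn, trip,
                         max_retries=3):
        for attempt in range(max_retries):
            try:
                fn(trip, self.state[name])
                return
            except Exception:
                if attempt == max_retries - 1:
                    # Log for manual resolution
                    print(f"ALERT: {name} needs manual fix")

    # Placeholder integrations (assumptions for illustration, not a real API).
    # Real versions would call the booking services and send
    # f"{self.saga_id}:{step}" as an idempotency key so retries are safe.
    # `trip` is assumed to be a dict.
    def _book_flight(self, trip):
        return f"FL-{self.saga_id[:8]}"

    def _book_hotel(self, trip):
        if not trip.get("hotel_available", True):
            raise RuntimeError("no rooms available")
        return f"HT-{self.saga_id[:8]}"

    def _book_car(self, trip):
        return f"CAR-{self.saga_id[:8]}"

    def _cancel_flight(self, trip, ref):
        print(f"Cancelled flight {ref}")

    def _cancel_hotel(self, trip, ref):
        print(f"Cancelled hotel {ref}")

    def _cancel_car(self, trip, ref):
        print(f"Cancelled car {ref}")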
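
The hint about unique transaction IDs can be sketched from the participant's side. In this hypothetical FlightService, each cancellation carries an idempotency key derived from the saga ID, and the service remembers which keys it has already processed. The class, its fields, and the key format are assumptions for illustration.

```python
import uuid


class FlightService:
    """Hypothetical participant that deduplicates cancellations by key."""

    def __init__(self):
        self.bookings = {"FL-123": "booked"}
        self.refunds_issued = 0
        self._processed = {}  # idempotency key -> stored result

    def cancel(self, booking_ref, idempotency_key):
        if idempotency_key in self._processed:
            # Retry of a request we already handled: return the stored
            # result without repeating any side effects.
            return self._processed[idempotency_key]
        self.bookings[booking_ref] = "cancelled"
        self.refunds_issued += 1
        result = {"booking_ref": booking_ref, "status": "cancelled"}
        self._processed[idempotency_key] = result
        return result


service = FlightService()
saga_id = str(uuid.uuid4())
key = f"{saga_id}:cancel-flight"

first = service.cancel("FL-123", key)
retry = service.cancel("FL-123", key)  # e.g. the first response was lost
print(first == retry, service.refunds_issued)
```

Because the second call is recognized as a replay, the saga can retry a compensation as often as it needs without issuing a second refund. In production the processed-key table would live in the service's database, not in memory.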

Quick Reference

Microservices Decision Framework

Question | Monolith | Microservices
Team size? | < 10 developers | > 10, multiple teams
Domain well understood? | Still exploring | Clear bounded contexts
Deployment frequency? | Weekly/monthly | Multiple times per day
Scale requirements? | Uniform scaling OK | Services scale independently
Operational maturity? | Limited DevOps | Strong CI/CD, monitoring

Key Patterns Summary

Essential Microservices Patterns

  • API Gateway: Single entry point that routes requests to services, handles auth, rate limiting
  • Circuit Breaker: Prevent cascading failures by failing fast when a downstream service is unhealthy
  • Service Mesh: Infrastructure layer (Istio, Linkerd) for service-to-service communication
  • Database per Service: Each service owns its data; no shared databases
  • Saga Pattern: Manage distributed transactions via compensating actions
  • CQRS: Separate read and write models for complex domains
  • Event Sourcing: Store state changes as immutable events
  • Strangler Fig: Gradually migrate from monolith by routing requests to new services
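
The Strangler Fig pattern is mostly a routing decision, which the following sketch tries to capture. StranglerRouter and the backend URLs are hypothetical; in practice this logic usually lives in the API gateway or reverse proxy configuration.

```python
class StranglerRouter:
    """Routes migrated path prefixes to new services, everything else to the monolith."""

    def __init__(self, legacy_backend):
        self._legacy = legacy_backend
        self._migrated = {}  # path prefix -> new service backend

    def migrate(self, prefix, backend):
        """Mark a path prefix as served by a new service from now on."""
        self._migrated[prefix] = backend

    def route(self, path):
        matches = [p for p in self._migrated if path.startswith(p)]
        if matches:
            # The longest matching prefix wins.
            return self._migrated[max(matches, key=len)]
        return self._legacy


router = StranglerRouter(legacy_backend="http://monolith:8080")

# Before any migration, every request goes to the monolith.
before = router.route("/orders/42")

# Extract the order functionality first; the rest stays where it is.
router.migrate("/orders", "http://order-service:5000")
after = router.route("/orders/42")
untouched = router.route("/catalog/7")

print(before, after, untouched)
```

Each call to migrate moves one slice of traffic, so the monolith shrinks gradually and any slice can be routed back if the new service misbehaves.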