Microservices Architecture | LIZIU System Design

Why Microservices Matter

The Problem: As applications grow, monolithic codebases become unwieldy. A single change can require redeploying the entire application, and scaling means duplicating everything.

The Solution: Microservices decompose an application into small, independently deployable services that each own their data and logic.

Real Impact: Netflix reportedly runs more than 1,000 microservices, which lets its teams deploy hundreds of times per day while serving more than 230 million subscribers worldwide.

Real-World Analogy

Think of microservices like a food court versus a single restaurant:

Monolith = One restaurant that serves everything: pizza, sushi, burgers, desserts. If the pizza oven breaks, the whole restaurant might shut down.
Microservices = A food court with specialized stalls. Each stall operates independently, has its own kitchen, and can scale on its own. If the pizza stall is overwhelmed, it can add more workers without affecting the sushi stall.
API Gateway = The food court directory that helps customers find the right stall.
Message Queue = The order ticket system connecting front counters to kitchens.

Core Benefits

Independent Deployment

Each service can be deployed, updated, and scaled independently. No need to redeploy the entire application for a single change.

Technology Diversity

Each team can choose the language, framework, and database that best fit its service: for example, Python for ML, Go for networking, and Node.js for real-time features.

Fault Isolation

A failure in one service need not bring down the entire system. Isolation is not automatic, though: callers must use timeouts and circuit breakers so that a failing dependency does not cause cascading failures.

Team Autonomy

Small teams own their services end-to-end. They can move fast without coordinating deployments across the entire organization.

Monolith vs Microservices

Aspect	Monolith	Microservices
Deployment	All-or-nothing deployment	Independent per service
Scaling	Scale entire application	Scale individual services
Tech Stack	Single language/framework	Polyglot (different per service)
Failure Impact	One bug can crash everything	Failures are isolated
Data	Shared database	Database per service
Complexity	Simple at first, hard to maintain	Complex upfront, easier long-term
Team Size	Works for small teams (<10)	Best for larger organizations

Common Pitfall: Premature Microservices

Problem: Many teams adopt microservices too early, before they understand their domain boundaries.

Solution: Start with a well-structured monolith. Extract services only when you have clear bounded contexts and the team is large enough to justify the operational overhead. Martin Fowler calls this the "Monolith First" approach.

Service Communication

Microservices need to talk to each other. There are two fundamental patterns: synchronous (request/response) and asynchronous (event-driven).
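To make the two patterns concrete, here is a minimal in-process sketch. The function names, the event name `order.confirmed`, and the use of `queue.Queue` as a stand-in for a message broker are all assumptions for illustration; a real system would use HTTP or gRPC for the synchronous call and a broker such as RabbitMQ for the asynchronous one.

```python
import queue

# Hypothetical in-process stand-ins for services; not a real network call.
def check_inventory(items):
    """Synchronous request/response: the caller blocks until it has an answer."""
    return {"available": all(item["qty"] <= 10 for item in items)}

events = queue.Queue()  # stand-in for a message broker

def publish(event_type, payload):
    """Asynchronous: the caller enqueues an event and returns immediately."""
    events.put({"type": event_type, "payload": payload})

def drain(handlers):
    """Consumer side: deliver each queued event to its registered handler."""
    handled = []
    while not events.empty():
        event = events.get()
        handler = handlers.get(event["type"])
        if handler:
            handled.append(handler(event["payload"]))
    return handled

def send_confirmation_email(payload):
    return f"email sent for order {payload['order_id']}"

# Synchronous: the order cannot proceed without this answer.
reply = check_inventory([{"sku": "widget", "qty": 2}])

# Asynchronous: placing the order does not wait for the email to be sent.
publish("order.confirmed", {"order_id": "ORD-001"})
delivered = drain({"order.confirmed": send_confirmation_email})
```

The synchronous call couples the caller to the callee's availability and latency. The asynchronous publish does not, at the cost of the caller not knowing when (or whether) the event was handled.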

Synchronous vs Asynchronous Communication

gRPC vs REST

Feature	REST	gRPC
Protocol	Typically HTTP/1.1 with JSON	HTTP/2 with Protocol Buffers
Performance	Slower (text-based)	Faster (binary, streaming)
Contract	OpenAPI/Swagger (optional)	Strict .proto files (required)
Best For	Public APIs, web clients	Internal service-to-service
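The size difference between a text and a binary encoding can be illustrated with the standard library alone. This sketch uses `struct` as a rough stand-in for a binary wire format; it is not Protocol Buffers (which uses field tags and variable-length integers), and the message fields are hypothetical.

```python
import json
import struct

# Hypothetical order message with three integer fields.
order = {"order_id": 1001, "user_id": 42, "total_cents": 2999}

# Text encoding: field names and digits travel as characters.
json_bytes = json.dumps(order).encode("utf-8")

# Fixed binary layout: three unsigned 32-bit big-endian integers.
# The agreed field order is the contract both sides share, which is
# the role a .proto file plays for gRPC.
binary_bytes = struct.pack(
    ">III", order["order_id"], order["user_id"], order["total_cents"]
)

decoded = struct.unpack(">III", binary_bytes)
print(len(json_bytes), len(binary_bytes))
```

The binary form is smaller and cheaper to parse, but unreadable without the contract, which is one reason REST with JSON remains common for public APIs.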

order_service.py

# A simple microservice using Flask
from flask import Flask, jsonify, request
import requests
import os

app = Flask(__name__)

# Each service has its own configuration
INVENTORY_SERVICE_URL = os.getenv("INVENTORY_URL", "http://inventory-service:5001")
PAYMENT_SERVICE_URL = os.getenv("PAYMENT_URL", "http://payment-service:5002")

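# Timeout (seconds) for downstream calls, so a slow dependency cannot hang this service
REQUEST_TIMEOUT = 3

class OrderService:
    def create_order(self, user_id, items):
        # Step 1: Check inventory (synchronous call)
        try:
            inventory_resp = requests.post(
                f"{INVENTORY_SERVICE_URL}/check",
                json={"items": items},
                timeout=REQUEST_TIMEOUT,
            )
            inventory_resp.raise_for_status()
        except requests.RequestException:
            return {"error": "Inventory service unavailable"}, 503
        if not inventory_resp.json().get("available"):
            return {"error": "Items not available"}, 400

        # Step 2: Process payment (synchronous call)
        total = sum(item["price"] * item["qty"] for item in items)
        try:
            payment_resp = requests.post(
                f"{PAYMENT_SERVICE_URL}/charge",
                json={"user_id": user_id, "amount": total},
                timeout=REQUEST_TIMEOUT,
            )
        except requests.RequestException:
            return {"error": "Payment service unavailable"}, 503
        if payment_resp.status_code != 200:
            return {"error": "Payment failed"}, 402

        # Step 3: Create order record (a real service would persist this to its own database)
        order = {
            "user_id": user_id,
            "items": items,
            "total": total,
            "status": "confirmed"
        }
        return order, 201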

order_svc = OrderService()

@app.route("/orders", methods=["POST"])
def create_order():
    data = request.get_json()
    result, status = order_svc.create_order(data["user_id"], data["items"])
    return jsonify(result), status

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Service Discovery

In a microservices architecture, services need to find each other. Unlike monoliths where everything shares a process, microservices run on different hosts and ports that can change dynamically.

Service Discovery Patterns

Client-Side Discovery: The client queries a service registry (e.g., Netflix Eureka) and picks an instance. The client handles load balancing.
Server-Side Discovery: The client sends a request to a load balancer (e.g., AWS ALB, Kubernetes Services), which queries the registry and routes the request.
DNS-Based: Services register DNS records. Simple, but clients may cache a record until its TTL expires and keep routing to instances that no longer exist.
Service Mesh: A sidecar proxy (e.g., Envoy in Istio) handles discovery transparently. The application code does not need to know about service discovery at all.
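Of the patterns above, client-side discovery is the easiest to sketch in a few lines. The `ServiceRegistry` and `DiscoveringClient` classes below are hypothetical in-memory stand-ins, not the API of Eureka or any other real registry, and the addresses are made up.

```python
import random

class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> set of instance addresses."""

    def __init__(self):
        self._instances = {}

    def register(self, name, address):
        self._instances.setdefault(name, set()).add(address)

    def deregister(self, name, address):
        self._instances.get(name, set()).discard(address)

    def lookup(self, name):
        return sorted(self._instances.get(name, set()))

class DiscoveringClient:
    """Client-side discovery: the client queries the registry and picks an instance itself."""

    def __init__(self, registry, rng=None):
        self.registry = registry
        self.rng = rng or random.Random()

    def resolve(self, name):
        instances = self.registry.lookup(name)
        if not instances:
            raise LookupError(f"no instances registered for {name}")
        # Random choice is the simplest client-side load-balancing policy.
        return self.rng.choice(instances)

registry = ServiceRegistry()
registry.register("inventory-service", "10.0.0.5:5001")
registry.register("inventory-service", "10.0.0.6:5001")

client = DiscoveringClient(registry)
address = client.resolve("inventory-service")
```

With server-side discovery or a service mesh, the `resolve` step moves out of application code into a load balancer or sidecar proxy.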

Service Registry

A database of available service instances. Services register on startup and deregister on shutdown. Examples: Consul, etcd, ZooKeeper, Kubernetes DNS.

Health Checks

The registry periodically pings services to verify they are healthy. Unhealthy instances are removed from the pool automatically.
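A minimal sketch of that eviction loop, assuming a hypothetical `probe` callable that returns `True` for a healthy instance. In practice the probe would typically be an HTTP request to a health endpoint, run on a timer; here it is a lambda over a made-up set of failed addresses.

```python
class HealthCheckedRegistry:
    """Hypothetical in-memory registry that evicts instances failing a probe."""

    def __init__(self, probe):
        self._probe = probe      # callable(address) -> bool
        self._instances = {}     # service name -> set of addresses

    def register(self, name, address):
        self._instances.setdefault(name, set()).add(address)

    def lookup(self, name):
        return sorted(self._instances.get(name, set()))

    def run_health_checks(self):
        """Probe every instance once and remove the ones that fail."""
        removed = []
        for name, addresses in self._instances.items():
            # sorted() copies the set, so discarding while looping is safe.
            for address in sorted(addresses):
                if not self._probe(address):
                    addresses.discard(address)
                    removed.append((name, address))
        return removed

# Simulate one instance being down.
down = {"10.0.0.6:5002"}
registry = HealthCheckedRegistry(probe=lambda address: address not in down)
registry.register("payment-service", "10.0.0.5:5002")
registry.register("payment-service", "10.0.0.6:5002")

removed = registry.run_health_checks()
healthy = registry.lookup("payment-service")
```

Real registries usually require several consecutive failed checks before evicting an instance, to avoid removing one over a single dropped packet.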

Load Balancing

Once discovered, requests must be distributed across instances. Common strategies: round-robin, least connections, consistent hashing.
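The three strategies can be sketched as small classes. The class names, the `replicas` parameter, and the choice of SHA-256 for the hash ring are assumptions for illustration, not taken from any particular load balancer.

```python
import bisect
import hashlib
import itertools

class RoundRobin:
    """Hand out instances in a fixed rotation."""
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Pick the instance with the fewest in-flight requests."""
    def __init__(self, instances):
        self.active = {instance: 0 for instance in instances}

    def acquire(self):
        instance = min(self.active, key=self.active.get)
        self.active[instance] += 1
        return instance

    def release(self, instance):
        self.active[instance] -= 1

class ConsistentHash:
    """Map keys onto a hash ring so the same key keeps hitting the same instance."""
    def __init__(self, instances, replicas=100):
        # Each instance gets several virtual points to even out the ring.
        self._ring = sorted(
            (self._hash(f"{instance}#{r}"), instance)
            for instance in instances
            for r in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def pick(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

instances = ["a:5001", "b:5001", "c:5001"]
rr = RoundRobin(instances)
rotation = [rr.pick() for _ in range(4)]
```

Consistent hashing matters when instances hold per-key state such as caches: removing one instance only remaps the keys that were assigned to it.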

Data Management in Microservices

The Database-per-Service Pattern

Each microservice owns its data and exposes it only through its API. No direct database access from other services. This ensures loose coupling but introduces challenges around data consistency and joins across services.

Strategies for Cross-Service Queries

Pattern	How It Works	Trade-offs
API Composition	A composer service calls multiple services and joins results in memory	Simple but increases latency; no transactional guarantees
CQRS	Separate read and write models. Write to service DB, publish events, build read-optimized views	Great for read-heavy workloads; eventual consistency
Event Sourcing	Store events instead of current state. Rebuild state by replaying events	Full audit trail; complex to implement
Saga Pattern	Coordinate multi-service transactions through a sequence of local transactions and compensating actions	Avoids 2PC; only eventual consistency, and every step needs a compensating action
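As an illustration of the first row, API Composition, the sketch below joins an order with product names in memory. The two "services" are hypothetical functions over private dictionaries; in a real system each call would be a network request to a separate service.

```python
# Hypothetical stand-ins for two services, each with its own private data store.
_ORDER_DB = {
    "ORD-001": {
        "user_id": "u1",
        "lines": [
            {"product_id": "p1", "qty": 2},
            {"product_id": "p2", "qty": 1},
        ],
    },
}
_CATALOG_DB = {
    "p1": {"name": "Widget"},
    "p2": {"name": "Gadget"},
}

def order_api_get_order(order_id):
    """Order Service API: the only way to read order data."""
    return _ORDER_DB[order_id]

def catalog_api_get_products(product_ids):
    """Catalog Service API: batch lookup to avoid one call per line."""
    return {pid: _CATALOG_DB[pid] for pid in product_ids}

def get_order_details(order_id):
    """Composer: call both services, then join the results in memory."""
    order = order_api_get_order(order_id)
    product_ids = [line["product_id"] for line in order["lines"]]
    products = catalog_api_get_products(product_ids)
    return {
        "order_id": order_id,
        "lines": [
            {"name": products[line["product_id"]]["name"], "qty": line["qty"]}
            for line in order["lines"]
        ],
    }

details = get_order_details("ORD-001")
```

The join that a shared database would do in one SQL query becomes two calls plus application code, which is where the extra latency in the table comes from.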

Saga Pattern

A transaction that spans several microservices cannot rely on a single ACID database transaction, because each service owns its own database. The Saga pattern breaks the transaction into a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.

Two Saga Implementations

Choreography: Each service publishes events that trigger the next step. No central coordinator. Simple but hard to track for complex workflows.
Orchestration: A central orchestrator service tells each participant what to do. Easier to understand and debug, but the orchestrator can become a bottleneck.
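Choreography can be sketched with a tiny synchronous event bus. The `EventBus` class, the event names, and the stock and payment limits are all hypothetical; a real deployment would publish these events through a message broker and handle them in separate processes.

```python
from collections import defaultdict

class EventBus:
    """Hypothetical in-process bus; handlers run synchronously on publish."""
    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # event types in publish order, useful for tracing

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()

# Inventory service: reacts to new orders.
def inventory_on_order_created(order):
    ok = order["qty"] <= 5  # made-up stock limit
    bus.publish("inventory.reserved" if ok else "inventory.rejected", order)

# Payment service: reacts to successful reservations.
def payment_on_inventory_reserved(order):
    ok = order["total"] <= 100  # made-up card limit
    bus.publish("payment.charged" if ok else "payment.failed", order)

# Inventory service again: compensating action when payment fails.
def inventory_on_payment_failed(order):
    bus.publish("inventory.released", order)

# Order service: records the final outcome.
def order_confirm(order):
    order["status"] = "confirmed"

def order_cancel(order):
    order["status"] = "cancelled"

bus.subscribe("order.created", inventory_on_order_created)
bus.subscribe("inventory.reserved", payment_on_inventory_reserved)
bus.subscribe("inventory.rejected", order_cancel)
bus.subscribe("payment.charged", order_confirm)
bus.subscribe("payment.failed", inventory_on_payment_failed)
bus.subscribe("payment.failed", order_cancel)

good_order = {"qty": 1, "total": 29.99}
bus.publish("order.created", good_order)
```

No component here knows the whole workflow. It only emerges from the subscriptions, which is why choreographed sagas become hard to follow as the number of steps grows.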

saga_orchestrator.py

# Saga Orchestrator for Order Processing
from enum import Enum
from dataclasses import dataclass
from typing import List, Callable

class SagaStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"

@dataclass
class SagaStep:
    name: str
    action: Callable       # The forward action
    compensate: Callable   # The rollback action

class SagaOrchestrator:
    def __init__(self, steps: List[SagaStep]):
        self.steps = steps
        self.completed_steps = []
        self.status = SagaStatus.PENDING

    def execute(self, context: dict) -> dict:
        """Execute all saga steps in order."""
        for step in self.steps:
            try:
                print(f"Executing: {step.name}")
                result = step.action(context)
                context.update(result or {})
                self.completed_steps.append(step)
            except Exception as e:
                print(f"Failed at: {step.name} - {e}")
                self._compensate(context)
                return {"status": "failed", "failed_at": step.name}

        self.status = SagaStatus.COMPLETED
        return {"status": "completed", "context": context}

    def _compensate(self, context: dict):
        """Roll back completed steps in reverse order."""
        self.status = SagaStatus.COMPENSATING
        for step in reversed(self.completed_steps):
            try:
                print(f"Compensating: {step.name}")
                step.compensate(context)
            except Exception as e:
                print(f"Compensation failed: {step.name} - {e}")
        self.status = SagaStatus.FAILED

# Example usage: Order processing saga
def reserve_inventory(ctx):
    print(f"  Reserving items: {ctx['items']}")
    return {"reservation_id": "RSV-001"}

def release_inventory(ctx):
    print(f"  Releasing reservation: {ctx['reservation_id']}")

def process_payment(ctx):
    print(f"  Charging ${ctx['total']}")
    return {"payment_id": "PAY-001"}

def refund_payment(ctx):
    print(f"  Refunding payment: {ctx['payment_id']}")

def confirm_order(ctx):
    print("  Order confirmed!")
    return {"order_id": "ORD-001"}

def cancel_order(ctx):
    print(f"  Cancelling order: {ctx.get('order_id')}")

saga = SagaOrchestrator([
    SagaStep("Reserve Inventory", reserve_inventory, release_inventory),
    SagaStep("Process Payment", process_payment, refund_payment),
    SagaStep("Confirm Order", confirm_order, cancel_order),
])

result = saga.execute({"items": ["widget"], "total": 29.99})
print(result)

Docker Compose for Local Development

docker-compose.yml

# Run the full microservices stack locally
version: "3.8"

services:
  api-gateway:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - order-service
      - inventory-service
      - payment-service

  order-service:
    build: ./services/orders
    environment:
      - INVENTORY_URL=http://inventory-service:5001
      - PAYMENT_URL=http://payment-service:5002
      - DATABASE_URL=postgres://orders-db:5432/orders
    depends_on:
      - orders-db
      - rabbitmq

  inventory-service:
    build: ./services/inventory
    environment:
      - DATABASE_URL=postgres://inventory-db:5432/inventory
    depends_on:
      - inventory-db

  payment-service:
    build: ./services/payments
    environment:
      - DATABASE_URL=postgres://payments-db:5432/payments
      - STRIPE_KEY=${STRIPE_KEY}
    depends_on:
      - payments-db

  # Each service gets its own database
  orders-db:
    image: postgres:15-alpine
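    environment:
      POSTGRES_DB: orders
      # The postgres image will not start without a superuser password.
      # The variable name is an assumption; set it in your shell or .env file.
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}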
    volumes:
      - orders_data:/var/lib/postgresql/data

  inventory-db:
    image: postgres:15-alpine
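    environment:
      POSTGRES_DB: inventory
      # The postgres image will not start without a superuser password.
      # The variable name is an assumption; set it in your shell or .env file.
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}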
    volumes:
      - inventory_data:/var/lib/postgresql/data

  payments-db:
    image: postgres:15-alpine
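    environment:
      POSTGRES_DB: payments
      # The postgres image will not start without a superuser password.
      # The variable name is an assumption; set it in your shell or .env file.
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}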
    volumes:
      - payments_data:/var/lib/postgresql/data

  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "15672:15672"

volumes:
  orders_data:
  inventory_data:
  payments_data:

Practice Problems

Design a Service Boundary (Medium)

You are building an e-commerce platform. Identify the microservice boundaries:

List at least 5 services you would create
Define which data each service owns
Identify the synchronous vs asynchronous communication patterns between them
Explain how a user placing an order would flow through the services

Think about bounded contexts from Domain-Driven Design. Each service should own a single business capability. Consider: User, Catalog, Cart, Order, Payment, Inventory, Notification, Shipping.

# Microservice Boundaries for E-Commerce

# 1. User Service - owns user profiles, auth
#    DB: users, addresses, preferences

# 2. Catalog Service - owns product info
#    DB: products, categories, reviews

# 3. Cart Service - owns shopping carts
#    DB: carts (Redis for speed)

# 4. Order Service - owns order lifecycle
#    DB: orders, order_items, order_status

# 5. Payment Service - owns transactions
#    DB: payments, refunds

# 6. Inventory Service - owns stock levels
#    DB: stock, warehouses, reservations

# 7. Notification Service - sends emails/SMS
#    DB: templates, notification_log

# 8. Shipping Service - owns shipments (used in the order flow below)
#    DB: shipments, tracking

# Order Flow:
# User clicks "Buy" -> Cart Service (sync)
#   -> Order Service creates order (sync)
#   -> Inventory Service reserves stock (sync)
#   -> Payment Service charges card (sync)
#   -> Order confirmed (event published)
#   -> Notification Service sends email (async)
#   -> Shipping Service creates shipment (async)

Circuit Breaker Implementation (Medium)

Implement a circuit breaker pattern that:

Tracks failure counts for an external service call
Opens the circuit after 5 consecutive failures
Returns a fallback response while the circuit is open
Attempts to close the circuit after a timeout period

Use three states: CLOSED (normal), OPEN (failing fast), HALF_OPEN (testing). Track failure count and last failure time. In OPEN state, check if enough time has passed before allowing a test request.

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5,
                 timeout=30):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure = None
        self.state = CircuitState.CLOSED

    def call(self, func, fallback, *args):
        if self.state == CircuitState.OPEN:
            if self._timeout_expired():
                self.state = CircuitState.HALF_OPEN
            else:
                return fallback()

        try:
            result = func(*args)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            return fallback()

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        # Use a monotonic clock: wall-clock time can jump (NTP, DST)
        # and would distort the timeout calculation.
        self.last_failure = time.monotonic()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def _timeout_expired(self):
        return (time.monotonic() - self.last_failure
                > self.timeout)

Saga with Compensation (Hard)

Design a saga for a travel booking system that books a flight, hotel, and car rental. If any step fails, all previous steps must be compensated (cancelled).

Define the saga steps and their compensating actions
Handle the case where the hotel booking fails after the flight is booked
Handle the case where a compensation itself fails (idempotent retries)

Each step must be idempotent. Store saga state in a database so it can be resumed after crashes. Use unique transaction IDs to ensure compensations can be retried safely.

# Travel Booking Saga with idempotent compensation
import uuid

class TravelBookingSaga:
    def __init__(self):
        self.saga_id = str(uuid.uuid4())
        self.state = {}  # in-memory here; persist to a DB so the saga can resume after a crash

    def execute(self, trip):
        steps = [
            ("flight", self._book_flight, self._cancel_flight),
            ("hotel", self._book_hotel, self._cancel_hotel),
            ("car", self._book_car, self._cancel_car),
        ]
        completed = []
        for name, action, compensate in steps:
            try:
                ref = action(trip)
                self.state[name] = ref
                completed.append((name, compensate))
            except Exception as e:
                print(f"Failed: {name} - {e}")
                for cname, cfn in reversed(completed):
                    self._safe_compensate(cname, cfn, trip)
                return False
        return True

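    def _safe_compensate(self, name, fn, trip,
                         max_retries=3):
        # Retry the compensation; this is only safe if fn is idempotent.
        for attempt in range(max_retries):
            try:
                fn(trip, self.state[name])
                return
            except Exception:
                if attempt == max_retries - 1:
                    # Log for manual resolution
                    print(f"ALERT: {name} needs manual fix")

    # Placeholder provider calls (hypothetical) so the example runs.
    # Each booking returns a reference; each cancellation receives it back.
    def _book_flight(self, trip):
        return f"FL-{self.saga_id}"

    def _book_hotel(self, trip):
        return f"HT-{self.saga_id}"

    def _book_car(self, trip):
        return f"CAR-{self.saga_id}"

    def _cancel_flight(self, trip, ref):
        print(f"Cancelled flight {ref}")

    def _cancel_hotel(self, trip, ref):
        print(f"Cancelled hotel {ref}")

    def _cancel_car(self, trip, ref):
        print(f"Cancelled car {ref}")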
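Retrying a compensation is only safe if the provider treats a repeated request as a no-op. One common way to get that property is to deduplicate on a unique transaction ID, sketched below. The `BookingProvider` class and its methods are hypothetical, and the in-memory `processed` set stands in for a durable store.

```python
import uuid

class BookingProvider:
    """Hypothetical provider that deduplicates cancellations by transaction ID."""

    def __init__(self):
        self.bookings = {}       # booking reference -> status
        self.processed = set()   # transaction IDs already applied
        self.refunds_issued = 0

    def book(self, txn_id):
        ref = f"BK-{txn_id}"
        # setdefault makes a repeated book() with the same ID a no-op.
        self.bookings.setdefault(ref, "booked")
        return ref

    def cancel(self, txn_id, booking_ref):
        # A retry with an already-seen transaction ID must not refund twice.
        if txn_id in self.processed:
            return "already_cancelled"
        self.bookings[booking_ref] = "cancelled"
        self.refunds_issued += 1
        self.processed.add(txn_id)
        return "cancelled"

provider = BookingProvider()
book_txn = str(uuid.uuid4())
cancel_txn = str(uuid.uuid4())

ref = provider.book(book_txn)
first = provider.cancel(cancel_txn, ref)
retry = provider.cancel(cancel_txn, ref)  # e.g. the first response was lost
```

The caller generates the transaction ID once and reuses it on every retry, so the saga can resend a cancellation after a timeout or a crash without risking a double refund.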

Quick Reference

Microservices Decision Framework

Question	Monolith	Microservices
Team size?	< 10 developers	> 10, multiple teams
Domain well understood?	Still exploring	Clear bounded contexts
Deployment frequency?	Weekly/monthly	Multiple times per day
Scale requirements?	Uniform scaling OK	Services scale independently
Operational maturity?	Limited DevOps	Strong CI/CD, monitoring

Key Patterns Summary

Essential Microservices Patterns

API Gateway: Single entry point that routes requests to services, handles auth, rate limiting
Circuit Breaker: Prevent cascading failures by failing fast when a downstream service is unhealthy
Service Mesh: Infrastructure layer (Istio, Linkerd) for service-to-service communication
Database per Service: Each service owns its data; no shared databases
Saga Pattern: Manage distributed transactions via compensating actions
CQRS: Separate read and write models for complex domains
Event Sourcing: Store state changes as immutable events
Strangler Fig: Gradually migrate from monolith by routing requests to new services
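Event Sourcing and CQRS from the list above are often used together, and a small sketch shows why. The event types, the `Event` dataclass, and the in-memory list used as the event log are assumptions for illustration; a production system would use a durable, append-only store.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """An immutable fact about something that happened to an order."""
    type: str
    order_id: str
    data: dict

event_log = []  # append-only; events are never updated or deleted

def append(event):
    event_log.append(event)

def rebuild_order(order_id, events):
    """Event sourcing: derive current state by replaying the order's events."""
    state = None
    for event in events:
        if event.order_id != order_id:
            continue
        if event.type == "OrderCreated":
            state = {"order_id": order_id,
                     "items": list(event.data["items"]),
                     "status": "created"}
        elif event.type == "ItemAdded":
            state["items"].append(event.data["item"])
        elif event.type == "OrderShipped":
            state["status"] = "shipped"
    return state

def project_status_counts(events):
    """CQRS read model: a view optimized for one query, built from the same events."""
    latest = {}
    for event in events:
        if event.type == "OrderCreated":
            latest[event.order_id] = "created"
        elif event.type == "OrderShipped":
            latest[event.order_id] = "shipped"
    counts = {}
    for status in latest.values():
        counts[status] = counts.get(status, 0) + 1
    return counts

append(Event("OrderCreated", "ORD-1", {"items": ["widget"]}))
append(Event("ItemAdded", "ORD-1", {"item": "gadget"}))
append(Event("OrderCreated", "ORD-2", {"items": ["gizmo"]}))
append(Event("OrderShipped", "ORD-1", {}))

order = rebuild_order("ORD-1", event_log)
dashboard = project_status_counts(event_log)
```

Because the log is the source of truth, a new read model can be added later by replaying the existing events, without touching the write side.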