
Introduction to Microservices Patterns

Design patterns are proven solutions to common problems. Think of them as recipes that successful companies use to build reliable systems.

Saga Pattern - Managing Distributed Transactions

The Problem

In a monolith, you can wrap everything in a database transaction. In microservices, each service has its own database - how do you ensure consistency?

Solution: Break the transaction into a series of local transactions, each with a compensating action if something fails.

Python - Simple Saga Pattern
class OrderSaga:
    def create_order(self, order_data):
        # Step 1: Create order
        order = order_service.create(order_data)

        try:
            # Step 2: Reserve inventory
            inventory_service.reserve(order.items)
        except Exception:
            # Compensate: Cancel order
            order_service.cancel(order.id)
            raise

        try:
            # Step 3: Process payment
            payment_service.charge(order.total)
        except Exception:
            # Compensate: Release inventory and cancel order
            inventory_service.release(order.items)
            order_service.cancel(order.id)
            raise

        return order

CQRS - Command Query Responsibility Segregation

Separate read and write operations for better performance and scalability.

  • Commands: Create, Update, Delete (Write operations)
  • Queries: Read operations
  • Benefit: Optimize each side independently

Python - CQRS Example
# Write Model - Commands
class OrderCommandService:
    def create_order(self, order_data):
        order = Order(**order_data)
        db.session.add(order)
        db.session.commit()
        # Publish event
        event_bus.publish('OrderCreated', order)

# Read Model - Queries
class OrderQueryService:
    def get_order_summary(self, order_id):
        # Optimized read from denormalized view
        return read_db.query(OrderSummary).filter_by(id=order_id).first()

Core Architecture Patterns

Patterns for complex scenarios like event sourcing, strangler fig migration, and more.

Event Sourcing

Store all changes as a sequence of events instead of just the current state.

Traditional | Event Sourcing
Store current state only | Store all state changes
Lost history | Complete audit trail
Can't rebuild past state | Replay events to rebuild any state

Python - Event Sourcing
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    account_id: int
    amount: float

class AccountEventStore:
    def __init__(self):
        self.events = []

    def apply_event(self, event):
        self.events.append(event)

    def get_current_state(self, account_id):
        # Rebuild state from events
        balance = 0
        for event in self.events:
            if event.account_id == account_id:
                if event.type == 'DEPOSITED':
                    balance += event.amount
                elif event.type == 'WITHDRAWN':
                    balance -= event.amount
        return balance

# Usage
store = AccountEventStore()
store.apply_event(Event('DEPOSITED', account_id=123, amount=100))
store.apply_event(Event('WITHDRAWN', account_id=123, amount=30))
current_balance = store.get_current_state(123)  # Returns 70

Strangler Fig Pattern

Gradually migrate from monolith to microservices without a big-bang rewrite.

Amazon's Strangler Fig Migration

Amazon migrated their monolith over years:

  • Step 1: Identify bounded contexts
  • Step 2: Extract one service at a time
  • Step 3: Route new traffic to new service
  • Step 4: Migrate old traffic gradually
  • Step 5: Decommission monolith code
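
The routing decision in steps 3 and 4 can be sketched as a thin facade: request paths that have already been extracted go to a new service, while everything else still reaches the monolith. Service names and URLs below are hypothetical.

```python
# Hypothetical routing facade for a Strangler Fig migration: prefixes already
# extracted from the monolith map to new services; all other paths fall
# through to the legacy system.
MIGRATED_PREFIXES = {
    "/catalog": "http://catalog-service:8001",
    "/cart": "http://cart-service:8002",
}
MONOLITH_URL = "http://legacy-monolith:8000"

def resolve_backend(path):
    """Return the backend that should serve this request path."""
    for prefix, backend in MIGRATED_PREFIXES.items():
        if path.startswith(prefix):
            return backend
    return MONOLITH_URL

print(resolve_backend("/cart/items"))  # routed to the extracted cart service
print(resolve_backend("/checkout"))    # still served by the monolith
```

As each feature is migrated, its prefix moves into the routing table; when the table covers everything, the monolith can be decommissioned.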

Anti-Patterns to Avoid
  • Distributed Monolith: Services too tightly coupled
  • Shared Database: Services sharing same database
  • Chatty Services: Too many network calls

Resilience & Infrastructure Patterns

Patterns that make your microservices robust, fault-tolerant, and production-ready.

Circuit Breaker Pattern

Prevent cascading failures by failing fast when a service is unavailable.

Why It Matters

Without circuit breakers, one failing service can bring down your entire system through cascading failures and resource exhaustion.

States:

  • Closed: Normal operation, requests pass through
  • Open: Too many failures, all requests fail fast
  • Half-Open: Test if service recovered

Python - Circuit Breaker Implementation
from enum import Enum
from datetime import datetime, timedelta

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if self._should_attempt_reset():
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception as e:
            self._on_failure()
            raise e

    def _on_success(self):
        self.failures = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failures += 1
        self.last_failure_time = datetime.now()
        # A failed trial call in HALF_OPEN re-opens the circuit immediately
        if self.state == CircuitState.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def _should_attempt_reset(self):
        # total_seconds() handles gaps longer than a day; .seconds does not
        return (datetime.now() - self.last_failure_time).total_seconds() >= self.timeout

# Usage
breaker = CircuitBreaker(failure_threshold=3, timeout=30)
try:
    result = breaker.call(external_service.get_data)
except Exception:
    # Fall back to cached data or default response
    result = get_cached_data()

Bulkhead Pattern

Isolate resources to prevent one failing component from consuming all resources.

Python - Thread Pool Bulkhead
from concurrent.futures import ThreadPoolExecutor

class Bulkhead:
    def __init__(self, max_workers=10):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def execute(self, func, *args, **kwargs):
        future = self.executor.submit(func, *args, **kwargs)
        return future.result(timeout=5)  # 5 second timeout

# Separate bulkheads for different services
payment_bulkhead = Bulkhead(max_workers=5)
inventory_bulkhead = Bulkhead(max_workers=10)

# Payment service gets 5 threads max
payment_result = payment_bulkhead.execute(payment_service.charge, order)

# Inventory service gets 10 threads max
inventory_result = inventory_bulkhead.execute(inventory_service.reserve, items)

Retry Pattern with Exponential Backoff

Automatically retry failed requests with increasing delays.

Python - Retry with Backoff
import time
from functools import wraps

import requests

def retry_with_backoff(max_retries=3, base_delay=1, max_delay=60):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of attempts: propagate the last error
                    if attempt == max_retries:
                        raise
                    print(f"Retry {attempt}/{max_retries} after {delay}s")
                    # Exponential backoff: 1s, 2s, 4s, 8s...
                    time.sleep(min(delay, max_delay))
                    delay *= 2
        return wrapper
    return decorator

@retry_with_backoff(max_retries=3, base_delay=1)
def call_external_api():
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
    return response.json()

Rate Limiting Pattern

Control request rate to prevent overload and ensure fair usage.

Algorithm | How It Works | Best For
Token Bucket | Tokens refill at fixed rate, consume per request | Smooth traffic with bursts
Leaky Bucket | Process requests at constant rate | Strict rate control
Fixed Window | X requests per time window | Simple implementation
Sliding Window | Rolling time window | More accurate rate limiting
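
As a minimal sketch of the Token Bucket algorithm from the table above (illustrative only; a production limiter would also need thread safety and per-client buckets):

```python
import time

# Tokens refill at a fixed rate and each request consumes one; a full
# bucket allows a short burst, then traffic is throttled to the refill rate.
class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.clock = clock              # injectable clock for testing
        self.last_refill = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 3-request burst allowed, then throttled to 1 request/second
bucket = TokenBucket(capacity=3, refill_rate=1)
print([bucket.allow() for _ in range(5)])  # [True, True, True, False, False]
```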

API Gateway Pattern

Single entry point that routes requests, handles cross-cutting concerns.

Responsibilities:

  • Request routing to appropriate microservices
  • Authentication and authorization
  • Rate limiting and throttling
  • Request/response transformation
  • Caching
  • Load balancing

Hands-On Implementation

Complete, production-ready implementations of core patterns.

Saga Pattern - Complete Orchestration

Python - Saga Orchestrator
class SagaStep:
    def __init__(self, action, compensation):
        self.action = action
        self.compensation = compensation

class SagaOrchestrator:
    def __init__(self):
        self.steps = []
        self.completed_steps = []

    def add_step(self, action, compensation):
        self.steps.append(SagaStep(action, compensation))

    def execute(self):
        try:
            for step in self.steps:
                result = step.action()
                self.completed_steps.append(step)
            return {"status": "success"}
        except Exception as e:
            # Compensate in reverse order
            for step in reversed(self.completed_steps):
                try:
                    step.compensation()
                except Exception as comp_error:
                    print(f"Compensation failed: {comp_error}")
            raise e

# Order Creation Saga
def create_order_saga(order_data):
    saga = SagaOrchestrator()
    # IDs produced by earlier steps, needed by the compensations. Assumes
    # create(), charge(), and create_shipment() return the relevant IDs.
    ctx = {}

    # Step 1: Create Order
    saga.add_step(
        action=lambda: ctx.update(order_id=order_service.create(order_data)),
        compensation=lambda: order_service.cancel(ctx['order_id'])
    )

    # Step 2: Reserve Inventory
    saga.add_step(
        action=lambda: inventory_service.reserve(order_data['items']),
        compensation=lambda: inventory_service.release(order_data['items'])
    )

    # Step 3: Process Payment
    saga.add_step(
        action=lambda: ctx.update(transaction_id=payment_service.charge(order_data['total'])),
        compensation=lambda: payment_service.refund(ctx['transaction_id'])
    )

    # Step 4: Ship Order
    saga.add_step(
        action=lambda: ctx.update(shipment_id=shipping_service.create_shipment(ctx['order_id'])),
        compensation=lambda: shipping_service.cancel_shipment(ctx['shipment_id'])
    )

    return saga.execute()

CQRS with Event-Driven Updates

Python - CQRS Implementation
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime

# Command Side (Write Model)
@dataclass
class CreateOrderCommand:
    customer_id: str
    items: list
    total: float

class OrderCommandHandler:
    def __init__(self, event_bus):
        self.event_bus = event_bus

    def handle(self, command: CreateOrderCommand):
        # Create order in write database
        order = Order(
            id=generate_id(),
            customer_id=command.customer_id,
            items=command.items,
            total=command.total,
            status="PENDING",
            created_at=datetime.now()
        )

        write_db.orders.insert(order)

        # Publish event for read model
        self.event_bus.publish(OrderCreatedEvent(
            order_id=order.id,
            customer_id=order.customer_id,
            total=order.total,
            timestamp=order.created_at
        ))

        return order.id

# Query Side (Read Model)
class OrderQueryHandler:
    def get_order_summary(self, order_id):
        # Query optimized read model
        return read_db.order_summaries.find_one({"order_id": order_id})

    def get_customer_orders(self, customer_id):
        # Denormalized view for fast queries
        return read_db.customer_orders.find({"customer_id": customer_id})

# Event Handler - Updates Read Model
class OrderEventHandler:
    def on_order_created(self, event: OrderCreatedEvent):
        # Update denormalized read model
        read_db.order_summaries.insert({
            "order_id": event.order_id,
            "customer_id": event.customer_id,
            "total": event.total,
            "status": "PENDING",
            "created_at": event.timestamp
        })

        read_db.customer_orders.insert({
            "customer_id": event.customer_id,
            "order_id": event.order_id,
            "total": event.total
        })

Java Circuit Breaker with Resilience4j

Java - Resilience4j Circuit Breaker
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.vavr.control.Try;

import java.time.Duration;
import java.util.function.Supplier;

public class PaymentService {
    private final CircuitBreaker circuitBreaker;

    public PaymentService() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50) // 50% failure rate triggers open
            .waitDurationInOpenState(Duration.ofSeconds(30))
            .slidingWindowSize(10) // Last 10 calls
            .permittedNumberOfCallsInHalfOpenState(3)
            .build();

        CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
        this.circuitBreaker = registry.circuitBreaker("paymentService");
    }

    public PaymentResult processPayment(Order order) {
        return circuitBreaker.executeSupplier(() -> {
            // Call external payment gateway
            return externalPaymentGateway.charge(order.getTotal());
        });
    }

    public PaymentResult processPaymentWithFallback(Order order) {
        Supplier<PaymentResult> decorated = CircuitBreaker.decorateSupplier(
            circuitBreaker,
            () -> externalPaymentGateway.charge(order.getTotal()));

        // Resilience4j has no built-in fallback; combine with Vavr's Try
        return Try.ofSupplier(decorated)
            .recover(throwable -> {
                // Fallback: Queue payment for later processing
                paymentQueue.enqueue(order);
                return new PaymentResult(Status.QUEUED, "Payment queued");
            })
            .get();
    }
}

Node.js API Gateway with Rate Limiting

Node.js - API Gateway
const express = require('express');
const rateLimit = require('express-rate-limit');
const axios = require('axios');

const app = express();
app.use(express.json());

// Rate limiting middleware
const limiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 100, // Max 100 requests per window
    message: 'Too many requests from this IP'
});

app.use('/api/', limiter);

// Service registry
const services = {
    users: 'http://user-service:3001',
    products: 'http://product-service:3002',
    orders: 'http://order-service:3003'
};

// Route requests to appropriate service
app.all('/api/:service/*', async (req, res) => {
    const serviceName = req.params.service;
    const serviceUrl = services[serviceName];

    if (!serviceUrl) {
        return res.status(404).json({ error: 'Service not found' });
    }

    const targetUrl = req.url.replace(`/api/${serviceName}`, '');

    try {
        const response = await axios({
            method: req.method,
            url: serviceUrl + targetUrl,
            data: req.body,
            headers: {
                'Authorization': req.headers.authorization
            },
            timeout: 5000
        });

        res.status(response.status).json(response.data);
    } catch (error) {
        if (error.code === 'ECONNABORTED') {
            res.status(504).json({ error: 'Service timeout' });
        } else {
            res.status(500).json({ error: 'Service unavailable' });
        }
    }
});

app.listen(8000, () => console.log('API Gateway running on port 8000'));

Event Sourcing with Go

Go - Event Store
package main

import (
    "time"
)

type Event struct {
    ID          string
    AggregateID string
    Type        string
    Data        map[string]interface{}
    Timestamp   time.Time
}

type EventStore struct {
    events []Event
}

func NewEventStore() *EventStore {
    return &EventStore{events: make([]Event, 0)}
}

func (es *EventStore) AppendEvent(event Event) {
    event.Timestamp = time.Now()
    es.events = append(es.events, event)
}

func (es *EventStore) GetEvents(aggregateID string) []Event {
    result := make([]Event, 0)
    for _, event := range es.events {
        if event.AggregateID == aggregateID {
            result = append(result, event)
        }
    }
    return result
}

// Account Aggregate
type Account struct {
    ID      string
    Balance float64
}

func (a *Account) ApplyEvent(event Event) {
    switch event.Type {
    case "DEPOSITED":
        a.Balance += event.Data["amount"].(float64)
    case "WITHDRAWN":
        a.Balance -= event.Data["amount"].(float64)
    }
}

func ReconstructAccount(accountID string, store *EventStore) *Account {
    account := &Account{ID: accountID, Balance: 0}
    events := store.GetEvents(accountID)

    for _, event := range events {
        account.ApplyEvent(event)
    }

    return account
}

// Usage
func main() {
    store := NewEventStore()

    store.AppendEvent(Event{
        AggregateID: "acc-123",
        Type:        "DEPOSITED",
        Data:        map[string]interface{}{"amount": 100.0},
    })

    store.AppendEvent(Event{
        AggregateID: "acc-123",
        Type:        "WITHDRAWN",
        Data:        map[string]interface{}{"amount": 30.0},
    })

    account := ReconstructAccount("acc-123", store)
    _ = account // account.Balance == 70.0; discard to keep the example compiling
}

Try It Yourself

Clone these examples and experiment:

  1. Modify the Circuit Breaker thresholds and observe behavior
  2. Add more steps to the Saga and test compensation logic
  3. Implement a simple rate limiter using the Token Bucket algorithm
  4. Create event sourcing for a shopping cart aggregate

Practice Exercises

Apply patterns through hands-on challenges.

Exercise 1: Implement Saga Pattern for E-commerce

Objective: Build a Saga for order processing with compensation logic

Scenario: E-commerce order flow

  1. Create Order (compensation: cancel order)
  2. Reserve Inventory (compensation: release inventory)
  3. Process Payment (compensation: refund payment)
  4. Schedule Shipping (compensation: cancel shipping)

Requirements:

  • If any step fails, run compensations in reverse order
  • Log each step and compensation
  • Return success/failure with details

Solution:

Python Solution
class ECommerceSaga:
    def __init__(self):
        self.steps_executed = []
        self.order_id = None
        self.reservation_id = None
        self.transaction_id = None
        self.shipment_id = None

    def execute(self, order_data):
        try:
            # Step 1: Create Order
            self.order_id = self._create_order(order_data)
            self.steps_executed.append('create_order')

            # Step 2: Reserve Inventory
            self.reservation_id = self._reserve_inventory(order_data['items'])
            self.steps_executed.append('reserve_inventory')

            # Step 3: Process Payment
            self.transaction_id = self._process_payment(order_data['total'])
            self.steps_executed.append('process_payment')

            # Step 4: Schedule Shipping
            self.shipment_id = self._schedule_shipping(self.order_id)
            self.steps_executed.append('schedule_shipping')

            return {"status": "success", "order_id": self.order_id}

        except Exception as e:
            print(f"Saga failed: {e}")
            self._compensate()
            return {"status": "failed", "error": str(e)}

    def _compensate(self):
        print("Starting compensation...")

        for step in reversed(self.steps_executed):
            try:
                if step == 'schedule_shipping':
                    shipping_service.cancel(self.shipment_id)
                    print("✓ Cancelled shipping")

                elif step == 'process_payment':
                    payment_service.refund(self.transaction_id)
                    print("✓ Refunded payment")

                elif step == 'reserve_inventory':
                    inventory_service.release(self.reservation_id)
                    print("✓ Released inventory")

                elif step == 'create_order':
                    order_service.cancel(self.order_id)
                    print("✓ Cancelled order")

            except Exception as comp_error:
                print(f"✗ Compensation failed for {step}: {comp_error}")

    def _create_order(self, order_data):
        # Implementation
        return order_service.create(order_data)

    def _reserve_inventory(self, items):
        # Implementation
        return inventory_service.reserve(items)

    def _process_payment(self, amount):
        # Implementation
        return payment_service.charge(amount)

    def _schedule_shipping(self, order_id):
        # Implementation
        return shipping_service.schedule(order_id)

Exercise 2: Build Circuit Breaker from Scratch

Objective: Implement a Circuit Breaker with all three states

Requirements:

  1. Track failure count and success count
  2. Implement CLOSED, OPEN, HALF_OPEN states
  3. Configurable failure threshold (e.g., 5 failures triggers OPEN)
  4. Configurable timeout (e.g., 60 seconds before trying HALF_OPEN)
  5. In HALF_OPEN, allow limited requests to test recovery

Test cases:

  • 5 consecutive failures should open the circuit
  • Requests during OPEN state should fail immediately
  • After timeout, circuit should transition to HALF_OPEN
  • Successful requests in HALF_OPEN should close the circuit

Hint: Use the example from the Hands-On section as a starting point

Exercise 3: Design CQRS for Blog Platform

Objective: Separate read and write models for a blog system

Scenario: Blog platform with posts, comments, and likes

Write Model Commands:

  • CreatePost
  • UpdatePost
  • DeletePost
  • AddComment
  • LikePost

Read Model Queries:

  • GetPostDetail (post + comments + like count)
  • GetUserFeed (personalized feed with denormalized data)
  • GetTrendingPosts (sorted by likes, recent activity)

Task:

  1. Design the write model (normalized database)
  2. Design the read model (denormalized views)
  3. Define events that sync write → read
  4. Implement one command handler and one query handler

Hint: Write model focuses on data integrity, read model on query performance

Pattern Catalog & Reference

Production-grade patterns for complex distributed systems.

Choreography vs Orchestration

Aspect | Choreography | Orchestration
Control | Decentralized | Centralized
Communication | Event-driven | Command-driven
Coupling | Loose | Tighter
Complexity | Harder to trace | Easier to understand

Python - Orchestration with Temporal
from datetime import timedelta

from temporalio import workflow

@workflow.defn
class OrderWorkflow:
    @workflow.run
    async def run(self, order_data):
        # Orchestrator controls the flow
        order = await workflow.execute_activity(
            create_order, order_data, start_to_close_timeout=timedelta(seconds=30)
        )

        inventory = await workflow.execute_activity(
            reserve_inventory, order.items, start_to_close_timeout=timedelta(seconds=30)
        )

        payment = await workflow.execute_activity(
            process_payment, order.total, start_to_close_timeout=timedelta(seconds=30)
        )

        await workflow.execute_activity(
            ship_order, order.id, start_to_close_timeout=timedelta(seconds=30)
        )

        return order
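
For contrast with the orchestrated workflow above, a choreographed version of the same flow has no central coordinator: each service subscribes to the events it cares about and emits its own events in response. The in-process event bus below is an illustrative stand-in for a real broker such as Kafka.

```python
# Minimal choreography sketch: services register handlers for event types,
# and publishing an event fans it out to every subscriber.
class EventBus:
    def __init__(self):
        self.handlers = {}

    def subscribe(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers.get(event_type, []):
            handler(payload)

bus = EventBus()
log = []

def on_order_created(order):
    # Inventory service reacts to OrderCreated, then announces its own event
    log.append("inventory reserved")
    bus.publish("InventoryReserved", order)

def on_inventory_reserved(order):
    # Payment service reacts to InventoryReserved
    log.append("payment charged")

bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("InventoryReserved", on_inventory_reserved)

bus.publish("OrderCreated", {"order_id": 1})
print(log)  # ['inventory reserved', 'payment charged']
```

Notice that the order flow emerges from the chain of subscriptions rather than from any one component, which is exactly why choreography is harder to trace than orchestration.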

Spotify's Saga Pattern at Scale

  • Use Case: Playlist creation across multiple services
  • Pattern: Event-driven choreography
  • Services Involved: 12+ microservices
  • Events/sec: 100,000+
  • Key Learning: Idempotency is critical for event replay

Complete Pattern Catalog

Pattern | Problem | Solution | When to Use | Trade-offs
Circuit Breaker | Cascading failures | Prevent calls to failing service | External dependencies, slow services | Complexity vs. resilience
Saga | Distributed transactions | Compensating transactions | Multi-service workflows | Eventual consistency
CQRS | Read/write performance | Separate read/write models | Complex queries, high read load | Data synchronization complexity
Event Sourcing | Audit trail, temporal queries | Store events, not state | Financial systems, compliance | Storage overhead, complexity
Strangler Fig | Legacy migration | Gradually replace old system | Monolith to microservices | Dual maintenance period
API Gateway | Client complexity, cross-cutting | Single entry point | Mobile apps, public APIs | Single point of failure
Service Mesh | Service-to-service communication | Infrastructure layer for networking | Large-scale microservices | Operational complexity
Bulkhead | Resource exhaustion | Isolate resources per service | Shared resources, critical services | Resource inefficiency
Retry Pattern | Transient failures | Automatic retry with backoff | Network glitches, temporary outages | Increased latency, thundering herd
Rate Limiting | Service abuse, overload | Throttle requests per client | Public APIs, DoS protection | Legitimate traffic may be blocked
BFF (Backend for Frontend) | Different client needs | Custom backend per client type | Mobile/web/IoT different requirements | Code duplication
Sidecar | Cross-cutting concerns | Co-located helper process | Logging, monitoring, proxying | Resource overhead per service
Service Discovery | Dynamic service locations | Registry for service lookup | Cloud, containerized environments | Additional infrastructure dependency
Database per Service | Tight coupling via shared DB | Each service owns its data | Independent deployability needed | Data consistency challenges
Outbox Pattern | Dual-write problem | Transactional outbox table | Database + messaging atomic writes | Polling overhead, latency

Real-World Case Studies

Netflix Chaos Engineering

  • Challenge: Ensure resilience in distributed system with 1000+ microservices
  • Patterns Used: Circuit Breaker (Hystrix), Bulkhead, Retry, Fallback
  • Innovation: Chaos Monkey - randomly terminates production instances
  • Impact: 99.99% uptime despite constant failures
  • Tech Stack: Spring Cloud, Hystrix, Ribbon, Eureka
  • Key Metric: Serves 230M+ subscribers across 190+ countries
  • Lesson: "Embrace failure as a feature, not a bug"

Uber's Saga Pattern at Scale

  • Challenge: Coordinate ride booking across payment, dispatch, driver, rider services
  • Pattern: Saga with Orchestration (Cadence workflow engine)
  • Workflow Steps: Validate rider → Match driver → Process payment → Start trip
  • Compensation: Refund payment if driver cancels, release driver if payment fails
  • Scale: 18M+ trips/day, sub-second booking confirmation
  • Tech: Cadence (now Temporal), Go, Node.js microservices
  • Key Learning: Orchestration better than choreography for complex workflows

Amazon's Strangler Fig Migration

  • Challenge: Migrate monolithic e-commerce platform to microservices
  • Pattern: Strangler Fig + API Gateway
  • Approach: Route new features to microservices, legacy to monolith
  • Duration: 5+ year gradual migration
  • Result: 2-tier SOA → hundreds of microservices
  • Services: Product Catalog, Cart, Checkout, Recommendations (all separate)
  • Impact: Deploy every 11.7 seconds, 99.99% availability

Capital One's CQRS + Event Sourcing

  • Use Case: Banking transaction processing and fraud detection
  • Pattern: CQRS + Event Sourcing
  • Write Side: Process transactions, store events in event store
  • Read Side: Fraud detection models, account balance views, transaction history
  • Benefits: Complete audit trail, temporal queries for compliance
  • Scale: Billions of events, real-time fraud detection
  • Tech: Kafka for event streaming, Cassandra for event store

Anti-Patterns to Avoid

Distributed Monolith

Problem: Microservices with tight coupling, shared database, synchronous dependencies

Symptoms: Can't deploy independently, cascading failures, slow deployments

Solution: Database per service, async communication, bounded contexts

Chatty Services

Problem: Too many synchronous inter-service calls for a single user request

Symptoms: High latency, network congestion, timeout cascades

Solution: API composition, data duplication, event-driven communication

Shared Database

Problem: Multiple services reading/writing same database tables

Symptoms: Schema changes break multiple services, tight coupling

Solution: Database per service, data replication via events

Microservices for Every Small Feature

Problem: Over-engineering with too many tiny services

Symptoms: Operational nightmare, debugging complexity, network overhead

Solution: Start with modular monolith, extract services based on team/scaling needs

Pattern Combinations & When to Use

Scenario | Recommended Patterns | Rationale
E-commerce Order Flow | Saga + Circuit Breaker + Retry | Saga for multi-step workflow, Circuit Breaker for payment gateway, Retry for transient failures
Banking/Finance | Event Sourcing + CQRS + Outbox | Complete audit trail, read/write optimization, guaranteed message delivery
Social Media Feed | CQRS + Cache-Aside + Rate Limiting | Fast reads, write optimization, prevent abuse
IoT Platform | Event-Driven + Bulkhead + Service Mesh | Handle massive events, isolate device types, secure service-to-service
Legacy Migration | Strangler Fig + BFF + API Gateway | Gradual migration, client-specific APIs, routing layer
Public API Platform | API Gateway + Rate Limiting + Circuit Breaker | Single entry, abuse prevention, backend protection
Real-time Analytics | Event Sourcing + CQRS + Materialized Views | Event replay, query optimization, pre-computed aggregates

Pattern Selection Decision Tree

Start Here: Do you need microservices?
  • NO if: Small team (< 5 devs), simple domain, startup/MVP stage → Use modular monolith
  • YES if: Multiple teams, scaling bottlenecks, polyglot needs, independent deployability

Multi-Service Workflows?
  • Simple flow (2-3 steps): Event-driven choreography
  • Complex flow (4+ steps with compensation): Saga with orchestration
  • Long-running (days/weeks): Temporal/Cadence workflow engine

Read vs Write Performance Issues?
  • Heavy writes, simple reads: Database optimization, write-through cache
  • Heavy reads, complex queries: CQRS with materialized views
  • Audit/compliance required: Event Sourcing + CQRS

Resilience Concerns?
  • External service failures: Circuit Breaker + Fallback
  • Transient network errors: Retry with Exponential Backoff
  • Resource exhaustion: Bulkhead + Rate Limiting
  • Cascading failures: Circuit Breaker + Timeout + Bulkhead

Quick Pattern Selector
  1. Start small: Implement Circuit Breaker + Retry for all external calls
  2. Add workflow: Use Saga when you have 3+ service coordination
  3. Scale reads: Add CQRS when read load >> write load (10x+)
  4. Audit trail: Add Event Sourcing only if compliance requires it
  5. API layer: Add API Gateway when you have 3+ client types