Why Microservices Matter
Why This Matters
The Problem: As applications grow, monolithic codebases become unwieldy. A single change can require redeploying the entire application, and scaling means duplicating everything.
The Solution: Microservices decompose an application into small, independently deployable services that each own their data and logic.
Real Impact: Netflix runs over 1,000 microservices, enabling them to deploy hundreds of times per day and serve 230+ million subscribers worldwide.
Real-World Analogy
Think of microservices like a food court versus a single restaurant:
- Monolith = One restaurant that serves everything: pizza, sushi, burgers, desserts. If the pizza oven breaks, the whole restaurant might shut down.
- Microservices = A food court with specialized stalls. Each stall operates independently, has its own kitchen, and can scale on its own. If the pizza stall is overwhelmed, it can add more workers without affecting the sushi stall.
- API Gateway = The food court directory that helps customers find the right stall.
- Message Queue = The order ticket system connecting front counters to kitchens.
Core Benefits
Independent Deployment
Each service can be deployed, updated, and scaled independently. No need to redeploy the entire application for a single change.
Technology Diversity
Each team can choose the best language, framework, and database for their service. Python for ML, Go for networking, Node for real-time.
Fault Isolation
A failure in one service does not cascade to bring down the entire system. Circuit breakers prevent cascading failures.
Team Autonomy
Small teams own their services end-to-end. They can move fast without coordinating deployments across the entire organization.
Monolith vs Microservices
| Aspect | Monolith | Microservices |
|---|---|---|
| Deployment | All-or-nothing deployment | Independent per service |
| Scaling | Scale entire application | Scale individual services |
| Tech Stack | Single language/framework | Polyglot (different per service) |
| Failure Impact | One bug can crash everything | Failures are isolated |
| Data | Shared database | Database per service |
| Complexity | Simple at first, hard to maintain | Complex upfront, easier long-term |
| Team Size | Works for small teams (<10) | Best for larger organizations |
Common Pitfall: Premature Microservices
Problem: Many teams adopt microservices too early, before they understand their domain boundaries.
Solution: Start with a well-structured monolith ("monolith first"). Extract services only when you have clear bounded contexts and the team is large enough to justify the operational overhead. Martin Fowler calls this the "Monolith First" approach.
Service Communication
Microservices need to talk to each other. There are two fundamental patterns: synchronous (request/response) and asynchronous (event-driven).
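To make the two styles concrete, here is a toy in-process sketch: a plain function call stands in for a synchronous HTTP request, and a `queue.Queue` stands in for a broker such as RabbitMQ (the service names and payloads are illustrative, not part of any real API):

```python
import queue

# Synchronous (request/response): the caller blocks until the callee answers.
def inventory_check(items):
    # stands in for an HTTP call to an inventory service
    return {"available": True, "items": items}

result = inventory_check(["widget"])  # caller waits for this result
assert result["available"]

# Asynchronous (event-driven): the caller publishes an event and moves on.
events = queue.Queue()  # stands in for a broker like RabbitMQ

def place_order(order_id):
    events.put({"type": "OrderPlaced", "order_id": order_id})  # fire and forget

place_order("ORD-001")

# A separate consumer (e.g. a notification service) processes the event later.
event = events.get()
print(event["type"], event["order_id"])
```

The synchronous caller is coupled to the callee's availability and latency; the asynchronous publisher is not, at the cost of eventual rather than immediate consistency.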
gRPC vs REST
| Feature | REST | gRPC |
|---|---|---|
| Protocol | HTTP/1.1 (JSON) | HTTP/2 (Protocol Buffers) |
| Performance | Slower (text-based) | Faster (binary, streaming) |
| Contract | OpenAPI/Swagger (optional) | Strict .proto files (required) |
| Best For | Public APIs, web clients | Internal service-to-service |
A minimal order service in Flask, making synchronous request/response calls to its peer services:

```python
# A simple microservice using Flask
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Each service has its own configuration
INVENTORY_SERVICE_URL = os.getenv("INVENTORY_URL", "http://inventory-service:5001")
PAYMENT_SERVICE_URL = os.getenv("PAYMENT_URL", "http://payment-service:5002")


class OrderService:
    def create_order(self, user_id, items):
        # Step 1: Check inventory (synchronous call)
        inventory_resp = requests.post(
            f"{INVENTORY_SERVICE_URL}/check",
            json={"items": items},
            timeout=5,  # never block forever on a peer service
        )
        if not inventory_resp.json()["available"]:
            return {"error": "Items not available"}, 400

        # Step 2: Process payment (synchronous call)
        total = sum(item["price"] * item["qty"] for item in items)
        payment_resp = requests.post(
            f"{PAYMENT_SERVICE_URL}/charge",
            json={"user_id": user_id, "amount": total},
            timeout=5,
        )
        if payment_resp.status_code != 200:
            return {"error": "Payment failed"}, 402

        # Step 3: Create order record
        order = {
            "user_id": user_id,
            "items": items,
            "total": total,
            "status": "confirmed",
        }
        return order, 201


order_svc = OrderService()


@app.route("/orders", methods=["POST"])
def create_order():
    data = request.get_json()
    result, status = order_svc.create_order(data["user_id"], data["items"])
    return jsonify(result), status


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```
Service Discovery
In a microservices architecture, services need to find each other. Unlike monoliths where everything shares a process, microservices run on different hosts and ports that can change dynamically.
Service Discovery Patterns
- Client-Side Discovery: The client queries a service registry (e.g., Netflix Eureka) and picks an instance. The client handles load balancing.
- Server-Side Discovery: The client sends a request to a load balancer (e.g., AWS ALB, Kubernetes Services), which queries the registry and routes the request.
- DNS-Based: Services register DNS records. Simple, but TTL caching means clients can keep routing to instances that have already gone away.
- Service Mesh: A sidecar proxy (e.g., Envoy in Istio) handles discovery transparently. The application code does not need to know about service discovery at all.
Service Registry
A database of available service instances. Services register on startup and deregister on shutdown. Examples: Consul, etcd, ZooKeeper, Kubernetes DNS.
Health Checks
The registry periodically pings services to verify they are healthy. Unhealthy instances are removed from the pool automatically.
Load Balancing
Once discovered, requests must be distributed across instances. Common strategies: round-robin, least connections, consistent hashing.
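Putting registry, health checks, and load balancing together, here is a minimal in-memory sketch of client-side discovery with round-robin selection (a real registry such as Consul does the same over the network; the addresses are made up):

```python
import itertools

class ServiceRegistry:
    """In-memory stand-in for Consul/etcd: service name -> instance addresses."""

    def __init__(self):
        self.instances = {}  # service name -> [address, ...]
        self._cursors = {}   # service name -> round-robin iterator

    def register(self, name, address):
        self.instances.setdefault(name, []).append(address)
        self._cursors.pop(name, None)  # reset rotation when the pool changes

    def deregister(self, name, address):
        # Called on shutdown, or by the registry when a health check fails
        self.instances.get(name, []).remove(address)
        self._cursors.pop(name, None)

    def resolve(self, name):
        """Client-side discovery: pick the next instance, round-robin."""
        pool = self.instances.get(name)
        if not pool:
            raise LookupError(f"no instances for {name}")
        if name not in self._cursors:
            self._cursors[name] = itertools.cycle(list(pool))
        return next(self._cursors[name])

registry = ServiceRegistry()
registry.register("inventory", "10.0.0.1:5001")
registry.register("inventory", "10.0.0.2:5001")
print(registry.resolve("inventory"))  # alternates between the two instances
registry.deregister("inventory", "10.0.0.1:5001")  # failed its health check
print(registry.resolve("inventory"))  # only the healthy instance remains
```

Swapping round-robin for least-connections or consistent hashing only changes the `resolve` strategy; registration and health pruning stay the same.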
Data Management in Microservices
The Database-per-Service Pattern
Each microservice owns its data and exposes it only through its API. No direct database access from other services. This ensures loose coupling but introduces challenges around data consistency and joins across services.
Strategies for Cross-Service Queries
| Pattern | How It Works | Trade-offs |
|---|---|---|
| API Composition | A composer service calls multiple services and joins results in memory | Simple but increases latency; no transactional guarantees |
| CQRS | Separate read and write models. Write to service DB, publish events, build read-optimized views | Great for read-heavy workloads; eventual consistency |
| Event Sourcing | Store events instead of current state. Rebuild state by replaying events | Full audit trail; complex to implement |
| Saga Pattern | Coordinate multi-service transactions through a sequence of local transactions and compensating actions | Handles distributed transactions without 2PC |
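API composition from the table above can be sketched as a composer that fans out to the owning services and joins the results in memory (the fetch functions are stand-ins for HTTP calls; the field names are illustrative):

```python
def fetch_order(order_id):
    # stand-in for GET /orders/{id} on the order service
    return {"order_id": order_id, "user_id": "u1", "items": ["widget"]}

def fetch_user(user_id):
    # stand-in for GET /users/{id} on the user service
    return {"user_id": user_id, "name": "Ada"}

def get_order_details(order_id):
    """Composer: call each owning service, then join in memory."""
    order = fetch_order(order_id)
    user = fetch_user(order["user_id"])  # extra hop -> added latency
    # No transactional guarantee: user data may have changed between calls
    return {**order, "user_name": user["name"]}

print(get_order_details("ORD-001"))
```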
Saga Pattern
Distributed transactions across microservices cannot use traditional ACID transactions. The Saga pattern breaks a transaction into a sequence of local transactions, each with a compensating action if something fails.
Two Saga Implementations
- Choreography: Each service publishes events that trigger the next step. No central coordinator. Simple but hard to track for complex workflows.
- Orchestration: A central orchestrator service tells each participant what to do. Easier to understand and debug, but the orchestrator can become a bottleneck.
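The choreography variant can be sketched with an in-process event bus: each handler reacts to one event and publishes the next, with no central coordinator (event names and the single-process bus are illustrative):

```python
from collections import defaultdict

handlers = defaultdict(list)  # event type -> subscribed handlers
log = []                      # records the event chain for inspection

def subscribe(event_type, handler):
    handlers[event_type].append(handler)

def publish(event_type, payload):
    log.append(event_type)
    for handler in handlers[event_type]:
        handler(payload)

# Each "service" only knows the events it consumes and the events it emits.
subscribe("OrderPlaced", lambda e: publish("InventoryReserved", e))
subscribe("InventoryReserved", lambda e: publish("PaymentCharged", e))
subscribe("PaymentCharged", lambda e: publish("OrderConfirmed", e))

publish("OrderPlaced", {"order_id": "ORD-001"})
print(log)  # the full event chain, with no orchestrator in sight
```

Note how the overall workflow exists nowhere in the code, which is exactly what makes choreography hard to trace at scale.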
A minimal orchestrator that runs steps in order and compensates in reverse on failure:

```python
# Saga Orchestrator for Order Processing
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class SagaStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"


@dataclass
class SagaStep:
    name: str
    action: Callable      # The forward action
    compensate: Callable  # The rollback action


class SagaOrchestrator:
    def __init__(self, steps: List[SagaStep]):
        self.steps = steps
        self.completed_steps = []
        self.status = SagaStatus.PENDING

    def execute(self, context: dict) -> dict:
        """Execute all saga steps in order."""
        for step in self.steps:
            try:
                print(f"Executing: {step.name}")
                result = step.action(context)
                context.update(result or {})
                self.completed_steps.append(step)
            except Exception as e:
                print(f"Failed at: {step.name} - {e}")
                self._compensate(context)
                return {"status": "failed", "failed_at": step.name}
        self.status = SagaStatus.COMPLETED
        return {"status": "completed", "context": context}

    def _compensate(self, context: dict):
        """Roll back completed steps in reverse order."""
        self.status = SagaStatus.COMPENSATING
        for step in reversed(self.completed_steps):
            try:
                print(f"Compensating: {step.name}")
                step.compensate(context)
            except Exception as e:
                print(f"Compensation failed: {step.name} - {e}")
                self.status = SagaStatus.FAILED


# Example usage: order-processing saga
def reserve_inventory(ctx):
    print(f"  Reserving items: {ctx['items']}")
    return {"reservation_id": "RSV-001"}

def release_inventory(ctx):
    print(f"  Releasing reservation: {ctx['reservation_id']}")

def process_payment(ctx):
    print(f"  Charging ${ctx['total']}")
    return {"payment_id": "PAY-001"}

def refund_payment(ctx):
    print(f"  Refunding payment: {ctx['payment_id']}")

def confirm_order(ctx):
    print("  Order confirmed!")
    return {"order_id": "ORD-001"}

def cancel_order(ctx):
    print(f"  Cancelling order: {ctx.get('order_id')}")

saga = SagaOrchestrator([
    SagaStep("Reserve Inventory", reserve_inventory, release_inventory),
    SagaStep("Process Payment", process_payment, refund_payment),
    SagaStep("Confirm Order", confirm_order, cancel_order),
])
result = saga.execute({"items": ["widget"], "total": 29.99})
print(result)
```
Docker Compose for Local Development
```yaml
# Run the full microservices stack locally
version: "3.8"

services:
  api-gateway:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - order-service
      - inventory-service
      - payment-service

  order-service:
    build: ./services/orders
    environment:
      - INVENTORY_URL=http://inventory-service:5001
      - PAYMENT_URL=http://payment-service:5002
      # Hostname matches the database service name below; dev-only credentials
      - DATABASE_URL=postgres://postgres:postgres@orders-db:5432/orders
    depends_on:
      - orders-db
      - rabbitmq

  inventory-service:
    build: ./services/inventory
    environment:
      - DATABASE_URL=postgres://postgres:postgres@inventory-db:5432/inventory
    depends_on:
      - inventory-db

  payment-service:
    build: ./services/payments
    environment:
      - DATABASE_URL=postgres://postgres:postgres@payments-db:5432/payments
      - STRIPE_KEY=${STRIPE_KEY}
    depends_on:
      - payments-db

  # Each service gets its own database
  orders-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: orders
      POSTGRES_PASSWORD: postgres  # dev only; use secrets in production
    volumes:
      - orders_data:/var/lib/postgresql/data

  inventory-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: inventory
      POSTGRES_PASSWORD: postgres
    volumes:
      - inventory_data:/var/lib/postgresql/data

  payments-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: payments
      POSTGRES_PASSWORD: postgres
    volumes:
      - payments_data:/var/lib/postgresql/data

  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "15672:15672"  # management UI; services reach AMQP at rabbitmq:5672 internally

volumes:
  orders_data:
  inventory_data:
  payments_data:
```
Practice Problems
Medium: Design a Service Boundary
You are building an e-commerce platform. Identify the microservice boundaries:
- List at least 5 services you would create
- Define which data each service owns
- Identify the synchronous vs asynchronous communication patterns between them
- Explain how a user placing an order would flow through the services
Hint: Think about bounded contexts from Domain-Driven Design. Each service should own a single business capability. Consider: User, Catalog, Cart, Order, Payment, Inventory, Notification, Shipping.
```python
# Microservice Boundaries for E-Commerce
# 1. User Service - owns user profiles, auth
#    DB: users, addresses, preferences
# 2. Catalog Service - owns product info
#    DB: products, categories, reviews
# 3. Cart Service - owns shopping carts
#    DB: carts (Redis for speed)
# 4. Order Service - owns order lifecycle
#    DB: orders, order_items, order_status
# 5. Payment Service - owns transactions
#    DB: payments, refunds
# 6. Inventory Service - owns stock levels
#    DB: stock, warehouses, reservations
# 7. Notification Service - sends emails/SMS
#    DB: templates, notification_log
# 8. Shipping Service - owns fulfillment
#    DB: shipments, carriers

# Order Flow:
# User clicks "Buy" -> Cart Service (sync)
#   -> Order Service creates order (sync)
#   -> Inventory Service reserves stock (sync)
#   -> Payment Service charges card (sync)
#   -> Order confirmed (event published)
#   -> Notification Service sends email (async)
#   -> Shipping Service creates shipment (async)
```
Medium: Circuit Breaker Implementation
Implement a circuit breaker pattern that:
- Tracks failure counts for an external service call
- Opens the circuit after 5 consecutive failures
- Returns a fallback response while the circuit is open
- Attempts to close the circuit after a timeout period
Hint: Use three states: CLOSED (normal), OPEN (failing fast), HALF_OPEN (testing). Track failure count and last failure time. In OPEN state, check if enough time has passed before allowing a test request.
```python
import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=30):
        self.failure_threshold = failure_threshold
        self.timeout = timeout  # seconds to wait before a test request
        self.failure_count = 0
        self.last_failure = None
        self.state = CircuitState.CLOSED

    def call(self, func, fallback, *args):
        if self.state == CircuitState.OPEN:
            if self._timeout_expired():
                self.state = CircuitState.HALF_OPEN  # allow one test request
            else:
                return fallback()
        try:
            result = func(*args)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            return fallback()

    def _on_success(self):
        # A success in HALF_OPEN (or CLOSED) resets the breaker
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def _timeout_expired(self):
        return time.time() - self.last_failure > self.timeout


# Example: a dependency that always fails, served by a cached fallback
def flaky():
    raise ConnectionError("inventory-service unreachable")

breaker = CircuitBreaker(failure_threshold=5, timeout=30)
for _ in range(6):
    breaker.call(flaky, lambda: {"available": False})
print(breaker.state)  # CircuitState.OPEN after 5 consecutive failures
```
Hard: Saga with Compensation
Design a saga for a travel booking system that books a flight, hotel, and car rental. If any step fails, all previous steps must be compensated (cancelled).
- Define the saga steps and their compensating actions
- Handle the case where the hotel booking fails after the flight is booked
- Handle the case where a compensation itself fails (idempotent retries)
Hint: Each step must be idempotent. Store saga state in a database so it can be resumed after crashes. Use unique transaction IDs to ensure compensations can be retried safely.
```python
# Travel Booking Saga with idempotent compensation
import uuid


class TravelBookingSaga:
    def __init__(self):
        self.saga_id = str(uuid.uuid4())
        self.state = {}  # persisted to a DB in a real system

    def execute(self, trip):
        steps = [
            ("flight", self._book_flight, self._cancel_flight),
            ("hotel", self._book_hotel, self._cancel_hotel),
            ("car", self._book_car, self._cancel_car),
        ]
        completed = []
        for name, action, compensate in steps:
            try:
                ref = action(trip)
                self.state[name] = ref
                completed.append((name, compensate))
            except Exception as e:
                print(f"Failed: {name} - {e}")
                # Compensate completed steps in reverse order
                for cname, cfn in reversed(completed):
                    self._safe_compensate(cname, cfn, trip)
                return False
        return True

    def _safe_compensate(self, name, fn, trip, max_retries=3):
        # Compensations are idempotent, so retrying is always safe
        for attempt in range(max_retries):
            try:
                fn(trip, self.state[name])
                return
            except Exception:
                if attempt == max_retries - 1:
                    # Log for manual resolution
                    print(f"ALERT: {name} needs manual fix")

    # Stubs standing in for calls to the airline/hotel/car-rental services
    def _book_flight(self, trip):
        return "FL-001"

    def _cancel_flight(self, trip, ref):
        print(f"Cancelling flight {ref}")

    def _book_hotel(self, trip):
        return "HT-001"

    def _cancel_hotel(self, trip, ref):
        print(f"Cancelling hotel {ref}")

    def _book_car(self, trip):
        return "CR-001"

    def _cancel_car(self, trip, ref):
        print(f"Cancelling car {ref}")
```
Quick Reference
Microservices Decision Framework
| Question | Monolith | Microservices |
|---|---|---|
| Team size? | < 10 developers | > 10, multiple teams |
| Domain well understood? | Still exploring | Clear bounded contexts |
| Deployment frequency? | Weekly/monthly | Multiple times per day |
| Scale requirements? | Uniform scaling OK | Services scale independently |
| Operational maturity? | Limited DevOps | Strong CI/CD, monitoring |
Key Patterns Summary
Essential Microservices Patterns
- API Gateway: Single entry point that routes requests to services, handles auth, rate limiting
- Circuit Breaker: Prevent cascading failures by failing fast when a downstream service is unhealthy
- Service Mesh: Infrastructure layer (Istio, Linkerd) for service-to-service communication
- Database per Service: Each service owns its data; no shared databases
- Saga Pattern: Manage distributed transactions via compensating actions
- CQRS: Separate read and write models for complex domains
- Event Sourcing: Store state changes as immutable events
- Strangler Fig: Gradually migrate from monolith by routing requests to new services