⏱️ 50 min read 🎯 Beginner Friendly 🔧 Hands-On Examples 📚 Complete Guide

Introduction to Service Communication

In a microservices architecture, services must communicate to deliver business functionality. Choosing the right communication pattern is critical for performance, reliability, and scalability.

Why Communication Matters

Poor communication patterns can lead to:

  • High latency and slow response times
  • Tight coupling between services
  • Cascading failures during outages
  • Difficult debugging and troubleshooting

Synchronous vs Asynchronous Communication

Aspect        | Synchronous                             | Asynchronous
Analogy       | Phone call - wait for response          | Text message - continue without waiting
Examples      | REST, gRPC, GraphQL                     | Message Queues, Event Streams, Pub/Sub
Coupling      | Tight - both services must be available | Loose - services don't need to be online simultaneously
Response Time | Immediate - blocking                    | Eventual - non-blocking
Use Cases     | User-facing APIs, real-time queries     | Background jobs, event notifications, data processing
Complexity    | Simple - request/response               | Complex - requires message broker

Communication Pattern Categories

  1. Request/Response (Synchronous): Client sends request, waits for response
    • REST APIs (HTTP/JSON)
    • gRPC (HTTP/2 + Protocol Buffers)
    • GraphQL (Query language over HTTP)
  2. Event-Driven (Asynchronous): Services emit events, others react
    • Message Queues (RabbitMQ, AWS SQS)
    • Event Streams (Apache Kafka, AWS Kinesis)
    • Pub/Sub (Redis Pub/Sub, Google Pub/Sub)
  3. Hybrid: Combination of sync and async
    • API Gateway + Message Queue
    • CQRS (Command Query Responsibility Segregation)

When to Use Synchronous
  • Need immediate response (e.g., user login, product search)
  • Data must be consistent right away
  • Simple request/response flow
  • Client needs to know if operation succeeded immediately

When to Use Asynchronous
  • Long-running operations (video processing, report generation)
  • Fire-and-forget scenarios (sending email, logging)
  • Need to decouple services for scalability
  • Handling traffic spikes with queue buffering
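The phone-call/text-message contrast can be sketched with the standard library: a synchronous call blocks until its result returns, while an asynchronous send just enqueues work and moves on (`handle_order` is a made-up stand-in for a downstream service):

```python
import queue
import threading
import time

def handle_order(order_id):
    """Stand-in for a downstream service call."""
    time.sleep(0.1)  # simulated work
    return f"processed {order_id}"

# Synchronous: the caller blocks ~0.1s before the next line runs
result = handle_order("ORD-1")

# Asynchronous: the caller enqueues the work and moves on immediately
work_queue = queue.Queue()

def worker():
    while True:
        order_id = work_queue.get()
        if order_id is None:          # sentinel to stop the worker
            work_queue.task_done()
            break
        handle_order(order_id)
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
work_queue.put("ORD-2")               # returns instantly; worker processes later
work_queue.put(None)
work_queue.join()                     # (only so the demo waits before exiting)
```

The producer never waits on `handle_order` in the async path - exactly the decoupling the table above describes.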

Synchronous Protocols Deep Dive

Understanding REST, gRPC, and GraphQL - when to use each and how they compare.

REST APIs - The Universal Standard

REST (Representational State Transfer) is the most common style of service communication, built on HTTP methods and JSON.

Why REST Dominates

  • Universal - works everywhere (browsers, mobile, IoT)
  • Human-readable JSON format
  • Well-understood by developers
  • Extensive tooling and libraries

REST Principles:

  • Stateless: Each request contains all information needed
  • Resource-based: Everything is a resource with a URI
  • HTTP Methods: GET (read), POST (create), PUT (update), DELETE (delete)
  • Standard Status Codes: 200 OK, 404 Not Found, 500 Internal Server Error
Python - Flask REST API
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory database
products = {
    1: {"id": 1, "name": "Laptop", "price": 999.99, "stock": 50},
    2: {"id": 2, "name": "Mouse", "price": 29.99, "stock": 200}
}

@app.route('/products', methods=['GET'])
def get_products():
    return jsonify(list(products.values()))

@app.route('/products/<int:product_id>', methods=['GET'])
def get_product(product_id):
    product = products.get(product_id)
    if not product:
        return jsonify({"error": "Product not found"}), 404
    return jsonify(product)

@app.route('/products', methods=['POST'])
def create_product():
    data = request.json
    product_id = max(products.keys(), default=0) + 1
    products[product_id] = {
        "id": product_id,
        "name": data['name'],
        "price": data['price'],
        "stock": data['stock']
    }
    return jsonify(products[product_id]), 201

@app.route('/products/<int:product_id>', methods=['PUT'])
def update_product(product_id):
    if product_id not in products:
        return jsonify({"error": "Product not found"}), 404
    data = request.json
    products[product_id].update(data)
    return jsonify(products[product_id])

@app.route('/products/<int:product_id>', methods=['DELETE'])
def delete_product(product_id):
    if product_id not in products:
        return jsonify({"error": "Product not found"}), 404
    del products[product_id]
    return jsonify({"message": "Product deleted"}), 200

gRPC - High Performance RPC

gRPC uses HTTP/2 and Protocol Buffers for fast, efficient communication between services.

Feature         | REST                             | gRPC
Protocol        | HTTP/1.1                         | HTTP/2
Data Format     | JSON (text)                      | Protocol Buffers (binary)
Performance     | Good                             | Excellent (often 7-10x faster)
Streaming       | Limited - request/response only  | Client, server, and bi-directional streaming
Browser Support | Native                           | Requires gRPC-Web
Code Generation | Manual                           | Automatic from .proto files
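To see why the binary format is so much smaller, serialize the same product record both ways — `struct` below is only a stand-in for Protocol Buffers (which uses tag/varint encoding rather than this fixed layout):

```python
import json
import struct

product = {"id": 1, "name": "Laptop", "price": 999.99, "stock": 50}

# JSON: field names and punctuation travel with every message
as_json = json.dumps(product).encode("utf-8")

# Binary: the schema is agreed on up front, so only values are sent.
# Layout: int32 id, uint8 name length + bytes, float64 price, int32 stock
name = product["name"].encode("utf-8")
as_binary = struct.pack(f"<iB{len(name)}sdi",
                        product["id"], len(name), name,
                        product["price"], product["stock"])

print(len(as_json), len(as_binary))  # 57 vs 23 bytes
```

The gap grows with repeated messages, which is one reason gRPC benchmarks so well for chatty internal traffic.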
Protocol Buffers - product.proto
syntax = "proto3";

package ecommerce;

service ProductService {
    rpc GetProduct(ProductRequest) returns (ProductResponse);
    rpc ListProducts(Empty) returns (stream ProductResponse);
    rpc CreateProduct(CreateProductRequest) returns (ProductResponse);
}

message ProductRequest {
    int32 product_id = 1;
}

message CreateProductRequest {
    string name = 1;
    double price = 2;
    int32 stock = 3;
}

message ProductResponse {
    int32 id = 1;
    string name = 2;
    double price = 3;
    int32 stock = 4;
}

message Empty {}
Python - gRPC Server
import grpc
from concurrent import futures
import product_pb2
import product_pb2_grpc

class ProductServicer(product_pb2_grpc.ProductServiceServicer):
    def __init__(self):
        self.products = {
            1: {"id": 1, "name": "Laptop", "price": 999.99, "stock": 50},
            2: {"id": 2, "name": "Mouse", "price": 29.99, "stock": 200}
        }

    def GetProduct(self, request, context):
        product = self.products.get(request.product_id)
        if not product:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details('Product not found')
            return product_pb2.ProductResponse()

        return product_pb2.ProductResponse(**product)

    def ListProducts(self, request, context):
        for product in self.products.values():
            yield product_pb2.ProductResponse(**product)

    def CreateProduct(self, request, context):
        product_id = max(self.products.keys()) + 1
        product = {
            "id": product_id,
            "name": request.name,
            "price": request.price,
            "stock": request.stock
        }
        self.products[product_id] = product
        return product_pb2.ProductResponse(**product)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    product_pb2_grpc.add_ProductServiceServicer_to_server(
        ProductServicer(), server
    )
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()

if __name__ == '__main__':
    serve()

GraphQL - Flexible Query Language

GraphQL allows clients to request exactly the data they need, avoiding over-fetching and under-fetching.

GraphQL Benefits
  • Single endpoint for all queries
  • Client specifies exact fields needed
  • Reduces network requests (aggregate data from multiple sources)
  • Strong typing and introspection
GraphQL Schema
type Product {
    id: ID!
    name: String!
    price: Float!
    stock: Int!
    reviews: [Review!]!
}

type Review {
    id: ID!
    rating: Int!
    comment: String
    user: User!
}

type User {
    id: ID!
    name: String!
    email: String!
}

type Query {
    product(id: ID!): Product
    products: [Product!]!
}

type Mutation {
    createProduct(name: String!, price: Float!, stock: Int!): Product!
    updateProduct(id: ID!, name: String, price: Float, stock: Int): Product!
}
GraphQL Query Example
# Client query - request only needed fields
query {
    product(id: "1") {
        name
        price
        reviews {
            rating
            comment
            user {
                name
            }
        }
    }
}

# Response - exactly what was requested
{
    "data": {
        "product": {
            "name": "Laptop",
            "price": 999.99,
            "reviews": [
                {
                    "rating": 5,
                    "comment": "Great laptop!",
                    "user": {"name": "John Doe"}
                }
            ]
        }
    }
}
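The field-selection idea can be mimicked in a few lines of plain Python — a toy illustration of how a selector prunes a full record, not a real GraphQL executor (resolvers, validation, and schema typing are what a library like Graphene adds):

```python
def select_fields(obj, selection):
    """Return only the requested fields; recurse into nested selections."""
    result = {}
    for field, sub in selection.items():
        value = obj[field]
        if sub is None:                      # leaf field: copy as-is
            result[field] = value
        elif isinstance(value, list):        # list of nested objects
            result[field] = [select_fields(v, sub) for v in value]
        else:                                # single nested object
            result[field] = select_fields(value, sub)
    return result

product = {
    "id": "1", "name": "Laptop", "price": 999.99, "stock": 50,
    "reviews": [{"rating": 5, "comment": "Great laptop!",
                 "user": {"name": "John Doe", "email": "john@example.com"}}],
}

# Equivalent of: { product { name price reviews { rating user { name } } } }
query = {"name": None, "price": None,
         "reviews": {"rating": None, "user": {"name": None}}}

shaped = select_fields(product, query)
print(shaped)
# {'name': 'Laptop', 'price': 999.99,
#  'reviews': [{'rating': 5, 'user': {'name': 'John Doe'}}]}
```

Note how `stock`, `comment`, and the user's `email` never leave the server - that is the over-fetching GraphQL eliminates.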

Asynchronous Messaging Patterns

Message queues, event streams, and pub/sub for decoupled, scalable communication.

Message Queues - RabbitMQ

Message Queues enable asynchronous communication where services send messages to a queue, and other services consume them when ready.

Why Message Queues

  • Decouple producers and consumers
  • Handle traffic spikes with buffering
  • Retry failed messages automatically
  • Load balancing across multiple consumers

Queue Patterns:

  • Work Queue: Multiple workers process tasks from a shared queue
  • Publish/Subscribe: Message broadcast to all subscribers
  • Routing: Messages routed based on keys
  • Topics: Pattern-based message routing
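Before bringing in a broker, the Work Queue pattern — one shared queue, several competing workers — can be sketched with the standard library (names are illustrative):

```python
import queue
import threading

task_queue = queue.Queue()
processed_by = {}  # task -> worker name, to observe the load balancing

def worker(name):
    while True:
        task = task_queue.get()
        if task is None:              # sentinel: shut this worker down
            task_queue.task_done()
            break
        processed_by[task] = name     # "process" the task
        task_queue.task_done()

workers = [threading.Thread(target=worker, args=(f"worker-{i}",))
           for i in range(3)]
for w in workers:
    w.start()

for task_id in range(9):              # producer enqueues a burst of tasks
    task_queue.put(task_id)
for _ in workers:                     # one sentinel per worker
    task_queue.put(None)
for w in workers:
    w.join()

print(len(processed_by))  # 9 - every task handled exactly once
```

RabbitMQ adds what this sketch lacks: persistence, acknowledgments, and delivery across process and machine boundaries.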
Python - RabbitMQ Producer
import pika
import json

class OrderPublisher:
    def __init__(self):
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters('localhost')
        )
        self.channel = self.connection.channel()

        # Declare exchange and queue
        self.channel.exchange_declare(
            exchange='orders',
            exchange_type='direct',
            durable=True
        )
        self.channel.queue_declare(queue='order_processing', durable=True)
        self.channel.queue_bind(
            exchange='orders',
            queue='order_processing',
            routing_key='new_order'
        )

    def publish_order(self, order_data):
        message = json.dumps(order_data)
        self.channel.basic_publish(
            exchange='orders',
            routing_key='new_order',
            body=message,
            properties=pika.BasicProperties(
                delivery_mode=2,  # Make message persistent
                content_type='application/json'
            )
        )
        print(f"Published order: {order_data['order_id']}")

    def close(self):
        self.connection.close()

# Usage
publisher = OrderPublisher()
publisher.publish_order({
    "order_id": "ORD-12345",
    "user_id": "USER-789",
    "items": [{"product_id": 1, "quantity": 2}],
    "total": 1999.98
})
Python - RabbitMQ Consumer
import pika
import json

class OrderProcessor:
    def __init__(self):
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters('localhost')
        )
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='order_processing', durable=True)

        # Fair dispatch - don't give more than 1 message to a worker at a time
        self.channel.basic_qos(prefetch_count=1)

    def process_order(self, order_data):
        print(f"Processing order: {order_data['order_id']}")
        # Business logic here
        # - Validate inventory
        # - Process payment
        # - Schedule shipping
        return True

    def callback(self, ch, method, properties, body):
        order_data = json.loads(body)
        try:
            success = self.process_order(order_data)
            if success:
                ch.basic_ack(delivery_tag=method.delivery_tag)
        except Exception as e:
            print(f"Error processing order: {e}")
            # Negative acknowledgment - requeue the message
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

    def start_consuming(self):
        self.channel.basic_consume(
            queue='order_processing',
            on_message_callback=self.callback
        )
        print("Waiting for orders...")
        self.channel.start_consuming()

# Usage
processor = OrderProcessor()
processor.start_consuming()

Event Streaming - Apache Kafka

Kafka is a distributed event streaming platform for high-throughput, fault-tolerant message processing.

Feature           | RabbitMQ                  | Kafka
Model             | Message Queue             | Event Log
Throughput        | ~20K msgs/sec             | 1M+ msgs/sec
Message Retention | Deleted after consumption | Retained for configured time
Ordering          | Queue level               | Partition level
Use Case          | Task queues, RPC          | Event sourcing, log aggregation, stream processing
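The "partition level" ordering above follows from key-based partitioning: every message key hashes to one partition, and order is preserved within that partition. A simplified sketch (Kafka's default partitioner uses murmur2; CRC32 here is only for illustration):

```python
import zlib

NUM_PARTITIONS = 6

def partition_for(key: str) -> int:
    """Deterministically map a message key to a partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

# All events for the same order always land on the same partition...
assert partition_for("ORD-12345") == partition_for("ORD-12345")

# ...so per-order ordering holds, while different keys spread the load
for key in ["ORD-12345", "ORD-12346", "USER-789"]:
    print(key, "-> partition", partition_for(key))
```

This is why the producer examples below pass the order ID as the message key.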
Python - Kafka Producer
from kafka import KafkaProducer
import json
from datetime import datetime

class EventPublisher:
    def __init__(self):
        self.producer = KafkaProducer(
            bootstrap_servers=['localhost:9092'],
            value_serializer=lambda v: json.dumps(v).encode('utf-8'),
            key_serializer=lambda k: k.encode('utf-8') if k else None
        )

    def publish_event(self, topic, key, event_data):
        # Add metadata
        event_data['timestamp'] = datetime.utcnow().isoformat()

        future = self.producer.send(topic, key=key, value=event_data)

        # Block for synchronous send
        record_metadata = future.get(timeout=10)
        print(f"Event sent to {record_metadata.topic} partition {record_metadata.partition} offset {record_metadata.offset}")

    def close(self):
        self.producer.flush()
        self.producer.close()

# Usage
publisher = EventPublisher()

# Publish order created event
publisher.publish_event(
    topic='order-events',
    key='ORD-12345',  # Key determines partition
    event_data={
        "event_type": "OrderCreated",
        "order_id": "ORD-12345",
        "user_id": "USER-789",
        "total": 1999.98
    }
)

# Publish payment processed event
publisher.publish_event(
    topic='payment-events',
    key='ORD-12345',
    event_data={
        "event_type": "PaymentProcessed",
        "order_id": "ORD-12345",
        "amount": 1999.98,
        "payment_method": "credit_card"
    }
)
Python - Kafka Consumer
from kafka import KafkaConsumer
import json

class EventConsumer:
    def __init__(self, topic, group_id):
        self.consumer = KafkaConsumer(
            topic,
            bootstrap_servers=['localhost:9092'],
            group_id=group_id,
            value_deserializer=lambda m: json.loads(m.decode('utf-8')),
            auto_offset_reset='earliest',  # Start from beginning if no offset
            enable_auto_commit=True
        )

    def process_event(self, event_data):
        event_type = event_data.get('event_type')

        if event_type == 'OrderCreated':
            print(f"New order: {event_data['order_id']}")
            # Trigger inventory reservation
            # Send confirmation email

        elif event_type == 'PaymentProcessed':
            print(f"Payment confirmed for: {event_data['order_id']}")
            # Update order status
            # Trigger shipping

    def start_consuming(self):
        print("Waiting for events...")
        for message in self.consumer:
            print(f"Received from partition {message.partition} offset {message.offset}")
            self.process_event(message.value)

# Usage - multiple consumers in same group for load balancing
consumer = EventConsumer('order-events', group_id='order-processor-group')
consumer.start_consuming()

Pub/Sub Pattern

The Publish/Subscribe pattern allows multiple subscribers to receive the same message.

Netflix Event-Driven Architecture

  • Use Case: User watches a movie → trigger multiple actions
  • Publisher: Video Streaming Service publishes "VideoWatched" event
  • Subscribers:
    • Recommendation Service → Update user preferences
    • Analytics Service → Track viewing metrics
    • Billing Service → Update watch time for subscription
    • Continue Watching Service → Update progress
  • Benefits: Services are decoupled, can add new subscribers without changing publisher
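The Netflix flow can be sketched as an in-process event bus — subscribers register callbacks and the publisher never learns who is listening (an illustration of the pattern, not a replacement for a real broker):

```python
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> [callbacks]

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in self.subscribers[topic]:  # every subscriber gets a copy
            callback(event)

bus = EventBus()
received = []

# Each downstream service subscribes independently
bus.subscribe("VideoWatched", lambda e: received.append(("recommendations", e["video_id"])))
bus.subscribe("VideoWatched", lambda e: received.append(("analytics", e["video_id"])))
bus.subscribe("VideoWatched", lambda e: received.append(("continue-watching", e["video_id"])))

# The streaming service publishes once; all three subscribers react
bus.publish("VideoWatched", {"user_id": "USER-789", "video_id": "VID-42"})
print(received)  # three entries, one per subscriber
```

Adding a fourth subscriber requires no change to the publisher - the decoupling benefit listed above.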

Hands-On: Building Communication Patterns

Complete, production-ready code examples you can run and modify.

Example 1: REST API with JWT Authentication

Python - Authenticated REST API
from flask import Flask, jsonify, request
from flask_jwt_extended import JWTManager, create_access_token, jwt_required, get_jwt_identity

app = Flask(__name__)
app.config['JWT_SECRET_KEY'] = 'your-secret-key'  # Change in production
jwt = JWTManager(app)

# Mock database (plaintext passwords for demo only - store hashes in production)
users = {
    "user1": {"password": "pass123", "role": "admin"},
    "user2": {"password": "pass456", "role": "user"}
}

products = [
    {"id": 1, "name": "Laptop", "price": 999.99},
    {"id": 2, "name": "Mouse", "price": 29.99}
]

@app.route('/login', methods=['POST'])
def login():
    username = request.json.get('username')
    password = request.json.get('password')

    user = users.get(username)
    if not user or user['password'] != password:
        return jsonify({"error": "Invalid credentials"}), 401

    access_token = create_access_token(
        identity=username,
        additional_claims={"role": user['role']}
    )
    return jsonify({"access_token": access_token})

@app.route('/products', methods=['GET'])
@jwt_required()
def get_products():
    current_user = get_jwt_identity()
    return jsonify({
        "user": current_user,
        "products": products
    })

@app.route('/products', methods=['POST'])
@jwt_required()
def create_product():
    # Admin-only endpoint
    from flask_jwt_extended import get_jwt
    claims = get_jwt()
    if claims.get('role') != 'admin':
        return jsonify({"error": "Admin access required"}), 403

    data = request.json
    new_product = {
        "id": len(products) + 1,
        "name": data['name'],
        "price": data['price']
    }
    products.append(new_product)
    return jsonify(new_product), 201

if __name__ == '__main__':
    app.run(debug=True, port=5000)

# Usage Example:
# 1. Login: POST http://localhost:5000/login
#    Body: {"username": "user1", "password": "pass123"}
#    Response: {"access_token": "eyJ0eXAi..."}
#
# 2. Get Products: GET http://localhost:5000/products
#    Headers: Authorization: Bearer eyJ0eXAi...
#
# 3. Create Product: POST http://localhost:5000/products
#    Headers: Authorization: Bearer eyJ0eXAi...
#    Body: {"name": "Keyboard", "price": 79.99}
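To demystify the tokens flask-jwt-extended issues, here is the HS256 structure underneath — two base64url-encoded JSON segments plus an HMAC signature (a minimal sketch for illustration only; use the library, which also handles expiry, claims validation, and key management):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signature = b64url(hmac.new(secret.encode(), f"{header}.{body}".encode(),
                                hashlib.sha256).digest())
    return f"{header}.{body}.{signature}"

def verify_jwt(token: str, secret: str) -> bool:
    header, body, signature = token.split(".")
    expected = b64url(hmac.new(secret.encode(), f"{header}.{body}".encode(),
                               hashlib.sha256).digest())
    return hmac.compare_digest(signature, expected)  # constant-time comparison

token = sign_jwt({"sub": "user1", "role": "admin"}, "your-secret-key")
print(verify_jwt(token, "your-secret-key"))   # True
print(verify_jwt(token, "wrong-secret"))      # False
```

The payload is only encoded, not encrypted - never put secrets in JWT claims.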

Example 2: gRPC Bi-directional Streaming

chat.proto - Streaming Chat Service
syntax = "proto3";

package chat;

service ChatService {
    // Bi-directional streaming for real-time chat
    rpc Chat(stream ChatMessage) returns (stream ChatMessage);

    // Server streaming for chat history
    rpc GetHistory(HistoryRequest) returns (stream ChatMessage);
}

message ChatMessage {
    string user_id = 1;
    string message = 2;
    int64 timestamp = 3;
    string room_id = 4;
}

message HistoryRequest {
    string room_id = 1;
    int32 limit = 2;
}
Python - gRPC Streaming Server
import grpc
from concurrent import futures
import chat_pb2
import chat_pb2_grpc
from collections import defaultdict

class ChatServicer(chat_pb2_grpc.ChatServiceServicer):
    def __init__(self):
        self.rooms = defaultdict(list)  # room_id -> [messages]
        self.active_streams = defaultdict(list)  # room_id -> [response_streams]

    def Chat(self, request_iterator, context):
        """Bi-directional streaming for real-time chat"""
        # NOTE: plain lists are used for brevity; a production server would
        # use thread-safe queues and a lock around the shared room state
        room_id = None
        response_queue = []
        registered = False

        try:
            for message in request_iterator:
                room_id = message.room_id

                # Register this client's stream once, before broadcasting,
                # so it also receives its own first message
                if not registered:
                    self.active_streams[room_id].append(response_queue)
                    registered = True

                # Store message
                self.rooms[room_id].append(message)

                # Broadcast to all clients in this room
                for stream_queue in self.active_streams[room_id]:
                    stream_queue.append(message)

                # Yield pending messages for this client
                while response_queue:
                    yield response_queue.pop(0)
        finally:
            # Unregister this client's stream (by identity) on disconnect
            if room_id is not None:
                self.active_streams[room_id] = [
                    q for q in self.active_streams[room_id] if q is not response_queue
                ]

    def GetHistory(self, request, context):
        """Server streaming for chat history"""
        room_id = request.room_id
        limit = request.limit or 100

        messages = self.rooms[room_id][-limit:]
        for message in messages:
            yield message

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    chat_pb2_grpc.add_ChatServiceServicer_to_server(ChatServicer(), server)
    server.add_insecure_port('[::]:50051')
    print("gRPC Chat Server started on port 50051")
    server.start()
    server.wait_for_termination()

if __name__ == '__main__':
    serve()

Example 3: GraphQL Server with Mutations

Python - GraphQL Server (Graphene)
from flask import Flask
from flask_graphql import GraphQLView
import graphene
from datetime import datetime

# Mock database
products_db = [
    {"id": "1", "name": "Laptop", "price": 999.99, "stock": 50},
    {"id": "2", "name": "Mouse", "price": 29.99, "stock": 200}
]

class Product(graphene.ObjectType):
    id = graphene.ID()
    name = graphene.String()
    price = graphene.Float()
    stock = graphene.Int()

class Query(graphene.ObjectType):
    product = graphene.Field(Product, id=graphene.ID(required=True))
    products = graphene.List(Product)
    search_products = graphene.List(Product, name=graphene.String())

    def resolve_product(self, info, id):
        return next((p for p in products_db if p['id'] == id), None)

    def resolve_products(self, info):
        return products_db

    def resolve_search_products(self, info, name):
        return [p for p in products_db if name.lower() in p['name'].lower()]

class CreateProduct(graphene.Mutation):
    class Arguments:
        name = graphene.String(required=True)
        price = graphene.Float(required=True)
        stock = graphene.Int(required=True)

    product = graphene.Field(Product)

    def mutate(self, info, name, price, stock):
        new_id = str(len(products_db) + 1)
        product = {"id": new_id, "name": name, "price": price, "stock": stock}
        products_db.append(product)
        return CreateProduct(product=product)

class UpdateProduct(graphene.Mutation):
    class Arguments:
        id = graphene.ID(required=True)
        name = graphene.String()
        price = graphene.Float()
        stock = graphene.Int()

    product = graphene.Field(Product)

    def mutate(self, info, id, name=None, price=None, stock=None):
        product = next((p for p in products_db if p['id'] == id), None)
        if not product:
            raise Exception("Product not found")

        # Use 'is not None' so falsy values like price=0 are still applied
        if name is not None:
            product['name'] = name
        if price is not None:
            product['price'] = price
        if stock is not None:
            product['stock'] = stock

        return UpdateProduct(product=product)

class Mutation(graphene.ObjectType):
    create_product = CreateProduct.Field()
    update_product = UpdateProduct.Field()

schema = graphene.Schema(query=Query, mutation=Mutation)

app = Flask(__name__)
app.add_url_rule(
    '/graphql',
    view_func=GraphQLView.as_view('graphql', schema=schema, graphiql=True)
)

if __name__ == '__main__':
    app.run(debug=True, port=5000)
    # Access GraphiQL at http://localhost:5000/graphql

# Example Queries:
# query {
#   products {
#     id name price stock
#   }
# }
#
# mutation {
#   createProduct(name: "Keyboard", price: 79.99, stock: 100) {
#     product { id name price }
#   }
# }

Example 4: Production RabbitMQ with Retry Logic

Python - RabbitMQ with Dead Letter Queue
import pika
import json
import time

class RobustRabbitMQConsumer:
    def __init__(self):
        self.connection = None
        self.channel = None
        self.setup_connection()
        self.setup_queues()

    def setup_connection(self):
        """Setup connection with retry logic"""
        max_retries = 5
        for i in range(max_retries):
            try:
                self.connection = pika.BlockingConnection(
                    pika.ConnectionParameters(
                        'localhost',
                        heartbeat=600,
                        blocked_connection_timeout=300
                    )
                )
                self.channel = self.connection.channel()
                print("Connected to RabbitMQ")
                return
            except pika.exceptions.AMQPConnectionError:
                if i < max_retries - 1:
                    time.sleep(2 ** i)  # Exponential backoff
                else:
                    raise

    def setup_queues(self):
        """Setup main queue, retry queue, and dead letter queue"""
        # Dead Letter Exchange for failed messages
        self.channel.exchange_declare(
            exchange='dlx',
            exchange_type='direct',
            durable=True
        )

        # Dead Letter Queue - messages that failed all retries
        self.channel.queue_declare(
            queue='failed_orders',
            durable=True
        )
        self.channel.queue_bind(
            exchange='dlx',
            queue='failed_orders',
            routing_key='failed'
        )

        # Main queue with DLX
        self.channel.queue_declare(
            queue='orders',
            durable=True,
            arguments={
                'x-dead-letter-exchange': 'dlx',
                'x-dead-letter-routing-key': 'failed'
            }
        )

        # Fair dispatch
        self.channel.basic_qos(prefetch_count=1)

    def process_order(self, order_data):
        """Business logic - may fail"""
        print(f"Processing order: {order_data['order_id']}")

        # Simulate processing
        if order_data.get('should_fail'):
            raise Exception("Payment gateway timeout")

        # Success
        print(f"Order {order_data['order_id']} processed successfully")
        return True

    def callback(self, ch, method, properties, body):
        """Message handler with retry logic"""
        order_data = json.loads(body)

        # Track retry count
        headers = properties.headers or {}
        retry_count = headers.get('x-retry-count', 0)
        max_retries = 3

        try:
            self.process_order(order_data)
            ch.basic_ack(delivery_tag=method.delivery_tag)

        except Exception as e:
            print(f"Error processing order: {e}")

            if retry_count < max_retries:
                # Retry with exponential backoff
                retry_count += 1
                print(f"Retrying... (attempt {retry_count}/{max_retries})")

                # Republish with updated retry count
                ch.basic_publish(
                    exchange='',
                    routing_key='orders',
                    body=body,
                    properties=pika.BasicProperties(
                        delivery_mode=2,
                        headers={'x-retry-count': retry_count}
                    )
                )
                ch.basic_ack(delivery_tag=method.delivery_tag)

            else:
                # Max retries exceeded - send to DLQ
                print(f"Max retries exceeded for order {order_data['order_id']}")
                ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

    def start(self):
        """Start consuming messages"""
        self.channel.basic_consume(
            queue='orders',
            on_message_callback=self.callback
        )
        print("Waiting for orders. Press CTRL+C to exit")
        try:
            self.channel.start_consuming()
        except KeyboardInterrupt:
            self.channel.stop_consuming()
        finally:
            if self.connection:
                self.connection.close()

if __name__ == '__main__':
    consumer = RobustRabbitMQConsumer()
    consumer.start()

Try It Yourself
  1. Install dependencies: pip install flask flask-jwt-extended grpcio grpcio-tools graphene flask-graphql pika kafka-python
  2. Run RabbitMQ: docker run -p 5672:5672 -p 15672:15672 rabbitmq:management
  3. Run Kafka: docker-compose up -d kafka zookeeper
  4. Test each example and modify parameters to see how they behave

Practice Exercises

Build these projects to master service communication patterns.

Exercise 1: Build a Multi-Protocol API Gateway

Goal: Create an API Gateway that routes requests to backend services using different protocols.

Requirements:

  1. Accept HTTP REST requests from clients
  2. Route product queries to a gRPC service
  3. Route user auth to a REST service
  4. Route order submissions to a RabbitMQ queue
  5. Implement rate limiting (100 requests/minute per client)
  6. Add request correlation IDs for tracing

Tech Stack: Node.js/Express for gateway, Python gRPC service, RabbitMQ

Bonus: Add circuit breaker for backend service failures

Exercise 2: Real-Time Notification System

Goal: Build a notification system using event-driven architecture.

Requirements:

  1. Kafka producer publishes events (OrderPlaced, PaymentProcessed, ShippingStarted)
  2. Multiple consumers:
    • Email Service → Sends email notifications
    • SMS Service → Sends SMS for critical events
    • Push Notification Service → Mobile push notifications
    • Analytics Service → Tracks event metrics
  3. Ensure message ordering by user ID
  4. Handle consumer failures gracefully with retry
  5. Implement dead letter queue for failed messages

Tech Stack: Kafka, Python/Java consumers

Bonus: Add event versioning for backward compatibility

Exercise 3: GraphQL Federation for Microservices

Goal: Create a federated GraphQL schema across multiple microservices.

Requirements:

  1. Service 1 (Users): Manages user data
  2. Service 2 (Products): Manages product catalog
  3. Service 3 (Orders): Manages orders, references users and products
  4. Apollo Federation Gateway aggregates schemas
  5. Client can query across services in one request:
    query {
      order(id: "123") {
        id
        user { name email }
        items {
          product { name price }
          quantity
        }
      }
    }

Tech Stack: Apollo Federation, Node.js

Bonus: Add authentication and authorization per service

Solution Hints:

  • Use @key directive to mark entities that can be extended
  • Use @extends and @external for cross-service references
  • Implement __resolveReference for entity resolution
  • Gateway handles query planning and execution across services

Protocol Comparison & Decision Guide

Quick reference for choosing the right communication pattern.

Complete Protocol Comparison

Protocol  | Performance         | Browser Support   | Use Case                          | Pros                               | Cons
REST      | Good                | Native            | Public APIs, CRUD operations      | Simple, universal, human-readable  | Over-fetching, multiple round-trips
gRPC      | Excellent           | Requires gRPC-Web | Internal microservices, real-time | Fast, streaming, code generation   | Complex setup, binary format
GraphQL   | Good                | Native            | Mobile apps, complex data needs   | Flexible queries, no over-fetching | Complexity, caching challenges
RabbitMQ  | Good (~20K/sec)     | N/A               | Task queues, RPC                  | Reliable, routing flexibility      | Message size limits, persistence overhead
Kafka     | Excellent (1M+/sec) | N/A               | Event streaming, log aggregation  | High throughput, replay events     | Complex setup, higher latency
WebSocket | Good                | Native            | Real-time apps, chat, gaming      | Bi-directional, low latency        | Stateful, scaling challenges

Decision Tree: Choosing the Right Pattern

Decision Framework

  1. Need real-time bi-directional communication?
    • YES → Use WebSocket or gRPC streaming
    • NO → Continue...
  2. Internal service-to-service communication?
    • YES → Use gRPC (high performance) or REST (simplicity)
    • NO (external/public) → Continue...
  3. Need flexible data fetching for mobile/web?
    • YES → Use GraphQL
    • NO → Continue...
  4. Asynchronous processing needed?
    • YES → Continue to messaging decision...
    • NO → Use REST
  5. High throughput event streaming?
    • YES (100K+ events/sec) → Use Kafka
    • NO (task queues, <20K/sec) → Use RabbitMQ or AWS SQS
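The framework above can be encoded as a small helper for design reviews — a sketch that mirrors the tree (the function name and keyword arguments are ours):

```python
def recommend_protocol(*, bidirectional=False, internal=False,
                       flexible_fetching=False, asynchronous=False,
                       events_per_sec=0):
    """Walk the decision tree above and return a suggested pattern."""
    if bidirectional:
        return "WebSocket or gRPC streaming"
    if internal:
        return "gRPC (performance) or REST (simplicity)"
    if flexible_fetching:
        return "GraphQL"
    if not asynchronous:
        return "REST"
    if events_per_sec >= 100_000:
        return "Kafka"
    return "RabbitMQ or AWS SQS"

print(recommend_protocol(asynchronous=True, events_per_sec=500_000))  # Kafka
print(recommend_protocol(flexible_fetching=True))                     # GraphQL
print(recommend_protocol())                                           # REST
```

Real systems mix patterns, so treat the answer as a starting point, not a verdict.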

Best Practices

Idempotency

Same request = same result. Critical for retries and message reprocessing.

Implementation: Use idempotency keys in request headers

@app.route('/payments', methods=['POST'])
def process_payment():
    # 'cache' is a shared store such as Redis; clients send a unique
    # Idempotency-Key header per logical operation
    idempotency_key = request.headers.get('Idempotency-Key')
    cached = cache.get(idempotency_key)
    if cached:
        return cached  # Replay the stored response instead of charging again

    result = charge_payment(request.json)
    cache.set(idempotency_key, result, ttl=86400)  # Cache for 24 hours
    return result
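A self-contained version of the same idea, with a plain dict standing in for the shared cache (in production the cache lives in Redis or similar and entries expire):

```python
charge_count = 0
idempotency_cache = {}  # idempotency key -> cached response

def charge_payment(amount):
    global charge_count
    charge_count += 1                      # side effect we must not repeat
    return {"status": "charged", "amount": amount}

def process_payment(idempotency_key, amount):
    if idempotency_key in idempotency_cache:
        return idempotency_cache[idempotency_key]  # replay, don't re-charge
    result = charge_payment(amount)
    idempotency_cache[idempotency_key] = result
    return result

first = process_payment("key-abc", 49.99)
retry = process_payment("key-abc", 49.99)  # client retry after a timeout
print(first == retry, charge_count)        # True 1 - charged exactly once
```

The retry returns an identical response while the charge happens only once - exactly the property retries and message redelivery require.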

Correlation IDs

Track requests across multiple services for debugging and monitoring.

Implementation: Pass correlation ID in headers across all service calls

import logging
import uuid

import requests
from flask import Flask, g, request

app = Flask(__name__)
logger = logging.getLogger(__name__)

@app.before_request
def add_correlation_id():
    correlation_id = request.headers.get('X-Correlation-ID') or str(uuid.uuid4())
    g.correlation_id = correlation_id

@app.route('/products/<int:id>')
def get_product(id):
    # Log with correlation ID
    logger.info(f"[{g.correlation_id}] Fetching product {id}")

    # Pass the same ID to downstream services
    response = requests.get(
        'http://inventory-service/check',
        headers={'X-Correlation-ID': g.correlation_id}
    )
    return response.json()

Timeout Configuration

Always set timeouts to prevent hanging requests.

Service Type             | Recommended Timeout
Database queries         | 5-10 seconds
Microservice calls       | 2-5 seconds
External APIs            | 10-30 seconds
Message queue processing | 30-60 seconds
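The rule in stdlib form — every outbound call gets a deadline and an explicit failure path (the address below is deliberately non-routable for the demo):

```python
import urllib.request

def fetch_with_timeout(url, timeout_s=5.0):
    """Fetch a URL, returning None instead of hanging on a slow dependency."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except OSError:  # covers socket.timeout, refused connections, DNS failures
        return None

# Non-routable address: the call fails within the deadline instead of hanging
print(fetch_with_timeout("http://10.255.255.1/health", timeout_s=0.5))  # None
```

With `requests`, the equivalent is the `timeout=(connect, read)` argument plus handling `requests.exceptions.Timeout`.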

Real-World Architecture Examples

Netflix Communication Architecture

  • Client to API: REST + GraphQL (mobile uses GraphQL for flexible queries)
  • Service-to-Service: gRPC for low-latency internal calls
  • Event Processing: Kafka for user activity streams (billions of events/day)
  • Caching: EVCache (memcached) to reduce backend calls
  • Scale: 1000+ microservices, 100B+ API calls/day

Shopify Communication Patterns

  • Public API: GraphQL for merchant apps and integrations
  • Internal Services: REST + gRPC hybrid
  • Background Jobs: Kafka + Sidekiq (Ruby)
  • Real-time: WebSocket for live order notifications
  • Scale: 1M+ merchants, peak 80K requests/second

Common Anti-Patterns
  • Synchronous Cascade: Service A → B → C → D in sync. Use async for non-critical paths.
  • No Timeout: Requests hang forever. Always set timeouts.
  • Ignoring Backpressure: Producer overwhelms consumer. Use rate limiting and queue depth monitoring.
  • Large Payload Sync Calls: Sending MBs via REST. Use streaming or async file transfer.
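The backpressure item has a simple first defense: a bounded queue, which forces the producer to make an explicit decision (shed, block, or slow down) instead of silently overrunning the consumer:

```python
import queue

# maxsize bounds how far the producer can get ahead of the consumer
buffer = queue.Queue(maxsize=3)

accepted, shed = 0, 0
for event_id in range(10):       # a burst of 10 events, consumer not keeping up
    try:
        buffer.put_nowait(event_id)
        accepted += 1
    except queue.Full:           # the backpressure signal
        shed += 1

print(accepted, shed)  # 3 7 - the bound forced an explicit decision
```

Brokers expose the same signal differently - queue depth alarms in RabbitMQ, consumer lag metrics in Kafka - but the principle is identical: monitor the bound and react before it overflows.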