Data in Microservices

One of the biggest challenges in microservices is data management: each service should own its data, and services should never share a database.

Database Per Service Pattern

Core Principle

Each microservice has its own database. Other services can't access it directly; they must go through the owning service's API.
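Here is a minimal in-process sketch of that API boundary. The `CustomerService` and `OrderService` classes are hypothetical, and a plain method call stands in for what would be an HTTP or gRPC request between separately deployed services. The Order Service never touches the Customer Service's private store. It reads customer data only through `get_customer`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    email: str


class CustomerService:
    """Owns customer data; its store is private to this service (hypothetical)."""

    def __init__(self) -> None:
        # Stand-in for the Customer Service's own database.
        self._db: dict[str, Customer] = {}

    def create_customer(self, customer: Customer) -> None:
        self._db[customer.customer_id] = customer

    # Public API: the only way other services may read customer data.
    def get_customer(self, customer_id: str) -> Optional[Customer]:
        return self._db.get(customer_id)


class OrderService:
    """Owns order data; reads customers only through CustomerService's API."""

    def __init__(self, customers: CustomerService) -> None:
        self._orders: dict[str, dict] = {}  # the Order Service's own store
        self._customers = customers

    def place_order(self, order_id: str, customer_id: str, total: float) -> dict:
        # An API call (HTTP/gRPC in a real deployment), never a query on another service's DB.
        customer = self._customers.get_customer(customer_id)
        if customer is None:
            raise ValueError(f"unknown customer {customer_id}")
        order = {"order_id": order_id, "customer_id": customer_id, "total": total}
        self._orders[order_id] = order
        return order
```

If the Customer Service later moves from one database engine to another, the Order Service is unaffected, because it depends only on `get_customer`.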

Why?

  • Independence: Services can be deployed separately
  • Technology Choice: Use the right database for the job
  • Scalability: Scale databases independently
  • Resilience: One database failure doesn't affect all services

Database Per Service Example
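# For brevity the three connections are shown together; in practice each
# lives in its own service's codebase and no service can reach the others' databases.
import psycopg2   # PostgreSQL driver
import pymongo    # MongoDB driver
import redis      # Redis client

# Order Service - PostgreSQL (relational)
order_db = psycopg2.connect(host="order-db", dbname="orders")

# Product Catalog - MongoDB (document)
catalog_db = pymongo.MongoClient(host="catalog-db")["products"]

# Session Store - Redis (key-value)
session_db = redis.Redis(host="session-db")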

The Challenge: Data Consistency

Problem

Order Service needs customer data, but Customer Service owns it. How do we keep data in sync?

Solutions:

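  • API Calls: Query the owning service in real time (simple, but adds latency and fails when that service is down)
  • Data Replication: Keep a local copy of the data you need (fast reads, but the copy can be stale)
  • Event-Driven: Subscribe to the owner's data change events and update your local copy (often the best balance)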
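The event-driven option can be sketched as follows. It assumes a hypothetical in-memory `EventBus` in place of a real message broker and a hypothetical `CustomerUpdated` event. The Order Service keeps a local copy of only the customer fields it needs and refreshes that copy whenever the Customer Service publishes a change.

```python
from collections import defaultdict
from typing import Callable, Optional


class EventBus:
    """In-memory stand-in for a message broker such as Kafka (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._subscribers[event_type]:
            handler(payload)


class CustomerService:
    """Owns customer data and announces every change."""

    def __init__(self, bus: EventBus) -> None:
        self._db: dict[str, dict] = {}
        self._bus = bus

    def update_email(self, customer_id: str, email: str) -> None:
        record = self._db.setdefault(customer_id, {"customer_id": customer_id})
        record["email"] = email
        # Publish a copy of the changed record after writing it locally.
        self._bus.publish("CustomerUpdated", dict(record))


class OrderService:
    """Keeps a read-only local replica of the customer fields it needs."""

    def __init__(self, bus: EventBus) -> None:
        self.customer_replica: dict[str, dict] = {}
        bus.subscribe("CustomerUpdated", self._on_customer_updated)

    def _on_customer_updated(self, event: dict) -> None:
        self.customer_replica[event["customer_id"]] = {"email": event["email"]}

    def email_for(self, customer_id: str) -> Optional[str]:
        # Served from the local replica; no call to the Customer Service.
        entry = self.customer_replica.get(customer_id)
        return entry["email"] if entry else None
```

Reads in the Order Service stay fast even when the Customer Service is down. The cost is that the replica lags behind the source. A real system must also handle the case where the local write succeeds but the publish fails, which is one motivation for Change Data Capture.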

Data Synchronization Patterns

Techniques for maintaining data consistency across services.

Change Data Capture (CDC)

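Capture row-level changes from a service's database, typically by reading its transaction log (for PostgreSQL, the write-ahead log), and stream them as events to other services.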

Debezium CDC Configuration
{
  "name": "customer-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "customer-db",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "<debezium-password>",
    "database.dbname": "customers",
    "table.include.list": "public.customers",
    "topic.prefix": "customer-events"
  }
}
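With `topic.prefix` set to `customer-events`, Debezium writes changes from `public.customers` to the Kafka topic `customer-events.public.customers`. Each message value carries a change-event envelope with `before`, `after`, and `op` fields (`c` create, `u` update, `d` delete, `r` snapshot read). A consuming service could apply those events to its own copy roughly as shown below. The sketch assumes JSON-serialized values and a primary-key column named `id` (hypothetical), and it leaves out the Kafka consumer itself.

```python
import json
from typing import Optional


def apply_change_event(replica: dict, raw_value: Optional[bytes], key_field: str = "id") -> None:
    """Apply one Debezium change event (the message value) to an in-memory replica.

    Handles envelopes with or without the "schema"/"payload" wrapper that the
    JSON converter adds when schemas are enabled. `key_field` is an assumed
    primary-key column name.
    """
    if raw_value is None:
        # Tombstone emitted after a delete (for Kafka log compaction): nothing to apply.
        return
    event = json.loads(raw_value)
    payload = event.get("payload", event)  # unwrap if schemas are enabled
    op = payload["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read, or update: upsert the new row state.
        row = payload["after"]
        replica[row[key_field]] = row
    elif op == "d":
        # Delete: "after" is null, so take the key from "before".
        replica.pop(payload["before"][key_field], None)
    # Other operations (e.g. truncate) are ignored in this sketch.
```

Because updates are upserts keyed by primary key, replaying the same event twice leaves the replica unchanged, which helps with at-least-once delivery.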

Eventual Consistency

Accept that data won't be immediately consistent across all services. After an update, other services may briefly see old values, but once updates stop, every copy converges to the same state.

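| Pattern              | Consistency | Availability | Use Case             |
|----------------------|-------------|--------------|----------------------|
| Strong consistency   | Immediate   | Lower        | Banking transactions |
| Eventual consistency | Delayed     | Higher       | Social media likes   |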
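A toy simulation of eventual consistency follows. It assumes a single primary that queues its writes and a replica that applies them later. The `Primary` and `Replica` classes and the `replicate` helper are hypothetical. Reads from the replica can return a stale like count, but once the queue drains, the replica matches the primary.

```python
from collections import deque


class Primary:
    """Accepts writes and queues them for asynchronous replication."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.replication_log: deque = deque()

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.replication_log.append((key, value))  # shipped later, not immediately


class Replica:
    """Serves reads from whatever has been replicated so far."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}

    def read(self, key: str) -> int:
        return self.data.get(key, 0)


def replicate(primary: Primary, replica: Replica, max_events: int) -> None:
    """Apply up to max_events queued writes in order, simulating replication lag."""
    for _ in range(min(max_events, len(primary.replication_log))):
        key, value = primary.replication_log.popleft()
        replica.data[key] = value
```

For example, after three likes are written to the primary, the replica still reports 0. After one replicated event it reports 1, and after the log is drained it reports 3, the same as the primary.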

Facebook's Data Consistency

  • Approach: Eventual consistency for most features
  • Example: Like counts may be slightly off for a few seconds
  • Benefit: Massive scalability (2+ billion users)
  • Trade-off: Users may briefly see stale counts, which is acceptable for low-stakes data like likes

Advanced Data Patterns

Sophisticated approaches for data management at scale.

Polyglot Persistence

Use different database technologies for different services based on their needs.

Database Selection Strategy
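# Illustrative client setup; in practice each client lives in its own service.
import boto3
import psycopg2
import redis
from cassandra.cluster import Cluster
from elasticsearch import Elasticsearch

# User Service: PostgreSQL for relational data
users = psycopg2.connect(host="user-db")

# Product Catalog: Elasticsearch for full-text search
products = Elasticsearch("http://product-search:9200")

# Shopping Cart: Redis for fast key-value access
cart = redis.Redis(host="cart-cache")

# Analytics: Cassandra for time-series data
analytics = Cluster(["analytics-db"]).connect()

# File Storage: S3 for large objects
files = boto3.resource("s3").Bucket("file-bucket")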

Data Mesh Architecture

Treat data as a product, with domain teams owning their data end-to-end.

Data Mesh Principles
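  • Domain Ownership: Domain teams own their data and its pipelines end to end
  • Data as a Product: Data is published for other teams as a high-quality, documented, discoverable product
  • Self-Serve Platform: Shared infrastructure lets domain teams build and run data products themselves
  • Federated Governance: Common standards (e.g. interoperability, security) are agreed across domains rather than imposed by a central team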

Uber's Data Management

  • Scale: 100+ petabytes of data
  • Databases: MySQL, PostgreSQL, Cassandra, Redis, DynamoDB
  • Replication: Kafka for change data capture
  • Key Insight: No one-size-fits-all database solution