Schema Registry & Avro

Schema evolution, compatibility modes, and Avro serialization.

Intermediate · 35 min read · 📨 Kafka

Why Schema Registry?

Kafka stores messages as raw bytes. Without schemas, producers and consumers must agree on the message format out-of-band. This breaks in practice — a producer adds a field, a consumer doesn't know about it, and deserialization fails at 3am on a Saturday.

Schema Registry solves this by providing a central registry of message schemas. Producers register schemas before sending, consumers fetch schemas before reading, and the registry enforces compatibility rules to prevent breaking changes.

Apache Avro

Avro is the most common serialization format with Schema Registry. It's compact (binary), fast, and supports schema evolution. Unlike JSON, Avro messages don't include field names — the schema defines the structure, and the data is just values.

{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}

The age field uses a union type ["null", "int"] with a default of null. This means age is optional — old messages without an age field can still be read by consumers expecting one.
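
Here is how that schema is used from a producer. This is a minimal sketch with the confluent-kafka Python client; the broker address, registry URL, and topic name (users) are placeholders rather than anything defined in this tutorial.

# producer_sketch.py — minimal Avro producer (assumes confluent-kafka[avro] is installed)
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

USER_SCHEMA = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}
"""

# The serializer registers the schema under the subject "users-value" on first use
# (default topic-name subject strategy, auto-registration enabled by default).
schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL
serializer = AvroSerializer(schema_registry, USER_SCHEMA)

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker

user = {"id": 1, "name": "Ada", "email": "ada@example.com", "age": None}  # age is optional
producer.produce(
    topic="users",
    value=serializer(user, SerializationContext("users", MessageField.VALUE)),
)
producer.flush()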

Compatibility Modes

Schema Registry enforces compatibility between schema versions to prevent breaking consumers:

Mode     | Rule                         | Safe Changes                     | Unsafe Changes
---------|------------------------------|----------------------------------|---------------------------
BACKWARD | New schema can read old data | Add optional field, delete field | Add required field
FORWARD  | Old schema can read new data | Delete optional field, add field | Delete required field
FULL     | Both directions              | Add/delete optional fields only  | Any required field change
NONE     | No checks                    | Anything goes                    | n/a
Key Takeaway: Use BACKWARD compatibility (the default). It guarantees that consumers on the newest schema can still read data written with older schema versions, so upgrade consumers before producers. Always add new fields as optional with defaults.
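
You can see what BACKWARD compatibility promises without a running registry, because it falls out of Avro's own schema-resolution rules. The sketch below (a minimal example assuming the fastavro library, which is not part of this tutorial's setup) writes a record with the old schema that has no age field, then reads it back with the new schema; the reader fills in the default null.

# backward_compat_sketch.py — Avro schema resolution demo (assumes fastavro is installed)
import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader

# v1: the schema old producers wrote with (no "age" field)
writer_schema = parse_schema({
    "type": "record", "name": "User", "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
    ],
})

# v2: the schema a newer consumer uses ("age" added as optional with a default)
reader_schema = parse_schema({
    "type": "record", "name": "User", "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
})

# Encode an old-style record: the bytes contain only values, no field names.
buf = io.BytesIO()
schemaless_writer(buf, writer_schema, {"id": 1, "name": "Ada", "email": "ada@example.com"})

# Decode with the new schema: the missing field comes back as its default.
buf.seek(0)
record = schemaless_reader(buf, writer_schema, reader_schema)
print(record)  # {'id': 1, 'name': 'Ada', 'email': 'ada@example.com', 'age': None}
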
# Register a schema
curl -X POST http://localhost:8081/subjects/users-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}"}'

# Check compatibility before registering
curl -X POST http://localhost:8081/compatibility/subjects/users-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "..."}'

# List all subjects
curl http://localhost:8081/subjects
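
The client libraries wrap these same REST endpoints. A rough equivalent using the confluent-kafka Python client's SchemaRegistryClient might look like the sketch below; the subject name and schema are placeholders, and exact method names can differ between client versions.

# registry_admin_sketch.py — Schema Registry admin calls (assumes a recent confluent-kafka)
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

client = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL

user_v1 = Schema(
    '{"type":"record","name":"User","fields":['
    '{"name":"id","type":"int"},{"name":"name","type":"string"}]}',
    schema_type="AVRO",
)

# Register a schema (same as the first curl call above); returns the schema id.
schema_id = client.register_schema("users-value", user_v1)
print("registered schema id:", schema_id)

# Pin the subject to BACKWARD compatibility so breaking changes are rejected.
client.set_compatibility("users-value", "BACKWARD")

# List all subjects.
print(client.get_subjects())
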
🔍 Deep Dive: Protobuf and JSON Schema

While Avro is the most common format, Schema Registry also supports Protobuf (popular in gRPC services) and JSON Schema (human-readable but less efficient). Choose Avro for new projects, Protobuf if your team already uses gRPC, and JSON Schema only if human readability of raw messages is critical for debugging.

Schema Registry Flow

Producer → serialize (schema validated against Schema Registry) → Kafka (store raw bytes) → Consumer (fetch schema, deserialize)
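
The consumer half of that flow is the producer sketch in reverse: the deserializer looks up the writer's schema in the registry (by the id embedded in each message) and decodes the bytes. A minimal sketch, again assuming the confluent-kafka client and the placeholder addresses used earlier:

# consumer_sketch.py — minimal Avro consumer (assumes confluent-kafka[avro] is installed)
from confluent_kafka import Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL
deserializer = AvroDeserializer(schema_registry)  # writer schema is fetched from the registry

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "user-readers",             # placeholder consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["users"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        user = deserializer(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE))
        print(user)  # dict such as {'id': 1, 'name': 'Ada', ...}
finally:
    consumer.close()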

Practice Exercises

Medium: Build a Mini Project

Combine concepts from this tutorial to build a small utility or tool.

Medium: Debug Challenge

Introduce a bug in one of the code examples and practice finding and fixing it.

Hard: Refactoring Exercise

Rewrite one example using a different approach and compare the tradeoffs.