Schema Registry & Avro

Schema evolution, compatibility modes, and Avro serialization.

Intermediate · 35 min read · 📨 Kafka

Why Schema Registry?

Kafka stores messages as raw bytes. Without schemas, producers and consumers must agree on the message format out-of-band. This breaks in practice — a producer adds a field, a consumer doesn't know about it, and deserialization fails at 3am on a Saturday.

Schema Registry solves this by providing a central registry of message schemas. Producers register schemas before sending, consumers fetch schemas before reading, and the registry enforces compatibility rules to prevent breaking changes.

Apache Avro

Avro is the most common serialization format with Schema Registry. It's compact (binary), fast, and supports schema evolution. Unlike JSON, Avro messages don't include field names — the schema defines the structure, and the data is just values.

{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}

The age field uses a union type ["null", "int"] with a default of null. This means age is optional — old messages without an age field can still be read by consumers expecting one.
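
Here is how that schema is used from a producer. This is a minimal sketch with the confluent-kafka Python client; the broker address, registry URL, and topic name (users) are placeholders rather than anything defined in this tutorial.

# producer_sketch.py — minimal Avro producer (assumes confluent-kafka[avro] is installed)
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

USER_SCHEMA = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}
"""

# The serializer registers the schema under the subject "users-value" on first use
# (default topic-name subject strategy, auto-registration enabled by default).
schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL
serializer = AvroSerializer(schema_registry, USER_SCHEMA)

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker

user = {"id": 1, "name": "Ada", "email": "ada@example.com", "age": None}  # age is optional
producer.produce(
    topic="users",
    value=serializer(user, SerializationContext("users", MessageField.VALUE)),
)
producer.flush()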

Compatibility Modes

Schema Registry enforces compatibility between schema versions to prevent breaking consumers:

Mode     | Rule                         | Safe Changes                     | Unsafe Changes
---------|------------------------------|----------------------------------|---------------------------
BACKWARD | New schema can read old data | Add optional field, delete field | Add required field
FORWARD  | Old schema can read new data | Delete optional field, add field | Delete required field
FULL     | Both directions              | Add/delete optional fields only  | Any required field change
NONE     | No checks                    | Anything goes                    | n/a
Key Takeaway: Use BACKWARD compatibility (the default). It guarantees that consumers on the newest schema can still read data written with older schema versions, so upgrade consumers before producers. Always add new fields as optional with defaults.
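
You can see what BACKWARD compatibility promises without a running registry, because it falls out of Avro's own schema-resolution rules. The sketch below (a minimal example assuming the fastavro library, which is not part of this tutorial's setup) writes a record with the old schema that has no age field, then reads it back with the new schema; the reader fills in the default null.

# backward_compat_sketch.py — Avro schema resolution demo (assumes fastavro is installed)
import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader

# v1: the schema old producers wrote with (no "age" field)
writer_schema = parse_schema({
    "type": "record", "name": "User", "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
    ],
})

# v2: the schema a newer consumer uses ("age" added as optional with a default)
reader_schema = parse_schema({
    "type": "record", "name": "User", "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
})

# Encode an old-style record: the bytes contain only values, no field names.
buf = io.BytesIO()
schemaless_writer(buf, writer_schema, {"id": 1, "name": "Ada", "email": "ada@example.com"})

# Decode with the new schema: the missing field comes back as its default.
buf.seek(0)
record = schemaless_reader(buf, writer_schema, reader_schema)
print(record)  # {'id': 1, 'name': 'Ada', 'email': 'ada@example.com', 'age': None}
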
# Register a schema
curl -X POST http://localhost:8081/subjects/users-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}"}'

# Check compatibility before registering
curl -X POST http://localhost:8081/compatibility/subjects/users-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "..."}'

# List all subjects
curl http://localhost:8081/subjects
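
The client libraries wrap these same REST endpoints. A rough equivalent using the confluent-kafka Python client's SchemaRegistryClient might look like the sketch below; the subject name and schema are placeholders, and exact method names can differ between client versions.

# registry_admin_sketch.py — Schema Registry admin calls (assumes a recent confluent-kafka)
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

client = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL

user_v1 = Schema(
    '{"type":"record","name":"User","fields":['
    '{"name":"id","type":"int"},{"name":"name","type":"string"}]}',
    schema_type="AVRO",
)

# Register a schema (same as the first curl call above); returns the schema id.
schema_id = client.register_schema("users-value", user_v1)
print("registered schema id:", schema_id)

# Pin the subject to BACKWARD compatibility so breaking changes are rejected.
client.set_compatibility("users-value", "BACKWARD")

# List all subjects.
print(client.get_subjects())
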
🔍 Deep Dive: Protobuf and JSON Schema

While Avro is the most common format, Schema Registry also supports Protobuf (popular in gRPC services) and JSON Schema (human-readable but less efficient). Choose Avro for new projects, Protobuf if your team already uses gRPC, and JSON Schema only if human readability of raw messages is critical for debugging.

Schema Registry Flow

Producer → serialize (schema validated against Schema Registry) → Kafka (store raw bytes) → Consumer (fetch schema, deserialize)
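
The consumer half of that flow is the producer sketch in reverse: the deserializer looks up the writer's schema in the registry (by the id embedded in each message) and decodes the bytes. A minimal sketch, again assuming the confluent-kafka client and the placeholder addresses used earlier:

# consumer_sketch.py — minimal Avro consumer (assumes confluent-kafka[avro] is installed)
from confluent_kafka import Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL
deserializer = AvroDeserializer(schema_registry)  # writer schema is fetched from the registry

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "user-readers",             # placeholder consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["users"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        user = deserializer(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE))
        print(user)  # dict such as {'id': 1, 'name': 'Ada', ...}
finally:
    consumer.close()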

Practice Exercises

Medium: Build a Mini Project

Combine concepts from this tutorial to build a small utility or tool.

Medium: Debug Challenge

Introduce a bug in one of the code examples and practice finding and fixing it.

Hard: Refactoring Exercise

Rewrite one example using a different approach and compare the tradeoffs.