ksqlDB & Stream Processing

SQL-like stream processing with persistent queries, materialized views, and push queries.

Advanced 40 min read 📨 Kafka

What is ksqlDB?

ksqlDB lets you write stream processing logic using SQL instead of Java/Python code. It runs on top of Kafka Streams but exposes a familiar SQL interface. You can filter, join, aggregate, and window Kafka data with simple SQL statements.

-- Create a stream from a Kafka topic
CREATE STREAM page_views (
  user_id VARCHAR KEY,
  page VARCHAR,
  timestamp BIGINT
) WITH (
  KAFKA_TOPIC = 'page-views',
  VALUE_FORMAT = 'JSON'
);

-- Filter: only homepage views
CREATE STREAM homepage_views AS
  SELECT * FROM page_views WHERE page = '/home';

-- Aggregate: count views per page in 5-minute windows
CREATE TABLE page_view_counts AS
  SELECT page, COUNT(*) AS view_count
  FROM page_views
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY page;

-- Query the materialized table
SELECT * FROM page_view_counts WHERE page = '/home';

Streams vs Tables in ksqlDB

ConceptSQL AnalogyKafka AnalogyExample
STREAMUnbounded INSERT-only tableKStream (event log)Page views, clicks, orders
TABLEMaterialized view with primary keyKTable (latest per key)User profiles, account balances

Push vs Pull Queries

Pull queries (like traditional SQL) return current state: SELECT * FROM users WHERE id = 42;

Push queries stream continuous results: SELECT * FROM page_views EMIT CHANGES; — stays open and pushes every new event.

Key Takeaway: ksqlDB is the fastest way to build stream processing without writing code. Use it for real-time dashboards, alerting, data enrichment, and ETL. For complex logic, use Kafka Streams directly.

Practice Exercises

Hard Production Scenario

Design a solution using these concepts for a real-world production system.

Hard Performance Analysis

Benchmark two different approaches and explain which is better and why.