Getting Started with Python

What is ksqlDB?

ksqlDB lets you write stream processing logic using SQL instead of Java/Python code. It runs on top of Kafka Streams but exposes a familiar SQL interface. You can filter, join, aggregate, and window Kafka data with simple SQL statements.

-- Create a stream from a Kafka topic
CREATE STREAM page_views (
  user_id VARCHAR KEY,
  page VARCHAR,
  timestamp BIGINT
) WITH (
  KAFKA_TOPIC = 'page-views',
  VALUE_FORMAT = 'JSON'
);

-- Filter: only homepage views
CREATE STREAM homepage_views AS
  SELECT * FROM page_views WHERE page = '/home';

-- Aggregate: count views per page in 5-minute windows
CREATE TABLE page_view_counts AS
  SELECT page, COUNT(*) AS view_count
  FROM page_views
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY page;

-- Query the materialized table
SELECT * FROM page_view_counts WHERE page = '/home';

Streams vs Tables in ksqlDB

Concept	SQL Analogy	Kafka Analogy	Example
STREAM	Unbounded INSERT-only table	KStream (event log)	Page views, clicks, orders
TABLE	Materialized view with primary key	KTable (latest per key)	User profiles, account balances

Push vs Pull Queries

Pull queries (like traditional SQL) return current state: SELECT * FROM users WHERE id = 42;

Push queries stream continuous results: SELECT * FROM page_views EMIT CHANGES; — stays open and pushes every new event.

Key Takeaway: ksqlDB is the fastest way to build stream processing without writing code. Use it for real-time dashboards, alerting, data enrichment, and ETL. For complex logic, use Kafka Streams directly.

ksqlDB & Stream Processing

What is ksqlDB?

Streams vs Tables in ksqlDB

Push vs Pull Queries

Practice Exercises

Hard Production Scenario

Hard Performance Analysis

What is ksqlDB?

Streams vs Tables in ksqlDB

Push vs Pull Queries

Practice Exercises

Hard Production Scenario

Hard Performance Analysis

Related Topics