Kafka-Streams

Kafka Streams: Tick stream-time with control messages

Kafka Streams is in many ways governed by the concept of time. For instance, as soon as stateful operations are used, the event-time drives how events are grouped, joined, and emitted. Stream-time is the concept within Kafka Streams representing the largest timestamp seen by the the stream application (per-partition). In comparison with wall-clock time (i.e. system time) — at the execution of an application — stream-time is driven by the data seen by the application. This ensures that the results produced by a Kafka Streams application are reproducible. One nuance of stream-time is that it needs incoming events to “tick”. This could represent an issue for events that are sparse in time, and we expect results to be produced more often (e.g. windows to be closed and emit, punctiation to be calculated). This is a known issue, and there are some proposals to overcome it in certain parts of the framework, e.g. KIP-424. This post covers a proof-of-concept instrumenting producers to emit contol messages to advance stream time.
Read more
poc kafka-streams dev

Kafka Streams FK-join within the same KTable

KTable to KTable foreign-key joins is one of the coolest features in Kafka Streams. I was wondering whether this feature would handle FK-joins between values on the same table.
Read more
til kafka-streams streaming-joins dev

Kafka Streams abstracts access to multiple tasks state stores when reading

Kafka Streams applications could scale either horizontally (add more instances) or vertically (add more threads). When scaled vertically, multiple tasks store multiple partitions locally. An interesting question is whether Kafka Streams gives access when reading (i.e. Interactive Queries) to these stores, and how does it manage to abstract the access to different stores managed by multiple tasks.
Read more
til kafka-streams api dev