Kafka Streams: Tick stream-time with control messages
Kafka Streams is in many ways governed by the concept of time. For instance, as soon as stateful operations are used, the event-time drives how events are grouped, joined, and emitted.
Stream-time is the concept within Kafka Streams representing the largest timestamp seen by the the stream application (per-partition). In comparison with wall-clock time (i.e. system time) — at the execution of an application — stream-time is driven by the data seen by the application. This ensures that the results produced by a Kafka Streams application are reproducible.
One nuance of stream-time is that it needs incoming events to “tick”. This could represent an issue for events that are sparse in time, and we expect results to be produced more often (e.g. windows to be closed and emit, punctiation to be calculated).
This is a known issue, and there are some proposals to overcome it in certain parts of the framework, e.g. KIP-424.
This post covers a proof-of-concept instrumenting producers to emit contol messages to advance stream time.
Read more