Dev

Piggyback on Kafka Connect Schemas to process Kafka records in a generic way

When reading from or writing to Kafka topics, a serializer/deserializer (a.k.a. SerDes) is needed to process record key and value bytes. Specific SerDes that turn bytes into concrete objects (e.g. POJOs) are commonly used, unless a generic JSON object or Avro structure is used instead. Kafka Connect has to deal with generic structures to apply message transformations and to convert messages from external sources into Kafka records and vice versa. For this it has a composite type, SchemaAndValue, that pairs a Connect Schema (derived from Schema Registry or from a JSON Schema included in the payload) with a value object.
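A minimal sketch of that generic processing, assuming a JSON payload with an embedded schema (the class name, topic, and sample payload below are illustrative): Kafka Connect's JsonConverter turns the raw bytes into a SchemaAndValue whose schema can then be walked field by field.

```java
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.json.JsonConverter;

import java.nio.charset.StandardCharsets;
import java.util.Map;

public class GenericProcessing {
  public static void main(String[] args) {
    // JsonConverter ships with Kafka Connect and yields a SchemaAndValue
    // when the payload embeds its JSON Schema.
    var converter = new JsonConverter();
    converter.configure(Map.of("schemas.enable", "true"), /* isKey */ false);

    var payload = """
        {"schema":{"type":"struct","fields":[
           {"field":"id","type":"int64"},
           {"field":"name","type":"string"}]},
         "payload":{"id":42,"name":"jane"}}""";

    SchemaAndValue data =
        converter.toConnectData("my-topic", payload.getBytes(StandardCharsets.UTF_8));

    // Process generically: walk the Connect Schema instead of binding to a POJO.
    if (data.schema().type() == Schema.Type.STRUCT) {
      var struct = (Struct) data.value();
      for (Field field : data.schema().fields()) {
        System.out.printf("%s (%s) = %s%n",
            field.name(), field.schema().type(), struct.get(field));
      }
    }
  }
}
```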
Read more
til kafka connect dev

Kafka Streams: Tick stream-time with control messages

Kafka Streams is in many ways governed by the concept of time. For instance, as soon as stateful operations are used, event-time drives how events are grouped, joined, and emitted. Stream-time is the concept within Kafka Streams representing the largest timestamp seen by the stream application (per partition). In comparison with wall-clock time (i.e. system time) at execution, stream-time is driven by the data seen by the application. This ensures that the results produced by a Kafka Streams application are reproducible. One nuance of stream-time is that it needs incoming events to “tick”. This becomes an issue when events are sparse in time but results are expected more often (e.g. windows to be closed and emitted, punctuations to be triggered). This is a known issue, and there are some proposals to overcome it in certain parts of the framework, e.g. KIP-424. This post covers a proof of concept instrumenting producers to emit control messages that advance stream-time.
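A sketch of the idea, under assumptions of my own (the topic name, partition count, and control-key convention are illustrative, not the post's exact setup): a side producer periodically sends control records carrying the current wall-clock timestamp to every partition, and downstream processors are expected to recognize and drop them.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StreamTimeTicker {
  static final String CONTROL_KEY = "__control__"; // illustrative marker

  public static void main(String[] args) {
    var props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());
    var producer = new KafkaProducer<String, String>(props);

    int partitions = 3; // assumed partition count of the input topic
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
      long now = System.currentTimeMillis();
      for (int p = 0; p < partitions; p++) {
        // The explicit timestamp drives stream-time forward on each partition,
        // even when regular events are sparse.
        producer.send(new ProducerRecord<>("input-topic", p, now, CONTROL_KEY, ""));
      }
    }, 0, 10, TimeUnit.SECONDS);
  }
}
```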
Read more
poc kafka-streams dev

Kafka client applications with GraalVM

Shipping CLI binaries built with Java hasn’t been the most user-friendly experience: Java has to be installed on the client side, and starting Java applications (e.g. an executable JAR) tends to be slower than starting native binaries. GraalVM, and specifically its native-image tooling, aims to tackle most of these issues by enabling the building of native binaries from Java applications. Even though this has been supported for a while now, reflection and other common practices require additional configuration that makes the process either unsupported or very cumbersome to implement. With the arrival of frameworks that target the benefits of GraalVM, like Micronaut and Quarkus, it became possible, and simpler, to implement applications that include Kafka clients and package them as native binaries. This post explores the steps to package vanilla Kafka client applications (i.e. no framework) as native binaries.
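For a sense of the starting point, here is a framework-free producer of the kind such a post would package, with a hedged sketch of the build steps in the comments (jar paths and directory names are illustrative):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

// Build sketch, assuming GraalVM with native-image installed:
//   javac -cp kafka-clients.jar ProducerApp.java
//   native-image -cp .:kafka-clients.jar ProducerApp
// kafka-clients instantiates configured classes (e.g. serializers) via
// reflection, so reflect-config.json entries are typically needed; they can
// be generated by running the app once with the tracing agent:
//   java -agentlib:native-image-agent=config-output-dir=config ...
public class ProducerApp {
  public static void main(String[] args) {
    var props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());
    try (var producer = new KafkaProducer<String, String>(props)) {
      producer.send(new ProducerRecord<>("test-topic", "hello", "world"));
      producer.flush();
    }
  }
}
```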
Read more
kafka cli graalvm dev

Explore Kafka data with kcat, sqlite, and Datasette

I have been playing with Datasette and sqlite for a while, trying to collect and expose data efficiently for others to analyze. Recently I started finding use cases for getting data out of Apache Kafka and exposing it quickly for analysis. Why not use Datasette?
Read more
til datasette sqlite kcat kafka dev data

Releasing OS-specific GraalVM native image binaries easier with JReleaser

Packaging and releasing Java applications (e.g. CLIs) tends to be cumbersome, and the user experience has not been the best, as users have to download a valid JRE version first, among other steps.
Read more
til java graalvm dev

Changing void returning type in Java methods breaks binary compatibility

While proposing changes to the Kafka Streams DSL, I proposed changing the return type of one method from void to KStream<KOut, VOut>. I was under the (wrong) impression that this change wouldn’t affect users. I was also not considering that applications might just drop in a new version of the library without recompiling.
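A minimal illustration with a hypothetical Brancher class shows why this breaks: the JVM resolves methods by name and full descriptor, return type included.

```java
// v1 of a hypothetical library class:
//   public class Brancher {
//     public void to(String topic) { ... }
//   }

// v2 changes only the return type:
public class Brancher {
  public Brancher to(String topic) {
    System.out.println("sending to " + topic);
    return this; // now chainable
  }
}

// An application compiled against v1 holds the call site
//   invokevirtual Brancher.to:(Ljava/lang/String;)V
// Running that old class file against v2 fails with
//   java.lang.NoSuchMethodError: 'void Brancher.to(java.lang.String)'
// because the descriptor no longer matches. Recompiling the application
// succeeds: the change is source-compatible but not binary-compatible.
```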
Read more
til java compatibility dev

Kafka Streams FK-join within the same KTable

KTable-to-KTable foreign-key joins are one of the coolest features in Kafka Streams. I was wondering whether this feature could also handle FK-joins between values on the same table.
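To make the question concrete, here is a sketch of a self FK-join in the DSL (topic names and the comma-separated value encoding are illustrative, and whether this setup actually works is exactly what the post explores):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;

public class SelfFkJoin {
  public static void main(String[] args) {
    var builder = new StreamsBuilder();

    // Values are "name,managerId" strings to keep the sketch serde-free.
    KTable<String, String> employees =
        builder.table("employees", Consumed.with(Serdes.String(), Serdes.String()));

    // FK-join the table with itself: the extractor pulls the manager's key
    // out of each value; the joiner combines employee and manager rows.
    KTable<String, String> withManager = employees.join(
        employees,
        value -> value.split(",")[1],               // foreign-key extractor
        (employee, manager) ->
            employee.split(",")[0] + " reports to " + manager.split(",")[0]);

    withManager.toStream().to("employees-with-manager");

    var props = new Properties();
    props.put("application.id", "self-fk-join-poc");
    props.put("bootstrap.servers", "localhost:9092");
    new KafkaStreams(builder.build(), props).start();
  }
}
```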
Read more
til kafka-streams streaming-joins dev

Kafka Streams abstracts access to multiple tasks state stores when reading

Kafka Streams applications can scale either horizontally (adding more instances) or vertically (adding more threads). When scaled vertically, multiple tasks within the same instance store multiple partitions locally. An interesting question is whether Kafka Streams gives read access (i.e. Interactive Queries) to these stores, and how it manages to abstract the access to the different stores managed by multiple tasks.
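A sketch of what this looks like from the caller’s side (the store name "counts" and its types are assumed for illustration): Interactive Queries hand back a single read-only view per instance, regardless of how many tasks hold partitions of the store locally.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class InteractiveQueryExample {
  static Long lookup(KafkaStreams streams, String key) {
    // One handle, even if this instance runs many threads/tasks: the store
    // returned here is a composite over every local task's store, so the
    // caller never has to know which partition holds the key.
    ReadOnlyKeyValueStore<String, Long> counts = streams.store(
        StoreQueryParameters.fromNameAndType("counts",
            QueryableStoreTypes.keyValueStore()));
    return counts.get(key); // routed to the right task's store
  }

  static void dumpAll(KafkaStreams streams) {
    ReadOnlyKeyValueStore<String, Long> counts = streams.store(
        StoreQueryParameters.fromNameAndType("counts",
            QueryableStoreTypes.keyValueStore()));
    // all() iterates across the stores of all local tasks.
    try (KeyValueIterator<String, Long> it = counts.all()) {
      it.forEachRemaining(kv -> System.out.println(kv.key + " = " + kv.value));
    }
  }
}
```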
Read more
til kafka-streams api dev