Impressions from Kafka Summit London 2024

I was lucky enough to attend Kafka Summit London this year (thanks, Aiven!) and wanted to share some notes on topics that caught my attention from the sessions I was able to attend:
Read more
kafka-summit

On Anarchism book

On Anarchism by Noam Chomsky
Read more
ideas chomsky

Ingesting log files to sqlite

I was recently looking into how to analyze multiple related log files (e.g. application logs from multiple instances), and found that sqlite may be enough :) The first step is ingesting content from log files into sqlite tables. sqlite-utils to the rescue! I was initially happy with having each line as a row and adding full-text search to the log column to query events. However, a Java log event may span multiple lines, so this output is not ideal: the timestamp could be on one line and the stack-trace root cause on another.
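The multi-line issue can be handled before inserting: merge continuation lines into the preceding event so each row holds one complete log event. A minimal sketch with the stdlib sqlite3 module instead of sqlite-utils, assuming log lines start with an ISO-style timestamp:

```python
import re
import sqlite3

# Assumed log line prefix; lines without it are treated as continuations
# (stack traces, wrapped messages) of the previous event.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")

def ingest(lines, db):
    db.execute("CREATE TABLE IF NOT EXISTS logs (id INTEGER PRIMARY KEY, log TEXT)")
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS logs_fts USING fts5(log)")
    events, current = [], None
    for line in lines:
        if TIMESTAMP.match(line):
            if current is not None:
                events.append(current)
            current = line
        elif current is not None:
            current += "\n" + line  # continuation line: keep it with its event
    if current is not None:
        events.append(current)
    db.executemany("INSERT INTO logs (log) VALUES (?)", [(e,) for e in events])
    # Mirror rows into the FTS table for full-text queries over events
    db.execute("INSERT INTO logs_fts (rowid, log) SELECT id, log FROM logs")
    db.commit()
```

A full-text MATCH on the stack-trace root cause then returns the event whose first line carries the timestamp, which is the whole point of the merge.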
Read more
til sqlite datasette ops troubleshooting

Kafka Emulator CLI: Record and Replay Records considering time-distance

Looking for alternative ways to reproduce time-based conditions in Kafka Streams applications (e.g. if you're doing some sort of join or windowing based on time), I ended up creating a CLI tool to support a couple of features: record events from topics, including their timestamps and the gaps between them; and replay events, honoring the waiting periods between them. SQLite is used as storage for recorded events, so events can be generated, updated, and tweaked using SQL.
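The SQLite side of the idea can be sketched in a few lines. This is a hypothetical schema for illustration, not the tool's actual one: store each record with its timestamp, and derive the gap to the previous event per topic-partition with a window function, which is the waiting period a replay would honor.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE records (
    topic TEXT, partition INTEGER,
    ts INTEGER,          -- record timestamp in ms
    key BLOB, value BLOB)""")
db.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)",
               [("orders", 0, 1000, b"k1", b"v1"),
                ("orders", 0, 1500, b"k2", b"v2"),
                ("orders", 0, 4000, b"k3", b"v3")])

# Gap between consecutive events per topic-partition (requires SQLite >= 3.25
# for window functions); NULL for the first event.
gaps = db.execute("""
    SELECT ts, ts - LAG(ts) OVER (PARTITION BY topic, partition ORDER BY ts) AS gap
    FROM records ORDER BY ts""").fetchall()
```

Because the events live in a plain table, an UPDATE is all it takes to tweak timestamps or payloads before replaying.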
Read more
poc kafka cli tools

Piggyback on Kafka Connect Schemas to process Kafka records in a generic way

When reading from or writing to Kafka topics, a serializer/deserializer (a.k.a. SerDes) is needed to process record key and value bytes. Specific SerDes that turn bytes into concrete objects (e.g. POJOs) are used, unless a generic JSON object or Avro structure is used. Kafka Connect has to deal with generic structures to apply message transformations and to convert messages from external sources into Kafka records and vice versa. For this it has a SchemaAndValue composite type, which includes a Connect Schema type (derived from Schema Registry or from a JSON Schema included in the payload) and a value object.
Read more
til kafka connect dev

Kafka Streams: Tick stream-time with control messages

Kafka Streams is in many ways governed by the concept of time. For instance, as soon as stateful operations are used, the event-time drives how events are grouped, joined, and emitted. Stream-time is the concept within Kafka Streams representing the largest timestamp seen by the stream application (per partition). In comparison with wall-clock time (i.e. the system time at the moment the application runs), stream-time is driven by the data seen by the application. This ensures that the results produced by a Kafka Streams application are reproducible. One nuance of stream-time is that it needs incoming events to "tick". This becomes an issue when events are sparse in time but we expect results to be produced more often (e.g. windows to be closed and emitted, punctuations to be triggered). This is a known issue, and there are some proposals to overcome it in certain parts of the framework, e.g. KIP-424. This post covers a proof-of-concept instrumenting producers to emit control messages to advance stream-time.
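The "tick" behavior can be illustrated with a toy model (this is not the Kafka Streams implementation, just the max-timestamp semantics): stream-time only moves forward when a record with a larger timestamp arrives, which is exactly why a control message carrying a fresh timestamp can advance it.

```python
class StreamTime:
    """Toy per-partition stream-time: the largest timestamp seen so far."""

    def __init__(self):
        self.per_partition = {}

    def observe(self, partition, timestamp_ms):
        # Stream-time never goes backwards; late records don't move it.
        current = self.per_partition.get(partition, -1)
        self.per_partition[partition] = max(current, timestamp_ms)
        return self.per_partition[partition]
```

A sparse topic leaves stream-time stuck at the last event's timestamp; producing a control message with a current timestamp ticks it forward so time-based results (window closes, punctuations) can fire.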
Read more
poc kafka-streams dev

Getting started with Kafka quotas

Kafka quotas have been around since the early versions of the project, though they are not enabled in most deployments that I have seen. This post shares some thoughts on how to start adopting quotas, gives some practical advice, and covers a bit of the history of quotas in the Kafka project.
Read more
kafka ops

sqlite can be used as a document and graph database

I found that the use-cases for sqlite keep increasing now that JSON is supported. This week I came across the following presentation: https://www.hytradboi.com/2022/simple-graph-sqlite-as-probably-the-only-graph-database-youll-ever-need It makes the case for a simple graph schema, using out-of-the-box SQL functionality to store graphs and execute traversal queries. The repository behind it is in turn based on this post about JSON support and document databases: https://dgl.cx/2020/06/sqlite-json-support
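A condensed sketch of the idea (table and column names here are illustrative, not the repository's exact schema): nodes as JSON documents, edges as plain rows, and a recursive CTE for traversal.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY, body TEXT);  -- body is JSON
    CREATE TABLE edges (source TEXT, target TEXT);
""")
db.executemany("INSERT INTO nodes VALUES (?, ?)",
               [("a", '{"name": "alice"}'),
                ("b", '{"name": "bob"}'),
                ("c", '{"name": "carol"}')])
db.executemany("INSERT INTO edges VALUES (?, ?)", [("a", "b"), ("b", "c")])

# Traversal: every node reachable from 'a', pulling a field out of the
# JSON document with json_extract (requires the JSON1 functions, bundled
# in modern SQLite builds).
reachable = db.execute("""
    WITH RECURSIVE walk(id) AS (
        VALUES('a')
        UNION
        SELECT e.target FROM edges e JOIN walk w ON e.source = w.id)
    SELECT w.id, json_extract(n.body, '$.name')
    FROM walk w JOIN nodes n ON n.id = w.id
""").fetchall()
```

No extension to install, no separate database to run: documents and graphs out of a single file.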
til sqlite database

Kafka client applications with GraalVM

Shipping CLI binaries built with Java hasn't been the most user-friendly experience: Java is required to be installed on the client side, and starting Java applications (e.g. an executable JAR) tends to be slower than starting native binaries. GraalVM, and specifically its native-image tooling, aims to tackle most of these issues by enabling native binaries to be built from Java applications. Even though this has been supported for a while now, reflection and other common practices require additional configuration that makes the process either unsupported or very cumbersome. With the arrival of new frameworks that target the benefits of GraalVM, like Micronaut and Quarkus, it became possible, and simpler, to implement applications that include Kafka clients and package them as native binaries. This post explores the steps to package vanilla Kafka client applications (i.e. no framework) as native binaries.
Read more
kafka cli graalvm dev

Explore Kafka data with kcat, sqlite, and Datasette

I have been playing with Datasette and sqlite for a bit, trying to collect and expose data efficiently for others to analyze. Recently I started finding use-cases for getting data out of Apache Kafka and exposing it quickly for analysis. Why not use Datasette?
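The sqlite side of that pipeline can be sketched like this: take JSON envelope lines (the shape kcat's -J output is assumed to produce, with topic/partition/offset/ts/payload fields) and load them into a table that Datasette can then serve.

```python
import json
import sqlite3

def load(envelope_lines, db):
    """Load JSON record envelopes (one per line) into a sqlite table."""
    db.execute("""CREATE TABLE IF NOT EXISTS kafka_records
                  (topic TEXT, partition INTEGER, offset INTEGER,
                   ts INTEGER, payload TEXT)""")
    rows = []
    for line in envelope_lines:
        r = json.loads(line)
        rows.append((r["topic"], r["partition"], r["offset"],
                     r.get("ts"), r["payload"]))
    db.executemany("INSERT INTO kafka_records VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
```

Pointing Datasette at the resulting database file then gives a browsable, queryable UI over the topic's records with no extra code.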
Read more
til datasette sqlite kcat kafka dev data