Kafka

Kafka Emulator CLI: Record and Replay Records considering time-distance

Looking for alternative ways to reproduce time-based conditions in Kafka Streams applications (e.g. if you’re doing some sort of join or windowing based on time), I ended up creating a CLI tool to support a couple of features:

- Record events from topics, including their timestamps and the gaps between them
- Replay events, including the waiting periods between them

SQLite is used as storage for recorded events, so events can be generated, updated, and tweaked using SQL.
Read more
poc kafka cli tools

Piggyback on Kafka Connect Schemas to process Kafka records in a generic way

When reading from or writing to Kafka topics, a serializer/deserializer (a.k.a. SerDes) is needed to process record key and value bytes. Specific SerDes that turn bytes into concrete objects (e.g. POJOs) are used, unless a generic JSON object or Avro structure is used instead. Kafka Connect has to deal with generic structures to apply message transformations and to convert messages from external sources into Kafka records and vice versa. For this it has a SchemaAndValue composite type that includes a Connect Schema type, derived from Schema Registry or from a JSON Schema included in the payload, and a value object.
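As a sketch of what that looks like (not the post's exact code; the topic name and payload are made up), Kafka Connect's JsonConverter turns raw bytes into a SchemaAndValue when the JSON payload embeds its schema:

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.json.JsonConverter;

import java.nio.charset.StandardCharsets;
import java.util.Map;

public class ConnectSchemaExample {
  public static void main(String[] args) {
    // JSON payload with an embedded Connect schema (schemas.enable=true)
    byte[] payload = ("{\"schema\":{\"type\":\"struct\",\"fields\":["
        + "{\"field\":\"id\",\"type\":\"int64\"},"
        + "{\"field\":\"name\",\"type\":\"string\"}]},"
        + "\"payload\":{\"id\":1,\"name\":\"jeqo\"}}").getBytes(StandardCharsets.UTF_8);

    JsonConverter converter = new JsonConverter();
    converter.configure(Map.of("schemas.enable", "true"), false); // false = value converter

    // Generic structure: a Connect Schema plus a value object, no POJO needed
    SchemaAndValue schemaAndValue = converter.toConnectData("my-topic", payload);
    Schema schema = schemaAndValue.schema();
    System.out.println(schema.fields());        // fields derived from the embedded schema
    System.out.println(schemaAndValue.value()); // a Connect Struct with the record data
  }
}
```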
Read more
til kafka connect dev

Getting started with Kafka quotas

Kafka quotas have been around since early versions of the project, though they are not enabled in most deployments that I have seen. This post shares some thoughts on how to start adopting quotas, gives some practical advice, and covers a bit of the history of quotas in the Kafka project.
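Not necessarily how the post approaches it, but as a quick sketch of setting a quota programmatically with the AdminClient (the client id and quota values below are made up):

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;

import java.util.List;
import java.util.Map;

public class QuotaExample {
  public static void main(String[] args) throws Exception {
    try (Admin admin = Admin.create(Map.of("bootstrap.servers", "localhost:9092"))) {
      // Target all clients using this client.id (hypothetical name)
      ClientQuotaEntity entity =
          new ClientQuotaEntity(Map.of(ClientQuotaEntity.CLIENT_ID, "reporting-app"));

      // Throttle produce and fetch throughput (bytes/second) for that client
      ClientQuotaAlteration alteration = new ClientQuotaAlteration(entity, List.of(
          new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0),
          new ClientQuotaAlteration.Op("consumer_byte_rate", 2_097_152.0)));

      admin.alterClientQuotas(List.of(alteration)).all().get();
    }
  }
}
```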
Read more
kafka ops

Kafka client applications with GraalVM

Shipping CLI binaries with Java hasn’t been the most user-friendly experience: Java is required to be installed on the client side, and starting Java applications (e.g. an executable JAR) tends to be slower than starting native binaries. GraalVM, and specifically its native-image tooling, aims to tackle most of these issues by enabling native binaries to be built from Java applications. Even though this has been supported for a while now, reflection and other common practices require additional configuration that makes this process either unsupported or very cumbersome to implement. With the arrival of frameworks that target the benefits of GraalVM, like Micronaut and Quarkus, it became possible and simpler to implement applications that include Kafka clients and package them as native binaries. This post explores the steps to package vanilla Kafka client applications (i.e. no framework) as native binaries.
Read more
kafka cli graalvm dev

Explore Kafka data with kcat, sqlite, and Datasette

I have been playing with Datasette and sqlite for a bit, trying to collect and expose data efficiently for others to analyze. Recently I started finding use cases for getting data from Apache Kafka and exposing it quickly for analysis. Why not use Datasette?
Read more
til datasette sqlite kcat kafka dev data

Enable Certificate Revocation on Kafka clusters

Recently I got a question on how to manage revoked SSL certificates in Kafka clusters. With a proper Public Key Infrastructure, a Certificate Revocation List (CRL) can be made available for clients to validate whether a certificate is still valid, regardless of its time-to-live. For instance, if a private key has been compromised, a certificate can be revoked before its validity period ends.
Read more
til kafka ssl ops security

Changing Kafka Broker's rack

Kafka broker configuration includes a rack label to define the location of the broker. This is useful when placing replicas across the cluster to ensure replicas are spread across locations as evenly as possible.
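As a quick way to verify what rack each broker currently reports (a sketch, assuming a local cluster), the AdminClient exposes it through the cluster metadata:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.Node;

import java.util.Map;

public class BrokerRacks {
  public static void main(String[] args) throws Exception {
    try (Admin admin = Admin.create(Map.of("bootstrap.servers", "localhost:9092"))) {
      // Each node reports the rack configured via broker.rack (null if not set)
      for (Node node : admin.describeCluster().nodes().get()) {
        System.out.printf("broker %d -> rack %s%n", node.id(), node.rack());
      }
    }
  }
}
```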
Read more
til kafka deployment ops

Kafka data loss scenarios

Kafka topic partitions are replicated across brokers. Data loss happens when the brokers where replicas are located are unavailable or have fully failed. The worst scenario (and one where there is not much to do) is when all the brokers fail; then no remediation is possible. Replication increases redundancy so this scenario is less likely to happen. The following scenarios show different trade-offs that could increase the risk of losing data:
Read more
kafka durability ops

Kafka Producer idempotency is enabled by default since 3.0

Since Apache Kafka 3.0, Producers come with enable.idempotence=true, which leads to acks=all, along with other changes enforced by idempotence. This means that by default Producers are balanced between latency (no batching) and durability, different from previous versions where the main goal was to reduce latency, even at the risk of durability with acks=1.
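For reference, this is roughly how those settings look when stated explicitly in a producer's configuration instead of relying on version-dependent defaults (a minimal sketch; the bootstrap server is a placeholder):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class IdempotentProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    // Defaults since Kafka 3.0, stated explicitly here:
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // enable.idempotence
    props.put(ProducerConfig.ACKS_CONFIG, "all");                // enforced by idempotence

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // ... produce records with idempotent, acks=all semantics
    }
  }
}
```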
til kafka latency performance-tuning

Reducing `acks` doesn't help to reduce end-to-end latency

Kafka Producers enforce durability across replicas by setting acks=all (the default since v3.0). As enforcing this guarantee requires waiting for replicas to sync, it increases producer latency; and reducing acks tends to give the impression that end-to-end latency gets reduced overall.
Read more
til kafka latency ops

Use min.insync.replicas for fault-tolerance

Things to remember:

- Topic replication factor is not enough to guarantee fault-tolerance. If min.insync.replicas is not defined (i.e. it stays at 1), then data could potentially be lost.
- acks=all forces the replica leader to wait for all brokers in the ISR, not only the min.insync.replicas.
- If the available replicas are equal to the minimum ISR, then the topic partitions are at the edge of losing availability: if one broker becomes unavailable (e.g. while restarting), producers will fail to write data.
- Topic configuration is inherited from the broker. If the broker configuration changes, it affects the existing topics. Keep the topic defaults, unless they need to be different from the broker defaults, for easier maintenance.
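As a sketch of how this could look when setting the topic configuration explicitly instead of relying on broker defaults (topic name, partition count, replication factor, and min ISR below are example values):

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;

public class FaultTolerantTopic {
  public static void main(String[] args) throws Exception {
    try (Admin admin = Admin.create(Map.of("bootstrap.servers", "localhost:9092"))) {
      // 3 replicas with min ISR of 2: one broker can go down without
      // acks=all producers losing availability
      NewTopic topic = new NewTopic("orders", 6, (short) 3)
          .configs(Map.of(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2"));

      admin.createTopics(List.of(topic)).all().get();
    }
  }
}
```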
Read more
til kafka fault-tolerance ops

Making sense of Event-Driven Systems @ Oracle Code One 2019

Presented at Oracle Code One 2019
Read more
observability distributed-tracing kafka zipkin

Making sense of Event-Driven Systems @ Kafka Summit 2019

Presented at Kafka Summit NYC 2019
Read more
observability distributed-tracing kafka zipkin

The Importance of Distributed Tracing for Apache Kafka Based Applications

Originally posted on the Confluent Blog. Apache Kafka® based applications stand out for their ability to decouple producers and consumers using an event log as an intermediate layer. One result of this is that producers and consumers don’t know about each other, as there is no direct communication between them. This enables choreographed service collaborations, where many components can subscribe to events stored in the event log and react to them asynchronously.
Read more
event-driven distributed-tracing observability kafka zipkin

The Importance of Observability for Kafka-based applications with Zipkin @ Oslo Apache Kafka Meetup

Presented at Oslo Apache Kafka Meetup
Read more
observability distributed-tracing kafka zipkin

Notes on Kafka, Samza and the Unix Philosophy of Distributed Data

From Batch to Streaming workflows. Key properties for large-scale systems: [Large-Scale Personalized Services] should have the following properties: system scalability, organizational scalability, and operational robustness. Batch jobs have been successfully used and represent a reference model to improve from: [Batch, Map-Reduce jobs] have been a remarkably successful tool for implementing recommendation systems. [Batch important benefits:] Multi-consumer: several jobs can read the same input directories without affecting each other. Visibility: a job’s input and output can be inspected to track down the cause of an error.
Read more
notes kafka samza

Tracing Kafka applications

For a more updated version, check https://jeqo.github.io/posts/2019-03-26-importance-of-distributed-tracing-for-apache-kafka-based-applications/. Tracing is one of the hardest parts of integration or microservice development: knowing how a request impacts your different components, and whether your components behave as expected. This can be fairly easy with a monolith, where there is one database and, with some queries or by checking one log file, you can validate that everything went well. Once you introduce distributed components and asynchronous communication, this starts to get more complex and tedious.
Read more
integration observability kafka tracing

From Messaging to Logs with Apache Kafka @ OUGN 2017

Presented at OUGN 2017
Read more
development integration back-end kafka

Rewind Kafka Consumer Offsets

One of the most important features of Apache Kafka is how it manages multiple consumers. Each consumer group has a current offset that determines at what point in a topic the group has consumed messages. So, each consumer group can manage its offsets independently, per partition. This offers the possibility to roll back in time, reprocess messages from the beginning of a topic, and regenerate the current state of the system. But how to do it (programmatically)?
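One possible answer, as a minimal sketch with the Java consumer (group id and topic name are placeholders): assign the partitions and seek back to the beginning before polling.

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.List;
import java.util.Map;

public class RewindOffsets {
  public static void main(String[] args) {
    Map<String, Object> configs = Map.of(
        "bootstrap.servers", "localhost:9092",
        "group.id", "reprocessing-group",
        "key.deserializer", StringDeserializer.class.getName(),
        "value.deserializer", StringDeserializer.class.getName(),
        "enable.auto.commit", "false");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(configs)) {
      List<TopicPartition> partitions =
          consumer.partitionsFor("events").stream()
              .map(p -> new TopicPartition(p.topic(), p.partition()))
              .toList();

      consumer.assign(partitions);
      consumer.seekToBeginning(partitions); // rewind the position on every partition
      // poll() from here reprocesses the topic from the earliest available offsets
    }
  }
}
```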
Read more
integration kafka

Scaling Kafka with Docker Containers

In this post I will show how to use Docker containers to create and scale a Kafka cluster, and also how to create, scale and move topics inside the cluster.
Read more
devops integration kafka docker

Integrate Java EE 7 and Kafka using Avro and RxJava

I decided to implement a naive integration between Java EE applications and RxJava/Kafka/Avro, to publish and subscribe to events. You can go directly to that code, or check my approach:
Read more
development integration back-end java ee kafka avro rx