Notes

Notes on Kafka, Samza and the Unix Philosophy of Distributed Data

From Batch to Streaming Workflows

Key properties for large-scale systems: [Large-Scale Personalized Services] should have the following properties:
- System scalability
- Organizational scalability
- Operational robustness

Batch jobs have been used successfully here, and they serve as a reference model to improve on: [Batch, Map-Reduce jobs] have been a remarkably successful tool for implementing recommendation systems.

[Batch important benefits:]
- Multi-consumer: several jobs can read the same input directories without affecting each other.
- Visibility: a job's input and output can be inspected to track down the cause of an error.
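The two batch benefits noted above can be illustrated with a minimal sketch. It is not taken from the source: the job names (`word_count_job`, `line_count_job`), the file names, and the directory layout are hypothetical, and plain local files stand in for a distributed filesystem. Two independent jobs read the same input directory, each writes only to its own output directory, and both the untouched input and the outputs remain available for inspection afterwards.

```python
import tempfile
from collections import Counter
from pathlib import Path


def word_count_job(input_dir: Path, output_dir: Path) -> Path:
    """Hypothetical job 1: count words across all input files."""
    counts = Counter()
    for path in sorted(input_dir.glob("*.txt")):
        counts.update(path.read_text().split())
    output_dir.mkdir(parents=True, exist_ok=True)
    out = output_dir / "part-0.txt"
    out.write_text("\n".join(f"{word}\t{n}" for word, n in sorted(counts.items())))
    return out


def line_count_job(input_dir: Path, output_dir: Path) -> Path:
    """Hypothetical job 2: count lines per input file."""
    rows = []
    for path in sorted(input_dir.glob("*.txt")):
        rows.append(f"{path.name}\t{len(path.read_text().splitlines())}")
    output_dir.mkdir(parents=True, exist_ok=True)
    out = output_dir / "part-0.txt"
    out.write_text("\n".join(rows))
    return out


def run_demo():
    """Run both jobs over one shared input directory and report the results."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        input_dir = root / "input"
        input_dir.mkdir()
        (input_dir / "events-0.txt").write_text("click view click\n")
        (input_dir / "events-1.txt").write_text("view purchase\n")

        # Snapshot the input so we can check that no job modified it.
        before = {p.name: p.read_text() for p in input_dir.iterdir()}

        # Multi-consumer: both jobs read the same directory independently.
        wc = word_count_job(input_dir, root / "out" / "word_count").read_text()
        lc = line_count_job(input_dir, root / "out" / "line_count").read_text()

        # Visibility: input and output are ordinary files that can be inspected.
        after = {p.name: p.read_text() for p in input_dir.iterdir()}
        return before == after, wc, lc


if __name__ == "__main__":
    unchanged, word_counts, line_counts = run_demo()
    print("input unchanged:", unchanged)
    print(word_counts)
    print(line_counts)
```

Because each job treats its input as read-only and writes to a separate location, a further job can be added later without coordinating with the existing ones, and a faulty result can be traced by rereading the exact input that produced it.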
notes kafka samza