Notes on Kafka, Samza and the Unix Philosophy of Distributed Data
From Batch to Streaming Workflows

Key properties for large-scale systems:
[Large-Scale Personalized Services] should have the following properties:
System scalability
Organizational scalability
Operational robustness

Batch jobs have been used successfully here, and serve as a reference model to improve on:
[Batch, Map-Reduce jobs] have been a remarkably successful tool for implementing recommendation systems.
[Batch important benefits:]
Multi-consumer: several jobs can read the same input directories without affecting each other.
Visibility: each job's input and output can be inspected to track down the cause of an error.
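
The two benefits above can be illustrated with a small sketch. It is not taken from the original notes: the job names (`count_clicks_job`, `distinct_items_job`), the JSON-lines event format, and the directory layout are all hypothetical, chosen only to show two independent jobs reading one input directory and leaving their outputs on disk for inspection.

```python
import json
import tempfile
from pathlib import Path


def read_events(input_dir: Path) -> list:
    """Read every JSON-lines file in the input directory without modifying it."""
    events = []
    for path in sorted(input_dir.glob("*.jsonl")):
        with path.open() as f:
            events.extend(json.loads(line) for line in f if line.strip())
    return events


def count_clicks_job(input_dir: Path, output_dir: Path) -> Path:
    """Hypothetical job 1: count events per user."""
    counts = {}
    for event in read_events(input_dir):
        counts[event["user"]] = counts.get(event["user"], 0) + 1
    output_dir.mkdir(parents=True, exist_ok=True)
    out = output_dir / "part-00000.json"
    out.write_text(json.dumps(counts, sort_keys=True))
    return out


def distinct_items_job(input_dir: Path, output_dir: Path) -> Path:
    """Hypothetical job 2: list the distinct items seen."""
    items = sorted({event["item"] for event in read_events(input_dir)})
    output_dir.mkdir(parents=True, exist_ok=True)
    out = output_dir / "part-00000.json"
    out.write_text(json.dumps(items))
    return out


def run_demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        input_dir = root / "input"
        input_dir.mkdir()
        source = input_dir / "events.jsonl"
        source.write_text(
            '{"user": "alice", "item": "a"}\n'
            '{"user": "bob", "item": "b"}\n'
            '{"user": "alice", "item": "b"}\n'
        )
        before = source.read_text()

        # Multi-consumer: both jobs read the same input directory,
        # and neither one affects what the other sees.
        clicks_out = count_clicks_job(input_dir, root / "out_clicks")
        items_out = distinct_items_job(input_dir, root / "out_items")

        # Visibility: inputs and outputs are plain files that can be
        # opened and inspected when tracking down an error.
        return {
            "clicks": json.loads(clicks_out.read_text()),
            "items": json.loads(items_out.read_text()),
            "input_unchanged": source.read_text() == before,
        }


if __name__ == "__main__":
    print(run_demo())
```

Because each job only reads its input and writes to its own output directory, a third job could be added later without touching the first two.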