Distributed Systems

Notes on Co-evolving Tracing and Fault Injection with Box of Pain

This paper explores how closely related tracing and fault injection systems are, and whether they should be part of the same system. The space of possible executions of a distributed system is exponential in the number of communicating processes and the number of messages, […] […] some of the most pernicious bugs in distributed programs involve mistakes in how programs handle partial failure of remote components. To expose these failures, fault injection mechanisms are used to cause network partitions or machine crashes.
papers distributed systems tracing fault injection peter alvaro daniel bittman ethan l miller

Data on the Outside vs Data on the Inside

I found this paper as relevant and accurate today as it was in 2005, when it was published. It is fascinating how, even 12 years later and with new technologies in vogue, the same concepts still apply.
papers distributed systems microservices pat helland transactions