Maysam Yabandeh, Nikola Knežević, Dejan Kostić, and Viktor Kuncak.
Predicting and preventing inconsistencies in deployed distributed
systems.
ACM Transactions on Computer Systems, 28(1), 2010.
We propose a new approach for developing and deploying
distributed systems, in which nodes predict distributed
consequences of their actions, and use this information to
detect and avoid errors. Each node continuously runs a
state exploration algorithm on a recent consistent
snapshot of its neighborhood and predicts possible future
violations of specified safety properties. We describe a
new state exploration algorithm, consequence prediction,
which explores causally related chains of events that
lead to property violations.
This article describes the design and implementation of this
approach, termed CrystalBall. We evaluate CrystalBall on RandTree,
BulletPrime, Paxos, and Chord distributed system implementations. We
identified new bugs in mature Mace implementations of three systems.
Furthermore, we show that if a bug is not corrected during
system development, CrystalBall is effective in steering the
execution away from inconsistent states at runtime.