CrystalBall: Predicting and Preventing Inconsistencies in Deployed Distributed Systems

We propose a new approach for developing and deploying distributed systems, in which nodes predict distributed consequences of their actions, and use this information to detect and avoid errors. Each node continuously runs a state exploration algorithm on a recent consistent snapshot of its neighborhood and predicts possible future violations of specified safety properties. We present a new state exploration algorithm, consequence prediction, which explores causally related chains of events that lead to property violation. This paper describes the design and the implementation of this approach, termed CrystalBall. Using CrystalBall we identified new bugs in mature Mace implementations of a random overlay tree and the Chord distributed hash table implementation. Furthermore, we show show that if the bug is not corrected during system development, CrystalBall is effective in steering the execution away from inconsistent states at run-time.

Citation

Maysam Yabandeh, Nikola Knežević, Dejan Kostić, and Viktor Kuncak. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. Technical Report LARA-REPORT-2008-006, EPFL-IC-LARA, May 2008.

BibTex Entry

@techreport{YabandehETAL08CrystalBall,
  author = {Maysam Yabandeh and Nikola Kne\v{z}evi\'c and Dejan Kosti\'c and Viktor Kuncak},
  title = {{CrystalBall}: Predicting and Preventing Inconsistencies in Deployed Distributed Systems},
  institution = {EPFL-IC-LARA},
  year = 2008,
  number = {LARA-REPORT-2008-006},
  abstract = {
  We propose a new approach for developing and deploying
  distributed systems, in which nodes predict distributed
  consequences of their actions, and use this information to
  detect and avoid errors.  Each node continuously runs a
  state exploration algorithm on a recent consistent
  snapshot of its neighborhood and predicts possible future
  violations of specified safety properties.  We present a
  new state exploration algorithm, consequence prediction,
  which explores causally related chains of events that 
  lead to property violation.
  This paper describes the design and the implementation of
  this approach, termed CrystalBall.  Using CrystalBall we
  identified new bugs in mature Mace implementations of a
  random overlay tree and the Chord distributed hash table
  implementation.  Furthermore, we show show that if the bug
  is not corrected during system development, CrystalBall is
  effective in steering the execution away from inconsistent
  states at run-time.
},
  month = {May}
}