CrystalBall for avoiding inconsistencies

Why Execution Steering? Why not only 1step?

  1. Avoiding bug sooner can have performance benefits, like reduced unnecessary transmitted message
  2. Faster reaction might mean more chance not to stuck in a liveness bug
  3. The node which is supposed to avoid inconsistency by 1step might not have snapshots necessary to notice the violation

Why only 1step?

  1. The filter we install might be either too generic (results in false positive) or too specific (miss similar cases; false negatives)
  2. Avoiding bug sooner have false positive effect, because we never can be sure that system will definitely go towards the erroneous path.
  3. If there is a bug in the system, all I want is to temporarily bypass it, I do not care about performance
  4. Assume that we have implemented symbolic execution and nodes are always ready for receipt of any kind of message, then the last node in bug path will have more insight about the consequences of steering than us, so it is better to let him decide

Why forget about 1step as well?

  1. Specifying the requirements of the system in the form of safety property is almost impossible
  2. Safety properties need global view of the system which is not scalable

So, Where CrystalBall can be useful then?

CrystalBall for improving performance & liveness

  1. By looking till depth 10, we can not see the tangible difference in performance or liveness to make decision
  2. I can not follow the consequences of event towards outside my view, to check its performance
  3. making decision at non-determinism point requires a reasoning based on the node's state and environment's state, not just simple probabilistic selection of one path, consequently we have two options:
    • install a filter for non-determinism point that does complex reasoning
    • install a simple deterministic filter which works only for the current state; for future decision at this non-determinism point we need another separate systematic exploration to install a new filter.

So, Where CrystalBall can be helpful?

  1. Application should give crystalBall some pictures of the global state at different granularities;
    • A precise picture of my immediate neighborhood
    • A less accurate picture of my two-step neighborhoods
    • A very vague picture of the whole system
  2. Application should define operators that get a picture of the system with a given granularity and an event as inputs and return the consequences of the event on the picture.
  3. Then the system would be:
    • scalable: because I do not need precise snapshot from every node in the system
    • efficient: even by searching up to depth 10, I can obtain a general idea of the event's impact on the system as a whole
  4. The systematic search by crystalBall would be only symbolic execution, in this way, crystalBall would have prepared an answer for every possible incoming event; of course this filter is temporary and will expire by the next round

Example: RandTree

Modelchecking vs. Symbolic Execution

The figure below demonstrates the effect of systematic exploration along the live run while we have limited view of the system. The yellow part is the parts that we can manage to check. The other parts would be missed because we do not have snapshots from all the nodes.

The figure below, depicts the coverage rate of symbolic execution. As you can see the yellow part which are checked by SE, scattered along the model space and hence most of them are irrelative. On the other hand, the advantage is that we are totally prepared for the near future and we do not miss any spot.

Hence, although systematic modelchecking might be more useful for bug finding, symbolic execution fits more for the sake of execution steering.

In other words, we have two purpose of modelchecking by live run:

  1. check the relevant states that are more likely to happen in practice
  2. be ready for near future inconsistencies.

The normal model checker fulfill the first objective: it might not prepare us for all the events that are about to happen (because of in flight messages, missing snapshots that generate the message and so on), but all the state that are checked are relevant; they really can happen in the step.

On the other hand, symbolic execution fulfills the second objective: I would be prepared for any kind of message that could happen in near future. But, most of events that I am prepared for are totally irrelevant, because there precondition of them are not hold by the system. e.g. I am prepared by a join message from node i which is far far away, but the state of this node will never issue such a message.