In this section, we explain why DPOR is not directly applicable to distributed systems and even if we can apply a modified version of that, we need to prove its soundness and completeness again.
First, I need to clarify that Happens-before (HB) definition is different from system to system. For example, in DPOR if two events are dependent they are considered as happens-before. However, in distributed systems two events e1 and e2 can be received by the same node (dependent) but there is not strict happens-before relationship between them. Nevertheless, it is worth noting that it might not be a serious issue; we can redefine the happens-before in distributed systems to be able to use the algorithm.
To clarify this, the following example, illustrates the difference between state space that would be explored by a perfect HP-based algorithm, and the state space explored by an algorithm which blindly applies the techniques of DPOR on a simple example.
As shown in the above picture, the blind DPOR-inspired algorithm explores the extra path of e2-e1-e3.
The basic idea is as follows: in each explored path look for pairs of events which - are dependent - do not have happens-before relation It implies that, they can be reordered to check for possible effects of dependency.
After finding a case, we a add a branch before the first event. We attach to the branch the desired order which motivated us to add the branch. Later in exploration, we force the algorithm to follow the desired orders.
Example, in Figure above, but and . Also, we have and are dependent. Hence, after exploring the first path, we add a branch before to explore . However, we also attach the desired order of . In the next exploration round, the model checker forces this order and explores only one path (and not the extra one).
The attached desired order also helps to have a correct algorithm as well. Remember that in contrary to multi-threaded systems, there are several events that can be run at each step and considering the bounded depth, it is not guaranteed that all of them have a chance to appear in the bounded path. Hence, adding the branch does not guarantee that the algorithm explores the different orders of the current events at the next round (when it explores the branch). Having this imposed order, guarantees that the algorithm check the alternative order that we had in mind.
s is start of the current subtree, m is the branch point, n is the end
If we are checking only local properties, then still there are redundant paths which are explored by this algorithm. For example, in the following Figure, if the state of the content of the message emitted by the second node does not depend to receipt of , the 4th explored path is redundant.
Checking the happens-before and in-dependency relation with all events in the subtree is not efficient because of duplicate checks. To address that, we introduce the notion of last explored path (lastPath) and all the checking operations are done only on this path. The notion is depicted in Figure below:
In this section, we explain formally our definition of happens-before for state-machine based systems.
First, we need to define events. We have two type of events: local and message-receive events. Local events can be a fired timer or an application request. Both local and message-receive events might send some new messages to the network, which we include the send messages as part of the running event.
We have two rules:
Note that our definition of events and HB is different from Lamport; in Lamport's definition, send of events is a separate event. However, our definition is closer to the nature of events in state-machine based systems.
In the first glance, this definition of HB might raise some concerns that the search might not be complete. In this section, by an example we try to address this concern.
In the example below, , , and . Thus, by transitivity we also have . Hence, even though they are dependent, since they are not concurrent the system would not consider the case where is received after .
This is true that the algorithm does not immediately check this case, however, because and are concurrent, the algorithm first consider the case that is received before . This case is depicted in the figure below. After that, in the obtained sequence and hence the receipt of before is considered which is depicted in the figure below.
In this section, we give an example in which the DPOR-DS can avoid exponential explosion in state space. In the figure below, you can see that events are received by different nodes, .
The number of explored states, even after remove duplicates by global duplicate state detection, is as follows:
For =10, we have =5120 and for =30, we have =16106127360.
However, by DPOR-DS, we have .