LARA – Lab for Automated Reasoning and Analysis -

DPOR_DS: Dynamic partial order reduction for Distributed Systems based on happens-before relation

Why DPOR does not work for distributed systems

In this section, we explain why DPOR is not directly applicable to distributed systems and even if we can apply a modified version of that, we need to prove its soundness and completeness again.

DPOR is designed for multi-threaded systems. There, there exist a concrete definition of shared objects between threads. In distributed systems, we need to first settle on a definition of the shared object, like the whole network. However, we can think of a receive buffer + local events as a shared object corresponding to each node. It sound reasonable, but one can come up with a different one.
At DPOR, at each step, each process can only run one event, i.e. running the next sequential command. The whole algorithm and notations are based on this assumption. However, in distributed systems, at each point of time, each process can run several events: trigger a scheduler, run an application events, receive one of the ready messages, drop messages or break connections, an so on.
there are several events that can be run at each step and considering the bounded depth, it is not guaranteed that all of them have a chance to appear in the bounded path. Hence, adding the branch does not guarantee that the algorithm explores the different orders of the current events at the next round (when it explores the branch).
In DPOR we have two types of operations: visible (done on communication objects) and invisible. Each transition is defined to start with a visible operations continued by a set of invisible operations. However, this does not work for distributed systems. For example, if we consider the visible operation of receiving a message, the handler can be done by sending another message; in other words, the transition starts with a visible operation and ends a visible operation as well.
The happens-before definition in DPOR is different from happens-before definition in distributed systems. For example, in DPOR if two events are dependent they are considered as happens-before. However, in distributed systems two events e1 and e2 can be received by the same node (dependent) but there is not strict happens-before relationship between them. Nevertheless, it is worth noting that it might not be a serious issue; we can redefine the happens-before in distributed systems to be able to use the algorithm.
It assumes that it is acyclic graph, we can never make such a assumption in distributed systems.
It assumes that in the first round, the algorithm finishes a path from start to end; i.e. at the last transition there is no other transition enabled. However, we know that in distributed systems, the complexity of the protocols makes these kinds of model checking impossible and that is why we have bounded DFS.
It is a state-less algorithm, however we can take advantage of the fact the search algorithms in MaceMC can be stateful.

Happens-before (HB)

First, I need to clarify that Happens-before (HB) definition is different from system to system. For example, in DPOR if two events are dependent they are considered as happens-before. However, in distributed systems two events e1 and e2 can be received by the same node (dependent) but there is not strict happens-before relationship between them. Nevertheless, it is worth noting that it might not be a serious issue; we can redefine the happens-before in distributed systems to be able to use the algorithm.

To clarify this, the following example, illustrates the difference between state space that would be explored by a perfect HP-based algorithm, and the state space explored by an algorithm which blindly applies the techniques of DPOR on a simple example.

As shown in the above picture, the blind DPOR-inspired algorithm explores the extra path of e2-e1-e3.

The main idea

The basic idea is as follows: in each explored path look for pairs of events which - are dependent - do not have happens-before relation It implies that, they can be reordered to check for possible effects of dependency.

After finding a case, we a add a branch before the first event. We attach to the branch the desired order which motivated us to add the branch. Later in exploration, we force the algorithm to follow the desired orders.

Example, in Figure above, $e_2 \to e_3$ but $e_1 \not\to e_3$ and $e_1 \not\to e_2$ . Also, we have $e_1$ and $e_3$ are dependent. Hence, after exploring the first path, we add a branch before $e_1$ to explore $e_2$ . However, we also attach the desired order of $[e_3,e_1]$ . In the next exploration round, the model checker forces this order and explores only one path (and not the extra one).

The attached desired order also helps to have a correct algorithm as well. Remember that in contrary to multi-threaded systems, there are several events that can be run at each step and considering the bounded depth, it is not guaranteed that all of them have a chance to appear in the bounded path. Hence, adding the branch does not guarantee that the algorithm explores the different orders of the current events at the next round (when it explores the branch). Having this imposed order, guarantees that the algorithm check the alternative order that we had in mind.

The (non-precise) algorithm

$DPOR-DS$

$path =$ Explore an arbitrary path of size $n$
$tree$ .add( $path$ )
AddBranch( $tree$ , $path$ , $0$ , $n-1$ , $n$ )

  s is start of the current subtree, m is the branch point, n is the end

$AddBranch$ ( $tree$ , $path$ , $s$ , $m$ , $n$ )

$subtree$ = $tree$ .subtreeUnderEvent( $path[m]$ )
foreach $e$ in $subtree$
1. if $path[m] \not\to e$ and $(path[m],e) \in D$
  1. $subpath$ = $tree$ .pathBetweenEvents( $path[m]$ , $e$ )
  2. $subpath$ = $subpath$ .reorderEvents( $path[m]$ , $e$ )
  3. $tree$ .addBranch( $m$ , $subpath$ ) #subpath is the desired order
  4. $newpath$ = ExploreOpenBranches( $tree$ )
  5. AddBranch( $tree$ , $newpath$ , $m$ , $n-1$ , $n$ ) #to limit the branches to the new subtree
if ( $s$ < $m$ )
1. AddBranch( $tree$ , $path$ , $s$ , $m-1$ , $n$ )

The problems of current algorithm

If we are checking only local properties, then still there are redundant paths which are explored by this algorithm. For example, in the following Figure, if the state of the content of the message $e_3$ emitted by the second node does not depend to receipt of $e_4$ , the 4th explored path is redundant.

The precise algorithm

Checking the happens-before and in-dependency relation with all events in the subtree is not efficient because of duplicate checks. To address that, we introduce the notion of last explored path (lastPath) and all the checking operations are done only on this path. The notion is depicted in Figure below:

$DPOR-DS$

$latPath =$ Explore an arbitrary path of size $n$
$lastBranch = -1$
do
1. #first, add all the branches
2. For $e_i$ in [ $lastPath[1]$ , $lastPath[n-1]$ ]
  1. For $e_j$ in [ $lastPath[max(i,lastBranch)+1]$ , $n$ ]
    1. if $e_i \not\to e_j$ and $(e_i,e_j) \in D$
      1. $hb\_list$ = $\{e_k | i+1 \le k \le j \And e_k \to e_j\}$
      2. $subpath$ = $hb\_list . e_i$
      3. $lastPath$ .addBranch( $i$ , $subpath$ ) #subpath is the desired order
3. #update the last branch
4. $lastBranch$ = last enabled branch in $lastPath$
5. #update the last path
6. $lastPath$ = explore( $lastPath$ , $lastBranch$ )
while ( $lastBranch$ > 0)

HB distilled

In this section, we explain formally our definition of happens-before for state-machine based systems.

First, we need to define events. We have two type of events: local and message-receive events. Local events can be a fired timer or an application request. Both local and message-receive events might send some new messages to the network, which we include the send messages as part of the running event.

We have two rules:

If $e_i$ and $e_j$ are events on the same process and in the serialized sequence of events $e_i$ happens before $e_j$ , then $e_i \to e_j$
If $e_j$ is send by the receive handler of $e_i$ , then $e_i \to e_j$

Note that our definition of events and HB is different from Lamport; in Lamport's definition, send of events is a separate event. However, our definition is closer to the nature of events in state-machine based systems.

HB example

In the first glance, this definition of HB might raise some concerns that the search might not be complete. In this section, by an example we try to address this concern.

In the example below, $e_1 \to e_2$ , $e_3 \to e_4$ , and $e_2 \to e_3$ . Thus, by transitivity we also have $e_1 \to e_4$ . Hence, even though they are dependent, since they are not concurrent the system would not consider the case where $e_1$ is received after $e_4$ .

This is true that the algorithm does not immediately check this case, however, because $e_2$ and $e_3$ are concurrent, the algorithm first consider the case that $e_3$ is received before $e_2$ . This case is depicted in the figure below. After that, in the obtained sequence $e_1 \not\to e_3$ and hence the receipt of $e_4$ before $e_1$ is considered which is depicted in the figure below.

Killer Example

In this section, we give an example in which the DPOR-DS can avoid exponential explosion in state space. In the figure below, you can see that $N$ events $e_1 \dots e_N$ are received by $N$ different nodes, $1 \dots N$ .

The number of explored states, even after remove duplicates by global duplicate state detection, is as follows:

$states = N + 2.\binom{N}{2} + 3.\binom{N}{3} + 4.\binom{N}{4} + \dots + N.\binom{N}{N}$

For $N$ =10, we have $states$ =5120 and for $N$ =30, we have $states$ =16106127360.

However, by DPOR-DS, we have $states = N$ .