Abstract Interpretation with Conjunctions of Predicates

Parameters: a finite set of formulas ${\cal P} = \{ P_0, P_1,\ldots,P_n\}$ in program variables (often called predicates in this context). Suppose $P_0$ is the formula 'false'.

Example: ${\cal P} = \{ P_0, P_1, P_2, P_3 \}$ with

$\begin{equation*} \begin{array}{rcl} P_0 &\equiv& \mbox{false} \\ P_1 &\equiv& 0 < x \\ P_2 &\equiv& 0 < y \\ P_3 &\equiv& x < y \\ \end{array} \end{equation*}$

A simple description of this analysis is as the instance of the following idea: assume optimistically that all predicates hold at every program point (the conjunction of all of them holds). Then check whether the result is a sufficiently annotated program. If it is not, then remove the predicates that do not hold, and repeat this process. In the worst case we end up with no predicates holding (the top element of the lattice). There are, however, different ways of removing predicates, and this abstract interpretation formulation gives us one particular way.

Suppose for simplicity we consider as states tuples $(x,y)$ of values of integer variables $x$ and $y$ .

For a formula $F$ containing variables $x$ and $y$ and given values $u,v$ we write $F(u,v)$ to denote the truth value of $F$ in interpretation that maps $x$ to $u$ and maps $y$ to $v$ . (We can formally define this if we have a truth-value function that gives semantics to formulas under a given interpretation function, as in classical semantics of first-order logic.)

Lattice

Let $A = 2^{\cal P}$ , so for $a \in A$ we have $a \subseteq {\cal P}$ .

We design analysis using Abstract Interpretation Recipe.

Define a syntactic order on $(A,\le)$ :

$\begin{equation*} a_1 \sqsubseteq a_2 \iff a_1 \supseteq a_2 \end{equation*}$

Clearly, this forms a finite lattice with height $n+1$ .

Note that, because the order is inverted, the full set ${\cal P}$ is the bottom and $\emptyset$ is the top of the lattice.

Example:

$\begin{equation*} \{ false, 0 < x, 0 < y, x < y \} \sqsubseteq \{ 0 < x, 0 < y \} \sqsubseteq \{ 0 < x \} \sqsubseteq \emptyset \end{equation*}$

Note that $a_1 \sqsubseteq a_2$ implies that the formula

$\begin{equation*} (\bigwedge_{P \in a_1} P) \rightarrow (\bigwedge_{P \in a_2} P) \end{equation*}$

is a valid formula, because every conjunct on right-hand side appears also on the left-hand side.

Does the converse hold?

We could also define ordering on elements of $A$ using the above implication. This is a pre-order. We could work with equivalence classes if we use a theorem prover to check the equivalence of formulas.

Concretization Function Gamma

Define

$\begin{equation*} \gamma(a) = \{ s \mid \bigwedge_{P \in a} P(s) \} \end{equation*}$

i.e. set of states in which all predicates in $a$ evaluate to true.

Example: In our example, the definition reduces to

$\begin{equation*} \gamma(a) = \{ (x,y) \mid \bigwedge_{P \in a} P(x,y) \} \end{equation*}$

In particular, whenever $false \in a$ , then

$\begin{equation*} \gamma(a) = \{ (x,y) \mid false \} = \emptyset \end{equation*}$

Empty conjunction is by definition true, so

$\begin{equation*} \gamma(\emptyset) = \{ (x,y) \mid true \} = \mathbb{Z}^2 \end{equation*}$

Monotonicity of Gamma

We observe that concretization is monotonic. Indeed, suppose $a_1 \sqsubseteq a_2$ . Then $a_2 \subseteq a_1$ , so

$\begin{equation*} \{ (x,y) \mid (\bigwedge_{P \in a_1} P(x,y)) \} \subseteq \{ (x,y).\ (\bigwedge_{P \in a_2} P(x,y)) \} \end{equation*}$

which means $\gamma(a_1) \subseteq \gamma(a_2)$ .

Example:

$\begin{equation*} \{(x,y) \mid 0 < x \land 0 < y \} \subseteq \{(x,y) \mid 0 < x \} \end{equation*}$

i.e. $\gamma(\{ 0 < x, 0 < y \}) \subseteq \gamma(\{ 0 < x \})$ .

Abstraction Function Alpha

For $c \subseteq \mathbb{Z}^2$ , define

$\begin{equation*} \alpha(c) = \{ P \in {\cal P} \mid \forall (x,y) \in c \mid P(x,y) \} \end{equation*}$

Example:

$\begin{equation*} \alpha(\{(-1,1)\}) = \{ y > 0 \} \end{equation*}$

$\begin{equation*} \alpha(\{(1,1), (2,2), (3,6) \}) = \{ x > 0, y > 0 \} \end{equation*}$

$\begin{equation*} \alpha(\{(1,0), (0,1) \}) = \emptyset \end{equation*}$

Monotonicity of Alpha

Let $c_1 \subseteq c_2$ .

We wish to prove that $\alpha(c_1) \supseteq \alpha(c_2)$ .

Let $P \in \alpha(c_2)$ . Then for all $(x,y) \in c_2$ we have $P(x,y)$ .

Then also for all $(x,y) \in c_1$ we have $P(x,y)$ , because $c_1 \subseteq c_2$ .

Therefore $P \in \alpha(c_1)$ . We showed $c_2 \subseteq c_1$ , so $c_1 \sqsubseteq c_2$ .

Alpha and Gamma are a Galois Connection

We show $(\alpha,\gamma)$ are a Galois Connection. We need to show that

$\begin{equation*} c \subseteq \gamma(a) \ \iff\ \alpha(c) \supseteq a \end{equation*}$

This is easy because both conditions reduce to

$\begin{equation*} \forall P \in a.\ \forall (x,y) \in c.\ P(x,y) \end{equation*}$

Not a Galois Insertion

Is it the case that $\alpha(\gamma(a))=a$ ?

We show this is not the case. This is because $\gamma$ is not injective.

Indeed, take $a_1 = \{false\}$ and $a_2 = \{false, x > 0\}$ . Then

$\begin{equation*} \gamma(a_1) = \emptyset = \gamma(a_2) \end{equation*}$

Note $\alpha(\gamma(a_1)) = \alpha({\cal P}) = \alpha(\gamma(a_2))$ , but $a_1 \neq a_2$ , but it is not the case that $a_1 = a_2$ . In this particular case,

$\begin{equation*} \alpha(\emptyset) = {\cal P} \end{equation*}$

and $a_1 \neq {\cal P}$ so

$\begin{equation*} \alpha(\gamma(a_1)) \neq a_1 \end{equation*}$

However, the approach works and is sound even without the condition $\alpha(\gamma(a))=a$ .

Least Upper Bound and Greatest Lower Bound

As expected, according to the syntactic ordering, $\sqcup$ is $\cap$ and $\sqcap$ is $\cup$ .

That means that, at control-flow graph join points, we intersect sets of predicates that are true on each of the incoming edges.

Computing sp sharp

Example: Consider computing $sp^\#(\{0<x\},y:=x+1)$ . We can test for each predicate $P' \in {\cal P}$ whether

$\begin{equation*} x > 0 \land y' = x + 1 \land x'=x \implies P'(x',y') \end{equation*}$

We obtain that the condition holds for $0<x$ , $0<y$ , and for $x < y$ , but not for $false$ . Thus,

$\begin{equation*} sp^\#(\{0<x\},y:=x+1) = \{ 0 < x, 0 < y, x < y \} \end{equation*}$

On the other hand,

$\begin{equation*} sp^\#(\{0<x\},y:=x-1) = \{ 0 < x \} \end{equation*}$

Fix some command given by relation $r$ .

Denote $a' = sp^\#(a,r)$ . We try to find how to compute $a'$ . As usual in Abstract Interpretation Recipe, for correctness we need to have

$\begin{equation*} sp(\gamma(a),r) \subseteq \gamma(a') \end{equation*}$

which, thanks to Galois connection is equivalent to

$\begin{equation*} \alpha(sp(\gamma(a),r)) \sqsubseteq a' \end{equation*}$

i.e.

$\begin{equation*} a' \subseteq \alpha(sp(\gamma(a),r)) \end{equation*}$

We wish to find the smallest lattice element, which is the largest set (this gives the tightest approximation). So we let

$\begin{equation*} a' = \alpha(sp(\gamma(a),r)) \end{equation*}$

By observation above,

$\begin{equation*} a' = \{ P' \in {\cal P} \mid \forall (x',y') \in sp(\gamma(a),r).\ P'(x',y') \} \end{equation*}$

Let $R(x,y,x',y')$ denote the meaning of relation $r$ (according to e.g. Compositional VCG). Then $(x',y') \in sp(\gamma(a),r)$ means

$\begin{equation*} \exists x,y. (x,y) \in \gamma(a) \land R(x,y,x',y') \end{equation*}$

i.e.

$\begin{equation*} \exists x,y. (\bigwedge_{P\in a} P(x,y)) \land R(x,y,x',y') \end{equation*}$

We then plug this expression back into $a'$ definition. Because the existentials are left of implication, the result is:

$\begin{equation*} a' = \{ P' \in {\cal P} \mid \forall x,y,x'y'.\ (\bigwedge_{P\in a} P(x,y)) \land R(x,y,x',y') \rightarrow P'(x',y') \} \end{equation*}$

This is the rule to compute $sp^{\#}$ .

In summary, to compute $sp^{\#}$ , all we need to know is to check whether the following formula is valid

$\begin{equation*} ((\bigwedge_{P\in a} P(x,y)) \land R(x,y,x',y')) \rightarrow P'(x',y') \end{equation*}$

for given $P' \in {\cal P}$ and $a \subseteq {\cal P}$ .

Example of analysis run:

x = 0;
y = 1;
while (x < 1000) {
  x = x + 1;
  y = 2*x;
  y = y + 1;
  print(y);
}

Take as the predicates ${\cal P} = \{ false, 0<x, 0<=x, 0<y, x<y, x=0, y=1, x<1000,1000 \le x \}$ .

x = 0;
y = 1;
// 0<y, x<y,x=0,y=1, x<1000
// 0<y, 0<=x, x<y
while (x < 1000) {
  // 0<y, 0<=x, x<y, x<1000
  x = x + 1;
  // 0<y, 0<=x, 0<x
  y = 2*x;
  // 0<y, 0<=x, 0<x, x<y
  y = y + 1;
  // 0<y, 0<=x, 0<x, x<y
  print(y);
}
// 0<y,  0<=x, x<y, 1000 <= x

Pre-Computation using Weakest Preconditions

There is an alternative definition, which can be used to convert the above problem into a finite-state exploration problem. (It will be particularly useful if we consider an extension from conjunctions to disjunction of conjunctions.)

The condition

$\begin{equation*} \forall x,y,x',y'.\ (\bigwedge_{P\in a} P(x,y)) \land R(x,y,x',y') \rightarrow P'(x',y') \end{equation*}$

is equivalent to the validity of the formula

$\begin{equation*} (\bigwedge_{P\in a} P) \rightarrow wp(R, P') \end{equation*}$

Fix a predicate $P'$ . Consider all $a \subseteq {\cal P}$ for which the above condition holds, define them as

$\begin{equation*} suf(R,P') = \{ a \mid (\bigwedge_{P\in a} P) \rightarrow wp(R, P') \} \end{equation*}$

Clearly, if $a_1 \in suf(R,P')$ and $a_1 \subseteq a_2$ , then also $a_2 \in suf(R,P')$ . So, we can represent $suf(R,P')$ using its minimal elements $sufm(R,P')$ . These are minimal sufficient sets of predicates that ensure that $P'$ holds.

Then, we can define equivalently

$\begin{equation*} sp^\#(a,R) = \{ P' \mid \exists a' \in sufm(R,P').\ a' \subseteq a \} \end{equation*}$

The advantage of this definition is that we compute $sufm$ for a given predicate $P'$ and command given by $R$ , and then we can efficiently compute $sp^{\#}$ for all abstract values $a$ by checking inclusion of finite sets.

Approximation: The set $sufm(R,P')$ can have up to $2^{|{\cal P}|}$ elements. To avoid this, we can restrict it to only those subsets $a'$ that have at most e.g. 3 predicates.

Note that using smaller $sufm$ means that some $a'$ will be missing, the condition will be false in some cases, and in those cases $P'$ will not be included in the postcondition, so the resulting abstract element will be larger in the lattice. So the approach is still correct, just not as precise.

Further Techniques

Predicates per program point: Normally each program point has its own set of predicates. Indeed, these predicates may talk about local variables that only exist at those program points. The entire approach remains the same, just whenever computing the set of predicates at a program point $v_i$ , use a local set ${\cal P}_i$ instead of one global set ${\cal P}$ .

Further challanges:

compute the sets ${\cal P}_i$ automatically, i.e. discover predicates automatically (usually through counterexample-guided abstraction refinement)
be more precise when handling paths through programs (path sensitivity, dependent attribute analysis)
analyze procedures (context-free reachability)

See also predicate abstraction references.