This is an old revision of the document!

Lecture 3 (Skeleton)

Converting programs (with simple values) to formulas

Context

Recall that we can

represent programs using guarded command language, e.g. desugaring of 'if' into non-deterministic choice and assume
give meaning to guarded command language statements as relations
we can represent relations using set comprehensions; if our program c has two state components, we can represent its meaning R( c ) as $\{((x_0,y_0),(x,y)) \mid F \}$ , where F is some formula that has x,y,x_0,y_0 as free variables.

this is what I mean by simple values: later we will talk about modeling pointers and arrays, but we will still use this as a starting point.

Our goal is to find rules for computing R( c ) that are

correct
efficient
create formulas that we can effectively prove later

What exactly do we prove about the formula R( c ) ?

We prove that this formula is valid:

R( c ) -> error=false

Formulas for basic statements

In our simple language, basic statements are assignment, havoc, assume, assert.

R(x=t) = (x=t & y=y_0 & error=error_0)

Note: all our statements will have the property that if error_0 = true, then error=true. That is, you can never recover from an error state. This is convenient: if we prove no errors at the end, then there were never errors in between.

Note: the condition y=y_0 & error=error_0 is called <b>frame condition</b>. There are as many conjuncts as there are components of the state. This can be annoying to write, so let us use shorthand frame(x) for it. The shorthand frame(x) denotes a conjunction of v=v_0 for all v that are distinct from x (in this case y and error). We can have zero or more variables as arguments of frame, so frame() means that nothing changes.

R(havoc x) = frame(x)
R(assume F) = F[x:=x_0, y:=y_0, error:=error_0]
R(assert F) = (F -> frame)

Note:

x=t  is same as  havoc(x);assume(x=t)

assert false = crash  (stops with error)

assume true  = skip   (does nothing)

Composing formulas using relation composition

This is perhaps the most direct way of transforming programs to formulas. It creates formulas that are linear in the size of the program.

Non-deterministic choice is union of relations, that is, disjunction of formulas:

CR(c1 [] c2) = CR(c1) | CR(c2)

In sequential composition we follow the rule for composition of relations. We want to get again formula with free variables x_0,y_0,x,y. So we need to do renaming. Let x_1,y_1,error_1 be fresh variables.

CR(c1 ; c2) = exists x_1,y_1,error_1.  CR(c1)[x:=x_1,y:=y_1,error:=error_1] & CR(c2)[x:=x_1,y:=y_1,error:=error_1]

The base case is

CR(c)=R(c)

when c is a basic command.

Avoiding accumulation of equalities

This approach generates many variables and many frame conditions.

Ignoring error for the moment, we have, for example:

R(x=3) = (x=3 & y=y_0)
R(y=x+2) = (y=x_0 + 2 & x=x_0)

CR(x=3;y=x+2) = x_1=3 & y_1 = y_0 & y = x_1 + 2 & x = x_1

But if a variable is equal to another, it can be substituted using the substitution rules

(exists x_1. x_1=t & F(x_1))     <->    F(t)
(forall x_1. x_1=t -> F(x_1)     <->    F(t)

We can apply these rules to reduce the size of formulas.

Approximation

If (F → G) is value, we say that F is stronger than F and we say G is weaker than F.

When a formula would be too complicated, we can instead create a simpler approximate formula. To be sound, if our goal is to prove a property, we need to generate a *larger* relation, which corresponds to a weaker formula describing a relation, and a stronger verification condition. (If we were trying to identify counterexamples, we would do the opposite).

We can replace “assume F” with “assume F1” where F1 is weaker. Consequences:

omtiting complex if conditionals (assuming both branches can happen - as in most type systems)
replacing complex assignments with arbitrary change to variable: because x=t is havoc(x);assume(x=t) and we drop the assume

This idea is important in static analysis.

Symbolic execution

Symbolic execution converts programs into formulas by going forward. It is therefore somewhat analogous to the way an interpreter for the language would work. It is based on the notion of strongest postcondition.

Weakest preconditions

While symbolic execution computes formula by going forward along the program syntax tree, weakest precondition computes formula by going backward.

Proving quantifier-free linear arithmetic formulas

Papers

Verification condition generation in Spec#: http://research.microsoft.com/~leino/papers/krml157.pdf
Loop invariant inference for set algebra formulas: hob-tcs.pdf
Induction-iteration method for machine code checking: http://www.cs.wisc.edu/wpis/papers/pldi00.ps

Presburger Arithmetic (PA) bounds: papadimitriou81complexityintegerprogramming.pdf
Specializing PA bounds: http://www.lmcs-online.org/ojs/viewarticle.php?id=43&layout=abstract