====== Lecture 3 (Skeleton) ======

Summary of what we are doing in today's class:

{{vcg-big-picture.png}}

===== Verification condition generation: converting programs into formulas =====

==== Context ====

Recall that we can
  * represent programs using guarded command language, e.g. desugaring of 'if' into non-deterministic choice and assume
  * give meaning to guarded command language statements as relations
  * represent relations using set comprehensions; if our program c has two state components, we can represent its meaning R(c) as $\{((x_0,y_0),(x,y)) \mid F \}$, where F is some formula that has x, y, x_0, y_0 as free variables
  * assume simple values: variables range over integers (later we will talk about modeling pointers and arrays, but what we say now will still apply)

Our goal is to find rules for computing R(c) that
  * are correct
  * are efficient
  * produce formulas that we can effectively prove later

What exactly do we prove about the formula R(c)? We prove that the following formula is **valid**:

  R(c) -> error=false

==== Formulas for basic statements ====

In our simple language, basic statements are assignment, havoc, assume, assert.

  R(x=t) = (x = t[x:=x_0, y:=y_0] & y=y_0 & error=error_0)

**Note**: all our statements will have the property that if error_0 = true, then error=true. That is, you can never recover from an error state. This is convenient: if we prove that there are no errors at the end, then there were never errors in between.

**Note**: the condition y=y_0 & error=error_0 is called the frame condition. There are as many conjuncts as there are components of the state. This can be annoying to write, so let us use the shorthand frame(x) for it. The shorthand frame(x) denotes the conjunction of v=v_0 for all v that are distinct from x (in this case y and error). We can have zero or more variables as arguments of frame, so frame() means that nothing changes.

  R(havoc x) = frame(x)
  R(assume F) = F[x:=x_0, y:=y_0, error:=error_0] & frame()
  R(assert F) = (F -> frame())

**Note**: if F does not hold, the relation for assert F places no constraint on the final state, so in particular it permits error=true.

**Note**: x=t is the same as havoc(x); assume(x=t), provided t does not mention x (otherwise, first save the old value of x in a fresh variable).

  assert false = crash (stops with error)
  assume true = skip (does nothing)

==== Composing formulas using relation composition ====

This is perhaps the most direct way of transforming programs to formulas. It creates formulas that are linear in the size of the program.

Non-deterministic choice is union of relations, that is, disjunction of formulas:

  CR(c1 [] c2) = CR(c1) | CR(c2)

In sequential composition we follow the rule for composition of relations. We want to obtain again a formula with free variables x_0, y_0, x, y, so we need to do renaming. Let x_1, y_1, error_1 be fresh variables; we rename the final state of c1 and the initial state of c2 to them:

  CR(c1 ; c2) = exists x_1,y_1,error_1. CR(c1)[x:=x_1, y:=y_1, error:=error_1] & CR(c2)[x_0:=x_1, y_0:=y_1, error_0:=error_1]

The base case is CR(c) = R(c) when c is a basic command.

==== Avoiding accumulation of equalities ====

This approach generates many variables and many frame conditions. Ignoring error for the moment, we have, for example:

  R(x=3) = (x=3 & y=y_0)
  R(y=x+2) = (y = x_0 + 2 & x=x_0)
  CR(x=3; y=x+2) = exists x_1,y_1. x_1=3 & y_1=y_0 & y = x_1 + 2 & x = x_1

But if a variable is equal to another term, it can be substituted away using the substitution rules (valid when x_1 does not occur in t):

  (exists x_1. x_1=t & F(x_1)) <-> F(t)
  (forall x_1. x_1=t -> F(x_1)) <-> F(t)

We can apply these rules to reduce the size of formulas.
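As a quick sanity check of the example above, here is a minimal sketch (not part of the lecture material; the use of the Z3 SMT solver's Python API is our own choice for illustration) that asks a solver to confirm that the composed formula, after the intermediate variables are substituted away, is equivalent to the simplified one:

<code python>
# Minimal sketch: check the composition + substitution example with Z3.
from z3 import Ints, And, Exists, prove

x0, y0, x, y, x1, y1 = Ints('x0 y0 x y x1 y1')

# CR(x=3 ; y=x+2): the final state of the first statement and the initial
# state of the second statement are renamed to the fresh variables x1, y1.
composed = Exists([x1, y1],
                  And(x1 == 3, y1 == y0,      # CR(x=3)[x:=x1, y:=y1]
                      y == x1 + 2, x == x1))  # CR(y=x+2)[x0:=x1, y0:=y1]

# After applying the substitution (one-point) rules:
simplified = And(x == 3, y == x + 2)

prove(composed == simplified)   # prints "proved"
</code>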
==== Approximation ====

If (F -> G) is valid, we say that F is stronger than G and that G is weaker than F.

When a formula would be too complicated, we can instead create a simpler, approximate formula. To be sound when our goal is to prove a property, we need to generate a **larger** relation, which corresponds to a weaker formula describing the relation, and therefore a stronger verification condition. (If we were trying to find counterexamples, we would do the opposite.)

We can replace "assume F" with "assume F1" where F1 is weaker. Consequences:
  * omitting complex 'if' conditions (assuming both branches can be taken - as in most type systems)
  * replacing complex assignments with an arbitrary change to the variable: because x=t is havoc(x); assume(x=t), and we simply drop the assume

This idea is important in static analysis.

==== Symbolic execution ====

Symbolic execution converts programs into formulas by going forward. It is therefore somewhat analogous to the way an [[interpreter]] for the language would work. To avoid redoing renaming all the time, we keep an index k for the current version of the state:

  SE(F, k, c1; c2) = SE(F & R(c1), k+1, c2)        (update the formula; here c1 is a basic statement, with R(c1) written over the variables indexed k and k+1)
  SE(F, k, c1 [] c2) = SE(F, k, c1) | SE(F, k, c2)  (explore both branches)

Note: how many branches do we get?

Strongest postcondition:
\begin{equation*}
   sp(P,r) = \{ s_2 \mid \exists s_1.\ s_1 \in P \land (s_1,s_2) \in r \}
\end{equation*}
This is like composition of a set with a relation. It is called the ''relational image'' of the set $P$ under the relation $r$.

Note: when proving our verification condition, instead of proving that the formula for the relation implies error=false, we can equivalently prove that the formula for the set sp(U, r) implies error=false, where U is the set of all states; in terms of formulas, this means computing the strongest postcondition of the formula 'true'.

==== Weakest preconditions ====

While symbolic execution computes a formula by going forward along the program syntax tree, weakest preconditions compute a formula by going backward:

  wp(Q, x=t) = Q[x:=t]
  wp(Q, assume F) = F -> Q
  wp(Q, assert F) = F & Q
  wp(Q, c1 [] c2) = wp(Q, c1) & wp(Q, c2)
  wp(Q, c1 ; c2) = wp(wp(Q, c2), c1)
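To make the backward computation concrete, here is a minimal sketch of wp over a tiny guarded-command AST, again using the Z3 Python API for formulas. The tuple-based AST encoding and the example program are our own illustration, not the lecture's notation:

<code python>
# Minimal sketch (illustration only): wp over a tiny guarded-command AST,
# with Z3 expressions as formulas. AST nodes are tuples tagged with
# 'assign', 'assume', 'assert', 'choice' (non-deterministic choice), 'seq'.
from z3 import Int, IntVal, And, Implies, substitute, prove

def wp(q, c):
    tag = c[0]
    if tag == 'assign':          # wp(Q, x=t)      = Q[x:=t]
        _, x, t = c
        return substitute(q, (x, t))
    if tag == 'assume':          # wp(Q, assume F) = F -> Q
        return Implies(c[1], q)
    if tag == 'assert':          # wp(Q, assert F) = F & Q
        return And(c[1], q)
    if tag == 'choice':          # wp(Q, c1 [] c2) = wp(Q,c1) & wp(Q,c2)
        return And(wp(q, c[1]), wp(q, c[2]))
    if tag == 'seq':             # wp(Q, c1 ; c2)  = wp(wp(Q,c2), c1)
        return wp(wp(q, c[2]), c[1])
    raise ValueError('unknown command: %s' % tag)

x, y = Int('x'), Int('y')

# Desugared form of:  if (x > 0) y = x + 2 else y = 0
prog = ('choice',
        ('seq', ('assume', x > 0),  ('assign', y, x + 2)),
        ('seq', ('assume', x <= 0), ('assign', y, IntVal(0))))

# The program establishes y >= 0 from any initial state:
prove(wp(y >= 0, prog))   # prints "proved"
</code>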
==== Inferring Loop Invariants ====

Suppose we compute the strongest postcondition in a program where we unroll the loop k times.
  * What does it denote?
  * What is its relationship to a loop invariant?

Weakening strategies:
  * maintain a conjunction
  * drop conjuncts that do not remain true

Alternative:
  * decide that you will only look for formulas of a restricted form, as in abstract interpretation and data-flow analysis (next week)

===== One useful decision procedure: Proving quantifier-free linear arithmetic formulas =====

Suppose that we obtain (one or more) verification conditions of the form
\begin{equation*}
   F\ \rightarrow\ (\mbox{error}=\mbox{false})
\end{equation*}
whose validity we need to prove. We assume here that F contains only linear arithmetic.

Note: proving validity of this implication is the same as showing that $F\ \land\ (\mbox{error}=\mbox{true})$ is unsatisfiable. We show an algorithm to check this satisfiability.

==== Quantifier-free Presburger arithmetic ====

Here is the grammar:

  var ::= x | y | z | ...                  (variables)
  K   ::= ... | -2 | -1 | 0 | 1 | 2 | ...  (integer constants)
  T   ::= var | T + T | K * T              (terms)
  A   ::= T=T | T <= T                     (atomic formulas)
  F   ::= A | F & F | F|F | ~F             (formulas)

To get full Presburger arithmetic, allow existential and universal quantifiers in formulas as well.

Note: we can assume that we also have boolean variables (such as 'error'), because we can represent them as 0/1 integers.

Satisfiability of quantifier-free Presburger arithmetic is decidable. Proof: small model theorem.

==== Small model theorem for Quantifier-Free Presburger Arithmetic (QFPA) ====

First step: transform the formula into disjunctive normal form.

Next: reduce each disjunct to an integer linear programming problem
\begin{equation*}
   A\vec x = \vec b, \qquad \vec x \geq \vec 0
\end{equation*}
where $A \in {\mathbb Z}^{m \times n}$ and $\vec x \in {\mathbb Z}^n$.

Then solve the integer linear programming (ILP) problem:
  * [[wk>Integer Linear Programming]]
  * online book chapter on ILP
  * [[http://www.gnu.org/software/glpk/|GLPK]] tool

We can prove a small model theorem for ILP, which gives a bound on the search.

Short proof by {{papadimitriou81complexityintegerprogramming.pdf|Papadimitriou}}:
  * the solution of Ax=b (for A regular) has as components rationals of the form p/q with bounded p, q
  * use duality of linear programming
  * this obtains the bound $M = n(ma)^{2m+1}$, which needs $\log n + (2m+1)\log(ma)$ bits
  * we could then encode the problem into SAT: use circuits for addition, comparison, etc.

Note: if the small model theorem applies to conjunctions, it also applies to arbitrary QFPA formulas (via the disjunctive normal form). Moreover, one can improve these bounds. One tool based on these ideas is [[http://www.cs.cmu.edu/~uclid/|UCLID]].

Alternative: enumerate the disjuncts of the DNF on demand; each disjunct is a conjunction, so ILP techniques apply (often one first solves the underlying linear programming problem over the reals). Many SMT tools are based on this idea (along with Nelson-Oppen combination: next class).
  * [[http://www.cs.nyu.edu/acsys/cvc3/download.html|CVC3]] (successor of CVC Lite)
  * [[http://combination.cs.uiowa.edu/smtlib/|SMT-LIB]]: standard for formulas, and a competition

==== Full Presburger arithmetic ====

Full Presburger arithmetic is also decidable. Approaches:
  * quantifier elimination (the Omega tool from Maryland) - see homework
  * automata-theoretic approaches: LASH, MONA (as a special case)

===== Papers =====

  * Verification condition generation in Spec#: http://research.microsoft.com/~leino/papers/krml157.pdf
  * Loop invariant inference for set algebra formulas: {{hob-tcs.pdf}}
  * Induction-iteration method for machine code checking: http://www.cs.wisc.edu/wpis/papers/pldi00.ps
  * Presburger Arithmetic (PA) bounds: {{papadimitriou81complexityintegerprogramming.pdf}}
  * Specializing PA bounds: http://www.lmcs-online.org/ojs/viewarticle.php?id=43&layout=abstract