LARA

Forward Symbolic Execution

Can be used for

  • verification condition generation
  • executing programs
  • testing, bug finding
  • proving programs correct (if extended with techniques for loop invariants)

Basic Ideas

Generalize an interpreter to work with symbolic instead of concrete values.

Assume the set of variables $V = \{x,y\}$.

Symbolic execution maintains a formula $R(x_0,y_0,x,y)$ describing the relation between the initial state and the current state.

Exploiting Analogy with Execution

Maintain symbolic state formula $R(x_0,y_0,x,y)$ in the following normal form:

\begin{equation*}
    x = e_x \land y = e_y \land P
\end{equation*}

where $e_x$ is term describing symbolic state of $x$ and $e_y$ describes symbolic state of $y$. The remaining part of the formula $R$ is given in path condition $P$. It is a conjunction of all branch conditions up to current program point. We require $P$ to be expressed only in terms of $x_0,y_0$, not $x,y$.

Checking Assertions

Postcondition at current program point is $\exists x_0,y_0. R(x_0,y_0,x,y)$.

What should it do when encountering assert(F)?

Branch Pruning

If condition $P$ ever becomes contradictory, relation is empty and symbolic execution can stop (path is unreachable).

Special case: if initial condition specifies values of variables, then we know which branch to enter.

Symbolic Execution Rules

We derive these rules from semantics, taking into account the definition of R in terms of e_x,e_y,P.

Assignment

Given

x = f(x,y)

change state:

e_x := f(e_x,e_y)

How does this follow from relation composition?

Assume

Given

assume(F)

change state:

P := P & F[x<-e_x, y<-e_y]

Havoc

Given

havoc(x)

change state:

e_x := x_k

where x_k is a fresh variable.

Assert

Given

assert(F(x,y))

emit proof obligation:

\begin{equation*}
   (x = e_x \land y = e_y \land P) \rightarrow F(e_x,e_y)
\end{equation*}

and change the state:

x := x_k (fresh)
y := y_k (fresh)
P := F

Sequential composition

Handled implicitly: execute statements one by one.

Non-deterministic choice

Execute both branches and collect proof obligations from both branches.

Exponential explosion remedy:

  • work on control-flow graph
  • merge symbolic states when edges merge

Merging: Given two symbolic states:

x = e_x1 & y = e_y1 & P1
x = e_x2 & y = e_y2 & P2

generate a single symbolic state

x = x_k & y = y_k & P

where P is ((x_k = e_x1 & y_k = e_y1 & P1) | ((x_k = e_x2 & y_k = e_y2 & P2)

Then we continue only from this one state.

Execution

If the first statements are

assume (x=0)
assume (x=1)

Then symbolic execution knows the exact values and can work as an interpreter.

Underapproximations

Here we generate real counterexamples (underapproximation), we compute smaller relation than one given by program semantics.

If there are no loop invariants, we can still execute program on concrete inputs. For symbolic initial state (e.g. giving upper bound of loop) execution need not terminate.

  • Bounding depth: stop executing for paths longer than some value
  • Bounding state: small integer width, small number of objects in heap
  • Loop unrolling: replace loop( c ) with c[]c;c[]c;c;c for some number of times
  • Instantiating values: find a satisfying assignment for R, take value v of x in it and replace e_x with v (moves towards concrete execution)

Overaproximation

In this case we compute overaproximation, bigger relation than the actual relation given by program semantics.

Examples:

  • do not maintain exact expressions for variables, but only whether they are positive
  • maintain only linear or certain polynomial constraints - if actual constraints are more complex, keep only the polynomial constraints that are guaranteed to hold

Why we approximate we can obtain

  • termination even for programs that have loops
  • useful information about the program (not exactly all information that we are interested in, depends on the approximation)

References