Forward Symbolic Execution
Can be used for
- verification condition generation
- executing programs
- testing, bug finding
- proving programs correct (if extended with techniques for loop invariants)
Intuition for Symbolic Execution
Symbolic execution engine is like an interpreter, but can keep values of variables symbolic, and keep track of constraints on these symbolic values. This is why it is called symbolic execution. In the special case when the input values are known (and there is no I/O in the program), it reduces to execution (interpretation).
State of Symbolic Execution Engine
Assume for simplicity:
- the set of variables is
- there are no loops
Symbolic execution maintains a formula describing the relation between the initial state and the current state.
The engine ensures that has the following normal form:
- is a term describing symbolic state of
- describes symbolic state of
- is a path condition
- variables do not appear in (they only appear on LHS of ,.
- are fresh variables that do not appear in the program (we often omit existential quantification over these variables)
Symbolic Execution Rules
We derive these rules from the relational semantics. We assume has the form as above, with .
where is an expression, change into interpreted in the current state, obtaining
How does this follow from relation composition?
Given assume statement
add to path condition, obtaining
Given havoc statement:
change into a fresh variable:
where is a fresh variable (implicitly -quantified).
prove the condition:
(or save this condition as a conjunct in the final verification condition).
Justification: note that the postcondition at current program point is
so the above statement expresses the condition that the assertion will hold starting from all initial states.
Weakening Optionally, after assert(F), symbolic execution can execute
havoc(x) havoc(y) assume(F)
which essentially has the effect of obtaining
where , are fresh variables and where becomes the new path condition. This means that only information about the assertion is propagated further, other information about the past is forgotten.
This is handled implicitly: execute statements one by one.
Execute both branches and collect proof obligations from both branches.
Exponential explosion remedy:
- work on control-flow graph
- merge symbolic states when edges merge
Merging: Given two symbolic states:
generate a merged symbolic state
where are fresh variables, and where is
Then continue symbolic execution from this state.
Simplification: if we can simplify symbolic expressions , or path condition , this can be very useful, especially if the size of the expression decreases.
An important special case is constant folding. Example: if is , we replace it by .
Much more sophisticated simplifications are possible (e.g. keep polynomials in normal form, apply terminating rewrite rules); see the references at the end of page.
If the path condition formula becomes unsatisfiable, the relation is empty and symbolic execution can stop (path is unreachable).
Special case: if the initial condition specifies values of variables, then we know which branch to enter, and symbolic execution reduces to concrete execution (interpreter for the programming language).
Approximation in Symbolic Execution
Instead of computing exact relation, we can compute approximate relation for the program. Two main approaches:
- larger relation: overapproximation (safe approximation)
- smaller relation: underaproximation (used for bug finding, sometimes it's not guaranteed to be smaller or bigger)
Why we approximate:
- ensure termination even for programs that have loops
- can still obtain useful information about the program
We can do approximation by modifying the current symbolic execution formula to be
- stronger (for under-approximation)
- weaker (for over-approximation)
We will study this topic in the rest of the course under the framework of abstract interpretation.
In terms of symbolic execution, these can often be described by imposing a stronger normal form on the formula of symbolic execution.
Creates weaker formulas, which are guaranteed to be of stronger form. Here are some examples of how normal form for looks in that case. Key question is how to ensure that the formulas look like this–it means that we need to modify propagation rules to generate as strong as possible formulas in this normal form.
where is one of the formulas , , (and similarly for )
where is a formula of the forms , , , or (analogously for )
Linear or Polynomial Constraint Analysis
where are polynomials of degree . For we obtain systems of linear constraints.
Ensuring termination of the analysis in the presence of loops:
- we check whether we have seen a weaker symbolic state before, if we have, then we stop
- we design normal forms of so that the number of iterations through loops is always finite
We look at these topics when we study abstract interpretation and data-flow analysis.
Over-Approximation Using Supplied Asserts
Approach two: interpreting assert(P) as:
assert(P) havoc(x,y) assume(P)
which is an over-approximation. To see this, we need to give a relational meaning to assert, which we do next. The advantage is that, if asserts express correct loop invariants, this over-approximation will terminate and prove the desired assertions. We can ensure termination by providing a correct assert in each cycle of the control-flow graph.
Creates stronger formulas than formulas representing program meaning.
- can often be expressed as adding certain 'assume' statements
If the first statements are
assume (x=0) assume (x=1)
Then symbolic execution knows the exact values and can work as an interpreter.
- simplification optimization computes the exact values of variables
There are techniques that do under-approximation by selecting values of certain variables.
- find a satisfying assignment for , take value of in it and replace with
Other techniques for underapproximation:
- Bounding depth: stop executing for paths longer than some value
- Bounding state: small integer width, small number of objects in heap
- Loop unrolling: replace loop( c ) with cc;cc;c;c for some number of times
- Lori Clarke and Debra Richardson: Symbolic Evaluation Methods for Program Analysis, in Program Flow Analysis: Theory and Applications, Prentice-Hall 1981.