Designing Correct Data-Flow Analyses

Define Properties of Interest

Define precisely what property is needed (depends on the application), e.g.

absence of errors
potentially improving code performance without changing program meaning

Example: every variable assigned before being read

Define Analysis (Semi)Lattice to Express Properties

Find a lattice that

encodes properties of interest
enables terminating analysis

Specify lattice top, bottom, join

Precisely Specify Meaning of Lattice Elements

Specify concretization function, mapping lattice elements to sets of states

$\begin{equation*} \gamma : L \to {\cal P}(States) \end{equation*}$

In SignAnalysis.scala, we have for $m : Var \to Sign$ for $Var = \{v_1,\ldots,v_n\}$

$\begin{equation*} \gamma(m) = \gamma_0(m(v_1)) \times \ldots \times \gamma_0(m(v_n)) \end{equation*}$

where, assuming infinite integers we have

$\begin{equation*} \begin{array}{l} \gamma_0(\bottom) = \emptyset \\ \gamma_0(Positive) = \{ 1, 2, \ldots, \} \\ \gamma_0(Zero) = \{ 0 \} \\ \gamma_0(Negative) = \{ \ldots, -2, -1 \} \\ \gamma_0(\top) = \{ \ldots, -2, -1, 0, 1, 2, \ldots \} \\ \end{array} \end{equation*}$

Monotonicity requirement for $\gamma$ :

$x \sqsubseteq y$ implies $\gamma(x) \subseteq \gamma(y)$ , for all $x,y$

Note that this implies

$\begin{equation*} \gamma(x) \cup \gamma(y) \subseteq \gamma(x \sqcup y) \end{equation*}$

Example

$Var = \{ x, y \}$

Lattice element $m : Var \to Sign$ given by $m(x)=Negative$ , $m(y) = Negative$

Then

$\begin{equation*} \begin{array}{r@{}l} \gamma(m) = & \{ \ldots, -2, -1 \} \times \{ 1, 2, \ldots, \} \\ = & \{ (-1, 1), (-1, 2), (-1, 3), \ldots, (-2, 1), (-2, -2), \ldots \} \end{array} \end{equation*}$

Here we wrote $(a,b)$ to mean the function $\{(x,a), (y,b)\}$ .

Compute a Sound Initial Lattice Element

Initially, the set of reachable states is empty at all points, except at CFG entry, where we have set of initial states of program $S_0$

In the initial state of data-flow analysis, we will assign $\bot$ to each program location except entry

At entry we assign lattice element $a_0$ such that

$\begin{equation*} \gamma(a_0) \supseteq S_0 \end{equation*}$

Derive Sound Transfer Functions

For each statement $st$ , define $transFun(st,a)$

Needs to correctly approximate the effect of statement $st$ on states given by $a$ .

Formally, for all statements $st$ and lattice elements $a$ the correctness condition is:

$\begin{equation*} \gamma(transFun(st,a)) \supseteq transFun_C(st,\gamma(a)) \end{equation*}$

Here $transFun_C$ is the transfer function from Concrete Execution as Data-Flow Analysis

takes set of states and runs statement $st$ on each state

Recall SignAnalysis.scala transfer functions

Sketch of Correctness Proof

Why are computed results correct?

Imagine running in parallel a step of

our data flow analysis (abstract analysis)
Concrete Execution as Data-Flow Analysis - step $k$ computes all executions of length up to $k$

We can show, for each $k$

at step $k$ results in abstract analysis approximate concrete execution of up to step $k$

We can show that in the limit, this approximation holds

limit of concrete data-flow analysis computes set of reachable states
thus, the compute results approximate reachable states

This can be made completely precise using theory of lattices and fixed points