Designing Correct Data-Flow Analyses

Define Properties of Interest

Define precisely what property is needed (depends on the application), e.g.

  • absence of errors
  • potentially improving code performance without changing program meaning

Example: every variable assigned before being read

Define Analysis (Semi)Lattice to Express Properties

Find a lattice that

  • encodes properties of interest
  • enables terminating analysis

Specify lattice top, bottom, join

Precisely Specify Meaning of Lattice Elements

Specify concretization function, mapping lattice elements to sets of states

   \gamma : L \to {\cal P}(States)

In SignAnalysis.scala, we have for $m : Var \to Sign$ for $Var = \{v_1,\ldots,v_n\}$

\gamma(m) = \gamma_0(m(v_1)) \times \ldots \times \gamma_0(m(v_n))

where, assuming infinite integers we have

\gamma_0(\bottom) = \emptyset \\
\gamma_0(Positive) = \{ 1, 2, \ldots, \} \\
\gamma_0(Zero) = \{ 0 \} \\
\gamma_0(Negative) = \{ \ldots, -2, -1 \} \\
\gamma_0(\top) = \{ \ldots, -2, -1, 0, 1, 2, \ldots \} \\

Monotonicity requirement for $\gamma$:

  • $x \sqsubseteq y$ implies $\gamma(x) \subseteq \gamma(y)$, for all $x,y$

Note that this implies

   \gamma(x) \cup \gamma(y) \subseteq \gamma(x \sqcup y)


$Var = \{ x, y \}$

Lattice element $m : Var \to Sign$ given by $m(x)=Negative$, $m(y) = Negative$


   \gamma(m) = & \{ \ldots, -2, -1 \} \times  \{ 1, 2, \ldots, \} \\
             = & \{ (-1, 1), (-1, 2), (-1, 3), \ldots, (-2, 1), (-2, -2), \ldots  \}

Here we wrote $(a,b)$ to mean the function $\{(x,a), (y,b)\}$.

Compute a Sound Initial Lattice Element

Initially, the set of reachable states is empty at all points, except at CFG entry, where we have set of initial states of program $S_0$

In the initial state of data-flow analysis, we will assign $\bot$ to each program location except entry

At entry we assign lattice element $a_0$ such that

   \gamma(a_0) \supseteq S_0

Derive Sound Transfer Functions

For each statement $st$, define $transFun(st,a)$

Needs to correctly approximate the effect of statement $st$ on states given by $a$.

Formally, for all statements $st$ and lattice elements $a$ the correctness condition is:

   \gamma(transFun(st,a)) \supseteq transFun_C(st,\gamma(a))

Here $transFun_C$ is the transfer function from Concrete Execution as Data-Flow Analysis

  • takes set of states and runs statement $st$ on each state

Recall SignAnalysis.scala transfer functions

Sketch of Correctness Proof

Why are computed results correct?

Imagine running in parallel a step of

We can show, for each $k$

  • at step $k$ results in abstract analysis approximate concrete execution of up to step $k$

We can show that in the limit, this approximation holds

  • limit of concrete data-flow analysis computes set of reachable states
  • thus, the compute results approximate reachable states

This can be made completely precise using theory of lattices and fixed points