# Designing Correct Data-Flow Analyses

We describe guidelines to design a correct analysis for a set of properties of interest

## Define Properties of Interest

Define precisely what property is needed (depends on the application), e.g.

- absence of errors
- improving code performance without changing program meaning

Recall Applications of Data-Flow Analysis

Example: every variable assigned before being read

## Define Analysis (Semi)Lattice to Express Properties

Find a lattice that

- encodes properties of interest
- enables terminating analysis

Specify lattice top, bottom, join

## Precisely Specify Meaning of Lattice Elements

Specify **concretization function**, mapping lattice elements to sets of states

In SignAnalysis.scala, we have for for

where, assuming infinite integers we have

**Monotonicity requirement** for :

- implies , for all

Note that this implies

**Example**

Lattice element given by ,

Then

Here we wrote to mean the function .

## Compute a Sound Initial Lattice Element

Initially, the set of reachable states is empty at all points, except at CFG entry, where we have set of initial states of program

In the initial state of data-flow analysis, we will assign to each program location except entry

At program entry we assign a lattice element such that

Putting to is always safe (this corresponds to assuming any possible program states at program initialization time). If we know something more about the initial program states, we can give better initial element. For sign analysis, if we assume all variables are initialized to 0, then we can set .

## Derive Sound Transfer Functions

For each statement , define

Needs to correctly approximate the effect of statement on states given by .

Formally, for all statements and lattice elements the **Correctness Condition** is:

Here is the transfer function from Concrete Execution as Data-Flow Analysis

- takes set of states and runs statement on each state

Recall SignAnalysis.scala transfer functions

## Running the Algorithm

As described in DataFlowAnalysis.scala, we execute this iteration

The algorithm terminates because the elements get larger and larger.

When the algorithm terminates we know that the above assignment is in fact a true as an equation (not assignment).

## Sketch of Correctness Proof

Here is why the computed result, given by for each vertex is correct.

Consider any execution of the program represented as control-flow graph (CFG). The execution goes through states, which we can describe by the vertex and the values of variables

The execution is a path in CFG, starting at 'entry' and finishing at 'exit'. Consider the resulting alternating sequence of vertices and CFG statements:

where = 'entry', ='exit'.

We prove by induction on claim that, for each ,

In other words, the values of variables at each program point are correctly described by the lattice value at that program point.

**Proof:**

**Base case:** this is how we chose .

**Inductive step:** follows from

- Monotonicity of
- Correctness Condition of
- The condition that map has stabilized

Indeed:

- suppose by inductive hypothesis
- then by definition, because it is a next step in program execution
- by Correctness Condition,
- because stabilized, ; indeed, is an upper bound of values on right-hand side of equation
- by Monotonicity of , we obtain .
**QED**

Note: for *correctness*, it does not matter how we obtained fact, as long as we have checked that it stabilized (we could have guessed those values or found not-necessarily least solution).