Checking Initialization Using Data-Flow Analysis

We follow the methodology of Designing Correct Data-Flow Analyses

Defining Properties of Interest

We represent program execution as a sequence of program states and simple statements.

Program state is $(u_i,w_i)$ where

$u_i$ is a program point
$w_i$ is the map from variables to their values

Given a sequence

$\begin{equation*} (u_1,w_1), s_1, (u_2,w_2), s_2, \ldots, s_{k-1}, (u_k,w_k), \ldots, s_n, (u_{n+1},w_{n+1}) \end{equation*}$

we say that variable $x$ is initialized at step $k$ if one of the statements $s_1,\ldots,s_{k-1}$ is an assignment to $x$ .

Note that if the variable is initialized, it remains initialized during that execution.

We say that a variable $x$ is definitely initialized at program point $u$ , if for every execution of length $k$ such that $u_k=u$ , we have that $x$ is initialized at step $k$ .

We would like to compute, for each program point, which variables at that point are definitely initialized.

Then, we will emit a warning if a variable is not definitely initialized at some program point, but there is a statement reading its value that can be executed at this program point.

Reformulation of the Property

To avoid talking about previous states in a sequence and talk only about states, we use the following trick

(an instance of a general technique of history variables)

For each variable, introduce an additional boolean flag (0 or 1) that indicates whether the variable is initialized

We have

$Var_X = \{ x_1, \ldots, x_n \}$ is the set of original variables
$Var_B = \{ b_1, \ldots, b_n \}$ the corresponding boolean flags

The state is map $c : (Var_X \cup Var_B) \to Int$

Then:

at the beginning, all flags are 0
at each assignment, we set the flag to 1

Desired property: if a statement reads a variable, then the variable's flag must be 1

Defining Semilattice to Express Properties

We use

$\bot$ to represent that the case when there are no states of initialization flag set to 0 (either there are no states at all, or there are only states with flag 1)
- in other words: initialized variable
$\top$ to represent states where initialization flag can be anything (0 or 1)
- in other words: possibly uninitialized variable

Lattice element is map $m : Var_X \to \{\bot,\top\}$

We use a pointwise lattice

Specifying Meaning of Lattice Elements

$\begin{equation*} \gamma(m) = \{ c \mid \forall b_i \in Var_B.\ (m(x_i)=\bot \rightarrow c(b_i)=1) \} \end{equation*}$

Example: Let $Var = \{ x, y \}$ and let $m = \{(x_1,\bot),(x_2,\top)\}$ .

Then $\gamma(m)$ is the set of concrete states $c$ such that $m(b_1) = 1$ i.e. $x_1$ is initialized.

There is no constraint on $x_2$ , because $m(x_2)=\top$ .

Checking monotonicity: if $m \sqsubseteq m'$ then $\gamma(m) \subseteq \gamma(m')$

note that if element in map goes from $\bot$ to $\top$ , then $\gamma$ value gets only bigger

Initial Lattice Element

All points except entry get value $\bot$ for each variable - as usual

What can we assign to entry?

all variables are uninitialized
so all elements are $\top$

Transfer Functions

Two kinds of statements in CFG (ignoring procedure calls):

dooes not change state (e.g. test) - initialization remains same
they assign to some variable x - set initialization of x to $\bot$ (it is initialized)

Formally,

$\begin{equation*} transferFun(x=e,a) = a[x:=\bot] \end{equation*}$

and

$\begin{equation*} transferFun(s,a) = a \end{equation*}$

if $s$ does not assign to any variable.

Note on Procedures

What about procedure calls?

for local variables - no change, procedure cannot change them
for global variables
- who knows what procedure might do
- a safe thing is to set all global variables to $\top$

Intraprocedural analysis: analyzes one procedure at time

Interprocedural analysis: descend into procedures at call site

Using Results of Initialization Analysis

If at node $v$ we have $m(x)=\top$ and there is a statement from $v$ that reads $x$ , report error “reading a possibly uninitialized variable x”.