Lecture 22: Set constraints

Introduction

We can view analysis and verification as two stage process

program + properties => system of equations/formulas => solution

In general, the system of equations (for a program expressed as one loop) is an approximation of the condition

$sp(I,c) \subseteq I \Longleftrightarrow sp(I,c) \cup I = I$

where $I$ is the set of reachable states and $c$ the code of the program.

We have seen earlier

solution for programs that have no loops (verification condition generation): $I$ is given as formula
iterative solutions based on finite height lattice abstractions: $I$ computed iteratively
constraint-based analysis based on linear inequality templates: postulate coefficients for $I$ of the form $\bigwedge_{j} \sum_{i} c_{ij} x_{i} \geq d_{j}$ , solve for them

Today we look at another constraint based analysis, which deals with non-numeric properties and is based on set constraints.

Two problems:

how to generate set constraints for a given analysis
how to solve set constraints (techniques often reusable for different analysis)

Set constraints can be naturally used to express

interprocedural abstract interpretation and dataflow analysis
type inference

Different axes for classifying analyses:

context sensitivity: does not produce false paths in procedure calls/returns
flow sensitivity: taking into account the ordering of statements within a procedure, taking not into account the ordering may seem strange but is useful to fasten the analysis of large programs
field sensitivity: when analyzing data structures, y=x.f and y=x.g can be treated like members of the same object x
non-relational (independent attribute, Cartesian) VS. relational (dependent attribute)
- example : non-relational $1 \leq x \leq 5 \wedge 2 \leq y \leq 7$ VS. relational $x + y \leq 10$

Set constraints are suitable for non-relational analyses, whether they are flow sensitive or not and whether they are context-sensitive or not.

Type systems: context-sensitive (e.g. generics), work for higher-order programs (and callbacks and function pointers). Often flow insensitive (unless they are “typestate” or “flow sensitive type qualifiers”). Sometimes designed for purely functional programs (higher order imperative programs are difficult to reason about).

Notion of set constraints

Syntax

We can define the subset constraints between expressions denoting sets of terms using the following grammar :

$S ::= c | v | S \cap S | S \cup S | S \setminus S | f(S, ..., S) | f^{-i}(S)$

where $c$ is a constant, $v$ is a terminal (it can be a variable or function symbol), $f$ is an application function and $f^{-i}$ is the projection of th $i^{th}$ parameter of $f$ .

Note: set difference ( $\setminus$ ) as it is not monotonic, so its use is restricted if we want to ensure existence of solutions.

Semantic

We define the semantic of our syntax recursively by defining each of its elements over a language $L = \lbrace a, \ldots, f_1, \ldots, g_1, \ldots \rbrace$ . We first have that :

$\begin{equation*} [[S]] \in H \end{equation*}$

where $H$ is the Herbrand univers, or “set of all ground terms”, or also called the “term model”. As we already saw, it's the set of all possibly constructed trees using language $L$ :

$\begin{equation*} H = \lbrace a, f_1(a, a), f_1(g_1(a, a), a), \ldots \rbrace \end{equation*}$

Given that, if the meaning of $v$ is given (i.e. we have $[[v]] = \alpha(v) \subseteq H$ ), then we can estimate the meaning of our language as following :

$[[c]] = \lbrace c \rbrace$
$[[S_1 \cup S_2]] = [[S_1]] \cup [[S_2]]$
$[[S_1 \cap S_2]] = [[S_1]] \cap [[S_2]]$
$[[S_1 \setminus S_2]] = [[S_1]] \setminus [[S_2]]$
$[[f(S_1, \ldots, S_n)]] = \lbrace f(t_1, \ldots, t_n) | t_1 \in [[S_1]] \wedge \ldots \wedge t_n \in [[S_n]] \rbrace$
$[[f^{-i}(S)]] = \lbrace t_i | \exists t_1, \ldots, t_n. f(t_1, \ldots, t_i, \ldots t_n) \in [[S]] \rbrace$

Example

$[[f(\lbrace a, f(a,a)\rbrace , \lbrace a \rbrace)]] = \lbrace f(a, a), f(f(a, a), a) \rbrace$

Using this example we can already see how we could use set constraints to analyze procedure calls as $f$ can be viewed as a procedure call on which we apply possible parameters.

An important property we would like to have in this semantic is a very intuitive one :

$\begin{equation*} [[f^{-1}(f(S_1, S_2))]] = [[S_1]] \end{equation*}$

This property is in fact not conserved as we can easily see using a very simple counter-example :

$\begin{equation*} [[f^{-1}(f(S_1, \emptyset))]] = [[f^{-1}(\lbrace f(t_1, t_2) | t_1 \in [[S_1]] \wedge t_2 \in \emptyset \rbrace)]] = [[f^{-1}(\emptyset)]] = \emptyset \end{equation*}$

The correct interpretation of this property is :

$\begin{displaymath} [[f^{-1}(f(S_1, S_2))]] = \left\{ \begin{array}{ll} \emptyset & if [[S_2]] = \emptyset & [[S_1]] & otherwise \\ \end{array} \right \end{displaymath}$

Live variable analysis using set constraints

A variable is said to ba alive when it's value will be reused later in the program.

Example :

x = 1;
y = 5;
z = 7;
z = x + y;

in this example, only variables x and y are alive.

Example of use:

register allocation
related problems arise in memory preallocation

In order to compute the set of live variable, let's take the same example as before to use it to deduce the update rule. We can see our program as the following tree :

From this we can deduce the following set equations that define the relation between the sets of active variables $S$ at different program points :

$[[S]]_{in} = S_{use} \cup ([[S]]_{out} \setminus S_{df})$

$[[S]]_{out} = \bigcup_{X \in succ(S)} [[X]]_{in}$

Where $S_{use} = \lbrace x, y \rbrace$ as they are used in $c$ and $S_{df} = z$ as it's the newly defined variable in $c$ . Here we do not have a problem with the monotonicity of the \ operator as $S_{df}$ is not a variable.

Control-flow analysis | Closure Analysis

Example of the problem

In a functional language, f is a variable that pass a function. In this context, to which function (f x y) refers to ?
In order to answer this question, we have to keep track of who called f.

This analysis is handled by the $\lambda$ calculus. As a reminder, it has the following grammar : $E := v | \lambda x.E | E\:E$ where $E\:E$ represent the application of the function $E$ (the left one) to its argument $E$
We can rename functions and variables such that they are unique
We can also add $\lambda x.x$ to say that it exists $x$
We can then convert these expressions to set constraints :

$\lambda x.e_0 \Longrightarrow \lambda x \subseteq [[\lambda x.e_0]]$

$e_1\:e_2 \Longrightarrow \forall \lambda x.e_3\;in\;e. \lambda_x \subseteq [[e_1]] \Rightarrow \left ( [[e_2]] \subseteq [[x]] \wedge [[e_3]] \subseteq [[e_1 e_2]] \right )$
where $e_1\:e_2$ is a part of the program $e$ and $\lambda_x$ is a constant.

For the case of application, the question is : Can $\lambda_x$ describe the expression $e_1$ ?

Example We will try to see if this is the case using $\left ( \lambda x.x \right ) \lambda y.y$ .

From this we can deduce, as we defined previously, the following contraints :

$\begin{equation*}\begin{array}[c] \left { \lambda_x \right } \subseteq [[\lambda x.x]] \\ \left { \lambda_y \right } \subseteq [[\lambda y.y]] \\ \lambda_x \subseteq [[\lambda x.x]] \Rightarrow \left ( [[\lambda y.y]] \subseteq [[x]] \wedge [[x]] \subseteq [[\left ( \lambda x.x \right ) \lambda y.y]] \right ) \\ \lambda_y \subseteq [[\lambda x.x]] \Rightarrow \left ( [[\lambda y.y]] \subseteq [[y]] \wedge [[y]] \subseteq [[\left ( \lambda x.x \right ) \lambda y.y]] \right ) \end{array} \end{equation*}$

Which leads to the following solutions :

$\begin{displaymath}\begin{array}[l] & [[x]] = \lbrace \lambda_y \rbrace & [[y]] = \emptyset & [[\lambda x.x]] = \lbrace \lambda_x \rbrace & [[\lambda y.y]] = \lbrace \lambda_y \rbrace & [[\left ( \lambda x.x \right ) \lambda y.y]] = \lbrace \lambda_y \rbrace \end{array}\end{displaymath}$

Which shows that the variable $\lambda_y$ can described our test expression.

Conditional Constraints

Now that we know how to convert the \lambda calculus into set constraints, we need to be able to express $\Rightarrow$ into set constraints too as it is not part of our gramar.

The general form of the conditional constraints is the following : $\lambda_x \subseteq S_3 \Rightarrow S_1 \subseteq S_2$ .

We can transform it like this :

$\begin{displaymath} \begin{array}{l} \lambda_x \cap S_3 \neq \emptyset \Rightarrow S_1 \subseteq S_2 \\ S_0 \neq \emptyset \Rightarrow S_1 \subseteq S_2 \\ f^{-1} \left ( f \left ( S_1, S_0 \right ) \right ) \subseteq S_2 \\ \end{array}\end{displaymath}$

Which has the correct meaning as we have :

$\begin{displaymath} \left \begin{array}{ll} \emptyset & if [[S_0]] = \emptyset & [[S_1]] & otherwise \\ \end{array} \right \} \subseteq S_2 \end{displaymath}$

Shape analysis of algebraic data types

Not seen in class, have a look at the papers !

Using unary functions for interprocedural analysis

Not seen in class, have a look at the papers !

Solving set constraints

Not seen in class, have a look at the papers !

References

Banshee