LARA

Relational Semantics of a Simple Language

We will next give meaning (semantics) for our Simple Programming Language using relations between initial and final state of each command. We illustrate the use of this semantics by showing certain very simple program equivalence properties.

When we write “x” we mean the variable named x viewed as a syntactic object (string or a node in syntax tree). We sometimes omit quotes.

Examples of Semantics for Some Programs

In such a semantics, if the state of the program is given by one integer variable x, then the meaning of this 'increment program':

x = x + 3;
x = x - 2

is the relation $r_1 = \{(0,1),(1,2),(2,3),...,(-1,0),(-2,-1),(-3,-2),...\}$.

A pair of integers $(v,v')$ is in relation, written $(v,v') \in r_1$, if and only if $v' = v + 1$. Using set comprehensions, we write

\begin{equation*}
    r_1 = \{(v,v') \mid v'=v+1 \}
\end{equation*}

The meaning $r_2$ of

x = x + x

is

\begin{equation*}
   r_2 = \{ (v,v') \mid v' = 2v \}
\end{equation*}

The meaning of

while (true) { }

is empty relation.

Why Relations

The meaning is, in general, an arbitrary relation. Therefore:

  • For certain states there will be no results. In particular, if a computation starting at a state $s$ does not terminate, then for no $s'$ will there be a state of the form $(s,s')$ in the relation.
  • For certain states there will be multiple results, when $(s,s_1) \in r$ and $(s,s_2) \in r$ for $s_1 \neq s_2$. Intuitively, this means command execution starting in $s$ will sometimes compute $s_1$ and sometimes $s_2$. Verification of such program must account for both possibilities.
  • Multiple results are important for modelling e.g. concurrency, as well as approximating behavior that we do not know (e.g. what the operating system or environment will do, or what the result of complex computation is)

Program States, Relations on States

Our programs can have any finite number of integer variables. Let $V$ denote the set of variable names such as 'x' in the example above. The state of our program is a function $V \to R$ mapping variable names to their values, and the set of states

\begin{equation*}
    S = (V \to R)
\end{equation*}

is the set of all such functions. We assume that the range of variables is $R = \mathbb{Z}$ (the set of integers), but most of what we say will be independent on whether the variables are integers or have some other type.

In the above increment program, $V = \{"x"\}$ and a state is of the form $\{("x",v)\}$ for some integer $v$ (recall that we represent each function $x$ as relation $\{(x,f(x)\mid x \in D \}$). So the increment function is given by a relation, call it $r$, that takes the state $\{("x",v)\}$ and maps it to state $\{("x",v+1)\}$:

\begin{equation*}
    r = \{(\{("x",v)\}, \{("x",v')\}) \mid v' = v + 1 \}
\end{equation*}

What happens when we have more than one variable? Let $V = \{"x", "y"\}$ and consider this program:

x = y + y

The meaning of this program is given by relation

\begin{equation*}
   r = \{(\{("x",v),("y",u)\},\{("x",v'),("y",u')\}) \mid v'=2u \land u' = u \}
\end{equation*}

Renaming variables so the initial value of variable “x” is denoted $x$ and final value denoted $x'$, we have

\begin{equation*}
   r = \{(\{("x",x),("y",y)\},\{("x",x'),("y",y')\}) \mid x'=2y \land y' = y \}
\end{equation*}

I will call the relation r the relational semantics of this command and the formula $x'=2y \land y'=y$ the formula semantics of this command.

When we have multiple variables, we can give semantics to assignment statements using function update notation. Regardless of the number of variables in $V$, the relational semantics of x=y+y; is

\begin{equation*}
    r = \{(s,s["x" \mapsto 2 s("y")]) \mid s \in S \}
\end{equation*}

Semantic Functions

Let $C$ denote the set of all commands (programs). Our goal is to define semantic function for commands, denoted $r_c$ (relation of the command), which maps each command $c_1$ into its relational semantics $r_c(c_1) \subseteq S \times S$. Formally,

\begin{equation*}
   r_c : C \to 2^{S \times S}
\end{equation*}

To define meaning of statements such as x=y+y; we need a way to evaluate a term such as y+y in a given state. We introduce function $f_T$ (function of the term), which maps a term into a function from states to values:

\begin{equation*}
   f_T : T \to (S \to R)
\end{equation*}

Meaning of Assignment

Given some definition of $f_T$, the meaning of the assignment statement is simply

\begin{equation*}
   r_c(x=e) = \{ (s,s["x" \mapsto f_T(e)(s)]) \mid s \in S \}
\end{equation*}

Meaning of Terms

We next define meaning of terms, whose syntax we defined earlier.

T ::= K | V | (T + T) | (T - T) | (K * T) | (T / K) | (T % K)
F ::= (T=T) | (T < T) | (T > T) | (~F) | (F & F) | (F|F)
V ::= x | y | z | ...
K ::= 0 | 1 | 2 | ...

The recursive definition corresponds to a program you would write in e.g. Scala to compute the value of an expression in a given state when writing an interpreter.

Based on that context-free grammar, we can define recursively the terms semantic :

\begin{equation*}
\begin{array}{lcl}
   f_T(k)(s) & = & k \\
   f_T(v)(s) & = & s(v) \\
   f_T(t_1 + t_2)(s) & = & f_T(t_1)(s) + f_T(t_2)(s) \\
   f_T(t_1 - t_2)(s) & = & f_T(t_1)(s) - f_T(t_2)(s) \\
   f_T(k * t)(s) & = & k * f_T(t)(s) \\
   f_T(t / k)(s) & = & f_T(t)(s) / k \\
   f_T(t \% k)(s) & = & f_T(t)(s) \% k \\
   f_T(t_1 = t_2)(s) & = & f_T(t_1) = f_T(t_2) \\
   f_T(t_1 < t_2)(s) & = & f_T(t_1) < f_T(t_2) \\
   f_T(t_1 > t_2)(s) & = & f_T(t_1) > f_T(t_2) \\
   f_T(\neg f)(s) & = & \neg f_T(f)(s) \\
   f_T(f_1 \& f_2)(s) & = & f_T(f_1)(s) \land f_T(f_2) \\
   f_T(f_1 | f_2)(s) & = & f_T(f_1)(s) \lor f_T(f_2)(s) 
\end{array}
\end{equation*}

Sequential Composition as Relation Composition

To execute a statement sequence, we execute the first statement and from the resulting state execute the second statement. We therefore define statement sequential composition as relation composition of statement meanings:

\begin{equation*}
   r_c(c_1\ ;\ c_2) = r_c(c_1) \circ r_c(c_2)
\end{equation*}

Example: commuting assignments

Consider the task of a programmer or a compiler who wishes to reorder two assignment statements, converting

x = e1;
y = e2

into

y = e2;
x = e1

where $x,y \in V$ and $x \neq y$. Under what conditions will these two program fragments have the same meaning?

To answer that question, we just have to derive these two expressions :

\begin{equation*}
\begin{array}{lll}
&   & r_c ( x = e_1 ; y = e_2 ) \\
& = & r_c ( x = e_1 ) \circ r_c ( x = e_2 ) \\
& = & \{ ( s , s[ "x" \mapsto f_T(e_1)(s) ]) | s \in S \} \circ \{ ( s , s[ "y" \mapsto f_T(e_2)(s) ])~|~s \in S \} \\
& = & \{ ( s , ( s["x" \mapsto f_T(e_1)(s)] )["y" \mapsto f_T(e_2)( s["x" \mapsto f_T(e_1)(s) ] ) ] ) )~|~s \in S \} 
\end{array}
\end{equation*}

Symetrically, we have:

\begin{equation*}
\begin{array}{lll}
&   & r_c ( y = e_2 ; x = e_1 ) \\
& = & \{ ( s , ( s["y" \mapsto f_T(e_2)(s)] )["x" \mapsto f_T(e_1)( s["y" \mapsto f_T(e_2)(s) ] ) ] ) )~|~s \in S \} 
\end{array}
\end{equation*}

Thus, if y is free in $e_1$ and x is free in $e_2$, these two fragments will have the same meaning.

Assume Statement

Assume statement is a useful declarative statement, which we will use to define the meaning of several remaining commands. The meaning of assume is simply is the diagonal relation, which produces no resulting states if the given expression is false, and does nothing if the expression is true:

\begin{equation*}
    r_c(\mbox{assume}(e)) = \{ (s,s) \mid f_T(e)(s) \} = \Delta_P
\end{equation*}

where $P = \{ s \mid f_T(e)(s) \}$.

Non-deterministic Choice as Union

Non-deterministic choice is another useful declarative statement, which non-deterministically chooses to execute one of the two statements. We use it to express other statements, as well as to model branching with conditions that we cannot express in the language. Its meaning is simply union of relations:

\begin{equation*}
   r_c(c_1\ \mbox{[]}\ c_2) = r_c(c_1) \cup r_c(c_2)
\end{equation*}

Representing if-then-else Using Non-Deterministic Choice and Assume

We will define the semantics of

if (F) then c1 else c2

as the semantics of

(assume(F); c1) [] (assume(~F); c2)

Note that we are using non-deterministic choice, but exactly one of the two statements in the non-determinitic choice will succeed.

Exercise: expanding the definition of conditionals

We can also express this semantics directly as:

\begin{equation*}
    r_c(\mbox{if}(F)\mbox{ then } c_1 \mbox{ else } c_2) = \{ (s,s') \mid (f_T(F)(s) \land (s,s') \in r_c(c_1)) \lor (\lnot f_T(F)(s) \land (s,s') \in r_c(c_2) \}
\end{equation*}

Non-deterministic Loop as Transitive Closure

Just as we use non-deterministic choice (union) to represent conditional statment, we use non-deterministic loop statement (transitive closure) to represent while loops. A statement loop( c ) executes statement c any number of times, possibly zero. Its meaning is transitive closure of the meaning of c.

\begin{equation*}
   r_c(\mbox{l{}o{}o{}p}(c_1)) = (r_c(c_1))^*
\end{equation*}

Expressing While using Non-deterministic Loop

We define semantics of

while(F) c

as the semantics of

loop (assume(F);c);
assume(~F)

Why does this definition make sense? It will be union of relations of the form:

(assume(F); c)^n
assume(~F)

for all n. But if the value of n is wrong the result will be empty relation:

  • if it executes even if F is false, then assume(F) will give empty relation
  • if it stops even if F is true, then assume(~F) at the end will give empty relation

Therefore, the result is equal to the correct number of execution times.

Summary of Simplifying the Language

Instead of

c ::=  x=T | (if (F) c else c) | c ; c | (while (F) c)

we can have language

c ::=  x=T | assume(F) |  c [] c  |  c ; c | loop c

Applying it to a multiplication example, we obtain

r = 0;
i = y;
loop (
  assume (i > 0);
  r = r + x;
  i = i - 1
);
assume (i <= 0);

Havoc Statement

Havoc statement is another useful declarative statement. It changes a given variable entirely arbitrarily: there will be one possible state for each possible value of integer variable.

\begin{equation*}
    r_c(\mbox{havoc}(x)) = \{ (s,s') \mid \forall v \in V. v \neq "x" \rightarrow s(v)=s'(v) \}
\end{equation*}

Expressing Assignment with Havoc+Assume

We can prove that the following equality holds under certain conditions:

\begin{equation*}
    r_c(\mbox{havoc}(x);\mbox{assume}(x=e)) = r_c(x=e)
\end{equation*}

In other words, assigning a variable is the same as changing it arbitrarily and then assuming that it has the right value. Under what condition does this equality hold?

Summary of Correspondence between Programs and Relations

program construct relational meaning
sequential composition relation composition
branching union
loops transitive closure
assignment update of one state component
havoc arbitrary change of one state component

Control-Flow Graphs, While Theorem: One Loop is Enough

  • from programs to control-flow graphs: transformation rules
  • interpreter running over control flow graph
  • normal form using only one loop and one extra variable

Sum of all matrix elements as control-flow graph and as one loop.

Proving Program Properties

A general approach:

  • compute semantics of a program as a mathematical object (e.g. relation)
  • prove that the relation satisfies the desired property

Example

Consider the code

r = 0;
i = y;
while (i > 0) (
  r = r + x;
  i = i - 1
)

We represent it as

r = 0;
i = y;
loop (
  assume (i > 0);
  r = r + x;
  i = i - 1
);
assume (i <= 0);

We compute the meaning of

assume (i > 0);
r = r + x;
i = i - 1

as the relation $b$ whose formula is

\begin{equation*}
    i > 0 \land r' = r + x \land i' = i - 1
\end{equation*}

We then wish to compute the transitive closure $b^* = \bigcup_{n \ge 0} b^n$.

We claim that for every $n \ge 0$ relation $b^n$ is given by formula

\begin{equation*}
    i \ge n \land r' = r + n x \land i' = i - n
\end{equation*}

We can prove this claim by induction. The union of these relations is then given by

\begin{equation*}
     \exists n \ge 0. i \ge n \land r' = r + n x \land i' = i - n
\end{equation*}

From there we can derive the meaning of

r = 0;
i = y;
loop (
  assume (i > 0);
  r = r + x;
  i = i - 1
)

as

\begin{equation*}
     \exists n \ge 0. y \ge n \land r' = n x \land i' = y - n
\end{equation*}

which when composed with assume($\neg (i > 0)$) = assume($i \leq 0$) gives

\begin{equation*}
     \exists n \ge 0. y \ge n \land r' = n x \land i' = y - n \land i' \leq 0 
\end{equation*}

which is equivalent to

\begin{equation*}
    y \ge 0 \land i' = 0 \land r' = y x
\end{equation*}

This is the semantics of the above code.

Relations as Specifications

Nondeterministic programs can be used to specify other programs

If $r$ is relation representing program and $s$ relation representing specification, we say that program meets specification iff

\begin{equation*}
    r \subseteq s
\end{equation*}

Example:

This allows us to prove that programs satisfy their contracts, expressed using preconditions and postconditions:

void f()
  requires x > 0
  ensures x > 0
{
  x = x + 1;
}

What is the relation for contract with

  • precondition $P(x)$
  • postcondition $Q(x)$

Answer: $\{(x,x')\ \mid\ P(x) \rightarrow Q(x')\}$

The correctness condition:

\begin{equation*}
    \{ (v,v').\ v' = v + 1 \} \subseteq \{ (v,v').\ v > 0 \rightarrow v' > 0 \}
\end{equation*}

reduces to:

\begin{equation*}
   \forall v. \forall v'.\  v' = v + 1 \rightarrow ( v > 0 \rightarrow v' > 0)
\end{equation*}

Further reading

  • C A R Hoare and He Jifeng. Unifying Theories of Programming. Prentice Hall, 1998
  • Semantics-based Program Analysis via Symbolic Composition of Transfer Relations, PhD dissertation by Christopher Colby, 1996