# LARA

## Introduction to Pointer Analysis

Pointers and references: a way to manipulate values through a level of indirection.

Uses:

• dynamically allocated data structures
• parameter passing by value and reference
• in compiled code, e.g. left-hand side of assignment evaluates to a memory address

Different programming languages have different variations.

Classification by memory safety:

• memory unsafe
• pointers - in C, C++, assembly
• essentially like addresses, can be incremented
• problems with buffer overruns, dangling pointers, memory leaks - major cause of critical flaws in OS, a big failure of C memory model that manifested decades after its invention
• in principle can be fixed (CCured, SafeC); questions of interoperability
• references
• in ML, ocaml, Java, Scala
• can only redirect and follow a reference, cannot increment like a numerical address
• used with garbage collection
• no dangling pointers, fewer memory leaks, type safety

We mostly look at references, but that also applies to C (but there more work may be needed to be sound).

Examples

```#include <stdio.h>
int main()
{
int x;
int* p;
...
p = &x;
...
x = 3;
*p = 4;
if(x==3) {
printf("3\n");
} else {
printf("not 3\n");
}
}```

Some of the questions we may want to answer include:

• Can compiler prove one branch is dead?
• Can it do common subexpression elimination?
```  x = sin(z);
*p = 4.0;
y = x + sin(z);```

Here we cannot be sure of the value of x after the assignment to *p, because p might point to x. So we would be interested if finding out whether we can reuse the computation of complex expression sin(z).

Another question we might be interested in answering:

Is

```  int* p1, p2;
...S...
*p1 = E1;
*p2 = E2;```

equivalent to

```  int* p1, p2;
...S...
*p2 = E2;
*p1 = E1;```

There are also corresponding examples in Java, ocaml.

```  x.f = 3;
p.f = 4;
if(x.f==3) {
printf("3\n");
} else {
printf("not 3\n");
}```
```  let x = ref 3 in
let p = x in
p := 4;
if((!x)=3) then print_string("3\n")
else print_string("not 3\n")```

How we can describe this semantically? (Recall verification condition generation in the presence of structures): we can view functions as arrays:

```  f[x] = 3;
f[p] = 4;
if(f[x]==3) {
printf("3\n");
} else {
printf("not 3\n");
}```

In C and ML references, there is an implicit 'cell content' field.

`x.f = y`

becomes

`f := f(x:=y)`

Expanding function update:

`f(x:=y)(z) = IF(z=x,y,f(z))`

Key question for reasoning about the result of such IF expressions: do two pointers (references) point to same location, are two 'memory array' indices equal.

To try answer these questions, we use the following techniques:

• pointer analysis: represent (usually disjoint) sets of locations to which each variable points to. For example, if we can determine that a pointer p points to an abstract location 1, and q points to location 2. If location 1 and location 2 are disjoint, then we can be sure that p and q do not overlap.
• alias analysis: can two symbolic expressions be an 'alias' for the same physical location i.e. can they point to the same location.