Introduction to Pointer Analysis
Pointers and references: a way to manipulate values through a level of indirection.
Uses:
- dynamically allocated data structures
- parameter passing by value and reference
- in compiled code, e.g. left-hand side of assignment evaluates to a memory address
Different programming languages have different variations.
Classification by memory safety:
- memory unsafe
- pointers - in C, C++, assembly
- essentially like addresses, can be incremented
- problems with buffer overruns, dangling pointers, memory leaks - major cause of critical flaws in OS, a big failure of C memory model that manifested decades after its invention
- references
- in ML, ocaml, Java, Scala
- can only redirect and follow a reference, cannot increment like a numerical address
- used with garbage collection
- no dangling pointers, fewer memory leaks, type safety
We mostly look at references, but that also applies to C (but there more work may be needed to be sound).
Examples
#include <stdio.h>
int main()
{
int x;
int* p;
...
p = &x;
...
x = 3;
*p = 4;
if(x==3) {
printf("3\n");
} else {
printf("not 3\n");
}
}
Some of the questions we may want to answer include:
- Can compiler prove one branch is dead?
- Can it do common subexpression elimination?
x = sin(z); *p = 4.0; y = x + sin(z);
Here we cannot be sure of the value of x after the assignment to *p, because p might point to x. So we would be interested if finding out whether we can reuse the computation of complex expression sin(z).
Another question we might be interested in answering:
Is
int* p1, p2; ...S... *p1 = E1; *p2 = E2;
equivalent to
int* p1, p2; ...S... *p2 = E2; *p1 = E1;
There are also corresponding examples in Java, ocaml.
x.f = 3;
p.f = 4;
if(x.f==3) {
printf("3\n");
} else {
printf("not 3\n");
}
let x = ref 3 in
let p = x in
p := 4;
if((!x)=3) then print_string("3\n")
else print_string("not 3\n")
How we can describe this semantically? (Recall verification condition generation in the presence of structures): we can view functions as arrays:
f[x] = 3;
f[p] = 4;
if(f[x]==3) {
printf("3\n");
} else {
printf("not 3\n");
}
In C and ML references, there is an implicit 'cell content' field.
x.f = y
becomes
f := f(x:=y)
Expanding function update:
f(x:=y)(z) = IF(z=x,y,f(z))
Key question for reasoning about the result of such IF expressions: do two pointers (references) point to same location, are two 'memory array' indices equal.
To try answer these questions, we use the following techniques:
- pointer analysis: represent (usually disjoint) sets of locations to which each variable points to. For example, if we can determine that a pointer p points to an abstract location 1, and q points to location 2. If location 1 and location 2 are disjoint, then we can be sure that p and q do not overlap.
- alias analysis: can two symbolic expressions be an 'alias' for the same physical location i.e. can they point to the same location.