Introduction to Pointer Analysis
Pointers and references: a way to manipulate values through a level of indirection.
Uses:
- dynamically allocated data structures
- parameter passing by value and reference
- in compiled code, e.g. left-hand side of assignment evaluates to a memory address
Different programming languages have different variations.
Classification by memory safety:
- memory unsafe
- pointers - in C, C++, assembly
- essentially like addresses, can be incremented
- problems with buffer overruns, dangling pointers, memory leaks - major cause of critical flaws in OS, a big failure of C memory model that manifested decades after its invention
- references
- in ML, ocaml, Java, Scala
- can only redirect and follow a reference, cannot increment like a numerical address
- used with garbage collection
- no dangling pointers, fewer memory leaks, type safety
We mostly look at references, but that also applies to C (but there more work may be needed to be sound).
Examples
#include <stdio.h> int main() { int x; int* p; ... p = &x; ... x = 3; *p = 4; if(x==3) { printf("3\n"); } else { printf("not 3\n"); } }
Some of the questions we may want to answer include:
- Can compiler prove one branch is dead?
- Can it do common subexpression elimination?
x = sin(z); *p = 4.0; y = x + sin(z);
Here we cannot be sure of the value of x after the assignment to *p, because p might point to x. So we would be interested if finding out whether we can reuse the computation of complex expression sin(z).
Another question we might be interested in answering:
Is
int* p1, p2; ...S... *p1 = E1; *p2 = E2;
equivalent to
int* p1, p2; ...S... *p2 = E2; *p1 = E1;
There are also corresponding examples in Java, ocaml.
x.f = 3; p.f = 4; if(x.f==3) { printf("3\n"); } else { printf("not 3\n"); }
let x = ref 3 in let p = x in p := 4; if((!x)=3) then print_string("3\n") else print_string("not 3\n")
How we can describe this semantically? (Recall verification condition generation in the presence of structures): we can view functions as arrays:
f[x] = 3; f[p] = 4; if(f[x]==3) { printf("3\n"); } else { printf("not 3\n"); }
In C and ML references, there is an implicit 'cell content' field.
x.f = y
becomes
f := f(x:=y)
Expanding function update:
f(x:=y)(z) = IF(z=x,y,f(z))
Key question for reasoning about the result of such IF expressions: do two pointers (references) point to same location, are two 'memory array' indices equal.
To try answer these questions, we use the following techniques:
- pointer analysis: represent (usually disjoint) sets of locations to which each variable points to. For example, if we can determine that a pointer p points to an abstract location 1, and q points to location 2. If location 1 and location 2 are disjoint, then we can be sure that p and q do not overlap.
- alias analysis: can two symbolic expressions be an 'alias' for the same physical location i.e. can they point to the same location.