LARA

Introduction to Pointer Analysis

Pointers and references: a way to manipulate values through a level of indirection.

Uses:

  • dynamically allocated data structures
  • parameter passing by value and reference
  • in compiled code, e.g. left-hand side of assignment evaluates to a memory address

Different programming languages have different variations.

Classification by memory safety:

  • memory unsafe
    • pointers - in C, C++, assembly
    • essentially like addresses, can be incremented
    • problems with buffer overruns, dangling pointers, memory leaks - major cause of critical flaws in OS, a big failure of C memory model that manifested decades after its invention
    • in principle can be fixed (CCured, SafeC); questions of interoperability
  • references
    • in ML, ocaml, Java, Scala
    • can only redirect and follow a reference, cannot increment like a numerical address
    • used with garbage collection
    • no dangling pointers, fewer memory leaks, type safety

We mostly look at references, but that also applies to C (but there more work may be needed to be sound).

Examples

#include <stdio.h>
int main()
{
  int x;
  int* p;
  ...
  p = &x;
  ...
  x = 3;
  *p = 4;
  if(x==3) {
    printf("3\n");
  } else {
    printf("not 3\n");
  }
}

Some of the questions we may want to answer include:

  • Can compiler prove one branch is dead?
  • Can it do common subexpression elimination?
  x = sin(z);
  *p = 4.0;
  y = x + sin(z);

Here we cannot be sure of the value of x after the assignment to *p, because p might point to x. So we would be interested if finding out whether we can reuse the computation of complex expression sin(z).

Another question we might be interested in answering:

Is

  int* p1, p2;
  ...S...
  *p1 = E1;
  *p2 = E2;

equivalent to

  int* p1, p2;
  ...S...
  *p2 = E2;
  *p1 = E1;

There are also corresponding examples in Java, ocaml.

  x.f = 3;
  p.f = 4;
  if(x.f==3) {
    printf("3\n");
  } else {
    printf("not 3\n");
  }
  let x = ref 3 in
  let p = x in
    p := 4;
    if((!x)=3) then print_string("3\n")
    else print_string("not 3\n")

How we can describe this semantically? (Recall verification condition generation in the presence of structures): we can view functions as arrays:

  f[x] = 3;
  f[p] = 4;
  if(f[x]==3) {
    printf("3\n");
  } else {
    printf("not 3\n");
  }

In C and ML references, there is an implicit 'cell content' field.

x.f = y

becomes

f := f(x:=y)

Expanding function update:

f(x:=y)(z) = IF(z=x,y,f(z))

Key question for reasoning about the result of such IF expressions: do two pointers (references) point to same location, are two 'memory array' indices equal.

To try answer these questions, we use the following techniques:

  • pointer analysis: represent (usually disjoint) sets of locations to which each variable points to. For example, if we can determine that a pointer p points to an abstract location 1, and q points to location 2. If location 1 and location 2 are disjoint, then we can be sure that p and q do not overlap.
  • alias analysis: can two symbolic expressions be an 'alias' for the same physical location i.e. can they point to the same location.