Proof Carrying Code Survey

The idea of proof-carrying code is that compilers emits not only executable code, but also proofs that the executable code satisfies certain desirable properties.

Compile Memory Safety using Proof-Carrying Code

One of the traditional applications of proof-carrying code is the following. Consider compilation from a memory safe language such as Java bytecode to machine code such as i386 assembly. In general, machine code need not satisfy memory safety, so a library function written in assembly may in principle write to arbitrary memory locations that it can compute. Buffer overflows are one manifestation of this problem. Such memory unsafe code is in contrast to memory safe code such as Java bytecodes, where library function can only write to objects reachable by references that are visible to it. Without proof carrying code, if we compile Java bytecodes to native code (using, for example, in GCJ) and ship it over network, the recepient has no assurance that the received machine code still satisfies memory safety. Proof-carrying code allows us to recover this missing guarantee, because the compiler will emit the proof object encoding the fact that the result of sequence of instructions in the compiled code is still memory safe. Before executing the code, a client will examine machine code instructions and generate a proof obligation that the instructions are memory-safe. It will then try to see whether the supplied proof object is a valid proof for the generated proof obligation. Therefore, if an adversery changes the machine code by intercepting network traffic and makes the code memory unsafe, the proof will fail and the client will refuse to execute the code. (If the adversary changes both the machine code and the proof object so that the proof object correctly proves that the modified machine code is memory safe, the client will gladly execute the code, which is acceptable because the code is then memory safe, even if it is not equivalent to the original program.)

Enforcing Security Policies

Translation Validation

A very strong specification ensures that the compiled program is equivalent to the original program. If the original program is memory safe, this can easily be used to derive memory safety as a consequence. In general, however, the original program may be complex and it is not always clear how to derive partial correctness properties from knowing the original program. In automated program synthesis, the original program can be declarative and at high level of abstraction, in that case translation validation ensures full functional correctness of the compiled program.

Non-deterministic computation and proofs

Proofs in complexity theory can be used to define non-deterministic complexity classes, such as NP.

In theorem proving, the notion of independently checkable proof certificate is proposed.

Typed Assembly Language

In compilation of functional programs it was observed that preserving types throughout the compilation process catches many compiler bugs. The gern if these ideas was in TIL: a type-directed, optimizing compiler for ML, published in 1994. The focus there was on propagating type information to enable optimizations, but the correctness of both intermediate code and the output code became more influential in the long run, leading to proof carrying code.

Here is Annotated Bibliography by George Necula himself.