Labs 12

This is the last lab, you're almost there :) Due date is Saturday, Jan. 10th, 2009.

In this lab, you will implement range analysis, and will use it to try and prevent potential ArrayIndexOutOfBoundsExceptions.

The lab can seem long at first, but if you follow the steps you should be just fine. The starting one (adding assert statements to the language) may seem unrelated, but the rest builds on it, and you will also be able to use this construct to do more checks if you want.

Assert statements

Add “assert” statement to the language: the syntax is simply assert(expr);. For this you will need to modify all phases of the compiler:

recognize the new keyword in the lexer
add a tree node in the AST and parse the new statement
do name analysis on the asserted expression
type check the expression (it should be a Boolean)
but you don't have to generate code for these statements (actual Java assertions are a mess to generate)

The run-time behaviour of a program with assert statements should be the same as if they weren't there.

Transforming assertions in the control-flow graphs

Modify your control-flow graph generator from the previous lab so that the assert statements get translated to subgraphs equivalent to the following code:

assert(expr);

…translates as…

if(!expr) {
  error
}

…where the special “action” error is represented by a new node defined in CFGTrees.scala:

case class CFGError extends CFGStatement with Positional

You can have all edges labeled with this new error statement go straight to the exit point of the function. This represents abnormal termination, for instance the raising of an AssertionError.

Here's a simple example:

class SA {
	public static void main(String [] args) {
		System.out.println(new SimpleAssert().run());
	}
}
 
class SimpleAssert {
	public int run() {
		assert(5 < 3);
		return 42;
	}
}

Generating "assertions" for array reads and writes

For each array read and write in the control flow graph, add just before it what would be equivalent to an assertion checking that the index is within the bounds of the array. That is, if you have an array read a[i], add the equivalent to assert(i >= 0 && i < a.length); right before it, and similarly for array writes. Make sure you modify your ASTtoCFG.scala file in a way that is not too destructive (ie. set a Boolean flag indicating whether you should add these “assertions”, or better yet, write the code to add the checks in a separate file).

For example, here's some code and the corresponding CFGs, without and with the additional check:

class SA {
	public static void main(String [] args) {
		System.out.println(new SimpleArrayAssign().run());
	}
}
 
class SimpleArrayAssign {
	int [] arr;
 
	public int run() {
		arr = new int[4];
		arr[3] = 42;
		return 42;
	}
}

(Don't mind the extra vertex v3 in the first graph: our implementation always generates an extra vertex to collect assertion failures. You don't see it in the second graph because it gets “compacted” by the call to fewerSkips if they actually were assertions.)

Note that it is enough to have one error edge per array bound check (even if they are actually two checks done), since the positional information will be the same for both (the place in the source code where the index is passed).

Range analysis

Implement range analysis on top of your control-flow graphs. Use these files adapted from Lecture 11 to do that (note that you need to overwrite the file LabeledDirectedGraph.scala from last lab):

Your analysis should stabilize at a fix-point where for all integer variables you know at each program point some estimation of the range in which their value can be. Additionally, for each array you should keep trace of its possibles sizes in another range. You should probably take the following steps:

Collect all integer and integer array variables in your control flow graph (you can omit result# if it is of one of these types)
Define a new PointWiseRangeLattice, as in SignAnalysis.scala
Instantiate it with the collected variables
Write the appropriate transfer function to compute the effect of CFG statements

You should be conservative in your analysis: if you don't know anything about an integer, for instance because it was returned from a method, assume the worst case. For class members, you cannot do much either, unless you could know for sure that no method was called between a write and read to a given field (why?). You are allowed to make worst-case assumptions about values read from fields.

Emitting warnings

It should be clear that the property “an assertion can be violated” is equivalent to “the corresponding error node in the control-flow graph is reachable”. Using the information from the static analysis, traverse the control-flow graphs and emit a warning each time you are able to reach an error node. Use the positional information from that node to emit a warning such as “assertion potentially violated.”. You don't have to distinguish between the assertions you generated for the array bounds and the ones originally in the program, but in the first case, the positional information should point to the place in the source code corresponding to the array index expression.

You may realize now that your analysis can check more properties than just array access safety. If you write assert statements with reasonable test conditions, it should be able to check them too.