LARA

The Tool Compiler Project

The main project in this course consists of implementing toolc, a compiler for a small, Java-like, object-oriented programming language, which we call Tool. You will code all phases of a modern compiler, resulting in an implementation that compiles source files into Java bytecode, in the same way javac does.

The first task in the first Tool lab is to write test cases, that is, sample Tool programs. We will use the input from all groups to constitute a corpus of benchmarks which you will be able to use to test the various phases of your compiler. While it is important that your compiler works on these examples, you should not limit your testing to these particular examples. It is important that you always try to think about the corner cases for each task. You are encouraged to write more examples than what is asked for the corpus, and to share them with your colleagues. Hopefully this will help every group to write a more robust compiler.

Reference Implementation

We have written a reference implementation for the toolc compiler. It compiles Tool programs to Java bytecode .class files that you can then run with java. You can download our compiler here. You can use it to compare the output with yours, or more generally to test that some code is a valid Tool program. Use it as follows:

java -jar toolc-reference-*.*.jar program.tool

By default, it outputs the class files in the same directory. Run them by typing:

java MainObj

…where MainObj is the name of the main object.

If you think something's wrong with our implementation, please let us know; you might very well be right.

Differences with Java and Scala

Although the syntax may be close to that of Scala or Java, you should really try to think of Tool as a different programming language. Never assume anything about its syntax or semantics if it's not properly described. If we forgot to specify something, ask in the forum, but don't assume it will follow Scala's or Java's way; it may or may not.

Class stubs and deliverables

For each toolc lab, we will give you some classes or class stubs. The goal is to make sure your project always has a structure that is easy to extend in future steps. We will in general only ask you to code the parts relevant to the specific phase you're asked to implement, and give you “glue-code” (interfaces, method stubs, etc.) to make sure everything fits together.

General purpose classes and code

This section describes some of the code that we give you and that applies to more than one phase. It will be extended as the project grows.

Main Project Components

Context

The compilation context represents state that is available throughout the entire compilation process. It will store compilation options (if any), as well as a reporter (see Reporter) instance.

Pipeline

The compiler pipeline will assemble the different phases of compilation. It will be responsible for chaining the phases and passing state from one phase to the next. At the end of your project, your compiler pipeline will contain at least the following phases:

1. Lexer
2. Parser
3. Name Analysis
4. Type Checking
5. Backend (bytecode generation)

Pipelines are defined by chaining Pipeline nodes. Each node has an input type F and an output type T, and exposes a method run from F to T. For instance, the Lexer takes a File as input and returns an Iterator[Token]. It is thus defined as a Pipeline[File, Iterator[Token]].
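To make the chaining idea concrete, here is a minimal sketch of what such a node could look like. It is not the Pipeline class handed out with the labs, whose run method may take additional arguments; the andThen combinator and the two toy phases (SplitWords, CountWords) are assumptions made up for illustration.

```scala
// Hypothetical sketch of a pipeline node; the class given in the labs may differ.
abstract class Pipeline[F, T] { self =>
  // Transforms an input of type F into an output of type T.
  def run(v: F): T

  // Chains two nodes: the output of this node becomes the input of `next`.
  def andThen[G](next: Pipeline[T, G]): Pipeline[F, G] = new Pipeline[F, G] {
    def run(v: F): G = next.run(self.run(v))
  }
}

object PipelineDemo {
  // Toy phase standing in for a lexer: splits a string on whitespace.
  object SplitWords extends Pipeline[String, Iterator[String]] {
    def run(v: String): Iterator[String] =
      v.split("\\s+").iterator.filter(_.nonEmpty)
  }

  // Toy phase consuming the previous phase's output.
  object CountWords extends Pipeline[Iterator[String], Int] {
    def run(v: Iterator[String]): Int = v.size
  }

  // The types line up: String => Iterator[String] => Int.
  val wordCount: Pipeline[String, Int] = SplitWords andThen CountWords
}
```

Because each node declares its input and output types, the Scala type checker rejects a chain whose phases do not fit together, for example CountWords andThen SplitWords.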

Reporter

The compiler will need to output information to the user besides the final binary files: warnings and error messages. We have written a Reporter class for you that does just that.

The reporter remembers whenever it had to emit an error. This allows you to put milestones in your pipeline that abort the compilation whenever errors occurred (e.g. after having discovered multiple parse errors and reported them to the user, you can skip all the following phases by stopping the compilation process).
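The following sketch shows one way a reporter could remember errors and support such milestones. It is an illustration only: the class name SketchReporter and the methods warning, error, hasErrors and terminateIfErrors are assumptions, and the Reporter class we give you may expose different names and signatures.

```scala
// Hypothetical sketch; the Reporter provided with the labs may differ.
class SketchReporter {
  // Number of errors emitted so far.
  private var errorCount = 0

  // Warnings are printed but do not count as errors.
  def warning(msg: String): Unit =
    Console.err.println(s"Warning: $msg")

  // Errors are printed and remembered.
  def error(msg: String): Unit = {
    errorCount += 1
    Console.err.println(s"Error: $msg")
  }

  // True as soon as at least one error has been emitted.
  def hasErrors: Boolean = errorCount > 0

  // Milestone: stops the compiler if any error was reported earlier.
  def terminateIfErrors(): Unit =
    if (hasErrors) {
      Console.err.println(s"$errorCount error(s) found, aborting.")
      sys.exit(1)
    }
}
```

A phase can then report every problem it finds with error and keep going, and the pipeline calls terminateIfErrors() once between phases instead of aborting at the first problem.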

Positioned objects

To get meaningful error messages, it is important to maintain the position of various objects (tokens, and later symbols in the syntax trees, for instance). We provide you with a Positioned trait to do this. Note that the .setPos method always returns the object on which you called it, with its proper type. This can be useful to directly assign a position to an object when you create it, and return it in the same statement. The Reporter class will use this position information (when available) to display nice error messages with the relevant line of source code.
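As an illustration of the setPos behaviour described above, here is a minimal sketch in which the return type this.type preserves the static type of the receiver. The line and column fields, the two-argument setPos signature and the Identifier class are assumptions for this example; the Positioned trait we provide may store positions differently.

```scala
// Hypothetical sketch; the Positioned trait provided with the labs may differ.
trait Positioned {
  private var _line: Int = 0
  private var _col: Int = 0

  // Returns the receiver itself, typed as this.type, so calls can be chained.
  def setPos(line: Int, col: Int): this.type = {
    _line = line
    _col = col
    this
  }

  def line: Int = _line
  def col: Int = _col
}

object PositionedDemo {
  // Made-up tree node used only for this example.
  case class Identifier(name: String) extends Positioned

  // Create the object and assign its position in a single expression;
  // the result is still an Identifier, not a plain Positioned.
  val id: Identifier = Identifier("x").setPos(3, 14)
}
```

Had setPos been declared to return Positioned instead, the last definition would not type-check without a cast.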