The Tool Compiler Project
The main project in this course consists of implementing toolc, a compiler for a small, Java-like, object-oriented programming language, which we call Tool. You will code all phases of a modern compiler, resulting in an implementation which will allow you to compile source files into Java bytecode, in the same way
The first task in the first Tool lab is to write test cases, that is, sample Tool programs. We will use the input from all groups to build a corpus of benchmarks which you will be able to use to test the various phases of your compiler. While it is important that your compiler works on these examples, you should not limit your testing to them. Always try to think about the corner cases for each task. You are encouraged to write more examples than the corpus requires, and to share them with your colleagues. Hopefully this will help every group to write a more robust compiler.
We have written a reference implementation for the toolc compiler. It compiles Tool programs to Java bytecode .class files that you can then run with java. You can download our compiler here. You can use it to compare the output with yours, or more generally to test that some code is a valid Tool program. Use it as follows:
java -jar toolc-reference-*.*.jar program.tool
By default, it outputs the class files in the same directory. Run them by typing:
java MainObj
where MainObj is the name of the main object.
If you think something's wrong with our implementation, please let us know. You might very well be right.
Differences with Java and Scala
Although the syntax may be close to that of Scala or Java, you should really try to think of Tool as a different programming language. Never assume anything about its syntax or semantics if it's not properly described. If we forgot to specify something, ask in the forum, but don't assume it will follow Scala's or Java's way: it may or may not.
Class stubs and deliverables
For each toolc lab, we will give you some classes or class stubs. The goal is to make sure your project always has a structure that is easy to extend in future steps. In general, we will only ask you to code the parts relevant to the specific phase you're asked to implement, and give you “glue-code” (interfaces, method stubs, etc.) to make sure everything fits together.
General purpose classes and code
This section describes some of the code that we give you and that applies to more than one phase. It will be extended as the project grows.
Main Project Components
The compilation context represents state that is available throughout the entire compilation process. It will store compilation options (if any), as well as a Reporter instance (see Reporter).
The compiler pipeline will assemble the different phases of compilation. It will be responsible for chaining the phases and passing state from one phase to the next. At the end of your project, your compiler pipeline will at least contain the following phases:
Name Analysis → Type Checking → Backend (bytecode generation)
Pipelines are defined by chaining Pipeline nodes. Each node has an input type F and an output type T, and exposes a method that turns an input of type F into an output of type T.
For instance, the Lexer takes a File as input and returns an Iterator[Token]. It is thus defined as a Pipeline[File, Iterator[Token]].
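To make the chaining idea concrete, here is a minimal sketch in Scala. It is not the code we hand out: the method names run and andThen, the simplified Context, and the two toy stages (Words and Count, which work on strings rather than files) are all assumptions made for illustration.

```scala
// Hypothetical compilation context; the real one also carries a reporter.
final case class Context(options: Map[String, String] = Map.empty)

// A pipeline node with input type F and output type T.
abstract class Pipeline[F, T] { self =>
  // The name `run` and this signature are assumptions of the sketch.
  def run(ctx: Context)(v: F): T

  // Chain two nodes: the output of this node feeds the next one.
  def andThen[G](next: Pipeline[T, G]): Pipeline[F, G] =
    new Pipeline[F, G] {
      def run(ctx: Context)(v: F): G = next.run(ctx)(self.run(ctx)(v))
    }
}

// Toy stand-in for a lexer: splits a string on whitespace.
object Words extends Pipeline[String, Iterator[String]] {
  def run(ctx: Context)(v: String): Iterator[String] =
    v.split("\\s+").iterator.filter(_.nonEmpty)
}

// Toy stand-in for a later phase: counts the tokens it receives.
object Count extends Pipeline[Iterator[String], Int] {
  def run(ctx: Context)(v: Iterator[String]): Int = v.size
}

// The types line up: String => Iterator[String] => Int.
val toyCompiler: Pipeline[String, Int] = Words.andThen(Count)
```

Because the output type of one node must equal the input type of the next, the Scala type checker rejects a pipeline whose phases are assembled in the wrong order.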
The compiler will need to output information to the user besides the final binary files: warnings and error messages. We have written a Reporter class for you that does just that.
The reporter remembers whether it has had to emit an error. This allows you to put milestones in your pipeline that abort the compilation if errors have occurred (e.g. after having discovered multiple parse errors and reported them to the user, you can skip all the following phases by stopping the compilation process).
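The following sketch shows one way such a reporter could remember errors and act as a milestone. It is only an illustration: the method names (warning, error, hasErrors, terminateIfErrors) and the CompilationAborted exception are assumptions, and the class we provide may stop the compilation differently.

```scala
// Hypothetical exception used by this sketch to stop the compilation.
class CompilationAborted(msg: String) extends RuntimeException(msg)

// Hypothetical reporter; method names are assumptions, not the course API.
class Reporter {
  // Set to true as soon as one error has been reported.
  private var errorSeen = false

  def warning(msg: String): Unit =
    Console.err.println(s"Warning: $msg")

  def error(msg: String): Unit = {
    errorSeen = true
    Console.err.println(s"Error: $msg")
  }

  def hasErrors: Boolean = errorSeen

  // Milestone: abort if any error has been reported so far.
  def terminateIfErrors(): Unit =
    if (errorSeen) throw new CompilationAborted("errors were reported")
}
```

A phase can keep reporting errors as it finds them, and the pipeline calls terminateIfErrors() between phases so that the user sees all errors of one phase before compilation stops.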
To get meaningful error messages, it is important to maintain the position of various objects (tokens, and later symbols in the syntax trees, for instance). We provide you with a Positioned trait to do this. Note that the .setPos method always returns the object on which you called it, with its proper type. This can be useful to directly assign a position to an object when you create it, and return it in the same statement. The Reporter class will use this position information (when available) to display nice error messages with the relevant line of source code.
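As an illustration of why the return type of setPos matters, here is a minimal sketch. The fields, the (line, col) signature, and the Token class are assumptions made for this example; the trait we provide may store positions differently.

```scala
// Hypothetical sketch of a position-carrying trait.
trait Positioned {
  private var _line: Int = -1
  private var _col: Int = -1

  def line: Int = _line
  def col: Int = _col
  def hasPos: Boolean = _line >= 0

  // Returning this.type keeps the static type of the receiver,
  // so a Token stays a Token after the call.
  def setPos(line: Int, col: Int): this.type = {
    _line = line
    _col = col
    this
  }
}

// Hypothetical token class for the example.
final case class Token(text: String) extends Positioned

// Create a token and attach its position in a single expression;
// the result is still typed as Token, not merely as Positioned.
def makeToken(text: String, line: Int, col: Int): Token =
  Token(text).setPos(line, col)
```

If setPos returned Positioned instead of this.type, makeToken would not type-check without a cast.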