The Tool Compiler Project
The main project in this course consists of implementing toolc, a compiler for a small, Java-like, object-oriented programming language that we call Tool. You will code all phases of a modern compiler, resulting in an implementation that compiles source files into Java bytecode, in the same way javac does.
The first task in the first Tool lab is to write test cases, that is, sample Tool programs. We will use the input from all groups to constitute a corpus of benchmarks which you will be able to use to test the various phases of your compiler. While it is important that your compiler works on these examples, you should not limit your testing to these particular examples. It is important that you always try to think about the corner cases for each task. You are encouraged to write more examples than what is asked for the corpus, and to share them with your colleagues. Hopefully this will help every group to write a more robust compiler.
Reference Implementation
We have written a reference implementation of the toolc compiler. It compiles Tool programs to Java bytecode .class files that you can then run with java. You can download our compiler here. You can use it to compare its output with yours, or more generally to check that some code is a valid Tool program. Use it as follows:
java -jar toolc-reference-*.*.jar program.tool
By default, it outputs the class files in the same directory. Run them by typing:
java MainObj
…where MainObj is the name of the main object.
If you think something's wrong with our implementation, please let us know; you might very well be right.
Differences with Java and Scala
Although the syntax may be close to that of Scala or Java, you should really try to think of Tool as a different programming language. Never assume anything about its syntax or semantics if it's not properly described. If we forgot to specify something, ask in the forum, but don't assume it will follow Scala's or Java's way; it may or may not.
Class stubs and deliverables
For each toolc lab, we will give you some classes or class stubs. The goal is to make sure your project always has a structure that is easy to extend in future steps. We will in general only ask you to code the parts relevant to the specific phase you're asked to implement, and give you “glue-code” (interfaces, method stubs, etc.) to make sure everything fits together.
General purpose classes and code
This section describes some of the code that we give you and that applies to more than one phase. It will be extended as the project grows.
Main Project Components
Context
The compilation context represents state that is available throughout the entire compilation process. It will store compilation options (if any), as well as a reporter (see Reporter) instance.
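To make this concrete, here is a minimal sketch of what such a context could look like. It is not the class we hand out: the field names files and outDir, the -d flag, and the tiny Reporter stand-in are all assumptions made for illustration.

```scala
import java.io.File

// Hypothetical stand-in for the provided Reporter; only what this sketch needs.
class Reporter {
  private var errorCount = 0
  def error(msg: String): Unit = { errorCount += 1; Console.err.println("Error: " + msg) }
  def hasErrors: Boolean = errorCount > 0
}

// Assumed shape: one immutable value that every phase receives.
final case class Context(
  reporter: Reporter,
  files: List[File] = Nil,      // source files to compile (assumed option)
  outDir: Option[File] = None   // where to write .class files (assumed option)
)

object ContextSketch {
  // Builds a context from command-line arguments.
  // "-d <dir>" is a hypothetical flag; every other argument is treated as a source file.
  def fromArgs(args: List[String]): Context = {
    @annotation.tailrec
    def loop(rest: List[String], ctx: Context): Context = rest match {
      case "-d" :: dir :: tail => loop(tail, ctx.copy(outDir = Some(new File(dir))))
      case file :: tail        => loop(tail, ctx.copy(files = ctx.files :+ new File(file)))
      case Nil                 => ctx
    }
    loop(args, Context(new Reporter))
  }
}
```

Keeping the context immutable means a phase cannot silently change the options seen by later phases; only the reporter carries mutable state.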
Pipeline
The compiler pipeline will assemble the different phases of compilation. It will be responsible for chaining the phases and passing state from one phase to the next. At the end of your project, your compiler pipeline will contain at least the following phases:
Lexer
→ Parser
→ Name Analysis
→ Type Checking
→ Backend (bytecode generation)
Pipelines are defined by chaining Pipeline nodes. Each node has an input type F and an output type T, and exposes a method run from F to T.
For instance, the Lexer takes a File as input and returns an Iterator[Token]. It is thus defined as a Pipeline[File, Iterator[Token]].
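The chaining described above can be sketched as follows. This is an illustration under assumptions, not the code we provide: the andThen combinator, the curried run(ctx)(input) signature, the empty Context, and the two toy phases are hypothetical. The toy lexer also takes a String rather than a File, so that the example runs without touching the file system.

```scala
// Hypothetical minimal context; the real one also carries a reporter.
final case class Context(options: Map[String, String] = Map.empty)

// A pipeline node from input type F to output type T.
abstract class Pipeline[F, T] { self =>
  def run(ctx: Context)(input: F): T

  // Chains two nodes: the output of this node becomes the input of `next`.
  def andThen[G](next: Pipeline[T, G]): Pipeline[F, G] = new Pipeline[F, G] {
    def run(ctx: Context)(input: F): G = next.run(ctx)(self.run(ctx)(input))
  }
}

// Toy "lexer": splits the source on whitespace and drops empty pieces.
object ToyLexer extends Pipeline[String, Iterator[String]] {
  def run(ctx: Context)(source: String): Iterator[String] =
    source.split("\\s+").iterator.filter(_.nonEmpty)
}

// Toy follow-up phase: consumes the token stream and counts the tokens.
object TokenCounter extends Pipeline[Iterator[String], Int] {
  def run(ctx: Context)(tokens: Iterator[String]): Int = tokens.size
}

object PipelineSketch {
  // The types only line up if the output of one phase matches the input of the next.
  val countTokens: Pipeline[String, Int] = ToyLexer andThen TokenCounter
}
```

Because each node states its input and output types, the Scala type checker rejects a chain whose phases are placed in the wrong order.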
Reporter
The compiler will need to output information to the user besides the final binary files: warnings and error messages. We have written a Reporter class for you that does just that.
The reporter remembers whenever it had to emit an error. This allows you to put milestones in your pipeline that abort the compilation whenever errors occurred (e.g. after having discovered multiple parse errors and reported them to the user, you can skip all the following phases by stopping the compilation process).
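A milestone of this kind could be sketched as below. The method names (error, hasErrors, terminateIfErrors), the message format, and the CompilationAborted exception are assumptions for illustration; the Reporter class we give you may differ, for instance by exiting the process instead of throwing.

```scala
// Thrown by the milestone in this sketch (hypothetical; chosen so the example is easy to test).
final class CompilationAborted(msg: String) extends RuntimeException(msg)

// Illustrative reporter: it prints messages and remembers whether an error was emitted.
class Reporter {
  private var errorSeen = false

  def warning(msg: String): Unit = Console.err.println("Warning: " + msg)

  def error(msg: String): Unit = {
    errorSeen = true
    Console.err.println("Error: " + msg)
  }

  def hasErrors: Boolean = errorSeen

  // Milestone: aborts the compilation if any error was reported so far.
  def terminateIfErrors(): Unit =
    if (hasErrors) throw new CompilationAborted("errors were reported; aborting")
}

object ReporterSketch {
  // Returns true if compilation got past the parsing milestone.
  def compile(parseErrors: List[String]): Boolean = {
    val reporter = new Reporter
    try {
      parseErrors.foreach(reporter.error) // report every parse error first
      reporter.terminateIfErrors()        // then stop before the later phases
      true
    } catch {
      case _: CompilationAborted => false
    }
  }
}
```

Reporting all errors of a phase before checking the milestone lets the user fix several problems per compiler run instead of one.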
Positioned objects
To get meaningful error messages, it is important to maintain the position of various objects (tokens, and later symbols in the syntax trees, for instance). We provide you with a Positioned trait to do this. Note that the .setPos method always returns the object on which you called it, with its proper type. This can be useful to directly assign a position to an object when you create it, and return it in the same statement. The Reporter class will use this position information (when available) to display nice error messages with the relevant line of source code.
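The "returns the object with its proper type" behaviour can be obtained in Scala with a this.type result type. The sketch below shows the idea only: the line/column representation, the position helper, and the IntLit class are assumptions and not the trait we provide.

```scala
// Illustrative version of a Positioned trait (fields and helpers are assumed).
trait Positioned {
  private var _line: Int = 0
  private var _col: Int = 0

  def line: Int = _line
  def col: Int = _col
  def hasPos: Boolean = _line > 0

  // Returning this.type preserves the static type of the receiver,
  // so the result can be used where the concrete class is expected.
  def setPos(line: Int, col: Int): this.type = {
    _line = line
    _col = col
    this
  }

  def position: String = if (hasPos) s"$line:$col" else "?:?"
}

// Hypothetical tree node that carries a position.
final case class IntLit(value: Int) extends Positioned

object PositionedSketch {
  // Creates the node, assigns its position, and returns it in one expression.
  // This type-checks because setPos yields an IntLit here, not a bare Positioned.
  def mkLit(value: Int, line: Int, col: Int): IntLit =
    IntLit(value).setPos(line, col)
}
```

Had setPos been declared to return Positioned, mkLit would need a cast to get an IntLit back.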