The Tool Compiler Project

The main project in this course consists in implementing toolc, a compiler for a small, Java-like, object-oriented programming language, which we call Tool. You will code all phases of a modern compiler, resulting in an implementation which will allow you to compile source files into Java bytecode, in the same way javac does.

One of the tasks in the first Tool lab is to write test cases, that is, sample Tool programs. We will use the input from all groups to constitute a corpus of benchmarks which you will be able to use to test the various phases of your compiler. While it is important that your compiler works on these examples, you should not limit your testing to these particular examples. It is important that you always try to think about the corner cases for each task. You are encouraged to write more examples than what is asked for the corpus, and to share them with your colleagues. Hopefully this will help every group to write a more robust compiler.

Reference Implementation

We have written a reference implementation for the toolc compiler. It compiles Tool programs to Java bytecode .class files that you can then run with java. You can download our compiler here. You can use it to compare the output with yours, or more generally to test that some code is a valid Tool program. Use it as follows:

java -jar toolc-reference.jar myprog.tool

By default, it outputs the class files in the same directory. Run them by typing:

java MainObj

…where MainObj is the name of the main object.

If you think something's wrong with our implementation, please let us know; you might very well be right.

Differences with Java and Scala

Although the syntax may be close to that of Scala or Java, you should really try to think of Tool as a different programming language. Never assume anything about its syntax or semantics if it's not properly described. If we forgot to specify something, ask in the forum, but don't assume it will follow Scala's or Java's way; it may or may not.

Class stubs and deliverables

For each toolc lab, we will give you some classes or class stubs. The goal is to make sure your project always has a structure that is easy to extend in future steps. We will in general only ask you to code the parts relevant to the specific phase you're asked to implement, and give you “glue-code” (interfaces, method stubs, etc.) to make sure everything fits together.

General purpose classes and code

This section describes some of the code that we give you and that applies to more than one phase. It will be extended as the project grows.

Mixin composition

You will write the code for the various phases in traits rather than in separate object or class files, and use mixin class composition (a harmless form of multiple inheritance) to merge them into your compiler. You can see how this works already in the first toolc lab, where our compiler stub combines the capabilities from the Reporter and the Lexer and adds a test method to them.

An advantage of this approach is that all the state information is contained in a single object. This makes it easy, for instance, to have different compilation units for different files, and is in general cleaner than resorting to global objects. Encapsulation is not violated, since private members of traits are not accessible from other parts of a composed object.
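The idea can be sketched in a few lines of Scala. Everything below is hypothetical and for illustration only: SketchReporting and SketchTokenizing are simplified stand-ins, not the Reporter and Lexer stubs handed out in the lab, and the whitespace-based tokenizer is just a placeholder.

```scala
// Hypothetical stand-in for a reporting phase (not the course's Reporter).
trait SketchReporting {
  private var errorCount = 0 // private: not accessible from the other traits

  def error(msg: String): Unit = {
    errorCount += 1
    Console.err.println("Error: " + msg)
  }

  def hasErrors: Boolean = errorCount > 0
}

// Hypothetical stand-in for a lexing phase: splits the input on whitespace.
trait SketchTokenizing {
  def tokenize(input: String): List[String] =
    input.split("\\s+").filter(_.nonEmpty).toList
}

// The compiler is a single class that mixes in every phase
// and adds a test method on top of them.
class SketchCompiler extends SketchReporting with SketchTokenizing {
  def test(input: String): Int = tokenize(input).length
}
```

Because all state lives in the composed object, two instances of SketchCompiler (one per source file, say) keep separate error counts: reporting an error on one leaves hasErrors false on the other.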

Reporter

The compiler will need to output information to the user besides the final binary files: warnings and error messages. We have written a Reporter trait for you that does just that. A potential drawback of the mixin approach is that the various traits that implement the methods for each compilation phase don't have access to a global instance of the Reporter trait.

Fortunately, we can work around this by using explicitly typed self references. Concretely, we can indicate in the various traits that the final, concrete class where they will be used will be of a certain type. This allows us to use methods from that type before the actual composition (in our case, this allows us to call methods from the Reporter trait from all phases). An example of this is given in our stub for the Lexer.
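As a rough illustration of explicitly typed self references, here is a minimal sketch. The names SelfTypedReporter, SelfTypedLexer and checkChar are made up for this example and are not the ones used in the lab stubs; only the self-type syntax itself is standard Scala.

```scala
// Hypothetical reporter that collects messages instead of printing them.
trait SelfTypedReporter {
  private var messages = List.empty[String]

  def error(msg: String): Unit = {
    messages = ("Error: " + msg) :: messages
  }

  def reported: List[String] = messages.reverse
}

// Hypothetical lexer phase.
trait SelfTypedLexer {
  // Explicitly typed self reference: any concrete class that mixes in
  // this trait must also be a SelfTypedReporter.
  self: SelfTypedReporter =>

  // error(...) comes from SelfTypedReporter, available through the self type.
  def checkChar(c: Char): Boolean =
    if (c.isLetterOrDigit || c.isWhitespace) true
    else { error("invalid character: " + c); false }
}

// The composition satisfies the self type, so this compiles.
class SelfTypedCompiler extends SelfTypedReporter with SelfTypedLexer
```

Trying to instantiate the lexer trait on its own, without mixing in the reporter, is rejected by the Scala compiler, which is exactly the guarantee the phases rely on.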

Positional objects

To get meaningful error messages, it is important to maintain the position of various objects (tokens, and later symbols in the syntax trees, for instance). We provide you with a Positional trait to do this. Note that the .setPos method always returns the object on which you called it, with its proper type. This lets you assign a position to an object when you create it and return it in the same statement. Positions are stored as a single integer, which encodes both the row and the column. This trick is also used in the scala.io.Position class. You don't have to know the encoding in detail; all you need to know is that scala.io.Source, which you will use to get a character stream from a file in your Lexer, uses that format (through source.pos). This has the additional advantage that we can also use scala.io.Source to display nice error messages (you'll see them; you don't have to do anything for that, it's all in Reporter).
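To make the two ideas above concrete (a setPos that keeps the caller's type, and a row/column pair packed into one integer), here is a small sketch. It is not the Positional trait we give you: the names SketchPos, SketchPositional and SketchToken are hypothetical, and the bit layout (11 low bits for the column) is an assumption chosen for the example, not a statement about the format scala.io.Source uses.

```scala
// Hypothetical encoding: the line in the high bits, the column in the low bits.
object SketchPos {
  private val ColumnBits = 11 // assumed width, for illustration only
  private val ColumnMask = (1 << ColumnBits) - 1

  def encode(line: Int, column: Int): Int =
    (line << ColumnBits) | (column & ColumnMask)

  def line(pos: Int): Int = pos >>> ColumnBits
  def column(pos: Int): Int = pos & ColumnMask
}

// Hypothetical positional trait.
trait SketchPositional {
  private var encodedPos: Int = 0

  // Returning this.type means the caller gets back its own precise type,
  // not just SketchPositional.
  def setPos(p: Int): this.type = { encodedPos = p; this }

  def line: Int = SketchPos.line(encodedPos)
  def column: Int = SketchPos.column(encodedPos)
}

// Hypothetical token class carrying a position.
case class SketchToken(text: String) extends SketchPositional
```

With these definitions, a token can be created and positioned in one expression that still has type SketchToken:

```scala
val tok: SketchToken = SketchToken("while").setPos(SketchPos.encode(3, 7))
```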