MiniJava+ Compiler Project
The main project in this course consists of implementing minijavac, a compiler for a (very slightly) extended version of MiniJava, which we call MiniJava+. You will code all phases of a modern compiler, resulting in an implementation that compiles source files into Java bytecode, in the same way javac does. Since we are dealing with a strict subset of Java, you will also be able to compare your implementation directly to javac, and use javac to check the results of your compiler.
One of the tasks in the first MiniJava+ lab is to write test cases, that is, sample MiniJava+ programs. We will use the input from all groups to constitute a corpus of benchmarks which you will be able to use to test the various phases of your compiler. While it is important that your compiler works on these examples, you should not limit your testing to these particular examples. It is important that you always try to think about the corner cases for each task. You are encouraged to write more examples than what is asked for the corpus, and to share them with your colleagues. Hopefully this will help every group to write a more robust compiler.
Differences with MiniJava
- Integer divisions
- == comparison operator, || boolean operator
- Block comments may not be nested. While this would be a little more interesting to implement (how would you do it?), the language would no longer be a strict subset of Java, and we don't want this to happen.
- Strings: constant strings, concatenation of strings + strings and strings + ints, passing strings as arguments, etc.
  - We do not require you to allow method calls from the String class ("this is a string".length(), etc.)
  - We do not (yet?) specify the semantics of string comparison using ==
- The else branch of the if construct is optional.
Class stubs and deliverables
For each minijavac lab, we will give you some classes or class stubs. The goal is to make sure your project always has a structure that is easy to extend in future steps. We will in general only ask you to code the parts relevant to the specific phase you're asked to implement, and give you "glue code" (interfaces, method stubs, etc.) to make sure everything fits together.
General purpose classes and code
This section describes some of the code that we give you and that applies to more than one phase. It will be extended as the project grows.
Mixin composition
You will write the code for the various phases in traits rather than in separate object or class files, and use mixin class composition (a harmless form of multiple inheritance) to merge them into your compiler. You can see how this works already in the first minijavac lab, where our compiler stub combines the capabilities from the Reporter and the Lexer and adds a test method to them.
An advantage of this approach is that all the state information is contained in a single object. This makes it easy, for instance, to have different compilation units for different files, and is in general cleaner than resorting to global objects. Encapsulation is not violated, since private members of traits are not accessible from other parts of a composed object.
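The composition described above can be sketched as follows. This is a minimal illustration, not the course stub: the bodies of Reporter and Lexer, and the names error, hasErrors, tokenize and MixinDemo, are assumptions made up for the example.

```scala
// Hypothetical stand-ins for the course stubs; the real traits differ.
trait Reporter {
  // Private state: not accessible from other traits in the composition.
  private var errorCount = 0

  def error(msg: String): Unit = {
    errorCount += 1
    Console.err.println("Error: " + msg)
  }

  def hasErrors: Boolean = errorCount > 0
}

trait Lexer {
  // Toy tokenizer that splits on whitespace; a real lexer does far more.
  def tokenize(source: String): List[String] =
    source.split("\\s+").filter(_.nonEmpty).toList
}

// Mixin composition: one class gains the members of both traits
// and adds a test method on top of them.
class Compiler extends Reporter with Lexer {
  def test(source: String): Int = tokenize(source).length
}

object MixinDemo {
  def main(args: Array[String]): Unit = {
    // Each instance carries its own state, so two compilation
    // units do not share an error count.
    val unitA = new Compiler
    val unitB = new Compiler
    unitA.error("unexpected token")
    println(unitA.hasErrors)           // true
    println(unitB.hasErrors)           // false
    println(unitA.test("class A { }")) // 4
  }
}
```

Because errorCount is private to Reporter, code in Lexer or Compiler can only reach it through error and hasErrors, which is the encapsulation property mentioned above.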
Reporter
The compiler will need to output information to the user besides the final binary files: warnings and error messages. We have written a Reporter trait for you that does just that. A potential drawback of the mixin approach is that the various traits that implement the methods for each compilation phase don't have access to a global instance of the Reporter trait.
Fortunately, we can work around this by using explicitly typed self references. Concretely, we can indicate in the various traits that the final, concrete class where they will be used will be of a certain type. This allows us to use methods from that type before the actual composition (in our case, this allows us to call methods from the Reporter trait from all phases). An example of this is given in our stub for the Lexer.
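A minimal sketch of an explicitly typed self reference, assuming a simplified Reporter. The members error and errors, the method checkChar and the object SelfTypeDemo are hypothetical and are not taken from the provided Lexer stub.

```scala
// Simplified, hypothetical Reporter that just collects messages.
trait Reporter {
  private var messages = List.empty[String]

  def error(msg: String): Unit = { messages = msg :: messages }

  // Messages in the order they were reported.
  def errors: List[String] = messages.reverse
}

trait Lexer {
  // Explicitly typed self reference: any class that mixes in Lexer
  // must also be a Reporter, so Reporter's methods can be called here.
  self: Reporter =>

  // Hypothetical check: report anything that is not a letter,
  // a digit or whitespace.
  def checkChar(c: Char): Unit =
    if (!c.isLetterOrDigit && !c.isWhitespace) error("illegal character: " + c)
}

// Compiles because Compiler is both a Reporter and a Lexer.
// `new Lexer {}` alone would be rejected by the type checker.
class Compiler extends Reporter with Lexer

object SelfTypeDemo {
  def main(args: Array[String]): Unit = {
    val c = new Compiler
    c.checkChar('a')
    c.checkChar('#')
    println(c.errors) // List(illegal character: #)
  }
}
```

The self type is a requirement rather than inheritance: Lexer does not extend Reporter, it only promises that the final composed object will be one.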
Positional objects
To get meaningful error messages, it is important to maintain the position of various objects (tokens, and later symbols in the syntax trees, for instance). We provide you with a Positional trait to do this. Note that the .setPos method always returns the object on which you called it, with its proper type. This can be useful to directly assign a position to an object when you create it, and return it in the same statement.
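One way a setPos method can return its receiver "with its proper type" is Scala's this.type. The sketch below assumes a line/column signature for setPos and invents posString, Identifier and PositionalDemo for illustration; the provided Positional trait may store and expose positions differently.

```scala
// Hypothetical version of Positional; the provided trait may differ.
trait Positional {
  private var line: Int = 0
  private var col: Int = 0

  // Returning this.type (rather than Positional) preserves the
  // static type of the object the method was called on.
  def setPos(l: Int, c: Int): this.type = {
    line = l
    col = c
    this
  }

  def posString: String = s"$line:$col"
}

// Hypothetical token class used only for this example.
case class Identifier(name: String) extends Positional

object PositionalDemo {
  def main(args: Array[String]): Unit = {
    // Create the object and set its position in one expression;
    // the result is still typed as Identifier, so no cast is needed.
    val id: Identifier = Identifier("x").setPos(3, 14)
    println(id.name + " at " + id.posString) // x at 3:14
  }
}
```

Had setPos been declared to return Positional, the assignment to a val of type Identifier would not type-check.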