Lab 06: Code Generation
Congratulations, your front-end is complete! You are now one step away from completing your compiler. The last step of the compiler is to generate binary code for the Java Virtual Machine. This week's lab description is short, but it is important that you study the documentation of the support library before you begin.
Support library
Generating class files is a tedious and not so interesting task. Therefore you will use a library for this, called Cafebabe.
Cafebabe provides an interface to generate class files and add fields and methods to them. To generate method code, a CodeHandler provides an interface to add JVM opcodes. In addition to the concrete opcodes, which map 1-to-1 to JVM opcodes, there are some abstract opcodes, which hide some details of the underlying JVM opcodes and help you generate code more easily.
For details please read the Cafebabe documentation pages.
If in doubt, compile an example with our reference compiler and run javap -c on the .class file to see what we did.
Concatenating and printing strings
To concatenate and print strings, you will need to call methods from the Java library, as there are no bytecodes that do this directly. You only need to handle printing booleans, integers, and strings. Conveniently, System.out.println(…) in the Java library is overloaded for all of these cases.
To call System.out.println(…), you invoke the println(…) overload whose signature matches the type of the value you are printing, on the object stored in the static field System.out (of type java.io.PrintStream). Concretely, emit a GETSTATIC bytecode to load System.out, then the code for the expression you are printing, then an INVOKEVIRTUAL bytecode.
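The three-step sequence can be sketched as follows. This is only an illustration, assuming Cafebabe is on the classpath; the helper name `emitPrintln` and its parameters are hypothetical, and the caller is assumed to know the JVM descriptor of the printed value.

```scala
import cafebabe._
import AbstractByteCodes._

object PrintlnSketch {
  // Hypothetical helper: emits the code for println(<expr>).
  // `descriptor` is the JVM type of the printed value:
  // "I" for Int, "Z" for Boolean, "Ljava/lang/String;" for String.
  // `emitExpr` must push exactly one value of that type.
  def emitPrintln(ch: CodeHandler, descriptor: String)(emitExpr: CodeHandler => Unit): Unit = {
    // 1. Load the PrintStream stored in the static field System.out
    ch << GetStatic("java/lang/System", "out", "Ljava/io/PrintStream;")
    // 2. Push the value to print
    emitExpr(ch)
    // 3. Call the println overload matching the value's type
    ch << InvokeVirtual("java/io/PrintStream", "println", s"($descriptor)V")
  }
}
```

For instance, `PrintlnSketch.emitPrintln(ch, "I")(c => c << Ldc(7))` would emit the code for printing the integer 7.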
Concatenation is done using java.lang.StringBuilder: create a new StringBuilder instance, append to it each value you want to concatenate, then call toString() on the builder. The append method is overloaded for strings and integers, and you are not asked to concatenate any other type.
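As a rough sketch of this procedure, the code below builds the string "x = 42" from a string constant and an integer constant. It assumes Cafebabe is on the classpath and that `DefaultNew` calls the no-argument constructor, as it does for `Foo` in the example further down; the object and method names are hypothetical.

```scala
import cafebabe._
import AbstractByteCodes._

object ConcatSketch {
  private val SB = "java/lang/StringBuilder"

  // Hypothetical example: leaves the String "x = " + 42 on the stack.
  def emitConcatExample(ch: CodeHandler): Unit = {
    ch << DefaultNew(SB)                 // stack: builder
    ch << Ldc("x = ")                    // stack: builder, "x = "
    // append returns the builder itself, so it stays on the stack
    ch << InvokeVirtual(SB, "append", "(Ljava/lang/String;)Ljava/lang/StringBuilder;")
    ch << Ldc(42)                        // stack: builder, 42
    ch << InvokeVirtual(SB, "append", "(I)Ljava/lang/StringBuilder;")
    ch << InvokeVirtual(SB, "toString", "()Ljava/lang/String;") // stack: String
  }
}
```

In a real code generator the two `Ldc` lines would be replaced by the code emitted for the left and right operands, with the `append` signature chosen from each operand's type.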
Equality
We will handle equality in a simple way:
- integers and booleans are compared by value
- other types (strings, arrays and objects) are compared by reference. In the case of strings, the result may or may not be the same as calling .equals(…), depending on whether the strings are constant. (In other words, don't do anything special for equality.)
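One possible way to emit these two kinds of comparison is sketched below. It assumes Cafebabe is on the classpath, that both operands are already on the stack, and that booleans are represented as the integers 0 and 1; the helper name and the `comparedByValue` flag are hypothetical.

```scala
import cafebabe._
import AbstractByteCodes._
import ByteCodes._

object EqualitySketch {
  // Hypothetical helper: expects both operands on the stack and leaves
  // 1 (true) or 0 (false). `comparedByValue` is true for Int and Boolean
  // operands, false for strings, arrays and objects.
  def emitEquals(ch: CodeHandler, comparedByValue: Boolean): Unit = {
    val equal = ch.getFreshLabel("equal")
    val after = ch.getFreshLabel("afterEq")
    if (comparedByValue) ch << If_ICmpEq(equal) // compare two ints
    else ch << If_ACmpEq(equal)                 // compare two references
    ch << ICONST_0                              // not equal: push false
    ch << Goto(after)
    ch << Label(equal)
    ch << ICONST_1                              // equal: push true
    ch << Label(after)
  }
}
```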
Boolean || and && operators
You have to apply lazy evaluation (short-circuiting) to these two boolean operators. For example, make sure your compiled code for an expression such as (true || (1/0 == 1)) doesn't crash (this is the example expression that we give in the code).
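A minimal sketch of short-circuiting with conditional jumps is shown below, assuming Cafebabe is on the classpath and booleans are represented as the integers 0 and 1. The helper names and the `emitLhs`/`emitRhs` callbacks are hypothetical; each callback must push exactly one boolean.

```scala
import cafebabe._
import AbstractByteCodes._
import ByteCodes._

object ShortCircuitSketch {
  // lhs || rhs: rhs is only evaluated when lhs is false.
  def emitOr(ch: CodeHandler)(emitLhs: CodeHandler => Unit, emitRhs: CodeHandler => Unit): Unit = {
    val done = ch.getFreshLabel("orDone")
    emitLhs(ch)       // stack: lhs
    ch << DUP         // stack: lhs, lhs
    ch << IfNe(done)  // lhs is true: jump, keeping lhs as the result
    ch << POP         // lhs is false: discard it
    emitRhs(ch)       // stack: rhs, which is the result
    ch << Label(done)
  }

  // lhs && rhs: rhs is only evaluated when lhs is true.
  def emitAnd(ch: CodeHandler)(emitLhs: CodeHandler => Unit, emitRhs: CodeHandler => Unit): Unit = {
    val done = ch.getFreshLabel("andDone")
    emitLhs(ch)
    ch << DUP
    ch << IfEq(done)  // lhs is false: jump, keeping lhs as the result
    ch << POP
    emitRhs(ch)
    ch << Label(done)
  }
}
```

On both paths exactly one value is left on the stack at the final label, which is what keeps the stack height consistent when the two branches meet.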
Type-specific bytecode names in the JVM
Some bytecodes contain a letter indicating the type of operands they apply to (e.g. ILOAD, which loads an integer, IF_ACMPEQ, which compares two references for equality, or LRETURN, which returns a long value). The naming convention is as follows:
Letter | Corresponding type |
---|---|
I | Integer/Boolean |
L | Long |
D | Double |
F | Float |
A | Reference (object or array) |
Note that returning from a void method is done using the RETURN bytecode (no prefix letter).
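To make the convention concrete, here is a small self-contained sketch that derives the prefix letter, and from it the name of the return bytecode, from a JVM type descriptor. The object and function names are hypothetical, and only the types listed in the table above are handled.

```scala
object OpcodePrefixSketch {
  // Maps a JVM type descriptor to the opcode prefix letter from the table.
  def prefixFor(descriptor: String): Char = descriptor match {
    case "I" | "Z" => 'I' // int and boolean share the integer opcodes
    case "J"       => 'L' // long
    case "D"       => 'D' // double
    case "F"       => 'F' // float
    case d if d.startsWith("L") || d.startsWith("[") => 'A' // object or array reference
    case other =>
      throw new IllegalArgumentException(s"Unsupported descriptor: $other")
  }

  // Name of the bytecode that returns a value of the given type.
  // A void method ("V") uses RETURN with no prefix letter.
  def returnOpcode(descriptor: String): String =
    if (descriptor == "V") "RETURN" else s"${prefixFor(descriptor)}RETURN"
}
```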
Type names in the JVM
To see how you can refer to a type in the JVM (e.g. for method signatures), consult this page.
Note that the leading L (and the trailing ;) of a class type is not needed when we indicate the class on which we invoke a method (e.g. the first argument of the InvokeVirtual abstract bytecode).
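The sketch below shows one way to compute these type names. It is self-contained and does not use the compiler's real type representation: the `ToolType` hierarchy is a hypothetical stand-in, and it assumes integer arrays are the only array type.

```scala
object DescriptorSketch {
  // Hypothetical stand-in for the compiler's own type representation.
  sealed trait ToolType
  case object TInt extends ToolType
  case object TBoolean extends ToolType
  case object TString extends ToolType
  case object TIntArray extends ToolType
  final case class TClass(name: String) extends ToolType

  // JVM descriptor of a type, as used in field types and method signatures.
  def descriptor(t: ToolType): String = t match {
    case TInt         => "I"
    case TBoolean     => "Z"
    case TString      => "Ljava/lang/String;"
    case TIntArray    => "[I"
    case TClass(name) => s"L$name;"
  }

  // Method signature: argument descriptors in parentheses, then the return type.
  def methodSignature(args: List[ToolType], ret: ToolType): String =
    args.map(descriptor).mkString("(", "", ")") + descriptor(ret)
}
```

With these definitions, a method `def bar(i: Int): Int` gets the signature `(I)I`, while the class on which it is invoked is written simply as `Foo`, not `LFoo;`.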
Your task
Complete the stub for CodeGeneration.scala such that your compiler emits class files. The generated class files should be executable on the Java Virtual Machine: you should be able to run the main class with java and get the same result as with our reference compiler for valid Tool programs. (If you think the reference compiler is doing something wrong, let us know.)
Your compiler should generate class files (one per class, as in Java, and one for the main object) silently if the compilation is successful, or generate errors and warnings from the previous phases in case there was a problem. The code generation phase should not produce any error. You should store all the relevant meta information in the class files: line numbers and source file identification (see the Cafebabe doc).
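As a hedged illustration of the meta information, the sketch below records a source file name and a line number. It assumes that Cafebabe offers `setSourceFile` on class files and a `LineNumber` abstract bytecode; check the Cafebabe documentation for the exact names. The file name and line number are made up for the example.

```scala
import cafebabe._
import AbstractByteCodes._
import ByteCodes._

object MetaInfoSketch {
  // Builds a class Foo with a method bar(i: Int): Int that returns i.
  def build(): ClassFile = {
    val cf = new ClassFile("Foo", None)
    cf.setSourceFile("Foo.tool") // hypothetical source file name
    cf.addDefaultConstructor
    val ch = cf.addMethod("I", "bar", List("I")).codeHandler
    ch << LineNumber(3)          // hypothetical line of the return statement
    ch << ILoad(1)               // slot 0 is 'this', slot 1 is the argument
    ch << IRETURN
    ch.freeze
    cf
  }
}
```

In your compiler, the file name and line numbers would come from the positions attached to the trees by the earlier phases.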
Example of code generation
```scala
package example

import cafebabe._
import AbstractByteCodes._
import ByteCodes._

object CodegenExample {

  /* We will translate the code:
   * program Main {
   *   println(new Foo().bar(1));
   * }
   *
   * class Foo {
   *   var x: Int;
   *   def bar(i: Int): Int = {
   *     x = 42;
   *     return (x + i);
   *   }
   * }
   *
   * Note: We are not adding source file names or line numbers,
   * which you should do in your code
   */

  def codegenFoo() {
    val cf = new ClassFile("Foo", None)
    cf.addDefaultConstructor
    cf.addField("I", "x")
    val ch = cf.addMethod("I", "bar", List("I")).codeHandler

    // x = 42
    ch << ALoad(0)                   // push "this" into the stack
    ch << Ldc(42)                    // push 42 into the stack
    ch << PutField("Foo", "x", "I")  // Now both are popped from the stack

    // x + i
    ch << ALoad(0)
    ch << GetField("Foo", "x", "I")  // Stack is value of "x"
    ch << ILoad(1)                   // Load an integer from the 1st slot (0 is 'this')
    ch << IADD

    // return an Int
    ch << IRETURN

    ch.freeze

    try {
      cf.writeToFile("out/Foo.class")
    } catch {
      case io: java.io.IOException => sys.error("Failed to write file!")
    }
  }

  def codegenMain() {
    val cf = new ClassFile("Main", None)
    cf.addDefaultConstructor
    val ch = cf.addMainMethod.codeHandler

    // Find java.lang.System.out
    ch << GetStatic("java/lang/System", "out", "Ljava/io/PrintStream;")
    // stack is now java.lang.System.out

    // Initialize a Foo
    ch << DefaultNew("Foo")                    // Stack: out, Foo
    ch << Ldc(1)                               // Stack: out, Foo, 1
    ch << InvokeVirtual("Foo", "bar", "(I)I")  // stack: out, bar(1)
    ch << InvokeVirtual("java/io/PrintStream", "println", "(I)V")
    ch << RETURN

    ch.freeze

    try {
      cf.writeToFile("out/Main.class")
    } catch {
      case io: java.io.IOException => sys.error("Failed to write file!")
    }
  }

  def main(args: Array[String]) = {
    val dir = new java.io.File("out")
    if (!dir.exists()) {
      dir.mkdir()
    }
    codegenFoo()
    codegenMain()
  }
}
```
Stubs
As usual, merge the stubs for this lab into your main branch by typing
```
git fetch --all
git merge origin/Lab06
```
(assuming that origin is the name of your remote repository).
This should merge the following files into your project. If you get conflicts, don't panic: they should be relatively easy to resolve.
```
toolc
├── Main.scala (updated this week)
│
├── code
│   └── CodeGeneration.scala (new)
│
├── lexer
│   ├── Lexer.scala
│   └── Tokens.scala
│
├── eval
│   └── Evaluator.scala
│
├── analyzer
│   ├── Symbols.scala
│   ├── NameAnalysis.scala
│   ├── TypeChecking.scala
│   └── Types.scala
│
├── ast
│   ├── ASTConstructor.scala
│   ├── ASTConstructorLL1.scala
│   ├── Parser.scala
│   ├── Printer.scala
│   └── Trees.scala
│
└── utils
    ├── Context.scala (updated this week)
    ├── Positioned.scala
    ├── Reporter.scala
    └── Pipeline.scala
```
Deliverables
As usual, please choose a commit from your git repository as a deliverable on our server before Saturday, Dec. 24th, 11.59pm (23h59).
References
- The complete JVM specification (may be hard to digest)
- opcodes arranged conveniently, in particular: by function