LARA

Labs 04

This week and the next one, you'll work on the second part of the MiniJava+ Compiler Project. Your goal is to manually implement a recursive-descent parser to transform programs described by the MiniJava+ grammar into Abstract Syntax Trees. You also need to write a pretty-printer for these trees. This assignment is rather long and we can only recommend that you start early, that you make sure you understand every step and that you ask otherwise.

One way to partially check your implementation is to use the pretty-printer to output the parsed AST (reproducing the input file from the trees, if you want). If you then re-parse this output and re-pretty-print it, you should obtain exactly the same result. In other “words”: for all input programs $P$, you should have $print(parse(P)) = print(parse(print(parse(P))))$. While this condition is necessary to ensure you have properly written your parser and printer, it is not sufficient, and you are as usual responsible for checking the details of your implementation before you submit it.

Parser

Define a class hierarchy to represent nodes in the Abstract Syntax Trees. You should at least have individual nodes corresponding to each production in the grammar, and you can have more if you feel they can be useful. You can refer yourself to the AST structure for the while language from the first lab for an example.

Write a recusive-descent Parser for MiniJava+ by manually writing mutually recursive procedures, as sketched in Recursive Descent Parsing and generating the appropriate Abstract Syntax Trees. You will need a LL(2) parser for the MiniJava+ grammar (why?). The stub we give you hints at how to implement this.

Notes

  • You don't have to recognize length, out, println and main as identifiers. Although they are not keywords in Java and you could in theory call a class length, we will consider that this is forbidden in MiniJava+. These words should only appear in the places where they are used in the grammar. In other words, consider them to be reserved keywords of MiniJava+.
  • You need to properly encode the operator precedence as in Java. From highest priority to lowest: !, then * and /, then + and -, then < and ==, then &&, then ||. Except for the unary operators, all operators are left-associative. There is no precedence between operators of the same level. For instance: 4 / 3 * 6 reads as ((4 / 3) * 6) and 6 * 4 / 3 as ((6 * 4) / 3. (edit: there is no unary minus in MiniJava+)
  • An else keyword always applies to the closest if. You can add comments to mark the end of if blocks in your pretty-printer to make sure you parse things properly.
if(expr1) if(expr2) a=1; else a=2;

…could be reprinted as:

if(expr1) if(expr2) a=1; else a=2; /* end if */ /* end if */

…and should not produce:

if(expr1) if(expr2) a=1; /* end if */ else a=2; /* end if */

Pretty-printer

Write a pretty-printer for the AST hierarchy you defined in the previous step. Your pretty-printer should transform any valid AST into the corresponding textual representation in the MiniJava+ programming language. You can have a look at the pretty-printer for the while language from the first lab if you need inspiration. The output does not need to be exactly as the input file; whitespaces can be placed differently, comments will naturally disappear and parentheses can be added (or removed), as long as the program keeps the original intended meaning. Re-parsing the output and re-printing it should yield an identical result.

Stubs and files

You can use the following stubs:

  • Tokens.scala is an extension of the file from the previous lab. Note that we renamed the trait TokenKind to the more appropriate and less confusing TokenInfo. We also added a new trait TokenClass which accounts for the fact that two identifiers, while containing different information (the respective textual representations of the identifiers), are really from the same class (since they're both identifiers). This new information, which is attached to every token and is accessible through the .tokenClass method, will be useful in the parser.
  • Trees.scala contains a stub for the AST hierarchy. Note that tree nodes are Positional objects and you will need to set their position appropriately by copying the information from the relevant tokens.
  • Parser.scala contains a stub for the LL(2) parser.
  • TreePrinter.scala contains a stub for the pretty-printer. Note that applying the printer to a tree returns a string and prints out no output directly.
  • Main.scala contains the main program. It calls the parser on the input file and uses the pretty-printer as expected. The last few lines re-feed that result to a new instance of the parser and check that the two outputs match. It raises an exception otherwise. You have nothing to change in that file.
  • Compiler.scala combines the Reporter and Parser traits to form the compiler. Note that it doesn't need to explicitly refer to Lexer, since Parser already extends Lexer. You have nothing to change in that file.

As usual, mind the package names. The structure of your project should be as following:

src
 └── minijava
      ├── Compiler.scala        (given this week)
      ├── Main.scala            (given this week)
      ├── Positional.scala      (given in Lab 02)
      ├── Reporter.scala        (given in Lab 02)
      ├── TreePrinter.scala     (stub given this week)
      │
      ├── lexer
      │    ├── Lexer.scala      (adapt your work from Lab 02 to the new Tokens.scala)
      │    └── Tokens.scala     (merge stub from this week with your work from Lab 02)
      │
      └── parser
           ├── Parser.scala     (stub given this week)
           └── Trees.scala      (stub given this week)

Deliverables

Submit an archive of your src directory (containing only .scala files, no binaries, please) through Moodle before Wednesday, Oct. 22nd, 8.15am.