LARA

Labs 03: Parser

This week and the next one, you'll work on the second part of the Tool compiler project. Your goal is to manually implement a recursive-descent parser to transform programs described by the Tool grammar into Abstract Syntax Trees. You also need to write a pretty-printer for these trees. This assignment is rather long and we can only recommend that you start early, that you make sure you understand every step, and that you ask otherwise.

One way to partially check your implementation is to use the pretty-printer to output the parsed AST (reproducing the input file from the trees, if you want). If you then re-parse this output and re-pretty-print it, you should obtain exactly the same result. In other words: for all input programs $P$, you should have that the identity $print(parse(P)) = print(parse(print(parse(P))))$ holds. While this condition is necessary to ensure you have properly written your parser and printer, it is not sufficient, and you are as usual responsible for checking the details of your implementation before you submit it.

As testcases, you can use the ones that you and the other students in this class wrote. As usual, your compiler needs to work on these examples, but this may not be sufficient. Some corner cases may not be covered in those. Also, note that these are all valid programs, yet the parser's role is also to reject invalid programs!

Parser

Write a recursive-descent Parser for Tool by manually writing mutually recursive procedures, as sketched in the lecture on recursive descent parsing, and generating the appropriate Abstract Syntax Trees. The stub we give you hints at how to implement this.

Notes

  • It's not allowed in Tool to have a variable with a name that matches a keyword. You will never have a variable or a class called length, for instance. (Although that would be allowed in Java or Scala, for instance.)
  • You need to properly encode the operator precedence as in Java and Scala. From highest priority to lowest: !, then * and /, then + and -, then < and ==, then &&, then ||. Except for the unary operators, all operators are left-associative. There is no precedence between operators of the same level. For instance: $4 / 3 * 6$ reads as $((4 / 3) * 6)$ and $6 * 4 / 3$ as $((6 * 4) / 3$.
  • An else keyword always applies to the closest if. You can add comments to mark the end of if blocks in your pretty-printer to make sure you parse things properly.
if(expr1) if(expr2) a=1; else a=2;

…could be reprinted as:

if(expr1) { if(expr2) { a=1; } else { a=2; } }

…and should not produce:

if(expr1) { if(expr2) { a=1; } } else { a=2; }
  • You have to write the parser manually, as a series of mutually recursive functions. You are not allowed to use parser combinators or other such libraries.
  • You need to set the position to the trees. Normally, the position of the tree corresponding to “42 + 23” is the position of the '4'.

Pretty-printer

Write a pretty-printer for the AST hierarchy you defined in the previous step. Your pretty-printer should transform any valid AST into the corresponding textual representation in the Tool programming language. The output does not need to be exactly as the input file; whitespaces can be placed differently, comments will disappear and parentheses can be added (or removed), as long as the program keeps the original intended meaning. Re-parsing the pretty-printed output and re-printing it should yield an identical result, though.

Note that if you don't print any parenthesis, you will pass printing checks. However, the trees parsed in the two phases will look very differently indicating that you printed them incorrectly. For example, parsing:

 (a + 3) * 4   // Times(Plus(a, 3), 4)

and printing it (incorrectly) as

 a + 3 * 4

which should get reparsed as

 a + (3 * 4)  // Plus(a, Times(3, 4))

and reprinted (correctly) as

 a + 3 * 4

will pass the checks even though it is incorrect. You should thus print as many parentheses as necessary to ensure that the parsed trees are the same.

Stubs

You can download the stubs from our main repository by following the instructions. It should merge the following files in your project:

  • ast/Parser.scala contains a stub for the LL parser.
  • ast/Printer.scala contains a stub for the pretty-printer. Note that applying the printer to a tree returns a string and prints out no output directly.
  • Main.scala contains the updated pipeline construction.

As usual, mind the package names. The structure of your project src directory should be as following:

toolc
  ├── Main.scala            (updated this week)
  │
  ├── lexer
  │    ├── Lexer.scala      
  │    └── Tokens.scala     
  │
  ├── eval                     
  │    └── Evaluator.scala 
  │ 
  ├── ast
  │    ├── Parser.scala     (stub given this week)
  │    ├── Printer.scala    (stub given this week)
  │    └── Trees.scala
  │
  └── utils
       ├── Context.scala
       ├── Positioned.scala
       ├── Reporter.scala
       └── Pipeline.scala

Deliverable

Please choose a commit from your git repository as a deliverable on our server before Sunday, Nov. 09th, 11.59pm (23h59).