LARA

Labs 02: Lexer for Tool

This assignment is the first real part of the Tool compiler project. Make sure you have read the general project overview page first.

Lexer

Write the lexer for Tool. Here are some details you should pay attention to:

  • Make sure you recognize keywords as their own token types. For instance, the keyword while should be lexed as a token of type WHILE, not as an identifier named “while”.
  • Make sure you correctly record the position of every token. The Source.pos and Positioned.setPos methods are useful for this.
  • In general, it is good to report as many errors as possible, since this helps whoever uses your compiler. For instance, if your lexer encounters an invalid character, it can output an error message, skip the character, and continue lexing the rest of the input. The compiler will still not proceed to the next phase after lexing, but the user can then correct more than one error per compilation. Use the special BAD token type to mark errors, and keep lexing for as long as possible.
  • The lexer does not read and return all tokens at once. Instead, it returns an Iterator[Token] that later phases use to read tokens on demand.
  • New: Comments should not produce tokens.
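The points above can be illustrated with a minimal, self-contained sketch. It is not based on the provided stubs. The object MiniLexer, its Kind and Token types, and the handful of token kinds it knows are hypothetical stand-ins, and it handles only // line comments and parentheses. It shows the following behaviour:

  • Keywords are recognized by looking up a fully read word in a table.
  • Each token records the line and column of its first character.
  • An invalid character yields a BAD token, and lexing then continues.
  • Comments are skipped without producing tokens.
  • Tokens are produced lazily through an Iterator that ends with EOF.

```scala
// Hypothetical sketch: MiniLexer, Kind and Token are illustrative stand-ins,
// not the classes from the provided lexer/Tokens.scala stub.
object MiniLexer {
  sealed trait Kind
  case object WHILE extends Kind
  case object IF extends Kind
  case object ELSE extends Kind
  case object ID extends Kind
  case object INTLIT extends Kind
  case object LPAREN extends Kind
  case object RPAREN extends Kind
  case object BAD extends Kind
  case object EOF extends Kind

  // Kind, source text, and the 1-based line and column of the first character.
  final case class Token(kind: Kind, text: String, line: Int, col: Int)

  // Keywords are found by looking up a fully read word, so "whileLoop" stays an ID.
  private val keywords: Map[String, Kind] =
    Map("while" -> WHILE, "if" -> IF, "else" -> ELSE)

  private val End: Char = 0.toChar // returned by peek past the end of the input
  private def isLetter(ch: Char): Boolean = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
  private def isDigit(ch: Char): Boolean = ch >= '0' && ch <= '9'

  def lex(src: String): Iterator[Token] = new Iterator[Token] {
    private var i = 0
    private var line = 1
    private var col = 1
    private var done = false // set once EOF has been returned

    private def peek(k: Int = 0): Char = if (i + k < src.length) src.charAt(i + k) else End

    private def advance(): Char = {
      val ch = src.charAt(i)
      i += 1
      if (ch == '\n') { line += 1; col = 1 } else col += 1
      ch
    }

    // Skips whitespace and // comments; comments never produce tokens.
    private def skipTrivia(): Unit = {
      var skipped = true
      while (skipped) {
        skipped = false
        while (peek().isWhitespace) { advance(); skipped = true }
        if (peek() == '/' && peek(1) == '/') {
          while (i < src.length && peek() != '\n') advance()
          skipped = true
        }
      }
    }

    def hasNext: Boolean = !done

    def next(): Token = {
      if (done) throw new NoSuchElementException("EOF was already returned")
      skipTrivia()
      val (l, c) = (line, col) // position of the token's first character
      if (i >= src.length) { done = true; Token(EOF, "", l, c) }
      else if (isLetter(peek())) {
        val sb = new StringBuilder
        while (isLetter(peek()) || isDigit(peek()) || peek() == '_') sb += advance()
        Token(keywords.getOrElse(sb.toString, ID), sb.toString, l, c)
      } else if (isDigit(peek())) {
        val sb = new StringBuilder
        while (isDigit(peek())) sb += advance()
        Token(INTLIT, sb.toString, l, c)
      } else advance() match {
        case '(' => Token(LPAREN, "(", l, c)
        case ')' => Token(RPAREN, ")", l, c)
        // Invalid character: emit BAD (a real lexer would also report an error) and go on.
        case other => Token(BAD, other.toString, l, c)
      }
    }
  }
}
```

Because the keyword lookup happens only after the whole word has been read, an identifier such as whileLoop is lexed as a single ID rather than as WHILE followed by another identifier.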

Code Stubs

You can download the additional stubs from our main repository by following the instructions. This should merge the following files into your project:

  • lexer/Tokens.scala: list of token kinds and tokens.
  • lexer/Lexer.scala: stub for the Lexer phase.
  • Main.scala: calls your lexer and lists the tokens.

Mind the package names. The structure of your project directory should be as follows:

toolc
 ├── Main.scala                (updated)
 │
 ├── lexer                     
 │    ├── Lexer.scala          (new)
 │    └── Tokens.scala         
 │
 └── utils
      ├── Context.scala
      ├── Positioned.scala
      ├── Reporter.scala
      └── Pipeline.scala

Example Output

For reference, here is a possible output for the factorial program:

  OBJECT(1:1) ID(Factorial)(1:8) LBRACE(1:18) DEF(2:5) MAIN(2:9) LPAREN(2:13) RPAREN(2:14) COLON(2:16) 
  UNIT(2:18) EQSIGN(2:23) LBRACE(2:25) PRINTLN(3:9) LPAREN(3:16) NEW(3:17) ID(Fact)(3:21) LPAREN(3:25) 
  RPAREN(3:26) DOT(3:27) ID(computeFactorial)(3:28) LPAREN(3:44) INT(10)(3:45) RPAREN(3:47) RPAREN(3:48) 
  SEMICOLON(3:49) RBRACE(4:5) RBRACE(5:1) CLASS(7:1) ID(Fact)(7:7) LBRACE(7:12) DEF(8:5) 
  ID(computeFactorial)(8:9) LPAREN(8:25) ID(num)(8:26) COLON(8:30) INT(8:32) RPAREN(8:35) COLON(8:37) 
  INT(8:39) EQSIGN(8:43) LBRACE(8:45) VAR(9:9) ID(num_aux)(9:13) COLON(9:21) INT(9:23) SEMICOLON(9:26) 
  IF(10:9) LPAREN(10:12) ID(num)(10:13) LESSTHAN(10:17) INT(1)(10:19) RPAREN(10:20) 
  ID(num_aux)(11:13) EQSIGN(11:21) INT(1)(11:23) SEMICOLON(11:24) ELSE(12:9) ID(num_aux)(13:13) 
  EQSIGN(13:21) ID(num)(13:23) TIMES(13:27) LPAREN(13:29) THIS(13:30) DOT(13:34) 
  ID(computeFactorial)(13:35) LPAREN(13:51) ID(num)(13:52) MINUS(13:56) INT(1)(13:58) RPAREN(13:59) 
  RPAREN(13:60) SEMICOLON(13:61) RETURN(14:9) ID(num_aux)(14:16) SEMICOLON(14:23) RBRACE(15:5) 
  RBRACE(16:1) EOF(16:2)
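In the sample above, tokens print as KIND(line:col). Tokens that carry a value add it in parentheses before the position. For example, ID(Factorial)(1:8) is an identifier and INT(10)(3:45) is the integer literal 10, while INT(8:32) is the Int type keyword. The following is a minimal sketch of that printing convention. It assumes a hypothetical Tok case class with a string kind name and an optional value; the stub's own Token classes and their toString may differ.

```scala
// Hypothetical sketch: Tok is an illustrative stand-in, not the stub's Token class.
object TokenFormat {
  // Kind name, an optional value (identifier name or literal), and a 1-based position.
  final case class Tok(kind: String, value: Option[String], line: Int, col: Int)

  // KIND(line:col) for plain tokens, KIND(value)(line:col) for tokens carrying a value.
  def show(t: Tok): String = {
    val payload = t.value.map(v => s"($v)").getOrElse("")
    s"${t.kind}$payload(${t.line}:${t.col})"
  }

  // Tokens are separated by single spaces, as in the sample output.
  def showAll(ts: Seq[Tok]): String = ts.map(show).mkString(" ")
}
```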

Note that the lexer emits an EOF token as its last token; only after EOF has been returned does Iterator.hasNext return false.
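This EOF convention can be met even if your lexer is built around a function that reads one token at a time and keeps returning EOF once the input is exhausted. The sketch below wraps such a function in an Iterator. The names EofIterator, tokens, readToken and isEof are hypothetical and are not part of the provided stubs.

```scala
// Hypothetical sketch: readToken is assumed to keep returning an EOF token
// once the input is exhausted; isEof recognizes that token.
object EofIterator {
  def tokens[T](readToken: () => T, isEof: T => Boolean): Iterator[T] =
    new Iterator[T] {
      private var reachedEof = false

      // Stays true until the EOF token itself has been handed out.
      def hasNext: Boolean = !reachedEof

      def next(): T = {
        if (reachedEof) throw new NoSuchElementException("EOF was already returned")
        val t = readToken()
        if (isEof(t)) reachedEof = true // EOF is still returned to the caller
        t
      }
    }
}
```

Because hasNext only becomes false after EOF has been returned, the consumer sees exactly one EOF token at the end of the stream.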

You can also use the reference compiler with the flag --tokens

java -jar toolc-reference-?.?.jar --tokens <file.tool>

to show the tokens of any Tool source file.

Deliverable

Please choose a commit from your git repository as a deliverable on our server before Tuesday, Oct. 15th, 11.59pm (23h59).