Labs 02: Lexer for Tool
This assignment is the first real part of the Tool compiler project. Make sure you have read the general project overview page first.
Lexer
Write the lexer for Tool. Here are some details you should pay attention to:
- Make sure you recognize keywords as their own token type. The keyword while, for instance, should be lexed as the token type WHILE, not as an identifier representing an object called “while”.
- Make sure you correctly register the position of all tokens. Note the Source.pos and the Positioned.setPos methods.
- In general, it is good to output as many errors as possible (this helps whoever uses your compiler). For instance, if your lexer encounters an invalid character, it can output an error message, skip it, and keep on lexing the rest of the input. After lexing, the compiler still won't proceed to the next phase, but this helps the user correct more than one error per compilation. Use the special BAD token type to mark errors and keep lexing as long as it is possible.
- The Lexer does not immediately read and return all tokens; it returns an Iterator[Token] that will be used by future phases to read tokens on demand.
- New: Comments should not produce tokens.
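The points in this list can be illustrated with a minimal, self-contained sketch. It does not use the course stubs: Kind, Token and LexerSketch are hypothetical stand-ins for the definitions in lexer/Tokens.scala, the line/column counters stand in for Source.pos and Positioned.setPos, and only a handful of token kinds are covered. It also assumes `//` line comments for the sake of the example and reads from a String rather than a file.

```scala
object LexerSketch {
  // Hypothetical, simplified token kinds; the real list lives in lexer/Tokens.scala.
  sealed trait Kind
  case object WHILE extends Kind
  case object IF extends Kind
  case object ID extends Kind
  case object INTLIT extends Kind
  case object LPAREN extends Kind
  case object RPAREN extends Kind
  case object BAD extends Kind
  case object EOF extends Kind

  // A token remembers its kind, its text and the 1-based line:column where it starts.
  final case class Token(kind: Kind, text: String, line: Int, col: Int)

  // Keywords are looked up only after a whole word has been read.
  private val keywords: Map[String, Kind] = Map("while" -> WHILE, "if" -> IF)

  // Tokens are produced on demand: nothing is lexed until next() is called.
  def tokens(input: String): Iterator[Token] = new Iterator[Token] {
    private var i = 0        // index of the next unread character
    private var line = 1
    private var col = 1
    private var done = false // becomes true once EOF has been handed out

    private def advance(): Unit = {
      if (input(i) == '\n') { line += 1; col = 1 } else col += 1
      i += 1
    }

    // Skip whitespace and `//` comments; neither produces a token.
    private def skipBlanks(): Unit = {
      var again = true
      while (again) {
        again = false
        while (i < input.length && input(i).isWhitespace) advance()
        if (input.startsWith("//", i)) {
          while (i < input.length && input(i) != '\n') advance()
          again = true
        }
      }
    }

    def hasNext: Boolean = !done

    def next(): Token = {
      if (done) throw new NoSuchElementException("next on exhausted lexer")
      skipBlanks()
      val (l, c) = (line, col) // position of the token's first character
      if (i >= input.length) { done = true; Token(EOF, "", l, c) }
      else {
        val start = i
        val ch = input(i)
        if (ch.isLetter) {
          while (i < input.length && (input(i).isLetterOrDigit || input(i) == '_')) advance()
          val word = input.substring(start, i)
          Token(keywords.getOrElse(word, ID), word, l, c)
        } else if (ch.isDigit) {
          while (i < input.length && input(i).isDigit) advance()
          Token(INTLIT, input.substring(start, i), l, c)
        } else {
          advance()
          val kind = ch match {
            case '(' => LPAREN
            case ')' => RPAREN
            case _   => BAD // invalid character: mark it and keep lexing
          }
          Token(kind, ch.toString, l, c)
        }
      }
    }
  }
}
```

With this sketch, `LexerSketch.tokens("while (x) #").map(_.kind).toList` evaluates to `List(WHILE, LPAREN, ID, RPAREN, BAD, EOF)`: the keyword gets its own kind, the invalid character becomes a BAD token without stopping the lexer, and EOF is the last token returned. A real lexer would additionally report each BAD token through the project's error reporting.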
Code Stubs
You can download the additional stubs from our main repository by following the instructions. Doing so should merge the following files into your project:
- lexer/Tokens.scala: list of token kinds and tokens.
- lexer/Lexer.scala: stub for the Lexer phase.
- Main.scala: will call your lexer and list the tokens.
Mind the package names. The structure of your project directory should be as follows:
toolc
├── Main.scala (updated)
│
├── lexer
│   ├── Lexer.scala (new)
│   └── Tokens.scala
│
└── utils
    ├── Context.scala
    ├── Positioned.scala
    ├── Reporter.scala
    └── Pipeline.scala
Example Output
For reference, here is a possible output for the factorial program:
OBJECT(1:1)
ID(Factorial)(1:8)
LBRACE(1:18)
DEF(2:5)
MAIN(2:9)
LPAREN(2:13)
RPAREN(2:14)
COLON(2:16)
UNIT(2:18)
EQSIGN(2:23)
LBRACE(2:25)
PRINTLN(3:9)
LPAREN(3:16)
NEW(3:17)
ID(Fact)(3:21)
LPAREN(3:25)
RPAREN(3:26)
DOT(3:27)
ID(computeFactorial)(3:28)
LPAREN(3:44)
INT(10)(3:45)
RPAREN(3:47)
RPAREN(3:48)
SEMICOLON(3:49)
RBRACE(4:5)
RBRACE(5:1)
CLASS(7:1)
ID(Fact)(7:7)
LBRACE(7:12)
DEF(8:5)
ID(computeFactorial)(8:9)
LPAREN(8:25)
ID(num)(8:26)
COLON(8:30)
INT(8:32)
RPAREN(8:35)
COLON(8:37)
INT(8:39)
EQSIGN(8:43)
LBRACE(8:45)
VAR(9:9)
ID(num_aux)(9:13)
COLON(9:21)
INT(9:23)
SEMICOLON(9:26)
IF(10:9)
LPAREN(10:12)
ID(num)(10:13)
LESSTHAN(10:17)
INT(1)(10:19)
RPAREN(10:20)
ID(num_aux)(11:13)
EQSIGN(11:21)
INT(1)(11:23)
SEMICOLON(11:24)
ELSE(12:9)
ID(num_aux)(13:13)
EQSIGN(13:21)
ID(num)(13:23)
TIMES(13:27)
LPAREN(13:29)
THIS(13:30)
DOT(13:34)
ID(computeFactorial)(13:35)
LPAREN(13:51)
ID(num)(13:52)
MINUS(13:56)
INT(1)(13:58)
RPAREN(13:59)
RPAREN(13:60)
SEMICOLON(13:61)
RETURN(14:9)
ID(num_aux)(14:16)
SEMICOLON(14:23)
RBRACE(15:5)
RBRACE(16:1)
EOF(16:2)
Note that the lexer emits an EOF token as its last token, and only after that does the Iterator.hasNext method return false.
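The listing format used in this example (token kind, optional value, then line:column) can be reproduced with a small sketch like the following. Tok and TokenListing are hypothetical names chosen for illustration and are not the classes from Tokens.scala; the sample tokens are copied from the factorial output.

```scala
object TokenListing {
  // Hypothetical stand-in for the project's token classes.
  final case class Tok(kind: String, value: Option[String], line: Int, col: Int) {
    // Renders e.g. "LBRACE(1:18)" or "ID(Factorial)(1:8)".
    def show: String = kind + value.map(v => s"($v)").getOrElse("") + s"($line:$col)"
  }

  // Pulls tokens one at a time until hasNext is false, i.e. just after EOF.
  def listing(tokens: Iterator[Tok]): List[String] = tokens.map(_.show).toList

  def main(args: Array[String]): Unit = {
    val sample = Iterator(
      Tok("OBJECT", None, 1, 1),
      Tok("ID", Some("Factorial"), 1, 8),
      Tok("LBRACE", None, 1, 18),
      Tok("EOF", None, 16, 2)
    )
    listing(sample).foreach(println)
  }
}
```

Because the consumer only relies on hasNext and next, it works unchanged whether the tokens come from a prepared list, as here, or from a lexer that reads its input lazily.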
You can also run the reference compiler with the --tokens flag to show the tokens of any Tool source file:
java -jar toolc-reference-?.?.jar --tokens <file.tool>
Deliverable
Please choose a commit from your git repository as a deliverable on our server before Tuesday, Oct. 15th, 11.59pm (23h59).