Lecture 02: Lexical Analysis
Key insights:
- a lexical analyzer maps a stream of characters into a stream of tokens
- while doing that, it typically needs only finite memory
- we can specify tokens for a lexical analyzer using regular expressions
- it is not difficult to construct a lexical analyzer manually; we give an example in the hand-written scanner section below
- in such a case, we often use the first character to decide on the token class; formally, first(L) = { a | aw in L for some word w }. For example, if L is the language of identifiers letter (letter | digit)*, then first(L) is the set of letters
- it is also possible to automate the construction of lexical analyzers; the starting point of this construction is the conversion of regular expressions into deterministic finite automata (see the DFA sketch after this list)
- we follow the maximal munch rule: the lexical analyzer should always eagerly accept the longest token that it can recognize from the current position
- tools that automate this construction are part of compiler-compilers such as JavaCC
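The sketch below illustrates both the regex-to-DFA point and the maximal munch rule. The DFA is hand-translated from the regular expression letter (letter | digit)* for identifiers; the class and method names (DfaSketch, maximalMunch) are illustrative, not taken from JavaCC or any other tool.

```java
// Minimal sketch: running a DFA (hand-translated from the regular
// expression letter (letter | digit)* for identifiers) with maximal munch.
// States: 0 = start, 1 = accepting (seen at least one letter), -1 = dead.
final class DfaSketch {
    static int step(int state, char c) {
        boolean letter = Character.isLetter(c);
        boolean digit  = Character.isDigit(c);
        switch (state) {
            case 0:  return letter ? 1 : -1;             // first char must be a letter
            case 1:  return (letter || digit) ? 1 : -1;  // then letters or digits
            default: return -1;                          // dead state stays dead
        }
    }

    // Returns the length of the longest identifier starting at 'from',
    // or 0 if none: run the DFA and remember the last accepting position.
    static int maximalMunch(String input, int from) {
        int state = 0, lastAccept = 0;
        for (int i = from; i < input.length() && state != -1; i++) {
            state = step(state, input.charAt(i));
            if (state == 1) lastAccept = i - from + 1;   // state 1 is accepting
        }
        return lastAccept;
    }

    public static void main(String[] args) {
        System.out.println(maximalMunch("x42+y", 0));    // prints 3 ("x42")
    }
}
```

A generated scanner works the same way, except that the transition function is a table produced automatically by the regular-expression-to-DFA construction.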
Hand-Written Scanner for the While Language
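A minimal sketch of such a scanner is shown below. It assumes a While language with keywords if/then/else/while/do/skip, identifiers, integer literals, and a few operators (:=, <=, <, +, ;); the exact token set in the lecture may differ. The scanner dispatches on the first character of each token and applies maximal munch, e.g. preferring <= over <.

```java
import java.util.*;

// Minimal hand-written scanner sketch for a While-style language.
// The token set (keywords, operators) is assumed for illustration and
// may differ from the lecture's exact While language definition.
final class WhileScanner {
    private final String in;
    private int pos = 0;
    private static final Set<String> KEYWORDS =
        Set.of("if", "then", "else", "while", "do", "skip");

    WhileScanner(String in) { this.in = in; }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    // Dispatch on the first character (the first(L) idea), then
    // eagerly consume the longest possible token (maximal munch).
    String nextToken() {
        while (Character.isWhitespace(peek())) pos++;      // skip whitespace
        if (pos >= in.length()) return null;               // end of input
        char c = peek();
        if (Character.isLetter(c)) {                       // identifier or keyword
            int start = pos;
            while (Character.isLetterOrDigit(peek())) pos++;
            String word = in.substring(start, pos);
            return (KEYWORDS.contains(word) ? "KEYWORD(" : "ID(") + word + ")";
        }
        if (Character.isDigit(c)) {                        // integer literal
            int start = pos;
            while (Character.isDigit(peek())) pos++;
            return "INT(" + in.substring(start, pos) + ")";
        }
        switch (c) {
            case ':':                                      // ':=' or error
                pos++;
                if (peek() == '=') { pos++; return "ASSIGN"; }
                throw new IllegalStateException("expected '=' after ':'");
            case '<':                                      // maximal munch: '<=' before '<'
                pos++;
                if (peek() == '=') { pos++; return "LEQ"; }
                return "LT";
            case '+': pos++; return "PLUS";
            case ';': pos++; return "SEMI";
            default:
                throw new IllegalStateException("unexpected character: " + c);
        }
    }

    public static void main(String[] args) {
        WhileScanner s = new WhileScanner("while x <= 10 do x := x + 1");
        for (String t = s.nextToken(); t != null; t = s.nextToken())
            System.out.println(t);
    }
}
```

Note how each branch needs only the current character and a bounded amount of state, matching the finite-memory observation above.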
Background on regular languages and automata:
- Regular Languages and Finite Automata, lecture notes by Andrew M. Pitts
References
- Tiger book (A. W. Appel, Modern Compiler Implementation), Chapters 1-2
- Slides from previous years
- Compiler Construction by Niklaus Wirth, Chapters 1-3