Lecture 02: Lexical Analysis

Partial Slides in PDF

Key insights:

  • a lexical analyzer maps a stream of characters into a stream of tokens
    • while doing that, it typically needs only finite memory
  • we can specify the tokens for a lexical analyzer using regular expressions
  • it is not difficult to construct a lexical analyzer manually; we give an example
  • in such a case, we often use the first character to decide on the token class; this relies on the notion first(L) = { a | aw in L for some word w }, the set of characters that can begin a word of L
  • it is also possible to automate the construction of lexical analyzers; the starting point of this construction is a conversion of regular expressions to deterministic automata
  • we follow the maximal munch rule: the lexical analyzer should always eagerly accept the longest token that it can recognize from the current point
  • tools that automate this construction are part of compiler-compilers such as JavaCC
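
The following is a minimal sketch of a hand-written lexical analyzer that illustrates two of the points above: dispatching on the first character to pick a token class, and applying maximal munch. It is not the scanner from the lecture. The token classes, the keyword set, the operator table, and the names Token, KEYWORDS, OPERATORS and tokenize are all assumptions made for illustration.

```python
import string
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str   # token class, e.g. "ID", "INT", "LEQ" (hypothetical names)
    text: str   # the characters that were matched

# Assumed keyword set for a small While-like language.
KEYWORDS = {"while", "if", "else"}

# Assumed operators, grouped by first character. Within each group the
# longer candidate comes first, so that "<=" is preferred over "<".
OPERATORS = {
    "<": [("<=", "LEQ"), ("<", "LT")],
    "=": [("==", "EQ"), ("=", "ASSIGN")],
    "+": [("+", "PLUS")],
    "(": [("(", "LPAREN")],
    ")": [(")", "RPAREN")],
    ";": [(";", "SEMI")],
}

LETTERS = string.ascii_letters + "_"
DIGITS = string.digits

def tokenize(src: str) -> Iterator[Token]:
    """Yield tokens from src, always taking the longest match."""
    i, n = 0, len(src)
    while i < n:
        c = src[i]  # the first character decides the token class
        if c.isspace():
            i += 1  # whitespace separates tokens and is skipped
        elif c in LETTERS:
            # c is in first(identifier): consume letters and digits greedily
            j = i + 1
            while j < n and (src[j] in LETTERS or src[j] in DIGITS):
                j += 1
            word = src[i:j]
            yield Token("KEYWORD" if word in KEYWORDS else "ID", word)
            i = j
        elif c in DIGITS:
            # c is in first(integer literal): consume digits greedily
            j = i + 1
            while j < n and src[j] in DIGITS:
                j += 1
            yield Token("INT", src[i:j])
            i = j
        elif c in OPERATORS:
            # try the longest operator starting with c first
            for text, kind in OPERATORS[c]:
                if src.startswith(text, i):
                    yield Token(kind, text)
                    i += len(text)
                    break
        else:
            raise ValueError(f"unexpected character {c!r} at position {i}")
```

For example, list(tokenize("x <= 10")) produces an ID, a single LEQ token (not LT followed by ASSIGN), and an INT. The input "whilex" is read as one identifier rather than the keyword while followed by x, because the scanner keeps consuming characters for as long as the token can be extended. Note that the only state kept between characters is the current position and the token class being read, which matches the observation that finite memory suffices.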

Hand-Written Scanner for While Language

Tools for Constructing Lexers

Compiler Construction Tools

Background on regular languages and automata:

References