Towards a More Systematic Approach

We have seen how to make a lexer for a simple language

What if the language has complex operators, each of which can be part of another, e.g.


how can we make sure to produce a correct sequence of tokens?
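
A minimal sketch of the problem, assuming a hypothetical operator set in which some operators are prefixes of others (the set and the function name naive_tokens are illustrative only, not taken from the course material). An ad hoc lexer that accepts the first operator that matches can split the input in the wrong place:

```python
# Hypothetical operator set, chosen only for illustration.
# Note that "<" is a prefix of "<=" and "=" is a prefix of "==".
OPERATORS = ["<", "=", "<=", "=="]


def naive_tokens(text):
    """Take the first operator in OPERATORS that matches at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for op in OPERATORS:
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            # No operator matched at position i.
            raise ValueError(f"unexpected character {text[i]!r} at {i}")
    return tokens


print(naive_tokens("<=="))  # ['<', '=', '=']
```

Here "<==" is split into three one-character tokens, whereas a lexer that always prefers the longest possible token would produce "<=" followed by "=". Reordering the list helps in this small case, but it does not scale to arbitrary token classes.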

We need a more principled solution

Basis: finite state machines

We will see how to

  • build code to recognize any regular expression
  • combine token classes, even if they start with common symbols
  • enforce the longest match rule and token priorities
  • build a compiler from regular expressions to lexers
  • such a compiler is called a 'lexer generator' and is part of
    • text manipulation software, and
    • compiler-compilers
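
The longest match rule and token priorities from the list above can be sketched as follows. This is only an illustration under stated assumptions: the token classes, their names, and the function tokenize are hypothetical, and Python's re module stands in for the finite state machines that a lexer generator would construct. The patterns are kept simple enough that each one returns its longest possible match on its own.

```python
import re

# Hypothetical token classes, listed from highest to lowest priority.
TOKEN_CLASSES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("INT", re.compile(r"[0-9]+")),
    ("LE", re.compile(r"<=")),
    ("LT", re.compile(r"<")),
    ("EQ", re.compile(r"==")),
    ("ASSIGN", re.compile(r"=")),
    ("SKIP", re.compile(r"\s+")),
]


def tokenize(text):
    """Split text into (class, lexeme) pairs using longest match,
    breaking ties in favour of the class listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in TOKEN_CLASSES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier (higher-priority) class on ties.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "SKIP":  # whitespace is matched but not reported
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("if iffy <= 10"))
# [('IF', 'if'), ('ID', 'iffy'), ('LE', '<='), ('INT', '10')]
```

The input "if" matches both IF and ID with the same length, so priority decides in favour of IF. The input "iffy" matches ID with a longer lexeme, so longest match decides in favour of ID. Both share a common starting symbol, which is the situation the second bullet refers to.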