Parser Combinators
For the task of implementing a parser, we've seen two options so far:
- Hand-written recursive-descent parsers
- Compiler-compilers
We now briefly discuss a third, intermediate one: parser combinators.
Parsers are Functions
From a conceptual point of view, a parser is just a function: it takes as an input a stream of characters or tokens, and either successfully consumes a part of the input and returns “success”, or fails to read a proper sequence and fails. Parser combinators are operators that combine several such parsers to build a new one.
From two parsers and , the two basic operations are:
- sequencing: construct the parser that recognizes the language
- alternation: construct the parser that recognizes the language
In a programming language that supports lazy evaluation, it is possible to implement these two operations in a very natural way. Additionally, when the language supports a form of operator overloading, it is possible to construct parsers in a way that looks very close to context-free grammars; we'll see how we can for instance write a parser for arithmetic expressions that looks like this:
def ID : Parser = "x" | "y" | "z" def expr : Parser = factor ~ (( "+" ~ factor | "-" ~ factor ) | epsilon) def factor : Parser = term ~ (( "*" ~ term | "/" ~ term ) | epsilon) def term : Parser = ( "(" ~ expr ~ ")" | ID | NUM )
Implementing Parser Combinators in Scala
See the code here. Observe how simple the definitions of Sequence and Alternative are. Some remarks on this code:
- Sequence and Alternative take lazy arguments. Why?
- There are no semantic actions (the parser just recognizes a string).
- White spaces are not handled specially, ie. “space” is a character like any other.
- Where does backtracking occur?
The second point can be improved by adding a return value as part of the Success class (ie. on success, parsers return the remaining stream and a value).
Note on the ~ combinator
The implementation of ~ as shown in the source file above does not always produce a parser that recognizes . As an example, consider a parser that recognizes any number of repetitions of “a”, and a parser that recognizes the words starting with one “a” and immediately followed by any number of “b”. The sequence “a” “a” “a” “b” will not be recognized as belonging to by the combined parser, because the parser in the combination will consume all “a” and will then fail.
Scala Standard Library Implementation
The standard library includes an implementation of parser combinators (in the package scala.util.parsing.combinator
). There are more combinators, for instance the * and ? operators, as well as full support for semantic actions (see the ^^ combinator).