Recursive descent is a decent parsing technique.
| descent: |
|---|
| a movement downward |
| decent: |
|---|
| adequate: sufficient for the purpose; “an adequate income”; “the food was adequate”; “a decent wage”; “enough food”; “good enough” |
(from Wordnet)
In context-free grammars we have seen a grammar of polynomials.
Consider first this version of grammar:
Here is a parser for this grammar: polynomials.pscala
Note correspondence:
| (”+” term)* | ⇒ while (lex.current=PLUS) { lex.next; parseTerm } |
Example: running the code above on “x + y*(u+3)”
For statements, we use keyword to decide what we are about to parse:
| “if” X | “while” Y | ⇒ if (lex.current=IF) parseX else if (lex.current=WHILE) parseY |
Consider a variant of grammar with this definition:
polynomial ::= term | term "+" polynomial
How to write parsePolynomial procedure for this grammar?
Solution: apply left-factoring:
polynomial ::= term ("" | "+" polynomial)
Now we can construct parser:
def parsePolynomial = {
parseTerm
if (lex.current==PLUS) {
lex.next
parsePolynomial
}
}
We obtained tail-recursive version equivalent to a while loop.
Instead of
polynomial ::= term ("" | "+" polynomial)
consider grammar defining the same language:
polynomial ::= ("" | polynomial "+") term
we need to parse polynomial, so we check which alternative to parse, so we check whether the current symbol is parsed by polynomial, so
we need to parse polynomial, so we check which alternative to parse, so we check whether the current symbol is parsed by polynomial, so
we need to parse polynomial, so we check which alternative to parse, so we check whether the current symbol is parsed by polynomial, so ...
This does not work, because it contains left-recursive alternative
polynomial ::= polynomial "+" term
For recursive descent parsing, we need right-recursive rules, which work, like
polynomial ::= term "+" polynomial
Simple forms of right recursion can be expressed using repetition (and become while loops), which work.
If you understand how we do this, you should be able to do it for MiniJava in homework as well.
The parser: WhileParser.scala
When can we be sure that recursive descent parser will parse grammar correctly?
Consider grammar without repetition construct * (eliminate it using right recursion).
Given rules
X ::= p X ::= q
where p,q are sequences of terminals and non-terminals, we need to decide which one to use when parsing X, based on the first character of possible string given by p and q.
How to choose alternative: check whether current token belongs to first(p) or first(q)
Assume for now
For every terminal X, if X =⇒* w and w is a string of terminals, then w is non-empty
We then have
We compute first(p) set of terminals for
Example grammar:
S ::= X | Y X ::= "b" | S Y Y ::= "a" X "b" | Y "b"
Equations:
first(Y)
first(S Y) = {b}
first(S)
Expansion: first(S) = first(X)
first(Y) = {b}
first(S)
{a}
first(Y)
Bottom up computation, while there is change:
Solving equations
first(Y)
first(S)
first(Y)bottom up
| first(S) | first(X) | first(Y) |
|---|---|---|
| {} | {} | {} |
| {} | {b} | {a} |
| {a,b} | {b} | {a} |
| {a,b} | {a,b} | {a} |
| {a,b} | {a,b} | {a} |
Does this process terminate?
There is a unique least solution
General Remark:
In general, a non-terminal can expand to empty string
A sequence of non-terminals is nullable if it can derive an empty string
Computing nullable non-terminals:
Algorithm:
nullable = {}
changed = true
while (changed) {
if X is not nullable and either
1) grammar contains rule
X ::= "" | ...
or
2) grammar contains rule
X ::= Y1 ... Yn | ...
and
{Y1,...,Yn} is contained in nullable
then
nullable = nullable union {X}
changed = true
}
Computing first(X), given rule X =
...
...
,...,
are all nullable, then add first(
) to first(X)Then repeat until no change, as before.
What if we have
X = Y Z | U
and U is nullable? When can we choose a nullable alternative (U)?
t is in follow(X), if there exists a derivation containing substring X t
Example of language with ‘named blocks’:
statements ::= "" | statement statements statement ::= assign | block assign ::= ID "=" (ID|INT) ";" block ::= "begin" ID statements ID "end"
Try to parse
begin myPrettyCode x = 3; y = x; myPrettyCode end
Problem parsing ‘statements’:
Computing follow(
), given rule X =
...
...
...
), if
,...,
are all nullable
), if
, ...,
are all nullablePossible computation order:
Example: compute these values for grammar above
The grammar cannot be parsed because we have
statements ::= "" | statement statements
where
nullable
follow(statements) = {ID} 
If the parser sees ID, it does not know if it should
First, compute nullable, first, follow
Then, make parsing table which stores the alternative, given
Given (X ::=
| ... |
) we insert alternative j into table iff
first(p_j), or
) and t
follow(X)If in parsing table we have two or more alternatives for same token and non-terminal:
Otherwise, we say that the grammar is LL(1)
What about empty entries?
nullable
follow(X), if X
nullableAccording to some opinions error recover is not worth it
Approaches to error recovery in recursive descent parsing:
Recursive descent is decent because
Compute parsing table for WhileParser.scala.