Table of Contents

Recursive Descent (Predictive) Parsing

Recursive descent is a decent parsing technique.

descent:
a movement downward
decent:
adequate: sufficient for the purpose; “an adequate income”; “the food was adequate”; “a decent wage”; “enough food”; “good enough”

(from Wordnet)

Basic Idea

Example: Recursive Descent Parsing of Polynomials

In context-free grammars we have seen a grammar of polynomials.

Consider first this version of grammar:

Here is a parser for this grammar: polynomials.pscala

Note correspondence:

(”+” term)* ⇒ while (lex.current=PLUS) { lex.next; parseTerm }

Example: running the code above on “x + y*(u+3)”

For statements, we use keyword to decide what we are about to parse:

“if” X | “while” Y ⇒ if (lex.current=IF) parseX else if (lex.current=WHILE) parseY

Transforming Grammar: Left Factoring

Consider a variant of grammar with this definition:

polynomial ::= term | term "+" polynomial

How to write parsePolynomial procedure for this grammar?

Solution: apply left-factoring:

polynomial ::= term ("" | "+" polynomial)

Now we can construct parser:

def parsePolynomial = {
  parseTerm
  if (lex.current==PLUS) {
    lex.next
    parsePolynomial
  }
}

We obtained tail-recursive version equivalent to a while loop.

Transforming Grammar: Left Recursion Makes Trouble

Instead of

polynomial ::= term ("" | "+" polynomial)

consider grammar defining the same language:

polynomial ::= ("" | polynomial "+") term

we need to parse polynomial, so we check which alternative to parse, so we check whether the current symbol is parsed by polynomial, so

we need to parse polynomial, so we check which alternative to parse, so we check whether the current symbol is parsed by polynomial, so

we need to parse polynomial, so we check which alternative to parse, so we check whether the current symbol is parsed by polynomial, so ...

This does not work, because it contains left-recursive alternative

polynomial ::= polynomial "+" term

For recursive descent parsing, we need right-recursive rules, which work, like

polynomial ::= term "+" polynomial

Simple forms of right recursion can be expressed using repetition (and become while loops), which work.

Recursive Descent Parser for While Language

If you understand how we do this, you should be able to do it for MiniJava in homework as well.

The parser: WhileParser.scala

When Exactly Does Recursive Descent Work?

When can we be sure that recursive descent parser will parse grammar correctly?

Consider grammar without repetition construct * (eliminate it using right recursion).

Given rules

X ::= p
X ::= q

where p,q are sequences of terminals and non-terminals, we need to decide which one to use when parsing X, based on the first character of possible string given by p and q.

How to choose alternative: check whether current token belongs to first(p) or first(q)

Computing 'first' in Simple Case

Assume for now

For every terminal X, if X =⇒* w and w is a string of terminals, then w is non-empty

We then have

We compute first(p) set of terminals for

Example grammar:

S ::= X | Y
X ::= "b" | S Y
Y ::= "a" X "b" | Y "b"

Equations:

How to solve equations for first?

Expansion: first(S) = first(X) $\cup$ first(Y) = {b} $\cup$ first(S) $\cup$ {a} $\cup$ first(Y)

Bottom up computation, while there is change:

Solving equations

bottom up

first(S) first(X) first(Y)
{} {} {}
{} {b} {a}
{a,b} {b} {a}
{a,b} {a,b} {a}
{a,b} {a,b} {a}

Does this process terminate?

There is a unique least solution

General Remark:

General Case: Nullable Non-terminals

In general, a non-terminal can expand to empty string

first(Y Z) = first(Y)?

A sequence of non-terminals is nullable if it can derive an empty string

Computing nullable non-terminals:

Algorithm:

nullable = {}
changed = true
while (changed) {
  if X is not nullable and either
  1) grammar contains rule
    X ::= "" | ...
  or
  2) grammar contains rule 
    X ::= Y1 ... Yn | ...
      and
    {Y1,...,Yn} is contained in nullable
  then
    nullable = nullable union {X}
    changed = true
}

Computing First Given Nullable

Computing first(X), given rule X = $Y_1$ ... $Y_i$ ... $Y_k$

Then repeat until no change, as before.

The Need for Follow

What if we have

X = Y Z | U

and U is nullable? When can we choose a nullable alternative (U)?

t is in follow(X), if there exists a derivation containing substring X t

Example of language with ‘named blocks’:

statements ::= "" | statement statements
statement ::= assign | block
assign ::= ID "=" (ID|INT) ";"
block ::= "begin" ID statements ID "end"

Try to parse

begin myPrettyCode
  x = 3;
  y = x;
myPrettyCode end

Problem parsing ‘statements’:

Computing follow( $Y_i$), given rule X = $Y_1$ ... $Y_i$ ... $Y_j$ ... $Y_k$

Possible computation order:

Example: compute these values for grammar above

The grammar cannot be parsed because we have

statements ::= "" | statement statements

where

If the parser sees ID, it does not know if it should

Recursive Descent Parsing

First, compute nullable, first, follow

Then, make parsing table which stores the alternative, given

Given (X ::= $p_1$ | ... | $p_n$) we insert alternative j into table iff

If in parsing table we have two or more alternatives for same token and non-terminal:

Otherwise, we say that the grammar is LL(1)

What about empty entries?

Error Recovery

According to some opinions error recover is not worth it

Approaches to error recovery in recursive descent parsing:

Summary

Recursive descent is decent because

Exercise

Compute parsing table for WhileParser.scala.