Basic Idea of First Symbol Computation
When Exactly Does Recursive Descent Work?
When can we be sure that recursive descent parser will parse grammar correctly?
- it will accept without error exactly when string can be derived
Consider grammar without repetition construct * (eliminate it using right recursion).
Given rules
X ::= p X ::= q
that is,
X ::= p | q
where p,q are sequences of terminals and non-terminals, we need to decide which one to use when parsing X, based on the first character of possible string given by p and q.
- first(p) - first characters of strings that p can generate
- first(q) - first characters of strings that q can generate
- requirement: first(p) and first(q) are disjoint
How to choose alternative: check whether current token belongs to first(p) or first(q)
Computing 'first' in Simple Case
Assume for now
- no non-terminal derives empty string, that is:
For every terminal X, if X ⇒* w and w is a string of terminals, then w is non-empty
We then have
- first(X …) = first(X)
- first(“a” …) = {a}
We compute first(p) set of terminals for
- every right-hand side alternative p, and
- every non-terminal X
Example grammar:
S ::= X | Y X ::= "b" | S Y Y ::= "a" X "b" | Y "b"
Equations:
- first(S) = first(X|Y) = first(X) first(Y)
- first(X) = first(“b” | S Y) = first(“b”) first(S Y) = {b} first(S)
How to solve equations for first?
Expansion: first(S) = first(X) first(Y) = {b} first(S) {a} first(Y)
- could keep expanding forever
- does further expansion make difference?
- is there a solution?
- is there unique solution?
Bottom up computation, while there is change:
- initially all sets are empty
- if right hand side is bigger, add different to left-hand side
Solving equations
- first(S) = first(X) first(Y)
- first(X) = {b} first(S)
- first(Y) = {a} first(Y)
bottom up
first(S) | first(X) | first(Y) |
---|---|---|
{} | {} | {} |
{} | {b} | {a} |
{a,b} | {b} | {a} |
{a,b} | {a,b} | {a} |
{a,b} | {a,b} | {a} |
Does this process terminate?
- all sets are increasing
- a finite number of symbols in grammar
There is a unique least solution
- this is what we want to compute
- the above bottom up algorithm computes it
General Remark:
- this is an example of a 'fixed point' computation algorithm
- also be useful for semantic analysis, later