Basic Idea of First Symbol Computation
When Exactly Does Recursive Descent Work?
When can we be sure that recursive descent parser will parse grammar correctly?
- it will accept without error exactly when string can be derived
 
Consider grammar without repetition construct * (eliminate it using right recursion).
Given rules
X ::= p X ::= q
that is,
X ::= p | q
where p,q are sequences of terminals and non-terminals, we need to decide which one to use when parsing X, based on the first character of possible string given by p and q.
- first(p) - first characters of strings that p can generate
 - first(q) - first characters of strings that q can generate
 - requirement: first(p) and first(q) are disjoint
 
How to choose alternative: check whether current token belongs to first(p) or first(q)
Computing 'first' in Simple Case
Assume for now
- no non-terminal derives empty string, that is:
 
For every terminal X, if X ⇒* w and w is a string of terminals, then w is non-empty
We then have
- first(X …) = first(X)
 - first(“a” …) = {a}
 
We compute first(p) set of terminals for
- every right-hand side alternative p, and
 - every non-terminal X
 
Example grammar:
S ::= X | Y X ::= "b" | S Y Y ::= "a" X "b" | Y "b"
Equations:
- first(S) = first(X|Y) = first(X)
 first(Y) - first(X) = first(“b” | S Y) = first(“b”)
 first(S Y) = {b} 
 first(S) 
How to solve equations for first?
Expansion: first(S) = first(X) 
 first(Y) = {b} 
 first(S) 
 {a} 
 first(Y)
- could keep expanding forever
 - does further expansion make difference?
 - is there a solution?
 - is there unique solution?
 
Bottom up computation, while there is change:
- initially all sets are empty
 - if right hand side is bigger, add different to left-hand side
 
Solving equations
- first(S) = first(X)
 first(Y) - first(X) = {b}
 first(S) - first(Y) = {a}
 first(Y) 
bottom up
| first(S) | first(X) | first(Y) | 
|---|---|---|
| {} | {} | {} | 
| {} | {b} | {a} | 
| {a,b} | {b} | {a} | 
| {a,b} | {a,b} | {a} | 
| {a,b} | {a,b} | {a} | 
Does this process terminate?
- all sets are increasing
 - a finite number of symbols in grammar
 
There is a unique least solution
- this is what we want to compute
 - the above bottom up algorithm computes it
 
General Remark:
- this is an example of a 'fixed point' computation algorithm
 - also be useful for semantic analysis, later