Automata for LR Parsing without Lookahead
These automata are used to construct LR(0) and SLR parsers
- same automata, SLR just uses some extra information
We describe their states and transitions
If was original start symbol, we
- add (D'::=D eof) to grammar
- make new start symbol
States: Sets of Items
We will construct states as sets of items
LR(0) item is a dotted production
- is grammar rule
- some strings of terminals and non-terminals
Note: this is similar to Earley Parser item
- but there is no more initial position!
We use letters I,J to denote sets of LR(0) items
- approximates a set of paths in a syntax tree
- left side of tree corresponds to input seen so far
Example:
Rules in our simple example grammar:
D' ::= E eof E ::= T E ::= E + T T ::= ID
Example state is , equal to the following set of items:
{ D' ::= .E eof, E ::= .T, E ::= .E + T, T ::= .ID }
Transitions: Closure and Goto Functions
These functions map sets of items to sets of items
Closure is similar to applying prediction in Earley Parser.
If is a set of states, then closure(I) is least set such that
- I closure(I)
- if and is a grammar rule, then
Example: Take the singleton set
{ D' ::= .E eof }
Then closure of this set is :
{ D' ::= .E eof, E ::= .T, E ::= .E + T, T ::= .ID }
where rules are added in this order using the definition.
Goto corresponds to
- scan in Earley Parser when X is a terminal
- completion in Earley Parser when X is a non-terminal
Example: Let us compute goto(, ID). Because only one element contains .ID, we obtain
{ T ::= ID. }
and the closure remains the same because . does not appear before a non-terminal.
I1 = goto(I0, E) = { E ::= E . + T } I2 = goto(I1, +) = { E ::= E + . T , T ::= . ID } I3 = goto(I2, T) = { E ::= E + T. }
Constructing Automaton
Initial state
Example: What we denoted was indeed a correct initial state.
Edges are labeled by grammar terminals and non-terminals
- they are function of the Finite state machine
States and edges are the least sets (T,E) such that
- for each terminal and non-terminal and each , if then
Examples
The automaton in this example is deterministic
- empty set is dead state
- we do not represent dead state or its edges
The automaton represents possible sets of slices through syntax trees. The state records, for each non-terminal the way in which it's last ocurrence was unfolded in the tree.
Consider a parse tree for
ID + ID + ID ^
when parser has read only token ID and +, we have the following cut through the tree, and the last expansions of E and T describe the state.