Automata for LR Parsing without Lookahead

These automata are used to construct LR(0) and SLR parsers

We describe their states and transitions

If $D$ was original start symbol, we

States: Sets of Items

We will construct states as sets of items

LR(0) item is a dotted production $X::=p.q$

Note: this is similar to Earley Parser item

We use letters I,J to denote sets of LR(0) items

Example:

Rules in our simple example grammar:

D' ::= E eof
E ::= T
E ::= E + T
T ::= ID

Example state is $I_0$, equal to the following set of items:

{ D' ::= .E eof,
  E ::= .T,
  E ::= .E + T,
  T ::= .ID }

Transitions: Closure and Goto Functions

These functions map sets of items to sets of items

Closure is similar to applying prediction in Earley Parser.

If $I$ is a set of states, then closure(I) is least set such that

Example: Take the singleton set

{ D' ::= .E eof }

Then closure of this set is $I_0$:

{ D' ::= .E eof,
  E ::= .T,
  E ::= .E + T,
  T ::= .ID }

where rules are added in this order using the definition.

Goto corresponds to

\begin{equation*}
   goto(I,X) = closure(\{ Y::=qX.r \mid (Y::=q.Xr)\in I \})
\end{equation*}

Example: Let us compute goto($I_0$, ID). Because only one element contains .ID, we obtain

{ T ::= ID. }

and the closure remains the same because . does not appear before a non-terminal.

I1 = goto(I0, E) = { E ::= E . + T }
I2 = goto(I1, +) = { E ::= E + . T , T ::= . ID }
I3 = goto(I2, T) = { E ::= E + T. }

Constructing Automaton

Initial state $I_0 = closure(\{D'::=.D \verb! EOF!\})$

Example: What we denoted $I_0$ was indeed a correct initial state.

Edges are labeled by grammar terminals and non-terminals

States and edges are the least sets (T,E) such that

Examples

LR0 Automaton

The automaton in this example is deterministic

The automaton represents possible sets of slices through syntax trees. The state records, for each non-terminal the way in which it's last ocurrence was unfolded in the tree.

Consider a parse tree for

ID +   ID + ID
     ^     

when parser has read only token ID and +, we have the following cut through the tree, and the last expansions of E and T describe the state.

Parse Tree for Expressions