Automata for LR Parsing without Lookahead

These automata are used to construct LR(0) and SLR parsers

same automata, SLR just uses some extra information

We describe their states and transitions

If $D$ was original start symbol, we

add (D'::=D eof) to grammar
make $D'$ new start symbol

States: Sets of Items

We will construct states as sets of items

LR(0) item is a dotted production $X::=p.q$

$X::=pq$ is grammar rule
$p,q$ some strings of terminals and non-terminals

Note: this is similar to Earley Parser item

but there is no more initial position!

We use letters I,J to denote sets of LR(0) items

approximates a set of paths in a syntax tree
left side of tree corresponds to input seen so far

Example:

Rules in our simple example grammar:

D' ::= E eof
E ::= T
E ::= E + T
T ::= ID

Example state is $I_0$ , equal to the following set of items:

{ D' ::= .E eof,
  E ::= .T,
  E ::= .E + T,
  T ::= .ID }

Transitions: Closure and Goto Functions

These functions map sets of items to sets of items

Closure is similar to applying prediction in Earley Parser.

If $I$ is a set of states, then closure(I) is least set such that

I $\subseteq$ closure(I)
if $(X::=p.Yq)\in I$ and $(Y::=r)$ is a grammar rule, then $(Y::=.r)\ \in\ closure(I)$

Example: Take the singleton set

{ D' ::= .E eof }

Then closure of this set is $I_0$ :

{ D' ::= .E eof,
  E ::= .T,
  E ::= .E + T,
  T ::= .ID }

where rules are added in this order using the definition.

Goto corresponds to

scan in Earley Parser when X is a terminal
completion in Earley Parser when X is a non-terminal

$\begin{equation*} goto(I,X) = closure(\{ Y::=qX.r \mid (Y::=q.Xr)\in I \}) \end{equation*}$

Example: Let us compute goto( $I_0$ , ID). Because only one element contains .ID, we obtain

{ T ::= ID. }

and the closure remains the same because . does not appear before a non-terminal.

I1 = goto(I0, E) = { E ::= E . + T }
I2 = goto(I1, +) = { E ::= E + . T , T ::= . ID }
I3 = goto(I2, T) = { E ::= E + T. }

Constructing Automaton

Initial state $I_0 = closure(\{D'::=.D \verb! EOF!\})$

Example: What we denoted $I_0$ was indeed a correct initial state.

Edges are labeled by grammar terminals and non-terminals

they are $\Delta$ function of the Finite state machine

States and edges are the least sets (T,E) such that

$I_0 \in T$
for each terminal and non-terminal $X$ and each $I \in T$ , if $J=goto(I,X)$ then
- $J \in T$
- $(I,X,J) \in E$

Examples

The automaton in this example is deterministic

empty set is dead state
we do not represent dead state or its edges

The automaton represents possible sets of slices through syntax trees. The state records, for each non-terminal the way in which it's last ocurrence was unfolded in the tree.

Consider a parse tree for

ID +   ID + ID
     ^

when parser has read only token ID and +, we have the following cut through the tree, and the last expansions of E and T describe the state.