LARA

Automata for LR Parsing without Lookahead

These automata are used to construct LR(0) and SLR parsers

  • same automata, SLR just uses some extra information

We describe their states and transitions

If $D$ was original start symbol, we

  • add (D'::=D eof) to grammar
  • make $D'$ new start symbol

States: Sets of Items

We will construct states as sets of items

LR(0) item is a dotted production $X::=p.q$

  • $X::=pq$ is grammar rule
  • $p,q$ some strings of terminals and non-terminals

Note: this is similar to Earley Parser item

  • but there is no more initial position!

We use letters I,J to denote sets of LR(0) items

  • approximates a set of paths in a syntax tree
  • left side of tree corresponds to input seen so far

Example:

Rules in our simple example grammar:

D' ::= E eof
E ::= T
E ::= E + T
T ::= ID

Example state is $I_0$, equal to the following set of items:

{ D' ::= .E eof,
  E ::= .T,
  E ::= .E + T,
  T ::= .ID }

Transitions: Closure and Goto Functions

These functions map sets of items to sets of items

Closure is similar to applying prediction in Earley Parser.

If $I$ is a set of states, then closure(I) is least set such that

  • I $\subseteq$ closure(I)
  • if $(X::=p.Yq)\in I$ and $(Y::=r)$ is a grammar rule, then $(Y::=.r)\ \in\ closure(I)$

Example: Take the singleton set

{ D' ::= .E eof }

Then closure of this set is $I_0$:

{ D' ::= .E eof,
  E ::= .T,
  E ::= .E + T,
  T ::= .ID }

where rules are added in this order using the definition.

Goto corresponds to

\begin{equation*}
   goto(I,X) = closure(\{ Y::=qX.r \mid (Y::=q.Xr)\in I \})
\end{equation*}

Example: Let us compute goto($I_0$, ID). Because only one element contains .ID, we obtain

{ T ::= ID. }

and the closure remains the same because . does not appear before a non-terminal.

I1 = goto(I0, E) = { E ::= E . + T }
I2 = goto(I1, +) = { E ::= E + . T , T ::= . ID }
I3 = goto(I2, T) = { E ::= E + T. }

Constructing Automaton

Initial state $I_0 = closure(\{D'::=.D \verb! EOF!\})$

Example: What we denoted $I_0$ was indeed a correct initial state.

Edges are labeled by grammar terminals and non-terminals

States and edges are the least sets (T,E) such that

  • $I_0 \in T$
  • for each terminal and non-terminal $X$ and each $I \in T$, if $J=goto(I,X)$ then
    • $J \in T$
    • $(I,X,J) \in E$

Examples

LR0 Automaton

The automaton in this example is deterministic

  • empty set is dead state
  • we do not represent dead state or its edges

The automaton represents possible sets of slices through syntax trees. The state records, for each non-terminal the way in which it's last ocurrence was unfolded in the tree.

Consider a parse tree for

ID +   ID + ID
     ^     

when parser has read only token ID and +, we have the following cut through the tree, and the last expansions of E and T describe the state.

Parse Tree for Expressions