LARA

Transforming a Grammar to Chomsky Form

Noam Chomsky

We show how to make a grammar:

  • without unproductive symbols
  • without unreachable symbols
  • $\epsilon$-free (no non-start nullable symbols)
  • without single productions X::=Y (X,Y non-terminals)
  • without productions of arity more than two
  • with terminals occurring alone on right-hand side

Elimination of Non-Productive Non-Terminals

Non-terminal is non-productive if it cannot generate any string of terminals

  • if X::=w and w has only terminals, then X is productive
  • if X:=p and p has only productive symbols, then X is productive

Start from empty set, apply these rules to compute productive symbols

  • remaining symbols are non-productive
  • delete them and all rules that contain them

Elimination of Unreachable Non-Terminals

Non-terminal is unreachable if no derivation from start symbol contains it

  • S is reachable
  • if X is reachable and X::=p, then every symbol in p is reachable

Start from empty set, apply these rules to compute productive symbols

  • remaining symbols are non-productive
  • delete them and all rules that contain them

Making Non-Terminals Non-Nullable

A grammar where is $\epsilon$-free if

  • no-non terminal is nullable, or
  • only S is nullable and S does not appear on right-hand side

First, compute nullable nonterminals.

Then, for each production

\begin{equation*}
   X := Y_1 \ldots Y_n
\end{equation*}

create all productions of form

\begin{equation*}
   X := B_1 \ldots B_n
\end{equation*}

where

  • $B_i$ is $Y_i$ (as before), or,
  • $Y_i$ is nullable and $B_i$ is empty string (so it disappears)

Remove all right-hand sides that are “”

If S is nullable, introduce new initial symbol S' and add

\begin{equation*}
   S' ::= S | ""
\end{equation*}

Eliminating Single Productions

Construct a directed graph of non-terminals between X,Y such that $X::=Y$ is production

Compute $\Rightarrow^*$ as the transitive closure of this graph

If X $\Rightarrow^*$ Y and Y ::= p is a rule, then add rule

X ::= p

Then remove single productions.

Eliminating Non-Binary Productions

\begin{equation*}
X ::= Y_1 Y_2 Y_3 \ldots Y_n
\end{equation*}

becomes

\begin{equation*}
\begin{array}{l}
X ::= Z Y_3 \ldots Y_n \\
Z ::= Y_1 Y_2
\end{array}
\end{equation*}

Making Terminals Alone on Right-Hand Side

Just introduce a fresh non-terminal $X_t$ for terminal t

Use $X_t$ on RHS instead of t

Add rule $X_t ::= t$

Summary of Parsing General Context-Free Grammars

To check if $w \in L(G)$: