LARA

CYK Parsing Algorithm for General Context-Free Grammars

Assume the grammar is in Chomsky Normal Form.

Example Grammar

S ::= L R | S S | L X
X ::= S R
L ::= "{"
R ::= "}"

Consider an input string

{ { } { } { } }

For each terminal t in input, for which non-terminal X is it the case that X $\Rightarrow^* t$?

{ { } { } { } }   length of substring
L L R L R L R R      1
   S   S   S         2
            X        3
     S   S           4
          X          5
       S             6
         X           7
       S             8

For each string w of length 2 in input, for which non-terminal X is it the case that X $\Rightarrow^* w$?

Dynamic programming algorithm: for each substring, determine which non-terminals can generate it.

Let $w$ be input word

Let d(i)(j) denote non-terminals deriving substring $w_{ij}$ of $w$ from i to j.

\begin{equation*}
   d(i)(j) = \{ X | (X::=Y Z) \in G,\ Y \in w_{ik},\ Z \in w_{k+1,j} \}
\end{equation*}

CYK algorithm:

  INPUT: word w, grammar G in Chomsky normal form
  OUTPUT: true iff (w in L(G))
  N = |w|
  var d : Array[N][N] 
  forall i in {1..N} {
    d(i)(i) = {X | G contains X->w(i)}
    for j in {i + 1 .. N}
      d(i)(j) = {}
  }
  for k = 2 to N // substring length
    for i = 0 to N-k // initial position
      for j = 1 to k-1 // length of first half
        for each (X::=Y Z) in G
          if Y in d(i)(i+j-1) and Z in d(i+j)(i+k-1)
            d(i)(j) = d(i)(j) union {X}
  return (S in d(0,N-1))

A Programming Language Example

Consider this fragment of a grammar of language with references and procedure calls

statement ::= assign | call
assign ::= expr "=" expr
call ::= expr "." ID "(" expr ")"
expr ::= ID | expr "." expr

Is this grammar LL(1)?

What is its Chomsky Normal Form?

Additional information:

Number of Distinct Parse Trees

Let $w$ be the string

()()()

It has 2 parse trees

What is the number of parse trees for a string of the form $w ... w$ where $w$ repeats $n$ times?

What is the running time of the CYK parser on this input?

How to generate one or all syntax trees?