Homework 04

Due Wednesday, 3 November, 10:10am. Please hand it in to Hossein before the beginning of the exercise session.

Problem 1

In bottom-up parsing, replacing the RHS of a rule with its LHS is called a reduction. A reduction is called “useless” if it is actually performed during parsing but cannot be used in any valid parse tree.

We denote a reduction by a triple (p, A → u, q). What it means for a reduction to occur during parsing depends on the parsing technique, as follows.

For the CYK parser: the reduction occurs if the non-terminal A belongs to d(p)(q) because of its right-hand side u, that is,

either u is a terminal and is stored at substring from p to q (so q = p+1), or
u is of the form BC, where B belongs to d(p)(r) and C belongs to d(r)(q) for some r with p < r < q.
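
The CYK definition above can be sketched in code. The following is a minimal illustration, not a reference implementation: it assumes that positions are boundaries between tokens (so d(p)(q) covers tokens p to q-1, matching q = p+1 for a single terminal), that every right-hand side has one or two symbols, and that a one-symbol right-hand side is a terminal. The function name `cyk_reductions` and the toy grammar in the usage note are hypothetical and are not part of the homework.

```python
def cyk_reductions(tokens, rules):
    """Fill the CYK table and record every reduction (p, (A, u), q).

    rules is a list of (lhs, rhs) pairs, where rhs is a tuple of one
    terminal or of two symbols. d[(p, q)] holds the symbols that derive
    tokens[p:q]; terminals are stored in the table as well, so a
    two-symbol right-hand side may mix terminals and non-terminals.
    """
    n = len(tokens)
    d = {(p, q): set() for p in range(n + 1) for q in range(p + 1, n + 1)}
    reductions = []
    for p, tok in enumerate(tokens):
        d[(p, p + 1)].add(tok)  # a terminal derives itself
    for length in range(1, n + 1):
        for p in range(n - length + 1):
            q = p + length
            for lhs, rhs in rules:
                if len(rhs) == 1:
                    # u is a terminal stored at the substring from p to p+1
                    fired = length == 1 and rhs[0] == tokens[p]
                else:
                    # u = BC with B in d(p)(r) and C in d(r)(q)
                    fired = any(rhs[0] in d[(p, r)] and rhs[1] in d[(r, q)]
                                for r in range(p + 1, q))
                if fired:
                    d[(p, q)].add(lhs)
                    reductions.append((p, (lhs, rhs), q))
    return d, reductions
```

For a hypothetical toy grammar S -> AB, A -> a, B -> b and the input "a b", calling `cyk_reductions(["a", "b"], rules)` records three reductions, one per rule. A reduction is recorded once per triple even if several split points r justify it.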

For the Earley parser: the reduction occurs if the parser executes a completion step because of the item $(A \rightarrow u\bullet , p) \in S(q)$.
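
The Earley definition can be illustrated in the same way. The sketch below is a plain Earley recognizer that logs a triple (p, (A, u), q) each time the Completer processes a finished item with origin p in the set S(q). It assumes a grammar without ε-productions, so every completed item has an origin strictly before q. The name `earley_completions` and the item layout `(lhs, rhs, dot, origin)` are choices made for this illustration only.

```python
def earley_completions(tokens, rules, start):
    """Run an Earley recognizer and record every completion step.

    rules is a list of (lhs, rhs) pairs with rhs a non-empty tuple.
    Returns (accepted, completions), where completions lists the
    triples (origin, (lhs, rhs), q) in the order they were executed.
    """
    nonterminals = {lhs for lhs, _ in rules}
    n = len(tokens)
    S = [[] for _ in range(n + 1)]        # item sets, kept as worklists
    seen = [set() for _ in range(n + 1)]  # for duplicate detection

    def add(q, item):
        if item not in seen[q]:
            seen[q].add(item)
            S[q].append(item)

    for lhs, rhs in rules:
        if lhs == start:
            add(0, (lhs, rhs, 0, 0))

    completions = []
    for q in range(n + 1):
        i = 0
        while i < len(S[q]):              # S[q] may grow while we scan it
            lhs, rhs, dot, origin = S[q][i]
            i += 1
            if dot == len(rhs):           # Completer
                completions.append((origin, (lhs, rhs), q))
                for l2, r2, d2, o2 in list(S[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(q, (l2, r2, d2 + 1, o2))
            elif rhs[dot] in nonterminals:  # Predictor
                for l2, r2 in rules:
                    if l2 == rhs[dot]:
                        add(q, (l2, r2, 0, q))
            elif q < n and tokens[q] == rhs[dot]:  # Scanner
                add(q + 1, (lhs, rhs, dot + 1, origin))

    accepted = any(l == start and d == len(r) and o == 0
                   for l, r, d, o in S[n])
    return accepted, completions
```

Because the triples use the same (p, A → u, q) shape as in the CYK case, the two lists of reductions can be compared directly, for instance as Python sets.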

Consider the following grammar.

S -> UT      |  V "Int"  |  "Int"
T -> V "Int" |  "Int"
U -> S "=>"
V -> T ","

In all of the following questions, consider the input to be “Int , Int => Int”.

Construct the CYK parsing table.
Is the grammar ambiguous for the input?
Determine the useless reductions in CYK parsing.
Give the Earley parsing for the same input and grammar.
Determine the useless reductions in Earley parsing.
Classify the useless reductions into three categories: both Earley and CYK, only CYK, only Earley.

Problem 2

The Predictor of an Earley parser can be improved by taking the next input symbol into account as a look-ahead. For example, if the next input symbol is $a$ and $a\not\in \textit{FIRST}(Y)$, then the Predictor never predicts from an item of the form $X\rightarrow\alpha\bullet Y\beta$. Describe how this improved Predictor can eliminate some of the useless reductions from the first problem.
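
The look-ahead test can be sketched as follows. This is only an illustration under the assumption that the grammar has no ε-productions, so that FIRST of a right-hand side is determined by its first symbol alone. The names `first_sets` and `should_predict` are hypothetical helpers, not part of any standard Earley formulation.

```python
def first_sets(rules):
    """Compute FIRST(A) for every non-terminal A by fixed-point iteration.

    rules is a list of (lhs, rhs) pairs with rhs a non-empty tuple.
    Assumes no epsilon-productions, so only rhs[0] matters.
    """
    nonterminals = {lhs for lhs, _ in rules}
    first = {nt: set() for nt in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            head = rhs[0]
            new = first[head] if head in nonterminals else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def should_predict(symbol, lookahead, first):
    """Return True if predicting `symbol` is compatible with the look-ahead.

    `symbol` is the non-terminal Y after the dot; `lookahead` is the next
    input token (or None at the end of the input).
    """
    return lookahead in first[symbol]
```

In a recognizer, the Predictor would call `should_predict(Y, next_token, first)` before adding the items for Y and skip the prediction when it returns False. For a hypothetical grammar S -> AB, A -> a, B -> b, a look-ahead of `"a"` allows predicting A but not B.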

Problem 3

Consider a productive grammar in Chomsky normal form which does not contain $\epsilon$. Assume the input consists of exactly $n$ tokens, i.e., the lexical analyzer returns $n$ tokens before hitting the EOF.

Assume there are $k$ productions of the form A → BC and $l$ productions of the form A → a. By considering all combinations of possible dot positions in the right-hand side of a production and all possible values of the position stored in an item, compute the number of possible items.

Consider the following grammar.

S -> AB
A -> BA
A -> a
B -> b

Determine which of the following items are possible and which are impossible during the parsing of an input.

$(A\rightarrow a\bullet,n - 1)$
$(S\rightarrow AB\bullet,1)$
$(B\rightarrow b\bullet,4)$
$(A\rightarrow B\bullet A,2)$