LARA

About Parsing Potentially Ambiguous Context-Free Grammars

Given a context-free grammar $G$, how to check if $w \in L(G)$?

  • recursive descent gives efficient answer when $G$ is LL(1)
  • we now see how to do it for arbitrary context-free grammar $G$

Applications:

  • applications in Natural Language Processing
  • parsing mathematical formulas: e.g. the parser within the Isabelle interactive theorem prover
  • more flexibility in writing the grammar for programming languages

Example from Natural Language Processing:

A student gets a good recommendation letter if the student makes a good impression on Viktor or makes a good impression on Giuliano and does an excellent project.

Interpretation on the condition for good recommendation letter:

  • $(v \lor g) \land p$
  • $v \lor (g \land p)$

In program natural language processing we need to represent multiple possible interpretations and resolve them using other methods (probabilities derived from a training set, or semantic interpretation using automated reasoning tools in some semantic domain).

Example from parsing formulas:: The Isabelle interactive theorem prover supports higher-order logic, which contains typed lambda calculus and first-order logic. The theorem prover has a flexible syntax that allows writing expressions close to the common mathematical practice. for example

\begin{equation*}
    f(x:=v) = g
\end{equation*}

makes $g$ denote the function such that $g(x)=v$ and $g(y)=f(y)$ for $y \neq x$. Unfortunatelly, the above expression looks like application of function $f$ to some value of the form $(x:=v)$, so it can be difficult to parse. Another example in Isabelle are expressions denoting sets. The formula

\begin{equation*}
    \{ (x,y,z),\ q \}
\end{equation*}

denotes a singleton set containing two elements: the first element is a triple $(x,y,z)$, the second element is $q$. On the other hand, the expression

\begin{equation*}
   \{ (x,y,z).\ x + y = z \}
\end{equation*}

is a set comprehension denoting infinitely many elements. These two expressions can also be difficult to differentiate in simple parsers.