LARA

Homework 01

Due Wednesday, 30th September, 10:15am. Hand it in to Giuliano Losa before the beginning of the exercise session.

Problem 1

Write a regular expression for the language of nonnegative integer constants in the C programming language, where numbers beginning with O are octal constants and other numbers are decimal constants.

Comments:
As the statement of the problem was misleading (0 instead of 0), only the solutions that could not fall into any interpretation are deemed wrong.

5 points

Problem 2

Convert the following regular expression to nondeterministic finite automata: a((b|a*c)x)*|x*a

5 points

Problem 3

Show that any deterministic finite automaton for regular expression (a|b)*a(a|b)(a|b)…(a|b) where (a|b) appears n-1 times at the end must have at least 2n states.

Possible solution:

Let $A=(\Sigma,Q,\Delta q_{0},F)$ be a deterministic automaton for the given regular expression. Let's prove that $|Q|\ge2^{n}$ by contradiction :
Suppose $|Q|<2^{n}$. We are going to exhibit two strings such that only one is in the language but both lead the automaton in the same state.
Consider the set of strings$S=\{c_{1}c_{2}...c_{n}|c_{i}\in\{a,b\}\}$. Notice that $|S|=2^{n}$.
Consider the set of states $Q_{S}=\{q \in Q|\exists s \in S.q_{0}sq\}$ in which the automaton $A$ is after processing each of those strings. Since $|Q|<2^{n}$ we have $|Q_{S}|<2^{n}$. But $|S|=2^{n}$, so there has to be at least 2 strings that lead to the same state: $\exists s_{0},s_{1}\in S.\exists q\in Q.s_{1}\neq s_{2}\wedge q_{0}s_{0}q\wedge q_{0}s_{1}q$.
Now consider a position $i$ at which $s_{0}$ and $s_{1}$ differ: because $s_{0}\neq s_{1}$, $\exists i.1\leq i\leq n\wedge s_{0}[i]\neq s_{1}[i]$.
Suppose $s_{0}[i]=a$ (the other case is symmetric), and consider inputs $s_{0}'=s_{0}a^{i-1}$ and $s_{1}'=s_{1}a^{i-1}$. We have $s_{0}'[i]=a$, $s_{1}'[i]=b$, $|s_{0}|=n+i-1$, and $|s_{1}|=n+i-1$ by construction. Thus $s_{0}$ is in the language of the regular expression and $s_{1}$ is not. But because the automaton is in the same state for both strings after having consumed $n$ characters and because it is deterministic, it will be in the same state for both strings after consuming all the input. It will thus either accept $s_{1}$or reject $s_{0}$. This is in contradition with our assumption that $A$ is an automaton for the given regular expression, Thus our supposition is false: $|Q|\geq2^{n}$.

Comments:

The last part of the argument doesn't apply to a NFA: for a word in the language, not all the possible paths have to end in an accepting state.

Some tried to prove it by induction, but it is difficult to construct a minimal automaton recognizing the language from the minimal automaton of the previous step. Those who tried didn't consider the issue of minimality of the automaton. They just proved that one can construct an automaton with $2^n$ states.

Others said that the NFA has n states and that determinization always gives $2^n$ states. That is wrong, $2^n$ is just the maximum number of states you can get.

10 points : 5 for a correct intuition, 10 for writing a correct proof.