Homework 1

Due Monday October 8th by 10:15am sharp (right before the class).

Problem 1

Find a regular expression that generates all alternating sequences of 0 and 1 with arbitrary length (including lengths zero, one, two, …). For example, the alternating sequences of length one are 0 and 1, length two are 01 and 10, length three are 010 and 101. Note that no two adjacent character can be the same in an alternating sequence.

Problem 2

Construct a DFA for the language of well-nested parenthesis with a maximal nesting depth of 3. For example, ε, ()(), (()(())) and (()())()(), but not (((()))) nor (()(()(()))), nor ())).

By well-nested parentheses we mean those that are correctly nested, and are thus given by the grammar such as the following:

  S ::= \epsilon \mid (S)S

An alternative way to describe such strings $w \in \{ (, ) \}$ is to say that:

  • the total number of open and closed parentheses in $w$ is the same;
  • in each prefix $w'$ of $w$ (i.e. in $w'$ such that $w = w'v$) the number of closed parantheses in $w'$ is less than or equal the number of open parantheses in $w'$.

Problem 3

Find two equivalent states in the automaton, and merge them to produce a smaller automaton that recognizes the same language. Repeat until there are no longer equivalent states.

Recall that the general algorithm for minimizing finite automata works in reverse. First, find all pairs of inequivalent states. States X, Y are inequivalent if X is final and Y is not, or (by iteration) if $X \xrightarrow{a} X'$ and $Y \xrightarrow{a} Y'$ and X' and Y' are inequivalent. After this iteration ceases to find new pairs of inequivalent states, then X, Y are equivalent, if they are not inequivalent.

Problem 4

Let $D$ be any deterministic finite automaton. Assume that $D$ contains exactly $n$ states. Show that if it accepts at least one string of length $n$ or greater then the accepted language is infinite.

Problem 5

Let rtail be a function that returns all the symbols of a string except the last one. For example, rtail(Lexer) = Lexe. rtail is undefined for an empty string. If $R$ is a regular expression, then $\textit{rtail}(R)$ applies the function to all the elements. For example, $\textit{rtail}(abba|ba^*|ab^*)$ is the language that can be described by a regular expression $(ba^*|ab^*|\varepsilon)$.

Prove that $\textit{rtail}(R)$ is regular if $R$ is not nullable.

Bonus part: Let $|v|$ denote the length of the string $v$. Let $R$ be a regular language. Is the language $\{ u \in \Sigma^* \mid 
\exists v \in \Sigma^*. uv \in R \land |v|=7 \}$ always regular?

Problem 6

Consider a language with the following tokens and token classes:

ident  ::= letter (letter|digit)*
LT     ::= "<"
GT     ::= ">"  
shiftL ::= "<<"
shiftR ::= ">>"
dot    ::= "."
LP     ::= "("
RP     ::= ")"

Give a sequence of tokens for the following character sequence, applying the longest match rule:


Note that the input sequence contains no space character. You can separate the tokens in the output sequence by a comma for the purpose of presentation, like so:

LP, ident, ...