LARA

Equivalence of finite state machines and regular expression languages

We next show that a language is given by a regular expression if and only if it is a language of some finite state machine.

If a language is given by one of these two ways, we can always convert to the other if this is more convenient.

  • regular expressions can be easier to specify textually
  • it is easier to check whether a string is accepted by a finite state machine

Every language given by a regular expression is accepted by some finite state machine

By induction on the structure of regular expressions, we construct the finite state machine that accepts this language.

For the base case, observe that we can easily construct finite state machines for empty language $\epsilon$, and a finite state machine for a singleton language $a$ for $a \in \Sigma$.

For the inductive step, we use closure properties of finite state machines for the cases of union, concatenation, and iteration.

Every language accepted by a finite state machine is given by some regular expression

Finite state machines with regular expression labels. We first generalize the notion of a finite state automaton so that we can label its edges not only with elements of $\Sigma$ as in the standard definition of finite state machine and with epsilon transitions as in finite state machine with epsilon transitions, but with arbitrary regular expressions.

An accepting execution for such a generalized finite state machine is a sequence of states and regular expressions $q_0, r_1, q_1, r_2, \ldots, r_n, q_n$ with $q \in F$, and the accepted strings of that execution are all strings in the union of $L(r_1 r_2 \ldots r_n)$ over all such accepting sequences.

Note that if all regular expressions are elements of $\Sigma$, the definition reduces to the standard definition of finite state machine.

Preparing for conversion to regular expression. To convert a finite state machine into a regular expression, we first view it as a generalized finite state machine, and then eliminate states of the state machine one by one. We start the process by creating a fresh initial state $q_I$ and fresh final state $q_f$ and expressing all final states in $F$ by $\epsilon$ transitions to $q_f$. We let $\{q_f\}$ be the new set of final states. For any pair of states, we then ensure that there is exactly one edge between them:

  • if there was no edge, we introduce an edge with the label is $\emptyset$;
  • if there are multiple edges with labels $a_1,\ldots,a_n$, we introduce instead one edge whose label is the regular expression $a_1 | \ldots | a_n$.

Elimination step: We show how to eliminate a state $q$ (we assume $q$ is not initial and not a final state).

  • let the self-loop edge labelled $q$ be $r$
  • for every two states $q_1$ and $q_2$ (possibly equal), distinct from $q$:
  • suppose we have these regular expressions on edges:
    • $q_1 \mathop{\to}\limits^{r_1} q  \mathop{\to}\limits^{r_1} q_2$
  • extend label from $q_1$ to $q_2$ with $r_1 r^* r_2$

At the end, we are left with one non-empty edge from $q_i$ to $q_f$, whose label is the desired regular expression.

Some consequences

  • Given an automaton that accepts $L$, construct an automaton that accepts $L^{-1} = \{ a_1 \ldots a_n \mid a_n \ldots a_1 \in L \}$ ?
  • Given a regular expression for $L$, construct a regular expression for $\Sigma^* \setminus L$ ?
  • Given two regular expressions, how to compute their intersection?

Exercise

Consider alphabet $\Sigma = \{a,b\}$. A string $s \in \Sigma^*$ is desperate if it contains 'aaa' as a substring. Construct a regular expression that describes the set of all strings that are not desperate.

One solution: