LARA

Limitations of Regular Expressions

Take a deterministic finite state machine $M$ with $n$ states.

Suppose that a word of $w$ with $|w| > n$ is accepted by the machine. This means that the machine went through some state twice, say after reading $i$ and then after reading $j$ symbols of the word.

Then for some words $p,q,r$ with $|p|=i$ and $|pq|=j$ we have $w = pqr$ where the state after reading $p$ and state after reading $pq$ are the same. Then it means that the machine is in the same state after reading $pq^k$ for any $k \ge 0$, so all words of the form $pq^k r$ are accepted by the machine.

This is a rough statement of the so called “pumping lemma”.

Example of application: The language $ L = \{ a^l b^l \mid l \ge 0\}$ is not regular. Suppose the contrary. Then there is a machine $M$ that accepts the language; let $n$ be the number of states of $M$. The word $a^{n+1} b^{n+1}$ is accepted by $M$. While reading the prefix $a^{n+1}$ the machine must go through some state twice. If we skip this loop, then there is a word of the form $a^t b^{n+1}$ for $t < n+1$ that is also accepted, which is contradiction with the condition that $M$ accepts $L$.