Strings and languages

We next formalize mathematically the notion of a string and a set of strings.

Alphabet is a finite set of elements. (It represents characters of a string.) Let's denote it $\Sigma$ .

A string is a potentially empty, finite sequence of alphabet elements. Often strings are called “words” in automata theory terminology. We use $\epsilon$ to denote an empty string. We denote the set of all strings by $\Sigma^*$ . We can represent strings of length $n$ by $n$ -tuples, so we can define

$\begin{equation*} \Sigma^* = \{\epsilon\} \cup \Sigma \cup \Sigma^2 \cup \ldots \end{equation*}$

We use centered dot for string concatenation, as in $s_1 \cdot s_2$ , and we sometimes omit it, as in $s_1 s_2$ (programming language Objective Caml uses ^ to denote string concatenation; other languages often use + as an overloaded operator).

The concatentation is an operation $\Sigma^* \times \Sigma^* \to \Sigma^*$ . It is associative, and $\epsilon$ is left and right neutral element:

$\begin{eqnarray*} s_1 \cdot (s_2 \cdot s_3) = (s_1 \cdot s_2) \cdot s_3 \\ \epsilon \cdot s = s \cdot \epsilon = s \end{eqnarray*}$

Therefore, $(\Sigma^*, {\ \cdot\ }, \epsilon)$ is a monoid.

If $w$ is a word, we define $w^0 = \epsilon$ and $w^{n+1} = w \cdot w^n$ .

A language is any set of strings, that is, a set $L \subseteq \Sigma^*$ .

Languages are sets, so operations $\cap, \cup, \setminus$ all apply to them.

We define concatenation and iteration of languages by

$\begin{eqnarray*} L_1 \cdot L_2 &=& \{ s_1 \cdot s_2 \mid s_1 \in L_1 \land s_2 \in L_2 \} \\ L^0 &=& \{ \epsilon \} \\ L^{n+1} &=& L \cdot L^n \\ L^* &=& \bigcup_{n \geq 0} L^n = \{ w_1 \ldots w_n \mid w_1,\ldots,w_n \in L \} \end{eqnarray*}$

Simple Consequences

Observe that

$\begin{equation*} \emptyset \cdot L = \{ s_1 \cdot s_2 \mid s_1 \in \emptyset \land s_2 \in L \} = \{s_1 \cdot s_2 \mid \mbox{false} \} = \emptyset \end{equation*}$

Similarly, $L \cdot \emptyset = \emptyset$ .

Also directly from definition follows:

$\begin{equation*} \{ w_1 \} \cdot \{ w_2 \} = \{ w_1\cdot w_2 \} \end{equation*}$