Strings and languages

We next formalize mathematically the notion of a string and a set of strings.

Alphabet is a finite set of elements. (It represents characters of a string.) Let's denote it $\Sigma$.

A string is a potentially empty, finite sequence of alphabet elements. Often strings are called “words” in automata theory terminology. We use $\epsilon$ to denote an empty string. We denote the set of all strings by $\Sigma^*$. We can represent strings of length $n$ by $n$-tuples, so we can define

  \Sigma^* = \{\epsilon\} \cup \Sigma \cup \Sigma^2 \cup \ldots

We use centered dot for string concatenation, as in $s_1 \cdot s_2$, and we sometimes omit it, as in $s_1 s_2$ (programming language Objective Caml uses ^ to denote string concatenation; other languages often use + as an overloaded operator).

The concatentation is an operation $\Sigma^* \times \Sigma^* \to \Sigma^*$. It is associative, and $\epsilon$ is left and right neutral element:

  s_1 \cdot (s_2 \cdot s_3) = (s_1 \cdot s_2) \cdot s_3 \\
  \epsilon \cdot s = s \cdot \epsilon = s

Therefore, $(\Sigma^*, {\ \cdot\ }, \epsilon)$ is a monoid.

If $w$ is a word, we define $w^0 = \epsilon$ and $w^{n+1} = w \cdot w^n$.

A language is any set of strings, that is, a set $L \subseteq \Sigma^*$.

Languages are sets, so operations $\cap, \cup, \setminus$ all apply to them.

We define concatenation and iteration of languages by

  L_1 \cdot L_2  &=& \{ s_1 \cdot s_2 \mid s_1 \in L_1 \land s_2 \in L_2 \} \\
  L^0 &=& \{ \epsilon \} \\
  L^{n+1} &=& L \cdot L^n \\
  L^* &=& \bigcup_{n \geq 0} L^n = \{ w_1 \ldots w_n \mid w_1,\ldots,w_n \in L \}

Simple Consequences

Observe that

   \emptyset \cdot L = \{ s_1 \cdot s_2 \mid s_1 \in \emptyset \land s_2 \in L \} = \{s_1 \cdot s_2 \mid \mbox{false} \} = \emptyset

Similarly, $L \cdot \emptyset = \emptyset$.

Also directly from definition follows:

  \{ w_1 \} \cdot \{ w_2 \} = \{ w_1\cdot w_2 \}