# Regular expressions

Regular expressions are expressions denoting certain languages. They are precisely those languages that can be built from the singleton languages , and for each , using operations union ( ), concatenation ( ), and iteration ( ) on languages.

In regular expressions, is typically denoted (sometimes ), and the curly braces around singleton sets are omitted, so the language for is denoted simply and the language by . The concatenation operator is simply omitted. We next make this more precise. We write to denote the set that denotes the meaning of a regular expression.

A regular expression over alphabet is given by:

• is a regular expression, • is a regular expression • if , then is a regular expression • if and are regular expressions, so are , , and ; we define , , • every regular expression is obtained by a finite number of applications of the rules above.

Example: let . The regular expression

(bar)*(a+b)*

denotes the language .

Example: Integer constants (such as 3123, 9123, 1389123) that do not begin with digit zero are given by (using '|' notation instead of '+'):

(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*

Example: let (be*r + coke)*

denotes the set of strings that include:

br, ber, beer, beeer, ... , coke, brcoke, ... , berbrcokecokebeeer

Notes:

• we can introduce shorthand r+ to stand for rr*
• we can allow non-recursive shorthands, for example
posDigit = 1|2|3|4|5|6|7|8|9
digit = 0|posDigit
intConst = posDigit digit*