Regular expressions
Regular expressions are expressions denoting certain languages. They are precisely those languages that can be built from the singleton languages , and for each , using operations union (), concatenation (), and iteration () on languages.
In regular expressions, is typically denoted (sometimes ), and the curly braces around singleton sets are omitted, so the language for is denoted simply and the language by . The concatenation operator is simply omitted. We next make this more precise. We write to denote the set that denotes the meaning of a regular expression.
A regular expression over alphabet is given by:
- is a regular expression,
- is a regular expression
- if , then is a regular expression
- if and are regular expressions, so are , , and ; we define , ,
- every regular expression is obtained by a finite number of applications of the rules above.
Example: let . The regular expression
(bar)*(a+b)*
denotes the language .
Example: Integer constants (such as 3123, 9123, 1389123) that do not begin with digit zero are given by (using '|' notation instead of '+'):
(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
Example: let
(be*r + coke)*
denotes the set of strings that include:
br, ber, beer, beeer, ... , coke, brcoke, ... , berbrcokecokebeeer
Notes:
- we can introduce shorthand r+ to stand for rr*
- we can allow non-recursive shorthands, for example
posDigit = 1|2|3|4|5|6|7|8|9 digit = 0|posDigit intConst = posDigit digit*