Regular expressions
Regular expressions are expressions denoting certain languages. They are precisely those languages that can be built from the singleton languages , and
for each
, using operations union (
), concatenation (
), and iteration (
) on languages.
In regular expressions, is typically denoted
(sometimes
), and the curly braces around singleton sets are omitted, so the language
for
is denoted simply
and the language
by
. The concatenation operator is simply omitted. We next make this more precise. We write
to denote the set that denotes the meaning of a regular expression.
A regular expression over alphabet is given by:
is a regular expression,
is a regular expression
- if
, then
is a regular expression
- if
and
are regular expressions, so are
,
, and
; we define
,
,
- every regular expression is obtained by a finite number of applications of the rules above.
Example: let . The regular expression
(bar)*(a+b)*
denotes the language .
Example: Integer constants (such as 3123, 9123, 1389123) that do not begin with digit zero are given by (using '|' notation instead of '+'):
(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
Example: let
(be*r + coke)*
denotes the set of strings that include:
br, ber, beer, beeer, ... , coke, brcoke, ... , berbrcokecokebeeer
Notes:
- we can introduce shorthand r+ to stand for rr*
- we can allow non-recursive shorthands, for example
posDigit = 1|2|3|4|5|6|7|8|9 digit = 0|posDigit intConst = posDigit digit*