Description of MySQL regular expression syntax
**********************************************
A regular expression (regex) is a powerful way of specifying a complex
search.
MySQL uses Henry Spencer's implementation of regular expressions, which
is aimed at conformance with POSIX 1003.2. MySQL uses the extended
version.
This is a simplistic reference that skips the details. To get more exact
information, see Henry Spencer's `regex(7)' manual page that is
included in the source distribution. Note:Credits.
A regular expression describes a set of strings. The simplest regexp is
one that has no special characters in it. For example, the regexp
`hello' matches `hello' and nothing else.
Non-trivial regular expressions use certain special constructs so that
they can match more than one string. For example, the regexp
`hello|word' matches either the string `hello' or the string `word'.
As a more complex example, the regexp `B[an]*s' matches any of the
strings `Bananas', `Baaaaas', `Bs', and any other string starting with
a `B', ending with an `s', and containing any number of `a' or `n'
characters in between.
A regular expression may use any of the following special
characters/constructs:
`^'
Match the beginning of a string.
mysql> select "fo\nfo" REGEXP "^fo$"; -> 0
mysql> select "fofo" REGEXP "^fo"; -> 1
`$'
Match the end of a string.
mysql> select "fo\no" REGEXP "^fo\no$"; -> 1
mysql> select "fo\no" REGEXP "^fo$"; -> 0
`.'
Match any character (including newline).
mysql> select "fofo" REGEXP "^f.*"; -> 1
mysql> select "fo\nfo" REGEXP "^f.*"; -> 1
`a*'
Match any sequence of zero or more `a' characters.
mysql> select "Ban" REGEXP "^Ba*n"; -> 1
mysql> select "Baaan" REGEXP "^Ba*n"; -> 1
mysql> select "Bn" REGEXP "^Ba*n"; -> 1
`a+'
Match any sequence of one or more `a' characters.
mysql> select "Ban" REGEXP "^Ba+n"; -> 1
mysql> select "Bn" REGEXP "^Ba+n"; -> 0
`a?'
Match either zero or one `a' character.
mysql> select "Bn" REGEXP "^Ba?n"; -> 1
mysql> select "Ban" REGEXP "^Ba?n"; -> 1
mysql> select "Baan" REGEXP "^Ba?n"; -> 0
`de|abc'
Match either of the sequences `de' or `abc'.
mysql> select "pi" REGEXP "pi|apa"; -> 1
mysql> select "axe" REGEXP "pi|apa"; -> 0
mysql> select "apa" REGEXP "pi|apa"; -> 1
mysql> select "apa" REGEXP "^(pi|apa)$"; -> 1
mysql> select "pi" REGEXP "^(pi|apa)$"; -> 1
mysql> select "pix" REGEXP "^(pi|apa)$"; -> 0
`(abc)*'
Match zero or more instances of the sequence `abc'.
mysql> select "pi" REGEXP "^(pi)*$"; -> 1
mysql> select "pip" REGEXP "^(pi)*$"; -> 0
mysql> select "pipi" REGEXP "^(pi)*$"; -> 1
`{1}'
`{2,3}'
The is a more general way of writing regexps that match many
occurrences of the previous atom.
`a*'
Can be written as `a{0,}'.
`a+'
Can be written as `a{1,}'.
`a?'
Can be written as `a{0,1}'. To be more precise, an atom
followed by a bound containing one integer `i' and no comma
matches a sequence of exactly `i' matches of the atom. An atom
followed by a bound containing one integer `i' and a comma matches
a sequence of `i' or more matches of the atom. An atom followed
by a bound containing two integers `i' and `j' matches a sequence
of `i' through `j' (inclusive) matches of the atom.
Both arguments must be in the range from `0' to `RE_DUP_MAX'
(default 255), inclusive. If there are two arguments, the second
must be greater than or equal to the first.
`[a-dX]'
`[^a-dX]'
Matches any character which is (or is not, if ^ is used) either
`a', `b', `c', `d' or `X'. To include a literal `]' character, it
must immediately follow the opening bracket `['. To include a
literal `-' character, it must be written first or last. So
`[0-9]' matches any decimal digit. Any character that does not have
a defined meaning inside a `[]' pair has no special meaning and
matches only itself.
mysql> select "aXbc" REGEXP "[a-dXYZ]"; -> 1
mysql> select "aXbc" REGEXP "^[a-dXYZ]$"; -> 0
mysql> select "aXbc" REGEXP "^[a-dXYZ]+$"; -> 1
mysql> select "aXbc" REGEXP "^[^a-dXYZ]+$"; -> 0
mysql> select "gheis" REGEXP "^[^a-dXYZ]+$"; -> 1
mysql> select "gheisa" REGEXP "^[^a-dXYZ]+$"; -> 0
`[[.characters.]]'
The sequence of characters of that collating element. The sequence
is a single element of the bracket expression's list. A bracket
expression containing a multi-character collating element can thus
match more than one character, for example, if the collating
sequence includes a `ch' collating element, then the regular
expression `[[.ch.]]*c' matches the first five characters of
`chchcc'.
`[=character_class=]'
An equivalence class, standing for the sequences of characters of
all collating elements equivalent to that one, including itself.
For example, if `o' and `(+)' are the members of an equivalence
class, then `[[=o=]]', `[[=(+)=]]', and `[o(+)]' are all
synonymous. An equivalence class may not be an endpoint of a range.
`[:character_class:]'
Within a bracket expression, the name of a character class
enclosed in `[:' and `:]' stands for the list of all characters
belonging to that class. Standard character class names are:
alnum digit punct
alpha graph space
blank lower upper
cntrl print xdigit
These stand for the character classes defined in the `ctype(3)'
manual page. A locale may provide others. A character class may
not be used as an endpoint of a range.
mysql> select "justalnums" REGEXP "[[:alnum:]]+"; -> 1
mysql> select "!!" REGEXP "[[:alnum:]]+"; -> 0
`[[:<:]]'
`[[:>:]]'
These match the null string at the beginning and end of a word
respectively. A word is defined as a sequence of word characters
which is neither preceded nor followed by word characters. A word
character is an alnum character (as defined by `ctype(3)') or an
underscore (`_').
mysql> select "a word a" REGEXP "[[:<:]]word[[:>:]]"; -> 1
mysql> select "a xword a" REGEXP "[[:<:]]word[[:>:]]"; -> 0
mysql> select "weeknights" REGEXP "^(wee|week)(knights|nights)$"; -> 1