Complex Regexp Example
----------------------
Here is a complicated regexp, used by Emacs to recognize the end of a
sentence together with any whitespace that follows. It is the value of
the variable `sentence-end'.
First, we show the regexp as a string in Lisp syntax to distinguish
spaces from tab characters. The string constant begins and ends with a
double-quote. `\"' stands for a double-quote as part of the string,
`\\' for a backslash as part of the string, `\t' for a tab and `\n' for
a newline.
"[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
In contrast, if you evaluate the variable `sentence-end', you will see
the following:
sentence-end
=> "[.?!][]\"')}]*\\($\\| $\\|	\\|  \\)[ 	
]*"
In this output, tab and newline appear as themselves.
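One way to check the difference between read syntax and the characters actually stored is to evaluate a few forms, for instance in `*scratch*' or with `emacs --batch'. This is a minimal sketch that uses only standard functions (`length', `prin1-to-string') and the standard variable `print-escape-newlines'; the sample strings are arbitrary.

```elisp
;; Each escape sequence in the read syntax stands for one character.
(length "\\")    ; => 1, a single backslash
(length "\t")    ; => 1, a single TAB
(length "\"")    ; => 1, a single double-quote

;; The third part of the regexp is 16 characters long once read,
;; even though its read syntax is longer.
(length "\\($\\| $\\|\t\\|  \\)")   ; => 16

;; By default, `prin1' escapes backslashes and double-quotes but
;; prints TAB and newline as themselves, which is why the value of
;; `sentence-end' shows a literal tab and a line break.
(length (prin1-to-string "\t"))     ; => 3: quote, TAB, quote

;; With `print-escape-newlines' non-nil, newline prints as \n instead.
(let ((print-escape-newlines t))
  (prin1-to-string "\n"))           ; => "\"\\n\""
```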
This regular expression contains four parts in succession and can be
deciphered as follows:
`[.?!]'
The first part of the pattern is a character alternative that
matches any one of three characters: period, question mark, and
exclamation mark. The match must begin with one of these three
characters.
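Tried on its own, the first part finds the first period, question mark, or exclamation mark and consumes exactly one character. Inside a character alternative, `.' and `?' are ordinary characters, so they need no backslashes. The following is a minimal sketch using `string-match'; the sample strings are arbitrary.

```elisp
;; Find the first sentence-ending punctuation character.
(let ((text "Really? Yes."))
  (when (string-match "[.?!]" text)
    (list (match-beginning 0) (match-string 0 text))))
;; => (6 "?")

;; A character alternative matches a single character, so in
;; "Wait?!" the match covers only the "?" at index 4.
(let ((text "Wait?!"))
  (when (string-match "[.?!]" text)
    (list (match-beginning 0) (match-end 0))))
;; => (4 5)
```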
`[]\"')}]*'
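The second part of the pattern matches any closing brackets,
parentheses, braces, and quotation marks, zero or more of them, that
may follow the period, question mark, or exclamation mark. The `]'
comes first so that it stands for itself rather than ending the
character alternative. The `\"' is Lisp syntax for a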
double-quote in a string. The `*' at the end indicates that the
immediately preceding regular expression (a character alternative,
in this case) may be repeated zero or more times.
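Matching the first two parts together shows how the `*' lets the match absorb a run of closing delimiters after the punctuation. This is a sketch only; the helper name `my-closing-after-punct' is hypothetical and not part of Emacs.

```elisp
(defun my-closing-after-punct (text)
  "Return the part of TEXT matched by the first two parts of the regexp.
Hypothetical helper for illustration only."
  (when (string-match "[.?!][]\"')}]*" text)
    (match-string 0 text)))

(my-closing-after-punct "He said \"Stop!\")")  ; => "!\")"
(my-closing-after-punct "(See above.)")        ; => ".)"
(my-closing-after-punct "Done.")               ; => "." (zero closers)
```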
`\\($\\| $\\|\t\\|  \\)'
The third part of the pattern matches the whitespace that follows
the end of a sentence: the end of a line (optionally preceded by a
space), or a tab, or two spaces. The double backslashes mark the
parentheses and vertical bars as regular expression syntax; the
parentheses delimit a group and the vertical bars separate
alternatives. The dollar sign is used to match the end of a line.
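To see which whitespace the third part accepts, the group can be matched on its own at the start of a string. When matching a string rather than a buffer, `$' matches at the end of the string or just before a newline. The helper `my-sentence-gap-p' is hypothetical, and the inputs are arbitrary examples.

```elisp
(defun my-sentence-gap-p (text)
  "Return t if TEXT begins with whitespace accepted by the third part.
Hypothetical helper for illustration only."
  ;; `string-match' searches forward, so require the match to start
  ;; at index 0; otherwise `$' could match at the end of TEXT.
  (eq 0 (string-match "\\($\\| $\\|\t\\|  \\)" text)))

(my-sentence-gap-p "  Next")    ; => t, two spaces
(my-sentence-gap-p "\tNext")    ; => t, a tab
(my-sentence-gap-p "\nNext")    ; => t, end of line
(my-sentence-gap-p " \nNext")   ; => t, a space, then end of line
(my-sentence-gap-p " Next")     ; => nil, one space is not enough
```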
`[ \t\n]*'
Finally, the last part of the pattern matches any additional
whitespace beyond the minimum needed to end a sentence.
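Putting the four parts together, a sketch like the one below collects the position where each match ends in a sample text. It copies the regexp into a hypothetical constant `my-sentence-end-re' instead of reading `sentence-end', whose value may differ in your version of Emacs; `my-sentence-end-positions' is likewise a hypothetical helper.

```elisp
;; Hypothetical copy of the regexp discussed in this section.
(defconst my-sentence-end-re
  "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
  "Regexp matching a sentence end plus the whitespace after it.")

(defun my-sentence-end-positions (text)
  "Return a list of positions in TEXT where a sentence-end match ends.
Each position is just past the trailing whitespace.  Hypothetical helper."
  (let ((start 0)
        (ends '()))
    ;; Every match consumes at least the punctuation character,
    ;; so START always advances and the loop terminates.
    (while (string-match my-sentence-end-re text start)
      (setq start (match-end 0))
      (push start ends))
    (nreverse ends)))

(my-sentence-end-positions "One.  Two?\tThree!\n  Four. e.g. x")
;; => (6 11 20)
```

Each position is where the following sentence begins. The text after `Four.' produces no match: a period followed by a single space and more text satisfies none of the alternatives in the third part, which is how this regexp keeps abbreviations such as `e.g.' from being taken as sentence ends.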