GNU Info

Info Node: (elisp)Regexp Example

(elisp)Regexp Example


Prev: Regexp Functions Up: Regular Expressions
Enter node , (file) or (file)node

Complex Regexp Example
----------------------

   Here is a complicated regexp, used by Emacs to recognize the end of a
sentence together with any whitespace that follows.  It is the value of
the variable `sentence-end'.

   First, we show the regexp as a string in Lisp syntax to distinguish
spaces from tab characters.  The string constant begins and ends with a
double-quote.  `\"' stands for a double-quote as part of the string,
`\\' for a backslash as part of the string, `\t' for a tab and `\n' for
a newline.

     "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"

In contrast, if you evaluate the variable `sentence-end', you will see
the following:

     sentence-end
          => "[.?!][]\"')}]*\\($\\| $\\|  \\|  \\)[
     ]*"

In this output, tab and newline appear as themselves.

   This regular expression contains four parts in succession and can be
deciphered as follows:

`[.?!]'
     The first part of the pattern is a character alternative that
     matches any one of three characters: period, question mark, and
     exclamation mark.  The match must begin with one of these three
     characters.

`[]\"')}]*'
     The second part of the pattern matches any closing braces and
     quotation marks, zero or more of them, that may follow the period,
     question mark or exclamation mark.  The `\"' is Lisp syntax for a
     double-quote in a string.  The `*' at the end indicates that the
     immediately preceding regular expression (a character alternative,
     in this case) may be repeated zero or more times.

`\\($\\| $\\|\t\\|  \\)'
     The third part of the pattern matches the whitespace that follows
     the end of a sentence: the end of a line (optionally with a
     space), or a tab, or two spaces.  The double backslashes mark the
     parentheses and vertical bars as regular expression syntax; the
     parentheses delimit a group and the vertical bars separate
     alternatives.  The dollar sign is used to match the end of a line.

`[ \t\n]*'
     Finally, the last part of the pattern matches any additional
     whitespace beyond the minimum needed to end a sentence.


automatically generated by info2www version 1.2.2.9