GNU Info

Info Node: (guile.info)Backslash Escapes

(guile.info)Backslash Escapes


Next: Rx Interface Prev: Match Structures Up: Regular Expressions
Enter node , (file) or (file)node

Backslash Escapes
-----------------

Sometimes you will want a regexp to match characters like `*' or `$'
exactly.  For example, to check whether a particular string represents
a menu entry from an Info node, it would be useful to match it against
a regexp like `^* [^:]*::'.  However, this won't work; because the
asterisk is a metacharacter, it won't match the `*' at the beginning of
the string.  In this case, we want to make the first asterisk un-magic.

You can do this by preceding the metacharacter with a backslash
character `\'.  (This is also called "quoting" the metacharacter, and
is known as a "backslash escape".)  When Guile sees a backslash in a
regular expression, it considers the following glyph to be an ordinary
character, no matter what special meaning it would ordinarily have.
Therefore, we can make the above example work by changing the regexp to
`^\* [^:]*::'.  The `\*' sequence tells the regular expression engine
to match only a single asterisk in the target string.

Since the backslash is itself a metacharacter, you may force a regexp to
match a backslash in the target string by preceding the backslash with
itself.  For example, to find variable references in a TeX program, you
might want to find occurrences of the string `\let\' followed by any
number of alphabetic characters.  The regular expression
`\\let\\[A-Za-z]*' would do this: the double backslashes in the regexp
each match a single backslash in the target string.

 - procedure: regexp-quote str
     Quote each special character found in STR with a backslash, and
     return the resulting string.

*Very important:* Using backslash escapes in Guile source code (as in
Emacs Lisp or C) can be tricky, because the backslash character has
special meaning for the Guile reader.  For example, if Guile encounters
the character sequence `\n' in the middle of a string while processing
Scheme code, it replaces those characters with a newline character.
Similarly, the character sequence `\t' is replaced by a horizontal tab.
Several of these "escape sequences" are processed by the Guile reader
before your code is executed.  Unrecognized escape sequences are
ignored: if the characters `\*' appear in a string, they will be
translated to the single character `*'.

This translation is obviously undesirable for regular expressions, since
we want to be able to include backslashes in a string in order to
escape regexp metacharacters.  Therefore, to make sure that a backslash
is preserved in a string in your Guile program, you must use _two_
consecutive backslashes:

     (define Info-menu-entry-pattern (make-regexp "^\\* [^:]*"))

The string in this example is preprocessed by the Guile reader before
any code is executed.  The resulting argument to `make-regexp' is the
string `^\* [^:]*', which is what we really want.

This also means that in order to write a regular expression that matches
a single backslash character, the regular expression string in the
source code must include _four_ backslashes.  Each consecutive pair of
backslashes gets translated by the Guile reader to a single backslash,
and the resulting double-backslash is interpreted by the regexp engine
as matching a single backslash character.  Hence:

     (define tex-variable-pattern (make-regexp "\\\\let\\\\=[A-Za-z]*"))

The reason for the unwieldiness of this syntax is historical.  Both
regular expression pattern matchers and Unix string processing systems
have traditionally used backslashes with the special meanings described
above.  The POSIX regular expression specification and ANSI C standard
both require these semantics.  Attempting to abandon either convention
would cause other kinds of compatibility problems, possibly more severe
ones.  Therefore, without extending the Scheme reader to support
strings with different quoting conventions (an ungainly and confusing
extension when implemented in other languages), we must adhere to this
cumbersome escape syntax.


automatically generated by info2www version 1.2.2.9