Info Node: (guile.info)Rx Interface

Rx Interface
------------

[FIXME: this is taken from Gary and Mark's quick summaries and should be
reviewed and expanded.  Rx is pretty stable, so could already be done!]

Guile includes an interface to Tom Lord's Rx library (currently only to
POSIX regular expressions).  Using the library is a two-step process:
first compile a regular expression into an efficient structure, then
use that structure in any number of string comparisons.

For example, given the regular expression `abc.' (which matches any
string containing `abc' followed by any single character):

     guile> (define r (regcomp "abc."))
     guile> r
     #<rgx abc.>
     guile> (regexec r "abc")
     #f
     guile> (regexec r "abcd")
     #((0 . 4))
     guile>

The definitions of `regcomp' and `regexec' are as follows:

 - primitive: regcomp pattern [flags]
     Compile the regular expression PATTERN using POSIX rules.  FLAGS is
     optional and should be specified using symbolic names:

      - Variable: REG_EXTENDED
          use extended POSIX syntax

      - Variable: REG_ICASE
          use case-insensitive matching

      - Variable: REG_NEWLINE
          allow anchors to match at newline characters within the
          string, and prevent `.' and `[^...]' from matching newlines.

     The `logior' procedure can be used to combine multiple flags.  The
     default is to use POSIX basic syntax, which makes `+' and `?'
     literals and `\+' and `\?' operators.  Backslashes in PATTERN must
     be escaped when PATTERN is written as a literal string, e.g.,
     `"\\(a\\)\\?"'.
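
     As a minimal sketch of the flag handling described above (the
     variable names and patterns are illustrative only, and results are
     not shown because they have not been checked against a running
     Guile):

          ;; Extended syntax: `+' is an operator without a backslash.
          (define rx-ext (regcomp "ab+c" REG_EXTENDED))

          ;; Combine flags with `logior': extended syntax, ignoring case.
          (define rx-ext-icase
            (regcomp "ab+c" (logior REG_EXTENDED REG_ICASE)))

          ;; Basic syntax (the default): `\+' is the operator, and each
          ;; backslash is doubled inside the Scheme string literal.
          (define rx-basic (regcomp "ab\\+c"))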

 - primitive: regexec regex string [match-pick] [flags]
     Match STRING against the compiled POSIX regular expression REGEX.
     MATCH-PICK and FLAGS are optional.  Possible flags (which can be
     combined using the `logior' procedure) are:

      - Variable: REG_NOTBOL
          The beginning-of-line operator `^' does not match at the
          beginning of STRING (for use when STRING does not start at
          the beginning of a line).

      - Variable: REG_NOTEOL
          Similar to REG_NOTBOL, but prevents the end of line operator
          from matching the end of STRING.

     If no match is possible, `regexec' returns `#f'.  Otherwise
     MATCH-PICK determines the return value:

     `#t' or unspecified: a newly-allocated vector is returned,
     containing pairs with the indices of the matched part of STRING
     and any substrings.

     `""': a list is returned: the first element contains a nested list
     with the matched part of STRING surrounded by the unmatched
     parts.  Remaining elements are matched substrings (if any).  All
     returned substrings share memory with STRING.

     `#f': `regexec' returns `#t' if a match is made, otherwise `#f'.

     vector: the supplied vector is returned, with the first element
     replaced by a pair containing the indices of the matched portion
     of STRING and further elements replaced by pairs containing the
     indices of matched substrings (if any).

     list: a list is returned, with each member determined by the code
     in the corresponding position of the supplied list, e.g.,
     `(list #\< 0 1 #\>)'.  The returned substrings share memory with
     STRING.  The codes are:

          a number: the numbered matching substring (0 for the entire
          match).

          `#\<': the part of STRING from its beginning to the beginning
          of the part matched by REGEX.

          `#\>': the part of STRING from the end of the matched part to
          the end of STRING.

          `#\c': the "final tag", which seems to be associated with the
          "cut operator", which doesn't seem to be available through
          the POSIX interface.
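
     The MATCH-PICK forms above might be used as in the following
     sketch, which assumes only the behavior described in this node.
     The names `r' and `r-bol' and the sample strings are illustrative,
     and results are omitted because they have not been checked against
     a running Guile:

          (define r (regcomp "abc."))

          ;; `#f': only ask whether STRING matches at all.
          (regexec r "xxabcdyy" #f)

          ;; List of codes: text before the match, the whole match,
          ;; and text after the match.
          (regexec r "xxabcdyy" (list #\< 0 #\>))

          ;; Pass `#t' explicitly so that FLAGS can follow; here `^'
          ;; is asked not to match at the start of STRING.
          (define r-bol (regcomp "^abc"))
          (regexec r-bol "abcd" #t REG_NOTBOL)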

The following procedures may also be useful when working with regular
expressions:

 - primitive: compiled-regexp? obj
     Test whether OBJ is a compiled regular expression.

 - primitive: regexp->dfa regex [flags]

 - primitive: dfa-fork dfa

 - primitive: reset-dfa! dfa

 - primitive: dfa-final-tag dfa

 - primitive: dfa-continuable? dfa

 - primitive: advance-dfa! dfa string

