Copyright (C) 2000-2012
Rx Interface
------------

   [FIXME: this is taken from Gary and Mark's quick summaries and should
be reviewed and expanded.  Rx is pretty stable, so could already be
done!]

   Guile includes an interface to Tom Lord's Rx library (currently only
to POSIX regular expressions).  Use of the library requires a two-step
process: compile a regular expression into an efficient structure, then
use the structure in any number of string comparisons.

   For example, given the regular expression `abc.' (which matches any
string containing `abc' followed by any single character):

     guile> (define r (regcomp "abc."))
     guile> r
     #<rgx abc.>
     guile> (regexec r "abc")
     #f
     guile> (regexec r "abcd")
     #((0 . 4))
     guile>

   The definitions of `regcomp' and `regexec' are as follows:

 - primitive: regcomp pattern [flags]
     Compile the regular expression PATTERN using POSIX rules.  FLAGS
     is optional and should be specified using symbolic names:

      - Variable: REG_EXTENDED
          use extended POSIX syntax

      - Variable: REG_ICASE
          use case-insensitive matching

      - Variable: REG_NEWLINE
          allow anchors to match after newline characters in the
          string and prevent `.' or `[^...]' from matching newlines.

     The `logior' procedure can be used to combine multiple flags.  The
     default is to use POSIX basic syntax, which makes `+' and `?'
     literals and `\+' and `\?' operators.  Backslashes in PATTERN must
     be escaped if specified in a literal string, e.g.,
     `"\\(a\\)\\?"'.

 - primitive: regexec regex string [match-pick] [flags]
     Match STRING against the compiled POSIX regular expression REGEX.
     MATCH-PICK and FLAGS are optional.  Possible flags (which can be
     combined using the `logior' procedure) are:

      - Variable: REG_NOTBOL
          The beginning of line operator won't match the beginning of
          STRING (presumably because it's not the beginning of a line)

      - Variable: REG_NOTEOL
          Similar to REG_NOTBOL, but prevents the end of line operator
          from matching the end of STRING.

     If no match is possible, regexec returns #f.  Otherwise MATCH-PICK
     determines the return value:

     `#t' or unspecified: a newly-allocated vector is returned,
     containing pairs with the indices of the matched part of STRING
     and any substrings.

     `""': a list is returned: the first element contains a nested list
     with the matched part of STRING surrounded by the unmatched parts.
     Remaining elements are matched substrings (if any).  All returned
     substrings share memory with STRING.

     `#f': regexec returns #t if a match is made, otherwise #f.

     vector: the supplied vector is returned, with the first element
     replaced by a pair containing the indices of the matched portion
     of STRING and further elements replaced by pairs containing the
     indices of matched substrings (if any).

     list: a list will be returned, with each member of the list
     specified by a code in the corresponding position of the supplied
     list:

          a number: the numbered matching substring (0 for the entire
          match).

          `#\<': the beginning of STRING to the beginning of the part
          matched by regex.

          `#\>': the end of the matched part of STRING to the end of
          STRING.

          `#\c': the "final tag", which seems to be associated with the
          "cut operator", which doesn't seem to be available through
          the POSIX interface.

     e.g., `(list #\< 0 1 #\>)'.  The returned substrings share memory
     with STRING.

   Here are some other procedures that might be used when using regular
expressions:

 - primitive: compiled-regexp? obj
     Test whether obj is a compiled regular expression.

 - primitive: regexp->dfa regex [flags]

 - primitive: dfa-fork dfa

 - primitive: reset-dfa! dfa

 - primitive: dfa-final-tag dfa

 - primitive: dfa-continuable? dfa

 - primitive: advance-dfa! dfa string
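
   The two-step process and the MATCH-PICK forms of `regexec' described
above can be illustrated with a short sketch.  It assumes a Guile build
that provides the Rx primitives documented in this section; the names
`word-rx', `found?', `indices', `pieces' and `missed' are hypothetical
and chosen only for illustration.  Return values are not shown except
where the descriptions above state them.

```scheme
;; Sketch only: assumes the Rx primitives documented in this section
;; (regcomp, regexec and the REG_* variables) are available.

;; Step 1: compile once.  REG_EXTENDED makes `+' an operator and
;; REG_ICASE ignores case; logior combines the two flags.
(define word-rx (regcomp "ab+c" (logior REG_EXTENDED REG_ICASE)))

;; Step 2: reuse the compiled regexp for any number of comparisons.

;; MATCH-PICK #f: only ask whether a match exists.
(define found? (regexec word-rx "xxABBCyy" #f))

;; MATCH-PICK omitted: a vector of index pairs, with element 0
;; describing the whole match.
(define indices (regexec word-rx "xxABBCyy"))

;; MATCH-PICK as a list: text before the match, the whole match,
;; and text after the match.
(define pieces (regexec word-rx "xxABBCyy" (list #\< 0 #\>)))

;; A string with no match yields #f regardless of MATCH-PICK.
(define missed (regexec word-rx "xyz"))
```

   Going by the descriptions above, `found?' should be #t and `missed'
should be #f.  Compiling the pattern once and reusing `word-rx' is the
point of the two-step design: the cost of compilation is paid a single
time, however many strings are matched afterwards.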