Contents of Module re

Module Contents

The module defines the following functions and constants, and an exception:

`compile(pattern[, flags])'
     Compile a regular expression pattern into a regular expression
     object, which can be used for matching using its `match()' and
     `search()' methods, described below.

     The expression's behaviour can be modified by specifying a FLAGS
     value.  Values can be any of the following variables, combined
     using bitwise OR (the `|' operator).

     The sequence

          prog = re.compile(pat)
          result = prog.match(str)

     is equivalent to

          result = re.match(pat, str)

     but the version using `compile()' is more efficient when the
     expression will be used several times in a single program.
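
     The equivalence above can be sketched as follows.  This is only an
     illustration, assuming a current Python 3 interpreter (newer than
     the release this page describes); the sample list `lines' and the
     pattern are hypothetical.

```python
import re

# Hypothetical sample data, chosen only for illustration.
lines = ["alpha 1", "beta", "gamma 22"]

# Compile once, then reuse the pattern object for every line.
prog = re.compile(r"[a-z]+ \d+")
compiled_hits = [line for line in lines if prog.match(line)]

# The module-level function gives the same answers, but the pattern
# string has to be looked up or compiled again on each call.
direct_hits = [line for line in lines if re.match(r"[a-z]+ \d+", line)]

print(compiled_hits == direct_hits)
```

     Both lists hold the same lines; the compiled form simply avoids
     repeating the pattern string at every call site.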


`I'
`IGNORECASE'
     Perform case-insensitive matching; expressions like "[A-Z]" will
     match lowercase letters, too.  This is not affected by the current
     locale.


`L'
`LOCALE'
     Make "\w", "\W", "\b", and "\B" dependent on the current locale.


`M'
`MULTILINE'
     When specified, the pattern character `^' matches at the beginning
     of the string and at the beginning of each line (immediately
     following each newline); and the pattern character `$' matches at
     the end of the string and at the end of each line (immediately
     preceding each newline).  By default, `^' matches only at the
     beginning of the string, and `$' only at the end of the string and
     immediately before the newline (if any) at the end of the string.


`S'
`DOTALL'
     Make the `.' special character match any character at all,
     including a newline; without this flag, `.' will match anything
     _except_ a newline.


`U'
`UNICODE'
     Make "\w", "\W", "\b", and "\B" dependent on the Unicode character
     properties database.  _Added in Python version 2.0_


`X'
`VERBOSE'
     This flag allows you to write regular expressions that look nicer.
     Whitespace within the pattern is ignored, except when in a
     character class or preceded by an unescaped backslash, and, when a
     line contains a `#' neither in a character class nor preceded by an
     unescaped backslash, all characters from the leftmost such `#'
     through the end of the line are ignored.
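
     The flag descriptions above can be tried out with a short sketch.
     It assumes a current Python 3 interpreter and uses the short
     aliases `re.I', `re.M', `re.S' and `re.X'; the sample text and
     variable names are hypothetical.

```python
import re

text = "first line\nSECOND line\nthird"

# Case-insensitive and multi-line matching combined with bitwise OR:
# '^' now also matches right after each newline, and case is ignored.
line_start = re.compile(r"^[a-z]+", re.I | re.M)
starts = line_start.findall(text)

# Without the dot-matches-all flag '.' stops at a newline;
# with it, '.' runs across line boundaries.
no_dotall = re.match(r".+", text).group(0)
with_dotall = re.match(r".+", text, re.S).group(0)

# The verbose flag lets the pattern carry whitespace and comments.
number = re.compile(r"""
    \d+      # integer part
    \.       # literal decimal point
    \d*      # optional fraction digits
    """, re.X)
verbose_hit = number.search("pi is about 3.14").group(0)

print(starts, repr(no_dotall), verbose_hit)
```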

`search(pattern, string[, flags])'
     Scan through STRING looking for a location where the regular
     expression PATTERN produces a match, and return a corresponding
     `MatchObject' instance.  Return `None' if no position in the
     string matches the pattern; note that this is different from
     finding a zero-length match at some point in the string.
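
     The difference between `None' and a zero-length match can be seen
     in a small sketch, assuming a current Python 3 interpreter; the
     patterns and variable names are illustrative only.

```python
import re

# 'x*' can match the empty string, so search() succeeds at position 0
# with a zero-length match instead of returning None.
empty = re.search(r"x*", "abc")

# 'x+' needs at least one 'x', so no position matches and the
# result is None.
missing = re.search(r"x+", "abc")

print(empty is not None, missing is None)
```

     Testing the result with `is None' keeps the two cases apart; a
     zero-length match is still a match object.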

`match(pattern, string[, flags])'
     If zero or more characters at the beginning of STRING match the
     regular expression PATTERN, return a corresponding `MatchObject'
     instance.  Return `None' if the string does not match the pattern;
     note that this is different from a zero-length match.

     *Note:*  If you want to locate a match anywhere in STRING, use
     `search()' instead.
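
     A brief sketch of the note above, assuming a current Python 3
     interpreter; the sample string is hypothetical.

```python
import re

text = "say hello"

# match() only tries the beginning of the string, so this fails.
at_start = re.match(r"hello", text)

# search() scans forward and finds the word further along.
anywhere = re.search(r"hello", text)

print(at_start, anywhere.start())
```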

`split(pattern, string[, maxsplit = 0])'
     Split STRING by the occurrences of PATTERN.  If capturing
     parentheses are used in PATTERN, then the text of all groups in
     the pattern is also returned as part of the resulting list.  If
     MAXSPLIT is nonzero, at most MAXSPLIT splits occur, and the
     remainder of the string is returned as the final element of the
     list.  (Incompatibility note: in the original Python 1.5 release,
     MAXSPLIT was ignored.  This has been fixed in later releases.)

          >>> re.split('\W+', 'Words, words, words.')
          ['Words', 'words', 'words', '']
          >>> re.split('(\W+)', 'Words, words, words.')
          ['Words', ', ', 'words', ', ', 'words', '.', '']
          >>> re.split('\W+', 'Words, words, words.', 1)
          ['Words', 'words, words.']

     This function combines and extends the functionality of the old
     `regsub.split()' and `regsub.splitx()'.

`findall(pattern, string)'
     Return a list of all non-overlapping matches of PATTERN in STRING.
     If one or more groups are present in the pattern, return a list
     of groups; this will be a list of tuples if the pattern has more
     than one group.  Empty matches are included in the result.  _Added
     in Python version 1.5.2_
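
     How the shape of the result depends on the number of groups can be
     sketched as follows, assuming a current Python 3 interpreter; the
     sample string and variable names are hypothetical.

```python
import re

text = "x=1, y=22, z=333"

# No groups: a list of the whole matches.
no_groups = re.findall(r"\d+", text)

# One group: a list of that group's text.
one_group = re.findall(r"(\w)=\d+", text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w)=(\d+)", text)

print(no_groups, one_group, two_groups)
```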

`sub(pattern, repl, string[, count = 0])'
     Return the string obtained by replacing the leftmost
     non-overlapping occurrences of PATTERN in STRING by the replacement
     REPL.  If the pattern isn't found, STRING is returned unchanged.
     REPL can be a string or a function; if a function, it is called
     for every non-overlapping occurrence of PATTERN.  The function
     takes a single match object argument, and returns the replacement
     string.  For example:

          >>> def dashrepl(matchobj):
          ...     if matchobj.group(0) == '-': return ' '
          ...     else: return '-'
          >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
          'pro--gram files'

     The pattern may be a string or an RE object; if you need to specify
     regular expression flags, you must use a RE object, or use
     embedded modifiers in a pattern; e.g.  `sub("(?i)b+", "x", "bbbb
     BBBB")' returns `'x x''.

     The optional argument COUNT is the maximum number of pattern
     occurrences to be replaced; COUNT must be a non-negative integer,
     and the default value of 0 means to replace all occurrences.

     Empty matches for the pattern are replaced only when not adjacent
     to a previous match, so `sub('x*', '-', 'abc')' returns
     `'-a-b-c-''.

     If REPL is a string, any backslash escapes in it are processed.
     That is, `\n' is converted to a single newline character, `\r' is
     converted to a carriage return, and so forth.  Unknown escapes such as
     `\j' are left alone.  Backreferences, such as `\6', are replaced
     with the substring matched by group 6 in the pattern.

     In addition to character escapes and backreferences as described
     above, `\g<name>' will use the substring matched by the group
     named `name', as defined by the "(?P<name>...)" syntax.
     `\g<number>' uses the corresponding group number; `\g<2>' is
     therefore equivalent to `\2', but isn't ambiguous in a replacement
     such as `\g<2>0'.  `\20' would be interpreted as a reference to
     group 20, not a reference to group 2 followed by the literal
     character `0'.
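
     The replacement syntax described above can be sketched as follows,
     assuming a current Python 3 interpreter; the group names `first'
     and `second' and the sample strings are hypothetical.

```python
import re

# Named groups are referenced in the replacement with \g<name>.
swapped = re.sub(r"(?P<first>\w+) (?P<second>\w+)",
                 r"\g<second> \g<first>",
                 "hello world")

# \g<1>0 means "group 1, then a literal 0".  Writing \10 instead
# would be read as a reference to group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")

print(swapped, padded)
```

     Raw string literals keep the backslashes intact so that they reach
     the replacement processing unchanged.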

`subn(pattern, repl, string[, count = 0])'
     Perform the same operation as `sub()', but return a tuple
     `(NEW_STRING, NUMBER_OF_SUBS_MADE)'.
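
     A short sketch of the returned pair, assuming a current Python 3
     interpreter (where COUNT is best passed by keyword); the sample
     string and variable names are hypothetical.

```python
import re

# Replace every '-' and report how many substitutions were made.
new_string, made = re.subn(r"-", "+", "a-b-c")

# With count=1 only the leftmost occurrence is replaced.
limited = re.subn(r"-", "+", "a-b-c", count=1)

print(new_string, made, limited)
```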

`escape(string)'
     Return STRING with all non-alphanumerics backslashed; this is
     useful if you want to match an arbitrary literal string that may
     have regular expression metacharacters in it.
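
     Why escaping matters can be sketched as follows, assuming a
     current Python 3 interpreter (which may backslash fewer characters
     than described above, with the same matching result); the sample
     strings are hypothetical.

```python
import re

literal = "1+1=2 (really?)"
haystack = "he said 1+1=2 (really?) aloud"

# Escaped, the metacharacters '+', '(', ')' and '?' match themselves.
found = re.search(re.escape(literal), haystack)

# Unescaped, the same text is read as a pattern ('1+' means one or
# more '1' characters, the parentheses form a group) and no longer
# matches the literal text.
unescaped = re.search(literal, haystack)

print(found.group(0), unescaped)
```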

`error'
     Exception raised when a string passed to one of the functions here
     is not a valid regular expression (e.g., unmatched parentheses) or
     when some other error occurs during compilation or matching.  It is
     never an error if a string contains no match for a pattern.
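
     Handling the exception can be sketched as follows, assuming a
     current Python 3 interpreter; `safe_compile' is a hypothetical
     helper written for this illustration, not part of the module.

```python
import re

def safe_compile(pattern):
    """Hypothetical helper: return a compiled pattern, or None if invalid."""
    try:
        return re.compile(pattern)
    except re.error:
        # Raised for malformed patterns such as unmatched parentheses.
        return None

bad = safe_compile("(unclosed")
good = safe_compile("(closed)")

# A valid pattern that finds nothing returns None; no exception is raised.
no_match = good.search("nothing here")

print(bad, no_match)
```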
