GNU Info

Info Node: (libc.info)Subexpression Complications

(libc.info)Subexpression Complications


Next: Regexp Cleanup Prev: Regexp Subexpressions Up: Regular Expressions
Enter node , (file) or (file)node

Complications in Subexpression Matching
---------------------------------------

   Sometimes a subexpression matches a substring of no characters.  This
happens when `f\(o*\)' matches the string `fum'.  (It really matches
just the `f'.)  In this case, both of the offsets identify the point in
the string where the null substring was found.  In this example, the
offsets are both `1'.

   Sometimes the entire regular expression can match without using some
of its subexpressions at all--for example, when `ba\(na\)*' matches the
string `ba', the parenthetical subexpression is not used.  When this
happens, `regexec' stores `-1' in both fields of the element for that
subexpression.

   Sometimes matching the entire regular expression can match a
particular subexpression more than once--for example, when `ba\(na\)*'
matches the string `bananana', the parenthetical subexpression matches
three times.  When this happens, `regexec' usually stores the offsets
of the last part of the string that matched the subexpression.  In the
case of `bananana', these offsets are `6' and `8'.

   But the last match is not always the one that is chosen.  It's more
accurate to say that the last _opportunity_ to match is the one that
takes precedence.  What this means is that when one subexpression
appears within another, then the results reported for the inner
subexpression reflect whatever happened on the last match of the outer
subexpression.  For an example, consider `\(ba\(na\)*s \)*' matching
the string `bananas bas '.  The last time the inner expression actually
matches is near the end of the first word.  But it is _considered_
again in the second word, and fails to match there.  `regexec' reports
nonuse of the "na" subexpression.

   Another place where this rule applies is when the regular expression
     \(ba\(na\)*s \|nefer\(ti\)* \)*

matches `bananas nefertiti'.  The "na" subexpression does match in the
first word, but it doesn't match in the second word because the other
alternative is used there.  Once again, the second repetition of the
outer subexpression overrides the first, and within that second
repetition, the "na" subexpression is not used.  So `regexec' reports
nonuse of the "na" subexpression.


automatically generated by info2www version 1.2.2.9