Flex - a scanner generator: Incompatibilities

20. Incompatibilities with lex and POSIX

flex is a rewrite of the AT&T Unix lex tool (though the two implementations share no code), with some extensions and incompatibilities, both of which concern anyone who wishes to write scanners acceptable to either implementation. flex is fully compliant with the POSIX lex specification, except that when using `%pointer' (the default), a call to `unput()' destroys the contents of yytext, which is counter to the POSIX specification.
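
To see why `%pointer' and `unput()' collide, consider the simplified model below. It is written in C and is not flex's actual code. It assumes three things:

  • under `%pointer', yytext points into the input buffer, with the token NUL-terminated in place;
  • under `%array', yytext is a separate copy;
  • pushing a character back writes it into the buffer just before the scan position.

The names toy_match(), toy_unput(), text_ptr, text_array, cursor and hold_char are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Simplified model of a scanner input buffer. None of these names are
 * flex internals; they only mimic the %pointer and %array layouts. */
static char buf[32];
static char *text_ptr;       /* like yytext under %pointer: points into buf */
static char text_array[32];  /* like yytext under %array: a private copy */
static char *cursor;         /* scan position just past the current token */
static char hold_char;       /* character displaced by the in-place NUL */

/* Load the input and "match" its first len characters as the token. */
static void toy_match(const char *input, size_t len)
{
    assert(len <= strlen(input) && strlen(input) < sizeof buf);
    strcpy(buf, input);
    text_ptr = buf;
    cursor = buf + len;
    hold_char = *cursor;
    *cursor = '\0';                 /* terminate the token in place */
    memcpy(text_array, buf, len);   /* the %array view keeps its own copy */
    text_array[len] = '\0';
}

/* Push c back onto the input by writing it in front of the cursor. */
static void toy_unput(char c)
{
    *cursor = hold_char;            /* remove the in-place terminator */
    *--cursor = c;                  /* overwrite the token's last character */
}
```

After toy_match("abc def", 3), both views read "abc". After toy_unput('X'), the `%pointer'-style view reads "abX def", while the `%array'-style copy still reads "abc". The `%pointer' view changes for two reasons: the pushed-back character overwrote the last character of the token, and the in-place terminator is gone.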

In this section we discuss all of the known areas of incompatibility between flex, AT&T lex, and the POSIX specification.

flex's `-l' option turns on maximum compatibility with the original AT&T lex implementation, at the cost of a major loss in the generated scanner's performance. We note below which incompatibilities can be overcome using the `-l' option.

flex is fully compatible with lex with the following exceptions:

  • The undocumented lex scanner internal variable yylineno is not supported unless `-l' or `%option yylineno' is used. flex keeps yylineno as a single global variable for the whole scanner; it should instead be maintained on a per-buffer basis. yylineno is not part of the POSIX specification.

  • The `input()' routine is not redefinable, though it may be called to read characters following whatever has been matched by a rule. If `input()' encounters an end-of-file, the normal `yywrap()' processing is done. A "real" end-of-file is returned by `input()' as EOF.

    Input is instead controlled by defining the YY_INPUT macro.

    The flex restriction that `input()' cannot be redefined is in accordance with the POSIX specification, which simply does not specify any way of controlling the scanner's input other than by making an initial assignment to yyin.

  • The `unput()' routine is not redefinable. This restriction is in accordance with POSIX.

  • flex scanners are not as reentrant as lex scanners. In particular, if you have an interactive scanner and an interrupt handler which long-jumps out of the scanner, and the scanner is subsequently called again, you may get the following message:

    fatal flex scanner internal error--end of buffer missed

    To reenter the scanner, first use

    yyrestart( yyin );

    Note that this call will throw away any buffered input; usually this isn't a problem with an interactive scanner.

    Also note that flex C++ scanner classes are reentrant, so if using C++ is an option for you, you should use them instead. See section Generating C++ Scanners.

  • `output()' is not supported. Output from the `ECHO' macro is done to the file-pointer yyout (default stdout).

    `output()' is not part of the POSIX specification.

  • lex does not support exclusive start conditions (%x), though they are in the POSIX specification.

  • When definitions are expanded, flex encloses them in parentheses. With lex, the following:

    NAME    [A-Z][A-Z0-9]*
    foo{NAME}?      printf( "Found it\n" );

    will not match the string "foo" because when the macro is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence is such that the '?' is associated with "[A-Z0-9]*". With flex, the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.

    Note that if the definition begins with `^' or ends with `$', then it is not expanded with parentheses, to allow these operators to appear in definitions without losing their special meanings. But the `<s>', `/', and `<<EOF>>' operators cannot be used in a flex definition.

    Using `-l' results in the lex behavior of no parentheses around the definition.

    The POSIX specification is that the definition be enclosed in parentheses.

  • Some implementations of lex allow a rule's action to begin on a separate line, if the rule's pattern has trailing whitespace:

    foo|bar<space here>
      { foobar_action(); }

    flex does not support this feature.

  • The lex `%r' (generate a Ratfor scanner) option is not supported. It is not part of the POSIX specification.

  • After a call to `unput()', yytext is undefined until the next token is matched, unless the scanner was built using `%array'. This is not the case with lex or the POSIX specification. The `-l' option does away with this incompatibility.

  • The precedence of the `{}' (numeric range) operator is different. lex interprets "abc{1,3}" as "match one, two, or three occurrences of 'abc'", whereas flex interprets it as "match 'ab' followed by one, two, or three occurrences of 'c'". The latter is in agreement with the POSIX specification.

  • The precedence of the `^' operator is different. lex interprets "^foo|bar" as "match either 'foo' at the beginning of a line, or 'bar' anywhere", whereas flex interprets it as "match either 'foo' or 'bar' if they come at the beginning of a line". The latter is in agreement with the POSIX specification.

  • The special table-size declarations such as `%a' supported by lex are not required by flex scanners; flex ignores them.

  • The name FLEX_SCANNER is #define'd so scanners may be written for use with either flex or lex. Scanners also include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION indicating which version of flex generated the scanner (for example, for the 2.5 release, these defines would be 2 and 5 respectively).
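
As noted above, input() cannot be redefined, so custom input goes through the YY_INPUT macro. flex expects the macro to read at most max_size characters into buf. It must set result to the number read, or to YY_NULL at end of input.

The sketch below satisfies that contract from an in-memory string held in the hypothetical globals input_src and input_pos. The hypothetical drain_input() function stands in for the scanner's own refill loop, so the macro can be exercised without flex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A flex scanner defines YY_NULL itself; this fallback exists only so
 * the sketch compiles outside a scanner. */
#ifndef YY_NULL
#define YY_NULL 0
#endif

/* Hypothetical in-memory input source. */
static const char *input_src = "";
static size_t input_pos = 0;

/* Copy up to max_size bytes from input_src into buf and set result to
 * the count, or to YY_NULL at end of input. In a .l file, this
 * definition would sit in a %{ ... %} block of the definitions section. */
#define YY_INPUT(buf, result, max_size)                              \
    do {                                                             \
        size_t left_ = strlen(input_src + input_pos);                \
        size_t n_ = left_ < (size_t)(max_size) ? left_               \
                                               : (size_t)(max_size); \
        memcpy((buf), input_src + input_pos, n_);                    \
        input_pos += n_;                                             \
        (result) = n_ > 0 ? (int)n_ : YY_NULL;                       \
    } while (0)

/* Stand-in for the scanner's refill loop: call YY_INPUT in chunks of
 * at most `chunk` bytes until it reports end of input. `out` must hold
 * strlen(src) + 1 bytes. Returns the total number of bytes read. */
static size_t drain_input(const char *src, size_t chunk, char *out)
{
    char chunk_buf[64];
    size_t total = 0;
    int got;

    assert(chunk > 0 && chunk <= sizeof chunk_buf);
    input_src = src;
    input_pos = 0;
    for (;;) {
        YY_INPUT(chunk_buf, got, chunk);
        if (got == YY_NULL)
            break;
        memcpy(out + total, chunk_buf, (size_t)got);
        total += (size_t)got;
    }
    out[total] = '\0';
    return total;
}
```

With a chunk size of 5, "hello, world" arrives in pieces of 5, 5 and 2 bytes, after which YY_INPUT reports YY_NULL.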
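
The two expansions of foo{NAME}? discussed above can be compared using POSIX extended regular expressions through <regex.h>, which treat the operators used here the same way. The sketch writes lex's expansion with explicit grouping, foo[A-Z]([A-Z0-9]*)?, because lex attaches the `?' to [A-Z0-9]* alone. It places this next to flex's foo([A-Z][A-Z0-9]*)?.

Both patterns are anchored with ^ and $, so the whole string must match. full_match() is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX ERE pattern matches s. */
static bool full_match(const char *pattern, const char *s)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;               /* treat a bad pattern as no match */
    matched = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/* foo{NAME}? with NAME = [A-Z][A-Z0-9]*, expanded two ways. */
#define LEX_EXPANSION  "^foo[A-Z]([A-Z0-9]*)?$"  /* '?' binds to [A-Z0-9]* */
#define FLEX_EXPANSION "^foo([A-Z][A-Z0-9]*)?$"  /* '?' binds to all of NAME */
```

Only the flex expansion accepts "foo"; both accept a name such as "fooBAR1".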
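
The two precedence differences for `{}' and `^' can be checked the same way. Here each reading is written as a POSIX extended regular expression with explicit grouping:

  • abc{1,3}: lex reads it as (abc){1,3}; flex reads it as ab(c){1,3}.
  • ^foo|bar: lex reads it as (^foo)|bar; flex reads it as ^(foo|bar).

<regex.h> treats ^ as the start of the string, which for a single-line string coincides with the start of the line. re_search() is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX ERE pattern matches somewhere in s. */
static bool re_search(const char *pattern, const char *s)
{
    regex_t re;
    bool found;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;               /* treat a bad pattern as no match */
    found = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}

/* abc{1,3}, anchored with ^...$ so the whole string must match. */
#define LEX_RANGE   "^(abc){1,3}$"   /* repeat the whole "abc" */
#define FLEX_RANGE  "^ab(c){1,3}$"   /* repeat only the "c" */

/* ^foo|bar, searched for anywhere in the string. */
#define LEX_ANCHOR  "(^foo)|bar"     /* only "foo" is anchored */
#define FLEX_ANCHOR "^(foo|bar)"     /* both alternatives are anchored */
```

"abcabc" matches only lex's reading of abc{1,3}, and "abccc" matches only flex's. For ^foo|bar, lex's reading finds "xbar" but flex's does not, because flex requires "bar" at the start of the line.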
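
FLEX_SCANNER, YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION are ordinary preprocessor definitions in the generated scanner, so code in the scanner's sections can test for them. The sketch below reports which generator was used, through a hypothetical generator_name() helper. When the file is compiled on its own, or by lex, FLEX_SCANNER is undefined and the fallback branch is taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write a description of the scanner generator into buf and return
 * snprintf's result. FLEX_SCANNER and YY_FLEX_MAJOR_VERSION /
 * YY_FLEX_MINOR_VERSION come from a flex-generated scanner; when this
 * file is compiled on its own, or by lex, they are not defined. */
static int generator_name(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
#ifdef FLEX_SCANNER
    return snprintf(buf, size, "flex %d.%d",
                    YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION);
#else
    return snprintf(buf, size, "not flex");
#endif
}
```

In a scanner meant for both tools, the same #ifdef FLEX_SCANNER test can guard flex-only calls such as yyrestart().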

The following flex features are not included in lex or the POSIX specification:

  • C++ scanners
  • start condition scopes
  • start condition stacks
  • interactive/non-interactive scanners
  • yy_scan_string() and friends
  • #line directives
  • %{}'s around actions
  • multiple actions on a line

plus almost all of the flex flags. The last feature in the list refers to the fact that with flex you can put multiple actions on the same line, separated with semicolons, while with lex, the following

foo    handle_foo(); ++num_foos_seen;

is (rather surprisingly) truncated to

foo    handle_foo();

flex does not truncate the action. Actions that are not enclosed in braces are simply terminated at the end of the line.

This document was generated by root on March 17, 2002 using texi2html