GNU Info

Info Node: (flex.info)Incompatibilities


Incompatibilities with `lex' and POSIX
**************************************

   `flex' is a rewrite of the AT&T Unix `lex' tool (the two
implementations share no code), with some extensions and
incompatibilities, both of which concern those who wish to write
scanners acceptable to either implementation.  `flex' is fully
compliant with the POSIX `lex' specification, except that when using
`%pointer' (the default), a call to `unput()' destroys the contents of
`yytext', which is counter to the POSIX specification.

   In this section we discuss all of the known areas of incompatibility
between `flex', AT&T `lex', and the POSIX specification.

   The `flex' `-l' option turns on maximum compatibility with the
original AT&T `lex' implementation, at the cost of a major loss in the
generated scanner's performance.  We note below which incompatibilities
can be overcome using the `-l' option.

   `flex' is fully compatible with `lex' with the following exceptions:

   - The undocumented `lex' scanner internal variable `yylineno' is not
     supported unless `-l' or `%option yylineno' is used.  `yylineno'
     should be maintained on a per-buffer basis, rather than a
     per-scanner (single global variable) basis.  `yylineno' is not
     part of the POSIX specification.

   - The `input()' routine is not redefinable, though it may be called
     to read characters following whatever has been matched by a rule.
     If `input()' encounters an end-of-file the normal `yywrap()'
     processing is done.  A "real" end-of-file is returned by `input()'
     as `EOF'.

     Input is instead controlled by defining the `YY_INPUT' macro.

     The `flex' restriction that `input()' cannot be redefined is in
     accordance with the POSIX specification, which simply does not
     specify any way of controlling the scanner's input other than by
     making an initial assignment to `yyin'.

   - The `unput()' routine is not redefinable.  This restriction is in
     accordance with POSIX.

   - `flex' scanners are not as reentrant as `lex' scanners.  In
     particular, if you have an interactive scanner and an interrupt
     handler which long-jumps out of the scanner, and the scanner is
     subsequently called again, you may get the following message:

          fatal flex scanner internal error--end of buffer missed

     To reenter the scanner, first use

          yyrestart( yyin );

     Note that this call will throw away any buffered input; usually
     this isn't a problem with an interactive scanner.

     Also note that `flex' C++ scanner classes _are_ reentrant, so if
     using C++ is an option for you, you should use them instead.  See
     `Generating C++ Scanners'.

   - `output()' is not supported.  Output from the `ECHO' macro is done
     to the file-pointer `yyout' (default `stdout').

     `output()' is not part of the POSIX specification.

   - `lex' does not support exclusive start conditions (`%x'), though
     they are in the POSIX specification.

   - When definitions are expanded, `flex' encloses them in
     parentheses.  With `lex', the following:

          NAME    [A-Z][A-Z0-9]*
          %%
          foo{NAME}?      printf( "Found it\n" );
          %%

     will not match the string "foo" because when the macro is expanded
     the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence
     is such that the '?' is associated with "[A-Z0-9]*".  With `flex',
     the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the
     string "foo" will match.

     Note that if the definition begins with `^' or ends with `$' then
     it is _not_ expanded with parentheses, to allow these operators to
     appear in definitions without losing their special meanings.  But
     the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
     definition.

     Using `-l' results in the `lex' behavior of no parentheses around
     the definition.

     The POSIX specification is that the definition be enclosed in
     parentheses.

   - Some implementations of `lex' allow a rule's action to begin on a
     separate line, if the rule's pattern has trailing whitespace:

          %%
          foo|bar<space here>
            { foobar_action(); }

     `flex' does not support this feature.

   - The `lex' `%r' (generate a Ratfor scanner) option is not
     supported.  It is not part of the POSIX specification.

   - After a call to `unput()', `yytext' is undefined until the next
     token is matched, unless the scanner was built using `%array'.
     This is not the case with `lex' or the POSIX specification.  The
     `-l' option does away with this incompatibility.

   - The precedence of the `{}' (numeric range) operator is different.
     `lex' interprets "abc{1,3}" as "match one, two, or three
     occurrences of 'abc'", whereas `flex' interprets it as "match 'ab'
     followed by one, two, or three occurrences of 'c'".  The latter is
     in agreement with the POSIX specification.

   - The precedence of the `^' operator is different.  `lex' interprets
     "^foo|bar" as "match either 'foo' at the beginning of a line, or
     'bar' anywhere", whereas `flex' interprets it as "match either
     'foo' or 'bar' if they come at the beginning of a line".  The
     latter is in agreement with the POSIX specification.

   - The special table-size declarations such as `%a' supported by
     `lex' are not required by `flex' scanners; `flex' ignores them.

   - The name `FLEX_SCANNER' is #define'd so scanners may be written for
     use with either `flex' or `lex'.  Scanners also define
     `YY_FLEX_MAJOR_VERSION' and `YY_FLEX_MINOR_VERSION', indicating
     which version of `flex' generated the scanner (for example, for the
     2.5 release, these defines would be 2 and 5 respectively).
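
   The two readings of "abc{1,3}" described in the `{}' precedence item
above can be compared outside of any scanner generator.  The sketch
below is an illustration only and is not `flex' code: it assumes a
POSIX system that provides `<regex.h>', uses extended regular
expressions as a stand-in for scanner patterns, and spells out the
`lex' reading with explicit parentheses.  The function names
`ere_matches', `flex_reading', and `lex_reading' are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text' matches the POSIX extended regular expression
   `pattern', 0 otherwise.  This uses regcomp()/regexec() from
   <regex.h>; it does not involve a flex- or lex-generated scanner. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;               /* treat an invalid pattern as "no match" */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The flex/POSIX reading of "abc{1,3}": 'ab', then one to three 'c'.
   The anchors force the whole string to match. */
int flex_reading(const char *text)
{
    return ere_matches("^abc{1,3}$", text);
}

/* The AT&T lex reading, written with explicit parentheses:
   one to three occurrences of 'abc'. */
int lex_reading(const char *text)
{
    return ere_matches("^(abc){1,3}$", text);
}
```

   With these definitions, `flex_reading("abcc")' and
`lex_reading("abcabc")' return 1, while `flex_reading("abcabc")' and
`lex_reading("abcc")' return 0.  A pattern intended for both tools can
avoid the ambiguity by writing the parentheses explicitly.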

   The following `flex' features are not included in `lex' or the POSIX
specification:

     C++ scanners
     %option
     start condition scopes
     start condition stacks
     interactive/non-interactive scanners
     yy_scan_string() and friends
     yyterminate()
     yy_set_interactive()
     yy_set_bol()
     YY_AT_BOL()
     <<EOF>>
     <*>
     YY_DECL
     YY_START
     YY_USER_ACTION
     YY_USER_INIT
     #line directives
     %{}'s around actions
     multiple actions on a line

plus almost all of the `flex' flags.  The last feature in the list
means that `flex' lets you put multiple actions on the same line,
separated by semicolons, whereas with `lex' the following

     foo    handle_foo(); ++num_foos_seen;

is (rather surprisingly) truncated to

     foo    handle_foo();

   `flex' does not truncate the action.  Actions that are not enclosed
in braces are simply terminated at the end of the line.
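
   Since none of the features listed above are available in `lex', a
scanner meant for both tools has to guard its use of them.  The
following is a minimal sketch, assuming the code is placed in the
user-code section of a scanner specification.  The function names
`scanner_generator' and `have_flex_extensions' are hypothetical; only
`FLEX_SCANNER', `YY_FLEX_MAJOR_VERSION', and `YY_FLEX_MINOR_VERSION'
come from `flex' itself.

```c
#include <assert.h>
#include <stdio.h>

/* Describe which generator produced the enclosing scanner.
   FLEX_SCANNER and the two version macros are defined only in
   flex output, so any other build takes the #else branch. */
const char *scanner_generator(void)
{
#ifdef FLEX_SCANNER
    static char desc[64];
    int n = snprintf(desc, sizeof desc, "flex %d.%d",
                     YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION);
    assert(n > 0);
    return desc;
#else
    return "lex (or not a generated scanner)";
#endif
}

/* Hypothetical helper: nonzero when flex-only features such as
   yy_scan_string() or yyterminate() may be relied upon. */
int have_flex_extensions(void)
{
#ifdef FLEX_SCANNER
    return 1;
#else
    return 0;
#endif
}
```

   Compiled on its own, outside a `flex'-generated file, this code
takes the `#else' branches, so `have_flex_extensions()' returns 0.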

