GNU Info

Info Node: (cppinternals-300.info)Whitespace


Whitespace
**********

   The lexer has been written to treat each of `\r', `\n', `\r\n' and
`\n\r' as a single new line indicator.  This allows it to transparently
preprocess MS-DOS, Macintosh and Unix files without their needing to
pass through a special filter beforehand.
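   The rule above can be illustrated with a minimal sketch.  It is not
the cpplib code: `newline_length' and `count_newlines' are hypothetical
names chosen for this example, and the sketch assumes the whole input
is available in memory.

```c
#include <stddef.h>

/* Hypothetical helper, not a cpplib function: return how many bytes at
   the start of S (of length LEN) form one newline indicator, or 0 if
   S does not start with one.  */
size_t
newline_length (const char *s, size_t len)
{
  if (len == 0 || (s[0] != '\n' && s[0] != '\r'))
    return 0;

  /* A different newline character immediately following is absorbed,
     so CR-LF and LF-CR each count once.  Two identical characters in
     a row are two separate newlines.  */
  if (len > 1 && (s[1] == '\n' || s[1] == '\r') && s[1] != s[0])
    return 2;

  return 1;
}

/* Count the newline indicators in S using the rule above.  */
size_t
count_newlines (const char *s, size_t len)
{
  size_t i = 0, count = 0;

  while (i < len)
    {
      size_t n = newline_length (s + i, len - i);

      if (n > 0)
        {
          count++;
          i += n;
        }
      else
        i++;
    }

  return count;
}
```

   With this rule a buffer that mixes conventions, such as a CR-LF
line followed by an LF line and a CR line, is counted as three lines
without any prior conversion.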

   We also decided to treat a backslash, either `\' or the trigraph
`??/', separated from one of the above newline indicators by
non-comment whitespace only, as intending to escape the newline.  Such
trailing whitespace tends to be a typing mistake, and the sequence
cannot reasonably be mistaken for anything else in any of the C-family
grammars.  Since handling it this way is not strictly conforming to the
ISO standard, the library issues a warning wherever it encounters it.

   Handling newlines like this is made simpler by doing it in one place
only.  The function `handle_newline' takes care of all newline
characters, and `skip_escaped_newlines' takes care of arbitrarily long
sequences of escaped newlines, deferring to `handle_newline' to handle
the newlines themselves.
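   The division of labour described above might look roughly like the
following sketch.  It is an illustration only: `nl_len',
`escaped_newline_length' and `skip_escaped' are hypothetical names, the
choice of which characters count as whitespace is an assumption, and
the real routines work on the lexer's buffer rather than on a string.
The `saw_space' output stands in for the point where a warning would be
issued.

```c
#include <stddef.h>

/* Hypothetical stand-in for the single place that recognizes a
   newline indicator: returns its length in bytes, or 0.  */
static size_t
nl_len (const char *s, size_t len)
{
  if (len == 0 || (s[0] != '\n' && s[0] != '\r'))
    return 0;
  if (len > 1 && (s[1] == '\n' || s[1] == '\r') && s[1] != s[0])
    return 2;
  return 1;
}

/* Return the length of one escaped newline at the start of S, or 0.
   The escape is a backslash or its trigraph spelling, optionally
   followed by horizontal whitespace, then a newline indicator.
   *SAW_SPACE is set to 1 if whitespace separated the two.  */
size_t
escaped_newline_length (const char *s, size_t len, int *saw_space)
{
  size_t i, start, n;

  if (len >= 1 && s[0] == '\\')
    i = 1;
  else if (len >= 3 && s[0] == '?' && s[1] == '?' && s[2] == '/')
    i = 3;
  else
    return 0;

  start = i;
  while (i < len
         && (s[i] == ' ' || s[i] == '\t' || s[i] == '\f' || s[i] == '\v'))
    i++;

  /* Defer to the newline recognizer for the newline itself.  */
  n = nl_len (s + i, len - i);
  if (n == 0)
    return 0;

  if (i > start)
    *saw_space = 1;
  return i + n;
}

/* Skip an arbitrarily long run of escaped newlines and return the
   number of bytes skipped.  */
size_t
skip_escaped (const char *s, size_t len, int *saw_space)
{
  size_t pos = 0, n;

  while ((n = escaped_newline_length (s + pos, len - pos, saw_space)) > 0)
    pos += n;

  return pos;
}
```

   Keeping the newline test in one helper means the escaped-newline
code never needs to know which line-ending convention the file uses.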

   Another whitespace issue only concerns the stand-alone preprocessor:
we want to guarantee that re-reading the preprocessed output results in
an identical token stream.  Without taking special measures, this might
not be the case because of macro substitution.  We could simply insert a
space between adjacent tokens, but ideally we would like to keep this to
a minimum, both for aesthetic reasons and because it causes problems for
people who still try to abuse the preprocessor for things like Fortran
source and Makefiles.

   The token structure contains a flags byte, and two flags are of
interest here: `PREV_WHITE' and `AVOID_LPASTE'.  `PREV_WHITE' indicates
that the token was preceded by whitespace; if this is the case we need
not worry about it incorrectly pasting with its predecessor.  The
`AVOID_LPASTE' flag is set by the macro expansion routines, and
indicates that paste avoidance by insertion of a space to the left of
the token may be necessary.  Recursively, the first token of a macro
substitution, the first token after a macro substitution, the first
token of a substituted argument, and the first token after a substituted
argument are all flagged `AVOID_LPASTE' by the macro expander.

   If a token flagged in this way does not have a `PREV_WHITE' flag,
and the routine `cpp_avoid_paste' determines that it might be
misinterpreted by the lexer if a space is not inserted between it and
the immediately preceding token, then stand-alone CPP's output routines
will insert a space between them.  To avoid excessive spacing,
`cpp_avoid_paste' tries hard to only request a space if one is likely
to be necessary, but for reasons of efficiency it is slightly
conservative and might recommend a space where one is not strictly
needed.
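   As a rough illustration of this decision, the sketch below combines
the two flags with a deliberately simplified paste test.  Everything in
it is an assumption made for the example: the `struct token' layout,
the flag values, and the functions `might_paste' and `needs_space' are
hypothetical, and the real `cpp_avoid_paste' works on token types
rather than on spellings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Flag values are hypothetical; only the names come from the text.  */
#define PREV_WHITE   (1 << 0)
#define AVOID_LPASTE (1 << 1)

/* Hypothetical token: just a spelling and a flags byte.  */
struct token
{
  const char *spelling;
  unsigned char flags;
};

static int
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Simplified, conservative test: would the last character of PREV and
   the first character of TOK lex differently if written adjacently?  */
static int
might_paste (const struct token *prev, const struct token *tok)
{
  static const char *const ops[] = {
    "++", "--", "->", "<<", ">>", "<=", ">=", "==", "!=", "&&", "||",
    "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "##", "//", "/*",
    "::", "<:", ":>", "<%", "%>", "%:"
  };
  size_t plen = strlen (prev->spelling);
  char pair[3];
  size_t i;

  if (plen == 0 || tok->spelling[0] == '\0')
    return 0;

  pair[0] = prev->spelling[plen - 1];
  pair[1] = tok->spelling[0];
  pair[2] = '\0';

  /* Two identifier or number characters would merge into one token.  */
  if (is_word_char (pair[0]) && is_word_char (pair[1]))
    return 1;

  /* Two punctuation characters might form a longer operator or open
     a comment.  */
  for (i = 0; i < sizeof ops / sizeof ops[0]; i++)
    if (strcmp (pair, ops[i]) == 0)
      return 1;

  return 0;
}

/* A space is printed only for tokens flagged AVOID_LPASTE that were
   not already preceded by whitespace and that might paste.  */
int
needs_space (const struct token *prev, const struct token *tok)
{
  if (!(tok->flags & AVOID_LPASTE) || (tok->flags & PREV_WHITE))
    return 0;
  return might_paste (prev, tok);
}
```

   For example, a `+' produced by a macro expansion directly after a
`+' in the source would get a space under this sketch, so that the
output is not re-read as `++'.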

   Finally, the preprocessor takes great care to ensure it keeps track
of both the position of a token in the source file, for diagnostic
purposes, and where it should appear in the output file, because using
CPP for other languages like assembler requires this.  The two positions
may differ for the following reasons:

   * Escaped newlines are deleted, so lines spliced in this way are
     joined to form a single logical line.

   * A macro expansion replaces the tokens that form its invocation,
     but any newlines appearing in the macro's arguments are
     interpreted as a single space, with the result that the macro's
     replacement appears in full on the same line that the macro name
     appeared in the source file.  This is particularly important for
     stringification of arguments--newlines embedded in the arguments
     must appear in the string as spaces.
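   Both effects can be observed from ordinary C source.  The short
example below is only an illustration; the macro names `STR' and `SUM'
and the two functions are made up for it.

```c
#include <string.h>

#define STR(x) #x

/* The definition is spliced from two physical lines into one logical
   line before the directive is processed.  */
#define SUM(a, b) \
  ((a) + (b))

/* The argument spans two source lines; the embedded newline and the
   indentation that follows it become a single space in the string.  */
const char *
stringified (void)
{
  return STR (first
              second);
}

int
spliced_sum (void)
{
  return SUM (1, 2);
}
```

   Here `stringified' returns the string `first second', with exactly
one space between the two words.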

   The source file location is maintained in the `lineno' member of the
`cpp_buffer' structure, and the column number is inferred from the
current position in the buffer relative to the `line_base' buffer
variable, which is updated with every newline, whether escaped or not.
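   A minimal sketch of this bookkeeping follows.  The structure shown
is not `cpp_buffer'; `struct buffer_pos' and its helper functions are
hypothetical, the member `cur' is an assumed name for the current
position, and the choice of 1-based columns is an assumption of the
example.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the buffer structure.  */
struct buffer_pos
{
  const char *cur;        /* Current position in the buffer.  */
  const char *line_base;  /* Start of the current physical line.  */
  unsigned int lineno;    /* Current physical line number.  */
};

void
start_buffer (struct buffer_pos *b, const char *text)
{
  b->cur = text;
  b->line_base = text;
  b->lineno = 1;
}

/* Called for every newline, escaped or not.  AFTER_NEWLINE points at
   the first character following the newline indicator.  */
void
note_newline (struct buffer_pos *b, const char *after_newline)
{
  b->lineno++;
  b->line_base = after_newline;
}

/* The column is never stored; it is derived on demand.  */
unsigned int
column (const struct buffer_pos *b)
{
  return (unsigned int) (b->cur - b->line_base) + 1;
}
```

   Deriving the column from two pointers keeps the per-character cost
of lexing at zero; only newlines require any bookkeeping.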

   TODO: Finish this.

