Copyright (C) 2000-2012
Whitespace
**********

The lexer has been written to treat each of `\r', `\n', `\r\n' and `\n\r' as a single newline indicator. This allows it to transparently preprocess MS-DOS, Macintosh and Unix files without their needing to pass through a special filter beforehand.

We also decided to treat a backslash, either `\' or the trigraph `??/', separated from one of the above newline indicators by non-comment whitespace only, as intending to escape the newline. Such a sequence tends to be a typing mistake, and cannot reasonably be mistaken for anything else in any of the C-family grammars. Since handling it this way is not strictly conforming to the ISO standard, the library issues a warning wherever it encounters it.

Handling newlines like this is made simpler by doing it in one place only. The function `handle_newline' takes care of all newline characters, and `skip_escaped_newlines' takes care of arbitrarily long sequences of escaped newlines, deferring to `handle_newline' to handle the newlines themselves.

Another whitespace issue only concerns the stand-alone preprocessor: we want to guarantee that re-reading the preprocessed output results in an identical token stream. Without taking special measures, this might not be the case because of macro substitution. We could simply insert a space between adjacent tokens, but ideally we would like to keep this to a minimum, both for aesthetic reasons and because it causes problems for people who still try to abuse the preprocessor for things like Fortran source and Makefiles.

The token structure contains a flags byte, and two flags are of interest here: `PREV_WHITE' and `AVOID_LPASTE'. `PREV_WHITE' indicates that the token was preceded by whitespace; if this is the case we need not worry about it incorrectly pasting with its predecessor.

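To illustrate why re-reading the preprocessed output could yield a different token stream, here is a minimal C sketch. The macro `NEG' and the function `negate_twice' are hypothetical names invented for this example; they are not part of cpplib.

```c
#include <assert.h>

/* Hypothetical macro, for illustration only. */
#define NEG(x) -x

/* Returns -NEG(v).  The preprocessor produces the three tokens
   '-' '-' 'v', which the compiler parses as -(-v).  If the two
   minus signs were written out with no space between them, a
   second lexing pass would read the single token '--' instead,
   turning the expression into a pre-decrement of v.  */
int negate_twice(int v)
{
    int before = v;
    int r = -NEG(v);

    /* v is unchanged, so no '--' token was formed.  */
    assert(v == before);
    return r;
}
```

Calling `negate_twice(5)' returns 5. Had the text `--v' been emitted and lexed again, the same expression would decrement `v' and yield 4, which is why the stand-alone preprocessor must sometimes write a space between the two minus signs.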
The `AVOID_LPASTE' flag is set by the macro expansion routines, and indicates that paste avoidance by insertion of a space to the left of the token may be necessary. Recursively, the first token of a macro substitution, the first token after a macro substitution, the first token of a substituted argument, and the first token after a substituted argument are all flagged `AVOID_LPASTE' by the macro expander.

If a token flagged in this way does not have a `PREV_WHITE' flag, and the routine `cpp_avoid_paste' determines that it might be misinterpreted by the lexer if a space is not inserted between it and the immediately preceding token, then stand-alone CPP's output routines will insert a space between them. To avoid excessive spacing, `cpp_avoid_paste' tries hard to only request a space if one is likely to be necessary, but for reasons of efficiency it is slightly conservative and might recommend a space where one is not strictly needed.

Finally, the preprocessor takes great care to ensure it keeps track of both the position of a token in the source file, for diagnostic purposes, and where it should appear in the output file, because using CPP for other languages like assembler requires this. The two positions may differ for the following reasons:

   * Escaped newlines are deleted, so lines spliced in this way are joined to form a single logical line.

   * A macro expansion replaces the tokens that form its invocation, but any newlines appearing in the macro's arguments are interpreted as a single space, with the result that the macro's replacement appears in full on the same line that the macro name appeared in the source file. This is particularly important for stringification of arguments--newlines embedded in the arguments must appear in the string as spaces.

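The stringification case in the last item can be observed from ordinary C code. The following is a small sketch; `STR', `spliced' and `is_single_line' are hypothetical names chosen for this example only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macro: stringify its argument.  */
#define STR(x) #x

/* The macro argument spans two source lines.  The newline and the
   indentation between the two tokens count as whitespace, and
   stringification turns any run of whitespace between tokens into
   one space.  */
const char *spliced(void)
{
    return STR(hello
               world);
}

/* Returns nonzero if S contains no newline character.  */
int is_single_line(const char *s)
{
    assert(s != NULL);
    return strchr(s, '\n') == NULL;
}
```

Here `spliced()' returns the string "hello world", with the embedded newline appearing as a single space, and `is_single_line(spliced())' is nonzero.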
The source file location is maintained in the `lineno' member of the `cpp_buffer' structure, and the column number is inferred from the current position in the buffer relative to the `line_base' buffer variable, which is updated with every newline whether escaped or not.

TODO: Finish this.
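The idea of inferring the column from a line base pointer can be sketched as follows. This is not cpplib's code: `struct demo_buffer', `note_newline', `current_column' and `locate' are hypothetical, only the member names `lineno' and `line_base' come from the text above, and the sketch assumes 1-based columns, a line count bumped on every newline, and `\n' as the only newline form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer; not cpplib's cpp_buffer.  */
struct demo_buffer {
    const char *cur;        /* next character to read */
    const char *line_base;  /* start of the current physical line */
    unsigned int lineno;    /* current physical line, 1-based */
};

/* Called with cur pointing just past a newline, whether that
   newline was escaped or not.  */
static void note_newline(struct demo_buffer *b)
{
    b->lineno++;
    b->line_base = b->cur;
}

/* Column of the next character, inferred from line_base
   (1-based by assumption).  */
static unsigned int current_column(const struct demo_buffer *b)
{
    return (unsigned int)(b->cur - b->line_base) + 1;
}

/* Scan TEXT up to OFFSET (which must not exceed its length) and
   report the line and column of the character at that offset.  */
void locate(const char *text, size_t offset,
            unsigned int *line, unsigned int *col)
{
    struct demo_buffer b = { text, text, 1 };

    assert(text != NULL);
    while ((size_t)(b.cur - text) < offset) {
        char c = *b.cur++;
        if (c == '\n')
            note_newline(&b);
    }
    *line = b.lineno;
    *col = current_column(&b);
}
```

For the text "ab\\\ncd\nef" (an escaped newline after `ab'), offset 5 is the character `d' and is reported as line 2, column 2, because the line base moved even though the newline was escaped.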