GNU Info

Info Node: (cppinternals-300.info)Hash Nodes

(cppinternals-300.info)Hash Nodes


Next: Macro Expansion Prev: Whitespace Up: Top
Enter node , (file) or (file)node

Hash Nodes
**********

   When cpplib encounters an "identifier", it generates a hash code for
it and stores it in the hash table.  By "identifier" we mean tokens with
type `CPP_NAME'; this includes identifiers in the usual C sense, as
well as keywords, directive names, macro names and so on.  For example,
all of `pragma', `int', `foo' and `__GNUC__' are identifiers and hashed
when lexed.

   Each node in the hash table contain various information about the
identifier it represents.  For example, its length and type.  At any one
time, each identifier falls into exactly one of three categories:

   * Macros

     These have been declared to be macros, either on the command line
     or with `#define'.  A few, such as `__TIME__' are builtins entered
     in the hash table during initialisation.  The hash node for a
     normal macro points to a structure with more information about the
     macro, such as whether it is function-like, how many arguments it
     takes, and its expansion.  Builtin macros are flagged as special,
     and instead contain an enum indicating which of the various
     builtin macros it is.

   * Assertions

     Assertions are in a separate namespace to macros.  To enforce
     this, cpp actually prepends a `#' character before hashing and
     entering it in the hash table.  An assertion's node points to a
     chain of answers to that assertion.

   * Void

     Everything else falls into this category--an identifier that is not
     currently a macro, or a macro that has since been undefined with
     `#undef'.

     When preprocessing C++, this category also includes the named
     operators, such as `xor'.  In expressions these behave like the
     operators they represent, but in contexts where the spelling of a
     token matters they are spelt differently.  This spelling
     distinction is relevant when they are operands of the stringizing
     and pasting macro operators `#' and `##'.  Named operator hash
     nodes are flagged, both to catch the spelling distinction and to
     prevent them from being defined as macros.

   The same identifiers share the same hash node.  Since each identifier
token, after lexing, contains a pointer to its hash node, this is used
to provide rapid lookup of various information.  For example, when
parsing a `#define' statement, CPP flags each argument's identifier
hash node with the index of that argument.  This makes duplicated
argument checking an O(1) operation for each argument.  Similarly, for
each identifier in the macro's expansion, lookup to see if it is an
argument, and which argument it is, is also an O(1) operation.  Further,
each directive name, such as `endif', has an associated directive enum
stored in its hash node, so that directive lookup is also O(1).


automatically generated by info2www version 1.2.2.9