Lexical analysis
****************

A Python program is read by a _parser_.  Input to the parser is a
stream of _tokens_, generated by the _lexical analyzer_.  This chapter
describes how the lexical analyzer breaks a file into tokens.
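The token stream can be inspected from Python itself.  The following is
a minimal sketch, assuming a current Python 3 interpreter and its
standard `tokenize' module, which is a separate scanner written in
Python rather than the interpreter's own lexical analyzer.  The helper
name `token_stream' is hypothetical.

```python
import io
import token
import tokenize


def token_stream(source):
    """Return (token type name, token text) pairs for a source string."""
    # generate_tokens() expects a readline-style callable yielding str lines.
    readline = io.StringIO(source).readline
    return [(token.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]


pairs = token_stream("x = 1 + 2\n")
for name, text in pairs:
    print(name, repr(text))
```

Each pair names one token in the order the parser would receive it:
an identifier, operators, numeric literals, and the token that marks
the end of the logical line.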

Python uses the 7-bit ASCII character set for program text and string
literals.  8-bit characters may be used in string literals and comments,
but their interpretation is platform-dependent; the proper way to
insert 8-bit characters in string literals is by using octal or
hexadecimal escape sequences.
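As an illustration of the escape sequences mentioned above, the sketch
below writes the character with ordinal 233 in both octal and
hexadecimal form, so the source file itself stays pure 7-bit ASCII.
The variable names are hypothetical, and the choice of ordinal 233 is
only an example.

```python
# Both literals denote a one-character string whose ordinal is 233;
# no 8-bit character appears in the source text.
octal_form = '\351'   # octal escape: 0o351 == 233
hex_form = '\xe9'     # hexadecimal escape: 0xE9 == 233

print(octal_form == hex_form)
print(ord(hex_form))
```

Because only ASCII characters are stored in the file, the result does
not depend on how an editor or platform interprets 8-bit characters.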

The run-time character set depends on the I/O devices connected to the
program but is generally a superset of ASCII.

*Future compatibility note:* It may be tempting to assume that the
character set for 8-bit characters is ISO Latin-1 (an ASCII superset
that covers most western languages that use the Latin alphabet), but it
is possible that in the future Unicode text editors will become common.
These generally use the UTF-8 encoding, which is also an ASCII
superset, but with a very different use for the characters with ordinals
128-255.  While there is no consensus on this subject yet, it is unwise
to assume either Latin-1 or UTF-8, even though the current
implementation appears to favor Latin-1.  This applies both to the
source character set and the run-time character set.
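The difference between the two encodings can be made concrete.  The
sketch below is illustrative only and assumes an interpreter with
Unicode string support and the standard `latin-1' and `utf-8' codecs;
the variable names are hypothetical.  It encodes one character with
ordinal 233 both ways, then decodes the UTF-8 bytes under the wrong
assumption.

```python
# One character, ordinal 233, written with an escape so that this
# example is itself pure ASCII.
text = u'\xe9'

latin1_bytes = text.encode('latin-1')   # a single byte with value 233
utf8_bytes = text.encode('utf-8')       # two bytes, both in the range 128-255

# Decoding UTF-8 data as if it were Latin-1 raises no error; it
# silently yields two characters instead of the intended one.
misread = utf8_bytes.decode('latin-1')

print(len(latin1_bytes), len(utf8_bytes), len(misread))
```

The same bytes therefore mean different things under each assumption,
which is why neither encoding should be taken for granted.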

Line structure
Other tokens
Identifiers and keywords
Literals
Operators
Delimiters