Info Node: (python2.1-ref.info)String literals

String literals
---------------

String literals are described by the following lexical definitions:

     stringliteral:   shortstring | longstring
     shortstring:     "'" shortstringitem* "'" | '"' shortstringitem* '"'
     longstring:      "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
     shortstringitem: shortstringchar | escapeseq
     longstringitem:  longstringchar | escapeseq
     shortstringchar: <any ASCII character except "\" or newline or the quote>
     longstringchar:  <any ASCII character except "\">
     escapeseq:       "\" <any ASCII character>

In plain English: String literals can be enclosed in matching single
quotes (`'') or double quotes (`"').  They can also be enclosed in
matching groups of three single or double quotes (these are generally
referred to as _triple-quoted strings_).  The backslash (`\') character
is used to escape characters that otherwise have a special meaning,
such as newline, backslash itself, or the quote character.  String
literals may optionally be prefixed with a letter `r' or `R'; such
strings are called "raw strings" and use different rules for backslash
escape sequences.  A prefix of `u' or `U' makes the string a Unicode
string.  Unicode strings use the Unicode character set as defined by
the Unicode Consortium and ISO 10646.  Some additional escape
sequences, described below, are available in Unicode strings.
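
As a minimal sketch of the Unicode-only escapes listed in the table
later in this section, the three literals below all spell the same
one-character string.  The variable names are illustrative only, and
the snippet assumes an interpreter that accepts the `u' prefix (on
current Python every string literal is already a Unicode string, so
the prefix is accepted but redundant there).

```python
# Three spellings of the same character, LATIN CAPITAL LETTER A.
by_16bit = u"\u0041"                      # 16-bit hex value
by_32bit = u"\U00000041"                  # 32-bit hex value
by_name = u"\N{LATIN CAPITAL LETTER A}"   # name in the Unicode database

assert by_16bit == by_32bit == by_name == u"A"
assert len(by_name) == 1
```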

In triple-quoted strings, unescaped newlines and quotes are allowed
(and are retained), except that three unescaped quotes in a row
terminate the string.  (A "quote" is the character used to open the
string, i.e. either `'' or `"'.)
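
The rule above can be illustrated with a short sketch: a
triple-quoted literal keeps its unescaped newline and quotes, and so
equals a short string in which the same characters are written with
escapes.  The names `text' and `same' are illustrative only.

```python
# A triple-quoted string keeps unescaped newlines and quotes verbatim.
text = """He said "hi"
and left."""

# The same value written as a short string needs escapes for both.
same = "He said \"hi\"\nand left."

assert text == same
assert text.count("\n") == 1   # the line break was retained
assert text.count('"') == 2    # so were the two double quotes
```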

Unless an `r' or `R' prefix is present, escape sequences in strings are
interpreted according to rules similar to those used by Standard C.
The recognized escape sequences are:

Escape Sequence                      Meaning
---------------                      -------
\NEWLINE                             Ignored
\\                                   Backslash (`\')
\'                                   Single quote (`'')
\"                                   Double quote (`"')
\a                                   ASCII Bell (BEL)
\b                                   ASCII Backspace (BS)
\f                                   ASCII Formfeed (FF)
\n                                   ASCII Linefeed (LF)
\N{NAME}                             Character named NAME in the Unicode
                                     database (Unicode only)
\r                                   ASCII Carriage Return (CR)
\t                                   ASCII Horizontal Tab (TAB)
\uXXXX                               Character with 16-bit hex value
                                     XXXX (Unicode only)
\UXXXXXXXX                           Character with 32-bit hex value
                                     XXXXXXXX (Unicode only)
\v                                   ASCII Vertical Tab (VT)
\OOO                                 ASCII character with octal value OOO
\xHH                                 ASCII character with hex value HH

As in Standard C, up to three octal digits are accepted.  However,
unlike Standard C, a hex escape takes exactly two hex digits.
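
A small sketch of the digit-count rules, assuming nothing beyond the
escapes in the table above (the variable names are illustrative): any
digit beyond the third octal digit or the second hex digit is
ordinary text.

```python
# Octal escapes take at most three digits; a fourth digit is literal text.
octal = "\1012"      # \101 is "A", followed by the character "2"
assert octal == "A2"

# Hex escapes take exactly two digits; a third digit is literal text.
hexa = "\x414"       # \x41 is "A", followed by the character "4"
assert hexa == "A4"

# Fewer than three octal digits is allowed: \0 is one NUL character.
nul = "\0"
assert len(nul) == 1 and ord(nul) == 0
```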

Unlike Standard C, all unrecognized escape sequences are left in the
string unchanged, i.e., _the backslash is left in the string_.  (This
behavior is useful when debugging: if an escape sequence is mistyped,
the resulting output is more easily recognized as broken.)  It is also
important to note that the escape sequences marked as "(Unicode only)"
in the table above fall into the category of unrecognized escapes for
non-Unicode string literals.
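
The following sketch shows an unrecognized escape keeping its
backslash.  It is written for a current interpreter rather than for
Python 2.1, and it rests on two assumptions: that the interpreter
still accepts unrecognized escapes (newer versions emit a warning for
them and may eventually reject them), and that `\d' is not a
recognized escape.  The literal is compiled through `eval' so that
the warning can be silenced locally.

```python
import warnings

# The raw source text below is the four characters: " \ d "
source = r'"\d"'

# Compile the literal while ignoring any warning about unknown escapes.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    s = eval(source)

# The backslash is left in the string, so the result has two characters.
assert len(s) == 2
assert s == "\\d"
```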

When an `r' or `R' prefix is present, a character following a backslash
is included in the string without change, and _all backslashes are left
in the string_.  For example, the string literal `r"\n"' consists of
two characters: a backslash and a lowercase `n'.  String quotes can be
escaped with a backslash, but the backslash remains in the string; for
example, `r"\""' is a valid string literal consisting of two
characters: a backslash and a double quote; `r"\"' is not a valid
string literal (even a raw string cannot end in an odd number of
backslashes).  Specifically, _a raw string cannot end in a single
backslash_ (since the backslash would escape the following quote
character).  Note also that a single backslash followed by a newline is
interpreted as those two characters as part of the string, _not_ as a
line continuation.
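
The raw-string rules above can be checked with a short sketch.  The
variable names and the sample path are illustrative only; appending
an ordinary literal to supply a trailing backslash is one possible
workaround, not something this section prescribes.

```python
# Backslashes stay in a raw string: r"\n" is a backslash and an "n".
raw_n = r"\n"
assert len(raw_n) == 2 and raw_n == "\\n"

# An escaped quote does not end the string, but its backslash remains.
raw_quote = r"\""
assert len(raw_quote) == 2 and raw_quote == '\\"'

# Backslash-newline is kept as two characters, not a line continuation.
raw_nl = r"""a\
b"""
assert raw_nl == "a\\\nb" and len(raw_nl) == 4

# A raw string cannot end in a single backslash; one workaround is to
# place an ordinary literal right after it to supply the final one.
path = r"C:\temp" "\\"
assert path[-1] == "\\" and len(path) == 8
```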

