GNU Info

Info Node: (python2.1-api.info)Builtin Codecs

Builtin Codecs
..............

Python provides a set of builtin codecs which are written in C for
speed. All of these codecs are directly usable via the following
functions.

Many of the following APIs take two arguments, encoding and errors.
These have the same semantics as the parameters of the same name in the
builtin unicode() Unicode object constructor.

Setting encoding to NULL causes the default encoding to be used, which
is UTF-8.

Error handling is set by errors, which may also be set to NULL, meaning
that the default handling defined for the codec is used. Default error
handling for all builtin codecs is "strict" (ValueErrors are raised).
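
The effect of the errors argument can be seen from Python code. The
following is a minimal sketch, assuming a current Python 3 interpreter
(where encoded data is a bytes object rather than the Python string
object of this release) and the Python-level decode() method rather than
the C functions documented here. The handler names "ignore" and
"replace" are the other standard values besides "strict".

```python
data = b"caf\xe9"  # 0xE9 is not valid 7-bit ASCII

# "strict" is the default: the codec raises an exception.
try:
    data.decode("ascii")
    strict_failed = False
except ValueError:  # UnicodeDecodeError is a subclass of ValueError
    strict_failed = True

# Other handlers recover instead of raising.
ignored = data.decode("ascii", "ignore")    # drops the offending byte
replaced = data.decode("ascii", "replace")  # substitutes U+FFFD
```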

The codecs all use a similar interface. For simplicity, only deviations
from the following generic APIs are documented.

These are the generic codec APIs:

`PyObject* PyUnicode_Decode(const char *s, int size, const char *encoding, const char *errors)'
     Create a Unicode object by decoding SIZE bytes of the encoded
     string S. ENCODING and ERRORS have the same meaning as the
     parameters of the same name in the unicode() builtin function. The
     codec to be used is looked up using the Python codec registry.
     Returns `NULL' in case an exception was raised by the codec.

`PyObject* PyUnicode_Encode(const Py_UNICODE *s, int size, const char *encoding, const char *errors)'
     Encodes the `Py_UNICODE' buffer of the given size and returns a
     Python string object. ENCODING and ERRORS have the same meaning as
     the parameters of the same name in the Unicode .encode() method.
     The codec to be used is looked up using the Python codec registry.
     Returns `NULL' in case an exception was raised by the codec.

`PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)'
     Encodes a Unicode object and returns the result as a Python string
     object. ENCODING and ERRORS have the same meaning as the
     parameters of the same name in the Unicode .encode() method. The
     codec to be used is looked up using the Python codec registry.
     Returns `NULL' in case an exception was raised by the codec.
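
The registry lookup that the generic APIs rely on is also reachable from
Python code. The sketch below assumes a current Python 3 interpreter and
the standard codecs module; it illustrates the lookup-then-convert
pattern and is not a transcription of the C functions above.

```python
import codecs

# The registry resolves encoding names (including aliases) to codecs.
info = codecs.lookup("UTF8")

# Encode and decode through the registry, naming the error handling.
encoded = codecs.encode("h\u00e9llo", "utf-8", "strict")
decoded = codecs.decode(encoded, "utf-8", "strict")
```

An unknown encoding name makes codecs.lookup() raise LookupError, which
is the Python-level counterpart of the C functions returning `NULL' with
an exception set.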

These are the UTF-8 codec APIs:

`PyObject* PyUnicode_DecodeUTF8(const char *s, int size, const char *errors)'
     Creates a Unicode object by decoding SIZE bytes of the UTF-8
     encoded string S. Returns `NULL' in case an exception was raised
     by the codec.

`PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, int size, const char *errors)'
     Encodes the `Py_UNICODE' buffer of the given size using UTF-8 and
     returns a Python string object.  Returns `NULL' in case an
     exception was raised by the codec.

`PyObject* PyUnicode_AsUTF8String(PyObject *unicode)'
     Encodes a Unicode object using UTF-8 and returns the result as a
     Python string object. Error handling is "strict". Returns `NULL'
     in case an exception was raised by the codec.

These are the UTF-16 codec APIs:

`PyObject* PyUnicode_DecodeUTF16(const char *s, int size, const char *errors, int *byteorder)'
     Decodes SIZE bytes from the UTF-16 encoded buffer string S and
     returns the corresponding Unicode object.

     ERRORS (if non-NULL) defines the error handling. It defaults to
     "strict".

     If BYTEORDER is non-`NULL', the decoder starts decoding using the
     given byte order:

             *byteorder == -1: little endian
             *byteorder == 0:  native order
             *byteorder == 1:  big endian

     and then switches according to all byte order marks (BOMs) it finds
     in the input data. BOMs are not copied into the resulting Unicode
     string.  After completion, *BYTEORDER is set to the current byte
     order at the end of input data.

     If BYTEORDER is `NULL', the codec starts in native order mode.

     Returns `NULL' in case an exception was raised by the codec.
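
The byte order switching described for PyUnicode_DecodeUTF16() can be
sketched in pure Python. decode_utf16_sketch() is a hypothetical helper
written for this illustration, not part of any Python API. It handles
only 16-bit code units, ignores a trailing odd byte, and (unlike the C
function, an assumption of this sketch) always reports a resolved order
of -1 or 1, even when it started in native mode.

```python
import struct
import sys

def decode_utf16_sketch(data, byteorder=0):
    """Hypothetical decoder mirroring the BOM rules described above."""
    if byteorder == 0:
        # Native order: resolve to -1 (little) or 1 (big) for this machine.
        byteorder = -1 if sys.byteorder == "little" else 1
    chars = []
    for i in range(0, len(data) - 1, 2):
        fmt = "<H" if byteorder == -1 else ">H"
        (unit,) = struct.unpack_from(fmt, data, i)
        if unit == 0xFEFF:
            # BOM read correctly in the current order: drop it, keep the order.
            continue
        if unit == 0xFFFE:
            # Byte-swapped BOM: flip the order and drop the mark.
            byteorder = -byteorder
            continue
        chars.append(chr(unit))
    # Return the text and the byte order in effect at the end of input.
    return "".join(chars), byteorder

# Start in big endian, but the data opens with a little-endian BOM.
text, final_order = decode_utf16_sketch(b"\xff\xfeA\x00", 1)
```

Here the decoder starts in big endian mode, meets a byte-swapped BOM,
switches to little endian and reports that order back to the caller.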

`PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, int size, const char *errors, int byteorder)'
     Returns a Python string object holding the UTF-16 encoded value of
     the Unicode data in S.

     Output is written according to the value of BYTEORDER:

             byteorder == -1: little endian
             byteorder == 0:  native byte order (writes a BOM)
             byteorder == 1:  big endian

     If BYTEORDER is `0', the output string will always start with the
     Unicode BOM (U+FEFF). In the other two modes, no BOM is
     prepended.

     Note that `Py_UNICODE' data is being interpreted as UTF-16 reduced
     to UCS-2. This trick makes it possible to add full UTF-16
     capabilities at a later point without compromising the APIs.

     Returns `NULL' in case an exception was raised by the codec.

`PyObject* PyUnicode_AsUTF16String(PyObject *unicode)'
     Returns a Python string using the UTF-16 encoding in native byte
     order. The string always starts with a BOM. Error handling is
     "strict". Returns `NULL' in case an exception was raised by the
     codec.
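
The three output modes of the UTF-16 encoder have Python-level
counterparts. The sketch below assumes a current Python 3 interpreter
and pairs the codec names "utf-16-le", "utf-16-be" and "utf-16" with the
BYTEORDER values -1, 1 and 0; that pairing is an analogy drawn for this
example, not a statement about how the C functions are implemented.

```python
import codecs
import sys

text = "A"

little = text.encode("utf-16-le")  # like byteorder == -1: no BOM
big = text.encode("utf-16-be")     # like byteorder == 1: no BOM
native = text.encode("utf-16")     # like byteorder == 0: BOM, native order

# codecs.BOM_UTF16 is the BOM in this machine's native byte order.
payload = little if sys.byteorder == "little" else big
expected_native = codecs.BOM_UTF16 + payload
```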

These are the "Unicode Escape" codec APIs:

`PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, int size, const char *errors)'
     Creates a Unicode object by decoding SIZE bytes of the
     Unicode-Escape encoded string S. Returns `NULL' in case an
     exception was raised by the codec.

`PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, int size, const char *errors)'
     Encodes the `Py_UNICODE' buffer of the given size using
     Unicode-Escape and returns a Python string object.  Returns `NULL'
     in case an exception was raised by the codec.

`PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)'
     Encodes a Unicode object using Unicode-Escape and returns the
     result as a Python string object. Error handling is "strict". Returns
     `NULL' in case an exception was raised by the codec.

These are the "Raw Unicode Escape" codec APIs:

`PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, int size, const char *errors)'
     Creates a Unicode object by decoding SIZE bytes of the
     Raw-Unicode-Escape encoded string S. Returns `NULL' in case an
     exception was raised by the codec.

`PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, int size, const char *errors)'
     Encodes the `Py_UNICODE' buffer of the given size using
     Raw-Unicode-Escape and returns a Python string object.  Returns
     `NULL' in case an exception was raised by the codec.

`PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)'
     Encodes a Unicode object using Raw-Unicode-Escape and returns the
     result as a Python string object. Error handling is "strict". Returns
     `NULL' in case an exception was raised by the codec.
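
The difference between the two escape codecs is easiest to see on a
small input. The sketch below assumes a current Python 3 interpreter and
the Python-level codec names "unicode_escape" and "raw_unicode_escape";
it shows that both write non-Latin-1 characters as \uXXXX escapes, while
only the first also escapes control characters such as a newline.

```python
text = "\u20ac\n"  # EURO SIGN followed by a newline

escaped = text.encode("unicode_escape")
raw_escaped = text.encode("raw_unicode_escape")

# Both forms decode back to the original text with the matching codec.
round_trip = escaped.decode("unicode_escape")
raw_round_trip = raw_escaped.decode("raw_unicode_escape")
```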

These are the Latin-1 codec APIs:

Latin-1 corresponds to the first 256 Unicode ordinals and only these
are accepted by the codecs during encoding.

`PyObject* PyUnicode_DecodeLatin1(const char *s, int size, const char *errors)'
     Creates a Unicode object by decoding SIZE bytes of the Latin-1
     encoded string S. Returns `NULL' in case an exception was raised
     by the codec.

`PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, int size, const char *errors)'
     Encodes the `Py_UNICODE' buffer of the given size using Latin-1
     and returns a Python string object.  Returns `NULL' in case an
     exception was raised by the codec.

`PyObject* PyUnicode_AsLatin1String(PyObject *unicode)'
     Encodes a Unicode object using Latin-1 and returns the result as a
     Python string object. Error handling is "strict". Returns `NULL'
     in case an exception was raised by the codec.
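
The one-to-one relation between Latin-1 bytes and the first 256 Unicode
ordinals can be checked directly. This is a minimal sketch, assuming a
current Python 3 interpreter and the Python-level "latin-1" codec rather
than the C functions above.

```python
# Every byte value decodes to the Unicode character with the same ordinal.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
same_ordinals = [ord(ch) for ch in decoded] == list(range(256))

# Ordinals above 255 cannot be encoded under "strict" handling.
try:
    "\u0100".encode("latin-1")
    rejected = False
except ValueError:  # UnicodeEncodeError is a subclass of ValueError
    rejected = True
```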

These are the ASCII codec APIs.  Only 7-bit ASCII data is accepted. All
other codes generate errors.

`PyObject* PyUnicode_DecodeASCII(const char *s, int size, const char *errors)'
     Creates a Unicode object by decoding SIZE bytes of the ASCII
     encoded string S. Returns `NULL' in case an exception was raised
     by the codec.

`PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, int size, const char *errors)'
     Encodes the `Py_UNICODE' buffer of the given size using ASCII and
     returns a Python string object.  Returns `NULL' in case an
     exception was raised by the codec.

`PyObject* PyUnicode_AsASCIIString(PyObject *unicode)'
     Encodes a Unicode object using ASCII and returns the result as a
     Python string object. Error handling is "strict". Returns `NULL'
     in case an exception was raised by the codec.

These are the mapping codec APIs:

This codec is special in that it can be used to implement many
different codecs (and this is in fact what was done to obtain most of
the standard codecs included in the `encodings' package). The codec
uses mappings to encode and decode characters.

Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals)
or None (meaning "undefined mapping" and causing an error).

Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals)
or None (meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the __getitem__ mapping
interface.

If a character lookup fails with a LookupError, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode
or Latin-1 ordinal, respectively. Because of this, mappings only need
to contain those entries which map characters to different code points.
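
The decoding rules above can be sketched in pure Python.
decode_charmap_sketch() and sample_map are hypothetical names invented
for this illustration. The sketch assumes a current Python 3 interpreter
and keys the mapping by integer byte values, which is a simplification
made here. It follows the rules as this section states them and is not a
model of any particular interpreter's charmap codec.

```python
def decode_charmap_sketch(data, mapping):
    """Hypothetical decoder following the rules described above."""
    out = []
    for byte in data:  # iterating over bytes yields integers 0..255
        try:
            target = mapping[byte]
        except LookupError:
            # No entry: copy as-is, reading the byte as a Unicode ordinal.
            out.append(chr(byte))
            continue
        if target is None:
            # None means "undefined mapping" and is an error.
            raise ValueError("byte 0x%02x has an undefined mapping" % byte)
        # Integers are Unicode ordinals; strings are used directly.
        out.append(chr(target) if isinstance(target, int) else target)
    return "".join(out)

# Assumed sample mapping: only the entries that differ from the identity.
sample_map = {0x80: 0x20AC, 0x81: None}
text = decode_charmap_sketch(b"A\x80", sample_map)
```

The byte for "A" has no entry and is copied unchanged, 0x80 is mapped to
U+20AC, and decoding a 0x81 byte with this mapping raises ValueError.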

`PyObject* PyUnicode_DecodeCharmap(const char *s, int size, PyObject *mapping, const char *errors)'
     Creates a Unicode object by decoding SIZE bytes of the encoded
     string S using the given MAPPING object.  Returns `NULL' in case
     an exception was raised by the codec.

`PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, int size, PyObject *mapping, const char *errors)'
     Encodes the `Py_UNICODE' buffer of the given size using the given
     MAPPING object and returns a Python string object.  Returns `NULL'
     in case an exception was raised by the codec.

`PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)'
     Encodes a Unicode object using the given MAPPING object and
     returns the result as a Python string object. Error handling is
     "strict". Returns `NULL' in case an exception was raised by the
     codec.

The following codec API is special in that it maps Unicode to Unicode.

`PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, int size, PyObject *table, const char *errors)'
     Translates a `Py_UNICODE' buffer of the given length by applying a
     character mapping TABLE to it and returns the resulting Unicode
     object.  Returns `NULL' when an exception was raised by the codec.

     The TABLE mapping must map Unicode ordinal integers to Unicode
     ordinal integers or None (causing deletion of the character).

     Mapping tables need only provide the __getitem__ interface, e.g.
     dictionaries or sequences. Unmapped character ordinals (ones which
     cause a LookupError) are left untouched and are copied as-is.
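
The table rules can be tried from Python code with the translate()
method of strings. The sketch below assumes a current Python 3
interpreter, where str.translate() accepts the same kind of table; it
shows a dictionary table and a sequence table.

```python
# Dictionary table keyed by Unicode ordinals; values are ordinals or None.
table = {
    ord("a"): ord("A"),  # map "a" to "A"
    ord("b"): None,      # delete "b"
}
result = "abcabc".translate(table)  # "c" is unmapped and copied as-is

# Sequence table: the index is the ordinal to look up.
seq_table = list(range(128))
seq_table[ord(" ")] = ord("_")
# U+00E9 is beyond the end of the list (a LookupError), so it is untouched.
seq_result = "a b\u00e9".translate(seq_table)
```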

These are the MBCS codec APIs. They are currently only available on
Windows and use the Win32 MBCS converters to implement the conversions.
Note that MBCS (or DBCS) is a class of encodings, not just one.  The
target encoding is defined by the user settings on the machine running
the codec.

`PyObject* PyUnicode_DecodeMBCS(const char *s, int size, const char *errors)'
     Creates a Unicode object by decoding SIZE bytes of the MBCS
     encoded string S.  Returns `NULL' in case an exception was raised
     by the codec.

`PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, int size, const char *errors)'
     Encodes the `Py_UNICODE' buffer of the given size using MBCS and
     returns a Python string object.  Returns `NULL' in case an
     exception was raised by the codec.

`PyObject* PyUnicode_AsMBCSString(PyObject *unicode)'
     Encodes a Unicode object using MBCS and returns the result as a
     Python string object.  Error handling is "strict".  Returns `NULL'
     in case an exception was raised by the codec.
