Info Node: (python2.1-lib.info)codecs

Codec registry and base classes
===============================

Encode and decode data and streams.  This module and this manual
section were written by Marc-Andre Lemburg <mal@lemburg.com>.

This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec registry
which manages the codec lookup process.

It defines the following functions:

`register(search_function)'
     Register a codec search function. Search functions are expected to
     take one argument, the encoding name in all lower case letters, and
     return a tuple of functions `(ENCODER, DECODER, STREAM_READER,
     STREAM_WRITER)' that meet the following requirements:

     ENCODER and DECODER: These must be functions or methods which have
     the same interface as the `encode()'/`decode()' methods of Codec
     instances (see Codec Interface). The functions/methods are
     expected to work in a stateless mode.

     STREAM_READER and STREAM_WRITER: These have to be factory
     functions providing the following interface:

     `factory(STREAM, ERRORS='strict')'

     The factory functions must return objects providing the interfaces
     defined by the base classes `StreamWriter' and `StreamReader',
     respectively. Stream codecs can maintain state.

     Possible values for errors are `'strict'' (raise an exception in
     case of an encoding error), `'replace'' (replace malformed data
     with a suitable replacement marker, such as `?') and `'ignore''
     (ignore malformed data and continue without further notice).

     In case a search function cannot find a given encoding, it should
     return `None'.
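
As a rough illustration of the search function protocol described
above, the sketch below registers a function that answers for a single
made-up encoding name. The names `demo_search' and `demoalias' are
hypothetical and chosen only for this example; the function simply
hands back the tuple of the existing UTF-8 codec rather than
implementing a codec of its own. It assumes a current Python 3
interpreter, which postdates this manual.

```python
import codecs

def demo_search(name):
    # 'name' is passed in lower case by the registry.
    # "demoalias" is a hypothetical encoding name used for illustration.
    if name == "demoalias":
        # Reuse the codec tuple of an encoding that already exists.
        return codecs.lookup("utf-8")
    # Unknown encoding: tell the registry to keep searching.
    return None

codecs.register(demo_search)

# The first two members of the tuple are the stateless encoder/decoder.
encoder, decoder = codecs.lookup("demoalias")[:2]
encoded, consumed = encoder(u"caf\xe9")
decoded, _ = decoder(encoded)
```

Here `encoder' and `decoder' each return a pair of the converted data
and the amount of input consumed, which is why the results are
unpacked into two names.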

`lookup(encoding)'
     Look up a codec tuple in the Python codec registry and return the
     function tuple as defined above.

     Encodings are first looked up in the registry's cache. If not
     found, the list of registered search functions is scanned. If no
     codec tuple is found, a `LookupError' is raised. Otherwise, the
     codec tuple is stored in the cache and returned to the caller.
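
The following sketch shows one way the tuple returned by `lookup()'
might be used: the stateless encoder with each of the three ERRORS
values, the STREAM_WRITER factory, and the `LookupError' raised for an
unknown name. The encoding name `no_such_codec_for_demo' is made up
for the example, and `io.BytesIO' stands in for a real file; both are
assumptions of this sketch, which targets a current Python 3
interpreter.

```python
import codecs
import io

encoder, decoder, reader_factory, writer_factory = codecs.lookup("ascii")

# 'replace' substitutes a marker, 'ignore' silently drops the character.
replaced = encoder(u"caf\xe9", "replace")[0]
ignored = encoder(u"caf\xe9", "ignore")[0]

# 'strict' raises an exception derived from ValueError.
try:
    encoder(u"caf\xe9", "strict")
    strict_failed = False
except ValueError:
    strict_failed = True

# The stream writer factory takes the stream and the error mode.
buf = io.BytesIO()
writer = writer_factory(buf, "replace")
writer.write(u"na\xefve")
written = buf.getvalue()

# An unknown encoding name makes lookup() raise LookupError.
try:
    codecs.lookup("no_such_codec_for_demo")
    found = True
except LookupError:
    found = False
```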

To simplify working with encoded files or streams, the module also
defines these utility functions:

`open(filename, mode[, encoding[, errors[, buffering]]])'
     Open an encoded file using the given MODE and return a wrapped
     version providing transparent encoding/decoding.

     *Note:* The wrapped version will only accept the object format
     defined by the codecs, i.e. Unicode objects for most built-in
     codecs.  Output is also codec-dependent and will usually be
     Unicode as well.

     ENCODING specifies the encoding which is to be used for the
     file.

     ERRORS may be given to define the error handling. It defaults to
     `'strict'' which causes a `ValueError' to be raised in case an
     encoding error occurs.

     BUFFERING has the same meaning as for the built-in `open()'
     function.  It defaults to line buffered.
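
A minimal sketch of `open()' in use, assuming a current Python 3
interpreter and a scratch file in a temporary directory (the file name
`demo.txt' is arbitrary). Unicode text goes in and comes out of the
wrapped file, while the bytes on disk are UTF-8.

```python
import codecs
import os
import tempfile

# Scratch location for the example; the file name is arbitrary.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Write Unicode text; the wrapper encodes it as UTF-8 on the way out.
f = codecs.open(path, "w", "utf-8")
f.write(u"caf\xe9\n")
f.close()

# Read it back; the wrapper decodes the bytes into Unicode again.
f = codecs.open(path, "r", "utf-8")
text = f.read()
f.close()

# Inspect the raw bytes actually stored in the file.
with open(path, "rb") as raw_file:
    raw = raw_file.read()
```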

`EncodedFile(file, input[, output[, errors]])'
     Return a wrapped version of file which provides transparent
     encoding translation.

     Strings written to the wrapped file are interpreted according to
     the given INPUT encoding and then written to the original file as
     strings using the OUTPUT encoding. The intermediate encoding will
     usually be Unicode but depends on the specified codecs.

     If OUTPUT is not given, it defaults to INPUT.

     ERRORS may be given to define the error handling. It defaults to
     `'strict'', which causes `ValueError' to be raised in case an
     encoding error occurs.
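
The translation performed by `EncodedFile()' can be sketched as
follows, with Latin-1 as the INPUT encoding and UTF-8 as the OUTPUT
encoding. An in-memory `io.BytesIO' object stands in for the original
file, which is an assumption of this example, as is the use of a
current Python 3 interpreter.

```python
import codecs
import io

# In-memory stand-in for the original file.
raw = io.BytesIO()

# Data written is taken as Latin-1 and stored in the file as UTF-8.
wrapped = codecs.EncodedFile(raw, "latin-1", "utf-8")
wrapped.write(b"caf\xe9")
stored = raw.getvalue()

# Reading goes the other way: UTF-8 in the file, Latin-1 to the caller.
raw.seek(0)
read_back = codecs.EncodedFile(raw, "latin-1", "utf-8").read()
```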

The module also provides the following constants, which are useful for
reading and writing platform-dependent files:

`BOM'

`BOM_BE'

`BOM_LE'

`BOM32_BE'

`BOM32_LE'

`BOM64_BE'

`BOM64_LE'
     These constants define the byte order marks (BOM) used in data
     streams to indicate the byte order used in the stream or file.
     `BOM' is either `BOM_BE' or `BOM_LE' depending on the platform's
     native byte order, while the others represent big endian (`_BE'
     suffix) and little endian (`_LE' suffix) byte order using 32-bit
     and 64-bit encodings.
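
As a small sketch of how these constants might be used, the helper
below checks whether a block of data starts with `BOM_BE' or `BOM_LE'.
The function name `sniff_byte_order' is hypothetical, and the check
deliberately looks only at these two marks; it assumes a current
Python 3 interpreter.

```python
import codecs
import sys

def sniff_byte_order(data):
    # Hypothetical helper: report the byte order announced by a leading BOM.
    if data.startswith(codecs.BOM_BE):
        return "big"
    if data.startswith(codecs.BOM_LE):
        return "little"
    # No byte order mark present.
    return None

# BOM matches whichever of BOM_BE / BOM_LE is native to this platform.
native = "little" if codecs.BOM == codecs.BOM_LE else "big"

big_result = sniff_byte_order(codecs.BOM_BE + b"\x00A")
little_result = sniff_byte_order(codecs.BOM_LE + b"A\x00")
plain_result = sniff_byte_order(b"plain")
```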

See also:
    <http://sourceforge.net/projects/python-codecs/>
          A SourceForge project working on additional support for Asian
          codecs for use with Python.  They are in the early stages of
          development at the time of this writing -- look in their FTP
          area for downloadable files.
