GNU Info

Info Node: (python2.1-lib.info)pickle


Python object serialization
===========================

Convert Python objects to streams of bytes and back.

The `pickle' module implements a basic but powerful algorithm for
"pickling" (a.k.a. serializing, marshalling or flattening) nearly
arbitrary Python objects.  This is the act of converting objects to a
stream of bytes (and back: "unpickling").  This is a more primitive
notion than persistence -- although `pickle' reads and writes file
objects, it does not handle the issue of naming persistent objects, nor
the (even more complicated) area of concurrent access to persistent
objects.  The `pickle' module can transform a complex object into a
byte stream and it can transform the byte stream into an object with
the same internal structure.  The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database.  The module
`shelve' provides a simple interface to pickle and unpickle objects on
DBM-style database files.

*Note:* The `pickle' module is rather slow.  A reimplementation of the
same algorithm in C, which is up to 1000 times faster, is available as
the `cPickle' module.  This has the same interface except that
`Pickler' and `Unpickler' are factory functions, not classes (so they
cannot be used as base classes for inheritance).

Although the `pickle' module can use the built-in module `marshal'
internally, it differs from `marshal' in the way it handles certain
kinds of data:

   * Recursive objects (objects containing references to themselves):
     `pickle' keeps track of the objects it has already serialized, so
     later references to the same object won't be serialized again.
     (The `marshal' module cannot handle such objects.)

   * Object sharing (references to the same object in different
     places):  This is similar to self-referencing objects; `pickle'
     stores the object once, and ensures that all other references
     point to the master copy.  Shared objects remain shared, which can
     be very important for mutable objects.

   * User-defined classes and their instances:  `marshal' does not
     support these at all, but `pickle' can save and restore class
     instances transparently.  The class definition must be importable
     and live in the same module as when the object was stored.
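
The first two points can be checked with a short round trip.  The
following is only a sketch: it assumes a current Python 3 interpreter
(where `dumps()' returns a bytes object) rather than the release
documented here, and the variable names are illustrative.

```python
import pickle

# A recursive object: the list contains a reference to itself.
loop = [1, 2]
loop.append(loop)

# A shared object: both dictionary entries refer to one list.
shared = ["payload"]
holder = {"a": shared, "b": shared}

loop_copy = pickle.loads(pickle.dumps(loop))
holder_copy = pickle.loads(pickle.dumps(holder))

# The self-reference survives the round trip.
print(loop_copy[2] is loop_copy)

# The two entries still point at one list, so a change made through
# one name is visible through the other.
holder_copy["a"].append("more")
print(holder_copy["b"])
```

Had the shared list been stored twice, the append through `"a"' would
not have shown up under `"b"'.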


The data format used by `pickle' is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards
such as XDR (which can't represent pointer sharing); however it means
that non-Python programs may not be able to reconstruct pickled Python
objects.

By default, the `pickle' data format uses a printable ASCII
representation.  This is slightly more voluminous than a binary
representation.  The big advantage of using printable ASCII (and of
some other characteristics of `pickle''s representation) is that for
debugging or recovery purposes it is possible for a human to read the
pickled file with a standard text editor.

A binary format, which is slightly more efficient, can be chosen by
specifying a nonzero (true) value for the BIN argument to the `Pickler'
constructor or the `dump()' and `dumps()' functions.  The binary format
is not the default because of backwards compatibility with the Python
1.4 pickle module.  In a future version, the default may change to
binary.
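
The difference between the two formats can be seen in a small sketch.
Note the assumption: it is written for a current Python 3 interpreter,
where the choice is made with the `protocol' argument (0 for the text
format, 1 for the original binary format) instead of the BIN argument
described here.  The sample data is illustrative.

```python
import pickle

data = {"name": "spam", "values": [1, 2, 3]}

# Protocol 0 is the printable ASCII format.
text_form = pickle.dumps(data, protocol=0)

# Protocol 1 is the original binary format.
binary_form = pickle.dumps(data, protocol=1)

# The text form decodes cleanly as ASCII and can be read by a human.
print(text_form.decode("ascii"))

# The unpickler needs no hint about which format it is given.
assert pickle.loads(text_form) == data
assert pickle.loads(binary_form) == data
```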

The `pickle' module doesn't handle code objects, which the `marshal'
module does.  It could conceivably be extended to do so, but there is
little need for that as long as `marshal' continues to be used for
reading and writing code objects, and leaving them out at least avoids
the possibility of smuggling Trojan horses into a program.

For the benefit of persistence modules written using `pickle', it
supports the notion of a reference to an object outside the pickled
data stream.  Such objects are referenced by a name, which is an
arbitrary string of printable ASCII characters.  The resolution of such
names is not defined by the `pickle' module -- the persistent object
module will have to implement a method `persistent_load()'.  To write
references to persistent objects, the persistent module must define a
method `persistent_id()' which returns either `None' or the persistent
ID of the object.
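
A minimal sketch of the two hooks follows.  It assumes a current
Python 3 interpreter, where the hooks are supplied by subclassing
`Pickler' and `Unpickler'; the names `Record', `STORE', `StorePickler'
and `StoreUnpickler' are hypothetical and stand in for a real
persistence module.

```python
import io
import pickle

class Record:
    """Hypothetical object that lives outside the pickle stream."""
    def __init__(self, key, text):
        self.key = key
        self.text = text

# Hypothetical external store, keyed by persistent ID.
STORE = {"r1": Record("r1", "kept outside the pickle")}

class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return an ID for external objects, None for everything else.
        if isinstance(obj, Record):
            return obj.key
        return None

class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the ID back to the external object.
        return STORE[pid]

buf = io.BytesIO()
StorePickler(buf).dump({"ref": STORE["r1"], "n": 1})

buf.seek(0)
result = StoreUnpickler(buf).load()
print(result["ref"] is STORE["r1"])
```

Only the string `"r1"' travels in the byte stream; the `Record' itself
is looked up again when the data is loaded.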

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.
Furthermore, all its instance variables must be picklable.

When a pickled class instance is unpickled, its `__init__()' method is
normally _not_ invoked.  *Note:* This is a deviation from previous
versions of this module; the change was introduced in Python 1.5b2.
The reason for the change is that in many cases it is desirable to have
a constructor that requires arguments; it is a (minor) nuisance to have
to provide a `__getinitargs__()' method.
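
That `__init__()' is skipped can be observed directly.  The sketch
below assumes a current Python 3 interpreter and is run as a script (so
the class is importable from `__main__'); the `Counter' class is
hypothetical.

```python
import pickle

class Counter:
    init_calls = 0  # counts how often __init__() has run

    def __init__(self, start):
        Counter.init_calls += 1
        self.value = start

original = Counter(10)
restored = pickle.loads(pickle.dumps(original))

# The attribute was restored without running __init__() a second time.
print(restored.value, Counter.init_calls)
```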

If it is desirable that the `__init__()' method be called on
unpickling, a class can define a method `__getinitargs__()', which
should return a _tuple_ containing the arguments to be passed to the
class constructor (`__init__()').  This method is called at pickle
time; the tuple it returns is incorporated in the pickle for the
instance.

Classes can further influence how their instances are pickled -- if the
class defines the method `__getstate__()', it is called and the state
it returns is pickled as the contents for the instance, and if the class
defines the method `__setstate__()', it is called with the unpickled
state.  (Note that these methods can also be used to implement copying
class instances.)  If there is no `__getstate__()' method, the
instance's `__dict__' is pickled.  If there is no `__setstate__()'
method, the pickled object must be a dictionary and its items are
assigned to the new instance's dictionary.  (If a class defines both
`__getstate__()' and `__setstate__()', the state object needn't be a
dictionary -- these methods can do what they want.)  This protocol is
also used by the shallow and deep copying operations defined in the
`copy' module.
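
As an illustration, a class may leave a transient attribute out of its
pickled state and rebuild it on loading.  This is a sketch only,
assuming a current Python 3 interpreter; the `Cache' class and its
attributes are hypothetical.

```python
import pickle

class Cache:
    def __init__(self, source):
        self.source = source
        self.hits = {}          # transient data, not worth saving

    def __getstate__(self):
        # The state need not be a dictionary; a tuple is used here.
        return (self.source,)

    def __setstate__(self, state):
        # Restore the saved part and start with an empty cache.
        (self.source,) = state
        self.hits = {}

cache = Cache("disk")
cache.hits["key"] = "value"

restored = pickle.loads(pickle.dumps(cache))
print(restored.source, restored.hits)
```

Because both methods are defined, the state can take whatever form
suits the class.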

Note that when class instances are pickled, their class's code and data
are not pickled along with them.  Only the instance data are pickled.
This is done on purpose, so you can fix bugs in a class or add methods
and still load objects that were created with an earlier version of the
class.  If you plan to have long-lived objects that will see many
versions of a class, it may be worthwhile to put a version number in
the objects so that suitable conversions can be made by the class's
`__setstate__()' method.
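
One possible way to carry a version number is sketched below, assuming
a current Python 3 interpreter.  The `Point' class, its `VERSION'
attribute and the older `x'/`y' layout are all hypothetical.

```python
import pickle

class Point:
    VERSION = 2

    def __init__(self, x, y):
        self.coords = (x, y)

    def __getstate__(self):
        # Store the instance data together with a version number.
        state = self.__dict__.copy()
        state["version"] = Point.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version == 1:
            # Hypothetical older layout: separate x and y attributes.
            state["coords"] = (state.pop("x"), state.pop("y"))
        self.__dict__.update(state)

# Round trip with the current layout.
new_point = pickle.loads(pickle.dumps(Point(1, 2)))

# Simulate state that was saved by the older layout.
old_point = Point.__new__(Point)
old_point.__setstate__({"x": 3, "y": 4})

print(new_point.coords, old_point.coords)
```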

When a class itself is pickled, only its name is pickled -- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.

The interface can be summarized as follows.

To pickle an object `x' onto a file `f', open for writing:

     p = pickle.Pickler(f)
     p.dump(x)

A shorthand for this is:

     pickle.dump(x, f)

To unpickle an object `x' from a file `f', open for reading:

     u = pickle.Unpickler(f)
     x = u.load()

A shorthand is:

     x = pickle.load(f)
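
Put together, a complete round trip through a file might look like the
following sketch.  It assumes a current Python 3 interpreter, where the
file must be opened in binary mode; the scratch file and the sample
data are illustrative.

```python
import os
import pickle
import tempfile

x = {"spam": [1, 2, 3], "eggs": ("a", "b")}

# Create a scratch file; its name is generated, not meaningful.
fd, path = tempfile.mkstemp(suffix=".pkl")
os.close(fd)

try:
    with open(path, "wb") as f:      # open for writing, binary mode
        pickle.dump(x, f)
    with open(path, "rb") as f:      # open for reading, binary mode
        y = pickle.load(f)
finally:
    os.remove(path)

print(y == x, y is x)
```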

The `Pickler' class only calls the method `f.write()' with a string
argument.  The `Unpickler' calls the methods `f.read()' (with an
integer argument) and `f.readline()' (without argument), both returning
a string.  It is explicitly allowed to pass non-file objects here, as
long as they have the right methods.
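
For instance, two small classes that provide only these methods are
enough.  The sketch assumes a current Python 3 interpreter, where the
methods exchange bytes objects rather than strings; `ChunkSink' and
`ChunkSource' are hypothetical names.

```python
import pickle

class ChunkSink:
    """Write-only object: collects whatever the pickler writes."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))

class ChunkSource:
    """Read-only object: serves bytes from an in-memory buffer."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, n):
        piece = self.data[self.pos:self.pos + n]
        self.pos += len(piece)
        return piece

    def readline(self):
        end = self.data.find(b"\n", self.pos)
        end = len(self.data) if end < 0 else end + 1
        piece = self.data[self.pos:end]
        self.pos = end
        return piece

value = ["spam", 42, {"k": 1.5}]

sink = ChunkSink()
pickle.Pickler(sink).dump(value)

source = ChunkSource(b"".join(sink.chunks))
restored = pickle.Unpickler(source).load()
print(restored)
```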

The constructor for the `Pickler' class has an optional second
argument, BIN.  If this is present and true, the binary pickle format
is used; if it is absent or false, the (less efficient, but backwards
compatible) text pickle format is used.  The `Unpickler' class does not
have an argument to distinguish between binary and text pickle formats;
it accepts either format.

The following types can be pickled:

   * `None'

   * integers, long integers, floating point numbers

   * normal and Unicode strings

   * tuples, lists and dictionaries containing only picklable objects

   * functions defined at the top level of a module (by name reference,
     not storage of the implementation)

   * built-in functions

   * classes that are defined at the top level in a module

   * instances of such classes whose `__dict__', or the state returned
     by `__getstate__()', is picklable


Attempts to pickle unpicklable objects will raise the `PicklingError'
exception; when this happens, an unspecified number of bytes may have
been written to the file.
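
The by-name rule for functions and classes, and the failure case, can
be tried out as follows.  This sketch assumes a current Python 3
interpreter, where an unpicklable object may raise `TypeError' or
`AttributeError' as well as `PicklingError'; the helper `can_pickle()'
is hypothetical.

```python
import pickle
from collections import OrderedDict

def can_pickle(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

# Functions and classes are stored by name and looked up again on
# loading, so the very same objects come back.
same_builtin = pickle.loads(pickle.dumps(len)) is len
same_class = pickle.loads(pickle.dumps(OrderedDict)) is OrderedDict

# A lambda has no importable top-level name, so it cannot be pickled.
lambda_ok = can_pickle(lambda n: n + 1)

print(same_builtin, same_class, lambda_ok)
```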

It is possible to make multiple calls to the `dump()' method of the
same `Pickler' instance.  These must then be matched to the same number
of calls to the `load()' method of the corresponding `Unpickler'
instance.  If the same object is pickled by multiple `dump()' calls,
the `load()' calls will all yield references to the same object.
_Warning_: this is intended for pickling multiple objects without
intervening modifications to the objects or their parts.  If you modify
an object and then pickle it again using the same `Pickler' instance,
the object is not pickled again -- a reference to it is pickled and the
`Unpickler' will return the old value, not the modified one.  (Two
problems remain open here: (a) detecting changes, and (b) marshalling a
minimal set of changes.  Garbage collection may also become a problem.)
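
The pitfall is easy to reproduce.  The sketch below assumes a current
Python 3 interpreter and uses an in-memory `io.BytesIO' buffer in place
of a file; the variable names are illustrative.

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

items = [1, 2]
p.dump(items)        # the first dump stores the full list
items.append(3)
p.dump(items)        # the second dump stores only a reference

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()

# Both loads yield the same object, holding the old value.
print(first is second, second)
```

Using a fresh `Pickler' for the second `dump()' would store the
modified list in full.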

Apart from the `Pickler' and `Unpickler' classes, the module defines
the following functions, and an exception:

`dump(object, file[, bin])'
     Write a pickled representation of OBJECT to the open file object
     FILE.  This is equivalent to `Pickler(FILE, BIN).dump(OBJECT)'.
     If the optional BIN argument is present and nonzero, the binary
     pickle format is used; if it is zero or absent, the (less
     efficient) text pickle format is used.

`load(file)'
     Read a pickled object from the open file object FILE.  This is
     equivalent to `Unpickler(FILE).load()'.

`dumps(object[, bin])'
     Return the pickled representation of the object as a string,
     instead of writing it to a file.  If the optional BIN argument is
     present and nonzero, the binary pickle format is used; if it is
     zero or absent, the (less efficient) text pickle format is used.

`loads(string)'
     Read a pickled object from a string instead of a file.  Characters
     in the string past the pickled object's representation are ignored.

`PicklingError'
     This exception is raised when an unpicklable object is passed to
     `Pickler.dump()'.
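
The string-based functions `dumps()' and `loads()' can be exercised
without any file.  This is a sketch assuming a current Python 3
interpreter, where the pickled representation is a bytes object; the
sample value is illustrative.

```python
import pickle

s = pickle.dumps(("spam", 7))

# Data after the end of the pickled representation is ignored.
padded = s + b"trailing bytes"
value = pickle.loads(padded)

print(value)
```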

See also:
     `copy_reg': Pickle interface constructor registration for
     extension types.

     `shelve': Indexed databases of objects; uses `pickle'.

     `copy': Shallow and deep object copying.

     `marshal': High-performance serialization of built-in types.

