This document describes the implementation of an architecture dependent binary
cache file. The goal of this cache file is two fold, firstly to speed loading
and processing of the package file array and secondly to reduce memory
consumption of the package file array.
The implementation is aimed at an environment with many primary package files,
for instance someone that has a Package file for their CD-ROM, a Package file
for the latest version of the distribution on the CD-ROM and a package file for
the development version. Always present is the information contained in the
status file which might be considered a separate package file.
Please understand, this is designed as a -CACHE FILE- it is not meant to be
used on any system other than the one it was created for. It is not meant to
be authoritative either, i.e. if a system crash or software failure occurs it
must be perfectly acceptable for the cache file to be in an inconsistent state.
Furthermore at any time the cache file may be erased without losing any
information.
Also the structures and storage layout is optimized for use by the APT GUI and
may not be suitable for all purposes. However it should be possible to extend
it with associate cache files that contain other information.
To keep memory use down the cache file only contains often used fields and
fields that are inexpensive to store, the Package file has a full list of
fields. Also the client may assume that all items are perfectly valid and need
not perform checks against their correctness. Removal of information from the
cache is possible, but blanks will be left in the file, and unused strings will
also be present. The recommended implementation is to simply rebuild the cache
each time any of the data files change. It is possible to add a new package
file to the cache without any negative side effects.
Every item in every structure is stored as the index to that structure. What
this means is that once the files is mmaped every data access has to go through
a fixup stage to get a real memory pointer. This is done by taking the index,
multiplying it by the type size and then adding it to the start address of the
memory block. This sounds complex, but in C it is a single array dereference.
Because all items are aligned to their size and indexes are stored as multiples
of the size of the structure the format is immediately portable to all possible
architectures - BUT the generated files are -NOT-.
Notice the lack of casting or multiplication. The net result is to return the
name of the first package in the first hash bucket, without error checks.
The generator uses allocation pools to group similarly sized structures in
large blocks to eliminate any alignment overhead. The generator also assures
that no structures overlap and all indexes are unique. Although at first
glance it may seem like there is the potential for two structures to exist at
the same point the generator never allows this to happen. (See the discussion
of free space pools)