APT Cache File Format
Chapter 1 Introduction

1.1 Purpose

This document describes the implementation of an architecture dependent binary cache file. The goal of this cache file is two fold, firstly to speed loading and processing of the package file array and secondly to reduce memory consumption of the package file array.

The implementation is aimed at an environment with many primary package files, for instance someone that has a Package file for their CD-ROM, a Package file for the latest version of the distribution on the CD-ROM and a package file for the development version. Always present is the information contained in the status file which might be considered a separate package file.

Please understand, this is designed as a -CACHE FILE- it is not meant to be used on any system other than the one it was created for. It is not meant to be authoritative either, i.e. if a system crash or software failure occurs it must be perfectly acceptable for the cache file to be in an inconsistent state. Furthermore at any time the cache file may be erased without losing any information.

Also the structures and storage layout is optimized for use by the APT GUI and may not be suitable for all purposes. However it should be possible to extend it with associate cache files that contain other information.

To keep memory use down the cache file only contains often used fields and fields that are inexpensive to store, the Package file has a full list of fields. Also the client may assume that all items are perfectly valid and need not perform checks against their correctness. Removal of information from the cache is possible, but blanks will be left in the file, and unused strings will also be present. The recommended implementation is to simply rebuild the cache each time any of the data files change. It is possible to add a new package file to the cache without any negative side effects.

1.1.1 Note on Pointer access

Every item in every structure is stored as the index to that structure. What this means is that once the files is mmaped every data access has to go through a fixup stage to get a real memory pointer. This is done by taking the index, multiplying it by the type size and then adding it to the start address of the memory block. This sounds complex, but in C it is a single array dereference. Because all items are aligned to their size and indexes are stored as multiples of the size of the structure the format is immediately portable to all possible architectures - BUT the generated files are -NOT-.

This scheme allows code like this to be written:

       void *Map = mmap(...);
       Package *PkgList = (Package *)Map;
       Header *Head = (Header *)Map;
       char *Strings = (char *)Map;
       cout << (Strings + PkgList[Head->HashTable[0]]->Name) << endl;

Notice the lack of casting or multiplication. The net result is to return the name of the first package in the first hash bucket, without error checks.

The generator uses allocation pools to group similarly sized structures in large blocks to eliminate any alignment overhead. The generator also assures that no structures overlap and all indexes are unique. Although at first glance it may seem like there is the potential for two structures to exist at the same point the generator never allows this to happen. (See the discussion of free space pools)

[ Abstract ] [ Copyright Notice ] [ Contents ] [ next ]

APT Cache File Format

$Id: cache.sgml,v 1.9 2001/04/04 05:00:14 jgg Exp $
Jason Gunthorpe jgg@debian.org

APT Cache File Format Chapter 1 Introduction

1.1 Purpose

1.1.1 Note on Pointer access

APT Cache File Format
Chapter 1 Introduction