GNU Info

Info Node: (tar.info)sparse

Archiving Sparse Files
----------------------

     _(This message will disappear, once this node is revised.)_

`-S'
`--sparse'
     Handle sparse files efficiently.

   This option causes every file being put in the archive to be tested
for sparseness, and handled specially if it is sparse.  The `--sparse'
(`-S') option is useful when many `dbm' files, for example, are being
backed up.  Using this option dramatically decreases the amount of
space needed to store such a file.

   In later versions, this option may be removed, and the testing and
treatment of sparse files may be done automatically with any special
GNU options.  For now, the option must be specified on the command line
when creating or updating an archive.

   Files in the filesystem occasionally have "holes."  A hole in a file
is a section of the file's contents which was never written.  The
contents of a hole read as all zeros.  On many operating systems,
actual disk storage is not allocated for holes, but they are counted in
the length of the file.  If you archive such a file, `tar' could create
an archive longer than the original.  To have `tar' attempt to
recognize the holes in a file, use `--sparse' (`-S').  When you use the
`--sparse' (`-S') option, then, for any file using less disk space than
would be expected from its length, `tar' searches the file for
consecutive stretches of zeros.  It then records in the archive for the
file where the consecutive stretches of zeros are, and only archives
the "real contents" of the file.  On extraction (`--sparse' (`-S') is
not needed on extraction), any such files have holes created wherever
the continuous stretches of zeros were found.  Thus, if you use
`--sparse' (`-S'), `tar' archives won't take more space than the
original.
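
   As a rough illustration of the scan described above, the following
Python sketch splits a byte string into fixed-size blocks and reports
the offset and length of each stretch that is not all zeros.  The
function name `data_segments' and the 512-byte block size are
assumptions made for this example only; this is not the algorithm GNU
`tar' actually implements.

```python
def data_segments(data, block_size=512):
    """Return (offset, length) pairs for runs of blocks that are not all zeros.

    Hypothetical helper for illustration; block_size is an assumed value.
    """
    segments = []
    start = None  # offset where the current run of real data began
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if block.count(0) == len(block):
            # All-zero block: treat it as part of a hole and close any open run.
            if start is not None:
                segments.append((start, offset - start))
                start = None
        elif start is None:
            # First block with real data after a hole (or at the start).
            start = offset
    if start is not None:
        # The data ran to the end of the input.
        segments.append((start, len(data) - start))
    return segments


# 512 bytes of data, a 1024-byte stretch of zeros, then 100 bytes of data.
sample = b"A" * 512 + bytes(1024) + b"B" * 100
print(data_segments(sample))  # [(0, 512), (1536, 100)]
```

   Only the listed segments would need to be stored; the gaps between
them can be recreated as holes on extraction.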

   A file is sparse if it contains blocks of zeros whose existence is
recorded, but that have no space allocated on disk.  When you specify
the `--sparse' (`-S') option in conjunction with the `--create' (`-c')
operation, `tar' tests all files for sparseness while archiving.  If
`tar' finds a file to be sparse, it uses a sparse representation of the
file in the archive.  See the node `create' for more information about
creating archives.
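
   The idea of a file "using less disk space than would be expected
from its length" can be sketched in Python by comparing the allocated
block count with the file length.  The function names here are
hypothetical, and the sketch assumes that `st_blocks' counts 512-byte
units, which holds on Linux but is not guaranteed on every system.  It
shows only the comparison, not how GNU `tar' itself performs the test.

```python
import os

BLOCK_UNIT = 512  # assumption: st_blocks is counted in 512-byte units


def allocated_less_than_length(size, blocks, unit=BLOCK_UNIT):
    """True if the allocated storage is smaller than the file length."""
    return blocks * unit < size


def looks_sparse(path):
    """Apply the comparison to a file on disk (needs a system with st_blocks)."""
    st = os.stat(path)
    return allocated_less_than_length(st.st_size, st.st_blocks)


# A 400-megabyte file with only 3 megabytes (6144 * 512 bytes) allocated.
print(allocated_less_than_length(400 * 2**20, 6144))  # True
# A 4096-byte file with all 8 of its 512-byte blocks allocated.
print(allocated_less_than_length(4096, 8))  # False
```

   A check like this can say that a file probably has holes, but not
where they are.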

   `--sparse' (`-S') is useful when archiving files, such as dbm files,
likely to contain many nulls.  This option dramatically decreases the
amount of space needed to store such an archive.

     *Please Note:* Always use `--sparse' (`-S') when performing file
     system backups, to avoid archiving the expanded forms of files
     stored sparsely in the system.

     Even if your system has no sparse files currently, some may be
     created in the future.  If you use `--sparse' (`-S') while making
     file system backups as a matter of course, you can be assured the
     archive will never take more space on the media than the files
     take on disk (otherwise, archiving a disk filled with sparse files
     might take hundreds of tapes).

   `tar' ignores the `--sparse' (`-S') option when reading an archive.

`--sparse'
`-S'
     Files stored sparsely in the file system are represented sparsely
     in the archive.  Use in conjunction with write operations.

   However, users should be well aware that at archive creation time,
GNU `tar' still has to read the whole disk file to locate the "holes",
and so, even if sparse files use little space on disk and in the
archive, they may sometimes require an inordinate amount of time for
reading and examining all-zero blocks of a file.  Although it works,
it's painfully slow for a large (sparse) file, even though the
resulting tar archive may be small.  (One user reports that dumping a
`core' file of over 400 megabytes, but with only about 3 megabytes of
actual data, took about 9 minutes on a Sun Sparcstation ELC, with full
CPU utilization.)

   This reading is required in all cases, whether or not the `--sparse'
(`-S') option is used, so by merely _not_ using the option, you are not
saving time(1).

   Programs like `dump' do not have to read the entire file; by
examining the file system directly, they can determine in advance
exactly where the holes are and thus avoid reading through them.  The
only data they need to read are the actual allocated data blocks.  GNU
`tar' uses a more portable and straightforward archiving approach, and
it would be fairly difficult for it to do otherwise.  Elizabeth Zwicky
writes to `comp.unix.internals', on 1990-12-10:

     What I did say is that you cannot tell the difference between a
     hole and an equivalent number of nulls without reading raw blocks.
     `st_blocks' at best tells you how many holes there are; it
     doesn't tell you _where_.  Just as programs may, conceivably, care
     what `st_blocks' is (care to name one that does?), they may also
     care where the holes are (I have no examples of this one either,
     but it's equally imaginable).

     I conclude from this that good archivers are not portable.  One can
     arguably conclude that if you want a portable program, you can in
     good conscience restore files with as many holes as possible,
     since you can't get it right.

   ---------- Footnotes ----------

   (1) Well!  We should say the whole truth, here.  When `--sparse'
(`-S') is selected while creating an archive, the current `tar'
algorithm requires sparse files to be read twice, not once.  We hope to
develop a new archive format for saving sparse files in which one pass
will be sufficient.

