GNU Info

Info Node: (tar.info)cpio

(tar.info)cpio


Prev: Extensions Up: Formats
Enter node , (file) or (file)node

Comparison of `tar' and `cpio'
==============================

     _(This message will disappear, once this node revised.)_

   The `cpio' archive formats, like `tar', do have maximum pathname
lengths.  The binary and old ASCII formats have a max path length of
256, and the new ASCII and CRC ASCII formats have a max path length of
1024.  GNU `cpio' can read and write archives with arbitrary pathname
lengths, but other `cpio' implementations may crash unexplainedly
trying to read them.

   `tar' handles symbolic links in the form in which it comes in BSD;
`cpio' doesn't handle symbolic links in the form in which it comes in
System V prior to SVR4, and some vendors may have added symlinks to
their system without enhancing `cpio' to know about them.  Others may
have enhanced it in a way other than the way I did it at Sun, and which
was adopted by AT&T (and which is, I think, also present in the `cpio'
that Berkeley picked up from AT&T and put into a later BSD release--I
think I gave them my changes).

   (SVR4 does some funny stuff with `tar'; basically, its `cpio' can
handle `tar' format input, and write it on output, and it probably
handles symbolic links.  They may not have bothered doing anything to
enhance `tar' as a result.)

   `cpio' handles special files; traditional `tar' doesn't.

   `tar' comes with V7, System III, System V, and BSD source; `cpio'
comes only with System III, System V, and later BSD (4.3-tahoe and
later).

   `tar''s way of handling multiple hard links to a file can handle
file systems that support 32-bit inumbers (e.g., the BSD file system);
`cpio's way requires you to play some games (in its "binary" format,
i-numbers are only 16 bits, and in its "portable ASCII" format, they're
18 bits--it would have to play games with the "file system ID" field of
the header to make sure that the file system ID/i-number pairs of
different files were always different), and I don't know which `cpio's,
if any, play those games.  Those that don't might get confused and
think two files are the same file when they're not, and make hard links
between them.

   `tar's way of handling multiple hard links to a file places only one
copy of the link on the tape, but the name attached to that copy is the
_only_ one you can use to retrieve the file; `cpio's way puts one copy
for every link, but you can retrieve it using any of the names.

     What type of check sum (if any) is used, and how is this
     calculated.

   See the attached manual pages for `tar' and `cpio' format.  `tar'
uses a checksum which is the sum of all the bytes in the `tar' header
for a file; `cpio' uses no checksum.

     If anyone knows why `cpio' was made when `tar' was present at the
     unix scene,

   It wasn't.  `cpio' first showed up in PWB/UNIX 1.0; no
generally-available version of UNIX had `tar' at the time.  I don't
know whether any version that was generally available _within AT&T_ had
`tar', or, if so, whether the people within AT&T who did `cpio' knew
about it.

   On restore, if there is a corruption on a tape `tar' will stop at
that point, while `cpio' will skip over it and try to restore the rest
of the files.

   The main difference is just in the command syntax and header format.

   `tar' is a little more tape-oriented in that everything is blocked
to start on a record boundary.

     Is there any differences between the ability to recover crashed
     archives between the two of them. (Is there any chance of
     recovering crashed archives at all.)

   Theoretically it should be easier under `tar' since the blocking
lets you find a header with some variation of `dd skip=NN'.  However,
modern `cpio''s and variations have an option to just search for the
next file header after an error with a reasonable chance of resyncing.
However, lots of tape driver software won't allow you to continue past
a media error which should be the only reason for getting out of sync
unless a file changed sizes while you were writing the archive.

     If anyone knows why `cpio' was made when `tar' was present at the
     unix scene, please tell me about this too.

   Probably because it is more media efficient (by not blocking
everything and using only the space needed for the headers where `tar'
always uses 512 bytes per file header) and it knows how to archive
special files.

   You might want to look at the freely available alternatives.  The
major ones are `afio', GNU `tar', and `pax', each of which have their
own extensions with some backwards compatibility.

   Sparse files were `tar'red as sparse files (which you can easily
test, because the resulting archive gets smaller, and GNU `cpio' can no
longer read it).


automatically generated by info2www version 1.2.2.9