GNU Info

Info Node: (tar.info)posix


GNU `tar' and POSIX `tar'
-------------------------

   GNU `tar' was based on an early draft of the POSIX 1003.1 `ustar'
standard.  GNU extensions to `tar', such as the support for file names
longer than 100 characters, use portions of the `tar' header record
which were specified in that POSIX draft as unused.  Subsequent changes
in POSIX have allocated the same parts of the header record for other
purposes.  As a result, GNU `tar' is incompatible with the current
POSIX spec, and with `tar' programs that follow it.

   We plan to reimplement these GNU extensions in a new way which is
upward compatible with the latest POSIX `tar' format, but we don't know
when this will be done.

   In the meantime, there is simply no telling what might happen if you
read a GNU `tar' archive that uses the GNU extensions with some other
`tar' program.  So if you want to read the archive with another
`tar' program, be sure to write it using the `--old-archive' option
(`-o').

   Traditionally, old `tar's limit file names to 100 characters.  GNU
`tar' attempted two different approaches to overcome this limit, using
and extending a format specified by a draft of P1003.1.  The first was
not very successful and involved `@MaNgLeD@' file names or the like; the
second used `././@LongLink' and other tricks, with better success.  In
theory, GNU `tar' should be able to handle file names of practically
unlimited length.  So if GNU `tar' fails to dump and retrieve files
whose names have more than 100 characters, that is indeed a bug in GNU
`tar'.
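
   As a rough illustration of the second approach, the sketch below (in
Python, not taken from the GNU `tar' sources) recognizes a
`././@LongLink' header and recovers the real name from the data that
follows it.  The field offsets are those of the usual 512-byte `tar'
header.  The type flag `L', and the convention that the real name is
stored as the data of that pseudo-member, are assumptions about the GNU
format that the text above does not spell out.  `read_long_name' is a
hypothetical helper.

```python
BLOCK = 512  # tar archives are made of 512-byte blocks


def read_long_name(archive):
    """Return the long name announced by a leading `././@LongLink'
    header in ARCHIVE (bytes), or None if there is no such header.

    Assumes type flag b"L" marks a long-name pseudo-member whose data
    is the real file name (assumption, see the lead-in).
    """
    header = archive[:BLOCK]
    name = header[:100].split(b"\0", 1)[0]   # `name' field, offset 0
    typeflag = header[156:157]               # `typeflag' field, offset 156
    if name != b"././@LongLink" or typeflag != b"L":
        return None
    # `size' field: octal ASCII digits at offset 124, 12 bytes wide.
    size_text = header[124:136].split(b"\0", 1)[0].decode("ascii").strip()
    size = int(size_text or "0", 8)
    data = archive[BLOCK:BLOCK + size]
    # The stored name may carry a trailing NUL; cut at the first one.
    return data.split(b"\0", 1)[0].decode("utf-8")
```

   A `tar' that does not know this convention sees only an ordinary
member named `././@LongLink', which is consistent with the behavior of
forgiving implementations described further down.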

   Under the strict POSIX draft, however, the limit was still 100
characters.  For various other purposes, GNU `tar' used areas left
unassigned in the POSIX draft.  POSIX later revised the P1003.1 `ustar'
format by assigning previously unused header fields, in such a way that
the upper limit for file name length was raised to 256 characters.
However, the actual POSIX limit varies between 100 and 256, depending
on the precise location of slashes in the full file name (this is
rather ugly).  Since GNU `tar' uses the same fields for quite different
purposes, it became incompatible with the latest POSIX standards.
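
   To see why the limit depends on where the slashes fall, here is a
minimal sketch in Python.  It assumes the `ustar' layout of a 100-byte
`name' field and a 155-byte `prefix' field, joined by an implied `/'
that is not stored.  `split_ustar_name' is a hypothetical helper, not
part of any `tar' implementation.

```python
NAME_SIZE = 100    # size of the ustar `name' field
PREFIX_SIZE = 155  # size of the ustar `prefix' field


def split_ustar_name(path):
    """Return (prefix, name) as bytes if PATH fits a ustar header,
    or None if no split at a `/' makes both parts fit."""
    raw = path.encode("utf-8")
    if len(raw) <= NAME_SIZE:
        return b"", raw
    # Try every `/' as the split point; the slash itself is dropped.
    for i, byte in enumerate(raw):
        if byte != ord("/"):
            continue
        prefix, name = raw[:i], raw[i + 1:]
        if 0 < len(prefix) <= PREFIX_SIZE and 0 < len(name) <= NAME_SIZE:
            return prefix, name
    return None
```

   Under these assumptions a 256-character name fits only when its
156th character is a `/', while a 152-character name made of a
50-character directory and a 101-character final component does not
fit at all.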

   For longer or non-fitting file names, we plan to use yet another set
of GNU extensions, but this time complying with the provisions POSIX
offers for extending the format, rather than conflicting with it.
Whenever an archive uses the old GNU `tar' extension format or POSIX
extensions, whether for very long file names or other special features,
the archive becomes non-portable to other `tar' implementations.  In
fact, anything can happen.  The most forgiving `tar's will merely
unpack the file under a wrong name, and maybe create another file named
something like `@LongName', with the true file name in it.  `tar's that
do not protect themselves may die with a segmentation violation!

   Compatibility concerns make all of this more difficult, as we will
have to support _all_ these things together for a while.  GNU `tar'
should be able to produce and read true POSIX format files, while being
able to detect old GNU `tar' formats, as well as the old V7 format, and
process them conveniently.  It will take years before this whole area
stabilizes...

   There are plans to raise this 100-character limit to 256 and still
produce POSIX-conforming archives.  Past 256, I do not know yet whether
GNU `tar' will go non-POSIX again, or merely refuse to archive the file.

   There are plans for GNU `tar' to support the latest POSIX format
more fully, while still being able to read the old V7 format, the GNU
format (semi-POSIX plus extensions), and full POSIX.  One may ask
whether there is any part of the POSIX format that we still cannot
support.  This simple question has a complex answer.  On closer
inspection some strong limitations may pop up, but so far nothing
sounds too difficult (but see below).  I only have the few pages of
POSIX describing the "Extended tar Format" (P1003.1-1990, section
10.1.1), and they refer to other parts of the standard that I do not
have, which would normally enforce limitations on stored file names (I
suspect things like fixing what `/' and `<NUL>' mean).  There are also
some points which the standard does not make clear.  Existing practice
will then drive what I should do.

   POSIX mandates that, when a file name cannot fit within 100 to 256
characters (the variance comes from the fact that a `/' is ideally
needed as the 156th character), or a link name cannot fit within 100
characters, a warning should be issued and the file _not_ be stored.
Unless some `--posix' option is given (or `POSIXLY_CORRECT' is set), I
suspect that GNU `tar' should disobey this specification, and
automatically switch to using GNU extensions to overcome file name or
link name length limitations.

   There is a problem, however, which I have not yet studied closely.
Given a truly POSIX archive with names having more than 100 characters,
I guess that GNU `tar' up to 1.11.8 will process it as if it were an
old V7 archive, and be fooled by some fields which are coded
differently.  So the question is whether the next generation of GNU
`tar' should produce POSIX format by default, whenever possible,
thereby producing archives that older versions of GNU `tar' might not
be able to read correctly.  I fear that we will have to face such a
choice one of these days, if we want GNU `tar' to move closer to POSIX.
We could rush it.  Another possibility is to produce the current GNU
`tar' format by default for a few years, but have GNU `tar' versions
from some 1.POSIX and up able to recognize all three formats, and let
older GNU `tar's fade out slowly.  Then we could switch to producing
POSIX format by default, with not much harm to those still having (by
then very old) GNU `tar' versions prior to 1.POSIX.

   POSIX format cannot represent very long names, volume headers,
splitting of files across multiple volumes, sparse files, or incremental
dumps; these would all be disallowed if `--posix' is given or
`POSIXLY_CORRECT' is set.  Otherwise, if `tar' is given long names, or
`-[VMSgG]', then it should automatically go non-POSIX.  I think this is
easily granted without much discussion.
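
   The rule proposed in this paragraph can be sketched as a small
decision function.  This is only an illustration in Python of the
policy as stated here, not of any released `tar'.  `choose_format' and
its parameters are hypothetical, and returning "posix" when no GNU
feature is needed is an assumption about the intended default.

```python
import os

# Options which, according to the text, POSIX format cannot represent.
GNU_ONLY_OPTIONS = {"-V", "-M", "-S", "-g", "-G"}


def choose_format(options, has_long_names, posix=False, environ=None):
    """Pick "posix" or "gnu" for a new archive (hypothetical policy).

    OPTIONS is an iterable of short options such as "-S".  POSIX
    stands for `--posix'.  ENVIRON defaults to the process environment
    and is consulted for `POSIXLY_CORRECT'.
    """
    env = os.environ if environ is None else environ
    strict = posix or "POSIXLY_CORRECT" in env
    needs_gnu = has_long_names or bool(GNU_ONLY_OPTIONS & set(options))
    if not needs_gnu:
        return "posix"
    if strict:
        # Strict POSIX was requested but cannot represent the request.
        raise ValueError("request needs GNU extensions, disallowed by POSIX mode")
    return "gnu"
```

   Note that this models only the strict reading given in this
paragraph, in which `--posix' disallows the GNU extensions outright.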

   Another point is that only `mtime' is stored in POSIX archives,
while GNU `tar' currently also stores `atime' and `ctime'.  If we want
GNU `tar' to move closer to POSIX, my choice would be to drop `atime'
and `ctime' support in the general case.  On the other hand, I perceive
that full dumps or incremental dumps need `atime' and `ctime' support,
so for those special applications, POSIX has to be avoided altogether.

   A few users requested that `--sparse' (`-S') be always active by
default.  I think that before replying to them, we have to decide
whether we want GNU `tar' to move closer to POSIX in general when
producing archives.  My choice would be to move closer to POSIX in the
long run.  Apart from a possible double reading of each file, I see no
reason not to try saving files as sparse when creating archives which
are neither POSIX nor old-V7, so the current `--sparse' (`-S') behavior
would be selected by default when producing such archives, whatever the
reason is.  So `--sparse' (`-S') alone might be redefined to force
GNU-format archives, and recover its previous meaning from this fact.

   GNU-format as it exists now can easily fool other POSIX `tar's, as it
uses fields which POSIX considers to be part of the file name prefix.
I wonder if it would not be a good idea, in the long run, to try
changing GNU-format so that any added field (like `ctime', `atime', file
offset in subsequent volumes, or sparse file descriptions) is wholly
and always pushed into an extension block, instead of using space in
the POSIX header block.  I could manage to do that portably between
future GNU `tar's.  Other POSIX `tar's might then at least be able to
provide more or less correct listings for the archives produced by GNU
`tar', even if unable to process them otherwise.

   Using these projected extensions might cause older `tar's to fail.
We would use the same approach as for POSIX.  I'll put out a `tar'
capable of reading POSIXier, yet extended, archives, but it will not
produce this format by default in GNU mode.  In a few years, when newer
GNU `tar's have displaced `tar' 1.11.X and earlier, we could switch to
producing POSIXier extended archives, with no real harm to users, as
almost all existing GNU `tar's will be ready to read the POSIXier
format.  In fact, I'll make both changes at the same time, in a few
years, and just prepare `tar' for both changes, without effecting them,
from 1.POSIX.  (Both changes: 1--using the POSIX convention for getting
over 100 characters; 2--avoiding mangling POSIX headers for GNU
extensions, using only POSIX-mandated extension techniques.)

   So, a future `tar' will have a `--posix' flag forcing the use of
truly POSIX headers, and so producing archives that previous GNU `tar's
will not be able to read.  So, _once_ a pretest release announces that
feature, it would be particularly useful for users to test how
exchangeable archives are between GNU `tar' with `--posix' and other
POSIX `tar's.

   In a few years, when GNU `tar' produces POSIX headers by default,
`--posix' will have a strong meaning and will disallow GNU extensions.
But in the meantime, for a long while, `--posix' in GNU `tar' will not
disallow GNU extensions like `--label=ARCHIVE-LABEL' (`-V
ARCHIVE-LABEL'), `--multi-volume' (`-M'), `--sparse' (`-S'), or very
long file or link names.  However, `--posix' with GNU extensions will
use POSIX headers with reserved-for-users extensions to headers, and I
will be curious to know how well or badly POSIX `tar's react to these.

   GNU `tar' prior to 1.POSIX, and after 1.POSIX without `--posix',
generates and checks `ustar  ', with two suffixed spaces.  This is
sufficient for older GNU `tar' not to recognize POSIX archives, and
consequently, wrongly decide those archives are in old V7 format.  It
is a useful bug for me, because GNU `tar' has other POSIX
incompatibilities, and I need to segregate GNU `tar' semi-POSIX
archives from truly POSIX archives, for GNU `tar' should be somewhat
compatible with itself, while migrating closer to the latest POSIX
standards.  So I'll be very careful about how and when I make the
correction.
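
   The distinction drawn here can be illustrated with a short Python
sketch that classifies a 512-byte header block by the eight bytes
starting at offset 257.  The text above gives only `ustar  ' with two
trailing spaces; the terminating NUL after those spaces, and the POSIX
form of `ustar' followed by a NUL and the version digits `00', are
assumptions about the two layouts.  `classify_header' is a hypothetical
helper.

```python
BLOCK = 512
MAGIC_OFFSET = 257                 # start of the `magic' field
POSIX_MAGIC = b"ustar\x0000"       # "ustar", NUL, then version "00"
GNU_MAGIC = b"ustar  \x00"         # "ustar", two spaces, then NUL


def classify_header(block):
    """Classify a tar header block as "posix", "gnu" or "v7".

    Anything without a recognized magic is treated as old V7 format,
    which mirrors the fallback behavior described in the text.
    """
    if len(block) != BLOCK:
        raise ValueError("a tar header block is exactly 512 bytes")
    magic = block[MAGIC_OFFSET:MAGIC_OFFSET + 8]
    if magic == POSIX_MAGIC:
        return "posix"
    if magic == GNU_MAGIC:
        return "gnu"
    return "v7"
```

   A reader that checks only for the GNU spelling falls through to
"v7" on a truly POSIX header, which is exactly the useful bug described
above.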

