123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348 |
- @c This is part of the paxutils manual.
- @c Copyright (C) 2006--2023 Free Software Foundation, Inc.
- @c This file is distributed under GFDL 1.1 or any later version
- @c published by the Free Software Foundation.
- @menu
- * Standard:: Basic Tar Format
- * Extensions:: @acronym{GNU} Extensions to the Archive Format
- * Sparse Formats:: Storing Sparse Files
- * Snapshot Files::
- * Dumpdir::
- @end menu
- @node Standard
- @unnumberedsec Basic Tar Format
- @UNREVISED{}
- While an archive may contain many files, the archive itself is a
- single ordinary file. Like any other file, an archive file can be
- written to a storage device such as a tape or disk, sent through a
- pipe or over a network, saved on the active file system, or even
- stored in another archive. An archive file is not easy to read or
- manipulate without using the @command{tar} utility or Tar mode in
- @acronym{GNU} Emacs.
- Physically, an archive consists of a series of file entries terminated
- by an end-of-archive entry, which consists of two 512 blocks of zero
- bytes. A file
- entry usually describes one of the files in the archive (an
- @dfn{archive member}), and consists of a file header and the contents
- of the file. File headers contain file names and statistics, checksum
- information which @command{tar} uses to detect file corruption, and
- information about file types.
- Archives are permitted to have more than one member with the same
- member name. One way this situation can occur is if more than one
- version of a file has been stored in the archive. For information
- about adding new versions of a file to an archive, see @ref{update}.
- In addition to entries describing archive members, an archive may
- contain entries which @command{tar} itself uses to store information.
- @xref{label}, for an example of such an archive entry.
- A @command{tar} archive file contains a series of blocks. Each block
- contains @code{BLOCKSIZE} bytes. Although this format may be thought
- of as being on magnetic tape, other media are often used.
- Each file archived is represented by a header block which describes
- the file, followed by zero or more blocks which give the contents
- of the file. At the end of the archive file there are two 512-byte blocks
- filled with binary zeros as an end-of-file marker. A reasonable system
- should write such end-of-file marker at the end of an archive, but
- must not assume that such a block exists when reading an archive. In
- particular, @GNUTAR{} does not treat missing end-of-file marker as an
- error and silently ignores the fact. You can instruct it to issue
- a warning, however, by using the @option{--warning=missing-zero-blocks}
- option (@pxref{General Warnings, missing-zero-blocks}).
- The blocks may be @dfn{blocked} for physical I/O operations.
- Each record of @var{n} blocks (where @var{n} is set by the
- @option{--blocking-factor=@var{512-size}} (@option{-b @var{512-size}}) option to @command{tar}) is written with a single
- @w{@samp{write ()}} operation. On magnetic tapes, the result of
- such a write is a single record. When writing an archive,
- the last record of blocks should be written at the full size, with
- blocks after the zero block containing all zeros. When reading
- an archive, a reasonable system should properly handle an archive
- whose last record is shorter than the rest, or which contains garbage
- records after a zero block.
- The header block is defined in C as follows. In the @GNUTAR{}
- distribution, this is part of file @file{src/tar.h}:
- @smallexample
- @include header.texi
- @end smallexample
- All characters in header blocks are represented by using 8-bit
- characters in the local variant of ASCII. Each field within the
- structure is contiguous; that is, there is no padding used within
- the structure. Each character on the archive medium is stored
- contiguously.
- Bytes representing the contents of files (after the header block
- of each file) are not translated in any way and are not constrained
- to represent characters in any character set. The @command{tar} format
- does not distinguish text files from binary files, and no translation
- of file contents is performed.
- The @code{name}, @code{linkname}, @code{magic}, @code{uname}, and
- @code{gname} are null-terminated character strings. All other fields
- are zero-filled octal numbers in ASCII. Each numeric field of width
- @var{w} contains @var{w} minus 1 digits, and a null.
- (In the extended @acronym{GNU} format, the numeric fields can take
- other forms.)
- The @code{name} field is the file name of the file, with directory names
- (if any) preceding the file name, separated by slashes.
- @FIXME{how big a name before field overflows?}
- The @code{mode} field provides nine bits specifying file permissions
- and three bits to specify the Set @acronym{UID}, Set @acronym{GID}, and Save Text
- (@dfn{sticky}) modes. Values for these bits are defined above.
- When special permissions are required to create a file with a given
- mode, and the user restoring files from the archive does not hold such
- permissions, the mode bit(s) specifying those special permissions
- are ignored. Modes which are not supported by the operating system
- restoring files from the archive will be ignored. Unsupported modes
- should be faked up when creating or updating an archive; e.g., the
- group permission could be copied from the @emph{other} permission.
- The @code{uid} and @code{gid} fields are the numeric user and group
- @acronym{ID} of the file owners, respectively. If the operating system does
- not support numeric user or group @acronym{ID}s, these fields should
- be ignored.
- The @code{size} field is the size of the file in bytes; for archive
- members that are symbolic or hard links to another file, this field
- is specified as zero.
- The @code{mtime} field represents the data modification time of the file at
- the time it was archived. It represents the integer number of
- seconds since January 1, 1970, 00:00 Coordinated Universal Time.
- The @code{chksum} field represents
- the simple sum of all bytes in the header block. Each 8-bit
- byte in the header is added to an unsigned integer, initialized to
- zero, the precision of which shall be no less than seventeen bits.
- When calculating the checksum, the @code{chksum} field is treated as
- if it were filled with spaces (ASCII 32).
- The @code{typeflag} field specifies the type of file archived. If a
- particular implementation does not recognize or permit the specified
- type, the file will be extracted as if it were a regular file. As this
- action occurs, @command{tar} issues a warning to the standard error.
- The @code{atime} and @code{ctime} fields are used in making incremental
- backups; they store, respectively, the particular file's access and
- status change times.
- The @code{offset} is used by the @option{--multi-volume} (@option{-M}) option, when
- making a multi-volume archive. The offset is number of bytes into
- the file that we need to restart at to continue the file on the next
- tape, i.e., where we store the location that a continued file is
- continued at.
- The following fields were added to deal with sparse files. A file
- is @dfn{sparse} if it takes in unallocated blocks which end up being
- represented as zeros, i.e., no useful data. A test to see if a file
- is sparse is to look at the number blocks allocated for it versus the
- number of characters in the file; if there are fewer blocks allocated
- for the file than would normally be allocated for a file of that
- size, then the file is sparse. This is the method @command{tar} uses to
- detect a sparse file, and once such a file is detected, it is treated
- differently from non-sparse files.
- Sparse files are often @code{dbm} files, or other database-type files
- which have data at some points and emptiness in the greater part of
- the file. Such files can appear to be very large when an @samp{ls
- -l} is done on them, when in truth, there may be a very small amount
- of important data contained in the file. It is thus undesirable
- to have @command{tar} think that it must back up this entire file, as
- great quantities of room are wasted on empty blocks, which can lead
- to running out of room on a tape far earlier than is necessary.
- Thus, sparse files are dealt with so that these empty blocks are
- not written to the tape. Instead, what is written to the tape is a
- description, of sorts, of the sparse file: where the holes are, how
- big the holes are, and how much data is found at the end of the hole.
- This way, the file takes up potentially far less room on the tape,
- and when the file is extracted later on, it will look exactly the way
- it looked beforehand. The following is a description of the fields
- used to handle a sparse file:
- The @code{sp} is an array of @code{struct sparse}. Each @code{struct
- sparse} contains two 12-character strings which represent an offset
- into the file and a number of bytes to be written at that offset.
- The offset is absolute, and not relative to the offset in preceding
- array element.
- The header can hold four of these @code{struct sparse} at the moment;
- if more are needed, they are not stored in the header.
- The @code{isextended} flag is set when an @code{extended_header}
- is needed to deal with a file. Note that this means that this flag
- can only be set when dealing with a sparse file, and it is only set
- in the event that the description of the file will not fit in the
- allotted room for sparse structures in the header. In other words,
- an extended_header is needed.
- The @code{extended_header} structure is used for sparse files which
- need more sparse structures than can fit in the header. The header can
- fit 4 such structures; if more are needed, the flag @code{isextended}
- gets set and the next block is an @code{extended_header}.
- Each @code{extended_header} structure contains an array of 21
- sparse structures, along with a similar @code{isextended} flag
- that the header had. There can be an indeterminate number of such
- @code{extended_header}s to describe a sparse file.
- @table @asis
- @item @code{REGTYPE}
- @itemx @code{AREGTYPE}
- These flags represent a regular file. In order to be compatible
- with older versions of @command{tar}, a @code{typeflag} value of
- @code{AREGTYPE} should be silently recognized as a regular file.
- New archives should be created using @code{REGTYPE}. Also, for
- backward compatibility, @command{tar} treats a regular file whose name
- ends with a slash as a directory.
- @item @code{LNKTYPE}
- This flag represents a file linked to another file, of any type,
- previously archived. Such files are identified in Unix by each
- file having the same device and inode number. The linked-to name is
- specified in the @code{linkname} field with a trailing null.
- @item @code{SYMTYPE}
- This represents a symbolic link to another file. The linked-to name
- is specified in the @code{linkname} field with a trailing null.
- @item @code{CHRTYPE}
- @itemx @code{BLKTYPE}
- These represent character special files and block special files
- respectively. In this case the @code{devmajor} and @code{devminor}
- fields will contain the major and minor device numbers respectively.
- Operating systems may map the device specifications to their own
- local specification, or may ignore the entry.
- @item @code{DIRTYPE}
- This flag specifies a directory or sub-directory. The directory
- name in the @code{name} field should end with a slash. On systems where
- disk allocation is performed on a directory basis, the @code{size} field
- will contain the maximum number of bytes (which may be rounded to
- the nearest disk block allocation unit) which the directory may
- hold. A @code{size} field of zero indicates no such limiting. Systems
- which do not support limiting in this manner should ignore the
- @code{size} field.
- @item @code{FIFOTYPE}
- This specifies a FIFO special file. Note that the archiving of a
- FIFO file archives the existence of this file and not its contents.
- @item @code{CONTTYPE}
- This specifies a contiguous file, which is the same as a normal
- file except that, in operating systems which support it, all its
- space is allocated contiguously on the disk. Operating systems
- which do not allow contiguous allocation should silently treat this
- type as a normal file.
- @item @code{A} @dots{} @code{Z}
- These are reserved for custom implementations. Some of these are
- used in the @acronym{GNU} modified format, as described below.
- @end table
- Other values are reserved for specification in future revisions of
- the P1003 standard, and should not be used by any @command{tar} program.
- The @code{magic} field indicates that this archive was output in
- the P1003 archive format. If this field contains @code{TMAGIC},
- the @code{uname} and @code{gname} fields will contain the ASCII
- representation of the owner and group of the file respectively.
- If found, the user and group @acronym{ID}s are used rather than the values in
- the @code{uid} and @code{gid} fields.
- For references, see ISO/IEC 9945-1:1990 or IEEE Std 1003.1-1990, pages
- 169-173 (section 10.1) for @cite{Archive/Interchange File Format}; and
- IEEE Std 1003.2-1992, pages 380-388 (section 4.48) and pages 936-940
- (section E.4.48) for @cite{pax - Portable archive interchange}.
- @node Extensions
- @unnumberedsec @acronym{GNU} Extensions to the Archive Format
- @UNREVISED{}
- The @acronym{GNU} format uses additional file types to describe new types of
- files in an archive. These are listed below.
- @table @code
- @item GNUTYPE_DUMPDIR
- @itemx 'D'
- This represents a directory and a list of files created by the
- @option{--incremental} (@option{-G}) option. The @code{size} field gives the total
- size of the associated list of files. Each file name is preceded by
- either a @samp{Y} (the file should be in this archive) or an @samp{N}.
- (The file is a directory, or is not stored in the archive.) Each file
- name is terminated by a null. There is an additional null after the
- last file name.
- @item GNUTYPE_MULTIVOL
- @itemx 'M'
- This represents a file continued from another volume of a multi-volume
- archive created with the @option{--multi-volume} (@option{-M}) option. The original
- type of the file is not given here. The @code{size} field gives the
- maximum size of this piece of the file (assuming the volume does
- not end before the file is written out). The @code{offset} field
- gives the offset from the beginning of the file where this part of
- the file begins. Thus @code{size} plus @code{offset} should equal
- the original size of the file.
- @item GNUTYPE_SPARSE
- @itemx 'S'
- This flag indicates that we are dealing with a sparse file. Note
- that archiving a sparse file requires special operations to find
- holes in the file, which mark the positions of these holes, along
- with the number of bytes of data to be found after the hole.
- @item GNUTYPE_VOLHDR
- @itemx 'V'
- This file type is used to mark the volume header that was given with
- the @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) option when the archive was created. The @code{name}
- field contains the @code{name} given after the @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) option.
- The @code{size} field is zero. Only the first file in each volume
- of an archive should have this type.
- @end table
- For fields containing numbers or timestamps that are out of range for
- the basic format, the @acronym{GNU} format uses a base-256
- representation instead of an ASCII octal number. If the leading byte
- is 0xff (255), all the bytes of the field (including the leading byte)
- are concatenated in big-endian order, with the result being a negative
- number expressed in two's complement form. If the leading byte is
- 0x80 (128), the non-leading bytes of the field are concatenated in
- big-endian order, with the result being a positive number expressed in
- binary form. Leading bytes other than 0xff, 0x80 and ASCII octal
- digits are reserved for future use, as are base-256 representations of
- values that would be in range for the basic format.
- You may have trouble reading a @acronym{GNU} format archive on a
- non-@acronym{GNU} system if the options @option{--incremental} (@option{-G}),
- @option{--multi-volume} (@option{-M}), @option{--sparse} (@option{-S}), or @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) were
- used when writing the archive. In general, if @command{tar} does not
- use the @acronym{GNU}-added fields of the header, other versions of
- @command{tar} should be able to read the archive. Otherwise, the
- @command{tar} program will give an error, the most likely one being a
- checksum error.
- @node Sparse Formats
- @unnumberedsec Storing Sparse Files
- @include sparse.texi
- @node Snapshot Files
- @unnumberedsec Format of the Incremental Snapshot Files
- @include snapshot.texi
- @node Dumpdir
- @unnumberedsec Dumpdir
- @include dumpdir.texi
|