19 years ago · 97bfe578ed
--- a/doc/intern.texi
+++ b/doc/intern.texi
@@ -0,0 +1,329 @@
 
				+@c This is part of the paxutils manual.
			
 
				+@c Copyright (C) 2006 Free Software Foundation, Inc.
			
 
				+@c This file is distributed under GFDL 1.1 or any later version
			
 
				+@c published by the Free Software Foundation.
			
 
				+
			
 
				+@menu
			
 
				+* Standard::           Basic Tar Format
			
 
				+* Extensions::         @acronym{GNU} Extensions to the Archive Format
			
 
				+* Snapshot Files::
			
 
				+* Dumpdir::
			
 
				+@end menu
			
 
				+
			
 
				+@node Standard
			
 
				+@unnumberedsec Basic Tar Format
			
 
				+@UNREVISED
			
 
				+
			
 
				+While an archive may contain many files, the archive itself is a
			
 
				+single ordinary file.  Like any other file, an archive file can be
			
 
				+written to a storage device such as a tape or disk, sent through a
			
 
				+pipe or over a network, saved on the active file system, or even
			
 
				+stored in another archive.  An archive file is not easy to read or
			
 
				+manipulate without using the @command{tar} utility or Tar mode in
			
 
				+@acronym{GNU} Emacs.
			
 
				+
			
 
				+Physically, an archive consists of a series of file entries terminated
			
 
				+by an end-of-archive entry, which consists of two 512-byte blocks of zero
			
 
				+bytes.  A file
			
 
				+entry usually describes one of the files in the archive (an
			
 
				+@dfn{archive member}), and consists of a file header and the contents
			
 
				+of the file.  File headers contain file names and statistics, checksum
			
 
				+information which @command{tar} uses to detect file corruption, and
			
 
				+information about file types.
			
 
				+
			
 
				+Archives are permitted to have more than one member with the same
			
 
				+member name.  One way this situation can occur is if more than one
			
 
				+version of a file has been stored in the archive.  For information
			
 
				+about adding new versions of a file to an archive, see @ref{update}.
			
 
				+@FIXME-xref{To learn more about having more than one archive member with the
			
 
				+same name, see -backup node, when it's written.}
			
 
				+
			
 
				+In addition to entries describing archive members, an archive may
			
 
				+contain entries which @command{tar} itself uses to store information.
			
 
				+@xref{label}, for an example of such an archive entry.
			
 
				+
			
 
				+A @command{tar} archive file contains a series of blocks.  Each block
			
 
				+contains @code{BLOCKSIZE} bytes.  Although this format may be thought
			
 
				+of as being on magnetic tape, other media are often used.
			
 
				+
			
 
				+Each file archived is represented by a header block which describes
			
 
				+the file, followed by zero or more blocks which give the contents
			
 
				+of the file.  At the end of the archive file there are two 512-byte blocks
			
 
				+filled with binary zeros as an end-of-file marker.  A reasonable system
			
 
				+should write such an end-of-file marker at the end of an archive, but
			
 
				+must not assume that such a block exists when reading an archive.  In
			
 
				+particular, @GNUTAR{} always issues a warning if it does not encounter this marker.
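As a minimal sketch of the end-of-archive check described above, the C fragment below tests whether a buffer position holds two consecutive all-zero 512-byte blocks. The names @code{is_zero_block} and @code{at_end_of_archive} are hypothetical and do not come from the @command{tar} sources; a reader should still accept an archive in which the marker is missing.

```c
#include <stddef.h>

#define BLOCKSIZE 512  /* size of one tar block in bytes */

/* Return 1 if the BLOCKSIZE-byte block holds only zero bytes, else 0. */
static int is_zero_block(const unsigned char *block)
{
    for (size_t i = 0; i < BLOCKSIZE; i++)
        if (block[i] != 0)
            return 0;
    return 1;
}

/* Return 1 if at least two blocks remain at P and both are all zeros.
   REMAINING is the number of bytes available starting at P. */
static int at_end_of_archive(const unsigned char *p, size_t remaining)
{
    if (remaining < 2 * BLOCKSIZE)
        return 0;  /* marker absent or truncated; not an error by itself */
    return is_zero_block(p) && is_zero_block(p + BLOCKSIZE);
}
```

A reader would typically call such a check at each position where a header block is expected, and treat plain end of input as an acceptable (if unusual) end of archive.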
			
 
				+
			
 
				+The blocks may be @dfn{blocked} for physical I/O operations.
			
 
				+Each record of @var{n} blocks (where @var{n} is set by the
			
 
				+@option{--blocking-factor=@var{512-size}} (@option{-b @var{512-size}}) option to @command{tar}) is written with a single
			
 
				+@w{@samp{write ()}} operation.  On magnetic tapes, the result of
			
 
				+such a write is a single record.  When writing an archive,
			
 
				+the last record of blocks should be written at the full size, with
			
 
				+blocks after the zero block containing all zeros.  When reading
			
 
				+an archive, a reasonable system should properly handle an archive
			
 
				+whose last record is shorter than the rest, or which contains garbage
			
 
				+records after a zero block.
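The relationship between blocks, records and final padding can be illustrated with a little arithmetic. This is only a sketch: @code{record_size} and @code{record_padding} are hypothetical helper names, and the blocking factor is assumed to be greater than zero.

```c
#include <stddef.h>

#define BLOCKSIZE 512  /* size of one tar block in bytes */

/* Bytes in one record for a given blocking factor (blocks per record). */
static size_t record_size(size_t blocking_factor)
{
    return blocking_factor * BLOCKSIZE;
}

/* Zero bytes needed to extend ARCHIVE_BYTES to a whole number of records.
   Assumes BLOCKING_FACTOR is greater than zero. */
static size_t record_padding(size_t archive_bytes, size_t blocking_factor)
{
    size_t rec = record_size(blocking_factor);
    size_t rem = archive_bytes % rec;
    return rem == 0 ? 0 : rec - rem;
}
```

For example, with a blocking factor of 20, an archive that ends after three blocks needs seventeen further zero blocks to fill its last record.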
			
 
				+
			
 
				+The header block is defined in C as follows.  In the @GNUTAR{}
			
 
				+distribution, this is part of file @file{src/tar.h}:
			
 
				+
			
 
				+@smallexample
			
 
				+@include header.texi
			
 
				+@end smallexample
			
 
				+
			
 
				+All characters in header blocks are represented by using 8-bit
			
 
				+characters in the local variant of ASCII.  Each field within the
			
 
				+structure is contiguous; that is, there is no padding used within
			
 
				+the structure.  Each character on the archive medium is stored
			
 
				+contiguously.
			
 
				+
			
 
				+Bytes representing the contents of files (after the header block
			
 
				+of each file) are not translated in any way and are not constrained
			
 
				+to represent characters in any character set.  The @command{tar} format
			
 
				+does not distinguish text files from binary files, and no translation
			
 
				+of file contents is performed.
			
 
				+
			
 
				+The @code{name}, @code{linkname}, @code{magic}, @code{uname}, and
			
 
				+@code{gname} are null-terminated character strings.  All other fields
			
 
				+are zero-filled octal numbers in ASCII.  Each numeric field of width
			
 
				+@var{w} contains @var{w} minus 1 digits, and a null.
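A numeric header field can be decoded along the following lines. This is a sketch, not the parser used by @command{tar}: the name @code{parse_octal} is hypothetical, and accepting leading spaces or a space terminator is an assumption made for tolerance of older archives, beyond the strict layout described above.

```c
#include <stddef.h>

/* Decode an ASCII octal header field of WIDTH bytes into *OUT.
   Digits may be preceded by spaces (assumption) and must be followed by
   a null, a space (assumption), or the end of the field.
   Returns 0 on success, -1 if the field is malformed. */
static int parse_octal(const char *field, size_t width,
                       unsigned long long *out)
{
    unsigned long long value = 0;
    size_t i = 0;

    while (i < width && field[i] == ' ')
        i++;                      /* skip optional leading spaces */

    size_t first_digit = i;
    for (; i < width && field[i] >= '0' && field[i] <= '7'; i++)
        value = (value << 3) | (unsigned long long)(field[i] - '0');

    if (i == first_digit)
        return -1;                /* no digits at all */
    if (i < width && field[i] != '\0' && field[i] != ' ')
        return -1;                /* unexpected character after digits */

    *out = value;
    return 0;
}
```

The same routine serves for @code{mode}, @code{uid}, @code{gid}, @code{size} and @code{mtime}, since they differ only in width.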
			
 
				+
			
 
				+The @code{name} field is the file name of the file, with directory names
			
 
				+(if any) preceding the file name, separated by slashes.
			
 
				+
			
 
				+@FIXME{how big a name before field overflows?}
			
 
				+
			
 
				+The @code{mode} field provides nine bits specifying file permissions
			
 
				+and three bits to specify the Set UID, Set GID, and Save Text
			
 
				+(@dfn{sticky}) modes.  Values for these bits are defined above.
			
 
				+When special permissions are required to create a file with a given
			
 
				+mode, and the user restoring files from the archive does not hold such
			
 
				+permissions, the mode bit(s) specifying those special permissions
			
 
				+are ignored.  Modes which are not supported by the operating system
			
 
				+restoring files from the archive will be ignored.  Unsupported modes
			
 
				+should be faked up when creating or updating an archive; e.g., the
			
 
				+group permission could be copied from the @emph{other} permission.
			
 
				+
			
 
				+The @code{uid} and @code{gid} fields are the numeric user and group
			
 
				+ID of the file owners, respectively.  If the operating system does
			
 
				+not support numeric user or group IDs, these fields should be ignored.
			
 
				+
			
 
				+The @code{size} field is the size of the file in bytes; linked files
			
 
				+are archived with this field specified as zero.  @FIXME-xref{Modifiers, in
			
 
				+particular the @option{--incremental} (@option{-G}) option.}
			
 
				+
			
 
				+The @code{mtime} field is the data modification time of the file at
			
 
				+the time it was archived.  It is the ASCII representation of the octal
			
 
				+value of the last time the file's contents were modified, represented
			
 
				+as an integer number of
			
 
				+seconds since January 1, 1970, 00:00 Coordinated Universal Time.
			
 
				+
			
 
				+The @code{chksum} field is the ASCII representation of the octal value
			
 
				+of the simple sum of all bytes in the header block.  Each 8-bit
			
 
				+byte in the header is added to an unsigned integer, initialized to
			
 
				+zero, the precision of which shall be no less than seventeen bits.
			
 
				+When calculating the checksum, the @code{chksum} field is treated as
			
 
				+if it were all blanks.
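The checksum rule above can be sketched as follows. The offset 148 and width 8 of the @code{chksum} field are taken from the standard POSIX header layout rather than from the text shown here, and @code{header_checksum} is a hypothetical name.

```c
#include <stddef.h>

#define BLOCKSIZE     512  /* size of one header block in bytes */
#define CHKSUM_OFFSET 148  /* assumed: chksum position in the POSIX layout */
#define CHKSUM_SIZE   8    /* assumed: chksum width in the POSIX layout */

/* Sum all bytes of the header as unsigned values, counting each byte
   of the chksum field as a blank (ASCII space). */
static unsigned long header_checksum(const unsigned char *header)
{
    unsigned long sum = 0;  /* unsigned long has at least 32 bits */

    for (size_t i = 0; i < BLOCKSIZE; i++) {
        if (i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_SIZE)
            sum += ' ';
        else
            sum += header[i];
    }
    return sum;
}
```

Because 512 bytes of at most 255 each sum to less than 2^17, the seventeen-bit minimum precision mentioned above is enough to avoid overflow.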
			
 
				+
			
 
				+The @code{typeflag} field specifies the type of file archived.  If a
			
 
				+particular implementation does not recognize or permit the specified
			
 
				+type, the file will be extracted as if it were a regular file.  As this
			
 
				+action occurs, @command{tar} issues a warning to the standard error.
			
 
				+
			
 
				+The @code{atime} and @code{ctime} fields are used in making incremental
			
 
				+backups; they store, respectively, the particular file's access and
			
 
				+status change times.
			
 
				+
			
 
				+The @code{offset} is used by the @option{--multi-volume} (@option{-M}) option, when
			
 
				+making a multi-volume archive.  The offset is the number of bytes into
			
 
				+the file that we need to restart at to continue the file on the next
			
 
				+tape; that is, it records the position at which a continued file
			
 
				+resumes.
			
 
				+
			
 
				+The following fields were added to deal with sparse files.  A file
			
 
				+is @dfn{sparse} if it contains unallocated blocks which end up being
			
 
				+represented as zeros, i.e., no useful data.  A test to see if a file
			
 
				+is sparse is to look at the number of blocks allocated for it versus the
			
 
				+number of characters in the file; if there are fewer blocks allocated
			
 
				+for the file than would normally be allocated for a file of that
			
 
				+size, then the file is sparse.  This is the method @command{tar} uses to
			
 
				+detect a sparse file, and once such a file is detected, it is treated
			
 
				+differently from non-sparse files.
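The sparseness test described above amounts to one comparison. The sketch below keeps it independent of any system call; @code{looks_sparse} is a hypothetical name, and the caller is assumed to supply the file size, the allocated block count and the allocation unit (for instance from @code{st_size} and @code{st_blocks} of @code{stat}, where the unit is commonly, though not universally, 512 bytes).

```c
/* Return 1 if a file of SIZE bytes occupies fewer allocation units than
   a fully allocated file of that size would, else 0.
   BLOCK_UNIT is the number of bytes per allocated block and must be
   greater than zero. */
static int looks_sparse(unsigned long long size,
                        unsigned long long allocated_blocks,
                        unsigned long long block_unit)
{
    /* Blocks a non-sparse file of this size would need, rounded up. */
    unsigned long long needed = size / block_unit + (size % block_unit != 0);
    return allocated_blocks < needed;
}
```

A 1 MiB file with only 8 allocated 512-byte blocks would be reported as sparse, while a 1000-byte file occupying 2 blocks would not.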
			
 
				+
			
 
				+Sparse files are often @code{dbm} files, or other database-type files
			
 
				+which have data at some points and emptiness in the greater part of
			
 
				+the file.  Such files can appear to be very large when an @samp{ls
			
 
				+-l} is done on them, when in truth, there may be a very small amount
			
 
				+of important data contained in the file.  It is thus undesirable
			
 
				+to have @command{tar} think that it must back up this entire file, as
			
 
				+great quantities of room are wasted on empty blocks, which can lead
			
 
				+to running out of room on a tape far earlier than is necessary.
			
 
				+Thus, sparse files are dealt with so that these empty blocks are
			
 
				+not written to the tape.  Instead, what is written to the tape is a
			
 
				+description, of sorts, of the sparse file: where the holes are, how
			
 
				+big the holes are, and how much data is found at the end of the hole.
			
 
				+This way, the file takes up potentially far less room on the tape,
			
 
				+and when the file is extracted later on, it will look exactly the way
			
 
				+it looked beforehand.  The following is a description of the fields
			
 
				+used to handle a sparse file:
			
 
				+
			
 
				+The @code{sp} is an array of @code{struct sparse}.  Each @code{struct
			
 
				+sparse} contains two 12-character strings which represent an offset
			
 
				+into the file and a number of bytes to be written at that offset.
			
 
				+The offset is absolute, and not relative to the offset in the preceding
			
 
				+array element.
			
 
				+
			
 
				+The header can hold four of these @code{struct sparse} at the moment;
			
 
				+if more are needed, they are not stored in the header.
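To make the layout concrete, the following sketch decodes the sparse entries held in the header into numeric offset and length pairs. The member names @code{offset} and @code{numbytes}, the type @code{struct extent}, and the convention that an unused slot begins with a null byte are assumptions made for illustration; they are not confirmed by the text above.

```c
#include <stddef.h>

#define SPARSES_IN_HEADER 4  /* the header holds four entries */

/* One sparse entry: two 12-character octal strings (names assumed). */
struct sparse {
    char offset[12];
    char numbytes[12];
};

/* Decoded form of one entry (hypothetical helper type). */
struct extent {
    unsigned long long offset;
    unsigned long long numbytes;
};

/* Decode leading octal digits of a fixed-width field. */
static unsigned long long octal_field(const char *f, size_t width)
{
    unsigned long long v = 0;
    for (size_t i = 0; i < width && f[i] >= '0' && f[i] <= '7'; i++)
        v = v * 8 + (unsigned long long)(f[i] - '0');
    return v;
}

/* Decode entries from SP into OUT, stopping at the first unused slot
   (assumed to start with a null byte).  Returns the number decoded. */
static size_t decode_sparse_map(const struct sparse *sp, struct extent *out)
{
    size_t n = 0;
    for (size_t i = 0; i < SPARSES_IN_HEADER; i++) {
        if (sp[i].numbytes[0] == '\0')
            break;
        out[n].offset = octal_field(sp[i].offset, sizeof sp[i].offset);
        out[n].numbytes = octal_field(sp[i].numbytes, sizeof sp[i].numbytes);
        n++;
    }
    return n;
}
```

Since each offset is absolute, the decoded entries can be applied in any order when the file is reconstructed.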
			
 
				+
			
 
				+The @code{isextended} flag is set when an @code{extended_header}
			
 
				+is needed to deal with a file.  Note that this means that this flag
			
 
				+can only be set when dealing with a sparse file, and it is only set
			
 
				+in the event that the description of the file will not fit in the
			
 
				+allotted room for sparse structures in the header.  In other words,
			
 
				+an @code{extended_header} is needed.
			
 
				+
			
 
				+The @code{extended_header} structure is used for sparse files which
			
 
				+need more sparse structures than can fit in the header.  The header can
			
 
				+fit 4 such structures; if more are needed, the flag @code{isextended}
			
 
				+gets set and the next block is an @code{extended_header}.
			
 
				+
			
 
				+Each @code{extended_header} structure contains an array of 21
			
 
				+sparse structures, along with a similar @code{isextended} flag
			
 
				+that the header had.  There can be an indeterminate number of such
			
 
				+@code{extended_header}s to describe a sparse file.
			
 
				+
			
 
				+@table @asis
			
 
				+
			
 
				+@item @code{REGTYPE}
			
 
				+@itemx @code{AREGTYPE}
			
 
				+These flags represent a regular file.  In order to be compatible
			
 
				+with older versions of @command{tar}, a @code{typeflag} value of
			
 
				+@code{AREGTYPE} should be silently recognized as a regular file.
			
 
				+New archives should be created using @code{REGTYPE}.  Also, for
			
 
				+backward compatibility, @command{tar} treats a regular file whose name
			
 
				+ends with a slash as a directory.
			
 
				+
			
 
				+@item @code{LNKTYPE}
			
 
				+This flag represents a file linked to another file, of any type,
			
 
				+previously archived.  Such files are identified in Unix by each
			
 
				+file having the same device and inode number.  The linked-to name is
			
 
				+specified in the @code{linkname} field with a trailing null.
			
 
				+
			
 
				+@item @code{SYMTYPE}
			
 
				+This represents a symbolic link to another file.  The linked-to name
			
 
				+is specified in the @code{linkname} field with a trailing null.
			
 
				+
			
 
				+@item @code{CHRTYPE}
			
 
				+@itemx @code{BLKTYPE}
			
 
				+These represent character special files and block special files
			
 
				+respectively.  In this case the @code{devmajor} and @code{devminor}
			
 
				+fields will contain the major and minor device numbers respectively.
			
 
				+Operating systems may map the device specifications to their own
			
 
				+local specification, or may ignore the entry.
			
 
				+
			
 
				+@item @code{DIRTYPE}
			
 
				+This flag specifies a directory or sub-directory.  The directory
			
 
				+name in the @code{name} field should end with a slash.  On systems where
			
 
				+disk allocation is performed on a directory basis, the @code{size} field
			
 
				+will contain the maximum number of bytes (which may be rounded to
			
 
				+the nearest disk block allocation unit) which the directory may
			
 
				+hold.  A @code{size} field of zero indicates no such limiting.  Systems
			
 
				+which do not support limiting in this manner should ignore the
			
 
				+@code{size} field.
			
 
				+
			
 
				+@item @code{FIFOTYPE}
			
 
				+This specifies a FIFO special file.  Note that the archiving of a
			
 
				+FIFO file archives the existence of this file and not its contents.
			
 
				+
			
 
				+@item @code{CONTTYPE}
			
 
				+This specifies a contiguous file, which is the same as a normal
			
 
				+file except that, in operating systems which support it, all its
			
 
				+space is allocated contiguously on the disk.  Operating systems
			
 
				+which do not allow contiguous allocation should silently treat this
			
 
				+type as a normal file.
			
 
				+
			
 
				+@item @code{A} @dots{} @code{Z}
			
 
				+These are reserved for custom implementations.  Some of these are
			
 
				+used in the @acronym{GNU} modified format, as described below.
			
 
				+
			
 
				+@end table
			
 
				+
			
 
				+Other values are reserved for specification in future revisions of
			
 
				+the P1003 standard, and should not be used by any @command{tar} program.
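The type flags listed above can be classified with a simple switch. The character values below are those assigned by the POSIX header definition, which is included from another file and not shown here; @code{enum member_kind} and @code{classify_typeflag} are hypothetical names used only for this sketch.

```c
/* Type flag values as defined by POSIX (not shown in the text above). */
#define REGTYPE  '0'
#define AREGTYPE '\0'
#define LNKTYPE  '1'
#define SYMTYPE  '2'
#define CHRTYPE  '3'
#define BLKTYPE  '4'
#define DIRTYPE  '5'
#define FIFOTYPE '6'
#define CONTTYPE '7'

enum member_kind {
    KIND_REGULAR, KIND_HARD_LINK, KIND_SYMLINK, KIND_CHAR_DEVICE,
    KIND_BLOCK_DEVICE, KIND_DIRECTORY, KIND_FIFO, KIND_CONTIGUOUS,
    KIND_VENDOR,    /* 'A' through 'Z': reserved for custom implementations */
    KIND_RESERVED   /* anything else: reserved for future standards */
};

/* Map a typeflag byte to the kind of archive member it denotes. */
static enum member_kind classify_typeflag(char typeflag)
{
    switch (typeflag) {
    case REGTYPE:
    case AREGTYPE: return KIND_REGULAR;
    case LNKTYPE:  return KIND_HARD_LINK;
    case SYMTYPE:  return KIND_SYMLINK;
    case CHRTYPE:  return KIND_CHAR_DEVICE;
    case BLKTYPE:  return KIND_BLOCK_DEVICE;
    case DIRTYPE:  return KIND_DIRECTORY;
    case FIFOTYPE: return KIND_FIFO;
    case CONTTYPE: return KIND_CONTIGUOUS;
    default:
        if (typeflag >= 'A' && typeflag <= 'Z')
            return KIND_VENDOR;
        return KIND_RESERVED;
    }
}
```

An extractor following the rule given earlier would treat @code{KIND_RESERVED}, and any vendor type it does not understand, as a regular file and issue a warning.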
			
 
				+
			
 
				+The @code{magic} field indicates that this archive was output in
			
 
				+the P1003 archive format.  If this field contains @code{TMAGIC},
			
 
				+the @code{uname} and @code{gname} fields will contain the ASCII
			
 
				+representation of the owner and group of the file respectively.
			
 
				+If these names are found on the restoring system, the corresponding IDs are used rather than the values in
			
 
				+the @code{uid} and @code{gid} fields.
			
 
				+
			
 
				+For references, see ISO/IEC 9945-1:1990 or IEEE Std 1003.1-1990, pages
			
 
				+169-173 (section 10.1) for @cite{Archive/Interchange File Format}; and
			
 
				+IEEE Std 1003.2-1992, pages 380-388 (section 4.48) and pages 936-940
			
 
				+(section E.4.48) for @cite{pax - Portable archive interchange}.
			
 
				+
			
 
				+@node Extensions
			
 
				+@unnumberedsec @acronym{GNU} Extensions to the Archive Format
			
 
				+@UNREVISED
			
 
				+
			
 
				+The @acronym{GNU} format uses additional file types to describe new types of
			
 
				+files in an archive.  These are listed below.
			
 
				+
			
 
				+@table @code
			
 
				+@item GNUTYPE_DUMPDIR
			
 
				+@itemx 'D'
			
 
				+This represents a directory and a list of files created by the
			
 
				+@option{--incremental} (@option{-G}) option.  The @code{size} field gives the total
			
 
				+size of the associated list of files.  Each file name is preceded by
			
 
				+either a @samp{Y} (the file should be in this archive) or an @samp{N}
			
 
				+(the file is a directory, or is not stored in the archive).  Each file
			
 
				+name is terminated by a null.  There is an additional null after the
			
 
				+last file name.
			
 
				+
			
 
				+@item GNUTYPE_MULTIVOL
			
 
				+@itemx 'M'
			
 
				+This represents a file continued from another volume of a multi-volume
			
 
				+archive created with the @option{--multi-volume} (@option{-M}) option.  The original
			
 
				+type of the file is not given here.  The @code{size} field gives the
			
 
				+maximum size of this piece of the file (assuming the volume does
			
 
				+not end before the file is written out).  The @code{offset} field
			
 
				+gives the offset from the beginning of the file where this part of
			
 
				+the file begins.  Thus @code{size} plus @code{offset} should equal
			
 
				+the original size of the file.
			
 
				+
			
 
				+@item GNUTYPE_SPARSE
			
 
				+@itemx 'S'
			
 
				+This flag indicates that we are dealing with a sparse file.  Note
			
 
				+that archiving a sparse file requires special operations to find
			
 
				+holes in the file and to record the positions of these holes, along
			
 
				+with the number of bytes of data to be found after each hole.
			
 
				+
			
 
				+@item GNUTYPE_VOLHDR
			
 
				+@itemx 'V'
			
 
				+This file type is used to mark the volume header that was given with
			
 
				+the @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) option when the archive was created.  The @code{name}
			
 
				+field contains the @code{name} given after the @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) option.
			
 
				+The @code{size} field is zero.  Only the first file in each volume
			
 
				+of an archive should have this type.
			
 
				+
			
 
				+@end table
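The file list carried by a @code{GNUTYPE_DUMPDIR} member, as described in the table above, can be walked as sketched below. The function name @code{count_dumpdir_entries} is hypothetical, and the code follows only the layout stated in this section (a @samp{Y} or @samp{N}, the file name, a null, and a final extra null).

```c
#include <stddef.h>
#include <string.h>

/* Walk a dumpdir file list of SIZE bytes starting at DATA.
   Returns the number of entries; if IN_ARCHIVE is not NULL, stores the
   number of entries flagged 'Y' there.  Stops at the extra null that
   ends the list, or at a truncated final entry. */
static size_t count_dumpdir_entries(const char *data, size_t size,
                                    size_t *in_archive)
{
    size_t entries = 0, flagged = 0, pos = 0;

    while (pos < size && data[pos] != '\0') {
        const char *end = memchr(data + pos, '\0', size - pos);
        if (end == NULL)
            break;                      /* name not terminated: give up */
        if (data[pos] == 'Y')
            flagged++;                  /* file should be in this archive */
        entries++;
        pos = (size_t)(end - data) + 1; /* step past the terminating null */
    }

    if (in_archive != NULL)
        *in_archive = flagged;
    return entries;
}
```

For the list @samp{Yfile1}, @samp{Nsubdir}, @samp{Yfile2}, the function reports three entries, two of which are expected in the archive.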
			
 
				+
			
 
				+You may have trouble reading a @acronym{GNU} format archive on a
			
 
				+non-@acronym{GNU} system if the options @option{--incremental} (@option{-G}),
			
 
				+@option{--multi-volume} (@option{-M}), @option{--sparse} (@option{-S}), or @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) were
			
 
				+used when writing the archive.  In general, if @command{tar} does not
			
 
				+use the @acronym{GNU}-added fields of the header, other versions of
			
 
				+@command{tar} should be able to read the archive.  Otherwise, the
			
 
				+@command{tar} program will give an error, the most likely one being a
			
 
				+checksum error.
			
 
				+
			
 
				+@node Snapshot Files
			
 
				+@unnumberedsec Format of the Incremental Snapshot Files
			
 
				+@include snapshot.texi
			
 
				+
			
 
				+@node Dumpdir
			
 
				+@unnumberedsec Dumpdir
			
 
				+@include dumpdir.texi