Use SEEK_HOLE for hole detection

Based on patch by Pavel Raiskup.

Use SEEK_HOLE/SEEK_DATA feature of lseek on systems that support
it.  This can make archiving of sparse files much faster.

Implement the --hole-detection option to allow users to select
hole-detection method.

* src/common.h (hole_detection_method): New enum.
(hole_detection): New global.
* src/sparse.c  (sparse_scan_file_wholesparse): New function as a
method for detecting sparse files without any data.
(sparse_scan_file_raw): Rename from sparse_scan_file; with edits.
(sparse_scan_file_seek): New function.
(sparse_scan_file): Reimplement function.
* src/tar.c: New option --hole-detection.

* tests/checkseekhole.c: New file.
* tests/.gitignore: Mention two test binaries.
* tests/Makefile.am: Add new tests.
* tests/testsuite.at (AT_SEEKHOLE_PREREQ): New macro.
Include sparse06.at.
* tests/sparse06.at: New test case.
* tests/sparse02.at: Force raw hole-detection method.
* tests/sparsemv.at: Likewise.
* tests/sparsemvp.at: Likewise.

* doc/tar.1: Document --hole-detection option.
* doc/tar.texi: Document hole-detection algorithms and
command-line options.
* NEWS: Document hole-detection.
Sergey Poznyakoff 9 years ago
commit b684326e69
14 changed files with 426 additions and 79 deletions
  1. NEWS (+20 -1)
  2. doc/tar.1 (+9 -2)
  3. doc/tar.texi (+55 -39)
  4. src/common.h (+9 -0)
  5. src/sparse.c (+140 -35)
  6. src/tar.c (+23 -0)
  7. tests/.gitignore (+2 -0)
  8. tests/Makefile.am (+3 -1)
  9. tests/checkseekhole.c (+92 -0)
  10. tests/sparse02.at (+1 -1)
  11. tests/sparse06.at (+56 -0)
  12. tests/sparsemv.at (+1 -0)
  13. tests/sparsemvp.at (+1 -0)
  14. tests/testsuite.at (+14 -0)
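
The seek method named in the commit message can be illustrated with a small standalone sketch. This is not the code from src/sparse.c: the function name count_data_segments and its interface are hypothetical, chosen only to show how lseek(2) with SEEK_DATA and SEEK_HOLE enumerates data extents without reading the file. It assumes a platform (for example glibc with _GNU_SOURCE) that defines both constants, and returns -1 otherwise.

```c
#define _GNU_SOURCE            /* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of tar): walk PATH with SEEK_DATA and
   SEEK_HOLE, print each data extent, and store the total number of
   data bytes in *DATA_BYTES.  Returns the number of extents, or -1 on
   error or when the platform does not define SEEK_HOLE.  */
long
count_data_segments (const char *path, off_t *data_bytes)
{
  assert (path != NULL && data_bytes != NULL);
  *data_bytes = 0;
#ifdef SEEK_HOLE
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  long segments = 0;
  off_t offset = 0;
  for (;;)
    {
      /* Find the start of the next data extent at or after OFFSET.  */
      off_t data = lseek (fd, offset, SEEK_DATA);
      if (data == (off_t) -1)
        {
          /* ENXIO means there is no data past OFFSET: normal end.  */
          int saved = errno;
          close (fd);
          return saved == ENXIO ? segments : -1;
        }

      /* Find where that extent ends (end of file counts as a hole).  */
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole == (off_t) -1 || hole <= data)
        {
          close (fd);
          return -1;
        }

      printf ("data: offset %lld, length %lld\n",
              (long long) data, (long long) (hole - data));
      *data_bytes += hole - data;
      segments++;
      offset = hole;
    }
#else
  (void) path;
  return -1;
#endif
}
```

Unlike the function added by this commit, the sketch does not try to detect a file system whose lseek merely reports the whole file as one data extent; on such a system it simply returns a single extent covering the file.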

+ 20 - 1
NEWS

@@ -1,4 +1,4 @@
-GNU tar NEWS - User visible changes. 2015-11-02
+GNU tar NEWS - User visible changes. 2015-12-06
 Please send GNU tar bug reports to <bug-tar@gnu.org>
 
 
@@ -48,6 +48,25 @@ read from null-delimited file lists is treated as a file name.
 This restores the documented behavior, which was broken in version
 1.27.
 
+* Sparse file detection
+
+Tar now uses SEEK_DATA/SEEK_HOLE on systems that support it.  This
+allows for considerable speed-up in sparse-file detection.
+
+New option --hole-detection is provided, that allows the user to
+select the algorithm used for hole detection.  Available arguments
+are:
+
+  --hole-detection=seek
+     Use lseek(2) SEEK_DATA and SEEK_HOLE "whence" parameters.
+
+  --hole-detection=raw
+     Scan entire file before storing it to determine where holes
+     are located.
+
+The default is to use "seek" whenever possible, and fall back to
+"raw" otherwise.
+
 
 version 1.28, 2014-07-28
 

+ 9 - 2
doc/tar.1

@@ -13,7 +13,7 @@
 .\"
 .\" You should have received a copy of the GNU General Public License
 .\" along with this program.  If not, see <http://www.gnu.org/licenses/>.
-.TH TAR 1 "November 2, 2015" "TAR" "GNU TAR Manual"
+.TH TAR 1 "December 5, 2015" "TAR" "GNU TAR Manual"
 .SH NAME
 tar \- an archiving utility
 .SH SYNOPSIS
@@ -259,6 +259,12 @@ When listing or extracting, the actual contents of \fIFILE\fR is not
 inspected, it is needed only due to syntactical requirements.  It is
 therefore common practice to use \fB/dev/null\fR in its place.
 .TP
+\fB\-\-hole\-detection\fR=\fIMETHOD\fR
+Use \fIMETHOD\fR to detect holes in sparse files.  This option implies
+\fB\-\-sparse\fR.  Valid values for \fIMETHOD\fR are \fBseek\fR and
+\fBraw\fR.  Default is \fBseek\fR with fallback to \fBraw\fR when not
+applicable.
+.TP
 \fB\-G\fR, \fB\-\-incremental\fR
 Handle old GNU-format incremental backups.
 .TP
@@ -821,7 +827,8 @@ environment variable.  If it is not set, \fBexisting\fR is assumed.
 .RE
 .TP
 \fB\-C\fR, \fB\-\-directory\fR=\fIDIR\fR
-Change to directory DIR.
+Change to \fIDIR\fR before performing any operations.  This option is
+order-sensitive, i.e. it affects all options that follow.
 .TP
 \fB\-\-exclude\fR=\fIPATTERN\fR
 Exclude files matching \fIPATTERN\fR, a

+ 55 - 39
doc/tar.texi

@@ -2782,6 +2782,13 @@ they refer to, instead of creating usual hard link members.
 @command{tar} will print out a short message summarizing the operations and
 options to @command{tar} and exit. @xref{help}.
 
+@opsummary{hole-detection}
+@item --hole-detection=@var{method}
+Use @var{method} to detect holes in sparse files.  This option implies
+@option{--sparse}.  Valid methods are @samp{seek} and @samp{raw}.
+Default is @samp{seek} with fallback to @samp{raw} when not
+applicable. @xref{sparse}.
+
 @opsummary{ignore-case}
 @item --ignore-case
 Ignore case when matching member or file names with
@@ -9536,13 +9543,15 @@ could create an archive longer than the original.  To have @command{tar}
 attempt to recognize the holes in a file, use @option{--sparse}
 (@option{-S}).  When you use this option, then, for any file using
 less disk space than would be expected from its length, @command{tar}
-searches the file for consecutive stretches of zeros.  It then records
-in the archive for the file where the consecutive stretches of zeros
-are, and only archives the ``real contents'' of the file.  On
-extraction (using @option{--sparse} is not needed on extraction) any
-such files have holes created wherever the continuous stretches of zeros
-were found.  Thus, if you use @option{--sparse}, @command{tar} archives
-won't take more space than the original.
+searches the file for holes.  It then records in the archive for the file where
+the holes (consecutive stretches of zeros) are, and only archives the
+``real contents'' of the file.  On extraction (using @option{--sparse} is not
+needed on extraction) any such files have also holes created wherever the holes
+were found.  Thus, if you use @option{--sparse}, @command{tar} archives won't
+take more space than the original.
+
+@GNUTAR{} uses two methods for detecting holes in sparse files.  These
+methods are described later in this subsection.
 
 @table @option
 @opindex sparse
@@ -9568,37 +9577,12 @@ will never take more space on the media than the files take on disk
 (otherwise, archiving a disk filled with sparse files might take
 hundreds of tapes).  @xref{Incremental Dumps}.
 
-However, be aware that @option{--sparse} option presents a serious
-drawback.  Namely, in order to determine if the file is sparse
-@command{tar} has to read it before trying to archive it, so in total
-the file is read @strong{twice}.  So, always bear in mind that the
-time needed to process all files with this option is roughly twice
-the time needed to archive them without it.
-@FIXME{A technical note:
-
-Programs like @command{dump} do not have to read the entire file; by
-examining the file system directly, they can determine in advance
-exactly where the holes are and thus avoid reading through them.  The
-only data it need read are the actual allocated data blocks.
-@GNUTAR{} uses a more portable and straightforward
-archiving approach, it would be fairly difficult that it does
-otherwise.  Elizabeth Zwicky writes to @file{comp.unix.internals}, on
-1990-12-10:
-
-@quotation
-What I did say is that you cannot tell the difference between a hole and an
-equivalent number of nulls without reading raw blocks.  @code{st_blocks} at
-best tells you how many holes there are; it doesn't tell you @emph{where}.
-Just as programs may, conceivably, care what @code{st_blocks} is (care
-to name one that does?), they may also care where the holes are (I have
-no examples of this one either, but it's equally imaginable).
-
-I conclude from this that good archivers are not portable.  One can
-arguably conclude that if you want a portable program, you can in good
-conscience restore files with as many holes as possible, since you can't
-get it right.
-@end quotation
-}
+However, be aware that @option{--sparse} option may present a serious
+drawback.  Namely, in order to determine the positions of holes in a file
+@command{tar} may have to read it before trying to archive it, so in total
+the file may be read @strong{twice}.  This may happen when your OS or your FS
+does not support @dfn{SEEK_HOLE/SEEK_DATA} feature in @dfn{lseek} (See
+@option{--hole-detection}, below).
 
 @cindex sparse formats, defined
 When using @samp{POSIX} archive format, @GNUTAR{} is able to store
@@ -9612,7 +9596,6 @@ use an earlier format, you can select it using
 @table @option
 @opindex sparse-version
 @item --sparse-version=@var{version}
-
 Select the format to store sparse files in.  Valid @var{version} values
 are: @samp{0.0}, @samp{0.1} and @samp{1.0}.  @xref{Sparse Formats},
 for a detailed description of each format.
@@ -9620,6 +9603,39 @@ for a detailed description of each format.
 
 Using @option{--sparse-format} option implies @option{--sparse}.
 
+@table @option
+@opindex hole-detection
+@cindex hole detection
+@item --hole-detection=@var{method}
+Enforce concrete hole detection method.  Before the real contents of sparse
+file are stored, @command{tar} needs to gather knowledge about file
+sparseness.  This is because it needs to have the file's map of holes
+stored into tar header before it starts archiving the file contents.
+Currently, two methods of hole detection are implemented:
+
+@itemize @bullet
+@item @option{--hole-detection=seek}
+Seeking the file for data and holes.  It uses enhancement of the @code{lseek}
+system call (@code{SEEK_HOLE} and @code{SEEK_DATA}) which is able to
+reuse file system knowledge about sparse file contents - so the
+detection is usually very fast.  To use this feature, your file system
+and operating system must support it.  At the time of this writing
+(2015) this feature, in spite of not being accepted by POSIX, is
+fairly widely supported by different operating systems.
+
+@item @option{--hole-detection=raw}
+Reading byte-by-byte the whole sparse file before the archiving.  This
+method detects holes like consecutive stretches of zeroes.  Comparing to
+the previous method, it is usually much slower, although more
+portable.
+@end itemize
+@end table
+
+When no @option{--hole-detection} option is given, @command{tar} uses
+the @samp{seek}, if supported by the operating system.
+
+Using @option{--hole-detection} option implies @option{--sparse}.
+
 @node Attributes
 @section Handling File Attributes
 @cindex attributes, files
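
The raw method documented in the doc/tar.texi hunk above treats any all-zero block as part of a hole. A minimal sketch of that idea follows, working on an in-memory buffer instead of a file descriptor so that it is self-contained. The names raw_scan, struct extent and BLOCK_SIZE are hypothetical, and the 512-byte block size is an assumption made for the example. The trailing zero-length map entry that tar records for a file ending in a hole is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 512          /* assumed block size for this sketch */

/* One run of non-zero blocks: where it starts and how long it is.  */
struct extent
{
  size_t offset;
  size_t numbytes;
};

/* Return nonzero if the N bytes at P are all zero.  */
static int
is_zero_block (const unsigned char *p, size_t n)
{
  while (n--)
    if (*p++)
      return 0;
  return 1;
}

/* Hypothetical helper: scan BUF block by block and record each run of
   non-zero blocks in OUT (at most MAX entries are stored).  Returns
   the number of runs found, which may exceed MAX.  */
size_t
raw_scan (const unsigned char *buf, size_t len,
          struct extent *out, size_t max)
{
  struct extent cur = { 0, 0 };
  size_t found = 0;

  assert (buf != NULL && out != NULL);

  for (size_t off = 0; off < len; off += BLOCK_SIZE)
    {
      size_t count = len - off < BLOCK_SIZE ? len - off : BLOCK_SIZE;

      if (is_zero_block (buf + off, count))
        {
          /* A zero block ends the current data run, if any.  */
          if (cur.numbytes)
            {
              if (found < max)
                out[found] = cur;
              found++;
              cur.numbytes = 0;
            }
        }
      else
        {
          /* A non-zero block starts or extends a data run.  */
          if (cur.numbytes == 0)
            cur.offset = off;
          cur.numbytes += count;
        }
    }

  /* Flush a data run that reaches the end of the buffer.  */
  if (cur.numbytes)
    {
      if (found < max)
        out[found] = cur;
      found++;
    }
  return found;
}
```

Every byte has to be inspected, which is why this approach is slower than the seek method on large, mostly empty files.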

+ 9 - 0
src/common.h

@@ -280,6 +280,15 @@ GLOBAL bool sparse_option;
 GLOBAL unsigned tar_sparse_major;
 GLOBAL unsigned tar_sparse_minor;
 
+enum hole_detection_method
+  {
+    HOLE_DETECTION_DEFAULT,
+    HOLE_DETECTION_RAW,
+    HOLE_DETECTION_SEEK
+  };
+
+GLOBAL enum hole_detection_method hole_detection;
+
 GLOBAL bool starting_file_option;
 
 /* Specified maximum byte length of each tape volume (multiple of 1024).  */

+ 140 - 35
src/sparse.c

@@ -1,6 +1,6 @@
 /* Functions for dealing with sparse files
 
-   Copyright 2003-2007, 2010, 2013-2014 Free Software Foundation, Inc.
+   Copyright 2003-2007, 2010, 2013-2015 Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
@@ -208,9 +208,9 @@ sparse_add_map (struct tar_stat_info *st, struct sp_array const *sp)
   st->sparse_map_avail = avail + 1;
 }
 
-/* Scan the sparse file and create its map */
+/* Scan the sparse file byte-by-byte and create its map. */
 static bool
-sparse_scan_file (struct tar_sparse_file *file)
+sparse_scan_file_raw (struct tar_sparse_file *file)
 {
   struct tar_stat_info *st = file->stat_info;
   int fd = file->fd;
@@ -221,41 +221,38 @@ sparse_scan_file (struct tar_sparse_file *file)
 
   st->archive_file_size = 0;
 
-  if (ST_NBLOCKS (st->stat) == 0)
-    offset = st->stat.st_size;
-  else
-    {
-      if (!tar_sparse_scan (file, scan_begin, NULL))
-	return false;
-
-      while ((count = blocking_read (fd, buffer, sizeof buffer)) != 0
-	     && count != SAFE_READ_ERROR)
-	{
-	  /* Analyze the block.  */
-	  if (zero_block_p (buffer, count))
-	    {
-	      if (sp.numbytes)
-		{
-		  sparse_add_map (st, &sp);
-		  sp.numbytes = 0;
-		  if (!tar_sparse_scan (file, scan_block, NULL))
-		    return false;
-		}
-	    }
-	  else
-	    {
-	      if (sp.numbytes == 0)
-		sp.offset = offset;
-	      sp.numbytes += count;
-	      st->archive_file_size += count;
-	      if (!tar_sparse_scan (file, scan_block, buffer))
-		return false;
-	    }
+  if (!tar_sparse_scan (file, scan_begin, NULL))
+    return false;
 
-	  offset += count;
-	}
+  while ((count = blocking_read (fd, buffer, sizeof buffer)) != 0
+         && count != SAFE_READ_ERROR)
+    {
+      /* Analyze the block.  */
+      if (zero_block_p (buffer, count))
+        {
+          if (sp.numbytes)
+            {
+              sparse_add_map (st, &sp);
+              sp.numbytes = 0;
+              if (!tar_sparse_scan (file, scan_block, NULL))
+                return false;
+            }
+        }
+      else
+        {
+          if (sp.numbytes == 0)
+            sp.offset = offset;
+          sp.numbytes += count;
+          st->archive_file_size += count;
+          if (!tar_sparse_scan (file, scan_block, buffer))
+            return false;
+        }
+
+      offset += count;
     }
 
+  /* save one more sparse segment of length 0 to indicate that
+     the file ends with a hole */
   if (sp.numbytes == 0)
     sp.offset = offset;
 
@@ -264,6 +261,114 @@ sparse_scan_file (struct tar_sparse_file *file)
   return tar_sparse_scan (file, scan_end, NULL);
 }
 
+static bool
+sparse_scan_file_wholesparse (struct tar_sparse_file *file)
+{
+  struct tar_stat_info *st = file->stat_info;
+  struct sp_array sp = {0, 0};
+
+  /* Note that this function is called only for truly sparse files of size >= 1
+     block size (checked via ST_IS_SPARSE before).  See the thread
+     http://www.mail-archive.com/bug-tar@gnu.org/msg04209.html for more info */
+  if (ST_NBLOCKS (st->stat) == 0)
+    {
+      st->archive_file_size = 0;
+      sp.offset = st->stat.st_size;
+      sparse_add_map (st, &sp);
+      return true;
+    }
+
+  return false;
+}
+
+#ifdef SEEK_HOLE
+/* Try to engage SEEK_HOLE/SEEK_DATA feature. */
+static bool
+sparse_scan_file_seek (struct tar_sparse_file *file)
+{
+  struct tar_stat_info *st = file->stat_info;
+  int fd = file->fd;
+  struct sp_array sp = {0, 0};
+  off_t offset = 0;
+  off_t data_offset;
+  off_t hole_offset;
+
+  st->archive_file_size = 0;
+
+  for (;;)
+    {
+      /* locate first chunk of data */
+      data_offset = lseek (fd, offset, SEEK_DATA);
+
+      if (data_offset == (off_t)-1)
+        /* ENXIO == EOF; error otherwise */
+        {
+          if (errno == ENXIO)
+            {
+              /* file ends with hole, add one more empty chunk of data */
+              sp.numbytes = 0;
+              sp.offset = st->stat.st_size;
+              sparse_add_map (st, &sp);
+              return true;
+            }
+          return false;
+        }
+
+      hole_offset = lseek (fd, data_offset, SEEK_HOLE);
+
+      /* according to specs, if FS does not fully support
+	 SEEK_DATA/SEEK_HOLE it may just implement kind of "wrapper" around
+	 classic lseek() call.  We must detect it here and try to use other
+	 hole-detection methods. */
+      if (offset == 0 /* first loop */
+          && data_offset == 0
+          && hole_offset == st->stat.st_size)
+        {
+          lseek (fd, 0, SEEK_SET);
+          return false;
+        }
+
+      sp.offset = data_offset;
+      sp.numbytes = hole_offset - data_offset;
+      sparse_add_map (st, &sp);
+
+      st->archive_file_size += sp.numbytes;
+      offset = hole_offset;
+    }
+
+  return true;
+}
+#endif
+
+static bool
+sparse_scan_file (struct tar_sparse_file *file)
+{
+  /* always check for completely sparse files */
+  if (sparse_scan_file_wholesparse (file))
+    return true;
+
+  switch (hole_detection)
+    {
+    case HOLE_DETECTION_DEFAULT:
+    case HOLE_DETECTION_SEEK:
+#ifdef SEEK_HOLE
+      if (sparse_scan_file_seek (file))
+        return true;
+#else
+      if (hole_detection == HOLE_DETECTION_SEEK)
+	WARN((0, 0,
+	      _("\"seek\" hole detection is not supported, using \"raw\".")));
+      /* fall back to "raw" for this and all other files */
+      hole_detection = HOLE_DETECTION_RAW;
+#endif
+    case HOLE_DETECTION_RAW:
+      if (sparse_scan_file_raw (file))
+	return true;
+    }
+  
+  return false;
+}
+
 static struct tar_sparse_optab const oldgnu_optab;
 static struct tar_sparse_optab const star_optab;
 static struct tar_sparse_optab const pax_optab;
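
The new sparse_scan_file relies on a switch fall-through to move from the seek method to the raw one. The same selection logic can be written out explicitly, as in the sketch below. All names here (scan_with_fallback, enum scan_method, the stub scanners) are hypothetical and exist only for illustration. Passing a null seek scanner stands in for a build without SEEK_HOLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum scan_method { METHOD_DEFAULT, METHOD_RAW, METHOD_SEEK };

/* A scanner returns true when it produced a usable hole map.  */
typedef bool (*scan_fn) (void *file);

/* Hypothetical dispatcher: try SEEK_SCAN unless raw scanning was
   requested or SEEK_SCAN is unavailable (NULL); if it is skipped or
   fails, fall back to RAW_SCAN.  The method actually run last is
   stored in *USED.  */
bool
scan_with_fallback (enum scan_method requested, scan_fn seek_scan,
                    scan_fn raw_scan, void *file, enum scan_method *used)
{
  assert (raw_scan != NULL && used != NULL);

  if (requested != METHOD_RAW && seek_scan != NULL && seek_scan (file))
    {
      *used = METHOD_SEEK;
      return true;
    }

  *used = METHOD_RAW;
  return raw_scan (file);
}

/* Stub scanners used only to exercise the dispatcher.  */
bool
stub_scan_ok (void *file)
{
  (void) file;
  return true;
}

bool
stub_scan_fail (void *file)
{
  (void) file;
  return false;
}
```

One difference from the commit: when SEEK_HOLE is missing at build time, tar also switches the global hole_detection to raw so that later files skip the seek attempt, and warns if seek was explicitly requested. The sketch keeps no such state.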

+ 23 - 0
src/tar.c

@@ -362,6 +362,7 @@ enum
   SHOW_TRANSFORMED_NAMES_OPTION,
   SKIP_OLD_FILES_OPTION,
   SORT_OPTION,
+  HOLE_DETECTION_OPTION,
   SPARSE_VERSION_OPTION,
   STRIP_COMPONENTS_OPTION,
   SUFFIX_OPTION,
@@ -451,6 +452,8 @@ static struct argp_option options[] = {
 
   {"sparse", 'S', 0, 0,
    N_("handle sparse files efficiently"), GRID+1 },
+  {"hole-detection", HOLE_DETECTION_OPTION, N_("TYPE"), 0,
+   N_("technique to detect holes"), GRID+1 },
   {"sparse-version", SPARSE_VERSION_OPTION, N_("MAJOR[.MINOR]"), 0,
    N_("set version of the sparse format to use (implies --sparse)"), GRID+1},
   {"incremental", 'G', 0, 0,
@@ -1464,6 +1467,19 @@ static int sort_mode_flag[] = {
 };
 
 ARGMATCH_VERIFY (sort_mode_arg, sort_mode_flag);
+
+static char const *const hole_detection_args[] =
+{
+  "raw", "seek", NULL
+};
+
+static int const hole_detection_types[] =
+{
+  HOLE_DETECTION_RAW, HOLE_DETECTION_SEEK
+};
+
+ARGMATCH_VERIFY (hole_detection_args, hole_detection_types);
+
 
 static void
 set_old_files_option (int code, struct option_locus *loc)
@@ -1753,6 +1769,12 @@ parse_opt (int key, char *arg, struct argp_state *state)
       set_old_files_option (SKIP_OLD_FILES, args->loc);
       break;
 
+    case HOLE_DETECTION_OPTION:
+      hole_detection = XARGMATCH ("--hole-detection", arg,
+				  hole_detection_args, hole_detection_types);
+      sparse_option = true;
+      break;
+
     case SPARSE_VERSION_OPTION:
       sparse_option = true;
       {
@@ -2523,6 +2545,7 @@ decode_options (int argc, char **argv)
   blocking_factor = DEFAULT_BLOCKING;
   record_size = DEFAULT_BLOCKING * BLOCKSIZE;
   excluded = new_exclude ();
+  hole_detection = HOLE_DETECTION_DEFAULT;
 
   newer_mtime_option.tv_sec = TYPE_MINIMUM (time_t);
   newer_mtime_option.tv_nsec = -1;

+ 2 - 0
tests/.gitignore

@@ -9,3 +9,5 @@ argcv.h
 genfile.c
 genfile
 download
+ttyemu
+checkseekhole

+ 3 - 1
tests/Makefile.am

@@ -207,6 +207,7 @@ TESTSUITE_AT = \
  sparse03.at\
  sparse04.at\
  sparse05.at\
+ sparse06.at\
  sparsemv.at\
  sparsemvp.at\
  spmvp00.at\
@@ -275,13 +276,14 @@ installcheck-local: $(check_PROGRAMS)
 ## genfile      ##
 ## ------------ ##
 
-check_PROGRAMS = genfile
+check_PROGRAMS = genfile checkseekhole
 
 if TAR_COND_GRANTPT
 check_PROGRAMS += ttyemu
 endif
 
 genfile_SOURCES = genfile.c argcv.c argcv.h
+checkseekhole_SOURCES = checkseekhole.c
 
 ttyemu_SOURCES = ttyemu.c
 

+ 92 - 0
tests/checkseekhole.c

@@ -0,0 +1,92 @@
+/* Test suite for GNU tar - SEEK_HOLE detector.
+
+   Copyright 2015 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
+   Public License for more details.
+
+   You should have received a copy of the GNU General Public License along
+   with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+   Description:  detect whether it is possible to work with SEEK_HOLE on
+   particular operating system and file system. */
+
+#include "config.h"
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <fcntl.h>
+
+enum {
+    EX_OK = 0,    /* SEEK_HOLE support */
+    EX_FAIL,      /* test failed - no SEEK_HOLE support */
+    EX_BAD,       /* test is not relevant */
+};
+
+int
+check_seek_hole (int fd)
+{
+#ifdef SEEK_HOLE
+  struct stat stat;
+  off_t offset;
+
+  /* hole of 100MB */
+  if (lseek (fd, 100*1024*1024, SEEK_END) < 0)
+    return EX_BAD;
+
+  /* piece of data */
+  if (write (fd, "data\n", 5) != 5)
+    return EX_BAD;
+
+  /* another hole */
+  if (lseek (fd, 100*1024*1024, SEEK_END) < 0)
+    return EX_BAD;
+
+  /* piece of data */
+  if (write (fd, "data\n", 5) != 5)
+    return EX_BAD;
+
+  if (fstat (fd, &stat))
+    return EX_BAD;
+
+  offset = lseek (fd, 0, SEEK_DATA);
+  if (offset == (off_t)-1)
+    return EX_FAIL;
+
+  offset = lseek (fd, offset, SEEK_HOLE);
+  if (offset == (off_t)-1 || offset == stat.st_size)
+    return EX_FAIL;
+
+  return EX_OK;
+#else
+  return EX_BAD;
+#endif
+}
+
+int
+main ()
+{
+#ifdef SEEK_HOLE
+  int rc;
+  char template[] = "testseekhole-XXXXXX";
+  int fd = mkstemp (template);
+  if (fd == -1)
+    return EX_BAD;
+  rc = check_seek_hole (fd);
+  close (fd);
+  unlink (template);
+
+  return rc;
+#else
+  return EX_FAIL;
+#endif
+}

+ 1 - 1
tests/sparse02.at

@@ -27,7 +27,7 @@ AT_KEYWORDS([sparse sparse02])
 
 AT_TAR_CHECK([
 genfile --sparse --file sparsefile --block-size 512 0 ABCD 1M EFGH 2000K IJKL || AT_SKIP_TEST
-tar -c -f archive --sparse sparsefile || exit 1
+tar --hole-detection=raw -c -f archive --sparse sparsefile || exit 1
 echo separator
 
 tar xfO archive | cat - > sparsecopy || exit 1

+ 56 - 0
tests/sparse06.at

@@ -0,0 +1,56 @@
+# Process this file with autom4te to create testsuite. -*- Autotest -*-
+#
+# Test suite for GNU tar.
+# Copyright 2014 Free Software Foundation, Inc.
+
+# This file is part of GNU tar.
+
+# GNU tar is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# GNU tar is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+AT_SETUP([storing sparse file using seek method])
+AT_KEYWORDS([sparse sparse06])
+
+m4_define([check_pattern],[
+rm -rf out archive.tar smallsparse && mkdir out
+genfile --sparse --file smallsparse $1
+tar -cSf archive.tar smallsparse
+tar -xf archive.tar -C out
+cmp smallsparse out/smallsparse
+])
+
+AT_TAR_CHECK([
+AT_SEEKHOLE_PREREQ
+AT_TIMEOUT_PREREQ
+
+TAR_OPTIONS="$TAR_OPTIONS --hole-detection=seek"
+genfile --sparse --file bigsparse 0 ABC 8G DEF
+timeout 2 tar -cSf a bigsparse
+test $? -eq 0 || exit 1
+
+check_pattern([0 ABC])
+check_pattern([0 ABC 10M])
+check_pattern([0 ABC 10M DEF])
+
+check_pattern([10M])
+check_pattern([10M ABC])
+check_pattern([10M ABC 20M])
+
+check_pattern([10M DEF 20M GHI 30M JKL 40M])
+
+],
+[0],,
+[genfile: created file is not sparse
+],,,[posix])
+
+AT_CLEANUP

+ 1 - 0
tests/sparsemv.at

@@ -30,6 +30,7 @@ AT_KEYWORDS([sparse multiv sparsemv])
 
 AT_TAR_CHECK([
 exec <&-
+TAR_OPTIONS="$TAR_OPTIONS --hole-detection=raw"
 genfile --sparse --file sparsefile 0 ABCDEFGHIJK 1M ABCDEFGHI || AT_SKIP_TEST
 echo "Pass 1: Split between data blocks"
 echo "Create archive"

+ 1 - 0
tests/sparsemvp.at

@@ -26,6 +26,7 @@ dnl TAR_MVP_TEST version map1 map2
 m4_define([TAR_MVP_TEST],[
 AT_TAR_CHECK([
 exec <&-
+TAR_OPTIONS="$TAR_OPTIONS --hole-detection=raw"
 genfile --sparse --file sparsefile $2 || AT_SKIP_TEST
 echo "Pass 1: Split between data blocks"
 echo "Create archive"

+ 14 - 0
tests/testsuite.at

@@ -112,6 +112,19 @@ rm -f $[]$
 test $result -eq 0 || AT_SKIP_TEST
 ])
 
+dnl AT_SEEKHOLE_PREREQ
+m4_define([AT_SEEKHOLE_PREREQ],[
+checkseekhole || AT_SKIP_TEST
+])
+
+m4_define([AT_TIMEOUT_PREREQ],[
+timeout 100 true
+if test $? -ne 0; then
+    echo >&2 "the 'timeout' utility not found"
+    AT_SKIP_TEST
+fi
+])
+
 m4_define([AT_TAR_MKHIER],[
 install-sh -d $1 >/dev/null dnl
 m4_if([$2],,,&& genfile --file [$1]/[$2]) || AT_SKIP_TEST])
@@ -358,6 +371,7 @@ m4_include([sparse02.at])
 m4_include([sparse03.at])
 m4_include([sparse04.at])
 m4_include([sparse05.at])
+m4_include([sparse06.at])
 m4_include([sparsemv.at])
 m4_include([spmvp00.at])
 m4_include([spmvp01.at])