Add support for incremental backup.
author    Robert Haas <rhaas@postgresql.org>  Wed, 20 Dec 2023 14:49:12 +0000 (09:49 -0500)
committer Robert Haas <rhaas@postgresql.org>  Wed, 20 Dec 2023 14:49:12 +0000 (09:49 -0500)
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup.  You then use BASE_BACKUP with the INCREMENTAL option to take
the backup.  pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.
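
As a rough illustration of the client side, the sketch below assembles a
pg_basebackup argument vector that requests an incremental backup. The helper
name build_incremental_argv and any paths passed to it are hypothetical; -D is
pg_basebackup's existing target-directory option and --incremental is the
option described above. The function only builds the arguments; it does not
run the tool or talk to a server.

```c
#include <stdio.h>

/*
 * Hypothetical helper: fill argv[] (five slots) with a pg_basebackup
 * invocation that takes an incremental backup into target_dir, using the
 * manifest of the prior backup at manifest_path.
 *
 * optbuf receives the "--incremental=<manifest_path>" string and must stay
 * alive for as long as argv[] is used.
 *
 * Returns 0 on success, -1 if the option does not fit in optbuf.
 */
static int
build_incremental_argv(const char *target_dir, const char *manifest_path,
                       char *optbuf, size_t optbuflen, const char *argv[5])
{
    int n = snprintf(optbuf, optbuflen, "--incremental=%s", manifest_path);

    /* snprintf reports truncation by returning the untruncated length */
    if (n < 0 || (size_t) n >= optbuflen)
        return -1;

    argv[0] = "pg_basebackup";
    argv[1] = "-D";
    argv[2] = target_dir;
    argv[3] = optbuf;
    argv[4] = NULL;             /* terminator, as expected by execvp() */
    return 0;
}
```

Building an argument vector rather than a single shell string avoids having to
quote paths that contain spaces; the resulting array could be handed to
execvp() or a similar call.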

An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.
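
The two markers of an incremental backup mentioned above can be sketched in a
few lines of C. This is an illustration only, not the code from this commit:
the function names are hypothetical, and the label text mirrors the
"INCREMENTAL FROM LSN" and "INCREMENTAL FROM TLI" lines that this commit's
build_backup_content() appends, with the LSN printed as its high and low
32-bit halves in hexadecimal.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define INCREMENTAL_PREFIX "INCREMENTAL."

/*
 * Hypothetical helper: if name looks like INCREMENTAL.${ORIGINAL_NAME},
 * return a pointer to the ORIGINAL_NAME part; otherwise return NULL.
 */
static const char *
original_name_of_incremental(const char *name)
{
    size_t plen = strlen(INCREMENTAL_PREFIX);

    if (strncmp(name, INCREMENTAL_PREFIX, plen) == 0 && name[plen] != '\0')
        return name + plen;
    return NULL;
}

/*
 * Hypothetical helper: write the two extra backup_label lines for an
 * incremental backup into buf. Returns what snprintf returns.
 */
static int
format_incremental_label(char *buf, size_t buflen,
                         uint64_t start_lsn, uint32_t start_tli)
{
    return snprintf(buf, buflen,
                    "INCREMENTAL FROM LSN: %" PRIX32 "/%" PRIX32 "\n"
                    "INCREMENTAL FROM TLI: %" PRIu32 "\n",
                    (uint32_t) (start_lsn >> 32),   /* high half of the LSN */
                    (uint32_t) start_lsn,           /* low half of the LSN */
                    start_tli);
}
```

A reader of backup_label can treat the presence of the first line as the
signal that the directory is an incremental backup and must go through
pg_combinebackup before it can be used as a data directory.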

Patch by me.  Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com

49 files changed:
doc/src/sgml/backup.sgml
doc/src/sgml/config.sgml
doc/src/sgml/protocol.sgml
doc/src/sgml/ref/allfiles.sgml
doc/src/sgml/ref/pg_basebackup.sgml
doc/src/sgml/ref/pg_combinebackup.sgml [new file with mode: 0644]
doc/src/sgml/reference.sgml
src/backend/access/transam/xlogbackup.c
src/backend/access/transam/xlogrecovery.c
src/backend/backup/Makefile
src/backend/backup/basebackup.c
src/backend/backup/basebackup_incremental.c [new file with mode: 0644]
src/backend/backup/meson.build
src/backend/replication/repl_gram.y
src/backend/replication/repl_scanner.l
src/backend/replication/walsender.c
src/backend/storage/ipc/ipci.c
src/bin/Makefile
src/bin/meson.build
src/bin/pg_basebackup/bbstreamer_file.c
src/bin/pg_basebackup/pg_basebackup.c
src/bin/pg_basebackup/t/010_pg_basebackup.pl
src/bin/pg_combinebackup/.gitignore [new file with mode: 0644]
src/bin/pg_combinebackup/Makefile [new file with mode: 0644]
src/bin/pg_combinebackup/backup_label.c [new file with mode: 0644]
src/bin/pg_combinebackup/backup_label.h [new file with mode: 0644]
src/bin/pg_combinebackup/copy_file.c [new file with mode: 0644]
src/bin/pg_combinebackup/copy_file.h [new file with mode: 0644]
src/bin/pg_combinebackup/load_manifest.c [new file with mode: 0644]
src/bin/pg_combinebackup/load_manifest.h [new file with mode: 0644]
src/bin/pg_combinebackup/meson.build [new file with mode: 0644]
src/bin/pg_combinebackup/nls.mk [new file with mode: 0644]
src/bin/pg_combinebackup/pg_combinebackup.c [new file with mode: 0644]
src/bin/pg_combinebackup/reconstruct.c [new file with mode: 0644]
src/bin/pg_combinebackup/reconstruct.h [new file with mode: 0644]
src/bin/pg_combinebackup/t/001_basic.pl [new file with mode: 0644]
src/bin/pg_combinebackup/t/002_compare_backups.pl [new file with mode: 0644]
src/bin/pg_combinebackup/t/003_timeline.pl [new file with mode: 0644]
src/bin/pg_combinebackup/t/004_manifest.pl [new file with mode: 0644]
src/bin/pg_combinebackup/t/005_integrity.pl [new file with mode: 0644]
src/bin/pg_combinebackup/write_manifest.c [new file with mode: 0644]
src/bin/pg_combinebackup/write_manifest.h [new file with mode: 0644]
src/bin/pg_resetwal/pg_resetwal.c
src/include/access/xlogbackup.h
src/include/backup/basebackup.h
src/include/backup/basebackup_incremental.h [new file with mode: 0644]
src/include/nodes/replnodes.h
src/test/perl/PostgreSQL/Test/Cluster.pm
src/tools/pgindent/typedefs.list

index 8cb24d6ae542f4e20e5dfd6d9f6a157bf3a4688a..b3468eea3cb999f8121e89379fb71de69c41273b 100644
@@ -857,12 +857,79 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
   </sect2>
 
+  <sect2 id="backup-incremental-backup">
+   <title>Making an Incremental Backup</title>
+
+   <para>
+    You can use <xref linkend="app-pgbasebackup"/> to take an incremental
+    backup by specifying the <literal>--incremental</literal> option. You must
+    supply, as an argument to <literal>--incremental</literal>, the backup
+    manifest of an earlier backup from the same server. In the resulting
+    backup, non-relation files will be included in their entirety, but some
+    relation files may be replaced by smaller incremental files which contain
+    only the blocks which have been changed since the earlier backup and enough
+    metadata to reconstruct the current version of the file.
+   </para>
+
+   <para>
+    To figure out which blocks need to be backed up, the server uses WAL
+    summaries, which are stored in the data directory, inside the directory
+    <literal>pg_wal/summaries</literal>. If the required summary files are not
+    present, an attempt to take an incremental backup will fail. The summaries
+    present in this directory must cover all LSNs from the start LSN of the
+    prior backup to the start LSN of the current backup. Since the server looks
+    for WAL summaries just after establishing the start LSN of the current
+    backup, the necessary summary files probably won't be instantly present
+    on disk, but the server will wait for any missing files to show up.
+    This also helps if the WAL summarization process has fallen behind.
+    However, if the necessary files have already been removed, or if the WAL
+    summarizer doesn't catch up quickly enough, the incremental backup will
+    fail.
+   </para>
+
+   <para>
+    When restoring an incremental backup, it will be necessary to have not
+    only the incremental backup itself but also all earlier backups that
+    are required to supply the blocks omitted from the incremental backup.
+    See <xref linkend="app-pgcombinebackup"/> for further information about
+    this requirement.
+   </para>
+
+   <para>
+    Note that all of the requirements for making use of a full backup also
+    apply to an incremental backup. For instance, you still need all of the
+    WAL segment files generated during and after the file system backup, and
+    any relevant WAL history files. And you still need to create a
+    <literal>recovery.signal</literal> (or <literal>standby.signal</literal>)
+    and perform recovery, as described in
+    <xref linkend="backup-pitr-recovery" />. The requirement to have earlier
+    backups available at restore time and to use
+    <literal>pg_combinebackup</literal> is an additional requirement on top of
+    everything else. Keep in mind that <application>PostgreSQL</application>
+    has no built-in mechanism to figure out which backups are still needed as
+    a basis for restoring later incremental backups. You must keep track of
+    the relationships between your full and incremental backups on your own,
+    and be certain not to remove earlier backups if they might be needed when
+    restoring later incremental backups.
+   </para>
+
+   <para>
+    Incremental backups typically only make sense for relatively large
+    databases where a significant portion of the data does not change, or only
+    changes slowly. For a small database, it's easier to ignore the existence
+    of incremental backups and simply take full backups, which are simpler
+    to manage. For a large database all of which is heavily modified,
+    incremental backups won't be much smaller than full backups.
+   </para>
+  </sect2>
+
   <sect2 id="backup-lowlevel-base-backup">
    <title>Making a Base Backup Using the Low Level API</title>
    <para>
-    The procedure for making a base backup using the low level
-    APIs contains a few more steps than
-    the <xref linkend="app-pgbasebackup"/> method, but is relatively
+    Instead of taking a full or incremental base backup using
+    <xref linkend="app-pgbasebackup"/>, you can take a base backup using the
+    low-level API. This procedure contains a few more steps than
+    the <application>pg_basebackup</application> method, but is relatively
     simple. It is very important that these steps are executed in
     sequence, and that the success of a step is verified before
     proceeding to the next step.
@@ -1118,7 +1185,8 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
    </listitem>
    <listitem>
     <para>
-     Restore the database files from your file system backup.  Be sure that they
+     If you're restoring a full backup, you can restore the database files
+     directly into the target directories.  Be sure that they
      are restored with the right ownership (the database system user, not
      <literal>root</literal>!) and with the right permissions.  If you are using
      tablespaces,
@@ -1126,6 +1194,19 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
      were correctly restored.
     </para>
    </listitem>
+   <listitem>
+    <para>
+     If you're restoring an incremental backup, you'll need to restore the
+     incremental backup and all earlier backups upon which it directly or
+     indirectly depends to the machine where you are performing the restore.
+     These backups will need to be placed in separate directories, not the
+     target directories where you want the running server to end up.
+     Once this is done, use <xref linkend="app-pgcombinebackup"/> to pull
+     data from the full backup and all of the subsequent incremental backups
+     and write out a synthetic full backup to the target directories. As above,
+     verify that permissions and tablespace links are correct.
+    </para>
+   </listitem>
    <listitem>
     <para>
      Remove any files present in <filename>pg_wal/</filename>; these came from the
index ee985850275d010e11c335fdc2fbd0acb7ff4b45..b5624ca884741c60d2eb395e3dea7611d57e9516 100644
@@ -4153,13 +4153,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
    <sect2 id="runtime-config-wal-summarization">
     <title>WAL Summarization</title>
 
- <!--
     <para>
      These settings control WAL summarization, a feature which must be
      enabled in order to perform an
      <link linkend="backup-incremental-backup">incremental backup</link>.
     </para>
- -->
 
     <variablelist>
      <varlistentry id="guc-summarize-wal" xreflabel="summarize_wal">
index af3f016f7467756dad9f5f1ebe208daed9aed5f5..9a66918171a56ab3d59b1b1d311d01559fe6ff01 100644
@@ -2599,6 +2599,19 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
      </listitem>
     </varlistentry>
 
+    <varlistentry id="protocol-replication-upload-manifest">
+     <term>
+      <literal>UPLOAD_MANIFEST</literal>
+      <indexterm><primary>UPLOAD_MANIFEST</primary></indexterm>
+     </term>
+     <listitem>
+      <para>
+       Uploads a backup manifest in preparation for taking an incremental
+       backup.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
      <term><literal>BASE_BACKUP</literal> [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ]
       <indexterm><primary>BASE_BACKUP</primary></indexterm>
@@ -2838,6 +2851,17 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           </para>
          </listitem>
         </varlistentry>
+
+        <varlistentry>
+         <term><literal>INCREMENTAL</literal></term>
+         <listitem>
+          <para>
+           Requests an incremental backup. The
+           <literal>UPLOAD_MANIFEST</literal> command must be executed
+           before running a base backup with this option.
+          </para>
+         </listitem>
+        </varlistentry>
        </variablelist>
       </para>
 
index 54b5f22d6ec9f1f11f2cb5cdf7113daeaa9b0df1..fda4690eab52c2919ce0a81bdc796ca6f05c0184 100644
@@ -202,6 +202,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgBasebackup       SYSTEM "pg_basebackup.sgml">
 <!ENTITY pgbench            SYSTEM "pgbench.sgml">
 <!ENTITY pgChecksums        SYSTEM "pg_checksums.sgml">
+<!ENTITY pgCombinebackup    SYSTEM "pg_combinebackup.sgml">
 <!ENTITY pgConfig           SYSTEM "pg_config-ref.sgml">
 <!ENTITY pgControldata      SYSTEM "pg_controldata.sgml">
 <!ENTITY pgCtl              SYSTEM "pg_ctl-ref.sgml">
index 0b87fd2d4d6212e8c6b64c38ea7864204cbe36be..7c183a5cfd20ea4cf581a0a9aeec41bbdeb4e8bb 100644
@@ -38,11 +38,25 @@ PostgreSQL documentation
   </para>
 
   <para>
-   <application>pg_basebackup</application> makes an exact copy of the database
-   cluster's files, while making sure the server is put into and
-   out of backup mode automatically. Backups are always taken of the entire
-   database cluster; it is not possible to back up individual databases or
-   database objects. For selective backups, another tool such as
+   <application>pg_basebackup</application> can take a full or incremental
+   base backup of the database. When used to take a full backup, it makes an
+   exact copy of the database cluster's files. When used to take an incremental
+   backup, some files that would have been part of a full backup may be
+   replaced with incremental versions of the same files, containing only those
+   blocks that have been modified since the reference backup. An incremental
+   backup cannot be used directly; instead,
+   <xref linkend="app-pgcombinebackup"/> must first
+   be used to combine it with the previous backups upon which it depends.
+   See <xref linkend="backup-incremental-backup" /> for more information
+   about incremental backups, and <xref linkend="backup-pitr-recovery" />
+   for steps to recover from a backup.
+  </para>
+
+  <para>
+   In any mode, <application>pg_basebackup</application> makes sure the server
+   is put into and out of backup mode automatically. Backups are always taken of
+   the entire database cluster; it is not possible to back up individual
+   databases or database objects. For selective backups, another tool such as
    <xref linkend="app-pgdump"/> must be used.
   </para>
 
@@ -197,6 +211,19 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-i <replaceable class="parameter">old_manifest_file</replaceable></option></term>
+      <term><option>--incremental=<replaceable class="parameter">old_manifest_file</replaceable></option></term>
+      <listitem>
+       <para>
+        Performs an <link linkend="backup-incremental-backup">incremental
+        backup</link>. The backup manifest for the reference
+        backup must be provided, and will be uploaded to the server, which will
+        respond by sending the requested incremental backup.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-R</option></term>
       <term><option>--write-recovery-conf</option></term>
diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml
new file mode 100644
index 0000000..e172967
--- /dev/null
@@ -0,0 +1,240 @@
+<!--
+doc/src/sgml/ref/pg_combinebackup.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgcombinebackup">
+ <indexterm zone="app-pgcombinebackup">
+  <primary>pg_combinebackup</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_combinebackup</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_combinebackup</refname>
+  <refpurpose>reconstruct a full backup from an incremental backup and dependent backups</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_combinebackup</command>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+   <arg rep="repeat"><replaceable>backup_directory</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+  <para>
+   <application>pg_combinebackup</application> is used to reconstruct a
+   synthetic full backup from an
+   <link linkend="backup-incremental-backup">incremental backup</link> and the
+   earlier backups upon which it depends.
+  </para>
+
+  <para>
+   Specify all of the required backups on the command line from oldest to newest.
+   That is, the first backup directory should be the path to the full backup, and
+   the last should be the path to the final incremental backup
+   that you wish to restore. The reconstructed backup will be written to the
+   output directory specified by the <option>-o</option> option.
+  </para>
+
+  <para>
+   Although <application>pg_combinebackup</application> will attempt to verify
+   that the backups you specify form a legal backup chain from which a correct
+   full backup can be reconstructed, it is not designed to help you keep track
+   of which backups depend on which other backups. If you remove one or
+   more of the previous backups upon which your incremental
+   backup relies, you will not be able to restore it.
+  </para>
+
+  <para>
+   Since the output of <application>pg_combinebackup</application> is a
+   synthetic full backup, it can be used as an input to a future invocation of
+   <application>pg_combinebackup</application>. The synthetic full backup would
+   be specified on the command line in lieu of the chain of backups from which
+   it was reconstructed.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--debug</option></term>
+      <listitem>
+       <para>
+        Print lots of debug logging output on <filename>stderr</filename>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n</option></term>
+      <term><option>--dry-run</option></term>
+      <listitem>
+       <para>
+        The <option>-n</option>/<option>--dry-run</option> option instructs
+        <command>pg_combinebackup</command> to figure out what would be done
+        without actually creating the target directory or any output files.
+        It is particularly useful in combination with <option>--debug</option>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N</option></term>
+      <term><option>--no-sync</option></term>
+      <listitem>
+       <para>
+        By default, <command>pg_combinebackup</command> will wait for all files
+        to be written safely to disk.  This option causes
+        <command>pg_combinebackup</command> to return without waiting, which is
+        faster, but means that a subsequent operating system crash can leave
+        the output backup corrupt.  Generally, this option is useful for testing
+        but should not be used when creating a production installation.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-o <replaceable class="parameter">outputdir</replaceable></option></term>
+      <term><option>--output=<replaceable class="parameter">outputdir</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the output directory to which the synthetic full backup
+        should be written. Currently, this argument is required.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
+      <term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
+      <listitem>
+       <para>
+        Relocates the tablespace in directory <replaceable>olddir</replaceable>
+        to <replaceable>newdir</replaceable> during the backup.
+        <replaceable>olddir</replaceable> is the absolute path of the tablespace
+        as it exists in the first backup specified on the command line,
+        and <replaceable>newdir</replaceable> is the absolute path to use for the
+        tablespace in the reconstructed backup.  If either path needs to contain
+        an equal sign (<literal>=</literal>), precede that with a backslash.
+        This option can be specified multiple times for multiple tablespaces.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
+      <listitem>
+       <para>
+        Like <xref linkend="app-pgbasebackup"/>,
+        <application>pg_combinebackup</application> writes a backup manifest
+        in the output directory. This option specifies the checksum algorithm
+        that should be applied to each file included in the backup manifest.
+        Currently, the available algorithms are <literal>NONE</literal>,
+        <literal>CRC32C</literal>, <literal>SHA224</literal>,
+        <literal>SHA256</literal>, <literal>SHA384</literal>,
+        and <literal>SHA512</literal>.  The default is <literal>CRC32C</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--no-manifest</option></term>
+      <listitem>
+       <para>
+        Disables generation of a backup manifest. If this option is not
+        specified, a backup manifest for the reconstructed backup will be
+        written to the output directory.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--sync-method=<replaceable class="parameter">method</replaceable></option></term>
+      <listitem>
+       <para>
+        When set to <literal>fsync</literal>, which is the default,
+        <command>pg_combinebackup</command> will recursively open and synchronize
+        all files in the backup directory.  When the plain format is used, the
+        search for files will follow symbolic links for the WAL directory and
+        each configured tablespace.
+       </para>
+       <para>
+        On Linux, <literal>syncfs</literal> may be used instead to ask the
+        operating system to synchronize the whole file system that contains the
+        backup directory.  When the plain format is used,
+        <command>pg_combinebackup</command> will also synchronize the file systems
+        that contain the WAL files and each tablespace.  See
+        <xref linkend="syncfs"/> for more information about using
+        <function>syncfs()</function>.
+       </para>
+       <para>
+        This option has no effect when <option>--no-sync</option> is used.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+        Prints the <application>pg_combinebackup</application> version and
+        exits.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+        Shows help about <application>pg_combinebackup</application> command
+        line arguments, and exits.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Environment</title>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+  <para>
+   The environment variable <envar>PG_COLOR</envar> specifies whether to use
+   color in diagnostic messages. Possible values are
+   <literal>always</literal>, <literal>auto</literal> and
+   <literal>never</literal>.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="app-pgbasebackup"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
index e11b4b6130753970f978ca45b37b0dd345694d1f..a07d2b5e01e605374c98ff40d6a34c675903db61 100644
    &pgamcheck;
    &pgBasebackup;
    &pgbench;
+   &pgCombinebackup;
    &pgConfig;
    &pgDump;
    &pgDumpall;
index 21d68133ae1065aafd8bfbb2faecdfa765991727..f51d4282bb8dd11a33ca6f98d2baa8fc695f25c9 100644
@@ -77,6 +77,16 @@ build_backup_content(BackupState *state, bool ishistoryfile)
                appendStringInfo(result, "STOP TIMELINE: %u\n", state->stoptli);
        }
 
+       /* either both istartpoint and istarttli should be set, or neither */
+       Assert(XLogRecPtrIsInvalid(state->istartpoint) == (state->istarttli == 0));
+       if (!XLogRecPtrIsInvalid(state->istartpoint))
+       {
+               appendStringInfo(result, "INCREMENTAL FROM LSN: %X/%X\n",
+                                                LSN_FORMAT_ARGS(state->istartpoint));
+               appendStringInfo(result, "INCREMENTAL FROM TLI: %u\n",
+                                                state->istarttli);
+       }
+
        data = result->data;
        pfree(result);
 
index a2c8fa3981ca6e6db1745b499f7d99ef067afbc5..6f4f81f99277f4127c090bd442ed1d9b7f7817ef 100644
@@ -1295,6 +1295,12 @@ read_backup_label(XLogRecPtr *checkPointLoc, TimeLineID *backupLabelTLI,
                                                                 tli_from_file, BACKUP_LABEL_FILE)));
        }
 
+       if (fscanf(lfp, "INCREMENTAL FROM LSN: %X/%X\n", &hi, &lo) > 0)
+               ereport(FATAL,
+                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                errmsg("this is an incremental backup, not a data directory"),
+                                errhint("Use pg_combinebackup to reconstruct a valid data directory.")));
+
        if (ferror(lfp) || FreeFile(lfp))
                ereport(FATAL,
                                (errcode_for_file_access(),
index a67b3c58d4741a4d16feede0ca9ae61ee68bbe74..751e6d3d5e2e4b182b1383c91bfcc194d147a539 100644
@@ -19,6 +19,7 @@ OBJS = \
        basebackup.o \
        basebackup_copy.o \
        basebackup_gzip.o \
+       basebackup_incremental.o \
        basebackup_lz4.o \
        basebackup_zstd.o \
        basebackup_progress.o \
index 35dd79babcb3774ac27b1e287a7891403c4c21f7..5ee9628422e05b35fc24087cf5bb9abb6d460377 100644
 #include "access/xlogbackup.h"
 #include "backup/backup_manifest.h"
 #include "backup/basebackup.h"
+#include "backup/basebackup_incremental.h"
 #include "backup/basebackup_sink.h"
 #include "backup/basebackup_target.h"
+#include "catalog/pg_tablespace_d.h"
 #include "commands/defrem.h"
 #include "common/compression.h"
 #include "common/file_perm.h"
@@ -33,6 +35,7 @@
 #include "pgtar.h"
 #include "port.h"
 #include "postmaster/syslogger.h"
+#include "postmaster/walsummarizer.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
 #include "storage/bufpage.h"
@@ -64,6 +67,7 @@ typedef struct
        bool            fastcheckpoint;
        bool            nowait;
        bool            includewal;
+       bool            incremental;
        uint32          maxrate;
        bool            sendtblspcmapfile;
        bool            send_to_client;
@@ -76,21 +80,28 @@ typedef struct
 } basebackup_options;
 
 static int64 sendTablespace(bbsink *sink, char *path, Oid spcoid, bool sizeonly,
-                                                       struct backup_manifest_info *manifest);
+                                                       struct backup_manifest_info *manifest,
+                                                       IncrementalBackupInfo *ib);
 static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
                                         List *tablespaces, bool sendtblspclinks,
-                                        backup_manifest_info *manifest, Oid spcoid);
+                                        backup_manifest_info *manifest, Oid spcoid,
+                                        IncrementalBackupInfo *ib);
 static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
                                         struct stat *statbuf, bool missing_ok,
                                         Oid dboid, Oid spcoid, RelFileNumber relfilenumber,
                                         unsigned segno,
-                                        backup_manifest_info *manifest);
+                                        backup_manifest_info *manifest,
+                                        unsigned num_incremental_blocks,
+                                        BlockNumber *incremental_blocks,
+                                        unsigned truncation_block_length);
 static off_t read_file_data_into_buffer(bbsink *sink,
                                                                                const char *readfilename, int fd,
                                                                                off_t offset, size_t length,
                                                                                BlockNumber blkno,
                                                                                bool verify_checksum,
                                                                                int *checksum_failures);
+static void push_to_sink(bbsink *sink, pg_checksum_context *checksum_ctx,
+                                                size_t *bytes_done, void *data, size_t length);
 static bool verify_page_checksum(Page page, XLogRecPtr start_lsn,
                                                                 BlockNumber blkno,
                                                                 uint16 *expected_checksum);
@@ -102,7 +113,8 @@ static int64 _tarWriteHeader(bbsink *sink, const char *filename,
                                                         bool sizeonly);
 static void _tarWritePadding(bbsink *sink, int len);
 static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void perform_base_backup(basebackup_options *opt, bbsink *sink);
+static void perform_base_backup(basebackup_options *opt, bbsink *sink,
+                                                               IncrementalBackupInfo *ib);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static int     compareWalFileNames(const ListCell *a, const ListCell *b);
 static int     basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
@@ -220,7 +232,8 @@ static const struct exclude_list_item excludeFiles[] =
  * clobbered by longjmp" from stupider versions of gcc.
  */
 static void
-perform_base_backup(basebackup_options *opt, bbsink *sink)
+perform_base_backup(basebackup_options *opt, bbsink *sink,
+                                       IncrementalBackupInfo *ib)
 {
        bbsink_state state;
        XLogRecPtr      endptr;
@@ -270,6 +283,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
                ListCell   *lc;
                tablespaceinfo *newti;
 
+               /* If this is an incremental backup, execute preparatory steps. */
+               if (ib != NULL)
+                       PrepareForIncrementalBackup(ib, backup_state);
+
                /* Add a node for the base directory at the end */
                newti = palloc0(sizeof(tablespaceinfo));
                newti->size = -1;
@@ -289,10 +306,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 
                                if (tmp->path == NULL)
                                        tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,
-                                                                               true, NULL, InvalidOid);
+                                                                               true, NULL, InvalidOid, NULL);
                                else
                                        tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
-                                                                                          NULL);
+                                                                                          NULL, NULL);
                                state.bytes_total += tmp->size;
                        }
                        state.bytes_total_is_valid = true;
@@ -330,7 +347,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 
                                /* Then the bulk of the files... */
                                sendDir(sink, ".", 1, false, state.tablespaces,
-                                               sendtblspclinks, &manifest, InvalidOid);
+                                               sendtblspclinks, &manifest, InvalidOid, ib);
 
                                /* ... and pg_control after everything else. */
                                if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -340,7 +357,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
                                                                        XLOG_CONTROL_FILE)));
                                sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
                                                 false, InvalidOid, InvalidOid,
-                                                InvalidRelFileNumber, 0, &manifest);
+                                                InvalidRelFileNumber, 0, &manifest, 0, NULL, 0);
                        }
                        else
                        {
@@ -348,7 +365,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 
                                bbsink_begin_archive(sink, archive_name);
 
-                               sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+                               sendTablespace(sink, ti->path, ti->oid, false, &manifest, ib);
                        }
 
                        /*
@@ -610,7 +627,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 
                        sendFile(sink, pathbuf, pathbuf, &statbuf, false,
                                         InvalidOid, InvalidOid, InvalidRelFileNumber, 0,
-                                        &manifest);
+                                        &manifest, 0, NULL, 0);
 
                        /* unconditionally mark file as archived */
                        StatusFilePath(pathbuf, fname, ".done");
@@ -686,6 +703,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
        bool            o_checkpoint = false;
        bool            o_nowait = false;
        bool            o_wal = false;
+       bool            o_incremental = false;
        bool            o_maxrate = false;
        bool            o_tablespace_map = false;
        bool            o_noverify_checksums = false;
@@ -764,6 +782,20 @@ parse_basebackup_options(List *options, basebackup_options *opt)
                        opt->includewal = defGetBoolean(defel);
                        o_wal = true;
                }
+               else if (strcmp(defel->defname, "incremental") == 0)
+               {
+                       if (o_incremental)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("duplicate option \"%s\"", defel->defname)));
+                       opt->incremental = defGetBoolean(defel);
+                       if (opt->incremental && !summarize_wal)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                                errmsg("incremental backups cannot be taken unless WAL summarization is enabled")));
+                       opt->incremental = defGetBoolean(defel);
+                       o_incremental = true;
+               }
                else if (strcmp(defel->defname, "max_rate") == 0)
                {
                        int64           maxrate;
@@ -956,7 +988,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
  * the filesystem, bypassing the buffer cache.
  */
 void
-SendBaseBackup(BaseBackupCmd *cmd)
+SendBaseBackup(BaseBackupCmd *cmd, IncrementalBackupInfo *ib)
 {
        basebackup_options opt;
        bbsink     *sink;
@@ -980,6 +1012,20 @@ SendBaseBackup(BaseBackupCmd *cmd)
                set_ps_display(activitymsg);
        }
 
+       /*
+        * If we're asked to perform an incremental backup and the user has not
+        * supplied a manifest, that's an ERROR.
+        *
+        * If we're asked to perform a full backup and the user did supply a
+        * manifest, just ignore it.
+        */
+       if (!opt.incremental)
+               ib = NULL;
+       else if (ib == NULL)
+               ereport(ERROR,
+                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                errmsg("must UPLOAD_MANIFEST before performing an incremental BASE_BACKUP")));
+
        /*
         * If the target is specifically 'client' then set up to stream the backup
         * to the client; otherwise, it's being sent someplace else and should not
@@ -1011,7 +1057,7 @@ SendBaseBackup(BaseBackupCmd *cmd)
         */
        PG_TRY();
        {
-               perform_base_backup(&opt, sink);
+               perform_base_backup(&opt, sink, ib);
        }
        PG_FINALLY();
        {
@@ -1089,7 +1135,7 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
  */
 static int64
 sendTablespace(bbsink *sink, char *path, Oid spcoid, bool sizeonly,
-                          backup_manifest_info *manifest)
+                          backup_manifest_info *manifest, IncrementalBackupInfo *ib)
 {
        int64           size;
        char            pathbuf[MAXPGPATH];
@@ -1123,7 +1169,7 @@ sendTablespace(bbsink *sink, char *path, Oid spcoid, bool sizeonly,
 
        /* Send all the files in the tablespace version directory */
        size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
-                                       spcoid);
+                                       spcoid, ib);
 
        return size;
 }
@@ -1143,7 +1189,7 @@ sendTablespace(bbsink *sink, char *path, Oid spcoid, bool sizeonly,
 static int64
 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
                List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
-               Oid spcoid)
+               Oid spcoid, IncrementalBackupInfo *ib)
 {
        DIR                *dir;
        struct dirent *de;
@@ -1152,7 +1198,16 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
        int64           size = 0;
        const char *lastDir;            /* Split last dir from parent path. */
        bool            isRelationDir = false;  /* Does directory contain relations? */
+       bool            isGlobalDir = false;
        Oid                     dboid = InvalidOid;
+       BlockNumber *relative_block_numbers = NULL;
+
+       /*
+        * Since this array is relatively large, avoid putting it on the stack.
+        * But we don't need it at all if this is not an incremental backup.
+        */
+       if (ib != NULL)
+               relative_block_numbers = palloc(sizeof(BlockNumber) * RELSEG_SIZE);
 
        /*
         * Determine if the current path is a database directory that can contain
@@ -1185,7 +1240,10 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
                }
        }
        else if (strcmp(path, "./global") == 0)
+       {
                isRelationDir = true;
+               isGlobalDir = true;
+       }
 
        dir = AllocateDir(path);
        while ((de = ReadDir(dir, path)) != NULL)
@@ -1334,11 +1392,13 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
                                                                        &statbuf, sizeonly);
 
                        /*
-                        * Also send archive_status directory (by hackishly reusing
-                        * statbuf from above ...).
+                        * Also send archive_status and summaries directories (by
+                        * hackishly reusing statbuf from above ...).
                         */
                        size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
                                                                        &statbuf, sizeonly);
+                       size += _tarWriteHeader(sink, "./pg_wal/summaries", NULL,
+                                                                       &statbuf, sizeonly);
 
                        continue;                       /* don't recurse into pg_wal */
                }
@@ -1407,16 +1467,64 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
 
                        if (!skip_this_dir)
                                size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
-                                                               sendtblspclinks, manifest, spcoid);
+                                                               sendtblspclinks, manifest, spcoid, ib);
                }
                else if (S_ISREG(statbuf.st_mode))
                {
                        bool            sent = false;
+                       unsigned        num_blocks_required = 0;
+                       unsigned        truncation_block_length = 0;
+                       char            tarfilenamebuf[MAXPGPATH * 2];
+                       char       *tarfilename = pathbuf + basepathlen + 1;
+                       FileBackupMethod method = BACK_UP_FILE_FULLY;
+
+                       if (ib != NULL && isRelationFile)
+                       {
+                               Oid                     relspcoid;
+                               char       *lookup_path;
+
+                               if (OidIsValid(spcoid))
+                               {
+                                       relspcoid = spcoid;
+                                       lookup_path = psprintf("pg_tblspc/%u/%s", spcoid,
+                                                                                  tarfilename);
+                               }
+                               else
+                               {
+                                       if (isGlobalDir)
+                                               relspcoid = GLOBALTABLESPACE_OID;
+                                       else
+                                               relspcoid = DEFAULTTABLESPACE_OID;
+                                       lookup_path = pstrdup(tarfilename);
+                               }
+
+                               method = GetFileBackupMethod(ib, lookup_path, dboid, relspcoid,
+                                                                                        relfilenumber, relForkNum,
+                                                                                        segno, statbuf.st_size,
+                                                                                        &num_blocks_required,
+                                                                                        relative_block_numbers,
+                                                                                        &truncation_block_length);
+                               if (method == BACK_UP_FILE_INCREMENTALLY)
+                               {
+                                       statbuf.st_size =
+                                               GetIncrementalFileSize(num_blocks_required);
+                                       snprintf(tarfilenamebuf, sizeof(tarfilenamebuf),
+                                                        "%s/INCREMENTAL.%s",
+                                                        path + basepathlen + 1,
+                                                        de->d_name);
+                                       tarfilename = tarfilenamebuf;
+                               }
+
+                               pfree(lookup_path);
+                       }
 
                        if (!sizeonly)
-                               sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
+                               sent = sendFile(sink, pathbuf, tarfilename, &statbuf,
                                                                true, dboid, spcoid,
-                                                               relfilenumber, segno, manifest);
+                                                               relfilenumber, segno, manifest,
+                                                               num_blocks_required,
+                                                               method == BACK_UP_FILE_INCREMENTALLY ? relative_block_numbers : NULL,
+                                                               truncation_block_length);
 
                        if (sent || sizeonly)
                        {
@@ -1434,6 +1542,10 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
                        ereport(WARNING,
                                        (errmsg("skipping special file \"%s\"", pathbuf)));
        }
+
+       if (relative_block_numbers != NULL)
+               pfree(relative_block_numbers);
+
        FreeDir(dir);
        return size;
 }
@@ -1446,6 +1558,12 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
  * If dboid is anything other than InvalidOid then any checksum failures
  * detected will get reported to the cumulative stats system.
  *
+ * If the file is to be sent incrementally, then num_incremental_blocks
+ * should be the number of blocks to be sent, and incremental_blocks
+ * an array of block numbers relative to the start of the current segment.
+ * If the whole file is to be sent, then incremental_blocks should be NULL,
+ * and num_incremental_blocks can have any value, as it will be ignored.
+ *
  * Returns true if the file was successfully sent, false if 'missing_ok',
  * and the file did not exist.
  */
@@ -1453,7 +1571,8 @@ static bool
 sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
                 struct stat *statbuf, bool missing_ok, Oid dboid, Oid spcoid,
                 RelFileNumber relfilenumber, unsigned segno,
-                backup_manifest_info *manifest)
+                backup_manifest_info *manifest, unsigned num_incremental_blocks,
+                BlockNumber *incremental_blocks, unsigned truncation_block_length)
 {
        int                     fd;
        BlockNumber blkno = 0;
@@ -1462,6 +1581,7 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
        pgoff_t         bytes_done = 0;
        bool            verify_checksum = false;
        pg_checksum_context checksum_ctx;
+       int                     ibindex = 0;
 
        if (pg_checksum_init(&checksum_ctx, manifest->checksum_type) < 0)
                elog(ERROR, "could not initialize checksum of file \"%s\"",
@@ -1494,22 +1614,111 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
                RelFileNumberIsValid(relfilenumber))
                verify_checksum = true;
 
+       /*
+        * If we're sending an incremental file, write the file header.
+        */
+       if (incremental_blocks != NULL)
+       {
+               unsigned        magic = INCREMENTAL_MAGIC;
+               size_t          header_bytes_done = 0;
+
+               /* Emit header data. */
+               push_to_sink(sink, &checksum_ctx, &header_bytes_done,
+                                        &magic, sizeof(magic));
+               push_to_sink(sink, &checksum_ctx, &header_bytes_done,
+                                        &num_incremental_blocks, sizeof(num_incremental_blocks));
+               push_to_sink(sink, &checksum_ctx, &header_bytes_done,
+                                        &truncation_block_length, sizeof(truncation_block_length));
+               push_to_sink(sink, &checksum_ctx, &header_bytes_done,
+                                        incremental_blocks,
+                                        sizeof(BlockNumber) * num_incremental_blocks);
+
+               /* Flush out any data still in the buffer so it's again empty. */
+               if (header_bytes_done > 0)
+               {
+                       bbsink_archive_contents(sink, header_bytes_done);
+                       if (pg_checksum_update(&checksum_ctx,
+                                                                  (uint8 *) sink->bbs_buffer,
+                                                                  header_bytes_done) < 0)
+                               elog(ERROR, "could not update checksum of base backup");
+               }
+
+               /* Update our notion of file position. */
+               bytes_done += sizeof(magic);
+               bytes_done += sizeof(num_incremental_blocks);
+               bytes_done += sizeof(truncation_block_length);
+               bytes_done += sizeof(BlockNumber) * num_incremental_blocks;
+       }
+
        /*
         * Loop until we read the amount of data the caller told us to expect. The
         * file could be longer, if it was extended while we were sending it, but
         * for a base backup we can ignore such extended data. It will be restored
         * from WAL.
         */
-       while (bytes_done < statbuf->st_size)
+       while (1)
        {
-               size_t          remaining = statbuf->st_size - bytes_done;
+               /*
+                * Determine whether we've read all the data that we need, and if not,
+                * read some more.
+                */
+               if (incremental_blocks == NULL)
+               {
+                       size_t          remaining = statbuf->st_size - bytes_done;
+
+                       /*
+                        * If we've read the required number of bytes, then it's time to
+                        * stop.
+                        */
+                       if (bytes_done >= statbuf->st_size)
+                               break;
+
+                       /*
+                        * Read as many bytes as will fit in the buffer, or however many
+                        * are left to read, whichever is less.
+                        */
+                       cnt = read_file_data_into_buffer(sink, readfilename, fd,
+                                                                                        bytes_done, remaining,
+                                                                                        blkno + segno * RELSEG_SIZE,
+                                                                                        verify_checksum,
+                                                                                        &checksum_failures);
+               }
+               else
+               {
+                       BlockNumber relative_blkno;
 
-               /* Try to read some more data. */
-               cnt = read_file_data_into_buffer(sink, readfilename, fd, bytes_done,
-                                                                                remaining,
-                                                                                blkno + segno * RELSEG_SIZE,
-                                                                                verify_checksum,
-                                                                                &checksum_failures);
+                       /*
+                        * If we've read all the blocks, then it's time to stop.
+                        */
+                       if (ibindex >= num_incremental_blocks)
+                               break;
+
+                       /*
+                        * Read just one block, whichever one is the next that we're
+                        * supposed to include.
+                        */
+                       relative_blkno = incremental_blocks[ibindex++];
+                       cnt = read_file_data_into_buffer(sink, readfilename, fd,
+                                                                                        relative_blkno * BLCKSZ,
+                                                                                        BLCKSZ,
+                                                                                        relative_blkno + segno * RELSEG_SIZE,
+                                                                                        verify_checksum,
+                                                                                        &checksum_failures);
+
+                       /*
+                        * If we get a partial read, that must mean that the relation is
+                        * being truncated. Ultimately, it should be truncated to a
+                        * multiple of BLCKSZ, since this path should only be reached for
+                        * relation files, but we might transiently observe an
+                        * intermediate value.
+                        *
+                        * It should be fine to treat this just as if the entire block had
+                        * been truncated away - i.e. fill this and all later blocks with
+                        * zeroes. WAL replay will fix things up.
+                        */
+                       if (cnt < BLCKSZ)
+                               break;
+               }
 
                /*
                 * If the amount of data we were able to read was not a multiple of
@@ -1692,6 +1901,56 @@ read_file_data_into_buffer(bbsink *sink, const char *readfilename, int fd,
        return cnt;
 }
 
+/*
+ * Push data into a bbsink.
+ *
+ * It's better, when possible, to read data directly into the bbsink's buffer,
+ * rather than using this function to copy it into the buffer; this function is
+ * for cases where that approach is not practical.
+ *
+ * bytes_done should point to a count of the number of bytes that are
+ * currently used in the bbsink's buffer. Upon return, the bytes identified by
+ * data and length will have been copied into the bbsink's buffer, flushing
+ * as required, and *bytes_done will have been updated accordingly. If the
+ * buffer was flushed, the previous contents will also have been fed to
+ * checksum_ctx.
+ *
+ * Note that after one or more calls to this function it is the caller's
+ * responsibility to perform any required final flush.
+ */
+static void
+push_to_sink(bbsink *sink, pg_checksum_context *checksum_ctx,
+                        size_t *bytes_done, void *data, size_t length)
+{
+       while (length > 0)
+       {
+               size_t          bytes_to_copy;
+
+               /*
+                * We use < here rather than <= so that if the data exactly fills the
+                * remaining buffer space, we trigger a flush now.
+                */
+               if (length < sink->bbs_buffer_length - *bytes_done)
+               {
+                       /* Append remaining data to buffer. */
+                       memcpy(sink->bbs_buffer + *bytes_done, data, length);
+                       *bytes_done += length;
+                       return;
+               }
+
+               /* Copy until buffer is full and flush it. */
+               bytes_to_copy = sink->bbs_buffer_length - *bytes_done;
+               memcpy(sink->bbs_buffer + *bytes_done, data, bytes_to_copy);
+               data = ((char *) data) + bytes_to_copy;
+               length -= bytes_to_copy;
+               bbsink_archive_contents(sink, sink->bbs_buffer_length);
+               if (pg_checksum_update(checksum_ctx, (uint8 *) sink->bbs_buffer,
+                                                          sink->bbs_buffer_length) < 0)
+                       elog(ERROR, "could not update checksum");
+               *bytes_done = 0;
+       }
+}
+
 /*
  * Try to verify the checksum for the provided page, if it seems appropriate
  * to do so.
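Reviewer note on the hunk above: the header path in sendFile() and the new push_to_sink() helper together stream a small incremental-file header (magic, block count, truncation length, block-number array) through a fixed-size sink buffer. The sketch below is a standalone illustration of that buffering pattern only, not the backend code. `DemoSink`, `demo_flush`, `demo_push`, `demo_write_header`, the 8-byte buffer, and the `DEMO_MAGIC` value are all hypothetical stand-ins; the real code uses a bbsink, bbsink_archive_contents(), and INCREMENTAL_MAGIC, and also feeds each flushed buffer to the checksum context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values; the real buffer is far larger and the real magic differs. */
#define DEMO_BUFFER_LENGTH 8
#define DEMO_MAGIC 0x12345678u

/* Hypothetical stand-in for a bbsink: a staging buffer plus a record of flushes. */
typedef struct DemoSink
{
    char    buffer[DEMO_BUFFER_LENGTH];
    char    flushed[64];        /* concatenation of everything flushed so far */
    size_t  flushed_len;
    int     flush_count;
} DemoSink;

/* Stand-in for bbsink_archive_contents(): hand off the first nbytes of the buffer. */
static void
demo_flush(DemoSink *sink, size_t nbytes)
{
    assert(sink->flushed_len + nbytes <= sizeof(sink->flushed));
    memcpy(sink->flushed + sink->flushed_len, sink->buffer, nbytes);
    sink->flushed_len += nbytes;
    sink->flush_count++;
}

/*
 * Same control flow as push_to_sink(): append if the data fits with room to
 * spare, otherwise fill the buffer, flush it, and continue with the rest.
 * Using < rather than <= means an exact fit flushes immediately.
 */
static void
demo_push(DemoSink *sink, size_t *bytes_done, const void *data, size_t length)
{
    const char *p = data;

    while (length > 0)
    {
        size_t  space = DEMO_BUFFER_LENGTH - *bytes_done;

        if (length < space)
        {
            memcpy(sink->buffer + *bytes_done, p, length);
            *bytes_done += length;
            return;
        }

        memcpy(sink->buffer + *bytes_done, p, space);
        p += space;
        length -= space;
        demo_flush(sink, DEMO_BUFFER_LENGTH);
        *bytes_done = 0;
    }
}

/* Emit the four header fields in the order sendFile() does, then do the final flush. */
static size_t
demo_write_header(DemoSink *sink, uint32_t num_blocks,
                  uint32_t truncation_block_length, const uint32_t *blocks)
{
    uint32_t magic = DEMO_MAGIC;
    size_t   bytes_done = 0;

    memset(sink, 0, sizeof(*sink));
    demo_push(sink, &bytes_done, &magic, sizeof(magic));
    demo_push(sink, &bytes_done, &num_blocks, sizeof(num_blocks));
    demo_push(sink, &bytes_done, &truncation_block_length,
              sizeof(truncation_block_length));
    demo_push(sink, &bytes_done, blocks, sizeof(uint32_t) * num_blocks);

    /* The caller owns the final flush, as the push_to_sink() comment says. */
    if (bytes_done > 0)
        demo_flush(sink, bytes_done);

    return sink->flushed_len;
}
```

With three block numbers the header is 4 + 4 + 4 + 12 = 24 bytes, which matches the amount sendFile() adds to bytes_done after writing the header. Because each full buffer is flushed as soon as it fills, the final flush in this sketch only fires when the header length is not a multiple of the buffer size.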
diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c
new file mode 100644
index 0000000..1e5a5ac
--- /dev/null
@@ -0,0 +1,1003 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_incremental.c
+ *       code for incremental backup support
+ *
+ * This code isn't actually in charge of taking an incremental backup;
+ * the actual construction of the incremental backup happens in
+ * basebackup.c. Here, we're concerned with providing the necessary
+ * supports for that operation. In particular, we need to parse the
+ * backup manifest supplied by the user taking the incremental backup
+ * and extract the required information from it.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *       src/backend/backup/basebackup_incremental.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/timeline.h"
+#include "access/xlog.h"
+#include "access/xlogrecovery.h"
+#include "backup/basebackup_incremental.h"
+#include "backup/walsummary.h"
+#include "common/blkreftable.h"
+#include "common/parse_manifest.h"
+#include "common/hashfn.h"
+#include "postmaster/walsummarizer.h"
+
+#define        BLOCKS_PER_READ                 512
+
+/*
+ * Details extracted from the WAL ranges present in the supplied backup manifest.
+ */
+typedef struct
+{
+       TimeLineID      tli;
+       XLogRecPtr      start_lsn;
+       XLogRecPtr      end_lsn;
+} backup_wal_range;
+
+/*
+ * Details extracted from the file list present in the supplied backup manifest.
+ */
+typedef struct
+{
+       uint32          status;
+       const char *path;
+       size_t          size;
+} backup_file_entry;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX               backup_file
+#define SH_ELEMENT_TYPE                        backup_file_entry
+#define SH_KEY_TYPE             const char *
+#define SH_KEY                  path
+#define SH_HASH_KEY(tb, key)    hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b)             (strcmp(a, b) == 0)
+#define SH_SCOPE                static inline
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+struct IncrementalBackupInfo
+{
+       /* Memory context for this object and its subsidiary objects. */
+       MemoryContext mcxt;
+
+       /* Temporary buffer for storing the manifest while parsing it. */
+       StringInfoData buf;
+
+       /* WAL ranges extracted from the backup manifest. */
+       List       *manifest_wal_ranges;
+
+       /*
+        * Files extracted from the backup manifest.
+        *
+        * We don't really need this information, because we use WAL summaries to
+        * figure out what's changed. It would be unsafe to just rely on the list of
+        * files that existed before, because it's possible for a file to be
+        * removed and a new one created with the same name and different
+        * contents. In such cases, the whole file must still be sent. We can tell
+        * from the WAL summaries whether that happened, but not from the file
+        * list.
+        *
+        * Nonetheless, this data is useful for sanity checking. If a file that we
+        * think we shouldn't need to send is not present in the manifest for the
+        * prior backup, something has gone terribly wrong. We retain the file
+        * names and sizes, but not the checksums or last modified times, for
+        * which we have no use.
+        *
+        * One significant downside of storing this data is that it consumes
+        * memory. If that turns out to be a problem, we might have to decide not
+        * to retain this information, or to make it optional.
+        */
+       backup_file_hash *manifest_files;
+
+       /*
+        * Block-reference table for the incremental backup.
+        *
+        * It's possible that storing the entire block-reference table in memory
+        * will be a problem for some users. The in-memory format that we're using
+        * here is pretty efficient, converging to little more than 1 bit per
+        * block for relation forks with large numbers of modified blocks. It's
+        * possible, however, that if you try to perform an incremental backup of
+        * a database with a sufficiently large number of relations on a
+        * sufficiently small machine, you could run out of memory here. If that
+        * turns out to be a problem in practice, we'll need to be more clever.
+        */
+       BlockRefTable *brtab;
+};
+
+static void manifest_process_file(JsonManifestParseContext *context,
+                                                                 char *pathname,
+                                                                 size_t size,
+                                                                 pg_checksum_type checksum_type,
+                                                                 int checksum_length,
+                                                                 uint8 *checksum_payload);
+static void manifest_process_wal_range(JsonManifestParseContext *context,
+                                                                          TimeLineID tli,
+                                                                          XLogRecPtr start_lsn,
+                                                                          XLogRecPtr end_lsn);
+static void manifest_report_error(JsonManifestParseContext *ib,
+                                                                 const char *fmt,...)
+                       pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static int     compare_block_numbers(const void *a, const void *b);
+
+/*
+ * Create a new object for storing information extracted from the manifest
+ * supplied when creating an incremental backup.
+ */
+IncrementalBackupInfo *
+CreateIncrementalBackupInfo(MemoryContext mcxt)
+{
+       IncrementalBackupInfo *ib;
+       MemoryContext oldcontext;
+
+       oldcontext = MemoryContextSwitchTo(mcxt);
+
+       ib = palloc0(sizeof(IncrementalBackupInfo));
+       ib->mcxt = mcxt;
+       initStringInfo(&ib->buf);
+
+       /*
+        * It's hard to guess how many files a "typical" installation will have in
+        * the data directory, but a fresh initdb creates almost 1000 files as of
+        * this writing, so it seems to make sense for our estimate to
+        * be substantially higher.
+        */
+       ib->manifest_files = backup_file_create(mcxt, 10000, NULL);
+
+       MemoryContextSwitchTo(oldcontext);
+
+       return ib;
+}
+
+/*
+ * Before taking an incremental backup, the caller must supply the backup
+ * manifest from a prior backup. Each chunk of manifest data recieved
+ * from the client should be passed to this function.
+ */
+void
+AppendIncrementalManifestData(IncrementalBackupInfo *ib, const char *data,
+                                                         int len)
+{
+       MemoryContext oldcontext;
+
+       /* Switch to our memory context. */
+       oldcontext = MemoryContextSwitchTo(ib->mcxt);
+
+       /*
+        * XXX. Our json parser is at present incapable of parsing json blobs
+        * incrementally, so we have to accumulate the entire backup manifest
+        * before we can do anything with it. This should really be fixed, since
+        * some users might have very large numbers of files in the data
+        * directory.
+        */
+       appendBinaryStringInfo(&ib->buf, data, len);
+
+       /* Switch back to previous memory context. */
+       MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Finalize an IncrementalBackupInfo object after all manifest data has
+ * been supplied via calls to AppendIncrementalManifestData.
+ */
+void
+FinalizeIncrementalManifest(IncrementalBackupInfo *ib)
+{
+       JsonManifestParseContext context;
+       MemoryContext oldcontext;
+
+       /* Switch to our memory context. */
+       oldcontext = MemoryContextSwitchTo(ib->mcxt);
+
+       /* Parse the manifest. */
+       context.private_data = ib;
+       context.per_file_cb = manifest_process_file;
+       context.per_wal_range_cb = manifest_process_wal_range;
+       context.error_cb = manifest_report_error;
+       json_parse_manifest(&context, ib->buf.data, ib->buf.len);
+
+       /* Done with the buffer, so release memory. */
+       pfree(ib->buf.data);
+       ib->buf.data = NULL;
+
+       /* Switch back to previous memory context. */
+       MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Prepare to take an incremental backup.
+ *
+ * Before this function is called, AppendIncrementalManifestData and
+ * FinalizeIncrementalManifest should have already been called to pass all
+ * the manifest data to this object.
+ *
+ * This function performs sanity checks on the data extracted from the
+ * manifest and figures out for which WAL ranges we need summaries, and
+ * whether those summaries are available. Then, it reads and combines the
+ * data from those summary files. It also updates the backup_state with the
+ * reference TLI and LSN for the prior backup.
+ */
+void
+PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
+                                                       BackupState *backup_state)
+{
+       MemoryContext oldcontext;
+       List       *expectedTLEs;
+       List       *all_wslist,
+                          *required_wslist = NIL;
+       ListCell   *lc;
+       TimeLineHistoryEntry **tlep;
+       int                     num_wal_ranges;
+       int                     i;
+       bool            found_backup_start_tli = false;
+       TimeLineID      earliest_wal_range_tli = 0;
+       XLogRecPtr      earliest_wal_range_start_lsn = InvalidXLogRecPtr;
+       TimeLineID      latest_wal_range_tli = 0;
+       XLogRecPtr      summarized_lsn;
+       XLogRecPtr      pending_lsn;
+       XLogRecPtr      prior_pending_lsn = InvalidXLogRecPtr;
+       int                     deadcycles = 0;
+       TimestampTz initial_time,
+                               current_time;
+
+       Assert(ib->buf.data == NULL);
+
+       /* Switch to our memory context. */
+       oldcontext = MemoryContextSwitchTo(ib->mcxt);
+
+       /*
+        * A valid backup manifest must always contain at least one WAL range
+        * (usually exactly one, unless the backup spanned a timeline switch).
+        */
+       num_wal_ranges = list_length(ib->manifest_wal_ranges);
+       if (num_wal_ranges == 0)
+               ereport(ERROR,
+                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                errmsg("manifest contains no required WAL ranges")));
+
+       /*
+        * Match up the TLIs that appear in the WAL ranges of the backup manifest
+        * with those that appear in this server's timeline history. We expect
+        * every backup_wal_range to match to a TimeLineHistoryEntry; if it does
+        * not, that's an error.
+        *
+        * This loop also decides which of the WAL ranges in the manifest is most
+        * ancient and which one is the newest, according to the timeline history
+        * of this server, and stores TLIs of those WAL ranges into
+        * earliest_wal_range_tli and latest_wal_range_tli. It also updates
+        * earliest_wal_range_start_lsn to the start LSN of the WAL range for
+        * earliest_wal_range_tli.
+        *
+        * Note that the return value of readTimeLineHistory puts the latest
+        * timeline at the beginning of the list, not the end. Hence, the earliest
+        * TLI is the one that occurs nearest the end of the list returned by
+        * readTimeLineHistory, and the latest TLI is the one that occurs closest
+        * to the beginning.
+        */
+       expectedTLEs = readTimeLineHistory(backup_state->starttli);
+       tlep = palloc0(num_wal_ranges * sizeof(TimeLineHistoryEntry *));
+       for (i = 0; i < num_wal_ranges; ++i)
+       {
+               backup_wal_range *range = list_nth(ib->manifest_wal_ranges, i);
+               bool            saw_earliest_wal_range_tli = false;
+               bool            saw_latest_wal_range_tli = false;
+
+               /* Search this server's history for this WAL range's TLI. */
+               foreach(lc, expectedTLEs)
+               {
+                       TimeLineHistoryEntry *tle = lfirst(lc);
+
+                       if (tle->tli == range->tli)
+                       {
+                               tlep[i] = tle;
+                               break;
+                       }
+
+                       if (tle->tli == earliest_wal_range_tli)
+                               saw_earliest_wal_range_tli = true;
+                       if (tle->tli == latest_wal_range_tli)
+                               saw_latest_wal_range_tli = true;
+               }
+
+               /*
+                * An incremental backup can only be taken relative to a backup that
+                * represents a previous state of this server. If the backup requires
+                * WAL from a timeline that's not in our history, that definitely
+                * isn't the case.
+                */
+               if (tlep[i] == NULL)
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                        errmsg("timeline %u found in manifest, but not in this server's history",
+                                                       range->tli)));
+
+               /*
+                * If we found this TLI in the server's history before encountering
+                * the latest TLI seen so far in the server's history, then this TLI
+                * is the latest one seen so far.
+                *
+                * If on the other hand we saw the earliest TLI seen so far before
+                * finding this TLI, this TLI is earlier than the earliest one seen so
+                * far. And if this is the first TLI for which we've searched, it's
+                * also the earliest one seen so far.
+                *
+                * On the first loop iteration, both things should necessarily be
+                * true.
+                */
+               if (!saw_latest_wal_range_tli)
+                       latest_wal_range_tli = range->tli;
+               if (earliest_wal_range_tli == 0 || saw_earliest_wal_range_tli)
+               {
+                       earliest_wal_range_tli = range->tli;
+                       earliest_wal_range_start_lsn = range->start_lsn;
+               }
+       }
+
+       /*
+        * Propagate information about the prior backup into the backup_label that
+        * will be generated for this backup.
+        */
+       backup_state->istartpoint = earliest_wal_range_start_lsn;
+       backup_state->istarttli = earliest_wal_range_tli;
+
+       /*
+        * Sanity check start and end LSNs for the WAL ranges in the manifest.
+        *
+        * Commonly, there won't be any timeline switches during the prior backup
+        * at all, but if there are, they should happen at the same LSNs that this
+        * server switched timelines.
+        *
+        * Whether there are any timeline switches during the prior backup or not,
+        * the prior backup shouldn't require any WAL from a timeline prior to the
+        * start of that timeline. It also shouldn't require any WAL from later
+        * than the start of this backup.
+        *
+        * If any of these sanity checks fail, one possible explanation is that
+        * the user has generated WAL on the same timeline with the same LSNs more
+        * than once. For instance, if two standbys running on timeline 1 were
+        * both promoted and (due to a broken archiving setup) both selected new
+        * timeline ID 2, then it's possible that one of these checks might trip.
+        *
+        * Note that there are lots of ways for the user to do something very bad
+        * without tripping any of these checks, and they are not intended to be
+        * comprehensive. It's pretty hard to see how we could be certain of
+        * anything here. However, if there's a problem staring us right in the
+        * face, it's best to report it, so we do.
+        */
+       for (i = 0; i < num_wal_ranges; ++i)
+       {
+               backup_wal_range *range = list_nth(ib->manifest_wal_ranges, i);
+
+               if (range->tli == earliest_wal_range_tli)
+               {
+                       if (range->start_lsn < tlep[i]->begin)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                                errmsg("manifest requires WAL from initial timeline %u starting at %X/%X, but that timeline begins at %X/%X",
+                                                               range->tli,
+                                                               LSN_FORMAT_ARGS(range->start_lsn),
+                                                               LSN_FORMAT_ARGS(tlep[i]->begin))));
+               }
+               else
+               {
+                       if (range->start_lsn != tlep[i]->begin)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                                errmsg("manifest requires WAL from continuation timeline %u starting at %X/%X, but that timeline begins at %X/%X",
+                                                               range->tli,
+                                                               LSN_FORMAT_ARGS(range->start_lsn),
+                                                               LSN_FORMAT_ARGS(tlep[i]->begin))));
+               }
+
+               if (range->tli == latest_wal_range_tli)
+               {
+                       if (range->end_lsn > backup_state->startpoint)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                                errmsg("manifest requires WAL from final timeline %u ending at %X/%X, but this backup starts at %X/%X",
+                                                               range->tli,
+                                                               LSN_FORMAT_ARGS(range->end_lsn),
+                                                               LSN_FORMAT_ARGS(backup_state->startpoint))));
+               }
+               else
+               {
+                       if (range->end_lsn != tlep[i]->end)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                                errmsg("manifest requires WAL from non-final timeline %u ending at %X/%X, but this server switched timelines at %X/%X",
+                                                               range->tli,
+                                                               LSN_FORMAT_ARGS(range->end_lsn),
+                                                               LSN_FORMAT_ARGS(tlep[i]->end))));
+               }
+
+       }
+
+       /*
+        * Wait for WAL summarization to catch up to the backup start LSN (but
+        * time out if it doesn't do so quickly enough).
+        */
+       initial_time = current_time = GetCurrentTimestamp();
+       while (1)
+       {
+               long            timeout_in_ms = 10000;
+               unsigned        elapsed_seconds;
+
+               /*
+                * Align the wait time to prevent drift. This doesn't really matter,
+                * but we'd like the warnings about how long we've been waiting to say
+                * 10 seconds, 20 seconds, 30 seconds, 40 seconds ... without ever
+                * drifting to something that is not a multiple of ten.
+                */
+               timeout_in_ms -=
+                       TimestampDifferenceMilliseconds(initial_time, current_time) %
+                       timeout_in_ms;
+
+               /* Wait for up to 10 seconds. */
+               summarized_lsn = WaitForWalSummarization(backup_state->startpoint,
+                                                                                                timeout_in_ms, &pending_lsn);
+
+               /* If WAL summarization has progressed sufficiently, stop waiting. */
+               if (summarized_lsn >= backup_state->startpoint)
+                       break;
+
+               /*
+                * Keep track of the number of cycles during which there has been no
+                * progression of pending_lsn. If pending_lsn is not advancing, that
+                * means that not only are no new files appearing on disk, but we're
+                * not even incorporating new records into the in-memory state.
+                */
+               if (pending_lsn > prior_pending_lsn)
+               {
+                       prior_pending_lsn = pending_lsn;
+                       deadcycles = 0;
+               }
+               else
+                       ++deadcycles;
+
+               /*
+                * If we've managed to wait for an entire minute without the WAL
+                * summarizer absorbing a single WAL record, error out; probably
+                * something is wrong.
+                *
+                * We could consider also erroring out if the summarizer is taking too
+                * long to catch up, but it's not clear what rate of progress would be
+                * acceptable and what would be too slow. So instead, we just try to
+                * error out in the case where there's no progress at all. That seems
+                * likely to catch a reasonable number of the things that can go wrong
+                * in practice (e.g. the summarizer process is completely hung, say
+                * because somebody hooked up a debugger to it or something) without
+                * giving up too quickly when the system is just slow.
+                */
+               if (deadcycles >= 6)
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                        errmsg("WAL summarization is not progressing"),
+                                        errdetail("Summarization is needed through %X/%X, but is stuck at %X/%X on disk and %X/%X in memory.",
+                                                          LSN_FORMAT_ARGS(backup_state->startpoint),
+                                                          LSN_FORMAT_ARGS(summarized_lsn),
+                                                          LSN_FORMAT_ARGS(pending_lsn))));
+
+               /*
+                * Otherwise, just let the user know what's happening.
+                */
+               current_time = GetCurrentTimestamp();
+               elapsed_seconds =
+                       TimestampDifferenceMilliseconds(initial_time, current_time) / 1000;
+               ereport(WARNING,
+                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                errmsg("still waiting for WAL summarization through %X/%X after %u seconds",
+                                               LSN_FORMAT_ARGS(backup_state->startpoint),
+                                               elapsed_seconds),
+                                errdetail("Summarization has reached %X/%X on disk and %X/%X in memory.",
+                                                  LSN_FORMAT_ARGS(summarized_lsn),
+                                                  LSN_FORMAT_ARGS(pending_lsn))));
+       }
+
+       /*
+        * Retrieve a list of all WAL summaries on any timeline that overlap with
+        * the LSN range of interest. We could instead call GetWalSummaries() once
+        * per timeline in the loop that follows, but that would involve reading
+        * the directory multiple times. It should be mildly faster - and perhaps
+        * a bit safer - to do it just once.
+        */
+       all_wslist = GetWalSummaries(0, earliest_wal_range_start_lsn,
+                                                                backup_state->startpoint);
+
+       /*
+        * We need WAL summaries for everything that happened during the prior
+        * backup and everything that happened afterward up until the point where
+        * the current backup started.
+        */
+       foreach(lc, expectedTLEs)
+       {
+               TimeLineHistoryEntry *tle = lfirst(lc);
+               XLogRecPtr      tli_start_lsn = tle->begin;
+               XLogRecPtr      tli_end_lsn = tle->end;
+               XLogRecPtr      tli_missing_lsn = InvalidXLogRecPtr;
+               List       *tli_wslist;
+
+               /*
+                * Working through the history of this server from the current
+                * timeline backwards, we skip everything until we find the timeline
+                * where this backup started. Most of the time, this means we won't
+                * skip anything at all, as it's unlikely that the timeline has
+                * changed since the beginning of the backup moments ago.
+                */
+               if (tle->tli == backup_state->starttli)
+               {
+                       found_backup_start_tli = true;
+                       tli_end_lsn = backup_state->startpoint;
+               }
+               else if (!found_backup_start_tli)
+                       continue;
+
+               /*
+                * Find the summaries that overlap the LSN range of interest for this
+                * timeline. If this is the earliest timeline involved, the range of
+                * interest begins with the start LSN of the prior backup; otherwise,
+                * it begins at the LSN at which this timeline came into existence. If
+                * this is the latest TLI involved, the range of interest ends at the
+                * start LSN of the current backup; otherwise, it ends at the point
+                * where we switched from this timeline to the next one.
+                */
+               if (tle->tli == earliest_wal_range_tli)
+                       tli_start_lsn = earliest_wal_range_start_lsn;
+               tli_wslist = FilterWalSummaries(all_wslist, tle->tli,
+                                                                               tli_start_lsn, tli_end_lsn);
+
+               /*
+                * There is no guarantee that the WAL summaries we found cover the
+                * entire range of LSNs for which summaries are required, or indeed
+                * that we found any WAL summaries at all. Check whether we have a
+                * problem of that sort.
+                */
+               if (!WalSummariesAreComplete(tli_wslist, tli_start_lsn, tli_end_lsn,
+                                                                        &tli_missing_lsn))
+               {
+                       if (XLogRecPtrIsInvalid(tli_missing_lsn))
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                                errmsg("WAL summaries are required on timeline %u from %X/%X to %X/%X, but no summaries for that timeline and LSN range exist",
+                                                               tle->tli,
+                                                               LSN_FORMAT_ARGS(tli_start_lsn),
+                                                               LSN_FORMAT_ARGS(tli_end_lsn))));
+                       else
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                                errmsg("WAL summaries are required on timeline %u from %X/%X to %X/%X, but the summaries for that timeline and LSN range are incomplete",
+                                                               tle->tli,
+                                                               LSN_FORMAT_ARGS(tli_start_lsn),
+                                                               LSN_FORMAT_ARGS(tli_end_lsn)),
+                                                errdetail("The first unsummarized LSN in this range is %X/%X.",
+                                                                  LSN_FORMAT_ARGS(tli_missing_lsn))));
+               }
+
+               /*
+                * Remember that we need to read these summaries.
+                *
+                * Technically, it's possible that this could read more files than
+                * required, since tli_wslist in theory could contain redundant
+                * summaries. For instance, if we have a summary from 0/10000000 to
+                * 0/20000000 and also one from 0/00000000 to 0/30000000, then the
+                * latter subsumes the former and the former could be ignored.
+                *
+                * We ignore this possibility because the WAL summarizer only tries to
+                * generate summaries that do not overlap. If somehow they exist,
+                * we'll do a bit of extra work but the results should still be
+                * correct.
+                */
+               required_wslist = list_concat(required_wslist, tli_wslist);
+
+               /*
+                * Timelines earlier than the one in which the prior backup began are
+                * not relevant.
+                */
+               if (tle->tli == earliest_wal_range_tli)
+                       break;
+       }
+
+       /*
+        * Read all of the required block reference table files and merge all of
+        * the data into a single in-memory block reference table.
+        *
+        * See the comments for struct IncrementalBackupInfo for some thoughts on
+        * memory usage.
+        */
+       ib->brtab = CreateEmptyBlockRefTable();
+       foreach(lc, required_wslist)
+       {
+               WalSummaryFile *ws = lfirst(lc);
+               WalSummaryIO wsio;
+               BlockRefTableReader *reader;
+               RelFileLocator rlocator;
+               ForkNumber      forknum;
+               BlockNumber limit_block;
+               BlockNumber blocks[BLOCKS_PER_READ];
+
+               wsio.file = OpenWalSummaryFile(ws, false);
+               wsio.filepos = 0;
+               ereport(DEBUG1,
+                               (errmsg_internal("reading WAL summary file \"%s\"",
+                                                                FilePathName(wsio.file))));
+               reader = CreateBlockRefTableReader(ReadWalSummary, &wsio,
+                                                                                  FilePathName(wsio.file),
+                                                                                  ReportWalSummaryError, NULL);
+               while (BlockRefTableReaderNextRelation(reader, &rlocator, &forknum,
+                                                                                          &limit_block))
+               {
+                       BlockRefTableSetLimitBlock(ib->brtab, &rlocator,
+                                                                          forknum, limit_block);
+
+                       while (1)
+                       {
+                               unsigned        nblocks;
+                               unsigned        i;
+
+                               nblocks = BlockRefTableReaderGetBlocks(reader, blocks,
+                                                                                                          BLOCKS_PER_READ);
+                               if (nblocks == 0)
+                                       break;
+
+                               for (i = 0; i < nblocks; ++i)
+                                       BlockRefTableMarkBlockModified(ib->brtab, &rlocator,
+                                                                                                  forknum, blocks[i]);
+                       }
+               }
+               DestroyBlockRefTableReader(reader);
+               FileClose(wsio.file);
+       }
+
+       /* Switch back to previous memory context. */
+       MemoryContextSwitchTo(oldcontext);
+}
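
Note on the timeline matching above: the first loop in PrepareForIncrementalBackup finds the earliest and latest manifest TLIs by walking a history list that is ordered newest first. The sketch below restates that idea with plain arrays and history indexes. It is a simplified illustration, not the patch's code: classify_range_tlis and its parameters are hypothetical names, and it compares list positions directly instead of using the saw_earliest/saw_latest flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * history[] lists timeline IDs newest first, as the comments above say
 * readTimeLineHistory() does. On success, *earliest and *latest receive
 * the oldest and newest of the range TLIs according to that ordering.
 * Returns false if there are no ranges or if a range TLI is not present
 * in history[] (the case the patch reports as an error).
 */
static bool
classify_range_tlis(const unsigned *history, size_t nhistory,
					const unsigned *range_tlis, size_t nranges,
					unsigned *earliest, unsigned *latest)
{
	size_t		latest_pos = 0;		/* smallest history index seen so far */
	size_t		earliest_pos = 0;	/* largest history index seen so far */
	bool		any = false;

	assert(history != NULL && range_tlis != NULL);

	for (size_t i = 0; i < nranges; i++)
	{
		size_t		pos;

		/* Search the history for this range's TLI. */
		for (pos = 0; pos < nhistory; pos++)
			if (history[pos] == range_tlis[i])
				break;
		if (pos == nhistory)
			return false;		/* TLI not in this server's history */

		/* Closer to the front of the list means newer. */
		if (!any || pos < latest_pos)
		{
			latest_pos = pos;
			*latest = range_tlis[i];
		}
		/* Closer to the end of the list means older. */
		if (!any || pos > earliest_pos)
		{
			earliest_pos = pos;
			*earliest = range_tlis[i];
		}
		any = true;
	}
	return any;
}
```

With a history of {3, 2, 1} and manifest ranges on timelines 2 and 1, this reports timeline 1 as earliest and timeline 2 as latest, regardless of the order in which the ranges are listed.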
+
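
Note on the wait loop in PrepareForIncrementalBackup: its comments describe two separate ideas, namely aligning each wait so that warnings land on multiples of ten seconds, and giving up after six consecutive waits with no in-memory progress. The standalone sketch below shows both in isolation. The names aligned_timeout_ms, StallTracker and stall_tracker_update are hypothetical and do not appear in the patch; LSNs are modeled as plain 64-bit integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper state; not part of the patch. */
typedef struct StallTracker
{
	uint64_t	prior_pending_lsn;	/* furthest in-memory position seen */
	int			deadcycles;			/* consecutive waits without progress */
} StallTracker;

/*
 * Time left until the next multiple of period_ms, given the time that
 * has elapsed since waiting began. Waiting this long each cycle keeps
 * wakeups on period boundaries even if a cycle ends early.
 */
static long
aligned_timeout_ms(long elapsed_ms, long period_ms)
{
	assert(elapsed_ms >= 0 && period_ms > 0);
	return period_ms - (elapsed_ms % period_ms);
}

/*
 * Record the outcome of one wait cycle. Returns true once
 * max_deadcycles consecutive cycles have passed without pending_lsn
 * advancing, which is the condition the patch treats as an error.
 */
static bool
stall_tracker_update(StallTracker *st, uint64_t pending_lsn,
					 int max_deadcycles)
{
	if (pending_lsn > st->prior_pending_lsn)
	{
		st->prior_pending_lsn = pending_lsn;
		st->deadcycles = 0;
	}
	else
		st->deadcycles++;

	return st->deadcycles >= max_deadcycles;
}
```

For example, 2.5 seconds into a 10-second period the aligned timeout is 7.5 seconds, and the same holds 12.5 seconds in. Any advance of the pending LSN resets the stall counter to zero.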
+/*
+ * Get the pathname that should be used when a file is sent incrementally.
+ *
+ * The result is a palloc'd string.
+ */
+char *
+GetIncrementalFilePath(Oid dboid, Oid spcoid, RelFileNumber relfilenumber,
+                                          ForkNumber forknum, unsigned segno)
+{
+       char       *path;
+       char       *lastslash;
+       char       *ipath;
+
+       path = GetRelationPath(dboid, spcoid, relfilenumber, InvalidBackendId,
+                                                  forknum);
+
+       lastslash = strrchr(path, '/');
+       Assert(lastslash != NULL);
+       *lastslash = '\0';
+
+       if (segno > 0)
+               ipath = psprintf("%s/INCREMENTAL.%s.%u", path, lastslash + 1, segno);
+       else
+               ipath = psprintf("%s/INCREMENTAL.%s", path, lastslash + 1);
+
+       pfree(path);
+
+       return ipath;
+}
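
Note on the naming scheme used by GetIncrementalFilePath: the file name component gains an "INCREMENTAL." prefix, and a nonzero segment number is appended as a suffix. The sketch below reproduces that transformation with the standard C library only. The function incremental_path is a hypothetical stand-in, and it assumes the input is a relation path without a segment suffix, such as "base/5/16384".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the incremental-file path for relpath into out. relpath must
 * contain at least one '/' and must not carry a segment suffix.
 * Returns 0 on success, or -1 if relpath has no directory part or the
 * result does not fit in outlen bytes.
 */
static int
incremental_path(const char *relpath, unsigned segno,
				 char *out, size_t outlen)
{
	const char *lastslash;
	int			dirlen;
	int			n;

	assert(relpath != NULL && out != NULL);

	lastslash = strrchr(relpath, '/');
	if (lastslash == NULL)
		return -1;
	dirlen = (int) (lastslash - relpath);

	/* Segment 0 has no numeric suffix, matching ordinary relation files. */
	if (segno > 0)
		n = snprintf(out, outlen, "%.*s/INCREMENTAL.%s.%u",
					 dirlen, relpath, lastslash + 1, segno);
	else
		n = snprintf(out, outlen, "%.*s/INCREMENTAL.%s",
					 dirlen, relpath, lastslash + 1);

	return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

Under these assumptions, "base/5/16384" with segment 0 becomes "base/5/INCREMENTAL.16384", and "base/5/16384_vm" with segment 2 becomes "base/5/INCREMENTAL.16384_vm.2".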
+
+/*
+ * How should we back up a particular file as part of an incremental backup?
+ *
+ * If the return value is BACK_UP_FILE_FULLY, caller should back up the whole
+ * file just as if this were not an incremental backup.
+ *
+ * If the return value is BACK_UP_FILE_INCREMENTALLY, caller should include
+ * an incremental file in the backup instead of the entire file. On return,
+ * *num_blocks_required will be set to the number of blocks that need to be
+ * sent, and the actual block numbers will have been stored in
+ * relative_block_numbers, which should be an array of at least RELSEG_SIZE.
+ * In addition, *truncation_block_length will be set to the value that should
+ * be included in the incremental file.
+ */
+FileBackupMethod
+GetFileBackupMethod(IncrementalBackupInfo *ib, const char *path,
+                                       Oid dboid, Oid spcoid,
+                                       RelFileNumber relfilenumber, ForkNumber forknum,
+                                       unsigned segno, size_t size,
+                                       unsigned *num_blocks_required,
+                                       BlockNumber *relative_block_numbers,
+                                       unsigned *truncation_block_length)
+{
+       BlockNumber absolute_block_numbers[RELSEG_SIZE];
+       BlockNumber limit_block;
+       BlockNumber start_blkno;
+       BlockNumber stop_blkno;
+       RelFileLocator rlocator;
+       BlockRefTableEntry *brtentry;
+       unsigned        i;
+       unsigned        nblocks;
+
+       /* Should only be called after PrepareForIncrementalBackup. */
+       Assert(ib->buf.data == NULL);
+
+       /*
+        * dboid could be InvalidOid if shared rel, but spcoid and relfilenumber
+        * should have legal values.
+        */
+       Assert(OidIsValid(spcoid));
+       Assert(RelFileNumberIsValid(relfilenumber));
+
+       /*
+        * If the file size is too large or not a multiple of BLCKSZ, then
+        * something weird is happening, so give up and send the whole file.
+        */
+       if ((size % BLCKSZ) != 0 || size / BLCKSZ > RELSEG_SIZE)
+               return BACK_UP_FILE_FULLY;
+
+       /*
+        * The free-space map fork is not properly WAL-logged, so we need to
+        * back up the entire file every time.
+        */
+       if (forknum == FSM_FORKNUM)
+               return BACK_UP_FILE_FULLY;
+
+       /*
+        * If this file was not part of the prior backup, back it up fully.
+        *
+        * If this file was created after the prior backup and before the start of
+        * the current backup, then the WAL summary information will tell us to
+        * back up the whole file. However, if this file was created after the
+        * start of the current backup, then the WAL summary won't know anything
+        * about it. Without this logic, we would erroneously conclude that it was
+        * OK to send it incrementally.
+        *
+        * Note that the file could have existed at the time of the prior backup,
+        * gotten deleted, and then a new file with the same name could have been
+        * created.  In that case, this logic won't prevent the file from being
+        * backed up incrementally. But, if the deletion happened before the start
+        * of the current backup, the limit block will be 0, inducing a full
+        * backup. If the deletion happened after the start of the current backup,
+        * reconstruction will erroneously combine blocks from the current
+        * lifespan of the file with blocks from the previous lifespan -- but in
+        * this type of case, WAL replay to reach backup consistency should remove
+        * and recreate the file anyway, so the initial bogus contents should not
+        * matter.
+        */
+       if (backup_file_lookup(ib->manifest_files, path) == NULL)
+       {
+               char       *ipath;
+
+               ipath = GetIncrementalFilePath(dboid, spcoid, relfilenumber,
+                                                                          forknum, segno);
+               if (backup_file_lookup(ib->manifest_files, ipath) == NULL)
+                       return BACK_UP_FILE_FULLY;
+       }
+
+       /* Look up the block reference table entry. */
+       rlocator.spcOid = spcoid;
+       rlocator.dbOid = dboid;
+       rlocator.relNumber = relfilenumber;
+       brtentry = BlockRefTableGetEntry(ib->brtab, &rlocator, forknum,
+                                                                        &limit_block);
+
+       /*
+        * If there is no entry, then there have been no WAL-logged changes to the
+        * relation since the predecessor backup was taken, so we can back it up
+        * incrementally and need not include any modified blocks.
+        *
+        * However, if the file is zero-length, we should do a full backup,
+        * because an incremental file is always more than zero length, and it's
+        * silly to take an incremental backup when a full backup would be
+        * smaller.
+        */
+       if (brtentry == NULL)
+       {
+               if (size == 0)
+                       return BACK_UP_FILE_FULLY;
+               *num_blocks_required = 0;
+               *truncation_block_length = size / BLCKSZ;
+               return BACK_UP_FILE_INCREMENTALLY;
+       }
+
+       /*
+        * If the limit_block is less than or equal to the point where this
+        * segment starts, send the whole file.
+        */
+       if (limit_block <= segno * RELSEG_SIZE)
+               return BACK_UP_FILE_FULLY;
+
+       /*
+        * Get relevant entries from the block reference table entry.
+        *
+        * We shouldn't overflow computing the start or stop block numbers, but if
+        * it manages to happen somehow, detect it and throw an error.
+        */
+       start_blkno = segno * RELSEG_SIZE;
+       stop_blkno = start_blkno + (size / BLCKSZ);
+       if (start_blkno / RELSEG_SIZE != segno || stop_blkno < start_blkno)
+               ereport(ERROR,
+                               errcode(ERRCODE_INTERNAL_ERROR),
+                               errmsg_internal("overflow computing block number bounds for segment %u with size %zu",
+                                                               segno, size));
+       nblocks = BlockRefTableEntryGetBlocks(brtentry, start_blkno, stop_blkno,
+                                                                                 absolute_block_numbers, RELSEG_SIZE);
+       Assert(nblocks <= RELSEG_SIZE);
+
+       /*
+        * If we're going to have to send nearly all of the blocks, then just send
+        * the whole file, because that won't require much extra storage or
+        * transfer and will speed up and simplify backup restoration. It's not
+        * clear what threshold is most appropriate here and perhaps it ought to
+        * be configurable, but for now we're just going to say that if we'd need
+        * to send 90% of the blocks anyway, give up and send the whole file.
+        *
+        * NB: If you change the threshold here, at least make sure to back up the
+        * file fully when every single block must be sent, because there's
+        * nothing good about sending an incremental file in that case.
+        */
+       if (nblocks * BLCKSZ > size * 0.9)
+               return BACK_UP_FILE_FULLY;
+
+       /*
+        * Looks like we can send an incremental file, so sort the absolute
+        * block numbers and then transpose absolute block numbers to relative
+        * block numbers.
+        *
+        * NB: If the block reference table was using the bitmap representation
+        * for a given chunk, the block numbers in that chunk will already be
+        * sorted, but when the array-of-offsets representation is used, we can
+        * receive block numbers here out of order.
+        */
+       qsort(absolute_block_numbers, nblocks, sizeof(BlockNumber),
+                 compare_block_numbers);
+       for (i = 0; i < nblocks; ++i)
+               relative_block_numbers[i] = absolute_block_numbers[i] - start_blkno;
+       *num_blocks_required = nblocks;
+
+       /*
+        * The truncation block length is the minimum length of the reconstructed
+        * file. Any block numbers below this threshold that are not present in
+        * the backup need to be fetched from the prior backup. At or above this
+        * threshold, blocks should only be included in the result if they are
+        * present in the backup. (This may require inserting zero blocks if the
+        * blocks included in the backup are non-consecutive.)
+        */
+       *truncation_block_length = size / BLCKSZ;
+       if (BlockNumberIsValid(limit_block))
+       {
+               unsigned        relative_limit = limit_block - segno * RELSEG_SIZE;
+
+               if (*truncation_block_length < relative_limit)
+                       *truncation_block_length = relative_limit;
+       }
+
+       /* Send it incrementally. */
+       return BACK_UP_FILE_INCREMENTALLY;
+}
+
+/*
+ * Compute the size for an incremental file containing a given number of blocks.
+ */
+extern size_t
+GetIncrementalFileSize(unsigned num_blocks_required)
+{
+       size_t          result;
+
+       /* Make sure we're not going to overflow. */
+       Assert(num_blocks_required <= RELSEG_SIZE);
+
+       /*
+        * Three four-byte quantities (magic number, truncation block length,
+        * block count) followed by block numbers followed by block contents.
+        */
+       result = 3 * sizeof(uint32);
+       result += (BLCKSZ + sizeof(BlockNumber)) * num_blocks_required;
+
+       return result;
+}
+
+/*
+ * Helper function for filemap hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+       unsigned char *ss = (unsigned char *) s;
+
+       return hash_bytes(ss, strlen(s));
+}
+
+/*
+ * This callback is invoked for each file mentioned in the backup manifest.
+ *
+ * We store the path to each file and the size of each file for sanity-checking
+ * purposes. For further details, see comments for IncrementalBackupInfo.
+ */
+static void
+manifest_process_file(JsonManifestParseContext *context,
+                                         char *pathname, size_t size,
+                                         pg_checksum_type checksum_type,
+                                         int checksum_length,
+                                         uint8 *checksum_payload)
+{
+       IncrementalBackupInfo *ib = context->private_data;
+       backup_file_entry *entry;
+       bool            found;
+
+       entry = backup_file_insert(ib->manifest_files, pathname, &found);
+       if (!found)
+       {
+               entry->path = MemoryContextStrdup(ib->manifest_files->ctx,
+                                                                                 pathname);
+               entry->size = size;
+       }
+}
+
+/*
+ * This callback is invoked for each WAL range mentioned in the backup
+ * manifest.
+ *
+ * We're just interested in learning the oldest LSN and the corresponding TLI
+ * that appear in any WAL range.
+ */
+static void
+manifest_process_wal_range(JsonManifestParseContext *context,
+                                                  TimeLineID tli, XLogRecPtr start_lsn,
+                                                  XLogRecPtr end_lsn)
+{
+       IncrementalBackupInfo *ib = context->private_data;
+       backup_wal_range *range = palloc(sizeof(backup_wal_range));
+
+       range->tli = tli;
+       range->start_lsn = start_lsn;
+       range->end_lsn = end_lsn;
+       ib->manifest_wal_ranges = lappend(ib->manifest_wal_ranges, range);
+}
+
+/*
+ * This callback is invoked if an error occurs while parsing the backup
+ * manifest.
+ */
+static void
+manifest_report_error(JsonManifestParseContext *context, const char *fmt,...)
+{
+       StringInfoData errbuf;
+
+       initStringInfo(&errbuf);
+
+       for (;;)
+       {
+               va_list         ap;
+               int                     needed;
+
+               va_start(ap, fmt);
+               needed = appendStringInfoVA(&errbuf, fmt, ap);
+               va_end(ap);
+               if (needed == 0)
+                       break;
+               enlargeStringInfo(&errbuf, needed);
+       }
+
+       ereport(ERROR,
+                       errmsg_internal("%s", errbuf.data));
+}
+
+/*
+ * Quicksort comparator for block numbers.
+ */
+static int
+compare_block_numbers(const void *a, const void *b)
+{
+       BlockNumber aa = *(BlockNumber *) a;
+       BlockNumber bb = *(BlockNumber *) b;
+
+       if (aa > bb)
+               return 1;
+       else if (aa == bb)
+               return 0;
+       else
+               return -1;
+}
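
The sizing and threshold arithmetic in the file above can be tried out in isolation. What follows is only a minimal sketch, not the patch's code: every sketch_* name is hypothetical, and the constants assume the default block size (8192 bytes) and default segment size (131072 blocks), which a real build may configure differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed defaults; a real build may use other values. */
#define SKETCH_BLCKSZ        8192
#define SKETCH_RELSEG_SIZE   131072
#define SKETCH_INVALID_BLOCK 0xFFFFFFFFu

/*
 * Header of three 4-byte values (magic number, truncation block length,
 * block count), then one 4-byte block number and one full block per
 * included block.
 */
static size_t
sketch_incremental_file_size(unsigned num_blocks)
{
    size_t      result;

    assert(num_blocks <= SKETCH_RELSEG_SIZE);
    result = 3 * sizeof(uint32_t);
    result += (SKETCH_BLCKSZ + sizeof(uint32_t)) * (size_t) num_blocks;
    return result;
}

/* Returns 1 when more than 90% of the file's bytes would be sent anyway. */
static int
sketch_send_fully(unsigned nblocks, size_t size)
{
    return (double) nblocks * SKETCH_BLCKSZ > (double) size * 0.9;
}

/*
 * Minimum length, in blocks, of the reconstructed segment: the current
 * length, raised to the segment-relative limit block when one is valid.
 * Like the function above, this assumes limit_block lies beyond the
 * start of the segment.
 */
static unsigned
sketch_truncation_block_length(size_t size, uint32_t limit_block,
                               unsigned segno)
{
    unsigned    result = (unsigned) (size / SKETCH_BLCKSZ);

    if (limit_block != SKETCH_INVALID_BLOCK)
    {
        unsigned    relative_limit = limit_block - segno * SKETCH_RELSEG_SIZE;

        if (result < relative_limit)
            result = relative_limit;
    }
    return result;
}
```

Under these assumptions an incremental file with no blocks is 12 bytes, which is why a zero-length relation file is always backed up in full.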
index 5d4ebe3ebeddd897641c5de50b0af30c5f517627..2a6a2dc7c0e8db553dd4dff8ed86f75e766b83a3 100644 (file)
@@ -5,6 +5,7 @@ backend_sources += files(
   'basebackup.c',
   'basebackup_copy.c',
   'basebackup_gzip.c',
+  'basebackup_incremental.c',
   'basebackup_lz4.c',
   'basebackup_progress.c',
   'basebackup_server.c',
index 0c874e33cf6ab00fef55ade3b3750cd13166fe8f..a5d118ed683aa267bfa2562c61e46744b26c091e 100644 (file)
@@ -76,11 +76,12 @@ Node *replication_parse_result;
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
+%token K_UPLOAD_MANIFEST
 
 %type <node>   command
 %type <node>   base_backup start_replication start_logical_replication
                                create_replication_slot drop_replication_slot identify_system
-                               read_replication_slot timeline_history show
+                               read_replication_slot timeline_history show upload_manifest
 %type <list>   generic_option_list
 %type <defelt> generic_option
 %type <uintval>        opt_timeline
@@ -114,6 +115,7 @@ command:
                        | read_replication_slot
                        | timeline_history
                        | show
+                       | upload_manifest
                        ;
 
 /*
@@ -307,6 +309,15 @@ timeline_history:
                                }
                        ;
 
+/* UPLOAD_MANIFEST doesn't currently accept any arguments */
+upload_manifest:
+                       K_UPLOAD_MANIFEST
+                               {
+                                       UploadManifestCmd *cmd = makeNode(UploadManifestCmd);
+
+                                       $$ = (Node *) cmd;
+                               }
+
 opt_physical:
                        K_PHYSICAL
                        | /* EMPTY */
@@ -411,6 +422,7 @@ ident_or_keyword:
                        | K_EXPORT_SNAPSHOT                             { $$ = "export_snapshot"; }
                        | K_NOEXPORT_SNAPSHOT                   { $$ = "noexport_snapshot"; }
                        | K_USE_SNAPSHOT                                { $$ = "use_snapshot"; }
+                       | K_UPLOAD_MANIFEST                             { $$ = "upload_manifest"; }
                ;
 
 %%
index 1cc7fb858cd581acf018214fbdfacf568faee285..4805da08ee3dfabd025d43e4d48b276d045eed5b 100644 (file)
@@ -136,6 +136,7 @@ EXPORT_SNAPSHOT             { return K_EXPORT_SNAPSHOT; }
 NOEXPORT_SNAPSHOT      { return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT           { return K_USE_SNAPSHOT; }
 WAIT                           { return K_WAIT; }
+UPLOAD_MANIFEST                { return K_UPLOAD_MANIFEST; }
 
 {space}+               { /* do nothing */ }
 
@@ -303,6 +304,7 @@ replication_scanner_is_replication_command(void)
                case K_DROP_REPLICATION_SLOT:
                case K_READ_REPLICATION_SLOT:
                case K_TIMELINE_HISTORY:
+               case K_UPLOAD_MANIFEST:
                case K_SHOW:
                        /* Yes; push back the first token so we can parse later. */
                        repl_pushed_back_token = first_token;
index 3bc9c823895e12aacb628a550a287e0de22e8fcd..dbcda325540dc7131099fda640b2af2b52f9db57 100644 (file)
@@ -58,6 +58,7 @@
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
 #include "backup/basebackup.h"
+#include "backup/basebackup_incremental.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
@@ -137,6 +138,17 @@ bool               wake_wal_senders = false;
  */
 static XLogReaderState *xlogreader = NULL;
 
+/*
+ * If the UPLOAD_MANIFEST command is used to provide a backup manifest in
+ * preparation for an incremental backup, uploaded_manifest will point
+ * to an object containing information about its contents, and
+ * uploaded_manifest_mcxt will point to the memory context that contains
+ * that object and all of its subordinate data. Otherwise, both values will
+ * be NULL.
+ */
+static IncrementalBackupInfo *uploaded_manifest = NULL;
+static MemoryContext uploaded_manifest_mcxt = NULL;
+
 /*
  * These variables keep track of the state of the timeline we're currently
  * sending. sendTimeLine identifies the timeline. If sendTimeLineIsHistoric,
@@ -233,6 +245,9 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(TimeLineID *tli);
 static void IdentifySystem(void);
+static void UploadManifest(void);
+static bool HandleUploadManifestPacket(StringInfo buf, off_t *offset,
+                                                                          IncrementalBackupInfo *ib);
 static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
@@ -660,6 +675,143 @@ SendTimeLineHistory(TimeLineHistoryCmd *cmd)
        pq_endmessage(&buf);
 }
 
+/*
+ * Handle UPLOAD_MANIFEST command.
+ */
+static void
+UploadManifest(void)
+{
+       MemoryContext mcxt;
+       IncrementalBackupInfo *ib;
+       off_t           offset = 0;
+       StringInfoData buf;
+
+       /*
+        * parsing the manifest will use the cryptohash stuff, which requires a
+        * resource owner
+        */
+       Assert(CurrentResourceOwner == NULL);
+       CurrentResourceOwner = ResourceOwnerCreate(NULL, "base backup");
+
+       /* Prepare to read manifest data into a temporary context. */
+       mcxt = AllocSetContextCreate(CurrentMemoryContext,
+                                                                "incremental backup information",
+                                                                ALLOCSET_DEFAULT_SIZES);
+       ib = CreateIncrementalBackupInfo(mcxt);
+
+       /* Send a CopyInResponse message */
+       pq_beginmessage(&buf, 'G');
+       pq_sendbyte(&buf, 0);
+       pq_sendint16(&buf, 0);
+       pq_endmessage_reuse(&buf);
+       pq_flush();
+
+       /* Receive packets from client until done. */
+       while (HandleUploadManifestPacket(&buf, &offset, ib))
+               ;
+
+       /* Finish up manifest processing. */
+       FinalizeIncrementalManifest(ib);
+
+       /*
+        * Discard any old manifest information and arrange to preserve the new
+        * information we just got.
+        *
+        * We assume that MemoryContextDelete and MemoryContextSetParent won't
+        * fail, and thus we shouldn't end up bailing out of here in such a way as
+        * to leave dangling pointers.
+        */
+       if (uploaded_manifest_mcxt != NULL)
+               MemoryContextDelete(uploaded_manifest_mcxt);
+       MemoryContextSetParent(mcxt, CacheMemoryContext);
+       uploaded_manifest = ib;
+       uploaded_manifest_mcxt = mcxt;
+
+       /* clean up the resource owner we created */
+       WalSndResourceCleanup(true);
+}
+
+/*
+ * Process one packet received during the handling of an UPLOAD_MANIFEST
+ * operation.
+ *
+ * 'buf' is scratch space. This function expects it to be initialized, doesn't
+ * care what the current contents are, and may overwrite them with completely
+ * new contents.
+ *
+ * The return value is true if the caller should continue processing
+ * additional packets and false if the UPLOAD_MANIFEST operation is complete.
+ */
+static bool
+HandleUploadManifestPacket(StringInfo buf, off_t *offset,
+                                                  IncrementalBackupInfo *ib)
+{
+       int                     mtype;
+       int                     maxmsglen;
+
+       HOLD_CANCEL_INTERRUPTS();
+
+       pq_startmsgread();
+       mtype = pq_getbyte();
+       if (mtype == EOF)
+               ereport(ERROR,
+                               (errcode(ERRCODE_CONNECTION_FAILURE),
+                                errmsg("unexpected EOF on client connection with an open transaction")));
+
+       switch (mtype)
+       {
+               case 'd':                               /* CopyData */
+                       maxmsglen = PQ_LARGE_MESSAGE_LIMIT;
+                       break;
+               case 'c':                               /* CopyDone */
+               case 'f':                               /* CopyFail */
+               case 'H':                               /* Flush */
+               case 'S':                               /* Sync */
+                       maxmsglen = PQ_SMALL_MESSAGE_LIMIT;
+                       break;
+               default:
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_PROTOCOL_VIOLATION),
+                                        errmsg("unexpected message type 0x%02X during COPY from stdin",
+                                                       mtype)));
+                       maxmsglen = 0;          /* keep compiler quiet */
+                       break;
+       }
+
+       /* Now collect the message body */
+       if (pq_getmessage(buf, maxmsglen))
+               ereport(ERROR,
+                               (errcode(ERRCODE_CONNECTION_FAILURE),
+                                errmsg("unexpected EOF on client connection with an open transaction")));
+       RESUME_CANCEL_INTERRUPTS();
+
+       /* Process the message */
+       switch (mtype)
+       {
+               case 'd':                               /* CopyData */
+                       AppendIncrementalManifestData(ib, buf->data, buf->len);
+                       return true;
+
+               case 'c':                               /* CopyDone */
+                       return false;
+
+               case 'H':                               /* Flush */
+               case 'S':                               /* Sync */
+                       /* Ignore these while in CopyOut mode as we do elsewhere. */
+                       return true;
+
+               case 'f':
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_QUERY_CANCELED),
+                                        errmsg("COPY from stdin failed: %s",
+                                                       pq_getmsgstring(buf))));
+       }
+
+       /* Not reached. */
+       Assert(false);
+       return false;
+}
+
 /*
  * Handle START_REPLICATION command.
  *
@@ -1801,7 +1953,7 @@ exec_replication_command(const char *cmd_string)
                        cmdtag = "BASE_BACKUP";
                        set_ps_display(cmdtag);
                        PreventInTransactionBlock(true, cmdtag);
-                       SendBaseBackup((BaseBackupCmd *) cmd_node);
+                       SendBaseBackup((BaseBackupCmd *) cmd_node, uploaded_manifest);
                        EndReplicationCommand(cmdtag);
                        break;
 
@@ -1863,6 +2015,14 @@ exec_replication_command(const char *cmd_string)
                        }
                        break;
 
+               case T_UploadManifestCmd:
+                       cmdtag = "UPLOAD_MANIFEST";
+                       set_ps_display(cmdtag);
+                       PreventInTransactionBlock(true, cmdtag);
+                       UploadManifest();
+                       EndReplicationCommand(cmdtag);
+                       break;
+
                default:
                        elog(ERROR, "unrecognized replication command node tag: %u",
                                 cmd_node->type);
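
The per-message handling in HandleUploadManifestPacket above reduces to a small classification by message type byte. The following is a sketch of that mapping only, with hypothetical sketch_* names; it does no I/O and raises no errors, and merely reports which of the four outcomes a given type byte leads to.

```c
#include <assert.h>

/* Hypothetical outcome codes for one message received during the upload. */
typedef enum
{
    SKETCH_CONTINUE,            /* keep reading packets */
    SKETCH_DONE,                /* upload finished normally */
    SKETCH_CLIENT_FAILED,       /* client aborted the COPY */
    SKETCH_PROTOCOL_ERROR       /* unexpected message type */
} sketch_action;

static sketch_action
sketch_classify_copy_message(int mtype)
{
    switch (mtype)
    {
        case 'd':               /* CopyData: manifest bytes to append */
            return SKETCH_CONTINUE;
        case 'c':               /* CopyDone */
            return SKETCH_DONE;
        case 'H':               /* Flush: ignored */
        case 'S':               /* Sync: ignored */
            return SKETCH_CONTINUE;
        case 'f':               /* CopyFail */
            return SKETCH_CLIENT_FAILED;
        default:
            return SKETCH_PROTOCOL_ERROR;
    }
}
```

In the real function the last two outcomes are reported with ereport(ERROR) rather than returned, so the caller's loop only ever sees "continue" or "done".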
index 0e0ac22bdd675bbc8514cb025383145529876a5c..706140eb9f4864a4c4ecf74016124a3eb674b253 100644 (file)
@@ -32,6 +32,7 @@
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
 #include "postmaster/postmaster.h"
+#include "postmaster/walsummarizer.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
 #include "replication/slot.h"
@@ -140,6 +141,7 @@ CalculateShmemSize(int *num_semaphores)
        size = add_size(size, ReplicationOriginShmemSize());
        size = add_size(size, WalSndShmemSize());
        size = add_size(size, WalRcvShmemSize());
+       size = add_size(size, WalSummarizerShmemSize());
        size = add_size(size, PgArchShmemSize());
        size = add_size(size, ApplyLauncherShmemSize());
        size = add_size(size, BTreeShmemSize());
@@ -337,6 +339,7 @@ CreateOrAttachShmemStructs(void)
        ReplicationOriginShmemInit();
        WalSndShmemInit();
        WalRcvShmemInit();
+       WalSummarizerShmemInit();
        PgArchShmemInit();
        ApplyLauncherShmemInit();
 
index 373077bf52b2e5d34f8a72a5d87939bc70333444..aa2210925e28f589bf1e9e9cfa322448ba6041a3 100644 (file)
@@ -19,6 +19,7 @@ SUBDIRS = \
        pg_archivecleanup \
        pg_basebackup \
        pg_checksums \
+       pg_combinebackup \
        pg_config \
        pg_controldata \
        pg_ctl \
index 67cb50630c5a31383104d63ce23f4e7437c98778..4cb6fd59bb876cdd03440ca560b9482b1a4b3731 100644 (file)
@@ -5,6 +5,7 @@ subdir('pg_amcheck')
 subdir('pg_archivecleanup')
 subdir('pg_basebackup')
 subdir('pg_checksums')
+subdir('pg_combinebackup')
 subdir('pg_config')
 subdir('pg_controldata')
 subdir('pg_ctl')
index 45f32974ff6e3ee171f951375c2c20e5aa13eceb..6b78ee283d97d3d7bec8d4eaf14452ff7f004d66 100644 (file)
@@ -296,6 +296,7 @@ should_allow_existing_directory(const char *pathname)
        if (strcmp(filename, "pg_wal") == 0 ||
                strcmp(filename, "pg_xlog") == 0 ||
                strcmp(filename, "archive_status") == 0 ||
+               strcmp(filename, "summaries") == 0 ||
                strcmp(filename, "pg_tblspc") == 0)
                return true;
 
index f32684a8f233810b3b9a0f2084828ec973618736..5795b91261fce53d792a537bde0a20a62749bd50 100644 (file)
@@ -101,6 +101,11 @@ typedef void (*WriteDataCallback) (size_t nbytes, char *buf,
  */
 #define MINIMUM_VERSION_FOR_TERMINATED_TARFILE 150000
 
+/*
+ * pg_wal/summaries exists beginning with version 17.
+ */
+#define MINIMUM_VERSION_FOR_WAL_SUMMARIES 170000
+
 /*
  * Different ways to include WAL
  */
@@ -217,7 +222,8 @@ static void ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
                                                                                           void *callback_data);
 static void BaseBackup(char *compression_algorithm, char *compression_detail,
                                           CompressionLocation compressloc,
-                                          pg_compress_specification *client_compress);
+                                          pg_compress_specification *client_compress,
+                                          char *incremental_manifest);
 
 static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
                                                                 bool segment_finished);
@@ -390,6 +396,8 @@ usage(void)
        printf(_("\nOptions controlling the output:\n"));
        printf(_("  -D, --pgdata=DIRECTORY receive base backup into directory\n"));
        printf(_("  -F, --format=p|t       output format (plain (default), tar)\n"));
+       printf(_("  -i, --incremental=OLDMANIFEST\n"));
+       printf(_("                         take incremental backup\n"));
        printf(_("  -r, --max-rate=RATE    maximum transfer rate to transfer data directory\n"
                         "                         (in kB/s, or use suffix \"k\" or \"M\")\n"));
        printf(_("  -R, --write-recovery-conf\n"
@@ -688,6 +696,23 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
 
                if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno != EEXIST)
                        pg_fatal("could not create directory \"%s\": %m", statusdir);
+
+               /*
+                * For newer server versions, likewise create pg_wal/summaries
+                */
+               if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_WAL_SUMMARIES)
+               {
+                       char            summarydir[MAXPGPATH];
+
+                       snprintf(summarydir, sizeof(summarydir), "%s/%s/summaries",
+                                        basedir,
+                                        PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ?
+                                        "pg_xlog" : "pg_wal");
+
+                       if (pg_mkdir_p(summarydir, pg_dir_create_mode) != 0 &&
+                               errno != EEXIST)
+                               pg_fatal("could not create directory \"%s\": %m", summarydir);
+               }
        }
 
        /*
@@ -1728,7 +1753,9 @@ ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
 
 static void
 BaseBackup(char *compression_algorithm, char *compression_detail,
-                  CompressionLocation compressloc, pg_compress_specification *client_compress)
+                  CompressionLocation compressloc,
+                  pg_compress_specification *client_compress,
+                  char *incremental_manifest)
 {
        PGresult   *res;
        char       *sysidentifier;
@@ -1794,7 +1821,76 @@ BaseBackup(char *compression_algorithm, char *compression_detail,
                exit(1);
 
        /*
-        * Start the actual backup
+        * If the user wants an incremental backup, we must upload the manifest
+        * for the previous backup upon which it is to be based.
+        */
+       if (incremental_manifest != NULL)
+       {
+               int                     fd;
+               char            mbuf[65536];
+               int                     nbytes;
+
+               /* Reject if server is too old. */
+               if (serverVersion < MINIMUM_VERSION_FOR_WAL_SUMMARIES)
+                       pg_fatal("server does not support incremental backup");
+
+               /* Open the file. */
+               fd = open(incremental_manifest, O_RDONLY | PG_BINARY, 0);
+               if (fd < 0)
+                       pg_fatal("could not open file \"%s\": %m", incremental_manifest);
+
+               /* Tell the server what we want to do. */
+               if (PQsendQuery(conn, "UPLOAD_MANIFEST") == 0)
+                       pg_fatal("could not send replication command \"%s\": %s",
+                                        "UPLOAD_MANIFEST", PQerrorMessage(conn));
+               res = PQgetResult(conn);
+               if (PQresultStatus(res) != PGRES_COPY_IN)
+               {
+                       if (PQresultStatus(res) == PGRES_FATAL_ERROR)
+                               pg_fatal("could not upload manifest: %s",
+                                                PQerrorMessage(conn));
+                       else
+                               pg_fatal("could not upload manifest: unexpected status %s",
+                                                PQresStatus(PQresultStatus(res)));
+               }
+
+               /* Loop, reading from the file and sending the data to the server. */
+               while ((nbytes = read(fd, mbuf, sizeof mbuf)) > 0)
+               {
+                       if (PQputCopyData(conn, mbuf, nbytes) < 0)
+                               pg_fatal("could not send COPY data: %s",
+                                                PQerrorMessage(conn));
+               }
+
+               /* Bail out if we exited the loop due to an error. */
+               if (nbytes < 0)
+                       pg_fatal("could not read file \"%s\": %m", incremental_manifest);
+
+               /* End the COPY operation. */
+               if (PQputCopyEnd(conn, NULL) < 0)
+                       pg_fatal("could not send end-of-COPY: %s",
+                                        PQerrorMessage(conn));
+
+               /* See whether the server is happy with what we sent. */
+               res = PQgetResult(conn);
+               if (PQresultStatus(res) == PGRES_FATAL_ERROR)
+                       pg_fatal("could not upload manifest: %s",
+                                        PQerrorMessage(conn));
+               else if (PQresultStatus(res) != PGRES_COMMAND_OK)
+                       pg_fatal("could not upload manifest: unexpected status %s",
+                                        PQresStatus(PQresultStatus(res)));
+
+               /* Consume ReadyForQuery message from server. */
+               res = PQgetResult(conn);
+               if (res != NULL)
+                       pg_fatal("unexpected extra result while sending manifest");
+
+               /* Add INCREMENTAL option to BASE_BACKUP command. */
+               AppendPlainCommandOption(&buf, use_new_option_syntax, "INCREMENTAL");
+       }
+
+       /*
+        * Continue building up the options list for the BASE_BACKUP command.
         */
        AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
        if (estimatesize)
@@ -1901,6 +1997,7 @@ BaseBackup(char *compression_algorithm, char *compression_detail,
        else
                basebkp = psprintf("BASE_BACKUP %s", buf.data);
 
+       /* OK, try to start the backup. */
        if (PQsendQuery(conn, basebkp) == 0)
                pg_fatal("could not send replication command \"%s\": %s",
                                 "BASE_BACKUP", PQerrorMessage(conn));
@@ -2256,6 +2353,7 @@ main(int argc, char **argv)
                {"version", no_argument, NULL, 'V'},
                {"pgdata", required_argument, NULL, 'D'},
                {"format", required_argument, NULL, 'F'},
+               {"incremental", required_argument, NULL, 'i'},
                {"checkpoint", required_argument, NULL, 'c'},
                {"create-slot", no_argument, NULL, 'C'},
                {"max-rate", required_argument, NULL, 'r'},
@@ -2293,6 +2391,7 @@ main(int argc, char **argv)
        int                     option_index;
        char       *compression_algorithm = "none";
        char       *compression_detail = NULL;
+       char       *incremental_manifest = NULL;
        CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
        pg_compress_specification client_compress;
 
@@ -2317,7 +2416,7 @@ main(int argc, char **argv)
 
        atexit(cleanup_directories_atexit);
 
-       while ((c = getopt_long(argc, argv, "c:Cd:D:F:h:l:nNp:Pr:Rs:S:t:T:U:vwWX:zZ:",
+       while ((c = getopt_long(argc, argv, "c:Cd:D:F:h:i:l:nNp:Pr:Rs:S:t:T:U:vwWX:zZ:",
                                                        long_options, &option_index)) != -1)
        {
                switch (c)
@@ -2352,6 +2451,9 @@ main(int argc, char **argv)
                        case 'h':
                                dbhost = pg_strdup(optarg);
                                break;
+                       case 'i':
+                               incremental_manifest = pg_strdup(optarg);
+                               break;
                        case 'l':
                                label = pg_strdup(optarg);
                                break;
@@ -2765,7 +2867,7 @@ main(int argc, char **argv)
        }
 
        BaseBackup(compression_algorithm, compression_detail, compressloc,
-                          &client_compress);
+                          &client_compress, incremental_manifest);
 
        success = true;
        return 0;
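
The manifest upload loop in BaseBackup above reads the file in 64kB chunks and hands each chunk to the connection. The shape of that loop can be sketched without libpq, under the assumption that a caller-supplied callback stands in for PQputCopyData. All sketch_* names are hypothetical, and stdio is used here in place of open()/read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the send step; returns < 0 on failure. */
typedef int (*sketch_send_fn) (const char *data, size_t len, void *arg);

/*
 * Read fp to EOF in 64kB chunks, passing each chunk to send_chunk.
 * Returns the total number of bytes sent, or -1 on a read or send error.
 */
static long
sketch_upload_in_chunks(FILE *fp, sketch_send_fn send_chunk, void *arg)
{
    char        buf[65536];
    size_t      n;
    long        total = 0;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
    {
        if (send_chunk(buf, n, arg) < 0)
            return -1;
        total += (long) n;
    }
    if (ferror(fp))
        return -1;
    return total;
}

/* Example callback: counts how many chunks were delivered. */
static int
sketch_count_chunks(const char *data, size_t len, void *arg)
{
    (void) data;
    (void) len;
    ++*(int *) arg;
    return 0;
}
```

As in the patch, a short final chunk is sent as-is, and a read error is distinguished from end-of-file only after the loop exits.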
index b9f5e1266b424ec487d6ce968d45f978e3d15739..bf765291e7d5a26126146b79a1afc2705a9347ca 100644 (file)
@@ -223,10 +223,10 @@ SKIP:
                "check backup dir permissions");
 }
 
-# Only archive_sta