</para>
</sect2>
+ <sect2 id="backup-incremental-backup">
+ <title>Making an Incremental Backup</title>
+
+ <para>
+ You can use <xref linkend="app-pgbasebackup"/> to take an incremental
+ backup by specifying the <literal>--incremental</literal> option. You must
+ supply, as an argument to <literal>--incremental</literal>, the backup
+ manifest of an earlier backup from the same server. In the resulting
+ backup, non-relation files will be included in their entirety, but some
+ relation files may be replaced by smaller incremental files which contain
+ only the blocks which have been changed since the earlier backup and enough
+ metadata to reconstruct the current version of the file.
+ </para>
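+
+ <para>
+  As a sketch (the directory names here are hypothetical), a full backup
+  followed by an incremental backup based on its manifest might look like
+  this:
+<programlisting>
+pg_basebackup -D /backups/full
+pg_basebackup -D /backups/incr1 \
+    --incremental=/backups/full/backup_manifest
+</programlisting>
+ </para>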
+
+ <para>
+ To figure out which blocks need to be backed up, the server uses WAL
+ summaries, which are stored in the data directory, inside the directory
+ <literal>pg_wal/summaries</literal>. If the required summary files are not
+ present, an attempt to take an incremental backup will fail. The summaries
+ present in this directory must cover all LSNs from the start LSN of the
+ prior backup to the start LSN of the current backup. Since the server looks
+ for WAL summaries just after establishing the start LSN of the current
+ backup, the necessary summary files probably won't be instantly present
+ on disk, but the server will wait for any missing files to show up.
+ This also helps if the WAL summarization process has fallen behind.
+ However, if the necessary files have already been removed, or if the WAL
+ summarizer doesn't catch up quickly enough, the incremental backup will
+ fail.
+ </para>
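+
+ <para>
+  WAL summarization is controlled by the <xref linkend="guc-summarize-wal"/>
+  parameter and is disabled by default. As a minimal sketch, it can be
+  enabled in <filename>postgresql.conf</filename>:
+<programlisting>
+summarize_wal = on
+</programlisting>
+ </para>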
+
+ <para>
+ When restoring an incremental backup, it will be necessary to have not
+ only the incremental backup itself but also all earlier backups that
+ are required to supply the blocks omitted from the incremental backup.
+ See <xref linkend="app-pgcombinebackup"/> for further information about
+ this requirement.
+ </para>
+
+ <para>
+ Note that all of the requirements for making use of a full backup also
+ apply to an incremental backup. For instance, you still need all of the
+ WAL segment files generated during and after the file system backup, and
+ any relevant WAL history files. And you still need to create a
+ <literal>recovery.signal</literal> (or <literal>standby.signal</literal>)
+ and perform recovery, as described in
+ <xref linkend="backup-pitr-recovery" />. The requirement to have earlier
+ backups available at restore time and to use
+ <literal>pg_combinebackup</literal> is an additional requirement on top of
+ everything else. Keep in mind that <application>PostgreSQL</application>
+ has no built-in mechanism to figure out which backups are still needed as
+ a basis for restoring later incremental backups. You must keep track of
+ the relationships between your full and incremental backups on your own,
+ and be certain not to remove earlier backups if they might be needed when
+ restoring later incremental backups.
+ </para>
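+
+ <para>
+  For example (with hypothetical directory names), a chain consisting of
+  one full backup and two incremental backups could be combined into a
+  data directory suitable for recovery like this:
+<programlisting>
+pg_combinebackup /backups/full /backups/incr1 /backups/incr2 \
+    -o /restore/pgdata
+</programlisting>
+ </para>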
+
+ <para>
+ Incremental backups typically only make sense for relatively large
+ databases where a significant portion of the data does not change, or only
+ changes slowly. For a small database, it's easier to skip incremental
+ backups entirely and take only full backups, which are simpler
+ to manage. For a large database that is heavily modified throughout,
+ incremental backups won't be much smaller than full backups.
+ </para>
+ </sect2>
+
<sect2 id="backup-lowlevel-base-backup">
<title>Making a Base Backup Using the Low Level API</title>
<para>
- The procedure for making a base backup using the low level
- APIs contains a few more steps than
- the <xref linkend="app-pgbasebackup"/> method, but is relatively
+ Instead of taking a full or incremental base backup using
+ <xref linkend="app-pgbasebackup"/>, you can take a base backup using the
+ low-level API. This procedure contains a few more steps than
+ the <application>pg_basebackup</application> method, but is relatively
simple. It is very important that these steps are executed in
sequence, and that the success of a step is verified before
proceeding to the next step.
</listitem>
<listitem>
<para>
- Restore the database files from your file system backup. Be sure that they
+ If you're restoring a full backup, you can restore the database files
+ directly into the target directories. Be sure that they
are restored with the right ownership (the database system user, not
<literal>root</literal>!) and with the right permissions. If you are using
tablespaces, you should verify that the symbolic links in
<filename>pg_tblspc/</filename> were correctly restored.
</para>
</listitem>
+ <listitem>
+ <para>
+ If you're restoring an incremental backup, you'll need to restore the
+ incremental backup and all earlier backups upon which it directly or
+ indirectly depends to the machine where you are performing the restore.
+ These backups will need to be placed in separate directories, not the
+ target directories where you want the running server to end up.
+ Once this is done, use <xref linkend="app-pgcombinebackup"/> to pull
+ data from the full backup and all of the subsequent incremental backups
+ and write out a synthetic full backup to the target directories. As above,
+ verify that permissions and tablespace links are correct.
+ </para>
+ </listitem>
<listitem>
<para>
Remove any files present in <filename>pg_wal/</filename>; these came from the
<sect2 id="runtime-config-wal-summarization">
<title>WAL Summarization</title>
- <!--
<para>
These settings control WAL summarization, a feature which must be
enabled in order to perform an
<link linkend="backup-incremental-backup">incremental backup</link>.
</para>
- -->
<variablelist>
<varlistentry id="guc-summarize-wal" xreflabel="summarize_wal">
</listitem>
</varlistentry>
+ <varlistentry id="protocol-replication-upload-manifest">
+ <term>
+ <literal>UPLOAD_MANIFEST</literal>
+ <indexterm><primary>UPLOAD_MANIFEST</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Uploads a backup manifest in preparation for taking an incremental
+ backup.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
<term><literal>BASE_BACKUP</literal> [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ]
<indexterm><primary>BASE_BACKUP</primary></indexterm>
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>INCREMENTAL</literal></term>
+ <listitem>
+ <para>
+ Requests an incremental backup. The
+ <literal>UPLOAD_MANIFEST</literal> command must be executed
+ before running a base backup with this option.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
<!ENTITY pgBasebackup SYSTEM "pg_basebackup.sgml">
<!ENTITY pgbench SYSTEM "pgbench.sgml">
<!ENTITY pgChecksums SYSTEM "pg_checksums.sgml">
+<!ENTITY pgCombinebackup SYSTEM "pg_combinebackup.sgml">
<!ENTITY pgConfig SYSTEM "pg_config-ref.sgml">
<!ENTITY pgControldata SYSTEM "pg_controldata.sgml">
<!ENTITY pgCtl SYSTEM "pg_ctl-ref.sgml">
</para>
<para>
- <application>pg_basebackup</application> makes an exact copy of the database
- cluster's files, while making sure the server is put into and
- out of backup mode automatically. Backups are always taken of the entire
- database cluster; it is not possible to back up individual databases or
- database objects. For selective backups, another tool such as
+ <application>pg_basebackup</application> can take a full or incremental
+ base backup of the database cluster. When used to take a full backup, it makes an
+ exact copy of the database cluster's files. When used to take an incremental
+ backup, some files that would have been part of a full backup may be
+ replaced with incremental versions of the same files, containing only those
+ blocks that have been modified since the reference backup. An incremental
+ backup cannot be used directly; instead,
+ <xref linkend="app-pgcombinebackup"/> must first
+ be used to combine it with the previous backups upon which it depends.
+ See <xref linkend="backup-incremental-backup" /> for more information
+ about incremental backups, and <xref linkend="backup-pitr-recovery" />
+ for steps to recover from a backup.
+ </para>
+
+ <para>
+ In any mode, <application>pg_basebackup</application> makes sure the server
+ is put into and out of backup mode automatically. Backups are always taken of
+ the entire database cluster; it is not possible to back up individual
+ databases or database objects. For selective backups, another tool such as
<xref linkend="app-pgdump"/> must be used.
</para>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-i <replaceable class="parameter">old_manifest_file</replaceable></option></term>
+ <term><option>--incremental=<replaceable class="parameter">old_manifest_file</replaceable></option></term>
+ <listitem>
+ <para>
+ Performs an <link linkend="backup-incremental-backup">incremental
+ backup</link>. The backup manifest for the reference
+ backup must be provided, and will be uploaded to the server, which will
+ respond by sending the requested incremental backup.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-R</option></term>
<term><option>--write-recovery-conf</option></term>
--- /dev/null
+<!--
+doc/src/sgml/ref/pg_combinebackup.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgcombinebackup">
+ <indexterm zone="app-pgcombinebackup">
+ <primary>pg_combinebackup</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_combinebackup</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_combinebackup</refname>
+ <refpurpose>reconstruct a full backup from an incremental backup and dependent backups</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_combinebackup</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ <arg rep="repeat"><replaceable>backup_directory</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+ <para>
+ <application>pg_combinebackup</application> is used to reconstruct a
+ synthetic full backup from an
+ <link linkend="backup-incremental-backup">incremental backup</link> and the
+ earlier backups upon which it depends.
+ </para>
+
+ <para>
+ Specify all of the required backups on the command line from oldest to newest.
+ That is, the first backup directory should be the path to the full backup, and
+ the last should be the path to the final incremental backup
+ that you wish to restore. The reconstructed backup will be written to the
+ output directory specified by the <option>-o</option> option.
+ </para>
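+
+ <para>
+  For example, with a full backup in <filename>full</filename> and two
+  incremental backups in <filename>incr1</filename> and
+  <filename>incr2</filename> (hypothetical names):
+<programlisting>
+pg_combinebackup full incr1 incr2 -o outputdir
+</programlisting>
+ </para>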
+
+ <para>
+ Although <application>pg_combinebackup</application> will attempt to verify
+ that the backups you specify form a legal backup chain from which a correct
+ full backup can be reconstructed, it is not designed to help you keep track
+ of which backups depend on which other backups. If you remove one or
+ more of the previous backups upon which your incremental
+ backup relies, you will not be able to restore it.
+ </para>
+
+ <para>
+ Since the output of <application>pg_combinebackup</application> is a
+ synthetic full backup, it can be used as an input to a future invocation of
+ <application>pg_combinebackup</application>. The synthetic full backup would
+ be specified on the command line in lieu of the chain of backups from which
+ it was reconstructed.
+ </para>
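+
+ <para>
+  For instance, if <filename>combined</filename> was written by a previous
+  run of <application>pg_combinebackup</application> and
+  <filename>incr3</filename> is a later incremental backup based on it
+  (hypothetical names), the chain can be shortened like this:
+<programlisting>
+pg_combinebackup combined incr3 -o newcombined
+</programlisting>
+ </para>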
+ </refsect1>
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><option>-d</option></term>
+ <term><option>--debug</option></term>
+ <listitem>
+ <para>
+ Print lots of debug logging output on <filename>stderr</filename>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-n</option></term>
+ <term><option>--dry-run</option></term>
+ <listitem>
+ <para>
+ The <option>-n</option>/<option>--dry-run</option> option instructs
+ <command>pg_combinebackup</command> to figure out what would be done
+ without actually creating the target directory or any output files.
+ It is particularly useful in combination with <option>--debug</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-N</option></term>
+ <term><option>--no-sync</option></term>
+ <listitem>
+ <para>
+ By default, <command>pg_combinebackup</command> will wait for all files
+ to be written safely to disk. This option causes
+ <command>pg_combinebackup</command> to return without waiting, which is
+ faster, but means that a subsequent operating system crash can leave
+ the output backup corrupt. Generally, this option is useful for testing
+ but should not be used when creating a production installation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-o <replaceable class="parameter">outputdir</replaceable></option></term>
+ <term><option>--output=<replaceable class="parameter">outputdir</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the output directory to which the synthetic full backup
+ should be written. Currently, this argument is required.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
+ <term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
+ <listitem>
+ <para>
+ Relocates the tablespace in directory <replaceable>olddir</replaceable>
+ to <replaceable>newdir</replaceable> during backup reconstruction.
+ <replaceable>olddir</replaceable> is the absolute path of the tablespace
+ as it exists in the first backup specified on the command line,
+ and <replaceable>newdir</replaceable> is the absolute path to use for the
+ tablespace in the reconstructed backup. If either path needs to contain
+ an equal sign (<literal>=</literal>), precede that with a backslash.
+ This option can be specified multiple times for multiple tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
+ <listitem>
+ <para>
+ Like <xref linkend="app-pgbasebackup"/>,
+ <application>pg_combinebackup</application> writes a backup manifest
+ in the output directory. This option specifies the checksum algorithm
+ that should be applied to each file included in the backup manifest.
+ Currently, the available algorithms are <literal>NONE</literal>,
+ <literal>CRC32C</literal>, <literal>SHA224</literal>,
+ <literal>SHA256</literal>, <literal>SHA384</literal>,
+ and <literal>SHA512</literal>. The default is <literal>CRC32C</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-manifest</option></term>
+ <listitem>
+ <para>
+ Disables generation of a backup manifest. If this option is not
+ specified, a backup manifest for the reconstructed backup will be
+ written to the output directory.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--sync-method=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ When set to <literal>fsync</literal>, which is the default,
+ <command>pg_combinebackup</command> will recursively open and synchronize
+ all files in the backup directory. When the plain format is used, the
+ search for files will follow symbolic links for the WAL directory and
+ each configured tablespace.
+ </para>
+ <para>
+ On Linux, <literal>syncfs</literal> may be used instead to ask the
+ operating system to synchronize the whole file system that contains the
+ backup directory. When the plain format is used,
+ <command>pg_combinebackup</command> will also synchronize the file systems
+ that contain the WAL files and each tablespace. See
+ <xref linkend="syncfs"/> for more information about using
+ <function>syncfs()</function>.
+ </para>
+ <para>
+ This option has no effect when <option>--no-sync</option> is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Prints the <application>pg_combinebackup</application> version and
+ exits.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Shows help about <application>pg_combinebackup</application> command
+ line arguments, and exits.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Environment</title>
+
+ <para>
+ This utility, like most other <productname>PostgreSQL</productname> utilities,
+ uses the environment variables supported by <application>libpq</application>
+ (see <xref linkend="libpq-envars"/>).
+ </para>
+
+ <para>
+ The environment variable <envar>PG_COLOR</envar> specifies whether to use
+ color in diagnostic messages. Possible values are
+ <literal>always</literal>, <literal>auto</literal> and
+ <literal>never</literal>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="app-pgbasebackup"/></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
&pgamcheck;
&pgBasebackup;
&pgbench;
+ &pgCombinebackup;
&pgConfig;
&pgDump;
&pgDumpall;
appendStringInfo(result, "STOP TIMELINE: %u\n", state->stoptli);
}
+ /* either both istartpoint and istarttli should be set, or neither */
+ Assert(XLogRecPtrIsInvalid(state->istartpoint) == (state->istarttli == 0));
+ if (!XLogRecPtrIsInvalid(state->istartpoint))
+ {
+ appendStringInfo(result, "INCREMENTAL FROM LSN: %X/%X\n",
+ LSN_FORMAT_ARGS(state->istartpoint));
+ appendStringInfo(result, "INCREMENTAL FROM TLI: %u\n",
+ state->istarttli);
+ }
+
data = result->data;
pfree(result);
tli_from_file, BACKUP_LABEL_FILE)));
}
+ if (fscanf(lfp, "INCREMENTAL FROM LSN: %X/%X\n", &hi, &lo) > 0)
+ ereport(FATAL,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("this is an incremental backup, not a data directory"),
+ errhint("Use pg_combinebackup to reconstruct a valid data directory.")));
+
if (ferror(lfp) || FreeFile(lfp))
ereport(FATAL,
(errcode_for_file_access(),
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_incremental.o \
basebackup_lz4.o \
basebackup_zstd.o \
basebackup_progress.o \
#include "access/xlogbackup.h"
#include "backup/backup_manifest.h"
#include "backup/basebackup.h"
+#include "backup/basebackup_incremental.h"
#include "backup/basebackup_sink.h"
#include "backup/basebackup_target.h"
+#include "catalog/pg_tablespace_d.h"
#include "commands/defrem.h"
#include "common/compression.h"
#include "common/file_perm.h"
#include "pgtar.h"
#include "port.h"
#include "postmaster/syslogger.h"
+#include "postmaster/walsummarizer.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
#include "storage/bufpage.h"
bool fastcheckpoint;
bool nowait;
bool includewal;
+ bool incremental;
uint32 maxrate;
bool sendtblspcmapfile;
bool send_to_client;
} basebackup_options;
static int64 sendTablespace(bbsink *sink, char *path, Oid spcoid, bool sizeonly,
- struct backup_manifest_info *manifest);
+ struct backup_manifest_info *manifest,
+ IncrementalBackupInfo *ib);
static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
- backup_manifest_info *manifest, Oid spcoid);
+ backup_manifest_info *manifest, Oid spcoid,
+ IncrementalBackupInfo *ib);
static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok,
Oid dboid, Oid spcoid, RelFileNumber relfilenumber,
unsigned segno,
- backup_manifest_info *manifest);
+ backup_manifest_info *manifest,
+ unsigned num_incremental_blocks,
+ BlockNumber *incremental_blocks,
+ unsigned truncation_block_length);
static off_t read_file_data_into_buffer(bbsink *sink,
const char *readfilename, int fd,
off_t offset, size_t length,
BlockNumber blkno,
bool verify_checksum,
int *checksum_failures);
+static void push_to_sink(bbsink *sink, pg_checksum_context *checksum_ctx,
+ size_t *bytes_done, void *data, size_t length);
static bool verify_page_checksum(Page page, XLogRecPtr start_lsn,
BlockNumber blkno,
uint16 *expected_checksum);
bool sizeonly);
static void _tarWritePadding(bbsink *sink, int len);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void perform_base_backup(basebackup_options *opt, bbsink *sink);
+static void perform_base_backup(basebackup_options *opt, bbsink *sink,
+ IncrementalBackupInfo *ib);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
* clobbered by longjmp" from stupider versions of gcc.
*/
static void
-perform_base_backup(basebackup_options *opt, bbsink *sink)
+perform_base_backup(basebackup_options *opt, bbsink *sink,
+ IncrementalBackupInfo *ib)
{
bbsink_state state;
XLogRecPtr endptr;
ListCell *lc;
tablespaceinfo *newti;
+ /* If this is an incremental backup, execute preparatory steps. */
+ if (ib != NULL)
+ PrepareForIncrementalBackup(ib, backup_state);
+
/* Add a node for the base directory at the end */
newti = palloc0(sizeof(tablespaceinfo));
newti->size = -1;
if (tmp->path == NULL)
tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,
- true, NULL, InvalidOid);
+ true, NULL, InvalidOid, NULL);
else
tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
- NULL);
+ NULL, NULL);
state.bytes_total += tmp->size;
}
state.bytes_total_is_valid = true;
/* Then the bulk of the files... */
sendDir(sink, ".", 1, false, state.tablespaces,
- sendtblspclinks, &manifest, InvalidOid);
+ sendtblspclinks, &manifest, InvalidOid, ib);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
XLOG_CONTROL_FILE)));
sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, InvalidOid,
- InvalidRelFileNumber, 0, &manifest);
+ InvalidRelFileNumber, 0, &manifest, 0, NULL, 0);
}
else
{
bbsink_begin_archive(sink, archive_name);
- sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest, ib);
}
/*
sendFile(sink, pathbuf, pathbuf, &statbuf, false,
InvalidOid, InvalidOid, InvalidRelFileNumber, 0,
- &manifest);
+ &manifest, 0, NULL, 0);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
bool o_checkpoint = false;
bool o_nowait = false;
bool o_wal = false;
+ bool o_incremental = false;
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
opt->includewal = defGetBoolean(defel);
o_wal = true;
}
+ else if (strcmp(defel->defname, "incremental") == 0)
+ {
+ if (o_incremental)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->incremental = defGetBoolean(defel);
+ if (opt->incremental && !summarize_wal)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("incremental backups cannot be taken unless WAL summarization is enabled")));
+ o_incremental = true;
+ }
else if (strcmp(defel->defname, "max_rate") == 0)
{
int64 maxrate;
* the filesystem, bypassing the buffer cache.
*/
void
-SendBaseBackup(BaseBackupCmd *cmd)
+SendBaseBackup(BaseBackupCmd *cmd, IncrementalBackupInfo *ib)
{
basebackup_options opt;
bbsink *sink;
set_ps_display(activitymsg);
}
+ /*
+ * If we're asked to perform an incremental backup and the user has not
+ * supplied a manifest, that's an ERROR.
+ *
+ * If we're asked to perform a full backup and the user did supply a
+ * manifest, just ignore it.
+ */
+ if (!opt.incremental)
+ ib = NULL;
+ else if (ib == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("must UPLOAD_MANIFEST before performing an incremental BASE_BACKUP")));
+
/*
* If the target is specifically 'client' then set up to stream the backup
* to the client; otherwise, it's being sent someplace else and should not
*/
PG_TRY();
{
- perform_base_backup(&opt, sink);
+ perform_base_backup(&opt, sink, ib);
}
PG_FINALLY();
{
*/
static int64
sendTablespace(bbsink *sink, char *path, Oid spcoid, bool sizeonly,
- backup_manifest_info *manifest)
+ backup_manifest_info *manifest, IncrementalBackupInfo *ib)
{
int64 size;
char pathbuf[MAXPGPATH];
/* Send all the files in the tablespace version directory */
size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
- spcoid);
+ spcoid, ib);
return size;
}
static int64
sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
- Oid spcoid)
+ Oid spcoid, IncrementalBackupInfo *ib)
{
DIR *dir;
struct dirent *de;
int64 size = 0;
const char *lastDir; /* Split last dir from parent path. */
bool isRelationDir = false; /* Does directory contain relations? */
+ bool isGlobalDir = false;
Oid dboid = InvalidOid;
+ BlockNumber *relative_block_numbers = NULL;
+
+ /*
+ * Since this array is relatively large, avoid putting it on the stack.
+ * But we don't need it at all if this is not an incremental backup.
+ */
+ if (ib != NULL)
+ relative_block_numbers = palloc(sizeof(BlockNumber) * RELSEG_SIZE);
/*
* Determine if the current path is a database directory that can contain
}
}
else if (strcmp(path, "./global") == 0)
+ {
isRelationDir = true;
+ isGlobalDir = true;
+ }
dir = AllocateDir(path);
while ((de = ReadDir(dir, path)) != NULL)
&statbuf, sizeonly);
/*
- * Also send archive_status directory (by hackishly reusing
- * statbuf from above ...).
+ * Also send archive_status and summaries directories (by
+ * hackishly reusing statbuf from above ...).
*/
size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
&statbuf, sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/summaries", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
if (!skip_this_dir)
size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
- sendtblspclinks, manifest, spcoid);
+ sendtblspclinks, manifest, spcoid, ib);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
+ unsigned num_blocks_required = 0;
+ unsigned truncation_block_length = 0;
+ char tarfilenamebuf[MAXPGPATH * 2];
+ char *tarfilename = pathbuf + basepathlen + 1;
+ FileBackupMethod method = BACK_UP_FILE_FULLY;
+
+ if (ib != NULL && isRelationFile)
+ {
+ Oid relspcoid;
+ char *lookup_path;
+
+ if (OidIsValid(spcoid))
+ {
+ relspcoid = spcoid;
+ lookup_path = psprintf("pg_tblspc/%u/%s", spcoid,
+ tarfilename);
+ }
+ else
+ {
+ if (isGlobalDir)
+ relspcoid = GLOBALTABLESPACE_OID;
+ else
+ relspcoid = DEFAULTTABLESPACE_OID;
+ lookup_path = pstrdup(tarfilename);
+ }
+
+ method = GetFileBackupMethod(ib, lookup_path, dboid, relspcoid,
+ relfilenumber, relForkNum,
+ segno, statbuf.st_size,
+ &num_blocks_required,
+ relative_block_numbers,
+ &truncation_block_length);
+ if (method == BACK_UP_FILE_INCREMENTALLY)
+ {
+ statbuf.st_size =
+ GetIncrementalFileSize(num_blocks_required);
+ snprintf(tarfilenamebuf, sizeof(tarfilenamebuf),
+ "%s/INCREMENTAL.%s",
+ path + basepathlen + 1,
+ de->d_name);
+ tarfilename = tarfilenamebuf;
+ }
+
+ pfree(lookup_path);
+ }
if (!sizeonly)
- sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, tarfilename, &statbuf,
true, dboid, spcoid,
- relfilenumber, segno, manifest);
+ relfilenumber, segno, manifest,
+ num_blocks_required,
+ method == BACK_UP_FILE_INCREMENTALLY ? relative_block_numbers : NULL,
+ truncation_block_length);
if (sent || sizeonly)
{
ereport(WARNING,
(errmsg("skipping special file \"%s\"", pathbuf)));
}
+
+ if (relative_block_numbers != NULL)
+ pfree(relative_block_numbers);
+
FreeDir(dir);
return size;
}
* If dboid is anything other than InvalidOid then any checksum failures
* detected will get reported to the cumulative stats system.
*
+ * If the file is to be sent incrementally, then num_incremental_blocks
+ * should be the number of blocks to be sent, and incremental_blocks
+ * an array of block numbers relative to the start of the current segment.
+ * If the whole file is to be sent, then incremental_blocks should be NULL,
+ * and num_incremental_blocks can have any value, as it will be ignored.
+ *
* Returns true if the file was successfully sent, false if 'missing_ok',
* and the file did not exist.
*/
sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid, Oid spcoid,
RelFileNumber relfilenumber, unsigned segno,
- backup_manifest_info *manifest)
+ backup_manifest_info *manifest, unsigned num_incremental_blocks,
+ BlockNumber *incremental_blocks, unsigned truncation_block_length)
{
int fd;
BlockNumber blkno = 0;
pgoff_t bytes_done = 0;
bool verify_checksum = false;
pg_checksum_context checksum_ctx;
+ int ibindex = 0;
if (pg_checksum_init(&checksum_ctx, manifest->checksum_type) < 0)
elog(ERROR, "could not initialize checksum of file \"%s\"",
RelFileNumberIsValid(relfilenumber))
verify_checksum = true;
+ /*
+ * If we're sending an incremental file, write the file header.
+ */
+ if (incremental_blocks != NULL)
+ {
+ unsigned magic = INCREMENTAL_MAGIC;
+ size_t header_bytes_done = 0;
+
+ /* Emit header data. */
+ push_to_sink(sink, &checksum_ctx, &header_bytes_done,
+ &magic, sizeof(magic));
+ push_to_sink(sink, &checksum_ctx, &header_bytes_done,
+ &num_incremental_blocks, sizeof(num_incremental_blocks));
+ push_to_sink(sink, &checksum_ctx, &header_bytes_done,
+ &truncation_block_length, sizeof(truncation_block_length));
+ push_to_sink(sink, &checksum_ctx, &header_bytes_done,
+ incremental_blocks,
+ sizeof(BlockNumber) * num_incremental_blocks);
+
+ /* Flush out any data still in the buffer so it's again empty. */
+ if (header_bytes_done > 0)
+ {
+ bbsink_archive_contents(sink, header_bytes_done);
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer,
+ header_bytes_done) < 0)
+ elog(ERROR, "could not update checksum of base backup");
+ }
+
+ /* Update our notion of file position. */
+ bytes_done += sizeof(magic);
+ bytes_done += sizeof(num_incremental_blocks);
+ bytes_done += sizeof(truncation_block_length);
+ bytes_done += sizeof(BlockNumber) * num_incremental_blocks;
+ }
+
/*
* Loop until we read the amount of data the caller told us to expect. The
* file could be longer, if it was extended while we were sending it, but
* for a base backup we can ignore such extended data. It will be restored
* from WAL.
*/
- while (bytes_done < statbuf->st_size)
+ while (1)
{
- size_t remaining = statbuf->st_size - bytes_done;
+ /*
+ * Determine whether we've read all the data that we need, and if not,
+ * read some more.
+ */
+ if (incremental_blocks == NULL)
+ {
+ size_t remaining = statbuf->st_size - bytes_done;
+
+ /*
+ * If we've read the required number of bytes, then it's time to
+ * stop.
+ */
+ if (bytes_done >= statbuf->st_size)
+ break;
+
+ /*
+ * Read as many bytes as will fit in the buffer, or however many
+ * are left to read, whichever is less.
+ */
+ cnt = read_file_data_into_buffer(sink, readfilename, fd,
+ bytes_done, remaining,
+ blkno + segno * RELSEG_SIZE,
+ verify_checksum,
+ &checksum_failures);
+ }
+ else
+ {
+ BlockNumber relative_blkno;
- /* Try to read some more data. */
- cnt = read_file_data_into_buffer(sink, readfilename, fd, bytes_done,
- remaining,
- blkno + segno * RELSEG_SIZE,
- verify_checksum,
- &checksum_failures);
+ /*
+ * If we've read all the blocks, then it's time to stop.
+ */
+ if (ibindex >= num_incremental_blocks)
+ break;
+
+ /*
+ * Read just one block, whichever one is the next that we're
+ * supposed to include.
+ */
+ relative_blkno = incremental_blocks[ibindex++];
+ cnt = read_file_data_into_buffer(sink, readfilename, fd,
+ relative_blkno * BLCKSZ,
+ BLCKSZ,
+ relative_blkno + segno * RELSEG_SIZE,
+ verify_checksum,
+ &checksum_failures);
+
+ /*
+ * If we get a partial read, that must mean that the relation is
+ * being truncated. Ultimately, it should be truncated to a
+ * multiple of BLCKSZ, since this path should only be reached for
+ * relation files, but we might transiently observe an
+ * intermediate value.
+ *
+ * It should be fine to treat this just as if the entire block had
+ * been truncated away - i.e. fill this and all later blocks with
+ * zeroes. WAL replay will fix things up.
+ */
+ if (cnt < BLCKSZ)
+ break;
+ }
/*
* If the amount of data we were able to read was not a multiple of
return cnt;
}
+/*
+ * Push data into a bbsink.
+ *
+ * It's better, when possible, to read data directly into the bbsink's buffer,
+ * rather than using this function to copy it into the buffer; this function is
+ * for cases where that approach is not practical.
+ *
+ * bytes_done should point to a count of the number of bytes that are
+ * currently used in the bbsink's buffer. Upon return, the bytes identified by
+ * data and length will have been copied into the bbsink's buffer, flushing
+ * as required, and *bytes_done will have been updated accordingly. If the
+ * buffer was flushed, the previous contents will also have been fed to
+ * checksum_ctx.
+ *
+ * Note that after one or more calls to this function it is the caller's
+ * responsibility to perform any required final flush.
+ */
+static void
+push_to_sink(bbsink *sink, pg_checksum_context *checksum_ctx,
+ size_t *bytes_done, void *data, size_t length)
+{
+ while (length > 0)
+ {
+ size_t bytes_to_copy;
+
+ /*
+ * We use < here rather than <= so that if the data exactly fills the
+ * remaining buffer space, we trigger a flush now.
+ */
+ if (length < sink->bbs_buffer_length - *bytes_done)
+ {
+ /* Append remaining data to buffer. */
+ memcpy(sink->bbs_buffer + *bytes_done, data, length);
+ *bytes_done += length;
+ return;
+ }
+
+ /* Copy until buffer is full and flush it. */
+ bytes_to_copy = sink->bbs_buffer_length - *bytes_done;
+ memcpy(sink->bbs_buffer + *bytes_done, data, bytes_to_copy);
+ data = ((char *) data) + bytes_to_copy;
+ length -= bytes_to_copy;
+ bbsink_archive_contents(sink, sink->bbs_buffer_length);
+ if (pg_checksum_update(checksum_ctx, (uint8 *) sink->bbs_buffer,
+ sink->bbs_buffer_length) < 0)
+ elog(ERROR, "could not update checksum");
+ *bytes_done = 0;
+ }
+}
+
/*
* Try to verify the checksum for the provided page, if it seems appropriate
* to do so.
--- /dev/null
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_incremental.c
+ * code for incremental backup support
+ *
+ * This code isn't actually in charge of taking an incremental backup;
+ * the actual construction of the incremental backup happens in
+ * basebackup.c. Here, we're concerned with providing the necessary
+ * supports for that operation. In particular, we need to parse the
+ * backup manifest supplied by the user taking the incremental backup
+ * and extract the required information from it.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/backup/basebackup_incremental.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/timeline.h"
+#include "access/xlog.h"
+#include "access/xlogrecovery.h"
+#include "backup/basebackup_incremental.h"
+#include "backup/walsummary.h"
+#include "common/blkreftable.h"
+#include "common/hashfn.h"
+#include "common/parse_manifest.h"
+#include "postmaster/walsummarizer.h"
+
+#define BLOCKS_PER_READ 512
+
+/*
+ * Details extracted from the WAL ranges present in the supplied backup manifest.
+ */
+typedef struct
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+} backup_wal_range;
+
+/*
+ * Details extracted from the file list present in the supplied backup manifest.
+ */
+typedef struct
+{
+ uint32 status;
+ const char *path;
+ size_t size;
+} backup_file_entry;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX backup_file
+#define SH_ELEMENT_TYPE backup_file_entry
+#define SH_KEY_TYPE const char *
+#define SH_KEY path
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+struct IncrementalBackupInfo
+{
+ /* Memory context for this object and its subsidiary objects. */
+ MemoryContext mcxt;
+
+ /* Temporary buffer for storing the manifest while parsing it. */
+ StringInfoData buf;
+
+ /* WAL ranges extracted from the backup manifest. */
+ List *manifest_wal_ranges;
+
+ /*
+ * Files extracted from the backup manifest.
+ *
+ * We don't really need this information, because we use WAL summaries to
+ * figure out what's changed. It would be unsafe to just rely on the list of
+ * files that existed before, because it's possible for a file to be
+ * removed and a new one created with the same name and different
+ * contents. In such cases, the whole file must still be sent. We can tell
+ * from the WAL summaries whether that happened, but not from the file
+ * list.
+ *
+ * Nonetheless, this data is useful for sanity checking. If a file that we
+ * think we shouldn't need to send is not present in the manifest for the
+ * prior backup, something has gone terribly wrong. We retain the file
+ * names and sizes, but not the checksums or last modified times, for
+ * which we have no use.
+ *
+ * One significant downside of storing this data is that it consumes
+ * memory. If that turns out to be a problem, we might have to decide not
+ * to retain this information, or to make it optional.
+ */
+ backup_file_hash *manifest_files;
+
+ /*
+ * Block-reference table for the incremental backup.
+ *
+ * It's possible that storing the entire block-reference table in memory
+ * will be a problem for some users. The in-memory format that we're using
+ * here is pretty efficient, converging to little more than 1 bit per
+ * block for relation forks with large numbers of modified blocks. It's
+ * possible, however, that if you try to perform an incremental backup of
+ * a database with a sufficiently large number of relations on a
+ * sufficiently small machine, you could run out of memory here. If that
+ * turns out to be a problem in practice, we'll need to be more clever.
+ */
+ BlockRefTable *brtab;
+};
+
+static void manifest_process_file(JsonManifestParseContext *context,
+ char *pathname,
+ size_t size,
+ pg_checksum_type checksum_type,
+ int checksum_length,
+ uint8 *checksum_payload);
+static void manifest_process_wal_range(JsonManifestParseContext *context,
+ TimeLineID tli,
+ XLogRecPtr start_lsn,
+ XLogRecPtr end_lsn);
+static void manifest_report_error(JsonManifestParseContext *ib,
+ const char *fmt,...)
+ pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static int compare_block_numbers(const void *a, const void *b);
+
+/*
+ * Create a new object for storing information extracted from the manifest
+ * supplied when creating an incremental backup.
+ */
+IncrementalBackupInfo *
+CreateIncrementalBackupInfo(MemoryContext mcxt)
+{
+ IncrementalBackupInfo *ib;
+ MemoryContext oldcontext;
+
+ oldcontext = MemoryContextSwitchTo(mcxt);
+
+ ib = palloc0(sizeof(IncrementalBackupInfo));
+ ib->mcxt = mcxt;
+ initStringInfo(&ib->buf);
+
+ /*
+ * It's hard to guess how many files a "typical" installation will have in
+ * the data directory, but a fresh initdb creates almost 1000 files as of
+ this writing, so it seems to make sense for our estimate to be
+ * substantially higher.
+ */
+ ib->manifest_files = backup_file_create(mcxt, 10000, NULL);
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return ib;
+}
+
+/*
+ * Before taking an incremental backup, the caller must supply the backup
+ * manifest from a prior backup. Each chunk of manifest data received
+ * from the client should be passed to this function.
+ */
+void
+AppendIncrementalManifestData(IncrementalBackupInfo *ib, const char *data,
+ int len)
+{
+ MemoryContext oldcontext;
+
+ /* Switch to our memory context. */
+ oldcontext = MemoryContextSwitchTo(ib->mcxt);
+
+ /*
+ * XXX. Our json parser is at present incapable of parsing json blobs
+ * incrementally, so we have to accumulate the entire backup manifest
+ * before we can do anything with it. This should really be fixed, since
+ * some users might have very large numbers of files in the data
+ * directory.
+ */
+ appendBinaryStringInfo(&ib->buf, data, len);
+
+ /* Switch back to previous memory context. */
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Finalize an IncrementalBackupInfo object after all manifest data has
+ * been supplied via calls to AppendIncrementalManifestData.
+ */
+void
+FinalizeIncrementalManifest(IncrementalBackupInfo *ib)
+{
+ JsonManifestParseContext context;
+ MemoryContext oldcontext;
+
+ /* Switch to our memory context. */
+ oldcontext = MemoryContextSwitchTo(ib->mcxt);
+
+ /* Parse the manifest. */
+ context.private_data = ib;
+ context.per_file_cb = manifest_process_file;
+ context.per_wal_range_cb = manifest_process_wal_range;
+ context.error_cb = manifest_report_error;
+ json_parse_manifest(&context, ib->buf.data, ib->buf.len);
+
+ /* Done with the buffer, so release memory. */
+ pfree(ib->buf.data);
+ ib->buf.data = NULL;
+
+ /* Switch back to previous memory context. */
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Prepare to take an incremental backup.
+ *
+ * Before this function is called, AppendIncrementalManifestData and
+ * FinalizeIncrementalManifest should have already been called to pass all
+ * the manifest data to this object.
+ *
+ * This function performs sanity checks on the data extracted from the
+ * manifest and figures out for which WAL ranges we need summaries, and
+ * whether those summaries are available. Then, it reads and combines the
+ * data from those summary files. It also updates the backup_state with the
+ * reference TLI and LSN for the prior backup.
+ */
+void
+PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
+ BackupState *backup_state)
+{
+ MemoryContext oldcontext;
+ List *expectedTLEs;
+ List *all_wslist,
+ *required_wslist = NIL;
+ ListCell *lc;
+ TimeLineHistoryEntry **tlep;
+ int num_wal_ranges;
+ int i;
+ bool found_backup_start_tli = false;
+ TimeLineID earliest_wal_range_tli = 0;
+ XLogRecPtr earliest_wal_range_start_lsn = InvalidXLogRecPtr;
+ TimeLineID latest_wal_range_tli = 0;
+ XLogRecPtr summarized_lsn;
+ XLogRecPtr pending_lsn;
+ XLogRecPtr prior_pending_lsn = InvalidXLogRecPtr;
+ int deadcycles = 0;
+ TimestampTz initial_time,
+ current_time;
+
+ Assert(ib->buf.data == NULL);
+
+ /* Switch to our memory context. */
+ oldcontext = MemoryContextSwitchTo(ib->mcxt);
+
+ /*
+ * A valid backup manifest must always contain at least one WAL range
+ * (usually exactly one, unless the backup spanned a timeline switch).
+ */
+ num_wal_ranges = list_length(ib->manifest_wal_ranges);
+ if (num_wal_ranges == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("manifest contains no required WAL ranges")));
+
+ /*
+ * Match up the TLIs that appear in the WAL ranges of the backup manifest
+ * with those that appear in this server's timeline history. We expect
+ * every backup_wal_range to match to a TimeLineHistoryEntry; if it does
+ * not, that's an error.
+ *
+ * This loop also decides which of the WAL ranges in the manifest is most
+ * ancient and which one is the newest, according to the timeline history
+ * of this server, and stores TLIs of those WAL ranges into
+ * earliest_wal_range_tli and latest_wal_range_tli. It also updates
+ * earliest_wal_range_start_lsn to the start LSN of the WAL range for
+ * earliest_wal_range_tli.
+ *
+ * Note that the return value of readTimeLineHistory puts the latest
+ * timeline at the beginning of the list, not the end. Hence, the earliest
+ * TLI is the one that occurs nearest the end of the list returned by
+ * readTimeLineHistory, and the latest TLI is the one that occurs closest
+ * to the beginning.
+ */
+ expectedTLEs = readTimeLineHistory(backup_state->starttli);
+ tlep = palloc0(num_wal_ranges * sizeof(TimeLineHistoryEntry *));
+ for (i = 0; i < num_wal_ranges; ++i)
+ {
+ backup_wal_range *range = list_nth(ib->manifest_wal_ranges, i);
+ bool saw_earliest_wal_range_tli = false;
+ bool saw_latest_wal_range_tli = false;
+
+ /* Search this server's history for this WAL range's TLI. */
+ foreach(lc, expectedTLEs)
+ {
+ TimeLineHistoryEntry *tle = lfirst(lc);
+
+ if (tle->tli == range->tli)
+ {
+ tlep[i] = tle;
+ break;
+ }
+
+ if (tle->tli == earliest_wal_range_tli)
+ saw_earliest_wal_range_tli = true;
+ if (tle->tli == latest_wal_range_tli)
+ saw_latest_wal_range_tli = true;
+ }
+
+ /*
+ * An incremental backup can only be taken relative to a backup that
+ * represents a previous state of this server. If the backup requires
+ * WAL from a timeline that's not in our history, that definitely
+ * isn't the case.
+ */
+ if (tlep[i] == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("timeline %u found in manifest, but not in this server's history",
+ range->tli)));
+
+ /*
+ * If we found this TLI in the server's history before encountering
+ * the latest TLI seen so far in the server's history, then this TLI
+ * is the latest one seen so far.
+ *
+ * If on the other hand we saw the earliest TLI seen so far before
+ * finding this TLI, this TLI is earlier than the earliest one seen so
+ * far. And if this is the first TLI for which we've searched, it's
+ * also the earliest one seen so far.
+ *
+ * On the first loop iteration, both things should necessarily be
+ * true.
+ */
+ if (!saw_latest_wal_range_tli)
+ latest_wal_range_tli = range->tli;
+ if (earliest_wal_range_tli == 0 || saw_earliest_wal_range_tli)
+ {
+ earliest_wal_range_tli = range->tli;
+ earliest_wal_range_start_lsn = range->start_lsn;
+ }
+ }
+
+ /*
+ * Propagate information about the prior backup into the backup_label that
+ * will be generated for this backup.
+ */
+ backup_state->istartpoint = earliest_wal_range_start_lsn;
+ backup_state->istarttli = earliest_wal_range_tli;
+
+ /*
+ * Sanity check start and end LSNs for the WAL ranges in the manifest.
+ *
+ * Commonly, there won't be any timeline switches during the prior backup
+ * at all, but if there are, they should happen at the same LSNs that this
+ * server switched timelines.
+ *
+ * Whether there are any timeline switches during the prior backup or not,
+ * the prior backup shouldn't require any WAL from a timeline prior to the
+ * start of that timeline. It also shouldn't require any WAL from later
+ * than the start of this backup.
+ *
+ * If any of these sanity checks fail, one possible explanation is that
+ * the user has generated WAL on the same timeline with the same LSNs more
+ * than once. For instance, if two standbys running on timeline 1 were
+ * both promoted and (due to a broken archiving setup) both selected new
+ * timeline ID 2, then it's possible that one of these checks might trip.
+ *
+ * Note that there are lots of ways for the user to do something very bad
+ * without tripping any of these checks, and they are not intended to be
+ * comprehensive. It's pretty hard to see how we could be certain of
+ * anything here. However, if there's a problem staring us right in the
+ * face, it's best to report it, so we do.
+ */
+ for (i = 0; i < num_wal_ranges; ++i)
+ {
+ backup_wal_range *range = list_nth(ib->manifest_wal_ranges, i);
+
+ if (range->tli == earliest_wal_range_tli)
+ {
+ if (range->start_lsn < tlep[i]->begin)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("manifest requires WAL from initial timeline %u starting at %X/%X, but that timeline begins at %X/%X",
+ range->tli,
+ LSN_FORMAT_ARGS(range->start_lsn),
+ LSN_FORMAT_ARGS(tlep[i]->begin))));
+ }
+ else
+ {
+ if (range->start_lsn != tlep[i]->begin)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("manifest requires WAL from continuation timeline %u starting at %X/%X, but that timeline begins at %X/%X",
+ range->tli,
+ LSN_FORMAT_ARGS(range->start_lsn),
+ LSN_FORMAT_ARGS(tlep[i]->begin))));
+ }
+
+ if (range->tli == latest_wal_range_tli)
+ {
+ if (range->end_lsn > backup_state->startpoint)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("manifest requires WAL from final timeline %u ending at %X/%X, but this backup starts at %X/%X",
+ range->tli,
+ LSN_FORMAT_ARGS(range->end_lsn),
+ LSN_FORMAT_ARGS(backup_state->startpoint))));
+ }
+ else
+ {
+ if (range->end_lsn != tlep[i]->end)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("manifest requires WAL from non-final timeline %u ending at %X/%X, but this server switched timelines at %X/%X",
+ range->tli,
+ LSN_FORMAT_ARGS(range->end_lsn),
+ LSN_FORMAT_ARGS(tlep[i]->end))));
+ }
+ }
+
+ /*
+ * Wait for WAL summarization to catch up to the backup start LSN (but
+ * time out if it doesn't do so quickly enough).
+ */
+ initial_time = current_time = GetCurrentTimestamp();
+ while (1)
+ {
+ long timeout_in_ms = 10000;
+ unsigned elapsed_seconds;
+
+ /*
+ * Align the wait time to prevent drift. This doesn't really matter,
+ * but we'd like the warnings about how long we've been waiting to say
+ * 10 seconds, 20 seconds, 30 seconds, 40 seconds ... without ever
+ * drifting to something that is not a multiple of ten.
+ */
+ timeout_in_ms -=
+ TimestampDifferenceMilliseconds(initial_time, current_time) %
+ timeout_in_ms;
+
+ /* Wait for up to 10 seconds. */
+ summarized_lsn = WaitForWalSummarization(backup_state->startpoint,
+ timeout_in_ms, &pending_lsn);
+
+ /* If WAL summarization has progressed sufficiently, stop waiting. */
+ if (summarized_lsn >= backup_state->startpoint)
+ break;
+
+ /*
+ * Keep track of the number of cycles during which there has been no
+ * progression of pending_lsn. If pending_lsn is not advancing, that
+ * means that not only are no new files appearing on disk, but we're
+ * not even incorporating new records into the in-memory state.
+ */
+ if (pending_lsn > prior_pending_lsn)
+ {
+ prior_pending_lsn = pending_lsn;
+ deadcycles = 0;
+ }
+ else
+ ++deadcycles;
+
+ /*
+ * If we've managed to wait for an entire minute without the WAL
+ * summarizer absorbing a single WAL record, error out; probably
+ * something is wrong.
+ *
+ * We could consider also erroring out if the summarizer is taking too
+ * long to catch up, but it's not clear what rate of progress would be
+ * acceptable and what would be too slow. So instead, we just try to
+ * error out in the case where there's no progress at all. That seems
+ * likely to catch a reasonable number of the things that can go wrong
+ * in practice (e.g. the summarizer process is completely hung, say
+ * because somebody hooked up a debugger to it or something) without
+ * giving up too quickly when the system is just slow.
+ */
+ if (deadcycles >= 6)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("WAL summarization is not progressing"),
+ errdetail("Summarization is needed through %X/%X, but is stuck at %X/%X on disk and %X/%X in memory.",
+ LSN_FORMAT_ARGS(backup_state->startpoint),
+ LSN_FORMAT_ARGS(summarized_lsn),
+ LSN_FORMAT_ARGS(pending_lsn))));
+
+ /*
+ * Otherwise, just let the user know what's happening.
+ */
+ current_time = GetCurrentTimestamp();
+ elapsed_seconds =
+ TimestampDifferenceMilliseconds(initial_time, current_time) / 1000;
+ ereport(WARNING,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("still waiting for WAL summarization through %X/%X after %d seconds",
+ LSN_FORMAT_ARGS(backup_state->startpoint),
+ elapsed_seconds),
+ errdetail("Summarization has reached %X/%X on disk and %X/%X in memory.",
+ LSN_FORMAT_ARGS(summarized_lsn),
+ LSN_FORMAT_ARGS(pending_lsn))));
+ }
+
+ /*
+ * Retrieve a list of all WAL summaries on any timeline that overlap with
+ * the LSN range of interest. We could instead call GetWalSummaries() once
+ * per timeline in the loop that follows, but that would involve reading
+ * the directory multiple times. It should be mildly faster - and perhaps
+ * a bit safer - to do it just once.
+ */
+ all_wslist = GetWalSummaries(0, earliest_wal_range_start_lsn,
+ backup_state->startpoint);
+
+ /*
+ * We need WAL summaries for everything that happened during the prior
+ * backup and everything that happened afterward up until the point where
+ * the current backup started.
+ */
+ foreach(lc, expectedTLEs)
+ {
+ TimeLineHistoryEntry *tle = lfirst(lc);
+ XLogRecPtr tli_start_lsn = tle->begin;
+ XLogRecPtr tli_end_lsn = tle->end;
+ XLogRecPtr tli_missing_lsn = InvalidXLogRecPtr;
+ List *tli_wslist;
+
+ /*
+ * Working through the history of this server from the current
+ * timeline backwards, we skip everything until we find the timeline
+ * where this backup started. Most of the time, this means we won't
+ * skip anything at all, as it's unlikely that the timeline has
+ * changed since the beginning of the backup moments ago.
+ */
+ if (tle->tli == backup_state->starttli)
+ {
+ found_backup_start_tli = true;
+ tli_end_lsn = backup_state->startpoint;
+ }
+ else if (!found_backup_start_tli)
+ continue;
+
+ /*
+ * Find the summaries that overlap the LSN range of interest for this
+ * timeline. If this is the earliest timeline involved, the range of
+ * interest begins with the start LSN of the prior backup; otherwise,
+ * it begins at the LSN at which this timeline came into existence. If
+ * this is the latest TLI involved, the range of interest ends at the
+ * start LSN of the current backup; otherwise, it ends at the point
+ * where we switched from this timeline to the next one.
+ */
+ if (tle->tli == earliest_wal_range_tli)
+ tli_start_lsn = earliest_wal_range_start_lsn;
+ tli_wslist = FilterWalSummaries(all_wslist, tle->tli,
+ tli_start_lsn, tli_end_lsn);
+
+ /*
+ * There is no guarantee that the WAL summaries we found cover the
+ * entire range of LSNs for which summaries are required, or indeed
+ * that we found any WAL summaries at all. Check whether we have a
+ * problem of that sort.
+ */
+ if (!WalSummariesAreComplete(tli_wslist, tli_start_lsn, tli_end_lsn,
+ &tli_missing_lsn))
+ {
+ if (XLogRecPtrIsInvalid(tli_missing_lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("WAL summaries are required on timeline %u from %X/%X to %X/%X, but no summaries for that timeline and LSN range exist",
+ tle->tli,
+ LSN_FORMAT_ARGS(tli_start_lsn),
+ LSN_FORMAT_ARGS(tli_end_lsn))));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("WAL summaries are required on timeline %u from %X/%X to %X/%X, but the summaries for that timeline and LSN range are incomplete",
+ tle->tli,
+ LSN_FORMAT_ARGS(tli_start_lsn),
+ LSN_FORMAT_ARGS(tli_end_lsn)),
+ errdetail("The first unsummarized LSN in this range is %X/%X.",
+ LSN_FORMAT_ARGS(tli_missing_lsn))));
+ }
+
+ /*
+ * Remember that we need to read these summaries.
+ *
+ * Technically, it's possible that this could read more files than
+ * required, since tli_wslist in theory could contain redundant
+ * summaries. For instance, if we have a summary from 0/10000000 to
+ * 0/20000000 and also one from 0/00000000 to 0/30000000, then the
+ * latter subsumes the former and the former could be ignored.
+ *
+ * We ignore this possibility because the WAL summarizer only tries to
+ * generate summaries that do not overlap. If somehow they exist,
+ * we'll do a bit of extra work but the results should still be
+ * correct.
+ */
+ required_wslist = list_concat(required_wslist, tli_wslist);
+
+ /*
+ * Timelines earlier than the one in which the prior backup began are
+ * not relevant.
+ */
+ if (tle->tli == earliest_wal_range_tli)
+ break;
+ }
+
+ /*
+ * Read all of the required block reference table files and merge all of
+ * the data into a single in-memory block reference table.
+ *
+ * See the comments for struct IncrementalBackupInfo for some thoughts on
+ * memory usage.
+ */
+ ib->brtab = CreateEmptyBlockRefTable();
+ foreach(lc, required_wslist)
+ {
+ WalSummaryFile *ws = lfirst(lc);
+ WalSummaryIO wsio;
+ BlockRefTableReader *reader;
+ RelFileLocator rlocator;
+ ForkNumber forknum;
+ BlockNumber limit_block;
+ BlockNumber blocks[BLOCKS_PER_READ];
+
+ wsio.file = OpenWalSummaryFile(ws, false);
+ wsio.filepos = 0;
+ ereport(DEBUG1,
+ (errmsg_internal("reading WAL summary file \"%s\"",
+ FilePathName(wsio.file))));
+ reader = CreateBlockRefTableReader(ReadWalSummary, &wsio,
+ FilePathName(wsio.file),
+ ReportWalSummaryError, NULL);
+ while (BlockRefTableReaderNextRelation(reader, &rlocator, &forknum,
+ &limit_block))
+ {
+ BlockRefTableSetLimitBlock(ib->brtab, &rlocator,
+ forknum, limit_block);
+
+ while (1)
+ {
+ unsigned nblocks;
+ unsigned i;
+
+ nblocks = BlockRefTableReaderGetBlocks(reader, blocks,
+ BLOCKS_PER_READ);
+ if (nblocks == 0)
+ break;
+
+ for (i = 0; i < nblocks; ++i)
+ BlockRefTableMarkBlockModified(ib->brtab, &rlocator,
+ forknum, blocks[i]);
+ }
+ }
+ DestroyBlockRefTableReader(reader);
+ FileClose(wsio.file);
+ }
+
+ /* Switch back to previous memory context. */
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Get the pathname that should be used when a file is sent incrementally.
+ *
+ * The result is a palloc'd string.
+ */
+char *
+GetIncrementalFilePath(Oid dboid, Oid spcoid, RelFileNumber relfilenumber,
+ ForkNumber forknum, unsigned segno)
+{
+ char *path;
+ char *lastslash;
+ char *ipath;
+
+ path = GetRelationPath(dboid, spcoid, relfilenumber, InvalidBackendId,
+ forknum);
+
+ lastslash = strrchr(path, '/');
+ Assert(lastslash != NULL);
+ *lastslash = '\0';
+
+ if (segno > 0)
+ ipath = psprintf("%s/INCREMENTAL.%s.%u", path, lastslash + 1, segno);
+ else
+ ipath = psprintf("%s/INCREMENTAL.%s", path, lastslash + 1);
+
+ pfree(path);
+
+ return ipath;
+}
+
+/*
+ * How should we back up a particular file as part of an incremental backup?
+ *
+ * If the return value is BACK_UP_FILE_FULLY, caller should back up the whole
+ * file just as if this were not an incremental backup.
+ *
+ * If the return value is BACK_UP_FILE_INCREMENTALLY, caller should include
+ * an incremental file in the backup instead of the entire file. On return,
+ * *num_blocks_required will be set to the number of blocks that need to be
+ * sent, and the actual block numbers will have been stored in
+ * relative_block_numbers, which should be an array of at least RELSEG_SIZE.
+ * In addition, *truncation_block_length will be set to the value that should
+ * be included in the incremental file.
+ */
+FileBackupMethod
+GetFileBackupMethod(IncrementalBackupInfo *ib, const char *path,
+ Oid dboid, Oid spcoid,
+ RelFileNumber relfilenumber, ForkNumber forknum,
+ unsigned segno, size_t size,
+ unsigned *num_blocks_required,
+ BlockNumber *relative_block_numbers,
+ unsigned *truncation_block_length)
+{
+ BlockNumber absolute_block_numbers[RELSEG_SIZE];
+ BlockNumber limit_block;
+ BlockNumber start_blkno;
+ BlockNumber stop_blkno;
+ RelFileLocator rlocator;
+ BlockRefTableEntry *brtentry;
+ unsigned i;
+ unsigned nblocks;
+
+ /* Should only be called after PrepareForIncrementalBackup. */
+ Assert(ib->buf.data == NULL);
+
+ /*
+ * dboid could be InvalidOid if shared rel, but spcoid and relfilenumber
+ * should have legal values.
+ */
+ Assert(OidIsValid(spcoid));
+ Assert(RelFileNumberIsValid(relfilenumber));
+
+ /*
+ * If the file size is too large or not a multiple of BLCKSZ, then
+ * something weird is happening, so give up and send the whole file.
+ */
+ if ((size % BLCKSZ) != 0 || size / BLCKSZ > RELSEG_SIZE)
+ return BACK_UP_FILE_FULLY;
+
+ /*
+ * The free-space map fork is not properly WAL-logged, so we need to
+ * back up the entire file every time.
+ */
+ if (forknum == FSM_FORKNUM)
+ return BACK_UP_FILE_FULLY;
+
+ /*
+ * If this file was not part of the prior backup, back it up fully.
+ *
+ * If this file was created after the prior backup and before the start of
+ * the current backup, then the WAL summary information will tell us to
+ * back up the whole file. However, if this file was created after the
+ * start of the current backup, then the WAL summary won't know anything
+ * about it. Without this logic, we would erroneously conclude that it was
+ * OK to send it incrementally.
+ *
+ * Note that the file could have existed at the time of the prior backup,
+ * gotten deleted, and then a new file with the same name could have been
+ * created. In that case, this logic won't prevent the file from being
+ * backed up incrementally. But, if the deletion happened before the start
+ * of the current backup, the limit block will be 0, inducing a full
+ * backup. If the deletion happened after the start of the current backup,
+ * reconstruction will erroneously combine blocks from the current
+ * lifespan of the file with blocks from the previous lifespan -- but in
+ * this type of case, WAL replay to reach backup consistency should remove
+ * and recreate the file anyway, so the initial bogus contents should not
+ * matter.
+ */
+ if (backup_file_lookup(ib->manifest_files, path) == NULL)
+ {
+ char *ipath;
+
+ ipath = GetIncrementalFilePath(dboid, spcoid, relfilenumber,
+ forknum, segno);
+ if (backup_file_lookup(ib->manifest_files, ipath) == NULL)
+ return BACK_UP_FILE_FULLY;
+ }
+
+ /* Look up the block reference table entry. */
+ rlocator.spcOid = spcoid;
+ rlocator.dbOid = dboid;
+ rlocator.relNumber = relfilenumber;
+ brtentry = BlockRefTableGetEntry(ib->brtab, &rlocator, forknum,
+ &limit_block);
+
+ /*
+ * If there is no entry, then there have been no WAL-logged changes to the
+ * relation since the predecessor backup was taken, so we can back it up
+ * incrementally and need not include any modified blocks.
+ *
+ * However, if the file is zero-length, we should do a full backup,
+ * because an incremental file is always more than zero length, and it's
+ * silly to take an incremental backup when a full backup would be
+ * smaller.
+ */
+ if (brtentry == NULL)
+ {
+ if (size == 0)
+ return BACK_UP_FILE_FULLY;
+ *num_blocks_required = 0;
+ *truncation_block_length = size / BLCKSZ;
+ return BACK_UP_FILE_INCREMENTALLY;
+ }
+
+ /*
+ * If the limit_block is less than or equal to the point where this
+ * segment starts, send the whole file.
+ */
+ if (limit_block <= segno * RELSEG_SIZE)
+ return BACK_UP_FILE_FULLY;
+
+ /*
+ * Get relevant entries from the block reference table entry.
+ *
+ * We shouldn't overflow computing the start or stop block numbers, but if
+ * it manages to happen somehow, detect it and throw an error.
+ */
+ start_blkno = segno * RELSEG_SIZE;
+ stop_blkno = start_blkno + (size / BLCKSZ);
+ if (start_blkno / RELSEG_SIZE != segno || stop_blkno < start_blkno)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("overflow computing block number bounds for segment %u with size %zu",
+ segno, size));
+ nblocks = BlockRefTableEntryGetBlocks(brtentry, start_blkno, stop_blkno,
+ absolute_block_numbers, RELSEG_SIZE);
+ Assert(nblocks <= RELSEG_SIZE);
+
+ /*
+ * If we're going to have to send nearly all of the blocks, then just send
+ * the whole file, because that won't require much extra storage or
+ * transfer and will speed up and simplify backup restoration. It's not
+ * clear what threshold is most appropriate here and perhaps it ought to
+ * be configurable, but for now we're just going to say that if we'd need
+ * to send 90% of the blocks anyway, give up and send the whole file.
+ *
+ * NB: If you change the threshold here, at least make sure to back up the
+ * file fully when every single block must be sent, because there's
+ * nothing good about sending an incremental file in that case.
+ */
+ if (nblocks * BLCKSZ > size * 0.9)
+ return BACK_UP_FILE_FULLY;
+
+ /*
+ * Looks like we can send an incremental file, so sort the absolute
+ * block numbers and then transpose them to relative block numbers.
+ *
+ * NB: If the block reference table was using the bitmap representation
+ * for a given chunk, the block numbers in that chunk will already be
+ * sorted, but when the array-of-offsets representation is used, we can
+ * receive block numbers here out of order.
+ */
+ qsort(absolute_block_numbers, nblocks, sizeof(BlockNumber),
+ compare_block_numbers);
+ for (i = 0; i < nblocks; ++i)
+ relative_block_numbers[i] = absolute_block_numbers[i] - start_blkno;
+ *num_blocks_required = nblocks;
+
+ /*
+ * The truncation block length is the minimum length of the reconstructed
+ * file. Any block numbers below this threshold that are not present in
+ * the backup need to be fetched from the prior backup. At or above this
+ * threshold, blocks should only be included in the result if they are
+ * present in the backup. (This may require inserting zero blocks if the
+ * blocks included in the backup are non-consecutive.)
+ */
+ *truncation_block_length = size / BLCKSZ;
+ if (BlockNumberIsValid(limit_block))
+ {
+ unsigned relative_limit = limit_block - segno * RELSEG_SIZE;
+
+ if (*truncation_block_length < relative_limit)
+ *truncation_block_length = relative_limit;
+ }
+
+ /* Send it incrementally. */
+ return BACK_UP_FILE_INCREMENTALLY;
+}
+
+/*
+ * Compute the size for an incremental file containing a given number of blocks.
+ */
+extern size_t
+GetIncrementalFileSize(unsigned num_blocks_required)
+{
+ size_t result;
+
+ /* Make sure we're not going to overflow. */
+ Assert(num_blocks_required <= RELSEG_SIZE);
+
+ /*
+ * Three four-byte quantities (magic number, truncation block length,
+ * block count), followed by block numbers, followed by block contents.
+ */
+ result = 3 * sizeof(uint32);
+ result += (BLCKSZ + sizeof(BlockNumber)) * num_blocks_required;
+
+ return result;
+}
+
+/*
+ * Helper function for filemap hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ const unsigned char *ss = (const unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
+
+/*
+ * This callback is invoked for each file mentioned in the backup manifest.
+ *
+ * We store the path to each file and the size of each file for sanity-checking
+ * purposes. For further details, see comments for IncrementalBackupInfo.
+ */
+static void
+manifest_process_file(JsonManifestParseContext *context,
+ char *pathname, size_t size,
+ pg_checksum_type checksum_type,
+ int checksum_length,
+ uint8 *checksum_payload)
+{
+ IncrementalBackupInfo *ib = context->private_data;
+ backup_file_entry *entry;
+ bool found;
+
+ entry = backup_file_insert(ib->manifest_files, pathname, &found);
+ if (!found)
+ {
+ entry->path = MemoryContextStrdup(ib->manifest_files->ctx,
+ pathname);
+ entry->size = size;
+ }
+}
+
+/*
+ * This callback is invoked for each WAL range mentioned in the backup
+ * manifest.
+ *
+ * We're just interested in learning the oldest LSN and the corresponding TLI
+ * that appear in any WAL range.
+ */
+static void
+manifest_process_wal_range(JsonManifestParseContext *context,
+ TimeLineID tli, XLogRecPtr start_lsn,
+ XLogRecPtr end_lsn)
+{
+ IncrementalBackupInfo *ib = context->private_data;
+ backup_wal_range *range = palloc(sizeof(backup_wal_range));
+
+ range->tli = tli;
+ range->start_lsn = start_lsn;
+ range->end_lsn = end_lsn;
+ ib->manifest_wal_ranges = lappend(ib->manifest_wal_ranges, range);
+}
+
+/*
+ * This callback is invoked if an error occurs while parsing the backup
+ * manifest.
+ */
+static void
+manifest_report_error(JsonManifestParseContext *context, const char *fmt,...)
+{
+ StringInfoData errbuf;
+
+ initStringInfo(&errbuf);
+
+ for (;;)
+ {
+ va_list ap;
+ int needed;
+
+ va_start(ap, fmt);
+ needed = appendStringInfoVA(&errbuf, fmt, ap);
+ va_end(ap);
+ if (needed == 0)
+ break;
+ enlargeStringInfo(&errbuf, needed);
+ }
+
+ ereport(ERROR,
+ errmsg_internal("%s", errbuf.data));
+}
+
+/*
+ * Quicksort comparator for block numbers.
+ */
+static int
+compare_block_numbers(const void *a, const void *b)
+{
+ BlockNumber aa = *(const BlockNumber *) a;
+ BlockNumber bb = *(const BlockNumber *) b;
+
+ if (aa > bb)
+ return 1;
+ else if (aa == bb)
+ return 0;
+ else
+ return -1;
+}
'basebackup.c',
'basebackup_copy.c',
'basebackup_gzip.c',
+ 'basebackup_incremental.c',
'basebackup_lz4.c',
'basebackup_progress.c',
'basebackup_server.c',
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_UPLOAD_MANIFEST
%type <node> command
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
- read_replication_slot timeline_history show
+ read_replication_slot timeline_history show upload_manifest
%type <list> generic_option_list
%type <defelt> generic_option
%type <uintval> opt_timeline
| read_replication_slot
| timeline_history
| show
+ | upload_manifest
;
/*
}
;
+/* UPLOAD_MANIFEST doesn't currently accept any arguments */
+upload_manifest:
+ K_UPLOAD_MANIFEST
+ {
+ UploadManifestCmd *cmd = makeNode(UploadManifestCmd);
+
+ $$ = (Node *) cmd;
+ }
+ ;
+
opt_physical:
K_PHYSICAL
| /* EMPTY */
| K_EXPORT_SNAPSHOT { $$ = "export_snapshot"; }
| K_NOEXPORT_SNAPSHOT { $$ = "noexport_snapshot"; }
| K_USE_SNAPSHOT { $$ = "use_snapshot"; }
+ | K_UPLOAD_MANIFEST { $$ = "upload_manifest"; }
;
%%
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+UPLOAD_MANIFEST { return K_UPLOAD_MANIFEST; }
{space}+ { /* do nothing */ }
case K_DROP_REPLICATION_SLOT:
case K_READ_REPLICATION_SLOT:
case K_TIMELINE_HISTORY:
+ case K_UPLOAD_MANIFEST:
case K_SHOW:
/* Yes; push back the first token so we can parse later. */
repl_pushed_back_token = first_token;
#include "access/xlogrecovery.h"
#include "access/xlogutils.h"
#include "backup/basebackup.h"
+#include "backup/basebackup_incremental.h"
#include "catalog/pg_authid.h"
#include "catalog/pg_type.h"
#include "commands/dbcommands.h"
*/
static XLogReaderState *xlogreader = NULL;
+/*
+ * If the UPLOAD_MANIFEST command is used to provide a backup manifest in
+ * preparation for an incremental backup, uploaded_manifest will point
+ * to an object containing information about its contents, and
+ * uploaded_manifest_mcxt will point to the memory context that contains
+ * that object and all of its subordinate data. Otherwise, both values will
+ * be NULL.
+ */
+static IncrementalBackupInfo *uploaded_manifest = NULL;
+static MemoryContext uploaded_manifest_mcxt = NULL;
+
/*
* These variables keep track of the state of the timeline we're currently
* sending. sendTimeLine identifies the timeline. If sendTimeLineIsHistoric,
static void WalSndDone(WalSndSendDataCallback send_data);
static XLogRecPtr GetStandbyFlushRecPtr(TimeLineID *tli);
static void IdentifySystem(void);
+static void UploadManifest(void);
+static bool HandleUploadManifestPacket(StringInfo buf, off_t *offset,
+ IncrementalBackupInfo *ib);
static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
pq_endmessage(&buf);
}
+/*
+ * Handle UPLOAD_MANIFEST command.
+ */
+static void
+UploadManifest(void)
+{
+ MemoryContext mcxt;
+ IncrementalBackupInfo *ib;
+ off_t offset = 0;
+ StringInfoData buf;
+
+ /*
+ * parsing the manifest will use the cryptohash stuff, which requires a
+ * resource owner
+ */
+ Assert(CurrentResourceOwner == NULL);
+ CurrentResourceOwner = ResourceOwnerCreate(NULL, "base backup");
+
+ /* Prepare to read manifest data into a temporary context. */
+ mcxt = AllocSetContextCreate(CurrentMemoryContext,
+ "incremental backup information",
+ ALLOCSET_DEFAULT_SIZES);
+ ib = CreateIncrementalBackupInfo(mcxt);
+
+ /* Send a CopyInResponse message */
+ pq_beginmessage(&buf, 'G');
+ pq_sendbyte(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage_reuse(&buf);
+ pq_flush();
+
+ /* Receive packets from client until done. */
+ while (HandleUploadManifestPacket(&buf, &offset, ib))
+ ;
+
+ /* Finish up manifest processing. */
+ FinalizeIncrementalManifest(ib);
+
+ /*
+ * Discard any old manifest information and arrange to preserve the new
+ * information we just got.
+ *
+ * We assume that MemoryContextDelete and MemoryContextSetParent won't
+ * fail, and thus we shouldn't end up bailing out of here in such a way as
+ * to leave dangling pointers.
+ */
+ if (uploaded_manifest_mcxt != NULL)
+ MemoryContextDelete(uploaded_manifest_mcxt);
+ MemoryContextSetParent(mcxt, CacheMemoryContext);
+ uploaded_manifest = ib;
+ uploaded_manifest_mcxt = mcxt;
+
+ /* clean up the resource owner we created */
+ WalSndResourceCleanup(true);
+}
+
+/*
+ * Process one packet received during the handling of an UPLOAD_MANIFEST
+ * operation.
+ *
+ * 'buf' is scratch space. This function expects it to be initialized, doesn't
+ * care what the current contents are, and may overwrite them with completely
+ * new contents.
+ *
+ * The return value is true if the caller should continue processing
+ * additional packets and false if the UPLOAD_MANIFEST operation is complete.
+ */
+static bool
+HandleUploadManifestPacket(StringInfo buf, off_t *offset,
+ IncrementalBackupInfo *ib)
+{
+ int mtype;
+ int maxmsglen;
+
+ HOLD_CANCEL_INTERRUPTS();
+
+ pq_startmsgread();
+ mtype = pq_getbyte();
+ if (mtype == EOF)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("unexpected EOF on client connection with an open transaction")));
+
+ switch (mtype)
+ {
+ case 'd': /* CopyData */
+ maxmsglen = PQ_LARGE_MESSAGE_LIMIT;
+ break;
+ case 'c': /* CopyDone */
+ case 'f': /* CopyFail */
+ case 'H': /* Flush */
+ case 'S': /* Sync */
+ maxmsglen = PQ_SMALL_MESSAGE_LIMIT;
+ break;
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected message type 0x%02X during COPY from stdin",
+ mtype)));
+ maxmsglen = 0; /* keep compiler quiet */
+ break;
+ }
+
+ /* Now collect the message body */
+ if (pq_getmessage(buf, maxmsglen))
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("unexpected EOF on client connection with an open transaction")));
+ RESUME_CANCEL_INTERRUPTS();
+
+ /* Process the message */
+ switch (mtype)
+ {
+ case 'd': /* CopyData */
+ AppendIncrementalManifestData(ib, buf->data, buf->len);
+ return true;
+
+ case 'c': /* CopyDone */
+ return false;
+
+ case 'H': /* Flush */
+ case 'S': /* Sync */
+ /* Ignore these while receiving COPY data, as we do elsewhere. */
+ return true;
+
+ case 'f':
+ ereport(ERROR,
+ (errcode(ERRCODE_QUERY_CANCELED),
+ errmsg("COPY from stdin failed: %s",
+ pq_getmsgstring(buf))));
+ }
+
+ /* Not reached. */
+ Assert(false);
+ return false;
+}
+
/*
* Handle START_REPLICATION command.
*
cmdtag = "BASE_BACKUP";
set_ps_display(cmdtag);
PreventInTransactionBlock(true, cmdtag);
- SendBaseBackup((BaseBackupCmd *) cmd_node);
+ SendBaseBackup((BaseBackupCmd *) cmd_node, uploaded_manifest);
EndReplicationCommand(cmdtag);
break;
}
break;
+ case T_UploadManifestCmd:
+ cmdtag = "UPLOAD_MANIFEST";
+ set_ps_display(cmdtag);
+ PreventInTransactionBlock(true, cmdtag);
+ UploadManifest();
+ EndReplicationCommand(cmdtag);
+ break;
+
default:
elog(ERROR, "unrecognized replication command node tag: %u",
cmd_node->type);
#include "postmaster/bgworker_internals.h"
#include "postmaster/bgwriter.h"
#include "postmaster/postmaster.h"
+#include "postmaster/walsummarizer.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
#include "replication/slot.h"
size = add_size(size, ReplicationOriginShmemSize());
size = add_size(size, WalSndShmemSize());
size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, WalSummarizerShmemSize());
size = add_size(size, PgArchShmemSize());
size = add_size(size, ApplyLauncherShmemSize());
size = add_size(size, BTreeShmemSize());
ReplicationOriginShmemInit();
WalSndShmemInit();
WalRcvShmemInit();
+ WalSummarizerShmemInit();
PgArchShmemInit();
ApplyLauncherShmemInit();
pg_archivecleanup \
pg_basebackup \
pg_checksums \
+ pg_combinebackup \
pg_config \
pg_controldata \
pg_ctl \
subdir('pg_archivecleanup')
subdir('pg_basebackup')
subdir('pg_checksums')
+subdir('pg_combinebackup')
subdir('pg_config')
subdir('pg_controldata')
subdir('pg_ctl')
if (strcmp(filename, "pg_wal") == 0 ||
strcmp(filename, "pg_xlog") == 0 ||
strcmp(filename, "archive_status") == 0 ||
+ strcmp(filename, "summaries") == 0 ||
strcmp(filename, "pg_tblspc") == 0)
return true;
*/
#define MINIMUM_VERSION_FOR_TERMINATED_TARFILE 150000
+/*
+ * pg_wal/summaries exists beginning with version 17.
+ */
+#define MINIMUM_VERSION_FOR_WAL_SUMMARIES 170000
+
/*
* Different ways to include WAL
*/
void *callback_data);
static void BaseBackup(char *compression_algorithm, char *compression_detail,
CompressionLocation compressloc,
- pg_compress_specification *client_compress);
+ pg_compress_specification *client_compress,
+ char *incremental_manifest);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
bool segment_finished);
printf(_("\nOptions controlling the output:\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
+ printf(_(" -i, --incremental=OLDMANIFEST\n"));
+ printf(_(" take incremental backup\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
printf(_(" -R, --write-recovery-conf\n"
if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno != EEXIST)
pg_fatal("could not create directory \"%s\": %m", statusdir);
+
+ /*
+ * For newer server versions, likewise create pg_wal/summaries
+ */
+ if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_WAL_SUMMARIES)
+ {
+ char summarydir[MAXPGPATH];
+
+ snprintf(summarydir, sizeof(summarydir), "%s/%s/summaries",
+ basedir,
+ PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ?
+ "pg_xlog" : "pg_wal");
+
+ if (pg_mkdir_p(summarydir, pg_dir_create_mode) != 0 &&
+ errno != EEXIST)
+ pg_fatal("could not create directory \"%s\": %m", summarydir);
+ }
}
/*
static void
BaseBackup(char *compression_algorithm, char *compression_detail,
- CompressionLocation compressloc, pg_compress_specification *client_compress)
+ CompressionLocation compressloc,
+ pg_compress_specification *client_compress,
+ char *incremental_manifest)
{
PGresult *res;
char *sysidentifier;
exit(1);
/*
- * Start the actual backup
+ * If the user wants an incremental backup, we must upload the manifest
+ * for the previous backup upon which it is to be based.
+ */
+ if (incremental_manifest != NULL)
+ {
+ int fd;
+ char mbuf[65536];
+ int nbytes;
+
+ /* Reject if server is too old. */
+ if (serverVersion < MINIMUM_VERSION_FOR_WAL_SUMMARIES)
+ pg_fatal("server does not support incremental backup");
+
+ /* Open the file. */
+ fd = open(incremental_manifest, O_RDONLY | PG_BINARY, 0);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\": %m", incremental_manifest);
+
+ /* Tell the server what we want to do. */
+ if (PQsendQuery(conn, "UPLOAD_MANIFEST") == 0)
+ pg_fatal("could not send replication command \"%s\": %s",
+ "UPLOAD_MANIFEST", PQerrorMessage(conn));
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COPY_IN)
+ {
+ if (PQresultStatus(res) == PGRES_FATAL_ERROR)
+ pg_fatal("could not upload manifest: %s",
+ PQerrorMessage(conn));
+ else
+ pg_fatal("could not upload manifest: unexpected status %s",
+ PQresStatus(PQresultStatus(res)));
+ }
+
+ /* Loop, reading from the file and sending the data to the server. */
+ while ((nbytes = read(fd, mbuf, sizeof mbuf)) > 0)
+ {
+ if (PQputCopyData(conn, mbuf, nbytes) < 0)
+ pg_fatal("could not send COPY data: %s",
+ PQerrorMessage(conn));
+ }
+
+ /* Bail out if we exited the loop due to an error. */
+ if (nbytes < 0)
+ pg_fatal("could not read file \"%s\": %m", incremental_manifest);
+
+ /* We are done with the file. */
+ close(fd);
+
+ /* End the COPY operation. */
+ if (PQputCopyEnd(conn, NULL) < 0)
+ pg_fatal("could not send end-of-COPY: %s",
+ PQerrorMessage(conn));
+
+ /* See whether the server is happy with what we sent. */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) == PGRES_FATAL_ERROR)
+ pg_fatal("could not upload manifest: %s",
+ PQerrorMessage(conn));
+ else if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ pg_fatal("could not upload manifest: unexpected status %s",
+ PQresStatus(PQresultStatus(res)));
+
+ /* Consume ReadyForQuery message from server. */
+ res = PQgetResult(conn);
+ if (res != NULL)
+ pg_fatal("unexpected extra result while sending manifest");
+
+ /* Add INCREMENTAL option to BASE_BACKUP command. */
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "INCREMENTAL");
+ }
+
+ /*
+ * Continue building up the options list for the BASE_BACKUP command.
*/
AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
if (estimatesize)
else
basebkp = psprintf("BASE_BACKUP %s", buf.data);
+ /* OK, try to start the backup. */
if (PQsendQuery(conn, basebkp) == 0)
pg_fatal("could not send replication command \"%s\": %s",
"BASE_BACKUP", PQerrorMessage(conn));
{"version", no_argument, NULL, 'V'},
{"pgdata", required_argument, NULL, 'D'},
{"format", required_argument, NULL, 'F'},
+ {"incremental", required_argument, NULL, 'i'},
{"checkpoint", required_argument, NULL, 'c'},
{"create-slot", no_argument, NULL, 'C'},
{"max-rate", required_argument, NULL, 'r'},
int option_index;
char *compression_algorithm = "none";
char *compression_detail = NULL;
+ char *incremental_manifest = NULL;
CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
pg_compress_specification client_compress;
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "c:Cd:D:F:h:l:nNp:Pr:Rs:S:t:T:U:vwWX:zZ:",
+ while ((c = getopt_long(argc, argv, "c:Cd:D:F:h:i:l:nNp:Pr:Rs:S:t:T:U:vwWX:zZ:",
long_options, &option_index)) != -1)
{
switch (c)
case 'h':
dbhost = pg_strdup(optarg);
break;
+ case 'i':
+ incremental_manifest = pg_strdup(optarg);
+ break;
case 'l':
label = pg_strdup(optarg);
break;
}
BaseBackup(compression_algorithm, compression_detail, compressloc,
- &client_compress);
+ &client_compress, incremental_manifest);
success = true;
return 0;
"check backup dir permissions");
}
-# Only archive_sta