From 9e257a181cc1dc5e19eb5d770ce09cc98f470f5f Mon Sep 17 00:00:00 2001
From: Andrew Dunstan
Date: Sun, 24 Mar 2013 11:27:20 -0400
Subject: Add parallel pg_dump option.
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
Joachim Wieland, lightly editorialized by Andrew Dunstan.
---
doc/src/sgml/backup.sgml | 18 +++++++++
doc/src/sgml/perform.sgml | 9 +++++
doc/src/sgml/ref/pg_dump.sgml | 89 ++++++++++++++++++++++++++++++++++++++++---
3 files changed, 111 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index c4215bed986..e444b1cde3d 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -310,6 +310,24 @@ pg_restore -d dbname
+
+ Use pg_dump's parallel dump feature.
+
+ To speed up the dump of a large database, you can use
+ pg_dump's parallel mode. This will dump
+ multiple tables at the same time. You can control the degree of
+ parallelism with the -j parameter. Parallel dumps
+ are only supported for the "directory" archive format.
+
+
+pg_dump -j num -F d -f out.dir dbname
+
+
+ You can use pg_restore -j to restore a dump in parallel.
+ This will work for any archive of either the "custom" or the "directory"
+ archive mode, whether or not it has been created with pg_dump -j.
+
+
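The dump and restore halves described above can be sketched end to end as follows. This is a dry-run sketch that only builds and prints the command lines (the database name `mydb`, the target database `newdb`, and the directory `mydb.dir` are hypothetical; a real run needs a reachable server):

```shell
# Parallel dump: the directory format (-F d) is required for -j.
dump_cmd="pg_dump -j 4 -F d -f mydb.dir mydb"

# Parallel restore works for custom- or directory-format archives,
# whether or not the dump itself was made with -j.
restore_cmd="pg_restore -j 4 -d newdb mydb.dir"

echo "$dump_cmd"
echo "$restore_cmd"
```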
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 1e7544afeb4..34eace35b6e 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1433,6 +1433,15 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
base backup.
+
+
+ Experiment with the parallel dump and restore modes of both
+ pg_dump and pg_restore and find the
+ optimal number of concurrent jobs to use. Dumping and restoring in
+ parallel by means of the -j option should give you a
+ significantly higher speed than the serial mode.
+
Consider whether the whole dump should be restored as a single
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 6d0f214d423..0186ce0938b 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -73,10 +73,12 @@ PostgreSQL documentation
transfer mechanism. pg_dump can be used to
back up an entire database, then pg_restore
can be used to examine the archive and/or select which parts of the
- database are to be restored. The most flexible output file format is
- the custom format (). It allows
- for selection and reordering of all archived items, and is compressed
- by default.
+ database are to be restored. The most flexible output file formats are
+ the custom format () and the
+ directory format (). They allow
+ for selection and reordering of all archived items, support parallel
+ restoration, and are compressed by default. The directory
+ format is the only format that supports parallel dumps.
@@ -251,7 +253,8 @@ PostgreSQL documentation
can read. A directory format archive can be manipulated with
standard Unix tools; for example, files in an uncompressed archive
can be compressed with the gzip tool.
- This format is compressed by default.
+ This format is compressed by default and also supports parallel
+ dumps.
@@ -285,6 +288,62 @@ PostgreSQL documentation
+
+
+
+
+
+ Run the dump in parallel by dumping njobs
+ tables simultaneously. This option reduces the time of the dump, but it also
+ increases the load on the database server. You can only use this option with the
+ directory output format, because it is the only output format in which multiple
+ processes can write their data at the same time.
+
+
+ pg_dump will open njobs
+ + 1 connections to the database, so make sure your max_connections
+ setting is high enough to accommodate all connections.
+
+
+ Requesting exclusive locks on database objects while running a parallel dump could
+ cause the dump to fail. The reason is that the pg_dump master process
+ requests shared locks on the objects that the worker processes are going to dump later,
+ in order to make sure that nobody deletes them while the dump is running.
+ If another client then requests an exclusive lock on a table, that lock will not be
+ granted but will be queued, waiting for the shared lock of the master process to be
+ released. Consequently, any other access to the table will not be granted either and
+ will queue behind the exclusive lock request. This includes the worker process trying
+ to dump the table. Without any precautions this would be a classic deadlock situation.
+ To detect this conflict, the pg_dump worker process requests another
+ shared lock using the NOWAIT option. If the worker process is not granted
+ this shared lock, somebody else must have requested an exclusive lock in the meantime,
+ and there is no way to continue with the dump, so pg_dump has no choice
+ but to abort the dump.
+
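The lock sequence described above can be made concrete with the SQL involved. This sketch only prints the statements rather than executing them; the table name is hypothetical:

```shell
# The master takes a shared lock up front; each worker later re-requests the
# same lock with NOWAIT and aborts the dump if it cannot be granted at once.
master_sql="LOCK TABLE public.some_table IN ACCESS SHARE MODE;"
worker_sql="LOCK TABLE public.some_table IN ACCESS SHARE MODE NOWAIT;"
echo "master: $master_sql"
echo "worker: $worker_sql"
```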
+
+ For a consistent backup, the database server needs to support synchronized snapshots,
+ a feature that was introduced in PostgreSQL 9.2. With this
+ feature, database clients can ensure they see the same dataset even though they use
+ different connections. pg_dump -j uses multiple database
+ connections; it connects to the database once with the master process and
+ once again for each worker job. Without the synchronized snapshot feature, the
+ different worker jobs wouldn't be guaranteed to see the same data in each connection,
+ which could lead to an inconsistent backup.
+
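The snapshot synchronization described above is built on two server-side pieces: `pg_export_snapshot()` (run by the master inside its transaction) and `SET TRANSACTION SNAPSHOT` (run by each worker before it starts dumping). This sketch only prints the SQL; the snapshot identifier shown is a made-up example:

```shell
# Master exports its snapshot; each worker adopts it so all connections
# see exactly the same data.
export_sql="SELECT pg_export_snapshot();"
adopt_sql="SET TRANSACTION SNAPSHOT '00000003-0000001B-1';"
echo "master: $export_sql"
echo "worker: $adopt_sql"
```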
+
+ If you want to run a parallel dump of a pre-9.2 server, you need to make sure that the
+ database content doesn't change between the time the master connects to the
+ database and the time the last worker job has connected. The easiest way to
+ do this is to halt any data-modifying processes (DDL and DML) accessing the database
+ before starting the backup. You also need to specify the
+ --no-synchronized-snapshots parameter when running
+ pg_dump -j against a pre-9.2 PostgreSQL
+ server.
+
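A parallel dump of a pre-9.2 server would then look like the following dry-run sketch (the database name and output directory are hypothetical, and the server must be quiesced first as described above):

```shell
# Pre-9.2 server: no synchronized snapshots, so the extra flag is required
# and no DDL/DML may run for the duration of the dump.
cmd="pg_dump -j 4 -F d -f olddb.dir --no-synchronized-snapshots olddb"
echo "$cmd"
```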
+
+
+
@@ -690,6 +749,17 @@ PostgreSQL documentation
+
+
+
+
+ This option allows running pg_dump -j> against a pre-9.2
+ server; see the documentation of the -j parameter
+ for more details.
+
+
+
+
@@ -1082,6 +1152,15 @@ CREATE DATABASE foo WITH TEMPLATE template0;
+
+ To dump a database into a directory-format archive in parallel with
+ 5 worker jobs:
+
+
+$ pg_dump -Fd mydb -j 5 -f dumpdir
+
+
+
To reload an archive file into a (freshly created) database named
newdb>:
--