authorMarko Kreen2011-11-25 14:36:40 +0000
committerMarko Kreen2011-11-25 14:36:40 +0000
commit319bce547fc47208f70220ceb88b16c69a7359d8 (patch)
tree700b40271702d06edd7026f4affac0462eebd486
parent4e4ee32b2f38bdba7737e0754a557cd2b8677f67 (diff)
parenteb6387479a59f0b9923cb70f24d4b5f442fb1010 (diff)
Merge remote-tracking branch 'klando/for_marko'
Conflicts: README
-rw-r--r--README44
-rw-r--r--doc/Makefile4
-rw-r--r--doc/londiste3.txt357
-rw-r--r--doc/qadmin.txt272
-rw-r--r--doc/walmgr.txt304
-rw-r--r--doc/walmgr3.txt519
-rwxr-xr-xpython/walmgr.py4
7 files changed, 1171 insertions, 333 deletions
diff --git a/README b/README
index 84b7dc9a..074f2ba4 100644
--- a/README
+++ b/README
@@ -1,8 +1,9 @@
= SkyTools - tools for PostgreSQL =
-This is a package of tools in use in Skype for replication and
-failover. Also it includes a generic queuing mechanism PgQ and
-utility library for Python scripts.
+This is a package of tools in use in Skype for replication and failover.
+It also includes a generic queuing mechanism PgQ and utility library for
+Python scripts, as well as a script for setting up and managing WAL based
+standby servers.
== Overview ==
@@ -10,21 +11,40 @@ It contains following tools:
=== PgQ ===
-This is the queue mechanism we use. It consists of PL/pgsql, and C code
-in database, with Python framework on top of it. It is based on
-snapshot based event handling ideas from Slony-I, written for general
-usage.
+PgQ is a queuing system written in PL/pgsql, Python and C code. It is
+based on snapshot based event handling ideas from Slony-I, written for
+general usage.
-Features:
+PgQ provides an efficient, transactional queueing system with
+multi-node support (including work sharing and splitting, failover and
+switchover, for queues and for consumers).
+
+Rules:
- There can be several queues in database.
- There can be several producers than can insert into any queue.
-- There can be several consumers on one queue and all consumers see
- all events.
+- There can be several consumers on one queue.
+- There can be several subconsumers on a consumer.
+
+PgQ is split into 3 layers: Producers, Ticker and Consumers.
+
+*Producers* and *Consumers* respectively push events into a queue and
+read them. Producers only need to call PostgreSQL stored procedures
+(for example from a trigger on a table, or directly from the
+application). Consumers are frequently written in Python, the preferred
+language as it has a powerful Skytools framework, but this is not a
+limitation: any language able to call PostgreSQL stored procedures can
+be used.
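+
+As a minimal sketch (queue, table and payload names are hypothetical),
+a producer can push events either directly or through a table trigger:
+
+    SELECT pgq.create_queue('myqueue');
+    SELECT pgq.insert_event('myqueue', 'mytype', 'payload data');
+
+    CREATE TRIGGER mytable_logger
+        AFTER INSERT OR UPDATE OR DELETE ON mytable
+        FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('myqueue');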
+
+*Ticker* is a daemon which splits the queues into batches of events and
+handles the maintenance of the system. The Ticker is provided with
+Skytools.
Documentation:
-- http://skytools.projects.postgresql.org/skytools-3.0/
+- PgQ ticker daemon (pgqd) usage: link:pgqd.html[]
+- PgQ admin tool (qadm) usage: link:qadmin.html[]
+- PgQ SQL API overview: link:pgq-sql.html[]
+- PgQ SQL reference: http://skytools.projects.postgresql.org/skytools-3.0/pgq/
=== Londiste ===
@@ -50,6 +70,8 @@ Documentation:
This script will setup WAL archiving, does initial backup and
runtime WAL archive and restore.
+It can also be used for up-to-last-second partial file copying, so that
+less than a whole file is lost if the master database server is lost.
== Source tree contents ==
diff --git a/doc/Makefile b/doc/Makefile
index b7791d90..fe8d3d55 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -11,11 +11,11 @@ HTMLS = README.html INSTALL.html \
TODO.html pgq-sql.html pgq-nodupes.html \
$(SCRIPT_HTMLS) faq.html set.notes.html skytools3.html devnotes.html
-SCRIPT_TXTS = walmgr.txt qadmin.txt scriptmgr.txt skytools_upgrade.txt
+SCRIPT_TXTS = londiste3.txt walmgr3.txt qadmin.txt scriptmgr.txt skytools_upgrade.txt
SCRIPT_HTMLS = $(SCRIPT_TXTS:.txt=.html)
MAN5 =
-MAN1_SFX = scriptmgr.1 skytools_upgrade.1 walmgr.1
+MAN1_SFX = scriptmgr.1 skytools_upgrade.1 walmgr3.1 londiste3.1
MAN1 = qadmin.1 pgqd.1
FQHTML = $(addprefix html/, $(HTMLS))
diff --git a/doc/londiste3.txt b/doc/londiste3.txt
new file mode 100644
index 00000000..16865920
--- /dev/null
+++ b/doc/londiste3.txt
@@ -0,0 +1,357 @@
+
+= londiste3(1) =
+
+== NAME ==
+
+londiste3 - tool for managing trigger-based replication for PostgreSQL.
+
+== SYNOPSIS ==
+
+ londiste3 <config.ini> command [subcommand] [options]
+
+== DESCRIPTION ==
+
+Londiste allows you to set up and administer the replication, and is
+run as a daemon to handle the replication itself. (Londiste is just a
+complex PgQ Consumer.)
+
+See <<examples,EXAMPLES>> below to start your first trigger-based
+replication in a few steps.
+
+The 'command' is one of Node, Replication, Information or Internal
+commands listed below.
+
+Londiste introduces the notion of 'takeover': the action by which a
+local node takes over the work of another node. The other node can be a
+root node or a branch node, and it can be dead or alive when the action
+is run.
+
+Londiste also allows, among many other features, cascading replication,
+partial replication and custom handlers for replication.
+
+== GENERAL OPTIONS ==
+
+ -V, --version::
+ Print version info and exit.
+
+ -h, --help::
+ Show this help message and exit.
+
+ -q, --quiet::
+ Log only errors and warnings.
+
+ -v, --verbose::
+ Log verbosely.
+
+ -d, --daemon::
+ Run in daemon mode (go background).
+
+== SPECIFIC OPTIONS ==
+
+ --ini::
+ Display sample ini file.
+
+ --set="'param=val[,param=value]'"::
+ Override config setting.
+
+== DAEMON OPTIONS ==
+
+ -r, --reload::
+ Reload config (send SIGHUP).
+
+ -s, --stop::
+ Stop program safely (send SIGINT).
+
+ -k, --kill::
+ Kill program immediately (send SIGTERM).
+
+== REPLICATION EXTRA ARGUMENTS ==
+
+ --rewind::
+ Change queue position according to destination.
+
+ --reset::
+ Reset queue position on destination side.
+
+== NODE INITIALIZATION COMMANDS ==
+
+Initialization commands are the only ones that require a connection
+string argument. This is the connection string that Londiste3 will
+store; other nodes will use it to connect to the current node.
+
+=== create-root <node> <connstr> ===
+
+Initializes a Master node.
+
+The <node> is the name of the node; it must be unique.
+The <connstr> argument is the connection string to the database on the
+Master node.
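+
+A hypothetical invocation (config file name and connection string are
+examples only):
+
+    londiste3 root.ini create-root node1 "dbname=db1 host=master.example.com"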
+
+=== create-branch <node> <connstr> --provider=<public_connstr> ===
+
+Initializes a Slave node which can be used as a reference for other
+nodes.
+
+The <node> is the name of the node; it must be unique.
+The <connstr> argument is the connection string to the database on the
+current node and <public_connstr> is the connection string to the
+provider database (it can be a root node or a branch node).
+
+=== create-leaf <node> <connstr> --provider=<public_connstr> ===
+
+Initializes a Slave node which can not be used as a reference for other
+nodes.
+
+The <node> is the name of the node; it must be unique.
+The <connstr> argument is the connection string to the database on the
+current node and <public_connstr> is the connection string to the
+provider database (it can be a root node or a branch node).
+
+ --merge='qname'::
+ combined queue name
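+
+A hypothetical leaf setup (node names and connection strings are
+examples only):
+
+    londiste3 leaf.ini create-leaf node2 "dbname=db2 host=slave.example.com" \
+        --provider="dbname=db1 host=master.example.com"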
+
+== NODE ADMINISTRATION COMMANDS ==
+
+=== pause ===
+
+Pause the consumer: the replication of the events is stopped and can be
+resumed later.
+
+=== resume ===
+
+When the consumer has been paused, let it replay again.
+
+=== change-provider <tonode> ===
+
+Make <tonode> become the new provider for the current node.
+
+TODO: londiste.py need update (param change from --provider)
+
+=== takeover <fromnode> [--target=<tonode>] [--all] [--dead] ===
+
+This command allows you to achieve failover and switchover for any of
+your providers (root or branch nodes).
+
+ --target='tonode'::
+ Target node of the takeover.
+
+ --all::
+ In addition to taking over the work from the 'fromnode', make the
+ other nodes change their provider to the current node.
+
+ --dead::
+ Don't wait to take the new role and flag the 'fromnode' as dead.
+
+ --dead='deadnode'::
+ Assume node is dead.
+ TODO: why use this one?
+
+ --dead-root::
+ Old node was root.
+
+ --dead-branch::
+ Old node was branch.
+
+=== drop-node <node> ===
+
+Remove the node from the Londiste replication.
+
+Londiste triggers on the node are removed but Londiste or PgQ are not
+removed.
+
+=== tag-dead <node> ===
+
+Tag the node as dead, the command can be run from any node in the
+replication.
+
+=== tag-alive <node> ===
+
+Tag the node as alive, the command can be run from any node in the
+replication.
+
+== INFORMATION COMMANDS ==
+
+=== status ===
+
+Show status of the replication viewed by the current node.
+
+The output is a tree of the members of the replication with their lags,
+last tick, status and the number of tables in each state: ok/half/ignored
+(replicated, initial copy not finished, table not replicated locally).
+
+=== members ===
+
+Show members of the replication viewed by the current node.
+
+Outputs each node's name, status and location (connection string to
+the node).
+
+=== show-consumers [--node] ===
+
+TODO: command is not working
+
+== REPLICATION DAEMON COMMAND ==
+
+=== worker ===
+
+Replay events to the subscriber. Running the worker is needed to make
+the replication active, as it is what actually replays the events.
+
+== REPLICATION ADMINISTRATION COMMANDS ==
+
+=== add-table <table> [args] ===
+
+Add the table to the replication.
+
+See <<add_args,ADD ARGUMENTS>> below for the list of possible arguments.
+
+=== remove-table <table> ===
+
+Remove the table from the replication.
+
+=== add-seq <seq> [args] ===
+
+Add the sequence to the replication.
+
+See <<add_args,ADD ARGUMENTS>> below for the list of possible arguments.
+
+=== remove-seq <seq> ===
+
+Remove the sequence from the replication.
+
+=== tables ===
+
+Show all tables on provider.
+
+=== seqs ===
+
+Show all sequences on provider.
+
+=== missing ===
+
+List tables subscriber has not yet attached to.
+
+=== resync <table> ===
+
+Do a full copy of the table again.
+
+== ADD ARGUMENTS [[add_args]] ==
+
+ --all::
+ Include all possible tables.
+
+ --dest-table='table'::
+ Redirect changes to different table.
+
+ --force::
+ Ignore table differences.
+
+ --expect-sync::
+ No copy needed.
+
+ --skip-truncate::
+ Keep old data.
+
+ --create::
+ Create table/sequence if it does not exist, with a minimal schema.
+
+ --create-full::
+ Create table/sequence if it does not exist, with the full schema.
+
+ --trigger-flags='trigger_flags'::
+ Trigger creation flags, see below for details.
+
+ --trigger-arg='trigger_arg'::
+ Custom trigger arg (can be specified multiple times).
+
+ --no-triggers::
+ Don't put triggers on the table (makes sense on a leaf node).
+
+ --handler='handler'::
+ Custom handler for table.
+
+ --handler-arg='handler_arg'::
+ Argument to custom handler.
+
+ --copy-condition='copy_condition'::
+ Set WHERE expression for copy.
+
+ --merge-all::
+ Merge tables from all source queues.
+
+ --no-merge::
+ Don't merge tables from source queues.
+
+ --max-parallel-copy='max_parallel_copy'::
+ Max number of parallel copy processes.
+
+Trigger creation flags (default: AIUDL):
+
+- I - ON INSERT
+- U - ON UPDATE
+- D - ON DELETE
+- Q - use pgq.sqltriga() as trigger function
+- L - use pgq.logutriga() as trigger function
+- B - BEFORE
+- A - AFTER
+- S - SKIP
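+
+For example (table name and flag combination are illustrative), to
+replicate only inserts and updates using pgq.logutriga():
+
+    londiste3 leaf.ini add-table public.orders --trigger-flags=AIUL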
+
+== REPLICATION EXTRA COMMANDS ==
+
+=== check ===
+
+Compare table structure on both sides.
+
+=== fkeys ===
+
+Print out fkey drop/create commands.
+
+=== compare [<table>] ===
+
+Compare table contents on both sides.
+
+=== repair [<table>] [--force] ===
+
+Repair data on subscriber.
+
+ --force::
+ Ignore lag.
+
+=== execute [filepath] ===
+
+Execute SQL files on each node of the set.
+
+=== show-handlers ['handler'] ===
+
+Show info about all or a specific handler.
+
+== INTERNAL COMMAND ==
+
+=== copy ===
+
+Copy table logic.
+
+== EXIT STATUS ==
+
+ 0::
+ Successful program execution.
+
+== ENVIRONMENT ==
+
+PostgreSQL environment variables can be used.
+
+== EXAMPLES [[examples]] ==
+
+Londiste is provided with HowTos to help you take your first steps:
+
+- How to set up simple replication.
+- How to set up cascaded replication.
+- How to set up table partitioning (handlers).
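+
+A condensed session sketch for the simple-replication case (file names,
+node names and connection strings are hypothetical; a running ticker,
+pgqd, is assumed):
+
+    londiste3 root.ini create-root node1 "dbname=db1 host=master"
+    londiste3 leaf.ini create-leaf node2 "dbname=db2 host=slave" \
+        --provider="dbname=db1 host=master"
+    londiste3 root.ini worker -d
+    londiste3 leaf.ini worker -d
+    londiste3 leaf.ini add-table public.mytable --create
+    londiste3 leaf.ini status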
+
diff --git a/doc/qadmin.txt b/doc/qadmin.txt
index 0c0b1d04..e406cad2 100644
--- a/doc/qadmin.txt
+++ b/doc/qadmin.txt
@@ -7,33 +7,279 @@ qadmin - Easy to use admin console to examine and administer PgQ queues.
== SYNOPSIS ==
- qadmin.py [ options ... ]
+ qadmin [options]
== DESCRIPTION ==
-The goal is to be psql-like console for queue administration.
+This is a psql-like console for queue administration.
+
+The console offers a large number of commands to set up, control and
+manage the PgQ queueing system.
+
+It also offers a non-interactive mode to run one or more commands.
+
+qadmin keeps a history file in the user's home directory
+(`~/.qadmin_history`).
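+
+A hypothetical non-interactive invocation (host, database and queue
+names are illustrative):
+
+    qadmin -h localhost -d mydb -Q myqueue -c "show queue *;"
+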
== GENERAL OPTIONS ==
-Initial connection options (for libpq):
+ --help::
+ Help screen.
+
+ --version::
+ Print version.
+
+== CONNECTION OPTIONS ==
+
+ -h host::
+ Specify host to connect to (default: localhost via unix socket).
+
+ -p port::
+ Specify port.
+
+ -U user::
+ Specify user name.
+
+ -d dbname::
+ Database name.
+
+ -Q queuename::
+ Queue name; it is used as the `default queue` if provided.
+
+== SPECIFIC OPTIONS ==
+
+ -c 'cmd_string'::
+ Execute console command.
+
+ -f 'execfile'::
+ Execute file containing console commands.
+
+== CONNECTION COMMANDS ==
+
+qadmin allows connecting to other databases or queues from within the
+console itself.
+
+=== connect <connstring> [queue=<qname>]; ===
+
+Connect the console with the specified connection string, optional
+parameter to set the default queue on connection.
+
+=== connect [queue=<qname>] [node=<node>]; ===
+
+Connect the console to the specified queue and/or node.
+
+== MANAGEMENT COMMANDS ==
+
+=== install pgq | londiste; ===
+
+Install PgQ or Londiste to the connected database.
+
+=== create queue <qname>; ===
+
+Create the specified queue.
+
+=== alter queue <qname | *> set param =<foo=1>,<bar=2>; ===
+
+Set one or more parameters on one or all queues at once.
+
+=== drop queue <qname>; ===
+
+Drop the named queue.
+
+=== register consumer <consumer> [on <qname> | at <tick_id> | copy <consumer> ]; ===
+
+Register a consumer on a queue, or at a specified tick or based on
+another consumer.
+
+=== unregister consumer <consumer | *> [from <qname>]; ===
+
+Unregister one or all consumers. If the console is not connected to a
+queue, the queue name must be provided.
+
+=== register subconsumer <subconsumer> for <consumer> [on <qname>]; ===
+
+Register a subconsumer to a consumer. If the console is not connected
+to a queue, the queue name must be provided.
+
+=== unregister subconsumer <subconsumer | *> for <consumer> [from <qname>] [close [batch]]; ===
+
+Unregister one or all subconsumers from a consumer. If the console is
+not connected to a queue, the queue name must be provided. The current
+batch can be aborted if the `close batch` subcommand is provided.
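+
+A hypothetical console session (queue and consumer names are examples;
+the prompt is shown as `db>` for illustration only):
+
+    db> create queue testq;
+    db> register consumer myconsumer on testq;
+    db> show consumer myconsumer on testq;
+    db> unregister consumer myconsumer from testq;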
+
+== SHOW COMMANDS ==
+
+=== show help; ===
+
+Show all the console commands.
+
+=== show queue [ <qname | *> ]; ===
+
+Show details of one or all queues.
+
+=== show table <tbl>; ===
+
+Show DDL for the specified table.
+
+=== show sequence <seq>; ===
+
+Show DDL for the specified sequence.
+
+=== show consumer [ <consumer | *> [on <qname>] ]; ===
+
+Show details of one or all consumers on one or all queues.
+
+=== show batch <batch_id>; ===
+
+Show details of the batch; the default queue must be set (see `connect queue`).
+
+=== show batch <consumer>; ===
+
+Show details of the current batch for the specified consumer; the
+default queue must be set (see `connect queue`).
+
+== LONDISTE COMMANDS ==
+
+All these commands are applied on the node the console is connected to.
+
+=== londiste add table <tbl> [with ... ] ===
+
+ with no_triggers::
+ Skip trigger creation.
+
+ with skip_truncate::
+ Does not truncate the table on the destination.
+
+ with expect_sync::
+ Set table state to 'ok'.
+
+ with tgflags='UIDBAQL'::
+ Trigger creation flags, see below for details.
+
+ with backup::
+ Put urlencoded contents of old row to `ev_extra2`.
+
+ with skip::
+ Create skip trigger. Same as S flag.
+
+ with when='expr'::
+ If 'expr' returns false, don't insert event.
+
+ with ev_XX='EXPR'::
+ Overwrite default ev_* columns (see below).
+
+Trigger creation flags (default: AIUDL):
+
+- I - ON INSERT
+- U - ON UPDATE
+- D - ON DELETE
+- Q - use pgq.sqltriga() as trigger function
+- L - use pgq.logutriga() as trigger function
+- B - BEFORE
+- A - AFTER
+- S - SKIP
+
+Queue event fields:
+
+- ev_type - I/U/D
+- ev_data - partial SQL statement
+- ev_extra1 - table name
+- ev_extra2 - optional urlencoded backup
+
+
+=== londiste add sequence <seq>; ===
+
+Add the specified sequence to Londiste replication.
+
+=== londiste remove table <tbl> [,tbl]; ===
+
+Remove the specified table(s) from the Londiste replication.
+
+=== londiste remove sequence <seq> [,seq]; ===
+
+Remove the specified sequence(s) from the Londiste replication.
+
+=== londiste tables; ===
+
+List registered tables and information about them:
+
+- table_name - fully-qualified table name
+- local - whether events need to be applied to the local table
+- merge_state - show phase of initial copy
+- custom_snapshot - remote snapshot of COPY transaction
+- table_attrs - urlencoded dict of table attributes
+- dropped_ddl - partition combining: temp place to put DDL
+- copy_role - partition combining: how to handle copy
+- copy_pos - position in parallel copy working order
+
+==== copy_role = lead ====
+
+On copy start, drop indexes and store in dropped_ddl.
+
+On copy finish change state to catching-up, then wait until copy_role
+turns to NULL.
+
+Catching-up: if dropped_ddl is not NULL, restore it.
+
+==== copy_role = wait-copy ====
+
+On copy start wait, until role changes (to wait-replay).
+
+==== copy_role = wait-replay ====
+
+On copy finish, tag as 'catching-up'.
+
+Wait until copy_role is NULL, then proceed.
+
+=== londiste seqs; ===
+
+List registered sequences on this node and their last value.
+
+=== londiste missing; ===
+
+On Master, list tables not registered on set.
+
+On Slave, list tables on set but not registered locally.
+
+== OTHER COMMANDS ==
+
+=== exit; ===
+
+Quit program.
+
+=== ^D ===
+
+Quit program.
+
+=== ^C ===
- -h host:: specify host to connect to (default: localhost via unix socket)
+Clear current buffer.
- -p port:: specify port
+== EXIT STATUS ==
- -U user:: specify user name
+ 0::
+ Successful program execution.
- -d dbname:: database name
+== ENVIRONMENT ==
-Options for qadmin:
+PostgreSQL environment variables can be used.
- -Q queuename:: queue name
+== NOT IMPLEMENTED COMMANDS ==
- -c cmd_string:: execute command
+TODO: is it up-to-date?
- -f execfile:: execute file containing commands
+- create <root | branch | leaf> node <node> location <loc> [on <qname>];
- --help:: help screen
+- alter node <name> provider <new>;
+- alter node <name> takeover <oldnow> with all;
+- alter node <name> rename <new>;
+- alter node <name> [location=<loc>]
- --version:: print version
+- drop node <name> [on <qname>];
+- takeover <oldnode>;
+- show node [ <node | *> [on <qname>] ];
+- show cascade;
+- show_queue_stats <q>;
+- status
diff --git a/doc/walmgr.txt b/doc/walmgr.txt
deleted file mode 100644
index fd3605cc..00000000
--- a/doc/walmgr.txt
+++ /dev/null
@@ -1,304 +0,0 @@
-
-= walmgr(1) =
-
-== NAME ==
-
-walmgr - tools for managing WAL-based replication for PostgreSQL.
-
-== SYNOPSIS ==
-
- walmgr.py <config.ini> command
-
-== DESCRIPTION ==
-
-It is both admin and worker script for PostgreSQL PITR replication.
-
-== QUICK START ==
-
-1. Set up passwordless ssh authentication from master to slave
-
- master$ test -f ~/.ssh/id_dsa.pub || ssh-keygen -t dsa
- master$ cat ~/.ssh/id_dsa.pub | ssh slave cat \>\> .ssh/authorized_keys
-
-2. Configure paths
-
- master$ edit master.ini
- slave$ edit slave.ini
-
- Make sure that walmgr.py executable has same pathname on slave and master.
-
-3. Start archival process and create a base backup
-
- master$ ./walmgr.py master.ini setup
- master$ ./walmgr.py master.ini backup
-
- Note: starting from PostgreSQL 8.3 the archiving is enabled by setting
- archive_mode GUC to on. However changing this parameter requires the
- server to be restarted.
-
-4. Prepare postgresql.conf and pg_hba.conf on slave and start replay
-
- master$ scp $PGDATA/*.conf slave:
- slave$ ./walmgr.py slave.ini restore
-
- For debian based distributions the standard configuration files are located
- in /etc/postgresql/x.x/main directory. If another scheme is used the postgresql.conf
- and pg_hba.conf should be copied to slave full_backup directory. Make sure to
- disable archive_command in slave config.
-
- 'walmgr.py restore' moves data in place, creates recovery.conf and starts postmaster
- in recovery mode.
-
-5. In-progress WAL segments can be backup by command:
-
- master$ ./walmgr.py master.ini sync
-
-6. If need to stop replay on slave and boot into normal mode, do:
-
- slave$ ./walmgr.py slave.ini boot
-
-== GENERAL OPTIONS ==
-
-Common options to all walmgr.py commands.
-
- -h, --help::
- show this help message and exit
-
- -q, --quiet::
- make program silent
-
- -v, --verbose::
- make program more verbose
-
- -n, --not-really::
- Show what would be done without actually doing anything.
-
-== MASTER COMMANDS ==
-
-=== setup ===
-Sets up postgres archiving, creates necessary directory structures on slave.
-
-=== sync ===
-Synchronizes in-progress WAL files to slave.
-
-=== syncdaemon ===
-Start WAL synchronization in daemon mode. This will start periodically synching
-the in-progress WAL files to slave.
-
-The following parameters are used to drive the syncdaemon:
-loop_delay - how long to sleep between the synchs.
-use_xlog_functions - use record based shipping to synchronize in-progress WAL segments.
-
-=== stop ===
-Deconfigures postgres archiving.
-
-=== periodic ===
-Runs periodic command, if configured. This enables to execute arbitrary commands
-on interval, useful for synchronizing scripts, config files, crontabs etc.
-
-=== listbackups ===
-List backup sets available on slave node.
-
-=== backup ===
-Creates a new base backup from master database. Will purge expired backups and WAL
-files on slave if `keep_backups` is specified. During a backup a lock file is
-created in slave `completed_wals` directory. This is to prevent simultaneous
-backups and resulting corruption. If running backup is terminated, the BACKUPLOCK
-file may have to be removed manually.
-
-=== restore <set> <dst> ===
-EXPERIMENTAL. Attempts to restore the backup from slave to master.
-
-== SLAVE COMMANDS ==
-
-=== boot ===
-Stop log playback and bring the database up.
-
-=== pause ===
-Pauses WAL playback.
-
-=== continue ===
-Continues previously paused WAL playback.
-
-=== listbackups ===
-Lists available backups.
-
-=== backup ===
-EXPERIMENTAL. Creates a new backup from slave data. Log replay is paused,
-slave data directory is backed up to `full_backup` directory and log
-replay resumed. Backups are rotated as needed. The idea is to move the
-backup load away from production node. Usable from postgres 8.2 and up.
-
-=== restore [src][dst] ===
-Restores the specified backup set to target directory. If specified without
-arguments the latest backup is *moved* to slave data directory (doesn't obey
-retention rules). If src backup is specified the backup is copied (instead of moving).
-Alternative destination directory can be specified with `dst`.
-
-== CONFIGURATION ==
-
-=== Common settings ===
-
-==== job_name ====
-Optional. Indentifies this script, used in logging. Keep unique if
-using central logging.
-
-==== logfile ====
-Where to log.
-
-==== use_skylog ====
-Optional. If nonzero, skylog.ini is used for log configuration.
-
-=== Master settings ===
-
-==== pidfile ====
-Pid file location for syncdaemon mode (if running with -d). Otherwise
-not required.
-
-==== master_db ====
-Database to connect to for pg_start_backup() etc. It is not a
-good idea to use `dbname=template` if running syncdaemon in
-record shipping mode.
-
-==== master_data ====
-Master data directory location.
-
-==== master_config ====
-Master postgresql.conf file location. This is where
-`archive_command` gets updated.
-
-==== master_restart_cmd ====
-The command to restart master database, this used after changing
-`archive_mode` parameter. Leave unset, if you cannot afford to
-restart the database at setup/stop.
-
-==== slave ====
-Slave host and base directory.
-
-==== slave_config ====
-Configuration file location for the slave walmgr.
-
-==== completed_wals ====
-Slave directory where archived WAL files are copied.
-
-==== partial_wals ====
-Slave directory where incomplete WAL files are stored.
-
-==== full_backup ====
-Slave directory where full backups are stored.
-
-==== config_backup ====
-Slave directory where configuration file backups are stored. Optional.
-
-==== loop_delay ====
-The frequency of syncdaemon updates. In record shipping mode only
-incremental updates are sent, so smaller interval can be used.
-
-==== use_xlog_functions ====
-Use pg_xlog functions for record based shipping (available in 8.2 and up).
-
-==== compression ====
-If nonzero, a -z flag is added to rsync cmdline. Will reduce network
-traffic at the cost of extra CPU time.
-
-==== periodic_command ====
-Shell script to be executed at specified time interval. Can be used for
-synchronizing scripts, config files etc.
-
-==== command_interval ====
-How ofter to run periodic command script. In seconds, and only evaluated
-at log switch times.
-
-==== hot_standby ===
-Boolean. If set to true, walmgr setup will set wal_level to hot_standby (9.0 and newer).
-
-=== Sample master.ini ===
-
- [wal-master]
- logfile = master.log
- pidfile = master.pid
- master_db = dbname=template1
- master_data = /var/lib/postgresql/8.0/main
- master_config = /etc/postgresql/8.0/main/postgresql.conf
- slave = slave:/var/lib/postgresql/walshipping
- completed_wals = %(slave)s/logs.complete
- partial_wals = %(slave)s/logs.partial
- full_backup = %(slave)s/data.master
- loop_delay = 10.0
- use_xlog_functions = 1
- compression = 1
-
-=== Slave settings ===
-
-==== slave_data ====
-Postgres data directory for the slave. This is where the restored
-backup is copied/moved.
-
-==== slave_config_dir ====
-Directory for postgres configuration files. If specified, "walmgr restore"
-attempts to restore configuration files from config_backup directory.
-
-==== slave_stop_cmd ====
-Script to stop postmaster on slave.
-
-==== slave_start_cmd ====
-Script to start postmaster on slave.
-
-==== slave ====
-Base directory for slave files (logs.complete, data.master etc)
-
-==== slave_bin ====
-Specifies the location of postgres binaries (pg_controldata, etc). Needed if
-they are not already in the PATH.
-
-==== completed_wals ====
-Directory where complete WAL files are stored. Also miscellaneous control files
-are created in this directory (BACKUPLOCK, STOP, PAUSE, etc.).
-
-==== partial_wals ====
-Directory where partial WAL files are stored.
-
-==== full_backup ====
-Directory where full backups are stored.
-
-==== keep_backups ====
-Number of backups to keep. Also all WAL files needed to bring earliest
-
-backup up to date are kept. The backups are rotated before new backup
-is started, so at one point there is actually one less backup available.
-
-It probably doesn't make sense to specify `keep_backups` if periodic
-backups are not performed - the WAL files will pile up quickly.
-
-Backups will be named data.master, data.master.0, data.master.1 etc.
-
-==== archive_command ====
-Script to execute before rotating away the oldest backup. If it fails
-backups will not be rotated.
-
-
-==== slave_pg_xlog ====
-Set slave_pg_xlog to the directory on the slave where pg_xlog files get
-written to. On a restore to the slave walmgr.py will
-create a symbolic link from data/pg_xlog to this location.
-
-
-==== backup_datadir ====
-Set backup_datadir to 'no' to prevent walmgr.py from making a backup
-of the data directory when restoring to the slave. This defaults to
-'yes'
-
-
-=== Sample slave.ini ===
-
- [wal-slave]
- logfile = slave.log
- slave_data = /var/lib/postgresql/8.0/main
- slave_stop_cmd = /etc/init.d/postgresql-8.0 stop
- slave_start_cmd = /etc/init.d/postgresql-8.0 start
- slave = /var/lib/postgresql/walshipping
- completed_wals = %(slave)s/logs.complete
- partial_wals = %(slave)s/logs.partial
- full_backup = %(slave)s/data.master
- keep_backups = 5
- backup_datadir = yes
diff --git a/doc/walmgr3.txt b/doc/walmgr3.txt
new file mode 100644
index 00000000..5961be07
--- /dev/null
+++ b/doc/walmgr3.txt
@@ -0,0 +1,519 @@
+
+= walmgr3(1) =
+
+== NAME ==
+
+walmgr3 - tool for managing WAL-based replication for PostgreSQL.
+
+== SYNOPSIS ==
+
+ walmgr3 <config.ini> command [--not-really] [options]
+
+== DESCRIPTION ==
+
+Walmgr3 is a tool for managing PostgreSQL replication with PITR (also
+known as Log Shipping).
+
+This script allows you to set up and administer the replication; it is
+also used by PostgreSQL to archive and restore the WAL files.
+
+See <<quick_start,QUICK START>> below to start your first log shipping
+in a few steps.
+
+The 'command' is one of the Master, Slave, Common or Internal commands
+listed below.
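+
+A condensed sketch of the typical workflow (the commands are described
+below; config file names are examples only):
+
+    master$ walmgr3 master.ini setup
+    master$ walmgr3 master.ini backup
+    slave$  walmgr3 slave.ini restore
+    slave$  walmgr3 slave.ini boot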
+
+== GENERAL OPTIONS ==
+
+ -V, --version::
+ Print version info and exit.
+
+ -h, --help::
+ Show this help message and exit.
+
+ -q, --quiet::
+ Log only errors and warnings.
+
+ -v, --verbose::
+ Log verbosely.
+
+ -d, --daemon::
+ Run in daemon mode (go background).
+
+ -n, --not-really::
+ Show what would be done without actually doing anything.
+
+== SPECIFIC OPTIONS ==
+
+ --ini::
+ Display sample ini file.
+
+ --set="'param=val[,param=value]'"::
+ Override config setting.
+
+ --init-master::
+ Initialize Master walmgr3 configuration.
+
+ --init-slave::
+ Initialize Slave walmgr3 configuration.
+
+ --config-dir='filepath'::
+ Configuration file location for `--init-X commands`.
+
+ --slave='hostname'::
+ Slave host name.
+
+ --pgdata='path'::
+ PostgreSQL data directory.
+
+ --ssh-keygen::
+ Generate a SSH key pair if needed (used in Master).
+
+ --ssh-add-key='keyfile.pub'::
+ Add the public key file to authorized_hosts file (used in Slave).
+
+ --ssh-remove-key='ssh_key'::
+ Remove Master ssh key from authorized_hosts file (used in Slave).
+
+ --primary-conninfo='connection_string'::
+ Provide the connection string to the streaming replication Master
+ (used in Slave).
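+
+As an illustration (host names, key file and paths are hypothetical),
+the configuration files could be generated with these options:
+
+    master$ walmgr3 --ssh-keygen --init-master --slave=slavehost \
+        --pgdata=/var/lib/postgresql/9.1/main
+    slave$  walmgr3 --init-slave --ssh-add-key=/tmp/id_dsa.pub \
+        --pgdata=/var/lib/postgresql/9.1/main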
+
+== DAEMON OPTIONS ==
+
+ -r, --reload::
+ Reload config (send SIGHUP).
+
+ -s, --stop::
+ Stop program safely (send SIGINT).
+
+ -k, --kill::
+ Kill program immediately (send SIGTERM).
+
+== MASTER COMMANDS ==
+
+=== setup ===
+
+Sets up PostgreSQL for WAL archiving, creates necessary directory
+structures on Slave.
+
+=== sync ===
+
+Copies in-progress WAL files to Slave.
+
+=== syncdaemon ===
+
+Start WAL synchronization in daemon mode. This will start periodically
+synching the in-progress WAL files to Slave.
+
+The following configuration parameters are used to drive the syncdaemon:
+
+ - *loop_delay* - how long to sleep between the synchs.
+
+ - *use_xlog_functions* - use record based shipping to synchronize
+ in-progress WAL segments.
+
+=== stop ===
+
+Stop archiving and de-configure PostgreSQL archiving.
+
+=== periodic ===
+
+Runs the periodic command if configured. This makes it possible to
+execute arbitrary commands at an interval, useful for synchronizing
+scripts, config files, crontabs etc.
+
+== SLAVE COMMANDS ==
+
+=== boot ===
+
+Stop WAL playback and bring the database up so it can accept queries.
+
+=== pause ===
+
+Pauses WAL playback.
+
+=== continue ===
+
+Continues previously paused WAL playback.
+
+== COMMON COMMANDS ==
+
+=== listbackups ===
+
+Lists available backups on Slave node.
+
+=== backup ===
+
+Creates a new base backup from Master database. Will purge expired
+backups and WAL files on Slave if `keep_backups` is not specified.
+During a backup a lock file is created in Slave `completed_wals`
+directory. This is to prevent simultaneous backups and resulting
+corruption. If a running backup is terminated, the BACKUPLOCK file may
+have to be removed manually.
+
+EXPERIMENTAL: If run on Slave, creates backup from in-recovery Slave
+data. WAL playback is paused, Slave data directory is backed up to
+`full_backup` directory and WAL playback is resumed. Backups are rotated
+as needed. The idea is to move the backup load away from production node.
+Usable from PostgreSQL 8.2 and up.
+
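+The lock protocol described above can be sketched as follows; the
+function names are assumptions for illustration, only the BACKUPLOCK
+file name comes from the text:

```python
import errno
import os

def acquire_backup_lock(completed_wals):
    """Atomically create BACKUPLOCK; return False if a backup is running."""
    path = os.path.join(completed_wals, "BACKUPLOCK")
    try:
        # O_EXCL makes creation fail if the lock file already exists
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False               # another backup holds the lock
        raise
    os.close(fd)
    return True

def release_backup_lock(completed_wals):
    os.remove(os.path.join(completed_wals, "BACKUPLOCK"))
```
+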
+=== restore [src [dst]] ===
+
+Restores the specified backup set to the target directory. If run
+without arguments, the latest backup is *moved* to the Slave data
+directory (this does not obey retention rules). If a `src` backup is
+specified, the backup is copied instead of moved. An alternative
+destination directory can be specified with `dst`.
+
+=== cleanup ===
+
+Cleans up any walmgr3 files after stop.
+
+== INTERNAL COMMANDS ==
+
+=== xarchive <srcpath> <srcname> ===
+
+On Master, archive one WAL file.
+
+=== xrestore <srcname> <dstpath> [last restartpoint wal] ===
+
+On Slave, restore one WAL file.
+
+=== xlock ===
+
+On Master, create lock file to deny other concurrent backups.
+
+=== xrelease ===
+
+On Slave, remove the backup lock file, allowing another backup to run.
+
+=== xrotate ===
+
+Rotate backups by increasing backup directory suffixes. Note that since
+we also have to make room for next backup, we actually have
+*keep_backups - 1* backups available after this.
+
+Unneeded WAL files are not removed here; that is handled by the
+`xpurgewals` command instead.
+
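+The rotation scheme can be illustrated with a small sketch (not
+walmgr3's actual code): `data.master` becomes `data.master.0`, each
+suffix shifts up by one, and directories past the retention limit are
+dropped:

```python
import os
import shutil

def rotate_backups(base, keep_backups):
    """Shift backup directory suffixes up: base -> base.0 -> base.1 ..."""
    if keep_backups <= 1:
        # no room for rotated copies at all
        if os.path.isdir(base):
            shutil.rmtree(base)
        return
    # drop the oldest backup so the shift below stays within the limit
    oldest = "%s.%d" % (base, keep_backups - 2)
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)
    # shift remaining suffixes up by one, highest first
    for n in range(keep_backups - 3, -1, -1):
        src = "%s.%d" % (base, n)
        if os.path.isdir(src):
            os.rename(src, "%s.%d" % (base, n + 1))
    # the current backup becomes .0, leaving base free for the next one
    if os.path.isdir(base):
        os.rename(base, base + ".0")
```
+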
+=== xpurgewals ===
+
+On Slave, remove WAL files not needed for recovery.
+
+=== xpartialsync <filename> <offset> <bytes> ===
+
+Read 'bytes' worth of data from stdin and append it to the partial WAL
+file starting at 'offset'. On error it is assumed that the master
+restarts from zero.
+
+The resulting file is always padded to XLOG_SEGMENT_SIZE bytes to
+simplify recovery.
+
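+A hypothetical sketch of the write-and-pad behaviour; the 16 MB segment
+size is the classic default, and the real value depends on how
+PostgreSQL was built:

```python
import os

XLOG_SEGMENT_SIZE = 16 * 1024 * 1024   # classic default; build-dependent

def partial_sync(path, offset, data, segment_size=XLOG_SEGMENT_SIZE):
    """Write `data` at `offset`, then zero-pad the file to segment_size."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(offset)
        f.write(data)
        f.truncate(offset + len(data))   # drop any stale tail
        # pad with zeros so recovery always sees a full-size segment
        f.write(b"\0" * (segment_size - (offset + len(data))))
```
+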
+== CONFIGURATION ==
+
+=== Common settings ===
+
+==== job_name ====
+
+Optional. Identifies this script, used in logging. Keep it unique if
+using central logging.
+
+==== logfile ====
+
+Where to log.
+
+==== use_skylog ====
+
+Optional. If nonzero, 'skylog.ini' is used for log configuration.
+
+=== Master settings ===
+
+==== pidfile ====
+
+Pid file location for syncdaemon mode (if running with -d). Otherwise
+not required.
+
+==== master_db ====
+
+Database to connect to for pg_start_backup(), etc. It is not a
+good idea to use `dbname=template` if running syncdaemon in record
+shipping mode.
+
+==== master_data ====
+
+Master data directory location.
+
+==== master_config ====
+
+Master postgresql.conf file location. This is where `archive_command`
+gets updated.
+
+==== master_restart_cmd ====
+
+The command to restart the Master database; this is used after changing
+the `archive_mode` parameter. Leave unset if you cannot afford to
+restart the database at setup/stop.
+
+==== slave ====
+
+Slave host and base directory.
+
+==== slave_config ====
+
+Configuration file location for the Slave walmgr3.
+
+==== completed_wals ====
+
+Slave directory where archived WAL files are copied.
+
+==== partial_wals ====
+
+Slave directory where incomplete WAL files are stored.
+
+==== full_backup ====
+
+Slave directory where full backups are stored.
+
+==== config_backup ====
+
+Optional. Slave directory where configuration file backups are stored.
+
+==== loop_delay ====
+
+The frequency of syncdaemon updates. In record shipping mode only
+incremental updates are sent, so a smaller interval can be used.
+
+==== use_xlog_functions ====
+
+Use pg_xlog functions for record based shipping (available in 8.2 and
+up).
+
+==== compression ====
+
+If nonzero, a `-z` flag is added to rsync cmdline. It reduces network
+traffic at the cost of extra CPU time.
+
+==== keep_symlinks ====
+
+Keep symlinks for `pg_xlog` and `pg_log`.
+
+==== hot_standby ====
+
+If set to 1, walmgr3 setup will set `wal_level` to `hot_standby`
+(PostgreSQL 9.0 and newer).
+
+==== command_interval ====
+
+How often to run the periodic command script, in seconds. Only
+evaluated at log switch times.
+
+==== periodic_command ====
+
+Shell script to be executed at specified time interval. Can be used for
+synchronizing scripts, config files etc.
+
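+The interval check can be sketched as follows (the names are
+illustrative, not walmgr3's actual code):

```python
import time

def maybe_run_periodic(last_run, command_interval, run, now=None):
    """Run `run` only if command_interval seconds passed since last_run.

    Returns the new last_run timestamp.
    """
    if now is None:
        now = time.time()
    if now - last_run >= command_interval:
        run()          # e.g. invoke the periodic_command shell script
        return now
    return last_run
```
+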
+=== Sample master.ini ===
+
+ [walmgr]
+ job_name = wal-master
+ logfile = ~/log/%(job_name)s.log
+ pidfile = ~/pid/%(job_name)s.pid
+ use_skylog = 1
+
+ master_db = dbname=my_db
+ master_data = /var/lib/postgresql/9.1/main
+ master_config = /etc/postgresql/9.1/main/postgresql.conf
+ master_bin = /usr/lib/postgresql/9.1/bin
+
+ # set this only if you can afford database restarts during setup and stop.
+ #master_restart_cmd = /etc/init.d/postgresql-9.1 restart
+
+ slave = slave-host
+ slave_config = /var/lib/postgresql/conf/wal-slave.ini
+
+ walmgr_data = /var/lib/postgresql/walshipping
+ completed_wals = %(walmgr_data)s/logs.complete
+ partial_wals = %(walmgr_data)s/logs.partial
+ full_backup = %(walmgr_data)s/data.master
+ config_backup = %(walmgr_data)s/config.backup
+
+ # syncdaemon update frequency
+ loop_delay = 10.0
+ # use record based shipping available since 8.2
+ use_xlog_functions = 0
+
+ # pass -z to rsync, useful on low bandwidth links
+ compression = 0
+
+ # keep symlinks for pg_xlog and pg_log
+ keep_symlinks = 1
+
+ # tell walmgr to set wal_level to hot_standby during setup
+ #hot_standby = 1
+
+ # periodic sync
+ #command_interval = 600
+ #periodic_command = /var/lib/postgresql/walshipping/periodic.sh
+
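+The `%(name)s` references in the sample above are standard Python
+ConfigParser interpolation (walmgr3 is itself a Python script; the
+module name shown below is the Python 3 spelling):

```python
from configparser import ConfigParser

# Trimmed-down version of the sample above, inlined for illustration
SAMPLE = """\
[walmgr]
walmgr_data = /var/lib/postgresql/walshipping
completed_wals = %(walmgr_data)s/logs.complete
partial_wals = %(walmgr_data)s/logs.partial
"""

cp = ConfigParser()
cp.read_string(SAMPLE)          # cp.read("master.ini") for a real file
# %(walmgr_data)s expands to the value defined in the same section
completed = cp.get("walmgr", "completed_wals")
```
+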
+=== Slave settings ===
+
+==== slave_data ====
+
+PostgreSQL data directory for the Slave. This is where the restored
+backup is copied/moved.
+
+==== slave_bin ====
+
+Specifies the location of PostgreSQL binaries (pg_controldata, etc).
+Needed if they are not already in the PATH.
+
+==== slave_stop_cmd ====
+
+Script to stop PostgreSQL on Slave.
+
+==== slave_start_cmd ====
+
+Script to start PostgreSQL on Slave.
+
+==== slave_config_dir ====
+
+Directory for PostgreSQL configuration files. If specified, `walmgr3 restore`
+attempts to restore configuration files from `config_backup` directory.
+
+==== slave_pg_xlog ====
+
+The directory on the Slave where pg_xlog files are written. On a
+restore to the Slave, walmgr3 will create a symbolic link from
+data/pg_xlog to this location.
+
+==== completed_wals ====
+
+Directory where complete WAL files are stored. Also miscellaneous
+control files are created in this directory (BACKUPLOCK, STOP, PAUSE,
+etc.).
+
+==== partial_wals ====
+
+Directory where partial WAL files are stored.
+
+==== full_backup ====
+
+Directory where full backups are stored.
+
+==== config_backup ====
+
+Optional. Slave directory where configuration file backups are stored.
+
+==== backup_datadir ====
+
+Set `backup_datadir` to 'no' to prevent walmgr3 from making a backup of
+the data directory when restoring to the Slave. This defaults to 'yes'.
+
+==== keep_backups ====
+
+Number of backups to keep. All WAL files needed to bring the earliest
+backup up to date are also kept. The backups are rotated before a new
+backup is started, so at that point there is one less backup available.
+
+It probably doesn't make sense to specify `keep_backups` if periodic
+backups are not performed - the WAL files will pile up quickly.
+
+Backups are named data.master, data.master.0, data.master.1, etc.
+
+==== archive_command ====
+
+Script to execute before rotating away the oldest backup. If it fails,
+backups will not be rotated.
+
+==== primary_conninfo ====
+
+Primary database connection string for hot standby - enabling this will
+cause the Slave to be started in hot standby mode.
+
+=== Sample slave.ini ===
+
+ [walmgr]
+ job_name = wal-slave
+ logfile = ~/log/%(job_name)s.log
+ use_skylog = 1
+
+ slave_data = /var/lib/postgresql/9.1/main
+ slave_bin = /usr/lib/postgresql/9.1/bin
+ slave_stop_cmd = /etc/init.d/postgresql-9.1 stop
+ slave_start_cmd = /etc/init.d/postgresql-9.1 start
+ slave_config_dir = /etc/postgresql/9.1/main
+
+ # alternative pg_xlog directory for slave, symlinked to pg_xlog on restore
+ #slave_pg_xlog = /vol2/pg_xlog
+
+ walmgr_data = ~/walshipping
+ completed_wals = %(walmgr_data)s/logs.complete
+ partial_wals = %(walmgr_data)s/logs.partial
+ full_backup = %(walmgr_data)s/data.master
+ config_backup = %(walmgr_data)s/config.backup
+
+ backup_datadir = yes
+ keep_backups = 0
+ archive_command =
+
+ # primary database connect string for hot standby -- enabling
+ # this will cause the slave to be started in hot standby mode.
+ #primary_conninfo = host=master port=5432 user=postgres
+
+== EXIT STATUS ==
+
+ 0::
+ Successful program execution.
+
+== ENVIRONMENT ==
+
+PostgreSQL environment variables can be used.
+
+== QUICK START [[quick_start]] ==
+
+1. Set up passwordless ssh-key on Master and write configuration file
+
+ master$ walmgr3 --ssh-keygen --init-master --slave <slave_hostname>
+
+2. Set up passwordless ssh authentication from Master to Slave and write
+ configuration file on Slave
+
+ slave$ walmgr3 --init-slave --ssh-add-key=/tmp/id_rsa.pub
+
+3. Logging setup on Master and Slave
+
+ master$ cp skylog.ini ~postgres/
+ slave$ cp skylog.ini ~postgres/
+
+4. Start archival process and create a base backup
+
+ master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini setup
+ master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini backup
+
+ CAUTION: starting from PostgreSQL 8.3, archiving is enabled by
+ setting the archive_mode GUC to on. However, changing this parameter
+ requires the server to be restarted.
+
+5. Prepare postgresql.conf and pg_hba.conf on Slave and start replay
+
+ slave$ walmgr3 /var/lib/postgresql/conf/wal-slave.ini restore
+
+ For Debian-based distributions the standard configuration files are
+ located in the /etc/postgresql/x.x/main directory. If another scheme
+ is used, postgresql.conf and pg_hba.conf should be copied to the
+ slave's full_backup directory. Make sure to disable archive_command
+ in the slave config.
+
+ 'walmgr3 restore' moves data in place, creates recovery.conf and
+ starts postmaster in recovery mode.
+
+6. In-progress WAL segments can be backed up with the command:
+
+ master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini sync
+
+7. If you need to stop replay on the Slave and boot it into normal mode, run:
+
+ slave$ walmgr3 /var/lib/postgresql/conf/wal-slave.ini boot
+
+
+
diff --git a/python/walmgr.py b/python/walmgr.py
index d4796401..f990bdef 100755
--- a/python/walmgr.py
+++ b/python/walmgr.py
@@ -34,9 +34,7 @@ Internal commands:
xrelease Release backup lock (master)
xrotate Rotate backup sets, expire and archive oldest if necessary.
xpurgewals Remove WAL files not needed for backup (slave)
-
-Switches:
- -n no action, just print commands
+ xpartialsync Append data to WAL file (slave)
"""
import os, sys, re, signal, time, traceback