authorMarko Kreen2011-11-25 14:36:40 +0000
committerMarko Kreen2011-11-25 14:36:40 +0000
commit319bce547fc47208f70220ceb88b16c69a7359d8 (patch)
tree700b40271702d06edd7026f4affac0462eebd486
parent4e4ee32b2f38bdba7737e0754a557cd2b8677f67 (diff)
parenteb6387479a59f0b9923cb70f24d4b5f442fb1010 (diff)
Merge remote-tracking branch 'klando/for_marko'
Conflicts: README
-rw-r--r--README44
-rw-r--r--doc/Makefile4
-rw-r--r--doc/londiste3.txt357
-rw-r--r--doc/qadmin.txt272
-rw-r--r--doc/walmgr.txt304
-rw-r--r--doc/walmgr3.txt519
-rwxr-xr-xpython/walmgr.py4
7 files changed, 1171 insertions, 333 deletions
diff --git a/README b/README
index 84b7dc9a..074f2ba4 100644
--- a/README
+++ b/README
@@ -1,8 +1,9 @@
= SkyTools - tools for PostgreSQL =
-This is a package of tools in use in Skype for replication and
-failover. Also it includes a generic queuing mechanism PgQ and
-utility library for Python scripts.
+This is a package of tools in use in Skype for replication and failover.
+It also includes a generic queuing mechanism PgQ and utility library for
+Python scripts, as well as a script for setting up and managing WAL based
+standby servers.
== Overview ==
@@ -10,21 +11,40 @@ It contains following tools:
=== PgQ ===
-This is the queue mechanism we use. It consists of PL/pgsql, and C code
-in database, with Python framework on top of it. It is based on
-snapshot based event handling ideas from Slony-I, written for general
-usage.
+PgQ is a queuing system written in PL/pgsql, Python and C code. It is
+based on snapshot based event handling ideas from Slony-I, written for
+general usage.
-Features:
+PgQ provides an efficient, transactional queueing system with
+multi-node support (including work sharing and splitting, failover and
+switchover, for queues and for consumers).
+
+Rules:
- There can be several queues in database.
- There can be several producers than can insert into any queue.
-- There can be several consumers on one queue and all consumers see
- all events.
+- There can be several consumers on one queue.
+- There can be several subconsumers on a consumer.
+
+PgQ is split into 3 layers: Producers, Ticker and Consumers.
+
+*Producers* and *Consumers* respectively push events into a queue and
+read them. Producers only need to call PostgreSQL stored procedures
+(for example from a trigger on a table, or directly from the
+application). Consumers are frequently written in Python, the preferred
+language as it has a powerful Skytools framework, but this is not a
+limitation: any language able to call PostgreSQL stored procedures can
+be used.
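+
+As a minimal sketch (queue, table and payload names are hypothetical),
+a producer can push events either directly or through a table trigger:
+
+    SELECT pgq.create_queue('myqueue');
+    SELECT pgq.insert_event('myqueue', 'mytype', 'payload data');
+
+    CREATE TRIGGER mytable_logger
+        AFTER INSERT OR UPDATE OR DELETE ON mytable
+        FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('myqueue');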
+
+*Ticker* is a daemon which splits the queues into batches of events and
+handles the maintenance of the system. The Ticker is provided with
+Skytools.
Documentation:
-- http://skytools.projects.postgresql.org/skytools-3.0/
+- PgQ ticker daemon (pgqd) usage: link:pgqd.html[]
+- PgQ admin tool (qadm) usage: link:qadmin.html[]
+- PgQ SQL API overview: link:pgq-sql.html[]
+- PgQ SQL reference: http://skytools.projects.postgresql.org/skytools-3.0/pgq/
=== Londiste ===
@@ -50,6 +70,8 @@ Documentation:
This script will setup WAL archiving, does initial backup and
runtime WAL archive and restore.
+It can also be used for up-to-last-second partial file copying, so that
+less than a whole file is lost if the master database server is lost.
== Source tree contents ==
diff --git a/doc/Makefile b/doc/Makefile
index b7791d90..fe8d3d55 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -11,11 +11,11 @@ HTMLS = README.html INSTALL.html \
TODO.html pgq-sql.html pgq-nodupes.html \
$(SCRIPT_HTMLS) faq.html set.notes.html skytools3.html devnotes.html
-SCRIPT_TXTS = walmgr.txt qadmin.txt scriptmgr.txt skytools_upgrade.txt
+SCRIPT_TXTS = londiste3.txt walmgr3.txt qadmin.txt scriptmgr.txt skytools_upgrade.txt
SCRIPT_HTMLS = $(SCRIPT_TXTS:.txt=.html)
MAN5 =
-MAN1_SFX = scriptmgr.1 skytools_upgrade.1 walmgr.1
+MAN1_SFX = scriptmgr.1 skytools_upgrade.1 walmgr3.1 londiste3.1
MAN1 = qadmin.1 pgqd.1
FQHTML = $(addprefix html/, $(HTMLS))
diff --git a/doc/londiste3.txt b/doc/londiste3.txt
new file mode 100644
index 00000000..16865920
--- /dev/null
+++ b/doc/londiste3.txt
@@ -0,0 +1,357 @@
+
+= londiste3(1) =
+
+== NAME ==
+
+londiste3 - tool for managing trigger-based replication for PostgreSQL.
+
+== SYNOPSIS ==
+
+ londiste3 <config.ini> command [subcommand] [options]
+
+== DESCRIPTION ==
+
+Londiste allows you to set up and administer the replication, and is
+run as a daemon to handle the replication itself. (Londiste is just a
+complex PgQ Consumer.)
+
+See <<examples,EXAMPLES>> below to start your first trigger-based
+replication in a few steps.
+
+The 'command' is one of Node, Replication, Information or Internal
+commands listed below.
+
+Londiste introduces the notion of 'takeover': the action by which a
+local node takes over the work of another node. The other node can be a
+root node or a branch node, and it can be dead or alive when the action
+is run.
+
+Londiste also allows, among many other features, cascading replication,
+partial replication and custom handlers for replication.
+
+== GENERAL OPTIONS ==
+
+ -V, --version::
+ Print version info and exit.
+
+ -h, --help::
+ Show this help message and exit.
+
+ -q, --quiet::
+ Log only errors and warnings.
+
+ -v, --verbose::
+ Log verbosely.
+
+ -d, --daemon::
+ Run in daemon mode (go background).
+
+== SPECIFIC OPTIONS ==
+
+ --ini::
+ Display sample ini file.
+
+ --set="'param=val[,param=value]'"::
+ Override config setting.
+
+== DAEMON OPTIONS ==
+
+ -r, --reload::
+ Reload config (send SIGHUP).
+
+ -s, --stop::
+ Stop program safely (send SIGINT).
+
+ -k, --kill::
+ Kill program immediately (send SIGTERM).
+
+== REPLICATION EXTRA ARGUMENTS ==
+
+ --rewind::
+ Change queue position according to destination.
+
+ --reset::
+ Reset queue position on destination side.
+
+== NODE INITIALIZATION COMMANDS ==
+
+Initialization commands are the only ones that require a connection
+string argument. This is the connection string that Londiste3 will
+store; other nodes will use it to connect to the current node.
+
+=== create-root <node> <connstr> ===
+
+Initializes a Master node.
+
+The <node> is the name of the node; it must be unique.
+The <connstr> argument is the connection string to the database on the
+Master node.
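+
+A hypothetical invocation (config file name and connection string are
+examples only):
+
+    londiste3 root.ini create-root node1 "dbname=db1 host=master.example.com"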
+
+=== create-branch <node> <connstr> --provider=<public_connstr> ===
+
+Initializes a Slave node which can be used as a reference for other
+nodes.
+
+The <node> is the name of the node; it must be unique.
+The <connstr> argument is the connection string to the database on the
+current node and <public_connstr> is the connection string to the
+provider database (it can be a root node or a branch node).
+
+=== create-leaf <node> <connstr> --provider=<public_connstr> ===
+
+Initializes a Slave node which can not be used as a reference for other
+nodes.
+
+The <node> is the name of the node; it must be unique.
+The <connstr> argument is the connection string to the database on the
+current node and <public_connstr> is the connection string to the
+provider database (it can be a root node or a branch node).
+
+ --merge='qname'::
+ combined queue name
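+
+A hypothetical leaf setup (node names and connection strings are
+examples only):
+
+    londiste3 leaf.ini create-leaf node2 "dbname=db2 host=slave.example.com" \
+        --provider="dbname=db1 host=master.example.com"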
+
+== NODE ADMINISTRATION COMMANDS ==
+
+=== pause ===
+
+Pause the consumer: the replication of the events is stopped and can be
+resumed later.
+
+=== resume ===
+
+When the consumer has been paused, let it replay again.
+
+=== change-provider <tonode> ===
+
+Make <tonode> become the new provider for the current node.
+
+TODO: londiste.py need update (param change from --provider)
+
+=== takeover <fromnode> [--target=<tonode>] [--all] [--dead] ===
+
+This command allows you to achieve failover and switchover for any of
+your providers (root or branch nodes).
+
+ --target='tonode'::
+ Target node of the takeover.
+
+ --all::
+ In addition to taking over the work from the 'fromnode', make the
+ other nodes change their provider to the current node.
+
+ --dead::
+ Don't wait to take the new role and flag the 'fromnode' as dead.
+
+ --dead='deadnode'::
+ Assume node is dead.
+ TODO: why use this one?
+
+ --dead-root::
+ Old node was root.
+
+ --dead-branch::
+ Old node was branch.
+
+=== drop-node <node> ===
+
+Remove the node from the Londiste replication.
+
+Londiste triggers on the node are removed but Londiste or PgQ are not
+removed.
+
+=== tag-dead <node> ===
+
+Tag the node as dead, the command can be run from any node in the
+replication.
+
+=== tag-alive <node> ===
+
+Tag the node as alive, the command can be run from any node in the
+replication.
+
+== INFORMATION COMMANDS ==
+
+=== status ===
+
+Show status of the replication viewed by the current node.
+
+The output is a tree of the members of the replication with their lags,
+last tick, status and the number of tables in each state: ok/half/ignored
+(replicated, initial copy not finished, table not replicated locally).
+
+=== members ===
+
+Show members of the replication viewed by the current node.
+
+Outputs each node's name, status and location (connection string to
+the node).
+
+=== show-consumers [--node] ===
+
+TODO: command is not working
+
+== REPLICATION DAEMON COMMAND ==
+
+=== worker ===
+
+Replay events to the subscriber. Running the worker is needed to make
+the replication active, as it is what actually replays the events.
+
+== REPLICATION ADMINISTRATION COMMANDS ==
+
+=== add-table <table> [args] ===
+
+Add the table to the replication.
+
+See <<add_args,ADD ARGUMENTS>> below for the list of possible arguments.
+
+=== remove-table <table> ===
+
+Remove the table from the replication.
+
+=== add-seq <seq> [args] ===
+
+Add the sequence to the replication.
+
+See <<add_args,ADD ARGUMENTS>> below for the list of possible arguments.
+
+=== remove-seq <seq> ===
+
+Remove the sequence from the replication.
+
+=== tables ===
+
+Show all tables on provider.
+
+=== seqs ===
+
+Show all sequences on provider.
+
+=== missing ===
+
+List tables subscriber has not yet attached to.
+
+=== resync <table> ===
+
+Do a full copy of the table again.
+
+== ADD ARGUMENTS [[add_args]] ==
+
+ --all::
+ Include all possible tables.
+
+ --dest-table='table'::
+ Redirect changes to different table.
+
+ --force::
+ Ignore table differences.
+
+ --expect-sync::
+ No copy needed.
+
+ --skip-truncate::
+ Keep old data.
+
+ --create::
+ Create table/sequence if it does not exist, with a minimal schema.
+
+ --create-full::
+ Create table/sequence if it does not exist, with the full schema.
+
+ --trigger-flags='trigger_flags'::
+ Trigger creation flags, see below for details.
+
+ --trigger-arg='trigger_arg'::
+ Custom trigger arg (can be specified multiple times).
+
+ --no-triggers::
+ Don't put triggers on the table (makes sense on a leaf node).
+
+ --handler='handler'::
+ Custom handler for table.
+
+ --handler-arg='handler_arg'::
+ Argument to custom handler.
+
+ --copy-condition='copy_condition'::
+ Set WHERE expression for copy.
+
+ --merge-all::
+ Merge tables from all source queues.
+
+ --no-merge::
+ Don't merge tables from source queues.
+
+ --max-parallel-copy='max_parallel_copy'::
+ Max number of parallel copy processes.
+
+Trigger creation flags (default: AIUDL):
+
+- I - ON INSERT
+- U - ON UPDATE
+- D - ON DELETE
+- Q - use pgq.sqltriga() as trigger function
+- L - use pgq.logutriga() as trigger function
+- B - BEFORE
+- A - AFTER
+- S - SKIP
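+
+For example (table name and flag combination are illustrative), to
+replicate only inserts and updates using pgq.logutriga():
+
+    londiste3 leaf.ini add-table public.orders --trigger-flags=AIUL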
+
+== REPLICATION EXTRA COMMANDS ==
+
+=== check ===
+
+Compare table structure on both sides.
+
+=== fkeys ===
+
+Print out fkey drop/create commands.
+
+=== compare [<table>] ===
+
+Compare table contents on both sides.
+
+=== repair [<table>] [--force] ===
+
+Repair data on subscriber.
+
+ --force::
+ Ignore lag.
+
+=== execute [filepath] ===
+
+Execute SQL files on each node of the set.
+
+=== show-handlers ['handler'] ===
+
+Show info about all or a specific handler.
+
+== INTERNAL COMMAND ==
+
+=== copy ===
+
+Copy table logic.
+
+== EXIT STATUS ==
+
+ 0::
+ Successful program execution.
+
+== ENVIRONMENT ==
+
+PostgreSQL environment variables can be used.
+
+== EXAMPLES [[examples]] ==
+
+Londiste is provided with HowTos to help you take your first steps:
+
+- How to set up simple replication.
+- How to set up cascaded replication.
+- How to set up table partitioning (handlers).
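+
+A condensed session sketch for the simple-replication case (file names,
+node names and connection strings are hypothetical; a running ticker,
+pgqd, is assumed):
+
+    londiste3 root.ini create-root node1 "dbname=db1 host=master"
+    londiste3 leaf.ini create-leaf node2 "dbname=db2 host=slave" \
+        --provider="dbname=db1 host=master"
+    londiste3 root.ini worker -d
+    londiste3 leaf.ini worker -d
+    londiste3 leaf.ini add-table public.mytable --create
+    londiste3 leaf.ini status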
+
diff --git a/doc/qadmin.txt b/doc/qadmin.txt
index 0c0b1d04..e406cad2 100644
--- a/doc/qadmin.txt
+++ b/doc/qadmin.txt
@@ -7,33 +7,279 @@ qadmin - Easy to use admin console to examine and administer PgQ queues.
== SYNOPSIS ==
- qadmin.py [ options ... ]
+ qadmin [options]
== DESCRIPTION ==
-The goal is to be psql-like console for queue administration.
+This is a psql-like console for queue administration.
+
+The console offers a large number of commands to set up, control and
+manage the PgQ queueing system.
+
+It also offers a non-interactive mode to run one or more commands.
+
+qadmin keeps a history file in the user's home directory
+(`~/.qadmin_history`).
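+
+A hypothetical non-interactive invocation (host, database and queue
+names are illustrative):
+
+    qadmin -h localhost -d mydb -Q myqueue -c "show queue *;"
+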
== GENERAL OPTIONS ==
-Initial connection options (for libpq):
+ --help::
+ Help screen.
+
+ --version::
+ Print version.
+
+== CONNECTION OPTIONS ==
+
+ -h host::
+ Specify host to connect to (default: localhost via unix socket).
+
+ -p port::
+ Specify port.
+
+ -U user::
+ Specify user name.
+
+ -d dbname::
+ Database name.
+
+ -Q queuename::
+ Queue name; it is used as the `default queue` if provided.
+
+== SPECIFIC OPTIONS ==
+
+ -c 'cmd_string'::
+ Execute console command.
+
+ -f 'execfile'::
+ Execute file containing console commands.
+
+== CONNECTION COMMANDS ==
+
+qadmin allows connecting to other databases or queues from within the
+console itself.
+
+=== connect <connstring> [queue=<qname>]; ===
+
+Connect the console with the specified connection string, optional
+parameter to set the default queue on connection.
+
+=== connect [queue=<qname>] [node=<node>]; ===
+
+Connect the console to the specified queue and/or node.
+
+== MANAGEMENT COMMANDS ==
+
+=== install pgq | londiste; ===
+
+Install PgQ or Londiste to the connected database.
+
+=== create queue <qname>; ===
+
+Create the specified queue.
+
+=== alter queue <qname | *> set param =<foo=1>,<bar=2>; ===
+
+Set one or more parameters on one or all queues at once.
+
+=== drop queue <qname>; ===
+
+Drop the named queue.
+
+=== register consumer <consumer> [on <qname> | at <tick_id> | copy <consumer> ]; ===
+
+Register a consumer on a queue, or at a specified tick or based on
+another consumer.
+
+=== unregister consumer <consumer | *> [from <qname>]; ===
+
+Unregister one or all consumers. If the console is not connected to a
+queue, the queue name must be provided.
+
+=== register subconsumer <subconsumer> for <consumer> [on <qname>]; ===
+
+Register a subconsumer to a consumer. If the console is not connected
+to a queue, the queue name must be provided.
+
+=== unregister subconsumer <subconsumer | *> for <consumer> [from <qname>] [close [batch]]; ===
+
+Unregister one or all subconsumers from a consumer. If the console is
+not connected to a queue, the queue name must be provided. The current
+batch can be aborted if the `close batch` subcommand is provided.
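+
+A hypothetical console session (queue and consumer names are examples;
+the prompt is shown as `db>` for illustration only):
+
+    db> create queue testq;
+    db> register consumer myconsumer on testq;
+    db> show consumer myconsumer on testq;
+    db> unregister consumer myconsumer from testq;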
+
+== SHOW COMMANDS ==
+
+=== show help; ===
+
+Show all the console commands.
+
+=== show queue [ <qname | *> ]; ===
+
+Show details of one or all queues.
+
+=== show table <tbl>; ===
+
+Show DDL for the specified table.
+
+=== show sequence <seq>; ===
+
+Show DDL for the specified sequence.
+
+=== show consumer [ <consumer | *> [on <qname>] ]; ===
+
+Show details of one or all consumers on one or all queues.
+
+=== show batch <batch_id>; ===
+
+Show details of the batch; the default queue must be set (see `connect queue`).
+
+=== show batch <consumer>; ===
+
+Show details of the current batch for the specified consumer; the
+default queue must be set (see `connect queue`).
+
+== LONDISTE COMMANDS ==
+
+All these commands are applied on the node the console is connected to.
+
+=== londiste add table <tbl> [with ... ] ===
+
+ with no_triggers::
+ Skip trigger creation.
+
+ with skip_truncate::
+ Does not truncate the table on the destination.
+
+ with expect_sync::
+ Set table state to 'ok'.
+
+ with tgflags='UIDBAQL'::
+ Trigger creation flags, see below for details.
+
+ with backup::
+ Put urlencoded contents of old row to `ev_extra2`.
+
+ with skip::
+ Create skip trigger. Same as S flag.
+
+ with when='expr'::
+ If 'expr' returns false, don't insert event.
+
+ with ev_XX='EXPR'::
+ Overwrite default ev_* columns (see below).
+
+Trigger creation flags (default: AIUDL):
+
+- I - ON INSERT
+- U - ON UPDATE
+- D - ON DELETE
+- Q - use pgq.sqltriga() as trigger function
+- L - use pgq.logutriga() as trigger function
+- B - BEFORE
+- A - AFTER
+- S - SKIP
+
+Queue event fields:
+
+- ev_type - I/U/D
+- ev_data - partial SQL statement
+- ev_extra1 - table name
+- ev_extra2 - optional urlencoded backup
+
+
+=== londiste add sequence <seq>; ===
+
+Add the specified sequence to Londiste replication.
+
+=== londiste remove table <tbl> [,tbl]; ===
+
+Remove the specified table(s) from the Londiste replication.
+
+=== londiste remove sequence <seq> [,seq]; ===
+
+Remove the specified sequence(s) from the Londiste replication.
+
+=== londiste tables; ===
+
+List registered tables and information about them:
+
+- table_name - fully-qualified table name
+- local - whether events need to be applied to the local table
+- merge_state - show phase of initial copy
+- custom_snapshot - remote snapshot of COPY transaction
+- table_attrs - urlencoded dict of table attributes
+- dropped_ddl - partition combining: temp place to put DDL
+- copy_role - partition combining: how to handle copy
+- copy_pos - position in parallel copy working order
+
+==== copy_role = lead ====
+
+On copy start, drop indexes and store in dropped_ddl.
+
+On copy finish change state to catching-up, then wait until copy_role
+turns to NULL.
+
+Catching-up: if dropped_ddl is not NULL, restore it.
+
+==== copy_role = wait-copy ====
+
+On copy start wait, until role changes (to wait-replay).
+
+==== copy_role = wait-replay ====
+
+On copy finish, tag as 'catching-up'.
+
+Wait until copy_role is NULL, then proceed.
+
+=== londiste seqs; ===
+
+List registered sequences on this node and their last value.
+
+=== londiste missing; ===
+
+On Master, list tables not registered on set.
+
+On Slave, list tables on set but not registered locally.
+
+== OTHER COMMANDS ==
+
+=== exit; ===
+
+Quit program.
+
+=== ^D ===
+
+Quit program.
+
+=== ^C ===
- -h host:: specify host to connect to (default: localhost via unix socket)
+Clear current buffer.
- -p port:: specify port
+== EXIT STATUS ==
- -U user:: specify user name
+ 0::
+ Successful program execution.
- -d dbname:: database name
+== ENVIRONMENT ==
-Options for qadmin:
+PostgreSQL environment variables can be used.
- -Q queuename:: queue name
+== NOT IMPLEMENTED COMMANDS ==
- -c cmd_string:: execute command
+TODO: is it up-to-date?
- -f execfile:: execute file containing commands
+- create <root | branch | leaf> node <node> location <loc> [on <qname>];
- --help:: help screen
+- alter node <name> provider <new>;
+- alter node <name> takeover <oldnow> with all;
+- alter node <name> rename <new>;
+- alter node <name> [location=<loc>]
- --version:: print version
+- drop node <name> [on <qname>];
+- takeover <oldnode>;
+- show node [ <node | *> [on <qname>] ];
+- show cascade;
+- show_queue_stats <q>;
+- status
diff --git a/doc/walmgr.txt b/doc/walmgr.txt
deleted file mode 100644
index fd3605cc..00000000
--- a/doc/walmgr.txt
+++ /dev/null
@@ -1,304 +0,0 @@
-
-= walmgr(1) =
-
-== NAME ==
-
-walmgr - tools for managing WAL-based replication for PostgreSQL.
-
-== SYNOPSIS ==
-
- walmgr.py <config.ini> command
-
-== DESCRIPTION ==
-
-It is both admin and worker script for PostgreSQL PITR replication.
-
-== QUICK START ==
-
-1. Set up passwordless ssh authentication from master to slave
-
- master$ test -f ~/.ssh/id_dsa.pub || ssh-keygen -t dsa
- master$ cat ~/.ssh/id_dsa.pub | ssh slave cat \>\> .ssh/authorized_keys
-
-2. Configure paths
-
- master$ edit master.ini
- slave$ edit slave.ini
-
- Make sure that walmgr.py executable has same pathname on slave and master.
-
-3. Start archival process and create a base backup
-
- master$ ./walmgr.py master.ini setup
- master$ ./walmgr.py master.ini backup
-
- Note: starting from PostgreSQL 8.3 the archiving is enabled by setting
- archive_mode GUC to on. However changing this parameter requires the
- server to be restarted.
-
-4. Prepare postgresql.conf and pg_hba.conf on slave and start replay
-
- master$ scp $PGDATA/*.conf slave:
- slave$ ./walmgr.py slave.ini restore
-
- For debian based distributions the standard configuration files are located
- in /etc/postgresql/x.x/main directory. If another scheme is used the postgresql.conf
- and pg_hba.conf should be copied to slave full_backup directory. Make sure to
- disable archive_command in slave config.
-
- 'walmgr.py restore' moves data in place, creates recovery.conf and starts postmaster
- in recovery mode.
-
-5. In-progress WAL segments can be backup by command:
-
- master$ ./walmgr.py master.ini sync
-
-6. If need to stop replay on slave and boot into normal mode, do:
-
- slave$ ./walmgr.py slave.ini boot
-
-== GENERAL OPTIONS ==
-
-Common options to all walmgr.py commands.
-
- -h, --help::
- show this help message and exit
-
- -q, --quiet::
- make program silent
-
- -v, --verbose::
- make program more verbose
-
- -n, --not-really::
- Show what would be done without actually doing anything.
-
-== MASTER COMMANDS ==
-
-=== setup ===
-Sets up postgres archiving, creates necessary directory structures on slave.
-
-=== sync ===
-Synchronizes in-progress WAL files to slave.
-
-=== syncdaemon ===
-Start WAL synchronization in daemon mode. This will start periodically synching
-the in-progress WAL files to slave.
-
-The following parameters are used to drive the syncdaemon:
-loop_delay - how long to sleep between the synchs.
-use_xlog_functions - use record based shipping to synchronize in-progress WAL segments.
-
-=== stop ===
-Deconfigures postgres archiving.
-
-=== periodic ===
-Runs periodic command, if configured. This enables to execute arbitrary commands
-on interval, useful for synchronizing scripts, config files, crontabs etc.
-
-=== listbackups ===
-List backup sets available on slave node.
-
-=== backup ===
-Creates a new base backup from master database. Will purge expired backups and WAL
-files on slave if `keep_backups` is specified. During a backup a lock file is
-created in slave `completed_wals` directory. This is to prevent simultaneous
-backups and resulting corruption. If running backup is terminated, the BACKUPLOCK
-file may have to be removed manually.
-
-=== restore <set> <dst> ===
-EXPERIMENTAL. Attempts to restore the backup from slave to master.
-
-== SLAVE COMMANDS ==
-
-=== boot ===
-Stop log playback and bring the database up.
-
-=== pause ===
-Pauses WAL playback.
-
-=== continue ===
-Continues previously paused WAL playback.
-
-=== listbackups ===
-Lists available backups.
-
-=== backup ===
-EXPERIMENTAL. Creates a new backup from slave data. Log replay is paused,
-slave data directory is backed up to `full_backup` directory and log
-replay resumed. Backups are rotated as needed. The idea is to move the
-backup load away from production node. Usable from postgres 8.2 and up.
-
-=== restore [src][dst] ===
-Restores the specified backup set to target directory. If specified without
-arguments the latest backup is *moved* to slave data directory (doesn't obey
-retention rules). If src backup is specified the backup is copied (instead of moving).
-Alternative destination directory can be specified with `dst`.
-
-== CONFIGURATION ==
-
-=== Common settings ===
-
-==== job_name ====
-Optional. Indentifies this script, used in logging. Keep unique if
-using central logging.
-
-==== logfile ====
-Where to log.
-
-==== use_skylog ====
-Optional. If nonzero, skylog.ini is used for log configuration.
-
-=== Master settings ===
-
-==== pidfile ====
-Pid file location for syncdaemon mode (if running with -d). Otherwise
-not required.
-
-==== master_db ====
-Database to connect to for pg_start_backup() etc. It is not a
-good idea to use `dbname=template` if running syncdaemon in
-record shipping mode.
-
-==== master_data ====
-Master data directory location.
-
-==== master_config ====
-Master postgresql.conf file location. This is where
-`archive_command` gets updated.
-
-==== master_restart_cmd ====
-The command to restart master database, this used after changing
-`archive_mode` parameter. Leave unset, if you cannot afford to
-restart the database at setup/stop.
-
-==== slave ====
-Slave host and base directory.
-
-==== slave_config ====
-Configuration file location for the slave walmgr.
-
-==== completed_wals ====
-Slave directory where archived WAL files are copied.
-
-==== partial_wals ====
-Slave directory where incomplete WAL files are stored.
-
-==== full_backup ====
-Slave directory where full backups are stored.
-
-==== config_backup ====
-Slave directory where configuration file backups are stored. Optional.
-
-==== loop_delay ====
-The frequency of syncdaemon updates. In record shipping mode only
-incremental updates are sent, so smaller interval can be used.
-
-==== use_xlog_functions ====
-Use pg_xlog functions for record based shipping (available in 8.2 and up).
-
-==== compression ====
-If nonzero, a -z flag is added to rsync cmdline. Will reduce network
-traffic at the cost of extra CPU time.
-
-==== periodic_command ====
-Shell script to be executed at specified time interval. Can be used for
-synchronizing scripts, config files etc.
-
-==== command_interval ====
-How ofter to run periodic command script. In seconds, and only evaluated
-at log switch times.
-
-==== hot_standby ===
-Boolean. If set to true, walmgr setup will set wal_level to hot_standby (9.0 and newer).
-
-=== Sample master.ini ===
-
- [wal-master]
- logfile = master.log
- pidfile = master.pid
- master_db = dbname=template1
- master_data = /var/lib/postgresql/8.0/main
- master_config = /etc/postgresql/8.0/main/postgresql.conf
- slave = slave:/var/lib/postgresql/walshipping
- completed_wals = %(slave)s/logs.complete
- partial_wals = %(slave)s/logs.partial
- full_backup = %(slave)s/data.master
- loop_delay = 10.0
- use_xlog_functions = 1
- compression = 1
-
-=== Slave settings ===
-
-==== slave_data ====
-Postgres data directory for the slave. This is where the restored
-backup is copied/moved.
-
-==== slave_config_dir ====
-Directory for postgres configuration files. If specified, "walmgr restore"
-attempts to restore configuration files from config_backup directory.
-
-==== slave_stop_cmd ====
-Script to stop postmaster on slave.
-
-==== slave_start_cmd ====
-Script to start postmaster on slave.
-
-==== slave ====
-Base directory for slave files (logs.complete, data.master etc)
-
-==== slave_bin ====
-Specifies the location of postgres binaries (pg_controldata, etc). Needed if
-they are not already in the PATH.
-
-==== completed_wals ====
-Directory where complete WAL files are stored. Also miscellaneous control files
-are created in this directory (BACKUPLOCK, STOP, PAUSE, etc.).
-
-==== partial_wals ====
-Directory where partial WAL files are stored.
-
-==== full_backup ====
-Directory where full backups are stored.
-
-==== keep_backups ====
-Number of backups to keep. Also all WAL files needed to bring earliest
-
-backup up to date are kept. The backups are rotated before new backup
-is started, so at one point there is actually one less backup available.
-
-It probably doesn't make sense to specify `keep_backups` if periodic
-backups are not performed - the WAL files will pile up quickly.
-
-Backups will be named data.master, data.master.0, data.master.1 etc.
-
-==== archive_command ====
-Script to execute before rotating away the oldest backup. If it fails
-backups will not be rotated.
-
-
-==== slave_pg_xlog ====
-Set slave_pg_xlog to the directory on the slave where pg_xlog files get
-written to. On a restore to the slave walmgr.py will
-create a symbolic link from data/pg_xlog to this location.
-
-
-==== backup_datadir ====
-Set backup_datadir to 'no' to prevent walmgr.py from making a backup
-of the data directory when restoring to the slave. This defaults to
-'yes'
-
-
-=== Sample slave.ini ===
-
- [wal-slave]
- logfile = slave.log
- slave_data = /var/lib/postgresql/8.0/main
- slave_stop_cmd = /etc/init.d/postgresql-8.0 stop
- slave_start_cmd = /etc/init.d/postgresql-8.0 start
- slave = /var/lib/postgresql/walshipping
- completed_wals = %(slave)s/logs.complete
- partial_wals = %(slave)s/logs.partial
- full_backup = %(slave)s/data.master
- keep_backups = 5
- backup_datadir = yes
diff --git a/doc/walmgr3.txt b/doc/walmgr3.txt
new file mode 100644
index 00000000..5961be07
--- /dev/null
+++ b/doc/walmgr3.txt
@@ -0,0 +1,519 @@
+
+= walmgr3(1) =
+
+== NAME ==
+
+walmgr3 - tool for managing WAL-based replication for PostgreSQL.
+
+== SYNOPSIS ==
+
+ walmgr3 <config.ini> command [--not-really] [options]
+
+== DESCRIPTION ==
+
+Walmgr3 is a tool for managing PostgreSQL replication with PITR (also
+known as Log Shipping).
+
+This script allows you to set up and administer the replication; it is
+also used by PostgreSQL to archive and restore the WAL files.
+
+See <<quick_start,QUICK START>> below to start your first log shipping
+in a few steps.
+
+The 'command' is one of the Master, Slave, Common or Internal commands
+listed below.
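+
+A condensed sketch of the typical workflow (the commands are described
+below; config file names are examples only):
+
+    master$ walmgr3 master.ini setup
+    master$ walmgr3 master.ini backup
+    slave$  walmgr3 slave.ini restore
+    slave$  walmgr3 slave.ini boot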
+
+== GENERAL OPTIONS ==
+
+ -V, --version::
+ Print version info and exit.
+
+ -h, --help::
+ Show this help message and exit.
+
+ -q, --quiet::
+ Log only errors and warnings.
+
+ -v, --verbose::
+ Log verbosely.
+
+ -d, --daemon::
+ Run in daemon mode (go background).
+
+ -n, --not-really::
+ Show what would be done without actually doing anything.
+
+== SPECIFIC OPTIONS ==
+
+ --ini::
+ Display sample ini file.
+
+ --set="'param=val[,param=value]'"::
+ Override config setting.
+
+ --init-master::
+ Initialize Master walmgr3 configuration.
+
+ --init-slave::
+ Initialize Slave walmgr3 configuration.
+
+ --config-dir='filepath'::
+ Configuration file location for `--init-X commands`.
+
+ --slave='hostname'::
+ Slave host name.
+
+ --pgdata='path'::
+ PostgreSQL data directory.
+
+ --ssh-keygen::
+ Generate a SSH key pair if needed (used in Master).
+
+ --ssh-add-key='keyfile.pub'::
+ Add the public key file to authorized_hosts file (used in Slave).
+
+ --ssh-remove-key='ssh_key'::
+ Remove Master ssh key from authorized_hosts file (used in Slave).
+
+ --primary-conninfo='connection_string'::
+ Provide the connection string to the streaming replication Master
+ (used in Slave).
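+
+As an illustration (host names, key file and paths are hypothetical),
+the configuration files could be generated with these options:
+
+    master$ walmgr3 --ssh-keygen --init-master --slave=slavehost \
+        --pgdata=/var/lib/postgresql/9.1/main
+    slave$  walmgr3 --init-slave --ssh-add-key=/tmp/id_dsa.pub \
+        --pgdata=/var/lib/postgresql/9.1/main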
+
+== DAEMON OPTIONS ==
+
+ -r, --reload::
+ Reload config (send SIGHUP).
+
+ -s, --stop::
+ Stop program safely (send SIGINT).
+
+ -k, --kill::
+ Kill program immediately (send SIGTERM).
+
+== MASTER COMMANDS ==
+
+=== setup ===
+
+Sets up PostgreSQL for WAL archiving, creates necessary directory
+structures on Slave.
+
+=== sync ===
+
+Copies in-progress WAL files to Slave.
+
+=== syncdaemon ===
+
+Start WAL synchronization in daemon mode. This will start periodically
+synching the in-progress WAL files to Slave.
+
+The following configuration parameters are used to drive the syncdaemon:
+
+ - *loop_delay* - how long to sleep between the synchs.
+
+ - *use_xlog_functions* - use record based shipping to synchronize
+ in-progress WAL segments.
+
+=== stop ===
+
+Stop archiving and de-configure PostgreSQL archiving.
+
+=== periodic ===
+
+Runs the periodic command if configured. This makes it possible to
+execute arbitrary commands at an interval, useful for synchronizing
+scripts, config files, crontabs etc.
+
+== SLAVE COMMANDS ==
+
+=== boot ===
+
+Stop WAL playback and bring the database up so it can accept queries.
+
+=== pause ===
+
+Pauses WAL playback.
+
+=== continue ===
+
+Continues previously paused WAL playback.
+
+== COMMON COMMANDS ==
+
+=== listbackups ===
+
+Lists available backups on Slave node.
+
+=== backup ===
+
+Creates a new base backup from Master database. Will purge expired
+backups and WAL files on Slave if `keep_backups` is not specified.
+During a backup a lock file is created in Slave `completed_wals`
+directory. This is to prevent simultaneous backups and resulting
+corruption. If a running backup is terminated, the BACKUPLOCK file may
+have to be removed manually.
+
+EXPERIMENTAL: If run on Slave, creates backup from in-recovery Slave
+data. WAL playback is paused, Slave data directory is backed up to
+`full_backup` directory and WAL playback is resumed. Backups are rotated
+as needed. The idea is to move the backup load away from production node.
+Usable from PostgreSQL 8.2 and up.
+
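+The lock protocol described above can be sketched as follows; the
+function names are assumptions for illustration, only the BACKUPLOCK
+file name comes from the text:

```python
import errno
import os

def acquire_backup_lock(completed_wals):
    """Atomically create BACKUPLOCK; return False if a backup is running."""
    path = os.path.join(completed_wals, "BACKUPLOCK")
    try:
        # O_EXCL makes creation fail if the lock file already exists
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False               # another backup holds the lock
        raise
    os.close(fd)
    return True

def release_backup_lock(completed_wals):
    os.remove(os.path.join(completed_wals, "BACKUPLOCK"))
```
+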
+=== restore [src [dst]] ===
+
+Restores the specified backup set to the target directory. If run
+without arguments, the latest backup is *moved* to the Slave data
+directory (this does not obey retention rules). If a `src` backup is
+specified, the backup is copied instead of moved. An alternative
+destination directory can be specified with `dst`.
+
+=== cleanup ===
+
+Cleans up any walmgr3 files after stop.
+
+== INTERNAL COMMANDS ==
+
+=== xarchive <srcpath> <srcname> ===
+
+On Master, archive one WAL file.
+
+=== xrestore <srcname> <dstpath> [last restartpoint wal] ===
+
+On Slave, restore one WAL file.
+
+=== xlock ===
+
+On Master, create lock file to deny other concurrent backups.
+
+=== xrelease ===
+
+On Slave, remove the backup lock file, allowing another backup to run.
+
+=== xrotate ===
+
+Rotate backups by increasing backup directory suffixes. Note that since
+we also have to make room for next backup, we actually have
+*keep_backups - 1* backups available after this.
+
+Unneeded WAL files are not removed here; that is handled by the
+`xpurgewals` command instead.
+
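+The rotation scheme can be illustrated with a small sketch (not
+walmgr3's actual code): `data.master` becomes `data.master.0`, each
+suffix shifts up by one, and directories past the retention limit are
+dropped:

```python
import os
import shutil

def rotate_backups(base, keep_backups):
    """Shift backup directory suffixes up: base -> base.0 -> base.1 ..."""
    if keep_backups <= 1:
        # no room for rotated copies at all
        if os.path.isdir(base):
            shutil.rmtree(base)
        return
    # drop the oldest backup so the shift below stays within the limit
    oldest = "%s.%d" % (base, keep_backups - 2)
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)
    # shift remaining suffixes up by one, highest first
    for n in range(keep_backups - 3, -1, -1):
        src = "%s.%d" % (base, n)
        if os.path.isdir(src):
            os.rename(src, "%s.%d" % (base, n + 1))
    # the current backup becomes .0, leaving base free for the next one
    if os.path.isdir(base):
        os.rename(base, base + ".0")
```
+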
+=== xpurgewals ===
+
+On Slave, remove WAL files not needed for recovery.
+
+=== xpartialsync <filename> <offset> <bytes> ===
+
+Read 'bytes' worth of data from stdin and append it to the partial WAL
+file starting at 'offset'. On error it is assumed that the master
+restarts from zero.
+
+The resulting file is always padded to XLOG_SEGMENT_SIZE bytes to
+simplify recovery.
+
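+A hypothetical sketch of the write-and-pad behaviour; the 16 MB segment
+size is the classic default, and the real value depends on how
+PostgreSQL was built:

```python
import os

XLOG_SEGMENT_SIZE = 16 * 1024 * 1024   # classic default; build-dependent

def partial_sync(path, offset, data, segment_size=XLOG_SEGMENT_SIZE):
    """Write `data` at `offset`, then zero-pad the file to segment_size."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(offset)
        f.write(data)
        f.truncate(offset + len(data))   # drop any stale tail
        # pad with zeros so recovery always sees a full-size segment
        f.write(b"\0" * (segment_size - (offset + len(data))))
```
+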
+== CONFIGURATION ==
+
+=== Common settings ===
+
+==== job_name ====
+
+Optional. Identifies this script, used in logging. Keep it unique if
+using central logging.
+
+==== logfile ====
+
+Where to log.
+
+==== use_skylog ====
+
+Optional. If nonzero, 'skylog.ini' is used for log configuration.
+
+=== Master settings ===
+
+==== pidfile ====
+
+Pid file location for syncdaemon mode (if running with -d). Otherwise
+not required.
+
+==== master_db ====
+
+Database to connect to for pg_start_backup(), etc. It is not a
+good idea to use `dbname=template` if running syncdaemon in record
+shipping mode.
+
+==== master_data ====
+
+Master data directory location.
+
+==== master_config ====
+
+Master postgresql.conf file location. This is where `archive_command`
+gets updated.
+
+==== master_restart_cmd ====
+
+The command to restart the Master database; this is used after changing
+the `archive_mode` parameter. Leave unset if you cannot afford to
+restart the database at setup/stop.
+
+==== slave ====
+
+Slave host and base directory.
+
+==== slave_config ====
+
+Configuration file location for the Slave walmgr3.
+
+==== completed_wals ====
+
+Slave directory where archived WAL files are copied.
+
+==== partial_wals ====
+
+Slave directory where incomplete WAL files are stored.
+
+==== full_backup ====
+
+Slave directory where full backups are stored.
+
+==== config_backup ====
+
+Optional. Slave directory where configuration file backups are stored.
+
+==== loop_delay ====
+
+The frequency of syncdaemon updates. In record shipping mode only
+incremental updates are sent, so a smaller interval can be used.
+
+==== use_xlog_functions ====
+
+Use pg_xlog functions for record based shipping (available in 8.2 and
+up).
+
+==== compression ====
+
+If nonzero, a `-z` flag is added to rsync cmdline. It reduces network
+traffic at the cost of extra CPU time.
+
+==== keep_symlinks ====
+
+Keep symlinks for `pg_xlog` and `pg_log`.
+
+==== hot_standby ====
+
+If set to 1, walmgr3 setup will set `wal_level` to `hot_standby`
+(PostgreSQL 9.0 and newer).
+
+==== command_interval ====
+
+How often to run the periodic command script, in seconds. Only
+evaluated at log switch times.
+
+==== periodic_command ====
+
+Shell script to be executed at specified time interval. Can be used for
+synchronizing scripts, config files etc.
+
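+The interval check can be sketched as follows (the names are
+illustrative, not walmgr3's actual code):

```python
import time

def maybe_run_periodic(last_run, command_interval, run, now=None):
    """Run `run` only if command_interval seconds passed since last_run.

    Returns the new last_run timestamp.
    """
    if now is None:
        now = time.time()
    if now - last_run >= command_interval:
        run()          # e.g. invoke the periodic_command shell script
        return now
    return last_run
```
+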
+=== Sample master.ini ===
+
+ [walmgr]
+ job_name = wal-master
+ logfile = ~/log/%(job_name)s.log
+ pidfile = ~/pid/%(job_name)s.pid
+ use_skylog = 1
+
+ master_db = dbname=my_db
+ master_data = /var/lib/postgresql/9.1/main
+ master_config = /etc/postgresql/9.1/main/postgresql.conf
+ master_bin = /usr/lib/postgresql/9.1/bin
+
+ # set this only if you can afford database restarts during setup and stop.
+ #master_restart_cmd = /etc/init.d/postgresql-9.1 restart
+
+ slave = slave-host
+ slave_config = /var/lib/postgresql/conf/wal-slave.ini
+
+ walmgr_data = /var/lib/postgresql/walshipping
+ completed_wals = %(walmgr_data)s/logs.complete
+ partial_wals = %(walmgr_data)s/logs.partial
+ full_backup = %(walmgr_data)s/data.master
+ config_backup = %(walmgr_data)s/config.backup
+
+ # syncdaemon update frequency
+ loop_delay = 10.0
+ # use record based shipping available since 8.2
+ use_xlog_functions = 0
+
+ # pass -z to rsync, useful on low bandwidth links
+ compression = 0
+
+ # keep symlinks for pg_xlog and pg_log
+ keep_symlinks = 1
+
+ # tell walmgr to set wal_level to hot_standby during setup
+ #hot_standby = 1
+
+ # periodic sync
+ #command_interval = 600
+ #periodic_command = /var/lib/postgresql/walshipping/periodic.sh
+
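+The `%(name)s` references in the sample above are standard Python
+ConfigParser interpolation (walmgr3 is itself a Python script; the
+module name shown below is the Python 3 spelling):

```python
from configparser import ConfigParser

# Trimmed-down version of the sample above, inlined for illustration
SAMPLE = """\
[walmgr]
walmgr_data = /var/lib/postgresql/walshipping
completed_wals = %(walmgr_data)s/logs.complete
partial_wals = %(walmgr_data)s/logs.partial
"""

cp = ConfigParser()
cp.read_string(SAMPLE)          # cp.read("master.ini") for a real file
# %(walmgr_data)s expands to the value defined in the same section
completed = cp.get("walmgr", "completed_wals")
```
+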
+=== Slave settings ===
+
+==== slave_data ====
+
+PostgreSQL data directory for the Slave. This is where the restored
+backup is copied/moved.
+
+==== slave_bin ====
+
+Specifies the location of PostgreSQL binaries (pg_controldata, etc).
+Needed if they are not already in the PATH.
+
+==== slave_stop_cmd ====
+
+Script to stop PostgreSQL on Slave.
+
+==== slave_start_cmd ====
+
+Script to start PostgreSQL on Slave.
+
+==== slave_config_dir ====
+
+Directory for PostgreSQL configuration files. If specified, `walmgr3 restore`
+attempts to restore configuration files from `config_backup` directory.
+
+==== slave_pg_xlog ====
+
+The directory on the Slave where pg_xlog files are written. On a
+restore to the Slave, walmgr3 will create a symbolic link from
+data/pg_xlog to this location.
+
+==== completed_wals ====
+
+Directory where complete WAL files are stored. Also miscellaneous
+control files are created in this directory (BACKUPLOCK, STOP, PAUSE,
+etc.).
+
+==== partial_wals ====
+
+Directory where partial WAL files are stored.
+
+==== full_backup ====
+
+Directory where full backups are stored.
+
+==== config_backup ====
+
+Optional. Slave directory where configuration file backups are stored.
+
+==== backup_datadir ====
+
+Set `backup_datadir` to 'no' to prevent walmgr3 from making a backup of
+the data directory when restoring to the Slave. This defaults to 'yes'.
+
+==== keep_backups ====
+
+Number of backups to keep. All WAL files needed to bring the earliest
+backup up to date are also kept. The backups are rotated before a new
+backup is started, so at that point there is one less backup available.
+
+It probably doesn't make sense to specify `keep_backups` if periodic
+backups are not performed - the WAL files will pile up quickly.
+
+Backups are named data.master, data.master.0, data.master.1, etc.
+
+==== archive_command ====
+
+Script to execute before rotating away the oldest backup. If it fails,
+backups will not be rotated.
+
+==== primary_conninfo ====
+
+Primary database connection string for hot standby - enabling this will
+cause the Slave to be started in hot standby mode.
+
+=== Sample slave.ini ===
+
+ [walmgr]
+ job_name = wal-slave
+ logfile = ~/log/%(job_name)s.log
+ use_skylog = 1
+
+ slave_data = /var/lib/postgresql/9.1/main
+ slave_bin = /usr/lib/postgresql/9.1/bin
+ slave_stop_cmd = /etc/init.d/postgresql-9.1 stop
+ slave_start_cmd = /etc/init.d/postgresql-9.1 start
+ slave_config_dir = /etc/postgresql/9.1/main
+
+ # alternative pg_xlog directory for slave, symlinked to pg_xlog on restore
+ #slave_pg_xlog = /vol2/pg_xlog
+
+ walmgr_data = ~/walshipping
+ completed_wals = %(walmgr_data)s/logs.complete
+ partial_wals = %(walmgr_data)s/logs.partial
+ full_backup = %(walmgr_data)s/data.master
+ config_backup = %(walmgr_data)s/config.backup
+
+ backup_datadir = yes
+ keep_backups = 0
+ archive_command =
+
+ # primary database connect string for hot standby -- enabling
+ # this will cause the slave to be started in hot standby mode.
+ #primary_conninfo = host=master port=5432 user=postgres
+
+== EXIT STATUS ==
+
+ 0::
+ Successful program execution.
+
+== ENVIRONMENT ==
+
+PostgreSQL environment variables can be used.
+
+== QUICK START [[quick_start]] ==
+
+1. Set up passwordless ssh-key on Master and write configuration file
+
+ master$ walmgr3 --ssh-keygen --init-master --slave <slave_hostname>
+
+2. Set up passwordless ssh authentication from Master to Slave and write
+ configuration file on Slave
+
+ slave$ walmgr3 --init-slave --ssh-add-key=/tmp/id_rsa.pub
+
+3. Logging setup on Master and Slave
+
+ master$ cp skylog.ini ~postgres/
+ slave$ cp skylog.ini ~postgres/
+
+4. Start archival process and create a base backup
+
+ master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini setup
+ master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini backup
+
+ CAUTION: starting from PostgreSQL 8.3, archiving is enabled by
+ setting the archive_mode GUC to on. However, changing this parameter
+ requires the server to be restarted.
+
+5. Prepare postgresql.conf and pg_hba.conf on Slave and start replay
+
+ slave$ walmgr3 /var/lib/postgresql/conf/wal-slave.ini restore
+
+ For Debian-based distributions the standard configuration files are
+ located in the /etc/postgresql/x.x/main directory. If another scheme
+ is used, postgresql.conf and pg_hba.conf should be copied to the
+ slave's full_backup directory. Make sure to disable archive_command
+ in the slave config.
+
+ 'walmgr3 restore' moves data in place, creates recovery.conf and
+ starts postmaster in recovery mode.
+
+6. In-progress WAL segments can be backed up with the command:
+
+ master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini sync
+
+7. If you need to stop replay on the Slave and boot it into normal mode, run:
+
+ slave$ walmgr3 /var/lib/postgresql/conf/wal-slave.ini boot
+
+
+
diff --git a/python/walmgr.py b/python/walmgr.py
index d4796401..f990bdef 100755
--- a/python/walmgr.py
+++ b/python/walmgr.py
@@ -34,9 +34,7 @@ Internal commands:
xrelease Release backup lock (master)
xrotate Rotate backup sets, expire and archive oldest if necessary.
xpurgewals Remove WAL files not needed for backup (slave)
-
-Switches:
- -n no action, just print commands
+ xpartialsync Append data to WAL file (slave)
"""
import os, sys, re, signal, time, traceback