diff options
| author | Marko Kreen | 2011-11-25 14:36:40 +0000 |
|---|---|---|
| committer | Marko Kreen | 2011-11-25 14:36:40 +0000 |
| commit | 319bce547fc47208f70220ceb88b16c69a7359d8 (patch) | |
| tree | 700b40271702d06edd7026f4affac0462eebd486 | |
| parent | 4e4ee32b2f38bdba7737e0754a557cd2b8677f67 (diff) | |
| parent | eb6387479a59f0b9923cb70f24d4b5f442fb1010 (diff) | |
Merge remote-tracking branch 'klando/for_marko'
Conflicts:
README
| -rw-r--r-- | README | 44 |
| -rw-r--r-- | doc/Makefile | 4 |
| -rw-r--r-- | doc/londiste3.txt | 357 |
| -rw-r--r-- | doc/qadmin.txt | 272 |
| -rw-r--r-- | doc/walmgr.txt | 304 |
| -rw-r--r-- | doc/walmgr3.txt | 519 |
| -rwxr-xr-x | python/walmgr.py | 4 |
7 files changed, 1171 insertions, 333 deletions
@@ -1,8 +1,9 @@ = SkyTools - tools for PostgreSQL = -This is a package of tools in use in Skype for replication and -failover. Also it includes a generic queuing mechanism PgQ and -utility library for Python scripts. +This is a package of tools in use in Skype for replication and failover. +It also includes a generic queuing mechanism PgQ and utility library for +Python scripts, as well as a script for setting up and managing WAL-based +standby servers. == Overview == @@ -10,21 +11,40 @@ It contains following tools: === PgQ === -This is the queue mechanism we use. It consists of PL/pgsql, and C code -in database, with Python framework on top of it. It is based on -snapshot based event handling ideas from Slony-I, written for general -usage. +PgQ is a queuing system written in PL/pgsql, Python and C code. It is +based on snapshot-based event handling ideas from Slony-I, written for +general usage. -Features: +PgQ provides an efficient, transactional queuing system with +multi-node support (including work sharing and splitting, failover and +switchover, for queues and for consumers). + +Rules: - There can be several queues in database. - There can be several producers than can insert into any queue. -- There can be several consumers on one queue and all consumers see - all events. +- There can be several consumers on one queue. +- There can be several subconsumers on a consumer. + +PgQ is split into 3 layers: Producers, Ticker and Consumers. + +*Producers* and *Consumers* respectively push events into and read +events from a queue. Producers just need to call PostgreSQL stored procedures (like a +trigger on a table or a PostgreSQL call from the application). +Consumers are frequently written in Python, the preferred language as it +has a powerful Skytools framework, but this is not a restriction: any +language able to call PostgreSQL stored procedures can be used. + +*Ticker* is a daemon which splits the queues into batches of events and +handles the maintenance of the system.
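The three-layer split described above (Producers insert, the Ticker cuts the event stream into batches, Consumers read whole batches) can be sketched in miniature. This is a hedged, self-contained Python toy — no real PgQ involved, and names such as `ToyQueue`, `insert_event` and `next_batch` are invented for illustration only:

```python
# Toy model of PgQ's layering: producers insert events, the ticker cuts
# the event stream into batches, consumers read whole batches.
# All names here are illustrative; real PgQ does this in the database.

class ToyQueue:
    def __init__(self):
        self.events = []      # events inserted by producers
        self.batches = []     # closed batches, cut by the ticker
        self.last_tick = 0    # position of the last tick

    def insert_event(self, payload):          # "producer" side
        self.events.append(payload)

    def tick(self):                           # "ticker" daemon
        batch = self.events[self.last_tick:]
        if batch:
            self.batches.append(batch)
            self.last_tick = len(self.events)

    def next_batch(self):                     # "consumer" side
        return self.batches.pop(0) if self.batches else None

q = ToyQueue()
q.insert_event("INSERT row 1")
q.insert_event("INSERT row 2")
q.tick()                      # ticker closes a batch of 2 events
q.insert_event("UPDATE row 1")
q.tick()                      # second batch of 1 event
print(q.next_batch())         # ['INSERT row 1', 'INSERT row 2']
print(q.next_batch())         # ['UPDATE row 1']
```

The point of the batch cut is that consumers always see a stable, transactionally consistent slice of the event stream rather than a moving target.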
The Ticker is provided with +Skytools. Documentation: -- http://skytools.projects.postgresql.org/skytools-3.0/ +- PgQ ticker daemon (pgqd) usage: link:pgqd.html[] +- PgQ admin tool (qadm) usage: link:qadmin.html[] +- PgQ SQL API overview: link:pgq-sql.html[] +- PgQ SQL reference: http://skytools.projects.postgresql.org/skytools-3.0/pgq/ === Londiste === @@ -50,6 +70,8 @@ Documentation: This script will setup WAL archiving, does initial backup and runtime WAL archive and restore. +It can also be used for up-to-last-second partial file copying, +so that less than a whole file is lost if the master database server is lost. == Source tree contents == diff --git a/doc/Makefile b/doc/Makefile index b7791d90..fe8d3d55 100644 --- a/doc/Makefile +++ b/doc/Makefile @@ -11,11 +11,11 @@ HTMLS = README.html INSTALL.html \ TODO.html pgq-sql.html pgq-nodupes.html \ $(SCRIPT_HTMLS) faq.html set.notes.html skytools3.html devnotes.html -SCRIPT_TXTS = walmgr.txt qadmin.txt scriptmgr.txt skytools_upgrade.txt +SCRIPT_TXTS = londiste3.txt walmgr3.txt qadmin.txt scriptmgr.txt skytools_upgrade.txt SCRIPT_HTMLS = $(SCRIPT_TXTS:.txt=.html) MAN5 = -MAN1_SFX = scriptmgr.1 skytools_upgrade.1 walmgr.1 +MAN1_SFX = scriptmgr.1 skytools_upgrade.1 walmgr3.1 londiste3.1 MAN1 = qadmin.1 pgqd.1 FQHTML = $(addprefix html/, $(HTMLS)) diff --git a/doc/londiste3.txt b/doc/londiste3.txt new file mode 100644 index 00000000..16865920 --- /dev/null +++ b/doc/londiste3.txt @@ -0,0 +1,357 @@ + += londiste3(1) = + +== NAME == + +londiste3 - tool for managing trigger-based replication for PostgreSQL. + +== SYNOPSIS == + + londiste3 <config.ini> command [subcommand] [options] + +== DESCRIPTION == + +Londiste allows you to set up and administer replication, and is run as a +daemon to handle the replication itself. (Londiste is just a complex +PgQ consumer.) + +See <<examples,EXAMPLES>> below to start your first trigger-based +replication in a few steps.
+ +The 'command' is one of the Node, Replication, Information or Internal +commands listed below. + +Londiste introduces the notion of 'takeover': the action by which a +local node takes over the work of another node. The other node can be a +root node or a branch node and it can be dead or alive when the action +is run. + +Londiste also allows, among many other features, cascading replication, +partial replication and custom handlers for replication. + +== GENERAL OPTIONS == + + -V, --version:: + Print version info and exit. + + -h, --help:: + Show this help message and exit. + + -q, --quiet:: + Log only errors and warnings. + + -v, --verbose:: + Log verbosely. + + -d, --daemon:: + Run in daemon mode (go background). + +== SPECIFIC OPTIONS == + + --ini:: + Display sample ini file. + + --set="'param=val[,param=value]'":: + Override config setting. + +== DAEMON OPTIONS == + + -r, --reload:: + Reload config (send SIGHUP). + + -s, --stop:: + Stop program safely (send SIGINT). + + -k, --kill:: + Kill program immediately (send SIGTERM). + +== REPLICATION EXTRA ARGUMENTS == + + --rewind:: + Change queue position according to destination. + + --reset:: + Reset queue position on destination side. + +== NODE INITIALIZATION COMMANDS == + +Initialization commands are the only ones that require a connection +string argument. This is the connection string that Londiste3 will store; +other nodes will use it to connect to the current node. + +=== create-root <node> <connstr> === + +Initializes a Master node. + +The <node> is the name of the node; it should be unique. +The <connstr> argument is the connection string to the database on the +Master node. + +=== create-branch <node> <connstr> --provider=<public_connstr> === + +Initializes a Slave node which can be used as a reference for other +nodes. + +The <node> is the name of the node; it should be unique.
+The <connstr> argument is the connection string to the database on the +current node and <public_connstr> is the connection string to the +provider database (it can be a root node or a branch node). + +=== create-leaf <node> <connstr> --provider=<public_connstr> === + +Initializes a Slave node which cannot be used as a reference for other +nodes. + +The <node> is the name of the node; it should be unique. +The <connstr> argument is the connection string to the database on the +current node and <public_connstr> is the connection string to the +provider database (it can be a root node or a branch node). + + --merge='qname':: + Combined queue name. + +== NODE ADMINISTRATION COMMANDS == + +=== pause === + +Pause the consumer: the replication of the events is stopped and can be +resumed later. + +=== resume === + +When the consumer has been paused, let it replay again. + +=== change-provider <tonode> === + +Make <tonode> become the new provider for the current node. + +TODO: londiste.py need update (param change from --provider) + +=== takeover <fromnode> [--target=<tonode>] [--all] [--dead] === + +This command makes it possible to achieve failover and switchover for +any of your providers (root or branch nodes). + + --target='tonode':: + Target node of the takeover. + + --all:: + In addition to taking over the work of the 'fromnode', make other + nodes change their provider to the current node. + + --dead:: + Don't wait to take the new role and flag the 'fromnode' as dead. + + --dead='deadnode':: + Assume node is dead. + TODO : why use this one ? + + --dead-root:: + Old node was root. + + --dead-branch:: + Old node was branch. + +=== drop-node <node> === + +Remove the node from the Londiste replication. + +Londiste triggers on the node are removed, but Londiste and PgQ +themselves are not removed. + +=== tag-dead <node> === + +Tag the node as dead; the command can be run from any node in the +replication.
+ +=== tag-alive <node> === + +Tag the node as alive; the command can be run from any node in the +replication. + +== INFORMATION COMMANDS == + +=== status === + +Show status of the replication as seen from the current node. + +The output is a tree of the members of the replication with their lags, +last tick, status and the number of tables in state: ok/half/ignored +(replicated, initial copy not finished, table not replicated locally). + +=== members === + +Show members of the replication as seen from the current node. + +Output the node name, status and node location (connection string to +the node). + +=== show-consumers [--node] === + +TODO: command is not working + +== REPLICATION DAEMON COMMAND == + +=== worker === + +Replay events to subscriber: this command is needed to make the +replication active, as it starts replaying the events. + +== REPLICATION ADMINISTRATION COMMANDS == + +=== add-table <table> [args] === + +Add the table to the replication. + +See <<add_args,ADD ARGUMENTS>> below for the list of possible arguments. + +=== remove-table <table> === + +Remove the table from the replication. + +=== add-seq <seq> [args] === + +Add the sequence to the replication. + +See <<add_args,ADD ARGUMENTS>> below for the list of possible arguments. + +=== remove-seq <seq> === + +Remove the sequence from the replication. + +=== tables === + +Show all tables on provider. + +=== seqs === + +Show all sequences on provider. + +=== missing === + +List tables subscriber has not yet attached to. + +=== resync <table> === + +Do a full copy of the table again. + +== ADD ARGUMENTS [[add_args]] == + + --all:: + Include all possible tables. + + --dest-table='table':: + Redirect changes to different table. + + --force:: + Ignore table differences. + + --expect-sync:: + No copy needed. + + --skip-truncate:: + Keep old data. + + --create:: + Create table/sequence if it does not exist, with minimal schema. + + --create-full:: + Create table/sequence if it does not exist, with full schema.
+ + --trigger-flags='trigger_flags':: + Trigger creation flags, see below for details. + + --trigger-arg='trigger_arg':: + Custom trigger arg (can be specified multiple times). + + --no-triggers:: + Don't put triggers on table (makes sense on a leaf node). + + --handler='handler':: + Custom handler for table. + + --handler-arg='handler_arg':: + Argument to custom handler. + + --copy-condition='copy_condition':: + Set WHERE expression for copy. + + --merge-all:: + Merge tables from all source queues. + + --no-merge:: + Don't merge tables from source queues. + + --max-parallel-copy='max_parallel_copy':: + Max number of parallel copy processes. + +Trigger creation flags (default: AIUDL): + +- I - ON INSERT +- U - ON UPDATE +- D - ON DELETE +- Q - use pgq.sqltriga() as trigger function +- L - use pgq.logutriga() as trigger function +- B - BEFORE +- A - AFTER +- S - SKIP + +== REPLICATION EXTRA COMMANDS == + +=== check === + +Compare table structure on both sides. + +=== fkeys === + +Print out fkey drop/create commands. + +=== compare [<table>] === + +Compare table contents on both sides. + +=== repair [<table>] [--force] === + +Repair data on subscriber. + + --force:: + Ignore lag. + +=== execute [filepath] === + +Execute SQL files on each node of the set. + +=== show-handlers ['handler'] === + +Show info about all or a specific handler. + +== INTERNAL COMMAND == + +=== copy === + +Copy table logic. + +== EXIT STATUS == + + 0:: + Successful program execution. + +== ENVIRONMENT == + +PostgreSQL environment variables can be used. + +== EXAMPLES [[examples]] == + +Londiste is provided with HowTos to help you make your first steps: + +- How to set up simple replication. +- How to set up cascaded replication. +- How to set up table partitioning (handlers).
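The first HowTo above (simple replication) starts from a per-node configuration file as shown in the SYNOPSIS. A hedged sketch of such a file — the job name, database and queue name are invented, and the exact keys should be checked against the output of `londiste3 --ini`:

```ini
; Minimal londiste3 worker configuration (illustrative values only).
[londiste3]
job_name = replic_leaf
db = dbname=destdb
queue_name = replika
logfile = ~/log/%(job_name)s.log
pidfile = ~/pid/%(job_name)s.pid
```

With one such file per node, the simple-replication HowTo essentially reduces to the commands documented above: `create-root` on the provider, `create-leaf --provider=...` on the subscriber, `worker -d` to start replay, then `add-table` on both sides.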
+ + + + + + + + diff --git a/doc/qadmin.txt b/doc/qadmin.txt index 0c0b1d04..e406cad2 100644 --- a/doc/qadmin.txt +++ b/doc/qadmin.txt @@ -7,33 +7,279 @@ qadmin - Easy to use admin console to examine and administer PgQ queues. == SYNOPSIS == - qadmin.py [ options ... ] + qadmin [options] == DESCRIPTION == -The goal is to be psql-like console for queue administration. +This is a psql-like console for queue administration. + +The console offers a large number of commands to set up, control and +manage the PgQ queuing system. + +It also offers a non-interactive mode to run one or more commands. + +qadmin keeps a history file in the user's home directory (`~/.qadmin_history`). == GENERAL OPTIONS == -Initial connection options (for libpq): + --help:: + Help screen. + + --version:: + Print version. + +== CONNECTION OPTIONS == + + -h host:: + Specify host to connect to (default: localhost via unix socket). + + -p port:: + Specify port. + + -U user:: + Specify user name. + + -d dbname:: + Database name. + + -Q queuename:: + Queue name; if provided, it is used as the `default queue`. + +== SPECIFIC OPTIONS == + + -c 'cmd_string':: + Execute console command. + + -f 'execfile':: + Execute file containing console commands. + +== CONNECTION COMMANDS == + +qadmin makes it possible to connect to other databases or queues from +the console itself. + +=== connect <connstring> [queue=<qname>]; === + +Connect the console using the specified connection string, with an +optional parameter to set the default queue on connection. + +=== connect [queue=<qname>] [node=<node>]; === + +Connect the console to the specified queue and/or node. + +== MANAGEMENT COMMANDS == + +=== install pgq | londiste; === + +Install PgQ or Londiste to the connected database. + +=== create queue <qname>; === + +Create the specified queue. + +=== alter queue <qname | *> set param =<foo=1>,<bar=2>; === + +Set one or more parameters on one or all queues at once. + +=== drop queue <qname>; === + +Drop the named queue.
+ +=== register consumer <consumer> [on <qname> | at <tick_id> | copy <consumer> ]; === + +Register a consumer on a queue, at a specified tick, or based on +another consumer. + +=== unregister consumer <consumer | *> [from <qname>]; === + +Unregister one or all consumers; if the console is not connected to a +queue, its name must be provided. + +=== register subconsumer <subconsumer> for <consumer> [on <qname>]; === + +Register a subconsumer to a consumer; if the console is not connected to a +queue, its name must be provided. + +=== unregister subconsumer <subconsumer | *> for <consumer> [from <qname>] [close [batch]]; === + +Unregister one or all subconsumers from a consumer; if the console is +not connected to a queue, its name must be provided. +The current batch can be aborted if the `close batch` subcommand is +provided. + +== SHOW COMMANDS == + +=== show help; === + +Show all the console commands. + +=== show queue [ <qname | *> ]; === + +Show details of one or all queues. + +=== show table <tbl>; === + +Show DDL for the specified table. + +=== show sequence <seq>; === + +Show DDL for the specified sequence. + +=== show consumer [ <consumer | *> [on <qname>] ]; === + +Show details of one or all consumers on one or all queues. + +=== show batch <batch_id>; === + +Show details of the batch; the default queue must be set (see `connect queue`). + +=== show batch <consumer>; === + +Show details of the current batch for the specified consumer; the default +queue must be set (see `connect queue`). + +== LONDISTE COMMANDS == + +All these commands are applied on the node the console is connected +to. + +=== londiste add table <tbl> [with ... ] === + + with no_triggers:: + Skip trigger creation. + + with skip_truncate:: + Does not truncate the table on the destination. + + with expect_sync:: + Set table state to 'ok'. + + with tgflags='UIDBAQL':: + Trigger creation flags, see below for details. + + with backup:: + Put urlencoded contents of old row to `ev_extra2`.
+ + with skip:: + Create skip trigger. Same as S flag. + + with when='expr':: + If 'expr' returns false, don't insert event. + + with ev_XX='EXPR':: + Override default ev_* columns (see below). + +Trigger creation flags (default: AIUDL): + +- I - ON INSERT +- U - ON UPDATE +- D - ON DELETE +- Q - use pgq.sqltriga() as trigger function +- L - use pgq.logutriga() as trigger function +- B - BEFORE +- A - AFTER +- S - SKIP + +Queue event fields: + +- ev_type - I/U/D +- ev_data - partial SQL statement +- ev_extra1 - table name +- ev_extra2 - optional urlencoded backup + + +=== londiste add sequence <seq>; === + +Add the specified sequence to Londiste replication. + +=== londiste remove table <tbl> [,tbl]; === + +Remove the specified table(s) from the Londiste replication. + +=== londiste remove sequence <seq> [,seq]; === + +Remove the specified sequence(s) from the Londiste replication. + +=== londiste tables; === + +List registered tables and information about them: + +- table_name - fully-qualified table name +- local - whether events need to be applied to the local table +- merge_state - show phase of initial copy +- custom_snapshot - remote snapshot of COPY transaction +- table_attrs - urlencoded dict of table attributes +- dropped_ddl - partition combining: temp place to put DDL +- copy_role - partition combining: how to handle copy +- copy_pos - position in parallel copy working order + +==== copy_role = lead ==== + +On copy start, drop indexes and store in dropped_ddl. + +On copy finish, change state to catching-up, then wait until copy_role +turns to NULL. + +Catching-up: if dropped_ddl is not NULL, restore it. + +==== copy_role = wait-copy ==== + +On copy start, wait until role changes (to wait-replay). + +==== copy_role = wait-replay ==== + +On copy finish, tag as 'catching-up'. + +Wait until copy_role is NULL, then proceed. + +=== londiste seqs; === + +List registered sequences on this node and their last value.
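The copy_role handshake documented above is a small state machine. A hedged Python sketch of just the transition rules — the function names and return strings are invented for illustration, and the real coordination happens inside the londiste copy processes:

```python
# Toy transition table for the copy_role handshake described above.
# States and triggers mirror the documentation; names are illustrative.

def on_copy_start(copy_role):
    """What a copy process does when it starts, per copy_role."""
    if copy_role == "lead":
        return "drop indexes, store DDL in dropped_ddl"
    if copy_role == "wait-copy":
        return "wait until copy_role changes to wait-replay"
    return "proceed"

def on_copy_finish(copy_role):
    """What a copy process does when the initial COPY finishes."""
    if copy_role in ("lead", "wait-replay"):
        # both tag the table as catching-up; the lead additionally waits
        # for copy_role to turn NULL, then restores dropped_ddl
        return "catching-up"
    return "done"

print(on_copy_start("lead"))         # drop indexes, store DDL in dropped_ddl
print(on_copy_finish("wait-replay")) # catching-up
```

The design point is that only one process (the lead) pays the cost of dropping and restoring indexes, while the others merely synchronize on the shared copy_role column.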
+ +=== londiste missing; === + +On Master, list tables not registered on set. + +On Slave, list tables on set but not registered locally. + +== OTHER COMMANDS == + +=== exit; === + +Quit program. + +=== ^D === + +Quit program. + +=== ^C === - -h host:: specify host to connect to (default: localhost via unix socket) +Clear current buffer. - -p port:: specify port +== EXIT STATUS == - -U user:: specify user name + 0:: + Successful program execution. - -d dbname:: database name +== ENVIRONMENT == -Options for qadmin: +PostgreSQL environment variables can be used. - -Q queuename:: queue name +== NOT IMPLEMENTED COMMANDS == - -c cmd_string:: execute command +TODO : is it up-to-date ? - -f execfile:: execute file containing commands +- create <root | branch | leaf> node <node> location <loc> [on <qname>]; - --help:: help screen +- alter node <name> provider <new>; +- alter node <name> takeover <oldnow> with all; +- alter node <name> rename <new>; +- alter node <name> [location=<loc>] - --version:: print version +- drop node <name> [on <qname>]; +- takeover <oldnode>; +- show node [ <node | *> [on <qname>] ]; +- show cascade; +- show_queue_stats <q>; +- status diff --git a/doc/walmgr.txt b/doc/walmgr.txt deleted file mode 100644 index fd3605cc..00000000 --- a/doc/walmgr.txt +++ /dev/null @@ -1,304 +0,0 @@ - -= walmgr(1) = - -== NAME == - -walmgr - tools for managing WAL-based replication for PostgreSQL. - -== SYNOPSIS == - - walmgr.py <config.ini> command - -== DESCRIPTION == - -It is both admin and worker script for PostgreSQL PITR replication. - -== QUICK START == - -1. Set up passwordless ssh authentication from master to slave - - master$ test -f ~/.ssh/id_dsa.pub || ssh-keygen -t dsa - master$ cat ~/.ssh/id_dsa.pub | ssh slave cat \>\> .ssh/authorized_keys - -2. Configure paths - - master$ edit master.ini - slave$ edit slave.ini - - Make sure that walmgr.py executable has same pathname on slave and master. - -3. 
Start archival process and create a base backup - - master$ ./walmgr.py master.ini setup - master$ ./walmgr.py master.ini backup - - Note: starting from PostgreSQL 8.3 the archiving is enabled by setting - archive_mode GUC to on. However changing this parameter requires the - server to be restarted. - -4. Prepare postgresql.conf and pg_hba.conf on slave and start replay - - master$ scp $PGDATA/*.conf slave: - slave$ ./walmgr.py slave.ini restore - - For debian based distributions the standard configuration files are located - in /etc/postgresql/x.x/main directory. If another scheme is used the postgresql.conf - and pg_hba.conf should be copied to slave full_backup directory. Make sure to - disable archive_command in slave config. - - 'walmgr.py restore' moves data in place, creates recovery.conf and starts postmaster - in recovery mode. - -5. In-progress WAL segments can be backup by command: - - master$ ./walmgr.py master.ini sync - -6. If need to stop replay on slave and boot into normal mode, do: - - slave$ ./walmgr.py slave.ini boot - -== GENERAL OPTIONS == - -Common options to all walmgr.py commands. - - -h, --help:: - show this help message and exit - - -q, --quiet:: - make program silent - - -v, --verbose:: - make program more verbose - - -n, --not-really:: - Show what would be done without actually doing anything. - -== MASTER COMMANDS == - -=== setup === -Sets up postgres archiving, creates necessary directory structures on slave. - -=== sync === -Synchronizes in-progress WAL files to slave. - -=== syncdaemon === -Start WAL synchronization in daemon mode. This will start periodically synching -the in-progress WAL files to slave. - -The following parameters are used to drive the syncdaemon: -loop_delay - how long to sleep between the synchs. -use_xlog_functions - use record based shipping to synchronize in-progress WAL segments. - -=== stop === -Deconfigures postgres archiving. - -=== periodic === -Runs periodic command, if configured. 
This enables to execute arbitrary commands -on interval, useful for synchronizing scripts, config files, crontabs etc. - -=== listbackups === -List backup sets available on slave node. - -=== backup === -Creates a new base backup from master database. Will purge expired backups and WAL -files on slave if `keep_backups` is specified. During a backup a lock file is -created in slave `completed_wals` directory. This is to prevent simultaneous -backups and resulting corruption. If running backup is terminated, the BACKUPLOCK -file may have to be removed manually. - -=== restore <set> <dst> === -EXPERIMENTAL. Attempts to restore the backup from slave to master. - -== SLAVE COMMANDS == - -=== boot === -Stop log playback and bring the database up. - -=== pause === -Pauses WAL playback. - -=== continue === -Continues previously paused WAL playback. - -=== listbackups === -Lists available backups. - -=== backup === -EXPERIMENTAL. Creates a new backup from slave data. Log replay is paused, -slave data directory is backed up to `full_backup` directory and log -replay resumed. Backups are rotated as needed. The idea is to move the -backup load away from production node. Usable from postgres 8.2 and up. - -=== restore [src][dst] === -Restores the specified backup set to target directory. If specified without -arguments the latest backup is *moved* to slave data directory (doesn't obey -retention rules). If src backup is specified the backup is copied (instead of moving). -Alternative destination directory can be specified with `dst`. - -== CONFIGURATION == - -=== Common settings === - -==== job_name ==== -Optional. Indentifies this script, used in logging. Keep unique if -using central logging. - -==== logfile ==== -Where to log. - -==== use_skylog ==== -Optional. If nonzero, skylog.ini is used for log configuration. - -=== Master settings === - -==== pidfile ==== -Pid file location for syncdaemon mode (if running with -d). Otherwise -not required. 
- -==== master_db ==== -Database to connect to for pg_start_backup() etc. It is not a -good idea to use `dbname=template` if running syncdaemon in -record shipping mode. - -==== master_data ==== -Master data directory location. - -==== master_config ==== -Master postgresql.conf file location. This is where -`archive_command` gets updated. - -==== master_restart_cmd ==== -The command to restart master database, this used after changing -`archive_mode` parameter. Leave unset, if you cannot afford to -restart the database at setup/stop. - -==== slave ==== -Slave host and base directory. - -==== slave_config ==== -Configuration file location for the slave walmgr. - -==== completed_wals ==== -Slave directory where archived WAL files are copied. - -==== partial_wals ==== -Slave directory where incomplete WAL files are stored. - -==== full_backup ==== -Slave directory where full backups are stored. - -==== config_backup ==== -Slave directory where configuration file backups are stored. Optional. - -==== loop_delay ==== -The frequency of syncdaemon updates. In record shipping mode only -incremental updates are sent, so smaller interval can be used. - -==== use_xlog_functions ==== -Use pg_xlog functions for record based shipping (available in 8.2 and up). - -==== compression ==== -If nonzero, a -z flag is added to rsync cmdline. Will reduce network -traffic at the cost of extra CPU time. - -==== periodic_command ==== -Shell script to be executed at specified time interval. Can be used for -synchronizing scripts, config files etc. - -==== command_interval ==== -How ofter to run periodic command script. In seconds, and only evaluated -at log switch times. - -==== hot_standby === -Boolean. If set to true, walmgr setup will set wal_level to hot_standby (9.0 and newer). 
- -=== Sample master.ini === - - [wal-master] - logfile = master.log - pidfile = master.pid - master_db = dbname=template1 - master_data = /var/lib/postgresql/8.0/main - master_config = /etc/postgresql/8.0/main/postgresql.conf - slave = slave:/var/lib/postgresql/walshipping - completed_wals = %(slave)s/logs.complete - partial_wals = %(slave)s/logs.partial - full_backup = %(slave)s/data.master - loop_delay = 10.0 - use_xlog_functions = 1 - compression = 1 - -=== Slave settings === - -==== slave_data ==== -Postgres data directory for the slave. This is where the restored -backup is copied/moved. - -==== slave_config_dir ==== -Directory for postgres configuration files. If specified, "walmgr restore" -attempts to restore configuration files from config_backup directory. - -==== slave_stop_cmd ==== -Script to stop postmaster on slave. - -==== slave_start_cmd ==== -Script to start postmaster on slave. - -==== slave ==== -Base directory for slave files (logs.complete, data.master etc) - -==== slave_bin ==== -Specifies the location of postgres binaries (pg_controldata, etc). Needed if -they are not already in the PATH. - -==== completed_wals ==== -Directory where complete WAL files are stored. Also miscellaneous control files -are created in this directory (BACKUPLOCK, STOP, PAUSE, etc.). - -==== partial_wals ==== -Directory where partial WAL files are stored. - -==== full_backup ==== -Directory where full backups are stored. - -==== keep_backups ==== -Number of backups to keep. Also all WAL files needed to bring earliest - -backup up to date are kept. The backups are rotated before new backup -is started, so at one point there is actually one less backup available. - -It probably doesn't make sense to specify `keep_backups` if periodic -backups are not performed - the WAL files will pile up quickly. - -Backups will be named data.master, data.master.0, data.master.1 etc. - -==== archive_command ==== -Script to execute before rotating away the oldest backup. 
If it fails -backups will not be rotated. - - -==== slave_pg_xlog ==== -Set slave_pg_xlog to the directory on the slave where pg_xlog files get -written to. On a restore to the slave walmgr.py will -create a symbolic link from data/pg_xlog to this location. - - -==== backup_datadir ==== -Set backup_datadir to 'no' to prevent walmgr.py from making a backup -of the data directory when restoring to the slave. This defaults to -'yes' - - -=== Sample slave.ini === - - [wal-slave] - logfile = slave.log - slave_data = /var/lib/postgresql/8.0/main - slave_stop_cmd = /etc/init.d/postgresql-8.0 stop - slave_start_cmd = /etc/init.d/postgresql-8.0 start - slave = /var/lib/postgresql/walshipping - completed_wals = %(slave)s/logs.complete - partial_wals = %(slave)s/logs.partial - full_backup = %(slave)s/data.master - keep_backups = 5 - backup_datadir = yes diff --git a/doc/walmgr3.txt b/doc/walmgr3.txt new file mode 100644 index 00000000..5961be07 --- /dev/null +++ b/doc/walmgr3.txt @@ -0,0 +1,519 @@ + += walmgr3(1) = + +== NAME == + +walmgr3 - tool for managing WAL-based replication for PostgreSQL. + +== SYNOPSIS == + + walmgr3 <config.ini> command [--not-really] [options] + +== DESCRIPTION == + +Walmgr3 is a tool to handle replication of PostgreSQL with PITR (also +known as Log Shipping). + +This script allows you to set up and administer the replication; it is +also used by PostgreSQL to archive and restore the WAL files. + +See <<quick_start,QUICK START>> below to start your first log shipping +in a few steps. + +The 'command' is one of the Master, Slave, Common or Internal commands +listed below. + +== GENERAL OPTIONS == + + -V, --version:: + Print version info and exit. + + -h, --help:: + Show this help message and exit. + + -q, --quiet:: + Log only errors and warnings. + + -v, --verbose:: + Log verbosely. + + -d, --daemon:: + Run in daemon mode (go background). + + -n, --not-really:: + Show what would be done without actually doing anything.
+ +== SPECIFIC OPTIONS == + + --ini:: + Display sample ini file. + + --set="'param=val[,param=value]'":: + Override config setting. + + --init-master:: + Initialize Master walmgr3 configuration. + + --init-slave:: + Initialize Slave walmgr3 configuration. + + --config-dir='filepath':: + Configuration file location for `--init-X commands`. + + --slave='hostname':: + Slave host name. + + --pgdata='path':: + PostgreSQL data directory. + + --ssh-keygen:: + Generate an SSH key pair if needed (used in Master). + + --ssh-add-key='keyfile.pub':: + Add the public key file to authorized_hosts file (used in Slave). + + --ssh-remove-key='ssh_key':: + Remove Master ssh key from authorized_hosts file (used in Slave). + + --primary-conninfo='connection_string':: + Provide the connection string to the streaming replication Master + (used in Slave). + +== DAEMON OPTIONS == + + -r, --reload:: + Reload config (send SIGHUP). + + -s, --stop:: + Stop program safely (send SIGINT). + + -k, --kill:: + Kill program immediately (send SIGTERM). + +== MASTER COMMANDS == + +=== setup === + +Sets up PostgreSQL for WAL archiving, creates necessary directory +structures on Slave. + +=== sync === + +Copies in-progress WAL files to Slave. + +=== syncdaemon === + +Start WAL synchronization in daemon mode. This will start periodically +synching the in-progress WAL files to Slave. + +The following configuration parameters are used to drive the syncdaemon: + + - *loop_delay* - how long to sleep between the syncs. + + - *use_xlog_functions* - use record based shipping to synchronize + in-progress WAL segments. + +=== stop === + +Stop archiving and de-configure PostgreSQL archiving. + +=== periodic === + +Runs periodic command if configured. This makes it possible to execute +arbitrary commands at intervals, useful for synchronizing scripts, +config files, crontabs etc. + +== SLAVE COMMANDS == + +=== boot === + +Stop WAL playback and bring the database up so it can accept queries.
+ +=== pause === + +Pauses WAL playback. + +=== continue === + +Continues previously paused WAL playback. + +== COMMON COMMANDS == + +=== listbackups === + +Lists available backups on Slave node. + +=== backup === + +Creates a new base backup from Master database. Will purge expired +backups and WAL files on Slave if `keep_backups` is specified. +During a backup a lock file is created in Slave `completed_wals` +directory. This is to prevent simultaneous backups and resulting +corruption. If a running backup is terminated, the BACKUPLOCK file may +have to be removed manually. + +EXPERIMENTAL: If run on Slave, creates backup from in-recovery Slave +data. WAL playback is paused, Slave data directory is backed up to +`full_backup` directory and WAL playback is resumed. Backups are rotated +as needed. The idea is to move the backup load away from the production +node. Usable from PostgreSQL 8.2 and up. + +=== restore [src [dst]] === + +Restores the specified backup set to target directory. If specified +without arguments the latest backup is *moved* to Slave data directory +(doesn't obey retention rules). If `src` backup is specified the backup +is copied (instead of moved). Alternative destination directory can be +specified with `dst`. + +=== cleanup === + +Clean up any walmgr3 files after stop. + +== INTERNAL COMMANDS == + +=== xarchive <srcpath> <srcname> === + +On Master, archive one WAL file. + +=== xrestore <srcname> <dstpath> [last restartpoint wal] === + +On Slave, restore one WAL file. + +=== xlock === + +On Master, create lock file to deny other concurrent backups. + +=== xrelease === + +On Slave, remove backup lock file, allowing another backup to run. + +=== xrotate === + +Rotate backups by increasing backup directory suffixes. Note that since +we also have to make room for the next backup, we actually have +*keep_backups - 1* backups available after this. + +Unneeded WAL files are not removed here; that is handled by the +`xpurgewals` command instead.
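The xrotate suffix arithmetic can be sketched in a few lines. A hedged Python toy, assuming the `data.master`, `data.master.0`, `data.master.1` naming convention from the walmgr documentation — the real command renames directories on disk, while this only shuffles names:

```python
# Toy sketch of the xrotate suffix scheme: data.master is the newest
# backup, data.master.0 the next oldest, and so on.  Rotation pushes
# every backup one suffix up and purges sets beyond the limit, leaving
# room for the next backup -- hence keep_backups - 1 sets remain.

def rotate(backups, keep_backups):
    # backups: list of backup names, newest first; only the count matters
    rotated = ["data.master.%d" % i for i in range(len(backups))]
    # keep room for the next backup: only keep_backups - 1 sets survive
    return rotated[:keep_backups - 1]

print(rotate(["data.master", "data.master.0", "data.master.1"], 3))
# ['data.master.0', 'data.master.1']
```

This also explains the note above: immediately after rotation there is one backup fewer than `keep_backups`, because the slot for the incoming backup has just been vacated.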
+
+=== xpurgewals ===
+
+On Slave, remove WAL files not needed for recovery.
+
+=== xpartialsync <filename> <offset> <bytes> ===
+
+Read 'bytes' worth of data from stdin and append it to the partial WAL
+file starting at 'offset'. On error it is assumed that the Master
+restarts the transfer from zero.
+
+The resulting file is always padded to XLOG_SEGMENT_SIZE bytes to
+simplify recovery.
+
+== CONFIGURATION ==
+
+=== Common settings ===
+
+==== job_name ====
+
+Optional. Identifies this script, used in logging. Keep unique if
+using central logging.
+
+==== logfile ====
+
+Where to log.
+
+==== use_skylog ====
+
+Optional. If nonzero, 'skylog.ini' is used for log configuration.
+
+=== Master settings ===
+
+==== pidfile ====
+
+Pid file location for syncdaemon mode (if running with -d). Otherwise
+not required.
+
+==== master_db ====
+
+Database to connect to for pg_start_backup(), etc. It is not a
+good idea to use `dbname=template` if running syncdaemon in record
+shipping mode.
+
+==== master_data ====
+
+Master data directory location.
+
+==== master_config ====
+
+Master postgresql.conf file location. This is where `archive_command`
+gets updated.
+
+==== master_restart_cmd ====
+
+The command to restart the Master database; this is used after changing
+the `archive_mode` parameter. Leave unset if you cannot afford to
+restart the database at setup/stop.
+
+==== slave ====
+
+Slave host and base directory.
+
+==== slave_config ====
+
+Configuration file location for the Slave walmgr3.
+
+==== completed_wals ====
+
+Slave directory where archived WAL files are copied.
+
+==== partial_wals ====
+
+Slave directory where incomplete WAL files are stored.
+
+==== full_backup ====
+
+Slave directory where full backups are stored.
+
+==== config_backup ====
+
+Optional. Slave directory where configuration file backups are stored.
+
+==== loop_delay ====
+
+The frequency of syncdaemon updates. In record shipping mode only
+incremental updates are sent, so a smaller interval can be used.
+
+==== use_xlog_functions ====
+
+Use pg_xlog functions for record based shipping (available in 8.2 and
+up).
+
+==== compression ====
+
+If nonzero, a `-z` flag is added to the rsync command line. It reduces
+network traffic at the cost of extra CPU time.
+
+==== keep_symlinks ====
+
+Keep symlinks for `pg_xlog` and `pg_log`.
+
+==== hot_standby ====
+
+If set to 1, walmgr3 setup will set `wal_level` to `hot_standby`
+(PostgreSQL 9.0 and newer).
+
+==== command_interval ====
+
+How often to run the periodic command script. In seconds, and only
+evaluated at log switch times.
+
+==== periodic_command ====
+
+Shell script to be executed at the specified time interval. Can be used
+for synchronizing scripts, config files etc.
+
+=== Sample master.ini ===
+
+  [walmgr]
+  job_name = wal-master
+  logfile = ~/log/%(job_name)s.log
+  pidfile = ~/pid/%(job_name)s.pid
+  use_skylog = 1
+
+  master_db = dbname=my_db
+  master_data = /var/lib/postgresql/9.1/main
+  master_config = /etc/postgresql/9.1/main/postgresql.conf
+  master_bin = /usr/lib/postgresql/9.1/bin
+
+  # set this only if you can afford database restarts during setup and stop.
+  #master_restart_cmd = /etc/init.d/postgresql-9.1 restart
+
+  slave = slave-host
+  slave_config = /var/lib/postgresql/conf/wal-slave.ini
+
+  walmgr_data = /var/lib/postgresql/walshipping
+  completed_wals = %(walmgr_data)s/logs.complete
+  partial_wals = %(walmgr_data)s/logs.partial
+  full_backup = %(walmgr_data)s/data.master
+  config_backup = %(walmgr_data)s/config.backup
+
+  # syncdaemon update frequency
+  loop_delay = 10.0
+  # use record based shipping available since 8.2
+  use_xlog_functions = 0
+
+  # pass -z to rsync, useful on low bandwidth links
+  compression = 0
+
+  # keep symlinks for pg_xlog and pg_log
+  keep_symlinks = 1
+
+  # tell walmgr to set wal_level to hot_standby during setup
+  #hot_standby = 1
+
+  # periodic sync
+  #command_interval = 600
+  #periodic_command = /var/lib/postgresql/walshipping/periodic.sh
+
+=== Slave settings ===
+
+==== slave_data ====
+
+PostgreSQL data directory for the Slave. This is where the restored
+backup is copied/moved.
+
+==== slave_bin ====
+
+Specifies the location of the PostgreSQL binaries (pg_controldata,
+etc). Needed if they are not already in the PATH.
+
+==== slave_stop_cmd ====
+
+Script to stop PostgreSQL on the Slave.
+
+==== slave_start_cmd ====
+
+Script to start PostgreSQL on the Slave.
+
+==== slave_config_dir ====
+
+Directory for PostgreSQL configuration files. If specified, `walmgr3
+restore` attempts to restore configuration files from the
+`config_backup` directory.
+
+==== slave_pg_xlog ====
+
+The directory on the Slave where pg_xlog files are written. On restore
+to the Slave, walmgr3 will create a symbolic link from data/pg_xlog to
+this location.
+
+==== completed_wals ====
+
+Directory where complete WAL files are stored. Miscellaneous control
+files (BACKUPLOCK, STOP, PAUSE, etc.) are also created in this
+directory.
+
+==== partial_wals ====
+
+Directory where partial WAL files are stored.
+
+==== full_backup ====
+
+Directory where full backups are stored.
+
+==== config_backup ====
+
+Optional.
Slave directory where configuration file backups are stored.
+
+==== backup_datadir ====
+
+Set `backup_datadir` to 'no' to prevent walmgr3 from making a backup of
+the data directory when restoring to the Slave. This defaults to 'yes'.
+
+==== keep_backups ====
+
+Number of backups to keep. All WAL files needed to bring the earliest
+backup up to date are also kept. The backups are rotated before a new
+backup is started, so at one point there is actually one less backup
+available.
+
+It probably doesn't make sense to specify `keep_backups` if periodic
+backups are not performed - the WAL files will pile up quickly.
+
+Backups will be named data.master, data.master.0, data.master.1 etc.
+
+==== archive_command ====
+
+Script to execute before rotating away the oldest backup. If it fails,
+backups will not be rotated.
+
+==== primary_conninfo ====
+
+Primary database connection string for hot standby - enabling this will
+cause the Slave to be started in hot standby mode.
+
+=== Sample slave.ini ===
+
+  [walmgr]
+  job_name = wal-slave
+  logfile = ~/log/%(job_name)s.log
+  use_skylog = 1
+
+  slave_data = /var/lib/postgresql/9.1/main
+  slave_bin = /usr/lib/postgresql/9.1/bin
+  slave_stop_cmd = /etc/init.d/postgresql-9.1 stop
+  slave_start_cmd = /etc/init.d/postgresql-9.1 start
+  slave_config_dir = /etc/postgresql/9.1/main
+
+  # alternative pg_xlog directory for slave, symlinked to pg_xlog on restore
+  #slave_pg_xlog = /vol2/pg_xlog
+
+  walmgr_data = ~/walshipping
+  completed_wals = %(walmgr_data)s/logs.complete
+  partial_wals = %(walmgr_data)s/logs.partial
+  full_backup = %(walmgr_data)s/data.master
+  config_backup = %(walmgr_data)s/config.backup
+
+  backup_datadir = yes
+  keep_backups = 0
+  archive_command =
+
+  # primary database connect string for hot standby -- enabling
+  # this will cause the slave to be started in hot standby mode.
+  #primary_conninfo = host=master port=5432 user=postgres
+
+== EXIT STATUS ==
+
+  0::
+    Successful program execution.
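The %(walmgr_data)s references in the sample files are the same %(name)s interpolation used by Python's ConfigParser, which the SkyTools config loader builds on. A quick way to check how such a file expands, using a trimmed-down sample (Python 3's configparser shown here; the syntax is the same):

```python
import configparser

# Trimmed-down sample in the style of the slave.ini above
SAMPLE = """\
[walmgr]
job_name = wal-slave
walmgr_data = ~/walshipping
completed_wals = %(walmgr_data)s/logs.complete
partial_wals = %(walmgr_data)s/logs.partial
full_backup = %(walmgr_data)s/data.master
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)
# %(walmgr_data)s is substituted from the same section at read time
print(cfg.get("walmgr", "full_backup"))   # -> ~/walshipping/data.master
```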
+
+== ENVIRONMENT ==
+
+PostgreSQL environment variables can be used.
+
+== QUICK START [[quick_start]] ==
+
+1. Set up a passwordless SSH key on the Master and write the
+   configuration file
+
+  master$ walmgr3 --ssh-keygen --init-master --slave <slave_hostname>
+
+2. Set up passwordless SSH authentication from Master to Slave and
+   write the configuration file on the Slave
+
+  slave$ walmgr3 --init-slave --ssh-add-key=/tmp/id_rsa.pub
+
+3. Set up logging on Master and Slave
+
+  master$ cp skylog.ini ~postgres/
+  slave$ cp skylog.ini ~postgres/
+
+4. Start the archival process and create a base backup
+
+  master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini setup
+  master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini backup
+
+  CAUTION: starting from PostgreSQL 8.3, archiving is enabled by
+  setting the archive_mode GUC to on. However, changing this parameter
+  requires a server restart.
+
+5. Prepare postgresql.conf and pg_hba.conf on the Slave and start replay
+
+  slave$ walmgr3 /var/lib/postgresql/conf/wal-slave.ini restore
+
+  For Debian-based distributions the standard configuration files are
+  located in the /etc/postgresql/x.x/main directory. If another scheme
+  is used, postgresql.conf and pg_hba.conf should be copied to the
+  Slave full_backup directory. Make sure to disable archive_command in
+  the Slave config.
+
+  'walmgr3 restore' moves the data in place, creates recovery.conf and
+  starts the postmaster in recovery mode.
+
+6. In-progress WAL segments can be backed up with:
+
+  master$ walmgr3 /var/lib/postgresql/conf/wal-master.ini sync
+
+7. To stop replay on the Slave and boot into normal mode, run:
+
+  slave$ walmgr3 /var/lib/postgresql/conf/wal-slave.ini boot
+
+
+
diff --git a/python/walmgr.py b/python/walmgr.py
index d4796401..f990bdef 100755
--- a/python/walmgr.py
+++ b/python/walmgr.py
@@ -34,9 +34,7 @@ Internal commands:
   xrelease               Release backup lock (master)
   xrotate                Rotate backup sets, expire and archive oldest if necessary.
  xpurgewals             Remove WAL files not needed for backup (slave)
-
-Switches:
-  -n                     no action, just print commands
+  xpartialsync           Append data to WAL file (slave)
 """

 import os, sys, re, signal, time, traceback
