-rw-r--r--  doc/common.config.txt     38
-rw-r--r--  doc/common.logutriga.txt  19
-rw-r--r--  doc/common.switches.txt   29
-rw-r--r--  doc/config.txt            29
-rw-r--r--  doc/londiste.ref.txt     309
-rw-r--r--  doc/londiste.txt         375
-rw-r--r--  doc/overview.txt         202
-rw-r--r--  doc/pgqadm.txt           189
-rw-r--r--  doc/queue_mover.txt       87
-rw-r--r--  doc/queue_splitter.txt    99
-rw-r--r--  doc/upgrade.txt           52
11 files changed, 0 insertions, 1428 deletions
diff --git a/doc/common.config.txt b/doc/common.config.txt
deleted file mode 100644
index 7a74623f..00000000
--- a/doc/common.config.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-
-=== Common configuration parameters ===
-
- job_name::
- Name of the particular job the script does. The script will log under this
- name to logdb/logserver. The name is also used as the default for the PgQ
- consumer name. It should be unique.
-
- pidfile::
- Location for the pid file. If not given, the script is not allowed to daemonize.
-
- logfile::
- Location for log file.
-
- loop_delay::
- For a continuously running process, how long to sleep after each work loop,
- in seconds. Default: 1.
-
- connection_lifetime::
- Maximum age for a database connection, in seconds; older connections are closed and reconnected.
-
- use_skylog::
- If set, use `skylog.ini`-based logging configuration.
-
-ifdef::pgq[]
-
-=== Common PgQ consumer parameters ===
-
- pgq_queue_name::
- Queue name to attach to.
- No default.
-
- pgq_consumer_id::
- Consumer ID to use when registering.
- Default: %(job_name)s
-
-endif::pgq[]
-
diff --git a/doc/common.logutriga.txt b/doc/common.logutriga.txt
deleted file mode 100644
index 99f02312..00000000
--- a/doc/common.logutriga.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-
-PgQ trigger function `pgq.logutriga()` sends a table change event into the
-queue in the following format (a small decoding sketch follows the list):
-
- ev_type::
- `(op || ":" || pkey_fields)`, where op is either "I", "U" or "D",
- corresponding to insert, update or delete, and `pkey_fields`
- is a comma-separated list of the table's primary key fields.
- The operation type is always present, but the pkey_fields list can be
- empty if the table has no primary key. Example: `I:col1,col2`
-
- ev_data::
- Urlencoded record of the row data. It uses db-specific urlencoding where
- the presence of '=' is meaningful - a missing '=' means NULL, a present
- '=' means a literal value. Example: `id=3&name=str&nullvalue&emptyvalue=`
-
- ev_extra1::
- Fully qualified table name.
-
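-A minimal Python sketch for decoding `ev_data` under the '=' rule above
-(names and values are otherwise plain urlencoding):
-
- from urllib.parse import unquote_plus
-
- def decode_ev_data(ev_data):
-     row = {}
-     for field in ev_data.split('&'):
-         if '=' in field:
-             key, _, val = field.partition('=')
-             row[unquote_plus(key)] = unquote_plus(val)
-         else:
-             row[unquote_plus(field)] = None   # no '=' means NULL
-     return row
-
- # decode_ev_data('id=3&name=str&nullvalue&emptyvalue=')
- # -> {'id': '3', 'name': 'str', 'nullvalue': None, 'emptyvalue': ''}
-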
diff --git a/doc/common.switches.txt b/doc/common.switches.txt
deleted file mode 100644
index 72da7bc3..00000000
--- a/doc/common.switches.txt
+++ /dev/null
@@ -1,29 +0,0 @@
-
-The following switches are common to all skytools.DBScript-based
-Python programs.
-
- -h, --help::
- show help message and exit
-
- -q, --quiet::
- make program silent
-
- -v, --verbose::
- make program more verbose
-
- -d, --daemon::
- make program go background
-
-The following switches are used to control an already running process.
-The pidfile is read from the config, then the signal is sent to the
-process id specified there. (A sketch of the mechanism follows the list.)
-
- -r, --reload::
- reload config (send SIGHUP)
-
- -s, --stop::
- stop program safely (send SIGINT)
-
- -k, --kill::
- kill program immediately (send SIGTERM)
-
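-A small sketch of the mechanism behind these control switches - read the
-process id from the configured pidfile, then send the signal:
-
- import os, signal
-
- def send_signal(pidfile, sig):
-     with open(pidfile) as f:
-         pid = int(f.read().strip())
-     os.kill(pid, sig)
-
- # --reload: send_signal(pidfile, signal.SIGHUP)
- # --stop:   send_signal(pidfile, signal.SIGINT)
- # --kill:   send_signal(pidfile, signal.SIGTERM)
-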
diff --git a/doc/config.txt b/doc/config.txt
deleted file mode 100644
index d1062ad7..00000000
--- a/doc/config.txt
+++ /dev/null
@@ -1,29 +0,0 @@
-
-== Common options ==
-
-job_name
-pidfile
-logfile
-
-loop_delay
-connection_lifetime
-
-use_skylog
-
-== Common to PGQ scripts ==
-
-pgq_queue_name
-
-pgq_consumer_id
-
-== Londiste ==
-
-provider_db =
-subscriber_db =
-
-== PgqAdm ==
-
-maint_delay =
-queue_refresh_period
-ticker_log_delay
-
diff --git a/doc/londiste.ref.txt b/doc/londiste.ref.txt
deleted file mode 100644
index bb6a2e5b..00000000
--- a/doc/londiste.ref.txt
+++ /dev/null
@@ -1,309 +0,0 @@
-
-= Londiste Reference =
-
-== Notes ==
-
-=== PgQ daemon ===
-
-Londiste runs as a consumer on PgQ. Thus `pgqadm.py ticker` must be running
-on the provider database. It is preferable to run the ticker on the same
-machine as the database, because it needs low latency, but that is not a
-requirement.
-
-For monitoring you can use `pgqadm.py status` command.
-
-=== Table Names ===
-
-Londiste internally always uses fully schema-qualified table names.
-If a table name is given on the command line without a schema, it simply
-puts "public." in front of it, without looking at search_path.
-
-=== PgQ events used ===
-
-==== Table data change event in SQL format ====
-
-Those events will be inserted by triggers on tables.
-
- * ev_type = 'I' / 'U' / 'D'
- * ev_data = partial SQL statement - the part between `[]` is removed:
- - `[ INSERT INTO table ] (column1, column2) values (value1, value2)`
- - `[ UPDATE table SET ] column2=value2 WHERE pkeycolumn1 = value1`
- - `[ DELETE FROM table WHERE ] pkeycolumn1 = value1`
- * ev_extra1 = table name with schema
-
-Such a partial SQL format is used for two reasons - to conserve space
-and to make it possible to redirect events to another table.
-
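-A hedged sketch of how a consumer might rebuild full statements from such
-events (the bracketed prefixes above are simply re-added; quoting the table
-name, e.g. with skytools.quote_fqident(), is left out for brevity):
-
- def full_sql(ev_type, ev_data, table):
-     if ev_type == 'I':
-         return "INSERT INTO %s %s" % (table, ev_data)
-     elif ev_type == 'U':
-         return "UPDATE %s SET %s" % (table, ev_data)
-     elif ev_type == 'D':
-         return "DELETE FROM %s WHERE %s" % (table, ev_data)
-     raise ValueError('unknown event type: %r' % ev_type)
-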
-==== Table data change event in urlencoded format ====
-
-Those events will be inserted by triggers on tables.
-
- * ev_type = 'I' / 'U' / 'D' + ':' + list of pkey columns.
-   E.g. `I:lastname,firstname`
- * ev_data = urlencoded values of all columns of the row.
-   NULL is signified by omitting '=' after the column name.
- * ev_extra1 = table name with schema
-
-Urlencoded events take more space than SQL events, but are more
-easily parseable by other scripts.
-
-==== Table addition event ====
-
-This event will be inserted by 'londiste add-table' on root.
-
- * ev_type = 'londiste.add-table'
- * ev_data = table name
-
-All subscribers downstream will also register this table
-as being available on the queue.
-
-==== Table removal event ====
-
-This event will be inserted by 'londiste remove-table' on root.
-
- * ev_type = 'londiste.remove-table'
- * ev_data = table name
-
-All subscribers downstream will now unregister this table
-as being available on the queue. If they happen to be subscribed
-to this table locally, the table is unsubscribed.
-
-==== SQL script execution event ====
-
-This event is inserted by 'londiste execute' on the root node.
-The insert happens in the same transaction in which the actual commands
-are executed.
-
- * ev_type = 'EXECUTE'
- * ev_data = script body
- * ev_extra1 = unique id for script (file name?)
-
-The script is identified by name, which is used to check whether it has
-already been applied. This allows overriding scripts in downstream nodes.
-
-== table attributes ==
-
-* skip_truncate
-* copy_where
-
-== log file ==
-
-Londiste's normal log consists of just statistics log lines - key-value
-pairs between `{}`. Their meaning:
-
- * count: how many events were in the batch.
- * ignored: how many of them were ignored - table not registered on the subscriber or not yet in sync.
- * duration: how long the batch processing took, in seconds.
-
-Example:
-
- {count: 110, duration: 0.88}
-
-== Commands for managing provider database ==
-
-=== provider install ===
-
- londiste.py <config.ini> provider install
-
-Installs code into the provider database and creates the queue.
-Equivalent to doing the following by hand:
-
- CREATE LANGUAGE plpgsql;
- CREATE LANGUAGE plpython;
- \i .../contrib/txid.sql
- \i .../contrib/logtriga.sql
- \i .../contrib/pgq.sql
- \i .../contrib/londiste.sql
- select pgq.create_queue(queue name);
-
-Notes:
-
- * The schema/tables are installed under the user Londiste is configured
-   to run as. If you prefer to run Londiste under a non-admin user,
-   they should also be installed by hand.
-
-=== provider add ===
-
- londiste.py <config.ini> provider add <table name> ...
-
-Registers the table on the provider database and adds a trigger to the
-table that will send events to the queue.
-
-=== provider remove ===
-
- londiste.py <config.ini> provider remove <table name> ...
-
-Unregisters the table on the provider side and removes the triggers from
-the table. The table removal event is also sent to the queue, so all
-subscribers unregister the table on their end as well.
-
-=== provider tables ===
-
- londiste.py <config.ini> provider tables
-
-Shows registered tables on provider side.
-
-=== provider seqs ===
-
- londiste.py <config.ini> provider seqs
-
-Shows registered sequences on provider side.
-
-== Commands for managing subscriber database ==
-
-=== subscriber install ===
-
- londiste.py <config.ini> subscriber install
-
-Installs code into subscriber database.
-Equivalent to doing following by hand:
-
- CREATE LANGUAGE plpgsql;
- \i .../contrib/londiste.sql
-
-This will be done under the Londiste user; if the tables should be
-owned by someone else, it needs to be done by hand.
-
-=== subscriber add ===
-
- londiste.py <config.ini> subscriber add <table name> ... [--expect-sync | --skip-truncate | --force]
-
-Registers table on subscriber side.
-
-Switches
-
- --expect-sync:: Table is tagged as in-sync so initial COPY is skipped.
- --skip-truncate:: When doing initial COPY, don't remove old data.
- --force:: Ignore table structure differences.
-
-=== subscriber remove ===
-
- londiste.py <config.ini> subscriber remove <table name> ...
-
-Unregisters the table from the subscriber. No events will be applied
-to the table anymore. The actual table will not be touched.
-
-=== subscriber resync ===
-
- londiste.py <config.ini> subscriber resync <table name> ...
-
-Tags tables as "not synced". Later the replay process will notice this
-and launch a `copy` process to sync the table again.
-
-== Replication commands ==
-
-=== replay ===
-
-The actual replication process. Should be run as a daemon with the `-d`
-switch, because it needs to be always running.
-
-Its main task is to get batches from PgQ and apply them, each in one
-transaction.
-
-Basic logic (sketched in code after this list):
-
- * Get a batch from the PgQ queue on the provider. See if it is already
-   applied to the subscriber; skip the batch in that case.
- * Management actions, which can do transactions on the subscriber:
-   - Load table state from the subscriber, to be up-to-date on registrations
-     and `copy` processes running in parallel.
-   - If a `copy` process wants to hand a table over to the main process,
-     wait until the `copy` process catches up.
-   - If there is a table that is not synced and no `copy` process
-     is already running, launch a new `copy` process.
-   - If there are sequences registered on the subscriber, look up their
-     latest state on the provider and apply it to the subscriber.
- * Event replay, all in one transaction on the subscriber:
-   - Apply events from the batch, only for tables that are registered
-     on the subscriber and are in sync.
-   - Store the tick_id on the subscriber.
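-
-A rough Python-shaped sketch of one iteration (all helper names here are
-hypothetical; the real loop lives in the londiste replay code):
-
- def replay_iteration(src_db, dst_db):
-     batch_id, ev_list = load_batch(src_db)       # get batch from provider
-     if batch_id is None or already_applied(dst_db, batch_id):
-         return
-     run_management_tasks(dst_db)                 # registrations, copy handover, seqs
-     apply_events(dst_db, ev_list)                # in-sync, registered tables only
-     store_tick_id(dst_db, batch_id)
-     dst_db.commit()                              # events + tick_id in one transaction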
-
-=== copy (internal) ===
-
-Internal command for the initial SYNC. Launched by `replay` if it notices
-that some tables are not in sync. The reason for doing the table copy in a
-separate process is to avoid blocking the main replay process for a
-long time.
-
-Basic logic:
-
- * Register on the same queue in parallel, with a different name.
- * One transaction on the subscriber:
-   - Drop constraints and indexes.
-   - Truncate the table.
-   - COPY data in.
-   - Restore constraints and indexes.
-   - Tag the table as `catching-up`.
- * When catching up, the `copy` process acts as a regular
-   `replay` process, but just for one table.
- * When it reaches the queue end - when no more batches are immediately
-   available - it hands the table over to the main `replay` process.
-
-State changes between `replay` and `copy`:
-
- State                | Owner  | What is done
- ---------------------+--------+--------------------
- NULL                 | replay | Changes state to "in-copy", launches londiste.py copy process, continues with its work
- in-copy              | copy   | Drops indexes, truncates, copies data in, restores indexes, changes state to "catching-up"
- catching-up          | copy   | Replays events for this table only until no more batches (i.e. the current moment),
-                      |        | changes state to "wanna-sync:<tick_id>" and waits for the state to change
- wanna-sync:<tick_id> | replay | Catches up to the given tick_id, changes state to "do-sync:<tick_id>" and waits for the state to change
- do-sync:<tick_id>    | copy   | Catches up to the given tick_id - both replay and copy are now at the same position -
-                      |        | changes state to "ok" and exits
- ok                   | replay | Table is synced, events can be applied
-
-Such state changes must guarantee that any process can die at any time and,
-by just restarting, can continue where it left off.
-
-"subscriber add" registers table with `NULL` state. "subscriber add --expect-sync" registers table with `ok` state.
-
-"subscriber resync" sets table state to `NULL`.
-
-== Utility commands ==
-
-=== repair ===
-
-Tries to achieve a state where the tables should be in sync, then compares
-them and writes out SQL statements that would fix the differences.
-
-Syncing happens by locking provider tables against updates and then waiting
-until `replay` has applied all pending changes to the subscriber database.
-As this is a dangerous operation, it has a hardwired limit of 10 seconds
-for locking. If the `replay` process does not catch up in that time, the
-locks are released and the operation is cancelled.
-
-Comparing happens by dumping out the table from both sides, sorting the
-dumps and then comparing them line by line. As this is a CPU- and
-memory-hungry operation, good practice is to run the `repair` command on a
-third machine, to avoid consuming resources on either the provider or the
-subscriber.
-
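-A rough sketch of the compare step, assuming DB-API style cursors (`dump`
-and `diff_tables` are hypothetical helpers, not the shipped implementation):
-
- import difflib
-
- def dump(curs, table, pkey):
-     curs.execute("select * from %s order by %s" % (table, pkey))
-     return ["%r" % (row,) for row in curs.fetchall()]
-
- def diff_tables(src_curs, dst_curs, table, pkey):
-     a = dump(src_curs, table, pkey)
-     b = dump(dst_curs, table, pkey)
-     for line in difflib.unified_diff(a, b, 'provider', 'subscriber', lineterm=''):
-         print(line)
-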
-=== compare ===
-
-Syncs tables like `repair`, but just runs SELECT count(*) on both sides -
-a somewhat cheaper, but also less precise, way of checking
-whether the tables are in sync.
-
-== Config file ==
-
- [londiste]
- job_name = test_to_subscriber
-
- # source database, where the queue resides
- provider_db = dbname=provider port=6000 host=127.0.0.1
-
- # destination database
- subscriber_db = dbname=subscriber port=6000 host=127.0.0.1
-
- # the queue where to listen on
- pgq_queue_name = londiste.replika
-
- # where to log
- logfile = ~/log/%(job_name)s.log
-
- # pidfile is used for avoiding duplicate processes
- pidfile = ~/pid/%(job_name)s.pid
-
diff --git a/doc/londiste.txt b/doc/londiste.txt
deleted file mode 100644
index a6665927..00000000
--- a/doc/londiste.txt
+++ /dev/null
@@ -1,375 +0,0 @@
-= londiste(1) =
-
-
-== NAME ==
-
-londiste - PostgreSQL replication engine written in python
-
-== SYNOPSIS ==
-
- londiste.py [option] config.ini command [arguments]
-
-== DESCRIPTION ==
-
-Londiste is the PostgreSQL replication engine portion of the SkyTools suite,
-by Skype. This suite includes packages implementing specific replication
-tasks and/or solutions in layers, building upon each other.
-
-PgQ is a generic queue implementation based on ideas from Slony-I's
-snapshot based event batching. Londiste uses PgQ as its transport
-mechanism to implement a robust and easy to use replication solution.
-
-Londiste is an asynchronous master-slave(s) replication
-system. Asynchronous means that a transaction committed on the master is
-not guaranteed to have made it to any slave at the master's commit time;
-and master-slave means that data changes on slaves are not reported back
-to the master - data flows from the master to the slaves only.
-
-The replication is trigger based, and you choose a set of tables to
-replicate from the provider to the subscriber(s). Any data changes
-occurring on the provider (in a replicated table) will fire the
-londiste trigger, which fills a queue of events for any subscriber(s) to
-care about.
-
-A replay process consumes the queue in batches, and applies all given
-changes to any subscriber(s). The initial replication step involves using
-PostgreSQL's COPY command for efficient data loading.
-
-== QUICK-START ==
-
-Basic londiste setup and usage can be summarized by the following
-steps:
-
- 1. create the subscriber database, with tables to replicate
-
- 2. install pgq on both databases and launch pgq daemon.
-
- $ edit pgqadm-master.ini
- $ edit pgqadm-slave.ini
- $ pgqadm pgqadm-master.ini install
- $ pgqadm pgqadm-slave.ini install
- $ pgqadm pgqadm-master.ini ticker -d
- $ pgqadm pgqadm-slave.ini ticker -d
-
- 3. create londiste config file for both databases.
-
- $ edit londiste-master.ini
- $ edit londiste-slave.ini
-
- 4. create londiste nodes. This also installs londiste code.
-
- $ londiste londiste-master.ini create-root master
- $ londiste londiste-slave.ini create-branch slave --provider="master connstr"
-
- 5. launch londiste daemons for both databases.
-
- $ londiste londiste-master.ini replay -d
- $ londiste londiste-slave.ini replay -d
-
- 6. add tables to master
-
- $ londiste londiste-master.ini add-table mytbl1 mytbl2
-
- 7. add tables to slave
-
- $ londiste londiste-slave.ini add-table mytbl1 mytbl2
-
-To replicate to more than one subscriber database just repeat each of the
-described subscriber steps for each subscriber.
-
-== COMMANDS ==
-
-The londiste command is parsed globally, and has both options and
-subcommands. Some options are reserved to a subset of the commands,
-and others should be used without any command at all.
-
-== GENERAL OPTIONS ==
-
-This section presents options available to any londiste
-command.
-
- -h, --help::
- show this help message and exit
-
- -q, --quiet::
- make program silent
-
- -v, --verbose::
- make program more verbose
-
-
-== CASCADING COMMANDS ==
-
-=== install ===
-
-Installs code into the database given in the config file and creates the
-queue. Equivalent to doing the following by hand:
-
- CREATE LANGUAGE plpgsql;
- \i .../contrib/txid.sql
- \i .../contrib/pgq.sql
- \i .../contrib/pgq_node.sql
- \i .../contrib/londiste.sql
-
-=== create-root ===
-
-Initializes the master database as the root node of the replication
-cascade (see the quick-start above for usage).
-
-=== create-branch ===
-
-Initializes a branch node: a subscriber that keeps a local copy of the
-queue, and can therefore act as provider to further nodes. Takes the
-provider's connect string via --provider (see the quick-start above).
-
-=== create-leaf ===
-
-Initializes a leaf node: a subscriber that does not keep a local copy of
-the queue, and therefore cannot act as provider to further nodes.
-
-=== add-table <table name> ... ===
-
-Registers table(s) on the provider database and adds the londiste trigger to
-the table(s) which will send events to the queue. Table names can be schema
-qualified with the schema name defaulting to public if not supplied.
-
- --all::
- Register all tables in provider database, except those that are
- under schemas 'pgq', 'londiste', 'information_schema' or 'pg_*'.
-
-=== remove-table <table name> ... ===
-
-Unregisters table(s) on the provider side and removes the londiste triggers
-from the table(s). The table removal event is also sent to the queue, so all
-subscribers unregister the table(s) on their end as well. Table names can be
-schema qualified with the schema name defaulting to public if not supplied.
-
-=== provider add-seq <sequence name> ... ===
-
-Registers a sequence on provider.
-
-=== provider remove-seq <sequence name> ... ===
-
-Unregisters a sequence on provider.
-
-=== provider tables ===
-
-Shows registered tables on provider side.
-
-=== provider seqs ===
-
-Shows registered sequences on provider side.
-
-== SUBSCRIBER COMMANDS ==
-
- londiste.py config.ini subscriber <command>
-
-Where command is one of:
-
-=== subscriber install ===
-
-Installs code into subscriber database. Equivalent to doing following
-by hand:
-
- CREATE LANGUAGE plpgsql;
- \i .../contrib/londiste.sql
-
-This will be done under the Postgres Londiste user; if the tables should
-be owned by someone else, it needs to be done by hand.
-
-=== subscriber add <table name> ... ===
-
-Registers table(s) on subscriber side. Table names can be schema qualified
-with the schema name defaulting to `public` if not supplied.
-
-Switches (optional):
-
- --all::
- Add all tables that are registered on provider to subscriber database
- --force::
- Ignore table structure differences.
- --expect-sync::
- Table is already synced by external means so initial COPY is unnecessary.
- --skip-truncate::
- When doing initial COPY, don't remove old data.
-
-=== subscriber remove <table name> ... ===
-
-Unregisters table(s) from subscriber. No events will be applied to
-the table anymore. Actual table will not be touched. Table names can be
-schema qualified with the schema name defaulting to public if not supplied.
-
-=== subscriber add-seq <sequence name> ... ===
-
-Registers a sequence on subscriber.
-
-=== subscriber remove-seq <sequence name> ... ===
-
-Unregisters a sequence on subscriber.
-
-=== subscriber resync <table name> ... ===
-
-Tags table(s) as "not synced". Later the replay process will notice this
-and launch copy process(es) to sync the table(s) again.
-
-=== subscriber tables ===
-
-Shows registered tables on the subscriber side, and the current state of
-each table. Possible state values are:
-
-NEW::
- The table has not yet been considered by londiste.
-
-in-copy::
- Full-table copy is in progress.
-
-catching-up::
- The table is copied; missing events are being replayed onto it.
-
-wanna-sync:<tick-id>::
- The "copy" process has caught up and wants to hand the table over to
- "replay".
-
-do-sync:<tick_id>::
- The "replay" process is ready to accept it.
-
-ok::
- The table is in sync.
-
-
-== REPLICATION COMMANDS ==
-
-=== replay ===
-
-The actual replication process. Should be run as a daemon with the -d
-switch, because it needs to be always running.
-
-Its main task is to get batches of events from PgQ and apply
-them to the subscriber database.
-
-Switches:
-
- -d, --daemon::
- go background
-
- -r, --reload::
- reload config (send SIGHUP)
-
- -s, --stop::
- stop program safely (send SIGINT)
-
- -k, --kill::
- kill program immediately (send SIGTERM)
-
-== UTILITY COMMAND ==
-
-=== repair <table name> ... ===
-
-Attempts to achieve a state where the table(s) is/are in sync, compares
-them, and writes out SQL statements that would fix the differences.
-
-Syncing happens by locking provider tables against updates and then
-waiting until the replay process has applied all pending changes to the
-subscriber database. As this is a dangerous operation, it has a hardwired
-limit of 10 seconds for locking. If the replay process does not catch up
-in that time, the locks are released and the repair operation is cancelled.
-
-Comparing happens by dumping out the table contents of both sides,
-sorting them and then comparing line-by-line. As this is a CPU and
-memory-hungry operation, good practice is to run the repair command on a
-third machine to avoid consuming resources on either the provider or the
-subscriber.
-
-=== compare <table name> ... ===
-
-Syncs tables like repair, but just runs SELECT count(*) on both
-sides to get a little bit cheaper, but also less precise, way of
-checking if the tables are in sync.
-
-== CONFIGURATION ==
-
-Londiste and PgQ both use INI configuration files; your distribution of
-skytools includes examples. You often just have to edit the database
-connection strings, namely db in PgQ ticker.ini and provider_db and
-subscriber_db in londiste conf.ini, as well as logfile and pidfile, to
-adapt them to your system paths.
-
-See `londiste(5)`.
-
-== UPGRADING ==
-
-As the skytools software contains code which is run directly from
-inside the database server (+PostgreSQL+ functions), installing the
-new package version at the OS level is not enough to perform the
-upgrade.
-
-It is still possible to upgrade londiste without stopping the service.
-How to do this depends somewhat on the specific versions you're
-upgrading from and to; please refer to the +upgrade.txt+
-documentation, which can also be found at this url:
-http://skytools.projects.postgresql.org/doc/upgrade.html[upgrade.html]
-
-== CURRENT LIMITATIONS ==
-
-+londiste+, as a trigger based solution, is not able to replicate
-either
-http://www.postgresql.org/docs/current/interactive/ddl.html[DDL] or
-http://www.postgresql.org/docs/current/static/sql-truncate.html[TRUNCATE]
-+SQL+ commands.
-
-Please also note that the cascaded replication scenario is still a
-+TODO+ item, which means +londiste+ is not yet able to properly handle
-the case on its own.
-
-=== DDL ===
-
-If you edit a table definition on the provider, you have to manually
-update the table definition on every subscriber replicating the table
-data. When adding, renaming or removing columns, replication won't
-work for the table until the subscriber is updated too, but +londiste+
-won't lose any items to replicate, and will reapply them once the
-schemas match.
-
-=== TRUNCATE ===
-
-To truncate a table +foo+ which is replicated by +londiste+, the
-easiest way is to remove +foo+ from the subscriber(s), +TRUNCATE+ it on
-the provider and add it again on the subscriber(s):
-
- subscriber> londiste.py conf.ini subscriber remove foo
- provider> psql provider_db -c "truncate foo;"
- subscriber> londiste.py conf.ini subscriber add foo
-
-Of course, you need to perform the subscriber steps on each of your
-subscribers if you have more than one currently replicating the +foo+
-table.
-
-=== Cascaded replication ===
-
-+londiste+ is not yet able to handle cascaded replication. What this
-means is that if you set up three servers A, B and C such that some
-tables from A are replicated to B and the same tables are replicated
-from B to C, and the replication from A to B is stopped, +londiste+
-won't be able to keep the replication to C going by reconfiguring it
-from A to C for you.
-
-== SWITCHOVER ==
-
-While using +londiste+ to replicate data from a provider to a
-subscriber, it is possible to have the subscriber become the provider.
-This can be used, for example, to upgrade from one +PostgreSQL+
-version to another (more recent) one, or from one physical setup to
-another.
-
-The recommended procedure to achieve a switchover is the following
-(a command-level sketch follows the list):
-
- 1. stop all write access to the db.
- 2. let londiste apply the last changes.
- 3. set up a new queue on the slave as provider, add tables.
- 4. subscribe the old master to the new master, add tables with --expect-sync.
- 5. do the needed DDL changes on the new master (triggers, etc).
- 6. allow write access to the new master.
-
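-A hedged, command-level sketch of steps 3-4, using the quick-start style
-commands from earlier in this page (config names, node names and connect
-strings are placeholders, not prescribed values):
-
- # on the old slave: make it the root of a new queue, register tables
- $ londiste londiste-slave.ini create-root newmaster
- $ londiste londiste-slave.ini add-table mytbl1 mytbl2
- # on the old master: subscribe it to the new master
- $ londiste londiste-master.ini create-branch oldmaster --provider="newmaster connstr"
- $ londiste londiste-master.ini add-table mytbl1 mytbl2 --expect-sync
-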
-== SEE ALSO ==
-
-`londiste(5)`
-
-https://developer.skype.com/SkypeGarage/DbProjects/SkyTools/[]
-
-http://skytools.projects.postgresql.org/doc/londiste.ref.html[Reference guide]
-
diff --git a/doc/overview.txt b/doc/overview.txt
deleted file mode 100644
index d63eed91..00000000
--- a/doc/overview.txt
+++ /dev/null
@@ -1,202 +0,0 @@
-#pragma section-numbers 2
-
-= SkyTools =
-
-[[TableOfContents]]
-
-== Intro ==
-
-This is a package of tools we use at Skype to manage our cluster of [http://www.postgresql.org PostgreSQL]
-servers. They are put together for our own convenience and also because they build on each other,
-so managing them separately is a pain.
-
-The code is hosted at the [http://pgfoundry.org PgFoundry] site:
-
- http://pgfoundry.org/projects/skytools/
-
-There you will find our [http://pgfoundry.org/frs/?group_id=1000206 downloads] and
-[http://lists.pgfoundry.org/mailman/listinfo/skytools-users mailing list],
-as well as [http://pgfoundry.org/scm/?group_id=1000206 CVS]
-and the [http://pgfoundry.org/tracker/?group_id=1000206 bugtracker].
-
-Combined todo list for all the modules: [http://skytools.projects.postgresql.org/doc/TODO.html TODO.html]
-
-== High-level tools ==
-
-These are scripts meant for end users.
-In our case that means database administrators.
-
-=== Londiste ===
-
-Replication engine written in Python. It uses PgQ as its transport mechanism.
-Its main goals are robustness and easy usage. Thus it's not as complete
-and featureful as Slony-I.
-
-[http://pgsql.tapoueh.org/londiste.html Tutorial] written by Dimitri Fontaine.
-
-Documentation:
-
- * [http://skytools.projects.postgresql.org/doc/londiste.cmdline.html Usage guide]
- * [http://skytools.projects.postgresql.org/doc/londiste.config.html Config file]
- * [http://skytools.projects.postgresql.org/doc/londiste.ref.html Low-level reference]
-
-''' Features '''
-
- * Tables can be added one-by-one into set.
- * Initial COPY for one table does not block event replay for other tables.
- * Can compare tables on both sides.
- * Supports sequences.
- * Easy installation.
-
-''' Missing features '''
-
- * Does not understand cascaded replication: when one subscriber acts
-   as provider to another one and it dies, the latter loses sync with the former.
-   In other words - it understands only pairs of servers.
-
-''' Sample usage '''
-{{{
-## install pgq on provider:
-$ pgqadm.py provider_ticker.ini install
-
-## run ticker on provider:
-$ pgqadm.py provider_ticker.ini ticker -d
-
-## install Londiste in provider
-$ londiste.py replic.ini provider install
-
-## install Londiste in subscriber
-$ londiste.py replic.ini subscriber install
-
-## start replication daemon
-$ londiste.py replic.ini replay -d
-
-## activate tables on provider
-$ londiste.py replic.ini provider add users orders
-
-## add tables to subscriber
-$ londiste.py replic.ini subscriber add users
-}}}
-
-=== PgQ ===
-
-Generic queue implementation. Based on ideas from [http://www.slony1.info/ Slony-I] -
-snapshot based event batching.
-
-''' Features '''
-
- * Generic multi-consumer, multi-producer queue.
- * There can be several consumers on one queue.
- * It is guaranteed that each of them sees an event at least once.
-   But it's not guaranteed that it sees it only once.
- * The goal is to provide a clean API as SQL functions. The frameworks
- on top of that don't need to understand internal details.
-
-''' Technical design '''
-
- * Events are batched using snapshots (like Slony-I).
- * Consumers are poll-only, they don't need to do any administrative work.
- * Queue administration is a separate process from the consumers.
- * Tolerant of long transactions.
- * Easy to monitor.
-
-''' Docs '''
-
- * [http://skytools.projects.postgresql.org/doc/pgq-sql.html SQL API overview]
- * [http://skytools.projects.postgresql.org/pgq/ SQL API detailed docs]
- * [http://skytools.projects.postgresql.org/doc/pgq-admin.html Administrative tool usage]
-
-=== WalMgr ===
-
-Python script for hot failover. Tries to make setup,
-initial copy and the later switchover easy for admins.
-
- * Docs: [http://skytools.projects.postgresql.org/doc/walmgr.html walmgr.html]
-
-Sample:
-
-{{{
- [ .. prepare config .. ]
-
- master$ walmgr master.ini setup
- master$ walmgr master.ini backup
- slave$ walmgr slave.ini restore
-
- [ .. main server down, switch failover server to normal mode: ]
-
- slave$ walmgr slave.ini boot
-}}}
-
-== Low-level tools ==
-
-Those are building blocks for the PgQ and Londiste.
-Useful for database developers.
-
-=== txid ===
-
- Provides 8-byte transaction ids for external usage.
-
-=== logtriga ===
-
- Trigger function for table event logging in "partial SQL" format.
- Based on Slony-I logtrigger. Used in londiste for replication.
-
-=== logutriga ===
-
- Trigger function for table event logging in urlencoded format.
- Written in PL/Python. For cases where data manipulation is necessary.
-
-== Developement frameworks ==
-
-=== skytools - Python framework for database scripting ===
-
-This collects various utilities for Python database scripts.
-
-''' Topics '''
-
- * Daemonization
- * Logging
- * Configuration.
- * Skeleton class for scripts.
- * Quoting (SQL/COPY)
- * COPY helpers.
- * Database object lookup.
- * Table structure detection.
-
-Documentation: http://skytools.projects.postgresql.org/api/
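-
-A minimal, illustrative sketch of a DBScript-based script (see the API docs
-above for the authoritative interface):
-
-{{{
-import sys, skytools
-
-class HelloJob(skytools.DBScript):
-    def work(self):
-        db = self.get_database('db')   # 'db' names a connect string in the config
-        curs = db.cursor()
-        curs.execute("select now()")
-        self.log.info("db time: %s", curs.fetchone()[0])
-        db.commit()
-
-if __name__ == '__main__':
-    HelloJob('hello_job', sys.argv[1:]).start()
-}}}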
-
-=== pgq - Python framework for PgQ consumers ===
-
-This builds on scripting framework above.
-
-Docs:
-
- * [http://skytools.projects.postgresql.org/api/ Python API docs]
-
-== Sample scripts ==
-
-These are specialized scripts based on the skytools/pgq framework.
-They can be considered examples, although they are used in production at Skype.
-
-=== Special data moving scripts ===
-
-There are a couple of scripts for situations where regular replication
-does not fit. They all operate on `logutriga()` urlencoded queues.
-
- * `cube_dispatcher`: Multi-table partitioning on change date, with optional keep-all-row-versions mode.
- * `table_dispatcher`: configurable partitioning for one table.
- * `bulk_loader`: aggregates changes for slow databases. Instead of applying each change in a
-   separate statement, it does a minimal amount of DELETEs and then one big COPY.
-
-|| Script || Supported operations || Number of tables || Partitioning ||
-|| table_dispatcher || INSERT || 1 || any ||
-|| cube_dispatcher || INSERT/UPDATE || any || change time ||
-|| bulk_loader || INSERT/UPDATE/DELETE || any || none ||
-
-=== queue_mover ===
-
-Simply copies all events from one queue to another.
-
-=== scriptmgr ===
-
-Allows starting and stopping several scripts together.
diff --git a/doc/pgqadm.txt b/doc/pgqadm.txt
deleted file mode 100644
index 7c6df5c7..00000000
--- a/doc/pgqadm.txt
+++ /dev/null
@@ -1,189 +0,0 @@
-= pgqadm(1) =
-
-== NAME ==
-
-pgqadm - PgQ ticker and administration interface
-
-== SYNOPSIS ==
-
- pgqadm.py [option] config.ini command [arguments]
-
-== DESCRIPTION ==
-
-PgQ is a Postgres-based event processing system. It is part of the SkyTools
-package, which contains several useful implementations built on this engine.
-The main function of pgqadm is to maintain and keep healthy both the PgQ
-internal tables and the tables that store events.
-
-SkyTools is a scripting framework for Postgres databases, written in Python,
-that provides several utilities and implements common database handling logic.
-
-Event - an atomic piece of data created by producers. In PgQ an event is one
-record in one of the tables that service that queue. The event record contains
-some system fields for PgQ and several data fields filled by producers. PgQ
-neither checks nor enforces the event type; the event type is something that
-consumer and producer must agree on. PgQ guarantees that each event is seen
-at least once, but it is up to the consumer to make sure that an event is
-processed no more than once, if that is needed.
-
-Batch - PgQ is designed for efficiency and high throughput, so events are
-grouped into batches for bulk processing. Creating these batches is one of the
-main tasks of pgqadm, and there are several parameters for each queue that can
-be used to tune the size and frequency of batches. Consumers receive events in
-these batches and, depending on business requirements, process events
-separately or also in batches.
-
-Queue - events are stored in queue tables, i.e. queues. Several producers can
-write into the same queue and several consumers can read from the queue.
-Events are kept in the queue until all the consumers have seen them. Table
-rotation is used to decrease hard disk I/O. A queue can contain any number of
-event types; it is up to the producer and consumer to agree on what types of
-events are passed and how they are encoded. For example, the Londiste producer
-side can produce events for more tables than the consumer side needs, so the
-consumer subscribes only to those tables it needs and events for other tables
-are ignored.
-
-Producer - an application that pushes events into a queue. A producer can be
-written in any language that is able to call stored procedures in Postgres.
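-
-For example, a producer can push an event with a single function call:
-`select pgq.insert_event('queue_name', 'ev_type', 'ev_data');`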
-
-Consumer - an application that reads events from a queue. Consumers can be
-written in any language that can interact with Postgres. The SkyTools package
-contains several useful consumers written in Python that can be used as they
-are, or as good starting points for writing more complex consumers.
-
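-A hedged sketch of a minimal Python consumer built on the pgq module that
-ships with SkyTools (class and callback names follow the SkyTools API docs;
-treat the details as illustrative):
-
- import sys, pgq
-
- class LogConsumer(pgq.Consumer):
-     def process_batch(self, db, batch_id, ev_list):
-         # called once per batch, inside one transaction
-         for ev in ev_list:
-             self.log.info("event %s: %s", ev.ev_type, ev.ev_data)
-             ev.tag_done()
-
- if __name__ == '__main__':
-     LogConsumer('log_consumer', 'db', sys.argv[1:]).start()
-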
-== QUICK-START ==
-
-Basic PgQ setup and usage can be summarized by the following
-steps:
-
- 1. create the database
-
- 2. edit a PgQ ticker configuration file, say ticker.ini
-
- 3. install PgQ internal tables
-
- $ pgqadm.py ticker.ini install
-
- 4. launch the PgQ ticker on the database machine as a daemon
-
- $ pgqadm.py -d ticker.ini ticker
-
- 5. create queue
-
- $ pgqadm.py ticker.ini create <queue>
-
- 6. register or run consumer to register it automatically
-
- $ pgqadm.py ticker.ini register <queue> <consumer>
-
- 7. start producing events
-
-== CONFIG ==
-
- [pgqadm]
- job_name = pgqadm_somedb
-
- db = dbname=somedb
-
- # how often to run maintenance [seconds]
- maint_delay = 600
-
- # how often to check for activity [seconds]
- loop_delay = 0.1
-
- logfile = ~/log/%(job_name)s.log
- pidfile = ~/pid/%(job_name)s.pid
-
-== COMMANDS ==
-
-=== ticker ===
-
-Start ticking & maintenance process. Usually run as daemon with -d option.
-Must be running for PgQ to be functional and for consumers to see any events.
-
-=== status ===
-
-Show overview of registered queues and consumers and queue health.
-This command is used when you want to know what is happening inside PgQ.
-
-=== install ===
-
-Installs the PgQ schema into the database given in the config file.
-
-=== create <queue> ===
-
-Creates queue tables in the pgq schema. As soon as the queue is created,
-producers can start inserting events into it. But be aware that if there
-are no consumers on the queue, the events are lost until a consumer is
-registered.
-
-=== drop <queue> ===
-
-Drops the queue and all its consumers from PgQ. The queue tables are dropped
-and all their contents are lost forever, so use with care, as with most drop
-commands.
-
-=== register <queue> <consumer> ===
-
-Registers the given consumer to listen to the given queue. The first batch
-seen by this consumer is the one completed after registration. Registration
-happens automatically when the consumer is run for the first time, so using
-this command is optional, but it may be needed when producers start producing
-events before the consumer can be run.
-
-=== unregister <queue> <consumer> ===
-
-Removes the consumer from the given queue. Note that the consumer must be
-stopped before issuing this command, otherwise it automatically registers
-itself again.
-
-=== config [<queue> [<variable>=<value> ... ]] ===
-
-Show or change queue config. There are several parameters that can be set
-for each queue, shown here with their default values:
-
-queue_ticker_max_lag (2)::
- If no tick has happened during the given number of seconds, one
- is generated just to keep the queue lag under control. It may be increased
- if there is no need to deliver events fast. Not much room to decrease it :)
-
-queue_ticker_max_count (200)::
- Threshold number of events in a filling batch that triggers a tick.
- Can be increased to encourage PgQ to create larger batches, or decreased
- to encourage faster ticking with smaller batches.
-
-queue_ticker_idle_period (60)::
- Number of seconds that can pass without ticking if no events are coming
- to the queue. These empty ticks are used as keep-alive signals for batch
- jobs and monitoring.
-
-queue_rotation_period (2 hours)::
- Interval of time that may pass before PgQ tries to rotate tables to free up
- space. Note that PgQ can not rotate tables if there are long transactions in
- the database, like VACUUM or pg_dump. May be decreased if low on disk space,
- or increased to keep a longer history of old events. Too small values might
- affect performance badly, because Postgres tends to do seq scans on small
- tables. Too big values may waste disk space.
-
-Looking at the queue config:
-
- $ pgqadm.py mydb.ini config
- testqueue
- queue_ticker_max_lag = 3
- queue_ticker_max_count = 500
- queue_ticker_idle_period = 60
- queue_rotation_period = 7200
- $ pgqadm.py conf/pgqadm_myprovider.ini config testqueue queue_ticker_max_lag=10 queue_ticker_max_count=300
- Change queue testqueue config to: queue_ticker_max_lag='10', queue_ticker_max_count='300'
- $
-
-== COMMON OPTIONS ==
-
- -h, --help::
- show help message
- -q, --quiet::
- make program silent
- -v, --verbose::
- make program verbose
- -d, --daemon::
- go background
- -r, --reload::
- reload config (send SIGHUP)
- -s, --stop::
- stop program safely (send SIGINT)
- -k, --kill::
- kill program immidiately (send SIGTERM)
-
-// vim:sw=2 et smarttab sts=2:
-
diff --git a/doc/queue_mover.txt b/doc/queue_mover.txt
deleted file mode 100644
index 3f8acbfd..00000000
--- a/doc/queue_mover.txt
+++ /dev/null
@@ -1,87 +0,0 @@
-
-= queue_mover(1) =
-
-== NAME ==
-
-queue_mover - PgQ consumer that copies data from one queue to another.
-
-== SYNOPSIS ==
-
- queue_mover.py [switches] config.ini
-
-== DESCRIPTION ==
-
-queue_mover is a PgQ consumer that transports events from a source queue into
-a target queue. One use case is when events are produced in several databases
-and queue_mover is used to consolidate them into a single queue
-that can then be processed by consumers who need to handle these events.
-For example, in the case of partitioned databases it's convenient to move
-events from each partition into one central queue database and then process
-them there. That way the configuration and dependencies of the partition
-databases are simpler and more robust. Another use case is to move events
-from an OLTP database to a batch processing server.
-
-Transactionality: events will be inserted as one transaction on the target
-side. That means only the batch_id needs to be tracked on the target side.
-
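-A hedged sketch of the core idea (the shipped queue_mover.py is the real
-implementation; base class and callback follow the SkyTools consumer API
-and should be treated as illustrative):
-
- import sys, pgq
-
- class QueueMover(pgq.RemoteConsumer):
-     def process_remote_batch(self, db, batch_id, ev_list, dst_db):
-         # re-insert events into the target queue; one transaction per batch
-         dst_queue = self.cf.get('dst_queue_name')
-         curs = dst_db.cursor()
-         for ev in ev_list:
-             curs.execute("select pgq.insert_event(%s, %s, %s)",
-                          [dst_queue, ev.ev_type, ev.ev_data])
-             ev.tag_done()
-
- if __name__ == '__main__':
-     QueueMover('queue_mover', 'src_db', 'dst_db', sys.argv[1:]).start()
-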
-== QUICK-START ==
-
-Basic PgQ setup and usage can be summarized by the following
-steps:
-
- 1. PgQ must be installed both in source and target databases.
- See pgqadm man page for details.
-
- 2. Target database must also have pgq_ext schema installed.
- It is used to keep sync between two databases.
-
- 3. Create a queue_mover configuration file, say qmover_sourceq_to_targetdb.ini
-
- 4. create source and target queues
-
- $ pgqadm.py sourcedb_ticker.ini create <srcqueue>
- $ pgqadm.py targetdb_ticker.ini create <dstqueue>
-
- 5. launch queue mover in daemon mode
-
- $ queue_mover.py -d qmover_sourceq_to_targetdb.ini
-
- 6. start producing and consuming events
-
-
-== CONFIG ==
-
-include::common.config.txt[]
-
-=== queue_mover parameters ===
-
-src_db::
- Source database.
-
-dst_db::
- Target database.
-
-dst_queue_name::
- Target queue name.
-
-=== Example config file ===
-
- [queue_mover]
- job_name = eventlog_to_target_mover
- src_db = dbname=sourcedb
- dst_db = dbname=targetdb
- pgq_queue_name = eventlog
- dst_queue_name = copy_of_eventlog
- pidfile = pid/%(job_name)s.pid
- logfile = log/%(job_name)s.log
-
-== COMMAND LINE SWITCHES ==
-
-include::common.switches.txt[]
-
-== BUGS ==
-
-Event ID is not kept on the target side. If needed, it can be kept; then
-the event_id sequence on the target side needs to be increased by hand to
-inform the ticker about new events.
-
diff --git a/doc/queue_splitter.txt b/doc/queue_splitter.txt
deleted file mode 100644
index 0d4cb3f2..00000000
--- a/doc/queue_splitter.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-= queue_splitter(1) =
-
-== NAME ==
-
-queue_splitter - PgQ consumer that transports events from one queue into several target queues
-
-== SYNOPSIS ==
-
- queue_splitter.py [switches] config.ini
-
-== DESCRIPTION ==
-
-queue_splitter is a PgQ consumer that transports events from a source queue
-into several target queues. The `ev_extra1` field in each event shows which
-target queue it must go to. (`pgq.logutriga()` puts the table name there.)
-
-One use case is to move events from an OLTP database to a batch processing
-server. By using queue_splitter it is possible to move all kinds of events
-for batch processing with one consumer, thus keeping the OLTP database less
-crowded.
-
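-A small, illustrative sketch of the splitting step - group a batch's events
-by `ev_extra1` and insert each group into the queue of that name:
-
- def split_batch(ev_list):
-     routed = {}
-     for ev in ev_list:
-         routed.setdefault(ev.ev_extra1, []).append(ev)
-     return routed   # {target queue name: [events]}
-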
-== QUICK-START ==
-
-Basic queue_splitter setup and usage can be summarized by the following
-steps:
-
- 1. pgq must be installed both in source and target databases.
- See pgqadm man page for details. Target database must also
- have pgq_ext schema installed.
-
- 2. edit a queue_splitter configuration file, say queue_splitter_sourcedb_sourceq_targetdb.ini
-
- 3. create source and target queues
-
- $ pgqadm.py ticker.ini create <queue>
-
- 4. launch queue splitter in daemon mode
-
- $ queue_splitter.py queue_splitter_sourcedb_sourceq_targetdb.ini -d
-
- 5. start producing and consuming events
-
-== CONFIG ==
-
-include::common.config.txt[]
-
-=== queue_splitter parameters ===
-
-src_db::
- Source database.
-
-dst_db::
- Target database.
-
-=== Example config file ===
-
- [queue_splitter]
- job_name = queue_splitter_sourcedb_sourceq_targetdb
-
- src_db = dbname=sourcedb
- dst_db = dbname=targetdb
-
- pgq_queue_name = sourceq
-
- logfile = ~/log/%(job_name)s.log
- pidfile = ~/pid/%(job_name)s.pid
-
-== COMMAND LINE SWITCHES ==
-
-include::common.switches.txt[]
-
-== USECASE ==
-
-How to process events created in a secondary database
-with several queues while having only one queue in the primary
-database. This also shows how easy it is to insert events into
-queues with regular SQL.
-
- CREATE SCHEMA queue;
- CREATE TABLE queue.event1 (
- -- this should correspond to event internal structure
- -- here you can put checks that correct data is put into queue
- id int4,
- name text,
- -- not needed, but good to have:
- primary key (id)
- );
- -- put data into queue in urlencoded format, skip actual insert
- CREATE TRIGGER redirect_queue1_trg BEFORE INSERT ON queue.event1
- FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('singlequeue', 'SKIP');
- -- repeat the above for event2
-
- -- now the data can be inserted:
- INSERT INTO queue.event1 (id, name) VALUES (1, 'user');
-
-If queue_splitter is put on "singlequeue", it spreads the events on the
-target side into queues named "queue.event1", "queue.event2", etc.
-This keeps the PgQ load on the primary database minimal, both CPU-wise
-and maintenance-wise.
-
diff --git a/doc/upgrade.txt b/doc/upgrade.txt
deleted file mode 100644
index 96e29086..00000000
--- a/doc/upgrade.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-
-= Live londiste upgrade =
-
-It is possible to upgrade the +londiste+ software without having to stop the
-service; how to do so depends on the specific versions you're upgrading from
-and to.
-
-This document lists the steps to follow once the new version of the +skytools+
-package is installed on the provider and subscriber hosts where a live upgrade
-is to be performed.
-
-As the +skytools+ software contains code which is run directly from inside the
-database server (+PostgreSQL+ functions), installing the new package version at
-the OS level is not enough to perform the upgrade.
-
-The following sections list the manual steps to perform in order to upgrade a
-running londiste and PgQ solution at the database side.
-
-
-== upgrading from 2.1.5 to 2.1.6 ==
-
-First, the +PgQ+ software has to be upgraded. It is usually installed on the
-provider side of your replication setup, and runs on the provider host.
-
-Then the +londiste+ database functions, which run on both provider and
-subscriber, have to be updated too. You need to repeat the londiste steps for
-each and every running subscriber.
-
-
-=== PgQ upgrade ===
-
- * PgQ (used on Londiste provider side), table structure, plpgsql functions:
-
- $ psql dbname -f upgrade/final/v2.1.5.pgq_core.sql
-
- * PgQ new insert_event(), written in C:
-
- $ psql dbname -f sql/pgq/lowlevel/pgq_lowlevel.sql
-
- * PgQ new triggers (sqltriga, logtriga, logutriga), written in C:
-
- $ psql dbname -f sql/pgq/triggers/pgq_triggers.sql
-
-=== Londiste upgrade ===
-
- * Londiste (both provider and subscriber side)
-
- $ psql dbname -f upgrade/final/v2.1.5.londiste.sql
-
- * pgq_ext:
-
- $ psql dbname -f upgrade/final/v2.1.5.pgq_ext.sql
-
-