| author | Asko Oja | 2012-10-08 14:38:10 +0000 |
|---|---|---|
| committer | Asko Oja | 2012-10-09 06:42:58 +0000 |
| commit | ccfcf87df7fe0a123b563efec1639dec8367c2da (patch) | |
| tree | bb76b6bc8648c7617d1d527bcc5b723704b29c8d /doc | |
| parent | 7c85ff093c17a819f23ecd522b5614aaf753d7ee (diff) | |
Londiste compare and repair fixes
Deprecated the Londiste command line parameter --copy-condition.
The Londiste handler part.py must be used instead, as it leaves
the configuration in a maintainable state.
Fixed Londiste compare and repair for one-to-many and
many-to-one replication use cases. The filtering condition
is now applied on both ends, so the compared datasets
should be the same.
Cleaned up database splitting howto.
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/howto/londiste3_partitioning_howto.txt | 158 |
1 file changed, 72 insertions, 86 deletions
diff --git a/doc/howto/londiste3_partitioning_howto.txt b/doc/howto/londiste3_partitioning_howto.txt
index ba83fabb..d206add1 100644
--- a/doc/howto/londiste3_partitioning_howto.txt
+++ b/doc/howto/londiste3_partitioning_howto.txt
@@ -9,51 +9,46 @@ The target databases will have `partconf` schema which is usually used
 to drive PL/Proxy. Here it is used simply to provide configuration to
 `part` handler.
 
+== Prerequisites ==
+
+Obviously skytools must be installed, but in addition we need pghashlib and
+pgbench.
+
 == Setting up the Root Database ==
 
-=== Create database ===
+=== Create databases ===
+
+Create the root database that will contain all data and two shard databases.
 
 Run the following SQL:
 ----
-CREATE DATABASE l3part_root;
+psql -c "CREATE DATABASE rootdb;"
+psql -c "CREATE DATABASE sharddb_0;"
+psql -c "CREATE DATABASE sharddb_1;"
 ----
+
+Deploy the hash function everywhere. This is needed because the internal hashtext
+function was changed between versions 8.3 and 8.4 and may be changed again
+in the future without consideration for its users.
+
+----
+psql rootdb < /usr/share/postgresql/8.4/contrib/hashlib.sql
+psql sharddb_0 < /usr/share/postgresql/8.4/contrib/hashlib.sql
+psql sharddb_1 < /usr/share/postgresql/8.4/contrib/hashlib.sql
+----
+
 === Set up pgbench schema ===
 
 In this HowTo we are using pgbench for setting up the schema,
 populating it with sampledata and later running SQL loads to be replicated.
-
 This command will create pgbanch tables and fill them with data:
 ----
-/usr/lib/postgresql/9.1/bin/pgbench -i -s 2 -F 80 l3part_root
-----
-
-=== Change primary key columns to text ===
-
-Standard pgbench schema has integer primary key columns for its tables.
-The standard partitioning handler is able to partition only text columns,
-so we change the primary key column types to text
-
-
-----
-alter table pgbench_accounts alter column aid type text;
-alter table pgbench_branches alter column bid type text;
-alter table pgbench_tellers alter column tid type text;
-----
-
-Now create the partition databases to replicate to.
-Each of these will get roughly half of the individual data rows.
-
-
-Create database for partition #0:
-----
-createdb l3part_part0;
+/usr/lib/postgresql/8.4/bin/pgbench -i -s 2 -F 80 rootdb
 ----
 
-And create a partition configuration table in this database
+Write partconf.sql that will be deployed to all databases.
 
 ----
-
 CREATE SCHEMA partconf;
 CREATE TABLE partconf.conf (
     part_nr integer,
@@ -63,91 +58,85 @@ CREATE TABLE partconf.conf (
     max_slot integer,
     cluster_name text
 );
-insert into partconf.conf(part_nr, max_part) values(0,1);
+CREATE FUNCTION partconf.get_hash_raw
+( i_input integer
+) RETURNS integer
+AS
+$_$
+-- used to wrap hashtext so that we can replace it in 8.4 with
+-- older implementation to keep compatibility
+select hash_string($1::text, 'lookup2');
+$_$
+LANGUAGE sql;
 ----
+
+Populate the shard configuration tables. These values are used inside part.py.
 
-Create database for partition #1:
-----
-CREATE DATABASE l3part_part1;
-----
-
-
-----
-
-CREATE SCHEMA partconf;
-CREATE TABLE partconf.conf (
-    part_nr integer,
-    max_part integer,
-    db_code bigint,
-    is_primary boolean,
-    max_slot integer,
-    cluster_name text
-);
-insert into partconf.conf(part_nr, max_part) values(1,1);
+----
+psql rootdb < partconf.sql
+psql sharddb_0 < partconf.sql
+psql sharddb_1 < partconf.sql
+psql sharddb_0 -c "insert into partconf.conf(part_nr, max_part) values(0,1);"
+psql sharddb_1 -c "insert into partconf.conf(part_nr, max_part) values(1,1);"
 ----
 
 Next create configuration files file for root node and both partitions
 
-st3partsplit/st3_l3part_root.ini
+st3partsplit/st3_rootdb.ini
 ----
 [londiste3]
-job_name = st3_l3part_root
-db = dbname=l3part_root
+job_name = st3_rootdb
+db = dbname=rootdb
 queue_name = replika
-logfile = st3partsplit/log/st3_l3part_root.log
-pidfile = st3partsplit/pid/st3_l3part_root.pid
+logfile = st3partsplit/log/st3_rootdb.log
+pidfile = st3partsplit/pid/st3_rootdb.pid
 ----
 
-st3partsplit/st3_l3part_part0.ini
+st3partsplit/st3_sharddb_0.ini
 ----
 [londiste3]
-job_name = st3_l3part_part0
-db = dbname=l3part_part0
+job_name = st3_sharddb_0
+db = dbname=sharddb_0
 queue_name = replika
-logfile = st3partsplit/log/st3_l3part_part0.log
-pidfile = st3partsplit/pid/st3_l3part_part0.pid
+logfile = st3partsplit/log/st3_sharddb_0.log
+pidfile = st3partsplit/pid/st3_sharddb_0.pid
 ----
 
-st3partsplit/st3_l3part_part1.ini
+st3partsplit/st3_sharddb_1.ini
 ----
 [londiste3]
-job_name = st3_l3part_part1
-db = dbname=l3part_part1
+job_name = st3_sharddb_1
+db = dbname=sharddb_1
 queue_name = replika
-logfile = st3partsplit/log/st3_l3part_part1.log
-pidfile = st3partsplit/pid/st3_l3part_part1.pid
+logfile = st3partsplit/log/st3_sharddb_1.log
+pidfile = st3partsplit/pid/st3_sharddb_1.pid
 ----
 
 Then create root node:
 ----
-londiste3 st3partsplit/st3_l3part_root.ini create-root node1 dbname=l3part_root
+londiste3 st3partsplit/st3_rootdb.ini create-root node1 dbname=rootdb
 ----
 
 And start the worker on root:
 ----
-londiste3 -d st3partsplit/st3_l3part_root.ini worker
+londiste3 -d st3partsplit/st3_rootdb.ini worker
 ----
-
-
 And create leaf nodes and start the workers on partitions :
 ----
-londiste3 st3partsplit/st3_l3part_part0.ini create-leaf node2_0 dbname=l3part_part0 --provider=dbname=l3part_root
-londiste3 -d st3partsplit/st3_l3part_part0.ini worker
+londiste3 st3partsplit/st3_sharddb_0.ini create-leaf node2_0 dbname=sharddb_0 --provider=dbname=rootdb
+londiste3 -d st3partsplit/st3_sharddb_0.ini worker
 ----
 
 Second node:
 ----
-londiste3 st3partsplit/st3_l3part_part1.ini create-leaf node2_1 dbname=l3part_part1 --provider=dbname=l3part_root
-londiste3 -d st3partsplit/st3_l3part_part1.ini worker
+londiste3 st3partsplit/st3_sharddb_1.ini create-leaf node2_1 dbname=sharddb_1 --provider=dbname=rootdb
+londiste3 -d st3partsplit/st3_sharddb_1.ini worker
 ----
-
-
 Create config file st3partsplit/pgqd.ini for `pgqd` ("the ticker")
 ----
 [pgqd]
@@ -157,14 +146,11 @@ pidfile = st3partsplit/pid/pgqd.pid
 ----
 
-
 Start the ticker process :
 ----
 pgqd -d st3partsplit/pgqd.ini
 ----
-
-
 Now add the replicated tables to root and partitions.
 Here we use `--create` switch to add them to partition,
 which means Londiste takes schema from root node and
@@ -177,31 +163,31 @@ the `--handler-arg=key=*id` specifyies which key field to partition on.
 
 Run command the following commands :
 ----
-londiste3 st3partsplit/st3_l3part_root.ini add-table pgbench_accounts --handler=part --handler-arg=key=aid
-londiste3 st3partsplit/st3_l3part_part0.ini add-table pgbench_accounts --create --handler=part --handler-arg=key=aid
-londiste3 st3partsplit/st3_l3part_part1.ini add-table pgbench_accounts --create --handler=part --handler-arg=key=aid
+londiste3 st3partsplit/st3_rootdb.ini add-table pgbench_accounts --handler=part --handler-arg=key=aid
+londiste3 st3partsplit/st3_sharddb_0.ini add-table pgbench_accounts --create --handler=part --handler-arg=key=aid
+londiste3 st3partsplit/st3_sharddb_1.ini add-table pgbench_accounts --create --handler=part --handler-arg=key=aid
 
-londiste3 st3partsplit/st3_l3part_root.ini add-table pgbench_branches --handler=part --handler-arg=key=bid
-londiste3 st3partsplit/st3_l3part_part0.ini add-table pgbench_branches --create --handler=part --handler-arg=key=bid
-londiste3 st3partsplit/st3_l3part_part1.ini add-table pgbench_branches --create --handler=part --handler-arg=key=bid
+londiste3 st3partsplit/st3_rootdb.ini add-table pgbench_branches --handler=part --handler-arg=key=bid
+londiste3 st3partsplit/st3_sharddb_0.ini add-table pgbench_branches --create --handler=part --handler-arg=key=bid
+londiste3 st3partsplit/st3_sharddb_1.ini add-table pgbench_branches --create --handler=part --handler-arg=key=bid
 
-londiste3 st3partsplit/st3_l3part_root.ini add-table pgbench_tellers --handler=part --handler-arg=key=tid
-londiste3 st3partsplit/st3_l3part_part0.ini add-table pgbench_tellers --create --handler=part --handler-arg=key=tid
-londiste3 st3partsplit/st3_l3part_part1.ini add-table pgbench_tellers --create --handler=part --handler-arg=key=tid
+londiste3 st3partsplit/st3_rootdb.ini add-table pgbench_tellers --handler=part --handler-arg=key=tid
+londiste3 st3partsplit/st3_sharddb_0.ini add-table pgbench_tellers --create --handler=part --handler-arg=key=tid
+londiste3 st3partsplit/st3_sharddb_1.ini add-table pgbench_tellers --create --handler=part --handler-arg=key=tid
 ----
 
 The following command will run pgbench full speed with 5 parallel
 database connections for 10 seconds.
 
 ----
-/usr/lib/postgresql/9.1/bin/pgbench -T 10 -c 5 l3part_root
+/usr/lib/postgresql/8.4/bin/pgbench -T 10 -c 5 rootdb
 ----
 
 After this is done, you can check that the tables on both sides hanve
 the same data with
 ----
-londiste3 st3partsplit/st3_l3part_part0.ini compare
-londiste3 st3partsplit/st3_l3part_part0.ini compare
+londiste3 st3partsplit/st3_sharddb_0.ini compare
+londiste3 st3partsplit/st3_sharddb_1.ini compare
 ----
 
 Except of course that they dont - each partition will only have roughly half
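As an illustration of that last point (not part of the patch), the split can be eyeballed by counting rows per database once the workers have caught up; the database names are the ones used in the howto above:

```sh
# Each shard should end up with roughly half of pgbench_accounts,
# while the root keeps the full set.
psql rootdb    -c "SELECT count(*) FROM pgbench_accounts;"
psql sharddb_0 -c "SELECT count(*) FROM pgbench_accounts;"
psql sharddb_1 -c "SELECT count(*) FROM pgbench_accounts;"
```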
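A rough sketch of why the rows split this way: the `part` handler hashes the configured key column and keeps only rows that map to the local `part_nr`. Assuming the rule is a bitmask of the hash against `max_part` (an assumption on my part, not something stated in the patch), a shard can be asked where a given key value, say aid 42, should live. `partconf.get_hash_raw` comes from the partconf.sql in the patch and relies on pghashlib being installed:

```sh
# Hypothetical routing check against one shard's partconf.conf row.
psql sharddb_0 -c "
  SELECT part_nr,
         partconf.get_hash_raw(42) & max_part AS computed_part,
         (partconf.get_hash_raw(42) & max_part) = part_nr AS belongs_here
    FROM partconf.conf;"
```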
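Finally, tying back to the commit message: with the filtering condition now applied on both ends, compare (and, if differences turn up, repair) is meant to be run against each shard's configuration. A sketch using the config files from this howto; limiting repair to a single table is optional and shown here only for illustration:

```sh
# compare checks row counts/checksums of each shard's subset against the root;
# repair re-syncs a table when compare reports a mismatch.
londiste3 st3partsplit/st3_sharddb_0.ini compare
londiste3 st3partsplit/st3_sharddb_1.ini compare
londiste3 st3partsplit/st3_sharddb_0.ini repair pgbench_accounts
londiste3 st3partsplit/st3_sharddb_1.ini repair pgbench_accounts
```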
