= How To Set Up "Merge" Replication of the Same Table from Multiple Partitions =
== Introduction ==
In this howto we will set up a replication scheme where data is collected
from multiple "partition" databases into a single table on a "full" database.
This situation is common when using PL/Proxy or a similar partitioning
solution for OLTP while still wanting all the data in one table for data warehousing.
Here we will demonstrate the simplest possible setup, with 2 partition databases
(`part1` and `part2`) replicating their data into one full database, `full1`.
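If these three databases do not exist yet, they can be created first with the
standard PostgreSQL tools, for example (adjust connection options to your environment):
----
$ createdb part1
$ createdb part2
$ createdb full1
----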
=== Setting up the nodes ===
==== Partition databases ====
On both partition databases you need to set up a londiste3 "root" node to
replicate from. We will use the following .ini files for that,
for database `part1` `conf/l3_part1_q_part1.ini`:
----
[londiste3]
job_name = l3_part1_q_part1
db = dbname=part1
queue_name = l3_part1_q
logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid
----
and for `part2` `conf/l3_part2_q_part2.ini`:
----
[londiste3]
job_name = l3_part2_q_part2
db = dbname=part2
queue_name = l3_part2_q
logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid
----
These .ini files are then used for setting up the nodes and adding tables to the root nodes.
Set up the root nodes on `part1` and `part2`:
----
$ londiste3 -v conf/l3_part1_q_part1.ini create-root part1_root dbname=part1
$ londiste3 -v conf/l3_part2_q_part2.ini create-root part2_root dbname=part2
----
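If you want to verify that the root nodes were created, the `status` command
prints the queue node tree as londiste sees it (the exact output depends on your setup):
----
$ londiste3 conf/l3_part1_q_part1.ini status
$ londiste3 conf/l3_part2_q_part2.ini status
----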
==== Full database ====
On the full database, which will hold the data from both partitions, you need to
set up two londiste nodes, one for each of the partition nodes.
These will act as the receiving nodes that are replicated to.
Their configuration files look very similar and differ only in job name and queue name:
File `conf/l3_part1_q_full1.ini`:
----
[londiste3]
job_name = l3_part1_q_full1
db = dbname=full1
queue_name = l3_part1_q
logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid
----
File `conf/l3_part2_q_full1.ini`:
----
[londiste3]
job_name = l3_part2_q_full1
db = dbname=full1
queue_name = l3_part2_q
logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid
----
These are first used to set up the leaf nodes:
----
$ londiste3 -v conf/l3_part1_q_full1.ini create-leaf merge_part1_full1 dbname=full1 --provider=dbname=part1
$ londiste3 -v conf/l3_part2_q_full1.ini create-leaf merge_part2_full1 dbname=full1 --provider=dbname=part2
----
Later they are also used for launching the replication worker daemons.
But before launching the workers you need to start pgqd, the "ticker" daemon:
----
$ pgqd -v -d conf/pgqd.ini
----
The `conf/pgqd.ini` file for the command above looks like this:
----
[pgqd]
database_list = part1,part2,full1
logfile = log/pgqd.log
pidfile = pid/pgqd.pid
----
Now that the ticker is running, it's time to launch the londiste3 workers, which will
do the actual replication:
----
$ londiste3 -v -d conf/l3_part1_q_full1.ini worker
$ londiste3 -v -d conf/l3_part2_q_full1.ini worker
----
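The workers detach and run in the background (`-d`), so it is worth checking their
log files to confirm they started cleanly, for example:
----
$ tail log/l3_part1_q_full1.log
$ tail log/l3_part2_q_full1.log
----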
=== Setting up the tables ===
In order to have something to replicate, we need some tables, so let's create
them on the partition nodes:
----
$ psql -d "part1" -c "create table mydata (id int4 primary key, data text)"
$ psql -d "part2" -c "create table mydata (id int4 primary key, data text)"
----
And then add them to the set of replicated tables on each root node:
----
$ londiste3 -v conf/l3_part1_q_part1.ini add-table mydata
$ londiste3 -v conf/l3_part2_q_part2.ini add-table mydata
----
Now we need some data in these tables, as replicating empty tables is no fun:
----
$ psql -d "part1" -c "insert into mydata values (1, 'part1')"
$ psql -d "part2" -c "insert into mydata values (2, 'part2')"
----
We can check that the tables are actually registered for replication in londiste:
----
$ psql -d "full1" -c "select * from londiste.table_info order by queue_name"
nr | queue_name | table_name | local | merge_state | custom_snapshot | dropped_ddl | table_attrs | dest_table
----+------------+---------------+-------+-------------+-----------------+-------------+-------------+------------
1 | l3_part1_q | public.mydata | f | | | | |
2 | l3_part2_q | public.mydata | f | | | | |
(2 rows)
----
Now let's subscribe to them on the full database. As the table does not yet exist on
`full1`, we pass `--create` so londiste creates the table on the leaf node based on the
structure found on the root. The switch `--merge-all` tells londiste to add the table from all
queues which have it on the root side, not just the one from the .ini file.
----
$ londiste3 -v conf/l3_part1_q_full1.ini add-table mydata --create --merge-all
----
And yes, there it is, subscribed from both queues:
----
$ psql -d "full1" -c "select * from londiste.table_info order by queue_name"
nr | queue_name | table_name | local | merge_state | custom_snapshot | dropped_ddl | table_attrs | dest_table
----+------------+---------------+-------+-------------+-----------------+-------------+-------------+------------
1 | l3_part1_q | public.mydata | t | | | | |
2 | l3_part2_q | public.mydata | t | | | | |
(2 rows)
----
Now we can put more data into the partition tables:
----
$ psql -d "part1" -c "insert into mydata values (4 + 1, 'part1')"
$ psql -d "part2" -c "insert into mydata values (4 + 2, 'part2')"
----
Wait a few seconds:
----
$ sleep 10
----
And check that the data has indeed appeared on the full database:
----
$ psql -d "full1" -c "select * from mydata order by id"
id | data
----+-------
1 | part1
2 | part2
5 | part1
6 | part2
(4 rows)
----
The rows with ids 1 and 2 were replicated during the initial copy; the ones with
5 and 6 were captured by triggers into the event queues on the partition databases and then
replicated to full1 using the standard replication process.
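If you want an extra consistency check beyond eyeballing the rows, londiste3 also has a
`compare` command, which compares the contents of the subscribed tables on the provider
and subscriber side (shown here only as an optional step; the exact output varies):
----
$ londiste3 conf/l3_part1_q_full1.ini compare
$ londiste3 conf/l3_part2_q_full1.ini compare
----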
=== Checking the subscription ===
Just to check that we really did achieve what we wanted, let's see which tables
are present and fully subscribed ('ok'):
----
$ psql -d "full1" -c "select * from londiste.table_info order by queue_name"
nr | queue_name | table_name | local | merge_state | custom_snapshot | dropped_ddl | table_attrs | dest_table
----+------------+---------------+-------+-------------+-----------------+-------------+-------------+------------
1 | l3_part1_q | public.mydata | t | ok | | | |
2 | l3_part2_q | public.mydata | t | ok | | | |
(2 rows)
----
Ok, here we have the table public.mydata subscribed from 2 queues, and its
merge_state is 'ok', meaning the initial copy process has been successful.
That's it, we have successfully set up replication from two partition
databases into a single full database.