doc/tutorial.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266


= PL/Proxy tutorial =

This section explains how to use PL/Proxy to proxy queries across a set of 
remote databases. For purposes of this intro we assume 
each remote database has a "user" table that contains a username and an email 
column.

We also assume that the data is partitioned across remote databases by taking
a hash of the username and assigning it to one of 2 servers. Real applications 
should use a partitioning scheme that is appropriate to their requirements.

For the purposes of this example assume that the partition databases part00
and part01 both contain a table resembling

------------
CREATE TABLE users (
    username text,
    email text
);
------------


== Installation ==

 1. Download PL/Proxy from http://pgfoundry.org/projects/plproxy and extract.

 2. Build PL/Proxy by running `make` and `make install` inside of the plproxy 
    directory. If your having problems make sure that pg_config from the
    postgresql bin directory is in your path.

 3. To install PL/Proxy in a database execute the commands in the `plproxy.sql` 
    file.  For example `psql -f $SHAREDIR/contrib/plproxy.sql mydatabase`

Steps 1 and 2 can be skipped if your installed pl/proxy from a packaging system
such as RPM.


== Simple remote function call ==

Here we will create a plproxy function that will run on the proxy database which
will connect to a remote database named
'part00' and return a users email address.

This example  uses plproxy in CONNECT mode, it will 
connect to `dbname=part00` and run following SQL there:

------------
CREATE FUNCTION get_user_email(i_username text)
RETURNS SETOF text AS $$
    CONNECT 'dbname=part00';
    SELECT email FROM users WHERE username = $1;
$$ LANGUAGE plproxy;

SELECT * from get_user_email($1);
------------

The above example uses plproxy to proxy the query to the remote database but 
doesn't handle partitioning of data.  It assumes that the entire users table is 
in the remote users database.  The next few steps will describe how to partition 
data with PL/Proxy.


== Create  configuration functions ==

Using PL/Proxy for partitioning requires setting up some configuration functions.
Alternatively, if you are running PostgreSQL 8.4 or above you can take advantage
of the SQL/MED connection management facilities. See below.

When a query needs to be forwarded to a remote database the function
plproxy.get_cluster_partitions(cluster) is invoked by plproxy to get the
connection string to use for each partition. 

The following is an example 

------------
CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
RETURNS SETOF text AS $$
BEGIN
    IF cluster_name = 'usercluster' THEN
        RETURN NEXT 'dbname=part00 host=127.0.0.1';
        RETURN NEXT 'dbname=part01 host=127.0.0.1';
        RETURN;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
END;
$$ LANGUAGE plpgsql;
------------
 
A production application might query some configuration tables to return the
connstrings. The number of partitions must be a power of 2.

Next define a plproxy.get_cluster_version(cluster_name) function.  This is 
called on each request and determines if the output from a cached
result from plproxy.get_cluster_partitions can be reused. 

------------
CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text)
RETURNS int4 AS $$
BEGIN
    IF cluster_name = 'usercluster' THEN
        RETURN 1;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
END;
$$ LANGUAGE plpgsql;
------------

We also need to provide a plproxy.get_cluster_config function, ours will provide
a value for the connection lifetime.  See the configuration section for details 
on what this function can do. 

------------
CREATE OR REPLACE FUNCTION plproxy.get_cluster_config(
    in cluster_name text,
    out key text,
    out val text)
RETURNS SETOF record AS $$
BEGIN
    -- lets use same config for all clusters
    key := 'connection_lifetime';
    val := 30*60; -- 30m
    RETURN NEXT;
    RETURN;
END;
$$ LANGUAGE plpgsql;
------------

The config section contains more information on all of these functions.


== Configuring Pl/Proxy clusters with SQL/MED ==

First we need a foreign data wrapper. This is mostly a placeholder, but can
be extended with a validator function to verify the cluster definition.
See http://www.postgresql.org/docs/8.4/static/sql-createforeigndatawrapper.html
for additional details of how to manage the SQL/MED catalog.

------------
CREATE FOREIGN DATA WRAPPER plproxy;
------------

Then the actual cluster with its configuration options and partitions:

------------
CREATE SERVER usercluster FOREIGN DATA WRAPPER plproxy
OPTIONS (connection_lifetime '1800',
         p0 'dbname=part00 host=127.0.0.1',
         p1 'dbname=part01 host=127.0.0.1' );
------------

We also need a user mapping that maps local PostgreSQL users to remote
partitions. It is possible to create PUBLIC mapping that applies for
all users in the local system:

------------
CREATE USER MAPPING FOR PUBLIC SERVER usercluster;
------------

Or a private mapping that can only be used by specific users:

------------
CREATE USER MAPPING FOR bob SERVER usercluster 
  	OPTIONS (user 'plproxy', password 'salakala');
------------

Finally we need to grant USAGE on the cluster to specific users:

------------
GRANT USAGE ON SERVER usercluster TO bob;
------------


== Partitioned remote call ==

Here we assume that the user table is spread over several databases based
on a hash of the username. The connection string for the partitioned databases 
are contained in the get_cluster_partitions function described above.

Below is a get_user_email function that is executed on the proxy server,which
will make a remote connection to the appropriate partitioned database. The
user's email address will be returned.

This function should be created in the proxy database.

------------
CREATE OR REPLACE FUNCTION get_user_email(i_username text)
RETURNS SETOF text AS $$
    CLUSTER 'usercluster';
    RUN ON hashtext(i_username) ;
    SELECT email FROM users WHERE username = i_username;
$$ LANGUAGE plproxy;
------------


== Inserting into the proper partition ==

Next we provide a simple INSERT function.  

Inserting data through plproxy requires functions to be defined on the proxy 
databases that will perform the insert.

We define this function on both part00 and part01

------------
CREATE OR REPLACE FUNCTION insert_user(i_username text, i_emailaddress text)
RETURNS integer AS $$
       INSERT INTO users (username, email) VALUES ($1,$2);
       SELECT 1;
$$ LANGUAGE SQL;
------------

Now we define a proxy function inside the proxy database to send the 
INSERT's to the appropriate target.


------------
CREATE OR REPLACE FUNCTION insert_user(i_username text, i_emailaddress text)
RETURNS integer AS $$
    CLUSTER 'usercluster';
    RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
------------


== Putting it all together ==

Connect to the proxy database (The one we installed plproxy and the plproxy
functions on).

------------
SELECT insert_user('Sven','sven@somewhere.com');
SELECT insert_user('Marko', 'marko@somewhere.com');
SELECT insert_user('Steve','steve@somewhere.cm');
------------

Now connect to the plproxy_1 and plproxy_2 databases.  Sven and Marko should be
in plproxy_2 , and Steve should be in plproxy_1.

When connected to the proxy user you can obtain data by doing

------------
SELECT get_user_email('Sven');
SELECT get_user_email('Marko');
SELECT get_user_email('Steve');
------------


== Connection pooling ==

When used in partitioned setup, PL/Proxy somewhat wastes connections
as it opens connection to each partition from each backend process.
So it's good idea to use a pooler that can take queries from several
connections and funnel them via smaller number of connections to actual
database.  We use and recommend
http://wiki.postgresql.org/wiki/PgBouncer[PgBouncer]
for that.


== More resources ==

Kristo Kaiv has his own take on tutorial here:
http://kaiv.wordpress.com/category/plproxy/[]