summaryrefslogtreecommitdiff
path: root/doc/syntax.txt
blob: 71ab3268bbfdb1b22e625db802791eafad6021f1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255

= PL/Proxy Language Syntax =

The language is similar to plpgsql - string quoting, comments,
semicolon at the statements end.

It contains only 4 statements: `CONNECT`, `CLUSTER`, `RUN` and `SELECT`.

Each function needs to have either `CONNECT` or pair of `CLUSTER` + `RUN` statements
to specify where to run the function.

The `SELECT` statement is optional, if it is missing, there will be default
query generated based on proxy function signature.

The `RUN` statment is also optional, it defaults to `RUN ON ANY`
which means the query will be run random partition.

== CONNECT ==

--------
CONNECT 'libpq connstr';
--------

Specifies exact location where to connect and execute the query.
If several functions have same connstr, they will use same connection.

--------
CONNECT connect_func(...);
CONNECT argname;
CONNECT $argnr;
--------

Connect string is taken from function result or directly from argument.
If several functions have same connstr, they will use same connection.
_(New in 2.0.9)_

*NB*: giving untrusted users ability to specify full connect string creates
security hole.  Eg it can used to read cleartext passwords from `~/.pgpass`
or `pg_service.conf`.  If such function cannot be avoided, it's access rights
need to be restricted.


== CLUSTER ==

--------
CLUSTER 'cluster_name';
--------

Specifies exact cluster name to be run on.  The cluster name will
be passed to plproxy.get_cluster_* functions.
 
--------
CLUSTER cluster_func(..);
--------

Cluster name can be dynamically decided upon proxy function arguments.
`cluster_func` should return text value of final cluster name.

== RUN ON ==

--------
RUN ON ALL;
--------

Query will be run on all partitions in cluster in parallel.

--------
RUN ON ANY;
--------

Query will be run on random partition.

--------
RUN ON <NR>;
--------

Run on partition number `<NR>`.

--------
RUN ON partition_func(..);
--------

Run partition_func() which should return one or more hash values.  (int4)
query will be run on tagged partitions.  If more than one partition was
tagged, query will be sent in parallel to them.

--------
RUN ON argname;
RUN ON $1;
--------

Take hash value directly from function argument.  _(New in 2.0.8)_


== SPLIT ==

--------
SPLIT array_arg_1 [ , array_arg_2 ... ] ;
SPLIT ALL ;
--------

Split the input arrays based on RUN ON statement into per-partition arrays.
This is done by evaluating RUN ON condition for each array element and building
per-partition parameter arrays for each matching partition. During execution
each tagged partition then gets its own subset of the array to process.
_(New in 2.1)_

The semantics of RUN ON statement is slightly changed with SPLIT arrays:

--------
RUN ON partition_func(..);
--------

The array is split between the partitions matching `partition_func()`. Any
SPLIT parameters passed to the function are actually replaced with the
individual array elements.

--------
RUN ON argname;
RUN ON $1;
--------

An array of partition numbers (or hashes) can be passed as `argname`. The function
shall be run on the partitions specified in the array.

--------
RUN ON ANY;
--------

Each element is assigned to random partition.

--------
RUN ON ALL;
RUN ON <NR>;
--------

Unaffected, except for the added overhead of array copying.

Example:

--------
CREATE FUNCTION set_profiles(i_users text[], i_profiles text[])
RETURNS SETOF text AS $$
    CLUSTER 'userdb';
    SPLIT i_users, i_profiles;
    RUN ON hashtext(i_users);
$$ LANGUAGE plproxy;
--------

Given query:

--------
SELECT * FROM set_profiles(ARRAY['foo', 'bar'], ARRAY['a', 'b']);
--------

The hash function is called 2 times:

--------
SELECT * FROM hashtext('foo');
SELECT * FROM hashtext('bar');
--------

And target partitions get queries:

--------
SELECT * FROM set_profiles(ARRAY['foo'], ARRAY['a']);
SELECT * FROM set_profiles(ARRAY['bar'], ARRAY['b']);
--------


== TARGET ==

Specify function name on remote side to be called.  By default
PL/Proxy uses current function name.  _(New in 2.2)_

Following function:

--------
CREATE FUNCTION some_function(username text, num int4)
RETURNS SETOF text AS $$
    CLUSTER 'userdb';
    RUN ON hashtext(username);
    TARGET other_function;
$$ LANGUAGE plproxy;
--------

will run following query on remote side:

--------
SELECT * FROM other_function(username, num);
--------

== SELECT ==

--------
SELECT .... ;
--------

By default, PL/Proxy generates query based on its own signature.
But this can be overrided by giving explicit `SELECT` statement to run.

Everything after `SELECT` until semicolon is taken as SQL to be passed on.
Only argument substitution is done on the contents, otherwise the text
is unparsed.  To avoid a table column to be parsed as function argument,
table aliases should be used.

Query result should have same number of columns as function result
and same names too.

== Argument substitution ==

Proxy function arguments can be referenced using name or `$n` syntax.
Everything that is not argument reference is just passed on.

== Dynamic records ==

PL/Proxy supports function returning plain RECORD type.  Such functions
need the result type specified at call site.  Main use-case is to run
random queries on partitions. _(New in 2.0.6)_

Very simple example:

--------
CREATE OR REPLACE FUNCTION dynamic_query(q text)
RETURNS SETOF RECORD AS $$
    CLUSTER 'mycluster';
    RUN ON ALL;
$$ LANGUAGE plproxy;
--------

Corresponding function in partitions:

--------
CREATE OR REPLACE FUNCTION dynamic_query(sql text)
RETURNS SETOF RECORD AS $$
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN EXECUTE sql
    LOOP
        RETURN NEXT rec;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
--------

Sample request:

--------
SELECT * FROM dynamic_query('SELECT id, username FROM sometable') AS (id integer, username text);
--------

The types given in AS clause must match actual types from query.