--- /dev/null
+= PL/Proxy FAQ =
+
+== General ==
+
+=== What is PL/Proxy? ===
+
+PL/Proxy is a compact language for remote calls between PostgreSQL
+databases. The syntax is similar to PL/pgSQL and the language contains
+only four statements.
+
+With PL/Proxy the user can create proxy functions that have the same
+signature as the remote functions to be called. The function body
+describes how the remote connection should be acquired.
+
+When such a proxy function is called, PL/Proxy:
+
+ 1. Automatically generates the SQL query to be executed from the function signature
+ 2. Executes the statements in the function body to get the connection
+ 3. Uses the function arguments as input parameters to the query
+ 4. Passes the query result back as the function result
+
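+For example, a minimal proxy function could look like this (the
+database, host and function names are illustrative):
+
+  CREATE FUNCTION get_user_email(i_username text)
+  RETURNS text AS $$
+      CONNECT 'dbname=userdb host=userserver';
+  $$ LANGUAGE plproxy;
+
+Calling `get_user_email()` locally makes PL/Proxy run
+`public.get_user_email($1)` on the database behind the given
+connect string and return its result.
+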
+
+=== Why functions? ===
+
+Concentrating on just function calls allows PL/Proxy to keep its code
+small and to present the user a simple and compact API.
+
+- The actual query run on the remote database can be generated
+  from the PL/Proxy function signature. The user just needs to
+  specify how the connection to the remote database is acquired.
+
+- There is no need for complex transaction handling, as any
+  multi-statement transaction can be put into a function.
+  PL/Proxy can then execute all queries in 'autocommit' mode.
+
+- Simple autocommit transactions mean that the connection
+ handling is simple and can be done automatically.
+
+Function-based database access also has more general benefits:
+
+- It is good to have the SQL statements that operate on data
+  near the tables. That makes the life of DBAs easier.
+- It makes it possible to optimize and reorganize tables
+  transparently to the application.
+- It enables DBAs to move parts of a database into another
+  database without changing the application interface.
+- Security is easier to manage if you don't have to do it on
+  the table level. In most cases you need to control what a user
+  can do and on which data, not on which tables.
+- All transactions can be run in 'autocommit' mode.
+  That means the absolute minimum number of round trips (one)
+  for each query, and each transaction also takes the shortest
+  possible amount of time on the server - remember that the various
+  locks that transactions acquire are released on COMMIT.
+
+
+=== Why not develop it into Remote Parallel PL/SQL? ===
+
+A huge benefit of PL/Proxy is its compactness and efficiency.
+As it does not need to parse the queries going through it, it adds
+very little overhead.
+
+Making it a full-blown language for SQL execution would mean
+reimplementing PL/pgSQL, PL/Perl, parts of pgpool and more,
+which would be a waste of effort.
+
+Also, when PL/Proxy functions mirror the actual functions,
+PL/Proxy becomes an optional component of the setup - the client
+applications can bypass PL/Proxy and work directly on the actual database.
+
+This is good for testing and also for live deployment - we let
+clients work on smaller databases directly; they are put
+behind PL/Proxy only when the load gets too high and we need
+to partition a database.
+
+
+=== What can PL/Proxy be used for? ===
+
+- Remote calls from one database to another, either inside plain SQL or
+  in other procedures.
+  (If used as part of a local transaction, make sure only
+  one side is allowed to write to the database; PL/Proxy does not
+  guarantee transactionality between two databases.)
+- Proxy databases for better security and data protection.
+- Proxy databases for hiding the complexity of the databases from the
+  application, e.g. if you have functions distributed between several databases.
+- Horizontal partitioning. Instead of buying more powerful servers you can
+  split your data between several servers and then use PL/Proxy to redirect
+  function calls to the right partitions.
+- Load balancing, if you have several read-only replicas of your data.
+
+
+=== How does it compare to dblink? ===
+
+- PL/Proxy handles connections automatically; dblink forces the user to handle them.
+
+- PL/Proxy has a single place where result column types are specified - the
+  function signature. dblink requires them to be specified in each query.
+
+- PL/Proxy makes it easy to run a query on several remote servers in parallel.
+  The dblink async API seems to make that possible too, but the usage is complex.
+
+- dblink allows arbitrarily complex transactions; PL/Proxy runs everything
+  in autocommit mode. As previously discussed, complex transactions are not
+  needed when using functions, and with that tradeoff PL/Proxy can
+  offer a much simpler API.
+
+
+=== How are PL/Proxy and PgBouncer related? ===
+
+PL/Proxy version 1 had the PL and the pooler integrated. But such a design
+caused a lot of unnecessary complexity. For PL/Proxy version 2,
+we wrote both the pooler and the PL part from scratch, each designed
+to be a standalone component.
+
+That allowed both components to be tested and used separately
+and resulted in a compact and robust codebase.
+
+So PgBouncer can be used together with PL/Proxy to lessen the connection
+count on the partition servers, but such usage is not mandatory.
+
+
+== Internals ==
+
+=== What are the external dependencies? ===
+
+It depends only on libpq and the poll(2) and gettimeofday(2) system
+calls, so it should be quite portable.
+
+
+=== How are the remote calls done? ===
+
+First, a SELECT query is generated based on the PL/Proxy
+function signature.
+
+A function signature of:
+
+ CREATE FUNCTION get_data(IN first_name text, IN last_name text, OUT bdate date, OUT balance numeric(20,10))
+
+results in the following query:
+
+ SELECT bdate::date, balance::numeric(20,10) FROM public.get_data($1::text, $2::text);
+
+The casts and explicit `OUT` parameter names are used to survive minor type or
+result column order differences between local and remote databases.
+
+Then the `CLUSTER` statement is processed, optionally executing a
+function. This results in a cluster name.
+
+Then `plproxy.get_cluster_version(<cluster_name>)` is executed. This
+gives a numeric version number for the cluster. If the resulting
+version number differs from the version of the cached cluster, the
+cache is dropped.
+
+If the cluster information is not cached, the `plproxy.get_cluster_partitions()`
+function is executed, resulting in a list of connect strings for that cluster.
+
+Then the `RUN` statement is processed, optionally executing a
+function if requested. This will tag one or more connections
+in the cluster to be used for query execution.
+
+Then the query is sent to the remote server(s) using the libpq async API.
+If several remote connections are tagged, the execution
+happens in parallel. PL/Proxy then waits until it has
+acquired the result sets from all connections and returns
+them to the local backend.
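+
+For illustration, the configuration functions for a hypothetical
+two-partition cluster named 'userdb' could look like this (the
+connect strings and the version constant are placeholders):
+
+  CREATE FUNCTION plproxy.get_cluster_version(i_cluster text)
+  RETURNS integer AS $$
+  BEGIN
+      IF i_cluster = 'userdb' THEN
+          RETURN 1;  -- bump this when the partition list changes
+      END IF;
+      RAISE EXCEPTION 'unknown cluster: %', i_cluster;
+  END;
+  $$ LANGUAGE plpgsql;
+
+  CREATE FUNCTION plproxy.get_cluster_partitions(i_cluster text)
+  RETURNS SETOF text AS $$
+  BEGIN
+      IF i_cluster = 'userdb' THEN
+          RETURN NEXT 'dbname=userdb_p0 host=server1';
+          RETURN NEXT 'dbname=userdb_p1 host=server2';
+          RETURN;
+      END IF;
+      RAISE EXCEPTION 'unknown cluster: %', i_cluster;
+  END;
+  $$ LANGUAGE plpgsql;
+
+A `plproxy.get_cluster_config()` function returning key/value settings
+is also part of the configuration API; returning no rows keeps the
+defaults.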
+
+
+=== How does PL/Proxy handle connections? ===
+
+It opens them lazily - only when needed. It then keeps them
+open until libpq reports an error on them or the connection
+lifetime is over - by default 2 hours.
+
+There is a safety hack used - before sending a query
+to an already open connection, a poll(2) call is run on the
+connection. If poll() shows events, the connection
+is dropped to avoid using a likely broken connection.
+
+
+=== Can PL/Proxy survive different settings in local and remote database? ===
+
+client_encoding::
+  If it differs, PL/Proxy sets the `client_encoding` on the remote
+  database to be equal to the local one.
+
+standard_conforming_strings::
+  Query parameters are passed separately, so in general the difference
+  should not matter - except when the function uses an explicit SELECT
+  that contains literal strings. The fix is to avoid the use of explicit
+  SELECT.
+
+datestyle, timezone::
+  Currently no handling is done.
+
+Rest of the parameters::
+  Cannot be handled.
+
+
+=== Why does PL/Proxy require the number of partitions to be a power of 2? ===
+
+There is no deep reason; mostly it is because of the following points:
+
+- To have minimal sanity-checking on the output of get_cluster_partitions().
+- To have a clear way to map hashes to partitions. As users quite
+  likely need to write their own code for splitting and sanity-checking
+  their data, the algorithm should be as simple as possible.
+
+It would be easy to use mod N internally, but:
+
+- We would lose the sanity checking.
+- We would need to define a mod function for negative integers that
+  maps to the positive range. This sounds like a source of confusion and bugs.
+
+So it seems it's preferable to keep the power-of-2 requirement.
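+
+With a power-of-2 partition count the mapping is a plain bitwise AND,
+e.g. for 16 partitions (the hashed value is an example):
+
+  -- partition number for 16 partitions; unlike plain modulo, this
+  -- also maps negative hash values into 0..15
+  SELECT hashtext('some_username') & 15;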
+
+This may seem to require that the number of servers also be a power of 2,
+but that is not so - to make future administration easier it is
+always preferable to split the database into more parts than you
+immediately need. Such splitting also overcomes the power-of-2
+requirement.
+
+For example, if a user needs to spread the load over 3 servers,
+the database can be split into 16 partitions; then 2 servers
+get 5 partitions each and the last one 6.
+
+
+== Partitioning ==
+
+=== How to partition data? ===
+
+There are several usage patterns for how PL/Proxy can be used
+to distribute load over several servers:
+
+- Vertical partitioning. Data is divided between separate servers table by table
+  and PL/Proxy calls are used to direct calls to the right databases. In some cases
+  wrapper functions that do several remote calls into other databases are needed.
+- Horizontal partitioning. Using the hashtext() function, any field can be
+  converted into an integer. In the simpler case you can just use your id field.
+  The number of partitions in a cluster must be a power of two, and PL/Proxy uses
+  bitwise AND to get the partition number from the given integer
+  (see the sketch after this list).
+- Two-level vertical partitioning. PL/Proxy allows the cluster name to also
+  be calculated from function arguments. So it is possible to dedicate
+  different clusters to different categories, or one cluster to read queries and
+  a second cluster to write queries, and then do the usual hash-based
+  partitioning inside the clusters.
+- Read-only replicas. Load can be divided over read-only replicas. You can define
+  a cluster to have more partitions than you have actual databases and
+  use repeated connect strings as weights on the servers.
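+
+A sketch of a hash-partitioned proxy function (the cluster, function
+and column names are illustrative):
+
+  CREATE FUNCTION get_user(IN i_username text, OUT username text, OUT email text)
+  RETURNS SETOF record AS $$
+      CLUSTER 'userdb';
+      RUN ON hashtext(i_username);
+  $$ LANGUAGE plproxy;
+
+PL/Proxy masks the hash value with (number of partitions - 1) to pick
+the connection; each partition database just needs a `get_user()`
+function with the same signature that looks up the row locally.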
+
+In many of these scenarios, good replication software such as Londiste
+from SkyTools comes in handy.
+
+
+=== How to spread single large query over several partitions? ===
+
+If each partition holds only part of the total data, this
+happens automatically - just use RUN ON ALL.
+
+If the partitions are copies of each other, or the query does
+not follow the split pattern for some other reason, it will
+be a bit more tricky. The best way is to assign each partition a
+number and later pass an array of parameters to a RUN ON ALL
+query where each partition picks the values to work on.
+The values can even be actual SQL queries, giving maximum
+flexibility in what is possible to do.
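+
+A rough sketch of that pattern (the local `partition_nr` config table
+and the `do_work()` helper are hypothetical):
+
+  -- Proxy side: broadcast the full parameter array to all partitions.
+  CREATE FUNCTION process_chunks(i_chunks text[])
+  RETURNS SETOF text AS $$
+      CLUSTER 'userdb';
+      RUN ON ALL;
+  $$ LANGUAGE plproxy;
+
+  -- Partition side: each partition knows its own number (stored in a
+  -- one-row config table here) and works only on its own slice.
+  CREATE FUNCTION process_chunks(i_chunks text[])
+  RETURNS SETOF text AS $$
+      SELECT do_work(i_chunks[(SELECT nr + 1 FROM partition_nr)]);
+  $$ LANGUAGE sql;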
+
+There is a preliminary design for a feature that lets the user
+run a query on different partitions with different parameters:
+http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html[]
+
+Whether this gets implemented depends on whether there are any *actual*
+use-cases for it and whether there are any developers interested
+in working on the feature.
+
+
+=== How to do aggregated queries? ===
+
+Aggregation needs to happen in three steps:
+
+1. A function on each partition that does the per-partition aggregation.
+
+2. A PL/Proxy function that collects the results of the per-partition
+   aggregation. It returns one row for each partition.
+
+3. A top-level aggregation that does the final aggregation on the
+   result set of the PL/Proxy function. A regular PL/pgSQL function
+   can be used, or this can be done outside the database by the client
+   application.
+
+Note: some aggregations cannot be done naively - e.g. `avg()`.
+Instead, each partition must do `sum()` and `count()` and the top-level
+aggregator calculates the actual average.
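+
+A sketch of such a distributed `avg()` (the table and function names
+are illustrative):
+
+  -- Step 1, on each partition: partial aggregate.
+  CREATE FUNCTION balance_agg(OUT sum numeric, OUT count bigint) AS $$
+      SELECT sum(balance), count(*) FROM accounts;
+  $$ LANGUAGE sql;
+
+  -- Step 2, on the proxy: collect one row per partition.
+  CREATE FUNCTION balance_agg(OUT sum numeric, OUT count bigint)
+  RETURNS SETOF record AS $$
+      CLUSTER 'userdb';
+      RUN ON ALL;
+  $$ LANGUAGE plproxy;
+
+  -- Step 3, top-level: combine the partials into the real average.
+  SELECT sum(sum) / sum(count) AS avg_balance FROM balance_agg();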
+
+
+=== How to add partitions? ===
+
+The simple way would be to collect the data from all partitions
+together and then split it again to the new partitions. But that
+is a waste of resources.
+
+A few things to keep in mind to make the addition easier:
+
+- Always partition the data into more pieces than you actually need.
+  E.g. if you think 2 servers would handle the load, then
+  do the split into 8 partitions, keeping 4 of them
+  on a single server. That way, when the load grows, you just
+  need to move databases to a separate server, not rehash
+  your data. That also allows you to load-balance between
+  servers of unequal power - keep more partitions on the server
+  that has more power.
+
+- Split one partition at a time, splitting it into 2 (preferably 4 or 8).
+  You just need to keep duplicate entries in the partition list
+  for partitions that are not yet split.
+
+
+=== Can I have foreign keys on my data? ===
+
+Yes, unless the data you want to partition on references
+itself.
+
+Another common scenario is that there are some big data
+tables that the user wants to partition, but they reference
+various smaller common tables that are not partitionable.
+In such a situation the common tables should be managed
+in a single external database and replicated to each
+partition. That gives a single place to manipulate the data
+and correct transactionality when spreading the data out.
+
+
+=== What happens if I do updates in remote database? ===
+
+PL/Proxy works in autocommit mode, so if the remote function succeeds the
+changes are committed immediately. Special handling is needed if updates are
+done in both databases: if the remote call fails, both are rolled back, but if
+the remote call succeeds and the local updates fail, only the local updates
+are rolled back. Usually PgQ-based solutions are used in these situations.
+
+
+=== How to handle sequences? ===
+
+The best way is to use separate ranges for each partition.
+
+In our case, no code uses serials directly; instead it uses a
+wrapper function that combines a unique ID each database has
+with a plain sequence. That way we don't need to manage the
+sequences explicitly; the only thing we need to do
+is assign each database a unique ID.
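+
+A minimal sketch of such a wrapper (the `db_info` table, the range
+size and the names are illustrative):
+
+  -- One-row table holding this database's unique ID, assigned at setup.
+  CREATE TABLE db_info (db_id int NOT NULL);
+  INSERT INTO db_info VALUES (3);
+
+  CREATE SEQUENCE user_id_seq;
+
+  -- Globally unique ID: each database gets its own numeric range,
+  -- with the local sequence value below it.
+  CREATE FUNCTION next_user_id() RETURNS bigint AS $$
+      SELECT (SELECT db_id::bigint FROM db_info) * 1000000000000
+             + nextval('user_id_seq');
+  $$ LANGUAGE sql;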
+
+
+// todo
+// === How to handle queries that don't follow partitioning? ===
+
+