Server Setup and Operation

This chapter discusses how to set up and run the Pgpool-II server and its interactions with the operating system.

Managing Kernel Resources

To run Pgpool-II, System V shared memory and System V semaphores are necessary. Although Linux provides a sufficient amount of both resources by default, other platforms may need their kernel parameters configured. The minimum requirement for System V shared memory is around 140 MB. If you plan to use more than the default 64 MB of query cache, you may need more shared memory. Three shared memory segments are necessary; usually the kernel provides enough shared memory segments. The minimum requirement for System V semaphores is around 10, in 3 semaphore sets; usually the kernel provides enough semaphore sets.

The Pgpool-II User Account

As with any server daemon that is accessible to the outside world, it is advisable to run Pgpool-II under a separate user account. This user account should only own the data that is managed by the server, and should not be shared with other daemons. (For example, using the user nobody is a bad idea.) It is not advisable to install executables owned by this user, because compromised systems could then modify their own binaries. To add a Unix user account to your system, look for a command useradd or adduser. The user name pgpool is often used, and is assumed throughout this book, but you can use another name if you like.

Configuring pcp.conf

Pgpool-II provides an interface for administrators to perform management operations, such as getting Pgpool-II status or terminating Pgpool-II processes remotely. pcp.conf is the user/password file used for authentication by this interface. All operation modes require the pcp.conf file to be set. A $prefix/etc/pcp.conf.sample file is created during the installation of Pgpool-II. Copy the file to $prefix/etc/pcp.conf and add your user name and password to it.

$ cp $prefix/etc/pcp.conf.sample $prefix/etc/pcp.conf

An empty line or a line starting with # is treated as a comment and will be ignored. A user name and its associated password must be written as one line in the following format:

username:[md5 encrypted password]

[md5 encrypted password] can be produced with the $prefix/bin/pg_md5 command.

$ pg_md5 your_password
1060b7b46a3bd36b3a0d66e0127d0517

If you don't want to pass the password as a command line argument, execute pg_md5 -p.

$ pg_md5 -p
password: your_password

The pcp.conf file must be readable by the user who executes Pgpool-II.

Configuring Pgpool-II

Configuring pgpool.conf

pgpool.conf is the main configuration file of Pgpool-II. You need to specify the path to the file with the -f option when starting Pgpool-II. pgpool.conf is located at $prefix/etc/pgpool.conf by default, if Pgpool-II is installed from source code. To specify the Pgpool-II clustering mode, set the backend_clustering_mode parameter to one of the values below.

Clustering mode                 backend_clustering_mode value in pgpool.conf
Streaming replication mode      streaming_replication
Native replication mode         native_replication
Logical replication mode        logical_replication
Slony mode                      slony
Snapshot isolation mode         snapshot_isolation
Raw mode                        raw
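For example, to select the streaming replication mode, pgpool.conf would contain the following line (the value is taken from the table above):

backend_clustering_mode = 'streaming_replication'

As noted in the next section, the clustering mode cannot be changed after starting the server, so this value must be decided before the first start.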
Sample configuration files are located at /usr/local/etc with the default installation from source code. You can copy one of them as pgpool.conf (you will probably need root privileges for this):

# cd /usr/local/etc
# cp pgpool.conf.sample pgpool.conf
Clustering mode of Pgpool-II

There are six different clustering modes in Pgpool-II: streaming replication mode, logical replication mode, main replica mode (slony mode), native replication mode, raw mode and snapshot isolation mode. In any mode, Pgpool-II provides connection pooling and automatic failover. Online recovery can be used only with streaming replication mode, snapshot isolation mode and native replication mode. See the online recovery section for more details. These modes are mutually exclusive and cannot be changed after starting the server. You should decide which mode to use at an early stage of system design. If you are not sure, it is recommended to use the streaming replication mode or the snapshot isolation mode.

The streaming replication mode can be used with PostgreSQL servers operating streaming replication. In this mode, PostgreSQL is responsible for synchronizing databases. This mode is widely used and is the most recommended way to use Pgpool-II. Load balancing is possible in this mode. However, visibility consistency among nodes is not guaranteed.

In the snapshot isolation mode, Pgpool-II is responsible for synchronizing databases. The advantage of this mode is that synchronization is done in a synchronous way: a write to the database does not return until all PostgreSQL servers finish the write operation. It also guarantees visibility consistency among nodes. To put it simply, the visibility rules of transactions on a single server apply to a cluster consisting of multiple servers as well. This is a remarkable feature of the snapshot isolation mode in Pgpool-II. In fact, the snapshot isolation mode in Pgpool-II is at the moment the only system which guarantees visibility consistency among nodes without modifications to PostgreSQL. Because of this, applications do not need to be aware that they are using a cluster of PostgreSQL servers rather than a single PostgreSQL system. However, in this mode the transaction isolation level must be REPEATABLE READ. You need to set postgresql.conf like this:

default_transaction_isolation = 'repeatable read'

Also, you need to be aware that performance in this mode may be worse than in the streaming replication mode and the native replication mode, due to the overhead of keeping transactions consistent.

In the native replication mode, Pgpool-II is responsible for synchronizing databases. The advantage of this mode is that synchronization is done in a synchronous way: a write to the database does not return until all PostgreSQL servers finish the write operation. As visibility consistency among nodes is not guaranteed, it is recommended to use the snapshot isolation mode instead, unless you need an isolation level other than REPEATABLE READ. Load balancing is possible in this mode.

The logical replication mode can be used with PostgreSQL servers operating logical replication. In this mode, PostgreSQL is responsible for synchronizing tables. Load balancing is possible in this mode. Since logical replication does not necessarily replicate all tables, it is the user's responsibility to replicate any table that could be load balanced. Pgpool-II load balances all tables, which means that if a table is not replicated, Pgpool-II may look up outdated data on the subscriber side.

The main replica mode (slony mode) can be used with PostgreSQL servers operating Slony.
In this mode, Slony/PostgreSQL is responsible for synchronizing databases. Since Slony-I is being obsoleted by streaming replication, we do not recommend using this mode unless you have a specific reason to use Slony. Load balancing is possible in this mode.

In the raw mode, Pgpool-II does not care about database synchronization. It is the user's responsibility to make the whole system do something meaningful. Load balancing is not possible in this mode.

Process management modes

Pgpool-II implements a multi-process architecture where each child process can handle exactly one client connection at any time. The total number of concurrent client connections Pgpool-II can handle is configured by the num_init_children configuration parameter. Pgpool-II supports two child process management modes: dynamic and static. In static process management mode, Pgpool-II pre-forks num_init_children child processes at startup, and each child process keeps listening for incoming client connections. In dynamic process management mode, Pgpool-II keeps track of idle processes and forks new or kills existing processes to keep the number of idle processes within the specified boundaries. Dynamic process management mode is not available prior to Pgpool-II V4.4.
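As a minimal sketch of a dynamic process management setup in pgpool.conf, assuming Pgpool-II V4.4 or later (the parameter names min_spare_children and max_spare_children come from that feature; check them against your version's documentation):

process_management_mode = dynamic
num_init_children = 32       # upper bound on concurrent client connections
min_spare_children = 5       # fork new children when the idle count drops below this
max_spare_children = 10      # kill children when the idle count exceeds this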
Configuring backend information

For Pgpool-II to recognize the PostgreSQL backend servers, you need to configure the backend* parameters in pgpool.conf. For starters, at least the backend_hostname and backend_port parameters are required to be set to start the Pgpool-II server.

Backend Settings

The backend PostgreSQL servers used by Pgpool-II must be specified in pgpool.conf; see the backend settings section of the documentation for the full list of parameters. An illustrative example follows.
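As an illustration only, a minimal single-backend configuration might look like the following; the host name, port, and data directory are placeholders for your environment:

backend_hostname0 = 'server1'                      # host name or IP address of the PostgreSQL server
backend_port0 = 5432                               # port the PostgreSQL server listens on
backend_weight0 = 1                                # load balancing ratio for this backend
backend_data_directory0 = '/var/lib/pgsql/data'    # database cluster directory of the backend

The trailing number (0, 1, ...) identifies each backend: to add a second server, define backend_hostname1, backend_port1, and so on.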
Starting Pgpool-II and PostgreSQL

To start Pgpool-II, execute:

$ pgpool -f /usr/local/etc/pgpool.conf -F /usr/local/etc/pcp.conf

which will start the server running in the background. "-f" specifies the path to the main pgpool configuration file and "-F" specifies the path to the configuration file of the pcp server, which is the control server for Pgpool-II. For other options of the command, please take a look at the pgpool manual.

Before starting Pgpool-II, you must start PostgreSQL: if PostgreSQL has not started yet, Pgpool-II triggers the failover process and puts PostgreSQL in down status. If you have difficulty controlling the startup sequence of PostgreSQL, for example because Pgpool-II and PostgreSQL are installed on different servers, you can make search_primary_node_timeout longer (the default is 5 minutes) so that Pgpool-II waits for PostgreSQL to start up until search_primary_node_timeout expires. If PostgreSQL starts up before search_primary_node_timeout expires, Pgpool-II should start up without problems. If search_primary_node_timeout expires before PostgreSQL starts up, no primary node will be detected, which means you cannot execute DML/DDL. You need to restart Pgpool-II in this case. To confirm that the primary node exists, you can use the pcp_node_info command. Please note that search_primary_node_timeout can only be used in the streaming replication mode, because the parameter is only valid in that mode. See the streaming replication mode section for more details. For other modes, tweak the health check parameters (see the health check section) so that there is enough time before PostgreSQL becomes available. If the health check detects that PostgreSQL is not available before Pgpool-II starts up, some or all PostgreSQL servers are recognized as being in "down" status. In this case you need to manually put the PostgreSQL server in "up" state by using the pcp_attach_node command. If a client tries to connect to Pgpool-II before PostgreSQL is available, failover could be triggered. In this case you also need to execute the pcp_attach_node command to put the PostgreSQL server in "up" state.

Stopping Pgpool-II and PostgreSQL

To stop Pgpool-II, execute:

$ pgpool -f /usr/local/etc/pgpool.conf -F /usr/local/etc/pcp.conf -m fast stop

The "-m" option specifies how gently to stop Pgpool-II. "fast" means shut down Pgpool-II immediately, even if there are existing connections from clients. You can specify "smart" instead, which forces Pgpool-II to wait until all clients have disconnected. But this could make Pgpool-II wait forever, which may eventually result in the operating system sending a SIGKILL signal and leaving garbage behind, causing trouble the next time you start Pgpool-II. After shutting down Pgpool-II, you can shut down PostgreSQL.

Temporarily Shutting Down PostgreSQL

Sometimes you want to temporarily stop or restart PostgreSQL to maintain or upgrade it. This section describes how to perform that task with minimum downtime.

Using the pcp_detach_node command

If you stop PostgreSQL by using pg_ctl, failover will not happen until Pgpool-II detects it via the health check, depending on the health check settings, and it will take some time to detach PostgreSQL. In particular, if Watchdog is enabled and failover_require_consensus is on, Pgpool-II will not start failover until more than half of the watchdog nodes agree that PostgreSQL is stopped. If you detach the node by using pcp_detach_node, failover will start immediately, regardless of the health check settings. Please note that the detached PostgreSQL node is not actually stopped; if necessary, you need to stop it manually.

Using backend_flag

Stopping or restarting PostgreSQL causes failover. If the running mode is not streaming replication mode, or the server is a standby server in streaming replication mode, that is probably not a big deal, because clients can always use other servers in the cluster. However, if the server is the primary server, failover will change the primary server by promoting one of the standby servers. Moreover, if only one server remains in the cluster, there is no alternative or standby server that can be promoted. In this case you can use backend_flag to avoid failover. Setting the following in pgpool.conf will avoid failover of backend0:

backend_flag0 = DISALLOW_TO_FAILOVER

This takes effect upon reloading or restarting Pgpool-II. If this flag is set, failover will not happen even if the backend is not available. While the backend is not available, clients will get an error message:

psql: error: could not connect to server: FATAL: failed to create a backend connection
DETAIL: executing failover on backend

After restarting the backend, clients can connect as usual. To allow failover on the backend again, you can set:

backend_flag0 = ALLOW_TO_FAILOVER

and reload or restart Pgpool-II.

Backing up PostgreSQL database

If you plan to back up the PostgreSQL database using pg_dump, pg_basebackup or any other tool, we strongly recommend running the commands against PostgreSQL directly. As Pgpool-II is a proxy, it adds overhead for relaying message packets. Since taking a backup tends to produce a lot of packets, performing a backup through Pgpool-II will be slow compared with connecting to PostgreSQL directly, unless the database is very small. Moreover, parallel pg_dump raises an error if it is executed through Pgpool-II, because the command coordinates a snapshot ID, which is an object specific to an individual server. In most cases you want to choose the primary server as the backup target. If you want to back up a standby server, you have to be very careful in choosing the right PostgreSQL server, because if its data is outdated, you are likely to get an outdated database backup. You can use the replication delay information shown by SHOW pool_nodes or pcp_node_info to know how far the standby server has caught up with the primary server.
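As a sketch of the recommended approach, a base backup could be taken directly from the primary PostgreSQL server, bypassing Pgpool-II; the host name, user and target directory below are placeholders:

$ pg_basebackup -h primary.example.com -p 5432 -U replication_user -D /var/backups/pgdata -X stream -P

Here -X stream includes the WAL needed for a consistent restore in the backup, and -P reports progress while the backup runs.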