Server Setup and Operation
This chapter discusses how to set up and run the Pgpool-II server
and its interactions with the operating system.
Managing Kernel Resources
Running Pgpool-II requires System V shared memory
and System V semaphores. Although Linux provides a sufficient
amount of both resources by default, other platforms may need
kernel parameters to be configured.
The minimum requirement for System V shared memory is around 140 MB.
If you plan to use more than the default 64 MB of query
cache, you may need more shared memory. The number of
shared memory segments needed is 3. Usually the kernel provides enough
shared memory segments.
The minimum requirement for System V semaphores is around 10. The number
of semaphore sets needed is 3. Usually the kernel provides
enough semaphore sets.
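On Linux, you can inspect the current System V limits and, if
necessary, raise them with sysctl. This is a minimal sketch; the
values shown are illustrative only:
$ ipcs -l
# sysctl -w kernel.shmmax=268435456
# sysctl -w kernel.sem="250 32000 32 128"
To make such settings persist across reboots, add them
to /etc/sysctl.conf or a file under /etc/sysctl.d.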
The Pgpool-II User Account
As with any server daemon that is accessible to the outside world,
it is advisable to run Pgpool-II under a
separate user account. This user account should only own the data
that is managed by the server, and should not be shared with other
daemons. (For example, using the user nobody is a bad
idea.) It is not advisable to install executables owned by this
user because compromised systems could then modify their own
binaries.
To add a Unix user account to your system, look for a command
useradd or adduser. The user
name pgpool is often used, and is assumed
throughout this book, but you can use another name if you like.
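For example, on many Linux systems the account can be created like
this (a sketch; adjust to your platform's conventions):
# useradd -m pgpool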
Configuring pcp.conf
Pgpool-II provides an interface
for administrators to perform management operations, such as
getting Pgpool-II status or terminating
Pgpool-II processes
remotely. pcp.conf is the user/password file
used for authentication by this interface. All operation modes
require the pcp.conf file to be set. A
$prefix/etc/pcp.conf.sample file is created
during the installation
of Pgpool-II. Copy the file as
$prefix/etc/pcp.conf and add your user name and password
to it.
$ cp $prefix/etc/pcp.conf.sample $prefix/etc/pcp.conf
An empty line or a line starting with # is treated as a
comment and will be ignored. A user name and its associated
password must be written as one line using the following format:
username:[md5 encrypted password]
[md5 encrypted password] can be produced with the $prefix/bin/pg_md5 command.
$ pg_md5 your_password
1060b7b46a3bd36b3a0d66e0127d0517
If you don't want to pass the password as an argument, execute pg_md5 -p.
$ pg_md5 -p
password: your_password
The pcp.conf file must be readable by the
user who executes Pgpool-II.
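For example, assuming Pgpool-II is installed under /usr/local, an
entry for a user named pgpool could be appended like this (the user
name and password are illustrative):
$ echo pgpool:`pg_md5 your_password` >> /usr/local/etc/pcp.conf
$ chmod 600 /usr/local/etc/pcp.conf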
Configuring Pgpool-II
Configuring pgpool.conf
pgpool.conf is the main configuration file
of Pgpool-II. You need to specify the
path to the file when
starting Pgpool-II
using the -f option.
pgpool.conf is located
at $prefix/etc/pgpool.conf by default,
if Pgpool-II is installed from source code.
To specify the Pgpool-II clustering
mode, set the backend_clustering_mode parameter
to one of the values below.

backend_clustering_mode values in pgpool.conf:

Clustering mode               value
Streaming replication mode    streaming_replication
Native replication mode       native_replication
Logical replication mode      logical_replication
Slony mode                    slony
Snapshot isolation mode       snapshot_isolation
Raw mode                      raw
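For example, to run in the streaming replication mode, you would set
in pgpool.conf:
backend_clustering_mode = 'streaming_replication'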
The sample configuration files are located at /usr/local/etc with a
default installation from source code.
You can copy one of them as pgpool.conf.
(You will probably need root privileges for this.)
# cd /usr/local/etc
# cp pgpool.conf.sample pgpool.conf
Clustering mode of Pgpool-II
There are six different clustering modes in
Pgpool-II: streaming replication mode, logical
replication mode, main replica mode (slony mode), native
replication mode, raw mode and snapshot isolation mode. In any
mode, Pgpool-II provides connection pooling and
automatic failover. Online recovery can be used only with the
streaming replication mode, the snapshot isolation mode and the native
replication mode. See the online recovery documentation
for more details.
These modes are mutually exclusive and cannot be changed after
starting the server. You should decide which one to use at an
early stage of designing the system. If you are not sure, it
is recommended to use the streaming replication mode or the
snapshot isolation mode.
The streaming replication mode can be used
with PostgreSQL servers operating streaming
replication. In this mode, PostgreSQL is
responsible for synchronizing databases. This mode is widely used
and is the most recommended way to use Pgpool-II. Load
balancing is possible in this mode. Visibility consistency among
nodes is not guaranteed.
In the snapshot isolation mode, Pgpool-II is responsible for synchronizing
databases. The advantage of this mode is that synchronization is
done in a synchronous way: writing to the database does not return
until all of the PostgreSQL servers finish the write
operation. It also guarantees visibility consistency among
nodes. To put it simply, this means that the visibility rule of
transactions on a single server is applied to a cluster consisting
of multiple servers as well. This is a remarkable feature of the
snapshot isolation mode in
Pgpool-II. In fact, the snapshot
isolation mode in Pgpool-II is at the moment the only
system which guarantees visibility consistency among nodes
without modifications to PostgreSQL.
Because of this, applications do not need to recognize that they
are using a cluster consisting of
PostgreSQL servers, rather than a
single PostgreSQL system. However, in
this mode the transaction isolation level must be
REPEATABLE READ. You need to set
postgresql.conf like this:
default_transaction_isolation = 'repeatable read'
Also, you need to be aware that performance in this mode may be worse
than in the streaming replication mode and the native replication mode
due to the overhead of keeping transactions consistent.
In the native replication mode,
Pgpool-II is responsible for synchronizing
databases. The advantage of this mode is that synchronization is
done in a synchronous way: writing to the database does not return
until all of the PostgreSQL servers finish the write
operation. As visibility consistency among nodes is not
guaranteed, it is recommended to use the snapshot isolation mode
instead, unless you want to use an isolation level other than
REPEATABLE READ. Load balancing is possible in this mode.
The logical replication mode can be used
with PostgreSQL servers operating logical
replication. In this mode, PostgreSQL is
responsible for synchronizing tables. Load balancing is possible
in this mode. Since logical replication does not replicate all
tables, it is the user's responsibility to replicate any table that
could be load balanced. Pgpool-II load balances
all tables. This means that if a table is not
replicated, Pgpool-II may look up outdated data
on the subscriber side.
The main replica mode (slony mode)
can be used with PostgreSQL servers
operating Slony. In this
mode, Slony/PostgreSQL is
responsible for synchronizing
databases. Since Slony-I is being made obsolete by
streaming replication, we do not recommend using this mode
unless you have a specific reason to
use Slony. Load balancing is possible in this
mode.
In the raw
mode, Pgpool-II does not care about
database synchronization. It is the user's responsibility to make
the whole system do something meaningful. Load balancing
is not possible in this mode.
Process management modes
Pgpool-II implements a multi-process architecture where
each child process can handle exactly one client connection at any time.
The total number of concurrent client connections Pgpool-II
can handle is configured by the
num_init_children config parameter.
Pgpool-II supports two child process management modes:
dynamic and static.
In the static process management mode, Pgpool-II pre-forks
num_init_children child
processes at startup, and each child process keeps listening for incoming
client connections. In the dynamic process management mode,
Pgpool-II keeps track of idle processes and forks new or kills
existing processes to keep their number within the specified boundaries.
The dynamic process management mode is not available prior to
Pgpool-II V4.4.
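A minimal sketch of the relevant pgpool.conf parameters, assuming
Pgpool-II V4.4 or later (the values are illustrative):
num_init_children = 32
process_management_mode = dynamic
min_spare_children = 5
max_spare_children = 10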
Configuring backend information
For Pgpool-II to recognize PostgreSQL
backend servers, you need to configure the backend*
parameters in pgpool.conf. For starters, at
least the backend_hostname
and backend_port parameters are required to
be set to start the Pgpool-II server.
Backend Settings
Backend PostgreSQL servers used by Pgpool-II must be specified in pgpool.conf, as in the sketch below.
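A minimal sketch for a two-node setup (host names, port numbers and
data directories are illustrative assumptions):
backend_hostname0 = 'server1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/data'
backend_hostname1 = 'server2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/data'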
See the documentation of the backend_* parameters for more details.
Starting Pgpool-II and PostgreSQL
To start Pgpool-II, execute:
$ pgpool -f /usr/local/etc/pgpool.conf -F /usr/local/etc/pcp.conf
which will start the server running in the background. "-f"
specifies the path to the main pgpool configuration file and "-F"
specifies the path to the configuration file of the pcp server, which
is the control server for Pgpool-II. For
other options of the command, please take a look at the manual.
Before starting Pgpool-II, you must
start PostgreSQL, because if
PostgreSQL has not started yet,
Pgpool-II triggers the failover process and
puts PostgreSQL in down status.
If you have difficulty in controlling the startup sequence of
PostgreSQL, for example because
Pgpool-II and
PostgreSQL are installed on different
servers, you can make search_primary_node_timeout longer (the default is 5
minutes) so that Pgpool-II waits for
PostgreSQL to start up until the timeout expires. If
PostgreSQL starts up before the timeout expires,
Pgpool-II should start up without
problems. If the timeout
expires before PostgreSQL starts up, no
primary node will be detected, which means you cannot execute
DML/DDL. You need to restart Pgpool-II
in this case. To confirm that the primary node exists, you can use
the pcp_node_info command.
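For example, to wait up to 10 minutes for the primary node, you could
set in pgpool.conf (the value is illustrative; time units are accepted
in Pgpool-II 4.2 and later):
search_primary_node_timeout = 10min
and then check the node status (the pcp host, port and user are
illustrative):
$ pcp_node_info -h localhost -p 9898 -U pgpool -n 0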
Please note that search_primary_node_timeout can
only be used in the streaming replication mode because the
parameter is only valid in that mode. See the streaming replication
mode documentation for more details. For other modes, tweak the
health check parameters so that there is
enough time before PostgreSQL becomes
available.
If the health check detects that PostgreSQL
is not available before Pgpool-II starts
up, some or all PostgreSQL servers are
recognized as being in "down" status. In this case you need to manually put
the PostgreSQL server in "up" state by
using the pcp_attach_node command. If a client tries
to connect to Pgpool-II before
PostgreSQL is available, failover could
be triggered. In this case you also need to execute the
pcp_attach_node command to put the
PostgreSQL server in "up" state.
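For example, to put backend node 0 back in "up" state (the pcp host,
port and user are illustrative):
$ pcp_attach_node -h localhost -p 9898 -U pgpool -n 0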
Stopping Pgpool-II and PostgreSQL
To stop Pgpool-II, execute:
$ pgpool -f /usr/local/etc/pgpool.conf -F /usr/local/etc/pcp.conf -m fast stop
"-m" option specifies how gently stops
Pgpool-II. "fast" means shutdown
Pgpool-II immediately even if there are
existing connections from clients. You can specify "smart" to the
option, which force Pgpool-II to wait
until all clients are disconnected from
Pgpool-II. But this could make
Pgpool-II wait forever and this may
result in sending SIGKILL signal from the operating system and
leaving garbage, which will bring troubles next time when you start
Pgpool-II.
After shutting down Pgpool-II, you can
shut down PostgreSQL.
Temporarily Shutting Down PostgreSQL
Sometimes you want to temporarily stop or restart
PostgreSQL to maintain it or upgrade its
version. This section describes how to perform the task with minimum downtime.
Using the pcp_detach_node command
If you stop PostgreSQL by using
pg_ctl, failover will not happen until
Pgpool-II detects it by the health
check, depending on the health check settings, and it will take
some time to detach PostgreSQL.
Especially if Watchdog is
enabled and failover_require_consensus is on,
Pgpool-II will not start failover until
more than half of the watchdog nodes agree that
PostgreSQL is stopped. If you detach
the node by using pcp_detach_node, failover will
start immediately regardless of the health check settings. Please
note that the detached PostgreSQL node
is not actually stopped; if necessary, you need to
stop it manually.
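For example, to detach backend node 1 immediately (the pcp host, port
and user are illustrative):
$ pcp_detach_node -h localhost -p 9898 -U pgpool -n 1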
Using backend_flag
Stopping or restarting PostgreSQL
causes failover. If the running mode is not the streaming replication
mode, or the server is a standby server in the streaming replication
mode, that is probably not a big deal because clients can always
use other servers in the cluster. However, if the server is the primary
server, the failover will cause a change of the primary server by
promoting one of the standby servers. Moreover, if there is only one
server remaining in the cluster, there is no alternative server or
standby server which can be promoted.
In this case you can use backend_flag to avoid
failover. Setting the following in pgpool.conf will avoid failover of
backend 0.
backend_flag0 = DISALLOW_TO_FAILOVER
This will take effect by reloading or restarting
Pgpool-II. If this flag is set, failover
will not happen even if the backend is not available. While the backend
is not available, clients will get the following error message:
psql: error: could not connect to server: FATAL: failed to create a backend connection
DETAIL: executing failover on backend
After restarting the backend, clients can connect as usual.
To allow failover on the backend again, you can set:
backend_flag0 = ALLOW_TO_FAILOVER
and reload or restart Pgpool-II.
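For example, to reload the configuration without restarting (the
configuration file path is illustrative):
$ pgpool -f /usr/local/etc/pgpool.conf reload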
Backing up PostgreSQL database
If you plan to back up a PostgreSQL
database
using pg_dump, pg_basebackup
or any other tools, we strongly recommend running the commands
against PostgreSQL
directly. As Pgpool-II is proxy
software, it adds overhead for relaying message packets. Since
taking a backup tends to produce lots of packets, performing the
backup through Pgpool-II will be slow
compared with connecting directly
to PostgreSQL, unless the
database is very small.
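For example, a base backup could be taken directly from the primary
server like this (the host and target directory are illustrative):
$ pg_basebackup -h primary-host -p 5432 -D /backup/base -X stream -P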
Moreover, a parallel pg_dump raises an error if it is
executed through Pgpool-II, because the
command handles snapshot IDs, which are specific to a single database server.
In most cases you want to choose the primary server as the backup
target. If you want to back up a standby server, you have to be very
careful in choosing the right PostgreSQL
server, because if the data is outdated, you are
likely to get an outdated database backup. You can
use SHOW POOL_NODES
or pcp_node_info to check how far the standby server
has caught up with the primary server.
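For example, connecting through Pgpool-II (port 9999 is the default
listen port; the host is illustrative), you can run:
$ psql -h localhost -p 9999 -c "SHOW POOL_NODES"
and look at the replication_delay column to see how far each standby
lags behind the primary.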