Watchdog
Introduction
Watchdog is a sub-process of
Pgpool-II that adds high
availability. It resolves the single point of
failure by coordinating multiple
Pgpool-II nodes. The watchdog was first
introduced in Pgpool-II
V3.2 and was significantly enhanced in
Pgpool-II V3.5 to
ensure the presence of a quorum at all times. This addition to
watchdog makes it more fault tolerant and robust in handling and
guarding against split-brain syndrome and network
partitioning. In addition, V3.7 introduced
quorum failover to reduce the false
positives of PostgreSQL server
failures. For the quorum mechanism to work properly, the number
of Pgpool-II nodes must be odd
and greater than or equal to 3.
Coordinating multiple Pgpool-II nodes
Watchdog coordinates multiple Pgpool-II nodes
by exchanging information with each other.
At startup, if the watchdog is enabled, the Pgpool-II node
syncs the status of all configured backend nodes from the leader watchdog node.
If the node goes on to become the leader node itself, it initializes the backend
status locally. When a backend node status changes, for example by failover,
watchdog notifies the other Pgpool-II
nodes and synchronizes them. When online recovery occurs, watchdog restricts
client connections to the other Pgpool-II
nodes to avoid inconsistency between backends.
Watchdog also coordinates with all connected Pgpool-II nodes to ensure
that failback, failover and follow_primary commands are executed on only one Pgpool-II node.
Life checking of other Pgpool-II nodes
Watchdog lifecheck is the sub-component of watchdog that monitors
the health of the Pgpool-II nodes participating
in the watchdog cluster to provide high availability.
Traditionally, Pgpool-II watchdog provides
two methods of remote node health checking: "heartbeat"
mode and "query" mode.
The watchdog in Pgpool-II V3.5
adds a new "external" mode to wd_lifecheck_method,
which makes it possible to hook an external third-party health checking
system into the Pgpool-II watchdog.
Apart from remote node health checking, watchdog lifecheck can also check
the health of the node it is installed on by monitoring the connection to upstream servers.
If the monitoring fails, watchdog treats it as a failure of the local Pgpool-II
node.
In heartbeat mode, watchdog monitors other Pgpool-II
processes by using heartbeat signals.
Watchdog receives heartbeat signals sent periodically by the other Pgpool-II
nodes. If there is no signal for a certain period,
watchdog regards this as a failure of that Pgpool-II.
For redundancy you can use multiple network connections for heartbeat
exchange between Pgpool-II nodes.
This is the default mode and the one recommended for health checking.
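For illustration, a heartbeat lifecheck on one node might be configured as in the sketch below. The host names, device and intervals are examples only; the heartbeat_hostname/heartbeat_port parameter names follow recent Pgpool-II releases, while older releases use heartbeat_destination and heartbeat_destination_port instead, so check the parameter names for your version.
# pgpool.conf excerpt: heartbeat lifecheck sketch (illustrative values)
wd_lifecheck_method = 'heartbeat'
wd_heartbeat_keepalive = 2             # seconds between heartbeat signals
wd_heartbeat_deadtime = 30             # silence period after which a node is considered dead
heartbeat_hostname0 = 'pgpool-node2'   # a peer node to exchange heartbeats with
heartbeat_port0 = 9694
heartbeat_device0 = ''                 # empty means any network device
heartbeat_hostname1 = 'pgpool-node3'
heartbeat_port1 = 9694
heartbeat_device1 = ''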
In query mode, watchdog monitors the Pgpool-II
service rather than the process. In this mode watchdog sends queries to the other
Pgpool-II nodes and checks the responses.
Note that this method requires connections from the other Pgpool-II nodes,
so monitoring would fail if the num_init_children parameter isn't large enough.
This mode is deprecated and left for backward compatibility.
External mode was introduced in Pgpool-II
V3.5. This mode basically disables the built-in lifecheck
of the Pgpool-II watchdog and expects the external system
to inform the watchdog about the health of the local node and of all remote nodes participating in the watchdog cluster.
Consistency of configuration parameters on all Pgpool-II nodes
At startup, watchdog verifies the Pgpool-II
configuration of the local node for consistency with the configuration
on the leader watchdog node and warns the user of any differences.
This eliminates the likelihood of undesired behavior that can happen
because of different configurations on different Pgpool-II nodes.
Changing active/standby state when a fault is detected
When a fault of a Pgpool-II node is detected,
watchdog notifies the other watchdogs of it.
If the faulty node was the active Pgpool-II,
the watchdogs decide on the new active Pgpool-II
by voting and change the active/standby state accordingly.
Automatic virtual IP switching
When a standby Pgpool-II server is promoted to active,
the new active server brings up the virtual IP interface. Meanwhile, the previously
active server brings down the virtual IP interface. This enables the active
Pgpool-II to work using the same
IP address even when servers are switched.
Automatic registration of a server as a standby in recovery
When the broken server recovers or a new server is attached, the watchdog process
notifies the other watchdogs in the cluster along with the information of the new server,
and receives information on the active server and
the other servers. Then, the attached server is registered as a standby.
Starting/stopping watchdog
The watchdog process starts and stops automatically as a sub-process
of Pgpool-II, so there is no
dedicated command to start and stop the watchdog.
Watchdog controls the virtual IP interface, and the commands executed by
the watchdog for bringing up and bringing down the VIP require
root privileges. Therefore Pgpool-II requires the
user running Pgpool-II to have root
privileges when the watchdog is enabled along with the virtual IP.
However, it is not good security practice to run
Pgpool-II as the root user; the alternative
and preferred way is to run Pgpool-II
as a normal user and either use custom commands for
if_up_cmd, if_down_cmd
and arping_cmd that invoke sudo,
or use setuid ("set user ID upon execution")
on the if_* commands.
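For example, when Pgpool-II runs as a normal user, the VIP commands can be wrapped in sudo as sketched below. The interface name, netmask, command paths and address are illustrative and must be adapted to the local system; $_IP_$ is replaced by Pgpool-II with the configured delegate_ip.
# pgpool.conf excerpt: bringing the VIP up and down through sudo (illustrative values)
delegate_ip = '172.16.5.200'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'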
The lifecheck process is a sub-component of watchdog; its job is to monitor the
health of the Pgpool-II nodes participating in
the watchdog cluster. The lifecheck process is started automatically
when the watchdog is configured to use the built-in life checking;
it starts after the watchdog main process initialization is complete.
However, the lifecheck process only kicks in when all configured watchdog
nodes have joined the cluster and it becomes active. If some remote node fails
before the lifecheck becomes active, that failure will not be caught by the lifecheck.
Integrating external lifecheck with watchdog
The Pgpool-II watchdog process uses
BSD sockets for communicating with
all the Pgpool-II processes, and the
same BSD socket can also be used by any third-party
system to provide the lifecheck function for local and remote
Pgpool-II watchdog nodes.
The BSD socket file name for IPC is constructed
by appending the Pgpool-II wd_port after
the "s.PGPOOLWD_CMD." string, and the socket file is
placed in the wd_ipc_socket_dir directory.
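A minimal sketch of connecting to this socket from Python, assuming the default wd_ipc_socket_dir of '/tmp' and a wd_port of 9000 (both values are just examples):
# a minimal sketch; directory and port are example values
import socket

wd_ipc_socket_dir = "/tmp"
wd_port = 9000
ipc_path = "%s/s.PGPOOLWD_CMD.%d" % (wd_ipc_socket_dir, wd_port)

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect(ipc_path)   # ready to exchange watchdog IPC packets
sock.close()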
Watchdog IPC command packet format
The watchdog IPC command packet consists of three fields: the packet type,
the length of the data, and the data itself.
Watchdog IPC result packet format
The watchdog IPC command result packet likewise consists of three fields: the result type,
the length of the data, and the data itself.
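Both the command and the result packet share this three-field layout. The sketch below assumes the fields are a one-byte type, a four-byte data length in network byte order, and the JSON data itself; treat this framing as an assumption to verify against your Pgpool-II version rather than a definitive specification.
# a framing sketch; the 1-byte type + 4-byte network-byte-order length + JSON data
# layout is an assumption based on the three-field description above
import json
import struct

def pack_wd_packet(packet_type, payload=None):
    # Build one watchdog IPC packet: TYPE, LENGTH, DATA.
    data = json.dumps(payload).encode() if payload is not None else b""
    return packet_type.encode() + struct.pack("!I", len(data)) + data

def read_wd_packet(sock):
    # Read one result packet from the IPC socket and return (type, parsed JSON or None).
    header = _recv_exact(sock, 5)
    res_type = header[:1].decode()
    length = struct.unpack("!I", header[1:5])[0]
    data = _recv_exact(sock, length) if length else b""
    return res_type, (json.loads(data) if data else None)

def _recv_exact(sock, n):
    # Keep reading until exactly n bytes have arrived.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("watchdog IPC socket closed")
        buf += chunk
    return buf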
Watchdog IPC command packet types
The first byte of the IPC command packet sent to the watchdog process,
and of the result returned by the watchdog process, identifies the
command or command result type.
The list below gives all valid types and their meanings.
REGISTER FOR NOTIFICATIONS (byte value '0', command packet)
Command to register the current connection to receive watchdog notifications.
NODE STATUS CHANGE (byte value '2', command packet)
Command to inform watchdog about a node status change of a watchdog node.
GET NODES LIST (byte value '3', command packet)
Command to get the list of all configured watchdog nodes.
NODES LIST DATA (byte value '4', result packet)
The JSON data in the packet contains the list of all configured watchdog nodes.
CLUSTER IN TRANSITION (byte value '7', result packet)
Watchdog returns this packet type when it is not possible to process the command because the cluster is transitioning.
RESULT BAD (byte value '8', result packet)
Watchdog returns this packet type when the IPC command fails.
RESULT OK (byte value '9', result packet)
Watchdog returns this packet type when the IPC command succeeds.
External lifecheck IPC packets and data
"GET NODES LIST" ,"NODES LIST DATA" and "NODE STATUS CHANGE"
IPC messages of watchdog can be used to integration an external
lifecheck systems. Note that the built-in lifecheck of pgpool
also uses the same channel and technique.
Getting list of configured watchdog nodes
Any third-party lifecheck system can send the "GET NODES LIST"
packet on the watchdog IPC socket, with JSON
data containing the authorization key and value if
wd_authkey is set, or with empty packet data
when wd_authkey is not configured, to get
the "NODES LIST DATA" result packet.
The result packet returned by watchdog for "GET NODES LIST"
contains the list of all configured watchdog nodes to
health check, in JSON format.
The JSON of the watchdog nodes contains the
"WatchdogNodes" array of all watchdog nodes.
Each watchdog JSON node contains the
"ID", "NodeName",
"HostName", "DelegateIP",
"WdPort" and "PgpoolPort"
for each node.
-- The example JSON data contained in "NODES LIST DATA"
{
"NodeCount":3,
"WatchdogNodes":
[
{
"ID":0,
"State":1,
"NodeName":"Linux_ubuntu_9999",
"HostName":"watchdog-host1",
"DelegateIP":"172.16.5.133",
"WdPort":9000,
"PgpoolPort":9999
},
{
"ID":1,
"State":1,
"NodeName":"Linux_ubuntu_9991",
"HostName":"watchdog-host2",
"DelegateIP":"172.16.5.133",
"WdPort":9000,
"PgpoolPort":9991
},
{
"ID":2,
"State":1,
"NodeName":"Linux_ubuntu_9992",
"HostName":"watchdog-host3",
"DelegateIP":"172.16.5.133",
"WdPort":9000,
"PgpoolPort":9992
}
]
}
-- Note that ID 0 is always reserved for local watchdog node
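Putting the socket path and the assumed packet framing together, an external lifecheck system could fetch this node list as in the sketch below. The socket path, the framing and the use of "IPCAuthKey" for the authorization key are assumptions; '3' is the GET NODES LIST command type and '4' the NODES LIST DATA result type from the packet types listed above.
# a sketch of requesting the watchdog node list over the IPC socket;
# socket path and the 1-byte type / 4-byte length framing are assumptions
import json
import socket
import struct

def get_watchdog_nodes(ipc_path="/tmp/s.PGPOOLWD_CMD.9000", authkey=None):
    payload = {"IPCAuthKey": authkey} if authkey else None
    data = json.dumps(payload).encode() if payload is not None else b""
    packet = b"3" + struct.pack("!I", len(data)) + data        # '3' = GET NODES LIST
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(ipc_path)
        s.sendall(packet)
        res_type = s.recv(1).decode()                          # expect '4' = NODES LIST DATA
        length = struct.unpack("!I", s.recv(4))[0]
        body = b""
        while len(body) < length:
            body += s.recv(length - len(body))
    return res_type, json.loads(body.decode())

res_type, nodes = get_watchdog_nodes()
for node in nodes["WatchdogNodes"]:
    print(node["ID"], node["NodeName"], node["HostName"])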
After getting the configured watchdog nodes information from the
watchdog, the external lifecheck system can proceed with the
health checking of watchdog nodes. When it detects a status
change of any node, it can inform the watchdog using the
"NODE STATUS CHANGE" IPC message.
The data in the message should contain the JSON
with the node ID of the node whose status has changed
(the node ID must be the same as the one returned by watchdog for that node
in the WatchdogNodes list) and the new status of the node.
-- The example JSON to inform the Pgpool-II watchdog that the health check
failed on the node with ID 1 would look like this
{
"NodeID":1,
"NodeStatus":1,
"Message":"optional message string to log by watchdog for this event",
"IPCAuthKey":"wd_authkey configuration parameter value"
}
-- The meanings of the NodeStatus values are as follows
NODE STATUS DEAD = 1
NODE STATUS ALIVE = 2
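Continuing the same sketch, the external lifecheck can report a node it found dead back to watchdog with a "NODE STATUS CHANGE" packet (command type '2'). The socket path and packet framing are again assumptions; the JSON fields follow the example above.
# a sketch reporting a failed node (NodeStatus 1 = DEAD) over the watchdog IPC socket;
# path and framing are assumptions, the JSON fields follow the example above
import json
import socket
import struct

def report_node_status(node_id, status, message="", authkey=None,
                       ipc_path="/tmp/s.PGPOOLWD_CMD.9000"):
    payload = {"NodeID": node_id, "NodeStatus": status, "Message": message}
    if authkey:
        payload["IPCAuthKey"] = authkey
    data = json.dumps(payload).encode()
    packet = b"2" + struct.pack("!I", len(data)) + data        # '2' = NODE STATUS CHANGE
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(ipc_path)
        s.sendall(packet)
        return s.recv(1).decode()                              # '9' = RESULT OK, '8' = RESULT BAD

# mark node 1 as dead after a failed health check
report_node_status(1, 1, "health check failed")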
Restrictions on watchdog
Watchdog restriction with query mode lifecheck
In query mode, when all the DB nodes are detached from a
Pgpool-II due to PostgreSQL server
failure or pcp_detach_node being issued, watchdog regards the
Pgpool-II service as being in the down
status and brings down the virtual IP assigned to watchdog.
Thus clients of Pgpool-II cannot
connect to Pgpool-II using the
virtual IP any more. This is necessary to avoid split-brain,
that is, situations where there are multiple active
Pgpool-II nodes.
Connecting to Pgpool-II whose watchdog status is down
Don't connect to a Pgpool-II in down
status using the real IP, because a Pgpool-II
in down status can't receive information from the other
Pgpool-II watchdogs, so its backend status
may differ from that of the other Pgpool-II nodes.
Pgpool-II whose watchdog status is down requires restart
A Pgpool-II in down status can become neither the active
nor a standby Pgpool-II.
Recovery from down status requires a restart of Pgpool-II.
Watchdog promotion to active takes a few seconds
After the active Pgpool-II stops,
it will take a few seconds until the standby Pgpool-II
is promoted to the new active, to make sure that the former virtual IP is
brought down before a down notification packet is sent to the other
Pgpool-II nodes.
Architecture of the watchdog
Watchdog is a sub-process of Pgpool-II
which adds high availability and resolves the single point of
failure by coordinating multiple Pgpool-II nodes.
The watchdog process automatically starts (if enabled) when
Pgpool-II starts up and consists of two
main components, the watchdog core and the lifecheck system.
Watchdog Core
The watchdog core, referred to simply as "watchdog", is a
Pgpool-II child process that
manages all the watchdog-related communication with the
Pgpool-II nodes present in the
cluster, and also communicates with the Pgpool-II
parent and lifecheck processes.
The heart of a watchdog process is a state machine that starts
from its initial state (WD_LOADING) and transitions
towards either the standby (WD_STANDBY) or the
leader/coordinator (WD_COORDINATOR) state.
Both the standby and the leader/coordinator states are stable states of the
watchdog state machine, and the node stays in the standby or
leader/coordinator state until some problem in the local
Pgpool-II node is detected or a
remote Pgpool-II disconnects from the cluster.
The watchdog process performs the following tasks:
Manages and coordinates the local node watchdog state.
Interacts with the built-in or external lifecheck system
for local and remote Pgpool-II
node health checking.
Interacts with the Pgpool-II main
process and provides the mechanism for the
Pgpool-II parent process to
execute cluster commands over the watchdog channel.
Communicates with all the participating Pgpool-II
nodes to coordinate the selection of
leader/coordinator node and to ensure the quorum in the cluster.
Manages the virtual IP on the active/coordinator node and
allows the users to provide custom scripts for
escalation and de-escalation.
Verifies the consistency of Pgpool-II
configurations across the participating Pgpool-II
nodes in the watchdog cluster.
Synchronizes the status of all PostgreSQL backends at startup.
Provides the distributed locking facility to
Pgpool-II main process
for synchronizing the different failover commands.
Communication with other nodes in the Cluster
Watchdog uses TCP/IP sockets for all communication with the other nodes.
Each watchdog node can have two sockets opened with each other node: one is the
outgoing (client) socket, which this node creates to initiate the
connection to the remote node, and the second is the
listening socket for the inbound connection initiated by the remote
watchdog node. As soon as the socket connection to the remote node succeeds,
watchdog sends the ADD NODE (WD_ADD_NODE_MESSAGE)
message on that socket. Upon receiving the ADD NODE message, the
watchdog node verifies the node information encapsulated in the message
against the Pgpool-II configuration for that node; if the node passes
the verification test it is added to the cluster, otherwise the connection
is dropped.
IPC and data format
The watchdog process exposes a UNIX domain socket
for IPC communication, which accepts and provides data in
JSON format. All the internal Pgpool-II
processes, including Pgpool-II's
built-in lifecheck and the Pgpool-II main process,
use this IPC socket interface to interact with the watchdog.
This IPC socket can also be used by any external/third-party system
to interact with the watchdog.
See External lifecheck IPC packets and data above for details
on how to use the watchdog IPC interface for integrating external/third-party systems.
Watchdog Lifecheck
Watchdog lifecheck is the sub-component of watchdog that monitors the health
of the Pgpool-II nodes participating in the watchdog
cluster. Pgpool-II watchdog provides three built-in
methods of remote node health checking: "heartbeat", "query" and "external" mode.
In "heartbeat" mode, the lifecheck process sends and receives data over a
UDP socket to check the availability of remote nodes, and
for each node the parent lifecheck process spawns two child processes, one for
sending the heartbeat signal and another for receiving it.
In "query" mode, the lifecheck process uses the PostgreSQL libpq
interface for querying the remote Pgpool-II.
In this mode the lifecheck process creates a new thread for each health
check query, which gets destroyed as soon as the query finishes.
In "external" mode, the built-in lifecheck of
Pgpool-II is disabled, and the external system
is expected to monitor the local and remote nodes instead.
Apart from remote node health checking, watchdog lifecheck can also check the
health of the node it is installed on by monitoring the connection to upstream servers.
For monitoring the connectivity to the upstream servers, Pgpool-II
lifecheck uses the execv() function to execute a
'ping -q -c3 hostname' command,
so a new child process gets spawned for each ping command.
This means that for each health check cycle a child process gets created and
destroyed for each configured upstream server.
For example, if two upstream servers are configured and the lifecheck is
asked to health check at ten-second intervals, then every ten seconds
lifecheck will spawn two child processes, one for each upstream server,
and each process will live until the ping command finishes.