Watchdog

Introduction

Watchdog is a sub-process of Pgpool-II that adds high availability. Watchdog resolves the single point of failure by coordinating multiple Pgpool-II nodes. Watchdog was first introduced in Pgpool-II V3.2 and was significantly enhanced in Pgpool-II V3.5 to ensure the presence of a quorum at all times. This addition makes watchdog more fault tolerant and robust in handling and guarding against split-brain syndrome and network partitioning. In addition, V3.7 introduced quorum failover (see ) to reduce false positives of PostgreSQL server failures. For the quorum mechanism to work properly, the number of Pgpool-II nodes must be odd and greater than or equal to 3.

Coordinating multiple Pgpool-II nodes

Watchdog coordinates multiple Pgpool-II nodes by exchanging information among them. At startup, if watchdog is enabled, the Pgpool-II node syncs the status of all configured backend nodes from the leader watchdog node. If the node goes on to become the leader node itself, it initializes the backend status locally. When a backend node status changes, by failover etc., watchdog notifies the other Pgpool-II nodes and synchronizes them. When online recovery occurs, watchdog restricts client connections to the other Pgpool-II nodes to avoid inconsistency between backends. Watchdog also coordinates with all connected Pgpool-II nodes to ensure that failback, failover and follow_primary commands are executed on only one Pgpool-II node.

Life checking of other Pgpool-II nodes

Watchdog lifecheck is the sub-component of watchdog that monitors the health of the Pgpool-II nodes participating in the watchdog cluster to provide high availability. Traditionally, the Pgpool-II watchdog provides two methods of remote node health checking: "heartbeat" mode and "query" mode. Pgpool-II V3.5 adds a new "external" mode, which makes it possible to hook an external third-party health checking system into the Pgpool-II watchdog. Apart from remote node health checking, watchdog lifecheck can also check the health of the node it is installed on by monitoring the connection to upstream servers. If this monitoring fails, watchdog treats it as a failure of the local Pgpool-II node.

In heartbeat mode, watchdog monitors other Pgpool-II processes by using heartbeat signals. Watchdog receives heartbeat signals sent periodically by the other Pgpool-II nodes. If there is no signal for a certain period, watchdog regards this as a failure of that Pgpool-II. For redundancy you can use multiple network connections for heartbeat exchange between Pgpool-II nodes. This is the default and recommended mode for health checking.

In query mode, watchdog monitors the Pgpool-II service rather than the process. In this mode watchdog sends queries to the other Pgpool-II nodes and checks the responses. Note that this method requires connections from the other Pgpool-II nodes, so monitoring would fail if the parameter isn't large enough. This mode is deprecated and left for backward compatibility.

external mode was introduced in Pgpool-II V3.5. This mode disables the built-in lifecheck of the Pgpool-II watchdog and expects that an external system will inform the watchdog about the health of the local node and of all remote nodes participating in the watchdog cluster.
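As a rough illustration of the heartbeat timeout logic described above, the sketch below tracks the last time a signal arrived from each remote node and flags nodes that have been silent too long. It is a conceptual example only; the node names, the dead-time value and the function names are illustrative, not Pgpool-II parameters or APIs.
--
# Illustrative heartbeat timeout check (not Pgpool-II's actual implementation).
import time

HEARTBEAT_DEADTIME = 30  # assumed: seconds of silence before a node is considered down

last_heartbeat = {"watchdog-host2": time.time(), "watchdog-host3": time.time()}

def on_heartbeat_received(node):
    # Called whenever a heartbeat signal arrives from a remote node.
    last_heartbeat[node] = time.time()

def failed_nodes():
    # Nodes silent longer than the dead time are regarded as failed.
    now = time.time()
    return [n for n, t in last_heartbeat.items() if now - t > HEARTBEAT_DEADTIME]
--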
Consistency of configuration parameters on all Pgpool-II nodes

At startup, watchdog verifies the Pgpool-II configuration of the local node for consistency with the configuration on the leader watchdog node and warns the user of any differences. This eliminates the likelihood of undesired behavior that can happen because of different configurations on different Pgpool-II nodes.

Changing active/standby state when a fault is detected

When a fault of Pgpool-II is detected, watchdog notifies the other watchdogs of it. If this is the active Pgpool-II, the watchdogs decide the new active Pgpool-II by voting and change the active/standby state.

Automatic virtual IP switching

When a standby Pgpool-II server promotes to active, the new active server brings up the virtual IP interface. Meanwhile, the previous active server brings down the virtual IP interface. This enables the active Pgpool-II to keep working with the same IP address even when servers are switched.

Automatic registration of a server as a standby in recovery

When the broken server recovers or a new server is attached, the watchdog process notifies the other watchdogs in the cluster along with the information of the new server, and the watchdog process receives information on the active server and other servers. Then, the attached server is registered as a standby.

Starting/stopping watchdog

The watchdog process starts and stops automatically as a sub-process of Pgpool-II, therefore there is no dedicated command to start and stop watchdog. Watchdog controls the virtual IP interface, and the commands executed by the watchdog for bringing the VIP up and down require root privileges. Pgpool-II therefore requires the user running Pgpool-II to have root privileges when the watchdog is enabled along with the virtual IP. However, it is not good security practice to run Pgpool-II as the root user; the alternative and preferred way is to run Pgpool-II as a normal user and either use custom commands executed via sudo for bringing the virtual IP interface up and down and for sending ARP requests, or use setuid ("set user ID upon execution") on the if_* commands.

The lifecheck process is a sub-component of watchdog; its job is to monitor the health of the Pgpool-II nodes participating in the watchdog cluster. The lifecheck process is started automatically when the watchdog is configured to use the built-in life checking; it starts after the watchdog main process initialization is complete. However, lifecheck only kicks in once all configured watchdog nodes have joined the cluster and it becomes active. If some remote node fails before lifecheck becomes active, that failure will not be caught by lifecheck.

Integrating external lifecheck with watchdog

The Pgpool-II watchdog process uses BSD sockets for communicating with all the Pgpool-II processes, and the same BSD socket can also be used by any third party system to provide the lifecheck function for local and remote Pgpool-II watchdog nodes. The socket file name for IPC is constructed by appending the Pgpool-II wd_port to the string "s.PGPOOLWD_CMD.", and the socket file is placed in the configured socket directory.

Watchdog IPC command packet format

The watchdog IPC command packet consists of three fields. The table below details the message fields and their description.

  Field    Type                          Description
  TYPE     BYTE1                         Command Type
  LENGTH   INT32 in network byte order   The length of data to follow
  DATA     DATA in JSON format           Command data in JSON format
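Given the packet layout above, a client builds a command packet from a one-byte type, a four-byte length in network byte order, and a JSON payload. The sketch below shows one way to do this; the socket directory (/tmp) and wd_port value (9000) are assumptions and should be replaced with the values from your own configuration.
--
# Minimal sketch: build and send a watchdog IPC command packet.
import json
import socket
import struct

WD_PORT = 9000                                   # assumed wd_port
SOCK_PATH = "/tmp/s.PGPOOLWD_CMD.%d" % WD_PORT   # assumed socket directory

def send_command(command_type, payload=None):
    # TYPE (1 byte) + LENGTH (INT32, network byte order) + DATA (JSON).
    data = json.dumps(payload).encode() if payload is not None else b""
    packet = command_type.encode() + struct.pack("!I", len(data)) + data
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(SOCK_PATH)
    sock.sendall(packet)
    return sock  # keep the socket open to read the result packet
--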
Watchdog IPC result packet format

The watchdog IPC command result packet consists of three fields. The table below details the message fields and their description.

  Field    Type                          Description
  TYPE     BYTE1                         Command Type
  LENGTH   INT32 in network byte order   The length of data to follow
  DATA     DATA in JSON format           Command result data in JSON format
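The result packet has the same layout as the command packet, so reading it back is symmetric. The sketch below reads a result from an already connected socket (for example, the one returned by the send_command() sketch above); the helper names are illustrative.
--
# Minimal sketch: read and parse a watchdog IPC result packet.
import json
import struct

def _recv_exact(sock, n):
    # Read exactly n bytes from the socket.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full packet was read")
        buf += chunk
    return buf

def read_result(sock):
    # TYPE (1 byte) + LENGTH (INT32, network byte order) + DATA (JSON).
    res_type = _recv_exact(sock, 1).decode()
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    data = json.loads(_recv_exact(sock, length)) if length else None
    return res_type, data
--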
Watchdog IPC command packet types

The first byte of the IPC command packet sent to the watchdog process, and of the result returned by the watchdog process, identifies the command or command result type. The table below lists all valid types and their meanings.

  REGISTER FOR NOTIFICATIONS  '0'  (Command packet)
      Command to register the current connection to receive watchdog notifications
  NODE STATUS CHANGE          '2'  (Command packet)
      Command to inform watchdog about a node status change of a watchdog node
  GET NODES LIST              '3'  (Command packet)
      Command to get the list of all configured watchdog nodes
  NODES LIST DATA             '4'  (Result packet)
      The JSON data in the packet contains the list of all configured watchdog nodes
  CLUSTER IN TRANSITION       '7'  (Result packet)
      Watchdog returns this packet type when it is not possible to process the command because the cluster is transitioning
  RESULT BAD                  '8'  (Result packet)
      Watchdog returns this packet type when the IPC command fails
  RESULT OK                   '9'  (Result packet)
      Watchdog returns this packet type when the IPC command succeeds
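For use with the send_command()/read_result() sketches above, the packet type bytes can be collected as constants. The constant names below are illustrative; only the byte values come from the table above.
--
# Watchdog IPC packet type bytes (values from the table above).
REGISTER_FOR_NOTIFICATIONS = '0'   # command packet
NODE_STATUS_CHANGE         = '2'   # command packet
GET_NODES_LIST             = '3'   # command packet
NODES_LIST_DATA            = '4'   # result packet
CLUSTER_IN_TRANSITION      = '7'   # result packet
RESULT_BAD                 = '8'   # result packet
RESULT_OK                  = '9'   # result packet
--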
External lifecheck IPC packets and data

The "GET NODES LIST", "NODES LIST DATA" and "NODE STATUS CHANGE" IPC messages of watchdog can be used to integrate an external lifecheck system. Note that the built-in lifecheck of pgpool also uses the same channel and technique.

Getting the list of configured watchdog nodes

Any third party lifecheck system can send the "GET NODES LIST" packet on the watchdog IPC socket, with JSON data containing the authorization key and value if wd_authkey is set, or empty packet data when wd_authkey is not configured, to get the "NODES LIST DATA" result packet. The result packet returned by watchdog for "GET NODES LIST" contains the list of all configured watchdog nodes to health check, in JSON format. The JSON of the watchdog nodes contains the "WatchdogNodes" array of all watchdog nodes. Each watchdog JSON node contains the "ID", "NodeName", "HostName", "DelegateIP", "WdPort" and "PgpoolPort" for each node.

-- The example JSON data contained in "NODES LIST DATA"
{
  "NodeCount":3,
  "WatchdogNodes":
  [
    {
      "ID":0,
      "State":1,
      "NodeName":"Linux_ubuntu_9999",
      "HostName":"watchdog-host1",
      "DelegateIP":"172.16.5.133",
      "WdPort":9000,
      "PgpoolPort":9999
    },
    {
      "ID":1,
      "State":1,
      "NodeName":"Linux_ubuntu_9991",
      "HostName":"watchdog-host2",
      "DelegateIP":"172.16.5.133",
      "WdPort":9000,
      "PgpoolPort":9991
    },
    {
      "ID":2,
      "State":1,
      "NodeName":"Linux_ubuntu_9992",
      "HostName":"watchdog-host3",
      "DelegateIP":"172.16.5.133",
      "WdPort":9000,
      "PgpoolPort":9992
    }
  ]
}
--

Note that ID 0 is always reserved for the local watchdog node.

After getting the configured watchdog node information from the watchdog, the external lifecheck system can proceed with the health checking of the watchdog nodes, and when it detects a status change of any node it can inform the watchdog using the "NODE STATUS CHANGE" IPC message. The data in the message should contain JSON with the node ID of the node whose status has changed (the node ID must be the same as returned by watchdog for that node in the WatchdogNodes list) and the new status of the node.

-- The example JSON to inform the pgpool-II watchdog that the health check failed on the node with ID 1
{
  "NodeID":1,
  "NodeStatus":1,
  "Message":"optional message string to log by watchdog for this event",
  "IPCAuthKey":"wd_authkey configuration parameter value"
}
--

The meanings of the NodeStatus values are as follows:

  NODE STATUS DEAD  = 1
  NODE STATUS ALIVE = 2
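Putting the pieces together, an external lifecheck could fetch the node list and then report a failed node back to watchdog. The sketch below is self-contained; the socket path and the empty wd_authkey are assumptions, and the use of the "IPCAuthKey" field with "GET NODES LIST" is assumed to match the "NODE STATUS CHANGE" example above.
--
# Sketch of an external lifecheck talking to the watchdog IPC socket.
import json
import socket
import struct

SOCK_PATH = "/tmp/s.PGPOOLWD_CMD.9000"   # assumed socket directory and wd_port
AUTH_KEY = ""                            # wd_authkey value; empty if not configured

def wd_ipc(command_type, payload=None):
    # TYPE (1 byte) + LENGTH (INT32, network byte order) + DATA (JSON).
    data = json.dumps(payload).encode() if payload is not None else b""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCK_PATH)
        s.sendall(command_type.encode() + struct.pack("!I", len(data)) + data)
        with s.makefile("rb") as f:
            res_type = f.read(1).decode()
            (length,) = struct.unpack("!I", f.read(4))
            body = json.loads(f.read(length)) if length else None
    return res_type, body

# '3' = GET NODES LIST; on success the result type should be '4' (NODES LIST DATA).
res_type, nodes = wd_ipc('3', {"IPCAuthKey": AUTH_KEY} if AUTH_KEY else None)
for node in nodes["WatchdogNodes"]:
    print(node["ID"], node["NodeName"], node["HostName"], node["PgpoolPort"])

# '2' = NODE STATUS CHANGE; NodeStatus 1 = DEAD, 2 = ALIVE.
wd_ipc('2', {"NodeID": 1, "NodeStatus": 1,
             "Message": "health check failed",
             "IPCAuthKey": AUTH_KEY})
--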
Restrictions on watchdog

Watchdog restriction with query mode lifecheck

In query mode, when all the DB nodes are detached from a Pgpool-II due to a PostgreSQL server failure or a pcp_detach_node being issued, watchdog regards the Pgpool-II service as being in the down status and brings down the virtual IP assigned to the watchdog. Thus clients of Pgpool-II cannot connect to Pgpool-II using the virtual IP any more. This is necessary to avoid split-brain, that is, situations where there are multiple active Pgpool-II nodes.

Connecting to Pgpool-II whose watchdog status is down

Don't connect to a Pgpool-II in down status using the real IP, because a Pgpool-II in down status can't receive information from other Pgpool-II watchdogs, so its backend status may differ from that of the other Pgpool-II nodes.

Pgpool-II whose watchdog status is down requires restart

A Pgpool-II in down status can become neither the active nor the standby Pgpool-II. Recovery from the down status requires a restart of Pgpool-II.

Watchdog promotion to active takes a few seconds

After the active Pgpool-II stops, it will take a few seconds until the standby Pgpool-II promotes to the new active, to make sure that the former virtual IP is brought down before a down notification packet is sent to the other Pgpool-II nodes.

Architecture of the watchdog

Watchdog is a sub-process of Pgpool-II that adds high availability and resolves the single point of failure by coordinating multiple Pgpool-II nodes. The watchdog process automatically starts (if enabled) when Pgpool-II starts up and consists of two main components, the watchdog core and the lifecheck system.

Watchdog Core

The watchdog core, referred to as "watchdog", is a Pgpool-II child process that manages all the watchdog related communications with the Pgpool-II nodes present in the cluster and also communicates with the Pgpool-II parent and lifecheck processes. The heart of a watchdog process is a state machine that starts from its initial state (WD_LOADING) and transitions to either the standby (WD_STANDBY) or the leader/coordinator (WD_COORDINATOR) state (a conceptual sketch of these states follows the task list below). Both the standby and leader/coordinator states are stable states of the watchdog state machine, and the node stays in that state until some problem with the local Pgpool-II node is detected or a remote Pgpool-II disconnects from the cluster.

The watchdog process performs the following tasks:

- Manages and coordinates the local node watchdog state.
- Interacts with the built-in or external lifecheck system for local and remote Pgpool-II node health checking.
- Interacts with the Pgpool-II main process and provides the mechanism to the Pgpool-II parent process for executing cluster commands over the watchdog channel.
- Communicates with all the participating Pgpool-II nodes to coordinate the selection of the leader/coordinator node and to ensure the quorum in the cluster.
- Manages the virtual IP on the active/coordinator node and allows users to provide custom scripts for escalation and de-escalation.
- Verifies the consistency of Pgpool-II configurations across the participating Pgpool-II nodes in the watchdog cluster.
- Synchronizes the status of all PostgreSQL backends at startup.
- Provides the distributed locking facility to the Pgpool-II main process for synchronizing the different failover commands.
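The sketch below is a conceptual rendering of the watchdog core state machine described above: a node starts in WD_LOADING and settles into either WD_STANDBY or WD_COORDINATOR. Only the three state names come from the text; the transition function is a simplification, not Pgpool-II's actual logic.
--
# Conceptual sketch of the watchdog core states (not Pgpool-II's implementation).
from enum import Enum

class WdState(Enum):
    WD_LOADING = "loading"          # initial state
    WD_STANDBY = "standby"          # stable state
    WD_COORDINATOR = "coordinator"  # stable leader/coordinator state

def settle(won_leader_election: bool) -> WdState:
    # After initialization the node leaves WD_LOADING for one of the two
    # stable states, depending on the outcome of leader/coordinator selection.
    return WdState.WD_COORDINATOR if won_leader_election else WdState.WD_STANDBY
--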
Communication with other nodes in the Cluster

Watchdog uses TCP/IP sockets for all communication with other nodes. Each watchdog node can have two sockets open with each other node: one is the outgoing (client) socket, which this node creates to initiate the connection to the remote node, and the second is the listening socket for inbound connections initiated by remote watchdog nodes. As soon as the socket connection to a remote node succeeds, watchdog sends the ADD NODE (WD_ADD_NODE_MESSAGE) message on that socket. Upon receiving the ADD NODE message, the watchdog node verifies the node information encapsulated in the message against the Pgpool-II configuration for that node; if the node passes the verification it is added to the cluster, otherwise the connection is dropped.

IPC and data format

The watchdog process exposes a UNIX domain socket for IPC communication, which accepts and provides data in JSON format. All internal Pgpool-II processes, including Pgpool-II's built-in lifecheck and the Pgpool-II main process, use this IPC socket interface to interact with the watchdog. This IPC socket can also be used by any external/3rd party system to interact with watchdog. See the section on integrating external lifecheck with watchdog above for details on how to use the watchdog IPC interface for integrating external/3rd party systems.

Watchdog Lifecheck

Watchdog lifecheck is the sub-component of watchdog that monitors the health of the Pgpool-II nodes participating in the watchdog cluster. The Pgpool-II watchdog provides three built-in methods of remote node health checking: "heartbeat", "query" and "external" mode.

In "heartbeat" mode, the lifecheck process sends and receives data over a UDP socket to check the availability of remote nodes; for each node the parent lifecheck process spawns two child processes, one for sending the heartbeat signal and another for receiving it.

In "query" mode, the lifecheck process uses the PostgreSQL libpq interface to query the remote Pgpool-II; in this mode the lifecheck process creates a new thread for each health check query, which is destroyed as soon as the query finishes.

In "external" mode, the built-in lifecheck of Pgpool-II is disabled and the external system is expected to monitor the local and remote nodes instead.

Apart from remote node health checking, watchdog lifecheck can also check the health of the node it is installed on by monitoring the connection to upstream servers. To monitor connectivity to an upstream server, Pgpool-II lifecheck uses the execv() function to execute the 'ping -q -c3 hostname' command, so a new child process is spawned for each ping command. This means that for each health check cycle a child process is created and destroyed for each configured upstream server. For example, if two upstream servers are configured and lifecheck is asked to check health at ten second intervals, then every ten seconds lifecheck will spawn two child processes, one for each upstream server, and each process will live until its ping command finishes.
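The behavior just described, one ping child process per configured upstream server per check cycle, can be illustrated with the sketch below. The host names, the ten second interval and the failure policy (all upstream servers unreachable) are example values and assumptions, not Pgpool-II parameters.
--
# Illustrative sketch of the upstream connectivity check described above.
import subprocess
import time

UPSTREAM_SERVERS = ["gateway1.example.com", "gateway2.example.com"]  # example hosts
CHECK_INTERVAL = 10  # example: seconds between health check cycles

def check_upstream_once():
    # One child process per upstream server, each running "ping -q -c3 <host>".
    procs = {host: subprocess.Popen(["ping", "-q", "-c3", host],
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
             for host in UPSTREAM_SERVERS}
    # Each child lives until its ping command finishes.
    return {host: proc.wait() == 0 for host, proc in procs.items()}

while True:
    results = check_upstream_once()
    if not any(results.values()):
        # Assumed policy: if no upstream server is reachable, treat it as a
        # failure of the local node.
        print("local node connectivity check failed:", results)
    time.sleep(CHECK_INTERVAL)
--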