Tatsuo Ishii [Sat, 28 Sep 2024 13:34:57 +0000 (22:34 +0900)]
[New feature] Force to make query cache.
Recognize the /*FORCE QUERY CACHE*/ SQL statement comment so that any
read-only SELECT/WITH queries are cached. This is the opposite of the
/*NO QUERY CACHE*/ comment. This feature should be used carefully. See
the manual for more details.
Discussion: https://github.com/pgpool/pgpool2/issues/56
Tatsuo Ishii [Wed, 18 Sep 2024 02:25:10 +0000 (11:25 +0900)]
Fix pgpool crash when pgpool child process exits.
When a pgpool child process exits, close_all_backend_connections() is
called, which is responsible for closing all backend connections in
the connection pool. It mistakenly used the MAIN_CONNECTION macro,
which is fine for currently active connections but not for pooled
connections, because the main node could have been different at the
time the connection pool was created. The fix is to use
in_use_backend() instead.
Reported-by: Emond Papegaaij
Backpatch-through: v4.2
Tatsuo Ishii [Sat, 14 Sep 2024 13:52:49 +0000 (22:52 +0900)]
Fix resource leaks in pool_memqcache.c.
These leaks were brought in by commit 6fdba5c33 "Use psprintf()
instead of snprintf()." Since the commit was backpatched through 4.1,
this needs to be backpatched through 4.1 too.
Per Coverity (CID 1559726).
Backpatch-through: 4.1.
Tatsuo Ishii [Sat, 14 Sep 2024 13:41:30 +0000 (22:41 +0900)]
Fix pool_push_pending_data().
Fix "insecure data handling".
Per Coverity (CID 1559731).
Tatsuo Ishii [Sat, 14 Sep 2024 12:07:33 +0000 (21:07 +0900)]
Fix another bug in native replication/snapshot isolation mode.
insert_lock() forgot to send the row lock command (lock_kind == 3
case) to nodes other than the main node.
Tatsuo Ishii [Sat, 14 Sep 2024 00:54:56 +0000 (09:54 +0900)]
Fix bug in replication/snapshot isolation mode.
When an INSERT command is received, pgpool automatically issues a
table LOCK command against the target table, but it forgot to send the
command to nodes other than the main node. This only happened in
extended query mode.
This commit fixes the bug.
Discussion: GitHub issue #69.
https://github.com/pgpool/pgpool2/issues/69
Backpatch-through: v4.1
Tatsuo Ishii [Tue, 10 Sep 2024 10:20:03 +0000 (19:20 +0900)]
Fix resource leaks in pool_memqcache.c.
These leaks were brought in by commit 6fdba5c33 "Use psprintf()
instead of snprintf()." Since the commit was backpatched through 4.1,
this needs to be backpatched through 4.1 too.
Per Coverity (CID 1559736).
Backpatch-through: 4.1.
Tatsuo Ishii [Mon, 9 Sep 2024 08:10:30 +0000 (17:10 +0900)]
Fix resource leaks in watchdog.c.
These leaks were mostly brought in by commit 65dbbe7a0 "Add IPv6
support for hostname and heartbeat_hostname parameter." Since the
commit was only for master branch, no backpatch is necessary.
Per Coverity (CID 1559737 and CID 1559734).
Bo Peng [Sat, 7 Sep 2024 14:03:19 +0000 (23:03 +0900)]
Doc: add release notes.
Bo Peng [Sat, 7 Sep 2024 12:30:07 +0000 (21:30 +0900)]
Fix multiple query cache vulnerabilities (CVE-2024-45624).
When the query cache feature was enabled, it was possible for a user
to read, through the query cache, rows from tables that should not be
visible to that user.
- If query cache is created for a row security enabled table for user
A, and then another user B accesses the table via SET ROLE or SET
SESSION_AUTHORIZATION in the same session, it was possible for user
B to retrieve rows which should not be visible to user B.
- If query cache is created for a table for user A, and then another
user B accesses the table via SET ROLE or SET SESSION_AUTHORIZATION
in the same session, it was possible for user B to retrieve rows
which should not be visible to user B.
- If query cache is created for a table for a user, and then the
access right to the table is revoked from the user by the REVOKE
command, it was still possible for the user to retrieve the rows
through the query cache.
Besides the vulnerabilities, there were multiple bugs in the query
cache feature.
- If query cache is created for a row security enabled table for a
user, and then ALTER DATABASE BYPASSRLS or ALTER ROLE BYPASSRLS
disables the row security of the table, a subsequent SELECT still
returns the same rows as before through the query cache.
- If query cache is created for a table for a user, and then ALTER
TABLE SET SCHEMA moves the table so that the search path no longer
allows access to it, a subsequent SELECT still returns the rows as
before through the query cache.
To fix the above, the following changes are made:
- Do not create or use query cache for row security enabled tables
(even if the table is included in
cache_safe_memqcache_table_list).
- Do not create or use query cache if SET ROLE/SET SESSION
AUTHORIZATION is executed in the session (query cache invalidation
is performed when a table is modified as usual).
- Remove the entire query cache if REVOKE/ALTER DATABASE/ALTER
TABLE/ALTER ROLE is executed. If the command is executed in an
explicit transaction, do not create or use query cache until the
transaction is committed (query cache invalidation is performed
when a table is modified as usual). If the transaction is aborted,
do not remove the query cache.
Patch is created by Tatsuo Ishii.
Backpatch-through: v4.1
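The three rules in the fix above can be read as one eligibility check
made before a SELECT creates or uses a cache entry. The following is a
minimal sketch under stated assumptions, not pgpool's actual code: the
struct, its field names and query_cache_allowed() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-session state; none of these names come from pgpool. */
typedef struct
{
    bool role_changed;       /* SET ROLE / SET SESSION AUTHORIZATION was seen */
    bool acl_change_in_xact; /* REVOKE/ALTER DATABASE/ALTER TABLE/ALTER ROLE
                              * ran in an explicit transaction that has not
                              * committed yet */
} CacheSessionState;

/* Returns true if a SELECT may create or use a query cache entry. */
bool query_cache_allowed(const CacheSessionState *s, bool table_has_row_security)
{
    assert(s != NULL);

    if (table_has_row_security)
        return false;           /* never cache row security enabled tables */
    if (s->role_changed)
        return false;           /* effective user may differ from the cache owner */
    if (s->acl_change_in_xact)
        return false;           /* wait until the transaction commits */
    return true;
}
```

Removing the entire cache on REVOKE/ALTER is a separate action and is
not modeled here.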
Tatsuo Ishii [Tue, 27 Aug 2024 01:59:37 +0000 (10:59 +0900)]
Add IPv6 support for hostname and heartbeat_hostname parameter.
Now these watchdog configuration parameters accept IPv6 addresses.
Author: Kwangwon Seo
Reviewed-by: Muhammad Usama, Tatsuo Ishii
Discussion: [pgpool-hackers: 4476] Watchdog and IPv6
https://www.pgpool.net/pipermail/pgpool-hackers/2024-July/004477.html
Tatsuo Ishii [Sun, 25 Aug 2024 01:03:54 +0000 (10:03 +0900)]
Revert "Doc: mention that reloading is not necessary when pool_passwd is updated."
This reverts commit 4695affe7859338fa41d860dac74bfbebea7a88a.
"reloading is not necessary when pool_passwd is updated." was not
correct. Since pgpool reads pool_passwd using buffered reads, it is
too fragile to assume that changes made to pool_passwd by a different
process are immediately visible to the pgpool process. To reflect the
changes, a pgpool reload (which re-opens pool_passwd) is necessary.
Discussion: [pgpool-general: 9185] reloading of pool_passwd file
https://www.pgpool.net/pipermail/pgpool-general/2024-August/001862.html
Tatsuo Ishii [Tue, 20 Aug 2024 11:21:43 +0000 (20:21 +0900)]
Doc: mention that reloading is not necessary when pool_passwd is updated.
Discussion: [pgpool-general: 9185] reloading of pool_passwd file
https://www.pgpool.net/pipermail/pgpool-general/2024-August/001862.html
Backpatch-through: v4.1
Tatsuo Ishii [Sun, 11 Aug 2024 06:36:37 +0000 (15:36 +0900)]
Fix another bug in replication mode and snapshot isolation mode.
This is a follow-up commit for 181d300de6337fe9a10b60ddbd782aa886b563e9.
If the previous query produces a parameter status message, the
subsequent parse() needs to read and process it, because it wants to
read the ready for query message which is supposed to follow the
parameter status message. However, when ParameterStatus() was called,
the query in progress flag was set, so it was possible that the
parameter status message from only one backend was processed if the
query handled in this parse() call was load balanced. It is likely
that parameter status messages come from all live backends, because
they are generated by the SET command, and SET commands are sent to
all live backends in replication mode and snapshot isolation mode. So
unset the query in progress flag before calling ParameterStatus().
Here is the test case written in pgproto data format.
'P' "" "SET application_name TO foo"
'B' "" "" 0 0 0
'E' "" 0
'P' "" "SELECT 1"
'B' "" "" 0 0 0
'E' "" 0
'P' "" "SET application_name TO bar"
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'
'X'
Backpatch-through: v4.1.
Tatsuo Ishii [Fri, 9 Aug 2024 10:55:05 +0000 (19:55 +0900)]
Fix bug in replication mode and snapshot isolation mode.
In replication mode and snapshot isolation mode, when a command
finishes, pgpool waits for a ready for query message, but it forgot
that some commands (for example SET ROLE) produce a parameter status
message. As a result pgpool errored out, complaining that another
message arrived before the ready for query message. Deal with the case
when a parameter status message arrives.
Here is the test case written in pgproto data format.
'P' "" "SET ROLE TO foo"
'B' "" "" 0 0 0
'E' "" 0
'P' "" "SELECT 1"
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'
Backpatch-through: v4.1.
Bo Peng [Tue, 6 Aug 2024 06:07:19 +0000 (15:07 +0900)]
Doc: add release notes.
Bo Peng [Mon, 5 Aug 2024 06:52:53 +0000 (15:52 +0900)]
Change the default value of *_user parameters to ''.
Currently the default values of *_user parameters are "nobody".
This commit changes the default value of *_user parameters to ''.
Bo Peng [Mon, 5 Aug 2024 06:44:48 +0000 (15:44 +0900)]
Downgrade reaper handler logs.
The following log messages appear when a child process exits due to settings (e.g., child_life_time or child_max_connections).
Downgrade them to DEBUG1 because they are normal messages.
reaper handler
reaper handler: exiting normally
Bo Peng [Mon, 5 Aug 2024 06:34:33 +0000 (15:34 +0900)]
Feature: Add new PCP command to trigger log rotation
Currently the only way to trigger log rotation in logging collector process
is to send SIGUSR1 signal directly to logging collector process.
However, I think it would be nice to have a better way to do it with an external
tool (e.g. logrotate) without requiring knowledge of the logging collector's PID.
This commit adds a new PCP command "pcp_log_rotate" for triggering log rotation.
Tatsuo Ishii [Sun, 4 Aug 2024 05:16:03 +0000 (14:16 +0900)]
Remove unnecessary code surrounded by ifdef NOT_USED.
Tatsuo Ishii [Sun, 4 Aug 2024 03:14:11 +0000 (12:14 +0900)]
Remove unnecessary code surrounded by ifdef NOT_USED.
Tatsuo Ishii [Sun, 4 Aug 2024 01:14:00 +0000 (10:14 +0900)]
Comment: fix typo in comment.
Tatsuo Ishii [Sat, 3 Aug 2024 05:30:33 +0000 (14:30 +0900)]
Use psprintf() instead of snprintf().
Previously fixed size buffers were used for snprintf in the file. It's
not appropriate to use snprintf here because the result string could
exceed the buffer size, and the truncated result could then be used as
an incomplete command or path.
Backpatch-through: 4.1.
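The truncation problem described above can be reproduced outside
pgpool. The sketch below is illustrative only: format_alloc() is a
hypothetical stand-in for psprintf() built on malloc() and
vsnprintf(), and the 16-byte buffer and the path used in the usage
note are made up for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if formatting "dir/file" into a fixed 16-byte buffer truncates. */
int fixed_buffer_truncates(const char *dir, const char *file)
{
    char path[16];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, file);

    /* snprintf() returns the length the complete result would have had. */
    return n < 0 || (size_t) n >= sizeof(path);
}

/*
 * Hypothetical stand-in for psprintf(): measure first, then allocate
 * exactly enough. The caller owns the result and must free() it.
 */
char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    int needed;
    char *buf;

    assert(fmt != NULL);

    va_start(ap, fmt);
    needed = vsnprintf(NULL, 0, fmt, ap);   /* measure only, write nothing */
    va_end(ap);
    if (needed < 0)
        return NULL;

    buf = malloc((size_t) needed + 1);
    if (buf == NULL)
        return NULL;

    va_start(ap, fmt);
    vsnprintf(buf, (size_t) needed + 1, fmt, ap);
    va_end(ap);
    return buf;
}
```

With dir "/var/run/pgpool" and file "pool_status" the fixed buffer
truncates, while format_alloc() returns the complete path. The
allocated result has to be released by the caller, which is exactly
the kind of obligation that is easy to miss when converting from
fixed buffers.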
Bo Peng [Thu, 1 Aug 2024 17:25:49 +0000 (02:25 +0900)]
Use "psql -V" instead of "initdb -V" in sample scripts.
Use "psql -V" instead of "initdb -V" in the sample scripts
because in some cases postgresqlxx-server may not be installed.
Bo Peng [Thu, 1 Aug 2024 04:49:48 +0000 (13:49 +0900)]
Doc: Add the criteria for selecting processes to be killed to max_spare_children.
Tatsuo Ishii [Tue, 30 Jul 2024 09:17:57 +0000 (18:17 +0900)]
Fix hang after a flush message received.
Previously pgpool could hang after a flush message arrives. Consider
the following scenario:
(1) The backend sends a portal suspend message.
(2) pgpool writes it to the frontend write buffer, but does not flush it.
(3) The frontend sends a flush message to pgpool.
(4) pgpool forwards the flush message to the backend.
(5) Since there's no pending message in the backend, nothing happens.
(6) The frontend waits for the portal suspend message from pgpool in vain.
To fix this, at (4) pgpool flushes data in the frontend write buffer
if some data remains (in this case the portal suspend message). Then
the frontend will send the next request message to pgpool.
Discussion: https://github.com/pgpool/pgpool2/issues/59
Backpatch-through: master, 4.5, 4.4, 4.3, 4.2 and 4.1.
Tatsuo Ishii [Tue, 30 Jul 2024 02:51:59 +0000 (11:51 +0900)]
Doc: enhance failover document.
Clarify the condition on failover when failover_on_backend_shutdown is
enabled.
Tatsuo Ishii [Sat, 27 Jul 2024 10:22:13 +0000 (19:22 +0900)]
Remove dead code.
Remove dead code surrounded by "#ifdef NOT_USED".
Tatsuo Ishii [Mon, 22 Jul 2024 10:32:41 +0000 (19:32 +0900)]
Fix another segmentation fault.
It is reported that a pgpool child segfaulted in pool_do_auth. The
cause was that MAIN_CONNECTION() returned NULL. It seems
my_main_node_id was set to the incorrect node id 0, which was actually
in down status. Thus there was no connection in cp->slots[0]. In this
particular case a client connected to pgpool while failover occurred
in another pgpool node, and it was propagated by watchdog, which
changed backend_status in shared memory. new_connection() properly
updated my_backend_status but forgot to update my_main_node_id, so
MAIN_CONNECTION used an incorrect backend id.
Problem reported by: Emond Papegaaij
Discussion: [pgpool-general: 9175] Segmentation fault
https://www.pgpool.net/pipermail/pgpool-general/2024-July/001852.html
Backpatch-through: V4.1.
Tatsuo Ishii [Fri, 19 Jul 2024 11:43:03 +0000 (20:43 +0900)]
Fix dynamic process management.
Calculation of pooled_connection, which is used by the process
eviction algorithm, was not correct. The number always resulted in
max_pool. Also more comments are added.
Discussion: [pgpool-hackers: 4490] Issue with dynamic process management
https://www.pgpool.net/pipermail/pgpool-hackers/2024-July/004491.html
Backpatch-through: master, 4.5, 4.4
Tatsuo Ishii [Thu, 11 Jul 2024 02:56:57 +0000 (11:56 +0900)]
Test: add temporary checking in 028.watchdog_enable_consensus_with_half_votes.
We often see a timeout error in the buildfarm test. Analyzing the
buildfarm log shows:
2024-07-10 03:41:31.044: watchdog pid 29119: FATAL: failed to create watchdog receive socket
2024-07-10 03:41:31.044: watchdog pid 29119: DETAIL: bind on "TCP:50010" failed with reason: "Address already in use"
I suspect there's something wrong in the watchdog shutdown process. To
confirm my theory, add an sh command to show all processes named
"pgpool" at the end of each test cycle.
Bo Peng [Fri, 28 Jun 2024 10:42:58 +0000 (19:42 +0900)]
Fixed segmentation fault at parsing config file.
This commit fixed a segmentation fault that occurs when parsing pgpool.conf
if the setting value was not enclosed in single quotes.
The patch is created by Carlos Chapi, reviewed and modified by Tatsuo Ishii.
Tatsuo Ishii [Fri, 21 Jun 2024 06:37:25 +0000 (15:37 +0900)]
Fix segfault to not use MAIN_NODE macro.
Some functions (close_idle_connection(), new_connection() and
pool_create_cp()) used MAIN* and VALID_BACKEND where they are not
appropriate. MAIN* and VALID_BACKEND are only useful for current
connections to backend, not for pooled connections, since for pooled
connections which backend is the main node, or is up and running, is
not necessarily the same as for the current connections to backend.
The misuse of those macros sometimes leads to segfault.
This patch introduces a new function, in_use_backend_id(), which
returns the first node id in use. This commit replaces some of MAIN*
with the return value from in_use_backend_id(). Also inappropriate
calls to VALID_BACKEND are replaced with the CONNECTION_SLOT macro.
Problem reported by Emond Papegaaij
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2024-June/009176.html
[pgpool-general: 9114] Re: Another segmentation fault
Backpatch-through: V4.1
Tatsuo Ishii [Fri, 21 Jun 2024 05:21:15 +0000 (14:21 +0900)]
Fix MAIN_NODE macro (actually pool_virtual_main_db_node_id()).
The macro used to return REAL_MAIN_NODE_ID if there's no session
context. This is wrong since REAL_MAIN_NODE_ID can change at any time
when failover/failback happens. Suppose REAL_MAIN_NODE_ID ==
my_main_node_id == 1. Then due to failback, REAL_MAIN_NODE_ID is
changed to 0. Then MAIN_CONNECTION(cp) will return NULL and any
reference to it will cause a segmentation fault. To prevent the issue
we should return my_main_node_id instead.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2024-June/009205.html
Backpatch-through: V4.1
Tatsuo Ishii [Thu, 20 Jun 2024 06:44:18 +0000 (15:44 +0900)]
Fix typo in comment.
Bo Peng [Wed, 19 Jun 2024 06:19:30 +0000 (15:19 +0900)]
Doc: add the missing default values for virtual IP related parameters.
Tatsuo Ishii [Fri, 14 Jun 2024 00:30:46 +0000 (09:30 +0900)]
Fix "show pool_processes" to not show row description twice.
processes_reporting() accidentally called both send_row_description()
and send_row_description_and_data_rows().
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2024-June/004472.html
[pgpool-hackers: 4471] [PATCH] printing empty row first in query "show pool_process"
Author: Kwangwon Seo
Back patch to V4.2 where the problem started.
Tatsuo Ishii [Wed, 12 Jun 2024 08:13:47 +0000 (17:13 +0900)]
Eliminate unnecessary memory allocation in extended query protocol.
When pending messages are created, Pgpool-II did the following:
(1) pmsg = pool_pending_message_create(); /* create a pending message */
(2) pool_pending_message_dest_set(pmsg, query_context); /* set PostgreSQL node ids to be sent */
(3) pool_pending_message_query_set(pmsg, query_context); /* add query context */
(4) pool_pending_message_add(pmsg); /* add the pending message to the list */
(5) pool_pending_message_free_pending_message(pmsg); /* free memory allocated by pool_pending_message_create(); */
The reason why pool_pending_message_free_pending_message(pmsg) is
called here is that pool_pending_message_add() creates a copy of the
pending message and then adds the copy to the list. This commit
modifies pool_pending_message_add() so that it does not create a copy
of the object and instead adds the object itself to the pending
messages list. This way, we can eliminate (5) as well, which should
reduce memory footprint and CPU cycles.
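The change amounts to the list taking ownership of the caller's object
instead of copying it. A minimal sketch of that ownership transfer
follows. The types and functions here (PendingMessage, pending_add()
and so on) are hypothetical and much simpler than pgpool's pending
message list.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct PendingMessage
{
    int type;                       /* message kind, e.g. 'P' for Parse */
    struct PendingMessage *next;
} PendingMessage;

typedef struct
{
    PendingMessage *head;
    int length;
} PendingList;

/* Allocate one message; the caller owns it until it is added to a list. */
PendingMessage *pending_create(int type)
{
    PendingMessage *msg = calloc(1, sizeof(*msg));

    if (msg != NULL)
        msg->type = type;
    return msg;
}

/* Link the caller's object itself into the list. No copy is made, so the
 * caller must not free msg afterwards: the list owns it now. */
void pending_add(PendingList *list, PendingMessage *msg)
{
    assert(list != NULL && msg != NULL);
    msg->next = list->head;
    list->head = msg;
    list->length++;
}

/* Release every message the list owns. */
void pending_free_all(PendingList *list)
{
    assert(list != NULL);
    while (list->head != NULL)
    {
        PendingMessage *next = list->head->next;

        free(list->head);
        list->head = next;
    }
    list->length = 0;
}
```

One allocation and one free per message remain, instead of two of
each when the list stores a copy.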
Tatsuo Ishii [Tue, 11 Jun 2024 11:15:08 +0000 (20:15 +0900)]
Fix segfault in a child process.
It is reported that pgpool child segfaulted [1].
[snip]
Further down the thread it is reported that although VALID_BACKEND(i)
returns true, backend->slots[i] is NULL, which should have been filled
by new_connection().
It seems there's a race condition. In new_connection(), there's a code
fragment:
/*
* Make sure that the global backend status in the shared memory
* agrees the local status checked by VALID_BACKEND. It is possible
* that the local status is up, while the global status has been
* changed to down by failover.
*/
A--> if (BACKEND_INFO(i).backend_status != CON_UP &&
BACKEND_INFO(i).backend_status != CON_CONNECT_WAIT)
{
ereport(DEBUG1,
(errmsg("creating new connection to backend"),
errdetail("skipping backend slot %d because global backend_status = %d",
i, BACKEND_INFO(i).backend_status)));
/* sync local status with global status */
B--> *(my_backend_status[i]) = BACKEND_INFO(i).backend_status;
continue;
}
It is possible that at A backend_status in the shared memory is down
but by the time it reaches B the status has been changed to up. And
new_connection() skipped creating a backend connection. This seems to
explain why the connection slot is NULL while VALID_BACKEND returns
true. To prevent the race condition, backend_status in shared memory
is copied to a local variable, which is then evaluated. Also the
VALID_BACKEND just before:
pool_set_db_node_id(CONNECTION(backend, i), i);
is changed to:
if (VALID_BACKEND(i) && CONNECTION_SLOT(backend, i))
so that it prevents a crash just in case.
[1] [pgpool-general: 9104] Another segmentation fault
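The fix described above is the usual "read shared state once" pattern:
take a snapshot of the shared value, then both test and store that
snapshot. A minimal sketch, assuming a simplified status enum; only
CON_UP and CON_CONNECT_WAIT appear in the fragment above, the other
enum members, their numeric values and the function name are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified status enum; members other than CON_UP and CON_CONNECT_WAIT
 * and all numeric values are assumptions. */
typedef enum
{
    CON_UNUSED,
    CON_CONNECT_WAIT,
    CON_UP,
    CON_DOWN
} BackendStatus;

/*
 * Returns 1 if the backend slot must be skipped. The shared status is read
 * exactly once, so the value tested and the value copied into the local
 * status cannot disagree even if another process changes it concurrently.
 */
int should_skip_backend(const volatile BackendStatus *shared_status,
                        BackendStatus *local_status)
{
    BackendStatus snapshot;

    assert(shared_status != NULL && local_status != NULL);

    snapshot = *shared_status;      /* the only read of shared memory */
    if (snapshot != CON_UP && snapshot != CON_CONNECT_WAIT)
    {
        *local_status = snapshot;   /* sync local status with the snapshot */
        return 1;
    }
    return 0;
}
```

With the original two reads, the test at A could see "down" while the
copy at B stored "up", leaving a locally valid backend without a
connection slot.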
Muhammad Usama [Tue, 11 Jun 2024 06:52:00 +0000 (11:52 +0500)]
Fix: 823: Watchdog dies and kills pgpool2 when network gets shortly interrupted.
With network monitoring enabled, a Pgpool node would shut down immediately if it
lost all network interfaces or assigned IP addresses, providing extra protection
by quickly removing a non-communicative node from the cluster.
The issue was that Pgpool responded to network blackout events even when network
monitoring was disabled. This fix ensures that the network monitoring socket is
not opened when network monitoring is not enabled, preventing unnecessary shutdowns.
Tatsuo Ishii [Mon, 10 Jun 2024 02:23:47 +0000 (11:23 +0900)]
Fix "pgpool reset" command not working if watchdog is enabled.
[pgpool-hackers: 4465] abnormal behavior about PGPOOL RESET. and proposal a patch file.
reported that "pgpool reset" command fails if watchdog is enabled.
test=# PGPOOL RESET client_idle_limit;
SET
ERROR: Pgpool node id file �y/pgpool_node_id does not exist
DETAIL: If watchdog is enable, pgpool_node_id file is required
message type 0x5a arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
SetPgpoolNodeId() tried to obtain the path to the node id file by
using global variable config_file_dir and failed because it points to
an automatic variable in ParseConfigFile().
To fix this, change the config_file_dir from a pointer to an array and
save the path string into config_file_dir in ParseConfigFile().
Also regression test is added to 004.watchdog.
Bug reported and problem analysis by keiseo.
Back patch to V4.2 in which the node id file was introduced.
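The pointer-versus-array fix can be sketched in isolation. This is an
illustrative example rather than the pgpool code: the variable name
config_file_dir matches the message above, while CONFIG_DIR_MAX,
remember_config_dir() and get_config_dir() are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CONFIG_DIR_MAX 1024     /* hypothetical size limit */

/* An array, not a pointer: the bytes live as long as the program does. */
static char config_file_dir[CONFIG_DIR_MAX];

/* Store the directory part of config_file. Returns 0 on success, -1 if the
 * path does not fit. */
int remember_config_dir(const char *config_file)
{
    char buf[CONFIG_DIR_MAX];   /* automatic: invalid once we return */
    size_t len;
    char *slash;

    assert(config_file != NULL);
    len = strlen(config_file);
    if (len >= sizeof(buf))
        return -1;
    memcpy(buf, config_file, len + 1);

    slash = strrchr(buf, '/');
    if (slash == NULL)
        strcpy(buf, ".");       /* no directory part */
    else if (slash == buf)
        buf[1] = '\0';          /* file directly under "/" */
    else
        *slash = '\0';          /* cut off the file name */

    /* Copy the bytes. Keeping a pointer to buf instead would dangle. */
    memcpy(config_file_dir, buf, strlen(buf) + 1);
    return 0;
}

const char *get_config_dir(void)
{
    return config_file_dir;
}
```

A global pointer set to buf would point at stack memory that is reused
after the function returns, which matches the garbage path shown in
the error message above.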
Tatsuo Ishii [Fri, 7 Jun 2024 10:21:30 +0000 (19:21 +0900)]
Mega typo fix for docs and program source codes.
Author: Umar Hayat
Tatsuo Ishii [Tue, 4 Jun 2024 01:11:05 +0000 (10:11 +0900)]
Fix psql_scan crash.
It was reported that psql_scan crashes while determining whether a
string in a long query is psql variable (i.e. starting with ":") or
not.
https://github.com/pgpool/pgpool2/issues/54
This is because a callback struct was not provided when calling
psql_scan_create(). Later psql_scan() tries to invoke a callback and
crashes because the pointer to the callback struct is NULL. To fix
this, provide a PsqlScanCallbacks struct whose callback function
pointer is NULL. With this, psql_scan() avoids invoking a
callback.
Backpatch to master, V4.5, V4.4, V4.3, V4.2 and V4.1 where psql_scan
was introduced.
Bo Peng [Mon, 20 May 2024 05:11:36 +0000 (14:11 +0900)]
Delete unnecessary if branch.
https://github.com/pgpool/pgpool2/issues/52
Bo Peng [Thu, 16 May 2024 01:00:22 +0000 (10:00 +0900)]
Doc: update Copyright.
Bo Peng [Tue, 14 May 2024 22:50:16 +0000 (07:50 +0900)]
Doc: add release notes.
Bo Peng [Thu, 9 May 2024 00:11:06 +0000 (09:11 +0900)]
Remove leading/trailing spaces in string list type configuration parameters.
If the string list type configuration parameters (e.g. unix_socket_directories, pcp_socket_dir, etc.) contain white spaces, it may cause startup failure.
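A minimal sketch of this kind of trimming, assuming a comma separated
list that may be modified in place. trim_spaces() and split_list() are
hypothetical helpers written for this example, not pgpool functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Trim leading and trailing white space in place. Returns a pointer to the
 * first non-space character inside s. */
char *trim_spaces(char *s)
{
    char *end;

    assert(s != NULL);
    while (isspace((unsigned char) *s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char) end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Split a comma separated list in place and trim each item. Stores up to
 * max_items pointers into items and returns how many were stored. */
int split_list(char *list, char **items, int max_items)
{
    int n = 0;
    char *p = list;

    assert(list != NULL && items != NULL);
    while (p != NULL && n < max_items)
    {
        char *comma = strchr(p, ',');

        if (comma != NULL)
            *comma = '\0';          /* terminate the current item */
        items[n++] = trim_spaces(p);
        p = (comma != NULL) ? comma + 1 : NULL;
    }
    return n;
}
```

For example, the value " /tmp , /var/run/postgresql " yields the two
items "/tmp" and "/var/run/postgresql", so a directory name never
carries stray spaces into a later socket path.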
Bo Peng [Tue, 7 May 2024 00:17:30 +0000 (09:17 +0900)]
Doc: fix documentation typos.
Bo Peng [Wed, 1 May 2024 06:42:10 +0000 (15:42 +0900)]
Fixed compiler error with -Werror=implicit-function-declaration
- Add missing header files in autoconf check and
- Add LDAP_DEPRECATED to include prototypes for deprecated ldap functions
Patch is created by Vladimir Petko.
Tatsuo Ishii [Wed, 24 Apr 2024 04:43:25 +0000 (13:43 +0900)]
Silence gcc warning.
Commit 0b94cd9f caused a gcc warning:
streaming_replication/pool_worker_child.c: In function 'do_worker_child':
streaming_replication/pool_worker_child.c:281:40: warning: 'watchdog_leader' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (!pool_config->use_watchdog ||
^
It seems this only occurs in older gcc (e.g. gcc 4.8.5).
Backpatch-through: master branch only as commit 0b94cd9f only applied
to master.
Bo Peng [Thu, 18 Apr 2024 02:24:35 +0000 (11:24 +0900)]
Fix pgpool.spec.
The permission of /etc/sudoers.d/pgpool should be mode 0440.
Tatsuo Ishii [Thu, 4 Apr 2024 11:44:53 +0000 (20:44 +0900)]
Fix segfault in pgpool main process.
This is a follow-up commit for 0564864e "Fix assorted causes of
segmentation fault.". It lacked a fix for the case where
verify_backend_node calls get_server_version, i.e. checking
availability of slots.
Patch provided by: Emond Papegaaij
Backpatch-through: v4.4
Discussion:
[pgpool-general: 9072] Re: Segmentation after switchover
https://www.pgpool.net/pipermail/pgpool-general/2024-April/009133.html
Tatsuo Ishii [Thu, 4 Apr 2024 04:54:34 +0000 (13:54 +0900)]
Fix assorted causes of segmentation fault.
It is reported that pgpool and its child process segfault in certain
cases when failover is involved.
In pgpool main, get_query_result (called from find_primary_node) crashed.
do_query(slots[backend_id]->con, query, res, PROTO_MAJOR_V3);
It seems slots[0] is NULL here. slots[0] is created by
make_persistent_db_connection_noerror() but it failed with log
message: "find_primary_node: make_persistent_db_connection_noerror
failed on node 0". Note that at the time when
make_persistent_db_connection_noerror() is called, VALID_BACKEND
reported that node 0 is up. This means that failover is ongoing and
the node status used by VALID_BACKEND did not catch up. As a result
get_query_result is called with slots[0] = NULL, which caused the
segfault. The fix is to check the slots entry before calling
get_query_result.
Also, the health check has an issue with connection "slot" memory. It
is managed by HealthCheckMemoryContext. slot is the pointer to the
memory. When elog(ERROR) is raised, pgpool long jumps and resets the
memory context. Thus, slot remains as a pointer to freed memory. To
fix this, always set slot to NULL right after the
HealthCheckMemoryContext call.
Similar issue is found with streaming replication check too and is
also fixed in this commit.
Problem reported and analyzed: Emond Papegaaij
Backpatch-through: v4.4
Discussion:
[pgpool-general: 9070] Re: Segmentation after switchover
https://www.pgpool.net/pipermail/pgpool-general/2024-April/009131.html
Tatsuo Ishii [Thu, 4 Apr 2024 02:45:42 +0000 (11:45 +0900)]
Test: fix 037.failover_session.
The test script forgot to execute shutdownall before exiting.
Tatsuo Ishii [Wed, 3 Apr 2024 10:13:53 +0000 (19:13 +0900)]
Fix uninitialized memory error.
It was reported that valgrind found several errors including an
uninitialized memory error in read_startup_packet. In the case of a
cancel or SSL request, it allocates memory for the user name in the
startup packet using palloc, and later the memory is read by pstrdup.
Since the contents of memory allocated by palloc are undefined, this
should have been palloc0.
Bug reported by: Emond Papegaaij
Backpatch-through: v4.1
Discussion:
[pgpool-general: 9065] Re: Segmentation after switchover
https://www.pgpool.net/pipermail/pgpool-general/2024-April/009126.html
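The palloc versus palloc0 distinction above mirrors malloc() versus
calloc() in plain C. The sketch below uses those libc functions as
stand-ins, and alloc_zeroed() and dup_string() are hypothetical names.
It only shows why zero-filled memory is safe to hand to a string copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for palloc0(): the memory is zero filled, so when treated as a
 * C string it reads as the empty string. */
char *alloc_zeroed(size_t size)
{
    assert(size > 0);
    return calloc(1, size);
}

/* Stand-in for pstrdup(): reads src up to its terminating NUL byte. If src
 * came from an allocator that does not initialize memory, that read would
 * touch undefined bytes and might run past the allocation. */
char *dup_string(const char *src)
{
    size_t len;
    char *copy;

    assert(src != NULL);
    len = strlen(src) + 1;
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, src, len);
    return copy;
}
```

Copying a buffer obtained from alloc_zeroed() yields an empty string.
The same copy on memory that was never initialized is what valgrind
flags.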
Bo Peng [Wed, 3 Apr 2024 07:18:17 +0000 (16:18 +0900)]
Doc: enhance "Upstream server connection" documentation.
Tatsuo Ishii [Tue, 2 Apr 2024 10:30:27 +0000 (19:30 +0900)]
Fix errors/hang when load_balance_mode is off.
Commit 3f3c1656 "Fix statement_level_load_balance with BEGIN etc."
brought errors/hangs when load_balance_mode is off, the primary node
id is not 0 and queries are BEGIN etc.
pool_setall_node_to_be_sent() checked if the node is primary. If not,
it just returned with an empty where_to_send map, which makes
set_virtual_main_node() not set
query_context->virtual_main_node_id. As a result, the MAIN_NODE macro
(it's actually pool_virtual_main_db_node_id()) returns
REAL_MAIN_NODE_ID, which is 0 if node 0 is alive (this should have
been the primary node id).
The following simple test reveals the bug.
(1) create a two-node cluster using pgpool_setup
(2) shutdown node 0 and recover node 1 (pcp_recovery_node 0). This
makes node 0 the standby and node 1 the primary.
(3) add the following to pgpool.conf and restart the whole cluster.
load_balance_mode = off
backend_weight1 = 0
(4) type "begin" from psql. It gets stuck.
Bug found and analyzed by Emond Papegaaij.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2024-March/009113.html
Backpatch-through: v4.1
Masaya Kawamoto [Thu, 28 Mar 2024 00:16:55 +0000 (00:16 +0000)]
Doc: language cleanup in Japanese document
Replace "マスター" with "プライマリ"
Masaya Kawamoto [Wed, 27 Mar 2024 08:11:41 +0000 (08:11 +0000)]
Doc: add the note about using pcp_promote_node when two postgres
Even if there are only two PostgreSQL nodes, there are cases in which
follow_primary_command must be set.
Tatsuo Ishii [Mon, 25 Mar 2024 07:15:50 +0000 (16:15 +0900)]
Fix compile errors with certain CFLAGS.
https://github.com/pgpool/pgpool2/issues/42 reported that with CFLAGS
-flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing
gcc emits errors. Some of them are mistakes when their sources were
brought in from PostgreSQL. This commit fixes them. Note that I was
not able to suppress some errors, at least with my gcc (9.4.0). This
may be because of a gcc bug (false positives) or just a bug in the old
gcc, I don't know at this point. Maybe revisit this someday.
Discussion:
[pgpool-hackers: 4442] Fixing GitHub issue 42
https://www.pgpool.net/pipermail/pgpool-hackers/2024-March/004443.html
../src/include/query_cache/pool_memqcache.h:251:20: warning: type of 'pool_fetch_from_memory_cache' does not match original declaration [-Wlto-type-mismatch]
251 | extern POOL_STATUS pool_fetch_from_memory_cache(POOL_CONNECTION * frontend,
| ^
query_cache/pool_memqcache.c:731:1: note: 'pool_fetch_from_memory_cache' was previously declared here
731 | pool_fetch_from_memory_cache(POOL_CONNECTION * frontend,
| ^
query_cache/pool_memqcache.c:731:1: note: code may be misoptimized unless '-fno-strict-aliasing' is used
../src/include/utils/palloc.h:64:22: warning: type of 'CurrentMemoryContext' does not match original declaration [-Wlto-type-mismatch]
64 | extern MemoryContext CurrentMemoryContext;
| ^
../../src/utils/mmgr/mcxt.c:40:15: note: 'CurrentMemoryContext' was previously declared here
../../src/utils/mmgr/mcxt.c:40:15: note: code may be misoptimized unless '-fno-strict-aliasing' is used
../src/include/utils/memutils.h:55:22: warning: type of 'TopMemoryContext' does not match original declaration [-Wlto-type-mismatch]
55 | extern MemoryContext TopMemoryContext;
| ^
../../src/utils/mmgr/mcxt.c:46:15: note: 'TopMemoryContext' was previously declared here
../../src/utils/mmgr/mcxt.c:46:15: note: code may be misoptimized unless '-fno-strict-aliasing' is used
../src/include/pool_config.h:646:22: warning: type of 'pool_config' does not match original declaration [-Wlto-type-mismatch]
646 | extern POOL_CONFIG * pool_config;
| ^
config/pool_config.l:46:14: note: 'pool_config' was previously declared here
46 | POOL_CONFIG *pool_config = &g_pool_config; /* for legacy reason pointer to the above struct */
| ^
config/pool_config.l:46:14: note: code may be misoptimized unless '-fno-strict-aliasing' is used
Bo Peng [Mon, 25 Mar 2024 07:10:46 +0000 (16:10 +0900)]
Fixed comments in sample pgpool.conf.
- The comment for sr_check_period. The default value should be 10 seconds.
- Also fixed some typos in comments.
Patch is created by hiroin and modified by Bo Peng.
Takuma Hoshiai [Thu, 21 Mar 2024 13:16:28 +0000 (22:16 +0900)]
Remove a file under config directory.
Remove Makefile.in etc. generated by autoconf.
Create .gitignore under src/config and add generated files by bison and flex.
Tatsuo Ishii [Wed, 20 Mar 2024 22:31:29 +0000 (07:31 +0900)]
Allow reset queries to run even if extended queries do not end.
Commit 240c668d "Guard against inappropriate protocol data." caused
reset queries to fail if extended query messages do not end. This
commit fixes that by checking whether we are running reset queries in
SimpleQuery(). Also add a test case for this.
Takuma Hoshiai [Wed, 20 Mar 2024 23:57:23 +0000 (08:57 +0900)]
Fix a compiler warning
Fix warning introduced in the previous commit.
Takuma Hoshiai [Wed, 20 Mar 2024 16:41:52 +0000 (01:41 +0900)]
Fix memory leak pointed out by Coverity.
Tatsuo Ishii [Mon, 18 Mar 2024 04:50:03 +0000 (13:50 +0900)]
Test: enhance 082.guard_against_bad_protocol script comment.
Tatsuo Ishii [Mon, 18 Mar 2024 01:33:16 +0000 (10:33 +0900)]
Guard against inappropriate protocol data.
If a simple query message arrives before a sequence of extended query
messages ends (that is, no sync message has arrived, or some ready for
query messages corresponding to the sync message have not arrived
yet), pgpool could hang. This is because the query context in the
session context for the simple query is overwritten by the query
contexts of the extended query messages.
This commit implements a guard in SimpleQuery() by checking whether
the extended query protocol messages have ended. If they have not,
raise a FATAL error. A known example detected by this check is the
JDBC driver's "autosave=always" option. This means pgpool will not
accept the option after this commit until the issue (sending a simple
protocol message before ending the extended query message sequence)
is fixed on the JDBC driver side.
Discussion:
[pgpool-hackers: 4427] Guard against ill mannered frontend
https://www.pgpool.net/pipermail/pgpool-hackers/2024-February/004428.html
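One way to picture the "have the extended query messages ended?" condition is a small tracker over protocol message names. This is a minimal sketch under assumptions: the class and its attributes are hypothetical, only the message names (Parse, Bind, Describe, Execute, Close, Flush, Sync, ReadyForQuery) come from the PostgreSQL frontend/backend protocol, and Pgpool-II's real bookkeeping is more involved.

```python
class ExtendedQueryTracker:
    """Tracks whether a sequence of extended query messages has ended."""

    # Frontend messages that start or continue an extended query sequence.
    EXTENDED = {"Parse", "Bind", "Describe", "Execute", "Close", "Flush"}

    def __init__(self):
        self.unsynced = False    # extended messages seen, but no Sync yet
        self.pending_ready = 0   # Syncs whose ReadyForQuery has not arrived

    def on_frontend(self, msg):
        if msg in self.EXTENDED:
            self.unsynced = True
        elif msg == "Sync":
            self.unsynced = False
            self.pending_ready += 1

    def on_backend(self, msg):
        if msg == "ReadyForQuery" and self.pending_ready > 0:
            self.pending_ready -= 1

    def ended(self):
        # Both conditions from the commit message: a Sync has arrived and
        # every ReadyForQuery corresponding to a Sync has arrived too.
        return not self.unsynced and self.pending_ready == 0
```

A simple query would only be accepted while ended() is true. In the sketch, a frontend that sends Parse and then a simple query is caught because no Sync has arrived yet.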
Tatsuo Ishii [Sun, 17 Mar 2024 01:11:04 +0000 (10:11 +0900)]
Enhance the stability of detach_false_primary.
It was possible that enabling detach_false_primary caused all
backend nodes to go down.
Suppose watchdog is enabled and there are 3 watchdog nodes pgpool0,
pgpool1 and pgpool2. If pgpool0 and pgpool1 find primary PostgreSQL
goes down due to network trouble between pgpool and PostgreSQL, they
promote a standby node. pgpool2 could find that there are two primary
nodes because the backend status at pgpool2 has not been synced with
pgpool0 and pgpool1, and pgpool2 performs detach_false_primary against
the standby, which is being promoted.
To prevent the issue, now detach_false_primary is performed only by
the watchdog leader node. With this, pgpool will not see half-baked
backend status and the issue described above will not happen.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2024-February/004432.html
([pgpool-hackers: 4431] detach_false_primary could make all nodes go down)
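The decision described above amounts to a small predicate. This is a hedged sketch: the function and parameter names are hypothetical, and the behavior without watchdog (always perform the check) is an assumption of this model rather than something stated in the commit message.

```python
def should_perform_detach_false_primary(detach_false_primary_enabled,
                                        watchdog_enabled,
                                        is_watchdog_leader):
    """Return True if this pgpool node should run detach_false_primary."""
    if not detach_false_primary_enabled:
        return False
    if not watchdog_enabled:
        # Assumption: with a single pgpool there is no unsynced peer view.
        return True
    # With watchdog, only the leader acts, so a node whose backend status
    # is not yet synced cannot detach a standby that is being promoted.
    return is_watchdog_leader
```

In the three-node scenario above, pgpool2 is not the leader, so it no longer detaches the standby being promoted.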
Tatsuo Ishii [Sun, 17 Mar 2024 00:01:13 +0000 (09:01 +0900)]
Revert "Enhance the stability of detach_false_primary."
This reverts commit c5b25883d21a180ec54a2fea9de67d5da1367464.
This commit accidentally included other updates.
Tatsuo Ishii [Sat, 16 Mar 2024 13:07:17 +0000 (22:07 +0900)]
Enhance the stability of detach_false_primary.
It was possible that enabling detach_false_primary caused all
backend nodes to go down.
Suppose watchdog is enabled and there are 3 watchdog nodes pgpool0,
pgpool1 and pgpool2. If pgpool0 and pgpool1 find primary PostgreSQL
goes down due to network trouble between pgpool and PostgreSQL, they
promote a standby node. pgpool2 could find that there are two primary
nodes because the backend status at pgpool2 has not been synced with
pgpool0 and pgpool1, and pgpool2 performs detach_false_primary against
the standby, which is being promoted.
To prevent the situation, now detach_false_primary is performed only by
the watchdog leader node. With this, pgpool will not see half-baked backend
status and the issue described above will not happen.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2024-February/004432.html
([pgpool-hackers: 4431] detach_false_primary could make all nodes go down)
Bo Peng [Sat, 16 Mar 2024 13:11:23 +0000 (22:11 +0900)]
Test: fix test failure introduced in the previous commit.
Bo Peng [Fri, 15 Mar 2024 04:54:59 +0000 (13:54 +0900)]
Test: fixed regression test 005.jdbc to avoid Java 21 deprecation warnings.
Replace Runtime.exec(String) with Runtime.exec(String[]) to avoid Java 21 deprecation warnings.
Patch is created by Vladimir Petko and modified by Bo Peng.
Bo Peng [Thu, 29 Feb 2024 05:13:13 +0000 (14:13 +0900)]
Doc: update copyright.
Bo Peng [Tue, 27 Feb 2024 02:28:25 +0000 (11:28 +0900)]
Doc: add release notes.
Bo Peng [Tue, 27 Feb 2024 00:55:28 +0000 (09:55 +0900)]
Remove restore_command from sample scripts.
restore_command is not required because a replication slot is enabled.
It causes occasional failover failures.
Bo Peng [Mon, 26 Feb 2024 10:43:10 +0000 (19:43 +0900)]
Fix the default values.
Fixed the default values of the following parameters:
- recovery_user
- failover_on_backend_shutdown
- insert_lock
Tatsuo Ishii [Mon, 26 Feb 2024 07:05:31 +0000 (16:05 +0900)]
Doc: fix Japanese watchdog document.
It mistakenly used "master" watchdog node instead of "leader".
Tatsuo Ishii [Mon, 26 Feb 2024 07:01:27 +0000 (16:01 +0900)]
Doc: fix English watchdog document.
It mistakenly used "main" PostgreSQL node instead of "primary".
Tatsuo Ishii [Thu, 22 Feb 2024 11:35:36 +0000 (20:35 +0900)]
Fix to use forward declaration of a variable.
It is required by our coding standard (we follow PostgreSQL's coding
standard). Also fix small typo.
Tatsuo Ishii [Sat, 10 Feb 2024 02:50:28 +0000 (11:50 +0900)]
Fix statement_level_load_balance with BEGIN etc.
When statement_level_load_balance is enabled,
BEGIN/END/COMMIT/ABORT/SET/SAVEPOINT/RELEASE SAVEPOINT/DEALLOCATE
ALL/DISCARD were sent to primary node and all standby nodes even if
load_balance_mode is off. This is not only plain wrong but also caused a
slowdown if one of the standby nodes is in a remote network. Fix this so
that pgpool sends such queries only to the primary node when
load_balance_mode is off.
Note that if load_balance_mode is on and statement_level_load_balance
is on, such queries are sent to all nodes as before. This is
necessary. For example, suppose there are 2 PostgreSQL nodes 0 and
1. An explicit transaction starts followed by two read only
SELECTs. The first SELECT is sent to node 0 because the node 0 is
chosen as the load balance node. The second SELECT is sent to node 1
because the node 1 is chosen as the load balance node. If pgpool has
not sent BEGIN to both node 0 and 1 when the transaction started, the
first or the second SELECT will be executed outside the transaction,
which is not the expected behavior. However, this may bring the slowdown
mentioned above. I guess this has been little known to users, so I
decided to add some notes to the statement_level_load_balance doc.
Reported: [pgpool-general: 8998] https://www.pgpool.net/pipermail/pgpool-general/2024-January/009059.html
Discussion: [pgpool-hackers: 4422] https://www.pgpool.net/pipermail/pgpool-hackers/2024-February/004423.html
Backpatch-through: v4.1
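The two cases described in this commit can be modeled as follows. This is a simplified sketch that assumes statement_level_load_balance is on. The function name and node representation are hypothetical, and the case where statement_level_load_balance is off is deliberately not covered.

```python
TRANSACTION_COMMANDS = {
    "BEGIN", "END", "COMMIT", "ABORT", "SET", "SAVEPOINT",
    "RELEASE SAVEPOINT", "DEALLOCATE ALL", "DISCARD",
}


def targets_with_statement_level_lb(command, load_balance_mode,
                                    primary, standbys):
    """Nodes that receive `command` when statement_level_load_balance is on."""
    if command not in TRANSACTION_COMMANDS:
        raise ValueError("this sketch only covers the commands in the commit")
    if not load_balance_mode:
        # After the fix: no load balancing, so only the primary is involved.
        return [primary]
    # Load balancing may pick a different node for each statement, so every
    # node must have seen BEGIN etc. to keep the SELECTs inside the
    # transaction.
    return [primary] + list(standbys)
```

With primary node 0 and standby node 1, BEGIN goes to node 0 only when load_balance_mode is off, and to nodes 0 and 1 when it is on.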
Bo Peng [Tue, 9 Jan 2024 03:11:48 +0000 (12:11 +0900)]
Test: now it will be displayed in the log if segfault occurs.
Tatsuo Ishii [Mon, 25 Dec 2023 08:10:12 +0000 (17:10 +0900)]
Test: enhance 037.failover_session/test.sh.
Previously it mistakenly executed pg_ctl stop after starting pgbench
in background. The smart shutdown always allows pgbench to run
successfully when the pgbench -C option is not set, because the backend
does not shut down while a client session continues. In order to shut down
the backend in the middle of a pgbench run, tweak health check parameters so
that it detects the backend down as soon as possible. This will
trigger failover in the middle of the pgbench run.
With these changes Pgpool-II 4.5 and beyond succeed in all 4 tests,
while pre-4.5 will fail in all 4 tests (that was the originally
expected result).
Also allow all 4 tests to run even if some tests fail so that we
can check which tests failed.
Bo Peng [Thu, 21 Dec 2023 09:17:51 +0000 (18:17 +0900)]
Doc: fix delay_threshold_by_time document mistake.
Millisecond is correct.
Bo Peng [Tue, 12 Dec 2023 04:30:55 +0000 (13:30 +0900)]
Start 4.6 development.
Bo Peng [Mon, 11 Dec 2023 07:52:10 +0000 (16:52 +0900)]
Doc: update 4.5 release note.
Tatsuo Ishii [Fri, 8 Dec 2023 07:27:57 +0000 (16:27 +0900)]
Fix 4.5 release note.
Description of multi-statement was not accurate and could cause misunderstanding.
Also mention that load balance for PREPARE/EXECUTE/DEALLOCATE is now possible.
Tatsuo Ishii [Thu, 7 Dec 2023 21:03:40 +0000 (06:03 +0900)]
Remove duplicate definition of TransactionId.
Since commit ca300f839, the following is defined in both
src/include/parser/pg_list.h and src/include/parser/primnodes.h:
typedef uint32 TransactionId;
This is harmless with modern OS/compilers, but a user of old RHEL5 reported
that this results in a compile error:
https://www.pgpool.net/pipermail/pgpool-general/2023-December/009040.html
So remove the definition from primnodes.h (removing it from pg_list.h
causes another compile error).
Back patched to V4_5_STABLE.
Bo Peng [Wed, 29 Nov 2023 06:56:48 +0000 (15:56 +0900)]
Doc: add release notes of 4.0.25-4.4.5.
Bo Peng [Mon, 27 Nov 2023 05:19:18 +0000 (14:19 +0900)]
Doc: update Installation document to mention that from Pgpool-II 4.5 it is required to run "autoreconf -fi" first to generate configure file.
Bo Peng [Mon, 27 Nov 2023 02:49:14 +0000 (11:49 +0900)]
Doc: Add Japanese 4.5 release note.
Bo Peng [Mon, 27 Nov 2023 02:34:57 +0000 (11:34 +0900)]
Doc: update Configuration Example "8.2. Pgpool-II + Watchdog Setup Example" to Pgpool-II 4.5 and PostgreSQL 16.
Bo Peng [Fri, 17 Nov 2023 03:46:54 +0000 (12:46 +0900)]
Modify the replication slot name conversion in sample scripts to add support for uppercase hostname.
Patch is created by Sheikh Wasiu Al Hasib and modified by Bo Peng.
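PostgreSQL replication slot names may contain only lowercase letters, digits and underscores, which is why a hostname with uppercase letters needs conversion. The sample scripts are shell scripts, so the following Python function is only an illustrative equivalent of such a conversion, not the scripts' actual code. The function name is hypothetical.

```python
import re


def replication_slot_name(hostname):
    """Derive a valid replication slot name from a hostname."""
    # Lowercase first so that uppercase hostnames are supported, then
    # replace every character that is not allowed in a slot name.
    return re.sub(r"[^a-z0-9_]", "_", hostname.lower())
```

For example, a host named "Server1.Example-DB" maps to the slot name "server1_example_db".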
Bo Peng [Mon, 13 Nov 2023 13:26:21 +0000 (22:26 +0900)]
Update PostgreSQL version to PostgreSQL 16 in sample scripts.
Bo Peng [Mon, 13 Nov 2023 00:57:30 +0000 (09:57 +0900)]
Doc: update English Configuration Example "8.2. Pgpool-II + Watchdog Setup Example" to Pgpool-II 4.5 and PostgreSQL 16.
Several enhancements are also added.
Bo Peng [Fri, 10 Nov 2023 08:18:33 +0000 (17:18 +0900)]
Doc: update Pgpool-II version and PostgreSQL version in installation section.
Bo Peng [Fri, 10 Nov 2023 07:47:19 +0000 (16:47 +0900)]
Enable AM_MAINTAINER_MODE on master branch.
Bo Peng [Fri, 10 Nov 2023 07:45:04 +0000 (16:45 +0900)]
Disable AM_MAINTAINER_MODE.
Bo Peng [Thu, 9 Nov 2023 13:21:59 +0000 (22:21 +0900)]
Doc: add 4.5 English release notes.
Bo Peng [Thu, 9 Nov 2023 13:15:08 +0000 (22:15 +0900)]
Update Copyright of the previous commit (4bfca73c6788cee498d74e938fa38c38b9abb6a2).
Bo Peng [Thu, 9 Nov 2023 12:52:43 +0000 (21:52 +0900)]
Downgrading some normal ERROR messages to DEBUG messages.
The following ERROR messages are downgraded to DEBUG messages.
(1) ERROR:unable to flush data to frontend
(2) ERROR: unable to read data from frontend
DETAIL: EOF encountered with frontend
(3) ERROR: unable to read data
DETAIL: child connection forced to terminate due to client_idle_limit:30 is reached
(1) and (2)
These messages are caused when the client does not send a terminate message
before disconnecting from pgpool.
For example, the error occurs when the client process is forcefully terminated.
Although they are harmless, they can sometimes confuse users.
(3)
If we set "client_idle_limit" to a non-zero value, the connection
will be disconnected if it remains idle since the last query.
The disconnection is caused by Pgpool-II settings,
but Pgpool-II handles the log message as an "ERROR".
Because the ERROR messages above are normal messages, I decided to downgrade them.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2023-June/004351.html