Age | Commit message | Author |
|
Commit 182b65bfc made it possible to use multiple Unix socket
directories: /tmp and /var/run/postgresql. However, if
/var/run/postgresql is not accessible on the system, pgpool_setup
fails unless $PGSOCKET_DIR is explicitly set. Instead of failing, this
commit makes pgpool_setup skip inaccessible directories.
Backpatch-through: v4.5
|
|
Author: Bo Peng
Tested-by: Taiki Koshino
Backpatch-through: v4.5
|
|
The streaming replication check and health check processes forgot to
reopen pool_passwd upon reload. If sr_check_password or
health_check_password is an empty string, the password is obtained
from pool_passwd, so those processes read outdated content of
pool_passwd after a reload.
Backpatch-through: v4.2
|
|
|
|
|
|
If pg_rewind fails, the safest way for users is to recover manually.
|
|
The process started to call
get_pg_backend_status_from_leader_wd_node(), which unconditionally
emits these log messages:
LOG: received the get data request from local pgpool-II on IPC interface
LOG: get data request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node
every sr_check_period seconds, which is annoying. To fix this, an elog
call in process_IPC_data_request_from_leader() is downgraded from LOG
to DEBUG1.
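A sketch of the change, assuming pgpool's PostgreSQL-style ereport()
API (the actual call site may differ):

    /* before: emitted at LOG level on every sr_check_period cycle */
    ereport(LOG,
            (errmsg("received the get data request from local pgpool-II on IPC interface")));

    /* after: only emitted when log_min_messages is DEBUG1 or lower */
    ereport(DEBUG1,
            (errmsg("received the get data request from local pgpool-II on IPC interface")));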
Reported-by: Bo Peng.
|
|
The following error message was recorded every wd_heartbeat_deadtime
seconds since 65dbbe7a0 was committed:
2025-02-10 10:50:37.990: heart_beat_receiver pid 1060625: ERROR: failed to get socket data from heartbeat receive socket list
2025-02-10 10:50:37.990: heart_beat_receiver pid 1060625: DETAIL: select() got timeout, exceed 30 sec(s)
The heartbeat receiver waited in select(2) for a heartbeat packet to
arrive, timing out after wd_heartbeat_deadtime. I believe this logic
is wrong: it should wait forever until the packet arrives. In v4.5 or
earlier, the heartbeat receiver waited in recvfrom() without a
timeout. So give NULL as select()'s timeout parameter so that it waits
forever. Since 65dbbe7a0 is only in the master branch, no backpatch is
made.
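A minimal sketch of the idea (not the actual watchdog code):

    #include <sys/select.h>

    static int
    wait_for_heartbeat(int sock)
    {
        fd_set      rfds;

        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);

        /* NULL timeout: block until a heartbeat packet arrives, matching
         * the pre-65dbbe7a0 behavior of waiting in recvfrom() forever */
        return select(sock + 1, &rfds, NULL, NULL, NULL);
    }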
Reported by: Peng Bo
|
|
This commit includes:
- update sample scripts to PostgreSQL 17
- remove archive settings to disable archive mode
|
|
Patch created by Umar Hayat.
|
|
Occasionally the test failed due to:
ERROR: relation "t2" does not exist
LINE 1: SELECT i, 'QUERY ID T1-1' FROM t2;
It seems the cause is that the newly created table t2 takes some time
to get replicated to the standby. So insert "sleep 1" after the table
creation.
Backpatch-through: v4.2
|
|
Previously pool_signal() did not set the SA_RESTART flag, so system
calls interrupted by a signal were not restarted. Some of our code is
prepared to restart an interrupted system call, but it is not certain
that all call sites are. So add the flag. Note that PostgreSQL always
uses the flag.
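A sketch of the kind of wrapper involved (names and structure are
illustrative, not the actual pool_signal code):

    #include <signal.h>

    typedef void (*pool_sighandler_t) (int);

    static pool_sighandler_t
    pool_signal_sketch(int signo, pool_sighandler_t handler)
    {
        struct sigaction act,
                    oact;

        act.sa_handler = handler;
        sigemptyset(&act.sa_mask);
        act.sa_flags = SA_RESTART;      /* the flag this commit adds */

        if (sigaction(signo, &act, &oact) < 0)
            return SIG_ERR;
        return oact.sa_handler;
    }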
|
|
|
|
warning: ‘delete_all_cache_on_memcached’ declared ‘static’ but never defined [-Wunused-function]
|
|
|
|
Allow some logging collector related parameters to be changed by
reloading the Pgpool-II configurations.
The following logging_collector related parameters can now be changed
by reloading (an example follows the list):
- log_truncate_on_rotation
- log_directory
- log_filename
- log_rotation_age
- log_rotation_size
- log_file_mode
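For example (parameter values and the reload invocation are
illustrative):

    # pgpool.conf: these can now take effect without a restart
    log_directory = '/var/log/pgpool'
    log_filename = 'pgpool-%Y-%m-%d.log'
    log_rotation_age = 1d

    # apply with a reload, e.g.:
    #   pgpool reload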
|
|
If the query cache is enabled, a query is executed in extended query
mode, and pgpool is running in streaming replication mode, an execute
message could return incorrect results.
This could happen when an execute message arrives with a non-zero row
limit parameter. In that case the backend fetches up to the specified
number of rows and returns a PortalSuspended message, and Pgpool-II
does not create a query cache entry. But if another execute message
with a row limit of 0 comes in, it fetches the rest of the rows (if
any) and creates a query cache entry containing only the rows fetched
by that execute message.
Obviously this causes unwanted results later on: a subsequent execute
message returns a result from a query cache entry that holds only part
of the full result set.
Another problem arises when multiple execute messages are sent
consecutively. In that case Pgpool-II returned exactly the same
results from the query cache for each execute message. This is wrong,
since the second and subsequent executes should return 0 rows.
To fix this, new boolean fields "atEnd" and "partial_fetch" are
introduced in the query context. They are initialized to false when a
query context is created (and also when a bind message is received).
If an execute message with a row limit of 0 is executed, atEnd is set
to true upon receiving the CommandComplete message. If an execute
message with a non-zero row limit is executed, partial_fetch is set to
true, and pgpool neither uses the cached result nor creates a query
cache entry.
When atEnd is true, pgpool returns a CommandComplete message with
"SELECT 0" as the result of the execute message.
Also, tests for this case are added to the 006.memqcache regression
test.
Backpatch-through: v4.2
Discussion: [pgpool-hackers: 4547] Bug in query cache
https://www.pgpool.net/pipermail/pgpool-hackers/2024-December/004548.html
|
|
Fix a problem that occurred when a file other than the default was
specified in the pool_passwd parameter.
This issue was reported by Sadhuprasad Patro.
|
|
Commit 4dd7371c2 added test cases. The SQL syntax used in the tests
was not compatible with PostgreSQL 15 or earlier.
Backpatch-through: v4.2
|
|
When the query cache is enabled and an execute message is sent from
the frontend, pgpool injects query cache data into the backend message
buffer if query cache data is available. inject_cached_message() is
responsible for this task. But it had an oversight when the message
stream from the frontend includes more than one set of bind or
describe messages before a sync message. It tried to determine the end
of the frontend messages by finding a bind complete or a row
description message from the backend. But in this case it is possible
that these messages do not indicate the end of the message stream,
because one more bind complete or row description message follows. As
a result the cached message was inserted at an inappropriate position
and pgpool mistakenly raised a "kind mismatch" error.
This commit changes the algorithm used to detect the end of the
message stream: compare the number of messages from the backend with
the pending message queue length. When a message is read from the
backend, a counter is incremented if the message is one of parse
complete, bind complete, close complete, command complete, portal
suspended or row description; for other message types the counter is
not incremented. If the counter reaches the pending message queue
length, we are at the end of the message stream and the cached
messages are injected.
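The counted message kinds, as a sketch (kind bytes per the
frontend/backend protocol; the function name is illustrative):

    static bool
    counts_toward_pending_queue(char kind)
    {
        switch (kind)
        {
            case '1':           /* ParseComplete */
            case '2':           /* BindComplete */
            case '3':           /* CloseComplete */
            case 'C':           /* CommandComplete */
            case 's':           /* PortalSuspended */
            case 'T':           /* RowDescription */
                return true;
            default:
                return false;
        }
    }

    /* reading loop: once the count of such messages reaches the pending
     * message queue length, the stream end is reached and the cached
     * messages can be injected */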
Test cases for 006.memqcache are added.
Backpatch-through: v4.2.
|
|
Sometimes we see regression errors like:
2024-12-01 13:55:55.508: watchdog pid 27340: FATAL: failed to create watchdog receive socket
2024-12-01 13:55:55.508: watchdog pid 27340: DETAIL: bind on "TCP:50002" failed with reason: "Address already in use"
Before starting each regression test, we use the "clean_all" script to
kill all remaining processes. I suspect that this is not enough to
release bound ports, so I added a netstat command to check whether
some ports remain bound.
For now this commit is applied to the master branch only.
|
|
Commit 6d4106f9c forgot to add the pgproto data that is necessary for
the test.
|
|
When enabled, log protocol messages from each backend. Possible
options are "none", "terse" and "verbose". "none" disables the feature
and is the default. "verbose" prints a log line each time pgpool
receives a message from a backend. "terse" is similar to verbose
except that it does not print logs for repeated messages of the same
kind, to save log lines; when a different kind of message is received,
pgpool prints a log message that includes how many times the previous
message was repeated. One downside of "terse" is that repeated
messages will not be reported if the pgpool child process is killed
before a different kind of message arrives.
For testing, 039.log_backend_messages is added.
Discussion: [pgpool-hackers: 4535] New feature: log_backend_messages
https://www.pgpool.net/pipermail/pgpool-hackers/2024-November/004536.html
|
|
In the client-side implementation of SSL negotiation
(pool_ssl_negotiate_clientserver()), it was possible for a
man-in-the-middle attacker to send a long error message to confuse
Pgpool-II or the client during the SSL negotiation phase. This commit
rejects the negotiation immediately (issuing a FATAL error) and exits
the session to prevent such an attack.
This resembles PostgreSQL's CVE-2024-10977.
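A sketch of the hardening (illustrative fragment, not the actual
pool_ssl_negotiate_clientserver() code):

    /* The reply to an SSLRequest must be a single byte, 'S' or 'N'.
     * Anything else -- e.g. an attacker-injected error message -- now
     * aborts the session at once instead of being parsed or displayed. */
    char        response;

    if (read(fd, &response, 1) != 1 || (response != 'S' && response != 'N'))
        ereport(FATAL,
                (errmsg("failed to negotiate SSL: unexpected response")));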
Backpatch-through: v4.1
|
|
In the test we check the error message produced when the target
certificate is revoked. Unfortunately the error message from OpenSSL
changed between v3.0 and v3.2.
v3.0 or before: "sslv3 alert certificate revoked"
v3.2: "ssl/tls alert certificate revoked"
So the fix is to check only the "alert certificate revoked" part.
|
|
The reload_config() function in Pgpool-II should send a SIGHUP signal to the watchdog process.
|
|
Renew cert.sh using examples in PostgreSQL docs.
|
|
This reverts commit dd5a79aef8081bea74f9be7c4beb54ef34637ec9.
The attempt to fix the 024.cert_auth regression test failure on
RockyLinux9 was not successful.
|
|
Starting from Thu, 21 Nov 2024 16:11:06 +0900, the buildfarm's
024.cert_auth test has been failing on RockyLinux9 regardless of the
Pgpool-II or PostgreSQL version. It seems that at that time the test
platform was updated from RockyLinux9.4 to RockyLinux9.5, and the
openssl version was updated from 3.0 to 3.2 as well. The test first
revokes the frontend certificate using openssl ca -revoke, and then
generates a separate CRL file using the openssl ca -gencrl command. I
suspect that openssl 3.2 now checks the revoked certificate itself and
decides that it is not valid.
Let's see how the buildfarm reacts.
|
|
Upon receiving a DataRow packet, pgpool converts the number of fields
from network byte order to host byte order. Unfortunately it used
htons() for this purpose instead of ntohs(). This is simply wrong.
Similarly, it used htonl() instead of ntohl() while converting the
data length from network byte order to host byte order. This is wrong
too. Fortunately, ntohs()/htons() and ntohl()/htonl() swap the bytes
in exactly the same way and produce the same result (i.e.
htonl(data_len) == ntohl(data_len)), so the bug does not actually hurt
anything.
However, a bug is a bug. This commit fixes them.
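For illustration, the correct direction of conversion (a simplified
fragment; the real DataRow parsing code differs):

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>

    static void
    convert_datarow_fields(const char *field_count_pos, const char *data_len_pos,
                           uint16_t *nfields, uint32_t *data_len)
    {
        memcpy(nfields, field_count_pos, sizeof(*nfields));
        *nfields = ntohs(*nfields);     /* network -> host; was htons() */

        memcpy(data_len, data_len_pos, sizeof(*data_len));
        *data_len = ntohl(*data_len);   /* network -> host; was htonl() */
    }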
Backpatch-through: v4.1
|
|
If pool_hba.conf is disabled, an update of pool_passwd was not
recognized by the pgpool child processes even if a pgpool reload was
performed. The reload processing function check_config_reload()
mistakenly assumed that reopening pool_passwd was only necessary when
enable_pool_hba is on.
Backpatch-through: v4.1
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2024-November/001944.html
|
|
This is a follow-up to commit
ab091663b09ef8c2d0a1841921597948c597444e.
It adds a test case using pgproto to the existing 076.copy_hang test.
Backpatch-through: v4.1
|
|
During the COPY IN state (i.e. COPY FROM STDIN), the frontend can send
Flush or Sync messages. According to the F/B protocol specification
they should be ignored, but Pgpool-II treated them as invalid
messages, which caused COPY to hang.
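A sketch of the message handling in the COPY IN loop (kind bytes per
the protocol; the structure is illustrative):

    switch (kind)
    {
        case 'd':               /* CopyData: forward to backend */
        case 'c':               /* CopyDone */
        case 'f':               /* CopyFail */
            /* handled as before */
            break;

        case 'H':               /* Flush */
        case 'S':               /* Sync */
            /* per the F/B protocol, simply ignore during COPY IN;
             * previously these were treated as invalid and COPY hung */
            break;

        default:
            /* genuinely invalid message in COPY state */
            break;
    }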
Discussion: https://github.com/pgpool/pgpool2/issues/79
Backpatch-through: v4.1
|
|
It is reported that a pgpool child process crashes during shutdown.
[pgpool-general: 9261] Re: Segmentation fault during shutdown
The actual crash was in close_all_backend_connections(), which was
called because on_system_exit registers child_will_go_down(). At that
moment the pgpool child process had apparently just started up and was
still in pool_init_cp(); the connection pool object had not been
completely initialized, which is the cause of the crash.
To fix this, introduce a new static variable in child.c and set it to
true once the connection pool object is initialized.
child_will_go_down() checks it and calls
close_all_backend_connections() only when the variable is set to true.
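A sketch of the guard (the variable name and callback signature are
illustrative):

    static bool pool_initialized = false;

    /* set in the child main loop once pool_init_cp() has completed:
     *     pool_initialized = true;
     */

    static void
    child_will_go_down(int code, Datum arg)
    {
        /* only safe after the connection pool object is fully set up */
        if (pool_initialized)
            close_all_backend_connections();
    }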
Problem reported and analyzed by: Emond Papegaaij
Backpatch-through: v4.2
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2024-November/001938.html
|
|
This reverts commit 25ad9e6d50343e2cbd4dc337803d231c92141021.
Per discussion: [pgpool-general: 9265] Re: Segmentation fault during shutdown
https://www.pgpool.net/pipermail/pgpool-general/2024-November/001942.html
|
|
It is reported that a pgpool child process crashes during shutdown.
[pgpool-general: 9261] Re: Segmentation fault during shutdown
The actual crash was in close_all_backend_connections(), which was
called because on_system_exit registers child_will_go_down(). At that
moment the pgpool child process had apparently just started up and was
still in pool_init_cp(); the connection pool object had not been
completely initialized, which is the cause of the crash.
To fix this, just remove the call to close_all_backend_connections()
in child_will_go_down(). Although this prevents the terminate message
('X') from being sent to the backend, it should be harmless, since the
backend can handle such a disconnection without a terminate message.
Problem reported and analyzed by: Emond Papegaaij
Backpatch-through: v4.2
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2024-November/001938.html
|
|
There were unnecessarily exported functions. This commit fixes them.
It also fixes indentation that did not follow our standards.
Since this is not a bug fix, it is applied to the master branch only.
|
|
It is reported that the health check process fails due to
authentication failures.
[pgpool-general: 9236] Sporadic health check failures due to authentication failure
https://www.pgpool.net/pipermail/pgpool-general/2024-October/001913.html
When health_check_password is an empty string, the health check
process looks up the password in the pool_passwd file. The problem is
that the file descriptor for the file is inherited from the parent
pgpool process. This means that the pgpool main process and the health
check processes (possibly multiple processes) share the same
descriptor, which causes various problems including the issue reported
here. To fix the problem, re-open the file when a health check process
starts so that each health check process owns its own file descriptor.
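The idea, as a hypothetical sketch (not the actual pool_passwd API):

    #include <stdio.h>

    static FILE *pool_passwd_fp = NULL;

    /* called at health check process startup: open a process-private
     * descriptor instead of inheriting (and sharing) the parent's, so
     * concurrent reads no longer disturb a shared file offset */
    static void
    reopen_pool_passwd(const char *path)
    {
        if (pool_passwd_fp != NULL)
            fclose(pool_passwd_fp);
        pool_passwd_fp = fopen(path, "r");
    }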
Note that the pgpool child processes (responsible for frontend
sessions) already re-open the file descriptor, so they are not related
to the issue.
Problem reported and analyzed by Emond Papegaaij.
Backpatch-through: v4.1
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2024-October/001913.html
|
|
|
|
Commit 6b7d585eb1c693e4ffb5b8e6ed9aa0f067fa1b89 invalidates the query
cache if any ALTER ROLE/USER statement is used. Actually this is
overkill, because the following commands do not affect the privileges
of the role:
- ALTER ROLE user WITH [ENCRYPTED] PASSWORD
- ALTER ROLE user WITH CONNECTION LIMIT
So do not invalidate the query cache when those commands are used.
Backpatch-through: v4.1
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2024-October/004532.html
|
|
The new PGPOOL SET command allows deleting a query cache entry by
specifying the query that was used to create it. Example usage:
PGPOOL SET CACHE DELETE 'SELECT * FROM t1;'
This command is particularly useful for queries that are not
invalidated by the auto cache invalidation feature because the query
does not reference any tables.
|
|
Major changes of PostgreSQL 17 parser include:
- Allow MERGE to use NOT MATCHED BY SOURCE and RETURNING clause:
MERGE INTO ... WHEN NOT MATCHED BY SOURCE ...
MERGE INTO ... RETURNING ...
- Add new COPY options ON_ERROR ignore and LOG_VERBOSITY:
COPY ... WITH (ON_ERROR ignore);
COPY ... WITH (LOG_VERBOSITY verbose);
- Allow '*' to specify the COPY FROM options FORCE_NOT_NULL and FORCE_NULL for all columns.
COPY ... WITH (FORCE_NOT_NULL *);
COPY ... WITH (FORCE_NULL *);
- Add EXPLAIN options SERIALIZE and MEMORY
EXPLAIN (MEMORY) ...
EXPLAIN (ANALYZE, SERIALIZE ...) ...
- Allow ALTER TABLE to use SET STATISTICS DEFAULT to set a column to the default statistics target
ALTER TABLE ... ALTER COLUMN ... SET STATISTICS DEFAULT;
- Allow ALTER TABLE to change a column's generation expression
ALTER TABLE ... ALTER COLUMN ... SET EXPRESSION;
- Add DEFAULT setting for ALTER TABLE .. SET ACCESS METHOD
ALTER TABLE ... SET ACCESS METHOD DEFAULT;
- Allow event triggers to use login event:
CREATE EVENT TRIGGER ... ON login ...
- Add event trigger support for REINDEX.
|
|
Buildfarm reported a 006.memqcache failure. It was caused by a mistake
in the test script (test.sh). The script executes
pcp_invalidate_query_cache and then compares the result of a query
calling current_timestamp, which is already in the query cache (via
the /*FORCE QUERY CACHE*/ comment), with the previously cached result.
Since pcp_invalidate_query_cache only places an invalidation request,
which the next query processes, comparing the result right after
executing "SELECT current_timestamp" with the previous cached result
indeed shows equality, and the test failed. To fix this, execute a
different query right after pcp_invalidate_query_cache.
I also found that the test not only fails, but sometimes causes a
timeout in my local environment. Inspecting the remaining child
process showed that the SIGINT handler was likely not executed (the
variable exit_request was not set). I suspect this is because
pool_clear_memory_cache(), which is responsible for actually clearing
the query cache, blocks all signals including SIGINT; I think this is
why the signal handler for SIGINT was not executed. Since
pool_clear_memory_cache() already uses pool_shmem_lock() to protect
the operation on the query cache, the signal blocking is not
necessary. In this commit I simply removed the calls to POOL_SETMASK2
and POOL_SETMASK.
|
|
|
|
Previously it was not possible to invalidate the query cache without
restarting pgpool. This commit adds a new PCP command
"pcp_invalidate_query_cache" to invalidate the query cache without a
restart. Note that this command only places a query cache invalidation
request on shared memory; the actual invalidation is performed by a
pgpool child process.
The reasons the PCP process cannot remove the cache directly are:
1) the connection handle to the memcached server is not managed by the
PCP process.
2) removing the shared memory query cache needs an interlock using
pool_shmem_lock(), which may not work well in the PCP process. Also, a
function used here (pool_clear_memory_cache()) uses PG_TRY, which is
only usable in pgpool child processes.
When a pgpool child process finds such a request, it invalidates all
query cache entries on the shared memory. If the query cache storage
is memcached, pgpool issues memcached_flush() so that all query cache
entries on memcached are flushed immediately.
Note that a pgpool child process checks for the invalidation request
after processing the current query or a response from the backend.
This means that if all pgpool child processes sit idle, the request
will not be processed until one of them receives a message from either
frontend or backend.
Another note concerns the query cache statistics shown by the "show
pool_cache" command. Since the cache invalidation does not clear the
statistics, some of them (num_cache_hits and num_selects) continue to
increase even after the cache invalidation. Initializing the
statistics at the same time would be possible, but I am not sure
whether all users want that.
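Typical usage would follow the other PCP commands (the connection
options shown are the standard PCP ones and are assumed here):

    pcp_invalidate_query_cache -h localhost -p 9898 -U pgpool -w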
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2024-October/004525.html
|
|
|
|
Recognize the /*FORCE QUERY CACHE*/ SQL statement comment so that any
read-only SELECT/WITH query is cached. This is the opposite of the
/*NO QUERY CACHE*/ comment. This feature should be used carefully; see
the manual for more details.
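For example (the same form is used by the 006.memqcache test mentioned
earlier in this log):

    /*FORCE QUERY CACHE*/ SELECT current_timestamp;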
Discussion: https://github.com/pgpool/pgpool2/issues/56
|
|
When a pgpool child process exits, close_all_backend_connections() is
called; it is responsible for closing all connections to the backends
in the connection pool. It mistakenly used the MAIN_CONNECTION macro,
which is fine for the current active connection but not for pooled
connections, because the main node could have been different at the
time a pooled connection was created. The fix is to use
in_use_backend() instead.
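A sketch of the change (the loop shape and helper calls are
illustrative; in_use_backend() is the helper named above, with an
assumed signature):

    /* close every pooled slot that was in use when the pool entry was
     * created, instead of only the node MAIN_CONNECTION resolves to now */
    for (i = 0; i < NUM_BACKENDS; i++)
    {
        if (in_use_backend(pool_entry, i))
            close_backend_connection(pool_entry, i);
    }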
Reported-by: Emond Papegaaij
Backpatch-through: v4.2
|
|
These leaks were brought in by commit 6fdba5c33 "Use psprintf()
instead of snprintf()". Since that commit was backpatched through
v4.1, this fix needs to be backpatched through v4.1 too.
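The pattern, for illustration: psprintf() returns palloc'd memory that
the caller must release (the helper names here are illustrative):

    char   *buf = psprintf("backend %d: %s", node_id, status);

    emit_log(buf);
    pfree(buf);         /* the release the leaky call sites were missing */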
Per Coverity (CID 1559726).
Backpatch-through: v4.1
|
|
Fix "insecure data handling".
Per Coverity (CID 1559731)
|