If query cache is enabled, a query is run in extended query mode, and
pgpool is running in streaming replication mode, an execute message
could return incorrect results.
This could happen when an execute message comes with a non-zero row
limit parameter. In this case pgpool fetches up to the specified
number of rows and returns a "PortalSuspended" message. Pgpool-II does
not create a query cache entry for this. But if another execute
message with a row limit of 0 comes in, it fetches the rest of the
rows (if any) and creates a query cache entry with only the rows which
that execute message fetched.
Obviously this causes unwanted results later on: another execute
message returns a result from the query cache which holds only the
rows captured after the previous execute message with its limited row
count.
Another problem arises when multiple execute messages are sent
consecutively. In this case Pgpool-II returned exactly the same
results from the query cache for each execute message. This is wrong,
since the second and subsequent executes should return 0 rows.
To fix this, new boolean fields "atEnd" and "partial_fetch" are
introduced in the query context. They are initialized to false when a
query context is created (and also when a bind message is
received). If an execute message with a row limit of 0 is executed,
atEnd is set to true upon receiving the CommandComplete message. If an
execute message with a non-zero row limit is executed, partial_fetch
is set to true, and pgpool neither uses the cached result nor creates
a query cache entry.
When atEnd is true, pgpool returns a CommandComplete message with
"SELECT 0" as the result of the execute message.
Tests for this case are also added to the 006.memqcache regression
test.
Backpatch-through: v4.2
Discussion: [pgpool-hackers: 4547] Bug in query cache
https://www.pgpool.net/pipermail/pgpool-hackers/2024-December/004548.html
|
|
Commit 6b7d585eb1c693e4ffb5b8e6ed9aa0f067fa1b89 invalidates the query
cache if any ALTER ROLE/USER statement is used. Actually this is
overkill, because the following statements do not affect the
privileges of the role:
- ALTER ROLE user WITH [ENCRYPTED] PASSWORD
- ALTER ROLE user WITH CONNECTION LIMIT
So do not invalidate the query cache if those commands are used.
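A minimal sketch of the kind of check this implies, assuming
PostgreSQL's parse tree types (AlterRoleStmt, DefElem), which pgpool
imports; the function itself is hypothetical:

    static bool
    alter_role_needs_cache_invalidation(AlterRoleStmt *stmt)
    {
        ListCell   *lc;

        foreach(lc, stmt->options)
        {
            DefElem    *opt = (DefElem *) lfirst(lc);

            /* PASSWORD and CONNECTION LIMIT do not change privileges */
            if (strcmp(opt->defname, "password") != 0 &&
                strcmp(opt->defname, "connectionlimit") != 0)
                return true;
        }
        return false;
    }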
Backpatch-through: v4.1
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2024-October/004532.html
|
|
Previously it was not possible to invalidate query cache without
restarting pgpool. This commit adds a new PCP command
"pcp_invalidate_query_cache" to invalidate query cache without
restarting pgpool. Note this command only places a query cache
invalidation request on the shared memory. The actual invalidation is
performed by a pgpool child process.
The reasons why the PCP process cannot remove the cache directly are:
1) the connection handle to the memcached server is not managed by the
PCP process.
2) removing the shared memory query cache needs an interlock using
pool_shmem_lock(), which may not work well in the PCP process. Also a
function used here (pool_clear_memory_cache()) uses PG_TRY, which is
only usable in a pgpool child process.
If a pgpool child process finds such a request, the process
invalidates all query cache entries on the shared memory. If the query
cache storage is memcached, then pgpool issues memcached_flush() so
that all query cache entries on memcached are flushed immediately.
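A minimal sketch of the child-side check (the flag name is
hypothetical; pool_clear_memory_cache() is the function mentioned
above):

    static void
    process_query_cache_invalidation_request(void)
    {
        /* query_cache_invalidate_request points into shared memory */
        if (!*query_cache_invalidate_request)
            return;

        pool_clear_memory_cache();  /* clears the cache under pool_shmem_lock() */
        *query_cache_invalidate_request = false;
    }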
Note that the timing for a pgpool child process to check the
invalidation request is after processing the current query or a
response from the backend. This means that if all pgpool child
processes sit idle, the request will not be processed until one of
them receives a message from either the frontend or the backend.
Another note is about the query cache statistics shown by the "show
pool_cache" command. Since the cache invalidation does not clear the
statistics, some of them (num_cache_hits and num_selects) continue to
increase even after the cache invalidation. Initializing the
statistics at the same time would be possible, but I am not sure
whether all users want that.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2024-October/004525.html
|
|
When the query cache feature is enabled, it was possible for a user to
read rows, through the query cache, from tables that should not be
visible to that user.
- If a query cache entry is created for a row-security-enabled table
  for user A, and then another user B accesses the table via SET ROLE
  or SET SESSION_AUTHORIZATION in the same session, it was possible
  for user B to retrieve rows which should not be visible to user B.
- If a query cache entry is created for a table for user A, and then
  another user B accesses the table via SET ROLE or SET
  SESSION_AUTHORIZATION in the same session, it was possible for user
  B to retrieve rows which should not be visible to user B.
- If a query cache entry is created for a table for a user, and then
  the access right to the table is revoked from the user by the REVOKE
  command, it was still possible for the user to retrieve the rows
  through the query cache.
Besides the vulnerabilities, there were multiple bugs in the query
cache feature.
- If a query cache entry is created for a row-security-enabled table
  for a user, and then ALTER DATABASE BYPASSRLS or ALTER ROLE
  BYPASSRLS disables the row security of the table, a subsequent
  SELECT still returns the same rows as before through the query
  cache.
- If a query cache entry is created for a table for a user, and then
  ALTER TABLE SET SCHEMA moves the table so that the search path no
  longer allows access to it, a subsequent SELECT still returns the
  rows as before through the query cache.
To fix the above, the following changes are made:
- Do not allow creating or using query cache for row-security-enabled
  tables (even if the table is included in
  cache_safe_memqcache_table_list).
- Do not allow creating or using query cache if SET ROLE/SET SESSION
  AUTHORIZATION is executed in the session (query cache invalidation
  is performed when a table is modified, as usual).
- Remove the entire query cache if REVOKE/ALTER DATABASE/ALTER
  TABLE/ALTER ROLE is executed. If the command is executed in an
  explicit transaction, do not create or use query cache until the
  transaction gets committed (query cache invalidation is performed
  when a table is modified, as usual). If the transaction is aborted,
  do not remove the query cache.
The patch was created by Tatsuo Ishii.
Backpatch-through: v4.1
|
|
Remove a debug message that was accidentally left in.
|
|
This commit tries to eliminate pgpool's long-standing limitations
regarding multiple statements (multi-statements).
Previously,
BEGIN;SELECT;
SAVEPOINT foo;
would fail in streaming replication mode because "BEGIN" was sent to
the primary node, but "SAVEPOINT" was sent to both the primary and the
standbys, and the standbys complained "SAVEPOINT can only be used in
transaction blocks".
The basic idea to solve the problem is to track explicit transactions
started by multi-statement queries, so that all commands including
PREPARE, EXECUTE, DEALLOCATE, SAVEPOINT and COMMIT/ROLLBACK are sent
to the primary node in streaming replication mode or logical
replication mode. In native replication or snapshot isolation mode,
those queries are sent to all of the backend nodes.
For this purpose a new member, is_tx_started_by_multi_statement, is
added to the session context, and support functions are added as well:
extern bool is_tx_started_by_multi_statement_query(void);
extern void set_tx_started_by_multi_statement_query(void);
extern void unset_tx_started_by_multi_statement_query(void);
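A minimal sketch of how routing could consult the flag (SR_MODE and
pool_set_node_to_be_sent() stand in for pgpool internals here):

    /* When e.g. SAVEPOINT arrives in streaming replication mode and the
       transaction was opened by a multi-statement query, route it to the
       primary node only. */
    if (SR_MODE && is_tx_started_by_multi_statement_query())
        pool_set_node_to_be_sent(query_context, PRIMARY_NODE_ID);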
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2023-February/004287.html
Back-patch-through: 4.2, as a backport to 4.1 and before looks difficult
|
|
Currently there are two string manipulation modules: pool_string and
StringInfo. StringInfo is more feature-rich and PostgreSQL-compatible
because it was imported from PostgreSQL. So replace all usages of
pool_string with StringInfo. This also solves a problem reported by
Peng: a struct name collision on "String": pool_string uses "String"
and PostgreSQL 15's parser also uses "String".
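For reference, typical StringInfo usage (this is PostgreSQL's API as
imported; shown only as an illustration):

    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfo(&buf, "SELECT %d", 42);
    /* buf.data now holds "SELECT 42"; buf.len is its length */
    pfree(buf.data);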
|
|
Fix query cache not being created in extended query mode in other than
streaming and logical replication mode.
We used to create query cache in ReadyForQuery() in extended query
mode in other than streaming and logical replication mode. However, if
the following message sequence is sent from the frontend, the query
cache was never created, because pool_is_cache_safe() returns false in
pool_handle_query_cache(). Why? Because pool_is_cache_safe() examines
the current query context, and the current query context is for the
"END" message.
'P' "" "BEGIN" 0
'B' "" "" 0 0 0
'E' "" 0
'P' "S1" "SELECT 1" 0
'B' "S1" "S1" 0 0 0
'E' "S1" 0
'P' "" "END" 0
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'
'X'
So this commit changes CommandComplete() so that
pool_handle_query_cache() gets called not only in streaming and
logical replication mode. pool_handle_query_cache() will create a
temporary query cache, and it will be processed the next time
ReadyForQuery() is called, for the END message.
I found the bug while taking care of:
https://www.pgpool.net/mantisbt/view.php?id=700
Note that if the transaction is ended by a simple query message "END",
the bug does not appear, because the extended query SELECT messages
will be followed by a Sync message, which produces a Ready for query
message, and ReadyForQuery() will happily register the query cache
since this time pool_is_cache_safe() returns true.
I think this is a long-standing bug. The reason why it was not found
earlier is that, although a similar message sequence is created by the
JDBC driver, CommandComplete() already handled that case in the way
described above.
|
|
Fixes for source code files including *.[chl] and regression scripts,
covering not only comments but also variable names and function names.
Documents (sgml and some text files) are also fixed.
See:
https://github.com/pgpool/pgpool2/pull/38
for more details.
Patch contributed by Josh Soref. Fixes for Japanese sgml files by me.
|
|
black/white_function_list -> write_function_list, read_only_function_list
black_query_pattern -> primary_routing_query_pattern
black/white_memqcache_table_list -> cache_unsafe/cache_safe_table_list
Watchdog: replace 'master' with 'leader' for watchdog nodes.
The ALWAYS_MASTER flag is changed to ALWAYS_PRIMARY.
Replace the relcache_query_target option 'master' with 'primary'.
Replace Master Node with Main node (node id 0, or the youngest live
node).
Replace follow_master with follow_primary, including its parameters.
Replace some remaining occurrences of master with primary/main/leader.
Patch contributed by: Umar Hayat
Reviewed by me.
|
|
Major changes of PostgreSQL 13 parser include:
- Remove an object's dependency on an extension
ALTER TRIGGER ... NO DEPENDS ON EXTENSION ...
ALTER FUNCTION ... NO DEPENDS ON EXTENSION ...
- Allow FETCH FIRST to use WITH TIES
FETCH FIRST ... WITH TIES
- Add ALTER TABLE clause DROP EXPRESSION
ALTER TABLE ... DROP EXPRESSION
- Allow setting statistics target for extended statistics
ALTER STATISTICS ... SET STATISTICS
- Allow ALTER VIEW to rename view columns
ALTER VIEW ... RENAME COLUMN ... TO ...
- Add CREATE DATABASE clause LOCALE option
CREATE DATABASE ... LOCALE
- Add DROP DATABASE clause WITH FORCE option
DROP DATABASE ... WITH (force)
- Add VACUUM clause PARALLEL option
VACUUM (PARALLEL 1) ...
|
|
While working on the pcp_stop_pgpool command to implement cluster
mode, I realized that we have a lot of clutter in a few source files,
especially the pool.h header file. So I also did some code
reorganization and added a bunch of new include files in the hope of
reducing the size of pool.h and also getting rid of too many global
functions.
Along the way, I found out there are a few other files that need a
little shredding as well; src/main/pgpool_main.c and
src/protocol/child.c are at the top of this to-be-trimmed list. So, as
part of this commit, I have taken a few things related to internal
commands (the interface between the child processes and the Pgpool-II
main process) from the pgpool_main.c file and moved them into the new
"src/main/pool_internal_comms.c".
Similarly, src/protocol/child.c had and still has so many functions
that do not fit the personality of the child.c file. So I have moved
some of the stuff related to DB functions into the
src/protocol/pool_pg_utils.c file.
Having done that, I think this is still not enough, and we may require
another round (if someone is willing to work on it) of source file
reorganization.
|
|
Checking temporary tables is slow because it needs to look up system
catalogs. To eliminate the lookup, a new method to trace CREATE TEMP
TABLE/DROP TABLE is added. For this purpose a linked list is added to
the session context. The list keeps information on temp tables being
created/dropped (not yet committed) and on created/dropped temp tables
(committed or aborted). By looking up the list, it is possible to
decide whether a given table is temporary or not.
However, it is impossible to trace table creation inside functions and
triggers; thus the new method does not completely replace the existing
method, and a choice of methods is given by changing the
check_temp_table type from boolean to string:
"catalog": existing method (catalog lookup, same as check_temp_table = on)
"trace": the proposed new way
"none": no temp table checking (same as check_temp_table = off)
|
|
In "batch" mode, not for every execute message, a sync message is
followed. Unfortunately Pgpool-II only discard memory of query
context for the last execute message while processing the ready for
query message. For example if 3 execute messages are sent before the
sync message, 2 of query context memory will not be freed and this
leads to serious memory leak.
To fix the problem, now the query context memory is possibly discarded
when a command complete message is returned from backend if the query
context is not referenced either by sent messages or pending messages.
If it is not referenced at all, we can discard the query context.
Also even if it is referenced, it is ok to discard the query context
if it is either an unnamed statement or an unnamed portal because it
will be discarded anyway when next unnamed statement or portal is
created.
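A minimal sketch of the discard decision (the helper names are
hypothetical):

    /* Called while processing Command Complete. */
    if (!referenced_by_sent_messages(qc) && !referenced_by_pending_messages(qc))
        pool_query_context_destroy(qc);   /* nothing references it anymore */
    else if (is_unnamed_statement(qc) || is_unnamed_portal(qc))
        pool_query_context_destroy(qc);   /* superseded by the next unnamed one */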
Per bug 468.
|
|
While processing a command complete message in extended
query/streaming replication mode, a failure to read from the socket
led to a memory leak.
|
|
CommandComplete.c: fix memory leak
pool_memqcache.c: fix access to already freed data
|
|
PREPARE should be added via pool_add_sent_message(), so that EXECUTE
and DEALLOCATE can be sent to the same node as the PREPARE.
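A minimal sketch of the idea (pool_create_sent_message()'s exact
signature is an assumption here):

    /* Register the PREPARE as a sent message so that the node it went to
       is remembered for later EXECUTE/DEALLOCATE routing. */
    POOL_SENT_MESSAGE *msg;

    msg = pool_create_sent_message('Q', len, contents, 0, stmt_name,
                                   query_context);
    pool_add_sent_message(msg);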
See [pgpool-general: 6226] for more details.
|
|
Fix and unify indentation by using PostgreSQL 11's pgindent command.
|
|
|
If a write query is issued in an explicit transaction and
disable_load_balance is set to 'trans_transaction', then subsequent
read queries are redirected to the primary node even after the
transaction is closed.
TODO:
- Implement the 'always' case: in addition to the 'trans_transaction'
  behavior, the effect persists even outside of explicit transactions
  until the session ends.
- Documentation
|
|
When an empty query (an empty string or an all-comment query) is sent,
a command complete message was returned to the frontend. This is not
correct: an empty query response should be returned to the frontend.
Per bug 328.
|
|
- Commit cache upon receiving a command complete message, rather than
  upon receiving a "ready for query" message, except in an explicit
  transaction. Doing it upon receiving the ready for query message had
  a serious problem in streaming replication mode with the extended
  query case, because it is possible that multiple selects get
  executed before the ready for query message arrives. The older way
  could not handle this because it only looked at the last command
  before the ready for query message.
- When the query cache hits, do not forward cached messages to the
  frontend in Execute(). Instead, "inject" the cached messages into
  the backend buffer. This is better than the older way since it
  respects the pending messages and the necessary treatment while
  receiving messages, including parse complete and bind complete. Note
  that now the cached messages are returned only after a flush or a
  sync message is sent; before, they were returned immediately while
  executing the execute message. The newer behavior more closely
  resembles that of ordinary (non-cached) messages.
  To avoid adding the injected messages to the cache twice, a
  "skip_cache_commit" flag is introduced in the query context. The
  flag is set to true in Execute() if the injection happens.
  handle_query_cache() checks the flag and skips registering the query
  results to the query cache if the flag is true. The flag is set to
  false if the cache is not hit (see the sketch after this list).
- Introduce a new function, pool_at_command_success(), extracted from
  code fragments in ReadyForQuery(), to take care of misc treatments,
  including unsetting the writing-transaction flags, in
  CommandComplete() as well. ReadyForQuery() now calls it too.
- Fix a memory leak. If a close request is not issued (this is
  typically the case when unnamed statements/portals are used), the
  temp cache buffer remains until the session ends (at session end the
  memory for the buffer is freed because it is in the session context
  memory; thus only long-running sessions hit the bug). Before, we
  prepared a new buffer when the query cache was committed. Now we do
  not create a new buffer there; rather, a buffer is created when
  memqcache_register gets called, if the temp buffer is not allocated
  yet.
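A minimal sketch of the skip_cache_commit handshake (names other than
skip_cache_commit and handle_query_cache() are hypothetical):

    /* In Execute(), on a cache hit, feed the cached messages back
       through the backend buffer and remember not to re-cache them. */
    if (cache_hit)
    {
        inject_cached_messages(backend, cached);
        query_context->skip_cache_commit = true;
    }
    else
        query_context->skip_cache_commit = false;

    /* Later, in handle_query_cache(): */
    if (query_context->skip_cache_commit)
        return;   /* results came from the cache; do not register them again */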
|
|
The debugging line after an if statement should be enclosed in curly
braces. Otherwise it reports wrong information.
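The pitfall looks roughly like this (illustrative only):

    if (found)
        process(found);
        ereport(DEBUG1, (errmsg("processed")));   /* runs unconditionally! */

    if (found)
    {
        process(found);
        ereport(DEBUG1, (errmsg("processed")));   /* tied to the condition */
    }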
|
|
In streaming replication mode, if SHOW is issued, then subsequent
SELECTs in an explicit transaction are sent to the primary node. This
is an unreasonable and unnecessary limitation.
Also fix a hang when the parse command returns an error.
|
|
Fix errors in extended query protocol handling in streaming
replication mode.
The errors are caused by wrong predictions of which database node (or
both) will send a response to Pgpool-II. The previous implementation
using a "sync map" was weak and sometimes failed in the prediction.
This patch introduces a new implementation using a "pending message
queue", which records all messages sent to backends. Each element of
the queue stores info regarding the message type
(parse/bind/execute/describe/close/sync), to which database node the
message was sent, and so on. It's a simple FIFO queue. When a message
arrives from a backend, by looking at the head of the pending message
queue it is possible to reliably predict what kind of message will
arrive and from which database node. After receiving the message, the
element is removed from the queue.
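A minimal sketch of such a queue element (the names are hypothetical):

    typedef enum
    {
        POOL_PARSE, POOL_BIND, POOL_EXECUTE,
        POOL_DESCRIBE, POOL_CLOSE, POOL_SYNC
    } PendingMessageType;

    typedef struct PendingMessage
    {
        PendingMessageType     type;
        bool                   node_ids[MAX_NUM_BACKENDS]; /* where it was sent */
        struct PendingMessage *next;                       /* FIFO link */
    } PendingMessage;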
I would like to thank Sergey Kim, who has been helping me in testing
the series of patches.
See bug 271:
http://www.pgpool.net/mantisbt/view.php?id=271
and discussion in pgpool-hackers mailing list [pgpool-hackers: 2043]
and [pgpool-hackers: 2140] for more details.
|
|
(From pgpool-hackers: 1784)
While taking care of bug track item 238:
http://www.pgpool.net/mantisbt/view.php?id=238
I found there's room to enhance the performance of data transfer from
Pgpool-II to the frontend. In an extreme case, such as sending many
rows to the frontend, the enhancement was 47% to 62% according to my
benchmark results.
Currently Pgpool-II flushes data to the network (calling write(2))
every time it sends a row of data (a "Data Row" message) to the
frontend. For example, if 10,000 rows need to be transferred, 10,000
write()s are issued. This is pretty expensive. Since a "Command
Complete" message is sent after the row data, it's enough to issue a
single write() for the command complete message. There was also
unnecessary flushing in the handling of the command complete message.
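A minimal sketch of the idea using pgpool's buffered I/O helpers
(treat the exact call sites as assumptions):

    /* Buffer each DataRow without flushing ... */
    pool_write(frontend, row_data, row_len);

    /* ... and let the Command Complete message trigger the single
       write(2)/flush. */
    pool_write_and_flush(frontend, cc_message, cc_len);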
The attached patch tries to fix this. I did a quick benchmark on my
laptop (Ubuntu 14.04 LTS, 2 cores, 16GB mem, SSD) using pgbench (scale
factor = 1) + a custom query (SELECT * FROM pgbench_accounts), with
the master branch head and with the patched version. The graph is
attached.
Unfortunately, performance in workloads that transfer few rows will
not be enhanced, since such rows need to be flushed to the network
anyway.
|
|
Deal with bug 167. The problem is in the following sequence:
Parse(BEGIN)
Bind
:
Execute
Parse(SELECT)
Bind
:
(Notice that there is no Sync message between Execute and Parse)
BEGIN is sent to all db nodes while the SELECT is sent to one of the
db nodes. So pgpool gets confused about which DB node to read the
response from. To fix this, after each Execute, send a Flush message
to the backend to get the response. This ensures that the responses of
BEGIN etc. are read from all DB nodes, while the responses of SELECT
etc. are read from only one DB node.
|
|
While processing a Command Complete message, it is possible that the
query-in-progress state is false. In this case pool_set_query_state()
would fail, so it should not be called.
|
|