|
The sq_key alone may be up to 64 bytes, so we need more than that.
We could use dynamic memory instead, but 128 bytes should be enough
both for the sq_key and the other pieces.
|
|
A row description message contains the type information for the attributes
(columns) in the result. But if the type does not exist in the search_path then the
coordinator fails to parse the typename back to the type. So the datanode must
send the schema name along with the type name.
Per report and test case by Hengbing Wang @ Microfun.
Added a new test file and a few test cases to cover this area.
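For illustration (schema, type, and table names here are hypothetical), the kind
of case this addresses looks like:
CREATE SCHEMA app;
CREATE TYPE app.mood AS ENUM ('happy', 'sad');
CREATE TABLE moods (id int, m app.mood);
-- With 'app' not in the coordinator's search_path, the row description for this
-- query must carry the schema-qualified type name, or the coordinator cannot
-- resolve it.
SELECT m FROM moods;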
|
|
We'd occasionally seen that the pooler process fails to respond to SIGQUIT and
gets stuck in a non-recoverable state. Code inspection reveals that we're not
following the model followed by the rest of the background worker processes in
handling SIGQUIT. So get that fixed, with the hope that this will fix the
problem case.
|
|
Without this, parseTypeString() might throw an error or resolve to the wrong type
when the type name requires quoting.
Per report by Hengbing Wang
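A hypothetical example of a type name that requires quoting:
CREATE TYPE "Colour Mix" AS ENUM ('red', 'blue');
-- Passing the bare name back through parseTypeString() would fail or resolve
-- to the wrong type; it must be quoted first.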
|
|
Per report from Hengbing, the current implementation of PITR recovery to a
BARRIER failed to correctly stop at the given recovery_target_barrier. It turned
out there were two bugs here: 1) we failed to write the XLOG record correctly,
and 2) we failed to mark the end-of-recovery upon seeing the XLOG record during
recovery.
Fix both these problems and also fix pg_xlogdump in passing to ensure we can
dump the BARRIER XLOG records correctly.
|
|
Chi Gao and Hengbing Wang reported certain issues around transaction handling
and demonstrated via xlogdump how certain transactions were getting marked
committed/aborted repeatedly on a datanode. Attempting to abort an already
committed transaction results in a PANIC. Upon investigation, this uncovered a
very serious yet long-standing bug in transaction handling.
If the client is running in autocommit mode, we try to avoid starting a
transaction block on the datanode side if only one datanode is going to be
involved in the transaction. This is an optimisation to speed up short queries
touching only a single node. But when the query rewriter transforms a single
statement into multiple statements, we would still (and incorrectly) run each
statement in autocommit mode on the datanode. This can cause inconsistencies
when one statement commits but the next statement aborts. It may also lead
to PANIC situations if we continue to use the same global transaction
identifier for the statements.
This can also happen when the user invokes a user-defined function. If the
function has multiple statements, each statement will run in autocommit mode if
it is FQS-ed, again creating inconsistencies if a later statement in the
function fails.
We now have a more elaborate mechanism to tackle autocommit and transaction
block needs. The special casing for force_autocommit is now removed, thus
making it more predictable. We also have specific conditions to check to ensure
that we don't mix up autocommit and transaction blocks for the same global xid.
Finally, if a query rewriter transforms a single statement into multiple
statements, we run those statements in a transaction block. Together these
changes should help us fix the problems.
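A hypothetical example of the rewriter case described above (table and rule
names are made up):
CREATE TABLE t (a int);
CREATE TABLE t_audit (a int, at timestamptz);
CREATE RULE t_log AS ON INSERT TO t
    DO ALSO INSERT INTO t_audit VALUES (NEW.a, now());
-- This single client statement is rewritten into two statements; with this
-- change they run inside one transaction block on the datanode instead of
-- each running in autocommit mode.
INSERT INTO t VALUES (1);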
|
|
d9f45c9018ec3ec1fc11e4be2be7f9728a1799b1 attempted to refactor
release_connection() to make it more readable, but unfortunately
inverted the force_destroy check, causing regression failures.
In hindsight, the refactoring was rather arbitrary and not really helping
readability, so just revert to the original code (but keep the comments
explaining what's happening).
|
|
A number of functions were defined in pgxcnode.h/pgxcnode.c, but
only ever used in poolmgr.c. Those are:
- PGXCNodeConnect - open libpq connection using conn. string
- PGXCNodePing - ping node using connection string
- PGXCNodeClose - close libpq connection
- PGXCNodeConnected - verify connection status
- PGXCNodeConnStr - build connection string
So move them to poolmgr.c and make them static, so that poolmgr
is the only part dealing with libpq connections directly.
|
|
Similarly to a39b06b0c6, this does minor cleanup in the pool manager
code by removing unused functions and adding a lot of comments, both
at the file level (explaining the concepts and basic API methods)
and for individual functions.
|
|
pgxc_FQS_planner() was not copying queryId, so extensions relying on
it did not work properly. For example the pg_stat_statements extension
was ignoring queries executed using FQS entirely.
Backpatch to Postgres-XL 9.5.
|
|
When checking if a query is eligible for FQS (fast-query shipping),
disable the optimization for queries in SCROLL cursors, as FQS does
not support backward scans.
Discussion: <e66932f3-3c35-cab0-af7e-60e8dfa423ba@2ndquadrant.com>
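For example (assuming some distributed table t with a column a), a cursor like
this now falls back to a regular remote subplan rather than FQS:
BEGIN;
DECLARE c SCROLL CURSOR FOR SELECT * FROM t ORDER BY a;
FETCH FORWARD 5 FROM c;
FETCH BACKWARD 3 FROM c;  -- a backward scan, which FQS cannot provide
COMMIT;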
|
|
Our efforts to improve shared queue synchronization continue. We now have a
per-queue producer lwlock that must be held for synchronization between
consumers and the producer. Consumers must hold this lock before setting the
producer latch to ensure the producer does not miss any signals and does
not go into unnecessary waits.
We still can't get rid of all the timeouts; in particular, we see that sometimes
a producer finishes and tries to unbind from the queue even before a consumer
gets a chance to connect to it. We left the 10s wait to allow consumers to
connect. There is still a net improvement because when a consumer is not going
to connect, it tells the producer and we avoid the 10s timeout we used to
see earlier.
|
|
Rules are converted to their string representation and stored in the catalog.
While building a relation descriptor, this information is read back and converted
into a Node representation. Since relation descriptors could be built while we
are reading plan information sent by the remote server in a stringified
representation, trying to read the rules with portable input on may lead to
unpleasant behaviour. So we must first reset portable input and restore it
after reading the rules. The same applies to RLS policies (even though we don't
have a test showing the impact, it looks like a sane thing to fix anyway).
|
|
We never had this support and we never felt the need, because the use of FQS was
limited to utility statements and simple queries which can be completely
pushed down to the remote node. But in PG 10, we're seeing errors while using
cursors for queries which are FQS-ed. So instead of forcing a regular remote
subplan on such queries, we add support for rescan of the RemoteQuery node.
Patch by Senhu <senhu@tencent.com>
|
|
As coordinator_lxid is uint32, make sure we use %u to format it
(e.g. when sending it to remote nodes as a string) and not just %d.
|
|
We were not dealing with the params in Subplan correctly, thus those params
were not sent to the remote nodes correctly during RemoteSubplan execution.
This patch fixes that by traversing the Subplan node correctly. The regression
failure in the 'join' test case is addressed too.
Patch by senhu (senhu@tencent.com)
|
|
This is the merge-base of PostgreSQL's master branch and REL_10_STABLE branch.
This should be the last merge from PG's master branch into XL 10 branch.
Subsequent merges must happen from the REL_10_STABLE branch.
|
|
The quote_guc_value() result was assigned to a regular (char *) variable,
resulting in a compiler warning about discarding the const qualifier. Fix by
marking the variable as 'const' too.
|
|
This merges the current master branch of XL with the XL 10 development branch.
Commits up to f72330316ea5796a2b11a05710b98eba4e706788 are included in this
merge.
|
|
With roundrobin distribution, the initial node was always set to the first node
in the list. That works fine when inserting many rows at once (e.g. with
INSERT SELECT), but with single-row inserts this puts all data on the
first node, resulting in unbalanced distribution.
This randomizes the choice of the initial node, so that with single-row
inserts the ROUNDROBIN behaves a bit like RANDOM distribution.
This also removes an unnecessary srand() call from RelationBuildLocator(),
located after a call to rand().
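To illustrate with a hypothetical table:
CREATE TABLE events (msg text) DISTRIBUTE BY ROUNDROBIN;
-- Previously every session's single-row inserts all started at the first node;
-- with a randomized starting node they spread out more like RANDOM distribution.
INSERT INTO events VALUES ('a');
INSERT INTO events VALUES ('b');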
|
|
Due to a coding issue in IsDistColumnForRelId() it was possible to drop
columns that were used as modulo distribution keys. A simple example
demonstrates the behavior:
CREATE TABLE t (a INT, b INT, c INT) DISTRIBUTE BY MODULO (a);
ALTER TABLE t DROP COLUMN a;
test=# \d+ t
                          Table "public.t"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 b      | integer |           | plain   |              |
 c      | integer |           | plain   |              |
Distribute By: MODULO(........pg.dropped.1........)
Location Nodes: ALL DATANODES
With this commit, the ALTER TABLE command fails as expected:
ERROR: Distribution column cannot be dropped
The commit simplifies the coding a bit, and removes several functions
that were not needed anymore (and unused outside locator.c).
|
|
Until now the BIGINT data type was not supported by MODULO distribution, and
attempts to create such tables failed. This patch removes the limitation.
The compute_modulo() function originally used an optimized algorithm
from http://www.graphics.stanford.edu/~seander/bithacks.html (namely the
one described in section "Compute modulus division by (1 << s) - 1 in
parallel without a division operator") to compute the modulo. But that
algorithm version only supported 32-bit values, and so would require
changes to support 64-bit values. Instead, I've decided to simply drop
that code and use the plain % operator, which should translate to an IDIV
instruction.
Judging by benchmarks (MODULO on INTEGER column), switching to plain
modulo (%) might result in about 1% slowdown, but it might easily be
just noise caused by different binary layout due to code changes. In
fact, the simplified algorithm is much less noisy in this respect.
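For example, a table like this (names are hypothetical) previously failed to
create and now works, with the target node picked by a plain modulo of the
column value:
CREATE TABLE accounts (id BIGINT, balance NUMERIC) DISTRIBUTE BY MODULO (id);
INSERT INTO accounts VALUES (4294967299, 100.0);  -- value wider than 32 bits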
|
|
This is the result of the "git merge remotes/PGSQL/master" up to the said commit
point. We have done some basic analysis, fixed compilation problems etc., but the
bulk of the logical problems in conflict resolution etc. will be handled by
subsequent commits.
|
|
Fast-query-shipping consumes all rows produced by one datanode (connection)
before moving to the next connection. This leads to suboptimal performance
when the datanodes can't produce tuples at the desired pace. Instead, we switch
between connections after every PGXLRemoteFetchSize (pgx_remote_fetch_size)
rows are fetched. This gives a datanode a chance to produce more tuples while
the coordinator consumes tuples already produced and sent across by another
datanode.
This seems to improve performance for FQS-ed queries significantly when they
return a large number of rows from more than one datanode.
Report by Pilar de Teodoro <pteodoro@sciops.esa.int>, initial analysis and
performance tests by Tomas Vondra, further analysis and patch by me.
Backpatched to XL9_5_STABLE.
|
|
This avoids resetting global session information when DISCARD/RESET ALL is
executed. Resetting it can have bad effects, especially as seen in the 'guc' test
case where we fail to handle temp tables correctly. So we mark the global_session
GUC with the GUC_NO_RESET_ALL flag and instead issue an explicit RESET
global_session when the connection is cleaned up.
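In short (a sketch, not the actual test case):
-- global_session is now marked GUC_NO_RESET_ALL, so this no longer clears it:
DISCARD ALL;
-- Instead, the pooler issues an explicit reset when the connection is cleaned up:
RESET global_session;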
|
|
the remote side during RemoteSubplan execution.
This allows us to experiment with different sizes more easily. Playing with the
fetch size also exposed a couple of problems, fixed in this same commit.
1. We were incorrectly forgetting a connection's response combiner while
suspending a portal, leading to errors later when we try to buffer the results
because the connection must be used for other queries.
2. The remote cursor name was not getting set properly, causing datanodes to
complain about non-existent cursors.
|
|
MaxAllocSize.
We take this opportunity to rearrange the code to avoid duplication in handling
the in and out buffers. Also, add some other checks to ensure that we don't overrun
the limits.
Report, investigation and draft patch by Krzysztof Nienartowicz.
|
|
running COPY protocol.
We sometimes do see an 'M' message (command ID received from the remote node)
from the datanode, and that breaks the combiner's message validation. So we only
do that if we received a RESPONSE_COPY message during the COPY IN protocol.
|
|
PostgreSQL's BINARY COPY protocol does not expect a newline character between
rows. But the coordinator was adding one while running the protocol between the
coordinator and the datanodes. While Postgres-XL had some provision to deal
with this on the datanode side, it seemed either insufficient or buggy, as
evident during tests with pglogical. So get rid of that and make the protocol
the same as vanilla PG.
|
|
commands to the remote nodes
Earlier we'd special-cased a few GUCs, such as those using memory or time units
or transaction isolation levels. But clearly that wasn't enough as we noticed
with "application_name" recently. So fix this problem in a more comprehensive
manner.
Added a few more test cases to cover these scenarios.
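For instance, a value containing spaces must be forwarded to the datanodes as a
properly quoted literal:
SET application_name = 'my reporting app';
-- Without quoting, re-sending this as  SET application_name TO my reporting app
-- would be a syntax error on the remote node.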
|
|
The switch in determine_param_types() was missing the T_CteScan case,
so some queries were failing with:
ERROR: unrecognized node type: 119
This fixes it, and it also fixes the expected output for a few other
queries in the same test suite.
|
|
Until now the application_name was set to 'pgxc' for all connections.
Set it to 'pgxc:NODE_NAME' instead, to make it easier to understand
data from pg_stat_activity.
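A quick look at pg_stat_activity on a datanode now shows which node a connection
belongs to (the node name 'coord1' below is hypothetical):
SELECT application_name, count(*)
  FROM pg_stat_activity
 GROUP BY application_name;
-- shows e.g. 'pgxc:coord1' instead of a single undifferentiated 'pgxc'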
|
|
The code was mixing code for named and regular tranches (identified
by just ID, in this case LWTRANCHE_SHARED_QUEUES). So switch to
the LWLockRegisterTranche() approach.
There were two more bugs though:
1) The tranche struct was local to the block, which does not work as
it's referenced from the global array without creating a copy. So just
make it static and global.
2) The locks were not properly initialized. Add a LWLockInitialize
call to the loop.
|
|
LWLockAssign() got removed so this is the only way to do this.
In any case, once we switch to the built-in shared queues, this should
not be needed at all I believe.
|
|
The compilation was failing in poolmgr.c because of incorrect types
when tweaking signals. After inspection it seems HAVE_SIGPROCMASK
is not actually needed (it's not defined anywhere, and upstream is
not using it either) so just get rid of it entirely.
|
|
The frontend libraries need to be included last, because libpq-int.h
defines FRONTEND, which triggers #error in s_lock.h. Including the
libraries at the end is not particularly pretty, but it works.
Also reorder the includes a bit into the usual lexicographic order.
|
|
There's a fourth argument (abbrev), so just pass NULL.
|
|
|
|
max_datanodes
The user can still break things by first setting a high value for max_datanodes
or max_coordinators and then lowering it again after adding nodes. So this
basic check is not enough, but it is better than what we have currently.
|
|
the GTM no longer has state information for the transaction when
xc_maintenance_mode is active.
Information on the GTM is important for explicit prepared transactions because
the GTM tracks the participant nodes in that case, and the COMMIT/ROLLBACK PREPARED
commands can then be forwarded to those participants. So when
xc_maintenance_mode is active, we only clean up the prepared transaction on the
local node (or a remote one when used with EXECUTE DIRECT). So it seems OK not to
guarantee that the GTM has the state information.
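A sketch of the local-only cleanup described above (the GID is made up):
SET xc_maintenance_mode = on;
-- Cleans up the prepared transaction on the local node only; the GTM is not
-- expected to still have state for it.
ROLLBACK PREPARED 'xact_1234';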
|
|
the correct order and correct position
Per report and test-case by Daniel Bunford-Jones
|
|
There was only a single function in this module, and it was not
referenced from anywhere. So remove the module and all references
to it (e.g. includes of the header).
In createplan.c, this is a bit more difficult, because it got
getpid() through this file, so include pgxcnode.h instead,
as it includes unistd.h.
|
|
Although stormdb_promote_standby() is a function visible from SQL,
it does not seem mentioned anywhere (e.g. in the docs etc.). So
let's get rid of it, as it's apparently not needed.
|
|
As written originally, the branching confused the compiler enough to
emit a warning about using an uninitialized allOids variable. Refactor the
code a bit to make it clear that's not really possible, and also move
a number of variable declarations from the function scope to a block
level.
Note: I think the function would deserve more documentation - both
about the intent in general, and about the meaning of the return
value ('res' is quite an unclear name).
|
|
Under rare conditions (missing support for unix sockets or empty
Unix_socket_directories) the fdsocket variable remained uninitialized.
Handle this case explicitly, and also fix a few minor annoyances
(remove extra braces around a single-line block).
|
|
When handling a message type requires more than a few lines,
move it to a separate function. This makes the code flow much
cleaner, and it also avoids issues with reusing shared variables
(originally defined at the function scope).
Also differentiate between EOF and protocol violation, if only
to log the protocol violation.
Note: The for(;;) loop checks for EOF in two places - once in the
switch, and then at the very end (using pool_pollbyte). There's
a subtle difference - the second place does not do agent_destroy.
Not sure if this is intentional, but perhaps that's a bug?
|
|
While tweaking the code, I've noticed this is the only place using
NodeHealthStatus, apparently attempting to save a few bytes. That
seems rather pointless given how little memory this will need and
how short-lived the memory is (and also how AllocSet doubles the
allocation sizes anyway), so I got rid of this. This makes copying
the old state easier (a simple memcpy instead of for loops).
Note: PgxcNodeListAndCount() seems a bit unfortunately named, as the
name suggests listing and counting nodes, but in practice it refreshes
the status of nodes from the catalog, while keeping the health status.
|
|
Otherwise agent_acquire_connections() might bail out before
actually setting pids to anything, leaving the pointer set to
some random garbage and causing a segfault at pfree() time.
|
|
|
|
enforce_two_phase_commit
These two options were ineffective - defined and listed in the sample
configuration file, but attached to unreferenced variables.
|