Age | Commit message | Author |
|
The sq_key alone may be up to 64 bytes, so we need more than that.
We could use dynamic memory instead, but 128 bytes should be enough
both for the sq_key and the other pieces.
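The sizing argument above can be checked with a small sketch. The helper name and the prefix are hypothetical (the message does not say what the "other pieces" are); only the 64-byte key limit and the 128-byte buffer come from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SQ_KEY_MAX    64   /* per the message: sq_key alone may be up to 64 bytes */
#define NAME_BUF_SIZE 128  /* per the message: 128 bytes for sq_key plus other pieces */

/*
 * Hypothetical helper: writes "<prefix><sq_key>" into buf.
 * Returns 1 if the whole string fit, 0 if it would have been truncated.
 */
static int
format_queue_name(char *buf, size_t buflen, const char *prefix, const char *sq_key)
{
    int n;

    assert(buf != NULL && buflen > 0);
    n = snprintf(buf, buflen, "%s%s", prefix, sq_key);
    return n >= 0 && (size_t) n < buflen;
}
```

With a maximum-length key and any non-empty prefix, a 64-byte buffer reports truncation while the 128-byte one does not, which is the case the fix addresses.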
|
|
In XL, we embed the nodename in the tablespace subdir name to ensure that
non-conflicting paths are created when multiple coordinators/datanodes are
running on the same server. The code to handle tablespace mapping in basebackup
was missing this support.
Per report and patch by Wanglin.
|
|
While in restore mode, which we use to load the schema when a new node is added to
the cluster, the partition child tables should correctly inherit the
distribution properties from the parent table. This support was lacking, thus
leading to incorrect handling of such tables.
Per report by Virendra Kumar.
|
|
Similar to what we did in e688c0c23c962d425b82fdfad014bace4207af1d, we must not
rely on the temporary namespace on the coordinator since it may change on the
remote nodes. Instead we use the pg_my_temp_schema() function to find the
currently active temporary schema on the remote node.
|
|
We create a coordinator-only LOCAL temporary table for REFRESH MATERIALIZED
VIEW CONCURRENTLY. Since this table does not exist on the remote nodes, we must
not use explicit "ANALYZE <temptable>". Instead, just analyze it locally like
we were doing at other places.
Restore the matview test case to use REFRESH MATERIALIZED VIEW CONCURRENTLY now
that the underlying bug is fixed.
|
|
|
|
A row description message contains the type information for the attributes in
the column. But if the type does not exist in the search_path then the
coordinator fails to parse the typename back to the type. So the datanode must
send the schema name along with the type name.
Per report and test case by Hengbing Wang @ Microfun.
Added a new test file and a few test cases to cover this area.
|
|
We'd occasionally seen that the pooler process fails to respond to SIGQUIT and
gets stuck in a non-recoverable state. Code inspection reveals that we're not
following the model followed by the rest of the background worker processes in
handling SIGQUIT. So get that fixed, with the hope that this will fix the
problem case.
|
|
Without this, parseTypeString() might throw an error or resolve to a wrong type
in case the type name requires quoting.
Per report by Hengbing Wang
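A minimal sketch of why quoting and schema qualification matter when a type name is sent as text and parsed back on the other side. The function names are hypothetical, and unlike PostgreSQL's own identifier quoting, this sketch always quotes rather than quoting only when needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Writes src into dst wrapped in double quotes, doubling any embedded
 * double quote. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if dst is too small (dst contents are then
 * undefined).
 */
static int
quote_ident_always(char *dst, size_t dstlen, const char *src)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);
    if (dstlen < 3)
        return -1;
    dst[n++] = '"';
    for (; *src; src++)
    {
        size_t need = (*src == '"') ? 2 : 1;

        /* keep room for the closing quote and the NUL */
        if (n + need + 2 > dstlen)
            return -1;
        if (*src == '"')
            dst[n++] = '"';
        dst[n++] = *src;
    }
    dst[n++] = '"';
    dst[n] = '\0';
    return (int) n;
}

/* Builds "schema"."type"; returns the length or -1 if it does not fit. */
static int
qualified_type_name(char *dst, size_t dstlen, const char *schema, const char *type)
{
    int a;
    int b;

    a = quote_ident_always(dst, dstlen, schema);
    if (a < 0 || (size_t) a + 1 >= dstlen)
        return -1;
    dst[a] = '.';
    b = quote_ident_always(dst + a + 1, dstlen - (size_t) a - 1, type);
    return b < 0 ? -1 : a + 1 + b;
}
```

A type called My Type in schema s1 comes out as "s1"."My Type", which can be parsed back regardless of the receiver's search_path.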
|
|
|
|
Per report from Hengbing, the current implementation of PITR recovery to a
BARRIER failed to correctly stop at the given recovery_target_barrier. It seems
there are two bugs here. 1) we failed to write the XLOG record correctly and 2)
we also failed to mark the end-of-recovery upon seeing the XLOG record during
the recovery.
Fix both these problems and also fix pg_xlogdump in passing to ensure we can
dump the BARRIER XLOG records correctly.
|
|
The system may, and very likely will, choose a different namespace for temporary
tables on different nodes. So it was erroneous to explicitly add the
coordinator-side namespace to the queries constructed for fetching stats from the
remote nodes. A regression test had been failing non-deterministically for this
reason for a long time, but only now could we fully understand the problem and fix it. We now use
pg_my_temp_schema() to derive the current temporary schema used by the remote
node instead of hardcoding that in the query using coordinator side
information.
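As an illustration, a stats query of this kind can leave the namespace lookup to the remote node. The helper name and the selected columns are assumptions made for the sketch, not the query XL actually builds; pg_my_temp_schema() is the standard PostgreSQL function that returns the OID of the session's temporary schema.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: builds a stats query for a temporary table.
 * relname is assumed to be validated/escaped by the caller.
 * Returns 1 on success, 0 if the buffer was too small.
 */
static int
build_temp_stats_query(char *buf, size_t buflen, const char *relname)
{
    int n;

    assert(buf != NULL && relname != NULL);
    /* No coordinator-side schema name (e.g. pg_temp_N) is embedded here. */
    n = snprintf(buf, buflen,
                 "SELECT c.relpages, c.reltuples FROM pg_class c "
                 "WHERE c.relname = '%s' "
                 "AND c.relnamespace = pg_my_temp_schema()",
                 relname);
    return n >= 0 && (size_t) n < buflen;
}
```

Each remote node evaluates pg_my_temp_schema() in its own session, so the query resolves to whichever temporary namespace that node picked.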
|
|
|
|
Chi Gao and Hengbing Wang reported certain issues around transaction handling
and demonstrated via xlogdump how certain transactions were getting marked
committed/aborted repeatedly on a datanode. When an already committed
transaction is attempted to be aborted again, it results in a PANIC. Upon
investigation, this uncovered a very serious yet long standing bug in
transaction handling.
If the client is running in autocommit mode, we try to avoid starting a
transaction block on the datanode side if only one datanode is going to be
involved in the transaction. This is an optimisation to speed up short queries
touching only a single node. But when the query rewriter transforms a single
statement into multiple statements, we would still (and incorrectly) run each
statement in an autocommit mode on the datanode. This can cause inconsistencies
when one statement commits but the next statement aborts. And it may also lead
to the PANIC situations if we continue to use the same global transaction
identifier for the statements.
This can also happen when the user invokes a user-defined function. If the
function has multiple statements, each statement will run in an autocommit
mode, if it's FQSed, thus again creating inconsistency if a following statement
in the function fails.
We now have a more elaborate mechanism to tackle autocommit and transaction
block needs. The special casing for force_autocommit is now removed, thus
making it more predictable. We also have specific conditions to check to ensure
that we don't mix up autocommit and transaction block for the same global xid.
Finally, if a query rewriter transforms a single statement into multiple
statements, we run those statements in a transaction block. Together these
changes should help us fix the problems.
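The rules described above can be summarised as a small decision function. This is only a sketch under stated assumptions: the function and parameter names are hypothetical, and the real checks in XL are more elaborate than four conditions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical summary of the decision: should the datanode run this
 * work inside an explicit transaction block?
 */
static bool
needs_transaction_block(bool client_autocommit,
                        int datanodes_involved,
                        int statements_after_rewrite,
                        bool gxid_already_in_block)
{
    assert(datanodes_involved >= 0 && statements_after_rewrite >= 1);

    if (!client_autocommit)
        return true;            /* client opened a block itself */
    if (gxid_already_in_block)
        return true;            /* never mix modes for one global xid */
    if (statements_after_rewrite > 1)
        return true;            /* rewriter split the statement */
    return datanodes_involved > 1;  /* single-node shortcut otherwise */
}
```

Only the last case keeps the single-datanode autocommit optimisation; a rewritten multi-statement query now commits or aborts as a unit.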
|
|
We do some special processing for RemoteSubplan with returning lists. But the
EXPLAIN plan mechanism is not adequately trained to handle that special
crafting. So for now do not try to print the target list in the EXPLAIN output.
|
|
The new message 'W' to report waited-for XIDs must not be sent to a non-XL
client since it's not capable of handling that and might just cause unpleasant
problems. In fact, we should change 'W' to something else since standard libpq
understands that message and hangs forever expecting more data. With a new
protocol message, it would have failed, thus providing a more user-friendly
error. But postponing that for now since we should think through implications
of protocol change carefully before doing that.
|
|
d9f45c9018ec3ec1fc11e4be2be7f9728a1799b1 attempted to refactor
release_connection() to make it more readable, but unfortunately
inverted the force_destroy check, causing regression failures.
In hindsight, the refactoring was rather arbitrary and not really
helping with the readability, so just revert to the original code
(but keep the comments, explaining what's happening).
|
|
A number of functions were defined in pgxcnode.h/pgxnnode.h, but
only ever used in poolmgr.c. Those are:
- PGXCNodeConnect - open libpq connection using conn. string
- PGXCNodePing - ping node using connection string
- PGXCNodeClose - close libpq connection
- PGXCNodeConnected - verify connection status
- PGXCNodeConnStr - build connection string
So move them to poolmgr.c and make them static, so that poolmgr
is the only part dealing with libpq connections directly.
|
|
Similarly to a39b06b0c6, this does minor cleanup in the pool manager
code by removing unused functions and adding a lot of comments, both
at the file level (explaining the concepts and basic API methods)
and for individual functions.
|
|
ANALYZE was not collecting index statistics, which may have negative
impact for example on selectivity estimates for expressions. This also
fixes some incorrect plan changes in updatable_views regression test.
Discussion: <c822a7ff-7c53-ebaf-6f34-03132cd27621@2ndquadrant.com>
|
|
This fixes some remaining bugs in handling root->distribution, caused
by the upper-planner pathification (in PostgreSQL 9.6).
Prior to the pathification (so in PostgreSQL 9.5 and Postgres-XL 9.5),
the root->distribution was used for two purposes:
* To track distribution expected by ModifyTable (UPDATE,DELETE), so
that grouping_planner() knew how to redistribute the data.
* To communicate the resulting distribution from grouping_planner()
back to standard_planner().
This worked fine in 9.5 as grouping_planner() was only dealing with
a single remaining path (plan) when considering the redistribution,
and so it was OK to tweak root->distribution.
But since the pathification in 9.6 that is no longer true. There is
no obvious reason why all the paths would have to share the same
distribution, and we don't know which one will be the cheapest one.
So from now on root->distribution is used to track the distribution
expected by ModifyTable. Distribution for each path is available in
path->distribution if needed.
Note: We still use subroot->distribution to pass information about
distribution of subqueries, though. But we only set it after the
one cheapest path is selected.
|
|
While rewriting UPDATE/DELETE commands in rewriteTargetListUD, we've
been pulling all Vars from quals, and adding them to target lists. As
multiple Vars may reference the same column, this sometimes produced
plans with duplicate targetlist entries like this one:
    Update on public.t111
    -> Index Scan using t1_a_idx on public.t1
         Output: 100, t1.b, t1.c, t1.a, t1.a, t1.a, t1.a, t1.a, t1.a,
                 t1.a, t1.a, t1.ctid
    -> ...
Getting rid of the duplicate entries would be simple - before adding
an entry for each Var, check that a matching entry does not exist yet.
The question however is if we actually need any of this.
The comment in rewriteTargetListUD() claims we need to add the Vars
because of "coordinator quals" - which is not really defined anywhere,
but it probably means quals evaluated at the Remote Subquery node.
But we push all quals to the remote node, so there should not be any
cases where a qual would have to be evaluated locally (or where that
would be preferable).
So just remove all the relevant code from rewriteHandler.c, which
means we produce this plan instead:
    Update on public.t111
    -> Index Scan using t1_a_idx on public.t1
         Output: 100, t1.b, t1.c, t1.ctid
    -> ...
This affects a number of plans in regression tests, but the changes
seem fine - we simply remove unnecessary target list entries.
I've also added an assert to EXPLAIN enforcing the "no quals" rule
for Remote Subquery nodes.
Discussion: <95e80368-1549-a921-c5e2-7e0ad9485bd3@2ndquadrant.com>
|
|
pgxc_FQS_planner() was not copying queryId, so extensions relying on
it did not work properly. For example the pg_stat_statements extension
was ignoring queries executed using FQS entirely.
Backpatch to Postgres-XL 9.5.
|
|
When checking if a query is eligible for FQS (fast-query shipping),
disable the optimization for queries in SCROLL cursors, as FQS does
not support backward scans.
Discussion: <e66932f3-3c35-cab0-af7e-60e8dfa423ba@2ndquadrant.com>
|
|
Our efforts to improve shared queue synchronization continue. We now have a
per-queue producer lwlock that must be held for synchronization between
consumers and the producer. Consumers must hold this lock before setting the
producer latch to ensure the producer does not miss any signals and does
not go into unnecessary waits.
We still can't get rid of all the timeouts, especially we see that sometimes a
producer finishes and tries to unbind from the queue, even before a consumer
gets a chance to connect to the queue. We left the 10s wait to allow consumers to
connect. There is still net improvement because when the consumer is not going
to connect, it tells the producer and we avoid the 10s timeout, like we used to
see earlier.
|
|
We had an issue with tracking knownXids on the standby and it was overflowing
the allocated array in the shared memory. It turned out that the primary reason
for this is that the GTM leaves behind a hole in XID allocation when it's
restarted. The standby, oblivious to this, complained about the array overflow
and then died.
We now fix this by allocating an array which can hold CONTROL_INTERVAL worth of
additional XIDs. This would mostly be a waste because the XIDs are never
allocated. But this seems like a quick fix to further test the Hot standby. The
good thing is that we might just waste memory, but not have any impact on the
performance because of larger array since we only loop for numKnownXids which
will be more accurate.
With this change, also fix the defaults for datanode and coordinator standbys
and make them Hot Standbys. The wal_level is changed too.
|
|
When communicating with other nodes, we send names of objects instead
of OIDs as those are assigned on each node independently. We failed to
do this for Aggref->aggargtypes, which worked fine for built-in data
types (those have the same OID on all nodes), but resulted in failures
for custom data types (like for example FIXEDDECIMAL).
ERROR: cache lookup failed for type 16731
This fixes it by implementing READ/WRITE_TYPID_LIST_FIELD, similarly
to what we had for RELID.
Note: Turns out the WRITE_RELID_LIST_FIELD was broken, but apparently
we never call it in XL as it's only used for arbiterIndexes field. So
fix that too, in case we enable the feature in the future.
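A toy model of the problem, assuming two nodes whose catalogs assign different OIDs to the same custom type. The catalogs, the OID 16402, and all function names are made up for illustration; 16731 is taken from the error message above and 23 is the built-in int4 OID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    unsigned int oid;
    const char  *name;
} TypeEntry;

/* Toy per-node catalogs: built-in OIDs agree, custom ones do not. */
static const TypeEntry node_a[] = { {23, "int4"}, {16731, "fixeddecimal"} };
static const TypeEntry node_b[] = { {23, "int4"}, {16402, "fixeddecimal"} };

static const char *
type_name_by_oid(const TypeEntry *cat, size_t n, unsigned int oid)
{
    size_t i;

    assert(cat != NULL);
    for (i = 0; i < n; i++)
        if (cat[i].oid == oid)
            return cat[i].name;
    return NULL;                /* "cache lookup failed" in this model */
}

static unsigned int
type_oid_by_name(const TypeEntry *cat, size_t n, const char *name)
{
    size_t i;

    assert(cat != NULL && name != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(cat[i].name, name) == 0)
            return cat[i].oid;
    return 0;                   /* 0 stands for an invalid OID */
}

/* Sender writes the name; receiver resolves it in its own catalog. */
static unsigned int
ship_type(const TypeEntry *from, size_t nfrom,
          const TypeEntry *to, size_t nto, unsigned int oid)
{
    const char *name = type_name_by_oid(from, nfrom, oid);

    return name ? type_oid_by_name(to, nto, name) : 0;
}
```

Looking up 16731 directly on the second node fails, while shipping the name resolves to that node's own OID.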
|
|
Rules are converted into their string representation and stored in the catalog.
While building relation descriptor, this information is read back and converted
into a Node representation. Since relation descriptors could be built when we
are reading plan information sent by the remote server in a stringified
representation, trying to read the rules with portable input on may lead to
unpleasant behaviour. So we must first reset portable input and restore it back
after reading the rules. The same applies to RLS policies (even though we don't
have a test showing the impact, but it looks like a sane thing to fix anyways).
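The reset-and-restore step is a plain save/restore of a mode flag. The sketch below uses stand-in names (the flag, the setter, and the reader function are all hypothetical) and only shows the shape of the pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the reader's "portable input" mode flag. */
static bool portable_input = false;

/* Records which mode the rule reader actually ran in (for inspection). */
static bool mode_seen_while_reading_rules = true;

/* Sets the flag and returns the previous value so callers can restore it. */
static bool
set_portable_input_sketch(bool value)
{
    bool old = portable_input;

    portable_input = value;
    return old;
}

static void
read_rules_sketch(void)
{
    bool saved = set_portable_input_sketch(false);

    assert(!portable_input);
    /* ... parse the stored rule strings here, in non-portable mode ... */
    mode_seen_while_reading_rules = portable_input;

    /* Restore whatever mode the caller (e.g. plan reading) was using. */
    (void) set_portable_input_sketch(saved);
}
```

Because the previous value is restored rather than forced to a constant, the pattern is safe whether or not the caller was reading a remote plan.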
|
|
Commit 04f96689945462a4212047f03eb3281fb56bcf2f incorrectly allowed
distributed grouping paths for grouping sets, causing failures in
'groupingsets' regression test suite. So fix that by making sure
try_distributed_aggregation=false for plans with grouping sets.
|
|
We now create views/materialised views on all nodes, unless they are temporary
objects in which case they are created only on the local coordinator and the
datanodes. Similarly, temporary sequences are created on the local coordinator
and the datanodes.
This solves many outstanding problems in the regression results where remote
nodes used to fail because of non-existent type for a view or similar such
issues. A few other test cases now started to work correctly and produce output
matching upstream PG. So the expected output for those test cases has been
appropriately fixed.
A couple of sequences in the rangefuncs test case have been converted into
permanent sequences because the subsequent SQL functions refer to them and
hence fail if they do not exist on the remote coordinators.
The problem with special RULE converting a regular table into a view goes away
with the fix since DROP VIEW commands are now propagated to the datanodes too.
|
|
Further simplification and consolidation of the code.
|
|
|
|
|
|
Commit 665c224a6b2afa disabled CREATE PUBLICATION/SUBSCRIPTION, but
it was still possible to create a logical replication slot and call
pg_logical_slot_get_changes() on it.
That would however crash and burn as ReorderBufferCommit() relies on
subtransactions, and BeginInternalSubTransaction() is not expected
to fail, leading to segfaults in the PG_CATCH block.
Simply disallowing creating logical slots (and whatever else relies
on CheckLogicalDecodingRequirements) seems like the best fix.
|
|
Some database objects are created only on a subset of nodes. For example, views
are created only on the coordinators. Similarly, temp tables are created on the
local coordinator and all datanodes. So we must consult the relation kind
before executing the CREATE STATISTICS command on the remote nodes. Otherwise
we might try to execute it on a node where the underlying object is missing,
resulting in errors.
Patch by senhu (senhu@tencent.com) which was later reworked by me.
|
|
A dummy append node with no subpaths doesn't need any adjustment for
distribution. This allows us to correctly handle UPDATE/DELETE in some
cases which were failing earlier.
|
|
We never had this support and we never felt the need because the use of FQS was
limited for utility statements and simple queries which can be completely
pushed down to the remote node. But in PG 10, we're seeing errors while using
cursors for queries which are FQSed. So instead of forcing regular remote
subplan on such queries, we are adding support for rescan of RemoteQuery node.
Patch by Senhu <senhu@tencent.com>
|
|
The target datanode must be determined after computing the next value. So
let it go through regular planning. This fixes a couple of regression failures.
|
|
As coordinator_lxid is uint32, make sure we use %u to format it
(e.g. when sending it to remote nodes as string) and not just %d.
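A short sketch of the difference, with hypothetical helper names. The signed variant converts an out-of-range value to int, which is implementation-defined in C; on common platforms it wraps to a negative number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats the value the way the fix does: as an unsigned number. */
static int
format_lxid_unsigned(char *buf, size_t len, uint32_t lxid)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%u", (unsigned int) lxid);
}

/* Mimics the old behaviour: the same bits treated as a signed int. */
static int
format_lxid_signed(char *buf, size_t len, uint32_t lxid)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%d", (int) lxid);
}
```

Both helpers agree for values up to 2147483647; above that, only the unsigned form yields a string that converts back to the original value on the remote node.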
|
|
The coordinator_lxid GUC is internally stored as uint32, but was defined
as plain int32, triggering a compiler warning. It's also unclear what
would happen for transaction IDs outside the signed range (possibly some
strange issues).
This adds a new GUC type (UInt), used only for this one GUC. The patch
is fairly large, but most of it is boilerplate infrastructure to support
the new GUC type. We have considered simpler workarounds (e.g. treating
the GUC as string and converting it to/from uint32 using the GUC hooks),
but this seems much cleaner and tidier.
|
|
gcc 6.4.1 is complaining when ExecRemoteQuery(PlanState *) gets called
with (RemoteSubqueryState*) parameter. This commit adds explicit casts in
a few places to silence the warning noise.
An alternative fix might be to use (RemoteSubqueryState*), but that does
not quite work as ResponseCombiner needs to keep a pointer to either
ExecRemoteQuery or ExecRemoteSubplan. So the explicit cast seems better.
|
|
We were not dealing with the params in Subplan correctly, thus those params
were not sent to the remote nodes correctly during RemoteSubplan execution.
This patch fixes that by traversing the Subplan node correctly. The regression
failure in the 'join' test case is addressed too.
Patch by senhu (senhu@tencent.com)
|
|
Recent changes in PG 10 generate a nextval() expression (there was no support
for NextValExpr in ruleutils before that). But that fails on the datanode side
because only DEFAULT values are accepted for identity columns, unless
overridden. This patch restores the XL behaviour, thus helping the regression.
|
|
This is the merge-base of PostgreSQL's master branch and REL_10_STABLE branch.
This should be the last merge from PG's master branch into XL 10 branch.
Subsequent merges must happen from REL_10_STABLE branch
|
|
|
|
Stress testing by Andreas Seltenreich disclosed longstanding problems that
occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are
trying to execute a ROLLBACK of an already-failed transaction. In such a
case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction
would skip AbortTransaction and go straight to CleanupTransaction. This
led to an assert failure in an assert-enabled build (due to the ROLLBACK's
portal still having a cleanup hook) or without assertions, to a FATAL exit
complaining about "cannot drop active portal". The latter's not
disastrous, perhaps, but it's messy enough to want to improve it.
We don't really want to run all of AbortTransaction in this code path.
The minimum required to clean up the open portal safely is to do
AtAbort_Memory and AtAbort_Portals. It seems like a good idea to
do AtAbort_Memory unconditionally, to be entirely sure that we are
starting with a safe CurrentMemoryContext. That means that if the
main loop in AbortOutOfAnyTransaction does nothing, we need an extra
step at the bottom to restore CurrentMemoryContext = TopMemoryContext,
which I chose to do by invoking AtCleanup_Memory. This'll result in
calling AtCleanup_Memory twice in many of the paths through this function,
but that seems harmless and reasonably inexpensive.
The original motivation for the assertion in AtCleanup_Portals was that
we wanted to be sure that any user-defined code executed as a consequence
of the cleanup hook runs during AbortTransaction not CleanupTransaction.
That still seems like a valid concern, and now that we've seen one case
of the assertion firing --- which means that exactly that would have
happened in a production build --- let's replace the Assert with a runtime
check. If we see the cleanup hook still set, we'll emit a WARNING and
just drop the hook unexecuted.
This has been like this a long time, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu
|
|
Author: Masahiko Sawada <sawada.mshk@gmail.com>
|
|
The sole useful effect of this function, to check that no catcache
entries have positive refcounts at transaction end, has really been
obsolete since we introduced ResourceOwners in PG 8.1. We reduced the
checks to assertions years ago, so that the function was a complete
no-op in production builds. There have been previous discussions about
removing it entirely, but consensus up to now was that it had some small
value as a cross-check for bugs in the ResourceOwner logic.
However, it now emerges that it's possible to trigger these assertions
if you hit an assert-enabled backend with SIGTERM during a call to
SearchCatCacheList, because that function temporarily increases the
refcounts of entries it's intending to add to a catcache list construct.
In a normal ERROR scenario, the extra refcounts are cleaned up by
SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a
transaction abort and exit without ever executing PG_CATCH handlers.
There's a case to be made that this is a generic hazard and we should
consider restructuring elog(FATAL) handling so that pending PG_CATCH
handlers do get run. That's pretty scary though: it could easily create
more problems than it solves. Preliminary stress testing by Andreas
Seltenreich suggests that there are not many live problems of this ilk,
so we rejected that idea.
There are more-localized ways to fix the problem; the most principled
one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY.
But adding cycles to SearchCatCacheList isn't very appealing. We could
also weaken the assertions in AtEOXact_CatCache in some more or less
ad-hoc way, but that just makes its raison d'etre even less compelling.
In the end, the most reasonable solution seems to be to just remove
AtEOXact_CatCache altogether, on the grounds that it's not worth trying
to fix it. It hasn't found any bugs for us in many years.
Per report from Jeevan Chalke. Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
|
|
Reported by Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com
|
|
This affects mostly code comments, some documentation, and tests.
Official APIs already used "standby".
|