path: root/src/backend
Age | Commit message | Author
2018-10-12Use sufficiently large buffer in SharedQueueWriteTomas Vondra
The sq_key alone may be up to 64 bytes, so we need more than that. We could use dynamic memory instead, but 128 bytes should be enough both for the sq_key and the other pieces.
2018-08-03Use correct path for tablespaces while creating a basebackupPavan Deolasee
In XL, we embed the nodename in the tablespace subdir name to ensure that non-conflicting paths are created when multiple coordinators/datanodes are running on the same server. The code to handle tablespace mapping in basebackup was missing this support. Per report and patch by Wanglin.
2018-07-31Ensure partition child tables inherit distribution properties correctlyPavan Deolasee
In restore mode, which we use to load the schema when a new node is added to the cluster, partition child tables should correctly inherit the distribution properties from the parent table. This support was lacking, leading to incorrect handling of such tables. Per report by Virendra Kumar.
2018-07-27Teach pgxc_exec_sizefunc() to use pg_my_temp_schema() to get temp schemaPavan Deolasee
Similar to what we did in e688c0c23c962d425b82fdfad014bace4207af1d, we must not rely on the temporary namespace on the coordinator since it may change on the remote nodes. Instead we use the pg_my_temp_schema() function to find the currently active temporary schema on the remote node.
2018-07-27Fix handling of REFRESH MATERIALIZED VIEW CONCURRENTLYPavan Deolasee
We create a coordinator-only LOCAL temporary table for REFRESH MATERIALIZED VIEW CONCURRENTLY. Since this table does not exist on the remote nodes, we must not use an explicit "ANALYZE <temptable>". Instead, just analyze it locally, as we already do elsewhere. Restore the matview test case to use REFRESH MATERIALIZED VIEW CONCURRENTLY now that the underlying bug is fixed.
2018-07-27Fix a compiler warning introduced in the previous commitPavan Deolasee
2018-07-27Ensure that typename is schema qualified while sending row descriptionPavan Deolasee
A row description message contains the type information for the attributes in the result. But if the type does not exist in the search_path, the coordinator fails to parse the typename back to the type. So the datanode must send the schema name along with the type name. Per report and test case by Hengbing Wang @ Microfun. Added a new test file and a few test cases to cover this area.
2018-07-27Ensure pooler process follows consistent model for SIGQUIT handlingPavan Deolasee
We'd occasionally seen that the pooler process fails to respond to SIGQUIT and gets stuck in a non-recoverable state. Code inspection reveals that we're not following the model used by the rest of the background worker processes in handling SIGQUIT. So get that fixed, with the hope that this will fix the problem case.
2018-07-27Properly quote typename before calling parseTypeStringPavan Deolasee
Without this, parseTypeString() might throw an error or resolve to a wrong type in case the type name requires quoting. Per report by Hengbing Wang
2018-05-21Remove some accidentally added elog(LOG) messagesPavan Deolasee
2018-05-21Fix broken implementation of recovery to barrier.Pavan Deolasee
Per report from Hengbing, the current implementation of PITR recovery to a BARRIER failed to correctly stop at the given recovery_target_barrier. It seems there are two bugs here. 1) we failed to write the XLOG record correctly and 2) we also failed to mark the end-of-recovery upon seeing the XLOG record during the recovery. Fix both these problems and also fix pg_xlogdump in passing to ensure we can dump the BARRIER XLOG records correctly.
2018-05-21Fix a long standing bug in vacuum/analyze of temp tablesPavan Deolasee
The system may, and very likely will, choose a different namespace for temporary tables on different nodes. So it was erroneous to explicitly add the coordinator-side namespace to the queries constructed for fetching stats from the remote nodes. A regression test had been failing non-deterministically for this reason for a long time, but only now could we fully understand the problem and fix it. We now use pg_my_temp_schema() to derive the current temporary schema used by the remote node instead of hardcoding that in the query using coordinator-side information.
2018-05-18Fix post-cherry-pick problems.Pavan Deolasee
2018-05-18Track clearly whether to run a remote transaction in autocommit or a blockPavan Deolasee
Chi Gao and Hengbing Wang reported certain issues around transaction handling and demonstrated via xlogdump how certain transactions were getting marked committed/aborted repeatedly on a datanode. When an already committed transaction is attempted to be aborted again, it results in a PANIC. Upon investigation, this uncovered a very serious yet long-standing bug in transaction handling. If the client is running in autocommit mode, we try to avoid starting a transaction block on the datanode side if only one datanode is going to be involved in the transaction. This is an optimisation to speed up short queries touching only a single node. But when the query rewriter transforms a single statement into multiple statements, we would still (and incorrectly) run each statement in autocommit mode on the datanode. This can cause inconsistencies when one statement commits but the next statement aborts. And it may also lead to PANIC situations if we continue to use the same global transaction identifier for the statements. This can also happen when the user invokes a user-defined function. If the function has multiple statements, each statement will run in autocommit mode if it's FQSed, thus again creating inconsistency if a following statement in the function fails. We now have a more elaborate mechanism to tackle autocommit and transaction block needs. The special casing for force_autocommit is now removed, thus making it more predictable. We also have specific conditions to check to ensure that we don't mix up autocommit and transaction block for the same global xid. Finally, if a query rewriter transforms a single statement into multiple statements, we run those statements in a transaction block. Together these changes should help us fix the problems.
2018-05-07Do not try to show targetlist of a RemoteSubplan on top of ModifyTablePavan Deolasee
We do some special processing for RemoteSubplan with returning lists. But the EXPLAIN plan mechanism is not adequately trained to handle that special crafting. So for now do not try to print the target list in the EXPLAIN output.
2018-04-17Do not send the new protocol message to non-XL client.Pavan Deolasee
The new message 'W' to report waited-for XIDs must not be sent to a non-XL client, since it's not capable of handling that and it might just cause unpleasant problems. In fact, we should change 'W' to something else, since standard libpq understands that message and hangs forever expecting more data. With a new protocol message it would have failed instead, providing a more user-friendly error. But we're postponing that for now, since we should think through the implications of a protocol change carefully before doing that.
2017-11-07Fix bug in release_connection() introduced by d9f45c9018Tomas Vondra
d9f45c9018ec3ec1fc11e4be2be7f9728a1799b1 attempted to refactor release_connection() to make it more readable, but unfortunately inverted the force_destroy check, causing regression failures. In hindsight, the refactoring was rather arbitrary and not really helping with the readability, so just revert to the original code (but keep the comments, explaining what's happening).
2017-11-04Move several functions from pgxcnode.c to poolmgr.cTomas Vondra
A number of functions were defined in pgxcnode.c/pgxcnode.h, but only ever used in poolmgr.c. Those are:

- PGXCNodeConnect - open libpq connection using conn. string
- PGXCNodePing - ping node using connection string
- PGXCNodeClose - close libpq connection
- PGXCNodeConnected - verify connection status
- PGXCNodeConnStr - build connection string

So move them to poolmgr.c and make them static, so that poolmgr is the only part dealing with libpq connections directly.
2017-11-04Comments and cleanup in the connection pool managerTomas Vondra
Similarly to a39b06b0c6, this does minor cleanup in the pool manager code by removing unused functions and adding a lot of comments, both at the file level (explaining the concepts and basic API methods) and for individual functions.
2017-10-19Collect index statistics during ANALYZE on coordinatorTomas Vondra
ANALYZE was not collecting index statistics, which may have negative impact for example on selectivity estimates for expressions. This also fixes some incorrect plan changes in updatable_views regression test. Discussion: <c822a7ff-7c53-ebaf-6f34-03132cd27621@2ndquadrant.com>
2017-10-19Fix handling of root->distribution during redistributionTomas Vondra
This fixes some remaining bugs in handling root->distribution, caused by the upper-planner pathification (in PostgreSQL 9.6). Prior to the pathification (so in PostgreSQL 9.5 and Postgres-XL 9.5), the root->distribution was used for two purposes: * To track distribution expected by ModifyTable (UPDATE,DELETE), so that grouping_planner() knew how to redistribute the data. * To communicate the resulting distribution from grouping_planner() back to standard_planner(). This worked fine in 9.5 as grouping_planner() was only dealing with a single remaining path (plan) when considering the redistribution, and so it was OK to tweak root->distribution. But since the pathification in 9.6 that is no longer true. There is no obvious reason why all the paths would have to share the same distribution, and we don't know which one will be the cheapest one. So from now on root->distribution is used to track the distribution expected by ModifyTable. Distribution for each path is available in path->distribution if needed. Note: We still use subroot->distribution to pass information about distribution of subqueries, though. But we only set it after the one cheapest path is selected.
2017-10-19Remove coordinator quals, evaluated at Remote SubqueryTomas Vondra
While rewriting UPDATE/DELETE commands in rewriteTargetListUD, we've been pulling all Vars from quals and adding them to target lists. As multiple Vars may reference the same column, this sometimes produced plans with duplicate targetlist entries like this one:

Update on public.t111
  ->  Index Scan using t1_a_idx on public.t1
        Output: 100, t1.b, t1.c, t1.a, t1.a, t1.a, t1.a, t1.a, t1.a, t1.a, t1.a, t1.ctid
  ->  ...

Getting rid of the duplicate entries would be simple - before adding an entry for each Var, check that a matching entry does not exist yet. The question however is whether we actually need any of this. The comment in rewriteTargetListUD() claims we need to add the Vars because of "coordinator quals" - which is not really defined anywhere, but it probably means quals evaluated at the Remote Subquery node. But we push all quals to the remote node, so there should not be any cases where a qual would have to be evaluated locally (or where that would be preferable). So just remove all the relevant code from rewriteHandler.c, which means we produce this plan instead:

Update on public.t111
  ->  Index Scan using t1_a_idx on public.t1
        Output: 100, t1.b, t1.c, t1.ctid
  ->  ...

This affects a number of plans in regression tests, but the changes seem fine - we simply remove unnecessary target list entries. I've also added an assert to EXPLAIN enforcing the "no quals" rule for Remote Subquery nodes. Discussion: <95e80368-1549-a921-c5e2-7e0ad9485bd3@2ndquadrant.com>
2017-10-14Remember queryId for queries executed using FQSTomas Vondra
pgxc_FQS_planner() was not copying queryId, so extensions relying on it did not work properly. For example the pg_stat_statements extension was ignoring queries executed using FQS entirely. Backpatch to Postgres-XL 9.5.
2017-10-05Disable FQS for cursors defined with SCROLLTomas Vondra
When checking if a query is eligible for FQS (fast-query shipping), disable the optimization for queries in SCROLL cursors, as FQS does not support backward scans. Discussion: <e66932f3-3c35-cab0-af7e-60e8dfa423ba@2ndquadrant.com>
2017-09-20Improve shared queue synchronization furtherPavan Deolasee
Our efforts to improve shared queue synchronization continue. We now have a per-queue producer lwlock that must be held for synchronization between consumers and the producer. Consumers must hold this lock before setting the producer latch to ensure the producer does not miss any signals and does not go into unnecessary waits. We still can't get rid of all the timeouts; in particular, we see that sometimes a producer finishes and tries to unbind from the queue even before a consumer gets a chance to connect to the queue. We left the 10s wait to allow consumers to connect. There is still a net improvement, because when the consumer is not going to connect, it tells the producer and we avoid the 10s timeout we used to see earlier.
2017-09-20Enable Hot Standby on the replicasPavan Deolasee
We had an issue with tracking knownXids on the standby: it was overflowing the allocated array in shared memory. It turned out that the primary reason for this is that the GTM leaves behind a hole in XID allocation when it's restarted. The standby, oblivious to this, complained about the array overflow and died. We now fix this by allocating an array which can hold CONTROL_INTERVAL worth of additional XIDs. This would mostly be a waste because the XIDs are never allocated. But this seems like a quick fix to further test Hot Standby. The good thing is that we might just waste memory, but the larger array should not impact performance, since we only loop over numKnownXids, which will be more accurate. With this change, also fix the defaults for datanode and coordinator standbys and make them Hot Standbys. The wal_level is changed too.
2017-09-19Handle Aggref->aggargtypes in out/readfuncs.cTomas Vondra
When communicating with other nodes, we send names of objects instead of OIDs as those are assigned on each node independently. We failed to do this for Aggref->aggargtypes, which worked fine for built-in data types (those have the same OID on all nodes), but resulted in failures for custom data types (like for example FIXEDDECIMAL). ERROR: cache lookup failed for type 16731 This fixes it by implementing READ/WRITE_TYPID_LIST_FIELD, similarly to what we had for RELID. Note: Turns out the WRITE_RELID_LIST_FIELD was broken, but apparently we never call it in XL as it's only used for arbiterIndexes field. So fix that too, in case we enable the feature in the future.
2017-09-18Ensure that we don't read rule definition with portable input onPavan Deolasee
Rules are converted into their string representation and stored in the catalog. While building a relation descriptor, this information is read back and converted into a Node representation. Since relation descriptors can be built while we are reading plan information sent by the remote server in a stringified representation, trying to read the rules with portable input on may lead to unpleasant behaviour. So we must first reset portable input and restore it after reading the rules. The same applies to RLS policies (even though we don't have a test showing the impact, it looks like a sane thing to fix anyway).
2017-09-15Fix incorrect planning of grouping setsTomas Vondra
Commit 04f96689945462a4212047f03eb3281fb56bcf2f incorrectly allowed distributed grouping paths for grouping sets, causing failures in 'groupingsets' regression test suite. So fix that by making sure try_distributed_aggregation=false for plans with grouping sets.
2017-09-12Ensure that database objects are created consistently.Pavan Deolasee
We now create views/materialised views on all nodes, unless they are temporary objects, in which case they are created only on the local coordinator and the datanodes. Similarly, temporary sequences are created on the local coordinator and the datanodes. This solves many outstanding problems in the regression results where remote nodes used to fail because of a non-existent type for a view or similar such issues. A few other test cases now started to work correctly and produce output matching upstream PG, so the expected output for those test cases has been appropriately fixed. A couple of sequences in the rangefuncs test case have been converted into permanent sequences because the subsequent SQL functions refer to them and hence fail if they do not exist on the remote coordinators. The problem with the special RULE converting a regular table into a view goes away with the fix, since DROP VIEW commands are now propagated to the datanodes too.
2017-09-11Further refactoring of utility.c codePavan Deolasee
Further simplification and consolidation of the code.
2017-09-08Rearrange switch cases so that they are grouped together when possiblePavan Deolasee
2017-09-08Refactor changes in the utility.cPavan Deolasee
2017-08-30Disable logical decoding as unsupportedTomas Vondra
Commit 665c224a6b2afa disabled CREATE PUBLICATION/SUBSCRIPTION, but it was still possible to create a logical replication slot and call pg_logical_slot_get_changes() on it. That would however crash and burn as ReorderBufferCommit() relies on subtransactions, and BeginInternalSubTransaction() is not expected to fail, leading to segfaults in the PG_CATCH block. Simply disallowing creating logical slots (and whatever else relies on CheckLogicalDecodingRequirements) seems like the best fix.
2017-08-30Fetch the target remote nodes to run CREATE STATISTICS commandPavan Deolasee
Some database objects are created only on a subset of nodes. For example, views are created only on the coordinators. Similarly, temp tables are created on the local coordinator and all datanodes. So we must consult the relation kind before executing the CREATE STATISTICS command on the remote nodes. Otherwise we might try to execute it on a node where the underlying object is missing, resulting in errors. Patch by senhu (senhu@tencent.com) which was later reworked by me.
2017-08-28Do not add any distribution to a dummy append nodePavan Deolasee
A dummy append node with no subpaths doesn't need any adjustment for distribution. This allows us to correctly handle UPDATE/DELETE in some cases which were failing earlier.
2017-08-22Handle rescan of RemoteQuery node correctlyPavan Deolasee
We never had this support and we never felt the need, because the use of FQS was limited to utility statements and simple queries which can be completely pushed down to the remote node. But in PG 10, we're seeing errors while using cursors for queries which are FQSed. So instead of forcing a regular remote subplan on such queries, we are adding support for rescan of the RemoteQuery node. Patch by Senhu <senhu@tencent.com>
2017-08-22Do not FQS NextValueExprPavan Deolasee
The target datanode must be determined after computing the next value. So let it go through regular planning. This fixes a couple of regression failures.
2017-08-21Make sure coordinator_lxid is formatted as %u and not %dTomas Vondra
Since coordinator_lxid is uint32, make sure we use %u to format it (e.g. when sending it to remote nodes as a string) and not just %d.
2017-08-21Define coordinator_lxid GUC as unsigned integerTomas Vondra
The coordinator_lxid GUC is internally stored as uint32, but was defined as a plain int32, triggering a compiler warning. It's also unclear what would happen for transaction IDs outside the signed range (possibly some strange issues). This adds a new GUC type (UInt), used only for this one GUC. The patch is fairly large, but most of it is boilerplate infrastructure to support the new GUC type. We considered simpler workarounds (e.g. treating the GUC as a string and converting it to/from uint32 using the GUC hooks), but this seems much cleaner and tidier.
2017-08-21Make sure ExecRemoteQuery is called with (PlanState *) parameterTomas Vondra
gcc 6.4.1 complains when ExecRemoteQuery(PlanState *) gets called with a (RemoteSubqueryState *) parameter. This commit adds explicit casts in a few places to silence the warning noise. An alternative fix might be to declare the parameter as (RemoteSubqueryState *), but that does not quite work as ResponseCombiner needs to keep a pointer to either ExecRemoteQuery or ExecRemoteSubplan. So the explicit cast seems better.
2017-08-21Handle params correctly within Subplan nodesPavan Deolasee
We were not dealing with the params in Subplan correctly, so those params were not sent to the remote nodes correctly during RemoteSubplan execution. This patch fixes that by traversing the Subplan node correctly. The regression failure in the 'join' test case is addressed too. Patch by senhu (senhu@tencent.com)
2017-08-18Generate a DEFAULT clause for identity columnsPavan Deolasee
Recent changes in PG 10 generate a nextval() expression (there was no support for NextValueExpr in ruleutils before that). But that fails on the datanode side because only DEFAULT values are accepted for identity columns, unless overridden. This patch restores the XL behaviour, thus helping the regression tests.
2017-08-18Merge commit '21d304dfedb4f26d0d6587d9ac39b1b5c499bb55'Pavan Deolasee
This is the merge-base of PostgreSQL's master branch and REL_10_STABLE branch. This should be the last merge from PG's master branch into XL 10 branch. Subsequent merges must happen from REL_10_STABLE branch
2017-08-14Final pgindent + perltidy run for v10.Tom Lane
2017-08-14Handle elog(FATAL) during ROLLBACK more robustly.Tom Lane
Stress testing by Andreas Seltenreich disclosed longstanding problems that occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are trying to execute a ROLLBACK of an already-failed transaction. In such a case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction would skip AbortTransaction and go straight to CleanupTransaction. This led to an assert failure in an assert-enabled build (due to the ROLLBACK's portal still having a cleanup hook) or without assertions, to a FATAL exit complaining about "cannot drop active portal". The latter's not disastrous, perhaps, but it's messy enough to want to improve it. We don't really want to run all of AbortTransaction in this code path. The minimum required to clean up the open portal safely is to do AtAbort_Memory and AtAbort_Portals. It seems like a good idea to do AtAbort_Memory unconditionally, to be entirely sure that we are starting with a safe CurrentMemoryContext. That means that if the main loop in AbortOutOfAnyTransaction does nothing, we need an extra step at the bottom to restore CurrentMemoryContext = TopMemoryContext, which I chose to do by invoking AtCleanup_Memory. This'll result in calling AtCleanup_Memory twice in many of the paths through this function, but that seems harmless and reasonably inexpensive. The original motivation for the assertion in AtCleanup_Portals was that we wanted to be sure that any user-defined code executed as a consequence of the cleanup hook runs during AbortTransaction not CleanupTransaction. That still seems like a valid concern, and now that we've seen one case of the assertion firing --- which means that exactly that would have happened in a production build --- let's replace the Assert with a runtime check. If we see the cleanup hook still set, we'll emit a WARNING and just drop the hook unexecuted. This has been like this a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu
2017-08-14Fix typoPeter Eisentraut
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-08-13Remove AtEOXact_CatCache().Tom Lane
The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. 
Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
2017-08-13Reword comment for clarityAlvaro Herrera
Reported by Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com
2017-08-11Remove uses of "slave" in replication contextsPeter Eisentraut
This affects mostly code comments, some documentation, and tests. Official APIs already used "standby".