postgres-xl.git - Official repo for Postgres-XL. Stable branch is XL9_5_STABLE. Current development is PG10 compatible. Controlled by Postgres-X2 Core Team.

Age	Commit message	Author
2018-10-12	Use sufficiently large buffer in SharedQueueWrite	Tomas Vondra
	The sq_key alone may be up to 64 bytes, so we need more than that. We could use dynamic memory instead, but 128 bytes should be enough both for the sq_key and the other pieces.
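A minimal sketch of why the buffer size matters, assuming the buffer holds a formatted label made of the queue key plus a few extra pieces. The label format, the function name and both constants are hypothetical and are not taken from SharedQueueWrite itself.

```c
#include <assert.h>
#include <stdio.h>

#define SQ_KEY_MAX     64   /* assumed upper bound on the queue key length */
#define LABEL_BUF_SIZE 128  /* room for the key plus the other pieces */

/*
 * Format "<sq_key> consumer <n>" into buf (hypothetical label layout).
 * Returns 1 if the whole label fit, 0 if snprintf had to truncate it.
 */
int
format_queue_label(char *buf, size_t buflen, const char *sq_key, int consumer)
{
    int needed;

    assert(buf != NULL && sq_key != NULL && buflen > 0);
    needed = snprintf(buf, buflen, "%s consumer %d", sq_key, consumer);
    return needed >= 0 && (size_t) needed < buflen;
}
```

With a key close to the assumed 64-byte limit, a 64-byte buffer cannot hold the key and the suffix together, while a 128-byte buffer can.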
2018-08-03	Use correct path for tablespaces while creating a basebackup	Pavan Deolasee
	In XL, we embed the nodename in the tablespace subdir name to ensure that non-conflicting paths are created when multiple coordinators/datanodes are running on the same server. The code to handle tablespace mapping in basebackup was missing this support. Per report and patch by Wanglin.
2018-07-31	Ensure partition child tables inherit distribution properties correctly	Pavan Deolasee
	In restore mode, which we use to load the schema when a new node is added to the cluster, partition child tables should correctly inherit the distribution properties from the parent table. This support was lacking, leading to incorrect handling of such tables. Per report by Virendra Kumar.
2018-07-27	Teach pgxc_exec_sizefunc() to use pg_my_temp_schema() to get temp schema	Pavan Deolasee
	Similar to what we did in e688c0c23c962d425b82fdfad014bace4207af1d, we must not rely on the temporary namespace on the coordinator since it may change on the remote nodes. Instead we use the pg_my_temp_schema() function to find the currently active temporary schema on the remote node.
2018-07-27	Fix handling of REFRESH MATERIALIZED VIEW CONCURRENTLY	Pavan Deolasee
	We create a coordinator-only LOCAL temporary table for REFRESH MATERIALIZED VIEW CONCURRENTLY. Since this table does not exist on the remote nodes, we must not use explicit "ANALYZE <temptable>". Instead, just analyze it locally like we were doing at other places. Restore the matview test case to use REFRESH MATERIALIZED VIEW CONCURRENTLY now that the underlying bug is fixed.
2018-07-27	Fix a compiler warning introduced in the previous commit	Pavan Deolasee

2018-07-27	Ensure that typename is schema qualified while sending row description	Pavan Deolasee
	A row description message contains the type information for the attributes in the row. But if the type does not exist in the search_path, the coordinator fails to parse the typename back to the type. So the datanode must send the schema name along with the type name. Per report and test case by Hengbing Wang @ Microfun. Added a new test file and a few test cases to cover this area.
2018-07-27	Ensure pooler process follows consistent model for SIGQUIT handling	Pavan Deolasee
	We have occasionally seen the pooler process fail to respond to SIGQUIT and get stuck in a non-recoverable state. Code inspection reveals that we're not following the model followed by the rest of the background worker processes in handling SIGQUIT. So get that fixed, in the hope that this will fix the problem case.
2018-07-27	Properly quote typename before calling parseTypeString	Pavan Deolasee
	Without this, parseTypeString() might throw an error or resolve to a wrong type in case the type name requires quoting. Per report by Hengbing Wang.
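As an illustration of the quoting step, here is a simplified sketch that always wraps a name in double quotes and doubles any embedded double quote. It is not the server's own quoting routine (which quotes only when needed); the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write ident into out as a double-quoted SQL identifier.
 * Returns 1 on success, 0 if out might be too small.
 */
int
quote_ident_sketch(const char *ident, char *out, size_t outlen)
{
    size_t pos = 0;

    assert(ident != NULL && out != NULL);

    /* Worst case: every character doubled, plus two quotes and the NUL. */
    if (outlen < 2 * strlen(ident) + 3)
        return 0;

    out[pos++] = '"';
    for (const char *p = ident; *p != '\0'; p++)
    {
        if (*p == '"')
            out[pos++] = '"';   /* escape an embedded quote by doubling it */
        out[pos++] = *p;
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return 1;
}
```

A type named `My Type` would otherwise be parsed as two separate tokens; quoted, it round-trips as a single name.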
2018-05-21	Remove some accidentally added elog(LOG) messages	Pavan Deolasee

2018-05-21	Fix broken implementation of recovery to barrier.	Pavan Deolasee
	Per report from Hengbing, the current implementation of PITR recovery to a BARRIER failed to correctly stop at the given recovery_target_barrier. It seems there are two bugs here. 1) we failed to write the XLOG record correctly and 2) we also failed to mark the end-of-recovery upon seeing the XLOG record during the recovery. Fix both these problems and also fix pg_xlogdump in passing to ensure we can dump the BARRIER XLOG records correctly.
2018-05-21	Fix a long standing bug in vacuum/analyze of temp tables	Pavan Deolasee
	The system may, and very likely will, choose a different namespace for temporary tables on different nodes. So it was erroneous to explicitly add the coordinator-side namespace to the queries constructed for fetching stats from the remote nodes. A regression test had been failing non-deterministically for this reason for a long time, but only now could we fully understand the problem and fix it. We now use pg_my_temp_schema() to derive the current temporary schema used by the remote node instead of hardcoding that in the query using coordinator-side information.
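A sketch of the idea, assuming the stats query is assembled as a string on the coordinator: the namespace is left for the remote node to resolve through pg_my_temp_schema() instead of being spelled out. The function name and the exact query text are hypothetical and are not the queries XL actually builds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a query that fetches a temp table's page count on whichever node
 * executes it. Returns 1 on success, 0 if the name needs escaping (this
 * sketch does not escape) or the buffer is too small.
 */
int
build_temp_relpages_query(char *buf, size_t buflen, const char *relname)
{
    int needed;

    assert(buf != NULL && relname != NULL && buflen > 0);

    /* Refuse names that would break out of the string literal. */
    if (strchr(relname, '\'') != NULL)
        return 0;

    needed = snprintf(buf, buflen,
                      "SELECT c.relpages FROM pg_catalog.pg_class c "
                      "WHERE c.relname = '%s' "
                      "AND c.relnamespace = pg_catalog.pg_my_temp_schema()",
                      relname);
    return needed >= 0 && (size_t) needed < buflen;
}
```

Because no `pg_temp_N` name appears in the text, the same query is valid on every node regardless of which temporary schema that node picked.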
2018-05-18	Fix post-cherry-pick problems.	Pavan Deolasee

2018-05-18	Track clearly whether to run a remote transaction in autocommit or a block	Pavan Deolasee
	Chi Gao and Hengbing Wang reported certain issues around transaction handling and demonstrated via xlogdump how certain transactions were getting marked committed/aborted repeatedly on a datanode. When an already committed transaction is attempted to be aborted again, it results in a PANIC. Upon investigation, this uncovered a very serious yet long-standing bug in transaction handling.
	If the client is running in autocommit mode, we try to avoid starting a transaction block on the datanode side if only one datanode is going to be involved in the transaction. This is an optimisation to speed up short queries touching only a single node. But when the query rewriter transforms a single statement into multiple statements, we would still (and incorrectly) run each statement in autocommit mode on the datanode. This can cause inconsistencies when one statement commits but the next statement aborts. It may also lead to the PANIC situations if we continue to use the same global transaction identifier for the statements.
	This can also happen when the user invokes a user-defined function. If the function has multiple statements, each statement will run in autocommit mode if it's FQSed, thus again creating inconsistency if a following statement in the function fails.
	We now have a more elaborate mechanism to tackle autocommit and transaction block needs. The special casing for force_autocommit is now removed, thus making it more predictable. We also have specific conditions to check to ensure that we don't mix up autocommit and transaction block for the same global xid. Finally, if a query rewriter transforms a single statement into multiple statements, we run those statements in a transaction block. Together these changes should help us fix the problems.
2018-05-07	Do not try to show targetlist of a RemoteSubplan on top of ModifyTable	Pavan Deolasee
	We do some special processing for RemoteSubplan with returning lists. But the EXPLAIN plan mechanism is not adequately trained to handle that special crafting. So for now do not try to print the target list in the EXPLAIN output.
2018-04-17	Do not send the new protocol message to non-XL client.	Pavan Deolasee
	The new message 'W' to report waited-for XIDs must not be sent to a non-XL client since it's not capable of handling that and might just cause unpleasant problems. In fact, we should change 'W' to something else since standard libpq understands that message and hangs forever expecting more data. With a new protocol message, it would have failed, thus providing a more user-friendly error. But postponing that for now since we should think through the implications of a protocol change carefully before doing that.
2017-11-07	Fix bug in release_connection() introduced by d9f45c9018	Tomas Vondra
	d9f45c9018ec3ec1fc11e4be2be7f9728a1799b1 attempted to refactor release_connection() to make it more readable, but unfortunately inverted the force_destroy check, causing regression failures. In hindsight, the refactoring was rather arbitrary and not really helping with the readability, so just revert to the original code (but keep the comments, explaining what's happening).
2017-11-04	Move several functions from pgxcnode.c to poolmgr.c	Tomas Vondra
	A number of functions were defined in pgxcnode.c/pgxcnode.h, but only ever used in poolmgr.c. Those are:
	- PGXCNodeConnect - open libpq connection using conn. string
	- PGXCNodePing - ping node using connection string
	- PGXCNodeClose - close libpq connection
	- PGXCNodeConnected - verify connection status
	- PGXCNodeConnStr - build connection string
	So move them to poolmgr.c and make them static, so that poolmgr is the only part dealing with libpq connections directly.
2017-11-04	Comments and cleanup in the connection pool manager	Tomas Vondra
	Similarly to a39b06b0c6, this does minor cleanup in the pool manager code by removing unused functions and adding a lot of comments, both at the file level (explaining the concepts and basic API methods) and for individual functions.
2017-10-19	Collect index statistics during ANALYZE on coordinator	Tomas Vondra
	ANALYZE was not collecting index statistics, which may have negative impact for example on selectivity estimates for expressions. This also fixes some incorrect plan changes in updatable_views regression test. Discussion: <c822a7ff-7c53-ebaf-6f34-03132cd27621@2ndquadrant.com>
2017-10-19	Fix handling of root->distribution during redistribution	Tomas Vondra
	This fixes some remaining bugs in handling root->distribution, caused by the upper-planner pathification (in PostgreSQL 9.6). Prior to the pathification (so in PostgreSQL 9.5 and Postgres-XL 9.5), the root->distribution was used for two purposes:
	* To track distribution expected by ModifyTable (UPDATE,DELETE), so that grouping_planner() knew how to redistribute the data.
	* To communicate the resulting distribution from grouping_planner() back to standard_planner().
	This worked fine in 9.5 as grouping_planner() was only dealing with a single remaining path (plan) when considering the redistribution, and so it was OK to tweak root->distribution. But since the pathification in 9.6 that is no longer true. There is no obvious reason why all the paths would have to share the same distribution, and we don't know which one will be the cheapest one.
	So from now on root->distribution is used to track the distribution expected by ModifyTable. Distribution for each path is available in path->distribution if needed.
	Note: We still use subroot->distribution to pass information about distribution of subqueries, though. But we only set it after the one cheapest path is selected.
2017-10-19	Remove coordinator quals, evaluated at Remote Subquery	Tomas Vondra
	While rewriting UPDATE/DELETE commands in rewriteTargetListUD, we've been pulling all Vars from quals, and adding them to target lists. As multiple Vars may reference the same column, this sometimes produced plans with duplicate targetlist entries like this one:
	    Update on public.t111
	      -> Index Scan using t1_a_idx on public.t1
	           Output: 100, t1.b, t1.c, t1.a, t1.a, t1.a, t1.a, t1.a, t1.a, t1.a, t1.a, t1.ctid
	      -> ...
	Getting rid of the duplicate entries would be simple - before adding an entry for each Var, check that a matching entry does not exist yet. The question however is if we actually need any of this. The comment in rewriteTargetListUD() claims we need to add the Vars because of "coordinator quals" - which is not really defined anywhere, but it probably means quals evaluated at the Remote Subquery node.
	But we push all quals to the remote node, so there should not be any cases where a qual would have to be evaluated locally (or where that would be preferable). So just remove all the relevant code from rewriteHandler.c, which means we produce this plan instead:
	    Update on public.t111
	      -> Index Scan using t1_a_idx on public.t1
	           Output: 100, t1.b, t1.c, t1.ctid
	      -> ...
	This affects a number of plans in regression tests, but the changes seem fine - we simply remove unnecessary target list entries. I've also added an assert to EXPLAIN enforcing the "no quals" rule for Remote Subquery nodes. Discussion: <95e80368-1549-a921-c5e2-7e0ad9485bd3@2ndquadrant.com>
2017-10-14	Remember queryId for queries executed using FQS	Tomas Vondra
	pgxc_FQS_planner() was not copying queryId, so extensions relying on it did not work properly. For example the pg_stat_statements extension was ignoring queries executed using FQS entirely. Backpatch to Postgres-XL 9.5.
2017-10-05	Disable FQS for cursors defined with SCROLL	Tomas Vondra
	When checking if a query is eligible for FQS (fast-query shipping), disable the optimization for queries in SCROLL cursors, as FQS does not support backward scans. Discussion: <e66932f3-3c35-cab0-af7e-60e8dfa423ba@2ndquadrant.com>
2017-09-20	Improve shared queue synchronization further	Pavan Deolasee
	Our efforts to improve shared queue synchronization continue. We now have a per-queue producer lwlock that must be held for synchronization between consumers and the producer. Consumers must hold this lock before setting the producer latch to ensure the producer does not miss any signals and does not go into unnecessary waits. We still can't get rid of all the timeouts; in particular, we see that sometimes a producer finishes and tries to unbind from the queue even before a consumer gets a chance to connect to the queue. We left the 10s wait to allow consumers to connect. There is still a net improvement because when the consumer is not going to connect, it tells the producer and we avoid the 10s timeout that we used to see earlier.
2017-09-20	Enable Hot Standby on the replicas	Pavan Deolasee
	We had an issue with tracking knownXids on the standby: it was overflowing the allocated array in shared memory. It turned out that the primary reason for this is that the GTM leaves behind a hole in XID allocation when it's restarted. The standby, oblivious to this, was complaining about array overflow and dying. We now fix this by allocating an array which can hold CONTROL_INTERVAL worth of additional XIDs. This would mostly be a waste because the XIDs are never allocated. But this seems like a quick fix to further test Hot Standby. The good thing is that we might just waste memory, but not have any impact on performance because of the larger array, since we only loop for numKnownXids, which will be more accurate. With this change, also fix the defaults for datanode and coordinator standbys and make them Hot Standbys. The wal_level is changed too.
2017-09-19	Handle Aggref->aggargtypes in out/readfuncs.c	Tomas Vondra
	When communicating with other nodes, we send names of objects instead of OIDs as those are assigned on each node independently. We failed to do this for Aggref->aggargtypes, which worked fine for built-in data types (those have the same OID on all nodes), but resulted in failures for custom data types (like for example FIXEDDECIMAL).
	    ERROR: cache lookup failed for type 16731
	This fixes it by implementing READ/WRITE_TYPID_LIST_FIELD, similarly to what we had for RELID.
	Note: Turns out the WRITE_RELID_LIST_FIELD was broken, but apparently we never call it in XL as it's only used for arbiterIndexes field. So fix that too, in case we enable the feature in the future.
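To illustrate why names rather than OIDs have to travel between nodes, here is a toy model with one small catalog per node. The struct, the helper names and the OID values for the custom type are hypothetical; only the OID 23 for int4 is the real built-in value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct TypeEntry
{
    const char  *name;
    unsigned int oid;
} TypeEntry;

/* Return the type name for oid in this node's catalog, or NULL if unknown. */
const char *
type_name_by_oid(const TypeEntry *catalog, size_t n, unsigned int oid)
{
    for (size_t i = 0; i < n; i++)
        if (catalog[i].oid == oid)
            return catalog[i].name;
    return NULL;
}

/* Return the OID for name in this node's catalog, or 0 if unknown. */
unsigned int
type_oid_by_name(const TypeEntry *catalog, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(catalog[i].name, name) == 0)
            return catalog[i].oid;
    return 0;
}

/* Translate a sender-side OID to the receiver-side OID via the type name. */
unsigned int
ship_type(const TypeEntry *sender, size_t ns,
          const TypeEntry *receiver, size_t nr, unsigned int sender_oid)
{
    const char *name = type_name_by_oid(sender, ns, sender_oid);

    return name != NULL ? type_oid_by_name(receiver, nr, name) : 0;
}
```

Sending the raw OID of a custom type fails on the receiver, which corresponds to the "cache lookup failed" error, while going through the name resolves to the receiver's own OID.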
2017-09-18	Ensure that we don't read rule definition with portable input on	Pavan Deolasee
	Rules are converted into their string representation and stored in the catalog. While building a relation descriptor, this information is read back and converted into a Node representation. Since relation descriptors could be built when we are reading plan information sent by the remote server in a stringified representation, trying to read the rules with portable input on may lead to unpleasant behaviour. So we must first reset portable input and restore it after reading the rules. The same applies to RLS policies (even though we don't have a test showing the impact, it looks like a sane thing to fix anyway).
2017-09-15	Fix incorrect planning of grouping sets	Tomas Vondra
	Commit 04f96689945462a4212047f03eb3281fb56bcf2f incorrectly allowed distributed grouping paths for grouping sets, causing failures in 'groupingsets' regression test suite. So fix that by making sure try_distributed_aggregation=false for plans with grouping sets.
2017-09-12	Ensure that database objects are created consistently.	Pavan Deolasee
	We now create views/materialised views on all nodes, unless they are temporary objects, in which case they are created only on the local coordinator and the datanodes. Similarly, temporary sequences are created on the local coordinator and the datanodes. This solves many outstanding problems in the regression results where remote nodes used to fail because of a non-existent type for a view or similar issues. A few other test cases now work correctly and produce output matching upstream PG, so the expected output for those test cases has been fixed accordingly. A couple of sequences in the rangefuncs test case have been converted into permanent sequences because the subsequent SQL functions refer to them and hence fail if they do not exist on the remote coordinators. The problem with the special RULE converting a regular table into a view goes away with the fix since DROP VIEW commands are now propagated to the datanodes too.
2017-09-11	Further refactoring of utility.c code	Pavan Deolasee
	Further simplification and consolidation of the code.
2017-09-08	Rearrange switch cases so that they are grouped together when possible	Pavan Deolasee

2017-09-08	Refactor changes in the utility.c	Pavan Deolasee

2017-08-30	Disable logical decoding as unsupported	Tomas Vondra
	Commit 665c224a6b2afa disabled CREATE PUBLICATION/SUBSCRIPTION, but it was still possible to create a logical replication slot and call pg_logical_slot_get_changes() on it. That would however crash and burn as ReorderBufferCommit() relies on subtransactions, and BeginInternalSubTransaction() is not expected to fail, leading to segfaults in the PG_CATCH block. Simply disallowing creating logical slots (and whatever else relies on CheckLogicalDecodingRequirements) seems like the best fix.
2017-08-30	Fetch the target remote nodes to run CREATE STATISTICS command	Pavan Deolasee
	Some database objects are created only on a subset of nodes. For example, views are created only on the coordinators. Similarly, temp tables are created on the local coordinator and all datanodes. So we must consult the relation kind before executing the CREATE STATISTICS command on the remote nodes. Otherwise we might try to execute it on a node where the underlying object is missing, resulting in errors. Patch by senhu (senhu@tencent.com) which was later reworked by me.
2017-08-28	Do not add any distribution to a dummy append node	Pavan Deolasee
	A dummy append node with no subpaths doesn't need any adjustment for distribution. This allows us to correctly handle UPDATE/DELETE in some cases which were failing earlier.
2017-08-22	Handle rescan of RemoteQuery node correctly	Pavan Deolasee
	We never had this support and we never felt the need because the use of FQS was limited to utility statements and simple queries which can be completely pushed down to the remote node. But in PG 10, we're seeing errors while using cursors for queries which are FQSed. So instead of forcing a regular remote subplan on such queries, we are adding support for rescan of the RemoteQuery node. Patch by Senhu <senhu@tencent.com>
2017-08-22	Do not FQS NextValueExpr	Pavan Deolasee
	The target datanode must be determined after computing the next value. So let it go through regular planning. This fixes a couple of regression failures.
2017-08-21	Make sure coordinator_lxid is formatted as %u and not %d	Tomas Vondra
	As coordinator_lxid is uint32, make sure we use %u to format it (e.g. when sending it to remote nodes as a string) and not just %d.
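A small sketch of the formatting rule, using a hypothetical helper name. With %u the full uint32 range is rendered as a non-negative decimal; with %d, values above 2^31 - 1 would typically be printed as negative numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Render a uint32 identifier as decimal text for sending to another node.
 * Returns 1 on success, 0 if buf is too small.
 */
int
format_lxid(char *buf, size_t buflen, uint32_t lxid)
{
    int needed;

    assert(buf != NULL && buflen > 0);

    /* %u with an unsigned argument covers the whole 0..4294967295 range. */
    needed = snprintf(buf, buflen, "%u", (unsigned int) lxid);
    return needed >= 0 && (size_t) needed < buflen;
}
```

Eleven bytes are enough for any value, since the largest uint32 has ten decimal digits.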
2017-08-21	Define coordinator_lxid GUC as unsigned integer	Tomas Vondra
	The coordinator_lxid GUC is internally stored as uint32, but was defined as plain int32, triggering a compiler warning. It's also unclear what would happen for transaction IDs outside the signed range (possibly some strange issues). This adds a new GUC type (UInt), used only for this one GUC. The patch is fairly large, but most of it is boilerplate infrastructure to support the new GUC type. We have considered simpler workarounds (e.g. treating the GUC as string and converting it to/from uint32 using the GUC hooks), but this seems much cleaner and tidier.
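For illustration, parsing an unsigned 32-bit setting from text might look like the sketch below. It is a standalone example with a hypothetical function name, not the GUC infrastructure added by the patch, and it accepts plain decimal digits only.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse text as a decimal uint32. Returns true and stores the value on
 * success; returns false for signs, blanks, trailing junk or overflow.
 */
bool
parse_uint32_setting(const char *text, uint32_t *result)
{
    char         *end;
    unsigned long value;

    assert(text != NULL && result != NULL);

    /* strtoul would accept a sign or leading blanks; reject both up front. */
    if (!isdigit((unsigned char) text[0]))
        return false;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno == ERANGE || *end != '\0' || value > UINT32_MAX)
        return false;

    *result = (uint32_t) value;
    return true;
}
```

The explicit sign check matters because strtoul negates the result for input such as "-1" instead of reporting an error.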
2017-08-21	Make sure ExecRemoteQuery is called with (PlanState *) parameter	Tomas Vondra
	gcc 6.4.1 is complaining when ExecRemoteQuery(PlanState *) gets called with a (RemoteSubqueryState *) parameter. This commit adds an explicit cast in a few places to silence the warnings. An alternative fix might be to use (RemoteSubqueryState *), but that does not quite work as ResponseCombiner needs to keep a pointer to either ExecRemoteQuery or ExecRemoteSubplan. So the explicit cast seems better.
2017-08-21	Handle params correctly within Subplan nodes	Pavan Deolasee
	We were not dealing with the params in Subplan correctly, thus those params were not sent to the remote nodes correctly during RemoteSubplan execution. This patch fixes that by traversing the Subplan node correctly. The regression failure in the 'join' test case is addressed too. Patch by senhu (senhu@tencent.com)
2017-08-18	Generate a DEFAULT clause for identity columns	Pavan Deolasee
	Recent changes in PG 10 generate a nextval() expression (there was no support for NextValueExpr in ruleutils before that). But that fails on the datanode side because only DEFAULT values are accepted for identity columns, unless overridden. This patch restores the XL behaviour, thus helping the regression.
2017-08-18	Merge commit '21d304dfedb4f26d0d6587d9ac39b1b5c499bb55'	Pavan Deolasee
	This is the merge-base of PostgreSQL's master branch and REL_10_STABLE branch. This should be the last merge from PG's master branch into the XL 10 branch. Subsequent merges must happen from the REL_10_STABLE branch.
2017-08-14	Final pgindent + perltidy run for v10.	Tom Lane

2017-08-14	Handle elog(FATAL) during ROLLBACK more robustly.	Tom Lane
	Stress testing by Andreas Seltenreich disclosed longstanding problems that occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are trying to execute a ROLLBACK of an already-failed transaction. In such a case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction would skip AbortTransaction and go straight to CleanupTransaction. This led to an assert failure in an assert-enabled build (due to the ROLLBACK's portal still having a cleanup hook) or without assertions, to a FATAL exit complaining about "cannot drop active portal". The latter's not disastrous, perhaps, but it's messy enough to want to improve it. We don't really want to run all of AbortTransaction in this code path. The minimum required to clean up the open portal safely is to do AtAbort_Memory and AtAbort_Portals. It seems like a good idea to do AtAbort_Memory unconditionally, to be entirely sure that we are starting with a safe CurrentMemoryContext. That means that if the main loop in AbortOutOfAnyTransaction does nothing, we need an extra step at the bottom to restore CurrentMemoryContext = TopMemoryContext, which I chose to do by invoking AtCleanup_Memory. This'll result in calling AtCleanup_Memory twice in many of the paths through this function, but that seems harmless and reasonably inexpensive. The original motivation for the assertion in AtCleanup_Portals was that we wanted to be sure that any user-defined code executed as a consequence of the cleanup hook runs during AbortTransaction not CleanupTransaction. That still seems like a valid concern, and now that we've seen one case of the assertion firing --- which means that exactly that would have happened in a production build --- let's replace the Assert with a runtime check. If we see the cleanup hook still set, we'll emit a WARNING and just drop the hook unexecuted. This has been like this a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu
2017-08-14	Fix typo	Peter Eisentraut
	Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-08-13	Remove AtEOXact_CatCache().	Tom Lane
	The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. 
	Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
2017-08-13	Reword comment for clarity	Alvaro Herrera
	Reported by Masahiko Sawada. Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com
2017-08-11	Remove uses of "slave" in replication contexts	Peter Eisentraut
	This affects mostly code comments, some documentation, and tests. Official APIs already used "standby".