|
The sq_key alone may be up to 64 bytes, so we need more than that.
We could use dynamic memory instead, but 128 bytes should be enough
both for the sq_key and the other pieces.
|
|
A row description message contains the type information for the attributes
(columns) in the result. But if the type does not exist in the search_path then the
coordinator fails to parse the typename back to the type. So the datanode must
send the schema name along with the type name.
Per report and test case by Hengbing Wang @ Microfun.
Added a new test file and a few test cases to cover this area.
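For illustration (schema, type, and table names here are hypothetical), the kind
of case this addresses looks like:
CREATE SCHEMA app;
CREATE TYPE app.mood AS ENUM ('happy', 'sad');
CREATE TABLE moods (id int, m app.mood);
-- With 'app' not in the coordinator's search_path, the row description for this
-- query must carry the schema-qualified type name, or the coordinator cannot
-- resolve it.
SELECT m FROM moods;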
|
|
We'd occasionally seen that the pooler process fails to respond to SIGQUIT and
gets stuck in a non-recoverable state. Code inspection reveals that we're not
following the model followed by the rest of the background worker processes in
handling SIGQUIT. So get that fixed, with the hope that this will fix the
problem case.
|
|
Without this, parseTypeString() might throw an error or resolve to the wrong type
when the type name requires quoting.
Per report by Hengbing Wang
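A hypothetical example of a type name that requires quoting:
CREATE TYPE "Colour Mix" AS ENUM ('red', 'blue');
-- Passing the bare name back through parseTypeString() would fail or resolve
-- to the wrong type; it must be quoted first.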
|
|
Per report from Hengbing, the current implementation of PITR recovery to a
BARRIER failed to correctly stop at the given recovery_target_barrier. It turned
out there were two bugs here: 1) we failed to write the XLOG record correctly,
and 2) we failed to mark the end-of-recovery upon seeing the XLOG record during
recovery.
Fix both these problems and also fix pg_xlogdump in passing to ensure we can
dump the BARRIER XLOG records correctly.
|
|
Chi Gao and Hengbing Wang reported certain issues around transaction handling
and demonstrated via xlogdump how certain transactions were getting marked
committed/aborted repeatedly on a datanode. Attempting to abort an already
committed transaction results in a PANIC. Upon investigation, this uncovered a
very serious yet long-standing bug in transaction handling.
If the client is running in autocommit mode, we try to avoid starting a
transaction block on the datanode side if only one datanode is going to be
involved in the transaction. This is an optimisation to speed up short queries
touching only a single node. But when the query rewriter transforms a single
statement into multiple statements, we would still (and incorrectly) run each
statement in autocommit mode on the datanode. This can cause inconsistencies
when one statement commits but the next statement aborts. It may also lead
to PANIC situations if we continue to use the same global transaction
identifier for the statements.
This can also happen when the user invokes a user-defined function. If the
function has multiple statements, each statement will run in autocommit mode if
it is FQS-ed, again creating inconsistencies if a later statement in the
function fails.
We now have a more elaborate mechanism to tackle autocommit and transaction
block needs. The special casing for force_autocommit is now removed, thus
making it more predictable. We also have specific conditions to check to ensure
that we don't mix up autocommit and transaction blocks for the same global xid.
Finally, if a query rewriter transforms a single statement into multiple
statements, we run those statements in a transaction block. Together these
changes should help us fix the problems.
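A hypothetical example of the rewriter case described above (table and rule
names are made up):
CREATE TABLE t (a int);
CREATE TABLE t_audit (a int, at timestamptz);
CREATE RULE t_log AS ON INSERT TO t
    DO ALSO INSERT INTO t_audit VALUES (NEW.a, now());
-- This single client statement is rewritten into two statements; with this
-- change they run inside one transaction block on the datanode instead of
-- each running in autocommit mode.
INSERT INTO t VALUES (1);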
|
|
d9f45c9018ec3ec1fc11e4be2be7f9728a1799b1 attempted to refactor
release_connection() to make it more readable, but unfortunately
inverted the force_destroy check, causing regression failures.
In hindsight, the refactoring was rather arbitrary and not really helping
readability, so just revert to the original code (but keep the comments
explaining what's happening).
|
|
A number of functions were defined in pgxcnode.h/pgxcnode.c, but
only ever used in poolmgr.c. Those are:
- PGXCNodeConnect - open libpq connection using conn. string
- PGXCNodePing - ping node using connection string
- PGXCNodeClose - close libpq connection
- PGXCNodeConnected - verify connection status
- PGXCNodeConnStr - build connection string
So move them to poolmgr.c and make them static, so that poolmgr
is the only part dealing with libpq connections directly.
|
|
Similarly to a39b06b0c6, this does minor cleanup in the pool manager
code by removing unused functions and adding a lot of comments, both
at the file level (explaining the concepts and basic API methods)
and for individual functions.
|
|
pgxc_FQS_planner() was not copying queryId, so extensions relying on
it did not work properly. For example the pg_stat_statements extension
was ignoring queries executed using FQS entirely.
Backpatch to Postgres-XL 9.5.
|
|
When checking if a query is eligible for FQS (fast-query shipping),
disable the optimization for queries in SCROLL cursors, as FQS does
not support backward scans.
Discussion: <e66932f3-3c35-cab0-af7e-60e8dfa423ba@2ndquadrant.com>
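For example (assuming some distributed table t with a column a), a cursor like
this now falls back to a regular remote subplan rather than FQS:
BEGIN;
DECLARE c SCROLL CURSOR FOR SELECT * FROM t ORDER BY a;
FETCH FORWARD 5 FROM c;
FETCH BACKWARD 3 FROM c;  -- a backward scan, which FQS cannot provide
COMMIT;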
|
|
Our efforts to improve shared queue synchronization continue. We now have a
per-queue producer lwlock that must be held for synchronization between
consumers and the producer. Consumers must hold this lock before setting the
producer latch to ensure the producer does not miss any signals and does
not go into unnecessary waits.
We still can't get rid of all the timeouts; in particular, we see that sometimes
a producer finishes and tries to unbind from the queue even before a consumer
gets a chance to connect to it. We left the 10s wait to allow consumers to
connect. There is still a net improvement because when a consumer is not going
to connect, it tells the producer and we avoid the 10s timeout we used to
see earlier.
|
|
Rules are converted to their string representation and stored in the catalog.
While building a relation descriptor, this information is read back and converted
into a Node representation. Since relation descriptors could be built while we
are reading plan information sent by the remote server in a stringified
representation, trying to read the rules with portable input on may lead to
unpleasant behaviour. So we must first reset portable input and restore it
after reading the rules. The same applies to RLS policies (even though we don't
have a test showing the impact, it looks like a sane thing to fix anyway).
|
|
We never had this support and we never felt the need, because the use of FQS was
limited to utility statements and simple queries which can be completely
pushed down to the remote node. But in PG 10, we're seeing errors while using
cursors for queries which are FQS-ed. So instead of forcing a regular remote
subplan on such queries, we add support for rescan of the RemoteQuery node.
Patch by Senhu <senhu@tencent.com>
|
|
As coordinator_lxid is uint32, make sure we use %u to format it
(e.g. when sending it to remote nodes as a string) and not just %d.
|
|
We were not dealing with the params in Subplan correctly, thus those params
were not sent to the remote nodes correctly during RemoteSubplan execution.
This patch fixes that by traversing the Subplan node correctly. The regression
failure in the 'join' test case is addressed too.
Patch by senhu (senhu@tencent.com)
|
|
This is the merge-base of PostgreSQL's master branch and REL_10_STABLE branch.
This should be the last merge from PG's master branch into XL 10 branch.
Subsequent merges must happen from the REL_10_STABLE branch.
|
|
The quote_guc_value() result was assigned to a regular (char *) variable,
resulting in a compiler warning about discarding the const qualifier. Fix by
marking the variable as 'const' too.
|
|
This merges the current master branch of XL with the XL 10 development branch.
Commits up to f72330316ea5796a2b11a05710b98eba4e706788 are included in this
merge.
|
|
With roundrobin distribution, the initial node was always set to the first node
in the list. That works fine when inserting many rows at once (e.g. with
INSERT SELECT), but with single-row inserts this puts all data on the
first node, resulting in unbalanced distribution.
This randomizes the choice of the initial node, so that with single-row
inserts the ROUNDROBIN behaves a bit like RANDOM distribution.
This also removes an unnecessary srand() call from RelationBuildLocator(),
located after a call to rand().
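To illustrate with a hypothetical table:
CREATE TABLE events (msg text) DISTRIBUTE BY ROUNDROBIN;
-- Previously every session's single-row inserts all started at the first node;
-- with a randomized starting node they spread out more like RANDOM distribution.
INSERT INTO events VALUES ('a');
INSERT INTO events VALUES ('b');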
|
|
Due to a coding issue in IsDistColumnForRelId() it was possible to drop
columns that were used as modulo distribution keys. A simple example
demonstrates the behavior:
CREATE TABLE t (a INT, b INT, c INT) DISTRIBUTE BY MODULO (a);
ALTER TABLE t DROP COLUMN a;
test=# \d+ t
                          Table "public.t"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 b      | integer |           | plain   |              |
 c      | integer |           | plain   |              |
Distribute By: MODULO(........pg.dropped.1........)
Location Nodes: ALL DATANODES
With this commit, the ALTER TABLE command fails as expected:
ERROR: Distribution column cannot be dropped
The commit simplifies the coding a bit, and removes several functions
that were not needed anymore (and unused outside locator.c).
|
|
Until now the BIGINT data type was not supported by MODULO distribution, and
attempts to create such tables failed. This patch removes the limitation.
The compute_modulo() function originally used an optimized algorithm
from http://www.graphics.stanford.edu/~seander/bithacks.html (namely the
one described in section "Compute modulus division by (1 << s) - 1 in
parallel without a division operator") to compute the modulo. But that
algorithm version only supported 32-bit values, and so would require
changes to support 64-bit values. Instead, I've decided to simply drop
that code and use the plain % operator, which should translate to an IDIV
instruction.
Judging by benchmarks (MODULO on INTEGER column), switching to plain
modulo (%) might result in about 1% slowdown, but it might easily be
just noise caused by different binary layout due to code changes. In
fact, the simplified algorithm is much less noisy in this respect.
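For example, a table like this (names are hypothetical) previously failed to
create and now works, with the target node picked by a plain modulo of the
column value:
CREATE TABLE accounts (id BIGINT, balance NUMERIC) DISTRIBUTE BY MODULO (id);
INSERT INTO accounts VALUES (4294967299, 100.0);  -- value wider than 32 bits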
|
|
This is the result of the "git merge remotes/PGSQL/master" up to the said commit
point. We have done some basic analysis, fixed compilation problems etc., but the
bulk of the logical problems in conflict resolution etc. will be handled by
subsequent commits.
|
|
Fast-query-shipping consumes all rows produced by one datanode (connection)
before moving to the next connection. This leads to suboptimal performance
when the datanodes can't produce tuples at the desired pace. Instead, we switch
between connections after every PGXLRemoteFetchSize (pgx_remote_fetch_size)
rows are fetched. This gives a datanode a chance to produce more tuples while
the coordinator consumes tuples already produced and sent across by another
datanode.
This seems to improve performance for FQS-ed queries significantly when they
return a large number of rows from more than one datanode.
Report by Pilar de Teodoro <pteodoro@sciops.esa.int>, initial analysis and
performance tests by Tomas Vondra, further analysis and patch by me.
Backpatched to XL9_5_STABLE.
|
|
This avoids resetting global session information when DISCARD/RESET ALL is
executed. Resetting it can have bad effects, especially as seen in the 'guc' test
case where we fail to handle temp tables correctly. So we mark the global_session
GUC with the GUC_NO_RESET_ALL flag and instead issue an explicit RESET
global_session when the connection is cleaned up.
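In short (a sketch, not the actual test case):
-- global_session is now marked GUC_NO_RESET_ALL, so this no longer clears it:
DISCARD ALL;
-- Instead, the pooler issues an explicit reset when the connection is cleaned up:
RESET global_session;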
|
|
the remote side during RemoteSubplan execution.
This allows us to experiment with different sizes more easily. Playing with the
fetch size also exposed a couple of problems, fixed in this same commit.
1. We were incorrectly forgetting a connection's response combiner while
suspending a portal, leading to errors later when we try to buffer the results
because the connection must be used for other queries.
2. The remote cursor name was not getting set properly, causing datanodes to
complain about non-existent cursors.
|
|
MaxAllocSize.
We take this opportunity to rearrange the code to avoid duplication in handling
the in and out buffers. Also, add some other checks to ensure that we don't overrun
the limits.
Report, investigation and draft patch by Krzysztof Nienartowicz.
|
|
running COPY protocol.
We sometimes do see an 'M' message (command ID received from the remote node)
from the datanode, and that breaks the combiner's message validation. So we only
do that if we received a RESPONSE_COPY message during the COPY IN protocol.
|
|
PostgreSQL's BINARY COPY protocol does not expect a newline character between
rows. But the coordinator was adding one while running the protocol between the
coordinator and the datanodes. While Postgres-XL had some provision to deal
with this on the datanode side, it seemed either insufficient or buggy, as
evident during tests with pglogical. So get rid of that and make the protocol
the same as vanilla PG.
|
|
commands to the remote nodes
Earlier we'd special-cased a few GUCs, such as those using memory or time units
or transaction isolation levels. But clearly that wasn't enough as we noticed
with "application_name" recently. So fix this problem in a more comprehensive
manner.
Added a few more test cases to cover these scenarios.
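For instance, a value containing spaces must be forwarded to the datanodes as a
properly quoted literal:
SET application_name = 'my reporting app';
-- Without quoting, re-sending this as  SET application_name TO my reporting app
-- would be a syntax error on the remote node.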
|
|
The switch in determine_param_types() was missing the T_CteScan case,
so some queries were failing with:
ERROR: unrecognized node type: 119
This fixes it, and it also fixes the expected output for a few other
queries in the same test suite.
|
|
Until now the application_name was set to 'pgxc' for all connections.
Set it to 'pgxc:NODE_NAME' instead, to make it easier to understand
data from pg_stat_activity.
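A quick look at pg_stat_activity on a datanode now shows which node a connection
belongs to (the node name 'coord1' below is hypothetical):
SELECT application_name, count(*)
  FROM pg_stat_activity
 GROUP BY application_name;
-- shows e.g. 'pgxc:coord1' instead of a single undifferentiated 'pgxc'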
|
|
The code was mixing code for named and regular tranches (identified
by just ID, in this case LWTRANCHE_SHARED_QUEUES). So switch to
the LWLockRegisterTranche() approach.
There were two more bugs though:
1) The tranche struct was local to the block, which does not work as
it's referenced from the global array without creating a copy. So just
make it static and global.
2) The locks were not properly initialized. Add a LWLockInitialize
call to the loop.
|
|
LWLockAssign() got removed so this is the only way to do this.
In any case, once we switch to the built-in shared queues, this should
not be needed at all I believe.
|
|
The compilation was failing in poolmgr.c because of incorrect types
when tweaking signals. After inspection it seems HAVE_SIGPROCMASK
is not actually needed (it's not defined anywhere, and upstream is
not using it either) so just get rid of it entirely.
|
|
The frontend libraries need to be included last, because libpq-int.h
defines FRONTEND, which triggers #error in s_lock.h. Including the
libraries at the end is not particularly pretty, but it works.
Also reorder the includes a bit into the usual lexicographic order.
|
|
There's a fourth argument (abbrev), so just pass NULL.
|
|
|
|
max_datanodes
The user can still break things by first setting a high value for max_datanodes
or max_coordinators and then lowering it again after adding nodes. So this
basic check is not enough, but it is better than what we have currently.
|
|
the GTM no longer has state information for the transaction when
xc_maintenance_mode is active.
Information on the GTM is important for explicit prepared transactions because
the GTM tracks the participant nodes in that case, and the COMMIT/ROLLBACK PREPARED
commands can then be forwarded to those participants. So when
xc_maintenance_mode is active, we only clean up the prepared transaction on the
local node (or a remote one when used with EXECUTE DIRECT). So it seems OK not to
guarantee that the GTM has the state information.
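A sketch of the local-only cleanup described above (the GID is made up):
SET xc_maintenance_mode = on;
-- Cleans up the prepared transaction on the local node only; the GTM is not
-- expected to still have state for it.
ROLLBACK PREPARED 'xact_1234';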
|
|
the correct order and correct position
Per report and test-case by Daniel Bunford-Jones
|
|
There was only a single function in this module, and it was not
referenced from anywhere. So remove the module and all references
to it (e.g. includes of the header).
In createplan.c, this is a bit more difficult, because it got
getpid() through this file, so include pgxcnode.h instead,
as it includes unistd.h.
|
|
Although stormdb_promote_standby() is a function visible from SQL,
it does not seem mentioned anywhere (e.g. in the docs etc.). So
let's get rid of it, as it's apparently not needed.
|
|
As written originally, the branching confused the compiler enough to
emit a warning about using an uninitialized allOids variable. Refactor the
code a bit to make it clear that's not really possible, and also move
a number of variable declarations from the function scope to a block
level.
Note: I think the function would deserve more documentation - both
about the intent in general, and about the meaning of the return
value ('res' is quite an unclear name).
|
|
Under rare conditions (missing support for unix sockets or empty
Unix_socket_directories) the fdsocket variable remained uninitialized.
Handle this case explicitly, and also fix a few minor annoyances
(remove extra braces around a single-line block).
|
|
When handling a message type requires more than a few lines,
move it to a separate function. This makes the code flow much
cleaner, and it also avoids issues with reusing shared variables
(originally defined at the function scope).
Also differentiate between EOF and protocol violation, if only
to log the protocol violation.
Note: The for(;;) loop checks for EOF in two places - once in the
switch, and then at the very end (using pool_pollbyte). There's
a subtle difference - the second place does not do agent_destroy.
Not sure if this is intentional, but perhaps that's a bug?
|
|
While tweaking the code, I've noticed this is the only place using
NodeHealthStatus, apparently attempting to save a few bytes. That
seems rather pointless given how little memory this will need and
how short-lived the memory is (and also how AllocSet doubles the
allocation sizes anyway), so I got rid of this. This makes copying
the old state easier (a simple memcpy instead of for loops).
Note: PgxcNodeListAndCount() seems a bit unfortunately named, as the
name suggests listing and counting nodes, but in practice it refreshes
the status of nodes from the catalog, while keeping the health status.
|
|
Otherwise agent_acquire_connections() might bail out before
actually setting pids to anything, leaving the pointer set to
some random garbage and causing a segfault at pfree() time.
|
|
|
|
enforce_two_phase_commit
These two options were ineffective - defined and listed in the sample
configuration file, but attached to unreferenced variables.
|