path: root/src/backend/pgxc
2018-10-12  Use sufficiently large buffer in SharedQueueWrite  (Tomas Vondra)
The sq_key alone may be up to 64 bytes, so we need more than that. We could use dynamic memory instead, but 128 bytes should be enough both for the sq_key and the other pieces.
2018-07-27  Ensure that typename is schema qualified while sending row description  (Pavan Deolasee)
A row description message contains the type information for the result columns. But if the type does not exist in the search_path, the coordinator fails to parse the typename back to the type. So the datanode must send the schema name along with the type name. Per report and test case by Hengbing Wang @ Microfun. Added a new test file and a few test cases to cover this area.
2018-07-27  Ensure pooler process follows consistent model for SIGQUIT handling  (Pavan Deolasee)
We've occasionally seen the pooler process fail to respond to SIGQUIT and get stuck in a non-recoverable state. Code inspection reveals that we were not following the model used by the rest of the background worker processes for handling SIGQUIT. So get that fixed, with the hope that this will fix the problem case.
2018-07-27  Properly quote typename before calling parseTypeString  (Pavan Deolasee)
Without this, parseTypeString() might throw an error or resolve to a wrong type in case the type name requires quoting. Per report by Hengbing Wang.
2018-05-21  Fix broken implementation of recovery to barrier.  (Pavan Deolasee)
Per report from Hengbing, the current implementation of PITR recovery to a BARRIER failed to correctly stop at the given recovery_target_barrier. There were two bugs here: 1) we failed to write the XLOG record correctly, and 2) we failed to mark end-of-recovery upon seeing the XLOG record during recovery. Fix both these problems, and also fix pg_xlogdump in passing to ensure we can dump the BARRIER XLOG records correctly.
2018-05-18  Track clearly whether to run a remote transaction in autocommit or a block  (Pavan Deolasee)
Chi Gao and Hengbing Wang reported certain issues around transaction handling and demonstrated via xlogdump how certain transactions were getting marked committed/aborted repeatedly on a datanode. When an already committed transaction is aborted again, the result is a PANIC. Investigation uncovered a very serious and long-standing bug in transaction handling.

If the client is running in autocommit mode, we try to avoid starting a transaction block on the datanode side if only one datanode is going to be involved in the transaction. This is an optimisation to speed up short queries touching only a single node. But when the query rewriter transforms a single statement into multiple statements, we would still (and incorrectly) run each statement in autocommit mode on the datanode. This can cause inconsistencies when one statement commits but the next statement aborts, and it may also lead to the PANIC situations if we continue to use the same global transaction identifier for the statements.

This can also happen when the user invokes a user-defined function. If the function has multiple statements, each statement will run in autocommit mode if it's FQSed, again creating inconsistency if a later statement in the function fails.

We now have a more elaborate mechanism to tackle autocommit and transaction-block needs. The special casing for force_autocommit is removed, making the behaviour more predictable. We also check specific conditions to ensure that we don't mix up autocommit and transaction block for the same global xid. Finally, if the query rewriter transforms a single statement into multiple statements, we run those statements in a transaction block. Together these changes should fix the problems.
2017-11-07  Fix bug in release_connection() introduced by d9f45c9018  (Tomas Vondra)
d9f45c9018ec3ec1fc11e4be2be7f9728a1799b1 attempted to refactor release_connection() to make it more readable, but unfortunately inverted the force_destroy check, causing regression failures. In hindsight, the refactoring was rather arbitrary and not really helping with the readability, so just revert to the original code (but keep the comments, explaining what's happening).
2017-11-04  Move several functions from pgxcnode.c to poolmgr.c  (Tomas Vondra)
A number of functions were defined in pgxcnode.h/pgxcnode.c, but only ever used in poolmgr.c:

- PGXCNodeConnect - open a libpq connection using a connection string
- PGXCNodePing - ping a node using a connection string
- PGXCNodeClose - close a libpq connection
- PGXCNodeConnected - verify connection status
- PGXCNodeConnStr - build a connection string

So move them to poolmgr.c and make them static, so that poolmgr is the only part dealing with libpq connections directly.
2017-11-04  Comments and cleanup in the connection pool manager  (Tomas Vondra)
Similarly to a39b06b0c6, this does minor cleanup in the pool manager code by removing unused functions and adding a lot of comments, both at the file level (explaining the concepts and basic API methods) and for individual functions.
2017-10-14  Remember queryId for queries executed using FQS  (Tomas Vondra)
pgxc_FQS_planner() was not copying queryId, so extensions relying on it did not work properly. For example the pg_stat_statements extension was ignoring queries executed using FQS entirely. Backpatch to Postgres-XL 9.5.
2017-10-05  Disable FQS for cursors defined with SCROLL  (Tomas Vondra)
When checking if a query is eligible for FQS (fast-query shipping), disable the optimization for queries in SCROLL cursors, as FQS does not support backward scans. Discussion: <e66932f3-3c35-cab0-af7e-60e8dfa423ba@2ndquadrant.com>
2017-09-20  Improve shared queue synchronization further  (Pavan Deolasee)
Our efforts to improve shared queue synchronization continue. We now have a per-queue producer lwlock that must be held for synchronization between the consumers and the producer. Consumers must hold this lock before setting the producer latch, to ensure the producer does not miss any signals and does not go into unnecessary waits.

We still can't get rid of all the timeouts. In particular, we see that sometimes a producer finishes and tries to unbind from the queue even before a consumer gets a chance to connect to the queue, so we left the 10s wait to allow consumers to connect. There is still a net improvement, because when a consumer is not going to connect it now tells the producer, and we avoid the 10s timeout we used to see earlier.
2017-09-18  Ensure that we don't read rule definition with portable input on  (Pavan Deolasee)
Rules are converted into their string representation and stored in the catalog. While building a relation descriptor, this information is read back and converted into a Node representation. Since relation descriptors could be built while we are reading plan information sent by the remote server in stringified representation, trying to read the rules with portable input turned on may lead to unpleasant behaviour. So we must first reset portable input and restore it back after reading the rules. The same applies to RLS policies (we don't have a test showing the impact, but it looks like a sane thing to fix anyway).
2017-08-22  Handle rescan of RemoteQuery node correctly  (Pavan Deolasee)
We never had this support, and we never felt the need because the use of FQS was limited to utility statements and simple queries which can be completely pushed down to the remote node. But in PG 10, we're seeing errors while using cursors for queries which are FQSed. So instead of forcing a regular remote subplan on such queries, we are adding support for rescan of the RemoteQuery node. Patch by Senhu <senhu@tencent.com>
2017-08-21  Make sure coordinator_lxid is formatted as %u and not %d  (Tomas Vondra)
The coordinator_lxid is uint32, so make sure we use %u to format it (e.g. when sending it to remote nodes as a string) and not %d.
2017-08-21  Handle params correctly within Subplan nodes  (Pavan Deolasee)
We were not dealing with the params in a Subplan correctly, so those params were not sent to the remote nodes during RemoteSubplan execution. This patch fixes that by traversing the Subplan node correctly. The regression failure in the 'join' test case is addressed too. Patch by senhu (senhu@tencent.com)
2017-08-18  Merge commit '21d304dfedb4f26d0d6587d9ac39b1b5c499bb55'  (Pavan Deolasee)
This is the merge-base of PostgreSQL's master branch and the REL_10_STABLE branch. This should be the last merge from PG's master branch into the XL 10 branch. Subsequent merges must happen from the REL_10_STABLE branch.
2017-08-08  Mark variable as const to prevent compiler warning  (Tomas Vondra)
The quote_guc_value() result was assigned to a regular (char *) variable, resulting in a compiler warning about discarding the const qualifier. Fix by marking the variable as 'const' too.
2017-06-28  Merge remote-tracking branch 'remotes/origin/master' into xl10devel  (Pavan Deolasee)
This merges the current master branch of XL with the XL 10 development branch. Commits up to f72330316ea5796a2b11a05710b98eba4e706788 are included in this merge.
2017-06-17  Randomize the choice of the initial ROUNDROBIN node  (Tomas Vondra)
With a ROUNDROBIN node, the initial node was always set to the first node in the list. That works fine when inserting many rows at once (e.g. with INSERT SELECT), but with single-row inserts this puts all data on the first node, resulting in an unbalanced distribution. This commit randomizes the choice of the initial node, so that with single-row inserts ROUNDROBIN behaves a bit like RANDOM distribution. It also removes an unnecessary srand() call from RelationBuildLocator(), located after a call to rand().
2017-06-17  Prevent dropping distribution keys for MODULO  (Tomas Vondra)
Due to a coding issue in IsDistColumnForRelId() it was possible to drop columns that were used as modulo distribution keys. A simple example demonstrates the behavior:

    CREATE TABLE t (a INT, b INT, c INT) DISTRIBUTE BY MODULO (a);
    ALTER TABLE t DROP COLUMN a;

    test=# \d+ t
                          Table "public.t"
     Column |  Type   | Modifiers | Storage | Stats target | Description
    --------+---------+-----------+---------+--------------+-------------
     b      | integer |           | plain   |              |
     c      | integer |           | plain   |              |
    Distribute By: MODULO(........pg.dropped.1........)
    Location Nodes: ALL DATANODES

With this commit, the ALTER TABLE command fails as expected:

    ERROR: Distribution column cannot be dropped

The commit simplifies the coding a bit, and removes several functions that were not needed anymore (and unused outside locator.c).
2017-06-17  Add support for MODULO distribution on BIGINT  (Tomas Vondra)
Until now the BIGINT data type was not supported by MODULO distribution, and attempts to create such tables failed. This patch removes the limitation.

The compute_modulo() function originally used an optimized algorithm from http://www.graphics.stanford.edu/~seander/bithacks.html (namely the one described in section "Compute modulus division by (1 << s) - 1 in parallel without a division operator") to compute the modulo. But that algorithm only supported 32-bit values, and so would require changes to support 64-bit values. Instead, I've decided to simply drop that code and use the plain % operator, which should translate to an IDIV instruction.

Judging by benchmarks (MODULO on an INTEGER column), switching to the plain modulo (%) might result in about 1% slowdown, but it might easily be just noise caused by a different binary layout due to the code changes. In fact, the simplified version is much less noisy in this respect.
2017-06-14  Merge from PG master up to d5cb3bab564e0927ffac7c8729eacf181a12dd40  (Pavan Deolasee)
This is the result of "git merge remotes/PGSQL/master" up to the said commit point. We have done some basic analysis and fixed compilation problems etc., but the bulk of the logical problems in conflict resolution etc. will be handled by subsequent commits.
2017-06-08  Switch connections after processing PGXLRemoteFetchSize rows  (Pavan Deolasee)
Fast-query-shipping used to consume all rows produced by one datanode (connection) before moving to the next connection. This leads to suboptimal performance when the datanodes can't produce tuples at the desired pace. Instead, we now switch between connections after every PGXLRemoteFetchSize (pgx_remote_fetch_size) rows are fetched. This gives a datanode a chance to produce more tuples while the coordinator consumes tuples already produced and sent across by another datanode. This seems to improve performance significantly for FQS-ed queries that return a large number of rows from more than one datanode.

Report by Pilar de Teodoro <pteodoro@sciops.esa.int>, initial analysis and performance tests by Tomas Vondra, further analysis and patch by me. Backpatched to XL9_5_STABLE.
2017-05-18  Do not reset global_session on RESET ALL  (Pavan Deolasee)
This avoids resetting global session information when DISCARD/RESET ALL is executed. That can have bad effects, as seen in the 'guc' test case where we fail to handle temp tables correctly. So we mark the global_session GUC with the GUC_NO_RESET_ALL flag and instead issue an explicit RESET global_session when the connection is cleaned up.
2017-05-05  Add a user-configurable parameter to control the number of rows fetched from the remote side during RemoteSubplan execution  (Pavan Deolasee)
This allows us to experiment with different sizes more easily. Playing with the fetch size also exposed a couple of problems, fixed in this same commit:

1. We were incorrectly forgetting a connection's response combiner while suspending a portal, leading to errors later when we try to buffer the results because the connection must be used for other queries.

2. The remote cursor name was not getting set properly, so datanodes complained about non-existent cursors.
2017-05-05  Ensure that we don't try to allocate connection in/out buffers over MaxAllocSize  (Pavan Deolasee)
We take this opportunity to rearrange the code to avoid duplication in handling the in and out buffers. Also, add some other checks to ensure that we don't overrun the limits. Report, investigation and draft patch by Krzysztof Nienartowicz.
2017-05-05  Try to validate the combiner only when a RESPONSE_COPY is received while running the COPY protocol  (Pavan Deolasee)
We sometimes see an 'M' message (command ID received from the remote node) from the datanode, and that breaks the combiner validation. So we only validate the combiner if we received a RESPONSE_COPY message during the COPY IN protocol.
2017-05-05  Do not add a newline ('\n') between rows while running a BINARY COPY protocol.  (Pavan Deolasee)
PostgreSQL's BINARY COPY protocol does not expect a newline character between rows, but the coordinator was adding one while running the protocol between the coordinator and the datanodes. While Postgres-XL had some provision to deal with this on the datanode side, it proved either insufficient or buggy, as evident during tests with pglogical. So get rid of that and make the protocol the same as vanilla PG.
2017-05-05  Ensure variable values are quoted when necessary while sending down SET commands to the remote nodes  (Pavan Deolasee)
Earlier we'd special-cased a few GUCs, such as those using memory or time units or transaction isolation levels. But clearly that wasn't enough, as we noticed with "application_name" recently. So fix this problem in a more comprehensive manner. Added a few more test cases to cover these scenarios.
2017-03-19  Resolve failures in rules tests by adding T_CteScan to execRemote.c  (Tomas Vondra)
The switch in determine_param_types() was missing T_CteScan case, so some queries were failing with: ERROR: unrecognized node type: 119 This fixes it, and it also fixes the expected output for a few other queries in the same test suite.
2017-01-19  add name of the originating node to application_name  (Tomas Vondra)
Until now the application_name was set to 'pgxc' for all connections. Set it to 'pgxc:NODE_NAME' instead, to make it easier to understand data from pg_stat_activity.
2017-01-09  fix SharedQueuesInit to register the tranche properly  (Tomas Vondra)
The code was mixing code for named and regular tranches (identified by just an ID, in this case LWTRANCHE_SHARED_QUEUES). So switch to the LWLockRegisterTranche() approach. There were two more bugs though:

1) The tranche struct was local to the block, which does not work as it's referenced from the global array without creating a copy. So just make it static and global.

2) The locks were not properly initialized. Add a LWLockInitialize call to the loop.
2016-11-09  allocate SQueue LWLocks using a named tranche (dynamically)  (Tomas Vondra)
LWLockAssign() got removed so this is the only way to do this. In any case, once we switch to the built-in shared queues, this should not be needed at all I believe.
2016-11-07  get rid of HAVE_SIGPROCMASK  (Tomas Vondra)
The compilation was failing in poolmgr.c because of incorrect types when tweaking signals. After inspection it seems HAVE_SIGPROCMASK is not actually needed (it's not defined anywhere, and upstream is not using it either) so just get rid of it entirely.
2016-11-07  fix includes in poolmgr.c because of s_lock.h failures  (Tomas Vondra)
The frontend libraries need to be included last, because libpq-int.h defines FRONTEND, which triggers #error in s_lock.h. Including the libraries at the end is not particularly pretty, but it works. Also reorder the includes a bit into the usual lexicographic order.
2016-11-07  fix calls to tuplesort_gettupleslot() from execRemote.c  (Tomas Vondra)
There's a fourth argument (abbrev), so just pass NULL.
2016-10-26  Fix tablesample handling which was broken during the merge  (Pavan Deolasee)
2016-10-18  Ensure we're not going beyond the configured limits of max_coordinators or max_datanodes  (Pavan Deolasee)
The user can still break things by first setting a high value for max_datanodes or max_coordinators and then lowering it again after adding nodes. So this basic check is not enough, but it is better than what we have currently.
2016-10-18  Allow COMMIT/ROLLBACK PREPARED xact to succeed on the local coordinator even if the GTM no longer has state information for the transaction when xc_maintenance_mode is active  (Pavan Deolasee)
Information on the GTM is important for explicit prepared transactions, because the GTM tracks the participant nodes in that case and the COMMIT/ROLLBACK PREPARED commands can then be forwarded to those participants. So when xc_maintenance_mode is active, we only clean up the prepared transaction on the local node (or a remote node when used with EXECUTE DIRECT). It therefore seems OK not to require the GTM to have the state information.
2016-10-18  Handle remote parameters correctly by sending them down to the remote nodes in the correct order and position  (Pavan Deolasee)
Per report and test case by Daniel Bunford-Jones.
2016-10-18  remove pgxc/pool/postgresql_fdw.c as is_immutable_func() is unused  (Tomas Vondra)
There was only a single function in this module, and it was not referenced from anywhere. So remove the module and all references to it (e.g. includes of the header). In createplan.c this is a bit more difficult, because it got getpid() through this file, so include pgxcnode.h instead, as it includes unistd.h.
2016-10-18  remove stormutils, containing only stormdb_promote_standby() function  (Tomas Vondra)
Although stormdb_promote_standby() is a function visible from SQL, it does not seem mentioned anywhere (e.g. in the docs etc.). So let's get rid of it, as it's apparently not needed.
2016-10-18  refactor DoRefreshRemoteHandles a bit to get rid of compiler warnings  (Tomas Vondra)
As written originally, the branching confused the compiler enough to emit a warning about using the uninitialized allOids variable. Refactor the code a bit to make it clear that's not really possible, and also move a number of variable declarations from the function scope to the block level.

Note: I think the function deserves more documentation - both about the intent in general and about the meaning of the return value ('res' is quite an unclear name).
2016-10-18  rework GetPoolManagerHandle a bit to get rid of compiler warnings  (Tomas Vondra)
Under rare conditions (missing support for unix sockets, or an empty Unix_socket_directories) the fdsocket variable remained uninitialized. Handle this case explicitly, and also fix a few minor annoyances (remove extra braces around a single-line block).
2016-10-18  refactor agent_handle_input() by splitting it into smaller pieces  (Tomas Vondra)
When handling a message type requires more than a few lines, move it to a separate function. This makes the code flow much cleaner, and it also avoids issues with reusing shared variables (originally defined at the function scope). Also differentiate between EOF and protocol violation, if only to log the protocol violation.

Note: The for(;;) loop checks for EOF in two places - once in the switch, and then at the very end (using pool_pollbyte). There's a subtle difference - the second place does not do agent_destroy. Not sure if this is intentional; perhaps that's a bug?
2016-10-18  minor PgxcNodeListAndCount() refactoring to fix compiler warnings  (Tomas Vondra)
While tweaking the code, I noticed this is the only place using NodeHealthStatus, apparently attempting to save a few bytes. That seems rather pointless given how little memory this needs and how short-lived the memory is (and also how AllocSet doubles the allocation sizes anyway), so I got rid of it. This makes copying the old state easier (a simple memcpy instead of for loops).

Note: The name PgxcNodeListAndCount() seems a bit unfortunate, as it suggests listing and counting nodes, while in practice the function refreshes the status of nodes from the catalog, keeping the health status.
2016-10-18  initialize 'pids' variable explicitly in agent_handle_input  (Tomas Vondra)
Otherwise agent_acquire_connections() might bail out before actually setting pids to anything, leaving the pointer set to random garbage and causing a segfault at pfree() time.
2016-10-18  cast connection to (PGconn*) in grow_pool() to fix compiler warning  (Tomas Vondra)
2016-10-18  remove unused GUC options - strict_statement_checking and enforce_two_phase_commit  (Tomas Vondra)
These two options were ineffective - defined and listed in the sample configuration file, but attached to unreferenced variables.