summaryrefslogtreecommitdiff
path: root/src/include
AgeCommit message (Collapse)Author
2018-07-27Improve locking semantics in GTM and GTM ProxyPavan Deolasee
While GTM allows long jump in case of errors, we were not careful to release locks currently held by the executing thread. That could lead to threads leaving a critical section still holding a lock and thus causing deadlocks. We now properly track currently held locks in the thread-specific information and release those locks in case of an error. Same is done for mutex locks as well, though there is only one that gets used. This change required using a malloc-ed memory for thread-specific info. While due care has been taken to free the structure, we should keep an eye on it for any possible memory leaks. In passing also improve handling of bad-protocol startup messages which may have caused deadlock and resource starvation.
2018-05-18Track clearly whether to run a remote transaction in autocommit or a blockPavan Deolasee
Chi Gao and Hengbing Wang reported certain issues around transaction handling and demonstrated via xlogdump how certain transactions were getting marked committed/aborted repeatedly on a datanode. When an already committed transaction is attempted to be aborted again, it results in a PANIC. Upon investigation, this uncovered a very serious yet long standing bug in transaction handling. If the client is running in autocommit mode, we try to avoid starting a transaction block on the datanode side if only one datanode is going to be involved in the transaction. This is an optimisation to speed up short queries touching only a single node. But when the query rewriter transforms a single statement into multiple statements, we would still (and incorrectly) run each statement in an autocommit mode on the datanode. This can cause inconsistencies when one statement commits but the next statement aborts. And it may also lead to the PANIC situations if we continue to use the same global transaction identifier for the statements. This can also happen when the user invokes a user-defined function. If the function has multiple statements, each statement will run in an autocommit mode, if it's FQSed, thus again creating inconsistency if a following statement in the function fails. We now have a more elaborate mechanism to tackle autocommit and transaction block needs. The special casing for force_autocommit is now removed, thus making it more predictable. We also have specific conditions to check to ensure that we don't mixup autocommit and transaction block for the same global xid. Finally, if a query rewriter transforms a single statement into multiple statements, we run those statements in a transaction block. Together these changes should help us fix the problems.
2017-11-04Move several functions from pgxcnode.c to poolmgr.cTomas Vondra
A number of functions were defined in pgxcnode.h/pgxnnode.h, but only ever used in poolmgr.c. Those are: - PGXCNodeConnect - open libpq connection using conn. string - PGXCNodePing - ping node using connection string - PGXCNodeClose - close libpq connection - PGXCNodeConnected - verify connection status - PGXCNodeConnStr - build connection string So move them to poolmgr.c and make them static, so that poolmgr is the only part dealing with libpq connections directly.
2017-11-04Comments and cleanup in the connection pool managerTomas Vondra
Similarly to a39b06b0c6, this does minor cleanup in the pool manager code by removing unused functions and adding a lot of comments, both at the file level (explaining the concepts and basic API methods) and for individual functions.
2017-11-04Improve comments in GTM code, minor naming tweaksTomas Vondra
This patch improves comments in gtm_txn.c and gtm_snap.c in three basic ways: 1) Adds global comments explaining the basics of transaction and snapshot management APIs - underlying concepts, main methods. 2) Improves (and adds) function-level comments, explaining the meaning of parameters, return values, and other details. 3) Tweaks the naming of several API functions, to make them more consistent with the rest of the module.
2017-11-04Cleanup GTM API: make functions static, remove dead codeTomas Vondra
The cleanup does two basic things: * Functions used only in a single source file are made static (and also removed from the header file, of course). This reduces the size of the public GTM API. * Unused functions (identified by the compiler thanks to making other functions static in the previous step) are removed. The assumption is that this code was not really tested at all, and would only make future improvements harder.
2017-10-14Allocate thread-local snapshot array staticallyTomas Vondra
Since commit fb56418d66 the snapshots are computed in thread-local storage, but we haven't been freeing the memory (on thread exit). As the memory is allocated in the global (TopMostMemoryContext), this presented a memory leak of 64kB on each GTM connection. One way to fix this would be to track when the thread-local storage is used in GTM_GetTransactionSnapshot(), and allocate the memory in TopMemoryContext (which is per-thread and released on exit). But there's a simpler way - allocate the thread-specific storage as part of GTM_ThreadInfo, and just redirect sn_xip from the snapshot. This way we don't have to worry about palloc/pfree at all, and we mostly assume that every connection will need to compute at least one snapshot anyway. Reported by Rob Canavan <rxcanavan@gmail.com>, investigation and fix by me. For more discussion see <CAFTg0q6VC_11+c=Q=gsAcDsBrDjvuGKjzNwH4Lr8vERRDn4Ycw@mail.gmail.com> Backpatch to Postgres-XL 9.5.
2017-09-20Improve shared queue synchronization furtherPavan Deolasee
Our efforts to improve shared queue synchronization continues. We now have a per queue producer lwlock that must be held for synchronization between consumers and the producer. Consumers must hold this lock before setting the producer latch to ensure the producer does not miss out any signals and does not go into unnecessary waits. We still can't get rid of all the timeouts, especially we see that sometimes a producer finishes and tries to unbind from the queue, even before a consumer gets chance to connect to the queue. We left the 10s wait to allow consumers to connect. There is still net improvement because when the consumer is not going to connect, it tells the producer and we avoid the 10s timeout, like we used to see earlier.
2017-09-18Ensure that we don't read rule definition with portable input onPavan Deolasee
Rules are converted in their string representation and stored in the catalog. While building relation descriptor, this information is read back and converted into a Node representation. Since relation descriptors could be built when we are reading plan information sent by the remote server in a stringified representation, trying to read the rules with portable input on may lead to unpleasant behaviour. So we must first reset portable input and restore it back after reading the rules. The same applies to RLS policies (even though we don't have a test showing the impact, but it looks like a sane thing to fix anyways)
2017-09-07Stamp Postgres-XL 10alpha2Pavan Deolasee
2017-08-22Handle rescan of RemoteQuery node correctlyPavan Deolasee
We never had this support and we never felt the need because the use of FQS was limited for utility statements and simple queries which can be completed pushed down to the remote node. But in PG 10, we're seeing errors while using cursors for queries which are FQSed. So instead of forcing regular remote subplan on such queries, we are adding support for rescan of RemoteQuery node. Patch by Senhu <senhu@tencent.com>
2017-08-21Define coordinator_lxid GUC as unsigned integerTomas Vondra
The coordinator_lxid GUC is internally stored as uint32, but was defined as plaint int32, triggering a compiler warning. It's also unclear what would happen for transaction IDs outside the signed range (possibly some strange issues). This adds a new GUC type (UInt), used only for this one GUC. The patch is fairly large, but most of it is boilerplate infrastructure to support the new GUC type. We have considered simpler workarounds (e.g. treating the GUC as string and converting it to/from uint32 using the GUC hooks, but this seems much cleaner and tidier.
2017-08-18Merge commit '21d304dfedb4f26d0d6587d9ac39b1b5c499bb55'Pavan Deolasee
This is the merge-base of PostgreSQL's master branch and REL_10_STABLE branch. This should be the last merge from PG's master branch into XL 10 branch. Subsequent merges must happen from REL_10_STABLE branch
2017-08-14Final pgindent + perltidy run for v10.Tom Lane
2017-08-13Remove AtEOXact_CatCache().Tom Lane
The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
2017-08-11Reject use of ucol_strcollUTF8() before ICU 53Peter Eisentraut
Various bugs can cause crashes, so don't use that function before ICU 53. It will fall back to the code path used for other encodings. Since we now tie the function availability to an ICU version, we don't need the configure test anymore. That also resolves the issue that the test result was previously hardcoded for Windows. researched by Daniel Verite <daniel@manitou-mail.org>, Peter Geoghegan <pg@bowt.ie>, Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org
2017-08-10Improve the error message when creating an empty range partition.Robert Haas
The previous message didn't mention the name of the table or the bounds. Put the table name in the primary error message and the bounds in the detail message. Amit Langote, changed slightly by me. Suggestions on the exac phrasing from Tom Lane, David G. Johnston, and Dean Rasheed. Discussion: http://postgr.es/m/CA+Tgmoae6bpwVa-1BMaVcwvCCeOoJ5B9Q9-RHWo-1gJxfPBZ5Q@mail.gmail.com
2017-08-10Fix typo in comment.Robert Haas
Etsuro Fujita Discussion: http://postgr.es/m/5f794b91-67df-1ac6-8a4f-069f8e8e169d@lab.ntt.co.jp
2017-08-08Fix replication origin-related race conditionsAlvaro Herrera
Similar to what was fixed in commit 9915de6c1cb2 for replication slots, but this time it's related to replication origins: DROP SUBSCRIPTION attempts to drop the replication origin, but that fails if the replication worker process hasn't yet marked it unused. This causes failures in the buildfarm: ERROR: could not drop replication origin with OID 1, in use by PID 34069 Like the aforementioned commit, fix by having the process running DROP SUBSCRIPTION sleep until the worker marks the the replication origin struct as free. This uses a condition variable on each replication origin shmem state struct, so that the session trying to drop can sleep and expect to be awakened by the process keeping the origin open. Also fix a SGML markup in the previous commit. Discussion: https://postgr.es/m/20170808001433.rozlseaf4m2wkw3n@alvherre.pgsql
2017-08-08Fix inadequacies in recently added wait eventsAlvaro Herrera
In commit 9915de6c1cb2, we introduced a new wait point for replication slots and incorrectly labelled it as wait event PG_WAIT_LOCK. That's wrong, so invent an appropriate new wait event instead, and document it properly. While at it, fix numerous other problems in the vicinity: - two different walreceiver wait events were being mixed up in a single wait event (which wasn't documented either); split it out so that they can be distinguished, and document the new events properly. - ParallelBitmapPopulate was documented but didn't exist. - ParallelBitmapScan was not documented (I think this should be called "ParallelBitmapScanInit" instead.) - Logical replication wait events weren't documented - various symbols had been added in dartboard order in various places. Put them in alphabetical order instead, as was originally intended. Discussion: https://postgr.es/m/20170808181131.mu4fjepuh5m75cyq@alvherre.pgsql
2017-08-07Stamp 10beta3.Tom Lane
2017-08-05Only kill sync workers at commit time in subscription DDLPeter Eisentraut
This allows a transaction abort to avoid killing those workers. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-08-04hash: Increase the number of possible overflow bitmaps by 8x.Robert Haas
Per a report from AP, it's not that hard to exhaust the supply of bitmap pages if you create a table with a hash index and then insert a few billion rows - and then you start getting errors when you try to insert additional rows. In the particular case reported by AP, there's another fix that we can make to improve recycling of overflow pages, which is another way to avoid the error, but there may be other cases where this problem happens and that fix won't help. So let's buy ourselves as much headroom as we can without rearchitecting anything. The comments claim that the old limit was 64GB, but it was really only 32GB, because we didn't use all the bits in the page for bitmap bits - only the largest power of 2 that could fit after deducting space for the page header and so forth. Thus, we have 4kB per page for bitmap bits, not 8kB. The new limit is thus actually 8 times the old *real* limit but only 4 times the old *purported* limit. Since this breaks on-disk compatibility, bump HASH_VERSION. We've already done this earlier in this release cycle, so this doesn't cause any incremental inconvenience for people using pg_upgrade from releases prior to v10. However, users who use pg_upgrade to reach 10beta3 or later from 10beta2 or earlier will need to REINDEX any hash indexes again. Amit Kapila and Robert Haas Discussion: http://postgr.es/m/20170704105728.mwb72jebfmok2nm2@zip.com.au
2017-08-03Teach map_partition_varattnos to handle whole-row expressions.Robert Haas
Otherwise, partitioned tables with RETURNING expressions or subject to a WITH CHECK OPTION do not work properly. Amit Langote, reviewed by Amit Khandekar and Etsuro Fujita. A few comment changes by me. Discussion: http://postgr.es/m/9a39df80-871e-6212-0684-f93c83be4097@lab.ntt.co.jp
2017-08-02Make temporary tables use shared storage on datanodesPavan Deolasee
Since a temporary table may be accessed by multiple backends on a datanode, XL mostly treats such tables as regular tables. But the technique that was used to distingush between temporary tables that may need shared storage vs those which are accessed only by a single backend, wasn't very full proof. We were relying on global session activation to make that distinction. This clearly fails when a background process, such as autovacuuum process, tries to figure out whether a table is using local or shared storage. This was leading to various problems, such as, when the underlying file system objects for the table were getting cleaned up, but without first discarding all references to the table from the shared buffers. We now make all temp tables to use shared storage on the datanodes and thus simplify things. Only EXECUTE DIRECT anyways does not set up global session, so I don't think this will have any meaningful impact on the performance. This should fix the checkpoint failures during regression tests.
2017-07-31Fix comment.Tatsuo Ishii
XLByteToSeg and XLByteToPrevSeg calculate only a segment number. The definition of these macros were modified by commit dfda6ebaec6763090fb78b458a979b558c50b39b but the comment remain unchanged. Patch by Yugo Nagata. Back patched to 9.3 and beyond.
2017-07-31Always use 2048 bit DH parameters for OpenSSL ephemeral DH ciphers.Heikki Linnakangas
1024 bits is considered weak these days, but OpenSSL always passes 1024 as the key length to the tmp_dh callback. All the code to handle other key lengths is, in fact, dead. To remedy those issues: * Only include hard-coded 2048-bit parameters. * Set the parameters directly with SSL_CTX_set_tmp_dh(), without the callback * The name of the file containing the DH parameters is now a GUC. This replaces the old hardcoded "dh1024.pem" filename. (The files for other key lengths, dh512.pem, dh2048.pem, etc. were never actually used.) This is not a new problem, but it doesn't seem worth the risk and churn to backport. If you care enough about the strength of the DH parameters on old versions, you can create custom DH parameters, with as many bits as you wish, and put them in the "dh1024.pem" file. Per report by Nicolas Guini and Damian Quiroga. Reviewed by Michael Paquier. Discussion: https://www.postgresql.org/message-id/CAMxBoUyjOOautVozN6ofzym828aNrDjuCcOTcCquxjwS-L2hGQ@mail.gmail.com
2017-07-30Move ExecProcNode from dispatch to function pointer based model.Andres Freund
This allows us to add stack-depth checks the first time an executor node is called, and skip that overhead on following calls. Additionally it yields a nice speedup. While it'd probably have been a good idea to have that check all along, it has become more important after the new expression evaluation framework in b8d7f053c5c2bf2a7e - there's no stack depth check in common paths anymore now. We previously relied on ExecEvalExpr() being executed somewhere. We should move towards that model for further routines, but as this is required for v10, it seems better to only do the necessary (which already is quite large). Author: Andres Freund, Tom Lane Reported-By: Julien Rouhaud Discussion: https://postgr.es/m/22833.1490390175@sss.pgh.pa.us https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com
2017-07-26Update copyright in recently added filesAlvaro Herrera
2017-07-25Fix race conditions in replication slot operationsAlvaro Herrera
It is relatively easy to get a replication slot to look as still active while one process is in the process of getting rid of it; when some other process tries to "acquire" the slot, it would fail with an error message of "replication slot XYZ is active for PID N". The error message in itself is fine, except that when the intention is to drop the slot, it is unhelpful: the useful behavior would be to wait until the slot is no longer acquired, so that the drop can proceed. To implement this, we use a condition variable so that slot acquisition can be told to wait on that condition variable if the slot is already acquired, and we make any change in active_pid broadcast a signal on the condition variable. Thus, as soon as the slot is released, the drop will proceed properly. Reported by: Tom Lane Discussion: https://postgr.es/m/11904.1499039688@sss.pgh.pa.us Authors: Petr Jelínek, Álvaro Herrera
2017-07-21Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition bounds.Dean Rasheed
Previously, UNBOUNDED meant no lower bound when used in the FROM list, and no upper bound when used in the TO list, which was OK for single-column range partitioning, but problematic with multiple columns. For example, an upper bound of (10.0, UNBOUNDED) would not be collocated with a lower bound of (10.0, UNBOUNDED), thus making it difficult or impossible to define contiguous multi-column range partitions in some cases. Fix this by using MINVALUE and MAXVALUE instead of UNBOUNDED to represent a partition column that is unbounded below or above respectively. This syntax removes any ambiguity, and ensures that if one partition's lower bound equals another partition's upper bound, then the partitions are contiguous. Also drop the constraint prohibiting finite values after an unbounded column, and just document the fact that any values after MINVALUE or MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED multiple times, which was needlessly verbose. Note: Forces a post-PG 10 beta2 initdb. Report by Amul Sul, original patch by Amit Langote with some additional hacking by me. Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
2017-07-19Add static assertions about pg_control fitting into one disk sector.Tom Lane
When pg_control was first designed, sizeof(ControlFileData) was small enough that a comment seemed like plenty to document the assumption that it'd fit into one disk sector. Now it's nearly 300 bytes, raising the possibility that somebody would carelessly add enough stuff to create a problem. Let's add a StaticAssertStmt() to ensure that the situation doesn't pass unnoticed if it ever occurs. While at it, rename PG_CONTROL_SIZE to PG_CONTROL_FILE_SIZE to make it clearer what that symbol means, and convert the existing runtime comparisons of sizeof(ControlFileData) vs. PG_CONTROL_FILE_SIZE to be static asserts --- we didn't have that technology when this code was first written. Discussion: https://postgr.es/m/9192.1500490591@sss.pgh.pa.us
2017-07-18Fix serious performance problems in json(b) to_tsvector().Tom Lane
In an off-list followup to bug #14745, Bob Jones complained that to_tsvector() on a 2MB jsonb value took an unreasonable amount of time and space --- enough to draw the wrath of the OOM killer on his machine. On my machine, his example proved to require upwards of 18 seconds and 4GB, which seemed pretty bogus considering that to_tsvector() on the same data treated as text took just a couple hundred msec and 10 or so MB. On investigation, the problem is that the implementation scans each string element of the json(b) and converts it to tsvector separately, then applies tsvector_concat() to join those separate tsvectors. The unreasonable memory usage came from leaking every single one of the transient tsvectors --- but even without that mistake, this is an O(N^2) or worse algorithm, because tsvector_concat() has to repeatedly process the words coming from earlier elements. We can fix it by accumulating all the lexeme data and applying make_tsvector() just once. As a side benefit, that also makes the desired adjustment of lexeme positions far cheaper, because we can just tweak the running "pos" counter between JSON elements. In passing, try to make the explanation of that tweak more intelligible. (I didn't think that a barely-readable comment far removed from the actual code was helpful.) And do some minor other code beautification.
2017-07-18Use a real RT index when setting up partition tuple routing.Robert Haas
Before, we always used a dummy value of 1, but that's not right when the partitioned table being modified is inside of a WITH clause rather than part of the main query. Amit Langote, reported and reviewd by Etsuro Fujita, with a comment change by me. Discussion: http://postgr.es/m/ee12f648-8907-77b5-afc0-2980bcb0aa37@lab.ntt.co.jp
2017-07-14Code review for NextValueExpr expression node type.Tom Lane
Add missing infrastructure for this node type, notably in ruleutils.c where its lack could demonstrably cause EXPLAIN to fail. Add outfuncs/readfuncs support. (outfuncs support is useful today for debugging purposes. The readfuncs support may never be needed, since at present it would only matter for parallel query and NextValueExpr should never appear in a parallelizable query; but it seems like a bad idea to have a primnode type that isn't fully supported here.) Teach planner infrastructure that NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node with cost cpu_operator_cost. Given its limited scope of usage, there *might* be no live bug today from the lack of that knowledge, but it's certainly going to bite us on the rear someday. Teach pg_stat_statements about the new node type, too. While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction, XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost. Failing to do this for SQLValueFunction was an oversight in my commit 0bb51aa96. The others are longer-standing oversights, but no time like the present to fix them. (In principle, CoerceToDomain could have cost much higher than this, but it doesn't presently seem worth trying to examine the domain's constraints here.) Modify execExprInterp.c to execute NextValueExpr as an out-of-line function; it seems quite unlikely to me that it's worth insisting that it be inlined in all expression eval methods. Besides, providing the out-of-line function doesn't stop anyone from inlining if they want to. Adjust some places where NextValueExpr support had been inserted with the aid of a dartboard rather than keeping it in the same order as elsewhere. Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
2017-07-13Merge remote-tracking branch 'remotes/PGSQL/master' of PG 10Pavan Deolasee
This merge includes all commits upto bc2d716ad09fceeb391c755f78c256ddac9d3b9f of PG 10.
2017-07-10Stamp 10beta2.Tom Lane
2017-07-09Remove storm_catalog schemaTomas Vondra
The storm_catalog schema is supposed to contain the same catalogs and views as pg_catalog, but filtered to the current database. The use case for this is multi-tenant systems, which was a StormDB feature. But on XL this is mostly irrelevant, and the schema was not populated since commit 8096e3edf17b260de15472eb04567d1beec1e3e6 which disabled this part of initdb. So instead of fixing the regression failures in misc_sanity caused by this (initdb-time schema with no pinned objects), just rip all the remaining bits out, including the pgxc_catalog_remap GUC etc. This also removes the setup_storm() call disabled by 8096e3edf1, as the function got removed since then.
2017-06-30Fix locking in WAL receiver/sender shmem state structsAlvaro Herrera
In WAL receiver and WAL server, some accesses to their corresponding shared memory control structs were done without holding any kind of lock, which could lead to inconsistent and possibly insecure results. In walsender, fix by clarifying the locking rules and following them correctly, as documented in the new comment in walsender_private.h; namely that some members can be read in walsender itself without a lock, because the only writes occur in the same process. The rest of the struct requires spinlock for accesses, as usual. In walreceiver, fix by always holding spinlock while accessing the struct. While there is potentially a problem in all branches, it is minor in stable ones. This only became a real problem in pg10 because of quorum commit in synchronous replication (commit 3901fd70cc7c), and a potential security problem in walreceiver because a superuser() check was removed by default monitoring roles (commit 25fff40798fc). Thus, no backpatch. In passing, clean up some leftover braces which were used to create unconditional blocks. Once upon a time these were used for volatile-izing accesses to those shmem structs, which is no longer required. Many other occurrences of this pattern remain. Author: Michaël Paquier Reported-by: Michaël Paquier Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi, Thomas Munro, Robert Haas Discussion: https://postgr.es/m/CAB7nPqTWYqtzD=LN_oDaf9r-hAjUEPAy0B9yRkhcsLdRN8fzrw@mail.gmail.com
2017-06-30Remove outdated commentPeter Eisentraut
Author: Thomas Munro <thomas.munro@enterprisedb.com>
2017-06-28Change pg_ctl to detect server-ready by watching status in postmaster.pid.Tom Lane
Traditionally, "pg_ctl start -w" has waited for the server to become ready to accept connections by attempting a connection once per second. That has the major problem that connection issues (for instance, a kernel packet filter blocking traffic) can't be reliably told apart from server startup issues, and the minor problem that if server startup isn't quick, we accumulate "the database system is starting up" spam in the server log. We've hacked around many of the possible connection issues, but it resulted in ugly and complicated code in pg_ctl.c. In commit c61559ec3, I changed the probe rate to every tenth of a second. That prompted Jeff Janes to complain that the log-spam problem had become much worse. In the ensuing discussion, Andres Freund pointed out that we could dispense with connection attempts altogether if the postmaster were changed to report its status in postmaster.pid, which "pg_ctl start" already relies on being able to read. This patch implements that, teaching postmaster.c to report a status string into the pidfile at the same state-change points already identified as being of interest for systemd status reporting (cf commit 7d17e683f). pg_ctl no longer needs to link with libpq at all; all its functions now depend on reading server files. In support of this, teach AddToDataDirLockFile() to allow addition of postmaster.pid lines in not-necessarily-sequential order. This is needed on Windows where the SHMEM_KEY line will never be written at all. We still have the restriction that we don't want to truncate the pidfile; document the reasons for that a bit better. Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode is broken --- before, they'd just wait out the sixty seconds until the loop gives up, and then report success anyway. (Yes, I found that out the hard way.) While at it, arrange for pg_ctl to not need to #include miscadmin.h; as a rather low-level backend header, requiring that to be compilable client-side is pretty dubious. This requires moving the #define's associated with the pidfile into a new header file, and moving PG_BACKEND_VERSIONSTR someplace else. For lack of a clearly better "someplace else", I put it into port.h, beside the declaration of find_other_exec(), since most users of that macro are passing the value to find_other_exec(). (initdb still depends on miscadmin.h, but at least pg_ctl and pg_upgrade no longer do.) In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the output of "postgres -V", which remarkably it had never done before. Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
2017-06-28Fix transition tables for ON CONFLICT.Andrew Gierth
We now disallow having triggers with both transition tables and ON INSERT OR UPDATE (which was a PG extension to the spec anyway), because in this case it's not at all clear how the transition tables should work for an INSERT ... ON CONFLICT query. Separate ON INSERT and ON UPDATE triggers with transition tables are allowed, and the transition tables for these reflect only the inserted and only the updated tuples respectively. Patch by Thomas Munro Discussion: https://postgr.es/m/CAEepm%3D11KHQ0JmETJQihSvhZB5mUZL2xrqHeXbCeLhDiqQ39%3Dw%40mail.gmail.com
2017-06-28Fix transition tables for wCTEs.Andrew Gierth
The original coding didn't handle this case properly; each separate DML substatement needs its own set of transitions. Patch by Thomas Munro Discussion: https://postgr.es/m/CAL9smLCDQ%3D2o024rBgtD4WihzX8B3C6u_oSQ2K3%2BR5grJrV0bg%40mail.gmail.com
2017-06-28Fix transition tables for partition/inheritance.Andrew Gierth
We disallow row-level triggers with transition tables on child tables. Transition tables for triggers on the parent table contain only those columns present in the parent. (We can't mix tuple formats in a single transition table.) Patch by Thomas Munro Discussion: https://postgr.es/m/CA%2BTgmoZzTBBAsEUh4MazAN7ga%3D8SsMC-Knp-6cetts9yNZUCcg%40mail.gmail.com
2017-06-28Merge remote-tracking branch 'remotes/origin/master' into xl10develPavan Deolasee
This merges the current master branch of XL with the XL 10 development branch. Commits upto f72330316ea5796a2b11a05710b98eba4e706788 are included in this merge.
2017-06-27Merge PG10 master branch into xl10develPavan Deolasee
This commit merges PG10 branch upto commit 2710ccd782d0308a3fa1ab193531183148e9b626. Regression tests show no noteworthy additional failures. This merge includes major pgindent work done with the newer version of pgindent
2017-06-24Further hacking on ICU collation creation and usage.Tom Lane
pg_import_system_collations() refused to create any ICU collations if the current database's encoding didn't support ICU. This is wrongheaded: initdb must initialize pg_collation in an encoding-independent way since it might be used in other databases with different encodings. The reason for the restriction seems to be that get_icu_locale_comment() used icu_from_uchar() to convert the UChar-format display name, and that unsurprisingly doesn't know what to do in unsupported encodings. But by the same token that the initial catalog contents must be encoding-independent, we can't allow non-ASCII characters in the comment strings. So we don't really need icu_from_uchar() here: just check for Unicode codes outside the ASCII range, and if there are none, the format conversion is trivial. If there are some, we can simply not install the comment. (In my testing, this affects only Norwegian Bokmål, which has given us trouble before.) For paranoia's sake, also check for non-ASCII characters in ICU locale names, and skip such locales, as we do for libc locales. I don't currently have a reason to believe that this will ever reject anything, but then again the libc maintainers should have known better too. With just the import changes, ICU collations can be found in pg_collation in databases with unsupported encodings. This resulted in more or less clean failures at runtime, but that's not how things act for unsupported encodings with libc collations. Make it work the same as our traditional behavior for libc collations by having collation lookup take into account whether is_encoding_supported_by_icu(). Adjust documentation to match. Also, expand Table 23.1 to show which encodings are supported by ICU. catversion bump because of likely change in pg_collation/pg_description initial contents in ICU-enabled builds. Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23Rethink behavior of pg_import_system_collations().Tom Lane
Marco Atzeri reported that initdb would fail if "locale -a" reported the same locale name more than once. All previous versions of Postgres implicitly de-duplicated the results of "locale -a", but the rewrite to move the collation import logic into C had lost that property. It had also lost the property that locale names matching built-in collation names were silently ignored. The simplest way to fix this is to make initdb run the function in if-not-exists mode, which means that there's no real use-case for non if-not-exists mode; we might as well just drop the boolean argument and simplify the function's definition to be "add any collations not already known". This change also gets rid of some odd corner cases caused by the fact that aliases were added in if-not-exists mode even if the function argument said otherwise. While at it, adjust the behavior so that pg_import_system_collations() doesn't spew "collation foo already exists, skipping" messages during a re-run; that's completely unhelpful, especially since there are often hundreds of them. And make it return a count of the number of collations it did add, which seems like it might be helpful. Also, re-integrate the previous coding's property that it would make a deterministic selection of which alias to use if there were conflicting possibilities. This would only come into play if "locale -a" reports multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8", but that hardly seems out of the question. In passing, fix incorrect behavior in pg_import_system_collations()'s ICU code path: it neglected CommandCounterIncrement, which would result in failures if ICU returns duplicate names, and it would try to create comments even if a new collation hadn't been created. Also, reorder operations in initdb so that the 'ucs_basic' collation is created before calling pg_import_system_collations() not after. This prevents a failure if "locale -a" were to report a locale named that. There's no reason to think that that ever happens in the wild, but the old coding would have survived it, so let's be equally robust. Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23Fix memory leakage in ICU encoding conversion, and other code review.Tom Lane
Callers of icu_to_uchar() neglected to pfree the result string when done with it. This results in catastrophic memory leaks in varstr_cmp(), because of our prevailing assumption that btree comparison functions don't leak memory. For safety, make all the call sites clean up leaks, though I suspect that we could get away without it in formatting.c. I audited callers of icu_from_uchar() as well, but found no places that seemed to have a comparable issue. Add function API specifications for icu_to_uchar() and icu_from_uchar(); the lack of any thought-through specification is perhaps not unrelated to the existence of this bug in the first place. Fix icu_to_uchar() to guarantee a nul-terminated result; although no existing caller appears to care, the fact that it would have been nul-terminated except in extreme corner cases seems ideally designed to bite someone on the rear someday. Fix ucnv_fromUChars() destCapacity argument --- in the worst case, that could perhaps have led to a non-nul-terminated result, too. Fix icu_from_uchar() to have a more reasonable definition of the function result --- no callers are actually paying attention, so this isn't a live bug, but it's certainly sloppily designed. Const-ify icu_from_uchar()'s input string for consistency. That is not the end of what needs to be done to these functions, but it's as much as I have the patience for right now. Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-21Reformat comments about ResultRelInfoPeter Eisentraut
Also add a comment on its new member PartitionRoot. Reported-by: Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>