postgresql.git - This is the main PostgreSQL git repository.

Age	Commit message	Author
2014-12-02	pageinspect/BRIN: minor tweaks	Alvaro Herrera
	Michael Paquier Double-dash additions suggested by Peter Geoghegan
2014-12-01	Fix hstore_to_json_loose's detection of valid JSON number values.	Andrew Dunstan
	We expose a function IsValidJsonNumber that internally calls the lexer for json numbers. That allows us to use the same test everywhere, instead of inventing a broken test for hstore conversions. The new function is also used in datum_to_json, replacing the code that is now moved to the new function. Backpatch to 9.3 where hstore_to_json_loose was introduced.
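The JSON number grammar that a lexer-based check enforces (RFC 7159) can be written as a small standalone validator. This is a sketch, not PostgreSQL's `IsValidJsonNumber()`, which reuses the backend's JSON lexer. The names `json_number_is_valid` and `json_cstring_is_number` are hypothetical. It shows why an ad hoc test is easy to get wrong: inputs such as `01`, `1.`, `+1` or `NaN` look numeric but are not valid JSON numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for an ASCII decimal digit (locale-independent). */
static bool
is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/*
 * Hypothetical checker for the JSON number grammar (RFC 7159):
 *   number = [ "-" ] int [ frac ] [ exp ]
 *   int    = "0" / ( digit1-9 *DIGIT )
 *   frac   = "." 1*DIGIT
 *   exp    = ( "e" / "E" ) [ "-" / "+" ] 1*DIGIT
 * All len bytes of s must be consumed for the input to count as valid.
 */
static bool
json_number_is_valid(const char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    if (i < len && s[i] == '-')
        i++;

    /* integer part: a lone 0, or a nonzero digit followed by any digits */
    if (i < len && s[i] == '0')
    {
        i++;
    }
    else if (i < len && s[i] >= '1' && s[i] <= '9')
    {
        while (i < len && is_digit(s[i]))
            i++;
    }
    else
        return false;

    /* optional fraction: '.' must be followed by at least one digit */
    if (i < len && s[i] == '.')
    {
        i++;
        if (i >= len || !is_digit(s[i]))
            return false;
        while (i < len && is_digit(s[i]))
            i++;
    }

    /* optional exponent: 'e' or 'E', optional sign, at least one digit */
    if (i < len && (s[i] == 'e' || s[i] == 'E'))
    {
        i++;
        if (i < len && (s[i] == '+' || s[i] == '-'))
            i++;
        if (i >= len || !is_digit(s[i]))
            return false;
        while (i < len && is_digit(s[i]))
            i++;
    }

    /* leftovers such as the "1" in "01" make the whole input invalid */
    return i == len;
}

/* Convenience wrapper for NUL-terminated strings. */
static bool
json_cstring_is_number(const char *s)
{
    return json_number_is_valid(s, strlen(s));
}
```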
2014-11-30	Move test modules from contrib to src/test/modules	Alvaro Herrera
	This is advance preparation for introducing even more test modules; the easy solution is to add them to contrib, but that's bloated enough that it seems a good time to think of something different. Moved modules are dummy_seclabel, test_shm_mq, test_parser and worker_spi. (test_decoding was also a candidate, but there was too much opposition to moving that one. We can always reconsider later.)
2014-11-28	Add bms_next_member(), and use it where appropriate.	Tom Lane
	This patch adds a way of iterating through the members of a bitmapset nondestructively, unlike the old way with bms_first_member(). While bms_next_member() is very slightly slower than bms_first_member() (at least for typical-size bitmapsets), eliminating the need to palloc and pfree a temporary copy of the target bitmapset is a significant win. So this method should be preferred in all cases where a temporary copy would be necessary. Tom Lane, with suggestions from Dean Rasheed and David Rowley
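Here is a minimal sketch of nondestructive iteration. It uses a hypothetical fixed-size `ToyBitmapset` instead of PostgreSQL's `Bitmapset`. The loop convention assumed here is to start from -1 and stop when a negative value comes back, which is how `bms_next_member()` is typically called. The implementation itself is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy bitmapset: a fixed array of 64-bit words. */
#define TOY_NWORDS 4
#define TOY_BITS_PER_WORD 64

typedef struct ToyBitmapset
{
    uint64_t words[TOY_NWORDS];
} ToyBitmapset;

static void
toy_add_member(ToyBitmapset *a, int x)
{
    assert(x >= 0 && x < TOY_NWORDS * TOY_BITS_PER_WORD);
    a->words[x / TOY_BITS_PER_WORD] |= (uint64_t) 1 << (x % TOY_BITS_PER_WORD);
}

/*
 * Return the smallest member greater than prevbit, or -2 if there is none.
 * The set is only read, never modified, so no temporary copy is needed.
 */
static int
toy_next_member(const ToyBitmapset *a, int prevbit)
{
    int x = prevbit + 1;

    assert(prevbit >= -1);
    while (x < TOY_NWORDS * TOY_BITS_PER_WORD)
    {
        int wordnum = x / TOY_BITS_PER_WORD;
        int bitnum = x % TOY_BITS_PER_WORD;
        /* ignore bits at or below prevbit within the current word */
        uint64_t w = a->words[wordnum] & (~(uint64_t) 0 << bitnum);

        if (w != 0)
        {
            /* position of the lowest set bit remaining in this word */
            int b = 0;

            while (!(w & ((uint64_t) 1 << b)))
                b++;
            return wordnum * TOY_BITS_PER_WORD + b;
        }
        /* nothing left in this word; continue at the next one */
        x = (wordnum + 1) * TOY_BITS_PER_WORD;
    }
    return -2;
}
```

A caller writes `x = -1; while ((x = toy_next_member(&s, x)) >= 0) { ... }`. Because the set is only read, no copy has to be allocated and freed, unlike a first-member loop that clears each bit it returns.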
2014-11-27	Free libxml2/libxslt resources in a safer order.	Tom Lane
	Mark Simonetti reported that libxslt sometimes crashes for him, and that swapping xslt_process's object-freeing calls around to do them in reverse order of creation seemed to fix it. I've not reproduced the crash, but valgrind clearly shows a reference to already-freed memory, which is consistent with the idea that shutdown of the xsltTransformContext is trying to reference the already-freed stylesheet or input document. With this patch, valgrind is no longer unhappy. I have an inquiry in to see if this is a libxslt bug or if we're just abusing the library; but even if it's a library bug, we'd want to adjust our code so it doesn't fail with unpatched libraries. Back-patch to all supported branches, because we've been doing this in the wrong(?) order for a long time.
2014-11-25	Make Port->ssl_in_use available, even when built with !USE_SSL	Heikki Linnakangas
	Code that checks the flag no longer needs #ifdefs, which is more convenient. In particular, this makes it easier to write extensions that depend on it. In passing, modify sslinfo's ssl_is_used function to check ssl_in_use instead of the OpenSSL-specific 'ssl' pointer. It doesn't make any difference currently, as sslinfo is only compiled when built with OpenSSL, but it seems cleaner anyway.
2014-11-24	Add infrastructure to save and restore GUC values.	Robert Haas
	This is further infrastructure for parallelism. Amit Khandekar, Noah Misch, Robert Haas
2014-11-22	Fix mishandling of system columns in FDW queries.	Tom Lane
	postgres_fdw would send query conditions involving system columns to the remote server, even though it makes no effort to ensure that system columns other than CTID match what the remote side thinks. tableoid, in particular, probably won't match and might have some use in queries. Hence, prevent sending conditions that include non-CTID system columns. Also, create_foreignscan_plan neglected to check local restriction conditions while determining whether to set fsSystemCol for a foreign scan plan node. This again would bollix the results for queries that test a foreign table's tableoid. Back-patch the first fix to 9.3 where postgres_fdw was introduced. Back-patch the second to 9.2. The code is probably broken in 9.1 as well, but the patch doesn't apply cleanly there; given the weak state of support for FDWs in 9.1, it doesn't seem worth fixing. Etsuro Fujita, reviewed by Ashutosh Bapat, and somewhat modified by me
2014-11-21	Add pageinspect functions for inspecting GIN indexes.	Heikki Linnakangas
	Patch by me, Peter Geoghegan and Michael Paquier, reviewed by Amit Kapila.
2014-11-20	Revamp the WAL record format.	Heikki Linnakangas
	Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number out of the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now dissects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-dissected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compensates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
2014-11-19	Avoid file descriptor leak in pg_test_fsync.	Robert Haas
	This can cause problems on Windows, where files that are still open can't be unlinked. Jeff Janes
2014-11-15	postgres_fdw.h: don't pull in rel.h when relcache.h is enough	Alvaro Herrera

2014-11-13	Fix and improve cache invalidation logic for logical decoding.	Andres Freund
	There are basically three situations in which logical decoding needs to perform cache invalidation: during/after replaying a transaction with catalog changes, when skipping an uninteresting transaction that performed catalog changes, and when erroring out while replaying a transaction. Unfortunately these three cases were all done slightly differently, partially because 8de3e410fa, which greatly simplifies matters, got committed in the midst of the development of logical decoding. The actually problematic case was when logical decoding skipped transaction commits (and thus processed invalidations). When used via the SQL interface, cache invalidation could access the catalog, which is bad because we didn't set up enough state to allow that correctly. It'd not be hard to set up sufficient state, but the simpler solution is to always perform cache invalidation outside a valid transaction. Also make the different cache invalidation cases look as similar as possible, to ease code review. This fixes the assertion failure reported by Antonin Houska in 53EE02D9.7040702@gmail.com. The presented testcase has been expanded into a regression test. Backpatch to 9.4, where logical decoding was introduced.
2014-11-13	Move the guts of our Levenshtein implementation into core.	Robert Haas
	The hope is that we can use this to produce better diagnostics in some cases. Peter Geoghegan, reviewed by Michael Paquier, with some further changes by me.
2014-11-12	Fix several weaknesses in slot and logical replication on-disk serialization.	Andres Freund
	Heikki noticed in 544E23C0.8090605@vmware.com that slot.c and snapbuild.c were missing the FIN_CRC32 call when computing/checking checksums of on-disk files. That doesn't lower the error detection capabilities of the checksum, but is inconsistent with other usages. In a followup mail Heikki also noticed that, contrary to a comment, the 'version' and 'length' struct fields of the replication slots' on-disk data were not covered by the checksum. That's not likely to lead to actually missed corruption, as those fields are cross-checked with the expected version and the actual file length. But it's wrong nonetheless. As fixing these issues makes existing on-disk files unreadable, bump the expected versions of on-disk files for both slots and logical decoding historic catalog snapshots. This means that loading old files will fail with ERROR: "replication slot file ... has unsupported version 1" and ERROR: "snapbuild state file ... has unsupported version 1 instead of 2" respectively. Given the low likelihood of anybody already using these new features in a production setup, that seems acceptable. Fixing these issues made me notice that there's no regression test covering the loading of historic snapshots from disk, so add one. Backpatch to 9.4 where these features were introduced.
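The message says a missing FIN_CRC32 step does not weaken error detection. The reason is that finishing is only a final XOR with all-ones bits. Skipping it yields the bitwise complement of the correct checksum, which is a one-to-one mapping. The sketch below assumes CRC-32C and a bit-at-a-time update for concreteness. The function names are hypothetical and are not PostgreSQL's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit-at-a-time reflected CRC-32C update (polynomial 0x82F63B78). */
static uint32_t
crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc;
}

/* Init, compute, finish: the three steps the INIT/COMP/FIN pattern names. */
static uint32_t
crc32c_full(const void *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;             /* init */

    crc = crc32c_update(crc, data, len);    /* compute */
    return crc ^ 0xFFFFFFFFu;               /* finish */
}

/* The same computation with the finish step forgotten. */
static uint32_t
crc32c_missing_fin(const void *data, size_t len)
{
    return crc32c_update(0xFFFFFFFFu, data, len);
}
```

Since `crc32c_missing_fin(x) == ~crc32c_full(x)` for every input, two inputs collide under one variant exactly when they collide under the other. The fix is therefore about consistency, not detection strength.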
2014-11-12	Add interrupt checks to contrib/pg_prewarm.	Andres Freund
	Previously, the extension's pg_prewarm() function didn't check for interrupts once it started "warming" data. Since individual calls can take a long time, it's important for them to be interruptible. Backpatch to 9.4 where pg_prewarm was introduced.
2014-11-11	Loop when necessary in contrib/pgcrypto's pktreader_pull().	Tom Lane
	This fixes a scenario in which pgp_sym_decrypt() failed with "Wrong key or corrupt data" on messages whose length is 6 less than a power of 2. Per bug #11905 from Connor Penhale. Fix by Marko Tiikkaja, regression test case from Jeff Janes.
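The general pattern behind the fix is that a pull-style reader may return fewer bytes than requested. A caller that needs an exact count must therefore loop until it has them or the source is exhausted. Below is a minimal sketch with a hypothetical chunked source. None of these names come from pgcrypto, and the sketch does not reproduce the specific message-length condition from the bug report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical source that hands out at most `chunk` bytes per call. */
typedef struct ChunkSource
{
    const unsigned char *data;
    size_t len;
    size_t pos;
    size_t chunk;
} ChunkSource;

/* Copy up to `want` bytes; may return fewer, and returns 0 at EOF. */
static size_t
source_pull(ChunkSource *src, unsigned char *dst, size_t want)
{
    size_t avail = src->len - src->pos;
    size_t n = want;

    if (n > src->chunk)
        n = src->chunk;
    if (n > avail)
        n = avail;
    memcpy(dst, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* Keep pulling until `want` bytes have arrived or the source runs dry. */
static size_t
pull_fully(ChunkSource *src, unsigned char *dst, size_t want)
{
    size_t got = 0;

    while (got < want)
    {
        size_t n = source_pull(src, dst + got, want - got);

        if (n == 0)
            break;              /* EOF: the caller sees a short count */
        got += n;
    }
    return got;
}
```

A caller that trusts a single `source_pull()` call treats a legitimate short read as missing data. `pull_fully()` only returns short at true end of input.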
2014-11-07	Update pg_xlogdump's .gitignore for brindesc.c.	Robert Haas

2014-11-07	BRIN: Block Range Indexes	Alvaro Herrera
	BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
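The minmax idea can be sketched in a few dozen lines of C. Keep one (min, max) pair per block range. For a qual such as `col = key`, return every page of each range whose interval could contain `key`, and leave the recheck to the heap pass. The layout and all names are hypothetical: a fixed `TUPLES_PER_PAGE` int values per page and `PAGES_PER_RANGE` pages per range. This is not the BRIN access method's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: each heap page holds TUPLES_PER_PAGE int values. */
#define TUPLES_PER_PAGE 4
#define PAGES_PER_RANGE 2

typedef struct MinMaxSummary
{
    int min;
    int max;
} MinMaxSummary;

/* Build one summary per block range; returns the number of ranges. */
static size_t
summarize(const int *values, size_t npages, MinMaxSummary *out)
{
    size_t nranges = (npages + PAGES_PER_RANGE - 1) / PAGES_PER_RANGE;

    for (size_t r = 0; r < nranges; r++)
    {
        out[r].min = INT_MAX;
        out[r].max = INT_MIN;
        for (size_t p = r * PAGES_PER_RANGE;
             p < npages && p < (r + 1) * PAGES_PER_RANGE; p++)
        {
            for (size_t t = 0; t < TUPLES_PER_PAGE; t++)
            {
                int v = values[p * TUPLES_PER_PAGE + t];

                if (v < out[r].min)
                    out[r].min = v;
                if (v > out[r].max)
                    out[r].max = v;
            }
        }
    }
    return nranges;
}

/*
 * Lossy scan for "col = key": mark every page of each range whose
 * [min, max] could contain key. Marked pages still need rechecking.
 */
static size_t
scan_equal(const MinMaxSummary *sum, size_t nranges, size_t npages,
           int key, bool *page_matches)
{
    size_t nmarked = 0;

    for (size_t p = 0; p < npages; p++)
        page_matches[p] = false;
    for (size_t r = 0; r < nranges; r++)
    {
        if (key < sum[r].min || key > sum[r].max)
            continue;           /* the summary proves no match here */
        for (size_t p = r * PAGES_PER_RANGE;
             p < npages && p < (r + 1) * PAGES_PER_RANGE; p++)
        {
            page_matches[p] = true;
            nmarked++;
        }
    }
    return nmarked;
}
```

Consider pages holding 1..4, 5..8, 100..103 and 104..107, with two pages per range. `scan_equal` for key 6 marks pages 0 and 1 and skips the second range entirely. Page 0 is a false positive that the recheck discards, which is what makes the bitmap lossy.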
2014-11-06	Move the backup-block logic from XLogInsert to a new file, xloginsert.c.	Heikki Linnakangas
	xlog.c is huge; this makes it a little bit smaller, which is nice. Functions related to putting together the WAL record are in xloginsert.c, and the lower-level stuff for managing WAL buffers and such is in xlog.c. Also move the definition of XLogRecord to a separate header file. This causes churn in the #includes of all the files that write WAL records, and redo routines, but it avoids pulling xlog.h into most places. Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-05	Fix volatility markings of some contrib I/O functions.	Tom Lane
	In general, datatype I/O functions are supposed to be immutable or at worst stable. Some contrib I/O functions were, through oversight, not marked with any volatility property at all, which made them VOLATILE. Since (most of) these functions actually behave immutably, the erroneous marking isn't terribly harmful; but it can be user-visible in certain circumstances, as per a recent bug report from Joe Van Dyk in which a cast to text was disallowed in an expression index definition. To fix, just adjust the declarations in the extension SQL scripts. If we were being very fussy about this, we'd bump the extension version numbers, but that seems like more trouble (for both developers and users) than the problem is worth. A fly in the ointment is that chkpass_in actually is volatile, because of its use of random() to generate a fresh salt when presented with a not-yet-encrypted password. This is bad because of the general assumption that I/O functions aren't volatile: the consequence is that records or arrays containing chkpass elements may have input behavior a bit different from a bare chkpass column. But there seems no way to fix this without breaking existing usage patterns for chkpass, and the consequences of the inconsistency don't seem bad enough to justify that. So for the moment, just document it in a comment. Since we're not bumping version numbers, there seems no harm in back-patching these fixes; at least future installations will get the functions marked correctly.
2014-11-04	Switch to CRC-32C in WAL and other places.	Heikki Linnakangas
	The old algorithm was found to not be the usual CRC-32 algorithm, used by Ethernet et al. We were using a non-reflected lookup table with code meant for a reflected lookup table. That's a strange combination that AFAICS does not correspond to any bit-wise CRC calculation, which makes it difficult to reason about its properties. Although it has worked well in practice, it seems safer to use a well-known algorithm. Since we're changing the algorithm anyway, we might as well choose a different polynomial. The Castagnoli polynomial has better error-correcting properties than the traditional CRC-32 polynomial, even if we had implemented it correctly. Another reason for picking that is that some new CPUs have hardware support for calculating CRC-32C, but not CRC-32, let alone our strange variant of it. This patch doesn't add any support for such hardware, but a future patch could now do that. The old algorithm is kept around for tsquery and pg_trgm, which use the values in indexes that need to remain compatible so that pg_upgrade works. While we're at it, share the old lookup table for CRC-32 calculation between hstore, ltree and core. They all use the same table, so we might as well.
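Below is a table-driven CRC-32C sketch in which both halves agree. The table is built for the reflected (right-shifting) form of the Castagnoli polynomial, and the update loop consumes it the reflected way. Per the message, the old code mixed a non-reflected table with loop code written for a reflected one. The names are hypothetical, and no hardware acceleration is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Castagnoli polynomial 0x1EDC6F41, bit-reversed for the reflected algorithm. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical global table; entry i is the register update for byte value i. */
static uint32_t crc32c_table[256];

static void
crc32c_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = i;

        /* eight right shifts, folding in the polynomial when the low bit is set */
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ CRC32C_POLY_REFLECTED : c >> 1;
        crc32c_table[i] = c;
    }
}

/*
 * Reflected table-driven CRC-32C: combine the register's low byte with the
 * next input byte, look up the table, and shift the register right by 8.
 * The register starts at all ones and is inverted at the end.
 */
static uint32_t
crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
        crc = crc32c_table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

After `crc32c_init_table()`, the standard check input `"123456789"` yields 0xE3069283, the published CRC-32C check value.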
2014-11-03	Docs: fix incorrect spelling of contrib/pgcrypto option.	Tom Lane
	pgp_sym_encrypt's option is spelled "sess-key", not "enable-session-key". Spotted by Jeff Janes. In passing, improve a comment in pgp-pgsql.c to make it clearer that the debugging options are intentionally undocumented.
2014-11-03	Remove dead-since-introduction pgcrypto code.	Noah Misch
	Marko Tiikkaja
2014-10-20	pg_test_fsync: Update output format	Peter Eisentraut
	Apparently, computers are now a bit faster than when this was first added, so we need to make room for a digit or two in the ops/sec format. While we're at it, adjust some of the other output for a more consistent line length.
2014-10-20	Fix file-identification comment in contrib/pgcrypto/pgcrypto--1.2.sql.	Tom Lane
	Cosmetic oversight in commit 32984d8fc3dbb90a3fafb69fece0134f1ea790f9. Marko Tiikkaja
2014-10-16	Support timezone abbreviations that sometimes change.	Tom Lane
	Up to now, PG has assumed that any given timezone abbreviation (such as "EDT") represents a constant GMT offset in the usage of any particular region; we had a way to configure what that offset was, but not for it to be changeable over time. But, as with most things horological, this view of the world is too simplistic: there are numerous regions that have at one time or another switched to a different GMT offset but kept using the same timezone abbreviation. Almost the entire Russian Federation did that a few years ago, and later this month they're going to do it again. And there are similar examples all over the world. To cope with this, invent the notion of a "dynamic timezone abbreviation", which is one that is referenced to a particular underlying timezone (as defined in the IANA timezone database) and means whatever it currently means in that zone. For zones that use or have used daylight-savings time, the standard and DST abbreviations continue to have the property that you can specify standard or DST time and get that time offset whether or not DST was theoretically in effect at the time. However, the abbreviations mean what they meant at the time in question (or most recently before that time) rather than being absolutely fixed. The standard abbreviation-list files have been changed to use this behavior for abbreviations that have actually varied in meaning since 1970. The old simple-numeric definitions are kept for abbreviations that have not changed, since they are a bit faster to resolve. While this is clearly a new feature, it seems necessary to back-patch it into all active branches, because otherwise use of Russian zone abbreviations is going to become even more problematic than it already was. 
This change supersedes the changes in commit 513d06ded et al to modify the fixed meanings of the Russian abbreviations; since we've not shipped that yet, this will avoid an undesirably incompatible (not to mention incorrect) change in behavior for timestamps between 2011 and 2014. This patch makes some cosmetic changes in ecpglib to keep its usage of datetime lookup tables as similar as possible to the backend code, but doesn't do anything about the increasingly obsolete set of timezone abbreviation definitions that are hard-wired into ecpglib. Whatever we do about that will likely not be appropriate material for back-patching. Also, a potential free() of a garbage pointer after an out-of-memory failure in ecpglib has been fixed. This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that caused it to produce unexpected results near a timezone transition, if both the "before" and "after" states are marked as standard time. We'd only ever thought about or tested transitions between standard and DST time, but that's not what's happening when a zone simply redefines its base GMT offset. In passing, update the SGML documentation to refer to the Olson/zoneinfo/zic timezone database as the "IANA" database, since it's now being maintained under the auspices of IANA.
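One way to picture a dynamic abbreviation is to flatten its underlying zone's history into periods sorted by start time. A timestamp then resolves to the offset of the most recent period that began at or before it. The struct, the function, and the fallback for timestamps before the first period are all assumptions for illustration. PostgreSQL resolves these abbreviations against the IANA zone data, not against a hand-built table like this one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical history of one abbreviation's meaning in its underlying zone. */
typedef struct AbbrevPeriod
{
    long long valid_from;   /* seconds since the epoch, UTC */
    int       utc_offset;   /* seconds east of UTC */
} AbbrevPeriod;

/*
 * Resolve a dynamic abbreviation at time t: use the most recent period that
 * began at or before t. For t earlier than every recorded period, this
 * sketch falls back to the first entry (an assumption, not PostgreSQL's
 * rule). periods must be sorted by valid_from, and n must be at least 1.
 */
static int
resolve_abbrev_offset(const AbbrevPeriod *periods, size_t n, long long t)
{
    size_t lo = 0;
    size_t hi = n;

    assert(n > 0);
    /* binary search: afterwards lo == number of periods with valid_from <= t */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (periods[mid].valid_from <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo == 0) ? periods[0].utc_offset : periods[lo - 1].utc_offset;
}
```

Take a history of +3h, then +4h, then +3h again, which is the shape of the Russian changes described above. The same abbreviation then resolves differently depending on the timestamp it is attached to. The placeholder timestamps in any such table are illustrative, not real transition dates.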
2014-10-15	Print planning time only in EXPLAIN ANALYZE, not plain EXPLAIN.	Tom Lane
	We've gotten enough push-back on that change to make it clear that it wasn't an especially good idea to do it like that. Revert plain EXPLAIN to its previous behavior, but keep the extra output in EXPLAIN ANALYZE. Per discussion. Internally, I set this up as a separate flag ExplainState.summary that controls printing of planning time and execution time. For now it's just copied from the ANALYZE option, but we could consider exposing it to users.
2014-10-13	Add --latency-limit option to pgbench.	Heikki Linnakangas
	This allows transactions that take longer than the specified limit to be counted separately. With --rate, transactions that are already late by the time we get to execute them are skipped altogether. Using --latency-limit with --rate allows you to "catch up" more quickly if there's a hiccup in the server causing a lot of transactions to stall momentarily. Fabien COELHO, reviewed by Rukh Meski and heavily refactored by me.
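Here is a sketch of the bookkeeping the message describes, using hypothetical names and microsecond timestamps. A transaction that is already more than the limit behind its scheduled start when the client reaches it is skipped. One that runs but finishes more than the limit after its scheduled start is counted as late. The exact comparisons are assumptions, not pgbench's source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-client counters for --rate combined with --latency-limit. */
typedef struct LatencyCounters
{
    int executed;   /* transactions actually run */
    int skipped;    /* already past the limit before they could start */
    int late;       /* ran, but finished more than the limit after schedule */
} LatencyCounters;

/*
 * Decide what to do with a transaction scheduled at scheduled_us when the
 * client becomes free at now_us. Returns 1 if it should run, 0 if skipped.
 */
static int
admit_transaction(LatencyCounters *c, int64_t scheduled_us, int64_t now_us,
                  int64_t limit_us)
{
    if (now_us > scheduled_us + limit_us)
    {
        c->skipped++;           /* hopelessly late: drop it to catch up */
        return 0;
    }
    c->executed++;
    return 1;
}

/* Record completion; latency is measured from the scheduled start time. */
static void
finish_transaction(LatencyCounters *c, int64_t scheduled_us, int64_t end_us,
                   int64_t limit_us)
{
    if (end_us - scheduled_us > limit_us)
        c->late++;
}
```

Skipping rather than running hopelessly late transactions is what lets the schedule "catch up" after a stall, as the message puts it.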
2014-10-11	pg_upgrade: prefix Unix shell script name output with "./"	Bruce Momjian
	This more clearly suggests the current directory. While this also works on Windows, it might be confusing. Report by Christoph Berg
2014-10-10	Remove unnecessary initialization of local variables.	Heikki Linnakangas
	Oops, forgot these in the previous commit.
2014-10-10	Change the way encoding and locale checks are done in pg_upgrade.	Heikki Linnakangas
	lc_collate and lc_ctype have been per-database settings since server version 8.4, but pg_upgrade was still treating them as cluster-wide options. It fetched the values for the template0 database in the old and new clusters, and compared them. That's backwards; the encoding and locale of the template0 database don't matter, as template0 is guaranteed to contain only ASCII characters. But if there are any other databases that exist on both clusters (in particular template1 and postgres databases), their encodings and locales must be compatible. Also, make the locale comparison more lenient. If the locale names are not equal, try to canonicalize both of them by passing them to setlocale(). We used to do that only when upgrading from 9.1 or below, but it seems like a good idea even with newer versions. If we change the canonical form of a locale, this allows pg_upgrade to still work. I'm about to do just that to fix bug #11431, by mapping a locale name that contains non-ASCII characters to a pure-ASCII alias of the same locale. No backpatching, because earlier versions of pg_upgrade still support upgrading from 8.3 servers. That would be more complicated, so it doesn't seem worth it, given that we haven't received any complaints about this from users.
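The lenient comparison can be sketched with the standard `setlocale()` call. Set the category to each name, read back what the C library reports, and compare those strings. The helper names are hypothetical, and the sketch only touches `LC_COLLATE`. What counts as the canonical spelling is entirely up to the platform's C library.

```c
#define _POSIX_C_SOURCE 200809L   /* expose strdup() */
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: ask the C library for the canonical spelling of a
 * locale name by setting it and reading the result back. Returns a
 * malloc'd copy, or NULL if the name is rejected. The previous LC_COLLATE
 * setting is restored before returning.
 */
static char *
canonicalize_locale(const char *name)
{
    char *saved = NULL;
    char *canon = NULL;
    const char *cur = setlocale(LC_COLLATE, NULL);
    const char *res;

    if (cur == NULL || (saved = strdup(cur)) == NULL)
        return NULL;
    res = setlocale(LC_COLLATE, name);
    if (res != NULL)
        canon = strdup(res);    /* copy before the next setlocale() call */
    setlocale(LC_COLLATE, saved);
    free(saved);
    return canon;
}

/* Lenient comparison: equal names, or equal after canonicalization. */
static bool
locales_match(const char *a, const char *b)
{
    char *ca;
    char *cb;
    bool match;

    if (strcmp(a, b) == 0)
        return true;
    ca = canonicalize_locale(a);
    cb = canonicalize_locale(b);
    match = (ca != NULL && cb != NULL && strcmp(ca, cb) == 0);
    free(ca);
    free(cb);
    return match;
}
```

Because `setlocale()` changes process-wide state, the sketch restores the previous setting after each probe. A multithreaded program would need more care than this.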
2014-10-02	Fix typo in error message.	Heikki Linnakangas

2014-10-02	Refactor pgbench log-writing code to a separate function.	Heikki Linnakangas
	The doCustom function was incredibly long, this makes it a little bit more readable.
2014-10-01	Add functions for dealing with PGP armor header lines to pgcrypto.	Heikki Linnakangas
	This adds a new pgp_armor_headers function to extract armor headers from an ASCII-armored blob, and a new overloaded variant of the armor function, for constructing an ASCII-armor with extra headers. Marko Tiikkaja and me.
2014-10-01	Improve documentation about binary/textual output mode for output plugins.	Andres Freund
	Also improve related error message as it contributed to the confusion. Discussion: CAB7nPqQrqFzjqCjxu4GZzTrD9kpj6HMn9G5aOOMwt1WZ8NfqeA@mail.gmail.com, CAB7nPqQXc_+g95zWnqaa=mVQ4d3BVRs6T41frcEYi2ocUrR3+A@mail.gmail.com Per discussion between Michael Paquier, Robert Haas and Andres Freund. Backpatch to 9.4 where logical decoding was introduced.
2014-09-30	pg_upgrade: have pg_upgrade fail for old 9.4 JSONB format	Bruce Momjian
	Backpatch through 9.4
2014-09-26	Define META_FREE in a way that doesn't cause -Wempty-body warnings.	Andres Freund
	That gets rid of the only -Wempty-body warning when compiling postgres with gcc 4.8/4.9. As 6550b901f shows, it's useful to be able to use that option routinely. Without asserts there are many more warnings, but that's food for another commit.
2014-09-25	Refactor space allocation for base64 encoding/decoding in pgcrypto.	Heikki Linnakangas
	Instead of trying to accurately calculate the space needed, use a StringInfo that's enlarged as needed. This is just moving things around currently - the old code was not wrong - but this is in preparation for a patch that adds support for extra armor headers, and would make the space calculation more complicated. Marko Tiikkaja
2014-09-22	Improve code around the recently added rm_identify rmgr callback.	Andres Freund
	There are four weaknesses in 728f152e07f998d2cb4fe5f24ec8da2c3bda98f2: * append_init() in heapdesc.c was ugly and required that rm_identify return values are only valid until the next call. Instead just add a couple more switch() cases for the INIT_PAGE cases. Now the returned value will always be valid. * a couple rm_identify() callbacks missed masking xl_info with ~XLR_INFO_MASK. * pg_xlogdump didn't map a NULL rm_identify to UNKNOWN or a similar string. * append_init() was called when id=NULL - which should never actually happen. But it's better to be careful.
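The first three points can be sketched together. The identify callback masks `info` with the complement of the reserved-bits mask before switching. It spells out the INIT_PAGE combinations as separate cases, so it can return string literals that stay valid forever. The caller then maps a NULL result to "UNKNOWN". The constants and names are illustrative placeholders, not PostgreSQL's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants only; the real values live in PostgreSQL's headers. */
#define EX_INFO_MASK   0x0F   /* bits reserved for the WAL machinery */
#define EX_OP_INSERT   0x00
#define EX_OP_UPDATE   0x20
#define EX_INIT_PAGE   0x80   /* flag that may be OR'ed into an opcode */

/*
 * Hypothetical identify callback. Returns a string literal (valid for the
 * life of the program) or NULL for an unrecognized info value; no static
 * buffer is reused between calls.
 */
static const char *
example_identify(uint8_t info)
{
    switch (info & ~EX_INFO_MASK)   /* drop the reserved bits first */
    {
        case EX_OP_INSERT:
            return "INSERT";
        case EX_OP_INSERT | EX_INIT_PAGE:
            return "INSERT+INIT";
        case EX_OP_UPDATE:
            return "UPDATE";
        case EX_OP_UPDATE | EX_INIT_PAGE:
            return "UPDATE+INIT";
    }
    return NULL;
}

/* Display side: never pass NULL along to printf-style code. */
static const char *
display_name(uint8_t info)
{
    const char *id = example_identify(info);

    return id ? id : "UNKNOWN";
}
```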
2014-09-19	Fix failure of contrib/auto_explain to print per-node timing information.	Tom Lane
	This has been broken since commit af7914c6627bcf0b0ca614e9ce95d3f8056602bf, which added the EXPLAIN (TIMING) option. Although that commit included updates to auto_explain, they evidently weren't tested very carefully, because the code failed to print node timings even when it should, due to failure to set es.timing in the ExplainState struct. Reported off-list by Neelakanth Nadgir of Salesforce. In passing, clean up the documentation for auto_explain's options a little bit, including re-ordering them into what seems to me a more logical order.
2014-09-19	Add the capability to display summary statistics to pg_xlogdump.	Andres Freund
	The new --stats/--stats=record options to pg_xlogdump display per rmgr/per record statistics about the parsed WAL. This is useful to understand what the WAL primarily consists of, to allow targeted optimizations on application, configuration, and core code level. It is likely that we will want to fine tune the statistics further, but the feature already is quite helpful. Author: Abhijit Menon-Sen, slightly editorialized by me Reviewed-By: Andres Freund, Dilip Kumar and Furuya Osamu Discussion: 20140604104716.GA3989@toroid.org
2014-09-19	Add rmgr callback to name xlog record types for display purposes.	Andres Freund
	This is primarily useful for the upcoming pg_xlogdump --stats feature, but also allows removing some duplicated code in the rmgr_desc routines. Due to the separation and harmonization, the output of displayed records changes somewhat. But since this isn't end-user-oriented content, that's OK. It's potentially desirable to further change pg_xlogdump's display of records. It previously wasn't possible to show the record type separately from the description, forcing it to be in the last column. But that's better done in a separate commit. Author: Abhijit Menon-Sen, slightly editorialized by me Reviewed-By: Álvaro Herrera, Andres Freund, and Heikki Linnakangas Discussion: 20140604104716.GA3989@toroid.org
2014-09-11	pg_upgrade: adjust C comments	Bruce Momjian

2014-09-11	Fix Windows build.	Heikki Linnakangas
	I renamed a variable, but missed an #ifdef WIN32 block.
2014-09-11	Simplify calculation of Poisson distributed delays in pgbench --rate mode.	Heikki Linnakangas
	The previous coding first generated a uniform random value between 0.0 and 1.0, then converted that to an integer between 1 and 10000, and divided that again by 10000. Those conversions are unnecessary; we can use the double value that pg_erand48() returns directly. While we're at it, put the logic into a helper function, getPoissonRand(). The largest delay generated by the old coding was about 9.2 times the average, because of the way the uniformly distributed value used for the calculation was truncated to 1/10000 granularity. The new coding doesn't have such clamping. With my laptop's DBL_MIN value, the maximum delay with the new coding is about 700x the average. That seems acceptable - any reasonable pgbench session should last long enough to average that out. Backpatch to 9.4.
2014-09-11	Change the way latency is calculated with pgbench --rate option.	Heikki Linnakangas
	The reported latency values now include the "schedule lag" time, that is, the time between the transaction's scheduled start time and the time it actually started. This relates better to a model where requests arrive at a certain rate, and we are interested in the response time to the end user or application, rather than the response time of the database itself. Also, when --rate is used, include the schedule lag time in the log output. The --rate option is new in 9.4, so backpatch to 9.4. It seems better to make this change in 9.4, while we're still in the beta period, than ship a 9.4 version that calculates the values differently than 9.5.
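The distinction can be sketched with three timestamps per transaction. The struct and function names are hypothetical. The point is only the arithmetic: latency measured from the scheduled start equals the schedule lag plus the time spent after the transaction actually started.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timestamps for one throttled transaction, in microseconds. */
typedef struct TxnTimes
{
    int64_t scheduled;  /* when the --rate schedule said it should start */
    int64_t started;    /* when the client actually began it */
    int64_t finished;   /* when the result came back */
} TxnTimes;

/* Schedule lag: how far behind the arrival schedule the client was. */
static int64_t
schedule_lag(const TxnTimes *t)
{
    return t->started - t->scheduled;
}

/* Latency as seen by a request arriving on schedule: includes the lag. */
static int64_t
user_latency(const TxnTimes *t)
{
    return t->finished - t->scheduled;
}

/* Latency of the database work alone, excluding any schedule lag. */
static int64_t
server_latency(const TxnTimes *t)
{
    return t->finished - t->started;
}
```

For a transaction scheduled at 1000, started at 1300 and finished at 1800, the lag is 300 and the server-side time is 500. The latency reported under the new definition is 800.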
2014-09-11	pg_upgrade: compare control version, not catalog version	Bruce Momjian
	Also modify the test for the possibility that the large object value might not exist in the old cluster. Fix for commit e1598a15f4fb0f076a6034d3d3debb9776aff07a
2014-09-10	pg_upgrade: check for large object size compatibility	Bruce Momjian

2014-09-09	doc: Reflect renaming of Mac OS X to OS X	Peter Eisentraut
	bug #10528