path: root/src/common
Age | Commit message | Author
2023-04-08 | Introduce PG_IO_ALIGN_SIZE and align all I/O buffers. | Thomas Munro
In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a later commit, we need the addresses of user space buffers to be well aligned. The exact requirements vary by OS and file system (typically sectors and/or memory pages). The address alignment size is set to 4096, which is enough for currently known systems: it matches modern sectors and common memory page size. There is no standard governing O_DIRECT's requirements so we might eventually have to reconsider this with more information from the field or future systems. Aligning I/O buffers on memory pages is also known to improve regular buffered I/O performance. Three classes of I/O buffers for regular data pages are adjusted: (1) Heap buffers are now allocated with the new palloc_aligned() or MemoryContextAllocAligned() functions introduced by commit 439f6175. (2) Stack buffers now use a new struct PGIOAlignedBlock to respect PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer pool is also aligned in shared memory. WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGN_SIZE and thus for O_DIRECT WAL writes to fail to be well aligned, but that's a pre-existing condition and will be addressed by a later commit. BufFiles are not yet addressed (there's no current plan to use O_DIRECT for those, but they could potentially get some incidental speedup even in plain buffered I/O operations through better alignment). If we can't align stack objects suitably using the compiler extensions we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to 0. This avoids the need to consider systems that have O_DIRECT but can't align stack objects the way we want; such systems could in theory be supported with more work but we don't currently know of any such machines, so it's easier to pretend there is no O_DIRECT support instead. That's an existing and tested class of system.
Add assertions that all buffers passed into smgrread(), smgrwrite() and smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack alignment tricks may be unavailable) or the block size has been set too small to allow arrays of buffers to be all aligned. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com
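The alignment requirement described above comes down to simple address arithmetic. The following is a minimal sketch in Python, not PostgreSQL's C code: the helper names `align_up` and `is_aligned` are hypothetical, and only the value 4096 is taken from the commit message.

```python
PG_IO_ALIGN_SIZE = 4096  # value stated in the commit message


def align_up(addr, alignment=PG_IO_ALIGN_SIZE):
    """Round addr up to the next multiple of alignment (hypothetical helper).

    The bit trick below is only valid when alignment is a power of two.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


def is_aligned(addr, alignment=PG_IO_ALIGN_SIZE):
    """Return True if addr is a multiple of alignment.

    This mirrors the kind of check an alignment assertion would perform.
    """
    return addr % alignment == 0
```

An allocator can obtain an aligned buffer from an unaligned one by over-allocating by up to one alignment unit and handing out `align_up(raw_address)`; whether palloc_aligned() works exactly this way is not stated in the message above.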
2023-04-06 | Support long distance matching for zstd compression | Tomas Vondra
zstd compression supports a special mode for finding matches in the distant past, which may result in a better compression ratio, at the expense of using more memory (the window size is 128MB). To enable this optional mode, use the "long" keyword when specifying the compression method (--compress=zstd:long). Author: Justin Pryzby Reviewed-by: Tomas Vondra, Jacob Champion Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com Discussion: https://postgr.es/m/20220327205020.GM28503@telsasoft.com
2023-03-27 | Make SCRAM iteration count configurable | Daniel Gustafsson
Replace the hardcoded value with a GUC such that the iteration count can be raised in order to increase protection against brute-force attacks. The hardcoded value for SCRAM iteration count was defined to be 4096, which is taken from RFC 7677, so set the default for the GUC to 4096 to match. In RFC 7677 the recommendation is at least 15000 iterations but 4096 is listed as a SHOULD requirement given that it's estimated to yield a 0.5s processing time on a mobile handset of the time of RFC writing (late 2015). Raising the iteration count of SCRAM will make stored passwords more resilient to brute-force attacks at a higher computational cost during connection establishment. Lowering the count will reduce computational overhead during connections at the tradeoff of reducing strength against brute-force attacks. There are however platforms where even a modest iteration count yields a too high computational overhead, with weaker password encryption schemes chosen as a result. In these situations, SCRAM with a very low iteration count still gives benefits over weaker schemes like md5, so we allow the iteration count to be set to one at the low end. The new GUC is intentionally generically named such that it can be made to support future SCRAM standards should they emerge. At that point the value can be made into key:value pairs with an undefined key as a default which will be backwards compatible with this. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se
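To make the cost trade-off concrete: SCRAM derives its salted password with the Hi() function of RFC 5802, which is PBKDF2 with HMAC as the pseudorandom function. The sketch below is a Python model, not the PostgreSQL implementation; the function name `scram_salted_password` is hypothetical, and password normalization (SASLprep) is deliberately left out.

```python
import hashlib


def scram_salted_password(password, salt, iterations):
    """Compute Hi(password, salt, iterations) for SCRAM-SHA-256.

    Hypothetical helper: Hi() is PBKDF2-HMAC with a derived key as long as
    the hash output (32 bytes for SHA-256). SASLprep is not applied here.
    """
    if iterations < 1:
        raise ValueError("iteration count must be at least 1")
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
```

The work done by the server, the client, and an attacker guessing passwords grows linearly with `iterations`, which is why the count is a tuning knob rather than a fixed constant.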
2023-03-23 | Implement find_my_exec()'s path normalization using realpath(3). | Tom Lane
Replace the symlink-chasing logic in find_my_exec with realpath(3), which has been required by POSIX since SUSv2. (Windows lacks realpath(), but there we can use _fullpath() which is functionally equivalent.) The main benefit of this is that -- on all modern platforms at least -- realpath() avoids the chdir() shenanigans we used to perform while interpreting symlinks. That had various corner-case failure modes so it's good to get rid of it. There is still ongoing discussion about whether we could skip the replacement of symlinks in some cases, but that's really a matter for a separate patch. Meanwhile I want to push this before we get too close to feature freeze, so that we can find out if there are showstopper portability issues. Discussion: https://postgr.es/m/797232.1662075573@sss.pgh.pa.us
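As a rough illustration of what the normalization achieves, Python's `os.path.realpath` plays a role similar to realpath(3): it returns an absolute path with symlinks resolved, without changing the working directory. This is only an analogue under stated assumptions: `resolve_executable` and `demo` are hypothetical names, the directory layout is invented, and the demo needs a platform that allows creating symlinks.

```python
import os
import tempfile


def resolve_executable(path):
    """Return the canonical absolute path of path, following symlinks."""
    return os.path.realpath(path)


def demo():
    """Create a symlinked 'postgres' file and resolve it (invented layout)."""
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = os.path.join(tmp, "pgsql-16", "bin")
        os.makedirs(real_dir)
        target = os.path.join(real_dir, "postgres")
        open(target, "w").close()
        link = os.path.join(tmp, "postgres")
        os.symlink(target, link)
        # Both values name the same real file once symlinks are resolved.
        return resolve_executable(link), os.path.realpath(target)
```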
2023-03-21 | Add SHELL_ERROR and SHELL_EXIT_CODE magic variables to psql. | Tom Lane
These are set after a \! command or a backtick substitution. SHELL_ERROR is just "true" for error (nonzero exit status) or "false" for success, while SHELL_EXIT_CODE records the actual exit status following standard shell/system(3) conventions. Corey Huinker, reviewed by Maxim Orlov and myself Discussion: https://postgr.es/m/CADkLM=cWao2x2f+UDw15W1JkVFr_bsxfstw=NGea7r9m4j-7rQ@mail.gmail.com
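The relationship between the two variables can be modelled in a few lines. This Python sketch is not psql code: `run_shell` is a hypothetical helper, and it only covers commands that exit normally (how psql encodes death by signal or a failure to launch is not modelled here).

```python
import subprocess


def run_shell(command):
    """Run command through the shell and return psql-style variable values.

    Hypothetical model: SHELL_ERROR is "true" for a nonzero exit status and
    "false" for success, while SHELL_EXIT_CODE holds the status itself.
    """
    completed = subprocess.run(command, shell=True)
    code = completed.returncode
    return {
        "SHELL_ERROR": "true" if code != 0 else "false",
        "SHELL_EXIT_CODE": str(code),
    }
```

A script can branch on the boolean-valued variable for the common "did it work" case and consult the numeric one only when the specific status matters.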
2023-03-16 | Silence pedantic compiler warning introduced in ce340e530d1 | Andres Freund
.../src/common/file_utils.c: In function ‘pg_pwrite_zeros’: .../src/common/file_utils.c:543:9: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration] 543 | const static PGAlignedBlock zbuffer = {{0}}; /* worth BLCKSZ */
2023-03-13 | Fix JSON error reporting for many cases of erroneous string values. | Tom Lane
The majority of error exit cases in json_lex_string() failed to set lex->token_terminator, causing problems for the error context reporting code: it would see token_terminator less than token_start and do something more or less nuts. In v14 and up the end result could be as bad as a crash in report_json_context(). Older versions accidentally avoided that fate; but all versions produce error context lines that are far less useful than intended, because they'd stop at the end of the prior token instead of continuing to where the actually-bad input is. To fix, invent some macros that make it less notationally painful to do the right thing. Also add documentation about what the function is actually required to do; and in >= v14, add an assertion in report_json_context about token_terminator being sufficiently far advanced. Per report from Nikolay Shaplov. Back-patch to all supported versions. Discussion: https://postgr.es/m/7332649.x5DLKWyVIX@thinkpad-pgpro
2023-03-09 | Improve/correct comments | Peter Eisentraut
Change comments for pg_cryptohash_init(), pg_cryptohash_update(), pg_cryptohash_final() in cryptohash.c to match cryptohash_openssl.c. In particular, the claim that these functions were "designed" to never fail was incorrect, since by design callers need to be prepared to handle failures, for compatibility with the cryptohash_openssl.c versions. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/301F4EDD-27B9-460F-B462-B9DB2BDE4ACF@yesql.se
2023-03-08 | meson: don't require 'touch' binary, make use of 'cp' optional | Andres Freund
We already didn't use touch (some earlier version of the meson build did), and cp is only used for updating unicode files. The latter already depends on the optional availability of 'wget', so doing the same for 'cp' makes sense. Eventually we probably want a portable command for updating source code as part of a target, but for now... Reported-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/70e96c34-64ee-e549-8c4a-f91a7a668804@dunslane.net
2023-03-06 | Silence -Wmissing-braces complaints in file_utils.c | Michael Paquier
Per buildfarm member lapwing, coupled with an offline poke from Julien Rouhaud. 6392f2a was a similar case.
2023-03-06 | Revise pg_pwrite_zeros() | Michael Paquier
The following changes are made to pg_pwrite_zeros(), the API able to write series of zeros using vectored I/O: - Add an "offset" parameter, to write the size from this position (the 'p' of "pwrite" seems to mean position, though POSIX does not outline that directly); the name of the routine is incorrect if it is not able to handle offsets. - Avoid memset() of "zbuffer" on every call. - Avoid initialization of the whole IOV array if not needed. - Group the trailing write() call with the main write() call, simplifying the function logic. Author: Andres Freund Reviewed-by: Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/20230215005525.mrrlmqrxzjzhaipl@awork3.anarazel.de
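The points in this list can be sketched with Python's `os.pwritev`, which is available on Linux and some other Unix systems. This is a model under assumptions, not the C routine: `pwrite_zeros`, `BLOCK_SIZE`, `MAX_IOVECS` and `demo` are hypothetical names and values chosen for illustration.

```python
import os
import tempfile

BLOCK_SIZE = 8192                # stands in for BLCKSZ (assumed value)
MAX_IOVECS = 32                  # hypothetical cap on vectors per call
_ZERO_BLOCK = bytes(BLOCK_SIZE)  # built once, so no per-call memset


def pwrite_zeros(fd, size, offset):
    """Write size zero bytes to fd starting at offset; return bytes written."""
    view = memoryview(_ZERO_BLOCK)
    total = 0
    while total < size:
        remaining = size - total
        iov = []
        # Only build as many vectors as this write needs; the last one may
        # be shorter than a block, so no separate trailing write is needed.
        while remaining > 0 and len(iov) < MAX_IOVECS:
            chunk = min(remaining, BLOCK_SIZE)
            iov.append(view[:chunk])
            remaining -= chunk
        written = os.pwritev(fd, iov, offset + total)
        if written == 0:
            raise OSError("pwritev wrote no data")
        total += written
    return total


def demo():
    """Zero-fill 20000 bytes at offset 100 of a scratch file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            written = pwrite_zeros(fd, 20000, 100)
            return written, os.fstat(fd).st_size, os.pread(fd, 20000, 100)
        finally:
            os.close(fd)
```

Because every vector points into the same zero block, a short write can be handled by simply rebuilding the vector list for the bytes still missing.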
2023-02-05 | Revert refactoring of restore command code to shell_restore.c | Michael Paquier
This reverts commits 24c35ec and 57169ad. PreRestoreCommand() and PostRestoreCommand() need to be put closer to the system() call calling a restore_command, as they enable in_restore_command for the startup process which would in turn trigger an immediate proc_exit() in the SIGTERM handler. Perhaps we could get rid of this behavior entirely, but 24c35ec has made the window where the flag is enabled much larger than it was, and any Postgres-like actions (palloc, etc.) taken by code paths while the flag is enabled could lead to more severe issues in the shutdown processing. Note that curculio has shown that there are many more problems in this area, unrelated to this change, actually, hence the issues related to that had better be addressed first. Keeping the code of HEAD in line with the stable branches should make that a bit easier. Per discussion with Andres Freund and Nathan Bossart. Discussion: https://postgr.es/m/Y979NR3U5VnWrTwB@paquier.xyz
2023-01-31 | Refactor rmtree() to use get_dirent_type(). | Thomas Munro
Switch to get_dirent_type() instead of lstat() while traversing a directory tree, to see if that fixes the intermittent ENOTEMPTY failures seen in recent pg_upgrade tests, on Windows CI. While refactoring, also use AllocateDir() instead of opendir() in the backend, which knows how to handle descriptor pressure. Our CI system currently uses Windows Server 2019, a version known not to have POSIX unlink semantics enabled by default yet, unlike typical Windows 10 and 11 systems. That might explain why we see this flapping on CI but (apparently) not in the build farm, though the frequency is quite low. The theory is that some directory entry must be in state STATUS_DELETE_PENDING, which lstat() would report as ENOENT, though unfortunately we don't know exactly why yet. With this change, rmtree() will not skip them, and try to unlink (again). Our unlink() wrapper should either wait a short time for them to go away when some other process closes the handle, or log a message to tell us the path of the problem file if not, so we can dig further. Discussion: https://postgr.es/m/20220919213217.ptqfdlcc5idk5xup%40awork3.anarazel.de
2023-01-20 | Use appendStringInfoSpaces in more places | David Rowley
This adjusts a few places which were appending a string constant containing spaces onto a StringInfo. We have appendStringInfoSpaces for that job, so let's use that instead. For the change to jsonb.c's add_indent() function, appendStringInfoString was being called inside a loop to append 4 spaces on each loop. This meant that enlargeStringInfo would get called once per loop. Here it should be much more efficient to get rid of the loop and just calculate the number of spaces with "level * 4" and just append all the spaces in one go. Here we additionally adjust the appendStringInfoSpaces function so it makes use of memset rather than a while loop to apply the required spaces to the StringInfo. One of the problems with the while loop was that it was incrementing one variable and decrementing another variable once per loop. That's more work than what's required to get the job done. We may as well use memset for this rather than trying to optimize the existing loop. Some testing has shown memset is faster even for very small sizes. Discussion: https://postgr.es/m/CAApHDvp_rKkvwudBKgBHniNRg67bzXVjyvVKfX0G2zS967K43A@mail.gmail.com
2023-01-18 | Refactor code for restoring files via shell commands | Michael Paquier
Presently, restore_command uses a different code path than archive_cleanup_command and recovery_end_command. These code paths are similar and can be easily combined, as long as it is possible to identify if a command should: - Issue a FATAL on signal. - Exit immediately on SIGTERM. While on it, this removes src/common/archive.c and its associated header. Since the introduction of c96de2c, BuildRestoreCommand() has become a simple wrapper of replace_percent_placeholders() able to call make_native_path(). This simplifies shell_restore.c as long as RestoreArchivedFile() includes a call to make_native_path(). Author: Nathan Bossart Reviewed-by: Andres Freund, Michael Paquier Discussion: https://postgr.es/m/20221227192449.GA3672473@nathanxps13
2023-01-12 | Code cleanup | Peter Eisentraut
for commit c96de2ce1782116bd0489b1cd69ba88189a495e8 Author: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://www.postgresql.org/message-id/20230111185434.GA1912982@nathanxps13
2023-01-11 | Common function for percent placeholder replacement | Peter Eisentraut
There are a number of places where a shell command is constructed with percent-placeholders (like %x). It's cumbersome to have to open-code this several times. This factors out this logic into a separate function. This also allows us to ensure consistency for and document some subtle behaviors, such as what to do with unrecognized placeholders. The unified handling is now that incorrect and unknown placeholders are an error, where previously in most cases they were skipped or ignored. This affects the following settings: - archive_cleanup_command - archive_command - recovery_end_command - restore_command - ssl_passphrase_command The following settings are part of this refactoring but already had stricter error handling and should be unchanged in their behavior: - basebackup_to_shell.command Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/5238bbed-0b01-83a6-d4b2-7eb0562a054e%40enterprisedb.com
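The unified behavior described above (substitute known placeholders, reject unknown or malformed ones) can be sketched as follows. This is a Python model, not the C function: the signature, the error wording, and the treatment of `%%` as a literal percent sign are assumptions made for illustration.

```python
def replace_percent_placeholders(template, values):
    """Expand %x placeholders in template using the values mapping.

    Model of the strict handling: an unknown placeholder or a trailing "%"
    raises an error instead of being skipped. "%%" yields a literal "%"
    (assumed here).
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(template):
            raise ValueError('string ends unexpectedly after "%"')
        letter = template[i + 1]
        if letter == "%":
            out.append("%")
        elif letter in values:
            out.append(values[letter])
        else:
            raise ValueError("unknown placeholder %" + letter)
        i += 2
    return "".join(out)
```

For example, a restore-style template `cp %p/%f %%done` with `p` and `f` supplied expands to a complete command, while a template containing an unsupported letter fails up front rather than producing a silently wrong command.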
2023-01-09 | Invent random_normal() to provide normally-distributed random numbers. | Tom Lane
There is already a version of this in contrib/tablefunc, but it seems sufficiently widely useful to justify having it in core. Paul Ramsey Discussion: https://postgr.es/m/CACowWR0DqHAvOKUCNxTrASFkWsDLqKMd6WiXvVvaWg4pV1BMnQ@mail.gmail.com
2023-01-02 | Update copyright for 2023 | Bruce Momjian
Backpatch-through: 11
2022-12-30 | Change argument of appendBinaryStringInfo from char * to void * | Peter Eisentraut
There is some code that uses this function to assemble some kind of packed binary layout, which requires a bunch of casts because of this. Functions taking binary data plus length should take void * instead, like memcpy() for example. Discussion: https://www.postgresql.org/message-id/flat/a0086cfc-ff0f-2827-20fe-52b591d2666c%40enterprisedb.com
2022-12-28 | Reorder some object files in makefiles | Peter Eisentraut
This restores some once-intended alphabetical orders and makes the lists consistent between the different build systems.
2022-12-20 | Add copyright notices to meson files | Andrew Dunstan
Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
2022-12-19 | Remove hardcoded dependency to cryptohash type in the internals of SCRAM | Michael Paquier
SCRAM_KEY_LEN was a variable used in the internal routines of SCRAM to size a set of fixed-sized arrays used in the SHA and HMAC computations during the SASL exchange or when building a SCRAM password. This had a hard dependency on SHA-256, reducing the flexibility of SCRAM when it comes to the addition of more hash methods. A second issue was that SHA-256 is assumed as the cryptohash method to use all the time. This commit renames SCRAM_KEY_LEN to a more generic SCRAM_KEY_MAX_LEN, which is used as the size of the buffers used by the internal routines of SCRAM. This is aimed at tracking centrally the maximum size necessary for all the hash methods supported by SCRAM. A global variable has the advantage of keeping the code in its simplest form, reducing the need of more alloc/free logic for all the buffers used in the hash calculations. A second change is that the key length (SHA digest length) and hash types are now tracked by the state data in the backend and the frontend, the common portions being extended to handle these as arguments by the internal routines of SCRAM. There are a few RFC proposals floating around to extend the SCRAM protocol, including some to use stronger cryptohash algorithms, so this lifts some of the existing restrictions in the code. The code in charge of parsing and building SCRAM secrets is extended to rely on the key length and on the cryptohash type used for the exchange, assuming currently that only SHA-256 is supported for the moment. Note that the mock authentication simply enforces SHA-256. Author: Michael Paquier Reviewed-by: Peter Eisentraut, Jonathan Katz Discussion: https://postgr.es/m/Y5k3Qiweo/1g9CG6@paquier.xyz
2022-12-15 | Static assertions cleanup | Peter Eisentraut
Because we added StaticAssertStmt() first before StaticAssertDecl(), some uses as well as the instructions in c.h are now a bit backwards from the "native" way static assertions are meant to be used in C. This updates the guidance and moves some static assertions to better places. Specifically, since the addition of StaticAssertDecl(), we can put static assertions at the file level. This moves a number of static assertions out of function bodies, where they might have been stuck out of necessity, to perhaps better places at the file level or in header files. Also, when the static assertion appears in a position where a declaration is allowed, then using StaticAssertDecl() is more native than StaticAssertStmt(). Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
2022-12-11 | Convert json_in and jsonb_in to report errors softly. | Tom Lane
This requires a bit of further infrastructure-extension to allow trapping errors reported by numeric_in and pg_unicode_to_server, but otherwise it's pretty straightforward. In the case of jsonb_in, we are only capturing errors reported during the initial "parse" phase. The value-construction phase (JsonbValueToJsonb) can also throw errors if assorted implementation limits are exceeded. We should improve that, but it seems like a separable project. Andrew Dunstan and Tom Lane Discussion: https://postgr.es/m/3bac9841-fe07-713d-fa42-606c225567d6@dunslane.net
2022-12-11 | Change JsonSemAction to allow non-throw error reporting. | Tom Lane
Formerly, semantic action functions for the JSON parser returned void, so that there was no way for them to affect the parser's behavior. That means in particular that they can't force an error exit except by longjmp'ing. That won't do in the context of our project to make input functions return errors softly. Hence, change them to return the same JsonParseErrorType enum value as the parser itself uses. If an action function returns anything besides JSON_SUCCESS, the parse is abandoned and that error code is returned. Action functions can thus easily return the same error conditions that the parser already knows about. As an escape hatch for expansion, also invent a code JSON_SEM_ACTION_FAILED that the core parser does not know the exact meaning of. When returning this code, an action function must use some out-of-band mechanism for reporting the error details. This commit simply makes the API change and causes all the existing action functions to return JSON_SUCCESS, so that there is no actual change in behavior here. This is long enough and boring enough that it seemed best to commit it separately from the changes that make real use of the new mechanism. In passing, remove a duplicate assignment of transform_string_values_scalar. Discussion: https://postgr.es/m/1436686.1670701118@sss.pgh.pa.us
2022-12-07 | meson: Add basic PGXS compatibility | Andres Freund
Generate a Makefile.global that's complete enough for PGXS to work for some extensions. It is likely that this compatibility layer will not suffice for every extension and not all platforms - we can expand it over time. This allows extensions to use a single buildsystem across all the supported postgres versions. Once all supported PG versions support meson, we can remove the compatibility layer. Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20221005200710.luvw5evhwf6clig6@awork3.anarazel.de
2022-11-30 | Refactor code parsing compression option values (-Z/--compress) | Michael Paquier
This commit moves the code in charge of deparsing the method and detail strings fed later to parse_compress_specification() to a common routine, where the backward-compatible case of only an integer being found (N = 0 => "none", N > 1 => gzip at level N) is handled. Note that this has a side-effect for pg_basebackup, as we now attempt to detect "server-" and "client-" before checking for the integer-only pre-14 grammar, where values like server-N and client-N (without the follow-up detail string) are now valid rather than failing because of an unsupported method name. Past grammars are still handled the same way, but these flavors are now authorized, and would now switch to consider N = 0 as no compression and N > 1 as gzip with the compression level used as N, with the caller still controlling if the compression method should be done server-side, client-side or is unspecified. The documentation of pg_basebackup is updated to reflect that. This benefits other code paths that would like to rely on the same logic as pg_basebackup and pg_receivewal with option values used for compression specifications, one area discussed lately being pg_dump. Author: Georgios Kokolatos, Michael Paquier Discussion: https://postgr.es/m/O4mutIrCES8ZhlXJiMvzsivT7ztAMja2lkdL1LJx6O5f22I2W8PBIeLKz7mDLwxHoibcnRAYJXm1pH4tyUNC4a8eDzLn22a6Pb1S74Niexg=@pm.me
2022-11-15 | Check return value of pclose() correctly | Peter Eisentraut
Some callers didn't check the return value of pclose() or ClosePipeStream() correctly. Either they didn't check it at all or they treated it like the return of fclose(). The correct way is to first check whether the return value is -1, and then report errno, and then check the return value like a result from system(), for which we already have wait_result_to_str() to make it simpler. To make this more compact, expand wait_result_to_str() to also handle -1 explicitly. Reviewed-by: Ankit Kumar Pandey <itsankitkp@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/8cd9fb02-bc26-65f1-a809-b1cb360eef73@enterprisedb.com
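The checking order described above (first -1 with errno, then decode the value like a result from system()) looks roughly like this in Python on a Unix system. It is a sketch, not the C helper: `describe_close_status` is a hypothetical name and the message texts are invented.

```python
import errno
import os


def describe_close_status(status, saved_errno=0):
    """Turn a pclose()-style return value into a readable message."""
    # Step 1: -1 means the close itself failed, so report errno.
    if status == -1:
        return "could not close pipe: " + os.strerror(saved_errno)
    # Step 2: otherwise it is a wait status, decoded like system()'s result.
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
        if code == 0:
            return "child process exited normally"
        return "child process exited with exit code %d" % code
    if os.WIFSIGNALED(status):
        return "child process was terminated by signal %d" % os.WTERMSIG(status)
    return "child process exited with unrecognized status %d" % status
```

Treating the value like the return of fclose(), where only "nonzero" is examined, would lose the distinction between these cases.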
2022-11-08 | Introduce pg_pwrite_zeros() in file_utils.c | Michael Paquier
This routine is designed to write zeros to a file using vectored I/O, for a size given by its caller, being useful when it comes to initializing a file with a final size already known. XLogFileInitInternal() in xlog.c is changed to use this new routine when initializing WAL segments with zeros (wal_init_zero enabled). Note that the aligned buffers used for the vectored I/O writes now have a size of BLCKSZ, and not XLOG_BLCKSZ anymore, as pg_pwrite_zeros() relies on PGAlignedBlock while xlog.c originally used PGAlignedXLogBlock. This routine will be used in a follow-up patch to do the pre-padding of WAL segments for pg_receivewal and pg_basebackup when these are not compressed. Author: Bharath Rupireddy Reviewed-by: Nathan Bossart, Andres Freund, Thomas Munro, Michael Paquier Discussion: https://www.postgresql.org/message-id/CALj2ACUq7nAb7%3DbJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA%40mail.gmail.com
2022-10-28 | Remove AssertArg and AssertState | Peter Eisentraut
These don't offer anything over plain Assert, and their usage had already been declared obsolescent. Author: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13
2022-10-27 | Move pg_pwritev_with_retry() to src/common/file_utils.c | Michael Paquier
This commit moves pg_pwritev_with_retry(), a convenience wrapper of pg_pwritev() able to handle partial writes, to common/file_utils.c so that the frontend code is able to use it. A first use-case targeted for this routine is pg_basebackup and pg_receivewal, for the zero-padding of a newly-initialized WAL segment. This is used currently in the backend when the GUC wal_init_zero is enabled (default). Author: Bharath Rupireddy Reviewed-by: Nathan Bossart, Thomas Munro Discussion: https://postgr.es/m/CALj2ACUq7nAb7=bJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA@mail.gmail.com
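Handling partial writes in a vectored write means advancing through the buffer list by however many bytes the last call accepted. The sketch below models that loop in Python. It is not the C routine: `pwritev_with_retry` takes an injected `write_at(buffers, offset)` callable (a hypothetical interface), and `ShortWriter` is a test double that deliberately accepts only a few bytes per call.

```python
def pwritev_with_retry(write_at, buffers, offset):
    """Write all buffers starting at offset, retrying after short writes."""
    pending = [memoryview(b) for b in buffers if len(b)]
    total = 0
    while pending:
        written = write_at(pending, offset)
        if written <= 0:
            raise OSError("write made no progress")
        total += written
        offset += written
        # Drop buffers that were written completely.
        while pending and written >= len(pending[0]):
            written -= len(pending[0])
            pending.pop(0)
        # Trim the buffer that was only partly written.
        if written > 0:
            pending[0] = pending[0][written:]
    return total


class ShortWriter:
    """Test double: accepts at most limit bytes per call."""

    def __init__(self, limit):
        self.limit = limit
        self.data = bytearray()
        self.calls = 0

    def __call__(self, buffers, offset):
        self.calls += 1
        chunk = b"".join(bytes(b) for b in buffers)[: self.limit]
        end = offset + len(chunk)
        if len(self.data) < end:
            self.data.extend(bytes(end - len(self.data)))
        self.data[offset:end] = chunk
        return len(chunk)
```

Passing something like `lambda bufs, off: os.pwritev(fd, bufs, off)` as `write_at` would connect the loop to a real file descriptor on platforms that provide `os.pwritev`.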
2022-10-07 | meson: Add support for building with precompiled headers | Andres Freund
This substantially speeds up building for windows, due to the vast amount of headers included via windows.h. A cross build from linux targeting mingw goes from 994.11user 136.43system 0:31.58elapsed 3579%CPU to 422.41user 89.05system 0:14.35elapsed 3562%CPU The wins on windows are similar-ish (but I don't have a system at hand just now for actual numbers). Targeting other operating systems the wins are far smaller (tested linux, macOS, FreeBSD). For now precompiled headers are disabled by default, it's not clear how well they work on all platforms. E.g. on FreeBSD gcc doesn't seem to have working support, but clang does. When doing a full build precompiled headers are only beneficial for targets with multiple .c files, as meson builds a separate precompiled header for each target (so that different compilation options take effect). This commit therefore only changes targets with at least two .c files to use precompiled headers. Because this commit adds b_pch=false to the default_options new build directories will have precompiled headers disabled by default, however existing build directories will continue to use the default value of b_pch, which is true. Note that using precompiled headers with ccache requires setting CCACHE_SLOPPINESS=pch_defines,time_macros to get hits. Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CA+hUKG+50eOUbN++ocDc0Qnp9Pvmou23DSXu=ZA6fepOcftKqA@mail.gmail.com Discussion: https://postgr.es/m/c5736f70-bb6d-8d25-e35c-e3d886e4e905@enterprisedb.com Discussion: https://postgr.es/m/20190826054000.GE7005%40paquier.xyz
2022-09-28 | Change some errdetail() to errdetail_internal() | Alvaro Herrera
This prevents marking the argument string for translation for gettext, and it also prevents the given string (which is already translated) from being translated at runtime. Also, mark the strings used as arguments to check_rolespec_name for translation. Backpatch all the way back as appropriate. None of this is caught by any tests (necessarily so), so I verified it manually.
2022-09-28 | Revert 56-bit relfilenode change and follow-up commits. | Robert Haas
There are still some alignment-related failures in the buildfarm, which might or might not be able to be fixed quickly, but I've also just realized that it increased the size of many WAL records by 4 bytes because a block reference contains a RelFileLocator. The effect of that hasn't been studied or discussed, so revert for now.
2022-09-27 | Increase width of RelFileNumbers from 32 bits to 56 bits. | Robert Haas
RelFileNumbers are now assigned using a separate counter, instead of being assigned from the OID counter. This counter never wraps around: if all 2^56 possible RelFileNumbers are used, an internal error occurs. As the cluster is limited to 2^64 total bytes of WAL, this limitation should not cause a problem in practice. If the counter were 64 bits wide rather than 56 bits wide, we would need to increase the width of the BufferTag, which might adversely impact buffer lookup performance. Also, this lets us use bigint for pg_class.relfilenode and other places where these values are exposed at the SQL level without worrying about overflow. This should remove the need to keep "tombstone" files around until the next checkpoint when relations are removed. We do that to keep RelFileNumbers from being recycled, but now that won't happen anyway. However, this patch doesn't actually change anything in this area; it just makes it possible for a future patch to do so. Dilip Kumar, based on an idea from Andres Freund, who also reviewed some earlier versions of the patch. Further review and some wordsmithing by me. Also reviewed at various points by Ashutosh Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane. Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
2022-09-24 | Message style improvements | Peter Eisentraut
2022-09-22 | Use min/max bounds defined by Zstd for compression level | Michael Paquier
The bounds hardcoded in compression.c since ffd5365 (minimum at 1 and maximum at 22) do not match the reality of what zstd is able to handle, these values being available via ZSTD_maxCLevel() and ZSTD_minCLevel() at run-time. The maximum of 22 is actually correct in recent versions, but the minimum was not as the library can go down to -131720 by design. This commit changes the code to use the run-time values in the code instead of some hardcoded ones. Zstd seems to assume that these bounds could change in the future, and Postgres will be able to adapt automatically to such changes thanks to what's being done in this commit. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20220922033716.GL31833@telsasoft.com Backpatch-through: 15
2022-09-22 | meson: Add initial version of meson based build system | Andres Freund
Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. 
Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-09-14Simplify handling of compression level with compression specificationsMichael Paquier
PG_COMPRESSION_OPTION_LEVEL is removed from the compression specification logic, and instead the compression level is always assigned with each library's default if nothing is directly given. This centralizes the checks on the compression methods supported by a given build, and always assigns a default compression level when parsing a compression specification. As a result, a compression method not supported by a build is now reported at an earlier stage than previously: when parsing a specification in the backend or the frontend, and not when processing it. The zstd, lz4 and zlib routines that set up the compression level are each able to handle a default value, hence the backend or frontend code (pg_receivewal or pg_basebackup) no longer needs to know what the default compression level should be if nothing is specified: the specification parsing now assigns it. It can also be enforced by passing down a "level" set to the default value, that the backend will accept (the replication protocol is for example able to handle a command like BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')). This code simplification fixes an issue with pg_basebackup --gzip introduced by ffd5365, where the tarball of the streamed WAL segments would be created as pg_wal.tar.gz with uncompressed contents, while the intention is to compress the segments with gzip at a default level. The confusion originated in the handling of the default compression level of gzip (-1 or Z_DEFAULT_COMPRESSION): the value of 0 was getting assigned instead, which is what walmethods.c would consider as equivalent to no compression when streaming WAL segments with its tar methods. Always assigning the compression level removes the confusion of some code paths considering a value of 0 set in a specification as either no compression or a default compression level. 
Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests where the shape of the compression detail string for client and server-side compression was checked using gzip. This is a result of the code simplification, as gzip specifications cannot be used if a build does not support it. Reported-by: Tom Lane Reviewed-by: Tom Lane Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us Backpatch-through: 15
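A minimal sketch of "always assign a level while parsing" follows. All type and function names are hypothetical and do not match the PostgreSQL definitions. The gzip default of -1 (Z_DEFAULT_COMPRESSION) is stated in the message above, and 3 is zstd's ZSTD_CLEVEL_DEFAULT. The lz4 default of 0 is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types for illustration; not the PostgreSQL definitions. */
typedef enum { ALGO_NONE, ALGO_GZIP, ALGO_LZ4, ALGO_ZSTD } algo_t;

typedef struct
{
    algo_t      algorithm;
    int         level;      /* always set once parsing is done */
} spec_t;

/* Default level for each method, used when the user gives none. */
int
default_level(algo_t algorithm)
{
    switch (algorithm)
    {
        case ALGO_NONE:
            return 0;
        case ALGO_GZIP:
            return -1;      /* Z_DEFAULT_COMPRESSION */
        case ALGO_LZ4:
            return 0;       /* assumption for this sketch */
        case ALGO_ZSTD:
            return 3;       /* ZSTD_CLEVEL_DEFAULT */
    }
    assert(0 && "unknown algorithm");
    return 0;
}

/*
 * Build a specification.  The level is never left unset, so later code
 * cannot mistake "not specified" for "level 0, no compression".
 */
spec_t
make_spec(algo_t algorithm, bool level_given, int level)
{
    spec_t      spec;

    spec.algorithm = algorithm;
    spec.level = level_given ? level : default_level(algorithm);
    return spec;
}
```

Calling `make_spec(ALGO_GZIP, false, 0)` yields level -1 rather than 0, which is the distinction the fix relies on.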
2022-09-13pg_clean_ascii(): escape bytes rather than lose themPeter Eisentraut
Rather than replace each unprintable byte with a '?' character, replace it with a hex escape instead. The API now allocates a copy rather than modifying the input in place. Author: Jacob Champion <jchampion@timescale.com> Discussion: https://www.postgresql.org/message-id/CAAWbhmgsvHrH9wLU2kYc3pOi1KSenHSLAHBbCVmmddW6-mc_=w@mail.gmail.com
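The escaping behaviour can be sketched as below. `clean_ascii_copy()` is a hypothetical stand-in that uses malloc() and is not the real pg_clean_ascii(). Two details are assumptions for illustration: "unprintable" is taken to mean any byte outside 0x20..0x7E, and the escape is written as `\xNN` with lowercase hex digits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the behaviour described above: return a newly
 * allocated copy of str in which every byte outside printable ASCII
 * (0x20..0x7E) is replaced by a four-character "\xNN" escape.  The input
 * is left untouched.  Returns NULL on allocation failure; the caller
 * frees the result.
 */
char *
clean_ascii_copy(const char *str)
{
    static const char hex[] = "0123456789abcdef";
    size_t      len;
    char       *dst;
    char       *p;

    assert(str != NULL);
    len = strlen(str);

    /* Worst case: every byte becomes four characters, plus the NUL. */
    dst = malloc(len * 4 + 1);
    if (dst == NULL)
        return NULL;

    p = dst;
    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char) str[i];

        if (c < 32 || c > 126)
        {
            *p++ = '\\';
            *p++ = 'x';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
        else
            *p++ = (char) c;
    }
    *p = '\0';
    return dst;
}
```

Unlike a '?' substitution, the escaped form keeps the original byte values recoverable from the output.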
2022-09-13Treat Unicode codepoints of category "Format" as non-spacingJohn Naylor
Commit d8594d123 updated the list of non-spacing codepoints used for calculating display width, but in doing so inadvertently removed some, since the script used for that commit only considered combining characters. For complete coverage of zero-width characters, include codepoints in the category Cf (Format). To reflect the wider purpose, also rename files and update comments that referred specifically to combining characters. Some of these ranges have been missing since v12, but due to lack of field complaints it was determined not important enough to justify adding special-case logic to the backbranches. Kyotaro Horiguchi Report by Pavel Stehule Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRBE8yvpQ0FSkPCoe0Ny1jAAsAQ6j3qMgVwWvkqAoaaNmQ%40mail.gmail.com
2022-09-12Assorted examples of expanded type-safer palloc/pg_malloc APIPeter Eisentraut
This adds some uses of the new palloc/pg_malloc variants here and there as a demonstration and test. This is kept separate from the actual API patch, since the latter might be backpatched at some point. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
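The idea behind type-safer allocation macros can be illustrated with the malloc-based sketch below. The macro names `malloc_object` and `malloc_array` are hypothetical analogues written for this example and are not the palloc/pg_malloc variants themselves. The cast to `type *` lets the compiler flag an assignment to a pointer of the wrong type.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical malloc-based analogues of type-safer allocation macros.
 * The result has type "type *", so assigning it to a mismatched pointer
 * draws a compiler diagnostic, and the size always matches the type.
 */
#define malloc_object(type)         ((type *) malloc(sizeof(type)))
#define malloc_array(type, count)   ((type *) malloc(sizeof(type) * (count)))

typedef struct
{
    int         x;
    int         y;
} point_t;

/* Allocate n points and number them; returns NULL if allocation fails. */
point_t *
make_points(size_t n)
{
    point_t    *pts;

    assert(n > 0);
    pts = malloc_array(point_t, n);
    if (pts == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
    {
        pts[i].x = (int) i;
        pts[i].y = (int) i;
    }
    return pts;
}
```

Compared with `malloc(n * sizeof(point_t))`, the type is written once and the size and pointer type cannot drift apart.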
2022-09-09Replace load of functions by direct calls for some WIN32Michael Paquier
This commit changes the following code paths to do direct system calls to some WIN32 functions rather than loading them from an external library, shaving some code in the process: - Creation of restricted tokens in pg_ctl.c, introduced by a25cd81. - QuerySecurityContextToken() in auth.c for SSPI authentication in the backend, introduced in d602592. - CreateRestrictedToken() in src/common/. This change is similar to the case of pg_ctl.c. Most of these functions were loaded rather than directly called because, as mentioned in the code comments, MinGW headers were not declaring them. I have double-checked the recent MinGW code, and all the functions changed here are declared in its headers, so this change should be safe. Note that I do not have a MinGW environment at hand so I have not tested it directly, but that MSVC was fine with the change. The buildfarm will tell soon enough if this change is appropriate or not for a much broader set of environments. A few code paths still use GetProcAddress() to load some functions: - LDAP authentication for ldap_start_tls_sA(), where I am not confident that this change would work. - win32env.c and win32ntdll.c where we have a per-MSVC version dependency for the name of the library loaded. - crashdump.c for MiniDumpWriteDump() and EnumDirTree(), where direct calls were not able to work after testing. Reported-by: Thomas Munro Reviewed-by: Justin Prysby Discussion: https://postgr.es/m/CA+hUKG+BMdcaCe=P-EjMoLTCr3zrrzqbcVE=8h5LyNsSVHKXZA@mail.gmail.com
2022-09-02Speed up lexing of long JSON stringsJohn Naylor
Use optimized linear search when looking ahead for end quotes, backslashes, and non-printable characters. This results in nearly 40% faster JSON parsing on x86-64 when most values are long strings, and all platforms should see some improvement. Reviewed by Andres Freund and Nathan Bossart Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
2022-08-29Clean up inconsistent use of fflush().Tom Lane
More than twenty years ago (79fcde48b), we hacked the postmaster to avoid a core-dump on systems that didn't support fflush(NULL). We've mostly, though not completely, hewed to that rule ever since. But such systems are surely gone in the wild, so in the spirit of cleaning out no-longer-needed portability hacks let's get rid of multiple per-file fflush() calls in favor of using fflush(NULL). Also, we were fairly inconsistent about whether to fflush() before popen() and system() calls. While we've received no bug reports about that, it seems likely that at least some of these call sites are at risk of odd behavior, such as error messages appearing in an unexpected order. Rather than expend a lot of brain cells figuring out which places are at hazard, let's just establish a uniform coding rule that we should fflush(NULL) before these calls. A no-op fflush() is surely of trivial cost compared to launching a sub-process via a shell; while if it's not a no-op then we likely need it. Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us
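The uniform rule can be sketched as a small wrapper. `run_command_flushed()` is a hypothetical name used only for this illustration; fflush(NULL) and system() are standard C library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper illustrating the coding rule: flush every open
 * output stream before starting a sub-process, so that text buffered by
 * this process appears ahead of anything the child writes.
 */
int
run_command_flushed(const char *command)
{
    assert(command != NULL);

    /* One call covers stdout, stderr and any other open output stream. */
    fflush(NULL);

    return system(command);
}
```

If nothing is buffered, the flush is a no-op whose cost is negligible next to launching a shell.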
2022-08-26Use SSE2 in is_valid_ascii() where available.John Naylor
Per flame graph from Jelte Fennema, COPY FROM ... USING BINARY shows input validation taking at least 5% of the profile, so it's worth trying to be more efficient here. With this change, validation of pure ASCII is nearly 40% faster on contemporary Intel hardware. To make this change legible and easier to adapt to additional architectures, use helper functions to abstract the platform details away. Reviewed by Nathan Bossart Discussion: https://www.postgresql.org/message-id/CAFBsxsG%3Dk8t%3DC457FXnoBXb%3D8iA4OaZkbFogFMachWif7mNnww%40mail.gmail.com
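A simplified sketch of chunked ASCII validation with a platform helper is shown below. The function names are hypothetical and this is not the is_valid_ascii() implementation: it only tests the high bit of each byte, and any further checks the real function may perform are left out. The SSE2 path uses the standard intrinsics _mm_loadu_si128() and _mm_movemask_epi8(); other platforms fall back to two 64-bit words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper hiding the platform details: returns true if any of
 * the 16 bytes at p has its high bit set, i.e. is not ASCII.
 */
static inline bool
chunk_has_high_bit(const unsigned char *p)
{
#ifdef __SSE2__
    __m128i     v = _mm_loadu_si128((const __m128i *) p);

    /* movemask gathers the top bit of each of the 16 bytes. */
    return _mm_movemask_epi8(v) != 0;
#else
    uint64_t    a;
    uint64_t    b;

    memcpy(&a, p, 8);
    memcpy(&b, p + 8, 8);
    return ((a | b) & UINT64_C(0x8080808080808080)) != 0;
#endif
}

/* Returns true if all len bytes at s are 7-bit ASCII. */
bool
is_ascii_sketch(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);

    /* Check 16 bytes at a time while a full chunk remains. */
    while (len >= 16)
    {
        if (chunk_has_high_bit(s))
            return false;
        s += 16;
        len -= 16;
    }

    /* Check the remaining tail byte by byte. */
    while (len > 0)
    {
        if (*s & 0x80)
            return false;
        s++;
        len--;
    }
    return true;
}
```

Keeping the vector code inside one helper means a port to another architecture only has to supply a new body for `chunk_has_high_bit()`.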
2022-08-23Don't bother to set sockaddr_un.sun_len.Thomas Munro
It's not necessary to fill in sun_len when calling bind() or connect(), on all known systems that have it. Discussion: https://postgr.es/m/2781112.1644819528%40sss.pgh.pa.us
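For illustration, a Unix-domain socket address can be prepared without touching sun_len, as sketched below. `fill_unix_addr()` is a hypothetical helper, not the code changed by this commit. The structure is zeroed first, so sun_len is simply left at zero on platforms that have the field.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: prepare a sockaddr_un for bind() or connect().
 * Only sun_family and sun_path are filled in.  Returns 0 on success, or
 * -1 if the path does not fit in sun_path.
 */
int
fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
    assert(addr != NULL && path != NULL);

    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);   /* length checked above */

    /* No assignment to sun_len, even where the platform defines it. */
    return 0;
}
```

The caller would then pass `sizeof(struct sockaddr_un)` as the address length to bind() or connect().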
2022-08-22Remove configure probes for sockaddr_storage members.Thomas Munro
Remove four probes for members of sockaddr_storage. Keep only the probe for sockaddr's sa_len, which is enough for our two remaining places that know about _len fields: 1. ifaddr.c needs to know if sockaddr has sa_len to understand the result of ioctl(SIOCGIFCONF). Only AIX is still using the relevant code today, but it seems like a good idea to keep it compilable on Linux. 2. ip.c was testing for presence of ss_len to decide whether to fill in sun_len in our getaddrinfo_unix() function. It's just as good to test for sa_len. If you have one, you have them all. (The code in #2 isn't actually needed at all on several OSes I checked since modern versions ignore sa_len on input to system calls. Proving that's the case for all relevant OSes is left for another day, but wouldn't get rid of that last probe anyway if we still want it for #1.) Discussion: https://postgr.es/m/CA%2BhUKGJJjF2AqdU_Aug5n2MAc1gr%3DGykNjVBZq%2Bd6Jrcp3Dyvg%40mail.gmail.com
2022-08-18Remove configure probe for netinet/tcp.h.Thomas Munro
<netinet/tcp.h> is in SUSv3 and all targeted Unix systems have it. For Windows, we can provide a stub include file, to avoid some #ifdef noise. Discussion: https://postgr.es/m/CA+hUKGKErNfhmvb_H0UprEmp4LPzGN06yR2_0tYikjzB-2ECMw@mail.gmail.com