summaryrefslogtreecommitdiff
path: root/contrib/pg_buffercache
AgeCommit message (Collapse)Author
4 daysUpdate copyright for 2026Bruce Momjian
Backpatch-through: 14
2025-12-11Use palloc_object() and palloc_array(), the last changeMichael Paquier
This is the last batch of changes that have been suggested by the author, this part covering the non-trivial changes. Some of the changes suggested have been discarded as they seem to lead to more instructions generated, leaving the parts that can be qualified as in-place replacements. Similar work has been done in 1b105f9472bd, 0c3c5c3b06a3 and 31d3847a37be. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-11pg_buffercache: Fix memory allocation formulaMichael Paquier
The code over-allocated the memory required for os_page_status, relying on uint64 for its element size instead of an int, hence doubling what was required. This could mean quite a lot of memory if dealing with a lot of NUMA pages. Oversight in ba2a3c2302f1. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com Backpatch-through: 18
2025-11-28pg_buffercache: Add pg_buffercache_mark_dirty{,_relation,_all}()Michael Paquier
This commit introduces three new functions for marking shared buffers as dirty by using the functions introduced in 9660906dbd69: * pg_buffercache_mark_dirty() for one shared buffer. - pg_buffercache_mark_dirt_relation() for all the shared buffers in a relation. * pg_buffercache_mark_dirty_all() for all the shared buffers in pool. The "_all" and "_relation" flavors are designed to address the inefficiency of repeatedly calling pg_buffercache_mark_dirty() for each individual buffer, which can be time-consuming when dealing with with large shared buffers pool. These functions are intended as developer tools and are available only to superusers. There is no need to bump the version of pg_buffercache, 4b203d499c61 having done this job in this release cycle. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Joseph Koshakow <koshy44@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Yuhang Qiu <iamqyh@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com
2025-11-24pg_buffercache: Add pg_buffercache_os_pagesMichael Paquier
ba2a3c2302f has added a way to check if a buffer is spread across multiple pages with some NUMA information, via a new view pg_buffercache_numa that depends on pg_buffercache_numa_pages(), a SQL function. These can only be queried when support for libnuma exists, generating an error if not. However, it can be useful to know how shared buffers and OS pages map when NUMA is not supported or not available. This commit expands the capabilities around pg_buffercache_numa: - pg_buffercache_numa_pages() is refactored as an internal function, able to optionally process NUMA. Its SQL definition prior to this commit is still around to ensure backward-compatibility with v1.6. - A SQL function called pg_buffercache_os_pages() is added, able to work with or without NUMA. - The view pg_buffercache_numa is redefined to use pg_buffercache_os_pages(). - A new view is added, called pg_buffercache_os_pages. This ignores NUMA for its result processing, for a better efficiency. The implementation is done so as there is no code duplication between the NUMA and non-NUMA views/functions, relying on one internal function that does the job for all of them. The module is bumped to v1.7. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/Z/fFA2heH6lpSLlt@ip-10-97-1-34.eu-west-3.compute.internal
2025-11-23pg_buffercache: Remove unused fields from BufferCacheNumaRecMichael Paquier
These fields have been added in commit ba2a3c2302f, and have never been used. While on it, this commit moves a comment that was out of place, improving it. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aSBOKX6pLJzumbmF@ip-10-97-1-34.eu-west-3.compute.internal
2025-11-18pg_buffercache: Fix incorrect result cast for relforknumberMichael Paquier
pg_buffercache_pages.relforknumber is defined as an int2, but its value was stored with ObjectIdGetDatum() rather than Int16GetDatum() in the result record. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAExHW5s2_qwSdhKpVnUzjRMf0cf1PvmhUHQDLaFM3QzKbP1OyQ@mail.gmail.com
2025-11-06bufmgr: Allow some buffer state modifications while holding header lockAndres Freund
Until now BufferDesc.state was not allowed to be modified while the buffer header spinlock was held. This meant that operations like unpinning buffers needed to use a CAS loop, waiting for the buffer header spinlock to be released before updating. The benefit of that restriction is that it allowed us to unlock the buffer header spinlock with just a write barrier and an unlocked write (instead of a full atomic operation). That was important to avoid regressions in 48354581a49c. However, since then the hottest buffer header spinlock uses have been replaced with atomic operations (in particular, the most common use of PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been removed in 5e899859287). This change will allow, in a subsequent commit, to release buffer pins with a single atomic-sub operation. This previously was not possible while such operations were not allowed while the buffer header spinlock was held, as an atomic-sub would not have allowed a race-free check for the buffer header lock being held. Using atomic-sub to unpin buffers is a nice scalability win, however it is not the primary motivation for this change (although it would be sufficient). The primary motivation is that we would like to merge the buffer content lock into BufferDesc.state, which will result in more frequent changes of the state variable, which in some situations can cause a performance regression, due to an increased CAS failure rate when unpinning buffers. The regression entirely vanishes when using atomic-sub. Naively implementing this would require putting CAS loops in every place modifying the buffer state while holding the buffer header lock. To avoid that, introduce UnlockBufHdrExt(), which can set/add flags as well as the refcount, together with releasing the lock. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-08-19Add CHECK_FOR_INTERRUPTS in contrib/pg_buffercache functions.Masahiko Sawada
This commit adds CHECK_FOR_INTERRUPTS to loops iterating over shared buffers in several pg_buffercache functions, allowing them to be interrupted during long-running operations. Backpatch to all supported versions. Add CHECK_FOR_INTERRUPTS to the loop in pg_buffercache_pages() in all supported branches, and to pg_buffercache_summary() and pg_buffercache_usage_counts() in version 16 and newer. Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Discussion: https://postgr.es/m/CAHg+QDcejeLx7WunFT3DX6XKh1KshvGKa8F5au8xVhqVvvQPRw@mail.gmail.com Backpatch-through: 13
2025-07-01Silence valgrind about pg_numa_touch_mem_if_requiredTomas Vondra
When querying NUMA status of pages in shared memory, we need to touch the memory first to get valid results. This may trigger valgrind reports, because some of the memory (e.g. unpinned buffers) may be marked as noaccess. Solved by adding a valgrind suppresion. An alternative would be to adjust the access/noaccess status before touching the memory, but that seems far too invasive. It would require all those places to have detailed knowledge of what the shared memory stores. The pg_numa_touch_mem_if_required() macro is replaced with a function. Macros are invisible to suppressions, so it'd have to suppress reports for the caller - e.g. pg_get_shmem_allocations_numa(). So we'd suppress reports for the whole function, and that seems to heavy-handed. It might easily hide other valid issues. Reviewed-by: Christoph Berg <myon@debian.org> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de Backpatch-through: 18
2025-04-20Fix a few duplicate words in commentsDavid Rowley
These are all new to v18 Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
2025-04-19Fix typos and grammar in the codeMichael Paquier
The large majority of these have been introduced by recent commits done in the v18 development cycle. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
2025-04-09Cleanup of pg_numa.cTomas Vondra
This moves/renames some of the functions defined in pg_numa.c: * pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and moved to src/backend/storage/ipc/shmem.c. The new name better reflects that the page size is not related to NUMA, and it's specifically about the page size used for the main shared memory segment. * move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into the backend (which more appropriate for functions callable from SQL). While at it, improve the comment to explain what page size it returns. * remove unnecessary includes from src/port/pg_numa.c, adding unnecessary dependencies (src/port should be suitable for frontent). These were either leftovers or unnecessary thanks to the other changes in this commit. This eliminates unnecessary dependencies on backend symbols, which we don't want in src/port. Reported-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
2025-04-08pg_buffercache: Change page_num type to bigintTomas Vondra
The page_num was defined as integer, which should be sufficient for the near future (with 4K pages it's 8TB). But it's virtually free to return bigint, and get a wider range. This was agreed on the thread, but I forgot to tweak this in ba2a3c2302f1. While at it, make the data types in CREATE VIEW a bit more consistent. Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.co
2025-04-08Add pg_buffercache_evict_{relation,all} functionsAndres Freund
In addition to the added functions, the pg_buffercache_evict() function now shows whether the buffer was flushed. pg_buffercache_evict_relation(): Evicts all shared buffers in a relation at once. pg_buffercache_evict_all(): Evicts all shared buffers at once. Both functions provide mechanism to evict multiple shared buffers at once. They are designed to address the inefficiency of repeatedly calling pg_buffercache_evict() for each individual buffer, which can be time-consuming when dealing with large shared buffer pools. (e.g., ~477ms vs. ~2576ms for 16GB of fully populated shared buffers). These functions are intended for developer testing and debugging purposes and are available to superusers only. Minimal tests for the new functions are included. Also, there was no test for pg_buffercache_evict(), test for this added too. No new extension version is needed, as it was already increased this release by ba2a3c2302f. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> Reviewed-by: Joseph Koshakow <koshy44@gmail.com> Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com
2025-04-07Add pg_buffercache_numa view with NUMA node infoTomas Vondra
Introduces a new view pg_buffercache_numa, showing NUMA memory nodes for individual buffers. For each buffer the view returns an entry for each memory page, with the associated NUMA node. The database blocks and OS memory pages may have different size - the default block size is 8KB, while the memory page is 4K (on x86). But other combinations are possible, depending on configure parameters, platform, etc. This means buffers may overlap with multiple memory pages, each associated with a different NUMA node. To determine the NUMA node for a buffer, we first need to touch the memory pages using pg_numa_touch_mem_if_required, otherwise we might get status -2 (ENOENT = The page is not present), indicating the page is either unmapped or unallocated. The view may be relatively expensive, especially when accessed for the first time in a backend, as it touches all memory pages to get reliable information about the NUMA node. This may also force allocation of the shared memory. Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-03-26Use PG_MODULE_MAGIC_EXT in our installable shared libraries.Tom Lane
It seems potentially useful to label our shared libraries with version information, now that a facility exists for retrieving that. This patch labels them with the PG_VERSION string. There was some discussion about using semantic versioning conventions, but that doesn't seem terribly helpful for modules with no SQL-level presence; and for those that do have SQL objects, we typically expect them to support multiple revisions of the SQL definitions, so it'd still not be very helpful. I did not label any of src/test/modules/. It seems unnecessary since we don't install those, and besides there ought to be someplace that still provides test coverage for the original PG_MODULE_MAGIC macro. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-04-08Add pg_buffercache_evict() function for testing.Thomas Munro
When testing buffer pool logic, it is useful to be able to evict arbitrary blocks. This function can be used in SQL queries over the pg_buffercache view to set up a wide range of buffer pool states. Of course, buffer mappings might change concurrently so you might evict a block other than the one you had in mind, and another session might bring it back in at any time. That's OK for the intended purpose of setting up developer testing scenarios, and more complicated interlocking schemes to give stronger guararantees about that would likely be less flexible for actual testing work anyway. Superuser-only. Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks) Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Cary Huang <cary.huang@highgo.ca> Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com> Reviewed-by: Jim Nasby <jim.nasby@gmail.com> Reviewed-by: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com
2024-01-04Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2023-04-07Add pg_buffercache_usage_counts() to contrib/pg_buffercache.Tom Lane
It was pointed out that pg_buffercache_summary()'s report of the overall average usage count isn't that useful, and what would be more helpful in many cases is to report totals for each possible usage count. Add a new function to do it like that. Since pg_buffercache 1.4 is already new for v16, we don't need to create a new extension version; we'll just define this as part of 1.4. Nathan Bossart Discussion: https://postgr.es/m/20230130233040.GA2800702@nathanxps13
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-12-20Add copyright notices to meson filesAndrew Dunstan
Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
2022-10-13pg_buffercache: Add pg_buffercache_summary()Andres Freund
Using pg_buffercache_summary() is significantly cheaper than querying pg_buffercache and summarizing in SQL. Author: Melih Mutlu <m.melihmutlu@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com> Discussion: https://postgr.es/m/CAGPVpCQAXYo54Q%3D8gqBsS%3Du0uk9qhnnq4%2B710BtUhUisX1XGEg%40mail.gmail.com
2022-10-05meson: Add windows resource filesAndres Freund
The generated resource files aren't exactly the same ones as the old buildsystems generate. Previously "InternalName" and "OriginalFileName" were mostly wrong / not set (despite being required), but that was hard to fix in at least the make build. Additionally, the meson build falls back to a "auto-generated" description when not set, and doesn't set it in a few cases - unlikely that anybody looks at these descriptions in detail. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
2022-09-28Revert 56-bit relfilenode change and follow-up commits.Robert Haas
There are still some alignment-related failures in the buildfarm, which might or might not be able to be fixed quickly, but I've also just realized that it increased the size of many WAL records by 4 bytes because a block reference contains a RelFileLocator. The effect of that hasn't been studied or discussed, so revert for now.
2022-09-27Update pg_buffercache's meson.build.Robert Haas
Commit 05d4cbf9b6ba708858984b01ca0fc56d59d4ec7c needed to do this, but didn't. Per Justin Pryzby. Discussion: 20220927191710.GG6256@telsasoft.com
2022-09-27Increase width of RelFileNumbers from 32 bits to 56 bits.Robert Haas
RelFileNumbers are now assigned using a separate counter, instead of being assigned from the OID counter. This counter never wraps around: if all 2^56 possible RelFileNumbers are used, an internal error occurs. As the cluster is limited to 2^64 total bytes of WAL, this limitation should not cause a problem in practice. If the counter were 64 bits wide rather than 56 bits wide, we would need to increase the width of the BufferTag, which might adversely impact buffer lookup performance. Also, this lets us use bigint for pg_class.relfilenode and other places where these values are exposed at the SQL level without worrying about overflow. This should remove the need to keep "tombstone" files around until the next checkpoint when relations are removed. We do that to keep RelFileNumbers from being recycled, but now that won't happen anyway. However, this patch doesn't actually change anything in this area; it just makes it possible for a future patch to do so. Dilip Kumar, based on an idea from Andres Freund, who also reviewed some earlier versions of the patch. Further review and some wordsmithing by me. Also reviewed at various points by Ashutosh Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane. Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
2022-09-22meson: Add initial version of meson based build systemAndres Freund
Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-08-24Include RelFileLocator fields individually in BufferTag.Robert Haas
This is preparatory work for a project to increase the number of bits in a RelFileNumber from 32 to 56. Along the way, introduce static inline accessor functions for a couple of BufferTag fields. Dilip Kumar, reviewed by me. The overall patch series has also had review at various times from Andres Freund, Ashutosh Sharma, Hannu Krosing, Vignesh C, Álvaro Herrera, and Tom Lane. Discussion: http://postgr.es/m/CAFiTN-trubju5YbWAq-BSpZ90-Z6xCVBQE8BVqXqANOZAF1Znw@mail.gmail.com
2022-07-30Add regression test coverage for contrib/pg_buffercache.Tom Lane
We can't check the output of this view very closely without creating portability headaches, but we can make sure that the number of rows is as-expected. In any case, this is sufficient to exercise all the C code within, which is a lot better than the 0% coverage we had before. DongWook Lee Discussion: https://postgr.es/m/CAAcByaLCHGJB7qAENEcx9D09UL=w4ma+yijwF_-1MSqQZ9wK6Q@mail.gmail.com
2022-07-06Change internal RelFileNode references to RelFileNumber or RelFileLocator.Robert Haas
We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2020-02-19Remove support for upgrading extensions from "unpackaged" state.Tom Lane
Andres Freund pointed out that allowing non-superusers to run "CREATE EXTENSION ... FROM unpackaged" has security risks, since the unpackaged-to-1.0 scripts don't try to verify that the existing objects they're modifying are what they expect. Just attaching such objects to an extension doesn't seem too dangerous, but some of them do more than that. We could have resolved this, perhaps, by still requiring superuser privilege to use the FROM option. However, it's fair to ask just what we're accomplishing by continuing to lug the unpackaged-to-1.0 scripts forward. None of them have received any real testing since 9.1 days, so they may not even work anymore (even assuming that one could still load the previous "loose" object definitions into a v13 database). And an installation that's trying to go from pre-9.1 to v13 or later in one jump is going to have worse compatibility problems than whether there's a trivial way to convert their contrib modules into extension style. Hence, let's just drop both those scripts and the core-code support for "CREATE EXTENSION ... FROM". Discussion: https://postgr.es/m/20200213233015.r6rnubcvl4egdh5r@alap3.anarazel.de
2019-11-05Split all OBJS style lines in makefiles into one-line-per-entry style.Andres Freund
When maintaining or merging patches, one of the most common sources for conflicts are the list of objects in makefiles. Especially when the split across lines has been changed on both sides, which is somewhat common due to attempting to stay below 80 columns, those conflicts are unnecessarily laborious to resolve. By splitting, and alphabetically sorting, OBJS style lines into one object per line, conflicts should be less frequent, and easier to resolve when they still occur. Author: Andres Freund Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2018-11-21Remove WITH OIDS support, change oid catalog column visibility.Andres Freund
Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2017-06-21Phase 2 of pgindent updates.Tom Lane
Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-03-30Default monitoring rolesSimon Riggs
Three nologin roles with non-overlapping privs are created by default * pg_read_all_settings - read all GUCs. * pg_read_all_stats - pg_stat_*, pg_database_size(), pg_tablespace_size() * pg_stat_scan_tables - may lock/scan tables Top level role - pg_monitor includes all of the above by default, plus others Author: Dave Page Reviewed-by: Stephen Frost, Robert Haas, Peter Eisentraut, Simon Riggs
2016-09-29Don't bother to lock bufmgr partitions in pg_buffercache.Heikki Linnakangas
That makes the view a lot less disruptive to use on a production system. Without the locks, you don't get a consistent snapshot across all buffers, but that's OK. It wasn't a very useful guarantee in practice. Ivan Kartyshov, reviewed by Tomas Vondra and Robert Haas. Discusssion: <f9d6cab2-73a7-7a84-55a8-07dcb8516ae5@postgrespro.ru>
2016-09-15pg_buffercache: Allow huge allocations.Robert Haas
Otherwise, users who have configured shared_buffers >= 256GB won't be able to use this module. There probably aren't many of those, but it doesn't hurt anything to fix it so that it works. Backpatch to 9.4, where MemoryContextAllocHuge was introduced. The same problem exists in older branches, but there's no easy way to fix it there. KaiGai Kohei
2016-06-09Update pg_buffercache extension for parallel query.Robert Haas
The pg_buffercache_pages function provided by this extension is PARALLEL SAFE. Andreas Karlsson
2016-04-11Allow Pin/UnpinBuffer to operate in a lockfree manner.Andres Freund
Pinning/Unpinning a buffer is a very frequent operation; especially in read-mostly cache resident workloads. Benchmarking shows that in various scenarios the spinlock protecting a buffer header's state becomes a significant bottleneck. The problem can be reproduced with pgbench -S on larger machines, but can be considerably worse for queries which touch the same buffers over and over at a high frequency (e.g. nested loops over a small inner table). To allow atomic operations to be used, cram BufferDesc's flags, usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable; that allows to manipulate them together using 32bit compare-and-swap operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could be lifted by using a 64bit field, but it's not a realistic configuration atm). As not all operations can easily implemented in a lockfree manner, implement the previous buf_hdr_lock via a flag bit in the atomic variable. That way we can continue to lock the header in places where it's needed, but can get away without acquiring it in the more frequent hot-paths. There's some additional operations which can be done without the lock, but aren't in this patch; but the most important places are covered. As bufmgr.c now essentially re-implements spinlocks, abstract the delay logic from s_lock.c into something more generic. It now has already two users, and more are coming up; there's a follupw patch for lwlock.c at least. This patch is based on a proof-of-concept written by me, which Alexander Korotkov made into a fully working patch; the committed version is again revised by me. Benchmarking and testing has, amongst others, been provided by Dilip Kumar, Alexander Korotkov, Robert Haas. On a large x86 system improvements for readonly pgbench, with a high client count, of a factor of 8 have been observed. Author: Alexander Korotkov and Andres Freund Discussion: 2400449.GjM57CE0Yg@dinodell
2015-05-24pgindent run for 9.5Bruce Momjian
2015-05-20Collection of typo fixes.Heikki Linnakangas
Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
2015-01-29Align buffer descriptors to cache line boundaries.Andres Freund
Benchmarks has shown that aligning the buffer descriptor array to cache lines is important for scalability; especially on bigger, multi-socket, machines. Currently the array sometimes already happens to be aligned by happenstance, depending how large previous shared memory allocations were. That can lead to wildly varying performance results after minor configuration changes. In addition to aligning the start of descriptor array, also force the size of individual descriptors to be of a common cache line size (64 bytes). That happens to already be the case on 64bit platforms, but this way we can change the struct BufferDesc more easily. As the alignment primarily matters in highly concurrent workloads which probably all are 64bit these days, and the space wastage of element alignment would be a bit more noticeable on 32bit systems, we don't force the stride to be cacheline sized on 32bit platforms for now. If somebody does actual performance testing, we can reevaluate that decision by changing the definition of BUFFERDESC_PADDED_SIZE. Discussion: 20140202151319.GD32123@awork2.anarazel.de Per discussion with Bruce Momjan, Tom Lane, Robert Haas, and Peter Geoghegan.
2014-08-30Make backend local tracking of buffer pins memory efficient.Andres Freund
Since the dawn of time (aka Postgres95) multiple pins of the same buffer by one backend have been optimized not to modify the shared refcount more than once. This optimization has always used a NBuffer sized array in each backend keeping track of a backend's pins. That array (PrivateRefCount) was one of the biggest per-backend memory allocations, depending on the shared_buffers setting. Besides the waste of memory it also has proven to be a performance bottleneck when assertions are enabled as we make sure that there's no remaining pins left at the end of transactions. Also, on servers with lots of memory and a correspondingly high shared_buffers setting the amount of random memory accesses can also lead to poor cpu cache efficiency. Because of these reasons a backend's buffers pins are now kept track of in a small statically sized array that overflows into a hash table when necessary. Benchmarks have shown neutral to positive performance results with considerably lower memory usage. Patch by me, review by Robert Haas. Discussion: 20140321182231.GA17111@alap3.anarazel.de
2014-08-25Fix typos in some error messages thrown by extension scripts when fed to psql.Andres Freund
Some of the many error messages introduced in 458857cc missed 'FROM unpackaged'. Also e016b724 and 45ffeb7e forgot to quote extension version numbers. Backpatch to 9.1, just like 458857cc which introduced the messages. Do so because the error messages thrown when the wrong command is copy & pasted aren't easy to understand.
2014-08-22Fix newly introduced misspelling of existence in pg_buffercache.Andres Freund
Peter Geoghegan
2014-08-21Add pinning_backends column to the pg_buffercache extension.Andres Freund
The new column shows how many backends have a buffer pinned. That can be useful during development or to diagnose production issues e.g. caused by vacuum waiting for cleanup locks. To handle upgrades transparently - the extension might be used in views - deal with callers expecting the old number of columns. Reviewed by Fujii Masao and Rajeev rastogi.
2014-07-14Add file version information to most installed Windows binaries.Noah Misch
Prominent binaries already had this metadata. A handful of minor binaries, such as pg_regress.exe, still lack it; efforts to eliminate such exceptions are welcome. Michael Paquier, reviewed by MauMau.
2014-07-10Adjust blank lines around PG_MODULE_MAGIC defines, for consistencyBruce Momjian
Report by Robert Haas