author    Peter Geoghegan  2025-06-06 14:19:44 +0000
committer Peter Geoghegan  2025-06-06 14:19:44 +0000
commit    e6eed40e44419e3268d01fe0d2daec08a7df68f7 (patch)
tree      6895e05e740c6c192b1102d60af5ac12dd987084 /src/include
parent    016e407f4ba10b230f5094c9ba36a1df3d34fb22 (diff)
Avoid BufferGetLSNAtomic() calls during nbtree scans.
Delay calling BufferGetLSNAtomic() until we finish reading a page that actually contains items that btgettuple will return to the executor. This reduces the number of calls during plain index scans (we'll only call BufferGetLSNAtomic() when _bt_readpage returns true), and totally eliminates calls during index-only scans, bitmap index scans, and plain index scans of an unlogged relation.

Currently, when checksums (or wal_log_hints) are enabled, acquiring a page's LSN in BufferGetLSNAtomic() involves locking the buffer header (which involves the use of spinlocks). Testing has shown that enabling page-level checksums causes large regressions with certain workloads, especially on larger multi-socket systems. The regression isn't tied to any Postgres 18 commit. However, Postgres 18 commit 04bec894 made initdb use checksums by default, so it seems prudent to address the problem now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/941f0190-e3c6-4622-9ac7-c04e936e5fdb@vondra.me
Discussion: https://postgr.es/m/CAH2-Wzk-Dg5XWs_jDuiHt4_7ryrSY+n=vxmHY51EVqPDFsKXmg@mail.gmail.com
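The core idea is easiest to see in isolation: read the page first, and fetch its LSN only when the read actually saved items and the pin is going to be dropped. Below is a minimal self-contained sketch of that pattern, not the actual nbtree code; FakePage, fake_readpage(), and fake_get_lsn_atomic() are simplified stand-ins for the buffer machinery, _bt_readpage(), and BufferGetLSNAtomic().

/*
 * Minimal sketch of the commit's pattern: only pay for an atomic LSN
 * read when the page produced matching items AND the pin will be
 * dropped.  All types and helpers are simplified stand-ins for their
 * PostgreSQL counterparts.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's XLogRecPtr */

typedef struct FakePage
{
    XLogRecPtr  lsn;
    int         nmatching;      /* items _bt_readpage would save */
} FakePage;

/* Stand-in for BufferGetLSNAtomic(); with checksums or wal_log_hints
 * enabled, the real function must lock the buffer header (a spinlock)
 * before reading the LSN, which is what makes the call expensive. */
static XLogRecPtr
fake_get_lsn_atomic(FakePage *page)
{
    return page->lsn;
}

/* Stand-in for _bt_readpage(): true iff any items were saved */
static bool
fake_readpage(FakePage *page)
{
    return page->nmatching > 0;
}

int
main(void)
{
    FakePage    page = {.lsn = 0x01A2B3C4, .nmatching = 2};
    bool        drop_pin = true;    /* like so->dropPin */
    XLogRecPtr  saved_lsn = 0;

    /*
     * Old approach: unconditionally read the LSN before examining the
     * page.  New approach: read it only after fake_readpage() reports
     * matches, and only when the pin will be dropped, so index-only
     * scans, bitmap scans, and pages with no matches skip the atomic
     * read entirely.
     */
    if (fake_readpage(&page) && drop_pin)
        saved_lsn = fake_get_lsn_atomic(&page);

    printf("saved lsn: %llx\n", (unsigned long long) saved_lsn);
    return 0;
}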
Diffstat (limited to 'src/include')
-rw-r--r--  src/include/access/nbtree.h | 5
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index ebca02588d3..e709d2e0afe 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -939,7 +939,7 @@ typedef BTVacuumPostingData *BTVacuumPosting;
* processing. This approach minimizes lock/unlock traffic. We must always
* drop the lock to make it okay for caller to process the returned items.
* Whether or not we can also release the pin during this window will vary.
- * We drop the pin eagerly (when safe) to avoid blocking progress by VACUUM
+ * We drop the pin (when so->dropPin) to avoid blocking progress by VACUUM
* (see nbtree/README section about making concurrent TID recycling safe).
* We'll always release both the lock and the pin on the current page before
* moving on to its sibling page.
@@ -967,7 +967,7 @@ typedef struct BTScanPosData
BlockNumber currPage; /* page referenced by items array */
BlockNumber prevPage; /* currPage's left link */
BlockNumber nextPage; /* currPage's right link */
- XLogRecPtr lsn; /* currPage's LSN */
+ XLogRecPtr lsn; /* currPage's LSN (when so->dropPin) */
/* scan direction for the saved position's call to _bt_readpage */
ScanDirection dir;
@@ -1070,6 +1070,7 @@ typedef struct BTScanOpaqueData
/* info about killed items if any (killedItems is NULL if never used) */
int *killedItems; /* currPos.items indexes of killed items */
int numKilled; /* number of currently stored items */
+ bool dropPin; /* drop leaf pin before btgettuple returns? */
/*
* If we are doing an index-only scan, these are the tuple storage