author: Simon Riggs, 2012-06-01 11:47:01 +0000
committer: Simon Riggs, 2012-06-01 11:47:01 +0000
commit: 46ceb398f09e0bd8427b73350d48e414f4372153
tree: de2df31a2b074a30c1a3e4e6e56364b611651474
parent: 8c9f2dc67084986332dc1eaf7228461ff1f2d9a6

Avoid early reuse of btree pages, causing incorrect query results.

When we allowed read-only transactions to skip assigning XIDs we introduced the possibility that a fully deleted btree page could be reused. This broke the index link sequence which could then lead to indexscans silently returning fewer rows than would have been correct. The actual incidence of silent errors from this is thought to be very low because of the exact workload required and locking pre-conditions.

Fix is to remove pages only if index page opaque->btpo.xact precedes RecentGlobalXmin.

Noah Misch, reviewed and backpatched by Simon Riggs

-rw-r--r-- src/backend/access/nbtree/README | 10
-rw-r--r-- src/backend/access/nbtree/nbtpage.c | 2
2 files changed, 7 insertions, 5 deletions

diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 9fe84e320e2..101ed38a232 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -261,13 +261,15 @@ we need to be sure we don't miss or re-scan any items.
A deleted page can only be reclaimed once there is no scan or search that
has a reference to it; until then, it must stay in place with its
-right-link undisturbed. We implement this by waiting until all
-transactions that were running at the time of deletion are dead; which is
+right-link undisturbed. We implement this by waiting until all active
+snapshots and registered snapshots as of the deletion are gone; which is
overly strong, but is simple to implement within Postgres. When marked
dead, a deleted page is labeled with the next-transaction counter value.
VACUUM can reclaim the page for re-use when this transaction number is
-older than the oldest open transaction. (NOTE: VACUUM FULL can reclaim
-such pages immediately.)
+older than RecentGlobalXmin. As collateral damage, this implementation
+also waits for running XIDs with no snapshots and for snapshots taken
+until the next transaction to allocate an XID commits.
+(NOTE: VACUUM FULL can reclaim such pages immediately.)
Reclaiming a page doesn't actually change its state on disk --- we simply
record it in the shared-memory free space map, from which it will be
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index ad458a9fe45..9fa29776dfb 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -639,7 +639,7 @@ _bt_page_recyclable(Page page)
 	 */
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	if (P_ISDELETED(opaque) &&
-		TransactionIdPrecedesOrEquals(opaque->btpo.xact, RecentXmin))
+		TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
 		return true;
 	return false;
 }
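
The nbtpage.c hunk changes both the comparison (precedes-or-equals becomes strictly precedes) and the horizon (RecentXmin becomes RecentGlobalXmin). The following standalone sketch models that change outside the PostgreSQL tree. It is an illustration under stated assumptions, not the backend code: `Xid`, `ModelPage`, `xid_precedes`, `xid_precedes_or_equals`, `recyclable_old`, and `recyclable_new` are hypothetical names, and the horizon is passed as a plain argument rather than read from backend state. The comparison helpers are modeled on the wraparound-aware, modulo-2^32 ordering that PostgreSQL applies to normal transaction IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;                /* hypothetical stand-in for TransactionId */

#define FIRST_NORMAL_XID ((Xid) 3)   /* smaller values are reserved, special XIDs */

/* "a is older than b" in modulo-2^32 arithmetic, so ordering survives wraparound. */
static bool
xid_precedes(Xid a, Xid b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;                /* special XIDs compare as plain integers */
    return (int32_t) (a - b) < 0;
}

/* Same ordering, but also true when the two XIDs are equal. */
static bool
xid_precedes_or_equals(Xid a, Xid b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a <= b;
    return (int32_t) (a - b) <= 0;
}

/* Hypothetical model of the two page fields the check looks at. */
typedef struct ModelPage
{
    bool deleted;                    /* models P_ISDELETED(opaque) */
    Xid  btpo_xact;                  /* next-XID value stamped at deletion */
} ModelPage;

/* Test before this commit: "<=" against the caller's own snapshot horizon. */
static bool
recyclable_old(const ModelPage *page, Xid recent_xmin)
{
    return page->deleted &&
           xid_precedes_or_equals(page->btpo_xact, recent_xmin);
}

/* Test after this commit: strict "<" against the global horizon. */
static bool
recyclable_new(const ModelPage *page, Xid recent_global_xmin)
{
    return page->deleted &&
           xid_precedes(page->btpo_xact, recent_global_xmin);
}
```

In this model, a deleted page stamped with XID 100 passes the old test as soon as the horizon reaches 100, while the new test holds it back until the horizon has moved past 100. The second half of the fix, using a horizon that covers every backend's snapshots rather than only the caller's, is represented here only by the parameter name.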