Fix race condition in reading commit timestamps
author Alvaro Herrera <alvherre@alvh.no-ip.org>
Thu, 19 Jan 2017 21:23:09 +0000 (18:23 -0300)
committer Alvaro Herrera <alvherre@alvh.no-ip.org>
Thu, 19 Jan 2017 21:24:17 +0000 (18:24 -0300)
commit 8eace46d34ab6ac0d887aa4d3504bc4222c2e448
tree 1e6d8a76c81c56b452eff3885381f7d998aee7f7
parent 8b0fec93ecc788c8d8b329d41ab795712d8dcc5a

If a user requests the commit timestamp for a transaction so old that
vacuum is concurrently truncating its data away, and the timing is just
right, they receive an ugly internal file-not-found error message from
slru.c rather than the expected NULL return value.

In a primary server, the window for the race is very small: the lookup
has to occur exactly between vacuum's truncation call and its later
advance of the oldest-Xid, and there's not a lot that happens between
them (mostly just a multixact truncate).  In a standby server, however,
the window is larger because the truncation is executed as soon as the
WAL record for it is replayed, but the advance of the oldest-Xid is not
executed until the next checkpoint record.
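
The window described above can be illustrated with a small toy model.
This is only a sketch under stated assumptions: every name in it
(demo_state, demo_lookup, and so on) is hypothetical and none of it is
PostgreSQL code, and it ignores transaction ID wraparound.  It shows a
reader that trusts the published oldest-Xid bound while the storage
below a newer cutoff has already been removed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; these are not PostgreSQL's names or types. */
typedef uint32_t demo_xid;

typedef enum { DEMO_FOUND, DEMO_NULL, DEMO_FILE_MISSING } demo_result;

typedef struct
{
    demo_xid oldest_xid;    /* lookups below this bound return NULL */
    demo_xid first_stored;  /* data for xids below this has been removed */
} demo_state;

/* Reader: check the published bound first, then touch storage. */
static demo_result
demo_lookup(const demo_state *s, demo_xid xid)
{
    if (xid < s->oldest_xid)
        return DEMO_NULL;           /* expected answer for old xids */
    if (xid < s->first_stored)
        return DEMO_FILE_MISSING;   /* bound is stale, storage is gone */
    return DEMO_FOUND;
}

/* Remove storage below cutoff (cutoff must not move backwards). */
static void
demo_truncate(demo_state *s, demo_xid cutoff)
{
    assert(cutoff >= s->first_stored);
    s->first_stored = cutoff;
}

/* Publish the new lower bound (cutoff must not move backwards). */
static void
demo_advance_oldest(demo_state *s, demo_xid cutoff)
{
    assert(cutoff >= s->oldest_xid);
    s->oldest_xid = cutoff;
}

/*
 * Truncate-first ordering: return what a reader sees if its lookup
 * lands between the two steps.  The state starts with both values at 100.
 */
static demo_result
demo_lookup_in_window(demo_xid xid, demo_xid cutoff)
{
    demo_state  s = {100, 100};
    demo_result seen;

    demo_truncate(&s, cutoff);          /* step 1: storage removed */
    seen = demo_lookup(&s, xid);        /* concurrent reader runs here */
    demo_advance_oldest(&s, cutoff);    /* step 2: bound catches up */
    return seen;
}
```

In this model, a lookup for an xid between the old bound and the new
cutoff gets DEMO_FILE_MISSING while it lands between the two steps, and
would get DEMO_NULL once the bound has caught up.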

To fix in the primary, simply reverse the order of operations in
vac_truncate_clog.  To fix in the standby, augment the WAL truncation
record so that the standby is aware of the new oldest-Xid value and can
apply the update immediately.  The WAL version is bumped because of this.
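
Both halves of the fix can be sketched with a toy model.  This is an
illustration under assumptions, not the patch itself: the names
(sketch_state, sketch_truncate_record, and the functions) are
hypothetical, the record layout is invented for the example, and
transaction ID wraparound is ignored.  The idea shown is that the bound
is published before storage is removed, and that the truncation record
carries the new bound so replay can apply it at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; these are not PostgreSQL's names or types. */
typedef uint32_t sketch_xid;

typedef struct
{
    sketch_xid oldest_xid;    /* lookups below this bound return NULL */
    sketch_xid first_stored;  /* data for xids below this has been removed */
} sketch_state;

/* Hypothetical truncation record that also carries the new bound. */
typedef struct
{
    sketch_xid truncate_before;
    sketch_xid oldest_xid;
} sketch_truncate_record;

/* Invariant: the published bound never trails the removed range. */
static bool
sketch_bound_covers_removal(const sketch_state *s)
{
    return s->oldest_xid >= s->first_stored;
}

/* True if a lookup of xid would pass the bound check and hit removed data. */
static bool
sketch_lookup_would_fail(const sketch_state *s, sketch_xid xid)
{
    return xid >= s->oldest_xid && xid < s->first_stored;
}

/* Primary: advance the bound first, then truncate, then fill the record. */
static void
sketch_primary_truncate(sketch_state *s, sketch_xid cutoff,
                        sketch_truncate_record *rec)
{
    assert(cutoff >= s->oldest_xid && cutoff >= s->first_stored);

    s->oldest_xid = cutoff;                     /* step 1: publish bound */
    assert(sketch_bound_covers_removal(s));     /* holds between the steps */
    s->first_stored = cutoff;                   /* step 2: remove storage */

    rec->truncate_before = cutoff;
    rec->oldest_xid = cutoff;
}

/* Standby: apply the bound from the record itself, then truncate. */
static void
sketch_replay_truncate(sketch_state *s, const sketch_truncate_record *rec)
{
    s->oldest_xid = rec->oldest_xid;
    assert(sketch_bound_covers_removal(s));
    s->first_stored = rec->truncate_before;
}
```

With this ordering the model never has a state in which an xid passes
the bound check but its storage is already gone, on either server.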

No backpatch, because of the low importance of the bug and its rarity.

Author: Craig Ringer
Reviewed-By: Petr Jelínek, Peter Eisentraut
Discussion: https://postgr.es/m/CAMsr+YFhVtRQT1VAwC+WGbbxZZRzNou=N9Ed-FrCqkwQ8H8oJQ@mail.gmail.com
src/backend/access/rmgrdesc/committsdesc.c
src/backend/access/transam/commit_ts.c
src/backend/commands/vacuum.c
src/include/access/commit_ts.h
src/include/access/xlog_internal.h