Fix "missing continuation record" after standby promotion
author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Wed, 23 Mar 2022 17:22:10 +0000 (18:22 +0100)
committer: Alvaro Herrera <alvherre@alvh.no-ip.org>
Wed, 23 Mar 2022 17:22:10 +0000 (18:22 +0100)
Invalidate abortedRecPtr and missingContrecPtr after a missing
continuation record is successfully skipped on a standby. This fixes a
PANIC caused when a recently promoted standby attempts to write an
OVERWRITE_RECORD with an LSN of the previously read aborted record.

Backpatch to 10 (all stable versions).

Author: Sami Imseih <simseih@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/44D259DE-7542-49C4-8A52-2AB01534DCA9@amazon.com
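
Illustration (not part of the commit): the failure mode described above can
be sketched as a small stand-alone C model. This is a simplified sketch under
stated assumptions, not PostgreSQL code. The variable names abortedRecPtr and
missingContrecPtr and the InvalidXLogRecPtr sentinel come from the patch; the
typedef, the three helper functions, and the with_fix flag are hypothetical
and exist only to show why stale pointers matter at promotion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's LSN type and its "invalid" sentinel. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Model of the recovery-side state that the patch resets. */
static XLogRecPtr abortedRecPtr = InvalidXLogRecPtr;
static XLogRecPtr missingContrecPtr = InvalidXLogRecPtr;

/* Hypothetical helper: the WAL reader found a record whose
 * continuation is missing and remembers where that happened. */
static void
note_aborted_record(XLogRecPtr aborted, XLogRecPtr missing)
{
    assert(aborted != InvalidXLogRecPtr);
    assert(missing != InvalidXLogRecPtr);
    abortedRecPtr = aborted;
    missingContrecPtr = missing;
}

/* Hypothetical helper: the standby replays the overwrite record.
 * with_fix = true models the behaviour added by this commit;
 * with_fix = false models the old behaviour, which left both
 * pointers set. */
static void
replay_overwrite_contrecord(bool with_fix)
{
    if (with_fix)
    {
        /* We have safely skipped the aborted record */
        abortedRecPtr = InvalidXLogRecPtr;
        missingContrecPtr = InvalidXLogRecPtr;
    }
}

/* Hypothetical helper: models the end-of-recovery decision. A still-valid
 * abortedRecPtr means promotion would try to write an OVERWRITE_RECORD
 * for a record that was already dealt with. */
static bool
promotion_would_write_overwrite_record(void)
{
    return abortedRecPtr != InvalidXLogRecPtr;
}
```

In this model, calling replay_overwrite_contrecord(false) after
note_aborted_record() leaves promotion_would_write_overwrite_record()
returning true, which corresponds to the stale state behind the PANIC;
with replay_overwrite_contrecord(true) it returns false.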

src/backend/access/transam/xlogrecovery.c
src/test/recovery/t/026_overwrite_contrecord.pl

index 9feea3e6ec99ac58ea2c6f42a4cf74dc7852503b..8d2395dae256a067d9645d7e8ea922e06f5cd81e 100644
@@ -1948,6 +1948,10 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
                                 LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
                                 LSN_FORMAT_ARGS(record->overwrittenRecPtr));
 
+               /* We have safely skipped the aborted record */
+               abortedRecPtr = InvalidXLogRecPtr;
+               missingContrecPtr = InvalidXLogRecPtr;
+
                ereport(LOG,
                                (errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
                                                LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
index 0fd907f152608592c39e331b4b4ec8f127fbef6a..78feccd9aaef02de9c3eed413ec4ec395c65e31f 100644
@@ -13,7 +13,7 @@ use Test::More;
 # Test: Create a physical replica that's missing the last WAL file,
 # then restart the primary to create a divergent WAL file and observe
 # that the replica replays the "overwrite contrecord" from that new
-# file.
+# file and the standby promotes successfully.
 
 my $node = PostgreSQL::Test::Cluster->new('primary');
 $node->init(allows_streaming => 1);
@@ -100,6 +100,9 @@ like(
        qr[successfully skipped missing contrecord at],
        "found log line in standby");
 
+# Verify promotion is successful
+$node_standby->promote;
+
 $node->stop;
 $node_standby->stop;