Don't use Asserts to check for violations of replication protocol.
author Tom Lane <tgl@sss.pgh.pa.us>
Sat, 12 Jun 2021 16:59:15 +0000 (12:59 -0400)
committer Tom Lane <tgl@sss.pgh.pa.us>
Sat, 12 Jun 2021 16:59:15 +0000 (12:59 -0400)
Using an Assert to check the validity of incoming messages is an
extremely poor decision.  In a debug build, it should not be that easy
for a broken or malicious remote client to crash the logrep worker.
The consequences could be even worse in non-debug builds, which will
fail to make such checks at all, leading to who-knows-what misbehavior.
Hence, promote every Assert that could possibly be triggered by wrong
or out-of-order replication messages to a full test-and-ereport.
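
The difference between an Assert and a test-and-report can be sketched outside PostgreSQL as follows. This is a minimal standalone illustration, not code from the patch: `WorkerState`, `MsgType`, `ApplyResult`, and `apply_message()` are hypothetical names, and a returned status code stands in for `ereport(ERROR, ...)`. The assert is kept only for a caller-side invariant that no remote peer can influence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message types and worker state, for illustration only. */
typedef enum { MSG_BEGIN, MSG_COMMIT } MsgType;
typedef struct { bool in_transaction; } WorkerState;
typedef enum { APPLY_OK, APPLY_PROTOCOL_VIOLATION } ApplyResult;

/*
 * Apply one incoming message.  Ordering rules are checked with ordinary
 * run-time tests, so they hold in every build and a bad peer gets an
 * error instead of crashing the process.
 */
ApplyResult
apply_message(WorkerState *w, MsgType type)
{
    /* Internal invariant: only our own caller can violate this. */
    assert(w != NULL);

    switch (type)
    {
        case MSG_BEGIN:
            if (w->in_transaction)
                return APPLY_PROTOCOL_VIOLATION;    /* BEGIN inside a transaction */
            w->in_transaction = true;
            return APPLY_OK;

        case MSG_COMMIT:
            if (!w->in_transaction)
                return APPLY_PROTOCOL_VIOLATION;    /* COMMIT without BEGIN */
            w->in_transaction = false;
            return APPLY_OK;
    }

    /* Unknown message type: also a protocol violation, never an abort. */
    return APPLY_PROTOCOL_VIOLATION;
}
```

Had the COMMIT branch used `assert(w->in_transaction)` instead, an out-of-order COMMIT would abort a debug build and would pass unchecked when compiled with `NDEBUG`.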

To avoid bloating the set of messages the translation team has to cope
with, establish a policy that replication protocol violation error
reports don't need to be translated.  Hence, all the new messages here
use errmsg_internal().  A couple of old messages are changed likewise
for consistency.

Along the way, fix some non-idiomatic or outright wrong uses of
hash_search().
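
To illustrate why OR-ing two actions together is suspect, here is a toy lookup table whose action argument is an enum of alternatives rather than a bit mask. This is a sketch under assumptions: `ToyAction`, `ToyTable`, and `toy_search()` are hypothetical, the table uses a linear scan instead of real hashing, and the enum layout (find = 0, enter = 1) is assumed to resemble the one `hash_search()` takes rather than copied from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action enum: the values are alternatives, not flag bits. */
typedef enum { TOY_FIND, TOY_ENTER } ToyAction;

#define TOY_CAPACITY 8

typedef struct { int key; int value; bool used; } ToyEntry;
typedef struct { ToyEntry slots[TOY_CAPACITY]; } ToyTable;

/*
 * Look up "key".  *found reports whether the key was already present.
 * With TOY_ENTER a missing key is inserted and the new entry returned;
 * with TOY_FIND a missing key yields NULL.
 */
ToyEntry *
toy_search(ToyTable *table, int key, ToyAction action, bool *found)
{
    ToyEntry *free_slot = NULL;

    assert(table != NULL && found != NULL);

    for (size_t i = 0; i < TOY_CAPACITY; i++)
    {
        ToyEntry *e = &table->slots[i];

        if (e->used && e->key == key)
        {
            *found = true;
            return e;
        }
        if (!e->used && free_slot == NULL)
            free_slot = e;      /* remember the first free slot */
    }

    *found = false;
    if (action != TOY_ENTER || free_slot == NULL)
        return NULL;            /* plain lookup miss, or table full */

    free_slot->used = true;
    free_slot->key = key;
    free_slot->value = 0;
    return free_slot;
}
```

In this sketch `TOY_ENTER | TOY_FIND` evaluates to `1 | 0`, which equals `TOY_ENTER`, so the combined form only works by accident. Enter-style lookups already report an existing entry through `found`, so passing the single enter action is the idiomatic call.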

Most of these mistakes are new with the "streaming replication"
patch (commit 464824323), but a couple go back a long way.
Back-patch as appropriate.

Discussion: https://postgr.es/m/1719083.1623351052@sss.pgh.pa.us

src/backend/replication/logical/reorderbuffer.c
src/backend/replication/logical/worker.c

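diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5251932669074a65828173985a48348a53b60b18..1351b33011f8c17ce2b9fab4908b01e5aa60afa0 100644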
@@ -1380,7 +1380,7 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
        ent = (ReorderBufferTupleCidEnt *)
            hash_search(txn->tuplecid_hash,
                        (void *) &key,
-                       HASH_ENTER | HASH_FIND,
+                       HASH_ENTER,
                        &found);
        if (!found)
        {
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index da748668ecc184655ae51adcf72c5f19d7745a83..1dbc0a4a60706d68168c464e20a24cccba6b0beb 100644
@@ -559,7 +559,14 @@ apply_handle_commit(StringInfo s)
 
    logicalrep_read_commit(s, &commit_data);
 
-   Assert(commit_data.commit_lsn == remote_final_lsn);
+   if (commit_data.commit_lsn != remote_final_lsn)
+       ereport(ERROR,
+               (errcode(ERRCODE_PROTOCOL_VIOLATION),
+                errmsg_internal("incorrect commit LSN %X/%X in commit message (expected %X/%X)",
+                                (uint32) (commit_data.commit_lsn >> 32),
+                                (uint32) commit_data.commit_lsn,
+                                (uint32) (remote_final_lsn >> 32),
+                                (uint32) remote_final_lsn)));
 
    /* The synchronization worker runs in single transaction. */
    if (IsTransactionState() && !am_tablesync_worker())
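
The `%X/%X` arguments in the new error message split a 64-bit LSN into its high and low 32-bit halves. A minimal standalone sketch of that formatting, assuming an LSN is an unsigned 64-bit integer; `format_lsn()` is a hypothetical helper and not a PostgreSQL function:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write "lsn" into buf as HIGH/LOW in hexadecimal, where HIGH is the
 * upper 32 bits and LOW the lower 32 bits.  Returns snprintf's result.
 */
int
format_lsn(char *buf, size_t buflen, uint64_t lsn)
{
    assert(buf != NULL);

    return snprintf(buf, buflen, "%X/%X",
                    (unsigned int) (uint32_t) (lsn >> 32),  /* high half */
                    (unsigned int) (uint32_t) lsn);         /* low half */
}
```

For example, `format_lsn(buf, sizeof buf, UINT64_C(0x116B3D5A8))` writes `1/16B3D5A8` into `buf`.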