Fix race condition with unprotected use of a latch pointer variable.
author Tom Lane <tgl@sss.pgh.pa.us>
Tue, 3 Oct 2017 18:00:56 +0000 (14:00 -0400)
committer Tom Lane <tgl@sss.pgh.pa.us>
Tue, 3 Oct 2017 18:00:56 +0000 (14:00 -0400)
commit 45f9d08684d954b0e514b69f270e763d2785dd53
tree 9df4017dcfe21f08dcf1c7f01b0c3b21bea811b7
parent 89e434b59caffeeeb7478653c74ad5d7a50d2e96

Commit 597a87ccc introduced a latch pointer variable to replace use
of a long-lived shared latch in the shared WalRcvData structure.
This was not well thought out, because there are now hazards of the
pointer variable changing while it's being inspected by another
process.  This could obviously lead to a core dump in code like

    if (WalRcv->latch)
        SetLatch(WalRcv->latch);

and there's a more remote risk of a torn read, if we have any
platforms where reading/writing a pointer is not atomic.

An actual problem would occur only if the walreceiver process
exits (gracefully) while the startup process is trying to
signal it, but that seems well within the realm of possibility.

To fix, treat the pointer variable (not the referenced latch)
as being protected by the WalRcv->mutex spinlock.  There
remains a race condition that we could apply SetLatch to a
process latch that no longer belongs to the walreceiver, but
I believe that's harmless: at worst it'd cause an extra wakeup
of the next process to use that PGPROC structure.
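
The pattern described above can be sketched as a small self-contained
model.  This is not the committed code: ModelLatch, ModelWalRcv and the
model_* functions are hypothetical stand-ins, and a pthread mutex plays
the role of the WalRcv->mutex spinlock so that the example compiles
outside the PostgreSQL tree.  The point it tries to show is that the
signalling side copies the pointer while holding the lock, then tests
and uses only its local copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Latch: it only records that it was set. */
typedef struct ModelLatch
{
    volatile bool is_set;
} ModelLatch;

/* Hypothetical stand-in for WalRcvData. */
typedef struct ModelWalRcv
{
    pthread_mutex_t mutex;  /* stands in for the WalRcv->mutex spinlock */
    ModelLatch *latch;      /* NULL while no walreceiver is attached */
} ModelWalRcv;

/* Stand-in for SetLatch(). */
static void
model_set_latch(ModelLatch *latch)
{
    latch->is_set = true;
}

/* Walreceiver side: publish the latch pointer under the lock. */
static void
model_attach(ModelWalRcv *walrcv, ModelLatch *latch)
{
    assert(latch != NULL);
    pthread_mutex_lock(&walrcv->mutex);
    walrcv->latch = latch;
    pthread_mutex_unlock(&walrcv->mutex);
}

/* Walreceiver side: withdraw the latch pointer under the lock on exit. */
static void
model_detach(ModelWalRcv *walrcv)
{
    pthread_mutex_lock(&walrcv->mutex);
    walrcv->latch = NULL;
    pthread_mutex_unlock(&walrcv->mutex);
}

/*
 * Signalling side: the lock protects the pointer variable, not the latch
 * it refers to.  Copy the pointer while holding the lock, release the
 * lock, then test and use only the local copy.  Returns true if a latch
 * was set.
 */
static bool
model_wakeup(ModelWalRcv *walrcv)
{
    ModelLatch *latch;

    pthread_mutex_lock(&walrcv->mutex);
    latch = walrcv->latch;
    pthread_mutex_unlock(&walrcv->mutex);

    if (latch == NULL)
        return false;
    model_set_latch(latch);
    return true;
}
```

Because the test and the use both go through the local copy, a
concurrent model_detach() can no longer turn the second read into a
NULL dereference.  The remaining race mentioned above is still present
in this model: the latch may have stopped belonging to the walreceiver
by the time model_set_latch() runs, which at worst causes a spurious
wakeup.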

Back-patch to v10 where the faulty code was added.

Discussion: https://postgr.es/m/22735.1507048202@sss.pgh.pa.us
src/backend/replication/walreceiver.c
src/backend/replication/walreceiverfuncs.c
src/include/replication/walreceiver.h