Fix race condition with unprotected use of a latch pointer variable.
authorTom Lane <tgl@sss.pgh.pa.us>
Tue, 3 Oct 2017 18:00:57 +0000 (14:00 -0400)
committerTom Lane <tgl@sss.pgh.pa.us>
Tue, 3 Oct 2017 18:00:57 +0000 (14:00 -0400)
commit2451de7e9e72943dbc71a3749379cf16946920f0
tree5bcba1a1e0bfecadb3684f03134966abda0b87c1
parentad40d5f74585d1f14ea4b73c20fb5b2dabfb153f
Fix race condition with unprotected use of a latch pointer variable.

Commit 597a87ccc introduced a latch pointer variable to replace use
of a long-lived shared latch in the shared WalRcvData structure.
This was not well thought out, because there are now hazards of the
pointer variable changing while it's being inspected by another
process.  This could obviously lead to a core dump in code like

if (WalRcv->latch)
SetLatch(WalRcv->latch);

and there's a more remote risk of a torn read, if we have any
platforms where reading/writing a pointer is not atomic.

An actual problem would occur only if the walreceiver process
exits (gracefully) while the startup process is trying to
signal it, but that seems well within the realm of possibility.

To fix, treat the pointer variable (not the referenced latch)
as being protected by the WalRcv->mutex spinlock.  There
remains a race condition that we could apply SetLatch to a
process latch that no longer belongs to the walreceiver, but
I believe that's harmless: at worst it'd cause an extra wakeup
of the next process to use that PGPROC structure.

Back-patch to v10 where the faulty code was added.

Discussion: https://postgr.es/m/22735.1507048202@sss.pgh.pa.us
src/backend/replication/walreceiver.c
src/backend/replication/walreceiverfuncs.c
src/include/replication/walreceiver.h