In caught-up logical walsender, sleep only in WalSndWaitForWal().
authorNoah Misch <noah@leadboat.com>
Sat, 25 Apr 2020 17:18:12 +0000 (10:18 -0700)
committerNoah Misch <noah@leadboat.com>
Sat, 25 Apr 2020 17:18:12 +0000 (10:18 -0700)
Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
< sentPtr.  When the latest physical LSN yields no logical replication
messages (a common case), that keepalive elicits a reply.  Processing
the reply updates pg_stat_replication.replay_lsn.  WalSndLoop() lacks
that; when WalSndLoop() slept, replay_lsn advancement could stall until
wal_receiver_status_interval elapsed.  This sometimes stalled
src/test/subscription/t/001_rep_changes.pl for up to 10s.

Reviewed by Fujii Masao and Michael Paquier.

Discussion: https://postgr.es/m/20200418070142.GA1075445@rfd.leadboat.com

src/backend/replication/walsender.c

index 0813c830269488034eca5a714210c4ecd14860f8..6c87f6087ced287652436d564a5c69480ef06628 100644 (file)
@@ -1428,8 +1428,10 @@ WalSndWaitForWal(XLogRecPtr loc)
        /*
         * We only send regular messages to the client for full decoded
         * transactions, but a synchronous replication and walsender shutdown
-        * possibly are waiting for a later location. So we send pings
-        * containing the flush location every now and then.
+        * possibly are waiting for a later location. So, before sleeping, we
+        * send a ping containing the flush location. If the receiver is
+        * otherwise idle, this keepalive will trigger a reply. Processing the
+        * reply will update these MyWalSnd locations.
         */
        if (MyWalSnd->flush < sentPtr &&
            MyWalSnd->write < sentPtr &&
@@ -2314,14 +2316,14 @@ WalSndLoop(WalSndSendDataCallback send_data)
        WalSndKeepaliveIfNecessary();
 
        /*
-        * We don't block if not caught up, unless there is unsent data
-        * pending in which case we'd better block until the socket is
-        * write-ready.  This test is only needed for the case where the
-        * send_data callback handled a subset of the available data but then
-        * pq_flush_if_writable flushed it all --- we should immediately try
-        * to send more.
+        * Block if we have unsent data.  XXX For logical replication, let
+        * WalSndWaitForWal() handle any other blocking; idle receivers need
+        * its additional actions.  For physical replication, also block if
+        * caught up; its send_data does not block.
         */
-       if ((WalSndCaughtUp && !streamingDoneSending) || pq_is_send_pending())
+       if ((WalSndCaughtUp && send_data != XLogSendLogical &&
+            !streamingDoneSending) ||
+           pq_is_send_pending())
        {
            long        sleeptime;
            int         wakeEvents;