Fix status reporting for terminated bgworkers that were never started.
authorRobert Haas <rhaas@postgresql.org>
Thu, 19 Mar 2015 14:56:34 +0000 (10:56 -0400)
committerRobert Haas <rhaas@postgresql.org>
Thu, 19 Mar 2015 15:04:09 +0000 (11:04 -0400)
Previously, GetBackgroundWorkerPid() would return BGWH_NOT_YET_STARTED
if the slot used for the worker registration had not been reused by
unrelated activity, and BGWH_STOPPED if it had.  Either way, a process
that had requested notification when the state of one of its
background workers changed did not receive such notifications.  Fix
things so that GetBackgroundWorkerPid() always returns BGWH_STOPPED in
this situation, so that we do not erroneously give waiters the
impression that the worker will eventually be started; and send
notifications just as we would if the process terminated after having
been started, so that it's possible to wait for the postmaster to
process a worker termination request without polling.

Discovered by Amit Kapila during testing of parallel sequential scan.
Analysis and fix by me.  Back-patch to 9.4; there may not be anyone
relying on this interface yet, but if anyone is, the new behavior is a
clear improvement.

src/backend/postmaster/bgworker.c

index 267b91632712b9e7cfd00bd2bc94858fbcce7495..cf7524f4921ef372b0370ddb3d5d811f3e761635 100644 (file)
@@ -244,14 +244,37 @@ BackgroundWorkerStateChange(void)
                rw->rw_terminate = true;
                if (rw->rw_pid != 0)
                    kill(rw->rw_pid, SIGTERM);
+               else
+               {
+                   /* Report never-started, now-terminated worker as dead. */
+                   ReportBackgroundWorkerPID(rw);
+               }
            }
            continue;
        }
 
-       /* If it's already flagged as do not restart, just release the slot. */
+       /*
+        * If the worker is marked for termination, we don't need to add it
+        * to the registered workers list; we can just free the slot.
+        * However, if bgw_notify_pid is set, the process that registered the
+        * worker may need to know that we've processed the terminate request,
+        * so be sure to signal it.
+        */
        if (slot->terminate)
        {
+           int notify_pid;
+
+           /*
+            * We need a memory barrier here to make sure that the load of
+            * bgw_notify_pid completes before the store to in_use.
+            */
+           notify_pid = slot->worker.bgw_notify_pid;
+           pg_memory_barrier();
+           slot->pid = 0;
            slot->in_use = false;
+           if (notify_pid != 0)
+               kill(notify_pid, SIGUSR1);
+
            continue;
        }