Poll postmaster less frequently in recovery.
author Thomas Munro <tmunro@postgresql.org>
Fri, 12 Mar 2021 06:08:52 +0000 (19:08 +1300)
committer Thomas Munro <tmunro@postgresql.org>
Fri, 12 Mar 2021 06:45:42 +0000 (19:45 +1300)
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record.  Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death.  This avoids
expensive system calls that were reported to slow down CPU-bound recovery
by 10-30%.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
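
The rate limiting described in the message above can be tried in isolation. The following is a minimal, standalone sketch and not the PostgreSQL code: PollState, expensive_liveness_check(), should_exit() and simulate_replay() are hypothetical names introduced for illustration, the liveness check is a stub standing in for PostmasterIsAlive(), and the counter is kept in a struct (the patch uses a function-local static) so that the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POLL_RATE_LIMIT 1024    /* same value as the patch; assumed N >= 1 */

/* Hypothetical state holder; the real patch uses a static local counter. */
typedef struct PollState
{
    uint32_t    calls;          /* calls to should_exit() so far */
    uint32_t    checks;         /* expensive checks actually performed */
    bool        parent_alive;   /* simulated postmaster state */
} PollState;

/* Stub for PostmasterIsAlive(): pretend each call costs a system call. */
static bool
expensive_liveness_check(PollState *st)
{
    st->checks++;
    return st->parent_alive;
}

/*
 * Returns true when the caller should bail out.  Because && short-circuits,
 * the expensive check runs only on calls 0, N, 2N, ...
 */
static bool
should_exit(PollState *st)
{
    return st->calls++ % POLL_RATE_LIMIT == 0 &&
        !expensive_liveness_check(st);
}

/*
 * Replay `records` simulated WAL records; the parent "dies" just before
 * record `death_at`.  Returns the index of the record at which the death
 * is noticed, or `records` if it is never noticed.
 */
static uint32_t
simulate_replay(PollState *st, uint32_t records, uint32_t death_at)
{
    assert(st != NULL);

    for (uint32_t i = 0; i < records; i++)
    {
        if (i == death_at)
            st->parent_alive = false;
        if (should_exit(st))
            return i;
    }
    return records;
}
```

With a state initialised to { 0, 0, true }, calling simulate_replay(&st, 5000, 1500) returns 2048 after only 3 liveness checks: the death at record 1500 goes unnoticed until the next multiple of 1024. In this sketch the detection delay is therefore bounded by POLL_RATE_LIMIT - 1 calls, which is the "delaying exit a bit" trade-off the message mentions.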

src/backend/postmaster/startup.c

index f781fdc6fcab50e5a1a7e48dc54de7d0abbfe259..22135d5e0776e4605259bd14a9b8dea966340d94 100644
 #include "utils/timeout.h"
 
 
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+/*
+ * On systems that need to make a system call to find out if the postmaster has
+ * gone away, we'll do so only every Nth call to HandleStartupProcInterrupts().
+ * This only affects how long it takes us to detect the condition while we're
+ * busy replaying WAL.  Latch waits and similar should react immediately
+ * through the usual techniques.
+ */
+#define POSTMASTER_POLL_RATE_LIMIT 1024
+#endif
+
 /*
  * Flags set by interrupt handlers for later service in the redo loop.
  */
@@ -134,6 +145,10 @@ StartupRereadConfig(void)
 void
 HandleStartupProcInterrupts(void)
 {
+#ifdef POSTMASTER_POLL_RATE_LIMIT
+   static uint32 postmaster_poll_count = 0;
+#endif
+
    /*
     * Process any requests or signals received recently.
     */
@@ -151,9 +166,15 @@ HandleStartupProcInterrupts(void)
 
    /*
     * Emergency bailout if postmaster has died.  This is to avoid the
-    * necessity for manual cleanup of all postmaster children.
+    * necessity for manual cleanup of all postmaster children.  Do this less
+    * frequently on systems for which we don't have signals to make that
+    * cheap.
     */
-   if (IsUnderPostmaster && !PostmasterIsAlive())
+   if (IsUnderPostmaster &&
+#ifdef POSTMASTER_POLL_RATE_LIMIT
+       postmaster_poll_count++ % POSTMASTER_POLL_RATE_LIMIT == 0 &&
+#endif
+       !PostmasterIsAlive())
        exit(1);
 
    /* Process barrier events */