Reword overly-optimistic comment about backup checksum verification.
authorRobert Haas <rhaas@postgresql.org>
Mon, 6 Mar 2023 15:35:15 +0000 (10:35 -0500)
committerRobert Haas <rhaas@postgresql.org>
Mon, 6 Mar 2023 15:35:15 +0000 (10:35 -0500)
The comment implies that a single retry is sufficient to avoid
spurious checksum failures, but in fact no number of retries is
sufficient for that purpose. Update the comment accordingly.

Patch by me, reviewed by Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmoZ_fFAoU6mrHt9QBs+dcYhN6yXenGTTMRebZNhtwPwHyg@mail.gmail.com

src/backend/backup/basebackup.c

index 6547e37d125a53d9cc66d3780a7835fd666fba88..6efdefb591776d6d707f4bc86dd13dfc93325482 100644 (file)
@@ -1602,11 +1602,21 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
                                                 * Retry the block on the first failure.  It's
                                                 * possible that we read the first 4K page of the
                                                 * block just before postgres updated the entire block
-                                                * so it ends up looking torn to us.  We only need to
-                                                * retry once because the LSN should be updated to
-                                                * something we can ignore on the next pass.  If the
-                                                * error happens again then it is a true validation
-                                                * failure.
+                                                * so it ends up looking torn to us. If, before we
+                                                * retry the read, the concurrent write of the block
+                                                * finishes, the page LSN will be updated and we'll
+                                                * realize that we should ignore this block.
+                                                *
+                                                * There's no guarantee that this will actually
+                                                * happen, though: the torn write could take an
+                                                * arbitrarily long time to complete. Retrying multiple
+                                                * times wouldn't fix this problem, either, though
+                                                * it would reduce the chances of it happening in
+                                                * practice. The only real fix here seems to be to
+                                                * have some kind of interlock that allows us to wait
+                                                * until we can be certain that no write to the block
+                                                * is in progress. Since we don't have any such thing
+                                                * right now, we just do this and hope for the best.
                                                 */
                                                if (block_retry == false)
                                                {