diff options
author | Michael Paquier | 2025-04-11 01:00:21 +0000 |
---|---|---|
committer | Michael Paquier | 2025-04-11 01:00:21 +0000 |
commit | 2e57790836c636d579871b13de13850e013f511c (patch) | |
tree | 751d19115c1908fd886a252c4081d75ea133a3f2 /src/include/replication | |
parent | 530050d8d2850d0453bb56e2bfa7cae216ee8a18 (diff) |
Fix race with synchronous_standby_names at startup
synchronous_standby_names cannot be reloaded safely by backends, and the
checkpointer is in charge of updating a state in shared memory if the
GUC is enabled in WalSndCtl, to let the backends know if they should
wait or not for a given LSN. This provides a strict control on the
timing of the waiting queues if the GUC is enabled or disabled, then
reloaded. The checkpointer is also in charge of waking up the backends
that could be waiting for a LSN when the GUC is disabled.
This logic had a race condition at startup, where it would be possible
for backends to not wait for a LSN even if synchronous_standby_names is
enabled. This would cause visibility issues with transactions that we
should be waiting for but they were not. The problem lasts until the
checkpointer does its initial update of the shared memory state when it
loads synchronous_standby_names.
In order to take care of this problem, the shared memory state in
WalSndCtl is extended to detect if it has been initialized by the
checkpointer, and not only check if synchronous_standby_names is
defined. In WalSndCtlData, sync_standbys_defined is renamed to
sync_standbys_status, a bits8 able to know about two states:
- If the shared memory state has been initialized. This flag is set by
the checkpointer at startup once, and never removed.
- If synchronous_standby_names is known as defined in the shared memory
state. This is the same as the previous sync_standbys_defined in
WalSndCtl.
This method gives a way for backends to decide what they should do until
the shared memory area is initialized, and they now ultimately fall back
to a check on the GUC value in this case, which is the best thing that
can be done.
Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by
the checkpointer when this process starts, so the window is very narrow.
It is possible to enlarge the problematic window by making the
checkpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined()
with a hardcoded sleep for example, and doing so has showed that a 2PC
visibility test is indeed failing. On machines slow enough, this bug
would cause spurious failures.
In 17~, we have looked at the possibility of adding an injection point
to have a reproducible test, but as the problematic window happens at
early startup, we would need to invent a way to make an injection point
optionally persistent across restarts when attached, something that
would be fine for this case as it would involve the checkpointer. This
issue is quite old, and can be reproduced on all the stable branches.
Author: Melnikov Maksim <m.melnikov@postgrespro.ru>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
Backpatch-through: 13
Diffstat (limited to 'src/include/replication')
-rw-r--r-- | src/include/replication/walsender_private.h | 23 |
1 files changed, 19 insertions, 4 deletions
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h index 0fc77f1b4af..e98701038f5 100644 --- a/src/include/replication/walsender_private.h +++ b/src/include/replication/walsender_private.h @@ -96,11 +96,11 @@ typedef struct XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE]; /* - * Are any sync standbys defined? Waiting backends can't reload the - * config file safely, so checkpointer updates this value as needed. - * Protected by SyncRepLock. + * Status of data related to the synchronous standbys. Waiting backends + * can't reload the config file safely, so checkpointer updates this value + * as needed. Protected by SyncRepLock. */ - bool sync_standbys_defined; + bits8 sync_standbys_status; /* used as a registry of physical / logical walsenders to wake */ ConditionVariable wal_flush_cv; @@ -116,6 +116,21 @@ typedef struct WalSnd walsnds[FLEXIBLE_ARRAY_MEMBER]; } WalSndCtlData; +/* Flags for WalSndCtlData->sync_standbys_status */ + +/* + * Is the synchronous standby data initialized from the GUC? This is set the + * first time synchronous_standby_names is processed by the checkpointer. + */ +#define SYNC_STANDBY_INIT (1 << 0) + +/* + * Is the synchronous standby data defined? This is set when + * synchronous_standby_names has some data, after being processed by the + * checkpointer. + */ +#define SYNC_STANDBY_DEFINED (1 << 1) + extern PGDLLIMPORT WalSndCtlData *WalSndCtl; |