It is reported that pgpool child segfaulted [1].
[snip]
In the down thread it is reported that despite VALID_BACKEND(i)
returns true, backend->slots[i] is NULL, which should have been filled
by new_connection().
It seems there's a race condition. In new_connection(), there's a code
fragment:
/*
* Make sure that the global backend status in the shared memory
* agrees the local status checked by VALID_BACKEND. It is possible
* that the local status is up, while the global status has been
* changed to down by failover.
*/
A--> if (BACKEND_INFO(i).backend_status != CON_UP &&
BACKEND_INFO(i).backend_status != CON_CONNECT_WAIT)
{
ereport(DEBUG1,
(errmsg("creating new connection to backend"),
errdetail("skipping backend slot %d because global backend_status = %d",
i, BACKEND_INFO(i).backend_status)));
/* sync local status with global status */
B--> *(my_backend_status[i]) = BACKEND_INFO(i).backend_status;
continue;
}
It is possible that at A backend_status in the shared memory is down
but by the time it reaches B the status has been changed to up. And
new_connection() skipped to create a backend connection. This seems to
explain why the connection slot is NULL while VALID_BACKEND returns
true. To prevent the race condtion, backend_status in shared memory is
copied to a local variable and evaluate it. Also the VALID_BACKEND
just before:
pool_set_db_node_id(CONNECTION(backend, i), i);
is changed to:
if (VALID_BACKEND(i) && CONNECTION_SLOT(backend, i))
so that it prevents crash just in case.
[1] [pgpool-general: 9104] Another segmentation fault
for (i = 0; i < NUM_BACKENDS; i++)
{
- if (VALID_BACKEND(i))
+ if (VALID_BACKEND(i) && CONNECTION_SLOT(backend, i))
{
-
/* set DB node id */
pool_set_db_node_id(CONNECTION(backend, i), i);
int active_backend_count = 0;
int i;
bool status_changed = false;
+ volatile BACKEND_STATUS status;
MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
* that the local status is up, while the global status has been
* changed to down by failover.
*/
- if (BACKEND_INFO(i).backend_status != CON_UP &&
- BACKEND_INFO(i).backend_status != CON_CONNECT_WAIT)
+ status = BACKEND_INFO(i).backend_status;
+ if (status != CON_UP && status != CON_CONNECT_WAIT)
{
ereport(DEBUG1,
(errmsg("creating new connection to backend"),
i, BACKEND_INFO(i).backend_status)));
/* sync local status with global status */
- *(my_backend_status[i]) = BACKEND_INFO(i).backend_status;
+ *(my_backend_status[i]) = status;
continue;
}