服务器时间发生跳变导致hghac中对应主机状态频繁切换为crash或stop

环境

系统平台:N/A
版本:N/A

症状

集群状态:

[root@bthbj-hgywsjkjq-ip28-cen76 ~]# hghactl list

  • Cluster: highgo-ee-cluster —±-------------±--------±—±----------+

| Member | Host | Role | State | TL | Lag in MB |

±---------±-------------------±-------------±--------±—±----------+

| pgsql_1 | x.x.100.1:5866 | leader | running | 84 | |

| pgsql_2 | x.x.100.2:5866 | crash | running | 84 | 0 |

±---------±-------------------±-------------±--------±—±----------+

注意:虽然状态显示为crash,但是实际状态并未停止,仍然是running,数据库正常访问。

问题原因

检查hghac日志:

2020-04-03 14:40:35,434 INFO: Lock owner: None; I am pgsql_2

2020-04-03 14:40:35,524 INFO: PAUSE: postgres is not running

2020-04-03 14:40:45,433 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213

2020-04-03 14:40:45,434 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213

2020-04-03 14:40:45,434 WARNING: Postgresql is not running.

2020-04-03 14:40:45,434 INFO: Lock owner: None; I am pgsql_2

2020-04-03 14:40:45,523 INFO: PAUSE: postgres is not running

2020-04-03 14:40:55,435 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213

2020-04-03 14:40:55,435 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213

2020-04-03 14:40:55,435 WARNING: Postgresql is not running.

原因:

由于服务器时间发生跳变,则进程在操作系统中的启动时间戳也会发生改变,不再与$PGDATA/postmaster.pid文件中记录的时间相同,

hghac(patroni)会认为有集群以外其他的postmaster正在该$PGDATA运行,并尝试停止postmaster进程,但是由于操作系统pid一致,未能成功停止该进程,所以会不断切换节点状态。

解决方案

解决方案:

  1. (稳妥推荐)条件允许情况下,在问题节点手动重启集群服务和数据库服务,让数据库的postmaster.pid中记录的时间戳更新。或者:

  2. 手动修改postmaster.pid,修改:

[root@localhost data]# cat postmaster.pid

77888

/opt/highgo/hgdb-see-4.5.8/data

1565058213 <—修改为hghac日志中的记录时间,

5866

/tmp
*
5866001 48

ready

注意:一些版本中,hghac日志会把进程时间和pid创建时间位置反了,所以将postmaster.pid的时间改为日志中另一个时间,让两个时间相同即可。

相关文档

报错编码

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值