Thread: pg_standby stuck on a wal file size <16MB
I've been having problem with pgsql-8.2.5 master/slave warm standby replication setup where occasionally master node generates a wal file smaller then expected 16MB. pg_standby on slave gets stuck on such short files, and replication halts from that moment on. we have to do pg_start_backup/ rsync of data to slave / pg_stop_backup and restart slave in order to recover. database I'm replicating is write mostly. this is process on slave that is waiting on log file which is smaller in size: postgres 14277 11857 0 03:34 ? 00:00:00 sh -c /usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive 000000010000034000000020 pg_xlog/RECOVERYXLOG 2>> /usr/local/pgsql/data/standby.log postgres 14278 14277 0 03:34 ? 00:00:00 /usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive 000000010000034000000020 pg_xlog/RECOVERYXLOG here is a sample list of wal_archive directory slave, note the size of 000000010000034000000020 expected wal file is less then 16MB: ... -rw------- 1 postgres postgres 16777216 May 17 03:19 000000010000034000000017 -rw------- 1 postgres postgres 16777216 May 17 03:21 000000010000034000000018 -rw------- 1 postgres postgres 16777216 May 17 03:22 000000010000034000000019 -rw------- 1 postgres postgres 16777216 May 17 03:24 00000001000003400000001A -rw------- 1 postgres postgres 16777216 May 17 03:26 00000001000003400000001B -rw------- 1 postgres postgres 16777216 May 17 03:27 00000001000003400000001C -rw------- 1 postgres postgres 16777216 May 17 03:29 00000001000003400000001D -rw------- 1 postgres postgres 16777216 May 17 03:30 00000001000003400000001E -rw------- 1 postgres postgres 16777216 May 17 03:32 00000001000003400000001F -rw------- 1 postgres postgres 13746176 May 17 03:34 000000010000034000000020 -rw------- 1 postgres postgres 16777216 May 17 03:35 000000010000034000000021 -rw------- 1 postgres postgres 16777216 May 17 03:37 000000010000034000000022 -rw------- 1 postgres postgres 16777216 May 17 03:38 000000010000034000000023 -rw------- 1 postgres postgres 16777216 May 17 03:40 000000010000034000000024 -rw------- 1 postgres postgres 16777216 May 17 03:41 000000010000034000000025 -rw------- 1 postgres postgres 16777216 May 17 03:43 000000010000034000000026 -rw------- 1 postgres postgres 16777216 May 17 03:45 000000010000034000000027 ... skipping a bit further I see there is at least one other instance where wal file is shorter then normal 16MB: -rw------- 1 postgres postgres 16777216 May 17 05:42 00000001000003400000006F -rw------- 1 postgres postgres 16777216 May 17 05:44 000000010000034000000070 -rw------- 1 postgres postgres 16777216 May 17 05:46 000000010000034000000071 -rw------- 1 postgres postgres 16777216 May 17 05:47 000000010000034000000072 -rw------- 1 postgres postgres 16384 May 17 05:50 000000010000034000000073 -rw------- 1 postgres postgres 16777216 May 17 05:51 000000010000034000000074 -rw------- 1 postgres postgres 16777216 May 17 05:52 000000010000034000000075 -rw------- 1 postgres postgres 16777216 May 17 05:54 000000010000034000000076 why would a master node create a wal file smaller then normal checkpoint_segment size and how can this be avoided. I need a reliable replication mechanism even at a cost of longer recovery on standby. master's postgresql.conf: .. fsync = on wal_sync_method = open_sync wal_buffers = 128 checkpoint_segments = 64 archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p /usr/local/wal_archive_local/%f' archive files are then moved on master to standby every other minute: rsync -aq --remove-sent-files /usr/local/wal_archive_local/ slave::wal_archive/ slave's recovery.conf: restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive %f %p 2>> /u sr/local/pgsql/data/standby.log' both servers are identical Dell PE1950 servers with 4 sas hd w/hardware RAID 1+0 running: 2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux Thanks much in advance V.
On Sun, May 18, 2008 at 12:17 AM, Vladimir Kosilov <vkosilov@gmail.com> wrote:
Any reason why you are not using rsync command in the archive_command in the first place?
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
I've been having problem with pgsql-8.2.5 master/slave warm standby replication setup where occasionally master node generates a wal file smaller then expected 16MB. pg_standby on slave gets stuck on such short files, and replication halts from that moment on. we have to do pg_start_backup/ rsync of data to slave / pg_stop_backup and restart slave in order to recover. database I'm replicating is write mostly.
this is process on slave that is waiting on log file which is smaller in size:
postgres 14277 11857 0 03:34 ? 00:00:00 sh -c /usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive 000000010000034000000020 pg_xlog/RECOVERYXLOG 2>> /usr/local/pgsql/data/standby.log
postgres 14278 14277 0 03:34 ? 00:00:00 /usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive 000000010000034000000020 pg_xlog/RECOVERYXLOG
here is a sample list of wal_archive directory slave, note the size of 000000010000034000000020 expected wal file is less then 16MB:
...
-rw------- 1 postgres postgres 16777216 May 17 03:19 000000010000034000000017
-rw------- 1 postgres postgres 16777216 May 17 03:21 000000010000034000000018
-rw------- 1 postgres postgres 16777216 May 17 03:22 000000010000034000000019
-rw------- 1 postgres postgres 16777216 May 17 03:24 00000001000003400000001A
-rw------- 1 postgres postgres 16777216 May 17 03:26 00000001000003400000001B
-rw------- 1 postgres postgres 16777216 May 17 03:27 00000001000003400000001C
-rw------- 1 postgres postgres 16777216 May 17 03:29 00000001000003400000001D
-rw------- 1 postgres postgres 16777216 May 17 03:30 00000001000003400000001E
-rw------- 1 postgres postgres 16777216 May 17 03:32 00000001000003400000001F
-rw------- 1 postgres postgres 13746176 May 17 03:34 000000010000034000000020
-rw------- 1 postgres postgres 16777216 May 17 03:35 000000010000034000000021
-rw------- 1 postgres postgres 16777216 May 17 03:37 000000010000034000000022
-rw------- 1 postgres postgres 16777216 May 17 03:38 000000010000034000000023
-rw------- 1 postgres postgres 16777216 May 17 03:40 000000010000034000000024
-rw------- 1 postgres postgres 16777216 May 17 03:41 000000010000034000000025
-rw------- 1 postgres postgres 16777216 May 17 03:43 000000010000034000000026
-rw------- 1 postgres postgres 16777216 May 17 03:45 000000010000034000000027
...
skipping a bit further I see there is at least one other instance where wal file is shorter then normal 16MB:
-rw------- 1 postgres postgres 16777216 May 17 05:42 00000001000003400000006F
-rw------- 1 postgres postgres 16777216 May 17 05:44 000000010000034000000070
-rw------- 1 postgres postgres 16777216 May 17 05:46 000000010000034000000071
-rw------- 1 postgres postgres 16777216 May 17 05:47 000000010000034000000072
-rw------- 1 postgres postgres 16384 May 17 05:50 000000010000034000000073
-rw------- 1 postgres postgres 16777216 May 17 05:51 000000010000034000000074
-rw------- 1 postgres postgres 16777216 May 17 05:52 000000010000034000000075
-rw------- 1 postgres postgres 16777216 May 17 05:54 000000010000034000000076
why would a master node create a wal file smaller then normal checkpoint_segment size and how can this be avoided. I need a reliable replication mechanism even at a cost of longer recovery on standby.
master's postgresql.conf:
..
fsync = on
wal_sync_method = open_sync
wal_buffers = 128
checkpoint_segments = 64
archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p /usr/local/wal_archive_local/%f'
archive files are then moved on master to standby every other minute:
rsync -aq --remove-sent-files /usr/local/wal_archive_local/ slave::wal_archive/
slave's recovery.conf:
restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive %f %p 2>> /u
sr/local/pgsql/data/standby.log'
both servers are identical Dell PE1950 servers with 4 sas hd w/hardware RAID 1+0 running:
2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
Thanks much in advance
V.
Any reason why you are not using rsync command in the archive_command in the first place?
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p > /usr/local/wal_archive_local/%f' The archive command tests if the wal segment exists and is a file, but it does not check if the file is still being written. You need to copy the file after writing has finished (it reached 16777216 bytes). I don't have sources of pg_standby near me, but I remember in the C code checks for complete segment sizes. thanks Ioannis Tambouras
Gurjeet Singh wrote: >> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p >> /usr/local/wal_archive_local/%f' >> >> archive files are then moved on master to standby every other minute: >> >> rsync -aq --remove-sent-files /usr/local/wal_archive_local/ >> slave::wal_archive/ >> >> slave's recovery.conf: >> restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w >> 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive %f %p >> 2>> /u >> sr/local/pgsql/data/standby.log' >> >> both servers are identical Dell PE1950 servers with 4 sas hd w/hardware >> RAID 1+0 running: >> 2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64 >> GNU/Linux >> >> Thanks much in advance >> V. >> > > > Any reason why you are not using rsync command in the archive_command in the > first place? > > gurjeet[.singh]@EnterpriseDB.com > singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com > > EnterpriseDB http://www.enterprisedb.com > > Mail sent from my BlackLaptop device > > I wanted to remove dependency on networking between the nodes and standby availability at what seems to be a very critical moment - getting a good archive of a wal file. we tried coping to nfs mounts before with mixed results on a writes busy db and linux/nfs Thanks, V. -- ________________________________________ Vladimir (Vlad) Kosilov Senior Systems Administrator Contigo Systems Inc. 604.683.3106 (phone) 604.648.9886 (fax) vkosilov@contigo.com www.contigo.com ________________________________________
is this wrong of me to expect that postgres would not make a wal file available to archive_command unless it was completely ready to let go of that wal file? thanks, V. Ioannis Tambouras wrote: >> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p >> /usr/local/wal_archive_local/%f' >> > > The archive command tests if the wal segment exists and is a file, > but it does not check if the file is still being written. You need to > copy the file after writing has finished (it reached 16777216 bytes). > I don't have sources of pg_standby near me, but I remember in the > C code checks for complete segment sizes. > > > thanks > Ioannis Tambouras > > > -- ________________________________________ Vladimir (Vlad) Kosilov Senior Systems Administrator Contigo Systems Inc. 604.683.3106 (phone) 604.648.9886 (fax) vkosilov@contigo.com www.contigo.com ________________________________________
On Sun, May 18, 2008 at 5:48 AM, Vlad Kosilov <vkosilov@contigo.com> wrote:
thats a perfectly reasonable expectation; and thats exactly how it works.
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
is this wrong of me to expect that postgres would not make a wal file available to archive_command unless it was completely ready to let go of that wal file?
thats a perfectly reasonable expectation; and thats exactly how it works.
Ioannis Tambouras wrote:archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'
The archive command tests if the wal segment exists and is a file, but it does not check if the file is still being written. You need to copy the file after writing has finished (it reached 16777216 bytes). I don't have sources of pg_standby near me, but I remember in the
C code checks for complete segment sizes.
thanks
Ioannis Tambouras
--________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
On Sat, 17 May 2008, Ioannis Tambouras wrote: > The archive command tests if the wal segment exists and is a file, > but it does not check if the file is still being written. That's because it doesn't have to; the archive command doesn't get called until the writing is done. > I don't have sources of pg_standby near me, but I remember in the > C code checks for complete segment sizes. That's on the receiving side, to make sure it's not trying to process files that haven't finished copying to the standby yet. You don't have to do any of that in the archive_command. Anyway, back to the original question: > archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p > /usr/local/wal_archive_local/%f' > archive files are then moved on master to standby every other minute: > rsync -aq --remove-sent-files /usr/local/wal_archive_local/ > slave::wal_archive/ I don't see any mechanism here to keep rsync from copying over partial files to the standby before they've finished copying to the wal_archive_local directory. That's my guess for where the small files are coming from, rsync before the cp is done. If you're going to buffer in a transfer directory, you need some sort of test or locking to make sure the file is complete with exactly 16MB before it gets rsync'd over. I suspect no amount of poking at the standby will root out the issue because it's happening on the primary. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg, I think you might be onto something here. your suggestion prompts to get rid off --remove-sent-files in my rsync command, and clean up wal files from master by some other means, this way even if master's rsync picks partial file on the first run, it should update and complete file once it's finished being copied by archive_command to my buffer directory. Thanks! I'll get this in asap, if this does not solve the problem I'll re-post V. Greg Smith wrote: > On Sat, 17 May 2008, Ioannis Tambouras wrote: > >> The archive command tests if the wal segment exists and is a file, >> but it does not check if the file is still being written. > > That's because it doesn't have to; the archive command doesn't get > called until the writing is done. > >> I don't have sources of pg_standby near me, but I remember in the >> C code checks for complete segment sizes. > > That's on the receiving side, to make sure it's not trying to process > files that haven't finished copying to the standby yet. You don't > have to do any of that in the archive_command. > > Anyway, back to the original question: > >> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p >> /usr/local/wal_archive_local/%f' >> archive files are then moved on master to standby every other minute: >> rsync -aq --remove-sent-files /usr/local/wal_archive_local/ >> slave::wal_archive/ > > I don't see any mechanism here to keep rsync from copying over partial > files to the standby before they've finished copying to the > wal_archive_local directory. That's my guess for where the small > files are coming from, rsync before the cp is done. If you're going > to buffer in a transfer directory, you need some sort of test or > locking to make sure the file is complete with exactly 16MB before it > gets rsync'd over. I suspect no amount of poking at the standby will > root out the issue because it's happening on the primary. > > -- > * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD > -- ________________________________________ Vladimir (Vlad) Kosilov Senior Systems Administrator Contigo Systems Inc. 604.683.3106 (phone) 604.648.9886 (fax) vkosilov@contigo.com www.contigo.com ________________________________________
as Greg pointed out: my use of rsync --remove-sent-files option had contributed to a short sized wal log file on standby. changing master's postgres crontab to the following helped to resolve the issue: # ship logs to standby: */2 * * * * rsync -aq /wal_archive_local/ 10.10.10.12::wal_archive/ # remove files older then remove_check file mtime */5 * * * * find /wal_archive_local/ ! -newer /wal_archive_local/remove_check -exec rm -f {} \; && touch /wal_archive_local/remove_check Thank you! V. Greg Smith wrote: > On Sat, 17 May 2008, Ioannis Tambouras wrote: > >> The archive command tests if the wal segment exists and is a file, >> but it does not check if the file is still being written. > > That's because it doesn't have to; the archive command doesn't get > called until the writing is done. > >> I don't have sources of pg_standby near me, but I remember in the >> C code checks for complete segment sizes. > > That's on the receiving side, to make sure it's not trying to process > files that haven't finished copying to the standby yet. You don't > have to do any of that in the archive_command. > > Anyway, back to the original question: > >> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p >> /usr/local/wal_archive_local/%f' >> archive files are then moved on master to standby every other minute: >> rsync -aq --remove-sent-files /usr/local/wal_archive_local/ >> slave::wal_archive/ > > I don't see any mechanism here to keep rsync from copying over partial > files to the standby before they've finished copying to the > wal_archive_local directory. That's my guess for where the small > files are coming from, rsync before the cp is done. If you're going > to buffer in a transfer directory, you need some sort of test or > locking to make sure the file is complete with exactly 16MB before it > gets rsync'd over. I suspect no amount of poking at the standby will > root out the issue because it's happening on the primary. > > -- > * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD > -- ________________________________________ Vladimir (Vlad) Kosilov Senior Systems Administrator Contigo Systems Inc. 604.683.3106 (phone) 604.648.9886 (fax) vkosilov@contigo.com www.contigo.com ________________________________________
On Wed, May 21, 2008 at 4:14 AM, Vlad Kosilov <vkosilov@contigo.com> wrote:
Hmmm, nice trick.
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
as Greg pointed out: my use of rsync --remove-sent-files option had contributed to a short sized wal log file on standby.
changing master's postgres crontab to the following helped to resolve the issue:
# ship logs to standby:
*/2 * * * * rsync -aq /wal_archive_local/ 10.10.10.12::wal_archive/
# remove files older then remove_check file mtime
*/5 * * * * find /wal_archive_local/ ! -newer /wal_archive_local/remove_check -exec rm -f {} \; && touch /wal_archive_local/remove_check
Hmmm, nice trick.
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device