Thread: pg_standby stuck on a wal file size <16MB

pg_standby stuck on a wal file size <16MB

From

Vladimir Kosilov

Date:

17 May 2008, 15:47:40

I've been having problem with pgsql-8.2.5 master/slave warm standby
replication setup where occasionally master node generates a wal file
smaller then expected 16MB. pg_standby on slave gets stuck on such short
files, and replication halts from that moment on. we have to do
pg_start_backup/ rsync of data to slave / pg_stop_backup and restart
slave in order to recover. database I'm replicating is write mostly.

this is process on slave that is waiting on log file which is smaller in
size:
postgres 14277 11857  0 03:34 ?        00:00:00 sh -c
/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t
/usr/local/recovery_control/quit_recovery /usr/local/wal_archive
000000010000034000000020 pg_xlog/RECOVERYXLOG 2>>
/usr/local/pgsql/data/standby.log
postgres 14278 14277  0 03:34 ?        00:00:00
/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t
/usr/local/recovery_control/quit_recovery /usr/local/wal_archive
000000010000034000000020 pg_xlog/RECOVERYXLOG

here is a sample list of wal_archive directory slave, note the size of
000000010000034000000020 expected wal file is less then 16MB:

...
-rw------- 1 postgres postgres 16777216 May 17 03:19
000000010000034000000017
-rw------- 1 postgres postgres 16777216 May 17 03:21
000000010000034000000018
-rw------- 1 postgres postgres 16777216 May 17 03:22
000000010000034000000019
-rw------- 1 postgres postgres 16777216 May 17 03:24
00000001000003400000001A
-rw------- 1 postgres postgres 16777216 May 17 03:26
00000001000003400000001B
-rw------- 1 postgres postgres 16777216 May 17 03:27
00000001000003400000001C
-rw------- 1 postgres postgres 16777216 May 17 03:29
00000001000003400000001D
-rw------- 1 postgres postgres 16777216 May 17 03:30
00000001000003400000001E
-rw------- 1 postgres postgres 16777216 May 17 03:32
00000001000003400000001F

-rw------- 1 postgres postgres 13746176 May 17 03:34
000000010000034000000020

-rw------- 1 postgres postgres 16777216 May 17 03:35
000000010000034000000021
-rw------- 1 postgres postgres 16777216 May 17 03:37
000000010000034000000022
-rw------- 1 postgres postgres 16777216 May 17 03:38
000000010000034000000023
-rw------- 1 postgres postgres 16777216 May 17 03:40
000000010000034000000024
-rw------- 1 postgres postgres 16777216 May 17 03:41
000000010000034000000025
-rw------- 1 postgres postgres 16777216 May 17 03:43
000000010000034000000026
-rw------- 1 postgres postgres 16777216 May 17 03:45
000000010000034000000027
...

skipping a bit further I see there is at least one other instance where
wal file is shorter then normal 16MB:

-rw------- 1 postgres postgres 16777216 May 17 05:42
00000001000003400000006F
-rw------- 1 postgres postgres 16777216 May 17 05:44
000000010000034000000070
-rw------- 1 postgres postgres 16777216 May 17 05:46
000000010000034000000071
-rw------- 1 postgres postgres 16777216 May 17 05:47
000000010000034000000072

-rw------- 1 postgres postgres    16384 May 17 05:50
000000010000034000000073

-rw------- 1 postgres postgres 16777216 May 17 05:51
000000010000034000000074
-rw------- 1 postgres postgres 16777216 May 17 05:52
000000010000034000000075
-rw------- 1 postgres postgres 16777216 May 17 05:54
000000010000034000000076

why would a master node create a wal file smaller then normal
checkpoint_segment size and how can this be avoided. I need a reliable
replication mechanism even at a cost of longer recovery on standby.

master's postgresql.conf:
..
fsync = on
wal_sync_method = open_sync
wal_buffers = 128
checkpoint_segments = 64
archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'

archive files are then moved  on master to standby every other minute:

rsync -aq --remove-sent-files /usr/local/wal_archive_local/
slave::wal_archive/

slave's recovery.conf:
restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10
-w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive
%f %p 2>> /u
sr/local/pgsql/data/standby.log'

both servers are identical Dell PE1950 servers with 4 sas hd w/hardware
RAID 1+0 running:
2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64
x86_64 GNU/Linux

Thanks much in advance
V.

Re: pg_standby stuck on a wal file size <16MB

From

"Gurjeet Singh"

Date:

17 May 2008, 16:07:19

On Sun, May 18, 2008 at 12:17 AM, Vladimir Kosilov <vkosilov@gmail.com> wrote:

I've been having problem with pgsql-8.2.5 master/slave warm standby replication setup where occasionally master node generates a wal file smaller then expected 16MB. pg_standby on slave gets stuck on such short files, and replication halts from that moment on. we have to do pg_start_backup/ rsync of data to slave / pg_stop_backup and restart slave in order to recover. database I'm replicating is write mostly.

this is process on slave that is waiting on log file which is smaller in size:
postgres 14277 11857 0 03:34 ? 00:00:00 sh -c /usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive 000000010000034000000020 pg_xlog/RECOVERYXLOG 2>> /usr/local/pgsql/data/standby.log
postgres 14278 14277 0 03:34 ? 00:00:00 /usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive 000000010000034000000020 pg_xlog/RECOVERYXLOG

here is a sample list of wal_archive directory slave, note the size of 000000010000034000000020 expected wal file is less then 16MB:

...
-rw------- 1 postgres postgres 16777216 May 17 03:19 000000010000034000000017
-rw------- 1 postgres postgres 16777216 May 17 03:21 000000010000034000000018
-rw------- 1 postgres postgres 16777216 May 17 03:22 000000010000034000000019
-rw------- 1 postgres postgres 16777216 May 17 03:24 00000001000003400000001A
-rw------- 1 postgres postgres 16777216 May 17 03:26 00000001000003400000001B
-rw------- 1 postgres postgres 16777216 May 17 03:27 00000001000003400000001C
-rw------- 1 postgres postgres 16777216 May 17 03:29 00000001000003400000001D
-rw------- 1 postgres postgres 16777216 May 17 03:30 00000001000003400000001E
-rw------- 1 postgres postgres 16777216 May 17 03:32 00000001000003400000001F

-rw------- 1 postgres postgres 13746176 May 17 03:34 000000010000034000000020

-rw------- 1 postgres postgres 16777216 May 17 03:35 000000010000034000000021
-rw------- 1 postgres postgres 16777216 May 17 03:37 000000010000034000000022
-rw------- 1 postgres postgres 16777216 May 17 03:38 000000010000034000000023
-rw------- 1 postgres postgres 16777216 May 17 03:40 000000010000034000000024
-rw------- 1 postgres postgres 16777216 May 17 03:41 000000010000034000000025
-rw------- 1 postgres postgres 16777216 May 17 03:43 000000010000034000000026
-rw------- 1 postgres postgres 16777216 May 17 03:45 000000010000034000000027
...

skipping a bit further I see there is at least one other instance where wal file is shorter then normal 16MB:

-rw------- 1 postgres postgres 16777216 May 17 05:42 00000001000003400000006F
-rw------- 1 postgres postgres 16777216 May 17 05:44 000000010000034000000070
-rw------- 1 postgres postgres 16777216 May 17 05:46 000000010000034000000071
-rw------- 1 postgres postgres 16777216 May 17 05:47 000000010000034000000072

-rw------- 1 postgres postgres 16384 May 17 05:50 000000010000034000000073

-rw------- 1 postgres postgres 16777216 May 17 05:51 000000010000034000000074
-rw------- 1 postgres postgres 16777216 May 17 05:52 000000010000034000000075
-rw------- 1 postgres postgres 16777216 May 17 05:54 000000010000034000000076

why would a master node create a wal file smaller then normal checkpoint_segment size and how can this be avoided. I need a reliable replication mechanism even at a cost of longer recovery on standby.

master's postgresql.conf:
..
fsync = on
wal_sync_method = open_sync
wal_buffers = 128
checkpoint_segments = 64
archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p /usr/local/wal_archive_local/%f'

archive files are then moved on master to standby every other minute:

rsync -aq --remove-sent-files /usr/local/wal_archive_local/ slave::wal_archive/

slave's recovery.conf:
restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive %f %p 2>> /u
sr/local/pgsql/data/standby.log'

both servers are identical Dell PE1950 servers with 4 sas hd w/hardware RAID 1+0 running:
2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Thanks much in advance
V.

Any reason why you are not using rsync command in the archive_command in the first place?

gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Re: pg_standby stuck on a wal file size <16MB

From

Ioannis Tambouras

Date:

17 May 2008, 20:07:28

>   archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
>   /usr/local/wal_archive_local/%f'

 The archive command tests if the wal segment exists and is a file,
but it does not check if the file is still being written. You need to
copy the file after writing has finished (it reached 16777216 bytes).
I don't have sources of pg_standby near me, but I remember in the
C code checks for complete segment sizes.


thanks
Ioannis Tambouras

Re: pg_standby stuck on a wal file size <16MB

From

Vlad Kosilov

Date:

17 May 2008, 21:12:13

Gurjeet Singh wrote:
>> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
>> /usr/local/wal_archive_local/%f'
>>
>> archive files are then moved  on master to standby every other minute:
>>
>> rsync -aq --remove-sent-files /usr/local/wal_archive_local/
>> slave::wal_archive/
>>
>> slave's recovery.conf:
>> restore_command = '/usr/local/pgsql/bin/pg_standby -d -k 255 -r 7 -s 10 -w
>> 0 -t /usr/local/recovery_control/quit_recovery /usr/local/wal_archive %f %p
>> 2>> /u
>> sr/local/pgsql/data/standby.log'
>>
>> both servers are identical Dell PE1950 servers with 4 sas hd w/hardware
>> RAID 1+0 running:
>> 2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64
>> GNU/Linux
>>
>> Thanks much in advance
>> V.
>>
>
>
> Any reason why you are not using rsync command in the archive_command in the
> first place?
>
> gurjeet[.singh]@EnterpriseDB.com
> singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
>
> EnterpriseDB http://www.enterprisedb.com
>
> Mail sent from my BlackLaptop device
>
>
I wanted to remove dependency on networking between the nodes and
standby availability at what seems to be a very critical moment -
getting a good archive of a wal file. we tried coping to nfs mounts
before with mixed results on a writes busy db and linux/nfs

Thanks,
V.

--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

Re: pg_standby stuck on a wal file size <16MB

From

Vlad Kosilov

Date:

17 May 2008, 21:43:16

is this wrong of me to expect that postgres would not make a wal file
available to archive_command unless it was completely ready to let go of
that wal file?
thanks,
V.
Ioannis Tambouras wrote:
>>   archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
>>   /usr/local/wal_archive_local/%f'
>>
>
>  The archive command tests if the wal segment exists and is a file,
> but it does not check if the file is still being written. You need to
> copy the file after writing has finished (it reached 16777216 bytes).
> I don't have sources of pg_standby near me, but I remember in the
> C code checks for complete segment sizes.
>
>
> thanks
> Ioannis Tambouras
>
>
>


--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

Re: pg_standby stuck on a wal file size <16MB

From

"Gurjeet Singh"

Date:

17 May 2008, 23:39:52

On Sun, May 18, 2008 at 5:48 AM, Vlad Kosilov <vkosilov@contigo.com> wrote:

is this wrong of me to expect that postgres would not make a wal file available to archive_command unless it was completely ready to let go of that wal file?

thats a perfectly reasonable expectation; and thats exactly how it works.

Ioannis Tambouras wrote:
archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
/usr/local/wal_archive_local/%f'

The archive command tests if the wal segment exists and is a file, but it does not check if the file is still being written. You need to copy the file after writing has finished (it reached 16777216 bytes). I don't have sources of pg_standby near me, but I remember in the
C code checks for complete segment sizes.

thanks
Ioannis Tambouras

--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Re: pg_standby stuck on a wal file size <16MB

From

Greg Smith

Date:

18 May 2008, 01:38:04

On Sat, 17 May 2008, Ioannis Tambouras wrote:

> The archive command tests if the wal segment exists and is a file,
> but it does not check if the file is still being written.

That's because it doesn't have to; the archive command doesn't get called
until the writing is done.

> I don't have sources of pg_standby near me, but I remember in the
> C code checks for complete segment sizes.

That's on the receiving side, to make sure it's not trying to process
files that haven't finished copying to the standby yet.  You don't have to
do any of that in the archive_command.

Anyway, back to the original question:

> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
> /usr/local/wal_archive_local/%f'
> archive files are then moved  on master to standby every other minute:
> rsync -aq --remove-sent-files /usr/local/wal_archive_local/
> slave::wal_archive/

I don't see any mechanism here to keep rsync from copying over partial
files to the standby before they've finished copying to the
wal_archive_local directory.  That's my guess for where the small files
are coming from, rsync before the cp is done.  If you're going to buffer
in a transfer directory, you need some sort of test or locking to make
sure the file is complete with exactly 16MB before it gets rsync'd over.
I suspect no amount of poking at the standby will root out the issue
because it's happening on the primary.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: pg_standby stuck on a wal file size <16MB

From

Vlad Kosilov

Date:

18 May 2008, 01:58:09

Greg, I think you might be onto something here. your suggestion prompts
to get rid off --remove-sent-files in my rsync command, and clean up wal
files from master by some other means, this way even if master's rsync
picks partial file on the first run, it should update and complete file
once it's finished being copied by archive_command to my buffer directory.
Thanks! I'll get this in asap, if this does not solve the problem I'll
re-post
V.
Greg Smith wrote:
> On Sat, 17 May 2008, Ioannis Tambouras wrote:
>
>> The archive command tests if the wal segment exists and is a file,
>> but it does not check if the file is still being written.
>
> That's because it doesn't have to; the archive command doesn't get
> called until the writing is done.
>
>> I don't have sources of pg_standby near me, but I remember in the
>> C code checks for complete segment sizes.
>
> That's on the receiving side, to make sure it's not trying to process
> files that haven't finished copying to the standby yet.  You don't
> have to do any of that in the archive_command.
>
> Anyway, back to the original question:
>
>> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
>> /usr/local/wal_archive_local/%f'
>> archive files are then moved  on master to standby every other minute:
>> rsync -aq --remove-sent-files /usr/local/wal_archive_local/
>> slave::wal_archive/
>
> I don't see any mechanism here to keep rsync from copying over partial
> files to the standby before they've finished copying to the
> wal_archive_local directory.  That's my guess for where the small
> files are coming from, rsync before the cp is done.  If you're going
> to buffer in a transfer directory, you need some sort of test or
> locking to make sure the file is complete with exactly 16MB before it
> gets rsync'd over. I suspect no amount of poking at the standby will
> root out the issue because it's happening on the primary.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>


--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

Re: pg_standby stuck on a wal file size <16MB

From

Vlad Kosilov

Date:

20 May 2008, 19:44:35

as Greg pointed out: my use of rsync --remove-sent-files option had
contributed to a short sized wal log file on standby.
changing master's postgres crontab to the following helped to resolve
the issue:

# ship logs to standby:
*/2     * * * * rsync -aq /wal_archive_local/ 10.10.10.12::wal_archive/
# remove files older then remove_check file mtime
*/5     * * * * find /wal_archive_local/ ! -newer
/wal_archive_local/remove_check -exec rm -f {} \; && touch
/wal_archive_local/remove_check

Thank you!
V.


Greg Smith wrote:
> On Sat, 17 May 2008, Ioannis Tambouras wrote:
>
>> The archive command tests if the wal segment exists and is a file,
>> but it does not check if the file is still being written.
>
> That's because it doesn't have to; the archive command doesn't get
> called until the writing is done.
>
>> I don't have sources of pg_standby near me, but I remember in the
>> C code checks for complete segment sizes.
>
> That's on the receiving side, to make sure it's not trying to process
> files that haven't finished copying to the standby yet.  You don't
> have to do any of that in the archive_command.
>
> Anyway, back to the original question:
>
>> archive_command = 'test ! -f /usr/local/wal_archive_local/%f && cp %p
>> /usr/local/wal_archive_local/%f'
>> archive files are then moved  on master to standby every other minute:
>> rsync -aq --remove-sent-files /usr/local/wal_archive_local/
>> slave::wal_archive/
>
> I don't see any mechanism here to keep rsync from copying over partial
> files to the standby before they've finished copying to the
> wal_archive_local directory.  That's my guess for where the small
> files are coming from, rsync before the cp is done.  If you're going
> to buffer in a transfer directory, you need some sort of test or
> locking to make sure the file is complete with exactly 16MB before it
> gets rsync'd over. I suspect no amount of poking at the standby will
> root out the issue because it's happening on the primary.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>


--
________________________________________
Vladimir (Vlad) Kosilov
Senior Systems Administrator
Contigo Systems Inc.
604.683.3106 (phone)
604.648.9886 (fax)
vkosilov@contigo.com
www.contigo.com
________________________________________

Re: pg_standby stuck on a wal file size <16MB

From

"Gurjeet Singh"

Date:

20 May 2008, 23:56:31

On Wed, May 21, 2008 at 4:14 AM, Vlad Kosilov <vkosilov@contigo.com> wrote:

as Greg pointed out: my use of rsync --remove-sent-files option had contributed to a short sized wal log file on standby.
changing master's postgres crontab to the following helped to resolve the issue:

# ship logs to standby:
*/2 * * * * rsync -aq /wal_archive_local/ 10.10.10.12::wal_archive/
# remove files older then remove_check file mtime
*/5 * * * * find /wal_archive_local/ ! -newer /wal_archive_local/remove_check -exec rm -f {} \; && touch /wal_archive_local/remove_check

Hmmm, nice trick.

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device