summaryrefslogtreecommitdiff
path: root/doc/TODO.detail/mmap
blob: 1ea7b85d027dc2ac13b6436947ad77a391f46e3f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
From pgsql-hackers-owner+M5149@postgresql.org Mon Feb 26 03:32:49 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA04497
	for <pgman@candle.pha.pa.us>; Mon, 26 Feb 2001 03:32:48 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f1Q8TSx48319;
	Mon, 26 Feb 2001 03:29:28 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M5149@postgresql.org)
Received: from store.d.zembu.com (nat.zembu.com [209.128.96.253])
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f1Q8LPx47243
	for <pgsql-hackers@postgreSQL.org>; Mon, 26 Feb 2001 03:21:25 -0500 (EST)
	(envelope-from ncm@zembu.com)
Received: by store.d.zembu.com (Postfix, from userid 509)
	id 58E39A782; Mon, 26 Feb 2001 00:21:25 -0800 (PST)
Date: Mon, 26 Feb 2001 00:21:25 -0800
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Re: [PATCHES] A patch for xlog.c
Message-ID: <20010226002125.A2430@store.zembu.com>
Reply-To: pgsql-hackers@postgresql.org
References: <200102260200.VAA17397@candle.pha.pa.us> <22318.983161726@sss.pgh.pa.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <22318.983161726@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Sun, Feb 25, 2001 at 11:28:46PM -0500
From: ncm@zembu.com (Nathan Myers)
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: ORr

On Sun, Feb 25, 2001 at 11:28:46PM -0500, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > It allows no backing store on disk.  

I.e. it allows you to map memory without an associated inode; the memory
may still be swapped.  Of course, there is no problem with mapping an 
inode too, so that unrelated processes can join in.  Solarix has a flag
to pin the shared pages in RAM so they can't be swapped out.

> > It is the BSD solution to SysV
> > share memory.  Here are all the BSDi flags:
> 
> >      MAP_ANON    Map anonymous memory not associated with any specific
> >                  file.  The file descriptor used for creating MAP_ANON
> >                  must be -1.  The offset parameter is ignored.
> 
> Hmm.  Now that I read down to the "nonstandard extensions" part of the
> HPUX man page for mmap(), I find
> 
>      If MAP_ANONYMOUS is set in flags:
> 
>           o    A new memory region is created and initialized to all zeros.
>                This memory region can be shared only with descendants of
>                the current process.

This is supported on Linux and BSD, but not on Solarix 7.  It's not 
necessary; you can just map /dev/zero on SysV systems that don't 
have MAP_ANON.

> While I've said before that I don't think it's really necessary for
> processes that aren't children of the postmaster to access the shared
> memory, I'm not sure that I want to go over to a mechanism that makes it
> *impossible* for that to be done.  Especially not if the only motivation
> is to avoid having to configure the kernel's shared memory settings.

There are enormous advantages to avoiding the need to configure kernel 
settings.  It makes PG a better citizen.  PG is much easier to drop in 
and use if you don't need attention from the IT department.

But I don't know of any reason to avoid mapping an actual inode,
so using mmap doesn't necessarily mean giving up sharing among
unrelated processes.

> Besides, what makes you think there's not a limit on the size of shmem
> allocatable via mmap()?

I've never seen any mmap limit documented.  Since mmap() is how 
everybody implements shared libraries, such a limit would be equivalent 
to a limit on how much/many shared libraries are used.  mmap() with 
MAP_ANONYMOUS (or its SysV /dev/zero equivalent) is a common, modern 
way to get raw storage for malloc(), so such a limit would be a limit
on malloc() too.

The mmap architecture comes to us from the Mach microkernel memory
manager, backported into BSD and then copied widely.  Since it was
the fundamental mechanism for all memory operations in Mach, arbitrary
limits would make no sense.  That it worked so well is the reason it 
was copied everywhere else, so adding arbitrary limits while copying 
it would be silly.  I don't think we'll see any systems like that.

Nathan Myers
ncm@zembu.com

From pgsql-hackers-owner+M6138@postgresql.org Mon Mar 19 07:57:59 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id HAA26926
	for <pgman@candle.pha.pa.us>; Mon, 19 Mar 2001 07:57:59 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f2JCug641835;
	Mon, 19 Mar 2001 07:56:42 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M6138@postgresql.org)
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f2JCt7641684
	for <pgsql-hackers@postgresql.org>; Mon, 19 Mar 2001 07:55:07 -0500 (EST)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f2JCt2325289;
	Mon, 19 Mar 2001 04:55:02 -0800 (PST)
Date: Mon, 19 Mar 2001 04:55:01 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Rod Taylor <rod.taylor@inquent.com>
Cc: Hackers List <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fw: [vorbis-dev] ogg123: shared memory by mmap()
Message-ID: <20010319045500.T29888@fw.wintelcom.net>
References: <018301c0b070$16049a40$2205010a@jester>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <018301c0b070$16049a40$2205010a@jester>; from rod.taylor@inquent.com on Mon, Mar 19, 2001 at 07:28:21AM -0500
X-all-your-base: are belong to us.
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: ORr

WOOT WOOT! DANGER WILL ROBINSON!

> ----- Original Message -----
> From: "Christian Weisgerber" <naddy@mips.inka.de>
> Newsgroups: list.vorbis.dev
> To: <vorbis-dev@xiph.org>
> Sent: Saturday, March 17, 2001 12:01 PM
> Subject: [vorbis-dev] ogg123: shared memory by mmap()
> 
> 
> > The patch below adds:
> >
> > - acinclude.m4:  A new macro A_FUNC_SMMAP to check that sharing
> pages
> >   through mmap() works.  This is taken from Joerg Schilling's star.
> > - configure.in:  A_FUNC_SMMAP
> > - ogg123/buffer.c:  If we have a working mmap(), use it to create
> >   a region of shared memory instead of using System V IPC.
> >
> > Works on BSD.  Should also work on SVR4 and offspring (Solaris),
> > and Linux.

This is a really bad idea performance wise.  Solaris has a special
code path for SYSV shared memory that doesn't require tons of swap
tracking structures per-page/per-process.  FreeBSD also has this
optimization (it's off by default, but should work since FreeBSD
4.2 via the sysctl kern.ipc.shm_use_phys=1)

Both OS's use a trick of making the pages non-pageable, this allows
signifigant savings in kernel space required for each attached
process, as well as the use of large pages which reduce the amount
of TLB faults your processes will incurr.

Anyhow, if you could make this a runtime option it wouldn't be so
evil, but as a compile time option, it's a really bad idea for
Solaris and FreeBSD.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

From pgsql-hackers-owner+M6255@postgresql.org Tue Mar 20 18:46:33 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA02887
	for <pgman@candle.pha.pa.us>; Tue, 20 Mar 2001 18:46:33 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.3/8.11.1) with SMTP id f2KNjtH22390;
	Tue, 20 Mar 2001 18:45:55 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M6255@postgresql.org)
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by mail.postgresql.org (8.11.3/8.11.1) with ESMTP id f2KNiFH22033
	for <pgsql-hackers@postgresql.org>; Tue, 20 Mar 2001 18:44:15 -0500 (EST)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f2KNiAW02417;
	Tue, 20 Mar 2001 15:44:10 -0800 (PST)
Date: Tue, 20 Mar 2001 15:44:10 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Bruce Momjian <pgman@candle.pha.pa.us>
Cc: Rod Taylor <rod.taylor@inquent.com>,
        Hackers List <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fw: [vorbis-dev] ogg123: shared memory by mmap()
Message-ID: <20010320154410.H29888@fw.wintelcom.net>
References: <20010319045500.T29888@fw.wintelcom.net> <200103202210.RAA23981@candle.pha.pa.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200103202210.RAA23981@candle.pha.pa.us>; from pgman@candle.pha.pa.us on Tue, Mar 20, 2001 at 05:10:33PM -0500
X-all-your-base: are belong to us.
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

* Bruce Momjian <pgman@candle.pha.pa.us> [010320 14:10] wrote:
> > > > The patch below adds:
> > > >
> > > > - acinclude.m4:  A new macro A_FUNC_SMMAP to check that sharing
> > > pages
> > > >   through mmap() works.  This is taken from Joerg Schilling's star.
> > > > - configure.in:  A_FUNC_SMMAP
> > > > - ogg123/buffer.c:  If we have a working mmap(), use it to create
> > > >   a region of shared memory instead of using System V IPC.
> > > >
> > > > Works on BSD.  Should also work on SVR4 and offspring (Solaris),
> > > > and Linux.
> > 
> > This is a really bad idea performance wise.  Solaris has a special
> > code path for SYSV shared memory that doesn't require tons of swap
> > tracking structures per-page/per-process.  FreeBSD also has this
> > optimization (it's off by default, but should work since FreeBSD
> > 4.2 via the sysctl kern.ipc.shm_use_phys=1)
> 
> > 
> > Both OS's use a trick of making the pages non-pageable, this allows
> > signifigant savings in kernel space required for each attached
> > process, as well as the use of large pages which reduce the amount
> > of TLB faults your processes will incurr.
> 
> That is interesting.  BSDi has SysV shared memory as non-pagable, and I
> always thought of that as a bug.  Seems you are saying that having it
> pagable has a significant performance penalty.  Interesting.

Yes, having it pageable is actually sort of bad.

It doesn't allow you to do several important optimizations.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

From pgsql-general-owner+M14300@postgresql.org Mon Aug 27 13:07:32 2001
Return-path: <pgsql-general-owner+M14300@postgresql.org>
Received: from server1.pgsql.org (server1.pgsql.org [64.39.15.238])
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id f7RH7VF04800
	for <pgman@candle.pha.pa.us>; Mon, 27 Aug 2001 13:07:31 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by server1.pgsql.org (8.11.6/8.11.6) with ESMTP id f7RH7Tq17721;
	Mon, 27 Aug 2001 12:07:29 -0500 (CDT)
	(envelope-from pgsql-general-owner+M14300@postgresql.org)
Received: from svana.org (svana.org [210.9.66.30])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id f7RFE1f13269
	for <pgsql-general@postgresql.org>; Mon, 27 Aug 2001 11:14:01 -0400 (EDT)
	(envelope-from kleptog@svana.org)
Received: from kleptog by svana.org with local (Exim 3.12 #1 (Debian))
	id 15bO5x-0000Fd-00; Tue, 28 Aug 2001 01:14:33 +1000
Date: Tue, 28 Aug 2001 01:14:33 +1000
From: Martijn van Oosterhout <kleptog@svana.org>
To: Andrew Snow <andrew@modulus.org>
cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] raw partition
Message-ID: <20010828011433.E32309@svana.org>
Reply-To: Martijn van Oosterhout <kleptog@svana.org>
References: <20010827233815.B32309@svana.org> <000101c12f00$dc5814b0$fa01b5ca@avon>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <000101c12f00$dc5814b0$fa01b5ca@avon>; from andrew@modulus.org on Tue, Aug 28, 2001 at 12:02:08AM +1000
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR

On Tue, Aug 28, 2001 at 12:02:08AM +1000, Andrew Snow wrote:
> 
> What I think would be better would be moving postgresql to a system of
> using memory-mapped I/O.  instead of the shared buffer cache, files
> would be directly memory-mapped and the OS would do the caching.  I
> can't see this happening though because of platform dependancy, but I
> think its worth another look soon because many unix platforms support
> mmap().  I think it would improve the performance of disk-intensive
> tasks noticeably.

Well, this has other problems. Consider tables that are larger than your
system memory. You'd have to continuously map and unmap different sections.
That can have odd side effects (witness mozilla on linux having 15,000
mapped areas or so...)

You would still however get the advantage that you wouldn't have to copy the
data from the disk buffers to user space, you simply get the disk buffer
mapped into your address space.

I think that for commonly used tables that are under 100K in size (most of
the system tables), this is quite a workable idea. If you don't mind keeping
them mapped the whole time.

-- 
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

From pgsql-general-owner+M14319@postgresql.org Mon Aug 27 16:57:10 2001
Return-path: <pgsql-general-owner+M14319@postgresql.org>
Received: from server1.pgsql.org (server1.pgsql.org [64.39.15.238])
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id f7RKv9F16849
	for <pgman@candle.pha.pa.us>; Mon, 27 Aug 2001 16:57:09 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by server1.pgsql.org (8.11.6/8.11.6) with ESMTP id f7RKv9q31456;
	Mon, 27 Aug 2001 15:57:09 -0500 (CDT)
	(envelope-from pgsql-general-owner+M14319@postgresql.org)
Received: from sss.pgh.pa.us ([192.204.191.242])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id f7RJrsf55472
	for <pgsql-general@postgresql.org>; Mon, 27 Aug 2001 15:53:54 -0400 (EDT)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id f7RJrGK19431;
	Mon, 27 Aug 2001 15:53:16 -0400 (EDT)
To: Martijn van Oosterhout <kleptog@svana.org>
cc: Andrew Snow <andrew@modulus.org>, pgsql-general@postgresql.org
Subject: Re: [GENERAL] raw partition 
In-Reply-To: <20010828011433.E32309@svana.org> 
References: <20010827233815.B32309@svana.org> <000101c12f00$dc5814b0$fa01b5ca@avon> <20010828011433.E32309@svana.org>
Comments: In-reply-to Martijn van Oosterhout <kleptog@svana.org>
	message dated "Tue, 28 Aug 2001 01:14:33 +1000"
Date: Mon, 27 Aug 2001 15:53:15 -0400
Message-ID: <19428.998941995@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR

Martijn van Oosterhout <kleptog@svana.org> writes:
> You would still however get the advantage that you wouldn't have to copy the
> data from the disk buffers to user space, you simply get the disk buffer
> mapped into your address space.

AFAICS this would be the *only* advantage.  While it's not negligible,
it's quite unclear that it's worth the bookkeeping and portability
headaches of managing lots of mmap'd areas, either.

Before I take this idea seriously at all, I'd want to see a design that
addresses a couple of critical issues:

1. Postgres' shared buffers are *shared*, potentially across many
processes.  How will you deal with buffers for files that have been
mmap'd by only some of the processes?  (Maybe this means that the
whole concept of shared buffers goes away, and each process does its
own buffer management based on its own mmaps.  Not sure.  That would be
a pretty radical restructuring though, and would completely invalidate
our present approach to page-level locking.)

2. How do you deal with extending a file?  My system's mmap man page
says
     If the size of the mapped file changes after the call to mmap(), the
     effect of references to portions of the mapped region that correspond
     to added or removed portions of the file is unspecified.
This suggests that the only portable way to cope is to issue a separate
mmap for every disk page.  Will typical Unix systems perform well with
umpteen thousand small mmap requests?

3. How do you persuade the other backends to drop their mmaps of a table
you are deleting?

There are probably other gotchas, but without an understanding of how
to address these, I doubt it's worth looking further ...

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

From pgsql-hackers-owner+M13750=candle.pha.pa.us=pgman@postgresql.org Mon Oct  1 05:59:15 2001
Return-path: <pgsql-hackers-owner+M13750=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (server1.pgsql.org [64.39.15.238] (may be forged))
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id f919xF512590
	for <pgman@candle.pha.pa.us>; Mon, 1 Oct 2001 05:59:15 -0400 (EDT)
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by server1.pgsql.org (8.11.6/8.11.6) with ESMTP id f919xA207817
	for <pgman@candle.pha.pa.us>; Mon, 1 Oct 2001 04:59:10 -0500 (CDT)
	(envelope-from pgsql-hackers-owner+M13750=candle.pha.pa.us=pgman@postgresql.org)
Received: from mrsgntmail01.mediaring.com.sg (mserver.mediaring.com.sg [203.208.141.175])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id f919rE320926
	for <pgsql-hackers@postgreSQL.org>; Mon, 1 Oct 2001 05:53:15 -0400 (EDT)
	(envelope-from jana-reddy@mediaring.com.sg)
Received: by MRSGNTMAIL01 with Internet Mail Service (5.5.2650.21)
	id <PMTCM7SJ>; Mon, 1 Oct 2001 18:03:34 +0800
Received: from mediaring.com.sg (10.1.0.131 [10.1.0.131]) by mrsgntmail01.mediaring.com.sg with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id PMTCM7SH; Mon, 1 Oct 2001 18:03:25 +0800
From: Janardhana Reddy <jana-reddy@mediaring.com.sg>
To: Bruce Momjian <pgman@candle.pha.pa.us>, Tom Lane <tgl@sss.pgh.pa.us>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>,
   janareddy
  <jana-reddy@mediaring.com.sg>
Message-ID: <3BB83DF0.8946973@mediaring.com.sg>
Date: Mon, 01 Oct 2001 17:57:04 +0800
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.4.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: [HACKERS] PERFORMANCE IMPROVEMENT by mapping  WAL FILES
References: <200109282137.f8SLbpm01890@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: ORr

     I have just  completed the functional testing  the WAL using mmap  , it is

 working  fine,  I  have tested  by commenting out the  "CreateCheckPoint "
functionality so that
   when i kill the postgres and restart it will redo all the records from the
WAL log file  which
  is updated  using mmap.
     Just i need  to  clean code and to do some stress testing.
 By the end of this week i should able to  complete  the stress test  and
generate the patch file .
    As Tom Lane mentioned  i see the  problem in portability  to all platforms,

      what i propose is to use mmap for only WAL  for some platforms like
  linux,freebsd etc . For  other platforms we can use the existing method by
slightly modifying the
 write()  routine to write only the modified part of the page.

Regards
jana

>
>
> OK, I have talked to Tom Lane about this on the phone and we have a few
> ideas.
>
> Historically, we have avoided mmap() because of portability problems,
> and because using mmap() to write to large tables could consume lots of
> address space with little benefit.  However, I perhaps can see WAL as
> being a good use of mmap.
>
> First, there is the issue of using mmap().  For OS's that have the
> mmap() MAP_SHARED flag, different backends could mmap the same file and
> each see the changes.  However, keep in mind we still have to fsync()
> WAL, so we need to use msync().
>
> So, looking at the benefits of using mmap(), we have overhead of
> different backends having to mmap something that now sits quite easily
> in shared memory.  Now, I can see mmap reducing the copy from user to
> kernel, but there are other ways to fix that.  We could modify the
> write() routines to write() 8k on first WAL page write and later write
> only the modified part of the page to the kernel buffers.  The old
> kernel buffer is probably still around so it is unlikely to require a
> read from the file system to read in the rest of the page.  This reduces
> the write from 8k to something probably less than 4k which is better
> than we can do with mmap.
>
> I will add a TODO item to this effect.
>
> As far as reducing the write to disk from 8k to 4k, if we have to
> fsync/msync, we have to wait for the disk to spin to the proper location
> and at that point writing 4k or 8k doesn't seem like much of a win.
>
> In summary, I think it would be nice to reduce the 8k transfer from user
> to kernel on secondary page writes to only the modified part of the
> page.  I am uncertain if mmap() or anything else will help the physical
> write to the disk.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org