Features needed before V1.0 release
===================================
Settings information
Validate against min/max - stub
Figure out how to deal with delimiters - stub
For memory, handle the same memory input formats as the database
Format output so it looks nicer
Warning about listen_addresses if it's not set to '*' (make this an input parameter?)
Show original value in output
Put a pgconftune header before any changed settings
Limit shared memory use on Windows
Platform specific details
Hint for <256MB case
Look for postgresql.conf file using PGDATA (see the sketch after this list)
Look for settings files based on path of executable
Output sysctl.conf suggestion
Test on Mac as additional OS
Test on Linux + Python 2.5; a ValueError on wintypes was reported via the mailing list
Model
Estimate shared memory use
Docs
Describe all the types clearly, refer to docs in usage notes
Suggest where the settings are still conservative
Data warehouse: more stats, turn off autovacuum
Does not account for application overhead, can reduce total mem if that's the case
Targets use 1/4 of RAM for shared_buffers, up to 1/2 for work_mem, and typically count an additional 1/2 of RAM as the OS buffer cache
May not correctly parse existing values if you have manually increased either BLCKSZ or XLOG_BLCKSZ; should
still set values correctly though.
Minimal systems should just use what comes out of initdb
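
A minimal sketch of the PGDATA lookup mentioned above, assuming the standard layout where postgresql.conf sits at the top of the data directory; the fallback to the executable's directory covers the next item on that list:

    import os
    import sys

    def find_postgresql_conf():
        """Locate postgresql.conf via PGDATA, falling back to the
        directory of the executable; both locations are guesses."""
        candidates = []
        data_dir = os.environ.get('PGDATA')
        if data_dir:
            candidates.append(os.path.join(data_dir, 'postgresql.conf'))
        candidates.append(os.path.join(
            os.path.dirname(os.path.abspath(sys.argv[0])), 'postgresql.conf'))
        for path in candidates:
            if os.path.isfile(path):
                return path
        return None
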
Outline of main desired features
================================
Config file class:
Interpret input units
Strip out items set to the default
Strip out comments (see the sketch after this list)
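
A rough sketch of the comment stripping, assuming a simple line-oriented pass; a real parser would also have to cope with '#' inside quoted values:

    def strip_comments(lines):
        """Drop comment-only and blank lines from postgresql.conf text."""
        kept = []
        for line in lines:
            stripped = line.strip()
            if not stripped or stripped.startswith('#'):
                continue
            # Trim trailing comments; naive about '#' inside quoted values
            kept.append(line.split('#', 1)[0].rstrip())
        return kept
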
V2.0 features
Wizard to ask questions
Real GUI
Brief versus verbose, removing comments
Collect comments that appear in the default config to allow stripping them out
Set max_prepared_transactions
Set FSM parameters
Add 8.3, 8.2, 8.1 compatibility
List of parameters not to mess with (collate, archive_command) may be needed for more advanced generation stuff
Allow overriding the OS information for sysctl output
Extend model to work properly on systems with smaller amounts of RAM aimed at a small number of users
Python systems information
==========================
The system parameter SC_INT_MAX may tell us what kind of platform we're on:
SC_INT_MAX 2147483647
SC_NPROCESSORS_CONF 4
Also available via the platform module--it looks at the size of a pointer
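
Both routes are easy to probe from Python; a hedged sketch, since os.sysconf names vary by platform:

    import os
    import struct

    def is_64_bit():
        """Guess platform bitness from the pointer size; 8 bytes means 64-bit."""
        return struct.calcsize('P') == 8

    def processor_count():
        """Try the sysconf route; SC_NPROCESSORS_CONF is not available everywhere."""
        try:
            return os.sysconf('SC_NPROCESSORS_CONF')
        except (ValueError, OSError, AttributeError):
            return None  # Windows lacks os.sysconf entirely
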
Notes about pg-generate-conf
============================
pg_generate_conf would take the following switches:
* -b , --basic — short conf file, listing only the 15-18 most commonly changed options
* -a , --advanced — conf file listing all 196+ options
* -t, --terse — conf file lists only category headings and actual settings, no comments
* -n, --normal — conf file has category and subcategory settings, with short, descriptive comments
* -v, --verbose — conf file lists full descriptions and recommendations in comments with each option
* -c "option = value" set specific option to specific value in the file
* -f "filename" — take options and values from file "filename"
The default would be "-b, -n", with specific settings for shared_buffers and wal_sync_method. The current postgresql.conf is a lot more like a "-a, -v" file.
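
A sketch of how those switches could map onto optparse (the standard in the Python 2.5 era); the long name for -c is a guess:

    from optparse import OptionParser

    def build_parser():
        parser = OptionParser(usage="usage: %prog [options]")
        parser.add_option('-b', '--basic', action='store_true',
                          help='short conf file with only common options')
        parser.add_option('-a', '--advanced', action='store_true',
                          help='conf file listing all options')
        parser.add_option('-t', '--terse', action='store_true',
                          help='category headings and actual settings only')
        parser.add_option('-n', '--normal', action='store_true',
                          help='short descriptive comments (default)')
        parser.add_option('-v', '--verbose', action='store_true',
                          help='full descriptions and recommendations')
        parser.add_option('-c', '--set', action='append', metavar='"option = value"',
                          help='set a specific option to a specific value')
        parser.add_option('-f', '--file', metavar='FILENAME',
                          help='take options and values from FILENAME')
        return parser
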
Initial Review
==============
The following section is notes from Josh Berkus after reviewing the first
pgtune prototype released:
Solicit the following information from the user:
-- Available RAM
-- Expected Database Size (to nearest 10x)
-- Type of Application
-- Web
-- Data Warehouse
-- Mixed
-- Desktop
-- Operating System [Linux/Windows/OSX/Solaris/FreeBSD/other]
From the above, you can derive all necessary calculations for the basics. In the advanced version, we'll also want to ask:
-- Memory used by other applications on the system?
-- Analyze queries for performance?
-- SSL?
-- Production vs. Development status?
-- How many connections?
-- Logging setup:
Syslog
Analyze Performance
Private log with weekly rotation
It's completely unnecessary to account for OS overhead separately; it can and should be folded into the
calculations for the other figures. For example, my 1/4 and 3/4 calculations ignore OS overhead. You only need to reduce
Available RAM when the server will be running something else, like a J2EE server or multiple databases.
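
As a minimal sketch, assuming the 3/4 figure maps onto effective_cache_size the way pgtune's model suggests:

    def basic_memory_settings(total_ram_kb):
        """Apply the 1/4 and 3/4 split described above; OS overhead is
        deliberately ignored because the fractions already leave room for it."""
        return {
            'shared_buffers': total_ram_kb // 4,
            'effective_cache_size': total_ram_kb * 3 // 4,
        }
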
The FSM stuff will not be necessary if targeting 8.4. Values:
web: max_fsm_pages = DBsize / PageSize / 8
oltp: max_fsm_pages = DBsize / PageSize / 8
DW: max_fsm_pages = DBsize / PageSize / 32
(unless LoadSize is known)
Mixed: max_fsm_pages = DBsize / PageSize / 8
Desktop: max_fsm_pages = DBsize / PageSize / 8
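
A worked example with assumed sizes: a 10 GB web database and the default 8 kB page size:

    db_size_kb = 10 * 1024 * 1024       # 10 GB expressed in kB
    pages = db_size_kb // 8             # 8 kB page size -> 1310720 pages
    max_fsm_pages = pages // 8          # web divisor -> 163840
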
The specific elements of a "DW" use-case aren't necessarily tied to size. They are:
* data arrives in large batches rather than as individual rows
* small numbers of users
* large complex queries
For example, right now, I'm refactoring a database which is only 15GB but shows definite DW behavior, so we want to keep
max_connections < 20 and turn autovacuum off.
So I think we should ask the user what kind of DB they have (*with* docs which explain what the types mean) and fall back to testing by size if the info is not supplied.
Regarding the level of default_statistics_target, it sounds like people agree that it ought to be raised for the DW use-case, but disagree about how much. If that's the case, what if we compromise at 50 for "mixed" and 100 for DW? That should allay people's fears, and we can tinker with it when we have more data.
The question is what value represents a good trade-off between the costs of having large stats targets -- longer analyze, more data stored in pg_statistics, more vacuuming of pg_statistics needed, longer plan times -- and the benefits of having larger stats targets -- fewer columns which need raised stats targets.
Controversial Settings
======================
-checkpoint_segments in the OLTP case should be larger
-Allocating work_mem: total RAM / connections seems a little scant in the web / OLTP case, but with the high connection counts it could make sense
-Going that large with the statistics target
-I don't need DB size if I'm not setting the FSM stuff; leave that out of the 8.4-targeted release
-Put a cap on maintenance_work_mem at 1GB
Parsing Input Units
===================
parse_int is the routine
kB, MB, and GB are the only accepted units
Some parameters are flagged "GUC_UNIT_MEMORY"; these are the ones this logic applies to
These are the ones whose unit name ends with kB
Raw integers are considered to be in kB unless they are BLCKSZ or XLOG_BLCKSZ based ones
#define KB_PER_MB (1024)
#define KB_PER_GB (1024*1024)
kB: Divided by the unit size in kB (typically 8) to get the internal value
MB: Multiplied by KB_PER_MB, then divided as above
GB: Multiplied by KB_PER_GB, then divided as above
There are also unit-of-time variables; we don't care about those right now
This is the logic that maps the block size stuff into the units field:
case GUC_UNIT_KB:
    values[2] = "kB";
    break;
case GUC_UNIT_BLOCKS:
    snprintf(buf, sizeof(buf), "%dkB", BLCKSZ / 1024);
    break;
case GUC_UNIT_XBLOCKS:
    snprintf(buf, sizeof(buf), "%dkB", XLOG_BLCKSZ / 1024);
    break;
So I don't have to worry about that; I can just use the unit size as kB
For booleans, on and off are the officially supported values, but many others
are accepted too
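
A sketch of that logic on the pgtune side; the function name is made up, and it normalizes everything to kB rather than to the server's internal unit:

    KB_PER_MB = 1024
    KB_PER_GB = 1024 * 1024

    def parse_memory_value(value, unit_kb=8):
        """Convert a postgresql.conf memory value to kB.

        Raw integers are taken to be in units of unit_kb (e.g. BLCKSZ/1024
        for shared_buffers); kB, MB, and GB suffixes are the only units
        the server accepts for memory parameters.
        """
        value = value.strip()
        if value.endswith('kB'):
            return int(value[:-2])
        if value.endswith('MB'):
            return int(value[:-2]) * KB_PER_MB
        if value.endswith('GB'):
            return int(value[:-2]) * KB_PER_GB
        return int(value) * unit_kb
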
Hint interface
==============
One option for displaying suggestions is a hint interface that goes to
standard error. Sample hints:
# HINT: Increase SHMMAX
# HINT: You won't be able to connect to this database over TCP/IP with your listen_addresses setting
# HINT: The value for max_connections is being reduced from $X to $Y
# HINT: Expected maximum memory use for this configuration: $X (pretty printed)
# HINT: Autovacuum is off
# HINT: If you aren't using partitions, you can improve query planning time by turning constraint_exclusion off
# HINT: Windows doesn't handle large numbers of connections
# HINT: Consider a connection pooler
It would be nice to make all the hints show up as comments in case they are mixed in with the settings output.
That would make the SHMMAX example much less useful, though.
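
A minimal sketch of the interface, assuming hints are plain strings and the comment prefix stays optional:

    import sys

    def hint(message, as_comment=True):
        """Emit a tuning hint to standard error, optionally as a conf comment."""
        prefix = '# HINT: ' if as_comment else 'HINT: '
        sys.stderr.write(prefix + message + '\n')

    hint('Increase SHMMAX')
    hint('Autovacuum is off')
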
Python review notes
===================
-from ctypes import * (line 18) makes the block difficult to read and
pollutes the namespace.
-The doc strings (lines 59, 136, 206) don't follow the standard conventions
described at http://www.python.org/dev/peps/pep-0257/.
-Functions also support doc strings (lines 342, 351, etc.)
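
For illustration, an explicit import plus a PEP 257 style doc string; the structure and names below are invented, not from the reviewed code:

    from ctypes import Structure, c_ulong

    class MemoryStatus(Structure):
        """Illustrative ctypes structure; the fields are placeholders."""
        _fields_ = [('length', c_ulong),
                    ('memory_load', c_ulong)]

    def describe(status):
        """Return a one-line summary of a MemoryStatus.

        PEP 257 layout: a one-line summary, a blank line, then any
        further detail, with the closing quotes on their own line.
        """
        return 'load=%d%%' % status.memory_load
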
Estimating shared memory usage
==============================
http://www.postgresql.org/docs/current/static/kernel-resources.html
Introduce a platform structure that gets passed around everywhere and
abstracts away the OS and its version, memory, PostgreSQL version, etc.
Introduce an "info" option that will show all the information in there,
including anything auto-detected
Include a stub for that to autodetect PG version (even if it doesn't
work just yet)
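
A sketch of what that structure could look like; every field name here is an assumption:

    import os
    import struct

    class Platform:
        """Bundle of system facts passed around instead of ad-hoc globals."""

        def __init__(self, total_memory_kb, os_name=None, pg_version=None):
            self.total_memory_kb = total_memory_kb
            self.os_name = os_name or os.name
            self.bits = struct.calcsize('P') * 8
            # Stub: real autodetection of the PostgreSQL version comes later
            self.pg_version = pg_version

        def info(self):
            """Dump everything, including auto-detected values."""
            for key, value in sorted(vars(self).items()):
                print('%s = %s' % (key, value))
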