diff options
-rw-r--r-- | doc/src/sgml/admin.sgml | 3 | ||||
-rw-r--r-- | doc/src/sgml/filelist.sgml | 3 | ||||
-rw-r--r-- | doc/src/sgml/maintenance.sgml | 371 | ||||
-rw-r--r-- | doc/src/sgml/syntax.sgml | 23 |
4 files changed, 393 insertions, 7 deletions
diff --git a/doc/src/sgml/admin.sgml b/doc/src/sgml/admin.sgml index b79146cb99f..32c0983b170 100644 --- a/doc/src/sgml/admin.sgml +++ b/doc/src/sgml/admin.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.34 2001/05/12 22:51:34 petere Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.35 2001/08/26 21:17:12 tgl Exp $ --> <book id="admin"> @@ -29,6 +29,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.34 2001/05/12 22:51:34 &charset; &manage-ag; &user-manag; + &maintenance; &backup; &wal; &storage; diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 45451ecabc5..4b613db8e5b 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -1,4 +1,4 @@ -<!-- $Header: /cvsroot/pgsql/doc/src/sgml/filelist.sgml,v 1.13 2001/06/02 18:25:16 petere Exp $ --> +<!-- $Header: /cvsroot/pgsql/doc/src/sgml/filelist.sgml,v 1.14 2001/08/26 21:17:12 tgl Exp $ --> <!entity history SYSTEM "history.sgml"> <!entity info SYSTEM "info.sgml"> @@ -43,6 +43,7 @@ <!entity client-auth SYSTEM "client-auth.sgml"> <!entity installation SYSTEM "installation.sgml"> <!entity installw SYSTEM "install-win32.sgml"> +<!entity maintenance SYSTEM "maintenance.sgml"> <!entity manage-ag SYSTEM "manage-ag.sgml"> <!entity recovery SYSTEM "recovery.sgml"> <!entity regress SYSTEM "regress.sgml"> diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml new file mode 100644 index 00000000000..7629c5bd541 --- /dev/null +++ b/doc/src/sgml/maintenance.sgml @@ -0,0 +1,371 @@ +<!-- +$Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.1 2001/08/26 21:17:12 tgl Exp $ +--> + +<chapter id="maintenance"> + <title>Routine Database Maintenance Tasks</title> + + <sect1 id="maintenance-general"> + <title>General Discussion</Title> + + <para> + There are a few routine maintenance chores that must be performed on + a regular basis to keep a <productname>PostgreSQL</productname> + installation running smoothly. The tasks discussed here are repetitive + in nature and can easily be automated using standard Unix tools such + as <filename>cron</filename> scripts. But it is the database + administrator's responsibility to set up appropriate scripts, and to + check that they execute successfully. + </para> + + <para> + One obvious maintenance task is creation of backup copies of the data on a + regular schedule. Without a recent backup, you have no chance of recovery + after a catastrophe (disk failure, fire, mistakenly dropping a critical + table, etc). The backup and recovery mechanisms available in + <productname>PostgreSQL</productname> are discussed at length in + <xref linkend="backup">. + </para> + + <para> + The other main category of maintenance task is periodic <quote>vacuuming</> + of the database. This activity is discussed in + <xref linkend="routine-vacuuming">. + </para> + +<!-- probably ought to have something about log rotation here; + any other important maintenance tasks? +--> + + <para> + <productname>PostgreSQL</productname> is low-maintenance compared to + some other database products. Nonetheless, appropriate attention to + these tasks will go far towards ensuring a pleasant and productive + experience with the system. + </para> + + </sect1> + + <sect1 id="routine-vacuuming"> + <title>Routine Vacuuming</title> + + <para> + <productname>PostgreSQL</productname>'s <command>VACUUM</> command must be + run on a regular basis for several reasons: + + <orderedlist> + <listitem> + <simpara>To recover disk space occupied by updated or deleted + rows.</simpara> + </listitem> + + <listitem> + <simpara>To update data statistics used by the + <productname>PostgreSQL</productname> query planner.</simpara> + </listitem> + + <listitem> + <simpara>To protect against loss of very old data due to + <firstterm>transaction ID wraparound</>.</simpara> + </listitem> + </orderedlist> + + The frequency and scope of <command>VACUUM</>s performed for each of + these reasons will vary depending on the needs of each installation. + Therefore, database administrators must understand these issues and + develop an appropriate maintenance strategy. This section concentrates + on explaining the high-level issues; for details about command syntax + and so on, see the <command>VACUUM</> command reference page. + </para> + + <para> + Beginning in <productname>PostgreSQL</productname> 7.2, the standard form + of <command>VACUUM</> can run in parallel with normal database operations + (selects, inserts, updates, deletes, but not changes to table schemas). + Routine vacuuming is therefore not nearly as intrusive as it was in prior + releases, and it's not as critical to try to schedule it at low-usage + times of day. + </para> + + <sect2 id="vacuum-for-space-recovery"> + <title>Recovering disk space</title> + + <para> + In normal <productname>PostgreSQL</productname> operation, an UPDATE or + DELETE of a row does not immediately remove the old <firstterm>tuple</> + (version of the row). This approach is necessary to gain the benefits + of multi-version concurrency control (see the User's Guide): the tuple + must not be deleted while + it is still potentially visible to other transactions. But eventually, + an outdated or deleted tuple is no longer of interest to any transaction. + The space it occupies must be reclaimed for reuse by new tuples, to avoid + infinite growth of disk space requirements. This is done by running + <command>VACUUM</>. + </para> + + <para> + Clearly, a table that receives frequent updates or deletes will need + to be vacuumed more often than tables that are seldom updated. It may + be useful to set up periodic cron tasks that vacuum only selected tables, + skipping tables that are known not to change often. This is only likely + to be helpful if you have both large heavily-updated tables and large + seldom-updated tables --- the extra cost of vacuuming a small table + isn't enough to be worth worrying about. + </para> + + <para> + The standard form of <command>VACUUM</> is best used with the goal of + maintaining a fairly level steady-state usage of disk space. The standard + form finds old tuples and makes their space available for re-use within + the table, but it does not try very hard to shorten the table file and + return disk space to the operating system. If you need to return disk + space to the operating system you can use <command>VACUUM FULL</> --- + but what's the point of releasing disk space that will only have to be + allocated again soon? Moderately frequent standard <command>VACUUM</>s + are a better approach than infrequent <command>VACUUM FULL</>s for + maintaining heavily-updated tables. + </para> + + <para> + Recommended practice for most sites is to schedule a database-wide + <command>VACUUM</> once a day at a low-usage time of day, supplemented + by more frequent vacuuming of heavily-updated tables if necessary. + (If you have multiple databases in an installation, don't forget to + vacuum each one; the <filename>vacuumdb</> script may be helpful.) + Use plain <command>VACUUM</>, not <command>VACUUM FULL</>, for routine + vacuuming for space recovery. + </para> + + <para> + <command>VACUUM FULL</> is recommended for cases where you know you have + deleted the majority of tuples in a table, so that the steady-state size + of the table can be shrunk substantially with <command>VACUUM FULL</>'s + more aggressive approach. + </para> + + <para> + If you have a table whose contents are deleted completely every so often, + consider doing it with <command>TRUNCATE</> rather than using + <command>DELETE</> followed by <command>VACUUM</>. + </para> + </sect2> + + <sect2 id="vacuum-for-statistics"> + <title>Updating planner statistics</title> + + <para> + The <productname>PostgreSQL</productname> query planner relies on + statistical information about the contents of tables in order to + generate good plans for queries. These statistics are gathered by + the <command>ANALYZE</> command, which can be invoked by itself or + as an optional step in <command>VACUUM</>. It is important to have + reasonably accurate statistics, otherwise poor choices of plans may + degrade database performance. + </para> + + <para> + As with vacuuming for space recovery, frequent updates of statistics + are more useful for heavily-updated tables than for seldom-updated ones. + But even for a heavily-updated table, there may be no need for + statistics updates if the statistical distribution of the data is not + changing much. A simple rule of thumb is to think about how much + the minimum and maximum values of the columns in the table change. + For example, a timestamp column that contains the time of row update + will have a constantly-increasing maximum value as rows are added and + updated; such a column will probably need more frequent statistics + updates than, say, a column containing URLs for pages accessed on a + website. The URL column may receive changes just as often, but the + statistical distribution of its values probably changes relatively + slowly. + </para> + + <para> + It is possible to run <command>ANALYZE</> on specific tables and even + just specific columns of a table, so the flexibility exists to update some + statistics more frequently than others if your application requires it. + In practice, however, the usefulness of this feature is doubtful. + Beginning in <productname>PostgreSQL</productname> 7.2, + <command>ANALYZE</> is a fairly fast operation even on large tables, + because it uses a statistical random sampling of the rows of a table + rather than reading every single row. So it's probably much simpler + to just run it over the whole database every so often. + </para> + + <tip> + <para> + Although per-column tweaking of <command>ANALYZE</> frequency may not be + very productive, you may well find it worthwhile to do per-column + adjustment of the level of detail of the statistics collected by + <command>ANALYZE</>. Columns that are heavily used in WHERE clauses + and have highly irregular data distributions may require a finer-grain + data histogram than other columns. See <command>ALTER TABLE SET + STATISTICS</>. + </para> + </tip> + + <para> + Recommended practice for most sites is to schedule a database-wide + <command>ANALYZE</> once a day at a low-usage time of day; this can + usefully be combined with a nightly <command>VACUUM</>. However, + sites with relatively slowly changing table statistics may find that + this is overkill, and that less-frequent <command>ANALYZE</> runs + are sufficient. + </para> + </sect2> + + <sect2 id="vacuum-for-wraparound"> + <title>Preventing transaction ID wraparound failures</title> + + <para> + <productname>PostgreSQL</productname>'s MVCC transaction semantics + depend on being able to compare transaction ID (<firstterm>XID</>) + numbers: a tuple with an insertion XID newer than the current + transaction's XID is <quote>in the future</> and should not be visible + to the current transaction. But since transaction IDs have limited size + (32 bits at this writing) an installation that runs for a long time (more + than 4 billion transactions) will suffer <firstterm>transaction ID + wraparound</>: the XID counter wraps around to zero, and all of a sudden + transactions that were in the past appear to be in the future --- which + means their outputs become invisible. In short, catatrophic data loss. + (Actually the data is still there, but that's cold comfort if you can't + get at it.) + </para> + + <para> + Prior to <productname>PostgreSQL</productname> 7.2, the only defense + against XID wraparound was to re-initdb at least every 4 billion + transactions. This of course was not very satisfactory for high-traffic + sites, so a better solution has been devised. The new approach allows an + installation to remain up indefinitely, without initdb or any sort of + restart. The price is this maintenance requirement: <emphasis>every table + in the database must be VACUUMed more often than once every billion + transactions</emphasis>. + </para> + + <para> + In practice this isn't an onerous requirement, but since the consequences + of failing to meet it can be complete data loss (not just wasted disk + space or slow performance), some special provisions have been made to help + database administrators keep track of the time since the last + <command>VACUUM</>. The remainder of this section gives the details. + </para> + + <para> + The new approach to XID comparison distinguishes two special XIDs, numbers + 1 and 2 (<literal>BootstrapXID</> and <literal>FrozenXID</>). These two + XIDs are always considered older than every normal XID. Normal XIDs (those + greater than 2) are compared using modulo-2^31 arithmetic. This means + that for every normal XID, there are two billion XIDs that are + <quote>older</> and two billion that are <quote>newer</>; another way to + say it is that the normal XID space is circular with no endpoint. + Therefore, once a tuple has been created with a particular normal XID, the + tuple will appear to be <quote>in the past</> for the next two billion + transactions, no matter which normal XID we are talking about. If the + tuple still exists after more than two billion transactions, it would + suddenly appear to be in the future. To prevent data loss, old tuples + must be reassigned the XID <literal>FrozenXID</> sometime before they reach + the two-billion-transactions-old mark. Once they are assigned this + special XID, they will appear to be <quote>in the past</> to all normal + transactions regardless of wraparound issues, and so such tuples will be + good until deleted, no matter how long that is. This reassignment of + XID is handled by <command>VACUUM</>. + </para> + + <para> + <command>VACUUM</>'s normal policy is to reassign <literal>FrozenXID</> + to any tuple with a normal XID more than one billion transactions in the + past. This policy preserves the original insertion XID until it is not + likely to be of interest anymore (in fact, most tuples will probably + live and die without ever being <quote>frozen</>). This policy means + that the maximum safe interval between <command>VACUUM</>s of any table + is exactly one billion transactions: if you wait longer, it's possible + that a tuple that was not quite old enough to be reassigned last time + is now more than two billion transactions old and has wrapped around + into the future --- ie, is lost to you. (Of course, it'll reappear + after another two billion transactions, but that's no help.) + </para> + + <para> + Since periodic <command>VACUUM</>s are needed anyway for the reasons + described earlier, it's unlikely that any table would not be vacuumed + for as long as a billion transactions. But to help administrators ensure + this constraint is met, <command>VACUUM</> stores transaction ID + statistics in the system table <filename>pg_database</>. In particular, + the <filename>datfrozenxid</> field of a database's + <filename>pg_database</> row is updated at the completion of any + database-wide vacuum operation (ie, <command>VACUUM</> that does not + name a specific table). The value stored in this field is the freeze + cutoff XID that was used by that <command>VACUUM</> command. All normal + XIDs older than this cutoff XID are guaranteed to have been replaced by + <literal>FrozenXID</> within that database. A convenient way to + examine this information is to execute the query + <informalexample> +<programlisting> +select datname, age(datfrozenxid) from pg_database; +</programlisting> + </informalexample> + The <literal>age</> column measures the number of transactions from the + cutoff XID to the current transaction's XID. + </para> + + <para> + With the standard freezing policy, the <literal>age</> column will start + at one billion for a freshly-vacuumed database. When the <literal>age</> + approaches two billion, the database must be vacuumed again to avoid + risk of wraparound failures. Recommended practice is to vacuum each + database at least once every half-a-billion (500 million) transactions, + so as to provide plenty of safety margin. To help meet this rule, + each database-wide <command>VACUUM</> automatically delivers a warning + if there are any <filename>pg_database</> entries showing an + <literal>age</> of more than 1.5 billion transactions, for example: + <informalexample> +<programlisting> +play=# vacuum; +NOTICE: Some databases have not been vacuumed in 1613770184 transactions. + Better vacuum them within 533713463 transactions, + or you may have a wraparound failure. +VACUUM +</programlisting> + </informalexample> + </para> + + <para> + <command>VACUUM</> with the <command>FREEZE</> option uses a more + aggressive freezing policy: tuples are frozen if they are old enough + to be considered good by all open transactions. In particular, if + a <command>VACUUM FREEZE</> is performed in an otherwise-idle database, + it is guaranteed that <emphasis>all</> tuples in that database will be + frozen. Hence, as long the database is not modified in any way, it + will not need subsequent vacuuming to avoid transaction ID wraparound + problems. This technique is used by <filename>initdb</> to prepare the + <filename>template0</> database. It should also be used to prepare any + user-created databases that are to be marked <literal>datallowconn</> = + <literal>false</> in <filename>pg_database</>, since there isn't any + convenient way to vacuum a database that you can't connect to. Note + that VACUUM's automatic warning message about unvacuumed databases will + ignore <filename>pg_database</> entries with <literal>datallowconn</> = + <literal>false</>, so as to avoid giving false warnings about these + databases; therefore it's up to you to ensure that such databases are + frozen correctly. + </para> + + </sect2> + </sect1> +</chapter> + +<!-- Keep this comment at the end of the file +Local variables: +mode:sgml +sgml-omittag:nil +sgml-shorttag:t +sgml-minimize-attributes:nil +sgml-always-quote-attributes:t +sgml-indent-step:1 +sgml-indent-data:t +sgml-parent-document:nil +sgml-default-dtd-file:"./reference.ced" +sgml-exposed-tags:nil +sgml-local-catalogs:("/usr/lib/sgml/catalog") +sgml-local-ecat-files:nil +End: +--> diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index fd340b0c940..ea65b3c2831 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.44 2001/08/10 18:57:32 tgl Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.45 2001/08/26 21:17:12 tgl Exp $ --> <chapter id="sql-syntax"> @@ -640,7 +640,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) <indexterm> <primary>OID</primary> </indexterm> - The unique identifier (object ID) of a row. This is a serial number + The object identifier (object ID) of a row. This is a serial number that is automatically added by Postgres to all table rows (unless the table was created WITHOUT OIDS, in which case this column is not present). @@ -688,7 +688,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) <para> The identity (transaction ID) of the deleting transaction, or zero for an undeleted tuple. It is possible for this field - to be nonzero in a visible tuple: that indicates that the + to be nonzero in a visible tuple: that usually indicates that the deleting transaction hasn't committed yet, or that an attempted deletion was rolled back. </para> @@ -712,7 +712,8 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) (block number, tuple index within block) that identifies the physical location of the tuple. Note that although the ctid can be used to locate the tuple very quickly, a row's ctid - will change each time it is updated or moved by VACUUM. + will change each time it is updated or moved by <command>VACUUM + FULL</>. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows. @@ -737,7 +738,19 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) </para> <para> - Transaction and command identifiers are 32-bit quantities. + Transaction identifiers are 32-bit quantities. In a long-lived + database it is possible for transaction IDs to wrap around. This + is not a fatal problem given appropriate maintenance procedures; + see the Administrator's Guide for details. However, it is unwise + to depend on uniqueness of transaction IDs over the long term + (more than one billion transactions). + </para> + + <para> + Command identifiers are also 32-bit quantities. This creates a hard + limit of 2^32 (4 billion) SQL commands within a single transaction. + In practice this limit is not a problem --- note that the limit is on + number of SQL queries, not number of tuples processed. </para> <para> |