-<!-- $PostgreSQL: pgsql/doc/src/sgml/ddl.sgml,v 1.37 2005/01/09 17:47:30 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/ddl.sgml,v 1.38 2005/01/17 01:29:02 tgl Exp $ -->
<chapter id="ddl">
<title>Data Definition</title>
</para>
</sect1>
- <sect1 id="ddl-system-columns">
- <title>System Columns</title>
-
- <para>
- Every table has several <firstterm>system columns</> that are
- implicitly defined by the system. Therefore, these names cannot be
- used as names of user-defined columns. (Note that these
- restrictions are separate from whether the name is a key word or
- not; quoting a name will not allow you to escape these
- restrictions.) You do not really need to be concerned about these
- columns, just know they exist.
- </para>
-
- <indexterm>
- <primary>column</primary>
- <secondary>system column</secondary>
- </indexterm>
-
- <variablelist>
- <varlistentry>
- <term><structfield>oid</></term>
- <listitem>
- <para>
- <indexterm>
- <primary>OID</primary>
- <secondary>column</secondary>
- </indexterm>
- The object identifier (object ID) of a row. This is a serial
- number that is automatically added by
- <productname>PostgreSQL</productname> to all table rows (unless
- the table was created using <literal>WITHOUT OIDS</literal>, in which
- case this column is not present). This column is of type
- <type>oid</type> (same name as the column); see <xref
- linkend="datatype-oid"> for more information about the type.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><structfield>tableoid</></term>
- <listitem>
- <indexterm>
- <primary>tableoid</primary>
- </indexterm>
-
- <para>
- The OID of the table containing this row. This column is
- particularly handy for queries that select from inheritance
- hierarchies, since without it, it's difficult to tell which
- individual table a row came from. The
- <structfield>tableoid</structfield> can be joined against the
- <structfield>oid</structfield> column of
- <structname>pg_class</structname> to obtain the table name.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><structfield>xmin</></term>
- <listitem>
- <indexterm>
- <primary>xmin</primary>
- </indexterm>
-
- <para>
- The identity (transaction ID) of the inserting transaction for
- this row version. (A row version is an individual state of a
- row; each update of a row creates a new row version for the same
- logical row.)
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><structfield>cmin</></term>
- <listitem>
- <indexterm>
- <primary>cmin</primary>
- </indexterm>
-
- <para>
- The command identifier (starting at zero) within the inserting
- transaction.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><structfield>xmax</></term>
- <listitem>
- <indexterm>
- <primary>xmax</primary>
- </indexterm>
-
- <para>
- The identity (transaction ID) of the deleting transaction, or
- zero for an undeleted row version. It is possible for this column to
- be nonzero in a visible row version. That usually indicates that the
- deleting transaction hasn't committed yet, or that an attempted
- deletion was rolled back.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><structfield>cmax</></term>
- <listitem>
- <indexterm>
- <primary>cmax</primary>
- </indexterm>
-
- <para>
- The command identifier within the deleting transaction, or zero.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><structfield>ctid</></term>
- <listitem>
- <indexterm>
- <primary>ctid</primary>
- </indexterm>
-
- <para>
- The physical location of the row version within its table. Note that
- although the <structfield>ctid</structfield> can be used to
- locate the row version very quickly, a row's
- <structfield>ctid</structfield> will change each time it is
- updated or moved by <command>VACUUM FULL</>. Therefore
- <structfield>ctid</structfield> is useless as a long-term row
- identifier. The OID, or even better a user-defined serial
- number, should be used to identify logical rows.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
-
- <para>
- OIDs are 32-bit quantities and are assigned from a single
- cluster-wide counter. In a large or long-lived database, it is
- possible for the counter to wrap around. Hence, it is bad
- practice to assume that OIDs are unique, unless you take steps to
- ensure that this is the case. If you need to identify the rows in
- a table, using a sequence generator is strongly recommended.
- However, OIDs can be used as well, provided that a few additional
- precautions are taken:
-
- <itemizedlist>
- <listitem>
- <para>
- A unique constraint should be created on the OID column of each
- table for which the OID will be used to identify rows.
- </para>
- </listitem>
- <listitem>
- <para>
- OIDs should never be assumed to be unique across tables; use
- the combination of <structfield>tableoid</> and row OID if you
- need a database-wide identifier.
- </para>
- </listitem>
- <listitem>
- <para>
- The tables in question should be created using <literal>WITH
- OIDS</literal> to ensure forward compatibility with future
- releases of <productname>PostgreSQL</productname>. It is
- planned that <literal>WITHOUT OIDS</> will become the default.
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- Transaction identifiers are also 32-bit quantities. In a
- long-lived database it is possible for transaction IDs to wrap
- around. This is not a fatal problem given appropriate maintenance
- procedures; see <xref linkend="maintenance"> for details. It is
- unwise, however, to depend on the uniqueness of transaction IDs
- over the long term (more than one billion transactions).
- </para>
-
- <para>
- Command
- identifiers are also 32-bit quantities. This creates a hard limit
- of 2<superscript>32</> (4 billion) <acronym>SQL</acronym> commands
- within a single transaction. In practice this limit is not a
- problem — note that the limit is on number of
- <acronym>SQL</acronym> commands, not number of rows processed.
- </para>
- </sect1>
-
<sect1 id="ddl-default">
<title>Default Values</title>
</para>
<para>
- The default value may be a scalar expression, which will be
+ The default value may be an expression, which will be
evaluated whenever the default value is inserted
(<emphasis>not</emphasis> when the table is created). A common example
is that a timestamp column may have a default of <literal>now()</>,
<para>
A check constraint is the most generic constraint type. It allows
- you to specify that the value in a certain column must satisfy an
- arbitrary expression. For instance, to require positive product
- prices, you could use:
+ you to specify that the value in a certain column must satisfy a
+ Boolean (truth-value) expression. For instance, to require positive
+ product prices, you could use:
<programlisting>
CREATE TABLE products (
product_no integer,
</programlisting>
So, to specify a named constraint, use the key word
<literal>CONSTRAINT</literal> followed by an identifier followed
- by the constraint definition.
+ by the constraint definition. (If you don't specify a constraint
+ name in this way, the system chooses a name for you.)
</para>
<para>
name text,
price numeric CHECK (price > 0),
discounted_price numeric CHECK (discounted_price > 0),
- CHECK (price > discounted_price)
+ <emphasis>CHECK (price > discounted_price)</emphasis>
);
</programlisting>
</para>
<para>
We say that the first two constraints are column constraints, whereas the
third one is a table constraint because it is written separately
- from the column definitions. Column constraints can also be
+ from any one column definition. Column constraints can also be
written as table constraints, while the reverse is not necessarily
- possible. The above example could also be written as
+ possible, since a column constraint is supposed to refer to only the
+ column it is attached to. (<productname>PostgreSQL</productname> doesn't
+ enforce that rule, but you should follow it if you want your table
+ definitions to work with other database systems.) The above example could
+ also be written as
<programlisting>
CREATE TABLE products (
product_no integer,
It's a matter of taste.
</para>
+ <para>
+ Names can be assigned to table constraints in just the same way as
+ for column constraints:
+<programlisting>
+CREATE TABLE products (
+ product_no integer,
+ name text,
+ price numeric,
+ CHECK (price > 0),
+ discounted_price numeric,
+ CHECK (discounted_price > 0),
+ <emphasis>CONSTRAINT valid_discount</> CHECK (price > discounted_price)
+);
+</programlisting>
+ </para>
+
<indexterm>
<primary>null value</primary>
<secondary sortas="check constraints">with check constraints</secondary>
<para>
It should be noted that a check constraint is satisfied if the
check expression evaluates to true or the null value. Since most
- expressions will evaluate to the null value if one operand is null,
+ expressions will evaluate to the null value if any operand is null,
they will not prevent null values in the constrained columns. To
ensure that a column does not contain null values, the not-null
constraint described in the next section can be used.
<para>
Of course, a column can have more than one constraint. Just write
- the constraints after one another:
+ the constraints one after another:
<programlisting>
CREATE TABLE products (
product_no integer NOT NULL,
The <literal>NOT NULL</literal> constraint has an inverse: the
<literal>NULL</literal> constraint. This does not mean that the
column must be null, which would surely be useless. Instead, this
- simply defines the default behavior that the column may be null.
+ simply selects the default behavior that the column may be null.
The <literal>NULL</literal> constraint is not defined in the SQL
standard and should not be used in portable applications. (It was
only added to <productname>PostgreSQL</productname> to be
<emphasis>UNIQUE (a, c)</emphasis>
);
</programlisting>
+ This specifies that the combination of values in the indicated columns
+ is unique across the whole table, though any one of the columns
+ need not be (and ordinarily isn't) unique.
</para>
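+ <para>
+ As an illustrative sketch, assume the table above is named
+ <literal>example</literal> and has integer columns <literal>a</literal>,
+ <literal>b</literal>, and <literal>c</literal> (the table name and the
+ inserted values are assumptions made for this illustration). Individual
+ values may then repeat as long as the combination does not:
+<programlisting>
+INSERT INTO example VALUES (1, 10, 1);
+INSERT INTO example VALUES (1, 20, 2);  -- accepted: a repeats, but (a, c) differs
+INSERT INTO example VALUES (1, 30, 1);  -- rejected: (a, c) = (1, 1) already exists
+</programlisting>
+ </para>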
<para>
- It is also possible to assign names to unique constraints:
+ You can assign your own name to a unique constraint, in the usual way:
<programlisting>
CREATE TABLE products (
product_no integer <emphasis>CONSTRAINT must_be_different</emphasis> UNIQUE,
<programlisting>
CREATE TABLE orders (
order_id integer PRIMARY KEY,
- product_no integer REFERENCES products,
+ product_no integer <emphasis>REFERENCES products</emphasis>,
quantity integer
);
</programlisting>
<emphasis>FOREIGN KEY (b, c) REFERENCES other_table (c1, c2)</emphasis>
);
</programlisting>
- Of course, the number and type of the constrained columns needs to
+ Of course, the number and type of the constrained columns need to
match the number and type of the referenced columns.
</para>
+ <para>
+ You can assign your own name to a foreign key constraint,
+ in the usual way.
+ </para>
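+
+ <para>
+ As a sketch, a named foreign key on the <literal>orders</literal> table
+ shown earlier might look like this (the constraint name
+ <literal>orders_product_fk</literal> is only an illustrative choice):
+<programlisting>
+CREATE TABLE orders (
+    order_id integer PRIMARY KEY,
+    product_no integer <emphasis>CONSTRAINT orders_product_fk</emphasis> REFERENCES products,
+    quantity integer
+);
+</programlisting>
+ </para>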
+
<para>
A table can contain more than one foreign key constraint. This is
used to implement many-to-many relationships between tables. Say
PRIMARY KEY (product_no, order_id)
);
</programlisting>
- Note also that the primary key overlaps with the foreign keys in
+ Notice that the primary key overlaps with the foreign keys in
the last table.
</para>
</sect2>
</sect1>
+ <sect1 id="ddl-system-columns">
+ <title>System Columns</title>
+
+ <para>
+ Every table has several <firstterm>system columns</> that are
+ implicitly defined by the system. Therefore, these names cannot be
+ used as names of user-defined columns. (Note that these
+ restrictions are separate from whether the name is a key word or
+ not; quoting a name will not allow you to escape these
+ restrictions.) You do not really need to be concerned about these
+ columns; just know that they exist.
+ </para>
+
+ <indexterm>
+ <primary>column</primary>
+ <secondary>system column</secondary>
+ </indexterm>
+
+ <variablelist>
+ <varlistentry>
+ <term><structfield>oid</></term>
+ <listitem>
+ <para>
+ <indexterm>
+ <primary>OID</primary>
+ <secondary>column</secondary>
+ </indexterm>
+ The object identifier (object ID) of a row. This is a serial
+ number that is automatically added by
+ <productname>PostgreSQL</productname> to all table rows (unless
+ the table was created using <literal>WITHOUT OIDS</literal>, in which
+ case this column is not present). This column is of type
+ <type>oid</type> (same name as the column); see <xref
+ linkend="datatype-oid"> for more information about the type.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><structfield>tableoid</></term>
+ <listitem>
+ <indexterm>
+ <primary>tableoid</primary>
+ </indexterm>
+
+ <para>
+ The OID of the table containing this row. This column is
+ particularly handy for queries that select from inheritance
+ hierarchies, since without it, it's difficult to tell which
+ individual table a row came from. The
+ <structfield>tableoid</structfield> can be joined against the
+ <structfield>oid</structfield> column of
+ <structname>pg_class</structname> to obtain the table name.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><structfield>xmin</></term>
+ <listitem>
+ <indexterm>
+ <primary>xmin</primary>
+ </indexterm>
+
+ <para>
+ The identity (transaction ID) of the inserting transaction for
+ this row version. (A row version is an individual state of a
+ row; each update of a row creates a new row version for the same
+ logical row.)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><structfield>cmin</></term>
+ <listitem>
+ <indexterm>
+ <primary>cmin</primary>
+ </indexterm>
+
+ <para>
+ The command identifier (starting at zero) within the inserting
+ transaction.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><structfield>xmax</></term>
+ <listitem>
+ <indexterm>
+ <primary>xmax</primary>
+ </indexterm>
+
+ <para>
+ The identity (transaction ID) of the deleting transaction, or
+ zero for an undeleted row version. It is possible for this column to
+ be nonzero in a visible row version. That usually indicates that the
+ deleting transaction hasn't committed yet, or that an attempted
+ deletion was rolled back.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><structfield>cmax</></term>
+ <listitem>
+ <indexterm>
+ <primary>cmax</primary>
+ </indexterm>
+
+ <para>
+ The command identifier within the deleting transaction, or zero.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><structfield>ctid</></term>
+ <listitem>
+ <indexterm>
+ <primary>ctid</primary>
+ </indexterm>
+
+ <para>
+ The physical location of the row version within its table. Note that
+ although the <structfield>ctid</structfield> can be used to
+ locate the row version very quickly, a row's
+ <structfield>ctid</structfield> will change each time it is
+ updated or moved by <command>VACUUM FULL</>. Therefore
+ <structfield>ctid</structfield> is useless as a long-term row
+ identifier. The OID, or even better a user-defined serial
+ number, should be used to identify logical rows.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ OIDs are 32-bit quantities and are assigned from a single
+ cluster-wide counter. In a large or long-lived database, it is
+ possible for the counter to wrap around. Hence, it is bad
+ practice to assume that OIDs are unique, unless you take steps to
+ ensure that this is the case. If you need to identify the rows in
+ a table, using a sequence generator is strongly recommended.
+ However, OIDs can be used as well, provided that a few additional
+ precautions are taken:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ A unique constraint should be created on the OID column of each
+ table for which the OID will be used to identify rows.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ OIDs should never be assumed to be unique across tables; use
+ the combination of <structfield>tableoid</> and row OID if you
+ need a database-wide identifier.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The tables in question should be created using <literal>WITH
+ OIDS</literal> to ensure forward compatibility with future
+ releases of <productname>PostgreSQL</productname>. It is
+ planned that <literal>WITHOUT OIDS</> will become the default.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ Transaction identifiers are also 32-bit quantities. In a
+ long-lived database it is possible for transaction IDs to wrap
+ around. This is not a fatal problem given appropriate maintenance
+ procedures; see <xref linkend="maintenance"> for details. It is
+ unwise, however, to depend on the uniqueness of transaction IDs
+ over the long term (more than one billion transactions).
+ </para>
+
+ <para>
+ Command
+ identifiers are also 32-bit quantities. This creates a hard limit
+ of 2<superscript>32</> (4 billion) <acronym>SQL</acronym> commands
+ within a single transaction. In practice this limit is not a
+ problem: note that the limit is on the number of
+ <acronym>SQL</acronym> commands, not the number of rows processed.
+ </para>
+ </sect1>
+
<sect1 id="ddl-inherit">
<title>Inheritance</title>
<para>
In some cases you may wish to know which table a particular row
originated from. There is a system column called
- <structfield>TABLEOID</structfield> in each table which can tell you the
+ <structfield>tableoid</structfield> in each table which can tell you the
originating table:
<programlisting>
<para>
When you create a table and you realize that you made a mistake, or
- the requirements of the application changed, then you can drop the
+ the requirements of the application change, then you can drop the
table and create it again. But this is not a convenient option if
the table is already filled with data, or if the table is
referenced by other database objects (for instance a foreign key
constraint). Therefore <productname>PostgreSQL</productname>
- provides a family of commands to make modifications on existing
- tables.
+ provides a family of commands to make modifications to existing
+ tables. Note that this is conceptually distinct from altering
+ the data contained in the table: here we are interested in altering
+ the definition, or structure, of the table.
</para>
<para>
</indexterm>
<para>
- To add a column, use this command:
+ To add a column, use a command like this:
<programlisting>
ALTER TABLE products ADD COLUMN description text;
</programlisting>
</indexterm>
<para>
- To remove a column, use this command:
+ To remove a column, use a command like this:
<programlisting>
ALTER TABLE products DROP COLUMN description;
</programlisting>
+ Whatever data was in the column disappears. Table constraints involving
+ the column are dropped, too. However, if the column is referenced by a
+ foreign key constraint of another table,
+ <productname>PostgreSQL</productname> will not silently drop that
+ constraint. You can authorize dropping everything that depends on
+ the column by adding <literal>CASCADE</>:
+<programlisting>
+ALTER TABLE products DROP COLUMN description CASCADE;
+</programlisting>
+ See <xref linkend="ddl-depend"> for a description of the general
+ mechanism behind this.
</para>
</sect2>
identifier.)
</para>
+ <para>
+ As with dropping a column, you need to add <literal>CASCADE</> if you
+ want to drop a constraint that something else depends on. An example
+ is that a foreign key constraint depends on a unique or primary key
+ constraint on the referenced column(s).
+ </para>
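+
+ <para>
+ For instance, assuming the primary key constraint of
+ <literal>products</literal> is named <literal>products_pkey</literal>
+ (the name is an assumption here; check the actual name with
+ <literal>\d products</literal> in <application>psql</application>),
+ a command along these lines would drop it together with any foreign key
+ constraints in other tables that depend on it:
+<programlisting>
+ALTER TABLE products DROP CONSTRAINT products_pkey CASCADE;
+</programlisting>
+ </para>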
+
<para>
This works the same for all constraint types except not-null
constraints. To drop a not null constraint use
<programlisting>
ALTER TABLE products ALTER COLUMN price DROP DEFAULT;
</programlisting>
- This is equivalent to setting the default to null.
+ This is effectively the same as setting the default to null.
As a consequence, it is not an error
to drop a default where one hadn't been defined, because the
default is implicitly the null value.
<synopsis>
<replaceable>schema</><literal>.</><replaceable>table</>
</synopsis>
+ This works anywhere a table name is expected, including the table
+ modification commands and the data access commands discussed in
+ the following chapters.
(For brevity we will speak of tables only, but the same ideas apply
to other kinds of named objects, such as types and functions.)
</para>
<synopsis>
<replaceable>database</><literal>.</><replaceable>schema</><literal>.</><replaceable>table</>
</synopsis>
- can be used too, but at present this is just for pro-forma compliance
- with the SQL standard. If you write a database name, it must be the
- same as the database you are connected to.
+ can be used too, but at present this is just for <foreignphrase>pro
+ forma</> compliance with the SQL standard. If you write a database name,
+ it must be the same as the database you are connected to.
</para>
<para>
...
);
</programlisting>
- This works anywhere a table name is expected, including the table
- modification commands and the data access commands discussed in
- the following chapters.
</para>
<indexterm>
</para>
<para>
- See also <xref linkend="functions-info"> for other ways to access
+ See also <xref linkend="functions-info"> for other ways to manipulate
the schema search path.
</para>
<listitem>
<para>
- Functions, operators, data types, domains
+ Functions and operators
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Data types and domains
</para>
</listitem>
<para>
According to the SQL standard, specifying either
<literal>RESTRICT</literal> or <literal>CASCADE</literal> is
- required. No database system actually implements it that way, but
+ required. No database system actually enforces that rule, but
whether the default behavior is <literal>RESTRICT</literal> or
<literal>CASCADE</literal> varies across systems.
</para>
from <productname>PostgreSQL</productname> versions prior to 7.3
are <emphasis>not</emphasis> maintained or created during the
upgrade process. All other dependency types will be properly
- created during an upgrade.
+ created during an upgrade from a pre-7.3 database.
</para>
</note>
</sect1>