summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorPeter Eisentraut2001-01-22 23:34:33 +0000
committerPeter Eisentraut2001-01-22 23:34:33 +0000
commit21a3857f1f4a9a8c82355f753dfacff752fdebf3 (patch)
tree608a448675e1a74e16e770f3aa99e26d120b63f5
parente9c936ff38da738d1fd68525eee9c7c1f0c558dc (diff)
Rip out table expression section from SQL syntax chapter and develop it
into new chapter on query (SELECT) syntax. In the end this should become a narrative and example-filled counterpart to the SELECT reference page.
-rw-r--r--doc/src/sgml/filelist.sgml4
-rw-r--r--doc/src/sgml/queries.sgml819
-rw-r--r--doc/src/sgml/syntax.sgml538
-rw-r--r--doc/src/sgml/user.sgml3
4 files changed, 832 insertions, 532 deletions
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 799f517144f..5d784d7dcc4 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -1,4 +1,4 @@
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/filelist.sgml,v 1.4 2001/01/06 11:58:56 petere Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/filelist.sgml,v 1.5 2001/01/22 23:34:32 petere Exp $ -->
<!entity about SYSTEM "about.sgml">
<!entity history SYSTEM "history.sgml">
@@ -31,7 +31,7 @@
<!entity plsql SYSTEM "plsql.sgml">
<!entity pltcl SYSTEM "pltcl.sgml">
<!entity psql SYSTEM "psql.sgml">
-<!entity query-ug SYSTEM "query-ug.sgml">
+<!entity queries SYSTEM "queries.sgml">
<!entity storage SYSTEM "storage.sgml">
<!entity syntax SYSTEM "syntax.sgml">
<!entity typeconv SYSTEM "typeconv.sgml">
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
new file mode 100644
index 00000000000..e9ede076411
--- /dev/null
+++ b/doc/src/sgml/queries.sgml
@@ -0,0 +1,819 @@
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.1 2001/01/22 23:34:33 petere Exp $ -->
+
+<chapter id="queries">
+ <title>Queries</title>
+
+ <para>
+ A <firstterm>query</firstterm> is the process of or the command to
+ retrieve data from a database. In SQL the <command>SELECT</command>
+ command is used to specify queries. The general syntax of the
+ <command>SELECT</command> command is
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
+</synopsis>
+ The following sections describe the details of the select list, the
+ table expression, and the sort specification. The simplest kind of
+ query has the form
+<programlisting>
+SELECT * FROM table1;
+</programlisting>
+ Assuming that there is a table called table1, this command would
+ retrieve all rows and all columns from table1. (The method of
+ retrieval depends on the client application. For example, the
+ <application>psql</application> program will display an ASCII-art
+ table on the screen, client libraries will offer functions to
+ retrieve individual rows and columns.) The select list
+ specification <literal>*</literal> means all columns that the table
+ expression happens to provide. A select list can also select a
+ subset of the available columns or even make calculations on the
+ columns before retrieving them; see <xref
+ linkend="queries-select-lists">. For example, if table1 has columns
+ named a, b, and c (and perhaps others) you can make the following
+ query:
+<programlisting>
+SELECT a, b + c FROM table1;
+</programlisting>
+ (assuming that b and c are of a numeric data type).
+ </para>
+
+ <para>
+ <literal>FROM table1</literal> is a particularly simple kind of
+ table expression. In general, table expressions can be complex
+ constructs of base tables, joins, and subqueries. But you can also
+ omit the table expression entirely and use the SELECT command as a
+ calculator:
+<programlisting>
+SELECT 3 * 4;
+</programlisting>
+ This is more useful if the expressions in the select list return
+ varying results. For example, you could call a function this way.
+<programlisting>
+SELECT random();
+</programlisting>
+ </para>
+
+ <sect1 id="queries-table-expressions">
+ <title>Table Expressions</title>
+
+ <para>
+ A <firstterm>table expression</firstterm> specifies a table. The
+ table expression contains a FROM clause that is optionally followed
+ by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions
+ simply refer to a table on disk, a so-called base table, but more
+ complex expressions can be used to modify or combine base tables in
+ various ways.
+ </para>
+
+ <para>
+ The WHERE, GROUP BY, and HAVING clauses in the table expression
+ specify a pipeline of successive transformations performed on the
+ table derived in the FROM clause. The final transformed table that
+ is derived provides the input rows used to derive output rows as
+ specified by the select list of derived column value expressions.
+ </para>
+
+ <sect2 id="queries-from">
+ <title>FROM clause</title>
+
+ <para>
+ The FROM clause derives a table from one or more other tables
+ given in a comma-separated table reference list.
+<synopsis>
+FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
+</synopsis>
+
+ A table reference may be a table name or a derived table such as a
+ subquery, a table join, or complex combinations of these. If more
+ than one table reference is listed in the FROM clause they are
+ CROSS JOINed (see below) to form the derived table that may then
+ be subject to transformations by the WHERE, GROUP BY, and HAVING
+ clauses and is finally the result of the overall table expression.
+ </para>
+
+ <para>
+ If a table reference is a simple table name and it is the
+ supertable in a table inheritance hierarchy, rows of the table
+ include rows from all of its subtable successors unless the
+ keyword ONLY precedes the table name.
+ </para>
+
+ <sect3 id="queries-join">
+ <title>Joined Tables</title>
+
+ <para>
+ A joined table is a table derived from two other (real or
+ derived) tables according to the rules of the particular join
+ type. INNER, OUTER, NATURAL, and CROSS JOIN are supported.
+ </para>
+
+ <variablelist>
+ <title>Join Types</title>
+
+ <varlistentry>
+ <term>CROSS JOIN</term>
+
+ <listitem>
+<synopsis>
+<replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
+</synopsis>
+
+ <para>
+ For each combination of rows from
+ <replaceable>T1</replaceable> and
+ <replaceable>T2</replaceable> the derived table will contain a
+ row consisting of all columns in <replaceable>T1</replaceable>
+ followed by all columns in <replaceable>T2</replaceable>. If
+ the tables have have N and M rows respectively, the joined
+ table will have N * M rows. A cross join is essentially an
+ <literal>INNER JOIN ON TRUE</literal>.
+ </para>
+
+ <tip>
+ <para>
+ <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
+ <replaceable>T2</replaceable></literal> is equivalent to
+ <literal>FROM <replaceable>T1</replaceable>,
+ <replaceable>T2</replaceable></literal>.
+ </para>
+ </tip>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Qualified JOINs</term>
+ <listitem>
+
+<synopsis>
+<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean expression</replaceable>
+<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
+</synopsis>
+
+ <para>
+ The words <token>INNER</token> and <token>OUTER</token> are
+ optional for all JOINs. <token>INNER</token> is the default;
+ <token>LEFT</token>, <token>RIGHT</token>, and
+ <token>FULL</token> are for OUTER JOINs only.
+ </para>
+
+ <para>
+ The <firstterm>join condition</firstterm> is specified in the
+ ON or USING clause. (The meaning of the join condition
+ depends on the particular join type; see below.) The ON
+ clause takes a Boolean value expression of the same kind as is
+ used in a WHERE clause. The USING clause takes a
+ comma-separated list of column names, which the joined tables
+ must have in common, and joins the tables on the equality of
+ those columns as a set, resulting in a joined table having one
+ column for each common column listed and all of the other
+ columns from both tables. Thus, <literal>USING (a, b,
+ c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
+ t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
+ if ON is used there will be two columns a, b, and c in the
+ result, whereas with USING there will be only one of each.
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term>INNER JOIN</term>
+
+ <listitem>
+ <para>
+ For each row R1 of T1, the joined table has a row for each
+ row in T2 that satisfies the join condition with R1.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>LEFT OUTER JOIN</term>
+
+ <listitem>
+ <para>
+ First, an INNER JOIN is performed. Then, for a row in T1
+ that does not satisfy the join condition with any row in
+ T2, a joined row is returned with NULL values in columns of
+ T2. Thus, the joined table unconditionally has a row for each
+ row in T1.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>RIGHT OUTER JOIN</term>
+
+ <listitem>
+ <para>
+ This is like a left join, only that the result table will
+ unconditionally have a row for each row in T2.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>FULL OUTER JOIN</term>
+
+ <listitem>
+ <para>
+ First, an INNER JOIN is performed. Then, for each row in
+ T1 that does not satisfy the join condition with any row in
+ T2, a joined row is returned with null values in columns of
+ T2. Also, for each row of T2 that does not satisfy the
+ join condition with any row in T1, a joined row with null
+ values in the columns of T1 is returned.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>NATURAL JOIN</term>
+
+ <listitem>
+<synopsis>
+<replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> JOIN <replaceable>T2</replaceable>
+</synopsis>
+ <para>
+ A natural join creates a joined table where every pair of matching
+ column names between the two tables are merged into one column. The
+ join specification is effectively a USING clause containing all the
+ common column names and is otherwise like a Qualified JOIN.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ Joins of all types can be chained together or nested where either
+ or both of <replaceable>T1</replaceable> and
+ <replaceable>T2</replaceable> may be JOINed tables. Parenthesis
+ can be used around JOIN clauses to control the join order which
+ are otherwise left to right.
+ </para>
+ </sect3>
+
+ <sect3 id="queries-subqueries">
+ <title>Subqueries</title>
+
+ <para>
+ Subqueries specifying a derived table must be enclosed in
+ parenthesis and <emphasis>must</emphasis> be named using an AS
+ clause. (See <xref linkend="queries-table-aliases">.)
+ </para>
+
+<programlisting>
+FROM (SELECT * FROM table1) AS alias_name
+</programlisting>
+
+ <para>
+ This example is equivalent to <literal>FROM table1 AS
+ alias_name</literal>. Many subqueries can be written as table
+ joins instead.
+ </para>
+ </sect3>
+
+ <sect3 id="queries-table-aliases">
+ <title>Table and Column Aliases</title>
+
+ <para>
+ A temporary name can be given to tables and complex table
+ references to be used for references to the derived table in
+ further processing. This is called a <firstterm>table
+ alias</firstterm>.
+<synopsis>
+FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
+</synopsis>
+ Here, <replaceable>alias</replaceable> can be any regular
+ identifier. The alias becomes the new name of the table
+ reference for the current query -- it is no longer possible to
+ refer to the table by the original name (if the table reference
+ was an ordinary base table). Thus
+<programlisting>
+SELECT * FROM my_table AS m WHERE my_table.a > 5;
+</programlisting>
+ is not valid SQL syntax. What will happen instead, as a
+ <productname>Postgres</productname> extension, is that an implicit
+ table reference is added to the FROM clause, so the query is
+ processed as if it was written as
+<programlisting>
+SELECT * FROM my_table AS m, my_table WHERE my_table.a > 5;
+</programlisting>
+ Table aliases are mainly for notational convenience, but it is
+ necessary to use them when joining a table to itself, e.g.,
+<programlisting>
+SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
+</programlisting>
+ Additionally, an alias is required if the table reference is a
+ subquery.
+ </para>
+
+ <para>
+ Parenthesis are used to resolve ambiguities. The following
+ statement will assign the alias <literal>b</literal> to the
+ result of the join, unlike the previous example:
+<programlisting>
+SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
+</programlisting>
+ </para>
+
+ <para>
+<synopsis>
+FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
+</synopsis>
+ This form is equivalent the previously treated one; the
+ <token>AS</token> key word is noise.
+ </para>
+
+ <para>
+<synopsis>
+FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
+</synopsis>
+ In addition to renaming the table as described above, the columns
+ of the table are also given temporary names. If less column
+ aliases are specified than the actual table has columns, the last
+ columns are not renamed. This syntax is especially useful for
+ self-joins or subqueries.
+ </para>
+ </sect3>
+
+ <sect3 id="queries-table-expression-examples">
+ <title>Examples</title>
+
+ <para>
+<programlisting>
+FROM T1 INNER JOIN T2 USING (C)
+FROM T1 LEFT OUTER JOIN T2 USING (C)
+FROM (T1 RIGHT OUTER JOIN T2 ON (T1C1=T2C1)) AS DT1
+FROM (T1 FULL OUTER JOIN T2 USING (C)) AS DT1 (DT1C1, DT1C2)
+
+FROM T1 NATURAL INNER JOIN T2
+FROM T1 NATURAL LEFT OUTER JOIN T2
+FROM T1 NATURAL RIGHT OUTER JOIN T2
+FROM T1 NATURAL FULL OUTER JOIN T2
+
+FROM (SELECT * FROM T1) DT1 CROSS JOIN T2, T3
+FROM (SELECT * FROM T1) DT1, T2, T3
+</programlisting>
+
+ Above are some examples of joined tables and complex derived
+ tables. Notice how the AS clause renames or names a derived
+ table and how the optional comma-separated list of column names
+ that follows gives names or renames the columns. The last two
+ FROM clauses produce the same derived table from T1, T2, and T3.
+ The AS keyword was omitted in naming the subquery as DT1. The
+ keywords OUTER and INNER are noise that can be omitted also.
+ </para>
+ </sect3>
+
+ </sect2>
+
+ <sect2 id="queries-where">
+ <title>WHERE clause</title>
+
+ <para>
+ The syntax of the WHERE clause is
+<synopsis>
+WHERE <replaceable>search condition</replaceable>
+</synopsis>
+ where <replaceable>search condition</replaceable> is any value
+ expression as defined in <xref linkend="sql-expressions"> that
+ returns a value of type <type>boolean</type>.
+ </para>
+
+ <para>
+ After the processing of the FROM clause is done, each row of the
+ derived table is checked against the search condition. If the
+ result of the condition is true, the row is kept in the output
+ table, otherwise (that is, if the result is false or NULL) it is
+ discarded. The search condition typically references at least some
+ column in the table generated in the FROM clause; this is not
+ required, but otherwise the WHERE clause will be fairly useless.
+ </para>
+
+ <note>
+ <para>
+ Before the implementation of the JOIN syntax, it was necessary to
+ put the join condition of an inner join in the WHERE clause. For
+ example, these table expressions are equivalent:
+<programlisting>
+FROM a, b WHERE a.id = b.id AND b.val &gt; 5
+</programlisting>
+ and
+<programlisting>
+FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val &gt; 5
+</programlisting>
+ or perhaps even
+<programlisting>
+FROM a NATURAL JOIN b WHERE b.val &gt; 5
+</programlisting>
+ Which one of these you use is mainly a matter of style. The JOIN
+ syntax in the FROM clause is probably not as portable to other
+ products. For outer joins there is no choice in any case: they
+ must be done in the FROM clause.
+ </para>
+ </note>
+
+<programlisting>
+FROM FDT WHERE
+ C1 > 5
+
+FROM FDT WHERE
+ C1 IN (1, 2, 3)
+FROM FDT WHERE
+ C1 IN (SELECT C1 FROM T2)
+FROM FDT WHERE
+ C1 IN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10)
+
+FROM FDT WHERE
+ C1 BETWEEN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) AND 100
+
+FROM FDT WHERE
+ EXISTS (SELECT C1 FROM T2 WHERE C2 > FDT.C1)
+</programlisting>
+
+ <para>
+ In the examples above, FDT is the table derived in the FROM
+ clause. Rows that do not meet the search condition of the where
+ clause are eliminated from FDT. Notice the use of scalar
+ subqueries as value expressions (C2 assumed UNIQUE). Just like
+ any other query, the subqueries can employ complex table
+ expressions. Notice how FDT is referenced in the subqueries.
+ Qualifying C1 as FDT.C1 is only necessary if C1 is the name of a
+ column in the derived input table of the subquery. Qualifying the
+ column name adds clarity even when it is not needed. The column
+ naming scope of an outer query extends into its inner queries.
+ </para>
+ </sect2>
+
+
+ <sect2 id="queries-group">
+ <title>GROUP BY and HAVING clauses</title>
+
+ <para>
+ After passing the WHERE filter, the derived input table may be
+ subject to grouping, using the GROUP BY clause, and elimination of
+ group rows using the HAVING clause.
+ </para>
+
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
+</synopsis>
+
+ <para>
+ The GROUP BY clause is used to group together rows in a table that
+ share the same values in all the columns listed. The order in
+ which the columns are listed does not matter (as opposed to an
+ ORDER BY clause). The purpose is to reduce each group of rows
+ sharing common values into one group row that is representative of
+ all rows in the group. This is done to eliminate redundancy in
+ the output and/or obtain aggregates that apply to these groups.
+ </para>
+
+ <para>
+ Once a table is grouped, columns that are not included in the
+ grouping cannot be referenced, except in aggregate expressions,
+ since a specific value in those columns is ambiguous - which row
+ in the group should it come from? The grouped-by columns can be
+ referenced in select list column expressions since they have a
+ known constant value per group. Aggregate functions on the
+ ungrouped columns provide values that span the rows of a group,
+ not of the whole table. For instance, a
+ <function>sum(sales)</function> on a grouped table by product code
+ gives the total sales for each product, not the total sales on all
+ products. The aggregates of the ungrouped columns are
+ representative of the group, whereas their individual values may
+ not be.
+ </para>
+
+ <para>
+ Example:
+<programlisting>
+SELECT pid, p.name, (sum(s.units) * p.price) AS sales
+ FROM products p LEFT JOIN sales s USING ( pid )
+ GROUP BY pid, p.name, p.price;
+</programlisting>
+ In this example, the columns pid, p.name, and p.price must be in
+ the GROUP BY clause since they are referenced in the query select
+ list. The column s.units does not have to be in the GROUP BY list
+ since it is only used in an aggregate expression
+ (<function>sum()</function>), which represents the group of sales
+ of a product. For each product, a summary row is returned about
+ all sales of the product.
+ </para>
+
+ <para>
+ In strict SQL, GROUP BY can only group by columns of the source
+ table but Postgres extends this to also allow GROUP BY to group by
+ select columns in the query select list. Grouping by value
+ expressions instead of simple column names is also allowed.
+ </para>
+
+ <para>
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
+</synopsis>
+ If a table has been grouped using a GROUP BY clause, but then only
+ certain groups are of interest, the HAVING clause can be used,
+ much like a WHERE clause, to eliminate groups from a grouped
+ table. For some queries, Postgres allows a HAVING clause to be
+ used without a GROUP BY and then it acts just like another WHERE
+ clause, but the point in using HAVING that way is not clear. Since
+ HAVING operates on groups, only grouped columns can be listed in
+ the HAVING clause. If selection based on some ungrouped column is
+ desired, it should be expressed in the WHERE clause.
+ </para>
+
+ <para>
+ Example:
+<programlisting>
+SELECT pid AS "Products",
+ p.name AS "Over 5000",
+ (sum(s.units) * (p.price - p.cost)) AS "Past Month Profit"
+ FROM products p LEFT JOIN sales s USING ( pid )
+ WHERE p.date > CURRENT_DATE - INTERVAL '4 weeks'
+ GROUP BY pid, p.name, p.price, p.cost
+ HAVING p.price > 5000;
+</programlisting>
+ In the example above, the WHERE clause is selecting rows by a
+ column that is not grouped, while the HAVING clause
+ is selecting groups with a price greater than 5000.
+ </para>
+ </sect2>
+ </sect1>
+
+
+ <sect1 id="queries-select-lists">
+ <title>Select Lists</title>
+
+ <para>
+ The table expression in the <command>SELECT</command> command
+ constructs an intermediate virtual table by possibly combining
+ tables, views, eliminating rows, grouping, etc. This table is
+ finally passed on to processing by the select list. The select
+ list determines which <emphasis>columns</emphasis> of the
+ intermediate table are retained. The simplest kind of select list
+ is <literal>*</literal> which retains all columns that the table
+ expression produces. Otherwise, a select list is a comma-separated
+ list of value expressions (as defined in <xref
+ linkend="sql-expressions">). For instance, it could be a list of
+ column names:
+<programlisting>
+SELECT a, b, c FROM ...
+</programlisting>
+ The columns names a, b, and c are either the actual names of the
+ columns of table referenced in the FROM clause, or the aliases
+ given to them as explained in <xref linkend="queries-table-aliases">.
+ The name space available in the select list is the same as in the
+ WHERE clause (unless grouping is used, in which case it is the same
+ as in the HAVING clause). If more than one table has a column of
+ the same name, the table name must also be given, as in
+<programlisting>
+SELECT tbl1.a, tbl2.b, tbl1.c FROM ...
+</programlisting>
+ (see also <xref linkend="queries-where">).
+ </para>
+
+ <para>
+ If an arbitrary value expression is used in the select list, it
+ conceptually adds a new virtual column to the returned table. The
+ value expression is effectively evaluated once for each retrieved
+ row with real values substituted for any column references. But
+ the expressions in the select list do not have to reference any
+ columns in the table expression of the FROM clause; they can be
+ constant arithmetic expressions as well, for instance.
+ </para>
+
+ <sect2 id="queries-column-labels">
+ <title>Column Labels</title>
+
+ <para>
+ The entries in the select list can be assigned names for further
+ processing. The <quote>further processing</quote> in this case is
+ an optional sort specification and the client application (e.g.,
+ column headers for display). For example:
+<programlisting>
+SELECT a AS value, b + c AS sum FROM ...
+</programlisting>
+ The AS key word can in fact be omitted.
+ </para>
+
+ <para>
+ If no name is chosen, the system assigns a default. For simple
+ column references, this is the name of the column. For function
+ calls, this is the name of the function. For complex expressions,
+ the system will generate a generic name.
+ </para>
+
+ <note>
+ <para>
+ The naming of output columns here is different from that done in
+ the FROM clause (see <xref linkend="queries-table-aliases">). This
+ pipeline will in fact allow you to rename the same column twice,
+ but the name chosen in the select list is the one that will be
+ passed on.
+ </para>
+ </note>
+ </sect2>
+
+ <sect2 id="queries-distinct">
+ <title>DISTINCT</title>
+
+ <para>
+ After the select list has been processed, the result table may
+ optionally be subject to the elimination of duplicates. The
+ <token>DISTINCT</token> key word is written directly after the
+ <token>SELECT</token> to enable this:
+<synopsis>
+SELECT DISTINCT <replaceable>select_list</replaceable> ...
+</synopsis>
+ (Instead of <token>DISTINCT</token> the word <token>ALL</token>
+ can be used to select the default behavior of retaining all rows.)
+ </para>
+
+ <para>
+ Obviously, two rows are considered distinct if they differ in at
+ least one column value. NULLs are considered equal in this
+ consideration.
+ </para>
+
+ <para>
+ Alternatively, an arbitrary expression can determine what rows are
+ to be considered distinct:
+<synopsis>
+SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
+</synopsis>
+ Here <replaceable>expression</replaceable> is an arbitrary value
+ expression that is evaluated for all rows. A set of rows for
+ which all the expressions is equal are considered duplicates and
+ only the first row is kept in the output. Note that the
+ <quote>first row</quote> of a set is unpredictable unless the
+ query is sorted.
+ </para>
+
+ <para>
+ The DISTINCT ON clause is not part of the SQL standard and is
+ sometimes considered bad style because of the indeterminate nature
+ of its results. With judicious use of GROUP BY and subselects in
+ FROM the construct can be avoided, but it is very often the much
+ more convenient alternative.
+ </para>
+ </sect2>
+ </sect1>
+
+ <sect1 id="queries-union">
+ <title>Combining Queries</title>
+
+ <para>
+ The results of two queries can be combined using the set operations
+ union, intersection, and difference. The syntax is
+<synopsis>
+<replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
+<replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
+<replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
+</synopsis>
+ <replaceable>query1</replaceable> and
+ <replaceable>query2</replaceable> are queries that can use any of
+ the features discussed up to this point. Set operations can also
+ be nested and chained, for example
+<synopsis>
+<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
+</synopsis>
+ which really says
+<synopsis>
+(<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
+</synopsis>
+ </para>
+
+ <para>
+ <command>UNION</command> effectively appends the result of
+ <replaceable>query2</replaceable> to the result of
+ <replaceable>query1</replaceable> (although there is no guarantee
+ that this is the order in which the rows are actually returned) and
+ eliminates all duplicate rows, in the sense of DISTINCT, unless ALL
+ is specified.
+ </para>
+
+ <para>
+ <command>INTERSECT</command> returns all rows that are both in the
+ result of <replaceable>query1</replaceable> and in the result of
+ <replaceable>query2</replaceable>. Duplicate rows are eliminated
+ unless ALL is specified.
+ </para>
+
+ <para>
+ <command>EXCEPT</command> returns all rows that are in the result
+ of <replaceable>query1</replaceable> but not in the result of
+ <replaceable>query2</replaceable>. Again, duplicates are
+ eliminated unless ALL is specified.
+ </para>
+
+ <para>
+ In order to calculate the union, intersection, or difference of two
+ queries, the two queries must be <quote>union compatible</quote>,
+ which means that they both return the same number of columns, and
+ that the corresponding columns have compatible data types, as
+ described in <xref linkend="typeconv-union-case">.
+ </para>
+ </sect1>
+
+
+ <sect1 id="queries-order">
+ <title>Sorting Rows</title>
+
+ <para>
+ After a query has produced an output table (after the select list
+ has been processed) it can optionally be sorted. If sorting is not
+ chosen, the rows will be returned in random order. The actual
+ order in that case will depend on the scan and join plan types and
+ the order on disk, but it must not be relied on. A particular
+ ordering can only be guaranteed if the sort step is explicitly
+ chosen.
+ </para>
+
+ <para>
+ The ORDER BY clause specifies the sort order:
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> ORDER BY <replaceable>column1</replaceable> <optional>ASC | DESC</optional> <optional>, <replaceable>column2</replaceable> <optional>ASC | DESC</optional> ...</optional>
+</synopsis>
+ <replaceable>column1</replaceable>, etc., refer to select list
+ columns: It can either be the name of a column (either the
+ explicit column label or default name, as explained in <xref
+ linkend="queries-column-labels">) or the number of a column. Some
+ examples:
+<programlisting>
+SELECT a, b FROM table1 ORDER BY a;
+SELECT a + b AS sum, c FROM table1 ORDER BY sum;
+SELECT a, sum(b) FROM table1 GROUP BY a ORDER BY 1;
+</programlisting>
+ </para>
+
+ <para>
+ As an extension to the SQL standard, Postgres also allows ordering
+ by arbitrary expressions:
+<programlisting>
+SELECT a, b FROM table1 ORDER BY a + b;
+</programlisting>
+ References to column names in the FROM clause that are renamed in
+ the select list are also allowed:
+<programlisting>
+SELECT a AS b FROM table1 ORDER BY a;
+</programlisting>
+ But this does not work in queries involving UNION, INTERSECT, or
+ EXCEPT, and is not portable.
+ </para>
+
+ <para>
+ Each column specification may be followed by an optional ASC or
+ DESC to set the sort direction. ASC is default. Ascending order
+ puts smaller values first, where <quote>smaller</quote> is defined
+ in terms of the <literal>&lt;</literal> operator. Similarly,
+ descending order is determined with the <literal>&gt;</literal>
+ operator.
+ </para>
+
+ <para>
+ If more than one sort column is specified the later entries are
+ used to sort the rows that are equal under the order imposed by the
+ earlier sort specifications.
+ </para>
+ </sect1>
+
+ <sect1 id="queries-limit">
+ <title>LIMIT and OFFSET</title>
+
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional>ORDER BY <replaceable>sort_spec</replaceable></optional> <optional>LIMIT { <replaceable>number</replaceable> | ALL }</optional> <optional>OFFSET <replaceable>number</replaceable></optional>
+</synopsis>
+
+ <para>
+ LIMIT allows you to retrieve just a portion of the rows that are
+ generated by the rest of the query. If a limit count is given, no
+ more than that many rows will be returned. If an offset is given,
+ that many rows will be skipped before starting to return rows.
+ </para>
+
+ <para>
+ When using LIMIT, it is a good idea to use an ORDER BY clause that
+ constrains the result rows into a unique order. Otherwise you will
+ get an unpredictable subset of the query's rows---you may be asking
+ for the tenth through twentieth rows, but tenth through twentieth
+ in what ordering? The ordering is unknown, unless you specified
+ ORDER BY.
+ </para>
+
+ <para>
+ The query optimizer takes LIMIT into account when generating a
+ query plan, so you are very likely to get different plans (yielding
+ different row orders) depending on what you give for LIMIT and
+ OFFSET. Thus, using different LIMIT/OFFSET values to select
+ different subsets of a query result <emphasis>will give
+ inconsistent results</emphasis> unless you enforce a predictable
+ result ordering with ORDER BY. This is not a bug; it is an
+ inherent consequence of the fact that SQL does not promise to
+ deliver the results of a query in any particular order unless ORDER
+ BY is used to constrain the order.
+ </para>
+ </sect1>
+
+</chapter>
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index f994cbe0a07..843f2d08f5e 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1,5 +1,5 @@
<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.36 2001/01/21 22:02:01 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.37 2001/01/22 23:34:33 petere Exp $
-->
<chapter id="sql-syntax">
@@ -743,7 +743,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
</para>
<para>
- In addition to this list, there are a number of contructs that can
+ In addition to this list, there are a number of constructs that can
be classified as an expression but do not follow any general syntax
rules. These generally have the semantics of a function or
operator and are explained in the appropriate location in <xref
@@ -763,15 +763,15 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
<para>
A column can be referenced in the form:
<synopsis>
-<replaceable>corelation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
+<replaceable>correlation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
</synopsis>
- <replaceable>corelation</replaceable> is either the name of a
+ <replaceable>correlation</replaceable> is either the name of a
table, an alias for a table defined by means of a FROM clause, or
the keyword <literal>NEW</literal> or <literal>OLD</literal>.
(NEW and OLD can only appear in the action portion of a rule,
- while other corelation names can be used in any SQL statement.)
- The corelation name can be omitted if the column name is unique
+ while other correlation names can be used in any SQL statement.)
+ The correlation name can be omitted if the column name is unique
across all the tables being used in the current query. If
<replaceable>column</replaceable> is of an array type, then the
optional <replaceable>subscript</replaceable> selects a specific
@@ -895,8 +895,8 @@ sqrt(2)
The precedence and associativity of the operators is hard-wired
into the parser. Most operators have the same precedence and are
left-associative. This may lead to non-intuitive behavior; for
- example the boolean operators "&lt;" and "&gt;" have a different
- precedence than the boolean operators "&lt;=" and "&gt;=". Also,
+ example the Boolean operators "&lt;" and "&gt;" have a different
+ precedence than the Boolean operators "&lt;=" and "&gt;=". Also,
you will sometimes need to add parentheses when using combinations
of binary and unary operators. For instance
<programlisting>
@@ -917,7 +917,7 @@ SELECT (5 &amp;) ~ 6;
<tgroup cols="2">
<thead>
<row>
- <entry>OperatorElement</entry>
+ <entry>Operator/Element</entry>
<entry>Associativity</entry>
<entry>Description</entry>
</row>
@@ -1057,526 +1057,6 @@ SELECT (5 &amp;) ~ 6;
</para>
</sect1>
-
- <sect1 id="sql-table-expressions">
- <title>Table Expressions</title>
-
- <para>
- A <firstterm>table expression</firstterm> specifies a table. The
- table expression contains a FROM clause that is optionally followed
- by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions
- simply refer to a table on disk, a so-called base table, but more
- complex expressions can be used to modify or combine base tables in
- various ways.
- </para>
-
- <para>
- The general syntax of the <command>SELECT</command> command is
-<synopsis>
-SELECT <replaceable>select_list</replaceable> <replaceable>table_expression</replaceable>
-</synopsis>
-
- The <replaceable>select_list</replaceable> is a comma separated
- list of <replaceable>value expressions</replaceable> as defined in
- <xref linkend="sql-expressions"> that specify the derived columns
- of the query output table. Column names in the derived table that
- is the result of the <replaceable>table_expression</replaceable>
- can be used in the <replaceable>value expression</replaceable>s of
- the <replaceable>select_list</replaceable>.
- </para>
-
- <para>
- The WHERE, GROUP BY, and HAVING clauses in the table expression
- specify a pipeline of successive transformations performed on the
- table derived in the FROM clause. The final transformed table that
- is derived provides the input rows used to derive output rows as
- specified by the select list of derived column value expressions.
- </para>
-
- <sect2>
- <title>FROM clause</title>
-
- <para>
- The FROM clause derives a table from one or more other tables
- given in a comma-separated table reference list.
-<synopsis>
-FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
-</synopsis>
-
- A table reference may be a table name or a derived table such as a
- subquery, a table join, or complex combinations of these. If more
- than one table reference is listed in the FROM clause they are
- CROSS JOINed (see below) to form the derived table that may then
- be subject to transformations by the WHERE, GROUP BY, and HAVING
- clauses and is finally the result of the overall table expression.
- </para>
-
- <para>
- If a table reference is a simple table name and it is the
- supertable in a table inheritance hierarchy, rows of the table
- include rows from all of its subtable successors unless the
- keyword ONLY precedes the table name.
- </para>
-
- <sect3>
- <title>Joined Tables</title>
-
- <para>
- A joined table is a table derived from two other (real or
- derived) tables according to the rules of the particular join
- type. INNER, OUTER, NATURAL, and CROSS JOIN are supported.
- </para>
-
- <variablelist>
- <title>Join Types</title>
-
- <varlistentry>
- <term>CROSS JOIN</term>
-
- <listitem>
-<synopsis>
-<replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
-</synopsis>
-
- <para>
- For each combination of rows from
- <replaceable>T1</replaceable> and
- <replaceable>T2</replaceable> the derived table will contain a
- row consisting of all columns in <replaceable>T1</replaceable>
- followed by all columns in <replaceable>T2</replaceable>. If
- the tables have have N and M rows respectively, the joined
- table will have N * M rows. A cross join is essentially an
- <literal>INNER JOIN ON TRUE</literal>.
- </para>
-
- <tip>
- <para>
- <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
- <replaceable>T2</replaceable></literal> is equivalent to
- <literal>FROM <replaceable>T1</replaceable>,
- <replaceable>T2</replaceable></literal>.
- </para>
- </tip>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Qualified JOINs</term>
- <listitem>
-
-<synopsis>
-<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean expression</replaceable>
-<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
-</synopsis>
-
- <para>
- The words <token>INNER</token> and <token>OUTER</token> are
- optional for all JOINs. <token>INNER</token> is the default;
- <token>LEFT</token>, <token>RIGHT</token>, and
- <token>FULL</token> are for OUTER JOINs only.
- </para>
-
- <para>
- The <firstterm>join condition</firstterm> is specified in the
- ON or USING clause. (The meaning of the join condition
- depends on the particular join type; see below.) The ON
- clause takes a boolean value expression of the same kind as is
- used in a WHERE clause. The USING clause takes a
- comma-separated list of column names, which the joined tables
- must have in common, and joins the tables on the equality of
- those columns as a set, resulting in a joined table having one
- column for each common column listed and all of the other
- columns from both tables. Thus, <literal>USING (a, b,
- c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
- t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
- if ON is used there will be two columns a, b, and c in the
- result, whereas with USING there will be only one of each.
- </para>
-
- <variablelist>
- <varlistentry>
- <term>INNER JOIN</term>
-
- <listitem>
- <para>
- For each row R1 of T1, the joined table has a row for each
- row in T2 that satisfies the join condition with R1.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>LEFT OUTER JOIN</term>
-
- <listitem>
- <para>
- First, an INNER JOIN is performed. Then, for a row in T1
- that does not satisfy the join condition with any row in
- T2, a joined row is returned with NULL values in columns of
- T2. Thus, the joined table unconditionally has a row for each
- row in T1.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>RIGHT OUTER JOIN</term>
-
- <listitem>
- <para>
- This is like a left join, only that the result table will
- unconditionally have a row for each row in T2.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>FULL OUTER JOIN</term>
-
- <listitem>
- <para>
- First, an INNER JOIN is performed. Then, for each row in
- T1 that does not satisfy the join condition with any row in
- T2, a joined row is returned with null values in columns of
- T2. Also, for each row of T2 that does not satisfy the
- join condition with any row in T1, a joined row with null
- values in the columns of T1 is returned.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>NATURAL JOIN</term>
-
- <listitem>
-<synopsis>
-<replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> JOIN <replaceable>T2</replaceable>
-</synopsis>
- <para>
- A natural join creates a joined table where every pair of matching
- column names between the two tables are merged into one column. The
- join specification is effectively a USING clause containing all the
- common column names and is otherwise like a Qualified JOIN.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
-
- <para>
- Joins of all types can be chained together or nested where either
- or both of <replaceable>T1</replaceable> and
- <replaceable>T2</replaceable> may be JOINed tables. Parenthesis
- can be used around JOIN clauses to control the join order which
- are otherwise left to right.
- </para>
- </sect3>
-
- <sect3 id="sql-subqueries">
- <title>Subqueries</title>
-
- <para>
- Subqueries specifying a derived table must be enclosed in
- parenthesis and <emphasis>must</emphasis> be named using an AS
- clause. (See <xref linkend="sql-table-aliases">.)
- </para>
-
-<programlisting>
-FROM (SELECT * FROM table1) AS alias_name
-</programlisting>
-
- <para>
- This example is equivalent to <literal>FROM table1 AS
- alias_name</literal>. Many subquieries can be written as table
- joins instead.
- </para>
- </sect3>
-
- <sect3 id="sql-table-aliases">
- <title>Table and Column Aliases</title>
-
- <para>
- A temporary name can be given to tables and complex table
- references to be used for references to the derived table in
- further processing. This is called a <firstterm>table
- alias</firstterm>.
-<synopsis>
-FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
-</synopsis>
- Here, <replaceable>alias</replaceable> can be any regular
- identifier. The alias becomes the new name of the table
- reference for the current query -- it is no longer possible to
- refer to the table by the original name (if the table reference
- was an ordinary base table). Thus
-<programlisting>
-SELECT * FROM my_table AS m WHERE my_table.a > 5;
-</programlisting>
- is not valid SQL syntax. What will happen instead, as a
- <productname>Postgres</productname> extension, is that an implict
- table reference is added to the FROM clause, so the query is
- processed as if it was written as
-<programlisting>
-SELECT * FROM my_table AS m, my_table WHERE my_table.a > 5;
-</programlisting>
- Table aliases are mainly for notational convenience, but it is
- necessary to use them when joining a table to itself, e.g.,
-<programlisting>
-SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
-</programlisting>
- Additionally, an alias is required if the table reference is a
- subquery.
- </para>
-
- <para>
- Parenthesis are used to resolve ambiguities. The following
- statement will assign the alias <literal>b</literal> to the
- result of the join, unlike the previous example:
-<programlisting>
-SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
-</programlisting>
- </para>
-
- <para>
-<synopsis>
-FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
-</synopsis>
- This form is equivalent the previously treated one; the
- <token>AS</token> key word is noise.
- </para>
-
- <para>
-<synopsis>
-FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
-</synopsis>
- In addition to renaming the table as described above, the columns
- of the table are also given temporary names. If less column
- aliases are specified than the actual table has columns, the last
- columns are not renamed. This syntax is especially useful for
- self-joins or subqueries.
- </para>
- </sect3>
-
- <sect3>
- <title>Examples</title>
-
- <para>
-<programlisting>
-FROM T1 INNER JOIN T2 USING (C)
-FROM T1 LEFT OUTER JOIN T2 USING (C)
-FROM (T1 RIGHT OUTER JOIN T2 ON (T1C1=T2C1)) AS DT1
-FROM (T1 FULL OUTER JOIN T2 USING (C)) AS DT1 (DT1C1, DT1C2)
-
-FROM T1 NATURAL INNER JOIN T2
-FROM T1 NATURAL LEFT OUTER JOIN T2
-FROM T1 NATURAL RIGHT OUTER JOIN T2
-FROM T1 NATURAL FULL OUTER JOIN T2
-
-FROM (SELECT * FROM T1) DT1 CROSS JOIN T2, T3
-FROM (SELECT * FROM T1) DT1, T2, T3
-</programlisting>
-
- Above are some examples of joined tables and complex derived
- tables. Notice how the AS clause renames or names a derived
- table and how the optional comma-separated list of column names
- that follows gives names or renames the columns. The last two
- FROM clauses produce the same derived table from T1, T2, and T3.
- The AS keyword was omitted in naming the subquery as DT1. The
- keywords OUTER and INNER are noise that can be omitted also.
- </para>
- </sect3>
-
- </sect2>
-
- <sect2>
- <title>WHERE clause</title>
-
- <para>
- The syntax of the WHERE clause is
-<synopsis>
-WHERE <replaceable>search condition</replaceable>
-</synopsis>
- where <replaceable>search condition</replaceable> is any value
- expression as defined in <xref linkend="sql-expressions"> that
- returns a value of type <type>boolean</type>.
- </para>
-
- <para>
- After the processing of the FROM clause is done, each row of the
- derived table is checked against the search condition. If the
- result of the condition is true, the row is kept in the output
- table, otherwise (that is, if the result is false or NULL) it is
- discared. The search condition typically references at least some
- column in the table generated in the FROM clause; this is not
- required, but otherwise the WHERE clause will be fairly useless.
- </para>
-
- <note>
- <para>
- Before the implementation of the JOIN syntax, it was necessary to
- put the join condition of an inner join in the WHERE clause. For
- example, these table expressions are equivalent:
-<programlisting>
-FROM a, b WHERE a.id = b.id AND b.val &gt; 5
-</programlisting>
- and
-<programlisting>
-FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val &gt; 5
-</programlisting>
- or perhaps even
-<programlisting>
-FROM a NATURAL JOIN b WHERE b.val &gt; 5
-</programlisting>
- Which one of these you use is mainly a matter of style. The JOIN
- syntax in the FROM clause is probably not as portable to other
- products. For outer joins there is no choice in any case: they
- must be done in the FROM clause.
- </para>
- </note>
-
-<programlisting>
-FROM FDT WHERE
- C1 > 5
-
-FROM FDT WHERE
- C1 IN (1, 2, 3)
-FROM FDT WHERE
- C1 IN (SELECT C1 FROM T2)
-FROM FDT WHERE
- C1 IN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10)
-
-FROM FDT WHERE
- C1 BETWEEN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) AND 100
-
-FROM FDT WHERE
- EXISTS (SELECT C1 FROM T2 WHERE C2 > FDT.C1)
-</programlisting>
-
- <para>
- In the examples above, FDT is the table derived in the FROM
- clause. Rows that do not meet the search condition of the where
- clause are eliminated from FDT. Notice the use of scalar
- subqueries as value expressions (C2 assumed UNIQUE). Just like
- any other query, the subqueries can employ complex table
- expressions. Notice how FDT is referenced in the subqueries.
- Qualifying C1 as FDT.C1 is only necessary if C1 is the name of a
- column in the derived input table of the subquery. Qualifying the
- column name adds clarity even when it is not needed. The column
- naming scope of an outer query extends into its inner queries.
- </para>
- </sect2>
-
-<!-- This is confusing as heck. Make it simpler. -->
-
-<![IGNORE[
-
- <sect2>
- <title>GROUP BY and HAVING clauses</title>
-
- <para>
- After passing the WHERE filter, the derived input table may be
- subject to grouping, using the GROUP BY clause, and elimination of
- group rows using the HAVING clause. (The HAVING clause can also
- be used without GROUP BY, but then it is equivalent to the WHERE
- clause.)
- </para>
-
- <para>
- In standard SQL, the GROUP BY clause takes a list of column names,
- that specify a subrow, from the derived input table produced by
- the previous WHERE or FROM clause and partitions the table into
- groups with duplicate subrows such that within a column of the
- subrow, no column value is distinct from other column values. The
- resulting derived input table is a special type of table, called a
- grouped table, which still contains all columns but only
- references to columns of the grouped subrow, and group aggregates,
- derived from any of the columns, may appear in derived column
- value expressions in the query select list. When deriving an
- output table from a query using a grouped input table, each output
- row is derived from a corresponding group/partition of the grouped
- table. Aggregates computed in a derived output column are
- aggregates on the current partition/group of the grouped input
- table being processed. Only one output table row results per
- group/partition of the grouped input table.
- </para>
-
- <para>
- Postgres has extended the GROUP BY clause to allow some
- non-standard, but useful behavior. Derived output columns, given
- names using an AS clause in the query select list, may appear in
- the GROUP BY clause in combination with, or instead of, the input
- table column names. Tables may also be grouped by arbitrary
- expressions. If output table column names appear in the GROUP BY
- list, then the input table is augmented with additional columns of
- the output table columns listed in the GROUP BY clause. The value
- for each row in the additional columns is computed from the value
- expression that defines the output column in the query select
- list. The augmented input table is grouped by the column names
- listed in the GROUP BY clause. The resulting grouped augmented
- input table is then treated according standard SQL GROUP BY
- semantics. Only the columns of the unaugmented input table in the
- grouped subrow (if any), and group aggregates, derived from any of
- the columns of the unaugmented input table, may be referenced in
- the value expressions of the derived output columns of the
- query. Output columns derived with an aggregate expression cannot
- be named in the GROUP BY clause.
- </para>
-
- <para>
- A HAVING clause may optionally follow a GROUP BY clause. The
- HAVING clause selects or eliminates, depending on which
- perspective is taken, groups from the grouped table derived in the
- GROUP BY clause that precedes it. The search condition is the
- same type of expression allowed in a WHERE clause and may
- reference any of the input table column names in the grouped
- subrow, but may not reference any others or any named output
- columns. When the search condition results in TRUE the group is
- retained, otherwise the group is eliminated.
- </para>
- </sect2>
-
- <sect2>
- <title>ORDER BY and LIMIT clauses</title>
-
- <para>
- ORDER BY and LIMIT clauses are not clauses of a table expression.
- They are optional clauses that may follow a query expression and
- are discussed here because they are commonly used with the
- clauses above.
- </para>
-
- <para>
- ORDER BY takes a comma-separated list of columns and performs a
- cascaded ordering of the table by the columns listed, in the
- order listed. The keyword DESC or ASC may follow any column name
- or expression in the list to specify descending or ascending
- ordering, respectively. Ascending order is the default. The
- ORDER BY clause conforms to the SQL standard but is extended in
- Postgres. Postgres allows ORDER BY to reference both output
- table columns, as named in the select list using the AS clause,
- and input table columns, as given by the table derived in the
- FROM clause and other previous clauses. Postgres also extends
- ORDER BY to allow ordering by arbitrary expressions. If used in a
- query with a GROUP BY clause, the ORDER BY clause can only
- reference output table column names and grouped input table
- columns.
- </para>
-
- <para>
- LIMIT is not a standard SQL clause. LIMIT is a Postgres
- extension that limits the number of rows that will be returned
- from a query. The rows returned by a query using the LIMIT
- clause are random if no ORDER BY clause is specified. A LIMIT
- clause may optionally be followed by an OFFSET clause which
- specifies a number of rows to be skipped in the output table
- before returning the number of rows specified in the LIMIT
- clause.
- </para>
- </sect2>
-]]>
- </sect1>
-
</chapter>
<!-- Keep this comment at the end of the file
diff --git a/doc/src/sgml/user.sgml b/doc/src/sgml/user.sgml
index 78d422013da..c78e3d3a7db 100644
--- a/doc/src/sgml/user.sgml
+++ b/doc/src/sgml/user.sgml
@@ -1,5 +1,5 @@
<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/Attic/user.sgml,v 1.24 2001/01/13 23:58:55 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/Attic/user.sgml,v 1.25 2001/01/22 23:34:33 petere Exp $
-->
<book id="user">
@@ -44,6 +44,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/Attic/user.sgml,v 1.24 2001/01/13 23:58:55
&intro;
&syntax;
+ &queries;
&datatype;
&func;
&typeconv;