<!--
-$PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.231 2004/12/21 01:02:28 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.232 2004/12/23 23:07:38 tgl Exp $
PostgreSQL documentation
-->
<footnote>
<para>
The <function>to_ascii</function> function supports conversion from
- <literal>LATIN1</>, <literal>LATIN2</>, and <literal>WIN1250</> only.
+ <literal>LATIN1</>, <literal>LATIN2</>, <literal>LATIN9</>,
+ and <literal>WIN1250</> encodings only.
</para>
</footnote>
</entry>
There are three separate approaches to pattern matching provided
by <productname>PostgreSQL</productname>: the traditional
<acronym>SQL</acronym> <function>LIKE</function> operator, the
- more recent <literal>>SIMILAR TO</literal> operator (since
+ more recent <function>SIMILAR TO</function> operator (added in
SQL:1999), and <acronym>POSIX</acronym>-style regular expressions.
Additionally, a pattern matching function,
<function>substring</function>, is available, using either
- <literal>SIMILAR TO</literal>-style or POSIX-style regular
+ <function>SIMILAR TO</function>-style or POSIX-style regular
expressions.
</para>
<para>
<function>LIKE</function> pattern matches always cover the entire
- string. To match a pattern anywhere within a string, the
+ string. To match a sequence anywhere within a string, the
pattern must therefore start and end with a percent sign.
</para>
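<para>
As a minimal illustration (the sample strings are arbitrary and chosen
only for this sketch), a pattern without percent signs has to match the
whole string, while surrounding it with percent signs lets it match a
sequence anywhere:
<programlisting>
SELECT 'abcdef' LIKE 'cd';      -- false: the pattern must cover the entire string
SELECT 'abcdef' LIKE '%cd%';    -- true: the percent signs absorb the rest
</programlisting>
</para>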
<para>
The key word <token>ILIKE</token> can be used instead of
- <token>LIKE</token> to make the match case insensitive according
+ <token>LIKE</token> to make the match case-insensitive according
to the active locale. This is not in the <acronym>SQL</acronym> standard but is a
<productname>PostgreSQL</productname> extension.
</para>
pattern. But if the pattern contains any parentheses, the portion
of the text that matched the first parenthesized subexpression (the
one whose left parenthesis comes first) is
- returned. You can always put parentheses around the whole expression
+ returned. You can put parentheses around the whole expression
if you want to use parentheses within it without triggering this
- exception. Also see the non-capturing parentheses described below.
+ exception. If you need parentheses in the pattern before the
+ subexpression you want to extract, see the non-capturing parentheses
+ described below.
</para>
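<para>
The following sketch shows these rules on an arbitrary sample string. It
assumes that <varname>regex_flavor</> is left at its default setting, so
that the non-capturing form <literal>(?:</> is recognized:
<programlisting>
SELECT substring('foobar' from 'o.b');          -- oob: no parentheses, whole match
SELECT substring('foobar' from 'o(.)b');        -- o: first parenthesized subexpression
SELECT substring('foobar' from '(o(.)b)');      -- oob: parentheses around the whole expression
SELECT substring('foobar' from '(?:fo)(o.)');   -- ob: the leading group does not capture
</programlisting>
</para>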
<para>
<para>
The forms using <literal>{</><replaceable>...</><literal>}</>
- are known as <firstterm>bound</>s.
+ are known as <firstterm>bounds</>.
The numbers <replaceable>m</> and <replaceable>n</> within a bound are
unsigned decimal integers with permissible values from 0 to 255 inclusive.
</para>
Normally the flavor of RE being used is determined by
<varname>regex_flavor</>.
However, this can be overridden by a <firstterm>director</> prefix.
- If an RE of any flavor begins with <literal>***:</>,
- the rest of the RE is taken as an ARE.
- If an RE of any flavor begins with <literal>***=</>,
+ If an RE begins with <literal>***:</>,
+ the rest of the RE is taken as an ARE regardless of
+ <varname>regex_flavor</>.
+ If an RE begins with <literal>***=</>,
the rest of the RE is taken to be a literal string,
with all characters considered ordinary characters.
</para>
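<para>
For instance, in the following sketch (the strings are arbitrary), the
<literal>***=</> director makes the dot an ordinary character instead of
a wildcard:
<programlisting>
SELECT 'a.c' ~ '***=a.c';   -- true: the literal string a.c is present
SELECT 'abc' ~ '***=a.c';   -- false: the dot matches only a dot
SELECT 'abc' ~ 'a.c';       -- true: without the director, the dot matches any character
</programlisting>
</para>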
<para>
Embedded options take effect at the <literal>)</> terminating the sequence.
- They are available only at the start of an ARE,
- and may not be used later within it.
+ They may appear only at the start of an ARE (after the
+ <literal>***:</> director if any).
</para>
<para>
</listitem>
<listitem>
<para>
- white space and comments are illegal within multi-character symbols,
- like the ARE <literal>(?:</> or the BRE <literal>\(</>
+ white space and comments cannot appear within multi-character symbols,
+ such as <literal>(?:</>
</para>
</listitem>
</itemizedlist>
- Expanded-syntax white-space characters are blank, tab, newline, and
+ For this purpose, white-space characters are blank, tab, newline, and
any character that belongs to the <replaceable>space</> character class.
</para>
</table>
<para>
- Usage notes for the date/time formatting:
+ Usage notes for date/time formatting:
<itemizedlist>
<listitem>
</table>
<para>
- Usage notes for the numeric formatting:
+ Usage notes for numeric formatting:
<itemizedlist>
<listitem>
<para>
The <function>extract</function> function retrieves subfields
- from date/time values, such as year or hour.
- <replaceable>source</replaceable> is a value expression that
- evaluates to type <type>timestamp</type> or <type>interval</type>.
- (Expressions of type <type>date</type> or <type>time</type> will
+ such as year or hour from date/time values.
+ <replaceable>source</replaceable> must be a value expression of
+ type <type>timestamp</type>, <type>time</type>, or <type>interval</type>.
+ (Expressions of type <type>date</type> will
be cast to <type>timestamp</type> and can therefore be used as
well.) <replaceable>field</replaceable> is an identifier or
string that selects what field to extract from the source value.
</programlisting>
</para>
- <note>
+ <tip>
<para>
You do not want to use the third form when specifying a <literal>DEFAULT</>
clause while creating a table. The system will convert <literal>now</literal>
because they are function calls. Thus they will give the desired
behavior of defaulting to the time of row insertion.
</para>
- </note>
+ </tip>
</sect2>
</sect1>
<para>
<xref linkend="array-functions-table"> shows the functions
available for use with array types. See <xref linkend="arrays">
- for more discussion and examples for the use of these functions.
+ for more discussion and examples of the use of these functions.
</para>
<table id="array-functions-table">
</literal>
</entry>
<entry><type>anyarray</type></entry>
- <entry>
- concatenate two arrays, returning <literal>NULL</literal>
- for <literal>NULL</literal> inputs
- </entry>
+ <entry>concatenate two arrays</entry>
<entry><literal>array_cat(ARRAY[1,2,3], ARRAY[4,5])</literal></entry>
<entry><literal>{1,2,3,4,5}</literal></entry>
</row>
</literal>
</entry>
<entry><type>anyarray</type></entry>
- <entry>
- append an element to the end of an array, returning
- <literal>NULL</literal> for <literal>NULL</literal> inputs
- </entry>
+ <entry>append an element to the end of an array</entry>
<entry><literal>array_append(ARRAY[1,2], 3)</literal></entry>
<entry><literal>{1,2,3}</literal></entry>
</row>
</literal>
</entry>
<entry><type>anyarray</type></entry>
- <entry>
- append an element to the beginning of an array, returning
- <literal>NULL</literal> for <literal>NULL</literal> inputs
- </entry>
+ <entry>add an element to the beginning of an array</entry>
<entry><literal>array_prepend(1, ARRAY[2,3])</literal></entry>
<entry><literal>{1,2,3}</literal></entry>
</row>
</literal>
</entry>
<entry><type>text</type></entry>
- <entry>
- returns a text representation of array dimension lower and upper bounds,
- generating an ERROR for <literal>NULL</literal> inputs
- </entry>
+ <entry>returns a text representation of the array's dimensions</entry>
<entry><literal>array_dims(array[[1,2,3], [4,5,6]])</literal></entry>
<entry><literal>[1:2][1:3]</literal></entry>
</row>
</literal>
</entry>
<entry><type>integer</type></entry>
- <entry>
- returns lower bound of the requested array dimension, returning
- <literal>NULL</literal> for <literal>NULL</literal> inputs
- </entry>
+ <entry>returns lower bound of the requested array dimension</entry>
<entry><literal>array_lower(array_prepend(0, ARRAY[1,2,3]), 1)</literal></entry>
<entry><literal>0</literal></entry>
</row>
</literal>
</entry>
<entry><type>integer</type></entry>
- <entry>
- returns upper bound of the requested array dimension, returning
- <literal>NULL</literal> for <literal>NULL</literal> inputs
- </entry>
+ <entry>returns upper bound of the requested array dimension</entry>
<entry><literal>array_upper(ARRAY[1,2,3,4], 1)</literal></entry>
<entry><literal>4</literal></entry>
</row>
</literal>
</entry>
<entry><type>text</type></entry>
- <entry>
- concatenates array elements using provided delimiter, returning
- <literal>NULL</literal> for <literal>NULL</literal> inputs
- </entry>
+ <entry>concatenates array elements using provided delimiter</entry>
<entry><literal>array_to_string(array[1, 2, 3], '~^~')</literal></entry>
<entry><literal>1~^~2~^~3</literal></entry>
</row>
</literal>
</entry>
<entry><type>text[]</type></entry>
- <entry>
- splits string into array elements using provided delimiter, returning
- <literal>NULL</literal> for <literal>NULL</literal> inputs
- </entry>
+ <entry>splits string into array elements using provided delimiter</entry>
<entry><literal>string_to_array( 'xx~^~yy~^~zz', '~^~')</literal></entry>
<entry><literal>{xx,yy,zz}</literal></entry>
</row>
It should be noted that except for <function>count</function>,
these functions return a null value when no rows are selected. In
particular, <function>sum</function> of no rows returns null, not
- zero as one might expect. The function <function>coalesce</function> may be
+ zero as one might expect. The <function>coalesce</function> function may be
used to substitute zero for null when necessary.
</para>
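<para>
A small self-contained sketch of this behavior follows. The subquery and
its column name <literal>x</> are made up for the example, and the
<literal>WHERE false</> clause simply guarantees that no rows are selected:
<programlisting>
SELECT count(x) FROM (SELECT 1 AS x) AS s WHERE false;            -- 0
SELECT sum(x) FROM (SELECT 1 AS x) AS s WHERE false;              -- null
SELECT coalesce(sum(x), 0) FROM (SELECT 1 AS x) AS s WHERE false; -- 0
</programlisting>
</para>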
</indexterm>
<para>
- The <function>session_user</function> is the user that initiated a
- database connection; it is fixed for the duration of that
- connection. The <function>current_user</function> is the user identifier
+ The <function>session_user</function> is normally the user who initiated
+ the current database connection; but superusers can change this setting
+ with <xref linkend="sql-set-session-authorization">.
+ The <function>current_user</function> is the user identifier
that is applicable for permission checking. Normally, it is equal
to the session user, but it changes during the execution of
functions with the attribute <literal>SECURITY DEFINER</literal>.
<function>inet_server_addr</function> returns the IP address on which
the server accepted the current connection, and
<function>inet_server_port</function> returns the port number.
- All these functions return NULL if the connection is via a Unix-domain
- socket.
+ All these functions return NULL if the current connection is via a
+ Unix-domain socket.
</para>
<indexterm zone="functions-info">
</para>
<para>
- To evaluate whether a user holds a grant option on the privilege,
+ To test whether a user holds a grant option on the privilege,
append <literal> WITH GRANT OPTION</literal> to the privilege key
word; for example <literal>'UPDATE WITH GRANT OPTION'</literal>.
</para>
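<para>
As a sketch, assuming a hypothetical user <literal>joe</> and a
hypothetical table <literal>mytable</>, the two forms of the test could
look like this:
<programlisting>
-- does joe hold the UPDATE privilege on mytable?
SELECT has_table_privilege('joe', 'mytable', 'UPDATE');
-- does joe hold a grant option on that privilege?
SELECT has_table_privilege('joe', 'mytable', 'UPDATE WITH GRANT OPTION');
</programlisting>
</para>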
<!--
-$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.48 2004/12/01 19:00:27 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.49 2004/12/23 23:07:38 tgl Exp $
-->
<chapter id="performance-tips">
estimates are converted into disk-page units using some
fairly arbitrary fudge factors. If you want to experiment with these
factors, see the list of run-time configuration parameters in
- <xref linkend="runtime-config-resource">.)
+ <xref linkend="runtime-config-query-constants">.)
</para>
<para>
point would be rolled back, so you won't be stuck with partially
loaded data.
</para>
-
- <para>
- If you are issuing a large sequence of <command>INSERT</command>
- commands to bulk load some data, also consider using <xref
- linkend="sql-prepare" endterm="sql-prepare-title"> to create a
- prepared <command>INSERT</command> statement. Since you are
- executing the same command multiple times, it is more efficient to
- prepare the command once and then use <command>EXECUTE</command>
- as many times as required.
- </para>
</sect2>
<sect2 id="populate-copy-from">
use this method to populate a table.
</para>
+ <para>
+ If you cannot use <command>COPY</command>, it may help to use <xref
+ linkend="sql-prepare" endterm="sql-prepare-title"> to create a
+ prepared <command>INSERT</command> statement, and then use
+ <command>EXECUTE</command> as many times as required. This avoids
+ some of the overhead of repeatedly parsing and planning
+ <command>INSERT</command>.
+ </para>
+
<para>
Note that loading a large number of rows using
<command>COPY</command> is almost always faster than using
- <command>INSERT</command>, even if multiple
- <command>INSERT</command> commands are batched into a single
- transaction.
+ <command>INSERT</command>, even if <command>PREPARE</> is used and
+ multiple insertions are batched into a single transaction.
</para>
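<para>
A prepared <command>INSERT</command> might be sketched as follows. The
table <literal>items</>, its two columns, and the statement name
<literal>bulk_insert</> are hypothetical and serve only as an illustration:
<programlisting>
CREATE TABLE items (id integer, name text);

BEGIN;
-- parse and plan the INSERT once
PREPARE bulk_insert (integer, text) AS
    INSERT INTO items VALUES ($1, $2);
-- then execute it once per row
EXECUTE bulk_insert(1, 'first');
EXECUTE bulk_insert(2, 'second');
COMMIT;

DEALLOCATE bulk_insert;
</programlisting>
</para>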
</sect2>
<para>
Temporarily increasing the <xref linkend="guc-maintenance-work-mem">
- configuration variable when restoring large amounts of data can
+ configuration variable when loading large amounts of data can
lead to improved performance. This is because when a B-tree index
is created from scratch, the existing content of the table needs
- to be sorted. Allowing the external merge sort to use more memory
+ to be sorted. Allowing the merge sort to use more memory
means that fewer merge passes will be required. A larger setting for
<varname>maintenance_work_mem</varname> may also speed up validation
of foreign-key constraints.
Whenever you have significantly altered the distribution of data
within a table, running <xref linkend="sql-analyze"
endterm="sql-analyze-title"> is strongly recommended. This
- includes when bulk loading large amounts of data into
- <productname>PostgreSQL</productname>. Running
+ includes bulk loading large amounts of data into the table. Running
<command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
ensures that the planner has up-to-date statistics about the
table. With no statistics or obsolete statistics, the planner may
<!--
-$PostgreSQL: pgsql/doc/src/sgml/typeconv.sgml,v 1.42 2003/12/14 00:10:32 neilc Exp $
+$PostgreSQL: pgsql/doc/src/sgml/typeconv.sgml,v 1.43 2004/12/23 23:07:38 tgl Exp $
-->
<chapter Id="typeconv">
to understand the details of the type conversion mechanism.
However, the implicit conversions done by <productname>PostgreSQL</productname>
can affect the results of a query. When necessary, these results
-can be tailored by a user or programmer
-using <emphasis>explicit</emphasis> type conversion.
+can be tailored by using <emphasis>explicit</emphasis> type conversion.
</para>
<para>
<productname>PostgreSQL</productname> has an extensible type system that is
much more general and flexible than other <acronym>SQL</acronym> implementations.
Hence, most type conversion behavior in <productname>PostgreSQL</productname>
-should be governed by general rules rather than by <foreignphrase>ad hoc</> heuristics, to allow
+is governed by general rules rather than by <foreignphrase>ad hoc</>
+heuristics. This allows
mixed-type expressions to be meaningful even with user-defined types.
</para>
<para>
-The <productname>PostgreSQL</productname> scanner/parser decodes lexical
-elements into only five fundamental categories: integers, floating-point numbers, strings,
-names, and key words. Constants of most non-numeric types are first classified as
-strings. The <acronym>SQL</acronym> language definition allows specifying type
-names with strings, and this mechanism can be used in
+The <productname>PostgreSQL</productname> scanner/parser divides lexical
+elements into only five fundamental categories: integers, non-integer numbers,
+strings, identifiers, and key words. Constants of most non-numeric types are
+first classified as strings. The <acronym>SQL</acronym> language definition
+allows specifying type names with strings, and this mechanism can be used in
<productname>PostgreSQL</productname> to start the parser down the correct
path. For example, the query
<variablelist>
<varlistentry>
<term>
-Operators
+Function calls
</term>
<listitem>
<para>
-<productname>PostgreSQL</productname> allows expressions with
-prefix and postfix unary (one-argument) operators,
-as well as binary (two-argument) operators.
+Much of the <productname>PostgreSQL</productname> type system is built around a
+rich set of functions. Functions can have one or more arguments.
+Since <productname>PostgreSQL</productname> permits function
+overloading, the function name alone does not uniquely identify the function
+to be called; the parser must select the right function based on the data
+types of the supplied arguments.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
-Function calls
+Operators
</term>
<listitem>
<para>
-Much of the <productname>PostgreSQL</productname> type system is built around a
-rich set of functions. Function calls can have one or more arguments.
-Since <productname>PostgreSQL</productname> permits function
-overloading, the function name alone does not uniquely identify the function
-to be called; the parser must select the right function based on the data
-types of the supplied arguments.
+<productname>PostgreSQL</productname> allows expressions with
+prefix and postfix unary (one-argument) operators,
+as well as binary (two-argument) operators. Like functions, operators can
+be overloaded, and so the same problem of selecting the right operator
+exists.
</para>
</listitem>
</varlistentry>
Since all query results from a unionized <command>SELECT</command> statement
must appear in a single set of columns, the types of the results of each
<command>SELECT</> clause must be matched up and converted to a uniform set.
-Similarly, the branch expressions of a <literal>CASE</> construct must be
+Similarly, the result expressions of a <literal>CASE</> construct must be
converted to a common type so that the <literal>CASE</> expression as a whole
has a known output type. The same holds for <literal>ARRAY</> constructs.
</para>
<step performance="required">
<para>
-If the target is a fixed-length type (e.g., <type>char</type> or <type>varchar</type>
-declared with a length) then try to find a sizing function for the target
-type. A sizing function is a function of the same name as the type,
-taking two arguments of which the first is that type and the second is of type
-<type>integer</type>, and returning the same type. If one is found, it is applied,
-passing the column's declared length as the second parameter.
+Check to see if there is a sizing cast for the target type. A sizing
+cast is a cast from that type to itself. If one is found in the
+<structname>pg_cast</> catalog, apply it to the expression before storing
+into the destination column. The implementation function for such a cast
+always takes an extra parameter of type <type>integer</type>, which receives
+the destination column's declared length (actually, its
+<structfield>atttypmod</> value; the interpretation of
+<structfield>atttypmod</> varies for different data types). The cast function
+is responsible for applying any length-dependent semantics such as size
+checking or truncation.
</para>
</step>