diff options
author | Tom Lane | 2008-07-29 18:31:20 +0000 |
---|---|---|
committer | Tom Lane | 2008-07-29 18:31:20 +0000 |
commit | ab9907f5e5eba6e4e17a279d07a1a9df21ec5b19 (patch) | |
tree | b57e1d34e059098c8ee023f285d999def612e613 /doc/src | |
parent | 6fe879634121bcad34b7093b7b87c9c149b11d0e (diff) |
Add a new, improved version of citext as a contrib module.
David E. Wheeler
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/citext.sgml | 222 | ||||
-rw-r--r-- | doc/src/sgml/contrib.sgml | 3 | ||||
-rw-r--r-- | doc/src/sgml/filelist.sgml | 3 |
3 files changed, 226 insertions, 2 deletions
diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml new file mode 100644 index 00000000000..cb91da8897b --- /dev/null +++ b/doc/src/sgml/citext.sgml @@ -0,0 +1,222 @@ +<!-- $PostgreSQL: pgsql/doc/src/sgml/citext.sgml,v 1.1 2008/07/29 18:31:20 tgl Exp $ --> + +<sect1 id="citext"> + <title>citext</title> + + <indexterm zone="citext"> + <primary>citext</primary> + </indexterm> + + <para> + The <filename>citext</> module provides a case-insensitive + character string type, <type>citext</>. Essentially, it internally calls + <function>lower</> when comparing values. Otherwise, it behaves almost + exactly like <type>text</>. + </para> + + <sect2> + <title>Rationale</title> + + <para> + The standard approach to doing case-insensitive matches + in <productname>PostgreSQL</> has been to use the <function>lower</> + function when comparing values, for example + + <programlisting> + SELECT * FROM tab WHERE lower(col) = LOWER(?); + </programlisting> + </para> + + <para> + This works reasonably well, but has a number of drawbacks: + </para> + + <itemizedlist> + <listitem> + <para> + It makes your SQL statements verbose, and you always have to remember to + use <function>lower</> on both the column and the query value. + </para> + </listitem> + <listitem> + <para> + It won't use an index, unless you create a functional index using + <function>lower</>. + </para> + </listitem> + <listitem> + <para> + If you declare a column as <literal>UNIQUE</> or <literal>PRIMARY + KEY</>, the implicitly generated index is case-sensitive. So it's + useless for case-insensitive searches, and it won't enforce + uniqueness case-insensitively. + </para> + </listitem> + </itemizedlist> + + <para> + The <type>citext</> data type allows you to eliminate calls + to <function>lower</> in SQL queries, and allows a primary key to + be case-insensitive. <type>citext</> is locale-aware, just + like <type>text</>, which means that the comparison of uppercase and + lowercase characters is dependent on the rules of + the <literal>LC_CTYPE</> locale setting. Again, this behavior is + identical to the use of <function>lower</> in queries. But because it's + done transparently by the datatype, you don't have to remember to do + anything special in your queries. + </para> + + </sect2> + + <sect2> + <title>How to Use It</title> + + <para> + Here's a simple example of usage: + + <programlisting> + CREATE TABLE users ( + nick CITEXT PRIMARY KEY, + pass TEXT NOT NULL + ); + + INSERT INTO users VALUES ( 'larry', md5(random()::text) ); + INSERT INTO users VALUES ( 'Tom', md5(random()::text) ); + INSERT INTO users VALUES ( 'Damian', md5(random()::text) ); + INSERT INTO users VALUES ( 'NEAL', md5(random()::text) ); + INSERT INTO users VALUES ( 'Bjørn', md5(random()::text) ); + + SELECT * FROM users WHERE nick = 'Larry'; + </programlisting> + + The <command>SELECT</> statement will return one tuple, even though + the <structfield>nick</> column was set to <quote>larry</> and the query + was for <quote>Larry</>. + </para> + </sect2> + + <sect2> + <title>Limitations</title> + + <itemizedlist> + <listitem> + <para> + <type>citext</>'s behavior depends on + the <literal>LC_CTYPE</> setting of your database. How it compares + values is therefore determined when + <application>initdb</> is run to create the cluster. It is not truly + case-insensitive in the terms defined by the Unicode standard. + Effectively, what this means is that, as long as you're happy with your + collation, you should be happy with <type>citext</>'s comparisons. But + if you have data in different languages stored in your database, users + of one language may find their query results are not as expected if the + collation is for another language. + </para> + </listitem> + + <listitem> + <para> + The pattern-matching comparison operators, such as <literal>LIKE</>, + <literal>~</>, <literal>~~</>, <literal>!~</>, <literal>!~~</>, etc., + have been overloaded to make case-insensitive comparisons when their + left-hand argument is of type <type>citext</>. However, related + functions have not been changed, including: + </para> + + <itemizedlist> + <listitem> + <para> + <function>regexp_replace()</> + </para> + </listitem> + <listitem> + <para> + <function>regexp_split_to_array()</> + </para> + </listitem> + <listitem> + <para> + <function>regexp_split_to_table()</> + </para> + </listitem> + <listitem> + <para> + <function>replace()</> + </para> + </listitem> + <listitem> + <para> + <function>split_part()</> + </para> + </listitem> + <listitem> + <para> + <function>strpos()</> + </para> + </listitem> + <listitem> + <para> + <function>translate()</> + </para> + </listitem> + </itemizedlist> + + <para> + Of course, for the regular expression functions, you can specify + case-insensitive comparisons in their <parameter>flags</> arguments, but + the same cannot be done for the the non-regexp functions. + </para> + </listitem> + + <listitem> + <para> + <type>citext</> is not as efficient as <type>text</> because the + operator functions and the btree comparison functions must make copies + of the data and convert it to lower case for comparisons. It is, + however, slightly more efficient than using <function>lower</> to get + case-insensitive matching. + </para> + </listitem> + + <listitem> + <para> + <productname>PostgreSQL</> supports casting between <type>text</> + and any other type (even non-string types) by using the other type's + I/O functions. This doesn't work with <type>citext</>. However, + you can cast via I/O functions in two steps, for example + <literal><replaceable>somevalue</>::text::citext</literal> or + <literal><replaceable>citextvalue</>::text::<replaceable>sometype</></literal>. + </para> + </listitem> + + <listitem> + <para> + <type>citext</> doesn't help much if you need data to compare + case-sensitively in some contexts and case-insensitively in other + contexts. The standard answer is to use the <type>text</> type and + manually use the <function>lower</> function when you need to compare + case-insensitively; this works all right if case-insensitive comparison + is needed only infrequently. If you need case-insensitive most of + the time and case-sensitive infrequently, consider storing the data + as <type>citext</> and explicitly casting the column to <type>text</> + when you want case-sensitive comparison. In either situation, you + will need two indexes if you want both types of searches to be fast. + </para> + </listitem> + </itemizedlist> + </sect2> + + <sect2> + <title>Author</title> + + <para> + David E. Wheeler <email>david@kineticode.com</email> + </para> + + <para> + Inspired by the original <type>citext</> module by Donald Fraser. + </para> + + </sect2> + +</sect1> diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml index b9d78c1c76b..bac5044205b 100644 --- a/doc/src/sgml/contrib.sgml +++ b/doc/src/sgml/contrib.sgml @@ -1,4 +1,4 @@ -<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib.sgml,v 1.8 2007/12/06 04:12:09 tgl Exp $ --> +<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib.sgml,v 1.9 2008/07/29 18:31:20 tgl Exp $ --> <appendix id="contrib"> <title>Additional Supplied Modules</title> @@ -81,6 +81,7 @@ psql -d dbname -f <replaceable>SHAREDIR</>/contrib/<replaceable>module</>.sql &adminpack; &btree-gist; &chkpass; + &citext; &cube; &dblink; &dict-int; diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index b6305dc535e..32aa90400bb 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -1,4 +1,4 @@ -<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.56 2007/12/03 04:18:47 tgl Exp $ --> +<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.57 2008/07/29 18:31:20 tgl Exp $ --> <!entity history SYSTEM "history.sgml"> <!entity info SYSTEM "info.sgml"> @@ -94,6 +94,7 @@ <!entity adminpack SYSTEM "adminpack.sgml"> <!entity btree-gist SYSTEM "btree-gist.sgml"> <!entity chkpass SYSTEM "chkpass.sgml"> +<!entity citext SYSTEM "citext.sgml"> <!entity cube SYSTEM "cube.sgml"> <!entity dblink SYSTEM "dblink.sgml"> <!entity dict-int SYSTEM "dict-int.sgml"> |