From 61d967498802ab86d8897cb3c61740d7e9d712f6 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Tue, 23 Sep 2008 09:20:39 +0000
Subject: Make LC_COLLATE and LC_CTYPE database-level settings. Collation and
ctype are now more like encoding, stored in new datcollate and datctype
columns in pg_database.
This is a stripped-down version of Radek Strnad's patch, with further
changes by me.
---
doc/src/sgml/catalogs.sgml | 16 +++++++-
doc/src/sgml/charset.sgml | 73 +++++++++++++++--------------------
doc/src/sgml/indices.sgml | 6 +--
doc/src/sgml/ref/create_database.sgml | 45 ++++++++++++++++++---
doc/src/sgml/ref/initdb.sgml | 41 ++++++++++++--------
doc/src/sgml/ref/pg_controldata.sgml | 4 +-
doc/src/sgml/ref/pg_resetxlog.sgml | 14 +++----
doc/src/sgml/ref/select.sgml | 5 +--
doc/src/sgml/ref/show.sgml | 10 ++---
doc/src/sgml/runtime.sgml | 13 ++++---
doc/src/sgml/textsearch.sgml | 4 +-
11 files changed, 137 insertions(+), 94 deletions(-)
(limited to 'doc/src')
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 40f1ce568e..bf1ac314f7 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1,4 +1,4 @@
-
+
@@ -2149,6 +2149,20 @@
this number to the encoding name)
+
+ datcollate
+ name
+
+ LC_COLLATE for this database
+
+
+
+ datctype
+ name
+
+ LC_CTYPE for this database
+
+
datistemplatebool
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1f4866b203..c012294ef8 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1,4 +1,4 @@
-
+
Localization>
@@ -130,23 +130,23 @@ initdb --locale=sv_SE
The nature of some locale categories is that their value has to be
- fixed for the lifetime of a database cluster. That is, once
- initdb has run, you cannot change them anymore.
- LC_COLLATE and LC_CTYPE are
- those categories. They affect the sort order of indexes, so they
- must be kept fixed, or indexes on text columns will become corrupt.
- PostgreSQL enforces this by recording
- the values of LC_COLLATE> and LC_CTYPE> that are
- seen by initdb>. The server automatically adopts
- those two values when it is started.
+ fixed when the database is created. You can use different settings
+ for different databases, but once a database is created, you cannot
+ change them for that database anymore. LC_COLLATE
+ and LC_CTYPE are those categories. They affect
+ the sort order of indexes, so they must be kept fixed, or indexes on
+ text columns will become corrupt. The default values for these
+ categories are defined when initdb is run, and
+ those values are used when new databases are created, unless
+ specified otherwise in the CREATE DATABASE command.
The other locale categories can be changed as desired whenever the
server is running by setting the run-time configuration variables
that have the same name as the locale categories (see for details). The defaults that are
- chosen by initdb are actually only written into
+ linkend="runtime-config-client-format"> for details). The defaults
+ that are chosen by initdb are actually only written into
the configuration file postgresql.conf to
serve as defaults when the server is started. If you delete these
assignments from postgresql.conf then the
@@ -261,7 +261,7 @@ initdb --locale=sv_SE
Check that PostgreSQL> is actually using the locale
- that you think it is. LC_COLLATE> and LC_CTYPE>
+ that you think it is. The default LC_COLLATE> and LC_CTYPE>
settings are determined at initdb> time and cannot be
changed without repeating initdb>. Other locale
settings including LC_MESSAGES> and LC_MONETARY>
@@ -319,17 +319,11 @@ initdb --locale=sv_SE
- An important restriction, however, is that each database character set
- must be compatible with the server's LC_CTYPE> setting.
+ An important restriction, however, is that each database's character set
+ must be compatible with the database's LC_CTYPE> setting.
When LC_CTYPE> is C> or POSIX>, any
character set is allowed, but for other settings of LC_CTYPE>
there is only one character set that will work correctly.
- Since the LC_CTYPE> setting is frozen by initdb>, the
- apparent flexibility to use different encodings in different databases
- of a cluster is more theoretical than real, except when you select
- C> or POSIX> locale (thus disabling any real locale
- awareness). It is likely that these mechanisms will be revisited in future
- versions of PostgreSQL.
@@ -734,19 +728,19 @@ initdb -E EUC_JP
- If you have selected C> or POSIX> locale,
- you can create a database with a different character set:
+ You can specify a non-default encoding at database creation time,
+ provided that the encoding is compatible with the selected locale:
-createdb -E EUC_KR korean
+createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
This will create a database named korean that
- uses the character set EUC_KR. Another way to
- accomplish this is to use this SQL command:
+ uses the character set EUC_KR, and locale ko_KR.
+ Another way to accomplish this is to use this SQL command:
-CREATE DATABASE korean WITH ENCODING 'EUC_KR';
+CREATE DATABASE korean WITH ENCODING 'EUC_KR' COLLATE='ko_KR.euckr' CTYPE='ko_KR.euckr' TEMPLATE=template0;
The encoding for a database is stored in the system catalog
@@ -756,20 +750,17 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR';
$ psql -l
- List of databases
- Database | Owner | Encoding
----------------+---------+---------------
- euc_cn | t-ishii | EUC_CN
- euc_jp | t-ishii | EUC_JP
- euc_kr | t-ishii | EUC_KR
- euc_tw | t-ishii | EUC_TW
- mule_internal | t-ishii | MULE_INTERNAL
- postgres | t-ishii | EUC_JP
- regression | t-ishii | SQL_ASCII
- template1 | t-ishii | EUC_JP
- test | t-ishii | EUC_JP
- utf8 | t-ishii | UTF8
-(9 rows)
+ List of databases
+ Name | Owner | Encoding | Collation | Ctype | Access Privileges
+-----------+----------+-----------+-------------+-------------+-------------------------------------
+ clocaledb | hlinnaka | SQL_ASCII | C | C |
+ englishdb | hlinnaka | UTF8 | en_GB.UTF8 | en_GB.UTF8 |
+ japanese | hlinnaka | UTF8 | ja_JP.UTF8 | ja_JP.UTF8 |
+ korean | hlinnaka | EUC_KR | ko_KR.euckr | ko_KR.euckr |
+ postgres | hlinnaka | UTF8 | fi_FI.UTF8 | fi_FI.UTF8 |
+ template0 | hlinnaka | UTF8 | fi_FI.UTF8 | fi_FI.UTF8 | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+ template1 | hlinnaka | UTF8 | fi_FI.UTF8 | fi_FI.UTF8 | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+(7 rows)
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 2ab713c39b..0993a8be03 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1,4 +1,4 @@
-
+
Indexes
@@ -157,7 +157,7 @@ CREATE INDEX test1_id_index ON test1 (id);
if the pattern is a constant and is anchored to
the beginning of the string — for example, col LIKE
'foo%' or col ~ '^foo', but not
- col LIKE '%bar'. However, if your server does not
+ col LIKE '%bar'. However, if your database does not
use the C locale you will need to create the index with a special
operator class to support indexing of pattern-matching queries. See
below. It is also possible to use
@@ -922,7 +922,7 @@ CREATE INDEX name ON table
according to the locale-specific collation rules. This makes
these operator classes suitable for use by queries involving
pattern matching expressions (LIKE or POSIX
- regular expressions) when the server does not use the standard
+ regular expressions) when the database does not use the standard
C locale. As an example, you might index a
varchar column like this:
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index b1b1333245..5e72768981 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -1,5 +1,5 @@
@@ -24,6 +24,8 @@ CREATE DATABASE name
[ [ WITH ] [ OWNER [=] dbowner ]
[ TEMPLATE [=] template ]
[ ENCODING [=] encoding ]
+ [ COLLATE [=] collate ]
+ [ CTYPE [=] ctype ]
[ TABLESPACE [=] tablespace ]
[ CONNECTION LIMIT [=] connlimit ] ]
@@ -112,6 +114,29 @@ CREATE DATABASE name
+
+ collate
+
+
+ Collation order (LC_COLLATE>) to use in the new database.
+ This affects the sort order applied to strings, e.g in queries with
+ ORDER BY, as well as the order used in indexes on text columns.
+ The default is to use the collation order of the template database.
+ See below for additional restrictions.
+
+
+
+
+ ctype
+
+
+ Character classification (LC_CTYPE>) to use in the new
+ database. This affects the categorization of characters, e.g. lower,
+ upper and digit. The default is to use the character classification of
+ the template database. See below for additional restrictions.
+
+
+ tablespace
@@ -180,13 +205,11 @@ CREATE DATABASE name
- Any character set encoding specified for the new database must be
- compatible with the server's LC_CTYPE> locale setting.
+ The character set encoding specified for the new database must be
+ compatible with the chosen COLLATE and CTYPE settings.
If LC_CTYPE> is C> (or equivalently
POSIX>), then all encodings are allowed, but for other
- locale settings there is only one encoding that will work properly,
- and so the apparent freedom to specify an encoding is illusory if
- you didn't initialize the database cluster in C> locale.
+ locale settings there is only one encoding that will work properly.
CREATE DATABASE> will allow superusers to specify
SQL_ASCII> encoding regardless of the locale setting,
but this choice is deprecated and may result in misbehavior of
@@ -194,6 +217,16 @@ CREATE DATABASE name
with the locale is stored in the database.
+
+ The COLLATE> and CTYPE> settings must match
+ those of the template database, except when template0 is used as
+ template. This is because COLLATE> and CTYPE>
+ affects the ordering in indexes, so that any indexes copied from the
+ template database would be invalid in the new database with different
+ settings. template0, however, is known to not
+ contain any indexes that would be affected.
+
+
The CONNECTION LIMIT> option is only enforced approximately;
if two new sessions start at about the same time when just one
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 312da7085a..110c21eb8c 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -1,5 +1,5 @@
@@ -76,25 +76,34 @@ PostgreSQL documentation
initdb initializes the database cluster's default
- locale and character set encoding. The collation order
- (LC_COLLATE>) and character set classes
- (LC_CTYPE>, e.g. upper, lower, digit) are fixed for all
- databases and cannot be changed. Collation orders other than
- C> or POSIX> also have a performance penalty.
- For these reasons it is important to choose the right locale when
- running initdb. The remaining locale categories
- can be changed later when the server is started. All server locale
- values (lc_*>) can be displayed via SHOW ALL>.
+ locale and character set encoding. The character set encoding,
+ collation order (LC_COLLATE>) and character set classes
+ (LC_CTYPE>, e.g. upper, lower, digit) can be set separately
+ for a database when it is created. initdb determines
+ those settings for the template1 database, which will
+ serve as the default for all other databases.
+
+
+
+ To alter the default collation order or character set classes, use the
+ and options.
+ Collation orders other than C> or POSIX> also have
+ a performance penalty. For these reasons it is important to choose the
+ right locale when running initdb.
+
+
+
+ The remaining locale categories can be changed later when the server
+ is started. You can also use to set the
+ default for all locale categories, including collation order and
+ character set classes. All server locale values (lc_*>) can
+ be displayed via SHOW ALL>.
More details can be found in .
- The character set encoding can be set separately for a database when
- it is created. initdb determines the encoding for
- the template1 database, which will serve as the
- default for all other databases. To alter the default encoding use
- the option. More details can be found in
- .
+ To alter the default encoding, use the .
+ More details can be found in .
diff --git a/doc/src/sgml/ref/pg_controldata.sgml b/doc/src/sgml/ref/pg_controldata.sgml
index 466c03e224..62695963e2 100644
--- a/doc/src/sgml/ref/pg_controldata.sgml
+++ b/doc/src/sgml/ref/pg_controldata.sgml
@@ -1,5 +1,5 @@
@@ -30,7 +30,7 @@ PostgreSQL documentation
Descriptionpg_controldata prints information initialized during
- initdb>, such as the catalog version and server locale.
+ initdb>, such as the catalog version.
It also shows information about write-ahead logging and checkpoint
processing. This information is cluster-wide, and not specific to any one
database.
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 588ff38c1b..a9d34298e4 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -1,5 +1,5 @@
@@ -62,14 +62,10 @@ PostgreSQL documentation
by specifying the -f> (force) switch. In this case plausible
values will be substituted for the missing data. Most of the fields can be
expected to match, but manual assistance might be needed for the next OID,
- next transaction ID and epoch, next multitransaction ID and offset,
- WAL starting address, and database locale fields.
- The first six of these can be set using the switches discussed below.
- pg_resetxlog's own environment is the source for its
- guess at the locale fields; take care that LANG> and so forth
- match the environment that initdb> was run in.
- If you are not able to determine correct values for all these fields,
- -f> can still be used, but
+ next transaction ID and epoch, next multitransaction ID and offset, and
+ WAL starting address fields. These fields can be set using the switches
+ discussed below. If you are not able to determine correct values for all
+ these fields, -f> can still be used, but
the recovered database must be treated with even more suspicion than
usual: an immediate dump and reload is imperative. Do not>
execute any data-modifying operations in the database before you dump,
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 000b5614dd..d8ed7aef9c 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1,5 +1,5 @@
@@ -747,8 +747,7 @@ SELECT name FROM distributors ORDER BY code;
Character-string data is sorted according to the locale-specific
- collation order that was established when the database cluster
- was initialized.
+ collation order that was established when the database was created.
diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index ebd1acee35..fdc348053e 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -1,5 +1,5 @@
@@ -82,8 +82,8 @@ SHOW ALL
Shows the database's locale setting for collation (text
ordering). At present, this parameter can be shown but not
- set, because the setting is determined at
- initdb> time.
+ set, because the setting is determined at database creation
+ time.
@@ -94,8 +94,8 @@ SHOW ALL
Shows the database's locale setting for character
classification. At present, this parameter can be shown but
- not set, because the setting is determined at
- initdb> time.
+ not set, because the setting is determined at database creation
+ time.
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 75c6d266e9..adde49e1a3 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1,4 +1,4 @@
-
+
Operating System Environment
@@ -145,11 +145,12 @@ postgres$ initdb -D /usr/local/pgsql/data
Normally, it will just take the locale settings in the environment
and apply them to the initialized database. It is possible to
specify a different locale for the database; more information about
- that can be found in . The sort order used
- within a particular database cluster is set by
- initdb and cannot be changed later, short of
- dumping all data, rerunning initdb, and reloading
- the data. There is also a performance impact for using locales
+ that can be found in . The default sort order used
+ within the particular database cluster is set by
+ initdb, and while you can create new databases using
+ different sort order, the order used in the template databases that initdb
+ creates cannot be changed without dropping and recreating them.
+ There is also a performance impact for using locales
other than C> or POSIX>. Therefore, it is
important to make this choice correctly the first time.
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 41db566b6c..45a9f5a389 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1,4 +1,4 @@
-
+
Full Text Search
@@ -1896,7 +1896,7 @@ LIMIT 10;
- The parser's notion of a letter> is determined by the server's
+ The parser's notion of a letter> is determined by the database's
locale setting, specifically lc_ctype>. Words containing
only the basic ASCII letters are reported as a separate token type,
since it is sometimes useful to distinguish them. In most European
--
cgit v1.2.3