variants and customization options.
</para>
</sect2>
+
<sect2 id="icu-locales">
<title>ICU Locales</title>
+
<sect3 id="icu-locale-names">
<title>ICU Locale Names</title>
+
<para>
The ICU format for the locale name is a <link
linkend="icu-language-tag">Language Tag</link>.
linkend="icu-language-tag">language tag</link> instead of relying on the
transformation.
</para>
+
<para>
A locale with no language name, or the special language name
<literal>root</literal>, is transformed to have the language
<literal>und</literal> ("undetermined").
</para>
+
<para>
ICU can transform most libc locale names, as well as some other formats,
into language tags for easier transition to ICU. If a libc locale name is
used in ICU, it may not have precisely the same behavior as in libc.
</para>
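
<para>
As a minimal sketch of the transformation described above, the two
statements below request a German collation, first with a libc-style
name and then with a language tag. The collation names are hypothetical,
and the example assumes that ICU accepts <literal>de_DE</literal> as
input; as noted, the resulting behavior need not match libc exactly.
</para>

<programlisting>
-- libc-style locale name; ICU transforms it into a language tag
CREATE COLLATION german_from_libc (provider = icu, locale = 'de_DE');

-- the same intent, given directly as a language tag
CREATE COLLATION german_from_tag (provider = icu, locale = 'de-DE');
</programlisting>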
+
<para>
If there is a problem interpreting the locale name, or if the locale name
represents a language or region that ICU does not recognize, you will see
<sect3 id="icu-language-tag">
<title>Language Tag</title>
+
<para>
A language tag, defined in BCP 47, is a standardized identifier for the
language, region, and other attributes of a locale.
</para>
+
<para>
Basic language tags are simply
<replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>;
<literal>ja-JP</literal>, <literal>de</literal>, or
<literal>fr-CA</literal>.
</para>
+
<para>
Collation settings may be included in the language tag to customize
collation behavior. ICU allows extensive customization, such as
treatment of digits within text; and many other options to satisfy a
variety of uses.
</para>
+
<para>
To include this additional collation information in a language tag,
append <literal>-u</literal>, which indicates there are additional
<literal>-</literal><replaceable>value</replaceable>, which implies a
value of <literal>true</literal>.
</para>
+
<para>
For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
means the locale with the English language in the US region, with
(1 row)
</screen>
</para>
+
<para>
See <xref linkend="icu-custom-collations"/> for details and additional
examples of using language tags with custom collation information for the
</para>
</sect3>
</sect2>
+
<sect2 id="locale-problems">
<title>Problems</title>
</tip>
</sect3>
</sect2>
+
<sect2 id="icu-custom-collations">
<title>ICU Custom Collations</title>
linkend="icu-collation-settings"/>, or see <xref
linkend="icu-external-references"/> for more details.
</para>
+
<sect3 id="icu-collation-comparison-levels">
<title>ICU Comparison Levels</title>
+
<para>
Comparison of two strings (collation) in ICU is determined by a
multi-level process, where textual features are grouped into
linkend="icu-collation-settings-table">collation settings</link>. Higher
levels correspond to finer textual features.
</para>
+
<para>
<xref linkend="icu-collation-levels"/> shows which textual feature
differences are considered significant when determining equality at the
invisible separator, and as seen in the table, is ignored at all
levels of comparison less than <literal>identic</literal>.
</para>
- <para>
+
<table id="icu-collation-levels">
<title>ICU Collation Levels</title>
<tgroup cols="8">
<colspec colname="col6" colwidth="1*"/>
<colspec colname="col7" colwidth="1*"/>
<colspec colname="col8" colwidth="1*"/>
+
<thead>
<row>
<entry>Level</entry>
<entry><literal>'y' = 'z'</literal></entry>
</row>
</thead>
+
<tbody>
<row>
<entry>level1</entry>
</tgroup>
</table>
+ <para>
At every level, even with full normalization off, basic normalization is
performed. For example, <literal>'á'</literal> may be composed of the
code points <literal>U&'\0061\0301'</literal> or the single code
created with <symbol>deterministic</symbol> set to
<literal>true</literal>.
</para>
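
<para>
One way to observe this normalization is to compare the decomposed and
precomposed forms of <literal>'á'</literal> under a nondeterministic
collation. This is only a sketch: the collation name is hypothetical, a
UTF8 database encoding is assumed, and the strings are written with
<literal>E''</literal> Unicode escapes rather than the notation used
above.
</para>

<programlisting>
-- hypothetical collation at the strictest comparison level
CREATE COLLATION identic_nd (provider = icu, deterministic = false, locale = 'und-u-ks-identic');

-- 'a' followed by U+0301 (combining acute accent) versus the single code point U+00E1
SELECT E'a\u0301' = E'\u00E1' COLLATE identic_nd;
</programlisting>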
+
<sect4 id="icu-collation-level-examples">
<title>Collation Level Examples</title>
- <para>
<programlisting>
CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level3');
SELECT 'x-y' = 'x_y' COLLATE level4; -- false
</programlisting>
- </para>
</sect4>
</sect3>
<sect3 id="icu-collation-settings">
<title>Collation Settings for an ICU Locale</title>
+
<para>
<xref linkend="icu-collation-settings-table"/> shows the available
collation settings, which can be used as part of a language tag to
customize a collation.
</para>
- <para>
+
<table id="icu-collation-settings-table">
<title>ICU Collation Settings</title>
<tgroup cols="4">
<colspec colname="col2" colwidth="2*"/>
<colspec colname="col3" colwidth="2*"/>
<colspec colname="col4" colwidth="5*"/>
+
<thead>
<row>
<entry>Key</entry>
<entry>Description</entry>
</row>
</thead>
+
<tbody>
<row>
<entry><literal>co</literal></entry>
Collation type. See <xref linkend="icu-external-references"/> for additional options and details.
</entry>
</row>
+
<row>
<entry><literal>ka</literal></entry>
<entry><literal>noignore</literal>, <literal>shifted</literal></entry>
character classes are ignored.
</entry>
</row>
+
<row>
<entry><literal>kb</literal></entry>
<entry><literal>true</literal>, <literal>false</literal></entry>
before <literal>'aé'</literal>.
</entry>
</row>
+
<row>
<entry><literal>kc</literal></entry>
<entry><literal>true</literal>, <literal>false</literal></entry>
</para>
</entry>
</row>
+
<row>
<entry><literal>kf</literal></entry>
<entry>
the rules of the locale.
</entry>
</row>
+
<row>
<entry><literal>kn</literal></entry>
<entry><literal>true</literal>, <literal>false</literal></entry>
<literal>'id-123'</literal>.
</entry>
</row>
+
<row>
<entry><literal>kk</literal></entry>
<entry><literal>true</literal>, <literal>false</literal></entry>
</para>
</entry>
</row>
+
<row>
<entry><literal>kr</literal></entry>
<entry>
</para>
</entry>
</row>
+
<row>
<entry><literal>ks</literal></entry>
<entry><literal>level1</literal>, <literal>level2</literal>, <literal>level3</literal>, <literal>level4</literal>, <literal>identic</literal></entry>
<xref linkend="icu-collation-levels"/> for details.
</entry>
</row>
+
<row>
<entry><literal>kv</literal></entry>
<entry>
</tbody>
</tgroup>
</table>
- Defaults may depend on locale. The above table is not meant to be
- complete. See <xref linkend="icu-external-references"/> for additional
- options and details.
+
+ <para>
+ Defaults may depend on locale. The above table is not meant to be
+ complete. See <xref linkend="icu-external-references"/> for additional
+ options and details.
</para>
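
<para>
As an illustration of how a key from the table is applied, the sketch
below enables the <literal>kn</literal> setting in a language tag and
compares two strings that contain digits. The collation name is
hypothetical. The second query uses the <literal>"C"</literal> collation
for contrast, which compares the digits as ordinary characters.
</para>

<programlisting>
-- hypothetical collation; 'kn' with no value implies kn-true
CREATE COLLATION numeric_order (provider = icu, locale = 'en-u-kn');

-- digit sequences are compared by numeric value
SELECT 'id-45' &lt; 'id-123' COLLATE numeric_order;

-- digits are compared character by character
SELECT 'id-45' &lt; 'id-123' COLLATE "C";
</programlisting>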
+
<note>
<para>
For many collation settings, you must create the collation with
<sect3 id="icu-locale-examples">
<title>Examples</title>
- <para>
+
<variablelist>
<varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
<term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
</listitem>
</varlistentry>
</variablelist>
- </para>
</sect3>
<sect3 id="icu-external-references">
<title>External References for ICU</title>
+
<para>
This section (<xref linkend="icu-custom-collations"/>) is only a brief
overview of ICU behavior and language tags. Refer to the following
documents for technical details, additional options, and new behavior:
</para>
+
<itemizedlist>
<listitem>
<para>
- <ulink
- url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
- Technical Standard #35</ulink>
+ <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink>
</para>
</listitem>
<listitem>
</listitem>
<listitem>
<para>
- <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
- repository</ulink>
+ <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink>
</para>
</listitem>
<listitem>