-<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.76 2006/02/18 16:15:21 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.77 2006/07/28 15:33:17 tgl Exp $ -->
<chapter id="charset">
<title>Localization</>
allows you to store text in a variety of character sets, including
single-byte character sets such as the ISO 8859 series and
multiple-byte character sets such as <acronym>EUC</> (Extended Unix
- Code), UTF-8, and Mule internal code. All character sets can be
- used transparently throughout the server. (If you use extension
- functions from other sources, it depends on whether they wrote
- their code correctly.) The default character set is selected while
+ Code), UTF-8, and Mule internal code. All supported character sets
+ can be used transparently by clients, but a few are not supported
+ for use within the server (that is, as a server-side encoding).
+ The default character set is selected while
initializing your <productname>PostgreSQL</productname> database
cluster using <command>initdb</>. It can be overridden when you
- create a database using <command>createdb</command> or by using the
- SQL command <command>CREATE DATABASE</>. So you can have multiple
+ create a database, so you can have multiple
databases each with a different character set.
</para>
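<para>
As a minimal sketch of overriding the cluster default, the following
command creates a database that uses the <literal>EUC_KR</literal>
character set. The database name <literal>korean</literal> is only a
placeholder chosen for this illustration:
<programlisting>
CREATE DATABASE korean WITH ENCODING 'EUC_KR';
</programlisting>
</para>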
<para>
<xref linkend="charset-table"> shows the character sets available
- for use in the server.
+ for use in <productname>PostgreSQL</productname>.
</para>
<table id="charset-table">
<title>Server Character Sets</title>
- <tgroup cols="2">
+ <tgroup cols="6">
<thead>
<row>
<entry>Name</entry>
<entry>Description</entry>
<entry>Language</entry>
+ <entry>Server?</entry>
<!--
The Bytes/Char field is populated by looking at the values returned
by pg_wchar_table.mblen function for each encoding.
<entry><literal>BIG5</literal></entry>
<entry>Big Five</entry>
<entry>Traditional Chinese</entry>
+ <entry>No</entry>
<entry>1-2</entry>
<entry><literal>WIN950</>, <literal>Windows950</></entry>
</row>
<entry><literal>EUC_CN</literal></entry>
<entry>Extended UNIX Code-CN</entry>
<entry>Simplified Chinese</entry>
+ <entry>Yes</entry>
<entry>1-3</entry>
<entry></entry>
</row>
<entry><literal>EUC_JP</literal></entry>
<entry>Extended UNIX Code-JP</entry>
<entry>Japanese</entry>
+ <entry>Yes</entry>
<entry>1-3</entry>
<entry></entry>
</row>
<entry><literal>EUC_KR</literal></entry>
<entry>Extended UNIX Code-KR</entry>
<entry>Korean</entry>
+ <entry>Yes</entry>
<entry>1-3</entry>
<entry></entry>
</row>
<entry><literal>EUC_TW</literal></entry>
<entry>Extended UNIX Code-TW</entry>
<entry>Traditional Chinese, Taiwanese</entry>
+ <entry>Yes</entry>
<entry>1-3</entry>
<entry></entry>
</row>
<entry><literal>GB18030</literal></entry>
<entry>National Standard</entry>
<entry>Chinese</entry>
+ <entry>No</entry>
<entry>1-2</entry>
<entry></entry>
</row>
<entry><literal>GBK</literal></entry>
<entry>Extended National Standard</entry>
<entry>Simplified Chinese</entry>
+ <entry>No</entry>
<entry>1-2</entry>
<entry><literal>WIN936</>, <literal>Windows936</></entry>
</row>
<entry><literal>ISO_8859_5</literal></entry>
<entry>ISO 8859-5, <acronym>ECMA</> 113</entry>
<entry>Latin/Cyrillic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<entry><literal>ISO_8859_6</literal></entry>
<entry>ISO 8859-6, <acronym>ECMA</> 114</entry>
<entry>Latin/Arabic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<entry><literal>ISO_8859_7</literal></entry>
<entry>ISO 8859-7, <acronym>ECMA</> 118</entry>
<entry>Latin/Greek</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<entry><literal>ISO_8859_8</literal></entry>
<entry>ISO 8859-8, <acronym>ECMA</> 121</entry>
<entry>Latin/Hebrew</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<entry><literal>JOHAB</literal></entry>
<entry><acronym>JOHAB</></entry>
<entry>Korean (Hangul)</entry>
+ <entry>Yes</entry>
<entry>1-3</entry>
<entry></entry>
</row>
<entry><literal>KOI8</literal></entry>
<entry><acronym>KOI</acronym>8-R(U)</entry>
<entry>Cyrillic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>KOI8R</></entry>
</row>
<entry><literal>LATIN1</literal></entry>
<entry>ISO 8859-1, <acronym>ECMA</> 94</entry>
<entry>Western European</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO88591</></entry>
</row>
<entry><literal>LATIN2</literal></entry>
<entry>ISO 8859-2, <acronym>ECMA</> 94</entry>
<entry>Central European</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO88592</></entry>
</row>
<entry><literal>LATIN3</literal></entry>
<entry>ISO 8859-3, <acronym>ECMA</> 94</entry>
<entry>South European</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO88593</></entry>
</row>
<entry><literal>LATIN4</literal></entry>
<entry>ISO 8859-4, <acronym>ECMA</> 94</entry>
<entry>North European</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO88594</></entry>
</row>
<entry><literal>LATIN5</literal></entry>
<entry>ISO 8859-9, <acronym>ECMA</> 128</entry>
<entry>Turkish</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO88599</></entry>
</row>
<entry><literal>LATIN6</literal></entry>
<entry>ISO 8859-10, <acronym>ECMA</> 144</entry>
<entry>Nordic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO885910</></entry>
</row>
<entry><literal>LATIN7</literal></entry>
<entry>ISO 8859-13</entry>
<entry>Baltic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO885913</></entry>
</row>
<entry><literal>LATIN8</literal></entry>
<entry>ISO 8859-14</entry>
<entry>Celtic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO885914</></entry>
</row>
<entry><literal>LATIN9</literal></entry>
<entry>ISO 8859-15</entry>
<entry>LATIN1 with Euro and accents</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO885915</></entry>
</row>
<entry><literal>LATIN10</literal></entry>
<entry>ISO 8859-16, <acronym>ASRO</> SR 14111</entry>
<entry>Romanian</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ISO885916</></entry>
</row>
<entry><literal>MULE_INTERNAL</literal></entry>
<entry>Mule internal code</entry>
<entry>Multilingual Emacs</entry>
+ <entry>Yes</entry>
<entry>1-4</entry>
<entry></entry>
</row>
<entry><literal>SJIS</literal></entry>
<entry>Shift JIS</entry>
<entry>Japanese</entry>
+ <entry>No</entry>
<entry>1-2</entry>
<entry><literal>Mskanji</>, <literal>ShiftJIS</>, <literal>WIN932</>, <literal>Windows932</></entry>
</row>
<entry><literal>SQL_ASCII</literal></entry>
<entry>unspecified (see text)</entry>
<entry><emphasis>any</></entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<entry><literal>UHC</literal></entry>
<entry>Unified Hangul Code</entry>
<entry>Korean</entry>
+ <entry>No</entry>
<entry>1-2</entry>
<entry><literal>WIN949</>, <literal>Windows949</></entry>
</row>
<entry><literal>UTF8</literal></entry>
<entry>Unicode, 8-bit</entry>
<entry><emphasis>all</></entry>
+ <entry>Yes</entry>
<entry>1-4</entry>
<entry><literal>Unicode</></entry>
</row>
<entry><literal>WIN866</literal></entry>
<entry>Windows CP866</entry>
<entry>Cyrillic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ALT</></entry>
</row>
<entry><literal>WIN874</literal></entry>
<entry>Windows CP874</entry>
<entry>Thai</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<entry><literal>WIN1250</literal></entry>
<entry>Windows CP1250</entry>
<entry>Central European</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<entry><literal>WIN1251</literal></entry>
<entry>Windows CP1251</entry>
<entry>Cyrillic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>WIN</></entry>
</row>
<entry><literal>WIN1252</literal></entry>
<entry>Windows CP1252</entry>
<entry>Western European</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
- <row>
- <entry><literal>WIN1253</literal></entry>
- <entry>Windows CP1253</entry>
- <entry>Greek</entry>
- <entry>1</entry>
+ <row>
+ <entry><literal>WIN1253</literal></entry>
+ <entry>Windows CP1253</entry>
+ <entry>Greek</entry>
+ <entry>Yes</entry>
+ <entry>1</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry><literal>WIN1254</literal></entry>
+ <entry>Windows CP1254</entry>
+ <entry>Turkish</entry>
+ <entry>Yes</entry>
+ <entry>1</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry><literal>WIN1255</literal></entry>
+ <entry>Windows CP1255</entry>
+ <entry>Hebrew</entry>
+ <entry>Yes</entry>
+ <entry>1</entry>
<entry></entry>
</row>
- <row>
- <entry><literal>WIN1254</literal></entry>
- <entry>Windows CP1254</entry>
- <entry>Turkish</entry>
- <entry>1</entry>
- <entry></entry>
- </row>
- <row>
- <entry><literal>WIN1255</literal></entry>
- <entry>Windows CP1255</entry>
- <entry>Hebrew</entry>
- <entry>1</entry>
- <entry></entry>
- </row>
<row>
<entry><literal>WIN1256</literal></entry>
<entry>Windows CP1256</entry>
<entry>Arabic</entry>
+ <entry>Yes</entry>
+ <entry>1</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry><literal>WIN1257</literal></entry>
+ <entry>Windows CP1257</entry>
+ <entry>Baltic</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
- <row>
- <entry><literal>WIN1257</literal></entry>
- <entry>Windows CP1257</entry>
- <entry>Baltic</entry>
- <entry>1</entry>
- <entry></entry>
- </row>
<row>
<entry><literal>WIN1258</literal></entry>
<entry>Windows CP1258</entry>
<entry>Vietnamese</entry>
+ <entry>Yes</entry>
<entry>1</entry>
<entry><literal>ABC</>, <literal>TCVN</>, <literal>TCVN5712</>, <literal>VSCII</></entry>
</row>
<para>
<productname>PostgreSQL</productname> supports automatic
character set conversion between server and client for certain
- character sets. The conversion information is stored in the
- <literal>pg_conversion</> system catalog. You can create a new
- conversion by using the SQL command <command>CREATE
- CONVERSION</command>. <productname>PostgreSQL</> comes with some
- predefined conversions. They are listed in <xref
- linkend="multibyte-translation-table">.
+ character set combinations. The conversion information is stored in the
+ <literal>pg_conversion</> system catalog. <productname>PostgreSQL</>
+ comes with some predefined conversions, as shown in <xref
+ linkend="multibyte-translation-table">. You can create a new
+ conversion using the SQL command <command>CREATE CONVERSION</command>.
</para>
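<para>
As a hedged illustration of defining a conversion, the command below
registers one from <literal>UTF8</literal> to <literal>LATIN1</literal>.
Both <literal>myconv</literal> and <literal>myfunc</literal> are
hypothetical names used only for this sketch, and it assumes that a
suitable conversion function called <literal>myfunc</literal> has
already been created:
<programlisting>
CREATE CONVERSION myconv FOR 'UTF8' TO 'LATIN1' FROM myfunc;
</programlisting>
</para>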
<table id="multibyte-translation-table">
SET CLIENT_ENCODING TO '<replaceable>value</>';
</programlisting>
- Also you can use the more standard SQL syntax <literal>SET NAMES</literal> for this purpose:
+ You can also use the standard SQL syntax <literal>SET NAMES</literal>
+ for this purpose:
<programlisting>
SET NAMES '<replaceable>value</>';
If the conversion of a particular character is not possible
— suppose you chose <literal>EUC_JP</literal> for the
server and <literal>LATIN1</literal> for the client, then some
- Japanese characters cannot be converted to
- <literal>LATIN1</literal> — it is transformed to its
- hexadecimal byte values in parentheses, e.g.,
- <literal>(826C)</literal>.
+ Japanese characters do not have a representation in
+ <literal>LATIN1</literal> — then an error is reported.
</para>
<para>