From 7cd082f907814f0fe90918399cbb95fd83f161c9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 7 Sep 2010 18:54:09 +0000
Subject: [PATCH] Clarify that surrogate pairs are not encoded in UTF-8 directly

---
 doc/src/sgml/syntax.sgml | 49 +++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index ca092b5ae6e..18582b9216c 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1,4 +1,4 @@
-
+
 SQL Syntax
@@ -236,12 +236,15 @@ U&"d!0061t!+000061" UESCAPE '!'
 The Unicode escape syntax works only when the server encoding is
- UTF8. When other server encodings are used, only code points in
- the ASCII range (up to \007F) can be specified.
- Both the 4-digit and the 6-digit form can be used to specify
- UTF-16 surrogate pairs to compose characters with code points
- larger than U+FFFF (although the availability of
- the 6-digit form technically makes this unnecessary).
+ UTF8. When other server encodings are used, only code
+ points in the ASCII range (up to \007F) can be
+ specified. Both the 4-digit and the 6-digit form can be used to
+ specify UTF-16 surrogate pairs to compose characters with code
+ points larger than U+FFFF, although the availability of the
+ 6-digit form technically makes this unnecessary. (When surrogate
+ pairs are used when the server encoding is UTF8, they
+ are first combined into a single code point that is then encoded
+ in UTF-8.)
@@ -431,13 +434,15 @@ SELECT 'foo' 'bar';
 The Unicode escape syntax works fully only when the server
- encoding is UTF-8. When other server encodings are used, only
- code points in the ASCII range (up to \u007F) can be
- specified. Both the 4-digit and the 8-digit form can be used to
- specify UTF-16 surrogate pairs to compose characters with code
- points larger than U+FFFF (although the
- availability of the 8-digit form technically makes this
- unnecessary).
+ encoding is UTF8.
When other server encodings are
+ used, only code points in the ASCII range (up
+ to \u007F) can be specified. Both the 4-digit and
+ the 8-digit form can be used to specify UTF-16 surrogate pairs to
+ compose characters with code points larger than U+FFFF, although
+ the availability of the 8-digit form technically makes this
+ unnecessary. (When surrogate pairs are used when the server
+ encoding is UTF8, they are first combined into a
+ single code point that is then encoded in UTF-8.)
@@ -517,13 +522,15 @@ U&'d!0061t!+000061' UESCAPE '!'
 The Unicode escape syntax works only when the server encoding is
- UTF8. When other server encodings are used, only code points in
- the ASCII range (up to \007F) can be
- specified.
- Both the 4-digit and the 6-digit form can be used to specify
- UTF-16 surrogate pairs to compose characters with code points
- larger than U+FFFF (although the availability
- of the 6-digit form technically makes this unnecessary).
+ UTF8. When other server encodings are used, only
+ code points in the ASCII range (up to \007F)
+ can be specified. Both the 4-digit and the 6-digit form can be
+ used to specify UTF-16 surrogate pairs to compose characters with
+ code points larger than U+FFFF, although the availability of the
+ 6-digit form technically makes this unnecessary. (When surrogate
+ pairs are used when the server encoding is UTF8, they
+ are first combined into a single code point that is then encoded
+ in UTF-8.)
-- 
2.30.2
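
Reviewer's illustration, not part of the commit: the parenthetical added in each hunk says that a surrogate pair is first combined into a single code point, which is then encoded in UTF-8. A minimal Python sketch of that behavior follows, assuming the standard UTF-16 surrogate arithmetic. The function names are hypothetical and this is not PostgreSQL's implementation.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high (leading) surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low (trailing) surrogate")
    # Each surrogate carries 10 bits of the code point's offset above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_pair_utf8(high: int, low: int) -> bytes:
    """Encode the combined code point, not the two surrogates, in UTF-8."""
    return chr(combine_surrogates(high, low)).encode("utf-8")


def encode_surrogates_directly(high: int, low: int) -> bytes:
    """For contrast only: what encoding each surrogate on its own would give.

    This is the result the patch says does NOT happen; "surrogatepass" is
    Python's error handler that permits it for demonstration purposes.
    """
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


if __name__ == "__main__":
    # The pair D83D DE00 denotes the single code point U+1F600.
    print(hex(combine_surrogates(0xD83D, 0xDE00)))
    print(encode_pair_utf8(0xD83D, 0xDE00).hex(" "))
    print(encode_surrogates_directly(0xD83D, 0xDE00).hex(" "))
```

The combined form yields one four-byte UTF-8 sequence, whereas encoding the surrogates individually would yield two three-byte sequences, which is not valid UTF-8.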