From 55613bf9cd7d6071e43e68ac14bc0243a1027507 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan
Date: Tue, 18 Sep 2007 17:41:17 +0000
Subject: Close previously open holes for invalidly encoded data to enter the database via builtin functions, as recently discussed on -hackers.

chr() now returns a character in the database encoding. For UTF8 encoded databases the argument is treated as a Unicode code point. For other multi-byte encodings the argument must designate a strict ascii character, or an error is raised, as is also the case if the argument is 0.

ascii() is adjusted so that it remains the inverse of chr().

The two argument form of convert() is gone, and the three argument form now takes a bytea first argument and returns a bytea. To cover this loss three new functions are introduced:
  . convert_from(bytea, name) returns text - converts the first argument from the named encoding to the database encoding
  . convert_to(text, name) returns bytea - converts the first argument from the database encoding to the named encoding
  . length(bytea, name) returns int - gives the length of the first argument in characters in the named encoding
---
 doc/src/sgml/func.sgml | 79 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 68 insertions(+), 11 deletions(-)

(limited to 'doc/src')

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 393c1e31766..b3dae7dcead 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1,4 +1,4 @@
-
+
 Functions and Operators
@@ -1122,13 +1122,14 @@
 convert(string using conversion_name)
-text
+bytea
 Change encoding using specified conversion name. Conversions can be defined by CREATE CONVERSION. Also there are some pre-defined conversion names. See for available conversion
-names.
+names. The string must be valid in the
+source encoding.
 convert('PostgreSQL' using iso_8859_1_to_utf8)
 'PostgreSQL' in UTF8 (Unicode, 8-bit) encoding
@@ -1244,6 +1245,12 @@
 chr
+
+convert_from
+
+
+convert_to
+
 decode
@@ -1319,7 +1326,12 @@
 ascii(string)
 int
-ASCII code of the first byte of the argument
+
+ASCII code of the first character of the argument.
+For UTF8 returns the Unicode code point of the character.
+For other multi-byte encodings. the argument must be a strictly
+ASCII character.
+
 ascii('x')
 120
@@ -1340,29 +1352,61 @@
 chr(int)
 text
-Character with the given ASCII code
+
+Character with the given code. For UTF8 the argument is
+treated as a Unicode code point. For other multi-byte encodings the argument
+must designate a strictly ASCII character.
+
 chr(65)
 A
-convert(string text,
-src_encoding name,
+convert(string bytea,
+src_encoding name,
 dest_encoding name)
-text
+bytea
 Convert string to dest_encoding. The original encoding is specified by
-src_encoding. If
-src_encoding is omitted, database
-encoding is assumed.
+src_encoding. The string
+must be valid in this encoding.
 convert( 'text_in_utf8', 'UTF8', 'LATIN1')
 text_in_utf8 represented in ISO 8859-1 encoding
+
+
+convert_from(string bytea,
+src_encoding name)
+
+text
+
+Convert string to the database encoding.
+The original encoding is specified by
+src_encoding. The string
+must be valid in this encoding.
+
+convert_from( 'text_in_utf8', 'UTF8')
+text_in_utf8 represented in the current database encoding
+
+
+
+
+convert_to(string text,
+dest_encoding name)
+
+text
+
+Convert string to dest_encoding.
+
+convert_to( 'some text', 'UTF8')
+some text represented in the UTF8 encoding
+
+
 decode(string text,
@@ -1415,6 +1459,19 @@
 4
+
+length(stringbytea,
+encoding name )
+int
+
+Number of characters in string in the
+given encoding. The
+string must be valid in this encoding.
+
+length('jose', 'UTF8')
+4
+
+
 lpad(string text,
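
The behaviour described in the commit message can be tried out with a few queries. This is only an illustrative sketch, not part of the patch: it assumes a server that includes this change and a database whose encoding is UTF8, and the literal values are arbitrary examples.

```sql
-- Assumption: the database encoding is UTF8.

-- chr() treats its argument as a Unicode code point,
-- and ascii() remains its inverse.
SELECT chr(8364);           -- the euro sign, U+20AC
SELECT ascii(chr(8364));    -- 8364

-- convert_to() goes from text (database encoding) to bytea in the named
-- encoding; convert_from() goes from bytea in the named encoding back to text.
SELECT convert_to('some text', 'LATIN1');
SELECT convert_from(convert_to('some text', 'LATIN1'), 'LATIN1');  -- some text

-- length(bytea, name) counts characters in the named encoding.
SELECT length(convert_to('jose', 'UTF8'), 'UTF8');  -- 4

-- Per the commit message, an argument of 0 raises an error.
SELECT chr(0);
```

In a database with a different multi-byte encoding, the commit message implies that chr() and ascii() only accept strictly ASCII values, so the first two queries would be expected to fail there.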