From bf00bbb0c4940b80b46b7e5b379cd64184f2262f Mon Sep 17 00:00:00 2001
From: Marc G. Fournier
Date: Fri, 24 Jul 1998 03:32:46 +0000
Subject: I really hope that I haven't missed anything in this one...

From: t-ishii@sra.co.jp

Attached are patches to enhance the multi-byte support.  (patches are
against 7/18 snapshot)

* determine encoding at initdb/createdb rather than compile time

Now initdb and createdb each have an option to specify the encoding. I
also modified the syntax of CREATE DATABASE to accept an encoding
option. See README.mb for more details.

For this purpose I have added a new column "encoding" to pg_database.
pg_attribute and pg_class are also changed to keep up with the
modification to pg_database. Specifically, I have added
pg_database_mb.h, pg_attribute_mb.h and pg_class_mb.h. These are used
only when MB is enabled. The reason for having separate files is that I
couldn't find a way to use ifdef or the like in those files. I have to
admit it looks ugly, but I found no other way.

* support for PGCLIENTENCODING when issuing COPY command

commands/copy.c modified.

* support for SQL92 syntax "SET NAMES"

See gram.y.

* support for LATIN2-5
* add UNICODE regression test case
* new test suite for MB

New directory test/mb added.

* clean up source files

The basic idea is to give MB its own subdirectories for easier
maintenance. These are include/mb and backend/utils/mb.
---
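Reader's note: the README changes in this patch describe a fallback
order for choosing a database's encoding (createdb -E or CREATE DATABASE
... WITH ENCODING first, then the initdb -e default, then the
compile-time setting). A minimal Python sketch of that order follows. It
is an illustration only: resolve_encoding and its parameter names are
hypothetical and do not appear in the PostgreSQL source.

```python
from typing import Optional


def resolve_encoding(createdb_opt: Optional[str],
                     initdb_opt: Optional[str],
                     compiled_default: str) -> str:
    """Return the encoding a new database would get (hypothetical model)."""
    # An explicit createdb -E / CREATE DATABASE ... WITH ENCODING wins.
    if createdb_opt is not None:
        return createdb_opt
    # Otherwise the installation default chosen with initdb -e applies.
    if initdb_opt is not None:
        return initdb_opt
    # With neither given, the compile-time setting is used.
    return compiled_default


if __name__ == "__main__":
    # createdb -E EUC_KR on an installation initialized with EUC_JP
    print(resolve_encoding("EUC_KR", "EUC_JP", "EUC_JP"))
    # no option anywhere: the compile-time encoding is the result
    print(resolve_encoding(None, None, "LATIN1"))
```

Passing None models an omitted option; the third argument stands for
whatever encoding was chosen at build time.
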
 doc/README.mb    | 60 ++++++++++++++++++++++++++++++++++++++++++------
 doc/README.mb.jp | 70 +++++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 110 insertions(+), 20 deletions(-)

(limited to 'doc')
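
Reader's note: as a rough illustration of the frontend/backend encoding
translation that PGCLIENTENCODING requests, the Python sketch below
re-encodes bytes between encodings and shows a case where translation is
not possible. It relies on Python's own codecs rather than the backend's
conversion routines; the CODECS mapping from PostgreSQL encoding names to
Python codec names and both function names are assumptions made for this
example.

```python
# Assumption: Python codec names standing in for PostgreSQL encoding names.
CODECS = {
    "EUC_JP": "euc_jp",
    "EUC_KR": "euc_kr",
    "SJIS": "shift_jis",
    "LATIN1": "latin_1",
}


def translate(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from the source encoding into the target encoding."""
    text = data.decode(CODECS[source])
    return text.encode(CODECS[target])


def can_translate(data: bytes, source: str, target: str) -> bool:
    """Report whether data can be represented in the target encoding."""
    try:
        translate(data, source, target)
    except UnicodeError:
        return False
    return True


if __name__ == "__main__":
    # Three kanji, written as escapes so this block stays ASCII.
    sjis_row = "\u65e5\u672c\u8a9e".encode("shift_jis")  # sent by an SJIS client
    euc_row = translate(sjis_row, "SJIS", "EUC_JP")      # form an EUC_JP backend stores
    print(can_translate(euc_row, "EUC_JP", "SJIS"))
    print(can_translate(euc_row, "EUC_JP", "LATIN1"))
```

The second check mirrors the README's caveat: Japanese text held by an
EUC_JP backend has no LATIN1 representation, so that translation fails.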

diff --git a/doc/README.mb b/doc/README.mb
index 775d05c48ba..d5436d16039 100644
--- a/doc/README.mb
+++ b/doc/README.mb
@@ -1,4 +1,4 @@
-postgresql 6.4 multi-byte (MB) support README	  Jun 5 1998
+postgresql 6.4 multi-byte (MB) support README	  Jul 22 1998
 
 						Tatsuo Ishii
 						t-ishii@sra.co.jp
@@ -10,7 +10,10 @@ The MB support is intended for allowing PostgreSQL to handle
 multi-byte character sets such as EUC(Extended Unix Code), Unicode and
 Mule internal code. With the MB enabled you can use multi-byte
 character sets in regexp ,LIKE and some functions. The encoding system
-chosen is determined at the compile time.
+chosen is determined when initializing your PostgreSQL installation
+using initdb(1). Note that this can be overridden when creating a
+database using createdb(1) or the CREATE DATABASE SQL command, so you
+can have multiple databases with different encoding systems.
 
 MB also fixes some problems concerning with 8-bit single byte
 character sets including ISO8859. (I would not say all of problems
@@ -36,7 +39,11 @@ where encoding_system is one of:
 	EUC_TW			Taiwan EUC
 	UNICODE			Unicode(UTF-8)
 	MULE_INTERNAL		Mule internal
-	LATIN1			ISO 8859-1 English and some European laguages
+	LATIN1			ISO 8859-1 English and some European languages
+	LATIN2			ISO 8859-2 English and some European languages
+	LATIN3			ISO 8859-3 English and some European languages
+	LATIN4			ISO 8859-4 English and some European languages
+	LATIN5			ISO 8859-5 English and some European languages
 
 Example:
 
@@ -50,7 +57,28 @@ Example:
 If MB is disabled, nothing is changed except better supporting for
 8-bit single byte character sets.
 
-2. PGCLIENTENCODING
+2. How to set encoding
+
+The initdb command defines the default encoding for a PostgreSQL
+installation. For example:
+
+	% initdb -e EUC_JP
+
+sets the default encoding to EUC_JP (Extended Unix Code for Japanese).
+Note that you can use "-pgencoding" instead of "-e" if you prefer a
+longer option string :-) If no -e or -pgencoding option is given, the
+encoding specified at compile time is used.
+
+You can create a database with a different encoding.
+
+	% createdb -E EUC_KR korean
+
+will create a database named "korean" with EUC_KR encoding. Another
+way to accomplish this is to use an SQL command:
+
+	CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
+
+3. PGCLIENTENCODING
 
 If an environment variable PGCLIENTENCODING is defined on the
 frontend, automatic encoding translation is done by the backend. For
@@ -68,7 +96,11 @@ Supported encodings for PGCLIENTENCODING are:
 	EUC_KR			Korean EUC
 	EUC_TW			Taiwan EUC
 	MULE_INTERNAL		Mule internal
-	LATIN1			ISO 8859-1 English and some European laguages
+	LATIN1			ISO 8859-1 English and some European languages
+	LATIN2			ISO 8859-2 English and some European languages
+	LATIN3			ISO 8859-3 English and some European languages
+	LATIN4			ISO 8859-4 English and some European languages
+	LATIN5			ISO 8859-5 English and some European languages
 
 Note that UNICODE is not supported(yet). Also note that the
 translation is not always possible. Suppose you choose EUC_JP for the
@@ -86,7 +118,12 @@ new command:
 	SET CLIENT_ENCODING TO 'encoding';
 
 where encoding is one of the encodings those can be set to
-PGCLIENTENCODING.  To query the current the frontend encoding:
+PGCLIENTENCODING. You can also use the SQL92 syntax "SET NAMES" for
+this purpose:
+
+	SET NAMES 'encoding';
+
+To query the current frontend encoding:
 
 	SHOW CLIENT_ENCODING;
 
@@ -114,7 +151,16 @@ Unicode: http://www.unicode.org/
 
 5. History
 
-Jun 5, 1988
+Jul 22, 1998
+	* determine encoding at initdb/createdb rather than compile time
+	* support for PGCLIENTENCODING when issuing COPY command
+	* support for SQL92 syntax "SET NAMES"
+	* support for LATIN2-5
+	* add UNICODE regression test case
+	* new test suite for MB
+	* clean up source files
+
+Jun 5, 1998
 	* add support for the encoding translation between the backend
 	  and the frontend
 	* new command SET CLIENT_ENCODING etc. added
diff --git a/doc/README.mb.jp b/doc/README.mb.jp
index 3fde80f4205..08476595dc0 100644
--- a/doc/README.mb.jp
+++ b/doc/README.mb.jp
@@ -1,4 +1,4 @@
-postgresql 6.3.2 multi-byte (MB) support README	       created 1998/5/25
+postgresql 6.4 multi-byte (MB) support README	       created 1998/7/22
 
 							石井達夫
 						t-ishii@sra.co.jp
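@@ -9,7 +9,7 @@ postgresql 6.3.2 multi-byte (MB) support README	       created 1998/5/25
   Multi-byte support in PostgreSQL has the following features.
 
     1. The EUC for each language (Japanese, Chinese, etc.), Unicode,
-      mule internal code, or ISO-8859-1 can be selected at compile time.
+      mule internal code, or ISO-8859-1 can be selected at database creation time.
       Data is stored in the database in that encoding as is.
     2. Multi-byte characters can be used in table names (provided the OS
       allows multi-byte file names).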
@@ -23,6 +23,7 @@ postgresql 6.3.2 multi-byte (MB) support README	       created 1998/5/25
       differs from that of the backend, the encoding is converted automatically.
 
 Installation:
+
   By default, PostgreSQL does not support multi-byte.
   This section explains how to enable multi-byte support.
 
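@@ -34,9 +35,11 @@ postgresql 6.3.2 multi-byte (MB) support README	       created 1998/5/25
 
   % configure --with-mb=EUC_JP
 
-  The following encodings, including EUC_JP, can be specified.
-  (In the current implementation the encoding is fixed at compile time
-   and cannot be changed dynamically at run time.)
+  The following encodings, including EUC_JP, can be specified when
+  initializing the installation with initdb and when creating a database
+  (with the Unix command createdb or the SQL command create database).
+  The encoding specified in Makefile.custom or to configure becomes the
+  default encoding for initdb.
 
 	EUC_JP		Japanese EUC
 	EUC_CN		Chinese EUC based on GB. code set 2 is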
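@@ -48,9 +51,9 @@ postgresql 6.3.2 multi-byte (MB) support README	       created 1998/5/25
 			that is, up to 0xffff.
 	MULE_INTERNAL	mule internal code. However, Type N variable-length
 			characters are not supported.
-	LATIN1		ISO8859 Latin 1. It is single-byte, but it is
-			included as an experiment :-) Note that LATIN2
-			etc. are not supported.
+	LATIN*		ISO8859 Latin series. * can be 1 through 5.
+			These are single-byte, but they are included
+			as an experiment :-)
 
   As a guide to choosing: if you use only English and Japanese, use EUC_JP
   (likewise EUC_CN if you use only Chinese, and so on); if you also want to use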
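@@ -69,13 +72,42 @@ postgresql 6.3.2 multi-byte (MB) support README	       created 1998/5/25
   http://www.sra.co.jp/people/t-ishii/PostgreSQL/ also gives brief
   installation instructions.
 
+Specifying the encoding for initdb/createdb/create database
+
+  With initdb, the encoding can be specified with the following options.
+
+	-e encoding
+	-pgencoding encoding
+
+  The encoding specified here becomes the one used whenever the encoding
+  is later omitted from createdb/create database. If the -e or -pgencoding
+  option is omitted, the encoding specified in Makefile.custom or to
+  configure is used.
+
+  With createdb, the encoding can be specified with the following option.
+
+	-E encoding
+
+  With create database, the encoding can be specified with the following option.
+
+	CREATE DATABASE dbname WITH ENCODING = 'encoding';
+
+  To specify LOCATION at the same time, write it as follows.
+
+	CREATE DATABASE dbname WITH LOCATION = 'path' ENCODING = 'encoding';
+
+  If the encoding is omitted, createdb/create database use the encoding
+  specified at initdb.
+
 About the environment variable PGCLIENTENCODING:
 
-  By default, the server-side encoding specified at compile time and the
-  encoding of clients such as psql are assumed to be the same. To use an
-  encoding different from the server's, set the environment variable
-  PGCLIENTENCODING. In addition to the encodings listed above, SJIS
-  (Shift JIS) can be specified.
+  If the environment variable PGCLIENTENCODING is not set, libpq asks the
+  server for its encoding at the start of the session and sets the
+  environment variable PGCLIENTENCODING to that value.
+
+  If the environment variable PGCLIENTENCODING is set, its value takes
+  precedence, and an encoding different from the server's can be used. In
+  addition to the encodings listed above, SJIS (Shift JIS) can be specified.
 
 	Incidentally, SJIS also supports the 1-byte kana of JISX0201, the
 	so-called "half-width katakana" (by no means a recommendation to use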
@@ -150,6 +182,18 @@ postgresql 6.3.2 multi-byte (MB) support README	       created 1998/5/25
 
 Revision history:
 
+  1998/7/22 Released patches for 6.4 alpha.
+	* implemented the ability to set the server-side encoding with
+          initdb/createdb/create database. For this, a new column encoding
+          is added to the system catalog pg_database (only when MB is enabled)
+	* copy now supports PGCLIENTENCODING
+	* support for SQL92 "SET NAMES" (only when MB is enabled)
+	* support for LATIN2-5
+	* added a unicode test case to the regression test
+	* added test/mb, a regression test directory dedicated to MB
+	* substantially reorganized the location of source files. MB-related
+	  files now go in include/mb and backend/utils/mb
+
   1998/5/25 Bug fixes (released to the pgsql-jp ML as mb_b3.patch;
 	expected to be merged into the 6.4 snapshot in the main distribution)	
 
-- 
cgit v1.2.3