Add SQL functions for Unicode normalization
author Peter Eisentraut <peter@eisentraut.org>
Thu, 26 Mar 2020 07:14:00 +0000 (08:14 +0100)
committer Peter Eisentraut <peter@eisentraut.org>
Thu, 2 Apr 2020 06:56:27 +0000 (08:56 +0200)
commit 2991ac5fc9b3904ca4582be6d323497d7c3d17c9
tree d558847de39ee972b261026d4846f1f31e8dff12
parent 070c3d3937e75e04d36405287353b7eca516555d

This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and
check Unicode normal forms, per SQL standard.
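
As a rough illustration of what "convert" and "check" mean here, the
following Python sketch uses the standard unicodedata module.  It is an
analogy for the semantics only, not the PostgreSQL implementation, and
the helper name is_normalized is made up for this example.

```python
import unicodedata

# 'a' followed by COMBINING DIAERESIS (U+0308), then "bc": a decomposed string.
decomposed = "a\u0308bc"

# Converting to NFC composes the pair into the single code point U+00E4.
composed = unicodedata.normalize("NFC", decomposed)


def is_normalized(text, form="NFC"):
    """Hypothetical helper: a string is in a normal form exactly when
    normalizing it to that form leaves it unchanged."""
    return unicodedata.normalize(form, text) == text


print(composed == "\u00e4bc")
print(is_normalized(decomposed, "NFD"))
print(is_normalized(decomposed, "NFC"))
```

Note that this naive check always performs a full normalization, which
is the cost that a dedicated lookup table can often avoid.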

To support fast IS NORMALIZED tests, we pull in a new data file
DerivedNormalizationProps.txt from Unicode and build a lookup table
from that, using techniques similar to ones already used for other
Unicode data.  Running "make update-unicode" keeps the data file up to
date.  We build and use these tables only for the NFC and NFKC forms:
the tables for NFD and NFKD would be too big, and the speedup there is
not significant enough.
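
The idea behind such a lookup table can be sketched in Python along the
lines of the quick-check procedure described in Unicode Standard Annex
#15.  This is a sketch under stated assumptions, not the code in this
commit: the two SAMPLE lines only imitate the layout of
DerivedNormalizationProps.txt, and parse_props, lookup and quick_check
are hypothetical names.

```python
import bisect
import unicodedata

# Two sample lines in the style of DerivedNormalizationProps.txt (assumed
# excerpt for illustration; the real file has many more entries).
SAMPLE = """\
0300..0304    ; NFC_QC; M # Mn   [5] COMBINING GRAVE ACCENT..COMBINING MACRON
0340..0341    ; NFC_QC; N # Mn   [2] COMBINING GRAVE TONE MARK..COMBINING ACUTE TONE MARK
"""


def parse_props(text, prop="NFC_QC"):
    """Build a sorted list of (first, last, value) ranges for one property."""
    ranges = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) != 3 or fields[1] != prop:
            continue
        first, _, last = fields[0].partition("..")
        ranges.append((int(first, 16), int(last or first, 16), fields[2]))
    ranges.sort()
    return ranges


def lookup(ranges, cp):
    """Binary search; code points not listed default to 'Y' (allowed)."""
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return "Y"


def quick_check(text, ranges):
    """Return 'Y', 'N', or 'M' (maybe: a full normalization is needed)."""
    result, last_ccc = "Y", 0
    for ch in text:
        ccc = unicodedata.combining(ch)
        if ccc != 0 and last_ccc > ccc:
            return "N"          # combining marks out of canonical order
        value = lookup(ranges, ord(ch))
        if value == "N":
            return "N"
        if value == "M":
            result = "M"
        last_ccc = ccc
    return result


ranges = parse_props(SAMPLE)
print(quick_check("abc", ranges), quick_check("a\u0300", ranges))
```

A caller would treat 'Y' and 'N' as final answers and fall back to a
full normalize-and-compare only for 'M'.  With the two-line sample
table the answers are of course only meaningful for the code points it
covers.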

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
20 files changed:
doc/src/sgml/charset.sgml
doc/src/sgml/func.sgml
src/backend/catalog/sql_features.txt
src/backend/catalog/system_views.sql
src/backend/parser/gram.y
src/backend/utils/adt/varlena.c
src/common/unicode/.gitignore
src/common/unicode/Makefile
src/common/unicode/generate-unicode_normprops_table.pl [new file with mode: 0644]
src/common/unicode_norm.c
src/include/catalog/catversion.h
src/include/catalog/pg_proc.dat
src/include/common/unicode_norm.h
src/include/common/unicode_normprops_table.h [new file with mode: 0644]
src/include/parser/kwlist.h
src/test/regress/expected/unicode.out [new file with mode: 0644]
src/test/regress/expected/unicode_1.out [new file with mode: 0644]
src/test/regress/parallel_schedule
src/test/regress/serial_schedule
src/test/regress/sql/unicode.sql [new file with mode: 0644]