author | Tom Lane | 2009-12-01 21:00:24 +0000 |
---|---|---|
committer | Tom Lane | 2009-12-01 21:00:24 +0000 |
commit | 0d32342501f2a562bc57156dc92d59a0624be4a6 | |
tree | 9039a0f5bdc634c1a7dfa99371160e51e1759168 /src/include/regex | |
parent | ef51395e24c7452a9a50e3576b52fb64602f8cad | |
Teach the regular expression functions to do case-insensitive matching and
locale-dependent character classification properly when the database encoding
is UTF8.
The previous coding worked okay in single-byte encodings, or in any case for
ASCII characters, but failed entirely on multibyte characters. The fix
assumes that the <wctype.h> functions use Unicode code points as the wchar
representation for Unicode, i.e., wchar matches pg_wchar.
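To make that assumption concrete, here is a minimal C sketch of how a case-folding helper might branch on the database encoding. The names `sketch_wchar`, `sketch_encoding` and `sketch_wc_tolower` are hypothetical, not PostgreSQL identifiers; the encoding is passed in as a parameter instead of being looked up from the server, and the non-UTF8 multibyte branch is simplified to fold ASCII letters only. The 16-bit `wchar_t` guard is an added precaution, not something stated in this commit.

```c
#include <ctype.h>
#include <limits.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for PostgreSQL's pg_wchar (a 32-bit code value). */
typedef uint32_t sketch_wchar;

/* Hypothetical encoding categories; real code would consult the database
 * encoding rather than take it as a parameter. */
typedef enum
{
    SKETCH_ENC_UTF8,            /* values are Unicode code points */
    SKETCH_ENC_SINGLE_BYTE,     /* values fit in an unsigned char */
    SKETCH_ENC_OTHER_MULTIBYTE  /* e.g. EUC_JP: no wctype mapping assumed */
} sketch_encoding;

static sketch_wchar
sketch_wc_tolower(sketch_wchar c, sketch_encoding enc)
{
    if (enc == SKETCH_ENC_UTF8)
    {
        /*
         * Key assumption: wchar_t uses Unicode code points, so a code point
         * can be handed straight to towlower().  The size check is an added
         * guard for platforms whose wchar_t is only 16 bits wide.
         */
        if (sizeof(wchar_t) >= 4 || c <= 0xFFFF)
            return (sketch_wchar) towlower((wint_t) c);
        return c;
    }
    if (enc == SKETCH_ENC_SINGLE_BYTE)
    {
        /* Single-byte encodings: <ctype.h> already follows the locale. */
        if (c <= UCHAR_MAX)
            return (sketch_wchar) tolower((int) c);
        return c;
    }
    /* Other multibyte encodings (simplified): fold ASCII letters only. */
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;
}
```

In the default "C" locale, towlower() is only required to map ASCII letters; a program would normally call setlocale(LC_CTYPE, "") from <locale.h> first so that the UTF8 branch folds non-ASCII letters according to the environment's locale.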
This is only a partial solution, since we're still stupid about non-ASCII
characters in multibyte encodings other than UTF8. The practical effect
of that is limited, however, since those cases are generally Far Eastern
glyphs for which concepts like case-folding don't apply anyway. Certainly
all or nearly all of the field reports of problems have been about UTF8.
A more general solution would require switching to the platform's wchar
representation for all regex operations, which is possible but would have
substantial disadvantages. Let's try this and see if it's sufficient in
practice.
Diffstat (limited to 'src/include/regex')
-rw-r--r-- | src/include/regex/regcustom.h | 13 |
1 files changed, 12 insertions, 1 deletions
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 269f926be85..d1a07dd00e8 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -25,7 +25,7 @@
  * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
- * $PostgreSQL: pgsql/src/include/regex/regcustom.h,v 1.7 2008/02/14 17:33:37 tgl Exp $
+ * $PostgreSQL: pgsql/src/include/regex/regcustom.h,v 1.8 2009/12/01 21:00:24 tgl Exp $
  */

 /* headers if any */
@@ -34,6 +34,17 @@
 #include <ctype.h>
 #include <limits.h>

+/*
+ * towlower() and friends should be in <wctype.h>, but some pre-C99 systems
+ * declare them in <wchar.h>.
+ */
+#ifdef HAVE_WCHAR_H
+#include <wchar.h>
+#endif
+#ifdef HAVE_WCTYPE_H
+#include <wctype.h>
+#endif
+
 #include "mb/pg_wchar.h"
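The new includes give the regex code access to the wide-character classifiers as well as towlower(). The sketch below illustrates the classification side under the same assumption that, in UTF8, a pg_wchar value is a Unicode code point. `sketch_wchar` and `sketch_wc_isalpha` are hypothetical names, and a plain `is_utf8` flag stands in for the real encoding check. Unlike the header above, it includes <wchar.h> and <wctype.h> unconditionally, because the configure-generated HAVE_WCHAR_H and HAVE_WCTYPE_H macros are not defined outside a PostgreSQL build.

```c
#include <ctype.h>
#include <limits.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for pg_wchar. */
typedef uint32_t sketch_wchar;

/*
 * Return 1 if c is alphabetic, else 0.  When is_utf8 is nonzero, c is
 * assumed to be a Unicode code point and is passed to iswalpha(); otherwise
 * only values that fit in an unsigned char are classified with isalpha().
 */
static int
sketch_wc_isalpha(sketch_wchar c, int is_utf8)
{
    if (is_utf8)
    {
        /* Added guard: a 16-bit wchar_t cannot hold code points > 0xFFFF. */
        if (sizeof(wchar_t) < 4 && c > 0xFFFF)
            return 0;
        return iswalpha((wint_t) c) != 0;
    }
    if (c <= UCHAR_MAX)
        return isalpha((int) c) != 0;
    return 0;  /* non-UTF8 multibyte value: not classified */
}
```

As with case folding, the result for non-ASCII input depends on the LC_CTYPE locale that is in effect when iswalpha() is called.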