Support text position search functions with nondeterministic collations
author Peter Eisentraut <peter@eisentraut.org>
Fri, 21 Feb 2025 11:21:17 +0000 (12:21 +0100)
committer Peter Eisentraut <peter@eisentraut.org>
Fri, 21 Feb 2025 11:21:17 +0000 (12:21 +0100)
commit 329304c9012b2ac6d906afeb18062f9080dceef9
tree 77532f3281bbd7f3b6abc577d64331a3992a11a6
parent 41336bf085599892b37ecfeace1576d9ae9a599a

This allows using text position search functions with nondeterministic
collations.  These functions are

- position, strpos
- replace
- split_part
- string_to_array
- string_to_table

which all use common internal infrastructure.

Previously there was no internal implementation for this case, so
calling these functions with a nondeterministic collation raised a
not-supported error.  This commit adds the implementation and removes
the error.

Unlike with deterministic collations, the search cannot use any
byte-by-byte optimized techniques but has to go substring by
substring.  We also need to consider that the found match could have a
different length than the needle and that there could be substrings of
different length matching at a position.  In most cases, we need to
find the longest such substring (greedy semantics), but this can be
configured by each caller.
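
The substring-by-substring search described above can be sketched
roughly as follows.  This is an illustration only, not the code in
varlena.c: toy_equal() and toy_search() are hypothetical names, and the
"collation" here is a toy ASCII-only equivalence that ignores '-' and
letter case, standing in for a real nondeterministic collation.  The
sketch also advances one byte at a time, which is only valid for
single-byte characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy stand-in for a nondeterministic collation (an assumption of this
 * sketch): two byte ranges are equal if they match after dropping '-'
 * characters and folding ASCII case.
 */
static bool
toy_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      i = 0;
    size_t      j = 0;

    for (;;)
    {
        /* Skip ignorable characters on both sides. */
        while (i < alen && a[i] == '-')
            i++;
        while (j < blen && b[j] == '-')
            j++;
        if (i == alen || j == blen)
            return i == alen && j == blen;
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[j]))
            return false;
        i++;
        j++;
    }
}

/*
 * Find the first position in the haystack where some substring is equal
 * to the needle under toy_equal().  Returns the byte offset or -1.
 * *match_len receives the length of the matched substring, which may
 * differ from needle_len.  With greedy = true the longest matching
 * substring at that position is reported, otherwise the shortest.
 */
static long
toy_search(const char *hay, size_t hay_len,
           const char *needle, size_t needle_len,
           bool greedy, size_t *match_len)
{
    assert(needle_len > 0);

    for (size_t start = 0; start < hay_len; start++)
    {
        bool        found = false;

        /* Try every candidate substring beginning at this position. */
        for (size_t len = 1; start + len <= hay_len; len++)
        {
            if (toy_equal(hay + start, len, needle, needle_len))
            {
                *match_len = len;
                found = true;
                if (!greedy)
                    break;      /* the shortest match is enough */
            }
        }
        if (found)
            return (long) start;
    }
    return -1;
}
```

Searching "x-A-b--y" for "ab" with this sketch finds a match at byte
offset 1 in both modes, but the reported match length is 6 ("-A-b--")
with greedy semantics and 4 ("-A-b") without.  A caller such as a
replace operation needs that length, rather than the needle length, to
know how much of the haystack to skip or substitute.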

Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://www.postgresql.org/message-id/flat/582b2613-0900-48ca-8b0d-340c06f4d400@eisentraut.org
src/backend/utils/adt/varlena.c
src/test/regress/expected/collate.icu.utf8.out
src/test/regress/sql/collate.icu.utf8.sql