author: Peter Geoghegan, 2020-02-26 19:28:25 +0000
committer: Peter Geoghegan, 2020-02-26 19:28:25 +0000
commit: 612a1ab76724aa1514b6509269342649f8cab375
tree: df13756515fd71d6528958f2315123d89d41b817 (/doc/src)
parent: 4109bb5de4998b9301ea2ac18c9d6dfb0b4f900b
Add equalimage B-Tree support functions.

Invent the concept of a B-Tree equalimage ("equality implies image equality") support function, registered as support function 4. This indicates whether it is safe (or not safe) to apply optimizations that assume that any two datums considered equal by an operator class's order method must be interchangeable without any loss of semantic information. This is static information about an operator class and a collation.

Register an equalimage routine for almost all of the existing B-Tree opclasses. We only need two trivial routines for all of the opclasses that are included with the core distribution. There is one routine for opclasses that index non-collatable types (which returns 'true' unconditionally), plus another routine for collatable types (which returns 'true' when the collation is a deterministic collation).

This patch is infrastructure for an upcoming patch that adds B-Tree deduplication.

Author: Peter Geoghegan, Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
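
For readers who want a concrete picture of the two routines described above, here is a minimal standalone sketch in C. It is a toy model, not the code from this patch. Every name in it (ModelCollation, ModelColumn, model_equalimage_always, model_equalimage_deterministic, model_dedup_safe) is hypothetical, and collations are reduced to a single "deterministic" flag. The all-columns rule it models is the one stated in the btree.sgml text added by this commit: deduplication is deemed safe only when every indexed column's operator class registers an equalimage function and each one returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not PostgreSQL APIs. */
typedef unsigned int Oid;

typedef struct ModelCollation
{
    Oid  oid;            /* made-up collation identifier */
    bool deterministic;  /* assumed to be the only property that matters */
} ModelCollation;

/* Model of an equalimage routine: indexed type OID plus collation (or NULL). */
typedef bool (*equalimage_fn) (Oid opcintype, const ModelCollation *coll);

/* Routine for non-collatable types: equality always implies image equality. */
bool
model_equalimage_always(Oid opcintype, const ModelCollation *coll)
{
    (void) opcintype;
    (void) coll;
    return true;
}

/* Routine for collatable types: true only for a deterministic collation. */
bool
model_equalimage_deterministic(Oid opcintype, const ModelCollation *coll)
{
    (void) opcintype;
    return coll != NULL && coll->deterministic;
}

typedef struct ModelColumn
{
    Oid                   opcintype;   /* type indexed by the operator class */
    const ModelCollation *coll;        /* NULL for non-collatable types */
    equalimage_fn         equalimage;  /* NULL if no function is registered */
} ModelColumn;

/*
 * Deduplication is treated as safe only when every column has a registered
 * equalimage function and each function returns true for that column.
 */
bool
model_dedup_safe(const ModelColumn *cols, size_t ncols)
{
    assert(cols != NULL || ncols == 0);

    for (size_t i = 0; i < ncols; i++)
    {
        if (cols[i].equalimage == NULL)
            return false;       /* unregistered: condition cannot be assumed */
        if (!cols[i].equalimage(cols[i].opcintype, cols[i].coll))
            return false;       /* e.g. a nondeterministic collation */
    }
    return true;
}
```

In this model, an index on an integer column plus a text column with a deterministic collation is reported as safe, while replacing the collation with a nondeterministic one, or leaving any column without a registered function, makes model_dedup_safe return false.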
Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/btree.sgml | 96
-rw-r--r-- doc/src/sgml/ref/alter_opfamily.sgml | 7
-rw-r--r-- doc/src/sgml/ref/create_opclass.sgml | 14
-rw-r--r-- doc/src/sgml/xindex.sgml | 18
4 files changed, 121 insertions, 14 deletions
diff --git a/doc/src/sgml/btree.sgml b/doc/src/sgml/btree.sgml
index ac6c4423e60..fcf771c857f 100644
--- a/doc/src/sgml/btree.sgml
+++ b/doc/src/sgml/btree.sgml
@@ -207,7 +207,7 @@
<para>
As shown in <xref linkend="xindex-btree-support-table"/>, btree defines
- one required and two optional support functions. The three
+ one required and three optional support functions. The four
user-defined methods are:
</para>
<variablelist>
@@ -456,6 +456,100 @@ returns bool
</para>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><function>equalimage</function></term>
+ <listitem>
+ <para>
+ Optionally, a btree operator family may provide
+ <function>equalimage</function> (<quote>equality implies image
+ equality</quote>) support functions, registered under support
+ function number 4. These functions allow the core code to
+ determine when it is safe to apply the btree deduplication
+ optimization. Currently, <function>equalimage</function>
+ functions are only called when building or rebuilding an index.
+ </para>
+ <para>
+ An <function>equalimage</function> function must have the
+ signature
+<synopsis>
+equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
+</synopsis>
+ The return value is static information about an operator class
+ and collation. Returning <literal>true</literal> indicates that
+ the <function>order</function> function for the operator class is
+ guaranteed to only return <literal>0</literal> (<quote>arguments
+ are equal</quote>) when its <replaceable>A</replaceable> and
+ <replaceable>B</replaceable> arguments are also interchangeable
+ without any loss of semantic information. Not registering an
+ <function>equalimage</function> function or returning
+ <literal>false</literal> indicates that this condition cannot be
+ assumed to hold.
+ </para>
+ <para>
+ The <replaceable>opcintype</replaceable> argument is the
+ <literal><structname>pg_type</structname>.oid</literal> of the
+ data type that the operator class indexes. This is a convenience
+ that allows reuse of the same underlying
+ <function>equalimage</function> function across operator classes.
+ If <replaceable>opcintype</replaceable> is a collatable data
+ type, the appropriate collation OID will be passed to the
+ <function>equalimage</function> function, using the standard
+ <function>PG_GET_COLLATION()</function> mechanism.
+ </para>
+ <para>
+ As far as the operator class is concerned, returning
+ <literal>true</literal> indicates that deduplication is safe (or
+ safe for the collation whose OID was passed to its
+ <function>equalimage</function> function). However, the core
+ code will only deem deduplication safe for an index when
+ <emphasis>every</emphasis> indexed column uses an operator class
+ that registers an <function>equalimage</function> function, and
+ each function actually returns <literal>true</literal> when
+ called.
+ </para>
+ <para>
+ Image equality is <emphasis>almost</emphasis> the same condition
+ as simple bitwise equality. There is one subtle difference: When
+ indexing a varlena data type, the on-disk representation of two
+ image equal datums may not be bitwise equal due to inconsistent
+ application of <acronym>TOAST</acronym> compression on input.
+ Formally, when an operator class's
+ <function>equalimage</function> function returns
+ <literal>true</literal>, it is safe to assume that the
+ <literal>datum_image_eq()</literal> C function will always agree
+ with the operator class's <function>order</function> function
+ (provided that the same collation OID is passed to both the
+ <function>equalimage</function> and <function>order</function>
+ functions).
+ </para>
+ <para>
+ The core code is fundamentally unable to deduce anything about
+ the <quote>equality implies image equality</quote> status of an
+ operator class within a multiple-data-type family based on
+ details from other operator classes in the same family. Also, it
+ is not sensible for an operator family to register a cross-type
+ <function>equalimage</function> function, and attempting to do so
+ will result in an error. This is because <quote>equality implies
+ image equality</quote> status does not just depend on
+ sorting/equality semantics, which are more or less defined at the
+ operator family level. In general, the semantics that one
+ particular data type implements must be considered separately.
+ </para>
+ <para>
+ The convention followed by the operator classes included with the
+ core <productname>PostgreSQL</productname> distribution is to
+ register a stock, generic <function>equalimage</function>
+ function. Most operator classes register
+ <function>btequalimage()</function>, which indicates that
+ deduplication is safe unconditionally. Operator classes for
+ collatable data types such as <type>text</type> register
+ <function>btvarstrequalimage()</function>, which indicates that
+ deduplication is safe with deterministic collations. Best
+ practice for third-party extensions is to register their own
+ custom function to retain control.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect1>
diff --git a/doc/src/sgml/ref/alter_opfamily.sgml b/doc/src/sgml/ref/alter_opfamily.sgml
index 848156c9d7d..4ac1cca95a3 100644
--- a/doc/src/sgml/ref/alter_opfamily.sgml
+++ b/doc/src/sgml/ref/alter_opfamily.sgml
@@ -153,9 +153,10 @@ ALTER OPERATOR FAMILY <replaceable>name</replaceable> USING <replaceable class="
and hash functions it is not necessary to specify <replaceable
class="parameter">op_type</replaceable> since the function's input
data type(s) are always the correct ones to use. For B-tree sort
- support functions and all functions in GiST, SP-GiST and GIN operator
- classes, it is necessary to specify the operand data type(s) the function
- is to be used with.
+ support functions, B-Tree equal image functions, and all
+ functions in GiST, SP-GiST and GIN operator classes, it is
+ necessary to specify the operand data type(s) the function is to
+ be used with.
</para>
<para>
diff --git a/doc/src/sgml/ref/create_opclass.sgml b/doc/src/sgml/ref/create_opclass.sgml
index dd5252fd976..f42fb6494c6 100644
--- a/doc/src/sgml/ref/create_opclass.sgml
+++ b/doc/src/sgml/ref/create_opclass.sgml
@@ -171,12 +171,14 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
function is intended to support, if different from
the input data type(s) of the function (for B-tree comparison functions
and hash functions)
- or the class's data type (for B-tree sort support functions and all
- functions in GiST, SP-GiST, GIN and BRIN operator classes). These defaults
- are correct, and so <replaceable
- class="parameter">op_type</replaceable> need not be specified in
- <literal>FUNCTION</literal> clauses, except for the case of a B-tree sort
- support function that is meant to support cross-data-type comparisons.
+ or the class's data type (for B-tree sort support functions,
+ B-tree equal image functions, and all functions in GiST,
+ SP-GiST, GIN and BRIN operator classes). These defaults are
+ correct, and so <replaceable
+ class="parameter">op_type</replaceable> need not be specified
+ in <literal>FUNCTION</literal> clauses, except for the case of a
+ B-tree sort support function that is meant to support
+ cross-data-type comparisons.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index ffb5164aaa0..2e06ad01bf5 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -402,7 +402,7 @@
<para>
B-trees require a comparison support function,
- and allow two additional support functions to be
+ and allow three additional support functions to be
supplied at the operator class author's option, as shown in <xref
linkend="xindex-btree-support-table"/>.
The requirements for these support functions are explained further in
@@ -441,6 +441,13 @@
</entry>
<entry>3</entry>
</row>
+ <row>
+ <entry>
+ Determine if it is safe for indexes that use the operator
+ class to apply the btree deduplication optimization (optional)
+ </entry>
+ <entry>4</entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -980,7 +987,8 @@ DEFAULT FOR TYPE int8 USING btree FAMILY integer_ops AS
OPERATOR 5 > ,
FUNCTION 1 btint8cmp(int8, int8) ,
FUNCTION 2 btint8sortsupport(internal) ,
- FUNCTION 3 in_range(int8, int8, int8, boolean, boolean) ;
+ FUNCTION 3 in_range(int8, int8, int8, boolean, boolean) ,
+ FUNCTION 4 btequalimage(oid) ;
CREATE OPERATOR CLASS int4_ops
DEFAULT FOR TYPE int4 USING btree FAMILY integer_ops AS
@@ -992,7 +1000,8 @@ DEFAULT FOR TYPE int4 USING btree FAMILY integer_ops AS
OPERATOR 5 > ,
FUNCTION 1 btint4cmp(int4, int4) ,
FUNCTION 2 btint4sortsupport(internal) ,
- FUNCTION 3 in_range(int4, int4, int4, boolean, boolean) ;
+ FUNCTION 3 in_range(int4, int4, int4, boolean, boolean) ,
+ FUNCTION 4 btequalimage(oid) ;
CREATE OPERATOR CLASS int2_ops
DEFAULT FOR TYPE int2 USING btree FAMILY integer_ops AS
@@ -1004,7 +1013,8 @@ DEFAULT FOR TYPE int2 USING btree FAMILY integer_ops AS
OPERATOR 5 > ,
FUNCTION 1 btint2cmp(int2, int2) ,
FUNCTION 2 btint2sortsupport(internal) ,
- FUNCTION 3 in_range(int2, int2, int2, boolean, boolean) ;
+ FUNCTION 3 in_range(int2, int2, int2, boolean, boolean) ,
+ FUNCTION 4 btequalimage(oid) ;
ALTER OPERATOR FAMILY integer_ops USING btree ADD
-- cross-type comparisons int8 vs int2