| field | value | date |
|---|---|---|
| author | Peter Geoghegan | 2020-02-26 19:28:25 +0000 |
| committer | Peter Geoghegan | 2020-02-26 19:28:25 +0000 |
| commit | 612a1ab76724aa1514b6509269342649f8cab375 | |
| tree | df13756515fd71d6528958f2315123d89d41b817 /doc/src | |
| parent | 4109bb5de4998b9301ea2ac18c9d6dfb0b4f900b | |
Add equalimage B-Tree support functions.

Invent the concept of a B-Tree equalimage ("equality implies image
equality") support function, registered as support function 4. This
indicates whether it is safe (or not safe) to apply optimizations that
assume that any two datums considered equal by an operator class's order
method must be interchangeable without any loss of semantic information.
This is static information about an operator class and a collation.

Register an equalimage routine for almost all of the existing B-Tree
opclasses. We only need two trivial routines for all of the opclasses
that are included with the core distribution. There is one routine for
opclasses that index non-collatable types (which returns 'true'
unconditionally), plus another routine for collatable types (which
returns 'true' when the collation is a deterministic collation).

This patch is infrastructure for an upcoming patch that adds B-Tree
deduplication.

Author: Peter Geoghegan, Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
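
The two "trivial routines" described in the commit message can be modelled in a few lines of standalone C. This is only a sketch of the decision each routine makes, not the code added by the patch: `SketchCollation`, `sketch_equalimage_any` and `sketch_equalimage_collatable` are hypothetical names, and the collation is passed as a plain struct here purely for illustration, whereas a real support function would be invoked through PostgreSQL's function-call interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a collation; NOT a PostgreSQL type. */
typedef struct SketchCollation
{
    unsigned int oid;           /* arbitrary identifier for the example */
    bool         deterministic; /* true for a deterministic collation */
} SketchCollation;

/*
 * Models the routine for opclasses over non-collatable types:
 * equality always implies image equality, so return true unconditionally.
 */
static bool
sketch_equalimage_any(unsigned int opcintype)
{
    (void) opcintype;           /* unused: the answer does not depend on it */
    return true;
}

/*
 * Models the routine for collatable types: equality implies image
 * equality only when the collation in use is deterministic.
 */
static bool
sketch_equalimage_collatable(unsigned int opcintype,
                             const SketchCollation *coll)
{
    (void) opcintype;           /* unused in this simplified model */
    assert(coll != NULL);
    return coll->deterministic;
}
```

Under this model, a nondeterministic collation (one that can treat differently encoded strings as equal) is exactly the case where the second routine answers "not safe".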
Diffstat (limited to 'doc/src')
| -rw-r--r-- | doc/src/sgml/btree.sgml | 96 |
| -rw-r--r-- | doc/src/sgml/ref/alter_opfamily.sgml | 7 |
| -rw-r--r-- | doc/src/sgml/ref/create_opclass.sgml | 14 |
| -rw-r--r-- | doc/src/sgml/xindex.sgml | 18 |
4 files changed, 121 insertions, 14 deletions
diff --git a/doc/src/sgml/btree.sgml b/doc/src/sgml/btree.sgml
index ac6c4423e60..fcf771c857f 100644
--- a/doc/src/sgml/btree.sgml
+++ b/doc/src/sgml/btree.sgml
@@ -207,7 +207,7 @@

  <para>
   As shown in <xref linkend="xindex-btree-support-table"/>, btree defines
-  one required and two optional support functions. The three
+  one required and three optional support functions. The four
   user-defined methods are:
  </para>
  <variablelist>
@@ -456,6 +456,100 @@ returns bool
     </para>
    </listitem>
   </varlistentry>
+  <varlistentry>
+   <term><function>equalimage</function></term>
+   <listitem>
+    <para>
+     Optionally, a btree operator family may provide
+     <function>equalimage</function> (<quote>equality implies image
+     equality</quote>) support functions, registered under support
+     function number 4. These functions allow the core code to
+     determine when it is safe to apply the btree deduplication
+     optimization. Currently, <function>equalimage</function>
+     functions are only called when building or rebuilding an index.
+    </para>
+    <para>
+     An <function>equalimage</function> function must have the
+     signature
+<synopsis>
+equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
+</synopsis>
+     The return value is static information about an operator class
+     and collation. Returning <literal>true</literal> indicates that
+     the <function>order</function> function for the operator class is
+     guaranteed to only return <literal>0</literal> (<quote>arguments
+     are equal</quote>) when its <replaceable>A</replaceable> and
+     <replaceable>B</replaceable> arguments are also interchangeable
+     without any loss of semantic information. Not registering an
+     <function>equalimage</function> function or returning
+     <literal>false</literal> indicates that this condition cannot be
+     assumed to hold.
+    </para>
+    <para>
+     The <replaceable>opcintype</replaceable> argument is the
+     <literal><structname>pg_type</structname>.oid</literal> of the
+     data type that the operator class indexes. This is a convenience
+     that allows reuse of the same underlying
+     <function>equalimage</function> function across operator classes.
+     If <replaceable>opcintype</replaceable> is a collatable data
+     type, the appropriate collation OID will be passed to the
+     <function>equalimage</function> function, using the standard
+     <function>PG_GET_COLLATION()</function> mechanism.
+    </para>
+    <para>
+     As far as the operator class is concerned, returning
+     <literal>true</literal> indicates that deduplication is safe (or
+     safe for the collation whose OID was passed to its
+     <function>equalimage</function> function). However, the core
+     code will only deem deduplication safe for an index when
+     <emphasis>every</emphasis> indexed column uses an operator class
+     that registers an <function>equalimage</function> function, and
+     each function actually returns <literal>true</literal> when
+     called.
+    </para>
+    <para>
+     Image equality is <emphasis>almost</emphasis> the same condition
+     as simple bitwise equality. There is one subtle difference: When
+     indexing a varlena data type, the on-disk representation of two
+     image equal datums may not be bitwise equal due to inconsistent
+     application of <acronym>TOAST</acronym> compression on input.
+     Formally, when an operator class's
+     <function>equalimage</function> function returns
+     <literal>true</literal>, it is safe to assume that the
+     <literal>datum_image_eq()</literal> C function will always agree
+     with the operator class's <function>order</function> function
+     (provided that the same collation OID is passed to both the
+     <function>equalimage</function> and <function>order</function>
+     functions).
+    </para>
+    <para>
+     The core code is fundamentally unable to deduce anything about
+     the <quote>equality implies image equality</quote> status of an
+     operator class within a multiple-data-type family based on
+     details from other operator classes in the same family. Also, it
+     is not sensible for an operator family to register a cross-type
+     <function>equalimage</function> function, and attempting to do so
+     will result in an error. This is because <quote>equality implies
+     image equality</quote> status does not just depend on
+     sorting/equality semantics, which are more or less defined at the
+     operator family level. In general, the semantics that one
+     particular data type implements must be considered separately.
+    </para>
+    <para>
+     The convention followed by the operator classes included with the
+     core <productname>PostgreSQL</productname> distribution is to
+     register a stock, generic <function>equalimage</function>
+     function. Most operator classes register
+     <function>btequalimage()</function>, which indicates that
+     deduplication is safe unconditionally. Operator classes for
+     collatable data types such as <type>text</type> register
+     <function>btvarstrequalimage()</function>, which indicates that
+     deduplication is safe with deterministic collations. Best
+     practice for third-party extensions is to register their own
+     custom function to retain control.
+    </para>
+   </listitem>
+  </varlistentry>
  </variablelist>

 </sect1>
diff --git a/doc/src/sgml/ref/alter_opfamily.sgml b/doc/src/sgml/ref/alter_opfamily.sgml
index 848156c9d7d..4ac1cca95a3 100644
--- a/doc/src/sgml/ref/alter_opfamily.sgml
+++ b/doc/src/sgml/ref/alter_opfamily.sgml
@@ -153,9 +153,10 @@ ALTER OPERATOR FAMILY <replaceable>name</replaceable> USING <replaceable class="
       and hash functions it is not necessary to specify
       <replaceable class="parameter">op_type</replaceable> since the function's
       input data type(s) are always the correct ones to use. For B-tree sort
-      support functions and all functions in GiST, SP-GiST and GIN operator
-      classes, it is necessary to specify the operand data type(s) the function
-      is to be used with.
+      support functions, B-Tree equal image functions, and all
+      functions in GiST, SP-GiST and GIN operator classes, it is
+      necessary to specify the operand data type(s) the function is to
+      be used with.
      </para>

      <para>
diff --git a/doc/src/sgml/ref/create_opclass.sgml b/doc/src/sgml/ref/create_opclass.sgml
index dd5252fd976..f42fb6494c6 100644
--- a/doc/src/sgml/ref/create_opclass.sgml
+++ b/doc/src/sgml/ref/create_opclass.sgml
@@ -171,12 +171,14 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
       function is intended to support, if different from
       the input data type(s) of the function (for B-tree comparison functions
       and hash functions)
-      or the class's data type (for B-tree sort support functions and all
-      functions in GiST, SP-GiST, GIN and BRIN operator classes). These defaults
-      are correct, and so <replaceable
-      class="parameter">op_type</replaceable> need not be specified in
-      <literal>FUNCTION</literal> clauses, except for the case of a B-tree sort
-      support function that is meant to support cross-data-type comparisons.
+      or the class's data type (for B-tree sort support functions,
+      B-tree equal image functions, and all functions in GiST,
+      SP-GiST, GIN and BRIN operator classes). These defaults are
+      correct, and so <replaceable
+      class="parameter">op_type</replaceable> need not be specified
+      in <literal>FUNCTION</literal> clauses, except for the case of a
+      B-tree sort support function that is meant to support
+      cross-data-type comparisons.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index ffb5164aaa0..2e06ad01bf5 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -402,7 +402,7 @@

   <para>
    B-trees require a comparison support function,
-   and allow two additional support functions to be
+   and allow three additional support functions to be
    supplied at the operator class author's option, as shown in <xref
    linkend="xindex-btree-support-table"/>.
    The requirements for these support functions are explained further in
@@ -441,6 +441,13 @@
        </entry>
        <entry>3</entry>
       </row>
+      <row>
+       <entry>
+        Determine if it is safe for indexes that use the operator
+        class to apply the btree deduplication optimization (optional)
+       </entry>
+       <entry>4</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -980,7 +987,8 @@ DEFAULT FOR TYPE int8 USING btree FAMILY integer_ops AS
   OPERATOR 5 > ,
   FUNCTION 1 btint8cmp(int8, int8) ,
   FUNCTION 2 btint8sortsupport(internal) ,
-  FUNCTION 3 in_range(int8, int8, int8, boolean, boolean) ;
+  FUNCTION 3 in_range(int8, int8, int8, boolean, boolean) ,
+  FUNCTION 4 btequalimage(oid) ;

 CREATE OPERATOR CLASS int4_ops
 DEFAULT FOR TYPE int4 USING btree FAMILY integer_ops AS
@@ -992,7 +1000,8 @@ DEFAULT FOR TYPE int4 USING btree FAMILY integer_ops AS
   OPERATOR 5 > ,
   FUNCTION 1 btint4cmp(int4, int4) ,
   FUNCTION 2 btint4sortsupport(internal) ,
-  FUNCTION 3 in_range(int4, int4, int4, boolean, boolean) ;
+  FUNCTION 3 in_range(int4, int4, int4, boolean, boolean) ,
+  FUNCTION 4 btequalimage(oid) ;

 CREATE OPERATOR CLASS int2_ops
 DEFAULT FOR TYPE int2 USING btree FAMILY integer_ops AS
@@ -1004,7 +1013,8 @@ DEFAULT FOR TYPE int2 USING btree FAMILY integer_ops AS
   OPERATOR 5 > ,
   FUNCTION 1 btint2cmp(int2, int2) ,
   FUNCTION 2 btint2sortsupport(internal) ,
-  FUNCTION 3 in_range(int2, int2, int2, boolean, boolean) ;
+  FUNCTION 3 in_range(int2, int2, int2, boolean, boolean) ,
+  FUNCTION 4 btequalimage(oid) ;

 ALTER OPERATOR FAMILY integer_ops USING btree ADD
   -- cross-type comparisons int8 vs int2
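
The btree.sgml text added by this diff states that the core code deems deduplication safe for an index only when every indexed column uses an operator class that registers an equalimage function and each function returns true. The standalone C sketch below models just that rule. All names (`SketchIndexColumn`, `sketch_index_allows_dedup`, and the two callbacks) are hypothetical and are not PostgreSQL identifiers; a `NULL` callback stands for "no equalimage function registered", and the OID values used with it would be arbitrary example numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type; NULL models "no equalimage function registered". */
typedef bool (*sketch_equalimage_fn) (unsigned int opcintype,
                                      unsigned int collation);

/* Hypothetical description of one indexed column. */
typedef struct SketchIndexColumn
{
    unsigned int         opcintype;  /* type OID the opclass indexes */
    unsigned int         collation;  /* collation OID, 0 if not collatable */
    sketch_equalimage_fn equalimage; /* registered support function 4, or NULL */
} SketchIndexColumn;

/* Example callback that always reports "equality implies image equality". */
static bool
sketch_always_true(unsigned int opcintype, unsigned int collation)
{
    (void) opcintype;
    (void) collation;
    return true;
}

/* Example callback that always reports the condition cannot be assumed. */
static bool
sketch_always_false(unsigned int opcintype, unsigned int collation)
{
    (void) opcintype;
    (void) collation;
    return false;
}

/*
 * Deduplication is treated as safe only if EVERY column has a registered
 * function and that function returns true for the column's type/collation.
 */
static bool
sketch_index_allows_dedup(const SketchIndexColumn *cols, size_t ncols)
{
    assert(cols != NULL && ncols > 0);

    for (size_t i = 0; i < ncols; i++)
    {
        if (cols[i].equalimage == NULL)
            return false;       /* nothing registered: cannot assume safety */
        if (!cols[i].equalimage(cols[i].opcintype, cols[i].collation))
            return false;       /* registered, but reports "not safe" */
    }
    return true;
}
```

A single column without a registered function, or with a function that returns false, is enough to disable the optimization for the whole index in this model.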
