summaryrefslogtreecommitdiff
path: root/src/include
diff options
context:
space:
mode:
authorPeter Geoghegan2025-04-04 16:27:04 +0000
committerPeter Geoghegan2025-04-04 16:27:04 +0000
commit92fe23d93aa3bbbc40fca669cabc4a4d7975e327 (patch)
treee79d024c849f0a0b89378ff8c16b6d6b2d0cc777 /src/include
parent3ba2cdaa45416196fadc7163998cda7b4e07e7d7 (diff)
Add nbtree skip scan optimization.
Teach nbtree multi-column index scans to opportunistically skip over irrelevant sections of the index given a query with no "=" conditions on one or more prefix index columns. When nbtree is passed input scan keys derived from a predicate "WHERE b = 5", new nbtree preprocessing steps output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys. That is, preprocessing generates a "skip array" (and an output scan key) for the omitted prefix column "a", which makes it safe to mark the scan key on "b" as required to continue the scan. The scan is therefore able to repeatedly reposition itself by applying both the "a" and "b" keys. A skip array has "elements" that are generated procedurally and on demand, but otherwise works just like a regular ScalarArrayOp array. Preprocessing can freely add a skip array before or after any input ScalarArrayOp arrays. Index scans with a skip array decide when and where to reposition the scan using the same approach as any other scan with array keys. This design builds on the design for array advancement and primitive scan scheduling added to Postgres 17 by commit 5bf748b8. Testing has shown that skip scans of an index with a low cardinality skipped prefix column can be multiple orders of magnitude faster than an equivalent full index scan (or sequential scan). In general, the cardinality of the scan's skipped column(s) limits the number of leaf pages that can be skipped over. The core B-Tree operator classes on most discrete types generate their array elements with the help of their own custom skip support routine. This infrastructure gives nbtree a way to generate the next required array element by incrementing (or decrementing) the current array value. It can reduce the number of index descents in cases where the next possible indexable value frequently turns out to be the next value stored in the index. Opclasses that lack a skip support routine fall back on having nbtree "increment" (or "decrement") a skip array's current element by setting the NEXT (or PRIOR) scan key flag, without directly changing the scan key's sk_argument. These sentinel values behave just like any other value from an array -- though they can never locate equal index tuples (they can only locate the next group of index tuples containing the next set of non-sentinel values that the scan's arrays need to advance to). A skip array's range is constrained by "contradictory" inequality keys. For example, a skip array on "x" will only generate the values 1 and 2 given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66". Such a skip array qual usually has near-identical performance characteristics to a comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66". However, improved performance isn't guaranteed. Much depends on physical index characteristics. B-Tree preprocessing is optimistic about skipping working out: it applies static, generic rules when determining where to generate skip arrays, which assumes that the runtime overhead of maintaining skip arrays will pay for itself -- or lead to only a modest performance loss. As things stand, these assumptions are much too optimistic: skip array maintenance will lead to unacceptable regressions with unsympathetic queries (queries whose scan can't skip over many irrelevant leaf pages). An upcoming commit will address the problems in this area by enhancing _bt_readpage's approach to saving cycles on scan key evaluation, making it work in a way that directly considers the needs of = array keys (particularly = skip array keys). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiro Ikeda <masahiro.ikeda@nttdata.com> Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Aleksander Alekseev <aleksander@timescale.com> Reviewed-By: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
Diffstat (limited to 'src/include')
-rw-r--r--src/include/access/amapi.h3
-rw-r--r--src/include/access/nbtree.h34
-rw-r--r--src/include/catalog/catversion.h2
-rw-r--r--src/include/catalog/pg_amproc.dat22
-rw-r--r--src/include/catalog/pg_proc.dat27
-rw-r--r--src/include/utils/skipsupport.h98
6 files changed, 180 insertions, 6 deletions
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index c4a0737731f..52916bab7a3 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -214,7 +214,8 @@ typedef void (*amrestrpos_function) (IndexScanDesc scan);
*/
/* estimate size of parallel scan descriptor */
-typedef Size (*amestimateparallelscan_function) (int nkeys, int norderbys);
+typedef Size (*amestimateparallelscan_function) (Relation indexRelation,
+ int nkeys, int norderbys);
/* prepare for parallel index scan */
typedef void (*aminitparallelscan_function) (void *target);
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index faabcb78e7b..9d9b76d5b48 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -24,6 +24,7 @@
#include "lib/stringinfo.h"
#include "storage/bufmgr.h"
#include "storage/shm_toc.h"
+#include "utils/skipsupport.h"
/* There's room for a 16-bit vacuum cycle ID in BTPageOpaqueData */
typedef uint16 BTCycleId;
@@ -707,6 +708,10 @@ BTreeTupleGetMaxHeapTID(IndexTuple itup)
* (BTOPTIONS_PROC). These procedures define a set of user-visible
* parameters that can be used to control operator class behavior. None of
* the built-in B-Tree operator classes currently register an "options" proc.
+ *
+ * To facilitate more efficient B-Tree skip scans, an operator class may
+ * choose to offer a sixth amproc procedure (BTSKIPSUPPORT_PROC). For full
+ * details, see src/include/utils/skipsupport.h.
*/
#define BTORDER_PROC 1
@@ -714,7 +719,8 @@ BTreeTupleGetMaxHeapTID(IndexTuple itup)
#define BTINRANGE_PROC 3
#define BTEQUALIMAGE_PROC 4
#define BTOPTIONS_PROC 5
-#define BTNProcs 5
+#define BTSKIPSUPPORT_PROC 6
+#define BTNProcs 6
/*
* We need to be able to tell the difference between read and write
@@ -1027,10 +1033,21 @@ typedef BTScanPosData *BTScanPos;
/* We need one of these for each equality-type SK_SEARCHARRAY scan key */
typedef struct BTArrayKeyInfo
{
+ /* fields set for both kinds of array (SAOP arrays and skip arrays) */
int scan_key; /* index of associated key in keyData */
- int cur_elem; /* index of current element in elem_values */
- int num_elems; /* number of elems in current array value */
+ int num_elems; /* number of elems (-1 means skip array) */
+
+ /* fields set for ScalarArrayOpExpr arrays only */
Datum *elem_values; /* array of num_elems Datums */
+ int cur_elem; /* index of current element in elem_values */
+
+ /* fields set for skip arrays only */
+ int16 attlen; /* attr's length, in bytes */
+ bool attbyval; /* attr's FormData_pg_attribute.attbyval */
+ bool null_elem; /* NULL is lowest/highest element? */
+ SkipSupport sksup; /* skip support (NULL if opclass lacks it) */
+ ScanKey low_compare; /* array's > or >= lower bound */
+ ScanKey high_compare; /* array's < or <= upper bound */
} BTArrayKeyInfo;
typedef struct BTScanOpaqueData
@@ -1119,6 +1136,15 @@ typedef struct BTReadPageState
*/
#define SK_BT_REQFWD 0x00010000 /* required to continue forward scan */
#define SK_BT_REQBKWD 0x00020000 /* required to continue backward scan */
+#define SK_BT_SKIP 0x00040000 /* skip array on column without input = */
+
+/* SK_BT_SKIP-only flags (set and unset by array advancement) */
+#define SK_BT_MINVAL 0x00080000 /* invalid sk_argument, use low_compare */
+#define SK_BT_MAXVAL 0x00100000 /* invalid sk_argument, use high_compare */
+#define SK_BT_NEXT 0x00200000 /* positions the scan > sk_argument */
+#define SK_BT_PRIOR 0x00400000 /* positions the scan < sk_argument */
+
+/* Remaps pg_index flag bits to uppermost SK_BT_* byte */
#define SK_BT_INDOPTION_SHIFT 24 /* must clear the above bits */
#define SK_BT_DESC (INDOPTION_DESC << SK_BT_INDOPTION_SHIFT)
#define SK_BT_NULLS_FIRST (INDOPTION_NULLS_FIRST << SK_BT_INDOPTION_SHIFT)
@@ -1165,7 +1191,7 @@ extern bool btinsert(Relation rel, Datum *values, bool *isnull,
bool indexUnchanged,
struct IndexInfo *indexInfo);
extern IndexScanDesc btbeginscan(Relation rel, int nkeys, int norderbys);
-extern Size btestimateparallelscan(int nkeys, int norderbys);
+extern Size btestimateparallelscan(Relation rel, int nkeys, int norderbys);
extern void btinitparallelscan(void *target);
extern bool btgettuple(IndexScanDesc scan, ScanDirection dir);
extern int64 btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm);
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 950fda777fb..208936962ef 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202504031
+#define CATALOG_VERSION_NO 202504041
#endif
diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat
index 41056171059..92505148998 100644
--- a/src/include/catalog/pg_amproc.dat
+++ b/src/include/catalog/pg_amproc.dat
@@ -22,6 +22,8 @@
{ amprocfamily => 'btree/bool_ops', amproclefttype => 'bool',
amprocrighttype => 'bool', amprocnum => '1', amproc => 'btboolcmp' },
{ amprocfamily => 'btree/bool_ops', amproclefttype => 'bool',
+ amprocrighttype => 'bool', amprocnum => '6', amproc => 'btboolskipsupport' },
+{ amprocfamily => 'btree/bool_ops', amproclefttype => 'bool',
amprocrighttype => 'bool', amprocnum => '4', amproc => 'btequalimage' },
{ amprocfamily => 'btree/bpchar_ops', amproclefttype => 'bpchar',
amprocrighttype => 'bpchar', amprocnum => '1', amproc => 'bpcharcmp' },
@@ -41,6 +43,8 @@
amprocrighttype => 'char', amprocnum => '1', amproc => 'btcharcmp' },
{ amprocfamily => 'btree/char_ops', amproclefttype => 'char',
amprocrighttype => 'char', amprocnum => '4', amproc => 'btequalimage' },
+{ amprocfamily => 'btree/char_ops', amproclefttype => 'char',
+ amprocrighttype => 'char', amprocnum => '6', amproc => 'btcharskipsupport' },
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'date',
amprocrighttype => 'date', amprocnum => '1', amproc => 'date_cmp' },
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'date',
@@ -48,6 +52,8 @@
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'date',
amprocrighttype => 'date', amprocnum => '4', amproc => 'btequalimage' },
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'date',
+ amprocrighttype => 'date', amprocnum => '6', amproc => 'date_skipsupport' },
+{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'date',
amprocrighttype => 'timestamp', amprocnum => '1',
amproc => 'date_cmp_timestamp' },
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'date',
@@ -61,6 +67,9 @@
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'timestamp',
amprocrighttype => 'timestamp', amprocnum => '4', amproc => 'btequalimage' },
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'timestamp',
+ amprocrighttype => 'timestamp', amprocnum => '6',
+ amproc => 'timestamp_skipsupport' },
+{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'timestamp',
amprocrighttype => 'date', amprocnum => '1', amproc => 'timestamp_cmp_date' },
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'timestamp',
amprocrighttype => 'timestamptz', amprocnum => '1',
@@ -75,6 +84,9 @@
amprocrighttype => 'timestamptz', amprocnum => '4',
amproc => 'btequalimage' },
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'timestamptz',
+ amprocrighttype => 'timestamptz', amprocnum => '6',
+ amproc => 'timestamp_skipsupport' },
+{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'timestamptz',
amprocrighttype => 'date', amprocnum => '1',
amproc => 'timestamptz_cmp_date' },
{ amprocfamily => 'btree/datetime_ops', amproclefttype => 'timestamptz',
@@ -123,6 +135,8 @@
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int2',
amprocrighttype => 'int2', amprocnum => '4', amproc => 'btequalimage' },
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int2',
+ amprocrighttype => 'int2', amprocnum => '6', amproc => 'btint2skipsupport' },
+{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int2',
amprocrighttype => 'int4', amprocnum => '1', amproc => 'btint24cmp' },
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int2',
amprocrighttype => 'int8', amprocnum => '1', amproc => 'btint28cmp' },
@@ -142,6 +156,8 @@
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int4',
amprocrighttype => 'int4', amprocnum => '4', amproc => 'btequalimage' },
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int4',
+ amprocrighttype => 'int4', amprocnum => '6', amproc => 'btint4skipsupport' },
+{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int4',
amprocrighttype => 'int8', amprocnum => '1', amproc => 'btint48cmp' },
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int4',
amprocrighttype => 'int2', amprocnum => '1', amproc => 'btint42cmp' },
@@ -161,6 +177,8 @@
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int8',
amprocrighttype => 'int8', amprocnum => '4', amproc => 'btequalimage' },
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int8',
+ amprocrighttype => 'int8', amprocnum => '6', amproc => 'btint8skipsupport' },
+{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int8',
amprocrighttype => 'int4', amprocnum => '1', amproc => 'btint84cmp' },
{ amprocfamily => 'btree/integer_ops', amproclefttype => 'int8',
amprocrighttype => 'int2', amprocnum => '1', amproc => 'btint82cmp' },
@@ -193,6 +211,8 @@
amprocrighttype => 'oid', amprocnum => '2', amproc => 'btoidsortsupport' },
{ amprocfamily => 'btree/oid_ops', amproclefttype => 'oid',
amprocrighttype => 'oid', amprocnum => '4', amproc => 'btequalimage' },
+{ amprocfamily => 'btree/oid_ops', amproclefttype => 'oid',
+ amprocrighttype => 'oid', amprocnum => '6', amproc => 'btoidskipsupport' },
{ amprocfamily => 'btree/oidvector_ops', amproclefttype => 'oidvector',
amprocrighttype => 'oidvector', amprocnum => '1',
amproc => 'btoidvectorcmp' },
@@ -261,6 +281,8 @@
amprocrighttype => 'uuid', amprocnum => '2', amproc => 'uuid_sortsupport' },
{ amprocfamily => 'btree/uuid_ops', amproclefttype => 'uuid',
amprocrighttype => 'uuid', amprocnum => '4', amproc => 'btequalimage' },
+{ amprocfamily => 'btree/uuid_ops', amproclefttype => 'uuid',
+ amprocrighttype => 'uuid', amprocnum => '6', amproc => 'uuid_skipsupport' },
{ amprocfamily => 'btree/record_ops', amproclefttype => 'record',
amprocrighttype => 'record', amprocnum => '1', amproc => 'btrecordcmp' },
{ amprocfamily => 'btree/record_image_ops', amproclefttype => 'record',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a28a15993a2..5d5be8ba4e1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1004,18 +1004,27 @@
{ oid => '3129', descr => 'sort support',
proname => 'btint2sortsupport', prorettype => 'void',
proargtypes => 'internal', prosrc => 'btint2sortsupport' },
+{ oid => '9290', descr => 'skip support',
+ proname => 'btint2skipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'btint2skipsupport' },
{ oid => '351', descr => 'less-equal-greater',
proname => 'btint4cmp', proleakproof => 't', prorettype => 'int4',
proargtypes => 'int4 int4', prosrc => 'btint4cmp' },
{ oid => '3130', descr => 'sort support',
proname => 'btint4sortsupport', prorettype => 'void',
proargtypes => 'internal', prosrc => 'btint4sortsupport' },
+{ oid => '9291', descr => 'skip support',
+ proname => 'btint4skipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'btint4skipsupport' },
{ oid => '842', descr => 'less-equal-greater',
proname => 'btint8cmp', proleakproof => 't', prorettype => 'int4',
proargtypes => 'int8 int8', prosrc => 'btint8cmp' },
{ oid => '3131', descr => 'sort support',
proname => 'btint8sortsupport', prorettype => 'void',
proargtypes => 'internal', prosrc => 'btint8sortsupport' },
+{ oid => '9292', descr => 'skip support',
+ proname => 'btint8skipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'btint8skipsupport' },
{ oid => '354', descr => 'less-equal-greater',
proname => 'btfloat4cmp', proleakproof => 't', prorettype => 'int4',
proargtypes => 'float4 float4', prosrc => 'btfloat4cmp' },
@@ -1034,12 +1043,18 @@
{ oid => '3134', descr => 'sort support',
proname => 'btoidsortsupport', prorettype => 'void',
proargtypes => 'internal', prosrc => 'btoidsortsupport' },
+{ oid => '9293', descr => 'skip support',
+ proname => 'btoidskipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'btoidskipsupport' },
{ oid => '404', descr => 'less-equal-greater',
proname => 'btoidvectorcmp', proleakproof => 't', prorettype => 'int4',
proargtypes => 'oidvector oidvector', prosrc => 'btoidvectorcmp' },
{ oid => '358', descr => 'less-equal-greater',
proname => 'btcharcmp', proleakproof => 't', prorettype => 'int4',
proargtypes => 'char char', prosrc => 'btcharcmp' },
+{ oid => '9294', descr => 'skip support',
+ proname => 'btcharskipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'btcharskipsupport' },
{ oid => '359', descr => 'less-equal-greater',
proname => 'btnamecmp', proleakproof => 't', prorettype => 'int4',
proargtypes => 'name name', prosrc => 'btnamecmp' },
@@ -2300,6 +2315,9 @@
{ oid => '3136', descr => 'sort support',
proname => 'date_sortsupport', prorettype => 'void',
proargtypes => 'internal', prosrc => 'date_sortsupport' },
+{ oid => '9295', descr => 'skip support',
+ proname => 'date_skipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'date_skipsupport' },
{ oid => '4133', descr => 'window RANGE support',
proname => 'in_range', prorettype => 'bool',
proargtypes => 'date date interval bool bool',
@@ -4497,6 +4515,9 @@
{ oid => '1693', descr => 'less-equal-greater',
proname => 'btboolcmp', proleakproof => 't', prorettype => 'int4',
proargtypes => 'bool bool', prosrc => 'btboolcmp' },
+{ oid => '9296', descr => 'skip support',
+ proname => 'btboolskipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'btboolskipsupport' },
{ oid => '1688', descr => 'hash',
proname => 'time_hash', prorettype => 'int4', proargtypes => 'time',
@@ -6376,6 +6397,9 @@
{ oid => '3137', descr => 'sort support',
proname => 'timestamp_sortsupport', prorettype => 'void',
proargtypes => 'internal', prosrc => 'timestamp_sortsupport' },
+{ oid => '9297', descr => 'skip support',
+ proname => 'timestamp_skipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'timestamp_skipsupport' },
{ oid => '4134', descr => 'window RANGE support',
proname => 'in_range', prorettype => 'bool',
@@ -9431,6 +9455,9 @@
{ oid => '3300', descr => 'sort support',
proname => 'uuid_sortsupport', prorettype => 'void',
proargtypes => 'internal', prosrc => 'uuid_sortsupport' },
+{ oid => '9298', descr => 'skip support',
+ proname => 'uuid_skipsupport', prorettype => 'void',
+ proargtypes => 'internal', prosrc => 'uuid_skipsupport' },
{ oid => '2961', descr => 'I/O',
proname => 'uuid_recv', prorettype => 'uuid', proargtypes => 'internal',
prosrc => 'uuid_recv' },
diff --git a/src/include/utils/skipsupport.h b/src/include/utils/skipsupport.h
new file mode 100644
index 00000000000..bc51847cf61
--- /dev/null
+++ b/src/include/utils/skipsupport.h
@@ -0,0 +1,98 @@
+/*-------------------------------------------------------------------------
+ *
+ * skipsupport.h
+ * Support routines for B-Tree skip scan.
+ *
+ * B-Tree operator classes for discrete types can optionally provide a support
+ * function for skipping. This is used during skip scans.
+ *
+ * A B-tree operator class that implements skip support provides B-tree index
+ * scans with a way of enumerating and iterating through every possible value
+ * from the domain of indexable values. This gives scans a way to determine
+ * the next value in line for a given skip array/scan key/skipped attribute.
+ * Scans request the next (or previous) value whenever they run out of tuples
+ * matching the skip array's current element value. The next (or previous)
+ * value can be used to relocate the scan; it is applied in combination with
+ * at least one additional lower-order non-skip key, taken from the query.
+ *
+ * Skip support is used by discrete type (e.g., integer and date) opclasses.
+ * Indexes with an attribute whose input opclass is of one of these types tend
+ * to store adjacent values in adjoining groups of index tuples. Each time a
+ * skip scan with skip support successfully guesses that the next value in the
+ * index (for a given skipped column) is indeed the value that skip support
+ * just incremented its skip array to, it will have saved the scan some work.
+ * The scan will have avoided an index probe that directly finds the next
+ * value that appears in the index. (When skip support guesses wrong, then it
+ * won't have saved any work, but it also won't have added any useless work.
+ * The failed attempt to locate exactly-matching index tuples acts just like
+ * an explicit probe would; it'll still find the index's true next value.)
+ *
+ * It usually isn't feasible to implement skip support for an opclass whose
+ * input type is continuous. The B-Tree code falls back on next-key sentinel
+ * values for any opclass that doesn't provide its own skip support function.
+ * This isn't really an implementation restriction; there is no benefit to
+ * providing skip support for an opclass where guessing that the next indexed
+ * value is the next possible indexable value never (or hardly ever) works out.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/skipsupport.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SKIPSUPPORT_H
+#define SKIPSUPPORT_H
+
+#include "utils/relcache.h"
+
+typedef struct SkipSupportData *SkipSupport;
+typedef Datum (*SkipSupportIncDec) (Relation rel,
+ Datum existing,
+ bool *overflow);
+
+/*
+ * State/callbacks used by skip arrays to procedurally generate elements.
+ *
+ * A BTSKIPSUPPORT_PROC function must set each and every field when called
+ * (there are no optional fields).
+ */
+typedef struct SkipSupportData
+{
+ /*
+ * low_elem and high_elem must be set with the lowest and highest possible
+ * values from the domain of indexable values (assuming ascending order)
+ */
+ Datum low_elem; /* lowest sorting/leftmost non-NULL value */
+ Datum high_elem; /* highest sorting/rightmost non-NULL value */
+
+ /*
+ * Decrement/increment functions.
+ *
+ * Returns a decremented/incremented copy of caller's existing datum,
+ * allocated in caller's memory context (for pass-by-reference types).
+ * It's not okay for these functions to leak any memory.
+ *
+ * When the decrement function (or increment function) is called with a
+ * value that already matches low_elem (or high_elem), function must set
+ * the *overflow argument. The return value is treated as undefined by
+ * the B-Tree code; it shouldn't need to be (and won't be) pfree'd.
+ *
+ * The B-Tree code's "existing" datum argument is often just a straight
+ * copy of a value from an index tuple. Operator classes must accept
+ * every possible representational variation within the underlying type.
+ * On the other hand, opclasses are _not_ required to preserve information
+ * that doesn't affect how datums are sorted (e.g., skip support for a
+ * fixed precision numeric type needn't preserve datum display scale).
+ * Operator class decrement/increment functions will never be called with
+ * a NULL "existing" argument, either.
+ */
+ SkipSupportIncDec decrement;
+ SkipSupportIncDec increment;
+} SkipSupportData;
+
+extern SkipSupport PrepareSkipSupportFromOpclass(Oid opfamily, Oid opcintype,
+ bool reverse);
+
+#endif /* SKIPSUPPORT_H */