From 47309464e4e937e1b11320eab9b0eff9ad63cd80 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Fri, 31 Oct 2003 22:41:21 +0000
Subject: Rewrite GiST documentation into something actually useful.
Christopher Kings-Lynne
---
doc/src/sgml/gist.sgml | 370 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 260 insertions(+), 110 deletions(-)
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 4354d8a4b64..6b34e498eac 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -1,113 +1,263 @@
-
-
-
-
-Gene
-Selkov
-
-
-Transcribed 1998-02-19
-
-GiST Indexes
-
-
-The information about GIST is at
- http://GiST.CS.Berkeley.EDU:8000/gist/
-
-with more on different indexing and sorting schemes at
-http://s2k-ftp.CS.Berkeley.EDU:8000/personal/jmh/.
-
-And there is more interesting reading at
-http://epoch.cs.berkeley.edu:8000/ and
-http://www.sai.msu.su/~megera/postgres/gist/.
-
-
-
-
-Author
-
-This extraction from an email sent by
-Eugene Selkov, Jr. (selkovjr@mcs.anl.gov)
-contains good information
-on GiST. Hopefully we will learn more in the future and update this information.
-- thomas 1998-03-01
-
-
-
-
-Well, I can't say I quite understand what's going on, but at least
-I (almost) succeeded in porting GiST examples to linux. The GiST access
-method is already in the postgres tree (src/backend/access/gist).
-
-
-Examples at Berkeley
-come with an overview of the methods and demonstrate spatial index
-mechanisms for 2D boxes, polygons, integer intervals and text
-(see also GiST at Berkeley).
-In the box example, we
-are supposed to see a performance gain when using the GiST index; it did
-work for me but I do not have a reasonably large collection of boxes
-to check that. Other examples also worked, except polygons: I got an
-error doing
-
-
-test=> CREATE INDEX pix ON polytmp
-test-> USING GIST (p:box gist_poly_ops) WITH (ISLOSSY);
-ERROR: cannot open pix
-
-(PostgreSQL 6.3 Sun Feb 1 14:57:30 EST 1998)
-
-
-
-I could not get sense of this error message; it appears to be something
-we'd rather ask the developers about (see also Note 4 below). What I
-would suggest here is that someone of you linux guys (linux==gcc?) fetch the
-original sources quoted above and apply my patch (see attachment) and
-tell us what you feel about it. Looks cool to me, but I would not like
-to hold it up while there are so many competent people around.
-
-
-A few notes on the sources:
-
-
-1. I failed to make use of the original (HP-UX) Makefile and rearranged
- the Makefile from the ancient postgres95 tutorial to do the job. I tried
- to keep it generic, but I am a very poor makefile writer -- just did
- some monkey work. Sorry about that, but I guess it is now a little
- more portable that the original makefile.
-
-
-2. I built the example sources right under pgsql/src (just extracted the
- tar file there). The aforementioned Makefile assumes it is one level
- below pgsql/src (in our case, in pgsql/src/pggist).
-
-
-3. The changes I made to the *.c files were all about #include's,
- function prototypes and typecasting. Other than that, I just threw
- away a bunch of unused vars and added a couple parentheses to please
- gcc. I hope I did not screw up too much :)
-
-
-4. There is a comment in polyproc.sql:
-
-
--- -- there's a memory leak in rtree poly_ops!!
--- -- CREATE INDEX pix2 ON polytmp USING RTREE (p poly_ops);
-
-
- Roger that!! I thought it could be related to a number of
- PostgreSQL versions
- back and tried the query. My system went nuts and I had to shoot down
- the postmaster in about ten minutes.
-
-
-
-I will continue to look into GiST for a while, but I would also
-appreciate
-more examples of R-tree usage.
-
-
+
+GiST Indexes
+
+
+ Introduction
+
+
+ GiST stands for Generalized Search Tree. It is a
+ balanced, tree-structured access method, that acts as a base template in
+ which to implement arbitrary indexing schemes. B+-trees, R-trees and many
+ other indexing schemes can be implemented in GiST.
+
+
+
+ One advantage of GiST is that it allows the development
+ of custom data types with the appropriate access methods, by
+ an expert in the domain of the data type, rather than a database expert.
+
+
+
+ Some of the information here is derived from the University of California at
+ Berkeley's GiST Indexing Project web site and Marcel Kornacker's
+ thesis,
+ Access Methods for
+ Next-Generation Database Systems. The GiST
+ implementation in PostgreSQL is primarily
+ maintained by Teodor Sigaev and Oleg Bartunov, and there is more
+ information on their website: >.
+
+
+
+
+
+ Extensibility
+
+
+ Traditionally, implementing a new index access method meant a lot of
+ difficult work. It was necessary to understand the inner workings of the
+ database, such as the lock manager and Write-Ahead Log. The
+ GiST interface has a high level of abstraction,
+ requiring the access method implementor to only implement the semantics of
+ the data type being accessed. The GiST layer itself
+ takes care of concurrency, logging and searching the tree structure.
+
+
+
+ This extensibility should not be confused with the extensibility of the
+ other standard search trees in terms of the data they can handle. For
+ example, PostgreSQL supports extensible B+-trees
+ and R-trees. That means that you can use
+ PostgreSQL to build a B+-tree or R-tree over any
+ data type you want. But B+-trees only support range predicates
+ (<, =, >),
+ and R-trees only support n-D range queries (contains, contained, equals).
+
+
+
+ So if you index, say, an image collection with a
+ PostgreSQL B+-tree, you can only issue queries
+ such as is imagex equal to imagey
, is imagex less
+ than imagey
and is imagex greater than imagey
?
+ Depending on how you define equals
, less than
+ and greater than
in this context, this could be useful.
+ However, by using a GiST based index, you could create
+ ways to ask domain-specific questions, perhaps find all images of
+ horses
or find all over-exposed images
.
+
+
+
+ All it takes to get a GiST access method up and running
+ is to implement seven user-defined methods, which define the behavior of
+ keys in the tree. Of course these methods have to be pretty fancy to
+ support fancy queries, but for all the standard queries (B+-trees,
+ R-trees, etc.) they're relatively straightforward. In short,
+ GiST combines extensibility along with generality, code
+ reuse, and a clean interface.
+
+
+
+
+
+ Implementation
+
+
+ There are seven methods that an index operator class for
+ GiST must provide:
+
+
+
+
+ consistent
+
+
+ Given a predicate p on a tree page, and a user
+ query, q, this method will return false if it is
+ certain that both p and q cannot
+ be true for a given data item.
+
+
+
+
+
+ union
+
+
+ This method consolidates information in the tree. Given a set of
+ entries, this function generates a new predicate that is true for all
+ the entries.
+
+
+
+
+
+ compress
+
+
+ Converts the data item into a format suitable for physical storage in
+ an index page.
+
+
+
+
+
+ decompress
+
+
+ The reverse of the compress method. Converts the
+ index representation of the data item into a format that can be
+ manipulated by the database.
+
+
+
+
+
+ penalty
+
+
+ Returns a value indicating the cost
of inserting the new
+ entry into a particular branch of the tree. items will be inserted
+ down the path of least penalty in the tree.
+
+
+
+
+
+ picksplit
+
+
+ When a page split is necessary, this function decides which entries on
+ the page are to stay on the old page, and which are to move to the new
+ page.
+
+
+
+
+
+ same
+
+
+ Returns true if two entries are identical, false otherwise.
+
+
+
+
+
+
+
+
+
+ Limitations
+
+
+ The current implementation of GiST within
+ PostgreSQL has some major limitations:
+ GiST access is not concurrent; the
+ GiST interface doesn't allow the development of certain
+ data types, such as digital trees (see papers by Aoki et al); and there
+ is not yet any support for write-ahead logging of updates in
+ GiST indexes.
+
+
+
+ Solutions to the concurrency problems appear in Marcel Kornacker's
+ thesis; however these ideas have not yet been put into practice in the
+ PostgreSQL implementation.
+
+
+
+ The lack of write-ahead logging is just a small matter of programming,
+ but since it isn't done yet, a crash could render a GiST
+ index inconsistent, forcing a REINDEX.
+
+
+
+
+
+ Examples
+
+
+ To see example implementations of index methods implemented using
+ GiST, examine the following contrib modules:
+
+
+
+
+ btree_gist
+
+ B-Tree
+
+
+
+
+ cube
+
+ Indexing for multi-dimensional cubes
+
+
+
+
+ intarray
+
+ RD-Tree for one-dimensional array of int4 values
+
+
+
+
+ ltree
+
+ Indexing for tree-like stuctures
+
+
+
+
+ rtree_gist
+
+ R-Tree
+
+
+
+
+ seg
+
+ Storage and indexed access for float ranges
+
+
+
+
+ tsearch and tsearch2
+
+ Full text indexing
+
+
+
+
+
+
+
--
cgit v1.2.3