summaryrefslogtreecommitdiff
path: root/contrib/ltree/README.ltree
diff options
context:
space:
mode:
Diffstat (limited to 'contrib/ltree/README.ltree')
-rw-r--r--contrib/ltree/README.ltree512
1 files changed, 0 insertions, 512 deletions
diff --git a/contrib/ltree/README.ltree b/contrib/ltree/README.ltree
deleted file mode 100644
index a9d722d051..0000000000
--- a/contrib/ltree/README.ltree
+++ /dev/null
@@ -1,512 +0,0 @@
-contrib/ltree module
-
-ltree - is a PostgreSQL contrib module which contains implementation of data
-types, indexed access methods and queries for data organized as a tree-like
-structures.
-This module will works for PostgreSQL version 7.3.
-(version for 7.2 version is available from http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz)
--------------------------------------------------------------------------------
-All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
-(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist for
-additional information. Authors would like to thank Eugeny Rodichev for helpful
-discussions. Comments and bug reports are welcome.
--------------------------------------------------------------------------------
-
-LEGAL NOTICES: This module is released under BSD license (as PostgreSQL
-itself). This work was done in framework of Russian Scientific Network and
-partially supported by Russian Foundation for Basic Research and Stack Group.
--------------------------------------------------------------------------------
-
-MOTIVATION
-
-This is a placeholder for introduction to the problem. Hope, people reading
-this document doesn't need it too much :-)
-
-DEFINITIONS
-
-A label of a node is a sequence of one or more words separated by blank
-character '_' and containing letters and digits ( for example, [a-zA-Z0-9] for
-C locale). The length of a label is limited by 256 bytes.
-
-Example: 'Countries', 'Personal_Services'
-
-A label path of a node is a sequence of one or more dot-separated labels
-l1.l2...ln, represents path from root to the node. The length of a label path
-is limited by 65Kb, but size <= 2Kb is preferrable. We consider it's not a
-strict limitation ( maximal size of label path for DMOZ catalogue - http://
-www.dmoz.org, is about 240 bytes !)
-
-Example: 'Top.Countries.Europe.Russia'
-
-We introduce several datatypes:
-
-ltree
- - is a datatype for label path.
-
-ltree[]
- - is a datatype for arrays of ltree.
-
-lquery
- - is a path expression that has regular expression in the label path and
- used for ltree matching. Star symbol (*) is used to specify any number of
- labels (levels) and could be used at the beginning and the end of lquery,
- for example, '*.Europe.*'.
-
- The following quantifiers are recognized for '*' (like in Perl):
-
- {n} Match exactly n levels
- {n,} Match at least n levels
- {n,m} Match at least n but not more than m levels
- {,m} Match at maximum m levels (eq. to {0,m})
-
- It is possible to use several modifiers at the end of a label:
-
-
- @ Do case-insensitive label matching
- * Do prefix matching for a label
- % Don't account word separator '_' in label matching, that is
- 'Russian%' would match 'Russian_nations', but not 'Russian'
-
- lquery could contains logical '!' (NOT) at the beginning of the label and '
- |' (OR) to specify possible alternatives for label matching.
-
- Example of lquery:
-
-
- Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
- a) b) c) d) e)
-
- A label path should
- + a) begins from a node with label 'Top'
- + b) and following zero or 2 labels until
- + c) a node with label beginning from case-insensitive prefix 'sport'
- + d) following node with label not matched 'football' or 'tennis' and
- + e) ends on node with label beginning from 'Russ' or strictly matched
- 'Spain'.
-
-ltxtquery
- - is a datatype for label searching (like type 'query' for full text
- searching, see contrib/tsearch). It's possible to use modifiers @,%,* at
- the end of word. The meaning of modifiers are the same as for lquery.
-
- Example: 'Europe & Russia*@ & !Transportation'
-
- Search paths contain words 'Europe' and 'Russia*' (case-insensitive) and
- not 'Transportation'. Notice, the order of words as they appear in label
- path is not important !
-
-OPERATIONS
-
-The following operations are defined for type ltree:
-
-<,>,<=,>=,=, <>
- - have their usual meanings. Comparison is doing in the order of direct
- tree traversing, children of a node are sorted lexicographic.
-ltree @> ltree
- - returns TRUE if left argument is an ancestor of right argument (or
- equal).
-ltree <@ ltree
- - returns TRUE if left argument is a descendant of right argument (or
- equal).
-ltree ~ lquery, lquery ~ ltree
- - return TRUE if node represented by ltree satisfies lquery.
-ltree ? lquery[], lquery ? ltree[]
- - return TRUE if node represented by ltree satisfies at least one lquery
- from array.
-ltree @ ltxtquery, ltxtquery @ ltree
- - return TRUE if node represented by ltree satisfies ltxtquery.
-ltree || ltree, ltree || text, text || ltree
- - return concatenated ltree.
-
-Operations for arrays of ltree (ltree[]):
-
-ltree[] @> ltree, ltree <@ ltree[]
- - returns TRUE if array ltree[] contains an ancestor of ltree.
-ltree @> ltree[], ltree[] <@ ltree
- - returns TRUE if array ltree[] contains a descendant of ltree.
-ltree[] ~ lquery, lquery ~ ltree[]
- - returns TRUE if array ltree[] contains label paths matched lquery.
-ltree[] ? lquery[], lquery[] ? ltree[]
- - returns TRUE if array ltree[] contains label paths matched atleaset one
- lquery from array.
-ltree[] @ ltxtquery, ltxtquery @ ltree[]
- - returns TRUE if array ltree[] contains label paths matched ltxtquery
- (full text search).
-ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery
- - returns first element of array ltree[] satisfies corresponding condition
- and NULL in vice versa.
-
-REMARK
-
-Operations <@, @>, @ and ~ have analogues - ^<@, ^@>, ^@, ^~, which doesn't use
-indices !
-
-INDICES
-
-Various indices could be created to speed up execution of operations:
-
- * B-tree index over ltree:
- <, <=, =, >=, >
- * GiST index over ltree:
- <, <=, =, >=, >, @>, <@, @, ~, ?
- Example:
- create index path_gist_idx on test using gist (path);
- * GiST index over ltree[]:
- ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.
- Example:
- create index path_gist_idx on test using gist (array_path);
- Notices: This index is lossy.
-
-FUNCTIONS
-
-ltree subltree
- ltree subltree(ltree, start, end)
- returns subpath of ltree from start (inclusive) until the end.
- # select subltree('Top.Child1.Child2',1,2);
- subltree
- --------
- Child1
-ltree subpath
- ltree subpath(ltree, OFFSET,LEN)
- ltree subpath(ltree, OFFSET)
- returns subpath of ltree from OFFSET (inclusive) with length LEN.
- If OFFSET is negative returns subpath starts that far from the end
- of the path. If LENGTH is omitted, returns everything to the end
- of the path. If LENGTH is negative, leaves that many labels off
- the end of the path.
- # select subpath('Top.Child1.Child2',1,2);
- subpath
- -------
- Child1.Child2
-
- # select subpath('Top.Child1.Child2',-2,1);
- subpath
- ---------
- Child1
-int4 nlevel
-
- int4 nlevel(ltree) - returns level of the node.
- # select nlevel('Top.Child1.Child2');
- nlevel
- --------
- 3
- Note, that arguments start, end, OFFSET, LEN have meaning of level of the
- node !
-
-int4 index(ltree,ltree), int4 index(ltree,ltree,OFFSET)
- returns number of level of the first occurence of second argument in first
- one beginning from OFFSET. if OFFSET is negative, than search begins from |
- OFFSET| levels from the end of the path.
- SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
- index
- -------
- 6
- SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
- index
- -------
- 9
-
-ltree text2ltree(text), text ltree2text(text)
- cast functions for ltree and text.
-
-
-ltree lca(ltree,ltree,...) (up to 8 arguments)
- ltree lca(ltree[])
- Returns Lowest Common Ancestor (lca)
- # select lca('1.2.2.3','1.2.3.4.5.6');
- lca
- -----
- 1.2
- # select lca('{la.2.3,1.2.3.4.5.6}') is null;
- ?column?
- ----------
- f
-
-
-INSTALLATION
-
- cd contrib/ltree
- make
- make install
- make installcheck
-
-EXAMPLE OF USAGE
-
- createdb ltreetest
- psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
- psql ltreetest < ltreetest.sql
-
-Now, we have a database ltreetest populated with a data describing hierarchy
-shown below:
-
-
- TOP
- / | \
- Science Hobbies Collections
- / | \
- Astronomy Amateurs_Astronomy Pictures
- / \ |
- Astrophysics Cosmology Astronomy
- / | \
- Galaxies Stars Astronauts
-
-Inheritance:
-
-ltreetest=# select path from test where path <@ 'Top.Science';
- path
-------------------------------------
- Top.Science
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
-(4 rows)
-
-Matching:
-
-ltreetest=# select path from test where path ~ '*.Astronomy.*';
- path
------------------------------------------------
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
- Top.Collections.Pictures.Astronomy
- Top.Collections.Pictures.Astronomy.Stars
- Top.Collections.Pictures.Astronomy.Galaxies
- Top.Collections.Pictures.Astronomy.Astronauts
-(7 rows)
-ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
- path
-------------------------------------
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
-(3 rows)
-
-Full text search:
-
-ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
- path
-------------------------------------
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
- Top.Hobbies.Amateurs_Astronomy
-(4 rows)
-
-ltreetest=# select path from test where path @ 'Astro* & !pictures@';
- path
-------------------------------------
- Top.Science.Astronomy
- Top.Science.Astronomy.Astrophysics
- Top.Science.Astronomy.Cosmology
-(3 rows)
-
-Using Functions:
-
-ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
- ?column?
-------------------------------------------
- Top.Science.Space.Astronomy
- Top.Science.Space.Astronomy.Astrophysics
- Top.Science.Space.Astronomy.Cosmology
-(3 rows)
-We could create SQL-function:
-CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
-AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
-LANGUAGE SQL IMMUTABLE;
-
-and previous select could be rewritten as:
-
-ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
- ins_label
-------------------------------------------
- Top.Science.Space.Astronomy
- Top.Science.Space.Astronomy.Astrophysics
- Top.Science.Space.Astronomy.Cosmology
-(3 rows)
-
-Or with another arguments:
-
-CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
-AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
-LANGUAGE SQL IMMUTABLE;
-
-ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
- ins_label
-------------------------------------------
- Top.Science.Space.Astronomy
- Top.Science.Space.Astronomy.Astrophysics
- Top.Science.Space.Astronomy.Cosmology
-(3 rows)
-
-ADDITIONAL DATA
-
-To get more feeling from our ltree module you could download
-dmozltree-eng.sql.gz (about 3Mb tar.gz archive containing 300,274 nodes),
-available from http://www.sai.msu.su/~megera/postgres/gist/ltree/
-dmozltree-eng.sql.gz, which is DMOZ catalogue, prepared for use with ltree.
-Setup your test database (dmoz), load ltree module and issue command:
-
-zcat dmozltree-eng.sql.gz| psql dmoz
-
-Data will be loaded into database dmoz and all indices will be created.
-
-BENCHMARKS
-
-All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750Mhz) using DMOZ
-data, containing 300,274 nodes (see above for download link). We used some
-basic queries typical for walking through catalog.
-
-QUERIES
-
- * Q0: Count all rows (sort of base time for comparison)
- select count(*) from dmoz;
- count
- --------
- 300274
- (1 row)
- * Q1: Get direct children (without inheritance)
- select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
- path
- -----------------------------------
- Top.Adult.Arts.Animation.Cartoons
- Top.Adult.Arts.Animation.Anime
- (2 rows)
- * Q2: The same as Q1 but with counting of successors
- select path as parentpath , (select count(*)-1 from dmoz where path <@
- p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
- parentpath | count
- -----------------------------------+-------
- Top.Adult.Arts.Animation.Cartoons | 2
- Top.Adult.Arts.Animation.Anime | 61
- (2 rows)
- * Q3: Get all parents
- select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
- path asc;
- path
- --------------------------
- Top
- Top.Adult
- Top.Adult.Arts
- Top.Adult.Arts.Animation
- (4 rows)
- * Q4: Get all parents with counting of children
- select path, (select count(*)-1 from dmoz where path <@ p.path) as count
- from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
- path | count
- --------------------------+--------
- Top | 300273
- Top.Adult | 4913
- Top.Adult.Arts | 339
- Top.Adult.Arts.Animation | 65
- (4 rows)
- * Q5: Get all children with levels
- select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
- from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
- path | level
- ------------------------------------------------+-------
- Top.Adult.Arts.Animation.Anime | 1
- Top.Adult.Arts.Animation.Anime.Fan_Works | 2
- Top.Adult.Arts.Animation.Anime.Games | 2
- Top.Adult.Arts.Animation.Anime.Genres | 2
- Top.Adult.Arts.Animation.Anime.Image_Galleries | 2
- Top.Adult.Arts.Animation.Anime.Multimedia | 2
- Top.Adult.Arts.Animation.Anime.Resources | 2
- Top.Adult.Arts.Animation.Anime.Titles | 2
- Top.Adult.Arts.Animation.Cartoons | 1
- Top.Adult.Arts.Animation.Cartoons.AVS | 2
- Top.Adult.Arts.Animation.Cartoons.Members | 2
- (11 rows)
-
-Timings
-
-+---------------------------------------------+
-|Query|Rows|Time (ms) index|Time (ms) no index|
-|-----+----+---------------+------------------|
-| Q0| 1| NA| 1453.44|
-|-----+----+---------------+------------------|
-| Q1| 2| 0.49| 1001.54|
-|-----+----+---------------+------------------|
-| Q2| 2| 1.48| 3009.39|
-|-----+----+---------------+------------------|
-| Q3| 4| 0.55| 906.98|
-|-----+----+---------------+------------------|
-| Q4| 4| 24385.07| 4951.91|
-|-----+----+---------------+------------------|
-| Q5| 11| 0.85| 1003.23|
-+---------------------------------------------+
-Timings without indices were obtained using operations which doesn't use
-indices (see above)
-
-Remarks
-
-We didn't run full-scale tests, also we didn't present (yet) data for
-operations with arrays of ltree (ltree[]) and full text searching. We'll
-appreciate your input. So far, below some (rather obvious) results:
-
- * Indices does help execution of queries
- * Q4 performs bad because one needs to read almost all data from the HDD
-
-CHANGES
-
-Mar 28, 2003
- Added functions index(ltree,ltree,offset), text2ltree(text),
- ltree2text(text)
-Feb 7, 2003
- Add ? operation
- Fix ~ operation bug: eg '1.1.1' ~ '*.1'
- Optimize index storage
-Aug 9, 2002
- Fixed very stupid but important bug :-)
-July 31, 2002
- Now works on 64-bit platforms.
- Added function lca - lowest common ancestor
- Version for 7.2 is distributed as separate package -
- http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz
-July 13, 2002
- Initial release.
-
-TODO
-
- * Testing on 64-bit platforms. There are several known problems with byte
- alignment; -- RESOLVED
- * Better documentation;
- * We plan (probably) to improve regular expressions processing using
- non-deterministic automata;
- * Some sort of XML support;
- * Better full text searching;
-
-SOME BACKGROUNDS
-
-The approach we use for ltree is much like one we used in our other GiST based
-contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). Theoretical
-background is available in papers referenced from our GiST development page
-(http://www.sai.msu.su/~megera/postgres/gist).
-
-A hierarchical data structure (tree) is a set of nodes. Each node has a
-signature (LPS) of a fixed size, which is a hashed label path of that node.
-Traversing a tree we could *certainly* prune branches if
-
-LQS (bitwise AND) LPS != LQS
-
-where LQS is a signature of lquery or ltxtquery, obtained in the same way as
-LPS.
-
-ltree[]:
-For array of ltree LPS is a bitwise OR-ed signatures of *ALL* children
-reachable from that node. Signatures are stored in RD-tree, implemented using
-GiST, which provides indexed access.
-
-ltree:
-For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is
-represented by (left_bound, signature, right_bound), so that we could speedup
-operations <, <=, =, >=, > using left_bound, right_bound and prune branches of
-a tree using signature.
--------------------------------------------------------------------------------
-We ask people who find the module useful to send us a postcards to:
-Moscow, 119899, Universitetski pr.13, Moscow State University, Sternberg
-Astronomical Institute, Russia
-For: Bartunov O.S.
-and
-Moscow, Bratislavskaya str.23, appt. 18, Russia
-For: Sigaev F.G.