diff options
Diffstat (limited to 'contrib/ltree/README.ltree')
| -rw-r--r-- | contrib/ltree/README.ltree | 512 |
1 files changed, 0 insertions, 512 deletions
diff --git a/contrib/ltree/README.ltree b/contrib/ltree/README.ltree deleted file mode 100644 index a9d722d051..0000000000 --- a/contrib/ltree/README.ltree +++ /dev/null @@ -1,512 +0,0 @@ -contrib/ltree module - -ltree - is a PostgreSQL contrib module which contains implementation of data -types, indexed access methods and queries for data organized as a tree-like -structures. -This module will works for PostgreSQL version 7.3. -(version for 7.2 version is available from http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz) -------------------------------------------------------------------------------- -All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov -(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist for -additional information. Authors would like to thank Eugeny Rodichev for helpful -discussions. Comments and bug reports are welcome. -------------------------------------------------------------------------------- - -LEGAL NOTICES: This module is released under BSD license (as PostgreSQL -itself). This work was done in framework of Russian Scientific Network and -partially supported by Russian Foundation for Basic Research and Stack Group. -------------------------------------------------------------------------------- - -MOTIVATION - -This is a placeholder for introduction to the problem. Hope, people reading -this document doesn't need it too much :-) - -DEFINITIONS - -A label of a node is a sequence of one or more words separated by blank -character '_' and containing letters and digits ( for example, [a-zA-Z0-9] for -C locale). The length of a label is limited by 256 bytes. - -Example: 'Countries', 'Personal_Services' - -A label path of a node is a sequence of one or more dot-separated labels -l1.l2...ln, represents path from root to the node. The length of a label path -is limited by 65Kb, but size <= 2Kb is preferrable. We consider it's not a -strict limitation ( maximal size of label path for DMOZ catalogue - http:// -www.dmoz.org, is about 240 bytes !) - -Example: 'Top.Countries.Europe.Russia' - -We introduce several datatypes: - -ltree - - is a datatype for label path. - -ltree[] - - is a datatype for arrays of ltree. - -lquery - - is a path expression that has regular expression in the label path and - used for ltree matching. Star symbol (*) is used to specify any number of - labels (levels) and could be used at the beginning and the end of lquery, - for example, '*.Europe.*'. - - The following quantifiers are recognized for '*' (like in Perl): - - {n} Match exactly n levels - {n,} Match at least n levels - {n,m} Match at least n but not more than m levels - {,m} Match at maximum m levels (eq. to {0,m}) - - It is possible to use several modifiers at the end of a label: - - - @ Do case-insensitive label matching - * Do prefix matching for a label - % Don't account word separator '_' in label matching, that is - 'Russian%' would match 'Russian_nations', but not 'Russian' - - lquery could contains logical '!' (NOT) at the beginning of the label and ' - |' (OR) to specify possible alternatives for label matching. - - Example of lquery: - - - Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain - a) b) c) d) e) - - A label path should - + a) begins from a node with label 'Top' - + b) and following zero or 2 labels until - + c) a node with label beginning from case-insensitive prefix 'sport' - + d) following node with label not matched 'football' or 'tennis' and - + e) ends on node with label beginning from 'Russ' or strictly matched - 'Spain'. - -ltxtquery - - is a datatype for label searching (like type 'query' for full text - searching, see contrib/tsearch). It's possible to use modifiers @,%,* at - the end of word. The meaning of modifiers are the same as for lquery. - - Example: 'Europe & Russia*@ & !Transportation' - - Search paths contain words 'Europe' and 'Russia*' (case-insensitive) and - not 'Transportation'. Notice, the order of words as they appear in label - path is not important ! - -OPERATIONS - -The following operations are defined for type ltree: - -<,>,<=,>=,=, <> - - have their usual meanings. Comparison is doing in the order of direct - tree traversing, children of a node are sorted lexicographic. -ltree @> ltree - - returns TRUE if left argument is an ancestor of right argument (or - equal). -ltree <@ ltree - - returns TRUE if left argument is a descendant of right argument (or - equal). -ltree ~ lquery, lquery ~ ltree - - return TRUE if node represented by ltree satisfies lquery. -ltree ? lquery[], lquery ? ltree[] - - return TRUE if node represented by ltree satisfies at least one lquery - from array. -ltree @ ltxtquery, ltxtquery @ ltree - - return TRUE if node represented by ltree satisfies ltxtquery. -ltree || ltree, ltree || text, text || ltree - - return concatenated ltree. - -Operations for arrays of ltree (ltree[]): - -ltree[] @> ltree, ltree <@ ltree[] - - returns TRUE if array ltree[] contains an ancestor of ltree. -ltree @> ltree[], ltree[] <@ ltree - - returns TRUE if array ltree[] contains a descendant of ltree. -ltree[] ~ lquery, lquery ~ ltree[] - - returns TRUE if array ltree[] contains label paths matched lquery. -ltree[] ? lquery[], lquery[] ? ltree[] - - returns TRUE if array ltree[] contains label paths matched atleaset one - lquery from array. -ltree[] @ ltxtquery, ltxtquery @ ltree[] - - returns TRUE if array ltree[] contains label paths matched ltxtquery - (full text search). -ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery - - returns first element of array ltree[] satisfies corresponding condition - and NULL in vice versa. - -REMARK - -Operations <@, @>, @ and ~ have analogues - ^<@, ^@>, ^@, ^~, which doesn't use -indices ! - -INDICES - -Various indices could be created to speed up execution of operations: - - * B-tree index over ltree: - <, <=, =, >=, > - * GiST index over ltree: - <, <=, =, >=, >, @>, <@, @, ~, ? - Example: - create index path_gist_idx on test using gist (path); - * GiST index over ltree[]: - ltree[]<@ ltree, ltree @> ltree[], @, ~, ?. - Example: - create index path_gist_idx on test using gist (array_path); - Notices: This index is lossy. - -FUNCTIONS - -ltree subltree - ltree subltree(ltree, start, end) - returns subpath of ltree from start (inclusive) until the end. - # select subltree('Top.Child1.Child2',1,2); - subltree - -------- - Child1 -ltree subpath - ltree subpath(ltree, OFFSET,LEN) - ltree subpath(ltree, OFFSET) - returns subpath of ltree from OFFSET (inclusive) with length LEN. - If OFFSET is negative returns subpath starts that far from the end - of the path. If LENGTH is omitted, returns everything to the end - of the path. If LENGTH is negative, leaves that many labels off - the end of the path. - # select subpath('Top.Child1.Child2',1,2); - subpath - ------- - Child1.Child2 - - # select subpath('Top.Child1.Child2',-2,1); - subpath - --------- - Child1 -int4 nlevel - - int4 nlevel(ltree) - returns level of the node. - # select nlevel('Top.Child1.Child2'); - nlevel - -------- - 3 - Note, that arguments start, end, OFFSET, LEN have meaning of level of the - node ! - -int4 index(ltree,ltree), int4 index(ltree,ltree,OFFSET) - returns number of level of the first occurence of second argument in first - one beginning from OFFSET. if OFFSET is negative, than search begins from | - OFFSET| levels from the end of the path. - SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3); - index - ------- - 6 - SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4); - index - ------- - 9 - -ltree text2ltree(text), text ltree2text(text) - cast functions for ltree and text. - - -ltree lca(ltree,ltree,...) (up to 8 arguments) - ltree lca(ltree[]) - Returns Lowest Common Ancestor (lca) - # select lca('1.2.2.3','1.2.3.4.5.6'); - lca - ----- - 1.2 - # select lca('{la.2.3,1.2.3.4.5.6}') is null; - ?column? - ---------- - f - - -INSTALLATION - - cd contrib/ltree - make - make install - make installcheck - -EXAMPLE OF USAGE - - createdb ltreetest - psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql - psql ltreetest < ltreetest.sql - -Now, we have a database ltreetest populated with a data describing hierarchy -shown below: - - - TOP - / | \ - Science Hobbies Collections - / | \ - Astronomy Amateurs_Astronomy Pictures - / \ | - Astrophysics Cosmology Astronomy - / | \ - Galaxies Stars Astronauts - -Inheritance: - -ltreetest=# select path from test where path <@ 'Top.Science'; - path ------------------------------------- - Top.Science - Top.Science.Astronomy - Top.Science.Astronomy.Astrophysics - Top.Science.Astronomy.Cosmology -(4 rows) - -Matching: - -ltreetest=# select path from test where path ~ '*.Astronomy.*'; - path ------------------------------------------------ - Top.Science.Astronomy - Top.Science.Astronomy.Astrophysics - Top.Science.Astronomy.Cosmology - Top.Collections.Pictures.Astronomy - Top.Collections.Pictures.Astronomy.Stars - Top.Collections.Pictures.Astronomy.Galaxies - Top.Collections.Pictures.Astronomy.Astronauts -(7 rows) -ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*'; - path ------------------------------------- - Top.Science.Astronomy - Top.Science.Astronomy.Astrophysics - Top.Science.Astronomy.Cosmology -(3 rows) - -Full text search: - -ltreetest=# select path from test where path @ 'Astro*% & !pictures@'; - path ------------------------------------- - Top.Science.Astronomy - Top.Science.Astronomy.Astrophysics - Top.Science.Astronomy.Cosmology - Top.Hobbies.Amateurs_Astronomy -(4 rows) - -ltreetest=# select path from test where path @ 'Astro* & !pictures@'; - path ------------------------------------- - Top.Science.Astronomy - Top.Science.Astronomy.Astrophysics - Top.Science.Astronomy.Cosmology -(3 rows) - -Using Functions: - -ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy'; - ?column? ------------------------------------------- - Top.Science.Space.Astronomy - Top.Science.Space.Astronomy.Astrophysics - Top.Science.Space.Astronomy.Cosmology -(3 rows) -We could create SQL-function: -CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree -AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);' -LANGUAGE SQL IMMUTABLE; - -and previous select could be rewritten as: - -ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy'; - ins_label ------------------------------------------- - Top.Science.Space.Astronomy - Top.Science.Space.Astronomy.Astrophysics - Top.Science.Space.Astronomy.Cosmology -(3 rows) - -Or with another arguments: - -CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree -AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));' -LANGUAGE SQL IMMUTABLE; - -ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy'; - ins_label ------------------------------------------- - Top.Science.Space.Astronomy - Top.Science.Space.Astronomy.Astrophysics - Top.Science.Space.Astronomy.Cosmology -(3 rows) - -ADDITIONAL DATA - -To get more feeling from our ltree module you could download -dmozltree-eng.sql.gz (about 3Mb tar.gz archive containing 300,274 nodes), -available from http://www.sai.msu.su/~megera/postgres/gist/ltree/ -dmozltree-eng.sql.gz, which is DMOZ catalogue, prepared for use with ltree. -Setup your test database (dmoz), load ltree module and issue command: - -zcat dmozltree-eng.sql.gz| psql dmoz - -Data will be loaded into database dmoz and all indices will be created. - -BENCHMARKS - -All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750Mhz) using DMOZ -data, containing 300,274 nodes (see above for download link). We used some -basic queries typical for walking through catalog. - -QUERIES - - * Q0: Count all rows (sort of base time for comparison) - select count(*) from dmoz; - count - -------- - 300274 - (1 row) - * Q1: Get direct children (without inheritance) - select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}'; - path - ----------------------------------- - Top.Adult.Arts.Animation.Cartoons - Top.Adult.Arts.Animation.Anime - (2 rows) - * Q2: The same as Q1 but with counting of successors - select path as parentpath , (select count(*)-1 from dmoz where path <@ - p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}'; - parentpath | count - -----------------------------------+------- - Top.Adult.Arts.Animation.Cartoons | 2 - Top.Adult.Arts.Animation.Anime | 61 - (2 rows) - * Q3: Get all parents - select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by - path asc; - path - -------------------------- - Top - Top.Adult - Top.Adult.Arts - Top.Adult.Arts.Animation - (4 rows) - * Q4: Get all parents with counting of children - select path, (select count(*)-1 from dmoz where path <@ p.path) as count - from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc; - path | count - --------------------------+-------- - Top | 300273 - Top.Adult | 4913 - Top.Adult.Arts | 339 - Top.Adult.Arts.Animation | 65 - (4 rows) - * Q5: Get all children with levels - select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level - from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc; - path | level - ------------------------------------------------+------- - Top.Adult.Arts.Animation.Anime | 1 - Top.Adult.Arts.Animation.Anime.Fan_Works | 2 - Top.Adult.Arts.Animation.Anime.Games | 2 - Top.Adult.Arts.Animation.Anime.Genres | 2 - Top.Adult.Arts.Animation.Anime.Image_Galleries | 2 - Top.Adult.Arts.Animation.Anime.Multimedia | 2 - Top.Adult.Arts.Animation.Anime.Resources | 2 - Top.Adult.Arts.Animation.Anime.Titles | 2 - Top.Adult.Arts.Animation.Cartoons | 1 - Top.Adult.Arts.Animation.Cartoons.AVS | 2 - Top.Adult.Arts.Animation.Cartoons.Members | 2 - (11 rows) - -Timings - -+---------------------------------------------+ -|Query|Rows|Time (ms) index|Time (ms) no index| -|-----+----+---------------+------------------| -| Q0| 1| NA| 1453.44| -|-----+----+---------------+------------------| -| Q1| 2| 0.49| 1001.54| -|-----+----+---------------+------------------| -| Q2| 2| 1.48| 3009.39| -|-----+----+---------------+------------------| -| Q3| 4| 0.55| 906.98| -|-----+----+---------------+------------------| -| Q4| 4| 24385.07| 4951.91| -|-----+----+---------------+------------------| -| Q5| 11| 0.85| 1003.23| -+---------------------------------------------+ -Timings without indices were obtained using operations which doesn't use -indices (see above) - -Remarks - -We didn't run full-scale tests, also we didn't present (yet) data for -operations with arrays of ltree (ltree[]) and full text searching. We'll -appreciate your input. So far, below some (rather obvious) results: - - * Indices does help execution of queries - * Q4 performs bad because one needs to read almost all data from the HDD - -CHANGES - -Mar 28, 2003 - Added functions index(ltree,ltree,offset), text2ltree(text), - ltree2text(text) -Feb 7, 2003 - Add ? operation - Fix ~ operation bug: eg '1.1.1' ~ '*.1' - Optimize index storage -Aug 9, 2002 - Fixed very stupid but important bug :-) -July 31, 2002 - Now works on 64-bit platforms. - Added function lca - lowest common ancestor - Version for 7.2 is distributed as separate package - - http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz -July 13, 2002 - Initial release. - -TODO - - * Testing on 64-bit platforms. There are several known problems with byte - alignment; -- RESOLVED - * Better documentation; - * We plan (probably) to improve regular expressions processing using - non-deterministic automata; - * Some sort of XML support; - * Better full text searching; - -SOME BACKGROUNDS - -The approach we use for ltree is much like one we used in our other GiST based -contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). Theoretical -background is available in papers referenced from our GiST development page -(http://www.sai.msu.su/~megera/postgres/gist). - -A hierarchical data structure (tree) is a set of nodes. Each node has a -signature (LPS) of a fixed size, which is a hashed label path of that node. -Traversing a tree we could *certainly* prune branches if - -LQS (bitwise AND) LPS != LQS - -where LQS is a signature of lquery or ltxtquery, obtained in the same way as -LPS. - -ltree[]: -For array of ltree LPS is a bitwise OR-ed signatures of *ALL* children -reachable from that node. Signatures are stored in RD-tree, implemented using -GiST, which provides indexed access. - -ltree: -For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is -represented by (left_bound, signature, right_bound), so that we could speedup -operations <, <=, =, >=, > using left_bound, right_bound and prune branches of -a tree using signature. -------------------------------------------------------------------------------- -We ask people who find the module useful to send us a postcards to: -Moscow, 119899, Universitetski pr.13, Moscow State University, Sternberg -Astronomical Institute, Russia -For: Bartunov O.S. -and -Moscow, Bratislavskaya str.23, appt. 18, Russia -For: Sigaev F.G. |
