From fc8d970cbcdd6f025475822a4cf01dfda0873226 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Wed, 28 May 2003 16:04:02 +0000 Subject: Replace functional-index facility with expressional indexes. Any column of an index can now be a computed expression instead of a simple variable. Restrictions on expressions are the same as for predicates (only immutable functions, no sub-selects). This fixes problems recently introduced with inlining SQL functions, because the inlining transformation is applied to both expression trees so the planner can still match them up. Along the way, improve efficiency of handling index predicates (both predicates and index expressions are now cached by the relcache) and fix 7.3 oversight that didn't record dependencies of predicate expressions. --- doc/src/sgml/catalogs.sgml | 54 +++++++++--------- doc/src/sgml/indices.sgml | 109 ++++++++++++++++++++----------------- doc/src/sgml/plpgsql.sgml | 7 ++- doc/src/sgml/ref/create_index.sgml | 47 ++++++++-------- doc/src/sgml/release.sgml | 3 +- 5 files changed, 116 insertions(+), 104 deletions(-) (limited to 'doc/src') diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 0cc355330dd..a8f7190856c 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -1,6 +1,6 @@ @@ -1933,26 +1933,18 @@ The OID of the pg_class entry for the table this index is for - - indproc - regproc - pg_proc.oid - The function's OID if this is a functional index, - else zero - - indkey int2vector pg_attribute.attnum - This is an array of up to - INDEX_MAX_KEYS values that indicate which - table columns this index pertains to. For example a value of - 1 3 would mean that the first and the third - column make up the index key. For a functional index, these - columns are the inputs to the function, and the function's return - value is the index key. + This is an array of indnatts (up to + INDEX_MAX_KEYS) values that indicate which + table columns this index indexes. For example a value of + 1 3 would mean that the first and the third table + columns make up the index key. A zero in this array indicates that the + corresponding index attribute is an expression over the table columns, + rather than a simple column reference. @@ -1961,17 +1953,18 @@ oidvector pg_opclass.oid - For each column in the index key this contains a reference to + For each column in the index key this contains the OID of the operator class to use. See pg_opclass for details. - indisclustered - bool + indnatts + int2 - If true, the table was last clustered on this index. + The number of columns in the index (duplicates + pg_class.relnatts) @@ -1990,19 +1983,28 @@ - indreference - oid + indisclustered + bool - unused + If true, the table was last clustered on this index. + + + + indexprs + text + + Expression trees (in nodeToString() representation) + for index attributes that are not simple column references. This is a + list with one element for each zero entry in indkey. + Null if all index attributes are simple references. indpred text - Expression tree (in the form of a nodeToString() representation) - for partial index predicate. Empty string if not a partial - index. + Expression tree (in nodeToString() representation) + for partial index predicate. Null if not a partial index. diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index fcd7108a14c..4e79084e7ec 100644 --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -1,4 +1,4 @@ - + Indexes @@ -20,8 +20,7 @@ Introduction - The classical example for the need of an index is if there is a - table similar to this: + Suppose we have a table similar to this: CREATE TABLE test1 ( id integer, @@ -32,24 +31,24 @@ CREATE TABLE test1 ( SELECT content FROM test1 WHERE id = constant; - Ordinarily, the system would have to scan the entire - test1 table row by row to find all + With no advance preparation, the system would have to scan the entire + test1 table, row by row, to find all matching entries. If there are a lot of rows in - test1 and only a few rows (possibly zero - or one) returned by the query, then this is clearly an inefficient - method. If the system were instructed to maintain an index on the - id column, then it could use a more + test1 and only a few rows (perhaps only zero + or one) that would be returned by such a query, then this is clearly an + inefficient method. But if the system has been instructed to maintain an + index on the id column, then it can use a more efficient method for locating matching rows. For instance, it might only have to walk a few levels deep into a search tree. - A similar approach is used in most books of non-fiction: Terms and + A similar approach is used in most books of non-fiction: terms and concepts that are frequently looked up by readers are collected in an alphabetic index at the end of the book. The interested reader can scan the index relatively quickly and flip to the appropriate - page, and would not have to read the entire book to find the - interesting location. As it is the task of the author to + page(s), rather than having to read the entire book to find the + material of interest. Just as it is the task of the author to anticipate the items that the readers are most likely to look up, it is the task of the database programmer to foresee which indexes would be of advantage. @@ -73,13 +72,14 @@ CREATE INDEX test1_id_index ON test1 (id); Once the index is created, no further intervention is required: the - system will use the index when it thinks it would be more efficient + system will update the index when the table is modified, and it will + use the index in queries when it thinks this would be more efficient than a sequential table scan. But you may have to run the ANALYZE command regularly to update statistics to allow the query planner to make educated decisions. Also read for information about how to find out whether an index is used and when and why the - planner may choose to not use an index. + planner may choose not to use an index. @@ -198,7 +198,7 @@ CREATE INDEX name ON table than B-tree indexes, and the index size and build time for hash indexes is much worse. Hash indexes also suffer poor performance under high concurrency. For these reasons, hash index use is - discouraged. + presently discouraged. @@ -250,14 +250,13 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor); Currently, only the B-tree and GiST implementations support multicolumn indexes. Up to 32 columns may be specified. (This limit can be altered when building PostgreSQL; see the - file pg_config.h.) + file pg_config_manual.h.) The query planner can use a multicolumn index for queries that - involve the leftmost column in the index definition and any number - of columns listed to the right of it without a gap (when - used with appropriate operators). For example, + involve the leftmost column in the index definition plus any number + of columns listed to the right of it, without a gap. For example, an index on (a, b, c) can be used in queries involving all of a, b, and c, or in queries involving both @@ -266,7 +265,9 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor); (In a query involving a and c the planner might choose to use the index for a only and treat c like an - ordinary unindexed column.) + ordinary unindexed column.) Of course, each column must be used with + operators appropriate to the index type; clauses that involve other + operators will not be considered. @@ -283,8 +284,8 @@ SELECT name FROM test2 WHERE major = constant OR mino Multicolumn indexes should be used sparingly. Most of the time, an index on a single column is sufficient and saves space and time. - Indexes with more than three columns are almost certainly - inappropriate. + Indexes with more than three columns are unlikely to be helpful + unless the usage of the table is extremely stylized. @@ -332,19 +333,19 @@ CREATE UNIQUE INDEX name ON table - - Functional Indexes + + Indexes on Expressions - + indexes - on functions + on expressions - For a functional index, an index is defined - on the result of a function applied to one or more columns of a - single table. Functional indexes can be used to obtain fast access - to data based on the result of function calls. + An index column need not be just a column of the underlying table, + but can be a function or scalar expression computed from one or + more columns of the table. This feature is useful to obtain fast + access to tables based on the results of computations. @@ -362,20 +363,29 @@ CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1)); - The function in the index definition can take more than one - argument, but they must be table columns, not constants. - Functional indexes are always single-column (namely, the function - result) even if the function uses more than one input column; there - cannot be multicolumn indexes that contain function calls. + As another example, if one often does queries like this: + +SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith'; + + then it might be worth creating an index like this: + +CREATE INDEX people_names ON people ((first_name || ' ' || last_name)); + - - - The restrictions mentioned in the previous paragraph can easily be - worked around by defining a custom function to use in the index - definition that computes any desired result internally. - - + + The syntax of the CREATE INDEX command normally requires + writing parentheses around index expressions, as shown in the second + example. The parentheses may be omitted when the expression is just + a function call, as in the first example. + + + + Index expressions are relatively expensive to maintain, since the + derived expression(s) must be computed for each row upon insertion + or whenever it is updated. Therefore they should be used only when + queries that can use the index are very frequent. + @@ -391,8 +401,8 @@ CREATE INDEX name ON table The operator class identifies the operators to be used by the index for that column. For example, a B-tree index on the type int4 would use the int4_ops class; this operator - class includes comparison functions for values of type int4. In - practice the default operator class for the column's data type is + class includes comparison functions for values of type int4. + In practice the default operator class for the column's data type is usually sufficient. The main point of having operator classes is that for some data types, there could be more than one meaningful ordering. For example, we might want to sort a complex-number data @@ -427,24 +437,25 @@ CREATE INDEX name ON table name_pattern_ops support B-tree indexes on the types text, varchar, char, and name, respectively. The - difference to the ordinary operator classes is that the values + difference from the ordinary operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) if the server does not use the standard - C locale. As an example, to index a + C locale. As an example, you might index a varchar column like this: CREATE INDEX test_index ON test_table (col varchar_pattern_ops); - If you do use the C locale, you should instead create an index - with the default operator class. Also note that you should + If you do use the C locale, you may instead create an index + with the default operator class, and it will still be useful + for pattern-matching queries. Also note that you should create an index with the default operator class if you want queries involving ordinary comparisons to use an index. Such queries cannot use the xxx_pattern_ops - operator classes. It is possible, however, to create multiple + operator classes. It is allowed to create multiple indexes on the same column with different operator classes. diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index 2aa0c07ed1b..2f51d50cf3e 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -1,5 +1,5 @@ @@ -136,9 +136,10 @@ END; Except for input/output conversion and calculation functions for user-defined types, anything that can be defined in C language - functions can also be done with PL/pgSQL. For example, it is possible to + functions can also be done with PL/pgSQL. + For example, it is possible to create complex conditional computation functions and later use - them to define operators or use them in functional indexes. + them to define operators or use them in index expressions. diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 2c6a10ce3df..ff60aaa3b01 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -1,5 +1,5 @@ @@ -16,12 +16,8 @@ PostgreSQL documentation -CREATE [ UNIQUE ] INDEX index_name ON table - [ USING method ] ( column [ ops_name ] [, ...] ) - [ WHERE predicate ] - -CREATE [ UNIQUE ] INDEX index_name ON table - [ USING method ] ( func_name( column [, ... ]) [ ops_name ] ) +CREATE [ UNIQUE ] INDEX index_name ON table [ USING method ] + ( { column | ( expression ) } [ opclass ] [, ...] ) [ WHERE predicate ] @@ -32,25 +28,22 @@ CREATE [ UNIQUE ] INDEX index_name CREATE INDEX constructs an index index_name on the specified table. - Indexes are primarily used to enhance database performance. But - inappropriate use will result in slower performance. + Indexes are primarily used to enhance database performance (though + inappropriate use will result in slower performance). - In the first syntax shown above, the key field(s) for the - index are specified as column names. + The key field(s) for the index are specified as column names, + or alternatively as expressions written in parentheses. Multiple fields can be specified if the index method supports multicolumn indexes. - In the second syntax shown above, an index is defined on the result - of a user-specified function func_name applied to one or more - columns of a single table. These functional - indexes can be used to obtain fast access to data based - on operators that would normally require some transformation to apply - them to the base data. For example, a functional index on + An index field can be an expression computed from the values of + one or more columns of the table row. This feature can be used + to obtain fast access to data based on some transformation of + the basic data. For example, an index computed on upper(col) would allow the clause WHERE upper(col) = 'JIM' to use an index. @@ -84,6 +77,7 @@ CREATE [ UNIQUE ] INDEX index_name only to columns of the underlying table (but it can use all columns, not only the one(s) being indexed). Presently, subqueries and aggregate expressions are also forbidden in WHERE. + The same restrictions apply to index fields that are expressions. @@ -92,8 +86,8 @@ CREATE [ UNIQUE ] INDEX index_name their arguments and never on any outside influence (such as the contents of another table or the current time). This restriction ensures that the behavior of the index is well-defined. To use a - user-defined function in an index, remember to mark the function immutable - when you create it. + user-defined function in an index expression or WHERE + clause, remember to mark the function immutable when you create it. @@ -156,19 +150,22 @@ CREATE [ UNIQUE ] INDEX index_name - ops_name + expression - An associated operator class. See below for details. + An expression based on one or more columns of the table. The + expression usually must be written with surrounding parentheses, + as shown in the syntax. However, the parentheses may be omitted + if the expression has the form of a function call. - func_name + opclass - A function, which returns a value that can be indexed. + The name of an operator class. See below for details. @@ -177,7 +174,7 @@ CREATE [ UNIQUE ] INDEX index_name predicate - Defines the constraint expression for a partial index. + The constraint expression for a partial index. diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml index 45ba598bd67..db59de76f64 100644 --- a/doc/src/sgml/release.sgml +++ b/doc/src/sgml/release.sgml @@ -1,5 +1,5 @@ @@ -24,6 +24,7 @@ CDATA means the content is "SGML-free", so you can write without worries about funny characters. -->