| author | Tom Lane | 2017-05-14 14:54:47 +0000 |
|---|---|---|
| committer | Tom Lane | 2017-05-14 14:55:01 +0000 |
| commit | f04c9a61468904b6815b2bc73a48878817766e0e | |
| tree | b8eb1b9a131023b1ab0f7e151e036988d02eb8a3 /src/backend/statistics | |
| parent | 12ad38b3b4b5004001a525e0a0eda2ec45329e8e | |
Standardize terminology for pg_statistic_ext entries.
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics". Previously we had a mishmash of
terms, accompanied by utter confusion as to whether the term was
singular or plural. That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, e.g., in error
messages that seemed to be referring to multiple objects where only one
could be meant.
This commit fixes the code and a lot of comments (though I may have
missed a few). I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.
I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.
Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
Diffstat (limited to 'src/backend/statistics')
| -rw-r--r-- | src/backend/statistics/README | 9 |
|---|---|---|
| -rw-r--r-- | src/backend/statistics/README.dependencies | 2 |
| -rw-r--r-- | src/backend/statistics/dependencies.c | 14 |
| -rw-r--r-- | src/backend/statistics/extended_stats.c | 44 |
| -rw-r--r-- | src/backend/statistics/mvdistinct.c | 7 |
5 files changed, 40 insertions, 36 deletions
diff --git a/src/backend/statistics/README b/src/backend/statistics/README
index af7651127eb..a8f00a590e6 100644
--- a/src/backend/statistics/README
+++ b/src/backend/statistics/README
@@ -12,7 +12,7 @@ hopefully improving the estimates and producing better plans.
 Types of statistics
 -------------------
 
-There are two kinds of extended statistics:
+There are currently two kinds of extended statistics:
 
     (a) ndistinct coefficients
 
@@ -36,7 +36,7 @@ Complex clauses
 
 We also support estimating more complex clauses - essentially AND/OR clauses
 with (Var op Const) as leaves, as long as all the referenced attributes are
-covered by a single statistics.
+covered by a single statistics object.
 
 For example this condition
 
@@ -59,7 +59,7 @@ Selectivity estimation
 Throughout the planner clauselist_selectivity() still remains in charge of
 most selectivity estimate requests. clauselist_selectivity() can be instructed
 to try to make use of any extended statistics on the given RelOptInfo, which
-it will do, if:
+it will do if:
 
     (a) An actual valid RelOptInfo was given. Join relations are passed in as
         NULL, therefore are invalid.
@@ -77,6 +77,7 @@ performing estimations knows which clauses are to be skipped.
 
 Size of sample in ANALYZE
 -------------------------
+
 When performing ANALYZE, the number of rows to sample is determined as
 
     (300 * statistics_target)
@@ -93,4 +94,4 @@ those are not necessarily limited by statistics_target.
 This however merits further discussion, because collecting the sample is quite
 expensive and increasing it further would make ANALYZE even more painful.
 Judging by the experiments with the current implementation, the fixed size
-seems to work reasonably well for now, so we leave this as a future work.
+seems to work reasonably well for now, so we leave this as future work.
diff --git a/src/backend/statistics/README.dependencies b/src/backend/statistics/README.dependencies
index 7bc2533dc65..59f9d576578 100644
--- a/src/backend/statistics/README.dependencies
+++ b/src/backend/statistics/README.dependencies
@@ -48,7 +48,7 @@ rendering the approach mostly useless even for slightly noisy data sets, or
 result in sudden changes in behavior depending on minor differences between
 samples provided to ANALYZE.
 
-For this reason, the statistics implements "soft" functional dependencies,
+For this reason, extended statistics implement "soft" functional dependencies,
 associating each functional dependency with a degree of validity (a number
 between 0 and 1). This degree is then used to combine selectivities in a
 smooth manner.
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index fe9a9ef5de0..0e71f058ad2 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -342,8 +342,8 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
  * detects functional dependencies between groups of columns
  *
  * Generates all possible subsets of columns (variations) and computes
- * the degree of validity for each one. For example with a statistic on
- * three columns (a,b,c) there are 9 possible dependencies
+ * the degree of validity for each one. For example when creating statistics
+ * on three columns (a,b,c) there are 9 possible dependencies
  *
  *     two columns            three columns
  *     -----------            -------------
@@ -383,8 +383,8 @@ statext_dependencies_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
     /*
      * We'll try build functional dependencies starting from the smallest ones
      * covering just 2 columns, to the largest ones, covering all columns
-     * included in the statistics. We start from the smallest ones because we
-     * want to be able to skip already implied ones.
+     * included in the statistics object. We start from the smallest ones
+     * because we want to be able to skip already implied ones.
      */
     for (k = 2; k <= numattrs; k++)
     {
@@ -644,7 +644,7 @@ staext_dependencies_load(Oid mvoid)
     HeapTuple htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
 
     if (!HeapTupleIsValid(htup))
-        elog(ERROR, "cache lookup failed for extended statistics %u", mvoid);
+        elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
     deps = SysCacheGetAttr(STATEXTOID, htup,
                            Anum_pg_statistic_ext_stxdependencies, &isnull);
@@ -975,7 +975,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
         return 1.0;
     }
 
-    /* find the best suited statistics for these attnums */
+    /* find the best suited statistics object for these attnums */
     stat = choose_best_statistics(rel->statlist, clauses_attnums,
                                   STATS_EXT_DEPENDENCIES);
 
@@ -986,7 +986,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
         return 1.0;
     }
 
-    /* load the dependency items stored in the statistics */
+    /* load the dependency items stored in the statistics object */
     dependencies = staext_dependencies_load(stat->statOid);
 
     /*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index b334140c48a..3f74cee05f8 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -3,7 +3,7 @@
  * extended_stats.c
  * POSTGRES extended statistics
  *
- * Generic code supporting statistic objects created via CREATE STATISTICS.
+ * Generic code supporting statistics objects created via CREATE STATISTICS.
  *
  *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
@@ -35,14 +35,15 @@
 
 
 /*
- * Used internally to refer to an individual pg_statistic_ext entry.
+ * Used internally to refer to an individual statistics object, i.e.,
+ * a pg_statistic_ext entry.
  */
 typedef struct StatExtEntry
 {
     Oid statOid; /* OID of pg_statistic_ext entry */
-    char *schema; /* statistics schema */
-    char *name; /* statistics name */
-    Bitmapset *columns; /* attribute numbers covered by the statistics */
+    char *schema; /* statistics object's schema */
+    char *name; /* statistics object's name */
+    Bitmapset *columns; /* attribute numbers covered by the object */
     List *types; /* 'char' list of enabled statistic kinds */
 } StatExtEntry;
 
@@ -59,8 +60,8 @@ static void statext_store(Relation pg_stext, Oid relid,
  * Compute requested extended stats, using the rows sampled for the plain
  * (single-column) stats.
  *
- * This fetches a list of stats from pg_statistic_ext, computes the stats
- * and serializes them back into the catalog (as bytea values).
+ * This fetches a list of stats types from pg_statistic_ext, computes the
+ * requested stats, and serializes them back into the catalog.
  */
 void
 BuildRelationExtStatistics(Relation onerel, double totalrows,
@@ -98,7 +99,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
         {
             ereport(WARNING,
                     (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-                     errmsg("extended statistics \"%s.%s\" could not be collected for relation %s.%s",
+                     errmsg("statistics object \"%s.%s\" could not be computed for relation \"%s.%s\"",
                             stat->schema, stat->name,
                             get_namespace_name(onerel->rd_rel->relnamespace),
                             RelationGetRelationName(onerel)),
@@ -110,7 +111,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
         Assert(bms_num_members(stat->columns) >= 2 &&
                bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);
 
-        /* compute statistic of each type */
+        /* compute statistic of each requested type */
         foreach(lc2, stat->types)
         {
             char t = (char) lfirst_int(lc2);
@@ -160,7 +161,7 @@ statext_is_kind_built(HeapTuple htup, char type)
 }
 
 /*
- * Return a list (of StatExtEntry) of statistics for the given relation.
+ * Return a list (of StatExtEntry) of statistics objects for the given relation.
  */
 static List *
 fetch_statentries_for_relation(Relation pg_statext, Oid relid)
@@ -171,7 +172,7 @@ fetch_statentries_for_relation(Relation pg_statext, Oid relid)
     List *result = NIL;
 
     /*
-     * Prepare to scan pg_statistic_ext for entries having indrelid = this
+     * Prepare to scan pg_statistic_ext for entries having stxrelid = this
      * rel.
      */
     ScanKeyInit(&skey,
@@ -329,7 +330,7 @@ statext_store(Relation pg_stext, Oid statOid,
     /* there should already be a pg_statistic_ext tuple */
     oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
     if (!HeapTupleIsValid(oldtup))
-        elog(ERROR, "cache lookup failed for extended statistics %u", statOid);
+        elog(ERROR, "cache lookup failed for statistics object %u", statOid);
 
     /* replace it */
     stup = heap_modify_tuple(oldtup,
@@ -433,7 +434,7 @@ multi_sort_compare_dims(int start, int end,
 
 /*
  * has_stats_of_kind
- * Check that the list contains statistic of a given kind
+ * Check whether the list contains statistic of a given kind
  */
 bool
 has_stats_of_kind(List *stats, char requiredkind)
@@ -458,11 +459,12 @@ has_stats_of_kind(List *stats, char requiredkind)
  * there's no match.
  *
  * The current selection criteria is very simple - we choose the statistics
- * referencing the most attributes with the least keys.
+ * object referencing the most of the requested attributes, breaking ties
+ * in favor of objects with fewer keys overall.
  *
- * XXX if multiple statistics exists of the same size matching the same number
- * of keys, then the statistics which are chosen depend on the order that they
- * appear in the stats list. Perhaps this needs to be more definitive.
+ * XXX if multiple statistics objects tie on both criteria, then which object
+ * is chosen depends on the order that they appear in the stats list. Perhaps
+ * further tiebreakers are needed.
  */
 StatisticExtInfo *
 choose_best_statistics(List *stats, Bitmapset *attnums, char requiredkind)
@@ -479,7 +481,7 @@ choose_best_statistics(List *stats, Bitmapset *attnums, char requiredkind)
         int numkeys;
         Bitmapset *matched;
 
-        /* skip statistics that are not the correct type */
+        /* skip statistics that are not of the correct type */
         if (info->kind != requiredkind)
             continue;
 
@@ -495,9 +497,9 @@ choose_best_statistics(List *stats, Bitmapset *attnums, char requiredkind)
         numkeys = bms_num_members(info->keys);
 
         /*
-         * Use these statistics when it increases the number of matched
-         * clauses or when it matches the same number of attributes but these
-         * stats have fewer keys than any previous match.
+         * Use this object when it increases the number of matched clauses or
+         * when it matches the same number of attributes but these stats have
+         * fewer keys than any previous match.
          */
         if (num_matched > best_num_matched ||
             (num_matched == best_num_matched && numkeys < best_match_keys))
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
index f67f5762360..47b2490abbf 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -9,7 +9,7 @@
  * The multivariate ndistinct coefficients address this by storing ndistinct
  * estimates for combinations of the user-specified columns. So for example
  * given a statistics object on three columns (a,b,c), this module estimates
- * and store n-distinct for (a,b), (a,c), (b,c) and (a,b,c). The per-column
+ * and stores n-distinct for (a,b), (a,c), (b,c) and (a,b,c). The per-column
  * estimates are already available in pg_statistic.
  *
  *
@@ -18,6 +18,7 @@
  *
  * IDENTIFICATION
  * src/backend/statistics/mvdistinct.c
+ *
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
@@ -131,13 +132,13 @@ statext_ndistinct_load(Oid mvoid)
 
     htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
     if (!htup)
-        elog(ERROR, "cache lookup failed for statistics %u", mvoid);
+        elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
     ndist = SysCacheGetAttr(STATEXTOID, htup,
                             Anum_pg_statistic_ext_stxndistinct, &isnull);
     if (isnull)
         elog(ERROR,
-             "requested statistic kind %c not yet built for statistics %u",
+             "requested statistic kind %c is not yet built for statistics object %u",
              STATS_EXT_NDISTINCT, mvoid);
 
     ReleaseSysCache(htup);
