Make pg_statistic and related code account more honestly for collations.

When we first put in collations support, we basically punted on teaching pg_statistic, ANALYZE, and the planner selectivity functions about that. They've just used DEFAULT_COLLATION_OID independently of the actual collation of the data. It's time to improve that, so: * Add columns to pg_statistic that record the specific collation associated with each statistics slot. * Teach ANALYZE to use the column's actual collation when comparing values for statistical purposes, and record this in the appropriate slot. (Note that type-specific typanalyze functions are now expected to fill stats->stacoll with the appropriate collation, too.) * Teach assorted selectivity functions to use the actual collation of the stats they are looking at, instead of just assuming it's DEFAULT_COLLATION_OID. This should give noticeably better results in selectivity estimates for columns with nondefault collations, at least for query clauses that use that same collation (which would be the default behavior in most cases). It's still true that comparisons with explicit COLLATE clauses different from the stored data's collation won't be well-estimated, but that's no worse than before. Also, this patch does make the first step towards doing better with that, which is that it's now theoretically possible to collect stats for a collation other than the column's own collation. Patch by me; thanks to Peter Eisentraut for review. Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
author: Tom Lane 2018-12-14 17:52:49 +0000
committer: Tom Lane 2018-12-14 17:52:49 +0000
commit: 5e09280057a4c3f5db297348ea3e044c9c5f4ef8 (patch)
tree: a153ceede13d3b807d48d420896b6763d44c9086 /src/backend/statistics
parent: 8fb569e978af3995f0dd6b0033758ec571aab0c1 (diff)
3 files changed, 12 insertions, 6 deletions
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 140783cfb3a..58d0df20f69 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -252,6 +252,9 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
 	 * (b) split the data into groups by first (k-1) columns
 	 *
 	 * (c) for each group count different values in the last column
+	 *
+	 * We use the column data types' default sort operators and collations;
+	 * perhaps at some point it'd be worth using column-specific collations?
 	 */
 
 	/* prepare the sort function for the first dimension, and SortItem array */
@@ -266,7 +269,7 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
 				 colstat->attrtypid);
 
 		/* prepare the sort function for this dimension */
-		multi_sort_add_dimension(mss, i, type->lt_opr);
+		multi_sort_add_dimension(mss, i, type->lt_opr, type->typcollation);
 
 		/* accumulate all the data for both columns into an array and sort it */
 		for (j = 0; j < numrows; j++)
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 5dcee95250a..082f0506da0 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -363,18 +363,18 @@ multi_sort_init(int ndims)
 }
 
 /*
- * Prepare sort support info using the given sort operator
+ * Prepare sort support info using the given sort operator and collation
  * at the position 'sortdim'
  */
 void
-multi_sort_add_dimension(MultiSortSupport mss, int sortdim, Oid oper)
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+						 Oid oper, Oid collation)
 {
 	SortSupport ssup = &mss->ssup[sortdim];
 
 	ssup->ssup_cxt = CurrentMemoryContext;
-	ssup->ssup_collation = DEFAULT_COLLATION_OID;
+	ssup->ssup_collation = collation;
 	ssup->ssup_nulls_first = false;
-	ssup->ssup_cxt = CurrentMemoryContext;
 
 	PrepareSortSupportFromOrderingOp(oper, ssup);
 }
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
index 593c2198396..3071e42d864 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -454,6 +454,9 @@ ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
 	/*
 	 * For each dimension, set up sort-support and fill in the values from the
 	 * sample data.
+	 *
+	 * We use the column data types' default sort operators and collations;
+	 * perhaps at some point it'd be worth using column-specific collations?
 	 */
 	for (i = 0; i < k; i++)
 	{
@@ -466,7 +469,7 @@ ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
 				 colstat->attrtypid);
 
 		/* prepare the sort function for this dimension */
-		multi_sort_add_dimension(mss, i, type->lt_opr);
+		multi_sort_add_dimension(mss, i, type->lt_opr, type->typcollation);
 
 		/* accumulate all the data for this dimension into the arrays */
 		for (j = 0; j < numrows; j++)
author	Tom Lane	2018-12-14 17:52:49 +0000
committer	Tom Lane	2018-12-14 17:52:49 +0000
commit	5e09280057a4c3f5db297348ea3e044c9c5f4ef8 (patch)
tree	a153ceede13d3b807d48d420896b6763d44c9086 /src/backend/statistics
parent	8fb569e978af3995f0dd6b0033758ec571aab0c1 (diff)