| field | value | date |
|---|---|---|
| author | Tom Lane | 2011-02-18 00:00:49 +0000 |
| committer | Tom Lane | 2011-02-18 00:00:49 +0000 |
| commit | 52b60530f257b1591d8b72264cd6c0dd9aabfd46 | |
| tree | 6eda7c7cbeaac73debfe3526614a65a8db272a58 /src/include | |
| parent | de623f33353c96657651f9c3a6c8756616c610e4 | |
Fix tsmatchsel() to account properly for null rows.
ts_typanalyze.c computes MCE statistics as fractions of the non-null rows,
which seems fairly reasonable, and anyway changing it in released versions
wouldn't be a good idea. But then ts_selfuncs.c has to account for that.
Failure to do so results in overestimates in columns with a significant
fraction of null documents. Back-patch to 8.4 where this stuff was
introduced.
Jesper Krogh
Diffstat (limited to 'src/include')
| -rw-r--r-- | src/include/catalog/pg_statistic.h | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index f38921f1c69..927cd0b0471 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -246,6 +246,8 @@ typedef FormData_pg_statistic *Form_pg_statistic;
  * type with identifiable elements (for instance, tsvector). staop contains
  * the equality operator appropriate to the element type. stavalues contains
  * the most common element values, and stanumbers their frequencies. Unlike
+ * MCV slots, frequencies are measured as the fraction of non-null rows the
+ * element value appears in, not the frequency of all rows. Also unlike
  * MCV slots, the values are sorted into order (to support binary search
  * for a particular value). Since this puts the minimum and maximum
  * frequencies at unpredictable spots in stanumbers, there are two extra
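
The arithmetic behind the commit message can be sketched in a few lines of C. This is a standalone illustration, not the code from ts_selfuncs.c: the helper name `mce_freq_to_row_fraction` and its parameters are hypothetical. It assumes only what the message and the new comment state, namely that an MCE frequency is a fraction of the non-null rows, so it has to be scaled by the non-null fraction before it can serve as a fraction of all rows.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a PostgreSQL function): convert an MCE
 * frequency, measured as a fraction of the non-null rows, into a
 * fraction of all rows in the column.
 *
 * mce_freq   fraction of non-null rows containing the element
 * null_frac  fraction of all rows whose document is NULL
 */
static double
mce_freq_to_row_fraction(double mce_freq, double null_frac)
{
    /* Both inputs are fractions, so they must lie in [0, 1]. */
    assert(mce_freq >= 0.0 && mce_freq <= 1.0);
    assert(null_frac >= 0.0 && null_frac <= 1.0);

    /* Only the non-null share of the table can match at all. */
    return mce_freq * (1.0 - null_frac);
}
```

For example, with `null_frac = 0.5` and `mce_freq = 0.2`, the helper returns 0.1. Using the unscaled 0.2 as the selectivity would double the estimate, which is the kind of overestimate the commit message describes for columns with many null documents.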
