summaryrefslogtreecommitdiff
path: root/src
diff options
context:
space:
mode:
authorTomas Vondra2017-07-28 22:00:09 +0000
committerTomas Vondra2017-07-31 01:18:22 +0000
commitf7dff6ae5b1a1eb16cc4b52308c741cc534faa4b (patch)
tree3b44e0e6844c9a16d17d04fc87c6c4f8e3c858e5 /src
parent69072746e165bf2b53b7f9851a4f7a2876cf5501 (diff)
Accept plan change in xc_groupby regression test
The plan changed in two ways. Firstly, the targetlists changed due to abandoning the custom distributed aggregation and reusing the upstream partial aggregation code. That means we're not prefixing the aggregate with schema name, etc. The plan also switches from distributed aggregation to plain aggregation with all the work done on top of a remote query. This happens simply due to costing, as the tables are tiny and two-phase aggregation has some overhead. The original implementation (as in XL 9.5) distributed the aggregate unconditionally, ignoring the costing. Parf of the problem is that the query groups by two columns from two different tables, resulting in overestimation of the number of groups. That means the optimizer thinks distributing the aggregation would not reduce the number of rows, which increases the cost estimate as each row requires network transfer and the finalize aggregate also depends on the number of input rows. We could make the tables larger and the optimizer would eventually switch to distributed aggregate. For example this seems to do the trick: insert into xc_groupby_tab1 select 1, mod(i,1000) from generate_series(1,20000) s(i); insert into xc_groupby_tab2 select 1, mod(i,1000) from generate_series(1,20000) s(i); But it does not seem worth it, considering it's just a workaround for the estimation issue and the increased duration. And we already have other regression tests testing plausible queries benefiting from distributed aggregation. So just accept the plan change.
Diffstat (limited to 'src')
-rw-r--r--src/test/regress/expected/xc_groupby.out54
1 files changed, 27 insertions, 27 deletions
diff --git a/src/test/regress/expected/xc_groupby.out b/src/test/regress/expected/xc_groupby.out
index cec10c1550..cb3d397ae3 100644
--- a/src/test/regress/expected/xc_groupby.out
+++ b/src/test/regress/expected/xc_groupby.out
@@ -5400,40 +5400,40 @@ select count(*), sum(xc_groupby_tab1.val * xc_groupby_tab2.val), avg(xc_groupby_
(4 rows)
explain (verbose true, costs false, nodes false) select count(*), sum(xc_groupby_tab1.val * xc_groupby_tab2.val), avg(xc_groupby_tab1.val*xc_groupby_tab2.val), sum(xc_groupby_tab1.val*xc_groupby_tab2.val)::float8/count(*), xc_groupby_tab1.val2, xc_groupby_tab2.val2 from xc_groupby_tab1 full outer join xc_groupby_tab2 on xc_groupby_tab1.val2 = xc_groupby_tab2.val2 group by xc_groupby_tab1.val2, xc_groupby_tab2.val2;
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate
- Output: pg_catalog.count(*), pg_catalog.sum((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))), pg_catalog.avg((avg((xc_groupby_tab1.val * xc_groupby_tab2.val)))), ((pg_catalog.sum((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))))::double precision / (pg_catalog.count(*))::double precision), xc_groupby_tab1.val2, xc_groupby_tab2.val2
+ Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), ((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))::double precision / (count(*))::double precision), xc_groupby_tab1.val2, xc_groupby_tab2.val2
Group Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
-> Remote Subquery Scan on all
- Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), xc_groupby_tab1.val2, xc_groupby_tab2.val2
- -> GroupAggregate
- Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), xc_groupby_tab1.val2, xc_groupby_tab2.val2
- Group Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
- -> Sort
+ Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val
+ Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
+ -> Sort
+ Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val
+ Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
+ -> Merge Full Join
Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val
- Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
- -> Merge Full Join
- Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val
- Merge Cond: (xc_groupby_tab1.val2 = xc_groupby_tab2.val2)
+ Merge Cond: (xc_groupby_tab1.val2 = xc_groupby_tab2.val2)
+ -> Remote Subquery Scan on all
+ Output: xc_groupby_tab1.val, xc_groupby_tab1.val2
+ Distribute results by H: val2
+ Sort Key: xc_groupby_tab1.val2
+ -> Sort
+ Output: xc_groupby_tab1.val, xc_groupby_tab1.val2
+ Sort Key: xc_groupby_tab1.val2
+ -> Seq Scan on public.xc_groupby_tab1
+ Output: xc_groupby_tab1.val, xc_groupby_tab1.val2
+ -> Materialize
+ Output: xc_groupby_tab2.val, xc_groupby_tab2.val2
-> Remote Subquery Scan on all
- Output: xc_groupby_tab1.val2, xc_groupby_tab1.val
+ Output: xc_groupby_tab2.val, xc_groupby_tab2.val2
Distribute results by H: val2
+ Sort Key: xc_groupby_tab2.val2
-> Sort
- Output: xc_groupby_tab1.val2, xc_groupby_tab1.val
- Sort Key: xc_groupby_tab1.val2
- -> Seq Scan on public.xc_groupby_tab1
- Output: xc_groupby_tab1.val2, xc_groupby_tab1.val
- -> Materialize
- Output: xc_groupby_tab2.val2, xc_groupby_tab2.val
- -> Remote Subquery Scan on all
- Output: xc_groupby_tab2.val2, xc_groupby_tab2.val
- Distribute results by H: val2
- -> Sort
- Output: xc_groupby_tab2.val2, xc_groupby_tab2.val
- Sort Key: xc_groupby_tab2.val2
- -> Seq Scan on public.xc_groupby_tab2
- Output: xc_groupby_tab2.val2, xc_groupby_tab2.val
+ Output: xc_groupby_tab2.val, xc_groupby_tab2.val2
+ Sort Key: xc_groupby_tab2.val2
+ -> Seq Scan on public.xc_groupby_tab2
+ Output: xc_groupby_tab2.val, xc_groupby_tab2.val2
(32 rows)
-- aggregates over aggregates