summaryrefslogtreecommitdiff
path: root/src/backend/optimizer
AgeCommit message (Collapse)Author
2017-10-19Fix handling of root->distribution during redistributionTomas Vondra
This fixes some remaining bugs in handling root->distribution, caused by the upper-planner pathification (in PostgreSQL 9.6). Prior to the pathification (so in PostgreSQL 9.5 and Postgres-XL 9.5), the root->distribution was used for two purposes: * To track distribution expected by ModifyTable (UPDATE,DELETE), so that grouping_planner() knew how to redistribute the data. * To communicate the resulting distribution from grouping_planner() back to standard_planner(). This worked fine in 9.5 as grouping_planner() was only dealing with a single remaining path (plan) when considering the redistribution, and so it was OK to tweak root->distribution. But since the pathification in 9.6 that is no longer true. There is no obvious reason why all the paths would have to share the same distribution, and we don't know which one will be the cheapest one. So from now on root->distribution is used to track the distribution expected by ModifyTable. Distribution for each path is available in path->distribution if needed. Note: We still use subroot->distribution to pass information about distribution of subqueries, though. But we only set it after the one cheapest path is selected.
2017-09-15Fix incorrect planning of grouping setsTomas Vondra
Commit 04f96689945462a4212047f03eb3281fb56bcf2f incorrectly allowed distributed grouping paths for grouping sets, causing failures in 'groupingsets' regression test suite. So fix that by making sure try_distributed_aggregation=false for plans with grouping sets.
2017-08-28Do not add any distribution to a dummy append nodePavan Deolasee
A dummy append node with no subpaths doesn't need any adjustment for distribution. This allows us to actually correct handle UPDATE/DELETE in some cases which were failing earlier.
2017-08-22Do not FQS NextValueExprPavan Deolasee
The target datanode must be determined after computing the next value. So let is go through regular planning. This fixes couple of regression failures.
2017-08-18Merge commit '21d304dfedb4f26d0d6587d9ac39b1b5c499bb55'Pavan Deolasee
This is the merge-base of PostgreSQL's master branch and REL_10_STABLE branch. This should be the last merge from PG's master branch into XL 10 branch. Subsequent merges must happen from REL_10_STABLE branch
2017-08-14Final pgindent + perltidy run for v10.Tom Lane
2017-08-08Use sort_pathkeys instead of query_pathkeys in standard_plannerTomas Vondra
When adding the top-level remote subquery, the code used query_pathkeys, but that seems to be incorrect as those may be group_pathkeys, as set by standard_qp_callback(). Consider this query from xc_groupby tests: select count(*) from xc_groupby_def where a is not null group by a order by 1; planned like this QUERY PLAN ------------------------------------------------------------ Remote Subquery Scan on all Output: count(*), a Sort Key: xc_groupby_def.a -> Sort Output: (count(*)), a Sort Key: (count(*)) -> HashAggregate Output: count(*), a Group Key: xc_groupby_def.a -> Seq Scan on public.xc_groupby_def Output: a, b Filter: (xc_groupby_def.a IS NOT NULL) (12 rows) That's clearly incorrect, because the final sort key should be count(*) and not xc_groupby_def.a (which is, in fact the group key). For some reason this did not cause issues on XL 9.5, but apparently the upper-planner pathification changed the code in a way that affected the top-level remote subquery. To fix this, simply use sort_pathkeys instead of query_pathkeys. That fixes the plans, and also identifies a number of additional plans in regression tests that were in fact incorrect (but no one noticed). Several plans stopped producing results with stable ordering, so fix that by adding an explicit ORDER BY clause.
2017-07-31Copy distribution information correctly to ProjectSet pathPavan Deolasee
ProjectSet is a new path type in PG 10 and we'd missed to copy the distribution information correctly to the path. This was resulting in failures in many regression test cases. Lack of distribution information, prevented the distributed query planner from adding a Remote Subplan node on top of the plan, thus resulting in local execution of the plan. Since the underlying table is actually a distributed table, local execution fails to fetch any data. Fix this by properly copying distribution info. Several regression failures are fixed automatically with this patch.
2017-07-31Refactor the construction of distributed grouping pathsTomas Vondra
The code generating distributed grouping paths was originally structured like this: if (try_distributed_aggregation) { ... } if (can_sort && try_distributed_aggregation) { ... } if (can_hash && try_distributed_aggregation) { ... } It's refactored like this, to resemble the upstream part of the code: if (try_distributed_aggregation) { ... if (can_sort) { ... } if (can_hash) { ... } }
2017-07-24When WCOs are present, disable direct foreign table modification.Robert Haas
If the user modifies a view that has CHECK OPTIONs and this gets translated into a modification to an underlying relation which happens to be a foreign table, the check options should be enforced. In the normal code path, that was happening properly, but it was not working properly for "direct" modification because the whole operation gets pushed to the remote side in that case and we never have an option to enforce the constraint against individual tuples. Fix by disabling direct modification when there is a need to enforce CHECK OPTIONs. Etsuro Fujita, reviewed by Kyotaro Horiguchi and by me. Discussion: http://postgr.es/m/f8a48f54-6f02-9c8a-5250-9791603171ee@lab.ntt.co.jp
2017-07-24Be more consistent about errors for opfamily member lookup failures.Tom Lane
Add error checks in some places that were calling get_opfamily_member or get_opfamily_proc and just assuming that the call could never fail. Also, standardize the wording for such errors in some other places. None of these errors are expected in normal use, hence they're just elog not ereport. But they may be handy for diagnosing omissions in custom opclasses. Rushabh Lathia found the oversight in RelationBuildPartitionKey(); I found the others by grepping for all callers of these functions. Discussion: https://postgr.es/m/CAGPqQf2R9Nk8htpv0FFi+FP776EwMyGuORpc9zYkZKC8sFQE3g@mail.gmail.com
2017-07-15Ensure grouping sets get properly distributed dataTomas Vondra
Grouping sets are stricter about distribution of input data, as all the execution happens on the coordinator - there is no support for partial grouping sets yet, so we can either push all the grouping set work to the remote node (if all the sets include the distribution key), or make sure that there is a Remote Subquery on the input path. This is what Postgres-XL 9.6 was doing, but it got lost during merge with PostgreSQL 10 which significantly reworked this part of the code. Two queries still produce incorrect result, but those are not actually using the grouping sets paths because GROUP BY GROUPING SETS (a, b) gets transformed into simple GROUP BY a, b and ends up using parallel aggregation. The bug seems to be that the sort orders mismatch for some reason - the remote part produces data sorted by "a" but the "Finalize GroupAggregate" expects input sorted by "a, b" leading to duplicate groups in the result.
2017-07-14Code review for NextValueExpr expression node type.Tom Lane
Add missing infrastructure for this node type, notably in ruleutils.c where its lack could demonstrably cause EXPLAIN to fail. Add outfuncs/readfuncs support. (outfuncs support is useful today for debugging purposes. The readfuncs support may never be needed, since at present it would only matter for parallel query and NextValueExpr should never appear in a parallelizable query; but it seems like a bad idea to have a primnode type that isn't fully supported here.) Teach planner infrastructure that NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node with cost cpu_operator_cost. Given its limited scope of usage, there *might* be no live bug today from the lack of that knowledge, but it's certainly going to bite us on the rear someday. Teach pg_stat_statements about the new node type, too. While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction, XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost. Failing to do this for SQLValueFunction was an oversight in my commit 0bb51aa96. The others are longer-standing oversights, but no time like the present to fix them. (In principle, CoerceToDomain could have cost much higher than this, but it doesn't presently seem worth trying to examine the domain's constraints here.) Modify execExprInterp.c to execute NextValueExpr as an out-of-line function; it seems quite unlikely to me that it's worth insisting that it be inlined in all expression eval methods. Besides, providing the out-of-line function doesn't stop anyone from inlining if they want to. Adjust some places where NextValueExpr support had been inserted with the aid of a dartboard rather than keeping it in the same order as elsewhere. Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
2017-07-11Ensure all partitions of a partitioned table has the same distribution.Pavan Deolasee
To optimise and simplify XL's distributed query planning, we enforce that all partitions of a partitioned table use the same distribution strategy. We also put further restrictions that all columns in the partitions and the partitioned table has matching positions. This can cause some problems when tables have dropped columns etc, but we think it's far better to optimise XL's plans than supporting all corner cases. We can look at removing some of these restrictions later once the more usual queries run faster. These restrictions allow us to unconditionally push down Append and MergeAppend nodes to datanodes when these nodes are processing partitioned tables. Some regression tests currently fail because of these added restrictions. We would look at them in due course of time.
2017-07-09Properly redistribute results of Gather Merge nodesTomas Vondra
The optimizer was not generating correct distributed paths with Gather Merge nodes, because those nodes always looked as if the data was not distributed at all. There were two bugs causing this: 1) Gather Merge did not copy distribution from the subpath, leaving it NULL (as if running on coordinator), so no Remote Subquery needed. 2) create_grouping_paths() did not check if a Remote Subquery is needed on top of Gather Merge anyway. After fixing these two issues, we're now generating correct plans (at least judging by select_parallel regression suite).
2017-06-28Merge remote-tracking branch 'remotes/origin/master' into xl10develPavan Deolasee
This merges the current master branch of XL with the XL 10 development branch. Commits upto f72330316ea5796a2b11a05710b98eba4e706788 are included in this merge.
2017-06-27Merge PG10 master branch into xl10develPavan Deolasee
This commit merges PG10 branch upto commit 2710ccd782d0308a3fa1ab193531183148e9b626. Regression tests show no noteworthy additional failures. This merge includes major pgindent work done with the newer version of pgindent
2017-06-22Document partitioned_rels in create_modifytable_path header comment.Robert Haas
Etsuro Fujita, slightly adjusted by me. Discussion: http://postgr.es/m/e87c4a6d-23d7-5e7c-e8db-44ed418eb5d1@lab.ntt.co.jp
2017-06-22Update comment to account for table partitioning.Robert Haas
Ashutosh Bapat and Amit Langote Discussion: http://postgr.es/m/CAFjFpRcG_NaAv6cDHD-9VfGdvB8maAtSfB=fTQr5+kxP2_sXzg@mail.gmail.com
2017-06-21Phase 3 of pgindent updates.Tom Lane
Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21Phase 2 of pgindent updates.Tom Lane
Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21Initial pgindent run with pg_bsd_indent version 2.0.Tom Lane
The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname *varname", as well as some similar cases for const- or volatile-qualified pointers. * Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21Final pgindent run with old pg_bsd_indent (version 1.3).Tom Lane
This is just to have a clean basis for comparison with the results of the new version (which will indeed end up reverting some of these changes...) Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-19Avoid regressions in foreign-key-based selectivity estimates.Tom Lane
David Rowley found that the "use the smallest per-column selectivity" heuristic applied in some cases by get_foreign_key_join_selectivity() was badly off if the FK columns are independent, producing estimates much worse than we got before that code was added in 9.6. One case where that heuristic was used was for LEFT and FULL outer joins with the referenced rel on the outside of the join. But we should not really need to special-case those here. eqjoinsel() never has had such a special case; the correction is applied by calc_joinrel_size_estimate() instead. Let's just estimate such cases like inner joins and rely on that later adjustment. (I think there was something of a thinko here, in that the comments seem to be thinking about the selectivity as defined for semi/anti joins; but that shouldn't apply to left/full joins.) Add a regression test exercising such a case to show that this is sane in at least some cases. The other case where we used that heuristic was for SEMI/ANTI outer joins, either if the referenced rel was on the outside, or if it was on the inside but was part of a join within the RHS. In either case, the FK doesn't give us a lot of traction towards estimating the selectivity. To ensure that we don't have regressions from what happened before 9.6, let's punt by ignoring the FK in such cases and applying the traditional selectivity calculation. (We might be able to improve on that later, but for now I just want to be sure it's not worse than 9.5.) Report and patch by David Rowley, simplified a bit by me. Back-patch to 9.6 where this code was added. Discussion: https://postgr.es/m/CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com
2017-06-16Handle NextValueExpr node correctly at various places.Pavan Deolasee
read/out functions for this node type were missing. So implement those functions. In addition, the FQS code path was not recongnizing this new node type correctly. Fix that too. The ruleutils also missed ability to deparse this expression. For now we just emit a DEFAULT clause while deparsing NextValueExpr and assume that the remote node will do the necessary lookups to find the correct sequence and invoke nextval() on the sequence.
2017-06-15Merge 'remotes/PGSQL/master' into xl10develPavan Deolasee
Merge upstream master branch upto e800656d9a9b40b2f55afabe76354ab6d93353b3. Code compiles and regression works ok (with lots and lots of failures though).
2017-06-15Check for SQLValueFunction node in shippability walkerPavan Deolasee
2017-06-14Teach predtest.c about CHECK clauses to fix partitioning bugs.Robert Haas
In a CHECK clause, a null result means true, whereas in a WHERE clause it means false. predtest.c provided different functions depending on which set of semantics applied to the predicate being proved, but had no option to control what a null meant in the clauses provided as axioms. Add one. Use that in the partitioning code when figuring out whether the validation scan on a new partition can be skipped. Rip out the old logic that attempted (not very successfully) to compensate for the absence of the necessary support in predtest.c. Ashutosh Bapat and Robert Haas, reviewed by Amit Langote and incorporating feedback from Tom Lane. Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-06-14Merge from PG master upto d5cb3bab564e0927ffac7c8729eacf181a12dd40Pavan Deolasee
This is the result of the "git merge remotes/PGSQL/master" upto the said commit point. We have done some basic analysis, fixed compilation problems etc, but bulk of the logical problems in conflict resolution etc will be handled by subsequent commits.
2017-06-10Restrict subplan nodes even for equal distributionsTomas Vondra
Many regression tests were failing because the expected plan contains Remote Subquery Scan on all (datanode_1) but we were producing Remote Subquery Scan on any (datanode_1,datanode_2) Both those plans are in fact valid (at least on replicated tables). The difference is that in the first case the restriction was computed in adjust_subplan_distribution() while in the second case this did not happen (and instead will happen at execution time). The restriction is not applied because adjust_subplan_distribution() contains this condition if (subd && !equal(subd, pathd)) { ... restrict execution nodes ... } In Postgres-XL 9.6 the two distributions happen to be equal in some cases where where that was not the case in Postgres-XL 9.5. It's not entirely clear why this happens (it seems to be another consequence of the upper-planner pathification), but the consequence is that the restriction code is skipped. Removing the equal() call from the condition fixes all the regression failures caused by plans switching between any/all restrictions. In fact, this fixes all remaining regressions failures in five regression suites: create_view, subselect, aggregates, rangefuncs, xc_having. In the future, we will probably pathify adjust_subplan_distribution(), i.e. we'll probably get rid of it entirely and compute the restriction either when constructing the path or possibly defer it until execution time. Before the upper-planner pathification this was not possible.
2017-06-10Disable aggregate paths with extra COMBINE phaseTomas Vondra
The planner was generating aggregate paths with an additional COMBINE step pushed to the remote side, like this: QUERY PLAN --------------------------------------------------------- Finalize Aggregate -> Remote Subquery Scan on all (datanode1,datanode2) -> Partial Aggregate (Combine) -> Gather -> Partial Aggregate -> Parallel Seq Scan on public.t This was done with the goal to reduce the amount of data transmitted over network, and the amount of work to be done on a coordinator. Unfortunately, the upstream code seems not quite ready for such plans, leading to failures like this ERROR: variable not found in subplan target list for large amounts of data and high max_parallel_workers_per_gather. Those plans would still be quite beneficial, improving the scalability of Postgres-XL clusters in analytics. But we can reintroduce them once the targetlist issue gets fixed.
2017-06-08Add remote subquery step to recurse_set_operationsTomas Vondra
During the initial phase of resolving 9.6 merge conflicts in the planner we have switched back to a clean upstream code for some files (including prepunion.c). Then we reintroduced the XL-specific bits with necessary tweaks, caused particularly by upper-planner pathification. We have however forgot about this bit in recurse_set_operations, so this commit fixes that by adding the redistribution again. This fixes failures in collate, copyselect and union regression suites. Patch by senhu <senhu@tencent.com>, review and commit by me.
2017-06-04#ifdef out assorted unused GEQO code.Tom Lane
I'd always assumed that backend/optimizer/geqo/'s remarkably poor showing on code coverage metrics was because we weren't exercising it much in the regression tests. But it turns out that a good chunk of the problem is that there's a bunch of code that is physically unreachable (because the calls to it are #ifdef'd out in geqo_main.c) but is being built anyway. Making the called code have #if guards similar to the calling code saves a couple of kilobytes of executable size and should make the coverage numbers more reflective of reality. It's arguable that we should just delete all the unused recombination mechanisms altogether, but I didn't feel a need to go that far today.
2017-06-03Fix old corner-case logic error in final_cost_nestloop().Tom Lane
When costing a nestloop with stop-at-first-inner-match semantics, and a non-indexscan inner path, final_cost_nestloop() wants to charge the full scan cost of the inner rel at least once, with additional scans charged at inner_rescan_run_cost which might be less. However the logic for doing this effectively assumed that outer_matched_rows is at least 1. If it's zero, which is not unlikely for a small outer rel, we ended up charging inner_run_cost plus N times inner_rescan_run_cost, as much as double the correct charge for an outer rel with only one row that we're betting won't be matched. (Unless the inner rel is materialized, in which case it has very small inner_rescan_run_cost and the cost is not so far off what it should have been.) The upshot of this was that the planner had a tendency to select plans that failed to make effective use of the stop-at-first-inner-match semantics, and that might have Materialize nodes in them even when the predicted number of executions of the Materialize subplan was only 1. This was not so obvious before commit 9c7f5229a, because the case only arose in connection with semi/anti joins where there's not freedom to reverse the join order. But with the addition of unique-inner joins, it could result in some fairly bad planning choices, as reported by Teodor Sigaev. Indeed, some of the test cases added by that commit have plans that look dubious on closer inspection, and are changed by this patch. Fix the logic to ensure that we don't charge for too many inner scans. I chose to adjust it so that the full-freight scan cost is associated with an unmatched outer row if possible, not a matched one, since that seems like a better model of what would happen at runtime. This is a longstanding bug, but given the lesser impact in back branches, and the lack of field complaints, I won't risk a back-patch. Discussion: https://postgr.es/m/CAKJS1f-LzkUsFxdJ_-Luy38orQ+AdEXM5o+vANR+-pHAWPSecg@mail.gmail.com
2017-05-19Copy partitioned_rels lists to avoid shared substructure.Robert Haas
Otherwise, set_plan_refs() can get applied to the same list multiple times through different references, leading to chaos. Amit Langote, Dilip Kumar, and Robert Haas, reviewed by Ashutosh Bapat. Original report by Sveinn Sveinsson. Discussion: http://postgr.es/m/20170517141151.1435.79890@wrigleys.postgresql.org
2017-05-17Post-PG 10 beta1 pgindent runBruce Momjian
perltidy run not included.
2017-05-14Remove no-longer-needed fields of Hash plan nodes.Tom Lane
skewColType/skewColTypmod are no longer used in the wake of commit 9aab83fc5, and seem unlikely to be wanted in future, so let's drop 'em. Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
2017-05-14Standardize terminology for pg_statistic_ext entries.Tom Lane
Consistently refer to such an entry as a "statistics object", not just "statistics" or "extended statistics". Previously we had a mismash of terms, accompanied by utter confusion as to whether the term was singular or plural. That's not only grating (at least to the ear of a native English speaker) but could be outright misleading, eg in error messages that seemed to be referring to multiple objects where only one could be meant. This commit fixes the code and a lot of comments (though I may have missed a few). I also renamed two new SQL functions, pg_get_statisticsextdef -> pg_get_statisticsobjdef pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible to conform better with this terminology. I have not touched the SGML docs other than fixing those function names; the docs certainly need work but it seems like a separable task. Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-05remove pgxcpath.c and RemoteQueryPath, which are both unusedTomas Vondra
The pgxcpath.c file was not even built, so it's obviously dead code. As it was the only place referencing RemoteQueryPath, remove that structure too.
2017-05-05remove optimizer/pgxcplan.h, mostly a duplicate of pgxc/planner.hTomas Vondra
The header defined pretty much the same structures as pgxc/planner.h, but often with subtle differences (missing fields etc.). Furthermore, most of the the function prototypes were not actually implemented. Turns out, the header was only included in pgxcship.c, which only used a single function from the header. So just remove the file and make the function static within pgxcship.c.
2017-05-01Reduce semijoins with unique inner relations to plain inner joins.Tom Lane
If the inner relation can be proven unique, that is it can have no more than one matching row for any row of the outer query, then we might as well implement the semijoin as a plain inner join, allowing substantially more freedom to the planner. This is a form of outer join strength reduction, but it can't be implemented in reduce_outer_joins() because we don't have enough info about the individual relations at that stage. Instead do it much like remove_useless_joins(): once we've built base relations, we can make another pass over the SpecialJoinInfo list and get rid of any entries representing reducible semijoins. This is essentially a followon to the inner-unique patch (commit 9c7f5229a) and makes use of the proof machinery that that patch created. We need only minor refactoring of innerrel_is_unique's API to support this usage. Per performance complaint from Teodor Sigaev. Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
2017-05-01Fix mis-optimization of semijoins with more than one LHS relation.Tom Lane
The inner-unique patch (commit 9c7f5229a) supposed that if we're considering a JOIN_UNIQUE_INNER join path, we can always set inner_unique for the join, because the inner path produced by create_unique_path should be unique relative to the outer relation. However, that's true only if we're considering joining to the whole outer relation --- otherwise we may be applying only some of the join quals, and so the inner path might be non-unique from the perspective of this join. Adjust the test to only believe that we can set inner_unique if we have the whole semijoin LHS on the outer side. There is more that can be done in this area, but this commit is only intended to provide the minimal fix needed to get correct plans. Per report from Teodor Sigaev. Thanks to David Rowley for preliminary investigation. Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
2017-05-01Fire per-statement triggers on partitioned tables.Robert Haas
Even though no actual tuples are ever inserted into a partitioned table (the actual tuples are in the partitions, not the partitioned table itself), we still need to have a ResultRelInfo for the partitioned table, or per-statement triggers won't get fired. Amit Langote, per a report from Rajkumar Raghuwanshi. Reviewed by me. Discussion: http://postgr.es/m/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
2017-04-20Propagate the distribution up when (root->distribution==NULL).Tomas Vondra
Without this patch the planner was constricting paths like this: -> Material -> SubquerySubPath -> RemoteSubPath -> BitmapHeapScan which then led to failures to plan some of the UPDATE and DELETE queries in updatable_views. This fixes it, and makes the planner build the same paths as on XL 9.5 (without the RemoteSubPath, discarding the distribution info). This resolved all the 'could not plan distributed UPDATE/DELETE' in updatable_views, but not universally. So either this still is not entirely correct, or there's another independent bug.
2017-04-18Fix testing of parallel-safety of SubPlans.Tom Lane
is_parallel_safe() supposed that the only relevant property of a SubPlan was the parallel safety of the referenced subplan tree. This is wrong: the testexpr or args subtrees might contain parallel-unsafe stuff, as demonstrated by the test case added here. However, just recursing into the subtrees fails in a different way: we'll typically find PARAM_EXEC Params representing the subplan's output columns in the testexpr. The previous coding supposed that any Param must be treated as parallel-restricted, so that a naive attempt at fixing this disabled parallel pushdown of SubPlans altogether. We must instead determine, for any visited Param, whether it is one that would be computed by a surrounding SubPlan node; if so, it's safe to push down along with the SubPlan node. We might later be able to extend this logic to cope with Params used for correlated subplans and other cases; but that's a task for v11 or beyond. Tom Lane and Amit Kapila Discussion: https://postgr.es/m/7064.1492022469@sss.pgh.pa.us
2017-04-17Rename columns in new pg_statistic_ext catalogAlvaro Herrera
The new catalog reused a column prefix "sta" from pg_statistic, but this is undesirable, so change the catalog to use prefix "stx" instead. Also, rename the column that lists enabled statistic kinds as "stxkind" rather than "enabled". Discussion: https://postgr.es/m/CAKJS1f_2t5jhSN7huYRFH3w3rrHfG2QU7hiUHsu-Vdjd1rYT3w@mail.gmail.com
2017-04-17Always build a custom plan node's targetlist from the path's pathtarget.Tom Lane
We were applying the use_physical_tlist optimization to all relation scan plans, even those implemented by custom scan providers. However, that's a bad idea for a couple of reasons. The custom provider might be unable to provide columns that it hadn't expected to be asked for (for example, the custom scan might depend on an index-only scan). Even more to the point, there's no good reason to suppose that this "optimization" is a win for a custom scan; whatever the custom provider is doing is likely not based on simply returning physical heap tuples. (As a counterexample, if the custom scan is an interface to a column store, demanding all columns would be a huge loss.) If it is a win, the custom provider could make that decision for itself and insert a suitable pathtarget into the path, anyway. Per discussion with Dmitry Ivanov. Back-patch to 9.5 where custom scan support was introduced. The argument that the custom provider can adjust the behavior by changing the pathtarget only applies to 9.6+, but on balance it seems more likely that use_physical_tlist will hurt custom scans than help them. Discussion: https://postgr.es/m/e29ddd30-8ef9-4da5-a50b-2bb7b8c7198d@postgrespro.ru
2017-04-12Mark finished Plan nodes with parallel_safe flags.Tom Lane
We'd managed to avoid doing this so far, but it seems pretty obvious that it would be forced on us some day, and this is much the cleanest way of approaching the open problem that parallel-unsafe subplans are being transmitted to parallel workers. Anyway there's no space cost due to alignment considerations, and the time cost is pretty minimal since we're just copying the flag from the corresponding Path node. (At least in most cases ... some of the klugier spots in createplan.c have to work a bit harder.) In principle we could perhaps get rid of SubPlan.parallel_safe, but I thought it better to keep that in case there are reasons to consider a SubPlan unsafe even when its child plan is parallel-safe. This patch doesn't actually do anything with the new flags, but I thought I'd commit it separately anyway. Note: although this touches outfuncs/readfuncs, there's no need for a catversion bump because Plan trees aren't stored on disk. Discussion: https://postgr.es/m/87tw5x4vcu.fsf@credativ.de
2017-04-10Improve castNode notation by introducing list-extraction-specific variants.Tom Lane
This extends the castNode() notation introduced by commit 5bcab1114 to provide, in one step, extraction of a list cell's pointer and coercion to a concrete node type. For example, "lfirst_node(Foo, lc)" is the same as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode that have appeared so far include a list extraction call, so this is pretty widely useful, and it saves a few more keystrokes compared to the old way. As with the previous patch, back-patch the addition of these macros to pg_list.h, so that the notation will be available when back-patching. Patch by me, after an idea of Andrew Gierth's. Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us
2017-04-08Clean up bugs in clause_selectivity() cleanup.Tom Lane
Commit ac2b09508 was not terribly carefully reviewed. Band-aid it to not fail on non-RestrictInfo input, per report from Andreas Seltenreich. Also make it do something more reasonable with variable-free clauses, and improve nearby comments. Discussion: https://postgr.es/m/87inmf5rdx.fsf@credativ.de