Add an explicit representation of the output targetlist to Paths.

Up to now, there's been an assumption that all Paths for a given relation compute the same output column set (targetlist). However, there are good reasons to remove that assumption. For example, an indexscan on an expression index might be able to return the value of an expensive function "for free". While we have the ability to generate such a plan today in simple cases, we don't have a way to model that it's cheaper than a plan that computes the function from scratch, nor a way to create such a plan in join cases (where the function computation would normally happen at the topmost join node). Also, we need this so that we can have Paths representing post-scan/join steps, where the targetlist may well change from one step to the next. Therefore, invent a "struct PathTarget" representing the columns we expect a plan step to emit. It's convenient to include the output tuple width and tlist evaluation cost in this struct, and there will likely be additional fields in future. While Path nodes that actually do have custom outputs will need their own PathTargets, it will still be true that most Paths for a given relation will compute the same tlist. To reduce the overhead added by this patch, keep a "default PathTarget" in RelOptInfo, and allow Paths that compute that column set to just point to their parent RelOptInfo's reltarget. (In the patch as committed, actually every Path is like that, since we do not yet have any cases of custom PathTargets.) I took this opportunity to provide some more-honest costing of PlaceHolderVar evaluation. Up to now, the assumption that "scan/join reltargetlists have cost zero" was applied not only to Vars, where it's reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval cost of a PlaceHolderVar's expression to the first plan level where it can be computed, by including it in the PathTarget cost field and adding that to the cost estimates for Paths. This isn't perfect yet but it's much better than before, and there is a way forward to improve it more. This costing change affects the join order chosen for a couple of the regression tests, changing expected row ordering.
author: Tom Lane 2016-02-19 01:01:49 +0000
committer: Tom Lane 2016-02-19 01:02:03 +0000
commit: 19a541143a09c067ec8cac77ec6a64eb5b1b662b (patch)
tree: ced2ec5f03b4af139f51d0df7101e603b57bf081 /src/include/nodes
parent: 3386f34cdcf162e895a8d998532796076057913d (diff)
1 files changed, 37 insertions, 10 deletions
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 96198aeec18..af8cb6be687 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -61,6 +61,25 @@ typedef struct AggClauseCosts
 	Size		transitionSpace;	/* space for pass-by-ref transition data */
 } AggClauseCosts;
 
+/*
+ * This struct contains what we need to know during planning about the
+ * targetlist (output columns) that a Path will compute.  Each RelOptInfo
+ * includes a default PathTarget, which its individual Paths may merely point
+ * to.  However, in some cases a Path may compute outputs different from other
+ * Paths, and in that case we make a custom PathTarget struct for it.  For
+ * example, an indexscan might return index expressions that would otherwise
+ * need to be explicitly calculated.
+ *
+ * Note that PathTarget.exprs is just a list of expressions; they do not have
+ * TargetEntry nodes on top, though those will appear in the finished Plan.
+ */
+typedef struct PathTarget
+{
+	List	   *exprs;			/* list of expressions to be computed */
+	QualCost	cost;			/* cost of evaluating the above */
+	int			width;			/* estimated avg width of result tuples */
+} PathTarget;
+
 
 /*----------
  * PlannerGlobal
@@ -334,17 +353,16 @@ typedef struct PlannerInfo
  *				if there is just one, a join relation if more than one
  *		rows - estimated number of tuples in the relation after restriction
  *			   clauses have been applied (ie, output rows of a plan for it)
- *		width - avg. number of bytes per tuple in the relation after the
- *				appropriate projections have been done (ie, output width)
  *		consider_startup - true if there is any value in keeping plain paths for
  *						   this rel on the basis of having cheap startup cost
  *		consider_param_startup - the same for parameterized paths
- *		reltargetlist - List of Var and PlaceHolderVar nodes for the values
- *						we need to output from this relation.
- *						List is in no particular order, but all rels of an
- *						appendrel set must use corresponding orders.
- *						NOTE: in an appendrel child relation, may contain
- *						arbitrary expressions pulled up from a subquery!
+ *		reltarget - Default Path output tlist for this rel; normally contains
+ *					Var and PlaceHolderVar nodes for the values we need to
+ *					output from this relation.
+ *					List is in no particular order, but all rels of an
+ *					appendrel set must use corresponding orders.
+ *					NOTE: in an appendrel child relation, may contain
+ *					arbitrary expressions pulled up from a subquery!
  *		pathlist - List of Path nodes, one for each potentially useful
  *				   method of generating the relation
  *		ppilist - ParamPathInfo nodes for parameterized Paths, if any
@@ -451,15 +469,16 @@ typedef struct RelOptInfo
 
 	/* size estimates generated by planner */
 	double		rows;			/* estimated number of result tuples */
-	int			width;			/* estimated avg width of result tuples */
 
 	/* per-relation planner control flags */
 	bool		consider_startup;		/* keep cheap-startup-cost paths? */
 	bool		consider_param_startup; /* ditto, for parameterized paths? */
 	bool		consider_parallel;		/* consider parallel paths? */
 
+	/* default result targetlist for Paths scanning this relation */
+	PathTarget	reltarget;		/* list of Vars/Exprs, cost, width */
+
 	/* materialization information */
-	List	   *reltargetlist;	/* Vars to be output by scan of relation */
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
 	List	   *partial_pathlist;	/* partial Paths */
@@ -744,6 +763,11 @@ typedef struct ParamPathInfo
  * the same Path type for multiple Plan types when there is no need to
  * distinguish the Plan type during path processing.
  *
+ * "parent" identifies the relation this Path scans, and "pathtarget"
+ * describes the precise set of output columns the Path would compute.
+ * In simple cases all Paths for a given rel share the same targetlist,
+ * which we represent by having path->pathtarget point to parent->reltarget.
+ *
  * "param_info", if not NULL, links to a ParamPathInfo that identifies outer
  * relation(s) that provide parameter values to each scan of this path.
  * That means this path can only be joined to those rels by means of nestloop
@@ -765,7 +789,10 @@ typedef struct Path
 	NodeTag		pathtype;		/* tag identifying scan/join method */
 
 	RelOptInfo *parent;			/* the relation this path can build */
+	PathTarget *pathtarget;		/* list of Vars/Exprs, cost, width */
+
 	ParamPathInfo *param_info;	/* parameterization info, or NULL if none */
+
 	bool		parallel_aware; /* engage parallel-aware logic? */
 	bool		parallel_safe;	/* OK to use as part of parallel plan? */
 	int			parallel_degree; /* desired parallel degree; 0 = not parallel */
author	Tom Lane	2016-02-19 01:01:49 +0000
committer	Tom Lane	2016-02-19 01:02:03 +0000
commit	19a541143a09c067ec8cac77ec6a64eb5b1b662b (patch)
tree	ced2ec5f03b4af139f51d0df7101e603b57bf081 /src/include/nodes
parent	3386f34cdcf162e895a8d998532796076057913d (diff)