Fix cost_rescan() to account for multi-batch hashing correctly.
author     Tom Lane <tgl@sss.pgh.pa.us>
           Wed, 27 Jul 2016 21:44:34 +0000 (17:44 -0400)
committer  Tom Lane <tgl@sss.pgh.pa.us>
           Wed, 27 Jul 2016 21:45:05 +0000 (17:45 -0400)
cost_rescan assumed that we don't need to rebuild the hash table when
rescanning a hash join.  However, that's currently only true for
single-batch joins; for a multi-batch join we must charge full freight.
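The rule the patch introduces can be sketched in isolation. This is a minimal, simplified sketch, not the planner's code: `SketchHashPath` and `sketch_hash_rescan_cost` are hypothetical names, and the struct keeps only the three fields the rule reads. It assumes, as the old comment did, that a hash join's startup cost is essentially the cost of building the hash table.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the planner's HashPath.
 * Only the fields needed by the rescan rule; not the real struct. */
typedef struct SketchHashPath
{
    double startup_cost;  /* assumed to be dominated by hash table building */
    double total_cost;    /* startup_cost plus the cost of probing */
    int    num_batches;   /* 1 means the whole inner side fits in memory */
} SketchHashPath;

/* Hypothetical helper mirroring the new T_HashJoin case in cost_rescan().
 * A single-batch hash table can be kept and reused, so a rescan skips the
 * startup cost; a multi-batch join must rebuild, so it pays full price. */
static void
sketch_hash_rescan_cost(const SketchHashPath *path,
                        double *rescan_startup_cost,
                        double *rescan_total_cost)
{
    assert(path->num_batches >= 1);

    if (path->num_batches == 1)
    {
        /* Reuse the existing hash table: only the probe work remains */
        *rescan_startup_cost = 0;
        *rescan_total_cost = path->total_cost - path->startup_cost;
    }
    else
    {
        /* Rebuild required: a rescan costs as much as the first scan */
        *rescan_startup_cost = path->startup_cost;
        *rescan_total_cost = path->total_cost;
    }
}
```

For a path with startup cost 100 and total cost 150, a single-batch join would be charged 50 per rescan, while the same path split into four batches would be charged the full 150.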

This probably has escaped notice because we'd be unlikely to put a hash
join on the inside of a nestloop anyway.  Nonetheless, it's wrong.
Fix in HEAD, but don't backpatch for fear of destabilizing plans in
stable releases.
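The rescan estimate only matters when the hash join is rescanned repeatedly, which is the case on the inner side of a nestloop. To show the size of the effect, here is a deliberately simplified nestloop cost. It is an illustrative assumption, not the planner's formula: the inner side is charged one full scan and then one rescan per remaining outer row, and the per-tuple CPU costs for join quals and output rows that the real planner also charges are ignored. `sketch_nestloop_total_cost` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical, simplified nestloop cost: one full scan of the inner side,
 * then one rescan for each remaining outer row. Ignores join-qual and
 * output-tuple CPU costs that the real planner also charges. */
static double
sketch_nestloop_total_cost(double outer_total_cost, double outer_rows,
                           double inner_total_cost,
                           double inner_rescan_total_cost)
{
    assert(outer_rows >= 1);
    return outer_total_cost + inner_total_cost
        + (outer_rows - 1) * inner_rescan_total_cost;
}
```

With 1000 outer rows costing 10 and an inner hash join costing 150 (startup 100), charging the old single-batch rescan cost of 50 gives 10 + 150 + 999 * 50 = 50110. Charging the full 150 per rescan, as a multi-batch join now is, gives 150010, roughly three times as much. The old rule would therefore have made a multi-batch hash join look much cheaper on the inside of a nestloop than it really is.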

src/backend/optimizer/path/costsize.c

index 1c20edcdfeb2a348dd8cce66f19c0320d8bc9247..2a49639f1254a1e564169ff0cb4587a79b40f823 100644
@@ -3114,11 +3114,21 @@ cost_rescan(PlannerInfo *root, Path *path,
        case T_HashJoin:
 
            /*
-            * Assume that all of the startup cost represents hash table
-            * building, which we won't have to do over.
+            * If it's a single-batch join, we don't need to rebuild the hash
+            * table during a rescan.
             */
-           *rescan_startup_cost = 0;
-           *rescan_total_cost = path->total_cost - path->startup_cost;
+           if (((HashPath *) path)->num_batches == 1)
+           {
+               /* Startup cost is exactly the cost of hash table building */
+               *rescan_startup_cost = 0;
+               *rescan_total_cost = path->total_cost - path->startup_cost;
+           }
+           else
+           {
+               /* Otherwise, no special treatment */
+               *rescan_startup_cost = path->startup_cost;
+               *rescan_total_cost = path->total_cost;
+           }
            break;
        case T_CteScan:
        case T_WorkTableScan: