Adjust cost model for HashAgg that spills to disk.
authorJeff Davis <jdavis@postgresql.org>
Mon, 7 Sep 2020 20:31:59 +0000 (13:31 -0700)
committerJeff Davis <jdavis@postgresql.org>
Mon, 7 Sep 2020 20:31:59 +0000 (13:31 -0700)
Tomas Vondra observed that the IO behavior for HashAgg tends to be
worse than for Sort. Penalize HashAgg IO costs accordingly.

Also, account for the CPU effort of spilling the tuples and reading
them back.

Discussion: https://postgr.es/m/20200906212112.nzoy5ytrzjjodpfh@development
Reviewed-by: Tomas Vondra
Backpatch-through: 13

src/backend/optimizer/path/costsize.c

index fda4b2c6e875f513c8bf78c87d878e184664bc95..cd3716d494f033f2d2d1a753a291fc4606715120 100644 (file)
@@ -2416,6 +2416,7 @@ cost_agg(Path *path, PlannerInfo *root,
                double          pages;
                double          pages_written = 0.0;
                double          pages_read = 0.0;
+               double          spill_cost;
                double          hashentrysize;
                double          nbatches;
                Size            mem_limit;
@@ -2453,9 +2454,21 @@ cost_agg(Path *path, PlannerInfo *root,
                pages = relation_byte_size(input_tuples, input_width) / BLCKSZ;
                pages_written = pages_read = pages * depth;
 
+               /*
+                * HashAgg has somewhat worse IO behavior than Sort on typical
+                * hardware/OS combinations. Account for this with a generic penalty.
+                */
+               pages_read *= 2.0;
+               pages_written *= 2.0;
+
                startup_cost += pages_written * random_page_cost;
                total_cost += pages_written * random_page_cost;
                total_cost += pages_read * seq_page_cost;
+
+               /* account for CPU cost of spilling a tuple and reading it back */
+               spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost;
+               startup_cost += spill_cost;
+               total_cost += spill_cost;
        }
 
        /*