浅析pg_hint_plan使用

原创于 2025-05-20 14:13:30 发布 · 683 阅读

23 ·

CC 4.0 BY-SA版权

文章标签：

#数据库

文章目录

环境
文档用途
详细信息

环境

系统平台：N/A
版本：9.0,6.0,4.5

文档用途

介绍pg_hint_plan的使用。此扩展插件允许使用“提示”调整执行计划。

详细信息

1.安装
下载pg_hint_plan，支持pg10及以上版本。

注：下载时选择对应版本的branches

构建模块

tar -zxvf pg_hint_plan-REL12_1_3_9.tar.gz

cd pg_hint_plan-REL12_1_3_9

make

make install

加载扩展

pg_hint_plan 不需要创建扩展.使用 LOAD 命令加载即可激活它。当然也可以通过创建扩展来全局加载。

–load生效

LOAD 'pg_hint_plan';

–全局加载

create extension pg_hint_plan;

alter system set shared_preload_libraries = pg_hint_plan;

–全局加载，修改参数shared_preload_libraries需重启数据库生效

2.基本使用
pg_hint_plan 读取给定 SQL 语句的特殊形式的注释中的提示短语。提示短语“/+”开头，以“/”结尾，提示短语由提示名称和参数组成，这些名称和参数用括号括起来，并用空格分隔。提示短语可以使用换行符以提高可读性。

使用示例：

create table test_hint(id int);

insert into test_hint select * from generate_series(1,10);

create index idx_test_hint on test_hint (id);



--提示使用索引

/*+

    IndexScan(test_hint)

*/

explain analyze select * from test_hint where id=1;

3.提示表
提示是以一种特殊形式添加到sql中，如果无法编辑sql，这将很不方便。对于这种情况，可以将提示放在一个名为 hint_plan.hints 的特殊表中。在不编辑sql的情况下改变sql的执行计划。这也是固定sql执行计划的一种方式。
该表由以下列组成：

列															描述
id										用于标识提示行的唯一编号。此列按顺序自动填充。
norm_query_string			与要提示匹配的sql。查询中的常量必须替换为 ? 。空格在模式中很重要。
application_name			要应用提示的会话的application_name值。如psql。如果字段值为空表示匹配任何的application_name .
hint									提示短语。这必须是一系列提示，不包括周围的注释标记。

提示表示例

–设置提示表，固定执行计划：当扫描表est_hint时使用SeqScan

INSERT INTO hint_plan.hints(norm_query_string,application_name,hints)

VALUES (

    'explain analyze select * from test_hint where id=?;',

    '',

    'SeqScan(test_hint)'

);



--开启参数，启用提示表，否则无效。一开始怎么都不生效，还以为存在bug呢。。

set pg_hint_plan.enable_hint_table=on;



--执行sql，会走固定执行计划

explain analyze select * from test_hint where id=1;



postgres=# explain analyze select * from test_hint where id=1;

                                            QUERY PLAN                                            

---------------------------------------------------------------------------------------------------

 Seq Scan on test_hint  (cost=0.00..1.12 rows=1 width=4) (actual time=0.015..0.019 rows=1 loops=1)

   Filter: (id = 1)

   Rows Removed by Filter: 9

 Planning Time: 0.261 ms

 Execution Time: 0.038 ms

(5 rows)



--更改注释，走IndexScan

UPDATE hint_plan.hints SET hints = 'IndexScan(test_hint)' WHERE id = 1;

explain analyze select * from test_hint where id=1;



postgres=# explain analyze select * from test_hint where id=1;

                                                       QUERY PLAN                                                        

-------------------------------------------------------------------------------------------------------------------------

 Index Scan using idx_test_hint on test_hint  (cost=0.14..8.15 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1)

   Index Cond: (id = 1)

 Planning Time: 0.185 ms

 Execution Time: 0.041 ms

(4 rows)



--删除注释

DELETE FROM hint_plan.hints WHERE id = 1;

提示表归创建者用户所有，在创建时 CREATE EXTENSION 具有缺省权限。

4. 提示类型

提示短语根据对象的种类及其对规划的影响分为六种类型：扫描方法、连接方法、连接顺序、行数更正、并行查询和GUC参数设置。

扫描方法提示：扫描方法提示对目标表强制执行特定的扫描方法。pg_hint_plan通过别名（如果有）来识别目标表。在这种提示中，它们是“SeqScan”、“IndexScan”等。扫描提示对普通表、继承表、表、 UNLOGGED表、临时表和系统目录有效。外部（外部）表、表函数、VALUES子句、CTE、视图和子查询不受影响。

连接方法的提示：联接方法提示强制执行涉及指定表的联接的联接方法。这只会影响普通表、继承表、 UNLOGGED 表、临时表、外部（外部）表、系统目录、表函数、 VALUES 命令结果和 CTE 上的联接。但视图和子查询的联接不受影响。

/*+

    NestLoop(t1 t2)

    MergeJoin(t1 t2 t3)

    Leading(t1 t2 t3)

 */

SELECT * FROM table1 t1

    JOIN table table2 t2 ON (t1.key = t2.key)

    JOIN table table3 t3 ON (t2.key = t3.key);

连接顺序提示：“Leading”提示在两个或多个表上强制执行联接顺序。有两种强制执行方式。一种是强制执行特定的联接顺序，但不限制每个联接级别的方向。另一种方式也是强制执行联接方向。

/*+ Leading ((t1 (t2 t3))) */ SELECT …

行数修正提示：“row”提示纠正了由于规划器的限制而导致的联接行数错误估计。

/*+ Rows(a b #10) */ SELECT… ; Sets rows of join result to 10

/*+ Rows(a b +10) */ SELECT… ; Increments row number by 10

/*+ Rows(a b -10) */ SELECT… ; Subtracts 10 from the row number.

/*+ Rows(a b *10) */ SELECT… ; Makes the number 10 times larger.

并行计划的提示： “Parallel” 提示在扫描时强制执行并行执行配置。第三个参数指定强制执行的强度。“soft”意味着pg_hint_plan只改变 max_parallel_workers_per_gather ，而将所有其他事情留给计划器。“hard”更改其他计划器参数，以便强制应用该数字。这可能会影响普通表、继承父表、UNLOGGED表和系统目录。外部表、表函数、VALUES 子句、CTE、 VALUES 视图和子查询不受影响。视图的内部表可以通过其真实名称/别名指定为目标对象。以下示例显示对每个表的查询实施方式不同。

explain /*+ Parallel(c1 3 hard) Parallel(c2 5 hard) */

       SELECT c2.a FROM c1 JOIN c2 ON (c1.a = c2.a);

GUC参数临时设置：“Set”提示会在规划器中临时更改全局参数。

/*+ Set(random_page_cost 2.0) */

SELECT * FROM table1 t1 WHERE key = 'value';

5. 支持的提示
支持的提示列表

Group Format Description
Scan method SeqScan(table) Forces sequential scan on the table.

IndexScan(table[ index…]) Forces TID scan on the table.

TidScan(table) Forces index scan on the table. Restricts to specified indexes if any.

IndexOnlyScan(table[ index…]) Forces index only scan on the table. Restricts to specified indexes if any. Index scan may be used if index only scan is not available.

BitmapScan(table[ index…]) Forces bitmap scan on the table. Restricts to specified indexes if any.

IndexScanRegexp(table[ POSIX regexp…]) Forces index scan on the table. Restricts to indexes that match the specified POSIX regular expression.

IndexOnlyScanRegexp(table[ POSIX regexp…]) Forces index only scan on the table. Restricts to indexes that match the specified POSIX regular expression.

BitmapScanRegexp(table[ POSIX regexp…]) Forces bitmap scan on the table. Restricts to indexes that match the specified POSIX regular expression.

NoSeqScan(table) Forces not to do sequential scan on the table.

NoTidScan(table) Forces not to do TID scan on the table.

NoIndexScan(table) Forces not to do index scan and index only scan on the table.

NoIndexOnlyScan(table) Forces not to do index only scan on the table.

NoBitmapScan(table) Forces not to do bitmap scan on the table.
Join method NestLoop(table table[ table…]) Forces nested loop for the joins with the specified tables.

HashJoin(table table[ table…]) Forces hash join for the joins with the specified tables.

MergeJoin(table table[ table…]) Forces merge join for the joins with the specified tables.

NoNestLoop(table table[ table…]) Forces not to do nested loop for the joins with the specified tables.

NoHashJoin(table table[ table…]) Forces not to do hash join for the joins with the specified tables.

NoMergeJoin(table table[ table…]) Forces not to do merge join for the joins with the specified tables.
Join order Leading(table table[ table…]) Forces join order as specified.

Leading() Forces join order and directions as specified. A join pair is a pair of tables and/or other join pairs enclosed in parentheses, which can make a nested structure.
Row number correction Rows(table table[ table…] correction) Corrects row number of a result of the joins with the specified tables. The available correction methods are absolute (#), addition (+), subtraction (-) and multiplication (*). should be a string that strtod() can read.
Parallel query configuration Parallel(table <# of workers> [soft|hard]) Enforce or inhibit parallel execution of the specified table. <# of workers> is the desired number of parallel workers, where zero means inhibiting parallel execution. If the third parameter is soft (default), it just changes max_parallel_workers_per_gather and leaves everything else to the planner. The hard value enforces the specified number of workers.
GUC Set(GUC-parameter value) Set the GUC parameter to the value while planner is running.

6. 使用示例

--准备测试环境

create table test_t1 (id int,name text);

INSERT INTO test_t1 SELECT generate_series(1,100),md5(random()::text);



create table test_t2 (id int,name text,create_time timestamp);

INSERT INTO test_t2 SELECT generate_series(1,1000000),md5(random()::text),clock_timestamp();



analyze test_t1;

analyze test_t2;



explain (analyze,buffers) select t1.id,t2.create_time from test_t1 as t1 join test_t2 as t2 on (t1.id=t2.id);



postgres=# explain (analyze,buffers) select t1.id,t2.create_time from test_t1 as t1 join test_t2 as t2 on (t1.id=t2.id);

                                                               QUERY PLAN                                                                

-----------------------------------------------------------------------------------------------------------------------------------------

 Gather  (cost=1003.25..16088.84 rows=100 width=12) (actual time=0.748..54.416 rows=100 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   Buffers: shared hit=9545

   ->  Hash Join  (cost=3.25..15078.84 rows=42 width=12) (actual time=27.641..44.877 rows=33 loops=3)

         Hash Cond: (t2.id = t1.id)

         Buffers: shared hit=9545

         ->  Parallel Seq Scan on test_t2 t2  (cost=0.00..13512.67 rows=416667 width=12) (actual time=0.008..20.190 rows=333333 loops=3)

               Buffers: shared hit=9346

         ->  Hash  (cost=2.00..2.00 rows=100 width=4) (actual time=0.047..0.047 rows=100 loops=3)

               Buckets: 1024  Batches: 1  Memory Usage: 12kB

               Buffers: shared hit=3

               ->  Seq Scan on test_t1 t1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.013..0.025 rows=100 loops=3)

                     Buffers: shared hit=3

 Planning:

   Buffers: shared hit=13

 Planning Time: 0.372 ms

 Execution Time: 54.466 ms

(18 rows)

--默认，优化器选择表连接方式为Hash Join，对没有索引的大表来说Hash Join是最好的选择。



postgres=# /*+

postgres*#     NestLoop(t1 t2)

postgres*# */

postgres-# explain (analyze,buffers) select t1.id,t2.create_time from test_t1 as t1 join test_t2 as t2 on (t1.id=t2.id);



                                                               QUERY PLAN                                                                

-----------------------------------------------------------------------------------------------------------------------------------------

 Gather  (cost=1000.00..1368690.42 rows=100 width=12) (actual time=0.549..4165.511 rows=100 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   Buffers: shared hit=1009346

   ->  Nested Loop  (cost=0.00..1367680.42 rows=42 width=12) (actual time=2761.049..4149.089 rows=33 loops=3)

         Join Filter: (t1.id = t2.id)

         Rows Removed by Join Filter: 33333300

         Buffers: shared hit=1009346

         ->  Parallel Seq Scan on test_t2 t2  (cost=0.00..13512.67 rows=416667 width=12) (actual time=0.008..14.851 rows=333333 loops=3)

               Buffers: shared hit=9346

         ->  Seq Scan on test_t1 t1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.001..0.006 rows=100 loops=1000000)

               Buffers: shared hit=1000000

 Planning Time: 0.266 ms

 Execution Time: 4165.565 ms

(14 rows)



--使用hint提示，走NestLoop连接方式，时间会更长，优化器选择使用Hash Join连接方式，成本是最低的。



--设置并行度提示

/*+

  Parallel(test_t2 2 hard)

*/

explain (analyze,buffers) select id,name from test_t2 order by create_time;



postgres=# /*+ Parallel(test_t2 2 hard) */

postgres-# explain (analyze,buffers) select id,name from test_t2 order by create_time;

                                                            QUERY PLAN                                                            

----------------------------------------------------------------------------------------------------------------------------------

 Gather Merge  (cost=52713.34..149942.43 rows=833334 width=45) (actual time=97.624..226.105 rows=1000000 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   Buffers: shared hit=9460, temp read=7098 written=7103

   ->  Sort  (cost=51713.31..52754.98 rows=416667 width=45) (actual time=81.667..104.352 rows=333333 loops=3)

         Sort Key: create_time

         Sort Method: external merge  Disk: 30720kB

         Buffers: shared hit=9460, temp read=7098 written=7103

         Worker 0:  Sort Method: external merge  Disk: 11248kB

         Worker 1:  Sort Method: external merge  Disk: 14816kB

         ->  Parallel Seq Scan on test_t2  (cost=0.00..0.00 rows=416667 width=45) (actual time=0.009..13.614 rows=333333 loops=3)

               Buffers: shared hit=9346

 Planning Time: 0.235 ms

 Execution Time: 257.256 ms

(14 rows)



--使用并行度提示，2个并行进行排序操作。

# 相关文档