當前位置：首頁 > 运维知识 > 数据库 >内容正文

数据库

pg_hint_plan 使用hint固定SQL执行计划

發布時間：2024/1/8 数据库 26 豆豆

生活随笔收集整理的這篇文章主要介紹了 pg_hint_plan 使用hint固定SQL执行计划小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

Plan Hint是PG社區官方版”永遠”不考慮引入的功能之一（參見PG TODO，查找”Oracle-style”），社區開發者的理念是，引入Hint功能，會掩蓋優化器本身的問題，導致缺陷不被暴露出來。但對于使用者來講，遇到某些SQL的查詢計劃不好，性能出了問題，其他方法又不奏效的情況下，首先的目標還是想盡快解決問題，而Hint就可以在這種時候幫助到我們。

民間的?pg_hint_plan插件支持類似oracle hint的功能，另外阿里云RDS for PG、PPAS也已經支持了Hint功能。

一、插件安裝及設置

test=> create extension pg_hint_plan; CREATE EXTENSION

修改所有用戶的session_preload_libraries

由于pg_hint_plan是用到了hook的，使用前一定要調用so中的_PG_init(void)初始化一下hook，因此用到了 alter role all set session_preload_libraries='pg_hint_plan';

postgres=> alter role all set session_preload_libraries = 'pg_hint_plan'; ALTER ROLE

只為本用戶設置（會話需要重新連接，以加載hook）

alter role 自己 set session_preload_libraries='pg_hint_plan';

如果只想在當前會話有效，可以用LOAD命令啟用：

postgres=# LOAD 'pg_hint_plan'; LOAD

二、?pg_hint_plan用法

整個query的第一個comment (/*+ */) 被pg_hint_plan用來固定執行計劃。如果有多個comment，也只認第一個，請務必注意。

1. 簡單案例

其中?/*+ seqscan(test) */ 和?/*+ bitmapscan(test) */ 就是指定的hint。

test=> create table test(id int primary key, info text); CREATE TABLE test=> insert into test select generate_series(1,100000); INSERT 0 100000test=> explain select * from test where id=1;QUERY PLAN -----------------------------------------------------------------------Index Scan using test_pkey on test (cost=0.29..8.31 rows=1 width=36)Index Cond: (id = 1) (2 rows)test=> /*+ seqscan(test) */ explain select * from test where id=1;QUERY PLAN ----------------------------------------------------------Seq Scan on test (cost=0.00..1124.11 rows=272 width=36)Filter: (id = 1) (2 rows)test=> /*+ bitmapscan(test) */ explain select * from test where id=1;QUERY PLAN ------------------------------------------------------------------------Bitmap Heap Scan on test (cost=4.30..8.31 rows=1 width=36)Recheck Cond: (id = 1)-> Bitmap Index Scan on test_pkey (cost=0.00..4.30 rows=1 width=0)Index Cond: (id = 1) (4 rows)

2. 較復雜案例

來看個復雜點的例子

需要固定為以下執行計劃

Limit (cost=1204.30..1204.33 rows=10 width=276) (actual time=11.131..11.132 rows=10 loops=1)-> Sort (cost=1204.30..1204.49 rows=75 width=276) (actual time=11.131..11.131 rows=10 loops=1)Sort Key: d.test02_timeSort Method: quicksort Memory: 33kB-> Nested Loop (cost=1.28..1202.68 rows=75 width=276) (actual time=0.085..11.095 rows=17 loops=1)-> Nested Loop (cost=0.85..1117.07 rows=62 width=101) (actual time=0.066..10.879 rows=13 loops=1)-> Index Scan using "abc" on test03 ir (cost=0.42..14.89 rows=10 width=38) (actual time=0.026..0.166 rows=96 loops=1)Index Cond: ((u_uuid)::text = 'ttttttttt'::text)-> Index Scan using "bcd" on test02 d (cost=0.43..109.86 rows=36 width=80) (actual time=0.109..0.111 rows=0 loops=96)Index Cond: ((test03_uuid)::text = (ir.uuid)::text)Filter: ((NOT is_sub) AND (NOT is_filter) AND ((test02_status)::text = ANY ('{test02ed,checked}'::text[])) AND ((status)::text = 'normal'::text))Rows Removed by Filter: 67-> Index Scan using "def" on test01 rd (cost=0.43..1.32 rows=6 width=192) (actual time=0.014..0.014 rows=1 loops=13)Index Cond: ((test02_uuid)::text = (d.uuid)::text)

分析以上執行計劃，需要固定的包括索引，嵌套循環，以及JOIN的順序，驅動順序。

SQL語句改成

/*+ NestLoop(ir d) NestLoop(ir d rd) Leading(((ir d) rd)) IndexScan(rd "def") IndexScan(d "bcd") IndexScan(ir "abc") */ SELECT xxxxFROM"test01" AS rdINNER JOIN "test02" AS d ON (rd.test02_uuid = d.uuid)INNER JOIN "test03" AS ir ON (d.test03_uuid = ir.uuid)WHEREd.status = 'normal'AND ir.u_uuid = 'ttttttttt' and (d.test02_status in ('test02ed','checked')) and d.is_sub = false and d.is_filter = false ORDER BY d.test02_time desc limit 10 offset 0;

3.?pg_hint_plan語法解釋

/*+ NestLoop(ir d) # 表示ir d兩個表使用嵌套循環JOIN，如果用了別名，請使用別名。 NestLoop(ir d rd) # 表示ir與d join完后再與rd JOIN。所以如果要固定JOIN順序，可以分多個JOIN hint來寫，就像上面這樣。 Leading(((ir d) rd)) # 表示JOIN順序和驅動順序，每一對JOIN對象都需要用括號表示，hash 和 nestloop JOIN 請務必注意括號內的別名或表名順序)。IndexScan(rd "def") # 指定索引，如果表名使用了別名，請使用別名。 IndexScan(d "bcd") IndexScan(ir "abc") 務必注意，如果對象名用了小寫和下劃線以外的字符。必須使用""引用起來。 */

三、 pg_hint_plan支持的Hint類型

支持的Hint有很多種，最新版參考pg_hint_plan文檔

GroupFormatDescription

Scan method	SeqScan(table)	Forces sequential scan on the table
	TidScan(table)	Forces TID scan on the table.
	IndexScan(table[ index...])	Forces index scan on the table. Restricts to specified indexes if any.
	IndexOnlyScan(table[ index...])	Forces index only scan on the table. Rstricts to specfied indexes if any. Index scan may be used if index only scan is not available. Available for PostgreSQL 9.2 and later.
	BitmapScan(table[ index...])	Forces bitmap scan on the table. Restoricts to specfied indexes if any.
	NoSeqScan(table)	Forces not to do sequential scan on the table.
	NoTidScan(table)	Forces not to do TID scan on the table.
	NoIndexScan(table)	Forces not to do index scan and index only scan (For PostgreSQL 9.2 and later) on the table.
	NoIndexOnlyScan(table)	Forces not to do index only scan on the table. Available for PostgreSQL 9.2 and later.
	NoBitmapScan(table)	Forces not to do bitmap scan on the table.
Join method	NestLoop(table table[ table...])	Forces nested loop for the joins consist of the specifiled tables.
	HashJoin(table table[ table...])	Forces hash join for the joins consist of the specifiled tables.
	MergeJoin(table table[ table...])	Forces merge join for the joins consist of the specifiled tables.
	NoNestLoop(table table[ table...])	Forces not to do nested loop for the joins consist of the specifiled tables.
	NoHashJoin(table table[ table...])	Forces not to do hash join for the joins consist of the specifiled tables.
	NoMergeJoin(table table[ table...])	Forces not to do merge join for the joins consist of the specifiled tables.
Join order	Leading(table table[ table...])	Forces join order as specified.
Join order	Leading(<join pair>)	Forces join order and directions as specified. A join pair is a pair of tables and/or other join pairs enclosed by parentheses, which can make a nested structure.
Row number correction	Rows(table table[ table...] correction)	Corrects row number of a result of the joins consist of the specfied tables. The available correction methods are absolute (#<n>), addition (+<n>), subtract (-<n>) and multiplication (*<n>). <n> should be a string that strtod() can read.
GUC	Set(GUC-param value)	Set the GUC parameter to the value while planner is running.

例如

postgres=# /*+ postgres*# Set(random_page_cost 2.0) postgres*# */ postgres-# SELECT * FROM table1 t1 WHERE key = 'value';postgres=# /*+ Rows(a b #10) */ SELECT... ; Sets rows of join result to 10 postgres=# /*+ Rows(a b +10) */ SELECT... ; Increments row number by 10 postgres=# /*+ Rows(a b -10) */ SELECT... ; Subtracts 10 from the row number. postgres=# /*+ Rows(a b *10) */ SELECT... ; Makes the number 10 times larger.

四、內核實現

看完了形形色色的Hint，我們會想，這些Hint是怎么改變復雜的優化器邏輯，使其生成我們需要的查詢計劃的呢？我們從其源碼看起（源碼可以從這里下載）。

插件主要的代碼集中在pg_hint_plan.c里面。從其中PG_init函數的代碼可以看出，它利用了planner_hook（優化器的函數鉤子，實際上是全局變量，存放函數地址，可以被插件更改，換成插件自定義函數的地址），用pg_hint_plan_planner取代了原來的優化器邏輯。這樣PG在處理一個SQL時，將調用pg_hint_plan_planner來做優化。而pg_hint_plan_planner會調用get_hints_from_comment，來讀取Hint，并調用create_hintstate進行語法分析。這里要說明的是，create_hintstate遇到一張表上的多個同類型Hint（包括重復的Hint），只保留最后一個，前面的會忽略。

另外，還有兩個函數鉤子被利用：get_relation_info_hook?和?join_search_hook。這兩個鉤子分別被修改指向了pg_hint_plan_get_relation_info和pg_hint_plan_join_search。前者是在優化器處理基本表（非視圖、非函數的表）獲取表信息時被調用，調用棧如下：

query_planner -> add_base_rels_to_query -> build_simple_rel -> get_relation_info -> get_relation_info_hook(即pg_hint_plan_get_relation_info)

這個pg_hint_plan_get_relation_info做了什么呢？仔細看會驚訝的發現，它是用來刪除索引的！對，它在優化器獲取表的基本信息后被調用，然后其從基本信息刪除了那些在Hint中未使用的索引。例如，t1上有兩個索引t1_i_a和t1_i_b，如果指定了IndexScan(t1 t_i_b)這個Hint，那么t1_i_a的索引信息在這里被刪除，這樣在后續的優化中，就永遠不會考慮t1_i_a這個索引了！

再看pg_hint_plan_join_search，其被調用的位置如下：

query_planner -> make_one_rel -> make_rel_from_joinlist ->join_search_hook(即pg_hint_plan_join_search)

可見，它是在為一個SQL語句生成連接結果時被調用，其輸入為待連接的表，輸出為連接后生成的表及其最優的查詢計劃。它主要做了兩件事：

調用rebuild_scan_path重新生成基本表的訪問路徑。為什么要重新生成呢？因為在基本表的訪問計劃生成階段，掃描類的Hint并未實際起作用（只是對索引做過刪除處理）。例如，即使指定了IndexScan(t1 t1_i_a)，但外部的GUC變量enable_indexscan被設置為了off，在這里也只會看到一個表掃描（SeqScan）的查詢計劃。因此這里需要重新設置好GUC變量（例如如果遇到IndexScan Hint，需要把GUC變量enable_indexscan重置為on），再做一遍訪問計劃。由于基本表一般數量較少，訪問計劃也只需再生成一次，所以此步開銷是可接受的；

調用pg_hint_plan_standard_join_search生成連接的計劃。這里是應用連接方法和連接順序Hint的地方。要想改變連接方法或順序，需要進一步修改優化器的整個邏輯，但優化器沒那么多的預定義鉤子可用了，采用函數鉤子的方法不可行。于是，插件便“自備”了優化器的主流程代碼（其實是從同版本的PG里面拷貝出來的），見插件代碼中的core.c和make_join_rel.c兩個文件。里面很多地方是被插件修改過的。其中核心的是修改對add_paths_to_joinrel的調用，使優化器實際調用add_paths_to_joinrel_wrapper。這個函數是用于為輸入的兩張表（可能是連接生成的中間表），生成一個連接計劃。可以看到add_paths_to_joinrel_wrapper會先去查找有沒有對應的Hint，如果有就直接利用，并舍棄掉不符合Hint的連接方法和順序（這是連接順序Hint其作用的地方）。

可以看到，此插件的實現并不復雜，它巧妙利用了優化器優化流程中的關鍵點，來應用Hint，達到固定查詢計劃的目的。

參考

AliCloudDB for PostgreSQL pg_hint_plan插件的用法-阿里云開發者社區

http://pghintplan.osdn.jp/pg_hint_plan.html
http://pghintplan.osdn.jp/hint_list.html
https://yq.aliyun.com/articles/17212

PgSQL · 特性分析 · Plan Hint

總結

以上是生活随笔為你收集整理的pg_hint_plan 使用hint固定SQL执行计划的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：微信小程序最简单的轮播图
下一篇：谷歌网盘云盘google drive扩容