當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

oralce之 10046对Hash Join分析

發布時間：2025/3/21 编程问答 23 豆豆

生活随笔收集整理的這篇文章主要介紹了 oralce之 10046对Hash Join分析小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

前兩天解決了一個優化SQL的case，SQL語句如下，big_table為150G大小，small_table很小，9000多條記錄，不到1M大小，hash_area_size, sort_area_size均設置足夠大，可以進行optimal hash join和memory sort。

1 2 3 4 5 6

select?/*+ leading(b) use_hash(a b) */?distinct?a.ID from?BIG_TABLE a, SMALL_TABLE b where?(a.category? = b.from_cat or ???????a.category2 = b.from_cat) and ???????a.site_id? = b.site_id and ???????a.sale_end >= sysdate;

執行計劃如下：

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

-------------------------------------------------------------------------- | Id? | Operation??????????? |? Name??????? | Rows? | Bytes | Cost (%CPU)| -------------------------------------------------------------------------- |?? 0 | SELECT STATEMENT???? |????????????? |???? 2 |?? 174 |??? 18? (17)| |?? 1 |? SORT UNIQUE???????? |????????????? |???? 2 |?? 174 |??? 18? (17)| |*? 2 |?? HASH JOIN????????? |????????????? |???? 2 |?? 174 |??? 17? (12)| |?? 3 |??? TABLE ACCESS FULL | SMALL_TABLE? |? 1879 | 48854 |??? 14?? (8)| |*? 4 |??? TABLE ACCESS FULL | BIG_TABLE??? |???? 4 |?? 244 |???? 3? (34)| -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- ???2 - access("A"."SITE_ID"="B"."SITE_ID") ???????filter("A"."CATEGORY"="B"."FROM_CAT" OR ??????????????"A"."CATEGORY2"="B"."FROM_CAT") ???4 - filter("A"."SALE_END">=SYSDATE@!)

粗略來看，PLAN非常的完美，SQL HINT寫的也很到位，小表在內build hash table,大表在外進行probe操作，根據經驗來看，整個SQL執行的時間應該和FTS（Full?Table?Scan） BIG_TABLE的時間差不多。

但是FTS BIG_TABLE的時間大約是8分鐘，而真個SQL執行的時間長達3~4小時。

那么問題究竟出在哪里?

FTS時間應該不會有太大變化，那么問題應該在hash join，設置event來trace一下hash join的過程：

1	alter?session set?events '10104 trace name context forever, level 2';

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

### Hash table ### # NOTE: The calculated number of rows in non-empty buckets may be smaller #?????? than the true number. Number of buckets with?? 0 rows:????? 16373 Number of buckets with?? 1 rows:????????? 0 Number of buckets with?? 2 rows:????????? 0 Number of buckets with?? 3 rows:????????? 1 Number of buckets with?? 4 rows:????????? 0 Number of buckets with?? 5 rows:????????? 0 Number of buckets with?? 6 rows:????????? 0 Number of buckets with?? 7 rows:????????? 1 Number of buckets with?? 8 rows:????????? 0 Number of buckets with?? 9 rows:????????? 0 Number of buckets with between? 10 and? 19 rows:????????? 1 Number of buckets with between? 20 and? 29 rows:????????? 1 Number of buckets with between? 30 and? 39 rows:????????? 3 Number of buckets with between? 40 and? 49 rows:????????? 0 Number of buckets with between? 50 and? 59 rows:????????? 0 Number of buckets with between? 60 and? 69 rows:????????? 0 Number of buckets with between? 70 and? 79 rows:????????? 0 Number of buckets with between? 80 and? 89 rows:????????? 0 Number of buckets with between? 90 and? 99 rows:????????? 0 Number of buckets with 100 or more rows:????????? 4 ### Hash table overall statistics ### Total buckets: 16384 Empty buckets: 16373 Non-empty buckets: 11 Total number of rows: 9232 Maximum number of rows in a bucket: 2531 Average number of rows in non-empty buckets: 839.272705

仔細看，在一個bucket中最多的行數竟然有2531行，因為bucket中是一個鏈表的結構，所以這幾千行都是串在一個鏈表上。?
由這一點想到這個Hash Table所依賴的hash key的distinct value可能太少，重復值太多。否則不應該會有這么多行在同一個bucket里面。

因為Join條件里面有兩個列from_cat和site_id,窮舉法有三種情況：

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

SQL> select?site_id,from_cat,count(*) from?SMALL_TABLE group?by?site_id,from_cat having?count(*)>100; no?rows?selected 2. Build hash table?based on?(from_cat): SQL> select?from_cat,count(*) from?SMALL_TABLE group?by?from_cat having?count(*)>100; no?rows?selected 3. Build hash table?based on?(site_id): SQL> select?site_id,count(*) from?SMALL_TABLE group?by?site_id having?count(*)>100; ???SITE_ID?? COUNT(*) ---------- ---------- ?????????0?????? 2531 ?????????2?????? 2527 ???????146?????? 1490 ???????210?????? 2526

到這里可以發現，基于site_id這種情況和trace file中這兩行很相符：

1 2	Number of buckets with 100 or more rows: 4 Maximum number of rows in a bucket: 2531

注：這判斷過程可以從執行計劃的“Predicate Information”部分看出：

1	access("A"."SITE_ID"="B"."SITE_ID")

所以推斷這個hash table是基于site_id而建的，而Big_Table中大量的行site_id=0，都落在這個linked list最長的bucket中，而大部分行都會掃描完整個鏈表而最后被丟棄掉,所以這個Hash Join的操作效率非常差，幾乎變為了Nest Loop操作。

找到了根本原因，問題也就迎刃而解了。

理想狀況下，hash table應當建立于(site_id,from_cat)上，那么問題肯定出在這個OR上，把OR用UNION改寫：

1 2 3 4 5 6 7 8 9 10 11

select?/*+ leading(b) use_hash(a b) */?distinct?a.ID from?BIG_TABLE a, SMALL_TABLE b where??a.category? = b.from_cat and ???????a.site_id? = b.site_id and ???????a.sale_end >= sysdate UNION select?/*+ leading(b) use_hash(a b) */?distinct?a.ID from?BIG_TABLE a, SMALL_TABLE b where??a.category2 = b.from_cat and ???????a.site_id? = b.site_id and ???????a.sale_end >= sysdate;

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

-------------------------------------------------------------------------- | Id? | Operation??????????? |? Name??????? | Rows? | Bytes | Cost (%CPU)| -------------------------------------------------------------------------- |?? 0 | SELECT STATEMENT???? |????????????? |???? 2 |?? 148 |??? 36? (59)| |?? 1 |? SORT UNIQUE???????? |????????????? |???? 2 |?? 148 |??? 36? (59)| |?? 2 |?? UNION-ALL????????? |????????????? |?????? |?????? |??????????? | |*? 3 |??? HASH JOIN???????? |????????????? |???? 1 |??? 74 |??? 17? (12)| |?? 4 |???? TABLE ACCESS FULL| SMALL_TABLE? |? 1879 | 48854 |??? 14?? (8)| |*? 5 |???? TABLE ACCESS FULL| BIG_TABLE??? |???? 4 |?? 192 |???? 3? (34)| |*? 6 |??? HASH JOIN???????? |????????????? |???? 1 |??? 74 |??? 17? (12)| |?? 7 |???? TABLE ACCESS FULL| SMALL_TABLE? |? 1879 | 48854 |??? 14?? (8)| |*? 8 |???? TABLE ACCESS FULL| BIG_TABLE??? |???? 4 |?? 192 |???? 3? (34)| -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- ???3 - access("A"."CATEGORY"="B"."FROM_CAT" AND ??????????????"A"."SITE_ID"="B"."SITE_ID") ???5 - filter("A"."SALE_END">=SYSDATE@!) ???6 - access("A"."CATEGORY2"="B"."FROM_CAT" AND ??????????????"A"."SITE_ID"="B"."SITE_ID") ???8 - filter("A"."SALE_END">=SYSDATE@!)

初看這個PLAN好像不如第一個PLAN，因為執行了兩次BIG_TABLE的FTS，但是讓我們在來看看HASH TABLE的結構

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

### Hash table ### # NOTE: The calculated number of rows in non-empty buckets may be smaller #?????? than the true number. Number of buckets with?? 0 rows:?????? 9306 Number of buckets with?? 1 rows:?????? 5310 Number of buckets with?? 2 rows:?????? 1436 Number of buckets with?? 3 rows:??????? 285 Number of buckets with?? 4 rows:???????? 43 Number of buckets with?? 5 rows:????????? 4 Number of buckets with?? 6 rows:????????? 0 Number of buckets with?? 7 rows:????????? 0 Number of buckets with?? 8 rows:????????? 0 Number of buckets with?? 9 rows:????????? 0 Number of buckets with between? 10 and? 19 rows:????????? 0 Number of buckets with between? 20 and? 29 rows:????????? 0 Number of buckets with between? 30 and? 39 rows:????????? 0 Number of buckets with between? 40 and? 49 rows:????????? 0 Number of buckets with between? 50 and? 59 rows:????????? 0 Number of buckets with between? 60 and? 69 rows:????????? 0 Number of buckets with between? 70 and? 79 rows:????????? 0 Number of buckets with between? 80 and? 89 rows:????????? 0 Number of buckets with between? 90 and? 99 rows:????????? 0 Number of buckets with 100 or more rows:????????? 0 ### Hash table overall statistics ### Total buckets: 16384 Empty buckets: 9306 Non-empty buckets: 7078 Total number of rows: 9232 Maximum number of rows in a bucket: 5 Average number of rows in non-empty buckets: 1.304323

這就是我們所需要的Hash Table，最長的鏈表只有五行數據。

整個SQL的執行時間從三四個小時縮短為16分鐘，大大超出了developer的預期。

這個SQL單純從PLAN上很難看出問題所在，需要了解Hash Join的機制，進行更深一步的分析。

source:http://www.itpub.net/thread-955209-1-1.html

文章可以轉載，必須以鏈接形式標明出處。

本文轉自張沖andy 博客園博客，原文鏈接： ?http://www.cnblogs.com/andy6/p/7502059.html?，如需轉載請自行聯系原作者

總結

以上是生活随笔為你收集整理的oralce之 10046对Hash Join分析的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇： phpcms v9调用指定栏目名称、ur
下一篇：网页中嵌入地图位置方法