MySQL 8.0 Server层最新架构详解
背景和架構
本文基于MySQL?8.0.25源碼進行分析和總結。這里MySQL?Server層指的是MySQL的優化器、執行器部分。我們對MySQL的理解還建立在5.6和5.7版本的理解之上,更多的是對比PostgreSQL或者傳統數據庫。然而從MySQL?8.0開始,持續每三個月的迭代和重構工作,使得MySQL?Server層的整體架構有了質的飛越。下面來看下MySQL最新的架構。
我們可以看到最新的MySQL的分層架構和其他數據庫并沒有太大的區別,另外值得一提的是從圖中可以看出MySQL現在更多的加強InnoDB、NDB集群和RAPID(HeatWave?clusters)內存集群架構的演進。下面我們就看下具體細節,我們這次不隨著官方的Feature實現和重構順序進行理解,本文更偏向于從優化器、執行器的流程角度來演進。
MySQL?解析器Parser?
首先從Parser開始,官方MySQL?8.0使用Bison進行了重寫,生成Parser?Tree,同時Parser?Tree會contextualize生成MySQL抽象語法樹(Abstract?Syntax?Tree)。
MySQL抽象語法樹和其他數據庫有些不同,參考《MySQL?8.0?新的火山模型執行器》文章,是由讓比較讓人拗口SELECT_LEX_UNIT/SELECT_LEX類交替構成的,然而這兩個結構在最新的版本中已經重命名成標準的SELECT_LEX?->?Query_block和ELECT_LEX_UNIT?->?Query_expression。Query_block是代表查詢塊,而Query_expression是包含多個查詢塊的查詢表達式,包括UNION?AND/OR的查詢塊(如SELECT?*?FROM?t1?union?SELECT?*?FROM?t2)或者有多Level的ORDER?BY/LIMIT?(如SELECT?*?FROM?t1?ORDER?BY?a?LIMIT?10)?ORDER?BY?b?LIMIT?5
例如,來看一個復雜的嵌套查詢:
(SELECT * FROM ttt1) UNION ALL (SELECT * FROM (SELECT * FROM ttt2) AS a, (SELECT * FROM ttt3 UNION ALL SELECT * FROM ttt4) AS b)在MySQL中就可以用下面方式表達:
經過解析和轉換后的語法樹仍然建立在Query_block和Query_expression的框架下,只不過有些LEVEL的query?block被消除或者合并了,這里不在詳細展開。
MySQL?prepare/rewrite階段
接下來我們要經過resolve和transformation過程Query_expression::prepare->Query_block::prepare,這個過程包括(按功能分而非完全按照執行順序):
Setup?and?Fix
- setup_tables?:?Set?up?table?leaves?in?the?query?block?based?on?list?of?tables.
- resolve_placeholder_tables/merge_derived/setup_table_function/setup_materialized_derived?:?Resolve?derived?table,?view?or?table?function?references?in?query?block.
- setup_natural_join_row_types?:?Compute?and?store?the?row?types?of?the?top-most?NATURAL/USING?joins.
- setup_wild?:?Expand?all?'*'?in?list?of?expressions?with?the?matching?column?references.
- setup_base_ref_items?:?Set?query_block's?base_ref_items.
- setup_fields?:?Check?that?all?given?fields?exists?and?fill?struct?with?current?data.
- setup_conds?:?Resolve?WHERE?condition?and?join?conditions
- setup_group?:?Resolve?and?set?up?the?GROUP?BY?list.
- m_having_cond->fix_fields?:?Setup?the?HAVING?clause.
- resolve_rollup?:?Resolve?items?in?SELECT?list?and?ORDER?BY?list?for?rollup?processing
- resolve_rollup_item?:?Resolve?an?item?(and?its?tree)?for?rollup?processing?by?replacing?items?matching?grouped?expressions?with?Item_rollup_group_items?and?updating?properties?(m_nullable,?PROP_ROLLUP_FIELD).?Also?check?any?GROUPING?function?for?incorrect?column.
- setup_order?:?Set?up?the?ORDER?BY?clause.
- resolve_limits?:?Resolve?OFFSET?and?LIMIT?clauses.
- Window::setup_windows1:?Set?up?windows?after?setup_order()?and?before?setup_order_final()
- setup_order_final:?Do?final?setup?of?ORDER?BY?clause,?after?the?query?block?is?fully?resolved.
- setup_ftfuncs?:?Setup?full-text?functions?after?resolving?HAVING
- resolve_rollup_wfs?:?Replace?group?by?field?references?inside?window?functions?with?references?in?the?presence?of?ROLLUP.
Transformation
- remove_redundant_subquery_clause?:?Permanently?remove?redundant?parts?from?the?query?if?1)?This?is?a?subquery?2)?Not?normalizing?a?view.?Removal?should?take?place?when?a?query?involving?a?view?is?optimized,?not?when?the?view?is?created.?
- remove_base_options:?Remove?SELECT_DISTINCT?options?from?a?query?block?if?can?skip?distinct
- resolve_subquery?:?Resolve?predicate?involving?subquery,?perform?early?unconditional?subquery?transformations
- Convert?subquery?predicate?into?semi-join,?or
- Mark?the?subquery?for?execution?using?materialization,?or
- Perform?IN->EXISTS?transformation,?or
- Perform?more/less?ALL/ANY?->?MIN/MAX?rewrite
- Substitute?trivial?scalar-context?subquery?with?its?value
- Convert?subquery?predicate?into?semi-join,?or
- transform_scalar_subqueries_to_join_with_derived:?Transform?eligible?scalar?subqueries?to?derived?tables.
- flatten_subqueries?:?Convert?semi-join?subquery?predicates?into?semi-join?join?nests.?Convert?candidate?subquery?predicates?into?semi-join?join?nests.?This?transformation?is?performed?once?in?query?lifetime?and?is?irreversible.
- apply_local_transforms?:?
- delete_unused_merged_columns?:?If?query?block?contains?one?or?more?merged?derived?tables/views,?walk?through?lists?of?columns?in?select?lists?and?remove?unused?columns.
- simplify_joins?:?Convert?all?outer?joins?to?inner?joins?if?possible
- prune_partitions?:Perform?partition?pruning?for?a?given?table?and?condition.
- delete_unused_merged_columns?:?If?query?block?contains?one?or?more?merged?derived?tables/views,?walk?through?lists?of?columns?in?select?lists?and?remove?unused?columns.
- push_conditions_to_derived_tables?:?Pushing?conditions?down?to?derived?tables?must?be?done?after?validity?checks?of?grouped?queries?done?by?apply_local_transforms();
- Window::eliminate_unused_objects:?Eliminate?unused?window?definitions,?redundant?sorts?etc.
這里,節省篇幅,我們只舉例關注下和top_join_list相關的simple_joins這個函數的作用,對于Query_block中嵌套join的簡化過程。
對比PostgreSQL
為了更清晰的理解標準數據庫的做法,我們這里引用了PostgreSQL的這三個過程:
Parser
下圖首先Parser把SQL語句生成parse?tree。
testdb=# SELECT id, data FROM tbl_a WHERE id < 300 ORDER BY data;Analyzer/Analyser
下圖展示了PostgreSQL的analyzer/analyser如何將parse?tree通過語義分析后生成query?tree。
Rewriter
Rewriter會根據規則系統中的規則把query?tree進行轉換改寫。
sampledb=# CREATE VIEW employees_list sampledb-# AS SELECT e.id, e.name, d.name AS department sampledb-# FROM employees AS e, departments AS d WHERE e.department_id = d.id;下圖的例子就是一個包含view的query?tree如何展開成新的query?tree。
sampledb=# SELECT * FROM employees_list;MySQL?Optimize和Planning階段
接下來我們進入了邏輯計劃生成物理計劃的過程,本文還是注重于結構的解析,而不去介紹生成的細節,MySQL過去在8.0.22之前,主要依賴的結構就是JOIN和QEP_TAB。JOIN是與之對應的每個Query_block,而QEP_TAB對應的每個Query_block涉及到的具體“表”的順序、方法和執行計劃。然而在8.0.22之后,新的基于Hypergraph的優化器算法成功的拋棄了QEP_TAB結構來表達左深樹的執行計劃,而直接使用HyperNode/HyperEdge的圖來表示執行計劃。
舉例可以看到數據結構表達的left?deep?tree和超圖結構表達的bushy?tree對應的不同計劃展現:
| -> Inner hash join (no condition) (cost=1.40 rows=1) -> Table scan on R4 (cost=0.35 rows=1) -> Hash -> Inner hash join (no condition) (cost=1.05 rows=1) -> Table scan on R3 (cost=0.35 rows=1) -> Hash -> Inner hash join (no condition) (cost=0.70 rows=1) -> Table scan on R2 (cost=0.35 rows=1) -> Hash -> Table scan on R1 (cost=0.35 rows=1) | -> Nested loop inner join (cost=0.55..0.55 rows=0) -> Nested loop inner join (cost=0.50..0.50 rows=0) -> Table scan on R4 (cost=0.25..0.25 rows=1) -> Filter: (R4.c1 = R3.c1) (cost=0.35..0.35 rows=0) -> Table scan on R3 (cost=0.25..0.25 rows=1) -> Nested loop inner join (cost=0.50..0.50 rows=0) -> Table scan on R2 (cost=0.25..0.25 rows=1) -> Filter: (R2.c1 = R1.c1) (cost=0.35..0.35 rows=0) -> Table scan on R1 (cost=0.25..0.25 rows=1)MySQL8.0.2x為了更好的兼容兩種優化器,引入了新的類AccessPath,可以認為這是MySQL為了解耦執行器和不同優化器抽象出來的Plan?Tree。
老優化器的入口
老優化器仍然走JOIN::optimize來把query?block轉換成query?execution?plan?(QEP.)
這個階段仍然做一些邏輯的重寫工作,這個階段的轉換可以理解為基于cost-based優化前做準備,詳細步驟如下:
- Logical?transformations:
- optimize_derived?:?Optimize?the?query?expression?representing?a?derived?table/view.
- optimize_cond?:?Equality/constant?propagation.
- prune_table_partitions?:?Partition?pruning.
- optimize_aggregated_query?:?COUNT(*),?MIN(),?MAX()?constant?substitution?in?case?of?implicit?grouping.
- substitute_gc?:?ORDER?BY?optimization,?substitute?all?expressions?in?the?WHERE?condition?and?ORDER/GROUP?lists?that?match?generated?columns?(GC)?expressions?with?GC?fields,?if?any.
- optimize_derived?:?Optimize?the?query?expression?representing?a?derived?table/view.
- Perform?cost-based?optimization?of?table?order?and?access?path?selection.
- JOIN::make_join_plan()?:?Set?up?join?order?and?initial?access?paths.
- JOIN::make_join_plan()?:?Set?up?join?order?and?initial?access?paths.
- Post-join?order?optimization
- substitute_for_best_equal_field?:?Create?optimal?table?conditions?from?the?where?clause?and?the?join?conditions.
- make_join_query_block?:?Inject?outer-join?guarding?conditions.
- Adjust?data?access?methods?after?determining?table?condition?(several?times.)
- optimize_distinct_group_order?:?Optimize?ORDER?BY/DISTINCT.
- optimize_fts_query?:?Perform?FULLTEXT?search?before?all?regular?searches.
- remove_eq_conds?:?Removes?const?and?eq?items.?Returns?the?new?item,?or?nullptr?if?no?condition.
- replace_index_subquery/create_access_paths_for_index_subquery?:?See?if?this?subquery?can?be?evaluated?with?subselect_indexsubquery_engine
- setup_join_buffering?:?Check?whether?join?cache?could?be?used
- substitute_for_best_equal_field?:?Create?optimal?table?conditions?from?the?where?clause?and?the?join?conditions.
- Code?generation
- alloc_qep(tables)?:?Create?QEP_TAB?array
- test_skip_sort?:?Try?to?optimize?away?sorting/distinct.
- make_join_readinfo?:?Plan?refinement?stage:?do?various?setup?things?for?the?executor
- make_tmp_tables_info?:?Setup?temporary?table?usage?for?grouping?and/or?sorting.
- push_to_engines?:?Push?(parts?of)?the?query?execution?down?to?the?storage?engines?if?they?can?provide?faster?execution?of?the?query,?or?part?of?it.
- create_access_paths?:?generated?ACCESS_PATH.
- alloc_qep(tables)?:?Create?QEP_TAB?array
新優化器的入口
新優化器默認不打開,必須通過set?optimizer_switch="hypergraph_optimizer=on";?來打開,其代碼流程如下:
FindBestQueryPlan -> MakeJoinHypergraph 構建hypergraph -> MakeRelationalExpressionFromJoinList 基于top_join_list構建operator tree ... -> MakeJoinGraphFromRelationalExpression 基于operator tree => hypergraph 目前對non-inner join的約束比較粗暴: 1. Inner join,且左右子樹都只有inner join,只約束SES 2. Inner join,但子樹有非inner join: 將那顆子樹整個固定在當前operator下,作為hypernode 3. 非Inner join,直接把左右子樹都固定死,變成2個hypernode ... -> EnumerateAllConnectedPartitions 開始算法流程 (Solve) -> EnumerateComplementsTo 找互補對 (EmitCsg) -> FoundSubgraphPair 找到csg-cmp-pair (EmitCsgCmp) -> ExpandComplement 擴展互補對 (EnumerateCmpRec) -> ExpandSubgraph 擴展連通子圖 (EnumerateCsgRec) 輔助函數 FindNeighborhood 獲取neighbors舉例看下Plan(AccessPath)和SQL的關系
最后生成Iterator執行器框架需要的Iterator執行載體,AccessPath和Iterator是一對一的關系(Access?paths?are?a?query?planning?structure?that?correspond?1:1?to?iterators)。
Query_expression::m_root_iterator = CreateIteratorFromAccessPath(......) unique_ptr_destroy_only<RowIterator> CreateIteratorFromAccessPath( THD *thd, AccessPath *path, JOIN *join, bool eligible_for_batch_mode) { ...... switch (path->type) { case AccessPath::TABLE_SCAN: { const auto ¶m = path->table_scan(); iterator = NewIterator<TableScanIterator>( thd, param.table, path->num_output_rows, examined_rows); break; } case AccessPath::INDEX_SCAN: { const auto ¶m = path->index_scan(); if (param.reverse) { iterator = NewIterator<IndexScanIterator<true>>( thd, param.table, param.idx, param.use_order, path->num_output_rows, examined_rows); } else { iterator = NewIterator<IndexScanIterator<false>>( thd, param.table, param.idx, param.use_order, path->num_output_rows, examined_rows); } break; } case AccessPath::REF: { ...... }對比PostgreSQL
testdb=# EXPLAIN SELECT * FROM tbl_a WHERE id < 300 ORDER BY data; QUERY PLAN --------------------------------------------------------------- Sort (cost=182.34..183.09 rows=300 width=8) Sort Key: data -> Seq Scan on tbl_a (cost=0.00..170.00 rows=300 width=8) Filter: (id < 300) (4 rows)總結
本文主要focus在MySQL最新版本官方的源碼上,重點分析了官方的重構在多階段和各階段結構上的變化和聯系,更多的是為了讓大家了解一個全新的MySQL的發展。最后這里貼幾張官方youtube介紹的refactor的工作內容。
Refactoring:?Separating?stages
- Started?~10?years?ago
Finished?a?few?years?ago
- A?clear?separation?between?query?processing?stages
- Fixed?a?large?number?of?bugs
Improved?stability
- Faster?feature?devlopment
Fewer?surprises?and?complications?during?development
Refactoring:?Parsing?and?preparing
- Still?ongoing
Implemented?piece?by?piece
- Separating?parsing?and?resolving?stages
Elimiating?semantic?actions?that?do?too?much?
Get?a?true?bottom-up?parser
- Makes?it?easier?to?extend?with?new?SQL?syntax
Parsing?doesn't?have?unintented?side?effects
- Condistent?name?and?type?resolving
Names?resolved?top-down
Types?resolved?bottom-up
- Transformations?done?in?the?prepare?stage
Bottom-up
Refactoring:?Execution
- Volcano?iterator?model
- Possible?because?stages?were?separated
- Done?silently?in?8.0?oever?the?last?two?years
- Much?more?modular?executor
Common?iterator?interface?for?all?iterators
Each?operation?is?contained?within?an?iterator
- Able?to?put?plans?together?in?new?ways
Immediate?benefix:?Remove?some?instances?of?teamporary?tables
- Join?is?just?an?iterator
Nested?loop?join?is?just?an?interator
Hash?join?is?just?another?iterator
Your?favourite?join?method?is?just?another?iterator
Old?vs?current?executor
Old?executor
| New?(current)?executor
|
參考資料
《Refactoring?Query?Processing?in?MySQL?(Norvald?H.?Ryeng,?Oracle)》
《Dynamic?Programming?Strikes?Back?-?MySQL8.0的新優化器》
《The?Internals?of?PostgreSQL?Chapter?3?Query?Processing》
原文鏈接:https://developer.aliyun.com/article/789962?
版權聲明:本文內容由阿里云實名注冊用戶自發貢獻,版權歸原作者所有,阿里云開發者社區不擁有其著作權,亦不承擔相應法律責任。具體規則請查看《阿里云開發者社區用戶服務協議》和《阿里云開發者社區知識產權保護指引》。如果您發現本社區中有涉嫌抄襲的內容,填寫侵權投訴表單進行舉報,一經查實,本社區將立刻刪除涉嫌侵權內容。總結
以上是生活随笔為你收集整理的MySQL 8.0 Server层最新架构详解的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 消控宝点亮智慧消防
- 下一篇: Dubbo 和 HSF 在阿里巴巴的实践