深入解读MySQL8.0 新特性 :Crash Safe DDL
前言
在MySQL8.0之前的版本中,由于架構(gòu)的原因,mysql在server層使用統(tǒng)一的frm文件來(lái)存儲(chǔ)表元數(shù)據(jù)信息,這個(gè)信息能夠被不同的存儲(chǔ)引擎識(shí)別。而實(shí)際上innodb本身也存儲(chǔ)有元數(shù)據(jù)信息。這給ddl帶來(lái)了一定的挑戰(zhàn),因?yàn)檫@種架構(gòu)無(wú)法做到ddl的原子化,我們?cè)诰€上經(jīng)常能夠看到數(shù)據(jù)目錄下遺留的臨時(shí)文件,或者類似server層和innodb層列個(gè)數(shù)不一致之類的錯(cuò)誤。甚至某些ddl可能還遺留元數(shù)據(jù)在innodb內(nèi),而丟失了frm,導(dǎo)致無(wú)法重建表…..(我們?yōu)榱私鉀Q這個(gè)問(wèn)題,實(shí)現(xiàn)了一個(gè)叫drop table force的功能,去強(qiáng)制做清理….)
(以下所有的討論都假定使用InnoDB存儲(chǔ)引擎)
到了8.0版本,我們知道所有的元數(shù)據(jù)已經(jīng)統(tǒng)一用InnoDB來(lái)進(jìn)行管理,這就給實(shí)現(xiàn)原子ddl帶來(lái)了可能,幾乎所有的對(duì)innodb表,存儲(chǔ)過(guò)程,觸發(fā)器,視圖或者UDF的操作,都能做到原子化:
- 元數(shù)據(jù)修改,binlog以及innodb的操作都放在一個(gè)事務(wù)中 - 增加了一個(gè)內(nèi)部隱藏的系統(tǒng)表`mysql.innodb_ddl_log`,ddl操作被記錄到這個(gè)表中,注意對(duì)該表的操作產(chǎn)生的redo會(huì)fsync到磁盤上,而不會(huì)考慮innodb_flush_log_at_trx_commit的配置。當(dāng)崩潰重啟時(shí),會(huì)根據(jù)事務(wù)是否提交來(lái)決定通過(guò)這張表的記錄去回滾或者執(zhí)行ddl操作 - 增加了一個(gè)post-ddl的階段,這也是ddl的最后一個(gè)階段,會(huì)去:1\. 真正的物理刪除或重命名文件; 2\. 刪除innodb_ddl_log中的記錄項(xiàng); 3.對(duì)于一些ddl操作還會(huì)去更新其動(dòng)態(tài)元數(shù)據(jù)信息(存儲(chǔ)在`mysql.innodb_dynamic_metadata`,例如corrupt flag, auto_inc值等) - 一個(gè)正常運(yùn)行的ddl結(jié)束后,其ddl log也應(yīng)該被清理,如果這中間崩潰了,重啟時(shí)會(huì)去嘗試重放:1.如果已經(jīng)走到最后一個(gè)ddl階段的(commit之后),就replay ddl log,把ddl完成掉;2\. 如果處于某個(gè)中間態(tài),則回滾ddl由于引入了atomic ddl, 有些ddl操作的行為也發(fā)生了變化:
- DROP TABLE: 在之前的版本中,一個(gè)drop table語(yǔ)句中如果要?jiǎng)h多個(gè)表,比如t1,t2, t2不存在時(shí),t1會(huì)被刪除。但在8.0中,t1和t2都不會(huì)被刪除,而是拋出錯(cuò)誤。因此要注意5.7->8.0的復(fù)制問(wèn)題 (DROP VIEW, CREATE USER也有類似的問(wèn)題) - DROP DATABASE: 修改元數(shù)據(jù)和ddl_log先提交事務(wù),而真正的物理刪除數(shù)據(jù)文件放在最后,因此如果在刪除文件時(shí)崩潰,重啟時(shí)會(huì)根據(jù)ddl_log繼續(xù)執(zhí)行drop database測(cè)試:
MySQL很貼心的加了一個(gè)選項(xiàng)innodb_print_ddl_logs,打開后我們可以從錯(cuò)誤日志看到對(duì)應(yīng)的ddl log,下面我們通過(guò)這個(gè)來(lái)看下一些典型ddl的過(guò)程
root@(none) 11:12:19>SET GLOBAL innodb_print_ddl_logs = 1; Query OK, 0 rows affected (0.00 sec)root@(none) 11:12:22>SET GLOBAL log_error_verbosity = 3; Query OK, 0 rows affected (0.00 sec)CREATE DATABASE
mysql> CREATE DATABASE test; Query OK, 1 row affected (0.02 sec)創(chuàng)建數(shù)據(jù)庫(kù)語(yǔ)句沒(méi)有寫log_ddl,可能覺(jué)得這不是高頻操作,如果創(chuàng)建database的過(guò)程中失敗了,重啟后可能需要手動(dòng)刪除目錄。
CREATE TABLE
mysql> USE test; Database changed mysql> CREATE TABLE t1 (a INT PRIMARY KEY, b INT); Query OK, 0 rows affected (0.06 sec)[InnoDB] DDL log insert : [DDL record: DELETE SPACE, id=428, thread_id=7, space_id=76, old_file_path=./test/t1.ibd] [InnoDB] DDL log delete : by id 428 [InnoDB] DDL log insert : [DDL record: REMOVE CACHE, id=429, thread_id=7, table_id=1102, new_file_path=test/t1] [InnoDB] DDL log delete : by id 429 [InnoDB] DDL log insert : [DDL record: FREE, id=430, thread_id=7, space_id=76, index_id=190, page_no=4] [InnoDB] DDL log delete : by id 430 [InnoDB] DDL log post ddl : begin for thread id : 7 InnoDB] DDL log post ddl : end for thread id : 7從日志來(lái)看有三類操作,實(shí)際上描述了如果操作失敗需要進(jìn)行的三項(xiàng)逆向操作:刪除數(shù)據(jù)文件,釋放內(nèi)存中的數(shù)據(jù)詞典信息,刪除索引btree。在創(chuàng)建表之前,這些數(shù)據(jù)被寫入到ddl_log中,在創(chuàng)建完表并commit后,再?gòu)膁dl log中刪除這些記錄。
另外上述日志中還有DDL log delete日志,其實(shí)在每次寫入ddl log時(shí)是單獨(dú)事務(wù)提交的,但在提交之后,會(huì)使用當(dāng)前事務(wù)執(zhí)行一條delete操作,直到操作結(jié)束了才會(huì)提交。
加列(instant)
mysql> ALTER TABLE t1 ADD COLUMN c INT; Query OK, 0 rows affected (0.08 sec) Records: 0 Duplicates: 0 Warnings: 0[InnoDB] DDL log post ddl : begin for thread id : 7 [InnoDB] DDL log post ddl : end for thread id : 7注意這里執(zhí)行的是Instant ddl, 這是8.0.13新支持的特性,加列操作可以只修改元數(shù)據(jù),因此從ddl log中無(wú)需記錄數(shù)據(jù)
刪列
mysql> ALTER TABLE t1 DROP COLUMN c; Query OK, 0 rows affected (2.77 sec) Records: 0 Duplicates: 0 Warnings: 0[InnoDB] DDL log insert : [DDL record: DELETE SPACE, id=487, thread_id=7, space_id=83, old_file_path=./test/#sql-ib1108-1917598001.ibd] [InnoDB] DDL log delete : by id 487 [InnoDB] DDL log insert : [DDL record: REMOVE CACHE, id=488, thread_id=7, table_id=1109, new_file_path=test/#sql-ib1108-1917598001] [InnoDB] DDL log delete : by id 488 [InnoDB] DDL log insert : [DDL record: FREE, id=489, thread_id=7, space_id=83, index_id=200, page_no=4] [InnoDB] DDL log delete : by id 489[InnoDB] DDL log insert : [DDL record: DROP, id=490, thread_id=7, table_id=1108] [InnoDB] DDL log insert : [DDL record: RENAME SPACE, id=491, thread_id=7, space_id=82, old_file_path=./test/#sql-ib1109-1917598002.ibd, new_file_path=./test/t1.ibd] [InnoDB] DDL log delete : by id 491 [InnoDB] DDL log insert : [DDL record: RENAME TABLE, id=492, thread_id=7, table_id=1108, old_file_path=test/#sql-ib1109-1917598002, new_file_path=test/t1] [InnoDB] DDL log delete : by id 492 [InnoDB] DDL log insert : [DDL record: RENAME SPACE, id=493, thread_id=7, space_id=83, old_file_path=./test/t1.ibd, new_file_path=./test/#sql-ib1108-1917598001.ibd] [InnoDB] DDL log delete : by id 493 [InnoDB] DDL log insert : [DDL record: RENAME TABLE, id=494, thread_id=7, table_id=1109, old_file_path=test/t1, new_file_path=test/#sql-ib1108-1917598001] [InnoDB] DDL log delete : by id 494 [InnoDB] DDL log insert : [DDL record: DROP, id=495, thread_id=7, table_id=1108] [InnoDB] DDL log insert : [DDL record: DELETE SPACE, id=496, thread_id=7, space_id=82, old_file_path=./test/#sql-ib1109-1917598002.ibd][InnoDB] DDL log post ddl : begin for thread id : 7 [InnoDB] DDL log replay : [DDL record: DELETE SPACE, id=496, thread_id=7, space_id=82, old_file_path=./test/#sql-ib1109-1917598002.ibd] [InnoDB] DDL log replay : [DDL record: DROP, id=495, thread_id=7, table_id=1108] [InnoDB] DDL log replay : [DDL record: DROP, id=490, thread_id=7, table_id=1108] [InnoDB] DDL log post ddl : end for thread id : 7這是個(gè)典型的三階段ddl的過(guò)程:分為prepare, perform 以及commit三個(gè)階段:
- Prepare: 這個(gè)階段會(huì)修改元數(shù)據(jù),創(chuàng)建臨時(shí)ibd文件#sql-ib1108-1917598001.ibd, 如果發(fā)生異常崩潰,我們需要能把這個(gè)臨時(shí)文件刪除掉, 因此和create table類似,也為這個(gè)idb寫了三條日志:delete space, remove cache,以及free btree
- Perform: 執(zhí)行操作,將數(shù)據(jù)拷貝到上述ibd文件中,(同時(shí)處理online dmllog), 這部分不涉及l(fā)og ddl操作
-
Commit: 更新數(shù)據(jù)詞典信息并提交事務(wù), 這里會(huì)寫幾條日志:
- DROP : table_id=1108
- RENAME SPACE: #sql-ib1109-1917598002.ibd文件被rename成t1.ibd
- RENAME TABLE: #sql-ib1109-1917598002被rename成t1
- RENAME SPACE: t1.ibd 被rename成#sql-ib1108-1917598001.ibd
- RENAME TABLE: t1表被rename成#sql-ib1108-1917598001
- DROP TABLE: table_id=1108
- DELETE SPACE: 刪除#sql-ib1109-1917598002.ibd
實(shí)際上這一步寫的ddl log描述了commit階段操作的逆向過(guò)程:將t1.ibd rename成#sql-ib1109-1917598002, 并將sql-ib1108-1917598001 rename成t1表,最后刪除舊表。其中刪除舊表的操作這里不執(zhí)行,而是到post-ddl階段執(zhí)行
-
Post-ddl: 在事務(wù)提交后,執(zhí)行最后的操作:replay ddl log, 刪除舊文件,清理mysql.innodb_dynamic_metadata中相關(guān)信息
- DELETE SPACE: #sql-ib1109-1917598002.ibd
- DROP: table_id=1108
- DROP: table_id=1108
加索引
mysql> ALTER TABLE t1 ADD KEY(b); Query OK, 0 rows affected (0.14 sec) Records: 0 Duplicates: 0 Warnings: 0[InnoDB] DDL log insert : [DDL record: FREE, id=431, thread_id=7, space_id=76, index_id=191, page_no=5] [InnoDB] DDL log delete : by id 431[InnoDB] DDL log post ddl : begin for thread id : 7 [InnoDB] DDL log post ddl : end for thread id : 7創(chuàng)建索引采用inplace創(chuàng)建的方式,沒(méi)有臨時(shí)文件,但如果異常發(fā)生的話,依然需要在發(fā)生異常時(shí)清理臨時(shí)索引, 因此增加了一條FREE log,用于異常發(fā)生時(shí)能夠刪除臨時(shí)索引.
TRUNCATE TABLE
mysql> TRUNCATE TABLE t1; Query OK, 0 rows affected (0.13 sec)[InnoDB] DDL log insert : [DDL record: RENAME SPACE, id=439, thread_id=7, space_id=77, old_file_path=./test/#sql-ib1103-1917597994.ibd, new_file_path=./test/t1.ibd] [InnoDB] DDL log delete : by id 439 [InnoDB] DDL log insert : [DDL record: DROP, id=440, thread_id=7, table_id=1103] [InnoDB] DDL log insert : [DDL record: DELETE SPACE, id=441, thread_id=7, space_id=77, old_file_path=./test/#sql-ib1103-1917597994.ibd] [InnoDB] DDL log insert : [DDL record: DELETE SPACE, id=442, thread_id=7, space_id=78, old_file_path=./test/t1.ibd] [InnoDB] DDL log delete : by id 442 [InnoDB] DDL log insert : [DDL record: REMOVE CACHE, id=443, thread_id=7, table_id=1104, new_file_path=test/t1] [InnoDB] DDL log delete : by id 443 [InnoDB] DDL log insert : [DDL record: FREE, id=444, thread_id=7, space_id=78, index_id=194, page_no=4] [InnoDB] DDL log delete : by id 444 [InnoDB] DDL log insert : [DDL record: FREE, id=445, thread_id=7, space_id=78, index_id=195, page_no=5] [InnoDB] DDL log delete : by id 445[InnoDB] DDL log post ddl : begin for thread id : 7 [InnoDB] DDL log replay : [DDL record: DELETE SPACE, id=441, thread_id=7, space_id=77, old_file_path=./test/#sql-ib1103-1917597994.ibd] [InnoDB] DDL log replay : [DDL record: DROP, id=440, thread_id=7, table_id=1103] [InnoDB] DDL log post ddl : end for thread id : 7Truncate table是個(gè)比較有意思的話題,在早期5.6及之前的版本中, 是通過(guò)刪除舊表創(chuàng)建新表的方式來(lái)進(jìn)行的,5.7之后為了保證原子性,改成了原地truncate文件,同時(shí)增加了一個(gè)truncate log文件,如果在truncate過(guò)程中崩潰,可以通過(guò)這個(gè)文件在崩潰恢復(fù)時(shí)重新truncate。到了8.0版本,又恢復(fù)成了刪除舊表,創(chuàng)建新表的方式,與之前不同的是,8.0版本在崩潰時(shí)可以回滾到舊數(shù)據(jù),而不是再次執(zhí)行。以上述為例,主要包括幾個(gè)步驟:
- 將表t1.ibd rename成#sql-ib1103-1917597994.ibd
- 創(chuàng)建新文件t1.ibd
- post-ddl: 將老文件#sql-ib1103-1917597994.ibd刪除
RENAME TABLE
mysql> RENAME TABLE t1 TO t2; Query OK, 0 rows affected (0.06 sec)DDL LOG:
[InnoDB] DDL log insert : [DDL record: RENAME SPACE, id=450, thread_id=7, space_id=78, old_file_path=./test/t2.ibd, new_file_path=./test/t1.ibd] [InnoDB] DDL log delete : by id 450 [InnoDB] DDL log insert : [DDL record: RENAME TABLE, id=451, thread_id=7, table_id=1104, old_file_path=test/t2, new_file_path=test/t1] [InnoDB] DDL log delete : by id 451[InnoDB] DDL log post ddl : begin for thread id : 7 [InnoDB] DDL log post ddl : end for thread id : 7這個(gè)就比較簡(jiǎn)單了,只需要記錄rename space 和rename table的逆操作即可. post-ddl不需要做實(shí)際的操作
DROP TABLE
DROP TABLE t2 [InnoDB] DDL log insert : [DDL record: DROP, id=595, thread_id=7, table_id=1119] [InnoDB] DDL log insert : [DDL record: DELETE SPACE, id=596, thread_id=7, space_id=93, old_file_path=./test/t2.ibd][InnoDB] DDL log post ddl : begin for thread id : 7 [InnoDB] DDL log replay : [DDL record: DELETE SPACE, id=596, thread_id=7, space_id=93, old_file_path=./test/t2.ibd] [InnoDB] DDL log replay : [DDL record: DROP, id=595, thread_id=7, table_id=1119] [InnoDB] DDL log post ddl : end for thread id : 7先在ddl log中記錄下需要?jiǎng)h除的數(shù)據(jù),再提交后,再最后post-ddl階段執(zhí)行真正的刪除表對(duì)象和文件操作
代碼實(shí)現(xiàn):
主要實(shí)現(xiàn)代碼集中在文件storage/innobase/log/log0ddl.cc中,包含了向log_ddl表中插入記錄以及replay的邏輯。
隱藏的innodb_log_ddl表結(jié)構(gòu)如下
def->add_field(0, "id", "id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT");def->add_field(1, "thread_id", "thread_id BIGINT UNSIGNED NOT NULL");def->add_field(2, "type", "type INT UNSIGNED NOT NULL");def->add_field(3, "space_id", "space_id INT UNSIGNED");def->add_field(4, "page_no", "page_no INT UNSIGNED");def->add_field(5, "index_id", "index_id BIGINT UNSIGNED");def->add_field(6, "table_id", "table_id BIGINT UNSIGNED");def->add_field(7, "old_file_path","old_file_path VARCHAR(512) COLLATE UTF8_BIN");def->add_field(8, "new_file_path","new_file_path VARCHAR(512) COLLATE UTF8_BIN");def->add_index(0, "index_pk", "PRIMARY KEY(id)");def->add_index(1, "index_k_thread_id", "KEY(thread_id)");記錄類型
根據(jù)不同的操作類型,可以分為如下幾類:
目的是釋放索引btree,入口函數(shù)log_DDL::write_free_tree_log,在創(chuàng)建索引和刪除表時(shí)會(huì)調(diào)用到
對(duì)于drop table中涉及的刪索引操作,log ddl的插入操作放到父事務(wù)中,一起要么提交要么回滾
對(duì)于創(chuàng)建索引的case, log ddl就需要單獨(dú)提交,父事務(wù)將記錄標(biāo)記刪除,這樣后面如果ddl回滾了,也能將殘留的index刪掉。
入口函數(shù):Log_DDL::write_delete_space_log
用于記錄刪除tablespace操作,同樣分為兩種情況:
入口函數(shù):Log_DDL::write_rename_space_log
用于記錄rename操作,例如如果我們把表t1 rename成t2,在其中就記錄了逆向操作t2 rename to t1.
在函數(shù)Fil_shard::space_rename()中,總是先寫ddl log, 再做真正的rename操作. 寫日志的過(guò)程同樣是獨(dú)立事務(wù)提交,父事務(wù)做未提交的刪除操作
入口函數(shù):?Log_DDL::write_drop_log
用于記錄刪除表對(duì)象操作,這里不涉及文件層操作,寫ddl log在父事務(wù)中執(zhí)行
入口函數(shù):?Log_DDL::write_rename_table_log
用于記錄rename table對(duì)象的逆操作,和rename space類似,也是獨(dú)立事務(wù)提交ddl log, 父事務(wù)標(biāo)記刪除
入口函數(shù):?Log_DDL::write_remove_cache_log
用于處理內(nèi)存表對(duì)象的清理,獨(dú)立事務(wù)提交,父事務(wù)標(biāo)記刪除
入口函數(shù):?Log_DDL::write_alter_encrypt_space_log
用于記錄對(duì)tablespace加密屬性的修改,獨(dú)立事務(wù)提交. 在寫完ddl log后修改tablespace page0 中的加密標(biāo)記
綜上,在ddl的過(guò)程中可能會(huì)提交多次事務(wù),大概分為三類:
- 獨(dú)立事務(wù)寫ddl log并提交,父事務(wù)標(biāo)記刪除, 如果父事務(wù)提交了,ddl log也被順便刪除了,如果父事務(wù)回滾了,那就要根據(jù)ddl log做逆操作來(lái)回滾ddl
- 獨(dú)立事務(wù)寫ddl log 并提交, (目前只有ALTER_ENCRYPT_TABLESPACE_LOG)
- 使用父事務(wù)寫ddl log,在ddl結(jié)束時(shí)提交。需要在post-ddl階段處理
post_ddl
如上所述,有些ddl log是隨著父事務(wù)一起提交的,有些則在post-ddl階段再執(zhí)行, post_ddl發(fā)生在父事提交或回滾之后: 若事務(wù)回滾,根據(jù)ddl log做逆操作,若事務(wù)提交,在post-ddl階段做最后真正不可逆操作(例如刪除文件)
入口函數(shù):?Log_DDL::post_ddl -->Log_DDL::replay_by_thread_id
根據(jù)執(zhí)行ddl的線程thread id通過(guò)innodb_log_ddl表上的二級(jí)索引,找到log id,再到聚集索引上找到其對(duì)應(yīng)的記錄項(xiàng),然后再replay這些操作,完成ddl后,清理對(duì)應(yīng)記錄
崩潰恢復(fù)
在崩潰恢復(fù)結(jié)束后,會(huì)調(diào)用ha_post_recover接口函數(shù),進(jìn)而調(diào)用innodb內(nèi)的函數(shù)Log_DDL::recover(), 同樣的replay其中的記錄,并在結(jié)束后刪除記錄。但ALTER_ENCRYPT_TABLESPACE_LOG類型并不是在這一步刪除,而是加入到一個(gè)數(shù)組ts_encrypt_ddl_records中,在之后調(diào)用resume_alter_encrypt_tablespace來(lái)恢復(fù)操作,
參考文檔
1. 官方文檔
2. wl#9536: support crash safe ddl
本文作者:zhaiwx_yinfeng
閱讀原文
本文為云棲社區(qū)原創(chuàng)內(nèi)容,未經(jīng)允許不得轉(zhuǎn)載。
《新程序員》:云原生和全面數(shù)字化實(shí)踐50位技術(shù)專家共同創(chuàng)作,文字、視頻、音頻交互閱讀總結(jié)
以上是生活随笔為你收集整理的深入解读MySQL8.0 新特性 :Crash Safe DDL的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: ios 原生骨架动画库
- 下一篇: nodejs实现webservice问题