[原]Oracle数据文件损坏的模拟和修复(一) |ORA-01578 data block corrupted|
造成數(shù)據(jù)塊損壞的原因通常是由于開啟了異步I/O或者增加了寫進(jìn)程,還有可能是硬件引起的,今天模擬一下該問題的發(fā)生及修復(fù)方法。由于水平有限,那面疏漏,歡迎大家指正。
?
創(chuàng)建測試環(huán)境
建立測試表空間:
create tablespace test datafile '/u02/oradata/logdw/test01.dbf' size 180K autoextend off logging segment space management auto extent management local;創(chuàng)建測試用戶并設(shè)置相應(yīng)的權(quán)限:
create user test identified by test default tablespace test; grant connect to test; grant resource to test; revoke unlimited tablespace from test; alter user test quota unlimited on test;插入測試數(shù)據(jù),直到填滿 test01.dbf:
declarei number; beginfor i in 1..99999 loopinsert into test_table values(i,lpad('Q',30));commit;end loop; end; SQL> declare2 i number;3 begin4 for i in 1..99999 loop5 insert into test_table values(i,lpad('Q',30));6 commit;7 end loop;8 end;9 / declare * ERROR at line 1: ORA-01653: unable to extend table TEST.TEST_TABLE by 8 in tablespace TEST ORA-06512: at line 5test01.dbf 已經(jīng)填滿了數(shù)據(jù)后,看看一共插入了多少條數(shù)據(jù):
select count(*) from test_table;COUNT(*) ----------2356如果我想再加一個索引是不會成功的,我為該表空間添加多一個數(shù)據(jù)文件來擴(kuò)容。
alter tablespace test add datafile '/u02/oradata/logdw/test02.dbf' size 180K autoextend off;擴(kuò)容后的 test 表空間可以創(chuàng)建索引了:
create index idx_test_table on test_table(id);通過這樣一些步驟,現(xiàn)在數(shù)據(jù)的分布情況是test01.dbf這個數(shù)據(jù)文件裝著數(shù)據(jù),test02.dbf裝著索引。可以使用查詢一下dba_extents 確認(rèn)一下:
select owner,segment_name,segment_type,extent_id,file_id,block_id,bytes from dba_extents where owner='TEST'; OWNER SEGMENT_NAME SEGMENT_TYPE EXTENT_ID FILE_ID BLOCK_ID BYTES ----- --------------- ------------------ ---------- ---------- ---------- ---------- TEST TEST_TABLE TABLE 0 15 8 65536 TEST TEST_TABLE TABLE 1 15 16 65536 TEST IDX_TEST_TABLE INDEX 0 16 8 65536 TEST IDX_TEST_TABLE INDEX 1 16 16 65536一條使用索引的典型SQL:
SQL> select count(id) from test_table; COUNT(ID) ----------2356---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 13 | | | | 2 | INDEX FAST FULL SCAN| IDX_TEST_TABLE | 2356 | 30628 | 4 (0)| 00:00:01 | ----------------------------------------------------------------------------------------自此,我們的測試環(huán)境就建立完成了。
?
破壞并修復(fù)數(shù)據(jù)文件
在Linux修改二進(jìn)制文件的方法可以參考我這篇blog《在 Linux下使用vim配合xxd查看并編輯二進(jìn)制文件》,這里就不熬述了。不同位置的數(shù)據(jù)塊遭到損壞就會報不同的錯誤,非常幸運(yùn),我一個下午就遇到了3個。
?
使用drop和create修復(fù)ORA-01578
關(guān)閉數(shù)據(jù)庫后打開test02.dbf ,定位到偏移地址 001e9c0 左右的地方修改一下,要注意偏移的位置,不同的位置可能會引發(fā)不同的問題,保存并退出。
使用dbv工具檢查一下數(shù)據(jù)文件:
[oracle@logserver logdw]$ dbv file=test02.dbf blocksize=8192DBVERIFY: Release 11.2.0.1.0 - Production on Wed Jun 23 09:16:35 2010Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.DBVERIFY - Verification starting : FILE = /u02/oradata/logdw/test02.dbf Page 15 is marked corrupt Corrupt block relative dba: 0x0400000f (file 16, block 15) Bad check value found during dbv: Data in bad block:type: 6 format: 2 rdba: 0x0400000flast change scn: 0x0000.0a71e746 seq: 0x1 flg: 0x04spare1: 0x0 spare2: 0x0 spare3: 0x0consistency value in tail: 0xe7460601check value in block header: 0x46a7computed block checksum: 0x12d1Page 16 is marked corrupt Corrupt block relative dba: 0x04000010 (file 16, block 16) Bad check value found during dbv: Data in bad block:type: 6 format: 2 rdba: 0x04000010last change scn: 0x0000.0a71e746 seq: 0x2 flg: 0x04spare1: 0x0 spare2: 0x0 spare3: 0x0consistency value in tail: 0xe7460602check value in block header: 0x99f0computed block checksum: 0xadcaDBVERIFY - Verification completeTotal Pages Examined : 23 Total Pages Processed (Data) : 0 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 4 Total Pages Failing (Index): 0 Total Pages Processed (Other): 10 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 7 Total Pages Marked Corrupt : 2 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 175236934 (0.175236934)此時,數(shù)據(jù)庫可以正常啟動,但是當(dāng)接觸到壞塊的時候就會報ORA-01578:
SQL> startup ; ORACLE instance started.Total System Global Area 1887350784 bytes Fixed Size 2214456 bytes Variable Size 1560282568 bytes Database Buffers 301989888 bytes Redo Buffers 22863872 bytes Database mounted. Database opened. SQL> connect test/test SQL> show user; USER is "TEST" SQL> set autotrace on ; SQL> select count(id) from test_table; select count(id) from test_table* ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 16, block # 15) ORA-01110: data file 16: '/u02/oradata/logdw/test02.dbf'SQL> select count(*) from test_table;COUNT(*) ----------2356------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| TEST_TABLE | 2356 | 5 (0)| 00:00:01 | -------------------------------------------------------------------------如果查詢不用到索引,就不會有問題,證明我們的數(shù)據(jù)沒有問題,僅是損失了索引。由于場景特殊,我們知道肯定是索引損壞了,數(shù)據(jù)沒有丟失的,通常的場景是我們知道損壞所在的文件編號和塊編號,但是不知道對應(yīng)的是什么對象,可以用以下查詢定位到對應(yīng)的對象上:
sys$logdw@logserver SQL> SELECT tablespace_name, owner,segment_name,segment_type2 FROM dba_extents3 WHERE file_id = &file4 and &block between block_id AND block_id + blocks - 1 ; Enter value for file: 16 old 3: WHERE file_id = &file new 3: WHERE file_id = 16 Enter value for block: 15 old 4: and &block between block_id AND block_id + blocks - 1 new 4: and 15 between block_id AND block_id + blocks - 1TABLESPACE OWNER SEGMENT_NAME SEGMENT_TYPE ---------- ---------- -------------------- ------------------ TEST TEST IDX_TEST_TABLE INDEX由于數(shù)據(jù)沒有丟失,我嘗試重建索引。
SQL> alter index IDX_TEST_TABLE rebuild online ; alter index IDX_TEST_TABLE rebuild online * ERROR at line 1: ORA-00604: error occurred at recursive SQL level 1 ORA-01658: unable to create INITIAL extent for segment in tablespace TEST空間不足……,于是drop后再create吧:
SQL> drop index IDX_TEST_TABLE ; Index dropped.Elapsed: 00:00:00.57 SQL> create index IDX_TEST_TABLE on test_table(id);Index created.Elapsed: 00:00:00.10我們再看看使用索引的查詢:
SQL> select count(id) from test_table ;COUNT(ID) ----------2356---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | | 2 | INDEX FAST FULL SCAN| IDX_TEST_TABLE | 2356 | 9424 | 3 (0)| 00:00:01 | ----------------------------------------------------------------------------------------索引恢復(fù)正常了。
該方法適用于壞塊落在非數(shù)據(jù)區(qū)的情況。
?
使用RMAN修復(fù)ORA-01578
如果損壞了僅僅兩個數(shù)據(jù)塊(8K*2),導(dǎo)致一個幾百兆的索引重建,這樣的很不值,我們可以試一下Oracle的blockrecover,但前提是之前有備份。
RMAN> blockrecover datafile 16 block 15; RMAN> blockrecover datafile 16 block 16; -- 或者組合起來 RMAN> blockrecover datafile 16 block 15 datafile 16 block 16; RMAN> blockrecover datafile 16 block 15,16;如果壞塊很多,例如我們就要寫很長很長的命令,此時可以試一下:
RMAN> blockrecover corruption list;這個命令是依靠 v$database_block_corruption 這個表進(jìn)行工作的。Oracle每當(dāng)發(fā)現(xiàn)一個損壞的數(shù)據(jù)塊就會在這個表中多一條記錄,相反地如果沒有碰到,就自然不會在這表中顯示出來的。
sys$logdw@logserver SQL> select * from v$database_block_corruption;FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO ---------- ---------- ---------- ------------------ ---------16 15 2 0 CHECKSUM我們可以借助rman對數(shù)據(jù)庫、文件或表空間進(jìn)行“掃描”,檢查出所有的壞塊。
RMAN> backup validate database; RMAN> backup validate tablespace test; RMAN> backup validate datafile 16;如果數(shù)據(jù)庫比較大或者表空間也比較大,backup validate就會長時間占用大量I/O資源,因此范圍應(yīng)該盡量小。
RMAN> backup validate datafile 16;Starting backup at 2010-06-23 10:42:23 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=71 device type=DISK channel ORA_DISK_1: starting full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00016 name=/u02/oradata/logdw/test02.dbf channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01 List of Datafiles ================= File Status Marked Corrupt Empty Blocks Blocks Examined High SCN ---- ------ -------------- ------------ --------------- ---------- 16 FAILED 0 7 23 175356605 File Name: /u02/oradata/logdw/test02.dbfBlock Type Blocks Failing Blocks Processed---------- -------------- ----------------Data 0 0 Index 2 6 Other 0 10 validate found one or more corrupt blocks See trace file /u01/app/oracle/diag/rdbms/logdw/logdw/trace/logdw_ora_30130.trc for details Finished backup at 2010-06-23 10:42:26只要讓Oracle直到了哪些是壞塊就好辦了,是blockrecover即可修復(fù)壞塊:
RMAN> blockrecover corruption list; Starting recover at 2010-06-23 11:00:49 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=136 device type=DISK searching flashback logs for block imagesfinished flashback log search, restored 2 blocksstarting media recovery media recovery complete, elapsed time: 00:00:01Finished recover at 2010-06-23 11:10:48再使用索引查詢一下數(shù)據(jù):
test$logdw@logdw SQL> select count(id) from test_table ;COUNT(ID) ----------2356---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | | 2 | INDEX FAST FULL SCAN| IDX_TEST_TABLE | 2356 | 9424 | 3 (0)| 00:00:01 | ----------------------------------------------------------------------------------------此時索引idx_test_table可以用了,一切都是在線做的。
該方法需要有rman的備份,如果日常有rman的備份就不用怕了。
?
使用exp和imp盡力挽救數(shù)據(jù)
如果沒有rman備份呢?以下主要參考蓋國強(qiáng)的《Oracle中模擬及修復(fù)數(shù)據(jù)塊損壞》
我們再用同樣的方法編輯test01.dbf,定位到最后,將數(shù)據(jù)破壞。
可以預(yù)見test_table的數(shù)據(jù)已經(jīng)損壞,索引沒有損壞。
test$logdw@logdw SQL> select count(*) from test_table ; select count(*) from test_table* ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 15, block # 23) ORA-01110: data file 15: '/u02/oradata/logdw/test01.dbf'Elapsed: 00:00:01.93 test$logdw@logdw SQL> select count(id) from test_table ; COUNT(ID) ----------2356---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | | 2 | INDEX FAST FULL SCAN| IDX_TEST_TABLE | 2356 | 9424 | 3 (0)| 00:00:01 | ----------------------------------------------------------------------------------------由于沒有已經(jīng)假設(shè)沒有rman的備份,所以blockrecover已經(jīng)不可能了,此時可以用vim+xxd重新對test01.dbf進(jìn)行修改或者借助內(nèi)部工具 bbed 將壞塊修正也是可以的,據(jù)說這是Oracle 8i之前的做法,但幾乎是mission impossible。
如果我們將任務(wù)定位為可以容忍數(shù)據(jù)丟失,希望能盡量挽救,減少數(shù)據(jù)丟失。
我們使用導(dǎo)出得到一個錯誤
通過設(shè)置內(nèi)部事件,設(shè)置在全表掃描時跳過損壞的數(shù)據(jù)塊
ALTER SYSTEM SET EVENTS='10231 trace name context forever,level 10' ; ALTER SYSTEM SET EVENTS='10231 trace name context off' ;exp可以成功執(zhí)行,將損壞的數(shù)據(jù)表刪除掉,在imp即可挽救部分?jǐn)?shù)據(jù)。
在當(dāng)前的場景中,沒有設(shè)置內(nèi)部事件的情況下挽救了 2356 條數(shù)據(jù)中的 1920 條;而設(shè)置了內(nèi)部事件之后挽救了 2175 條數(shù)據(jù)。
該方法幾乎成了最后的救命稻草了,如果你足夠強(qiáng)悍可以試一下用bbed或者其他工具直接對數(shù)據(jù)文件進(jìn)行二進(jìn)制編輯從而手動修復(fù)數(shù)據(jù)文件。
?
小結(jié)
遇到ORA-01578后,第一步應(yīng)首先確定rman備份情況,如果有可用備份,一切都不太糟糕,幾條rman命令就可以修復(fù);
如果沒有再確定壞塊對應(yīng)的是什么對象,如果是索引等非數(shù)據(jù)對象,可以考慮drop后再create一次;
如果壞塊落在數(shù)據(jù)上,設(shè)置10231內(nèi)部事件,然后求神拜佛,能救回多少數(shù)據(jù)只能看人品了。
轉(zhuǎn)載于:https://www.cnblogs.com/killkill/archive/2010/06/24/1764067.html
《新程序員》:云原生和全面數(shù)字化實(shí)踐50位技術(shù)專家共同創(chuàng)作,文字、視頻、音頻交互閱讀總結(jié)
以上是生活随笔為你收集整理的[原]Oracle数据文件损坏的模拟和修复(一) |ORA-01578 data block corrupted|的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: DIRECTORY_SEPARATOR
- 下一篇: 朝阳科技是做什么的 主要专注电声部件及电