Oracle ORA-04031 错误 说明
?
?????? 在 老熊 的Blog上看到他們寫的有關(guān)ORA-04031的文章,轉(zhuǎn)到blog。
?
老熊的Blog:
http://www.laoxiong.net/an-ora-04031-case.html
?
?????? ORA-04031這個(gè)錯(cuò)誤,幾乎每一個(gè)專業(yè)的DBA都遇到過。這是一個(gè)相當(dāng)嚴(yán)重的錯(cuò)誤,Oracle進(jìn)程在向SGA申請內(nèi)存時(shí),如果申請失敗,則會報(bào)這個(gè)錯(cuò)誤。大部分情況下是在向SGA中的shared pool申請內(nèi)存時(shí)失敗,而少有向large pool等池中申請內(nèi)存失敗。比如下面的報(bào)錯(cuò):
?
Wed Apr 27 16:00:25 2011
Errors in file /oracle/app/oracle/admin/zxin/bdump/zxin1_ora_2052294.trc:
ORA-04031: unable to allocate 4128 bytes of shared memory
("shared pool","unknown object","sga heap(3,0)","kgllk hash table")
?
這里很清楚地表示出來,是在向shared pool申請內(nèi)存時(shí)失敗。
?
shared pool內(nèi)存申請(分配)失敗,通常有如下的幾種可能:
?????? (1)shared pool過小,比如在SGA Manual Management方式下,shared pool設(shè)置過小。比如一套數(shù)千連接的大系統(tǒng),shared pool只設(shè)置了幾百M。這種情況下,要解決問題很解單,增加shared pool的大小即可。
?????? (2)應(yīng)用沒有使用綁定變量,硬解析非常多,導(dǎo)致shared pool內(nèi)存碎片嚴(yán)重,分配大塊內(nèi)存時(shí)不能獲得連續(xù)的內(nèi)存空間。硬解析多的一個(gè)變種是雖然使用了綁定變量,但是由于某種原因,Cursor不能共享,導(dǎo)致Child Cursor非常多。實(shí)際上,如果shared pool較大(比如數(shù)GB大小),這種問題還是很少出現(xiàn)的,并且出現(xiàn)也通常出現(xiàn)在申請大塊內(nèi)存時(shí)。這種情況如果使用alter system flush shared_pool可以暫時(shí)緩解問題。但是這條命令又通常不適用于shared pool較大而且比較繁忙的系統(tǒng)。使用綁定變量
?????? (3) Cache的cursor很多,同時(shí)cursor_space_for_time這一參數(shù)設(shè)置為TRUE,可能會使shared pool碎片化嚴(yán)重,導(dǎo)致不能分配到大塊的連續(xù)內(nèi)存。
?????? (4)Oracle的BUG導(dǎo)致內(nèi)存泄露,比如在一些版本中查詢v$segment_statistics這樣的視圖導(dǎo)致內(nèi)存泄露,使shared pool內(nèi)存耗光。同樣的情形還有類似于“obj stat memory”,”gcs resources”,”ges resources”等。通常這類內(nèi)存為perm類型(permanet),這類內(nèi)存通常是在分配時(shí)就確定了固定的用途,不能用于其他用途,因此極容易產(chǎn)生碎片。
?????? (5)Oracle從9i開始,根據(jù)shared pool的大小將shared pool分為多個(gè)子池(subpool),每個(gè)子池有獨(dú)立的free list,同時(shí)在分配時(shí)單獨(dú)管理(有其獨(dú)立 的shared pool latch)。Oracle的BUG或者說是內(nèi)存分配策略缺陷導(dǎo)致某一類shared pool的內(nèi)存分配只在一個(gè)子池(subpool)中,即多個(gè)子池的使用極不均衡,導(dǎo)致向那個(gè)使用得最多的子池申請內(nèi)存時(shí)失敗。報(bào)錯(cuò)信息中的”sga heap(3,0)”即指明是在第3個(gè)子池申請內(nèi)存時(shí)失敗。本文案例中的ORA-04031錯(cuò)誤其產(chǎn)生的原因可以歸結(jié)為Oracle對shared pool的分配/使用策略問題。
?????? (6)操作系統(tǒng)內(nèi)存不足,這只會出現(xiàn)在shared pool的使用還沒有達(dá)到最大值時(shí)才會出現(xiàn),并且在操作系統(tǒng)都有swap的情況下,只有部分操作系統(tǒng)才可能有這種情況,比如在HP-UX下,reserved 內(nèi)存過多導(dǎo)致swap滿。
?????? (7)其他原因,多數(shù)是因?yàn)?/span>BUG。請參考下面提及的MOS參考文檔。
?
?????? 本文中的案例,其數(shù)據(jù)庫是運(yùn)行在AIX 5.3系統(tǒng)中的10.2.0.4 RAC,RAC節(jié)點(diǎn)數(shù)為2。數(shù)據(jù)庫是從9i升級到10g,而目前處于正式升級前的測試階段。數(shù)據(jù)庫報(bào)的ORA-04031錯(cuò)誤信息如本文前面所示(其中的數(shù)據(jù)庫名稱已經(jīng)做了處理)。
?
在繼續(xù)講解案例之前,不得不提到MOS上的幾篇關(guān)于ORA-04031錯(cuò)誤的文檔:
?????? (1)Master Note for Diagnosing ORA-4031 [ID 1088239.1]
?????? (2)Diagnosing and Resolving Error ORA-04031 on the Shared Pool or Other Memory Pools [Video] [ID 146599.1]
?????? (3)Interpreting the automatically generated ORA-4031 diagnostic trace. [ID 809560.1]
?????? (4)Troubleshooting and Diagnosing ORA-4031 Error [Video] [ID 396940.1]
?????? (5)ORA-4031 Common Analysis/Diagnostic Scripts [Video] [ID 430473.1]
?
其實(shí)分析ORA-04031錯(cuò)誤,通常有以下幾個(gè)要點(diǎn):
?????? (1)判斷錯(cuò)誤發(fā)生所有的內(nèi)存區(qū)域,是shared pool,large pool還是streams pool等。這個(gè)很容易從錯(cuò)誤信息中判斷出來,本文主要描述shared pool的ORA-04031錯(cuò)誤,這也是最常見的。
?????? (2)檢查Shared Pool的總大小以及free memory的大小。如果free memory看上去挺多,以subpool為單位檢查是否存在是由于碎片導(dǎo)致沒有足夠的連續(xù)內(nèi)存以供分配,特別是關(guān)注報(bào)錯(cuò)信息中提及的子池。
?????? (3)如果Shared Pool相較于系統(tǒng)規(guī)模來說足夠大(通常數(shù)GB都已經(jīng)是很大的了),檢查Shared Pool中有沒有占用非常多的內(nèi)存類型或內(nèi)存組件,如果有,是什么樣的類型的內(nèi)存,在各個(gè)子池之間是否分布均勻。如果有異常占用較多的內(nèi)存類型,根據(jù)此類型在MOS上搜尋是否是會有相應(yīng)的BUG引起,或者分析這種類型的內(nèi)存消耗較多的原因。比如如果是sql area很大,檢查是不是硬解析特別多,或者是不是child cursor特別多引起。
?????? (4)基于以上分析的數(shù)據(jù),來判斷shared pool內(nèi)存分配失敗的原因。
?
?????? 上面的步驟寫得比較粗略,關(guān)于分析和解決ORA-04031問題,這里也有一篇不錯(cuò)的文章:Simplified Approach to Resolve ORA-4031
?
?????? 這里關(guān)鍵的是分析Shared Pool的內(nèi)存數(shù)據(jù)。ORA-04031錯(cuò)誤發(fā)生后如果有條件可以馬上連接到數(shù)據(jù)庫中查詢相應(yīng)的x$表和v$視圖得到相應(yīng)的數(shù)據(jù),否則只能通過ORA-4031錯(cuò)誤發(fā)生時(shí)產(chǎn)生的trace文件。_4031_dump_bitvec這個(gè)隱含參數(shù)用于控制發(fā)生ORA-04031錯(cuò)誤時(shí)對SGA的dump行為,而trace文件的分析就不像使用SQL那樣簡單了。
?
下面再來詳細(xì)地分析案例:
?????? 從錯(cuò)誤信息來看,很顯然,是向shared pool的第3個(gè)subpool申請內(nèi)存時(shí)出錯(cuò)。以下的數(shù)據(jù)是shared pool的數(shù)據(jù):
?
SQL> select sum(bytes)/1024/1024 mb from v$sgastat where pool='shared pool';
MB
----------
4096.53062
?
SQL> SELECT KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ)
? 2? SIZ,
? 3? To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')|
? 4? |'k' "AVG SIZE"
? 5? FROM X$KSMSP GROUP BY KSMCHCLS;
?
CLASS????? NUM???? SIZ AVG?? ?SIZE
-------- ---------- ---------- ------------
R-freea???????? 512????? 24576???????? .05k
freeabl????? 807395 1643969848??????? 1.99k
recr???????? 530728? 662065240??????? 1.22k
R-free????????? 256? 214910976????? 819.82k
free????????? 43063? 100605496??????? 2.28k
perm??????????? 140 1673368632?? 11,672.49k
?
注意:
?????? 在生產(chǎn)庫上查詢X$KSMSP時(shí),要看下系統(tǒng)的繁忙或者說是負(fù)載高低,因?yàn)榭赡軙?dǎo)致db hang 住。
?
?
雖然free的數(shù)量不是太多,但是freeable的數(shù)量還是很多的。
下面是各個(gè)子池更詳細(xì)的數(shù)據(jù):
?
SQL> SELECT KSMCHIDX,KSMCHDUR, KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ)
? 2? SIZ,
? 3? To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')|
? 4? |'k' "AVG SIZE"
? 5? FROM X$KSMSP GROUP BY KSMCHIDX,KSMCHDUR, KSMCHCLS
? 6? order by 1,2,3;
?
KSMCHIDX?? KSMCHDUR CLASS?????? NUM??????? SIZ AVG SIZE
---------- ---------- -------- ---------- ---------- ------------
???????? 1????????? 1 R-free?????????? 27?? 22666392????? 819.82k
????????????????????? R-freea????????? 54?????? 2592???????? .05k
????????????????????? free???????????? 26????? 14024???????? .53k
????????????????????? perm???????????? 32? 430299448?? 13,131.70k
??????????????????? 2 R-free?????????? 12?? 10073952????? 819.82k
????????????????????? R-freea????????? 24?????? 1152???????? .05k
????????????????????? free????????? 10531 ??24519112??????? 2.27k
????????????????????? freeabl?????? 44922?? 32457736???????? .71k
????????????????????? recr???????? 163177? 134273584???????? .80k
??????????????????? 3 R-free??????????? 9??? 7555464????? 819.82k
????????????????????? R-freea???? ?????18??????? 864???????? .05k
????????????????????? free?????????? 1678??? 4555704??????? 2.65k
????????????????????? freeabl?????? 79815? 102514024??????? 1.25k
????????????????????? recr????????? 32689?? 36368096??????? 1.09k
??????????????????? 4 R-free?????????? 20?? 16789920????? 819.82k
????????????????????? R-freea????????? 40?????? 1920???????? .05k
????????????????????? free?????????? 2182??? 5810056??????? 2.60k
????????????????????? freeabl?????? 66235? 254656184??????? 3.75k
????????????????? ????recr????????? 16245?? 58284480??????? 3.50k
???? ????2????????? 1 R-free?????????? 25?? 20987400????? 819.82k
????????????????????? R-freea????????? 50?????? 2400???????? .05k
????????????????????? free???????????? 23????? 20016???????? .85k
????????? ????????????perm???????????? 35? 398418384?? 11,116.58k
??????????????????? 2 R-free??????????? 4??? 3357984????? 819.82k
????????????????????? R-freea?????????? 8??????? 384???????? .05k
????????????????????? free?????????? 5137??? 6604176??????? 1.26k
? ????????????????????freeabl?????? 40377?? 12140944???????? .29k
????????????????????? recr????????? 54942?? 45005024???????? .80k
??????????????????? 3 R-free??????????? 9??? 7555464????? 819.82k
????????????????????? R-freea????????? 18??????? 864??????? ?.05k
????????????????????? free?????????? 1477??? 5524568??????? 3.65k
????????????????????? freeabl?????? 79548? 101879808??????? 1.25k
????????????????????? recr????????? 32380?? 36033448??????? 1.09k
??????????????????? 4 R-free?????????? 21?? 17629416????? 819.82k
????????????????????? R-freea????????? 42?????? 2016???????? .05k
????????????????????? free?????????? 2540??? 7092424??????? 2.73k
????????????????????? freeabl?????? 70133? 270332800??????? 3.76k
????????????????????? recr????????? 15924?? 57263032??????? 3.51k
?? ??????3????????? 1 R-free?????????? 26?? 21826896????? 819.82k
????????????????????? R-freea????????? 52?????? 2496???????? .05k
????????????????????? free???????????? 26????? 20520???????? .77k
????????????????????? perm????????? ???33? 414355416?? 12,261.94k
??????????????????? 2 R-free??????????? 4??? 3357984????? 819.82k
????????????????????? R-freea?????????? 8??????? 384???????? .05k
????????????????????? free?????????? 4693??? 7053032??????? 1.47k
????????????????????? freeabl?????? 49723?? 14339800???????? .28k
????????????????????? recr????????? 52771?? 42357312???????? .78k
??????????????????? 3 R-free?????????? 11??? 9234456????? 819.82k
????????????????????? R-freea????????? 22?????? 1056???????? .05k
??????????????????? ??free?????????? 3594??? 9280904??????? 2.52k
????????????????????? freeabl?????? 95823? 121934488??????? 1.24k
????????????????????? recr????????? 39643?? 44097504??????? 1.09k
??????????????????? 4 R-free?????????? 25?? 20987400????? 819.82k
??????????? ??????????R-freea????????? 50?????? 2400???????? .05k
????????????????????? free?????????? 2822??? 7291680??????? 2.52k
????????????????????? freeabl?????? 84443? 323149712??????? 3.74k
????????????????????? recr????????? 19148?? 67997008??????? 3.47k
??? ?????4????????? 1 R-free?????????? 27?? 22666392????? 819.82k
????????????????????? R-freea????????? 54?????? 2592???????? .05k
????????????????????? free???????????? 26????? 18088???????? .68k
????????????????????? perm???????????? 40? 430295384?? 10,505.26k
??????????????????? 2 R-free??????????? 6??? 5036976????? 819.82k
????????????????????? R-freea????????? 12??????? 576???????? .05k
????????????????????? free?????????? 4818?? 11473920??????? 2.33k
????????????????????? freeabl?????? 46089?? 39963224? ???????.85k
????????????????????? recr????????? 54061?? 44188072???????? .80k
??????????????????? 3 R-free??????????? 9??? 7555464????? 819.82k
????????????????????? R-freea????????? 18??????? 864???????? .05k
????????????????????? free?????????? 1427??? 4109504??????? 2.81k
????????????????????? freeabl?????? 80026? 102379080??????? 1.25k
????????????????????? recr????????? 33217?? 36949240??????? 1.09k
??????????????????? 4 R-free?????????? 21?? 17629416????? 819.82k
????????????????????? R-freea???????? ?42?????? 2016???????? .05k
????????????????????? free?????????? 1993??? 6228464??????? 3.05k
????????????????????? freeabl?????? 70581? 269087136??????? 3.72k
????????????????????? recr????????? 16614?? 59372656??????? 3.49k
?
?????? KSMCHDUR是什么意思?在9i里面這個(gè)列的值通常為1。實(shí)際上,Oracle從9i開始,將shared pool劃分為多個(gè)sub pool。而在10g以上的版本中(具體開始的版本號已經(jīng)不記得),每個(gè)sub pool又分了4個(gè)更小的池,我們暫且稱之為mini heap。每個(gè)mini heap有其自己的free list。KSMCHDUR這一列即表示mini heap的編號。”heap(3,0)”中的0是指第1個(gè)mini heap。
?
?????? 在上面的數(shù)據(jù)中,可以看到這個(gè)子池的第1個(gè)mini heap的free已經(jīng)很少,只有10來K。另外,我們可以觀察到,perm類型的內(nèi)存塊只存在于每個(gè)sub pool的第1個(gè)min heap中。這個(gè)是一個(gè)重點(diǎn),在后面會有解釋。
?
?????? 這里本應(yīng)該有通過查詢v$sgastat得到shared pool的各個(gè)組件占用的內(nèi)存分布,只是寫BLOG時(shí)找不到了….但是我們可以在trace文件中找到數(shù)據(jù),下面只列出sub pool 3的數(shù)據(jù):
?
==============================
Memory Utilization of Subpool 3
================================
???? Allocation Name????????? Size
_________________________? __________
"free memory????????????? "??? 81466568
"miscellaneous??????????? "?????????? 0
"dpslut_kfdsg???????????? "???????? 512
"trace buffer???????????? "????? 737280
"trace_knlasg???????????? "????? ???504
"gcs res hash bucket????? "???? 1048576
"gcs res latch table????? "?????? 12288
"evaluation con?????????? "?????????? 0
"sql area???????????????? "?? 344545752
"UNDO STAT INFO?????????? "?????? 59904
"txncallback????????????? "????? 141744
"transaction????????????? "???? 2103264
"ges resource pools?????? "??????? 3968
"sessions???????????????? "???? 4526488
"dlo fib struct?????????? "????? 128032
"KJCTS process batching st"???????? 240
"row cache??????????????? "??????? 3272
"KCB where statistics arra"?????? 25888
"KCB buffer wait statistic"?????? 32000
"KCB incremental ckpt entr"???????? 512
"invalid low rba queue??? "??????? 1024
"table definiti?????????? "????? 108704
"temporary tabl?????????? "??????? 4136
"KCL instance cache transf"????? 131072
"resumable??????????????? "??????? 2720
"KESTB existence bitvec se"???????? 128
"type object de?????????? "????? 392448
"enqueue_hash???????????? "????? 318960
"KSXR pending consumption "?????? 20192
"KTI SGA freeable small po"?????????? 0
"trigger defini? ?????????"????? 885472
"trigger source?????????? "?????? 99264
"trigger inform?????????? "???????? 960
"KTCN: Obj Invalidation Se"??????? 2336
"kmgsb circular statistics"????? 108800
"kgl lock hash table state"?????? 45360
"kglsim size of pinned mem"????? ??8024
"kelr system metrics table"???????? 280
"kzctxgjsi ksuseclid memor"????? 117360
"kzctxgjci ksuseclid? memo"?????????? 0
"CCursor????????????????? "??? 95912048
"ksr message pool free que"????? 188960
"ksb ci process list (each"???????? 144
"ksunfy: nodes of hierarch"???????? 320
"ksuloi: long op free list"???????? 256
"kwqmncal: allocate buffer"??????? 4048
"ksim group query request "?????????? 0
"ksuxds ksuseclid? memory "?????????? 0
"call???????????????????? "?????? 87304
"dictionary cache??????? ?"?????????? 0
"KSXR pending reply queue "????? 255488
"hng: All sessions data fo"?????????? 0
"ksfv subheap descriptor? "???????? 184
"gcs resources??????????? "?? 169082312
"gcs affinity???????????? "??????? 8320
"gcs opaque in??????????? "?????? 12312
"PCursor????????????????? "??? 50743128
"ges resource???????????? "????? 539376
"fdrec_kffsg????????????? "????????? 24
"work area tab??????????? "?????? 80640
"kglsim main lru count??? "?????? 38400
"plwpil:wa??????????????? "??????? 4264
"grptab_kfgsg??? ?????????"??????? 2464
"AW SGA?????????????????? "????????? 40
"KEWS sesstat seg tbl???? "?????????? 8
"kebm slave descriptors?? "??????? 1456
"kglsim hash table bkts?? "???? 1048576
"KSXR global channels???? "??????? 1288
"ges enqueues???????????? "??? 17333720
"PLS chunk??????????????? "???????? 352
"KSQ event description??? "??????? 1440
"KESTB existence bitvec?? "??????? 4096
"gcs shadows????????????? "?? 101246344
"qmtb_init_data?????????? "???????? 224
"Core dump directory????? "???????? 264
"sort segment handle????? "??????? 7480
"SERVICE NAME ENTRY?????? "????????? 48
"PQ/BizCard?????????????? "??????? 1536
"qtree_kwqbspse?????????? "????????? 40
"latch descriptor table?? "??????? 1576
"recovery domain????????? "?????? 29856
"parameters????????????? ?"?????? 30056
"SHARED SERVERS INFO????? "???????? 240
"qtree_kwqbsgn??????????? "????????? 40
"post agent?????????????? "?????????? 0
"pspool_kfsg????????????? "????????? 80
"plwsppwp:wa????????????? "?????????? 0
"PL/SQL DIANA???????????? "??? 14050624
"segmented arrays???????? "??????? 2072
"Checkpoint queue???????? "???? 4097024
"sim lru segments???????? "??????? 2560
"sim segment hits???????? "??????? 2560
"sim state object???????? "????????? 40
"partitioning d?????????? "????? 199616
"ASH buffers???? ?????????"???? 8388608
"message pool freequeue?? "????? 276336
"PL/SQL MPCODE??????????? "???? 4499360
"PL/SQL PPCODE??????????? "???? 3984944
"procs: ksunfy??????????? "???? 1512000
"primem_kfmdsg??????????? "??????? 1032
"SYSTEM PARAMETERS??????? "????? ?76624
"object queue hash buckets"????? 262656
"object queue hash table d"??????? 7552
"object level stats hash t"???????? 512
"object stat dummy statprv"???????? 144
"sim cache sizes????????? "???????? 320
"logout storm management? "?????? 24000
"pl/sql source??????????? "?????? 21256
"sys event stats????????? "????? 199136
"parameter handle???????? "?????? 67896
"Parameter Handle???????? "??????? 1656
"channel handle?????????? "????? 828672
"API blockers array?????? "????????? 64
"PARAMETER TABLE???????? ?"??????? 2048
"PARAMETER ENTRY????????? "?????????? 8
"LGWR post requested array"????????? 24
"bloom filter???????????? "??????? 3104
"param hash values??????? "??????? 5984
"sql area:PLSQL?????????? "??? 11477776
"PX subheap desc????????? "???????? 256
"repository?????????????? "????? 213544
"sql area:KOKA??????????? "?????? 16192
"archive_lag_target?????? "??????? 9624
"state objects??????????? "???????? 640
"latch nowait fails or sle"????? 116832
"sched job slv??????????? "??????? 5952
"pso tbs: ksunfy ?????????"????? 390000
"dummy??????????????????? "????? 269928
"Sort Segment???????????? "?????? 37440
"Cursor Stats???????????? "???? 6095760
"Banner Storage?????????? "??????? 2048
"quiescing session??????? "??????? 3872
"API data buffer????????? "????? ????16
"buffer handles?????????? "???? 1020000
"prmtzdini tz region????? "????? 408320
"sga node map???????????? "????????? 16
"savepoints?????????????? "?????????? 0
"Managed Standby Proc Arra"?????? 24576
"OS proc request holder?? "??????? 4664
"db_files???????????????? "????? 416576
"PX server msg stats????? "??????? 2288
"KQR M PO???????????????? "????? 283376
"kks stats??????????????? "????????? 40
"parameter table block??? "????? 483168
"KSFV SGA???????????????? "???????? 824
"plugin datafile array?? ?"?????? 36016
"plwda:PLW_STR_NEW_RVAL?? "????????? 24
"plwspv:PLW_STR_NEW_VAL?? "????????? 16
"KGKP sga???????????????? "????????? 32
"BRANCH TABLE SEGMENTED AR"?????? 70176
"mvobj part des?????????? "????? 306544
"parameter value memory?? "???????? 216
"multiblock re??????????? "?????? 98496
"parameter text value???? "??????? 1080
"parallel_max_servers???? "??????? 8192
"KGLS heap??????????????? "??? 13290800
"KGSKI sga??????????????? "????????? 80
"resize request state obje"????? 368000
"MTTR advisory?? ?????????"???? 1462832
"monitoring co??????????? "?????? 12480
"rules engine aggregate st"??????? 1416
"krbmror????????????????? "?????? 36400
"joxs heap??????????????? "???????? 136
"krbmrsr????????????????? "???????? 152
"ksfqpar????????????????? "????? ??4008
"SGA - SWRF DrvMet Runtime"??????? 2656
"SGA - SWRF Metrics ksuTim"????????? 72
"SGA - SWRF RawMet Runtime"??????? 1408
"SGA - SWRF Metrics WCTime"????????? 32
"SQL Memory Manager Base W"?????? 13400
"change notification regis"??????? 4096
"simulator latch/bucket st"?????? 59392
"Prefetch history buffer? "??????? 2832
"change notification obj m"??????? 4096
"KQR ENQ????????????????? "?????? 16512
"kksss??????????????????? "?????? 16464
"API data buffer length?? "?????????? 0
"kokcd?????????????????? ?"?????????? 0
"kohsg??????????????????? "?????????? 8
"Sequence Background Insta"????????? 88
"ksfqpn?????????????????? "???????? 416
"KGLS SP????????????????? "??????? 4704
"knstsg?????????????????? "????????? 48
"latch classes??????????? "???????? 352
"system default language h"???????? 568
"name-service entry?????? "??????? 2592
"API data buffer array??? "?????????? 0
"kzull??????????????????? "??????? 4096
"kzulu??????????????????? "???????? 392
"kfgsga?????????????????? "???????? 104
"library cache?? ?????????"??? 46604712
"kcrrny?????????????????? "?????? 25320
"spfile cleanup structure "?????? 16760
"xssinfo????????????????? "??????? 5952
"buffer_pool_desc_array?? "??????? 3384
"row cache child latch??? "??????? 3360
"rm request queue link??? "????? ??5320
"SCHEDULING POLICY TABLE? "???????? 160
"namhsh_kfdsg???????????? "??????? 4104
"Closed Thread SCN Bitvec "??????? 8448
"Client ID trace settings "??????? 3872
"osp allocation?????????? "?????? 21104
"os statistics??????????? "??????? 9192
"plwppwp:PLW_STR_NEW_LEN_V"????????? 16
"plwgc: plwgc_garbage_clea"?????????? 0
"plwiiw: kglpql warnings? "?????????? 0
"object queue???????????? "????? 808080
"obj stat memo??????????? "????? 599184
"obj htab chun??????????? "????? 122960
"object level??????????? ?"????? 111888
"XCT XGA????????????????? "?????????? 0
"SGA - SWRF Metric Eidbuf "????? 900840
"Processor group descripto"????????? 64
"Prefetch client count per"????????? 32
"X$SKGXPIA??????????????? "??????? 2680
"simulator hash buckets?? "???? 2101248
"State object subpools??? "???????? 896
"API data buffer length ma"?????????? 0
"AWR Table Info (KEW layer"???????? 872
"character set memory???? "??????? 4856
"sim segment num bufs???? "??????? 1280
"character set object???? "????? 129728
"session idle latches???? "??????? 2560
"qesmmaInitialize:??????? "???????? 112
"returns from remote ops? "?????? 49152
"name-service???????????? "??????? 4080
"SGA - SWRF Metric CHBs?? "?????? 10912
"listener addresses?????? "????????? 32
"db_block_hash_buckets??? "??? 67108864
"KSI resource types?????? "??????? 2704
"kglsim object batch????? "???? 4196304
"trigger condition node?? "????????? 72
"ksws service events????? "?????? 18560
"Heap0: KGL?????????????? "??? 11642128
"fixed allocation callback"???????? 392
"kqlpWrntoStr:value?????? "?????????? 0
"KEWS statistic name????? "???????? 424
"KEWS statistic maps????? "??????? 1096
"KCL partition table????? "????? 131072
"kebm slave message?????? "????????? 88
"kcbl state objects?????? "?????? 12800
"free rm request queue lin"?????????? 0
"xsoqsehift?????????????? "??????? 3104
"DBWR event stats array?? "???????? 192
"kgllk hash table???????? "????? 659456
"event descriptor table?? "???????? 192
"kpssnfy: kpsssgct??????? "????????? 32
"kpscad: kpscscon???????? "??????? 1952
"dbwriter coalesce buffer "???? 3158016
"kglsim hash table??????? "??????? 8208
"gcs resource freelist dyn"???????? 256
"gcs shadow locks dyn seg "???????? 256
"kks stats latch????????? "???????? 160
"KTC latch cleanup??????? "???????? 576
"ges enqueue max. usage pe"????????? 64
"ges lmd process descripto"??????? 2760
"KTU latch cleanup??????? "??????? 2496
"kscdnfyinithead????????? "????????? 16
"X$KSVIT table??????????? "???????? 512
"kqlpaac:value-1????????? "????????? 64
"KCL buffer header??????? "????? 192064
"kxfpdp pointers????????? "?????? 28800
"kodosgi kopfdo?????????? "???????? 104
"kglsim latches?????????? "???????? 136
"TXN TABLE SEGMENTED ARRAY"?????? 54784
"KJCT remote i??????????? "??????? 1640
"KKJ SHRD WRQS??????????? "???????? 288
"KJC dest ctx???????????? "??????? 3560
"kwrsnfy: kwrs??????????? "??????? 1624
"kwqmn:tskdata??????????? "?????????? 0
"KKKI consumer??????????? "??????? 4136
"dbwr suspend/resume ptr a"????????? 16
"dbwr actual working sets "????????? 64
"KGSKI schedule????????? ?"?????????? 0
"temp lob duration state o"??????? 3296
"ges regular msg buffers? "???? 3078008
"jsksncb: 9?????????????? "?????? 28672
"Transportable DB Converte"??????? 2552
"KTU lat struct?????????? "???????? 800
"kks stats hds??????????? "???????? 256
"KSFD SGA I/O b?????????? "???? 4190248
"HTTP fixed headers?????? "????????? 72
"UNDO INFO SEGMENTED ARRAY"????? 649856
"ges process hash table?? "????? 132000
"jsksncb-latch??????????? "??????? 1280
"kfkid hrec?????????????? "????????? 24
"KTCCC OBJECT??? ?????????"?????????? 0
"KTPR HIST TB???????????? "??????? 2808
"KTF MAPPINGS???????????? "?????? 12288
"kksss-heap?????????????? "?????? 35136
"kglsim heap????????????? "???? 3431232
"event statistics per sess"???? 7665280
"eventlist to post commits"????? ????16
?
?????? 從上面的數(shù)據(jù)可以看到,第3個(gè)sub pool中,占用較多的內(nèi)存是gcs resources、gcs shadows以及sql area。但是沒有明顯的異常。
下面是第3個(gè)sub pool中第1個(gè)mini-heap中free memory的更詳細(xì)數(shù)據(jù):
SQL> break on ksmchidx on ksmchdur
SQL> select
? 2??? ksmchidx,ksmchdur,
? 3??? case
? 4????????? when ksmchsiz < 1672 then trunc((ksmchsiz-32)/8)
? 5????????? when ksmchsiz < 4120 then trunc((ksmchsiz+7928)/48)
? 6????????? when ksmchsiz < 8216 then 250
? 7????????? when ksmchsiz < 16408 then 251
? 8????????? when ksmchsiz < 32792 then 252
? 9????????? when ksmchsiz < 65560 then 253
?10????????? when ksmchsiz >= 65560 then 253
?11???? end bucket,
?12??? sum(ksmchsiz)? free_space,
?13??? count(*)? free_chunks,
?14??? trunc(avg(ksmchsiz))? average_size,
?15??? max(ksmchsiz)? biggest
?16? from
?17??? sys.x$ksmsp
?18? where
?19??? inst_id = userenv('Instance') and
?20??? ksmchcls = 'free'
?21? group by
?22??? case
?23????????? when ksmchsiz < 1672 then trunc((ksmchsiz-32)/8)
?24????????? when ksmchsiz < 4120 then trunc((ksmchsiz+7928)/48)
?25????????? when ksmchsiz < 8216 then 250
?26????????? when ksmchsiz < 16408 then 251
?27????????? when ksmchsiz < 32792 then 252
?28????????? when ksmchsiz < 65560 then 253
?29????????? when ksmchsiz >= 65560 then 253
?30???? end ,
?31??? ksmchidx, ksmchdur
?32? order by ksmchidx , ksmchdur
?33? /
?
ksmchidx?? ksmchdur???? bucket free_space free_chunks average_size??? biggest
---------- ---------- ---------- ---------- ----------- ------------ ----------
3???????? 1????????? 5???????? 72?????????? 1?????????? 72???????? 72
?????????????????? 13 ???????136?????????? 1????????? 136??????? 136
?????????????????? 27??????? 248?????????? 1????????? 248??????? 248
???????????????? ??48??????? 416?????????? 1????????? 416??????? 416
?????????????????? 56?????? 1920?????????? 4????????? 480??????? 480
?????????????????? 66?????? 1680?????????? 3????????? 560??????? 560
?????????????????? 68?????? 4608?????????? 8????????? 576??????? 576
?????????????????? 164?????? 1344?????????? 1???????? 1344?????? 1344
?????????????????? 180?????? 1472?????????? 1?? ??????1472?????? 1472
????????????????? 188?????? 1536?????????? 1???????? 1536?????? 1536
????????????????? 190?????? 1552?????????? 1???????? 1552?????? 1552
????????????????? 199?????? 1624?????????? 1???????? 1624?????? 1624
????????????????? 204????? ?1880?????????? 1???????? 1880?????? 1880
????????????????? 207?????? 2032?????????? 1???????? 2032?????? 2032
?
?????? 可以看到,最大的free memory塊才2032字節(jié),而報(bào)錯(cuò)中提到的申請的內(nèi)存大小為4128字節(jié)。由于在第3個(gè)sub pool的第1個(gè)mini heap中沒有4128字節(jié)的連續(xù)free memory,所以導(dǎo)致內(nèi)存申請失敗。
?
?????? 那么這里的問題是,為什么這個(gè)mini heap中的free memory那么少?正如前面提及,為什么這個(gè)mini heap中的已經(jīng)使用的類型全是perm類型?這個(gè)問題的答案就在于”DURATION”。Oracle在啟用了SGA自動管理的模式下,為了便于在shared pool與buffer cache或其他內(nèi)存之間動態(tài)調(diào)整大小,規(guī)定了在每一個(gè)mini heap中分配內(nèi)存按照duration來進(jìn)行。這里duration可以理解為內(nèi)存塊的持久時(shí)間。
?????? perm類型的內(nèi)存塊,就是分配后不能釋放,只能用于相同組件的重用。比如gcs resources這種組件的內(nèi)存是perm類型,這種內(nèi)存被分配后,不能釋放給sql area使用,也不能給gcs shadows使用,只能給其他的gcs resource使用。
?????? 按DURATION分配內(nèi)存時(shí),perm類型的內(nèi)存就只能從每個(gè)sub pool的第1個(gè)mini heap中分配。而其他類型的內(nèi)存通常在sub pool的第2-4個(gè)mini heap中分配。由于perm類型的內(nèi)存不能釋放,也不能被其他組件的內(nèi)存重用,所以里面的內(nèi)存會越用越少,如果沒有了free memory怎么辦?前面說到,這種模式主要是工作在SGA自動管理模式下,如果free memory沒有了,就會從SGA中的其他部分,比如buffer cache中取得memory chunk,加入到缺少內(nèi)存的mini heap中。正常情況下這種機(jī)制沒有問題。
?
?????? 完全使用SGA自動管理有一個(gè)缺陷就是,如果應(yīng)用系統(tǒng)綁定變量做得不好,或者由于BUG,child cursor過多,導(dǎo)致shared pool會變得很大,甚至超過10G,嚴(yán)重的比buffer cache還大,另一方面,在buffer cache和shared pool之間頻繁地調(diào)整大小,可能會導(dǎo)致嚴(yán)重的解析問題和其他性能問題。
?
?????? 針對這個(gè)問題,通常有2種解決辦法:一種就是關(guān)閉SGA自動管理,即將SGA_TARGET設(shè)置為0,以9i的方式來設(shè)置shared_pool_size,db_cache_size這些參數(shù),來手動管理SGA;第二種就是sga_target仍然大于0,即自動管理SGA,但是通過設(shè)置shared_pool_size,db_cache_size等參數(shù)限制這些內(nèi)存組件的最小大小,而只留給系統(tǒng)極少的自動調(diào)整空間。
?
?????? 而出現(xiàn)問題的這套系統(tǒng),正是使用了第二種方式,開啟了SGA自動調(diào)整,但是留給自動調(diào)整的空間極少。SGA_TARGET為35G,buffer_cache_size為30G,shared_pool_size為4G,再加上large_pool等組件,幾乎沒有什么可自動調(diào)整的余地。這種方式下,就存在了問題。
?
下面來做一個(gè)按時(shí)間的分析:
?????? (1)時(shí)間T1,數(shù)據(jù)庫啟動,shared pool只消耗了極少量的內(nèi)存。
?????? (2)時(shí)間T2至?xí)r間T3,Oracle進(jìn)程請求shared pool內(nèi)存,Oracle會向操作系統(tǒng)以指定的粒度為單位(比如16MB)請求物理內(nèi)存,加入到所請求內(nèi)存所在的mini heap中。直至shared pool的大小達(dá)到shared pool最大容許的大小。這個(gè)容許大小由各參數(shù)計(jì)算而來。比如說SGA_TARGET為10G,其他組件的參數(shù)設(shè)置后最小值為8G,shared_pool_size的值為1G,但是shared pool的最大容許大小為2G。這個(gè)時(shí)候,每個(gè)sub pool的mini heap的大小已經(jīng)固定。在到達(dá)shared pool最大容許大小這一階段,可能會從buffer cache等組件中占用。
?????? (3)時(shí)間T4,Oracle進(jìn)程請求shared pool內(nèi)存,這個(gè)時(shí)候只能從free list或age out內(nèi)存塊后獲取內(nèi)存,對于sub pool的第1個(gè)mini heap,只能從free list中獲取,因?yàn)檫@個(gè)mini heap中的已用內(nèi)存全是perm,是不能age out的。
?????? (4)時(shí)間T5,Oracle進(jìn)程請求shared sub pool中第1個(gè)mini heap的內(nèi)存,但是free list中已經(jīng)沒有內(nèi)存。所以報(bào)ORA-04031錯(cuò)誤。
??????
?????? 在上面的時(shí)間點(diǎn)T5那里,如果SGA有較大的自動調(diào)整空間,比如說完全沒有限制,即buffer_cache_size等參數(shù)很少或?yàn)?/span>0,這樣在請求第1個(gè)mini heap中的內(nèi)存時(shí),完全可以從buffer cache中占用,這樣的后果是使shared pool越來越大。
?
?????? 而本文案例的ORA-04031,正是由于SGA自動管理,而自動調(diào)整的余地又太小,最終使sub pool的第1個(gè)mini heap空間用光。當(dāng)然我們可以分析為什么會用光,這個(gè)就顯得更為復(fù)雜,這跟數(shù)據(jù)量、應(yīng)用系統(tǒng)都有很大的關(guān)系。而系統(tǒng)中第1次出現(xiàn)ORA-04031錯(cuò)誤的進(jìn)程,是一個(gè)job進(jìn)程,而此后大部分出現(xiàn)的錯(cuò)誤均是job進(jìn)程,能檢查job代碼,發(fā)現(xiàn)在做大量的表的大量數(shù)據(jù)的UPDATE操作,這可能是引起gcs shadows和gcs resources大量內(nèi)存使用的原因。在一套RAC數(shù)據(jù)庫中,gcs和ges相關(guān)的perm內(nèi)存占用可能會比較大。
?
?????? 那么除了調(diào)整應(yīng)用,應(yīng)該怎么樣解決這樣問題?這里的解決方法是增加shared_pool_size參數(shù)到6G,同時(shí)將sga_target設(shè)置為0,再重啟。
?????? 而另一種可能的辦法是將參數(shù)“_enable_shared_pool_durations”設(shè)置為FALSE。這一參數(shù)為FALSE,將會使shared pool內(nèi)存分配時(shí),不再使某一類型的內(nèi)存(比如perm)必須要求在一個(gè)固定的mini heap中。而實(shí)際上,sga_target設(shè)置為0之后,這一個(gè)參數(shù)自動會設(shè)為FALSE(由于這一參數(shù)是靜態(tài)參數(shù),所以修改了sga_target之后需要重啟才會使這個(gè)隱含參數(shù)改變),所以建議的解決辦法是設(shè)置sga_target參數(shù),而不建議修改隱含參數(shù)。當(dāng)然還有一種辦法是完全讓Oracle自動管理SGA,將buffer_cache_size和shared_pool_size等參數(shù)設(shè)置為0,但是正如前面所說,這種方法有比較大的缺陷。
?
?
SGA管理方式由自動管理改為手工管理 ,示例:
alter system set sga_target=0m scope=spfile;
alter system set shared_pool_size=3072m scope=spfile;
alter system set large_pool_size=512m scope=spfile;
alter system set java_pool_size=16m scope=spfile;
alter system set db_cache_size=6400m scope=both;
?
?
?
?
?
-------------------------------------------------------------------------------------------------------
Blog: http://blog.csdn.net/tianlesoftware
Email: dvd.dba@gmail.com
DBA1 群:62697716(滿);?? DBA2 群:62697977(滿)?? DBA3 群:62697850(滿)??
DBA 超級群:63306533(滿);? DBA4 群: 83829929? DBA5群: 142216823???
DBA6 群:158654907? 聊天 群:40132017?? 聊天2群:69087192
--加群需要在備注說明Oracle表空間和數(shù)據(jù)文件的關(guān)系,否則拒絕申請
轉(zhuǎn)載于:https://www.cnblogs.com/springside-example/archive/2011/06/15/2529719.html
總結(jié)
以上是生活随笔為你收集整理的Oracle ORA-04031 错误 说明的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 小D学blend-----优美简洁的mp
- 下一篇: [Ruby][3] 条件语句