JVM 调优 —— GC 长时间停顿问题及解决方法
生活随笔
收集整理的這篇文章主要介紹了
JVM 调优 —— GC 长时间停顿问题及解决方法
小編覺(jué)得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.
零. 簡(jiǎn)介
垃圾收集器長(zhǎng)時(shí)間停頓,表現(xiàn)在 Web 頁(yè)面上可能是頁(yè)面響應(yīng)碼 500 之類的服務(wù)器錯(cuò)誤問(wèn)題,如果是個(gè)支付過(guò)程可能會(huì)導(dǎo)致支付失敗,將造成公司的直接經(jīng)濟(jì)損失,程序員要盡量避免或者說(shuō)減少此類情況發(fā)生。一. 并發(fā)模式失敗(concurrent mode failure)
并發(fā)模式失敗日志:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 | 2016-02-21T13:53:07.974+0800: 171467.254: [GC [1 CMS-initial-mark: 1436199K(1560576K)] 1512927K(2504320K), 0.0618140 secs] [Times: user=0.06 sys=0.00, real=0.07 secs]? 2016-02-21T13:53:08.036+0800: 171467.316: [CMS-concurrent-mark-start]? 2016-02-21T13:53:08.377+0800: 171467.657: [CMS-concurrent-mark: 0.340/0.340 secs] [Times: user=2.11 sys=0.15, real=0.34 secs]? 2016-02-21T13:53:08.377+0800: 171467.657: [CMS-concurrent-preclean-start]? 2016-02-21T13:53:08.385+0800: 171467.665: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]? 2016-02-21T13:53:08.385+0800: 171467.665: [CMS-concurrent-abortable-preclean-start]? {Heap before GC invocations=88667 (full 45):? ?par new generation?? total 943744K, used 914399K [0x0000000757000000, 0x0000000797000000, 0x0000000797000000)? ??eden space 838912K, 100% used [0x0000000757000000, 0x000000078a340000, 0x000000078a340000)? ??from space 104832K,? 72% used [0x000000078a340000, 0x000000078ecf7d98, 0x00000007909a0000)? ??to?? space 104832K,?? 0% used [0x00000007909a0000, 0x00000007909a0000, 0x0000000797000000)? ?concurrent mark-sweep generation total 1560576K, used 1436199K [0x0000000797000000, 0x00000007f6400000, 0x00000007f6400000)? ?concurrent-mark-sweep perm gen total 159744K, used 78413K [0x00000007f6400000, 0x0000000800000000, 0x0000000800000000)? 2016-02-21T13:53:09.535+0800: 171468.815: [GC2016-02-21T13:53:09.535+0800: 171468.815: [ParNew? Desired survivor size 53673984 bytes, new threshold 6 (max 6)? - age?? 1:??? 7100568 bytes,??? 7100568 total? - age?? 2:??? 4676456 bytes,?? 11777024 total? - age?? 3:??? 8773736 bytes,?? 20550760 total? - age?? 4:??? 7709744 bytes,?? 28260504 total? - age?? 5:?? 10891960 bytes,?? 39152464 total? - age?? 6:?? 11735032 bytes,?? 50887496 total? : 914399K->75616K(943744K), 0.0414860 secs] 2350599K->1517225K(2504320K), 0.0417710 secs] [Times: user=0.32 sys=0.05, real=0.05 secs]? Heap after GC invocations=88668 (full 45):? ?par new generation?? total 943744K, used 75616K [0x0000000757000000, 0x0000000797000000, 0x0000000797000000)? ??eden space 838912K,?? 0% used [0x0000000757000000, 0x0000000757000000, 0x000000078a340000)? ??from space 104832K,? 72% used [0x00000007909a0000, 0x0000000795378128, 0x0000000797000000)? ??to?? space 104832K,?? 0% used [0x000000078a340000, 0x000000078a340000, 0x00000007909a0000)? ?concurrent mark-sweep generation total 1560576K, used 1441609K [0x0000000797000000, 0x00000007f6400000, 0x00000007f6400000)? ?concurrent-mark-sweep perm gen total 159744K, used 78413K [0x00000007f6400000, 0x0000000800000000, 0x0000000800000000)? }? 2016-02-21T13:53:10.202+0800: 171469.482: [CMS-concurrent-abortable-preclean: 1.772/1.817 secs] [Times: user=4.94 sys=0.06, real=1.81 secs]? 2016-02-21T13:53:10.204+0800: 171469.483: [GC[YG occupancy: 497152 K (943744 K)]2016-02-21T13:53:10.204+0800: 171469.483: [Rescan (parallel) , 1.3691900 secs]2016-02-21T13:53:11.573+0800: 171470.853: [weak refs processing, 0.1009300 secs]2016-02-21T13:53:11.674+0800: 1714? 70.954: [class unloading, 0.0153470 secs]2016-02-21T13:53:11.689+0800: 171470.969: [scrub symbol table, 0.0110770 secs]2016-02-21T13:53:11.700+0800: 171470.980: [scrub string table, 0.0016360 secs] [1 CMS-remark: 1441609K(1560576K)] 1938761K(2504320K), 1.5079530 secs] [Ti? mes: user=13.01 sys=0.08, real=1.51 secs]? 2016-02-21T13:53:11.712+0800: 171470.992: [CMS-concurrent-sweep-start]? {Heap before GC invocations=88668 (full 45):? ?par new generation?? total 943744K, used 914528K [0x0000000757000000, 0x0000000797000000, 0x0000000797000000)? ??eden space 838912K, 100% used [0x0000000757000000, 0x000000078a340000, 0x000000078a340000)? ??from space 104832K,? 72% used [0x00000007909a0000, 0x0000000795378128, 0x0000000797000000)? ??to?? space 104832K,?? 0% used [0x000000078a340000, 0x000000078a340000, 0x00000007909a0000)? ?concurrent mark-sweep generation total 1560576K, used 1441606K [0x0000000797000000, 0x00000007f6400000, 0x00000007f6400000)? ?concurrent-mark-sweep perm gen total 159744K, used 78413K [0x00000007f6400000, 0x0000000800000000, 0x0000000800000000)? 2016-02-21T13:53:11.775+0800: 171471.055: [GC2016-02-21T13:53:11.775+0800: 171471.055: [ParNew (promotion failed)? Desired survivor size 53673984 bytes, new threshold 6 (max 6)? - age?? 1:??? 4030872 bytes,??? 4030872 total? - age?? 2:??? 5959704 bytes,??? 9990576 total? - age?? 3:??? 4628680 bytes,?? 14619256 total? - age?? 4:??? 8773080 bytes,?? 23392336 total? - age?? 5:??? 7707144 bytes,?? 31099480 total? - age?? 6:?? 10890224 bytes,?? 41989704 total? : 914528K->907344K(943744K), 1.0312010 secs]2016-02-21T13:53:12.807+0800: 171472.086: [CMS2016-02-21T13:53:14.455+0800: 171473.735: [CMS-concurrent-sweep: 1.684/2.743 secs] [Times: user=3.69 sys=0.36, real=2.74 secs]? ?(concurrent mode failure): 1451903K->475795K(1560576K), 3.9644230 secs] 2356134K->475795K(2504320K), [CMS Perm : 78413K->78413K(159744K)], 4.9959570 secs] [Times: user=5.65 sys=0.34, real=5.00 secs]? Heap after GC invocations=88669 (full 46):? ?par new generation?? total 943744K, used 0K [0x0000000757000000, 0x0000000797000000, 0x0000000797000000)? ??eden space 838912K,?? 0% used [0x0000000757000000, 0x0000000757000000, 0x000000078a340000)? ??from space 104832K,?? 0% used [0x000000078a340000, 0x000000078a340000, 0x00000007909a0000)? ??to?? space 104832K,?? 0% used [0x00000007909a0000, 0x00000007909a0000, 0x0000000797000000)? ?concurrent mark-sweep generation total 1560576K, used 475795K [0x0000000797000000, 0x00000007f6400000, 0x00000007f6400000)? ?concurrent-mark-sweep perm gen total 159744K, used 78413K [0x00000007f6400000, 0x0000000800000000, 0x0000000800000000)? } |
- 在 CMS 啟動(dòng)過(guò)程中,新生代提升速度過(guò)快,老年代收集速度趕不上新生代提升速度
- 在 CMS 啟動(dòng)過(guò)程中,老年代碎片化嚴(yán)重,無(wú)法容納新生代提升上來(lái)的大對(duì)象
- 新生代提升過(guò)快問(wèn)題:(1)如果頻率太快的話,說(shuō)明空間不足,首先可以嘗試調(diào)大新生代空間和晉升閾值。(2)如果內(nèi)存有限,可以設(shè)置 CMS 垃圾收集在老年代占比達(dá)到多少時(shí)啟動(dòng)來(lái)減少問(wèn)題發(fā)生頻率(越早啟動(dòng)問(wèn)題發(fā)生頻率越低,但是會(huì)降低吞吐量,具體得多調(diào)整幾次找到平衡點(diǎn)),參數(shù)如下:如果沒(méi)有第二個(gè)參數(shù),會(huì)隨著 JVM 動(dòng)態(tài)調(diào)節(jié) CMS 啟動(dòng)時(shí)間
- 老年代碎片嚴(yán)重問(wèn)題:(1)如果頻率太快或者 Full GC 后空間釋放不多的話,說(shuō)明空間不足,首先可以嘗試調(diào)大老年代空間(2)如果內(nèi)存不足,可以設(shè)置進(jìn)行 n 次 CMS 后進(jìn)行一次壓縮式 Full GC,參數(shù)如下:
二. 提升失敗(promotion failed)
在 Minor GC 過(guò)程中,Survivor Unused?可能不足以容納 Eden 和另一個(gè) Survivor 中的存活對(duì)象, 那么多余的將被移到老年代, 稱為過(guò)早提升(Premature Promotion)。 這會(huì)導(dǎo)致老年代中短期存活對(duì)象的增長(zhǎng), 可能會(huì)引發(fā)嚴(yán)重的性能問(wèn)題。 ?再進(jìn)一步, 如果老年代滿了, Minor GC 后會(huì)進(jìn)行 Full GC, 這將導(dǎo)致遍歷整個(gè)堆, 稱為提升失敗(Promotion Failure)。 提升失敗日志:| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | 2016-01-07T18:54:26.948+0800: 18782.967: [GC2016-04-07T18:54:26.948+0800: 18782.967: [ParNew (promotion failed)? Desired survivor size 117833728 bytes, new threshold 10 (max 10)? - age?? 1:??? 6141680 bytes,??? 6141680 total? - age?? 2:??? 6337936 bytes,?? 12479616 total? - age?? 3:???? 549120 bytes,?? 13028736 total? - age?? 4:????? 87768 bytes,?? 13116504 total? - age?? 5:???? 221384 bytes,?? 13337888 total? - age?? 6:???? 934168 bytes,?? 14272056 total? - age?? 7:???? 146072 bytes,?? 14418128 total? - age?? 8:???? 626064 bytes,?? 15044192 total? - age?? 9:???? 398000 bytes,?? 15442192 total? - age? 10:???? 429616 bytes,?? 15871808 total? : 1969227K->1929200K(2071808K), 0.7452140 secs]2016-01-07T18:54:27.693+0800: 18783.713: [CMS: 1394703K->632845K(2097152K), 4.0993640 secs] 3301676K->632845K(4168960K), [CMS Perm : 77485K->77473K(159744K)], 4.8450240 secs] [Times: user=5.18 sys=0.56, real=4.84 secs]? Heap after GC invocations=5847 (full 7):? ?par new generation?? total 2071808K, used 0K [0x00000006e9c00000, 0x0000000776400000, 0x0000000776400000)? ??eden space 1841664K,?? 0% used [0x00000006e9c00000, 0x00000006e9c00000, 0x000000075a280000)? ??from space 230144K,?? 0% used [0x0000000768340000, 0x0000000768340000, 0x0000000776400000)? ??to?? space 230144K,?? 0% used [0x000000075a280000, 0x000000075a280000, 0x0000000768340000)? ?concurrent mark-sweep generation total 2097152K, used 632845K [0x0000000776400000, 0x00000007f6400000, 0x00000007f6400000)? ?concurrent-mark-sweep perm gen total 159744K, used 77473K [0x00000007f6400000, 0x0000000800000000, 0x0000000800000000)? } |
- 新生代提升太快
- 老年代碎片太多,放不下大對(duì)象提升(表現(xiàn)為老年代還有很多空間但是,出現(xiàn)了 promotion failed)
三. 在 GC 的時(shí)候其他系統(tǒng)活動(dòng)影響
有些時(shí)候系統(tǒng)活動(dòng)諸如內(nèi)存換入換出(vmstat)、網(wǎng)絡(luò)活動(dòng)(netstat)、I/O (iostat)在 GC 過(guò)程中發(fā)生會(huì)使 GC 時(shí)間變長(zhǎng)。 前提是你的服務(wù)器上是有 SWAP 區(qū)域(用 top、 vmstat 等命令可以看出)用于內(nèi)存的換入換出,那么操作系統(tǒng)可能會(huì)將 JVM 中不活躍的內(nèi)存頁(yè)換到 SWAP 區(qū)域用以釋放內(nèi)存給線程使用(這也透露出內(nèi)存開(kāi)始不夠用了)。內(nèi)存換入換出是一個(gè)開(kāi)銷巨大的磁盤操作,比內(nèi)存訪問(wèn)慢好幾個(gè)數(shù)量級(jí)。 看一段 GC 日志:耗時(shí) 29.47 秒| 1 2 3 4 5 6 7 8 | {Heap before GC invocations=132 (full 0):? par new generation total 2696384K, used 2696384K [0xfffffffc20010000, 0xfffffffce0010000, 0xfffffffce0010000)? eden space 2247040K, 100% used [0xfffffffc20010000, 0xfffffffca9270000, 0xfffffffca9270000)? from space 449344K, 100% used [0xfffffffca9270000, 0xfffffffcc4940000, 0xfffffffcc4940000)? to space 449344K, 0% used [0xfffffffcc4940000, 0xfffffffcc4940000, 0xfffffffce0010000)? concurrent mark-sweep generation total 9437184K, used 1860619K [0xfffffffce0010000, 0xffffffff20010000, 0xffffffff20010000)? concurrent-mark-sweep perm gen total 1310720K, used 511451K [0xffffffff20010000, 0xffffffff70010000, 0xffffffff70010000)? 2013-07-17T03:58:06.601-0700: 51522.120: [GC Before GC: : 2696384K->449344K(2696384K), 29.4779282 secs] 4557003K->2326821K(12133568K) ,29.4795222 secs] [Times: user=915.56 sys=6.35, real=29.48 secs] |
四. JVM 的 bug
這種原因就根據(jù)自己使用的 JDK 版本去查一下,如果是 JVM 的 bug,升級(jí)到解決的版本一般就能解決。五. 總結(jié)
? ? ???長(zhǎng)時(shí)間停頓問(wèn)題的排查及解決首先需要一定的信息和方法論:- 詳細(xì)的 GC 日志
- 借助 Linux 平臺(tái)下的 iostat、vmstat、netstat、mpstat 等命令監(jiān)控系統(tǒng)情況
- 使用 GCHisto 這個(gè) GC 圖形用戶界面工具,可以統(tǒng)計(jì)出 Minor GC 及 Full GC 頻率及時(shí)長(zhǎng)分布,可參考:http://blog.csdn.net/wenniuwuren/article/details/50760259
- 查看 GC 日志中是否出現(xiàn)了上述的典型內(nèi)存異常問(wèn)題(promotion failed, concurrent mode failure),整體來(lái)說(shuō)把上述兩個(gè)典型內(nèi)存異常情況控制在可接受的發(fā)生頻率即可,對(duì) CMS 碎片問(wèn)題來(lái)說(shuō)杜絕以上問(wèn)題似乎不太可能,只能靠 G1 來(lái)解決了
- 是不是 JVM 本身的 bug 導(dǎo)致的
- 如果程序沒(méi)問(wèn)題,參數(shù)調(diào)了幾次還是不能解決,可能說(shuō)明流量太大,需要加機(jī)器把壓力分散到更多 JVM 上
六. 參考資料
《Java 性能優(yōu)化權(quán)威指南》 https://blogs.oracle.com/poonam/entry/troubleshooting_long_gc_pauses原文出處:?文牛武人
from:?http://www.importnew.com/22886.html
總結(jié)
以上是生活随笔為你收集整理的JVM 调优 —— GC 长时间停顿问题及解决方法的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: 彻底弄懂 HTTP 缓存机制 —— 基于
- 下一篇: 在Spring MVC中使用Apache