记一次 .NET 某电商定向爬虫 内存碎片化分析
一:背景
1. 講故事
上個(gè)月有位朋友wx找到我,說(shuō)他的程序存在內(nèi)存泄漏問(wèn)題,尋求如何解決? 如下圖所示:
從截圖中可以看出,這位朋友對(duì) windbg 的操作還是有些熟悉的,可能缺乏一定的實(shí)操經(jīng)驗(yàn),所以用了幾個(gè)命令之后就不知道怎么排查下去了。
既然找到我,那就以我的個(gè)人經(jīng)驗(yàn)在他的dump上繼續(xù)分析尋找罪魁禍?zhǔn)?#xff0c;閑話(huà)不多說(shuō),上windbg說(shuō)話(huà)。
二:Windbg 分析
1. 真的存在內(nèi)存泄漏嗎?
追這個(gè)系列的朋友應(yīng)該知道,我無(wú)數(shù)次的用 !address -summary 和 !eeheap -gc 這兩個(gè)命令來(lái)判斷當(dāng)前的內(nèi)存泄漏是屬于托管層還是非托管層。
0:000>?!address?-summary---?Usage?Summary?----------------?RgnCount?-----------?Total?Size?--------?%ofBusy?%ofTotal Free????????????????????????????????????358?????7dfc`67f60000?(?125.986?TB)???????????98.43% <unknown>??????????????????????????????1087??????203`88b6e000?(???2.014?TB)??99.99%????1.57% Image??????????????????????????????????1532????????0`09f11000?(?159.066?MB)???0.01%????0.00% Heap????????????????????????????????????249????????0`03453000?(??52.324?MB)???0.00%????0.00% Stack????????????????????????????????????66????????0`01fc0000?(??31.750?MB)???0.00%????0.00% Other????????????????????????????????????10????????0`001d1000?(???1.816?MB)???0.00%????0.00% TEB??????????????????????????????????????22????????0`0002c000?(?176.000?kB)???0.00%????0.00% PEB???????????????????????????????????????1????????0`00001000?(???4.000?kB)???0.00%????0.00%---?Type?Summary?(for?busy)?------?RgnCount?-----------?Total?Size?--------?%ofBusy?%ofTotal MEM_MAPPED??????????????????????????????183??????200`00d06000?(???2.000?TB)??99.30%????1.56% MEM_PRIVATE????????????????????????????1252????????3`8d479000?(??14.207?GB)???0.69%????0.01% MEM_IMAGE??????????????????????????????1532????????0`09f11000?(?159.066?MB)???0.01%????0.00%---?State?Summary?----------------?RgnCount?-----------?Total?Size?--------?%ofBusy?%ofTotal MEM_FREE????????????????????????????????358?????7dfc`67f60000?(?125.986?TB)???????????98.43% MEM_RESERVE?????????????????????????????749??????200`28a9b000?(???2.001?TB)??99.33%????1.56% MEM_COMMIT?????????????????????????????2218????????3`6f5f5000?(??13.740?GB)???0.67%????0.01%0:000>?!eeheap?-gc Number?of?GC?Heaps:?1 generation?0?starts?at?0x00000026DA8DA928 generation?1?starts?at?0x00000026DA7FC348 generation?2?starts?at?0x00000024C4691000 ephemeral?segment?allocation?context:?nonesegment?????????????begin?????????allocated??????????????size 00000024C4690000??00000024C4691000??00000024D468FF28??0xfffef28(268431144) 00000024EECF0000??00000024EECF1000??00000024FECF0000??0xffff000(268431360) 000000248D6F0000??000000248D6F1000??000000249D6EFEF8??0xfffeef8(268431096) ... 00000026D66D0000??00000026D66D1000??00000026DBA3CA30??0x536ba30(87472688) Large?object?heap?starts?at?0x00000024D4691000segment?????????????begin?????????allocated??????????????size 00000024D4690000??00000024D4691000??00000024DC67C318??0x7feb318(134132504) 00000024E60F0000??00000024E60F1000??00000024EE0637C8??0x7f727c8(133638088) 0000002482140000??0000002482141000??000000248A08F338??0x7f4e338(133489464) 00000024A6770000??00000024A6771000??00000024AE76F6C0??0x7ffe6c0(134211264) ... 000000278E6D0000??000000278E6D1000??000000279635F2D0??0x7c8e2d0(130605776) 00000029233E0000??00000029233E1000??000000292AF672F8??0x7b862f8(129524472) 000000292B3E0000??000000292B3E1000??0000002931A5ED60??0x667dd60(107470176) 000000299B3E0000??000000299B3E1000??00000029A20095B0??0x6c285b0(113411504) 000000281E6D0000??000000281E6D1000??0000002825CD3F58??0x7602f58(123744088) 00000028266D0000??00000028266D1000??000000282D5CAD50??0x6ef9d50(116366672) 000000282E6D0000??000000282E6D1000??0000002833CA0880??0x55cf880(89979008) 00000029A33E0000??00000029A33E1000??00000029A684D300??0x346c300(54969088) Total?Size:??????????????Size:?0x353f96d88?(14293757320)?bytes. ------------------------------ GC?Heap?Size:????Size:?0x353f96d88?(14293757320)?bytes.從輸出看,當(dāng)前進(jìn)程占用 MEM_COMMIT=13.7G,托管堆內(nèi)存占用 14293757320 = 13.3G,很明顯這屬于簡(jiǎn)單模式的 托管內(nèi)存泄漏,根據(jù)經(jīng)驗(yàn),托管堆上可能有什么大對(duì)象,這里用 !dumpheap -stat 命令。
0:000>?!dumpheap?-stat Statistics:MT????Count????TotalSize?Class?Name 00007ff9ed6ea268??3956842?????94964208?System.Collections.Generic.Dictionary`2+KeyCollection[[System.String,?System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue,?Serilog]] 00007ff9ed5e6d28??3842435????166405016?Serilog.Parsing.MessageTemplateToken[] 00007ff9ed5e5e28??3842434????184436832?Serilog.Events.MessageTemplate 00007ff9ecccf090??4011012????203304420?System.Int32[] 00007ff9ed647078??3956849????253238336?Serilog.Events.LogEvent 00007ff9ed6e7b48??3956849????284893128?System.Collections.Generic.Dictionary`2[[System.String,?System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue,?Serilog]] 00007ff9ed5e74e8??9259598????296307136?Serilog.Parsing.TextToken 00007ff9ed6471b0?12551808????301243392?Serilog.Events.ScalarValue 00007ff9ed6e8308??3956849????729078048?System.Collections.Generic.Dictionary`2+Entry[[System.String,?System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue,?Serilog]][] 00007ff9eccb1e18?16546412???3987811940?System.String 00000024c3b8faf0????82904???7382993568??????Free我去,托管堆最大的對(duì)象居然是 Free,大概占用 7.3G, 這就🐂👃了,不按套路出牌哈,為了更好的理解,先來(lái)簡(jiǎn)要解釋下 free 對(duì)象。
2. 簡(jiǎn)單解釋 free
其實(shí)簡(jiǎn)而言之,free對(duì)象是被gc標(biāo)記成已回收的空閑塊但并未提交給操作系統(tǒng)釋放的對(duì)象,那怎么去證明呢? 可以先到托管堆上找一個(gè)free塊。
0:000>?!dumpheap?-type?FreeAddress???????????????MT?????Size 00000024c4691000?00000024c3b8faf0???????24?Free 00000024c46a2448?00000024c3b8faf0???????24?Free 00000024c46b26d8?00000024c3b8faf0???????40?Free 00000024c47e4418?00000024c3b8faf0???????40?Free 00000024c4925680?00000024c3b8faf0???????40?Free 00000024c49284a8?00000024c3b8faf0???????64?Free 00000024c4947a90?00000024c3b8faf0??????192?Free 00000024c4951f70?00000024c3b8faf0???????48?Free????????? 000000249d6ea3a8?00000024c3b8faf0??????640?Free用 !do 命令查看是否標(biāo)記為free塊
用 dc 看一下這個(gè) free 塊的內(nèi)容,是否有g(shù)c回收的痕跡。
可以看到,這個(gè)free塊生前是有一些殘留的內(nèi)容字符,好了,對(duì) free 塊有基本了解后,接下來(lái)我們繼續(xù)探究下。
3. 到底是什么阻止了free塊的合并?
按照正常邏輯,大多free塊會(huì)在gc回收完之后合并內(nèi)存時(shí),那些被清空后的segment會(huì)被操作系統(tǒng)釋放的,但這個(gè)dump并沒(méi)有,也就說(shuō)明這里有什么東西阻止了free塊的合并,那到底是什么呢? 有經(jīng)驗(yàn)的朋友會(huì)說(shuō),可以觀察下 gc 的句柄表,命令為 !gchandles -stat 。
0:000>?!gchandles?-stat Statistics:MT????Count????TotalSize?Class?Name ... 00007ff9ed15c0f0?????1008????????72576?System.Reflection.Emit.DynamicResolver 00007ff9ecbf6618???????38???????409344?System.Object[] Total?1784?objectsHandles:Strong?Handles:???????233Pinned?Handles:???????16Async?Pinned?Handles:?18Ref?Count?Handles:????1Weak?Long?Handles:????1327Weak?Short?Handles:???144Dependent?Handles:????45從輸出看,這里并沒(méi)有什么可疑的地方,那怎么辦呢? 實(shí)操經(jīng)驗(yàn)多的話(huà),這里還是有一些經(jīng)驗(yàn)值得分享的,比如觀察 free 在 heap 上的布局特征,往往就有重大發(fā)現(xiàn)。
4. 查看 free 塊的布局特征
為了簡(jiǎn)化輸出結(jié)果,我把范圍限定到 heap 上某一個(gè) segment 上,比如這里的: 00000029233E0000 000000292AF672F8,所以命令就是 !dumpheap 00000029233E0000 000000292AF672F8
0:000>?!dumpheap?00000029233E0000??000000292AF672F8Address???????????????MT?????Size 00000029233e1000?00000024c3b8faf0??8291896?Free 0000002923bc9638?00007ff9eccb1e18???108448????? 0000002923be3dd8?00000024c3b8faf0?29931248?Free 000000292586f4c8?00007ff9eccb1e18???301328????? 00000029258b8dd8?00000024c3b8faf0?41384784?Free 0000002928030928?00007ff9eccb1e18???301328????? 000000292807a238?00000024c3b8faf0??2542664?Free 00000029282e6e80?00007ff9eccb1e18???108448????? 0000002928301620?00000024c3b8faf0?29915032?Free 0000002929f88db8?00007ff9eccb1e18???301328????? 0000002929fd26c8?00000024c3b8faf0??2746688?Free 000000292a271008?00007ff9eccb1e18???291304????? 000000292a2b81f0?00000024c3b8faf0??1019600?Free 000000292a3b10c0?00007ff9eccb1e18???108448????? 000000292a3cb860?00000024c3b8faf0?10601048?Free 000000292ade7ab8?00007ff9eccb1e18???301328????? 000000292ae313c8?00000024c3b8faf0???280808?Free 000000292ae75cb0?00007ff9eccb1e18???280854????? 000000292aeba5c8?00000024c3b8faf0???416584?Free 000000292af20110?00007ff9eccb1e18???291304?????Statistics:MT????Count????TotalSize?Class?Name 00007ff9eccb1e18???????10??????2394118?System.String 00000024c3b8faf0???????10????127130352??????Free Total?20?objects真是一看嚇一跳,free 和 object 呈交替狀,這就是為什么free塊不能被合并的真正原因,說(shuō)實(shí)話(huà)這種教科書(shū)式的 內(nèi)存碎片化 dump,真是可遇不可求,接下來(lái)就抽幾個(gè) free 之間的 object 對(duì)象,看看到底是被什么引用著導(dǎo)致gc回收不掉。
5. 尋找 object 的引用鏈
要想查看 object 到底被誰(shuí)引用著,可以用 !gcroot 命令,這里我抽二個(gè)看看。
0:000>?!gcroot?0000002923bc9638 Thread?1878:00000024C39BE4B0?00007FFA4C0B3522?System.Threading.Tasks.Task.SpinThenBlockingWait(Int32,?System.Threading.CancellationToken)?[/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs?@?2922]rbp+10:?00000024c39be520->??00000024C48AD6E0?System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult,?System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4,?Microsoft.Extensions.Hosting.Abstractions]]->??00000024C48AD200?Microsoft.Extensions.Hosting.Internal.Host->??00000024C48AC538?Microsoft.Extensions.Logging.Logger`1[[Microsoft.Extensions.Hosting.Internal.Host,?Microsoft.Extensions.Hosting]]->??00000024C48AC658?Microsoft.Extensions.Logging.Logger->??00000024C48AC680?Microsoft.Extensions.Logging.LoggerInformation[]->??00000024C48AC6E0?Serilog.Extensions.Logging.SerilogLogger->??00000024C48AC808?Serilog.Core.Logger->??00000024C48AC760?Serilog.Core.Logger->??00000024C47AD468?Serilog.Core.Logger->??00000024C47ABF08?Serilog.Core.Sinks.SafeAggregateSink->??00000024C47ABF20?Serilog.Core.ILogEventSink[]->??00000024C479C548?Serilog.Sinks.Grafana.Loki.LokiSink->??00000024C479C778?Serilog.Sinks.Grafana.Loki.Infrastructure.BoundedQueue`1[[Serilog.Events.LogEvent,?Serilog]]->??00000024C479C7B8?System.Collections.Generic.Queue`1[[Serilog.Events.LogEvent,?Serilog]]->??00000026E8C1A800?Serilog.Events.LogEvent[]->??00000026148D3308?Serilog.Events.LogEvent->??00000026148D4EF0?Serilog.Events.MessageTemplate->??0000002923BC9638?System.StringFound?1?unique?roots?(run?'!gcroot?-all'?to?see?all?roots).0:000>?!gcroot?000000292586f4c8 Thread?1878:00000024C39BE4B0?00007FFA4C0B3522?System.Threading.Tasks.Task.SpinThenBlockingWait(Int32,?System.Threading.CancellationToken)?[/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs?@?2922]rbp+10:?00000024c39be520->??00000024C48AD6E0?System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult,?System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4,?Microsoft.Extensions.Hosting.Abstractions]]->??00000024C48AD200?Microsoft.Extensions.Hosting.Internal.Host->??00000024C48AC538?Microsoft.Extensions.Logging.Logger`1[[Microsoft.Extensions.Hosting.Internal.Host,?Microsoft.Extensions.Hosting]]->??00000024C48AC658?Microsoft.Extensions.Logging.Logger->??00000024C48AC680?Microsoft.Extensions.Logging.LoggerInformation[]->??00000024C48AC6E0?Serilog.Extensions.Logging.SerilogLogger->??00000024C48AC808?Serilog.Core.Logger->??00000024C48AC760?Serilog.Core.Logger->??00000024C47AD468?Serilog.Core.Logger->??00000024C47ABF08?Serilog.Core.Sinks.SafeAggregateSink->??00000024C47ABF20?Serilog.Core.ILogEventSink[]->??00000024C479C548?Serilog.Sinks.Grafana.Loki.LokiSink->??00000024C479C778?Serilog.Sinks.Grafana.Loki.Infrastructure.BoundedQueue`1[[Serilog.Events.LogEvent,?Serilog]]->??00000024C479C7B8?System.Collections.Generic.Queue`1[[Serilog.Events.LogEvent,?Serilog]]->??00000026E8C1A800?Serilog.Events.LogEvent[]->??0000002614BB7AC8?Serilog.Events.LogEvent->??0000002616D3CC40?Serilog.Events.MessageTemplate->??000000292586F4C8?System.StringFound?1?unique?roots?(run?'!gcroot?-all'?to?see?all?roots).從引用鏈看,這些 object 都是通過(guò) Serilog 日志組件發(fā)送給 Grafana.Loki ,通過(guò)對(duì)引用鏈對(duì)象的追蹤,我發(fā)現(xiàn) System.Collections.Generic.Queue 有重大問(wèn)題,截圖如下:
對(duì),這個(gè) queue 居然有 395w 的積壓,到底積壓了什么東西,可以用 !wdo 看一下string內(nèi)容。
看樣子這是一個(gè)失敗重試,分析到這里思路大概就清晰了,就是 Grafana.Loki 或者 Serilog 組件出了什么問(wèn)題,導(dǎo)致日志發(fā)送不到 Loki 中或者發(fā)送速度過(guò)慢,然后不斷的積壓所致,接下來(lái)把分析到的這些信息和朋友做了一個(gè)溝通,截圖如下:
三:總結(jié)
本次內(nèi)存碎片化的主要原因在于 Serilog 對(duì)接 Loki 的過(guò)程中產(chǎn)生的395w的queue積壓所致,但我也只能分析到這里了,至于為什么有積壓,這個(gè)還得朋友進(jìn)一步調(diào)試分析,我相信這個(gè)問(wèn)題很快就能得到解決💪💪💪
END
工作中的你,是否已遇到 ...?
1. CPU爆高
2. 內(nèi)存暴漲
3. 資源泄漏
4. 崩潰死鎖
5. 程序呆滯
等緊急事件,全公司都指望著你能解決...? 危難時(shí)刻才能展現(xiàn)你的技術(shù)價(jià)值,作為專(zhuān)注于.NET高級(jí)調(diào)試的技術(shù)博主,歡迎微信搜索: 一線碼農(nóng)聊技術(shù),免費(fèi)協(xié)助你分析Dump文件,希望我能將你的踩坑經(jīng)驗(yàn)分享給更多的人。
總結(jié)
以上是生活随笔為你收集整理的记一次 .NET 某电商定向爬虫 内存碎片化分析的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: Http Only Cookie保护Ac
- 下一篇: 别等了,全面「远程办公」凉了