Analyzing a memory surge in a .NET cluster management application
Published: 2026/5/23 1:46:12
I. Background

1. The story

A few days ago a friend reached out on WeChat: his program's memory usage had surged, he could not work out why on his own, and he asked me to take a look. I had him capture a dump, and that dump is all we need to cast a reading.

II. Memory surge analysis

1. Why did memory surge?

Where exactly did the memory go? A quick bisection answers that: compare the process-wide picture from !address -summary with the managed picture from !eeheap -gc. The output is as follows:

    0:000> !address -summary

    --- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    Free                                    934     7dfd28767000 ( 125.989 TB)           98.43%
    <unknown>                              2136      2026b0c5000 (   2.009 TB)  99.92%    1.57%
    Heap                                    104        04fac0000 (   1.245 GB)   0.06%    0.00%
    Image                                  1343        015f8a000 ( 351.539 MB)   0.02%    0.00%
    Stack                                   219        006b00000 ( 107.000 MB)   0.01%    0.00%
    Other                                    16        0001e7000 (   1.902 MB)   0.00%    0.00%
    TEB                                      73        000092000 ( 584.000 kB)   0.00%    0.00%
    PEB                                       1        000001000 (   4.000 kB)   0.00%    0.00%

    --- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    MEM_FREE                                934     7dfd28767000 ( 125.989 TB)           98.43%
    MEM_RESERVE                             529      20132f63000 (   2.005 TB)  99.68%    1.57%
    MEM_COMMIT                             3363        1a4926000 (   6.571 GB)   0.32%    0.01%

    0:000> !eeheap -gc
    DATAS
    Number of GC Heaps: 1
    ----------------------------------------
    generation 0 starts at 2e8fc4b9da8
    generation 1 starts at 2e8fc4b99a0
    generation 2 starts at 2e780001000
    ephemeral segment allocation context: none
    Small object heap
         segment        begin    allocated    committed  allocated size          committed size
    02e780000000 02e780001000 02e78fffffd0 02e790000000  0xfffefd0 (268431312)   0x10000000 (268435456)
    ...
    02e8b8150000 02e8b8151000 02e8c814ff58 02e8c8150000  0xfffef58 (268431192)   0x10000000 (268435456)
    02e8e0150000 02e8e0151000 02e8f014ff90 02e8f0150000  0xfffef90 (268431248)   0x10000000 (268435456)
    02ec45c40000 02ec45c41000 02ec55c3fe90 02ec55c40000  0xfffee90 (268430992)   0x10000000 (268435456)
    02e8f0150000 02e8f0151000 02e8fc865dc0 02e8fce40000  0xc714dc0 (208752064)   0xccf0000 (214892544)
    Large object heap starts at 2e790001000
         segment        begin    allocated    committed  allocated size          committed size
    02e790000000 02e790001000 02e7960253a0 02e796046000  0x60243a0 (100811680)   0x6046000 (100950016)
    02e7a29d0000 02e7a29d1000 02e7a47242e8 02e7a4745000  0x1d532e8 (30749416)    0x1d75000 (30887936)
    02e8c8150000 02e8c8151000 02e8dcae0b50 02e8dcae1000  0x1498fb50 (345570128)  0x14991000 (345575424)
    Pinned object heap starts at 2e798001000
         segment        begin    allocated    committed  allocated size          committed size
    02e798000000 02e798001000 02e79806d3f0 02e79806e000  0x6c3f0 (443376)        0x6e000 (450560)
    ------------------------------
    GC Allocated Heap Size:    Size: 0x128e77b30 (4981226288) bytes.
    GC Committed Heap Size:    Size: 0x1294aa000 (4987723776) bytes.

The reading is clear: this is a managed memory problem. The GC has committed 4,987,723,776 bytes (roughly 4.65 GB), which accounts for most of the 6.571 GB under MEM_COMMIT. What next? Look at which objects live on the managed heap and drill down further with !dumpheap -stat. The output is as follows:

    0:000> !dumpheap -stat
    Statistics:
              MT      Count      TotalSize Class Name
    ...
    7fff3d73b460    135,722     60,558,432 System.Windows.EffectiveValueEntry[]
    7fff3d495b58  6,933,877    166,413,048 System.WeakReference
    7fff3df42c28  6,737,027    323,377,296 MS.Internal.Data.DataBindEngineTask
    7fff3d46c878      1,719    347,902,656 System.Collections.Hashtable+bucket[]
    7fff3df40c18  6,749,752    971,964,288 System.Windows.Data.BindingExpression
    02e7e69b9b80  5,342,750  2,859,160,528 Free
    Total 28,339,839 objects, 4,980,676,386 bytes

This reading is interesting: Free takes the lion's share, about 2.86 billion of the 4.98 billion bytes on the heap, which means the managed heap is fragmented. Some readers may wonder what that fragmentation actually looks like. For that we need JetBrains' well-known dotMemory. In its view of the heap, Gen2 is criss-crossed with a large number of small gray segments. That is phantom memory propped up by the Free blocks inside. At this point we know the memory surge is closely tied to Free.

2. Why are there so many Free blocks?

To answer that, look at which objects sit before and after the Free blocks. Here is an arbitrary excerpt:

    0:000> !dumpheap 02e8e0151000 02e8f014ff90
         Address           MT     Size
    02e8e09da8f0 02e7e69b9b80      872 Free
    02e8e09dac58 7fff3df40c18      144
    02e8e09dace8 7fff3d495b58       24
    02e8e09dad00 7fff3df42c28       48
    02e8e09dad30 02e7e69b9b80    1,000 Free
    02e8e09db118 7fff3df40c18      144
    02e8e09db1a8 7fff3d495b58       24
    02e8e09db1c0 7fff3df42c28       48
    02e8e09db1f0 02e7e69b9b80      656 Free
    02e8e09db480 7fff3df40c18      144
    02e8e09db510 7fff3d495b58       24
    02e8e09db528 7fff3df42c28       48
    02e8e09db558 02e7e69b9b80      760 Free
    02e8e09db850 7fff3df40c18      144
    02e8e09db8e0 7fff3d495b58       24
    02e8e09db8f8 7fff3df42c28       48
    02e8e09db928 02e7e69b9b80      608 Free
    02e8e09dbb88 7fff3df40c18      144
    02e8e09dbc18 7fff3d495b58       24
    02e8e09dbc30 7fff3df42c28       48
    02e8e09dbc60 02e7e69b9b80      480 Free
    02e8e09dbe40 7fff3df40c18      144
    02e8e09dbed0 7fff3d495b58       24
    02e8e09dbee8 7fff3df42c28       48
    02e8e09dbf18 02e7e69b9b80      656 Free

The layout is quite regular: each Free block is followed by the same trio of a 144-byte, a 24-byte and a 48-byte object, whose method tables match System.Windows.Data.BindingExpression, System.WeakReference and MS.Internal.Data.DataBindEngineTask in the statistics above. Let's start cutting at the address 02e8e09dbb88 and inspect it with !gcroot.

    0:000> !gcroot 02e8e09dbb88
    Caching GC roots, this may take a while.
    Subsequent runs of this command will be faster.

    HandleTable:
        000002e7e68d1340 (strong handle)
              -> 02e780019858     System.Object[]
              -> 02e780019588     System.Windows.Threading.Dispatcher
              -> 02e7800196c0     System.Windows.Threading.PriorityQueue<System.Windows.Threading.DispatcherOperation>
              -> 02e8fc49c608     System.Windows.Threading.PriorityItem<System.Windows.Threading.DispatcherOperation>
              -> 02e8fc4a5c10     System.Windows.Threading.PriorityItem<System.Windows.Threading.DispatcherOperation>
              -> 02e8fc4a3d00     System.Windows.Threading.PriorityItem<System.Windows.Threading.DispatcherOperation>
              -> 02e8fc4a3bd8     System.Windows.Threading.DispatcherOperation
              -> 02e8fc4a3b98     System.Action
              -> 02e8fc492380     xxx.UiViewModelBase+<>c__DisplayClass375_0
              -> 02e780ed3ea0     xxx.xxx.EFEMViewModel
              -> 02e780ed4518     System.Collections.ObjectModel.ObservableCollection<System.String>
              -> 02e781581de0     System.Collections.Specialized.NotifyCollectionChangedEventHandler
              -> 02e781581c48     System.Windows.Data.ListCollectionView
              -> 02e7804095d8     MS.Internal.Data.DataBindEngine
              -> 02e780409a70     System.Collections.Specialized.HybridDictionary
              -> 02e7869fa948     System.Collections.Hashtable
              -> 02e8c8151020     System.Collections.Hashtable+bucket[]
              -> 02e8e09dbb88     System.Windows.Data.BindingExpression

Taking in the whole reading, we see the familiar Dispatcher, which is the scheduler of the message loop. So the next step is to look inside its PriorityQueue collection. Good grief, 8,949 operations are queued up unprocessed, and that backlog is what fragments gen2. Honestly, this is the first time I have seen this particular consequence. In the past a backlog like this at most made the UI sluggish or froze it outright. Live and learn.

3. Who is pushing so frantically?

To find out, look at what each thread is doing and check what the suspicious threads are calling Invoke through. The output is as follows:

    0:000> ~*e !clrstack
    OS Thread Id: 0x4060 (22)
            Child SP               IP Call Site
    000000F80A67EBD8 00007fffb316e0f4 [HelperMethodFrame: 000000f80a67ebd8] System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32)
    000000F80A67ECE0 00007fff3e830687 System.Threading.WaitHandle.WaitOneNoCheck(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 139]
    000000F80A67ED40 00007fff3e91e335 System.Windows.Threading.DispatcherOperation+DispatcherOperationEvent.WaitOne() [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/DispatcherOperation.cs @ 659]
    000000F80A67EDB0 00007fff3e912dd3 System.Windows.Threading.DispatcherOperation.Wait(System.TimeSpan) [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/DispatcherOperation.cs @ 220]
    000000F80A67EDF0 00007fff3e91ddb7 System.Windows.Threading.Dispatcher.InvokeImpl(System.Windows.Threading.DispatcherOperation, System.Threading.CancellationToken, System.TimeSpan) [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/Dispatcher.cs @ 1384]
    000000F80A67EE80 00007fff3e91da7e System.Windows.Threading.Dispatcher.Invoke(System.Action, System.Windows.Threading.DispatcherPriority, System.Threading.CancellationToken, System.TimeSpan) [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/Dispatcher.cs @ 627]
    000000F80A67EF00 00007fff3e91d7f5 System.Windows.Threading.Dispatcher.Invoke(System.Action) [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/Dispatcher.cs @ 509]
    000000F80A67EF40 00007fff3ebe6acd xxx.UiViewModelBase.GetDataAndUpdate(System.Collections.Generic.IEnumerable`1<System.String>)
    000000F80A67F0A0 00007fff3ebe54ff xxx.UiViewModelBase.Poll()
    000000F80A67F310 00007fff3e1253a7 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 183]
    000000F80A67F380 00007fff3e906d7e System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2333]
    000000F80A67F430 00007fff9c48b32a System.Threading.Tasks.ThreadPoolTaskScheduler+<>c.<.cctor>b__10_0(System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/ThreadPoolTaskScheduler.cs @ 35]
    000000F80A67F460 00007fff9c461d41 System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
    000000F80A67F6F0 00007fff9cd1a573 [DebuggerU2MCatchHandlerFrame: 000000f80a67f6f0]

The stack shows a task running xxx.UiViewModelBase.Poll(), which calls GetDataAndUpdate and then blocks inside Dispatcher.Invoke, waiting for the UI thread to process its operation.

The remaining homework is left to my friend: optimize the code so that the PriorityQueue backlog comes down. Admittedly, he never reported any freezing. Perhaps the program runs unattended, so nobody sees the miserable state of the UI thread.

III. Summary

This production incident added a small embellishment to my dump-analysis journey, and it taught me something new. Looking forward to the next interesting encounter.
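The "Free takes the lion's share" observation from the !dumpheap -stat output in section II.1 can be put into numbers with a few lines of Python. This is only a minimal sketch: the rows are pasted by hand from that output (the elided "..." rows are left out), and the helper names `parse_stat` and `to_int` are hypothetical, not part of SOS or WinDbg.

```python
import re

# Rows copied from the !dumpheap -stat output in section II.1.
# The rows hidden behind "..." in the article are intentionally not included.
DUMPHEAP_STAT = """\
7fff3d73b460    135,722     60,558,432 System.Windows.EffectiveValueEntry[]
7fff3d495b58  6,933,877    166,413,048 System.WeakReference
7fff3df42c28  6,737,027    323,377,296 MS.Internal.Data.DataBindEngineTask
7fff3d46c878      1,719    347,902,656 System.Collections.Hashtable+bucket[]
7fff3df40c18  6,749,752    971,964,288 System.Windows.Data.BindingExpression
02e7e69b9b80  5,342,750  2,859,160,528 Free
Total 28,339,839 objects, 4,980,676,386 bytes
"""

# One statistics row: MT, Count, TotalSize, Class Name.
ROW = re.compile(r"^([0-9a-f]+)\s+([\d,]+)\s+([\d,]+)\s+(.+)$")
# The trailing summary line.
TOTAL = re.compile(r"^Total\s+([\d,]+)\s+objects,\s+([\d,]+)\s+bytes$")


def to_int(text):
    """Convert a number with thousands separators, e.g. '1,719', to int."""
    return int(text.replace(",", ""))


def parse_stat(text):
    """Return ({class name: (count, total bytes)}, heap total bytes)."""
    rows = {}
    total_bytes = None
    for line in text.splitlines():
        line = line.strip()
        total_match = TOTAL.match(line)
        if total_match:
            total_bytes = to_int(total_match.group(2))
            continue
        row_match = ROW.match(line)
        if row_match:
            name = row_match.group(4)
            rows[name] = (to_int(row_match.group(2)), to_int(row_match.group(3)))
    return rows, total_bytes


rows, total_bytes = parse_stat(DUMPHEAP_STAT)
free_count, free_bytes = rows["Free"]

# Share of the managed heap that is Free, and the mean size of a Free block.
free_share = free_bytes / total_bytes
avg_free_block = free_bytes / free_count

print(f"Free share of heap: {free_share:.1%}")
print(f"Average Free block: {avg_free_block:.0f} bytes")
```

With these rows, Free comes out at roughly 57% of the heap and the average Free block at a little over 500 bytes, which is in line with the 480 to 1,000 byte gaps seen in the address-range dump of section II.2.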