Tolerating Memory Leaks
Michael D. Bond and Kathryn S. McKinley
Bugs in Deployed Software
Deployed software fails
  Different environment and inputs → different behaviors
  Greater complexity & reliance
Memory Leaks in Deployed Systems
Memory leaks are a real problem
  Managed languages do not eliminate them: the GC frees only unreachable objects, so objects that stay reachable but are dead (never used again) still leak
  [Figure: heap objects classified as live, reachable, and dead]
  Slow & crash real programs
  Unacceptable for some applications
Fixing leaks is hard
  Leaks take time to materialize
  Failure far from cause
Example
Driverless truck
  10,000 lines of C#
Leak: past obstacles remained reachable
No immediate symptoms
  "This problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles."
Quick fix: restart after 40 minutes
Environment sensitive
  More obstacles in deployment
  Unresponsive after 28 minutes
http://www.codeproject.com/KB/showcase/IfOnlyWedUsedANTSProfiler.aspx

Uncertainty in Deployed Software
Unknown leaks; unexpected failures
Online leak diagnosis helps
  Too late to help failing systems
Also tolerate leaks
  Illusion of fix
  Eliminate bad effects: don't slow, don't crash
  Preserve semantics
  Defer out-of-memory (OOM) errors

Predicting the Future
Dead objects not used again
Highly stale objects likely leaked
[Chilimbi & Hauswirth 04] [Qin et al. 05] [Bond & McKinley 06]
[Figure: heap objects classified as live, reachable, and dead]

Tolerating Leaks with Melt
Move highly stale objects to disk
  Much larger than memory
Time & space proportional to live memory
Preserve semantics
Sounds like Paging!
[Figure: in-use objects and stale objects in the heap]
Paging insufficient for managed languages
  Need object granularity: a single page can hold both stale and in-use objects
  GC's working set is all reachable objects, so a full-heap trace touches stale objects too
Bookmarking collection [Hertz et al. 05]
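To make the granularity argument concrete, here is a small Python sketch that counts how many pages hold only stale objects and how many pages a tracing collector would touch. The 4 KB page size, the object layouts, and the `page_stats` helper are all hypothetical; the sketch assumes a page can be evicted cheaply only if every object on it is stale.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes

def page_stats(objects):
    """objects: list of (address, size, is_stale) for every reachable object.

    Returns (stale_bytes, fully_stale_pages, pages_touched_by_gc).
    """
    flags_by_page = defaultdict(list)
    stale_bytes = 0
    for addr, size, stale in objects:
        # Record the object's staleness on every page it overlaps.
        for page in range(addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE + 1):
            flags_by_page[page].append(stale)
        if stale:
            stale_bytes += size
    # Assumption: a page is cheap to evict only if every object on it is stale.
    fully_stale = sum(1 for flags in flags_by_page.values() if all(flags))
    # A tracing GC visits every reachable object, so it touches every one of these pages.
    return stale_bytes, fully_stale, len(flags_by_page)

# 64 objects of 256 bytes laid out contiguously over 4 pages.
interleaved = [(i * 256, 256, i % 2 == 0) for i in range(64)]  # stale and in-use alternate
segregated = [(i * 256, 256, i < 32) for i in range(64)]       # stale objects grouped together

print(page_stats(interleaved))  # (8192, 0, 4)
print(page_stats(segregated))   # (8192, 2, 4)
```

In both layouts half the bytes are stale, yet interleaving leaves no page fully stale, and in either case the collector still visits all four pages. Moving individual objects, and not tracing the space they move to, is what the slides' object-granularity point asks for.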
Challenge #1: How does Melt identify stale objects?
[Figure: roots referencing objects A through F]
GC:
  for all fields a.f:
    a.f |= 0x1;
Application:
  b = a.f;
  if (b & 0x1) {
    b &= ~0x1;
    a.f = b;  [atomic]
  }
The GC tags every reference; the read barrier clears a tag the first time the program loads that reference, so references still tagged at a later GC have not been used since the GC that tagged them
Read barrier adds 6% to application time
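The tagging scheme can be simulated in a few lines of Python. This is a toy sketch, not Melt's implementation: `ToyHeap` and its methods are hypothetical, references are integer addresses whose low bit is free because of 8-byte alignment, roots are treated as untagged, and an object counts as stale after a single GC interval (the slides' "highly stale" policy is not modeled). As a simplification, the collector first traces only untagged references to find in-use objects and treats every other reachable object as stale.

```python
STALE = 0x1  # low bit of a word-aligned reference: "not loaded since the last GC"

class ToyHeap:
    """Toy model: objects are dicts mapping field names to tagged int addresses."""

    def __init__(self):
        self.objects = {}  # address -> {field name: tagged address}
        self.roots = []    # root references (never tagged in this sketch)
        self._next = 8

    def alloc(self):
        addr = self._next
        self._next += 8    # 8-byte alignment leaves bit 0 free for the tag
        self.objects[addr] = {}
        return addr

    def write(self, src, field, target):
        self.objects[src][field] = target  # a freshly stored reference is untagged

    def read(self, src, field):
        """Read barrier: b = a.f; if (b & 0x1) { b &= ~0x1; a.f = b; }"""
        b = self.objects[src][field]
        if b & STALE:
            b &= ~STALE
            self.objects[src][field] = b   # marked [atomic] in the slide's pseudocode
        return b

    def _trace(self, follow_tagged):
        seen, work = set(), list(self.roots)
        while work:
            addr = work.pop()
            if addr in seen:
                continue
            seen.add(addr)
            for ref in self.objects[addr].values():
                if follow_tagged or not (ref & STALE):
                    work.append(ref & ~STALE)
        return seen

    def collect(self):
        """Classify reachable objects, then tag every field: for all fields a.f, a.f |= 0x1."""
        in_use = self._trace(follow_tagged=False)  # reached through untagged references
        stale = self._trace(follow_tagged=True) - in_use
        for addr in in_use | stale:
            fields = self.objects[addr]
            for name in fields:
                fields[name] |= STALE
        return in_use, stale

heap = ToyHeap()
a, b, c = heap.alloc(), heap.alloc(), heap.alloc()
heap.roots.append(a)
heap.write(a, "x", b)
heap.write(a, "y", c)
heap.collect()                 # GC #1: every field is now tagged
heap.read(a, "x")              # program loads a.x, so the barrier clears its tag
in_use, stale = heap.collect()
print(sorted(in_use), sorted(stale))  # [8, 16] [24]
```

Object `c` is reachable only through `a.y`, whose tag the program never cleared, so the second collection reports it as stale.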
Stale Space
Heap nearly full: move stale objects to disk
[Figure: roots, in-use space, and stale space holding objects A through F]
Challenge #2: How does Melt maintain pointers?

Stub-Scion Pairs
[Figure: a stale-space reference to in-use object B goes through B stub (stale space) and B scion (scion space); the scion table maps B to B scion]
References from stale objects to in-use objects go through a stub-scion pair; the collector treats scions as roots, so it never needs to read stale space

Scion-Referenced Object Becomes Stale
[Figure: B and B stub both in stale space; scion space and scion table are empty]
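A minimal Python sketch of stub-scion pairs, under stated assumptions: the `Obj`, `Stub`, `Scion`, and `StaleSpace` classes are hypothetical; "moving to disk" is modeled as membership in a set; following the figure, there is one stub-scion pair per in-use target; and references from in-use objects into stale space stay direct (the read barrier handles them on access). The case where a scion-referenced object itself becomes stale is not modeled.

```python
class Obj:
    """Toy heap object with named reference fields."""
    def __init__(self, name):
        self.name = name
        self.fields = {}

class Scion:
    """Lives in in-use (scion) space; the collector treats it as a root."""
    def __init__(self, target):
        self.target = target

class Stub:
    """Lives in stale space and stands in for one in-use object."""
    def __init__(self, scion):
        self.scion = scion

class StaleSpace:
    def __init__(self):
        self.objects = set()   # objects moved to "disk"
        self.scion_table = {}  # in-use object -> (stub, scion): one pair per target

    def _stub_for(self, target):
        if target not in self.scion_table:
            scion = Scion(target)
            self.scion_table[target] = (Stub(scion), scion)
        return self.scion_table[target][0]

    def move(self, stale_objs):
        """Move stale objects and redirect their pointers to in-use objects through stubs."""
        self.objects.update(stale_objs)
        for obj in stale_objs:
            for name, ref in list(obj.fields.items()):
                if isinstance(ref, Obj) and ref not in self.objects:
                    obj.fields[name] = self._stub_for(ref)

    def gc_roots(self):
        """Scions keep in-use targets alive, so the collector never reads stale space."""
        return [scion.target for _, scion in self.scion_table.values()]

a, b, c, d = Obj("A"), Obj("B"), Obj("C"), Obj("D")
a.fields["f"] = c  # in-use A -> C (C is about to become stale)
c.fields["g"] = b  # C -> in-use B
d.fields["h"] = b  # D -> in-use B
space = StaleSpace()
space.move([c, d])
```

After the move, both stale objects reach `B` through the same stub, and `B` stays alive via a single scion root, while `A` still points straight at `C`.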
Challenge #3: What if the program accesses a highly stale object?

Application Accesses Stale Object
Application:
  b = a.f;
  if (b & 0x1) {
    b &= ~0x1;
    if (inStaleSpace(b))
      b = activate(b);
    a.f = b;  [atomic]
  }
[Figure: accessed object C moves to in-use space; C stub remains in stale space, C scion appears in scion space, and the scion table maps C to C scion]
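The extended barrier can be sketched the same way. In this toy Python model (`ToyMelt`, `Obj`, `Stub`, and the `STALE_BASE` address split are hypothetical), activation copies the object into in-use space, leaves a stub at its old stale-space address for other stale objects that still point there, and records a scion-table entry that keeps the activated object alive; the stub-scion pair is collapsed into a stub holding the new address. As we read the slide's barrier, the `inStaleSpace` check sits inside the tag branch, so the common untagged path pays nothing extra; this works because references into stale space have not been loaded since they were tagged.

```python
from dataclasses import dataclass, field

STALE_TAG = 0x1       # low bit: reference not loaded since the last marking GC
STALE_BASE = 1 << 20  # hypothetical split: addresses >= STALE_BASE are "on disk"

@dataclass
class Obj:
    fields: dict = field(default_factory=dict)  # field name -> (possibly tagged) address

@dataclass
class Stub:
    target: int  # in-use address of an activated object

class ToyMelt:
    def __init__(self):
        self.mem = {}     # address -> Obj or Stub
        self.scions = {}  # in-use address -> old stale address (treated as GC roots)
        self._next = {False: 8, True: STALE_BASE}

    def alloc(self, stale=False):
        addr = self._next[stale]
        self._next[stale] += 8
        self.mem[addr] = Obj()
        return addr

    @staticmethod
    def in_stale_space(addr):
        return addr >= STALE_BASE

    def activate(self, addr):
        """Bring a stale object back into in-use space, leaving a stub at its old address."""
        old = self.mem[addr]
        if isinstance(old, Stub):  # already activated through another reference
            return old.target
        new = self.alloc()
        self.mem[new] = old        # "copy" the object from disk into in-use space
        self.mem[addr] = Stub(new) # stale objects that still point here reach it via the stub
        self.scions[new] = addr    # scion keeps the activated object alive for those pointers
        return new

    def read(self, src, name):
        """b = a.f; if (b & 0x1) { b &= ~0x1; if (inStaleSpace(b)) b = activate(b); a.f = b; }"""
        b = self.mem[src].fields[name]
        if b & STALE_TAG:
            b &= ~STALE_TAG
            if self.in_stale_space(b):
                b = self.activate(b)
            self.mem[src].fields[name] = b  # [atomic] in the slide's pseudocode
        return b

vm = ToyMelt()
a = vm.alloc()                          # in-use object A
c = vm.alloc(stale=True)                # C was moved to stale space earlier
d = vm.alloc(stale=True)                # stale D also refers to C
vm.mem[a].fields["f"] = c | STALE_TAG   # references into stale space still carry the tag
vm.mem[d].fields["g"] = c
new_c = vm.read(a, "f")                 # first access activates C
```

A second `vm.read(a, "f")` finds an untagged in-use address and returns it without touching stale space.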
Implementation
Integrated into Jikes RVM 2.9.2
Works with any tracing collector
Evaluation uses generational copying collector
[Figure: 64-bit (120 GB) and 32-bit (2 GB) configurations; mapping stub]

Performance Evaluation
Methodology
  DaCapo, SPECjbb2000, SPECjvm98
  Dual-core Pentium 4
  Deterministic execution (replay)
Results
  6% overhead (read barriers)
  Stress test: still 6% overhead
  Speedups in tight heaps (reduced GC workload)

Tolerating Leaks
Leak                 Melt's effect
Eclipse Diff         Tolerates until 24-hr limit (1,000X longer)
Eclipse Copy-Paste   Tolerates until 24-hr limit (194X longer)
JbbMod               Tolerates until 20-hr crash (19X longer)
ListLeak             Tolerates until disk full (200X longer)
SwapLeak             Tolerates until disk full (1,000X longer)
MySQL                Some highly stale but in-use (74X longer)
Delaunay Mesh        Short-running
DualLeak             Heap growth is in-use (2X longer)
SPECjbb2000          Heap growth is mostly in-use (2X longer)
Mckoi Database       Thread leak: extra support needed (2X longer)

Eclipse Diff: Reachable Memory
Eclipse Diff: Performance
Related Work
Managed [LeakSurvivor, Tang et al. 08] [Panacea, Goldstein et al. 07, Breitgand et al. 07]
  Don't guarantee time & space proportional to live memory
Native [Cyclic memory allocation, Nguyen & Rinard 07] [Plug, Novark et al. 08]
  Different challenges & opportunities
  Less coverage or change semantics
Orthogonal persistence & distributed GC
  Barriers, swizzling, object faulting, stub-scion pairs
Conclusion
Finding bugs before deployment is hard
Online diagnosis helps developers
  Help users in the meantime
Tolerate leaks with Melt: illusion of fix
  [Figure: stale objects]
  Time & space proportional to live memory
  Preserve semantics
Buys developers time to fix leaks

Thank you!

Backup

Triggering Melt
[Figure: state diagram with states INACTIVE, MARK STALE, MOVE & MARK STALE, and WAIT; transitions labeled Start, Heap not nearly full, Heap full or nearly full, Expected heap fullness, After marking, and Unexpected heap fullness]

Conclusion
Finding bugs before deployment is hard
Online diagnosis helps developers
  To help users in the meantime, tolerate bugs
Tolerate leaks with Melt: illusion of fix
  [Figure: stale objects]

Related Work: Tolerating Bugs
Nondeterministic errors [Atom-Aid] [DieHard] [Grace] [Rx]
  Memory corruption: perturb layout
  Concurrency bugs: perturb scheduling
General bugs
  Ignore failing operations [FOC]
Need higher-level, more proactive approaches

Melt's GC Overhead