HeapMD: Identifying Heap-based Bugs using Anomaly Detection
Trishul M. Chilimbi, Microsoft Research, Redmond, WA
Vinod Ganapathy, University of Wisconsin, Madison, WI
ASPLOS XII, October 2006, San Jose, California
Dec 30, 2015
ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 2
A motivating example

    pNewAsset = Initialize(pAssetParams);
    ...
    if (pAssetList->next != NULL) {
        pNewAsset->next = pAssetList->next;
        pAssetList->next = pNewAsset;
    }

[Figure: list diagram showing pAssetList and pNewAsset]
Noteworthy points
• Violation of implicit data-structure invariant
  – Only extremal nodes must have indegree=1
• Malformed, but pointer-correct structure
  – Bug does not necessarily result in a crash
[Figure: list diagram showing pAssetList and pNewAsset]
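A minimal C sketch of why the insertion breaks the invariant. It assumes the list is doubly linked (the slides imply this through the indegree invariant, but the snippet shows only `next`); the `Asset` struct, the `prev` field, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: the slides do not show the real struct.
 * ASSUMPTION: the list is doubly linked, so interior nodes have indegree 2. */
typedef struct Asset {
    struct Asset *next;
    struct Asset *prev;
} Asset;

/* Insertion as on the slide: only the next pointers are updated. */
void insert_buggy(Asset *list, Asset *node) {
    assert(list != NULL && node != NULL);
    if (list->next != NULL) {
        node->next = list->next;
        list->next = node;
    }
}

/* One possible repair under the doubly-linked assumption. */
void insert_fixed(Asset *list, Asset *node) {
    assert(list != NULL && node != NULL);
    if (list->next != NULL) {
        node->next = list->next;
        node->prev = list;
        list->next->prev = node;
        list->next = node;
    }
}

/* Number of next/prev fields among nodes[0..n) that point at target. */
int indegree(Asset *const nodes[], size_t n, const Asset *target) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i]->next == target) count++;
        if (nodes[i]->prev == target) count++;
    }
    return count;
}

/* Number of nodes whose indegree is exactly 1. */
int count_indegree_one(Asset *const nodes[], size_t n) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (indegree(nodes, n, nodes[i]) == 1) count++;
    }
    return count;
}
```

On a three-node list a, b, c, only the two end nodes have indegree 1. After `insert_buggy` places a new node after a, three nodes have indegree 1, including the interior new node, yet every pointer is still valid. After `insert_fixed` the count stays at two.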
Key challenges
• Programmers rarely write down invariants for heap data structures
• Bugs may not be immediately apparent
Can we infer invariants for the heap and use them for bug detection?
Presenting HeapMD
• A tool to monitor the health of the heap
Highlights of results
HeapMD found 40 bugs (31 new) in 5 large commercial applications
Talk outline
• Motivation
• Stability of the heap
• Application to bug-finding
• Related work and conclusion
Heap data
A programmer’s perspective
1. Invisible: no tangible representation
2. Can be arbitrarily modified: akin to self-modifying code
3. Can have arbitrary structure: akin to programming with gotos
If this were true in practice, building large software would be untenable
Heap data
The reality
1. Invisible: no tangible representation
2. Can be arbitrarily modified: often only a small fraction is modified
3. Can have arbitrary structure: in practice, structure is simple
In practice, the heap has a simple, stable structure
Stability of pointer-valued locations
[Figure: bar chart, y-axis 0% to 100%, one bar each for gzip, crafty, mcf, parser, twolf, vpr, gcc, vortex, and the average. Legend: pointer-valued heap locations that store only NULL during their lifetime; that store only one value; that store only two values; that store more than two values. Callout: 91.22%]
Strong evidence that a large fraction of the heap is not modified
Simple structure of the heap
• Most data structures in large programs have low in- and out-degrees
  – Linked lists
  – Trees
  – Hash tables
Can we quantify the simplicity and stability of the heap?
Simple metrics suffice
• % nodes with indegree = 0 (root nodes)
• % nodes with outdegree = 0 (leaf nodes)
• % nodes with indegree = 1
• % nodes with indegree = 2
• % nodes with outdegree = 1
• % nodes with outdegree = 2
• % nodes with indegree = outdegree
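The seven metrics listed above can be computed from an edge list of the heap graph. The following is only a sketch: the `Edge` and `HeapMetrics` types, the fixed `MAX_VERTICES` bound, and the function name are assumptions for illustration, not HeapMD's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation: vertices are heap objects numbered 0..n-1;
 * an edge is a pointer stored in object `from` that targets object `to`. */
typedef struct { size_t from, to; } Edge;

/* The seven degree-based metrics, each as a percentage of all vertices. */
typedef struct {
    double in0, out0;   /* roots and leaves */
    double in1, in2;
    double out1, out2;
    double in_eq_out;
} HeapMetrics;

#define MAX_VERTICES 1024   /* illustrative bound, keeps the sketch allocation-free */

static double pct(size_t count, size_t n) {
    return 100.0 * (double)count / (double)n;
}

/* Returns all-zero metrics for an empty graph. */
HeapMetrics compute_metrics(size_t n, const Edge *edges, size_t n_edges) {
    size_t in[MAX_VERTICES] = {0}, out[MAX_VERTICES] = {0};
    size_t in0 = 0, out0 = 0, in1 = 0, in2 = 0, out1 = 0, out2 = 0, eq = 0;
    HeapMetrics m = {0};

    assert(n <= MAX_VERTICES);
    for (size_t i = 0; i < n_edges; i++) {
        assert(edges[i].from < n && edges[i].to < n);
        out[edges[i].from]++;
        in[edges[i].to]++;
    }
    if (n == 0) return m;

    for (size_t v = 0; v < n; v++) {
        if (in[v] == 0) in0++;
        if (in[v] == 1) in1++;
        if (in[v] == 2) in2++;
        if (out[v] == 0) out0++;
        if (out[v] == 1) out1++;
        if (out[v] == 2) out2++;
        if (in[v] == out[v]) eq++;
    }
    m.in0 = pct(in0, n);   m.out0 = pct(out0, n);
    m.in1 = pct(in1, n);   m.in2 = pct(in2, n);
    m.out1 = pct(out1, n); m.out2 = pct(out2, n);
    m.in_eq_out = pct(eq, n);
    return m;
}
```

For a four-node singly linked list (edges 0→1, 1→2, 2→3) this gives 25% roots, 25% leaves, 75% with indegree 1, 75% with outdegree 1, and 50% with indegree = outdegree.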
Gathering metrics: Setup
• Instrument program (x86) to track metrics
• Execute instrumented program on a set of inputs
• Gather at metric-computation points
  – Function entry points
  – Frequency is tunable: currently 1/100,000
[Figure: three metric plots, titled "% indegree = outdegree", "% indegree = 2", and "% outdegree = 0 (leaves)"]
Algorithm to find stable metrics
[Flowchart]
1. Compute metrics: run the program on the calibration inputs to produce metric reports
2. Determine stability: is the metric stable for 40% of the inputs?
  – No: metric unstable (discard)
  – Yes: find range of metric
3. Result: stable metrics and their ranges
4. Check for range violation on the other 60% of the inputs
Can potentially find bugs exercised by calibration inputs
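The calibration flow above can be sketched as follows. The slides do not define when a metric counts as "stable" on a single input, so this sketch assumes a simple rule: the spread (max minus min) of the metric over the run is at most a tolerance. That rule, the `tolerance` parameter, and all type and function names are assumptions; only the 40% threshold comes from the slide.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double lo, hi; } Range;

/* Samples of one metric gathered during one run (one metric report). */
typedef struct { const double *samples; size_t n; } MetricReport;

/* [min, max] of the samples in a report. */
Range report_range(MetricReport r) {
    assert(r.n > 0);
    Range g = { r.samples[0], r.samples[0] };
    for (size_t i = 1; i < r.n; i++) {
        if (r.samples[i] < g.lo) g.lo = r.samples[i];
        if (r.samples[i] > g.hi) g.hi = r.samples[i];
    }
    return g;
}

/* ASSUMPTION: a metric is stable on a run if its spread is <= tolerance. */
int is_stable_on_run(MetricReport r, double tolerance) {
    Range g = report_range(r);
    return (g.hi - g.lo) <= tolerance;
}

/* Returns 1 and writes the calibrated range to *out if the metric is stable
 * on at least min_percent of the runs (40 on the slide); returns 0 if the
 * metric is unstable and should be discarded. */
int calibrate_metric(const MetricReport *runs, size_t n_runs,
                     double tolerance, unsigned min_percent, Range *out) {
    size_t stable = 0;
    Range acc = { 0.0, 0.0 };
    for (size_t i = 0; i < n_runs; i++) {
        if (!is_stable_on_run(runs[i], tolerance)) continue;
        Range g = report_range(runs[i]);
        if (stable == 0) {
            acc = g;
        } else {
            if (g.lo < acc.lo) acc.lo = g.lo;
            if (g.hi > acc.hi) acc.hi = g.hi;
        }
        stable++;
    }
    if (n_runs == 0 || stable * 100 < (size_t)min_percent * n_runs) return 0;
    *out = acc;
    return 1;
}
```

The runs on which the metric was not stable are not used to build the range; they are the ones that would then be checked against it, which is how calibration inputs themselves can expose bugs.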
Stable metrics exist: SPEC
SPEC b/m   # Inputs   # Stable
twolf      3          6
crafty     3          2
mcf        3          4
vpr        6          1
vortex     5          1
gzip       100        2
parser     100        3
gcc        100        2
% of vertices with Indegree = Outdegree: Range = [14.2%, 17.2%]
Stable metrics exist: Commercial
Benchmark # Inputs # Stable
Multimedia 50 2
Interactive web-app. 50 2
PC-game (simulation) 50 2
PC-game (action) 50 1
Productivity 50 2
Extending the observation
Several degree-based metrics of the heap-graph remain stable as the heap evolves
In fact, we observed that …
Stable heap metrics exist even across different development versions of a program
Metric stability across versions
Benchmark # Versions # Inputs # Stable
Multimedia 5 10 2
Interactive web-app. 5 10 2
PC-game (simulation) 5 10 2
PC-game (action) 5 10 1
Productivity 5 10 2
Talk outline
• Motivation
• Stability of the heap
• Application to bug-finding
• Related work and conclusion
Architecture of HeapMD
[Diagram]
• Binary instrumenter: input.exe → output.exe
• Calibration: calibration input set → metric gatherer → observed metrics → metric analyzer → stable metrics and their ranges
  – Goal: identify a subset of observed metrics as stable metrics
• Bug-finding: "on the field" input set → anomaly detector → metric range violations
  – Goal: check the metrics identified by training for range violation
Finding bugs using HeapMD
Key intuition: range violation → likely bug
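The check itself is small. Here is a sketch of a range-violation test, assuming each stable metric is stored with its calibrated lower and upper bounds; the `StableMetric` type, the function names, and the log format are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A stable metric and its calibrated range (hypothetical layout). */
typedef struct {
    const char *name;
    double lo, hi;
} StableMetric;

/* Returns 1 if the observed value lies outside the calibrated range. */
int violates_range(const StableMetric *m, double observed) {
    return observed < m->lo || observed > m->hi;
}

/* Checks n observed values against their metrics, writes one line per
 * violation to `log`, and returns the number of violations. */
int check_metrics(const StableMetric *metrics, const double *observed,
                  size_t n, FILE *log) {
    int violations = 0;
    assert(log != NULL);
    for (size_t i = 0; i < n; i++) {
        if (violates_range(&metrics[i], observed[i])) {
            fprintf(log, "range violation: %s = %.2f%% not in [%.2f%%, %.2f%%]\n",
                    metrics[i].name, observed[i], metrics[i].lo, metrics[i].hi);
            violations++;
        }
    }
    return violations;
}
```

With a metric calibrated to [14.2%, 17.2%], an observed value of 15.0% passes and 22.0% is reported as a likely bug. A value that stays inside the range is never flagged, even if a bug moved it.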
Recall our motivating example

    pNewAsset = Initialize(pAssetParams);
    ...
    if (pAssetList->next != NULL) {
        pNewAsset->next = pAssetList->next;
        pAssetList->next = pNewAsset;
    }

[Figure: list diagram showing pAssetList and pNewAsset]
Violated invariant: only extremal nodes have indegree=1
Range violation for this example
[Figure: "Indegree = 1 for PC Game (Action)"; x-axis: Execution Progress (1 to 61); y-axis: Percentage of Vertexes (0 to 25)]
Log the call stack for diagnostics
Summary of bugs found
Benchmark              # Bugs   # New bugs   # Invariant violations
Multimedia             8        6            3
Interactive web-app    10       6            5
PC-game (simulation)   9        6            2
PC-game (action)       8        8            3
Productivity           5        5            4
Total                  40       31           17
Kinds of bugs found (1)
• Erroneous insertion into linked list
  – % vertices with indegree = 0, 1
  – % vertices with outdegree = 0, 1
  – % vertices with indegree = outdegree
Kinds of bugs found (2)
• Shared data structure manipulation errors
[Figure: shared data structure diagram, labeled "Delete" and "??"]
  – % vertices with outdegree = 0
  – % vertices with indegree = 2
  – % vertices with indegree = outdegree
Kinds of bugs found (3)
• Data structure invariants
  – % vertices with outdegree = 1, 2
  – % vertices with indegree = 1, 2
Kinds of bugs found (3)
• Data structure invariants
  – % vertices with indegree = outdegree
  – % vertices with indegree = 1
See the paper for several more examples
Comparison with SWAT [Chilimbi and Hauswirth, ASPLOS’04]
                       SWAT           HeapMD
Benchmark              Leaks   #FP    Leaks   #FP   Other bugs
Multimedia             4       0      2       0     6
Interactive web-app.   9       1      4       0     6
PC-Game (simulation)   4       1      3       0     6
Characteristics of bugs found
• Systemic bugs: Repeated often enough to affect heap-graph metric
• HeapMD cannot find “one-off” bugs
  – Temporary data structure invariant violations
False positives and negatives
• In our experiments: 0 false positives
  – Metrics are computed for the whole heap rather than per data structure
  – Anomaly detector only looks for range violations, not stability
Comes at a cost: False negatives
• HeapMD cannot find all bugs
  – Bugs with no effect on heap-graph metrics
  – Bugs that affect heap-graph metrics, but do not violate calibrated range
Talk outline
• Motivation
• Stability of the heap
• Application to bug-finding
• Related work and conclusion
Related work
• Tools for special classes of bugs
  – Purify, Valgrind, SWAT, …
  – Not built for detecting invariant violations
• Tools to find invariants
  – Daikon [Ernst], DIDUCE [Hangal and Lam, ICSE 2002]
  – HeapMD complements these in the types of invariants found
• Shape analysis algorithms and tools
  – Can find invariant violations given a correctness specification
Points to take home
Stable heap-graph metrics exist. Their ranges serve as invariants
Range violation → Bug likely exercised
Can find non-crashing bugs, e.g., invariant violations
Questions?