Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding of Concurrency Bugs
Sangmin Park, Mary Jean Harrold, Richard Vuduc
Georgia Institute of Technology
Background Limitations Griffin Empirical Study Conclusion
Difficult to Debug and Fix
Time to debug concurrency bugs*
• Hours (28%)
• Days (63%)
• Months (9%)
* P. Godefroid and N. Nagappan. Concurrency at Microsoft: An exploratory survey. (EC)2, 2008.
Incorrect fixes of concurrency bugs*
• Correct (61%)
• Incorrect (39%)
* Z. Yin et al. How do fixes become bugs? A comprehensive characteristic study on incorrect fixes in commercial and open source operating systems. ESEC/FSE 2011.
Existing Techniques
• Automatic fault localization
  • Suspicious pair of interaction [Jin 10]
  • Memory-interaction list [Lucia 11]
  • Memory-access patterns [Park 10, Park 12]
  Limitations
  • Low-level memory accesses
  • Too much spurious information
• Semi-automated fix
  • Atomicity violation fix [Jin 11, Liu 12]
  • Order/atomicity violation fix [Jin 12]
  Limitation
  • Requires developer input
Overview
• Background
• Limitations
• Our technique: Griffin
• Empirical Study
• Conclusion
Concurrency Bugs
• Order violation: a pair of memory accesses executes in an unintended order, causing unintended program behavior
• Atomicity violation: a code region should be atomic but is not, causing unintended program behavior
* Lu et al. Learning from Mistakes: A comprehensive study on real world concurrency bug characteristics. ASPLOS 2008.
Concurrency Bugs: Atomicity Violation
* Example from Java Collection Library (Vector).
[Figure: Thread 1 executes a = new Data(b), a.initObject(b), and b.copyElements(a); Thread 2 executes b.addElement(c) in between. Objects a and b each have fields size and array. Memory accesses shown: R b.size, R b.array, W b.size, W b.array.]
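The interleaving in this example can be replayed deterministically without real threads. The sketch below is illustrative only: the `Data` class, its fields, and the element values are hypothetical stand-ins, not the actual Vector code.

```python
class Data:
    """Toy stand-in for the Vector-like class in the example (hypothetical)."""

    def __init__(self, items=None):
        self.array = list(items or [])
        self.size = len(self.array)


def replay_interleaving():
    b = Data(["k", "k"])
    c = Data(["c", "c"])
    a = Data()

    # Thread 1, first half of the region that should be atomic: R b.size
    a.size = b.size

    # Thread 2 runs in between: W b.array, W b.size
    b.array = b.array + c.array
    b.size += c.size

    # Thread 1, second half: R b.array
    a.array = list(b.array)
    return a


a = replay_interleaving()
print(a.size, len(a.array))  # the recorded size and the array length disagree
```

Because Thread 2's writes land between Thread 1's two reads, `a` ends up with a stale size next to a newer array.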
Problematic Memory-Access Patterns
Patterns identified by Vaziri, Tip, Dolby. POPL 2006.
Notation: R/W = read/write; first subscript = thread; second subscript = statement; (x), (y) = shared variables.

Variables  Type                       Memory-access patterns
Single     Order                      R1,S(x) W2,S(x)
                                      W1,S(x) R2,S(x)
                                      W1,S(x) W2,S(x)
Single     Single-variable atomicity  R1,S1(x) W2,S2(x) R1,S3(x)
                                      W1,S1(x) W2,S2(x) R1,S3(x)
                                      W1,S1(x) R2,S2(x) W1,S3(x)
                                      R1,S1(x) W2,S2(x) W1,S3(x)
                                      W1,S1(x) W2,S2(x) W1,S3(x)
Multiple   Multi-variable atomicity   W1,S1(x) W2,S2(x) W2,S3(y) W1,S4(y)
                                      W1,S1(x) W2,S2(y) W2,S3(x) W1,S4(y)
                                      W1,S1(x) W2,S2(y) W1,S3(y) W2,S4(x)
                                      W1,S1(x) R2,S2(x) R2,S3(y) W1,S4(y)
                                      W1,S1(x) R2,S2(y) R2,S3(x) W1,S4(y)
                                      R1,S1(x) W2,S2(x) W2,S3(y) R1,S4(y)
                                      R1,S1(x) W2,S2(y) W2,S3(x) R1,S4(y)
                                      R1,S1(x) W2,S2(y) R1,S3(y) W2,S4(x)
                                      W1,S1(x) R2,S2(x) W1,S3(y) R2,S4(y)

Fault-localization techniques: record suspicious memory-access patterns and report them in a ranked list (e.g., [Jin 10, Lucia 11, Park 10, Park 12])
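As a rough illustration of how single-variable atomicity patterns could be spotted in a recorded trace, the sketch below slides a window of three consecutive accesses over each variable. The trace format, the function name, and the assumed interleaving order are hypothetical; real tools record accesses differently.

```python
from collections import defaultdict

# The five unserializable single-variable interleavings:
# (access by thread A, access by thread B, access by thread A).
UNSERIALIZABLE = {"RWR", "WWR", "WRW", "RWW", "WWW"}


def find_atomicity_patterns(trace):
    """trace: list of (thread, op, variable, statement) in execution order."""
    per_var = defaultdict(list)
    for thread, op, var, stmt in trace:
        per_var[var].append((thread, op, stmt))

    found = []
    for var, accesses in per_var.items():
        # Look at every run of three consecutive accesses to the same variable.
        for first, second, third in zip(accesses, accesses[1:], accesses[2:]):
            (t1, o1, s1), (t2, o2, s2), (t3, o3, s3) = first, second, third
            kind = o1 + o2 + o3
            if t1 == t3 and t1 != t2 and kind in UNSERIALIZABLE:
                found.append((kind, var, (s1, s2, s3)))
    return found


# Statement numbers follow the Vector example; the interleaving is assumed.
trace = [
    (1, "R", "b.size", 271),
    (2, "W", "b.size", 851),
    (2, "W", "b.array", 852),
    (1, "R", "b.size", 681),
    (1, "R", "b.array", 682),
]
patterns = find_atomicity_patterns(trace)
print(patterns)
```

On this trace the only match is the R-W-R pattern on `b.size` at statements 271, 851, 681.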
L1. Context Information
* Example from Java Collection Library (Vector).
[Figure: dynamic calling context of the Vector example. Thread 1 (a = new Data(b), a.initObject(b), b.copyElements(a)): main, Data, initObject, getSize, copyElements, …; it executes a.size = b.getSize(). Thread 2 (b.addElement(c)): main, addElement, addSize, addArray, …; it executes b.addSize(c.array) and b.addArray(c.array).]
Problems
Existing techniques
• report only low-level memory accesses
• lose context information
L2. Multiple Bugs
* Example from Java Collection Library (Vector).
[Figure: the Vector example (Thread 1: a = new Data(b), a.initObject(b), b.copyElements(a); Thread 2: b.addElement(c)) with accesses R b.size, R b.array, W b.size, W b.array.]
Sample Report
1) RWR – size
2) RWWR – size/array
3) mem-order 3
4) mem-order 4
…
Problem
Existing techniques
• do not handle multiple concurrency bugs
L3. False-positive Patterns
* Example from Java Collection Library (Vector).
[Figure: the Vector example (Thread 1: a = new Data(b), a.initObject(b), b.copyElements(a); Thread 2: b.addElement(c)) with accesses R b.size, R b.array, W b.size, W b.array.]
Sample Report
1) RWR – size
2) RWWR – size/array
3) mem-order 3
4) mem-order 4
…
Problem
Existing techniques
• do not handle false-positive memory accesses
Our Technique: Griffin
1. Fault Localization
2. Test Clustering
3. Bug Reconstruction
Step 1: Fault Localization
Method [Unicorn, ICST 2012*]:
1. Collect pairs of memory accesses in multiple tests
2. Combine pairs into patterns offline
3. Rank patterns by associating patterns with failures
* Park, Vuduc, Harrold [ICST 2012]
15
Step 1: Fault Localization
Thread 2Thread 1
b.addElement(c)
a.initObject(b)
b.copyElements(a)
a = new Data(b)
Generate ranked list of patterns for each failing test
t1RWR 271-851-681
RWWR 271-851-852-682RW 271-851RWR 250-353-252
t2 RWR 271-801-681
RWWR 271-801-802-682RW 271-801RWR 222-453-224t3
t4
RWR 271-851-681
RWWR 271-851-852-682RWR 250-354-253RW 271-851RWR 271-801-681
RWR 222-454-225RW 271-801RWWR 271-801-802-682
Step 2: Test Clustering
Method [Fault-localization-based clustering*]:
1. Create an initial cluster for each failing test with its top p patterns
2. Merge two clusters if their similarity (Jaccard) is at least threshold th
Repeat until no more clusters can be merged
* Jones, Bowring, Harrold [ISSTA 2007]
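A minimal sketch of this clustering loop, run on the four failing tests from the Step 1 example with p = 4 and th = 0.6. Representing a merged cluster by the union of its members' patterns is an assumption made here; the slides do not say how merged clusters are compared, and the function names are illustrative.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tests(top_patterns, th):
    """top_patterns: test name -> set of its top-p patterns. Returns lists of test names."""
    clusters = [({t}, set(p)) for t, p in top_patterns.items()]
    merged = True
    while merged:  # repeat until no pair of clusters can be merged
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if jaccard(clusters[i][1], clusters[j][1]) >= th:
                    # Assumption: a merged cluster keeps the union of patterns.
                    clusters[i] = (clusters[i][0] | clusters[j][0],
                                   clusters[i][1] | clusters[j][1])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return [sorted(tests) for tests, _ in clusters]


top_patterns = {
    "t1": {"RWR 271-851-681", "RWWR 271-851-852-682", "RW 271-851", "RWR 250-353-252"},
    "t2": {"RWR 271-801-681", "RWWR 271-801-802-682", "RW 271-801", "RWR 222-453-224"},
    "t3": {"RWR 271-851-681", "RWWR 271-851-852-682", "RWR 250-354-253", "RW 271-851"},
    "t4": {"RWR 271-801-681", "RWR 222-454-225", "RW 271-801", "RWWR 271-801-802-682"},
}
result = cluster_tests(top_patterns, th=0.6)
print(result)
```

Tests t1 and t3 share 3 of 5 distinct patterns (0.6), as do t2 and t4, so the loop ends with two clusters.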
Step 2: Test Clustering
Cluster by similarity of top patterns (p = 4 and th = 0.6)
[Figure: the top-4 pattern lists of failing tests t1, t2, t3, and t4 from Step 1.]
• t1 and t3 share 3 of 5 distinct patterns: 3/5 = 0.6 ≥ th, so they merge
• t2 and t4 share 3 of 5 distinct patterns: 3/5 = 0.6 ≥ th, so they merge
Result: two clusters of failing executions, {t1, t3} and {t2, t4}
Step 3: Bug Reconstruction
Method:
1. Perform call-stack-based clustering to group true/false positive patterns (agglomerative clustering like Step 2)*
2. Identify suspicious methods and the bug graph
* See the paper for detailed clustering policy
Step 3: Bug Reconstruction
Cluster patterns based on call-stack similarity
• Input: the patterns of test cluster {t1, t3}
  RWR 271-851-681, RWWR 271-851-852-682, RW 271-851, RWR 250-353-252, RWR 250-354-253
• Initial clusters: one cluster per pattern
• Call stacks for RWR 271-851-681 and RWWR 271-851-852-682
  Thread 1: 120 main(), 150 Data(Data c), 270 int getSize()
  Thread 1: 120 main(), 151 Data(Data b), 680 void copyArray(a)
  Thread 2: 130 void run(), 850 void addAll(Data c)
  Common call stacks are the same for both clusters, so merge*
• RW 271-851 is part of RWR 271-851-681, so merge
• Resulting cluster: RWR 271-851-681, RW 271-851, RWWR 271-851-852-682
Identify suspicious methods
• Suspicious method: the method at the top of the common call stack.
* See the paper for detailed clustering policy
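The two merge checks and the suspicious-method rule can be sketched as below. This is one possible reading, not the paper's policy: "the method at the top of the common call stack" is taken to mean the innermost frame shared by all of a thread's call stacks in the cluster. The tables and function names are hypothetical; frames use method names only.

```python
# Call stack (outermost frame first) of each access, keyed by statement number.
STACK_OF = {
    271: ("main", "Data", "getSize"),
    681: ("main", "Data", "copyArray"),
    682: ("main", "Data", "copyArray"),
    851: ("run", "addAll"),
    852: ("run", "addAll"),
}
THREAD_OF = {271: 1, 681: 1, 682: 1, 851: 2, 852: 2}


def stacks(pattern):
    """Set of call stacks touched by a pattern (a tuple of statement numbers)."""
    return frozenset(STACK_OF[s] for s in pattern)


def is_part_of(short, long):
    """True if `short` occurs as a contiguous run of statements inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def common_prefix(call_stacks):
    prefix = []
    for frames in zip(*call_stacks):
        if len(set(frames)) != 1:
            break
        prefix.append(frames[0])
    return tuple(prefix)


def suspicious_methods(cluster):
    """Per thread, the innermost frame common to all of its call stacks."""
    per_thread = {}
    for pattern in cluster:
        for stmt in pattern:
            per_thread.setdefault(THREAD_OF[stmt], set()).add(STACK_OF[stmt])
    result = {}
    for thread, thread_stacks in per_thread.items():
        prefix = common_prefix(list(thread_stacks))
        result[thread] = prefix[-1] if prefix else None
    return result


rwr, rwwr, rw = (271, 851, 681), (271, 851, 852, 682), (271, 851)
same_stacks = stacks(rwr) == stacks(rwwr)  # first merge check
sub_pattern = is_part_of(rw, rwr)          # second merge check
methods = suspicious_methods([rwr, rw, rwwr])
print(same_stacks, sub_pattern, methods)
```

Under this reading, the Data constructor is flagged for Thread 1 and addAll for Thread 2.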
Step 3: Bug Reconstruction
Present bug graph to developer
[Figure: bug graph.
Thread 1: 120 main(), 150 Data(Data c), 270 int getSize(): 271 return size; (R)
Thread 1: 120 main(), 152 Data(Data b), 680 void copyArray(a): 681 a.size = b.size; (R) and 682 a.array = b.array; (R)
Thread 2: 130 void run(), 850 void addAll(Data c): 851 b.size += c.size; (W) and 852 b.array += c.array; (W)]
Empirical Studies
Studies
1. Evaluate effectiveness of finding multiple faults
2. Evaluate effectiveness of explaining the bug
3. Evaluate efficiency of the technique (see paper)
Empirical Setup
• Implemented in Java (Soot) and C++ (Pin)
• Evaluated on a set of subjects
Evaluation: Subjects

Language  Program         KLOC  Num. Bugs  Bug Type
Java      TreeSet-1        7.5  5          Atomicity
Java      TreeSet-2        7.5  3          Atomicity
Java      StringBuffer-1   1.4  4          Atomicity
Java      StringBuffer-2   1.4  1          Atomicity
Java      Vector-1         9.5  4          Atomicity
Java      Vector-2         9.5  2          Atomicity
C++       Mysql-169        331  1          Atomicity
C++       Mysql-791        372  1          Atomicity
C++       NSPR-165586      125  1          Atomicity
C++       PBZip2             2  1          Order
C++       Transmission      90  1          Order
Study 1: Handling Multiple Bugs
Goal
To investigate how well Griffin clusters failing executions responsible for the same bug
Method
• Ran Step 2 of algorithm; p = 30, th = 0.8
• Computed F-measure* values to evaluate effectiveness of clustering algorithm
* F-measure is a standard method to evaluate clustering. See "M. Steinbach et al. A comparison of document clustering techniques. In Wksp, Text Mining, 2000."
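For reference, a sketch of the clustering F-measure as it is commonly defined: for each true class, take the best F1 over all output clusters, then weight by class size. The function name and the sample data are hypothetical, not the study's data.

```python
def clustering_f_measure(classes, clusters):
    """classes, clusters: lists of sets of item ids. Returns the overall F-measure."""
    n = sum(len(c) for c in classes)
    total = 0.0
    for cls in classes:
        best = 0.0
        for clu in clusters:
            overlap = len(cls & clu)
            if overlap == 0:
                continue
            precision = overlap / len(clu)
            recall = overlap / len(cls)
            best = max(best, 2 * precision * recall / (precision + recall))
        total += len(cls) / n * best  # weight each class by its size
    return total


# Hypothetical: four failing runs of one bug, split into two output clusters.
split = clustering_f_measure([{1, 2, 3, 4}], [{1, 2, 3}, {4}])
# Hypothetical: two bugs, each recovered exactly.
exact = clustering_f_measure([{1, 2}, {3}], [{1, 2}, {3}])
print(round(split, 2), round(exact, 2))
```

Splitting the failing runs of a single bug across two clusters lowers recall, which pulls the score below 1.00 even when every cluster is pure.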
Study 1: Handling Multiple Bugs

Program         # Patterns  # Bugs  # Output Clusters  F-measure
TreeSet-1              714       5                  7       0.88
TreeSet-2              656       3                  4       0.91
StringBuffer-1          12       4                  4       1.00
StringBuffer-2           3       1                  1       1.00
Vector-1                18       4                  4       1.00
Vector-2                10       2                  2       1.00
Mysql-169            21834       1                  1       1.00
Mysql-791            71694       1                  2       0.94
NSPR-165586           1479       1                  2       0.86
PBZip2                 427       1                  2       0.96
Transmission           226       1                  1       1.00

• Most F-measures are close to 1.00, which indicates that the clustering is effective
• Manual inspection of the cases with F-measure < 1.00 indicates that clustering is more effective with a lower th, so the parameters may need adjusting
Study 2: Reconstructing Bug Context
Goal
To investigate how well Griffin reconstructs bug context
Method
• Ran Step 3 of algorithm
• Investigated the results
Study 2: Reconstructing Bug Context

Program         # Bugs  # Output Clusters  # False Positives  Suspicious method contains bug  Call stack size
TreeSet-1            5                  5                  0  Y                                             6
TreeSet-2            3                  3                  0  Y                                             6
StringBuffer-1       4                  4                  0  Y                                             1
StringBuffer-2       1                  1                  0  Y                                             1
Vector-1             4                  4                  0  Y                                             1
Vector-2             2                  2                  0  Y                                             1
Mysql-169            1                  2                  1  Y                                             9
Mysql-791            1                  1                  0  Y                                             1
NSPR-165586          1                  1                  0  Y                                             4
PBZip2               1                  1                  0  Y                                             0
Transmission         1                  1                  0  Y                                             7

1. Technique successfully outputs clusters of patterns with false positives
2. Technique successfully locates the bug in the suspicious method
3. Call-stack sizes are greater than 0 in all but one case, which makes it difficult to infer the method with the bug
Future Work
• Perform user studies to determine the usefulness of the technique to developers
• Perform more studies that involve multiple bugs
• Perform studies to give more guidance to select clustering parameters
Contributions
• Fault explanation technique that provides
  • Information about multiple bugs
  • Patterns of true and false positives
  • Visualization in a bug graph
• Empirical results that indicate the effectiveness of fault explanation
  • Effective in grouping concurrency bugs
  • Effective in explaining concurrency bugs
• See www.cc.gatech.edu/~sangminp/issta2013
QUESTIONS?
Backup Slides
Challenges
Engineering issues…
Large context size
Efficient information gathering
Expensive manual inspection
Large number of patterns