Systematic Architecture Level Fault Diagnosis Using Statistical Techniques
Bachelor Thesis by Fabian Keller
Estimated Costs 2012
As reported by Britton et al. [2013]
11.11.2014 STARDUST - Fabian Keller 2
Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
Fault Diagnosis
What is the current practice?
Goal: Pinpoint single or multiple faults
Commonly used techniques:
• System.out.println()
• Symbolic Debugging
• Static Slicing / Dynamic Slicing
There is room for improvement!
Automated Fault Diagnosis
Is it possible?
B1 B2 B3 B4 B5 Error
Test1 1 0 0 0 0 0
Test2 1 1 0 0 0 0
Test3 1 1 1 1 1 0
Test4 1 1 1 1 1 0
Test5 1 1 1 1 1 1
Test6 1 1 1 0 1 0
By intuition, a block is more suspicious if:
• It is involved in failing test cases
• It is not involved in passing test cases
Ranking Metrics
… it is possible

Tarantula: S_T = (#IF / (#IF + #NF)) / (#IF / (#IF + #NF) + #IP / (#IP + #NP))

Jaccard: S_J = #IF / (#IF + #NF + #IP)

Ochiai: S_O = #IF / √((#IF + #NF) ⋅ (#IF + #IP))

#XY counts test cases by involvement and outcome: X ∈ {I: involved, N: not involved}, Y ∈ {F: failing, P: passing}
B1 B2 B3 B4 B5 Error
Test1 1 0 0 0 0 0
Test2 1 1 0 0 0 0
Test3 1 1 1 1 1 0
Test4 1 1 1 1 1 0
Test5 1 1 1 1 1 1
Test6 1 1 1 0 1 0
S_T 0.50 0.56 0.63 0.71 0.63
S_J 0.17 0.20 0.25 0.33 0.25
S_O 0.41 0.45 0.50 0.58 0.50
Ranking: 1. B4   2. B3, B5   3. B2   4. B1
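The metric computation above can be sketched in Java (a minimal illustration, not the tooling used in the thesis; the class and method names are made up): for each block, count how many failing and passing test cases involve it, then plug the counts into the three formulas.

```java
import java.util.Arrays;

public class SbflMetrics {
    // Coverage matrix from the example: rows = Test1..Test6, columns = B1..B5.
    static final int[][] COVERAGE = {
        {1, 0, 0, 0, 0},
        {1, 1, 0, 0, 0},
        {1, 1, 1, 1, 1},
        {1, 1, 1, 1, 1},
        {1, 1, 1, 1, 1},
        {1, 1, 1, 0, 1},
    };
    // Error column: only Test5 fails.
    static final boolean[] FAILED = {false, false, false, false, true, false};

    /** Suspiciousness of every block under the given metric. */
    public static double[] scores(String metric) {
        double[] s = new double[COVERAGE[0].length];
        for (int b = 0; b < s.length; b++) {
            int nIF = 0, nNF = 0, nIP = 0, nNP = 0;
            for (int t = 0; t < COVERAGE.length; t++) {
                boolean involved = COVERAGE[t][b] == 1;
                if (involved && FAILED[t]) nIF++;        // involved, failing
                else if (!involved && FAILED[t]) nNF++;  // not involved, failing
                else if (involved) nIP++;                // involved, passing
                else nNP++;                              // not involved, passing
            }
            double failRatio = nIF / (double) (nIF + nNF);
            double passRatio = nIP / (double) (nIP + nNP);
            switch (metric) {
                case "tarantula": s[b] = failRatio / (failRatio + passRatio); break;
                case "jaccard":   s[b] = nIF / (double) (nIF + nNF + nIP); break;
                case "ochiai":    s[b] = nIF / Math.sqrt((double) (nIF + nNF) * (nIF + nIP)); break;
                default: throw new IllegalArgumentException(metric);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // Reproduces the S_T, S_J and S_O rows of the table.
        System.out.println("Tarantula: " + Arrays.toString(scores("tarantula")));
        System.out.println("Jaccard:   " + Arrays.toString(scores("jaccard")));
        System.out.println("Ochiai:    " + Arrays.toString(scores("ochiai")));
    }
}
```

Running it reproduces the score table: B4 receives the highest suspiciousness under all three metrics.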
Commonly Used Data
And its limiting factors
Software-artifact Infrastructure Repository:
• Siemens set
• space program
Program Faulty versions LOC Test cases Description
print_tokens 7 478 4130 Lexical analyzer
print_tokens2 10 399 4115 Lexical analyzer
replace 32 512 5542 Pattern recognition
schedule 9 292 2650 Priority scheduler
schedule2 10 301 2710 Priority scheduler
tcas 41 141 1608 Altitude separation
tot_info 23 440 1052 Information measure
space 38 6218 13585 Array definition language
Performance Metrics
How can fault localization performance be evaluated?
• Wasted Effort (WE): the number of non-faulty elements inspected before reaching the fault
Ranking: L4, L3, L2, L7, L6, L1, L5, L9, L10, L8
Wasted Effort (prominent bug at L2, ranked third): 2 elements (or 20%)
• Proportion of Bugs Localized (PBL)
Percentage of bugs localized with WE < p%
• Hit@X
Number of bugs localized after inspecting X elements
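The wasted-effort computation can be sketched as follows (an illustrative helper, assuming the location of the fault is known and ties are broken optimistically, i.e. the faulty element is inspected before equally suspicious ones):

```java
public class WastedEffort {
    /**
     * Number of non-faulty elements inspected before reaching the fault,
     * when elements are examined in descending order of suspiciousness.
     * Ties are resolved optimistically in favor of the faulty element.
     */
    public static int wastedEffort(double[] scores, int faultyIndex) {
        int ahead = 0;
        for (double s : scores) {
            if (s > scores[faultyIndex]) ahead++;
        }
        return ahead;
    }

    public static void main(String[] args) {
        // Tarantula scores for B1..B5 from the earlier example; B4 (index 3) is faulty.
        double[] scores = {0.50, 0.56, 0.63, 0.71, 0.63};
        System.out.println(wastedEffort(scores, 3)); // prints 0: B4 is ranked first
    }
}
```

PBL and Hit@X then follow by aggregating this per-bug value over a whole bug data set.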
AspectJ – Lines of Code
Nearly doubled in the examined time span
AspectJ – Commits
Active development with mostly 50+ commits per month
AspectJ – Bugs
Nearly 2500 bugs reported in the examined time span
AspectJ – Data
Less than 40% of the investigated bugs are applicable for SBFL
AspectJ AJDT Sum
All bugs 1544 886 2430
Bugs in iBugs 285 65 350
Classified Bugs 99 11 110
Applicable Bugs 41 1 42
Involved Bugs 20 1 21
What happened?
Bug 36234
Workarounds cannot be used as an evaluation oracle
Bug report: "Getting an out of memory error when compiling with Ajc 1.1 RC1 […]"
[Figure: pre-fix vs. post-fix code]
Bug 61411
Platform-specific bugs are mostly not present in test suites
Bug report: "[…] highlights a problem that I've seen using ajdoc.bat on Windows […]"
[Figure: pre-fix vs. post-fix code]
Bug 151182
Synchronization bugs are mostly not present in test suites
Bug report: "[…] recompiled the aspect using 1.5.2 and tried to run it […], but it fails with a NullPointerException. […]"
[Figure: pre-fix vs. post-fix code]
Research Questions
• RQ1: How does the program size influence fault localization performance?
• RQ2: How many bugs can be found when examining a fixed number of ranked elements?
• RQ3: How does the program size influence suspiciousness scores produced by different ranking metrics?
• RQ4: Are the fault localization performance metrics currently used by the research community valid?
RQ1: Program Size vs. SBFL Performance
Multiple ranked elements are mapped to the same suspiciousness score
RQ4: Are the Performance Metrics Valid?
On average, no bugs can be found in the first 100 lines
RQ4: Are the Performance Metrics Valid?
With luck, 33% of all bugs can be found in the first 1000 lines
Conclusions
There is still some work to be done
• Bugs need more context to be fully understood
• Current metrics cannot be applied to large projects
• SBFL is not feasible for large projects
• New metrics are a starting point for future work
Thank you for your attention!
Questions?
RQ2: Examining a Fixed Number of Ranked Elements
More than 100 files must be inspected to find 50% of all bugs
RQ3: Program Size vs. Suspiciousness
Mean suspiciousness drops for larger programs
WAUC: Weighted Area Under Curve