ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems
Kenichi Kobayashi (Fujitsu Laboratories), Akihiko Matsuo (Fujitsu Laboratories), Manabu Kamimura (Fujitsu Laboratories), Toshiaki Yoshino (Fujitsu), Yasuhiro Hayase (University of Tsukuba), Katsuro Inoue (Osaka University)
ICSM2011 @ Williamsburg, 2011-09-27
Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
Copyright 2011 FUJITSU LABORATORIES LIMITED
Practitioners’ Point of View
Background
Fault prediction in maintenance is a difficult task, and product metrics alone do not provide sufficient predictive performance. (Product metrics are metrics extracted from the software product itself, such as source code.)
Therefore, process metrics, such as code churn and logical coupling, have been combined with product metrics. (Process metrics are metrics extracted from the software process, such as change histories.)
However, in enterprise maintenance settings, documents, change histories, bug reports, and specialists' knowledge are often lost, out of date, or unusable.
Probabilistic Propagation
We assume that change impact propagates probabilistically from one node to another, as in several ripple-effect studies [Haney72] [Tsantalis05] [Sharafat07].
In this presentation, the propagation probability is always 0.5: the quantity of change impact from the source node is multiplied by the propagation probability at every edge it crosses.
[Figure: change impact spreading from a changed node, attenuated by ×0.5 at each edge]
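The attenuation described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact ImpactScale computation: impact starts at 1.0 on the changed node and is multiplied by the propagation probability 0.5 on every edge it crosses, with negligible quantities cut off.

```python
from collections import deque

def propagate_impact(graph, source, p=0.5, threshold=1e-3):
    """Estimate the quantity of change impact reaching each node.

    graph: dict mapping node -> list of dependent nodes.
    A sketch only: real ImpactScale uses relation-sensitive propagation.
    """
    impact = {source: 1.0}
    queue = deque([(source, 1.0)])
    while queue:
        node, quantity = queue.popleft()
        for nxt in graph.get(node, []):
            q = quantity * p          # attenuate by propagation probability
            if q < threshold:         # stop once the impact is negligible
                continue
            if q > impact.get(nxt, 0.0):
                impact[nxt] = q
                queue.append((nxt, q))
    return impact

g = {"A": ["B"], "B": ["C"], "C": []}
print(propagate_impact(g, "A"))   # {'A': 1.0, 'B': 0.5, 'C': 0.25}
```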
Relation-sensitive Propagation
To avoid overestimation, we use context information to eliminate unlikely propagation. To keep computation time low, we use an edge's relation type as the minimal context information.
Cut Rules determine whether propagation from one node to the next is cut or not, referring to the relation types of the previous and next edges.
We call such controlled propagation relation-sensitive propagation. Its computational complexity is practically low.
[Figure: a Cut Rule refers to the previous relation type and the next relation type at the current node to decide whether to continue to the next node]
Example of Cut Rules
Cut Rule 1: During finding callees, don't find callers.
Cut Rule 2: During finding callers, don't find callees.
Cut Rule 3: Don't find beyond READ edges.
[Figure: propagation from changed nodes, with examples from systems "C" and "F"]
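The three cut rules above amount to a small predicate over the previous and next relation types. A sketch, with hypothetical relation-type names (the paper's actual type set may differ):

```python
# Hypothetical relation types for illustration.
CALL, CALLED_BY, READ = "call", "called-by", "read"

def is_cut(prev_relation, next_relation):
    """Return True if propagation over next_relation should be cut,
    given the relation type of the edge we arrived through."""
    # Cut Rule 1: during finding callees, don't find callers.
    if prev_relation == CALL and next_relation == CALLED_BY:
        return True
    # Cut Rule 2: during finding callers, don't find callees.
    if prev_relation == CALLED_BY and next_relation == CALL:
        return True
    # Cut Rule 3: don't find beyond READ edges.
    if prev_relation == READ:
        return True
    return False

print(is_cut(CALL, CALLED_BY))  # True
print(is_cut(CALL, CALL))       # False
```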
Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
Practically, these Precision/Recall/F1 evaluations are not very useful, because in maintenance the modules estimated as most fault-prone tend to be large. In the case of DS2, for example, the top 10% of high fault-estimated modules account for 24% of the LOC, which is not effort-effective.
Faults are predicted using logistic regression. MET = model without ImpactScale; MET+IS = model with ImpactScale.
Adding IS improves all performance measures, which supports answering RQ1 with YES. All improvements are significant under Wilcoxon's signed-rank test.
Effort-aware Fault Prediction Model
Problem: In maintenance, modules estimated as faulty tend to be large, and a large module needs large effort to be reviewed or tested.
Practitioners' opinion: "Budget and schedule are very demanding. We want to find more faults with less effort." Therefore, effort-effectiveness is our main concern.
We use an "effort-aware model" [Arisholm06] [Menzies10] [Mende10]. It prioritizes modules in order of relative risk to maximize effort-effectiveness:

    relative risk(x) = #errors(x) / Effort(x)

Poisson regression is used to learn the relative risk.
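The prioritization step reduces to sorting modules by that ratio. A minimal sketch with made-up module names and numbers; in the paper the error counts come from a Poisson regression model, not hand-given values:

```python
def prioritize(modules):
    """Order modules by relative risk = #errors(x) / Effort(x), descending,
    so that limited review effort finds the most faults first.

    modules: list of (name, predicted_errors, effort_loc) tuples.
    """
    return sorted(modules, key=lambda m: m[1] / m[2], reverse=True)

mods = [("big.c", 4.0, 8000), ("small.c", 2.0, 500), ("mid.c", 3.0, 2000)]
print([name for name, _, _ in prioritize(mods)])
# ['small.c', 'mid.c', 'big.c'] -- the small module has the highest fault density
```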
AUC is the area under the curve of the lift chart; it shows overall predictive performance (higher is better).
ddr10 is the detected defect rate within the first 10% of effort; it shows predictive performance under limited effort (higher is better).
Practitioners' point of view: in maintenance, budget, schedule, and effort are always limited; therefore ddr10 is the more important measure.
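Both measures can be read off an effort-based cumulative lift curve. A sketch under the assumption that modules are already sorted by predicted risk; function names and the example numbers are illustrative, not the paper's data:

```python
def lift_curve(modules):
    """modules: (effort_loc, faults) tuples, already ordered by priority.
    Returns points (cumulative effort fraction, cumulative fault fraction)."""
    total_effort = sum(e for e, _ in modules)
    total_faults = sum(f for _, f in modules)
    pts, ce, cf = [(0.0, 0.0)], 0, 0
    for effort, faults in modules:
        ce += effort
        cf += faults
        pts.append((ce / total_effort, cf / total_faults))
    return pts

def ddr(points, effort_frac=0.10):
    """Detected defect rate at the given effort fraction (ddr10 = 10%),
    interpolating linearly between lift-curve points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 >= effort_frac:
            return y0 + (y1 - y0) * (effort_frac - x0) / (x1 - x0)
    return 1.0

def auc(points):
    """Trapezoidal area under the lift chart."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = lift_curve([(100, 6), (300, 3), (600, 1)])
print(round(ddr(pts), 2))  # 0.6 -- 60% of faults found in the first 10% of effort
```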
Results of Effort-aware Evaluation
《Effort-based Cumulative Lift Chart of DS1》
[Figure: faults detected vs. effort (LOC inspected) for DS1-MET, DS1-MET+IS, and the optimal ordering; at 10% effort, DS1-MET reaches 0.186 and DS1-MET+IS reaches 0.296]

Performance Measure   DS1-MET   DS1-MET+IS   Improvement by IS
AUC                   0.635     0.680        +0.045
ddr10                 0.186     0.296        ×1.60

All improvements are significant under Wilcoxon's signed-rank test.
Results of Effort-aware Evaluation
《Effort-based Cumulative Lift Chart of DS2》
[Figure: faults detected vs. effort (LOC inspected) for DS2-MET, DS2-MET+IS, and the optimal ordering; at 10% effort, DS2-MET reaches 0.225 and DS2-MET+IS reaches 0.343]

Performance Measure   DS2-MET   DS2-MET+IS   Improvement by IS
AUC                   0.669     0.714        +0.045
ddr10                 0.225     0.343        ×1.53

All improvements are significant under Wilcoxon's signed-rank test.
RQ1: "Does adding ImpactScale to existing product metrics improve predictive performance?" — the answer to RQ1 is YES.
Comparison with Network Measures
Recently, Zimmermann et al. [ICSE08] applied Social Network Analysis (SNA) to a software dependency graph representing relationships between the binary modules of software systems.
Over 50 network measures were used, for example:
• in/out degrees
• network diameter
• closeness
• eigenvector centrality (a.k.a. PageRank), etc.
They and some replication studies [Tosun09] [Nguyen10] reported that these measures work well in some cases.
RQ2: "Does adding ImpactScale to existing product metrics and network measures improve predictive performance?"
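To make the comparison concrete, one of the SNA measures listed above, eigenvector centrality, can be computed by simple power iteration on the dependency graph. A sketch only: it ignores the damping and tie-breaking details of real PageRank implementations.

```python
def eigenvector_centrality(adj, iters=100):
    """Power iteration over an adjacency dict {node: [neighbours]}.
    Each node's score is the normalized sum of the scores of the
    nodes that point to it."""
    nodes = list(adj)
    score = {n: 1.0 for n in nodes}
    for _ in range(iters):
        new = {n: sum(score[m] for m in nodes if n in adj[m]) for n in nodes}
        norm = sum(new.values()) or 1.0   # avoid division by zero
        score = {n: v / norm for n, v in new.items()}
    return score

# In a 3-cycle every module depends on exactly one other, so all
# centralities are equal.
print(eigenvector_centrality({"A": ["B"], "B": ["C"], "C": ["A"]}))
```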
ImpactScale vs. Network Measures
Hierarchical model comparison based on the effort-aware model. Models are learned using Principal Component Poisson Regression.
[Figure: model with existing metrics → +ImpactScale; model with existing metrics → +network measures → +ImpactScale. Adding ImpactScale improves performance in both cases.]
All improvements and deteriorations are significant under Wilcoxon's signed-rank test.
*: P<0.05, **: P<0.01, unmarked: P<0.001
“Does adding ImpactScale to existing product metrics and network measures improve predictive performance?”