Page 1: Survey on Software Defect Prediction

Survey on Software Defect Prediction

- PhD Qualifying Examination -

July 3, 2014
Jaechang Nam

Department of Computer Science and Engineering
HKUST

Page 2: Survey on Software Defect Prediction

2

Outline

• Background
• Software Defect Prediction Approaches
  – Simple metric and defect estimation models
  – Complexity metrics and Fitting models
  – Prediction models
  – Just-In-Time Prediction Models
  – Practical Prediction Models and Applications
  – History Metrics from Software Repositories
  – Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues

Page 3: Survey on Software Defect Prediction

3

Motivation

• General question of software defect prediction
  – Can we identify defect-prone entities (source code file, binary, module, change, ...) in advance?
    • # of defects
    • buggy or clean

• Why?
  – Quality assurance for large software (Akiyama@IFIP`71)
  – Effective resource allocation
    • Testing (Menzies@TSE`07)
    • Code review (Rahman@FSE`11)

Page 4: Survey on Software Defect Prediction

4

Ground Assumption

• The more complex, the more defect-prone

Page 5: Survey on Software Defect Prediction

5

Two Focuses on Defect Prediction

• How complex are the software and its development process?
  – Metrics

• How can we predict whether the software has defects?
  – Models based on the metrics

Page 6: Survey on Software Defect Prediction

6

Prediction Performance Goal

• Recall vs. Precision

• Strong predictor criteria
  – 70% recall and 25% false positive rate (Menzies@TSE`07)
  – Precision, recall, accuracy ≥ 75% (Zimmermann@FSE`09)

Page 7: Survey on Software Defect Prediction

7

Outline

• Background
• Software Defect Prediction Approaches
  – Simple metric and defect estimation models
  – Complexity metrics and Fitting models
  – Prediction models
  – Just-In-Time Prediction Models
  – Practical Prediction Models and Applications
  – History Metrics from Software Repositories
  – Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues

Page 8: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure, 1970s-2010s, with rows for Metrics, Models, and Others. Elements so far: LOC and the Simple Model.]

Page 9: Survey on Software Defect Prediction

9

Identifying Defect-prone Entities

• Akiyama's equation (Akiyama@IFIP`71)
  – # of defects = 4.86 + 0.018 * LOC (LOC = Lines Of Code)
    • About 23 defects per 1 KLOC
    • Derived from actual systems

• Limitation
  – LOC alone is not enough to capture software complexity.
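For illustration, a minimal Python sketch of Akiyama's fitted equation; the module names and LOC values below are hypothetical:

```python
# Akiyama's fitted equation (Akiyama@IFIP`71): defects ~ 4.86 + 0.018 * LOC
def akiyama_defects(loc: int) -> float:
    """Estimate the number of defects in a module from its lines of code."""
    return 4.86 + 0.018 * loc

# Hypothetical modules and sizes (for illustration only)
for name, loc in [("parser.c", 1200), ("ui.c", 300), ("net.c", 5400)]:
    print(f"{name:10s} LOC={loc:5d} -> estimated defects = {akiyama_defects(loc):.1f}")

# Sanity check from the slide: about 23 defects per 1 KLOC
assert round(akiyama_defects(1000)) == 23
```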

Page 10: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure updated with: Cyclomatic Metric, Halstead Metrics, and the Fitting Model.]

Page 11: Survey on Software Defect Prediction

11

Complexity Metrics and Fitting Models

• Cyclomatic complexity metric (McCabe`76)
  – "Logical complexity" of a program represented as a control flow graph
  – V(G) = #edges – #nodes + 2

• Halstead complexity metrics (Halstead`77)
  – Metrics based on the # of operators and operands
  – Volume = N * log2(n), where N is the total number of operators and operands and n is the number of distinct ones
  – # of defects = Volume / 3000
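A minimal sketch of both formulas; the control-flow graph size and operator/operand counts below are hypothetical inputs, not output of any real tool:

```python
import math

def cyclomatic_complexity(num_edges: int, num_nodes: int) -> int:
    """McCabe's V(G) = #edges - #nodes + 2 for a single connected control flow graph."""
    return num_edges - num_nodes + 2

def halstead_volume(total_operators: int, total_operands: int,
                    distinct_operators: int, distinct_operands: int) -> float:
    """Halstead volume V = N * log2(n), with N = total and n = distinct operators + operands."""
    N = total_operators + total_operands
    n = distinct_operators + distinct_operands
    return N * math.log2(n)

def halstead_defect_estimate(volume: float) -> float:
    """Halstead's rough estimate: # of defects = Volume / 3000."""
    return volume / 3000

# Hypothetical program: 9 CFG edges, 8 nodes; 60 operators (15 distinct), 40 operands (10 distinct)
print("V(G) =", cyclomatic_complexity(9, 8))
vol = halstead_volume(60, 40, 15, 10)
print(f"Volume = {vol:.1f}, estimated defects = {halstead_defect_estimate(vol):.3f}")
```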

Page 12: Survey on Software Defect Prediction

12

Complexity Metrics and Fitting Models

• Limitation
  – Do not capture the complexity (amount) of change.
  – Most studies conducted in the 1970s and early 1980s built fitting models, not prediction models.
    • Correlation analysis between metrics and # of defects
      – By linear regression models
    • Models were not validated on new entities (modules).

Page 13: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure updated with: Process Metrics, Prediction Model (Regression), and Prediction Model (Classification).]

Page 14: Survey on Software Defect Prediction

14

Regression Model

• Shen et al.'s empirical study (Shen@TSE`85)
  – Linear regression model
  – Validated on actual new modules
  – Metrics
    • Halstead metrics, # of conditional statements
    • Process metrics
      – Delta of complexity metrics between two successive system versions
  – Measure
    • Between actual and predicted # of defects on new modules
      – MRE (Mean magnitude of relative error)
        » Average of |D - D'| / D over all modules
          • D: actual # of defects
          • D': predicted # of defects
        » MRE = 0.48
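A minimal sketch of the MRE computation described above; the defect counts are hypothetical:

```python
def mean_relative_error(actual, predicted):
    """MRE: average of |D - D'| / D over all modules with at least one actual defect."""
    ratios = [abs(d - d_hat) / d for d, d_hat in zip(actual, predicted) if d > 0]
    return sum(ratios) / len(ratios)

# Hypothetical actual vs. predicted defect counts for five modules
actual    = [10, 4, 7, 2, 12]
predicted = [ 8, 5, 3, 2, 15]
print(f"MRE = {mean_relative_error(actual, predicted):.2f}")
```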

Page 15: Survey on Software Defect Prediction

15

Classification Model

• Discriminant analysis by Munson et al. (Munson@TSE`92)
• Logistic regression
• High-risk vs. low-risk modules
• Metrics
  – Halstead and cyclomatic complexity metrics
• Measures
  – Type I error: false positive rate
  – Type II error: false negative rate
• Result
  – Accuracy: 92% (6 misclassifications out of 78 modules)
  – Precision: 85%
  – Recall: 73%
  – F-measure: 88%

Page 16: Survey on Software Defect Prediction

16

Defect Prediction Process (Based on Machine Learning)

[Process figure: instances with metrics (features) and labels (buggy/clean) are generated from software archives, optionally preprocessed, and used as training instances to build a classification/regression model; the model then predicts labels for new instances ("?").]
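A minimal sketch of this pipeline with scikit-learn; the metric values and labels are hypothetical, and a real study would add proper preprocessing and cross-validation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Training instances: rows = modules, columns = metrics (e.g., LOC, complexity, churn)
# Labels: 1 = buggy, 0 = clean (hypothetical data)
X_train = np.array([[120, 4, 3], [45, 1, 0], [300, 12, 9], [60, 2, 1], [500, 20, 15]])
y_train = np.array([1, 0, 1, 0, 1])

# (Preprocessing) normalize the metrics, then build a classification model
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# New instances with unknown labels ("?") are classified by the trained model
X_new = np.array([[80, 3, 2], [400, 18, 11]])
print(model.predict(scaler.transform(X_new)))        # predicted labels
print(model.predict_proba(scaler.transform(X_new)))  # predicted probabilities
```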

Page 17: Survey on Software Defect Prediction

17

Defect Prediction (Based on Machine Learning)

• Limitations
  – Limited resources for process metrics
    • Error fixes during the unit testing phase were handled informally by individual developers, so no error information is available for this phase. (Shen@TSE`85)
  – Existing metrics were not enough to capture the complexity of object-oriented (OO) programs.
  – Helpful for quality assurance teams but not for individual developers

Page 18: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure updated with: CK Metrics, History Metrics, Just-In-Time Prediction Model, and Practical Model and Applications.]

Page 19: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 20: Survey on Software Defect Prediction

20

Risk Prediction of Software Changes (Mockus@BLTJ`00)

• Logistic regression
• Change metrics
  – LOC added/deleted/modified
  – Diffusion of change
  – Developer experience

• Result
  – Both false positive and false negative rates: 20% in the best case

Page 21: Survey on Software Defect Prediction

21

Risk Prediction of Software Changes (Mockus@BLTJ`00)

• Advantage
  – Showed a feasible model in practice

• Limitations
  – Predictions were made only 3 times per week
    • Not fully Just-In-Time
  – Validated on one commercial system (5ESS switching system software)

Page 22: Survey on Software Defect Prediction

22

BugCache (Kim@ICSE`07)

• Maintain defect-prone entities in a cache
• Approach

• Result
  – Top 10% of files account for 73-95% of defects on 7 systems

Page 23: Survey on Software Defect Prediction

23

BugCache (Kim@ICSE`07)

• Advantages
  – The cache can be updated quickly and cheaply (c.f. static models based on machine learning).
  – Just-In-Time: always available whenever QA teams want the list of defect-prone entities

• Limitations
  – The cache is not reusable for other software projects.
  – Designed for QA teams
    • Applicable only at certain points in time, after a batch of changes (e.g., end of a sprint)
    • Still of limited use for individual developers during the development phase

Page 24: Survey on Software Defect Prediction

24

Change Classification (Kim@TSE`08)

• Classification model based on SVM
• About 11,500 features
  – Change metadata such as changed LOC and change count
  – Complexity metrics
  – Text features from change log messages, source code, and file names

• Results
  – 78% accuracy and 60% recall on average across 12 open-source projects

Page 25: Survey on Software Defect Prediction

25

Change Classification (Kim@TSE`08)

• Limitations
  – Heavy model (11,500 features)
  – Not validated on commercial software products

Page 26: Survey on Software Defect Prediction

26

Follow-up Studies

• Studies addressing these limitations
  – "Reducing Features to Improve Code Change-Based Bug Prediction" (Shivaji@TSE`13)
    • With less than 10% of all features, buggy F-measure improves by 21%.
  – "Software Change Classification using Hunk Metrics" (Ferzund@ICSM`09)
    • 27 hunk-level metrics for change classification
    • 81% accuracy, 77% buggy hunk precision, and 67% buggy hunk recall
  – "A large-scale empirical study of just-in-time quality assurance" (Kamei@TSE`13)
    • 14 process metrics (mostly from Mockus`00)
    • 68% accuracy, 64% recall on 11 open-source and commercial projects
  – "An Empirical Study of Just-In-Time Defect Prediction Using Cross-Project Models" (Fukushima@MSR`14)
    • Median AUC: 0.72

Page 27: Survey on Software Defect Prediction

27

Challenges of the JIT Model

• Practical validation is difficult
  – Only 10-fold cross validation in the current literature
  – No validation on real scenarios
    • e.g., online machine learning

• Huge changes are still difficult to review
  – Fine-grained prediction within a change
    • e.g., line-level prediction

Page 28: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure extended to the 2020s; proposed next steps: Online Learning JIT Model and Fine-grained Prediction.]

Page 29: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 30: Survey on Software Defect Prediction

30

Defect Prediction in Industry

• "Predicting the location and number of faults in large software systems" (Ostrand@TSE`05)
  – Two industrial systems
  – Recall: 86%
  – The 20% most fault-prone modules account for 62% of faults

Page 31: Survey on Software Defect Prediction

31

Case Study for a Practical Model

• "Does Bug Prediction Support Human Developers? Findings From a Google Case Study" (Lewis@ICSE`13)
  – No identifiable change in developer behavior after using the defect prediction model

• Required characteristics, but very challenging
  – Actionable messages / obvious reasoning

Page 32: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure extended to the 2020s; proposed next step: Actionable Defect Prediction.]

Page 33: Survey on Software Defect Prediction

33

Evaluation Measure for Practical Model

• Measure prediction performance based on code review effort

• AUCEC (Area Under Cost Effectiveness Curve)

[Cost-effectiveness curve figure: x-axis = percent of LOC inspected (threshold, e.g., 10%, 50%), y-axis = percent of bugs found, comparing two models M1 and M2.]

Rahman@FSE`11, Bugcache for inspections: Hit or miss?
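A minimal sketch of how AUCEC can be computed, assuming hypothetical per-file LOC, bug counts, and predicted risk scores; files are inspected in decreasing risk order and the area under the resulting curve is approximated with the trapezoidal rule:

```python
def aucec(loc, bugs, risk):
    """Area under the cost-effectiveness curve (% bugs found vs. % LOC inspected)."""
    order = sorted(range(len(loc)), key=lambda i: risk[i], reverse=True)
    total_loc, total_bugs = sum(loc), sum(bugs)
    xs, ys, cum_loc, cum_bugs = [0.0], [0.0], 0, 0
    for i in order:
        cum_loc += loc[i]
        cum_bugs += bugs[i]
        xs.append(cum_loc / total_loc)
        ys.append(cum_bugs / total_bugs)
    # Trapezoidal rule over the cost-effectiveness curve
    return sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs)))

# Hypothetical data: a ranking that finds small, buggy files first scores well above 0.5
loc  = [100, 400, 50, 250]
bugs = [5, 1, 3, 0]
risk = [0.9, 0.2, 0.8, 0.1]
print(f"AUCEC = {aucec(loc, bugs, risk):.2f}")
```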

Page 34: Survey on Software Defect Prediction

34

Practical Applications

• What else can we do with defect prediction models?
  – Test case selection in regression testing (Engstrom@ICST`10)
  – Prioritizing warnings from FindBugs (Rahman@ICSE`14)

Page 35: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 36: Survey on Software Defect Prediction

36

Representative OO Metrics

• CK metrics (Chidamber&Kemerer@TSE`94)

  Metric | Description
  WMC    | Weighted Methods per Class (# of methods)
  DIT    | Depth of Inheritance Tree (# of ancestor classes)
  NOC    | Number of Children
  CBO    | Coupling Between Objects (# of coupled classes)
  RFC    | Response For a Class (WMC + # of methods called by the class)
  LCOM   | Lack of Cohesion in Methods (# of "connected components")

• Prediction performance of CK metrics vs. code metrics (Basili@TSE`96)
  – F-measure: 70% vs. 60%

Page 37: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 38: Survey on Software Defect Prediction

38

Representative History Metrics

  Name                             | # of metrics | Metric source | Citation
  Relative code change churn       | 8            | SW Repo.*     | Nagappan@ICSE`05
  Change                           | 17           | SW Repo.      | Moser@ICSE`08
  Change Entropy                   | 1            | SW Repo.      | Hassan@ICSE`09
  Code metric churn / Code Entropy | 2            | SW Repo.      | D'Ambros@MSR`10
  Popularity                       | 5            | Email archive | Bacchelli@FASE`10
  Ownership                        | 4            | SW Repo.      | Bird@FSE`11
  Micro Interaction Metrics (MIM)  | 56           | Mylyn         | Lee@FSE`11

* SW Repo. = version control system + issue tracking system

Page 39: Survey on Software Defect Prediction

Representative History Metrics

• Advantage
  – Better prediction performance than code metrics

[Bar chart: performance improvement (roughly 0-50%) of all metrics over code complexity metrics for Moser`08 (F-measure), Hassan`09 (F-measure), D'Ambros`10 (absolute prediction error), Bacchelli`10 (Spearman correlation), Bird`11 (Spearman correlation), and Lee`11 (Spearman correlation). *Bird`11's results compare two metrics vs. code metrics; no comparison data available for Nagappan`05.]

Page 40: Survey on Software Defect Prediction

40

History Metrics

• Limitations
  – History metrics do not capture particular program characteristics such as developer social networks, component networks, and anti-patterns.
  – Noisy data
    • Bias in bug-fix datasets (Bird@FSE`09)
  – Not applicable to new projects or projects lacking historical data

Page 41: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure updated with: Other Metrics, Cross-Project Prediction, Universal Model, Cross-Project Feasibility, Noise Reduction, and Semi-supervised/active.]

Page 42: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 43: Survey on Software Defect Prediction

43

Other Metrics

  Name                     | # of metrics | Metric source                  | Citation
  Component network        | 28           | Binaries (Windows Server 2003) | Zimmermann@ICSE`08
  Developer-Module network | 9            | SW Repo.* + Binaries           | Pinzger@FSE`08
  Developer social network | 4            | SW Repo.                       | Meneely@FSE`08
  Anti-pattern             | 4            | SW Repo. + Design patterns     | Taba@ICSM`13

* SW Repo. = version control system + issue tracking system

Page 44: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 45: Survey on Software Defect Prediction

45

Noise Reduction

• Noise detection and elimination algorithm (Kim@ICSE`11)
  – Closest List Noise Identification (CLNI)
    • Based on Euclidean distance between instances
  – Average F-measure improvement
    • 0.504 → 0.621

• ReLink (Wu@FSE`11)
  – Recovers missing links between bugs and changes
  – 60% → 78% recall for missing links
  – F-measure improvement
    • e.g., 0.698 (traditional) → 0.731 (ReLink)

Page 46: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 47: Survey on Software Defect Prediction

47

Defect Prediction for New Software Projects

• Universal Defect Prediction Model
• Semi-supervised / active learning
• Cross-Project Defect Prediction

Page 48: Survey on Software Defect Prediction

48

Universal Defect Prediction Model (Zhang@MSR`14)

• Context-aware rank transformation
  – Transform metric values into ranks from 1 to 10 across all projects.

• Model built from 1,398 projects collected from SourceForge and Google Code
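A minimal sketch of one way such a rank transformation could look; this is an illustrative decile-based version, not necessarily the exact context-aware procedure in Zhang@MSR`14:

```python
def rank_transform(values, num_levels=10):
    """Map raw metric values to ranks 1..num_levels using quantile cut points pooled
    across projects (illustrative; Zhang@MSR`14's scheme is context-aware)."""
    sorted_vals = sorted(values)
    n = len(sorted_vals)
    # Cut points at the 10%, 20%, ..., 90% quantiles
    cuts = [sorted_vals[int(n * q / num_levels)] for q in range(1, num_levels)]
    def to_rank(v):
        return 1 + sum(v >= c for c in cuts)  # rank = 1 + number of cut points at or below v
    return [to_rank(v) for v in values]

# Hypothetical pooled LOC values from many projects
loc_values = [12, 55, 90, 130, 200, 320, 510, 800, 1200, 5000]
print(rank_transform(loc_values))  # ranks between 1 and 10
```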

Page 49: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 50: Survey on Software Defect Prediction

50

Other Approaches for CPDP

• Semi-supervised learning with dimension reduction for defect prediction (Lu@ASE`12)
  – Train a model with a small set of labeled instances together with many unlabeled instances
  – AUC improvement
    • 0.83 → 0.88 with 2% labeled instances

• Sample-based semi-supervised/active learning for defect prediction (Li@AESEJ`12)
  – Average F-measure
    • 0.628 → 0.685 with 10% sampled instances

Page 51: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 52: Survey on Software Defect Prediction

52

Cross-Project Defect Prediction (CPDP)

• For a new project or a project lacking historical data

[Figure: a model is trained on labeled instances from Project A and tested on Project B, whose instances are unlabeled ("?").]

Only 2% of 622 cross-project prediction combinations worked. (Zimmermann@FSE`09)

Page 53: Survey on Software Defect Prediction

Transfer Learning (TL)

[Figure: in traditional machine learning (ML), each domain gets its own learning system trained from scratch; in transfer learning, knowledge from a source learning system is transferred to the target learning system.]

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis

Page 54: Survey on Software Defect Prediction

54

CPDP

• Adopting transfer learning

  Transfer learning | Metric Compensation   | NN Filter                     | TNB                    | TCA+
  Preprocessing     | N/A                   | Feature selection, log-filter | Log-filter             | Normalization
  Machine learner   | C4.5                  | Naive Bayes                   | TNB                    | Logistic Regression
  # of subjects     | 2                     | 10                            | 10                     | 8
  # of predictions  | 2                     | 10                            | 10                     | 26
  Avg. F-measure    | 0.67 (W:0.79, C:0.58) | 0.35 (W:0.37, C:0.26)         | 0.39 (NN:0.35, C:0.33) | 0.46 (W:0.46, C:0.36)
  Citation          | Watanabe@PROMISE`08   | Turhan@ESEJ`09                | Ma@IST`12              | Nam@ICSE`13

* NN = Nearest neighbor, W = Within-project prediction, C = Cross-project prediction

Page 55: Survey on Software Defect Prediction

55

Metric Compensation (Watanabe@PROMISE`08)

• Key idea
  – New target metric value = target metric value * (average source metric value / average target metric value)

[Cartoon: the target data is transformed ("Let me transform like source!") to produce a new target that resembles the source.]
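A minimal sketch of the compensation formula above, applied column-wise to hypothetical source and target metric matrices:

```python
def compensate(target_rows, source_rows):
    """Rescale each target metric so its average matches the source average:
    new_target = target * (avg_source_metric / avg_target_metric)."""
    num_metrics = len(target_rows[0])
    src_avg = [sum(r[j] for r in source_rows) / len(source_rows) for j in range(num_metrics)]
    tgt_avg = [sum(r[j] for r in target_rows) / len(target_rows) for j in range(num_metrics)]
    return [[v * src_avg[j] / tgt_avg[j] for j, v in enumerate(row)] for row in target_rows]

# Hypothetical metric values (rows = modules, columns = metrics)
source = [[100, 10], [200, 30], [300, 20]]   # column averages: [200, 20]
target = [[10, 1], [30, 3]]                  # column averages: [20, 2]
print(compensate(target, source))            # target rescaled to the source scale
```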

Page 56: Survey on Software Defect Prediction

56

Metric Compensation (cont.) (Watanabe@PROMISE`08)

[Comparison table from Page 54 repeated. Metric Compensation column: preprocessing N/A, learner C4.5, 2 subjects, 2 predictions, avg. F-measure 0.67 (W: 0.79, C: 0.58).]

Page 57: Survey on Software Defect Prediction

57

NN Filter (Turhan@ESEJ`09)

• Key idea
  – Nearest neighbor filter
    • Select the 10 nearest source instances of each target instance

[Cartoon: target instances pick similar source instances ("Hey, you look like me! Could you be my model?") to form the new source.]
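A minimal sketch of the NN filter: for each target instance, pick its 10 nearest source instances by Euclidean distance over the metric values and train on the union. The data here is hypothetical and randomly generated:

```python
import math, random

def nn_filter(source, target, k=10):
    """For each target instance, select its k nearest source instances (Euclidean distance);
    the union of the selected instances becomes the filtered training set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = set()
    for t in target:
        nearest = sorted(range(len(source)), key=lambda i: dist(source[i], t))[:k]
        selected.update(nearest)
    return [source[i] for i in sorted(selected)]

# Hypothetical metric vectors (3 metrics per instance)
random.seed(0)
source = [[random.random() for _ in range(3)] for _ in range(100)]
target = [[random.random() for _ in range(3)] for _ in range(5)]
filtered = nn_filter(source, target, k=10)
print(len(filtered), "source instances selected out of", len(source))
```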

Page 58: Survey on Software Defect Prediction

58

NN Filter (cont.) (Turhan@ESEJ`09)

[Comparison table from Page 54 repeated. NN Filter column: preprocessing feature selection + log-filter, learner Naive Bayes, 10 subjects, 10 predictions, avg. F-measure 0.35 (W: 0.37, C: 0.26).]

Page 59: Survey on Software Defect Prediction

Transfer Naive Bayes (Ma@IST`12)

• Key idea
  – Give more weight to source instances that are similar to the target instances when building a Naive Bayes model.

[Cartoon: similar source instances ask to be considered more important ("You will get more chance to be my best model!"), while dissimilar ones matter less ("I'm not that important!").]

Page 60: Survey on Software Defect Prediction

60

Transfer Naive Bayes (cont.) (Ma@IST`12)

• Transfer Naive Bayes
  – New prior probability
  – New conditional probability

Page 61: Survey on Software Defect Prediction

61

Transfer Naive Bayes (cont.) (Ma@IST`12)

• How to find source instances similar to the target
  – A similarity score
    • si = the number of features of source instance i whose values fall within the target's [min, max] range
  – A weight value derived from the score

                | F1 | F2 | F3 | F4 | Score (si)
  Max of target | 7  | 3  | 2  | 5  | -
  src. inst 1   | 5  | 4  | 2  | 2  | 3
  src. inst 2   | 0  | 2  | 5  | 9  | 1
  Min of target | 1  | 2  | 0  | 1  | -

  k = # of features, si = score of instance i
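A minimal sketch of the similarity score in the table above; the weight formula shown is one plausible choice and may differ from the exact formula in Ma@IST`12:

```python
def similarity_score(src_instance, target_min, target_max):
    """s_i = number of features whose value lies within the target's [min, max] range."""
    return sum(lo <= v <= hi for v, lo, hi in zip(src_instance, target_min, target_max))

def weight(s_i, k):
    """Illustrative weight: more in-range features -> larger weight (assumed form,
    not necessarily the exact formula used in Ma@IST`12)."""
    return s_i / (k - s_i + 1)

target_max = [7, 3, 2, 5]
target_min = [1, 2, 0, 1]
for inst in ([5, 4, 2, 2], [0, 2, 5, 9]):   # src. inst 1 and src. inst 2 from the slide
    s = similarity_score(inst, target_min, target_max)
    print(inst, "score =", s, "weight =", round(weight(s, k=4), 2))
```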

Page 62: Survey on Software Defect Prediction

62

Transfer Naive Bayes (cont.) (Ma@IST`12)

[Comparison table from Page 54 repeated. TNB column: preprocessing log-filter, learner TNB, 10 subjects, 10 predictions, avg. F-measure 0.39 (NN: 0.35, C: 0.33).]

Page 63: Survey on Software Defect Prediction

63

TCA+ (Nam@ICSE`13)

• Key idea
  – TCA (Transfer Component Analysis)

[Cartoon: source and target differ ("Oops, we are different! Let's meet in another world!"); TCA maps both into a new space, producing a new source and a new target.]

Page 64: Survey on Software Defect Prediction

64

Transfer Component Analysis (cont.)

• Feature extraction approach
  – Dimensionality reduction
  – Projection
    • Map the original data into a lower-dimensional feature space

[Figure: data in a 2-dimensional feature space projected onto a 1-dimensional feature space.]

Page 65: Survey on Software Defect Prediction

65

TCA (cont.)

[Scatter plot of source domain data and target domain data before TCA.]

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis

Page 66: Survey on Software Defect Prediction

66

TCA (cont.)

[Scatter plot of source and target domain data after applying TCA: the two distributions are brought together in the new feature space.]

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis

Page 67: Survey on Software Defect Prediction

TCA+ (Nam@ICSE`13)

[Cartoon comparing TCA and TCA+: with TCA alone, source and target meet in a new space but "we are still a bit different!"; TCA+ first normalizes the two datasets together ("Normalize US together!") before applying TCA.]

Page 68: Survey on Software Defect Prediction

Normalization Options

• NoN: no normalization applied
• N1: min-max normalization (max = 1, min = 0)
• N2: z-score normalization (mean = 0, std = 1)
• N3: z-score normalization using only the source mean and standard deviation
• N4: z-score normalization using only the target mean and standard deviation
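A minimal sketch of the five options, applied column-wise to hypothetical source and target metric matrices with NumPy:

```python
import numpy as np

def normalize(src, tgt, option):
    """Apply one of the TCA+ normalization options to source and target metric matrices."""
    if option == "NoN":            # no normalization
        return src, tgt
    if option == "N1":             # min-max normalization per dataset (values scaled to [0, 1])
        mm = lambda X: (X - X.min(0)) / (X.max(0) - X.min(0))
        return mm(src), mm(tgt)
    if option == "N2":             # z-score per dataset (mean 0, std 1)
        z = lambda X: (X - X.mean(0)) / X.std(0)
        return z(src), z(tgt)
    if option == "N3":             # z-score using only the source mean and std
        m, s = src.mean(0), src.std(0)
        return (src - m) / s, (tgt - m) / s
    if option == "N4":             # z-score using only the target mean and std
        m, s = tgt.mean(0), tgt.std(0)
        return (src - m) / s, (tgt - m) / s
    raise ValueError(option)

# Hypothetical metric matrices (rows = instances, columns = metrics)
src = np.array([[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]])
tgt = np.array([[100.0, 2.0], [300.0, 8.0]])
print(normalize(src, tgt, "N1")[0])
```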

Page 69: Survey on Software Defect Prediction

69

Preliminary Results using TCA

[Bar chart: F-measure (0-0.8) of Baseline, NoN, N1, N2, N3, and N4 for Project A → Project B and Project B → Project A.]

*Baseline: cross-project defect prediction without TCA and normalization

Prediction performance of TCA varies according to the normalization option!

Page 70: Survey on Software Defect Prediction

70

TCA+: Decision Rules

• Find a suitable normalization option for TCA

• Steps
  – #1: Characterize a dataset
  – #2: Measure the similarity between the source and target datasets
  – #3: Apply decision rules

Page 71: Survey on Software Defect Prediction

71

TCA+: #1. Characterize a Dataset

[Figure: for each dataset (A and B), instances are connected pairwise and the Euclidean distances between them (d1,2, d1,3, d1,5, d2,6, d3,11, ...) are computed.]

DIST_A = {d_ij : 1 ≤ i < n, 1 < j ≤ n, i < j}, i.e., the set of all pairwise distances within dataset A

Page 72: Survey on Software Defect Prediction

72

TCA+: #2. Measure Similarity between Source and Target

• Minimum (min) and maximum (max) values of DIST
• Mean and standard deviation (std) of DIST
• The number of instances

Page 73: Survey on Software Defect Prediction

73

TCA+: #3. Decision Rules

• Rule #1
  – Mean and std are the same → NoN
• Rule #2
  – Max and min are different → N1 (max = 1, min = 0)
• Rules #3, #4
  – Std and # of instances are different → N3 or N4 (src/tgt mean = 0, std = 1)
• Rule #5
  – Default → N2 (mean = 0, std = 1)

Page 74: Survey on Software Defect Prediction

74

TCA+ (cont.) (Nam@ICSE`13)

[Comparison table from Page 54 repeated. TCA+ column: preprocessing normalization, learner logistic regression, 8 subjects, 26 predictions, avg. F-measure 0.46 (W: 0.46, C: 0.36).]

Page 75: Survey on Software Defect Prediction

75

Current CPDP using TL

• Advantages
  – Prediction performance comparable to within-project prediction models
  – Benefits from state-of-the-art TL approaches

• Limitation
  – Performance of some cross-prediction pairs is still poor. (Negative transfer)

[Figure: a source and target pair that are too different from each other.]

Page 76: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 77: Survey on Software Defect Prediction

77

Feasibility Evaluation for CPDP

• Solution for negative transfer
  – Decision tree using project characteristic metrics (Zimmermann@FSE`09)
    • e.g., programming language, # of developers, etc.

Page 78: Survey on Software Defect Prediction

78

Follow-up Studies

• "An investigation on the feasibility of cross-project defect prediction" (He@ASEJ`12)
  – Decision tree using distributional characteristics of a dataset, e.g., mean, skewness, peakedness, etc.

Page 79: Survey on Software Defect Prediction

79

Feasibility for CPDP

• Challenges in current studies
  – Decision trees were not evaluated properly.
    • Just fitting models
  – Low target prediction coverage
    • Only 5 out of 34 target projects were feasible for cross-predictions (He@ASEJ`12)

Page 80: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure extended to the 2020s; proposed next step: Cross-Prediction Feasibility Model.]

Page 81: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure updated with: Personalized Model.]

Page 82: Survey on Software Defect Prediction

82

Cross-prediction Model

• Common challenge
  – Current cross-prediction models are limited to datasets with the same set of metrics.
  – Not applicable to projects with different feature spaces (different domains)
    • NASA dataset: Halstead, LOC
    • Apache dataset: LOC, cyclomatic, CK metrics

[Figure: a source and target dataset with different metric sets.]

Page 83: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure extended to the 2020s; proposed next step: Cross-Domain Prediction.]

Page 84: Survey on Software Defect Prediction

84

Other Topics

Page 85: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure updated with: Data Privacy.]

Page 86: Survey on Software Defect Prediction

86

Other Topics

• Privacy issues with defect datasets
  – MORPH (Peters@ICSE`12)
    • Mutates defect datasets while keeping prediction accuracy
    • Can accelerate cross-project defect prediction with industrial datasets

• Personalized defect prediction model (Jiang@ASE`13)
  – "Different developers have different coding styles, commit frequencies, and experience levels, all of which cause different defect patterns."
  – Results
    • Average F-measure: 0.62 (personalized models) vs. 0.59 (non-personalized models)

Page 87: Survey on Software Defect Prediction

87

Outline

• Background
• Software Defect Prediction Approaches
  – Simple metric and defect estimation models
  – Complexity metrics and Fitting models
  – Prediction models
  – Just-In-Time Prediction Models
  – Practical Prediction Models and Applications
  – History Metrics from Software Repositories
  – Cross-Project Defect Prediction and Feasibility
• Summary and Challenging Issues

Page 88: Survey on Software Defect Prediction

Defect Prediction Approaches

[Timeline figure repeated (no new elements).]

Page 89: Survey on Software Defect Prediction

Next Steps of Defect Prediction

[Timeline figure extended to the 2020s, summarizing all proposed next steps: Online Learning JIT Model, Fine-grained Prediction, Actionable Defect Prediction, Cross-Prediction Feasibility Model, and Cross-Domain Prediction.]

Page 90: Survey on Software Defect Prediction

90

Thank you!

Page 91: Survey on Software Defect Prediction

91

Page 92: Survey on Software Defect Prediction

92

Evaluation Measures (classification)

• Measures for binary classification
  – Confusion matrix

                  Predicted: Buggy    | Predicted: Clean
  Actual: Buggy   True Positive (TP)  | False Negative (FN)
  Actual: Clean   False Positive (FP) | True Negative (TN)

Page 93: Survey on Software Defect Prediction

93

Evaluation Measures (classification)

• False positive rate (FPR, PF) = FP / (TN + FP)
• Accuracy = (TP + TN) / (TP + FP + TN + FN)
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F-measure = (2 * Precision * Recall) / (Precision + Recall)
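A minimal sketch computing these measures directly from a confusion matrix; the counts are hypothetical:

```python
def classification_measures(tp, fp, tn, fn):
    """Standard measures for binary (buggy vs. clean) classification."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # a.k.a. true positive rate
    fpr = fp / (tn + fp)               # false positive rate (PF)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "FPR": fpr,
            "accuracy": accuracy, "F-measure": f_measure}

# Hypothetical confusion matrix: 30 TP, 10 FP, 50 TN, 10 FN
print(classification_measures(tp=30, fp=10, tn=50, fn=10))
```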

Page 94: Survey on Software Defect Prediction

94

Evaluation Measures (classification)

• AUC (Area Under the receiver operating characteristic Curve)

[ROC curve figure: true positive rate vs. false positive rate, both ranging from 0 to 1.]

Page 95: Survey on Software Defect Prediction

95

Evaluation Measures (classification)

• AUCEC (Area Under the Cost Effectiveness Curve)

[Cost-effectiveness curve figure repeated from Page 33: percent of bugs found vs. percent of LOC inspected, comparing models M1 and M2 at thresholds such as 10% and 50%.]

Rahman@FSE`11, Bugcache for inspections: Hit or miss?

Page 96: Survey on Software Defect Prediction

96

Evaluation Measures (regression)

• Targets
  – Metric values vs. the number of bugs
  – Actual vs. predicted number of bugs

• Correlation coefficients
  – Spearman / Pearson / R^2

• Mean squared error

Page 97: Survey on Software Defect Prediction

97

CK Metrics

  Metric | Description
  WMC    | Weighted Methods per Class (# of methods)
  DIT    | Depth of Inheritance Tree (# of ancestor classes)
  NOC    | Number of Children
  CBO    | Coupling Between Objects (# of coupled classes)
  RFC    | Response For a Class (WMC + # of methods called by the class)
  LCOM   | Lack of Cohesion in Methods (# of "connected components")