
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Classifier

Feb 11, 2017


Feng Zhang
Transcript
Page 1: Cross-project Defect Prediction Using A Connectivity-based Unsupervised Classifier

Cross-project Defect Prediction Using a Connectivity-based Unsupervised Classifier

Feng Zhang, Quan Zheng, Ying Zou, Ahmed E. Hassan

Defect prediction

Within-project defect prediction: past data from a project (training) is used to build a model, which then predicts defects in new data (target) from the same project.

However, historical data may not be available for the target project.

Cross-project defect prediction: other projects are used as training data for the target project.

Cross-project defect prediction: a supervised classifier is trained on the software metrics and defect data of a training project, then applied to the software metrics of a target project to predict the defect proneness of its entities.

Heterogeneity across projects (ICSM 2013)

In this supervised setting, heterogeneity arises between the training project and the target project.

Our previous solution (MSR 2014) tackled this heterogeneity for supervised classifiers.

How about using unsupervised classifiers?

An unsupervised classifier does not need a training project with defect data: it works directly on the software metrics of the target project to predict defect proneness, which avoids the heterogeneity between training and target projects.

Initial attempts using K-means were not very successful.
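For reference, K-means groups entities purely by Euclidean distance between their metric vectors. The sketch below is a minimal two-cluster Lloyd's algorithm with NumPy; the function name `kmeans_two_clusters`, the deterministic initialisation (first and last rows as starting centroids), and the toy data are assumptions for illustration, not the configuration used in the study.

```python
import numpy as np


def kmeans_two_clusters(X, n_iter=100):
    """Split the rows (entities) of X into two clusters by Euclidean distance.

    Plain Lloyd's algorithm. Hypothetical helper: the two centroids start at the
    first and last rows of X (instead of random seeds) so the result is reproducible.
    """
    X = np.asarray(X, dtype=float)
    centroids = X[[0, -1]].copy()
    labels = None
    for _ in range(n_iter):
        # Distance from every entity to each centroid, shape (n_entities, 2).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members (unchanged if a cluster is empty).
        for k in range(2):
            members = X[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return labels


# Toy metric vectors (hypothetical): three small entities and three large ones.
toy = [[1, 1], [1.2, 0.9], [0.8, 1.1], [10, 10], [10.5, 9.5], [9.8, 10.2]]
print(kmeans_two_clusters(toy))  # [0 0 0 1 1 1]
```

Because only distance matters here, two entities end up in different clusters whenever they are far apart, regardless of how they relate to the entities in between.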

Clustering methods such as K-means rely on distance: entities separated by a short distance are grouped together, while entities separated by a long distance end up in different clusters. Another option is to look at the connections between entities.

Social network

Far away in distance but may be connected! Connection is more important than distance.
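The "far away in distance but connected" idea can be made concrete with a toy example: the two ends of a chain of entities are far apart in Euclidean distance, yet each entity is close to its neighbour, so the ends are connected through the chain. This is illustrative only; the radius-based neighbourhood graph and the function name `connected` are assumptions, not part of the approach described in the slides.

```python
import math
from collections import deque


def connected(points, start, goal, radius):
    """Return True if `goal` is reachable from `start` in the graph that links
    every pair of points closer than `radius` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == goal:
            return True
        for j, p in enumerate(points):
            if j not in seen and math.dist(points[i], p) < radius:
                seen.add(j)
                queue.append(j)
    return False


# Hypothetical entities on a chain: each is close to its neighbour only.
chain = [(float(x), 0.0) for x in range(6)]
print(math.dist(chain[0], chain[-1]))      # 5.0 -> far apart in distance
print(connected(chain, 0, 5, radius=1.5))  # True -> yet connected through the chain
```

A distance-based method would tend to separate the two ends, whereas a connectivity-based view keeps them together.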


Are defective software entities connected to each other?

Within-community and cross-community connections

Connections within a community are stronger, while connections across communities are weaker. Defective entities tend to connect to other defective entities.
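One way to check whether defective entities connect to each other is to compare average connection weights within and across the two communities. The sketch below assumes a symmetric weighted adjacency matrix `W` (for example, a similarity between entities' metric vectors) and known defect labels; the function name `community_connection_strength` and the toy matrix are hypothetical.

```python
import numpy as np


def community_connection_strength(W, is_defective):
    """Mean edge weight within the defective community, within the clean
    community, and across the two (self-loops on the diagonal are ignored)."""
    W = np.asarray(W, dtype=float)
    d = np.asarray(is_defective, dtype=bool)
    off_diagonal = ~np.eye(len(d), dtype=bool)

    def mean_weight(rows, cols):
        mask = rows[:, None] & cols[None, :] & off_diagonal
        return float(W[mask].mean()) if mask.any() else float("nan")

    return {
        "defective-defective": mean_weight(d, d),
        "clean-clean": mean_weight(~d, ~d),
        "defective-clean": mean_weight(d, ~d),
    }


# Hypothetical symmetric similarity matrix for four entities; the first two are defective.
W_toy = np.array([
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.2, 0.1, 0.8, 0.0],
])
strength = community_connection_strength(W_toy, [True, True, False, False])
print(strength)
```

In this toy matrix both within-community averages (0.9 and 0.8) exceed the cross-community average (0.15), which is the pattern the slide describes.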

Our connectivity-based unsupervised approach

Consider each entity (file/class) as a node.

Step 1. Compute software metrics.
Step 2. Build a graph based on the similarity between entities.
Step 3. Make a bipartition of the graph.
Step 4. Label the defective cluster (defective vs. clean).
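The four steps can be sketched in Python with NumPy under explicit assumptions: metrics are z-score normalised; the similarity between two entities is the dot product of their normalised metric vectors, with negative values set to zero; the bipartition uses the sign of the eigenvector for the second-smallest eigenvalue of the symmetric normalised graph Laplacian (standard normalised spectral clustering); and the cluster with the larger average sum of normalised metrics is labelled defective. These choices, the function name `spectral_defect_classifier`, and the toy data are illustrative; the R code in the paper is the authoritative version.

```python
import numpy as np


def spectral_defect_classifier(metrics):
    """Sketch of a connectivity-based unsupervised defect classifier.

    `metrics` has one row per entity (file/class) and one column per metric.
    Returns a boolean array: True marks entities in the cluster labelled defective.
    """
    X = np.asarray(metrics, dtype=float)

    # Step 1: z-score each metric so metrics on different scales are comparable.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # assumption: constant metrics simply become all zeros
    Z = (X - X.mean(axis=0)) / std

    # Step 2: similarity graph. Edge weight = dot product of normalised metric
    # vectors; negative similarities are dropped and self-loops removed.
    W = Z @ Z.T
    W[W < 0] = 0.0
    np.fill_diagonal(W, 0.0)

    # Step 3: bipartition via the symmetric normalised Laplacian
    # L_sym = I - D^(-1/2) W D^(-1/2), using the eigenvector of its
    # second-smallest eigenvalue (np.linalg.eigh sorts eigenvalues ascending).
    degree = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(degree)
    d_inv_sqrt[degree > 0] = 1.0 / np.sqrt(degree[degree > 0])
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)
    v = d_inv_sqrt * eigvecs[:, 1]
    cluster = v > 0

    # Step 4: label the cluster whose entities have larger normalised metric
    # values (on average) as defective; only relabel if both clusters are non-empty.
    row_sums = Z.sum(axis=1)
    if cluster.any() and (~cluster).any():
        if row_sums[cluster].mean() < row_sums[~cluster].mean():
            cluster = ~cluster
    return cluster


# Hypothetical toy data: two metrics for eight entities. The first three have
# larger values, the next three smaller ones, and the last two are mixed
# (one metric above average, one below).
toy = [[13, 11], [11, 13], [12, 12], [7, 9], [9, 7], [8, 8], [11, 9], [9, 11]]
print(spectral_defect_classifier(toy))
```

No training project or defect data is involved: the function only sees the target project's metrics. The first three entities land in the defective cluster and the next three in the clean one; the two mixed entities sit near zero in the eigenvector, so their side of the split is not meaningful in this toy example.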

17 lines of R code are provided in the paper.


Looks simple? Does it really work?

Research questions

RQ1. How does the spectral clustering based classifier perform in cross-project defect prediction?

RQ2. Does the spectral clustering based classifier perform well in within-project defect prediction?

Subject projects (Total: 26)

AEEEM (5 projects): Equinox, JDT, Lucene, Mylyn, PDE
NASA (11 projects): CM1, JM1, KC3, MC1, MC2, MW1, PC1, PC2, PC3, PC4, PC5
PROMISE (10 projects): Ant, Camel, Ivy, Jedit, Log4j, Lucene, POI, Tomcat, Xalan, Xerces

Classifiers for comparison (Total: 9)

Unsupervised
1. K-means clustering (KM)
2. Partition around medoids (PAM)
3. Fuzzy C-means (FCM)
4. Neural-gas (NG)

Supervised
1. Random forest (RF)
2. Naïve Bayes (NB)
3. Logistic regression (LR)
4. Decision tree (DT)
5. Logistic model tree (LMT)

RQ1. How does the spectral clustering based classifier perform in cross-project defect prediction?

For each data set (NASA, AEEEM, PROMISE), the average AUC of each classifier is computed, and the classifiers are then ranked using the Scott-Knott test.
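AUC is the evaluation measure here. A short sketch of how it can be computed from per-entity defect-proneness scores, using the standard rank (Mann-Whitney) interpretation of AUC; the function name `auc` and the toy scores are hypothetical, and the slides do not specify how scores are produced for each classifier. Averaging such values (for example, over target projects) would give an average AUC.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen defective entity receives a higher score than a randomly chosen
    clean one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean entity")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical defect-proneness scores and true labels (1 = defective).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))  # 0.8333333333333334 (5 of 6 defective/clean pairs ordered correctly)
```

An AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which is why AUC is convenient for comparing classifiers across projects with different defect ratios.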

RQ1. Results (cross-project)

(Figure: classifiers grouped from Rank 1 to Rank 4; red text: unsupervised classifiers, blue text: supervised classifiers)

Our approach can compete with the supervised classifiers under study, and is sometimes even better.

RQ2. Does the spectral clustering based classifier perform well in within-project defect prediction?

Each project is randomly split into two halves (50% / 50%). A classifier is trained on one half and its AUC is measured on the other half; the two halves then swap roles. This is repeated for 500 random splits, giving 1,000 evaluations, and the classifiers are ranked using the Scott-Knott test.
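The within-project protocol (two halves, each used once for training and once for testing, repeated over 500 random splits) can be expressed as a split generator. A sketch in Python; the function name `half_splits`, the fixed seed, and the handling of odd-sized projects (the extra entity goes to the second half) are assumptions, and any stratification by defect label is not modelled.

```python
import random


def half_splits(n_entities, n_repeats=500, seed=0):
    """Yield (train_indices, test_indices) pairs: each repeat randomly splits the
    entities into two halves and uses each half once for training and once for testing."""
    rng = random.Random(seed)
    indices = list(range(n_entities))
    half = n_entities // 2
    for _ in range(n_repeats):
        rng.shuffle(indices)
        first, second = indices[:half], indices[half:]  # slices are independent copies
        yield first, second  # train on the first half, test on the second
        yield second, first  # then swap the roles of the two halves


evaluations = list(half_splits(n_entities=100))
print(len(evaluations))  # 1000
```

Each of the 1,000 (train, test) pairs would produce one testing AUC per classifier, and those AUC distributions are what the ranking step compares.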

RQ2. Results (within-project)

Rank 1 (Gold): Random forest
Rank 2 (Silver): Logistic regression, Spectral clustering, Logistic model tree, Naïve Bayes
Rank 3 (Bronze): Fuzzy C-means

Our approach achieves performance similar to the supervised classifiers, except random forest.

Summary

Feng Zhang (http://www.feng-zhang.com)