Predicting Defects Using Change Genealogies. Kim Herzig*, Sascha Just†, Andreas Rau†, Andreas Zeller† (* Microsoft Research, UK; † Saarland University, Germany)
Predicting Defects Using Change Genealogies (ISSE 2013)

Jul 04, 2015

Kim Herzig
Transcript
Page 1: Predicting Defects Using Change Genealogies (ISSE 2013)

Predicting Defects Using Change Genealogies

Kim Herzig*, Sascha Just†, Andreas Rau†, Andreas Zeller†

* Microsoft Research, UK
† Saarland University, Germany

Page 2

Prediction Models

• Goal: determine the likelihood of bugs in code entities.

Quality assurance is limited by time and money.

Can be helpful for project outsiders.

• Trained on “ground truth”: known instances and their properties.

Idea: learn from the past to predict the future.

• Predicting / estimating defect likelihood of new, unknown code entities

Page 3

Fine-Tuning Prediction Models

Prediction Target

Machine Learner

Training Methods

Metrics (independent variables)

Page 4

(Social) Network Metrics

Some participants more active and central than others.

Are these participants also more crucial?

Page 5

Assumption: “Central binaries tend to be defect-prone”.

Code Network Metrics

Code entities communicate with each other.

Use call graph network to compute network metrics.

[2008] Zimmermann and Nagappan: “Predicting Defects using Network Analysis on Dependency Graphs”
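To make the idea of "central" code entities concrete, the following is a minimal sketch of one simple network metric, degree centrality, computed on a call graph. The graph, the entity names, and the choice of metric are assumptions made for illustration; the slides do not state which metrics the cited study computed.

```python
from collections import defaultdict

# Hypothetical call graph: caller -> set of callees (all names are made up).
# Assumption: every callee also appears as a key, and there are >= 2 nodes.
CALL_GRAPH = {
    "main": {"parse", "render", "report_error"},
    "parse": {"tokenize", "report_error"},
    "tokenize": {"report_error"},
    "render": {"report_error"},
    "report_error": set(),
}


def degree_centrality(graph):
    """Return {node: (in_degree + out_degree) / (n - 1)} for a directed graph."""
    in_degree = defaultdict(int)
    for callees in graph.values():
        for callee in callees:
            in_degree[callee] += 1
    n = len(graph)
    return {
        node: (in_degree[node] + len(callees)) / (n - 1)
        for node, callees in graph.items()
    }


centrality = degree_centrality(CALL_GRAPH)
most_central = max(centrality, key=centrality.get)
```

Under the slide's assumption, the entity with the highest score (here the hypothetical `report_error`) would be the first candidate for closer inspection.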

Call graphs do not change significantly over time!

Page 6

Assumption: “Code that is crucially changed tends to be defect-prone”.

Change Network Metrics

Code changes depend on each other.

Central code changes tend to be crucial.

Idea: Use dependencies between code changes

Change Genealogies

Page 7

Change Genealogies (in a nutshell)

[2013] Kim Herzig: “Mining and Untangling Change Genealogies” (PhD thesis)

Directed graph structure

Method level dependencies

Multi-dimensional (space & time)
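The three properties above (directed graph, method-level dependencies, space and time) can be sketched as a small data structure. This is an illustrative sketch only: the class names, field names, and change kinds are hypothetical, and the rule that a change may only depend on an earlier change is an assumption rather than something the slides state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    """One vertex: a method-level change (all field names are hypothetical)."""
    change_id: str
    file: str    # "space": where the change happened
    method: str
    day: int     # "time": commit day, counted from the project start
    kind: str    # e.g. "add_method_definition", "add_method_call"


@dataclass
class Genealogy:
    """Directed graph: an edge child -> parent means child depends on parent."""
    changes: dict = field(default_factory=dict)   # change_id -> Change
    parents: dict = field(default_factory=dict)   # change_id -> set of parent ids

    def add_change(self, change):
        self.changes[change.change_id] = change
        self.parents.setdefault(change.change_id, set())

    def add_dependency(self, child_id, parent_id):
        child, parent = self.changes[child_id], self.changes[parent_id]
        # Assumption: dependencies always point backwards in time.
        if parent.day > child.day:
            raise ValueError("a change cannot depend on a later change")
        self.parents[child_id].add(parent_id)

    def children(self, change_id):
        """All changes that directly depend on the given change."""
        return {c for c, ps in self.parents.items() if change_id in ps}


genealogy = Genealogy()
genealogy.add_change(Change("c1", "Parser.java", "parse()", 10, "add_method_definition"))
genealogy.add_change(Change("c2", "Main.java", "main()", 42, "add_method_call"))
genealogy.add_dependency("c2", "c1")  # the call to parse() depends on its definition
```

Storing only the parent sets keeps insertion cheap; `children` recomputes the reverse direction on demand, which is fine for a sketch but would be indexed in a real tool.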

Page 8

Change Genealogy Metrics

EGO network metrics: measure the immediate impact of changes on other changes.

GLOBAL network metrics: express the long-term impact of changes on other changes.

Considering the type of the change: e.g. adding a method definition, modifying a method call.

Considering parent age: how old are the parent changes a change depends on?

Change genealogy metrics must be aggregated to source file level.
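A minimal sketch of how per-change metrics might be computed and then aggregated to file level. Everything here is an assumption for illustration: the change records are made up, in/out degree stands in for the ego network metrics, and mean and max are chosen as aggregation functions because the slides do not say which ones were used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical change records: id -> (file, day committed).
CHANGES = {
    "c1": ("Parser.java", 10),
    "c2": ("Parser.java", 30),
    "c3": ("Main.java", 42),
    "c4": ("Main.java", 50),
}
# child -> parents it depends on (made-up dependencies).
PARENTS = {"c1": set(), "c2": {"c1"}, "c3": {"c1", "c2"}, "c4": {"c3"}}


def change_metrics(changes, parents):
    """Per-change metrics: in/out degree and mean parent age in days."""
    in_degree = defaultdict(int)
    for ps in parents.values():
        for p in ps:
            in_degree[p] += 1
    metrics = {}
    for cid, (_, day) in changes.items():
        ages = [day - changes[p][1] for p in parents[cid]]
        metrics[cid] = {
            "out_degree": len(parents[cid]),   # changes this one depends on
            "in_degree": in_degree[cid],       # changes depending on this one
            "avg_parent_age": mean(ages) if ages else 0.0,
        }
    return metrics


def aggregate_per_file(changes, metrics):
    """Aggregate each metric over all changes of a file (mean and max)."""
    per_file = defaultdict(lambda: defaultdict(list))
    for cid, (file, _) in changes.items():
        for name, value in metrics[cid].items():
            per_file[file][name].append(value)
    return {
        file: {f"{agg.__name__}_{name}": agg(values)
               for name, values in cols.items() for agg in (mean, max)}
        for file, cols in per_file.items()
    }


file_metrics = aggregate_per_file(CHANGES, change_metrics(CHANGES, PARENTS))
```

Each file ends up with one feature vector (keys such as `mean_in_degree` or `max_avg_parent_age`), which is the shape a file-level defect prediction model needs as input.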

Page 9

Comparing change genealogies against:

Code complexity models (e.g. McCabe)

Code dependency models (Zimmermann & Nagappan)

Combined network models (change genealogy & code dependency network metrics)

Experimental Setup

Page 10

Experimental Setup

Study subjects

Multiple machine learners

Page 11

Prediction Precision

Code complexity metrics

Code dependency network metrics (Zimmermann & Nagappan)

Change genealogy metrics

Combined network metrics (NM & CGM)

Page 12

Confirmed: network metrics outperform complexity metrics.

Change genealogy models report fewer false positives (higher precision).

Change genealogy models report slightly more false negatives (lower recall).

Combining network metrics: good recall but worse precision.
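The trade-off between false positives and false negatives can be made concrete with the standard definitions of precision and recall. The sketch below uses made-up file names and two hypothetical models; the numbers are not results from the study.

```python
def precision_recall(predicted, actual):
    """Precision and recall for sets of files flagged vs. actually defective."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Made-up example: four files are truly defective.
actual = {"A.java", "B.java", "C.java", "D.java"}
cautious_model = {"A.java", "B.java"}  # few alarms, all correct
eager_model = {"A.java", "B.java", "C.java", "X.java", "Y.java", "Z.java"}

cautious = precision_recall(cautious_model, actual)  # (1.0, 0.5)
eager = precision_recall(eager_model, actual)        # (0.5, 0.75)
```

The cautious model raises no false alarms but misses half the defective files; the eager model finds more of them at the cost of three false positives. This is the same shape of trade-off the slide describes between the change genealogy models and the combined models.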

Page 13

Influential Metrics

Network efficiency among the top 10 most influential metrics.

The relationship between changes and the type of dependency are the top 2 metrics (for all projects).

The higher the number of old parents, the higher the probability of adding bugs.

Code entities combining multiple older functionalities are more defect-prone.

Page 14

Code entities combining multiple older functionalities are more defect-prone.

Change genealogies are well suited for defect prediction (better precision, close recall).

Adapting social network metrics to change dependency graphs.

Comparing prediction models.

Summary