Promise 2011: "Studying the Fix-Time for Bugs in Large Open Source Projects"


DESCRIPTION

Promise 2011:"Studying the Fix-Time for Bugs in Large Open Source Projects"Lionel Marks, Ahmed E. Hassan and Ying Zou.

Transcript

Slide 1
Studying the Fix-Time for Bugs in Large Open Source Projects
Lionel Marks, Ying Zou, Ahmed E. Hassan, Thanh Nguyen
Queen's University, Kingston, Ontario, Canada

Slide 2
If life were like that, we would not need software prediction.

Slide 3: Reality
Many simple feature requests or defect reports do NOT get fixed for years.

Slide 4
Feature request / Defect report filed → Triage → Implementation plan / cause of defect determined → Implement → Verify → Close
Work item fix-time: the interval from after triage until the work item is closed.
When will it be fixed? Which one should we fix this iteration? Can we predict the work item fix-time?

Slide 5
Location properties (7): Product / Version / Component; Number of completed WI*; Average fix time*
Reporter properties (4): Industry / local / public; Popularity*; Number of past requests*; Average fix time*
Work item properties (12): Severity / Priority; Number of interested parties*; Morning / Day / night; Description length*; Code attachment
All three groups of properties feed into the Predictor.

Slide 6: When will it be fixed?
Location properties (7), Reporter properties (4) and Work item properties (12) → Predictor → Short / Normal / Long
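The Short / Normal / Long output means the predictor is a three-class classifier over discretized fix-times. Below is a minimal sketch of such a discretization using pandas; the cut-off points are assumptions loosely borrowed from the case-study buckets shown later on slide 8, not values given by the paper.

```python
# Sketch: bin raw fix-times (in days) into the three predictor classes.
# The 90-day and 365-day thresholds are illustrative assumptions.
import pandas as pd

def discretize_fix_time(days: pd.Series) -> pd.Series:
    """Bin a fix-time in days into Short / Normal / Long."""
    return pd.cut(
        days,
        bins=[0, 90, 365, float("inf")],   # assumed class boundaries
        labels=["Short", "Normal", "Long"],
        include_lowest=True,
    )

fix_time_days = pd.Series([12, 200, 1500, 45, 800])
print(discretize_fix_time(fix_time_days).tolist())
# ['Short', 'Normal', 'Long', 'Short', 'Long']
```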

Slide 7: Which one should we fix this iteration?
Location properties (7), Reporter properties (4) and Work item properties (12) → Predictor → Next minor revision / Next major revision / Next version

Slide 8: Case study

Project   Number of work items   < 3 months   < 1 year   < 3 years
Mozilla   85,616                 46%          27%        27%
Eclipse   63,402                 76%          18%        6%

Slide 9: Random Forest
We use Random Forest because:
• Decision-tree-based models are explainable compared to SVMs or neural networks.
• Random Forest outperforms C4.5 because it is more resistant to data with highly correlated attributes.
• It is easy to analyze the sensitivity of each property.
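As a concrete illustration of this modelling step, here is a minimal sketch using scikit-learn's RandomForestClassifier as a stand-in for the learner the authors used; the column names and values are hypothetical placeholders, not the paper's actual properties.

```python
# Sketch: train a random forest to predict the fix-time class of a
# work item from a few illustrative properties.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical work-item table: location, reporter and work-item
# properties plus the discretized fix-time class.
work_items = pd.DataFrame({
    "component":         [0, 1, 1, 2, 0, 2],
    "reporter_requests": [5, 1, 9, 2, 7, 3],
    "severity":          [2, 4, 1, 3, 2, 5],
    "fix_time_class":    ["Short", "Long", "Short",
                          "Normal", "Short", "Long"],
})

X = work_items.drop(columns="fix_time_class")
y = work_items["fix_time_class"]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X.head(2)))  # predicted fix-time classes
```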

Slide 10

Data      J48           Random Forest   M5P
Linux*    11.09 (3.09)  18.42 (2.53)    16.22 (2.19)
Apache*   38.64 (1.35)  34.45 (1.68)    25.39 (1.64)
Jazz      17.75 (2.76)  18.43 (2.74)    11.67 (2.27)

*Akinori Ihara (Kinki University, Japan) and Yasutaka Kamei (Kyushu University, Japan)

Slide 11: Goals of our case study
• G1: What is the accuracy of the fix-time prediction model?
• G2: Which properties are the most important predictors of fix-time?
• G3: How applicable are the models in practice?

Slide 12: G1: Accuracy of the model
We build 10 random forests for each project:
• Each random forest uses a random 2/3 of the data for training.
• We evaluate the prediction on the remaining 1/3 of the data.
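A sketch of this evaluation protocol, assuming X and y hold the work-item properties and fix-time classes (e.g., built as in the earlier sketch) and using scikit-learn's train_test_split for the random 2/3-1/3 split.

```python
# Sketch: G1 protocol - ten forests, each trained on a random 2/3 of
# the data and scored on the held-out 1/3.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def misclassification_rates(X, y, n_runs=10):
    rates = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=1 / 3, random_state=seed)
        forest = RandomForestClassifier(n_estimators=100,
                                        random_state=seed)
        forest.fit(X_tr, y_tr)
        rates.append(1.0 - forest.score(X_te, y_te))  # score = accuracy
    return np.mean(rates), np.std(rates)
```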

Slide 13: Accuracy of the model: overall misclassification (%)

Dimension     Mozilla   Eclipse
Reporter      51.9      35.8
Location      39.6      34.7
Description   50.8      43.0
All           36.2      32.8

G1: We can correctly classify the fix-time of work items in Eclipse and Mozilla ~65% of the time, twice as good as random.

Slide 14: G2: Model sensitivity: importance of each property
We use a technique called the permutation accuracy importance measure, as follows:
• For each property, we randomly alter its values and rerun the classification.
• We give the property an importance score (1 to 10) depending on the change in the classification result.
• We sum each property's score across all ten forests.
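The shuffling step can be sketched directly. The 1-to-10 scoring from the slide is approximated below by the raw drop in held-out accuracy; scikit-learn's sklearn.inspection.permutation_importance implements the same idea. X_test is assumed to be a held-out pandas DataFrame.

```python
# Sketch: permutation accuracy importance - shuffle one property at a
# time and measure how much the held-out accuracy drops.
import numpy as np

def permutation_drop(forest, X_test, y_test, seed=0):
    rng = np.random.default_rng(seed)
    baseline = forest.score(X_test, y_test)
    drops = {}
    for col in X_test.columns:
        X_perm = X_test.copy()
        # Randomly alter the values of this property only.
        X_perm[col] = rng.permutation(X_perm[col].to_numpy())
        drops[col] = baseline - forest.score(X_perm, y_test)
    return drops  # larger drop => more important property
```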

Slide 15: Most important properties per dimension

Dimension     Mozilla (top 2)              Eclipse (top 2)
Location      Product, Component           Project fix-time, Product opened work items
Reporter      Fix-time, Requests           Fix-time, Overall popularity
Description   Year, Has target milestone   Year, Severity
All           Year, Product                Severity, Number of CCed

G2: The time of bug filing and its location are the most important properties in the Mozilla project. In the Eclipse project, bug severity is the most important property.

Slides 16 and 17: Why is time the most important factor in Mozilla?

Table 5: Statistics for the Resolution Type of Bugs for the Mozilla and Eclipse Projects

Project   Duplicate      Invalid       Moved     Won't fix    Works for me   Fixed          Total
Mozilla   99,414 (37%)   29,856 (11%)  103 (0%)  9,512 (4%)   46,659 (17%)   85,616 (32%)   271,160 (100%)
Eclipse   19,060 (18%)   7,958 (7%)    0 (0%)    7,141 (7%)   10,013 (9%)    63,402 (59%)   107,574 (100%)

[Figure 1: The distribution of fix-time for bugs over the lifetime of the Mozilla and Eclipse projects. The x-axis shows the specific project years, while the y-axis shows the percentage of bugs fixed within a specific class. Panels: (a) Mozilla, (b) Eclipse.]

The observation about severity confirms the finding reported by Panjer using basic decision trees [17]. Our analysis shows that the year of the bug has a higher impact on the fix-time for a bug than the severity of the bug. We also note that our findings do not match the finding by Hooimeijer and Weimer [11] that the severity of bugs in Mozilla has an important effect on the time needed to fix a bug. In contrast, our results show that the reporting time (e.g., Year and Week of year) has a more important influence.

We believe that this is due to the fact that Hooimeijer and Weimer measure the fix-time of a bug from the time it is entered into Bugzilla till it is fixed. In contrast, we only measure the time from the assignment of a developer (i.e., after triage) till the bug is fixed. Therefore, it might be the case that bugs with high severity are triaged faster but then their fix-time is independent of their severity level. This could be due to the fact that severity levels do not represent the true severity of a bug.

Table 9: Top Attributes for the Bug Description Dimension

#   Mozilla attribute      %     Eclipse attribute      %
1   Year                   100   Year                   99
2   Has target milestone   82    Severity               87
3   Number of CCed         80    Number of CCed         83
4   Week of Year           77    Has target milestone   69

Experiment #4: All Attributes. In our last experiment, we combine all the attributes across the three dimensions. This resulted in 50 metrics that were studied using our sensitivity analysis technique. Table 10 shows the results of our analysis. The table shows that the severity of the bug and the number of CCed project personnel are the most important attributes for the Eclipse project, while on the other hand the reporting time (year and week) and the location (product and component) have the largest influence for the Mozilla project.

Table 10: Top Attributes for the Models with All Attributes

#   Mozilla attribute   %    Eclipse attribute   %
1   Year                97   Severity            87
2   Product             92   Number of CCed      73
3   Week                71   Product fix-time    61
4   Component           63

Discussion. Table 11 shows the performance of the three dimensions and all attributes for the Mozilla and Eclipse projects. The table shows that the dimensions perform differently for both projects. For the Mozilla project, the best performing dimensions are: Location, Description, then Reporter. For the Eclipse project, the best performing dimensions are: Location, Reporter, then Description. The performance differences are statistically significant. The results suggest that the bug descriptions for the Eclipse project should be improved to help project managers in project planning and resource allocation decisions.

A random guessing approach would result in an overall misclassification rate of 60%, as we have 3 classes. Our random forest classifier shows considerable improvement over random guessing. Moreover, if we did not discretize the numerical attributes we might be able to achieve a lower misclassification rate. However, the produced model would be much harder to comprehend and use in practice for project planning, as we are searching for simple and basic rules of thumb that practitioners could use.

We verified the significance of our result differences by comparing the performance of the ten forests generated for each dimension.


Slide 18: G3: How applicable are the models in practice?
For a prediction model to be applicable in practice, it should:
• Use only available properties
• Be stable
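One way to read the first criterion in code: filter the training attributes down to those already known at prediction time. The attribute names below are illustrative assumptions, echoing the examples on the next slide.

```python
# Sketch: drop attributes whose values are only known, or keep
# changing, after triage, so the model uses only properties
# available when the prediction is made.
POST_TRIAGE = {"number_of_cced", "severity_changes", "final_assignee"}

def available_at_prediction_time(columns):
    """Keep only attributes known before/at triage."""
    return [c for c in columns if c not in POST_TRIAGE]

# e.g. available_at_prediction_time(work_items.columns) keeps
# component, reporter_requests, ... and drops the unstable attributes.
```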

Slide 19
Feature request / Defect report filed → Triage → Work item fix-time
Properties that change during the fix-time window: Number of CCed? Severity change! Assigned to someone else.

Slide 20: Accuracy of the predictor models using only available properties

Data      Data size   Misclassification rate
Eclipse*  86,490      0.51
Linux*    2,024       0.55
Jazz      16,672      0.57
Apache*   1,466       0.37

*Akinori Ihara (Kinki University, Japan) and Yasutaka Kamei (Kyushu University, Japan)

Slide 21: Stability of the model
• Training-size stability: as more data is added to the training set, the accuracy should improve.
• Time stability: the accuracy should be stable over time.
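A sketch of the training-size stability check, assuming the work items are ordered chronologically so that growing the training prefix also moves forward in time; the 10% steps mirror the x-axis of the next two slides.

```python
# Sketch: train on the first k*10% of chronologically ordered work
# items and score on the remainder; accuracy should rise with k
# (training-size stability) and stay steady across windows (time
# stability).
from sklearn.ensemble import RandomForestClassifier

def growing_prefix_accuracy(X, y, steps=9):
    """Accuracy when training on the first k*10% and testing on the rest."""
    n = len(X)
    accuracies = []
    for k in range(1, steps + 1):
        cut = n * k // 10
        forest = RandomForestClassifier(n_estimators=100, random_state=0)
        forest.fit(X.iloc[:cut], y.iloc[:cut])
        accuracies.append(forest.score(X.iloc[cut:], y.iloc[cut:]))
    return accuracies
```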

Slide 22: Apache - Training size stability
[Plot: x-axis: training data used, in steps of 10% (1 to 9); y-axis: precision, ranging from 0.45 to 0.70.]

Slide 23: Apache - Time stability
[Plot: x-axis: training window, in steps of 10% of the data (1 to 8); y-axis: precision, ranging from 0.34 to 0.42.]

G3: A fix-time prediction model may work in practice on a project such as Apache: the Apache prediction model shows both data stability and time stability.

