Seminar 2010 - The Mining Project Yana Mileva & Kim Herzig Mining Software Repositories Monday, 17 May, 2010

Mining Software Repositories - uni-saarland.de · Mining Software Repositories Monday, 17 May, 2010. Predicting Defects for Eclipse ... Bugzilla fixed bug 1234 Strong Candidate fixed


Seminar 2010 - The Mining Project
Yana Mileva & Kim Herzig

Mining Software Repositories

Monday, 17 May, 2010


Predicting Defects for Eclipse [Zimmermann et al.]

SCM Repository


Predicting Defects for Eclipse [Zimmermann et al.]

SCM Repository Commit Message

extract commit messages, e.g. svn log --xml
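As a rough illustration of this step, the sketch below parses the XML that `svn log --xml` prints. The function name `parse_svn_log` and the sample log are made up for this example; only the element names (`logentry`, `author`, `date`, `msg`) come from Subversion's XML log output.

```python
import xml.etree.ElementTree as ET


def parse_svn_log(xml_text):
    """Return one dict per <logentry> found in `svn log --xml` output."""
    entries = []
    root = ET.fromstring(xml_text)
    for entry in root.iter("logentry"):
        entries.append({
            "revision": int(entry.get("revision")),
            "author": entry.findtext("author", default=""),
            "date": entry.findtext("date", default=""),
            # Commits without a message yield an empty string.
            "message": entry.findtext("msg", default="") or "",
        })
    return entries


# Hypothetical sample; a real log would come from running `svn log --xml`.
SAMPLE_LOG = """<log>
<logentry revision="2">
<author>alice</author>
<date>2010-01-02T10:00:00.000000Z</date>
<msg>fixed bug 1234</msg>
</logentry>
</log>"""

for commit in parse_svn_log(SAMPLE_LOG):
    print(commit["revision"], commit["message"])
```

In practice the XML text could be captured with `subprocess.run(["svn", "log", "--xml", url], capture_output=True)`, where `url` points at the repository.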


Predicting Defects for Eclipse [Zimmermann et al.]

SCM Repository Commit Message

Examples: “fixed bug 1234”, “similar than 1234”, “#1234”

extract reference candidates, e.g. using regExp
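One possible way to extract such candidates is sketched below. The three patterns and the labels `keyword`, `hash` and `bare` are assumptions chosen to match the slide's three example messages; they are not the patterns used by Zimmermann et al.

```python
import re

# Hypothetical patterns, ordered from weakest to strongest evidence.
BARE_NUMBER = re.compile(r"\b(\d+)\b")
HASH_REF = re.compile(r"#(\d+)")
KEYWORD_REF = re.compile(
    r"\b(?:bug|fix|fixes|fixed)\s*(?:bug\s*)?#?\s*(\d+)", re.IGNORECASE
)


def extract_bug_candidates(message):
    """Map each candidate bug id to the strongest pattern that matched it."""
    candidates = {}
    # Later (stronger) patterns overwrite the label set by earlier ones.
    for label, pattern in (
        ("bare", BARE_NUMBER),
        ("hash", HASH_REF),
        ("keyword", KEYWORD_REF),
    ):
        for match in pattern.finditer(message):
            candidates[int(match.group(1))] = label
    return candidates


for msg in ("fixed bug 1234", "similar than 1234", "#1234"):
    print(msg, "->", extract_bug_candidates(msg))
```

A bare number such as the one in “similar than 1234” is the most likely false positive, which is why the sketch keeps the label so that later filters can treat it with more suspicion.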


Predicting Defects for Eclipse [Zimmermann et al.]

SCM Repository Commit Message

Examples: “fixed bug 1234”, “similar than 1234”, “#1234”

false positive?


Predicting Defects for Eclipse [Zimmermann et al.]

SCM Repository Commit Message

Examples: “fixed bug 1234”, “similar than 1234”, “#1234”

Bugzilla

search for bug reports referenced


Predicting Defects for Eclipse [Zimmermann et al.]

SCM Repository Commit Message

Examples: “fixed bug 1234”, “similar than 1234”, “#1234”

Bugzilla

“fixed bug 1234”: Strong Candidate


Predicting Defects for Eclipse [Zimmermann et al.]

SCM Repository Commit Message

Examples: “fixed bug 1234”, “similar than 1234”, “#1234”

Bugzilla

“fixed bug 1234”: Strong Candidate

“fixed bug 1234”: Confirmed Reference

filter, filter, filter, e.g. time, version, author


Your Job

Bugzilla


Your Job

Bugzilla Bug Report

extract bug report text and comments


Your Job

Bugzilla Bug Report

Examples: “patch in 1234”, “since 1234”, “#1234”

extract reference candidates, e.g. using regExp
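In this direction the numbers of interest are revision numbers mentioned in bug report text. The sketch below is one hedged way to find them; the pattern `REVISION_REF` and the helper `extract_revision_candidates` are illustrative assumptions, and the upper bound passed in the usage line is the head revision shown by `svn info` for the provided mirror.

```python
import re

# Hypothetical pattern: matches "r14535", "rev 14535", "revision 14535".
REVISION_REF = re.compile(r"\b(?:r|rev\.?|revision)\s*#?(\d+)\b", re.IGNORECASE)


def extract_revision_candidates(text, max_revision):
    """Return sorted revision numbers in `text` that can exist in the repository."""
    found = {int(m.group(1)) for m in REVISION_REF.finditer(text)}
    # Numbers beyond the head revision cannot be real transactions.
    return sorted(rev for rev in found if 1 <= rev <= max_revision)


print(extract_revision_candidates("Fixed in r14535", max_revision=16942))
```

Weaker forms such as “patch in 1234” or “since 1234” would need looser patterns, at the cost of more false positives in full bug report text.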


Your Job

Bugzilla Bug Report

SCM Repository

search for revisions referenced

Examples: “patch in 1234”, “since 1234”, “#1234”


Your Job

Bugzilla Bug Report

SCM Repository

“fixed bug 1234”: Strong Candidate

Examples: “patch in 1234”, “since 1234”, “#1234”


Your Job

Bugzilla Bug Report

SCM Repository

“fixed bug 1234”: Strong Candidate

“fixed bug 1234”: Confirmed Reference

filter, filter, filter, e.g. time, commit message

Examples: “patch in 1234”, “since 1234”, “#1234”
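The two filters named on this slide, time and commit message, could be combined as in the sketch below. The rule itself is an assumption made for illustration: a candidate is confirmed only if the commit is not later than the bug report comment that mentions it and the commit message contains the bug id as a whole number. The function name and parameters are hypothetical.

```python
import re
from datetime import datetime, timezone


def confirm_reference(bug_id, comment_time, commit_time, commit_message):
    """Return True if a candidate (bug, revision) pair passes both filters."""
    # Time filter: a comment can only refer to a commit that already exists.
    committed_before_comment = commit_time <= comment_time
    # Commit message filter: the bug id must appear as a whole number.
    mentions_bug = re.search(rf"\b{bug_id}\b", commit_message) is not None
    return committed_before_comment and mentions_bug


commit_time = datetime(2010, 1, 10, tzinfo=timezone.utc)
comment_time = datetime(2010, 1, 11, tzinfo=timezone.utc)
print(confirm_reference(1234, comment_time, commit_time, "fixed bug 1234"))
```

Requiring the bug id in the commit message is strict and will reject valid pairs; a softer variant could score candidates instead of rejecting them.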


Your Input

• Project: JEdit
‣ Programmer’s text editor
‣ Sourceforge project: http://sourceforge.net/projects/jedit/

• We will provide you with
‣ Bug reports (XHTML files)
‣ expect error messages
‣ Tarball containing SVN (Subversion) mirror
‣ Download from Webpage


Bug Reports

comments not visible in browser!


Bug Reports XHTML

Hint: One bug can be fixed using multiple transactions!

Fixed in r14535


SVN tarball

pearl:~ kim$ svn info file://<absolute_path>/svn_repo_14_01_2010/
Path: svn_repo_14_01_2010
URL: file://<absolute_path>/svn_repo_14_01_2010
Repository Root: file://<absolute_path>/svn_repo_14_01_2010
Repository UUID: 25ceb265-6a78-4dce-b758-64b437aadf78
Revision: 16942
Node Kind: directory
Last Changed Author: ezust
Last Changed Rev: 16942
Last Changed Date: 2010-01-17 06:05:58 +0100 (Sun, 17 Jan 2010)


three slashes!


Challenges

• Parsing XML

• Full text instead of short messages
‣ Many more false positives

• Multiple authors, many comments
‣ Order of comments is relevant

• Error Handling
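For the parsing and error-handling challenges, one hedged option is to avoid a strict XML parser and collect text with Python's lenient `html.parser` instead, which does not raise on unclosed tags. The class `TextExtractor` and the sample markup are invented for this sketch; the real structure of the provided bug report files is not shown on the slides.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect non-empty text chunks from possibly malformed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_chunks.append(stripped)


def extract_text(markup):
    """Return the text chunks of `markup` in document order."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text_chunks


# Hypothetical, deliberately malformed sample (the <div> is never closed).
print(extract_text("<html><body><p>Fixed in r14535<div>unclosed</body>"))
```

Because the chunks keep document order, they can be fed directly into whatever pattern matching is applied to the bug report text afterwards.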


Possibilities

• Using RegExp

• Using similarity between texts

• Order the comments by time

• Comments are very likely to be later than patch
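Text similarity can be measured in many ways; the sketch below uses Jaccard similarity over lowercase word tokens as one simple stand-in. The choice of measure, the tokenizer and the function names are assumptions for illustration, not something the slides prescribe.

```python
import re


def tokens(text):
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(text_a, text_b):
    """Return |A ∩ B| / |A ∪ B| for the token sets of the two texts."""
    tokens_a, tokens_b = tokens(text_a), tokens(text_b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


score = jaccard_similarity("fixed crash in search dialog", "search dialog crash")
print(round(score, 2))
```

A score like this could rank several candidate revisions for one bug report, with the highest-scoring commit message considered first.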


Programming Language

We don’t care!
But: It has to run on CIP pool computers!


Handing In

• A CSV file containing pairs <transaction_id, bug_id>
‣ Which bug has been fixed in which transaction?

• A CSV file containing pairs <svn_path, #bugs>
‣ How many bugs could be found in this file?

• Executable program or script (CIP-pool)
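The two CSV files could be produced along the lines of the sketch below. The header names, the helper functions and the `changed_paths` mapping (transaction id to the paths touched by that transaction) are assumptions for this example; the slides only specify the pairs each file must contain.

```python
import csv
import io


def write_fix_pairs(pairs, out):
    """Write <transaction_id, bug_id> rows, sorted for reproducible output."""
    writer = csv.writer(out)
    writer.writerow(["transaction_id", "bug_id"])
    writer.writerows(sorted(pairs))


def write_bug_counts(pairs, changed_paths, out):
    """Write <svn_path, #bugs> rows, counting distinct bugs per path."""
    bugs_per_path = {}
    for transaction_id, bug_id in pairs:
        for path in changed_paths.get(transaction_id, []):
            bugs_per_path.setdefault(path, set()).add(bug_id)
    writer = csv.writer(out)
    writer.writerow(["svn_path", "bugs"])
    for path in sorted(bugs_per_path):
        writer.writerow([path, len(bugs_per_path[path])])


# Hypothetical data: one bug fixed using two transactions.
pairs = {(14535, 1234), (14536, 1234)}
changed_paths = {14535: ["/trunk/a.java"], 14536: ["/trunk/a.java", "/trunk/b.java"]}
buffer = io.StringIO()
write_bug_counts(pairs, changed_paths, buffer)
print(buffer.getvalue())
```

Counting distinct bug ids per path means a bug fixed using several transactions on the same file is counted once for that file.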

21 July


Handing In (2)

• Documentation
‣ What have you done?

‣ How can we reproduce the result?

‣ What’s good, what’s bad regarding your approach?

21 July


“It’s better to be vaguely right than precisely wrong!”

Clem Sunter (ICSE 2010)


Questions?


What do we Expect?

“Show that you understood the problem and that you are capable of solving it!”



Why do we Make you do that?

“Mining is a technical challenge. You need to think first but finally you have to do it.”


Bachelor/Master Theses

• Improve Bug-Mapping techniques

• Decomposing Change Sets

• Trend Mining

• Project Similarity for Prediction


Solving a General Problem

[Figure: separate Change Sets arranged along a time axis]


Solving a General Problem

[Figure: a single Change List along a time axis]

Can we separate these change sets again?

• Call graphs

• Dependency data

• Test data

• DeltaDebugging

• ...


You did it! And now?
Which change fixed the bug and which one added the feature?


Change Set Classification

• keywords do not work
‣ best predictors are: “should” and “wtf”

• changes visible in API?
‣ reverse engineering

• Test suites

• Dependencies to previous changes


You did that too?
