The Road Ahead for Mining Software Repositories
Ahmed E. Hassan, Queen’s University, Canada

research.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...

Jun 09, 2020

Page 1:

The Road Ahead for Mining Software Repositories

Ahmed E. Hassan
Queen’s University
Canada

Page 2:

Historical Repositories: Source Control (CVS/SVN), Bugzilla, Mailing lists

Runtime Repos: Field Logs, Crash Repos

Code Repos: Sourceforge, Google Code

Page 3:

Mining Software Repositories (MSR)

• Transforms static record-keeping repositories into active repositories
• Makes repository data actionable by uncovering hidden patterns and trends

Repositories shown: Mailing list, Bugzilla, Crashes, Field logs, CVS/SVN

Page 4:

MSR researchers analyze and cross-link repositories

Repositories: Bugzilla, CVS/SVN, Mailing list, Crashes

Cross-linked data: fixed bug, discussions, buggy change & fixing change, field crashes

New Bug Report:
• Estimate fix effort
• Mark duplicates
• Suggest experts and fix
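Cross-linking a source control repository with a bug tracker often starts from bug identifiers mentioned in commit messages. The following is a minimal sketch of that idea, not the method used in the talk: the message convention ("bug 123", "fixes #123"), the function name `link_commits_to_bugs`, and all sample data are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed convention: commit messages mention reports as "bug 123",
# "Bug #123" or "fixes #123". Real projects differ, so this pattern is
# a heuristic that would need to be checked against the project's data.
BUG_REF = re.compile(r"\b(?:bug|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE)


def link_commits_to_bugs(commits, known_bug_ids):
    """Map each known bug id to the commits whose message mentions it."""
    links = defaultdict(list)
    for commit_id, message in commits:
        for match in BUG_REF.finditer(message):
            bug_id = int(match.group(1))
            # Ignore numbers that do not correspond to a report in the tracker.
            if bug_id in known_bug_ids:
                links[bug_id].append(commit_id)
    return dict(links)


# Hypothetical sample data.
commits = [
    ("r101", "Fix bug 4711: null check in parser"),
    ("r102", "Refactor logging"),
    ("r103", "fixes #4712 and bug #4711 regression"),
    ("r104", "Update notes, see bug 99999"),
]
known_bug_ids = {4711, 4712}

links = link_commits_to_bugs(commits, known_bug_ids)
for bug_id in sorted(links):
    print(bug_id, links[bug_id])
```

Checking candidates against the set of known report ids is one cheap way to reduce false links; it does not recover fixes whose commit message never mentions the report.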

Page 5:

MSR researchers analyze and cross-link repositories

Repositories: Bugzilla, CVS/SVN, Mailing list, Crashes

Cross-linked data: fixed bug, discussions, buggy change & fixing change, field crashes

New Change:
• Suggest APIs
• Warn about risky code or bugs
• Suggest locations to co-change
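Suggesting locations to co-change can be approximated by counting how often files were committed together in the past. The sketch below assumes the history is already available as a list of change sets (one list of file names per commit); the function names, the `min_support` threshold, and the sample history are hypothetical, and the simple pair counting stands in for the more elaborate techniques used by research tools.

```python
from collections import Counter
from itertools import combinations


def count_co_changes(change_sets):
    """Count how often each unordered pair of files changed in the same commit."""
    pair_counts = Counter()
    for files in change_sets:
        # sorted(set(...)) gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def suggest_co_changes(pair_counts, changed_file, min_support=2):
    """List files that changed with changed_file at least min_support times."""
    suggestions = []
    for (a, b), count in pair_counts.items():
        if count < min_support:
            continue
        if a == changed_file:
            suggestions.append((b, count))
        elif b == changed_file:
            suggestions.append((a, count))
    # Most frequent partners first, ties broken by file name.
    return sorted(suggestions, key=lambda item: (-item[1], item[0]))


# Hypothetical history: one list of changed files per commit.
history = [
    ["parser.c", "parser.h"],
    ["parser.c", "parser.h", "lexer.c"],
    ["lexer.c", "tokens.h"],
    ["parser.c", "lexer.c"],
]

pair_counts = count_co_changes(history)
suggestions = suggest_co_changes(pair_counts, "parser.c")
print(suggestions)
```

Raw counts favor files that change often for any reason, so a practical tool would likely normalize by how frequently each file changes overall.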

Page 6:

Supporting software understanding (NetBSD)

Conceptual (proposed) Concrete (reality)

Why? Who? When? Where?

Page 7:

Mining supports software understanding (NetBSD)

• Eight unexpected dependencies
• All except two dependencies existed since day one:
  – Virtual Address Maintenance → Pager
  – Pager → Hardware Translations

Auto-generated from CVS repository

Which? vm_map_entry_create (in src/sys/vm/Attic/vm_map.c) depends on pager_map (in /src/sys/uvm/uvm_pager.c)

Who? cgd

When? 1993/04/09 15:54:59 Revision 1.2 of src/sys/vm/Attic/vm_map.c

Why?

from sean eric fagan: it seems to keep the vm system from deadlocking the system when it runs out of swap + physical memory. prevents the system from giving the last page(s) to anything but the referenced "processes" (especially important is the pager process, which should never have to wait for a free page).
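The Which/Who/When/Why answers above can be produced by scanning a file's revisions, oldest first, for the first one in which the dependency appears, then reporting that revision's metadata. The sketch below assumes the dependencies of each revision have already been extracted; the `Revision` type, the function name, and the contents of revision 1.1 are hypothetical, while the data for revision 1.2 is taken from the slide (with the log message shortened).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    number: str
    author: str
    date: str
    message: str
    dependencies: frozenset  # (caller, callee) pairs present in this revision


def first_revision_with(revisions, dependency):
    """Return the oldest revision containing the dependency, or None."""
    for revision in revisions:  # revisions are assumed to be ordered oldest first
        if dependency in revision.dependencies:
            return revision
    return None


DEPENDENCY = ("vm_map_entry_create", "pager_map")

history = [
    # Placeholder for the earlier revision; its metadata is not given on the slide.
    Revision("1.1", "(not given)", "(not given)", "(not given)", frozenset()),
    # Metadata as shown on the slide; the log message is shortened here.
    Revision(
        "1.2",
        "cgd",
        "1993/04/09 15:54:59",
        "from sean eric fagan: it seems to keep the vm system from deadlocking ...",
        frozenset({DEPENDENCY}),
    ),
]

note = first_revision_with(history, DEPENDENCY)
print("Which?", " depends on ".join(DEPENDENCY))
print("Who?  ", note.author)
print("When? ", note.date, "Revision", note.number)
print("Why?  ", note.message)
```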

Page 8:

Opportunities in the Road Ahead

Repository · Extract · Analyze · Adopt Results · Show Value

• Going beyond code and bugs
• Taming the complexity of MSR
• Showing the value of repositories
• Easing the adoption of MSR

Page 9:

Opportunities in the Road Ahead

Repository · Extract · Analyze · Adopt Results · Show Value

Going beyond code and bugs

MSR 2004-2008: ~80% of publications focus on code and bugs

• Explore non-structured data
  – Social aspects: emails and comments
• Link data between repos
• Seek non-traditional repos
  – Demonstrate the value of IDE interaction or build failure repos
• Understand the limitations of repos
  – Causation vs. correlation
    • Small number of committers in open source projects

Page 10:

Opportunities in the Road Ahead

Repository · Extract · Analyze · Adopt Results · Show Value

Taming the complexity of MSR

• Simplify the extraction of high quality data
  – Toolkits and extracted data (e.g. FLOSSMetrics) are needed
  – Heuristics should be empirically verified
  – Acknowledgement mechanism needed for extractors
• Deal with skew in repository data
  – Visualization can help spot skew
  – Guidelines and re-sampling/robust techniques are needed
• Improve the quality of repository data
  – Provide tools for annotation of repos data at creation

V1: Undefined func. (Link Error)

    main() {
      int a;
      /*call help*/
      helpInfo();
    }

V2: Syntax error

    helpInfo() {
      errorString!
    }
    main() {
      int a;
      /*call help*/
      helpInfo();
    }

V3: Valid code

    helpInfo() {
      int b;
    }
    main() {
      int a;
      /*call help*/
      helpInfo();
    }
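One reading of the three versions above is that historical snapshots do not always link or compile, so an extractor cannot rely on a full compiler front end. The sketch below illustrates that reading with a deliberately lightweight, regex-based extractor that still recovers definitions and calls from V1 and V2. The patterns only handle the parameterless functions of this example, and all names are assumptions, not part of any existing toolkit.

```python
import re

# The three versions from the slide, transcribed as single-line strings.
VERSIONS = {
    "V1": "main() { int a; /*call help*/ helpInfo(); }",
    "V2": "helpInfo() { errorString! } main() { int a; /*call help*/ helpInfo(); }",
    "V3": "helpInfo() { int b; } main() { int a; /*call help*/ helpInfo(); }",
}

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Simplification: only parameterless definitions "name() {" and calls "name();".
DEFINITION = re.compile(r"\b(\w+)\s*\(\s*\)\s*\{")
CALL = re.compile(r"\b(\w+)\s*\(\s*\)\s*;")


def extract(source):
    """Return (defined functions, called-but-undefined functions)."""
    code = COMMENT.sub(" ", source)  # drop comments before matching
    defined = set(DEFINITION.findall(code))
    called = set(CALL.findall(code))
    return defined, called - defined


results = {name: extract(source) for name, source in VERSIONS.items()}
for name in sorted(results):
    defined, unresolved = results[name]
    print(name, "defined:", sorted(defined), "unresolved:", sorted(unresolved))
```

Such heuristics trade precision for robustness, which is one reason the slide asks for heuristics to be empirically verified.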

Page 11:

Opportunities in the Road Ahead

Repository · Extract · Analyze · Adopt Results · Show Value

Taming the complexity of MSR

• Simplify the extraction of high quality data
  – Toolkits and extracted data (e.g. FLOSSMetrics) are needed
  – Heuristics should be empirically verified
  – Acknowledgement mechanism needed for extractors
• Deal with skew in repository data
  – Visualization can help spot skew
  – Guidelines and re-sampling/robust techniques are needed
• Improve the quality of repository data
  – Provide tools for annotation of repos data at creation
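To illustrate the point about skew and re-sampling: in skewed repository data a few items can dominate the mean, so a robust statistic such as the median, with a bootstrap interval, may summarize the data better. The following is a minimal sketch using only the Python standard library; the sample values and the function name `bootstrap_median_interval` are hypothetical, and a percentile bootstrap is only one of several possible re-sampling techniques.

```python
import random
import statistics


def bootstrap_median_interval(values, rounds=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the median of values."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Re-sample with replacement and record the median of each re-sample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(rounds)
    )
    lower = medians[int((alpha / 2) * rounds)]
    upper = medians[int((1 - alpha / 2) * rounds) - 1]
    return lower, upper


# Hypothetical data: number of changes per file, with one heavily changed file.
changes_per_file = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 250]

mean = statistics.mean(changes_per_file)
median = statistics.median(changes_per_file)
low, high = bootstrap_median_interval(changes_per_file)

print("mean:", round(mean, 1))
print("median:", median)
print("bootstrap interval for the median:", (low, high))
```

A single file with 250 changes pulls the mean far above what a typical file looks like, while the median and its interval stay close to the bulk of the data.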

Page 12:

Opportunities in the Road Ahead

Repository · Extract · Analyze · Adopt Results · Show Value

Showing the value of MSR

• Understand the needs of practitioners
  – Predicting buggy modules:
    • Buggy modules are well-known
  – Predicting fault occurrences at module level is too coarse
• Study the performance in practice
  – Tools affecting the repos data
• Show the practical benefits
  – Statistical improvements not sufficient
  – Cost of maintenance should be evaluated
• Evaluate on non-open source systems

Page 13:

Opportunities in the Road Ahead

Repository · Extract · Analyze · Adopt Results · Show Value

Easing the adoption of MSR

• Simplify access to techniques
  – Integration into IDEs (HATARI, Hipikat, Mylyn, eRose)
  – A web service demonstration for an open source project
    • A continuously updating MSR Challenge
• Help practitioners make decisions
  – MSR should aim to support, not replace, practitioners

Page 14:

Mining Software Repositories


http://msrconf.org