Evaluating the presence and impact of bias in bug-fix datasets
Israel Herraiz, UPM
http://mat.caminos.upm.es/~iht
Talk at University of California, Davis
April 11, 2012
This presentation is available at http://www.slideshare.net/herraiz/evaluating-the-presence-and-impact-of-bias-in-bugfix-datasets
Empirical Software Engineering relies on reusable datasets to make it easier to replicate empirical studies and therefore to build theories on top of those empirical results. An area where these reusable datasets are particularly useful is defect prediction. In this area, the goal is to predict which entities will be more error-prone, so that managers can take preventive actions to improve the quality of the delivered system. These reusable datasets contain information about source code files and their history, bug reports, and the bugs fixed in each of the files. However, some of the most widely used datasets in the Empirical Software Engineering community have been shown to be biased: many links between files and fixed bugs are missing. Research work has already shown that this bias may affect the performance of defect prediction models. In this talk we will show how to use statistical techniques to evaluate the bias in datasets and to estimate its impact on defect prediction.
Transcript
Outline
1. Who am I and what do I do
2. The problem
3. Preliminary results
4. The road ahead
5. Take away and discussion
1. Who am I and what do I do
About me
• PhD in Computer Science from Universidad Rey Juan Carlos (Madrid)
  • “A statistical examination of the evolution and properties of libre software”
  • http://herraiz.org/phd.html
• Assistant Professor at the Technical University of Madrid
  • http://mat.caminos.upm.es/~iht
• Visiting UC Davis from April to July, hosted by Prof. Devanbu
  • Kindly funded by a MECD “José Castillejo” grant