TMPA-2017: Defect Report Classification in Accordance with Areas of Testing
Posted on 05-Apr-2017
Defect report classification in accordance with areas of testing
Anna Gromova, Exactpro
Open Access Quality Assurance & Related Software Development for Financial Markets
Tel: +7 495 640 2460, +1 415 830 38 49
www.exactpro.com
Defect Management
Areas of research in defect management:
• automatic defect fixing
• automatic defect detection
• metrics and predictions of defect reports
• quality of defect reports
• triaging defect reports
Metrics of testing
• Examples of metrics:
  • time to fix / time to resolve
  • which defects get reopened
  • which defects get fixed
  • which defects get rejected
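The metrics listed above can be computed directly from bug-tracker fields. A minimal Python sketch, assuming each defect report exposes a creation date, a resolution date, a status history and a resolution value; the record layout, the dates and the status names ("Reopened", "Fixed", "Rejected") are hypothetical and not taken from the tracker used in this study.

```python
from datetime import datetime

# Hypothetical defect-report records; field names and values are illustrative
# assumptions, not the schema of any particular bug tracker.
reports = [
    {"id": "D-1", "created": "2016-03-01", "resolved": "2016-03-04",
     "statuses": ["Open", "Resolved", "Closed"], "resolution": "Fixed"},
    {"id": "D-2", "created": "2016-03-02", "resolved": "2016-03-12",
     "statuses": ["Open", "Resolved", "Reopened", "Resolved"], "resolution": "Fixed"},
    {"id": "D-3", "created": "2016-03-05", "resolved": "2016-03-06",
     "statuses": ["Open", "Resolved"], "resolution": "Rejected"},
]

def days_to_resolve(report):
    """Time to resolve, in whole days, between creation and resolution."""
    fmt = "%Y-%m-%d"
    created = datetime.strptime(report["created"], fmt)
    resolved = datetime.strptime(report["resolved"], fmt)
    return (resolved - created).days

def was_reopened(report):
    """True if the status history ever contains the (assumed) 'Reopened' state."""
    return "Reopened" in report["statuses"]

for r in reports:
    # id, time to resolve, reopened?, final resolution (fixed / rejected)
    print(r["id"], days_to_resolve(r), was_reopened(r), r["resolution"])
```

Metrics such as "which defects get reopened" then become binary labels that can be predicted from the report text in the same way as the area of testing.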
Area of testing: Component/s and Summary
Contribution
● Manual classification of 2,795 defect reports extracted from the bug tracking system.
● Answers to the following questions, based on this classification and natural language processing:
1. Does feature selection improve defect classification?
2. Which combinations of classifiers and feature selection methods give the best results?
Classification: related work
Text categorization has been applied to the following tasks:
● classifying defects with respect to different features, such as the type of issue, security or configuration;
● predicting which developer should be assigned to fix the bug;
● predicting the category of the software component connected to the defect, etc.
Techniques: preprocessing
● Natural language processing:
❖ Tokenization
❖ Removal of stop-words
❖ Stemming
● Bag of words (TF-IDF)
$$\mathrm{TF}(t,d) = \frac{\mathrm{freq}(t,d)}{\max_{w \in d} \mathrm{freq}(w,d)}$$
$$\mathrm{IDF}(t,D) = \log_2 \frac{|D|}{|\{d \in D : t \in d\}|}$$
$$\mathrm{TFIDF}(t,d,D) = \mathrm{TF}(t,d) \times \mathrm{IDF}(t,D)$$
where:
freq(t,d): term frequency, i.e. the number of times that term t occurs in document d;
max_{w∈d} freq(w,d): the maximum frequency of any term in document d;
|{d ∈ D : t ∈ d}|: the number of documents containing t;
|D|: the total number of documents in the corpus D.
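The preprocessing steps and the TF-IDF weighting above can be sketched in a few lines of Python. This is a minimal illustration, assuming a regex tokenizer and a tiny hand-written stop-word list (both assumptions); stemming is omitted for brevity, though a stemmer such as NLTK's `PorterStemmer` could be applied to each token. TF is normalized by the most frequent term in the document and IDF uses a base-2 logarithm, as in the formulas.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "not"}

def preprocess(text):
    """Lowercase, tokenize on letters/digits and drop stop words (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tf(term, doc_counts):
    """TF(t, d) = freq(t, d) / max over w in d of freq(w, d)."""
    return doc_counts[term] / max(doc_counts.values())

def idf(term, corpus_counts):
    """IDF(t, D) = log2(|D| / number of documents containing t)."""
    df = sum(1 for counts in corpus_counts if term in counts)
    return math.log2(len(corpus_counts) / df)

def tfidf_vectors(texts):
    """Return one {term: TF-IDF weight} dict per document."""
    corpus_counts = [Counter(preprocess(t)) for t in texts]
    return [{t: tf(t, counts) * idf(t, corpus_counts) for t in counts}
            for counts in corpus_counts]

# Hypothetical defect-report summaries.
summaries = [
    "Gateway rejects the order",
    "Order book is not updated",
    "Gateway gateway timeout on login",
]
for vec in tfidf_vectors(summaries):
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```

Terms that occur in every report get an IDF of zero, so they carry no weight in the bag-of-words vector.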
Techniques: feature selection
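Information gain, the first of the feature selection methods named in the conclusions, scores each term by how much knowing whether the term occurs reduces the uncertainty (entropy) of the area label. A minimal sketch, assuming binary term-presence features and a binary "belongs to this area" label; the function names and toy data are hypothetical, and the presentation does not say which implementation was used.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG = H(C) - sum over v in {present, absent} of P(v) * H(C | v)."""
    n = len(docs)
    gain = entropy(labels)
    for present in (True, False):
        subset = [y for d, y in zip(docs, labels) if (term in d) == present]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain

def rank_terms(docs, labels, k):
    """Keep the k terms with the highest information gain (ties keep alphabetical order)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]

# Hypothetical tokenized reports; 1 = report belongs to the area of interest.
docs = [{"gateway", "reject"}, {"gateway", "timeout"}, {"book", "update"}, {"book", "depth"}]
labels = [1, 1, 0, 0]
print(rank_terms(docs, labels, 2))  # ['book', 'gateway']
```

Here "gateway" and "book" each separate the two classes perfectly (gain of 1 bit), while "reject" occurs in only one positive report and scores about 0.31.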
Techniques
Classifiers:
● Logistic regression (LogReg)
● SVM
● Decision tree (J48)
● Random forest (RandFor)
● Naive Bayes (Bayes)
● Bayes Net (Bnet)
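Naive Bayes, the "Bayes" row in the results tables, is the simplest of these classifiers to write out. A minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, assuming tokenized reports and area labels; the slides do not specify which Naive Bayes variant or toolkit was used, and the class name and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.log_prior = {}
        self.log_likelihood = defaultdict(dict)
        n = len(docs)
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / n)
            counts = Counter(t for d in class_docs for t in d)
            total = sum(counts.values()) + len(self.vocab)
            for t in self.vocab:
                # P(t | c) = (count(t, c) + 1) / (tokens in c + |V|)
                self.log_likelihood[c][t] = math.log((counts[t] + 1) / total)
        return self

    def predict(self, doc):
        """Return the class with the highest log-posterior; unseen tokens are ignored."""
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)

# Hypothetical training data: tokenized reports labelled with an area.
train = [
    (["gateway", "reject", "order"], "gateway"),
    (["gateway", "timeout", "login"], "gateway"),
    (["book", "depth", "update"], "other"),
    (["book", "order", "update"], "other"),
]
model = MultinomialNB().fit([d for d, _ in train], [y for _, y in train])
print(model.predict(["gateway", "reject"]))  # gateway
print(model.predict(["book", "depth"]))      # other
```

Working in log space avoids numeric underflow when a report contains many tokens.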
Objects
Example (diagram)
CR: T1: Property1 = true; T2: Property1 = true
Market Structure Document: Ti: Property1 = false
Current situation: Market Structure: T1: Property1 = true; Gateway: T1: Property1 = NULL
Approach
Results: metrics
Results: hold out
F-measure per area of testing. FS = feature selection method (No = none, IG = information gain, Cons = consistency-based, Cfs = correlation-based, SSF = simplified silhouette filter).
Red values mark the minimum F-measure and green values the maximum.

Classifier  FS    Area 1  Area 2  Area 3  Area 4  Area 5  Area 6  Area 7  Area 8
LogReg      No    0.745   0.404   0.758   0.905   0.8     0.892   0.964   0.877
SVM         No    0.741   0       0.389   0.852   0.389   0.723   0.914   0.864
J48         No    0.898   0.832   0.739   0.953   0.931   0.955   0.991   0.952
RandFor     No    0.771   0.628   0.667   0.928   0.867   0.874   0.935   0.968
Bnet        No    0.716   0.864   0.764   0.912   0.92    0.862   0.982   0.917
Bayes       No    0.68    0.628   0.647   0.847   0.779   0.777   0.956   0.867
LogReg      IG    0.907   0.811   0.764   0.883   0.88    0.922   0.894   0.916
SVM         IG    0.948   0.862   0.836   0.924   0.938   0.95    0.991   0.938
J48         IG    0.822   0.867   0.739   0.943   0.931   0.955   0.991   0.973
RandFor     IG    0.959   0.887   0.897   0.938   0.948   0.936   0.991   0.98
Bnet        IG    0.716   0.864   0.764   0.912   0.92    0.862   0.982   0.917
Bayes       IG    0.701   0.633   0.688   0.846   0.815   0.784   0.956   0.861
LogReg      Cons  0.909   0.86    0.915   0.952   0.938   0.964   0.991   0.973
SVM         Cons  0.95    0.87    0.885   0.953   0.938   0.964   0.991   0.976
J48         Cons  0.804   0.829   0.739   0.921   0.931   0.955   0.991   0.902
RandFor     Cons  0.939   0.877   0.9     0.95    0.945   0.964   0.991   0.991
Bnet        Cons  0.86    0.862   0.792   0.941   0.939   0.964   0.991   0.962
Bayes       Cons  0.816   0.752   0.733   0.892   0.935   0.955   0.991   0.929
LogReg      Cfs   0.88    0.811   0.83    0.921   0.93    0.915   0.991   0.912
SVM         Cfs   0.941   0.862   0.836   0.915   0.938   0.936   0.957   0.91
J48         Cfs   0.821   0.821   0.739   0.916   0.931   0.931   0.991   0.838
RandFor     Cfs   0.941   0.842   0.815   0.93    0.938   0.936   0.991   0.918
Bnet        Cfs   0.782   0.862   0.815   0.926   0.945   0.847   0.982   0.903
Bayes       Cfs   0.714   0.782   0.881   0.914   0.925   0.8     0.991   0.889
LogReg      SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
SVM         SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
J48         SSF   0.821   0.829   0.739   0.916   0.931   0.955   0.991   0.894
RandFor     SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
Bnet        SSF   0.86    0.862   0.836   0.916   0.938   0.955   0.991   0.962
Bayes       SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.928
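The values in the table are F-measures. Assuming the standard balanced F1 score for the positive class (the slides do not state whether F1 or a class-weighted average was reported), it can be computed as below. Under this assumption, a score of 0, such as SVM without feature selection in Area 2, is what F1 yields when a classifier produces no true positives for that area. The function name and sample labels are hypothetical.

```python
def f_measure(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one positive class.
    Returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-area labels: 1 = report belongs to the area.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
print(f_measure(y_true, y_pred))  # 0.75 (precision 3/4, recall 3/4)
```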
Results: hold out
Results: cross-validation
F-measure per area of testing. FS = feature selection method (No = none, IG = information gain, Cons = consistency-based, Cfs = correlation-based, SSF = simplified silhouette filter).

Classifier  FS    Area 1  Area 2  Area 3  Area 4  Area 5  Area 6  Area 7  Area 8
LogReg      No    0.724   0.654   0.464   0.837   0.618   0.875   0.967   0.915
SVM         No    0.748   0.052   0.726   0.873   0.563   0.86    0.949   0.877
J48         No    0.925   0.821   0.743   0.925   0.927   0.963   0.991   0.957
RandFor     No    0.813   0.687   0.721   0.93    0.875   0.941   0.975   0.948
Bnet        No    0.717   0.856   0.691   0.913   0.89    0.911   0.982   0.911
Bayes       No    0.718   0.7     0.654   0.853   0.789   0.814   0.969   0.841
LogReg      IG    0.856   0.785   0.789   0.881   0.882   0.852   0.991   0.879
SVM         IG    0.948   0.854   0.825   0.933   0.954   0.971   0.991   0.943
J48         IG    0.931   0.868   0.752   0.947   0.944   0.969   0.991   0.957
RandFor     IG    0.954   0.859   0.918   0.939   0.943   0.964   0.985   0.974
Bnet        IG    0.717   0.856   0.691   0.913   0.818   0.911   0.982   0.911
Bayes       IG    0.718   0.776   0.631   0.849   0.89    0.827   0.973   0.844
LogReg      Cons  0.934   0.833   0.914   0.948   0.948   0.974   0.991   0.969
SVM         Cons  0.946   0.844   0.914   0.954   0.954   0.976   0.991   0.965
J48         Cons  0.931   0.809   0.789   0.923   0.934   0.968   0.991   0.952
RandFor     Cons  0.942   0.837   0.92    0.95    0.951   0.975   0.991   0.975
Bnet        Cons  0.818   0.855   0.757   0.946   0.93    0.975   0.991   0.964
Bayes       Cons  0.811   0.773   0.78    0.882   0.891   0.937   0.991   0.935
LogReg      Cfs   0.921   0.831   0.872   0.931   0.939   0.951   0.982   0.915
SVM         Cfs   0.941   0.844   0.841   0.937   0.952   0.962   0.982   0.92
J48         Cfs   0.933   0.791   0.748   0.917   0.933   0.963   0.991   0.905
RandFor     Cfs   0.929   0.858   0.88    0.938   0.949   0.958   0.988   0.922
Bnet        Cfs   0.797   0.856   0.815   0.931   0.93    0.935   0.988   0.903
Bayes       Cfs   0.739   0.78    0.865   0.909   0.912   0.879   0.988   0.849
LogReg      SSF   0.924   0.856   0.836   0.916   0.942   0.968   0.991   0.96
SVM         SSF   0.924   0.849   0.836   0.917   0.941   0.968   0.991   0.96
J48         SSF   0.927   0.794   0.748   0.917   0.933   0.968   0.991   0.942
RandFor     SSF   0.924   0.849   0.841   0.916   0.942   0.968   0.991   0.958
Bnet        SSF   0.866   0.856   0.823   0.915   0.942   0.968   0.991   0.958
Bayes       SSF   0.924   0.85    0.841   0.916   0.938   0.968   0.991   0.957

Red values mark the minimum F-measure and green values the maximum.
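The two result tables differ only in how the 2,795 reports are divided into training and test data. A minimal sketch of both schemes, assuming a random (non-stratified) split; the 30% test fraction, k = 10 folds and the fixed seed are assumptions, since the split proportions and number of folds are not given here, and the function names are hypothetical.

```python
import random

def hold_out_split(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 once and cut them into a train and a test list."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_fraction)))
    return idx[:cut], idx[cut:]

def k_fold_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; every index is tested in exactly one fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

train, test = hold_out_split(20)
print(len(train), len(test))  # 14 6
for train, test in k_fold_splits(20, k=5):
    print(len(train), len(test))  # 16 4, five times
```

Hold-out evaluates each model once on a single test set, whereas cross-validation averages over k test folds, so its scores depend less on one particular split.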
Results: cross-validation
Results: hold-out vs cross-validation
Conclusions
1. Manual classification of 2,795 defect reports extracted from the bug tracking system according to the area of testing.
2. Building classifiers for each area using different machine learning and natural language processing techniques.
Methods of feature selection: information gain, the consistency-based and correlation-based methods, and the simplified silhouette filter.
Methods of classification: logistic regression, support vector machines, decision tree, random forest, Bayes net and Naive Bayes.
❖ Feature selection is an integral part of a successful classification process.
❖ The following combinations of classifiers and feature selection methods give the best results with both the hold-out and the cross-validation split:
- random forest and information gain;
- random forest and the consistency-based method;
- support vector machines and information gain;
- support vector machines and the consistency-based method.
Future work
● Clustering of defect reports
● Predicting which defects get reopened
Thank you!
Related work
● Antoniol G., Ayari K., Di Penta M., Khomh F., Guéhéneuc Y.-G.: Is it a bug or an enhancement?: A text-based approach to classify change requests. In Proc. 2008 Conf. of the Center for Advanced Studies on Collaborative Research: Meeting of Minds (CASCON '08), Article No. 23, ACM, New York, NY, USA, 2008, 304-318
● Xia X., Lo D., Qiu W., Wang B., Zhou B.: Automated Configuration Bug Report Prediction Using Text Mining. In Proc. 2014 IEEE 38th Annual Computer Software and Applications Conference (COMPSAC), 2014, 107-116
● Gegick M., Rotella P., Xie T.: Identifying security bug reports via text mining: An industrial case study. In Proc. 7th IEEE Working Conf. on Mining Software Repositories (MSR), IEEE Computer Society, May 2010, 11-20
● Zhou Y., Tong Y., Gu R., Gall H.C.: Combining Text Mining and Data Mining for Bug Report Classification. In Proc. 30th International Conference on Software Maintenance and Evolution (ICSME), IEEE, 2014, 311-320
● Somasundaram K., Murphy G.C.: Automatic categorization of bug reports using latent Dirichlet allocation. In Proc. 5th India Software Engineering Conference (ISEC '12), ACM, New York, 2012, 125-130
● Čubranić D., Murphy G.C.: Automatic bug triage using text categorization. In Proc. 16th Int. Conf. on Software Engineering and Knowledge Engineering (SEKE), KSI Press, 2004, 92-97
● Sureka A., Indukuri K.V.: Linguistic analysis of bug report titles with respect to the dimension of bug importance. In Proc. Third Annual ACM Bangalore Conference, Article No. 9, ACM, 2010, 1-6