
Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2013, Article ID 869356, 12 pages
http://dx.doi.org/10.1155/2013/869356

Research Article
Using Software Dependency to Bug Prediction

Peng He,1 Bing Li,2 Yutao Ma,2 and Lulu He2

1 State Key Lab of Software Engineering, School of Computer, Wuhan University, Wuhan 430072, China
2 State Key Lab of Software Engineering, School of Computer, Research Center of Complex Network, Wuhan University, Wuhan 430072, China

Correspondence should be addressed to Peng He; penghe@whu.edu.cn

Received 12 March 2013; Revised 16 August 2013; Accepted 16 August 2013

Academic Editor: Chuandong Li

Copyright © 2013 Peng He et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Software maintenance, especially bug prediction, plays an important role in evaluating software quality and balancing development costs. This study uses several quantitative network metrics to explore their relationships with bug prediction in terms of software dependency. Our work consists of four main steps. First, we constructed software dependency networks covering five dependency scenes at class-level granularity. Second, we used a set of nine representative and commonly used metrics (namely, centrality, degree, PageRank, and HITS, as well as modularity) to quantify the importance of each class. Third, we identified how these metrics were related to the proneness and severity of fixed bugs in Tomcat and Ant and determined the extent to which they were related. Finally, the significant metrics were considered as predictors for bug proneness and severity. The results suggest that there is a statistically significant relationship between a class's importance and bug prediction. Furthermore, the betweenness centrality and out-degree metrics yield an impressive accuracy for bug prediction and test prioritization. The best accuracy of our prediction for bug proneness and bug severity is up to 54.7% and 66.7% (top 50, Tomcat) and 63.8% and 48.7% (top 100, Ant), respectively, within these two cases.

1. Introduction

During software development and maintenance, bugs (defects) are one of the most important forces driving the improvement of software quality. It is well known that software engineering is a systematic and disciplined approach to developing software. In detail, it applies computer science and engineering principles and practices to the generation, operation, and maintenance of software systems. There are many key processes in software engineering. In particular, software maintenance and upgrade play a vital role. They have many advantages, such as improving the efficiency of programming, reducing the cost of maintenance, and promoting the development of software systems. In fact, most development effort and expenditure are allocated to this stage. The majority of software projects today are becoming increasingly large and complex. Maintenance is considered an ongoing process throughout the software life cycle, and maintenance activities are reported to account for over 70% of the total life cycle costs of a software product. A large development project incurs a sizable number of bug reports every day, and more and more effort is needed to resolve them. Thus, effective and efficient solutions to bug prediction and test prioritization are urgently needed for both open source software and proprietary software.

Due to notable discoveries in the fast-evolving field of complex networks and the dramatic increase in the scale and complexity of real-world software systems, more and more research in software engineering has focused on representing the topology of software systems with network theory. Software represents one of the most diverse and sophisticated human-made systems; however, little is known about the actual structure and quantitative properties of (large) software systems. In the context of combining complex network theories with software engineering practices, research on bug prediction has already made several discoveries over the past years. In this paper, we propose to use some quantitative metrics from network science to realize bug prediction for complex systems and further enhance software engineering practices.


Given the advantages of open source (i.e., the openness of source code and the available data repositories), the following work is conducted on open-source software. Open-source software usually maintains a bug repository (e.g., Bugzilla (http://www.bugzilla.org), GNATS (http://www.gnu.org/software/gnats), and JIRA (http://www.atlassian.com/software/jira)). With the increasing scale of software projects, new features are added and more bugs are introduced into the system. According to the statistics [1], in Eclipse on average 37 bugs were submitted and more than 3 person-hours were spent on handling them per day; Mozilla discovers more than 300 bugs. However, the potential bugs in a system are often far more numerous, and more serious, than those that have been submitted. How to quickly retrieve more bugs, or more serious bugs, is the overarching goal of this work; specifically, which metrics can be adapted to resolve this problem?

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the preliminary theories of software networks and network metrics together with the research questions, and Section 4 focuses on the research approach and the construction of the software network. Section 5 shows the whole process of our experiment. After that, we discuss the empirical results and several limitations. In Section 7, a conclusion for practice and research is made.

2. Related Work

Defect prediction models are used to support software inspection and to improve software quality by locating possible defects. For software bug prediction, many software metrics have been proposed. The most commonly used ones are the CK object-oriented metrics; traditional metrics (e.g., LOC) and process metrics (e.g., code churn) are also often used in the literature [2]. With the increasing maturity of complex network theory, network metrics such as centrality are attracting more attention and are used for defect prediction [3–5].

2.1. Software Network. Many real-world software systems are extremely complex and are regarded as complex systems, in which software entities (e.g., objects, classes, and packages) are abstracted as nodes and the dependencies between two nodes as links or edges. Dependencies essentially represent the information flow within a software system and exert at least some influence on the overall success and quality of the product [6].

Characterizing large or ultra-large-scale (ULS) software systems as complex networks is rational, and a great quantity of literature [7–12] has already demonstrated some characteristics of software networks, such as the small-world phenomenon and the scale-free distribution of degree. Likewise, because software is built up out of many interacting units and subsystems (software entities) at multiple levels of granularity, software networks at different levels of granularity (component, feature, class, method, etc.) have been constructed and studied by researchers [13].

2.2. Network Metrics. Many metrics have been defined in both complex networks and social network analysis. Centrality [14, 15] is used to assess the relative importance of nodes in a given network. The simplest one is degree centrality, defined as the number of connections a node has to other nodes. In a social context, degree centrality can be explained in terms of the potential impact of a node on other nodes. However, degree centrality does not capture the global position of a node in the network, that is, how important a node is to all other nodes, so further measures have been proposed, such as closeness centrality, betweenness centrality, and eigenvector centrality.

Modularity [15, 16] is a desirable characteristic for software systems. The modularity of a software architecture contributes to the sustainability of large-scale software projects by fostering the decoupling and cohesiveness of software development. Particularly as the software evolves over time, modularity might even facilitate its maintainability and expandability. In some literature, besides in-degree and out-degree [8], which were used to represent reuse and complexity, respectively, the PageRank and HITS [17] ranking algorithms were used to handle directed networks.

Additionally, Zimmermann and Nagappan [3] studied Windows Server 2003, leveraging dependency relationships between software entities, captured using social network metrics, to predict whether the entities are likely to have defects. They found that network metrics perform significantly better than source code metrics at predicting defects. Premraj and Herzig [18] presented a replication study based on the work of Zimmermann and Nagappan (Z&N) and found that network metrics perform much better than code metrics under the stratified random sampling method.

2.3. Network Metrics for Application. Meneely et al. [4] examined collaboration structure with a developer network derived from code churn information to predict failures at the file level and developed failure prediction models using test and postrelease failure data. The results indicated that a significant correlation exists between file-based developer network metrics and failures. Crowston et al. [5] identified the core-periphery structure and applied these measures to the interactions around bug fixing for 116 SourceForge projects. Pinzger et al. [19] investigated the relationship between the fragmentation of developer contributions, measured by network centrality metrics, and the number of postrelease failures with a developer-module network. The authors claimed that central modules were more likely to be failure prone than those located in surrounding areas of the network. Shin et al. [20] used network analysis to discriminate and predict vulnerable code locations. They found that network metrics can discriminate between vulnerable and neutral files and predict vulnerabilities. Sureka et al. [21] derived a collaboration network from a defect tracking system and applied network analysis to the derived network for the purpose of risk and vulnerability analysis. They demonstrated that important information on risk and vulnerability can be uncovered using network analysis techniques.

Dependencies exist between various components. Modifying a component with little regard to its dependencies may have an adverse impact on the quality of the components that depend on it. Zimmermann and Nagappan [3] proposed to use network analysis on software dependency networks to help identify central program units that are more likely to conceal defects. Perin et al. [22] used PageRank to rank the classes of the Pharo Smalltalk system based on a dependency graph representing class inheritance and reference. Bhattacharya et al. [23] constructed software networks at the source code and module levels to capture software evolution, estimate bug severity, prioritize refactoring efforts, and predict defect-prone releases. Steidl et al. [24] used different network analysis metrics on dependency networks to retrieve central classes and demonstrated that the results can compete with the suggestions of experienced developers. Zanetti et al. [15] studied bug reporter centrality and validated that the centrality of bug reporters is greatly helpful for bug triaging procedures.

To the best of our knowledge, bug prediction is still an open problem; hence, we expect that our study can supplement the existing solutions. Although prior research has used network metrics to predict failure proneness or bug severity, our study leverages more relationships between classes in analyzing the implications of dependencies. Additionally, prior research did not consider both proneness and severity, as is done in this paper. We constructed software networks at class-level granularity and then used network metrics to analyze bug proneness and severity. Differing from the existing work, we make the following contributions.

(1) Five dependency scenes are considered in our class-level software network: inheritance, field, method, return, and parameter dependency. Most existing research took only part of these relationships into account.

(2) A comparison between different metrics is made first, to explore the extent to which they reflect bug quantity and severity; after that, predictors are modeled with the significant metrics to improve software development. Some researchers directly assemble various network metrics without filtration, which inevitably introduces bias and thus reduces accuracy. Our work bridges this gap by making the comparison first.

(3) We select the betweenness centrality and out-degree metrics for bug proneness and severity prediction. With 10% of the effort, the F-value obtained for bug proneness and severity is up to 54.7% and 66.7% (Tomcat) and 63.8% and 48.7% (Ant), respectively, which is comparable with the results in [4, 25].

3. Preliminary Study

3.1. Software Networks. There are all kinds of networks around us, such as social networks (e.g., friendship networks and scientific collaboration networks), technological networks (e.g., the Internet and the WWW), and biological networks (e.g., neural networks and the interaction of proteins in yeast). Surprisingly, the underlying structures of these networks have statistical features completely different from those of regular and random networks. They all belong to complex networks. These discoveries have served to draw together many disparate domains into the emerging field of network science.

Figure 1: A simple example of software network (code fragments declaring classes A, B, C, and D, with B extending A, next to the corresponding network of nodes A, B, C, and D).

With the rapid development of software technology and the pervasiveness of the Internet, software scale and complexity have increased so sharply that developers cannot control a system from a global perspective, and software schedule and quality cannot be guaranteed. To overcome these challenges, some pioneers of complex systems introduced graph theory, using nodes and edges to simplify the structure. Surprisingly, software networks also show the basic characteristics of complex networks. These discoveries have attracted the attention of researchers in software engineering. At the intersection of complex networks and software engineering, an approach has emerged that abstracts a software system as a network, that is, a software network.

A software network is an interdisciplinary outcome that combines network science theory with software engineering practice. When constructing a software network, the source code is processed with reverse engineering methods (i.e., the code is compiled into an XML file, and the topological structure is then derived from the XML file). Figure 1 shows an example of fragments of a software network.

3.2. Network Metrics. Network metrics treat software entities as nodes in a graph and characterize them on the basis of their dependencies with other entities. As opposed to code metrics, network metrics take into account the interactions between entities, thus modelling the flow of software information. In this paper, node centrality, PageRank and HITS, in-, out-, and total degree, and modularity ratio metrics are involved; the significant metrics are then used to predict bug proneness and severity.

3.2.1. Eigenvector Centrality (EC). It is a measure of the influence of a node in a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the current node than equal connections to low-scoring nodes, and the scores are computed from the adjacency matrix. For a given network (graph) $G = (V, E)$ with $|V|$ nodes, let $A = (a_{ij})$ be the adjacency matrix; that is, $a_{ij} = 1$ if vertex $i$ is linked to vertex $j$, and $a_{ij} = 0$ otherwise. The centrality score of vertex $i$ can be defined as

$$x_i = \frac{1}{\lambda} \sum_{j \in M(i)} x_j = \frac{1}{\lambda} \sum_{j \in G} a_{ij} x_j, \qquad (1)$$

where $M(i)$ is the set of the neighbors of $i$ and $\lambda$ is a constant. Only the greatest eigenvalue results in the desired centrality measure; the $i$th component of the corresponding eigenvector then gives the centrality score of vertex $i$ in the network [26].
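As an illustration of (1), the following minimal sketch approximates the eigenvector belonging to the greatest eigenvalue by power iteration. It is not the tooling used in this study; the function name, the tolerance, and the small four-node example graph are assumptions made for illustration only.

```python
def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Approximate the dominant eigenvector of the adjacency matrix `adj`.

    `adj` is a list of rows with adj[i][j] = 1 if vertex i is linked to j.
    Illustrative helper only: plain power iteration may fail to converge on
    some graphs (for example bipartite ones).
    """
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iterations):
        # Right-hand side of (1) without the 1/lambda factor.
        new = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in new) ** 0.5
        if norm == 0.0:
            return new  # no edges: every score is zero
        new = [v / norm for v in new]  # rescaling plays the role of 1/lambda
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


# Hypothetical example: a triangle 0-1-2 with an extra vertex 3 attached to 0.
example = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(example)
```

In this example, vertex 0 receives the highest score and the pendant vertex 3 the lowest, which matches the intuition that links to well-connected vertices count for more.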

3.2.2. Betweenness Centrality (BC). It is based on the total number of shortest paths between all possible pairs of nodes that pass through a node [27]. It quantifies how frequently a node acts as a bridge along the shortest path between two other nodes. If $g^{(st)}_i$ is the number of shortest paths between nodes $s$ and $t$ that pass through node $i$, $n^{(st)}$ is the number of shortest paths between $s$ and $t$, and $n$ is the total number of nodes, then

$$\mathrm{BC}_i = \frac{\sum_{s < t} g^{(st)}_i / n^{(st)}}{(n - 1)(n - 2)}. \qquad (2)$$
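To make (2) concrete, the sketch below counts shortest paths with a breadth-first search from every node and then sums, over all pairs, the fraction of shortest paths that pass through each node. It treats links as undirected, as the sum over pairs with $s < t$ suggests, and it is a simple illustration rather than an efficient algorithm; the function names and the three-node example are hypothetical.

```python
from collections import deque


def bfs_counts(graph, source):
    """Return (distance, number of shortest paths) from `source` to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(graph):
    """Betweenness of every node of an undirected adjacency dict, as in (2)."""
    nodes = sorted(graph)
    n = len(nodes)
    info = {s: bfs_counts(graph, s) for s in nodes}
    bc = {i: 0.0 for i in nodes}
    for k, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[k + 1:]:  # each unordered pair once (s < t)
            if t not in dist_s:
                continue  # s and t are not connected
            dist_t, sigma_t = info[t]
            for i in nodes:
                if i == s or i == t or i not in dist_s or i not in dist_t:
                    continue
                if dist_s[i] + dist_t[i] == dist_s[t]:  # i lies on a shortest path
                    bc[i] += sigma_s[i] * sigma_t[i] / sigma_s[t]
    denom = (n - 1) * (n - 2)
    return {i: (bc[i] / denom if denom else 0.0) for i in nodes}


# Hypothetical path a - b - c: only b bridges the pair (a, c).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
bc_path = betweenness(path)
```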

3.2.3. Closeness Centrality (CC). It concerns the farness of a node from all other nodes and is thus defined as the inverse of the sum of the distances to all other nodes [28]. If $d(i, j)$ is the distance between nodes $i$ and $j$, then the centrality of $i$ is

$$\mathrm{CC}_i = \frac{1}{\sum_{j=1}^{V} d(i, j)}. \qquad (3)$$
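A minimal sketch of (3) follows, using breadth-first search to obtain unweighted distances. How unreachable nodes are treated is not specified above; this sketch simply leaves them out of the sum, which is an assumption, and the function name and example graph are hypothetical.

```python
from collections import deque


def closeness_centrality(graph):
    """Return CC_i = 1 / sum_j d(i, j) for every node of an adjacency dict.

    Unreachable nodes are skipped in the sum (an assumption of this sketch);
    a node that reaches no other node gets a score of 0.
    """
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search yields hop distances
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = 1.0 / total if total else 0.0
    return scores


# Hypothetical path a - b - c: the middle node is closest to the others.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cc = closeness_centrality(path)
```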

3.2.4. PageRank (PR). It is a variant of the eigenvector centrality measure; it is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at a particular page. The rank value indicates the importance of a page, and it is used here to denote the importance of a class:

$$\mathrm{PR}_i = \frac{1 - q}{N} + q \sum_{j \in V(i)} \frac{\mathrm{PR}_j}{\mathrm{out}D(j)}, \qquad (4)$$

where $q$ is a damping factor ($q = 0.85$), $V(i)$ is the set of nodes that link to $i$, $\mathrm{out}D(j)$ is the number of outgoing edges of node $j$, and $N$ is the total number of nodes in the network.
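The recurrence (4) can be evaluated by repeated substitution. The sketch below does this for a fixed number of rounds; it applies (4) literally, so the rank of a node without outgoing edges is not redistributed. The function name, the iteration count, and the three-class example are assumptions for illustration.

```python
def pagerank(out_links, q=0.85, iterations=100):
    """Iterate (4) on a directed graph given as {node: [targets]}.

    Every target must also appear as a key. Nodes without outgoing edges
    pass nothing on, exactly as (4) is written.
    """
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - q) / n for v in nodes}
        for j, targets in out_links.items():
            if targets:
                share = q * pr[j] / len(targets)  # q * PR_j / outD(j)
                for i in targets:
                    new[i] += share
        pr = new
    return pr


# Hypothetical dependencies: B and C both depend on A, and A depends on B.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
```

Here class A, which two classes depend on, ends up with the highest rank, and class C, which nothing depends on, keeps only the baseline share $(1 - q)/N$.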

3.2.5. HITS. It was also originally designed to rank web pages and calculates two scores: authority and hub scores. The authority score of a page is computed as the sum of the scaled hub scores of the pages that point to it. The hub score of a page is the sum of the scaled authority scores of the pages it points to. Here, the authority score is used as an experimental metric.
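The mutual definition of hub and authority scores can be sketched as an alternating update. The version below rescales both score vectors to unit Euclidean length after each round, which is one common choice of scaling and an assumption here; the function names and the example classes are hypothetical.

```python
def _normalise(scores):
    """Rescale a score dict to unit Euclidean length (all zeros stay zero)."""
    norm = sum(s * s for s in scores.values()) ** 0.5
    return {k: (s / norm if norm else 0.0) for k, s in scores.items()}


def hits(out_links, iterations=50):
    """Return (authority, hub) score dicts for a directed graph.

    `out_links` maps every node to the list of nodes it points to; every
    target must also appear as a key.
    """
    nodes = list(out_links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the nodes pointing to v.
        auth = {v: 0.0 for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                auth[v] += hub[u]
        auth = _normalise(auth)
        # Hub: sum of the authority scores of the nodes u points to.
        hub = _normalise({u: sum(auth[v] for v in out_links[u]) for u in nodes})
    return auth, hub


# Hypothetical classes: Util is used by A, B, and C; A also uses B.
authority, hub_score = hits({"A": ["Util", "B"], "B": ["Util"], "C": ["Util"], "Util": []})
```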

3.2.6. Degree (D). In a directed network, it is meaningful to distinguish in-degree (the number of incoming links) from out-degree (the number of outgoing links). For software networks, in-degree represents code reuse and out-degree represents design complexity. Nodes with large in-degree are heavily reused, and nodes with large out-degree are more complex in some contexts. In this paper, we use $\mathrm{in}D(i)$, $\mathrm{out}D(i)$, and $D$ to represent the in-degree, out-degree, and total degree of a node, respectively.
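The three degree metrics follow directly from a list of directed edges. The sketch below is a minimal illustration; the function name and the edge list are hypothetical and do not reproduce any network from this study.

```python
from collections import Counter


def degrees(edges):
    """Return {node: (inD, outD, D)} for directed (source, target) edges."""
    in_d, out_d = Counter(), Counter()
    for source, target in edges:
        out_d[source] += 1  # source depends on target
        in_d[target] += 1   # target is used by source
    nodes = set(in_d) | set(out_d)
    return {v: (in_d[v], out_d[v], in_d[v] + out_d[v]) for v in nodes}


# Hypothetical class-level dependencies.
deg = degrees([("B", "A"), ("B", "C"), ("D", "B"), ("D", "C")])
```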

3.2.7. Modularity Ratio (MR). Modularity is a metric proposed by Newman and Girvan [29] to evaluate the quality of a partition of a network into communities. A system consists of many packages, and each package has a large number of classes or subpackages. For a given definition of modules or clusters and their underlying network structure, the degree of modularity is defined by

$$Q = \frac{\sum_{i}^{n} e_{ii} - \sum_{i}^{n} a_i b_i}{1 - \sum_{i}^{n} a_i b_i}, \qquad (5)$$

where $e_{ij}$ is the fraction of all edges that link nodes in module $i$ to nodes in module $j$, $a_i = \sum_{j}^{n} e_{ij}$ and $b_i = \sum_{j}^{n} e_{ji}$ (the sums over the column and row index, resp.), and $n$ is the total number of existing modules. According to this equation, we define the MR of module $i$ as follows:

$$\mathrm{MR}_i = e_{ii} - a_i b_i. \qquad (6)$$
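A small worked sketch of (5) and (6): given directed edges and an assignment of nodes to modules, it builds the fractions $e_{ij}$, the sums $a_i$ and $b_i$, and then the MR of each module and the overall $Q$. The function name, the module names, and the four-edge example are assumptions for illustration.

```python
def modularity_ratio(edges, module_of):
    """Return ({module: MR_i}, Q) following (6) and (5).

    `edges` is a list of directed (source, target) pairs and `module_of`
    maps each node to its module (for example its package).
    """
    modules = sorted(set(module_of.values()))
    total = len(edges)
    # e[(i, j)]: fraction of edges from module i to module j.
    e = {(i, j): 0.0 for i in modules for j in modules}
    for source, target in edges:
        e[(module_of[source], module_of[target])] += 1.0 / total
    a = {i: sum(e[(i, j)] for j in modules) for i in modules}
    b = {i: sum(e[(j, i)] for j in modules) for i in modules}
    mr = {i: e[(i, i)] - a[i] * b[i] for i in modules}
    expected = sum(a[i] * b[i] for i in modules)
    q = sum(mr.values()) / (1.0 - expected) if expected != 1.0 else 0.0
    return mr, q


# Hypothetical example: two modules with one edge crossing between them.
example_edges = [("a1", "a2"), ("a2", "a1"), ("b1", "b2"), ("a1", "b1")]
example_modules = {"a1": "core", "a2": "core", "b1": "util", "b2": "util"}
mr, q = modularity_ratio(example_edges, example_modules)
```

For this example, $e_{\text{core,core}} = 0.5$, $a_{\text{core}} = 0.75$, and $b_{\text{core}} = 0.5$, so the MR of `core` is $0.5 - 0.375 = 0.125$; the MR of `util` is also $0.125$, and $Q = 0.25 / 0.5 = 0.5$.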

3.3. Network Metrics for Application. Before presenting the research questions and the details of our approach, we define two terms used henceforth in this paper, which indicate the practical application of our work.

(i) Bug proneness. Bug proneness is treated as a quality measure of software entities in this paper. Intuitively, it represents the probability that an entity (i.e., a class) will become buggy in the process of debugging. The higher the bug proneness of a class, the more likely it is to be given priority in testing. For our purpose, we evaluate it via the number of bugs and fixed classes.

(ii) Bug severity. This term describes the impact of a bug on the entire system. Bug tracking systems classify severity into several levels, such as critical, major, minor, and trivial. However, severity and priority are not the same concept. In some cases, a bug has critical severity but a low priority for resolution, because it will rarely occur in real scenarios. Here we investigate whether the importance of a class in the software network is related to the severity of the bugs caused by it.

With these metrics and application scenes, the following four research questions are proposed to guide the design of the case study. Questions 1–3 investigate the properties of software networks, whereas question 4 predicts bug proneness and severity with the significant metrics.

RQ1: Is the position of classes in the software network related to the bug proneness?

RQ2: Is the position of classes in the software network related to the bug severity?

RQ3: If so, which metrics are more significant?

RQ4: How well does the approach to predicting bug proneness/severity with significant metrics work?

4. Approach

In our approach, we mainly consider two open source projects written in Java, Tomcat and Ant, as our research subjects; both belong to the Apache Software Foundation (ASF) (http://ant.apache.org and http://tomcat.apache.org). The rationale is threefold. (1) They are well-known and stable projects; each has undergone a number of major release cycles and is still under active development. (2) The source code revision archives, SVN commit logs dating back several years, and bug reports are available, which offers a good opportunity for gaining experience with open source projects. (3) The choice of the Java programming language is constrained by the tools developed to construct software networks, and we are interested in understanding open source software written in Java.

We examined the information provided by the bug tracking systems (Bugzilla and JIRA) and the SVN commits and found that most bugs list the affected classes in detail, and some include the modified fragments of source code. This information determines the feasibility of our approach. Besides, the dependencies between classes are directed, which should not be ignored. Therefore, we decided to abstract directed but unweighted software networks at the class level.

Let us assume that $G = (V, E)$ is a software network, where $V = \{v_i\}$ is the set of all classes and $E = \{e_{ij}\}$ is the set of all dependencies. We distinguish between different kinds of dependencies as follows:

(i) $v_i$ implementing/extending the interface/class $v_j$ is an inheritance dependency;

(ii) $v_i$ having a field of type $v_j$ is a field dependency;

(iii) $v_i$ calling a method of $v_j$ is a method dependency;

(iv) a method of $v_i$ returning an object of type $v_j$ is a return dependency;

(v) a method of $v_i$ taking an object of type $v_j$ as a parameter is a parameter dependency.
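The five dependency kinds above can be collapsed into one directed, unweighted edge set. The sketch below shows one possible in-memory representation; the record format, the names `Dependency` and `build_network`, and the example records are hypothetical and are not the XML-based tooling used in this study.

```python
from enum import Enum


class Dependency(Enum):
    """The five class-level dependency kinds (i) to (v)."""
    INHERITANCE = "inheritance"
    FIELD = "field"
    METHOD = "method"
    RETURN = "return"
    PARAMETER = "parameter"


def build_network(records):
    """Build (nodes, edges) from (source, target, Dependency) records.

    Edges are directed and unweighted: several dependencies of different
    kinds between the same ordered pair of classes yield a single edge.
    """
    nodes, edges = set(), set()
    for source, target, kind in records:
        if not isinstance(kind, Dependency):
            raise ValueError(f"unknown dependency kind: {kind!r}")
        nodes.update((source, target))
        edges.add((source, target))
    return nodes, edges


# Hypothetical records: B extends A, holds a field of type C, and has a
# method that takes an A as a parameter.
nodes, edges = build_network([
    ("B", "A", Dependency.INHERITANCE),
    ("B", "C", Dependency.FIELD),
    ("B", "A", Dependency.PARAMETER),
])
```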

A bug report offers many fields to help developers understand the issue, one of which is severity, assigned by administrators based on how severely a bug affects the program. Table 1 shows the eight levels of bug severity and their ranks in Bugzilla. An objective for software providers is not only to minimize the total number of bugs but also to ensure that the severity of bugs is as low as possible [23]. The approach mainly consists of four phases: (1) compiling the source code files to extract the class-level directed software networks according to the dependency definitions; (2) exporting SVN commits and integrating them with bug reports to obtain the necessary bug-class associations; (3) calculating a set of network metrics in the established software networks and then determining their relationships with the number of bugs and bug severity; (4) finally, applying the significant metrics to bug prediction. The framework is shown in Figure 2.

Table 1: Levels of bug severity and their ranks in the Bugzilla bug tracking system.

Bug severity    Rank
Blocker         8
Critical        7
Regression      6
Major           5
Normal          4
Minor           3
Trivial         2
Enhancement     1
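The ranks in Table 1 map naturally to a lookup table. The sketch below encodes them and averages the ranks of the bugs attached to one class; taking the mean is an assumption made for illustration, since other aggregations (such as the maximum) are equally conceivable, and the function name is hypothetical.

```python
# Severity ranks as listed in Table 1.
SEVERITY_RANK = {
    "Blocker": 8,
    "Critical": 7,
    "Regression": 6,
    "Major": 5,
    "Normal": 4,
    "Minor": 3,
    "Trivial": 2,
    "Enhancement": 1,
}


def mean_severity(bug_severities):
    """Average Table 1 rank of a list of severity labels (0.0 if empty)."""
    if not bug_severities:
        return 0.0
    return sum(SEVERITY_RANK[s] for s in bug_severities) / len(bug_severities)


# Hypothetical class with one blocker and one normal bug.
example_severity = mean_severity(["Blocker", "Normal"])
```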

Figure 2: The framework of our approach. (Diagram labels: source code, software network, changelog, Bugzilla, bug-class, bug proneness, and bug severity; metrics EC, BC, CC, PR, HITS, inD, outD, D, and MR; steps (1) to (4).)

5. Case Study

This section presents the subject projects used for our study and the relevant data we collected. With the experimental results, the proposed questions are answered in turn.

5.1. Data. For our analysis, we collect several types of data. We gather the open source software data from source code repositories and bug databases and determine the rank of each bug. In this paper, we choose Tomcat 7 and Ant 1.8.4 as our experimental subjects. As successful open source projects, both Tomcat and Ant maintain publicly available sources. To construct the expected software network, the source code is compiled into an XML file, and the software network is then established by parsing the XML file. In the following, we abbreviate the Tomcat software network as TSN and the Ant software network as ASN. There are 35 versions of Tomcat 7. The latest version is still being revised, so only the other 34 versions are adopted in our work. There are few differences in the number of nodes/edges between versions. A likely explanation is that Tomcat is a relatively mature open source software project with a decade of development history. Of course, not all projects show this phenomenon; some projects are known to have nearly superlinear growth [13]. In the end, the stable Tomcat 7.0.29 version is used for the analysis; it consists of 2015 nodes, 9573 edges, and 19 communities, against 1345 nodes, 3937 edges, and 20 communities for Ant.

Table 2: The statistics: the number of bugs and fixed classes, and the nodes/edges of the software networks.

              Tomcat       Ant
bugs          475          480
classes       422 (21%)    389 (29%)
SN nodes      2015         1345
SN edges      9573         3937
community     19           20

The bug reports are divided into two sets, one used for the experiment and the other for prediction. That is, the first thirty versions of Tomcat are used for phases (2) and (3), and the last four versions for prediction. Because the Ant data refer to only one version, eighty percent of the Ant data is used for the experiment and the remaining twenty percent for prediction. Note that studies of software defects rely on links between bug databases and commit changes, and this information plays an important role in measuring quality and predicting defects. Some prior research suggested that there are missing links between bugs and bug-fix commit logs. The authors proposed automatic approaches to recover missing links [30, 31] and found that missing links lead to biased defect information and affect prediction performance. However, some authors argued that a reported bug may not be a bug but a feature or an enhancement [32, 33]. This misclassification also introduces bias in bug prediction models. When missing links are recovered, the number of false positives in bug classification increases. Hence, only bugs that were always reported as a bug in the bug tracker are taken into account. Once the effective bugs are identified, we check the change logs by searching for the bug ID and calculating the similarity of the summary text. The reason for using heuristics is that most missing links in our datasets are due to misclassification. The statistics of the experimental data are shown in Table 2. Some test classes are not taken into account because they are duplicates. In bug 53062, for example, both class org.apache.catalina.connector.TestResponse and org.apache.catalina.connector.Response appear, but they represent the same object in our study. Note that less than thirty percent of the classes are involved, which confirms that most of the bugs occur in a small number of classes. So what is needed is guided detection within a system.
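The linking heuristic described above (search the change log for a bug ID, then fall back to the similarity of the summary text) might be sketched as follows. The regular expression, the 0.6 similarity threshold, the function name, and the example messages are all assumptions; the exact log format and similarity measure used in this study are not specified.

```python
import re
from difflib import SequenceMatcher

# Hypothetical pattern for references such as "bug 53062" or "BZ #53062".
BUG_ID = re.compile(r"\b(?:bug|bz)\s*#?\s*(\d{4,6})\b", re.IGNORECASE)


def link_commit(message, bugs, threshold=0.6):
    """Return the IDs of the bugs that a commit message appears to fix.

    `bugs` maps bug ID (int) to the summary text of the bug report. An
    explicit ID in the message wins; otherwise the message is compared with
    every summary and sufficiently similar ones are linked.
    """
    linked = {int(m.group(1)) for m in BUG_ID.finditer(message)}
    linked &= set(bugs)  # keep only IDs of known, effective bugs
    if not linked:
        for bug_id, summary in bugs.items():
            ratio = SequenceMatcher(None, message.lower(), summary.lower()).ratio()
            if ratio >= threshold:
                linked.add(bug_id)
    return linked


# Hypothetical bug summary and commit messages.
known_bugs = {53062: "Response is committed twice"}
by_id = link_commit("Fix bug 53062 in connector", known_bugs)
by_text = link_commit("Response is committed twice", known_bugs)
```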

5.2. Results

RQ1: Is the Position of Classes in the Software Network Related to the Bug Proneness? To answer this research question, the nodes are first divided into groups by metric value in descending order, with each group containing the same number of nodes. This grouping facilitates the exploration of relations between the metrics and bugs. Because the last 15 nodes in TSN are never fixed, they are excluded so that each group contains 200 nodes. For ASN, the nodes are divided evenly into eleven groups. The number of fixed classes and the number of involved bugs in each group are recorded. With this information, RQ1 is not hard to answer.
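The grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `group_counts`, the input dictionaries, and the toy data are hypothetical, and dropping the tail of the ranking that does not fill a whole group is a simplification of the exclusion described in the text.

```python
def group_counts(metric, fixed, bugs_per_class, group_size):
    """Rank classes by metric value (descending), cut the ranking into
    equal-sized groups, and count fixed classes and distinct bugs per group."""
    ranked = sorted(metric, key=lambda c: (-metric[c], c))
    # Assumption: classes at the tail that do not fill a whole group are dropped.
    usable = len(ranked) - len(ranked) % group_size
    rows = []
    for start in range(0, usable, group_size):
        group = ranked[start:start + group_size]
        n_fixed = sum(1 for c in group if c in fixed)
        bug_ids = set()
        for c in group:
            bug_ids |= bugs_per_class.get(c, set())
        rows.append((start // group_size + 1, n_fixed, len(bug_ids)))
    return rows


# Toy data, for illustration only.
metric = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
fixed = {"A", "B", "D"}
bugs_per_class = {"A": {1, 2}, "B": {2}, "D": {3}}
rows = group_counts(metric, fixed, bugs_per_class, group_size=2)
for group_id, n_fixed, n_bugs in rows:
    print(group_id, n_fixed, n_bugs)
```

Each row holds the group ID, the number of fixed classes, and the number of distinct bugs, which are the quantities plotted per group in Figures 3 and 4.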

The number of fixed classes and the number of involved bugs for different metrics in each group are illustrated in Figures 3 and 4. In Figure 3, EC, inD, D, and HITS have a significantly negative correlation with the group ID (that is, the larger the average metric value of a class, the more bug-prone it is), while BC and outD show a skewed distribution; see Figures 3(a), 3(b), 3(e), 3(f), 3(g), and 3(h). Figure 4 differs in some respects: BC, outD, and D show a significant correlation but not an obviously skewed distribution; see Figures 4(b), 4(f), and 4(g). The results for the other metrics fluctuate, especially in groups six, eight, and nine.

MR shows the same relationship with fixed classes and bugs. It differs from the above metrics in that the distribution of bugs is higher than that of classes and fluctuates more. In subfigure (i), the x-axis is the rank of the module (community); the larger the modularity ratio, the higher its rank. An overall positive trend is shown, especially for the BC, outD, and D metrics: the larger the metric value of a class, the more bug-prone it is and the more likely it is to be modified.

RQ2: Is the Position of Classes in the Software Network Related to the Bug Severity? Understanding and characterizing the relationships between network metrics and bug severity in open-source software projects is also a very interesting problem. Although the metric value is positively related to bug proneness, it may not be related to bug severity, because a class may contain many bugs none of which is serious. We conduct the next experiment to examine this research question with the severity ranks given in Table 1. The results in Figure 5 show that the relationship deviates from what was expected. CC remains stable in both projects, and the overall distribution trend first increases and then decreases. The top row depicts the severity for Tomcat and the bottom row for Ant.

Ranks 5 and 7 are two prominent breakpoints, marked by the dotted line in each subfigure; they refer to major and critical bugs, respectively. Bugs with rank 8 have large metric values in TSN; in contrast, bugs of this kind have relatively low values in ASN. Blocker bugs should be avoided as much as possible in software engineering practice. When a bug of this kind occurs and is found, either its origin triggers modifications in many other parts, or the problem is only a small mistake that nevertheless affects a vital node.

Figure 3: Tomcat (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, and (i) MR; each panel plots two series, Classes and Bugs.

RQ3: If So, Which Metrics Are More Significant? Different metrics measure the importance of a node in a network from different aspects. According to the results obtained for RQ1 and RQ2, not all the metrics show the same expected relationship; some metrics may be more suitable for bug proneness prediction and others for severity prediction. In this section, a further analysis is conducted to determine how significant these metrics are and which metrics are better. Three typical correlation analysis methods are used to resolve this problem: Pearson, Kendall, and Spearman (Sig. P < 0.01). Pearson is widely used in statistics to measure the degree of the relationship between linearly related variables, but both variables should be normally distributed before it is used. Kendall and Spearman are nonparametric tests that do not make any assumptions about the underlying distribution. The former basically gives the impression of being a permutation test of sorts, whereas the latter is essentially based on the principle of least squares.
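To make the three coefficients concrete, the following sketch computes them for one metric from per-group counts. It is an illustration only: the counts are invented, the Kendall variant shown is tau-a (no tie correction), and significance testing is omitted, so it should not be read as the exact procedure behind Table 3.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# Invented counts of fixed classes per group (group 1 has the largest metric values).
group_ids = [1, 2, 3, 4, 5]
fixed_counts = [40, 25, 18, 9, 3]
r = pearson(group_ids, fixed_counts)
rho = spearman(group_ids, fixed_counts)
tau = kendall_tau_a(group_ids, fixed_counts)
print(round(r, 3), round(rho, 3), round(tau, 3))
```

Because groups are ordered by descending metric value, a strongly negative coefficient against the group ID corresponds to more fixed classes among the classes with larger metric values.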

Table 3 gives the correlation coefficient between metric value and bug proneness for eight metrics. For Tomcat, the coefficient of outD is the minimum with 0.552 and that of D is the maximum with 0.967 (in absolute value), whereas the CC and PR metrics are not correlated. Although the correlations with outD are the minimum in all three methods, the skewed distribution curve in Figure 3(f) shows that most of the bugs and classes are fixed in the first two groups. On the other hand, because there is no skew for Ant, outD is more significant than the other metrics in all three methods; however, CC, EC, inD, PR, and HITS are all not significant. The maximum for outD is up to 0.952. The differences mentioned above suggest that these metrics should not be treated equally and should be used with caution. Applying all metrics indiscriminately will likely lead to misleading interpretations. Consequently, BC and outD are the suitable metrics to represent bug proneness and severity. Both of them are used for the subsequent prediction.

Figure 4: Ant (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, and (i) MR; each panel plots two series, Classes and Bugs.

Table 3: Bug proneness: the correlation coefficient with eight network metrics (Sig. P < 0.01).

Method        BC      CC  D       EC      inD     outD    PR      HITS
Pearson-TSN   -0.804  -   -0.903  -0.852  -0.852  -0.752  -0.728  -0.933
Kendall-TSN   -0.764  -   -0.899  -0.689  -0.689  -0.552  -       -0.733
Spearman-TSN  -0.888  -   -0.967  -0.879  -0.879  -0.656  -       -0.903
Pearson-ASN   -0.891  -   -0.895  -       -       -0.915  -       -
Kendall-ASN   -0.782  -   -0.807  -       -       -0.844  -       -
Spearman-ASN  -0.909  -   -0.916  -       -       -0.952  -       -

The bold value refers to the maximum correlation coefficient of different analysis methods per project by comparing the eight metrics.

RQ4: How Well Does the Approach to Predicting Bug Proneness/Severity with Significant Metrics Work? Lastly, once we have understood and characterized the relationships between network metrics and bug proneness or severity in two open-source projects, what are the effects, if any, on software quality? Or what are the benefits to software engineering practices?

The above experiments indicate that the BC and outD metrics are well suited to representing bug proneness and severity. To check whether this conclusion holds, we validate it through bug proneness and severity prediction on the test data. The test data contain 67 effective bugs and 87 related fixed classes for Tomcat, and 135 bugs and 149 fixed classes for Ant. For the purposes of the experiment, bug severity is divided into two categories: a slight category, in which the rank of a bug is less than 4, and a severe category for all other bugs. Table 4 shows the resulting predictive F-value from top 50 to top 200 for Tomcat and from top 50 to top 125 for Ant, using the selected metrics on the software networks. The top k is kept within 200 and 125 for these two cases to ensure that the selected nodes come from the first group. The F-value is calculated by combining precision and recall as

$$F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{7}$$

The BCD metric is a combination of BC and outD, defined as BCD = BC + outD, and is used for comparison with each of them. For Tomcat, the BCD metric performs better on fixed-class prediction than the others, with a maximum of 0.3316 when the top 100 nodes are returned. When all the nodes in group ID = 1 are selected, the three metrics obtain the same accuracy, 0.2857. For the number of bugs, the maximum is 0.547, obtained with the outD metric when only the top 50 classes need to be checked. In this column, the outD and BCD metrics are at least as good as the BC metric in all cases. The last column gives the results for bug severity. The finding shows that outD performs far better than the other metrics, with an F-value up to more than twice that of BC. The best case is 0.6667 at the top 50 nodes.
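The top-k prediction and its F-value can be sketched as below. This is a minimal illustration under stated assumptions: the function names and toy values are hypothetical, and BC and outD are summed as raw values because the text does not say whether they were normalized before being combined.

```python
def f_value(predicted, actual):
    """F-value combining precision and recall, as in equation (7)."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


def top_k(score, k):
    """Return the k classes with the largest score (ties broken by name)."""
    return sorted(score, key=lambda c: (-score[c], c))[:k]


# Toy metric values, for illustration only.
bc = {"A": 0.5, "B": 0.1, "C": 0.3, "D": 0.0}
out_d = {"A": 2, "B": 5, "C": 1, "D": 0}
# Assumption: BCD is the plain sum of the raw BC and outD values.
bcd = {c: bc[c] + out_d[c] for c in bc}

actually_fixed = {"A", "C", "D"}
predicted = top_k(bcd, 2)
score = f_value(predicted, actually_fixed)
print(predicted, round(score, 4))
```

In this toy case one of the two predicted classes is actually fixed, so precision is 1/2, recall is 1/3, and the F-value is 0.4.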

For Ant, the BCD metric performs better than the other two on fixed classes and bugs when the top 50 classes are returned. Apart from this case, outD has the highest accuracy at top 100 and top 125 and obtains the best result when the top 100 classes are returned: the maxima for classes, bugs, and severity are 0.3534, 0.6383, and 0.4865, respectively. Throughout the prediction, outD is the most suitable metric for predicting bug proneness and bug severity in a directed class-level software network.

Finally, some attention is devoted to the average number of people and comments needed when developers resolve each kind of bug. Figure 6 shows that more serious bugs need more people to participate, and these people hold more discussions before the bugs are marked as fixed, which is consistent with common experience, except for rank = 1. A reasonable explanation for this exception is that enhancement bugs have the lowest difficulty, so many developers are able to find the problem and offer their own suggestions or revisions.

6. Discussion

RQ1: From the perspective of node importance, the more important a node is, the greater its probability of being modified and the more bugs it is involved in during the software development process. The central classes (nodes) form the skeleton of the entire system, especially the nodes with large betweenness centrality (BC) and many outgoing links (outD). The former category of nodes plays an important role in bridging different modules to ensure the normal execution of the entire system. Such nodes are also called “brokers” in network science. The latter nodes are more complex because they aggregate behaviors from many other nodes. These observations have implications for software testing. Nodes with high BC are more important in many contexts and have significant external responsibility. Bugs occurring in these nodes influence the reachability between modules. On the other hand, nodes with high outD aggregate many behaviors of other nodes and have significant internal complexity. They are more likely to contain bugs and to be fixed. In this part, the PR and CC metrics are not significant. In summary, our results are consistent with the work in [34]: some metrics are related to bug proneness and others are not.

We conclude that the different importance metrics of a node play an inconsistent role in the analysis of bug proneness. BC and outD are clearly suitable for test prioritization when developers want to know which classes deserve more attention. In some special scenarios, node importance and out-degree can also be applied to bug localization, given the relationship between them.

RQ2 and RQ3: The hypothesis that a class with a large metric value is highly bug-prone and that the bugs in this class are more serious is not always true. Nevertheless, Table 3 shows that the averages of the BC, D, and outD metrics have a significant correlation with software bug proneness. The most likely explanation for the strong correlations between some metrics and bug proneness is that the classes were divided, in descending order, into groups with the same number of classes during the experiment. This treatment also means that the average metric value of a preceding group is larger than that of the succeeding one, which corresponds to the top-k recommendation in the subsequent section. The results show an overall trend between groups, so the correlations are stronger than those generally reported.

In Figure 5, major, regression, and critical bugs have relatively high values in both projects. Moreover, this phenomenon is more obvious for major bugs. Additionally, bugs arising from a broker hinder the transfer of information between modules and the function scheduling of the entire system. If the modules cannot work together smoothly, the quality of the system cannot be guaranteed. A class with many outgoing links has more potential bugs, and the severity of those bugs is also higher. Such classes are usually the central components in a system, or the central nodes in a network.

Members of a project team are sometimes more concerned about whether they can find serious bugs. In other words, they hope to find more bugs, and more severe bugs, quickly and efficiently within a limited period of time. Our results provide an appropriate method for alleviating this challenge by means of the BC and outD metrics.

RQ4: By comparing eight network metrics, BC and outD are selected to predict bug proneness and severity. Overall, our results show a significant advantage of outD over the other metrics. It can be applied to identify which classes should be modified before others. The result also indicates that severe bugs are usually associated with strong internal complexity.

Table 4: F-value: fixed classes, bugs, and bug severity.

                  | Classes                | Bugs                   | Severity
Project  Top k    | BC      outD    BCD    | BC      outD    BCD    | BC      outD    BCD
Tomcat   Top 50   | 0.2043  0.2920  0.2628 | 0.3760  0.5470  0.4786 | 0.2917  0.6667  0.4583
Tomcat   Top 100  | 0.2567  0.3102  0.3316 | 0.4072  0.4431  0.4551 | 0.3288  0.5069  0.3836
Tomcat   Top 150  | 0.2616  0.2954  0.2954 | 0.3594  0.3779  0.3687 | 0.2857  0.4184  0.2959
Tomcat   Top 200  | 0.2857  0.2857  0.2857 | 0.3071  0.3071  0.3521 | 0.2439  0.3740  0.2846
Ant      Top 50   | 0.1307  0.2111  0.2312 | 0.1946  0.4221  0.4322 | 0.1481  0.2613  0.2613
Ant      Top 100  | 0.2651  0.3534  0.3213 | 0.3489  0.6383  0.5957 | 0.2270  0.4865  0.4649
Ant      Top 125  | 0.3139  0.3358  0.3285 | 0.4385  0.5923  0.5846 | 0.3143  0.4381  0.4286

The bold value refers to the best F-value identified by comparing three metrics (BC, outD, and BCD) at the top K of predicting fixed classes, bugs, and bug severity, respectively.

Figure 5: Bug severity (Tomcat, top row; Ant, bottom row; the x-axis is the bug rank and the y-axis is the mean of the metric). Panels (a) to (f); legend entries: PR, HITS, BC; inD, outD, D; CC, EC.

Based on our work, we obtain some meaningful answers to the four questions proposed in Section 3. However, there are still some potential limitations and threats to the validity of our work.

(i) All datasets used in our experiments are collected from the open-source projects Apache Tomcat 7 and Ant. We know that there are many other available software repositories that are helpful for bug analysis. We would like to improve our approach in future work by combining more software resources, such as mailing lists and forum messages, to enrich the findings.

(ii) A limitation of our work is that we chose projects written in Java, because the tool used to construct the software networks can only handle Java source code. Whether our conclusions hold for projects written in other languages still needs to be evaluated.

(iii) Theoretically, software networks evolve over time; therefore, differences between multiple versions are inevitable. In this paper, we use a stable version of each project as the basis for constructing the software network instead of considering the structural differences between versions. Although this treatment is rough, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple levels of granularity, yielding a multi-granularity software network. We investigate bug proneness and severity through network metrics at the class level. Whether the results would be more encouraging at other levels of granularity is an attractive topic.

Figure 6: Human factor (the x-axis is the bug rank). Panels: (a) Comments and (b) Persons; each panel plots two series, Tomcat and Ant.

7. Conclusion

We constructed class-level software networks and introduced nine representative and commonly used network metrics to bug prediction. An empirical study was conducted on the open-source projects Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from February 2002 to November 2012. Our analysis demonstrated that there is a statistically significant relation between a class's importance and the probability that the class is modified and buggy. A class's importance is also related to the severity of the bugs raised. The results also showed that, with only about ten percent of the effort, the accuracy of our prediction for bug proneness and bug severity can be up to 54.7% and 66.7% (top 50, Tomcat) and 63.8% and 48.7% (top 100, Ant), respectively.

We expect that our findings are insightful and can be used to support the design and development of software, helping engineers assess the risk of adding or dropping a feature given the existing dependencies between classes. We also believe that our approach can be leveraged for bug prediction and test prioritization in other open-source software. Finally, our findings provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

Future work will mainly focus on two aspects. On the one hand, we will collect more open-source projects (e.g., Eclipse, Mozilla, or projects hosted on SourceForge) to validate the generality of our approach. On the other hand, we will further take human factors into account, because software development is a process of human participation. Exploring the impact of sociotechnical congruence on bug prediction is urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), the National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), the Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and the Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J. Anvik, “Automating bug report assignment,” in Proceedings of the 28th International Conference on Software Engineering (ICSE ’06), pp. 937–940, May 2006.

[2] D. Radjenovic, M. Hericko, R. Torkar, and A. Zivkovic, “Software fault prediction metrics: a systematic literature review,” Information and Software Technology, vol. 55, pp. 1397–1418, 2013.

[3] T. Zimmermann and N. Nagappan, “Predicting defects using network analysis on dependency graphs,” in Proceedings of the 30th International Conference on Software Engineering (ICSE ’08), pp. 531–540, May 2008.

[4] A. Meneely, L. Williams, W. Snipes, and J. Osborne, “Predicting failures with developer networks and social network analysis,” in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE ’08), pp. 13–23, November 2008.

[5] K. Crowston, K. Wei, Q. Li, and J. Howison, “Core and periphery in Free/Libre and Open Source software team communications,” in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS ’06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, “An empirical study on the relation between dependency neighborhoods and failures,” in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST ’11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Sole, “Scale-free networks from optimal design,” Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, “Software systems as complex networks: structure, function, and evolvability of software collaboration graphs,” Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, “Power-laws in a large object-oriented software system,” IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, “Power laws in software,” ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, “Advances in the studies on complex networks of software systems,” Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, “Software complexity metrics based on complex networks,” Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, “Multi-granularity evolution analysis of software using complex network theory,” Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, “Node centrality in weighted networks: generalizing degree and shortest paths,” Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, “Categorizing bugs with social networks: a case study on four open source software communities,” in Proceedings of the 35th International Conference on Software Engineering (ICSE ’13), pp. 1032–1041, 2013.

[16] M. S. Zanetti and F. Schweitzer, “A network perspective on software modularity,” in Proceedings of the International Workshop on Complex Sciences in the Engineering of Computing Systems, pp. 175–186, 2012.

[17] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, “PageRank, HITS and a unified framework for link analysis,” in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–354, 2002.

[18] R. Premraj and K. Herzig, “Network versus code metrics to predict defects: a replication study,” in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM ’11), pp. 215–224, September 2011.

[19] M. Pinzger, N. Nagappan, and B. Murphy, “Can developer-module networks predict failures?” in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE ’08), pp. 2–12, November 2008.

[20] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, “Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities,” IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, 2011.

[21] A. Sureka, A. Goyal, and A. Rastogi, “Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis,” in Proceedings of the 4th India Software Engineering Conference (ISEC ’11), pp. 195–204, February 2011.

[22] F. Perin, L. Renggli, and J. Ressia, “Ranking software artifacts,” in Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (ICSM ’10), pp. 1–4, 2010.

[23] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, “Graph-based analysis and prediction for software evolution,” in Proceedings of the International Conference on Software Engineering (ICSE ’12), pp. 419–429, 2012.

[24] D. Steidl, B. Hummel, and E. Juergens, “Using network analysis for recommendation of central software classes,” in Proceedings of the 19th Working Conference on Reverse Engineering, pp. 93–102, 2012.

[25] T. Wolf, A. Schroter, D. Damian, and T. Nguyen, “Predicting build failures using social network analysis,” in Proceedings of the International Conference on Software Engineering (ICSE ’09), pp. 1–11, May 2009.

[26] M. E. J. Newman, Networks, Oxford University Press, Oxford, UK, 2010.

[27] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

[28] L. C. Freeman, “Centrality in social networks conceptual clarification,” Social Networks, vol. 1, no. 3, pp. 215–239, 1978.

[29] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, Article ID 026113, 15 pages, 2004.

[30] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, “The missing links: bugs and bug-fix commits,” in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE ’10), pp. 97–106, November 2010.

[31] R. Wu, H. Zhang, S. Kim, and S. C. Cheung, “ReLink: recovering links between bugs and changes,” in Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering (SIGSOFT/FSE ’11), pp. 15–25, September 2011.

[32] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y. G. Gueheneuc, “Is it a bug or an enhancement? A text-based approach to classify change requests,” in Proceedings of the Conference of the Center for Advanced Studies (CASCON ’08), ACM, October 2008.

[33] K. Herzig, S. Just, and A. Zeller, “It’s not a bug, it’s a feature: how misclassification impacts bug prediction,” in Proceedings of the 35th International Conference on Software Engineering, pp. 392–401, 2013.

[34] R. Tonelli, G. Concas, M. Marchesi, and A. Murgia, “An analysis of SNA metrics on the Java Qualitas Corpus,” in Proceedings of the 4th India Software Engineering Conference (ISEC ’11), pp. 205–213, February 2011.


Given the advantages of open source (i.e., the openness of source code and the available data repositories), the following work is conducted on open-source software. Open-source software usually maintains a bug repository (e.g., Bugzilla (http://www.bugzilla.org), GNATS (http://www.gnu.org/software/gnats), and JIRA (http://www.atlassian.com/software/jira)). With the increasing scale of software projects, new features are added and more bugs are introduced into the system. According to statistics [1], for Eclipse, on average 37 bugs are submitted and more than 3 person-hours are spent on handling them per day; Mozilla discovers more than 300 bugs. However, the potential bugs in a system are often far more numerous, and more serious, than those that have been submitted. How to quickly retrieve more bugs, or more serious bugs, is the overarching goal of this work; specifically, we ask which metrics can be adapted to resolve this problem.

The rest of this paper is organized as follows. Section 2 is a review of related work. In Section 3, the preliminary theories of software networks and network metrics are introduced and the research questions are presented. Section 4 focuses on the research approach and the construction of the software network. Section 5 shows the whole process of our experiment. After that, in Section 6, we discuss the empirical results and several limitations. In Section 7, a conclusion for practice and research is drawn.

2. Related Work

Defect prediction models are used to support software inspection and to improve software quality by locating possible defects. For software bug prediction, many software metrics have been proposed. The most commonly used ones are the CK object-oriented metrics; traditional metrics (e.g., LOC) and process metrics (e.g., code churn) are also often used in the literature [2]. With the increasing maturity of complex network theory, network metrics such as centrality are attracting more attention and are used for defect prediction [3–5].

2.1. Software Network. Many real-world software systems are extremely complex and can be regarded as complex systems, in which software entities (e.g., objects, classes, and packages) are abstracted as nodes and the dependencies between two nodes as links or edges. Dependencies essentially represent the information flow within a software system and exert at least some influence on the overall success and quality of the product [6].
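The abstraction of entities as nodes and dependencies as edges can be sketched as follows. The function name, the input format (a map from each class to the classes it depends on), the edge direction, the handling of self-dependencies, and the class names are all assumptions made for illustration; they are not taken from the tool used in this study.

```python
def build_network(dependencies):
    """Build a directed class-level network from a
    {class: [classes it depends on]} map (hypothetical input format)."""
    nodes = set(dependencies)
    edges = set()
    for source, targets in dependencies.items():
        for target in targets:
            if target == source:
                continue  # assumption: self-dependencies are ignored
            nodes.add(target)
            # Assumed edge direction: source depends on target.
            edges.add((source, target))
    return nodes, edges


# Hypothetical toy input; the class names are illustrative only.
deps = {
    "Connector": ["Request", "Response"],
    "Request": ["Response"],
    "Response": [],
    "Logger": ["Logger"],
}
nodes, edges = build_network(deps)
print(len(nodes), "nodes,", len(edges), "edges")
```

Storing edges as a set of pairs collapses repeated dependencies between the same two classes into a single link, which matches an unweighted network.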

Characterizing a large or ultra-large-scale (ULS) software system as a complex network is reasonable, and a large body of literature [7–12] has already demonstrated characteristics of software networks such as the small-world phenomenon and the scale-free distribution of degree. Likewise, because software is built up out of many interacting units and subsystems (software entities) at multiple levels of granularity, researchers have constructed and studied software networks at different levels of granularity (component, feature, class, method, etc.) [13].

2.2. Network Metrics. Many metrics have been defined in both complex network analysis and social network analysis. Centrality [14, 15] is used to assess the relative importance of nodes in a given network. The simplest one is degree centrality, defined as the number of connections a node has to other nodes. In a social context, degree centrality can be explained in terms of the potential impact of a node on other nodes. However, degree centrality does not capture the global position of a node in the network, that is, how important a node is to all other nodes, so further measures have been proposed, such as closeness centrality, betweenness centrality, and eigenvector centrality.
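As an illustration of a local measure and a global one, the sketch below computes in-degree and out-degree and an unnormalized betweenness centrality for a small directed, unweighted network. The betweenness routine follows Brandes' well-known algorithm; the adjacency format and the toy network are assumptions, and the values are not normalized, so they are not directly comparable with the values reported in this paper.

```python
from collections import deque


def degree_counts(adj):
    """Return (in_degree, out_degree) dictionaries for a directed network."""
    nodes = set(adj) | {t for ts in adj.values() for t in ts}
    out_d = {v: len(set(adj.get(v, ()))) for v in nodes}
    in_d = {v: 0 for v in nodes}
    for v in adj:
        for w in set(adj[v]):
            in_d[w] += 1
    return in_d, out_d


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    nodes = set(adj) | {t for ts in adj.values() for t in ts}
    succ = {v: set(adj.get(v, ())) for v in nodes}
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}   # number of shortest paths from s
        dist = {v: -1 for v in nodes}   # -1 marks nodes not yet reached
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in succ[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Toy network: classes A and B reach C and D only through "Broker".
adj = {"A": ["Broker"], "B": ["Broker"], "Broker": ["C", "D"]}
in_d, out_d = degree_counts(adj)
bc = betweenness(adj)
print(in_d["Broker"], out_d["Broker"], bc["Broker"])
```

In the toy network all four shortest paths between the two sides pass through the bridging node, so its betweenness is 4 while its degree values are modest.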

Modularity [15, 16] is a desirable characteristic for software systems. The modularity of a software architecture contributes to the sustainability of large-scale software projects by fostering the decoupling and cohesiveness of software development. In particular, as the software evolves over time, modularity might even facilitate its maintainability and expandability. In some of the literature, in-degree and out-degree [8] were used to characterize complexity and reuse; in addition, the PageRank and HITS [17] ranking algorithms were used to handle directed networks.
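For readers unfamiliar with PageRank on a directed network, a minimal power-iteration sketch is given below. The damping factor of 0.85, the fixed number of iterations, the uniform redistribution of rank from nodes without outgoing links, and the toy network are assumptions for illustration; the parameter settings used in this study are not stated in the text.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration on a directed network given as
    {node: [successors]}; parameter values are illustrative assumptions."""
    nodes = sorted(set(adj) | {t for ts in adj.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not adj.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = adj.get(v, ())
            for w in targets:
                new[w] += damping * rank[v] / len(targets)
        rank = new
    return rank


# Toy network: classes A and B both depend on class C.
toy = {"A": ["C"], "B": ["C"], "C": []}
pr = pagerank(toy)
print({v: round(score, 4) for v, score in pr.items()})
```

With edges pointing from a class to the classes it depends on, rank accumulates on heavily used classes, so PageRank behaves here as a global refinement of in-degree.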

Additionally, Zimmermann and Nagappan [3] studied Windows Server 2003, leveraging dependency relationships between software entities, captured using social network metrics, to predict whether the entities are likely to have defects. They found that network metrics perform significantly better than source code metrics at predicting defects. Premraj and Herzig [18] presented a replication study based on Z&N's work and found that network metrics undoubtedly perform much better than code metrics under stratified random sampling.

2.3. Network Metrics for Application. Meneely et al. [4] examined collaboration structure with a developer network derived from code churn information to predict failures at the file level, and developed failure prediction models using test and postrelease failure data. The result indicated that a significant correlation exists between file-based developer network metrics and failures. Crowston et al. [5] identified the core-periphery structure and applied these measures to the interactions around bug fixing for 116 SourceForge projects. Pinzger et al. [19] investigated the relationship between the fragmentation of developer contributions, measured by network centrality metrics, and the number of postrelease failures with a developer-module network. The authors claimed that central modules were more likely to be failure-prone than those located in surrounding areas of the network. Shin et al. [20] used network analysis to discriminate and predict vulnerable code locations. They found that network metrics can discriminate between vulnerable and neutral files and predict vulnerabilities. Sureka et al. [21] derived a collaboration network from a defect tracking system and applied network analysis to the derived network for the purpose of risk and vulnerability analysis. They demonstrated that important information about risk and vulnerability can be uncovered using network analysis techniques.

Dependencies exist between the various components of a system. Modifying components with little regard to these dependencies may have an adverse impact on the quality of the components that depend on them. Zimmermann and Nagappan [3] proposed using network analysis on software dependency networks to help identify central program units that are more likely to conceal defects. Perin et al. [22] used PageRank to rank the classes of the Pharo Smalltalk system based on a dependency graph representing class inheritance and reference. Bhattacharya et al. [23] constructed software networks at the source code and module levels to capture software evolution, estimate bug severity, prioritize refactoring efforts, and predict defect-prone releases. Steidl et al. [24] used different network analysis metrics on dependency networks to retrieve central classes and demonstrated that the results can compete with the suggestions of experienced developers. Zanetti et al. [15] studied bug reporter centrality and showed that the centrality of bug reporters is greatly helpful for bug triaging procedures.

To the best of our knowledge, bug prediction is still an open problem; hence, we expect that our study can supplement the existing solutions. Although prior research has used network metrics to predict failure proneness or bug severity, our study leverages more relationships between classes in analyzing the implications of dependencies. Additionally, prior research did not consider both proneness and severity, as is done in this paper. We constructed a software network at class-level granularity and then used network metrics to analyze bug proneness and severity. Differing from the existing work, we make the following contributions.

(1) Five dependency scenes are considered in our class-level software network: inheritance, field, method, return, and parameter dependency, whereas only part of these relationships were taken into account in most existing research.

(2) A comparison between different metrics is first made to explore the extent to which they reflect bug quantity and severity; after that, predictors are modeled with the significant metrics to improve software development. Some researchers directly assemble various network metrics without filtering, which inevitably introduces bias and reduces accuracy. Our work bridges this gap by making the comparison first.

(3) We select the betweenness centrality and out-degree metrics for bug proneness and severity prediction. With only 10% of the effort, the best F-value obtained for bug proneness and severity is up to 54.7% and 66.7% (Tomcat) and 63.8% and 48.7% (Ant), respectively, which is comparable with the results in [4, 25].

3. Preliminary Study

3.1. Software Networks. There are all kinds of networks around us, such as social networks (e.g., friendship networks and scientific collaboration networks), technological networks (e.g., the Internet and the WWW), and biological networks (e.g., neural networks and the interaction of proteins in yeast). Surprisingly, the underlying structures of these networks have statistical features completely different from those of regular and random networks. They all belong to complex networks. These discoveries have served to draw together many disparate domains into an emerging discipline, network science.

[Figure 1 shows four Java classes and the resulting network over the nodes A, B, C, and D: class B extends A and declares a field "public C Attribute"; class C declares "public void Method(A Arg)"; class D declares the fields "public B Atrr1" and "public C Attr2" and calls Attr2.Method(Atrr1).]

Figure 1: A simple example of a software network.

With the rapid development of software technology and the pervasiveness of the Internet, software scale and complexity have increased so sharply that developers can no longer control a system from a global perspective, and software schedules and quality cannot be guaranteed. To overcome these challenges, some pioneers of complex systems introduced graph theory, in which nodes and edges are used to simplify the structure. Surprisingly, software networks also show the basic characteristics of complex networks. These discoveries have attracted the attention of software engineering researchers. Through the combination of complex networks and software engineering, an approach has been emerging that abstracts a software system as a network, that is, a software network.

A software network is an interdisciplinary outcome based on the combination of network science theory and software engineering practice. When constructing a software network, the source code is processed with reverse engineering methods (i.e., the code is compiled into an XML file, and the topology is then derived from the XML file). Figure 1 is an example of a fragment of a software network.

3.2. Network Metrics. Network metrics treat software entities as nodes in a graph and characterize them on the basis of their dependencies on other entities. As opposed to code metrics, network metrics take into account the interactions between entities, thus modelling the flow of software information. In this paper, node centrality, PageRank, HITS, in-, out-, and total degree, and modularity ratio metrics are involved; the significant metrics are then used to predict bug proneness and severity.

3.2.1. Eigenvector Centrality (EC). It is a measure of the influence of a node in a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the current node than equal connections to low-scoring nodes; the scores are computed from the adjacency matrix. For a given network (graph) $G = (V, E)$ with $|V|$ nodes, let $A = (a_{ij})$ be the adjacency matrix, where $a_{ij} = 1$ if vertex $i$ is linked to vertex $j$ and $a_{ij} = 0$ otherwise. The centrality score of vertex $i$ can be defined as

$$x_i = \frac{1}{\lambda} \sum_{j \in M(i)} x_j = \frac{1}{\lambda} \sum_{j \in G} a_{ij} x_j, \tag{1}$$

where $M(i)$ is the set of neighbors of $i$ and $\lambda$ is a constant. Only the greatest eigenvalue results in the desired centrality measure; the $i$th component of the corresponding eigenvector then gives the centrality score of vertex $i$ in the network [26].
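As an illustration of equation (1), the following minimal sketch approximates the dominant eigenvector by power iteration. The toy graph, the function name `eigenvector_centrality`, and the shifted update (computing x + Ax rather than Ax so that the iteration also settles on bipartite graphs) are assumptions made for this example, not details taken from the study.

```python
import math


def eigenvector_centrality(neighbors, iterations=200, tolerance=1e-12):
    """Approximate the dominant eigenvector of the adjacency matrix.

    neighbors[i] lists the nodes j with a_ij = 1. Each step computes
    x + A x (an assumed shift) and rescales to unit Euclidean length.
    """
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        updated = {
            i: scores[i] + sum(scores[j] for j in neighbors[i])
            for i in neighbors
        }
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        change = max(abs(updated[node] - scores[node]) for node in neighbors)
        scores = updated
        if change < tolerance:
            break
    return scores


# Hypothetical undirected toy graph: a path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ec = eigenvector_centrality(path)
print({node: round(score, 4) for node, score in ec.items()})
```

The middle node b receives the highest score because both of its neighbors contribute to it, while a and c each have only one neighbor.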

3.2.2. Betweenness Centrality (BC). It is based on the total number of shortest paths between all possible pairs of nodes that pass through a node [27]. It quantifies how often a node acts as a bridge along the shortest path between two other nodes. If $g_i^{(st)}$ is the number of shortest paths between nodes $s$ and $t$ that pass through node $i$, $n^{(st)}$ is the total number of shortest paths between $s$ and $t$, and $n$ is the total number of nodes, then

$$\mathrm{BC}_i = \frac{\sum_{s<t} g_i^{(st)} / n^{(st)}}{(n-1)(n-2)}. \tag{2}$$
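Equation (2) can be sketched with breadth-first search that counts shortest paths. The example below is only one possible reading: it assumes a directed network, sums over ordered pairs (s, t), and uses a hypothetical edge set for the four classes of Figure 1 (B to A, B to C, C to A, D to B, D to C). The function names are illustrative.

```python
from collections import deque


def shortest_path_counts(successors, source):
    """Return BFS distances and numbers of shortest paths from source."""
    distance = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in successors[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            if distance[neighbor] == distance[node] + 1:
                count[neighbor] += count[node]
    return distance, count


def betweenness_centrality(successors):
    """Sum g_i(st) / n(st) over ordered pairs and divide by (n-1)(n-2)."""
    nodes = list(successors)
    info = {s: shortest_path_counts(successors, s) for s in nodes}
    score = {node: 0.0 for node in nodes}
    for s in nodes:
        dist_s, count_s = info[s]
        for t in dist_s:
            for i in dist_s:
                if t == s or i in (s, t):
                    continue
                dist_i, count_i = info[i]
                # i lies on a shortest s-t path if the distances add up.
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    score[i] += count_s[i] * count_i[t] / count_s[t]
    scale = (len(nodes) - 1) * (len(nodes) - 2)
    return {node: value / scale if scale > 0 else 0.0
            for node, value in score.items()}


# Hypothetical directed edges for the classes of Figure 1.
figure1 = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["B", "C"]}
bc = betweenness_centrality(figure1)
print(bc)
```

In this toy network only the pair (D, A) has intermediate nodes: the two shortest paths run through B and through C, so each of them carries half a path.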

3.2.3. Closeness Centrality (CC). It concerns the farness of a node from all other nodes and is defined as the inverse of the sum of the distances to all other nodes [28]. If $d(i, j)$ is the distance between nodes $i$ and $j$, then the centrality of $i$ is

$$\mathrm{CC}_i = \frac{1}{\sum_{j=1}^{|V|} d(i, j)}. \tag{3}$$
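A minimal sketch of equation (3) follows. It assumes unweighted edges, so that distances can be obtained by breadth-first search, and it sums only over nodes that are actually reachable (the equation itself does not say how unreachable nodes are treated). The toy graph and names are hypothetical.

```python
from collections import deque


def closeness_centrality(successors):
    """Inverse of the summed BFS distances from each node."""
    scores = {}
    for source in successors:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in successors[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        # A node that reaches nobody gets 0 by convention in this sketch.
        scores[source] = 1.0 / total if total > 0 else 0.0
    return scores


# Hypothetical undirected toy graph: a path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cc = closeness_centrality(path)
print(cc)
```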

3.2.4. PageRank (PR). It is a variant of the eigenvector centrality measure. It is a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at a particular page. The rank value indicates the importance of a page and is used here to denote the importance of a class:

$$\mathrm{PR}_i = \frac{1 - q}{N} + q \sum_{j \in V(i)} \frac{\mathrm{PR}_j}{\mathrm{outD}(j)}, \tag{4}$$

where $q$ is a damping factor ($q = 0.85$), $V(i)$ is the set of nodes that link to $i$, $\mathrm{outD}(j)$ is the number of outgoing edges of node $j$, and $N$ is the total number of nodes in the network.
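Equation (4) can be evaluated by simple fixed-point iteration. The sketch below applies the equation literally, which means that the rank of nodes without outgoing edges is not redistributed; that choice, the iteration count, and the hypothetical edge set for the classes of Figure 1 are assumptions of this example.

```python
def pagerank(successors, q=0.85, iterations=100):
    """Iterate PR_i = (1 - q) / N + q * sum(PR_j / outD(j)) over j -> i."""
    nodes = list(successors)
    n = len(nodes)
    predecessors = {node: [] for node in nodes}
    for source, targets in successors.items():
        for target in targets:
            predecessors[target].append(source)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        rank = {
            i: (1.0 - q) / n
            + q * sum(rank[j] / len(successors[j]) for j in predecessors[i])
            for i in nodes
        }
    return rank


# Hypothetical directed edges for the classes of Figure 1.
figure1 = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["B", "C"]}
pr = pagerank(figure1)
print(sorted(pr, key=pr.get, reverse=True))
```

Class A, on which both B and C depend, ends up with the highest rank, and D, which nothing depends on, keeps only the base value (1 - q) / N.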

3.2.5. HITS. It was also originally designed to rank web pages and calculates two scores: the authority score and the hub score. The authority score of a page is computed as the sum of the scaled hub scores of the pages that point to it. The hub score of a page is the sum of the scaled authority scores of the pages it points to. Here, the authority score is used as the experimental metric.
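The mutual definition of authority and hub scores can be sketched as an alternating iteration. The example below scales both score vectors so that they sum to 1 (the text only says "scaled", so this normalization is an assumption) and again uses a hypothetical edge set for the classes of Figure 1.

```python
def hits(successors, iterations=100):
    """Alternate authority and hub updates, rescaling each to sum to 1."""
    nodes = list(successors)
    predecessors = {node: [] for node in nodes}
    for source, targets in successors.items():
        for target in targets:
            predecessors[target].append(source)
    hub = {node: 1.0 for node in nodes}
    authority = {node: 0.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing to i.
        authority = {i: sum(hub[j] for j in predecessors[i]) for i in nodes}
        total = sum(authority.values()) or 1.0
        authority = {i: value / total for i, value in authority.items()}
        # Hub: sum of authority scores of the nodes i points to.
        hub = {i: sum(authority[j] for j in successors[i]) for i in nodes}
        total = sum(hub.values()) or 1.0
        hub = {i: value / total for i, value in hub.items()}
    return authority, hub


# Hypothetical directed edges for the classes of Figure 1.
figure1 = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["B", "C"]}
authority, hub = hits(figure1)
print(sorted(authority, key=authority.get, reverse=True))
```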

3.2.6. Degree (D). In a directed network, it is meaningful to distinguish in-degree (the number of incoming links) from out-degree (the number of outgoing links). For software networks, in-degree is a representation of code reuse and out-degree of design complexity. In some contexts, nodes with a large in-degree are heavily reused, and nodes with a large out-degree are more complex. In this paper, we use $\mathrm{inD}(i)$, $\mathrm{outD}(i)$, and $D$ to denote the in-degree, out-degree, and total degree of a node, respectively.
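For completeness, a short sketch of the three degree metrics on a directed network; the edge set for the classes of Figure 1 is hypothetical and the function name is illustrative.

```python
def degrees(successors):
    """Return {node: (inD, outD, D)} for a directed, unweighted network."""
    in_degree = {node: 0 for node in successors}
    out_degree = {}
    for node, targets in successors.items():
        unique_targets = set(targets)  # parallel edges count once
        out_degree[node] = len(unique_targets)
        for target in unique_targets:
            in_degree[target] += 1
    return {
        node: (in_degree[node], out_degree[node],
               in_degree[node] + out_degree[node])
        for node in successors
    }


# Hypothetical directed edges for the classes of Figure 1.
figure1 = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["B", "C"]}
deg = degrees(figure1)
print(deg)
```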

3.2.7. Modularity Ratio (MR). Modularity, proposed by Newman and Girvan [29], is a metric used to evaluate the quality of a partition of a network into communities. A system consists of many packages, and each package has a large number of classes or subpackages. For a given definition of modules or clusters and their underlying network structure, the degree of modularity is defined by

$$Q = \frac{\sum_{i}^{n} e_{ii} - \sum_{i}^{n} a_i b_i}{1 - \sum_{i}^{n} a_i b_i}, \tag{5}$$

where $e_{ij}$ is the fraction of all edges that link nodes in module $i$ to nodes in module $j$, $a_i = \sum_{j}^{n} e_{ij}$ and $b_i = \sum_{j}^{n} e_{ji}$ (the sums of row $i$ and column $i$ of the matrix $(e_{ij})$, respectively), and $n$ is the total number of existing modules. According to this equation, we define the MR of module $i$ as follows:

$$\mathrm{MR}_i = e_{ii} - a_i b_i. \tag{6}$$
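Equations (5) and (6) can be computed directly from an edge list and a module assignment. In the sketch below, the edges are the hypothetical dependencies of Figure 1 and the two packages p1 and p2 are invented purely for illustration; the sketch also assumes a non-empty edge list and more than one module, so that the denominator of equation (5) is not zero.

```python
from collections import defaultdict


def modularity_ratios(edges, module_of):
    """Return ({module: MR_i}, Q) following equations (6) and (5)."""
    total = len(edges)
    e = defaultdict(float)  # e[(i, j)]: fraction of edges from i to j
    for source, target in edges:
        e[(module_of[source], module_of[target])] += 1.0 / total
    modules = sorted(set(module_of.values()))
    a = {m: sum(e[(m, other)] for other in modules) for m in modules}
    b = {m: sum(e[(other, m)] for other in modules) for m in modules}
    ratios = {m: e[(m, m)] - a[m] * b[m] for m in modules}
    expected = sum(a[m] * b[m] for m in modules)
    q = (sum(e[(m, m)] for m in modules) - expected) / (1.0 - expected)
    return ratios, q


# Hypothetical edges (Figure 1) and a hypothetical package assignment.
edges = [("B", "A"), ("B", "C"), ("C", "A"), ("D", "B"), ("D", "C")]
module_of = {"A": "p1", "C": "p1", "B": "p2", "D": "p2"}
ratios, q = modularity_ratios(edges, module_of)
print(ratios, q)
```

Here each package contains one of the five edges (e_ii = 0.2), and a_i b_i = 0.16 for both packages, so both modularity ratios equal 0.04.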

3.3. Network Metrics for Application. Before presenting the research questions and the details of our approach, we define two terms used henceforth in this paper, which indicate the practical application of our work.

(i) Bug proneness. Bug proneness is treated as a quality measure of software entities in this paper. Intuitively, it represents the probability that an entity (i.e., a class) will become buggy in the process of debugging. The higher the bug proneness of a class, the more likely it is to be given priority in testing. For our purpose, we evaluate it via the number of bugs and fixed classes.

(ii) Bug severity. This term describes the impact of a bug on the entire system. Bug tracking systems classify severity into several levels, such as critical, major, minor, and trivial. However, severity and priority are not the same concept. In some cases, a bug is of critical severity but the priority of resolving it is low, because it rarely occurs in real scenarios. Here, we investigate whether the importance of a class in the software network is related to the severity of the bugs it causes.

With these metrics and application scenes, the following four research questions are proposed to guide the design of the case study. Questions 1–3 investigate the properties of software networks, whereas question 4 predicts bug proneness and severity with the significant metrics.

RQ1: Is the position of classes in the software network related to bug proneness?

RQ2: Is the position of classes in the software network related to bug severity?

RQ3: If so, which metrics are more significant?

RQ4: How well does the approach to predicting bug proneness/severity with significant metrics work?

4. Approach

In our approach, we consider two open source projects written in Java, Tomcat and Ant, from the Apache Software Foundation (ASF) (http://tomcat.apache.org and http://ant.apache.org), as our research subjects. The rationale is threefold. (1) They are well-known and stable projects; each has undergone a number of major release cycles and is still under active development. (2) The source code revision archives and SVN commit logs dating back several years, as well as the bug reports, are available, which makes these projects a good opportunity for study. (3) The choice of the Java programming language is constrained by the tools developed to construct software networks, and we are interested in understanding open source software written in Java.

We examined the information provided by the bug tracking systems (Bugzilla and JIRA) and the SVN commits and found that most bug reports name the affected classes in detail, and some already include the modified fragments of source code. This information determines the feasibility of our approach. In addition, the dependencies between classes are directed, which should not be ignored. Therefore, we abstract directed but unweighted software networks at the class level.

Let $G = (V, E)$ be a software network, where $V = \{v_i\}$ is the set of all classes and $E = \{e_{ij}\}$ is the set of all dependencies. We distinguish between different kinds of dependencies as follows:

(i) $v_i$ implementing/extending the interface/class $v_j$ is an inheritance dependency.

(ii) $v_i$ having a field of type $v_j$ is a field dependency.

(iii) $v_i$ calling a method of $v_j$ is a method dependency.

(iv) A method of $v_i$ returning an object of type $v_j$ is a return dependency.

(v) A method of $v_i$ taking an object of type $v_j$ as a parameter is a parameter dependency.
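The way typed dependencies collapse into a directed, unweighted network can be sketched as follows. The dependency records are a hypothetical reading of the four classes in Figure 1, and the function name `build_network` is illustrative; the study itself derives the topology from an XML file produced by its own tooling, which is not reproduced here.

```python
# Hypothetical dependency records for the four classes of Figure 1:
# (depending class, class depended upon, kind of dependency).
DEPENDENCIES = [
    ("B", "A", "inheritance"),  # B extends A
    ("B", "C", "field"),        # B has a field of type C
    ("C", "A", "parameter"),    # a method of C takes an A
    ("D", "B", "field"),        # D has a field of type B
    ("D", "C", "field"),        # D has a field of type C
    ("D", "C", "method"),       # D calls a method of C
]


def build_network(dependencies):
    """Collapse typed dependencies into directed, unweighted edges."""
    successors = {}
    for source, target, _kind in dependencies:
        successors.setdefault(source, set())
        successors.setdefault(target, set())
        if source != target:  # self-dependencies are ignored in this sketch
            successors[source].add(target)
    return {node: sorted(targets) for node, targets in successors.items()}


network = build_network(DEPENDENCIES)
edge_count = sum(len(targets) for targets in network.values())
print(network, edge_count)
```

Because the network is unweighted, the two dependencies from D to C (field and method) yield a single edge.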

A bug report offers many fields to help developers understand the issue, one of which is severity, assigned by administrators based on how severely a bug affects the program. Table 1 shows the eight levels of bug severity and their ranks in Bugzilla. An objective for software providers is not only to minimize the total number of bugs but also to keep the severity of bugs as low as possible [23]. The approach mainly consists of four phases: (1) compiling the source code files to extract the class-level directed software networks according to the dependency definitions; (2) exporting SVN commits and integrating them with bug reports to obtain the necessary bug-class associations; (3) calculating a set of network metrics in the established software networks and then determining their relationships with the number of bugs and bug severity; (4) finally, applying the significant metrics to bug prediction. The framework is shown in Figure 2.

Table 1: Levels of bug severity and their ranks in the Bugzilla bug tracking system.

| Bug severity | Rank |
|---|---|
| Blocker | 8 |
| Critical | 7 |
| Regression | 6 |
| Major | 5 |
| Normal | 4 |
| Minor | 3 |
| Trivial | 2 |
| Enhancement | 1 |

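[Figure 2 depicts the four phases (1)-(4): the source code is turned into a software network; Bugzilla reports and the changelog yield the bug-class associations; the metrics EC, BC, CC, MR, PR, HITS, inD, outD, and D are computed on the network; and they are related to bug proneness and bug severity.]

Figure 2: The framework of our approach.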

5. Case Study

This section presents the subject projects used for our study and the relevant data we collected. With the experimental results, the proposed questions are answered in turn.

5.1. Data. For our analysis, we collect several types of data. We gather the open source software data from source code repositories and bug databases and determine the rank of each bug. In this paper, we choose Tomcat 7 and Ant 1.8.4 as our experimental subjects. As successful open source projects, both Tomcat and Ant maintain publicly available source code. To construct the software networks, the source code is compiled into an XML file, and the software network is then established by parsing the XML file. In what follows, we abbreviate the Tomcat software network as TSN and the Ant software network as ASN. There are 35 versions of Tomcat 7. The latest version is still being revised, so only the other 34 versions are adopted in our work. However, there are few differences in the number of nodes and edges between versions. A likely explanation is that Tomcat is a relatively mature open source project with a decade-long development history. Of course, not all projects show this phenomenon; some projects are known to have nearly superlinear growth [13]. Finally, the stable Tomcat 7.0.29 version is used for analysis; it consists of 2015 nodes, 9573 edges, and 19 communities, whereas Ant has 1345 nodes, 3937 edges, and 20 communities.

Table 2: The statistics: the number of bugs, fixed classes, and nodes/edges of the software networks.

| | Tomcat | Ant |
|---|---|---|
| Bugs | 475 | 480 |
| Classes | 422 (21%) | 389 (29%) |
| SN nodes | 2015 | 1345 |
| SN edges | 9573 | 3937 |
| Communities | 19 | 20 |

The bug reports are divided into two sets, one used for the experiment and the other for prediction. That is, the first thirty versions of Tomcat are used for phases (2) and (3), and the last four versions for prediction. Because the Ant data refer to only one version, eighty percent of the Ant data is used for the experiment and the remaining twenty percent for prediction. Note that studies of software defects rely on links between bug databases and commit changes, and this information plays an important role in measuring quality and predicting defects. Some prior research suggested that there are missing links between bugs and bug-fix commit logs. The authors proposed automatic approaches to recover missing links [30, 31] and found that the missing links lead to biased defect information, which affects prediction performance. However, other authors argued that a reported bug may not be a bug but a feature or an enhancement [32, 33]. This misclassification also introduces bias in bug prediction models. When the missing links are taken into account, the number of false positives in bug classification increases. Hence, only issues consistently reported as bugs in the bug tracker are taken into account. Once the effective bugs are identified, we check the change logs by searching for the bug ID and calculating the similarity of the summary text. The reason for using heuristics is that most of the missing links in our datasets are due to misclassification. The statistics of the experimental data are shown in Table 2. Some test classes are not taken into account because they are duplicates. In bug 53062, for example, both class org.apache.catalina.connector.TestResponse and class org.apache.catalina.connector.Response appear, but they represent the same object in our study. Note that less than thirty percent of the classes are involved, which confirms that most of the bugs occur in a small number of classes. Therefore, what is needed is guided detection within a system.

5.2. Results

RQ1: Is the Position of Classes in the Software Network Related to the Bug Proneness? To answer this research question, the nodes are first sorted by metric value in descending order and divided into groups, each containing the same number of nodes. The reason for doing this is to facilitate the exploration of relations between the metrics and bugs. Because the last 15 nodes in TSN are never fixed, they are excluded so that each group contains 200 nodes. For ASN, the nodes are divided evenly into eleven groups. The number of fixed classes and involved bugs in each group is recorded. With this information, the answer to RQ1 is not hard to obtain.
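The grouping step can be sketched as follows. The metric values and the set of fixed classes are invented for illustration, and ties are broken by class name, which is an assumption of this example rather than something the study specifies.

```python
def group_by_metric(metric, group_size):
    """Sort nodes by metric value (descending) and cut equal-size groups."""
    ranked = sorted(metric, key=lambda node: (-metric[node], node))
    return [ranked[start:start + group_size]
            for start in range(0, len(ranked), group_size)]


def fixed_per_group(groups, fixed_classes):
    """Count how many classes in each group were touched by a bug fix."""
    return [sum(1 for node in group if node in fixed_classes)
            for group in groups]


# Hypothetical metric values and fixed classes.
metric = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.3, "E": 0.1, "F": 0.0}
fixed = {"A", "B", "D"}
groups = group_by_metric(metric, group_size=2)
counts = fixed_per_group(groups, fixed)
print(groups, counts)
```

Group 1 holds the classes with the largest metric values, so a count that falls from group to group indicates that higher-ranked classes are fixed more often.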

The numbers of fixed classes and involved bugs for the different metrics in each group are illustrated in Figures 3 and 4. In Figure 3, EC, inD, D, and HITS have a significantly negative correlation with the group ID (the larger the average metric value of a class, the more bug-prone it is), whereas BC and outD show a skewed distribution; see Figures 3(a), 3(b), 3(e), 3(f), 3(g), and 3(h). There are some differences in Figure 4: BC, outD, and D show a significant correlation but not an obviously skewed distribution; see Figures 4(b), 4(f), and 4(g). The results for the other metrics fluctuate, especially in groups six, eight, and nine.

MR shows the same relationship with fixed classes and bugs. The differences from the above metrics are that the distribution of bugs is higher than that of classes and fluctuates more. In subfigure (i), the x-axis is the rank of the module (community): the larger the modularity ratio, the higher its rank. An overall positive trend is shown: the larger the metric value of a class, the more bug-prone it is and the more likely it is to be modified. This holds especially for the BC, outD, and D metrics.

RQ2: Is the Position of Classes in the Software Network Related to the Bug Severity? Understanding and characterizing the relationships between network metrics and bug severity in open source software projects is also a very interesting problem. Although the value of a metric is positively related to bug proneness, it is not necessarily related to bug severity, because a class may contain many bugs, none of which is serious. We conduct the next experiment to examine this research question with the severity ranks given in Table 1. The results in Figure 5 show that the relationship deviates from what was expected. CC remains stable in both projects, while the overall distribution trend first increases and then decreases. The top row depicts the severity for Tomcat and the bottom row that for Ant.

Ranks 5 and 7 are two prominent breakpoints, marked by the dotted lines in each subfigure; they refer to major and critical bugs, respectively. Bugs with rank 8 have large metric values in TSN; in contrast, bugs of this kind have relatively low values in ASN. Blocker bugs should be avoided as much as possible in software engineering practice. When a bug of this kind occurs and is found, either its origin lies where a fix triggers modifications to many other parts, or the problem is only a small mistake that affects a vital node.

Figure 3: Tomcat (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR; each panel plots two series, classes and bugs.

RQ3: If So, Which Metrics Are More Significant? Different metrics measure the importance of a node in a network from different aspects. According to the results obtained for RQ1 and RQ2, not all the metrics have the same expected relationship; some metrics may be more suitable for bug proneness prediction and others for severity prediction. In this section, further work is conducted to analyze how significant these metrics are and which metrics are better. Three typical correlation analysis methods are used to resolve this problem: Pearson, Kendall, and Spearman (significance level $P < 0.01$). Pearson is widely used in statistics to measure the degree of the relationship between linearly related variables, but both variables should be normally distributed before it is used. Kendall and Spearman are nonparametric tests that make no assumptions about the underlying distribution. The former essentially resembles a kind of permutation test, whereas the latter is essentially based on the principle of least squares.
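To make the difference between the three coefficients concrete, the sketch below implements them from their textbook definitions (Pearson on raw values, Spearman as Pearson on average ranks, and Kendall's tau-a without tie correction). The per-group counts are invented, and significance testing is left out; a statistics package would normally be used for the analysis itself.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        for k in order[start:end + 1]:
            result[k] = (start + end) / 2.0 + 1.0
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    balance = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            balance += (product > 0) - (product < 0)
    return balance / (len(xs) * (len(xs) - 1) / 2)


# Hypothetical data: group IDs and the number of fixed classes per group.
group_ids = [1, 2, 3, 4, 5]
fixed_counts = [80, 30, 20, 12, 10]
print(pearson(group_ids, fixed_counts),
      spearman(group_ids, fixed_counts),
      kendall_tau(group_ids, fixed_counts))
```

For these strictly decreasing but skewed counts, the two rank-based coefficients reach -1, while Pearson stays noticeably weaker because the relationship is not linear. This is one reason a skewed distribution can coexist with a modest linear correlation.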

Table 3 gives the correlation coefficients between metric value and bug proneness for eight metrics. For Tomcat, the weakest significant correlation is that of outD (−0.552, Kendall) and the strongest is that of D (−0.967, Spearman), whereas CC and, under Kendall and Spearman, PR show no significant correlation. Although the correlations with outD are among the weakest under all three methods, the skewed distribution curve in Figure 3(f) shows that most of the fixed classes and bugs fall in the first two groups. On the other hand, because no skew appears for Ant, outD is more significant than the other metrics under all three methods, with a coefficient of up to −0.952, whereas CC, EC, inD, PR, and HITS are all not significant. The differences mentioned above suggest that these metrics should not be treated equally and should be used with caution. Applying all metrics indiscriminately is likely to mislead interpretation. Consequently, BC and outD are the suitable metrics to represent bug proneness and severity. Both of them are used in the following prediction.

Figure 4: Ant (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR; each panel plots two series, classes and bugs.

Table 3: Bug proneness: the correlation coefficient with eight network metrics (Sig. $P < 0.01$).

| Method-project | BC | CC | D | EC | inD | outD | PR | HITS |
|---|---|---|---|---|---|---|---|---|
| Pearson-TSN | −0.804 | - | −0.903 | −0.852 | −0.852 | −0.752 | −0.728 | **−0.933** |
| Kendall-TSN | −0.764 | - | **−0.899** | −0.689 | −0.689 | −0.552 | - | −0.733 |
| Spearman-TSN | −0.888 | - | **−0.967** | −0.879 | −0.879 | −0.656 | - | −0.903 |
| Pearson-ASN | −0.891 | - | −0.895 | - | - | **−0.915** | - | - |
| Kendall-ASN | −0.782 | - | −0.807 | - | - | **−0.844** | - | - |
| Spearman-ASN | −0.909 | - | −0.916 | - | - | **−0.952** | - | - |

The bold value refers to the maximum correlation coefficient of different analysis methods per project by comparing the eight metrics.

RQ4: How Well Does the Approach to Predicting Bug Proneness/Severity with Significant Metrics Work? Lastly, once we have understood and characterized the relationships between network metrics and bug proneness or severity in two open source projects, what are the effects, if any, on software quality? Or what are the benefits to software engineering practices?

We have learnt from the above experiments that the BC and outD metrics are well suited to represent bug proneness and severity. To check whether this conclusion holds, we validate it through bug proneness and severity prediction with the test data. There are 67 effective bugs in the test data and 87 fixed classes related to these bugs for Tomcat, and 135 bugs and 149 fixed classes for Ant. For the purposes of the experiment, bug severity is divided into two categories: the slight category, in which a bug's rank is less than 4, and the severe category, which contains the rest. Table 4 shows the resulting predictive F-value from top 50 to top 200 for Tomcat and from top 50 to top 125 for Ant using the selected metrics on the software networks. The reason for limiting top k to 200 and 125 in these two cases is to ensure that the selected nodes come from the first group. The F-value is calculated by combining precision and recall as

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}. \tag{7}$$
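Equation (7) can be illustrated for the fixed-class case. The sketch assumes that precision is the share of the top-k classes that were actually fixed and that recall is the share of all fixed classes found in the top k; how the study counts hits for bugs and severity is not spelled out, so only classes are shown. The ranking and the set of fixed classes are invented.

```python
def f_value_at_k(ranked_classes, fixed_classes, k):
    """F-value of recommending the first k classes of a ranking."""
    top_k = ranked_classes[:k]
    hits = sum(1 for name in top_k if name in fixed_classes)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(fixed_classes)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranking (best first) and hypothetical fixed classes.
ranking = ["A", "B", "C", "D", "E", "F"]
fixed = {"A", "C", "E", "F"}
f_top3 = f_value_at_k(ranking, fixed, k=3)
print(round(f_top3, 4))
```

With two hits among the top three classes, precision is 2/3 and recall is 2/4, which gives F = 4/7.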

The BCD metric is a combination of BC and outD used for comparison with each of them; it follows the rule BCD = BC + outD. For Tomcat, the BCD metric achieves the best result on fixed-class prediction, with a maximum of 0.3316 when the top 100 nodes are returned. When all the nodes in group ID = 1 are selected, the three metrics obtain the same accuracy, 0.2857. As for the number of bugs, the maximum is 0.547, obtained with the outD metric when only the top 50 classes need to be checked. In this column, the outD and BCD metrics are at least as good as the BC metric in all cases. The last column gives the results for bug severity. The finding shows that outD performs far better than the other metrics; at top 50, its F-value is more than twice that of BC. The best case is 0.6667 at the top 50 nodes.
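The combination rule and the top-k selection can be sketched as follows. Only the rule BCD = BC + outD is given; whether either term is rescaled before the sum is not stated, so this example adds the raw values, and the metric values themselves are hypothetical.

```python
def top_k_by_bcd(bc, out_degree, k):
    """Rank classes by BCD = BC + outD and return the first k."""
    bcd = {node: bc[node] + out_degree[node] for node in bc}
    ranked = sorted(bcd, key=lambda node: (-bcd[node], node))
    return ranked[:k], bcd


# Hypothetical metric values for four classes.
bc = {"A": 0.0, "B": 1.0 / 12.0, "C": 1.0 / 12.0, "D": 0.0}
out_degree = {"A": 0, "B": 2, "C": 1, "D": 2}
top2, bcd = top_k_by_bcd(bc, out_degree, k=2)
print(top2)
```

Since a normalized BC lies between 0 and 1 while outD is an integer count, the raw sum is dominated by outD in this toy case and BC mostly acts as a tie-breaker.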

For Ant, the BCD metric performs better than the other two on fixed classes and bugs when the top 50 classes are returned. Apart from this case, outD has the highest accuracy at top 100 and top 125 and obtains its best result when the top 100 classes are returned: the maxima for classes, bugs, and severity are 0.3534, 0.6383, and 0.4865, respectively. Throughout the prediction, outD is the most suitable metric for predicting bug proneness and bug severity in a directed class-level software network.

Finally, some attention is devoted to analyzing the average number of people and comments needed when developers resolve a bug of a given kind. In Figure 6, one observes that more serious bugs need more people to participate, and these people hold more discussions before the bugs are marked as fixed, which is consistent with common experience, except for rank = 1. A reasonable explanation for this exception is that enhancement bugs have the lowest difficulty, so many developers are able to find the problem and offer their own suggestions or revisions.

6. Discussion

RQ1. From the perspective of node importance, the more important a node is, the greater its probability of being modified and the more bugs are involved during the software development process. The central classes (nodes) form the skeleton of the entire system, especially the nodes with large betweenness centrality (BC) and many outgoing links (outD). The former category of nodes plays an important role in bridging different modules to ensure the normal execution of the entire system. Such nodes are also called "brokers" in network science. The latter nodes are more complex because they aggregate behaviors from many other nodes. These observations have implications for software testing. Nodes with high BC are more important in many contexts and have significant external responsibility. Bugs occurring in these nodes influence the reachability between modules. On the other hand, nodes with a high outD aggregate many behaviors of other nodes and have significant internal complexity. They are more likely to contain bugs and to be fixed. In this part, the PR and CC metrics are not significant. In summary, our results are consistent with the work in [34]: some metrics are related to bug proneness and others are not.

We conclude that the different importance metrics of a node play different roles in the analysis of bug proneness. It is clear that BC and outD are suitable for test prioritization when developers want to know which classes deserve more attention. In some special scenarios, for example, node importance and out-degree can also be applied to bug localization with regard to the relationship between them.

RQ2 and RQ3. The hypothesis that a class with a large metric value is highly bug-prone and that the bugs in this class are more serious is not always true. Fortunately, Table 3 shows that the averages of the BC, D, and outD metrics have a significant correlation with software bug proneness. The most likely explanation for the strong correlations between some metrics and bug proneness is that, during the experiment, the classes were sorted in descending order and divided into groups of equal size. This treatment also means that the average metric value of a preceding group is larger than that of the succeeding one, and it corresponds to the top-k recommendation in the subsequent section. The results show an overall trend between groups, so the correlations are stronger than those generally reported.

In Figure 5, major, regression, and critical bugs have relatively high values in both projects. Moreover, this phenomenon is most obvious for major bugs. Additionally, bugs arising from a broker hinder the transfer of information between modules and the function scheduling of the entire system. As long as the modules cannot work together smoothly, the quality of the system cannot be guaranteed. A class with many outgoing links not only has more potential bugs, but its bugs are also more severe. Such classes are usually the central components in a system or the central nodes in a network.

Members of a project team are sometimes more concerned about whether they can find serious bugs. In other words, they hope to find more bugs, and more severe bugs, quickly and efficiently in a limited period of time. Our results provide an appropriate method to alleviate this challenge by means of the BC and outD metrics.

RQ4: by the contrast among eight network metrics, BC and outD are selected to predict bug proneness and severity. Overall, compared to the other metrics, outD shows a significant advantage in our results. It can be applied to identify which classes should be modified prior to others. The result also indicates that classes with severe bugs usually have strong internal complexity.
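The top-K evaluation behind the F-values in Table 4 can be illustrated with a minimal sketch. It assumes the standard definition F = 2PR/(P + R), with precision and recall computed over the K classes that have the largest metric value; the function name `f_value_at_k` and the sample data are hypothetical, and the exact evaluation procedure of this paper may differ.

```python
def f_value_at_k(metric, buggy, k):
    """F-value when the k classes with the largest metric are recommended.

    metric: {class_name: metric value}; buggy: set of classes that were
    actually fixed. Both inputs are illustrative assumptions.
    """
    # Rank classes by descending metric value; ties are broken by name.
    top = sorted(metric, key=lambda c: (-metric[c], c))[:k]
    hits = sum(1 for c in top if c in buggy)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(buggy)
    return 2 * precision * recall / (precision + recall)


# Made-up example: four classes ranked by out-degree.
out_degree = {"A": 5, "B": 4, "C": 3, "D": 2}
fixed = {"A", "C", "D"}
score = f_value_at_k(out_degree, fixed, k=2)
```

With K = 2 the recommendation is {A, B}, of which only A was fixed, so precision is 1/2 and recall is 1/3.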

Based on our work, we acquire some meaningful answers to the four questions proposed in Section 3. However, there are still some potential limitations and threats to the validity of our work.

(i) All datasets used in our experiments are collected from the open source projects Apache Tomcat 7 and Ant. We know that there are many other available


Table 4: F-value: fixed classes, bugs, and bug severity.

Project  Top K     Classes                   Bugs                      Severity
                   BC      outD    BCD       BC      outD    BCD       BC      outD    BCD
Tomcat   Top 50    0.2043  0.2920  0.2628    0.3760  0.5470  0.4786    0.2917  0.6667  0.4583
Tomcat   Top 100   0.2567  0.3102  0.3316    0.4072  0.4431  0.4551    0.3288  0.5069  0.3836
Tomcat   Top 150   0.2616  0.2954  0.2954    0.3594  0.3779  0.3687    0.2857  0.4184  0.2959
Tomcat   Top 200   0.2857  0.2857  0.2857    0.3071  0.3071  0.3521    0.2439  0.3740  0.2846
Ant      Top 50    0.1307  0.2111  0.2312    0.1946  0.4221  0.4322    0.1481  0.2613  0.2613
Ant      Top 100   0.2651  0.3534  0.3213    0.3489  0.6383  0.5957    0.2270  0.4865  0.4649
Ant      Top 125   0.3139  0.3358  0.3285    0.4385  0.5923  0.5846    0.3143  0.4381  0.4286

The bold value refers to the best F-value identified by comparing three metrics (BC, outD, and BCD) at the top K of predicting fixed classes, bugs, and bug severity, respectively.

Figure 5: Bug severity (Tomcat: top row, panels (a)-(c); Ant: bottom row, panels (d)-(f); the x-axis is bug rank and the y-axis is the mean of the metric). Legend entries: PR, HITS, BC; inD, outD, D; CC, EC.

software repositories that are helpful to bug analysis. We would like to improve our approach by combining more software resources, such as mailing lists and forum messages, in future work to enrich the findings.

(ii) A limitation of our work is that we choose projects written in Java, because the tool used to construct the software network can only deal with Java source code. Whether our conclusions hold for projects written in other languages still needs to be evaluated.

(iii) Theoretically, software networks evolve over time; therefore, differences between multiple versions are inevitable. In this paper, we utilize a stable version of each project as the construction standard of the software network instead of considering the structural differences between versions. Although such treatment is rough, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple granularities, which yields a multi-granularity software network. We investigate bug proneness and severity through network metrics at the class level. If the results would be even more


Figure 6: Human factor (the x-axis is bug rank). Panels: (a) Comments; (b) Persons. Legend: Tomcat, Ant.

encouraging at the other granularities, that would be an attractive theme.

7. Conclusion

We constructed class-level software networks and introduced nine representative and commonly used network metrics to bug prediction. An empirical study has been conducted on the open-source projects Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from February 2002 to November 2012. Our analysis demonstrated that there is a statistically significant relation between a class's importance and the probability that the class is modified and buggy. Also, a class's importance is related to the severity of the raised bugs. The result also showed that, with just about ten percent of the effort paid, the accuracy of our prediction for bug proneness and bug severity can be up to 54.7% and 66.7%, and 63.8% and 48.7%, respectively, when successively returning the top 50 and 100 classes within these two cases.

We expect that our findings are insightful and can be used to support the design and development of software, helping engineers assess the risk of adding or dropping a feature given the existing dependencies between classes. We also believe that our approach can be leveraged in the analysis of bug prediction and test prioritization of other open source software. Finally, our findings also provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

The next work will mainly focus on two aspects. On the one hand, we will collect more open source projects (e.g., Eclipse, Mozilla, or projects deployed on SourceForge) to validate the generality of our approach. On the other hand, we will further take into account human factors, since software development is a process of human participation. An exploration of the impact of sociotechnical congruence on bug prediction is urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), the National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), the Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and the Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J. Anvik, “Automating bug report assignment,” in Proceedings of the 28th International Conference on Software Engineering 2006 (ICSE ’06), pp. 937–940, May 2006.

[2] D. Radjenovic, M. Hericko, R. Torkar, and A. Zivkovic, “Software fault prediction metrics: a systematic literature review,” Information and Software Technology, vol. 55, pp. 1397–1418, 2013.

[3] T. Zimmermann and N. Nagappan, “Predicting defects using network analysis on dependency graphs,” in Proceedings of the 30th International Conference on Software Engineering 2008 (ICSE ’08), pp. 531–540, May 2008.

[4] A. Meneely, L. Williams, W. Snipes, and J. Osborne, “Predicting failures with developer networks and social network analysis,” in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE ’08), pp. 13–23, November 2008.


[5] K. Crowston, K. Wei, Q. Li, and J. Howison, “Core and periphery in Free/Libre and Open Source software team communications,” in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS ’06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, “An empirical study on the relation between dependency neighborhoods and failures,” in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST ’11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Sole, “Scale-free networks from optimal design,” Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, “Software systems as complex networks: structure, function, and evolvability of software collaboration graphs,” Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, “Power-laws in a large object-oriented software system,” IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, “Power laws in software,” ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, “Advances in the studies on complex networks of software systems,” Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, “Software complexity metrics based on complex networks,” Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, “Multi-granularity evolution analysis of software using complex network theory,” Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, “Node centrality in weighted networks: generalizing degree and shortest paths,” Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, “Categorizing bugs with social networks: a case study on four open source software communities,” in Proceedings of the 35th International Conference on Software Engineering (ICSE ’13), pp. 1032–1041, 2013.

[16] M. S. Zanetti and F. Schweitzer, “A network perspective on software modularity,” in Proceedings of the International Workshop on Complex Sciences in the Engineering of Computing Systems, pp. 175–186, 2012.

[17] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, “PageRank, HITS and a unified framework for link analysis,” in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–354, 2002.

[18] R. Premraj and K. Herzig, “Network versus code metrics to predict defects: a replication study,” in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM ’11), pp. 215–224, September 2011.

[19] M. Pinzger, N. Nagappan, and B. Murphy, “Can developer-module networks predict failures?” in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE ’08), pp. 2–12, November 2008.

[20] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, “Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities,” IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, 2011.

[21] A. Sureka, A. Goyal, and A. Rastogi, “Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis,” in Proceedings of the 4th India Software Engineering Conference (ISEC ’11), pp. 195–204, February 2011.

[22] F. Perin, L. Renggli, and J. Ressia, “Ranking software artifacts,” in Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (ICSM ’10), pp. 1–4, 2010.

[23] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, “Graph-based analysis and prediction for software evolution,” in Proceedings of the International Conference on Software Engineering (ICSE ’12), pp. 419–429, 2012.

[24] D. Steidl, B. Hummel, and E. Juergens, “Using network analysis for recommendation of central software classes,” in Proceedings of the 19th Working Conference on Reverse Engineering, pp. 93–102, 2012.

[25] T. Wolf, A. Schroter, D. Damian, and T. Nguyen, “Predicting build failures using social network analysis,” in Proceedings of the International Conference on Software Engineering (ICSE ’09), pp. 1–11, May 2009.

[26] M. E. J. Newman, Networks, Oxford University Press, Oxford, UK, 2010.

[27] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

[28] L. C. Freeman, “Centrality in social networks: conceptual clarification,” Social Networks, vol. 1, no. 3, pp. 215–239, 1978.

[29] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, Article ID 026113, 15 pages, 2004.

[30] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, “The missing links: bugs and bug-fix commits,” in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE ’10), pp. 97–106, November 2010.

[31] R. Wu, H. Zhang, S. Kim, and S. C. Cheung, “ReLink: recovering links between bugs and changes,” in Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering (SIGSOFT/FSE ’11), pp. 15–25, September 2011.

[32] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y. G. Gueheneuc, “Is it a bug or an enhancement? A text-based approach to classify change requests,” in Proceedings of the Conference of the Center for Advanced Studies (CASCON ’08), ACM, October 2008.

[33] K. Herzig, S. Just, and A. Zeller, “It’s not a bug, it’s a feature: how misclassification impacts bug prediction,” in Proceedings of the 35th International Conference on Software Engineering, pp. 392–401, 2013.

[34] R. Tonelli, G. Concas, M. Marchesi, and A. Murgia, “An analysis of SNA metrics on the Java Qualitas Corpus,” in Proceedings of the 4th India Software Engineering Conference (ISEC ’11), pp. 205–213, February 2011.


network analysis on software dependency networks, helping identify central program units that are more likely to conceal defects. Perin et al. [22] used PageRank for ranking classes of the Pharo Smalltalk system based on a dependency graph representing class inheritance and reference. Bhattacharya et al. [23] constructed software networks at the source code and module levels to capture software evolution, estimate bug severity, prioritize refactoring efforts, and predict defect-prone releases. Steidl et al. [24] used different network analysis metrics on dependency networks to retrieve central classes and demonstrated that the results can compete with the suggestions of experienced developers. Zanetti et al. [15] studied bug reporter centrality to validate that the centrality of bug reporters is greatly helpful for bug triaging procedures.

To the best of our knowledge, bug prediction is still an open question; hence, we expect that our study can provide a supplement to the existing solutions. Although prior research has used network metrics to predict failure proneness or bug severity, our study leverages more relationships between classes in analyzing the implications of dependencies. Additionally, prior research did not consider both proneness and severity, as is done in this paper. We constructed a software network at class-level granularity and then used network metrics to analyze bug proneness and severity. Differing from the existing work, we make the following contributions.

(1) Five dependency scenes are considered in our class-level software network: inheritance, field, method, return, and parameter dependency, whereas only part of these relationships were taken into account in most existing research.

(2) A comparison between different metrics is made first to explore the extent to which they reflect bug quantity and severity; after that, predictors are modelled with the significant metrics to improve software development. Some researchers directly assemble various network metrics without filtration; thus it is inevitable to introduce biases that reduce the accuracy. Our work bridges this gap by making a comparison in the first place.

(3) We select the betweenness centrality and out-degree metrics for bug proneness and severity prediction. With just about 10% of the effort paid, the best F-value acquired for bug proneness and severity is up to 54.7% and 66.7%, and 63.8% and 48.7%, respectively, which is comparable with the results in [4, 25].

3. Preliminary Study

3.1. Software Networks. There are all kinds of networks around us, such as social networks (e.g., friendship networks and scientific collaboration networks), technological networks (e.g., the Internet and WWW), and biological networks (e.g., neural networks and the interaction of proteins in yeast). Surprisingly, the underlying structures of these networks have completely different statistical features than those of regular and random networks. They all belong to complex networks. These discoveries have served to draw together many disparate domains into an emerging science of networks.

Figure 1: A simple example of software network. The figure shows four Java class fragments (class A; class B extends A, with the field "public C Attribute" and the method "public void Method(A Arg)"; class C; class D, with the fields "public B Atrr1" and "public C Attr2") and the resulting network over the nodes A, B, C, and D.

With the rapid development of software technology and the pervasiveness of the Internet, software scale and complexity have increased so sharply that developers can no longer control the system from a global perspective, and software plans and quality cannot be guaranteed. In order to overcome these challenges, some pioneers of complex systems introduced graph theory, using nodes and edges to simplify the structure. To our surprise, software networks also show the basic characteristics of complex networks. These exciting discoveries have attracted the attention of researchers from software engineering. Through the interdisciplinary combination of complex networks and software engineering, an approach has been emerging that abstracts a software system as a network, that is, a software network.

A software network is an interdisciplinary outcome based on the combination of network science theory and software engineering practice. When constructing a software network, the source code should be handled with reverse engineering methods (i.e., the code is compiled into an XML file, and the topology structure is then derived from the XML file). Figure 1 is an example of the fragments of software networks.

3.2. Network Metrics. Network metrics treat software entities as nodes in a graph and characterize them on the basis of their dependencies with other entities. As opposed to code metrics, network metrics take into account the interactions between entities, thus modelling the flow of software information. In this paper, node centrality, PageRank and HITS, in-, out-, and total degrees, and modularity ratio metrics are involved; then the significant metrics are used to predict bug proneness and severity.

3.2.1. Eigenvector Centrality (EC). It is a measure of the influence of a node in a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the current node than equal connections to low-scoring nodes, and the scores are computed from the adjacency matrix. For a given network (graph) $G = (V, E)$ with $|V|$ nodes, let $A = (a_{ij})$ be the adjacency matrix; if vertex $i$ is linked to $j$, then $a_{ij} = 1$, and $a_{ij} = 0$ otherwise. The centrality score of vertex $i$ can be defined as

$$x_i = \frac{1}{\lambda} \sum_{j \in M(i)} x_j = \frac{1}{\lambda} \sum_{j \in G} a_{ij} x_j, \tag{1}$$

where $M(i)$ is the set of the neighbors of $i$ and $\lambda$ is a constant. Only the greatest eigenvalue results in the desired centrality measure; the $i$th component of the eigenvector then gives the centrality score of the vertex $i$ in the network [26].
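As an illustration of (1), the leading eigenvector can be approximated by power iteration. The following is a minimal sketch, not the tool used in this study: the adjacency-set representation and the function name `eigenvector_centrality` are assumptions, and the example graph is a made-up undirected network (each link stored in both directions).

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate the leading eigenvector of the adjacency matrix.

    adj: {node: set of neighbours}, i.e. a_ij = 1 when j is in adj[i].
    Every node must appear as a key of adj.
    """
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Iterating with A + I keeps the same leading eigenvector as A
        # and avoids oscillation on bipartite graphs.
        new = {i: x[i] + sum(x[j] for j in adj[i]) for i in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {i: v / norm for i, v in new.items()}
        if max(abs(new[i] - x[i]) for i in nodes) < tol:
            return new
        x = new
    return x


# Made-up example: a triangle a-b-c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
ec = eigenvector_centrality(graph)
```

In this example, node c has the largest score because it is linked to every other node, while the pendant node d has the smallest.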

3.2.2. Betweenness Centrality (BC). It is based on the total number of shortest paths between all possible pairs of nodes that pass through a node [27]. It quantifies how frequently a node acts as a bridge along the shortest path between two other nodes. If $g_i^{(st)}$ is the number of shortest paths between nodes $s$ and $t$ that pass through node $i$, $n^{(st)}$ is the number of shortest paths between $s$ and $t$, and $n$ is the total number of nodes, then

$$\mathrm{BC}_i = \frac{\sum_{s < t} g_i^{(st)} / n^{(st)}}{(n - 1)(n - 2)}. \tag{2}$$
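A minimal sketch of how (2) can be computed on an unweighted network uses Brandes' algorithm (one breadth-first search per source node). This is an illustration rather than the implementation used in this study. As an assumption, the sketch accumulates over ordered pairs (s, t), which suits a directed network together with the (n - 1)(n - 2) normaliser; the function name `betweenness` and the example graphs are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Normalised betweenness; adj: {node: set of successors}.

    Every node must appear as a key of adj.
    """
    nodes = list(adj)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    scale = (n - 1) * (n - 2)
    return {v: (b / scale if scale else 0.0) for v, b in bc.items()}


# Made-up examples: a directed chain and an undirected star.
chain = betweenness({"a": {"b"}, "b": {"c"}, "c": set()})
star = betweenness({"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}})
```

In the chain, only the pair (a, c) has an intermediate node, so b obtains 1/((3 - 1)(3 - 2)). In the star, all six ordered leaf pairs pass through the center.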

3.2.3. Closeness Centrality (CC). It concerns the farness of a node to all other nodes. Thus, it is defined as the inverse of the sum of the distances to all other nodes [28]. If $d(i, j)$ is the distance between nodes $i$ and $j$, then the centrality of $i$ is

$$\mathrm{CC}_i = \frac{1}{\sum_{j=1}^{|V|} d(i, j)}. \tag{3}$$
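For an unweighted network, the distances in (3) can be obtained with a breadth-first search. The sketch below is illustrative only; as an assumption, nodes that cannot be reached from $i$ are simply left out of the sum, a case that (3) does not address. The function name `closeness` and the example graph are hypothetical.

```python
from collections import deque


def closeness(adj, i):
    """Inverse of the summed BFS distances from node i.

    adj: {node: set of successors}; unreachable nodes are ignored.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return 1.0 / total if total else 0.0


# Made-up example: the undirected path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
cc_middle = closeness(path, "b")
cc_end = closeness(path, "a")
```

The middle node is at distance 1 from both ends, whereas an end node is at distances 1 and 2 from the others.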

3.2.4. PageRank (PR). It is a variant of the eigenvector centrality measure; it is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at a particular page. The rank value indicates the importance of a page, which is used here to denote the importance of a class:

$$\mathrm{PR}_i = \frac{1 - q}{N} + q \sum_{j \in V(i)} \frac{\mathrm{PR}(j)}{\mathrm{outD}(j)}, \tag{4}$$

where $q$ is a damping factor ($q = 0.85$), $V(i)$ is the set of nodes that link to $i$, $\mathrm{outD}(j)$ is the number of outgoing edges of node $j$, and $N$ is the total number of nodes in the network.
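Equation (4) can be solved by repeated substitution. The sketch below follows (4) literally and is only an illustration; in particular, nodes without outgoing edges receive no special treatment here, which is an assumption, and the function name `pagerank` and the example graph are hypothetical.

```python
def pagerank(adj, q=0.85, iterations=100):
    """Iterate Eq. (4); adj: {node: set of successors}.

    Every node must appear as a key of adj.
    """
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    # V(i): the nodes that link to i.
    incoming = {v: [] for v in nodes}
    for j in nodes:
        for i in adj[j]:
            incoming[i].append(j)
    for _ in range(iterations):
        pr = {
            i: (1 - q) / n + q * sum(pr[j] / len(adj[j]) for j in incoming[i])
            for i in nodes
        }
    return pr


# Made-up example: a and b reference each other, and c references a.
ranks = pagerank({"a": {"b"}, "b": {"a"}, "c": {"a"}})
```

Node c has no incoming link, so its rank stays at the baseline (1 - q)/N, while a collects rank from both b and c.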

3.2.5. HITS. It was also originally designed to rank web pages, and it calculates two scores: authority and hub scores. The authority score of a page is computed as the sum of the scaled hub scores of the pages that point to that page. The hub score of a page is the sum of the scaled authority scores of the pages it points to. Here, the authority score is used as an experimental metric.
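The mutual definition of the two scores can be illustrated by alternating the two updates and rescaling after each round. This is a minimal sketch under stated assumptions: Euclidean normalisation is one common choice of scaling, and the function name `hits` and the example graph are hypothetical.

```python
import math


def hits(adj, iterations=100):
    """Return (hub, authority) scores; adj: {node: set of successors}.

    Every node must appear as a key of adj.
    """
    nodes = list(adj)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the nodes pointing to it.
        auth = {v: 0.0 for v in nodes}
        for j in nodes:
            for i in adj[j]:
                auth[i] += hub[j]
        # Hub: sum of the authority scores of the nodes it points to.
        hub = {j: sum(auth[i] for i in adj[j]) for j in nodes}
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth


# Made-up example: classes a and b both depend on class c.
hub_scores, authority_scores = hits({"a": {"c"}, "b": {"c"}, "c": set()})
```

Here c is the only authority, and a and b are equally good hubs.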

3.2.6. Degree (D). In the case of a directed network, it is meaningful to distinguish in-degree (the number of incoming links) from out-degree (the number of outgoing links). For software networks, in-degree is a representation of code reuse and out-degree of design complexity. Nodes with large in-degree are reused, and nodes with large out-degree are more complex in some contexts. In our paper, we use $\mathrm{inD}(i)$, $\mathrm{outD}(i)$, and $D$ to represent the in-degree, out-degree, and total degree of a node, respectively.
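The three degree metrics follow directly from the edge list. A minimal sketch, assuming the network is stored as adjacency sets; the function name `degrees` and the example network, which loosely follows the classes A to D of Figure 1, are illustrative.

```python
def degrees(adj):
    """Return {node: (inD, outD, D)}; adj: {node: set of successors}."""
    out_d = {v: len(adj[v]) for v in adj}
    in_d = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            in_d[w] += 1
    return {v: (in_d[v], out_d[v], in_d[v] + out_d[v]) for v in adj}


# Illustrative network: B depends on A and C, and D depends on B and C.
deg = degrees({"A": set(), "B": {"A", "C"}, "C": set(), "D": {"B", "C"}})
```

In this example C is the most reused class (in-degree 2), while B and D have the largest out-degree.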

3.2.7. Modularity Ratio (MR). Modularity is a metric proposed by Newman and Girvan [29] to evaluate the quality of a community partition of a network. A system consists of many packages, and each package has a large number of classes or subpackages. For a given definition of modules or clusters and their underlying network structure, the respective degree of modularity is defined by

$$Q = \frac{\sum_{i}^{n} e_{ii} - \sum_{i}^{n} a_i b_i}{1 - \sum_{i}^{n} a_i b_i}, \tag{5}$$

where $e_{ij}$ is the fraction of all edges that link nodes in module $i$ to nodes in module $j$, $a_i = \sum_{j}^{n} e_{ij}$ and $b_i = \sum_{j}^{n} e_{ji}$ (the row and column sums, resp.), and $n$ is the total number of existing modules. According to this equation, we define the MR of module $i$ as follows:

$$\mathrm{MR}_i = e_{ii} - a_i b_i. \tag{6}$$
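Equation (6) can be evaluated from a directed edge list and a module assignment. The sketch below is only an illustration under the assumption that every edge counts equally; the function name `modularity_ratio`, the module labels, and the sample edges are hypothetical.

```python
from collections import defaultdict


def modularity_ratio(edges, module_of):
    """Return {module: MR} following Eq. (6).

    edges: list of (source, target) pairs; module_of: {node: module}.
    """
    if not edges:
        return {}
    m = len(edges)
    # e[(i, j)]: fraction of edges from module i to module j.
    e = defaultdict(float)
    for u, v in edges:
        e[(module_of[u], module_of[v])] += 1.0 / m
    modules = sorted(set(module_of.values()))
    a = {i: sum(e[(i, j)] for j in modules) for i in modules}
    b = {i: sum(e[(j, i)] for j in modules) for i in modules}
    return {i: e[(i, i)] - a[i] * b[i] for i in modules}


# Made-up example with two packages X and Y.
sample_edges = [("x1", "x2"), ("x2", "x1"), ("y1", "y2"), ("x1", "y1")]
packages = {"x1": "X", "x2": "X", "y1": "Y", "y2": "Y"}
mr = modularity_ratio(sample_edges, packages)
```

For package X, half of the edges are internal, $a_X = 0.75$ and $b_X = 0.5$, which gives $\mathrm{MR}_X = 0.5 - 0.375 = 0.125$.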

3.3. Network Metrics for Application. Before presenting the research questions and the details of our approach, we define two terms used henceforth in this paper, which indicate the practical application of our work.

(i) Bug proneness. Bug proneness is treated as a quality measure of software entities in this paper. An intuitive understanding is that it represents the probability that an entity (i.e., a class) will become buggy in the process of debugging. The higher the bug proneness of a class is, the more likely it will be given priority in testing. For our purpose, we evaluate it via the number of bugs and fixed classes.

(ii) Bug severity. This term describes the impact of a bug on the entire system. Bug tracking systems classify severity into several levels, such as critical, major, minor, and trivial. However, severity and priority are not the same concept. In some cases, a bug is of critical severity but the priority to resolve it is low, as it will rarely happen in a real scenario. Here, we would like to investigate whether the importance of a class in the software network is related to the severity of the bugs caused by it.

With these metrics and application scenes, the following four research questions are proposed to guide the design of the case study. Questions 1–3 investigate the properties of software networks, whereas question 4 predicts the bug proneness and severity with the significant metrics.

RQ1: Is the position of classes in the software network related to the bug proneness?

RQ2: Is the position of classes in the software network related to the bug severity?

RQ3: If so, which metrics are more significant?

RQ4: How well does the approach to predicting bug proneness/severity with significant metrics work?

4. Approach

In our approach, we mainly consider two Open Source projects of the Apache Software Foundation (ASF), Tomcat and Ant, written in Java, as our research subjects (http://ant.apache.org/ and http://tomcat.apache.org/). The rationale is threefold. (1) They are well-known and stable projects; each has undergone a number of major release cycles and is still under active development. (2) The source code revision archives and SVN commit logs dating back several years, as well as the bug reports, are available, which offers a good chance to gain rewarding experience from Open Source projects. (3) The choice of the Java programming language is limited by the tools developed to construct software networks, and we are interested in understanding open source software written in Java.

We observed the information provided by the bug tracking systems (Bugzilla and JIRA) and SVN commits and found that most bugs give the detailed classes involved and some already include the modified fragments of source code. This information determines the feasibility of our approach. Besides, the dependencies between classes, which are directed, should not be ignored. Therefore, we decide to abstract directed but unweighted software networks at the class level.

Let us assume that $G = (V, E)$ is a software network, where $V = \{v_i\}$ is the set of all classes and $E = \{e_{ij}\}$ is the set of all dependencies. We distinguish between different kinds of dependencies as follows:

(i) $v_i$ implementing/extending the interface/class $v_j$ is inheritance dependency;

(ii) $v_i$ having a field of type $v_j$ is field dependency;

(iii) $v_i$ calling a method of $v_j$ is method dependency;

(iv) a method of $v_i$ returning an object of type $v_j$ is return dependency;

(v) a method of $v_i$ taking an object of $v_j$ as a parameter is parameter dependency.
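The five dependency kinds above can be collapsed into a directed, unweighted edge set. The following is a minimal sketch, assuming the dependencies have already been extracted as (source, kind, target) triples; the triple format and the function name `build_network` are illustrative assumptions and not the tool used in this study. The sample triples loosely follow the class fragments of Figure 1.

```python
from collections import defaultdict

KINDS = {"inheritance", "field", "method", "return", "parameter"}


def build_network(triples):
    """Return {class: set of classes it depends on}."""
    adj = defaultdict(set)
    for src, kind, dst in triples:
        if kind not in KINDS:
            raise ValueError(f"unknown dependency kind: {kind}")
        adj[dst]  # register the target even if it has no outgoing edge
        if src != dst:
            # Several dependencies between the same pair of classes
            # collapse into a single edge (unweighted network).
            adj[src].add(dst)
    return dict(adj)


triples = [
    ("B", "inheritance", "A"),
    ("B", "field", "C"),
    ("B", "parameter", "A"),
    ("D", "field", "B"),
    ("D", "field", "C"),
]
network = build_network(triples)
```

Because the network is unweighted, the inheritance and parameter dependencies from B to A yield a single edge.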

A bug report offers many fields to help developers understand the issue, one of which is severity, assigned by administrators based on how severely a bug affects the program. Table 1 shows eight levels of bug severity and their ranks in Bugzilla. An objective for software providers is not only to minimize the total number of bugs but also to ensure that the bugs' severity is as low as possible [23]. The approach mainly consists of four phases: (1) compiling the source code files to extract the class-level directed software networks regarding the dependency requirements; (2) exporting SVN commits and integrating them with bug reports to obtain the necessary bug-class association relationships; (3) calculating a set of network metrics in the established software networks and then acquiring their relationships with the number of bugs and bug severity; (4) finally, applying the significant metrics to the analysis of bug prediction. The framework is shown in Figure 2.

Table 1: Levels of bug severity and their ranks in the Bugzilla bug tracking system.

Bug severity   Rank
Blocker        8
Critical       7
Regression     6
Major          5
Normal         4
Minor          3
Trivial        2
Enhancement    1

Figure 2: The framework of our approach. Elements shown: source code, software network, changelog, Bugzilla, bug-class associations, the network metrics (EC, BC, CC, MR, PR, HITS, inD, outD, D), bug severity, and bug proneness, connected through phases (1) to (4).

5. Case Study

This section presents the subject projects used for our study and the relevant data we collected. With the experimental results, the proposed questions are answered in turn.

5.1. Data. For our analysis, we collect a number of types of data. We gather the open source software data from source code repositories and bug databases and determine the rank of each bug. In this paper, we choose Tomcat 7 and Ant 1.8.4 as our experimental subjects. As successful open source projects, both Tomcat and Ant maintain publicly available sources. To construct the expected software networks, the source code is compiled to an XML file, and the software network is then established by parsing the XML file. In the subsequent part, we abbreviate the Tomcat software network as TSN and the Ant software network as ASN. There are 35 versions of Tomcat 7. The latest version is still being updated, so only the preceding 34 versions are adopted in our work. There are few differences in the number of nodes/edges between versions. A likely explanation for this is that Tomcat is a relatively mature open source software project with a decade of development history. Of course, not all projects show this phenomenon. As we know, some projects have a nearly superlinear growth [13]. Finally, the stable Tomcat 7.0.29 version is used for analysis; it consists of 2015 nodes, 9573 edges, and 19 communities, whereas Ant has 1345 nodes, 3937 edges, and 20 communities.

Table 2: The statistics: the number of bugs and fixed classes, and the nodes/edges of the software network.

             Tomcat       Ant
Bugs         475          480
Classes      422 (21%)    389 (29%)
SN nodes     2015         1345
SN edges     9573         3937
Community    19           20

The bug reports are comprised of two sets: one used for the experiment and the other for prediction. That is, the first thirty versions of Tomcat are utilized for phases (2) and (3), and the last four versions for prediction. Because the Ant data refer to only one version, eighty percent of the Ant data is used for the experiment and the remaining twenty percent for prediction. Note that studies of software defects rely on links between bug databases and commit changes, and this information plays an important role in measuring quality and predicting defects. Some prior research suggested that there are missing links between bugs and bug-fix commit logs. The authors proposed automatic approaches to recover missing links [30, 31] and found that the missing links lead to biased defect information, which affects prediction performance. However, some authors argued that a reported bug may not be a bug but a feature or an enhancement [32, 33]. This misclassification also introduces bias in bug prediction models. When the missing links are considered, the false positives of bug classification increase. Hence, only bugs always reported as a bug in the bug tracker are taken into account. Once effective bugs are identified, one checks the change logs by searching for the bug ID and calculating the similarity of the summary text. The reason for using heuristics is that most of the missing links in our datasets are due to misclassification. The statistics of the experimental data are shown in Table 2. Some test classes are not taken into account because they are duplicates. In bug 53062, for example, both class org.apache.catalina.connector.TestResponse and class org.apache.catalina.connector.Response appear, but they represent the same object in our study. Note that less than thirty percent of the classes are involved; this confirms that most of the bugs occur in a small number of classes. Therefore, what is needed is guided detection within a system.
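The linking heuristic described above (search the change log for a bug ID, otherwise compare summary texts) might be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the commit-message pattern, the similarity threshold of 0.6, the use of `difflib.SequenceMatcher` as the similarity measure, the function name `link_commit`, and the made-up bug IDs and summaries.

```python
import re
from difflib import SequenceMatcher

# Assumed commit-message convention, e.g. "bug 12345" or "fixes #12345".
BUG_ID = re.compile(r"\b(?:bug|bz|fix(?:es|ed)?)\s*#?\s*(\d{4,6})\b", re.IGNORECASE)


def link_commit(message, bugs, threshold=0.6):
    """Return the ids of the bugs linked to a commit message.

    bugs: {bug_id: summary text}. The ID search is tried first; text
    similarity is used only when no known ID is mentioned.
    """
    linked = {int(m) for m in BUG_ID.findall(message) if int(m) in bugs}
    if linked:
        return linked
    lowered = message.lower()
    return {
        bug_id
        for bug_id, summary in bugs.items()
        if SequenceMatcher(None, lowered, summary.lower()).ratio() >= threshold
    }


# Made-up bug reports.
reports = {10001: "handle relative redirect paths", 10002: "memory leak in pool"}
by_id = link_commit("Fix bug 10001: redirects", reports)
by_text = link_commit("Handle relative redirect paths", {10001: reports[10001]})
```

A real change log would need a pattern adapted to the commit conventions of the project, and the threshold would have to be calibrated on known links.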

5.2. Results

RQ1: Is the Position of Classes in the Software Network Related to the Bug Proneness? To answer this research question, the nodes are first sorted by metric value in descending order and divided into groups, each containing the same number of nodes. This grouping facilitates the exploration of relations between the metrics and bugs. Because the last 15 nodes are never fixed in TSN, they are excluded so that each group contains 200 nodes. For ASN, the nodes are divided evenly into eleven groups. The number of fixed classes and the number of involved bugs in each group are recorded. With this information, RQ1 is not hard to answer.
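The grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the tie-breaking by class name, and the toy metric values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set


def group_by_metric(metric: Dict[str, float], group_size: int) -> List[List[str]]:
    """Sort classes by metric value (descending) and cut them into equal-size groups.

    Trailing classes that do not fill a whole group are dropped, mirroring the
    exclusion of the last 15 Tomcat nodes described in the text.
    """
    # Assumption: ties are broken by class name so the order is deterministic.
    ranked = sorted(metric, key=lambda c: (-metric[c], c))
    n_groups = len(ranked) // group_size
    return [ranked[g * group_size:(g + 1) * group_size] for g in range(n_groups)]


def fixed_per_group(groups: List[List[str]], fixed: Set[str]) -> List[int]:
    """Count how many classes in each group were touched by a bug fix."""
    return [sum(1 for c in group if c in fixed) for group in groups]


# Hypothetical toy data: metric value per class and the set of fixed classes.
metric = {"A": 9.0, "B": 7.0, "C": 4.0, "D": 3.0, "E": 1.0}
groups = group_by_metric(metric, group_size=2)
counts = fixed_per_group(groups, fixed={"A", "B", "D"})
```

The per-group counts are the quantities that are later correlated with the group ID.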

The numbers of fixed classes and involved bugs in each group for the different metrics are illustrated in Figures 3 and 4. In Figure 3, EC, inD, D, and HITS show a significantly negative correlation with the group ID (the larger the average metric value of a class, the more bug-prone it is), whereas BC and outD show a skewed distribution; see Figures 3(a), 3(b), 3(e), 3(f), 3(g), and 3(h). Figure 4 differs somewhat: BC, outD, and D show a significant correlation but no obviously skewed distribution; see Figures 4(b), 4(f), and 4(g). The results of the other metrics fluctuate, especially in groups six, eight, and nine.

MR shows the same relationship with fixed classes and bugs. It differs from the above metrics in that the distribution of bugs is higher than that of classes and fluctuates more. In subfigure (i), the x-axis is the rank of the module (community): the larger the modularity ratio, the higher its rank. An overall positive trend is visible: the larger the metric value of a class, the more bug-prone it is and the more likely it is to be modified, especially for the BC, outD, and D metrics.

RQ2: Is the Position of Classes in the Software Network Related to the Bug Severity? Understanding and characterizing the relationships between network metrics and bug severity in open-source software projects is also a very interesting problem. Although the metric value is positively related to bug proneness, it is not necessarily related to bug severity, because a class may contain many bugs, none of which is serious. We conduct the next experiment to examine this research question with the severity ranks given in Table 1. The results in Figure 5 show that the relationship deviates from what was expected. CC remains stable in both projects, while the overall distribution trend first increases and then decreases. The top row depicts the severity for Tomcat and the bottom row for Ant.

Ranks 5 and 7 are two prominent breakpoints, marked by the dotted line in each subfigure; they refer to major and critical bugs, respectively. Bugs with rank 8 have a large metric value in TSN; in contrast, this kind of bug has a relatively low value in ASN. Blocker bugs should be avoided as much as possible in software engineering practice. Once this kind of bug occurs and is found, either its origin triggers modifications in many other parts, or the problem is only a small mistake that affects a vital node.

RQ3: If So, Which Metrics Are More Significant? Different metrics measure the importance of a node in a network from different aspects. According to the results obtained for RQ1 and RQ2, not all the metrics show the same expected relationship; some metrics may be more suitable for bug proneness prediction and others for severity prediction. In this section, a further analysis is conducted to examine how


Figure 3: Tomcat (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR. Legend: Classes, Bugs.

significant these metrics are and which metrics are better. Three typical correlation analysis methods are used to resolve this problem: Pearson, Kendall, and Spearman (significance level P < 0.01). Pearson is widely used in statistics to measure the degree of the relationship between linearly related variables, but both variables should be normally distributed before it is used. Kendall and Spearman are nonparametric tests that make no assumptions about the underlying distribution. The former behaves like a permutation test of sorts, whereas the latter is essentially based on the principle of least squares.
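To make the three coefficients concrete, the following sketch computes them between a group ID and the number of fixed classes per group. It is an illustration only: the toy counts are hypothetical, Spearman is computed as the Pearson coefficient of average ranks, and the Kendall variant shown is tau-a, which may differ from the variant used by the statistical package in the study.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return s / (n * (n - 1) / 2)


# Hypothetical counts of fixed classes in five groups ordered by metric value.
group_ids = [1, 2, 3, 4, 5]
fixed_counts = [40, 31, 25, 12, 14]
r_pearson = pearson(group_ids, fixed_counts)
r_spearman = spearman(group_ids, fixed_counts)
r_kendall = kendall_tau_a(group_ids, fixed_counts)
```

A negative coefficient here means that groups with a smaller ID (larger metric values) contain more fixed classes, which is how the negative signs in Table 3 are read.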

Table 3 gives the correlation coefficients between metric value and bug proneness for eight metrics. For Tomcat, the coefficient of outD is the smallest in magnitude at 0.552 and that of D is the largest at 0.967, whereas the CC and PR metrics are not correlated. Although the correlations with outD are the smallest under the three methods, the skewed distribution curve in Figure 3(f) shows that most of the bugs and fixed classes fall in the first two groups. On the other hand, because there is no skew for Ant, outD is more significant than the other metrics under all three methods, with a maximum of 0.952, whereas CC, EC, inD, PR, and HITS are all not significant. These differences suggest that the metrics should not be treated equally and should be used with caution; applying all metrics indiscriminately is likely to mislead interpretation. Consequently, BC and outD are the suitable metrics to represent bug proneness and severity, and both are used for the subsequent prediction.

RQ4: How Well Does the Approach to Predicting Bug Proneness/Severity with Significant Metrics Work? Lastly, once we have understood and characterized the relationships between network metrics and bug proneness or severity in the two open source projects, what are the effects, if any, on software


Figure 4: Ant (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR. Legend: Classes, Bugs.

Table 3: Bug proneness: the correlation coefficients with eight network metrics (significance level P < 0.01).

| Method | BC | CC | D | EC | inD | outD | PR | HITS |
|---|---|---|---|---|---|---|---|---|
| Pearson-TSN | −0.804 | - | −0.903 | −0.852 | −0.852 | −0.752 | −0.728 | −0.933 |
| Kendall-TSN | −0.764 | - | −0.899 | −0.689 | −0.689 | −0.552 | - | −0.733 |
| Spearman-TSN | −0.888 | - | −0.967 | −0.879 | −0.879 | −0.656 | - | −0.903 |
| Pearson-ASN | −0.891 | - | −0.895 | - | - | −0.915 | - | - |
| Kendall-ASN | −0.782 | - | −0.807 | - | - | −0.844 | - | - |
| Spearman-ASN | −0.909 | - | −0.916 | - | - | −0.952 | - | - |

A dash marks a metric with no significant correlation. In the original table, the bold value refers to the maximum correlation coefficient of each analysis method per project, obtained by comparing the eight metrics.

quality? Or, what are the benefits to software engineering practice?

The above experiments indicate that the BC and outD metrics represent bug proneness and severity remarkably well. To check whether this conclusion holds, we validate it by predicting bug proneness and severity on the test data. The test data contain 67 effective bugs and 87 related fixed classes for Tomcat, and 135 bugs and 149 fixed classes for Ant. For the purposes of the experiment, bug severity is divided into two categories: a slight category, in which the bug rank is less than 4, and a severe category for the rest. Table 4 shows the resulting predictive F-value from Top 50 to 200 for Tomcat and from Top 50 to 125 for Ant, using the selected metrics on the software networks. The top k is limited to 200 and 125 for these two cases so that the selected nodes come from the first group. The F-value is calculated by combining precision and recall as

$$F = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \qquad (7)$$
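Equation (7) can be evaluated directly from a predicted set and an actual set of classes. The sketch below is illustrative; the function name and the toy sets are hypothetical, and returning 0 when there is no overlap is an assumption made to avoid division by zero.

```python
from typing import Set


def f_value(predicted: Set[str], actual: Set[str]) -> float:
    """Harmonic mean of precision and recall, as in equation (7)."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0  # assumption: define F as 0 when nothing is predicted correctly
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four classes recommended, three classes actually fixed.
predicted = {"A", "B", "C", "D"}
actual = {"A", "B", "E"}
score = f_value(predicted, actual)  # precision = 2/4, recall = 2/3
```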

The BCD metric is a combination of BC and outD, defined as BCD = BC + outD, and is used for comparison with each of them. For Tomcat, the BCD metric performs better on fixed-class prediction than the others, with a maximum of 0.3316 when returning the top 100 nodes. When all the nodes in group ID = 1 are selected, the three metrics obtain the same accuracy, 0.2857. For the number of bugs, the maximum is 0.547, obtained with the outD metric when only the top 50 classes need to be checked. In this column, the outD and BCD metrics are better than the BC metric in all cases. The last column gives the results for bug severity. Here outD performs far better than the other metrics, with an F-value up to more than twice that of BC. The best case is 0.6667 for the top 50 nodes.

For Ant, the BCD metric performs better than the other two on fixed classes and bugs when returning the top 50 classes. Apart from this case, outD has the highest accuracy at top 100 and 125 and obtains the best result when returning the top 100 classes: the maxima for classes, bugs, and severity are 0.3534, 0.6383, and 0.4865, respectively. Throughout the prediction, outD is the most suitable metric for predicting bug proneness and bug severity in a directed class-level software network.
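The top-k selection behind these results can be sketched as a simple ranking. The code below is a hedged illustration: it applies the rule BCD = BC + outD literally, without any rescaling (the text does not describe one), and the function name, tie-breaking, and toy values are hypothetical.

```python
from typing import Dict, List


def top_k_by_bcd(bc: Dict[str, float], out_d: Dict[str, int], k: int) -> List[str]:
    """Return the k classes with the largest BCD = BC + outD."""
    bcd = {c: bc[c] + out_d[c] for c in bc}
    # Assumption: ties are broken by class name to keep the ranking deterministic.
    return sorted(bcd, key=lambda c: (-bcd[c], c))[:k]


# Hypothetical metric values for four classes.
bc = {"A": 0.30, "B": 0.05, "C": 0.10, "D": 0.00}
out_d = {"A": 2, "B": 5, "C": 2, "D": 1}
top2 = top_k_by_bcd(bc, out_d, k=2)
```

The returned list plays the role of the predicted set when the F-value is computed against the classes actually fixed in the test data.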

Finally, some attention is devoted to analyzing the average number of people involved and comments needed when developers resolve one kind of bug. In Figure 6, one observes that more serious bugs need more people to participate, and these people hold more discussions before the bugs are marked as fixed, which is consistent with common experience except for rank = 1. A reasonable explanation for this exception is that enhancement bugs have the lowest difficulty, so many developers are able to find the problem and offer their own suggestions or revisions.

6. Discussion

RQ1. From the perspective of node importance, the more important a node is, the greater its probability of being modified and the more bugs it is involved in during the software development process. The central classes (nodes) form the skeleton of the entire system, especially the nodes with large betweenness centrality (BC) and many outgoing links (outD). The former category of nodes plays an important role in bridging the different modules to ensure the normal execution of the entire system. Such nodes are also named "brokers" in network science. The latter nodes are more complex because they aggregate behaviors from many other nodes. Both have implications for software testing. Nodes with high BC are important in many contexts and have significant external responsibility; bugs occurring in these nodes influence the reachability between modules. On the other hand, nodes with a high outD aggregate many behaviors of other nodes and have significant internal complexity; they are more likely to contain bugs and to be fixed. In this part, the PR and CC metrics are not significant. In summary, our results are consistent with the work in [34]: some metrics are related to bug proneness and others are not.

We conclude that the different importance metrics of a node play inconsistent roles in the analysis of bug proneness. It is clear that BC and outD are suitable for test prioritization when developers want to know which classes deserve more attention. In some special scenarios, node importance and out-degree can also be applied to bug localization, given the relationship between them.

RQ2, RQ3. The hypothesis that a class with a large metric value is more bug-prone and that its bugs are more serious is not always true. Nevertheless, Table 3 shows that the averages of the BC, D, and outD metrics have a significant correlation with software bug proneness. The most likely explanation for the strong correlations between some metrics and bug proneness is that, during the experiment, the classes were sorted in descending order and divided into groups with the same number of classes. This treatment also means that the average metric value of a preceding group is larger than that of the succeeding one, which corresponds to the top-k recommendation in the subsequent section. The results therefore show an overall trend between groups, which is why the correlations are stronger than those generally reported.

In Figure 5, major, regression, and critical bugs have a relatively high value in both projects. Moreover, this phenomenon is most obvious for major bugs. Additionally, bugs arising in a broker hinder the transfer of information between modules and the function scheduling of the entire system. As long as the modules cannot work together smoothly, the quality of the system cannot be guaranteed. A class with many outgoing links has more potential bugs, and the severity of those bugs is higher. Such classes are usually the central components of a system or the central nodes of a network.

Members of a project team are sometimes more concerned about whether they can find serious bugs. In other words, they hope to find more bugs, and more severe bugs, quickly and efficiently within a limited period of time. Our results provide an appropriate method to alleviate this challenge by means of the BC and outD metrics.

RQ4. By comparing eight network metrics, BC and outD are selected to predict bug proneness and severity. Overall, our results show a significant advantage of outD over the other metrics. It can be applied to identify which classes should be modified prior to others. The result also indicates that severe bugs usually occur in classes with strong internal complexity.

Based on our work, we obtain some meaningful answers to the four questions proposed in Section 3. However, there are still some potential limitations and threats to the validity of our work.

(i) All datasets used in our experiments are collected from the open source projects Apache Tomcat 7 and Ant. We know that there are many other available


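Table 4: F-value: fixed classes, bugs, and bug severity.

| Project | Top k | Classes BC | Classes outD | Classes BCD | Bugs BC | Bugs outD | Bugs BCD | Severity BC | Severity outD | Severity BCD |
|---|---|---|---|---|---|---|---|---|---|---|
| Tomcat | Top 50 | 0.2043 | 0.2920 | 0.2628 | 0.3760 | 0.5470 | 0.4786 | 0.2917 | 0.6667 | 0.4583 |
| Tomcat | Top 100 | 0.2567 | 0.3102 | 0.3316 | 0.4072 | 0.4431 | 0.4551 | 0.3288 | 0.5069 | 0.3836 |
| Tomcat | Top 150 | 0.2616 | 0.2954 | 0.2954 | 0.3594 | 0.3779 | 0.3687 | 0.2857 | 0.4184 | 0.2959 |
| Tomcat | Top 200 | 0.2857 | 0.2857 | 0.2857 | 0.3071 | 0.3071 | 0.3521 | 0.2439 | 0.3740 | 0.2846 |
| Ant | Top 50 | 0.1307 | 0.2111 | 0.2312 | 0.1946 | 0.4221 | 0.4322 | 0.1481 | 0.2613 | 0.2613 |
| Ant | Top 100 | 0.2651 | 0.3534 | 0.3213 | 0.3489 | 0.6383 | 0.5957 | 0.2270 | 0.4865 | 0.4649 |
| Ant | Top 125 | 0.3139 | 0.3358 | 0.3285 | 0.4385 | 0.5923 | 0.5846 | 0.3143 | 0.4381 | 0.4286 |

In the original table, the bold value refers to the best F-value identified by comparing the three metrics (BC, outD, and BCD) at the top K when predicting fixed classes, bugs, and bug severity, respectively.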

Figure 5: Bug severity (top row: Tomcat; bottom row: Ant; the x-axis is bug rank and the y-axis is the mean of the metric). Panels: (a) to (f). Legends: PR, HITS, BC; inD, outD, D; CC, EC − (E − 01).

software repositories that are helpful for bug analysis. In future work, we would like to improve our approach by combining more software resources, such as mailing-list and forum messages, to enrich the findings.

(ii) A limitation of our work is that we chose projects written in Java, because the tool used to construct the software networks can only deal with Java source code. Whether our conclusions apply to projects written in other languages still needs to be evaluated.

(iii) Theoretically, software networks evolve over time; therefore, differences between multiple versions are inevitable. In this paper, we use a stable version of each project as the basis for constructing the software network instead of considering the structural differences between versions. Although such treatment is rough, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple granularities, which yields a multi-granularity software network. We investigate bug proneness and severity through network metrics at the class level. Whether the results would be more


Figure 6: Human factor (the x-axis is bug rank). Panels: (a) Comments, (b) Persons. Legend: Tomcat, Ant.

promising at other granularities is an attractive topic.

7. Conclusion

We constructed class-level software networks and applied nine representative and commonly used network metrics to bug prediction. An empirical study was conducted on the open-source projects Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from February 2002 to November 2012. Our analysis demonstrated that there is a statistically significant relation between a class's importance and the probability that the class will be modified and buggy. A class's importance is also related to the severity of the bugs raised. The results also showed that, with only about ten percent of the effort, the accuracy of our prediction for bug proneness and bug severity can reach 54.7% and 66.7% (Tomcat, top 50 classes) and 63.8% and 48.7% (Ant, top 100 classes), respectively.

We expect that our findings are insightful and can be used to support the design and development of software, helping engineers assess the risk of adding or dropping a feature given the existing dependencies between classes. We also believe that our approach can be leveraged for bug prediction and test prioritization in other open source software. Finally, our findings also provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

Future work will mainly focus on two aspects. On the one hand, we will collect more open source projects (e.g., Eclipse, Mozilla, or projects hosted on SourceForge) to validate the generality of our approach. On the other hand, we will further take human factors into account, because software development is a process of human participation. Exploring the impact of sociotechnical congruence on bug prediction is urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), the National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), the Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and the Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J. Anvik, "Automating bug report assignment," in Proceedings of the 28th International Conference on Software Engineering 2006 (ICSE '06), pp. 937–940, May 2006.

[2] D. Radjenović, M. Heričko, R. Torkar, and A. Živkovič, "Software fault prediction metrics: a systematic literature review," Information and Software Technology, vol. 55, pp. 1397–1418, 2013.

[3] T. Zimmermann and N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering 2008 (ICSE '08), pp. 531–540, May 2008.

[4] A. Meneely, L. Williams, W. Snipes, and J. Osborne, "Predicting failures with developer networks and social network analysis," in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 13–23, November 2008.

[5] K. Crowston, K. Wei, Q. Li, and J. Howison, "Core and periphery in Free/Libre and Open Source software team communications," in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, "An empirical study on the relation between dependency neighborhoods and failures," in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST '11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Solé, "Scale-free networks from optimal design," Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, "Software systems as complex networks: structure, function, and evolvability of software collaboration graphs," Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-laws in a large object-oriented software system," IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, "Power laws in software," ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, "Advances in the studies on complex networks of software systems," Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, "Software complexity metrics based on complex networks," Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, "Multi-granularity evolution analysis of software using complex network theory," Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, "Node centrality in weighted networks: generalizing degree and shortest paths," Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, "Categorizing bugs with social networks: a case study on four open source software communities," in Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 1032–1041, 2013.

[16] M. S. Zanetti and F. Schweitzer, "A network perspective on software modularity," in Proceedings of the International Workshop on Complex Sciences in the Engineering of Computing Systems, pp. 175–186, 2012.

[17] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, "PageRank, HITS and a unified framework for link analysis," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–354, 2002.

[18] R. Premraj and K. Herzig, "Network versus code metrics to predict defects: a replication study," in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM '11), pp. 215–224, September 2011.

[19] M. Pinzger, N. Nagappan, and B. Murphy, "Can developer-module networks predict failures?" in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 2–12, November 2008.

[20] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, "Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities," IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, 2011.

[21] A. Sureka, A. Goyal, and A. Rastogi, "Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 195–204, February 2011.

[22] F. Perin, L. Renggli, and J. Ressia, "Ranking software artifacts," in Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (ICSM '10), pp. 1–4, 2010.

[23] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, "Graph-based analysis and prediction for software evolution," in Proceedings of the International Conference on Software Engineering (ICSE '12), pp. 419–429, 2012.

[24] D. Steidl, B. Hummel, and E. Juergens, "Using network analysis for recommendation of central software classes," in Proceedings of the 19th Working Conference on Reverse Engineering, pp. 93–102, 2012.

[25] T. Wolf, A. Schröter, D. Damian, and T. Nguyen, "Predicting build failures using social network analysis," in Proceedings of the International Conference on Software Engineering (ICSE '09), pp. 1–11, May 2009.

[26] M. E. J. Newman, Networks, Oxford University Press, Oxford, UK, 2010.

[27] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

[28] L. C. Freeman, "Centrality in social networks: conceptual clarification," Social Networks, vol. 1, no. 3, pp. 215–239, 1978.

[29] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, vol. 69, no. 2, Article ID 026113, 15 pages, 2004.

[30] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, "The missing links: bugs and bug-fix commits," in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE '10), pp. 97–106, November 2010.

[31] R. Wu, H. Zhang, S. Kim, and S. C. Cheung, "ReLink: recovering links between bugs and changes," in Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering (SIGSOFT/FSE '11), pp. 15–25, September 2011.

[32] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y. G. Guéhéneuc, "Is it a bug or an enhancement? A text-based approach to classify change requests," in Proceedings of the Conference of the Center for Advanced Studies (CASCON '08), ACM, October 2008.

[33] K. Herzig, S. Just, and A. Zeller, "It's not a bug, it's a feature: how misclassification impacts bug prediction," in Proceedings of the 35th International Conference on Software Engineering, pp. 392–401, 2013.

[34] R. Tonelli, G. Concas, M. Marchesi, and A. Murgia, "An analysis of SNA metrics on the Java Qualitas Corpus," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 205–213, February 2011.


be the adjacency matrix: $a_{ij} = 1$ if vertex $i$ is linked to vertex $j$, and $a_{ij} = 0$ otherwise. The centrality score of vertex $i$ can be defined as

$$x_i = \frac{1}{\lambda} \sum_{j \in M(i)} x_j = \frac{1}{\lambda} \sum_{j \in G} a_{ij} x_j, \qquad (1)$$

where $M(i)$ is the set of neighbors of $i$ and $\lambda$ is a constant. Only the greatest eigenvalue results in the desired centrality measure; the $i$th component of the corresponding eigenvector then gives the centrality score of vertex $i$ in the network [26].
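Equation (1) is an eigenvector equation, so the scores can be approximated by power iteration. The sketch below is a minimal illustration for an undirected graph, not the tool used in the study: the function name, the adjacency-dictionary representation, and the toy graph are assumptions, and the iteration uses A + I rather than A, which leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs.

```python
import math
from typing import Dict, List


def eigenvector_centrality(adj: Dict[str, List[str]],
                           iters: int = 200,
                           tol: float = 1e-10) -> Dict[str, float]:
    """Approximate the leading eigenvector of the adjacency matrix."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        # One multiplication by (A + I): own score plus the neighbors' scores.
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {v: val / norm for v, val in nxt.items()}
        if max(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical undirected graph: a triangle A-B-C with a pendant node D on C.
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
scores = eigenvector_centrality(adj)
```

In this toy graph C obtains the highest score and D the lowest, since D is connected only through C.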

3.2.2. Betweenness Centrality (BC). It is based on the total number of shortest paths between all possible pairs of nodes that pass through a node [27]. It quantifies how frequently a node acts as a bridge along the shortest path between two other nodes. If $g_i^{(st)}$ is the number of shortest paths between nodes $s$ and $t$ that pass through node $i$, $n^{(st)}$ is the total number of shortest paths between $s$ and $t$, and $n$ is the total number of nodes, then

$$\mathrm{BC}_i = \frac{\sum_{s<t} g_i^{(st)} / n^{(st)}}{(n-1)(n-2)}. \qquad (2)$$
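Equation (2) can be evaluated by counting shortest paths with breadth-first search. The following sketch is a straightforward (not optimized) illustration under assumptions: the graph is treated as undirected and unweighted, it has at least three nodes, and the function names and the toy star graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def bfs_counts(adj: Dict[str, List[str]], source: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return hop distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]  # every shortest path to v extends to w
    return dist, sigma


def betweenness(adj: Dict[str, List[str]]) -> Dict[str, float]:
    """Betweenness centrality normalized by (n - 1)(n - 2), as in equation (2)."""
    nodes = sorted(adj)
    n = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    bc = {v: 0.0 for v in nodes}
    for a in range(n):
        for b in range(a + 1, n):  # each unordered pair s < t once
            s, t = nodes[a], nodes[b]
            dist_s, sigma_s = info[s]
            if t not in dist_s:
                continue  # s and t are disconnected
            for i in nodes:
                if i in (s, t) or i not in dist_s:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest s-t path iff the distances add up.
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    bc[i] += sigma_s[i] * sigma_i[t] / sigma_s[t]
    scale = (n - 1) * (n - 2)
    return {v: val / scale for v, val in bc.items()}


# Hypothetical star graph: X bridges the leaves A, B, and C.
star = {"X": ["A", "B", "C"], "A": ["X"], "B": ["X"], "C": ["X"]}
bc_scores = betweenness(star)
```

For larger networks, Brandes' algorithm computes the same quantity far more efficiently; the pairwise loop here is kept only because it mirrors the formula term by term.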

3.2.3. Closeness Centrality (CC). It concerns the farness of a node from all other nodes and is thus defined as the inverse of the sum of the distances to all other nodes [28]. If $d(i, j)$ is the distance between nodes $i$ and $j$, the centrality of $i$ is

$$\mathrm{CC}_i = \frac{1}{\sum_{j=1}^{V} d(i, j)}. \qquad (3)$$
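Equation (3) needs only the hop distances from one node, which breadth-first search provides. This is a minimal sketch under assumptions: the graph is unweighted, unreachable nodes are simply ignored, and the function name and the toy path graph are hypothetical.

```python
from collections import deque
from typing import Dict, List


def closeness(adj: Dict[str, List[str]], i: str) -> float:
    """Inverse of the sum of shortest-path distances from node i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    # Assumption: an isolated node gets closeness 0 instead of dividing by zero.
    return 1.0 / total if total > 0 else 0.0


# Hypothetical path graph A - B - C.
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
cc_middle = closeness(path_graph, "B")  # distances 1 and 1
cc_end = closeness(path_graph, "A")     # distances 1 and 2
```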

3.2.4. PageRank (PR). It is a variant of the eigenvector centrality measure: a probability distribution representing the likelihood that a person randomly clicking on links will arrive at a particular page. The rank value indicates the importance of a page and is used here to denote the importance of a class:

$$\mathrm{PR}_i = \frac{1 - q}{N} + q \sum_{j \in V(i)} \frac{\mathrm{PR}_j}{\mathrm{outD}(j)}, \qquad (4)$$

where $q$ is a damping factor ($q = 0.85$), $V(i)$ is the set of nodes that link to $i$, $\mathrm{outD}(j)$ is the number of outgoing edges of node $j$, and $N$ is the total number of nodes in the network.
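Equation (4) can be solved by repeated substitution. The sketch below is illustrative only: the function name and the toy dependency graph are hypothetical, the number of iterations is fixed rather than checked for convergence, and every node is assumed to have at least one outgoing edge (nodes without outgoing edges would need extra handling that the formula does not specify).

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], q: float = 0.85,
             iters: int = 100) -> Dict[str, float]:
    """Iterate PR_i = (1 - q)/N + q * sum_{j -> i} PR_j / outD(j)."""
    nodes = list(out_links)
    n = len(nodes)
    in_links: Dict[str, List[str]] = {v: [] for v in nodes}
    for u, targets in out_links.items():
        for v in targets:
            in_links[v].append(u)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        pr = {
            v: (1 - q) / n + q * sum(pr[u] / len(out_links[u]) for u in in_links[v])
            for v in nodes
        }
    return pr


# Hypothetical directed graph: a cycle A -> B -> C -> A, plus D -> A.
graph = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
pr_scores = pagerank(graph)
```

Node D has no incoming edge, so it keeps only the baseline share (1 − q)/N, while A collects rank from both C and D.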

3.2.5. HITS. It was also originally designed to rank web pages and calculates two scores: the authority score and the hub score. The authority score of a page is computed as the sum of the scaled hub scores of the pages that point to it. The hub score is the sum of the scaled authority scores of the pages it points to. Here, the authority score is used as the experimental metric.
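The mutual definition of hub and authority scores can be sketched as an alternating update. The code below is a hedged illustration: the function name, the Euclidean normalization used as the scaling, the fixed iteration count, and the toy graph are assumptions rather than details given in the text.

```python
import math
from typing import Dict, List, Tuple


def hits(out_links: Dict[str, List[str]],
         iters: int = 100) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores after a fixed number of updates."""
    nodes = list(out_links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Authority: sum of the hub scores of the nodes pointing to v.
        auth = {v: 0.0 for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub: sum of the authority scores of the nodes u points to.
        hub = {u: sum(auth[v] for v in out_links[u]) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


# Hypothetical dependencies: A -> C, B -> C, B -> D.
deps = {"A": ["C"], "B": ["C", "D"], "C": [], "D": []}
hub_scores, auth_scores = hits(deps)
```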

3.2.6. Degree (D). In a directed network, it is meaningful to distinguish in-degree (the number of incoming links) from out-degree (the number of outgoing links). For software networks, in-degree represents code reuse and out-degree represents design complexity. Nodes with large in-degree are heavily reused, and nodes with large out-degree are more complex in some contexts. In this paper, we use inD(i), outD(i), and D to denote the in-degree, out-degree, and total degree of a node, respectively.
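The three degree metrics follow directly from a list of directed dependency edges. The sketch below is illustrative; the function name and the class names in the edge list are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def degrees(edges: List[Tuple[str, str]]) -> Dict[str, Tuple[int, int, int]]:
    """Map each node to (inD, outD, D) for a list of (source, target) edges."""
    out_d = Counter(src for src, _ in edges)
    in_d = Counter(dst for _, dst in edges)
    nodes = set(out_d) | set(in_d)
    return {v: (in_d[v], out_d[v], in_d[v] + out_d[v]) for v in nodes}


# Hypothetical class-level dependencies (source depends on target).
edges = [("Servlet", "Request"), ("Servlet", "Response"), ("Filter", "Request")]
deg = degrees(edges)
```

Here Request is reused by two classes (inD = 2), whereas Servlet depends on two classes (outD = 2).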

3.2.7. Modularity Ratio (MR). Modularity is a metric proposed by Newman and Girvan [29] to evaluate the quality of a partition of a network into communities. A system consists of many packages, and each package has a large number of classes or subpackages. For a given definition of modules or clusters and their underlying network structure, the degree of modularity is defined by

$$Q = \frac{\sum_{i}^{n} e_{ii} - \sum_{i}^{n} a_i b_i}{1 - \sum_{i}^{n} a_i b_i}, \qquad (5)$$

where $e_{ij}$ is the fraction of all edges that link nodes in module $i$ to nodes in module $j$, $a_i = \sum_{j}^{n} e_{ij}$ and $b_i = \sum_{j}^{n} e_{ji}$ (the sums of row $i$ and column $i$, respectively), and $n$ is the total number of existing modules. According to this equation, we define the MR of module $i$ as follows:

$$\mathrm{MR}_i = e_{ii} - a_i b_i. \qquad (6)$$
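Equation (6) can be computed from the edge list once every class is assigned to a module. The following is a minimal sketch under assumptions: the function name, the module assignment, and the toy edges are hypothetical, and each directed edge is given equal weight.

```python
from typing import Dict, List, Tuple


def modularity_ratio(edges: List[Tuple[str, str]],
                     module_of: Dict[str, str]) -> Dict[str, float]:
    """Compute MR_i = e_ii - a_i * b_i for every module i."""
    modules = sorted(set(module_of.values()))
    m = len(edges)
    # e[i][j]: fraction of all edges going from module i to module j.
    e = {a: {b: 0.0 for b in modules} for a in modules}
    for src, dst in edges:
        e[module_of[src]][module_of[dst]] += 1.0 / m
    mr = {}
    for i in modules:
        a_i = sum(e[i][j] for j in modules)  # fraction of edges leaving module i
        b_i = sum(e[j][i] for j in modules)  # fraction of edges entering module i
        mr[i] = e[i][i] - a_i * b_i
    return mr


# Hypothetical example: two modules with four directed edges.
module_of = {"a1": "M1", "a2": "M1", "b1": "M2", "b2": "M2"}
module_edges = [("a1", "a2"), ("a2", "a1"), ("b1", "b2"), ("a1", "b1")]
mr = modularity_ratio(module_edges, module_of)
```

For M1, e_11 = 0.5, a_1 = 0.75, and b_1 = 0.5, so MR = 0.5 − 0.375 = 0.125.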

3.3. Network Metrics for Application. Before presenting the research questions and the details of our approach, we define two terms used henceforth in this paper, which indicate the practical application of our work.

(i) Bug proneness Bug proneness is treated as a qualitymeasure of software entries in this paper An intuitiveunderstanding is that it represents the probability thatan entry (ie class) will become buggy in the processof debuggingThe higher the bug proneness of a classis the more likely it will be given priority to testingFor our purpose we valuate it via the number of bugsand fixed classes

(ii) Bug severity This terminology describes the impactof a bug on the entire system Bug tracking systemsclassify the severity into several levels such as criticalmajor minor and trivial However severity andpriority is not the same concept In some cases a bugof critical severity but the priority to resolve it is lowas this will happen rarely in real scenario Here wewould like to investigate that the importance of a classin the software network is related to the severity of bugcaused by it

With these metrics and application scenes, the following four research questions are proposed to guide the design of the case study. Questions 1–3 investigate the properties of software networks, whereas question 4 predicts bug proneness and severity with the significant metrics.

RQ1: Is the position of classes in the software network related to the bug proneness?

RQ2: Is the position of classes in the software network related to the bug severity?

RQ3: If so, which metrics are more significant?


RQ4: How well does the approach to predicting bug proneness/severity with significant metrics work?

4. Approach

In our approach, we mainly consider two open source projects of the Apache Software Foundation (ASF), Tomcat (http://tomcat.apache.org) and Ant (http://ant.apache.org), both written in Java, as our research subjects. The rationale is threefold: (1) They are well-known and stable projects; each has undergone a number of major release cycles and is still under active development. (2) Their source code revision archives, SVN commit logs dating back several years, and bug reports are available, which offers a good opportunity to gain experience with open source projects. (3) The choice of the Java programming language is dictated by the tools developed to construct software networks, and we are interested in understanding open source software written in Java.

We examined the information provided by the bug tracking systems (Bugzilla and JIRA) and SVN commits and found that most bug reports identify the affected classes in detail, and some even include the modified fragments of source code. This information determines the feasibility of our approach. Besides, the dependencies between classes, which are directed, should not be ignored. Therefore, we decided to abstract directed but unweighted software networks at the class level.

Let us assume that $G = (V, E)$ is a software network, where $V = \{v_i\}$ is the set of all classes and $E = \{e_{ij}\}$ is the set of all dependencies. We distinguish between different kinds of dependencies as follows:

(i) $v_i$ implementing/extending the interface/class $v_j$ is an inheritance dependency.

(ii) $v_i$ having a field of type $v_j$ is a field dependency.

(iii) $v_i$ calling a method of $v_j$ is a method dependency.

(iv) A method of $v_i$ returning an object of type $v_j$ is a return dependency.

(v) A method of $v_i$ taking an object of $v_j$ as a parameter is a parameter dependency.
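The five dependency kinds above collapse into a single directed, unweighted edge set. The following sketch shows one way to do this; the triple-based input layout, the function name, and the decision to skip self-dependencies are assumptions of the example rather than details given in the text.

```python
DEPENDENCY_KINDS = {"inheritance", "field", "method", "return", "parameter"}

def build_network(dependencies):
    """Collapse typed class dependencies into a directed, unweighted network.

    dependencies: iterable of (source_class, target_class, kind) triples.
    Returns (nodes, edges), where edges is a set of (source, target) pairs.
    """
    nodes, edges = set(), set()
    for src, dst, kind in dependencies:
        if kind not in DEPENDENCY_KINDS:
            raise ValueError("unknown dependency kind: %r" % (kind,))
        nodes.update((src, dst))
        if src != dst:  # assumption: self-dependencies are ignored
            edges.add((src, dst))  # several kinds between a pair give one edge
    return nodes, edges
```

Because the edges are stored in a set, a class that both calls a method of another class and holds a field of its type still contributes a single link.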

A bug report also offers many fields to help developers understand the issue, one of which is severity, assigned by administrators based on how severely a bug affects the program. Table 1 shows the eight levels of bug severity and their ranks in Bugzilla. An objective for software providers is not only to minimize the total number of bugs but also to ensure that the bugs' severity is as low as possible [23]. The approach mainly consists of four phases: (1) compiling the source code files to extract the class-level directed software networks regarding the dependency requirements; (2) exporting SVN commits and integrating them with bug reports to obtain the necessary bug-class association relationships; (3) calculating a set of network metrics in the established software networks and then acquiring their relationships with the number of bugs and bug severity; and (4) finally applying the significant metrics to the analysis of bug prediction. The framework is shown in Figure 2.

Table 1: Levels of bug severity and their ranks in the Bugzilla bug tracking system.

Bug severity    Rank
Blocker         8
Critical        7
Regression      6
Major           5
Normal          4
Minor           3
Trivial         2
Enhancement     1

Figure 2: The framework of our approach. [Diagram labels: Source code, Changelog, Bugzilla, Software network, Bug-class, Bug proneness, Bug severity; metrics EC, BC, CC, PR, HITS, inD, outD, D, and MR; phases (1) to (4).]

5. Case Study

This section presents the subject projects used for our study and the relevant data we collected. With the experimental results, the proposed questions are answered in turn.

5.1. Data. For our analysis, we collect several types of data. We gather the open source software data from source code repositories and bug databases and determine the rank of each bug. In this paper, we choose Tomcat 7 and Ant 1.8.4 as our experimental subjects. As successful open source projects, both Tomcat and Ant maintain publicly available source code. To construct the expected software network, the source code is compiled to an XML file, and the software network is then established by parsing the XML file. In the subsequent part, we abbreviate the Tomcat software network as TSN and the Ant software network as ASN. There are 35 versions of Tomcat 7. The latest version is still being updated, so only the other 34 versions are adopted in our work. There are few differences in the number of nodes/edges between versions. A likely explanation for this is that Tomcat is a relatively mature open source software project with a decade of development history. Of course, not all projects show this phenomenon; some projects are known to have nearly superlinear growth [13]. Finally, the stable Tomcat 7.0.29 version used for analysis consists of 2015 nodes, 9573 edges, and 19 communities; the Ant network consists of 1345 nodes, 3937 edges, and 20 communities.

Table 2: The statistics: the number of bugs and fixed classes, and the nodes/edges of the software networks.

                Tomcat      Ant
Bugs            475         480
Classes         422 (21%)   389 (29%)
SN nodes        2015        1345
SN edges        9573        3937
Communities     19          20

The bug reports are divided into two sets, one used for the experiment and the other for prediction. That is, the first thirty versions of Tomcat are utilized for phases (2) and (3), and the last four versions for prediction. Because the Ant data refer to only one version, eighty percent of the Ant data is used for the experiment and the remaining twenty percent for prediction. Note that studies of software defects rely on links between bug databases and commit changes, and this information plays an important role in measuring quality and predicting defects. Some prior studies suggested that there are missing links between bugs and bug-fix commit logs. They proposed automatic approaches to recover missing links [30, 31] and found that the missing links lead to biased defect information, which affects prediction performance. However, some authors argued that a reported bug may not be a bug but a feature or an enhancement [32, 33]. This misclassification also introduces bias in bug prediction models. When missing links are considered, the number of false positives in bug classification increases. Hence, only issues consistently reported as bugs in the bug tracker are taken into account. Once effective bugs are identified, we check the change logs by searching for the bug ID and calculating the similarity of the summary text. The reason for using heuristics is that most of the missing links in our datasets are due to misclassification. The statistics of the experimental data are shown in Table 2. Some test classes are not taken into account because they are duplicates. In bug 53062, for example, both class org.apache.catalina.connector.TestResponse and org.apache.catalina.connector.Response appear, but they represent the same object in our study. Note that less than thirty percent of the classes are involved, which confirms that most of the bugs occur in a small number of classes. Therefore, what is needed is guided detection in a system.
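The two linking heuristics mentioned above (searching for the bug ID and comparing summary text) can be sketched as follows. This is only an illustration under stated assumptions: the commit layout, the function name, the 0.6 similarity threshold, the use of `difflib` as the similarity measure, and the choice to accept a commit when either heuristic fires are all hypothetical and are not specified in the text.

```python
import re
from difflib import SequenceMatcher

def link_bug_to_commits(bug_id, summary, commits, threshold=0.6):
    """Return the commits that appear to fix the given bug.

    commits: list of dicts with a "message" key (hypothetical layout).
    A commit is linked if its message mentions the bug ID as a whole number
    or if the message is textually similar to the bug summary.
    """
    # Match the ID only when it is not embedded in a longer number.
    id_pattern = re.compile(r"(?<!\d)" + re.escape(str(bug_id)) + r"(?!\d)")
    linked = []
    for commit in commits:
        message = commit["message"]
        similarity = SequenceMatcher(
            None, summary.lower(), message.lower()).ratio()
        if id_pattern.search(message) or similarity >= threshold:
            linked.append(commit)
    return linked
```

The classes touched by the linked commits would then form the bug-class association for that bug.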

5.2. Results

RQ1: Is the Position of Classes in the Software Network Related to the Bug Proneness? To answer this research question, the nodes are first sorted by metric value in descending order and divided into groups, each containing the same number of nodes. The reason for doing this is to facilitate the exploration of relations between the metrics and bugs. Because the last 15 nodes in TSN are never fixed, they are excluded so that each group contains 200 nodes. In ASN, the nodes are divided evenly into eleven groups. The number of fixed classes and involved bugs in each group is recorded. With this information, RQ1 is not hard to answer.
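The grouping procedure can be sketched as below. The function and parameter names are hypothetical, and dropping the incomplete tail of the ranking is a simplification assumed for this example (in the study, the excluded nodes were those that were never fixed).

```python
def group_counts(metric, fixed_classes, bugs_of, group_size):
    """Rank classes by metric value (descending), cut the ranking into
    equal-sized groups, and count fixed classes and distinct bugs per group.

    metric: {class_name: metric_value}
    fixed_classes: set of class names that were fixed at least once
    bugs_of: {class_name: iterable of bug IDs involving that class}
    Returns a list of (group_id, fixed_class_count, bug_count) tuples.
    """
    ranked = sorted(metric, key=lambda c: (-metric[c], c))
    usable = len(ranked) - len(ranked) % group_size  # drop the incomplete tail
    rows = []
    for start in range(0, usable, group_size):
        group = ranked[start:start + group_size]
        fixed = [c for c in group if c in fixed_classes]
        bugs = set()
        for c in fixed:
            bugs.update(bugs_of.get(c, ()))
        rows.append((start // group_size + 1, len(fixed), len(bugs)))
    return rows
```

Plotting the second and third elements of each tuple against the group ID gives curves of the kind described for each metric.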

The number of fixed classes and involved bugs for the different metrics in each group are illustrated in Figures 3 and 4. In Figure 3, EC, inD, D, and HITS have a significantly negative correlation with the group ID (the bigger the average metric value of a class, the more bug-prone it is), while BC and outD show a skewed distribution; see Figures 3(a), 3(b), 3(e), 3(f), 3(g), and 3(h). There are some differences in Figure 4: BC, outD, and D show a significant correlation but do not show an obviously skewed distribution; see Figures 4(b), 4(f), and 4(g). The results of the other metrics fluctuate, especially in groups six, eight, and nine.

MR shows the same relationship with fixed classes and bugs. The differences from the above metrics are that the distribution of bugs is higher than that of classes and fluctuates more. In subfigure (i), the x-axis is the rank of the module (community): the larger the modularity ratio, the higher its rank. An overall positive trend is shown: the larger the metric value of a class, the more bug-prone it is and the more likely it is to be modified, especially for the BC, outD, and D metrics.

RQ2: Is the Position of Classes in the Software Network Related to the Bug Severity? Understanding and characterizing the relationships between network metrics and bug severity in open source software projects is also a very interesting problem. Although the metric value is positively related to bug proneness, it may not be related to bug severity, because a class may contain many bugs none of which is serious. We conduct the next experiment to examine this research question with the severity ranks given in Table 1. The results in Figure 5 show that the relationship deviates from what was expected. CC remains stable in both projects, and the overall distribution first increases and then decreases. The top row depicts the severity for Tomcat and the bottom row that for Ant.

Ranks 5 and 7 are two prominent breakpoints, marked by the dotted line in each subfigure; they refer to the major and critical bugs, respectively. Bugs with rank 8 have a large metric value in TSN; on the contrary, this kind of bug has a relatively low value in ASN. Blocker bugs should be avoided as much as possible in software engineering practices. When this kind of bug occurs, either its origin triggers modifications in many other parts, or the problem is only a small mistake that affects a vital node.

Figure 3: Tomcat (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR; each panel plots two series, classes and bugs.

RQ3: If So, Which Metrics Are More Significant? Different metrics measure the importance of a node in a network from different aspects. According to the results obtained for RQ1 and RQ2, we know that not all the metrics have the same expected relationship; some metrics may be more suitable for bug proneness prediction and others for severity prediction. In this section, further work is conducted to analyze how significant these metrics are and which metrics are better. Three typical correlation analysis methods are used to resolve this problem: Pearson, Kendall, and Spearman (Sig. P < 0.01). Pearson is widely used in statistics to measure the degree of the relationship between linearly related variables, but both variables should be normally distributed before using it. Kendall and Spearman are nonparametric tests that do not make any assumptions about the underlying distribution. The former basically gives the impression of being a permutation test of sorts, whereas the latter is essentially based on the principle of least squares.
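For readers who want to reproduce this kind of analysis on per-group counts, the three coefficients can be sketched in plain Python. The function names are hypothetical, the Kendall variant shown is tau-a (no tie correction), and significance testing is omitted; in practice a statistics package would normally be used instead.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)
```

Here `x` would be the group IDs and `y` the number of fixed classes or bugs per group, so a strongly negative coefficient means that groups with larger metric values contain more bugs.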

Table 3 gives the correlation coefficients between metric value and bug proneness for eight metrics. For Tomcat, the coefficient of outD is the minimum at 0.552 and that of D is the maximum at 0.967 (in absolute value), whereas CC shows no significant correlation and PR is significant only under Pearson. Although the correlations with outD are among the weakest under all three methods, the skewed distribution curve in Figure 3(f) indicates that most of the bugs and fixed classes fall in the first two groups. On the other hand, because there is no skew phenomenon for Ant, outD is more significant than the other metrics under all three methods, whereas CC, EC, inD, PR, and HITS are all not significant. The maximum for outD is up to 0.952. The differences mentioned above suggest that these metrics should not be treated equally and should be used with caution. Applying all metrics indiscriminately is likely to lead to misleading interpretations. Consequently, BC and outD are the suitable metrics to represent bug proneness and severity. Both of them will be used for the subsequent prediction.

RQ4: How Well Does the Approach to Predicting Bug Proneness/Severity with Significant Metrics Work? Lastly, once we have understood and characterized the relationships between network metrics and bug proneness or severity in the two open source projects, what are the effects, if any, on software quality? And what are the benefits to software engineering practices?

Figure 4: Ant (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR; each panel plots two series, classes and bugs.

Table 3: Bug proneness: the correlation coefficient with eight network metrics (Sig. P < 0.01).

                BC       CC   D        EC       inD      outD     PR       HITS
Pearson-TSN     -0.804   -    -0.903   -0.852   -0.852   -0.752   -0.728   -0.933
Kendall-TSN     -0.764   -    -0.899   -0.689   -0.689   -0.552   -        -0.733
Spearman-TSN    -0.888   -    -0.967   -0.879   -0.879   -0.656   -        -0.903
Pearson-ASN     -0.891   -    -0.895   -        -        -0.915   -        -
Kendall-ASN     -0.782   -    -0.807   -        -        -0.844   -        -
Spearman-ASN    -0.909   -    -0.916   -        -        -0.952   -        -

In the original table, bold values mark the maximum correlation coefficient of the different analysis methods per project, obtained by comparing the eight metrics.

We have learned from the above experiments that the BC and outD metrics are well suited to represent bug proneness and severity. To check whether this conclusion holds, we validate it through bug proneness and severity prediction with the test data. There are 67 effective bugs in the test data and 87 fixed classes related to these bugs for Tomcat, and 135 bugs and 149 fixed classes for Ant. Given the experimental requirements, bug severity is divided into two categories: the slight category, in which a bug's rank is less than 4, and the severe category, which contains all other bugs. Table 4 shows the resulting predictive F-value from top 50 to top 200 for Tomcat and from top 50 to top 125 for Ant using the selected metrics on the software networks. The top k is limited to 200 and 125 for these two cases to ensure that the selected nodes come from the first group. The F-value is calculated by integrating precision and recall as

$$F = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{7}$$
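Equation (7) applied to a top-k recommendation can be sketched as follows. The helper names are hypothetical, and treating the prediction as a set of class names compared against the set of classes actually fixed in the test data is an assumption of this example; how the F-value is derived for bugs and severity is not detailed here.

```python
def top_k(metric, k):
    """Return the k classes with the largest metric values."""
    ranked = sorted(metric, key=lambda c: (-metric[c], c))
    return ranked[:k]

def f_value(predicted, actual):
    """F-value (Eq. 7) of a predicted set against the actual set."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)
```

A call such as `f_value(top_k(out_degree, 50), fixed_in_test_data)`, with both arguments supplied by the reader, would correspond to one cell of a top-50 evaluation.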

The BCD metric is a combination of BC and outD that is compared with each of them; it follows the rule BCD = BC + outD. For Tomcat, the BCD metric performs better on fixed classes prediction than the others, and the maximum is 0.3316 when returning the top 100 nodes. When all the nodes in group ID = 1 are selected, the three metrics obtain the same accuracy, 0.2857. When it comes to the number of bugs, the maximum is 0.547 when using the outD metric, and only the top 50 classes need to be checked. In this column, the outD and BCD metrics are at least as good as the BC metric in all cases. The last column gives the results for bug severity. The finding shows that outD performs far better than the other metrics, and at the top 50 its F-value is more than twice that of BC. The best case is 0.6667 at the top 50 nodes.

For Ant, the BCD metric performs better than the other two on fixed classes and bugs when returning the top 50 classes. Except in this case, outD has the highest accuracy at the top 100 and top 125 and obtains the best result when returning the top 100 classes: the maxima for classes, bugs, and severity are 0.3534, 0.6383, and 0.4865, respectively. Throughout the prediction, outD is the most suitable metric for predicting bug proneness and bug severity in a directed class-level software network.

Finally, we briefly analyze the average number of people and comments needed when developers resolve each kind of bug. In Figure 6, one observes that more serious bugs need more people to participate, and these people hold more discussions before the bugs are marked as fixed, which is consistent with common experience except for rank = 1. A reasonable explanation for this exception is that enhancement bugs have the lowest difficulty, so many developers are able to find the problem and offer their own suggestions or revisions.

6. Discussion

RQ1: From the perspective of node importance, the more important a node is, the greater its probability of being modified and the more bugs it is involved in during the software development process. The central classes (nodes) form the skeleton of the entire system, especially the nodes with large betweenness centrality (BC) and many outgoing links (outD). The former category of nodes plays an important role in bridging the different modules to ensure the normal execution of the entire system. Such nodes are also named "brokers" in network science. The latter nodes are more complex because they aggregate behaviors from many other nodes. These observations have implications for software testing. Nodes with high BC are more important in many contexts and have significant external responsibility. Bugs occurring in these nodes will influence the reachability between modules. On the other hand, nodes with a high outD aggregate many behaviors of other nodes and have significant internal complexity. They are more likely to contain bugs and to be fixed. In this part, the PR and CC metrics are not significant. In summary, our results are consistent with the work in [34]: some metrics are related to bug proneness and the others are not.

We conclude that the different importance metrics of a node play inconsistent roles in the analysis of bug proneness. It is clear that BC and outD are suitable for test prioritization when developers want to know which classes deserve more attention. In some special scenarios, for example, node importance and out-degree can also be applied to bug localization, given the relationship between them.

RQ2, RQ3: The hypothesis that a class with a large metric value is highly bug-prone and that the bugs in this class are also more serious is not always true. Nevertheless, Table 3 shows that the averages of the BC, D, and outD metrics have a significant correlation with software bug proneness. The most likely explanation for the strong correlations between some metrics and bug proneness is that, during the experiment, the classes were sorted in descending order and divided into groups containing the same number of classes. With this treatment, the average metric value of each group is larger than that of the succeeding one, which corresponds to the top-k recommendation in the subsequent section. Because the results capture an overall trend between groups, the correlations are stronger than those generally reported.

In Figure 5, major, regression, and critical bugs have a relatively high value in both projects. Moreover, this phenomenon is more obvious for major bugs. Additionally, bugs raised from a broker will hinder the transfer of information between modules and the function scheduling of the entire system. As long as the modules cannot work smoothly, the quality of the system cannot be guaranteed. A class with many outgoing links not only has more potential bugs, but the severity of those bugs is also higher. Such classes are usually the central components in a system or the central nodes in a network.

Members of a project team are sometimes more concerned about whether they can find serious bugs. In other words, they hope to find not only more bugs quickly and efficiently but also more severe bugs in a limited period of time. Our results provide an appropriate method to alleviate this challenge using the BC and outD metrics.

RQ4: By contrasting the eight network metrics, BC and outD are selected to predict bug proneness and severity. Overall, compared to the other metrics, outD shows a significant advantage in our results. It can be applied to identify which classes should be modified prior to others. The result also indicates that severe bugs are usually associated with strong internal complexity.

Table 4: F-value: fixed classes, bugs, and bug severity.

                    Classes                   Bugs                      Severity
                    BC      outD    BCD       BC      outD    BCD       BC      outD    BCD
Tomcat  Top 50      0.2043  0.2920  0.2628    0.3760  0.5470  0.4786    0.2917  0.6667  0.4583
Tomcat  Top 100     0.2567  0.3102  0.3316    0.4072  0.4431  0.4551    0.3288  0.5069  0.3836
Tomcat  Top 150     0.2616  0.2954  0.2954    0.3594  0.3779  0.3687    0.2857  0.4184  0.2959
Tomcat  Top 200     0.2857  0.2857  0.2857    0.3071  0.3071  0.3521    0.2439  0.3740  0.2846
Ant     Top 50      0.1307  0.2111  0.2312    0.1946  0.4221  0.4322    0.1481  0.2613  0.2613
Ant     Top 100     0.2651  0.3534  0.3213    0.3489  0.6383  0.5957    0.2270  0.4865  0.4649
Ant     Top 125     0.3139  0.3358  0.3285    0.4385  0.5923  0.5846    0.3143  0.4381  0.4286

In the original table, bold values mark the best F-value identified by comparing the three metrics (BC, outD, and BCD) at the top K of predicting fixed classes, bugs, and bug severity, respectively.

Figure 5: Bug severity (Tomcat: top row; Ant: bottom row; the x-axis is the bug rank and the y-axis is the mean of the metric). Panels (a) to (f); legend groups: PR, HITS, BC; inD, outD, D; CC, EC.

Based on our work, we obtain some meaningful answers to the four questions proposed in Section 3. However, there are still some potential limitations and threats to the validity of our work.

(i) All datasets used in our experiments are collected from the open source projects Apache Tomcat 7 and Ant. We know that there are many other available software repositories that are helpful for bug analysis. In future work, we would like to improve our approach by combining more software resources, such as mailing lists and forum messages, to enrich the findings.

(ii) A limitation of our work is that we chose projects written in Java because the tool used to construct the software network can only deal with Java source code. Whether our conclusions hold for projects written in other languages still needs to be evaluated.

(iii) Theoretically, software networks evolve over time; therefore, differences between multiple versions are inevitable. In this paper, we use a stable version of each project as the basis for constructing the software network instead of considering the structural differences between versions. Although such treatment is rough, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple granularities, that is, as a multi-granularity software network. We investigate bug proneness and severity through network metrics at the class level. Whether the results would be more promising at other granularities is an attractive topic for future work.

Figure 6: Human factor (the x-axis is the bug rank). Panels: (a) Comments, (b) Persons; each panel plots two series, Tomcat and Ant.

7. Conclusion

We constructed class-level software networks and introduced nine representative and commonly used network metrics to bug prediction. An empirical study has been conducted on the open source projects Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from February 2002 to November 2012. Our analysis demonstrated that there is a statistically significant relation between a class's importance and the probability that the class is modified and buggy. A class's importance is also related to the severity of the bugs raised. The results also showed that, with only about ten percent of the effort, the accuracy of our prediction for bug proneness and bug severity can reach 54.7% and 66.7% (Tomcat) and 63.8% and 48.7% (Ant), respectively, when returning the top 50 and top 100 classes in these two cases.

We expect that our findings are insightful and can be used to support the design and development of software, helping engineers assess the risk of adding or dropping a feature in light of the existing dependencies between classes. We also believe that our approach can be leveraged in the analysis of bug prediction and test prioritization for other open source software. Finally, our findings also provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

Future work will mainly focus on two aspects. On the one hand, we will collect more open source projects (e.g., Eclipse, Mozilla, or projects deployed on SourceForge) to validate the generality of our approach. On the other hand, we will further take human factors into account, because software development is a process of human participation. An exploration of the impact of sociotechnical congruence on bug prediction is urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J. Anvik, "Automating bug report assignment," in Proceedings of the 28th International Conference on Software Engineering 2006 (ICSE '06), pp. 937–940, May 2006.

[2] D. Radjenović, M. Heričko, R. Torkar, and A. Živkovič, "Software fault prediction metrics: a systematic literature review," Information and Software Technology, vol. 55, pp. 1397–1418, 2013.

[3] T. Zimmermann and N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering 2008 (ICSE '08), pp. 531–540, May 2008.

[4] A. Meneely, L. Williams, W. Snipes, and J. Osborne, "Predicting failures with developer networks and social network analysis," in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 13–23, November 2008.


[5] K. Crowston, K. Wei, Q. Li, and J. Howison, "Core and periphery in Free/Libre and Open Source software team communications," in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, "An empirical study on the relation between dependency neighborhoods and failures," in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST '11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Solé, "Scale-free networks from optimal design," Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, "Software systems as complex networks: structure, function, and evolvability of software collaboration graphs," Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-laws in a large object-oriented software system," IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, "Power laws in software," ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, "Advances in the studies on complex networks of software systems," Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, "Software complexity metrics based on complex networks," Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, "Multi-granularity evolution analysis of software using complex network theory," Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, "Node centrality in weighted networks: generalizing degree and shortest paths," Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, "Categorizing bugs with social networks: a case study on four open source software communities," in Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 1032–1041, 2013.

[16] M. S. Zanetti and F. Schweitzer, "A network perspective on software modularity," in Proceedings of the International Workshop on Complex Sciences in the Engineering of Computing Systems, pp. 175–186, 2012.

[17] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, "PageRank, HITS and a unified framework for link analysis," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–354, 2002.

[18] R. Premraj and K. Herzig, "Network versus code metrics to predict defects: a replication study," in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM '11), pp. 215–224, September 2011.

[19] M. Pinzger, N. Nagappan, and B. Murphy, "Can developer-module networks predict failures?" in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 2–12, November 2008.

[20] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, "Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities," IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, 2011.

[21] A. Sureka, A. Goyal, and A. Rastogi, "Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 195–204, February 2011.

[22] F. Perin, L. Renggli, and J. Ressia, "Ranking software artifacts," in Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (ICSM '10), pp. 1–4, 2010.

[23] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, "Graph-based analysis and prediction for software evolution," in Proceedings of the International Conference on Software Engineering (ICSE '12), pp. 419–429, 2012.

[24] D. Steidl, B. Hummel, and E. Juergens, "Using network analysis for recommendation of central software classes," in Proceedings of the 19th Working Conference on Reverse Engineering, pp. 93–102, 2012.

[25] T. Wolf, A. Schröter, D. Damian, and T. Nguyen, "Predicting build failures using social network analysis," in Proceedings of the International Conference on Software Engineering (ICSE '09), pp. 1–11, May 2009.

[26] M. E. J. Newman, Networks, Oxford University Press, Oxford, UK, 2010.

[27] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

[28] L. C. Freeman, "Centrality in social networks: conceptual clarification," Social Networks, vol. 1, no. 3, pp. 215–239, 1978.

[29] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, vol. 69, no. 2, Article ID 026113, 15 pages, 2004.

[30] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, "The missing links: bugs and bug-fix commits," in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE '10), pp. 97–106, November 2010.

[31] R. Wu, H. Zhang, S. Kim, and S. C. Cheung, "ReLink: recovering links between bugs and changes," in Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering (SIGSOFT/FSE '11), pp. 15–25, September 2011.

[32] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y. G. Guéhéneuc, "Is it a bug or an enhancement? A text-based approach to classify change requests," in Proceedings of the Conference of the Center for Advanced Studies (CASCON '08), ACM, October 2008.

[33] K. Herzig, S. Just, and A. Zeller, "It's not a bug, it's a feature: how misclassification impacts bug prediction," in Proceedings of the 35th International Conference on Software Engineering, pp. 392–401, 2013.

[34] R. Tonelli, G. Concas, M. Marchesi, and A. Murgia, "An analysis of SNA metrics on the Java Qualitas Corpus," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 205–213, February 2011.


RQ4: How well does the approach to predicting bug proneness/severity with significant metrics work?

4. Approach

In our approach, we consider two open source projects written in Java, Tomcat and Ant, both hosted by the Apache Software Foundation (ASF) (http://tomcat.apache.org and http://ant.apache.org), as our research subjects. The rationale is threefold. (1) They are well-known and stable projects; each has undergone a number of major release cycles and is still under active development. (2) The source code revision archives, SVN commit logs dating back several years, and bug reports are all available, which makes them a good opportunity to gain experience with open source projects. (3) The choice of the Java programming language is dictated by the tools developed to construct software networks, and we are interested in understanding open source software written in Java.

We examined the information provided by the bug tracking systems (Bugzilla and JIRA) and the SVN commits and found that most bug reports identify the classes involved and that some already include the modified fragments of source code. This information determines the feasibility of our approach. Moreover, the dependencies between classes are directed, and this direction should not be ignored. Therefore, we decided to abstract directed but unweighted software networks at the class level.

Let $G = (V, E)$ denote a software network, where $V = \{v_i\}$ is the set of all classes and $E = \{e_{ij}\}$ is the set of all dependencies. We distinguish between different kinds of dependencies as follows:

(i) $v_i$ implementing/extending the interface/class $v_j$ is an inheritance dependency.

(ii) $v_i$ having a field of type $v_j$ is a field dependency.

(iii) $v_i$ calling a method of $v_j$ is a method dependency.

(iv) A method of $v_i$ returning an object of type $v_j$ is a return dependency.

(v) A method of $v_i$ taking an object of $v_j$ as a parameter is a parameter dependency.
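The network definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input triples, the string labels for the five dependency kinds, the class names, and the choice to drop self-dependencies are all hypothetical, and the tool the authors used to extract dependencies from compiled Java code is not reproduced here.

```python
from collections import defaultdict

# Labels for the five dependency kinds; the strings are illustrative.
DEPENDENCY_KINDS = {"inheritance", "field", "method", "return", "parameter"}


def build_network(dependencies):
    """Build a directed, unweighted class-level network.

    `dependencies` is an iterable of (source_class, target_class, kind)
    triples. Several kinds between the same pair collapse into one edge.
    """
    successors = defaultdict(set)
    nodes = set()
    for source, target, kind in dependencies:
        if kind not in DEPENDENCY_KINDS:
            raise ValueError(f"unknown dependency kind: {kind}")
        nodes.update((source, target))
        if source != target:  # assumption: self-dependencies are ignored
            successors[source].add(target)
    return nodes, {node: set(successors.get(node, ())) for node in nodes}


def degrees(nodes, successors):
    """Return (in_degree, out_degree) dictionaries for every node."""
    out_degree = {node: len(successors[node]) for node in nodes}
    in_degree = {node: 0 for node in nodes}
    for targets in successors.values():
        for target in targets:
            in_degree[target] += 1
    return in_degree, out_degree


# Hypothetical class names, used only to exercise the functions.
deps = [
    ("Response", "Connector", "field"),
    ("Response", "Connector", "method"),  # same pair, still one edge
    ("Response", "OutputBuffer", "method"),
    ("Request", "Connector", "parameter"),
]
nodes, successors = build_network(deps)
in_degree, out_degree = degrees(nodes, successors)
print(out_degree["Response"], in_degree["Connector"])  # 2 2
```

Because the network is unweighted, a field dependency and a method dependency between the same two classes count as a single edge, which is why the out-degree of `Response` is 2 rather than 3.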

A bug report also offers many fields that help developers understand the issue. One of them is severity, which administrators assign according to how severely a bug affects the program. Table 1 shows the eight levels of bug severity and their ranks in Bugzilla. An objective for software providers is not only to minimize the total number of bugs but also to keep the severity of bugs as low as possible [23].

The approach consists of four phases: (1) compiling the source code files to extract the class-level directed software networks according to the dependency definitions; (2) exporting SVN commits and integrating them with bug reports to obtain the necessary bug-class associations; (3) calculating a set of network metrics on the established software networks and then determining their relationships with the number of bugs and bug severity; and (4) applying the significant metrics to bug prediction. The framework is shown in Figure 2.

Table 1: Levels of bug severity and their ranks in the Bugzilla bug tracking system.

| Bug severity | Rank |
| Blocker | 8 |
| Critical | 7 |
| Regression | 6 |
| Major | 5 |
| Normal | 4 |
| Minor | 3 |
| Trivial | 2 |
| Enhancement | 1 |

Figure 2: The framework of our approach (diagram elements: Source code, Software network, Bugzilla, Changelog, Bug-class, the metrics EC, BC, CC, MR, PR, HITS, inD, outD, and D, Bug proneness, and Bug severity; steps (1) to (4)).

5. Case Study

This section presents the subject projects used for our study and the relevant data we collected. With the experimental results, the proposed questions are answered in turn.

5.1. Data. For our analysis, we collected several types of data. We gathered the open source software data from source code repositories and bug databases and determined the rank of each bug. In this paper, we choose Tomcat 7 and Ant 1.8.4 as our experimental subjects. As successful open source projects, both Tomcat and Ant maintain publicly available source code. To construct the software networks, the source code is compiled to an XML file, and the software network is then established by parsing that XML file. In what follows, we abbreviate the Tomcat software network as TSN and the Ant software network as ASN. There are 35 versions of Tomcat 7. The latest version is still being revised, so only the preceding 34 versions are adopted in our work. The numbers of nodes and edges differ little between versions. A likely explanation is that Tomcat is a relatively mature open source project with about a decade of development history. Of course, not all projects behave this way; some projects are known to show nearly superlinear growth [13]. In the end, the stable version Tomcat 7.0.29, which consists of 2015 nodes, 9573 edges, and 19 communities, is used for the analysis; the Ant network has 1345 nodes, 3937 edges, and 20 communities.

Table 2: The statistics: the number of bugs, fixed classes, and nodes/edges of the software networks.

| | Tomcat | Ant |
| Bugs | 475 | 480 |
| Fixed classes | 422 (21%) | 389 (29%) |
| SN nodes | 2015 | 1345 |
| SN edges | 9573 | 3937 |
| Communities | 19 | 20 |

The bug reports are split into two sets: one used for the experiment and the other for prediction. That is, the first thirty versions of Tomcat are used for phases (2) and (3), and the last four versions are used for prediction. Because the Ant data refer to only one version, eighty percent of the Ant data is used for the experiment and the remaining twenty percent for prediction. Note that studies of software defects rely on links between bug databases and commit changes, and this information plays an important role in measuring quality and predicting defects. Prior research suggested that some links between bugs and bug-fix commit logs are missing. The authors proposed automatic approaches to recover missing links [30, 31] and found that missing links lead to biased defect information, which affects prediction performance. However, other authors argued that a reported bug may not be a bug but a feature or an enhancement [32, 33]. This misclassification also introduces bias into bug prediction models. When missing links are recovered, the number of false positives in bug classification increases. Hence, only issues that were always reported as a bug in the bug tracker are taken into account. Once the effective bugs are identified, we check the change logs by searching for the bug ID and calculating the similarity of the summary text. The reason for using these heuristics is that most of the missing links in our datasets are due to misclassification. The statistics of the experimental data are shown in Table 2. Some test classes are not taken into account because they are duplicates. In bug 53062, for example, both class org.apache.catalina.connector.TestResponse and class org.apache.catalina.connector.Response appear, but they represent the same object in our study. Note that less than thirty percent of the classes are involved, which confirms that most of the bugs occur in a small number of classes. What is needed, therefore, is guided detection within a system.
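The linking heuristic described above (search the change log for the bug ID and compare summary text) might look roughly like the following sketch. Several things here are assumptions and not taken from the paper: the commit messages are invented, the similarity measure is `difflib.SequenceMatcher` from the Python standard library, the threshold `min_similarity` is arbitrary, and a commit is linked when either test succeeds, since the text does not say how the two signals are combined.

```python
import re
from difflib import SequenceMatcher


def link_bug_to_commits(bug_id, summary, commits, min_similarity=0.5):
    """Return ids of commits whose log mentions the bug ID or whose
    message resembles the bug summary (hypothetical combination rule)."""
    linked = []
    for commit_id, message in commits:
        # Bug IDs are assumed to appear as standalone 4-6 digit numbers.
        mentioned = str(bug_id) in re.findall(r"\b\d{4,6}\b", message)
        similarity = SequenceMatcher(None, summary.lower(), message.lower()).ratio()
        if mentioned or similarity >= min_similarity:
            linked.append(commit_id)
    return linked


# Invented commit log entries for illustration only.
commits = [
    ("r1", "Fix bug 53062: wrong status in Response"),
    ("r2", "Update changelog"),
    ("r3", "Response buffer not reset"),
]
print(link_bug_to_commits(53062, "Response buffer not reset", commits))
# ['r1', 'r3']
```

A stricter variant would require both signals at once, which trades recall for fewer false links.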

5.2. Results

RQ1: Is the Position of Classes in the Software Network Related to the Bug Proneness? To answer this research question, the nodes are first sorted by metric value in descending order and divided into groups, each containing the same number of nodes. This grouping facilitates the exploration of relations between the metrics and bugs. Because the last 15 nodes in TSN are never fixed, they are excluded so that each group contains 200 nodes. The nodes of ASN are divided evenly into eleven groups. The number of fixed classes and the number of involved bugs in each group are recorded. With this information, RQ1 is not hard to answer.
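As a rough illustration of this grouping step, the sketch below sorts classes by a metric, cuts them into equal-sized groups, and counts fixed classes and distinct bugs per group. The data are toy values; dropping whatever remainder does not fill a group is a simplification of the paper's exclusion of the 15 never-fixed TSN nodes, and counting distinct bugs per group is an assumption about how the bug totals were obtained.

```python
def group_by_metric(metric, group_size):
    """Sort classes by descending metric value and cut them into groups.

    Trailing classes that do not fill a whole group are dropped
    (a simplification of the exclusion described in the text).
    """
    ranked = sorted(metric, key=lambda name: (-metric[name], name))
    usable = len(ranked) - len(ranked) % group_size
    return [ranked[i:i + group_size] for i in range(0, usable, group_size)]


def count_per_group(groups, bugs_of_class):
    """For each group, count fixed classes and the distinct bugs they involve."""
    rows = []
    for group in groups:
        fixed = [c for c in group if bugs_of_class.get(c)]
        bugs = set()
        for c in fixed:
            bugs.update(bugs_of_class[c])
        rows.append((len(fixed), len(bugs)))
    return rows


# Toy data: out-degree per class and the bug IDs fixed in each class.
out_degree = {"A": 9, "B": 7, "C": 4, "D": 3, "E": 1}
bugs_of_class = {"A": {101, 102}, "B": {102}, "D": {103}}
groups = group_by_metric(out_degree, group_size=2)
print(groups)                                  # [['A', 'B'], ['C', 'D']]
print(count_per_group(groups, bugs_of_class))  # [(2, 2), (1, 1)]
```

Group 1 holds the classes with the largest metric values, so a falling count from the first group to the last corresponds to more bugs in more central classes.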

The numbers of fixed classes and involved bugs in each group are illustrated for the different metrics in Figures 3 and 4. In Figure 3, EC, inD, D, and HITS have a significantly negative correlation; because the groups are ordered by descending metric value, this means that the bigger the average metric value of a class, the more bug-prone it is. BC and outD show a skewed distribution; see Figures 3(a), 3(b), 3(e), 3(f), 3(g), and 3(h). Figure 4 differs somewhat: BC, outD, and D show a significant correlation but not an obviously skewed distribution; see Figures 4(b), 4(f), and 4(g). The results for the other metrics fluctuate, especially in groups six, eight, and nine.

MR shows the same relationship with fixed classes and bugs. It differs from the above metrics in that the distribution of bugs lies above that of classes and fluctuates more. In subfigure (i), the x-axis is the rank of the module (community): the larger the modularity ratio, the higher the rank. The overall trend is that the larger the metric value of a class, the more bug-prone it is and the more likely it is to be modified; this holds especially for the BC, outD, and D metrics.

RQ2: Is the Position of Classes in the Software Network Related to the Bug Severity? Understanding and characterizing the relationships between network metrics and bug severity in open source software projects is also an interesting problem. Although the metric value is positively related to bug proneness, it is not necessarily related to bug severity, because a class may contain many bugs, none of which is serious. We conduct the next experiment to examine this research question with the severity ranks given in Table 1. The results in Figure 5 show that the relationship deviates from what was expected. CC remains stable in both projects, and the overall trend of the distribution first increases and then decreases. The top row depicts the severity for Tomcat and the bottom row for Ant.

Ranks 5 and 7 are two prominent breakpoints, marked by the dotted lines in each subfigure; they refer to major and critical bugs, respectively. Bugs with rank 8 have large metric values in TSN; in contrast, this kind of bug has relatively low values in ASN. Blocker bugs should be avoided as much as possible in software engineering practice. When such a bug occurs and is found, either its origin lies where a fix triggers modifications in many other parts, or the problem is only a small mistake that nevertheless affects a vital node.

Figure 3: Tomcat (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR; each panel plots two series, Classes and Bugs.

RQ3: If So, Which Metrics Are More Significant? Different metrics measure the importance of a node in a network from different aspects. According to the results obtained for RQ1 and RQ2, not all the metrics show the same expected relationship; some metrics may be more suitable for bug proneness prediction and others for severity prediction. In this section, a further analysis examines how significant these metrics are and which metrics are better. Three typical correlation analysis methods are used: Pearson, Kendall, and Spearman (Sig. $P < 0.01$). Pearson is widely used in statistics to measure the degree of the relationship between linearly related variables, but both variables should be normally distributed before it is used. Kendall and Spearman are nonparametric tests that make no assumptions about the underlying distribution. The former behaves much like a permutation test on the ordering of pairs, whereas the latter is essentially the Pearson coefficient, and hence a least-squares measure, applied to ranks.
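To make the three coefficients concrete, here is a small pure-Python sketch. It is illustrative only: the group counts are invented, Kendall is computed as the simple tau-a variant without tie correction, and no significance test (the $P < 0.01$ threshold) is implemented. A statistics package would normally be used for real analyses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Invented example: group IDs (descending metric) and bugs per group.
group_ids = [1, 2, 3, 4, 5]
bugs = [40, 25, 30, 10, 5]
print(round(spearman(group_ids, bugs), 3))       # -0.9
print(round(kendall_tau_a(group_ids, bugs), 3))  # -0.8
```

The negative sign arises because group 1 contains the classes with the largest metric values; fewer bugs in later groups therefore yields a negative coefficient against the group ID.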

Table 3 gives the correlation coefficients between metric value and bug proneness for eight metrics. For Tomcat, the coefficient of outD is the smallest in magnitude (0.552) and that of D is the largest (0.967), whereas the CC and PR metrics are not correlated. Although the correlations with outD are the smallest under all three methods, the skewed distribution curve in Figure 3(f) indicates that most of the fixed classes and bugs fall in the first two groups. For Ant, on the other hand, there is no such skew, and outD is more significant than the other metrics under all three methods, reaching 0.952; however, CC, EC, inD, PR, and HITS are all not significant. These differences suggest that the metrics should not be treated equally and should be used with caution. Applying all metrics indiscriminately is likely to mislead interpretation. Consequently, BC and outD are the suitable metrics to represent bug proneness and severity, and both of them are used in the subsequent prediction.

Figure 4: Ant (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR; each panel plots two series, Classes and Bugs.

Table 3: Bug proneness: the correlation coefficient with eight network metrics (Sig. $P < 0.01$). A dash indicates no significant correlation.

| Method-network | BC | CC | D | EC | inD | outD | PR | HITS |
| Pearson-TSN | −0.804 | - | −0.903 | −0.852 | −0.852 | −0.752 | −0.728 | −0.933 |
| Kendall-TSN | −0.764 | - | −0.899 | −0.689 | −0.689 | −0.552 | - | −0.733 |
| Spearman-TSN | −0.888 | - | −0.967 | −0.879 | −0.879 | −0.656 | - | −0.903 |
| Pearson-ASN | −0.891 | - | −0.895 | - | - | −0.915 | - | - |
| Kendall-ASN | −0.782 | - | −0.807 | - | - | −0.844 | - | - |
| Spearman-ASN | −0.909 | - | −0.916 | - | - | −0.952 | - | - |

The bold value refers to the maximum correlation coefficient of different analysis methods per project by comparing the eight metrics.

RQ4: How Well Does the Approach to Predicting Bug Proneness/Severity with Significant Metrics Work? Lastly, having understood and characterized the relationships between network metrics and bug proneness or severity in two open source projects, we ask what the effects, if any, are on software quality and what the benefits are for software engineering practice.

We have learnt from the above experiments that the BC and outD metrics are notable indicators of bug proneness and severity. To check whether this conclusion holds in practice, we validate it by predicting bug proneness and severity on the test data. The test data contain 67 effective bugs and 87 related fixed classes for Tomcat, and 135 bugs and 149 fixed classes for Ant. For the purposes of the experiment, bug severity is divided into two categories: the slight category, in which the rank of a bug is less than 4, and the severe category, which contains all other bugs. Table 4 shows the resulting predictive F-value from top 50 to top 200 for Tomcat and from top 50 to top 125 for Ant, using the selected metrics on the software networks. The top-k range is limited to 200 and 125 for these two cases so that the selected nodes come from the first group. The F-value is calculated by combining precision and recall as

$$F = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{7}$$
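Equation (7) and the two-way severity split can be written down directly. The sketch below is a minimal illustration with invented class names; it assumes that precision and recall are computed over sets of classes (recommended top-k classes versus classes actually fixed), which is one plausible reading of the evaluation and not a detail the text spells out.

```python
def f_value(recommended, actual):
    """F-value of Eq. (7) for a recommended set against the true set."""
    recommended, actual = set(recommended), set(actual)
    hits = len(recommended & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


def severity_category(rank):
    """Slight if the Table 1 rank is below 4, severe otherwise."""
    return "slight" if rank < 4 else "severe"


# Invented example: four recommended classes, three truly fixed classes.
top_k = ["A", "B", "C", "D"]
fixed = {"A", "C", "E"}
print(round(f_value(top_k, fixed), 4))            # 0.5714
print(severity_category(3), severity_category(4))  # slight severe
```

Here two of the four recommendations are correct (precision 0.5) and two of the three fixed classes are found (recall 2/3), so F = 4/7.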

The BCD metric is a combination of BC and outD, defined as BCD = BC + outD, and is compared with each of them. For Tomcat, the BCD metric performs better on fixed-class prediction than the others, with a maximum of 0.3316 when returning the top 100 nodes. When all the nodes in group 1 are selected (top 200), the three metrics obtain the same accuracy, 0.2857. For the number of bugs, the maximum is 0.547, obtained with the outD metric when only the top 50 classes need to be checked. In this column, the outD and BCD metrics are at least as good as the BC metric in all cases. The last column gives the results for bug severity. Here outD performs far better than the other metrics, with an F-value more than twice that of BC in the best case, which is 0.6667 for the top 50 nodes.
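A top-k recommendation by the combined metric can be sketched as follows. The metric values are invented, and the sketch adds the raw values exactly as the rule BCD = BC + outD states; whether the authors normalized either metric before adding them is not stated, so that part is an assumption.

```python
def top_k_by_bcd(bc, out_d, k):
    """Rank classes by BCD = BC + outD and return the k highest."""
    names = set(bc) | set(out_d)
    bcd = {name: bc.get(name, 0.0) + out_d.get(name, 0) for name in names}
    # Ties are broken alphabetically so that the ranking is deterministic.
    return sorted(bcd, key=lambda name: (-bcd[name], name))[:k]


# Invented values: BC as a fraction, outD as a raw edge count.
bc = {"A": 0.30, "B": 0.05, "C": 0.00}
out_d = {"A": 2, "B": 6, "C": 3}
print(top_k_by_bcd(bc, out_d, 2))  # ['B', 'C']
```

With unscaled inputs like these, the out-degree dominates the sum, so class `A` drops out despite having the highest BC. Rescaling both metrics to a common range before adding them would be one way to balance their influence.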

For Ant, the BCD metric performs better than the other two on fixed classes and bugs when returning the top 50 classes. Apart from this case, outD has the highest accuracy at top 100 and top 125 and obtains the best result when returning the top 100 classes: the maxima for classes, bugs, and severity are 0.3534, 0.6383, and 0.4865, respectively. Throughout the prediction, outD is the most suitable metric for predicting bug proneness and bug severity in a directed class-level software network.

Finally, some attention is devoted to the average human effort and the number of comments needed when developers resolve a given kind of bug. Figure 6 shows that more serious bugs require more people to participate and more discussion before they are marked as fixed, which is consistent with common experience, except for rank 1. A reasonable explanation for this exception is that enhancement bugs are the least difficult, so many developers are able to find the problem and offer their own suggestions or revisions.

6. Discussion

RQ1. From the perspective of node importance, the more important a node is, the greater its probability of being modified and the more bugs it is involved in during the software development process. The central classes (nodes) form the skeleton of the entire system, especially the nodes with large betweenness centrality (BC) and many outgoing links (outD). The former category of nodes plays an important role in bridging different modules to ensure the normal execution of the entire system. Such nodes are also called "brokers" in network science. The latter nodes are more complex because they aggregate behaviors from many other nodes. Both have implications for software testing. Nodes with high BC are important in many contexts and have significant external responsibility; bugs occurring in these nodes influence the reachability between modules. Nodes with high outD, on the other hand, aggregate many behaviors of other nodes and have significant internal complexity; they are more likely to contain bugs and to be fixed. In this part, the PR and CC metrics are not significant. In summary, our results are consistent with the work in [34]: some metrics are related to bug proneness and others are not.
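The "broker" role can be illustrated with a small betweenness computation. The sketch below uses Brandes' algorithm for a directed, unweighted graph and returns unnormalized scores; the toy graph and its node names are invented, and how the authors computed or normalized BC is not stated in the text.

```python
from collections import deque


def betweenness_centrality(successors):
    """Unnormalized betweenness (Brandes' algorithm) for a directed,
    unweighted graph given as {node: set_of_successors}. Every node,
    including sinks, must appear as a key."""
    centrality = {node: 0.0 for node in successors}
    for source in successors:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in successors}
        sigma = {node: 0 for node in successors}
        distance = {node: -1 for node in successors}
        sigma[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in successors[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    sigma[neighbor] += sigma[node]
                    predecessors[neighbor].append(node)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = {node: 0.0 for node in successors}
        for node in reversed(order):
            for pred in predecessors[node]:
                delta[pred] += sigma[pred] / sigma[node] * (1 + delta[node])
            if node != source:
                centrality[node] += delta[node]
    return centrality


# Toy network: two classes of one module reach two classes of another
# module only through a single broker class.
successors = {
    "a1": {"broker"},
    "a2": {"broker"},
    "broker": {"b1", "b2"},
    "b1": set(),
    "b2": set(),
}
centrality = betweenness_centrality(successors)
print(centrality["broker"], len(successors["broker"]))  # 4.0 2
```

All four shortest paths between the two modules pass through `broker`, so removing or breaking it cuts every connection between them, which is the reachability effect described above.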

We conclude that the different importance metrics of a node play inconsistent roles in the analysis of bug proneness. BC and outD are clearly suitable for test prioritization when developers want to know which classes deserve more attention. In some special scenarios, node importance and out-degree can also be applied to bug localization, given the relationship between them.

RQ2 and RQ3. The hypothesis that a class with a large metric value is highly bug-prone and that its bugs are also more serious is not always true. Nevertheless, Table 3 shows that the averages of the BC, D, and outD metrics have a significant correlation with software bug proneness. The most likely explanation for the strong correlations between some metrics and bug proneness is that, during the experiment, the classes were sorted in descending order and divided into groups with the same number of classes. This treatment also means that the average metric value of each group is larger than that of the next, which corresponds to the top-k recommendation in the subsequent section. The results therefore show an overall trend between groups, so the correlations are stronger than those generally reported.

In Figure 5, major, regression, and critical bugs have relatively high values in both projects, and this phenomenon is most obvious for major bugs. In addition, bugs arising in a broker hinder the transfer of information between modules and the function scheduling of the entire system. As long as the modules cannot work together smoothly, the quality of the system cannot be guaranteed. A class with many outgoing links has more potential bugs, and the severity of those bugs is also higher. Such classes are usually the central components of a system or the central nodes of a network.

Members of a project team are sometimes more concerned with whether they can find serious bugs. In other words, they hope to find more bugs, and more severe ones, quickly and efficiently within a limited period of time. Our results provide an appropriate way to alleviate this challenge using the BC and outD metrics.

RQ4. By contrasting the eight network metrics, BC and outD were selected to predict bug proneness and severity. Overall, our results show a significant advantage of outD over the other metrics. It can be applied to identify which classes should be modified before others. The result also indicates that severe bugs usually occur in classes with strong internal complexity.

Table 4: F-value: fixed classes, bugs, and bug severity.

| Project | Top k | Classes BC | Classes outD | Classes BCD | Bugs BC | Bugs outD | Bugs BCD | Severity BC | Severity outD | Severity BCD |
| Tomcat | Top 50 | 0.2043 | 0.2920 | 0.2628 | 0.3760 | 0.5470 | 0.4786 | 0.2917 | 0.6667 | 0.4583 |
| Tomcat | Top 100 | 0.2567 | 0.3102 | 0.3316 | 0.4072 | 0.4431 | 0.4551 | 0.3288 | 0.5069 | 0.3836 |
| Tomcat | Top 150 | 0.2616 | 0.2954 | 0.2954 | 0.3594 | 0.3779 | 0.3687 | 0.2857 | 0.4184 | 0.2959 |
| Tomcat | Top 200 | 0.2857 | 0.2857 | 0.2857 | 0.3071 | 0.3071 | 0.3521 | 0.2439 | 0.3740 | 0.2846 |
| Ant | Top 50 | 0.1307 | 0.2111 | 0.2312 | 0.1946 | 0.4221 | 0.4322 | 0.1481 | 0.2613 | 0.2613 |
| Ant | Top 100 | 0.2651 | 0.3534 | 0.3213 | 0.3489 | 0.6383 | 0.5957 | 0.2270 | 0.4865 | 0.4649 |
| Ant | Top 125 | 0.3139 | 0.3358 | 0.3285 | 0.4385 | 0.5923 | 0.5846 | 0.3143 | 0.4381 | 0.4286 |

The bold value refers to the best F-value identified by comparing three metrics (BC, outD, and BCD) at the top K of predicting fixed classes, bugs, and bug severity, respectively.

Figure 5: Bug severity (Tomcat: top row; Ant: bottom row; the x-axis is the bug rank and the y-axis is the mean of the metric). Panels (a) to (f); legends: PR, HITS, BC; inD, outD, D; CC, EC.

Based on our work, we obtain some meaningful answers to the four questions proposed in Section 3. However, there are still some potential limitations and threats to the validity of our work.

(i) All datasets used in our experiments are collected from the open source projects Apache Tomcat 7 and Ant. There are many other available software repositories that are helpful for bug analysis. In future work, we would like to improve our approach by combining more software resources, such as mailing lists and forum messages, to enrich the findings.

(ii) A limitation of our work is that we chose projects written in Java, because the tool used to construct the software networks can only handle Java source code. Whether our conclusions hold for projects written in other languages still needs to be evaluated.

(iii) In theory, software networks evolve over time, so differences between versions are inevitable. In this paper, we use one stable version of each project as the basis for constructing the software network instead of considering the structural differences between versions. Although this treatment is coarse, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple granularities, which yields a multi-granularity software network. We investigate bug proneness and severity through network metrics at the class level. Whether the results would be better at another granularity is an attractive topic.

Figure 6: Human factor (the x-axis is the bug rank). Panels: (a) Comments, (b) Persons; each panel plots two series, Tomcat and Ant.

7. Conclusion

We constructed class-level software networks and applied nine representative and commonly used network metrics to bug prediction. An empirical study was conducted on the open source projects Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from February 2002 to November 2012. Our analysis demonstrated that there is a statistically significant relation between a class's importance and the probability that the class is modified and buggy. A class's importance is also related to the severity of the bugs raised. The results further showed that, with only about ten percent of the effort, the accuracy of our prediction for bug proneness and bug severity can reach 54.7% and 66.7% (top 50 classes, Tomcat) and 63.8% and 48.7% (top 100 classes, Ant), respectively.

We expect our findings to be insightful and useful in supporting the design and development of software, helping engineers assess the risk of adding or dropping a feature given the existing dependencies between classes. We also believe that our approach can be leveraged for bug prediction and test prioritization in other open source software. Finally, our findings provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

Future work will focus on two aspects. On the one hand, we will collect more open source projects (e.g., Eclipse, Mozilla, or projects hosted on SourceForge) to validate the generality of our approach. On the other hand, we will further take human factors into account, since software development is a process of human participation. Exploring the impact of sociotechnical congruence on bug prediction is both urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J. Anvik, "Automating bug report assignment," in Proceedings of the 28th International Conference on Software Engineering (ICSE '06), pp. 937–940, May 2006.

[2] D. Radjenović, M. Heričko, R. Torkar, and A. Živkovič, "Software fault prediction metrics: a systematic literature review," Information and Software Technology, vol. 55, pp. 1397–1418, 2013.

[3] T. Zimmermann and N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering (ICSE '08), pp. 531–540, May 2008.

[4] A. Meneely, L. Williams, W. Snipes, and J. Osborne, "Predicting failures with developer networks and social network analysis," in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 13–23, November 2008.

[5] K. Crowston, K. Wei, Q. Li, and J. Howison, "Core and periphery in Free/Libre and Open Source software team communications," in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, "An empirical study on the relation between dependency neighborhoods and failures," in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST '11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Solé, "Scale-free networks from optimal design," Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, "Software systems as complex networks: structure, function, and evolvability of software collaboration graphs," Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-laws in a large object-oriented software system," IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, "Power laws in software," ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, "Advances in the studies on complex networks of software systems," Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, "Software complexity metrics based on complex networks," Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, "Multi-granularity evolution analysis of software using complex network theory," Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, "Node centrality in weighted networks: generalizing degree and shortest paths," Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M S Zanetti I Scholtes C J Tessone and F SchweitzerldquoCategorizing bugs with social networks a case study on fouropen source software communitiesrdquo in Proceedings of the 35thInternational Conference on Software Engineering (ICSE rsquo13) pp1032ndash1041 2013

[16] M S Zanetti and F Schweitzer ldquoA network perspective on soft-ware modularityrdquo in Proceedings of the International Workshopon Complex Sciences in the Engineering of Computing Systemspp 175ndash186 2012

[17] C Ding X He P Husbands H Zha and H D SimonldquoPageRank HITS and a unified framework for link analysisrdquo inProceedings of the 25th Annual International ACM SIGIR Con-ference on Research and Development in Information Retrievalp 353 354 2002

[18] R Premraj and K Herzig ldquoNetwork versus code metrics topredict defects a replication studyrdquo in Proceedings of the 5thInternational Symposium on Empirical Software Engineering andMeasurement (ESEM rsquo11) pp 215ndash224 September 2011

[19] M Pinzger N Nagappan and B Murphy ldquoCan developer-module networks predict failuresrdquo in Proceedings of the 16thACM SIGSOFT International Symposium on the Foundationsof Software Engineering (SIGSOFT 2008FSE rsquo08) pp 2ndash12November 2008

[20] Y Shin A Meneely LWilliams and J A Osborne ldquoEvaluatingcomplexity code churn and developer activity metrics asindicators of software vulnerabilitiesrdquo IEEE Transactions onSoftware Engineering vol 37 no 6 pp 772ndash787 2011

[21] A Sureka A Goyal and A Rastogi ldquoUsing social networkanalysis for mining collaboration data in a defect trackingsystem for risk and vulnerability analysisrdquo in Proceedings of the4th India Software Engineering Conference (ISEC rsquo11) pp 195ndash204 February 2011

[22] F Perin L Renggli and J Ressia ldquoRanking software artifactsrdquoin Proceedings of the 4th Workshop on FAMIX and Moose inReengineering (ICSM rsquo10) pp 1ndash4 2010

[23] P Bhattacharya M Iliofotou I Neamtiu and M FaloutsosldquoGraphbased analysis and prediction for software evolutionrdquoin Proceedings of the International Conference on SoftwareEngineering (ICSE rsquo12) pp 419ndash429 2012

[24] D Steidl B Hummel and E Juergens ldquoUsing network analysisfor recommendation of central software classesrdquo in Proceedingsof the 19th Working Conference on Reverse Engineering pp 93ndash102 2012

[25] T Wolf A Schroter D Damian and T Nguyen ldquoPredictingbuild failures using social network analysisrdquo in Proceedings ofthe International Conference on Software Engineering (ICSE rsquo09)pp 1ndash11 May 2009

[26] M E J Newman Networks Oxford University Press OxfordUK 2010

[27] S Wasserman and K Faust Social Network Analysis Methodsand Applications Cambridge University Press 1994

[28] L C Freeman ldquoCentrality in social networks conceptual clari-ficationrdquo Social Networks vol 1 no 3 pp 215ndash239 1978

[29] M E J Newman and M Girvan ldquoFinding and evaluatingcommunity structure in networksrdquo Physical Review E vol 69no 2 Article ID 026113 15 pages 2004

[30] A Bachmann C Bird F Rahman P Devanbu and ABernstein ldquoThe missing links bugs and bug-fix commitsrdquo inProceedings of the 18th ACM SIGSOFT International Symposiumon the Foundations of Software Engineering (FSE rsquo10) pp 97ndash106November 2010

[31] RWuH Zhang S Kim and S C Cheung ldquoReLink recoveringlinks between bugs and changesrdquo inProceedings of the 19th ACMSIGSOFT Symposium on Foundations of Software Engineering(SIGSOFTFSE rsquo11) pp 15ndash25 September 2011

[32] G Antoniol K Ayari M Di Penta F Khomh and Y GGueheneuc ldquoIs it a bug or an enhancement A text-basedapproach to classify change requestsrdquo in Proceedings of theConference of the Center for Advanced Studies (CASCON rsquo08)ACM October 2008

[33] K Herzig S Just and A Zeller ldquoItrsquos not a bug itrsquos a feature howmisclassification impacts bug predictionrdquo in Proceedings of the35 International Conference on Software Engineering pp 392ndash401 2013

[34] R Tonelli G Concas MMarchesi and AMurgia ldquoAn analysisof SNA metrics on the Java Qualitas Corpusrdquo in Proceedingsof the 4th India Software Engineering Conference (ISEC rsquo11) pp205ndash213 February 2011


Table 2: The statistics: the number of bugs and fixed classes, and the nodes/edges of the software network.

               Tomcat       Ant
Bugs           475          480
Classes        422 (21%)    389 (29%)
SN nodes       2015         1345
SN edges       9573         3937
Communities    19           20

relatively mature open source software project with a decade of development history. Of course, not all projects show this phenomenon; it is known that some projects have a nearly superlinear growth [13]. Finally, the stable Tomcat 7.0.29 version used for the analysis consists of 2015 nodes, 9573 edges, and 19 communities; the Ant network has 1345 nodes, 3937 edges, and 20 communities.

The bug reports are divided into two sets, one used for the experiment and the other for prediction. That is, the first thirty versions of Tomcat are used for steps (2) and (3), and the last four versions for prediction. Because the Ant data refer to only one version, eighty percent of the Ant data is used for the experiment and the remaining twenty percent for prediction. Note that studies of software defects rely on links between bug databases and commit changes, and this information plays an important role in measuring quality and predicting defects. Some prior research suggested that there are missing links between bugs and bug-fix commit logs. The authors proposed automatic approaches to recover missing links [30, 31] and found that the missing links lead to biased defect information, which affects prediction performance. However, some authors argued that a reported bug may not be a bug but a feature or an enhancement [32, 33]. This misclassification also introduces bias in bug prediction models. When the missing links are considered, the number of false positives in bug classification increases. Hence, only bugs always reported as a bug in the bug tracker are taken into account. Once effective bugs are identified, the change logs are checked by searching for the bug ID and calculating the similarity of the summary text. The reason for using heuristics is that most of the missing links in our datasets arise from misclassification. The statistics of the experimental data are shown in Table 2. Some test classes are not taken into account because they are duplicates. In bug 53062, for example, both class org.apache.catalina.connector.TestResponse and class org.apache.catalina.connector.Response appear, but they represent the same object in our study. Note that less than thirty percent of the classes are involved, which confirms that most of the bugs occur in a small number of classes. What is needed, therefore, is guided detection within a system.
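The linking heuristic described above (search the change logs for the bug ID, then fall back on summary-text similarity) can be illustrated with a minimal sketch. The function name, the similarity measure (Python's `difflib.SequenceMatcher`), and the 0.6 threshold are assumptions made for illustration only; the text does not specify how similarity is computed.

```python
import re
from difflib import SequenceMatcher


def link_bug_to_commits(bug_id, summary, commit_messages, min_similarity=0.6):
    """Return the commit messages that appear to fix the given bug.

    A commit is linked if its message mentions the bug ID as a whole number,
    or if its text is similar enough to the bug summary (assumed threshold).
    """
    id_pattern = re.compile(r"\b%d\b" % bug_id)
    linked = []
    for message in commit_messages:
        if id_pattern.search(message):
            linked.append(message)  # explicit reference to the bug ID
            continue
        ratio = SequenceMatcher(None, summary.lower(), message.lower()).ratio()
        if ratio >= min_similarity:
            linked.append(message)  # no ID, but the text matches the summary
    return linked


# Hypothetical commit messages, not taken from the Tomcat change log.
commits = [
    "Fix 53062: NPE in Response",
    "Update docs",
    "npe when response is committed twice",
]
links = link_bug_to_commits(53062, "NPE when response is committed twice", commits)
```

In this toy input the first message is linked through the bug ID and the third through text similarity, while the unrelated message is skipped.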

5.2. Results

RQ1: Is the Position of Classes in the Software Network Related to the Bug Proneness? To answer this research question, the nodes are first sorted by metric value in descending order and divided into groups, each containing the same number of nodes. The reason for doing this is to facilitate the exploration of relations between the metrics and bugs. Because the last 15 nodes are never fixed in TSN, they are excluded so that each group contains 200 nodes. For ASN, the nodes are divided evenly into eleven groups. The number of fixed classes and involved bugs in each group is recorded. With this information, the answer to RQ1 is not hard to obtain.
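The grouping step can be sketched as follows. This is a simplified illustration rather than the procedure actually used: the function name and the toy data are hypothetical, and the sketch simply drops the lowest-ranked remainder so that all groups have equal size (the text excludes the last 15 Tomcat nodes because they were never fixed).

```python
def count_fixed_per_group(metric, fixed_classes, group_size):
    """Rank classes by metric value (descending), split them into
    equal-sized groups, and count the fixed classes in each group."""
    ranked = sorted(metric, key=lambda name: metric[name], reverse=True)
    usable = len(ranked) - len(ranked) % group_size  # drop the remainder
    counts = []
    for start in range(0, usable, group_size):
        group = ranked[start:start + group_size]
        counts.append(sum(1 for name in group if name in fixed_classes))
    return counts


# Hypothetical metric values and fixed classes.
metric = {"A": 9, "B": 7, "C": 5, "D": 3, "E": 1}
fixed = {"A", "B", "D"}
per_group = count_fixed_per_group(metric, fixed, group_size=2)
```

Here the first group (A, B) contains two fixed classes and the second group (C, D) contains one; class E is the dropped remainder.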

The number of fixed classes and involved bugs for different metrics in each group is illustrated in Figures 3 and 4. In Figure 3, EC, inD, D, and HITS have a significantly negative correlation with the group ID (because groups are ordered by descending metric value, this means that the bigger the average metric value of a class, the more bug-prone it is); BC and outD show a skewed distribution; see Figures 3(a), 3(b), 3(e), 3(f), 3(g), and 3(h). There are some differences in Figure 4: BC, outD, and D show a significant correlation but not an obviously skewed distribution; see Figures 4(b), 4(f), and 4(g). The results of the other metrics fluctuate, especially in groups six, eight, and nine.

MR shows the same relationship with fixed classes and bugs. A few differences from the above metrics are that the distribution of bugs is higher than that of classes and fluctuates more. In subfigure (i), the x-axis is the rank of the module (community): the larger the modularity ratio, the higher its rank. An overall positive trend is shown, especially for the BC, outD, and D metrics: the larger the metric value of a class, the more bug-prone it is and the more likely it is to be modified.

RQ2: Is the Position of Classes in the Software Network Related to the Bug Severity? Understanding and characterizing the relationships between network metrics and bug severity in open-source software projects is also a very interesting problem. Although the value of a metric is positively related to bug proneness, it is not necessarily related to bug severity, because a class may contain many bugs that are not serious. We conduct the next experiment to examine this research question with the severity ranks given in Table 1. The results in Figure 5 show that the relationship deviates from what was expected. CC remains stable in both projects, while the overall distribution trend first increases and then decreases. The top row depicts the severity for Tomcat and the bottom row that for Ant.

Ranks 5 and 7 are two prominent breakpoints, marked by the dotted line in each subfigure; they refer to the major and critical bugs, respectively. Bugs with rank 8 have large metric values in TSN; on the contrary, this kind of bug has relatively low values in ASN. Blocker bugs should be avoided as much as possible in software engineering practice. Once this kind of bug occurs and is found, either its origin is a place where a fix triggers modifications to many other parts, or the problem is only a small mistake that affects a vital node.

Figure 3: Tomcat (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR. Legend: classes, bugs.

RQ3: If So, Which Metrics Are More Significant? Different metrics measure the importance of a node in a network from different aspects. According to the results obtained from RQ1 and RQ2, not all the metrics have the same expected relationship: some metrics may be more suitable for bug proneness prediction and others for severity prediction. In this section, further work is conducted to analyze how significant these metrics are and which metrics are better. Three typical correlation analysis methods are used to resolve this problem: Pearson, Kendall, and Spearman (Sig. P < 0.01). Pearson is widely used in statistics to measure the degree of the relationship between linearly related variables, but both variables should be normally distributed before using it. Kendall and Spearman are nonparametric tests that do not make any assumptions about the underlying distribution. The former can be viewed as a permutation test of sorts, whereas the latter is essentially based on the principle of least squares.
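To make the three coefficients concrete, the following sketch computes them in plain Python for a series of group IDs against per-group counts. The counts are invented for illustration and are not the Tomcat or Ant data; the Kendall variant shown is tau-a, which ignores ties, and no significance test (the P < 0.01 criterion) is performed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: group IDs and the number of fixed classes per group.
group_ids = [1, 2, 3, 4, 5]
fixed_counts = [60, 40, 35, 20, 10]
r = pearson(group_ids, fixed_counts)
rho = spearman(group_ids, fixed_counts)
tau = kendall_tau_a(group_ids, fixed_counts)
```

Because the counts decrease strictly with the group ID, both rank-based coefficients equal -1 for this toy series, while Pearson is slightly weaker because the decrease is not perfectly linear. This also shows why the coefficients are negative when groups are ordered by descending metric value.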

Table 3 gives the correlation coefficients between metric value and bug proneness for eight metrics. For Tomcat, in absolute value, the coefficient of outD is the minimum with 0.552 and that of D is the maximum with 0.967, yet the CC and PR metrics are not correlated. Although the correlations with outD in the three methods are the minimum, the skewed distribution curve in Figure 3(f) shows that most of the bugs and fixed classes fall in the first two groups. On the other hand, as a result of no skew phenomenon for Ant, outD is more significant than the other metrics in all three methods, with a maximum of up to 0.952. However, CC, EC, inD, PR, and HITS are all not significant. The differences mentioned above suggest that these metrics should be treated unequally and with caution; applying all metrics straightforwardly is likely to mislead interpretation. Consequently, BC and outD are the suitable metrics to represent bug proneness and severity. Both of them will be used for the subsequent prediction.

Figure 4: Ant (the x-axis is the group ID and the y-axis is the number of classes and bugs). Panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR. Legend: classes, bugs.

Table 3: Bug proneness: the correlation coefficient with eight network metrics (Sig. P < 0.01).

Method          BC        CC    D          EC        inD       outD       PR        HITS
Pearson-TSN     -0.804    -     -0.903     -0.852    -0.852    -0.752     -0.728    -0.933*
Kendall-TSN     -0.764    -     -0.899*    -0.689    -0.689    -0.552     -         -0.733
Spearman-TSN    -0.888    -     -0.967*    -0.879    -0.879    -0.656     -         -0.903
Pearson-ASN     -0.891    -     -0.895     -         -         -0.915*    -         -
Kendall-ASN     -0.782    -     -0.807     -         -         -0.844*    -         -
Spearman-ASN    -0.909    -     -0.916     -         -         -0.952*    -         -

Values marked with * are the maximum correlation coefficient (in absolute value) of each analysis method per project, obtained by comparing the eight metrics. A lone "-" means that no significant correlation was found.

RQ4: How Well Does the Approach to Predicting Bug Proneness/Severity with Significant Metrics Work? Lastly, once we have understood and characterized the relationships between network metrics and bug proneness or severity in two open source projects, what are the effects, if any, on software quality? Or what are the benefits to software engineering practices?

We have learnt from the above experiments that the BC and outD metrics are notable indicators of bug proneness and severity. To check whether this conclusion is workable, we validate it through bug proneness and severity prediction with the test data. There are 67 effective bugs in the test data and 87 fixed classes related to these bugs for Tomcat, and 135 bugs and 149 fixed classes for Ant. Given the experimental requirements, bug severity is divided into two categories: the slight category, in which the rank of a bug is less than 4, and the severe category otherwise. Table 4 shows the resulting predictive F-value from top 50 to 200 for Tomcat and from top 50 to 125 for Ant, using the selected metrics on the software networks. The reason for limiting the top k to a range of 200 and 125 for these two cases is to ensure that the selected nodes come from the first group. The F-value is calculated by integrating precision and recall as

$$F = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}. \qquad (7)$$
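As a worked illustration of equation (7), the sketch below ranks classes by a metric, returns the top k, and scores the prediction against the set of classes that were actually fixed. The function names and the toy data are hypothetical.

```python
def top_k(metric, k):
    """Names of the k classes with the largest metric values."""
    return sorted(metric, key=lambda name: metric[name], reverse=True)[:k]


def f_value(predicted, actual):
    """F-value of equation (7): harmonic mean of precision and recall."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0  # avoids 0/0 when nothing is predicted correctly
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical out-degree values and fixed classes.
out_degree = {"A": 12, "B": 9, "C": 4, "D": 1}
fixed_classes = {"A", "C", "E"}
score = f_value(top_k(out_degree, 2), fixed_classes)
```

With the top 2 classes (A and B), one prediction is correct, so precision is 1/2, recall is 1/3, and the F-value is 0.4.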

The BCD metric is a combination of BC and outD, used for comparison with each of them, and follows the rule BCD = BC + outD. For Tomcat, the BCD metric performs better on fixed classes prediction than the others, and the maximum is 0.3316 when returning the top 100 nodes. When all the nodes in group ID = 1 are selected, the three metrics obtain the same accuracy, 0.2857. When it comes to the number of bugs, the maximum is 0.547 when using the outD metric, and just the top 50 classes need to be checked. In this column, the outD and BCD metrics are better than the BC metric in all cases. The last column gives the results for bug severity. The finding shows that outD performs far better than the other metrics, and its F-value reaches more than twice that of BC. The best case is 0.6667 at the top 50 nodes.
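The combination rule BCD = BC + outD can be sketched as below. The text does not say whether the two metrics are normalized before being added, so this sketch adds the raw values as the rule is written; the function names and data are hypothetical.

```python
def combine_bcd(bc, out_degree):
    """BCD = BC + outD per class; a class missing from one map counts as 0."""
    names = set(bc) | set(out_degree)
    return {name: bc.get(name, 0.0) + out_degree.get(name, 0) for name in names}


def rank_desc(metric):
    """Class names ordered by descending metric value."""
    return sorted(metric, key=lambda name: metric[name], reverse=True)


# Hypothetical betweenness centrality and out-degree values.
bc = {"A": 0.5, "B": 3.0, "C": 0.0}
out_degree = {"A": 4, "B": 1, "C": 2}
bcd = combine_bcd(bc, out_degree)
ranking = rank_desc(bcd)
```

In this toy case class B leads under BC alone, but class A leads under BCD because of its larger out-degree. If the two metrics live on very different scales, one of them will dominate the sum, which is worth keeping in mind when interpreting a combined ranking.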

For Ant, the BCD metric performs better than the other two on fixed classes and bugs when returning the top 50 classes. Except for this case, outD has the highest accuracy at top 100 and 125 and obtains the best result when returning the top 100 classes: the maximum values for classes, bugs, and severity are 0.3534, 0.6383, and 0.4865, respectively. Throughout the prediction, outD is the most suitable metric for predicting bug proneness and bug severity in a directed class-level software network.

Finally, some attention is devoted to analyzing the average human effort and the number of comments needed when developers want to resolve one kind of bug. In Figure 6, one observes that more serious bugs need more people to participate, and these people hold more discussions before the bugs are identified as fixed, which is consistent with common experience, except for rank = 1. A reasonable explanation for this exception is that enhancement bugs have the lowest difficulty, so that many developers are able to find the problem and offer their own suggestions or revisions.

6. Discussion

RQ1. From the perspective of node importance, the more important a node is, the greater its probability of being modified, and the more bugs are involved during the software development process. The central classes (nodes) form the skeleton of the entire system, especially the nodes with large betweenness centrality (BC) and many outgoing links (outD). The former category of nodes plays an important role in bridging the different modules to ensure the normal execution of the entire system. Such nodes are also named "brokers" in network science. The latter nodes are more complex, for they aggregate behaviors from many other nodes. These observations have some implications for software testing. Nodes with high BC are more important in many contexts and have significant external responsibility. Bugs occurring in these nodes will influence the reachability between modules. On the other hand, nodes with a high outD aggregate many behaviors of other nodes and have significant internal complexity. They are more likely to contain bugs and to be fixed. In this part, the PR and CC metrics are not significant. In summary, our results are consistent with the work in [34]: some metrics are related to bug proneness and the others are not.
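The two metrics singled out here can be computed on a small directed dependency graph as in the sketch below. It uses Brandes' algorithm for unnormalized betweenness on an unweighted directed graph; this is a standard technique chosen for illustration, not necessarily the tool used in the study, and the class names are made up.

```python
from collections import deque


def out_degree(graph):
    """Number of outgoing dependency links per class."""
    return {node: len(targets) for node, targets in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    unweighted directed graph given as {node: [successors]}.
    Every node, including sinks, must appear as a key."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        stack = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}   # number of shortest paths
        dist = {node: -1 for node in graph}   # -1 means not reached yet
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                          # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        while stack:                          # accumulate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


# Hypothetical class-level dependency graph with one "broker" class.
graph = {
    "Servlet": ["Broker"],
    "Filter": ["Broker"],
    "Broker": ["Request", "Response"],
    "Request": [],
    "Response": [],
}
bc = betweenness(graph)
outd = out_degree(graph)
```

In this toy graph all four shortest paths between the two callers and the two callees pass through the broker class, so it has a betweenness of 4 and the largest out-degree, while every other class has a betweenness of 0.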

We conclude that the different importance metrics of a node play inconsistent roles in the analysis of bug proneness. It is clear that BC and outD are suitable for test prioritization when developers want to know which classes deserve more attention. In some special scenarios, for example, node importance and out-degree can also be applied to bug localization, given the relationship between them.

RQ2 and RQ3. The hypothesis that a class with a large metric value is more bug-prone and that the bugs in this class are more serious is not always true. Fortunately, Table 3 shows that the averages of the BC, D, and outD metrics have a significant correlation with software bug proneness. The most likely explanation for the strong correlations between some metrics and bug proneness is that, during the experiment, the classes were sorted in descending order and divided into groups with the same number of classes. This treatment also means that the average metric value of a preceding group is larger than that of the succeeding one, which corresponds to the top-k recommendation in the subsequent section. The results show an overall trend between groups, so the correlations are stronger than those generally reported.

In Figure 5, major, regression, and critical bugs have relatively high values in both projects. Moreover, this phenomenon is more obvious for major bugs. Additionally, bugs arising from a broker will hinder the transfer of information between modules and the function scheduling of the entire system. As long as the modules cannot work smoothly, the quality of the system cannot be guaranteed. A class with many outgoing links not only has more potential bugs, but the severity of those bugs is also higher. Such classes are usually the central components in a system or the central nodes in a network.

Members of a project team are sometimes more concerned about whether they can find serious bugs. In other words, they hope to find more bugs, and more severe bugs, quickly and efficiently in a limited period of time. Our results provide an appropriate method to alleviate this challenge by means of the BC and outD metrics.

RQ4. By contrasting the eight network metrics, BC and outD are selected to predict bug proneness and severity. Overall, compared to the other metrics, outD shows a significant advantage in our results. It can be applied to identify which classes should be modified prior to others. The result also indicates that severe bugs usually have strong internal complexity.

Table 4: F-value: fixed classes, bugs, and bug severity.

                     Classes                        Bugs                           Severity
Project   Top K      BC        outD      BCD        BC        outD      BCD        BC        outD      BCD
Tomcat    Top 50     0.2043    0.2920*   0.2628     0.3760    0.5470*   0.4786     0.2917    0.6667*   0.4583
Tomcat    Top 100    0.2567    0.3102    0.3316*    0.4072    0.4431    0.4551*    0.3288    0.5069*   0.3836
Tomcat    Top 150    0.2616    0.2954*   0.2954*    0.3594    0.3779*   0.3687     0.2857    0.4184*   0.2959
Tomcat    Top 200    0.2857*   0.2857*   0.2857*    0.3071    0.3071    0.3521*    0.2439    0.3740*   0.2846
Ant       Top 50     0.1307    0.2111    0.2312*    0.1946    0.4221    0.4322*    0.1481    0.2613*   0.2613*
Ant       Top 100    0.2651    0.3534*   0.3213     0.3489    0.6383*   0.5957     0.2270    0.4865*   0.4649
Ant       Top 125    0.3139    0.3358*   0.3285     0.4385    0.5923*   0.5846     0.3143    0.4381*   0.4286

Values marked with * are the best F-value (ties included) identified by comparing the three metrics (BC, outD, and BCD) at the top K for predicting fixed classes, bugs, and bug severity, respectively.

Figure 5: Bug severity (Tomcat: top row, panels (a)-(c); Ant: bottom row, panels (d)-(f); the x-axis is the bug rank and the y-axis is the mean of the metric). Legends: PR, HITS, BC; inD, outD, D; CC, EC-(E-01).

Figure 6: Human factor (the x-axis is the bug rank). Panels: (a) Comments, (b) Persons. Legend: Tomcat, Ant.

Based on our work, we obtain some meaningful answers to the four questions proposed in Section 3. However, there are still some potential limitations and threats to the validity of our work.

(i) All datasets used in our experiments are collected from the open source projects Apache Tomcat 7 and Ant. We know that there are many other available software repositories that are helpful for bug analysis. In future work, we would like to improve our approach by combining more software resources, such as mailing-list and forum messages, to enrich the findings.

(ii) A limitation of our work is that we chose projects written in Java, because the tool used to construct the software network can only deal with Java source code. Whether our conclusions hold for projects written in other languages still needs to be evaluated.

(iii) Theoretically, software networks will evolve over time; therefore, differences between multiple versions are inevitable. In this paper, we utilize a stable version of each project as the construction standard of the software network instead of considering the structural differences between versions. Although such treatment is rough, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple levels of granularity, which yields a multi-granularity software network. We investigate bug proneness and severity through network metrics at the class level. Whether the results would be better at another granularity is an attractive topic.

7. Conclusion

We constructed class-level software networks and introduced nine representative and commonly used network metrics to bug prediction. An empirical study has been conducted on the open-source projects Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from February 2002 to November 2012. Our analysis demonstrated that there is a statistically significant relation between a class's importance and the probability that the class is modified and buggy. Also, a class's importance is related to the severity of the raised bugs. The result also showed that, with just about ten percent of the effort, the accuracy of our prediction for bug proneness and bug severity can be up to 54.7% and 66.7%, and 63.8% and 48.7%, respectively, when successively returning the top 50 and 100 classes within these two cases.

We expect that our findings are insightful and can be used to support the design and development of software, helping engineers assess the risk of adding or dropping a feature in the presence of existing dependencies between classes. We also believe that our approach can be leveraged in the analysis of bug prediction and test prioritization for other open source software. Finally, our findings also provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

Future work will mainly focus on two aspects. On the one hand, we will collect more open source projects (e.g., Eclipse, Mozilla, or projects deployed on SourceForge) to validate the generality of our approach. On the other hand, we will further take human factors into account, because software development is a process of human participation. An exploration of the impact of sociotechnical congruence on bug prediction is urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J. Anvik, "Automating bug report assignment," in Proceedings of the 28th International Conference on Software Engineering 2006 (ICSE '06), pp. 937–940, May 2006.

[2] D. Radjenovic, M. Hericko, R. Torkar, and A. Zivkovic, "Software fault prediction metrics: a systematic literature review," Information and Software Technology, vol. 55, pp. 1397–1418, 2013.

[3] T. Zimmermann and N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering 2008 (ICSE '08), pp. 531–540, May 2008.

[4] A. Meneely, L. Williams, W. Snipes, and J. Osborne, "Predicting failures with developer networks and social network analysis," in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 13–23, November 2008.

[5] K. Crowston, K. Wei, Q. Li, and J. Howison, "Core and periphery in Free/Libre and Open Source software team communications," in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, "An empirical study on the relation between dependency neighborhoods and failures," in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST '11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Sole, "Scale-free networks from optimal design," Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, "Software systems as complex networks: structure, function, and evolvability of software collaboration graphs," Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-laws in a large object-oriented software system," IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, "Power laws in software," ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, "Advances in the studies on complex networks of software systems," Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, "Software complexity metrics based on complex networks," Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, "Multi-granularity evolution analysis of software using complex network theory," Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, "Node centrality in weighted networks: generalizing degree and shortest paths," Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, "Categorizing bugs with social networks: a case study on four open source software communities," in Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 1032–1041, 2013.

[16] M. S. Zanetti and F. Schweitzer, "A network perspective on software modularity," in Proceedings of the International Workshop on Complex Sciences in the Engineering of Computing Systems, pp. 175–186, 2012.

[17] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, "PageRank, HITS and a unified framework for link analysis," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–354, 2002.

[18] R. Premraj and K. Herzig, "Network versus code metrics to predict defects: a replication study," in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM '11), pp. 215–224, September 2011.

[19] M. Pinzger, N. Nagappan, and B. Murphy, "Can developer-module networks predict failures?" in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 2–12, November 2008.

[20] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, "Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities," IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, 2011.

[21] A. Sureka, A. Goyal, and A. Rastogi, "Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 195–204, February 2011.

[22] F. Perin, L. Renggli, and J. Ressia, "Ranking software artifacts," in Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (ICSM '10), pp. 1–4, 2010.

[23] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, "Graph-based analysis and prediction for software evolution," in Proceedings of the International Conference on Software Engineering (ICSE '12), pp. 419–429, 2012.

[24] D. Steidl, B. Hummel, and E. Juergens, "Using network analysis for recommendation of central software classes," in Proceedings of the 19th Working Conference on Reverse Engineering, pp. 93–102, 2012.

[25] T. Wolf, A. Schroter, D. Damian, and T. Nguyen, "Predicting build failures using social network analysis," in Proceedings of the International Conference on Software Engineering (ICSE '09), pp. 1–11, May 2009.

[26] M. E. J. Newman, Networks, Oxford University Press, Oxford, UK, 2010.

[27] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

[28] L. C. Freeman, "Centrality in social networks conceptual clarification," Social Networks, vol. 1, no. 3, pp. 215–239, 1978.

[29] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, vol. 69, no. 2, Article ID 026113, 15 pages, 2004.

[30] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, "The missing links: bugs and bug-fix commits," in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE '10), pp. 97–106, November 2010.

[31] RWuH Zhang S Kim and S C Cheung ldquoReLink recoveringlinks between bugs and changesrdquo inProceedings of the 19th ACMSIGSOFT Symposium on Foundations of Software Engineering(SIGSOFTFSE rsquo11) pp 15ndash25 September 2011

[32] G Antoniol K Ayari M Di Penta F Khomh and Y GGueheneuc ldquoIs it a bug or an enhancement A text-basedapproach to classify change requestsrdquo in Proceedings of theConference of the Center for Advanced Studies (CASCON rsquo08)ACM October 2008

[33] K Herzig S Just and A Zeller ldquoItrsquos not a bug itrsquos a feature howmisclassification impacts bug predictionrdquo in Proceedings of the35 International Conference on Software Engineering pp 392ndash401 2013

[34] R Tonelli G Concas MMarchesi and AMurgia ldquoAn analysisof SNA metrics on the Java Qualitas Corpusrdquo in Proceedingsof the 4th India Software Engineering Conference (ISEC rsquo11) pp205ndash213 February 2011


[Figure 3 panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR. Legend: Classes, Bugs.]

Figure 3: Tomcat (the x-axis is the group ID and the y-axis is the number of classes and bugs).

significant these metrics are and which metrics are better. Three typical correlation analysis methods are used to resolve this problem: Pearson, Kendall, and Spearman (Sig. P < 0.01). Pearson is widely used in statistics to measure the degree of relationship between linearly related variables, but both variables should be normally distributed before it is applied. Kendall and Spearman are nonparametric tests that make no assumptions about the underlying distribution. The former behaves like a kind of permutation test, whereas the latter is essentially based on the principle of least squares.
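To make the three coefficients concrete, the following is a minimal sketch in plain Python, not the authors' tooling. It assumes Kendall's tau-a (no tie correction), since the paper does not say which variant was used, and it computes Spearman's coefficient as the Pearson coefficient of the ranks. The `group_id` and `bugs` values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for constant input


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


# Hypothetical data: group ID versus number of bugs found in that group.
group_id = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bugs = [70, 41, 30, 22, 25, 12, 9, 8, 5, 2]

print(round(pearson(group_id, bugs), 3))
print(round(spearman(group_id, bugs), 3))
print(round(kendall_tau_a(group_id, bugs), 3))
```

Because the bug counts fall as the group ID rises, all three coefficients in this toy example are negative.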

Table 3 gives the correlation coefficient between metric value and bug proneness on eight metrics. For Tomcat, the coefficient of outD is the minimum at 0.552 and that of D is the maximum at 0.967 (in absolute value), yet the CC and PR metrics are not correlated. Although the correlations with outD are the lowest under the three methods, the skewed distribution curve in Figure 3(f) shows that most of the bugs and fixed classes are concentrated in the first two groups. On the other hand, because no such skew occurs for Ant, outD is more significant than the other metrics under all three methods. However, CC, EC, inD, PR, and HITS are all not significant. The maximum for outD reaches 0.952. The differences mentioned above suggest that these metrics should not be treated equally and should be handled with caution. Applying all metrics indiscriminately is likely to mislead interpretation. Consequently, BC and outD are the suitable metrics to represent bug proneness and severity. Both of them will be used in the subsequent prediction.

RQ4: How Well Does the Approach to Predicting Bug Proneness/Severity with Significant Metrics Work? Lastly, once we have understood and characterized the relationships between network metrics and bug proneness or severity in two open source projects, what are the effects, if any, on software quality? Or what are the benefits to software engineering practices?

[Figure 4 panels: (a) EC, (b) BC, (c) CC, (d) PR, (e) inD, (f) outD, (g) D, (h) HITS, (i) MR. Legend: Classes, Bugs.]

Figure 4: Ant (the x-axis is the group ID and the y-axis is the number of classes and bugs).

Table 3: Bug proneness: the correlation coefficient with eight network metrics (Sig. P < 0.01).

| Method | BC | CC | D | EC | inD | outD | PR | HITS |
|---|---|---|---|---|---|---|---|---|
| Pearson-TSN | -0.804 | - | -0.903 | -0.852 | -0.852 | -0.752 | -0.728 | -0.933 |
| Kendall-TSN | -0.764 | - | -0.899 | -0.689 | -0.689 | -0.552 | - | -0.733 |
| Spearman-TSN | -0.888 | - | -0.967 | -0.879 | -0.879 | -0.656 | - | -0.903 |
| Pearson-ASN | -0.891 | - | -0.895 | - | - | -0.915 | - | - |
| Kendall-ASN | -0.782 | - | -0.807 | - | - | -0.844 | - | - |
| Spearman-ASN | -0.909 | - | -0.916 | - | - | -0.952 | - | - |

The bold value refers to the maximum correlation coefficient of different analysis methods per project by comparing the eight metrics.

We have learnt from the above experiments that the BC and outD metrics represent bug proneness and severity remarkably well. To check whether this conclusion holds, we validate it through bug proneness and severity prediction with the test data. There are 67 effective bugs in the test data and 87 fixed classes related to these bugs for Tomcat, and 135 bugs and 149 fixed classes for Ant. Given the experimental requirements, bug severity is divided into two categories: the slight category, in which a bug's rank is less than 4, and the severe category, which contains the rest. Table 4 shows the resulting predictive F-value from Top 50 to 200 for Tomcat and from Top 50 to 125 for Ant using selected metrics on software networks. The top k is limited to 200 and 125 for these two cases to ensure that the selected nodes come from the first group. The F-value is calculated by combining precision and recall as

$$F = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}. \tag{7}$$
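As an illustration of the F-value formula, the sketch below scores a returned list of classes against the set of classes that were actually fixed. The function name `f_value` and the class names are hypothetical. The overlap of 20 classes is an assumption chosen for illustration; combined with the 87 fixed Tomcat classes and a top-50 list it gives 40/137, which rounds to 0.2920.

```python
def f_value(predicted, actual):
    """Harmonic mean of precision and recall for a predicted set."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0  # precision and recall are both zero
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 50 classes returned, 87 classes actually fixed,
# 20 classes in common (C30..C49).
predicted = [f"C{i}" for i in range(50)]
actual = [f"C{i}" for i in range(30, 117)]

print(round(f_value(predicted, actual), 4))
```

Note that with k classes returned, n classes actually fixed, and h hits, the formula reduces to 2h / (k + n).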

The BCD metric is a combination of BC and outD, used for comparison with each of them, and follows the rule BCD = BC + outD. For Tomcat, the BCD metric achieves the best fixed-class prediction, with a maximum of 0.3316 when returning the top 100 nodes. When all the nodes in group ID = 1 are selected, the three metrics obtain the same accuracy, 0.2857. For the number of bugs, the maximum is 0.547, obtained with the outD metric when only the top 50 classes need to be checked. In this column, the outD and BCD metrics perform at least as well as the BC metric in all cases. The last column gives the results for bug severity. Here outD performs far better than the other metrics, and in the best case, 0.6667 at the top 50 nodes, its F-value is more than twice that of BC.
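The combined ranking can be sketched as follows. The function `top_k_by_bcd`, the class names, and the metric values are hypothetical. The two metrics are added without rescaling, because the rule BCD = BC + outD is stated without any normalization step. Breaking ties by class name is an assumption made only to keep the output deterministic.

```python
def top_k_by_bcd(bc, out_degree, k):
    """Rank classes by BCD = BC + outD (descending) and return the first k."""
    bcd = {name: bc[name] + out_degree[name] for name in bc}
    return sorted(bcd, key=lambda name: (-bcd[name], name))[:k]


# Hypothetical metric values for four classes.
bc = {"A": 0.004, "B": 0.001, "C": 0.006, "D": 0.0}
out_degree = {"A": 12, "B": 30, "C": 12, "D": 3}

print(top_k_by_bcd(bc, out_degree, 2))
```

With raw values, out-degree dominates the sum and BC mostly acts as a tie-breaker, as between classes A and C here.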

For Ant, the BCD metric performs better than the other two on fixed classes and bugs when returning the top 50 classes. Apart from this case, outD has the highest accuracy at top 100 and 125 and obtains the best result when returning the top 100 classes: the maximum F-values for classes, bugs, and severity are 0.3534, 0.6383, and 0.4865, respectively. Throughout the prediction, outD is the most suitable metric for predicting bug proneness and bug severity in a directed class-level software network.

Finally, some attention is devoted to analyzing the average number of people involved and comments needed when developers resolve one kind of bug. In Figure 6, one observes that more serious bugs need more people to participate, and these people hold more discussions before the bugs are identified as fixed, which is consistent with common experience, except for rank = 1. A reasonable explanation for this exception is that enhancement bugs have the lowest difficulty, so many developers are able to find the problem and offer their own suggestions or revisions.

6. Discussion

RQ1: From the perspective of node importance, the more important a node is, the greater its probability of being modified, and the more bugs it is involved in during the software development process. The central classes (nodes) form the skeleton of the entire system, especially the nodes with large betweenness centrality (BC) and many outgoing links (outD). The former category of nodes plays an important role in bridging the different modules to ensure the normal execution of the entire system. Such nodes are also called "brokers" in network science. The latter nodes are more complex because they aggregate behaviors from many other nodes. These observations have implications for software testing. Nodes with high BC are important in many contexts and have significant external responsibility. Bugs occurring in these nodes will affect the reachability between modules. On the other hand, nodes with a high outD aggregate many behaviors of other nodes and have significant internal complexity. They are more likely to contain bugs and to be fixed. In this part, the PR and CC metrics are not significant. In summary, our results are consistent with the work in [34]: some metrics are related to bug proneness and the others are not.
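The two metrics singled out here can be computed on a small directed dependency graph as in the sketch below. It is a minimal illustration rather than the tool used in the study: the class names are hypothetical, an edge A -> B is assumed to mean that class A depends on class B, and betweenness is left unnormalized and computed with Brandes' algorithm for unweighted graphs.

```python
from collections import deque


def out_degree(graph):
    """Number of outgoing dependency links per class."""
    return {node: len(targets) for node, targets in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness centrality of a directed, unweighted graph
    (Brandes: one BFS per source, then dependency accumulation)."""
    bc = {node: 0.0 for node in graph}
    for source in graph:
        order = []                               # nodes by non-decreasing distance
        preds = {node: [] for node in graph}     # shortest-path predecessors
        sigma = {node: 0 for node in graph}      # number of shortest paths
        dist = {node: -1 for node in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:                  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                bc[w] += delta[w]
    return bc


# Hypothetical class-level dependency network; every class must be a key.
graph = {
    "Cli": ["Service"],
    "Service": ["Parser", "Store"],
    "Parser": ["Util"],
    "Store": ["Util"],
    "Util": [],
}

print(out_degree(graph))
print(betweenness(graph))
```

In this toy network, "Service" is the broker: every shortest path from "Cli" to the rest of the system passes through it, so it has the highest betweenness as well as the highest out-degree.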

We conclude that the different importance metrics of a node play inconsistent roles in the analysis of bug proneness. It is clear that BC and outD are suitable for test prioritization when developers want to know which classes deserve more attention. In some special scenarios, for example, node importance and out-degree can also be applied to bug location, given the relationship between them.

RQ2 and RQ3: The hypothesis that a class with a large metric value is apt to be highly bug-prone, and that the bugs in this class are more serious, is not always true. Fortunately, Table 3 shows that the averages of the BC, D, and outD metrics have a significant correlation with software bug proneness. The most likely explanation for the strong correlations between some metrics and bug proneness is that, during the experiment, the classes were sorted in descending order and divided into groups with the same number of classes. Under this treatment, the average metric value of a preceding group is larger than that of the succeeding one, which corresponds to the top-k recommendation in the subsequent section. The results therefore show an overall trend between groups, so the correlations are stronger than those generally reported.
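The grouping step described above can be sketched as follows. The function name `group_averages` and the metric values are hypothetical, and the sketch assumes that the number of classes is a multiple of the number of groups; how the study handled a remainder is not stated, so any leftover classes are simply dropped here.

```python
def group_averages(metric, n_groups):
    """Sort classes by metric value (descending), split them into n_groups
    groups of equal size, and return the mean metric value of each group."""
    ordered = sorted(metric.values(), reverse=True)
    size = len(ordered) // n_groups  # assumption: remainder classes are dropped
    return [
        sum(ordered[g * size:(g + 1) * size]) / size
        for g in range(n_groups)
    ]


# Hypothetical metric values for twelve classes.
values = [3, 9, 1, 7, 5, 2, 8, 4, 6, 0, 11, 10]
metric = {f"C{i}": v for i, v in enumerate(values)}

print(group_averages(metric, 4))
```

Because the classes are sorted before being split, the group averages can never increase with the group ID, which is one reason why group-level correlations are stronger than class-level ones.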

In Figure 5, major, regression, and critical bugs have a relatively high value in both projects. Moreover, this phenomenon is most obvious for major bugs. Additionally, bugs arising in a broker will hinder the transfer of information between modules and the function scheduling of the entire system. As long as the modules cannot work together smoothly, the quality of the system cannot be guaranteed. A class with many outgoing links not only has more potential bugs, but the severity of those bugs is also higher. Such classes are usually the central components in a system or the central nodes in a network.

Members of a project team are sometimes more concerned about whether they can find serious bugs. In other words, they hope to find not only more bugs quickly and efficiently but also more severe bugs in a limited period of time. Our results provide an appropriate method to alleviate this challenge through the BC and outD metrics.

RQ4: By contrasting the eight network metrics, BC and outD are selected to predict bug proneness and severity. Overall, compared with the other metrics, outD shows a significant advantage in our results. It can be applied to identify which classes should be modified prior to others. The result also indicates that severe bugs are usually associated with strong internal complexity.

Based on our work, we acquire some meaningful answers to the four questions proposed in Section 3. However, there are still some potential limitations and threats to the validity of our work.

(i) All datasets used in our experiments are collected from the open source projects Apache Tomcat 7 and Ant. We know that there are many other available software repositories that are helpful to bug analysis. We would like to improve our approach by combining more software resources, such as mailing-list and forum messages, in future work to enrich the findings.

(ii) A limitation of our work is that we chose projects written in Java because the tool used to construct the software network can only deal with Java source code. Whether our conclusions hold for projects written in other languages still needs to be evaluated.

(iii) Theoretically, software networks evolve over time; therefore, differences between multiple versions are inevitable. In this paper, we use a stable version of each project as the construction standard of the software network instead of considering the structural differences between versions. Although such treatment is rough, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple granularities, yielding a multi-granularity software network. We investigate bug proneness and severity through network metrics at the class level. Whether the results would be better at other granularities is an attractive topic.

Table 4: F-value: fixed classes, bugs, and bug severity.

| Project | Top k | Classes BC | Classes outD | Classes BCD | Bugs BC | Bugs outD | Bugs BCD | Severity BC | Severity outD | Severity BCD |
|---|---|---|---|---|---|---|---|---|---|---|
| Tomcat | Top 50 | 0.2043 | 0.2920 | 0.2628 | 0.3760 | 0.5470 | 0.4786 | 0.2917 | 0.6667 | 0.4583 |
| Tomcat | Top 100 | 0.2567 | 0.3102 | 0.3316 | 0.4072 | 0.4431 | 0.4551 | 0.3288 | 0.5069 | 0.3836 |
| Tomcat | Top 150 | 0.2616 | 0.2954 | 0.2954 | 0.3594 | 0.3779 | 0.3687 | 0.2857 | 0.4184 | 0.2959 |
| Tomcat | Top 200 | 0.2857 | 0.2857 | 0.2857 | 0.3071 | 0.3071 | 0.3521 | 0.2439 | 0.3740 | 0.2846 |
| Ant | Top 50 | 0.1307 | 0.2111 | 0.2312 | 0.1946 | 0.4221 | 0.4322 | 0.1481 | 0.2613 | 0.2613 |
| Ant | Top 100 | 0.2651 | 0.3534 | 0.3213 | 0.3489 | 0.6383 | 0.5957 | 0.2270 | 0.4865 | 0.4649 |
| Ant | Top 125 | 0.3139 | 0.3358 | 0.3285 | 0.4385 | 0.5923 | 0.5846 | 0.3143 | 0.4381 | 0.4286 |

The bold value refers to the best F-value identified by comparing three metrics (BC, outD, and BCD) at the top K of predicting fixed classes, bugs, and bug severity, respectively.

[Figure 5 panels: (a) to (f). Legends shown under the bottom row: PR, HITS, BC in (d); inD, outD, D in (e); CC, EC − (E − 01) in (f).]

Figure 5: Bug severity (Tomcat: top; Ant: bottom; the x-axis is the bug rank and the y-axis is the mean of the metric).

[Figure 6 panels: (a) Comments, (b) Persons. Legend: Tomcat, Ant.]

Figure 6: Human factor (the x-axis is the bug rank).

7. Conclusion

We constructed class-level software networks and applied nine representative and commonly used network metrics to bug prediction. An empirical study was conducted on the open source projects Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from Feb. 2002 to November 2012. Our analysis demonstrated that there is a statistically significant relation between a class's importance and the probability that the class is modified and buggy. Also, a class's importance is related to the severity of the bugs raised. The results also showed that, with only about ten percent of the effort, the accuracy of our prediction for bug proneness and bug severity can reach 54.7% and 66.7% (Tomcat) and 63.8% and 48.7% (Ant), respectively, when returning the top 50 and top 100 classes in these two cases.

We expect that our findings are insightful and can be used to support the design and development of software, helping engineers assess the risk of adding or dropping a feature while supporting existing dependencies between classes. We also believe that our approach can be leveraged in bug prediction and test prioritization for other open source software. Finally, our findings also provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

Future work will mainly focus on two aspects. On the one hand, we will collect more open source projects (e.g., Eclipse, Mozilla, or projects deployed on SourceForge) to validate the generality of our approach. On the other hand, we will further take human factors into account, because software development is a process of human participation. Exploring the impact of sociotechnical congruence on bug prediction is urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), the National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), the Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and the Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J. Anvik, "Automating bug report assignment," in Proceedings of the 28th International Conference on Software Engineering (ICSE '06), pp. 937–940, May 2006.

[2] D. Radjenovic, M. Hericko, R. Torkar, and A. Zivkovic, "Software fault prediction metrics: a systematic literature review," Information and Software Technology, vol. 55, pp. 1397–1418, 2013.

[3] T. Zimmermann and N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering (ICSE '08), pp. 531–540, May 2008.

[4] A. Meneely, L. Williams, W. Snipes, and J. Osborne, "Predicting failures with developer networks and social network analysis," in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 13–23, November 2008.

[5] K. Crowston, K. Wei, Q. Li, and J. Howison, "Core and periphery in Free/Libre and Open Source software team communications," in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, "An empirical study on the relation between dependency neighborhoods and failures," in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST '11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Sole, "Scale-free networks from optimal design," Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, "Software systems as complex networks: structure, function, and evolvability of software collaboration graphs," Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-laws in a large object-oriented software system," IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, "Power laws in software," ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, "Advances in the studies on complex networks of software systems," Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, "Software complexity metrics based on complex networks," Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, "Multi-granularity evolution analysis of software using complex network theory," Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, "Node centrality in weighted networks: generalizing degree and shortest paths," Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, "Categorizing bugs with social networks: a case study on four open source software communities," in Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 1032–1041, 2013.

[16] M. S. Zanetti and F. Schweitzer, "A network perspective on software modularity," in Proceedings of the International Workshop on Complex Sciences in the Engineering of Computing Systems, pp. 175–186, 2012.

[17] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, "PageRank, HITS and a unified framework for link analysis," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–354, 2002.

[18] R. Premraj and K. Herzig, "Network versus code metrics to predict defects: a replication study," in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM '11), pp. 215–224, September 2011.

[19] M. Pinzger, N. Nagappan, and B. Murphy, "Can developer-module networks predict failures?" in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 2–12, November 2008.

[20] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, "Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities," IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, 2011.

[21] A. Sureka, A. Goyal, and A. Rastogi, "Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 195–204, February 2011.

[22] F. Perin, L. Renggli, and J. Ressia, "Ranking software artifacts," in Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (ICSM '10), pp. 1–4, 2010.

[23] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, "Graph-based analysis and prediction for software evolution," in Proceedings of the International Conference on Software Engineering (ICSE '12), pp. 419–429, 2012.

[24] D. Steidl, B. Hummel, and E. Juergens, "Using network analysis for recommendation of central software classes," in Proceedings of the 19th Working Conference on Reverse Engineering, pp. 93–102, 2012.

[25] T. Wolf, A. Schroter, D. Damian, and T. Nguyen, "Predicting build failures using social network analysis," in Proceedings of the International Conference on Software Engineering (ICSE '09), pp. 1–11, May 2009.

[26] M. E. J. Newman, Networks, Oxford University Press, Oxford, UK, 2010.

[27] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

[28] L. C. Freeman, "Centrality in social networks: conceptual clarification," Social Networks, vol. 1, no. 3, pp. 215–239, 1978.

[29] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, vol. 69, no. 2, Article ID 026113, 15 pages, 2004.

[30] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, "The missing links: bugs and bug-fix commits," in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE '10), pp. 97–106, November 2010.

[31] R. Wu, H. Zhang, S. Kim, and S. C. Cheung, "ReLink: recovering links between bugs and changes," in Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering (SIGSOFT/FSE '11), pp. 15–25, September 2011.

[32] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y. G. Gueheneuc, "Is it a bug or an enhancement? A text-based approach to classify change requests," in Proceedings of the Conference of the Center for Advanced Studies (CASCON '08), ACM, October 2008.

[33] K. Herzig, S. Just, and A. Zeller, "It's not a bug, it's a feature: how misclassification impacts bug prediction," in Proceedings of the 35th International Conference on Software Engineering, pp. 392–401, 2013.

[34] R. Tonelli, G. Concas, M. Marchesi, and A. Murgia, "An analysis of SNA metrics on the Java Qualitas Corpus," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 205–213, February 2011.


8 Mathematical Problems in Engineering

0

20

40

60

80

0 2 4 6 8 10 12

(a) EC

0 2 4 6 8 10 120

20

40

60

80

100

(b) BC

0

20

40

60

80

0 2 4 6 8 10 12

(c) CC

0

20

40

60

80

0 2 4 6 8 10 12

(d) PR

0

20

40

60

80

0 2 4 6 8 10 12

(e) inD

0

20

40

60

80

100

120

0 2 4 6 8 10 12

(f) outD

0

20

40

60

80

100

0 2 4 6 8 10 12

ClassesBugs

(g) D

0

20

40

60

80

0 2 4 6 8 10 12

ClassesBugs

(h) HITS

0

20

40

60

80

100

120

140

0 2 4 6 8 10 12 14 16 18 20

ClassesBugs

(i) MR

Figure 4 Ant (the x-axis is the group ID and y-axis is the number of classes and bugs)

Table 3 Bug proneness the correlation coefficient with eight network metrics

Bug proneness (Sig 119875 lt 001)BC CC D EC inD outD PR HITS

Pearson-TSN minus0804 mdash minus0903 minus0852 minus0852 minus0752 minus0728 minus0933Kendall-TSN minus0764 mdash minus0899 minus0689 minus0689 minus0552 mdash minus0733Spearman-TSN minus0888 mdash minus0967 minus0879 minus0879 minus0656 mdash minus0903Pearson-ASN minus0891 mdash minus0895 mdash mdash minus0915 mdash mdashKendall-ASN minus0782 mdash minus0807 mdash mdash minus0844 mdash mdashSpearman-ASN minus0909 mdash minus0916 mdash mdash minus0952 mdash mdashThe bold value refers to the maximum correlation coefficient of different analysis methods per project by comparing the eight metrics

quality Or what are the benefits to software engineeringpractices

We have learnt that BC and outDmetrics are remarkableto represent bug proneness and severity with the aboveexperiments Whether this conclusion is workable or notwe should validate it through bug proneness and severity

predictionwith the test dataThere are 67 effective bugs in testdata and 87 fixed classes related to these bugs for Tomcat and135 bugs and 149 fixed classes for Ant Given the experimentalrequirements bug severity is divided into two categoriesone is slight category in which bugsrsquo rank is less than 4 butrather the other is severe category Table 4 shows the resulting

Mathematical Problems in Engineering 9

predictive F-value from Top 50 to 200 for Tomcat and fromTop 50 to 125 for Ant using selected metrics on softwarenetworks The reason for top k within a range of 200 and 125for these two cases is to ensure the selected nodes from thefirst group F-value is calculated by integrating precise andrecall as

119865 =2 lowast precise lowast recallprecise + recall

(7)

The BCD metric is a combination of BC and outD usedto compare with each of them and follows the rule of BCD =BC+out119863 For Tomcat BCDmetric performs better on fixedclasses prediction than the others and themaximum is 03316when returning the top 100 nodes While all the nodes ingroup ID = 1 are selected three metrics obtain the sameaccuracy 02857 When it comes to the number of bugs themaximum is 0547 when using outD metric and just top 50classes are needed to check In this column outD and BCDmetrics are better than BCmetric in all casesThe last columnis the results of bug severityThefinding shows outDperformsfar better than the other metrics and the F-value is nearlymore than twice The best case is 06667 at top of 50 nodes

For Ant, the BCD metric performs better than the other two on fixed classes and bugs when returning the top 50 classes. Apart from this case, outD has the highest accuracy at top 100 and top 125 and obtains its best result when returning the top 100 classes: the maximum F-values for classes, bugs, and severity are 0.3534, 0.6383, and 0.4865, respectively. Throughout the prediction, outD is the most suitable metric for predicting bug proneness and bug severity in a directed class-level software network.
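The top-k comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class names, metric values, and set of fixed classes are invented, the helper names are hypothetical, and BCD is taken literally as the unnormalized sum BC + outD.

```python
def top_k(scores, k):
    """Return the k class names with the largest metric value (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def f_value(predicted, actual):
    """F-value of a recommendation list against the actually fixed classes."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical metric values for five classes.
bc = {"Core": 0.30, "Parser": 0.05, "Util": 0.10, "Cache": 0.00, "Net": 0.20}
out_d = {"Core": 3, "Parser": 6, "Util": 1, "Cache": 2, "Net": 4}
bcd = {c: bc[c] + out_d[c] for c in bc}  # BCD = BC + outD, taken literally

# Hypothetical set of classes fixed in the test period.
fixed = {"Core", "Parser", "Cache"}

for name, metric in (("BC", bc), ("outD", out_d), ("BCD", bcd)):
    picked = top_k(metric, 2)
    print(name, picked, round(f_value(picked, fixed), 4))
```

In this toy data the raw sum is dominated by outD because BC and outD live on different scales. The source states only the rule BCD = BC + outD, so any normalization before summing would be an additional assumption.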

Finally, we briefly analyze the average number of people involved and comments needed when developers resolve one kind of bug. Figure 6 shows that more serious bugs need more people to participate, and these people hold more discussions before the bugs are marked as fixed. This is consistent with common experience, except for rank = 1. A reasonable explanation for this exception is that enhancement bugs are the least difficult, so many developers are able to find the problem and offer their own suggestions or revisions.

6. Discussion

RQ1: From the perspective of node importance, the more important a node is, the greater its probability of being modified and the more bugs it is involved in during the software development process. The central classes (nodes) form the skeleton of the entire system, especially the nodes with large betweenness centrality (BC) and many outgoing links (outD). The former play an important role in bridging different modules to ensure the normal execution of the entire system; such nodes are also called "brokers" in network science. The latter are more complex because they aggregate behaviors from many other nodes. Both have implications for software testing. Nodes with high BC are important in many contexts and carry significant external responsibility, so bugs occurring in them will affect the reachability between modules. Nodes with high outD, on the other hand, aggregate many behaviors of other nodes and have significant internal complexity, so they are more likely to contain bugs and to be fixed. In this part, the PR and CC metrics are not significant. In summary, our results are consistent with the work in [34]: some metrics are related to bug proneness and others are not.
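To make the two metrics concrete, the sketch below computes out-degree and unnormalized betweenness centrality on a tiny directed dependency graph, using Brandes' algorithm for unweighted graphs. The graph, the class names, and the convention that an edge A -> B means "A depends on B" are assumptions made for illustration; the tool and normalization used in the paper are not given in this section.

```python
from collections import deque


def out_degree(graph):
    """Number of outgoing dependency links per class."""
    return {v: len(targets) for v, targets in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness centrality of a directed, unweighted graph.

    Implements Brandes' algorithm; every node must appear as a key of `graph`.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        order = []                         # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}      # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:            # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while order:                       # back-propagate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical class-level network: two front-end classes reach two
# back-end classes only through "Broker".
graph = {
    "Servlet": ["Broker"],
    "Filter": ["Broker"],
    "Broker": ["Store", "Log"],
    "Store": [],
    "Log": [],
}
print(out_degree(graph))
print(betweenness(graph))
```

Here "Broker" sits on every shortest path between the front-end and back-end classes, so it receives all of the betweenness, while its two outgoing links also give it the largest out-degree.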

We conclude that the different importance metrics of a node play inconsistent roles in the analysis of bug proneness. It is clear that BC and outD are suitable for test prioritization when developers want to know which classes deserve more attention. In some special scenarios, for example, node importance and out-degree can also be applied to bug location, given the relationship between them.

RQ2 and RQ3: The hypothesis that a class with a large metric value is more bug-prone and contains more serious bugs is not always true. Nevertheless, Table 3 shows that the averages of the BC, D, and outD metrics correlate significantly with software bug proneness. The most likely explanation for the strong correlations between some metrics and bug proneness is that, during the experiment, the classes were sorted in descending order and divided into groups containing the same number of classes. Under this treatment, the average metric value of each group is larger than that of the next one, which corresponds to the top-k recommendation in the subsequent section. The results therefore reflect an overall trend between groups, so the correlations are stronger than those generally reported.
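The grouping step described above can be sketched as follows: classes are sorted by a metric in descending order, cut into equal-size groups, and a correlation is computed at group level. All data and helper names are hypothetical, only Pearson's coefficient is shown, and exactly which two group-level series the paper correlates is not recoverable from this section, so both the group mean and the group ID are tried here as assumptions.

```python
from math import sqrt


def split_into_groups(classes, metric, size):
    """Sort classes by metric (descending) and cut them into groups of `size`."""
    ranked = sorted(classes, key=lambda c: (-metric[c], c))
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical out-degree and bug counts for six classes.
out_d = {"A": 9, "B": 7, "C": 5, "D": 4, "E": 2, "F": 1}
bugs = {"A": 5, "B": 3, "C": 2, "D": 2, "E": 1, "F": 0}

groups = split_into_groups(out_d, out_d, size=2)
group_ids = list(range(1, len(groups) + 1))          # group ID 1 = largest metric
mean_metric = [sum(out_d[c] for c in g) / len(g) for g in groups]
bugs_per_group = [sum(bugs[c] for c in g) for g in groups]

r_metric = pearson(mean_metric, bugs_per_group)      # positive in this toy data
r_group_id = pearson(group_ids, bugs_per_group)      # negative in this toy data
print(groups, r_metric, r_group_id)
```

Aggregating over groups averages out per-class noise, which is one reason group-level coefficients tend to be larger in magnitude than class-level ones.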

In Figure 5, major, regression, and critical bugs have relatively high values in both projects, and this phenomenon is most obvious for major bugs. In addition, bugs arising in a broker hinder the transfer of information between modules and the function scheduling of the entire system. As long as the modules cannot work together smoothly, the quality of the system cannot be guaranteed. A class with many outgoing links not only has more potential bugs, but the severity of those bugs is also higher. Such classes are usually the central components of a system, or the central nodes of a network.

Members of a project team are sometimes more concerned about whether they can find serious bugs. In other words, within a limited period of time they hope to find more bugs quickly and efficiently, and more severe bugs in particular. Our results offer a suitable way to ease this challenge using the BC and outD metrics.

RQ4: By comparing the eight network metrics, we selected BC and outD to predict bug proneness and severity. Overall, our results show a significant advantage of outD over the other metrics. It can be applied to identify which classes should be modified before others. The results also indicate that severe bugs usually occur in classes with strong internal complexity.

Our work provides some meaningful answers to the four questions proposed in Section 3. However, there are still some potential limitations and threats to the validity of our work.

Table 4: F-value for fixed classes, bugs, and bug severity.

Project | Top K | Classes BC | Classes outD | Classes BCD | Bugs BC | Bugs outD | Bugs BCD | Severity BC | Severity outD | Severity BCD
Tomcat | Top 50 | 0.2043 | 0.2920 | 0.2628 | 0.3760 | 0.5470 | 0.4786 | 0.2917 | 0.6667 | 0.4583
Tomcat | Top 100 | 0.2567 | 0.3102 | 0.3316 | 0.4072 | 0.4431 | 0.4551 | 0.3288 | 0.5069 | 0.3836
Tomcat | Top 150 | 0.2616 | 0.2954 | 0.2954 | 0.3594 | 0.3779 | 0.3687 | 0.2857 | 0.4184 | 0.2959
Tomcat | Top 200 | 0.2857 | 0.2857 | 0.2857 | 0.3071 | 0.3071 | 0.3521 | 0.2439 | 0.3740 | 0.2846
Ant | Top 50 | 0.1307 | 0.2111 | 0.2312 | 0.1946 | 0.4221 | 0.4322 | 0.1481 | 0.2613 | 0.2613
Ant | Top 100 | 0.2651 | 0.3534 | 0.3213 | 0.3489 | 0.6383 | 0.5957 | 0.2270 | 0.4865 | 0.4649
Ant | Top 125 | 0.3139 | 0.3358 | 0.3285 | 0.4385 | 0.5923 | 0.5846 | 0.3143 | 0.4381 | 0.4286

Bold in the original table marks the best F-value among the three metrics (BC, outD, and BCD) at each top K for predicting fixed classes, bugs, and bug severity, respectively.

Figure 5: Bug severity (Tomcat, top; Ant, bottom; the x-axis is the bug rank and the y-axis is the mean of the metric). Panels (a) to (f); the legends read PR, HITS, BC for panel (d), inD, outD, D for panel (e), and CC, EC, -(E - 0.1) for panel (f).

(i) All datasets used in our experiments were collected from two open source projects, Apache Tomcat 7 and Ant. Many other software repositories are available that would be helpful for bug analysis. In future work, we would like to improve our approach by combining more software resources, such as mailing-list and forum messages, to enrich the findings.

(ii) A limitation of our work is that we chose projects written in Java, because the tool used to construct the software networks can only handle Java source code. Whether our conclusions also hold for projects written in other languages still needs to be evaluated.

(iii) Theoretically, software networks evolve over time, so differences between versions are inevitable. In this paper, we use a stable version of each project as the basis for constructing the software network, rather than considering the structural differences between versions. Although this treatment is coarse, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple granularities, which yields a multi-granularity software network. We investigated bug proneness and severity through network metrics at the class level. Whether other granularities would give better results is an attractive topic.

Figure 6: Human factor (the x-axis is the bug rank). Panel (a) shows comments and panel (b) shows persons, each for Tomcat and Ant.

7. Conclusion

We constructed class-level software networks and applied nine representative and commonly used network metrics to bug prediction. An empirical study was conducted on two open source projects: Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from February 2002 to November 2012. Our analysis demonstrated a statistically significant relation between a class's importance and the probability that the class is modified and buggy. A class's importance is also related to the severity of the bugs raised. The results also showed that, with only about ten percent of the effort, the accuracy of our prediction for bug proneness and bug severity reaches up to 54.7% and 66.7% when returning the top 50 classes for Tomcat, and 63.8% and 48.7% when returning the top 100 classes for Ant.

We expect that our findings are insightful and can be used to support the design and development of software, helping engineers assess the risk of adding or dropping a feature given the existing dependencies between classes. We also believe that our approach can be leveraged for bug prediction and test prioritization in other open source software. Finally, our findings provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

Future work will focus mainly on two aspects. On the one hand, we will collect more open source projects (e.g., Eclipse, Mozilla, or projects hosted on SourceForge) to validate the generality of our approach. On the other hand, we will further take human factors into account, because software development is a process of human participation. Exploring the impact of sociotechnical congruence on bug prediction is urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), the National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), the Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and the Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J. Anvik, "Automating bug report assignment," in Proceedings of the 28th International Conference on Software Engineering 2006 (ICSE '06), pp. 937–940, May 2006.

[2] D. Radjenović, M. Heričko, R. Torkar, and A. Živkovič, "Software fault prediction metrics: a systematic literature review," Information and Software Technology, vol. 55, pp. 1397–1418, 2013.

[3] T. Zimmermann and N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering 2008 (ICSE '08), pp. 531–540, May 2008.

[4] A. Meneely, L. Williams, W. Snipes, and J. Osborne, "Predicting failures with developer networks and social network analysis," in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 13–23, November 2008.

[5] K. Crowston, K. Wei, Q. Li, and J. Howison, "Core and periphery in Free/Libre and Open Source software team communications," in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, "An empirical study on the relation between dependency neighborhoods and failures," in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST '11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Solé, "Scale-free networks from optimal design," Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, "Software systems as complex networks: structure, function, and evolvability of software collaboration graphs," Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-laws in a large object-oriented software system," IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, "Power laws in software," ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, "Advances in the studies on complex networks of software systems," Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, "Software complexity metrics based on complex networks," Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, "Multi-granularity evolution analysis of software using complex network theory," Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, "Node centrality in weighted networks: generalizing degree and shortest paths," Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, "Categorizing bugs with social networks: a case study on four open source software communities," in Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 1032–1041, 2013.

[16] M. S. Zanetti and F. Schweitzer, "A network perspective on software modularity," in Proceedings of the International Workshop on Complex Sciences in the Engineering of Computing Systems, pp. 175–186, 2012.

[17] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, "PageRank, HITS and a unified framework for link analysis," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–354, 2002.

[18] R. Premraj and K. Herzig, "Network versus code metrics to predict defects: a replication study," in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM '11), pp. 215–224, September 2011.

[19] M. Pinzger, N. Nagappan, and B. Murphy, "Can developer-module networks predict failures?" in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE '08), pp. 2–12, November 2008.

[20] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, "Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities," IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, 2011.

[21] A. Sureka, A. Goyal, and A. Rastogi, "Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 195–204, February 2011.

[22] F. Perin, L. Renggli, and J. Ressia, "Ranking software artifacts," in Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (ICSM '10), pp. 1–4, 2010.

[23] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, "Graph-based analysis and prediction for software evolution," in Proceedings of the International Conference on Software Engineering (ICSE '12), pp. 419–429, 2012.

[24] D. Steidl, B. Hummel, and E. Juergens, "Using network analysis for recommendation of central software classes," in Proceedings of the 19th Working Conference on Reverse Engineering, pp. 93–102, 2012.

[25] T. Wolf, A. Schröter, D. Damian, and T. Nguyen, "Predicting build failures using social network analysis," in Proceedings of the International Conference on Software Engineering (ICSE '09), pp. 1–11, May 2009.

[26] M. E. J. Newman, Networks, Oxford University Press, Oxford, UK, 2010.

[27] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

[28] L. C. Freeman, "Centrality in social networks: conceptual clarification," Social Networks, vol. 1, no. 3, pp. 215–239, 1978.

[29] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, vol. 69, no. 2, Article ID 026113, 15 pages, 2004.

[30] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, "The missing links: bugs and bug-fix commits," in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE '10), pp. 97–106, November 2010.

[31] R. Wu, H. Zhang, S. Kim, and S. C. Cheung, "ReLink: recovering links between bugs and changes," in Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering (SIGSOFT/FSE '11), pp. 15–25, September 2011.

[32] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y. G. Guéhéneuc, "Is it a bug or an enhancement? A text-based approach to classify change requests," in Proceedings of the Conference of the Center for Advanced Studies (CASCON '08), ACM, October 2008.

[33] K. Herzig, S. Just, and A. Zeller, "It's not a bug, it's a feature: how misclassification impacts bug prediction," in Proceedings of the 35th International Conference on Software Engineering, pp. 392–401, 2013.

[34] R. Tonelli, G. Concas, M. Marchesi, and A. Murgia, "An analysis of SNA metrics on the Java Qualitas Corpus," in Proceedings of the 4th India Software Engineering Conference (ISEC '11), pp. 205–213, February 2011.



[21] A Sureka A Goyal and A Rastogi ldquoUsing social networkanalysis for mining collaboration data in a defect trackingsystem for risk and vulnerability analysisrdquo in Proceedings of the4th India Software Engineering Conference (ISEC rsquo11) pp 195ndash204 February 2011

[22] F Perin L Renggli and J Ressia ldquoRanking software artifactsrdquoin Proceedings of the 4th Workshop on FAMIX and Moose inReengineering (ICSM rsquo10) pp 1ndash4 2010

[23] P Bhattacharya M Iliofotou I Neamtiu and M FaloutsosldquoGraphbased analysis and prediction for software evolutionrdquoin Proceedings of the International Conference on SoftwareEngineering (ICSE rsquo12) pp 419ndash429 2012

[24] D Steidl B Hummel and E Juergens ldquoUsing network analysisfor recommendation of central software classesrdquo in Proceedingsof the 19th Working Conference on Reverse Engineering pp 93ndash102 2012

[25] T Wolf A Schroter D Damian and T Nguyen ldquoPredictingbuild failures using social network analysisrdquo in Proceedings ofthe International Conference on Software Engineering (ICSE rsquo09)pp 1ndash11 May 2009

[26] M E J Newman Networks Oxford University Press OxfordUK 2010

[27] S Wasserman and K Faust Social Network Analysis Methodsand Applications Cambridge University Press 1994

[28] L C Freeman ldquoCentrality in social networks conceptual clari-ficationrdquo Social Networks vol 1 no 3 pp 215ndash239 1978

[29] M E J Newman and M Girvan ldquoFinding and evaluatingcommunity structure in networksrdquo Physical Review E vol 69no 2 Article ID 026113 15 pages 2004

[30] A Bachmann C Bird F Rahman P Devanbu and ABernstein ldquoThe missing links bugs and bug-fix commitsrdquo inProceedings of the 18th ACM SIGSOFT International Symposiumon the Foundations of Software Engineering (FSE rsquo10) pp 97ndash106November 2010

[31] RWuH Zhang S Kim and S C Cheung ldquoReLink recoveringlinks between bugs and changesrdquo inProceedings of the 19th ACMSIGSOFT Symposium on Foundations of Software Engineering(SIGSOFTFSE rsquo11) pp 15ndash25 September 2011

[32] G Antoniol K Ayari M Di Penta F Khomh and Y GGueheneuc ldquoIs it a bug or an enhancement A text-basedapproach to classify change requestsrdquo in Proceedings of theConference of the Center for Advanced Studies (CASCON rsquo08)ACM October 2008

[33] K Herzig S Just and A Zeller ldquoItrsquos not a bug itrsquos a feature howmisclassification impacts bug predictionrdquo in Proceedings of the35 International Conference on Software Engineering pp 392ndash401 2013

[34] R Tonelli G Concas MMarchesi and AMurgia ldquoAn analysisof SNA metrics on the Java Qualitas Corpusrdquo in Proceedingsof the 4th India Software Engineering Conference (ISEC rsquo11) pp205ndash213 February 2011


Table 4: F-value: fixed classes, bugs, and bug severity.

                   Classes                   Bugs                      Severity
Project  Top K     BC      outD    BCD       BC      outD    BCD       BC      outD    BCD
Tomcat   Top 50    0.2043  0.2920  0.2628    0.3760  0.5470  0.4786    0.2917  0.6667  0.4583
Tomcat   Top 100   0.2567  0.3102  0.3316    0.4072  0.4431  0.4551    0.3288  0.5069  0.3836
Tomcat   Top 150   0.2616  0.2954  0.2954    0.3594  0.3779  0.3687    0.2857  0.4184  0.2959
Tomcat   Top 200   0.2857  0.2857  0.2857    0.3071  0.3071  0.3521    0.2439  0.3740  0.2846
Ant      Top 50    0.1307  0.2111  0.2312    0.1946  0.4221  0.4322    0.1481  0.2613  0.2613
Ant      Top 100   0.2651  0.3534  0.3213    0.3489  0.6383  0.5957    0.2270  0.4865  0.4649
Ant      Top 125   0.3139  0.3358  0.3285    0.4385  0.5923  0.5846    0.3143  0.4381  0.4286

In the original layout, bold marked the best F-value identified by comparing the three metrics (BC, outD, and BCD) at each top K for predicting fixed classes, bugs, and bug severity, respectively.
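As a reading aid for Table 4, the following minimal sketch shows how an F-value for a top-K ranking could be computed. It assumes the F-value is the usual harmonic mean of precision and recall, which this part of the paper does not restate. The function name `f_value_at_k` and the toy class names are hypothetical and are not taken from the study.

```python
def f_value_at_k(ranked_classes, buggy_classes, k):
    """F-value of predicting the top-k ranked classes as buggy.

    Assumes F = 2PR / (P + R); the names and data are illustrative only.
    """
    predicted = set(ranked_classes[:k])
    actual = set(buggy_classes)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)   # share of returned classes that are buggy
    recall = hits / len(actual)         # share of buggy classes that were returned
    return 2 * precision * recall / (precision + recall)


# Toy ranking (for example, by out-degree) and a toy set of classes with fixed bugs.
ranking = ["A", "B", "C", "D", "E"]
buggy = {"A", "C", "F"}
score = f_value_at_k(ranking, buggy, k=2)
```

Under this reading, each cell of Table 4 would correspond to one call with a given metric's ranking, one ground-truth set, and one value of K.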

Figure 5: Bug severity (Tomcat, top; Ant, bottom; the x-axis is the bug rank and the y-axis is the mean of the metric). Panels (a) to (f); legends as extracted: PR, HITS, BC in (d); inD, outD, D in (e); CC, EC, -(E - 0.1) in (f).

software repositories that are helpful to bug analysis. We would like to improve our approach in future work by combining more software resources, such as mailing-list and forum messages, to enrich the findings.

(ii) A limitation of our work is that we chose projects written in Java, because the tool used to construct the software network can only deal with Java source code. Whether our conclusions hold for projects written in other languages still needs to be evaluated.

(iii) Theoretically, software networks evolve over time; therefore, differences between multiple versions are inevitable. In this paper, we use a stable version of each project as the construction standard of the software network instead of considering the structural differences between versions. Although such treatment is rough, it will not affect the final experimental results.

(iv) As mentioned at the beginning, a software system can be characterized at multiple granularities, that is, as a multi-granularity software network. We investigate bug proneness and severity through network metrics at the class level. Whether the results would be more encouraging at other granularities is an attractive topic.

Figure 6: Human factor (the x-axis is the bug rank). Panels: (a) Comments, (b) Persons; series: Tomcat, Ant.

7. Conclusion

We constructed class-level software networks and applied nine representative and commonly used network metrics to bug prediction. An empirical study was conducted on the open-source projects Apache Tomcat 7, from version 7.0.0 to 7.0.33, and Ant 1.8.4, from February 2002 to November 2012. Our analysis demonstrated that there is a statistically significant relation between a class's importance and the probability that the class is modified and buggy. A class's importance is also related to the severity of raised bugs. The results further showed that, with only about ten percent of the effort, the accuracy of our prediction for bug proneness and bug severity can be up to 54.7% and 66.7% (Tomcat) and 63.8% and 48.7% (Ant), respectively, when returning the top 50 and top 100 classes in these two cases.
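To make the top-K idea concrete, the following minimal sketch ranks classes by out-degree in a small dependency network and returns the first K. It is only an illustration under stated assumptions: an edge from source to target is taken to mean "source depends on target", and the class names and the helper `top_k_by_out_degree` are hypothetical, not part of the study's tooling.

```python
from collections import defaultdict


def top_k_by_out_degree(dependencies, k):
    """Rank classes by out-degree in a class-level dependency network.

    `dependencies` is an iterable of (source, target) pairs; an edge
    source -> target is assumed to mean "source depends on target".
    """
    targets = defaultdict(set)
    nodes = set()
    for source, target in dependencies:
        nodes.update((source, target))
        if source != target:            # ignore self-dependencies
            targets[source].add(target)
    # Sort by descending out-degree, breaking ties by class name.
    ranked = sorted(nodes, key=lambda c: (-len(targets[c]), c))
    return ranked[:k]


# Hypothetical toy network; not data from Tomcat or Ant.
edges = [
    ("Server", "Connector"),
    ("Server", "Logger"),
    ("Server", "Config"),
    ("Connector", "Logger"),
    ("Connector", "Config"),
    ("Config", "Logger"),
]
top2 = top_k_by_out_degree(edges, k=2)
```

The same ranking step could in principle use betweenness centrality in place of out-degree; only the scoring function would change.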

We expect that our findings are insightful and can be used to support the design and development of software, helping engineers assess the risk of adding or dropping a feature given the existing dependencies between classes. We also believe that our approach can be leveraged for bug prediction and test prioritization in other open-source software. Finally, our findings provide additional empirical evidence on the importance of dependencies between classes to researchers in the social network analysis domain.

Future work will mainly focus on two aspects. On the one hand, we will collect more open-source projects (e.g., Eclipse, Mozilla, or projects deployed on SourceForge) to validate the generality of our approach. On the other hand, we will further take human factors into account, because software development is a process of human participation. An exploration of the impact of sociotechnical congruence on bug prediction is urgent and meaningful.

Acknowledgment

This work is supported by the National Basic Research Program of China (no. 2014CB340401), the National Natural Science Foundation of China (nos. 61273216, 61272111, 61202048, and 61202032), the Wuhan Planning Project of Science and Technology (no. 201210621214), the Open Foundation of Jiangsu Provincial Key Laboratory of Electronic Business (no. JSEB2012-02), and the Zhejiang Provincial Natural Science Foundation of China (no. LQ12F02011).

References

[1] J Anvik ldquoAutomating bug report assignmentrdquo in Proceedings ofthe 28th International Conference on Software Engineering 2006(ICSE rsquo06) pp 937ndash940 May 2006

[2] D Radjenovica M Herickob R Torkarc and A ZivkovicbldquoSoftware fault prediction metrics a systematic literaturereviewrdquo Information and Software Technology vol 55 pp 1397ndash1418 2013

[3] T Zimmermann and N Nagappan ldquoPredicting defects usingnetwork analysis on dependency graphsrdquo in Proceedings of the30th International Conference on Software Engineering 2008(ICSE rsquo08) pp 531ndash540 May 2008

[4] A Meneely L Williams W Snipes and J Osborne ldquoPredictingfailures with developer networks and social network analysisrdquoin Proceedings of the 16th ACM SIGSOFT International Sym-posium on the Foundations of Software Engineering (SIGSOFT2008FSE rsquo08) pp 13ndash23 November 2008

12 Mathematical Problems in Engineering

[5] KCrowstonKWeiQ Li and JHowison ldquoCore and peripheryin FreeLibre and Open Source software team communica-tionsrdquo in Proceedings of the 39th Annual Hawaii InternationalConference on System Sciences (HICSS rsquo06) p 118 January 2006

[6] T Zimmerman N Nagappan K Herzig R Premraj and LWilliams ldquoAn empirical study on the relation between depen-dency neighborhoods and failuresrdquo in Proceedings of the 4thIEEE International Conference on Software Testing Verificationand Validation (ICST rsquo11) pp 347ndash356 March 2011

[7] S Valverde R F Cancho and R V Sole ldquoScale-free networksfrom optimal designrdquo Europhysics Letters vol 60 no 4 pp 512ndash517 2002

[8] C R Myers ldquoSoftware systems as complex networks structurefunction and evolvability of software collaboration graphsrdquoPhysical Review E vol 68 no 4 Article ID 046116 15 pages2003

[9] G Concas M Marchesi S Pinna and N Serra ldquoPower-laws ina large object-oriented software systemrdquo IEEE Transactions onSoftware Engineering vol 33 no 10 pp 687ndash708 2007

[10] P Louridas D Spinellis and V Vlachos ldquoPower laws insoftwarerdquo ACM Transactions on Software Engineering andMethodology vol 18 no 1 article 2 2008

[11] B Li YutaoMa and J Liu ldquoAdvances in the studies on complexnetworks of software systemsrdquo Advances in Mechanics vol 38no 6 pp 805ndash814 2008

[12] B Li H Wang Z Y Li K Q He and D H Yu ldquoSoftware com-plexity metrics based on complex networksrdquo Acta ElectronicaSinica vol 34 pp 2371ndash2375 2006

[13] W Pan B Li Y Ma and J Liu ldquoMulti-granularity evolutionanalysis of software using complex network theoryrdquo Journal ofSystems Science and Complexity vol 24 no 6 pp 1068ndash10822011

[14] T Opsahl F Agneessens and J Skvoretz ldquoNode centrality inweighted networks generalizing degree and shortest pathsrdquoSocial Networks vol 32 no 3 pp 245ndash251 2010

[15] M S Zanetti I Scholtes C J Tessone and F SchweitzerldquoCategorizing bugs with social networks a case study on fouropen source software communitiesrdquo in Proceedings of the 35thInternational Conference on Software Engineering (ICSE rsquo13) pp1032ndash1041 2013

[16] M S Zanetti and F Schweitzer ldquoA network perspective on soft-ware modularityrdquo in Proceedings of the International Workshopon Complex Sciences in the Engineering of Computing Systemspp 175ndash186 2012

[17] C Ding X He P Husbands H Zha and H D SimonldquoPageRank HITS and a unified framework for link analysisrdquo inProceedings of the 25th Annual International ACM SIGIR Con-ference on Research and Development in Information Retrievalp 353 354 2002

[18] R Premraj and K Herzig ldquoNetwork versus code metrics topredict defects a replication studyrdquo in Proceedings of the 5thInternational Symposium on Empirical Software Engineering andMeasurement (ESEM rsquo11) pp 215ndash224 September 2011

[19] M Pinzger N Nagappan and B Murphy ldquoCan developer-module networks predict failuresrdquo in Proceedings of the 16thACM SIGSOFT International Symposium on the Foundationsof Software Engineering (SIGSOFT 2008FSE rsquo08) pp 2ndash12November 2008

[20] Y Shin A Meneely LWilliams and J A Osborne ldquoEvaluatingcomplexity code churn and developer activity metrics asindicators of software vulnerabilitiesrdquo IEEE Transactions onSoftware Engineering vol 37 no 6 pp 772ndash787 2011

[21] A Sureka A Goyal and A Rastogi ldquoUsing social networkanalysis for mining collaboration data in a defect trackingsystem for risk and vulnerability analysisrdquo in Proceedings of the4th India Software Engineering Conference (ISEC rsquo11) pp 195ndash204 February 2011

[22] F Perin L Renggli and J Ressia ldquoRanking software artifactsrdquoin Proceedings of the 4th Workshop on FAMIX and Moose inReengineering (ICSM rsquo10) pp 1ndash4 2010

[23] P Bhattacharya M Iliofotou I Neamtiu and M FaloutsosldquoGraphbased analysis and prediction for software evolutionrdquoin Proceedings of the International Conference on SoftwareEngineering (ICSE rsquo12) pp 419ndash429 2012

[24] D Steidl B Hummel and E Juergens ldquoUsing network analysisfor recommendation of central software classesrdquo in Proceedingsof the 19th Working Conference on Reverse Engineering pp 93ndash102 2012

[25] T Wolf A Schroter D Damian and T Nguyen ldquoPredictingbuild failures using social network analysisrdquo in Proceedings ofthe International Conference on Software Engineering (ICSE rsquo09)pp 1ndash11 May 2009

[26] M E J Newman Networks Oxford University Press OxfordUK 2010

[27] S Wasserman and K Faust Social Network Analysis Methodsand Applications Cambridge University Press 1994

[28] L C Freeman ldquoCentrality in social networks conceptual clari-ficationrdquo Social Networks vol 1 no 3 pp 215ndash239 1978

[29] M E J Newman and M Girvan ldquoFinding and evaluatingcommunity structure in networksrdquo Physical Review E vol 69no 2 Article ID 026113 15 pages 2004

[30] A Bachmann C Bird F Rahman P Devanbu and ABernstein ldquoThe missing links bugs and bug-fix commitsrdquo inProceedings of the 18th ACM SIGSOFT International Symposiumon the Foundations of Software Engineering (FSE rsquo10) pp 97ndash106November 2010

[31] RWuH Zhang S Kim and S C Cheung ldquoReLink recoveringlinks between bugs and changesrdquo inProceedings of the 19th ACMSIGSOFT Symposium on Foundations of Software Engineering(SIGSOFTFSE rsquo11) pp 15ndash25 September 2011

[32] G Antoniol K Ayari M Di Penta F Khomh and Y GGueheneuc ldquoIs it a bug or an enhancement A text-basedapproach to classify change requestsrdquo in Proceedings of theConference of the Center for Advanced Studies (CASCON rsquo08)ACM October 2008

[33] K Herzig S Just and A Zeller ldquoItrsquos not a bug itrsquos a feature howmisclassification impacts bug predictionrdquo in Proceedings of the35 International Conference on Software Engineering pp 392ndash401 2013

[34] R Tonelli G Concas MMarchesi and AMurgia ldquoAn analysisof SNA metrics on the Java Qualitas Corpusrdquo in Proceedingsof the 4th India Software Engineering Conference (ISEC rsquo11) pp205ndash213 February 2011



Page 12: Research Article Using Software Dependency to Bug Predictiondownloads.hindawi.com/journals/mpe/2013/869356.pdfdefects. ey found that network metrics perform signi - cantly better than

12 Mathematical Problems in Engineering

[5] K. Crowston, K. Wei, Q. Li, and J. Howison, “Core and periphery in Free/Libre and Open Source software team communications,” in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS ’06), p. 118, January 2006.

[6] T. Zimmermann, N. Nagappan, K. Herzig, R. Premraj, and L. Williams, “An empirical study on the relation between dependency neighborhoods and failures,” in Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation (ICST ’11), pp. 347–356, March 2011.

[7] S. Valverde, R. F. Cancho, and R. V. Sole, “Scale-free networks from optimal design,” Europhysics Letters, vol. 60, no. 4, pp. 512–517, 2002.

[8] C. R. Myers, “Software systems as complex networks: structure, function, and evolvability of software collaboration graphs,” Physical Review E, vol. 68, no. 4, Article ID 046116, 15 pages, 2003.

[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra, “Power-laws in a large object-oriented software system,” IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687–708, 2007.

[10] P. Louridas, D. Spinellis, and V. Vlachos, “Power laws in software,” ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, article 2, 2008.

[11] B. Li, Y. Ma, and J. Liu, “Advances in the studies on complex networks of software systems,” Advances in Mechanics, vol. 38, no. 6, pp. 805–814, 2008.

[12] B. Li, H. Wang, Z. Y. Li, K. Q. He, and D. H. Yu, “Software complexity metrics based on complex networks,” Acta Electronica Sinica, vol. 34, pp. 2371–2375, 2006.

[13] W. Pan, B. Li, Y. Ma, and J. Liu, “Multi-granularity evolution analysis of software using complex network theory,” Journal of Systems Science and Complexity, vol. 24, no. 6, pp. 1068–1082, 2011.

[14] T. Opsahl, F. Agneessens, and J. Skvoretz, “Node centrality in weighted networks: generalizing degree and shortest paths,” Social Networks, vol. 32, no. 3, pp. 245–251, 2010.

[15] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, “Categorizing bugs with social networks: a case study on four open source software communities,” in Proceedings of the 35th International Conference on Software Engineering (ICSE ’13), pp. 1032–1041, 2013.

[16] M. S. Zanetti and F. Schweitzer, “A network perspective on software modularity,” in Proceedings of the International Workshop on Complex Sciences in the Engineering of Computing Systems, pp. 175–186, 2012.

[17] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, “PageRank, HITS and a unified framework for link analysis,” in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–354, 2002.

[18] R. Premraj and K. Herzig, “Network versus code metrics to predict defects: a replication study,” in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM ’11), pp. 215–224, September 2011.

[19] M. Pinzger, N. Nagappan, and B. Murphy, “Can developer-module networks predict failures?” in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT 2008/FSE ’08), pp. 2–12, November 2008.

[20] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, “Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities,” IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, 2011.

[21] A. Sureka, A. Goyal, and A. Rastogi, “Using social network analysis for mining collaboration data in a defect tracking system for risk and vulnerability analysis,” in Proceedings of the 4th India Software Engineering Conference (ISEC ’11), pp. 195–204, February 2011.

[22] F. Perin, L. Renggli, and J. Ressia, “Ranking software artifacts,” in Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (ICSM ’10), pp. 1–4, 2010.

[23] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, “Graph-based analysis and prediction for software evolution,” in Proceedings of the International Conference on Software Engineering (ICSE ’12), pp. 419–429, 2012.

[24] D. Steidl, B. Hummel, and E. Juergens, “Using network analysis for recommendation of central software classes,” in Proceedings of the 19th Working Conference on Reverse Engineering, pp. 93–102, 2012.

[25] T. Wolf, A. Schroter, D. Damian, and T. Nguyen, “Predicting build failures using social network analysis,” in Proceedings of the International Conference on Software Engineering (ICSE ’09), pp. 1–11, May 2009.

[26] M. E. J. Newman, Networks, Oxford University Press, Oxford, UK, 2010.

[27] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

[28] L. C. Freeman, “Centrality in social networks conceptual clarification,” Social Networks, vol. 1, no. 3, pp. 215–239, 1978.

[29] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, Article ID 026113, 15 pages, 2004.

[30] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, “The missing links: bugs and bug-fix commits,” in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE ’10), pp. 97–106, November 2010.

[31] R. Wu, H. Zhang, S. Kim, and S. C. Cheung, “ReLink: recovering links between bugs and changes,” in Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering (SIGSOFT/FSE ’11), pp. 15–25, September 2011.

[32] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y. G. Gueheneuc, “Is it a bug or an enhancement? A text-based approach to classify change requests,” in Proceedings of the Conference of the Center for Advanced Studies (CASCON ’08), ACM, October 2008.

