Master Thesis presentation

Analysis of Advanced Aggregation Techniques for Software Metrics

Final presentation

Bogdan [email protected]

Supervisor: Dr. Alexander Serebrenik

July 20, 2011

2/32

/ department of mathematics and computer science


Most metrics do not have a definition at system level.


3/32



“Designing a sound aggregation of software metrics is not obvious and it is still an open issue.” [CSS09]

Goal: Derive requirements for aggregation techniques for software metrics.


4/32


Aggregation of software metrics

Many to one:
- Same artifact
- Different metrics

Example: Maintainability Index

One to many:
- Same metric
- Different artifacts

Example: Weighted Methods per Class


5/32


Approach

Derive requirements for one-to-many aggregation techniques for software metrics



Study existing aggregation techniques:
- traditional (e.g., mean, median)
- inequality indices (e.g., Gini, Theil)
- threshold-based (e.g., SIG, Squale)

Theoretical analysis

Empirical analysis

6/32


Inequality indices

Econometrics: measure/explain the inequality of income or wealth.

Software metrics and econometric variables have distributions with similar shapes.

[Figure: two histograms. Left: Source Lines of Code, freecol-0.9.4 (x-axis: SLOC per class, y-axis: frequency). Right: Household income in Ilocos, Philippines (1998) (x-axis: income, y-axis: frequency).]

7/32


Degree of concentration of functionality

[Figure: Lorenz curve for SLOC in Hibernate 3.6.0-beta4; x-axis: % classes, y-axis: % SLOC.]

Measure inequality between:
- individuals (e.g., classes)
- groups (e.g., components)





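$I_{Hoover}$: the maximum vertical distance between the line of equality and the Lorenz curve.

$I_{Gini} = \frac{A}{A+B} = 2A$, where $A$ is the area between the line of equality and the Lorenz curve and $B$ is the area under the Lorenz curve, so that $A + B = \frac{1}{2}$.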
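A minimal Python sketch of the two Lorenz-curve-based indices, assuming the usual construction: values sorted in ascending order, cumulative population share on the x-axis and cumulative value share (e.g., SLOC) on the y-axis. The function names `lorenz_points`, `gini` and `hoover` are illustrative and not taken from the thesis tooling. $B$ is computed with the trapezoid rule, which is exact for the piecewise-linear Lorenz curve.

```python
def lorenz_points(xs):
    """Return the Lorenz curve vertices as (population share, value share)."""
    s = sorted(xs)
    total = sum(s)
    n = len(s)
    points = [(0.0, 0.0)]
    cumulative = 0.0
    for i, x in enumerate(s, start=1):
        cumulative += x
        points.append((i / n, cumulative / total))
    return points


def gini(xs):
    """I_Gini = A / (A + B) = 2A, because A + B = 1/2 (area under the diagonal)."""
    pts = lorenz_points(xs)
    # B: area under the piecewise-linear Lorenz curve (trapezoid rule).
    b = sum((p2 - p1) * (l1 + l2) / 2 for (p1, l1), (p2, l2) in zip(pts, pts[1:]))
    a = 0.5 - b
    return 2 * a


def hoover(xs):
    """Maximum vertical distance between the line of equality and the Lorenz curve."""
    return max(p - l for p, l in lorenz_points(xs))


sloc = [20, 36, 45]  # hypothetical SLOC values of three classes
print(round(gini(sloc), 4), round(hoover(sloc), 4))
```

For a system where one class holds all the code, e.g. `[0, 0, 0, 1]`, both indices reach their maximum for four classes, 0.75.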






When computing the inequality within the entire population, it is often desirable to assess the contribution of the inequality between the groups.

Decomposability:

$I(X) = I_{within} + I_{between} = \sum_{j=1}^{m} \omega_j I(X_j) + I_{between}$
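The decomposition can be checked numerically for $I_{Theil}$. The sketch below assumes the standard Theil weights $\omega_j = \frac{n_j \mu_j}{n \mu}$, i.e., the share of the total value held by group $X_j$, and strictly positive values (the index uses logarithms). The group data are made up for illustration.

```python
import math


def theil(xs):
    """Theil index: (1/n) * sum((x/mu) * ln(x/mu)); requires positive values."""
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs) / len(xs)


def theil_decomposition(groups):
    """Return (I(X), I_within, I_between) for a partition of X into groups."""
    values = [x for g in groups for x in g]
    total = sum(values)
    mu = total / len(values)
    within = between = 0.0
    for g in groups:
        weight = sum(g) / total   # omega_j: the group's share of the total
        mu_j = sum(g) / len(g)    # group mean
        within += weight * theil(g)
        between += weight * math.log(mu_j / mu)
    return theil(values), within, between


# Hypothetical SLOC of the classes in two packages.
packages = [[10, 20, 30], [100, 400]]
total, within, between = theil_decomposition(packages)
print(f"I(X)={total:.4f} within={within:.4f} between={between:.4f}")
```

The identity holds exactly, because $\frac{x}{\mu}\ln\frac{x}{\mu} = \frac{x}{\mu}\left(\ln\frac{x}{\mu_j} + \ln\frac{\mu_j}{\mu}\right)$ splits every term into a within-group part and a between-group part.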




8/32


Traceability via decomposability

Share of inequality explained by the partitioning $G = \{G_1, \ldots, G_m\}$:

$R(G) = \frac{I_{between}(G)}{I(X)}$

Which individuals (classes in package) contribute to 80% of the inequality of SLOC?

Which class contributes the most to the inequality?




Lemma. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a collection of values such that $x_1 \le x_i \le x_n$ for all $i$. Then it is either $x_1$ or $x_n$ that contributes the most to the inequality measured using $I_{Theil}$, i.e., either the partitioning $(\{x_1\}, X \setminus \{x_1\})$ or the partitioning $(\{x_n\}, X \setminus \{x_n\})$ provides the best explanation for the inequality measured using $I_{Theil}$.
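The lemma can be illustrated by brute force: for every value $x_k$, compute $R(G)$ for the partitioning $(\{x_k\}, X \setminus \{x_k\})$ and keep the value with the largest share. This is a sketch on made-up, deliberately unsorted data, not the thesis tooling; the between-group term uses weights equal to each group's share of the total, as in the Theil decomposition.

```python
import math


def theil(xs):
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs) / len(xs)


def theil_between(groups):
    """Between-group Theil term: sum_j share_j * ln(mu_j / mu)."""
    values = [x for g in groups for x in g]
    total = sum(values)
    mu = total / len(values)
    return sum((sum(g) / total) * math.log((sum(g) / len(g)) / mu) for g in groups)


def best_singleton(xs):
    """Return (x_k, R) for the singleton partition explaining most inequality."""
    overall = theil(xs)
    best = None
    for k, x in enumerate(xs):
        rest = xs[:k] + xs[k + 1:]
        r = theil_between([[x], rest]) / overall  # R(G) = I_between(G) / I(X)
        if best is None or r > best[1]:
            best = (x, r)
    return best


sloc = [9, 3, 40, 8, 7]  # hypothetical SLOC per class
value, share = best_singleton(sloc)
print(value, round(share, 3))
```

By the lemma, `value` is always either the smallest or the largest element, whatever the input order.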

9/32


Other properties of inequality indices

Symmetry

Inequality stays the same for any permutation of the population.


10/32



Population principle

Inequality does not change if the population is replicated any number of times.


11/32



Transfers principle

A transfer from a rich man to a poor man (without reversing their position) should decrease inequality.

[Figure: transfer example (values shown: 20, 36, 45; 30, 36).]
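A quick numerical check of the transfers principle, starting from the values 20, 36, 45: moving 5 units from the richest individual (45) to the poorest (20) keeps their order and should lower the index. The Gini formula used here (mean absolute difference divided by twice the mean) is the standard one; the transfer amount is an illustrative choice.

```python
def gini(xs):
    """Gini index as the mean absolute difference divided by twice the mean."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mu)


before = [20, 36, 45]
after = [25, 36, 40]  # 5 units moved from 45 to 20; ranks unchanged
print(round(gini(before), 3), round(gini(after), 3))
```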


12/32



Scale invariance

Inequality does not change if all values are multiplied by the same constant.
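Scale invariance is easy to verify numerically. The sketch below, using hypothetical SLOC values, shows that $I_{Theil}$ is unchanged when every value is multiplied by the same constant, whereas adding the same constant to every value does change it.

```python
import math


def theil(xs):
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs) / len(xs)


sloc = [12, 30, 55, 210]
scaled = [10 * x for x in sloc]    # multiply every value by c = 10
shifted = [x + 100 for x in sloc]  # add c = 100 to every value

print(theil(sloc), theil(scaled), theil(shifted))
```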


13/32


Summary

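Ineq. index     Sym.  Inv.  Dec.  Pop.  Tra.
Gini            X     ×           X     X
Theil           X     ×     X     X     X
MLD             X     ×     X     X     X
Hoover          X     ×           X
Atkinson (α)    X     ×     X     X     X
Kolm (β)        X     +     X     X     X

Sym. = symmetry; Inv. = invariance (× under multiplication, + under addition); Dec. = decomposability; Pop. = population principle; Tra. = transfers principle.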

Problems include:
- The domain is not always $\mathbb{R}^n$.
- No distinction between all values being equal but low and all values being equal but high.
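To make the "Inv." column concrete, the sketch below implements three more indices from their textbook definitions. The parameter values ($\varepsilon = 0.5$ for Atkinson, $\beta = 1$ for Kolm) are assumptions chosen only for illustration. $I_{MLD}$ and $I_{Atkinson}$ are unchanged under multiplication, $I_{Kolm}$ under addition, and every index returns 0 both for an all-low and for an all-high system, which is the second problem listed above.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def mld(xs):
    """Mean log deviation: (1/n) * sum(ln(mu / x)); requires positive values."""
    mu = mean(xs)
    return mean([math.log(mu / x) for x in xs])


def atkinson(xs, eps=0.5):
    """Atkinson index for eps != 1: 1 - (generalized mean of order 1 - eps) / mu."""
    mu = mean(xs)
    ede = mean([x ** (1 - eps) for x in xs]) ** (1 / (1 - eps))
    return 1 - ede / mu


def kolm(xs, beta=1.0):
    """Kolm index: (1/beta) * ln(mean(exp(beta * (mu - x))))."""
    mu = mean(xs)
    return math.log(mean([math.exp(beta * (mu - x)) for x in xs])) / beta


x = [1.0, 2.0, 4.0]
times3 = [3 * v for v in x]  # multiplication by a constant
plus5 = [v + 5 for v in x]   # addition of a constant

print(mld(x), mld(times3))          # equal
print(kolm(x), kolm(plus5))         # equal
print(mld([2.0] * 4), mld([200.0] * 4))  # both 0: low and high look the same
```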

14/32


Threshold-based aggregation techniques

Two types:

- hard thresholds: improvements in quality are not reflected as long as the metrics stay within certain boundaries (e.g., SIG).
- soft thresholds: do not exhibit staircasing effects (e.g., Squale).
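The staircasing effect can be illustrated with two hypothetical mappings from SLOC per method to a mark in [0, 3]. Neither the threshold values nor the exponential curve below are the actual SIG or Squale calibrations; they are placeholders chosen only to contrast a step function with a continuous one.

```python
def hard_mark(sloc):
    """Hypothetical hard thresholds: a step function of SLOC per method."""
    if sloc <= 15:
        return 3
    if sloc <= 30:
        return 2
    if sloc <= 60:
        return 1
    return 0


def soft_mark(sloc):
    """Hypothetical soft mapping: 3 up to 15 SLOC, then halves every 15 SLOC."""
    return min(3.0, 3.0 * 2 ** ((15 - sloc) / 15))


# Shrinking a method from 55 to 35 SLOC.
print(hard_mark(55), hard_mark(35))  # same mark: the improvement is not reflected
print(soft_mark(55), soft_mark(35))  # the mark increases
```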

15/32


The Squale Quality Model

Metrics → Individual Marks (IM) in [0, 3] → Global Mark in [0, 3]

[Figure: individual mark (IM), from 0.0 to 3.0, as a function of SLOC per method.]


16/32


Properties of Squale aggregation

- Symmetry
- Population principle
- Anti-transfers principle

[Figure: transfer example (values shown: 20, 36, 45; 30, 36).]
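The anti-transfers behaviour can be reproduced with a sketch of the Squale aggregation, assuming the form $I^{\lambda}_{Squale}(x_1, \ldots, x_n) = -\log_{\lambda}\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right)$ (the form under which the lemma relating Squale to $I_{Kolm}$ holds) and an illustrative $\lambda = 9$. Moving 0.5 from a high mark to a low mark, without reversing their order, raises the global mark, whereas an inequality index would drop.

```python
import math


def squale(marks, lam=9.0):
    """Assumed Squale global mark: -log_lam(mean(lam ** -mark))."""
    avg = sum(lam ** -m for m in marks) / len(marks)
    return -math.log(avg, lam)


before = [0.5, 3.0]
after = [1.0, 2.5]  # 0.5 moved from the higher mark to the lower one
print(round(squale(before), 3), round(squale(after), 3))
```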

17/32


Properties of Squale aggregation

Lemma. $I^{\log\lambda}_{Kolm}(x_1, \ldots, x_n) + I^{\lambda}_{Squale}(x_1, \ldots, x_n) = \bar{x}$

Lemma. For all $c \in \mathbb{R}$ it holds that $I^{\lambda}_{Squale}$ is “unit translatable”, i.e.,

$I^{\lambda}_{Squale}(x_1 + c, \ldots, x_n + c) = I^{\lambda}_{Squale}(x_1, \ldots, x_n) + c$

Inequality indices, by contrast, are invariant with respect to either multiplication or addition.
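Both lemmas can be checked numerically. The sketch assumes the Squale form $I^{\lambda}_{Squale} = -\log_{\lambda}\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right)$ and the Kolm index $I^{\beta}_{Kolm} = \frac{1}{\beta}\ln\left(\frac{1}{n}\sum_i e^{\beta(\bar{x} - x_i)}\right)$. With $\beta = \ln\lambda$ the first lemma follows directly: $\frac{1}{\ln\lambda}\ln\left(\frac{1}{n}\sum_i \lambda^{\bar{x}-x_i}\right) = \bar{x} + \log_{\lambda}\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right) = \bar{x} - I^{\lambda}_{Squale}$. The marks and $\lambda$ below are illustrative.

```python
import math


def squale(xs, lam):
    """Assumed Squale aggregate: -log_lam(mean(lam ** -x))."""
    return -math.log(sum(lam ** -x for x in xs) / len(xs), lam)


def kolm(xs, beta):
    """Kolm index: (1/beta) * ln(mean(exp(beta * (mean - x))))."""
    mu = sum(xs) / len(xs)
    return math.log(sum(math.exp(beta * (mu - x)) for x in xs) / len(xs)) / beta


marks = [0.5, 1.2, 2.0, 3.0]
lam = 9.0
mean = sum(marks) / len(marks)

# Lemma 1: Kolm with beta = ln(lambda) plus Squale equals the mean.
print(kolm(marks, math.log(lam)) + squale(marks, lam), mean)
# Lemma 2: adding c to every value adds c to the Squale aggregate.
c = 0.7
print(squale([x + c for x in marks], lam) - squale(marks, lam))
```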

18/32


Summary

We distill:

- Highlighting undesirable values in the aggregated result.

However, problems include:

- Thresholds should be derived and validated.
- A high rating is not necessarily an indication of good software engineering practices.
- Not decomposable.


20/32


Empirical evaluation

21/32


Pilot study

Aggregate SLOC from class to package level.

Study statistical correlation between:
- aggregation techniques and the number of defects per package;
- pairs of aggregation techniques.

Case studies: ArgoUML, Adempiere, Mogwai.

Questions:
- Does the aggregation technique influence the correlation with bugs?
  • Correlation between SLOC and defects is not strong, and is influenced by the aggregation technique.
- Which aggregation techniques convey the same information?
  • $I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Hoover}$, and $I_{Atkinson}$ convey the same information.
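The first step of the pilot, lifting class-level SLOC to package level, can be sketched as follows. The package names and SLOC values are hypothetical. Two packages with the same sum and mean can differ sharply in median and in inequality, which is one reason the choice of technique may affect the correlation with defects.

```python
from collections import defaultdict
from statistics import mean, median


def gini(xs):
    """Gini index as the mean absolute difference divided by twice the mean."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mu)


# Hypothetical (package, class, SLOC) records.
classes = [
    ("core", "A", 100), ("core", "B", 100), ("core", "C", 100), ("core", "D", 100),
    ("ui", "E", 10), ("ui", "F", 20), ("ui", "G", 30), ("ui", "H", 340),
]

# Group class-level values by package.
by_package = defaultdict(list)
for package, _cls, sloc in classes:
    by_package[package].append(sloc)

# One aggregate per technique and package.
aggregates = {
    pkg: {"sum": sum(v), "mean": mean(v), "median": median(v), "gini": gini(v)}
    for pkg, v in by_package.items()
}
for pkg, agg in aggregates.items():
    print(pkg, agg)
```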


22/32


Threats to validity

Threat        Pilot                        Subsequent studies
Metric        SLOC                         SLOC, LOC, NOS, NOSt, DIT, NOC, PBS, PLwC
System        ArgoUML, Adempiere, Mogwai   Qualitas Corpus: 106 Java open-source systems (430K files, 57 MSLOC)
Version       single                       414 versions from 13/106 systems (> 10 versions)
Technique     traditional, ineq. indices   traditional, ineq. indices, threshold-based
Aggr. level   class–package                class–package, method–class


23/32


Results (1)

$I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Atkinson}$, and $I_{Hoover}$ always convey the same information.

[Figure: Kendall correlation coefficients (from -1.0 to 1.0) for pairs of inequality indices. SLOC: MLD-Hoo (91%), Gin-MLD (89%), The-MLD (91%), Gin-Hoo (90%), Atk-Hoo (92%), The-Hoo (92%), Gin-Atk (90%), MLD-Atk (91%), Gin-The (91%), The-Atk (92%). DIT: MLD-Hoo (85%), Atk-Hoo (87%), Gin-MLD (87%), The-Hoo (88%), Gin-Atk (88%), Gin-Hoo (89%), Gin-The (88%), The-MLD (88%), The-Atk (88%), MLD-Atk (89%).]

24/32


Results (2)

$I_{Kolm}$ shows high correlation with the mean for size metrics.

[Figure: Kendall correlation coefficients between the mean and Kolm for SLOC, DIT, and PLwC (two panels per metric).]

25/32


Results (3)

Superlinear (e.g., $I_{Theil}$–$I_{Gini}$) and chaotic (e.g., $I_{Theil}$–$I_{Kolm}$) patterns can be observed in the scatter plots.

[Figure: compiere scatter plots. Theil (SLOC) against Gini (SLOC): Kendall 0.94, p-value 0.00. Theil (SLOC) against Kolm (SLOC): Kendall 0.25, p-value 0.01.]

26/32


Results (4)

Changing the aggregation level to class level does not affect the correlation between various aggregation techniques as measured at package level.

[Figure: Kendall correlation coefficients for Gini-Theil (SLOC) (100%), Theil-Atkinson (SLOC) (100%), and Theil-MLD (SLOC) (100%), two panels each.]

27/32


Results (5)

System size does influence the correlation between aggregation techniques, e.g., the correlation between $I_{Theil}$ and $I_{Kolm}$ increases with system size.

[Figure: hibernate, Kendall(Theil(SLOC), Kolm(SLOC)) over 86 releases up to 3.6.0-beta4; y-axis: correlation coefficient Theil(SLOC)-Kolm(SLOC), from 0.0 to 1.0.]

28/32


Results (6)

SIG and Squale correlate positively with each other and negatively with all other aggregation techniques.

[Figure: Kendall correlation coefficients for Squale(3)-SIGd (SLOC) (95%), Gini-Squale(3) (SLOC) (95%), and Theil-Squale(3) (SLOC) (95%), two panels each.]

29/32


Results (7)

Inequality indices are less appropriate for highlighting undesirable values unless assumptions about their number can be made.

[Figure: Squale (weight = 3), Theil, and Kolm aggregates for different percentages of perfect IMs. x-axis: percentage of imperfect marks (0 to 100); y-axes: average aggregate and average mean; curves for the ranges [2, 3), [1, 2), [0.5, 1), [0.1, 0.5), and (0, 0.1).]
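The reason can be illustrated with a small simulation under simplifying assumptions: ten individual marks, each either perfect (3.0) or imperfect (0.5), and the Squale form $-\log_{\lambda}\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right)$ with an illustrative $\lambda = 9$. $I_{Theil}$ is 0 both when no mark and when every mark is imperfect, so its value alone does not reveal how many undesirable values there are, whereas the Squale mark decreases monotonically.

```python
import math


def theil(xs):
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs) / len(xs)


def squale(xs, lam=9.0):
    """Assumed Squale aggregate: -log_lam(mean(lam ** -x))."""
    return -math.log(sum(lam ** -x for x in xs) / len(xs), lam)


n = 10
rows = []
for k in range(n + 1):  # k = number of imperfect marks
    marks = [0.5] * k + [3.0] * (n - k)
    rows.append((100 * k // n, theil(marks), squale(marks)))

for pct, t, s in rows:
    print(f"{pct:3d}% imperfect  Theil={t:.3f}  Squale={s:.3f}")
```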


30/32


Summary

We distill:

- Correlation with Squale or SIG, for aggregation techniques that satisfy the highlight-problems requirement.

- Correlation with $I_{Theil}$, $I_{MLD}$, or $I_{Atkinson}$, e.g., for aggregation techniques that satisfy the symmetry and decomposability requirements.

31/32


Conclusions

Requirements for one-to-many aggregation techniques for software metrics

Existing aggregation techniques

Theoretical analysis

Empirical analysis

- root-cause analysis using
- mathematical properties of
- methodology and tooling
- correlation studies with different objectives, metrics, systems, versions, aggregation techniques, aggregation levels


New one-to-many aggregation techniques for software metrics

Extensions:
- other software metrics
- non-software domains

Apply the same techniques to aggregation of combined metrics data

Determine an optimal partitioning

Social organization of software projects

32/32


Publications

Bogdan Vasilescu, Alexander Serebrenik, and Mark van den Brand. Comparative Study of Software Metrics' Aggregation Techniques. BeNeVol 2010.

Bogdan Vasilescu, Alexander Serebrenik, and Mark van den Brand. You Can't Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics. ICSM 2011.

Bogdan Vasilescu, Alexander Serebrenik, and Mark van den Brand. By No Means: A Study on Aggregating Software Metrics. WETSoM 2011.

Karine Mordal, Nicolas Anquetil, Jannik Laval, Alexander Serebrenik, Bogdan Vasilescu, and Stéphane Ducasse. Practical Software Quality Metrics Aggregation. JSME (Journal of Software Maintenance and Evolution: Research and Practice).

33/32


Correlation

Linear correlation can be misleading.

[Figure: four scatter plots with the same Pearson correlation (0.816) but different rank correlations: Kendall 0.963 and Spearman 0.990; Kendall 0.636 and Spearman 0.818; Kendall 0.563 and Spearman 0.690; Kendall 0.426 and Spearman 0.5.]
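The four panels appear to be Anscombe's quartet. As a sketch, the code below computes Pearson's r and Kendall's τ for Anscombe's first data set, assumed here to be the panel with Kendall 0.636. The data set has no ties, so the simple τ-a used here coincides with the τ-b that statistical packages usually report.

```python
import math


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / number of pairs; assumes no ties."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (sign > 0) - (sign < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)


# Anscombe's quartet, data set I.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
print(round(pearson(x, y), 3), round(kendall_tau_a(x, y), 3))
```

Here 45 of the 55 pairs are concordant and 10 discordant, so τ = 35/55 ≈ 0.636, noticeably lower than r ≈ 0.816.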

34/32


References I

[CSS09] Irina Diana Coman, Alberto Sillitti, and Giancarlo Succi. A case-study on using an automated in-process software engineering measurement and analysis system in an industrial environment. In Proceedings of the 31st International Conference on Software Engineering, pages 89–99. IEEE Computer Society, 2009.
