Analysis of Advanced Aggregation Techniques for Software Metrics Final presentation Bogdan Vasilescu [email protected] Supervisor: Dr. Alexander Serebrenik July 20, 2011
Dec 04, 2014
2/32
/ department of mathematics and computer science
Analysis of advanced aggregation techniques for software metrics
Most metrics do not have a definition at system level.
“Designing a sound aggregation of software metrics is not obvious and it is still an open issue.” [CSS09]
Goal: Derive requirements for aggregation techniques for software metrics.
Aggregation of software metrics
Many to one:
- Same artifact
- Different metrics
Example: Maintainability Index
One to many:
- Same metric
- Different artifacts
Example: Weighted Methods per Class
Approach
Derive requirements for one-to-many aggregation techniques for software metrics
Study existing aggregation techniques:
- traditional (e.g., mean, median)
- inequality indices (e.g., Gini, Theil)
- threshold-based (e.g., SIG, Squale)
Theoretical analysis
Empirical analysis
Inequality indices
Econometrics: measure/explain the inequality of income or wealth.
Software metrics and econometric variables have distributions with similar shapes.
[Figure: histogram of Source Lines of Code for freecol-0.9.4; x-axis: SLOC per class (0 to 3000), y-axis: frequency]
[Figure: histogram of household income in Ilocos, Philippines (1998); x-axis: income (0 to 2500000), y-axis: frequency]
Degree of concentration of functionality
Lorenz curve for SLOC in Hibernate 3.6.0-beta4.
[Figure: Lorenz curve; x-axis: % classes, y-axis: % SLOC, both from 0.0 to 1.0]
Measure inequality between:
- individuals (e.g., classes)
- groups (e.g., components)
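[Figure annotations on the Lorenz curve: $I_{Hoover}$, and the areas $A$ and $B$]
$I_{Gini} = \frac{A}{A+B} = 2A$
Here $A$ is the area between the line of perfect equality and the Lorenz curve, and $B$ is the area under the Lorenz curve, so $A + B = 1/2$.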
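The relation between the Lorenz curve and the two indices can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the function names and the SLOC values are hypothetical, the Gini index is the uncorrected population version obtained from the trapezoid area under the curve, and the Hoover index is read off as the largest vertical gap between the line of equality and the Lorenz curve.

```python
def lorenz_points(values):
    """Lorenz curve as (cumulative share of items, cumulative share of the total)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(values):
    """Gini = A / (A + B) = 2A = 1 - 2B, with B the area under the Lorenz curve."""
    pts = lorenz_points(values)
    area_b = sum((x1 - x0) * (y0 + y1) / 2
                 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area_b


def hoover(values):
    """Largest vertical distance between the equality line and the Lorenz curve."""
    return max(p - share for p, share in lorenz_points(values))


# Hypothetical SLOC per class: one class holds most of the code.
sloc = [10, 20, 30, 40, 400]
gini_value = gini_from_lorenz(sloc)
hoover_value = hoover(sloc)
```

For this made-up input both indices are high because a single class concentrates 80% of the code; for equal values both are zero.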
When computing the inequality within the entire population, it is often desirable to assess the contribution of the inequality between the groups.
Decomposability:
$I(X) = I_{within} + I_{between} = \sum_{j=1}^{m} \omega_j I(X_j) + I_{between}$
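As an illustration of decomposability, the sketch below checks the identity numerically for the Theil index. It assumes the standard Theil definition, $\frac{1}{n}\sum_i \frac{x_i}{\mu}\ln\frac{x_i}{\mu}$, with weights $\omega_j$ equal to each group's share of the total; the slides do not spell out these formulas, and the grouping of SLOC values into packages is hypothetical. All values must be strictly positive.

```python
import math


def theil(values):
    """Theil index of strictly positive values (assumed standard definition)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def theil_decomposition(groups):
    """Return (within, between) so that within + between equals theil(all values)."""
    all_values = [x for group in groups for x in group]
    total = sum(all_values)
    mu = total / len(all_values)
    within = between = 0.0
    for group in groups:
        weight = sum(group) / total            # omega_j: group's share of the total
        group_mean = sum(group) / len(group)
        within += weight * theil(group)
        between += weight * math.log(group_mean / mu)
    return within, between


# Hypothetical SLOC values of classes, grouped by package.
packages = [[10, 20, 30], [40, 400]]
within, between = theil_decomposition(packages)
overall = theil([x for group in packages for x in group])
```

The ratio `between / overall` is then the share of inequality that the grouping explains.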
Traceability via decomposability
Share of inequality explained by the partitioning G = {G1, ..., Gm}:
$R(G) = \frac{I_{between}(G)}{I(X)}$
Which individuals (classes in package) contribute to 80% of the inequality of SLOC?
Which class contributes the most to the inequality?
Lemma. Let X = {x1, x2, ..., xn} be a collection of values such that x1 ≤ xi ≤ xn. Then, it is either x1 or xn that contributes the most to the inequality measured using I_Theil, i.e., it is either the partitioning ({x1}, X \ {x1}) or the partitioning ({xn}, X \ {xn}) that provides the best explanation for the inequality measured using I_Theil.
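The lemma can be checked on a small example. The sketch below computes, for every value, the share of Theil inequality explained by splitting that value off from the rest, and then looks at which value scores highest. The Theil formula and the between-group term are the standard ones (an assumption, since the slides do not state them), and the SLOC values are made up. This is a numerical check on one input, not a proof.

```python
import math


def theil(values):
    """Theil index of strictly positive values (assumed standard definition)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def explained_share(values, i):
    """R(G) for the partitioning that separates values[i] from all other values."""
    rest = values[:i] + values[i + 1:]
    total = sum(values)
    mu = total / len(values)
    between = 0.0
    for group in ([values[i]], rest):
        weight = sum(group) / total
        between += weight * math.log((sum(group) / len(group)) / mu)
    return between / theil(values)


# Hypothetical SLOC per class within one package.
sloc = [12, 35, 48, 90, 150, 720]
shares = [explained_share(sloc, i) for i in range(len(sloc))]
best_index = max(range(len(sloc)), key=shares.__getitem__)
top_contributor = sloc[best_index]
```

On this input the largest class is the top contributor, which agrees with the lemma's claim that only the minimum or the maximum needs to be inspected.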
Other properties of inequality indices
Symmetry
Inequality stays the same for any permutation of the population.
Population principle
Inequality does not change if the population is replicated any number of times.
Transfers principle
A transfer from a rich man to a poor man (without reversing their position) should decrease inequality.
[Figure: example of a transfer on the values 20, 36, 45]
Scale invariance
Inequality does not change if all values are multiplied by the same constant.
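The four properties (symmetry, population principle, transfers principle, scale invariance) can be tried out on the Gini index with a short script. This is only a sketch: the Gini implementation uses the mean-absolute-difference form of the population Gini coefficient, and the transfer of 10 units from 45 to 20 is a hypothetical example built on the values 20, 36, 45.

```python
def gini(values):
    """Population Gini coefficient via the mean absolute difference."""
    n = len(values)
    mean = sum(values) / n
    pairwise = sum(abs(a - b) for a in values for b in values)
    return pairwise / (2 * n * n * mean)


x = [20, 36, 45]
base = gini(x)

# Symmetry: any permutation of the population gives the same value.
permuted = gini([45, 20, 36])

# Population principle: replicating the population changes nothing.
replicated = gini(x * 3)

# Scale invariance: multiplying all values by a constant changes nothing.
scaled = gini([7 * v for v in x])

# Transfers principle: moving 10 from the richest (45) to the poorest (20)
# keeps their relative order (30 < 35) and lowers the inequality.
transferred = gini([30, 36, 35])
```

The same checks can be repeated for other indices by swapping in a different function for `gini`.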
Summary
Ineq. index      Sym.  Inv.  Dec.  Pop.  Tra.
I_Gini           X     ×           X     X
I_Theil          X     ×     X     X     X
I_MLD            X     ×     X     X     X
I_Hoover         X     ×           X
I^α_Atkinson     X     ×     X     X     X
I^β_Kolm         X     +     X     X     X
Problems include:
- Domain not always R^n.
- No distinction between all values equal but low, and all values equal but high.
Threshold-based aggregation techniques
Two types:
- hard thresholds: improvements in quality are not reflected as long as the metrics stay within certain boundaries (e.g., SIG).
- soft thresholds: do not exhibit staircasing effects (e.g., Squale).
The Squale Quality Model
Metrics
Individual Marks in [0,3]
Global Mark in [0,3]
[Figure: Individual Mark (IM) as a function of SLOC per method; x-axis: SLOC per method (0 to 170), y-axis: IM (0.0 to 3.0)]
Properties of Squale aggregation
Symmetry
Population principle
Anti-transfers principle
[Figure: example of a transfer on the values 20, 36, 45]
Lemma. $I^{\log\lambda}_{Kolm}(x_1, \ldots, x_n) + I^{\lambda}_{Squale}(x_1, \ldots, x_n) = \bar{x}$
Lemma. For all $c \in \mathbb{R}$ it holds that $I^{\lambda}_{Squale}$ is “unit translatable”, i.e.,
$I^{\lambda}_{Squale}(x_1 + c, \ldots, x_n + c) = I^{\lambda}_{Squale}(x_1, \ldots, x_n) + c$
Inequality indices are invariant with respect to either multiplication or addition.
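Both lemmas can be checked numerically. The sketch below assumes the Squale aggregate is $-\log_\lambda\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right)$ and the Kolm index is $\frac{1}{\beta}\ln\left(\frac{1}{n}\sum_i e^{\beta(\bar{x} - x_i)}\right)$ with $\beta = \ln\lambda$; neither formula appears on the slides, so treat both definitions, the function names, and the marks as assumptions made for illustration.

```python
import math


def squale(marks, lam):
    """Assumed Squale aggregate: -log_lam(mean(lam ** -mark))."""
    return -math.log(sum(lam ** -m for m in marks) / len(marks), lam)


def kolm(values, beta):
    """Assumed Kolm index: (1 / beta) * ln(mean(exp(beta * (mean - x))))."""
    mean = sum(values) / len(values)
    inner = sum(math.exp(beta * (mean - x)) for x in values) / len(values)
    return math.log(inner) / beta


# Hypothetical individual marks in [0, 3]; their mean is 2.0.
marks = [0.5, 1.0, 2.5, 3.0, 3.0]
lam = 3.0
mean = sum(marks) / len(marks)

# First lemma: Kolm (with beta = ln(lam)) plus Squale equals the mean.
lemma_sum = kolm(marks, math.log(lam)) + squale(marks, lam)

# Second lemma: shifting every mark by c shifts the Squale aggregate by c.
c = 0.25
shifted = squale([m + c for m in marks], lam)
unshifted = squale(marks, lam)
```

Under these definitions the Squale aggregate can be read as the mean minus a Kolm-style inequality penalty, which is why a few low marks pull the global mark well below the arithmetic mean.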
Summary
We distill:
- Highlighting undesirable values in the aggregated result.
However, problems include:
- Thresholds should be derived and validated.
- A high rating is not necessarily an indication of good software engineering practices.
- Not decomposable.
Empirical evaluation
Pilot study
Aggregate SLOC from class to package level.
Study statistical correlation between
- aggregation techniques and number of defects per package.
- pairs of aggregation techniques.
Case studies: ArgoUML, Adempiere, Mogwai.
Questions:
- Does aggregation technique influence correlation with bugs?
  • Correlation between SLOC and defects is not strong, and is influenced by the aggregation technique.
- Which aggregation techniques convey the same information?
  • I_Gini, I_Theil, I_MLD, I_Hoover, and I_Atkinson convey the same information.
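The shape of such a correlation study can be sketched as follows: aggregate class-level SLOC per package with two techniques, then compute a rank correlation between each aggregate and the defect count. The package names, SLOC values, and defect counts below are invented, and the Kendall tau-b implementation is a plain pairwise version written for this example rather than the tooling used in the study.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                    # tied in both: counted nowhere
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


def mean(values):
    return sum(values) / len(values)


def gini(values):
    """Population Gini coefficient via the mean absolute difference."""
    n = len(values)
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mean(values))


# Hypothetical data: SLOC per class for each package, and defects per package.
sloc_per_package = {"a": [10, 12, 11], "b": [5, 200, 8],
                    "c": [50, 60, 300, 40], "d": [20, 25]}
defects = {"a": 1, "b": 7, "c": 9, "d": 2}

names = sorted(sloc_per_package)
bug_counts = [defects[p] for p in names]
tau_mean = kendall_tau_b([mean(sloc_per_package[p]) for p in names], bug_counts)
tau_gini = kendall_tau_b([gini(sloc_per_package[p]) for p in names], bug_counts)
```

Even on this toy data the two aggregates rank the packages differently, so the measured correlation with defects depends on the aggregation technique chosen.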
Threats to validity
Threat        Pilot                        Subsequent studies
Metric        SLOC                         SLOC, LOC, NOS, NOSt, DIT, NOC, PBS, PLwC
System        ArgoUML, Adempiere, Mogwai   Qualitas Corpus: 106 Java open-source systems, 430K files, 57 MSLOC
Version       single                       414 from 13/106 systems (> 10 versions)
Technique     traditional, ineq. indices   traditional, ineq. indices, threshold-based
Aggr. level   class–package                class–package, method–class
Results (1)
I_Gini, I_Theil, I_MLD, I_Atkinson, and I_Hoover always convey the same information.
[Figure: correlation plots (y-axis from -1.0 to 1.0) for pairs of inequality indices, with the percentage shown under each pair.
SLOC: MLD-Hoo (91%), Gin-MLD (89%), The-MLD (91%), Gin-Hoo (90%), Atk-Hoo (92%), The-Hoo (92%), Gin-Atk (90%), MLD-Atk (91%), Gin-The (91%), The-Atk (92%).
DIT: MLD-Hoo (85%), Atk-Hoo (87%), Gin-MLD (87%), The-Hoo (88%), Gin-Atk (88%), Gin-Hoo (89%), Gin-The (88%), The-MLD (88%), The-Atk (88%), MLD-Atk (89%).]
Results (2)
I_Kolm shows high correlation with mean for size metrics.
[Figure: Kendall correlation coefficient between mean and Kolm, with panels for SLOC, DIT, and PLwC; y-axis from -1.0 to 1.0]
Results (3)
Superlinear (e.g., I_Theil–I_Gini) and chaotic (e.g., I_Theil–I_Kolm) patterns can be observed in the scatter plots.
[Figure: compiere, Theil-Gini scatter plot. Kendall: 0.94, p-val: 0.00; x-axis: Gini (SLOC), y-axis: Theil (SLOC)]
[Figure: compiere, Theil-Kolm scatter plot. Kendall: 0.25, p-val: 0.01; x-axis: Kolm (SLOC), y-axis: Theil (SLOC)]
Results (4)
Changing the aggregation level to class level does not affect the correlation between various aggregation techniques as measured at package level.
[Figure: Kendall correlation coefficient (y-axis from -1.0 to 1.0), with panels for Gini - Theil (SLOC) (100%), Theil - Atkinson (SLOC) (100%), and Theil - MLD (SLOC) (100%)]
Results (5)
System size does influence the correlation between aggregation techniques, e.g., I_Theil–I_Kolm increases with system size.
[Figure: hibernate, Kendall(Theil(SLOC), Kolm(SLOC)) over 86 releases; x-axis: releases from 0.8.1 to 3.6.0-beta4, y-axis: correlation coefficient Theil(SLOC) − Kolm(SLOC), from 0.0 to 1.0]
Results (6)
SIG and Squale correlate positively to each other and negatively to all other aggregation techniques.
[Figure: Kendall correlation coefficient (y-axis from -1.0 to 1.0), with panels for Squale(3) - SIGd (SLOC) (95%), Gini - Squale(3) (SLOC) (95%), and Theil - Squale(3) (SLOC) (95%)]
Results (7)
Inequality indices are less appropriate for highlighting undesirable values unless assumptions about their number can be made.
[Figure: three panels, "Squale (weight = 3) aggregate for different percentages of perfect IMs", "Theil aggregate for different percentages of perfect IMs", and "Kolm aggregate for different percentages of perfect IMs"; x-axis: percentage of imperfect marks (0 to 100); y-axis: average aggregate; one curve per average mean range: [2, 3), [1, 2), [0.5, 1), [0.1, 0.5), (0, 0.1)]
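A toy simulation shows why an inequality index alone cannot flag undesirable values. The sketch below is not the experiment behind the plots: it assumes the Squale aggregate $-\log_\lambda\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right)$ with weight 3 and the standard Theil index, and it uses a single hypothetical imperfect mark (0.5) against perfect marks of 3.

```python
import math


def squale(marks, lam=3.0):
    """Assumed Squale aggregate: -log_lam(mean(lam ** -mark))."""
    return -math.log(sum(lam ** -m for m in marks) / len(marks), lam)


def theil(values):
    """Theil index of strictly positive values (assumed standard definition)."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)


def marks_with_imperfect(percent, n=100, low=0.5, perfect=3.0):
    """n individual marks, of which `percent` percent are imperfect (hypothetical)."""
    k = round(n * percent / 100)
    return [low] * k + [perfect] * (n - k)


percentages = list(range(0, 101, 10))
squale_curve = [squale(marks_with_imperfect(p)) for p in percentages]
theil_curve = [theil(marks_with_imperfect(p)) for p in percentages]
```

In this setup the Squale aggregate falls steadily as more marks become imperfect, whereas the Theil index is zero both when no mark and when every mark is imperfect, and positive in between. The same Theil value can therefore correspond to few or to many bad marks.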
Summary
We distill:
- Correlation with Squale or SIG for aggregation techniques that satisfy the highlight problems requirement.
- Correlation with I_Theil, I_MLD, or I_Atkinson, e.g., for aggregation techniques that satisfy the symmetry and decomposability requirements.
Conclusions
Requirements for one-to-many aggregation techniques for software metrics
Existing aggregation techniques
Theoretical analysis
Empirical analysis
- root-cause analysis using
- mathematical properties of
- methodology and tooling
- correlation studies with different objectives, metrics, systems, versions, aggregation techniques, aggregation levels
New one-to-many aggregation techniques for software metrics
Extensions:
- other software metrics
- non-software domains
Apply the same techniques to aggregation of combined metrics data
Determine an optimal partitioning
Social organization of software projects
Publications
Comparative Study of Software Metrics’ Aggregation Techniques

Bogdan Vasilescu, Alexander Serebrenik∗, Mark van den Brand

Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

Abstract

While software metrics are commonly used to assess software maintainability and study software evolution, they are usually defined on a micro-level (method, class, package). Metrics should therefore be aggregated in order to provide insights in the evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, recently econometric aggregation techniques such as the Gini index and the Theil index have been proposed. Advantages and disadvantages of different aggregation techniques have not been evaluated empirically so far. In this paper we present the preliminary results of the comparative study of different aggregation techniques.

Keywords: software metrics, maintainability, aggregation techniques

1. Introduction

While software metrics are commonly used to assess software maintainability and study software evolution, they are usually defined on a micro-level (method, class, package). Metrics should therefore be aggregated in order to provide insights in the evolution at the macro-level (system). Popular aggregation techniques include the mean [15] and distribution fitting [4, 19]. The main advantage of the mean is its metrics-independence: whatever metrics are considered, the mean should be calculated in the same way. However, as the distribution of many interesting software metrics is skewed [24] the mean becomes unreliable. Distribution fitting consists of selecting a known family of distributions (e.g., log-normal, exponential or negative binomial) and fitting its parameters to approximate the metric values observed. However, the fitting process should be repeated whenever a new metric is being considered. Moreover, it is still a matter of controversy whether, e.g., software size is distributed log-normally [4] or double Pareto [11].

It is highly desirable, hence, to develop an aggregation approach that would be both reliable and independent of the metrics being aggregated. Examples of such approaches are the Gini coefficient [10] and the Theil index [22], both well-known in econometrics [6] and recently applied to software metrics [23, 20]. Comparison of different aggregation techniques was so far missing, however. In this short paper we present the first preliminary results.

Remainder of this paper is organized as follows. In Section 2 we briefly introduce the aggregation techniques being compared. Section 3 compares the theoretical properties of different aggregation techniques. Section 4 describes the empirical studies conducted and, finally, Section 5 discusses related work and concludes.

2. Aggregation techniques

In this section we briefly present the mathematical definitions of the aggregation techniques to be evaluated. Let {x1, ..., xn} be the set of values to be aggregated. Then, the mean, denoted as $\bar{x}$, is defined as $\frac{1}{n}\sum_{i=1}^{n} x_i$.

∗Corresponding author
Email addresses: [email protected] (Bogdan Vasilescu), [email protected] (Alexander Serebrenik), [email protected] (Mark van den Brand)

Preprint submitted to Elsevier, June 27, 2011
You Can’t Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics

Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand

Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
{b.n.vasilescu@student., a.serebrenik@, m.g.j.v.d.brand@}tue.nl

Abstract: A popular approach to assessing software maintainability and predicting its evolution involves collecting and analyzing software metrics. However, metrics are usually defined on a micro-level (method, class, package), and should therefore be aggregated in order to provide insights in the evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, median, or sum, recently econometric aggregation techniques, such as the Gini, Theil, Kolm, Atkinson, and Hoover inequality indices have been proposed and applied to software metrics.

In this paper we present the results of an extensive correlation study of the most widely-used traditional and econometric aggregation techniques, applied to lifting SLOC values from class to package level in the 106 systems comprising the Qualitas Corpus. Moreover, we investigate the nature of this relation, and study its evolution on a subset of 12 systems from the Qualitas Corpus.

Our results indicate high and statistically significant correlation between the Gini, Theil, Atkinson, and Hoover indices, i.e., aggregation values obtained using these techniques convey the same information. However, we discuss some of the rationale behind choosing between one index or another.

I. INTRODUCTION

Software maintenance is an area of software engineering with deep financial implications. Indeed, it was reported that between 60% and 90% of the software budgets represent maintenance and evolution costs [1]–[3]. Furthermore, maintenance and evolution costs were forecasted to account for more than half of North American and European software budgets in 2010 [4]. Similar or even higher figures were reported for countries such as Norway [5] and Chile [6].

Controlling software maintenance costs requires predicting how the system will evolve in the future, which in turn requires a better understanding of software evolution [7]–[9]. A popular approach to assessing software maintainability and predicting its evolution involves performing measurements on code artifacts. It starts off by identifying a number of specific properties of the system under investigation, and then collecting the corresponding software metrics and analyzing their evolution. Although it is debatable whether one cannot control what one cannot measure, it is without a doubt that collecting and analyzing metrics helps increase one’s familiarity and understanding of the analyzed systems.

However, metrics are usually defined at micro level (method, class, package), while the analysis of maintainability and evolution requires insights at macro (system) level. Moreover, due to privacy reasons, it might be undesirable to disclose metrics pertaining to a single developer as opposed to those pertaining to the entire project [10]. Metrics should therefore be aggregated [11].

Popular aggregation techniques include such standard summary statistical measures as mean, median, or sum [12], [13]. Their main advantage is universality (metrics-independence): whatever metrics are considered, the measures should be calculated in the same way. However, as the distribution of many interesting software metrics is skewed [14], the interpretation of such measures becomes unreliable [15].

Alternatively, distribution fitting [14], [16], [17] consists of selecting a known family of distributions (e.g., log-normal or exponential) and fitting its parameters to approximate the metric values observed. The fitted parameters can be then seen as aggregating these values. However, the fitting process should be repeated whenever a new metric is being considered. Moreover, it is still a matter of controversy whether, e.g., software size is distributed log-normally [16] or double Pareto [18]. We do not consider distribution fitting.

Recently, there is an emerging trend in using more advanced aggregation techniques borrowed from econometrics, where they are used to study inequality of income or welfare distributions [19]–[21]. The motivation for applying such techniques to software metrics is twofold. First, as numerous countries have few rich and many poor, numerous software systems have few very big or complex components, and many small or simple ones [15], [22], [23]. Consequently, it is common both for software metrics, as well as for econometric variables to have strongly-skewed distributions (Figure 1).

Second, the shape of these distributions, which appear visually to follow a power law, renders the use of traditional aggregation techniques such as the sample mean and variance questionable at best. Indeed, it was reported that many important relationships between software artifacts follow a power-law distribution [16], [25], and it is known that a power-law distribution may not have a finite mean and variance [22].
By No Means: A Study on Aggregating Software Metrics
Bogdan VasilescuTechnische Universiteit
EindhovenDen Dolech 2, P.O. Box 513,
5600 MB EindhovenThe Netherlands
Alexander SerebrenikTechnische Universiteit
EindhovenDen Dolech 2, P.O. Box 513,
5600 MB EindhovenThe Netherlands
Mark van den BrandTechnische Universiteit
EindhovenDen Dolech 2, P.O. Box 513,
5600 MB EindhovenThe Netherlands
ABSTRACTFault prediction models usually employ software metrics which
were previously shown to be a strong predictor for defects,
e.g., SLOC. However, metrics are usually de ned on a micro-
level (method, class, package), and should therefore be ag-
gregated in order to provide insights in the evolution at the
macro-level (system). In addition to traditional aggrega-
tion techniques such as the mean, median, or sum, recently
econometric aggregation techniques, such as the Gini, Theil,
and Hoover indices have been proposed. In this paper we
wish to understand whether the aggregation technique in-
uences the presence and strength of the relation between
SLOC and defects. Our results indicate that correlation is
not strong, and is in uenced by the aggregation technique.
Categor iesand Subject Descr iptorsD.2.7 [Software Engineering]: Distribution, Maintenance,
and Enhancement corrections; D.2.8 [Software Engineer-
ing]: Metrics complexity measures
General TermsMeasurement, Economics, Experimentation
KeywordsSoftware metrics, maintainability, aggregation techniques
1. INTRODUCTIONSoftware maintenance is an area of software engineering
with deep nancial implications. Indeed, it was reported
that up to 90% of the software budgets represent mainte-
nance and evolution costs [10, 3]. Thus, in order to control
software maintenance costs, it is desirable, e.g., to predict
faulty components early in the development phase.
Fault prediction models usually employ software metrics
which were previously shown to be a strong predictor for de-
fects [9, 4, 21, 22, 20, 12]. Such a metric is size, measured in
Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot madeor distributed for profit or commercial advantage and that copiesbear thisnoticeand the full citation on thefirst page. To copy otherwise, torepublish, to post on serversor to redistribute to lists, requiresprior specificpermission and/or a fee.ICSE ’11, May 21–28, 2011, Waikiki, Honolulu, HI, USACopyright 2011 ACM 978-1-4503-0593-8/11/05 ...$10.00.
(source) lines of code, (S)LOC. Size (SLOC) not only corre-
sponds to the intuitive belief that large systems have more
faults in them than small systems, but was shown to act
as an early indicator of problems better than, e.g., object-
oriented metrics such as the Chidamber and Kemerer suite
or the Lorenz and Kidd suite [9].
However, software metrics are commonly de ned at micro-
level (method, class, package), and should therefore be ag-
gregated at macro-level (system), in order to provide insights
in the study of maintainability and evolution.
Popular aggregation techniques include such standard sum-
mary statistical measures as mean, median, or sum [19].
Their main advantage is universality (metrics-independence):
whatever metrics are considered, the measures should be
calculated in the same way. However, as the distribution of
many interesting software metrics is skewed [29], the inter-
pretation of such measures becomes unreliable.
Alternatively, distribution tting [6, 26, 29] consists of se-
lecting a known family of distributions (e.g., log-normal or
exponential) and tting its parameters to approximate the
metric values observed. The tted parameters can be then
considered as aggregating these values. However, the tting
process should be repeated whenever a new metric is be-
ing considered. Moreover, it is still a matter of controversy
whether, e.g., software size is distributed log-normally [6] or
double Pareto [14].
Recently, there is an emerging trend in using more ad-
vanced aggregation techniques, that are both reliable, as well
as general. Examples of such approaches are the Gini coe -
cient [11], the Theil index [28], and the Hoover index [15], all
well-known in econometrics for their applicability to study-
ing income inequality [7], and recently applied to software
metrics [27, 30, 13, 31].
In this preliminary study, based on the assumption that
size is a good predictor for defects, hence size and defects
should be statistically related, we wish to understand whether
the aggregation technique influences the presence and strength
of this relation. Briefly, our results indicate that correlation
between SLOC and defects is not strong, and is influenced
by the aggregation technique.
2. METHODOLOGY
We apply correlation analysis to SLOC data of Java classes
aggregated at package level using different aggregation techniques,
and defects (bug count per package). As a by-product
of our evaluation, we also study the correlation between
the different aggregation techniques themselves. The
choice for aggregating data from class to package level rather
JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE
J. Softw. Maint. Evol.: Res. Pract. 0000; 00:1–15
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/smr
Practical Software Quality Metrics Aggregation
Karine Mordal1, Nicolas Anquetil2, Jannik Laval2, Alexander Serebrenik3, Bogdan Vasilescu3, and Stephane Ducasse2
1 LIASD, University of Paris 8, France
2 RMoD Team, INRIA, Lille, France
3 Technische Universiteit Eindhoven, The Netherlands
SUMMARY
With the growing need for quality assessment of entire software systems, in practice, new issues are emerging. First, since most software quality metrics are defined at the level of individual software components, there is a need for aggregation methods to summarize the results at the system level. Second, since a practical evaluation requires the use of different metrics, with possibly widely varying output ranges, there is a need to combine these results into a unified quality assessment. Third, since projects vary and different organizations have different perceptions on quality, there is a need to adapt the interpretation of the quality assessment to the perception of the users performing it. In this paper we identify the requirements for a practical aggregation method, and present the Squale model for metric aggregation, specifically designed to address the needs of practitioners. We empirically validate the adequation of Squale through experiments on Eclipse. Additionally, we compare the Squale model to both traditional aggregation techniques (e.g., the arithmetic mean), as well as to econometric inequality indices (e.g., the Gini or the Theil indices), recently applied to aggregation of software metrics. Copyright © 0000 John Wiley & Sons, Ltd.
Received . . .
KEY WORDS: software metrics; software quality; aggregation; inequality indices
1. INTRODUCTION
Software metrics are becoming part of the software development fabric, essential to understanding whether the quality of the software we are building corresponds to our expectations [Pfl08]. As a consequence, many different metrics have been proposed, as well as a plethora of tools to compute them and perform quality assessments. Considering the different stakeholders participating in software projects (e.g. developers, managers, users), quality needs to be evaluated at different levels of detail. Practical application of software metrics is, however, challenged by (i) the need to combine different metrics as recommended by quality-model design methods such as Factor-Criteria-Metric (FCM) [MRW76], or Goal-Question-Metric (GQM) [Bas92]; (ii) the need to obtain insights in quality of the entire system based on the metric values obtained for low-level system elements such as classes and methods; and (iii) the need to fine tune the quality model to different quality standards employed by different organizations. We detail each challenge separately.
First, a practical quality assessment needs to combine the results of various methods to answer specific questions as suggested by such models as Factor-Criteria-Metric (FCM) [MRW76], or Goal-Question-Metric (GQM) [Bas92]. For example, cyclomatic complexity might be combined with test
Correspondence to: INRIA Team RMod, Parc Scientifique de la Haute Borne, 40, avenue Halley. Bt.A, Park Plaza, 59650 Villeneuve d’Ascq, France. E-mail: [email protected]
BeNeVol 2010 ICSM 2011 WETSoM 2011 JSME
33/32
Correlation
Linear correlation can be misleading.
[Figure: four scatter plots (x axis 5 to 15, y axis 4 to 12) with the same Pearson (Pea) correlation but different Kendall (Ken) and Spearman (Spe) rank correlations]
Panel 1: Pea: 0.816; Ken: 0.963; Spe: 0.990
Panel 2: Pea: 0.816; Ken: 0.636; Spe: 0.818
Panel 3: Pea: 0.816; Ken: 0.563; Spe: 0.690
Panel 4: Pea: 0.816; Ken: 0.426; Spe: 0.5
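To make the contrast between linear and rank correlation concrete, the sketch below implements the Pearson, Spearman, and Kendall (tau-b) coefficients in plain Python. The data points behind the slide's plots are not available here, so the example uses an invented monotone but non-linear relation (y = x cubed); all function names are assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment (linear) correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def kendall(xs, ys):
    """Kendall tau-b, computed over all pairs of observations."""
    conc = disc = tie_x = tie_y = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue          # tied in both variables: ignored
            elif dx == 0:
                tie_x += 1        # tied only in x
            elif dy == 0:
                tie_y += 1        # tied only in y
            elif dx * dy > 0:
                conc += 1         # concordant pair
            else:
                disc += 1         # discordant pair
    return (conc - disc) / math.sqrt((conc + disc + tie_x) * (conc + disc + tie_y))

# Invented data: perfectly monotone, but not linear.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
print(pearson(xs, ys), spearman(xs, ys), kendall(xs, ys))
```

On such data both rank correlations reach 1 while the Pearson coefficient stays below 1, which is the kind of disagreement the slide warns about.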
34/32
References I
Irina Diana Coman, Alberto Sillitti, and Giancarlo Succi.
A case-study on using an automated in-process software engineering measurement and analysis system in an industrial environment.
In Proceedings of the 31st International Conference on Software Engineering, pages 89–99. IEEE Computer Society, 2009.