Analysis of Advanced Aggregation Techniques for Software Metrics Final presentation Bogdan Vasilescu [email protected] Supervisor: Dr. Alexander Serebrenik July 20, 2011

Master Thesis presentation

Dec 04, 2014


I used these slides for my final presentation and Master Thesis defence.
Transcript
Page 1: Master Thesis presentation

Analysis of Advanced Aggregation Techniques for Software Metrics

Final presentation

Bogdan Vasilescu [email protected]

Supervisor: Dr. Alexander Serebrenik

July 20, 2011

Page 2: Master Thesis presentation

2/32

/ department of mathematics and computer science

Analysis of advanced aggregation techniques for software metrics

Most metrics do not have a definition at system level.

Page 6: Master Thesis presentation


Analysis of advanced aggregation techniques for software metrics

“Designing a sound aggregation of software metrics is not obvious and it is still an open issue.” [CSS09]

Goal: Derive requirements for aggregation techniques for software metrics.

Page 8: Master Thesis presentation


Aggregation of software metrics

Many to one:
- Same artifact
- Different metrics

Example: Maintainability Index

One to many:
- Same metric
- Different artifacts

Example: Weighted Methods per Class

Page 10: Master Thesis presentation


Approach

Derive requirements for one-to-many aggregation techniques for software metrics

Study existing aggregation techniques:
- traditional (e.g., mean, median)
- inequality indices (e.g., Gini, Theil)
- threshold-based (e.g., SIG, Squale)

Theoretical analysis

Empirical analysis

Page 12: Master Thesis presentation


Inequality indices

Econometrics: measure/explain the inequality of income or wealth.

Software metrics and econometric variables have distributions with similar shapes.

[Figure: two histograms. Left: "Source Lines of Code: freecol-0.9.4" (x-axis: SLOC per class, y-axis: Frequency). Right: "Household income in Ilocos, Philippines (1998)" (x-axis: Income, y-axis: Frequency).]

Page 13: Master Thesis presentation


Degree of concentration of functionality

Lorenz curve for SLOC in Hibernate 3.6.0-beta4.

[Figure: Lorenz curve (x-axis: % Classes, y-axis: % SLOC, both from 0.0 to 1.0).]

Measure inequality between:
- individuals (e.g., classes)
- groups (e.g., components)

Page 14: Master Thesis presentation


Degree of concentration of functionality

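Lorenz curve for SLOC in Hibernate 3.6.0-beta4.

[Figure: Lorenz curve annotated with $I_{\mathrm{Hoover}}$ and the areas $A$ and $B$.]

$$I_{\mathrm{Gini}} = \frac{A}{A+B} = 2A$$

Measure inequality between:
- individuals (e.g., classes)
- groups (e.g., components)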
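The Lorenz-curve reading of the two indices can be sketched in a few lines of Python. This is only an illustration, assuming the usual conventions: $B$ is the area under the Lorenz curve, $A$ is the area between the curve and the line of equality (so $A + B = 1/2$), and the Hoover index is the largest vertical gap between the line of equality and the curve. The SLOC values and function names are hypothetical, not taken from the thesis.

```python
def lorenz_points(values):
    """Return the Lorenz curve as (share of classes, share of SLOC) points."""
    ordered = sorted(values)
    total = sum(ordered)  # assumed to be positive
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini index as A / (A + B), with B the area under the Lorenz curve."""
    points = lorenz_points(values)
    # The trapezoid rule is exact because the curve is piecewise linear.
    area_b = sum((x1 - x0) * (y0 + y1) / 2
                 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    area_a = 0.5 - area_b  # A + B is the triangle under the line of equality
    return area_a / (area_a + area_b)


def hoover(values):
    """Hoover index: largest vertical gap between equality line and Lorenz curve."""
    return max(x - y for x, y in lorenz_points(values))


sloc_per_class = [12, 30, 45, 60, 95, 150, 400, 2100]  # hypothetical data
print(f"Gini:   {gini(sloc_per_class):.3f}")
print(f"Hoover: {hoover(sloc_per_class):.3f}")
```

Both functions return 0 when all classes have the same size and approach 1 as the SLOC concentrates in a single class.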

Page 16: Master Thesis presentation


Degree of concentration of functionality

When computing the inequality within the entire population, it is often desirable to assess the contribution of the inequality between the groups.

Decomposability:

$$I(X) = I_{\mathrm{within}} + I_{\mathrm{between}} = \sum_{j=1}^{m} \omega_j I(X_j) + I_{\mathrm{between}}$$

Measure inequality between:
- individuals (e.g., classes)
- groups (e.g., components)
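As a worked instance of the decomposition, the sketch below uses the Theil index in its standard form, $I_{\mathrm{Theil}}(X) = \frac{1}{n}\sum_i \frac{x_i}{\bar{x}} \ln \frac{x_i}{\bar{x}}$, for which the weight $\omega_j$ is the share of the total held by group $j$. The package contents are made up and all values are assumed to be positive; the function names are illustrative.

```python
import math


def theil(values):
    """Theil index of a list of positive values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


def theil_decomposition(groups):
    """Split the Theil index of the pooled values into (within, between)."""
    pooled = [x for group in groups for x in group]
    total = sum(pooled)
    overall_mean = total / len(pooled)
    within = 0.0
    between = 0.0
    for group in groups:
        share = sum(group) / total  # weight omega_j of this group
        group_mean = sum(group) / len(group)
        within += share * theil(group)
        between += share * math.log(group_mean / overall_mean)
    return within, between


# Hypothetical SLOC per class, grouped by package.
packages = [[10, 20, 30], [200, 400], [5]]
within, between = theil_decomposition(packages)
pooled = [x for package in packages for x in package]
print(f"within={within:.4f} between={between:.4f} total={theil(pooled):.4f}")
```

The two parts add up to the index of the pooled values, which is what makes it possible to ask how much of the inequality a given grouping explains.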

Page 17: Master Thesis presentation


Traceability via decomposability

Share of inequality explained by the partitioning $G = \{G_1, \ldots, G_m\}$:

$$R(G) = \frac{I_{\mathrm{between}}(G)}{I(X)}$$

Which individuals (classes in package) contribute to 80% of the inequality of SLOC?

Which class contributes the most to the inequality?

Page 19: Master Thesis presentation


Traceability via decomposability

Lemma. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a collection of values such that $x_1 \le x_i \le x_n$ for all $i$. Then it is either $x_1$ or $x_n$ that contributes the most to the inequality measured using $I_{\mathrm{Theil}}$, i.e., it is either the partitioning $(\{x_1\}, X \setminus \{x_1\})$ or the partitioning $(\{x_n\}, X \setminus \{x_n\})$ that provides the best explanation for the inequality measured using $I_{\mathrm{Theil}}$.
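The lemma can be checked by brute force on small examples. The sketch below assumes the standard between-group term of the Theil index and positive values; since $I(X)$ is the same for every partitioning, maximizing $R(G)$ is the same as maximizing $I_{\mathrm{between}}$. The data and function names are illustrative.

```python
import math


def theil_between_singleton(values, k):
    """Between-group Theil term for the partitioning ({x_k}, X minus {x_k})."""
    n = len(values)  # assumed to be at least 2
    total = sum(values)
    share = values[k] / total  # share of the total held by x_k
    rest = 1.0 - share
    between = 0.0
    if share > 0:
        between += share * math.log(share * n)
    if rest > 0:
        between += rest * math.log(rest * n / (n - 1))
    return between


def best_singleton(values):
    """Index of the element whose isolation explains the most inequality."""
    return max(range(len(values)),
               key=lambda k: theil_between_singleton(values, k))


sloc_per_class = [12, 45, 7, 300, 88]  # hypothetical data
k = best_singleton(sloc_per_class)
print(f"class with SLOC {sloc_per_class[k]} explains the most inequality")
```

In practice this means that only the smallest and the largest class need to be inspected when looking for the single class that contributes the most.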

Page 20: Master Thesis presentation


Other properties of inequality indices

Symmetry

Inequality stays the same for any permutation of the population.

Page 23: Master Thesis presentation


Other properties of inequality indices

Population principle

Inequality does not change if the population is replicated any number of times.

Page 26: Master Thesis presentation


Other properties of inequality indices

Transfers principle

[Figure: example of a transfer; values shown: 20, 36, 45 and 30, 36.]

A transfer from a rich man to a poor man (without reversing their position) should decrease inequality.

Page 30: Master Thesis presentation


Other properties of inequality indices

Scale invariance

Inequality does not change if all values are multiplied by the same constant.
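The four properties (symmetry, population principle, transfers principle, scale invariance) can be checked numerically for a given index. The sketch below does so for the Theil index in its standard form; the sample values, the transfer amount, and the helper names are illustrative assumptions, and a numerical check on one input is of course not a proof.

```python
import math


def theil(values):
    """Theil index of a list of positive values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


def check_properties(index, values, donor, recipient, amount, tol=1e-9):
    """Evaluate the four properties of `index` on one concrete input."""
    base = index(values)
    transferred = list(values)
    transferred[donor] -= amount      # the richer individual gives
    transferred[recipient] += amount  # the poorer individual receives
    return {
        "symmetry": abs(index(list(reversed(values))) - base) <= tol,
        "population": abs(index(values * 3) - base) <= tol,
        "transfers": index(transferred) < base,
        "scale invariance": abs(index([7.0 * x for x in values]) - base) <= tol,
    }


# Transfer 5 from the richest (45) to the poorest (20) without reversing them.
results = check_properties(theil, [20.0, 36.0, 45.0], donor=2, recipient=0, amount=5.0)
for name, holds in results.items():
    print(f"{name}: {holds}")
```

Passing a different function as `index` makes it possible to compare several indices on the same inputs.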

Page 32: Master Thesis presentation


Summary

Ineq. index     Sym.  Inv.  Dec.  Pop.  Tra.
I_Gini          X     ×           X     X
I_Theil         X     ×     X     X     X
I_MLD           X     ×     X     X     X
I_Hoover        X     ×           X
I^α_Atkinson    X     ×     X     X     X
I^β_Kolm        X     +     X     X     X

(Inv.: × = invariant with respect to multiplication, + = invariant with respect to addition.)

Problems include:
- Domain not always $\mathbb{R}^n$.
- No distinction between all values equal but low, and all values equal but high.

Page 33: Master Thesis presentation


Threshold-based aggregation techniques

Two types:

- hard thresholds: improvements in quality are not reflected as long as the metrics stay within certain boundaries (e.g., SIG).

- soft thresholds: do not exhibit staircasing effects (e.g., Squale).

Page 34: Master Thesis presentation


The Squale Quality Model

Metrics

Individual Marks in [0,3]

Global Mark in [0,3]

[Figure: Individual Mark (IM) as a function of SLOC per method (x-axis: SLOC per method, 0 to 170; y-axis: Individual Mark (IM), 0.0 to 3.0).]

Page 37: Master Thesis presentation


Properties of Squale aggregation

Symmetry

Population principle

Anti-transfers principle

[Figure: example of a transfer; values shown: 20, 36, 45 and 30, 36.]

Page 38: Master Thesis presentation


Properties of Squale aggregation

Lemma. $I^{\log \lambda}_{\mathrm{Kolm}}(x_1, \ldots, x_n) + I^{\lambda}_{\mathrm{Squale}}(x_1, \ldots, x_n) = \bar{x}$

Lemma. For all $c \in \mathbb{R}$ it holds that $I^{\lambda}_{\mathrm{Squale}}$ is “unit translatable”, i.e.,

$$I^{\lambda}_{\mathrm{Squale}}(x_1 + c, \ldots, x_n + c) = I^{\lambda}_{\mathrm{Squale}}(x_1, \ldots, x_n) + c$$

Inequality indices are invariant with respect to either multiplication or addition.
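Both lemmas can be checked numerically. The sketch below assumes the commonly cited definitions $I^{\lambda}_{\mathrm{Squale}}(x_1, \ldots, x_n) = -\log_{\lambda}\bigl(\frac{1}{n}\sum_i \lambda^{-x_i}\bigr)$ and $I^{\beta}_{\mathrm{Kolm}}(x_1, \ldots, x_n) = \frac{1}{\beta}\ln\bigl(\frac{1}{n}\sum_i e^{\beta(\bar{x} - x_i)}\bigr)$, which may differ in notation from the thesis. The marks and the weight $\lambda = 3$ are illustrative.

```python
import math


def squale(marks, lam):
    """Squale global mark for individual marks and weight lam > 1 (assumed form)."""
    n = len(marks)
    return -math.log(sum(lam ** (-m) for m in marks) / n, lam)


def kolm(values, beta):
    """Kolm index with parameter beta > 0 (assumed form)."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum(math.exp(beta * (mean - x)) for x in values) / n) / beta


marks = [3.0, 3.0, 2.5, 0.5]  # hypothetical individual marks in [0, 3]
lam = 3.0
mean = sum(marks) / len(marks)

# First lemma: Kolm with beta = ln(lam) plus Squale equals the mean.
print(kolm(marks, math.log(lam)) + squale(marks, lam), mean)

# Second lemma: adding c to every mark adds c to the Squale aggregate.
c = 0.25
print(squale([m + c for m in marks], lam), squale(marks, lam) + c)
```

Because the Kolm index is non-negative, the first lemma also shows that the Squale mark never exceeds the mean, which is how low individual marks end up highlighted in the global mark.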

Page 39: Master Thesis presentation


Summary

We distill:
- Highlighting undesirable values in the aggregated result.

However, problems include:
- Thresholds should be derived and validated.
- A high rating is not necessarily an indication of good software engineering practices.
- Not decomposable.

Page 40: Master Thesis presentation


Approach

Derive requirements for one-to-many aggregation techniques for software metrics

Study existing aggregation techniques:
- traditional (e.g., mean, median)
- inequality indices (e.g., Gini, Theil)
- threshold-based (e.g., SIG, Squale)

Theoretical analysis

Empirical analysis

Page 41: Master Thesis presentation


Empirical evaluation

Page 42: Master Thesis presentation


Pilot study

Aggregate SLOC from class to package level.

Study statistical correlation between
- aggregation techniques and number of defects per package.
- pairs of aggregation techniques.

Case studies: ArgoUML, Adempiere, Mogwai.

Questions:
- Does aggregation technique influence correlation with bugs?
  • Correlation between SLOC and defects is not strong, and is influenced by the aggregation technique.
- Which aggregation techniques convey the same information?
  • $I_{\mathrm{Gini}}$, $I_{\mathrm{Theil}}$, $I_{\mathrm{MLD}}$, $I_{\mathrm{Hoover}}$, and $I_{\mathrm{Atkinson}}$ convey the same information.
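The shape of such a correlation study can be sketched as follows: aggregate the SLOC of the classes in each package with several techniques, then compute the Kendall rank correlation between the resulting per-package series. The package data below is made up, the Kendall coefficient is the simple tau-a variant (ties count as neither concordant nor discordant), and the thesis may well have used different tooling and a different variant.

```python
import math
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def gini(values):
    """Gini index via the mean absolute difference."""
    n = len(values)
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * n * mean(values))


def theil(values):
    """Theil index of a list of positive values."""
    mu = mean(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)


def kendall_tau(xs, ys):
    """Kendall tau-a between two equally long sequences."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / pairs


# Hypothetical SLOC per class for a handful of packages.
packages = {
    "pkg.core": [120, 80, 95, 2000],
    "pkg.util": [30, 35, 40, 45],
    "pkg.ui": [300, 20, 15, 10, 900],
    "pkg.io": [60, 400, 75],
    "pkg.model": [50, 55, 500, 45, 60, 52],
}
techniques = {"mean": mean, "Gini": gini, "Theil": theil}
series = {name: [f(sloc) for sloc in packages.values()]
          for name, f in techniques.items()}
for a, b in combinations(techniques, 2):
    print(f"{a} vs {b}: tau = {kendall_tau(series[a], series[b]):.2f}")
```

A coefficient close to 1 for a pair of techniques is what the slides summarize as the two techniques conveying the same information.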

Page 44: Master Thesis presentation


Threats to validity

Threat        Pilot                        Subsequent studies
Metric        SLOC                         SLOC, LOC, NOS, NOSt, DIT, NOC, PBS, PLwC
System        ArgoUML, Adempiere, Mogwai   Qualitas Corpus: 106 Java open-source systems, 430K files, 57 MSLOC
Version       single                       414 from 13/106 systems (> 10 versions)
Technique     traditional, ineq. indices   traditional, ineq. indices, threshold-based
Aggr. level   class–package                class–package, method–class

Page 46: Master Thesis presentation


Results (1)

$I_{\mathrm{Gini}}$, $I_{\mathrm{Theil}}$, $I_{\mathrm{MLD}}$, $I_{\mathrm{Atkinson}}$, and $I_{\mathrm{Hoover}}$ always convey the same information.

[Figure: two panels, SLOC and DIT (y-axis from -1.0 to 1.0), one entry per pair of indices, with a percentage printed beneath each label. SLOC: MLD-Hoo (91%), Gin-MLD (89%), The-MLD (91%), Gin-Hoo (90%), Atk-Hoo (92%), The-Hoo (92%), Gin-Atk (90%), MLD-Atk (91%), Gin-The (91%), The-Atk (92%). DIT: MLD-Hoo (85%), Atk-Hoo (87%), Gin-MLD (87%), The-Hoo (88%), Gin-Atk (88%), Gin-Hoo (89%), Gin-The (88%), The-MLD (88%), The-Atk (88%), MLD-Atk (89%).]

Page 47: Master Thesis presentation


Results (2)

$I_{\mathrm{Kolm}}$ shows high correlation with mean for size metrics.

[Figure: three panels of Kendall correlation coefficients (y-axis from -1.0 to 1.0): "Kendall corr.: mean - Kolm (SLOC)", "Kendall corr.: mean - Kolm (DIT)", "Kendall corr.: mean - Kolm (PLwC)".]

Page 48: Master Thesis presentation


Results (3)

Superlinear (e.g., $I_{\mathrm{Theil}}$–$I_{\mathrm{Gini}}$) and chaotic (e.g., $I_{\mathrm{Theil}}$–$I_{\mathrm{Kolm}}$) patterns can be observed in the scatter plots.

[Figure: two scatter plots for compiere. Left: "compiere: Theil-Gini. Kendall: 0.94, p-val: 0.00" (x-axis: Gini (SLOC), y-axis: Theil (SLOC)). Right: "compiere: Theil-Kolm. Kendall: 0.25, p-val: 0.01" (x-axis: Kolm (SLOC), y-axis: Theil (SLOC)).]

Page 49: Master Thesis presentation


Results (4)

Changing the aggregation level to class level does not affect the correlation between various aggregation techniques as measured at package level.

[Figure: three panels of Kendall correlation coefficients (y-axis from -1.0 to 1.0): "Kendall: Gini - Theil (SLOC) (100%)", "Kendall: Theil - Atkinson (SLOC) (100%)", "Kendall: Theil - MLD (SLOC) (100%)".]

Page 50: Master Thesis presentation


Results (5)

System size does influence the correlation between aggregation techniques, e.g., $I_{\mathrm{Theil}}$–$I_{\mathrm{Kolm}}$ increases with system size.

[Figure: "hibernate − Kendall(Theil(SLOC), Kolm(SLOC)) (86 releases)" (y-axis: correlation coefficient Theil(SLOC) − Kolm(SLOC), from 0.0 to 1.0; x-axis: Hibernate releases from 0.8.1 to 3.6.0-beta4).]

Page 51: Master Thesis presentation


Results (6)

SIG and Squale correlate positively to each other and negatively to all other aggregation techniques.

[Figure: three panels of Kendall correlation coefficients (y-axis from -1.0 to 1.0): "Kendall: Squale(3) - SIGd (SLOC) (95%)", "Kendall: Gini - Squale(3) (SLOC) (95%)", "Kendall: Theil - Squale(3) (SLOC) (95%)".]

Page 52: Master Thesis presentation


Results (7)

Inequality indices are less appropriate for highlighting undesirable values unless assumptions about their number can be made.

[Figure: three panels, each plotting an aggregate against the percentage of imperfect marks (x-axis: 0 to 100), with one curve per range of the imperfect marks (legend "Average mean range": [2, 3), [1, 2), [0.5, 1), [0.1, 0.5), (0, 0.1)). Panels: "Squale (weight = 3) aggregate for different percentages of perfect IMs" (y-axis: average Squale (weight = 3) mark), "Theil aggregate for different percentages of perfect IMs" (y-axis: average Theil aggregate), "Kolm aggregate for different percentages of perfect IMs" (y-axis: average Kolm aggregate).]

Page 53: Master Thesis presentation


Summary

We distill:

- Correlation with Squale or SIG for aggregation techniques that satisfy the highlight problems requirement.

- Correlation with $I_{\mathrm{Theil}}$, $I_{\mathrm{MLD}}$, or $I_{\mathrm{Atkinson}}$, e.g., for aggregation techniques that satisfy the symmetry and decomposability requirements.

Page 54: Master Thesis presentation


Conclusions

Requirements for one-to-many aggregation techniques for software metrics

Existing aggregation techniques

Theoretical analysis

Empirical analysis

- root-cause analysis using
- mathematical properties of
- methodology and tooling
- correlation studies with different objectives, metrics, systems, versions, aggregation techniques, aggregation levels

New one-to-many aggregation techniques for software metrics

Extensions:
- other software metrics
- non-software domains

Apply the same techniques to aggregation of combined metrics data

Determine an optimal partitioning

Social organization of software projects

Page 56: Master Thesis presentation


Publications

Comparative Study of Software Metrics’ Aggregation Techniques

Bogdan Vasilescu, Alexander Serebrenik∗, Mark van den Brand

Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

Abstract

While software metrics are commonly used to assess software maintainability and study software evolution, they are usually defined on a micro-level (method, class, package). Metrics should therefore be aggregated in order to provide insights in the evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, recently econometric aggregation techniques such as the Gini index and the Theil index have been proposed. Advantages and disadvantages of different aggregation techniques have not been evaluated empirically so far. In this paper we present the preliminary results of the comparative study of different aggregation techniques.

Keywords: software metrics, maintainability, aggregation techniques

1. Introduction

While software metrics are commonly used to assess software maintainability and study software evolution, they are usually defined on a micro-level (method, class, package). Metrics should therefore be aggregated in order to provide insights in the evolution at the macro-level (system). Popular aggregation techniques include the mean [15] and distribution fitting [4, 19]. The main advantage of the mean is its metrics-independence: whatever metrics are considered, the mean should be calculated in the same way. However, as the distribution of many interesting software metrics is skewed [24] the mean becomes unreliable. Distribution fitting consists of selecting a known family of distributions (e.g., log-normal, exponential or negative binomial) and fitting its parameters to approximate the metric values observed. However, the fitting process should be repeated whenever a new metric is being considered. Moreover, it is still a matter of controversy whether, e.g., software size is distributed log-normally [4] or double Pareto [11].

It is highly desirable, hence, to develop an aggregation approach that would be both reliable and independent of the metrics being aggregated. Examples of such approaches are the Gini coefficient [10] and the Theil index [22], both well-known in econometrics [6] and recently applied to software metrics [23, 20]. Comparison of different aggregation techniques was so far missing, however. In this short paper we present the first preliminary results.

Remainder of this paper is organized as follows. In Section 2 we briefly introduce the aggregation techniques being compared. Section 3 compares the theoretical properties of different aggregation techniques. Section 4 describes the empirical studies conducted and, finally, Section 5 discusses related work and concludes.

2. Aggregation techniques

In this section we briefly present the mathematical definitions of the aggregation techniques to be evaluated. Let $\{x_1, \ldots, x_n\}$ be the set of values to be aggregated. Then, the mean, denoted as $\bar{x}$, is defined as $\frac{1}{n}\sum_{i=1}^{n} x_i$.

∗Corresponding author
Email addresses: [email protected] (Bogdan Vasilescu), [email protected] (Alexander Serebrenik), [email protected] (Mark van den Brand)

Preprint submitted to Elsevier, June 27, 2011

You Can’t Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics

Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand

Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
{b.n.vasilescu@student., a.serebrenik@, m.g.j.v.d.brand@}tue.nl

Abstract: A popular approach to assessing software maintainability and predicting its evolution involves collecting and analyzing software metrics. However, metrics are usually defined on a micro-level (method, class, package), and should therefore be aggregated in order to provide insights in the evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, median, or sum, recently econometric aggregation techniques, such as the Gini, Theil, Kolm, Atkinson, and Hoover inequality indices have been proposed and applied to software metrics.

In this paper we present the results of an extensive correlation study of the most widely-used traditional and econometric aggregation techniques, applied to lifting SLOC values from class to package level in the 106 systems comprising the Qualitas Corpus. Moreover, we investigate the nature of this relation, and study its evolution on a subset of 12 systems from the Qualitas Corpus.

Our results indicate high and statistically significant correlation between the Gini, Theil, Atkinson, and Hoover indices, i.e., aggregation values obtained using these techniques convey the same information. However, we discuss some of the rationale behind choosing between one index or another.

I. INTRODUCTION

Software maintenance is an area of software engineering with deep financial implications. Indeed, it was reported that between 60% and 90% of the software budgets represent maintenance and evolution costs [1]–[3]. Furthermore, maintenance and evolution costs were forecasted to account for more than half of North American and European software budgets in 2010 [4]. Similar or even higher figures were reported for countries such as Norway [5] and Chile [6].

Controlling software maintenance costs requires predicting how the system will evolve in the future, which in turn requires a better understanding of software evolution [7]–[9]. A popular approach to assessing software maintainability and predicting its evolution involves performing measurements on code artifacts. It starts off by identifying a number of specific properties of the system under investigation, and then collecting the corresponding software metrics and analyzing their evolution. Although it is debatable whether one cannot control what one cannot measure, it is without a doubt that collecting and analyzing metrics helps increase one’s familiarity and understanding of the analyzed systems.

However, metrics are usually defined at micro level (method, class, package), while the analysis of maintainability and evolution requires insights at macro (system) level. Moreover, due to privacy reasons, it might be undesirable to disclose metrics pertaining to a single developer as opposed to those pertaining to the entire project [10]. Metrics should therefore be aggregated [11].

Popular aggregation techniques include such standard summary statistical measures as mean, median, or sum [12], [13]. Their main advantage is universality (metrics-independence): whatever metrics are considered, the measures should be calculated in the same way. However, as the distribution of many interesting software metrics is skewed [14], the interpretation of such measures becomes unreliable [15].

Alternatively, distribution fitting [14], [16], [17] consists of selecting a known family of distributions (e.g., log-normal or exponential) and fitting its parameters to approximate the metric values observed. The fitted parameters can be then seen as aggregating these values. However, the fitting process should be repeated whenever a new metric is being considered. Moreover, it is still a matter of controversy whether, e.g., software size is distributed log-normally [16] or double Pareto [18]. We do not consider distribution fitting.

Recently, there is an emerging trend in using more advanced aggregation techniques borrowed from econometrics, where they are used to study inequality of income or welfare distributions [19]–[21]. The motivation for applying such techniques to software metrics is twofold. First, as numerous countries have few rich and many poor, numerous software systems have few very big or complex components, and many small or simple ones [15], [22], [23]. Consequently, it is common both for software metrics, as well as for econometric variables to have strongly-skewed distributions (Figure 1).

Second, the shape of these distributions, which appear visually to follow a power law, renders the use of traditional aggregation techniques such as the sample mean and variance questionable at best. Indeed, it was reported that many important relationships between software artifacts follow a power-law distribution [16], [25], and it is known that a power-law distribution may not have a finite mean and variance [22].

By No Means: A Study on Aggregating Software Metrics

Bogdan VasilescuTechnische Universiteit

EindhovenDen Dolech 2, P.O. Box 513,

5600 MB EindhovenThe Netherlands

[email protected]

Alexander SerebrenikTechnische Universiteit

EindhovenDen Dolech 2, P.O. Box 513,

5600 MB EindhovenThe Netherlands

[email protected]

Mark van den BrandTechnische Universiteit

EindhovenDen Dolech 2, P.O. Box 513,

5600 MB EindhovenThe Netherlands

[email protected]

ABSTRACT

Fault prediction models usually employ software metrics which were previously shown to be a strong predictor for defects, e.g., SLOC. However, metrics are usually defined on a micro-level (method, class, package), and should therefore be aggregated in order to provide insights in the evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, median, or sum, recently econometric aggregation techniques, such as the Gini, Theil, and Hoover indices have been proposed. In this paper we wish to understand whether the aggregation technique influences the presence and strength of the relation between SLOC and defects. Our results indicate that correlation is not strong, and is influenced by the aggregation technique.

Categories and Subject Descriptors

D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement - corrections; D.2.8 [Software Engineering]: Metrics - complexity measures

General Terms

Measurement, Economics, Experimentation

Keywords

Software metrics, maintainability, aggregation techniques

1. INTRODUCTION

Software maintenance is an area of software engineering with deep financial implications. Indeed, it was reported that up to 90% of the software budgets represent maintenance and evolution costs [10, 3]. Thus, in order to control software maintenance costs, it is desirable, e.g., to predict faulty components early in the development phase.

Fault prediction models usually employ software metrics which were previously shown to be a strong predictor for defects [9, 4, 21, 22, 20, 12]. Such a metric is size, measured in (source) lines of code, (S)LOC. Size (SLOC) not only corresponds to the intuitive belief that large systems have more faults in them than small systems, but was shown to act as an early indicator of problems better than, e.g., object-oriented metrics such as the Chidamber and Kemerer suite or the Lorenz and Kidd suite [9].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICSE '11, May 21–28, 2011, Waikiki, Honolulu, HI, USA
Copyright 2011 ACM 978-1-4503-0593-8/11/05 ...$10.00.

However, software metrics are commonly defined at micro-level (method, class, package), and should therefore be aggregated at macro-level (system), in order to provide insights in the study of maintainability and evolution.

Popular aggregation techniques include such standard summary statistical measures as mean, median, or sum [19]. Their main advantage is universality (metrics-independence): whatever metrics are considered, the measures should be calculated in the same way. However, as the distribution of many interesting software metrics is skewed [29], the interpretation of such measures becomes unreliable.
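To illustrate why a skewed distribution makes the mean hard to interpret, here is a minimal sketch. The SLOC values are hypothetical (not taken from the paper) and only mimic a package with many small classes and one very large one.

```python
from statistics import mean, median

# Hypothetical SLOC values for the classes of one package (illustrative only):
# nine small classes and a single very large one.
sloc = [12, 15, 18, 20, 22, 25, 30, 35, 40, 2500]

avg = mean(sloc)    # dominated by the single large class
mid = median(sloc)  # stays close to the size of a typical class

print(f"mean = {avg:.1f}, median = {mid:.1f}")
```

In this made-up sample the mean is more than ten times the median, so neither value alone describes the package well.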

Alternatively, distribution fitting [6, 26, 29] consists of selecting a known family of distributions (e.g., log-normal or exponential) and fitting its parameters to approximate the metric values observed. The fitted parameters can be then considered as aggregating these values. However, the fitting process should be repeated whenever a new metric is being considered. Moreover, it is still a matter of controversy whether, e.g., software size is distributed log-normally [6] or double Pareto [14].
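As a rough sketch of distribution fitting used as aggregation, the following assumes a log-normal family and estimates its two parameters by maximum likelihood (the mean and population standard deviation of the logarithms). The function name `fit_lognormal` and the sample values are hypothetical; the cited studies may use other families or fitting procedures.

```python
import math
from statistics import fmean, pstdev

def fit_lognormal(values):
    """Return (mu, sigma) of a log-normal fitted by maximum likelihood.

    Assumes every value is strictly positive. The pair (mu, sigma) then
    serves as the aggregate of the metric values.
    """
    logs = [math.log(v) for v in values]
    return fmean(logs), pstdev(logs)

# Hypothetical SLOC values of the classes in one package (illustrative only).
sloc = [12, 15, 18, 20, 22, 25, 30, 35, 40, 2500]
mu, sigma = fit_lognormal(sloc)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

The sketch also shows the drawback noted in the text: the choice of family is baked into the function, so a metric that follows a different distribution needs a different fit.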

Recently, there is an emerging trend in using more advanced aggregation techniques, that are both reliable, as well as general. Examples of such approaches are the Gini coefficient [11], the Theil index [28], and the Hoover index [15], all well-known in econometrics for their applicability to studying income inequality [7], and recently applied to software metrics [27, 30, 13, 31].
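The three indices can be sketched as follows. This uses common textbook definitions (Gini as the mean absolute difference over twice the mean, Theil as the mean of (x/mean)·ln(x/mean), Hoover as half the relative mean absolute deviation); the normalisation used in the cited works may differ. The sketch assumes non-negative values with a positive sum.

```python
import math

def gini(values):
    """Gini coefficient: sum of |x_i - x_j| over all pairs / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2 * n * n * mean)

def theil(values):
    """Theil index: (1/n) * sum of (x/mean) * ln(x/mean), with 0 * ln(0) taken as 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((v / mean) * math.log(v / mean) for v in values if v > 0) / n

def hoover(values):
    """Hoover index: share of the total that must move to reach perfect equality."""
    total = sum(values)
    mean = total / len(values)
    return sum(abs(v - mean) for v in values) / (2 * total)

# Hypothetical SLOC values of the classes in one package (illustrative only).
sloc = [12, 15, 18, 20, 22, 25, 30, 35, 40, 2500]
print(gini(sloc), theil(sloc), hoover(sloc))
```

All three return 0 when every class has the same size and grow as the size concentrates in fewer classes; Gini and Hoover stay below 1, while Theil is bounded by ln(n).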

In this preliminary study, based on the assumption that size is a good predictor for defects, hence size and defects should be statistically related, we wish to understand whether the aggregation technique influences the presence and strength of this relation. Briefly, our results indicate that correlation between SLOC and defects is not strong, and is influenced by the aggregation technique.

2. METHODOLOGY

We apply correlation analysis to SLOC data of Java classes aggregated at package level using different aggregation techniques, and defects (bug count per package). As a by-product of our evaluation, we also study the correlation between the different aggregation techniques themselves. The choice for aggregating data from class to package level rather

JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE
J. Softw. Maint. Evol.: Res. Pract. 0000; 00:1–15
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/smr

Practical Software Quality Metrics Aggregation

Karine Mordal1, Nicolas Anquetil2, Jannik Laval2, Alexander Serebrenik3, Bogdan Vasilescu3, and Stephane Ducasse2

1 LIASD, University of Paris 8, France
2 RMoD Team, INRIA, Lille, France
3 Technische Universiteit Eindhoven, The Netherlands

SUMMARY

With the growing need for quality assessment of entire software systems, in practice, new issues are emerging. First, since most software quality metrics are defined at the level of individual software components, there is a need for aggregation methods to summarize the results at the system level. Second, since a practical evaluation requires the use of different metrics, with possibly widely varying output ranges, there is a need to combine these results into a unified quality assessment. Third, since projects vary and different organizations have different perceptions on quality, there is a need to adapt the interpretation of the quality assessment to the perception of the users performing it. In this paper we identify the requirements for a practical aggregation method, and present the Squale model for metric aggregation, specifically designed to address the needs of practitioners. We empirically validate the adequation of Squale through experiments on Eclipse. Additionally, we compare the Squale model to both traditional aggregation techniques (e.g., the arithmetic mean), as well as to econometric inequality indices (e.g., the Gini or the Theil indices), recently applied to aggregation of software metrics. Copyright © 0000 John Wiley & Sons, Ltd.

Received . . .

KEY WORDS: software metrics; software quality; aggregation; inequality indices

1. INTRODUCTION

Software metrics are becoming part of the software development fabric, essential to understanding whether the quality of the software we are building corresponds to our expectations [Pfl08]. As a consequence, many different metrics have been proposed, as well as a plethora of tools to compute them and perform quality assessments. Considering the different stakeholders participating in software projects (e.g. developers, managers, users), quality needs to be evaluated at different levels of detail. Practical application of software metrics is, however, challenged by (i) the need to combine different metrics as recommended by quality-model design methods such as Factor-Criteria-Metric (FCM) [MRW76], or Goal-Question-Metric (GQM) [Bas92]; (ii) the need to obtain insights in quality of the entire system based on the metric values obtained for low-level system elements such as classes and methods; and (iii) the need to fine tune the quality model to different quality standards employed by different organizations. We detail each challenge separately.

First, a practical quality assessment needs to combine the results of various methods to answer specific questions as suggested by such models as Factor-Criteria-Metric (FCM) [MRW76], or Goal-Question-Metric (GQM) [Bas92]. For example, cyclomatic complexity might be combined with test

Correspondence to: INRIA Team RMod, Parc Scientifique de la Haute Borne, 40, avenue Halley. Bt.A, Park Plaza, 59650 Villeneuve d'Ascq, France. E-mail: [email protected]


BeNeVol 2010, ICSM 2011, WETSoM 2011, JSME

Page 57: Master Thesis presentation


Correlation

Linear correlation can be misleading.

[Figure: four scatter plots (x-axis 5 to 15, y-axis 4 to 12) with the same Pearson coefficient but different Kendall and Spearman coefficients. Panel titles:]

Pea: 0.816; Ken: 0.963; Spe: 0.990
Pea: 0.816; Ken: 0.636; Spe: 0.818
Pea: 0.816; Ken: 0.563; Spe: 0.690
Pea: 0.816; Ken: 0.426; Spe: 0.5
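The coefficients on this slide match those commonly reported for Anscombe's quartet, so the sketch below assumes that the panels show those data sets and recomputes the three coefficients for two of them. The implementations are plain-Python illustrations (Spearman with average ranks for ties, Kendall in its tau-b variant); they are not necessarily the tooling used for the thesis.

```python
import math

def pearson(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def kendall_tau_b(xs, ys):
    """Kendall's tau-b, which corrects for ties in either variable."""
    n = len(xs)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            ties_x += dx == 0
            ties_y += dy == 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / math.sqrt((pairs - ties_x) * (pairs - ties_y))

# Anscombe's quartet, data sets I and IV (assumed to correspond to two panels).
X1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

for name, xs, ys in (("I", X1, Y1), ("IV", X4, Y4)):
    print(name, pearson(xs, ys), kendall_tau_b(xs, ys), spearman(xs, ys))
```

Both data sets give nearly the same Pearson coefficient, while the rank-based coefficients differ clearly, which is the point the slide makes about linear correlation.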

Page 58: Master Thesis presentation


References I

[CSS09] Irina Diana Coman, Alberto Sillitti, and Giancarlo Succi. A case-study on using an automated in-process software engineering measurement and analysis system in an industrial environment. In Proceedings of the 31st International Conference on Software Engineering, pages 89–99. IEEE Computer Society, 2009.