54 COMMUNICATIONS OF THE ACM | JULY 2012 | VOL. 55 | NO. 7
practice
Illustration by Gary Neill

Are software metrics helpful tools
or a waste of time? For every developer who treasures these
mathematical abstractions of software systems there is a developer
who thinks software metrics are invented just to keep project
managers busy. Software metrics can be very powerful tools that
help achieve your goals but it is important to use them correctly,
as they also have the power to demotivate project teams and steer
development in the wrong direction. For the past 11 years, the
Software Improvement Group has advised hundreds of organizations
concerning software development and risk management on the basis of
software metrics. We have used software metrics in more than 200
investigations in which we examined a single snapshot of a system.
Additionally, we use software metrics to track the ongoing
development effort of more than 400 systems. While executing these
projects, we have learned some pitfalls to avoid when using
software metrics in a project management setting. This article
addresses the four most important of these: Metric in a bubble;
Treating the metric; One-track metric; and Metrics galore.
Knowing about these pitfalls will help you recognize them and,
hopefully, avoid them, which ultimately leads to making your project
successful. As a software engineer, your knowledge of these pitfalls
helps you understand why project managers want to use software metrics
and helps you assist the managers when they are applying metrics in an
inefficient manner. As an outside consultant, you need to take the
pitfalls into account when presenting advice and proposing actions.
Finally, if you are doing research in the area of software metrics,
knowing these pitfalls will help place your new metric in the right
context when presenting it to practitioners. Before diving into the
pitfalls, let's look at why software metrics can be considered a
useful tool.

Software Metrics Steer People

"You get what you measure." This phrase definitely applies to software
project teams. No matter what you define as a metric, as soon as it is
used to evaluate a team, the value of the metric moves toward the
desired value. Thus, to reach a particular goal, you can continuously
measure properties of the desired goal and plot these measurements in
a place visible to the team. Ideally, the desired goal is plotted
alongside the current measurement
to indicate the distance to the goal.

Imagine a project in which the runtime performance of a particular use
case is of critical importance. In this case it helps to create a test
in which the execution time of the use case is measured daily. By
plotting this daily data point against the desired value, and making
sure the team sees this measurement, it becomes clear to everyone
whether the desired target is being met or whether the development
actions of yesterday are leading the team away from the goal.
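The daily measurement described above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the `Measurement` class, the function names, the day labels, the timings, and the 2.0-second target are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    day: str        # label of the measurement day (hypothetical)
    seconds: float  # measured execution time of the use case


def gaps_to_target(measurements, target_seconds):
    """Return (day, gap) pairs; a positive gap means the target was missed."""
    return [(m.day, round(m.seconds - target_seconds, 3)) for m in measurements]


def moving_away_from_goal(measurements, target_seconds):
    """True when the latest run misses the target and is slower than the run before it."""
    if len(measurements) < 2:
        return False
    previous, latest = measurements[-2], measurements[-1]
    return latest.seconds > target_seconds and latest.seconds > previous.seconds


def text_plot(measurements, target_seconds, scale=10):
    """One line per day: a bar for the measurement next to the target value."""
    lines = []
    for m in measurements:
        bar = "#" * round(m.seconds * scale)
        status = "ok" if m.seconds <= target_seconds else "MISSED"
        lines.append(
            f"{m.day} {bar} {m.seconds:.1f}s (target {target_seconds:.1f}s) {status}"
        )
    return lines


# Made-up daily measurements and target, for illustration only.
history = [Measurement("Mon", 1.8), Measurement("Tue", 2.1), Measurement("Wed", 2.4)]
for line in text_plot(history, target_seconds=2.0):
    print(line)
```

Printing the target next to every data point keeps the distance to the goal visible, which is the point of the technique; a real setup would more likely feed the same numbers into a chart on a team dashboard.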
Getting What You Measure
DOI: 10.1145/2209249.2209266
Article development led by queue.acm.org
Four common pitfalls in using software metrics for project management.
By Eric Bouwers, Joost Visser, and Arie van Deursen

Even though it might seem simple,
this technique can be applied incorrectly in a number of subtle ways.
For example, imagine a situation in which customers are unhappy
because they report problems in a product that are not solved in a
timely manner. To improve customer satisfaction, the project team
tracks the average resolution time for issues in a release, following
the reasoning that a lower average resolution time results in higher
customer satisfaction.

Unfortunately, reality is not so simple. To start, solving issues
faster might lead to unwanted side effects: for example, a quick fix
now could result in longer fix times later because of incurred
technical debt. Second, solving an issue within days does not help the
customer if these fixes are released only once a year. Finally,
customers are undoubtedly more satisfied when no fix is required at
all, that is, when issues do not end up in the product in the first
place.

Thus, using a metric allows you to steer toward a goal, which can be
either a high-level business proposition (the costs of maintaining
this system should not exceed $100,000 per year) or more technically
oriented (all pages should load within 10 seconds). Unfortunately,
using metrics can also prevent you from reaching the desired goal,
depending on the pitfalls encountered. In the remainder of this
article, we discuss some of the pitfalls we frequently encountered and
explain how they can be recognized and avoided.

What Does the Metric Mean?

Software metrics can be measured on different views of a software
system. This article focuses on metrics calculated on a particular
version of the code base of a system, but the pitfalls also apply to
metrics calculated on other views.

Assuming the code base contains only the code of the current project,
software product metrics establish a ground truth. Calculating only
the metrics is not enough, however. Two more actions are needed to
interpret the value of the metric: adding context; and establishing
the relationship with the goal.

[Figure 1. The lines of code of a software system from January 2010 to July 2011. Y-axis: lines of code; x-axis: January 2010 to July 2011.]
[Figure 2. Measuring lines of code in two different ways. Y-axis: lines of code; x-axis: January 2010 to July 2011.]
[Figure 3. Measuring number of files used. Y-axis: number of files; x-axis: January 2010 to July 2011.]

To illustrate these points, we use the LOC (lines of code) metric to
provide details about the current size of
a project. Even though there are multiple definitions of what
constitutes a line of code, such a metric can be used to reason about
whether the examined code base is complete or contains extraneous code
such as copied-in libraries. To do this, however, the metric should be
placed in context, bringing us to our first pitfall.

Metric in a bubble. Using a metric without proper interpretation.
Recognized by not being able to explain what a given value of a metric
means. Can be solved by placing the metric inside a context with
respect to a goal.

The usefulness of a single data point of a metric is limited. Knowing
that a system is 100,000 LOC is meaningless by itself, since the
number alone does not explain if the system is large or small. To be
useful, the value of the metric should, for example, be compared
against data points taken from the history of the project or from a
benchmark of other projects. In the first scenario, you can discover
trends that should be explained by external events. For example, the
graph in Figure 1 shows the LOC of a software system from January 2010
to July 2011. The first question that comes to mind here is: Why did
the size of the system drop so much in July 2010?
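The two kinds of context mentioned above, the project's own history and a benchmark of other projects, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and every number is invented rather than taken from the data behind Figure 1.

```python
def largest_change(history):
    """history: chronological list of (label, value) pairs.

    Return (label, relative_change) for the biggest jump or drop between
    two consecutive data points, or None if there are fewer than two points.
    """
    best = None
    for (_, previous), (label, current) in zip(history, history[1:]):
        if previous == 0:
            continue  # a relative change is undefined for a zero baseline
        change = (current - previous) / previous
        if best is None or abs(change) > abs(best[1]):
            best = (label, change)
    return best


def share_of_smaller_systems(value, benchmark):
    """Fraction of benchmark systems whose value is smaller than the given one."""
    if not benchmark:
        raise ValueError("benchmark must not be empty")
    return sum(1 for other in benchmark if other < value) / len(benchmark)


# Invented LOC history of one system and invented sizes of other systems.
loc_history = [
    ("2010-05", 300_000),
    ("2010-06", 310_000),
    ("2010-07", 200_000),
    ("2010-08", 205_000),
]
benchmark_loc = [20_000, 50_000, 80_000, 150_000, 400_000]

label, change = largest_change(loc_history)
print(f"largest change: {label} ({change:+.0%})")
print(f"larger than {share_of_smaller_systems(100_000, benchmark_loc):.0%} of the benchmark")
```

Neither function says whether a value is good or bad. They only point to the data point that needs an explanation, or show where a system stands relative to others.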

If the answer to this question is, "We removed a lot of open source
code we copied in earlier," then there is no problem (other than the
inclusion of this code in the first place). If the answer is, "We
accidentally deleted part of our code base," then it might be wise to
introduce a different process of source-code version management. In
this case the answer is that an action was scheduled to drastically
reduce the amount of configuration needed; given the amount of code
that was removed, this action was apparently successful.

Note that one of the benefits of placing metrics in context is that it
allows you to focus on the important part of the graph. Questions
regarding what happened at a certain point in time or why the value
significantly deviates from other systems become more important than
the specific details about how the metric is measured. Often people,
either on purpose or by accident, try to steer a discussion toward
"How is this metric measured?" instead of "What do these data points
tell me?" In most cases the exact construction of a metric is not
important for the conclusion drawn from the data.

For example, consider the three plots shown in Figures 2 and 3,
representing different ways of computing the volume of a system.
Figure 2 shows the lines of code counted as every line containing at
least one character that is not a comment or white space (blue) and
lines of code counted as all new line characters (orange). Figure 3
shows the number of files used.
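The three volume measures just described can be sketched as below. The sketch rests on simplifying assumptions that are not from the article: comments are recognized only as whole-line comments starting with `//`, block comments are not handled, and the function names and sample files are hypothetical.

```python
def count_newlines(source: str) -> int:
    """Volume as the number of new line characters."""
    return source.count("\n")


def count_code_lines(source: str, comment_prefix: str = "//") -> int:
    """Volume as lines holding at least one character that is neither
    white space nor comment.

    Simplification (assumption): only whole-line comments that start with
    comment_prefix are recognized; block comments are not handled.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def volume(files):
    """files: mapping of file name to source text. Returns all three measures."""
    sources = list(files.values())
    return {
        "code_lines": sum(count_code_lines(s) for s in sources),
        "newlines": sum(count_newlines(s) for s in sources),
        "files": len(sources),
    }


# Two tiny made-up source files.
sample = "// header\n\nint x = 1; // set x\n  \nreturn x;\n"
system = {"A.java": sample, "B.java": "int y;\n"}
print(volume(system))
```

The three numbers differ in scale, but as long as every snapshot and every system is measured the same way, each of them supports the same kind of comparison.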

The trend lines indicate that, even though the scale differs, these
volume metrics all show the same events. This means that each of these
metrics is a good candidate to compare the volume of a system against
other systems. As long as the volume of the other systems is measured
in the same manner, the conclusions drawn from the data will be very
similar.

The different trend lines bring up a second question: Why does the
volume decrease after a period in which the volume increased? The
answer can be found in the normal way in which alterations are made to
this particular system. When the volume of the system increases, an
action is scheduled to determine whether new abstractions are
possible, which is usually the case. This type of refactoring can
significantly decrease the size of the code base, which results in
lower maintenance effort and easier ways to add functionality to the
system. Thus, the goal here is to reduce maintenance effort by (among
others) keeping the size of the code base relatively small.

In the ideal situation a direct relationship exists between a desired
goal (such as reduced maintenance effort) and a metric (such as a
small code base). In some cases this relationship is based on informal
reasoning (for example, when the code base of a system is small it is
easier to analyze what the system does); in other cases scientific
research has shown that the relationship exists. What is important
here is
that you determine both the nature of the relationship between the
metric and the goal (direct/indirect) and the strength of this
relationship (informal reasoning/empirically validated).

Thus, a metric in isolation will not help you reach your goal. On the
other hand, assigning too much meaning to a metric leads to a
different pitfall.

Treating the metric. Making alterations just to improve the value of a
metric. Recognized when changes made to the software are purely
cosmetic. Can be solved by determining the root cause of the value of
a metric.

The most common pitfall is making changes to a system just to improve
the value of a metric, instead of trying to reach a particular goal.
At this point, the value of the metric has become a goal in itself,
instead of a means of reaching a larger goal. This situation leads to
refactorings that simply "please the metric," which is a waste of
precious resources. You know this has happened when, for example, one
developer explains to another developer that a refactoring needs to be
done because "the duplication percentage is too high," instead of
explaining that multiple copies of a piece of code can cause problems
for maintaining the code later on. It is never a problem that the
value of a metric is too high or too low: the fact that this value is
not in line with your goal should be the reason to perform a
refactoring.

Consider a project in which the number of parameters for methods is
high compared with a benchmark. When a method has a relatively large
number of parameters (for example, more than seven) it can indicate
that this method is implementing different functionalities. Splitting
the method into smaller methods would make it easier to understand
each function separately.

A second problem that could be surfacing through this metric is the
lack of a grouping of related data objects. For example, consider a
method that takes as parameters a Date object called startDate and
another called endDate. The names suggest that these two parameters
together form a DatePeriod object in which startDate will need to be
before endDate. When multiple methods take these two parameters as
input, introducing such a DatePeriod object to make this explicit in
the model could be beneficial, reducing both future maintenance
effort and the number of parameters being passed to methods.
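A minimal sketch of such a DatePeriod object is shown below. The article discusses Java; the sketch uses Python, so startDate and endDate become `start_date` and `end_date`. The `contains` method and the `count_orders` function are hypothetical additions, included only to show one caller that now takes a single parameter instead of two.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatePeriod:
    """Groups two related dates and enforces that the start precedes the end."""

    start_date: date
    end_date: date

    def __post_init__(self):
        # The rule that was implicit in the parameter names is now checked once.
        if not self.start_date < self.end_date:
            raise ValueError("start_date must be before end_date")

    def contains(self, day: date) -> bool:
        return self.start_date <= day <= self.end_date


def count_orders(order_dates, period: DatePeriod) -> int:
    """Hypothetical caller: one DatePeriod parameter replaces two date parameters."""
    return sum(1 for day in order_dates if period.contains(day))


first_half = DatePeriod(date(2012, 1, 1), date(2012, 6, 30))
orders = [date(2012, 2, 14), date(2012, 6, 30), date(2012, 7, 1)]
print(count_orders(orders, first_half))
```

The parameter count goes down as a side effect. The change serves the underlying goal because the relationship between the two dates is now stated and validated in one place.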

Sometimes, however, parameters are, for example, moved to the fields
of the surrounding class or replaced by a map in which a (String,
Object) pair represents the different parameters. Although both
strategies reduce the number of parameters inside methods, it is clear
that if the goal is to improve readability and reduce future
maintenance effort, then these solutions are not helping. It could be
that this type of refactoring is done because the developers simply do
not understand the goal and thus are treating the symptoms. There are
also situations, however, in which these non-goal-oriented
refactorings are done to game the system. In both situations it is
important to make the developers aware of the underlying goals to
ensure that effort is spent wisely.

Thus a metric should never be used as-is, but it should be placed
inside a context that enables a meaningful comparison. Additionally,
the relationship between the metric and desired property of your goal
should be clear; this enables you to use the metric to schedule
specific actions that will help reach your goal. Make sure the
scheduled actions are targeted toward reaching the underlying goal
instead of only improving the value of the metric.

How Many Metrics Do You Need?

Each metric provides a specific viewpoint of your system. Therefore,
combining multiple metrics leads to a balanced overview of the current
state of your system. The number of metrics to be used leads to two
pitfalls; we start with using only a single metric.

One-track metric. Focusing on only a single metric. Recognized by
seeing only one (or just a few) metrics on display. Can be solved by
adding metrics relevant to the goal.

Using only a single software metric to measure whether you are on
track toward your goal reduces that goal to a single dimension (that
is, the metric that is currently being measured). A goal is never
one-dimensional, however. Software projects experience constant
trade-offs between delivering desired functionality and nonfunctional
requirements such as security, performance, scalability, and
maintainability. Therefore, multiple metrics are necessary to ensure
that your goal, including specified trade-offs, is reached. For
example, a small code base might be easier to analyze, but if this
code base is made of highly complex code, then it can still be
difficult to make changes.

In addition to providing a more balanced view of your goal, using
multiple metrics also assists you in finding the root cause of a
problem. A single metric usually shows only a single symptom, while a
combination of metrics can help diagnose the actual disease within a
project.
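One way to combine several unit-level metrics is to list, per unit, every metric that is out of line and then look at the units with more than one symptom. The sketch below is an assumption-laden illustration: the metric names, the thresholds, and the unit data are all invented for the example.

```python
# Hypothetical thresholds, chosen only for this illustration.
THRESHOLDS = {"length": 30, "complexity": 10, "duplication": 0.2}


def symptoms(unit):
    """Names of the metrics for which this unit exceeds its threshold."""
    return sorted(name for name, limit in THRESHOLDS.items() if unit[name] > limit)


def units_with_multiple_symptoms(units):
    """Map unit name to its symptoms, keeping only units flagged by two or more metrics."""
    flagged = {}
    for unit in units:
        found = symptoms(unit)
        if len(found) >= 2:
            flagged[unit["name"]] = found
    return flagged


# Invented measurements for three methods.
units = [
    {"name": "Order.validate", "length": 80, "complexity": 25, "duplication": 0.05},
    {"name": "Report.render", "length": 45, "complexity": 4, "duplication": 0.0},
    {"name": "Order.total", "length": 10, "complexity": 2, "duplication": 0.0},
]

for name, found in units_with_multiple_symptoms(units).items():
    print(name, found)
```

A unit that shows up under several metrics at once is a better starting point for a root-cause discussion than a long list of single-metric violations.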

For example, in one project the equals and hashCode methods (those
used to implement equality for objects in Java) were among the longest
and most complex methods within the system. Additionally, a relatively
large percentage of duplication occurred in these methods. Since they
use all the fields of a class, the metrics indicate that multiple
classes have a relatively large number of fields that are also
duplicated. Based on this observation, we reasoned that the duplicated
fields form an object that was missing from the model. In this case we
advised looking into the model of the system to determine whether
extending the model with a new object would be beneficial.

In this example, examining the metrics in isolation would not have led
to this conclusion, but by combining several unit-level metrics, we
were able to detect a design flaw.

Metrics galore. Focusing on too many metrics. Recognized when the team
ignores all metrics. Can be solved by reducing the number of metrics
used.

Although using a single metric oversimplifies the goal, using too many
metrics makes it difficult (or even impossible) to reach your goal.
Apart from making it hard to find the right balance among a large set
of metrics, it is not motivating for a team to see that every change
they make results in the decline of at least one metric. Additionally,
when the value of a metric is far off the desired goal, then a team
can start to think, "We will never get there, anyway," and simply
ignore the metrics altogether.

For example, there have been multiple projects that deployed a
static-analysis tool without critically examining the default
configuration. When the tool in question contains, for example, a
check that flags the use of a tab character instead of spaces, the
first run of the tool can report an enormous number of violations for
each check (running into the hundreds of thousands). Without proper
interpretation of this number, it is easy to conclude that reaching
zero violations cannot be done within any reasonable amount of time
(even though some problems can easily be solved by a simple formatting
action). Such an incorrect assessment sometimes results in the tool
being considered useless by the team, which then decides to ignore the
tool.

Fortunately, in other cases the team adapts the configuration to suit
the specific situation by limiting the number of checks (for example,
by removing checks that measure highly related properties, can be
solved automatically, or are not related to the current goals) and
instantiating proper default values. By using such a specific
configuration, the tool reports a lower number of violations that can
be fixed in a reasonable amount of time.

To ensure all violations are fixed eventually, the configuration can
be extended to include other types of checks or more strict versions
of checks. This will increase the total number of violations found,
but when done correctly the number of reported violations does not
demotivate the developers too much. This process can be repeated to
extend the set of checks slowly toward all desired checks without
overwhelming the developers with a large number of violations at once.

Conclusion

Software metrics are useful tools for project managers and developers
alike. To benefit from the full potential of metrics, keep the
following recommendations in mind:

Attach meaning to each metric by placing it in context and defining
the relationship between the metric and your goal, while at the same
time avoid making the metric a goal in itself.

Use multiple metrics to track different dimensions of your goal, but
avoid demotivating a team by using too many metrics.

If you are already using metrics in your daily work, try to link them
to specific goals. If you are not using any metrics at this time but
would like to see their effects, we suggest you start small: define a
small goal (methods should be simple to understand for new personnel);
define a small set of metrics (for example, length and complexity of
methods); define a target measurement (at least 90% of the code should
be simple); and install a tool that can measure the metric.
Communicate both the goal and the trend of the metric to your
colleagues and experience the influence of metrics.

Eric Bouwers (at [email protected]) is a software engineer and
technical consultant at the Software Improvement Group in Amsterdam,
The Netherlands. He is a part-time Ph.D. student at Delft University
of Technology. He is interested in how software metrics can assist in
quantifying the architectural aspects of software quality.

Joost Visser ([email protected]) is head of research at the Software
Improvement Group in Amsterdam, The Netherlands, where he is
responsible for innovation of tools and services, academic relations,
and general research. He also holds a part-time position as professor
of large-scale software systems at the Radboud University Nijmegen,
The Netherlands.

Arie van Deursen ([email protected]) is a full professor in software
engineering at Delft University of Technology, The Netherlands, where
he leads the Software Engineering Research Group. His research topics
include software testing, software architecture, and collaborative
software development.

© 2012 ACM 0001-0782/12/07 $15.00