Key Measurements For Testers

By Gopi

Delivering Software Project Success

Precision vs. Accuracy

Accuracy
  Saying PI = 3 is accurate, but not precise
  I’m 2 meters tall, which is accurate, but not precise

Precision
  Saying PI = 4.378383 is precise, but not accurate
  Airline flight times are precise to the minute, but not accurate

Number of significant digits is the key

Precision vs. Accuracy

People make assumptions about accuracy based on precision
  “365 days” is not the same as “1 year” or “4 quarters” or even “52 weeks”
  “10,000 staff hours” is not the same as “5 staff years”

Unwarranted precision is the enemy of accuracy (e.g., 395.7 days +/- 6 months)
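One way to avoid unwarranted precision is to round an estimate to the decimal place of its uncertainty's leading digit. The sketch below is only an illustration: the function name `round_to_uncertainty` is hypothetical, and "6 months" is assumed to be roughly 180 days.

```python
import math


def round_to_uncertainty(value, uncertainty):
    """Round value to the decimal place of the uncertainty's leading digit."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Position of the leading digit of the uncertainty: 180 -> 2 (hundreds)
    exponent = math.floor(math.log10(uncertainty))
    step = 10 ** exponent
    return round(value / step) * step


# 395.7 days +/- 6 months, with 6 months assumed to be about 180 days
print(round_to_uncertainty(395.7, 180))  # prints 400
```

Reported this way, the estimate reads "about 400 days, give or take 6 months", which no longer suggests accuracy to a tenth of a day.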

Introduction

Good Goals

A goal should be SMART
  Specific
  Measurable/Testable
  Attainable
  Relevant
  Time-bound

Can use a Purpose, Issue, Object format


GQM Hierarchy

[Figure: tree diagram. Each goal (Goal 1, Goal 2) branches into questions, and each question branches into measures.]


GQM Example

Goal
  Purpose: Improve by 10%
  Issue: the timeliness of
  Object (process): change request processing
  Viewpoint: from the project manager’s viewpoint

Question: What is the current change request processing speed?
Measures:
  Average cycle time
  Standard deviation
  % cases outside the upper limit

Question: Is the performance of the process improving?
Measures:
  (Current average cycle time / Baseline average cycle time) * 100
  Subjective rating of manager’s satisfaction
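The quantitative measures in this GQM example can be computed from a list of change request cycle times. This is a minimal sketch: the function name, the sample data, the upper limit, and the baseline value are all made up for illustration.

```python
from statistics import mean, stdev


def cycle_time_measures(cycle_times, upper_limit, baseline_average):
    """Compute the GQM measures for change request processing speed."""
    average = mean(cycle_times)
    outside = sum(1 for t in cycle_times if t > upper_limit)
    return {
        "average_cycle_time": average,
        "standard_deviation": stdev(cycle_times),
        "pct_outside_upper_limit": 100.0 * outside / len(cycle_times),
        # Current average cycle time as a percentage of the baseline average
        "pct_of_baseline": 100.0 * average / baseline_average,
    }


# Hypothetical cycle times, in days, for five change requests
measures = cycle_time_measures([4, 6, 5, 9, 6], upper_limit=8, baseline_average=7.5)
for name, value in measures.items():
    print(name, round(value, 2))
```

With these sample numbers the current average is 80% of the baseline, so the "improve by 10%" goal (90% of baseline or lower) would be met.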

Project Evaluation: Quality

Test Planning and Resources

Do we have enough testing resources?
  How many tests do we need to run (estimated)?
  How long does each test case take to design and write?
  How long does each test take, on average?
  How many full testing cycles do we expect? (more than one especially for early test cycles)
  How many person-days do we need (# tests * time per test * # of cycles)?
  How many testing staff do we have?
  How long will the testing phase take, with our current staff?
  Is the testing phase too long (i.e. our current staff is not sufficient)? Do we have to test less or can we add staff?


Reported/Corrected Software Defects

[Figure: defects found, defects fixed, and defects open, plotted from 0% to 100% over time, from the start to the end of the testing phase.]

From Manager’s Handbook for Software Development, Revision 1, NASA, Software Engineering Laboratory, 1990


Reported/Corrected Software Defects – Actual Project

[Figure: number of defect reports (in thousands, 0 to 1.0) against weeks of testing (5 to 40), with Found, Open, and Fixed curves.]


Defect Rate

[Figure: Expected Total Defects. Defects/month (0 to 70) against months from start of project (0 to 27), with 95%, 99%, and 99.9% markers.]


Statistics on Effort per Defect

Data on time required to fix defects, categorized by type of defect, provides a basis for estimating remaining defect correction work
  Need to collect data on fix time in defect tracking system

Data on phases in which defects are injected and later detected gives you a measure of the efficiency of the development process. If 95% of the defects are detected in the same phase they were created, the project has an efficient process
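The "detected in the same phase they were created" percentage can be computed directly from defect tracking records. This is a minimal sketch, assuming each record carries the phase of injection and the phase of detection; the function name and sample records are hypothetical.

```python
def same_phase_detection_pct(defects):
    """Percentage of defects detected in the phase where they were injected.

    defects: iterable of (phase_injected, phase_detected) pairs.
    """
    defects = list(defects)
    if not defects:
        raise ValueError("no defects recorded")
    same_phase = sum(1 for injected, detected in defects if injected == detected)
    return 100.0 * same_phase / len(defects)


# Hypothetical records exported from a defect tracking system
records = [
    ("design", "design"),
    ("design", "code"),
    ("code", "code"),
    ("code", "system test"),
]
print(same_phase_detection_pct(records))  # prints 50.0
```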


A Defect Fix Time Model for Testing

Share of defects    Time to fix
25%                 2 hours
50%                 5 hours
20%                 10 hours
4%                  20 hours
1%                  50 hours

From Software Metrics: Establishing a Company-wide Program, by Robert B Grady and Deborah L. Caswell, 1987
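A fix time model like this gives an expected effort per defect, which can be multiplied by the open defect count to estimate remaining correction work. The sketch below assumes the percentages and hours pair up in the order listed (25% take 2 hours, 50% take 5 hours, and so on). The names and the open defect count are hypothetical.

```python
# (share of defects, hours to fix), paired in the order they appear in the model
FIX_TIME_MODEL = [(0.25, 2), (0.50, 5), (0.20, 10), (0.04, 20), (0.01, 50)]


def expected_fix_hours(model=FIX_TIME_MODEL):
    """Weighted average fix time per defect, in hours."""
    return sum(share * hours for share, hours in model)


def remaining_fix_effort(open_defects, model=FIX_TIME_MODEL):
    """Estimated hours needed to fix the given number of open defects."""
    return open_defects * expected_fix_hours(model)


print(round(expected_fix_hours(), 2))       # 6.3 hours per defect
print(round(remaining_fix_effort(100), 1))  # 630.0 hours for 100 open defects
```

Your own fix time data, categorized by defect type, should replace these shares once enough history has been collected.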

Product Characterization: Quality

Defects

Defects are one of the most often used measures of quality

Definitions of defects differ
  Only items found by customers? Testers?
  Items found during upstream reviews?
  Only non-trivial items?
  Small enhancements?

Timing of “defect” detection an important part of defect characterization
  A “product defect” may be different than a “process defect”

Product Evaluation: Testing

System Test Profile

[Figure: number of tests (0 to 140) over the system test phase, with Tests Planned, Tests Executed, and Tests Passed curves.]

From NASA, Recommended Approach to Software Development, 1992



Cumulative Defects Found in Testing

[Figure: Error Rate Model. Cumulative errors per KSLOC (0 to 8) across the Design, Code/Test, System Test, and Acceptance Test phases, with Historical Norm, Upper Bound, and Lower Bound curves.]



Cumulative Defects – Actual Project

[Figure: Error Rate Model. Cumulative errors per KSLOC (0 to 8) across the Design, Code/Test, System Test, and Acceptance Test phases, with Historical Norm, Upper Bound, Lower Bound, and Actual Project curves.]


Product Prediction

Predicting Future Defect Rates

Increasing Factors
  System size
  Application complexity
  Compressing the schedule
    4x increase
  More staff
  Lower productivity

Decreasing Factors
  Simplifying the application/problem at hand
  Extending the planned development time
    Cut in half
  Fewer staff
  Higher productivity


Defect Density Prediction

To judge whether we’ve found all the defects for an application, estimate its defect density
  Need statistics on defect density of past similar projects
  Use this data to predict expected density on this project
  For example, if our prior projects had a defect density between 7 and 9.5 defects/KLOC, we expect a similar density on our new project
    If our new project has 100,000 lines of code, we expect to find between 700 and 950 defects total
    If we’ve found 600 defects so far
      We’re not done: we expect to find between 100 and 350 more defects
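The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch using the numbers from the slide; the function name is hypothetical.

```python
def remaining_defect_range(kloc, low_density, high_density, found_so_far):
    """Return ((expected_low, expected_high), (remaining_low, remaining_high))."""
    expected_low = kloc * low_density
    expected_high = kloc * high_density
    # Never report a negative number of remaining defects
    remaining_low = max(0, expected_low - found_so_far)
    remaining_high = max(0, expected_high - found_so_far)
    return (expected_low, expected_high), (remaining_low, remaining_high)


# 100,000 lines of code = 100 KLOC, prior density of 7 to 9.5 defects/KLOC
expected, remaining = remaining_defect_range(100, 7, 9.5, found_so_far=600)
print(expected)   # (700, 950.0)
print(remaining)  # (100, 350.0)
```

If the count found so far already exceeds the upper expectation, the remaining range drops to zero, which is a signal to recheck the density assumption rather than proof that testing is done.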


Distribution of Software Defect Origins and Severities

Highest severity faults come from requirements and design

[Figure: severity level (Minor, Mod, Major, Critical) by defect origin (Requirements, Design, Coding, Documentation, Bad Fixes).]


Defect Modeling

Model the number of defects expected based on past experience
Model the number of defects in requirements, design, construction, etc.
Two approaches:
  Model defects based on effort hours, i.e., X defects will be introduced per hour worked
  Model defects per KSLOC (or other size unit) based on past experience and code growth curve


Defect Modeling continued

Approach 1: SEI data, based on PSP data:
  Design: Injected/hour = 1.76
  Coding: Injected/hour = 4.20

Approach 2: Defects / KSLOC total are about 40 (30-85)
  10% requirements (4/KLOC)
  25% design (10/KLOC)
  40% coding (16/KLOC)
  15% user documentation (6/KLOC)
  10% bad fixes (4/KLOC)
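Both approaches reduce to simple multiplication. The sketch below uses the injection rates and origin percentages quoted above. The function names and the sample hours and size are hypothetical, and your own historical data should replace the constants where available.

```python
# Approach 1: defects injected per hour worked (figures quoted above)
DESIGN_DEFECTS_PER_HOUR = 1.76
CODING_DEFECTS_PER_HOUR = 4.20

# Approach 2: share of total defects by origin (figures quoted above)
ORIGIN_SHARES = {
    "requirements": 0.10,
    "design": 0.25,
    "coding": 0.40,
    "user documentation": 0.15,
    "bad fixes": 0.10,
}


def defects_from_effort(design_hours, coding_hours):
    """Approach 1: expected injected defects from effort hours."""
    return (design_hours * DESIGN_DEFECTS_PER_HOUR
            + coding_hours * CODING_DEFECTS_PER_HOUR)


def defects_from_size(ksloc, defects_per_ksloc=40):
    """Approach 2: expected defects by origin from code size."""
    total = ksloc * defects_per_ksloc
    return {origin: total * share for origin, share in ORIGIN_SHARES.items()}


# Hypothetical project: 100 design hours, 50 coding hours, 25 KSLOC
print(round(defects_from_effort(100, 50)))
for origin, count in defects_from_size(25).items():
    print(origin, round(count))
```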


Predicted and Actual Defects Found

[Figure: defects (0 to 800) by development phase. Per-phase series: injection estimate, expected removal, actual removal. Cumulative series: injection estimate, injection reestimate, expected removal, actual removal. An annotation marks the size reestimate.]

From Edward F. Weller, Practical Applications of Statistical Process Control, IEEE Software, May/June 2000


Defect Profile by Type - Example

[Figure: sources of defects.]

Release Measures

Defect Counts

Defect counts give a quantitative handle on how much work the project team still has to do before it can release the software
Graph the cumulative reported defects, open defects and fixed defects
When the software is nearing release, the number of open defects should trend downward, and the fixed defects should be approaching the reported defects line
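The trend described here can be checked from the same cumulative series that feed the graph. This is a rough sketch, not a release rule: the function names, the three-period window, and the sample counts are assumptions for illustration.

```python
def open_defects(found, fixed):
    """Open defects per period from cumulative found and fixed counts."""
    return [f - x for f, x in zip(found, fixed)]


def open_trending_down(found, fixed, window=3):
    """True if open defects fell in each of the last `window` periods."""
    recent = open_defects(found, fixed)[-window:]
    if len(recent) < window:
        return False  # not enough history to judge a trend
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical cumulative counts per week of testing
found = [100, 180, 240, 270, 285, 290]
fixed = [40, 110, 190, 240, 270, 284]
print(open_defects(found, fixed))        # [60, 70, 50, 30, 15, 6]
print(open_trending_down(found, fixed))  # True
```

A shrinking open count is the same signal as the fixed line closing in on the reported line, since open defects are the gap between the two.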


Defect Trends – Near Release: All Defects

[Figure: defect counts (in thousands, 0 to 1.0) against weeks of testing (5 to 40), with Found, Open, Fixed, and Target curves.]


Defect Trends – Near Release: Severity 1 and 2

[Figure: defect counts (in thousands, 0 to 1.0) against weeks of testing (5 to 40), with Found, Open, Fixed, and Target curves.]


Construx Measurable Release Criteria

Acceptance testing successfully completed
All open change requests dispositioned
System testing successfully completed
All requirements implemented, based on the spec
All review goals have been met
Declining defect rates are seen
Declining change rates are seen
No open Priority A defects exist in the database
Code growth has stabilized


HP Measurable Release Criteria

Breadth – testing coverage of user accessible and internal functions
Depth – branch coverage testing
Reliability – continuous hours of operation under stress; stability; ability to recover gracefully from defect conditions
Remaining defect density at release

From Robert B Grady, Practical Software Metrics for Project Management and Process Improvement, 1992


Post Release Defect Density by Whether Met Release Criteria

[Figure: postrelease incoming defects submitted by customers (3 month moving average). Defects submitted (normalized by KLOC) against months (MR, 1 to 12), with three curves: Did Not Meet, Worst Product That Met, and Average of Products That Met.]

From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady, 1992

Release Measures: Defect Counts

Defect Plot Before Release

[Figure: number of defects (0 to 12) over time, with Sev 1 & 2, Sev 2, Sev 1, and Target curves.]

From Robert B Grady, Practical Software Metrics for Project Management and Process Improvement, 1992

Detection Effectiveness

[Figure: detection effectiveness (0 to 100) with Highest, Modal, and Lowest values for each activity: Design Check, Design Review, Design Inspection, Code Inspection, Prototype, Code Check, Unit Test, Functional Test, Integration Test, Field Trial, and Cumulative.]

[Jones86]

Process Evaluation

Status Model

Units created

Units reviewed

Units tested

Process Evaluation

Status Example

[Figure: units (0 to 800) over the implementation phase, with Target, Units Created, Units Reviewed, and Units Tested curves.]

From NASA, Manager’s Handbook for Software Development, Revision 1, 1990

Goal #1 – Improve Software Quality

Postrelease Discovered Defect Density

[Figure: number of open serious and critical defect reports (vertical axis 0 to 1) from Nov-84 to Jan-93, with Older and < 12 Months series and a 10X Goal line.]


Goal #1 – Improve Software Quality

Prerelease Defect Density

Question: How can we predict software quality based on early development processes?

[Figure: defects in test/KLOC (0 to 80) by project release date (Oct-80 to Dec-88), with a linear trend line.]


Goal #3 – Improve Productivity

Defect Repair Efficiency

Question: How efficient are defect-fixing activities? Are we improving?

[Figure: defects fixed per engineering month (0 to 5) by year, 1987 to 1991.]


Goal #4 – Maximize Customer Satisfaction

Mean Time to Fix Critical and Serious Defects

Question: How long does it take to fix a problem?

[Figure: days (0 to 250) by month from 7/18/1990 to 8/18/1991, broken down by status: AR, QA, KP+AD, LC, MR.]

AR = awaiting release
QA = final QA testing
KP = known problem
AD = awaiting data
LC = lab classification
MR = marketing review
