Key Measurements For Testers
By Gopi
Delivering Software Project Success
Precision vs. Accuracy
Accuracy
  Saying PI = 3 is accurate, but not precise
  I'm 2 meters tall, which is accurate, but not precise
Precision
  Saying PI = 4.378383 is precise, but not accurate
  Airline flight times are precise to the minute, but not accurate
Number of significant digits is the key
Precision vs. Accuracy
People make assumptions about accuracy based on precision
  "365 days" is not the same as "1 year" or "4 quarters" or even "52 weeks"
  "10,000 staff hours" is not the same as "5 staff years"
Unwarranted precision is the enemy of accuracy (e.g., 395.7 days +/- 6 months)
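One way to avoid unwarranted precision is to round an estimate to the size of its uncertainty before reporting it. The sketch below is only an illustration: the function name and the rule of rounding to the leading digit of the uncertainty are assumptions, not something the slides prescribe.

```python
import math

def round_to_uncertainty(value, uncertainty):
    """Round value to the decimal place of the uncertainty's leading digit.

    Hypothetical helper: the rounding rule is an assumed convention.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Power of ten of the uncertainty's leading digit, e.g. 183 -> 2
    exponent = math.floor(math.log10(uncertainty))
    step = 10 ** exponent
    return round(value / step) * step

# 395.7 days +/- 6 months (roughly 183 days) is better reported as 400 days
print(round_to_uncertainty(395.7, 183))  # 400
```

Reporting "about 400 days, give or take 6 months" makes the same claim without implying tenth-of-a-day accuracy.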
Introduction
Good Goals
A goal should be SMART
  Specific
  Measurable/Testable
  Attainable
  Relevant
  Time-bound
Can use a Purpose, Issue, Object format
Introduction
GQM Hierarchy
[Figure: a three-level hierarchy. Goals (Goal 1, Goal 2) at the top break down into Questions, and each Question is answered by one or more Measures.]
Introduction
GQM Example
Goal
  Purpose: Improve by 10%
  Issue: the timeliness of
  Object (process): change request processing
  Viewpoint: from the project manager's viewpoint
Question: What is the current change request processing speed?
Measures
  Average cycle time
  Standard deviation
  % cases outside the upper limit
Question: Is the performance of the process improving?
Measures
  (Current average cycle time / Baseline average cycle time) * 100
  Subjective rating of manager's satisfaction
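The quantitative measures in this GQM example can be computed from a list of change request cycle times. The following is a minimal sketch: the function name, the sample data, the upper limit, and the baseline value are all hypothetical, and the subjective satisfaction rating is left out because it is not computed.

```python
from statistics import mean, stdev

def cycle_time_measures(cycle_times, upper_limit, baseline_average):
    """Compute the GQM example measures for cycle times (e.g. in days)."""
    average = mean(cycle_times)
    outside = sum(1 for t in cycle_times if t > upper_limit)
    return {
        "average": average,
        "std_dev": stdev(cycle_times),  # sample standard deviation
        "pct_outside_upper_limit": 100.0 * outside / len(cycle_times),
        # Current average as a percentage of baseline; below 100 is faster
        "pct_of_baseline": 100.0 * average / baseline_average,
    }

# Hypothetical data: five change requests, baseline average of 7.5 days
measures = cycle_time_measures([4, 6, 5, 9, 6], upper_limit=8,
                               baseline_average=7.5)
for name, value in measures.items():
    print(f"{name}: {value:.1f}")
```

With this made-up data the current average is 80% of the baseline, which would satisfy a goal of improving timeliness by 10%.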
Project Evaluation: Quality
Test Planning and Resources
Do we have enough testing resources?
  How many tests do we need to run (estimated)?
  How long does each test case take to design and write?
  How long does each test take, on average?
  How many full testing cycles do we expect? (more than one, especially for early test cycles)
  How many person-days do we need (# tests * time per test * # of cycles)?
  How many testing staff do we have?
  How long will the testing phase take, with our current staff?
  Is the testing phase too long (i.e., our current staff is not sufficient)? Do we have to test less or can we add staff?
Project Evaluation: Quality
Reported/Corrected Software Defects
[Figure: defects found, defects fixed, and defects open (0% to 100%) plotted over time, from the start of the testing phase to the end of the testing phase.]
From Manager's Handbook for Software Development, Revision 1, NASA, Software Engineering Laboratory 1990
Project Evaluation: Quality
Reported/Corrected Software Defects – Actual Project
[Figure: number of defect reports (in thousands, 0 to 1.0) against weeks of testing (5 to 40). Series: Found, Open, Fixed.]
Project Evaluation: Quality
Defect Rate
[Figure: Expected Total Defects. Defects/month (0 to 70) against months from start of project (0 to 27), with the 95%, 99%, and 99.9% levels marked.]
Project Evaluation: Quality
Statistics on Effort per Defect
Data on time required to fix defects, categorized by type of defect, provides a basis for estimating remaining defect correction work
  Need to collect data on fix time in defect tracking system
Data on phases in which defects are injected and later detected gives you a measure of the efficiency of the development process. If 95% of the defects are detected in the same phase they were created, the project has an efficient process
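The share of defects detected in the phase where they were injected can be computed directly from defect tracking records. The sketch below assumes each record carries a (phase injected, phase detected) pair. The function name and the sample records are hypothetical.

```python
from collections import Counter

def phase_containment(defects):
    """Percent of defects detected in the phase that injected them.

    defects: iterable of (phase_injected, phase_detected) pairs.
    Returns (per_phase_percentages, overall_percentage).
    """
    defects = list(defects)
    injected = Counter(inj for inj, _ in defects)
    contained = Counter(inj for inj, det in defects if inj == det)
    per_phase = {phase: 100.0 * contained[phase] / injected[phase]
                 for phase in injected}
    overall = 100.0 * sum(contained.values()) / len(defects)
    return per_phase, overall

# Hypothetical defect records
records = [
    ("design", "design"), ("design", "test"),
    ("code", "code"), ("code", "code"), ("code", "test"), ("code", "code"),
]
per_phase, overall = phase_containment(records)
print(per_phase)  # {'design': 50.0, 'code': 75.0}
```

Against the 95% figure quoted above, the made-up project here would still have a long way to go.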
Project Evaluation: Quality
A Defect Fix Time Model for Testing
  Share of defects    Fix time
  25%                 2 hours
  50%                 5 hours
  20%                 10 hours
  4%                  20 hours
  1%                  50 hours
From Software Metrics: Establishing a Company-wide Program, by Robert B. Grady and Deborah L. Caswell, 1987
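The fix time model gives an expected fix time per defect, which in turn gives a rough estimate of the remaining correction work. This is a sketch only: the function names and the count of 100 open defects are hypothetical, and it assumes the open defects follow the same distribution as the model.

```python
# (share of defects, hours to fix), taken from the model above
FIX_TIME_MODEL = [(0.25, 2), (0.50, 5), (0.20, 10), (0.04, 20), (0.01, 50)]

def expected_fix_hours(model=FIX_TIME_MODEL):
    """Weighted average fix time per defect, in hours."""
    return sum(share * hours for share, hours in model)

def remaining_fix_effort(open_defects, model=FIX_TIME_MODEL):
    """Estimated hours needed to fix the given number of open defects."""
    return open_defects * expected_fix_hours(model)

print(f"{expected_fix_hours():.1f} hours per defect")        # 6.3
print(f"{remaining_fix_effort(100):.0f} hours for 100 defects")  # 630
```

A team with its own fix time data in the defect tracking system could swap in its own distribution for the model's.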
Product Characterization: Quality
Defects
Defects are one of the most often used measures of quality
Definitions of defects differ
  Only items found by customers? Testers?
  Items found during upstream reviews?
  Only non-trivial items?
  Small enhancements?
Timing of "defect" detection is an important part of defect characterization
A "product defect" may be different from a "process defect"
Product Evaluation: Testing
System Test Profile
[Figure: number of tests (0 to 140) across the system test phase. Series: Tests Planned, Tests Executed, Tests Passed.]
From NASA, Recommended Approach to Software Development, 1992
Product Evaluation: Testing
Cumulative Defects Found in Testing
[Figure: Error Rate Model. Cumulative errors per KSLOC (0 to 8) across the Design, Code/Test, System Test, and Acceptance Test phases. Series: Historical Norm, Upper Bound, Lower Bound.]
From Manager's Handbook for Software Development, Revision 1, NASA, Software Engineering Laboratory 1990
Product Evaluation: Testing
Cumulative Defects – Actual Project
[Figure: Error Rate Model. Cumulative errors per KSLOC (0 to 8) across the Design, Code/Test, System Test, and Acceptance Test phases. Series: Historical Norm, Upper Bound, Lower Bound, Actual Project.]
From Manager's Handbook for Software Development, Revision 1, NASA, Software Engineering Laboratory 1990
Product Prediction
Predicting Future Defect Rates
Increasing Factors
  System size
  Application complexity
  Compressing the schedule
    4x increase
  More staff
  Lower productivity
Decreasing Factors
  Simplifying the application/problem at hand
  Extending the planned development time
    Cut in half
  Fewer staff
  Higher productivity
Product Prediction
Defect Density Prediction
To judge whether we've found all the defects for an application, estimate its defect density
  Need statistics on defect density of past similar projects
  Use this data to predict expected density on this project
  For example, if our prior projects had a defect density between 7 and 9.5 defects/KLOC, we expect a similar density on our new project
    If our new project has 100,000 lines of code, we expect to find between 700 and 950 defects total
    If we've found 600 defects so far, we're not done: we expect to find between 100 and 350 more defects
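The arithmetic in this example is simple enough to script against a project's own historical range. A minimal sketch, with a hypothetical function name, using the numbers from the slide:

```python
def expected_remaining_defects(lines_of_code, density_low, density_high,
                               found_so_far):
    """Range of defects still expected, from historical defects/KLOC."""
    kloc = lines_of_code / 1000
    low_total = density_low * kloc
    high_total = density_high * kloc
    # Never report a negative number of remaining defects
    return (max(0, low_total - found_so_far),
            max(0, high_total - found_so_far))

# Slide example: 100,000 LOC, 7 to 9.5 defects/KLOC, 600 found so far
low, high = expected_remaining_defects(100_000, 7, 9.5, 600)
print(low, high)  # 100.0 350.0
```

If the count found so far already exceeds the upper estimate, that suggests this project is denser in defects than its predecessors, not that testing is finished.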
Product Prediction
Distribution of Software Defect Origins and Severities
Highest severity faults come from requirements and design
[Figure: severity level (Minor, Mod, Major, Critical) by defect origin (Requirements, Design, Coding, Documentation, Bad Fixes).]
Product Prediction
Defect Modeling
Model the number of defects expected based on past experience
Model the number of defects in requirements, design, construction, etc.
Two approaches:
  Model defects based on effort hours, i.e., X defects will be introduced per hour worked
  Model defects per KSLOC (or other size unit) based on past experience and code growth curve
Product Prediction
Defect Modeling (continued)
Approach 1: SEI data, based on PSP data:
  Design: Injected/hour = 1.76
  Coding: Injected/hour = 4.20
Approach 2: Defects/KSLOC total are about 40 (30-85)
  10% requirements (4/KLOC)
  25% design (10/KLOC)
  40% coding (16/KLOC)
  15% user documentation (6/KLOC)
  10% bad fixes (4/KLOC)
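Both approaches reduce to multiplying a rate by an amount of work. The sketch below encodes the rates quoted above. The function names and the sample effort and size figures are hypothetical, and a real model would use rates calibrated from the team's own history.

```python
# Approach 1: defects injected per hour worked (SEI/PSP figures above)
INJECTION_PER_HOUR = {"design": 1.76, "coding": 4.20}

# Approach 2: share of total defects by origin (figures above)
ORIGIN_SHARE = {
    "requirements": 0.10,
    "design": 0.25,
    "coding": 0.40,
    "user documentation": 0.15,
    "bad fixes": 0.10,
}

def defects_from_effort(hours_by_phase):
    """Approach 1: expected injected defects from effort hours per phase."""
    return {phase: INJECTION_PER_HOUR[phase] * hours
            for phase, hours in hours_by_phase.items()}

def defects_from_size(ksloc, defects_per_ksloc=40):
    """Approach 2: expected defects by origin from code size in KSLOC."""
    total = defects_per_ksloc * ksloc
    return {origin: share * total for origin, share in ORIGIN_SHARE.items()}

# Hypothetical project: 100 design hours, 50 coding hours, 25 KSLOC
print(defects_from_effort({"design": 100, "coding": 50}))
print(defects_from_size(25))
```

Running approach 2 with the low and high ends of the quoted range (30 and 85 defects/KSLOC) gives a band to compare against the defects actually found.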
Product Prediction
Predicted and Actual Defects Found
[Figure: defects (0 to 800) by development phase. Series: Phase injection estimate, Phase actual removal, Phase expected removal, Cumulative actual removal, Cumulative injection estimate, Cumulative expected removal, Cumulative injection reestimate. Annotation: Size reestimate.]
From Edward F. Weller, Practical Applications of Statistical Process Control, IEEE Software May/June 2000
Product Prediction
Defect Profile by Type - Example
[Figure: sources of defects.]
Release Measures
Defect Counts
Defect counts give a quantitative handle on how much work the project team still has to do before it can release the software
Graph the cumulative reported defects, open defects and fixed defects
When the software is nearing release, the number of open defects should trend downward, and the fixed defects should be approaching the reported defects line
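The three curves can be derived from weekly counts of defects reported and fixed. The following is a rough sketch with hypothetical function names and data. The "trending downward" check, a strict decrease over the last few weeks, is an assumed simplification rather than a rule from the slides.

```python
from itertools import accumulate

def defect_trends(found_per_week, fixed_per_week):
    """Return cumulative found, cumulative fixed, and open counts per week."""
    found = list(accumulate(found_per_week))
    fixed = list(accumulate(fixed_per_week))
    open_counts = [f - x for f, x in zip(found, fixed)]
    return found, fixed, open_counts

def open_is_trending_down(open_counts, window=3):
    """True if open defects strictly decreased over the last `window` weeks."""
    recent = open_counts[-window:]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))

# Hypothetical weekly counts
found, fixed, open_counts = defect_trends(
    [20, 30, 25, 10, 5, 2],
    [5, 20, 25, 20, 12, 8],
)
print(open_counts)                         # [15, 25, 25, 15, 8, 2]
print(open_is_trending_down(open_counts))  # True
```

In this made-up data the fixed line (90) is closing in on the reported line (92), which is the pattern expected near release.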
Release Measures
Defect Trends – Near Release: All Defects
[Figure: number of defect reports (in thousands, 0 to 1.0) against weeks of testing (5 to 40). Series: Found, Open, Fixed, Target.]
Release Measures
Defect Trends – Near Release: Severity 1 and 2
[Figure: number of defect reports (in thousands, 0 to 1.0) against weeks of testing (5 to 40). Series: Found, Open, Fixed, Target.]
Release Measures
Construx Measurable Release Criteria
  Acceptance testing successfully completed
  All open change requests dispositioned
  System testing successfully completed
  All requirements implemented, based on the spec
  All review goals have been met
  Declining defect rates are seen
  Declining change rates are seen
  No open Priority A defects exist in the database
  Code growth has stabilized
Release Measures
HP Measurable Release Criteria
  Breadth – testing coverage of user accessible and internal functions
  Depth – branch coverage testing
  Reliability – continuous hours of operation under stress; stability; ability to recover gracefully from defect conditions
  Remaining defect density at release
From Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, 1992
Release Measures
Post Release Defect Density by Whether Met Release Criteria
[Figure: postrelease incoming defects submitted by customers (3 month moving average). Defects submitted (normalized by KLOC) against months (MR, then 1 to 12). Series: Did Not Meet, Worst Product That Met, Average of Products That Met.]
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady 1992
Release Measures: Defect Counts
Defect Plot Before Release
[Figure: number of defects (0 to 12) over time. Series: Sev 1 & 2, Sev 2, Sev 1, Target.]
From Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, 1992
Detection Effectiveness
[Figure: detection effectiveness (0 to 100) by activity: Design Check, Design Review, Design Inspection, Code Inspection, Prototype, Code Check, Unit Test, Functional Test, Integration Test, Field Trial, and Cumulative. Series: Highest, Modal, Lowest.]
[Jones86]
Process Evaluation
Status Model
[Figure: three curves tracking Units created, Units reviewed, and Units tested.]
Process Evaluation
Status Example
[Figure: units (0 to 800) across the implementation phase. Series: Target, Units Created, Units Reviewed, Units Tested.]
From NASA, Manager's Handbook for Software Development, Revision 1, 1990
Goal #1 – Improve Software Quality
Postrelease Discovered Defect Density
[Figure: number of open serious and critical defect reports (0 to 1) from Nov-84 to Jan-93. Series: Older, < 12 Months. A "10X Goal" level is marked.]
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady 1992
Goal #1 – Improve Software Quality
Prerelease Defect Density
Question: How can we predict software quality based on early development processes?
[Figure: defects in test/KLOC (0 to 80) by project release date, Oct-80 to Dec-88, with a linear trend line.]
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady 1992
Goal #3 – Improve Productivity
Defect Repair Efficiency
Question: How efficient are defect-fixing activities? Are we improving?
[Figure: defects fixed per engineer month (0 to 5) by year, 1987 to 1991.]
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady 1992
Goal #4 – Maximize Customer Satisfaction
Mean Time to Fix Critical and Serious Defects
Question: How long does it take to fix a problem?
[Figure: days (0 to 250) by month, from 7/18/1990 to 8/18/1991. Series: AR, QA, KP+AD, LC, MR.]
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady 1992
AR = Awaiting release
QA = Final QA testing
KP = known problem
AD = awaiting data
LC = lab classification
MR = marketing review