• 1. What testers mean by risk
• 2. “Traditional” use of risk in testing
• 3. More recent contributions to thinking
• 4. Risk-Based Testing: Paul Gerrard (and I)
• 5. Next steps in RBT
1. What testers mean by risk
• Risk that software in live use will fail:
– software: could be Commercial Off The Shelf; packages such as ERP; bespoke project; integrated programmes of multiple systems...; industry-wide supply chains; any systems product
• Could include risk that later stages (higher levels) of testing will be excessively disrupted by failures
• Chain of risks (Error → Fault → Failure; each link carries a RISK):
– Error: mistake made by a human (eg spec-writing, program-coding)
– Fault: something wrong in a product (interim, eg spec; final, eg executable software)
– Failure: deviation of product from its expected* delivery or service (doesn’t do what it should, or does what it shouldn’t)
– not all errors result in faults; not all faults result in failures
* “expected” may be as in spec, or spec may be wrong (verification & validation)
Slide 2 of 51
• Adding in a distinction by John Musa (1998 book, Software Reliability Engineering):
– not all deviations are failures (but this is just the “anomaly” concept?)
– (so the associated risks are in the testing process rather than development: that an anomaly may not be noticed, or may be misinterpreted)
• A possible hybrid of all sources — chain of risks could be up to 6 links? Each link carries a RISK:
– Mistake: a human action that produces an incorrect result (eg in spec-writing, program-coding)
– Defect: incorrect results in specifications (note: this fits its usage in inspections)
– Fault: an incorrect step, process or data definition in a computer program (ie executable software); can also arise from a direct programming mistake
– Failure: an incorrect result
– Error: amount by which a result is incorrect
– Anomaly: may be a (false alarm), or a Change Request, or a testware mistake
2. “Traditional” use of risk in testing

Hetzel (Ed.) 1972-3
• Little / nothing explicitly on risk, but:
– reliability as a factor in quality; inability to cope with complexity of systems
– “the probability of being faulty is great” p255 (Jean-Claude Rault, CRL France)...
– “how to run the test for a given probability of error... number of random input combinations before... considered ‘good’” p258; sampling as a principle of testing
• Interestingly:
– “sampling as a principle should decrease in importance and be replaced by hierarchical organization & logical reduction” p28 (William C. Hetzel)
• Other curiosities:
– ?source of Myers’ triangle exercise p13 (ref. Dr. Richard Hamming, “Computers and Society”)
– the first “V-model”? p172 Outside-in design, inside-out testing (Allan L. Scherr, IBM Poughkeepsie NY / his colleagues)
Myers 1976: Software Reliability: Principles and Practices
• Again, “risk” not explicit, but principles are there:
– “reliability must be stated as a function of the severity of errors as well as their frequency”; “software reliability is the probability that the software will execute for a period of time without a failure, weighted by the cost to the user of each failure”; “probability that a user will not enter a particular set of inputs that leads to a failure” p7
– “if there is reason to believe that this set of test cases had a high probability of uncovering all possible errors, then the tests have established some confidence in the program’s correctness”; “each test case used should provide a maximum yield on our investment... the probability that the test case will expose a previously undetected error” p170, 176
– “if a reasonable estimate of [the number of remaining errors in a program] were available during the testing stages, it would help to determine when to stop testing” p329
– hazard function as a component of reliability models p330
Myers 1979: The Art of Software Testing
• Risk is still not in the index, but more principles:
– “the earlier that errors are found, the lower are the costs of correcting... and the higher is the probability of correcting the errors correctly” p18
– “what subset of all possible test cases has the highest probability of detecting the most errors” p36
– tries to base completion criteria for each phase of testing on an estimate of the number of errors originating in particular design processes, and during what testing phases these errors are likely to be detected p124
– testing adds value by increasing reliability p5
– revisits / updates the reliability models outlined in his 1976 book:
• those related to hardware reliability theory (reliability growth, Bayesian, Markov,
Hetzel 1988: The Complete Guide to Software Testing
• Risk appears only once in the index, but is prominent:
– Testing principle #4 p24: Testing Is Risk-Based
• amount of testing depends on risk of failure, or of missing a defect; so...
• use risk to decide number of cases, amount of emphasis, time & resources
• Other principles appear:
– testing measures software quality; want maximum confidence per unit cost via maximum probability of finding defects p255
– objectives of Testing In The Large include: p123
• are major failures unlikely?
• what level of quality is good enough?
• what amount of implementation risk is acceptable?
– System Testing should end when we have enough confidence that Acceptance Testing is ready to startp134
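Hetzel’s principle #4 can be made concrete with a small sketch: score each product area on likelihood and impact, then split the test budget in proportion to the scores. All area names, ratings and the budget below are invented for illustration, not from Hetzel.

```python
# Illustrative sketch: allocate test effort in proportion to risk scores.
# Each area gets a likelihood (1-5) and an impact (1-5); score = product.

def allocate_effort(areas, total_hours):
    """areas: dict name -> (likelihood, impact). Returns hours per area."""
    scores = {name: lik * imp for name, (lik, imp) in areas.items()}
    total_score = sum(scores.values())
    return {name: round(total_hours * score / total_score, 1)
            for name, score in scores.items()}

areas = {
    "payments":  (4, 5),   # changed often, money at stake
    "reporting": (2, 3),   # stable area, moderate impact
    "admin-ui":  (3, 2),   # some churn, low impact
}
plan = allocate_effort(areas, total_hours=100)
# "payments" (score 20 of 32) receives the bulk of the hours
```

The same proportional split can drive number of test cases or staffing, matching the slide’s “number of cases, amount of emphasis, time & resources”.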
Beizer 1984: Software System Testing & Quality Assurance
• Risk appears twice in index, but both insignificant
• However, some relevant principles are to be found:
– smartness in software production is ability to avoid past, present & future bugs p2 (and bwgs?)
– now more than a dozen models/variations in software reliability theory: but all far from reality; and all far from providing simple, pragmatic tools that can be used to measure software development p292-293
– six specific criticisms: but if a theory were to overcome these then it would probably be too complicated to be practical p293-294
– a compromise may be possible in future, but instead for now, suggest go-live when the system is considered to be useful, or at least sufficiently useful to permit the risk of failure p295
– plotting and extrapolation of S-curves to assess when this point attained p295-304
• “Risk” word is indexed as though deliberate:
– a couple of occurrences are insignificant, but others:
• purpose of testing is not to prove anything but to reduce perceived risk [of software not working] to an acceptable value (penultimate phase of attitude)
• testing not an act; is a mental discipline which results in low-risk software without much testing effort (ultimate phase of attitude) p4
• accepting principles of statistical quality control (but perhaps not yet implementing, because is not yet obvious how to, and in the case of small products, is dangerous) p6
• add test cases for transactions with high risks p135
• we risk release when confidence is high enough p6
• Other occurrences of key principles, including:
– probability of failure due to hibernating bwgs* low enough to accept p26
– importance of a bwg* depends on frequency, correction cost, [fix] installation cost & consequences p27
*bwg: ghost, spectre, bogey, hobgoblin, spirit of the night, any imaginary (?) thing that frightens a person (Welsh)
Others
• The “traditional” period could be said to cover the 1970s and 1980s. A variety of views can be found:
– Edward Miller 1978, in Software Testing & Validation Techniques (IEEE Tutorial):
• “except under very special situations [...], it is important to recognise that program testing, if performed systematically, can serve to guarantee the absence of bugs” p4
• and/but(?) “a program is well tested when the program tester has an adequately high level of confidence that there are no remaining “errors” that further testing would uncover” p9 (italics by Neil Thompson!)
• “a technologically sound approach to testing will incorporate... evaluations of software status into overall assessments of risk associated with the development and eventual fielding of the system” p vii
3. More recent contributions to thinking on risk
• The traditional basis of testing on risk (although more perceptive than some give credit for) is less than satisfactory because:
– it tends to be “lip-service”, with no follow-through / practical application
– if there is follow-through, it involves merely using risk analysis as part of the Testing Strategy (which is then shelved, and it’s “heads down” from then on?)
• Contributions more recently from (for example):
– Ed Kit (Software Testing in the Real World, 1995)
– Testing Maturity Model (Illinois Institute of Technology)
– Test Process Improvement® (Tim Koomen & Martin Pol)
– Testing Organisation Maturity™ questionnaire (Systeme Evolutif)
– Hans Schaefer’s work
– Zen and the Art of Object-Oriented Risk Management (Neil Thompson)
Testing Maturity Model
• Five levels of increasing maturity, based loosely on decades of testing evolution (eg in 1950s testing not even distinguished from debugging)
• Maturity goals and process areas for the five levels do not include risk explicitly, although emphasis moves from tactical to strategic (eg from fault detection to prevention):
– in level 1, software released without adequate visibility of quality & risks
– in level 3, test strategy is determined using risk management techniques
– in level 4, software products are evaluated using quality criteria (relation to risk?)
– in level 5, costs & test effectiveness are continually improved (sampling quality)
Hans Schaefer’s work
• Squeeze on testing: prioritise based on risk
• Consider possibility of stepwise release:
– test most important functions first
– look for functions which can be delayed
• What is “important” in the potential release (key functions, worst problems?)
– visibility (of function / characteristic)
– frequency of use
– possible cost of failure
• Where likely to be most problems?
– project history (new technology, methods, tools; numerous people, dispersed)
– product measures (areas complex, changed, needing optimising, faulty before)
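The two checklists above can be folded into one priority score per function: average the “importance” ratings, average the “likely problems” ratings, and multiply. The functions, factors and ratings below are illustrative, not Schaefer’s actual procedure.

```python
# Illustrative sketch: combine "importance" and "likely problems" ratings
# (each factor scored 1-5) into a single test-priority score per function.

def mean(ratings):
    return sum(ratings.values()) / len(ratings)

def priority(importance, likelihood):
    """Higher score = test earlier; a poor candidate for delayed release."""
    return mean(importance) * mean(likelihood)

checkout = priority(
    importance={"visibility": 5, "frequency_of_use": 4, "cost_of_failure": 5},
    likelihood={"new_technology": 4, "complexity": 4, "faulty_before": 3},
)
archive = priority(
    importance={"visibility": 1, "frequency_of_use": 1, "cost_of_failure": 2},
    likelihood={"new_technology": 1, "complexity": 2, "faulty_before": 1},
)
# checkout scores far higher than archive, so it is tested first;
# archive is a candidate for stepwise (delayed) release
```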
Second, at detail level: risk management during test specification
• To help decision-making during the “squeezing of testing”, it would be useful to have recorded explicitly as part of the specification of each test:
– the type of risk the set of tests is designed to minimise
– any specific risks at which a particular test or tests is aimed
• And this was one of the inputs to...
4. Risk-Based Testing: Paul Gerrard (and I)

Test specification based on total magnitude of risks for all defects imaginable
• Can define approximately 100 “product” risks threatening a typical e-business system and its implementation and maintenance
• Test objectives can be derived almost directly as “inverse” of risks
• Usable reliability models are some way off (perhaps even unattainable?) so better for now to work on basis of stakeholders’ perceptions of risk
• Lists & explains techniques appropriate to each risk type
• Includes information on commercial and DIY tools
• Final chapters are on “making it happen”
• Go-live decision-making: when benefits “now” exceed risks “now”
• Written for e-business but principles are portable; extended to wider tutorial for EuroSTAR 2002; following slides summarise key points
With acknowledgements to lead author Paul Gerrard
Test Objective → Typical Test Stage:
– Demonstrate component meets requirements → Component Testing
– Demonstrate component is ready for reuse in larger sub-system → Component Testing
– Demonstrate integrated components correctly assembled/combined and collaborate → Integration Testing
– Demonstrate system meets functional requirements → Functional System Testing
– Demonstrate system meets non-functional requirements → Non-Functional System Testing
– Demonstrate system meets industry regulation requirements
• Actually there are several reliability growth models, but:
– the Rayleigh model is part of hardware reliability methodology and has been used successfully in software reliability during development and testing
– its curve produces the S-curve when accumulated (ie in cumulative form)
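Numerically the Rayleigh model is simple: the per-period defect-arrival curve rises to a peak and tails off, and its running total is the S-curve. A sketch, with K (total expected defects) and sigma (the week the arrival rate peaks) chosen arbitrarily:

```python
import math

# Rayleigh model sketch: defect arrivals peak at t = sigma and then tail off;
# accumulating them gives the S-curve F(t) = K * (1 - exp(-t^2 / (2*sigma^2))).
# K and sigma below are arbitrary illustrative values.

def rayleigh_cumulative(t, K, sigma):
    return K * (1.0 - math.exp(-t * t / (2.0 * sigma * sigma)))

K, sigma = 200, 8
s_curve = [rayleigh_cumulative(t, K, sigma) for t in range(25)]   # weeks 0..24
arrivals = [b - a for a, b in zip(s_curve, s_curve[1:])]          # defects per week

peak_week = arrivals.index(max(arrivals))   # arrival rate peaks near t = sigma
```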
• Main problem is getting theories to match reality
• Several acknowledged shortcomings of many theories, eg:
– don’t evaluate consequence (severity) of anomalies
– assume testing is like live (eg relatively few special cases)
– don’t correct properly for stress-test effects, or code enhancements
– don’t consider interactions between faults
– don’t allow for debugging getting harder over time
• The science is moving on (eg Wiley’s Journal of Software Testing, Verification and Reliability) but:
– a reliability theory that satisfied all the above would be complex
– would project managers use it, or would they go live anyway?
• So until these are resolved, let’s turn to empirical data...
S-curve also visible in Kit 1995: Software Testing in the Real World p135
Possible to use to roughly gauge test time or faults remaining Hetzel 1988 p210
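One way to do such gauging is to fit the cumulative Rayleigh S-curve to the defect counts seen so far and read off its asymptote. A stdlib-only sketch using brute-force least squares; the observed counts are invented, and a real analysis would use a proper fitting routine:

```python
import math

# Fit F(t) = K * (1 - exp(-t^2 / (2*sigma^2))) to cumulative defect counts
# by grid-search least squares, then estimate how many faults remain unfound.
# The observed data below are invented for illustration.

observed = [4, 14, 29, 47, 65, 81, 94, 104]   # cumulative defects after weeks 1..8

def sse(K, sigma):
    return sum((K * (1 - math.exp(-t * t / (2 * sigma * sigma))) - d) ** 2
               for t, d in enumerate(observed, start=1))

candidates = ((K, s / 10) for K in range(100, 301, 5) for s in range(10, 101))
K_est, sigma_est = min(candidates, key=lambda p: sse(*p))
remaining = K_est - observed[-1]   # rough estimate of faults not yet found
```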
The Japanese “Project Bankruptcy” study: Abe, Sakamura & Aiso 1979, in Beizer 1984
• analysed 23 projects, including application software & system software developments
• included new code, modifications to existing code, and combinations
• remarkable similarity across all projects for shape of test completion curve
• anomaly detection rates not significant (eg low could mean good software or bad testing)
• significant were (a) length of initial slow progress, and (b) shape of anomaly detection curve...
• Even if we resist temptation to trade off slippage against scope, may still need to renegotiate the tolerable level of risk balanced against benefits
6. Refinements and ideas for future
• Although almost universal, the simple multiplication of probability × consequence can be troublingly over-simple: it might descope testing for huge-impact risks which are very unlikely (avionics errors?!). So use an asymmetric view?
• Some risks come from technology, and others are business risks to the use of the system. So distinguish “cause” risks from “effect” risks?
• Assessing perception of risks is a start, but can metrics give better quantification? → Metrics & fault source analysis
• Reliability models were a key part of testing theory in the 1970s, but are still not credibly usable? → Reliability engineering
• A wider theoretical basis distinguishing risk from uncertainty → Decision theory
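The “asymmetric view” suggested above can be sketched as a scoring rule that refuses to let any catastrophic-consequence risk fall out of scope, however low its probability. The ratings, the threshold and the floor below are all invented for illustration:

```python
# Illustrative sketch: symmetric probability x consequence scoring vs an
# asymmetric rule that floors the score of catastrophic-consequence risks.

CATASTROPHIC = 5          # consequence rating treated as non-negotiable (illustrative)
FLOOR = 20                # minimum score such a risk may receive (illustrative)

def symmetric_score(prob, consequence):       # both rated 1-5
    return prob * consequence

def asymmetric_score(prob, consequence):
    score = prob * consequence
    if consequence >= CATASTROPHIC:
        score = max(score, FLOOR)             # never descoped by low probability
    return score

avionics = (1, 5)   # very unlikely, huge impact
ui_typo = (4, 2)    # likely, minor impact

# symmetric: avionics scores 5, the typo 8 -- the typo would be tested first;
# asymmetric: avionics scores 20, restoring it to the top of the list
```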
Decision theory
• Decision theory is a body of knowledge and related analytical techniques of different degrees of formality, designed to help a decision-maker choose among a set of alternatives in light of their possible consequences. It can apply to conditions of certainty, risk or uncertainty
• Leads to consideration of utility value, game theory, separating information from noise, etc
• Bayesian was in Myers 1976, but still being discussed now as new & exciting (because of advances in algorithms & computation?)
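A tiny worked example of the Bayesian idea in a testing setting: start from a prior belief that a component is faulty and update it as tests pass. The prior and the per-test detection probability below are invented for illustration:

```python
# Bayes update sketch: posterior P(component faulty | n tests passed).
# Assumes a correct component always passes, and a faulty one is caught
# by each test with probability detect_prob. Numbers are illustrative.

def posterior_after_passes(prior_faulty, detect_prob, n_passes):
    miss = (1.0 - detect_prob) ** n_passes            # P(n passes | faulty)
    joint_faulty = prior_faulty * miss                # P(faulty and n passes)
    joint_ok = (1.0 - prior_faulty) * 1.0             # P(correct and n passes)
    return joint_faulty / (joint_faulty + joint_ok)

beliefs = [posterior_after_passes(0.30, 0.25, n) for n in range(6)]
# belief that the component is faulty falls with each passing test
```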