• 1. What testers mean by risk
• 2. “Traditional” use of risk in testing
• 3. More recent contributions to thinking
• 4. Risk-Based Testing: Paul Gerrard (and I)
• 5. Next steps in RBT
1. What testers mean by risk
• Risk that software in live use will fail:
– software: could be Commercial Off The Shelf; packages such as ERP; bespoke project; integrated programmes of multiple systems...; industry-wide supply chains; any systems product
• Could include risk that later stages (higher levels) of testing will be excessively disrupted by failures
• Chain of risks (Error → Fault → Failure; each link carries a RISK):
– Error: mistake made by a human (eg spec-writing, program-coding)
– Fault: something wrong in a product (interim, eg spec; final, eg executable software)
– Failure: deviation of product from its expected* delivery or service (doesn’t do what it should, or does what it shouldn’t)
– not all errors result in faults; not all faults result in failures
* “expected” may be as in spec, or spec may be wrong (verification & validation)
Slide 2 of 51
• Adding in a distinction by John Musa (1998 book, Software Reliability Engineering):
– not all deviations are failures (but this is just the “anomaly” concept?)
– (so the associated risks are in the testing process rather than development: that an anomaly may not be noticed, or may be misinterpreted)
• A possible hybrid of all sources — chain of risks could be up to 6 links? Each link carries a RISK:
– Mistake: a human action that produces an incorrect result (eg in spec-writing, program-coding)
– Defect: incorrect results in specifications (note: this fits its usage in inspections)
– Fault: an incorrect step, process or data definition in a computer program (ie executable software); can also arise from a direct programming mistake
– Failure: an incorrect result
– Error: amount by which a result is incorrect
– Anomaly: may be a (false alarm), or a Change Request, or a testware mistake
2. “Traditional” use of risk in testing

Hetzel (Ed.) 1972-3
• Little / nothing explicitly on risk, but:
– reliability as a factor in quality; inability to cope with complexity of systems
– “the probability of being faulty is great” p255 (Jean-Claude Rault, CRL France)...
– “how to run the test for a given probability of error... number of random input combinations before... considered ‘good’” p258; sampling as a principle of testing
• Interestingly:
– “sampling as a principle should decrease in importance and be replaced by hierarchical organization & logical reduction” p28 (William C. Hetzel)
• Other curiosities:
– ?source of Myers’ triangle exercise p13 (ref. Dr. Richard Hamming, “Computers and Society”)
– the first “V-model”? p172 Outside-in design, inside-out testing (Allan L. Scherr, IBM Poughkeepsie NY / his colleagues)
Myers 1976: Software Reliability: Principles and Practices
• Again, “risk” not explicit, but principles are there:
– “reliability must be stated as a function of the severity of errors as well as their frequency”; “software reliability is the probability that the software will execute for a period of time without a failure, weighted by the cost to the user of each failure”; “probability that a user will not enter a particular set of inputs that leads to a failure” p7
– “if there is reason to believe that this set of test cases had a high probability of uncovering all possible errors, then the tests have established some confidence in the program’s correctness”; “each test case used should provide a maximum yield on our investment... the probability that the test case will expose a previously undetected error” p170, 176
– “if a reasonable estimate of [the number of remaining errors in a program] were available during the testing stages, it would help to determine when to stop testing” p329
– hazard function as a component of reliability models p330
Myers 1979: The Art of Software Testing
• Risk is still not in the index, but more principles:
– “the earlier that errors are found, the lower are the costs of correcting... and the higher is the probability of correcting the errors correctly” p18
– “what subset of all possible test cases has the highest probability of detecting the most errors” p36
– tries to base completion criteria for each phase of testing on an estimate of the number of errors originating in particular design processes, and during what testing phases these errors are likely to be detected p124
– testing adds value by increasing reliability p5
– revisits / updates the reliability models outlined in his 1976 book:
• those related to hardware reliability theory (reliability growth, Bayesian, Markov,
Hetzel 1988: The Complete Guide to Software Testing
• Risk appears only once in the index, but is prominent:
– Testing principle #4 p24: Testing Is Risk-Based
• amount of testing depends on risk of failure, or of missing a defect; so...
• use risk to decide number of cases, amount of emphasis, time & resources
• Other principles appear:
– testing measures software quality; want maximum confidence per unit cost via maximum probability of finding defects p255
– objectives of Testing In The Large include: p123
• are major failures unlikely?
• what level of quality is good enough?
• what amount of implementation risk is acceptable?
– System Testing should end when we have enough confidence that Acceptance Testing is ready to startp134
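Hetzel’s principle #4 can be made concrete with a small sketch: score each product area on likelihood and impact, then split the test budget in proportion to the scores. All area names, ratings and the budget below are invented for illustration, not from Hetzel.

```python
# Illustrative sketch: allocate test effort in proportion to risk scores.
# Each area gets a likelihood (1-5) and an impact (1-5); score = product.

def allocate_effort(areas, total_hours):
    """areas: dict name -> (likelihood, impact). Returns hours per area."""
    scores = {name: lik * imp for name, (lik, imp) in areas.items()}
    total_score = sum(scores.values())
    return {name: round(total_hours * score / total_score, 1)
            for name, score in scores.items()}

areas = {
    "payments":  (4, 5),   # changed often, money at stake
    "reporting": (2, 3),   # stable area, moderate impact
    "admin-ui":  (3, 2),   # some churn, low impact
}
plan = allocate_effort(areas, total_hours=100)
# "payments" (score 20 of 32) receives the bulk of the hours
```

The same proportional split can drive number of test cases or staffing, matching the slide’s “number of cases, amount of emphasis, time & resources”.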
Beizer 1984: Software System Testing & Quality Assurance
• Risk appears twice in index, but both insignificant
• However, some relevant principles are to be found:
– smartness in software production is ability to avoid past, present & future bugs p2 (and bwgs?)
– now more than a dozen models/variations in software reliability theory: but all far from reality; and all far from providing simple, pragmatic tools that can be used to measure software development p292-293
– six specific criticisms: but if a theory were to overcome these then it would probably be too complicated to be practical p293-294
– a compromise may be possible in future, but instead for now, suggest go-live when the system is considered to be useful, or at least sufficiently useful to permit the risk of failure p295
– plotting and extrapolation of S-curves to assess when this point attained p295-304
• “Risk” word is indexed as though deliberate:
– a couple of occurrences are insignificant, but others:
• purpose of testing is not to prove anything but to reduce perceived risk [of software not working] to an acceptable value (penultimate phase of attitude)
• testing not an act; is a mental discipline which results in low-risk software without much testing effort (ultimate phase of attitude) p4
• accepting principles of statistical quality control (but perhaps not yet implementing, because is not yet obvious how to, and in the case of small products, is dangerous) p6
• add test cases for transactions with high risks p135
• we risk release when confidence is high enough p6
• Other occurrences of key principles, including:
– probability of failure due to hibernating bwgs* low enough to accept p26
– importance of a bwg* depends on frequency, correction cost, [fix] installation cost & consequences p27
*bwg: ghost, spectre, bogey, hobgoblin, spirit of the night, any imaginary (?) thing that frightens a person (Welsh)
Others
• The “traditional” period could be said to cover the 1970s and 1980s. A variety of views can be found:
– Edward Miller 1978, in Software Testing & Validation Techniques (IEEE Tutorial):
• “except under very special situations [...], it is important to recognise that program testing, if performed systematically, can serve to guarantee the absence of bugs” p4
• and/but(?) “a program is well tested when the program tester has an adequately high level of confidence that there are no remaining “errors” that further testing would uncover” p9 (italics by Neil Thompson!)
• “a technologically sound approach to testing will incorporate... evaluations of software status into overall assessments of risk associated with the development and eventual fielding of the system” p vii
3. More recent contributions to thinking on risk
• The traditional basis of testing on risk (although more perceptive than some give credit for) is less than satisfactory because:
– it tends to be “lip-service”, with no follow-through / practical application
– if there is follow-through, it involves merely using risk analysis as part of the Testing Strategy (which is then shelved, and it’s “heads down” from then on?)
• Contributions more recently from (for example):
– Ed Kit (Software Testing in the Real World, 1995)
– Testing Maturity Model (Illinois Institute of Technology)
– Test Process Improvement® (Tim Koomen & Martin Pol)
– Testing Organisation Maturity™ questionnaire (Systeme Evolutif)
– Hans Schaefer’s work
– Zen and the Art of Object-Oriented Risk Management (Neil Thompson)
Testing Maturity Model
• Five levels of increasing maturity, based loosely on decades of testing evolution (eg in 1950s testing not even distinguished from debugging)
• Maturity goals and process areas for the five levels do not include risk explicitly, although emphasis moves from tactical to strategic (eg from fault detection to prevention):
– in level 1, software released without adequate visibility of quality & risks
– in level 3, test strategy is determined using risk management techniques
– in level 4, software products are evaluated using quality criteria (relation to risk?)
– in level 5, costs & test effectiveness are continually improved (sampling quality)
Hans Schaefer’s work
• Squeeze on testing: prioritise based on risk
• Consider possibility of stepwise release:
– test most important functions first
– look for functions which can be delayed
• What is “important” in the potential release (key functions, worst problems?)
– visibility (of function / characteristic)
– frequency of use
– possible cost of failure
• Where likely to be most problems?
– project history (new technology, methods, tools; numerous people, dispersed)
– product measures (areas complex, changed, needing optimising, faulty before)
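The two checklists above can be folded into one priority score per function: average the “importance” ratings, average the “likely problems” ratings, and multiply. The functions, factors and ratings below are illustrative, not Schaefer’s actual procedure.

```python
# Illustrative sketch: combine "importance" and "likely problems" ratings
# (each factor scored 1-5) into a single test-priority score per function.

def mean(ratings):
    return sum(ratings.values()) / len(ratings)

def priority(importance, likelihood):
    """Higher score = test earlier; a poor candidate for delayed release."""
    return mean(importance) * mean(likelihood)

checkout = priority(
    importance={"visibility": 5, "frequency_of_use": 4, "cost_of_failure": 5},
    likelihood={"new_technology": 4, "complexity": 4, "faulty_before": 3},
)
archive = priority(
    importance={"visibility": 1, "frequency_of_use": 1, "cost_of_failure": 2},
    likelihood={"new_technology": 1, "complexity": 2, "faulty_before": 1},
)
# checkout scores far higher than archive, so it is tested first;
# archive is a candidate for stepwise (delayed) release
```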
Second, at detail level: risk management during test specification
• To help decision-making during the “squeezing of testing”, it would be useful to have recorded explicitly as part of the specification of each test:
– the type of risk the set of tests is designed to minimise
– any specific risks at which a particular test or tests is aimed
• And this was one of the inputs to...
4. Risk-Based Testing: Paul Gerrard (and I)

Test specification based on total magnitude of risks for all defects imaginable
• Can define approximately 100 “product” risks threatening a typical e-business system and its implementation and maintenance
• Test objectives can be derived almost directly as “inverse” of risks
• Usable reliability models are some way off (perhaps even unattainable?) so better for now to work on basis of stakeholders’ perceptions of risk
• Lists & explains techniques appropriate to each risk type
• Includes information on commercial and DIY tools
• Final chapters are on “making it happen”
• Go-live decision-making: when benefits “now” exceed risks “now”
• Written for e-business but principles are portable; extended to wider tutorial for EuroSTAR 2002; following slides summarise key points
With acknowledgements to lead author Paul Gerrard
Test Objective → Typical Test Stage:
– Demonstrate component meets requirements → Component Testing
– Demonstrate component is ready for reuse in larger sub-system → Component Testing
– Demonstrate integrated components correctly assembled/combined and collaborate → Integration Testing
– Demonstrate system meets functional requirements → Functional System Testing
– Demonstrate system meets non-functional requirements → Non-Functional System Testing
– Demonstrate system meets industry regulation requirements
• Actually there are several reliability growth models, but:
– the Rayleigh model is part of hardware reliability methodology and has been used successfully in software reliability during development and testing
– its curve produces the S-curve when accumulated (ie in cumulative form)
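Numerically the Rayleigh model is simple: the per-period defect-arrival curve rises to a peak and tails off, and its running total is the S-curve. A sketch, with K (total expected defects) and sigma (the week the arrival rate peaks) chosen arbitrarily:

```python
import math

# Rayleigh model sketch: defect arrivals peak at t = sigma and then tail off;
# accumulating them gives the S-curve F(t) = K * (1 - exp(-t^2 / (2*sigma^2))).
# K and sigma below are arbitrary illustrative values.

def rayleigh_cumulative(t, K, sigma):
    return K * (1.0 - math.exp(-t * t / (2.0 * sigma * sigma)))

K, sigma = 200, 8
s_curve = [rayleigh_cumulative(t, K, sigma) for t in range(25)]   # weeks 0..24
arrivals = [b - a for a, b in zip(s_curve, s_curve[1:])]          # defects per week

peak_week = arrivals.index(max(arrivals))   # arrival rate peaks near t = sigma
```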
• Main problem is getting theories to match reality
• Several acknowledged shortcomings of many theories, eg:
– don’t evaluate consequence (severity) of anomalies
– assume testing is like live (eg relatively few special cases)
– don’t correct properly for stress-test effects, or code enhancements
– don’t consider interactions between faults
– don’t allow for debugging getting harder over time
• The science is moving on (eg Wiley’s Journal of Software Testing, Verification and Reliability) but:
– a reliability theory that satisfied all the above would be complex
– would project managers use it, or would they go live anyway?
• So until these are resolved, let’s turn to empirical data...
S-curve also visible in Kit 1995: Software Testing in the Real World p135
Possible to use to roughly gauge test time or faults remaining Hetzel 1988 p210
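One way to do such gauging is to fit the cumulative Rayleigh S-curve to the defect counts seen so far and read off its asymptote. A stdlib-only sketch using brute-force least squares; the observed counts are invented, and a real analysis would use a proper fitting routine:

```python
import math

# Fit F(t) = K * (1 - exp(-t^2 / (2*sigma^2))) to cumulative defect counts
# by grid-search least squares, then estimate how many faults remain unfound.
# The observed data below are invented for illustration.

observed = [4, 14, 29, 47, 65, 81, 94, 104]   # cumulative defects after weeks 1..8

def sse(K, sigma):
    return sum((K * (1 - math.exp(-t * t / (2 * sigma * sigma))) - d) ** 2
               for t, d in enumerate(observed, start=1))

candidates = ((K, s / 10) for K in range(100, 301, 5) for s in range(10, 101))
K_est, sigma_est = min(candidates, key=lambda p: sse(*p))
remaining = K_est - observed[-1]   # rough estimate of faults not yet found
```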
The Japanese “Project Bankruptcy” study: Abe, Sakamura & Aiso 1979, in Beizer 1984
• analysed 23 projects, including application software & system software developments
• included new code, modifications to existing code, and combinations
• remarkable similarity across all projects for shape of test completion curve
• anomaly detection rates not significant (eg low could mean good software or bad testing)
• significant were (a) length of initial slow progress, and (b) shape of anomaly detection curve...
• Even if we resist temptation to trade off slippage against scope, may still need to renegotiate the tolerable level of risk balanced against benefits
6. Refinements and ideas for future
• Although almost universal, the simple multiplication of probability × consequence can be troublingly over-simple: it might descope testing for huge-impact risks which are very unlikely (avionics errors?!). So use an asymmetric view?
• Some risks come from technology, and others are business risks to the use of the system. So distinguish “cause” risks from “effect” risks?
• Assessing perception of risks is a start, but can metrics give better quantification? → Metrics & fault source analysis
• Reliability models were a key part of testing theory in the 1970s, but are still not credibly usable? → Reliability engineering
• A wider theoretical basis distinguishing risk from uncertainty → Decision theory
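The “asymmetric view” suggested above can be sketched as a scoring rule that refuses to let any catastrophic-consequence risk fall out of scope, however low its probability. The ratings, the threshold and the floor below are all invented for illustration:

```python
# Illustrative sketch: symmetric probability x consequence scoring vs an
# asymmetric rule that floors the score of catastrophic-consequence risks.

CATASTROPHIC = 5          # consequence rating treated as non-negotiable (illustrative)
FLOOR = 20                # minimum score such a risk may receive (illustrative)

def symmetric_score(prob, consequence):       # both rated 1-5
    return prob * consequence

def asymmetric_score(prob, consequence):
    score = prob * consequence
    if consequence >= CATASTROPHIC:
        score = max(score, FLOOR)             # never descoped by low probability
    return score

avionics = (1, 5)   # very unlikely, huge impact
ui_typo = (4, 2)    # likely, minor impact

# symmetric: avionics scores 5, the typo 8 -- the typo would be tested first;
# asymmetric: avionics scores 20, restoring it to the top of the list
```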
Decision theory
• Decision theory is a body of knowledge and related analytical techniques of different degrees of formality, designed to help a decision-maker choose among a set of alternatives in light of their possible consequences. It can apply to conditions of certainty, risk or uncertainty
• Leads to consideration of utility value, game theory, separating information from noise, etc
• Bayesian was in Myers 1976, but still being discussed now as new & exciting (because of advances in algorithms & computation?)
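A tiny worked example of the Bayesian idea in a testing setting: start from a prior belief that a component is faulty and update it as tests pass. The prior and the per-test detection probability below are invented for illustration:

```python
# Bayes update sketch: posterior P(component faulty | n tests passed).
# Assumes a correct component always passes, and a faulty one is caught
# by each test with probability detect_prob. Numbers are illustrative.

def posterior_after_passes(prior_faulty, detect_prob, n_passes):
    miss = (1.0 - detect_prob) ** n_passes            # P(n passes | faulty)
    joint_faulty = prior_faulty * miss                # P(faulty and n passes)
    joint_ok = (1.0 - prior_faulty) * 1.0             # P(correct and n passes)
    return joint_faulty / (joint_faulty + joint_ok)

beliefs = [posterior_after_passes(0.30, 0.25, n) for n in range(6)]
# belief that the component is faulty falls with each passing test
```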