PRESENTATION
International Conference On Software Testing, Analysis & Review
NOV 8-12, 1999, BARCELONA, SPAIN
T23, Thursday, Nov 11, 1999
Neil Thompson
Zen and the Art of Object Oriented Risk Management: Does Anything Work Properly Now?
– At the talk: to think about where testing now stands within information systems philosophy, over 25 years on from the first testing conference (and since a popular book analysing quality), to understand why quality is still elusive, and to share new insights
– To take away: a desire and intention to make more direct use of risk management principles in testing
– To use: application to testing of two concepts from object orientation:
• encapsulating risk information with tests, to increase effectiveness
• inheritance of tests by other tests, to increase efficiency
• Target audience: intermediate, but:
• newcomers to testing may find it interesting
• experienced practitioners may benefit from a "reality check"
– the talk is not technical: more about risk management than object orientation
• The Road Ahead (when it's upgraded to "information superhighway") is good
• Business @ the Speed of Thought (via a digital nervous system) is good

The Dilbert Future, Scott Adams, 1997
• Bad things which can be foreseen will be prevented by humans
• One day technology will reduce human work, not increase it
• Democracy & capitalism will always coexist happily with lazy / stupid people

Release 2.0 and 2.1, Esther Dyson, 1997 & 1998
• Release 1.0 is fresh and new, the realisation of the hopes and dreams of its developers
• Release 2.0 is supposed to be perfect, but…
• …usually Release 2.1 comes out a few months after
• Internet causes decentralisation, where the masses separate into small groups (systems of which may automatically self-organise)

Moral and legal challenges of the information era, Chris Anderson, 1999
• Institutions and individuals seem to be coming apart, because…
• institutions are built on "machine" principles…
• and we now need "chaordic" organisations (eg Visa, Internet)
• How long can we survive computerising and interconnecting everything, with requirements increasingly volatile and undocumented, and responsibility distributed? What we already have doesn't work properly!
• Does anyone except testers care? Will the battle for quality send us mad?
• Is the future for testers worse or better:
– E-commerce threatens "disintermediation"; is there a similar threat to testing? (The technicians are expected to be also business-aware, and the business experts increasingly proficient in technical skills.)
– Or will testing get its actuaries, like the insurance industry?
• Good luck for 2000 and beyond! May the Zen be with you.
Zen and the Art of Object-Oriented Risk Management: Does Anything Work Properly Now?
Testing continues to struggle in the battle for information systems quality because:
• development continues to outpace testing, driven by market and other time pressures (such as year 2000 and European Economic and Monetary Union) and assisted by ever more productive development environments and tools;
• many managers are not sympathetic to the "culture of pessimism" often associated with testing, and refuse to move implementation target dates; and
• there are many obstacles to getting (and keeping) systems working properly, and these seem to be increasing as systems become more complex and more interconnected.
Testers can improve their effectiveness and efficiency by:
• accepting that they will always be under time pressure, so must manage quality;
• encapsulating risk information with tests, to help ensure not only that the right tests are planned, but also that, if time pressures necessitate a reduction in scope, the least important tests can be identified and omitted (effectiveness); and
• inheriting tests, for example reusing earlier, simpler tests within later, more complex tests (efficiency).
Finally, some thought is given to:
• risks and the year 2000 problem; and
• the future of testing, in the context of emerging organisational concepts.
Introduction
Around the time information systems testers held their first conference, a popular book analysed technology, art, science, quality and philosophy:
• the first testing conference was held in North Carolina, USA, in 1972 [HETZ98]; and
• two years later "Zen and the Art of Motorcycle Maintenance" was published [PIRS74], in which the narrator attempted to define quality but was declared insane in the process.
25 years later, testing appears to be a mature discipline, recognised as an indispensable contributor to systems quality, and its accepted main principles have changed little. But quality in information systems is still elusive, and many mass-market products incorporate software which clearly does not work properly. Standard software licence conditions explicitly allow for this, and the public apparently tolerates the situation.
Can anything be done about this, and how should testers position themselves as we enter the new millennium?
This paper is intended to provoke constructive thoughts and actions, under the headings:
1. Zen and now: a short history of testing philosophy
2. Millennial challenges: does anything work properly now?
3. How testing is based on risk management
4. Applying object-oriented concepts to testing
5. Role of metrics and measurement in risk management
6. Obstacles to attaining and maintaining quality systems
7. Risks and the year 2000 problem
8. Future of testing.
1. Zen and now: a short history of testing philosophy
1.1 Introductory themes
This paper is more about risk management than object orientation, and "Zen and the Art of Motorcycle Maintenance" turned out to be more about motorcycle maintenance than about Zen. But there are some entertaining parallels between some of the characteristics of Zen and the characteristics of testers as we enter a new millennium, when the fastest-growing "religion" seems to be a devotion to ever more pervasive, interconnected and complex technology, built on computer software (Figure 1a).
Thompson information Systems Consulting Limited

Figure 1a: Zen compared with the millennial "religion" of computerising and interlinking everything

Zen (from Encyclopaedia Britannica):
• Potential to achieve enlightenment is inherent in everyone but lies dormant because of ignorance
• Zen aims for: mental tranquillity, fearlessness, spontaneity
• Zen sect methods: (Rinzai) sudden shock & considering paradoxical statements; (Soto) sitting in meditation; (Obaku) continual chanting of Amida

Computerising & interlinking everything (a millennial religion?):
• Potential to work properly is (arguably) inherent in all systems but lies dormant because of imposed deadlines, market-led haste and habitual cynicism
• Testers should also aim for: mental tranquillity, fearlessness, spontaneity
• Testing sect methods: (Dynamikai) execution & debugging; (Statico) sitting inspecting; (Automatu) continual chanting of "tools"
1.2 Philosophical journeys
Actually, ZataoMM was not really about motorcycle maintenance either; it was about philosophy. The narrator rode his motorcycle across the USA with his son and some adult friends, and used maintenance metaphors and geographic imagery to illustrate the history of philosophy and some troubling issues, in particular around quality. Again there are some entertaining parallels with software testing (Figure 1b).
Figure 1b: Zen and the Art of Motorcycle Maintenance (Robert M. Pirsig, 1974) compared with testing (born ?; came of age 1972; in its prime yet?)

Route:
• From the US Central Plains up the Rocky Mountains, and down to the Pacific Ocean
• Down the left slope of the V-model (URS to module specifications), and up the right side (Unit Testing to Acceptance Testing)

Philosophies:
• Preventive / reactive maintenance; tools & meta-tools; classic / romantic understanding; testing & fixing / contracting out; functions / components; quality in bursts / way of life
• Static / dynamic; manual / automated; validation / verification; seeking defects / demonstrating absence; black box / glass box; quality assurance / control

Issues:
• Scientific method & limitations; defining quality causes insanity; (motorcycles were simple in 1974)
• Art or science?; codependent behaviour?; how keep up with development innovation?
Just as the motorcycle man had differences of opinion and attitude with his companions, so there are different philosophies among testers:
• Most people when they think of testing mean actually executing the software under test by inputting data and observing the expected results (ie dynamic testing), but there is a growing appreciation of the cost-effectiveness of static testing, eg code analysis, reviews, inspections, walkthroughs etc. Many testers do not find this exciting, however.
• The benefits of testing automation are still controversial, after many years of debate and considerable improvements in the capabilities of automated tools.
• Semantic differences sometimes confuse the distinction between verification and validation, though the usually-accepted definitions are that verification is about checking that the product is being built correctly (eg that the code meets its module specifications), and validation is about ensuring fitness for purpose, ie the right product is being built [GRAH95]. Everyone agrees that both are needed.
• There is general agreement that most testing is meant to detect faults and that testing cannot prove absence of errors, but there is less clarity over the role of acceptance testing in error detection and the extent to which it should be designed to run "smoothly", ie without serious faults. One way of achieving both is to rehearse the acceptance tests in advance.
• Both black-box and glass-box styles of testing are needed, but the emphasis varies through different levels of testing, eg integration testing is mostly glass-box but acceptance testing is mostly black-box. Glass-box is often called white-box, but the principle is that the tester should be able to see the structure inside.
The nature of quality was so troubling to the ZataoMM narrator's "alter ego", Phaedrus, that he was deemed insane and given electro-cranial therapy. Quality will be revisited several times in this paper, but in less dramatic terms.
There are some other particularly interesting issues which have been receiving recent attention in the testing world, and which are still the subject of lively discussion:
• Is testing an art or a science? At first sight it's a science, because we set out with a hypothesis that the software contains faults, then attempt to prove that hypothesis through experiments, ie tests (arguably during acceptance testing we also try to build confidence in business benefits and fitness for purpose by attempting to demonstrate absence of faults). But much of the data we would like to guide the design of our experiments (metrics and measurements from previous projects) are not readily available, so we often need to fall back on intuition to design clever, efficient tests and to diagnose errors. Glenford Myers apparently saw it this way when he titled the first testing bestseller "The Art of Software Testing" [MYER79]. A recent analysis of whether software testing is scientific found many reasons why not [BERE98]. Also, there is debate over whether software is an engineering discipline or not. Fortunately for the motivation and job satisfaction of many testers, it currently retains characteristics from both art and science.
• Are testers exceeding their job specification and taking on too many of other people's problems? Some managers hold testers responsible not just for finding errors but for getting them fixed, and some see faults as a delay to the schedule (bad news) rather than a prevention (good news) of later live failures. Others ask testers whether it's safe to go live yet. And the reason testers are under so much pressure is that they join the critical path of a project towards the end, when mistakes and shortcomings have accumulated from constrained budgets, optimistic plans, inexact requirements, imperfect design and imperfect coding. Worse, some of those problems may perhaps have been forgotten or hidden. And the testers admit that they have to fix it all, and fast! This has been persuasively likened to "co-dependent behaviour", a psychological disorder [COPE98].
• How can testing keep up with the pace and innovation of development? Until we can educate managers to delay projects, and markets to demand (and wait for) more reliable software, and until the holy grail of automation is found, we will have to find better ways of managing within the constraints imposed on us. This is the main theme of this paper, even though this is "to begin tolerating abnormal, unhealthy and inappropriate behaviours, then [to go one step further, to] convince ourselves these behaviours are normal" [COPE98] (making it arguably in itself codependent behaviour!)
1.3 Zen, Quality, Testing and Risk Management
The motorcycle man's journey took him over the summit of the Rocky Mountains, and it was here that he came closest to Zen's enlightenment and mental tranquillity. Before testers can attain such a state, they have to descend into the depths of the V-model and meet the standards of quality control and quality assurance. All of these are contributors to overall project risk management (Figure 1c).
"Soon, stunted pines disappear entirely and we're in alpine meadows. There's not a tree anywhere, only grass everywhere, filled with little pink and blue and white dots of intense colour… we've reached the high country… I look over my shoulder for one last view of the gorge… People spend their entire lives at those lower altitudes without any awareness that this high country exists"
1.4 The nature of quality
Like Phaedrus, information systems professionals have expended much effort trying to define quality. For the purposes of this paper, it is sufficient to outline (Figure 1d):
• the time-cost-quality triangle; and
• the distinction between quality control and quality assurance.
The time-cost-quality triangle sometimes quoted by project managers is shown on the next page with its true axes of good, fast and low-cost. It is common in commercial projects for the timescales and budgets to be set early, and only a proportion of managers accept that each phase of a project (eg level in the V-model) can be estimated only when the previous phase has completed, or nearly so (eg one needs to know the requirements before costing and planning the design phase). So the quality becomes constrained within quite a narrow range, and it is usually easier to increase the budget than to extend the timescales. Quality should of course be maximised within that range by effective quality control and quality assurance mechanisms.
Figure 1d: The time-cost-quality triangle, Quality Control and Quality Assurance
• … closely followed by cost
• quality is the best we can manage (unless we complain loudly)
Quality Control: "right first time"; internal responsibility
Quality Assurance: ISO9000 etc; audit, external (then fix?)
Like validation and verification, the difference between Quality Control and Quality Assurance has different interpretations. One useful distinction [THOM94] is that:
• QC is the responsibility of those doing the actual work, and may be improved by a "right first time" culture. However, the greater the time pressures on a project, the more staff are likely to make mistakes and feel that quality is a luxury for others.
• QA is an external "audit" or "policing" function, sometimes built on formal standards eg ISO9000. Unless QC is very good or QA is weak / open to negotiation, QA is likely to require some rework of tasks done.
1.5 Role of testing in quality
Most people have by now been convinced that testing can never be perfect, unless it is executed for an infinite time or with infinite resources. So all real-life systems go live based on a threshold of acceptable quality, which for some systems claims to be "zero-defect" but for most systems is based on acceptance criteria such as "no critical faults remaining, less than 10 important faults, less than 30 medium and less than 100 low-importance faults".
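Such acceptance criteria can be sketched as a simple gate; the threshold values below are taken from the example criteria just quoted, and the function and severity names are illustrative, not from any particular standard:

```python
# Sketch of an acceptance-criteria gate using the example thresholds quoted
# above: no critical faults, fewer than 10 important, fewer than 30 medium,
# fewer than 100 low-importance. Severity names and limits are illustrative.
ACCEPTANCE_THRESHOLDS = {"critical": 0, "important": 9, "medium": 29, "low": 99}

def meets_acceptance_criteria(open_faults):
    """open_faults maps a severity name to the count of unresolved faults."""
    return all(open_faults.get(severity, 0) <= limit
               for severity, limit in ACCEPTANCE_THRESHOLDS.items())

# A system with one critical fault fails the gate; one with a handful of
# minor faults can still go live on "acceptable quality".
```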
Testing should converge on this threshold via one, or ideally two or three, cycles of test-check-diagnose-fix-retest (Figure 1e).
Figure 1e: convergence on the acceptable-quality threshold, or divergence into instability through insufficient quality of design, debugging, maintenance, enhancements
It is normal for some fault-fixes themselves to contain errors. The proportion depends on a number of factors, eg working hours of staff and the amount of pressure on them, clarity of the original specification and of fault descriptions, quality of system design, modularity of code, adequacy of documentation. If the proportion of these "knock-on" errors becomes too high, it is theoretically possible for the spiral to diverge, ie the system becomes more and more unstable whenever it is changed. This is unlikely for new systems, but is a significant risk for old systems after a long period of maintenance by staff who were not involved in the original development. Sometimes such systems have to be frozen to keep them working, and replaced completely at the first opportunity.
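The convergence argument can be illustrated numerically. If each cycle of fixing re-introduces "knock-on" errors at some rate per fault fixed, the open-fault count shrinks geometrically when that rate is below 1 and grows when it is above 1. A minimal sketch (the rates and counts are invented for illustration):

```python
def faults_after_cycles(initial_faults, knock_on_rate, cycles):
    """Crude model of the test-fix-retest spiral: each cycle fixes all known
    faults, but fixing re-introduces knock_on_rate new faults per fault fixed."""
    faults = float(initial_faults)
    for _ in range(cycles):
        faults *= knock_on_rate
    return faults

# knock_on_rate < 1: the spiral converges towards stability;
# knock_on_rate > 1: the system grows less stable with every change.
```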
1.6 RAD versus Trad.
The above-mentioned risk of divergence into instability, or more likely just of failing to converge fast enough on stability, exists for Rapid Application Developments if insufficient control is applied. RAD and related methods such as iterative development mitigate this risk by strict "timeboxing" and repetitive testing.
Also, it is widely accepted nowadays that this process extends quickly into live running, as the initial version of a software-based system or product is usually known to be imperfect, and the descoped functions and lower-priority defects are scheduled for resolution in future releases (Figure 1f).
Microsoft is reported [CUSU95] to use a fine-grained version of this, with:
• extensive parallel work, but with strict configuration control and daily synchronisations and debugging;
• full build and test as frequently as the product and its current market context require (this could be monthly, fortnightly, weekly or quite commonly daily); and therefore
• never being far away from a deliverable “fit-for-market” product.
1.7 Test structure for effectiveness
It is important to know how the structure of tests planned and executed covers the functionality and other attributes of the system(s) under test. This is usually based on the very well-known V-model, in which each level of tests (unit, integration, system and acceptance) aims to exercise the system in different ways from different viewpoints, so that in total everything is done somewhere, preferably only once (plus some regression testing). Usually nowadays a refinement such as the W-model is used, which emphasises test specification as early as possible, thereby getting the benefit of static testing in addition to dynamic.
But measuring coverage at each level is not trivial. Typical measures used at different levels are:
• unit testing: statement and branch coverage of code
• integration testing: condition coverage of interfaces
• acceptance testing: coverage of stated requirements, user transactions for each user role profile, business events etc.
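At the acceptance level, for example, coverage of stated requirements reduces to bookkeeping over which requirements each planned test exercises. A hypothetical sketch (the data shapes are invented for illustration):

```python
def requirements_coverage(requirements, tests):
    """requirements: list of requirement ids; tests maps a test name to the
    set of requirement ids it exercises. Returns (coverage ratio, uncovered)."""
    covered = set()
    for exercised in tests.values():
        covered |= set(exercised)
    covered &= set(requirements)           # ignore ids not in the requirements
    return len(covered) / len(requirements), set(requirements) - covered
```

The uncovered set is exactly what is needed later when deciding where the holes in planned coverage are.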
It is system testing which presents the greatest challenge to coverage measurement, because it is expected to cover everything, not only all functionality but also all the "non-functional" or technical attributes such as performance, security, backup etc.
It is possible to think of this as a three-dimensional glass box, with:
• the first dimension being the structure of the system, in whatever terms it is specified, eg functional decomposition;
• the second dimension giving structure to the way the system behaves, again in whichever terms the specification is written; and
• the third dimension representing the various types of testing which have to be considered.
Following glass-box testing, or in association with it, we also need to do black-box testing based on data values in and out.
Figure 1g illustrates these principles [THOM93].
2. Millennial challenges: does anything work properly now?

Figure 2a: Millennial challenges to information systems: the year 2000 dates (9 Sep 99 etc, 1 Jan 00, 29 Feb 00); the conversion of national currencies (IR£, NFl, FM, BFr, LFr, DM, ASch, PEs, SPta, FFr, ItL) to the euro through 1998-2002; the growth of the internet ("2001: a cyberspace odyssey?"); and EMU in real life in 2002.
(Figure 2a) Over the last few years there have been some very large-scale changes in Europe with a big impact on information systems:
• the opening of the electricity and gas markets to competition has been completed in the UK, and similar initiatives are under way in other countries;
• the first wave of EMU has been implemented; and
• many companies and organisations have been repairing or replacing systems to handle the year 2000.
This level of change is not yet over, and in the next few years:
• further major changes are planned to electricity markets, and other utility companies are diversifying widely;
• the public conversion to the euro has still to be done, and other countries may join EMU; and
• year 2000 is not yet here.
At the same time, systems are becoming more complex, more pervasive and more interconnected. For example:
• information systems are becoming more internet-based, despite difficulties with communications bandwidth, browser incompatibilities and the need for cookies and other add-ons to produce usable systems;
• energy companies are carrying telecommunications, satellites are carrying internet traffic, new telecommunications methods are emerging but giving interference problems, and television is going interactive and internet-linked;
• motor vehicles are increasingly software-controlled, and this software can even be accessed over a mobile phone;
• functionality is being added to application software (and then changed in future releases) faster than people can learn to use it;
• new releases of operating systems are frequent, and reports of problems and instability are common;
• each new operating system version and browser version is supposed to work alongside all previous versions, as these may be in use for many years; and
• the same thing applies to application software, so it is common for new systems to be interfaced to legacy systems rather than to replace them; we therefore accumulate more and more diverse, interconnected systems.
2.2 Testing squeezed: even more!
It has been common for some years for testers to complain that the proper time for testing is compressed towards the end of a project as previous phases slip yet the implementation date does not (or if it does, by a lesser amount). This is even worse in the current environment, because:
• instead of this problem affecting each individual system at a different time,…
• now complete industries, with complete sets of diverse systems, have to change simultaneously for events like EMU and year 2000.
Also, modern systems for Enterprise Resource Planning and electronic commerce are really super-systems. So the traditional problems of delays, late deployment of testing staff, shortage of experts and "immovable" implementation dates compress testing more than ever before. Some of these dates really are immovable (Figure 2b).
3. How testing is based on risk management
So is there anything we can do to make a tester's life easier? It is well known that testing is a form of risk management, but this knowledge is not always used as explicitly as it could be (and should be) in planning and managing the testing process. The main components are illustrated in Figure 3a.
The usual convention is to express the degree of risk as the multiplied values of:
• probability, ie the likelihood of the risk becoming a real problem (0% is impossible, 100% is certainty); and
• severity, ie the degree of impact expected if the problem anticipated by the risk actually occurs (this may be measured in financial terms, eg lost income of between £40,000 and £100,000, most likely case £50,000).
Such precise measurements of percentage and financial loss are difficult in practice, and it is often sufficient to use a high/medium/low rating system for both probability and impact.
However, there are some risks for which, for emotional or other reasons, one or other of probability / severity is considered predominant. For example, most people are much more frightened of thunderstorms than they "should" be based on the real probability of being struck by lightning [BERN96]. Conversely, if a risk is not high-impact but is almost certain to happen, and we have the time and ability to prevent it, then most people would want to do that.
It is often useful to consider a third component, visibility, or more precisely the invisibility of the effects of a risk. If we see a risk to the correctness of the data in a system, it will be more of a problem if we don't detect it (ie the data looks plausible but is wrong) than if the problem is obvious (eg values suddenly becoming negative). This is one of the more worrying aspects of the year 2000 problem.
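The probability x severity convention, using the coarse high/medium/low scales suggested above plus an optional weighting for invisibility, can be sketched as follows; the numeric values given to the ratings and the invisibility weighting are assumptions for illustration, not from the paper:

```python
RATING = {"low": 1, "medium": 2, "high": 3}   # illustrative numeric scale

def risk_exposure(probability, severity, invisibility="low"):
    """Exposure = probability x severity, scaled up when the effects of the
    risk would be hard to detect: plausible-but-wrong data is more worrying
    than an obviously broken value."""
    exposure = RATING[probability] * RATING[severity]
    # Weight up low-visibility risks (the 0.5 factor is an assumption).
    return exposure * (1 + 0.5 * (RATING[invisibility] - 1))
```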
3.2 Risk management and testing
It is at the different levels of the V-model that we can see the distinction between the risks of faults within the software and the other risks of a system(s) implementation (Figure 3b).
Figure 3b: Risks addressed at each level of the V-model

Acceptance testing: script tests around the user guide and user & operator training materials.
System testing: risks: system ≠ specification; undetected errors waste user time & damage confidence. Mitigations: use independent testers, functional & technical, to get a fresh view; take the last opportunity to do automated stress testing before environments are re-used.
Integration testing: risks: interfaces don't match; undetected errors are too late to fix. Mitigations: use the skills of designers before they move away; take the last opportunity to exercise interfaces singly.
Unit testing: risks: units don't work right; undetected errors won't be found by later tests. Mitigations: use the detailed knowledge of developers before they forget; take the last opportunity to exercise every error message.
3.4 Detailed level
Within each level of the V-model, we should be specifying the tests which are most likely to address the specific risks expected at that level. For the lower levels (unit, integration and system), this means predicting the kind of defects which lurk there, and designing tests accordingly. Because testing is still at least as much art as science, this is often done almost subconsciously, using intuition based on previous experience. Sometimes explicit risks are assessed and documented, especially for system and acceptance testing, but after explicitly (or more likely implicitly) using these to specify the tests, often they are then filed away. We would do better to keep some record of the risks being addressed as part of the test specification (Figure 3d).
Figure 3d: Risk management during test specification

• To help decision-making during the "squeezing of testing", it would be useful to have recorded explicitly as part of the specification of each test:
– the type of risk the set of tests is designed to minimise
– any specific risks at which a particular test or tests is aimed
• Remember, each test is a means to an end, not an end in itself
• The "object" of each test is risk management, so let's encapsulate...

Test specification is based on the total magnitude of risks:

  total risk = Σ over all defects imaginable (estimated probability of defect occurring × estimated severity of defect)
4. Applying object-oriented concepts to testing
4.1 Encapsulation for effectiveness
The first OO concept considered here is encapsulation: if we want to keep a record of the risks which each test or set of tests is intended to mitigate, why not treat the risks as part of the overall test "object", in a similar way to how a development object encapsulates the local data it needs? If we want at some future date to reassess the risk to determine the status of that test, we won't have very far to look. Figure 4a illustrates the principles of "object-oriented risk management" compared to the corresponding concepts used in OO development.
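One way of realising this encapsulation in code (a hypothetical sketch; the class and field names are invented, not the paper's notation) is to make the mitigated risks part of the test record itself, so the risk information travels with the test:

```python
from dataclasses import dataclass, field

@dataclass
class Risk:
    description: str
    probability: int   # illustrative scale, eg 1=low, 2=medium, 3=high
    severity: int

    @property
    def exposure(self):
        return self.probability * self.severity

@dataclass
class Test:
    """A test 'object' encapsulating the risks it is designed to mitigate,
    just as a development object encapsulates its local data."""
    name: str
    risks: list = field(default_factory=list)

    @property
    def exposure(self):
        # Total risk exposure the test mitigates; used later for descoping.
        return sum(r.exposure for r in self.risks)
```

When a risk needs reassessing later, it is right there on the test, not filed away in a separate document.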
4.2 Object-oriented risk management in test specification and execution
The application of risk management to test specification was introduced in section 3.4 above. Figure 4b builds on this by applying similar principles to test execution. We are still looking at the probability and severity of defects, but whereas in specification we were trying to guess which defects we would be able to find, when we start execution we know about all the defects found so far.
Figure 4b: Risk management during test execution

  Test specification:
    total risk = Σ over all defects imaginable (estimated probability × estimated severity)

  Test execution:
    total risk = Σ over each defect detected (probability = 1; severity = f(urgency, importance))
               + Σ over all defects as yet undiscovered (estimated probability × estimated severity)
The severity of a defect is a function of its urgency and its importance, which can be defined as follows:
• when a test result either disagrees with the expected result or is otherwise deemed unsatisfactory, an incident should be recorded;
• the term "incident" is chosen because at this stage the tester cannot be sure whether this is a problem / defect (ie the system fails to meet its specification) or a change request (ie the system's specification is inadequate);
• either way, there are two categories of risk associated with each incident:
• that it delays or disrupts some or all of the tests planned to follow; and
• that if not fixed or otherwise acted upon before go-live, it damages the business or organisation;
• these two categories should be distinguished by recording two separate priorities in parallel:
• urgency, ie how quickly it needs fixing to minimise impact on progress of testing; and
• importance, ie how much impact it would have on the business if not fixed before go-live.
Often these two priorities are the same, but they can be opposite, eg invoices correctly calculated but printed with zero value will not stop the tests but would be disastrous for a business.
The actual severity for the purpose of risk calculation and resolution scheduling is best reviewed by a regular meeting of business, testing and technical representatives, since a low-importance but high-urgency incident could be blocking tests which are vital for confidence. The closer we get to go-live, the more value is placed on importance and the less on urgency for testing.
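The shifting balance just described, with importance counting for more as go-live approaches, can be sketched as a weighted blend; the linear weighting is an assumption for illustration, not a formula from the paper:

```python
def incident_severity(urgency, importance, fraction_to_golive):
    """Blend urgency (impact on testing progress) and importance (impact on
    the live business), weighting importance more heavily as go-live nears.
    fraction_to_golive runs from 0.0 (start of testing) to 1.0 (go-live)."""
    w = fraction_to_golive
    return (1 - w) * urgency + w * importance

# The zero-value-invoice example: low urgency (the tests can continue) but
# high importance, so its severity rises as go-live approaches.
```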
4.3 A framework for safe descoping
The main point about object-oriented risk management is that it recognises that testers are rarely allowed to take as long as they want to test a system. And because many things go not as well as planned, but few things go better than planned, a fixed implementation date often means several iterations of removing things from scope:
• in the relaxed early days, we write a testing strategy promising to cover everything adequately;
• when the testing plan is written, we already know that development is late and we won't be allowed such a large testing team;
• when it's time for test design, we discover that we will not be allowed two dedicated environments but only one, and that it will arrive late;
• any difficulties during test scripting, or any further development delays, will further attack the achievable test coverage; then
• test execution provides another set of threats, eg blocking errors delaying the schedule.
So our planned coverage is eaten away, bit by bit, until it is full of holes like a caterpillar-attacked leaf. We'd like those holes to be in the safest places; if a leaf is eaten away between the veins, it will still stand, but if the veins are broken, the leaf will collapse. So we need to know where it's safest to make those holes, where the risks are lowest.
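With risks encapsulated alongside each test, choosing where the holes are safest becomes a mechanical selection: keep the highest-exposure tests that fit in the time remaining, and omit the rest. A sketch (the simple greedy time-budget model is an assumption for illustration):

```python
def descope(tests, hours_available):
    """tests: list of (name, risk_exposure, hours) tuples. Keep the
    highest-exposure tests that fit the time left; return (kept, omitted)."""
    kept, omitted, used = [], [], 0.0
    for name, exposure, hours in sorted(tests, key=lambda t: -t[1]):
        if used + hours <= hours_available:
            kept.append(name)     # a "vein" of the leaf: too risky to omit
            used += hours
        else:
            omitted.append(name)  # a safe place for a hole
    return kept, omitted
```

The omitted list is then defensible: it names the tests whose risks were explicitly judged lowest, rather than whichever tests happened to be scheduled last.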
Finally, even when it comes to retesting and regression testing there are furthercompromises still to be made. If testing is as good as it normally is, we will have foundmany defects, probably hundreds. And how much time was allowed in the plan forretesting? Did we tell the project manager we expect 450 defects so we’d better allow 3weeks for retesting?. If so, did he / she accept this?
And the regression testing compromises are even worse, because we need a very sophisticated regression testing plan to tell us with high confidence what it is safe to leave out of regression testing. In the absence of that confidence, it feels good to go for a comprehensive re-run. But if we get any further failures during that (which we normally do), is there time to do it again? Not usually.
4.4 Revisiting risks
At each of these potential descoping points, we ought to check that the risks we originally assessed are still valid. There will not be time to repeat the whole process, but we can use the structure outlined in section 1.7 to check that our tests still cover the most important business processes, the key data inputs etc (Figure 4d).
[Figure 4d: Keep tests prioritised for best effectiveness. Example risk areas, spanning black-box to glass-box testing: important business processes; key data inputs, eg high-value transactions; important data outputs, eg invoices; complex functions and interfaces; risks to performance and auditability.]
4.5 Inheritance of tests
A second concept from object orientation may be used by testing, this time to improve efficiency. A common approach to specifying system-level tests is to start with a fresh ready-to-use database (an artificial, stable initial state) and then run a number of streams of testing in parallel. Each test requires some test data set-up, and the more complex the test, the more work is needed to get the data into the right state.
Inheritance can help us plan some short cuts here, particularly if we have test automation in place. It is sensible, and usual, to start with simple tests and keep the complex tests until the basics have been proven to work. But we can go further with this principle, and actually plan to reuse the simple tests as part of the complex tests, to get the data into the right state. If automated, we could even simply re-run the automated script. The principles are illustrated in Figure 4e.
[Figure 4e: Complex tests inheriting simpler tests. Instead of every test (Test 1, Test 2, Test 3 … Test 57) performing its own initial test data set-up from the artificial, stable initial state, try progressively more complex tests which re-run the early, simple tests to reach their start state (eg a later, complex test re-running Test 1, then needing only some more test data set-up); or even compose several earlier tests (eg Test 57 built from re-runs of simpler tests).]
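The inheritance idea can be sketched in code. This is a minimal illustration with hypothetical test and data names, not the paper's actual tests: a later, complex test inherits an early, simple test and re-runs it to reach its start state.

```python
# Sketch (hypothetical names): a complex test inheriting a simpler test
# for its data set-up, instead of building state from the artificial
# initial database itself.

class SimpleOrderTest:
    """An early, simple test: it also leaves behind the data a later test needs."""
    def run(self, db):
        db["order"] = {"id": 1, "lines": []}   # initial test data set-up
        assert "order" in db                    # the simple check itself
        return db

class ComplexInvoiceTest(SimpleOrderTest):
    """A later, complex test inherits the simple one to reach its start state."""
    def run(self, db):
        db = super().run(db)                    # re-run the simpler test first
        db["order"]["lines"].append({"item": "widget", "value": 100})  # more set-up
        invoice_total = sum(line["value"] for line in db["order"]["lines"])
        assert invoice_total > 0                # eg guarding the zero-value invoice risk
        return db

db = ComplexInvoiceTest().run({})               # one call runs both, in order
```

If the simple test were an automated script, the `super().run()` call would correspond to simply re-running that script, which is where the efficiency gain comes from.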
The advantages of this approach are that:
• we can save time, as there are fewer test steps to specify overall;
• as tests accumulate, a library of reusable tests is built, evolving into a life-like "test bed";
• there is an opportunity to "mix and match" different tests, giving ready-made variations;
• if two composite tests behave differently, it is relatively easy to isolate the differences in the tests themselves; and
• there is a by-product of some automatic regression testing, as repeating exactly the same tests which have worked earlier can reveal unexpected side-effects of fixes.
There are, however, some disadvantages:
• repeating an earlier test exactly may miss the opportunity to find a problem which a slight variation would have detected; and
• if there are problems with the earlier tests, the later tests which use them are delayed.
5. Role of metrics and measurement in risk management
5.1 Metrics and measurements: which?
Like verification and validation, or quality assurance and quality control, metrics and measurement are subject to semantic debate. A dictionary definition will suffice here: "metrics: the theory of measurement" (Chambers). The simplest interpretation is then that:
• defining metrics is deciding what we want to measure (eg progress through testing, defect priorities and resolution trends, defect sources), and how we can do that; and
• measurements are what we collect to populate our metrics with data.
Cynics often say that one can make statistics appear to prove anything, but that is exactly the point here: we should choose our metrics to give us information on the risks we want to manage, and then, at the next level of sophistication, to build up a knowledge base to guide future intelligent action.
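The metric/measurement distinction can be sketched in code. This is a minimal illustration with invented data: the metrics are decided first, and the raw measurements then populate them.

```python
# Sketch (hypothetical data): measurements collected during test execution,
# populating two metrics chosen in advance: cumulative testing progress
# (which typically gives an S-shaped curve) and defect source analysis.
from collections import Counter
from datetime import date

# Raw measurements, one record per day of test execution
measurements = [
    {"day": date(1999, 11, 1), "tests_run": 4,  "defect_source": "requirements"},
    {"day": date(1999, 11, 2), "tests_run": 7,  "defect_source": "coding"},
    {"day": date(1999, 11, 3), "tests_run": 12, "defect_source": "coding"},
]

# Metric 1: cumulative testing progress over time
progress, total = [], 0
for m in measurements:
    total += m["tests_run"]
    progress.append((m["day"], total))   # 23 tests run cumulatively by 3 Nov

# Metric 2: defect source analysis, feeding the knowledge base
sources = Counter(m["defect_source"] for m in measurements)
print(sources.most_common(1))  # [('coding', 2)]
```

The point is not the code but the separation: the metric definitions stay stable while measurements accumulate under them.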
Some common and useful metrics are illustrated in Figure 5a.
The subject is too wide to be given much space here, but a few comments are useful to give context:
• Testing progress is needed to predict whether we are going to finish in time. Typically initial progress is slow, owing to environmental problems and "blocking defects"; it then speeds up, then slows at the end as we struggle to complete the difficult cases. This gives an S-shaped cumulative curve.
• Testing productivity over time should be similar to testing progress, because productivity per test should be fairly uniform until we reach acceptance testing. This should contribute to our knowledge base of which kinds of tests are good at finding errors and which are not.
• Defect priorities and resolution trends, like testing progress, are essential to predict whether we will be ready in time, or if not, when.
• Defect source analysis is another "knowledge" contributor, because each testing level, eg integration, should specialise in the kinds of defects which suit it (and which later, higher levels have less chance of finding). Note here one criticism of the V-model: the kinds of errors in which acceptance testing specialises, ie requirements errors, are often the most difficult to fix, yet acceptance testing is the last to execute. The answer is that although acceptance testing is the last to run, it should be the first to start (as in the W-model). Defining acceptance test cases can and should begin as soon as the requirements are documented.
5.2 Metrics in risk management
If we return to our picture of estimated probability and estimated severity of defects as predicted in test specification and test execution, we can see how metrics can help us with some of the missing information (Figure 5b).
We can specify more effective tests in future if we know which tests were best at finding which errors (defect source analysis). Tests which run without failures, unless they are necessary for data set-up or are part of acceptance tests, are not good contributors to risk management.
During test execution, we can use not only current defect information and trends to predict when it will be safe to stop testing, but also defect information from the lower levels already executed, to predict where in the system defects are most likely to be (the author calls this "glass-shelf testing"). There is evidence that metrics from unit testing are very strong indicators of which units will be prone to rework at later, higher levels [HOLT98].
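The idea of unit-level metrics as leading indicators can be sketched simply. This follows the cited work only in spirit; the module names and defect counts are invented for illustration.

```python
# Sketch (hypothetical data): unit-test defect counts as leading indicators
# of which units will be prone to rework at later, higher test levels.
unit_defects = {"invoicing": 14, "reporting": 2, "login": 6}

# Units with the most unit-test defects are the likeliest rework candidates,
# so later, higher test levels can aim extra coverage at them.
watch_list = sorted(unit_defects, key=unit_defects.get, reverse=True)
print(watch_list)  # ['invoicing', 'login', 'reporting']
```

A real analysis would normalise by unit size and complexity, but even this crude ranking gives the higher test levels somewhere to concentrate.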
6. Obstacles to attaining and maintaining quality systems
6.1 Insufficient time, and neglect of risk
There is never as much time available for testing as testers want. We could respond to this better if we did not typically encounter the following obstacles:
• Insufficient use of risk. Initial risk analysis at the start of testing is common, but then we tend to put our heads down and do "the tests, the whole tests and nothing but the tests".
• Unclear probability of risk. Knowing the risk factors is currently more art than science, and the factors change as technology changes (for example, yesterday's stack pointer problems give way to today's browser incompatibilities). Very few commercial organisations seem to budget enough for metrics and their measurement and interpretation.
• Unclear severity of risk. This is usually easier to determine: interview the users and their managers. But remember to revisit it at key decision points.
Object-oriented risk management is intended to help overcome these obstacles. But there are other obstacles…
6.2 Motorcycles and information systems
In ZataoMM, the narrator discussed a number of obstacles he had identified in keeping motorcycles running properly. There are some interesting comparisons between these and similar obstacles met by information systems developers, maintainers and testers (Figure 6b).
Figure 6b: Obstacles … in motorcycle maintenance (analogue: needs frequent adjustments, part replacements) and … in testing information systems (digital: once it works, it should continue to work).
External:
• out-of-sequence re-assembly → installation etc. conflicts
• intermittent failure → intermittent failure!
• parts scarcity & confusion → development delays & configuration management
Internal (value):
• value rigidity → no lateral thinking
• ego → arrogance, cover-ups
• anxiety → desire for "smooth tests"
• boredom → uninspired tests
• impatience → inadequate strategy / design
Internal (truth):
• true / false / undefined → if… then… without else
Internal ("psycho-motor"):
• inadequate tools → no, or imperfect, tool set-up!
• poor working environment → inadequate test environments
• lack of "mechanic's feel" → "finger trouble" (deliberate?)
6.3 Why testing continues to struggle
There are some other books, more recent than ZataoMM, which suggest some insights into why getting information systems working properly, and keeping them working, is so difficult (Figure 6c).
Figure 6c: Recent books on why quality remains elusive.
• Bill Gates: the road ahead (when it's upgraded to the "information superhighway") is good; business @ the speed of thought (via a digital nervous system) is good.
• The Dilbert Future (Scott Adams, 1997): bad things which can be foreseen will be prevented by humans; one day technology will reduce human work, not increase it; democracy & capitalism will always coexist happily with lazy / stupid people.
• Release 2.0 and 2.1 (Esther Dyson, 1997 & 1998): Release 1.0 is fresh and new, the realisation of the hopes and dreams of its developers; Release 2.0 is supposed to be perfect, but usually Release 2.1 comes out a few months after; the internet causes decentralisation, where the masses separate into small groups (systems of which may automatically self-organise).
• Moral and Legal Challenges of the Information Era (Chris Anderson, 1999): institutions and individuals seem to be coming apart, because institutions are built on "machine" principles, and we now need "chaordic" organisations (eg Visa, the internet).
6.5 The surprise of Linux
Following up this idea of "chaordic" organisations: the Linux community is arguably one such organisation. One might at first sight expect a distributed collaboration of loosely-controlled developers and testers to produce less reliable software than large corporations, but the growing band of Linux adherents claim the opposite, and there is some evidence to support this. Figure 6e illustrates what we might expect compared to what seems to be emerging.
[Figure 6e: Institutional and chaordic convergence. Institutional (eg proprietary): we might expect closer control, defined scope and traditionally motivated staff; what we seem to get is convergence over successive releases (1.0, 1.1, 2.0, 2.1…), but only until the next major release or next product. Chaordic (eg Linux): we might expect anarchy, runaway scope and an insufficiently motivated "public"; what we seem to get is convergence, with fewer facilities but fewer faults and better stability. Why?]
7. Risks and the year 2000 problem
The risks presented by the millennium problem (or, more correctly, the century problem, since if computers had been in widespread use in 1899 then we would have had to fix it for 1900) are interesting for several reasons:
• The whole issue, despite some popular belief, was well known in advance by all, or nearly all, technical people (the author of this paper detected it when writing his first program in 1978, but declared himself "unlikely to be still in computing by then"!).
• The main reason it is still a problem, apart from a reluctance by some managers to listen to, or believe, technical people, is the natural human weakness of procrastination, or "short-term-ism". Few people wanted to pay for fixing it before it needed to be fixed.
• Even now, not many weeks away from the event (this was written in early September 1999), genuine experts cannot agree on whether it will really be a big problem or not. For every piece of bad news pounced on by the pessimists, there will be an immediate response by the optimists, declaring the news false or distorted, or failing that doubting the intellectual rigour of the reports. And vice versa.
Several "advance mini-crises", bundled by many commentators with year 2000 itself and its 01 January and 29 February problems, have passed with little or no visible disruption:
• the start of the financial year 1999-2000;
• the Global Positioning System date rollover in August from 1023 weeks to 0 weeks; and
• the potential, though less plausible, 9/9/99 problem.
These "non-events" are being taken in many quarters as indications that the pessimists have been wrong so far and will continue to be wrong. The pessimists, on the other hand, can quote the well-known stories of:
• the boy who falsely cried "wolf" (when a wolf did arrive, the boy was not believed and the sheep were eaten); and
• Cassandra, who knew she was telling the truth but also knew she would not be believed.
Arguably, year 2000 optimism or pessimism is like religion: it is not a conscious decision; one either believes or does not (agnostics being treated as non-believers rather than disbelievers). The only conscious decision most people take is to mix with the kind of people who agree with one's beliefs [BERN96]. This tends to reinforce those beliefs. So it is with year 2000.
It does seem, however, that the pessimists occupy the moral high ground. The most personal and emotional attacks are made by the optimists on the pessimists, declaring them cynical money-grabbers, opportunists, charlatans, self-promoters etc. Rarely do the pessimists attack the optimists, other than to pity them. Had the pessimists not been so vociferous over the last three or four years, much of the remedial work which has been done might have been too late. Yet there are undoubtedly cynical money-grabbers etc on the y2k bandwagon. And if the pessimists really do think the optimists are guilty of dangerous and culpable negligence, why do they not say so more loudly?
But there is a case to be made that even if the pessimists are right, they are right not to be pessimistic too loudly. They were five years ago, but not now. The reason is that whether or not it turns out to be a really big problem may depend as much as anything on how many people, at the last minute, think it will be a problem. There is a risk of a "self-fulfilling prophecy" (Figure 7).
[Figure 7: The risk of a year 2000 self-fulfilling prophecy. On one side: too little remediation done; almost-adequate remediation; too little testing ("are we currently here?"). On the other: panic amendments to systems; excessively disruptive tests; unnecessary system replacements; expensive contingency arrangements; investment suppressed; stockpiling & hoarding; stagnating markets, eg gilts; cash withdrawn.]
Not only are the causes and effects of possible year 2000 problems not independent,but:
• potential causes are dependent on other causes (eg one company's systems are compliant, but they are fed incorrect dates or date-affected data across an interface); and
• potential effects are dependent on other effects (eg supply chain problems, particularly in the widespread "just in time" chains).
It is this lack of independence which has almost certainly caused the insurance industry, near-unanimously, to refuse to insure year 2000 risks. Insurance is guided by actuaries, who perform sophisticated calculations on independent risks. Once the risks can feed off each other, "all bets are off". Interdependenzkraft? Nein danke! ("Interdependence power? No thanks!") But then there are always the lawyers on which to fall back…
Looking wider than the year 2000 problem, it has been argued that such problems of interdependence between cause and effect are actually a threat to the world's current mix of global capitalism with fragmented and diverse political control [SORO98]. The famous financier sets out concepts of:
• fallibility (not only is perfection impossible, but recognising this and planning for it can have positive consequences);
• reflexivity (not only are our expectations of future events affected by past events, but our expectations can themselves actually affect those future events); and
• open society (members of an open society recognise that understanding is imperfect and actions can have unintended consequences; capitalism is a distortion of an open society; market values need to be moderated by social values).
This strays outside the boundaries of software testing, but bears comparison with some of the ideas expressed in section 6:
• new, fragmented and cross-border groupings of mutually-interested people (via the internet) do not necessarily threaten orderly society;
• distributed but collaborative development and testing (Linux), rather surprisingly, does not necessarily lead to bad software;
• other "chaordic" organisations (eg Visa) can be outstandingly successful;
• market forces are taking software complexity out of control, and testers have a key role in setting up the missing "error correction" mechanism, based instead on social values; therefore perhaps…
• the worldwide community of testers may yet be able to escape the misery of codependent behaviour.
8. Future of testing
To conclude with some thoughts on the future:
• How long can we survive computerising and interconnecting everything, with requirements increasingly volatile and undocumented, and responsibility distributed? What we already have doesn't work properly!
• Does anyone but testers care? Will the battle for quality send us mad?
• Is the future for testers worse or better?
• E-commerce threatens disintermediation, ie "cutting out the middle-man"; is there a similar threat to testing? Technicians are expected to be increasingly business-aware, business experts are increasingly technically proficient, and testers often integrate the two other skill-sets. Will they always be necessary?
• On the other hand, will testers get the time and information in the future to assess risk better? The insurance industry does not do much without consulting its actuaries (and very well paid they are too), yet there are no equivalent positions visible in testing. Will testing get its own actuaries?
The future of object-oriented risk management has already arrived for the author of this paper, who has just finished a successful acceptance testing / user trials project in a very large company, using the concepts outlined here. There were learning points in addition to things that worked well, and it is hoped that a case study will be included in a future paper.
…to talk now about Phaedrus' exploration into the meaning of the term Quality, an exploration which he saw as a route through the mountains of the mind… In the first phase he made no attempt at a rigid, systematic definition… This was a happy, fulfilling and creative phase. The second phase emerged as a result of normal intellectual criticism of his lack of definition… he made systematic, rigid statements about what Quality is, and worked out an enormous hierarchic structure of thought to support them.
The take-home messages are therefore:
• keep your customer's risks evaluated as part of your tests, and revisit them at key stages in testing;
• define metrics (also based on what risks you want to manage), and measure against those metrics, but keep it simple;
• if necessary, be ready to go live on any of a range of dates, each with known risk;
• keep your measurements, and put them on the internet;
• work out what you would have done differently if you had had those measurements at the start of the project;
• take every opportunity to argue that testing needs expert assessors of risk just as much as the insurance industry needs actuaries;
• continue to attend, and contribute to, conferences like EuroSTAR;
• fight the battle against codependency, but don't only talk to other testers: go out and preach to the project managers and the "object orienteers" (an English pun);
• oh, and… survive year 2000.
• oh, and… survive year 2000.
References
ANDE99  Chris Anderson, Moral and Legal Challenges of the Information Era: the Effect of Y2k – a Philosophical Digression, Pretoria University, 1999
BERN96  Peter L Bernstein, Against the Gods: the Remarkable Story of Risk, Wiley, 1996
COPE98  Lee Copeland, When Helping Doesn't Help: Software Testing as Codependent Behaviour, EuroSTAR 1998
CUSU95  Michael A Cusumano & Richard W Selby, Microsoft Secrets, The Free Press, 1995
GRAH95  Dorothy Graham & Systeme Evolutif, CAST Report, CMI, 1995 (also other editions)
HETZ98  Bill Hetzel, Software Test and Evaluation: a 25-year Retrospective, EuroSTAR 1998
HOLT98  Peter Holt & Ronald Stewart, Leading Indicators of Rework: a Method of Preventing Software Defects, EuroSTAR 1998
MYER79  Glenford J Myers, The Art of Software Testing, Wiley, 1979
PIRS74  Robert M Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values, Bodley Head, 1974
SORO98  George Soros, The Crisis of Global Capitalism: Open Society Endangered, Little, Brown, 1998
THOM93  Neil Thompson, Organisation before Automation: a Structured yet Pragmatic Testing Methodology, EuroSTAR 1993
THOM94  Neil Thompson, RAD v. Trad: a Case Study, EuroSTAR 1994
end of document NTCES99WP.doc v1.0 Neil Thompson 10 Sep 99
Neil Thompson
Neil Thompson is a graduate in Natural Sciences who has worked for over 20 years in information systems (with a hardware manufacturer, two software houses, a user organisation and two management consultancies). His roles have evolved through programming, systems analysis and project management, and he became a leading testing expert with Coopers & Lybrand. Now an independent testing consultant and manager, Neil works directly for blue-chip clients through his own company, sometimes in association with other consultancies or agencies.
He is a member of the British Computer Society's specialist interest groups in Software Testing and Configuration Management, and is an associate of the Institute of Management Consultancy.
He presented papers to EuroSTAR in 1993 (Organisation before Automation) and 1994 (RAD versus Trad), and to the BCS SIGiST in 1998 (Religion, Politics and Testing, from which this paper has evolved).