Transcript
Exploratory Testing
…a powerful approach, yet widely misunderstood …orders of magnitude more productive than scripted testing …simultaneous learning, test design and test execution
– James Bach, exploratory testing evangelist
What is ET?
Exploratory software testing (ET) is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project. Cem Kaner
Sounds promising…
…but…
– impossible to automate
– highly dependent on tester skills
– hard to replicate failures (if testing is not traced)
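The last drawback can be mitigated with lightweight session tracing. A minimal sketch of the idea (all class and field names are our own invention, not from any of the cited studies): record each exploratory action with a timestamp, so the steps leading up to a failure can be replayed.

```python
import time


class SessionLog:
    """Minimal trace of an exploratory testing session (illustrative sketch).

    Recording each action as it happens makes failures replicable:
    the exact sequence of steps before a failure can be recovered.
    """

    def __init__(self, charter):
        self.charter = charter
        self.events = []

    def record(self, action, observation=""):
        # Timestamp every action so the session can be reconstructed later.
        self.events.append({
            "t": time.time(),
            "action": action,
            "observation": observation,
        })

    def steps_before_failure(self):
        """Return all actions up to and including the first failure note."""
        steps = []
        for e in self.events:
            steps.append(e["action"])
            if "FAIL" in e["observation"]:
                break
        return steps


log = SessionLog(charter="Explore the checkout flow")
log.record("add item to cart")
log.record("apply empty coupon code", "FAIL: server error 500")
print(log.steps_before_failure())
```

Even this much tracing turns an untraceable exploratory session into one where a reported failure comes with a reproducible step list.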
And, do we really know?
Exploring Exploratory Testing – outline
• Variations of exploratory testing
• Empirical evidence on:
  – efficiency
  – relation to knowledge and skills
• Recommendations
• Making exploratory testing actionable
Variations of Exploratory Testing
A spectrum from freestyle to pure scripted:
• Freestyle: test object only
• In between: test goals and constraints
• Pure scripted: test object, test steps, test data
Test charter types
• Fully scripted: defined test steps and test data –> no room for exploration.
• Low degree of exploration: defined high-level goals, restrictions and test steps; the tester chooses the test data.
• Medium degree of exploration: defined high-level goals plus additional restrictions, e.g. detailed goals, priorities, risks, tools to use, the functionality to be covered, or the test method to be used.
• High degree of exploration: one or more high-level goals; beyond that, the tester can freely explore the system.
• Freestyle: only the test object is provided to the tester.
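The five charter types can be seen as one structure with progressively more fields filled in. A hypothetical sketch (the class and field names are our own, not from the slides): the degree of exploration falls out of which fields the charter defines.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TestCharter:
    """One possible encoding of the charter types above (illustrative only)."""
    test_object: str
    goals: List[str] = field(default_factory=list)        # high-level goals
    constraints: List[str] = field(default_factory=list)  # risks, priorities, tools
    test_steps: Optional[List[str]] = None
    test_data: Optional[List[str]] = None

    def degree_of_exploration(self) -> str:
        # The more the charter pins down, the less room for exploration.
        if self.test_steps and self.test_data:
            return "fully scripted"
        if self.test_steps:
            return "low"        # tester still chooses test data
        if self.constraints:
            return "medium"     # goals plus restrictions
        if self.goals:
            return "high"       # goals only, free exploration otherwise
        return "freestyle"      # test object only


charter = TestCharter(test_object="web shop", goals=["verify checkout"])
print(charter.degree_of_exploration())  # high
```

A team could use a structure like this to make the chosen degree of exploration explicit per session rather than implicit in the tester's head.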
Is Exploratory Testing efficient?
Experiments on TCT vs. ET
• 46 students and 24 practitioners in 90-minute sessions [Afzal 2015]
  – Test design included in the TCT session
  – Faults found (ET) >> faults found (TCT)
• 79 students in 90-minute sessions [Itkonen 2007]
  – Test design NOT included in the TCT session
  – Faults found (ET) ≈ faults found (TCT)
[First page of the paper]
Afzal W, Ghazi AN, Itkonen J, Torkar R, Andrews A, Bhatti K. An experiment on the effectiveness and efficiency of exploratory testing. Empir Software Eng (2015) 20:844–878. DOI 10.1007/s10664-014-9301-4. Published online: 16 April 2014.

Abstract: The exploratory testing (ET) approach is commonly applied in industry, but lacks scientific research. The scientific community needs quantitative results on the performance of ET taken from realistic experimental settings. The objective of this paper is to quantify the effectiveness and efficiency of ET vs. testing with documented test cases (test case based testing, TCT). We performed four controlled experiments where a total of 24 practitioners and 46 students performed manual functional testing using ET and TCT. We measured the number of identified defects in the 90-minute testing sessions, the detection difficulty, severity and types of the detected defects, and the number of false defect reports. The results show that ET found a significantly greater number of defects. ET also found significantly more defects of varying levels of difficulty, types and severity levels. However, the two testing approaches did not differ significantly in terms of the number of false defect reports.
Is Exploratory Testing Efficient?
• Yes, very efficient, if you run each test case only once
• Equally or more efficient, if you only count test execution
• Not efficient, if you want to automate
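The three bullets above can be made concrete with a toy cost model (all effort numbers below are invented for illustration, not measured data): ET has no separate design phase but every run is a full session; scripted testing pays a one-time design cost and cheaper executions; automation pays a large upfront cost and near-zero reruns.

```python
def total_cost(design, per_run, runs):
    """Total effort = one-time design cost + per-execution cost times runs."""
    return design + per_run * runs


# Illustrative effort units (assumptions, purely for the shape of the curves):
et        = lambda runs: total_cost(design=0,   per_run=90, runs=runs)
scripted  = lambda runs: total_cost(design=120, per_run=30, runs=runs)
automated = lambda runs: total_cost(design=300, per_run=1,  runs=runs)

for runs in (1, 5, 20):
    print(runs, et(runs), scripted(runs), automated(runs))
```

With these assumed numbers, ET wins for a single execution, scripted testing wins after a handful of repetitions, and automation wins once the same tests are rerun many times, which is exactly the trade-off the slide states.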
CC BY-ND 2.0 Hans Splinter @Flickr
Knowledge in Exploratory Testing
Analysis of 12 ET sessions in 4 units of 3 companies, covering 88 failures [Itkonen 2013]

Knowledge types:
• domain knowledge,
• system knowledge, and
• generic software engineering knowledge

Knowledge is applied both for test design and as a test oracle.
[First page of the paper]
Juha Itkonen, Mika V. Mäntylä, and Casper Lassenius. The Role of the Tester's Knowledge in Exploratory Software Testing. IEEE Transactions on Software Engineering, Vol. 39, No. 5, May 2013, pp. 707–724. DOI 10.1109/TSE.2012.55.

Abstract—We present a field study on how testers use knowledge while performing exploratory software testing (ET) in industrial settings. We video recorded 12 testing sessions in four industrial organizations, having our subjects think aloud while performing their usual functional testing work. Using applied grounded theory, we analyzed how the subjects performed tests and what type of knowledge they utilized. We discuss how testers recognize failures based on their personal knowledge without detailed test case descriptions. The knowledge is classified under the categories of domain knowledge, system knowledge, and general software engineering knowledge. We found that testers applied their knowledge either as a test oracle, to determine whether a result was correct or not, or for test design, to guide them in selecting objects for test and designing tests. Interestingly, a large number of failures, windfall failures, were found outside the actual focus areas of testing as a result of exploratory investigation. We conclude that the way exploratory testers apply their knowledge for test design and failure recognition differs clearly from the test-case-based paradigm and is one of the explanatory factors of the effectiveness of the exploratory testing approach.

Index Terms—Software testing, exploratory testing, validation, test execution, test design, human factors, methods for SQA, V&V
Findings on Knowledge
1. ET is efficient since the testers use different types of personal knowledge, rather than restricting their focus
2. Failures are incidentally found outside the actual target features of the testing activities
3. A large fraction of the failures do not require complicated test designs to be provoked
4. Failures related to domain knowledge are straightforward to provoke, while failures related to system or generic knowledge require more interacting conditions and are thus more complicated to provoke.
Formal Training in Exploratory Testing
• Experiment with 20 professionals [Micallef 2016]
  – with/without formal test training
  – 20 injected faults in an e-commerce system
  – up to 40-minute sessions with an eye-tracking device
[Excerpt from Micallef et al. 2016: study setup]

TABLE III. Participant demographics and background information (G: gender, A: age, E: years of experience in a testing role)

            Carmen                      George
ID    Domain      G  A   E        Domain     G  A   E
1     Design      M  36  0        E-comm     F  31  3
2     E-comm      F  51  1        Payments   M  25  3.5
3     Content     M  30  0        E-comm     M  39  13
4     E-comm      M  34  0        Payments   M  27  1.5
5     E-comm      F  37  0        Telco      M  32  3
6     Gaming      M  26  1        Payments   F  23  3
7     E-comm      F  26  0        E-comm     F  27  7
8     E-comm      F  29  0        Payments   M  31  2
9     E-comm      F  24  0        Networks   M  28  6
10    Virtualiz.  M  24  2        Payments   M  21  1
Median               29.5  0                    27.5  3
Mean                 31.7  0.4                  28.4  4.3

Participants were allowed to test the system (exploratory) for up to 40 minutes whilst a researcher observed their activities from a remote monitor, taking notes unobtrusively. The participant's terminal was equipped with an eye-tracking device (Tobii X-120), a camera with face-tracking capabilities, and a microphone (Fig. 3: participant eye-tracking terminal and remote monitoring station). Participants were encouraged to think aloud during the test session, commenting on specific bugs being discovered as well as general issues with the system. Tobii Studio (eye-tracking software for analysis) was used to capture eye-gaze data together with mouse activity, audio and video streams. Most participants suggested improvements to the e-commerce site, while others found bugs that were not intentionally injected as part of the study. Think-aloud allows participants to focus on their primary task without interrupting their workflow to log bugs (manually or online). A researcher took note of any points of interest (POIs) during the session (e.g., the participant's gaze was fixed for a long time on a specific element), in which case the participant was invited to a reflective think-aloud (RTA) session right after the test session, in which the researcher plays back portions of the session (POIs) to the participant and a discussion ensues; this adds a second layer of understanding to the otherwise sterile gaze data. Following the test session (and RTA, if applicable), a short debriefing was conducted through a semi-structured interview, capturing participants' biographic information, knowledge of exploratory testing strategies, and insights into their professional experience.

Data processing
1) Reviewing raw data: over 13 hours of gaze-point data together with the corresponding audio and video streams were reviewed systematically using a set of predefined scoring sheets (Fig. 4: scoring sheet used to annotate observed behavioural patterns, from video, audio and gaze data, together with bugs reported by each participant). For each session the researcher took note of a) predefined observable behavioural patterns (e.g., the tester hovered around the same area where a bug was previously found), b) bugs reported, if at all, and c) corresponding timestamps (down to minute-by-minute granularity).
2) Data staging: strategies were abstracted as a sequential set of behavioural patterns (e.g., ES2 = [BP21, BP5, BP22, BP20]), allowing the researcher to map a series of observed participant behavioural patterns to a probability that a particular strategy was being used during a specific timeframe (knowingly or unknowingly). The exploratory strategies selected for the study were grouped into three broad categories, guided, semi-guided and unguided, in descending order of rigour and technical know-how required. Given that seven strategies were monitored across twenty participants, it was more manageable to group strategies by where they lie on the guided/unguided scale (Fig. 5): ES1 and ES3 were classified as unguided, ES2, ES4 and ES7 as guided, and ES5 and ES6 as semi-guided, as they exhibit a balanced amount of characteristics from both extremes. The resulting information was synthesised in a tabular format representing a minute-by-minute log of which tester type was using which strategy type and whether any bugs (by type) were reported.
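The data-staging idea, mapping an observed stream of behavioural patterns to strategies such as ES2 = [BP21, BP5, BP22, BP20], can be sketched as a simple in-order pattern match. The ES*/BP* names follow the paper; the matching logic below is our own simplification of the study's probabilistic mapping, for illustration only.

```python
# A strategy signature is an ordered list of behavioural patterns.
# Signatures here are partly invented: ES2 is taken from the paper's
# example, ES4 is a made-up placeholder.
STRATEGIES = {
    "ES2": ["BP21", "BP5", "BP22", "BP20"],
    "ES4": ["BP1", "BP7"],
}


def is_subsequence(signature, observed):
    """True if all patterns of `signature` occur in `observed`, in order."""
    it = iter(observed)
    # `p in it` advances the iterator, so order is enforced.
    return all(p in it for p in signature)


def candidate_strategies(observed):
    """Strategies whose signature is compatible with the observed stream."""
    return [name for name, sig in STRATEGIES.items()
            if is_subsequence(sig, observed)]


# An observed minute-by-minute pattern stream (made-up example):
observed = ["BP21", "BP3", "BP5", "BP22", "BP20"]
print(candidate_strategies(observed))
```

A real implementation would weigh partial matches into probabilities per timeframe, as the paper describes; the sketch only shows the core signature-matching step.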
Do Exploratory Testers Need Formal Training? An Investigation Using HCI Techniques
Mark Micallef, Chris Porter, Andrea Borg
Faculty of ICT, University of Malta
{mark.micallef, chris.porter, andrea.borg.12}@um.edu.mt

Abstract—Exploratory software testing is an activity which can be carried out by both untrained and formally trained testers. We personify the former as Carmen and the latter as George. In this paper, we outline a joint research exercise between industry and academia that contributes to the body of knowledge by (1) proposing a data gathering and processing methodology which leverages HCI techniques to characterise the differences in strategies utilised by Carmen and George when approaching an exploratory testing task; and (2) presenting the findings of an initial study amongst twenty participants, ten formally trained testers and ten with no formal training. Our results shed light on the types of strategies used by each type of tester, how they are used, the effectiveness of each type of strategy in terms of finding bugs, and the types of bugs each tester/strategy combination uncovers. We also demonstrate how our methodology can be used to help assemble and manage exploratory testing teams in the real world.
I. INTRODUCTION
Carmen and George are both employed as software testers. However, whilst George is formally trained and certified, Carmen has no training but got the job because she is 'good at finding bugs'. A heated debate about whether or not software testers should be formally trained and certified was sparked by the announcement in 2011 of the development of a new ISO standard on software testing [1], and further fuelled by the perception that, unlike other roles in software engineering, testing can be carried out by anyone, provided they understand an application's domain. One side argues that certified testers will approach testing with discipline and consistency, whilst the other argues that, much like a driving licence does not make one a good driver, a testing certification does not guarantee a good tester. Furthermore, end users regularly find and report bugs in systems even though they are not trained as testing professionals. Testing encompasses a wide range of skills (e.g., planning, design, automation, exploratory testing), and proficiency in most of them requires a certain level of formal training.

However, an interesting opportunity for joint research between academia and industry presents itself when one considers that exploratory testing can be carried out by both trained and untrained testers. Exploratory testing involves a software tester interacting with a system in an unscripted manner, guided mainly by her intuition and experience. Although it is a recognised approach, the technique is frequently referred to as ad-hoc testing and suffers from a reputation for delivering inconsistent results depending on the tester executing it. Despite the fact that there are documented exploratory testing strategies which can be utilised [2], effectiveness has been shown to depend on the tester's knowledge [3], learning style [4] and even personality [5]. The debate as to whether formal training is a positive or negative influence on the quality of exploratory testing motivates our hypothesis.

A. Hypothesis and Research Questions
If one partitions testers into two broad groups, such that one group consists of testers with a formal qualification in software testing and the second group consists of testers with no such formal training, then we hypothesise that the two groups of testers intuitively use different yet complementary exploratory testing strategies. In order to explore this hypothesis, we propose to investigate three research questions:
(RQ1) Which types of exploratory testing strategies are utilised by testers in each group?
(RQ2) Which types of bugs are found by testers in each group?
(RQ3) Is there a link between the bugs found and the testing strategies adopted by the tester groups?

B. Research Challenge
The main research challenge here arises mostly from the fact that our subjects of interest (software testers) probably do not possess explicit and standardised knowledge of the strategies that they utilise to do their testing. To use a medical metaphor: whereas two doctors are highly likely to refer to any number of medical procedures by their technical names and understand each other perfectly well, very little such standardisation exists in exploratory software testing. This is further compounded by the fact that one of the groups we want to study is not even formally trained and would therefore be even less likely to know any jargon. Therefore, we needed to design a methodology which non-intrusively extracts information about which strategies are being used by participants at any point in time.

C. How can HCI Help?
Human-Computer Interaction (HCI) is an interdisciplinary area of research and practice which has evolved throughout the …

2016 IEEE International Conference on Software Testing, Verification and Validation Workshops. DOI 10.1109/ICSTW.2016.31.
Do Exploratory Testers Need Formal Training?
Fig. 5. The scale of test strategies, from unguided to guided

The data-staging phase produced the necessary level of detail required to conduct an in-depth analysis, as discussed in the following steps.

Evaluation and conclusions
1) Data analysis: here, we leveraged preprocessed data in order to produce outcomes and recommendations. Quantitative tests on the scoring data allowed us to identify diverging a) behavioural patterns between tester categories, b) rigour in the use of test strategies, and c) effectiveness of strategy use in terms of bugs discovered and their respective categories. Pivot charts were adopted so as to stay as close to the data as possible while being able to transform the observed data to uncover potentially hidden patterns and correlations.
2) Expert evaluation: a questionnaire was sent to around 40 software quality assurance professionals to obtain a measure of the perceived importance of the various bug categories covered in our study (i.e., which bugs, in order of importance, should be reported and fixed prior to release?). Finally, we briefed a group of testing professionals on our main results and their interpretation, and these in turn contributed their own practice-based opinions, interpretations and recommendations. These two feedback loops informed our discussion (Section V), our final remarks (Section VI) and potential future research avenues (Section VI-A).
3) Conclusions: a set of recommendations was produced with respect to the assembly and management of exploratory testing teams, which could in turn improve efficiency and return on investment.

IV. RESULTS
As outlined in Section I-A, this work was driven by three research questions designed to collectively characterise the differences between how formally trained testers (George) and untrained testers (Carmen) approach an exploratory testing task. This section outlines the empirical results for each question; a synthesised discussion follows in Section V.

TABLE IV. Distribution of bugs found, according to category and the type of tester that found them.

  Category                Carmen      George     Total
  Content bugs            35 (54%)    30 (46%)     65
  Input validation bugs    6 (21%)    23 (79%)     29
  Logical bugs             5 (50%)     5 (50%)     10
  Functional UI bugs      10 (48%)    11 (52%)     21
  Nonfunctional UI bugs    1 (11%)     8 (89%)      9
  Total                   57          77          134

A. RQ1: Which types of strategies are used by testers in each group?
1) Testers with no formal training (Carmen) overwhelmingly rely on unguided strategies (65%), using guided strategies only 31% of the time and semi-guided strategies 4% of the time.
2) Formally trained testers (George) split their use of guided and unguided strategies evenly (45% and 46% respectively), whilst making double the use of semi-guided strategies compared to untrained testers (9%).
3) Interestingly, whilst Carmen seems to always prefer unguided strategies, George tends to alternate between them, using unguided strategies to find opportunities for more guided testing (see Figures 6 and 7).

Fig. 6. Strategy types collectively used by untrained testers (Carmen) throughout the 40-minute test sessions
Fig. 7. Strategy types collectively used by formally trained testers (George) throughout the 40-minute test sessions

B. RQ2: Which bugs are found by testers of each group?
1) We split bugs into five categories (see Section III-D).
(Chart legend: with training vs. without training) [Micallef 2016]
Recommendations on Exploratory Testing
Freestyle <–> Pure scripted
• The freestyle end suits domain issues and tests with little repetition.
• The pure-scripted end suits system issues and much repetition –> candidates for automation.
• Use both.
• Train your testers.
Actionable Exploratory Testing
Workshop agenda
• Introduction (10 min): research context, team & participants
• The principles of exploratory testing (5 min)
• Alternative types of test charters (20 min)
• Exercise: Write test cases according to test charter templates (15 + 25 min)
• Reflect on improvements (10 min)
• Closing (5 min): Sum up; next steps
Further reading
• Itkonen J, Mäntylä M, Lassenius C (2007) Defect Detection Efficiency: Test Case Based vs. Exploratory Testing. ESEM'07, pp 61–70
• Itkonen J, Mäntylä MV, Lassenius C (2013) The Role of the Tester's Knowledge in Exploratory Software Testing. IEEE Transactions on Software Engineering 39(5):707–724
• Micallef M, Porter C, Borg A (2016) Do Exploratory Testers Need Formal Training? An Investigation Using HCI Techniques. TAIC-PART, ICST Workshops 2016, pp 305–314
• Afzal W, Ghazi AN, Itkonen J, Torkar R, Andrews A, Bhatti K (2015) An Experiment on the Effectiveness and Efficiency of Exploratory Testing. Empir Software Eng 20:844–878
Further contacts
Per Runeson per.runeson@cs.lth.se
Elizabeth Bjarnason elizabeth.bjarnason@cs.lth.se
Kai Peterson kai.peterson@bth.se