Page 1: 'A critique of testing' UK TMF forum January 2015

A Critique of Testing (a.k.a. a Vision of the Future of the SDLC)

Llyr Wyn Jones

28th January, 2015

Page 2: 'A critique of testing' UK TMF forum January 2015

My Background

• Senior Developer and Consultant at Grid-Tools Ltd.

• One of the architects of AgileDesigner, a requirements and test case design tool.

• Only been in the industry for 4 years.

• Background: abstract mathematics and physics.

• Repeat contributor to Professional Tester Magazine (among other publications), as well as writing white papers for Grid-Tools Ltd.

Page 3: 'A critique of testing' UK TMF forum January 2015

Preamble

• What this ISN’T – a critique of how testers accomplish their jobs.

• What this IS – a critique of how the SDLC handles testing and ensures testers have the right information to do their job.

• What this ISN’T – a treatment of how test logistics is done.

• What this IS – a treatment of the theory of testing in terms of delivering quality software that meets the requirements.

• The former is a relatively solved problem. The latter isn’t.

Page 4: 'A critique of testing' UK TMF forum January 2015

Initial Challenges

• Test case design methods – there are lots!!

• No objective criteria to judge them.

• All focus seems to be on testing logistics.

• Very few are purely interested in the theory.

• Without proper test case design, it is extremely difficult to perform accurate risk estimation.

Page 5: 'A critique of testing' UK TMF forum January 2015

Broader Challenges

• Communication issues between different roles.

• Different levels/variations of language required for different roles.

• Testing relegated to last – the “backs against the wall” problem.

• Development complexity is not accurately reflected in testing.

Page 6: 'A critique of testing' UK TMF forum January 2015

Clarity and Vision during development

[Diagram: User → Business Analyst → Programmer → Tester]

The User Knows what they want

The Analyst specifies what that is

The Programmer writes the code

The Tester tests the program

The further the visions diverge, the greater the problems.

Page 7: 'A critique of testing' UK TMF forum January 2015

Clarity and Vision during development

[Diagram: User → Business Analyst → Programmer → Tester]

There are fewer bugs and the product is delivered faster.

The closer the visions are, the more likely it is that the user gets a quality product.

Page 8: 'A critique of testing' UK TMF forum January 2015

[Chart: lower and upper bounds by SDLC phase]

Phase                   Lower Bound   Upper Bound
Requirements            1             1
Design                  3             6
Coding                  10            10
Development Testing     15            40
Acceptance Testing      30            70

Page 9: 'A critique of testing' UK TMF forum January 2015

[Chart: lower and upper bounds by SDLC phase, extended to Operation]

Phase                   Lower Bound   Upper Bound
Requirements            1             1
Design                  3             6
Coding                  10            10
Development Testing     15            40
Acceptance Testing      30            70
Operation               40            1000

Page 10: 'A critique of testing' UK TMF forum January 2015

A Path to a Solution

• Analysing the common ingredients of test case design models.

• Putting all these ingredients into a common mathematical model.

• Establishing a theory where requirements, deliverables and testing plans can all co-exist and can be linked together.

• Analysing test case design models according to this theory.

Page 11: 'A critique of testing' UK TMF forum January 2015

Common Ingredients

• Test cases are made up of, primarily:
• A set of input parameters
• Expected results

• Some methods allow for application knowledge to be encoded and influence the tests generated.
• i.e. relationships between input parameters and expected results.

• Test cases are designed to reveal defects.

• A given system has a theoretical maximum number of defects.

Page 12: 'A critique of testing' UK TMF forum January 2015

The Mathematical Theory

Page 13: 'A critique of testing' UK TMF forum January 2015

The mathematical theory – inspirations

• Quantum theory – information; uncertainty; observability.

• Turing Machines – information as instruction sets; transformations; the testability of software.

• Meta-mathematics – completeness and consistency of formal models; the testability of software.

Page 14: 'A critique of testing' UK TMF forum January 2015

The mathematical theory – information

• Key ingredient: information

• Systems can be modelled as information along with functions that act upon this information.

• Through the SDLC, the “system” is represented in many forms:
• Requirements

• Design

• Implementation

• Transformations ensure that information gets shaped to fit the appropriate representation.

Page 15: 'A critique of testing' UK TMF forum January 2015

The mathematical theory – interpretation

• Associated with information is uncertainty: interpretation is key.

• If there are multiple interpretations of information, this introduces ambiguities.

• If the developer develops something by interpreting an ambiguous requirement, how can we guarantee that the delivered software is actually correct?

• Given n people, there are always n + 1 sides to a story: n people giving their version/interpretation of what happened and what actually happened.

Page 16: 'A critique of testing' UK TMF forum January 2015

Information and transformations - implementation

• Example:

“A car is comprised of a chassis, engine, four wheels, a steering wheel...”

• A requirement can be thought of as a set of instructions on how to build things.

• Software itself is a set of instructions.

• Thus, from requirements to implementation, the same information used to construct a system also describes how the system works.

Page 17: 'A critique of testing' UK TMF forum January 2015

Information and transformations - implementation

“To calculate the average of a set of numbers, add all the numbers together then divide it by the number of items.”

float average(float[] items)
{
    // "Add all the numbers together..."
    float total = 0;
    foreach (float x in items)
    {
        total = total + x;
    }
    // "...then divide by the number of items."
    return total / items.Length;
}

Both statements describe the same thing, in different languages.

Page 18: 'A critique of testing' UK TMF forum January 2015

Information and Scaling

• Requirements evolve from a high-level vision to low-level specifications.

• At every step, it is important that information is preserved.

• Simple test: if you can go from the high-level to the low-level then back again, and the substance does not change, then the requirements are good.

Page 19: 'A critique of testing' UK TMF forum January 2015

Information and Scaling – Example

• Returning to the previous example:

“A car is comprised of a chassis, engine, four wheels, a steering wheel...”

• This is high-level: more detailed requirements will be given for all constituent parts, along with requirements about how they fit together.

• Software systems are similar:
• Main high-level requirement
• Auxiliary systems, with their own requirements
• How it all fits together

• At every level, it is still information – just in different forms.

• Consistency between levels is key – given the low-level requirements for each component and how they tie together, it should be possible to derive the high-level requirements.

Page 20: 'A critique of testing' UK TMF forum January 2015

Information and transformations - testing

• Testing is the act of performing an action and then comparing actual results with expected results.

• Expected results come from requirements.

• The same information used to implement a system can also be used to test the system. For example, a minimal sketch in Python (illustrative names, not from the original deck) follows: the averaging requirement from the earlier slide supplies both the implementation and the expected result of its test, so the test is derived from the requirement rather than from the code.
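def average(items):
    """'Add all the numbers together, then divide by the number of items.'"""
    total = 0.0
    for x in items:
        total += x
    return total / len(items)

def test_average():
    # Expected result derived directly from the requirement, not from the code:
    # (2 + 4 + 6) / 3 = 4
    assert average([2, 4, 6]) == 4.0

test_average()
print("average([2, 4, 6]) =", average([2, 4, 6]))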

Page 21: 'A critique of testing' UK TMF forum January 2015

Ambiguities and Defects – Uncertainties

• Ambiguities are modelled as uncertainty in information.

• Defects are when the system does not work as intended.

• Defects come from two sources:
• Ambiguities in the requirements.

• Human error.

• Defects also have a criticality factor, which must be taken into account when computing risk.

Page 22: 'A critique of testing' UK TMF forum January 2015

Ambiguities and Defects – Uncertainties

• Split “uncertainty” into two components:
• Epistemic – when there is not enough information to determine correct behaviour.

• Systemic – introduced independently of the information (e.g. human error).

• Uncertainty in requirements (ambiguities) is preserved and manifests as uncertainty in the implementation (defects).

• As transformations preserve information, so they preserve uncertainty.

Page 23: 'A critique of testing' UK TMF forum January 2015

Uncertainty – Corollaries

• Poor requirements lead to poor implementations (regardless of testing).

• Good requirements lead to good implementations, as long as systemic factors such as human error are minimized.

• Good requirements also lead to good testing – less uncertainty means the user, BA, developer and tester will be on the same page, leading to a convergence in vision.

• As such, uncertainty can never be lost.

Page 24: 'A critique of testing' UK TMF forum January 2015

Uncertainty – Corollaries

• A piece of sculpture can be polished to make it appear nicer, but no amount of polishing can improve on a poor design.

• Testing, in this instance, is the polishing.

• The developer is the sculptor, and the requirements are the sculptor’s blueprint.

• If the blueprint is poor, or the developer makes a fundamental design flaw, then no amount of testing will remove that without the developer re-coding.

• However, good testing can polish a rough-but-sound implementation into good shape.

Page 25: 'A critique of testing' UK TMF forum January 2015

Information Classification (a.k.a. The Rumsfeld Matrix)

            Knowns                                        Unknowns
Known       Full information                              Partial information / Epistemic Uncertainty
Unknown     Partial information / Epistemic Uncertainty   Non-measurable sets / Systemic Uncertainty

Page 26: 'A critique of testing' UK TMF forum January 2015

Coverage and Risk

Page 27: 'A critique of testing' UK TMF forum January 2015

What is risk?

• The whole purpose of testing is to minimize the known risk (uncertainty) associated with an implementation.

• More testing = less risk (in theory)

• When to stop testing? The business decides the risk appetite; testing stops when the known risk falls beneath the appetite threshold.
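As a rough illustration of this stopping rule, the Python sketch below assumes a simple model in which known residual risk shrinks in proportion to coverage; the model and the numbers are assumptions added for illustration, not part of the original argument.

def residual_risk(initial_risk, coverage):
    # Assumed toy model: residual (known) risk falls linearly with coverage.
    return initial_risk * (1.0 - coverage)

def should_stop_testing(initial_risk, coverage, risk_appetite):
    # Stop once the known residual risk is within the business's risk appetite.
    return residual_risk(initial_risk, coverage) <= risk_appetite

print(should_stop_testing(initial_risk=100.0, coverage=0.60, risk_appetite=10.0))  # False
print(should_stop_testing(initial_risk=100.0, coverage=0.95, risk_appetite=10.0))  # True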

Page 28: 'A critique of testing' UK TMF forum January 2015

What is Coverage?

• Coverage is an abstract concept.

• Coverage is a measure given to a set of test cases (usually from 0% to 100%) that indicates the proportion of conditions that are satisfied by that set.

• Coverage has to be qualified to indicate what we are measuring it against – a coverage reading will always be “with respect” to something.

• Coverage, in most cases, relates to how much functionality is covered by a test set.

Page 29: 'A critique of testing' UK TMF forum January 2015

What is Coverage?

For example:

• The Combinatorial Measure (of a set of tests) indicates the proportion of combinations of data elements that are covered by the set.

• The Functional Measure (of a set of tests) indicates the proportion of functional variations that are covered by the set.
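As an illustration of the first of these, the Python sketch below computes a pairwise combinatorial measure for a toy parameter model; the parameters, domains and test cases are invented for the example.

from itertools import combinations, product

domains = {
    "browser": ["Chrome", "Firefox"],
    "os": ["Windows", "Linux"],
    "user": ["admin", "guest"],
}

tests = [
    {"browser": "Chrome", "os": "Windows", "user": "admin"},
    {"browser": "Firefox", "os": "Linux", "user": "guest"},
]

def pairwise_coverage(domains, tests):
    # Every (parameter, value) pair-of-pairs that could be combined...
    all_pairs = set()
    for (p1, vals1), (p2, vals2) in combinations(sorted(domains.items()), 2):
        for v1, v2 in product(vals1, vals2):
            all_pairs.add(((p1, v1), (p2, v2)))
    # ...versus the pairs actually exercised by the test set.
    covered = set()
    for t in tests:
        for p1, p2 in combinations(sorted(t), 2):
            covered.add(((p1, t[p1]), (p2, t[p2])))
    return len(covered & all_pairs) / len(all_pairs)

print(f"pairwise coverage: {pairwise_coverage(domains, tests):.0%}")  # 50% for this toy set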

Page 30: 'A critique of testing' UK TMF forum January 2015

Run-time vs. Design-time coverage?

• In common testing terms, coverage usually denotes the run-time coverage of the set of tests that have been run.

• Design-time coverage, on the other hand, allows testers to plan which tests to run according to how much coverage is possible.

• Remember: there are no infinite resources.

• Hence, it is important to be able to determine how to optimize the set of tests to be run.

Page 31: 'A critique of testing' UK TMF forum January 2015

Run-time vs. Design-time coverage?

• Design-time coverage gives a known value by which risk is mitigated.

• Run-time coverage indicates how much of this is actually being realized.

• Thus, both measures are important to determine risk: they cover the theory and practice.

• Without proper theory, it is impossible to associate run-time coverage with risk.
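A toy sketch of this relationship, using assumed figures: design-time coverage bounds what the planned test set could mitigate, and run-time coverage says how much of that plan has actually been realised so far. Treating realised coverage as their product is a deliberate simplification for illustration only.

planned_variations_covered = 45   # variations the designed test set exercises (assumed)
total_variations = 60             # variations identified from the requirements (assumed)
tests_planned = 90
tests_executed_and_passed = 60

design_time_coverage = planned_variations_covered / total_variations   # 75%
run_time_realisation = tests_executed_and_passed / tests_planned       # ~67%
realised_coverage = design_time_coverage * run_time_realisation        # ~50%

print(f"design-time: {design_time_coverage:.0%}, "
      f"run-time realisation: {run_time_realisation:.0%}, "
      f"realised so far: {realised_coverage:.0%}")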

Page 32: 'A critique of testing' UK TMF forum January 2015

Risk vs. Coverage

• Most coverage measures do not associate an “impact” or “priority” to particular scenarios.

• Those that do are called weighted coverage measures.

• Both priority and risk are functions of probability and impact: a rare but mission-critical scenario is often judged to be more important than a common but trivial scenario.

• As coverage increases, risk decreases.

• However, 100% coverage does not mean 0% risk.
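The sketch below illustrates one possible weighted coverage measure of the kind just described, with each scenario weighted by probability × impact; the scenarios and numbers are invented for illustration.

scenarios = {
    # name: (probability, impact, covered_by_test_set)
    "routine payment":      (0.90, 1.0, True),
    "leap-year settlement": (0.01, 9.0, True),   # rare but mission-critical
    "cosmetic label typo":  (0.50, 0.1, False),
}

total_weight = sum(p * i for p, i, _ in scenarios.values())
covered_weight = sum(p * i for p, i, covered in scenarios.values() if covered)

weighted_coverage = covered_weight / total_weight
residual_known_risk = total_weight - covered_weight  # even 100% coverage leaves systemic risk uncounted

print(f"weighted coverage: {weighted_coverage:.0%}, residual known risk: {residual_known_risk:.3f}")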

Page 33: 'A critique of testing' UK TMF forum January 2015

Risk vs. Coverage

• Coverage answers the question: “How well are my test plans designed?”

• Risk answers the question: “Is the software good enough to deploy?”

• The second question is the “exam question” for testing. The first gives you information to answer the second, not the answer.

• Another way of thinking of coverage: the coverage of a test set is the degree to which risk is reduced by running that test set.

Page 34: 'A critique of testing' UK TMF forum January 2015

Risk, Information and Testing

• Recall that there is both epistemic and systemic uncertainty.

• The former can be measured, the latter only estimated.

• Risk takes into account both.

• Hence, the more information that is given to testers, the smaller the epistemic uncertainty becomes – and hence the lower the risk.

Page 35: 'A critique of testing' UK TMF forum January 2015

Risk, Information and Testing

• A good test case design method will allow information to influence the choice of tests.

• Again, this is simply regarded as an information transformation; with it, comes uncertainty.

• Coverage informs risk. Risk represents the uncertainty associated with testing.

• Thus, the best that can be achieved (reliably) is only as good as the information coming in.

Page 36: 'A critique of testing' UK TMF forum January 2015

A Word on Exploratory Testing

• Exploratory testing can be thought of as “the testers cleaning up after the BAs”.

• It is unreliable, in the sense that it is not possible to predict, with any degree of certainty, the quality of testing before it even begins.

• It can mitigate risk to a degree, but planning should never rely on it producing any results.

Page 37: 'A critique of testing' UK TMF forum January 2015

Observable Defects in Logic

Page 38: 'A critique of testing' UK TMF forum January 2015

Observable Defects in Logic

• Start with a simple sentence:

if A or B then C

• Causes – A, B

• Effect – C

• Complex logic can be built up from simple statements.

Page 39: 'A critique of testing' UK TMF forum January 2015

Observable Defects in Logic

• Each cause can have three states:
• Implemented correctly (OK)

• Stuck at 0 (0)

• Stuck at 1 (1)

• C can have 8 possible defects:

• Note: rows 2, 3 and 4 of the table below are sufficient to discover all 8 defects.

(Each defect column shows the value C takes when A and B are stuck as indicated; a defect is observable in a row where this differs from the correct C.)

A   B   C   | A:0    A:1    A:OK   A:OK   A:1    A:1    A:0    A:0
            | B:OK   B:OK   B:0    B:1    B:1    B:0    B:1    B:0
T   T   T   | T      T      T      T      T      T      T      F
T   F   T   | F      T      T      T      T      T      T      F
F   T   T   | T      T      F      T      T      T      T      F
F   F   F   | F      T      F      T      T      T      T      F
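The small Python script below (added for illustration, not from the original deck) re-derives the table: it enumerates the eight stuck-at defect combinations for C = A or B and records which input rows expose each one (the faulty output differs from the correct C), confirming that rows 2, 3 and 4 between them reveal all eight.

from itertools import product

STATES = ["OK", 0, 1]   # implemented correctly, stuck at 0, stuck at 1

def apply(state, value):
    # A stuck-at fault overrides the actual input value.
    return value if state == "OK" else bool(state)

rows = list(product([True, False], repeat=2))   # the four (A, B) input rows
defects = [(a, b) for a, b in product(STATES, repeat=2) if (a, b) != ("OK", "OK")]

detected_by = {}
for sa, sb in defects:
    detected_by[(sa, sb)] = {
        (a, b) for a, b in rows
        if (apply(sa, a) or apply(sb, b)) != (a or b)   # faulty C differs from correct C
    }

chosen = {(True, False), (False, True), (False, False)}   # rows 2, 3 and 4
print(all(detected_by[d] & chosen for d in defects))      # True: every defect is caught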

Page 40: 'A critique of testing' UK TMF forum January 2015

Observable Defects in Logic – Scaling

• General table for n-input logic gates:

No. of Inputs   No. of Required Functional Variations   No. of Possible Combinations   No. of Possible Defects
2               3                                       4                              8
3               4                                       8                              26
4               5                                       16                             80
5               6                                       32                             242
6               7                                       64                             728
n               n + 1                                   2^n                            3^n − 1

For n inputs, the number of possible defects is $\sum_{x=1}^{n} \binom{n}{x}\,2^{x} = 3^{n} - 1$.
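The closed form can be checked directly; a small Python snippet (added for illustration) reproduces the table's columns for n = 2 to 6.

from math import comb

for n in range(2, 7):
    # Required functional variations, possible combinations, possible defects.
    possible_defects = sum(comb(n, x) * 2**x for x in range(1, n + 1))
    print(n, n + 1, 2**n, possible_defects, possible_defects == 3**n - 1)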

Page 41: 'A critique of testing' UK TMF forum January 2015

Observable Defects in Logic – Masking of Errors

Assume A is stuck at FALSE, B is stuck at TRUE:

[Table: test cases over nodes A, B, C, D]

Page 42: 'A critique of testing' UK TMF forum January 2015

Observable Defects in Logic – Masking of Errors

Assume B fixed, A is still stuck at FALSE:

[Table: test cases over nodes A, B, C, D]

If the bug found by test #4 is fixed, then test #1 fails.

IMPORTANT: ALL tests must be re-run until they ALL pass – this is what is meant by ‘strongly covered’.

Page 43: 'A critique of testing' UK TMF forum January 2015

Observable Defects in Logic – Masking of Errors

• Assume C and F are not observable events. G is observable.

• Assume A is stuck at TRUE.
• Enter as a test case A(F), B(F), D(F), E(F).
• Results should be C(F), F(F) and G(F).

Page 44: 'A critique of testing' UK TMF forum January 2015

Observable Defects in Logic – Masking of Errors

• A, stuck at TRUE, causes C to be TRUE.

• The error is not detected since G is still FALSE due to F(F).

• Therefore, no test of C can be combined with tests of F which would result in G(T).

• These are known as untestable variations.

• Solution: force normally unobservable nodes to be observable (i.e. insert probe points).
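A hedged Python illustration of the masking problem and the probe-point fix. The gate structure assumed here (C = A or B, F = D or E, G = C and F) is chosen to match the slide's description, not taken from the original; the point is that the fault at A never reaches the observable node G unless C itself is made observable.

def circuit(a, b, d, e, a_stuck_true=False):
    if a_stuck_true:
        a = True
    c = a or b    # normally unobservable (assumed gate)
    f = d or e    # normally unobservable (assumed gate)
    g = c and f   # observable (assumed gate)
    return {"C": c, "F": f, "G": g}

good = circuit(False, False, False, False)
faulty = circuit(False, False, False, False, a_stuck_true=True)

print("G differs?", good["G"] != faulty["G"])   # False: the defect is masked at G
print("C differs?", good["C"] != faulty["C"])   # True: a probe point at C reveals it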


Page 45: 'A critique of testing' UK TMF forum January 2015

Tenets of Testing, and Good Test Case Design

Page 46: 'A critique of testing' UK TMF forum January 2015

The Six Tenets of Testing

1. Not everything can be tested.

2. It is always possible to know what can be tested.

3. Observable defects represent the known uncertainty of a test plan.

4. Coverage is the measure of the faithfulness of a test plan, and informs the risk.

5. Uncertainty can never be lost.

6. No single model can uncover all defects.

Page 47: 'A critique of testing' UK TMF forum January 2015

Important properties of a Test Case Design Model

• How much application information can be encoded into the test case design?

• How many defects are made observable?

• Coverage: how many defects are made observable as a proportion of the theoretical maximum number of observable defects?

• What is the relative number of test cases required to reach optimum coverage?

Page 48: 'A critique of testing' UK TMF forum January 2015

Criteria: Comparing Test Case Design Models

1. Capability of encoding – how much information can be encoded?

2. Ease of encoding – how easy is it to encode information?

3. Applicability – how many scenarios can be sensibly encoded using the method?

4. Relative number of test cases generated to maximize coverage.

5. Detectable defects – the relative number of defects that can be found.

6. Coverage – the relative functional coverage that can be attained.

Page 49: 'A critique of testing' UK TMF forum January 2015

Categorization of Methods

• Auxiliary
• Equivalence Class Testing / Partitioning

• Boundary Value Analysis

• Gut-feel / Exploratory

• Model-based
• Flowcharts

• Logical Modelling

• Cause and Effect Graphing

• Failure Mode and Effects Analysis (FMEA)

• Fault Tree Analysis (FTA)

• State Machines

Page 50: 'A critique of testing' UK TMF forum January 2015

Auxiliary Methods

• These can be used to augment and inform any method of testing.

• Equivalence Class Testing / Partitioning allows large search spaces to be split up into smaller sets according to some form of equivalence (usually functional).

• Boundary Value Analysis determines how the partitioning is done, and reduces the number of test cases by only testing key points in the search space.
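For example, the Python sketch below applies both auxiliary methods to an assumed requirement (“age must be between 18 and 65 inclusive”); the class representatives and boundary points are the usual choices for a closed numeric range, and the requirement itself is invented for illustration.

LOW, HIGH = 18, 65

equivalence_class_values = [10, 40, 90]                          # below range, in range, above range
boundary_values = [LOW - 1, LOW, LOW + 1, HIGH - 1, HIGH, HIGH + 1]

def accepts(age):
    # Assumed rule under test: accept ages in [18, 65].
    return LOW <= age <= HIGH

for age in equivalence_class_values + boundary_values:
    print(age, "->", "accept" if accepts(age) else "reject")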

Page 51: 'A critique of testing' UK TMF forum January 2015

Gut Feel / Exploratory

• Put into its own class – it is not a method.

• Commonly used where no formal method can be applied: usually when requirements are abysmal or not present.

• Auxiliary methods can be used to augment and improve.

• Critical weakness: no design-time coverage methods available, hence no risk-planning is possible.

Page 52: 'A critique of testing' UK TMF forum January 2015

Formal Models

• Generally thought of as “too difficult”.

• However, there is no getting around the complexity of a system: it can either be acknowledged or ignored (at peril).

• Formal models, being mathematical, allow for deterministic estimation of risk through proper coverage techniques.

Page 53: 'A critique of testing' UK TMF forum January 2015

Summary – Observations

• The SDLC is too commonly thought of as disparate pieces instead of a unified whole.

• The ability to use information in different forms is key to tying all the parts together.

• Software is a set of instructions: this is a very important distinction, and should be exploited.

• Key problem: poor requirements; development and testing pay the price and clean up the mess.

Page 54: 'A critique of testing' UK TMF forum January 2015

A parting shot

• Returning to the car analogy: we can do this for
• Cars

• Bridges

• Railways

• Etc.

• Why can’t we do this for software?