Mutatis Mutandis: Evaluating DBMS Test Adequacy with Mutation Testing
Ivan T. Bowman, HANA Product Engineering
June 24, 2013
Public


© 2013 SAP AG or an SAP affiliate company. All rights reserved.

Agenda

Test adequacy and why we want to evaluate it

Mutation testing and how to apply it to database systems

Conclusions and future work

Test adequacy


Challenges of testing database servers

Self-management features

Adapting to current conditions leads to increased internal state that needs to be tested

Relational equivalence

DML statements can be transformed into different equivalent forms

The correct answer may be returned even if a desirable optimization was not considered

Concurrent execution

Database systems need to work well with concurrent connections and intra-query parallelism

Concurrency leads to more primitives that need testing and more interactions to test

Performance requirements are not clearly specified

Tests of performance need to control many factors, limiting their sensitivity


Goals of measuring test adequacy

We want to make sure that software is “thoroughly tested”

We want to compare advanced test generation techniques:

Random query generation (RAGS)

Genetic algorithms

We want to minimize test suites by removing expensive tests that contribute little

We want to prioritize test suites

For example, require all developers to run a minimum-acceptance test before submission


Measuring test adequacy

Coverage techniques are commonly used

Measure how many statements / basic blocks / code combinations are executed under tests

The metric is simple to compute and understand

Coverage metrics are necessary but not sufficient for computing test adequacy

A direct approach to evaluating adequacy considers how many faults the tests find

We could try to use “natural” faults that result from developer errors found before or after shipping

We could manually “seed” faults in the code to see how many are detected

For example, disable the effect of specific types of mutexes

We could automatically introduce faults using patterns: mutation testing


Mutation testing

Mutation testing evaluates the effectiveness of a test suite for detecting incorrect programs

Evaluation focuses on incorrect programs that are “close” to the correct version

Mutation operators are defined to alter source code

For example, change logical and (&&) to or (||)

Each operator creates a “mutant” program

A test suite “kills” the mutant if it passes with the original and fails with the mutant

Mutation adequacy score is the ratio of killed mutants to total mutants

A particular problem arises with mutants that don’t affect program semantics (equivalence)
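A minimal sketch of the kill rule and the mutation adequacy score, using the &&-to-|| operator from the example above. The function `in_range`, its mutant, and the `RangeTest` type are hypothetical illustrations, not code from SQL Anywhere.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function under test: is x inside the closed range [lo, hi]?
bool in_range(int x, int lo, int hi) { return x >= lo && x <= hi; }

// Mutant produced by the "&& -> ||" operator.
bool in_range_mutant(int x, int lo, int hi) { return x >= lo || x <= hi; }

// A test here is one input plus its expected result (hypothetical shape).
struct RangeTest { int x, lo, hi; bool expected; };

// A test kills the mutant if it passes on the original and fails on the mutant.
bool kills(const RangeTest& t) {
    bool passes_original = in_range(t.x, t.lo, t.hi) == t.expected;
    bool passes_mutant = in_range_mutant(t.x, t.lo, t.hi) == t.expected;
    return passes_original && !passes_mutant;
}

// Mutation adequacy score: killed mutants divided by total mutants.
double mutation_score(std::size_t killed, std::size_t total) {
    return total == 0 ? 0.0
                      : static_cast<double>(killed) / static_cast<double>(total);
}
```

An input inside the range (x = 5, range [0, 10]) gives the same answer on both versions, so it does not kill the mutant; an input above the range (x = 20) does.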


Does statement coverage predict ability to kill mutants?

[Figure: scatter plot of Percent of Mutants Killed (y-axis, 0 to 20%) against Percent of Statements Executed (x-axis, 0 to 35%), with a linear fit of R² = 0.1696]
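The slide summarizes how well statement coverage predicts mutant kills with a single R² value. As a reminder of what that number measures, here is a sketch of R² for an ordinary least-squares line, which for a one-variable fit equals the squared Pearson correlation. The data in the checks are made up, not the measurements behind the plot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coefficient of determination (R^2) for a least-squares line y = a + b*x.
// For simple linear regression this equals the squared correlation of x and y.
// Assumes x and y have the same non-zero length and non-zero variance.
double r_squared(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return (sxy * sxy) / (sxx * syy);
}
```

An R² near 0.17 means the fitted line explains only about 17% of the variation in kill rate, which is why coverage alone is a weak predictor.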


General approach of mutation testing

1. Input program P; generate mutants P’

2. Input test set T

3. For each generated mutant P’, run T: if P’ fails T, P’ is killed; otherwise P’ lives
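A sketch of this general (naive) loop, assuming a hypothetical harness where each test is a callable that runs against a program variant identified by a mutant id (-1 standing for the original P) and returns true on pass. Its cost grows with the product of mutants and tests, because every mutant needs its own run of the suite.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical harness: run a test against variant `mutant` (-1 = original P),
// returning true if the test passes.
using Test = std::function<bool(int mutant)>;

// Naive mutation analysis: run every test against every mutant P'.
// A mutant is killed if some test passes on P but fails on P'.
std::vector<bool> killed_mutants(int num_mutants, const std::vector<Test>& tests) {
    std::vector<bool> killed(num_mutants, false);
    for (int m = 0; m < num_mutants; ++m) {
        for (const Test& t : tests) {
            if (t(-1) && !t(m)) {  // passes on P, fails on P'
                killed[m] = true;
                break;             // one killing test is enough
            }
        }
    }
    return killed;
}
```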


Mutation operators we evaluated

Function

Applies to all methods and C-style functions. Skip the contents of the function.

Condition

Applies to the condition in for, while, and if statements. Evaluate a mutated condition.

Switch

Applies to all switch statements. Add one to the expression.

Case

Applies to all case statements within a switch. Skip the contents of the case.

Default

Applies to default statements within a switch statement. Skip the default statements.
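The five operators can be illustrated on one toy function. This sketch assumes a hypothetical `mutation_on(id)` switch that enables a single mutant id at a time, in the style of the guard shown under Practical issues; `classify`, `bump`, and the ids 1 to 5 are made up. The slide does not say how a condition is mutated, so the Condition mutant reuses the `>` to `>=` change from that later example, and the Case mutant keeps the `break`, which is one reading of "skip the contents of the case".

```cpp
#include <cassert>

// Hypothetical switch: id of the single mutant enabled at run time (0 = none).
static int enabled_mutant = 0;
bool mutation_on(int id) { return enabled_mutant == id; }

// Function operator (mutant 1): skip the contents of the function.
static int counter = 0;
void bump() {
    if (mutation_on(1)) return;
    ++counter;
}

int classify(int len, int kind) {
    int r = 0;
    // Condition operator (mutant 2): evaluate a mutated condition (> becomes >=).
    if (mutation_on(2) ? (len >= 0) : (len > 0)) r += 10;
    // Switch operator (mutant 3): add one to the switch expression.
    switch (mutation_on(3) ? kind + 1 : kind) {
    case 1:
        if (!mutation_on(4)) { r += 1; }  // Case operator (mutant 4): skip the case body
        break;
    case 2:
        r += 2;
        break;
    default:
        if (!mutation_on(5)) { r += 3; }  // Default operator (mutant 5): skip default statements
        break;
    }
    return r;
}
```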


Mutation types and their outcomes (400K lines of query processing code)

Mutation type   Count    Covered  Killed  Lethal  Pass
Function        13,563   75.6%    69.4%   41.1%   30.6%
Condition       35,680   74.8%    65.6%   34.9%   34.4%
Switch           1,176   78.1%    69.0%   34.1%   31.0%
Case             7,528   55.7%    46.1%   27.4%   53.9%
Default            955   24.2%    18.8%    7.9%   81.2%
Total           58,902   71.8%    63.3%   34.9%   36.7%

Practical issues


Large systems generate many mutants

In a 400K-line subset of SQL Anywhere, about 59K mutants were identified

Compiling these separately would take too much time and space

Running the general mutation testing algorithm would take 432 days

Generate a single meta-mutant in which each mutant can be enabled at run time

Original (A): if( len > 0 )

Meta-mutant (B): if( mutation_on(123) ? (len >= 0) : (len > 0) )
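A minimal runnable version of the meta-mutant example. It assumes `mutation_on` consults a hypothetical global set of enabled mutant ids, so one binary can run with zero, one, or many mutants enabled; the function name `has_data` is made up.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical run-time switchboard: every mutant is compiled into one binary
// and activated by its numeric id.
static std::unordered_set<int> enabled_mutants;

bool mutation_on(int id) { return enabled_mutants.count(id) != 0; }

// Original (A):    if( len > 0 )
// Meta-mutant (B): mutant 123 replaces > with >= when enabled.
bool has_data(int len) {
    if (mutation_on(123) ? (len >= 0) : (len > 0)) {
        return true;
    }
    return false;
}
```

With no mutants enabled the binary behaves like the original program, so a single build serves both the reference run and every mutant run.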

Use the same code editing to track mutation coverage

Identify which tests cover mutated code lines

No need to evaluate whether a test kills a mutant it does not execute
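The same hook can record coverage. This sketch assumes each test is run once against the meta-mutant with nothing enabled while `mutation_on` logs every mutant id it is asked about; the test ids, `has_data`, `is_empty`, and mutant ids 123 and 124 are hypothetical. Afterwards, a test only needs to be evaluated against the mutants in its coverage set.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <unordered_set>

static std::unordered_set<int> enabled_mutants;
static std::set<int>* covered = nullptr;  // mutant ids reached by the current test

// Meta-mutant hook that also records that the mutated code was executed.
bool mutation_on(int id) {
    if (covered) covered->insert(id);
    return enabled_mutants.count(id) != 0;
}

bool has_data(int len) { return mutation_on(123) ? (len >= 0) : (len > 0); }
bool is_empty(int len) { return mutation_on(124) ? (len != 0) : (len == 0); }

// Run each test once with no mutants enabled; return test id -> covered mutants.
std::map<int, std::set<int>> measure_coverage(
        const std::map<int, std::function<bool()>>& tests) {
    std::map<int, std::set<int>> result;
    enabled_mutants.clear();
    for (const auto& [test_id, run] : tests) {
        std::set<int> reached;
        covered = &reached;
        run();
        covered = nullptr;
        result[test_id] = reached;
    }
    return result;
}
```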


Simplifying assumptions

Independence of mutations

If $M = M_1 \cup M_2$ is a set of mutants, then a test $t$ kills $M$ iff it kills $M_1$ or $M_2$ (or both)

Test failures do not corrupt state

A test terminates on any input with one of: Crash, Timeout, Fail, or Pass

Tests deterministically find faults


Proposed Improvements

1. Test independent mutations simultaneously

Individual tests do not kill many mutants: on average, a test kills 12% of the mutants it covers

Guess that a test will not kill any of the currently living mutants it covers, and run it with all of them enabled

If the test does fail with them enabled, use binary search; if the test fails with a single mutant enabled, it kills that mutant

This improvement reduces the time from 432 days to 17.3 days
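A sketch of improvement 1 as group testing with binary search. It assumes a hypothetical harness hook `fails(group)` that runs one test with exactly the given mutants enabled, and it relies on the independence and determinism assumptions listed under Simplifying assumptions: the test fails on a group iff it kills at least one mutant in it. When a test kills nothing, which is the common case, it costs a single run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical hook: run one test with exactly `group` enabled; true if it fails.
using FailsWith = std::function<bool(const std::vector<int>&)>;

// Divide and conquer over a group already known to make the test fail.
static void split(const std::vector<int>& group, const FailsWith& fails,
                  std::vector<int>& killed) {
    if (group.size() == 1) {  // fails with a single mutant enabled: it is killed
        killed.push_back(group[0]);
        return;
    }
    std::size_t mid = group.size() / 2;
    std::vector<int> left(group.begin(), group.begin() + mid);
    std::vector<int> right(group.begin() + mid, group.end());
    if (fails(left)) split(left, fails, killed);
    if (fails(right)) split(right, fails, killed);
}

// Guess that the test kills none of the living mutants it covers: enable them
// all at once, and search only if that single run fails.
std::vector<int> find_killed(const std::vector<int>& covered_living,
                             const FailsWith& fails) {
    std::vector<int> killed;
    if (!covered_living.empty() && fails(covered_living)) {
        split(covered_living, fails, killed);
    }
    return killed;
}
```

Under the same independence assumption, a further saving is possible: if the left half passes, the right half must fail, so its run can be skipped.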

2. Identify “lethal” mutations in a first pass

Execute each mutant with the cheapest test that kills it

3. Order tests so that cheaper tests run first

Mutants that are easily killed are removed quickly

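Total running time with improvement 1, with and without improvements 2 and 3:

                              3. Ordering: No   3. Ordering: Yes
2. Lethal mutation step: No        415h              54h
2. Lethal mutation step: Yes        49h              34h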
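A sketch of the bookkeeping behind improvements 2 and 3, assuming a hypothetical `TestInfo` record holding each test's measured duration and covered mutants. The slide says each mutant runs with "the cheapest test that kills it"; since that test is not known before running, this sketch assumes the first pass picks the cheapest test that covers the mutant. Running each mutant alone with that test, and deleting any mutant it kills, is left to the harness.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct TestInfo {
    std::string name;
    double seconds;        // measured duration on the unmutated meta-mutant
    std::set<int> covers;  // mutant ids the test executes
};

// Improvement 3: run cheaper tests first so easily killed mutants go early.
void order_by_cost(std::vector<TestInfo>& tests) {
    std::stable_sort(tests.begin(), tests.end(),
                     [](const TestInfo& a, const TestInfo& b) {
                         return a.seconds < b.seconds;
                     });
}

// Improvement 2 (first pass): for each mutant, the cheapest test covering it.
// Returns mutant id -> index into `tests`, which must already be ordered by cost.
std::map<int, std::size_t> cheapest_covering_test(const std::vector<TestInfo>& tests) {
    std::map<int, std::size_t> pick;
    for (std::size_t i = 0; i < tests.size(); ++i) {
        for (int m : tests[i].covers) {
            pick.emplace(m, i);  // emplace keeps the first (cheapest) test seen
        }
    }
    return pick;
}
```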




Adjusted approach of mutation testing

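Inputs: program P and test set T

1. Prepare the meta-mutant P’ from P

2. Measure each test's running time and mutant coverage

3. Find lethal mutants using the cheapest test, and delete them from the living mutants

4. For each test, ordered by duration: guess that the test passes on all living mutants it covers; if it fails, recursively divide and conquer, then delete the killed mutants from the living mutants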


Coverage and mutation adequacy as tests execute

[Figure: Percent of Maximum (y-axis, 0 to 100%) for Statements Covered and Killed Mutants, plotted against Number of Tests Executed (x-axis, 0 to 600)]

Conclusions and Future Work


Conclusions

Mutation testing is practical for large systems such as DBMSs

Mutation testing gives a “harder” adequacy metric than code coverage

Our approach required simplifying assumptions that are not generally true

Small-scale testing shows these hold sufficiently for some purposes

Future work is needed to address simplifying assumptions

Tests do not deterministically find faults; in particular, some mutations are non-deterministic

Mutants may interact; can we characterize them as independent by analysis?

Future work includes comparing test generation frameworks on mutation adequacy


Acknowledgements

Feedback from the SQL Anywhere Query Processing group was invaluable

In particular, detailed suggestions from Daniel J. Farrar helped shape this paper

Intern J. Devin Papineau provided significant motivation and early discussion for this work

The anonymous reviewers provided helpful and insightful suggestions


Thank you

Contact information:

Ivan T. Bowman
Research Manager of SQL Anywhere Query Processing
Ivan.Bowman (at) sap.com
