On Effective Testing of Health Care Simulation Software Christian Murphy, M.S. Raunak, Andrew King, Sanjian Chen, Christopher Imbriano, Gail Kaiser, Insup Lee, Oleg Sokolsky, Lori Clarke, Lee Osterweil University of Pennsylvania Loyola University Maryland Columbia University University of Massachusetts Amherst

Dec 21, 2015


On Effective Testing of Health Care Simulation Software

Christian Murphy, M.S. Raunak, Andrew King,

Sanjian Chen, Christopher Imbriano, Gail Kaiser,

Insup Lee, Oleg Sokolsky, Lori Clarke, Lee Osterweil

University of Pennsylvania

Loyola University Maryland

Columbia University

University of Massachusetts Amherst


Overview

Simulation software is used widely in the field of health care

Simulators must not only accurately model the real world, but be free of software defects as well

It is particularly hard to test simulation software because often there is no “test oracle”

Our research shows that it is possible to detect defects if properties of the software are violated


Outline

Motivating examples

Overview of testing approach

Study #1: Demonstrating feasibility

Study #2: Measuring effectiveness

Future work & conclusion


Flow of Patients through ED

[Figure: Length of Stay versus Utilization. Horizontal axis: number of beds. Vertical axes: units of time; percent utilization. Series: LOS, Doctor Utilization, Nurse Utilization, Triage Utilization, Clerk Utilization]

Raunak et al., “Simulating patient flow through an emergency department using process-driven discrete event simulation”, SEHC’09


Glycemic Control (Insulin Pump)

King et al., “Prototyping closed loop physiologic control with the Medical Device Coordination Framework”, SEHC’10


Problem Statement

Partial oracles may exist for a limited subset of the input domain in simulation software

Obvious errors (e.g., crashes) can be detected with certain inputs or testing techniques

However, it is difficult to detect subtle computational defects in simulators without test oracles in the general case


What do I mean by “defect”?

Deviation of the implementation from the specification

Violation of a sound property of the software

“Discrete localized” calculation errors: off-by-one; incorrect sentinel values for loops; wrong comparison or mathematical operator

Misinterpretation of specification: parts of input domain not handled; incorrect assumptions made about input


Research Goals

Identify an approach for testing simulation software that is effective even without a test oracle: reliably detect defects; increase confidence that the software works

Demonstrate feasibility of the approach

Measure the effectiveness of the approach




Observation

Many programs without oracles have properties such that certain changes to the input yield predictable changes to the output

We can detect defects in these programs by looking for any violations of these “metamorphic properties”

This is known as “metamorphic testing” (T.Y. Chen et al., HKUST Tech Report, 1998)


Metamorphic Testing

If new test case output f(t(x)) is as expected, it is not necessarily correct

However, if f(t(x)) is not as expected, either f(x) or f(t(x)) – or both! – is wrong

Initial test case: x → f → f(x)

New test case: t(x) → f → f(t(x))

t: transformation function based on metamorphic properties of f

f(x) and f(t(x)) are “pseudo-oracles”
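The relationship between f, t, f(x) and f(t(x)) can be sketched as a small checking routine. This is only an illustration: the names `check_metamorphic`, `total`, `total_skip_last` and `total_plus_one` are hypothetical and do not come from the talk. It shows both points made on this slide: a faulty f can still satisfy a property, and a violated property tells us that something is wrong without any oracle for the exact output.

```python
import math

def check_metamorphic(f, t, expected, x):
    """Return True if f(t(x)) equals the value predicted from f(x)."""
    original = f(x)          # output of the initial test case
    follow_up = f(t(x))      # output of the new test case
    return math.isclose(follow_up, expected(original), rel_tol=1e-9, abs_tol=1e-12)

def double_all(xs):
    """Transformation t: double every input value."""
    return [2 * v for v in xs]

def total(xs):
    """Correct program under test: sum of the inputs."""
    return sum(xs)

def total_skip_last(xs):
    """Faulty program: silently ignores the last element."""
    return sum(xs[:-1])

def total_plus_one(xs):
    """Faulty program: adds a spurious constant."""
    return sum(xs) + 1

x = [1.5, 2.0, 4.25]
predict = lambda y: 2 * y    # doubling the inputs should double the sum

correct_passes = check_metamorphic(total, double_all, predict, x)
skip_last_passes = check_metamorphic(total_skip_last, double_all, predict, x)
plus_one_passes = check_metamorphic(total_plus_one, double_all, predict, x)
```

Here `total_skip_last` passes the check even though it is wrong, while `total_plus_one` fails it, so a single property is evidence rather than proof of correctness.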


Metamorphic Testing Example

Consider a function to determine the standard deviation of a set of numbers

Initial input: a b c d e f → std_dev → s

New test case #1: c e b a f d → std_dev → s ?

New test case #2: a+2 b+2 c+2 d+2 e+2 f+2 → std_dev → s ?

New test case #3: 2a 2b 2c 2d 2e 2f → std_dev → 2s ?
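The three new test cases for standard deviation (permute, add a constant, multiply by a constant) can be written down directly. The sketch below uses `statistics.pstdev` from the Python standard library as the function under test; the sample values are arbitrary and chosen only for illustration.

```python
import math
import random
import statistics

def std_dev(values):
    """Function under test: population standard deviation."""
    return statistics.pstdev(values)

data = [3.0, 7.5, 1.25, 9.0, 4.5, 6.0]   # arbitrary initial input
s = std_dev(data)

# New test case #1: permuting the input should leave the result unchanged.
shuffled = data[:]
random.Random(0).shuffle(shuffled)

# New test case #2: adding a constant to every value should leave it unchanged.
shifted = [v + 2 for v in data]

# New test case #3: multiplying every value by 2 should double the result.
doubled = [2 * v for v in data]

results = {
    "permute": math.isclose(std_dev(shuffled), s, rel_tol=1e-9),
    "add constant": math.isclose(std_dev(shifted), s, rel_tol=1e-9),
    "multiply": math.isclose(std_dev(doubled), 2 * s, rel_tol=1e-9),
}
```

None of the three checks needs to know the true value of s; each only compares two outputs of the same function.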


Related Work

Verification of simulation models: O. Balci, 1997 Winter Simulation Conf.; R. Sargent, 2005 Winter Simulation Conf.

Applying metamorphic testing to applications without test oracles: T.Y. Chen et al., Info. and Soft. Tech., 2002




Feasibility Study

Goal: Demonstrate that metamorphic testing is feasible for testing simulation software

We first identify metamorphic properties in the applications of interest:

JSim: discrete event simulator (patients in ED)

GCS: glycemic control simulator (insulin pump)

We then apply metamorphic testing and look for defects


Metamorphic Properties

JSim: Flow of patients through ED

Increasing the number of resources (e.g., beds) should not increase average patient length of stay

Increasing the number of resources should not decrease other resources’ utilization rates

Multiplying the time necessary for each step by a positive constant c should multiply the overall time by c

GCS: glycemic control system (insulin pump)

A patient who weighs more should get more insulin

A patient who produces more endogenous glucose should get more insulin

The modeled insulin absorption rate should vary inversely with the insulin distribution volume
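Two of the JSim-style properties can be tried out on a toy model. The sketch below is not JSim: `simulate` is a hypothetical first-come-first-served queue with identical servers, and the arrival and service times are made up. It also assumes that scaling "each step" means scaling every duration in the model, including the gaps between arrivals.

```python
import heapq
import math

def simulate(arrivals, service_times, servers):
    """Toy FIFO queue. Returns (average length of stay, time of last departure)."""
    free_at = [0.0] * servers        # min-heap: when each server next becomes free
    heapq.heapify(free_at)
    total_los = 0.0
    last_departure = 0.0
    for arrive, service in zip(arrivals, service_times):
        start = max(arrive, heapq.heappop(free_at))   # wait for the earliest free server
        depart = start + service
        heapq.heappush(free_at, depart)
        total_los += depart - arrive
        last_departure = max(last_departure, depart)
    return total_los / len(arrivals), last_departure

arrivals = [0.0, 2.0, 4.0, 6.0, 8.0]   # made-up arrival times
service = [5.0, 3.0, 6.0, 2.0, 4.0]    # made-up service durations

los_1, end_1 = simulate(arrivals, service, servers=1)
los_2, end_2 = simulate(arrivals, service, servers=2)

# Property: adding a resource should not increase average length of stay.
more_resources_ok = los_2 <= los_1

# Property: multiplying every duration by c should multiply the overall time by c.
c = 3.0
los_c, end_c = simulate([c * t for t in arrivals], [c * t for t in service], servers=1)
scaling_ok = math.isclose(los_c, c * los_1) and math.isclose(end_c, c * end_1)
```

For a real simulator the same checks would be run against its actual inputs and outputs; only the comparison logic carries over.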


JSim Findings


Unexpected JSim Findings

Average LOS with 1 nurse

ID   Arrival Time   Departure Time   Length of Stay
1    2              159              157
2    8              185              177
3    14             197              183
4    20             295              275
5    26             321              295
Average                              217.4

Average LOS with 2 nurses

ID   Arrival Time   Departure Time   Length of Stay
1    2              159              157
2    8              185              177
3    14             194              180
4    20             312              292
5    26             321              295
Average                              220.2
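The figures in the two tables can be rechecked, and the violated property made explicit, with a few lines of arithmetic. The sketch assumes that the first table is the 1-nurse run and the second the 2-nurse run, following the order of the captions.

```python
# (arrival time, departure time) per patient, copied from the two tables.
one_nurse = [(2, 159), (8, 185), (14, 197), (20, 295), (26, 321)]
two_nurses = [(2, 159), (8, 185), (14, 194), (20, 312), (26, 321)]

def average_los(rows):
    """Average length of stay, where each stay is departure minus arrival."""
    stays = [depart - arrive for arrive, depart in rows]
    return sum(stays) / len(stays)

avg_one = average_los(one_nurse)     # 1087 / 5
avg_two = average_los(two_nurses)    # 1101 / 5

# Metamorphic property: adding a nurse should not increase average LOS.
property_holds = avg_two <= avg_one
```

Patient 3 leaves slightly earlier with two nurses, but patient 4 leaves later, so the average rises from 217.4 to 220.2 and the property is violated.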




Measuring Effectiveness

Goal: Estimate the effectiveness of metamorphic testing at detecting defects in simulators

We first systematically seed the software with defects

We then measure the number that are detected


Methodology

Mutation testing was used to seed defects into each application: reverse comparison operators; change math operators; introduce off-by-one errors

For each program, we created multiple versions, each with exactly one mutation

We ignored mutants that yielded outputs that were obviously wrong, caused crashes, etc.

Effectiveness is determined by measuring what percentage of the mutants were “killed”
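How a mutant gets "killed" by a metamorphic test can be illustrated on a tiny function. Everything below is hypothetical and far simpler than the simulators in the study: a mean function, two hand-written single-mutation versions, and two metamorphic properties (reversing the input keeps the mean; tripling the input triples it). Each version is checked only against its own outputs, since no oracle is assumed.

```python
import math

def mean_original(xs):
    total = 0.0
    for i in range(len(xs)):
        total += xs[i]
    return total / len(xs)

def mean_off_by_one(xs):
    total = 0.0
    for i in range(len(xs) - 1):     # mutation: loop bound off by one
        total += xs[i]
    return total / len(xs)

def mean_wrong_operator(xs):
    total = 0.0
    for i in range(len(xs)):
        total -= xs[i]               # mutation: '+' changed to '-'
    return total / len(xs)

def killed(program, xs):
    """A program is killed if it violates at least one metamorphic property."""
    base = program(xs)
    permute_ok = math.isclose(program(list(reversed(xs))), base)
    scale_ok = math.isclose(program([3 * v for v in xs]), 3 * base)
    return not (permute_ok and scale_ok)

xs = [2.0, 4.0, 9.0, 5.0]
mutants = [mean_off_by_one, mean_wrong_operator]
detected = [m.__name__ for m in mutants if killed(m, xs)]
effectiveness = len(detected) / len(mutants)   # fraction of mutants killed
```

With these two properties the off-by-one mutant is killed but the wrong-operator mutant survives, because negating the mean preserves both relations. Adding a third property (shifting every input by a constant should shift the mean by the same constant) would kill it, which shows how the measured effectiveness depends on the properties chosen.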


Results

Application          JSim    GCS Control    GCS Patient
Mutants generated    104     306            644
Usable mutants       25      237            487
Mutants detected     25      58             333
Effectiveness        100%    24.4%          68.4%


Analysis: JSim

“Statistical metamorphic testing” useful for killing mutants related to non-deterministic event timing

If timing range is [A, B] and observed mean is μ, then mean μ’ for range [10A, 10B] should be around 10μ

Because of mutant, range is actually [A, B-1]

Over many executions, observed mean μ’ has statistically significant difference from expected mean 10μ
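The statistical argument can be sketched with a simulated timing source. The code below rests on several assumptions that are not in the talk: durations are integers drawn uniformly from the range, the mutant shortens the upper bound by one in both the original and the scaled run, A = 5 and B = 15, and "statistically significant" is approximated by a Welch-style t statistic with a deliberately conservative cutoff of 5.

```python
import random
import statistics

def sample_durations(low, high, n, rng, mutant=False):
    """Draw n integer durations uniformly from [low, high].
    The hypothetical mutant uses [low, high - 1] instead."""
    upper = high - 1 if mutant else high
    return [rng.randint(low, upper) for _ in range(n)]

def welch_t(xs, ys):
    """Welch-style t statistic for the difference between two sample means."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    return (mean_x - mean_y) / ((var_x / len(xs) + var_y / len(ys)) ** 0.5)

def violates_scaling(mutant, seed=1, n=20000, a=5, b=15, factor=10, cutoff=5.0):
    """True if the mean over [factor*a, factor*b] differs significantly
    from factor times the mean observed over [a, b]."""
    rng = random.Random(seed)
    base = sample_durations(a, b, n, rng, mutant)
    scaled = sample_durations(factor * a, factor * b, n, rng, mutant)
    expected = [factor * v for v in base]   # what the scaled run should resemble
    return abs(welch_t(scaled, expected)) > cutoff

correct_flagged = violates_scaling(mutant=False)
mutant_flagged = violates_scaling(mutant=True)
```

Under these assumptions the mutant's base mean is 9.5, so the expected scaled mean is 95, while the scaled run actually averages about 99.5. A gap of 4.5 time units is invisible in any single execution but stands out clearly over many samples.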


Analysis: GCS

Metamorphic testing not as effective in control algorithm (rules for delivering insulin)

Rules are usually of the form “if patient blood sugar is x then adjust infusion rate by y”

Single mutants did not have much effect on overall insulin delivered

These may be detected by more “straightforward” software testing approaches




Future Work

Formalizing the process of identifying metamorphic properties for simulators

Consider the use of metamorphic testing for validation

If a property is violated, does that mean there is a defect, or is the property simply unsound?

If the property is unsound, is this simulator appropriate for the task it is meant to model?


Conclusion

We have demonstrated that metamorphic testing is an effective technique for testing simulation software

It can increase confidence in the implementation

It also helps increase understanding of how the software behaves


On Effective Testing of Health Care Simulation Software

Christian Murphy, University of Pennsylvania


M.S. Raunak, Loyola University Maryland
