On Effective Testing of Health Care Simulation Software
Christian Murphy, M.S. Raunak, Andrew King,
Sanjian Chen, Christopher Imbriano, Gail Kaiser,
Insup Lee, Oleg Sokolsky, Lori Clarke, Lee Osterweil
University of Pennsylvania
Loyola University Maryland
Columbia University
University of Massachusetts Amherst
Overview
Simulation software is widely used in health care
Simulators must not only model the real world accurately, but must also be free of software defects
Testing simulation software is particularly hard because there is often no “test oracle”
Our research shows that it is possible to detect defects by checking whether properties of the software are violated
Raunak et al., “Simulating patient flow through an emergency department using process-driven discrete event simulation”, SEHC’09
Glycemic Control (Insulin Pump)
King et al., “Prototyping closed loop physiologic control with the Medical Device Coordination Framework”, SEHC’10
Problem Statement
Partial oracles may exist in simulation software, but only for a limited subset of the input domain
Obvious errors (e.g., crashes) can be detected with certain inputs or testing techniques
However, in the general case, it is difficult to detect subtle computational defects in simulators that lack test oracles
What do I mean by “defect”?
Deviation of the implementation from the specification
Violation of a sound property of the software
“Discrete localized” calculation errors: off-by-one errors, incorrect sentinel values for loops, wrong comparison or mathematical operators
Misinterpretation of the specification: parts of the input domain not handled, incorrect assumptions made about the input
Research Goals
Identify an approach for testing simulation software that is effective even without a test oracle: one that can reliably detect defects and increase confidence that the software works
Demonstrate feasibility of the approach
Measure the effectiveness of the approach
Outline
Motivating examples
Overview of testing approach
Study #1: Demonstrating feasibility
Study #2: Measuring effectiveness
Future work & conclusion
Observation
Many programs without oracles have properties such that certain changes to the input yield predictable changes to the output
We can detect defects in these programs by looking for violations of these “metamorphic properties”
This is known as “metamorphic testing” (T.Y. Chen et al., HKUST Tech Report, 1998)
Metamorphic Testing
If the new test case output f(t(x)) is as expected, it is not necessarily correct
However, if f(t(x)) is not as expected, either f(x) or f(t(x)) – or both! – is wrong
Initial test case: x → f → f(x)
New test case: t(x) → f → f(t(x)), where t is a transformation function based on the metamorphic properties of f
f(x) and f(t(x)) are “pseudo-oracles”
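The relationship in this diagram can be expressed as a small, generic check. The Python sketch below is illustrative only: `metamorphic_check`, `buggy_total`, and the two properties are hypothetical and not part of the authors' tooling. The function under test has a seeded off-by-one defect; one property happens to hold anyway, while the other exposes the defect.

```python
from typing import Callable, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def metamorphic_check(f: Callable[[X], Y], x: X,
                      transform: Callable[[X], X],
                      relation: Callable[[Y, Y], bool]) -> bool:
    """Run the initial test case x and the new test case t(x), then check
    whether f(x) and f(t(x)) satisfy the expected relation.

    True does not prove either output correct; False means f(x),
    f(t(x)), or both must be wrong."""
    source_output = f(x)                  # initial test case
    follow_up_output = f(transform(x))    # new test case
    return relation(source_output, follow_up_output)


def buggy_total(values: list[int]) -> int:
    """Intended to sum all values; a seeded off-by-one defect skips the last one."""
    return sum(values[:-1])


data = [3, 1, 4, 1, 5]

# Property A: reversing the input should not change the total.
reversed_ok = metamorphic_check(buggy_total, data, lambda xs: xs[::-1],
                                lambda a, b: a == b)
# Property B: doubling every element should double the total.
doubled_ok = metamorphic_check(buggy_total, data, lambda xs: [2 * v for v in xs],
                               lambda a, b: b == 2 * a)

print(reversed_ok, doubled_ok)  # False True
```

The doubling check passing is exactly the case where f(t(x)) is "as expected" without being correct. Only the reversal check reveals a problem, and on its own it cannot say whether f(x) or f(t(x)) is the faulty output.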
Metamorphic Testing Example
Consider a function to determine the standard deviation of a set of numbers
Applying metamorphic testing to applications without test oracles (T.Y. Chen et al., Info. and Soft. Tech., 2002)
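Assuming the example function computes the population standard deviation of a set of numbers, three well-known properties can serve as metamorphic relations: permuting the inputs leaves the result unchanged, multiplying every input by a constant c multiplies the result by |c|, and adding a constant k to every input leaves the result unchanged. The Python sketch below checks them; the function names, the use of reversal as the permutation, and the tolerances are illustrative assumptions.

```python
import math


def std_dev(values: list[float]) -> float:
    """Population standard deviation: the function under test."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def check_std_dev_properties(values: list[float], c: float = 2.0,
                             k: float = 10.0) -> dict[str, bool]:
    """Check three metamorphic properties; each entry is True if it holds."""
    s = std_dev(values)

    def close(a: float, b: float) -> bool:
        # Floating-point results are compared with a small tolerance.
        return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)

    return {
        # Permuting (here: reversing) the inputs should not change the result.
        "permute": close(std_dev(values[::-1]), s),
        # Multiplying every input by c should multiply the result by |c|.
        "scale": close(std_dev([c * v for v in values]), abs(c) * s),
        # Adding a constant k to every input should not change the result.
        "shift": close(std_dev([v + k for v in values]), s),
    }


print(check_std_dev_properties([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
# {'permute': True, 'scale': True, 'shift': True}
```

None of these checks needs to know the correct standard deviation in advance; each follow-up output is judged only against the initial one.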
Feasibility Study
Goal: Demonstrate that metamorphic testing is feasible for testing simulation software
We first identify metamorphic properties in the applications of interest:
JSim: discrete event simulator (patients in the ED)
GCS: glycemic control simulator (insulin pump)
We then apply metamorphic testing and look for defects
Metamorphic Properties
JSim: flow of patients through the ED
Increasing the number of resources (e.g., beds) should not increase the average patient length of stay
Increasing the number of resources should not decrease other resources’ utilization rates
Multiplying the time necessary for each step by a positive constant c should multiply the overall time by c
GCS: glycemic control simulator (insulin pump)
A patient who weighs more should get more insulin
A patient who produces more endogenous glucose should get more insulin
The modeled insulin absorption rate should vary inversely with the insulin distribution volume
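Two of the JSim properties (resources versus length of stay, and time scaling) can be illustrated on a deliberately tiny queueing model. This is a hypothetical stand-in, not JSim: `simulate_ed` assumes first-come-first-served treatment by whichever nurse becomes free earliest, and the arrival and treatment times are invented. The sketch compares the average length of stay with one and two nurses, then scales every time by c = 3.

```python
import math
from statistics import mean


def simulate_ed(arrivals: list[int], service: list[int], nurses: int) -> list[int]:
    """Toy first-come-first-served ED model (not JSim): each patient, in
    arrival order, is seen by the nurse who becomes free earliest.
    Returns each patient's departure time."""
    free_at = [0] * nurses  # time at which each nurse is next available
    departures = []
    for arrive, duration in zip(arrivals, service):
        nurse = free_at.index(min(free_at))   # earliest-available nurse
        start = max(arrive, free_at[nurse])
        free_at[nurse] = start + duration
        departures.append(free_at[nurse])
    return departures


def average_los(arrivals: list[int], service: list[int], nurses: int) -> float:
    """Average length of stay: mean of (departure time - arrival time)."""
    departures = simulate_ed(arrivals, service, nurses)
    return mean(d - a for d, a in zip(departures, arrivals))


# Hypothetical input: arrival times (sorted) and treatment durations.
ARRIVALS = [0, 5, 6, 12, 13]
SERVICE = [10, 8, 9, 4, 7]

# First JSim property: adding a resource should not increase average stay.
los_one = average_los(ARRIVALS, SERVICE, 1)
los_two = average_los(ARRIVALS, SERVICE, 2)
more_nurses_ok = los_two <= los_one

# Third JSim property: scaling every time by c should scale the stay by c.
c = 3
scaled = average_los([c * a for a in ARRIVALS], [c * s for s in SERVICE], 2)
scaling_ok = math.isclose(scaled, c * los_two)

print(los_one, los_two, more_nurses_ok, scaling_ok)  # 17.6 9.4 True True
```

In this toy model both properties hold for any input sorted by arrival time, so a violation would point to a defect in the sketch itself.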
JSim Findings
Unexpected JSim Findings
Average LOS with 1 nurse: 217.4
ID | Arrival Time | Departure Time | Length of Stay
1 | 2 | 159 | 157
2 | 8 | 185 | 177
3 | 14 | 197 | 183
4 | 20 | 295 | 275
5 | 26 | 321 | 295
Average LOS with 2 nurses: 220.2
ID | Arrival Time | Departure Time | Length of Stay
1 | 2 | 159 | 157
2 | 8 | 185 | 177
3 | 14 | 194 | 180
4 | 20 | 312 | 292
5 | 26 | 321 | 295
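The averages in this table can be recomputed from the arrival and departure times alone, which also shows how the first JSim property flags the run: with two nurses, patient 3 leaves 3 time units earlier, but patient 4 leaves 17 units later (312 instead of 295), so the average rises. In the Python sketch below only the data come from the slide; the names are illustrative.

```python
# (patient ID, arrival time, departure time) rows from the two tables.
one_nurse = [(1, 2, 159), (2, 8, 185), (3, 14, 197), (4, 20, 295), (5, 26, 321)]
two_nurses = [(1, 2, 159), (2, 8, 185), (3, 14, 194), (4, 20, 312), (5, 26, 321)]


def average_los(rows: list[tuple[int, int, int]]) -> float:
    """Average length of stay: mean of (departure time - arrival time)."""
    stays = [departure - arrival for _, arrival, departure in rows]
    return sum(stays) / len(stays)


avg1 = average_los(one_nurse)   # 217.4
avg2 = average_los(two_nurses)  # 220.2

# JSim property: adding a resource should not increase average length of stay.
if avg2 > avg1:
    print(f"Property violated: average LOS rose from {avg1} to {avg2}")
```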
Measuring Effectiveness
Goal: Estimate the effectiveness of metamorphic testing at detecting defects in simulators
We first systematically seed the software with defects
We then measure the number that are detected
Methodology
Mutation testing was used to seed defects into each application:
Reverse comparison operators
Change math operators
Introduce off-by-one errors
For each program, we created multiple versions, each with exactly one mutation
We ignored mutants that yielded obviously wrong outputs, caused crashes, etc.
Effectiveness is measured as the percentage of usable mutants that were “killed” (i.e., detected)
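This methodology can be sketched with Python's standard `ast` module. It is an illustration, not the authors' mutation tool: a population standard deviation function serves as a hypothetical program under test, `Mutator` applies exactly one of the three mutation kinds listed above per version, mutants that crash are set aside (the "obviously wrong output" filter is not modeled), and a mutant counts as killed if it violates at least one metamorphic property.

```python
import ast
import math

# Hypothetical program under test, kept as source text so it can be mutated.
SOURCE = """
def std_dev(values):
    n = len(values)
    mean = sum(values) / n
    total = 0.0
    for i in range(0, n):
        total += (values[i] - mean) ** 2
    return math.sqrt(total / n)
"""

# Illustrative mutation operators modeled on the slide's three categories.
MATH_SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Div, ast.Div: ast.Mult}
COMPARE_FLIPS = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}


class Mutator(ast.NodeTransformer):
    """Numbers every mutation site in visiting order; mutates only `target`."""

    def __init__(self, target: int):
        self.target = target
        self.sites = 0

    def _hit(self) -> bool:
        hit = self.sites == self.target
        self.sites += 1
        return hit

    def _swap_math(self, node):
        self.generic_visit(node)
        if type(node.op) in MATH_SWAPS and self._hit():
            node.op = MATH_SWAPS[type(node.op)]()  # change math operator
        return node

    visit_BinOp = visit_AugAssign = _swap_math

    def visit_Compare(self, node):
        self.generic_visit(node)
        if len(node.ops) == 1 and type(node.ops[0]) in COMPARE_FLIPS and self._hit():
            node.ops = [COMPARE_FLIPS[type(node.ops[0])]()]  # reverse comparison
        return node

    def visit_Constant(self, node):
        if type(node.value) is int and self._hit():
            return ast.copy_location(ast.Constant(node.value + 1), node)  # off-by-one
        return node


def build_mutant(target: int):
    tree = Mutator(target).visit(ast.parse(SOURCE))
    namespace = {"math": math}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace["std_dev"]


def violates_properties(f, values, c=2.0, k=10.0) -> bool:
    """True if any metamorphic property of standard deviation is violated."""
    s = f(values)
    pairs = [
        (f(values[::-1]), s),                      # permutation: unchanged
        (f([c * v for v in values]), abs(c) * s),  # scaling: multiplied by |c|
        (f([v + k for v in values]), s),           # shift: unchanged
    ]
    return not all(math.isclose(a, b, rel_tol=1e-9) for a, b in pairs)


def mutation_score(values):
    counter = Mutator(-1)  # target -1: count sites without mutating
    counter.visit(ast.parse(SOURCE))
    generated, usable, killed = counter.sites, 0, 0
    for target in range(generated):
        try:
            detected = violates_properties(build_mutant(target), values)
        except Exception:
            continue  # crashing mutants are not usable
        usable += 1
        killed += detected
    return generated, usable, killed


generated, usable, killed = mutation_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(f"{generated} generated, {usable} usable, {killed} killed "
      f"({100 * killed / usable:.1f}% effectiveness)")
```

With this input the program prints `6 generated, 5 usable, 4 killed (80.0% effectiveness)`. The mutant that turns `total += ...` into `total -= ...` crashes in `math.sqrt` and is set aside. The survivor changes `total / n` to `total * n`, which inflates every result by the same factor n, so all three properties still hold.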
Results
Application | JSim | GCS Control | GCS Patient
Mutants generated | 104 | 306 | 644
Usable mutants | 25 | 237 | 487
Mutants detected | 25 | 58 | 333
Effectiveness | 100% | 24.4% | 68.4%
Analysis: JSim
“Statistical metamorphic testing” was useful for killing mutants related to non-deterministic event timing
If the timing range is [A, B] and the observed mean is μ, then the mean μ’ for the range [10A, 10B] should be approximately 10μ
Because of the mutant, the range is actually [A, B-1]
Over many executions, the observed mean μ’ differs from the expected mean 10μ by a statistically significant amount
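The size of the effect can be worked out exactly if, as an assumption, event times are drawn uniformly from the integers in the range. With the mutant, the base run draws from [A, B−1], whose mean is (A+B−1)/2, so the predicted 10μ is 5(A+B)−5; the scaled run draws from [10A, 10B−1], whose mean is 5(A+B)−0.5. The gap of 4.5 is small relative to the spread of a single run, which is why many executions and a statistical test are needed. The Python sketch below uses a simple two-sample z-statistic; `event_times`, `scaling_z_score`, the seeds, the sample size, and the threshold of 4 are hypothetical choices, not the authors' procedure.

```python
import math
import random
import statistics


def event_times(a: int, b: int, n: int, seed: int, off_by_one: bool = False) -> list[int]:
    """Hypothetical event-timing draws: n uniform integers in [a, b].
    With off_by_one=True the upper bound becomes b - 1, mimicking the mutant."""
    rng = random.Random(seed)
    upper = b - 1 if off_by_one else b
    return [rng.randint(a, upper) for _ in range(n)]


def scaling_z_score(a: int, b: int, n: int = 20_000, off_by_one: bool = False) -> float:
    """z-statistic for the property: mean over [10a, 10b] equals 10 x mean over [a, b]."""
    base = event_times(a, b, n, seed=1, off_by_one=off_by_one)
    scaled = event_times(10 * a, 10 * b, n, seed=2, off_by_one=off_by_one)
    diff = statistics.mean(scaled) - 10 * statistics.mean(base)
    # Standard error of (mean(scaled) - 10 * mean(base)) for independent samples.
    se = math.sqrt(100 * statistics.variance(base) / n + statistics.variance(scaled) / n)
    return diff / se


z_original = scaling_z_score(1, 10)
z_mutant = scaling_z_score(1, 10, off_by_one=True)
for label, z in [("original", z_original), ("mutant", z_mutant)]:
    print(f"{label}: z = {z:.1f} ->", "violation" if abs(z) > 4 else "consistent")
```

For A = 1 and B = 10 the expected gap of 4.5 is many standard errors wide at this sample size, while the original implementation's z-statistic should stay near zero.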
Analysis: GCS
Metamorphic testing was not as effective for the control algorithm (the rules for delivering insulin)
Rules are usually of the form “if the patient’s blood sugar is x, then adjust the infusion rate by y”
A single mutation did not have much effect on the overall amount of insulin delivered
Such defects may be detected by more “straightforward” software testing approaches
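This observation can be illustrated with a hypothetical rule table. The bands and adjustments below are invented for illustration and are neither clinical values nor the GCS rules. A mutant that changes `>=` to `>` only alters behavior for readings exactly on a band boundary, so over a typical trace the total adjustment is unchanged and an aggregate metamorphic check has nothing to detect, whereas a direct boundary-value test catches it at once.

```python
import operator

# Hypothetical (lower bound of blood-glucose band, infusion-rate adjustment).
RULES = [(180, 2.0), (140, 1.0), (70, 0.0), (0, -1.0)]


def rate_adjustment(glucose: float, meets=operator.ge) -> float:
    """Return the adjustment of the first band whose lower bound is met."""
    for lower, delta in RULES:
        if meets(glucose, lower):
            return delta
    raise ValueError("reading below every band")


def total_adjustment(readings: list[float], meets=operator.ge) -> float:
    """Aggregate effect of the rules over a trace of readings."""
    return sum(rate_adjustment(g, meets) for g in readings)


readings = [150.5, 175.2, 190.8, 165.0, 132.4, 98.7]  # hypothetical trace

# Seeded mutant: ">=" replaced by ">" (a changed comparison operator).
original_total = total_adjustment(readings)
mutant_total = total_adjustment(readings, meets=operator.gt)
print(original_total, mutant_total)  # 5.0 5.0: aggregate behavior unchanged

# A direct boundary-value unit test exposes the mutant immediately.
print(rate_adjustment(180), rate_adjustment(180, meets=operator.gt))  # 2.0 1.0
```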
Future Work
Formalizing the process of identifying metamorphic properties for simulators
Considering the use of metamorphic testing for validation:
If a property is violated, does that mean there is a defect, or is the property simply unsound?
If the property is unsound, is this simulator appropriate for the task it is meant to model?
Conclusion
We have demonstrated that metamorphic testing is an effective technique for testing simulation software
It can increase confidence in the implementation
It also helps increase understanding of how the software behaves