SnT Software Verification & Validation (SVV)
Automated Testing of Autonomous Driving Assistance Systems
Lionel Briand
VVIoT, Sweden, 2018
Collaborative Research @ SnT
2
• Research in context
• Addresses actual needs
• Well-defined problems
• Long-term collaborations
• Our lab is the industry
Software Verification and Validation @ SnT Centre
3
• Group established in 2012
• Focus: Automated, novel, cost-effective V&V solutions
• ERC Advanced Grant
• ~ 25 staff members
• Industry and public partnerships
Introduction
4
Autonomous Systems
• May be embodied in a device (e.g., robot) or reside entirely in the cyber world (e.g., financial decisions)
• Gaining, encoding, and appropriately using knowledge is a bottleneck for developing intelligent autonomous systems
• Machine learning, e.g., deep learning, is often an essential component
5
Motivations
• Dangerous tasks
• Tedious, repetitive tasks
• Significant improvements in safety
• Significant reduction in cost, energy, and resources
• Significant optimization of benefits
6
Autonomous CPS
• Read sensors, i.e., collect data about their environment
• Make predictions about their environment
• Make (optimal) decisions about how to behave to achieve some objective(s) based on predictions
• Send commands to actuators according to decisions
• Often mission- or safety-critical
7
A General and Fundamental Shift
• Increasingly, it is easier to learn behavior from data using machine learning than to specify and code it
• Deep learning, reinforcement learning …
• Assumption: data captures desirable behavior, in a comprehensive manner
• Example: Neural networks (deep learning)
• Millions of weights learned
• No explicit code, no specifications
• Verification, testing?
8
Many Domains
• CPS (e.g., robotics)
• Visual recognition
• Finance, insurance
• Speech recognition
• Speech synthesis
• Machine translation
• Games
• Learning to produce art
9
Testing Implications
• Test oracles? No explicit, expected test behavior
• Test completeness? No source code, no specification
10
CPS Development Process
11
Model-in-the-Loop (MiL) stage:
• Functional modeling: controllers, plant, decision logic
• Continuous and discrete Simulink models
• Model simulation and testing
• System engineering modeling (SysML)
• Architecture modeling: structure, behavior, traceability
• Analysis: model execution and testing, model-based testing, traceability and change impact analysis, ...

Software-in-the-Loop (SiL) stage:
• (Partial) code generation
• Deployed executables on target platform

Hardware-in-the-Loop (HiL) stage:
• Hardware (sensors, ...), analog simulators
• Testing (expensive)
MiL Components
12
Sensor
Controller
Actuator Decision
Plant
Opportunities and Challenges
• Early functional models (MiL) offer opportunities for early functional verification and testing
• But they are a challenge for constraint solvers and model checkers:
• Continuous mathematical models, e.g., differential equations
• Discrete software models for code generation, but with complex operations
• Library functions in binary code
13
Automotive Environment
• Highly varied environments, e.g., road topology, weather, buildings and pedestrians ...
• Huge number of possible scenarios, e.g., determined by trajectories of pedestrians and cars
• ADAS play an increasingly critical role
• A challenge for testing
14
Testing Advanced Driver Assistance Systems
15
Objective
• Testing ADAS
• Identify and characterize most critical/risky scenarios
• Test oracle: Safety properties
• Need scalable test strategy due to large input space
16
Automated Emergency Braking System (AEB)
17
“Brake-request” is issued when braking is needed to avoid collisions. The vision (camera) and sensor components feed objects’ positions/speeds to the decision-making component, which drives the brake controller.
Example Critical Situation
• “AEB properly detects a pedestrian in front of the car with a high degree of certainty and applies braking, but an accident still happens where the car hits the pedestrian with a relatively high speed”
18
Testing via Physics-based Simulation
19
Simulation
20
The SUT interacts with the simulator in a feedback loop:
• Simulator: the ego vehicle (physical plant) with its sensors, cameras and actuators, plus the environment: mobile objects (pedestrians, other vehicles) governed by dynamic models, and static aspects (road, traffic signs, weather)
• Inputs: the initial state of the physical plant and the mobile environment objects, and the static environment aspects
• Outputs: time-stamped vectors for the SUT outputs and for the states of the physical plant and the mobile environment objects
Our Goal
• Developing an automated testing technique for ADAS
• To help engineers efficiently and effectively explore the complex test input space of ADAS
• To identify critical (failure-revealing) test scenarios
• To characterize the input conditions that lead to the most critical situations
21
ADAS Testing Challenges
• Test input space is large, complex and multidimensional
• Explaining failures and fault localization are difficult
• Execution of physics-based simulation models is computationally expensive
22
Our Approach
• Effectively combine evolutionary computing algorithms and decision tree classification models
• Evolutionary computing is used to search the input space for safety violations
• We use decision trees to guide the search-based generation of tests faster towards the most critical regions, and to characterize failures
• In turn, we use search algorithms to refine classification models to better characterize critical regions of the ADAS input space
23
AEB Domain Model
The test scenario (simulationTime, timeStep) combines static inputs, dynamic inputs and outputs:
• Weather (static): visibility (VisibilityRange: 10-300 m in steps of 10), fog (Boolean), fogColor (DimGray, Gray, DarkGray, Silver, LightGray, None); subclasses Normal, Rain (rainType: Moderate, Heavy, VeryHeavy, Extreme) and Snow (snowType: Moderate, Heavy, VeryHeavy, Extreme). OCL constraint: self.fog = false implies self.visibility = “300” and self.fogColor = None
• Road (static): frictionCoeff (Real); subclasses Straight, Ramped (height, RampHeight: 4, 6, 8, 10, 12) and Curved (radius, CurvedRadius: 5-40 in steps of 5)
• Vehicle (dynamic): initial speed v0
• Pedestrian (dynamic): initial position (xp0, yp0), speed vp0 and orientation θp0
• AEB output: TTC (Real), certaintyOfDetection (Real), braking (Boolean), plus output functions F1 and F2
Search-Based Software Testing
• Express the test generation problem as a search problem
• Search for test input data with certain properties, i.e., constraints
• Non-linearity of software (if, loops, ...): complex, discontinuous, non-linear search spaces (Baresel)
• Many search algorithms (metaheuristics), from local to global search, e.g., Hill Climbing, Simulated Annealing and Genetic Algorithms
“The simplest form of an optimization algorithm, and the easiest to implement, is random search. In test data generation, inputs are generated at random until the goal of the test (for example, the coverage of a particular program statement or branch) is fulfilled. Random search is very poor at finding solutions when those solutions occupy a very small part of the overall search space. Such a situation is depicted in Figure 2, where the number of inputs covering a particular structural target are very few in number compared to the size of the input domain. Test data may be found faster and more reliably if the search is given some guidance. For meta-heuristic searches, this guidance can be provided in the form of a problem-specific fitness function, which scores different points in the search space with respect to their ‘goodness’ or their suitability for solving the problem at hand. An example fitness function is plotted in Figure 3, showing how, in general, inputs closer to the required test data that execute the structure of interest are rewarded with higher fitness values than those that are further away. A plot of a fitness function such as this is referred to as the fitness landscape. Such fitness information can be utilized by optimization algorithms, such as a simple algorithm called Hill Climbing. Hill Climbing starts at a random point in the search space. Points in the search space neighbouring the current point are evaluated for fitness. If a better candidate solution is found, Hill Climbing moves to that new point, and evaluates the neighbourhood of that candidate solution. This step is repeated, until the neighbourhood of the current point in the search space offers no better candidate solutions; a so-called ‘local optimum’. If the local optimum is not the global optimum (as in Figure 3a), the search may benefit from being ‘restarted’ and performing a climb from a new initial position in the landscape (Figure 3b).

An alternative to simple Hill Climbing is Simulated Annealing. Search by Simulated Annealing is similar to Hill Climbing, except movement around the search space is less restricted. Moves may be made to points of lower fitness in the search space, with the aim of escaping local optima. This is dictated by a probability value that is dependent on a parameter called the ‘temperature’, which decreases in value as the search progresses (Figure 4). The lower the temperature, the less likely the chances of moving to a poorer position in the search space, until ‘freezing point’ is reached, from which point the algorithm behaves identically to Hill Climbing. Simulated Annealing is named so because it was inspired by the physical process of annealing in materials.”

Figure captions from the excerpt:
• Figure 2. Random search may fail to fulfil low-probability test goals
• Figure 3. The provision of fitness information to guide the search with Hill Climbing; the final position may not represent the global optimum (a), and restarts may be required (b)
• Figure 4. Simulated Annealing may temporarily move to points of poorer fitness in the search space
• Figure 5. Genetic Algorithms are global searches, sampling many points in the fitness landscape at once

“Search-Based Software Testing: Past, Present and Future”, Phil McMinn
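The hill climbing with restarts described in the excerpt can be sketched in a few lines. The 1-D fitness landscape below is an invented example for illustration, not one of the ADAS fitness functions:

```python
import random

def hill_climb(fitness, low, high, step=0.1, restarts=10, seed=0):
    """Hill Climbing with random restarts over a 1-D input domain.
    `fitness`, `low` and `high` are placeholders for a problem-specific
    fitness function and its input domain bounds."""
    rng = random.Random(seed)
    best_x, best_f = None, float("-inf")
    for _ in range(restarts):
        x = rng.uniform(low, high)              # random starting point
        while True:
            # evaluate the neighbourhood of the current point
            neighbours = [max(low, x - step), min(high, x + step)]
            nxt = max(neighbours, key=fitness)
            if fitness(nxt) <= fitness(x):      # local optimum reached
                break
            x = nxt
        if fitness(x) > best_f:                 # keep the best climb so far
            best_x, best_f = x, fitness(x)
    return best_x, best_f

# A toy fitness landscape whose global optimum is at x = 2.
best_x, best_f = hill_climb(lambda x: -(x - 2.0) ** 2 + 3.0, -10.0, 10.0)
```

Restarting from new random positions is what lets the search escape the local optima of Figure 3a.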
Genetic Algorithm
25
Multiple Objectives: Pareto Front
26
• Individual A Pareto-dominates individual B if A is at least as good as B in every objective and better than B in at least one objective
• The Pareto front is the set of non-dominated solutions; a point x on the front dominates the region of the objective space (F1, F2) worse than it in both objectives
• A multi-objective optimization algorithm (e.g., NSGA-II) must:
• Guide the search towards the global Pareto-optimal front
• Maintain solution diversity in the Pareto-optimal front
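Pareto dominance and the resulting front are easy to state in code. The objective vectors below are made-up minimization objectives, purely for illustration:

```python
def dominates(a, b):
    """True iff objective vector `a` Pareto-dominates `b` (minimization):
    a is at least as good as b in every objective and strictly better
    in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of `points`."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy (F1, F2) objective vectors; (3, 4) and (5, 5) are dominated.
points = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(points)
```

NSGA-II builds on exactly this relation (non-dominated sorting), plus a crowding-distance measure to keep the front diverse.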
Decision Trees
27
Partition the input space into homogeneous regions
Example decision tree (1200 points, 21% critical overall):
• vp0 < 7.2 km/h: 636 points, 2% critical
• vp0 ≥ 7.2 km/h: 564 points, 41% critical
  • θp0 ≥ 218.6°: 152 points, 16% critical
  • θp0 < 218.6°: 412 points, 51% critical
    • RoadTopology (CR = 10-40 m): 182 points, 28% critical
    • RoadTopology (CR = 5, Straight, RH = 4-12 m): 230 points, 69% critical
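The partitioning idea can be illustrated with a one-level decision tree (a stump) that picks the purity-maximizing split. The scenario features and labels below are invented stand-ins, not the study's actual data:

```python
def gini(labels):
    """Gini impurity of a set of binary labels (0 = pure)."""
    if not labels:
        return 0.0
    p = labels.count("critical") / len(labels)
    return 2 * p * (1 - p)

def best_split(scenarios, labels):
    """One-level decision tree: find the (feature, threshold) pair that
    minimizes total weighted Gini impurity of the two regions."""
    best = None  # (impurity, feature index, threshold)
    for f in range(len(scenarios[0])):
        for t in sorted({s[f] for s in scenarios}):
            left = [l for s, l in zip(scenarios, labels) if s[f] < t]
            right = [l for s, l in zip(scenarios, labels) if s[f] >= t]
            score = gini(left) * len(left) + gini(right) * len(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]

# Toy data: (pedestrian speed km/h, orientation deg) -> criticality label.
scenarios = [(3, 200), (5, 250), (8, 210), (9, 230), (10, 190), (2, 220)]
labels = ["non-critical", "non-critical", "critical",
          "critical", "critical", "non-critical"]
feature, threshold = best_split(scenarios, labels)  # splits on speed
```

A real decision tree applies this split search recursively, yielding the nested regions shown on the slide.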
Search Algorithm (NSGAII-DT)
• We use a multi-objective search algorithm (NSGAII)
• Three objectives: minimum distance between the pedestrian and the field of view, the car speed at the time of collision, and the probability that the object detected in front of the car is a pedestrian
• Inputs are vectors of static and dynamic variables: precipitation, fogginess, road shape, visibility range, car speed, person speed, person position (x, y), person orientation
• Each search iteration calls simulations to compute fitness
• We use decision tree classification models to predict scenario criticality
28
NSGAII-DT:
1. Generate an initial representative set of input scenarios and run the simulator to label each scenario as critical or non-critical
2. Build a decision tree model: path conditions partition the scenarios into critical and non-critical regions
3. Run the NSGAII search algorithm on the elements inside each critical leaf (mutation and crossover, non-dominated sorting, selection of the best scenarios); the new scenarios are added to the initial population
4. Rebuild the decision tree (step 2) or stop the process
The most critical regions are regions in the input space that are likely to contain more critical scenarios
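The loop above can be sketched with stand-ins for the expensive pieces: a stub function replaces the physics-based simulator, and a simple interval over one input replaces the decision-tree critical leaf. Everything here is a toy stand-in, not the actual NSGAII-DT implementation:

```python
import random

def simulate(speed):
    """Stub simulator: returns a safety margin; <= 0 means critical."""
    return 10.0 - speed                       # faster cars leave smaller margins

def critical_region(population):
    """Stand-in for the decision tree: bound the critical scenarios."""
    crit = [s for s in population if simulate(s) <= 0]
    return (min(crit), max(crit)) if crit else None

def nsgaii_dt(generations=5, pop_size=20, seed=1):
    rng = random.Random(seed)
    # 1. initial representative set of input scenarios (speeds in [0, 20])
    population = [rng.uniform(0, 20) for _ in range(pop_size)]
    for _ in range(generations):
        region = critical_region(population)  # 2. build the "tree"
        if region is None:
            break
        lo, hi = region
        # 3. search (here: sampling plus Gaussian mutation) inside the leaf
        children = [min(20.0, max(0.0, rng.uniform(lo, hi) + rng.gauss(0, 0.5)))
                    for _ in range(pop_size)]
        population.extend(children)           # new scenarios join the population
    return critical_region(population)        # 4. final critical region

region = nsgaii_dt()
```

The real algorithm uses NSGAII inside each critical leaf and a genuine decision tree over all input dimensions; the structure of the loop is the same.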
Example decision tree (1200 points, 21% critical overall):
• vp0 < 7.2 km/h: 636 points, 2% critical
• vp0 ≥ 7.2 km/h: 564 points, 41% critical
  • θp0 ≥ 218.6°: 152 points, 16% critical
  • θp0 < 218.6°: 412 points, 51% critical
    • RoadTopology (CR = 10-40 m): 182 points, 28% critical
    • RoadTopology (CR = 5, Straight, RH = 4-12 m): 230 points, 69% critical
Initial Classification Model
We focus on generating more scenarios in the critical region, respecting the conditions that lead to that region
30
Refined decision tree (3367 points, 42% critical overall): successive splits on xp0, yp0, θp0 and RoadTopology (Straight / CR = 5-40 / RH = 4-12) yield smaller, more homogeneous leaves, with criticality ranging from 12% (1169 points) to 83% (338 points).
Refined Classification Model
We get a more refined decision tree with more critical regions and more homogeneous areas
31
Research Questions
• RQ1: Does the decision tree technique help guide the evolutionary search and make it more effective?
• RQ2: Does our approach help characterize and converge towards homogeneous critical regions?
• Failure explanation
• Usefulness (feedback from engineers)
32
RQ1: NSGAII-DT vs. NSGAII
33
NSGAII-DT outperforms NSGAII
(Plots of quality indicators vs. search time, 6-24 h: hypervolume (HV), generational distance (GD) and spread (SP), for NSGAII-DT and NSGAII.)
RQ1: NSGAII-DT vs. NSGAII
• NSGAII-DT generates 78% more distinct, critical test scenarios compared to NSGAII
34
RQ2: NSGAII-DT (evaluation of the generated decision trees)
35
(Plots over seven tree generations: (a) RegionSize, (b) GoodnessOfFit, (c) GoodnessOfFit-crt.)
The generated critical regions consistently become smaller, more homogeneous and more precise over successive tree generations of NSGAII-DT
(Example: a critical region characterized by vehicle speed > 36 km/h, pedestrian speed < 6 km/h, and ranges on the pedestrian's position and orientation θ, e.g., 15 m-40 m.)
Failure explanation
• A characterization of the input space showing under what input conditions the system is likely to fail
36
• Visualized by decision trees or dedicated diagrams
• Path conditions in trees
Usefulness
• The characterizations of the different critical regions can help with:
(1) Debugging the system model (or the simulator)
(2) Identifying possible hardware changes to increase ADAS safety
(3) Providing proper warnings to drivers
37
Automated Testing of Feature Interactions Using Many-Objective Search
38
System Integration
39
System Under Test (SUT): features 1 to n feed an integration component, which reads sensor and camera data and sends commands to the actuators.
Case Study: SafeDrive
• Our case study is an automotive system consisting of four advanced driver assistance features:
• Cruise Control (ACC)
• Traffic Sign Recognition (TSR)
• Pedestrian Protection (PP)
• Automated Emergency Braking (AEB)
40
Simulation
41
The SUT interacts with the simulator in a feedback loop:
• Simulator: the ego vehicle (physical plant) with its sensors, cameras and actuators, plus the environment: mobile objects (pedestrians, other vehicles) governed by dynamic models, and static aspects (road, traffic signs, weather)
• Inputs: the initial state of the physical plant and the mobile environment objects, and the static environment aspects
• Outputs: time-stamped vectors for the SUT outputs and for the states of the physical plant and the mobile environment objects
Actuator Command Vectors
42
Safety Requirements
43
Features
• The behavior of features is based on machine learning algorithms processing sensor and camera data
• Interactions between features may lead to violating safety requirements, even if each feature is correct
• E.g., ACC is controlling the car, ordering it to accelerate since the leading car is far away, while a pedestrian starts crossing the road and PP starts sending braking commands to avoid hitting the pedestrian
• It is complex to predict and analyze possible interactions at the requirements level in a complex environment
• Resolution strategies cannot always be determined statically and may depend on the environment
44
Objective
• Automated and scalable testing to help ensure that resolution strategies are safe
• Detect undesired feature interactions
• Assumptions: IntC is white-box (the integrator is testing); features were previously tested
• Extremely large input space, since environmental conditions and scenarios can vary a great deal
45
Input Variables
46
Search
• Input space is large
• Dedicated many-objective search algorithm, directed/guided by test objectives (fitness functions)
• Fitness (distance) functions reward test cases that are more likely to reveal integration failures leading to safety violations
• They combine three types of functions: (1) safety violations, (2) unsafe overriding by IntC, (3) coverage of the decision structure of the integration component
• Many test objectives to be satisfied by the test suite
47
Failure Distance
• Reveal safety requirements violations
• Fitness functions based on the trajectory vectors for the ego car, the leading car and the pedestrian, generated by the simulator
• PP fitness: Minimum distance between the car and the pedestrian during the simulation time.
• AEB fitness: Minimum distance between the car and the leading car during the simulation time.
48
Distance Functions
49
When any of the functions yields zero, a safety failure corresponding to that function is detected.
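As a sketch, the minimum-distance fitness over the simulator's trajectory vectors might look like this; the time-stamped positions are hypothetical examples, not simulator output:

```python
import math

def min_distance(traj_a, traj_b):
    """Minimum Euclidean distance between two time-aligned trajectories
    (lists of (x, y) positions produced by the simulator). A value of
    zero signals a safety failure (the objects collide)."""
    return min(math.dist(p, q) for p, q in zip(traj_a, traj_b))

# Hypothetical time-stamped positions for the ego car and a pedestrian:
# at the last time step they occupy the same point.
car = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
ped = [(3.0, 3.0), (3.0, 2.0), (3.0, 1.0), (3.0, 0.0)]
fitness = min_distance(car, ped)   # 0.0: the car hits the pedestrian
```

The search minimizes this distance; the smaller it gets, the closer the scenario is to a safety violation.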
Unsafe Overriding Distance
• Goal: Find failures that are more likely to be due to faults in the integration component
• Reward test cases generating integration outputs deviating from the individual feature outputs, in such a way as to possibly lead to safety violations.
• Example: A feature f issues a braking command while the integration component issues no braking command or a braking command with a lower force than that of f .
50
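One plausible shape for such a distance, shown here for braking commands only. This is a sketch of the idea (distance zero exactly when the unsafe override occurs), not the paper's actual formula:

```python
def unsafe_override_distance(feature_brake, intc_brake):
    """Search distance for "IntC unsafely overrides feature f":
    0 when the integration component issues a weaker braking command
    than the feature requested (the situation the search rewards),
    positive otherwise. Braking forces are assumed normalized to [0, 1]."""
    if intc_brake < feature_brake:
        return 0.0                       # unsafe override detected
    return intc_brake - feature_brake    # how far IntC is from overriding f

d = unsafe_override_distance(0.8, 0.3)   # IntC weakens PP's braking -> 0.0
```

Minimizing this distance steers the search toward test cases where IntC's output deviates from the feature's output in the unsafe direction.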
Branch Distance
• Branch coverage of IntC
• Fitness: Approach level and branch distance d (standard for code coverage)
• d(b,tc) = 0 when tc covers b
51
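The standard branch distance for a predicate such as `a > b` can be written as follows, where K is the usual constant added when the branch is not taken (the predicate and inputs are illustrative):

```python
K = 1.0  # constant added when the branch is not taken

def branch_distance(a, b):
    """Standard branch distance for the predicate `a > b`: 0 when the
    branch is taken, otherwise (b - a) + K, so the search is guided
    toward inputs that flip the branch."""
    return 0.0 if a > b else (b - a) + K

d_covered = branch_distance(5, 3)   # branch taken -> 0.0
d_missed = branch_distance(2, 7)    # (7 - 2) + K = 6.0
```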
Combining Distance Functions
• Goal: execute every branch of IntC such that, while executing that branch, IntC unsafely overrides every feature f and its outputs violate every safety requirement related to f
52
• The combined distance distinguishes three cases: (1) tc has not covered branch j; (2) the branch is covered, but IntC did not cause an unsafe override of f; (3) the branch is covered and f is unsafely overridden, but requirement i is not yet violated
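One way to layer the three distances so that each stage only matters once the previous one is achieved is the usual approach-level scheme. This is a sketch of that layering under assumed normalization, not the paper's exact formula:

```python
def normalize(d):
    """Map a distance in [0, inf) into [0, 1)."""
    return d / (d + 1.0)

def combined_distance(branch_d, override_d, failure_d):
    """Objective for "cover branch j, unsafely override feature f, and
    violate f's safety requirement i": each later distance only counts
    once all earlier ones have reached zero."""
    if branch_d > 0:                   # case (1): branch j not covered
        return 2.0 + normalize(branch_d)
    if override_d > 0:                 # case (2): covered, no unsafe override
        return 1.0 + normalize(override_d)
    return normalize(failure_d)        # case (3): push toward the violation

score = combined_distance(0.0, 0.0, 4.0)  # only the safety distance remains
```

Because each case lives in a disjoint band ([2, 3), [1, 2), [0, 1)), minimizing the combined value drives the search through the three stages in order.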
Search Algorithm
• The best test suite covers all search objectives, i.e., all IntC branches and all safety requirements
• Not a Pareto front optimization problem
• Objectives compete with each other
• Example: a single test case cannot have the ego car violating the speed limit after hitting the leading car
• Tailored, many-objective genetic algorithm
• Must be efficient (test case executions are very expensive)
53
Search Algorithm
54
• Randomly generate test cases and compute fitness
• Evolve tests: crossover, mutation
• Select the fittest tests
• Correct constraint violations
• Archive covering tests
Evaluation
55
(Plot: number of integration errors found vs. search time, 0-12 h; FITest finds up to 7 integration errors, outperforming the baseline.)
Discussion
56
Observations
• We will rarely have precise and complete requirements, and we face great diversity in the physical environment, including many possible scenarios
• It is possible, however, to define properties characterizing unacceptable situations (safety)
• The notion of test coverage is elusive: no specification or code/models for some key (decision) components based on ML
• Failure is not clear-cut: it is a matter of risk, trade-offs ...
• We have executable/simulable functional models (e.g., Simulink) at early stages
57
Conclusions
• We proposed solutions based on:
• Efficient and realistic (hardware, physics) simulation
• Metaheuristic search, e.g., evolutionary computing, guided by fitness functions derived from properties of interest (e.g., safety requirements)
• Machine learning, e.g., to speed up the search
• No guarantees, though
58
Generalizing
• Examples presented from (safety-critical) cyber-physical systems, e.g., safety requirements
• Can a similar strategy be applied in other domains to test for bias or any other undesirable properties (e.g., legal), when system behavior is driven by machine learning?
• Executable models of environment and users?
59
Summary
• Machine learning plays an increasingly prominent role in autonomous systems
• No (complete) requirements, specifications, or even code
• Some safety and mission-critical requirements
• Neural networks (deep learning) with millions of weights
• How do we gain confidence in such software in a scalable and cost-effective way?
60
Acknowledgements
• Raja Ben Abdessalem
• Shiva Nejati
• Annibale Panichella
• IEE, Luxembourg
61
References
• R. Ben Abdessalem et al., "Testing Advanced Driver Assistance Systems Using Multi-Objective Search and Neural Networks", IEEE ASE 2016
• R. Ben Abdessalem et al., "Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms", IEEE/ACM ICSE 2018
62
Automated Testing of Autonomous Systems
Lionel Briand
VVIoT, Sweden, 2018