SnT Software Verification & Validation (SVV)
Automated Testing of Autonomous Driving Assistance Systems
Lionel Briand
VVIoT, Sweden, 2018
Collaborative Research @ SnT
2
• Research in context
• Addresses actual needs
• Well-defined problems
• Long-term collaborations
• Our lab is the industry
Software Verification and Validation @ SnT Centre
3
• Group established in 2012
• Focus: Automated, novel, cost-effective V&V solutions
• ERC Advanced Grant
• ~ 25 staff members
• Industry and public partnerships
Introduction
4
Autonomous Systems
• May be embodied in a device (e.g., robot) or reside entirely in the cyber world (e.g., financial decisions)
• Gaining, encoding, and appropriately using knowledge is a bottleneck for developing intelligent autonomous systems
• Machine learning, e.g., deep learning, is often an essential component
5
Motivations
• Dangerous tasks
• Tedious, repetitive tasks
• Significant improvements in safety
• Significant reduction in cost, energy, and resources
• Significant optimization of benefits
6
Autonomous CPS
• Read sensors, i.e., collect data about their environment
• Make predictions about their environment
• Make (optimal) decisions about how to behave to achieve some objective(s) based on predictions
• Send commands to actuators according to decisions
• Often mission- or safety-critical
7
A General and Fundamental Shift
• Increasingly, it is easier to learn behavior from data using machine learning than to specify and code it
• Deep learning, reinforcement learning …
• Assumption: data captures desirable behavior, in a comprehensive manner
• Example: Neural networks (deep learning)
• Millions of weights learned
• No explicit code, no specifications
• Verification, testing?
8
Many Domains
• CPS (e.g., robotics)
• Visual recognition
• Finance, insurance
• Speech recognition
• Speech synthesis
• Machine translation
• Games
• Learning to produce art
9
Testing Implications
• Test oracles? No explicit, expected test behavior
• Test completeness? No source code, no specification
10
CPS Development Process
11
Model-in-the-Loop (MiL) stage:
• Functional modeling: controllers, plant, decision logic
• Continuous and discrete Simulink models
• Model simulation and testing
• System engineering modeling (SysML)
• Architecture modeling: structure, behavior, traceability
• Analysis: model execution and testing, model-based testing, traceability and change impact analysis, ...

Software-in-the-Loop (SiL) stage:
• (Partial) code generation
• Deployed executables on target platform

Hardware-in-the-Loop (HiL) stage:
• Hardware (sensors, ...), analog simulators
• Testing (expensive)
MiL Components
12
Sensor
Controller
Actuator Decision
Plant
Opportunities and Challenges
• Early functional models (MiL) offer opportunities for early functional verification and testing
• But they are a challenge for constraint solvers and model checkers:
• Continuous mathematical models, e.g., differential equations
• Discrete software models for code generation, but with complex operations
• Library functions in binary code
13
Automotive Environment
• Highly varied environments, e.g., road topology, weather, buildings and pedestrians ...
• Huge number of possible scenarios, e.g., determined by trajectories of pedestrians and cars
• ADAS play an increasingly critical role
• A challenge for testing
14
Testing Advanced Driver Assistance Systems
15
Objective
• Testing ADAS
• Identify and characterize most critical/risky scenarios
• Test oracle: Safety properties
• Need scalable test strategy due to large input space
16
Automated Emergency Braking System (AEB)
17
“Brake-request” is issued when braking is needed to avoid collisions. The vision (camera) and sensor components feed objects’ positions/speeds to the decision-making component, which drives the brake controller.
Example Critical Situation
• “AEB properly detects a pedestrian in front of the car with a high degree of certainty and applies braking, but an accident still happens where the car hits the pedestrian with a relatively high speed”
18
Testing via Physics-based Simulation
19
Simulation
20
The SUT interacts with the simulator in a feedback loop:
• Simulator: the ego vehicle (physical plant) with its sensors, cameras and actuators, plus the environment: mobile objects (pedestrians, other vehicles) governed by dynamic models, and static aspects (road, traffic signs, weather)
• Inputs: the initial state of the physical plant and the mobile environment objects, and the static environment aspects
• Outputs: time-stamped vectors for the SUT outputs and for the states of the physical plant and the mobile environment objects
Our Goal
• Developing an automated testing technique for ADAS
• To help engineers efficiently and effectively explore the complex test input space of ADAS
• To identify critical (failure-revealing) test scenarios
• To characterize the input conditions that lead to the most critical situations
21
ADAS Testing Challenges
• Test input space is large, complex and multidimensional
• Explaining failures and fault localization are difficult
• Execution of physics-based simulation models is computationally expensive
22
Our Approach
• Effectively combine evolutionary computing algorithms and decision tree classification models
• Evolutionary computing is used to search the input space for safety violations
• We use decision trees to guide the search-based generation of tests faster towards the most critical regions, and to characterize failures
• In turn, we use search algorithms to refine classification models to better characterize critical regions of the ADAS input space
23
AEB Domain Model
The test scenario (simulationTime, timeStep) combines static inputs, dynamic inputs and outputs:
• Weather (static): visibility (VisibilityRange: 10-300 m in steps of 10), fog (Boolean), fogColor (DimGray, Gray, DarkGray, Silver, LightGray, None); subclasses Normal, Rain (rainType: Moderate, Heavy, VeryHeavy, Extreme) and Snow (snowType: Moderate, Heavy, VeryHeavy, Extreme). OCL constraint: self.fog = false implies self.visibility = “300” and self.fogColor = None
• Road (static): frictionCoeff (Real); subclasses Straight, Ramped (height, RampHeight: 4, 6, 8, 10, 12) and Curved (radius, CurvedRadius: 5-40 in steps of 5)
• Vehicle (dynamic): initial speed v0
• Pedestrian (dynamic): initial position (xp0, yp0), speed vp0 and orientation θp0
• AEB output: TTC (Real), certaintyOfDetection (Real), braking (Boolean), plus output functions F1 and F2
Search-Based Software Testing
• Express the test generation problem as a search problem
• Search for test input data with certain properties, i.e., constraints
• Non-linearity of software (if, loops, ...): complex, discontinuous, non-linear search spaces (Baresel)
• Many search algorithms (metaheuristics), from local to global search, e.g., Hill Climbing, Simulated Annealing and Genetic Algorithms
“The simplest form of an optimization algorithm, and the easiest to implement, is random search. In test data generation, inputs are generated at random until the goal of the test (for example, the coverage of a particular program statement or branch) is fulfilled. Random search is very poor at finding solutions when those solutions occupy a very small part of the overall search space. Such a situation is depicted in Figure 2, where the number of inputs covering a particular structural target are very few in number compared to the size of the input domain. Test data may be found faster and more reliably if the search is given some guidance. For meta-heuristic searches, this guidance can be provided in the form of a problem-specific fitness function, which scores different points in the search space with respect to their ‘goodness’ or their suitability for solving the problem at hand. An example fitness function is plotted in Figure 3, showing how, in general, inputs closer to the required test data that execute the structure of interest are rewarded with higher fitness values than those that are further away. A plot of a fitness function such as this is referred to as the fitness landscape. Such fitness information can be utilized by optimization algorithms, such as a simple algorithm called Hill Climbing. Hill Climbing starts at a random point in the search space. Points in the search space neighbouring the current point are evaluated for fitness. If a better candidate solution is found, Hill Climbing moves to that new point, and evaluates the neighbourhood of that candidate solution. This step is repeated, until the neighbourhood of the current point in the search space offers no better candidate solutions; a so-called ‘local optimum’. If the local optimum is not the global optimum (as in Figure 3a), the search may benefit from being ‘restarted’ and performing a climb from a new initial position in the landscape (Figure 3b).

An alternative to simple Hill Climbing is Simulated Annealing. Search by Simulated Annealing is similar to Hill Climbing, except movement around the search space is less restricted. Moves may be made to points of lower fitness in the search space, with the aim of escaping local optima. This is dictated by a probability value that is dependent on a parameter called the ‘temperature’, which decreases in value as the search progresses (Figure 4). The lower the temperature, the less likely the chances of moving to a poorer position in the search space, until ‘freezing point’ is reached, from which point the algorithm behaves identically to Hill Climbing. Simulated Annealing is named so because it was inspired by the physical process of annealing in materials.”

Figure captions from the excerpt:
• Figure 2. Random search may fail to fulfil low-probability test goals
• Figure 3. The provision of fitness information to guide the search with Hill Climbing; the final position may not represent the global optimum (a), and restarts may be required (b)
• Figure 4. Simulated Annealing may temporarily move to points of poorer fitness in the search space
• Figure 5. Genetic Algorithms are global searches, sampling many points in the fitness landscape at once

“Search-Based Software Testing: Past, Present and Future”, Phil McMinn
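The hill climbing with restarts described in the excerpt can be sketched in a few lines. The 1-D fitness landscape below is an invented example for illustration, not one of the ADAS fitness functions:

```python
import random

def hill_climb(fitness, low, high, step=0.1, restarts=10, seed=0):
    """Hill Climbing with random restarts over a 1-D input domain.
    `fitness`, `low` and `high` are placeholders for a problem-specific
    fitness function and its input domain bounds."""
    rng = random.Random(seed)
    best_x, best_f = None, float("-inf")
    for _ in range(restarts):
        x = rng.uniform(low, high)              # random starting point
        while True:
            # evaluate the neighbourhood of the current point
            neighbours = [max(low, x - step), min(high, x + step)]
            nxt = max(neighbours, key=fitness)
            if fitness(nxt) <= fitness(x):      # local optimum reached
                break
            x = nxt
        if fitness(x) > best_f:                 # keep the best climb so far
            best_x, best_f = x, fitness(x)
    return best_x, best_f

# A toy fitness landscape whose global optimum is at x = 2.
best_x, best_f = hill_climb(lambda x: -(x - 2.0) ** 2 + 3.0, -10.0, 10.0)
```

Restarting from new random positions is what lets the search escape the local optima of Figure 3a.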
Genetic Algorithm
25
Multiple Objectives: Pareto Front
26
• Individual A Pareto-dominates individual B if A is at least as good as B in every objective and better than B in at least one objective
• The Pareto front is the set of non-dominated solutions; a point x on the front dominates the region of the objective space (F1, F2) worse than it in both objectives
• A multi-objective optimization algorithm (e.g., NSGA-II) must:
• Guide the search towards the global Pareto-optimal front
• Maintain solution diversity in the Pareto-optimal front
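Pareto dominance and the resulting front are easy to state in code. The objective vectors below are made-up minimization objectives, purely for illustration:

```python
def dominates(a, b):
    """True iff objective vector `a` Pareto-dominates `b` (minimization):
    a is at least as good as b in every objective and strictly better
    in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of `points`."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy (F1, F2) objective vectors; (3, 4) and (5, 5) are dominated.
points = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(points)
```

NSGA-II builds on exactly this relation (non-dominated sorting), plus a crowding-distance measure to keep the front diverse.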
Decision Trees
27
Partition the input space into homogeneous regions
Example decision tree (1200 points, 21% critical overall):
• vp0 < 7.2 km/h: 636 points, 2% critical
• vp0 ≥ 7.2 km/h: 564 points, 41% critical
  • θp0 ≥ 218.6°: 152 points, 16% critical
  • θp0 < 218.6°: 412 points, 51% critical
    • RoadTopology (CR = 10-40 m): 182 points, 28% critical
    • RoadTopology (CR = 5, Straight, RH = 4-12 m): 230 points, 69% critical
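The partitioning idea can be illustrated with a one-level decision tree (a stump) that picks the purity-maximizing split. The scenario features and labels below are invented stand-ins, not the study's actual data:

```python
def gini(labels):
    """Gini impurity of a set of binary labels (0 = pure)."""
    if not labels:
        return 0.0
    p = labels.count("critical") / len(labels)
    return 2 * p * (1 - p)

def best_split(scenarios, labels):
    """One-level decision tree: find the (feature, threshold) pair that
    minimizes total weighted Gini impurity of the two regions."""
    best = None  # (impurity, feature index, threshold)
    for f in range(len(scenarios[0])):
        for t in sorted({s[f] for s in scenarios}):
            left = [l for s, l in zip(scenarios, labels) if s[f] < t]
            right = [l for s, l in zip(scenarios, labels) if s[f] >= t]
            score = gini(left) * len(left) + gini(right) * len(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]

# Toy data: (pedestrian speed km/h, orientation deg) -> criticality label.
scenarios = [(3, 200), (5, 250), (8, 210), (9, 230), (10, 190), (2, 220)]
labels = ["non-critical", "non-critical", "critical",
          "critical", "critical", "non-critical"]
feature, threshold = best_split(scenarios, labels)  # splits on speed
```

A real decision tree applies this split search recursively, yielding the nested regions shown on the slide.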
Search Algorithm (NSGAII-DT)
• We use a multi-objective search algorithm (NSGAII)
• Three objectives: minimum distance between the pedestrian and the field of view, the car speed at the time of collision, and the probability that the object detected in front of the car is a pedestrian
• Inputs are vectors of static and dynamic variables: precipitation, fogginess, road shape, visibility range, car speed, person speed, person position (x, y), person orientation
• Each search iteration calls simulations to compute fitness
• We use decision tree classification models to predict scenario criticality
28
NSGAII-DT:
1. Generate an initial representative set of input scenarios and run the simulator to label each scenario as critical or non-critical
2. Build a decision tree model: path conditions partition the scenarios into critical and non-critical regions
3. Run the NSGAII search algorithm on the elements inside each critical leaf (mutation and crossover, non-dominated sorting, selection of the best scenarios); the new scenarios are added to the initial population
4. Rebuild the decision tree (step 2) or stop the process
The most critical regions are regions in the input space that are likely to contain more critical scenarios
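The loop above can be sketched with stand-ins for the expensive pieces: a stub function replaces the physics-based simulator, and a simple interval over one input replaces the decision-tree critical leaf. Everything here is a toy stand-in, not the actual NSGAII-DT implementation:

```python
import random

def simulate(speed):
    """Stub simulator: returns a safety margin; <= 0 means critical."""
    return 10.0 - speed                       # faster cars leave smaller margins

def critical_region(population):
    """Stand-in for the decision tree: bound the critical scenarios."""
    crit = [s for s in population if simulate(s) <= 0]
    return (min(crit), max(crit)) if crit else None

def nsgaii_dt(generations=5, pop_size=20, seed=1):
    rng = random.Random(seed)
    # 1. initial representative set of input scenarios (speeds in [0, 20])
    population = [rng.uniform(0, 20) for _ in range(pop_size)]
    for _ in range(generations):
        region = critical_region(population)  # 2. build the "tree"
        if region is None:
            break
        lo, hi = region
        # 3. search (here: sampling plus Gaussian mutation) inside the leaf
        children = [min(20.0, max(0.0, rng.uniform(lo, hi) + rng.gauss(0, 0.5)))
                    for _ in range(pop_size)]
        population.extend(children)           # new scenarios join the population
    return critical_region(population)        # 4. final critical region

region = nsgaii_dt()
```

The real algorithm uses NSGAII inside each critical leaf and a genuine decision tree over all input dimensions; the structure of the loop is the same.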
Example decision tree (1200 points, 21% critical overall):
• vp0 < 7.2 km/h: 636 points, 2% critical
• vp0 ≥ 7.2 km/h: 564 points, 41% critical
  • θp0 ≥ 218.6°: 152 points, 16% critical
  • θp0 < 218.6°: 412 points, 51% critical
    • RoadTopology (CR = 10-40 m): 182 points, 28% critical
    • RoadTopology (CR = 5, Straight, RH = 4-12 m): 230 points, 69% critical
Initial Classification Model
We focus on generating more scenarios in the critical region, respecting the conditions that lead to that region
30
Refined decision tree (3367 points, 42% critical overall): successive splits on xp0, yp0, θp0 and RoadTopology (Straight / CR = 5-40 / RH = 4-12) yield smaller, more homogeneous leaves, with criticality ranging from 12% (1169 points) to 83% (338 points).
Refined Classification Model
We get a more refined decision tree with more critical regions and more homogeneous areas
31
Research Questions
• RQ1: Does the decision tree technique help guide the evolutionary search and make it more effective?
• RQ2: Does our approach help characterize and converge towards homogeneous critical regions?
• Failure explanation
• Usefulness (feedback from engineers)
32
RQ1: NSGAII-DT vs. NSGAII
33
NSGAII-DT outperforms NSGAII
(Plots of quality indicators vs. search time, 6-24 h: hypervolume (HV), generational distance (GD) and spread (SP), for NSGAII-DT and NSGAII.)
RQ1: NSGAII-DT vs. NSGAII
• NSGAII-DT generates 78% more distinct, critical test scenarios compared to NSGAII
34
RQ2: NSGAII-DT (evaluation of the generated decision trees)
35
(Plots over seven tree generations: (a) RegionSize, (b) GoodnessOfFit, (c) GoodnessOfFit-crt.)
The generated critical regions consistently become smaller, more homogeneous and more precise over successive tree generations of NSGAII-DT
(Example: a critical region characterized by vehicle speed > 36 km/h, pedestrian speed < 6 km/h, and ranges on the pedestrian's position and orientation θ, e.g., 15 m-40 m.)
Failure explanation
• A characterization of the input space showing under what input conditions the system is likely to fail
36
• Visualized by decision trees or dedicated diagrams
• Path conditions in trees
Usefulness
• The characterizations of the different critical regions can help with:
(1) Debugging the system model (or the simulator)
(2) Identifying possible hardware changes to increase ADAS safety
(3) Providing proper warnings to drivers
37
Automated Testing of Feature Interactions Using Many-Objective Search
38
System Integration
39
System Under Test (SUT): features 1 to n feed an integration component, which reads sensor and camera data and sends commands to the actuators.
Case Study: SafeDrive
• Our case study is an automotive system consisting of four advanced driver assistance features:
• Cruise Control (ACC)
• Traffic Sign Recognition (TSR)
• Pedestrian Protection (PP)
• Automated Emergency Braking (AEB)
40
Simulation
41
The SUT interacts with the simulator in a feedback loop:
• Simulator: the ego vehicle (physical plant) with its sensors, cameras and actuators, plus the environment: mobile objects (pedestrians, other vehicles) governed by dynamic models, and static aspects (road, traffic signs, weather)
• Inputs: the initial state of the physical plant and the mobile environment objects, and the static environment aspects
• Outputs: time-stamped vectors for the SUT outputs and for the states of the physical plant and the mobile environment objects
Actuator Command Vectors
42
Safety Requirements
43
Features
• The behavior of features is based on machine learning algorithms processing sensor and camera data
• Interactions between features may lead to violating safety requirements, even if each feature is correct
• E.g., ACC is controlling the car, ordering it to accelerate since the leading car is far away, while a pedestrian starts crossing the road and PP starts sending braking commands to avoid hitting the pedestrian
• It is complex to predict and analyze possible interactions at the requirements level in a complex environment
• Resolution strategies cannot always be determined statically and may depend on the environment
44
Objective
• Automated and scalable testing to help ensure that resolution strategies are safe
• Detect undesired feature interactions
• Assumptions: IntC is white-box (the integrator is testing); features were previously tested
• Extremely large input space, since environmental conditions and scenarios can vary a great deal
45
Input Variables
46
Search
• Input space is large
• Dedicated many-objective search algorithm, directed/guided by test objectives (fitness functions)
• Fitness (distance) functions reward test cases that are more likely to reveal integration failures leading to safety violations
• They combine three types of functions: (1) safety violations, (2) unsafe overriding by IntC, (3) coverage of the decision structure of the integration component
• Many test objectives to be satisfied by the test suite
47
Failure Distance
• Reveal safety requirements violations
• Fitness functions based on the trajectory vectors for the ego car, the leading car and the pedestrian, generated by the simulator
• PP fitness: Minimum distance between the car and the pedestrian during the simulation time.
• AEB fitness: Minimum distance between the car and the leading car during the simulation time.
48
Distance Functions
49
When any of the functions yields zero, a safety failure corresponding to that function is detected.
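As a sketch, the minimum-distance fitness over the simulator's trajectory vectors might look like this; the time-stamped positions are hypothetical examples, not simulator output:

```python
import math

def min_distance(traj_a, traj_b):
    """Minimum Euclidean distance between two time-aligned trajectories
    (lists of (x, y) positions produced by the simulator). A value of
    zero signals a safety failure (the objects collide)."""
    return min(math.dist(p, q) for p, q in zip(traj_a, traj_b))

# Hypothetical time-stamped positions for the ego car and a pedestrian:
# at the last time step they occupy the same point.
car = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
ped = [(3.0, 3.0), (3.0, 2.0), (3.0, 1.0), (3.0, 0.0)]
fitness = min_distance(car, ped)   # 0.0: the car hits the pedestrian
```

The search minimizes this distance; the smaller it gets, the closer the scenario is to a safety violation.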
Unsafe Overriding Distance
• Goal: Find failures that are more likely to be due to faults in the integration component
• Reward test cases generating integration outputs deviating from the individual feature outputs, in such a way as to possibly lead to safety violations.
• Example: A feature f issues a braking command while the integration component issues no braking command or a braking command with a lower force than that of f .
50
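One plausible shape for such a distance, shown here for braking commands only. This is a sketch of the idea (distance zero exactly when the unsafe override occurs), not the paper's actual formula:

```python
def unsafe_override_distance(feature_brake, intc_brake):
    """Search distance for "IntC unsafely overrides feature f":
    0 when the integration component issues a weaker braking command
    than the feature requested (the situation the search rewards),
    positive otherwise. Braking forces are assumed normalized to [0, 1]."""
    if intc_brake < feature_brake:
        return 0.0                       # unsafe override detected
    return intc_brake - feature_brake    # how far IntC is from overriding f

d = unsafe_override_distance(0.8, 0.3)   # IntC weakens PP's braking -> 0.0
```

Minimizing this distance steers the search toward test cases where IntC's output deviates from the feature's output in the unsafe direction.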
Branch Distance
• Branch coverage of IntC
• Fitness: Approach level and branch distance d (standard for code coverage)
• d(b,tc) = 0 when tc covers b
51
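The standard branch distance for a predicate such as `a > b` can be written as follows, where K is the usual constant added when the branch is not taken (the predicate and inputs are illustrative):

```python
K = 1.0  # constant added when the branch is not taken

def branch_distance(a, b):
    """Standard branch distance for the predicate `a > b`: 0 when the
    branch is taken, otherwise (b - a) + K, so the search is guided
    toward inputs that flip the branch."""
    return 0.0 if a > b else (b - a) + K

d_covered = branch_distance(5, 3)   # branch taken -> 0.0
d_missed = branch_distance(2, 7)    # (7 - 2) + K = 6.0
```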
Combining Distance Functions
• Goal: execute every branch of IntC such that, while executing that branch, IntC unsafely overrides every feature f and its outputs violate every safety requirement related to f
52
• The combined distance distinguishes three cases: (1) tc has not covered branch j; (2) the branch is covered, but IntC did not cause an unsafe override of f; (3) the branch is covered and f is unsafely overridden, but requirement i is not yet violated
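One way to layer the three distances so that each stage only matters once the previous one is achieved is the usual approach-level scheme. This is a sketch of that layering under assumed normalization, not the paper's exact formula:

```python
def normalize(d):
    """Map a distance in [0, inf) into [0, 1)."""
    return d / (d + 1.0)

def combined_distance(branch_d, override_d, failure_d):
    """Objective for "cover branch j, unsafely override feature f, and
    violate f's safety requirement i": each later distance only counts
    once all earlier ones have reached zero."""
    if branch_d > 0:                   # case (1): branch j not covered
        return 2.0 + normalize(branch_d)
    if override_d > 0:                 # case (2): covered, no unsafe override
        return 1.0 + normalize(override_d)
    return normalize(failure_d)        # case (3): push toward the violation

score = combined_distance(0.0, 0.0, 4.0)  # only the safety distance remains
```

Because each case lives in a disjoint band ([2, 3), [1, 2), [0, 1)), minimizing the combined value drives the search through the three stages in order.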
Search Algorithm
• The best test suite covers all search objectives, i.e., all IntC branches and all safety requirements
• Not a Pareto front optimization problem
• Objectives compete with each other
• Example: a single test case cannot have the ego car violating the speed limit after hitting the leading car
• Tailored, many-objective genetic algorithm
• Must be efficient (test case executions are very expensive)
53
Search Algorithm
54
• Randomly generate test cases and compute fitness
• Evolve tests: crossover, mutation
• Select the fittest tests
• Correct constraint violations
• Archive covering tests
Evaluation
55
(Plot: number of integration errors found vs. search time, 0-12 h; FITest finds up to 7 integration errors, outperforming the baseline.)
Discussion
56
Observations
• We will rarely have precise and complete requirements, and we face great diversity in the physical environment, including many possible scenarios
• It is possible, however, to define properties characterizing unacceptable situations (safety)
• The notion of test coverage is elusive: no specification or code/models for some key (decision) components based on ML
• Failure is not clear-cut: it is a matter of risk, trade-offs ...
• We have executable/simulable functional models (e.g., Simulink) at early stages
57
Conclusions
• We proposed solutions based on:
• Efficient and realistic (hardware, physics) simulation
• Metaheuristic search, e.g., evolutionary computing, guided by fitness functions derived from properties of interest (e.g., safety requirements)
• Machine learning, e.g., to speed up the search
• No guarantees, though
58
Generalizing
• Examples presented from (safety-critical) cyber-physical systems, e.g., safety requirements
• Can a similar strategy be applied in other domains to test for bias or any other undesirable properties (e.g., legal), when system behavior is driven by machine learning?
• Executable models of environment and users?
59
Summary
• Machine learning plays an increasingly prominent role in autonomous systems
• No (complete) requirements, specifications, or even code
• Some safety and mission-critical requirements
• Neural networks (deep learning) with millions of weights
• How do we gain confidence in such software in a scalable and cost-effective way?
60
Acknowledgements
• Raja Ben Abdessalem
• Shiva Nejati
• Annibale Panichella
• IEE, Luxembourg
61
References
• R. Ben Abdessalem et al., "Testing Advanced Driver Assistance Systems Using Multi-Objective Search and Neural Networks", IEEE ASE 2016
• R. Ben Abdessalem et al., "Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms", IEEE/ACM ICSE 2018
62
Automated Testing of Autonomous Systems
Lionel Briand
VVIoT, Sweden, 2018