Edge Cases and Autonomous Vehicle Safety SATURN, Pittsburgh PA May 7, 2019 © 2019 Philip Koopman Prof. Philip Koopman @PhilKoopman

Overview
Making safe robots
– Doer/Checker safety
Edge cases matter
– Robust perception matters
The heavy tail distribution
– Fixing stuff you see in testing isn’t enough
Perception stress testing
– Finding the weaknesses in perception
UL 4600: autonomy safety standard
[General Motors]

98% Solved For 20+ Years
Washington DC to San Diego, July 1995
– CMU Navlab 5
– Dean Pomerleau
– Todd Jochem
https://www.cs.cmu.edu/~tjochem/nhaa/nhaa_home_page.html
AHS San Diego demo, Aug 1997

NREC: 30+ Years Of Cool Robots
Carnegie Mellon University
– Faculty, staff, students
– Off-campus Robotics Institute facility
[Timeline figure, 1985 to 2010. Labels: DARPA Grand Challenge; DARPA LAGR; ARPA Demo II; DARPA SC-ALV; NASA Lunar Rover; NASA Dante II; Auto Excavator; Auto Harvesting; Auto Forklift; Mars Rovers; Urban Challenge; DARPA PerceptOR; DARPA UPI; Auto Haulage; Auto Spraying; Laser Paint Removal; Army FCS; Software Safety]

Before Autonomy Software Safety
The Big Red Button era

Traditional Validation Vs. Machine Learning
Use traditional software safety where you can
..BUT..
Machine Learning (inductive training)
No requirements
– Training data is difficult to validate
No design insight
– Generally inscrutable; prone to gaming and brittleness

APD (Autonomous Platform Demonstrator)
TARGET GVW: 8,500 kg
TARGET SPEED: 80 km/hr
Approved for Public Release. TACOM Case #20247 Date: 07 OCT 2009
Safety critical speed limit enforcement

Safety Envelope Approach to ML Deployment
Specify unsafe regions
Specify safe regions
– Under-approximate to simplify
Trigger system safety response upon transition to unsafe region
[Figure label: UNSAFE!]
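
The envelope idea above can be sketched in a few lines. This is only an illustration under made-up assumptions, not anything taken from the talk: the state (speed and gap to an obstacle), the braking figure, and the box limits are all hypothetical. It shows a "true" safe region defined by stopping distance, a simpler box that under-approximates it, and a safety response whenever the state is outside the box.

```python
from dataclasses import dataclass

# All numbers below are hypothetical, chosen only for illustration.
BRAKE_DECEL = 4.0     # m/s^2, assumed braking capability
BOX_MAX_SPEED = 10.0  # m/s, envelope speed bound
BOX_MIN_GAP = 15.0    # m, envelope gap bound


@dataclass(frozen=True)
class State:
    speed: float  # m/s
    gap: float    # metres to the nearest obstacle


def truly_safe(s: State) -> bool:
    """'Exact' safe region: the vehicle can stop within the available gap."""
    return s.speed ** 2 / (2 * BRAKE_DECEL) < s.gap


def in_envelope(s: State) -> bool:
    """Under-approximated safe region: a simple box lying inside truly_safe."""
    return 0.0 <= s.speed <= BOX_MAX_SPEED and s.gap >= BOX_MIN_GAP


def safety_response_needed(s: State) -> bool:
    """Anything outside the envelope is treated as unsafe."""
    return not in_envelope(s)


if __name__ == "__main__":
    for s in (State(8.0, 20.0), State(12.0, 20.0), State(20.0, 20.0)):
        print(s, "truly safe:", truly_safe(s),
              "response:", safety_response_needed(s))
```

The box gives up some states that are actually safe (for example 12 m/s with a 20 m gap) in exchange for a check that is trivial to review and test.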

Architecting A Safety Envelope System
“Doer” subsystem
– Implements normal, untrusted functionality
“Checker” subsystem – Traditional SW
– Implements failsafes (safety functions)
Checker entirely responsible for safety
– Doer can be at low Safety Integrity Level
– Checker must be at higher SIL
(Also known as a “safety bag” approach; also monitor/actuator pair)
[Figure: Doer/Checker Pair. Doer: ML, Low SIL. Checker: Simple Safety Envelope Checker, High SIL]
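
A minimal sketch of the Doer/Checker split described above, assuming a speed-limit envelope. The `Command` type, the `FAILSAFE` action, and the function names are hypothetical and not from the talk. The point is structural: the Doer may be arbitrarily complex and untrusted, while the Checker is small enough to validate with traditional methods and has the final say.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Command:
    throttle: float  # expected range 0.0 .. 1.0
    brake: float     # expected range 0.0 .. 1.0


# Hypothetical failsafe action: cut throttle and brake fully.
FAILSAFE = Command(throttle=0.0, brake=1.0)


def checker(speed_mps: float, cmd: Command, limit_mps: float) -> Command:
    """High-integrity part: pass the Doer's command through only if it is
    well-formed and the vehicle is inside the speed envelope."""
    well_formed = 0.0 <= cmd.throttle <= 1.0 and 0.0 <= cmd.brake <= 1.0
    if not well_formed or speed_mps > limit_mps:
        return FAILSAFE
    return cmd


def control_step(doer: Callable[[float], Command],
                 speed_mps: float, limit_mps: float) -> Command:
    """One control cycle: the untrusted Doer proposes, the Checker disposes."""
    proposed = doer(speed_mps)
    return checker(speed_mps, proposed, limit_mps)


if __name__ == "__main__":
    ml_doer = lambda speed: Command(throttle=0.5, brake=0.0)  # stand-in for ML
    print(control_step(ml_doer, speed_mps=10.0, limit_mps=20.0))
    print(control_step(ml_doer, speed_mps=25.0, limit_mps=20.0))
```

Because a NaN fails both range comparisons, a malformed Doer output also falls through to the failsafe.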

Validating an Autonomous Vehicle Pipeline
[Figure labels: Control Systems; Control Software Validation; Doer/Checker Architecture; Autonomy Interface To Vehicle; Traditional Software Validation; Randomized & Heuristic Algorithms; Run-Time Safety Envelopes; Doer/Checker Architecture; Machine Learning Based Approaches; ???]
Perception presents a uniquely difficult assurance challenge

Brute Force AV Validation: Public Road Testing
Good for identifying “easy” cases
Expensive and potentially dangerous
http://bit.ly/2toadfa

Validation Via Brute Force Road Testing?
If 100M miles/critical mishap…
– Test 3x–10x longer than mishap rate
– Need 1 Billion miles of testing
That’s ~25 round trips on every road in the world
– With fewer than 10 critical mishaps…
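
One way to see where a multiplier in the 3x to 10x range can come from is the sketch below. It assumes mishaps arrive as a Poisson process and that testing is completely mishap-free; the slide does not state its statistical model, so this is an illustrative assumption, and the function name is hypothetical.

```python
import math


def mishap_free_miles_needed(miles_per_mishap: float, confidence: float) -> float:
    """Mishap-free test miles needed to support a target rate.

    Assumes Poisson arrivals: at the target rate, the chance of seeing zero
    mishaps in m miles is exp(-m / miles_per_mishap). Solving
    exp(-m / miles_per_mishap) = 1 - confidence for m gives the result.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) * miles_per_mishap


if __name__ == "__main__":
    target = 100e6  # 100M miles per critical mishap, as on the slide
    for confidence in (0.95, 0.99, 0.9999):
        miles = mishap_free_miles_needed(target, confidence)
        print(f"{confidence:.2%} confidence: {miles / target:.1f}x the target, "
              f"{miles / 1e6:.0f}M miles")
```

Under these assumptions, 95% confidence needs roughly 3x the target distance and 99.99% needs a little over 9x. Any mishap observed during testing pushes the required distance higher.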

Closed Course Testing
Safer, but expensive
Not scalable
Only tests things you have thought of!
Volvo / Motor Trend

Simulation
Highly scalable; less expensive
Scalable; need to manage fidelity vs. cost
Only tests things you have thought of!
http://bit.ly/2K5pQCN
Udacity ANSYS

What About Edge Cases?
Gaps in training data can lead to perception failure
– Safety needs to know: “Is that a person?”
– Machine learning provides: “Is that thing like the people in my training data?”
Edge cases are surprises
– You won’t see these in training or testing
– Edge cases are the stuff you didn’t think of!
https://www.clarifai.com/demo
http://bit.ly/2In4rzj

Need An Edge Case “Zoo”
Novel objects (missing from zoo) are triggering events
http://bit.ly/2top1KD
http://bit.ly/2tvCCPK
https://goo.gl/J3SSyu

Why Edge Cases Matter
Where will you be after 1 Billion miles of validation testing?
Assume 1 Million miles between unsafe “surprises”
Example #1: 100 “surprises” @ 100M miles / surprise
– All surprises seen about 10 times during testing
– With luck, all bugs are fixed
Example #2: 100,000 “surprises” @ 100B miles / surprise
– Only 1% of surprises seen during 1B mile testing
– Bug fixes give no real improvement (1.01M miles / surprise)
https://goo.gl/3dzguf
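
The arithmetic behind the two examples can be checked with a short sketch. It assumes every surprise type is equally rare, arrivals are Poisson, and every surprise seen during testing is fixed perfectly; these are simplifying assumptions for illustration, and the function name is hypothetical.

```python
import math


def after_testing(num_surprises: int, miles_per_surprise_each: float,
                  test_miles: float):
    """Return (fraction of surprise types seen, miles between surprises
    once every seen type has been fixed)."""
    expected_sightings = test_miles / miles_per_surprise_each
    # Poisson assumption: chance that a given type shows up at least once.
    fraction_seen = 1.0 - math.exp(-expected_sightings)
    remaining_types = num_surprises * (1.0 - fraction_seen)
    miles_between = miles_per_surprise_each / remaining_types
    return fraction_seen, miles_between


if __name__ == "__main__":
    test_miles = 1e9  # 1 Billion miles of validation testing
    examples = {"Example #1": (100, 100e6), "Example #2": (100_000, 100e9)}
    for name, (count, miles_each) in examples.items():
        before = miles_each / count  # 1M miles between surprises in both cases
        seen, after = after_testing(count, miles_each, test_miles)
        print(f"{name}: {seen:.2%} of types seen; "
              f"{before / 1e6:.2f}M -> {after / 1e6:.2f}M miles per surprise")
```

Both examples start at 1M miles between surprises. In the first, testing sees essentially every type; in the second, about 1% are seen and fixing them moves the figure only to about 1.01M miles per surprise.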

Real World: Heavy Tail Distribution(?)
[Figure: heavy tail distribution. Labels: Common Things Seen In Testing; Edge Cases Not Seen In Testing]

The Heavy Tail Testing Ceiling
Need to find “Triggering Events” to inject into sims/testing
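
A rough way to picture a testing ceiling is to compute how much of the real-world event mix still belongs to never-seen event types after a given amount of testing. The sketch below is an assumption-laden illustration: the power-law shape, its exponent, and the number of event types are all hypothetical, and real-world frequencies may look quite different.

```python
def power_law_weights(num_types: int, exponent: float) -> list:
    """Relative frequency of each event type, most common first."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_types + 1)]


def unseen_share(weights: list, num_observations: int) -> float:
    """Expected share of future events whose type was never observed in
    num_observations independent draws from the same distribution."""
    total = sum(weights)
    return sum((w / total) * (1.0 - w / total) ** num_observations
               for w in weights)


if __name__ == "__main__":
    heavy = power_law_weights(10_000, 1.0)  # many rare types (hypothetical)
    thin = [1.0] * 100                      # 100 equally common types
    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} observations: heavy tail unseen {unseen_share(heavy, n):.3f}, "
              f"thin tail unseen {unseen_share(thin, n):.6f}")
```

With the thin-tailed mix, the unseen share collapses quickly. With the heavy-tailed mix, each tenfold increase in observations removes only a modest slice, which is the plateau the slide title refers to.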

Edge Cases Pt. 1: Triggering Event Zoo
Need to collect surprises
– Novel objects
– Novel operational conditions
Corner Cases vs. Edge Cases
– Corner cases: infrequent combinations
  – Not all corner cases are edge cases
– Edge cases: combinations that behave unexpectedly
Issue: novel for person ≠ novel for Machine Learning
– ML can have “edges” in unexpected places
– ML might train on features that seem irrelevant to people
https://goo.gl/Ni9HhU
[Image caption: Not A Pedestrian]

Edge Cases Part 2: Brittleness
Sensor data corruption experiments
[Figure panels: Synthetic Equipment Faults; Gaussian blur]
Exploring the response of a DNN to environmental perturbations from “Robustness Testing for Perception Systems,” RIOT Project, NREC, DIST-A.
Defocus & haze are a significant issue
Gaussian Blur & Gaussian Noise cause similar failures
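
A corruption experiment of the kind described above can be sketched as follows. This is not the RIOT project's tooling: the image is a small grid of numbers, the "detector" is a toy threshold standing in for a real DNN, and all function names are hypothetical. The sketch applies increasing Gaussian blur or noise and reports the first level at which the detector's answer changes.

```python
import math
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to each pixel, clamped to [0, 255]."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel covering about three sigma each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([sum(w * row[min(n - 1, max(0, x + k - radius))]
                        for k, w in enumerate(kernel)) for x in range(n)])
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur: blur rows, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def first_failure(detector, image, perturb, levels):
    """Smallest perturbation level that changes the detector's answer."""
    baseline = detector(image)
    for level in levels:
        if detector(perturb(image, level)) != baseline:
            return level
    return None


if __name__ == "__main__":
    # Toy scene: one bright pixel on a dark 9x9 background.
    scene = [[0.0] * 9 for _ in range(9)]
    scene[4][4] = 255.0
    toy_detector = lambda img: max(max(row) for row in img) > 200.0
    rng = random.Random(0)
    noise = lambda img, sigma: add_gaussian_noise(img, sigma, rng)
    print("blur breaks detection at sigma =",
          first_failure(toy_detector, scene, gaussian_blur, [0.3, 1.0, 2.0]))
    print("noise breaks detection at sigma =",
          first_failure(toy_detector, scene, noise, [1.0, 10.0, 50.0]))
```

With a real perception stack, `toy_detector` would be replaced by a call into the model under test, and a low failure threshold on an otherwise ordinary image would be a hint of brittleness worth investigating.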

Hologram Detects Edge Cases
Brittle perception behavior indicates Edge Cases
Can uncover false negatives and detect novel objects

Context-Dependent Perception Failures
Perception failures are often context-dependent
– False positives and false negatives are both a problem
[Image captions:]
– False positive on lane marking
– False negative real bicyclist
– False negative when in front of dark vehicle
– False negative when person next to light pole
Will this pass a “vision test” for bicyclists?

Example Triggering Events via Hologram
Mask R-CNN: examples of systemic problems we found
– “Red objects”
– “Columns”
– “Camouflage”
– “Sun glare”
– “Bare legs”
– “Children”
– “Single Lane Control”
Notes: These are baseline, un-augmented images // Your mileage may vary.

Operations & Human Interactions
Drivers do more than just drive
– Occupant behavior, passenger safety
– Detecting and managing equipment faults
Operational limitations & situations
– System exits Operational Design Domain
– Vehicle fire or catastrophic failure
– Post-crash response
Interacting with non-drivers
– Pedestrians, passengers
– Police, emergency responders
https://bit.ly/2GvDkUN
https://bit.ly/2PhzilT

Lifecycle Issues
Handling updates
– Fully recertify after every weekly update?
– Security in general
Vehicle maintenance
– Pre-flight checks, cleaning
– Corrective maintenance
Supply chain issues
– Quality fade
– Supply chain faults
https://bit.ly/2IKlZJ9
https://bit.ly/2VavsjM
Is windshield cleaning fluid life critical?
Safety Standard Landscape

Ways To Improve AV Safety
More safety transparency
– Independent safety assessments
– Industry collaboration on safety
Minimum performance standards
– “Driver test” is necessary -- but not sufficient
  – How do you measure maturity?
Autonomy software safety standards
– ISO 26262/21448 + UL 4600 + IEEE P700x
– Dealing with uncertainty and brittleness
http://bit.ly/2MTbT8F (sign modified)
Mars
Thanks!