Page 1
Minimizing Annotation Effort
Dr. Antonio M. López
[email protected]
June 9th, 2019
ACKNOWLEDGMENTS
ICREA Academia Programme
MICINN Project TIN2017-88709-R ("DANA")
AGAUR 2017-SGR-01597
CERCA (Centres de Recerca de Catalunya)
ACCIÓ (Generalitat de Catalunya)
Page 2
[email protected]
Divide-&-Conquer Engineering View: Modular approach (Perception → Local Maneuver)
Page 3
[email protected]
Deep CNNs Need Annotated Data
Let’s label data for fun!
Page 4
[email protected]
Timeline: 2008-10 · 2011 · 2012-14 · 2015-16 · 2017-19
• 1st object detector fully trained using videogame data.
• Domain adaptation (DA) Virtual→Real for DPM.
• Deep Learning “starts” for Computer Vision.
• Explosion in the use of synthetic data in Computer Vision: GTA-V, Internet models, ...
• Workshop series:
- Transferring & Adapting Source Knowledge in Computer Vision (TASK-CV): ECCV’14, ICCV’15, ECCV’16, ICCV’17, ECCV’18* ((*) Friday, full day at room N1095ZG, VisDA Challenge).
- Virtual/Augmented Reality for Visual Artificial Intelligence (VARVAI): ECCV’16 & ACM-MM’16; ’18: Computer Graphics for Autonomous Driving.
- AD Challenge @ CVPR’19.
Page 5
[email protected]
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving
Page 6
[email protected]
Imitation Learning: No manual supervision
Page 7
[email protected]
ALVINN (1988)¹ · DAVE (2005)²
1. D. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. NIPS, 1988.
2. Y. LeCun, U. Muller, J. Ben, E. Cosatto, and B. Flepp. Off-road obstacle avoidance through end-to-end learning. NIPS, 2005.
Page 8
[email protected]
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving (P&LP)
Still, many diverse experiences are required!
Page 9
Index
• SYNTHIA: co-training object detectors
• CARLA: multimodal end-to-end driving
Page 13
Self-Learning, under domain shift (source: SYNTHIA; target: real-world dataset)
[Diagram: Unlabelled Real-world Data → Object Detector (Detect) → Detections as Labelled Data → Self-labelled Real-world Data → Train]
Basic assumption:
The source model is relatively good at detecting on target data.
Basic idea:
1. Start with a detector trained on SYNTHIA.
2. Use the detector to process images of an unlabelled real-world dataset (e.g. KITTI).
3. Select the M images with the highest detection scores (threshold set for high precision, low recall).
4. Use detections and backgrounds from those M images as self-labelled real-world data.
5. Retrain the detector with the SYNTHIA data and the self-labelled data.
6. Repeat steps 2-5 for C cycles.
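The loop above can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the detector and trainer are abstracted as callables, and helper names such as `select_top_m` and `self_training` are hypothetical.

```python
def select_top_m(detections, m, threshold):
    """Keep the m images whose best detection clears a high confidence
    threshold (high precision, low recall).  `detections` maps an image
    id to a list of per-detection confidence scores."""
    confident = [(img, dets) for img, dets in detections.items()
                 if dets and max(dets) >= threshold]
    confident.sort(key=lambda kv: max(kv[1]), reverse=True)
    return dict(confident[:m])

def self_training(source_data, unlabeled_images, detect, train,
                  m=100, threshold=0.9, cycles=5):
    """Iteratively grow the training set with self-labelled target data."""
    model = train(source_data)                       # 1. train on SYNTHIA
    for _ in range(cycles):                          # 6. repeat for C cycles
        detections = {img: detect(model, img)        # 2. run on real images
                      for img in unlabeled_images}
        pseudo = select_top_m(detections, m, threshold)  # 3-4. self-label
        model = train(source_data, pseudo)           # 5. retrain with both
    return model
```

In practice `detect` and `train` would wrap a real detector (e.g. a deep CNN); the key design point is that only the most confident target detections ever enter the training set.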
Page 14
Co-Training, under domain shift (source: SYNTHIA; target: real-world dataset)
[Diagram: Unlabelled Real-world Data → Object Detector #1 / Object Detector #2 (Detect) → Detections as Labelled Data → Self-labelled Real-world Data #1 / #2 → Train]
Basic assumptions:
1. The source models are relatively good at detecting on target data.
2. The two detectors behave essentially differently.
Basic idea:
1. ~ Self-learning: one detector (#1) sends to the other (#2) the M images with the most confident detections.
2. ~ Discrepancy: from those M images, the other detector (#2) keeps only the N on which it has the lowest confidence, N < M.
3. Train both detectors in parallel.
4. Repeat steps 1-3 for C cycles.
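The exchange in steps 1-2 can be sketched as a single function. A hedged illustration only: the function name and the score dictionaries are hypothetical, and a real system would run this symmetrically in both directions each cycle.

```python
def cotraining_exchange(scores_a, scores_b, m, n):
    """One co-training hand-off: detector A proposes the m images it is
    most confident about; detector B keeps the n of those it is *least*
    confident about (n < m), i.e. the images where the two models
    disagree most and B has the most to learn."""
    # Step 1 (~ self-learning): A's m most confident images.
    proposed = sorted(scores_a, key=scores_a.get, reverse=True)[:m]
    # Step 2 (~ discrepancy): B keeps the n it scores lowest.
    kept = sorted(proposed, key=lambda img: scores_b[img])[:n]
    return kept
```

Selecting where the peer is *uncertain* is what distinguishes this from plain self-learning: each detector is fed examples that are reliable for the sender but informative for the receiver.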
Page 15
Index
• SYNTHIA: co-training object detectors
• CARLA: multimodal end-to-end driving
Page 16
[email protected]
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving (P&LP)
… by Imitation/demonstration (behavior cloning)
Page 17
[email protected]
[Figure: approaching an intersection, the car alone cannot resolve the ambiguity (?); the high-level command disambiguates: Left / Straight / Right / Nothing]
Trajectory Planning
Page 18
[email protected]
Branched Architecture
“End to End Driving via Conditional Imitation Learning”, Codevilla et al., ICRA’2018
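The branched idea can be sketched without any deep-learning library: a shared encoder produces features, there is one output head per high-level command, and the command selects which head drives the car. A minimal sketch under assumed names (`make_heads`, `branched_forward`, linear heads); the paper's actual heads are small neural branches.

```python
COMMANDS = ("follow", "left", "straight", "right")

def make_heads(weights):
    """One linear head per command; `weights` maps command -> weight vector.
    Stands in for the per-branch sub-networks of the real model."""
    def head(w):
        return lambda feats: sum(f * wi for f, wi in zip(feats, w))
    return {cmd: head(w) for cmd, w in weights.items()}

def branched_forward(features, heads, command):
    """Route the shared features through the head selected by the
    high-level command; all other branches are ignored for this frame."""
    if command not in heads:
        raise ValueError(f"unknown command: {command}")
    return heads[command](features)
```

The design choice worth noting: conditioning by *branch selection* (rather than feeding the command as an extra input) forces each branch to specialize, so "turn left" and "go straight" cannot average each other out at an intersection.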
Page 20
[email protected]
“Monocular Depth Estimation by Learning from Heterogeneous Datasets”,
A. Gurram, O. Urfalioglu, I. Halfaoui, F. Bouzaraa, A.M. Lopez,
IEEE Intelligent Vehicles Symposium, 2018
Depth ground truth: KITTI LiDAR
Semantic ground truth: Cityscapes semantic segmentation
Page 21
[email protected] 21
Phase 1 – Discrete depth estimation (i.e. classification).
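Turning depth estimation into classification requires discretizing metric depth into bins. A sketch of one possible scheme, uniform binning over the working range; the binning actually used in the paper may differ, and the function name is illustrative.

```python
def depth_to_class(depth_m, num_bins=80, max_depth=80.0):
    """Map a metric depth (metres) to a discrete bin index, turning the
    phase-1 problem into per-pixel classification.  Depths outside
    [0, max_depth] are clamped to the valid range."""
    d = min(max(depth_m, 0.0), max_depth)
    return min(int(d / max_depth * num_bins), num_bins - 1)
```

Phase 2 then regresses continuous depth, using the classification network as initialization, which is why the discrete phase only needs coarse bins.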
Page 22
[email protected] 22
Phase 1 – Semantic segmentation (classification).
Page 23
[email protected] 23
Phase 2 – Depth regression.
Page 24
[email protected]
KITTI: Training set (LiDAR ground truth) & Testing set
Page 25
[email protected]
Quantitative results
Eigen et al. KITTI split. DRN: depth regression network; DC-DRN: depth regression model with a pre-trained classification network; DSC-DRN: depth regression network trained with the conditional-flow approach, for depth ranges 1-80 m and 1-50 m. In the Godard approaches, "K" means training on KITTI and "CS + K" means also using Cityscapes. Bold marks the best result, italics the second best.
Page 26
[email protected]
Cityscapes Testing! (cross-domain generalization)
Page 27
[email protected]
Photo-realistic SYNTHIA
Page 28
[email protected]
Multimodal end-to-end driving: RGB+D multisensory / single-sensor (monocular)
Yi et al. (arXiv:1906.03199)
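One common way to feed RGB+D to a single end-to-end network is early fusion: stack the depth map (sensed, or estimated from the monocular image as above) as a fourth channel next to RGB. A sketch of that fusion choice only, using plain lists so it stays self-contained; whether the cited work fuses early or late is not claimed here.

```python
def fuse_rgbd(rgb, depth):
    """Early fusion: append the per-pixel depth value as a fourth channel
    to each (r, g, b) pixel, producing RGBD input for the driving network.
    `rgb` is an H x W grid of (r, g, b) tuples; `depth` is an H x W grid
    of floats with the same shape."""
    h, w = len(rgb), len(rgb[0])
    assert len(depth) == h and all(len(row) == w for row in depth)
    return [[rgb[i][j] + (depth[i][j],) for j in range(w)]
            for i in range(h)]
```

The appeal of the single-sensor (monocular) variant is that the "D" channel comes for free from a depth-estimation network, so no extra sensor is needed at test time.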
Page 31
Address
Edifici O, Campus UAB
08193 Bellaterra
Barcelona
Phone & Fax
Direct Line: +34 93 581 2561
Fax: +34 93 581 1670
www.cvc.uab.es
E-contact
www.cvc.uab.es/~antonio
[email protected]
Dr. Antonio M. López, Principal Investigator UAB & CVC ADAS Group
In conclusion, we are lazy annotators!!!
Many Thanks!!! Questions?