Deep Learning for Robots Raia Hadsell www.raiahadsell.com

Raia Hadsellraiahadsell.com/uploads/3/6/4/2/36428762/erf2017_keynote_talk.pdf · slide from V. Vanhoucke Deep Net Deep Net Deep Net. General Artificial Intelligence Robotics is different

Nov 09, 2018

Transcript
Page 1:

Deep Learning for Robots

Raia Hadsell www.raiahadsell.com

Page 2:

The story:

1. Deep Learning is the future of robotics

2. There are very significant challenges

3. But some solutions are emerging, as well.

Page 3:

2010: Speech Recognition

Audio → Acoustic Model → Phonetic Model → Language Model → Text

2012: Computer Vision

Pixels → Key Points → SIFT features → Deformable Part Model → Labels

2014: Machine Translation

Text → Reordering → Phrase Table/Dictionary → Language Model → Text

2017: Robotics?

Sensors → Perception → World Model → Planning → Control → Action

End-to-end Deep Learning for robots?

slide from V. Vanhoucke

Page 4:

2010: Speech Recognition

Audio → Acoustic Model → Phonetic Model → Language Model → Text

2012: Computer Vision

Pixels → Key Points → SIFT features → Deformable Part Model → Labels

2014: Machine Translation

Text → Reordering → Phrase Table/Dictionary → Language Model → Text

2017: Robotics?

Sensors → Perception → World Model → Planning → Control → Action

End-to-end Deep Learning for robots?

[Figure labels: "Deep Net", shown four times]

slide from V. Vanhoucke

Page 5:

General Artificial Intelligence

Robotics is different

LABELS

Page 6:

Robotics is different

ACTIONS SENSORS

Page 7:

Environment Agent

Reinforcement Learning

GOAL OBSERVATIONS

ACTIONS

REWARD?
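
The agent-environment loop on this slide can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the corridor environment, the random policy, and all names are hypothetical stand-ins. The agent receives an observation, chooses an action, and gets a reward back from the environment.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is simply the current cell

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def random_policy(observation, rng):
    """Stand-in for the agent; a deep RL agent would use a neural network here."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=1000):
    """Run one episode: observe, act, collect reward, stop at the goal or the step limit."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation, rng)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


episode_return = run_episode(CorridorEnv(), random_policy, random.Random(0))
```

Swapping `random_policy` for a learned function of the observation is where reinforcement learning would come in; the loop itself stays the same.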

Page 8:

Environment Agent

Deep Reinforcement Learning

GOAL OBSERVATIONS

ACTIONS

REWARD?

neural network

Page 9:

● Sensorimotor control ?

Could deep RL allow robots to learn end-to-end?

Page 10:

Space Invaders

[Mnih et al., Playing Atari with Deep Reinforcement Learning, 2013]

https://www.youtube.com/watch?v=wHDxF5N700Q

Page 11:

General Atari Player

[Mnih et al., Playing Atari with Deep Reinforcement Learning, 2013]

https://www.youtube.com/watch?v=Erkt7HelEco

Page 12:

9DOF Random reacher

https://youtu.be/u0M3PvTgTcE

Page 13:

● Sensorimotor control

Could deep RL allow robots to learn end-to-end?

Page 14:

● Sensorimotor control

● Exploration of complex spaces ?

Could deep RL allow robots to learn end-to-end?

Page 15:

Maze navigation

https://youtu.be/zHhbypmKaj0

Page 16:

● Sensorimotor control

● Exploration of complex spaces

Could deep RL allow robots to learn end-to-end?

Page 17:

● Sensorimotor control

● Exploration of complex spaces

● Strategy and decision making ?

Could deep RL allow robots to learn end-to-end?

Page 18:

Page 19:

Policy Network Value Network

Page 20:

Lesson: use supervised learning when possible

Human expert positions

Supervised Learning policy network

Reinforcement Learning policy network

Generates New Data (30 million positions)

Value network

Self Play, Self Play

Page 21:

● Sensorimotor control

● Exploration of complex spaces

● Strategy and decision making

Could deep RL allow robots to learn end-to-end?

Page 22:

Not so fast …

● Deep RL is very data inefficient: how can it learn on real robots?

So, where are the superhuman robots?

24 hours in simulation with 16 threads …… 55 days on the real Jaco arm

Page 23:

1. Train in simulation, then transfer to real robot

● Benefit is obvious

● Hard to do in practice

Two methods to speed up Deep RL for robots

Page 24:

Progressive Neural Networks

Page 25:

Progressive Neural Networks
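
As a rough illustration of the progressive network idea (a new column is added for a new task and receives lateral input from the earlier column, which stays frozen), here is a forward-pass sketch in plain Python. It is a sketch under stated assumptions: the two-layer columns, the layer sizes, the random weights, and every name below are hypothetical, and no training is shown.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class Column:
    """One two-layer column. `lateral` maps the previous column's hidden
    layer into this column's output layer (None for the first column)."""

    def __init__(self, n_in, n_hidden, n_out, rng, prev_hidden=0):
        self.w1 = random_matrix(n_hidden, n_in, rng)
        self.w2 = random_matrix(n_out, n_hidden, rng)
        self.lateral = random_matrix(n_out, prev_hidden, rng) if prev_hidden else None

    def forward(self, x, prev_hidden=None):
        hidden = relu(matvec(self.w1, x))
        out = matvec(self.w2, hidden)
        if self.lateral is not None and prev_hidden is not None:
            # Add the lateral contribution from the earlier column's features.
            out = [o + l for o, l in zip(out, matvec(self.lateral, prev_hidden))]
        return hidden, out


rng = random.Random(0)
column_1 = Column(n_in=4, n_hidden=8, n_out=2, rng=rng)  # first task; treated as frozen
column_2 = Column(n_in=4, n_hidden=8, n_out=2, rng=rng, prev_hidden=8)  # new task

x = [0.5, -0.2, 0.1, 0.9]
hidden_1, _ = column_1.forward(x)                  # features from the frozen column
_, action_scores = column_2.forward(x, hidden_1)   # new column reuses them laterally
```

In a full implementation only `column_2` (including `lateral`) would receive gradient updates on the new task, so what the first column learned cannot be overwritten.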

Page 26:

Progressive Neural Networks

[Figure: network diagram; labels 1, 2 and a]

Page 27:

Progressive Neural Networks

[Figure: network diagram; labels 1, 2, 3 and a]

Page 28:

Sim-to-Real

Page 29:

Sim-to-Real

[Figure: network diagram; labels 1, 2, 3; Simulation, Robot; Task A, Task A, Task B]

Page 30:

Sim-to-Real: 3d reacher

https://www.youtube.com/watch?v=YZz5Io_ipi8

[Figure: network diagram; labels 1, 2 and a; numeric labels 128 and 16]

Page 31:

Sim-to-Real: 2d reacher with moving target

[Figure: network diagram; labels 1, 2, 3; numeric labels 128, 16, 16]

www.youtube.com/watch?v=e78J1K5LKCI

Page 32:

Progressive Neural Networks: arxiv.org/abs/1606.04671

Sim-to-Real Robot Learning from Pixels: arxiv.org/abs/1610.04286v1

Andrei Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Nicolas Heess, Raia Hadsell

Page 33:

2. Learn with auxiliary tasks

● Accelerate and stabilise reinforcement learning

Two methods to speed up Deep RL for robots

Page 34:

Navigation mazes

Game episode:

1. Random start

2. Find the goal (+10)

3. Teleport randomly

4. Re-find the goal (+10)

5. Repeat (limited time)

[Figure labels: +10, +1]
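
The episode structure above can be sketched as a short simulation. This is only an illustration under assumptions not in the talk: an open square grid stands in for the maze, a random walk stands in for the agent, and the function name and parameters are hypothetical. The +1 reward shown on the slide is left out because the transcript does not say what earns it.

```python
import random


def run_maze_episode(size=5, time_limit=200, seed=0):
    """Simulate one episode: random start, +10 per goal visit, teleport, repeat."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    goal = rng.choice(cells)
    start_cells = [cell for cell in cells if cell != goal]
    position = rng.choice(start_cells)          # 1. random start
    total_reward = 0
    for _ in range(time_limit):                 # 5. repeat until time runs out
        d_row, d_col = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        row = min(size - 1, max(0, position[0] + d_row))
        col = min(size - 1, max(0, position[1] + d_col))
        position = (row, col)
        if position == goal:                    # 2. and 4. goal found or re-found
            total_reward += 10
            position = rng.choice(start_cells)  # 3. teleport randomly
    return total_reward


episode_reward = run_maze_episode()
```

Because the goal stays fixed within an episode, an agent that remembers where it found the goal can return faster after each teleport, which is what the limited time budget rewards.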

Page 35:

Nav agent ingredients:

1. Convolutional encoder and RGB inputs

2. Stacked LSTM

3. Additional inputs (reward, action, and velocity)

4. RL: Asynchronous advantage actor critic (A3C)

5. Auxiliary task 1: Depth predictor

6. Auxiliary task 2: Loop closure predictor

[Figure: agent architecture; labels: enc, Loop (L), Depth (D1), Depth (D2); inputs: x_t, r_{t-1}, {v_t, a_{t-1}}]

[Mnih et al., Asynchronous Methods for Deep Reinforcement Learning, 2016]
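
One way to read the ingredient list is that the auxiliary predictors contribute extra loss terms alongside the RL objective. The sketch below shows that combination only; it assumes a single depth head trained by mean squared error and a loop closure head trained by binary cross-entropy, with hypothetical weights `depth_weight` and `loop_weight`. The loss forms and weights actually used by the authors are not given in this transcript.

```python
import math


def mean_squared_error(predicted, target):
    """Average squared difference between predicted and target depth values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def binary_cross_entropy(probability, label):
    """Loss for a yes/no prediction such as 'have I been here before?'."""
    eps = 1e-12
    probability = min(1.0 - eps, max(eps, probability))
    return -(label * math.log(probability) + (1 - label) * math.log(1.0 - probability))


def total_loss(rl_loss, depth_pred, depth_target, loop_prob, loop_label,
               depth_weight=1.0, loop_weight=1.0):
    """Combine the RL loss with weighted auxiliary losses (weights are assumptions)."""
    depth_loss = mean_squared_error(depth_pred, depth_target)
    loop_loss = binary_cross_entropy(loop_prob, loop_label)
    return rl_loss + depth_weight * depth_loss + loop_weight * loop_loss


example = total_loss(
    rl_loss=1.0,
    depth_pred=[0.2, 0.8], depth_target=[0.0, 1.0],
    loop_prob=0.5, loop_label=1,
)
```

Setting both weights to zero recovers the plain RL loss, so the auxiliary tasks only add training signal to the shared layers and do not change what the policy is rewarded for.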

Page 36:

Variations in architecture

a. FF A3C

b. LSTM A3C

c. Nav A3C

d. Nav A3C +D1D2L

[Figure: four architecture diagrams; labels: enc, x_t, r_{t-1}, {v_t, a_{t-1}}, Loop (L), Depth (D1), Depth (D2)]

Page 37:

Results: Auxiliary tasks speed up RL ten-fold!

Page 38:

https://youtu.be/lNoaTyMZsWI

Page 39:

Piotr Mirowski, Razvan Pascanu, Raia Hadsell

Fabio Ross Andy Hubert Laurent Koray Dharsh Misha Andrea

Learning to navigate in complex environments

arxiv.org/abs/1611.03673

Page 40:

The story:

1. Deep Learning is the future of robotics

2. There are very significant challenges

3. But some solutions are emerging, as well.

Thank you! We are hiring! [email protected], [email protected]