Deep Reinforcement Learning and Complex Environments Raia Hadsell

Nov 09, 2018

DuongAnh

Page 1

Deep Reinforcement Learning and Complex Environments

Raia Hadsell

Page 2

End-to-end Deep Learning for robots?

slide from V. Vanhoucke

Page 3

End-to-end Deep Learning for robots?

2010: Speech Recognition

Audio → Acoustic Model → Phonetic Model → Language Model → Text

Deep Net

slide from V. Vanhoucke

Page 4

End-to-end Deep Learning for robots?

2010: Speech Recognition

Audio → Acoustic Model → Phonetic Model → Language Model → Text

Deep Net

2012: Computer Vision

Pixels → Key Points → SIFT features → Deformable Part Model → Labels

Deep Net

slide from V. Vanhoucke

Page 5

End-to-end Deep Learning for robots?

2010: Speech Recognition

Audio → Acoustic Model → Phonetic Model → Language Model → Text

Deep Net

2012: Computer Vision

Pixels → Key Points → SIFT features → Deformable Part Model → Labels

Deep Net

2014: Machine Translation

Text → Reordering → Phrase Table/Dictionary → Language Model → Text

Deep Net

slide from V. Vanhoucke

Page 6

End-to-end Deep Learning for robots?

2010: Speech Recognition

Audio → Acoustic Model → Phonetic Model → Language Model → Text

Deep Net

2012: Computer Vision

Pixels → Key Points → SIFT features → Deformable Part Model → Labels

Deep Net

2014: Machine Translation

Text → Reordering → Phrase Table/Dictionary → Language Model → Text

Deep Net

2017: Robotics?

Sensors → Perception → World Model → Planning → Control → Action

slide from V. Vanhoucke

Page 7

General Artificial Intelligence

Robotics is different

LABELS

Page 8

General Artificial Intelligence

Robotics is different

ACTIONS

SENSORS

Page 9

General Artificial Intelligence

Deep Reinforcement Learning

[Figure: Agent (neural network) and Environment, with labels GOAL, OBSERVATIONS, ACTIONS, REWARD?]
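The agent and environment loop on this slide can be sketched in a few lines. This is a minimal illustration only: `LineWorld`, `run_episode` and `always_right` are hypothetical names invented here, and a hand-written policy stands in for the neural network.

```python
from typing import Callable, Tuple


class LineWorld:
    """Hypothetical toy environment: walk along a line until the last cell (the goal)."""

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position  # the observation is simply the current cell

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action is -1 (left) or +1 (right); the position is clipped to the line
        self.position = max(0, min(self.size - 1, self.position + action))
        done = self.position == self.size - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env: LineWorld, policy: Callable[[int], int], max_steps: int = 100) -> float:
    """Run the observation -> action -> reward loop and return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(observation: int) -> int:
    """Stand-in for a learned policy: always move towards the goal."""
    return 1
```

In deep RL the `policy` callable would be a neural network whose parameters are updated from the rewards collected in this loop.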

Page 10

General Atari Player

[Mnih et al., Playing Atari with Deep Reinforcement Learning, 2013]

Page 11

9DOF Random reacher

Page 12

Deep RL — Raia Hadsell

● Can deep RL agents learn multiple tasks?

● Can deep RL agents learn efficiently?

● Can deep RL agents learn from real data?

● Can deep RL agents learn continuous control?

Page 13

Lab Mazes

StreetLearn

Parkour

Multiple Tasks & Lifelong learning

Page 14

Raia Hadsell 2017

Lifelong Learning - 3 challenges

1. Catastrophic forgetting
2. Positive transfer
3. Specialization and generalization

Page 15

Raia Hadsell 2017

Catastrophic forgetting

● Well-known phenomenon
● Especially severe in Deep RL

Page 17

Raia Hadsell 2017

Catastrophic forgetting

Page 19

Raia Hadsell 2017

Elastic Weight Consolidation

[Figure: parameter-space diagram with labels Task A, Task B and 𝜽*, comparing SGD, L2 and EWC]

James Kirkpatrick et al (2017), “Overcoming Catastrophic Forgetting in NNs”
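The three methods compared on this slide differ only in the penalty added while training on Task B. As a rough sketch, EWC anchors each parameter to the Task A solution 𝜽* with a quadratic penalty weighted by a per-parameter importance (the diagonal Fisher information in Kirkpatrick et al.). The function names and the flat-list parameter layout below are assumptions made for illustration.

```python
from typing import Sequence


def ewc_penalty(theta: Sequence[float], theta_star: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )


def task_b_objective(task_b_loss: float, theta: Sequence[float],
                     theta_star: Sequence[float], fisher: Sequence[float],
                     lam: float) -> float:
    """Loss minimised while learning Task B after Task A.

    lam = 0 gives plain SGD on Task B; fisher = all ones gives a uniform L2
    penalty; a per-parameter fisher gives the EWC penalty.
    """
    return task_b_loss + ewc_penalty(theta, theta_star, fisher, lam)
```

With a Fisher value of zero a parameter is free to move, which is how EWC leaves capacity for Task B while protecting the parameters that matter for Task A.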

Page 20

Raia Hadsell 2017

What if my tasks really don’t get along?

Page 21

Raia Hadsell 2017

What if my tasks really don’t get along?

Progressive Nets
● add columns for new tasks
● freeze params of learnt columns
● layer-wise neural connections

→ capacity for task-specific features
→ enables deep compositionality
→ precludes forgetting

[Figure: column 1 with outputs 𝛑1, 𝝂1]

Andrei Rusu et al (2016), “Progressive Neural Networks”

Page 22

Raia Hadsell 2017

What if my tasks really don’t get along?

Progressive Nets
● add columns for new tasks
● freeze params of learnt columns
● layer-wise neural connections

→ capacity for task-specific features
→ enables deep compositionality
→ precludes forgetting

[Figure: columns 1 and 2 with outputs 𝛑1, 𝝂1 and 𝛑2, 𝝂2; connections between columns labelled a]

Andrei Rusu et al (2016), “Progressive Neural Networks”

Page 23

Raia Hadsell 2017

What if my tasks really don’t get along?

Progressive Nets
● add columns for new tasks
● freeze params of learnt columns
● layer-wise neural connections

→ capacity for task-specific features
→ enables deep compositionality
→ precludes forgetting

[Figure: columns 1, 2 and 3 with outputs 𝛑1, 𝝂1; 𝛑2, 𝝂2; 𝛑3, 𝝂3; connections between columns labelled a]

Andrei Rusu et al (2016), “Progressive Neural Networks”
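One layer of a new column can be sketched as follows: it combines its own previous-layer activations with the previous-layer activations of the earlier, frozen columns. This is a simplified illustration. The connections between columns are plain linear maps here, whereas the adapters in Rusu et al. are more elaborate, and all function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(matrix: Matrix, vector: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in vector]


def progressive_layer(own_weights: Matrix, own_input: Sequence[float],
                      lateral_weights: Sequence[Matrix],
                      lateral_inputs: Sequence[Sequence[float]]) -> List[float]:
    """Activation of one layer in the newest column.

    own_input is this column's previous-layer activation. Each entry of
    lateral_inputs is the previous-layer activation of an earlier column,
    whose parameters stay frozen; only own_weights and lateral_weights
    would be trained for the new task.
    """
    pre_activation = matvec(own_weights, own_input)
    for weights, activation in zip(lateral_weights, lateral_inputs):
        lateral = matvec(weights, activation)
        pre_activation = [p + l for p, l in zip(pre_activation, lateral)]
    return relu(pre_activation)
```

Because the earlier columns are never updated, their behaviour on earlier tasks cannot change, which is the sense in which the slide says the design precludes forgetting.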

Page 24

Sim-to-Real

[Figure: three columns with outputs 𝛑1, 𝝂1; 𝛑2, 𝝂2; 𝛑3, 𝝂3. Labels: Simulation, Robot; Task A, Task A, Task B]

Page 25

Raia Hadsell 2017

What if my tasks really don’t get along?

Page 26

Raia Hadsell

Distral (Distill and Transfer Learning)

● Task-specific networks plus shared network

● KL Divergence constraint

● Regularisation in policy space rather than parameter space

● Shared policy as a communication channel between tasks

[Figure: task policies 𝛑1, 𝝂1 to 𝛑4, 𝝂4, each linked by a KL term to the shared policy 𝛑0, 𝝂0]

Yee Whye Teh et al (2017), “Distral: Robust Multitask Reinforcement Learning”

Page 27

Raia Hadsell

Distral (Distill and Transfer Learning)

● Task-specific networks plus shared network

● Regularisation in policy space rather than parameter space

● Shared policy as a communication channel between tasks

→ Distillation of knowledge into shared model enables transfer to tasks

[Figure: task policies 𝛑1 to 𝛑4, each linked by a KL term to the shared policy 𝛑0; label: distillation]

Yee Whye Teh et al (2017), “Distral: Robust Multitask Reinforcement Learning”

Page 28

Raia Hadsell

Distral (Distill and Transfer Learning)

● Task-specific networks plus shared network

● Regularisation in policy space rather than parameter space

● Shared policy as a communication channel between tasks

→ Distillation of knowledge into shared model enables transfer to tasks

→ Regularisation of shared model gives stability and robustness

[Figure: task policies 𝛑1 to 𝛑4, each linked by a KL term to the shared policy 𝛑0; label: distillation & regularisation]

Yee Whye Teh et al (2017), “Distral: Robust Multitask Reinforcement Learning”
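The KL links in the diagram can be illustrated for a single state with discrete actions. This is a sketch under simplifying assumptions: the objective in Teh et al. also includes an entropy term and discounting, both omitted here, and the function names and `kl_coeff` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def regularised_return(task_return: float, task_policy: Sequence[float],
                       shared_policy: Sequence[float], kl_coeff: float) -> float:
    """Task return minus a penalty for drifting away from the shared policy."""
    return task_return - kl_coeff * kl_divergence(task_policy, shared_policy)


def distil_shared_policy(task_policies: Sequence[Sequence[float]]) -> List[float]:
    """Shared policy minimising sum_i KL(pi_i || pi_0) with equal task weights.

    For a single state this minimiser is the average of the task policies.
    """
    count = len(task_policies)
    n_actions = len(task_policies[0])
    return [sum(p[a] for p in task_policies) / count for a in range(n_actions)]
```

The penalty acts on action probabilities rather than on weights, which is what the slide means by regularisation in policy space rather than parameter space.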

Page 29

Deep RL — Raia Hadsell

Page 30

Deep RL — Raia Hadsell

Page 31

Lab Mazes & Auxiliary Learning

StreetLearn

Parkour

Multiple Tasks & Lifelong learning

Page 32

Navigation mazes

Game episode:

1. Random start
2. Find the goal (+10)
3. Teleport randomly
4. Re-find the goal (+10)
5. Repeat (limited time)

Variants:

● Static maze, static goal
● Static maze, random goal
● Random maze

10800 steps/episode

3600 steps/episode
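The episode structure above can be sketched as a scoring loop. Everything here is a toy stand-in: the maze is reduced to a list of cell identifiers, `policy_step` is a hypothetical callable that returns the agent's next cell, and the step budget is passed in as `max_steps` (the slide quotes 3600 and 10800 steps per episode).

```python
import random
from typing import Callable, Sequence


def run_maze_episode(free_cells: Sequence[int], goal: int,
                     policy_step: Callable[[int, int], int],
                     max_steps: int, rng: random.Random) -> int:
    """Score one episode: +10 each time the goal is reached, then teleport."""
    position = rng.choice(free_cells)           # 1. random start
    score = 0
    for _ in range(max_steps):                  # 5. repeat within the time limit
        position = policy_step(position, goal)  # agent moves one step
        if position == goal:                    # 2. / 4. goal found (+10)
            score += 10
            position = rng.choice(free_cells)   # 3. teleport randomly
    return score
```

An agent that has learned the maze layout scores higher because each re-find after a teleport takes fewer steps.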

Page 33

Nav agent architecture

1. Convolutional encoder and RGB inputs

[Figure: encoder (enc) with input x_t]

Piotr Mirowski, Razvan Pascanu et al (2017) “Learning to navigate in complex environments”

Page 34

Nav agent architecture

1. Convolutional encoder and RGB inputs
2. Single or stacked LSTM with skip connection

[Figure: encoder (enc) with input x_t]

Piotr Mirowski, Razvan Pascanu et al (2017) “Learning to navigate in complex environments”

Page 35

Nav agent architecture

1. Convolutional encoder and RGB inputs
2. Stacked LSTM
3. Additional inputs (reward, action, and velocity)

[Figure: encoder (enc) with inputs x_t, r_{t-1}, {v_t, a_{t-1}}]

Piotr Mirowski, Razvan Pascanu et al (2017) “Learning to navigate in complex environments”

Page 36

Nav agent architecture

1. Convolutional encoder and RGB inputs
2. Stacked LSTM
3. Additional inputs (reward, action, and velocity)
4. RL: Asynchronous advantage actor critic (A3C)

[Figure: encoder (enc) with inputs x_t, r_{t-1}, {v_t, a_{t-1}}; outputs 𝛑, 𝑽]

Piotr Mirowski, Razvan Pascanu et al (2017) “Learning to navigate in complex environments”

Page 37

Nav agent architecture

1. Convolutional encoder and RGB inputs
2. Stacked LSTM
3. Additional inputs (reward, action, and velocity)
4. RL: Asynchronous advantage actor critic (A3C)
5. Aux task 1: Depth predictors

[Figure: encoder (enc) with inputs x_t, r_{t-1}, {v_t, a_{t-1}}; outputs 𝛑, 𝑽, Depth (D1), Depth (D2)]

Piotr Mirowski, Razvan Pascanu et al (2017) “Learning to navigate in complex environments”

Page 38

Nav agent architecture

1. Convolutional encoder and RGB inputs
2. Stacked LSTM
3. Additional inputs (reward, action, and velocity)
4. RL: Asynchronous advantage actor critic (A3C)
5. Aux task 1: Depth predictor
6. Aux task 2: Loop closure predictor

[Figure: encoder (enc) with inputs x_t, r_{t-1}, {v_t, a_{t-1}}; outputs 𝛑, 𝑽, Depth (D1), Depth (D2), Loop (L)]

Piotr Mirowski, Razvan Pascanu et al (2017) “Learning to navigate in complex environments”
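Training with auxiliary tasks amounts to adding weighted auxiliary losses to the RL loss, so that all heads shape the shared encoder and LSTM. The sketch below shows that combination. The weight names and default values are hypothetical, and treating depth as a classification over discretised depth bins and loop closure as a binary label are assumptions made for this illustration.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target_index: int) -> float:
    """Loss for one depth prediction, with depth discretised into bins."""
    return -math.log(probs[target_index])


def binary_cross_entropy(prob: float, label: int) -> float:
    """Loss for the loop closure head: label is 1 if a place is revisited."""
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def nav_total_loss(a3c_loss: float, depth1_loss: float, depth2_loss: float,
                   loop_loss: float, beta_d1: float = 1.0,
                   beta_d2: float = 1.0, beta_loop: float = 1.0) -> float:
    """A3C loss plus weighted depth (D1, D2) and loop closure (L) losses."""
    return (a3c_loss
            + beta_d1 * depth1_loss
            + beta_d2 * depth2_loss
            + beta_loop * loop_loss)
```

The auxiliary heads are only used to provide extra gradient during training; the agent still acts from the policy output alone.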

Page 39

Variations in architecture

a. FF A3C
b. LSTM A3C
c. Nav A3C
d. Nav A3C +D1D2L

[Figure: four architecture panels, each with an encoder (enc) and outputs 𝛑, 𝑽; inputs are x_t alone in two panels and x_t, r_{t-1}, {v_t, a_{t-1}} in the other two; additional outputs Depth (D1), Depth (D2) and Loop (L)]

Page 40

Results on large maze with static goal

[Figure labels: +10, +1]

Page 41

Page 42

Deep RL — Raia Hadsell

Page 43

Lab Mazes & Auxiliary Learning

StreetLearn & Real world RL

Parkour

Multiple Tasks & Lifelong learning

Page 44

Deep RL — Raia Hadsell

Navigation mazes in the real world?

[Figure: two panels, each labelled observation and structure]

Page 45

Deep RL — Raia Hadsell

StreetView as an RL environment: StreetLearn

[Figure: two panels, each labelled observation and structure]

● RGB image cropped from panorama (84x84)

● Goal location

Actions: move to next node, rotate view 20° or 60°

Page 46

Deep RL — Raia Hadsell

left or right?

StreetView as an RL environment: StreetLearn

Page 47

Deep RL — Raia Hadsell

Looks like a road, but it’s a park entrance

StreetView as an RL environment: StreetLearn

Page 48

Deep RL — Raia Hadsell

west side highway

StreetView as an RL environment: StreetLearn

Page 49

Deep RL — Raia Hadsell

curved roads and tunnels

StreetView as an RL environment: StreetLearn

Page 50

Deep RL — Raia Hadsell

really, tunnels!

StreetView as an RL environment: StreetLearn

Page 51

Raia Hadsell

StreetLearn: The Courier Task

1. Spawn randomly and navigate to a random target location.

2. Start receiving reward when close to target (within 400m).

3. If target is reached (100m), navigate to a new random target.
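The distance thresholds above can be sketched as a reward function. The slide gives only the two radii (400m and 100m). The linear ramp between them, the maximum reward of 1.0 and the function names are assumptions made for this illustration, not the reward actually used.

```python
def courier_reward(distance_m: float, reward_radius_m: float = 400.0,
                   goal_radius_m: float = 100.0) -> float:
    """Reward as a function of distance to the target, in metres.

    Zero outside the reward radius, 1.0 inside the goal radius, and an
    assumed linear ramp in between.
    """
    if distance_m >= reward_radius_m:
        return 0.0
    if distance_m <= goal_radius_m:
        return 1.0
    return (reward_radius_m - distance_m) / (reward_radius_m - goal_radius_m)


def target_reached(distance_m: float, goal_radius_m: float = 100.0) -> bool:
    """True once the agent is close enough for a new target to be sampled."""
    return distance_m <= goal_radius_m
```

Rewarding the agent before it reaches the target gives a denser learning signal than a reward at the goal alone.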

Page 52

Raia Hadsell

Agent architecture

[Figure: block diagram with inputs image (via CNN), target, and r_{t-1}, a_{t-1}; three LSTM blocks; a relative pathway and a global pathway; outputs Policy (π, V), Local graph neighbour prediction, and Absolute heading prediction]

Page 56

Page 57

Deep RL — Raia Hadsell

Page 58

Lab Mazes & Auxiliary Learning

StreetLearn & Real world RL

Parkour & Continuous control

Multiple Tasks & Lifelong learning

Page 59

Deep RL — Raia Hadsell

Proprioceptive and exteroceptive observations

Proprioceptive -- “near the body”:

● Joint angles & velocities

● Touch sensors

● Positions and velocities of limbs in body coordinate frame

Page 60

Deep RL — Raia Hadsell

Proprioceptive and exteroceptive observations

Proprioceptive -- “near the body”:

● Joint angles & velocities

● Touch sensors

● Positions and velocities of limbs in body coordinate frame

Exteroceptive -- “away from the body”:

● Position / velocity in global coordinate frame

● Task-related (e.g. goal position)

● Vision

Page 61

Raia Hadsell

Rich environments for skill discovery: setup

Training
● Proximal policy optimization [Schulman et al.]
● Batched policy gradient
● Trust region (“gradient-based TRPO”)
● High-performance implementation:
  ○ Distributed (multiple workers)
  ○ Synchronous gradient updates

[Figure: policy network with inputs proprioception and terrain, and output actions]

Nicolas Heess, et al. 2016: “Learning and transfer of modulated locomotor controllers”
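The slide names proximal policy optimization (Schulman et al.). One published form of its objective is the clipped surrogate, sketched below over a batch of samples. This illustrates the general technique only; the variant used for the experiments in this talk may differ (the slide's "trust region" note suggests a KL-based form), and the function name and `epsilon` default are choices made here.

```python
import math
from typing import Sequence


def clipped_surrogate(new_log_probs: Sequence[float],
                      old_log_probs: Sequence[float],
                      advantages: Sequence[float],
                      epsilon: float = 0.2) -> float:
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    r is the probability ratio between the new and old policy for each
    sampled action. The objective is maximised with respect to the new policy.
    """
    terms = []
    for new_lp, old_lp, advantage in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)
```

Clipping removes the incentive to move the ratio far from 1 in a single update, which keeps each batched policy-gradient step close to the policy that collected the data.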

Page 62

Raia Hadsell

Single uniform reward, based on forward progress

Nicolas Heess, et al. 2017: “Emergence of Locomotion Behaviours in Rich Environments”

Page 63

Deep RL — Raia Hadsell

Humanoid: learned behaviors

● 27 DoFs
● 21 actuators

Nicolas Heess, et al. 2017: “Emergence of Locomotion Behaviours in Rich Environments”

Page 64

Deep RL — Raia Hadsell

● Can deep RL agents learn multiple tasks?

● Can deep RL agents learn efficiently?

● Can deep RL agents learn from real data?

● Can deep RL agents learn continuous control?

Page 65

Thank you!

Overcoming catastrophic forgetting in NNs, 2016 James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell

Progressive Neural Networks, 2016 Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell

Distral: Robust Multitask RL, 2017 Yee Whye Teh, Victor Bapst, Wojciech Marian Czarnecki, John Quan, James Kirkpatrick, Raia Hadsell, Nicolas Heess, Razvan Pascanu

Learning to navigate in complex environments, 2017 Piotr Mirowski*, Razvan Pascanu*, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell

Learning and transfer of modulated locomotor controllers, 2016 Nicolas Heess, Greg Wayne, Yuval Tassa, Timothy Lillicrap, Martin Riedmiller, David Silver

Emergence of Locomotion Behaviours in Rich Environments, 2017 Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver