Deep Reinforcement Learning for Robotics

Pieter Abbeel, UC Berkeley EECS

Object Detection in Computer Vision

State-of-the-art object detection until 2012:

Input Image -> Hand-engineered features (SIFT, HOG, DAISY, …) -> Support Vector Machine (SVM) -> “cat” “dog” “car” …

Deep Supervised Learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …):

Input Image -> 8-layer neural network with 60 million parameters to learn

60 million learned parameters (since then, billions of parameters)

~1.2 million training images

Performance

[Figure: performance graph; surviving label: AlexNet. Graph credit: Matt Zeiler, Clarifai]

Speech Recognition

History

Is deep learning 3, 30, or 60 years old?

2000s Sparse, Probabilistic, and Energy models (Hinton, Bengio, LeCun, Ng)

Rosenblatt’s Perceptron

(Olshausen, 1996)

based on history by K. Cho

1.2M training examples

* 2048 (shifts)

* 90 (PCA re-coloring)

1.2M * 2k * 90 ~ 0.216 trillion augmented training images

Human eye: at 1k frames/s, that is ~6.84 years of viewing
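The arithmetic above can be checked with a short sketch. It assumes the slide's own rounding (2048 shifts treated as 2k) and a 365.25-day year; the variable names are illustrative and do not come from the talk.

```python
# Back-of-the-envelope check of the augmented-dataset arithmetic.
base_images = 1_200_000   # ~1.2M training examples
shifts = 2_000            # the slide rounds 2048 shifts down to "2k"
recolorings = 90          # PCA re-coloring variants, as stated on the slide

augmented = base_images * shifts * recolorings

frames_per_second = 1_000                # rate the slide assumes for the human eye
seconds_per_year = 365.25 * 24 * 3600    # assumption: Julian year
years = augmented / frames_per_second / seconds_per_year

print(f"{augmented / 1e12:.3f} trillion images")  # 0.216 trillion images
print(f"{years:.2f} years")                       # 6.84 years
```

Using the exact 2048 shifts instead of 2k gives roughly 0.221 trillion images and about 7.01 years, so the rounding does not change the order of magnitude.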

Compute power

Two NVIDIA GTX 580 GPUs

5-6 days of training time

What’s Changed

Nonlinearity

Sigmoid

Regularization

Drop-out

(Training data augmentation)

Exploration of model structure

Optimization know-how
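Two items in the list above, the nonlinearity and drop-out, can be illustrated with a minimal sketch. Assumptions: the slide text names only the sigmoid, and the rectified linear unit (ReLU) is shown here because it is the nonlinearity used in the 2012 Krizhevsky et al. network; the drop-out shown is the common "inverted" variant that rescales at training time, which is not necessarily the exact formulation used in that work. All function names are illustrative.

```python
import math
import random

def sigmoid(x):
    """Saturating nonlinearity: output squashed into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes negative ones."""
    return max(0.0, x)

def dropout(activations, drop_prob, rng):
    """Inverted drop-out (assumed variant): zero each unit with probability
    drop_prob and rescale the survivors so the expected value is unchanged."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [relu(x) for x in (-2.0, -0.5, 0.5, 2.0)]
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
print(hidden)   # [0.0, 0.0, 0.5, 2.0]
print(dropped)  # each entry is either 0.0 or the original value doubled
```

At test time the drop-out step is skipped entirely in this variant, since the rescaling has already been applied during training.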

Robotics

Current state-of-the-art robotics:

Percepts -> Hand-engineered state-estimation -> Hand-engineered control policy class -> Motor commands

Hand-tuned (or learned) 10’ish free parameters

Deep reinforcement learning:

Percepts -> Many-layer neural network with many parameters to learn -> Motor commands

Reinforcement Learning (RL)

Robotics

Marketing / Advertising

Dialogue

Optimizing operations / logistics

Queue management

Robot + Environment

probability of taking action a in state s
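The phrase "probability of taking action a in state s" describes a stochastic policy, often written pi(a | s). A minimal sketch follows, assuming a softmax over per-action scores; the softmax form, the scores, and the function names are illustrative assumptions, not taken from the talk.

```python
import math
import random

def softmax_policy(scores):
    """Turn per-action scores for one state into probabilities pi(a | s)."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng):
    """Draw one action index according to the policy's probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for three actions in a single state.
scores = [1.0, 2.0, 0.5]
probs = softmax_policy(scores)
action = sample_action(probs, random.Random(0))

print([round(p, 3) for p in probs])  # probabilities sum to 1
print(action)                        # index of the sampled action
```

In deep RL the scores would come from a neural network evaluated on the robot's percepts, and learning adjusts the network's parameters so that high-reward actions become more probable.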

How About Deep RL?

Pong Enduro Beamrider Q*bert

Deep Q-learning [Mnih et al, 2013]

Monte Carlo Tree Search [Guo et al, 2014]

Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]

Deep Reinforcement Learning for Atari Games

[Schulman, Levine, Moritz, Jordan, Abbeel, ICML 2015]

Experiments in Locomotion

How About Real Robotic Visuo-Motor Skills?

Architecture (92,000 parameters)

[Levine*, Finn*, Darrell, Abbeel, 2015, TR at: rll.berkeley.edu/deeplearningrobotics]

Block Stacking – Learning the Controller for a Single Instance

Learned Skills

Architectures for shared learning / transfer learning

Multiple robots and sensors (including simulation)

Multiple tasks

Simulation – Real world

Frontiers / Limitations

Exploration

Controllers that require memory / estimation

Temporal hierarchy

Thank you
