Deep Reinforcement Learning for Robotics

Post on 13-Feb-2017


Deep Reinforcement Learning for Robotics Pieter Abbeel -- UC Berkeley EECS

Object Detection in Computer Vision

State-of-the-art object detection until 2012:

Input Image -> Hand-engineered features (SIFT, HOG, DAISY, …) -> Support Vector Machine (SVM) -> “cat” “dog” “car” …

Deep Supervised Learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …):

Input Image -> 8-layer neural network with 60 million parameters to learn -> “cat” “dog” “car” …

~1.2 million training images

60 million learned parameters (since then, billions of parameters)

Performance

graph credit Matt Zeiler, Clarifai

AlexNet

Speech Recognition

graph credit Matt Zeiler, Clarifai

History

Is deep learning 3, 30, or 60 years old?

2000s Sparse, Probabilistic, and Energy models (Hinton, Bengio, LeCun, Ng)

Rosenblatt’s Perceptron

(Olshausen, 1996)

based on history by K. Cho

Presenter notes: connected the dots; exploration of model structure; optimization know-how; computation + data

Data

1.2M training examples * 2048 (shifts) * 90 (PCA re-coloring)

1.2M * 2k * 90 ~ 0.216 trillion augmented examples

Human eye: 1k frames/s, so viewing them all would take ~6.84 yrs
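The arithmetic on this slide can be checked with a short sketch. The function names below are illustrative, not from the talk, and the 2048 shifts are rounded to 2k as on the slide; the 1k frames/s figure is taken from the slide as given.

```python
# Back-of-the-envelope check of the augmented dataset size on the slide.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def augmented_examples(base_images, shifts, recolorings):
    """Number of distinct training examples after data augmentation."""
    return base_images * shifts * recolorings


def viewing_years(n_frames, frames_per_second):
    """Years needed to look at n_frames at a fixed frame rate."""
    return n_frames / frames_per_second / SECONDS_PER_YEAR


# 1.2M images, ~2k shifts (slide rounds 2048 down), 90 PCA re-colorings.
total = augmented_examples(1_200_000, 2_000, 90)
years = viewing_years(total, 1_000)

print(f"{total / 1e12:.3f} trillion examples")
print(f"{years:.2f} years at 1k frames/s")
```

Using the unrounded 2048 shifts instead of 2k raises both numbers by about 2.4%.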

Compute power

Two NVIDIA GTX 580 GPUs

5-6 days of training time

What’s Changed

Nonlinearity

Sigmoid

ReLU

Regularization

Drop-out

(Training data augmentation)

Exploration of model structure

Optimization know-how
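Two of the changes listed above, the switch from sigmoid to ReLU and drop-out, can be illustrated with a minimal sketch. This is not the AlexNet implementation: the helper names are made up for illustration, and the drop-out shown is the "inverted" variant (survivors are rescaled at training time), which is an assumption rather than something the slides specify.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; shrinks toward 0 for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU; stays at 1 for every positive input."""
    return 1.0 if x > 0 else 0.0


def dropout(activations, p_drop, rng):
    """Inverted drop-out (assumed variant): zero each unit with
    probability p_drop and rescale the survivors by 1 / (1 - p_drop)."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# The sigmoid gradient vanishes as the input grows; the ReLU gradient does not.
for x in (0.0, 5.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x))

rng = random.Random(0)
print(dropout([1.0] * 8, 0.5, rng))
```

The saturating sigmoid gradient is one commonly cited reason deep networks were hard to train before ReLU became standard.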

Robotics

Current state-of-the-art robotics:

Percepts -> Hand-engineered state-estimation -> Hand-engineered control policy class -> Motor commands

Hand-tuned (or learned) 10’ish free parameters

Deep reinforcement learning:

Percepts -> Many-layer neural network with many parameters to learn -> Motor commands

Reinforcement Learning (RL)

Robotics

Marketing / Advertising

Dialogue

Optimizing operations / logistics

Queue management

Robot + Environment

probability of taking action a in state s
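The "probability of taking action a in state s" is what a stochastic policy computes. A minimal sketch follows, assuming a softmax over linear scores and a toy one-dimensional environment; both are hypothetical choices made for illustration and are not taken from the talk.

```python
import math
import random


def action_probabilities(theta, state):
    """pi_theta(a | s): softmax over one linear score per action.
    theta holds one weight row per action (illustrative parameterization)."""
    scores = [sum(w * x for w, x in zip(row, state)) for row in theta]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(theta, state, rng):
    """Draw an action index according to pi_theta(. | state)."""
    probs = action_probabilities(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def rollout(theta, horizon, rng):
    """Toy robot + environment loop (hypothetical): a point on a line
    that is rewarded for staying close to the origin."""
    position, total_reward = 1.0, 0.0
    for _ in range(horizon):
        state = [position, 1.0]
        action = sample_action(theta, state, rng)  # 0 = step left, 1 = step right
        position += 0.1 if action == 1 else -0.1
        total_reward += -abs(position)
    return total_reward


theta = [[1.0, 0.0], [-1.0, 0.0]]  # favors stepping toward the origin
rng = random.Random(0)
print(action_probabilities(theta, [1.0, 1.0]))
print(rollout(theta, 20, rng))
```

Reinforcement learning then amounts to adjusting theta so that the expected total reward of such rollouts increases.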

How About Deep RL?

Pong Enduro Beamrider Q*bert

Deep Q-learning [Mnih et al, 2013]

Monte Carlo Tree Search [Guo et al, 2014]

Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2014]

Deep Reinforcement Learning for Atari Games

Pong Enduro Beamrider Q*bert

[Schulman, Levine, Moritz, Jordan, Abbeel, ICML 2015]

Experiments in Locomotion

How About Real Robotic Visuo-Motor Skills?

Architecture (92,000 parameters)

[Levine*, Finn*, Darrell, Abbeel, 2015, TR at: rll.berkeley.edu/deeplearningrobotics]

Block Stacking – Learning the Controller for a Single Instance

Learned Skills

Architectures for shared learning / transfer learning

Multiple robots and sensors (including simulation)

Multiple tasks

Simulation – Real world

Frontiers / Limitations

Exploration

Controllers that require memory / estimation

Temporal hierarchy

Thank you
