Deep Reinforcement Learning for Robotics Using DIANNE
on-demand.gputechconf.com/gtc-eu/2017/presentation/...

PUBLIC

Deep Reinforcement Learning for Robotics Using DIANNE

Tim Verbelen, Steven Bohez, Elias De Coninck, Sam Leroux, Pieter Van Molle, Bert VanKeirsbilck, Pieter Simoens, Bart [email protected]

How can we build robots that are able to execute complex tasks without programming them explicitly?

Kuka Youbot

5-axis arm, length: 66 cm

Gripper

Omnidirectional wheels, max speed: 0.8 m/s

Battery operated

Embedded PC

Hokuyo Laser rangefinder

Kuka soft gripper

Reinforcement learning

[Figure, built up over four slides: the agent-environment loop. The environment sends an observation to the agent, the agent sends an action back, and the environment returns a reward.]
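
The observation, action, reward cycle on these slides can be sketched in a few lines of Python. This is only an illustration: `ToyEnvironment`, `RandomAgent` and `run_episode` are hypothetical names, and the one-dimensional corridor task is an assumption standing in for the robot.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor: start at position 0, reach `goal`."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the first observation

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.position += action
        done = self.position == self.goal
        reward = 1.0 if done else -0.1  # small penalty per step
        return self.position, reward, done


class RandomAgent:
    """Agent that ignores the observation and acts at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([-1, 1])


def run_episode(env, agent, max_steps=100):
    """Run the loop: observation -> action -> reward, until done."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling `run_episode(ToyEnvironment(), RandomAgent())` returns the summed reward of one episode. A learning agent would use the rewards to improve `act` over many episodes.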

Deep Reinforcement learning

● The actor needs to process high-dimensional observations to determine the next action.
● Our favorite processing block: deep neural networks

[Figure: a neural network mapping an observation to an action]
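
As a rough illustration of a network that maps an observation to an action, the sketch below runs a forward pass through a small fully connected network in plain Python. The layer sizes, the random initialisation and the names `TinyPolicy`, `make_layer` and `dense` are assumptions made for the example. They are not taken from the talk or from DIANNE.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def relu(v):
    return max(0.0, v)


def dense(x, layer, activation=None):
    """Compute activation(W x + b) for one layer."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    if activation is not None:
        out = [activation(v) for v in out]
    return out


class TinyPolicy:
    """Hypothetical two-layer network: observation -> one score per action."""

    def __init__(self, n_obs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden = make_layer(n_obs, n_hidden, rng)
        self.output = make_layer(n_hidden, n_actions, rng)

    def scores(self, observation):
        h = dense(observation, self.hidden, relu)
        return dense(h, self.output)

    def act(self, observation):
        # Choose the index of the highest-scoring action.
        s = self.scores(observation)
        return max(range(len(s)), key=s.__getitem__)
```

For example, `TinyPolicy(n_obs=8, n_hidden=16, n_actions=3).act([0.5] * 8)` returns an action index between 0 and 2. Training would adjust the weights, which are left random here.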

How can we train without destroying our robot?

https://docs.google.com/file/d/0B6aNwc1Zdoa9Z3p2QzVabHhpcVk/preview

V-REP simulator

Multiple simulator instances gathering experience on CPU

GPU system training the model

How can we evaluate our models on the robot?

Brain transplantation!

How can we connect the different components?

DIANNE

• Modular software framework for designing, training and evaluating neural networks.

• Distributed training and evaluation

• Java-based

• Easy integration (service-based architecture)

• GUI

• Open source (AGPL 3)

[Figure, built up over four slides: two deployed agents, an Experience Pool, a Repository, and Training]
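
One common way to implement an experience pool is a fixed-size buffer that agents append transitions to and that the training process samples batches from. The sketch below assumes that arrangement, which the slides do not spell out. `ExperiencePool` and `Transition` are hypothetical names and do not reflect DIANNE's actual Java API.

```python
import random
import threading
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition",
    ["observation", "action", "reward", "next_observation", "done"])


class ExperiencePool:
    """Fixed-capacity buffer; the oldest transitions are dropped when full."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)
        # The lock lets several agent threads add while a trainer samples.
        self.lock = threading.Lock()

    def add(self, transition):
        with self.lock:
            self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        with self.lock:
            return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        with self.lock:
            return len(self.buffer)
```

Each agent would call `add` after every environment step, while the trainer repeatedly calls `sample` to build training batches. Sampling from a pool rather than training on the latest transition lets experience gathering and training run at different speeds.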

Deep Reinforcement learning algorithms

DQN

“Playing Atari with Deep Reinforcement Learning” (Mnih et al., 2013)

[Figure: network input: raw laser scanner measurements (512 values); output: Q values, the expected future return for each possible action]
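
Given Q values for each possible action, two small pieces of Q-learning can be sketched: choosing an action and computing the one-step training target. The function names, the epsilon-greedy exploration scheme and the discount factor of 0.99 are assumptions for the example, not details from the talk.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if done:
        # No future return after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)
```

The network is trained so that its Q value for the action taken moves toward `td_target`. In the DQN paper the transitions used for this come from a replay memory.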

https://docs.google.com/file/d/0B6aNwc1Zdoa9anl4NHRLNDd3UTg/preview

DDPG

“Continuous Control with Deep Reinforcement Learning” (Lillicrap et al., 2015)

[Figure: the actor network maps raw laser scanner measurements (512 values) to a continuous action; the critic network outputs the expected future return]
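
Three small ingredients of an actor-critic setup with continuous actions can be sketched as follows: bounding the actor output, the critic's one-step target, and the slowly updated target networks described in the DDPG paper. The function names, the flat parameter lists and the default values are assumptions for the example.

```python
import math


def bounded_action(raw_outputs, max_magnitude=1.0):
    """Squash unbounded actor outputs into [-max_magnitude, max_magnitude]."""
    return [max_magnitude * math.tanh(v) for v in raw_outputs]


def critic_target(reward, next_q, done, gamma=0.99):
    """Target for the critic: r + gamma * Q'(s', mu'(s')).

    `next_q` is the target critic's value for the next state and the
    target actor's action in that state.
    """
    return reward if done else reward + gamma * next_q


def soft_update(target_params, online_params, tau=0.001):
    """Move each target parameter a fraction tau toward its online value."""
    return [tau * p + (1.0 - tau) * t
            for t, p in zip(target_params, online_params)]
```

For instance, `soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5)` gives `[0.5, 1.0]`. A small `tau` keeps the targets changing slowly, which the paper reports helps stabilise learning.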

https://docs.google.com/file/d/0B6aNwc1Zdoa9UkFTRXIwYTZjM1E/preview

Visit dianne.intec.ugent.be for more information

Abstraction layer with ROS

[Figure labels: Base, Sensor, Arm]
