Robot Learning Jeremy Wyatt School of Computer Science University of Birmingham

Dec 20, 2015

Page 1:

Robot Learning

Jeremy Wyatt

School of Computer Science

University of Birmingham

Page 2:

Plan

Why and when
What we can do
– Learning how to act
– Learning maps
– Evolutionary Robotics
How we do it
– Supervised Learning
– Learning from punishments and rewards
– Unsupervised Learning

Page 3:

Learning How to Act

What can we do?
– Reaching
– Road following
– Box pushing
– Wall following
– Pole-balancing
– Stick juggling
– Walking

Page 4:

Learning How to Act: Reaching

We can learn from reinforcement or from a teacher (supervised learning)

Reinforcement Learning:
– Action: Move your arm ()
– You received a reward of 2.1

Supervised Learning:
– Action: Move your hand to (x,y,z)
– You should have moved to (x′,y′,z′)
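The difference between the two kinds of feedback can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the reward rule (negative distance to a goal) is an assumption, since the slide does not say how its reward of 2.1 was computed.

```python
import math

def supervised_feedback(moved_to, should_have_moved_to):
    # A teacher supplies the correct hand position, so the learner gets a
    # full error vector: which way it was wrong, and by how much.
    return tuple(t - m for m, t in zip(moved_to, should_have_moved_to))

def reinforcement_feedback(moved_to, goal):
    # Only a single number comes back. ASSUMPTION: the reward is the
    # negative distance to the goal (hypothetical reward rule).
    return -math.dist(moved_to, goal)

moved_to = (0.1, 0.2, 0.3)
target = (0.4, 0.2, 0.3)

error_vector = supervised_feedback(moved_to, target)  # one value per axis
reward = reinforcement_feedback(moved_to, target)     # one scalar score
```

The supervised learner is told how to correct each coordinate; the reinforcement learner only finds out how good the move was and has to discover better moves by trying them.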

Page 5:

Learning How to Act: Driving

ALVINN: learned to drive in 5 minutes
Learns to copy the human response
Feedforward multilayer neural network

[Figure: network with a 30 × 32 input image and steering wheel position as output]

Page 6:

Learning How to Act: Driving

Network outputs form a Gaussian
Mean encodes the driving direction
Compare with the “correct” human action
Compute error for each unit given desired Gaussian

[Figure: plot of the output profile; x-axis 1 to 31, y-axis 0 to 0.07]
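A minimal sketch of the Gaussian output encoding described on this slide. The number of units (31), the width sigma, the unnormalised peak of 1 and all function names are assumptions made for illustration; they are not taken from ALVINN itself.

```python
import math

def gaussian_target(centre, n_units=31, sigma=2.0):
    # Desired activation of each output unit: a Gaussian bump centred on
    # the unit that corresponds to the human's steering direction.
    # ASSUMPTION: unnormalised bump with peak 1.0.
    return [math.exp(-((i - centre) ** 2) / (2 * sigma ** 2))
            for i in range(n_units)]

def unit_errors(outputs, target):
    # Error for each unit: desired activation minus actual activation.
    return [t - o for o, t in zip(outputs, target)]

def decode_direction(outputs):
    # The mean of the bump (activation-weighted unit index) encodes
    # the driving direction.
    total = sum(outputs)
    return sum(i * o for i, o in enumerate(outputs)) / total

human = gaussian_target(centre=18)    # "correct" human action
network = gaussian_target(centre=15)  # stand-in for the network's output
errors = unit_errors(network, human)  # would drive the weight update
```

Because the target is a smooth bump rather than a single active unit, a near miss in steering direction gives small errors on most units instead of a large error on two of them.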

Page 7:

Learning How to Act: Driving

Distribution of training examples from on-the-fly learning causes problems
Network doesn’t see how to cope with misalignments
Network can forget if it doesn’t see a situation for a while
Answer: generate new examples from the on-the-fly images

Page 8:

Learning How to Act: Driving

Use camera geometry to assess new field of view
Fill in using information about road structure
Transform the target steering direction
Present as a new training example
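The idea of generating extra training examples can be sketched very crudely. This is not the camera-geometry transform from the slide: a sideways pixel shift stands in for the new field of view, copying the edge column stands in for the road-structure fill, and `steer_per_pixel` and the sign convention (positive steering means right) are hypothetical.

```python
def shifted_example(image, steering, shift, steer_per_pixel=0.05):
    # image: list of rows (lists of pixel values).
    # shift > 0 moves the scene right, as if the vehicle had drifted left.
    # ASSUMPTION: abs(shift) is smaller than the image width.
    new_image = []
    for row in image:
        if shift > 0:
            # Fill the vacated left columns with the left edge pixel.
            new_row = [row[0]] * shift + row[:-shift]
        elif shift < 0:
            # Fill the vacated right columns with the right edge pixel.
            new_row = row[-shift:] + [row[-1]] * (-shift)
        else:
            new_row = list(row)
        new_image.append(new_row)
    # Transform the target: steer back towards where the road now appears.
    new_steering = steering + steer_per_pixel * shift
    return new_image, new_steering

image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
right_image, right_steering = shifted_example(image, 0.0, 1)
left_image, left_steering = shifted_example(image, 0.0, -1)
```

Each recorded image then yields several misaligned views with corrected steering targets, so the network sees recovery situations the human driver never produced.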

Page 9:

Learning How to Act: Driving

Page 10:

Learning How to Act

Obelix
Learns to push boxes
Reinforcement Learning

Page 11:

What is Reinforcement Learning?

Learning from punishments and rewards
Agent moves through world, observing states and rewards
Adapts its behaviour to maximise some function of reward

[Diagram: trajectory of states s1, s2, s3, …, s4, s5, …, s9 with actions a1, a2, a3, …, a4, a5, …, a9 and rewards r1 = +3, r4 = −1, r5 = −1, r9 = +50]

Page 12:

Return: Long term performance

Let’s assume our agent acts according to some rules, called a policy, π
The return $R_t$ is a measure of long term reward collected after time t
The expected return for a state-action pair is called a Q value, $Q(s,a)$

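[Diagram: rewards along the example trajectory: r1 = +3, r4 = −1, r5 = −1, r9 = +50]

$$R_0 = 3 - \gamma^{3} - \gamma^{4} + 50\gamma^{8}, \qquad 0 \le \gamma \le 1$$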
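As a worked check, the return for the example rewards can be computed directly. This sketch assumes the return is the discounted sum of later rewards with discount factor `gamma`, and that every reward not labelled on the slide is 0; the function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    # rewards[k] is the reward received k steps after time 0 (r1 is
    # rewards[0]); each is weighted by gamma ** k.
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# r1 .. r9 from the slide; unlabelled rewards are ASSUMED to be 0.
rewards = [3, 0, 0, -1, -1, 0, 0, 0, 50]

undiscounted = discounted_return(rewards, 1.0)  # plain sum of rewards
discounted = discounted_return(rewards, 0.9)    # later rewards count less
```

With `gamma = 1.0` the large final reward counts in full; as `gamma` shrinks, the agent's measure of long term performance is dominated more and more by the immediate reward of +3.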

Page 13:

One step Q-learning

Guess how good state-action pairs are
Take an action
Watch the new state and reward
Update the state-action value

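$$\hat{Q}_{t+1}(s_t, a_t) = \hat{Q}_t(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{b \in A} \hat{Q}_t(s_{t+1}, b) - \hat{Q}_t(s_t, a_t) \right]$$

[Diagram: taking action a_t in state s_t gives reward r_{t+1} and leads to state s_{t+1}]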
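The four steps above can be sketched as a small tabular learner. Everything about the environment is hypothetical (a five-state corridor with +50 at the far end and −1 per other move), and epsilon-greedy action selection and the values of `alpha`, `gamma` and `epsilon` are assumptions; the slide only says "take an action".

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")

def step(state, action):
    # Hypothetical corridor: states 0..4, state 4 ends the episode.
    next_state = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    reward = 50.0 if next_state == 4 else -1.0
    return next_state, reward

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # One-step Q-learning: move Q(s, a) towards r + gamma * max_b Q(s', b).
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def train(episodes=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # initial guess: every state-action pair is worth 0
    for _ in range(episodes):
        s = 0
        while s != 4:
            # Take an action (ASSUMPTION: epsilon-greedy exploration).
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, b)])
            s_next, r = step(s, a)         # watch the new state and reward
            q_update(Q, s, a, r, s_next)   # update the state-action value
            s = s_next
    return Q

Q = train()
greedy_policy = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(4)}
```

After training, the greedy policy read off the table heads right in every state, and the value of the last step before the goal approaches the +50 reward.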

Page 14:

Obelix

Won’t converge with a single controller
Works if you divide it into behaviours
But …

Page 15:

Evolutionary Robotics

Page 16:

Learning Maps