From Exploration to Planning. Cornelius Weber and Jochen Triesch, Frankfurt Institute for Advanced Studies, Goethe University Frankfurt, Germany. 18th International Conference on Artificial Neural Networks, 3rd - 6th September 2008, Prague


Dec 21, 2015


From Exploration to Planning

Cornelius Weber and Jochen Triesch
Frankfurt Institute for Advanced Studies
Goethe University Frankfurt, Germany

18th International Conference on Artificial Neural Networks
3rd - 6th September 2008, Prague


Reinforcement Learning

value actor units

fixed reactive system that always strives for the same goal

Trained Weights


reinforcement learning does not use the exploration phase to learn a general model of the environment that would allow the agent to plan a route to any goal

so let’s do this


Learning

actor

state space

randomly move around the state space

learn world models:
● associative model
● inverse model
● forward model
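The exploration phase can be illustrated with a minimal sketch. The grid world, its size, the four-move action set and the function name `explore` are all assumptions made for illustration; the slides only state that the agent moves randomly around the state space.

```python
import numpy as np

def explore(width=4, height=4, steps=1000, seed=0):
    """Random walk on a grid (hypothetical environment).

    Returns a list of (state, action, next_state) index triples,
    which is the raw material for learning world models.
    """
    rng = np.random.default_rng(seed)
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # assumed action set
    x, y = 0, 0
    transitions = []
    for _ in range(steps):
        k = int(rng.integers(len(moves)))       # pick a random action
        dx, dy = moves[k]
        nx = min(max(x + dx, 0), width - 1)     # walls: agent stays in place
        ny = min(max(y + dy, 0), height - 1)
        transitions.append((y * width + x, k, ny * width + nx))
        x, y = nx, ny
    return transitions

transitions = explore()
```

Each triple can then feed the associative model (state pairs), the inverse model (state pair to action) and the forward model (state-action pair to next state).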


Learning: Associative Model

weights to associate neighbouring states

use these to find any possible routes between agent and goal

prediction: $\tilde{s}'_i = \sum_j w^{s's}_{ij}\, s_j$

learning rule: $\Delta w^{s's}_{ij} = \varepsilon\, (s'_i - \tilde{s}'_i)\, s_j$
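A minimal sketch of how such associative weights might be trained, assuming one-hot state codes and a delta-rule update with learning rate `eps`. The three-state chain, the function name and the parameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def train_associative(pairs, n_states, eps=0.1):
    """Delta rule on (state, next_state) pairs with one-hot coding.

    W[i, j] is the weight from current state j to next state i.
    """
    W = np.zeros((n_states, n_states))
    for s, s_next in pairs:
        s_vec = np.zeros(n_states)
        s_vec[s] = 1.0
        target = np.zeros(n_states)
        target[s_next] = 1.0
        pred = W @ s_vec                          # predicted next state
        W += eps * np.outer(target - pred, s_vec)  # error times input
    return W

# Hypothetical chain 0 - 1 - 2, walked back and forth.
pairs = [(0, 1), (1, 0), (1, 2), (2, 1)] * 50
W = train_associative(pairs, n_states=3)
```

After training, nonzero weights mark pairs of states that were visited in succession; states never seen as neighbours (here 0 and 2) keep a weight of zero.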


Learning: Inverse Model

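weights to “postdict” action given state pair

use these to identify the action that leads to a desired state

prediction: $\tilde{a}_k = \sum_{i,j} w^{a s' s}_{kij}\, s'_i\, s_j$

learning rule: $\Delta w^{a s' s}_{kij} = \varepsilon\, (a_k - \tilde{a}_k)\, s'_i\, s_j$

sum of products: Sigma-Pi neuron model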
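The Sigma-Pi idea (each weight multiplies a product of two inputs, and the products are summed) can be sketched as follows. One-hot codes, the delta-rule update, the three-state chain and all function names are assumptions for illustration.

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def train_inverse(transitions, n_states, n_actions, eps=0.1):
    """Sigma-Pi inverse model: W[k, i, j] weights the product s'_i * s_j."""
    W = np.zeros((n_actions, n_states, n_states))
    for s, a, s_next in transitions:
        s_vec = one_hot(s, n_states)
        sn_vec = one_hot(s_next, n_states)
        a_vec = one_hot(a, n_actions)
        # Sum over products of next-state and current-state activations.
        a_pred = np.einsum('kij,i,j->k', W, sn_vec, s_vec)
        W += eps * np.einsum('k,i,j->kij', a_vec - a_pred, sn_vec, s_vec)
    return W

def postdict_action(W, s, s_next):
    """Action most strongly associated with the transition s -> s_next."""
    return int(np.argmax(W[:, s_next, s]))

# Hypothetical chain 0 - 1 - 2; action 0 moves right, action 1 moves left.
transitions = [(0, 0, 1), (1, 0, 2), (2, 1, 1), (1, 1, 0)] * 50
W_inv = train_inverse(transitions, n_states=3, n_actions=2)
```

With one-hot inputs only a single product term is active per transition, so each weight simply tracks how often action k accompanied the move from state j to state i.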


Learning: Forward Model

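weights to predict state given state-action pair

use these to predict the next state given the chosen action

prediction: $\tilde{s}'_i = \sum_{k,j} w^{s' a s}_{ikj}\, a_k\, s_j$

learning rule: $\Delta w^{s' a s}_{ikj} = \varepsilon\, (s'_i - \tilde{s}'_i)\, a_k\, s_j$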
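A forward model of this kind might be trained as in the sketch below, again assuming one-hot codes and a delta-rule update. The toy chain, the function names and the learning rate are hypothetical.

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def train_forward(transitions, n_states, n_actions, eps=0.1):
    """Forward model: W[i, k, j] weights the product a_k * s_j for next state i."""
    W = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        s_vec = one_hot(s, n_states)
        a_vec = one_hot(a, n_actions)
        sn_vec = one_hot(s_next, n_states)
        pred = np.einsum('ikj,k,j->i', W, a_vec, s_vec)   # predicted next state
        W += eps * np.einsum('i,k,j->ikj', sn_vec - pred, a_vec, s_vec)
    return W

def predict_next(W, s, a):
    """Most strongly predicted next state for action a taken in state s."""
    return int(np.argmax(W[:, a, s]))

# Hypothetical chain 0 - 1 - 2; action 0 moves right, action 1 moves left.
transitions = [(0, 0, 1), (1, 0, 2), (2, 1, 1), (1, 1, 0)] * 50
W_fwd = train_forward(transitions, n_states=3, n_actions=2)
```

Here `predict_next(W_fwd, 1, 0)` looks up where action 0 leads from state 1 in the toy chain.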


Planning


goal

actor units

agent
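The planning slides are figures, so the sketch below is only one possible reading of how goal, agent and actor units could interact: activity spreads from the goal along associative weights to form a "goal hill", the agent picks the neighbouring state with the highest activity (winner-take-all), and the inverse model supplies the action. The hand-set weights stand in for learned ones, and `decay`, `threshold` and all function names are assumptions.

```python
import numpy as np

def goal_hill(W_assoc, goal, decay=0.8, iterations=20, threshold=0.1):
    """Spread activity backwards from the goal along associative links."""
    n = W_assoc.shape[0]
    hill = np.zeros(n)
    hill[goal] = 1.0
    linked = W_assoc > threshold            # linked[i, j]: j can lead to i
    for _ in range(iterations):
        # State j receives the decayed activity of its best successor i.
        spread = decay * np.max(linked * hill[:, None], axis=0)
        hill = np.maximum(hill, spread)
    return hill

def plan_step(hill, W_assoc, W_inv, agent, threshold=0.1):
    """Pick the best neighbouring state, then the action that leads to it."""
    neighbours = np.flatnonzero(W_assoc[:, agent] > threshold)
    target = int(neighbours[np.argmax(hill[neighbours])])  # winner-take-all
    action = int(np.argmax(W_inv[:, target, agent]))
    return target, action

# Hypothetical 5-state chain; action 0 moves right, action 1 moves left.
n = 5
W_assoc = np.zeros((n, n))
W_inv = np.zeros((2, n, n))
for j in range(n - 1):
    W_assoc[j + 1, j] = W_assoc[j, j + 1] = 1.0
    W_inv[0, j + 1, j] = 1.0
    W_inv[1, j, j + 1] = 1.0

goal, agent = 4, 0
hill = goal_hill(W_assoc, goal)
path, actions = [agent], []
while path[-1] != goal:
    target, action = plan_step(hill, W_assoc, W_inv, path[-1])
    path.append(target)
    actions.append(action)
```

Because the goal is just an input to `goal_hill`, the same weights can be reused for any goal state, which is the contrast with a fixed reactive system that the earlier slides draw.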


Discussion

- reinforcement learning ... if no access to full state space

- previous work ... AI-like planners assume links between states

- noise ... wide “goal hills” will have flat slopes

- shortest path ... not taken; how to define?

- biological plausibility ... Sigma-Pi neurons; winner-take-all

- to do: embedding ... learn state space from sensor input

- to do: embedding ... let the goal be assigned naturally

- to do: embedding ... hand-designed planning phases


Acknowledgments

Collaborators:

Jochen Triesch, FIAS, J-W-Goethe University Frankfurt

Stefan Wermter, University of Sunderland

Mark Elshaw, University of Sheffield