Page 1: Machine Learning and Motion Planning

Machine Learning and Motion Planning

Dave Millman

October 17, 2007

Page 2: Machine Learning and Motion Planning

Machine Learning intro

• Machine Learning (ML)
  – The study of algorithms which improve automatically through experience. – Mitchell

• General description
  – Data driven
  – Extract some information from data
  – Mathematically based
    • Probability, statistics, information theory, computational learning theory, optimization

Page 3: Machine Learning and Motion Planning

A very small set of uses of ML

– Text
  • Document labeling, part-of-speech tagging, summarization

– Vision
  • Object recognition, handwriting recognition, emotion labeling, surveillance

– Sound
  • Speech recognition, music genre classification

– Finance
  • Algorithmic trading

– Medical, Biological, Chemical, on and on and on…

Page 4: Machine Learning and Motion Planning

A few types of ML

• Supervised
  – Given: labeled data
  – Usual goal: learn a function
  – Ex: SVM, Neural Networks, Boosting, etc. (see the sketch below)

• Unsupervised
  – Given: unlabeled data
  – Usual goal: cluster data, learn conditional probabilities
  – Ex: Nearest Neighbors, Decision trees
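As a hedged illustration of the supervised setting (not from the slides), here is a minimal sketch using scikit-learn's SVC; the toy data and labels are hypothetical:

# Minimal sketch of supervised learning: given labeled data, learn a function.
# Assumes scikit-learn is installed; the data points and labels are toy values.
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8]]   # labeled examples
y = [0, 0, 1, 1]                                        # their labels

clf = SVC(kernel="linear")   # an SVM classifier
clf.fit(X, y)                # "learn a function" from the labeled data
print(clf.predict([[0.1, 0.0], [0.95, 0.9]]))           # -> [0 1]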

Page 5: Machine Learning and Motion Planning

A few types of ML (cont.)

• Semi-Supervised
  – Given: labeled and unlabeled data
  – Usual goal: use the unlabeled data to augment the labeled data
  – Ex: cluster, then label the unlabeled data from the clusters

• Reinforcement
  – Given: reward function and set of actions
  – Goal: learn a function which optimizes the reward function
  – Ex: Q-learning, Ant-Q

Page 6: Machine Learning and Motion Planning

General Idea Supervised

Page 7: Machine Learning and Motion Planning

General Idea Unsupervised

Page 8: Machine Learning and Motion Planning

General Idea Semi-Supervised

Page 9: Machine Learning and Motion Planning

General Idea Reinforcement

• Markov Decision Process (MDP)
  – State space (fully or partially observable)
  – Action space (static or time dependent)
  – Transition function produces the next state (based on the present state and action, not the past)
  – Reward function (based on the action taken); see the Q-learning sketch below
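As a hedged illustration (not from the slides), tabular Q-learning maintains a table Q(s, a) over such an MDP and updates it from observed transitions; the action names and hyperparameters below are assumptions:

# Minimal tabular Q-learning sketch over an MDP (illustration only;
# states, actions, and hyperparameters are hypothetical).
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount factor, exploration rate
actions = ["a1", "a2", "a3", "a4"]
Q = defaultdict(float)                    # Q[(state, action)] -> estimated value

def choose_action(state):
    # epsilon-greedy: usually exploit current Q-values, occasionally explore
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])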

Page 10: Machine Learning and Motion Planning

Why regression is not enough…The XOR problem
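A small worked example of the XOR problem (my own illustration, not from the slides): an ordinary least-squares linear fit predicts 0.5 everywhere, while adding a single nonlinear feature (x1·x2) recovers the labels exactly.

# Why a linear model cannot represent XOR (illustrative sketch).
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)           # XOR labels

# Least-squares fit of y ~ w.x + b
A = np.hstack([X, np.ones((4, 1))])                # add a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(A @ w)          # ~[0.5 0.5 0.5 0.5]: the best linear fit is useless

# Adding a quadratic feature (x1*x2) makes the problem solvable by a linear model.
A2 = np.hstack([X, X[:, :1] * X[:, 1:], np.ones((4, 1))])
w2, *_ = np.linalg.lstsq(A2, y, rcond=None)
print(np.round(A2 @ w2, 3))   # recovers [0, 1, 1, 0]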

Page 11: Machine Learning and Motion Planning

Textbook Q-Learning [MI06]

• Learning flocking behavior
  – N agents
  – Discrete time steps
  – Agent i with partner j
  – Define the Q-state Q(st, at)
    • st – state at time t
    • at – action at time t

Page 12: Machine Learning and Motion Planning

Our textbook example

– State of i
  • [R] = floor(|i − j|), the floored distance between agent i and its partner j

– Actions for i
  • a1 – Attract to j
  • a2 – Parallel positive orientation to j
  • a3 – Parallel negative orientation to j
  • a4 – Repulsion from j

Page 13: Machine Learning and Motion Planning

Reward Function - no predator

• Distances R1, R2, R3 s.t. R1 < R2 < R3

• Rewards r(st, at):
  – 0 < [R] ≤ R1: r = 1 for a4; r = -1 for a1, a2, a3
  – R1 < [R] ≤ R2: r = 1 for a2; r = -1 for a1, a3, a4
  – R2 < [R] ≤ R3: r = 1 for a1; r = -1 for a2, a3, a4
  – R3 < [R]: r = 0 for a1, a2, a3, a4
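As a hedged sketch (not the authors' code), the no-predator reward table above can be written directly as a function; the threshold values R1, R2, R3 below are placeholders:

# Sketch of the no-predator reward table (R1 < R2 < R3 are assumed
# distance thresholds; actions a1..a4 as defined on the previous slide).
R1, R2, R3 = 1, 2, 3   # hypothetical threshold values

def reward(R, action):
    """Reward for taking `action` when the partner is at floored distance R."""
    if 0 < R <= R1:
        return 1 if action == "a4" else -1   # too close: only repulsion is rewarded
    if R1 < R <= R2:
        return 1 if action == "a2" else -1   # mid range: parallel positive orientation
    if R2 < R <= R3:
        return 1 if action == "a1" else -1   # drifting away: attraction is rewarded
    return 0                                  # beyond R3: no feedback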

Page 14: Machine Learning and Motion Planning

Reward Function - predator

• Distances R1, R2, R3 s.t. R1 < R2 < R3

• Rewards r(st, at):
  – 0 < [R] ≤ R3: r = 1 for a4; r = -1 for a1, a2, a3
  – R3 < [R]: r = 0 for a1, a2, a3, a4

Page 15: Machine Learning and Motion Planning

Don’t repeat work!!

• Basic planners work from scratch
  – Ex: path planning for parking; there is no difference between the first time and the hundredth time

• Ideally, learn some general, higher-level “strategies” that can be reused

• General solution patterns in the problem space

Page 16: Machine Learning and Motion Planning

Viability Filtering [KP07]

• The agent can “see” perceptual information
  – Range-finder-like virtual sensors

• Database of successful, perceptually parameterized motions
  – From the agent’s own experimentation or an external source

• The database is exploited for future queries
  – Search based on what has previously been successful in similar situations

Page 17: Machine Learning and Motion Planning

Sensors in Viability Filtering – some definitions

• X – the set of agent states
• E – the set of environment states
• Define the augmented state x+ ∈ X+, where X+ = {x+ = (x, e) | x ∈ X, e ∈ E}
• Sensor function σi(x+): X+ → R
• At a specific augmented state x+ ∈ X+, define the sensor state s = (σ1(x+), …, σn(x+))
• And the sensor space s ∈ S, where S is the set of all sensor state values

Page 18: Machine Learning and Motion Planning

Finally

• Define the locally situated state of the agent ξ = (s, x′) ∈ Ξ, where x′ is some state information independent of the agent’s sensors.

• Now we want to collect data to train a function Φ(ξ): Ξ → {viable, nonviable}
  – Note: errors in Φ(ξ) could cause problems

Page 19: Machine Learning and Motion Planning

Check Viability not Collision

Function IS_NONVIABLE(x+)
  if is_collision(x+) then
    return True
  s := (σ1(x+), …, σn(x+))
  x′ := extract_internal_state(x+)
  ξ := (s, x′)
  return ¬Φ(ξ)
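A possible Python transcription of the pseudocode above (a sketch under assumptions, not [KP07]’s implementation): the sensor functions, internal-state extraction, collision test, and the trained classifier Φ are passed in as placeholders.

# Sketch of the viability check above. The sensors, state extraction,
# collision test, and trained classifier `phi` are placeholder callables.
def is_nonviable(x_plus, sensors, extract_internal_state, phi, is_collision):
    """Return True if the augmented state x+ is in collision or predicted nonviable."""
    if is_collision(x_plus):
        return True
    s = tuple(sigma(x_plus) for sigma in sensors)    # sensor state s = (sigma_1(x+), ..., sigma_n(x+))
    x_prime = extract_internal_state(x_plus)         # sensor-independent state information
    xi = (s, x_prime)                                # locally situated state
    return not phi(xi)                               # phi predicts viable / nonviable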

Page 20: Machine Learning and Motion Planning

Results and Further work

• Bootstrapping
• Use of history to create macroscopic plans
• Model transfer

Page 21: Machine Learning and Motion Planning

Training a Dog [B02]

• MIT lab – a system where the user interactively trains the dog using “click training”

• Uses acoustic patterns as cues for actions

• Can be taught cues from different acoustic patterns

• Can create new actions from state space search

• Simplified Q-learning based on animal training techniques

Page 22: Machine Learning and Motion Planning

Training a Dog (cont.)

• Predictable regularities
  – Animals will tend toward successful states
  – Small time window

• Maximize use of supervisor feedback
  – Limit the state space by only looking at states that matter, e.g. if utterance u followed by action a produces a reward, then utterance u is important

• Easy to train
  – Credit accumulation
  – Allowing a state-action pair to delegate credit to another state-action pair

Page 23: Machine Learning and Motion Planning

Alternatives to Q-Learning

• Q-decomposition [RZ03]
  – Complex agent modeled as a set of simpler subagents
  – Each subagent has its own reward function
  – An arbitrator decides the best actions based on “advice” from the subagents (see the sketch below)

• Figure: a simple world with initial state S0 and three terminal states SL, SU, SR, each with an associated reward of dollars and/or euros. The discount factor is γ ∈ (0, 1). [fig. from RZ03]
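A minimal sketch of the Q-decomposition arbitration step (my illustration, not the paper’s code): each subagent reports its own Q-values for the current state, and the arbitrator picks the action whose summed Q-values are largest.

# Sketch of Q-decomposition arbitration. Each subagent learns its own
# Q_j(s, a) from its own reward signal; the arbitrator combines them.
def arbitrate(state, actions, subagent_q_functions):
    """Pick the action that maximizes the sum of the subagents' Q-values."""
    def total_q(action):
        return sum(q(state, action) for q in subagent_q_functions)
    return max(actions, key=total_q)

# Usage idea: one subagent could track the "dollars" reward and another the
# "euros" reward from the figure above, each with its own Q-function.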

Page 24: Machine Learning and Motion Planning

Learning Behavior with Q-Decomp [CT06]

• Q-Decomp as the learning technique

• Reward function – Inverse Reinforcement Learning (IRL) [NR00]
  – Mimicking behavior from an “expert”

Page 25: Machine Learning and Motion Planning

Support Vector Path Planning

• An idea that uses the SVM algorithm to generate a smooth path

• Not really machine learning, but a neat application of an ML algorithm

• Here is the idea (a sketch follows below)

Page 26: Machine Learning and Motion Planning

Support Vector Path Planning
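A hedged sketch of the idea (my reading of [M06], not the paper’s code): obstacle points on each side of the desired corridor become two classes, an SVM is trained on them, and its decision boundary traces a smooth path between the obstacles. The obstacle coordinates below are hypothetical and scikit-learn is assumed.

# Sketch: use an SVM decision boundary as a smooth path between obstacles.
import numpy as np
from sklearn.svm import SVC

# Hypothetical obstacle points on each side of the desired corridor.
left_obstacles = np.array([[1.0, 2.0], [2.0, 2.2], [3.0, 2.1], [4.0, 2.3]])
right_obstacles = np.array([[1.0, 0.0], [2.0, -0.2], [3.0, 0.1], [4.0, -0.1]])

X = np.vstack([left_obstacles, right_obstacles])
y = np.array([1] * len(left_obstacles) + [0] * len(right_obstacles))

svm = SVC(kernel="rbf", C=100.0).fit(X, y)

# Points where the decision function crosses zero trace the path.
for x in np.linspace(0.5, 4.5, 9):
    ys = np.linspace(-1.0, 3.0, 200)
    scores = svm.decision_function(np.column_stack([np.full_like(ys, x), ys]))
    path_y = ys[np.argmin(np.abs(scores))]   # y where the boundary crosses this x
    print(f"path point: ({x:.2f}, {path_y:.2f})")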

Page 27: Machine Learning and Motion Planning

Videos

• Robot learning to pick up objects
  – http://www.cs.ou.edu/~fagg/movies/index.html#torso_2004

• Training a Dog
  – http://characters.media.mit.edu/projects/dobie.html

Page 28: Machine Learning and Motion Planning

References

• [NR00] A. Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In Proc. 17th International Conf. on Machine Learning, pages 663-670. Morgan Kaufmann, San Francisco, CA, 2000.

• [B02] B. Blumberg et al. Integrated learning for interactive synthetic characters. In SIGGRAPH ‘02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 417-426, New York, NY, USA, 2002. ACM Press.

• [RZ03] S. J. Russell and A. Zimdars. Q-decomposition for reinforcement learning agents. In ICML, pages 656-663, 2003.

• [MI06] K. Morihiro, T. Isokawa, H. Nishimura, and N. Matsui. Emergence of flocking behavior based on reinforcement learning. In Knowledge-Based Intelligent Information and Engineering Systems, pages 699-706, 2006.

• [CT06] T. Conde and D. Thalmann. Learnable behavioural model for autonomous virtual agents: low-level learning. In AAMAS ‘06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, pages 89-96, New York, NY, USA, 2006. ACM Press.

• [M06] J. Miura. Support vector path planning. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 2894-2899, 2006.

• [KP07] M. Kalisiak and M. van de Panne. Faster motion planning using learned local viability models. In ICRA, pages 2700-2705, 2007.

Page 29: Machine Learning and Motion Planning

Machine Learning Ref

[M07] Mehryar Mohri. Foundations of Machine Learning course notes. http://www.cs.nyu.edu/~mohri/ml07.html

[M97] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.

[RN05] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, Englewood Cliffs, NJ, 1995.

[CV95] Corinna Cortes and Vladimir Vapnik, Support-Vector Networks, Machine Learning, 20, 1995.

[V98] Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.

[KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.