Page 1: Machine Learning and Motion Planning

Machine Learning and Motion Planning

Dave Millman

October 17, 2007

Page 2: Machine Learning and Motion Planning

Machine Learning intro

• Machine Learning (ML)
  – The study of algorithms which improve automatically through experience. – Mitchell

• General description
  – Data driven
  – Extract some information from data
  – Mathematically based
    • Probability, statistics, information theory, computational learning theory, optimization

Page 3: Machine Learning and Motion Planning

A very small set of uses of ML

– Text
  • Document labeling, part-of-speech tagging, summarization

– Vision
  • Object recognition, handwriting recognition, emotion labeling, surveillance

– Sound
  • Speech recognition, music genre classification

– Finance
  • Algorithmic trading

– Medical, Biological, Chemical, on and on and on…

Page 4: Machine Learning and Motion Planning

A few types of ML

• Supervised
  – Given: labeled data
  – Usual goal: learn a function
  – Ex: SVM, Neural Networks, Boosting, etc. (see the sketch below)

• Unsupervised
  – Given: unlabeled data
  – Usual goal: cluster data, learn conditional probabilities
  – Ex: Nearest Neighbors, Decision trees
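As a hedged illustration of the supervised setting (not from the slides), here is a minimal sketch using scikit-learn's SVC; the toy data and labels are hypothetical:

# Minimal sketch of supervised learning: given labeled data, learn a function.
# Assumes scikit-learn is installed; the data points and labels are toy values.
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8]]   # labeled examples
y = [0, 0, 1, 1]                                        # their labels

clf = SVC(kernel="linear")   # an SVM classifier
clf.fit(X, y)                # "learn a function" from the labeled data
print(clf.predict([[0.1, 0.0], [0.95, 0.9]]))           # -> [0 1]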

Page 5: Machine Learning and Motion Planning

A few types of ML (cont.)

• Semi-Supervised
  – Given: labeled and unlabeled data
  – Usual goal: use the unlabeled data to augment the labeled data
  – Ex: cluster, then label the unlabeled data from the clusters

• Reinforcement
  – Given: reward function and set of actions
  – Goal: learn a function which optimizes the reward function
  – Ex: Q-learning, Ant-Q

Page 6: Machine Learning and Motion Planning

General Idea Supervised

Page 7: Machine Learning and Motion Planning

General Idea Unsupervised

Page 8: Machine Learning and Motion Planning

General Idea Semi-Supervised

Page 9: Machine Learning and Motion Planning

General Idea Reinforcement

• Markov Decision Process (MDP)
  – State space (fully or partially observable)
  – Action space (static or time dependent)
  – Transition function produces the next state (based on the present state and action, not the past)
  – Reward function (based on the action taken); see the Q-learning sketch below
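As a hedged illustration (not from the slides), tabular Q-learning maintains a table Q(s, a) over such an MDP and updates it from observed transitions; the action names and hyperparameters below are assumptions:

# Minimal tabular Q-learning sketch over an MDP (illustration only;
# states, actions, and hyperparameters are hypothetical).
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount factor, exploration rate
actions = ["a1", "a2", "a3", "a4"]
Q = defaultdict(float)                    # Q[(state, action)] -> estimated value

def choose_action(state):
    # epsilon-greedy: usually exploit current Q-values, occasionally explore
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])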

Page 10: Machine Learning and Motion Planning

Why regression is not enough…The XOR problem
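A small worked example of the XOR problem (my own illustration, not from the slides): an ordinary least-squares linear fit predicts 0.5 everywhere, while adding a single nonlinear feature (x1·x2) recovers the labels exactly.

# Why a linear model cannot represent XOR (illustrative sketch).
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)           # XOR labels

# Least-squares fit of y ~ w.x + b
A = np.hstack([X, np.ones((4, 1))])                # add a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(A @ w)          # ~[0.5 0.5 0.5 0.5]: the best linear fit is useless

# Adding a quadratic feature (x1*x2) makes the problem solvable by a linear model.
A2 = np.hstack([X, X[:, :1] * X[:, 1:], np.ones((4, 1))])
w2, *_ = np.linalg.lstsq(A2, y, rcond=None)
print(np.round(A2 @ w2, 3))   # recovers [0, 1, 1, 0]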

Page 11: Machine Learning and Motion Planning

Textbook Q-Learning [MI06]

• Learning flocking behavior
  – N agents
  – Discrete time steps
  – Agent i with partner j
  – Define the Q-state Q(st, at)
    • st – state at time t
    • at – action at time t

Page 12: Machine Learning and Motion Planning

Our textbook example

– State of i
  • [R] = floor(|i − j|), the floored distance between agent i and its partner j

– Actions for i
  • a1 – Attract to j
  • a2 – Parallel positive orientation to j
  • a3 – Parallel negative orientation to j
  • a4 – Repulsion from j

Page 13: Machine Learning and Motion Planning

Reward Function - no predator

• Distances R1, R2, R3 s.t. R1 < R2 < R3

• Rewards r(st, at):
  – 0 < [R] ≤ R1: r = 1 for a4; r = -1 for a1, a2, a3
  – R1 < [R] ≤ R2: r = 1 for a2; r = -1 for a1, a3, a4
  – R2 < [R] ≤ R3: r = 1 for a1; r = -1 for a2, a3, a4
  – R3 < [R]: r = 0 for a1, a2, a3, a4
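As a hedged sketch (not the authors' code), the no-predator reward table above can be written directly as a function; the threshold values R1, R2, R3 below are placeholders:

# Sketch of the no-predator reward table (R1 < R2 < R3 are assumed
# distance thresholds; actions a1..a4 as defined on the previous slide).
R1, R2, R3 = 1, 2, 3   # hypothetical threshold values

def reward(R, action):
    """Reward for taking `action` when the partner is at floored distance R."""
    if 0 < R <= R1:
        return 1 if action == "a4" else -1   # too close: only repulsion is rewarded
    if R1 < R <= R2:
        return 1 if action == "a2" else -1   # mid range: parallel positive orientation
    if R2 < R <= R3:
        return 1 if action == "a1" else -1   # drifting away: attraction is rewarded
    return 0                                  # beyond R3: no feedback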

Page 14: Machine Learning and Motion Planning

Reward Function - predator

• Distances R1, R2, R3 s.t. R1 < R2 < R3

• Rewards r(st, at):
  – 0 < [R] ≤ R3: r = 1 for a4; r = -1 for a1, a2, a3
  – R3 < [R]: r = 0 for a1, a2, a3, a4

Page 15: Machine Learning and Motion Planning

Don’t repeat work!!

• Basic planners work from scratch
  – Ex: path planning for parking; there is no difference between the first time and the hundredth time

• Ideally, learn some general, higher-level “strategies” that can be reused

• General solution patterns in the problem space

Page 16: Machine Learning and Motion Planning

Viability Filtering [KP07]

• The agent can “see” perceptual information
  – Range-finder-like virtual sensors

• Database of successful, perceptually parameterized motions
  – From the agent’s own experimentation or an external source

• The database is exploited for future queries
  – Search based on what has previously been successful in similar situations

Page 17: Machine Learning and Motion Planning

Sensors in Viability Filtering – some definitions

• X – the set of agent states
• E – the set of environment states
• Define the augmented state x+ ∈ X+, where X+ = {x+ = (x, e) | x ∈ X, e ∈ E}
• Sensor function σi(x+): X+ → R
• At a specific augmented state x+ ∈ X+, define the sensor state s = (σ1(x+), …, σn(x+))
• And the sensor space s ∈ S, where S is the set of all sensor state values

Page 18: Machine Learning and Motion Planning

Finally

• Define the locally situated state of the agent ξ = (s, x′) ∈ Ξ, where x′ is some state information independent of the agent’s sensors.

• Now we want to collect data to train a function Φ(ξ): Ξ → {viable, nonviable}
  – Note: errors in Φ(ξ) could cause problems

Page 19: Machine Learning and Motion Planning

Check Viability not Collision

Function IS_NONVIABLE(x+)
  if is_collision(x+) then
    return True
  s := (σ1(x+), …, σn(x+))
  x′ := extract_internal_state(x+)
  ξ := (s, x′)
  return ¬Φ(ξ)
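A possible Python transcription of the pseudocode above (a sketch under assumptions, not [KP07]’s implementation): the sensor functions, internal-state extraction, collision test, and the trained classifier Φ are passed in as placeholders.

# Sketch of the viability check above. The sensors, state extraction,
# collision test, and trained classifier `phi` are placeholder callables.
def is_nonviable(x_plus, sensors, extract_internal_state, phi, is_collision):
    """Return True if the augmented state x+ is in collision or predicted nonviable."""
    if is_collision(x_plus):
        return True
    s = tuple(sigma(x_plus) for sigma in sensors)    # sensor state s = (sigma_1(x+), ..., sigma_n(x+))
    x_prime = extract_internal_state(x_plus)         # sensor-independent state information
    xi = (s, x_prime)                                # locally situated state
    return not phi(xi)                               # phi predicts viable / nonviable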

Page 20: Machine Learning and Motion Planning

Results and Further work

• Bootstrapping
• Use of history to create macroscopic plans
• Model transfer

Page 21: Machine Learning and Motion Planning

Training a Dog [B02]

• MIT lab – a system where the user interactively trains the dog using “click training”

• Uses acoustic patterns as cues for actions

• Can be taught cues from different acoustic patterns

• Can create new actions from state space search

• Simplified Q-learning based on animal training techniques

Page 22: Machine Learning and Motion Planning

Training a Dog (cont.)

• Predictable regularities
  – Animals will tend toward successful states
  – Small time window

• Maximize use of supervisor feedback
  – Limit the state space by only looking at states that matter, e.g. if utterance u followed by action a produces a reward, then utterance u is important

• Easy to train
  – Credit accumulation
  – Allowing a state-action pair to delegate credit to another state-action pair

Page 23: Machine Learning and Motion Planning

Alternatives to Q-Learning

• Q-decomposition [RZ03]
  – Complex agent modeled as a set of simpler subagents
  – Each subagent has its own reward function
  – An arbitrator decides the best actions based on “advice” from the subagents (see the sketch below)

• Figure: a simple world with initial state S0 and three terminal states SL, SU, SR, each with an associated reward of dollars and/or euros. The discount factor is γ ∈ (0, 1). [fig. from RZ03]
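A minimal sketch of the Q-decomposition arbitration step (my illustration, not the paper’s code): each subagent reports its own Q-values for the current state, and the arbitrator picks the action whose summed Q-values are largest.

# Sketch of Q-decomposition arbitration. Each subagent learns its own
# Q_j(s, a) from its own reward signal; the arbitrator combines them.
def arbitrate(state, actions, subagent_q_functions):
    """Pick the action that maximizes the sum of the subagents' Q-values."""
    def total_q(action):
        return sum(q(state, action) for q in subagent_q_functions)
    return max(actions, key=total_q)

# Usage idea: one subagent could track the "dollars" reward and another the
# "euros" reward from the figure above, each with its own Q-function.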

Page 24: Machine Learning and Motion Planning

Learning Behavior with Q-Decomp [CT06]

• Q-Decomp as the learning technique

• Reward function – Inverse Reinforcement Learning (IRL) [NR00]
  – Mimicking behavior from an “expert”

Page 25: Machine Learning and Motion Planning

Support Vector Path Planning

• An idea that uses the SVM algorithm to generate a smooth path

• Not really machine learning, but a neat application of an ML algorithm

• Here is the idea (a sketch follows below)

Page 26: Machine Learning and Motion Planning

Support Vector Path Planning
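A hedged sketch of the idea (my reading of [M06], not the paper’s code): obstacle points on each side of the desired corridor become two classes, an SVM is trained on them, and its decision boundary traces a smooth path between the obstacles. The obstacle coordinates below are hypothetical and scikit-learn is assumed.

# Sketch: use an SVM decision boundary as a smooth path between obstacles.
import numpy as np
from sklearn.svm import SVC

# Hypothetical obstacle points on each side of the desired corridor.
left_obstacles = np.array([[1.0, 2.0], [2.0, 2.2], [3.0, 2.1], [4.0, 2.3]])
right_obstacles = np.array([[1.0, 0.0], [2.0, -0.2], [3.0, 0.1], [4.0, -0.1]])

X = np.vstack([left_obstacles, right_obstacles])
y = np.array([1] * len(left_obstacles) + [0] * len(right_obstacles))

svm = SVC(kernel="rbf", C=100.0).fit(X, y)

# Points where the decision function crosses zero trace the path.
for x in np.linspace(0.5, 4.5, 9):
    ys = np.linspace(-1.0, 3.0, 200)
    scores = svm.decision_function(np.column_stack([np.full_like(ys, x), ys]))
    path_y = ys[np.argmin(np.abs(scores))]   # y where the boundary crosses this x
    print(f"path point: ({x:.2f}, {path_y:.2f})")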

Page 27: Machine Learning and Motion Planning

Videos

• Robot learning to pick up objects
  – http://www.cs.ou.edu/~fagg/movies/index.html#torso_2004

• Training a Dog
  – http://characters.media.mit.edu/projects/dobie.html

Page 28: Machine Learning and Motion Planning

References

• [NR00] A. Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In Proc. 17th International Conf. on Machine Learning, pages 663-670. Morgan Kaufmann, San Francisco, CA, 2000.

• [B02] B. Blumberg et al. Integrated learning for interactive synthetic characters. In SIGGRAPH ‘02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 417-426, New York, NY, USA, 2002. ACM Press.

• [RZ03] S. J. Russell and A. Zimdars. Q-decomposition for reinforcement learning agents. In ICML, pages 656-663, 2003.

• [MI06] K. Morihiro, T. Isokawa, H. Nishimura, and N. Matsui. Emergence of flocking behavior based on reinforcement learning. In Knowledge-Based Intelligent Information and Engineering Systems, pages 699-706, 2006.

• [CT06] T. Conde and D. Thalmann. Learnable behavioural model for autonomous virtual agents: low-level learning. In AAMAS ‘06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, pages 89-96, New York, NY, USA, 2006. ACM Press.

• [M06] J. Miura. Support vector path planning. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 2894-2899, 2006.

• [KP07] M. Kalisiak and M. van de Panne. Faster motion planning using learned local viability models. In ICRA, pages 2700-2705, 2007.

Page 29: Machine Learning and Motion Planning

Machine Learning Ref

[M07] Mehryar Mohri. Foundations of Machine Learning course notes. http://www.cs.nyu.edu/~mohri/ml07.html

[M97] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.

[RN05] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, Englewood Cliffs, NJ, 1995.

[CV95] Corinna Cortes and Vladimir Vapnik, Support-Vector Networks, Machine Learning, 20, 1995.

[V98] Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.

[KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.