COMP 4180: Intelligent Mobile Robotics
Reinforcement Learning
Jacky Baltes
Department of Computer Science
University of Manitoba
http://www4.cs.umanitoba.ca/~jacky/...Teaching/Courses/COMP_4180-IntelligentMobileRobotics/current/index.php
Outline
● Reinforcement Learning Problem
  – Dynamic Programming
  – Control learning
  – Control policies that choose optimal actions
  – Q Learning
  – Convergence
● Monte-Carlo Methods
● Temporal Difference Learning
Control Learning
Example: TD-Gammon
Reinforcement Learning Problem
Markov Decision Processes
Agent's Learning Task
State Value Function
Bellman Equation (Deterministic Case)
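For reference, the discounted value of a policy and its deterministic Bellman equation can be written as below. The notation is an assumption: a discount factor gamma in [0, 1), a reward function r(s, a), and a deterministic transition function delta(s, a), matching the learned model delta-hat listed under the open research questions at the end.

```latex
V^{\pi}(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}

V^{\pi}(s) = r\big(s, \pi(s)\big) + \gamma \, V^{\pi}\big(\delta(s, \pi(s))\big)
```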
Example
Iterative Policy Evaluation
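A minimal sketch of iterative policy evaluation, assuming a hypothetical deterministic 4-state chain (`chain_step`) in which moving right from state 3 ends the episode with reward 1. Each sweep applies the Bellman backup V(s) <- r + gamma * V(s') in place until the largest change in a sweep falls below `theta`.

```python
def evaluate_policy(states, policy, step, gamma=0.9, theta=1e-8):
    """Sweep V(s) <- r + gamma * V(s') until the largest change is below theta.

    step(s, a) -> (reward, next_state, done) is a hypothetical deterministic model.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            r, s_next, done = step(s, policy[s])
            new_v = r + (0.0 if done else gamma * V[s_next])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update: later states in the sweep see the new value
        if delta < theta:
            return V


def chain_step(s, a):
    """Hypothetical 4-state chain: 'right' from state 3 ends the episode with reward 1."""
    if a == "right":
        if s == 3:
            return 1.0, s, True
        return 0.0, s + 1, False
    return 0.0, max(s - 1, 0), False


states = [0, 1, 2, 3]
always_right = {s: "right" for s in states}
V = evaluate_policy(states, always_right, chain_step)
print({s: round(v, 3) for s, v in V.items()})  # {0: 0.729, 1: 0.81, 2: 0.9, 3: 1.0}
```

Under the always-right policy each state's value is the goal reward discounted once per remaining step.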
What to learn?
Q (Action-Value) Function
Bellman Equation (Deterministic Case)
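In the same assumed notation (reward r(s, a), deterministic transition delta(s, a), discount gamma), the action-value function and its recursive form are shown below; the last term also gives the optimal policy as the greedy choice with respect to Q.

```latex
Q(s, a) = r(s, a) + \gamma \, V^{*}\big(\delta(s, a)\big), \qquad V^{*}(s) = \max_{a'} Q(s, a')

Q(s, a) = r(s, a) + \gamma \max_{a'} Q\big(\delta(s, a), a'\big), \qquad \pi^{*}(s) = \arg\max_{a} Q(s, a)
```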
Optimal Value Functions
Policy Improvement
Example
Generalized Policy Iteration
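Generalized policy iteration alternates evaluation and greedy improvement. The sketch below is plain policy iteration on a hypothetical deterministic 4-state chain, starting from a deliberately poor always-left policy. The improvement step keeps the current action on ties, an assumed tie-breaking rule that stops the loop from oscillating between equally good actions.

```python
ACTIONS = ("left", "right")


def chain_step(s, a):
    """Hypothetical deterministic 4-state chain: 'right' from state 3 ends the episode with reward 1."""
    if a == "right":
        return (1.0, s, True) if s == 3 else (0.0, s + 1, False)
    return 0.0, max(s - 1, 0), False


def backup(V, s, a, gamma):
    """One-step lookahead value r + gamma * V(s') of taking action a in state s."""
    r, s_next, done = chain_step(s, a)
    return r + (0.0 if done else gamma * V[s_next])


def policy_iteration(states, gamma=0.9, theta=1e-8):
    policy = {s: "left" for s in states}  # deliberately poor starting policy
    while True:
        # Policy evaluation: sweep until V^pi stops changing.
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                new_v = backup(V, s, policy[s], gamma)
                delta = max(delta, abs(new_v - V[s]))
                V[s] = new_v
            if delta < theta:
                break
        # Policy improvement: act greedily w.r.t. V, keeping the current action on ties.
        stable = True
        for s in states:
            best = max(ACTIONS, key=lambda a: backup(V, s, a, gamma))
            if backup(V, s, best, gamma) > backup(V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration([0, 1, 2, 3])
print(policy)  # {0: 'right', 1: 'right', 2: 'right', 3: 'right'}
```

Each improvement pass can only flip a state once its successor's value has become positive, so the "right" decisions propagate back from the goal one state per iteration.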
Value Iteration: Q-Learning
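A sketch of the deterministic Q-learning rule Q-hat(s, a) <- r + gamma * max_a' Q-hat(s', a'), assuming a hypothetical deterministic 4-state chain (`chain_step`, defined here so the block stands alone) and uniformly random exploration, which is one simple choice rather than the only one. Because the world is deterministic, each update overwrites the table entry without a learning rate.

```python
import random

ACTIONS = ("left", "right")


def chain_step(s, a):
    """Hypothetical deterministic 4-state chain: 'right' from state 3 ends the episode with reward 1."""
    if a == "right":
        return (1.0, s, True) if s == 3 else (0.0, s + 1, False)
    return 0.0, max(s - 1, 0), False


def q_learning_deterministic(n_states=4, gamma=0.9, episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(n_states)  # random start state
        for _ in range(50):  # cap the episode length
            a = rng.choice(ACTIONS)  # explore with uniformly random actions
            r, s_next, done = chain_step(s, a)
            # Deterministic world: overwrite with r + gamma * max_a' Q(s', a'); no learning rate needed.
            Q[(s, a)] = r if done else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
            if done:
                break
            s = s_next
    return Q


Q = q_learning_deterministic()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)}
print(greedy)
```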
Non-deterministic Case
Bellman Equations (Non-deterministic Case)
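When rewards and transitions are random, the deterministic recursion holds in expectation. Writing P(s' | s, a) for the assumed transition probabilities, the action-value function becomes:

```latex
Q(s, a) = E\big[ r(s, a) + \gamma \, V^{*}(\delta(s, a)) \big]
        = E\big[ r(s, a) \big] + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q(s', a')
```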
Value Iteration: Q-Learning
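In the non-deterministic case the overwrite is replaced by a weighted average with a decaying learning rate alpha_n = 1 / (1 + visits_n(s, a)). The sketch below shows one such update function; the one-step "pull" task, paying 1 with probability 0.7 and 0 otherwise, is a hypothetical stand-in chosen because the estimate should simply approach the expected reward.

```python
import random
from collections import defaultdict


def q_update(Q, visits, s, a, r, s_next, done, actions, gamma=0.9):
    """One non-deterministic Q-learning step with alpha_n = 1 / (1 + visits_n(s, a)).

    The decaying rate averages noisy targets instead of overwriting the estimate.
    """
    visits[(s, a)] += 1
    alpha = 1.0 / (1.0 + visits[(s, a)])
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1.0 - alpha) * Q[(s, a)] + alpha * target
    return Q[(s, a)]


# Hypothetical one-step task: action "pull" in state 0 pays 1 with probability 0.7, else 0.
rng = random.Random(42)
Q = defaultdict(float)
visits = defaultdict(int)
for _ in range(20000):
    r = 1.0 if rng.random() < 0.7 else 0.0
    q_update(Q, visits, 0, "pull", r, None, True, actions=("pull",))
print(round(Q[(0, "pull")], 3))  # should be close to the expected reward of 0.7
```

With this schedule the estimate after n visits equals the sum of the n observed rewards divided by n + 1, so it behaves like a running average with one extra weight on the initial zero.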
Example
Reinforcement Learning
Monte-Carlo Methods: Policy Evaluation
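A sketch of first-visit Monte-Carlo policy evaluation, assuming a hypothetical 5-state random walk (`random_walk_episode`) that starts in the middle state, steps left or right with equal probability, and pays +1 only when it leaves on the right. Each state's value is the average of the returns that followed its first visit in each episode; with gamma = 1 the true values of states 1 to 5 are 1/6, 2/6, ..., 5/6, which gives a simple check.

```python
import random


def random_walk_episode(rng, n=5):
    """Hypothetical n-state random walk; returns a list of (state, reward) pairs."""
    s = (n + 1) // 2  # start in the middle
    episode = []
    while True:
        s_next = s + rng.choice((-1, 1))
        r = 1.0 if s_next == n + 1 else 0.0  # +1 only for leaving on the right
        episode.append((s, r))
        if s_next in (0, n + 1):
            return episode
        s = s_next


def first_visit_mc(num_episodes=5000, gamma=1.0, seed=0, n=5):
    rng = random.Random(seed)
    returns_sum = [0.0] * (n + 2)
    returns_cnt = [0] * (n + 2)
    for _ in range(num_episodes):
        episode = random_walk_episode(rng, n)
        G = 0.0
        first_visit_returns = {}
        # Walk backwards so G is the discounted return from each step; overwriting
        # the dict entry leaves the return that followed the earliest (first) visit.
        for s, r in reversed(episode):
            G = r + gamma * G
            first_visit_returns[s] = G
        for s, G_s in first_visit_returns.items():
            returns_sum[s] += G_s
            returns_cnt[s] += 1
    return [returns_sum[s] / returns_cnt[s] if returns_cnt[s] else 0.0 for s in range(n + 2)]


V = first_visit_mc()
print([round(v, 2) for v in V[1:-1]])
```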
Temporal Difference (TD) Learning
TD(0): Policy Evaluation
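A sketch of TD(0) policy evaluation on a hypothetical 5-state random walk (start in the middle, equiprobable left/right steps, +1 only for leaving on the right). Instead of waiting for the episode's return, each step moves V(s) toward the bootstrapped target r + gamma * V(s'). The constant step size alpha = 0.002 is an assumption, chosen small so the estimates settle near the true values 1/6, ..., 5/6.

```python
import random


def td0_random_walk(num_episodes=20000, alpha=0.002, gamma=1.0, seed=0, n=5):
    """TD(0) evaluation of the equiprobable random policy on a hypothetical n-state random walk.

    States 1..n are non-terminal; 0 and n + 1 are terminal and keep value 0.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * n + [0.0]
    for _ in range(num_episodes):
        s = (n + 1) // 2  # start in the middle state
        while s not in (0, n + 1):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n + 1 else 0.0  # +1 only for leaving on the right
            # TD(0): move V(s) toward the one-step bootstrapped target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V


V = td0_random_walk()
print([round(v, 2) for v in V[1:-1]])
```

With a constant alpha the estimates keep fluctuating around the true values; a larger alpha learns faster but fluctuates more.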
ε-Greedy Policy
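A minimal ε-greedy action selector, assuming Q is stored as a dictionary keyed by (state, action) pairs, a representation chosen here for illustration. With probability epsilon it explores uniformly; otherwise it exploits, breaking ties at random. A random pick can still land on the greedy action, so with epsilon = 0.1 and two actions the greedy action is chosen about 95% of the time.

```python
import random


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, otherwise a greedy one.

    Q is assumed to be a dict keyed by (state, action); ties between greedy actions are broken at random.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(s, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(s, a)] == best])


rng = random.Random(0)
Q = {(0, "a"): 1.0, (0, "b"): 0.0}
picks = [epsilon_greedy(Q, 0, ("a", "b"), 0.1, rng) for _ in range(10000)]
frac = picks.count("a") / len(picks)
print(frac)  # roughly 0.9 + 0.1 / 2 = 0.95
```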
SARSA Policy Iteration
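A sketch of SARSA on a hypothetical deterministic 4-state chain, with the chain and an ε-greedy selector defined inline so the block stands alone. The update Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)] uses the action a' that the ε-greedy policy actually takes next, which is what makes SARSA on-policy. The values of alpha, gamma and epsilon are illustrative assumptions.

```python
import random

ACTIONS = ("left", "right")


def chain_step(s, a):
    """Hypothetical 4-state chain: 'right' from state 3 ends the episode with reward 1."""
    if a == "right":
        return (1.0, s, True) if s == 3 else (0.0, s + 1, False)
    return 0.0, max(s - 1, 0), False


def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])


def sarsa(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        for _ in range(100):  # cap the episode length
            r, s_next, done = chain_step(s, a)
            if done:
                Q[(s, a)] += alpha * (r - Q[(s, a)])  # terminal: no bootstrapped term
                break
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)  # on-policy: the action actually taken next
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
    return Q


Q = sarsa()
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)})
```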
SARSA Example
SARSA Example: V(s)
SARSA Example: Q(s,a)
Rotational Inverted Pendulum
Rotational Inverted Pendulum Stabilization Demo, Tor Aamodt: http://www.eecg.utoronto.ca/~aamodt/BAScThesis/RLsim.htm
Q-Learning (Off-Policy TD)
Q-Learning (Off-Policy Iteration)
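For contrast with SARSA, a sketch of one-step Q-learning on a hypothetical deterministic 4-state chain (defined inline). The behaviour policy is still ε-greedy, but the target uses max_a' Q(s', a'), the value of the greedy policy, regardless of which action is taken next; this is what makes it off-policy. In this chain the learned Q(0, right) should approach the optimal value gamma^3 = 0.729.

```python
import random

ACTIONS = ("left", "right")


def chain_step(s, a):
    """Hypothetical 4-state chain: 'right' from state 3 ends the episode with reward 1."""
    if a == "right":
        return (1.0, s, True) if s == 3 else (0.0, s + 1, False)
    return 0.0, max(s - 1, 0), False


def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the episode length
            a = epsilon_greedy(Q, s, epsilon, rng)  # behaviour policy: epsilon-greedy
            r, s_next, done = chain_step(s, a)
            # Off-policy target: greedy value of s', whatever action is actually taken next
            target = r if done else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if done:
                break
            s = s_next
    return Q


Q = q_learning()
print(round(Q[(0, "right")], 3))
```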
TD vs Monte Carlo
Temporal Difference Learning
Monte Carlo Method
N-Step return
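The n-step return and the lambda-return that TD(lambda) approximates can be written as follows, using the convention that r_{t+1} is the reward received after acting at time t. With n = 1 the n-step return is the TD(0) target; in an episodic task, any n-step return that reaches past the end of the episode equals the full Monte-Carlo return.

```latex
R_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n} V(s_{t+n})

R_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} R_t^{(n)}
```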
TD(λ) Learning
Eligibility Traces
On-line TD(λ)
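A sketch of on-line TD(λ) with accumulating eligibility traces on a hypothetical 5-state random walk (start in the middle, equiprobable steps, +1 only for leaving on the right). Every step computes one TD error, bumps the trace of the current state, updates all states in proportion to their traces, and then decays every trace by gamma * lambda. The values lambda = 0.5 and alpha = 0.001 are illustrative assumptions.

```python
import random


def td_lambda_random_walk(num_episodes=15000, alpha=0.001, gamma=1.0, lam=0.5, seed=0, n=5):
    """On-line TD(lambda) with accumulating eligibility traces on a hypothetical n-state random walk."""
    rng = random.Random(seed)
    V = [0.0] + [0.5] * n + [0.0]  # terminal states 0 and n + 1 stay at 0
    for _ in range(num_episodes):
        e = [0.0] * (n + 2)  # eligibility traces, reset at the start of each episode
        s = (n + 1) // 2
        while s not in (0, n + 1):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n + 1 else 0.0
            delta = r + gamma * V[s_next] - V[s]  # one TD error per step
            e[s] += 1.0  # accumulating trace for the state just left
            for x in range(1, n + 1):
                V[x] += alpha * delta * e[x]  # every state is updated in proportion to its trace
                e[x] *= gamma * lam  # then its trace decays
            s = s_next
    return V


V = td_lambda_random_walk()
print([round(v, 2) for v in V[1:-1]])
```

Setting lam = 0 reduces this to TD(0); lam = 1 makes each episode's total update match the Monte-Carlo one, up to the effect of updating values during the episode.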
Function Approximation
Stochastic Gradient Descent
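One way to instantiate the stochastic-gradient update, assuming a linear approximator and a TD(0) target on a hypothetical 5-state random walk: with V-hat(s) = w . phi(s) the gradient with respect to w is just phi(s), so each step is w <- w + alpha * [r + gamma * V-hat(s') - V-hat(s)] * phi(s). Treating the bootstrapped target as a constant in this way is usually called a semi-gradient method. The two features (a bias and the centred position) are hypothetical; they happen to represent the walk's true values 1/6, ..., 5/6 exactly, so the fit can be checked.

```python
import random


def features(s):
    """Hypothetical 2-feature representation of random-walk state s in 1..5: [bias, centred position]."""
    return (1.0, (s - 3) / 2.0)


def v_hat(w, s):
    """Linear value estimate w . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w, features(s)))


def linear_td0(num_episodes=20000, alpha=0.002, gamma=1.0, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(num_episodes):
        s = 3  # start in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            terminal = s_next in (0, 6)
            r = 1.0 if s_next == 6 else 0.0
            target = r if terminal else r + gamma * v_hat(w, s_next)
            # Semi-gradient step: the gradient of the linear v_hat with respect to w is phi(s)
            delta = target - v_hat(w, s)
            w = [wi + alpha * delta * xi for wi, xi in zip(w, features(s))]
            if terminal:
                break
            s = s_next
    return w


w = linear_td0()
print([round(v_hat(w, s), 2) for s in range(1, 6)])
```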
Convergence
Subtleties and Ongoing Research
● Replace the Q̂ table with a neural net or other generalizer
● Handle cases where the state is only partially observable
● Design optimal exploration strategies
● Extend to continuous actions and states
● Learn and use δ̂: S × A → S
● Relationship to dynamic programming
References
● Reinforcement Learning: An Introduction. Richard S. Sutton, Andrew G. Barto. MIT Press, 1998. http://www-anw.cs.umass.edu/~rich/book/the-book.html
● Neuro-Dynamic Programming. Dimitri Bertsekas, John Tsitsiklis. Athena Scientific, 1996.
● Reinforcement Learning: A Tutorial. M. Harmon, S. Harmon.
● Reinforcement Learning: A Survey. L. Kaelbling et al. Journal of Artificial Intelligence Research, Vol. 4, pp. 237-285.
● How to Make Software Agents Do the Right Thing: An Introduction to Reinforcement Learning. S. Singh, P. Norvig, D. Cohn.
● Reinforcement Learning Software:
  – http://www-anw.cs.umass.edu/~rich/software.html
  – http://www.cse.msu.edu/rlr/domains.html
● Reinforcement Learning for Humanoid Robots
● Frank Hoffman. http://www.nada.kth.se/kurser/kth/2D1431/02/index.html