Towards Real-Life Reinforcement Learning · mlittman/talks/ai04-rl.pdf · 2004-05-19
Towards Real-Life Reinforcement Learning
Michael L. Littman
Rutgers University
Department of Computer Science
Rutgers Laboratory for Real-Life Reinforcement Learning
Where We’re Going
Introduce reinforcement learning
• why I think it’s exciting
Define the problem and current approaches
• highlight challenges of RL with real data
Current projects in my lab:
• efficient exploration
• rich sensors
• partial observability
• non-stationary environments
Why RL? Why AI?
• Make “Your plastic pal who’s fun to be with”
• Solve challenging problems in computer science
• Understand humanity
• Create useful tools
• Produce a decent movie
Creating Human-Level AI
Significant motivator in the early days
Big question still unanswered:
What kind of information do we need to put into our programs for them to be intelligent?
How ought we program intelligent machines?
• Program behavior?
• Program desires?
(Likely long haul even after we answer this.)
Impressive Accomplishment
Honda’s Asimo
• development began in 1999, building on 13 years of engineering experience.
Additional information helps to make the right choice.
How to Learn a Model?
Assume repair episodes are iid.
In each episode, some mix of actions taken.
New episode:
• Assume it is one of the previous episodes.
• Time-minimum plan under this assumption.
• “State” is subset of past episodes.
Instance-based approach to partial observability.
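The instance-based idea on this slide can be sketched in a few lines. This is an illustrative reconstruction, not the talk's actual code: episodes and the greedy action choice below (a stand-in for true time-minimum planning) are hypothetical, and episodes are represented as lists of (action, observation) steps.

```python
# Instance-based partial observability (sketch): the "state" is the
# subset of past repair episodes whose prefix matches what we've
# seen so far in the current episode.

def consistent_episodes(episodes, history):
    """Return past episodes whose prefix matches the current history."""
    return [ep for ep in episodes
            if len(ep) >= len(history) and ep[:len(history)] == history]

def choose_action(episodes, history):
    """Greedy stand-in for time-minimum planning: among episodes
    consistent with the history, pick the next action of the one
    that finishes soonest. Returns None if nothing matches."""
    candidates = consistent_episodes(episodes, history)
    t = len(history)
    best, best_cost = None, float("inf")
    for ep in candidates:
        if t >= len(ep):
            continue  # that episode already ended here
        remaining = len(ep) - t  # steps left if this episode repeats
        if remaining < best_cost:
            best, best_cost = ep[t][0], remaining
    return best
```

Under the slide's assumption that each new episode is a repeat of some previous one, tracking the consistent subset is exactly the belief-state update, done over instances rather than latent states.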
Learning Network Troubleshooting
Recovery from corrupted network interface configuration.
Java/Windows XP: Minimize time to repair.
After 95 failure episodes
Non-Stationary Environments
Problem: To predict future events in the face of abrupt changes in the environment.
(Diagram: two options with income rates 2/3 and 1/3, and matching investment rates 2/3 and 1/3.)
Animal behavior: Match investment to income given multiple options.
Observation (Gallistel et al.): Abrupt changes in payoff rates result in abrupt changes in investment rates. Proposed change-detection algorithm.
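A minimal change-detector in the spirit of this slide compares a recent windowed reward rate against the long-run rate and flags a change when they diverge. This is a generic illustration with made-up parameters, not the specific algorithm proposed in the talk:

```python
# Flag an abrupt change in payoff rate: compare the rate over a
# recent window with the rate over all preceding data.

def detect_change(rewards, window=10, threshold=0.3):
    """Return the first index where the windowed rate deviates from
    the overall rate by more than `threshold`, or None."""
    for t in range(window, len(rewards)):
        recent = sum(rewards[t - window:t]) / window
        overall = sum(rewards[:t]) / t
        if abs(recent - overall) > threshold:
            return t
    return None
```

On a payoff stream that drops from 1 to 0 midway, the detector fires a few steps after the drop (the lag set by the window size); on a stationary stream it stays silent, matching the slide's contrast between abrupt change and steady investment.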
with Diuk, Sharma
Recognizing Changes in Disk Access
• Under real usage conditions, abrupt changes between usage modes.
• Detecting abrupt changes to mode can save energy.
Portable computers use techniques such as disk spin-down/up to conserve energy. Goal: given the history of the user's disk accesses, predict how long it will be until the next disk access occurs.
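The spin-down decision above can be illustrated with a simple predictor. This is a sketch with assumed parameters (the break-even constant and the exponential-average predictor are illustrative, not the talk's method):

```python
# Spin the disk down only when the predicted idle period exceeds the
# break-even time (the idle duration at which spin-down/up pays off).

BREAK_EVEN_SECS = 5.0  # assumed break-even point, for illustration

def predict_idle(gaps, alpha=0.5):
    """Exponentially weighted average of recent inter-access gaps
    (seconds); more recent gaps count more."""
    estimate = gaps[0]
    for g in gaps[1:]:
        estimate = alpha * g + (1 - alpha) * estimate
    return estimate

def should_spin_down(gaps):
    """Decide whether to spin down, given recent inter-access gaps."""
    return predict_idle(gaps) > BREAK_EVEN_SECS
```

A predictor like this only helps under the slide's premise that usage switches between modes: within a mode, recent gaps are informative; at a mode boundary, change detection (previous slide) is what keeps the estimate from lagging.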
Where We Went
Reinforcement learning: Lots of progress.
Let’s reconnect learning with real data:
• previous ideas contribute significantly
• model-based approaches showing promise
• new twists needed
• some fundamental new ideas needed
– representation
– reward
– reasoning about change and partial observability