1
Endgame Logistics
- Final Project Presentations
  - Tuesday, March 19, 3-5, KEC2057
  - PowerPoint suggested (email to me before class)
  - Can use your own laptop if necessary (e.g. a demo)
  - 10 minutes of presentation per project
    - Not including questions
- Final Project Reports
  - Due: Friday, March 22, 12 noon
2
[Diagram: the agent-world loop (World, Percepts, Actions, Objective), annotated with the problem dimensions used throughout the course: perfect vs. noisy percepts; fully observable vs. partially observable; instantaneous vs. durative; deterministic vs. stochastic; sole source of change vs. other sources; concurrent actions vs. single action; goal satisfaction vs. general reward; known world model vs. unknown; numeric vs. discrete.]
3
[Same agent-world diagram, here locating STRIPS Planning among the dimensions; this slide's variants read "concurrent actions (but …)" and "known world model vs. unknown vs. partial model".]
4
[Same agent-world diagram, here locating MDP Planning among the dimensions.]
5
[Same agent-world diagram, here locating Reinforcement Learning among the dimensions.]
6
[Same agent-world diagram, here locating Simulation-Based Planning; the model dimension reads "known world model vs. unknown vs. simulator".]
7
[Same agent-world diagram with the problem dimensions as on slide 2.]
8
Numeric States
- In many cases states are naturally described in terms of numeric quantities
- Classical control theory typically studies MDPs with real-valued continuous state spaces
  - Typically assumes linear dynamical systems
  - Quite limited for most of the applications we care about in AI (often a mix of discrete and numeric)
- Typically we deal with this via feature encodings of the state space
- Simulation-based methods, function-approximation RL, and policy gradient are agnostic about whether the state is numeric or not
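The feature-encoding idea above can be sketched in a few lines. The state variables here (a discrete "mode" and a numeric "speed") and the bin boundaries are hypothetical, chosen only to illustrate mapping a mixed discrete/numeric state to a fixed-length vector.

```python
import numpy as np

def encode_state(mode, speed, modes=("idle", "cruise", "brake"),
                 speed_bins=(0.0, 10.0, 20.0, 30.0)):
    """Encode a mixed discrete/numeric state as a fixed-length feature
    vector: a one-hot for the discrete mode, plus coarse bin indicators
    for the numeric speed. (Illustrative state variables only.)"""
    onehot = np.zeros(len(modes))
    onehot[modes.index(mode)] = 1.0
    bins = np.zeros(len(speed_bins))
    # searchsorted finds which bin the speed falls into
    bins[np.searchsorted(speed_bins, speed, side="right") - 1] = 1.0
    return np.concatenate([onehot, bins])

phi = encode_state("cruise", 12.5)
```

Any function approximator (linear value function, neural net) then consumes `phi` without caring whether the underlying state was discrete, numeric, or both.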
9
[Same agent-world diagram with the problem dimensions as on slide 2.]
10
Partial Observability
- In reality we only observe percepts of the world, not the actual state
- Partially Observable MDPs (POMDPs) extend MDPs to handle partial observability
  - Start with an MDP and add an observation distribution P(o | s): the probability of observation o given state s
  - We see a sequence of observations rather than a sequence of states
- POMDP planning is much harder than MDP planning; scalability is poor
- Can often apply RL in practice using features of observations
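The observation distribution P(o | s) is what drives belief tracking in a POMDP: after acting and observing, the agent updates a distribution over states rather than knowing the state. A minimal sketch of one exact belief update; the two-state transition and observation matrices are toy numbers assumed purely for illustration.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One step of exact POMDP belief filtering:
    b'(s') is proportional to O[o, s'] * sum_s T[a, s, s'] * b(s)."""
    predicted = b @ T[a]       # sum_s b(s) * P(s' | s, a)
    b_new = O[o] * predicted   # weight each s' by P(o | s')
    return b_new / b_new.sum() # renormalize to a distribution

# Toy 2-state, 1-action example (numbers are illustrative assumptions)
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])  # T[a, s, s'] = P(s' | s, a)
O = np.array([[0.8, 0.3], [0.2, 0.7]])    # O[o, s]     = P(o | s)
b = np.array([0.5, 0.5])                  # uniform initial belief
b1 = belief_update(b, a=0, o=0, T=T, O=O)
```

The hardness of POMDP planning comes from having to plan over this continuous belief space instead of the finite state space.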
11
[Same agent-world diagram with the problem dimensions as on slide 2.]
12
Other Sources of Change
- In many cases the environment changes even if no actions are selected by the agent
- Sometimes due to exogenous events, e.g. 911 calls come in at random
- Sometimes due to other agents
  - Adversarial agents try to decrease our reward
  - Cooperative agents may be trying to increase our reward, or have their own objectives
- Decision making in the context of other agents is studied in the area of game theory
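The "911 calls come in at random" example can be mimicked with a tiny simulator of exogenous events: the world changes on its own schedule regardless of the agent's choices. The arrival rate, horizon, and Bernoulli-per-step model are arbitrary illustrative assumptions (a discrete-time stand-in for Poisson arrivals).

```python
import random

def simulate_exogenous_calls(rate=0.3, horizon=10, seed=0):
    """Return the time steps at which a call arrives. Arrivals are
    independent of any agent action: each step, a call occurs with
    probability `rate` (illustrative Bernoulli approximation)."""
    rng = random.Random(seed)
    return [t for t in range(horizon) if rng.random() < rate]

calls = simulate_exogenous_calls()
```

In a planner or learner, such events are folded into the transition model P(s' | s, a): the next state depends on the action *and* on whatever exogenous events fire that step.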
13
[Same agent-world diagram with the problem dimensions as on slide 2.]
14
Durative Actions
- Generally, different actions have different durations
  - Often durations are stochastic
- Semi-Markov MDPs (SMDPs) are an extension of MDPs that accounts for actions with probabilistic durations
  - The transition distribution changes to P(s', t | s, a), which gives the probability of ending up in state s' t time steps after taking action a in state s
- Planning and learning algorithms are very similar to standard MDPs; the equations are just a bit more complex to account for time
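One way to see how "a bit more complex to account for time" plays out: because an action now takes t steps to complete, the value of the resulting state is discounted by gamma**t rather than a single gamma. A sketch of one SMDP Q-value backup for a single (s, a) pair; the outcome tuples and values are made-up illustrative numbers.

```python
def smdp_q_backup(outcomes, V, gamma=0.9):
    """One Q backup for a fixed (s, a) in an SMDP.
    outcomes: list of (prob, reward, s_next, t) tuples drawn from
    P(s', t | s, a); Q(s,a) = sum p * (r + gamma**t * V(s')).
    Note gamma**t: the successor value is discounted by the duration."""
    return sum(p * (r + gamma**t * V[s_next])
               for p, r, s_next, t in outcomes)

# Illustrative: action finishes in state A after 2 steps (prob 0.7)
# or in state B after 5 steps (prob 0.3)
V = {"A": 10.0, "B": 0.0}
q = smdp_q_backup([(0.7, 1.0, "A", 2), (0.3, 0.0, "B", 5)], V)
```

Setting every duration t to 1 recovers the ordinary MDP Bellman backup, which is why the planning and learning algorithms carry over so directly.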
15
[Same agent-world diagram with the problem dimensions as on slide 2.]
17
[Same agent-world diagram with the problem dimensions as on slide 2.]
18
Concurrent Durative Actions
- In many problems we need to form plans that direct the actions of a team of agents
  - Typically requires planning over the space of concurrent activities, where the different activities can have different durations
- Can treat these problems as one huge MDP (SMDP) whose action space is the cross-product of the individual agents' action spaces
  - Standard MDP algorithms will break
- There are multi-agent or concurrent-action extensions to most of the formalisms we studied in class
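The cross-product blow-up is easy to see concretely: the joint action space grows as the product of the individual agents' action-set sizes. The agent count and action names below are invented for illustration.

```python
from itertools import product

def joint_actions(agent_actions):
    """Enumerate the cross-product action space for a team: one joint
    action is a tuple with one choice per agent, so the space has
    size = product of the per-agent action-set sizes."""
    return list(product(*agent_actions))

# Illustrative: 4 agents, 3 actions each -> 3**4 = 81 joint actions
per_agent = [["north", "south", "wait"]] * 4
joint = joint_actions(per_agent)
```

With 10 such agents the space already has 3**10 = 59049 joint actions, which is why standard MDP algorithms, whose cost scales with the number of actions, break on team problems.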
19
[Same agent-world diagram with the problem dimensions as on slide 2.]
20
[Same agent-world diagram with the problem dimensions as on slide 2.]
21
[Same agent-world diagram, here locating AI Planning among the dimensions.]
22
[Same agent-world diagram with the problem dimensions as on slide 2.]
23
[Same agent-world diagram, here locating AI Planning among the dimensions.]
24
[Same agent-world diagram with the problem dimensions as on slide 2.]