Concurrent Probabilistic Temporal Planning (CPTP)
Mausam, joint work with Daniel S. Weld
University of Washington, Seattle
Jan 24, 2016
Motivation
Three features of real-world planning domains:
Durative actions: all actions (navigation between sites, placing instruments, etc.) take time.
Concurrency: some instruments may warm up while others perform their tasks and still others shut down to save power.
Uncertainty: all actions (pick up the rock, send data, etc.) have a probability of failure.
Motivation (contd.)
Concurrent temporal planning (widely studied with deterministic effects): extends classical planning, but doesn't easily extend to probabilistic outcomes.
Concurrent planning with uncertainty (Concurrent MDPs, AAAI'04): handles combinations of actions over an MDP, but actions take unit time.
Few planners handle the three in concert!
Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning: concurrent MDP in an augmented state space
Solution methods for CPTP: two heuristics to guide the search; hybridisation
Experiments & Conclusions
Related & Future Work
Markov Decision Process
S : a set of states, factored into Boolean variables
A : a set of actions, each of unit duration
Pr : S × A × S → [0, 1], the transition model
C : A → R, the cost model
s0 : the start state
G : a set of absorbing goals
GOAL of an MDP
Find a policy π : S → A that minimises the expected cost of reaching a goal for a fully observable Markov decision process, when the agent executes over an indefinite horizon.
Equations : optimal policy
Define J*(s) (the optimal cost) as the minimum expected cost to reach a goal from s.
J* should satisfy:
J*(s) = 0 for s ∈ G
J*(s) = min_{a ∈ Ap(s)} [ C(a) + Σ_{s'∈S} Pr(s' | s, a) J*(s') ] otherwise
Bellman Backup
[Figure: backup at a state s. For each applicable action a ∈ Ap(s), Q_{n+1}(s, a) is computed from the successor values J_n; then J_{n+1}(s) = min_{a ∈ Ap(s)} Q_{n+1}(s, a).]
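The backup can be sketched in Python. This is a minimal, illustrative version, assuming the MDP is given as explicit `transitions` and `cost` tables (these names are hypothetical, not from the talk):

```python
def bellman_backup(s, J, applicable, transitions, cost):
    """One Bellman backup at state s:
    Q_{n+1}(s, a) = C(a) + sum_{s'} Pr(s'|s,a) * J_n(s')
    J_{n+1}(s)    = min over a in Ap(s) of Q_{n+1}(s, a).
    Returns (J_{n+1}(s), argmin action)."""
    best_q, best_a = float("inf"), None
    for a in applicable(s):                       # Ap(s)
        q = cost[a] + sum(p * J[s2] for s2, p in transitions[(s, a)].items())
        if q < best_q:
            best_q, best_a = q, a
    return best_q, best_a
```

For instance, when J_n is still zero everywhere, a half-cost action that reaches the goal only half the time still has the lowest Q-value, so the backup selects it.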
RTDP Trial
[Figure: one RTDP trial. At state s a Bellman backup is performed; the greedy action a_min = a2 is selected, its stochastic outcome is simulated, and the trial continues from the sampled successor until the goal is reached.]
Real Time Dynamic Programming (Barto, Bradtke and Singh '95)
Trial: simulate the greedy policy, performing a Bellman backup on every visited state.
Repeat RTDP trials until the cost function converges.
Anytime behaviour; expands only the reachable state space; but complete convergence is slow.
Labeled RTDP (Bonet & Geffner '03): optimal, if started with an admissible (optimistic, lower-bound) cost function; monotonic; converges quickly.
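A single RTDP trial can be sketched as follows; `transitions`, `cost` and `applicable` are assumed explicit tables/callbacks as before, standing in for the planner's actual data structures:

```python
import random

def rtdp_trial(s0, goals, J, applicable, transitions, cost, max_steps=1000):
    """One RTDP trial: follow the greedy policy from s0, performing a
    Bellman backup on every visited state, until a goal is reached
    (a step cap keeps the sketch safe on improper policies)."""
    s = s0
    for _ in range(max_steps):
        if s in goals:
            return
        # Bellman backup at s: J(s) <- min_a [ C(a) + E[ J(s') ] ]
        best_q, best_a = float("inf"), None
        for a in applicable(s):
            q = cost[a] + sum(p * J[s2] for s2, p in transitions[(s, a)].items())
            if q < best_q:
                best_q, best_a = q, a
        J[s] = best_q
        # Simulate the greedy action's stochastic outcome
        succs = transitions[(s, best_a)]
        s = random.choices(list(succs), weights=list(succs.values()))[0]
```

On a deterministic two-step chain, two trials suffice for the start state's value to converge to the true cost-to-go, illustrating the anytime behaviour.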
Concurrent MDP (CoMDP) (Mausam & Weld '04)
Allows concurrent combinations of actions.
Safe execution: inherit the mutex definitions from classical planning: conflicting preconditions, conflicting effects, and interfering preconditions and effects.
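These mutex rules can be checked pairwise. Here is a sketch assuming a STRIPS-style action representation with positive/negative preconditions and add/delete effects (the `Action` fields are illustrative assumptions, not the talk's actual encoding):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    pre: frozenset       # positive preconditions
    neg_pre: frozenset   # negative preconditions
    add: frozenset       # add effects
    delete: frozenset    # delete effects

def mutex(a1, a2):
    """True if a1 and a2 may not safely execute concurrently."""
    # Conflicting effects: one adds what the other deletes
    if (a1.add & a2.delete) or (a2.add & a1.delete):
        return True
    # Interfering preconditions and effects: one deletes the other's precondition
    if (a1.delete & a2.pre) or (a2.delete & a1.pre):
        return True
    # Conflicting preconditions: one requires p, the other requires not-p
    if (a1.pre & a2.neg_pre) or (a2.pre & a1.neg_pre):
        return True
    return False
```

A combination of actions is then applicable only if no pair in it is mutex.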
Bellman Backup (CoMDP)
[Figure: backup at a state s over all action combinations {a1}, {a2}, {a3}, {a1,a2}, {a1,a3}, {a2,a3}, {a1,a2,a3}; each combination is evaluated against the successor values J_n, and J_{n+1}(s) is the minimum over combinations.]
Exponential blowup to calculate a Bellman backup!
Sampled RTDP
RTDP with stochastic (partial) backups: approximate. Always try the last best combination, and randomly sample a few other combinations.
In practice: close-to-optimal solutions; converges very fast.
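The sampled backup can be sketched as below; `combo_value(s, combo)` stands in for the full Q-computation over an action combination, and `last_best` memoises the best combination per state (both names are illustrative assumptions):

```python
import random

def sampled_backup(s, single_actions, combo_value, last_best, num_samples=3):
    """Stochastic (partial) backup: instead of evaluating all
    (exponentially many) action combinations, evaluate only the
    previously best combination for s plus a few random samples."""
    candidates = set()
    if s in last_best:
        candidates.add(last_best[s])          # always retry the last best
    for _ in range(num_samples):              # plus a few random combinations
        k = random.randint(1, len(single_actions))
        candidates.add(frozenset(random.sample(single_actions, k)))
    best = min(candidates, key=lambda c: combo_value(s, c))
    last_best[s] = best
    return combo_value(s, best), best
```

Because the last best combination is always re-evaluated, a good combination, once found, is never lost between backups.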
Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning: concurrent MDP in an augmented state space
Solution methods for CPTP: two heuristics to guide the search; hybridisation
Experiments & Conclusions
Related & Future Work
Modelling CPTP as CoMDP
CPTP = CoMDP + explicit action durations; the objective becomes minimising the expected make-span.
If we initialise each cost C(a) as the action's duration Δ(a), two decision-epoch models arise:
Aligned epochs vs. interwoven epochs.
Augmented state space
[Figure: a timeline from 0 to 9 with actions a through h executing concurrently from world state X.]
<X, ∅>
<X1, {(a,1), (c,3)}> where X1 is the application of b on X.
<X2, {(h,1)}> where X2 is the application of a, b, c, d and e on X.
Simplifying assumptions
All actions have deterministic durations; all action durations are integers.
Action model: preconditions must hold until the end of an action; effects are usable only at the end of an action.
Properties: mutex rules are still required, and it is sufficient to consider only the epochs at which some action ends.
Completing the CoMDP
Redefine the applicability set, the transition function, and the start and goal states.
Example: the transition function is redefined so that the agent moves forward in time to the next epoch at which some action completes.
Start state : <s0, ∅>, etc.
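The time-advance step of the redefined transition can be sketched as follows, with `executing` holding the remaining durations from the augmented state and `apply_effects` a hypothetical callback applying a finished action's effects:

```python
def advance(world_state, executing, apply_effects):
    """Move forward to the next epoch at which some executing action
    completes.  `executing` maps each running action to its remaining
    time; finished actions' effects are applied to the world state.
    Returns the new world state, the updated bookkeeping, and the
    elapsed time."""
    if not executing:
        return world_state, {}, 0
    dt = min(executing.values())          # next completion epoch
    remaining = {}
    for a, t in sorted(executing.items()):
        if t == dt:
            world_state = apply_effects(world_state, a)   # a ends now
        else:
            remaining[a] = t - dt         # still running, less time left
    return world_state, remaining, dt
```

This matches the property above: decision epochs occur only when some action ends.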
Solution
CPTP = a CoMDP in the interwoven state space.
Thus one may use our sampled RTDP, etc.
PROBLEM: exponential blowup in the size of the state space.
Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning: concurrent MDP in an augmented state space
Solution methods for CPTP: Solution 1, two heuristics to guide the search; Solution 2, hybridisation
Experiments & Conclusions
Related & Future Work
Max Concurrency Heuristic (MC)
Define c : the maximum number of actions executable concurrently in the domain.
Serialisation bounds (illustrated for c = 2):
J*(X) ≤ c × J*(<X, ∅>)
J*(<X, ∅>) ≥ J*(X) / c
[Figure: an interwoven plan from X to G with actions a, b, c has J*(<X, ∅>) = 10; its serialisation gives J*(X) ≤ 20.]
Hence J*(X)/c is an admissible heuristic.
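As a sketch, the MC heuristic simply divides the optimal cost of the serial MDP by the maximum concurrency c (here `J_serial` is an assumed precomputed table of serial-MDP optimal costs):

```python
def max_concurrency_heuristic(X, J_serial, c):
    """Admissible estimate for the interwoven state <X, {}>: any
    concurrent plan can be serialised with at most a factor-c increase
    in make-span, so J*(X) <= c * J*(<X, {}>); equivalently J*(X)/c
    lower-bounds the interwoven optimal cost."""
    return J_serial[X] / c
```

With the example above (J*(X) ≤ 20, c = 2), the heuristic value 10 never overestimates the interwoven optimum.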
Eager Effects Heuristic : solving a relaxed problem
Relaxed state space S' = S × Z. Let (X, δ) be a state where X is the world state and δ is the time remaining for all actions (started at any point in the history) to complete execution.
Start state : (s0, 0). Goal states : { (X, 0) | X ∈ G }.
Eager Effects Heuristic (contd.)
[Figure: from X, action b (duration 8) starts alongside a (duration 2) and c (duration 4); after 2 units the relaxed state is (V, 6).]
All actions are allowed even when mutex with a or c: allowing inapplicable actions to execute, thus optimistic!
Information about action effects is assumed ahead of time, thus optimistic! Hence the name: Eager Effects.
Admissible heuristic.
Solution 2 : Hybridisation
Observations: the aligned-epoch policy is sub-optimal but fast to compute; the interwoven-epoch policy is optimal but slow to compute.
Solution: produce a hybrid policy, i.e., output the interwoven policy for probable states and the aligned policy for improbable states.
[Figure: paths from s to the goal G; low-probability branches are covered by the aligned policy.]
Hybrid algorithm (contd.)
Observation: RTDP explores probable branches much more than others.
Algorithm (m, k, r) : loop
Do m RTDP trials; let the current value of the start state be J(s0) (less than optimal).
Output a hybrid policy π: the interwoven policy for states visited > k times, the aligned policy for other states.
Evaluate the policy: Jπ(s0) (greater than optimal).
Stop if Jπ(s0) − J(s0) < r·J(s0).
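The loop can be sketched with three callbacks standing in for the planner's internals (`run_trials`, `build_hybrid` and `evaluate` are illustrative assumptions):

```python
def hybrid_plan(m, k, r, run_trials, build_hybrid, evaluate):
    """Hybrid algorithm (m, k, r): alternate batches of RTDP trials
    with hybrid-policy construction and evaluation until the policy's
    cost is within ratio r of the RTDP lower bound.
    run_trials(m)    -> J(s0), a lower bound (less than optimal)
    build_hybrid(k)  -> hybrid policy (interwoven for states visited
                        more than k times, aligned elsewhere)
    evaluate(policy) -> J_pi(s0), an upper bound (greater than optimal)"""
    while True:
        lower = run_trials(m)
        policy = build_hybrid(k)
        upper = evaluate(policy)
        if upper - lower < r * lower:
            return policy, lower, upper
```

Because `lower` and `upper` bracket the optimum, the stopping test certifies that the returned policy's cost is within a factor (1 + r) of optimal.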
Hybridisation
Outputs a proper policy: the policy is defined at all reachable states and is guaranteed to take the agent to a goal.
Has an optimality-ratio parameter (r) that controls the balance between optimality and running time.
Can be used as an anytime algorithm.
Is general: we can hybridise two algorithms in other settings, e.g. in solving the original concurrent MDP.
Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning: concurrent MDP in an augmented state space
Solution methods for CPTP: two heuristics to guide the search; hybridisation
Experiments & Conclusions
Related & Future Work
Experiments
Domains: Rover, MachineShop, Artificial
State variables: 14–26; durations: 1–20
Speedups in Rover domain
[Figure: efficiency of the different methods; time in seconds (logarithmic scale, 1 to 10000) on six Rover problems, comparing Interwoven Epoch, Max Concurrency, Eager Effects, the Hybrid Algorithm and Aligned Epochs.]
Qualities of solution
[Figure: solution quality of the different methods; ratio of make-span to the optimal (0.8 to 1.7) on six Rover problems, comparing Interwoven Epoch, Max Concurrency, Eager Effects, the Hybrid Algorithm and Aligned Epochs.]
Experiments : Summary
Max Concurrency heuristic: fast to compute; speeds up the search.
Eager Effects heuristic: high quality; can be expensive in some domains.
Hybrid algorithm: very fast; produces good-quality solutions.
Aligned-epoch model: super-fast; at times outputs poor-quality solutions.
Related Work
Prottle (Little, Aberdeen, Thiebaux’05)
Generate, test and debug paradigm (Younes & Simmons’04)
Concurrent options (Rohanimanesh & Mahadevan’04)
Future Work
Other applications of hybridisation: CoMDP / MDP; over-subscription planning.
Relaxing the assumptions: handling mixed costs; extending to PDDL2.1; stochastic action durations.
Extensions to metric resources.
State space compression / aggregation.