SAPA: A Domain-independent Heuristic Temporal Planner
Minh B. Do & Subbarao Kambhampati, Arizona State University
Good morning, friends. Obviously this is Binh Minh's paper. In any case, I convinced him that it would be better for him to spend his time working on another upcoming paper rather than visiting Toledo, a midwestern town in Ohio.
I understand that this is basically the same strategy Malik used to present Romain's paper as well.
Talk Outline
Temporal Planning and SAPA
– Action representation and search algorithm
– Objective functions and heuristics
  • Admissible/Inadmissible
  • Resource adjustment
Empirical results
Related & future work
Planning
Most academic research has been done in the context of classical planning:
– Already PSPACE-complete
– Useful techniques are likely to be applicable in more expressive planning problems
Real-world applications normally have more complex requirements:
– Non-instantaneous actions
– Temporal constraints on goals
– Resource consumption
Classical planning has recently been able to scale up to big problems
Can the winning strategies of classical planning be applied in more expressive environments?
Related Work
Planners that can handle similar types of temporal and resource constraints: TLPlan, HSTS, IxTeT, Zeno
– Cannot scale up without domain knowledge
Planners that can handle a subset of constraints:– Only temporal: TGP
– Only resources: LPSAT, GRT-R
– Subset of temporal and resource constraints: TP4, Resource-IPP
SAPA
Forward state-space planner
– Based on [Bacchus & Ady]
– Makes resource reasoning easier
Handles temporal constraints:
– Actions with static and dynamic durations
– Temporal goals with deadlines
– Continuous resource consumption and production
Heuristic functions to support a variety of objective functions
Action Representation
Flying(?airplane ?city1 ?city2)
– Precondition: (in-city ?airplane ?city1), (fuel ?airplane) > 0
– Effects: ¬(in-city ?airplane ?city1), (in-city ?airplane ?city2)
– Resource: consume (fuel ?airplane)
Durative: end time E_A = S_A + D_A
Instantaneous effects e occur at time t_e = S_A + d, 0 ≤ d ≤ D_A
Preconditions must be true at the starting point and protected during a period of time d, 0 ≤ d ≤ D_A
Action can consume or produce continuous amount of some resource
Action Conflicts:
– Consuming the same resource
– One action's effect conflicting with another's precondition or effect
Searching time-stamped states
Search through the space of time-stamped states
S = (P, M, Π, Q, t)
– P: set of pairs ⟨pi, ti⟩ of predicates pi and the times ti < t of their last achievement.
– M: set of functions representing resource values.
– Π: set of protected persistent conditions.
– Q: event queue.
– t: time stamp of S.
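The time-stamped state can be sketched as a small Python structure. This is an illustrative rendering, not the planner's actual code; all field names and encodings are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class TimeStampedState:
    """Sketch of SAPA's time-stamped state S = (P, M, Pi, Q, t)."""
    P: dict = field(default_factory=dict)    # predicate -> time of its last achievement (< t)
    M: dict = field(default_factory=dict)    # resource function -> current value
    Pi: list = field(default_factory=list)   # protected persistent conditions
    Q: list = field(default_factory=list)    # event queue: (time, kind, target) tuples
    t: float = 0.0                           # time stamp of S

# Example: an initial state with one fact and one resource level
s = TimeStampedState(P={"at-A": 0.0}, M={"fuel": 10.0})
```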
Search Algorithm (cont.)
Goal Satisfaction: S = (P, M, Π, Q, t) ⊨ G iff for each ⟨pi, ti⟩ ∈ G either:
– ∃⟨pi, tj⟩ ∈ P with tj < ti and no event in Q deletes pi, or
– ∃e ∈ Q that adds pi at some time te < ti.
Action Application: Action A is applicable in S if:
– All instantaneous preconditions of A are satisfied by P and M.
– A's effects do not interfere with Π and Q.
– No event in Q interferes with the persistent preconditions of A.
When A is applied to S:
– S is updated according to A's instantaneous effects.
– Persistent preconditions of A are put in Π.
– Delayed effects of A are put in Q.
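The applicability test and the update rule can be sketched as follows. This is a minimal sketch under assumed encodings (actions as dicts, events as `(time, kind, target)` tuples); resource-precondition checks against M are omitted for brevity:

```python
# Minimal sketch of SAPA-style action application on a time-stamped state;
# all names and the event encoding are illustrative assumptions.

def applicable(action, P, M, Pi, Q):
    """A is applicable if its instantaneous preconditions hold in (P, M) and
    its effects do not clash with protected conditions or queued events."""
    if not all(p in P for p in action["pre"]):
        return False
    if any(d in Pi for d in action["del"]):           # effect vs. protected condition
        return False
    if any(ev[1] == "delete" and ev[2] in action["persist"] for ev in Q):
        return False                                  # queued event vs. persistent precondition
    return True

def apply_action(action, P, M, Pi, Q, t):
    """Update S with instantaneous effects; defer the rest."""
    for p in action["add"]:
        P[p] = t                                      # instantaneous add effects at time t
    for p in action["del"]:
        P.pop(p, None)                                # instantaneous delete effects
    Pi.extend(action["persist"])                      # protect persistent preconditions
    for (dt, kind, x) in action["delayed"]:
        Q.append((t + dt, kind, x))                   # delayed effects become events
    return P, M, Pi, Q
```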
Heuristic Control
Temporal planners have to deal with more branching possibilities ⇒ it is more critical to have good heuristic guidance
Design of heuristics depends on the objective function
Classical planning: number of actions, parallel execution time, solving time
Temporal resource planning: number of actions, makespan, resource consumption, slack, …
In temporal planning, heuristics focus on richer objective functions that guide both planning and scheduling
Objectives in Temporal Planning
– Number of actions: total number of actions in the plan.
– Makespan: the shortest duration in which we can possibly execute all actions in the solution.
– Resource consumption: total amount of resources consumed by actions in the solution.
– Slack: the duration between the time a goal is achieved and its deadline.
  • Optimize max, min, or average slack values
Deriving heuristics for SAPA
We use a phased relaxation approach to derive different heuristics:
– Relax the negative logical and resource effects to build the Relaxed Temporal Planning Graph: prunes bad states while preserving completeness.
– Derive admissible heuristics: to minimize the solution's makespan, or to maximize slack-based objective functions.
– Find a relaxed solution, which is used as a distance heuristic.
– Adjust the heuristic values using negative interactions (future work) [AltAlt, AIJ 2001].
– Adjust the heuristic values using resource consumption information.
Relaxed Temporal Planning Graph
Heuristics in Sapa are derived from the Graphplan-style bi-level relaxed temporal planning graph (RTPG)
Relaxed action: no delete effects, no resource consumption
[Figure: RTPG for a simple logistics example: facts Person and Airplane at cities A and B; actions Load(P,A), Fly(A,B), Fly(B,A), Unload(P,A), Unload(P,B); timeline from t = 0 (Init) through the Goal to its Deadline tg]

RTPG expansion:
while(true)
  forall A ≠ advance-time applicable in S
    S = Apply(A, S)
    if S ⊇ G then Terminate{solution}
  S' = Apply(advance-time, S)
  if ∃(pi, ti) ∈ G such that ti < Time(S') and pi ∉ S
    then Terminate{non-solution}
    else S = S'
end while;
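The expansion loop amounts to a fixpoint that records, for each fact, the earliest time it can appear when deletes and resource use are ignored. A simplified, illustrative sketch (action and goal encodings are assumptions):

```python
# Relaxed temporal planning graph: propagate earliest appearance times of facts.
# Actions are dicts with hypothetical fields "pre", "add", "dur".

def build_rtpg(init_facts, actions, goals):
    """Return earliest-appearance times, or None if some goal misses its deadline."""
    earliest = {f: 0.0 for f in init_facts}
    changed = True
    while changed:                                    # fixpoint over relaxed actions
        changed = False
        for a in actions:
            if all(p in earliest for p in a["pre"]):
                start = max(earliest[p] for p in a["pre"])
                for f in a["add"]:
                    t = start + a["dur"]              # add effect at the end of the action
                    if t < earliest.get(f, float("inf")):
                        earliest[f] = t
                        changed = True
    for g, tg in goals:                               # goals are (fact, deadline) pairs
        if earliest.get(g, float("inf")) > tg:
            return None                               # goal cannot appear by its deadline
    return earliest
```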
Heuristics directly from RTPG
– Makespan: the distance from a state S to the goals equals the duration between time(S) and the time the last goal appears in the RTPG.
– Min/Max/Sum slack: the distance from a state to the goals equals the minimum, maximum, or sum of the slack estimates for the individual goals, computed from the RTPG.
Proof: all goals appear in the RTPG at times smaller than or equal to their achievable times.
ADMISSIBLE
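Given the earliest-appearance times from the RTPG, the two estimates are straightforward to compute. A sketch, assuming goals are `(fact, deadline)` pairs and `earliest` maps each goal fact to its first appearance time (illustrative names, not the planner's code):

```python
# Sketch of the admissible heuristics read directly off the RTPG.

def h_makespan(state_time, earliest, goals):
    """Duration between time(S) and the appearance of the last goal in the RTPG."""
    return max(earliest[g] for g, _ in goals) - state_time

def h_slack(earliest, goals, mode=min):
    """Min/max/sum of per-goal slack estimates: deadline minus appearance time.
    Pass mode=min, mode=max, or mode=sum."""
    return mode(tg - earliest[g] for g, tg in goals)
```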
Heuristics from Solution Extracted from RTPG
RTPG can be used to find a relaxed solution, which is then used to estimate the distance from a given state to the goals
– Sum actions: distance from a state S to the goals equals the number of actions in the relaxed plan.
– Sum durations: distance from a state S to the goals equals the sum of the action durations in the relaxed plan.
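The two estimates can be sketched as a backward extraction over the relaxed actions. This is illustrative only; the greedy minimum-duration supporter choice is an assumption, not SAPA's exact extraction rule:

```python
# Backward extraction of a relaxed plan, then the two inadmissible estimates.

def relaxed_plan(goals, actions, init_facts):
    """Greedily pick, for each open subgoal, a relaxed action that achieves it."""
    plan, closed = [], set(init_facts)
    open_facts = [g for g, _ in goals if g not in closed]
    while open_facts:
        f = open_facts.pop()
        if f in closed:
            continue
        closed.add(f)
        # choose a supporting action that adds f (greedy: shortest duration)
        a = min((a for a in actions if f in a["add"]), key=lambda x: x["dur"])
        if a not in plan:
            plan.append(a)
        open_facts += [p for p in a["pre"] if p not in closed]
    return plan

def h_sum_action(plan):
    return len(plan)                      # number of actions in the relaxed plan

def h_sum_duration(plan):
    return sum(a["dur"] for a in plan)    # summed durations of relaxed-plan actions
```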
Resource-based Adjustments to Heuristics
Resource-related information, ignored originally, can be used to improve the heuristic values
Adjusted Sum-Action:
  h = h + Σ_R ⌈(Con(R) − (Init(R) + Pro(R))) / Δ_R⌉
Adjusted Sum-Duration:
  h = h + Σ_R ⌈(Con(R) − (Init(R) + Pro(R))) / Δ_R⌉ · Dur(A_R)
where Con(R) is the amount of resource R consumed by the relaxed plan, Init(R) its initial level, Pro(R) the amount produced, Δ_R the largest amount of R a single action can produce, and Dur(A_R) the duration of that producing action.
These adjustments do not preserve admissibility.
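Numerically, the adjustment charges, per resource, enough extra producer actions to cover any shortfall between what the relaxed plan consumes and what is initially available plus produced. A sketch under that reading (the per-resource record format is an assumption):

```python
import math

# Sketch of the resource-based adjustments; each resource record carries
# con (consumed), init (initial level), pro (produced), delta (max amount a
# single action can produce), and dur (duration of that producing action).

def adjusted_sum_action(h, resources):
    for r in resources:
        shortfall = r["con"] - (r["init"] + r["pro"])
        if shortfall > 0:
            h += math.ceil(shortfall / r["delta"])    # extra producer actions needed
    return h

def adjusted_sum_duration(h, resources):
    for r in resources:
        shortfall = r["con"] - (r["init"] + r["pro"])
        if shortfall > 0:
            h += math.ceil(shortfall / r["delta"]) * r["dur"]  # charge their durations
    return h
```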
Aims of Empirical Study
Evaluate the effectiveness of the different heuristics.
Ablation studies:
– Test whether the resource adjustment technique helps different heuristics.
Compare with other temporal planning systems.
Empirical Results
         Adjusted Sum-Action             Sum-Duration
Prob     time(s) #act  nodes     dur     time(s) #act  nodes      dur
Zeno1    0.317   5     14/48     320     0.35    5     20/67      320
Zeno2    54.37   23    188/1303  950     -       -     -          -
Zeno3    29.73   13    250/1221  430     6.20    13    60/289     450
Zeno9    13.01   13    151/793   590     98.66   13    4331/5971  460
Log1     1.51    16    27/157    10.0    1.81    16    33/192     10.0
Log2     82.01   22    199/1592  18.87   38.43   22    61/505     18.87
Log3     10.25   12    30/215    11.75   -       -     -          -
Log9     116.09  32    91/830    26.25   -       -     -          -
– Sum-action finds solutions faster than sum-duration
– Admissible heuristics do not scale up to bigger problems
– Sum-duration finds shorter-duration solutions in most cases
– Resource-based adjustment helps sum-action, but not sum-duration
– Very few irrelevant actions; better quality than TemporalTLPlan
– So, (transitively) better than LPSAT
Comparison to other planners
Planners with similar capabilities:
– IxTeT, Zeno
  • Poor scale-up
– HSTS, TLPlan
  • Domain-dependent search control
Planners with limited capabilities:
– TGP and TP4
– Compared on a set of random temporal logistics problems: domain specification and problems were defined by TP4's creator (Patrik Haslum)
  • No resource requirements
  • No deadline constraints or actions with dynamic durations
Empirical Results (cont.)
Logistics domain with driving restricted to intra-city (traditional logistics domain)
[Figure: Problems Solved (%) vs. Solving Time (s), 0–1000 s, for SAPA, TP4, and TGP]
Sapa is the only planner that can solve all 80 problems
Empirical Results (cont.)
The “sum-action” heuristic used as the default in Sapa can be misled by long-duration actions...
Logistics domain with inter-city driving actions
Future work on fixed point time/level propagation
[Figure: Problems Solved (%) vs. Solving Time (s), 0–1000 s, for SAPA, TP4, and TGP]
Conclusion
Presented SAPA, a domain-independent forward temporal planner that can handle:– Durative actions– Deadline goals– Continuous resources
– Developed different heuristic functions based on the relaxed temporal planning graph to address both satisficing and optimizing search
– Method to improve heuristic values by resource reasoning
– Promising initial empirical results
Future Work
Exploit mutex information in:– Building the temporal planning graph– Adjusting the heuristic values in the relaxed solution
– Relevance analysis
– Improving solution quality
– Relaxing constraints and integrating with a full-scale scheduler