Page 1
An LP-Based Heuristic for Optimal Planning
Menkes van den Briel, Department of Industrial Engineering, Arizona State University
Subbarao Kambhampati, Department of Computer Science, Arizona State University
Thomas Vossen, Leeds School of Business, University of Colorado at Boulder
J. Benton, Department of Computer Science, Arizona State University
http://rakaposhi.eas.asu.edu/yochan/
Page 4
What is automated planning?
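[Figure: the initial and goal configurations over locations loc1 and loc2, and an action applied at loc1]
Initial state $s_0 \in S$
Goal $s_* \in S$
Action $a = \langle \text{pre}, \text{post}, \text{prevail} \rangle$
Plan $P = \langle a_1, \ldots, a_n \rangle$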
Page 5
Motivation
• Why heuristics?
  – Heuristic state-space search has been very successful in solving automated planning problems
• Why optimal planning?
  – Real-world planning applications require optimal or near-optimal solutions
    • The difference between a (near-)optimal solution and a merely feasible solution may be the difference between winning and losing the interest of an investor or strategic partner
Page 6
LP-based heuristic
Relax the ordering of the actions
Set up an integer programming formulation
Solve the LP relaxation and use the objective function value as an admissible distance estimate
Strengthen the formulation by adding valid inequalities
Page 7
Action selection formulation
• Represent the planning problem as a set of loosely coupled network flow problems
  – Each state variable defines one network flow problem
  – Nodes correspond to the state variable values
  – Arcs correspond to state variable transitions
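The network-flow view can be sketched in a few lines of Python. This is an illustrative encoding, not the authors' implementation: the dictionary layout (`effects`, `prevail`) and the variable names `truck` and `package` are assumptions, while the action names are taken from the simple logistics example in these slides. Each state variable gets one graph whose arcs are the (pre, post) transitions of the actions that change it.

```python
from collections import defaultdict

# Hypothetical encoding of the logistics actions: "effects" maps a state
# variable to its (pre, post) values, "prevail" to a value that must hold
# while the action executes.
ACTIONS = {
    "Drive(l1,l2)":     {"effects": {"truck": ("1", "2")}, "prevail": {}},
    "Drive(l2,l1)":     {"effects": {"truck": ("2", "1")}, "prevail": {}},
    "Load(p1,t1,l1)":   {"effects": {"package": ("1", "T")}, "prevail": {"truck": "1"}},
    "Load(p1,t1,l2)":   {"effects": {"package": ("2", "T")}, "prevail": {"truck": "2"}},
    "Unload(p1,t1,l1)": {"effects": {"package": ("T", "1")}, "prevail": {"truck": "1"}},
    "Unload(p1,t1,l2)": {"effects": {"package": ("T", "2")}, "prevail": {"truck": "2"}},
}


def build_dtgs(actions):
    """Return {variable: {(pre, post): [names of actions on that arc]}}."""
    dtgs = defaultdict(lambda: defaultdict(list))
    for name, action in actions.items():
        for var, arc in action["effects"].items():
            dtgs[var][arc].append(name)
    return {var: dict(arcs) for var, arcs in dtgs.items()}


dtgs = build_dtgs(ACTIONS)
for var, arcs in dtgs.items():
    for (pre, post), names in sorted(arcs.items()):
        print(f"{var}: {pre} -> {post} via {', '.join(names)}")
```

The two graphs are coupled only through the prevail conditions, which is what makes the flow problems "loosely coupled".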
Page 8
Simple logistics example
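[Figure: locations loc1 and loc2, with two domain transition graphs. DTG_Package1 has nodes 1, 2, and T and arcs Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), Unload(p1,t1,l2). DTG_Truck1 has nodes 1 and 2 and arcs Drive(l1,l2), Drive(l2,l1); the truck nodes also carry Load and Unload labels.]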
Page 9
Action selection formulation
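• Variables
  – $x_a \in \mathbb{Z}^+$, for $a \in A$; $x_a$ is equal to the number of times action $a$ is executed
• Objective function
  – $\min \sum_{a \in A} x_a$
• Constraints, for all $c \in C$, $f \in V_c$
  – $\sum_{e \in V_c^+(f):\, a \in A_c^E(e)} x_a - \sum_{e \in V_c^-(f):\, b \in A_c^E(e)} x_b \ge \begin{cases} 1 & \text{if } f \ne s_0[c],\ f = s_*[c] \\ -1 & \text{if } f = s_0[c],\ f \ne s_*[c] \\ 0 & \text{otherwise} \end{cases}$
  – $x_a \le M \sum_{e \in V_c^+(f):\, b \in A_c^E(e)} x_b$ for all $f \ne s_0[c]$, $a \in A_c^V(f)$
No time indices
No upper bound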
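A small checker can make the two constraint families concrete. The sketch below is not the authors' code and rests on several assumptions: the action encoding and variable names are hypothetical, the initial state (truck at loc2, package at loc1) and goal (package at loc2) are inferred from the logistics example, the flow-balance constraint is read as "inflow minus outflow is at least the right-hand side", and M = 100 is an arbitrary choice.

```python
from fractions import Fraction

# Hypothetical encoding of the logistics example (names are assumptions).
ACTIONS = {
    "Drive(l1,l2)":     {"effects": {"truck": ("1", "2")}, "prevail": {}},
    "Drive(l2,l1)":     {"effects": {"truck": ("2", "1")}, "prevail": {}},
    "Load(p1,t1,l1)":   {"effects": {"package": ("1", "T")}, "prevail": {"truck": "1"}},
    "Load(p1,t1,l2)":   {"effects": {"package": ("2", "T")}, "prevail": {"truck": "2"}},
    "Unload(p1,t1,l1)": {"effects": {"package": ("T", "1")}, "prevail": {"truck": "1"}},
    "Unload(p1,t1,l2)": {"effects": {"package": ("T", "2")}, "prevail": {"truck": "2"}},
}
DOMAINS = {"truck": ["1", "2"], "package": ["1", "2", "T"]}
S0 = {"truck": "2", "package": "1"}  # assumed initial state
GOAL = {"package": "2"}              # assumed goal; the truck has none


def rhs(var, f):
    """Right-hand side of the flow constraint for value f of variable var."""
    if f != S0[var] and GOAL.get(var) == f:
        return 1
    if f == S0[var] and GOAL.get(var) != f:
        return -1
    return 0


def is_feasible(x, M):
    """Check flow-balance (read as >=) and prevail constraints for counts x."""
    x = {name: Fraction(x.get(name, 0)) for name in ACTIONS}
    for var, domain in DOMAINS.items():
        for f in domain:
            inflow = sum(x[n] for n, a in ACTIONS.items()
                         if a["effects"].get(var, (None, None))[1] == f)
            outflow = sum(x[n] for n, a in ACTIONS.items()
                          if a["effects"].get(var, (None, None))[0] == f)
            if inflow - outflow < rhs(var, f):
                return False
            if f != S0[var]:
                # An action with prevail value f needs some flow into f.
                for n, a in ACTIONS.items():
                    if a["prevail"].get(var) == f and x[n] > M * inflow:
                        return False
    return True


def objective(x):
    """Total number of action executions."""
    return sum(Fraction(v) for v in x.values())


M = 100  # arbitrary big-M value chosen for illustration
plan = {"Drive(l2,l1)": 1, "Load(p1,t1,l1)": 1,
        "Drive(l1,l2)": 1, "Unload(p1,t1,l2)": 1}
fractional = {"Load(p1,t1,l1)": 1, "Unload(p1,t1,l2)": 1,
              "Drive(l2,l1)": Fraction(1, M)}
print(is_feasible(plan, M), objective(plan))
print(is_feasible(fractional, M), objective(fractional))
```

Because the formulation has no time indices, the checker only counts how often each action is used; it says nothing about the order in which the actions would be executed.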
Page 11
Simple logistics example
Feasible plan
$x_{\text{Drive(l2,l1)}} = 1$
$x_{\text{Load(p1,t1,l1)}} = 1$
$x_{\text{Drive(l1,l2)}} = 1$
$x_{\text{Unload(p1,t1,l2)}} = 1$
Objective value: 4
Plan: Drive(l2,l1), Load(p1,t1,l1), Drive(l1,l2), Unload(p1,t1,l2)
[Figure: DTG_Package1 and DTG_Truck1 for the logistics example]
Page 12
Simple logistics example
LP solution
$x_{\text{Load(p1,t1,l1)}} = 1$
$x_{\text{Unload(p1,t1,l2)}} = 1$
$x_{\text{Drive(l2,l1)}} = 1/M$
Objective value: $2 + 1/M$
Drive(l2,l1) Load(p1,t1,l1) Unload(p1,t1,l2) … …
[Figure: DTG_Package1 and DTG_Truck1 for the logistics example]
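One way to see where the value $2 + 1/M$ comes from is sketched below. It assumes, as the feasible plan suggests, that the truck starts at loc2 and the package must move from loc1 to loc2; the variable name Truck1 is taken from the DTG labels.

```latex
\begin{align*}
x_{\text{Load(p1,t1,l1)}} = x_{\text{Unload(p1,t1,l2)}} &= 1
  && \text{(cheapest package flow: } 1 \to T \to 2\text{)} \\
x_{\text{Load(p1,t1,l1)}} &\le M \, x_{\text{Drive(l2,l1)}}
  && \text{(prevail: truck at loc1} \ne s_0[\text{Truck1}]\text{)} \\
\Rightarrow \quad x_{\text{Drive(l2,l1)}} &\ge 1/M \\
\Rightarrow \quad \sum_{a \in A} x_a &= 1 + 1 + 1/M = 2 + 1/M
\end{align*}
```

Unload(p1,t1,l2) needs the truck at loc2, which is its initial location, so it triggers no prevail constraint. The relaxation therefore pays only a fraction $1/M$ for the first drive and nothing for the return trip, which is why the estimate falls well short of the plan length 4.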
Page 13
Preliminary results
Problem       LP      LP-     Lplan   h+      hFF     Optimal
log4-0                16.0*   17      19      19      20
log4-1                14.0*   15      17      17      19
log4-2                10.0*   11      13      13      15
log5-1                12.0*   13      15      15      17
log5-2                6.0*    7       8       8       8
log6-1                10.0*   11      13      13      14
log6-9                18.0*   19      21      21      24
log12-0               32.0*   33      39      39      -
log15-1               54.0*   -       63      66      -
freecell2-1           9       9       9       9       9
freecell2-2           8       8       8       8       8
freecell2-3           8       8       8       9       8
freecell2-4           8       8       8       9       8
freecell2-5           9       9       9       9       9
freecell3-5           12      13      13      14      -
freecell13-3          55      -       -       95      -
freecell13-4          54      -       -       94      -
freecell13-5          52      -       -       94      -
driverlog1            3.0*    7       6       8       7
driverlog2            12.0*   13      14      15      19
driverlog3            8.0*    9       11      11      12
driverlog4            11.0*   12      12      15      16
driverlog6            8.0*    9       10      10      11
driverlog7            11.0*   12      12      15      13
driverlog13           15.0*   16      21      26      -
driverlog19           60.0*   -       89      93      -
driverlog20           60.0*   -       84      106     -
Page 14
Preliminary results
Problem       LP      LP-     Lplan   h+      hFF     Optimal
zenotravel1           1       1       1       1       1
zenotravel2           3.0*    5       4       4       6
zenotravel3           4.0*    5       5       5       6
zenotravel4           5.0*    6       6       6       8
zenotravel5           8.0*    9       11      11      11
zenotravel6           8.0*    9       11      13      11
zenotravel13          18.0*   19      23      23      -
zenotravel19          46.0*   -       62      63      -
zenotravel20          50.0*   -       -       69      -
tpp1                  3.0*    5       4       4       5
tpp2                  6.0*    7       7       7       8
tpp3                  9.0*    10      10      10      11
tpp4                  12.0*   13      13      13      14
tpp5                  15.0*   17      17      17      19
tpp6                  21.0*   23      21      21      -
tpp28                 150.0*  -       -       88      -
tpp29                 -       -       -       104     -
tpp30                 174.0*  -       -       101     -
bw-sussman            4       6       5       5       6
bw-12step             4       8       4       7       12
bw-large-a            12      12      12      12      12
bw-large-b            16      18      16      16      18
Page 15
Strengthening techniques
• Composition of state variables (i.e., fluent merging)
  – Given the domain transition graphs (DTGs) of two state variables $c_1$, $c_2$, the composition of $DTG_{c_1}$ and $DTG_{c_2}$ is the domain transition graph $DTG_{c_1 \| c_2} = (V_{c_1 \| c_2}, E_{c_1 \| c_2})$ where
  – $V_{c_1 \| c_2} = V_{c_1} \times V_{c_2}$
  – $((f_1,g_1),(f_2,g_2)) \in E_{c_1 \| c_2}$ if $f_1, f_2 \in V_{c_1}$, $g_1, g_2 \in V_{c_2}$ and there exists an action $a \in A$ such that one of the following conditions holds
    • pre[$c_1$] = $f_1$, post[$c_1$] = $f_2$, and pre[$c_2$] = $g_1$, post[$c_2$] = $g_2$
    • pre[$c_1$] = $f_1$, post[$c_1$] = $f_2$, and prevail[$c_2$] = $g_1$, $g_1 = g_2$
    • pre[$c_1$] = $f_1$, post[$c_1$] = $f_2$, and $g_1 = g_2$
The term composition is also used in model checking to define the parallel composition or the synchronized product of automata [Cassandras & Lafortune, 1999]
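The definition can be turned into a short routine. The following is a minimal sketch under stated assumptions: the action encoding is hypothetical, the three listed conditions are applied symmetrically (so an action that changes only $c_2$ also yields arcs), and actions that change neither variable contribute no arcs. It is applied to the truck and package variables of the logistics example.

```python
from itertools import product

# Hypothetical encoding of the logistics actions (names are assumptions).
ACTIONS = {
    "Drive(l1,l2)":     {"effects": {"truck": ("1", "2")}, "prevail": {}},
    "Drive(l2,l1)":     {"effects": {"truck": ("2", "1")}, "prevail": {}},
    "Load(p1,t1,l1)":   {"effects": {"package": ("1", "T")}, "prevail": {"truck": "1"}},
    "Load(p1,t1,l2)":   {"effects": {"package": ("2", "T")}, "prevail": {"truck": "2"}},
    "Unload(p1,t1,l1)": {"effects": {"package": ("T", "1")}, "prevail": {"truck": "1"}},
    "Unload(p1,t1,l2)": {"effects": {"package": ("T", "2")}, "prevail": {"truck": "2"}},
}


def transitions(action, var, domain):
    """(from, to) pairs that the action permits on one state variable."""
    if var in action["effects"]:
        return [action["effects"][var]]   # pre -> post
    if var in action["prevail"]:
        value = action["prevail"][var]
        return [(value, value)]           # value must hold, stays unchanged
    return [(v, v) for v in domain]       # variable untouched by the action


def compose(actions, c1, dom1, c2, dom2):
    """Return (nodes, arcs) of the composed DTG; arcs are (from, to, action)."""
    nodes = set(product(dom1, dom2))
    arcs = set()
    for name, action in actions.items():
        if c1 not in action["effects"] and c2 not in action["effects"]:
            continue  # the action changes neither variable
        for (f1, f2), (g1, g2) in product(transitions(action, c1, dom1),
                                          transitions(action, c2, dom2)):
            arcs.add(((f1, g1), (f2, g2), name))
    return nodes, arcs


nodes, arcs = compose(ACTIONS, "truck", ["1", "2"], "package", ["1", "2", "T"])
for source, target, name in sorted(arcs):
    print(source, "->", target, "via", name)
```

Each Drive action appears on three arcs of the composed graph, once per package value, while each Load or Unload appears exactly once because its prevail condition pins the truck location.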
Page 16
Example
• Two DTGs and their composition
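[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2; and the composition DTG_c1||c2 with nodes (f1,g1), (f1,g2), (f2,g1), (f2,g2), (f3,g1), (f3,g2). Arcs are labeled with the actions a, b, c, d.]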
Page 17
Example
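• Two DTGs and their composition
  – Small in-arcs denote the initial state
  – Double circles denote the goal
[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2; and the composition DTG_c1||c2 with nodes (f1,g1), (f1,g2), (f2,g1), (f2,g2), (f3,g1). Arcs are labeled with the actions a, b, c, d.]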
Page 18
Simple logistics example
[Figure: locations loc1 and loc2, and the composed graph DTG_Truck1||Package1 with nodes (1,1), (1,T), (2,T), (2,2), (1,2), (2,1). Arcs are labeled Drive(l1,l2), Drive(l2,l1), Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), Unload(p1,t1,l2); each Drive action appears on three arcs.]
Page 19
Simple logistics example
LP solution
$x_{\text{Drive(l2,l1)}} = 1$
$x_{\text{Load(p1,t1,l1)}} = 1$
$x_{\text{Drive(l1,l2)}} = 1$
$x_{\text{Unload(p1,t1,l2)}} = 1$
Objective value: 4
Drive(l2,l1) Load(p1,t1,l1) Drive(l1,l2) Unload(p1,t1,l2)
[Figure: DTG_Truck1||Package1 with nodes (1,1), (1,T), (2,T), (2,2), (1,2), (2,1) and arcs labeled with the Drive, Load, and Unload actions]
Page 20
Another example
• Two DTGs and their composition
[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2, g3; and the composition DTG_c1||c2 with the nine nodes (f1,g1) through (f3,g3)]
Page 21
Another example
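• Two DTGs and their composition
  – Solution to the individual state variables
[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2, g3; and the composition DTG_c1||c2 with the nine nodes (f1,g1) through (f3,g3). The solution arcs in DTG_c1 and DTG_c2 are labeled a and b.]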
Page 23
Another example
• Two DTGs and their composition
  – Solution to the individual state variables represented in the composed state variable
[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2, g3; and the composition DTG_c1||c2 with the nine nodes (f1,g1) through (f3,g3). The solution arcs a and b are drawn in the individual DTGs and in the composition.]
Violates balance of flow constraints
Page 24
Another example
• Two DTGs and their composition
  – Adding new balance of flow constraints strengthens the formulation
[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2, g3; and the composition DTG_c1||c2 with the nine nodes (f1,g1) through (f3,g3). Arcs are labeled with the actions a, b, c, d, e.]
Page 25
Identifying mergeable fluents
• When should we create a composition of two or more state variables?
  – Look at the causal graph
  – Look at the actions that introduce dependencies in the causal graph
[Figure: a causal graph over Person 1, Person 2, Airplane 1, Airplane 2, Fuel 1, Fuel 2, and the same graph after merging each airplane with its fuel variable into Airplane 1 Fuel 1 and Airplane 2 Fuel 2]
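One possible reading of these two hints can be sketched as follows. Everything here is an assumption made for illustration: the Zenotravel-like actions are hypothetical, the causal graph uses the common definition (an arc from u to v when some action mentions u and changes v), and "variables that sit on a two-way dependency" is just one candidate merging criterion, not necessarily the one the authors use.

```python
# Hypothetical Zenotravel-like actions (names and values are assumptions).
ACTIONS = {
    "fly(a1,c1,c2)": {
        "effects": {"airplane1": ("c1", "c2"), "fuel1": ("full", "empty")},
        "prevail": {},
    },
    "refuel(a1,c1)": {
        "effects": {"fuel1": ("empty", "full")},
        "prevail": {"airplane1": "c1"},
    },
    "board(p1,a1,c1)": {
        "effects": {"person1": ("c1", "a1")},
        "prevail": {"airplane1": "c1"},
    },
    "debark(p1,a1,c2)": {
        "effects": {"person1": ("a1", "c2")},
        "prevail": {"airplane1": "c2"},
    },
}


def causal_graph(actions):
    """Arcs (u, v): some action mentions u and changes v, with u != v."""
    arcs = set()
    for action in actions.values():
        mentioned = set(action["effects"]) | set(action["prevail"])
        for u in mentioned:
            for v in action["effects"]:
                if u != v:
                    arcs.add((u, v))
    return arcs


def merge_candidates(actions):
    """Pairs of variables that depend on each other in both directions."""
    arcs = causal_graph(actions)
    return sorted({tuple(sorted((u, v))) for (u, v) in arcs if (v, u) in arcs})


print(sorted(causal_graph(ACTIONS)))
print(merge_candidates(ACTIONS))
```

In this toy instance the fly action changes the airplane location and the fuel level together, so the pair (airplane1, fuel1) is flagged, while person1 only depends on airplane1 in one direction and is left alone.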
Page 26
Experimental setup
• Objective
  – Minimize number of actions
• Domains
  – Selected domains from the International Planning Competition
    • Logistics
    • Freecell
    • Driverlog
    • Zenotravel
    • TPP
    • Blocksworld
• Resources
  – 2.67 GHz Linux machine
  – 1 GB memory
  – 15 minutes runtime
  – CPLEX 10.0
Page 27
Experimental setup
• Distance estimates
  – LP
    • Action selection formulation with strengthening
  – LP-
    • Action selection formulation without strengthening
  – Lplan
    • Step-based integer programming formulation by Lplan [Bylander, 1997]
  – h+
    • Optimal relaxed plan when the delete effects are ignored
  – hFF
    • Inadmissible but efficient relaxed plan heuristic by FF [Hoffmann and Nebel, 2001]
  – Optimal
    • Optimal distance estimate given by Satplanner using the –opt flag [Rintanen, Heljanko, and Niemela, 2005]
Page 28
Experimental results
Problem       LP      LP-     Lplan   h+      hFF     Optimal
log4-0        20      16.0*   17      19      19      20
log4-1        19      14.0*   15      17      17      19
log4-2        15      10.0*   11      13      13      15
log5-1        17      12.0*   13      15      15      17
log5-2        8       6.0*    7       8       8       8
log6-1        14      10.0*   11      13      13      14
log6-9        24      18.0*   19      21      21      24
log12-0       42      32.0*   33      39      39      -
log15-1       67      54.0*   -       63      66      -
freecell2-1   9       9       9       9       9       9
freecell2-2   8       8       8       8       8       8
freecell2-3   8       8       8       8       9       8
freecell2-4   8       8       8       8       9       8
freecell2-5   9       9       9       9       9       9
freecell3-5   12      12      13      13      14      -
freecell13-3  55      55      -       -       95      -
freecell13-4  54      54      -       -       94      -
freecell13-5  52      52      -       -       94      -
driverlog1    7       3.0*    7       6       8       7
driverlog2    19      12.0*   13      14      15      19
driverlog3    11      8.0*    9       11      11      12
driverlog4    15.5    11.0*   12      12      15      16
driverlog6    11      8.0*    9       10      10      11
driverlog7    13      11.0*   12      12      15      13
driverlog13   24      15.0*   16      21      26      -
driverlog19   96.6*   60.0*   -       89      93      -
driverlog20   89.5*   60.0*   -       84      106     -
Page 29
Experimental results
Problem       LP      LP-     Lplan   h+      hFF     Optimal
zenotravel1   1       1       1       1       1       1
zenotravel2   6       3.0*    5       4       4       6
zenotravel3   6       4.0*    5       5       5       6
zenotravel4   8       5.0*    6       6       6       8
zenotravel5   11      8.0*    9       11      11      11
zenotravel6   11      8.0*    9       11      13      11
zenotravel13  24      18.0*   19      23      23      -
zenotravel19  66.2*   46.0*   -       62      63      -
zenotravel20  68.3*   50.0*   -       -       69      -
tpp1          5       3.0*    5       4       4       5
tpp2          8       6.0*    7       7       7       8
tpp3          11      9.0*    10      10      10      11
tpp4          14      12.0*   13      13      13      14
tpp5          19      15.0*   17      17      17      19
tpp6          25      21.0*   23      21      21      -
tpp28         -       150.0*  -       -       88      -
tpp29         -       -       -       -       104     -
tpp30         -       174.0*  -       -       101     -
bw-sussman    4       4       6       5       5       6
bw-12step     4       4       8       4       7       12
bw-large-a    12      12      12      12      12      12
bw-large-b    16      16      18      16      16      18
Distance estimates from the initial state to the goal (highlighted values equal the optimal distance)
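Since the highlighting does not survive in plain text, the matches can be recomputed from the numbers. The sketch below does this for a subset of rows only (zenotravel1 to zenotravel6 and the driverlog instances with a known optimum); the values are copied from the tables above with the trailing asterisks dropped, and the tuple layout is an assumption made for this example.

```python
# (problem, LP, LP-, Optimal), copied from the experimental results tables;
# a trailing "*" in the tables is dropped here.
ROWS = [
    ("zenotravel1", 1, 1, 1),
    ("zenotravel2", 6, 3, 6),
    ("zenotravel3", 6, 4, 6),
    ("zenotravel4", 8, 5, 8),
    ("zenotravel5", 11, 8, 11),
    ("zenotravel6", 11, 8, 11),
    ("driverlog1", 7, 3, 7),
    ("driverlog2", 19, 12, 19),
    ("driverlog3", 11, 8, 12),
    ("driverlog4", 15.5, 11, 16),
    ("driverlog6", 11, 8, 11),
    ("driverlog7", 13, 11, 13),
]
OPTIMAL = 3  # index of the Optimal column in each tuple


def matches(rows, column):
    """Problems whose estimate in the given column equals the optimum."""
    return [row[0] for row in rows if row[column] == row[OPTIMAL]]


lp_hits = matches(ROWS, 1)        # with strengthening
lp_minus_hits = matches(ROWS, 2)  # without strengthening
never_overestimates = all(row[1] <= row[OPTIMAL] and row[2] <= row[OPTIMAL]
                          for row in ROWS)

print("LP equals optimal on:", lp_hits)
print("LP- equals optimal on:", lp_minus_hits)
print("No estimate exceeds the optimum:", never_overestimates)
```

On these rows neither LP estimate exceeds the optimal distance, as expected of an admissible heuristic, and the strengthened LP matches the optimum far more often than LP-.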
Page 30
Experimental results
• Heuristic calculation time
[Figure: heuristic calculation time on a logarithmic axis from 0.01 to 1000 for LP, LP-, Lplan, and h+ across the Logistics, Freecell, Driverlog, Zenotravel, TPP, and Blocks domains]
Page 31
Conclusions and future work
• LP-based heuristic that respects delete effects but ignores action ordering shows very promising results
  – Finds the optimal distance estimate in several problem instances
  – Can be used to calculate admissible distance estimates for various optimization problems in planning
  – Ongoing work successfully incorporated our LP-based heuristic in a search algorithm that solves oversubscription planning
• Interesting directions for future work
  – Apply fluent merging more aggressively
  – Extend the formulation into a complete planning system
Page 32
LP-based heuristic
Relax the ordering of the actions
Set up an integer programming formulation
Solve the LP relaxation and use the objective function value as an admissible distance estimate
Strengthen the formulation by adding valid inequalities