An LP-Based Heuristic for Optimal Planning, Menkes van den Briel, Department of Industrial Engineering, Arizona State University, menkes@asu.edu

Post on 14-Dec-2015


An LP-Based Heuristic for Optimal Planning

Menkes van den Briel
Department of Industrial Engineering, Arizona State University
menkes@asu.edu

Subbarao Kambhampati
Department of Computer Science, Arizona State University
rao@asu.edu

Thomas Vossen
Leeds School of Business, University of Colorado at Boulder
vossen@colorado.edu

J. Benton
Department of Computer Science, Arizona State University
bentonj@asu.edu

http://rakaposhi.eas.asu.edu/yochan/

What is automated planning?

[Figure: initial and goal configurations over locations loc1 and loc2]

Initial state $s_0 \in S$

Goal $s_* \in S$

Action $a = \langle \mathit{pre}, \mathit{post}, \mathit{prevail} \rangle$

Plan $P = \langle a_1, \ldots, a_n \rangle$

Motivation

• Why heuristics?
  – Heuristic state space search has been very successful in solving automated planning problems

• Why optimal planning?
  – Real-world planning applications require optimal or near-optimal solutions
    • The difference between a (near-)optimal solution and a feasible solution may be the difference between winning or losing the interest of an investor or strategic partner

LP-based heuristic

Relax the ordering of the actions

Set up an integer programming formulation

Solve the LP relaxation and use its objective function value as an admissible distance estimate

Strengthen the formulation by adding valid inequalities
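Why the LP value is admissible can be written as a short chain of inequalities. This is a sketch assuming unit action costs (the objective counts actions) and a big-M constant large enough that the prevail constraints never exclude a real plan; $h^*(s)$ (the true optimal distance) and $z_{\mathrm{IP}}$, $z_{\mathrm{LP}}$ (the optimal values of the integer program and of its LP relaxation) are notation introduced here.

```latex
% Counting executions turns any plan P = <a_1, ..., a_n> from s into a feasible
% integer solution x_a = |{ i : a_i = a }| whose objective value is n = |P|.
% Dropping integrality only enlarges the feasible set, so it can only lower the optimum.
\[
  z_{\mathrm{LP}} \;\le\; z_{\mathrm{IP}} \;\le\; \min_{P} |P| \;=\; h^*(s)
\]
```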

Action selection formulation

• Represent the planning problem as a set of loosely coupled network flow problems
  – Each state variable defines one network flow problem
  – Nodes correspond to the state variable values
  – Arcs correspond to state variable transitions
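The network view can be sketched in a few lines: each action's pre/post pair for a variable becomes an arc in that variable's network, and each prevail condition is recorded at the value it requires. The `Action` class, the `build_dtgs` helper, and the string encoding of the logistics example (truck values 1, 2; package values 1, 2, T for "in the truck") are hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Action:
    """SAS+-style action (hypothetical encoding): pre/post give the value change
    of each affected variable, prevail the values that must hold and stay unchanged."""
    name: str
    pre: dict = field(default_factory=dict)
    post: dict = field(default_factory=dict)
    prevail: dict = field(default_factory=dict)


def build_dtgs(actions):
    """Build one network per state variable.

    arcs[c][(f, g)] lists the actions that move variable c from value f to g;
    prevails[c][f] lists the actions that require c == f without changing it.
    """
    arcs = defaultdict(lambda: defaultdict(list))
    prevails = defaultdict(lambda: defaultdict(list))
    for a in actions:
        for c, g in a.post.items():
            arcs[c][(a.pre[c], g)].append(a.name)
        for c, f in a.prevail.items():
            prevails[c][f].append(a.name)
    return arcs, prevails


# The simple logistics example: one truck, one package, locations 1 and 2.
ACTIONS = [
    Action("Drive(l1,l2)", {"truck": "1"}, {"truck": "2"}),
    Action("Drive(l2,l1)", {"truck": "2"}, {"truck": "1"}),
    Action("Load(p1,t1,l1)", {"package": "1"}, {"package": "T"}, {"truck": "1"}),
    Action("Load(p1,t1,l2)", {"package": "2"}, {"package": "T"}, {"truck": "2"}),
    Action("Unload(p1,t1,l1)", {"package": "T"}, {"package": "1"}, {"truck": "1"}),
    Action("Unload(p1,t1,l2)", {"package": "T"}, {"package": "2"}, {"truck": "2"}),
]

arcs, prevails = build_dtgs(ACTIONS)
for (f, g), names in sorted(arcs["package"].items()):
    print(f"package {f} -> {g}: {names}")
```

The Load and Unload actions end up as arcs in the package network and as prevail annotations on the truck network, which is exactly the loose coupling between the two flow problems.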

Simple logistics example

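[Figure: domain transition graphs for locations loc1 and loc2. DTG_Package1 has values 1, 2, and T, with arcs labeled Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), and Unload(p1,t1,l2). DTG_Truck1 has values 1 and 2, with arcs labeled Drive(l1,l2) and Drive(l2,l1); Load and Unload labels are also attached to the Truck1 values they require.]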

Action selection formulation

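• Variables
  – $x_a \in \mathbb{Z}^+$ for $a \in A$; $x_a$ is equal to the number of times action $a$ is executed

• Objective function
  – $\min \sum_{a \in A} x_a$

• Constraints, for all $c \in C$, $f \in V_c$:

$$\sum_{e \in V_c^+(f)} \sum_{a \in A_c^E(e)} x_a \;-\; \sum_{e \in V_c^-(f)} \sum_{b \in A_c^E(e)} x_b \;\ge\; \begin{cases} 1 & \text{if } f \neq s_0[c],\ f = s_*[c] \\ -1 & \text{if } f = s_0[c],\ f \neq s_*[c] \\ 0 & \text{otherwise} \end{cases}$$

$$x_a \;\le\; M \sum_{e \in V_c^+(f)} \sum_{b \in A_c^E(e)} x_b \qquad \text{for all } f \neq s_0[c],\ a \in A_c^V(f)$$

where $V_c^+(f)$ and $V_c^-(f)$ are the arcs entering and leaving value $f$ in the DTG of $c$, $A_c^E(e)$ is the set of actions labeling arc $e$, $A_c^V(f)$ is the set of actions with prevail condition $c = f$, and $M$ is a large constant.

No time indices
No upper bound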


Simple logistics example

Feasible plan

x_Drive(l2,l1) = 1, x_Load(p1,t1,l1) = 1, x_Drive(l1,l2) = 1, x_Unload(p1,t1,l2) = 1

Objective value: 4

Drive(l2,l1) Load(p1,t1,l1) Drive(l1,l2) Unload(p1,t1,l2)

[Figure: DTG_Package1 (values 1, 2, T) and DTG_Truck1 (values 1, 2) with the arcs used by this plan]

Simple logistics example

LP solution

x_Load(p1,t1,l1) = 1, x_Unload(p1,t1,l2) = 1, x_Drive(l2,l1) = 1/M

Objective value: 2 + 1/M

Drive(l2,l1) Load(p1,t1,l1) Unload(p1,t1,l2) … …

[Figure: DTG_Package1 and DTG_Truck1 with the arcs used by this LP solution]
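Checking both solutions against the action selection constraints makes the weakness of the relaxation concrete. This is a minimal sketch, assuming a hypothetical encoding where the truck starts at location 2, the package starts at location 1 with goal location 2, and the truck has no goal value; it also reads the balance-of-flow relation as "≥", the reading under which the fractional solution with a goal-free truck is feasible. `M = 10` is an arbitrary choice for the big-M constant.

```python
from fractions import Fraction

# Hypothetical encoding: values of each state variable and its transitions
# written as (from_value, to_value, action).
VALUES = {"truck": ["1", "2"], "package": ["1", "2", "T"]}
TRANSITIONS = {
    "truck": [("1", "2", "Drive(l1,l2)"), ("2", "1", "Drive(l2,l1)")],
    "package": [("1", "T", "Load(p1,t1,l1)"), ("2", "T", "Load(p1,t1,l2)"),
                ("T", "1", "Unload(p1,t1,l1)"), ("T", "2", "Unload(p1,t1,l2)")],
}
# Actions whose prevail condition is c == f (they need c == f but leave it unchanged).
PREVAIL = {"truck": {"1": ["Load(p1,t1,l1)", "Unload(p1,t1,l1)"],
                     "2": ["Load(p1,t1,l2)", "Unload(p1,t1,l2)"]}}
INIT = {"truck": "2", "package": "1"}
GOAL = {"package": "2"}  # assumption: the truck has no goal value


def rhs(c, f):
    """Right-hand side of the balance-of-flow constraint for value f of variable c."""
    goal = GOAL.get(c)
    if f != INIT[c] and f == goal:
        return 1
    if f == INIT[c] and f != goal:
        return -1
    return 0


def violations(x, M):
    """List the constraints violated by x (action name -> count) for big-M value M."""
    bad = []
    for c, values in VALUES.items():
        for f in values:
            inflow = sum(x.get(a, 0) for (_, to, a) in TRANSITIONS[c] if to == f)
            outflow = sum(x.get(a, 0) for (frm, _, a) in TRANSITIONS[c] if frm == f)
            # Balance of flow, read as: inflow - outflow >= rhs.
            if inflow - outflow < rhs(c, f):
                bad.append(f"balance {c}={f}")
            # Prevail: a non-initial value must be entered before actions can use it.
            if f != INIT[c]:
                for a in PREVAIL.get(c, {}).get(f, []):
                    if x.get(a, 0) > M * inflow:
                        bad.append(f"prevail {a} on {c}={f}")
    return bad


M = 10
plan = {"Drive(l2,l1)": 1, "Load(p1,t1,l1)": 1, "Drive(l1,l2)": 1, "Unload(p1,t1,l2)": 1}
lp = {"Load(p1,t1,l1)": 1, "Unload(p1,t1,l2)": 1, "Drive(l2,l1)": Fraction(1, M)}
for name, x in (("feasible plan", plan), ("LP solution", lp)):
    print(name, sum(x.values()), violations(x, M))
```

Both points satisfy every constraint. The value 1/M for x_Drive(l2,l1) is the smallest one that satisfies the prevail constraint x_Load(p1,t1,l1) ≤ M · x_Drive(l2,l1), which is how the LP objective drops to 2 + 1/M while the plan needs 4 actions.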

Preliminary results

Problem       LP-     Lplan   h+      hFF     Optimal
log4-0        16.0*   17      19      19      20
log4-1        14.0*   15      17      17      19
log4-2        10.0*   11      13      13      15
log5-1        12.0*   13      15      15      17
log5-2        6.0*    7       8       8       8
log6-1        10.0*   11      13      13      14
log6-9        18.0*   19      21      21      24
log12-0       32.0*   33      39      39      -
log15-1       54.0*   -       63      66      -
freecell2-1   9       9       9       9       9
freecell2-2   8       8       8       8       8
freecell2-3   8       8       8       9       8
freecell2-4   8       8       8       9       8
freecell2-5   9       9       9       9       9
freecell3-5   12      13      13      14      -
freecell13-3  55      -       -       95      -
freecell13-4  54      -       -       94      -
freecell13-5  52      -       -       94      -
driverlog1    3.0*    7       6       8       7
driverlog2    12.0*   13      14      15      19
driverlog3    8.0*    9       11      11      12
driverlog4    11.0*   12      12      15      16
driverlog6    8.0*    9       10      10      11
driverlog7    11.0*   12      12      15      13
driverlog13   15.0*   16      21      26      -
driverlog19   60.0*   -       89      93      -
driverlog20   60.0*   -       84      106     -

Preliminary results

Problem       LP-     Lplan   h+      hFF     Optimal
zenotravel1   1       1       1       1       1
zenotravel2   3.0*    5       4       4       6
zenotravel3   4.0*    5       5       5       6
zenotravel4   5.0*    6       6       6       8
zenotravel5   8.0*    9       11      11      11
zenotravel6   8.0*    9       11      13      11
zenotravel13  18.0*   19      23      23      -
zenotravel19  46.0*   -       62      63      -
zenotravel20  50.0*   -       -       69      -
tpp1          3.0*    5       4       4       5
tpp2          6.0*    7       7       7       8
tpp3          9.0*    10      10      10      11
tpp4          12.0*   13      13      13      14
tpp5          15.0*   17      17      17      19
tpp6          21.0*   23      21      21      -
tpp28         150.0*  -       -       88      -
tpp29         -       -       -       104     -
tpp30         174.0*  -       -       101     -
bw-sussman    4       6       5       5       6
bw-12step     4       8       4       7       12
bw-large-a    12      12      12      12      12
bw-large-b    16      18      16      16      18

Strengthening techniques

• Composition of state variables (i.e., fluent merging)
  – Given the domain transition graphs (DTGs) of two state variables $c_1$ and $c_2$, the composition of $DTG_{c_1}$ and $DTG_{c_2}$ is the domain transition graph $DTG_{c_1 \| c_2} = (V_{c_1 \| c_2}, E_{c_1 \| c_2})$, where
  – $V_{c_1 \| c_2} = V_{c_1} \times V_{c_2}$
  – $((f_1, g_1), (f_2, g_2)) \in E_{c_1 \| c_2}$ if $f_1, f_2 \in V_{c_1}$, $g_1, g_2 \in V_{c_2}$, and there exists an action $a \in A$ such that one of the following conditions holds:
    • $pre[c_1] = f_1$, $post[c_1] = f_2$, and $pre[c_2] = g_1$, $post[c_2] = g_2$
    • $pre[c_1] = f_1$, $post[c_1] = f_2$, and $prevail[c_2] = g_1$, $g_1 = g_2$
    • $pre[c_1] = f_1$, $post[c_1] = f_2$, and $g_1 = g_2$

The term composition is also used in model checking to define the parallel composition or the synchronized product of automata [Cassandras & Lafortune, 1999].
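The three conditions translate almost line by line into code. This sketch makes three assumptions of its own: the third condition applies only when the action does not mention $c_2$ at all; the same conditions are also applied with the roles of $c_1$ and $c_2$ swapped, so that actions changing only $c_2$ (such as Load and Unload when $c_1$ is the truck) produce arcs; and every changed variable has a defined pre value. The `Action` class, `compose`, and the logistics encoding are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Action:
    """SAS+-style action (hypothetical encoding)."""
    name: str
    pre: dict = field(default_factory=dict)      # value before the transition
    post: dict = field(default_factory=dict)     # value after the transition
    prevail: dict = field(default_factory=dict)  # required and left unchanged


def compose(c1, c2, values, actions):
    """Nodes and labeled arcs of DTG_{c1||c2}; nodes are pairs (value of c1, value of c2)."""
    nodes = list(product(values[c1], values[c2]))
    arcs = set()
    for a in actions:
        # The slide's conditions take a as changing c1; we also apply them with
        # the roles of c1 and c2 swapped (assumption: symmetric cases).
        for x, y in ((c1, c2), (c2, c1)):
            if x not in a.post:
                continue
            fx = (a.pre[x], a.post[x])
            if y in a.post:            # a changes both variables
                gys = [(a.pre[y], a.post[y])]
            elif y in a.prevail:       # a requires y to keep one value
                gys = [(a.prevail[y], a.prevail[y])]
            else:                      # a does not mention y: any value, unchanged
                gys = [(g, g) for g in values[y]]
            for g_from, g_to in gys:
                if x == c1:
                    arcs.add(((fx[0], g_from), (fx[1], g_to), a.name))
                else:
                    arcs.add(((g_from, fx[0]), (g_to, fx[1]), a.name))
    return nodes, arcs


VALUES = {"truck": ["1", "2"], "package": ["1", "2", "T"]}
ACTIONS = [
    Action("Drive(l1,l2)", {"truck": "1"}, {"truck": "2"}),
    Action("Drive(l2,l1)", {"truck": "2"}, {"truck": "1"}),
    Action("Load(p1,t1,l1)", {"package": "1"}, {"package": "T"}, {"truck": "1"}),
    Action("Load(p1,t1,l2)", {"package": "2"}, {"package": "T"}, {"truck": "2"}),
    Action("Unload(p1,t1,l1)", {"package": "T"}, {"package": "1"}, {"truck": "1"}),
    Action("Unload(p1,t1,l2)", {"package": "T"}, {"package": "2"}, {"truck": "2"}),
]

nodes, arcs = compose("truck", "package", VALUES, ACTIONS)
for src, dst, name in sorted(arcs):
    print(src, "->", dst, name)
```

In this encoding Load(p1,t1,l1) labels only the arc from (1,1) to (1,T), so in the composed graph the package can be loaded at location 1 only on a path that has first brought the truck there.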

Example

• Two DTGs and their composition

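[Figure: DTG_c1 over values f1, f2, f3 and DTG_c2 over values g1, g2, with arcs labeled a, b, c, and d; their composition DTG_c1||c2 over the pairs (f1,g1), (f1,g2), (f2,g1), (f2,g2), (f3,g1), (f3,g2), with arcs labeled a, b, c, and d]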

Example

• Two DTGs and their composition
  – Small in-arcs denote the initial state
  – Double circles denote the goal

[Figure: DTG_c1, DTG_c2, and DTG_c1||c2 with the initial state and goal marked]

Simple logistics example

[Figure: DTG_Truck1||Package1 for locations loc1 and loc2, with nodes (1,1), (1,T), (2,T), (2,2), (1,2), (2,1) and arcs labeled Drive(l1,l2), Drive(l2,l1), Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), and Unload(p1,t1,l2)]

Simple logistics example

LP solution

x_Drive(l2,l1) = 1, x_Load(p1,t1,l1) = 1, x_Drive(l1,l2) = 1, x_Unload(p1,t1,l2) = 1

Objective value: 4

Drive(l2,l1) Load(p1,t1,l1) Drive(l1,l2) Unload(p1,t1,l2)

[Figure: DTG_Truck1||Package1 with the arcs used by this solution]

Another example

• Two DTGs and their composition

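[Figure: DTG_c1 over values f1, f2, f3 and DTG_c2 over values g1, g2, g3; their composition DTG_c1||c2 over the nine pairs (f1,g1), (f1,g2), (f1,g3), (f2,g1), (f2,g2), (f2,g3), (f3,g1), (f3,g2), (f3,g3)]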

Another example

• Two DTGs and their composition
  – Solution to the individual state variables

[Figure: DTG_c1, DTG_c2, and DTG_c1||c2, with the solution arcs (labeled a and b) marked in DTG_c1 and DTG_c2]

Another example

• Two DTGs and their composition
  – Solution to the individual state variables represented in the composed state variable

[Figure: DTG_c1, DTG_c2, and DTG_c1||c2, with the arcs labeled a and b from the individual solutions also drawn in DTG_c1||c2]

Another example

• Two DTGs and their composition
  – Solution to the individual state variables represented in the composed state variable

[Figure: DTG_c1, DTG_c2, and DTG_c1||c2, with the arcs labeled a and b from the individual solutions drawn in DTG_c1||c2]

Represented this way, the solution violates the balance of flow constraints of DTG_c1||c2.

Another example

• Two DTGs and their composition
  – Adding new balance of flow constraints strengthens the formulation

[Figure: DTG_c1, DTG_c2, and DTG_c1||c2, with arcs labeled a, b, c, d, and e]

Identifying mergeable fluents

• When should we create a composition of two or more state variables?
  – Look at the causal graph
  – Look at the actions that introduce dependencies in the causal graph

[Figure: causal graph with nodes Person 1, Person 2, Airplane 1, Airplane 2, Fuel 1, and Fuel 2; after merging, Airplane 1 is combined with Fuel 1 and Airplane 2 with Fuel 2]
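The slide names the causal graph but not an exact merge rule, so the following is only a sketch. It uses a common definition of the causal graph (an arc u → v when an action that changes v also conditions on or changes u) and, as a hypothetical rule of our own, flags pairs of variables with arcs in both directions. On a small Zenotravel-like encoding (also hypothetical) where flying changes both the airplane's location and its fuel level, this reproduces the Airplane/Fuel merges shown in the figure.

```python
def causal_graph(actions):
    """Arcs u -> v for actions given as (pre, post, prevail) dicts:
    v is changed by an action that also conditions on or changes u."""
    arcs = set()
    for pre, post, prevail in actions:
        related = set(pre) | set(post) | set(prevail)
        for v in post:
            for u in related - {v}:
                arcs.add((u, v))
    return arcs


def merge_candidates(arcs):
    """Hypothetical criterion: pairs of variables that depend on each other both ways."""
    return sorted((u, v) for (u, v) in arcs if u < v and (v, u) in arcs)


# Small Zenotravel-like instance (hypothetical encoding).
ACTIONS = []
for i in (1, 2):
    plane, fuel = f"airplane{i}", f"fuel{i}"
    # fly: changes the plane's location and its fuel level together
    ACTIONS.append(({plane: "cityA", fuel: 1}, {plane: "cityB", fuel: 0}, {}))
    for p in (1, 2):
        person = f"person{p}"
        # board: changes the person's location; the plane's location is a prevail condition
        ACTIONS.append(({person: "cityA"}, {person: plane}, {plane: "cityA"}))

arcs = causal_graph(ACTIONS)
print(merge_candidates(arcs))
```

The person variables depend on the airplanes only in one direction (through prevail conditions), so they are left unmerged, matching the figure.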

Experimental setup

• Objective
  – Minimize the number of actions

• Domains
  – Selected domains from the International Planning Competition
    • Logistics
    • Freecell
    • Driverlog
    • Zenotravel
    • TPP
    • Blocksworld

• Resources
  – 2.67 GHz Linux machine
  – 1 GB memory
  – 15 minutes of runtime
  – CPLEX 10.0

Experimental setup

• Distance estimates
  – LP
    • Action selection formulation with strengthening
  – LP-
    • Action selection formulation without strengthening
  – Lplan
    • Step-based integer programming formulation by Lplan [Bylander, 1997]
  – h+
    • Length of an optimal relaxed plan, i.e., a plan that ignores delete effects
  – hFF
    • Inadmissible but efficient relaxed plan heuristic by FF [Hoffmann and Nebel, 2001]
  – Optimal
    • Optimal distance given by Satplanner using the –opt flag [Rintanen, Heljanko, and Niemela, 2005]

Experimental results

Problem       LP      LP-     Lplan   h+      hFF     Optimal
log4-0        20      16.0*   17      19      19      20
log4-1        19      14.0*   15      17      17      19
log4-2        15      10.0*   11      13      13      15
log5-1        17      12.0*   13      15      15      17
log5-2        8       6.0*    7       8       8       8
log6-1        14      10.0*   11      13      13      14
log6-9        24      18.0*   19      21      21      24
log12-0       42      32.0*   33      39      39      -
log15-1       67      54.0*   -       63      66      -
freecell2-1   9       9       9       9       9       9
freecell2-2   8       8       8       8       8       8
freecell2-3   8       8       8       8       9       8
freecell2-4   8       8       8       8       9       8
freecell2-5   9       9       9       9       9       9
freecell3-5   12      12      13      13      14      -
freecell13-3  55      55      -       -       95      -
freecell13-4  54      54      -       -       94      -
freecell13-5  52      52      -       -       94      -
driverlog1    7       3.0*    7       6       8       7
driverlog2    19      12.0*   13      14      15      19
driverlog3    11      8.0*    9       11      11      12
driverlog4    15.5    11.0*   12      12      15      16
driverlog6    11      8.0*    9       10      10      11
driverlog7    13      11.0*   12      12      15      13
driverlog13   24      15.0*   16      21      26      -
driverlog19   96.6*   60.0*   -       89      93      -
driverlog20   89.5*   60.0*   -       84      106     -

Experimental results

Problem       LP      LP-     Lplan   h+      hFF     Optimal
zenotravel1   1       1       1       1       1       1
zenotravel2   6       3.0*    5       4       4       6
zenotravel3   6       4.0*    5       5       5       6
zenotravel4   8       5.0*    6       6       6       8
zenotravel5   11      8.0*    9       11      11      11
zenotravel6   11      8.0*    9       11      13      11
zenotravel13  24      18.0*   19      23      23      -
zenotravel19  66.2*   46.0*   -       62      63      -
zenotravel20  68.3*   50.0*   -       -       69      -
tpp1          5       3.0*    5       4       4       5
tpp2          8       6.0*    7       7       7       8
tpp3          11      9.0*    10      10      10      11
tpp4          14      12.0*   13      13      13      14
tpp5          19      15.0*   17      17      17      19
tpp6          25      21.0*   23      21      21      -
tpp28         -       150.0*  -       -       88      -
tpp29         -       -       -       -       104     -
tpp30         -       174.0*  -       -       101     -
bw-sussman    4       4       6       5       5       6
bw-12step     4       4       8       4       7       12
bw-large-a    12      12      12      12      12      12
bw-large-b    16      16      18      16      16      18

Distance estimates from the initial state to the goal (highlighted values equal the optimal distance)

Experimental results

• Heuristic calculation time

[Figure: heuristic calculation time on a logarithmic axis from 0.01 to 1000 for lp, lp-, lplan, and h+ in the Logistics, Freecell, Driverlog, Zenotravel, TPP, and Blocks domains]

Conclusions and future work

• An LP-based heuristic that respects delete effects but ignores action ordering shows very promising results
  – Finds the optimal distance estimate in several problem instances
  – Can be used to calculate admissible distance estimates for various optimization problems in planning
  – Ongoing work has successfully incorporated our LP-based heuristic into a search algorithm that solves oversubscription planning problems

• Interesting directions for future work
  – Apply fluent merging more aggressively
  – Extend the formulation into a complete planning system
