Page 1: Chapter02

Dana Nau and Vikas Shivashankar: Lecture slides for Automated Planning and Acting. Updated 3/24/15

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Chapter 2: Deliberation with Deterministic Models

Dana S. Nau and Vikas Shivashankar

University of Maryland

Page 2: Chapter02

Purpose of this Chapter

● Last time, Vikas mentioned conventional AI planning
  ➢ Given
    • a domain model (descriptions of the states and actions)
    • an initial state s0 and a goal g
  ➢ Find a plan or a policy that
    • is executable starting in s0
    • produces a state that satisfies g
● This chapter discusses some techniques for doing that
  ➢ Also, how to use those techniques in acting systems
● Outline:
  ➢ 2a: Representing planning domains
  ➢ 2b: Planning algorithms
  ➢ 2c: Acting and planning

Page 3: Chapter02

Chapter 2: Deliberation with Deterministic Models

2a: Representing Planning Domains

Page 4: Chapter02

Domain Model

● Planning domain: an abstract model of the environment
  ➢ Many different kinds of environments, various ways to model them
● In this chapter, the model is a deterministic state-transition system
  ➢ Σ = (S, A, γ)
    • S is a finite set of states
      ▸ States of the world
    • A is a finite set of actions
      ▸ Things an actor can do
    • γ: S × A → S is a prediction function (or state-transition function)
      ▸ Given a state s and action a, γ(s,a) is another state: a prediction of what state will be produced by executing a in s
      ▸ γ is partial: γ(s,a) is undefined if a is inapplicable in s
      ▸ Dom(a) = {s ∈ S | γ(s,a) is defined} = {states where a is applicable}
      ▸ Range(a) = {γ(s,a) | s ∈ Dom(a)}
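A minimal sketch of Σ = (S, A, γ) as an explicit lookup table may make the partial function γ, Dom(a), and Range(a) concrete. The state names (`s0`, `s1`, `s2`) and action names (`go`, `back`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Set, Tuple

State = str
Action = str

# gamma stored as a lookup table; a missing key means gamma(s, a) is undefined,
# i.e. action a is inapplicable in state s. (Hypothetical toy domain.)
GAMMA: Dict[Tuple[State, Action], State] = {
    ("s0", "go"): "s1",
    ("s1", "go"): "s2",
    ("s1", "back"): "s0",
    ("s2", "back"): "s1",
}

def gamma(s: State, a: Action) -> Optional[State]:
    """Predicted next state, or None where gamma is undefined."""
    return GAMMA.get((s, a))

def dom(a: Action) -> Set[State]:
    """Dom(a): the states in which a is applicable."""
    return {s for (s, b) in GAMMA if b == a}

def rng(a: Action) -> Set[State]:
    """Range(a): the states that a can produce."""
    return {t for (s, b), t in GAMMA.items() if b == a}
```

Returning `None` for undefined entries is just one way to model partiality; raising an exception would work equally well.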

Page 5: Chapter02

Implicit Assumptions

● The state-transition model incorporates the following assumptions
  ➢ Static world
    • Changes occur only in response to the actor’s actions
  ➢ Perfect information
    • Actor always has all the information it needs
  ➢ Instantaneous actions
    • Each action causes an instantaneous transition from one state to the next
  ➢ Determinism
    • Actions are deterministic
  ➢ Correct prediction function
    • Outcome of action a in state s is always γ(s,a)
  ➢ Flat search space
    • Only one level of abstraction; ignore how to refine actions at a lower level

Page 6: Chapter02

How to Represent Σ?

● If the domain is small enough
  ➢ Give each state and action a unique name
  ➢ For each s and a, store γ(s,a) in a lookup table

[Figure: map of locations loc0 through loc9 plotted on x-y axes (x from 0 to 6, y from 1 to 4)]

● If a domain is larger, don’t represent all states explicitly
  ➢ Have a formalism for describing states by describing their properties
  ➢ Represent each action by describing how it changes those properties
  ➢ Start with initial state, use actions to produce other states

Page 7: Chapter02

Deterministic Operator (General Form)

● Domain-specific format for representing states
  ➢ Invent your own format
● General form of a deterministic operator:
  ➢ o = (head(o), pre(o), eff(o), cost(o))
    • head(o): name and parameter list
    • pre(o): preconditions
      ▸ Computational tests to predict whether an action can be performed in a state s
      ▸ In principle, should be necessary/sufficient for the action to run without error
    • eff(o): effects
      ▸ Procedures that assign new values to some of the state variables
    • cost(o): procedure that returns a number
      ▸ Can be omitted, in which case cost(o) = 1
      ▸ Could represent monetary cost, time required, something else
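One way to read the general form o = (head(o), pre(o), eff(o), cost(o)) is as a record holding three procedures. The sketch below assumes a state is a plain dictionary and uses a toy stand-in for drilling; the field names `bit_loaded`, `clamped`, and `holes` are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]

def unit_cost(state: State) -> float:
    """Default used when cost(o) is omitted: cost(o) = 1."""
    return 1.0

@dataclass
class Operator:
    head: str                              # name and parameter list
    pre: Callable[[State], bool]           # computational test on a state
    eff: Callable[[State], State]          # procedure assigning new values
    cost: Callable[[State], float] = unit_cost

def apply(o: Operator, s: State) -> Optional[State]:
    """Return the predicted successor state, or None if pre(o) fails in s."""
    if not o.pre(s):
        return None
    return o.eff(dict(s))  # work on a copy so s itself is left unchanged

def _drill_eff(s: State) -> State:
    s["holes"] += 1        # hypothetical stand-in for the geometric update
    return s

drill = Operator(
    head="drill-hole(m, w, b)",
    pre=lambda s: s["bit_loaded"] and s["clamped"],
    eff=_drill_eff,
)
```

Keeping `pre`, `eff`, and `cost` as arbitrary callables matches the "invent your own format" spirit of the slide; the more restricted CSV form later in the chapter trades this freedom for easier reasoning.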

Page 8: Chapter02

Example

● Suppose we want to plan how to create a hole in a metal workpiece
● A state s includes
  ➢ a geometric model of the workpiece, and variables describing its location, orientation, and other status information
  ➢ capabilities and status of the drilling machine and drill
● Several actions (getting the workpiece onto the machine, clamping it, loading a drill bit, etc.)
  ➢ Next slide: the drilling operation itself

Page 9: Chapter02

Drilling Action

● head(o) = drill-hole(m, w, b, x1, …, xn)
  ➢ m, w, b are the names or ID numbers of the drilling machine, the workpiece, and the drill bit
  ➢ x1, …, xn are specifications of the hole’s location, orientation, depth, and required machining tolerances
● pre(o): computational tests
  ➢ Can the drilling machine and drill bit produce a hole having the desired geometry and machining tolerances?
  ➢ Is the drill loaded into the drilling machine? Is the workpiece properly clamped onto the drilling platform? Etc.
● eff(o): geometric computation
  ➢ Modify the geometric model of the workpiece to include a hole having the desired specifications.
● cost(o)
  ➢ could be an estimate of the time required for the drilling operation
  ➢ could be a cost estimate based on time + other criteria

Page 10: Chapter02

State-Variable Representation

● Represent each state as a collection of properties of various objects
  ➢ Represent each action by describing how it changes those properties
● Let B = {all objects that matter for planning}
  ➢ B may be classified into subsets: various kinds of objects
● Example
  ➢ B = Robots ∪ Containers ∪ Locs ∪ Booleans
    • Robots = {r1, r2}
    • Containers = {c1, c2}
    • Locs = {d1, d2, d3}
    • Booleans = {T, F}

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]

Page 11: Chapter02

Properties of Objects

● Define ways to represent properties of objects
  ➢ Two kinds of properties: rigid and varying
● A property is rigid if it stays the same in every state
  ➢ Represent as a mathematical relation
● Example:
  ➢ adjacent = {(d1,d2), (d2,d1)}
  ➢ Can also write as
    • adjacent(d1,d2)
    • adjacent(d2,d1)

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]

Page 12: Chapter02

Varying Properties

● A property is varying if it may differ in different states
  ➢ Represent using a state variable that we can assign a value to
  ➢ Each state variable x has a range (set of possible values), Range(x)
  ➢ For each state s, s(x) ∈ Range(x) is x’s value in state s
● Example: what we want to represent
  ➢ Each robot can hold at most one container
  ➢ Each robot is at one of the locations
  ➢ Each container is on a robot or at one of the locations

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]

Page 13: Chapter02

A Simple Example

● Set of all state variables
  ➢ X = {loc(r1), loaded(r1), loc(c1), loc(r2), loaded(r2), loc(c2)}
● Range(loc(r1)) = Locs
● Range(loc(r2)) = Locs
● Range(loaded(r1)) = Booleans
● Range(loaded(r2)) = Booleans
● Range(loc(c1)) = Robots ∪ Locs
● Range(loc(c2)) = Robots ∪ Locs

Why have both loc and loaded?

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]

Page 14: Chapter02

States as Functions

● Write each state as a function s that maps each x ∈ X to a value in Range(x)
  s1(loc(r1)) = d1, s1(loaded(r1)) = F, s1(loc(c1)) = d1,
  s1(loc(r2)) = d2, s1(loaded(r2)) = F, s1(loc(c2)) = d2
  ➢ Let S = the set of all such functions
● Functions are sets of ordered pairs
  s1 = {(loc(r1), d1), (loaded(r1), F), (loc(c1), d1), …}
● Rewrite as
  s1 = {loc(r1) = d1, loaded(r1) = F, loc(c1) = d1,
        loc(r2) = d2, loaded(r2) = F, loc(c2) = d2}

[Figure: state s1, with r1 and c1 at d1 and r2 and c2 at d2]

Page 15: Chapter02

Discussion

● Just because a function maps each state variable x into a value in Range(x), this doesn’t make it a state
  ➢ Some sets of state-variable values may not make any sense as states
    s = {loaded(r1) = F, loc(c1) = r1, …}
    s = {loc(c1) = d1, loc(c2) = r1, …}
● Need restrictions on what sets of state-variable values constitute real states
  ➢ In most of the book, we won’t represent such restrictions explicitly
  ➢ Just write the action models in such a way that none of them will ever produce such a set of assignments

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]
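To illustrate the gap between "every variable has a value in its range" and "this is a real state", here is a small sketch that stores a state as a dictionary and checks two restrictions. The restrictions themselves are an assumption drawn from the earlier slide (each robot holds at most one container, and loaded(r) should agree with whether some container is on r); the slides do not spell them out as code.

```python
ROBOTS = ["r1", "r2"]
CONTAINERS = ["c1", "c2"]
LOCS = ["d1", "d2", "d3"]

# Range(x) for every state variable x, following the ranges on the slides.
RANGE = {}
for r in ROBOTS:
    RANGE[f"loc({r})"] = set(LOCS)
    RANGE[f"loaded({r})"] = {True, False}
for c in CONTAINERS:
    RANGE[f"loc({c})"] = set(ROBOTS) | set(LOCS)

def in_ranges(s):
    """True if s assigns every state variable a value from its range."""
    return s.keys() == RANGE.keys() and all(s[x] in RANGE[x] for x in RANGE)

def is_state(s):
    """Range check plus the assumed consistency restrictions."""
    if not in_ranges(s):
        return False
    for r in ROBOTS:
        carried = [c for c in CONTAINERS if s[f"loc({c})"] == r]
        # at most one container per robot, and loaded(r) must match it
        if len(carried) > 1 or s[f"loaded({r})"] != (len(carried) == 1):
            return False
    return True

s1 = {"loc(r1)": "d1", "loaded(r1)": False, "loc(c1)": "d1",
      "loc(r2)": "d2", "loaded(r2)": False, "loc(c2)": "d2"}
bad = {**s1, "loc(c1)": "r1"}   # c1 is on r1 although loaded(r1) = F
```

Here `bad` passes the range check but fails `is_state`, which is the situation the slide warns about.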

Page 16: Chapter02

Descriptive Action Models

● Actions often fall into closely related classes
● For each class, write a parameterized schema called a planning operator to describe all the actions in that class

action move(r, l, m)
  Pre: loc(r) = l, adjacent(l, m)
  Eff: loc(r) ← m

action take(r, l, c)
  Pre: loaded(r) = F, loc(r) = l, loc(c) = l
  Eff: loaded(r) ← T, loc(c) ← r

action put(r, l, c)
  Pre: loc(r) = l, loc(c) = r
  Eff: loaded(r) ← F, loc(c) ← l

Each parameter has a range of possible values, e.g., Range(r) = Robots

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]

Page 17: Chapter02

CSV Operator

● Classical State Variable (CSV) Operator:
  • the kind of operator shown on the previous page
● General form
  ➢ o = (head(o), pre(o), eff(o), cost(o))
    • each precondition in pre(o) must have one of these forms:
      ▸ relname(t1,…,tk)
      ▸ ¬relname(t1,…,tk)
      ▸ varname(t1,…,tk) = t0
      ▸ varname(t1,…,tk) ≠ t0
      where relname = name of a rigid relation, varname = name of a state variable, and each ti must be a constant (i.e., a member of B) or a parameter
    • each effect in eff(o) must have the form varname(t1,…,tk) ← t0
    • if cost(o) isn’t omitted, it must be a nonnegative number (not a formula)
● Limited representational capability, but easy to compute, easy to reason about
  ➢ Many algorithms have been written to use this kind of operator

Page 18: Chapter02

CSV Actions

● A CSV action is an instance of a planning operator
  ➢ assign values to parameters

action move(r1,d1,d2)
  Pre: loc(r1) = d1, adjacent(d1,d2)
  Eff: loc(r1) ← d2

action take(r2,d2,c2)
  Pre: loaded(r2) = F, loc(r2) = d2, loc(c2) = d2
  Eff: loaded(r2) ← T, loc(c2) ← r2

action put(r1,d1,c1)
  Pre: loc(r1) = d1, loc(c1) = r1
  Eff: loaded(r1) ← F, loc(c1) ← d1

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]

Page 19: Chapter02

Computing γ

● s1 = {loaded(r1) = F, loaded(r2) = F, loc(r1) = d1, loc(r2) = d2,
        loc(c1) = d1, loc(c2) = d2}
● action take(r2,d2,c2)
  Pre: loaded(r2) = F, loc(r2) = d2, loc(c2) = d2
  Eff: loaded(r2) ← T, loc(c2) ← r2
● γ(s1, take(r2,d2,c2)) = {loaded(r1) = F, loaded(r2) = T, loc(r1) = d1, loc(r2) = d2, loc(c1) = d1, loc(c2) = r2}

[Figure: state s1 (c2 at d2 next to r2) and state γ(s1, take(r2,d2,c2)) (c2 on r2)]
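The computation of γ above can be sketched in a few lines: check every precondition against the state, then copy the state and overwrite the variables named in the effects. The dictionary layout (`pre`, `eff`) and the helper names are illustrative assumptions, not the authors' code; rigid relations are left out because take and put do not use them.

```python
from typing import Dict, Optional

State = Dict[str, object]

def take(r: str, l: str, c: str) -> dict:
    """Ground instance of the take operator as pre/eff dictionaries."""
    return {
        "name": f"take({r},{l},{c})",
        "pre": {f"loaded({r})": False, f"loc({r})": l, f"loc({c})": l},
        "eff": {f"loaded({r})": True, f"loc({c})": r},
    }

def put(r: str, l: str, c: str) -> dict:
    """Ground instance of the put operator."""
    return {
        "name": f"put({r},{l},{c})",
        "pre": {f"loc({r})": l, f"loc({c})": r},
        "eff": {f"loaded({r})": False, f"loc({c})": l},
    }

def applicable(s: State, a: dict) -> bool:
    """a is applicable in s if every precondition x = v holds in s."""
    return all(s[x] == v for x, v in a["pre"].items())

def gamma(s: State, a: dict) -> Optional[State]:
    """gamma(s, a): copy s and apply the assignments in eff(a)."""
    if not applicable(s, a):
        return None          # gamma is undefined here
    t = dict(s)
    t.update(a["eff"])
    return t

s1 = {"loaded(r1)": False, "loaded(r2)": False, "loc(r1)": "d1",
      "loc(r2)": "d2", "loc(c1)": "d1", "loc(c2)": "d2"}
result = gamma(s1, take("r2", "d2", "c2"))
```

Because an effect overwrites the old value of a variable, no explicit "delete" step is needed, which is the convenience of the state-variable representation noted later in the chapter.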

Page 20: Chapter02

Plans

● Plan: a sequence of actions π = 〈a1,a2,…,an〉
  ➢ π is applicable in s0 if the actions can be applied in the order given
    • γ(s0,a1) = s1, γ(s1,a2) = s2, …, γ(sn–1,an) = sn
● If π is applicable, define γ(s0,π) = sn
● Example
  π = 〈take(r2,d2,c2), move(r2,d2,d1), put(r2,d1,c2)〉

[Figure: the state s0 and the resulting state γ(s0,π)]

Page 21: Chapter02

Planning Problems, Solutions

● Planning problem: P = (Σ,s0,g)
  ➢ Σ = planning domain
  ➢ s0 = initial state
  ➢ g = {g1,…,gk} is a set of constraints called the goal
● π is a solution for P if γ(s0,π) satisfies g
  ➢ π is a shortest solution if no shorter plan is also a solution
  ➢ π is a minimal solution if no proper subsequence of π is also a solution
● CSV planning problem:
  ➢ Σ is a CSV planning domain,
  ➢ g has the same form as a CSV operator’s preconditions
● Example:
  ➢ s0 and π as on previous page
  ➢ g = {loc(r2)=d1, loc(c2)=d1}
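A short sketch of checking whether a plan is a solution: apply the actions in order, stopping if one is inapplicable, then test the goal. It assumes s0 is the state s1 from the "States as Functions" slide (the s0 figure did not survive in this transcript), and the dictionary encoding of actions is an illustrative choice.

```python
ADJACENT = {("d1", "d2"), ("d2", "d1")}   # rigid relation from the slides

def move(r, l, m):
    return {"rigid": [(l, m)], "pre": {f"loc({r})": l},
            "eff": {f"loc({r})": m}}

def take(r, l, c):
    return {"rigid": [],
            "pre": {f"loaded({r})": False, f"loc({r})": l, f"loc({c})": l},
            "eff": {f"loaded({r})": True, f"loc({c})": r}}

def put(r, l, c):
    return {"rigid": [], "pre": {f"loc({r})": l, f"loc({c})": r},
            "eff": {f"loaded({r})": False, f"loc({c})": l}}

def gamma(s, a):
    """Successor state, or None if a is inapplicable in s."""
    ok = all(pair in ADJACENT for pair in a["rigid"]) and \
         all(s[x] == v for x, v in a["pre"].items())
    return {**s, **a["eff"]} if ok else None

def apply_plan(s, plan):
    """gamma(s, pi): apply the actions in order; None if pi is inapplicable."""
    for a in plan:
        s = gamma(s, a)
        if s is None:
            return None
    return s

def is_solution(s0, plan, g):
    """pi is a solution if gamma(s0, pi) is defined and satisfies g."""
    s = apply_plan(s0, plan)
    return s is not None and all(s[x] == v for x, v in g.items())

s0 = {"loc(r1)": "d1", "loaded(r1)": False, "loc(c1)": "d1",
      "loc(r2)": "d2", "loaded(r2)": False, "loc(c2)": "d2"}  # assumed s0
pi = [take("r2", "d2", "c2"), move("r2", "d2", "d1"), put("r2", "d1", "c2")]
g = {"loc(r2)": "d1", "loc(c2)": "d1"}
```

Under that assumed s0, `is_solution(s0, pi, g)` holds, while dropping the final `put` leaves c2 on r2 and the goal unsatisfied.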

Page 22: Chapter02

Classical and State-Variable Representations

● Motivation
  ➢ The field of AI planning started out as automated theorem proving
  ➢ It still uses a lot of that notation
● Classical representation
  ➢ Equivalent to CSV representation
  ➢ Represents both rigid and varying properties using logical predicates
    • adjacent(l,m) - location l is adjacent to location m
    • loc(r,l) - robot r is at location l
    • loc(c,l), loc(c,r) - container c is at location l or on robot r
    • loaded(r) - there is a container on robot r

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]

Page 23: Chapter02

States

● Use ground atoms to represent both rigid and varying properties of Σ
● To represent a state
  ➢ s = {all ground atoms that are true in ŝ}
  ➢ e.g., s1 = {loc(c1,d1), loc(c2,d2), loc(r1,d1), loc(r2,d2), adjacent(d1,d2), adjacent(d2,d1), adjacent(d1,d3), adjacent(d3,d1)}

[Figure: state s1, with r1 and c1 at d1 and r2 and c2 at d2]

Page 24: Chapter02

Classical Planning Operators

● Classical planning operator:
  ➢ o = (head(o), pre(o), eff(o))
● Each precondition and effect must have one of these forms: pred(t1,…,tk), ¬pred(t1,…,tk)
  ➢ pred = predicate name
  ➢ each ti must be a constant (i.e., member of B) or a parameter
● Classical action: a ground instance of a classical operator

move(r,l,m)
  Precond: loc(r,l)
  Effects: ¬loc(r,l), loc(r,m)

take(r,l,c)
  Precond: loc(r,l), loc(c,l), ¬loaded(r)
  Effects: loc(c,r), ¬loc(c,l), loaded(r)

put(r,l,c)
  Precond: loc(r,l), loc(c,r)
  Effects: loc(c,l), ¬loc(c,r), ¬loaded(r)

[Figure: example domain with locations d1, d2, d3, robots r1, r2, and containers c1, c2]
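For comparison with the state-variable version, the following sketch treats a classical state as a set of ground atoms and applies the take operator by removing the negated effects and adding the positive ones. Encoding atoms as tuples and splitting effects into `add`/`del` sets are assumptions made for the example.

```python
def take(r, l, c):
    """Ground instance of the classical take operator."""
    return {
        "pre_pos": {("loc", r, l), ("loc", c, l)},   # must be in the state
        "pre_neg": {("loaded", r)},                  # must not be in the state
        "add": {("loc", c, r), ("loaded", r)},       # positive effects
        "del": {("loc", c, l)},                      # negated effects
    }

def applicable(s, a):
    return a["pre_pos"] <= s and not (a["pre_neg"] & s)

def gamma(s, a):
    """Classical transition: delete the negated effects, add the positive ones."""
    if not applicable(s, a):
        return None
    return frozenset((s - a["del"]) | a["add"])

s1 = frozenset({
    ("loc", "c1", "d1"), ("loc", "c2", "d2"),
    ("loc", "r1", "d1"), ("loc", "r2", "d2"),
    ("adjacent", "d1", "d2"), ("adjacent", "d2", "d1"),
    ("adjacent", "d1", "d3"), ("adjacent", "d3", "d1"),
})
s_next = gamma(s1, take("r2", "d2", "c2"))
```

Note the explicit deletion of loc(c2,d2): in the classical representation the old location atom has to be removed by hand, whereas a state-variable assignment simply overwrites it.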

Page 25: Chapter02

Discussion

● Classical representation is equivalent to state-variable representation in expressive power
  ➢ Each can be converted to the other in linear time and space
    • Classical rep. to CSV rep.: P(b1,…,bk) becomes xP(b1,…,bk) = 1
    • CSV rep. to classical rep.: x(b1,…,bn) = b0 becomes Px(b1,…,bn,b0)
● Classical representation is more natural for logicians
● CSV is more natural for engineers and most computer scientists
  ➢ When changing a value, you don’t have to explicitly delete the old one
● Historically, classical representation has been more widely used
  ➢ Many of the algorithms in the book were originally written to use classical representation
  ➢ That’s starting to change

Page 26: Chapter02

Chapter 2: Deliberation with Deterministic Models

2b: Planning Algorithms

Page 27: Chapter02

Motivation

● Nearly all planning procedures are search procedures
● Different planning procedures have different search spaces
  ➢ This section: state-space planning
    • Each node represents a state of the world
      ▸ A plan is a path through the space

Page 28: Chapter02

Forward Search

● Which forward-search algorithm you get depends on how you implement the nondeterministic choice
● I’ll discuss several such algorithms

Page 29: Chapter02

Depth-First Search

● At each state, select the action that has the lowest heuristic-function value
● Visited is for cycle-checking
  ➢ If you come to a state you’ve already seen on the current path, then backtrack
  ➢ In a finite domain, this guarantees termination
● Guaranteed to find a solution if one exists
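The slide's pseudocode is not preserved in this transcript, so the following is only a sketch of depth-first forward search as the bullets describe it: try successors in order of heuristic value, skip states already on the current path, and backtrack on dead ends. The function signature, the toy graph, and the heuristic values are all hypothetical.

```python
def dffs(s, goal, successors, h, path=None):
    """Depth-first forward search with cycle checking on the current path.

    successors(s) yields (action, next_state) pairs; h estimates distance to
    the goal. Returns a list of actions, or None if no solution is found.
    """
    if path is None:
        path = [s]
    if goal(s):
        return []
    # Try the successor with the lowest heuristic value first.
    for action, t in sorted(successors(s), key=lambda at: h(at[1])):
        if t in path:            # already seen on this path: skip (cycle check)
            continue
        rest = dffs(t, goal, successors, h, path + [t])
        if rest is not None:
            return [action] + rest
    return None                  # dead end: the caller backtracks

# Hypothetical toy domain: B looks better than C, but B is a dead end.
GRAPH = {
    "A": [("a->b", "B"), ("a->c", "C")],
    "B": [("b->a", "A")],
    "C": [("c->g", "G")],
    "G": [],
}
H = {"A": 3, "B": 1, "C": 2, "G": 0}
plan = dffs("A", lambda s: s == "G", lambda s: GRAPH[s], H.get)
```

In this toy graph the search first tries B, finds only a cycle back to A, backtracks, and then reaches G through C. A search that never backtracks would stop at B without a solution.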

Page 30: Chapter02

Greedy Search

● Like depth-first forward search (DFFS), but never backtracks
  ➢ Not guaranteed to find a solution

Page 31: Chapter02

A*

● Requires a heuristic function in line 3
  ➢ Chooses a node (π,s) in Fringe having the smallest value for cost(π) + h(s)
  ➢ Expands the node (computes nodes for all applicable actions)
  ➢ Prunes nodes that can be shown to be no better than nodes already expanded


Page 33: Chapter02

[Figure: road map of Romania with road distances between cities, and a table of straight-line distances to Bucharest]

A* search tree, each node labeled cost + h:
● Arad: 366 = 0 + 366

Page 34: Chapter02

[Figure: road map of Romania with road distances between cities, and a table of straight-line distances to Bucharest]

A* search tree after expanding Arad, each node labeled cost + h:
● Arad
  ➢ Sibiu: 393 = 140 + 253
  ➢ Timisoara: 447 = 118 + 329
  ➢ Zerind: 449 = 75 + 374

Page 35: Chapter02

[Figure: road map of Romania with road distances between cities, and a table of straight-line distances to Bucharest]

A* search tree after expanding Sibiu, each node labeled cost + h:
● Arad
  ➢ Sibiu
    • Arad: 646 = 280 + 366
    • Fagaras: 415 = 239 + 176
    • Oradea: 671 = 291 + 380
    • Rimnicu Vilcea: 413 = 220 + 193
  ➢ Timisoara: 447 = 118 + 329
  ➢ Zerind: 449 = 75 + 374

Page 36: Chapter02

[Figure: road map of Romania with road distances between cities, and a table of straight-line distances to Bucharest]

A* search tree after expanding Rimnicu Vilcea, each node labeled cost + h (two nodes are marked X, i.e., pruned, on the slide):
● Arad
  ➢ Sibiu
    • Arad: 646 = 280 + 366
    • Fagaras: 415 = 239 + 176
    • Oradea: 671 = 291 + 380
    • Rimnicu Vilcea
      ▸ Craiova: 526 = 366 + 160
      ▸ Pitesti: 417 = 317 + 100
      ▸ Sibiu: 553 = 300 + 253
  ➢ Timisoara: 447 = 118 + 329
  ➢ Zerind: 449 = 75 + 374

Page 37: Chapter02

[Figure: road map of Romania with road distances between cities, and a table of straight-line distances to Bucharest]

A* search tree after expanding Fagaras, each node labeled cost + h (three nodes are marked X, i.e., pruned, on the slide):
● Arad
  ➢ Sibiu
    • Arad: 646 = 280 + 366
    • Fagaras
      ▸ Sibiu: 591 = 338 + 253
      ▸ Bucharest: 450 = 450 + 0
    • Oradea: 671 = 291 + 380
    • Rimnicu Vilcea
      ▸ Craiova: 526 = 366 + 160
      ▸ Pitesti: 417 = 317 + 100
      ▸ Sibiu: 553 = 300 + 253
  ➢ Timisoara: 447 = 118 + 329
  ➢ Zerind: 449 = 75 + 374

Page 38: Chapter02

[Figure: road map of Romania with road distances between cities, and a table of straight-line distances to Bucharest]

A* search tree after expanding Pitesti, each node labeled cost + h (four nodes are marked X, i.e., pruned, on the slide):
● Arad
  ➢ Sibiu
    • Arad: 646 = 280 + 366
    • Fagaras
      ▸ Sibiu: 591 = 338 + 253
      ▸ Bucharest: 450 = 450 + 0
    • Oradea: 671 = 291 + 380
    • Rimnicu Vilcea
      ▸ Craiova: 526 = 366 + 160
      ▸ Pitesti
        - Bucharest: 418 = 418 + 0
        - Craiova: 615 = 455 + 160
        - Rimnicu Vilcea: 607 = 414 + 193
      ▸ Sibiu: 553 = 300 + 253
  ➢ Timisoara: 447 = 118 + 329
  ➢ Zerind: 449 = 75 + 374
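The Romania trace can be reproduced with a compact A* sketch. The road distances and heuristic values below are the ones that appear in the trace's cost + h labels, restricted to the cities the trace visits; the code itself (a priority queue with a best-known-cost table) is one common way to implement A* and is not the pseudocode from the slides, which this transcript does not preserve.

```python
import heapq
from math import inf

# Road distances between the cities that appear in the trace.
EDGES = [
    ("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
    ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151),
    ("Sibiu", "Fagaras", 99), ("Sibiu", "Rimnicu Vilcea", 80),
    ("Fagaras", "Bucharest", 211), ("Rimnicu Vilcea", "Pitesti", 97),
    ("Rimnicu Vilcea", "Craiova", 146), ("Pitesti", "Bucharest", 101),
    ("Pitesti", "Craiova", 138),
]
ROADS = {}
for a, b, d in EDGES:                      # roads are two-way
    ROADS.setdefault(a, {})[b] = d
    ROADS.setdefault(b, {})[a] = d

# Straight-line distances to Bucharest, as used in the trace.
H = {"Arad": 366, "Zerind": 374, "Sibiu": 253, "Timisoara": 329,
     "Oradea": 380, "Fagaras": 176, "Rimnicu Vilcea": 193,
     "Craiova": 160, "Pitesti": 100, "Bucharest": 0}

def a_star(start, goal, roads, h):
    """Return (path, cost) for a least-cost path, or None if none exists."""
    frontier = [(h[start], 0, start, [start])]   # entries: (f, g, city, path)
    best_g = {start: 0}
    while frontier:
        f, g, city, path = heapq.heappop(frontier)
        if city == goal:
            return path, g
        if g > best_g.get(city, inf):
            continue                             # stale entry: prune
        for nxt, d in roads[city].items():
            g2 = g + d
            if g2 < best_g.get(nxt, inf):        # keep only improvements
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None

path, cost = a_star("Arad", "Bucharest", ROADS, H)
```

The search expands Arad, Sibiu, Rimnicu Vilcea, Fagaras, and Pitesti in that order, matching the trace, and returns the route through Pitesti with cost 418 rather than the 450-cost route through Fagaras.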


Page 40: Chapter02

Properties of A*

● h is admissible if for every s, 0 ≤ h(s) ≤ h*(s)
  ➢ where h*(s) = least cost of getting from s to a state that satisfies g
● If h is admissible then A* is guaranteed to return an optimal solution
● Inadmissible heuristics might get you to a solution faster, but the solution won’t necessarily be optimal

Page 41: Chapter02

Depth-First Branch and Bound

● Depth-first search with heuristic function and pruning
  ➢ π* and c*: least-cost solution found so far, and the cost of that solution
  ➢ Any time you find a solution with lower cost, update π* and c*
  ➢ Prune any plan π such that cost(π) + h(s) ≥ c*
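A minimal sketch of depth-first branch and bound as described above: keep the best solution found so far (π*, c*) and cut off any branch whose cost(π) + h(s) cannot beat c*. The graph, its edge costs, and the zero heuristic are hypothetical; any admissible h could be substituted.

```python
from math import inf

def dfbb(start, goal, succ, h):
    """Depth-first branch and bound; returns (best_plan, best_cost)."""
    best = {"plan": None, "cost": inf}        # pi* and c*

    def visit(s, path, cost):
        if cost + h(s) >= best["cost"]:       # prune: cannot improve on c*
            return
        if s == goal:                         # cheaper solution: update pi*, c*
            best["plan"], best["cost"] = list(path), cost
            return
        for t, c in succ.get(s, {}).items():
            if t not in path:                 # avoid cycles on the current path
                path.append(t)
                visit(t, path, cost + c)
                path.pop()

    visit(start, [start], 0)
    return best["plan"], best["cost"]

# Hypothetical weighted graph: succ[s][t] is the cost of moving from s to t.
SUCC = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},
}
plan, cost = dfbb("A", "D", SUCC, lambda s: 0)
```

On this graph the first solution found (A, B, C, D with cost 3) is already optimal, so the later branches A, B, D (cost 6) and A, C (cost 4 before reaching D) are pruned as soon as their cost reaches c*.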

Page 42: Chapter02

Discussion

● If the state space is not too large, then A* or DFBB may be preferable
  ➢ They are guaranteed to return optimal solutions
● DFFS returns the first solution it finds
  ➢ can be arbitrarily far from optimal
● Greedy isn’t guaranteed to return a solution at all
● If S is very large, A* may require excessive memory, and both it and DFBB may require excessive running time
  ➢ In these cases DFFS and Greedy may be preferable

Page 43: Chapter02

Example

● One rigid relation, adjacent
● State variables to represent each location’s x and y coordinates:
  ➢ x = {(loc0, 2), (loc1, 0), (loc2, 4), (loc3, 0), …}
  ➢ y = {(loc0, 4), (loc1, 3), (loc2, 4), (loc3, 2), …}
● One planning operator: action move(r, l, m)
  ➢ pre: adjacent(l, m), loc(r) = l
  ➢ eff: loc(r) ← m
● $cost(loc_i, move(r, loc_j)) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} = \sqrt{(x(loc_i) - x(loc_j))^2 + (y(loc_i) - y(loc_j))^2}$
● $h_{sld}(s_i) = \sqrt{(x(loc_i) - x(loc_g))^2 + (y(loc_i) - y(loc_g))^2}$
  ➢ Optimal solution to a relaxed problem in which the locations don’t need to be adjacent

[Figure: map of locations loc0 through loc9 on x-y axes (x from 0 to 6, y from 1 to 4); the example is annotated "Not CSV"]
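The move cost and the straight-line-distance heuristic are both Euclidean distances between locations, which the following sketch computes. Only the four coordinates listed on the slide (loc0 to loc3) are included; the remaining locations are elided in the source and are not guessed here. The function names are illustrative.

```python
import math

# Coordinates given on the slide; the other locations are elided there.
X = {"loc0": 2, "loc1": 0, "loc2": 4, "loc3": 0}
Y = {"loc0": 4, "loc1": 3, "loc2": 4, "loc3": 2}

def distance(a: str, b: str) -> float:
    """Euclidean distance between two locations."""
    return math.hypot(X[a] - X[b], Y[a] - Y[b])

def move_cost(l: str, m: str) -> float:
    """Cost of moving from l to m (adjacency is checked elsewhere)."""
    return distance(l, m)

def h_sld(l: str, goal: str) -> float:
    """Straight-line distance from the robot's location to the goal location."""
    return distance(l, goal)
```

Because a straight line is never longer than a path along adjacent locations, this heuristic never overestimates the remaining cost when move costs are the same Euclidean distances.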

Page 44: Chapter02

CSV Example

● s0 = {loc(r1) = d3, loaded(r1) = F, loc(c1) = d1}
● g = {loc(r1) = d3, loc(c1) = r1}
● action move(r, d, e)
  ➢ pre: loc(r) = d
  ➢ eff: loc(r) ← e
● action load(r, c, l)
  ➢ pre: loaded(r) = F, loc(c) = l, loc(r) = l
  ➢ eff: loaded(r) ← T, loc(c) ← r
● action unload(r, c, l)
  ➢ pre: loc(c) = r, loc(r) = l
  ➢ eff: loaded(r) ← F, loc(c) ← l

[Figure: s0 (r1 at d3, c1 at d1) and g (r1 at d3 carrying c1)]

Page 45: Chapter02

CSV Example

g = {loc(r1) = d3, loc(c1) = r1}

[Figure: s0 (r1 at d3, c1 at d1) and g (r1 at d3 carrying c1)]

● Two applicable actions
● a1 = move(r1, d3, d1)
  s1 = {loc(r1) = d1, loaded(r1) = F, loc(c1) = d1}
● a2 = move(r1, d3, d2)
  s2 = {loc(r1) = d2, loaded(r1) = F, loc(c1) = d1}
● In line 4, compute
  ➢ min(h(s1), h(s2))

Page 46: Chapter02

Additive-Cost and Max-Cost Heuristics

● hadd(s) = Δadd(s,g), where

[Equations defining Δadd: shown as an image on the original slide]

● Minimum cost of a “plan tree” to achieve g from s
  ➢ Pretend each element of g needs a completely separate plan
    • If an action achieves i elements of g, it’s included i times
  ➢ Pretend each of an action’s preconditions needs a completely separate plan
  ➢ Thus, get a “plan” that’s a tree of actions
    • Total cost can sometimes be much higher than h*(s)
● Can implement as a tree search, going backward from g
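The equations defining Δadd did not survive in this transcript. The recursion below is a reconstruction based on the description above and on the sum/min/+1 structure of the plan trees on the following slides; it matches the usual definition of the additive-cost heuristic, but the exact notation on the original slide is an assumption.

```latex
\begin{align*}
\Delta_{add}(s, g) &= \sum_{g_i \in g} \Delta_{add}(s, g_i) \\
\Delta_{add}(s, g_i) &=
  \begin{cases}
    0 & \text{if } g_i \in s \\
    \min \{\, \Delta_{add}(s, a) \mid a \in A,\ g_i \in \mathrm{eff}(a) \,\} & \text{otherwise}
  \end{cases} \\
\Delta_{add}(s, a) &= \mathrm{cost}(a) + \Delta_{add}(s, \mathrm{pre}(a))
\end{align*}
```

The first line is the "separate plan for each goal element" assumption, the min ranges over the alternative actions that achieve an atom, and the last line charges an action its own cost plus the summed cost of its preconditions.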

Page 47: Chapter02

Additive-Cost Heuristic

g = {loc(r1) = d3, loc(c1) = r1}

[Figure: state s1 (r1 and c1 at d1) and the plan tree for Δadd(s1,g)]

● loc(r1) = d3: alternatives move(r1,d1,d3), move(r1,d2,d3), …
  ➢ move(r1,d1,d3): pre loc(r1) = d1 is true in s1 (cost 0), so 0 + 1 = 1
  ➢ move(r1,d2,d3): pre loc(r1) = d2 has cost > 0, so (>0) + 1 > 1
  ➢ min(1, (>1)) = 1
● loc(c1) = r1: alternatives load(r1,c1,d1) and the other load actions shown on the slide
  ➢ load(r1,c1,d1): pre loaded(r1) = F, loc(r1) = d1, loc(c1) = d1 are all true in s1, sum(0,0,0) = 0, so 0 + 1 = 1
  ➢ other alternatives: (>0) + 1 > 1
  ➢ min(1, (>1), (>1)) = 1
● hadd(s1) = Δadd(s1,g) = sum(1,1) = 2

Page 48: Chapter02

Additive-Cost Heuristic

g = {loc(r1) = d3, loc(c1) = r1}

[Figure: state s2 (r1 at d2, c1 at d1) and the plan tree for Δadd(s2,g)]

● loc(r1) = d3: alternatives move(r1,d1,d3), move(r1,d2,d3), …
  ➢ move(r1,d2,d3): pre loc(r1) = d2 is true in s2 (cost 0), so 0 + 1 = 1
  ➢ move(r1,d1,d3): pre loc(r1) = d1 has cost > 0, so (>0) + 1 > 1
  ➢ min(1, (>1)) = 1
● loc(c1) = r1: alternatives load(r1,c1,d1) and the other load actions shown on the slide
  ➢ load(r1,c1,d1): pre loaded(r1) = F and loc(c1) = d1 are true in s2 (cost 0); pre loc(r1) = d1 is achieved by move(r1,d2,d1), whose precondition is true in s2, so 0 + 1 = 1
  ➢ sum(0,1,0) = 1, so 1 + 1 = 2
  ➢ other alternatives: (>1) + 1 > 2
  ➢ min(2, (>2), (>2)) = 2
● hadd(s2) = Δadd(s2,g) = sum(1,2) = 3

Page 49: Chapter02

Additive-Cost and Max-Cost Heuristics
●  hmax(s) = Δmax(s,g), where
Ø  Δmax(s,g) = max{Δmax(s,gi) | gi ∈ g}
Ø  Δmax(s,gi) = 0 if gi ∈ s; otherwise min{Δmax(s,a) | a ∈ A and gi ∈ eff(a)}
Ø  Δmax(s,a) = cost(a) + Δmax(s,pre(a))
●  Like hadd(s), but doesn’t add all of the costs in the plan tree
Ø  Just the most costly path in the plan tree
●  Guaranteed to be a lower bound on h*(s)
●  Same tree search as for hadd(s)
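Since hmax uses the same search as hadd and differs only in how sibling costs are combined, one routine with a pluggable combiner covers both. The sketch below is an illustration under stated assumptions: unit action costs, atoms written as plain strings, a hand-picked subset of the ground actions (enough for the s1 and s2 examples), and a fixed-point iteration in place of the tree search. The names `delta` and `ACTIONS` are hypothetical.

```python
import math

def delta(state, goal, actions, combine):
    """Cost estimate where `combine` is sum (for hadd) or max (for hmax)."""
    cost = {atom: 0 for atom in state}            # atoms true in s cost 0
    changed = True
    while changed:
        changed = False
        for pre, eff in actions:
            # unit action cost plus the combined cost of the preconditions
            total = 1 + combine([cost.get(p, math.inf) for p in pre])
            for atom in eff:
                if total < cost.get(atom, math.inf):
                    cost[atom] = total            # min over alternative achievers
                    changed = True
    return combine([cost.get(atom, math.inf) for atom in goal])

# (preconditions, effects) pairs; only the actions relevant to the examples
ACTIONS = [
    (["loc(r1)=d1"], ["loc(r1)=d3"]),             # move(r1,d1,d3)
    (["loc(r1)=d2"], ["loc(r1)=d3"]),             # move(r1,d2,d3)
    (["loc(r1)=d2"], ["loc(r1)=d1"]),             # move(r1,d2,d1)
    (["loaded(r1)=nil", "loc(c1)=d1", "loc(r1)=d1"],
     ["loaded(r1)=c1", "loc(c1)=r1"]),            # load(r1,c1,d1)
]
GOAL = ["loc(r1)=d3", "loc(c1)=r1"]
S1 = ["loc(r1)=d1", "loc(c1)=d1", "loaded(r1)=nil"]
S2 = ["loc(r1)=d2", "loc(c1)=d1", "loaded(r1)=nil"]

for name, s in (("s1", S1), ("s2", S2)):
    print(name, "hadd =", delta(s, GOAL, ACTIONS, sum),
          "hmax =", delta(s, GOAL, ACTIONS, max))
```

For s2 the additive estimate is 3 while the max estimate is 2, which illustrates why hmax never exceeds hadd on the same tree.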

Page 50: Chapter02

Max-Cost Heuristic

[Figure: state s1 (r1 and c1 at d1), goal g = {loc(r1) = d3, loc(c1) = r1}, and the plan tree for hmax(s1)]

●  Goal loc(r1) = d3: min over the alternatives = min(1, (>1)) = 1
Ø  move(r1,d1,d3), pre: loc(r1) = d1 (true in s1, cost 0); cost 0 + 1 = 1
Ø  move(r1,d2,d3), pre: loc(r1) = d2 (cost > 0); cost (>0) + 1 > 1
●  Goal loc(c1) = r1: min over the alternatives = min(1, (>1), (>1)) = 1
Ø  load(r1,c1,d1), pre: loaded(r1) = nil, loc(r1) = d1, loc(c1) = d1 (all true in s1); max(0,0,0) = 0, cost 0 + 1 = 1
Ø  load(r1,c1,d2), load(r1,c1,d3): precondition cost > 0; cost (>0) + 1 > 1
●  hmax(s1) = Δmax(s1,g) = max(1,1) = 1

Page 51: Chapter02

Max-Cost Heuristic

[Figure: state s2 (r1 at d2, c1 at d1), goal g = {loc(r1) = d3, loc(c1) = r1}, and the plan tree for hmax(s2)]

●  Goal loc(r1) = d3: min over the alternatives = min(1, (>1)) = 1
Ø  move(r1,d2,d3), pre: loc(r1) = d2 (true in s2, cost 0); cost 0 + 1 = 1
Ø  move(r1,d1,d3), pre: loc(r1) = d1 (cost > 0); cost (>0) + 1 > 1
●  Goal loc(c1) = r1: min over the alternatives = min(2, (>2), (>2)) = 2
Ø  load(r1,c1,d1), pre: loaded(r1) = nil (true in s2, cost 0), loc(r1) = d1 (achieved by move(r1,d2,d1), cost 0 + 1 = 1), loc(c1) = d1 (true in s2, cost 0); max(0,1,0) = 1, cost 1 + 1 = 2
Ø  load(r1,c1,d2), load(r1,c1,d3): precondition cost > 1; cost (>1) + 1 > 2
●  hmax(s2) = Δmax(s2,g) = max(1,2) = 2

Page 52: Chapter02

Delete-Relaxation Heuristics
●  Suppose a state s includes an assignment x = v
●  Suppose an action a has an effect x ← w
●  Then γ+(s, a) includes both x = v and x = w
●  Relaxed state (or r-state)
Ø  any set ŝ of state-variable values
Ø  may include more than one value for each state variable
●  ŝ r-satisfies a goal g if ŝ contains a subset that satisfies g
●  An action a is r-applicable in ŝ if ŝ r-satisfies a’s preconditions
Ø  In this case, γ+(ŝ,a) = ŝ ∪ γ(s,a)
●  π = ⟨a1, …, an⟩ is r-applicable in ŝ0 if there are r-states ŝ1, ŝ2, …, ŝn such that
•  a1 is r-applicable in ŝ0 and γ+(ŝ0,a1) = ŝ1
•  a2 is r-applicable in ŝ1 and γ+(ŝ1,a2) = ŝ2
•  …
Ø  In this case, γ+(ŝ0,π) = ŝn
●  π is a relaxed solution for P = (Σ, s0, g) if γ+(s0, π) r-satisfies g

Name is from classical planning: “don’t do the deletion”
move(r,l,m)
Precond: loc(r,l)
Effects: ¬loc(r,l), loc(r,m)
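The definitions of r-satisfaction, r-applicability, and γ+ translate almost directly into set operations. The following is a minimal sketch, assuming assignments are written as strings such as "loc(r1)=d3" and that an action is a (name, preconditions, effects) triple; the helper names are hypothetical. It uses the fact that adding γ(s,a) to ŝ amounts to adding the effects of a.

```python
def r_satisfies(r_state, atoms):
    """An r-state r-satisfies a set of assignments if it contains all of them."""
    return atoms <= r_state

def gamma_plus(r_state, action):
    """Relaxed transition: keep every old value and add the new ones."""
    name, pre, eff = action
    if not r_satisfies(r_state, pre):
        raise ValueError(f"{name} is not r-applicable")
    return r_state | eff

def gamma_plus_plan(r_state, plan):
    """Apply a sequence of actions with the relaxed transition function."""
    for action in plan:
        r_state = gamma_plus(r_state, action)
    return r_state

MOVE_D3_D1 = ("move(r1,d3,d1)",
              frozenset({"loc(r1)=d3"}),
              frozenset({"loc(r1)=d1"}))
LOAD_D1 = ("load(r1,c1,d1)",
           frozenset({"loaded(r1)=nil", "loc(c1)=d1", "loc(r1)=d1"}),
           frozenset({"loaded(r1)=c1", "loc(c1)=r1"}))

S0 = frozenset({"loc(r1)=d3", "loc(c1)=d1", "loaded(r1)=nil"})
GOAL = frozenset({"loc(r1)=d3", "loc(c1)=r1"})

S_HAT_2 = gamma_plus_plan(S0, [MOVE_D3_D1, LOAD_D1])
print(sorted(S_HAT_2))
print(r_satisfies(S_HAT_2, GOAL))  # True: the two-action plan is a relaxed solution
```

Because nothing is ever removed, the resulting r-state holds two values for loc(r1), two for loc(c1), and two for loaded(r1).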

Page 53: Chapter02

Optimal Relaxed Solution Heuristic
●  ∆+(ŝ, g) = minimum cost of all plans π such that γ+(ŝ, π) r-satisfies g
●  Optimal relaxed solution heuristic: h+(s) = ∆+(s, g)
●  Example:
Ø  ŝ1 = γ+(s0, move(r1,d3,d1)) = {loc(r1) = d1, loaded(r1) = F, loc(c1) = d1, loc(r1) = d3}

[Figure: r-state ŝ1 (r1 at both d3 and d1, c1 at d1) and goal g = {loc(r1) = d3, loc(c1) = r1}]

Page 54: Chapter02

Optimal Relaxed Solution Heuristic
●  Example:
Ø  ŝ1 = γ+(s0, move(r1,d3,d1)) = {loc(r1) = d1, loaded(r1) = F, loc(c1) = d1, loc(r1) = d3}
Ø  ŝ2 = γ+(ŝ1, load(r1,c1,d1)) = {loc(r1) = d1, loaded(r1) = F, loc(c1) = d1, loc(r1) = d3, loaded(r1) = T, loc(c1) = r1}
Ø  ŝ2 r-satisfies g, so ⟨move(r1,d3,d1), load(r1,c1,d1)⟩ is a relaxed solution
Ø  It’s optimal, so h+(s0) = 2

[Figure: r-state ŝ2 (r1 at both d3 and d1, c1 both at d1 and on r1) and goal g = {loc(r1) = d3, loc(c1) = r1}]

Page 55: Chapter02

Optimal Relaxed Solution Heuristic
●  ∆+(ŝ, g) = minimum cost of all plans π such that γ+(ŝ, π) r-satisfies g
●  Optimal relaxed solution heuristic: h+(s) = ∆+(s, g)
●  Every solution is also a relaxed solution
Ø  Thus h+ is admissible
Ø  Problem: computing it is NP-hard

[Figure: initial state s0 (r1 at d3, c1 at d1) and goal g = {loc(r1) = d3, loc(c1) = r1}]
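On an instance this small, h+ can still be computed exactly by exhaustive search, which makes the definition concrete even though the approach does not scale (the problem is NP-hard in general). The sketch below is a brute-force illustration, assuming unit action costs, string-encoded assignments, and only move and load actions; `make_actions` and `h_plus` are hypothetical names.

```python
from collections import deque
from itertools import permutations

LOCS = ("d1", "d2", "d3")

def make_actions():
    """Ground move/load actions as (name, preconditions, effects) triples."""
    actions = []
    for d, e in permutations(LOCS, 2):
        actions.append((f"move(r1,{d},{e})",
                        frozenset({f"loc(r1)={d}"}),
                        frozenset({f"loc(r1)={e}"})))
    for loc in LOCS:
        actions.append((f"load(r1,c1,{loc})",
                        frozenset({"loaded(r1)=nil", f"loc(c1)={loc}", f"loc(r1)={loc}"}),
                        frozenset({"loaded(r1)=c1", "loc(c1)=r1"})))
    return actions

def h_plus(state, goal, actions):
    """Length of a shortest relaxed solution, by breadth-first search over r-states."""
    start = frozenset(state)
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        r_state, depth = frontier.popleft()
        if goal <= r_state:                   # r-satisfies the goal
            return depth
        for _name, pre, eff in actions:
            if pre <= r_state:                # r-applicable
                nxt = r_state | eff           # relaxed transition: nothing is deleted
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return float("inf")                       # goal unreachable even when relaxed

ACTIONS = make_actions()
GOAL = frozenset({"loc(r1)=d3", "loc(c1)=r1"})
S0 = {"loc(r1)=d3", "loc(c1)=d1", "loaded(r1)=nil"}

print(h_plus(S0, GOAL, ACTIONS))  # 2
```

The number of distinct r-states can grow exponentially with the number of assignments, which is what motivates the cheaper approximation on the next slides.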

Page 56: Chapter02

RPG Heuristic
●  Relaxed planning graph (RPG) heuristic
Ø  An approximation of h+ that’s easier to compute
●  Based on the fact that γ+ doesn’t depend on the order in which actions are applied
●  If a1 and a2 are both applicable in s0 then γ+(s0, ⟨a1, a2⟩) = γ+(s0, ⟨a2, a1⟩) = s0 ∪ eff(a1) ∪ eff(a2)

Page 57: Chapter02

RPG Heuristic
●  Computation of RPG(s1,g)
Ø  First line of RPG assigns ŝ0 = s1, A0 = ∅
Ø  ŝ0 = {loc(r1) = d1, loc(c1) = d1, loaded(r1) = F}
Ø  A1 = {move(r1,d1,d3), move(r1,d1,d2), load(r1,c1,d1)}
Ø  ŝ1 = {loc(r1) = d3, loc(r1) = d2, loaded(r1) = T, loc(c1) = r1} ∪ {loc(c1) = d1, loc(r1) = d1, loaded(r1) = F}, where the second set comes from ŝ0

[Figure: state s1 (r1 and c1 at d1) and goal g = {loc(r1) = d3, loc(c1) = r1}]

Page 58: Chapter02

RPG Heuristic
●  Computation of RPG(s1,g)
Ø  First line of RPG assigns ŝ0 = s1, A0 = ∅
Ø  A1 = {move(r1,d1,d3), move(r1,d1,d2), load(r1,c1,d1)}
Ø  ŝ1 = ŝ0 ∪ {loc(r1) = d3, loc(r1) = d2, loaded(r1) = T, loc(c1) = r1}, which r-satisfies g = {loc(r1) = d3, loc(c1) = r1}
●  RPG(s1,g) returns 2

Page 59: Chapter02

RPG Heuristic
●  Computation of RPG(s2,g)
Ø  First line of RPG assigns ŝ0 = s2, A0 = ∅
Ø  ŝ0 = {loaded(r1) = F, loc(c1) = d1, loc(r1) = d2}
Ø  A1 = {move(r1,d2,d3), move(r1,d2,d1)}
Ø  ŝ1 = {loc(r1) = d3, loc(r1) = d1} ∪ {loaded(r1) = F, loc(c1) = d1, loc(r1) = d2}, where the second set comes from ŝ0
Ø  A2 = {move(r1,d3,d1), move(r1,d3,d2), move(r1,d1,d2), move(r1,d1,d3), load(r1,c1,d1), move(r1,d2,d3), move(r1,d2,d1)}
Ø  ŝ2 = {loc(c1) = r1, loaded(r1) = T} ∪ {loc(r1) = d3, loc(r1) = d1} ∪ {loc(c1) = d1, loc(r1) = d2, loaded(r1) = F}, where the last set comes from ŝ0

[Figure: state s2 (r1 at d2, c1 at d1) and goal g = {loc(r1) = d3, loc(c1) = r1}]

Page 60: Chapter02

RPG Heuristic
●  Computation of RPG(s2,g)
Ø  First line of RPG assigns ŝ0 = s2, A0 = ∅
Ø  A1 = {move(r1,d2,d3), move(r1,d2,d1)}
Ø  A2 adds the moves out of d1 and d3 and load(r1,c1,d1)
Ø  ŝ2 = ŝ1 ∪ {loc(c1) = r1, loaded(r1) = T}, which r-satisfies g = {loc(r1) = d3, loc(c1) = r1}
●  RPG(s2,g) returns 3
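The RPG procedure itself is not reproduced in this part of the transcript, so the following is only a sketch of the usual two-phase scheme: grow relaxed layers ŝ0, A1, ŝ1, … until the goal is r-satisfied, then walk backward and count the actions needed at each layer. It assumes unit costs, string-encoded assignments (with nil/c1 for loaded(r1) rather than F/T), and a greedy choice of one achiever per missing subgoal instead of a guaranteed minimal set. All names are hypothetical.

```python
from itertools import permutations

LOCS = ("d1", "d2", "d3")

def make_actions():
    """Ground move/load actions as (name, preconditions, effects) triples."""
    actions = []
    for d, e in permutations(LOCS, 2):
        actions.append((f"move(r1,{d},{e})",
                        frozenset({f"loc(r1)={d}"}),
                        frozenset({f"loc(r1)={e}"})))
    for loc in LOCS:
        actions.append((f"load(r1,c1,{loc})",
                        frozenset({"loaded(r1)=nil", f"loc(c1)={loc}", f"loc(r1)={loc}"}),
                        frozenset({"loaded(r1)=c1", "loc(c1)=r1"})))
    return actions

def rpg_heuristic(state, goal, actions):
    # Forward phase: levels[i] is the r-state at level i, layers[i] the actions A_i.
    levels, layers = [frozenset(state)], [[]]
    while not goal <= levels[-1]:
        current = levels[-1]
        applicable = [a for a in actions if a[1] <= current]
        nxt = current.union(*(a[2] for a in applicable))
        if nxt == current:
            return float("inf")               # fixed point reached without the goal
        layers.append(applicable)
        levels.append(nxt)
    # Backward phase: greedily pick achievers for subgoals missing one level down.
    subgoals, total = set(goal), 0
    for i in range(len(levels) - 1, 0, -1):
        chosen, covered = [], set()
        for atom in sorted(subgoals - levels[i - 1]):
            if atom not in covered:
                action = next(a for a in layers[i] if atom in a[2])
                chosen.append(action)
                covered |= action[2]
        total += len(chosen)
        effects = set().union(*(a[2] for a in chosen))
        preconds = set().union(*(a[1] for a in chosen))
        subgoals = (subgoals - effects) | preconds
    return total

ACTIONS = make_actions()
GOAL = frozenset({"loc(r1)=d3", "loc(c1)=r1"})
S1 = {"loc(r1)=d1", "loc(c1)=d1", "loaded(r1)=nil"}
S2 = {"loc(r1)=d2", "loc(c1)=d1", "loaded(r1)=nil"}

print(rpg_heuristic(S1, GOAL, ACTIONS))  # 2
print(rpg_heuristic(S2, GOAL, ACTIONS))  # 3
```

For s2 the backward phase selects load(r1,c1,d1) at level 2 and the two moves out of d2 at level 1, which gives the value 3 reported on the slide.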

Page 61: Chapter02

Landmark Heuristics
●  Let P = (Σ,s0,g) be a planning problem
●  Let φ = φ1 ∨ ... ∨ φm be a disjunction of state-variable assignments
●  Definition: φ is a landmark for P if φ is true at some point in every solution plan of P
●  Example landmarks
Ø  A complete state: s0
Ø  A single state-variable assignment: loc(r1) = d1
Ø  A disjunction of state-variable assignments: loc(r1) = d1 ∨ loc(r1) = d2

[Figure: initial state s0 (r1 at d3, c1 at d1) and goal g = {loc(r1) = d3, loc(c1) = r1}]

Page 62: Chapter02

Why are Landmarks Useful?
●  Help in breaking down the given problem into smaller subproblems

[Figure: a path from s0 to g]

Page 63: Chapter02

Why are Landmarks Useful?
●  Help in breaking down the given problem into smaller subproblems
●  Every solution to P has to achieve these landmarks
●  Possible strategy:
Ø  find a plan that takes us from s0 to any state s1 that satisfies lm1
Ø  find a plan that takes us from s1 to any state s2 that satisfies lm2
Ø  …

[Figure: a path from s0 to g through landmarks lm1, lm2, lm3, which splits the problem into subproblems P1, P2, P3, P4]
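The strategy above can be sketched as a chain of small searches, one per landmark. This is an illustrative sketch only: it assumes the landmarks are already known and ordered, uses plain breadth-first search for each subproblem, and models only move and load actions. Chaining subplans this way is neither complete nor optimal in general. The names `bfs` and `plan_via_landmarks` are hypothetical.

```python
from collections import deque
from itertools import permutations

LOCS = ("d1", "d2", "d3")

def make_actions():
    """Ground actions as (name, preconditions, effects), each a dict var -> value."""
    actions = []
    for d, e in permutations(LOCS, 2):
        actions.append((f"move(r1,{d},{e})", {"loc(r1)": d}, {"loc(r1)": e}))
    for loc in LOCS:
        actions.append((f"load(r1,c1,{loc})",
                        {"loaded(r1)": "nil", "loc(c1)": loc, "loc(r1)": loc},
                        {"loaded(r1)": "c1", "loc(c1)": "r1"}))
    return actions

def satisfies(state, cond):
    return all(state.get(var) == val for var, val in cond.items())

def bfs(state, cond, actions):
    """Shortest plan from `state` to any state satisfying `cond`."""
    frontier = deque([(dict(state), [])])
    seen = {frozenset(state.items())}
    while frontier:
        s, plan = frontier.popleft()
        if satisfies(s, cond):
            return plan, s
        for name, pre, eff in actions:
            if satisfies(s, pre):
                t = {**s, **eff}              # ordinary (non-relaxed) transition
                key = frozenset(t.items())
                if key not in seen:
                    seen.add(key)
                    frontier.append((t, plan + [name]))
    return None

def plan_via_landmarks(s0, landmarks, goal, actions):
    """Solve one subproblem per landmark, then a last one for the goal."""
    plan, state = [], dict(s0)
    for cond in landmarks + [goal]:
        result = bfs(state, cond, actions)
        if result is None:
            return None
        subplan, state = result
        plan += subplan
    return plan

ACTIONS = make_actions()
S0 = {"loc(r1)": "d3", "loc(c1)": "d1", "loaded(r1)": "nil"}
GOAL = {"loc(r1)": "d3", "loc(c1)": "r1"}
LANDMARKS = [{"loc(r1)": "d1"}, {"loc(c1)": "r1"}]

print(plan_via_landmarks(S0, LANDMARKS, GOAL, ACTIONS))
```

Each call to `bfs` explores a much smaller space than a single search from s0 to g would, which is the point of the decomposition.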

Page 64: Chapter02

Computing Landmarks
●  Question: How do we compute landmarks for a problem P?
●  Not easy:
Ø  Deciding whether a state-variable assignment φ is a landmark is PSPACE-hard in the worst case
Ø  To put it in perspective: as hard as solving the planning problem itself!
●  However, all is not lost:
Ø  There are often useful landmarks that can be found more easily
Ø  There are polynomial-time procedures that can compute these landmarks
Ø  Going to see one such procedure based on Relaxed Planning Graphs
●  Why Relaxed Planning Graphs?
Ø  Solving relaxed planning problems is easier
•  Computing landmarks for relaxed planning problems is easier
Ø  A landmark for a relaxed planning problem is a landmark for the original planning problem as well
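One way to exploit the last point is a polynomial-time sufficient test: remove every action that achieves an assignment and check whether the goal is still reachable in the relaxed problem. If it is not, every relaxed solution, and hence every real solution, must make that assignment true. The sketch below assumes string-encoded assignments and only move and load actions; the function names are hypothetical, and the test is an illustration rather than the procedure from the slides.

```python
from itertools import permutations

LOCS = ("d1", "d2", "d3")

def make_actions():
    """Ground move/load actions as (name, preconditions, effects) triples."""
    actions = []
    for d, e in permutations(LOCS, 2):
        actions.append((f"move(r1,{d},{e})",
                        frozenset({f"loc(r1)={d}"}),
                        frozenset({f"loc(r1)={e}"})))
    for loc in LOCS:
        actions.append((f"load(r1,c1,{loc})",
                        frozenset({"loaded(r1)=nil", f"loc(c1)={loc}", f"loc(r1)={loc}"}),
                        frozenset({"loaded(r1)=c1", "loc(c1)=r1"})))
    return actions

def relaxed_reachable(state, actions):
    """All assignments that can be r-produced from `state` with `actions`."""
    reached = set(state)
    changed = True
    while changed:
        changed = False
        for _name, pre, eff in actions:
            if pre <= reached and not eff <= reached:
                reached |= eff
                changed = True
    return reached

def is_relaxed_landmark(atom, s0, goal, actions):
    """True if the relaxed goal cannot be reached without an achiever of `atom`."""
    if atom in s0:
        return True                            # already true at the start of every plan
    without = [a for a in actions if atom not in a[2]]
    return not goal <= relaxed_reachable(s0, without)

ACTIONS = make_actions()
S0 = frozenset({"loc(r1)=d3", "loc(c1)=d1", "loaded(r1)=nil"})
GOAL = frozenset({"loc(r1)=d3", "loc(c1)=r1"})

print(is_relaxed_landmark("loc(r1)=d1", S0, GOAL, ACTIONS))  # True
print(is_relaxed_landmark("loc(r1)=d2", S0, GOAL, ACTIONS))  # False
```

Here r1 has to visit d1 to pick up c1, so loc(r1) = d1 is reported as a landmark, while a detour through d2 is optional.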

Page 65: Chapter02

RPG-based Landmark Computation
●  Main intuition: if a state-variable assignment φ is a landmark, then the preconditions of the actions that achieve φ give rise to landmarks as well
Ø  In other words: we’re going to start from known landmarks and discover new ones by looking at preconditions of actions that achieve these landmarks
●  Example:
Ø  g is the goal
Ø  Actions a1 and a2 can achieve g
Ø  Therefore, either p1∧q or p2∧q must be true to be able to achieve g
Ø  In other words: (p1∧q)∨(p2∧q) is a landmark
Ø  By rearranging terms: we get (p1∧q)∨(p2∧q) ≡ q∧(p1∨p2)
Ø  Since q∧(p1∨p2) is a landmark, both q and (p1∨p2) are landmarks
●  In practice, we try to rearrange assignments in a similar manner to group terms with the same state variable together

[Figure: goal g achieved by action a1 (preconditions p1, q) or by action a2 (preconditions p2, q)]
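The rearrangement in the example amounts to factoring out the preconditions shared by every achiever. A small sketch of that step, assuming each achiever's preconditions are given as a set of atom names (the function name is hypothetical):

```python
def factor_preconditions(precondition_sets):
    """Split achiever preconditions into a shared part and per-action leftovers.

    The shared atoms are each a landmark on their own; if every leftover set is
    non-empty, picking one atom from each leftover gives a disjunctive landmark.
    """
    common = frozenset.intersection(*precondition_sets)
    leftovers = [pre - common for pre in precondition_sets]
    return common, leftovers

# a1 needs p1 and q, a2 needs p2 and q
common, leftovers = factor_preconditions([frozenset({"p1", "q"}),
                                          frozenset({"p2", "q"})])
print(sorted(common))                   # ['q']
print([sorted(x) for x in leftovers])   # [['p1'], ['p2']]
```

This reproduces the two landmarks from the example: q on its own, and the disjunction p1 ∨ p2.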

Page 66: Chapter02

RPG-based Landmark Computation
●  Question: What landmarks can we start with?
Ø  Every goal is trivially a landmark; can start from there
●  E.g. loc(r1) = d3 is a landmark
Ø  Two actions achieve this landmark: move(r1,d1,d3) and move(r1,d2,d3)
Ø  Can infer a new landmark that is the disjunction of the preconditions of these two actions: φ′ = loc(r1) = d1 ∨ loc(r1) = d2

action move(r, d, e)
pre: loc(r) = d
eff: loc(r) ← e

[Figure: initial state s0 (r1 at d3, c1 at d1) and goal g = {loc(r1) = d3, loc(c1) = r1}]

Page 67: Chapter02

RPG-based Landmark Computation
ComputeLandmark(s0, g = g1 ∧ g2 ∧ … ∧ gk)
●  queue = {g1, g2, …, gk}
●  While queue is not empty:
Ø  Remove a gi from queue
Ø  Ai = {all actions that can achieve gi}
Ø  Compute all assignments that can be r-produced starting from s0 without using Ai, thus generating the RPG foo
Ø  act(gi) = {all actions in Ai that are r-applicable in the r-state resulting from foo}
Ø  For each action in act(gi):
•  Pick a precondition not satisfied in s0 and add to φ
Ø  The resulting disjunction φ is a landmark; add it to queue

[Figure: landmark gi with achievers a1 (preconditions p1, q1), a2 (preconditions p2, q2), a3 (preconditions p3, q3)]
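The ComputeLandmark loop can be sketched in Python as follows. This is one possible reading of the pseudocode, not the authors' implementation, and it makes several assumptions: a landmark is a frozenset of string-encoded assignments read as a disjunction, landmarks already true in s0 are not expanded, the "pick a precondition" step takes the alphabetically first unsatisfied one, and a candidate φ is dropped when some achiever has no unsatisfied precondition (φ would then be trivially true). All names are hypothetical.

```python
from collections import deque
from itertools import permutations

LOCS = ("d1", "d2", "d3")

def make_actions():
    """Ground move/load actions as (name, preconditions, effects) triples."""
    actions = []
    for d, e in permutations(LOCS, 2):
        actions.append((f"move(r1,{d},{e})",
                        frozenset({f"loc(r1)={d}"}),
                        frozenset({f"loc(r1)={e}"})))
    for loc in LOCS:
        actions.append((f"load(r1,c1,{loc})",
                        frozenset({"loaded(r1)=nil", f"loc(c1)={loc}", f"loc(r1)={loc}"}),
                        frozenset({"loaded(r1)=c1", "loc(c1)=r1"})))
    return actions

def r_reachable(s0, actions):
    """Final r-state of the relaxed planning graph built from s0 with `actions`."""
    reached = set(s0)
    changed = True
    while changed:
        changed = False
        for _name, pre, eff in actions:
            if pre <= reached and not eff <= reached:
                reached |= eff
                changed = True
    return reached

def compute_landmarks(s0, goal, actions):
    start = [frozenset([atom]) for atom in sorted(goal)]
    landmarks, queue = set(start), deque(start)
    while queue:
        lm = queue.popleft()
        if lm & s0:
            continue                               # true in s0: no use expanding
        achievers = [a for a in actions if a[2] & lm]
        others = [a for a in actions if not a[2] & lm]
        reach = r_reachable(s0, others)            # the RPG built without the achievers
        act = [a for a in achievers if a[1] <= reach]
        phi = set()
        for _name, pre, _eff in act:
            missing = sorted(pre - s0)
            if not missing:
                phi = None                         # this achiever needs nothing new
                break
            phi.add(missing[0])                    # one unsatisfied precondition
        if phi and frozenset(phi) not in landmarks:
            landmarks.add(frozenset(phi))
            queue.append(frozenset(phi))
    return landmarks

ACTIONS = make_actions()
S0 = frozenset({"loc(r1)=d3", "loc(c1)=d1", "loaded(r1)=nil"})
GOAL = frozenset({"loc(r1)=d3", "loc(c1)=r1"})

for lm in sorted(compute_landmarks(S0, GOAL, ACTIONS), key=sorted):
    print(" ∨ ".join(sorted(lm)))
```

On this instance the loop returns the two goal assignments plus loc(r1) = d1, the one precondition of load(r1,c1,d1) that does not already hold in s0.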


Page 72: Chapter02

RPG-based Landmark Computation
●  Last step of ComputeLandmark: the resulting disjunction φ is a landmark; add it to queue

[Figure: landmark gi with achievers a1 (preconditions p1, q1), a2 (preconditions p2, q2), a3 (preconditions p3, q3); annotation: p3∨q1: a new landmark]

Page 73: Chapter02

RPG-based Landmark Computation

[Figure: initial state s0 (r1 at d3, c1 at d1) and goal g = {loc(r1) = d3, loc(c1) = r1}]

●  Running ComputeLandmark on s0 and g
Ø  queue = {loc(r1) = d3, loc(c1) = r1}

Page 74: Chapter02

RPG-based Landmark Computation
●  queue = {loc(r1) = d3, loc(c1) = r1}
Ø  loc(r1) = d3: true in current state, always going to be true in RPG; no use expanding

Page 75: Chapter02

RPG-based Landmark Computation
●  Next landmark from the queue: loc(c1) = r1
Ø  3 actions achieve this: load(r1,c1,d1), load(r1,c1,d2), load(r1,c1,d3)

action load(r, c, l)
pre: loaded(r) = nil, loc(c) = l, loc(r) = l
eff: loaded(r) ← c, loc(c) ← r


Page 77: Chapter02

RPG-based Landmark Computation
●  Relaxed Planning Graph foo, built from s0 without the load actions
Ø  ŝ0 = {loc(c1) = d1, loc(r1) = d3, loaded(r1) = nil}
Ø  A1 = {move(r1,d3,d1), move(r1,d3,d2)}
Ø  ŝ1 = {loc(r1) = d1, loc(r1) = d2} ∪ {loc(c1) = d1, loc(r1) = d3, loaded(r1) = nil}, where the second set comes from ŝ0
●  Candidate achievers: load(r1,c1,d1), load(r1,c1,d2), load(r1,c1,d3)
Ø  Only the first load action is applicable in the r-state at the end of foo

Page 78: Chapter02

RPG-based Landmark Computation
●  Landmark being expanded: loc(c1) = r1
Ø  3 actions achieve this: load(r1,c1,d1), load(r1,c1,d2), load(r1,c1,d3)
Ø  Only load(r1,c1,d1) is applicable in the final level of the RPG; its preconditions are used to generate new landmarks

Page 79: Chapter02

RPG-based Landmark Computation
●  Landmarks so far: loc(r1) = d3, loc(c1) = r1
●  Preconditions of load(r1,c1,d1): loaded(r1) = nil, loc(c1) = d1, loc(r1) = d1
Ø  These new landmarks are added to the queue

Page 80: Chapter02

Landmark Heuristic
●  Every solution to the problem needs to achieve all the computed landmarks
●  One possible heuristic to estimate distance of state s from g:
Ø  Number of landmarks required to be accomplished from s
Ø  Planner biased towards actions that achieve landmarks
●  Is this heuristic admissible?

Page 81: Chapter02

Landmark Heuristic
●  Every solution to the problem needs to achieve all the computed landmarks
●  One possible heuristic to estimate distance of state s from g:
Ø  Number of landmarks required to be accomplished from s
Ø  Planner biased towards actions that achieve landmarks
●  Is this heuristic admissible?
Ø  No. Counterexample: g = g1∧g2, and a single action a achieves both g1 and g2 from s0
Ø  Number of landmarks: |{g1, g2}| = 2
Ø  Optimal plan length = |⟨a⟩| = 1
●  A number of more advanced landmark-based heuristics have been developed (including admissible ones)
Ø  Check textbook for references
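The counterexample can be checked with a few lines of Python. This is a minimal sketch, assuming a landmark is a frozenset of atoms read as a disjunction and that the heuristic simply counts landmarks not yet true in the current state; the names are hypothetical.

```python
def h_landmark_count(state, landmarks):
    """Number of landmarks with no disjunct true in `state`."""
    return sum(1 for lm in landmarks if not (lm & state))

LANDMARKS = [frozenset({"g1"}), frozenset({"g2"})]
S0 = frozenset()                       # neither goal holds initially
AFTER_A = frozenset({"g1", "g2"})      # the single action a achieves both goals

OPTIMAL_PLAN_LENGTH = 1                # the plan ⟨a⟩
print(h_landmark_count(S0, LANDMARKS))       # 2, which overestimates the optimal cost of 1
print(h_landmark_count(AFTER_A, LANDMARKS))  # 0
```

Because the estimate for s0 exceeds the optimal plan length, plain landmark counting is not admissible, although it can still guide a planner that does not need optimal plans.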