Dana Nau and Vikas Shivashankar: Lecture slides for Automated Planning and Acting. Updated 3/24/15. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Chapter 2 Deliberation with Deterministic Models
Dana S. Nau and Vikas Shivashankar
University of Maryland
Purpose of this Chapter ● Last time, Vikas mentioned conventional AI planning
Ø Given • a domain model (descriptions of the states and actions) • initial state s0, and goal g
Ø Find a plan or a policy that • is executable starting in s0 • produces a state that satisfies g
● This chapter discusses some techniques for doing that Ø Also, how to use those techniques in acting systems
Chapter 2 Deliberation with Deterministic Models
2a: Representing Planning Domains
Dana S. Nau and Vikas Shivashankar
University of Maryland
Domain Model ● Planning domain: an abstract model of the environment
Ø Many different kinds of environments, various ways to model them
● In this chapter, the model is a deterministic state-transition system Ø Σ = (S,A,γ)
• S is a finite set of states ▸ States of the world
• A is a finite set of actions ▸ Things an actor can do
• γ: S × A → S is a prediction function (or state-transition function) ▸ Given a state s and action a, γ(s,a) is another state
• Prediction of what state will be produced by executing a in s • γ is partial: γ(s,a) is undefined if a is inapplicable in s
▸ Dom(a) = {s ∈ S | γ(s,a) is defined} = {states where a is applicable} ▸ Range(a) = {γ(s,a) | s ∈ Dom(a)}
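Σ with a partial γ can be sketched directly as a lookup table. A minimal Python sketch; the state and action names are invented placeholders, not from the slides:

```python
# A tiny deterministic state-transition system Sigma = (S, A, gamma),
# with gamma stored as a lookup table. Names are hypothetical.
gamma = {
    ("s0", "move1"): "s1",
    ("s1", "move2"): "s2",
    ("s0", "load"): "s3",
}

def apply_action(s, a):
    """gamma(s, a); returns None when a is inapplicable in s (gamma is partial)."""
    return gamma.get((s, a))

def dom(a):
    """Dom(a) = {s in S : gamma(s, a) is defined}."""
    return {s for (s, act) in gamma if act == a}

def rng(a):
    """Range(a) = {gamma(s, a) : s in Dom(a)}."""
    return {gamma[(s, a)] for s in dom(a)}
```

Because γ is partial, inapplicability is represented here by None rather than an exception.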
Implicit Assumptions ● The state-transition model incorporates the following assumptions
Ø Static world • Changes occur only in response to the actor’s actions
Ø Perfect information • Actor always has all the information it needs
Ø Instantaneous actions • Each action causes an instantaneous transition from one state to the next
Ø Determinism • Actions are deterministic
Ø Correct prediction function • Outcome of action a in state s is always γ(s,a)
Ø Flat search space • Only one level of abstraction; ignore how to refine actions at a lower level
How to Represent Σ?
● If the domain is small enough
Ø Give each state and action a unique name
Ø For each s and a, store γ(s,a) in a lookup table
[Figure: a grid map with locations loc0–loc9 plotted at x, y coordinates, with x ranging over 0–6 and y over 1–4]
● If a domain is larger, don’t represent all states explicitly Ø Have a formalism for describing states by describing their properties Ø Represent each action by describing how it changes those properties Ø Start with initial state, use actions to produce other states
Deterministic Operator (General Form) ● Domain-specific format for representing states
Ø Invent your own format ● General form of a deterministic operator:
Ø o = (head(o), pre(o), eff(o), cost(o)) • head(o): name and parameter list • pre(o): preconditions
▸ Computational tests to predict whether an action can be performed in a state s
▸ In principle, should be necessary/sufficient for the action to run without error
• eff(o): effects ▸ Procedures that assign new values to some of the state variables
• cost(o): procedure that returns a number ▸ Can be omitted, in which case cost(o) = 1 ▸ Could represent monetary cost, time required, something else
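The general form above can be realized in many ways; a minimal Python sketch, assuming states are dicts of state variables (that encoding, and the example move action, are inventions of this sketch):

```python
from dataclasses import dataclass
from typing import Callable

# A deterministic operator o = (head(o), pre(o), eff(o), cost(o)).
# A state is modeled here as a dict of state variables (an assumption).
@dataclass
class Operator:
    head: str                                    # name and parameter list
    pre: Callable[[dict], bool]                  # computational test on s
    eff: Callable[[dict], dict]                  # returns the successor state
    cost: Callable[[dict], float] = lambda s: 1  # omitted cost defaults to 1

def gamma(s, o):
    """Apply operator o in state s; None if the precondition fails."""
    return o.eff(s) if o.pre(s) else None

# Hypothetical example: a move action on a single state variable.
move = Operator(
    head="move(r1, d1, d2)",
    pre=lambda s: s["loc(r1)"] == "d1",
    eff=lambda s: {**s, "loc(r1)": "d2"},
)
```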
Example ● Suppose we want to plan how to create a hole in a metal workpiece ● A state s includes
Ø geometric model of the workpiece, variables describing its location, orientation, and other status information,
Ø capabilities and status of drilling machine and drill ● Several actions (getting the workpiece onto the machine, clamping it, loading a
drill bit, etc.) Ø Next slide: the drilling operation itself
Properties of Objects ● Define ways to represent properties of objects
Ø Two kinds of properties: rigid and varying ● A property is rigid if it stays the same in every state
Ø Represent as a mathematical relation ● Example:
Ø adjacent = {(d1,d2), (d2,d1)} Ø Can also write as
• adjacent(d1,d2) • adjacent(d2,d1)
[Figure: robots r1 and r2, containers c1 and c2, and docks d1, d2, d3]
Varying Properties ● A property is varying if it may differ in different states
Ø Represent using a state variable that we can assign a value to Ø Each state variable x has a range (set of possible values), Range(x) Ø For each state s, s(x) ∈ Range(x) is x’s value in state s
● Example, what we want to represent: Ø Each robot can hold at most one container Ø Each robot is at one of the locations Ø Each container is on a robot or at one of the locations
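A state can then be sketched as an assignment of a value to each state variable; the ranges and values below are illustrative, not from the slides:

```python
# Each state variable x has a range Range(x); a state s maps each x to
# a value with s(x) in Range(x). Variable names follow the slides.
Range = {
    "loc(r1)": {"d1", "d2", "d3"},
    "loaded(r1)": {"c1", "c2", "nil"},
    "loc(c1)": {"d1", "d2", "d3", "r1", "r2"},
}

s = {"loc(r1)": "d1", "loaded(r1)": "nil", "loc(c1)": "d1"}

def is_well_typed(s, Range):
    """Check that s(x) is in Range(x) for every state variable x."""
    return all(s[x] in Range[x] for x in Range)
```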
A Simple Example ● Set of all state variables
Ø X = {loc(r1), loaded(r1), loc(c1), loc(r2), loaded(r2), loc(c2)}
Discussion ● Just because a function maps each state variable x into a value in Range(x), this doesn't make it a state
Ø Some sets of state-variable values may not make any sense as states
• s = {loaded(r1) = F, loc(c1) = r1, …}   s = {loc(c1) = d1, loc(c2) = r1, …}
● Need restrictions on what sets of state-variable values constitute real states
Ø In most of the book, we won't represent such restrictions explicitly
Ø Just write the action models in such a way that none of them will ever produce such a set of assignments
Descriptive Action Models ● Actions often fall into closely related classes ● For each class, write a parameterized schema called a planning operator to describe all the actions in that class
action move(r, l, m)
action put(r, l, c) Pre: loc(r) = l, loc(c) = r Eff: loaded(r) ← F, loc(c) ← l
Each parameter has a range of possible values, e.g., Range(r) = Robots
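Grounding such a schema means instantiating its parameters over their ranges. A sketch for the put(r, l, c) operator above; the ranges and dict encoding of preconditions and effects are assumptions of this sketch:

```python
from itertools import product

# Grounding put(r, l, c): every assignment of parameter values within
# each parameter's range yields one action. Ranges are illustrative.
Robots, Locs, Containers = {"r1", "r2"}, {"d1", "d2", "d3"}, {"c1", "c2"}

def ground_put():
    actions = []
    for r, l, c in product(sorted(Robots), sorted(Locs), sorted(Containers)):
        pre = {f"loc({r})": l, f"loc({c})": r}       # loc(r) = l, loc(c) = r
        eff = {f"loaded({r})": "F", f"loc({c})": l}  # loaded(r) <- F, loc(c) <- l
        actions.append((f"put({r},{l},{c})", pre, eff))
    return actions
```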
CSV Operator ● Classical State Variable (CSV) Operator:
• the kind of operator shown on the previous page ● General form
Ø o = (head(o), pre(o), eff(o), cost(o)) • each precondition in pre(o) must have one of these forms:
▸ relname(t1,…,tk)   ¬relname(t1,…,tk)   varname(t1,…,tk) = t0   varname(t1,…,tk) ≠ t0
• relname = name of a rigid relation
• varname = name of a state variable
• each ti must be a constant (i.e., a member of B) or a parameter
• each effect in eff(o) must have the form varname(t1,…,tk) ← t0 • if cost(o) isn’t omitted, it must be a nonnegative number (not a formula)
● Limited representational capability, but easy to compute, easy to reason about Ø Many algorithms have been written to use this kind of operator
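A sketch of checking the four precondition forms above against a state; the tagged-tuple encoding of preconditions is an invention of this sketch:

```python
# Checking CSV preconditions: positive/negative rigid-relation atoms and
# state-variable (in)equality tests. A state s is a dict of variables.
adjacent = {("d1", "d2"), ("d2", "d1"), ("d2", "d3"), ("d3", "d2")}  # rigid

def holds(s, precond):
    kind = precond[0]
    if kind == "rel":          # relname(t1,...,tk)
        _, rel, args = precond
        return args in rel
    if kind == "not-rel":      # NOT relname(t1,...,tk)
        _, rel, args = precond
        return args not in rel
    if kind == "eq":           # varname(t1,...,tk) = t0
        _, var, val = precond
        return s[var] == val
    if kind == "neq":          # varname(t1,...,tk) != t0
        _, var, val = precond
        return s[var] != val
    raise ValueError(kind)
```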
CSV Actions
● A CSV action is an instance of a planning operator
Ø assign values to parameters
● CSV planning problem: P = (Σ, s0, g), where
Ø Σ = planning domain (a CSV planning domain)
Ø s0 = initial state
Ø g = {g1,…,gk} is a set of constraints called the goal
• g has the same form as a CSV operator's preconditions
● π is a solution for P if γ(s0,π) satisfies g
Ø π is a shortest solution if no shorter plan is also a solution
Ø π is a minimal solution if no proper subsequence of π is also a solution
● Example:
Ø s0 and π as on previous page
Ø g = {loc(r2)=d1, loc(c2)=d1}
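Checking whether a plan π solves P can be sketched by applying γ along π and then testing g. Actions are modeled here as (precondition, effect) dict pairs, an assumption of this sketch:

```python
# Apply the actions of pi in order; None means pi is not executable in s0.
def apply_plan(s0, pi):
    s = dict(s0)
    for pre, eff in pi:
        if any(s.get(x) != v for x, v in pre.items()):
            return None
        s.update(eff)
    return s

def is_solution(s0, pi, g):
    """pi is a solution if gamma(s0, pi) is defined and satisfies g."""
    s = apply_plan(s0, pi)
    return s is not None and all(s.get(x) == v for x, v in g.items())
```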
Classical and State-Variable Representations ● Motivation
Ø The field of AI planning started out as automated theorem proving Ø It still uses a lot of that notation
● Classical representation Ø Equivalent to CSV representation Ø Represents both rigid and varying properties using logical predicates
• adjacent(l,m) - location l is adjacent to location m • loc(r,l) - robot r is at location l • loc(c,l), loc(c,r) - container c is at location l or on robot r • loaded(r) - there is a container on robot r
States ● Use ground atoms to represent both rigid and varying properties of Σ ● To represent a state
Discussion ● Classical representation is equivalent to state-variable representation in
expressive power Ø Each can be converted to the other in linear time and space
● Classical representation is more natural for logicians ● CSV is more natural for engineers and most computer scientists
Ø When changing a value, you don’t have to explicitly delete the old one ● Historically, classical representation has been more widely used
Ø Many of the algorithms in the book were originally written to use classical representation
Ø That’s starting to change
Conversion between the two:
Ø Classical rep. → CSV rep.: P(b1,…,bk) becomes xP(b1,…,bk) = 1
Ø CSV rep. → Classical rep.: x(b1,…,bn) = b0 becomes Px(b1,…,bn,b0)
Chapter 2 Deliberation with Deterministic Models
2b: Planning Algorithms
Dana S. Nau and Vikas Shivashankar
University of Maryland
Motivation
● Nearly all planning procedures are search procedures ● Different planning procedures have different search spaces
Ø This section: state-space planning • Each node represents a state of the world
▸ A plan is a path through the space
Forward Search
Which forward-search algorithm you get depends on how you implement the nondeterministic choice. I'll discuss several such algorithms.
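The idea can be sketched as a skeleton parameterized by how nodes are popped from the frontier; this is a simplification of the slide's nondeterministic pseudocode, with actions again as (name, pre, eff) dicts, an assumption of this sketch:

```python
# Forward-search skeleton: the frontier discipline determines which
# algorithm you get (stack -> depth-first, priority queue -> A*, ...).
# No cycle-checking here, so it may not terminate on cyclic domains.
def forward_search(s0, goal_test, actions, pop):
    frontier = [([], s0)]                    # nodes are (plan, state) pairs
    while frontier:
        pi, s = pop(frontier)
        if goal_test(s):
            return pi
        for name, pre, eff in actions:       # expand with applicable actions
            if all(s.get(x) == v for x, v in pre.items()):
                frontier.append((pi + [name], {**s, **eff}))
    return None
```

For example, `pop=lambda f: f.pop()` treats the frontier as a stack and gives a depth-first behavior.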
Depth-First Search ● At each state, select the action that has the lowest heuristic-function value ● Visited is for cycle-checking
Ø If you come to a state you’ve already seen on the current path, then backtrack Ø In a finite domain, this guarantees termination
● Guaranteed to find a solution if one exists
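A recursive sketch under the same state-as-dict assumption, ordering successors by heuristic value and backtracking when a state repeats on the current path:

```python
# Depth-first forward search with cycle-checking on the current path.
def dfs(s, goal_test, actions, h, visited=()):
    if goal_test(s):
        return []
    succ = []
    for name, pre, eff in actions:
        if all(s.get(x) == v for x, v in pre.items()):
            succ.append((name, {**s, **eff}))
    succ.sort(key=lambda ns: h(ns[1]))            # lowest heuristic value first
    for name, s2 in succ:
        frozen = tuple(sorted(s2.items()))
        if frozen in visited:                     # already on this path:
            continue                              # backtrack
        plan = dfs(s2, goal_test, actions, h, visited + (frozen,))
        if plan is not None:
            return [name] + plan
    return None
```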
Greedy Search ● Like DFFS, but never backtracks
Ø Not guaranteed to find a solution
A*
● Requires a heuristic function in line 3
Ø Chooses a node (π,s) in Fringe having the smallest value for cost(π) + h(s)
Ø Expands the node (computes nodes for all applicable actions)
Ø Prunes nodes that can be shown to be no better than nodes already expanded
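A minimal sketch of this scheme with unit action costs; skipping any state already expanded with an equal or lower cost is one simple way to realize the pruning the slide mentions:

```python
import heapq
from itertools import count

# A* over (name, pre, eff)-style actions with unit costs.
def astar(s0, goal_test, actions, h):
    frozen = lambda s: tuple(sorted(s.items()))
    tie = count()                                 # heap tie-breaker
    fringe = [(h(s0), 0, next(tie), [], s0)]      # (f, g = cost(pi), tie, plan, state)
    expanded = {}                                 # best g seen per expanded state
    while fringe:
        f, g, _, pi, s = heapq.heappop(fringe)
        if goal_test(s):
            return pi
        if expanded.get(frozen(s), float("inf")) <= g:
            continue                              # prune: no better than before
        expanded[frozen(s)] = g
        for name, pre, eff in actions:            # expand the node
            if all(s.get(x) == v for x, v in pre.items()):
                s2 = {**s, **eff}
                heapq.heappush(fringe,
                               (g + 1 + h(s2), g + 1, next(tie), pi + [name], s2))
    return None
```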
[Figure: A* on a road map of Romania, with straight-line distances to Bucharest as the heuristic. Initially the fringe contains only Arad: 366 = 0 + 366]
[Figure: after expanding Arad, the fringe contains Sibiu 393 = 140 + 253, Timisoara 447 = 118 + 329, and Zerind 449 = 75 + 374]
[Figure: after expanding Sibiu, the fringe adds Rimnicu Vilcea 413 = 220 + 193, Fagaras 415 = 239 + 176, Arad 646 = 280 + 366, and Oradea 671 = 291 + 380]
[Figure: the search finishes with Bucharest 418 = 418 + 0; pruned or unexpanded alternatives include Bucharest 450 = 450 + 0, Craiova 526 = 366 + 160, Sibiu 553 = 300 + 253, Sibiu 591 = 338 + 253, Rimnicu Vilcea 607 = 414 + 193, and Craiova 615 = 455 + 160]
Properties of A*
● h is admissible if for every s, 0 ≤ h(s) ≤ h*(s)
Ø where h*(s) = least cost of getting from s to a state that satisfies g
● If h is admissible then A* is guaranteed to return an optimal solution
● Inadmissible heuristics might get you to a solution faster, but the solution won't necessarily be optimal
Depth-First Branch and Bound ● Depth-first search with a heuristic function and pruning
Ø π* and c*: least-cost solution found so far, and the cost of that solution Ø Any time you find a solution with lower cost, update π* and c* Ø Prune any plan π such that cost(π) + h(s) ≥ c*
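A sketch under the same assumptions as the earlier search sketches (unit costs, states as dicts):

```python
# Depth-first branch and bound: keep the best solution found so far
# (pi*, c*) and prune any node with cost(pi) + h(s) >= c*.
def dfbb(s0, goal_test, actions, h):
    best = {"plan": None, "cost": float("inf")}    # pi*, c*

    def search(s, pi, g, on_path):
        if g + h(s) >= best["cost"]:
            return                                 # prune
        if goal_test(s):
            best["plan"], best["cost"] = pi, g     # better solution found
            return
        for name, pre, eff in actions:
            if all(s.get(x) == v for x, v in pre.items()):
                s2 = {**s, **eff}
                key = tuple(sorted(s2.items()))
                if key not in on_path:             # cycle check on current path
                    search(s2, pi + [name], g + 1, on_path | {key})

    search(s0, [], 0, {tuple(sorted(s0.items()))})
    return best["plan"]
```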
Discussion ● If the state space is not too large, then A* or DFBB may be preferable
Ø They are guaranteed to return optimal solutions ● DFFS returns the first solution it finds
Ø can be arbitrarily far from optimal ● Greedy isn’t guaranteed to return a solution at all ● If S is very large, A* may require excessive memory, and both it and DFBB may
require excessive running time Ø In these cases DFFS and Greedy may be preferable
Example ● One rigid relation, adjacent ● State variables to represent each
location’s x and y coordinates: Ø x = {(loc0, 2), (loc1, 0),
Additive-Cost and Max-Cost Heuristics ● hadd(s) = Δadd(s,g), where
● Minimum cost of a “plan tree” to achieve g from s
Ø Pretend each element of g needs a completely separate plan • If an action achieves i elements of g, it’s included i times
Ø Pretend each of an action’s preconditions needs a completely separate plan Ø Thus, get a “plan” that’s a tree of actions
• Total cost can sometimes be much higher than h*(s) ● Can implement as a tree search, going backward from g
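The same values can also be computed by a forward fixed point over the costs of state-variable assignments, rather than the backward tree search described above. A sketch assuming unit action costs and (name, pre, eff) actions; hmax (introduced on a later slide) differs only in replacing sum with max:

```python
# Cost of an assignment: 0 if true in s, else min over achieving actions
# of (1 + combine of precondition costs); combine = sum gives h_add,
# combine = max gives h_max. Iterate until no cost improves.
def delta(s, g, actions, combine):
    INF = float("inf")
    cost = {(x, v): 0 for x, v in s.items()}       # true in s: cost 0
    changed = True
    while changed:
        changed = False
        for name, pre, eff in actions:
            c_pre = combine([cost.get((x, v), INF) for x, v in pre.items()] or [0])
            if c_pre == INF:
                continue
            for x, v in eff.items():
                if c_pre + 1 < cost.get((x, v), INF):
                    cost[(x, v)] = c_pre + 1
                    changed = True
    return combine([cost.get((x, v), INF) for x, v in g.items()] or [0])

def h_add(s, g, actions): return delta(s, g, actions, sum)
def h_max(s, g, actions): return delta(s, g, actions, max)
```

On the worked example that follows, this reproduces hadd(s1) = 2 and hmax(s1) = 1.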
Additive-Cost Heuristic (state s1)
[Figure: the plan tree for s1 with g = {loc(r1) = d3, loc(c1) = r1}; the branch for loc(r1) = d3 costs min(1, >1) = 1 via move(r1,d1,d3), and the branch for loc(c1) = r1 costs 1 via load(r1,c1,d1)]
hadd(s1) = Δadd(s1,g) = sum(1,1) = 2
Additive-Cost Heuristic (state s2)
[Figure: the plan tree for s2 with g = {loc(r1) = d3, loc(c1) = r1}; the branch for loc(r1) = d3 costs 1, and the branch for loc(c1) = r1 costs 2 via move(r1,d2,d1) followed by load(r1,c1,d1)]
hadd(s2) = Δadd(s2,g) = sum(1,2) = 3
Additive-Cost and Max-Cost Heuristics ● hmax(s) = Δmax(s,g), where
● Like hadd(s), but doesn’t add all of the costs in the plan tree
Ø Just the most costly path in the plan tree ● Guaranteed to be a lower bound on h*(s)
● Same tree search as for hadd(s)
Max-Cost Heuristic (state s1)
[Figure: the same plan tree for s1, but combining costs with max instead of sum]
hmax(s1) = Δmax(s1,g) = max(1,1) = 1
Max-Cost Heuristic (state s2)
[Figure: the same plan tree for s2, but combining costs with max instead of sum]
hmax(s2) = Δmax(s2,g) = max(1,2) = 2
Delete-Relaxation Heuristics ● Suppose a state s includes an assignment x = v ● Suppose an action a has an effect x ← w ● Then γ+(s, a) includes both x = v and x = w
● Relaxed state (or r-state) Ø any set ŝ of state-variable values Ø may include more than one value for each state variable
● ŝ r-satisfies a goal g if ŝ contains a subset that satisfies g ● An action a is r-applicable in ŝ if ŝ r-satisfies a’s preconditions
Ø In this case, γ+(ŝ,a) = ŝ ∪ γ(s,a) ● π = ⟨a1, …, an⟩ is r-applicable in ŝ0 if there are r-states ŝ1, ŝ2, …, ŝn such that
• a1 is r-applicable in ŝ0 and γ+(ŝ0,a1) = ŝ1 • a2 is r-applicable in ŝ1 and γ+(ŝ1,a2) = ŝ2 • …
Ø In this case, γ+(ŝ0,π) = ŝn ● π is a relaxed solution for P = (Σ, s0, g) if γ+(s0, π) r-satisfies g
The name comes from classical planning: apply an action's add effects but “don't do the deletions”
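A sketch of r-states and γ+, encoding an r-state as a mapping from each variable to a set of values (one convenient encoding, not the book's notation); actions are (pre, eff) dict pairs as in the earlier sketches:

```python
# Delete relaxation: applying an action adds new values to the r-state
# without deleting old ones.
def r_satisfies(rs, cond):
    """rs r-satisfies cond if every required assignment appears in rs."""
    return all(v in rs.get(x, set()) for x, v in cond.items())

def gamma_plus(rs, action):
    pre, eff = action
    assert r_satisfies(rs, pre)                   # action must be r-applicable
    rs2 = {x: set(vals) for x, vals in rs.items()}
    for x, v in eff.items():
        rs2.setdefault(x, set()).add(v)           # keep the old values too
    return rs2

def r_apply_plan(s0, pi):
    """Lift the ordinary state s0 to an r-state, then apply pi with gamma_plus."""
    rs = {x: {v} for x, v in s0.items()}
    for a in pi:
        rs = gamma_plus(rs, a)
    return rs
```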
● Computation of RPG(s2,g) Ø First line of RPG assigns ŝ0 = s2 , A0 = ∅
● RPG(s2,g) returns 3
Landmark Heuristics ● Let P = (Σ,s0,g) be a planning problem ● Let φ = φ1 ∨ ... ∨ φm be a disjunction of state-variable assignments ● Definition: φ is a landmark for P if φ is true at some point in every solution plan of P
● Example landmarks, for the problem with g = {loc(r1) = d3, loc(c1) = r1}:
Ø A complete state: s0
Ø A single state-variable assignment: loc(r1) = d1
Ø A disjunction of state-variable assignments: loc(r1) = d1 ∨ loc(r1) = d2
[Figure: the initial state s0 and a goal state satisfying g]
Why are Landmarks Useful? ● Help in breaking down the given problem into smaller subproblems
● Every solution to P has to achieve these landmarks ● Possible strategy:
Ø find a plan that takes us from s0 to any state s1 that satisfies lm1 Ø find a plan that takes us from s1 to any state s2 that satisfies lm2 Ø …
[Figure: a path from s0 to g passing through landmarks lm1, lm2, lm3, splitting the problem into subproblems P1, P2, P3, P4]
Computing Landmarks ● Question: How do we compute landmarks for a problem P? ● Not easy:
Ø Deciding whether a state-variable assignment φ is a landmark is PSPACE-hard in the worst case
Ø To put it in perspective: that's as hard as solving the planning problem itself!
Ø There are often useful landmarks that can be found more easily Ø There are polynomial-time procedures that can compute these landmarks Ø Going to see one such procedure based on Relaxed Planning Graphs
• Computing landmarks for relaxed planning problems is easier Ø A landmark for a relaxed planning problem is a landmark for the original planning problem as well
RPG-based Landmark Computation ● Main intuition: if a state-variable assignment φ is a landmark, then a disjunction over the preconditions of the actions that achieve φ is also a landmark Ø In other words: we're going to start from known landmarks and discover new
ones by looking at preconditions of actions that achieve these landmarks ● Example:
Ø g is the goal Ø Actions a1 and a2 can achieve g Ø Therefore, either p1∧q or p2∧q must be true to be able to achieve g Ø In other words: (p1∧q)∨(p2∧q) is a landmark Ø By rearranging terms: we get (p1∧q)∨(p2∧q) ≅ q∧(p1∨p2) Ø Since q∧(p1∨p2) is a landmark, both q and (p1∨p2) are landmarks
● In practice, we try to rearrange assignments in a similar manner to group terms with the same state variable together
[Figure: actions a1 (preconditions p1, q) and a2 (preconditions p2, q) both achieve g]
RPG-based Landmark Computation ● Question: What landmarks can we start with?
Ø Every goal is trivially a landmark; can start from there ● E.g. loc(r1) = d3 is a landmark
Ø Two actions achieve this landmark: move(r1, d1, d3) and move(r1, d2, d3)
Ø Can infer a new landmark that is the disjunction of the preconditions of these two actions: φ′ = (loc(r1) = d1) ∨ (loc(r1) = d2)
action move(r, d, e) pre: loc(r) = d eff: loc(r) ← e
[Figure: the initial state s0 and the goal g = {loc(r1) = d3, loc(c1) = r1}]
RPG-based Landmark Computation
ComputeLandmark(s0, g = g1 ∧ g2 ∧ … ∧ gk)
● queue = {g1, g2,…, gk}
● While queue is not empty:
Ø Remove a gi from queue
Ø Ai = {all actions that can achieve gi}
Ø Compute all assignments that can be r-produced starting from s0 without using Ai, thus generating the RPG foo
Ø act(gi) = {all actions in Ai that are r-applicable in the r-state resulting from foo}
Ø For each action in act(gi):
• Pick a precondition not satisfied in s0 and add it to φ
Ø The resulting disjunction φ is a landmark; add it to queue
[Figure: actions a1, a2, a3 with preconditions (p1, q1), (p2, q2), (p3, q3) all achieve gi]
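A simplified sketch of one round of this procedure: it back-chains from a goal assignment to the preconditions of its achievers, but omits the RPG construction that restricts achievers to act(gi). Actions are (name, pre, eff) dicts, an assumption of this sketch:

```python
# One simplified landmark-discovery step: every plan must apply some
# action that achieves g_i, so a disjunction with one precondition from
# each achiever (preferring preconditions unsatisfied in s0) is a
# landmark for the relaxed problem.
def landmark_step(s0, gi, actions):
    x, v = gi
    phi = set()
    for name, pre, eff in actions:
        if eff.get(x) == v:                        # this action achieves g_i
            unsat = [(y, w) for y, w in pre.items() if s0.get(y) != w]
            cand = unsat or sorted(pre.items())    # prefer unsatisfied ones
            phi.add(cand[0])
    return phi                                     # phi's disjunction is a landmark
```

On the slides' example, back-chaining from loc(r1) = d3 recovers the landmark loc(r1) = d1 ∨ loc(r1) = d2.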
[Figure: picking one unsatisfied precondition from each action in act(gi) yields the new landmark p3 ∨ q1]
RPG-based Landmark Computation (worked example)
[Figure: the initial state s0 and the goal g = {loc(r1) = d3, loc(c1) = r1}; the queue initially contains the goal assignments loc(r1) = d3 and loc(c1) = r1]
[Figure: an assignment that is true in the current state is always going to be true in the RPG, so there is no use expanding it]
Expanding loc(c1) = r1: the load operator has
pre: loaded(r) = nil, loc(c) = l, loc(r) = l
eff: loaded(r) ← c, loc(c) ← r
Only load(r1,c1,d1) is r-applicable in the final level of the RPG; its preconditions are used to generate new landmarks
● From the preconditions of load(r1,c1,d1), the new landmarks loaded(r1) = nil, loc(c1) = d1, and loc(r1) = d1 are added to the queue
Landmark Heuristic ● Every solution to the problem needs to achieve all the computed landmarks ● One possible heuristic to estimate the distance from state s to g:
Ø Number of landmarks required to be accomplished from s Ø Planner biased towards actions that achieve landmarks
● Is this heuristic admissible?
● No: if a single action a achieves both goals from s0, the number of landmarks is |{g1, g2}| = 2 but the optimal plan length is |⟨a⟩| = 1, so the count can overestimate
● A number of more advanced landmark-based heuristics have been developed (including admissible ones)
Ø Check the textbook for references