Dana Nau and Vikas Shivashankar: Lecture slides for Automated Planning and Acting. Updated 3/24/15. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Chapter 2 Deliberation with Deterministic Models
Dana S. Nau and Vikas Shivashankar
University of Maryland
Purpose of this Chapter ● Last time, Vikas mentioned conventional AI planning
Ø Given • a domain model (descriptions of the states and actions) • initial state s0, and goal g
Ø Find a plan or a policy that • is executable starting in s0 • produces a state that satisfies g
● This chapter discusses some techniques for doing that Ø Also, how to use those techniques in acting systems
Chapter 2 Deliberation with Deterministic Models
2a: Representing Planning Domains
Dana S. Nau and Vikas Shivashankar
University of Maryland
Domain Model ● Planning domain: an abstract model of the environment
Ø Many different kinds of environments, various ways to model them
● In this chapter, the model is a deterministic state-transition system Ø Σ = (S,A,γ)
• S is a finite set of states ▸ States of the world
• A is a finite set of actions ▸ Things an actor can do
• γ: S × A → S is a prediction function (or state-transition function) ▸ Given a state s and action a, γ(s,a) is another state
• Prediction of what state will be produced by executing a in s • γ is partial: γ(s,a) is undefined if a is inapplicable in s
▸ Dom(a) = {s ∈ S | γ(s,a) is defined} = {states where a is applicable} ▸ Range(a) = {γ(s,a) | s ∈ Dom(a)}
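Σ with a partial γ can be sketched directly as a lookup table. A minimal Python sketch; the state and action names are invented placeholders, not from the slides:

```python
# A tiny deterministic state-transition system Sigma = (S, A, gamma),
# with gamma stored as a lookup table. Names are hypothetical.
gamma = {
    ("s0", "move1"): "s1",
    ("s1", "move2"): "s2",
    ("s0", "load"): "s3",
}

def apply_action(s, a):
    """gamma(s, a); returns None when a is inapplicable in s (gamma is partial)."""
    return gamma.get((s, a))

def dom(a):
    """Dom(a) = {s in S : gamma(s, a) is defined}."""
    return {s for (s, act) in gamma if act == a}

def rng(a):
    """Range(a) = {gamma(s, a) : s in Dom(a)}."""
    return {gamma[(s, a)] for s in dom(a)}
```

Because γ is partial, inapplicability is represented here by None rather than an exception.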
Implicit Assumptions ● The state-transition model incorporates the following assumptions
Ø Static world • Changes occur only in response to the actor’s actions
Ø Perfect information • Actor always has all the information it needs
Ø Instantaneous actions • Each action causes an instantaneous transition from one state to the next
Ø Determinism • Actions are deterministic
Ø Correct prediction function • Outcome of action a in state s is always γ(s,a)
Ø Flat search space • Only one level of abstraction; ignore how to refine actions at a lower level
How to Represent Σ?
● If the domain is small enough
Ø Give each state and action a unique name
Ø For each s and a, store γ(s,a) in a lookup table
[Figure: a grid map with locations loc0–loc9 plotted at x, y coordinates, with x ranging over 0–6 and y over 1–4]
● If a domain is larger, don’t represent all states explicitly Ø Have a formalism for describing states by describing their properties Ø Represent each action by describing how it changes those properties Ø Start with initial state, use actions to produce other states
Deterministic Operator (General Form) ● Domain-specific format for representing states
Ø Invent your own format ● General form of a deterministic operator:
Ø o = (head(o), pre(o), eff(o), cost(o)) • head(o): name and parameter list • pre(o): preconditions
▸ Computational tests to predict whether an action can be performed in a state s
▸ In principle, should be necessary/sufficient for the action to run without error
• eff(o): effects ▸ Procedures that assign new values to some of the state variables
• cost(o): procedure that returns a number ▸ Can be omitted, in which case cost(o) = 1 ▸ Could represent monetary cost, time required, something else
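The general form above can be realized in many ways; a minimal Python sketch, assuming states are dicts of state variables (that encoding, and the example move action, are inventions of this sketch):

```python
from dataclasses import dataclass
from typing import Callable

# A deterministic operator o = (head(o), pre(o), eff(o), cost(o)).
# A state is modeled here as a dict of state variables (an assumption).
@dataclass
class Operator:
    head: str                                    # name and parameter list
    pre: Callable[[dict], bool]                  # computational test on s
    eff: Callable[[dict], dict]                  # returns the successor state
    cost: Callable[[dict], float] = lambda s: 1  # omitted cost defaults to 1

def gamma(s, o):
    """Apply operator o in state s; None if the precondition fails."""
    return o.eff(s) if o.pre(s) else None

# Hypothetical example: a move action on a single state variable.
move = Operator(
    head="move(r1, d1, d2)",
    pre=lambda s: s["loc(r1)"] == "d1",
    eff=lambda s: {**s, "loc(r1)": "d2"},
)
```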
Example ● Suppose we want to plan how to create a hole in a metal workpiece ● A state s includes
Ø geometric model of the workpiece, variables describing its location, orientation, and other status information,
Ø capabilities and status of drilling machine and drill ● Several actions (getting the workpiece onto the machine, clamping it, loading a
drill bit, etc.) Ø Next slide: the drilling operation itself
Properties of Objects ● Define ways to represent properties of objects
Ø Two kinds of properties: rigid and varying ● A property is rigid if it stays the same in every state
Ø Represent as a mathematical relation ● Example:
Ø adjacent = {(d1,d2), (d2,d1)} Ø Can also write as
• adjacent(d1,d2) • adjacent(d2,d1)
[Figure: robots r1 and r2, containers c1 and c2, and docks d1, d2, d3]
Varying Properties ● A property is varying if it may differ in different states
Ø Represent using a state variable that we can assign a value to Ø Each state variable x has a range (set of possible values), Range(x) Ø For each state s, s(x) ∈ Range(x) is x’s value in state s
● Example, what we want to represent: Ø Each robot can hold at most one container Ø Each robot is at one of the locations Ø Each container is on a robot or at one of the locations
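A state can then be sketched as an assignment of a value to each state variable; the ranges and values below are illustrative, not from the slides:

```python
# Each state variable x has a range Range(x); a state s maps each x to
# a value with s(x) in Range(x). Variable names follow the slides.
Range = {
    "loc(r1)": {"d1", "d2", "d3"},
    "loaded(r1)": {"c1", "c2", "nil"},
    "loc(c1)": {"d1", "d2", "d3", "r1", "r2"},
}

s = {"loc(r1)": "d1", "loaded(r1)": "nil", "loc(c1)": "d1"}

def is_well_typed(s, Range):
    """Check that s(x) is in Range(x) for every state variable x."""
    return all(s[x] in Range[x] for x in Range)
```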
A Simple Example ● Set of all state variables
Ø X = {loc(r1), loaded(r1), loc(c1), loc(r2), loaded(r2), loc(c2)}
Discussion ● Just because a function maps each state variable x into a value in Range(x), this doesn't make it a state
Ø Some sets of state-variable values may not make any sense as states
• s = {loaded(r1) = F, loc(c1) = r1, …}   s = {loc(c1) = d1, loc(c2) = r1, …}
● Need restrictions on what sets of state-variable values constitute real states
Ø In most of the book, we won't represent such restrictions explicitly
Ø Just write the action models in such a way that none of them will ever produce such a set of assignments
Descriptive Action Models ● Actions often fall into closely related classes ● For each class, write a parameterized schema called a planning operator to describe all the actions in that class
action move(r, l, m)
action put(r, l, c) Pre: loc(r) = l, loc(c) = r Eff: loaded(r) ← F, loc(c) ← l
Each parameter has a range of possible values, e.g., Range(r) = Robots
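Grounding such a schema means instantiating its parameters over their ranges. A sketch for the put(r, l, c) operator above; the ranges and dict encoding of preconditions and effects are assumptions of this sketch:

```python
from itertools import product

# Grounding put(r, l, c): every assignment of parameter values within
# each parameter's range yields one action. Ranges are illustrative.
Robots, Locs, Containers = {"r1", "r2"}, {"d1", "d2", "d3"}, {"c1", "c2"}

def ground_put():
    actions = []
    for r, l, c in product(sorted(Robots), sorted(Locs), sorted(Containers)):
        pre = {f"loc({r})": l, f"loc({c})": r}       # loc(r) = l, loc(c) = r
        eff = {f"loaded({r})": "F", f"loc({c})": l}  # loaded(r) <- F, loc(c) <- l
        actions.append((f"put({r},{l},{c})", pre, eff))
    return actions
```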
CSV Operator ● Classical State Variable (CSV) Operator:
• the kind of operator shown on the previous page ● General form
Ø o = (head(o), pre(o), eff(o), cost(o)) • each precondition in pre(o) must have one of these forms:
▸ relname(t1,…,tk)   ¬relname(t1,…,tk)   varname(t1,…,tk) = t0   varname(t1,…,tk) ≠ t0
• relname = name of a rigid relation
• varname = name of a state variable
• each ti must be a constant (i.e., a member of B) or a parameter
• each effect in eff(o) must have the form varname(t1,…,tk) ← t0 • if cost(o) isn’t omitted, it must be a nonnegative number (not a formula)
● Limited representational capability, but easy to compute, easy to reason about Ø Many algorithms have been written to use this kind of operator
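A sketch of checking the four precondition forms above against a state; the tagged-tuple encoding of preconditions is an invention of this sketch:

```python
# Checking CSV preconditions: positive/negative rigid-relation atoms and
# state-variable (in)equality tests. A state s is a dict of variables.
adjacent = {("d1", "d2"), ("d2", "d1"), ("d2", "d3"), ("d3", "d2")}  # rigid

def holds(s, precond):
    kind = precond[0]
    if kind == "rel":          # relname(t1,...,tk)
        _, rel, args = precond
        return args in rel
    if kind == "not-rel":      # NOT relname(t1,...,tk)
        _, rel, args = precond
        return args not in rel
    if kind == "eq":           # varname(t1,...,tk) = t0
        _, var, val = precond
        return s[var] == val
    if kind == "neq":          # varname(t1,...,tk) != t0
        _, var, val = precond
        return s[var] != val
    raise ValueError(kind)
```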
CSV Actions
● A CSV action is an instance of a planning operator
Ø assign values to parameters
● CSV planning problem: P = (Σ, s0, g), where
Ø Σ = planning domain (a CSV planning domain)
Ø s0 = initial state
Ø g = {g1,…,gk} is a set of constraints called the goal
• g has the same form as a CSV operator's preconditions
● π is a solution for P if γ(s0,π) satisfies g
Ø π is a shortest solution if no shorter plan is also a solution
Ø π is a minimal solution if no proper subsequence of π is also a solution
● Example:
Ø s0 and π as on previous page
Ø g = {loc(r2)=d1, loc(c2)=d1}
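Checking whether a plan π solves P can be sketched by applying γ along π and then testing g. Actions are modeled here as (precondition, effect) dict pairs, an assumption of this sketch:

```python
# Apply the actions of pi in order; None means pi is not executable in s0.
def apply_plan(s0, pi):
    s = dict(s0)
    for pre, eff in pi:
        if any(s.get(x) != v for x, v in pre.items()):
            return None
        s.update(eff)
    return s

def is_solution(s0, pi, g):
    """pi is a solution if gamma(s0, pi) is defined and satisfies g."""
    s = apply_plan(s0, pi)
    return s is not None and all(s.get(x) == v for x, v in g.items())
```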
Classical and State-Variable Representations ● Motivation
Ø The field of AI planning started out as automated theorem proving Ø It still uses a lot of that notation
● Classical representation Ø Equivalent to CSV representation Ø Represents both rigid and varying properties using logical predicates
• adjacent(l,m) - location l is adjacent to location m • loc(r,l) - robot r is at location l • loc(c,l), loc(c,r) - container c is at location l or on robot r • loaded(r) - there is a container on robot r
States ● Use ground atoms to represent both rigid and varying properties of Σ ● To represent a state
Discussion ● Classical representation is equivalent to state-variable representation in
expressive power Ø Each can be converted to the other in linear time and space
● Classical representation is more natural for logicians ● CSV is more natural for engineers and most computer scientists
Ø When changing a value, you don’t have to explicitly delete the old one ● Historically, classical representation has been more widely used
Ø Many of the algorithms in the book were originally written to use classical representation
Ø That’s starting to change
Conversion between the two:
Ø Classical rep. → CSV rep.: P(b1,…,bk) becomes xP(b1,…,bk) = 1
Ø CSV rep. → Classical rep.: x(b1,…,bn) = b0 becomes Px(b1,…,bn,b0)
Chapter 2 Deliberation with Deterministic Models
2b: Planning Algorithms
Dana S. Nau and Vikas Shivashankar
University of Maryland
Motivation
● Nearly all planning procedures are search procedures ● Different planning procedures have different search spaces
Ø This section: state-space planning • Each node represents a state of the world
▸ A plan is a path through the space
Forward Search
Which forward-search algorithm you get depends on how you implement the nondeterministic choice. I'll discuss several such algorithms.
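The idea can be sketched as a skeleton parameterized by how nodes are popped from the frontier; this is a simplification of the slide's nondeterministic pseudocode, with actions again as (name, pre, eff) dicts, an assumption of this sketch:

```python
# Forward-search skeleton: the frontier discipline determines which
# algorithm you get (stack -> depth-first, priority queue -> A*, ...).
# No cycle-checking here, so it may not terminate on cyclic domains.
def forward_search(s0, goal_test, actions, pop):
    frontier = [([], s0)]                    # nodes are (plan, state) pairs
    while frontier:
        pi, s = pop(frontier)
        if goal_test(s):
            return pi
        for name, pre, eff in actions:       # expand with applicable actions
            if all(s.get(x) == v for x, v in pre.items()):
                frontier.append((pi + [name], {**s, **eff}))
    return None
```

For example, `pop=lambda f: f.pop()` treats the frontier as a stack and gives a depth-first behavior.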
Depth-First Search ● At each state, select the action that has the lowest heuristic-function value ● Visited is for cycle-checking
Ø If you come to a state you’ve already seen on the current path, then backtrack Ø In a finite domain, this guarantees termination
● Guaranteed to find a solution if one exists
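A recursive sketch under the same state-as-dict assumption, ordering successors by heuristic value and backtracking when a state repeats on the current path:

```python
# Depth-first forward search with cycle-checking on the current path.
def dfs(s, goal_test, actions, h, visited=()):
    if goal_test(s):
        return []
    succ = []
    for name, pre, eff in actions:
        if all(s.get(x) == v for x, v in pre.items()):
            succ.append((name, {**s, **eff}))
    succ.sort(key=lambda ns: h(ns[1]))            # lowest heuristic value first
    for name, s2 in succ:
        frozen = tuple(sorted(s2.items()))
        if frozen in visited:                     # already on this path:
            continue                              # backtrack
        plan = dfs(s2, goal_test, actions, h, visited + (frozen,))
        if plan is not None:
            return [name] + plan
    return None
```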
Greedy Search ● Like DFFS, but never backtracks
Ø Not guaranteed to find a solution
A*
● Requires a heuristic function in line 3
Ø Chooses a node (π,s) in Fringe having the smallest value for cost(π) + h(s)
Ø Expands the node (computes nodes for all applicable actions)
Ø Prunes nodes that can be shown to be no better than nodes already expanded
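A minimal sketch of this scheme with unit action costs; skipping any state already expanded with an equal or lower cost is one simple way to realize the pruning the slide mentions:

```python
import heapq
from itertools import count

# A* over (name, pre, eff)-style actions with unit costs.
def astar(s0, goal_test, actions, h):
    frozen = lambda s: tuple(sorted(s.items()))
    tie = count()                                 # heap tie-breaker
    fringe = [(h(s0), 0, next(tie), [], s0)]      # (f, g = cost(pi), tie, plan, state)
    expanded = {}                                 # best g seen per expanded state
    while fringe:
        f, g, _, pi, s = heapq.heappop(fringe)
        if goal_test(s):
            return pi
        if expanded.get(frozen(s), float("inf")) <= g:
            continue                              # prune: no better than before
        expanded[frozen(s)] = g
        for name, pre, eff in actions:            # expand the node
            if all(s.get(x) == v for x, v in pre.items()):
                s2 = {**s, **eff}
                heapq.heappush(fringe,
                               (g + 1 + h(s2), g + 1, next(tie), pi + [name], s2))
    return None
```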
[Figure: A* on a road map of Romania, with straight-line distances to Bucharest as the heuristic. Initially the fringe contains only Arad: 366 = 0 + 366]
[Figure: after expanding Arad, the fringe contains Sibiu 393 = 140 + 253, Timisoara 447 = 118 + 329, and Zerind 449 = 75 + 374]
[Figure: after expanding Sibiu, the fringe adds Rimnicu Vilcea 413 = 220 + 193, Fagaras 415 = 239 + 176, Arad 646 = 280 + 366, and Oradea 671 = 291 + 380]
[Figure: the search finishes with Bucharest 418 = 418 + 0; pruned or unexpanded alternatives include Bucharest 450 = 450 + 0, Craiova 526 = 366 + 160, Sibiu 553 = 300 + 253, Sibiu 591 = 338 + 253, Rimnicu Vilcea 607 = 414 + 193, and Craiova 615 = 455 + 160]
Properties of A*
● h is admissible if for every s, 0 ≤ h(s) ≤ h*(s)
Ø where h*(s) = least cost of getting from s to a state that satisfies g
● If h is admissible then A* is guaranteed to return an optimal solution
● Inadmissible heuristics might get you to a solution faster, but the solution won't necessarily be optimal
Depth-First Branch and Bound ● Depth-first search with a heuristic function and pruning
Ø π* and c*: least-cost solution found so far, and the cost of that solution Ø Any time you find a solution with lower cost, update π* and c* Ø Prune any plan π such that cost(π) + h(s) ≥ c*
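A sketch under the same assumptions as the earlier search sketches (unit costs, states as dicts):

```python
# Depth-first branch and bound: keep the best solution found so far
# (pi*, c*) and prune any node with cost(pi) + h(s) >= c*.
def dfbb(s0, goal_test, actions, h):
    best = {"plan": None, "cost": float("inf")}    # pi*, c*

    def search(s, pi, g, on_path):
        if g + h(s) >= best["cost"]:
            return                                 # prune
        if goal_test(s):
            best["plan"], best["cost"] = pi, g     # better solution found
            return
        for name, pre, eff in actions:
            if all(s.get(x) == v for x, v in pre.items()):
                s2 = {**s, **eff}
                key = tuple(sorted(s2.items()))
                if key not in on_path:             # cycle check on current path
                    search(s2, pi + [name], g + 1, on_path | {key})

    search(s0, [], 0, {tuple(sorted(s0.items()))})
    return best["plan"]
```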
Discussion ● If the state space is not too large, then A* or DFBB may be preferable
Ø They are guaranteed to return optimal solutions ● DFFS returns the first solution it finds
Ø can be arbitrarily far from optimal ● Greedy isn’t guaranteed to return a solution at all ● If S is very large, A* may require excessive memory, and both it and DFBB may
require excessive running time Ø In these cases DFFS and Greedy may be preferable
Example ● One rigid relation, adjacent ● State variables to represent each
location’s x and y coordinates: Ø x = {(loc0, 2), (loc1, 0),
Additive-Cost and Max-Cost Heuristics ● hadd(s) = Δadd(s,g), where
● Minimum cost of a “plan tree” to achieve g from s
Ø Pretend each element of g needs a completely separate plan • If an action achieves i elements of g, it’s included i times
Ø Pretend each of an action’s preconditions needs a completely separate plan Ø Thus, get a “plan” that’s a tree of actions
• Total cost can sometimes be much higher than h*(s) ● Can implement as a tree search, going backward from g
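The same values can also be computed by a forward fixed point over the costs of state-variable assignments, rather than the backward tree search described above. A sketch assuming unit action costs and (name, pre, eff) actions; hmax (introduced on a later slide) differs only in replacing sum with max:

```python
# Cost of an assignment: 0 if true in s, else min over achieving actions
# of (1 + combine of precondition costs); combine = sum gives h_add,
# combine = max gives h_max. Iterate until no cost improves.
def delta(s, g, actions, combine):
    INF = float("inf")
    cost = {(x, v): 0 for x, v in s.items()}       # true in s: cost 0
    changed = True
    while changed:
        changed = False
        for name, pre, eff in actions:
            c_pre = combine([cost.get((x, v), INF) for x, v in pre.items()] or [0])
            if c_pre == INF:
                continue
            for x, v in eff.items():
                if c_pre + 1 < cost.get((x, v), INF):
                    cost[(x, v)] = c_pre + 1
                    changed = True
    return combine([cost.get((x, v), INF) for x, v in g.items()] or [0])

def h_add(s, g, actions): return delta(s, g, actions, sum)
def h_max(s, g, actions): return delta(s, g, actions, max)
```

On the worked example that follows, this reproduces hadd(s1) = 2 and hmax(s1) = 1.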
Additive-Cost Heuristic (state s1)
[Figure: the plan tree for s1 with g = {loc(r1) = d3, loc(c1) = r1}; the branch for loc(r1) = d3 costs min(1, >1) = 1 via move(r1,d1,d3), and the branch for loc(c1) = r1 costs 1 via load(r1,c1,d1)]
hadd(s1) = Δadd(s1,g) = sum(1,1) = 2
Additive-Cost Heuristic (state s2)
[Figure: the plan tree for s2 with g = {loc(r1) = d3, loc(c1) = r1}; the branch for loc(r1) = d3 costs 1, and the branch for loc(c1) = r1 costs 2 via move(r1,d2,d1) followed by load(r1,c1,d1)]
hadd(s2) = Δadd(s2,g) = sum(1,2) = 3
Additive-Cost and Max-Cost Heuristics ● hmax(s) = Δmax(s,g), where
● Like hadd(s), but doesn’t add all of the costs in the plan tree
Ø Just the most costly path in the plan tree ● Guaranteed to be a lower bound on h*(s)
● Same tree search as for hadd(s)
Max-Cost Heuristic (state s1)
[Figure: the same plan tree for s1, but combining costs with max instead of sum]
hmax(s1) = Δmax(s1,g) = max(1,1) = 1
Max-Cost Heuristic (state s2)
[Figure: the same plan tree for s2, but combining costs with max instead of sum]
hmax(s2) = Δmax(s2,g) = max(1,2) = 2
Delete-Relaxation Heuristics ● Suppose a state s includes an assignment x = v ● Suppose an action a has an effect x ← w ● Then γ+(s, a) includes both x = v and x = w
● Relaxed state (or r-state) Ø any set ŝ of state-variable values Ø may include more than one value for each state variable
● ŝ r-satisfies a goal g if ŝ contains a subset that satisfies g ● An action a is r-applicable in ŝ if ŝ r-satisfies a’s preconditions
Ø In this case, γ+(ŝ,a) = ŝ ∪ γ(s,a) ● π = ⟨a1, …, an⟩ is r-applicable in ŝ0 if there are r-states ŝ1, ŝ2, …, ŝn such that
• a1 is r-applicable in ŝ0 and γ+(ŝ0,a1) = ŝ1 • a2 is r-applicable in ŝ1 and γ+(ŝ1,a2) = ŝ2 • …
Ø In this case, γ+(ŝ0,π) = ŝn ● π is a relaxed solution for P = (Σ, s0, g) if γ+(s0, π) r-satisfies g
The name comes from classical planning: apply an action's add effects but “don't do the deletions”
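A sketch of r-states and γ+, encoding an r-state as a mapping from each variable to a set of values (one convenient encoding, not the book's notation); actions are (pre, eff) dict pairs as in the earlier sketches:

```python
# Delete relaxation: applying an action adds new values to the r-state
# without deleting old ones.
def r_satisfies(rs, cond):
    """rs r-satisfies cond if every required assignment appears in rs."""
    return all(v in rs.get(x, set()) for x, v in cond.items())

def gamma_plus(rs, action):
    pre, eff = action
    assert r_satisfies(rs, pre)                   # action must be r-applicable
    rs2 = {x: set(vals) for x, vals in rs.items()}
    for x, v in eff.items():
        rs2.setdefault(x, set()).add(v)           # keep the old values too
    return rs2

def r_apply_plan(s0, pi):
    """Lift the ordinary state s0 to an r-state, then apply pi with gamma_plus."""
    rs = {x: {v} for x, v in s0.items()}
    for a in pi:
        rs = gamma_plus(rs, a)
    return rs
```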
● Computation of RPG(s2,g) Ø First line of RPG assigns ŝ0 = s2 , A0 = ∅
● RPG(s2,g) returns 3
Landmark Heuristics ● Let P = (Σ,s0,g) be a planning problem ● Let φ = φ1 ∨ ... ∨ φm be a disjunction of state-variable assignments ● Definition: φ is a landmark for P if φ is true at some point in every solution plan of P
● Example landmarks, for the problem with g = {loc(r1) = d3, loc(c1) = r1}:
Ø A complete state: s0
Ø A single state-variable assignment: loc(r1) = d1
Ø A disjunction of state-variable assignments: loc(r1) = d1 ∨ loc(r1) = d2
[Figure: the initial state s0 and a goal state satisfying g]
Why are Landmarks Useful? ● Help in breaking down the given problem into smaller subproblems
● Every solution to P has to achieve these landmarks ● Possible strategy:
Ø find a plan that takes us from s0 to any state s1 that satisfies lm1 Ø find a plan that takes us from s1 to any state s2 that satisfies lm2 Ø …
[Figure: a path from s0 to g passing through landmarks lm1, lm2, lm3, splitting the problem into subproblems P1, P2, P3, P4]
Computing Landmarks ● Question: How do we compute landmarks for a problem P? ● Not easy:
Ø Deciding whether a state-variable assignment φ is a landmark is PSPACE-hard in the worst case
Ø To put it in perspective: that's as hard as solving the planning problem itself!
Ø There are often useful landmarks that can be found more easily Ø There are polynomial-time procedures that can compute these landmarks Ø Going to see one such procedure based on Relaxed Planning Graphs
• Computing landmarks for relaxed planning problems is easier Ø A landmark for a relaxed planning problem is a landmark for the original planning problem as well
RPG-based Landmark Computation ● Main intuition: if a state-variable assignment φ is a landmark, then a disjunction over the preconditions of the actions that achieve φ is also a landmark Ø In other words: we're going to start from known landmarks and discover new
ones by looking at preconditions of actions that achieve these landmarks ● Example:
Ø g is the goal Ø Actions a1 and a2 can achieve g Ø Therefore, either p1∧q or p2∧q must be true to be able to achieve g Ø In other words: (p1∧q)∨(p2∧q) is a landmark Ø By rearranging terms: we get (p1∧q)∨(p2∧q) ≅ q∧(p1∨p2) Ø Since q∧(p1∨p2) is a landmark, both q and (p1∨p2) are landmarks
● In practice, we try to rearrange assignments in a similar manner to group terms with the same state variable together
[Figure: actions a1 (preconditions p1, q) and a2 (preconditions p2, q) both achieve g]
RPG-based Landmark Computation ● Question: What landmarks can we start with?
Ø Every goal is trivially a landmark; can start from there ● E.g. loc(r1) = d3 is a landmark
Ø Two actions achieve this landmark: move(r1, d1, d3) and move(r1, d2, d3)
Ø Can infer a new landmark that is the disjunction of the preconditions of these two actions: φ′ = (loc(r1) = d1) ∨ (loc(r1) = d2)
action move(r, d, e) pre: loc(r) = d eff: loc(r) ← e
[Figure: the initial state s0 and the goal g = {loc(r1) = d3, loc(c1) = r1}]
RPG-based Landmark Computation
ComputeLandmark(s0, g = g1 ∧ g2 ∧ … ∧ gk)
● queue = {g1, g2,…, gk}
● While queue is not empty:
Ø Remove a gi from queue
Ø Ai = {all actions that can achieve gi}
Ø Compute all assignments that can be r-produced starting from s0 without using Ai, thus generating the RPG foo
Ø act(gi) = {all actions in Ai that are r-applicable in the r-state resulting from foo}
Ø For each action in act(gi):
• Pick a precondition not satisfied in s0 and add it to φ
Ø The resulting disjunction φ is a landmark; add it to queue
[Figure: actions a1, a2, a3 with preconditions (p1, q1), (p2, q2), (p3, q3) all achieve gi]
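A simplified sketch of one round of this procedure: it back-chains from a goal assignment to the preconditions of its achievers, but omits the RPG construction that restricts achievers to act(gi). Actions are (name, pre, eff) dicts, an assumption of this sketch:

```python
# One simplified landmark-discovery step: every plan must apply some
# action that achieves g_i, so a disjunction with one precondition from
# each achiever (preferring preconditions unsatisfied in s0) is a
# landmark for the relaxed problem.
def landmark_step(s0, gi, actions):
    x, v = gi
    phi = set()
    for name, pre, eff in actions:
        if eff.get(x) == v:                        # this action achieves g_i
            unsat = [(y, w) for y, w in pre.items() if s0.get(y) != w]
            cand = unsat or sorted(pre.items())    # prefer unsatisfied ones
            phi.add(cand[0])
    return phi                                     # phi's disjunction is a landmark
```

On the slides' example, back-chaining from loc(r1) = d3 recovers the landmark loc(r1) = d1 ∨ loc(r1) = d2.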
[Figure: picking one unsatisfied precondition from each action in act(gi) yields the new landmark p3 ∨ q1]
RPG-based Landmark Computation (worked example)
[Figure: the initial state s0 and the goal g = {loc(r1) = d3, loc(c1) = r1}; the queue initially contains the goal assignments loc(r1) = d3 and loc(c1) = r1]
[Figure: an assignment that is true in the current state is always going to be true in the RPG, so there is no use expanding it]
Expanding loc(c1) = r1: the load operator has
pre: loaded(r) = nil, loc(c) = l, loc(r) = l
eff: loaded(r) ← c, loc(c) ← r
Only load(r1,c1,d1) is r-applicable in the final level of the RPG; its preconditions are used to generate new landmarks
● From the preconditions of load(r1,c1,d1), the new landmarks loaded(r1) = nil, loc(c1) = d1, and loc(r1) = d1 are added to the queue
Landmark Heuristic ● Every solution to the problem needs to achieve all the computed landmarks ● One possible heuristic to estimate the distance from state s to g:
Ø Number of landmarks required to be accomplished from s Ø Planner biased towards actions that achieve landmarks
● Is this heuristic admissible?
● No: if a single action a achieves both goals from s0, the number of landmarks is |{g1, g2}| = 2 but the optimal plan length is |⟨a⟩| = 1, so the count can overestimate
● A number of more advanced landmark-based heuristics have been developed (including admissible ones)
Ø Check the textbook for references