REPLANNING IN DOMAINS WITH PARTIAL INFORMATION AND SENSING ACTIONS
Guy Shani, Ronen Brafman
Ben-Gurion University
Problem | Background | SDR | Results

Dec 21, 2015


Online Planning under Uncertainty with Partial Observability and Sensing
• Deterministic actions
• Concrete goal condition
• Uncertainty about the initial state
  • Non-stochastic model – states are either possible or impossible
• Sensing actions provide information about the world
• We can generate a conditional plan
• Online planning
  • We do not plan for all contingencies ahead of time – just until the next observation


Examples
• Toy problems from CLG [Albore et al.]
• Doors
  • Gate location unknown
  • Start state: (oneof (door-at 2,1) … (door-at 2,5)) (oneof (door-at 4,1) … (door-at 4,5))
• Wumpus
  • Monster location unknown
  • Must correlate observations from multiple locations
  • Start state: (oneof (wumpus-at 2,3) (wumpus-at 3,2)) (oneof (wumpus-at 3,4) (wumpus-at 4,3))
• Localize
  • Agent location unknown
  • Must reason from history
  • Start state: (oneof (at 1,1) … (at 5,1) (at 1,2) (at 5,2) … (at 5,1) … (at 5,5))


Why work on this problem?
• Uncertainty, partial observability
  • no need to motivate
• Study the challenge of planning to sense/learn
  • Many POMDP methods cope poorly with information gathering sub-plans that do not provide rewards
  • We study this in a slightly simpler setting obtained by:
    • Simpler form of uncertainty: non-stochastic, deterministic actions
    • Structured actions and state (à la STRIPS)
• Extend existing techniques that focus on contingent planning with full observability


Our contributions

• Extending replanning techniques to handle this case

• A lazy technique for (not) maintaining the belief state


Replanning (basic idea)
• Generate a simpler classical problem
  • e.g. reduce initial state uncertainty by choosing one state
• Plan for the reduced problem
• Execute the plan until things break
  • e.g. an observation doesn’t agree with the selected state
• Pros: very simple, fast, and often effective
• Cons: a greedy approach with the regular drawbacks
  • Simplistic classical model can lead to poor choices
  • Can get caught in dead-ends
  • Smart sampling may reduce these problems
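The basic replanning loop can be sketched on a toy, Doors-like corridor. This is only an illustration under assumptions: the corridor, the action names, and the helpers `plan_for` and `replan_loop` are hypothetical and are not taken from the talk. The door position is hidden, the agent picks one candidate state, plans as if it were true, and replans as soon as a sensed value contradicts it.

```python
import random


def plan_for(assumed_door, position):
    """Classical plan for one fully known state: walk to the door, check it, pass."""
    direction = "right" if assumed_door >= position else "left"
    return [direction] * abs(assumed_door - position) + ["sense-door", "pass"]


def replan_loop(true_door, candidates, rng):
    """Replan until the door is found; returns (executed actions, number of plans)."""
    possible = set(candidates)  # states still consistent with all observations
    position, executed, replans = 0, [], 0
    while True:
        # Reduce uncertainty by choosing a single possible state.
        assumed = rng.choice(sorted(possible))
        replans += 1
        for action in plan_for(assumed, position):
            executed.append(action)
            if action == "right":
                position += 1
            elif action == "left":
                position -= 1
            elif action == "sense-door":
                if position != true_door:
                    # Observation disagrees with the selected state: the plan broke.
                    possible.discard(position)
                    break
                possible = {position}
            elif action == "pass":
                return executed, replans


executed, replans = replan_loop(true_door=3, candidates=range(5), rng=random.Random(0))
```

Each failed attempt removes one candidate, so the loop ends after at most as many plans as there are candidates. A greedy choice of the assumed state can still produce long walks back and forth, which is the kind of drawback listed under Cons.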


Replanning with PO and Sensing – Take 1

• Determinize the problem by determinizing the current state
• Plan for this initial state only
• Execute until observations conflict with deterministic model
• Replan!

Problem: The planner will make no effort to sense; it plans as if it knows everything. We need a more sophisticated model that captures the agent’s belief state.


Solution: Use Palacios and Geffner’s Translation-based Approach
• Explicitly represent the agent’s knowledge
  • Knowledge predicates replace regular predicates
  • Kp = Know that p is true
  • Must ground knowledge on some initial features
• A short tutorial with zero details.


Translation to Classical Planning

• Maintain predicate values given an initial state
  • i.e. we know that p is true given that si was the initial state and false if sj was the initial state.
• Kp means that we know that p is true in all valid states
• Revise actions:
  • An effect is transformed into [formula missing from the transcript]
  • Precondition p is transformed to Kp, i.e. an action can be applied only if its preconditions hold in all valid states

Example, with s0 a possible initial state:
• K(wumpus-at p-2,3)|s0
• K(not (wumpus-at p-3,2))|s0
• K(stench-at p-2,4)|s0
• K(stench-at p-2,2)


Translation to Classical Planning

• Sensing actions reveal an unknown predicate p, and hence have effect Kp or K¬p
• Actions to eliminate states from the belief
  • If [condition missing from the transcript] then we know that s was not the initial state; effect [formula missing from the transcript]
• Warning! Many details are missing…
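To make the meaning of Kp and of state elimination concrete, here is a small semantic sketch over an explicit set of states. It is not the compiled translation itself: the functions `knows`, `knows_not`, and `sense` are hypothetical names, and the contents of the second state `s1` are assumed for illustration only.

```python
def knows(belief, p):
    """Kp: p is true in every state that is still considered possible."""
    return all(p in state for state in belief)


def knows_not(belief, p):
    """K(not p): p is false in every state that is still considered possible."""
    return all(p not in state for state in belief)


def sense(belief, p, observed_true):
    """Eliminate every state that disagrees with the observed value of p."""
    return {state for state in belief if (p in state) == observed_true}


# Two candidate initial states; s1 is an assumed example, not from the slides.
s0 = frozenset({"wumpus-at p-2,3", "stench-at p-2,2", "stench-at p-2,4"})
s1 = frozenset({"wumpus-at p-3,2", "stench-at p-2,2", "stench-at p-4,2"})
belief = {s0, s1}

known_before = knows(belief, "wumpus-at p-2,3")   # both states still possible
belief_after = sense(belief, "stench-at p-2,4", observed_true=True)
known_after = knows(belief_after, "wumpus-at p-2,3")
```

After sensing, either `knows` or `knows_not` holds for the sensed predicate, which matches the two possible effects of a sensing action. Facts shared by all states, such as the stench at p-2,2 here, are known without any sensing.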


Replanning with PO and Sensing – Take 2

• Use knowledge domain translation
• Feed translation into classical planner
• Execute plan until things break
  • E.g. observation is inconsistent with expectations
• Replan!
• Still missing…
  • What happens when sensing actions are executed in the knowledge domain?
  • Translation size is often huge!


Replanning with PO and Sensing – Take 2
• Problem 1: Sensing actions translate into non-deterministic actions
• Solution: Determinize sensing by choosing an initial state s0. All observations will be consistent with this state
  • Sensing actions have conditional (deterministic) effects
• Planner must KNOW preconditions of actions and goals
  • It must use explicit sensing actions

Action         Precondition  Effect
Move-right     Free-right    (at 1,1) -> (at 2,1)
K-Move-right   Kfree-right   (Kat 1,1) -> (Kat 2,1)
K-Sense-right  -             Free-right -> Kfree-right


Replanning with PO and Sensing – Take 2
• Problem: Translation is often huge
  • Given N initial states, there is a predicate copy for each initial state (N copies).
  • Each condition in every action is copied 2N times.
  • Actions to eliminate every initial state on every predicate.
• Solution: sample a small number of possible initial states

Localize 5X5
             Original  N=19  N=2
Predicates   24        963   162
Actions      9         1201  130
Conditions   59        2309  421
Start state  -         533   85

To summarize:
1. Sample subset S out of the possible current states
   • To reduce the translation size
2. Sample s0 from S
   • Base observations on s0
3. Generate knowledge domain translation, given S and s0
4. Solve using a classical planner
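Steps 1 and 2 of this summary can be sketched as follows. This assumes the possible states can be enumerated explicitly and sampled uniformly, which is a simplification made only for the example; the function names and the 25-cell grid are hypothetical. Steps 3 and 4 (translation and classical planning) are outside the scope of the sketch.

```python
import random


def sample_states(belief, n, rng):
    """Step 1: sample a subset S of at most n possible current states."""
    pool = sorted(belief)
    return rng.sample(pool, min(n, len(pool)))


def choose_s0(sample, rng):
    """Step 2: choose the state s0 on which expected observations are based."""
    return rng.choice(sample)


rng = random.Random(0)
# Hypothetical belief: the agent may be in any cell of a 5x5 grid.
belief = [(x, y) for x in range(1, 6) for y in range(1, 6)]
S = sample_states(belief, 2, rng)
s0 = choose_s0(S, rng)
```

Choosing s0 from S keeps the two choices consistent: the state that determines the predicted observations is always one of the states the translation reasons about.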


Still missing … Belief Maintenance

• Must recognize if the goal was reached
• Must recognize if the preconditions of the next action are guaranteed to be true
• Requires maintaining information about the current belief state (set of valid states)
  • This issue is orthogonal to how we generate the plan


Belief Maintenance through Regression
• A (very) lazy approach
  • Maintain b0 as a formula
  • Maintain history a1,o1,…,at,ot
• Cons: must regenerate formula on every query
• Pros: generated formula is focused only on the current query and remains small

Check whether ct holds at bt:
1. Regress through at,ot
2. Regress through at-1,ot-1
3. … down to a1,o1, resulting in a formula over the initial state
4. Solve the SAT problem. If there is no satisfying assignment then ct holds at bt
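The regression check can be illustrated with a small semantic sketch. It makes two loud assumptions: conditions are Python predicates rather than formulas, and brute-force enumeration of b0 stands in for the SAT call. The corridor example and every name below are hypothetical. The idea is to regress the negated query back through the history and look for an initial state that satisfies it; if none exists, the condition is guaranteed.

```python
def regress(condition, action, observation):
    """Condition on the state before `action` that holds exactly when
    `observation` and `condition` both hold on the state after it."""
    return lambda s: observation(action(s)) and condition(action(s))


def holds_in_belief(condition, history, initial_states):
    """True if `condition` is guaranteed after `history`, starting from b0."""
    def query(s):
        return not condition(s)  # search for a counterexample
    for action, observation in reversed(history):  # a_t,o_t first, a_1,o_1 last
        query = regress(query, action, observation)
    # Enumeration replaces the SAT solver: no satisfying state means it holds.
    return not any(query(s) for s in initial_states)


# Toy corridor: a state is (agent position, door position); the door is unknown.
def move_right(s):
    return (min(s[0] + 1, 2), s[1])


def saw_no_door(s):
    return s[0] != s[1]


def door_at_0(s):
    return s[1] == 0


b0 = [(0, door) for door in range(3)]
one_step = [(move_right, saw_no_door)]
two_steps = [(move_right, saw_no_door), (move_right, saw_no_door)]
unsure = holds_in_belief(door_at_0, one_step, b0)
certain = holds_in_belief(door_at_0, two_steps, b0)
```

Nothing is stored between queries except b0 and the history, which is the lazy part: each call rebuilds only the condition needed for the current question.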


Sample, Determinize, Replan - SDR

PLAN
• Select S and s0
• Translate to classical planning
• Run classical planner (FF)

EXECUTE
• Regress goal (solved using MiniSat)
  • Goal achieved! Terminate
• Regress action precondition
• Execute action
• Check observation consistency


CLG vs. SDR
• CLG translation generates non-deterministic effects for observation actions
  • In offline mode all possibilities are checked
  • In online mode the environment is queried (as we do)
  • Uses a specialized semi-classical planner (FF variant) – SDR can use any black box planner (experiments use FF)
• CLG uses tags
  • In many (most) cases more efficient than complete states.
  • Complete translation still blows up rapidly.


Results

                 SDR                CLG
Domain           #Actions  Time     #Actions  Time
Wumpus15         92.5      42       103       240.7
Wumpus20         115.9     156.1    160       1224.8
doors13          177.1     25.1     111.8     264.5
doors17          306.8     96.9     PF        X
localize15       56.5      35.6     PF        X
localize17       71.7      75       PF        X
colorballs-9-3   660.2     209.1    227.8     707.1
colorballs-9-7   1343.9    693.3    TF        X
medpks150        88.8      268      CSU       X
medpks199        89.9      502.9    PF        X


Summary
• SDR – Contingent Replanner under partial observability
  • Sample a set of possible states from the current belief.
  • Create a classical planning translation.
  • Execute plan until sample proven invalid or goal was reached.
• SDR shown to be faster and scale up to larger domains than CLG (state-of-the-art)


Future Work
• Sensing costs
  • Sensing can have a cost (e.g. sensor warmup)
  • Should have a tradeoff between sensing and acting
  • Remove sensed preconditions – agent should decide whether it wants to sense or not
• Dead-ends – well known pitfall of replanning algorithms
• Smarter sampling techniques
• Scaling up – currently not much better than POMDPs!


Thank you