
Decentralized Decision Making in Partially Observable, Uncertain Worlds

Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst

Joint work with Martin Allen, Christopher Amato, Daniel Bernstein, Alan Carlin, Claudia Goldman, Eric Hansen, Akshat Kumar, Marek Petrik, Sven Seuken, Feng Wu, and Xiaojian Wu

IJCAI’11 Workshop on Decision Making in Partially Observable, Uncertain Worlds Barcelona, Spain July 18, 2011

Page 2

Decentralized Decision Making

- Challenge: How to achieve intelligent coordination of a group of decision makers in spite of stochasticity and partial observability?
- Key objective: Develop effective decision-theoretic methods to address the uncertainty about the domain, the outcomes of actions, and the knowledge, beliefs, and intentions of the other agents.

Page 3

Problem Characteristics

- A group of decision makers or agents interact in a stochastic environment
- Each "episode" involves a sequence of decisions over a finite or infinite horizon
- The change in the environment is determined stochastically by the current state and the set of actions taken by the agents
- Each decision maker obtains different partial observations of the overall situation
- Decision makers have the same objectives

Page 4

Applications

- Autonomous rovers for space exploration
- Protocol design for multi-access broadcast channels
- Coordination of mobile robots
- Decentralized detection and tracking
- Decentralized detection of hazardous weather events

Page 5

Outline

- Models for decentralized decision making
- Complexity results
- Solving finite-horizon DEC-POMDPs
- Solving infinite-horizon DEC-POMDPs
- Scalability beyond two agents
- Conclusion

Page 6

Decentralized POMDP

- Generalization of POMDP involving multiple cooperating decision makers with different observation functions

[Figure: agents 1 and 2 each select an action (a1, a2) and receive a private observation (o1, o2) from the world, which returns a single joint reward r]

Page 7

DEC-POMDPs

- A DEC-POMDP is defined by a tuple 〈S, A1, A2, P, R, Ω1, Ω2, O〉, where
  - S is a finite set of domain states, with initial state s0
  - A1, A2 are finite action sets
  - P(s, a1, a2, s') is a state transition function
  - R(s, a1, a2) is a reward function
  - Ω1, Ω2 are finite observation sets
  - O(a1, a2, s', o1, o2) is an observation function
- Straightforward generalization to n agents
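The tuple above can be written down directly as a small data structure. A minimal Python sketch of a two-agent DEC-POMDP, with an invented two-state instance (all names and dynamics here are illustrative, not a benchmark from the talk):

```python
from typing import Dict, NamedTuple, Tuple

class DecPOMDP(NamedTuple):
    """Two-agent DEC-POMDP <S, A1, A2, P, R, Omega1, Omega2, O>."""
    S: Tuple[str, ...]                  # finite set of domain states
    s0: str                             # initial state
    A1: Tuple[str, ...]                 # agent 1's actions
    A2: Tuple[str, ...]                 # agent 2's actions
    P: Dict[tuple, Dict[str, float]]    # P(s' | s, a1, a2)
    R: Dict[tuple, float]               # R(s, a1, a2)
    O1: Tuple[str, ...]                 # agent 1's observations
    O2: Tuple[str, ...]                 # agent 2's observations
    O: Dict[tuple, Dict[tuple, float]]  # O(o1, o2 | a1, a2, s')

# Illustrative instance: matching actions flip the state, and each agent
# observes the (noise-free) state label.
S, A = ("s0", "s1"), ("a", "b")
flip = {"s0": "s1", "s1": "s0"}
P = {(s, a1, a2): {(flip[s] if a1 == a2 else s): 1.0}
     for s in S for a1 in A for a2 in A}
R = {(s, a1, a2): float(s == "s1" and a1 == a2)
     for s in S for a1 in A for a2 in A}
O = {(a1, a2, s2): {(s2, s2): 1.0} for a1 in A for a2 in A for s2 in S}
model = DecPOMDP(S, "s0", A, A, P, R, S, S, O)

# Sanity check: every transition and observation distribution sums to 1.
assert all(abs(sum(d.values()) - 1.0) < 1e-9
           for d in list(P.values()) + list(O.values()))
```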

Page 8

Formal Models

Page 9

Example: Mobile Robot Planning

- States: grid cell pairs
- Actions: ↑, ↓, ←, →
- Transitions: noisy
- Goal: meet quickly
- Observations: red lines

Page 10

Example: Cooperative Box-Pushing

Goal: push as many boxes as possible to the goal area; the larger box yields a higher reward, but requires two agents to move it.

Page 11

Solving DEC-POMDPs

- Each agent's behavior is described by a local policy δi
- A policy can be represented as a mapping from
  - local observation sequences to actions; or
  - local memory states to actions
- Actions can be selected deterministically or stochastically
- The goal is to maximize expected reward over a finite horizon or a discounted infinite horizon
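A finite-horizon local policy in the first representation is just a lookup table from local observation sequences to actions. A toy sketch (the tiger-style observation and action names are invented for illustration):

```python
# Deterministic local policy delta_i: local observation sequences -> actions.
policy_i = {
    (): "listen",                              # first step, nothing observed yet
    ("hear-left",): "listen",
    ("hear-right",): "listen",
    ("hear-left", "hear-left"): "open-right",  # act only on consistent evidence
    ("hear-left", "hear-right"): "listen",
    ("hear-right", "hear-left"): "listen",
    ("hear-right", "hear-right"): "open-left",
}

def act(history):
    """Action prescribed by the local policy for an observation sequence."""
    return policy_i[tuple(history)]
```

The table grows with the number of observation sequences, which is why the memory-state (finite-state controller) representation discussed later matters.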

Page 12

Work on Decentralized Decision Making and DEC-POMDPs

- Team theory [Marschak 55, Tsitsiklis & Papadimitriou 82]
- Incorporating dynamics [Witsenhausen 71]
- Communication strategies [Varaiya & Walrand 78, Xuan et al. 01, Pynadath & Tambe 02]
- Approximation algorithms [Peshkin et al. 00, Guestrin et al. 01, Nair et al. 03, Emery-Montemerlo et al. 04]
- First exact DP algorithm [Hansen et al. 04]
- First policy iteration algorithm [Bernstein et al. 05]
- Many recent exact and approximate DEC-POMDP algorithms

Page 13

Some Fundamental Questions

- Are DEC-POMDPs significantly harder to solve than POMDPs? Why?
- What features of the problem domain affect the complexity, and how?
- Is optimal dynamic programming possible?
- Can dynamic programming be made practical?
- Is it beneficial to treat communication as a separate type of action?
- How can we exploit the locality of agent interaction to develop more scalable algorithms?

Page 14

Outline

- Models for decentralized decision making
- Complexity results
- Solving finite-horizon DEC-POMDPs
- Solving infinite-horizon DEC-POMDPs
- Scalability beyond two agents
- Conclusion

Page 15

Previous Complexity Results

Finite Horizon:
- MDP: P-complete (if T < |S|) [Papadimitriou & Tsitsiklis 87]
- POMDP: PSPACE-complete (if T < |S|) [Papadimitriou & Tsitsiklis 87]

Infinite-Horizon Discounted:
- MDP: P-complete [Papadimitriou & Tsitsiklis 87]
- POMDP: Undecidable [Madani et al. 99]

Page 16

How Hard are DEC-POMDPs?
Bernstein, Givan, Immerman & Zilberstein, UAI 2000, MOR 2002

- The complexity of finite-horizon DEC-POMDPs had been hard to establish
- A static version of the problem, where a single set of decisions is made in response to a single set of observations, was shown to be NP-hard [Tsitsiklis and Athans, 1985]
- We proved that two-agent finite-horizon DEC-POMDPs are NEXP-hard
- But these are worst-case results! Are real-world problems easier?

Page 17

What Features of the Domain Affect the Complexity and How?

- Factored state spaces (structured domains)
- Independent transitions (IT)
- Independent observations (IO)
- Structured reward function (SR)
- Goal-oriented objectives (GO)
- Degree of observability (partial, full, jointly full)
- Degree and structure of interaction
- Degree of information sharing and communication

Page 18

Complexity of Sub-Classes
Goldman & Zilberstein, JAIR 2004

[Figure: complexity of finite-horizon sub-classes. The general DEC-MDP is NEXP-complete; with independent observations and transitions (IO & IT) it drops to NP-complete; goal-oriented variants range from NEXP-complete and NP-complete down to P-complete (e.g., |G| = 1 under certain conditions, or with information sharing)]

Page 19

Outline

- Models for decentralized decision making
- Complexity results
- Solving finite-horizon DEC-POMDPs
- Solving infinite-horizon DEC-POMDPs
- Scalability beyond two agents
- Conclusion

Page 20

JESP: First DP Algorithm
Nair, Tambe, Yokoo, Pynadath & Marsella, IJCAI 2003

- JESP: Joint Equilibrium-based Search for Policies
- Complexity: exponential
- Result: only locally optimal solutions

Page 21

Is Exact DP Possible?

- The key to solving POMDPs is that they can be viewed as belief-state MDPs [Smallwood & Sondik 73]
- It is not as clear how to define a belief-state MDP for a DEC-POMDP
- The first exact DP algorithm for finite-horizon DEC-POMDPs uses the notion of a generalized belief state
- The algorithm also applies to competitive situations modeled as POSGs

Page 22

Generalized Belief State

A generalized belief state captures the uncertainty of one agent with respect to the state of the world as well as the policies of the other agents.
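In the two-agent case, agent 1's generalized belief is a joint distribution over (world state, agent 2's policy/controller node). A hedged sketch of one Bayes update after agent 1 takes action a1 and observes o1, assuming agent 2 follows a known stochastic controller (the toy model and every name below are invented for illustration):

```python
def update_generalized_belief(b, a1, o1, P, O, ctrl2):
    """One Bayes step of b(s, q2) after agent 1 acts a1 and observes o1.
    P[(s, a1, a2)]   -> {s': prob},  O[(a1, a2, s')] -> {(o1, o2): prob},
    ctrl2['act'][q2] -> {a2: prob},  ctrl2['next'][(q2, a2, o2)] -> {q2': prob}."""
    nb = {}
    for (s, q2), p in b.items():
        for a2, pa in ctrl2["act"][q2].items():
            for s2, ps in P[(s, a1, a2)].items():
                for (oo1, o2), po in O[(a1, a2, s2)].items():
                    if oo1 != o1:
                        continue  # inconsistent with agent 1's observation
                    for q2n, pq in ctrl2["next"][(q2, a2, o2)].items():
                        nb[(s2, q2n)] = nb.get((s2, q2n), 0.0) + p * pa * ps * po * pq
    z = sum(nb.values())  # probability of observing o1; normalize by it
    return {k: v / z for k, v in nb.items()}

# Toy: one action per agent, the state flips deterministically, and each
# agent observes the state label; agent 2 has a single controller node.
P = {(s, "a", "a"): {("s1" if s == "s0" else "s0"): 1.0} for s in ("s0", "s1")}
O = {("a", "a", s2): {(s2, s2): 1.0} for s2 in ("s0", "s1")}
ctrl2 = {"act": {"q": {"a": 1.0}},
         "next": {("q", "a", o): {"q": 1.0} for o in ("s0", "s1")}}
b0 = {("s0", "q"): 0.5, ("s1", "q"): 0.5}
b1 = update_generalized_belief(b0, "a", "s1", P, O, ctrl2)
# noise-free observation pins the state: b1 == {("s1", "q"): 1.0}
```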

Page 23

Strategy Elimination

- Any finite-horizon DEC-POMDP can be converted to a normal form game
- But the number of strategies is doubly exponential in the horizon length!

[Table: the m × n normal form game, with a joint payoff entry (R_ij^1, R_ij^2) for each pair of agent strategies]
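The doubly exponential count is easy to reproduce: a horizon-T policy tree carries one action label per node of a complete |Oi|-ary tree with T levels. A quick sketch:

```python
def num_policy_trees(n_actions: int, n_obs: int, horizon: int) -> int:
    """Number of depth-`horizon` policy trees for one agent: an action label
    at each node of a complete n_obs-ary tree with `horizon` levels."""
    nodes = sum(n_obs ** t for t in range(horizon))  # 1 + |O| + ... + |O|^(T-1)
    return n_actions ** nodes

sizes = [num_policy_trees(2, 2, T) for T in range(1, 5)]
# -> [2, 8, 128, 32768]: doubly exponential in the horizon
```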

Page 24

A Better Way to Do Elimination
Hansen, Bernstein & Zilberstein, AAAI 2004

- We can use dynamic programming to eliminate dominated strategies without first converting to normal form
- Pruning a subtree eliminates the set of trees containing it

[Figure: pruning one depth-2 policy subtree eliminates every larger policy tree that contains it]

Page 25

First Exact DP for DEC-POMDPs
Hansen, Bernstein & Zilberstein, AAAI 2004

- Theorem: DP performs iterated elimination of dominated strategies in the normal form of the POSG
- Corollary: DP can be used to find an optimal joint policy in a DEC-POMDP
- The algorithm is complete and optimal
- Complexity is doubly exponential

Page 26

Alternative: Heuristic Search
Szer, Charpillet & Zilberstein, UAI 2005

- Perform forward best-first search in the space of joint policies
- Take advantage of a known start state distribution
- Take advantage of domain-independent heuristics for pruning

Page 27

The MAA* Algorithm
Szer, Charpillet & Zilberstein, UAI 2005

- MAA* is complete and optimal
- Main advantage: significant reduction in memory requirements over the dynamic programming approach

Page 28

Scaling Up Heuristic Search
Spaan, Oliehoek, and Amato, IJCAI 2011

- Problem with MAA*: the number of children of a node is doubly exponential in the node's depth
- Basic idea: avoid the full expansion of each node by incrementally generating children only when a child might have a higher heuristic value
- Introduce a more memory-efficient representation for heuristic functions
- Yields a speedup over the state of the art, allowing optimal solutions over longer horizons

Page 29

Scaling Up Heuristic Search
Spaan, Oliehoek, and Amato, IJCAI 2011

Page 30

Memory-Bounded DP (MBDP)
Seuken & Zilberstein, IJCAI 2007

- Combining the two approaches:
  - The DP algorithm is a bottom-up approach
  - The search operates top-down
- The DP step can only eliminate a policy tree if it is dominated for every belief state
- But only a small subset of the belief space is actually reachable
- Furthermore, the combined approach allows the algorithm to focus on a small subset of joint policies that appear best
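Only the selection step of MBDP is sketched here: given candidate policy trees (represented solely by hypothetical value vectors over states) and heuristically generated belief points, keep the best tree per belief point, up to maxTrees trees. The bottom-up backup and the belief heuristics themselves are elided:

```python
def select_trees(tree_values, belief_points, max_trees):
    """MBDP-style selection: keep the best candidate tree at each
    heuristically generated belief point, capped at max_trees trees.
    tree_values[t] is t's value vector over states (made-up numbers)."""
    kept = []
    for b in belief_points:
        # expected value of each candidate tree at belief b
        best = max(tree_values,
                   key=lambda t: sum(p * v for p, v in zip(b, tree_values[t])))
        if best not in kept:
            kept.append(best)
        if len(kept) == max_trees:
            break
    return kept

trees = {"t1": [9.0, 0.0], "t2": [0.0, 9.0], "t3": [5.0, 5.0]}
beliefs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
assert select_trees(trees, beliefs, max_trees=2) == ["t1", "t2"]
```

Because only max_trees trees per agent survive each horizon step, the stored policy is linear in the horizon rather than exponential.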

Page 31

Memory-Bounded DP Cont.

Page 32

The MBDP Algorithm

Page 33

Generating “Good” Belief States

- MDP heuristic: obtained by solving the corresponding fully observable multiagent MDP
- Infinite-horizon heuristic: obtained by solving the corresponding infinite-horizon DEC-POMDP
- Random policy heuristic: can augment another heuristic by adding random exploration
- Heuristic portfolio: maintain a set of belief states generated by a set of different heuristics
- Recursive MBDP

Page 34

Performance of MBDP

Page 35

MBDP Successors

- Improved MBDP (IMBDP) [Seuken and Zilberstein, UAI 2007]
- MBDP with Observation Compression (MBDP-OC) [Carlin and Zilberstein, AAMAS 2008]
- Point-Based Incremental Pruning (PBIP) [Dibangoye, Mouaddib, and Chaib-draa, AAMAS 2009]
- PBIP with Incremental Policy Generation (PBIP-IPG) [Amato, Dibangoye, and Zilberstein, AAAI 2009]
- Constraint-Based Dynamic Programming (CBDP) [Kumar and Zilberstein, AAMAS 2009]
- Point-Based Backup for Decentralized POMDPs [Kumar and Zilberstein, AAMAS 2010]
- Point-Based Policy Generation (PBPG) [Wu, Zilberstein, and Chen, AAMAS 2010]

Page 36

Key Ideas Behind These Algorithms

- Perform search in a reduced policy space
- Exact algorithms perform only lossless pruning
- Approximate algorithms rely on more aggressive pruning
- MBDP represents an exponential-size policy in linear space O(maxTrees × T)
- The resulting policy is an acyclic finite-state controller

Page 37

Outline

- Models for decentralized decision making
- Complexity results
- Solving finite-horizon DEC-POMDPs
- Solving infinite-horizon DEC-POMDPs
- Scalability beyond two agents
- Conclusion

Page 38

Infinite-Horizon DEC-POMDPs

- Unclear how to define a compact belief state without fixing the policies of the other agents
- Value iteration does not generalize to the infinite-horizon case
- Policy iteration for POMDPs can be generalized [Hansen 98, Poupart & Boutilier 04]
- Basic idea: represent local policies using (deterministic or stochastic) finite-state controllers and define a set of controller transformations that guarantee improvement and convergence

Page 39

Policies as Controllers

- A finite-state controller represents each policy
  - Fixed memory
  - Randomness used to offset memory limitations
  - Action selection: ψ : Qi → ΔAi
  - Transitions: η : Qi × Ai × Oi → ΔQi
- Value of a two-agent joint controller given by the Bellman equation:

  V(q1, q2, s) = ∑_{a1,a2} P(a1 | q1) P(a2 | q2) [ R(s, a1, a2) + γ ∑_{s'} P(s' | s, a1, a2) ∑_{o1,o2} O(o1, o2 | s', a1, a2) ∑_{q1',q2'} P(q1' | q1, a1, o1) P(q2' | q2, a2, o2) V(q1', q2', s') ]
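The Bellman equation for a joint controller can be evaluated by successive approximation. A sketch over explicit dictionaries, checked on a degenerate one-state toy where every reward is 1, so V = 1/(1 − γ) (the model and controllers are made up for the sanity check):

```python
def evaluate_joint_controller(P, O, R, ctrl1, ctrl2, gamma=0.9, tol=1e-10):
    """Fixed-point iteration of V(q1, q2, s) for a two-agent joint controller.
    ctrl_i = (act_i, next_i): act_i[q] -> {a: prob},
    next_i[(q, a, o)] -> {q': prob}."""
    act1, nxt1 = ctrl1
    act2, nxt2 = ctrl2
    states = {k[0] for k in P}
    V = {(q1, q2, s): 0.0 for q1 in act1 for q2 in act2 for s in states}
    while True:
        newV = {}
        for (q1, q2, s) in V:
            v = 0.0
            for a1, p1 in act1[q1].items():
                for a2, p2 in act2[q2].items():
                    val = R[(s, a1, a2)]
                    for s2, ps in P[(s, a1, a2)].items():
                        for (o1, o2), po in O[(a1, a2, s2)].items():
                            for q1n, pq1 in nxt1[(q1, a1, o1)].items():
                                for q2n, pq2 in nxt2[(q2, a2, o2)].items():
                                    val += gamma * ps * po * pq1 * pq2 * V[(q1n, q2n, s2)]
                    v += p1 * p2 * val
            newV[(q1, q2, s)] = v
        if max(abs(newV[k] - V[k]) for k in V) < tol:
            return newV
        V = newV

# One state, one observation, one action each, reward always 1 -> V = 10.
P = {("s", "a", "a"): {"s": 1.0}}
O = {("a", "a", "s"): {("o", "o"): 1.0}}
R = {("s", "a", "a"): 1.0}
ctrl = ({"q": {"a": 1.0}}, {("q", "a", "o"): {"q": 1.0}})
V = evaluate_joint_controller(P, O, R, ctrl, ctrl)
assert abs(V[("q", "q", "s")] - 10.0) < 1e-7
```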

Page 40

Controller Example

- Stochastic controller for one agent
  - 2 nodes, 2 actions, 2 observations
  - Parameters: P(ai | qi) and P(qi' | qi, oi)

[Figure: a two-node stochastic controller; node 1 mixes the two actions (e.g., with probabilities 0.5/0.5), and observation-conditioned transitions between the nodes are either deterministic (probability 1.0) or stochastic (e.g., 0.75/0.25)]

Page 41

Finding Optimal Controllers

- How can we search the space of possible joint controllers?
- How do we set the parameters of the controllers to maximize value?
- Deterministic controllers: can use traditional search methods such as BFS or B&B
- Stochastic controllers: a continuous optimization problem
- Key question: how to best use a limited amount of memory to optimize value?

Page 42

Independent Joint Controllers

- The local controller for agent i is defined by a conditional distribution P(ai, qi' | qi, oi)
- An independent joint controller is expressed by: Πi P(ai, qi' | qi, oi)
- Can be represented as a dynamic Bayes net

[Figure: dynamic Bayes net unrolled over time, with state st, actions a1t, a2t, observations o1t, o2t, controller nodes q1t, q2t, and reward rt]

Page 43

Correlated Joint Controllers
Bernstein, Hansen & Zilberstein, IJCAI 2005, JAIR 2009

- A correlation device, [Qc, ψ], is a set of nodes and a stochastic state transition function
- Joint controller: ∑_{qc'} P(qc' | qc) Πi P(ai, qi' | qi, oi, qc)
- A shared source of randomness affecting decisions and memory-state updates
- Random bits for the correlation device can be determined prior to execution time

[Figure: the same dynamic Bayes net, extended with correlation-device nodes qc that feed every agent's action and controller-transition nodes]

Page 44

Exhaustive Backups

- Add a node for every possible action and deterministic transition rule
- Repeated backups converge to optimality, but lead to very large controllers

[Figure: one exhaustive backup of a two-node controller for each agent, adding every combination of an action with a deterministic observation-to-node transition rule]
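The blow-up from exhaustive backups is easy to quantify: each backup adds |Ai| · |Qi|^|Oi| nodes per agent, one for every pairing of an action with a deterministic rule mapping each observation to an existing node. A quick sketch:

```python
def backup_sizes(n_actions: int, n_obs: int, steps: int, n0: int = 1):
    """Controller sizes under repeated exhaustive backups: each backup adds
    n_actions * |Q|**n_obs new nodes to the current node set Q."""
    sizes = [n0]
    for _ in range(steps):
        sizes.append(sizes[-1] + n_actions * sizes[-1] ** n_obs)
    return sizes

growth = backup_sizes(2, 2, 3)
# -> [1, 3, 21, 903]: why pruning and bounded transformations are needed
```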

Page 45

Value-Preserving Transformations

- A value-preserving transformation changes the joint controller without sacrificing value
- Formally, there must exist mappings fi : Qi → ΔRi for each agent i and fc : Qc → ΔRc such that, for all s ∈ S, q ∈ Q, and qc ∈ Qc:

  V(s, q, qc) ≤ ∑_r ∑_{rc} P(r | q) P(rc | qc) V(s, r, rc)

Page 46

Bounded Policy Iteration Algorithm
Bernstein, Hansen & Zilberstein, IJCAI 2005, JAIR 2009

Repeat
  1) Evaluate the controller
  2) Perform an exhaustive backup
  3) Perform value-preserving transformations
Until the controller is ε-optimal for all states

Theorem: For any ε, bounded policy iteration returns a joint controller that is ε-optimal for all initial states in a finite number of iterations.

Page 47

Useful Transformations

- Controller reductions
  - Shrink the controller without sacrificing value
- Bounded dynamic programming updates
  - Increase value while keeping the size fixed
- Both can be done using polynomial-size linear programs
- Generalize ideas from the POMDP literature, particularly the BPI algorithm [Poupart & Boutilier 03]

Page 48

Controller Reduction

- For some node qi, find a convex combination of nodes in Qi \ qi that dominates qi for all states and nodes of the other controllers; merge qi into the convex combination by changing transition probabilities
- Corresponding linear program:
  Variables: ε, P(q̂i)
  Objective: maximize ε
  Constraints: ∀s ∈ S, q−i ∈ Q−i, qc ∈ Qc:
    V(s, qi, q−i, qc) + ε ≤ ∑_{q̂i} P(q̂i) V(s, q̂i, q−i, qc)
- Theorem: A controller reduction is a value-preserving transformation.
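The dominance test behind a controller reduction can be sketched for the two-candidate case. The real algorithm solves the small linear program above; here a 1-D grid over the mixture weight p stands in for the LP solver, and the value vectors are made up:

```python
def dominated(v_q, v1, v2, eps=1e-6, grid=1001):
    """Is value vector v_q dominated, in every (state, other-node) component,
    by some convex combination p*v1 + (1-p)*v2?  A grid over p approximates
    the linear program used in the actual algorithm."""
    for i in range(grid):
        p = i / (grid - 1)
        if all(p * x + (1 - p) * y >= z + eps
               for x, y, z in zip(v1, v2, v_q)):
            return True
    return False

# Hypothetical values of candidate nodes across two (state, other-node) pairs:
v1, v2 = [10.0, 0.0], [0.0, 10.0]
assert dominated([4.0, 4.0], v1, v2)      # 0.5*v1 + 0.5*v2 = [5, 5] dominates
assert not dominated([6.0, 6.0], v1, v2)  # would need p >= 0.6 and p <= 0.4
```

With more than two candidate nodes the mixture lives on a simplex, which is exactly why the algorithm formulates this as a linear program instead.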

Page 49

Bounded DP Update

- For some node qi, find better parameters assuming that the old parameters will be used from the second step onwards; the new parameters must yield value at least as high for all states and nodes of the other controllers
- Corresponding linear program:
  Variables: ε, P(ai, qi' | qi, oi, qc)
  Objective: maximize ε
  Constraints: ∀s ∈ S, q−i ∈ Q−i, qc ∈ Qc:
    V(s, q, qc) + ε ≤ ∑_a P(a | q, qc) [ R(s, a) + γ ∑_{s', o, q', qc'} P(q' | q, a, o, qc) P(s', o | s, a) P(qc' | qc) V(s', q', qc') ]
- Theorem: A bounded DP update is a value-preserving transformation.

Page 50

Modifying the Correlation Device

- Both transformations can be applied to the correlation device
- Slightly different linear programs to solve
- Can think of the correlation device as another agent
- Lots of implementation questions…
  - What to use for an initial joint controller?
  - Which transformations to perform?
  - In what order should nodes be chosen for removal or improvement?

Page 51

Decentralized BPI Summary

- DEC-BPI finds better and much more compact solutions than exhaustive backups
- A larger correlation device tends to lead to higher values on average
- Larger local controllers tend to yield higher average values, up to a point
- But bounded DP is limited by improving one controller at a time
- The linear program (one-step lookahead) results in local optimality and tends to “get stuck”

Page 52

Nonlinear Optimization Approach
Amato, Bernstein & Zilberstein, UAI 2007, JAAMAS 2010

- Basic idea: model the problem as a nonlinear program (NLP)
- Consider node values (as well as controller parameters) as variables
- The NLP can take advantage of an initial state distribution when it is given
- Improvement and evaluation all in one step (equivalent to an infinite lookahead)
- Additional constraints maintain valid values

Page 53

NLP Representation

Variables: x(q, a) = P(a | q), y(q, a, o, q') = P(q' | q, a, o), z(q, s) = V(q, s)

Objective: maximize ∑_s b0(s) z(q0, s)

Value constraints: ∀s ∈ S, q ∈ Q:
  z(q, s) = ∑_a x(q, a) [ R(s, a) + γ ∑_{s'} P(s' | s, a) ∑_o O(o | s', a) ∑_{q'} y(q, a, o, q') z(q', s') ]

Additional linear constraints:
- ensure controllers are independent
- all probabilities sum to 1 and are non-negative
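The value constraints can be checked numerically. The sketch below uses a single agent (a one-agent DEC-POMDP is a POMDP), one controller node, and invented two-state dynamics: it computes z by policy evaluation and verifies that the triple (x, y, z) satisfies the NLP's value constraints with near-zero residual:

```python
def value_residual(S, A, Obs, P, O, R, x, y, z, gamma=0.9):
    """Max violation of the value constraints
    z(q, s) = sum_a x(q,a) [ R(s,a) + gamma * sum_{s'} P(s'|s,a)
              sum_o O(o|s',a) sum_{q'} y(q,a,o,q') z(q',s') ]."""
    nodes = {q for (q, _) in x}
    res = 0.0
    for q in nodes:
        for s in S:
            rhs = sum(x[(q, a)] * (R[(s, a)] + gamma * sum(
                      P[(s, a)][s2] * O[(s2, a)][o] * y[(q, a, o, q2)] * z[(q2, s2)]
                      for s2 in S for o in Obs for q2 in nodes))
                      for a in A)
            res = max(res, abs(z[(q, s)] - rhs))
    return res

# Toy model: one action, one observation, deterministic cycling between states.
S, A, Obs = ["s0", "s1"], ["a"], ["o"]
P = {("s0", "a"): {"s0": 0.0, "s1": 1.0}, ("s1", "a"): {"s0": 1.0, "s1": 0.0}}
O = {("s0", "a"): {"o": 1.0}, ("s1", "a"): {"o": 1.0}}
R = {("s0", "a"): 1.0, ("s1", "a"): 0.0}
x = {("q", "a"): 1.0}              # x(q, a) = P(a | q)
y = {("q", "a", "o", "q"): 1.0}    # y(q, a, o, q') = P(q' | q, a, o)

# Policy evaluation: iterate to the fixed point that defines z(q, s) = V(q, s).
z = {("q", s): 0.0 for s in S}
for _ in range(500):
    z = {("q", s): R[(s, "a")] + 0.9 * sum(P[(s, "a")][s2] * z[("q", s2)]
                                           for s2 in S)
         for s in S}

assert value_residual(S, A, Obs, P, O, R, x, y, z) < 1e-9
objective = 1.0 * z[("q", "s0")]   # b0 concentrated on s0
# objective ≈ 1 / (1 - 0.81), the discounted value of this cyclic controller
```

An NLP solver treats x, y, z all as free variables subject to these constraints; the check above only confirms that a feasible point with correct z reproduces controller value exactly.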

Page 54

Independence Constraints

- Independence constraints guarantee that action selection and controller transition probabilities for each agent depend only on local information
- Action selection independence
- Controller transition independence

Page 55

Probability Constraints

- Probability constraints guarantee that action selection probabilities and controller transition probabilities are non-negative and that they add up to 1
- (Superscript f's represent arbitrary fixed values)

Page 56

Optimality

Theorem: An optimal solution of the NLP results in optimal stochastic controllers for the given size and initial state distribution.

- Advantages of the NLP approach:
  - Efficient policy representation with fixed memory
  - The NLP represents the optimal policy for a given size
  - Takes advantage of a known start state
  - Easy to implement using off-the-shelf solvers
- Limitations:
  - Difficult to solve optimally

Page 57

Adding a Correlation Device

- The NLP approach can be extended to include a correlation device
- A new variable w(c, c') represents the transition function of the correlation device; action selection and controller transitions depend on the new shared signal

Page 58

Comparison of NLP & DEC-BPI
Amato, Bernstein & Zilberstein, UAI 2007, JAAMAS 2010

- Used a freely available nonlinear constrained optimization solver called “filter” on the NEOS server (http://www-neos.mcs.anl.gov/neos/)
- The solver guarantees a locally optimal solution
- Used 10 random initial controllers for a range of controller sizes
- Compared NLP with DEC-BPI, with and without a small (2-node) correlation device

Page 59

Results: Broadcast Channel
Amato, Bernstein & Zilberstein, UAI 2007

- A simple two-agent networking problem (2 agents, 4 states, 2 actions, 5 observations)
- Average quality over 10 trials
- Average run time

Page 60

Results: Multi-Agent Tiger
Amato, Bernstein & Zilberstein, JAAMAS 2010

- A two-agent version of a well-known POMDP benchmark [Nair et al. 03] (2 states, 3 actions, 2 observations)
- Average quality of various controller sizes using NLP methods, with and without a 2-node correlation device, and BFS


Results: Meeting in a Grid
Amato, Bernstein & Zilberstein, JAAMAS 2010

- A two-agent domain with 16 states, 5 actions, 2 observations
- Average quality of various controller sizes using NLP methods and DEC-BPI, with and without a 2-node correlation device, and BFS


Results: Box Pushing
Amato, Bernstein & Zilberstein, JAAMAS 2010

Values and running times (in seconds) for each controller size using NLP methods and DEC-BPI, with and without a 2-node correlation device, and BFS. An "x" indicates that the approach was not able to solve the problem.


NLP Approach Summary

- The NLP defines the optimal fixed-size stochastic controller
- The approach shows consistent improvement over DEC-BPI using an off-the-shelf, locally optimal solver
- A small correlation device can have significant benefits
- Better performance may be obtained by exploiting the structure of the NLP


Outline

- Models for decentralized decision making
- Complexity results
- Solving finite-horizon DEC-POMDPs
- Solving infinite-horizon DEC-POMDPs
- Scalability beyond two agents
- Conclusion


Exploiting the Locality of Interaction

- In practical settings that involve many agents, each agent often interacts with only a small number of "neighboring" agents (e.g., firefighting, sensor networks)
- Algorithms designed to exploit this property include LID-JESP [Nair et al. AAAI 05], SPIDER [Varakantham et al. AAMAS 07], and FANS [Marecki et al. AAMAS 08]
- FANS uses finite-state controllers (FSCs) for policy representation:
  - Exploits FSCs for dynamic programming in policy evaluation and heuristic computations, providing significant speedups
  - Introduces novel heuristics to automatically vary the FSC size across agents
  - Performs policy search that exploits the locality of agent interactions


Constraint-Based DP
Kumar & Zilberstein, AAMAS 2009

- Models the domain as a Network Distributed POMDP (ND-POMDP), a restricted class of DEC-POMDPs characterized by a decomposable reward function
- CBDP uses point-based dynamic programming (similar to MBDP)
- CBDP uses constraint-network algorithms to improve the efficiency of key steps:
  - Computation of the heuristic function
  - Belief sampling using the heuristic function
  - Finding the best joint policy for a particular belief
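The belief-sampling step can be pictured as follows: simulate a heuristic policy forward from the start state and record the belief reached at each step, then back up policies only at those points. A minimal sketch (hypothetical 3-state transition matrix under an assumed heuristic policy; observations are omitted for brevity, so this is only the flavor of the MBDP-style sampling, not CBDP itself):

```python
import numpy as np

# P(s' | s) under a fixed heuristic policy (hypothetical numbers).
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

b = np.array([1.0, 0.0, 0.0])  # known start state distribution
beliefs = [b]
for t in range(5):
    b = b @ T                  # push the belief through the dynamics
    beliefs.append(b)

# These sampled points are the beliefs for which the point-based DP
# would select the best joint policy at each horizon step.
print(np.round(beliefs[-1], 3))
```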


Results: Sensors Tracking Target
Kumar & Zilberstein, AAMAS 2009

- CBDP provides orders of magnitude of speedup over FANS
- Provides better solution quality on all test instances
- Provides strong theoretical guarantees on time and space complexity, enhancing scalability:
  - Linear complexity in the planning horizon length
  - Linear in the number of agents, which is necessary to solve large realistic problems
  - Exponential only in a small parameter that depends on the level of interaction among the agents

[Figure: sensor network configuration; each sensor chooses among actions N, S, E, W; target locations loc1 and loc2]


Sample Results

- A 7-agent configuration with 4 actions per agent; two adjacent agents are required to track a target
- The graphs show the solution quality (left) and run time (right) of our approach (CBDP) compared with the best existing method (FANS)
- FANS does not scale beyond horizon 7; CBDP has linear complexity in the horizon and provides better solution quality in less time

[Figure: solution quality (left) and run time in seconds, log scale (right) as a function of horizon (2-10) for CBDP and FANS]


New Scalable Approach
Kumar, Zilberstein, and Toussaint, IJCAI 2011

- Extends an approach [Toussaint and Storkey, ICML 06] that maps planning under uncertainty (POMDP) problems into probabilistic inference
- Characterizes general constraints on the interaction graph that facilitate scalable planning
- Introduces an efficient algorithm to solve such models using probabilistic inference
- Identifies a number of existing models that satisfy these constraints


Value Factorization

- θ = parameters of an agent
- Factored state space: s = (s1, . . . , sM)
- Example: consider four agents such that V = V12 + V23 + V34
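The four-agent example above can be made concrete: with V = V12 + V23 + V34, the best joint choice is found by passing max-messages along the chain, in time linear in the number of factors rather than exponential in the number of agents. A sketch with hypothetical random local value tables (K policy choices per agent):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # local policy choices per agent
V12, V23, V34 = (rng.uniform(size=(K, K)) for _ in range(3))

# Brute force over all K^4 joint choices: exponential in # agents.
best_bf = max(V12[a, b] + V23[b, c] + V34[c, d]
              for a in range(K) for b in range(K)
              for c in range(K) for d in range(K))

# Chain dynamic programming (max-sum): linear in # factors.
m4 = V34.max(axis=1)                  # m4[c] = max_d V34[c, d]
m3 = (V23 + m4[None, :]).max(axis=1)  # m3[b] = max_c V23[b, c] + m4[c]
best_dp = (V12 + m3[None, :]).max()   # max_{a, b} V12[a, b] + m3[b]
print(best_dp)  # equals best_bf
```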


Existing Models Satisfy VF

- Each agent/state variable can participate in multiple value factors
- Worst-case complexity remains NEXP-complete
- TI-DEC-MDP, ND-POMDP, and TD-POMDP satisfy value factorization


Computational Advantages

Applicability:
- In models that satisfy VF, inference in the EM framework can be done independently in each value factor
- Smaller value factors ⇒ efficient inference
- Planning is no longer exponential in the number of agents; it is linear in the number of factors

Implementation:
- Distributed planning
- Efficient implementation using message passing
- Parallel computation of messages


Planning by Inference

- Recasts planning as likelihood maximization in a DBN mixture with a binary reward variable r:
  P(r = 1 | s, a1, a2) ∝ R(s, a1, a2)
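The proportionality above is typically realized by rescaling rewards into [0, 1], so that r behaves as a Bernoulli variable whose likelihood preserves the ordering of policies by expected reward. A minimal sketch with a hypothetical reward table:

```python
import numpy as np

# Hypothetical reward table R(s, a1, a2): 2 states, 2 actions per agent.
R = np.array([[[ 5.0, -1.0], [ 0.0,  2.0]],
              [[-3.0,  4.0], [ 1.0,  0.0]]])

# Binary reward variable: P(r = 1 | s, a1, a2) = (R - Rmin) / (Rmax - Rmin).
# This is an affine, order-preserving map, so maximizing the likelihood
# of r = 1 maximizes expected reward.
p_r1 = (R - R.min()) / (R.max() - R.min())
print(p_r1.min(), p_r1.max())  # 0.0 1.0
```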


Exploiting the VF Property

- Exploits the additive nature of the value function for scalability
- The outer mixture simulates the VF property
- Each Vf(θf, sf) is evaluated using a time-dependent mixture
- Theorem: maximizing the likelihood of observing the variable r = 1 optimizes the joint policy
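A toy instance of this likelihood-maximization view (one step, one agent, no state: a bandit with a binary reward, chosen here purely for illustration): the E-step reweights the policy by P(r = 1 | a), the M-step renormalizes, and iterating concentrates the policy on the optimal action.

```python
import numpy as np

p_r1 = np.array([0.2, 0.8, 0.5])  # P(r = 1 | a): hypothetical
pi = np.full(3, 1.0 / 3.0)        # initial stochastic policy

for _ in range(100):
    q = pi * p_r1                 # E-step: posterior over a given r = 1
    pi = q / q.sum()              # M-step: new policy parameters

print(pi.round(3))  # mass concentrates on argmax P(r = 1 | a)
```

After t iterations pi(a) is proportional to pi0(a) * p_r1(a)^t, so the likelihood increases monotonically and the policy converges to the best action, illustrating the theorem in miniature.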


The Expectation-Maximization Algorithm

- Observed data: r = 1; every other variable is hidden
- Use the EM algorithm to maximize the likelihood
- Implemented using message passing on the VF graph
- Example: 3 factors {Ag1, Ag2}, {Ag2, Ag3}, and {Ag3, Ag4}


Properties of the EM Algorithm

Scalability:
- The μ message requires independent inference in each factor
- Agents/state variables can be involved in multiple factors, so complex systems can be modeled via simpler interactions
- Distributed planning via message passing

Complexity:
- Linear in the number of factors; exponential only in the number of agents/state variables within a factor

Generality:
- No additional assumptions (such as TOI) are required; a general optimization recipe for models with the VF property
- Open question: local optima?


Experiments

- ND-POMDP domains involving target tracking in sensor networks with imperfect sensing
- Multiple targets; limited sensors with battery constraints
- Penalty of -1 per sensor for miscoordination or for recharging the battery; positive reward (+80) per target scanned simultaneously by two adjacent sensors


Comparisons with the NLP Approach (5P Domain)


Scalability on Larger Benchmarks

- 15-agent and 20-agent domains, internal states = 5


Summary of the EM Approach

- Value factorization (VF) facilitates scalability
- Several existing weakly coupled models satisfy VF
- An EM algorithm can solve models with this property and yield good-quality solutions
- Scalability: the E-step decomposes according to value factors; smaller factors lead to efficient inference
- Can be easily implemented using message passing among the agents
- Future work: explore techniques for even faster inference, and establish better error bounds


Outline

- Models for decentralized decision making
- Complexity results
- Solving finite-horizon DEC-POMDPs
- Solving infinite-horizon DEC-POMDPs
- Scalability beyond two agents
- Conclusion


Back to Some Basic Questions

- Are DEC-POMDPs significantly harder to solve than POMDPs? Why?
- What features of the problem domain affect the complexity, and how?
- Is optimal dynamic programming possible?
- Can dynamic programming be made practical?
- Is it beneficial to treat communication as a separate type of action?
- How can we exploit the locality of agent interaction to develop more scalable algorithms?


Questions?

Additional Information:
Resource-Bounded Reasoning Lab
University of Massachusetts, Amherst
http://rbr.cs.umass.edu