Transcript
Page 1: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

1

Execution-Time Communication Decisions for Coordination of Multi-

Agent Teams

Maayan Roth
Thesis Defense

Carnegie Mellon University

September 4, 2007

Page 2: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

2

Cooperative Multi-Agent Teams Operating Under Uncertainty and Partial Observability

Cooperative teams
– Agents work together to achieve team reward
– No individual motivations

Uncertainty
– Actions have stochastic outcomes

Partial observability
– Agents don’t always know world state

Page 3: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

3

Coordinating When Communication is a Limited Resource

Tight coordination
– One agent’s best action choice depends on the action choices of its teammates
– We wish to Avoid Coordination Errors

Limited communication
– Communication costs
– Limited bandwidth

Page 4: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

4

Thesis Question

“How can we effectively use communication to enable the coordination of cooperative multi-agent teams making sequential decisions under uncertainty and partial observability?”

Page 5: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

5

Multi-Agent Sequential Decision Making

Page 6: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

6

Thesis Statement

“Reasoning about communication decisions at execution-time provides a more tractable means for coordinating teams of agents operating under uncertainty and partial observability.”

Page 7: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

7

Thesis Contributions

Algorithms that:
– Guarantee agents will Avoid Coordination Errors (ACE) during decentralized execution
– Answer the questions of when and what agents should communicate

Page 8: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

8

Outline

Dec-POMDP model
– Impact of communication on complexity

Avoiding Coordination Errors by reasoning over Possible Joint Beliefs (ACE-PJB)
– ACE-PJB-Comm: When should agents communicate?
– Selective ACE-PJB-Comm: What should agents communicate?

Avoiding Coordination Errors by executing Individual Factored Policies (ACE-IFP)

Future directions

Page 9: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

9

Dec-POMDP Model

Decentralized Partially Observable Markov Decision Process
– Multi-agent extension of the single-agent POMDP model
– Sequential decision-making in domains where:
  Uncertainty in outcome of actions
  Partial observability – uncertainty about world state

Page 10: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

10

Dec-POMDP Model

M = ⟨m, S, {Ai}i≤m, T, {Ωi}i≤m, O, R⟩
– m is the number of agents
– S is the set of possible world states
– {Ai}i≤m is the set of joint actions ⟨a1, …, am⟩, where ai ∈ Ai
– T defines transition probabilities over joint actions
– {Ωi}i≤m is the set of joint observations ⟨ω1, …, ωm⟩, where ωi ∈ Ωi
– O defines observation probabilities over joint actions and joint observations
– R is the team reward function
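The tuple above can be captured directly in code. Below is a minimal sketch (not from the thesis) of one way to hold the Dec-POMDP components; the field names and the dictionary-based T, O, and R are illustrative choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

JointAction = Tuple[str, ...]       # <a_1, ..., a_m>
JointObservation = Tuple[str, ...]  # <omega_1, ..., omega_m>

@dataclass
class DecPOMDP:
    """Container for the Dec-POMDP tuple M = <m, S, {A_i}, T, {Omega_i}, O, R>."""
    num_agents: int                                              # m
    states: List[str]                                            # S
    actions: List[List[str]]                                     # A_i, one list per agent
    observations: List[List[str]]                                # Omega_i, one list per agent
    T: Dict[Tuple[str, JointAction, str], float]                 # P(s' | s, a)
    O: Dict[Tuple[JointAction, str, JointObservation], float]    # P(omega | a, s')
    R: Dict[Tuple[str, JointAction], float]                      # team reward R(s, a)
```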

Page 11: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

11

Dec-POMDP Complexity

Goal – compute a policy which, for each agent, maps its local observation history to an action

For all m ≥ 2, a Dec-POMDP with m agents is NEXP-complete
– Agents must reason about the possible actions and observations of their teammates

Page 12: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

12

Impact of Communication on Complexity [Pynadath and Tambe, 2002]

If communication is free:
– Dec-POMDP is reducible to a single-agent POMDP
– Optimal communication policy is to communicate at every time step

When communication has any cost, the Dec-POMDP is still intractable (NEXP-complete)
– Agents must reason about the value of information

Page 13: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

13

Classifying Communication Heuristics

AND- vs. OR-communication [Emery-Montemerlo, 2005]
– AND-communication does not replace domain-level actions
– OR-communication does replace domain-level actions

Initiating communication [Xuan et al., 2001]
– Tell – agent decides to tell local information to teammates
– Query – agent asks a teammate for information
– Sync – all agents broadcast all information simultaneously

Page 14: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

14

Classifying Communication Heuristics

Does the algorithm consider communication cost?

Is the algorithm applicable to:
– General Dec-POMDP domains
– General Dec-MDP domains
– Restricted domains

Are the agents guaranteed to Avoid Coordination Errors?

Page 15: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

15

Related Work

(Table: comparison of related work against the column headings AND, OR, Tell, Query, Sync, Cost, Unrestricted, and ACE. Rows: [Xuan and Lesser, 2002]; Communicative JESP [Nair et al., 2003]; BaGA-Comm [Emery-Montemerlo, 2005]; ACE-PJB-Comm; Selective ACE-PJB-Comm; ACE-IFP. The per-column marks are not recoverable from this transcript; ACE-PJB-Comm and Selective ACE-PJB-Comm each carry five marks, and ACE-IFP's row includes a partial ("/") entry.)

Page 16: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

16

Overall Approach

Recall: if communication is free, a Dec-POMDP can be treated as a single-agent POMDP

1) At plan-time, pretend communication is free
– Generate a centralized policy for the team

2) At execution-time, use communication to enable decentralized execution of this policy while Avoiding Coordination Errors

Page 17: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

17

Outline

Dec-POMDP, Dec-MDP models
– Impact of communication on complexity

Avoiding Coordination Errors by reasoning over Possible Joint Beliefs (ACE-PJB)
– ACE-PJB-Comm: When should agents communicate?
– Selective ACE-PJB-Comm: What should agents communicate?

Avoiding Coordination Errors by executing Individual Factored Policies (ACE-IFP)

Future directions

Page 18: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

18

Tiger Domain: (States, Actions)

Two-agent tiger problem [Nair et al., 2003]:

S: {SL, SR}
– Tiger is either behind the left door or behind the right door

Individual actions: ai ∈ {OpenL, OpenR, Listen}
– Each robot can open the left door, open the right door, or listen

Page 19: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

19

Tiger Domain: (Observations)

Individual observations: ωi ∈ {HL, HR}
– Robot can hear the tiger behind the left door or hear the tiger behind the right door

Observations are noisy and independent.

Page 20: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

20

Tiger Domain:(Reward)

Coordination problem – agents must act together for maximum reward
– Maximum reward (+20) when both agents open the door with the treasure
– Minimum reward (-100) when only one agent opens the door with the tiger
– Listen has a small cost (-1 per agent)
– Both agents opening the door with the tiger leads to a medium negative reward (-50)
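As an illustration of the reward structure just described, here is a small Python sketch of the two-agent tiger parameters. The observation accuracy (0.7) is taken from the Tiger Domain Details slide near the end of the talk; the mixed Listen/Open rewards are omitted, and all names are illustrative rather than thesis code.

```python
STATES = ["SL", "SR"]                      # tiger behind left / right door
ACTIONS = ["OpenL", "OpenR", "Listen"]     # individual actions
OBSERVATIONS = ["HL", "HR"]                # hear tiger left / right

# P(individual observation | state) under <Listen, Listen>; noisy, independent per agent
P_OBS = {("SL", "HL"): 0.7, ("SL", "HR"): 0.3,
         ("SR", "HL"): 0.3, ("SR", "HR"): 0.7}

def joint_reward(state, joint_action):
    """Team reward for the cases named on this slide; mixed Listen/Open cases
    are omitted (see the Tiger Domain Details slide for the full table)."""
    a1, a2 = joint_action
    treasure = "OpenR" if state == "SL" else "OpenL"   # door opposite the tiger
    tiger = "OpenL" if state == "SL" else "OpenR"
    if a1 == a2 == "Listen":
        return -2.0            # -1 per agent
    if a1 == a2 == treasure:
        return 20.0            # both open the treasure door
    if a1 == a2 == tiger:
        return -50.0           # both open the tiger door
    if (a1, a2).count(tiger) == 1 and "Listen" not in (a1, a2):
        return -100.0          # only one agent opens the tiger door
    raise ValueError("mixed Listen/Open case; see the Tiger Domain Details slide")

# e.g. joint_reward("SL", ("OpenR", "OpenR")) -> 20.0
```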

Page 21: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

21

Coordination Errors

(Figure: each agent hears HL.)

a1 = OpenR
a2 = OpenL

Reward(⟨OpenR, OpenL⟩) = -100
Reward(⟨OpenL, OpenL⟩) ≥ -50

Agents Avoid Coordination Errors when each agent’s action is a best response to its teammates’ actions.

Page 22: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

22

Avoid Coordination Errors by Reasoning Over Possible Joint Beliefs (ACE-PJB)

Centralized POMDP policy maps joint beliefs to joint actions
– Joint belief (bt) – distribution over world states

Individual agents can’t compute the joint belief
– They don’t know what their teammates have observed or which actions they selected

Simplifying assumption:
– What if agents knew the joint action at each timestep?
– Agents would only have to reason about possible observations
– How can this be assured?

Page 23: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

23

Ensuring Action Synchronization

Agents are only allowed to choose actions based on information known to all team members

At the start of execution, agents know:
– b0 – initial distribution over world states
– A0 – optimal joint action given b0, based on the centralized policy

At each timestep, each agent computes Lt, the distribution of possible joint beliefs:
Lt = {⟨bt, pt, ωt⟩}
– ωt – observation history that led to bt
– pt – likelihood of observing ωt

Page 24: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

24

Possible Joint Beliefs

a = ⟨Listen, Listen⟩

How should agents select actions over joint beliefs?

pt = pt-1 × P(ωt | bt-1, at-1)

(Belief tree after one joint ⟨Listen, Listen⟩ action:)
L0: b: P(SL) = 0.5, p(b) = 1.0
L1:
– ⟨HL,HL⟩: b: P(SL) = 0.8, p(b) = 0.29
– ⟨HL,HR⟩: b: P(SL) = 0.5, p(b) = 0.21
– ⟨HR,HL⟩: b: P(SL) = 0.5, p(b) = 0.21
– ⟨HR,HR⟩: b: P(SL) = 0.2, p(b) = 0.29
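A sketch of how the belief tree above might be grown in code, using the update pt = pt-1 × P(ωt | bt-1, at-1). The helper functions trans_prob and obs_prob, and the (belief, probability, history) leaf representation, are assumptions made for illustration.

```python
def belief_update(belief, a, omega, states, trans_prob, obs_prob):
    """One Bayesian filter step: returns (new_belief, P(omega | belief, a))."""
    unnorm = {}
    for s2 in states:
        pred = sum(trans_prob(s2, s, a) * belief[s] for s in states)
        unnorm[s2] = obs_prob(omega, a, s2) * pred
    p_omega = sum(unnorm.values())
    new_belief = {s2: (v / p_omega if p_omega > 0 else 0.0) for s2, v in unnorm.items()}
    return new_belief, p_omega

def expand_joint_beliefs(leaves, a, joint_observations, states, trans_prob, obs_prob):
    """L_t -> L_{t+1}: one child per (leaf, joint observation), with
    p_t = p_{t-1} * P(omega_t | b_{t-1}, a_{t-1})."""
    new_leaves = []
    for belief, p, history in leaves:
        for omega in joint_observations:
            b2, p_omega = belief_update(belief, a, omega, states, trans_prob, obs_prob)
            if p_omega > 0:
                new_leaves.append((b2, p * p_omega, history + [omega]))
    return new_leaves
```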

Page 25: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

25

Q-POMDP Heuristic

Select joint action that maximizes expected reward over possible joint beliefs

Q-MDP [Littman et al., 1995]
– approximate solution to a large POMDP using the underlying MDP

Q-POMDP [Roth et al., 2005]
– approximate solution to a Dec-POMDP using the underlying single-agent POMDP

QMDP(b) = argmaxa Σs∈S b(s) × Va(s)

QPOMDP(Lt) = argmaxa ΣLti∈Lt p(Lti) × Q(b(Lti), a)
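The Q-POMDP rule above reduces to a small argmax once a centralized Q-function is available. A sketch, assuming a function q_value(belief, a) that returns the underlying single-agent POMDP value Q(b, a), and leaves stored as (belief, probability, history) tuples:

```python
def q_pomdp(leaves, joint_actions, q_value):
    """argmax over joint actions of sum_i p(L_t^i) * Q(b(L_t^i), a)."""
    def expected_value(a):
        return sum(p * q_value(belief, a) for belief, p, _ in leaves)
    return max(joint_actions, key=expected_value)
```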

Page 26: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

26

Q-POMDP Heuristic

QPOMDP(Lt) = argmaxa ΣLti∈Lt p(Lti) × Q(b(Lti), a)

Choose joint action by computing expected reward over all leaves

Agents will independently select the same joint action, guaranteeing that they avoid coordination errors… but this action choice is very conservative (always ⟨Listen, Listen⟩)

ACE-PJB-Comm: Communication adds local observations to joint belief


Page 27: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

27

ACE-PJB-Comm Example

Agent 1's observation history: {} → HL

L1: joint beliefs for ⟨HL,HL⟩, ⟨HL,HR⟩, ⟨HR,HL⟩, ⟨HR,HR⟩

aNC = Q-POMDP(L1) = ⟨Listen, Listen⟩
L* = circled nodes (the joint beliefs consistent with agent 1's observation)
aC = Q-POMDP(L*) = ⟨Listen, Listen⟩

aC = aNC → Don’t communicate

Page 28: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

28

ACE-PJB-Comm Example

(Figure: the belief tree grown one more level, with joint action a = ⟨Listen, Listen⟩ at each step; L2 contains a joint belief for each two-step joint observation history.)

Agent 1's observation history: {HL, HL}

aNC = Q-POMDP(L2) = ⟨Listen, Listen⟩
L* = circled nodes (the joint beliefs consistent with agent 1's observations)
aC = Q-POMDP(L*) = ⟨OpenR, OpenR⟩

V(aC) - V(aNC) > ε → Agent 1 communicates

Page 29: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

29

ACE-PJB-Comm Example

(Same belief tree as the previous slide; a = ⟨Listen, Listen⟩, agent 1's observation history is {HL, HL}.)

Agent 1 communicates ⟨HL, HL⟩

Q-POMDP(L2) = ⟨OpenR, OpenR⟩ – agents open the right door!
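Putting the example together, here is a sketch of the ACE-PJB-Comm test an agent might run at each step: compare the joint action chosen over all possible joint beliefs (Lt) with the one chosen over the beliefs consistent with its own observations (L*), and communicate when the gain exceeds ε. The helper q_value and the leaf representation are illustrative assumptions, not thesis code.

```python
def ace_pjb_comm(L_t, L_star, joint_actions, q_value, epsilon):
    """Decide whether this agent should communicate its observations."""
    def expected_value(leaves, a):
        return sum(p * q_value(b, a) for b, p, _ in leaves)

    a_nc = max(joint_actions, key=lambda a: expected_value(L_t, a))     # action without communication
    a_c = max(joint_actions, key=lambda a: expected_value(L_star, a))   # action if this agent communicated

    # gain estimated over L*, following the V(aC) - V(aNC) > epsilon test on the slide
    gain = expected_value(L_star, a_c) - expected_value(L_star, a_nc)
    if gain > epsilon:
        return "communicate", a_c
    return "stay silent", a_nc
```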

Page 30: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

30

ACE-PJB-Comm Results

20,000 trials in the 2-agent tiger domain
– 6 timesteps per trial

Agents communicate 49.7% fewer observations using ACE-PJB-Comm, and send 93.3% fewer messages

The difference in expected reward arises because ACE-PJB-Comm is slightly pessimistic about the outcome of communication

Results (mean, with standard deviation in parentheses):
– Full Communication: reward 7.14 (27.88), messages 10.0 (0.0), observations 10.0 (0.0)
– ACE-PJB-Comm: reward 5.31 (19.79), messages 1.77 (0.79), observations 5.13 (2.38)

Page 31: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

31

Additional Challenges

Number of possible joint beliefs grows exponentially
– Use a particle filter to model the distribution of possible joint beliefs

ACE-PJB-Comm answers the question of when agents should communicate
– Doesn’t deal with what to communicate
– Agents communicate all observations that they haven’t previously communicated

Page 32: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

32

Selective ACE-PJB-Comm[Roth et al., 2006]

Answers what agents should communicate
– Chooses the most valuable subset of observations

Hill-climbing heuristic to choose observations that “push” the team towards aC (sketched below)
– aC – joint action that would be chosen if the agent communicated all observations
– See details in thesis document
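A sketch of that greedy, hill-climbing message construction (the Build-Message slides later in the talk walk through an example). The function value_if_sent, which estimates the Q-POMDP value of a joint action after a candidate message is sent, is an assumed helper; names are illustrative.

```python
def build_message(own_observations, a_c, a_nc, value_if_sent):
    """Greedily pick observations that most push the team from a_NC toward a_C."""
    message = []
    remaining = list(own_observations)
    while remaining:
        # choose the single observation that most increases the gap between a_C and a_NC
        best = max(remaining,
                   key=lambda o: value_if_sent(message + [o], a_c)
                                 - value_if_sent(message + [o], a_nc))
        message.append(best)
        remaining.remove(best)
        if value_if_sent(message, a_c) > value_if_sent(message, a_nc):
            break          # message is already enough to change the joint action to a_C
    return message
```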

Page 33: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

33

Selective ACE-PJB-Comm Results

2-agent tiger domain:
– Communicates 28.7% fewer observations
– Same expected reward
– Slightly more messages

Results (mean, with standard deviation in parentheses):
– ACE-PJB-Comm: reward 5.30 (19.79), messages 1.77 (0.79), observations 5.13 (2.38)
– Selective ACE-PJB-Comm: reward 5.31 (19.74), messages 1.81 (0.92), observations 3.66 (1.67)

Page 34: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

34

Outline

Dec-POMDP, Dec-MDP models
– Impact of communication on complexity

Avoiding Coordination Errors by reasoning over Possible Joint Beliefs (ACE-PJB)
– ACE-PJB-Comm: When should agents communicate?
– Selective ACE-PJB-Comm: What should agents communicate?

Avoiding Coordination Errors by executing Individual Factored Policies (ACE-IFP)

Future directions

Page 35: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

35

Dec-MDP

State is collectively observable
– One agent can’t identify the full state on its own
– Union of team observations uniquely identifies the state

Underlying problem is an MDP, not a POMDP

Dec-MDP has the same complexity as Dec-POMDP
– NEXP-complete

Page 36: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

36

Acting Independently

ACE-PJB requires agents to know joint action at every timestep

Claim: In many multi-agent domains, agents can act independently for long periods of time, only needing to coordinate infrequently

Page 37: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

37

Meeting-Under-Uncertainty Domain

Agents must move to goal location and signal simultaneously

Reward:
– +20 – both agents signal at the goal
– -50 – both agents signal at another location
– -100 – only one agent signals
– -1 – agents move north, south, east, west, or stop

Page 38: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

38

Factored Representations

Represent relationships among state variables instead of relationships among states

S = <X0, Y0, X1, Y1>

Each agent observes its own position

Page 39: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

39

Factored Representations

A Dynamic Decision Network models state variables over time (figure: the network for at = ⟨East, *⟩):

Page 40: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

40

Tree-structured Policies

Decision tree that branches over state variables

A tree-structured joint policy has joint actions at the leaves

Page 41: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

41

Approach [Roth et al., 2007]

Generate tree-structured joint policies for underlying centralized MDP

Use this joint policy to generate a tree-structured individual policy for each agent*

Execute individual policies

* See details in thesis document

Page 42: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

42

Context-specific Independence

Claim: In many multi-agent domains, one agent’s individual policy will have large sections where it is independent of variables that its teammates observe.

Page 43: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

43

Individual Policies

One agent’s individual policy may depend on state features it doesn’t observe

Page 44: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

44

Avoid Coordination Errors by Executing an Individual Factored Policy (ACE-IFP)

Robot traverses the policy tree according to its observations (a sketch of this traversal follows below)
– If it reaches a leaf, its action is independent of its teammates’ observations
– If it reaches a state variable that it does not observe directly, it must ask a teammate for the current value of that variable

The amount of communication needed to execute a particular policy corresponds to the amount of context-specific independence in that domain
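A sketch of that traversal, assuming a simple tree representation; the Node/Leaf classes and the query_teammate callback are illustrative, not the thesis implementation.

```python
class Leaf:
    def __init__(self, action):
        self.action = action              # individual action to execute

class Node:
    def __init__(self, variable, children):
        self.variable = variable          # state variable to branch on
        self.children = children          # dict: variable value -> subtree

def execute_factored_policy(tree, local_state, query_teammate):
    """Walk the individual policy tree; communicate only at unobserved variables."""
    node = tree
    while not isinstance(node, Leaf):
        if node.variable in local_state:
            value = local_state[node.variable]       # observed locally, no communication
        else:
            value = query_teammate(node.variable)    # ask a teammate for this variable's value
        node = node.children[value]
    return node.action
```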

Page 45: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

45

Avoid Coordination Errors by Executing an Individual Factored Policy (ACE-IFP)

Benefits:
– Agents can act independently without reasoning about the possible observations or actions of their teammates
– Policy directs agents about when, what, and with whom to communicate

Drawback:
– In domains with little independence, agents may need to communicate a lot

Page 46: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

46

Experimental Results

In 3x3 domain, executing factored policy required less than half as many messages as full communication, with same reward

Communication usage decreases relative to full communication as domain size increases

Results (means):
– Full Communication: reward 17.484, messages sent 7.032, variables sent 14.064
– Factored Execution: reward 17.484, messages sent 3.323, variables sent 6.646

Page 47: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

47

Factored Dec-POMDPs

[Hansen and Feng, 2000] looked at factored POMDPs
– ADD representations of transition, observation, and reward functions
– Policy is a finite-state controller
  Nodes are actions
  Transitions depend on conjunctions of state variable assignments

To extend this to Dec-POMDPs, make each individual policy a finite-state controller over individual actions
– Somehow combine nodes with the same action
– Communicate to enable transitions between action nodes

Page 48: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

48

Future Directions

Considering communication cost in ACE-IFP
– All children of a particular variable may have similar values
– Worst-case cost of mis-coordination?
– Modeling teammate variables requires reasoning about possible teammate actions

Extending factoring to Dec-POMDPs

Page 49: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

49

Future Directions

Knowledge persistence
– Modeling teammates’ variables
– Can we identify “necessary conditions”?
  e.g. “Tell me when you reach the goal.” instead of repeatedly asking “Are you here yet?”

Page 50: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

50

Contributions

Decentralized execution of centralized policies
– Guarantee that agents will Avoid Coordination Errors
– Make effective use of limited communication resources
  When should agents communicate?
  What should agents communicate?

Demonstrate significant communication savings in experimental domains

Page 51: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

51

Contributions

(Table: the three contributed algorithms, ACE-PJB-Comm, Selective ACE-PJB-Comm, and ACE-IFP, compared against the column headings AND, OR, Tell, Query, Sync, Cost, Unrestricted, ACE, When?, What?, and Who?. The per-column marks are not recoverable from this transcript; ACE-IFP's row includes a partial ("/") entry.)

Page 52: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

52

Thank You!

Advisors: Reid Simmons, Manuela Veloso
Committee: Carlos Guestrin, Jeff Schneider, Milind Tambe
RI folks: Suzanne, Alik, Damion, Doug, Drew, Frank, Harini, Jeremy, Jonathan, Kristen, Rachel (and many others!)
Aba, Ema, Nitzan, Yoel

Page 53: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

53

References

Roth, M., Simmons, R., and Veloso, M. “Reasoning About Joint Beliefs for Execution-Time Communication Decisions” In AAMAS, 2005

Roth, M., Simmons, R., and Veloso, M. “What to Communicate? Execution-Time Decisions in Multi-agent POMDPs” In DARS, 2006

Roth, M., Simmons, R., and Veloso, M. “Exploiting Factored Representations for Decentralized Execution in Multi-agent Teams” In AAMAS, 2007

Bernstein, D., Zilberstein, S., and Immerman, N. “The Complexity of Decentralized Control of Markov Decision Processes” In UAI, 2000

Pynadath, D. and Tambe, M. “The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models” In JAIR, 2002

Becker, R., Zilberstein, S., Lesser, V., and Goldman, C. “Transition-independent Decentralized Markov Decision Processes” In AAMAS, 2003

Nair, R., Roth, M., Yokoo, M., and Tambe, M. “Communication for Improving Policy Computation in Distributed POMDPs” In IJCAI, 2003

Page 54: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

54

Tiger Domain Details

Reward (Action / State: SL, SR):
– ⟨OpenR, OpenR⟩: +20, -50
– ⟨OpenL, OpenL⟩: -50, +20
– ⟨OpenR, OpenL⟩: -100, -100
– ⟨OpenL, OpenR⟩: -100, -100
– ⟨Listen, Listen⟩: -2, -2
– ⟨Listen, OpenR⟩: +9, -101
– ⟨Listen, OpenL⟩: +9, -101
– ⟨OpenR, Listen⟩: -101, +9
– ⟨OpenL, Listen⟩: -101, +9

Transitions (Action / Transition: SL→SL, SL→SR, SR→SL, SR→SR):
– ⟨OpenR, *⟩: 0.5, 0.5, 0.5, 0.5
– ⟨OpenL, *⟩: 0.5, 0.5, 0.5, 0.5
– ⟨*, OpenR⟩: 0.5, 0.5, 0.5, 0.5
– ⟨*, OpenL⟩: 0.5, 0.5, 0.5, 0.5
– ⟨Listen, Listen⟩: 1.0, 0.0, 0.0, 1.0

Observations (Action, State: P(HL), P(HR)):
– ⟨Listen, Listen⟩, SL: 0.7, 0.3
– ⟨Listen, Listen⟩, SR: 0.3, 0.7
– ⟨OpenR, *⟩, *: 0.5, 0.5
– ⟨OpenL, *⟩, *: 0.5, 0.5
– ⟨*, OpenR⟩, *: 0.5, 0.5
– ⟨*, OpenL⟩, *: 0.5, 0.5

Page 55: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

55

Particle filter representation

Each particle is a possible joint belief

Each agent maintains two particle filters:
– Ljoint: possible joint team beliefs
– Lown: possible joint beliefs that are consistent with the local observation history

Compare the action selected by Q-POMDP over Ljoint to the action selected over Lown, and communicate as needed (sketched below)
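A sketch of that comparison, with each particle stored as a (belief, probability, history) tuple and q_value an assumed centralized Q-function; names are illustrative.

```python
def communication_check(L_joint, L_own, joint_actions, q_value, epsilon):
    """Return True if this agent's local information should be communicated."""
    def value(leaves, a):
        return sum(p * q_value(b, a) for b, p, _ in leaves)
    def best(leaves):
        return max(joint_actions, key=lambda a: value(leaves, a))

    a_joint = best(L_joint)   # action the synchronized team would take
    a_own = best(L_own)       # action suggested by this agent's local knowledge

    # communicate only if local information changes the action and the gain is worth it
    return a_own != a_joint and (value(L_own, a_own) - value(L_own, a_joint)) > epsilon
```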

Page 56: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

56

Related Work: Transition Independence [Becker, Zilberstein, Lesser, Goldman, 2003]

Dec-MDP – collective observability

Transition independence:
– Local state transitions
  Each agent observes its local state
  Individual actions only affect local state transitions
– Team connected through joint reward

Coverage set algorithm – finds the optimal policy quickly in experimental domains

No communication

Page 57: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

57

Related Work: COMM-JESP [Nair, Roth, Yokoo, Tambe, 2004]

Add a SYNC action to the domain
– If one agent chooses SYNC, all other agents SYNC
– At SYNC, send the entire observation history since the last SYNC

SYNC brings agents to a synchronized belief over world states

Policies are indexed by the root synchronized belief and the observation history since the last SYNC

(Figure: belief tree. t=0: (SL () 0.5)(SR () 0.5). After a = {Listen, Listen}, one branch per observation ω = HL or ω = HR, with beliefs (SL (HR) 0.1275)(SL (HL) 0.7225)(SR (HR) 0.1275)(SR (HL) 0.0225) and (SL (HR) 0.0225)(SL (HL) 0.1275)(SR (HR) 0.7225)(SR (HL) 0.1275). After a = SYNC, the t=2 synchronized beliefs are (SL () 0.97)(SR () 0.03) and (SL () 0.5)(SR () 0.5).)

“At-most K” heuristic – there must be a SYNC within at most K timesteps

Page 58: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

58

Related Work: “No news is good news” [Xuan, Lesser, Zilberstein, 2000]

Applies to transition-independent Dec-MDPs

Agents form a joint plan
– “plan”: exact path to be followed to accomplish the goal

Communicate when a deviation from the plan occurs
– an agent sees it has slipped from the optimal path
– it communicates the need for re-planning

Page 59: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

59

Related Work: BaGA-Comm [Emery-Montemerlo, 2005]

Each agent has a type
– Observation and action history

Agents model the distribution of possible joint types
– Choose actions by finding the joint type closest to their own local type
– Allows coordination errors

Communicate if the gain in expected reward is greater than the cost of communication

Page 60: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

60

Colorado/Wyoming Domain

Robots must meet in the capital, but do not know if they are in Colorado or Wyoming

Robots receive positive reward of +20 only if they SIGNAL simultaneously from correct goal location

To simplify problem, each robot knows both own and teammate position

(Figure: maps of Colorado and Wyoming; the marked location is the capital.)

Page 61: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

61

Noisy observations – mountain (Mt), plain (Pl), Pikes Peak (PP), Old Faithful (Of)

Communication can help the team reach the goal more efficiently

(Figure: maps marking Pike's Peak, Old Faithful, and the possible goal locations.)

Observation probabilities (State: Mt, Pl, PP, Of):
– C (Colorado): 0.7, 0.1, 0.19, 0.01
– W (Wyoming): 0.1, 0.7, 0.01, 0.19

Page 62: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

62

Build-Message: What to Communicate

First, determine whether communication is necessary
– Calculate aC using ACE-PJB-Comm
– If aC = aNC, do not communicate

Greedily build the message
– “Hill-climbing” towards aC, away from aNC
– Choose the single observation that most increases the difference between the Q-POMDP values of aC and aNC

(Example: the agent's local observation history is Mt, Pl, Mt, Pike.)

Page 63: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

63

Build-Message: What to Communicate

Is communication necessary?

(The agent's observation history: Mt, Pl, Mt, Pike.)

aNC = [east, south]
aC = [east, west]

aC ≠ aNC, so the agent should communicate

Page 64: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

64

Build-Message: What to Communicate

Mt

AC = [east, west] - “toward Denver”

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 0.5 1

P(State = Colorado)

Distribution if agent communicates entire observation history

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 0.5 1

P(State = Colorado)

Mt

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 0.5 1

P(State = Colorado)

Mt

PlPl

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 0.5 1

P(State = Colorado)

Pike

Page 65: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

65

Build-Message: What to Communicate

(Figure: histogram of P(State = Colorado) after communicating Pike. The agent's observation history: Mt, Pl, Mt, Pike.)

aC = [east, west] – “toward Denver”

– Pike is the single best observation
– In this case, Pike is sufficient to change the joint action to aC, so the agent communicates only one observation

m = {Pike}

Page 66: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

66

Context-specific Independence

A variable may be independent of a parent variable in some contexts but not others
– e.g. X2 depends on X3 when X1 has value 1, but is independent otherwise

Claim – many multi-agent domains exhibit a large amount of context-specific independence

Page 67: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

67

Constructing Individual Factored Policies

[Boutilier et al., 2000] defined Merge and Simplify operations for policy trees

We want to construct trees that maximize context-specific independence
– Depends on the variable ordering in the policy
– We define Intersect and Independent operations

Page 68: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

68

Intersect

Find the intersection of the action sets of a node’s children

1. If all children are leaves, and their action sets have a non-empty intersection, replace the node with the intersection
2. If all but one child is a leaf, and all the actions in the non-leaf child’s subtree are included in the leaf children’s intersection, replace the node with the non-leaf child
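A sketch of case 1 of the Intersect operation, on a simple tree representation where each leaf stores the set of equally good actions; the classes and recursion are illustrative, not the thesis implementation.

```python
class Leaf:
    def __init__(self, actions):
        self.actions = set(actions)       # actions that are optimal at this leaf

class Node:
    def __init__(self, variable, children):
        self.variable = variable          # state variable to branch on
        self.children = children          # dict: value -> Leaf or Node

def intersect(node):
    """Bottom-up pass: collapse a node whose leaf children share common actions."""
    if isinstance(node, Leaf):
        return node
    children = {v: intersect(c) for v, c in node.children.items()}
    if all(isinstance(c, Leaf) for c in children.values()):
        common = set.intersection(*(c.actions for c in children.values()))
        if common:
            return Leaf(common)           # case 1: replace the node with the intersection
    return Node(node.variable, children)
```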

Page 69: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

69

Independent

An individual action is Independent in a particular leaf of a policy tree if it is optimal when paired with any action its teammate could perform at that leaf

(Figure: two example policy trees – in one, a is independent for agent 1; in the other, agent 1 has no independent actions.)

Page 70: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

70

Generate Individual Policies

Generate a tree-structured joint policy

For each agent:
– Reorder variables in the joint policy so that variables local to this agent are near the root
– For each leaf in the policy, find the Independent actions
– Break ties among the remaining joint actions
– Convert joint actions to individual actions
– Intersect and Simplify

Page 71: Execution-Time Communication Decisions for Coordination of       Multi-Agent Teams

71

Why Break Ties?

Ensure agents select the same optimal joint action to prevent mis-coordination