Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs Rosemary Emery-Montemerlo joint work with Geoff Gordon, Jeff Schneider and Sebastian Thrun July 21, 2004 AAMAS 2004
Page 1: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Rosemary Emery-Montemerlo

joint work with

Geoff Gordon, Jeff Schneider and Sebastian Thrun

July 21, 2004 AAMAS 2004

Page 3: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Robot Teams

With limited communication, existing paradigms for decentralized robot control are not sufficient

Game theoretic methods are necessary for multi-robot coordination under these conditions

Page 4: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Decentralized Decision Making

Page 13: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Decentralized Decision Making

A robot cannot choose actions based only on joint observations consistent with its own sensor readings

It must consider all joint observations that are consistent with its possible sensor readings

Page 14: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Relationship Between Decision Theoretic Models

[Figure: diagram relating MDP, POMDP, and a third model marked "?". Labels: State Space, Belief Space, Distribution over Belief Space]

Page 15: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Models of Multi-Agent Systems

Partially observable stochastic games
  Generalization of stochastic games to partially observable worlds

Related models
  DEC-POMDP [Bernstein et al., 2000]
  MTDP [Pynadath and Tambe, 2002]
  I-POMDP [Gmytrasiewicz and Doshi, 2004]
  POIPSG [Peshkin et al., 2000]

Page 16: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Partially Observable Stochastic Games

POSG = $\{I, S, A, Z, T, R, O\}$
  $I$ is the set of agents, $I = \{1, \ldots, n\}$
  $S$ is the set of states
  $A$ is the set of actions, $A = A_1 \times \cdots \times A_n$
  $Z$ is the set of observations, $Z = Z_1 \times \cdots \times Z_n$
  $T$ is the transition function, $T: S \times A \to S$
  $R$ is the reward function, $R: S \times A \to \mathbb{R}$
  $O$ are the observation emission probabilities, $O: S \times Z \times A \to [0, 1]$

Page 17: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Solving POSGs

POSGs are computationally infeasible to solve

Page 18: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Solving POSGs

Full POSG

One-Step Lookahead Game at time t (Bayesian Game)

We can approximate a POSG as a series of smaller Bayesian games

Page 19: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Bayesian Games

Private information relevant to game
  Uncertainty in utility

Type
  Encapsulates private information
  Will limit selves to games with finite number of types

In robot example
  Type 1: Robot doesn’t see anything
  Type 2: Robot sees intruder at location x

Page 20: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Bayesian Games

BG = $\{I, \Theta, A, p(\Theta), u\}$
  $\Theta$ is the joint type space, $\Theta = \Theta_1 \times \cdots \times \Theta_n$
  $\theta$ is a specific joint type, $\theta = \{\theta_1, \ldots, \theta_n\}$
  $p(\Theta)$ is common prior on the distribution over $\Theta$
  $u$ is the utility function, $u = \{u_1, \ldots, u_n\}$
    $u_i(a_i, a_{-i}, (\theta_i, \theta_{-i}))$
  $\sigma_i$ is a strategy for player i
    Defines what player i does for each of its possible types
    Actions are individual actions, not joint actions
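As a hedged illustration of how the pieces of the tuple fit together, the expected utility of player i under a joint strategy can be written as below. The symbol names ($\Theta$ for the joint type space, $\theta$ for a joint type, $\sigma$ for a strategy) and the name $EU_i$ are assumed here for illustration; this is the standard textbook form, not a formula taken from the slides.

```latex
EU_i(\sigma_i, \sigma_{-i}) \;=\; \sum_{\theta \in \Theta} p(\theta)\,
  u_i\big(\sigma_i(\theta_i),\; \sigma_{-i}(\theta_{-i}),\; (\theta_i, \theta_{-i})\big)
```

Each player's action depends only on its own type $\theta_i$, which is why the sum runs over joint types while the strategies are indexed by individual types.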

Page 21: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Bayesian-Nash Equilibrium

Set of best response strategies
Each agent tries to maximize its expected utility conditioned on its probability distribution over the other agents’ types, $p(\Theta)$
Each agent has a policy $\sigma_i$ that, given $\sigma_{-i}$, maximizes $u_i(\sigma_i, \sigma_{-i}, \theta_{-i})$
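One way to search for such a set of mutual best responses in a common-payoff game is alternating maximization: hold every other agent's strategy fixed, improve one agent's action for one of its types, and repeat until nothing changes. The sketch below is a minimal illustration under that assumption. The slides do not say which solver is used, and the function names, the toy game, and the data layout (`types`, `actions`, `prior`, `utility`) are all hypothetical.

```python
def expected_utility(strategies, prior, utility):
    """Expected common payoff of a joint strategy.

    strategies[i][type_i] -> action of agent i for that type
    prior[joint_type]     -> probability of the joint type (a tuple)
    utility(joint_action, joint_type) -> payoff shared by all agents
    """
    total = 0.0
    for joint_type, prob in prior.items():
        joint_action = tuple(s[t] for s, t in zip(strategies, joint_type))
        total += prob * utility(joint_action, joint_type)
    return total


def alternating_maximization(types, actions, prior, utility, max_sweeps=100):
    """Local search for mutual best responses (assumed solver, not from the slides)."""
    n = len(types)
    # Start with every type of every agent playing its first action.
    strategies = [{t: actions[i][0] for t in types[i]} for i in range(n)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            for t_i in types[i]:
                def value(a):
                    # Payoff mass from joint types in which agent i has type t_i,
                    # with all other agents' strategies held fixed.
                    v = 0.0
                    for joint_type, prob in prior.items():
                        if joint_type[i] != t_i:
                            continue
                        joint_action = tuple(
                            a if k == i else strategies[k][joint_type[k]]
                            for k in range(n)
                        )
                        v += prob * utility(joint_action, joint_type)
                    return v

                best = max(actions[i], key=value)
                # Switch only on strict improvement so the search terminates.
                if value(best) > value(strategies[i][t_i]) + 1e-12:
                    strategies[i][t_i] = best
                    changed = True
        if not changed:
            break
    return strategies


# Toy game: each agent earns 1 for the team when it plays "R" on type 1
# and "L" on type 0. Types are independent and uniform.
types = [[0, 1], [0, 1]]
actions = [["L", "R"], ["L", "R"]]
prior = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}


def utility(joint_action, joint_type):
    return float(sum(
        1 for a, t in zip(joint_action, joint_type)
        if a == ("R" if t == 1 else "L")
    ))


sigma = alternating_maximization(types, actions, prior, utility)
print(sigma, expected_utility(sigma, prior, utility))
```

Because all agents share one payoff, each accepted change strictly raises the team's expected utility, so the loop stops at a point where no single agent can improve on its own. That point is only a local optimum in general.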

Page 22: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

POSG to Bayesian Game Approximation

$\{I, S, A, Z, T, R, O\}$ to $\{I, \Theta, A, p(\Theta), u\}^t$
  $I = I$
  $A = A$
  Type space $\Theta_i^t$ = all possible histories of agent i’s actions and observations up to time t
  $p(\Theta)^t$ calculated from $S_0, A, T, Z, O, \Theta^{t-1}$
    Prune low probability types
    Each joint type maps to a joint belief
  $u$ given by heuristic and $u_i = u_j$
    QMDP
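The two bookkeeping steps named on this slide, pruning low-probability types and scoring joint actions with the QMDP heuristic, can be sketched as follows. This is an illustration under assumptions: QMDP is taken in its usual form (the belief-weighted average of the fully observable MDP's Q-values), renormalizing after pruning is an assumed choice, and the function names, threshold, and toy numbers are hypothetical.

```python
def qmdp_utility(belief, q_mdp, joint_action):
    """QMDP score of a joint action under the joint belief of a joint type.

    belief[state]               -> probability of that state
    q_mdp[(state, joint_action)] -> Q-value of the underlying fully observable MDP
    """
    return sum(p * q_mdp[(s, joint_action)] for s, p in belief.items())


def prune_types(prior, threshold):
    """Drop joint types with probability below threshold, then renormalize.

    Renormalizing is an assumption made for this sketch.
    """
    kept = {theta: p for theta, p in prior.items() if p >= threshold}
    total = sum(kept.values())
    return {theta: p / total for theta, p in kept.items()}


# Toy numbers, chosen only to make the arithmetic easy to follow.
belief = {"s0": 0.75, "s1": 0.25}
q_mdp = {
    ("s0", ("N", "N")): 4.0, ("s1", ("N", "N")): 0.0,
    ("s0", ("N", "S")): 1.0, ("s1", ("N", "S")): 5.0,
}
u_nn = qmdp_utility(belief, q_mdp, ("N", "N"))  # 0.75 * 4.0 + 0.25 * 0.0
u_ns = qmdp_utility(belief, q_mdp, ("N", "S"))  # 0.75 * 1.0 + 0.25 * 5.0

prior = {("a", "x"): 0.5, ("a", "y"): 0.25, ("b", "x"): 0.125, ("b", "y"): 0.125}
pruned = prune_types(prior, threshold=0.2)
```

Pruning keeps the type space from growing with every possible history, at the cost of ignoring histories the model considers unlikely.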

Page 23: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Algorithm

Agent i (agent j runs the same steps in parallel, with j in place of i)

Initialize: $t = 0$, $h_i = \{\}$, $p(\Theta^0)$; $\sigma^0 = \text{solveGame}(\Theta^0, p(\Theta^0))$
Make Observation: $h_i = obs_i^t \cup a_i^{t-1} \cup h_i$
Determine Type: $\theta_i^t = \text{bestMatch}(h_i, \Theta_i^t)$
Execute Action: $a_i^t = \sigma_i^t(\theta_i^t)$
Propagate Forward: $\Theta^{t+1}$, $p(\Theta^{t+1})$
Find Policy for t+1: $\sigma^{t+1} = \text{solveGame}(\Theta^{t+1}, p(\Theta^{t+1}))$; $t = t + 1$
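The per-agent loop on this slide can be sketched as a small driver function. This is a control-flow illustration only: `solve_game`, `propagate`, `best_match`, `observe`, and `execute` are hypothetical hooks standing in for the slide's solveGame, forward propagation, and bestMatch steps, and the stub implementations below do nothing beyond exercising the loop.

```python
def run_agent(i, horizon, types0, prior0,
              solve_game, propagate, best_match, observe, execute):
    """Run one agent's loop for `horizon` steps and return the actions taken."""
    types, prior = types0, prior0
    policy = solve_game(types, prior)           # Initialize: policy for t = 0
    history = []
    last_action = None
    taken = []
    for t in range(horizon):
        obs = observe(t)                        # Make Observation
        history = history + [(last_action, obs)]
        my_type = best_match(history, types[i])  # Determine Type
        action = policy[i][my_type]             # Execute Action
        execute(action)
        taken.append(action)
        last_action = action
        types, prior = propagate(types, prior, policy)  # Propagate Forward
        policy = solve_game(types, prior)       # Find Policy for t + 1
    return taken


# Stub hooks: one type per agent, and a policy that always moves north.
def solve_game(types, prior):
    return [{t: "N" for t in agent_types} for agent_types in types]


def propagate(types, prior, policy):
    return types, prior


def best_match(history, agent_types):
    return agent_types[0]


log = []
actions_taken = run_agent(
    i=0, horizon=3,
    types0=[["t0"], ["t0"]], prior0={("t0", "t0"): 1.0},
    solve_game=solve_game, propagate=propagate, best_match=best_match,
    observe=lambda t: None, execute=log.append,
)
```

In this sketch the type space, prior, and policy are recomputed from shared quantities only, while the agent's private history enters only through `best_match`. That mirrors the slide, where both agents run identical steps side by side.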

Page 24: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Robotic Team Tag

Version of Team Tag
  Environment is portion of Gates Hall
  Full teammate observability
  Opponent can be captured by a single robot in any state

QMDP used as heuristic

Two pioneer-class robots

Page 25: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Robot Policies

Page 26: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Lady And The Tiger [Nair et al. 2003]

[Figure: Computation Time. x-axis: Horizon (3 to 10); y-axis: Time (ms), 0 to 1,000,000; series: Full POSG, Bayesian Game Approximation]

Page 27: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Contributions

Algorithm for finding approximate solutions to POSG with common payoffs
Tractability achieved by modeling POSG as a sequence of Bayesian games
Performs comparably to the full POSG for a small finite-horizon problem
Improved performance over ‘blind’ application of utility heuristic in more complex problems
Successful real-time game-theoretic controller for indoor robots

Page 28: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Questions?

[email protected] www.cs.cmu.edu/~remery

Page 29: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Back-Up Slides

Page 30: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Policy Performance

Lady And The Tiger [Nair et al. 2003]

[Figure: x-axis: Horizon (3 to 10); y-axis: Expected or Average Reward (-80 to 20); series: Full POSG, Bayesian Game Approximation, Selfish Policy]

Page 31: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Robotic Team Tag

$I = \{1, 2\}$
$S = S_1 \times S_2 \times S_{opponent}$
  $S_i = \{s_0, \ldots, s_{28}\}$, $S_{opponent} = \{s_0, \ldots, s_{28}, s_{tagged}\}$
  $|S| = 25230$
$A_i = \{N, S, E, W, Tag\}$
$Z_i = [\{s_i, -1\}, s_{-i}, a_{-i}]$
T: adjacent cells
O: see opponent if on same cell
R: minimize capture time
Modified from [Pineau et al. 2003]
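The state count on this slide follows directly from the set definitions, as the short check below shows. The variable names are illustrative, and the joint action count is derived here from the five individual actions, not quoted from the slide.

```python
robot_cells = 29                 # s0 .. s28 for each of the two robots
opponent_states = 29 + 1         # s0 .. s28 plus the tagged state
num_states = robot_cells * robot_cells * opponent_states

individual_actions = 5           # N, S, E, W, Tag
num_joint_actions = individual_actions ** 2   # two robots acting together

print(num_states, num_joint_actions)
```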

Page 32: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Environment

Page 33: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Robotic Team Tag Results

[Figure: Performance. y-axis: Average Discounted Value (-60 to 0); groups: Full Observability of Teammate's Position, Without Full Observability of Teammate's Position; series: Full Observability, Most Likely State, QMDP, Bayesian Approximation]

Page 34: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Robotic Team Tag Results

[Figure: Performance. y-axis: Average Timesteps (0 to 100); groups: Full Observability of Teammate's Position, Without Full Observability of Teammate's Position; series: Full Observability, Most Likely State, QMDP, Bayesian Approximation]