Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Rosemary Emery-Montemerlo

joint work with

Geoff Gordon, Jeff Schneider and Sebastian Thrun

July 21, 2004 AAMAS 2004

Robot Teams

With limited communication, existing paradigms for decentralized robot control are not sufficient

Game theoretic methods are necessary for multi-robot coordination under these conditions

Decentralized Decision Making

A robot cannot choose actions based only on joint observations consistent with its own sensor readings

It must consider all joint observations that are consistent with its possible sensor readings

Relationship Between Decision Theoretic Models

[Diagram: MDP (state space), POMDP (belief space), and "?" for the multi-agent case (distribution over belief space)]

Models of Multi-Agent Systems

Partially observable stochastic games
  Generalization of stochastic games to partially observable worlds
Related models
  DEC-POMDP [Bernstein et al., 2000]
  MTDP [Pynadath and Tambe, 2002]
  I-POMDP [Gmytrasiewicz and Doshi, 2004]
  POIPSG [Peshkin et al., 2000]

Partially Observable Stochastic Games

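$\text{POSG} = \{I, S, A, Z, T, R, O\}$
  $I$ is the set of agents, $I = \{1, \ldots, n\}$
  $S$ is the set of states
  $A$ is the set of actions, $A = A_1 \times \cdots \times A_n$
  $Z$ is the set of observations, $Z = Z_1 \times \cdots \times Z_n$
  $T$ is the transition function, $T: S \times A \to S$
  $R$ is the reward function, $R: S \times A \to \mathbb{R}$
  $O$ are the observation emission probabilities, $O: S \times Z \times A \to [0,1]$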

Solving POSGs

Solving POSGs exactly is computationally infeasible for all but very small problems

[Diagram: the full POSG next to the one-step lookahead game at time $t$ (a Bayesian game)]

We can approximate a POSG as a series of smaller Bayesian games

Bayesian Games

Private information relevant to game
  Uncertainty in utility
Type
  Encapsulates private information
  We limit ourselves to games with a finite number of types
In robot example
  Type 1: Robot doesn’t see anything
  Type 2: Robot sees intruder at location x

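Bayesian Games

$BG = \{I, \Theta, A, p(\Theta), u\}$
  $\Theta$ is the joint type space, $\Theta = \Theta_1 \times \cdots \times \Theta_n$
  $\theta$ is a specific joint type, $\theta = \{\theta_1, \ldots, \theta_n\}$
  $p(\Theta)$ is the common prior on the distribution over $\Theta$
  $u$ is the utility function, $u = \{u_1, \ldots, u_n\}$
    $u_i(a_i, a_{-i}, (\theta_i, \theta_{-i}))$
$\sigma_i$ is a strategy for player $i$
  Defines what player $i$ does for each of its possible types
  Actions are individual actions, not joint actions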
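As a reading aid, and not a formula taken from the slides, the pieces above combine in the standard way: when every player follows its strategy, the expected utility for player $i$ is the utility of the induced joint action averaged over the common prior on joint types.

```latex
\[
  \mathbb{E}[u_i \mid \sigma]
  \;=\; \sum_{\theta \in \Theta} p(\theta)\,
        u_i\bigl(\sigma_i(\theta_i),\, \sigma_{-i}(\theta_{-i}),\, (\theta_i, \theta_{-i})\bigr)
\]
```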

Bayesian-Nash Equilibrium

Set of best response strategies
Each agent tries to maximize its expected utility conditioned on its probability distribution over the other agents’ types, $p(\Theta)$
Each agent has a policy $\sigma_i$ that, given $\sigma_{-i}$, maximizes $u_i(\sigma_i, \sigma_{-i}, \theta_{-i})$
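A minimal sketch of how a set of mutual best responses might be found in a two-agent game where both agents share one payoff. It uses alternating maximization (fix one agent's strategy, optimize the other's, repeat), which is an assumption here since the slides do not say how the game is solved. The function names, the dictionary layout, and the toy patrol/chase game are all hypothetical.

```python
def expected_utility(prior, u, sigma1, sigma2):
    """Expected shared payoff of a strategy pair under the common prior.

    prior: dict mapping joint type (th1, th2) -> probability
    u: function u(a1, a2, th1, th2) -> shared payoff
    sigma1, sigma2: dicts mapping each agent's type -> its action
    """
    return sum(p * u(sigma1[th1], sigma2[th2], th1, th2)
               for (th1, th2), p in prior.items())


def best_response(prior, u, types, actions, other_sigma, agent):
    """For each type of `agent` (0 or 1), pick the action with the highest
    prior-weighted payoff, holding the other agent's strategy fixed."""
    sigma = {}
    for th in types:
        def value(a, th=th):
            total = 0.0
            for (th1, th2), p in prior.items():
                if agent == 0 and th1 == th:
                    total += p * u(a, other_sigma[th2], th1, th2)
                elif agent == 1 and th2 == th:
                    total += p * u(other_sigma[th1], a, th1, th2)
            return total
        sigma[th] = max(actions, key=value)
    return sigma


def alternating_maximization(prior, u, types1, types2, actions1, actions2,
                             max_iters=100):
    """Alternate best responses until neither strategy changes."""
    sigma1 = {th: actions1[0] for th in types1}
    sigma2 = {th: actions2[0] for th in types2}
    for _ in range(max_iters):
        new1 = best_response(prior, u, types1, actions1, sigma2, agent=0)
        new2 = best_response(prior, u, types2, actions2, new1, agent=1)
        if new1 == sigma1 and new2 == sigma2:
            break
        sigma1, sigma2 = new1, new2
    return sigma1, sigma2


# Hypothetical toy game: each robot either sees nothing or sees an intruder.
TYPES = ["nothing", "intruder"]
ACTIONS = ["patrol", "chase"]
WANT = {"nothing": "patrol", "intruder": "chase"}
PRIOR = {(t1, t2): 0.25 for t1 in TYPES for t2 in TYPES}


def toy_u(a1, a2, th1, th2):
    # Shared payoff of 1 only when both robots act appropriately for their type.
    return 1.0 if a1 == WANT[th1] and a2 == WANT[th2] else 0.0


s1, s2 = alternating_maximization(PRIOR, toy_u, TYPES, TYPES, ACTIONS, ACTIONS)
```

Because the payoff is shared, each best-response step cannot lower the expected utility, so the iteration settles on a pair of strategies where neither agent gains by deviating alone. That pair is a local optimum and need not be the best joint strategy.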

POSG to Bayesian Game Approximation

$\{I, S, A, Z, T, R, O\}$ to $\{I, \Theta, A, p(\Theta), u\}^t$
  $I = I$
  $A = A$
  Type space $\Theta_i^t$ = all possible histories of agent $i$’s actions and observations up to time $t$
  $p(\Theta)^t$ calculated from $S_0, A, T, Z, O, \Theta^{t-1}$
    Prune low probability types
    Each joint type maps to a joint belief
  $u$ given by heuristic and $u_i = u_j$
    QMDP
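A minimal sketch of the QMDP heuristic named above, assuming its usual definition: the value of a joint action at a belief is the belief-weighted average of the Q-values of the underlying fully observable MDP. Since each joint type maps to a joint belief, a value like this could serve as the shared utility of a joint action at that joint type. The function names, the dictionary layout, and the toy numbers are hypothetical.

```python
def qmdp_value(belief, q_mdp, joint_action):
    """Belief-weighted average of fully observable MDP Q-values.

    belief: dict mapping state -> probability
    q_mdp: dict mapping (state, joint_action) -> Q-value of the underlying MDP
    """
    return sum(p * q_mdp[(s, joint_action)] for s, p in belief.items())


def qmdp_best_action(belief, q_mdp, joint_actions):
    """Joint action with the highest QMDP value at this belief."""
    return max(joint_actions, key=lambda a: qmdp_value(belief, q_mdp, a))


# Hypothetical two-state example.
toy_belief = {"s0": 0.5, "s1": 0.5}
toy_q = {("s0", "a"): 2.0, ("s0", "b"): 0.0,
         ("s1", "a"): 0.0, ("s1", "b"): 4.0}
toy_choice = qmdp_best_action(toy_belief, toy_q, ["a", "b"])
```

QMDP treats the state as if it will become fully observable after one step, so it tends to undervalue information-gathering actions.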

Algorithm

Each agent runs the same loop in parallel. The steps for agent $i$ are shown; agent $j$ runs identical steps with $i$ replaced by $j$.

Initialize: $t = 0$, $h_i = \{\}$, $p(\Theta^0)$, $\sigma^0 = \text{solveGame}(\Theta^0, p(\Theta^0))$
Make Observation: $h_i = \text{obs}_i^t \cup a_i^{t-1} \cup h_i$
Determine Type: $\theta_i^t = \text{bestMatch}(h_i, \Theta_i^t)$
Execute Action: $a_i^t = \sigma_i^t(\theta_i^t)$
Propagate Forward: $\Theta^{t+1}$, $p(\Theta^{t+1})$
Find Policy for $t+1$: $\sigma^{t+1} = \text{solveGame}(\Theta^{t+1}, p(\Theta^{t+1}))$, $t = t+1$
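The per-agent loop can be sketched as a skeleton in which every problem-specific step is passed in as a function. This is an illustration of the control flow only: `run_agent` and all of its parameters are hypothetical names, and passing the current policy to `propagate` is an assumption about what the forward step needs.

```python
def run_agent(agent, horizon, types0, prior0,
              solve_game, propagate, best_match, observe, act):
    """Control loop for one agent; every agent runs its own copy.

    All callables are hypothetical placeholders:
      solve_game(types, prior) -> policy, a dict agent -> dict type -> action
      propagate(types, prior, policy) -> (next_types, next_prior)
      best_match(history, agent_types) -> type closest to the local history
      observe(t) -> this agent's observation at time t
      act(action) -> executes the action
    """
    history = []                      # local action/observation history h_i
    types, prior = types0, prior0
    policy = solve_game(types, prior)  # policy for time 0
    last_action = None
    executed = []
    for t in range(horizon):
        # Make observation and extend the local history.
        history.append((last_action, observe(t)))
        # Determine which retained type best matches the history.
        my_type = best_match(history, types[agent])
        # Execute the action the current policy assigns to that type.
        last_action = policy[agent][my_type]
        act(last_action)
        executed.append(last_action)
        # Propagate the type space forward and solve the next game.
        types, prior = propagate(types, prior, policy)
        policy = solve_game(types, prior)
    return executed
```

Because every agent starts from the same prior and runs the same deterministic `solve_game` and `propagate`, the agents can stay synchronized on the joint policy without communicating; only `observe` and `best_match` depend on private information.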

Robotic Team Tag

Version of Team Tag
  Environment is portion of Gates Hall
  Full teammate observability
  Opponent can be captured by a single robot in any state
QMDP used as heuristic
Two Pioneer-class robots

Robot Policies

Lady And The Tiger [Nair et al. 2003]

[Plot: Computation Time. x-axis: Horizon (3 to 10); y-axis: Time (ms), 0 to 1,000,000; series: Full POSG, Bayesian Game Approximation]

Contributions

Algorithm for finding approximate solutions to POSGs with common payoffs
  Tractability achieved by modeling the POSG as a sequence of Bayesian games
  Performs comparably to the full POSG for a small finite-horizon problem
  Improved performance over ‘blind’ application of utility heuristic in more complex problems
Successful real-time game-theoretic controller for indoor robots

Questions?

remery@cs.cmu.edu www.cs.cmu.edu/~remery

Back-Up Slides

Policy Performance

Lady And The Tiger [Nair et al. 2003]

[Plot: x-axis: Horizon (3 to 10); y-axis: Expected or Average Reward (-80 to 20); series: Full POSG, Bayesian Game Approximation, Selfish Policy]

Robotic Team Tag

$I = \{1, 2\}$
$S = S_1 \times S_2 \times S_{opponent}$
  $S_i = \{s_0, \ldots, s_{28}\}$, $S_{opponent} = \{s_0, \ldots, s_{28}, s_{tagged}\}$
  $|S| = 25230$
$A_i = \{N, S, E, W, Tag\}$
$Z_i = [\{s_i, -1\}, s_{-i}, a_{-i}]$
$T$: adjacent cells
$O$: see opponent if on same cell
$R$: minimize capture time
Modified from [Pineau et al. 2003]
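The state count follows from the product structure: 29 cells for each robot and 30 opponent states (29 cells plus the tagged state). A short check that enumerates the joint states, with hypothetical variable names and state labels:

```python
from itertools import product

robot_cells = [f"s{k}" for k in range(29)]      # s0 .. s28
opponent_states = robot_cells + ["s_tagged"]    # s0 .. s28 plus the tagged state

# Joint state = (robot 1 cell, robot 2 cell, opponent state).
joint_states = list(product(robot_cells, robot_cells, opponent_states))
num_states = len(joint_states)                  # 29 * 29 * 30
```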

Environment

Robotic Team Tag Results

[Bar chart: Performance. y-axis: Average Discounted Value (-60 to 0); groups: Full Observability of Teammate's Position, Without Full Observability of Teammate's Position; series: Full Observability, Most Likely State, QMDP, Bayesian Approximation]

Robotic Team Tag Results

[Bar chart: Performance. y-axis: Average Timesteps (0 to 100); groups: Full Observability of Teammate's Position, Without Full Observability of Teammate's Position; series: Full Observability, Most Likely State, QMDP, Bayesian Approximation]
