Convergent Learning in Unknown Graphical Games Dr Archie Chapman, Dr David Leslie, Dr Alex Rogers and Prof Nick Jennings School of Mathematics, University of Bristol and School of Electronics and Computer Science University of Southampton [email protected]

Dec 13, 2015

Page 1

Convergent Learning in Unknown Graphical Games 

Dr Archie Chapman, Dr David Leslie, Dr Alex Rogers and Prof Nick Jennings

School of Mathematics, University of Bristol, and School of Electronics and Computer Science, University of Southampton

[email protected]

Page 2

Playing games?

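            Rock       Paper      Scissors
Rock        (0, 0)     (-1, 1)    (1, -1)
Paper       (1, -1)    (0, 0)     (-1, 1)
Scissors    (-1, 1)    (1, -1)    (0, 0)

Rows are the first player's action, columns the second player's action; each entry is (reward to first player, reward to second player).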

Page 3

Playing games?

Page 4

Playing games?

Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment.

Berkeley Engineering

Page 5

Learning in games

• Adapt to observations of past play

• Hope to converge to something “good”

• Why?!
  • Bounded rationality justification of equilibrium
  • Robust to behaviour of “opponents”
  • Language to describe distributed optimisation

Page 6

Notation

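• Players $i \in \{1, \ldots, N\}$

• Discrete action sets $A^i$

• Reward functions $r^i : A^1 \times \cdots \times A^N \to \mathbb{R}$

• Mixed strategies $\pi^i \in \Delta^i = \Delta(A^i)$

• Joint mixed strategy space $\Delta = \Delta^1 \times \cdots \times \Delta^N$

• Reward functions extend to $r^i : \Delta \to \mathbb{R}$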
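The extension of the reward functions to mixed strategies can be illustrated with a minimal sketch. It assumes the standard rock-paper-scissors rewards from the opening slide and takes the extension to be the expected reward under independent randomisation; the names `rewards` and `expected_reward` are chosen here for illustration and do not come from the talk.

```python
import itertools
import numpy as np

# Rock-paper-scissors rewards for player 1 (rows: own action, columns: opponent).
# Action order is Rock, Paper, Scissors. The game is zero-sum, so player 2's
# reward for the same joint action is the negation.
R1 = np.array([[0, -1, 1],
               [1, 0, -1],
               [-1, 1, 0]], dtype=float)
rewards = [R1, -R1]  # rewards[i][a1, a2] is the reward to player i


def expected_reward(reward, strategies):
    """Expected reward when each player independently samples from its mixed strategy.

    `reward` is an array indexed by joint actions and `strategies` is a list of
    probability vectors, one per player.
    """
    total = 0.0
    for joint in itertools.product(*(range(len(s)) for s in strategies)):
        # Probability of this joint action under independent mixed strategies.
        prob = np.prod([s[a] for s, a in zip(strategies, joint)])
        total += prob * reward[joint]
    return total


uniform = np.ones(3) / 3
always_rock = np.array([1.0, 0.0, 0.0])
always_paper = np.array([0.0, 1.0, 0.0])

print(expected_reward(rewards[0], [uniform, uniform]))
print(expected_reward(rewards[0], [always_paper, always_rock]))
```

The loop visits every joint action, so its cost grows exponentially with the number of players; it is meant only to make the definition concrete.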

Page 7

Best response / Equilibrium

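• The mixed strategies of all players other than i are denoted $\pi^{-i}$

• Best response of player i is $b^i(\pi^{-i}) = \operatorname{argmax}_{\pi^i \in \Delta^i} r^i(\pi^i, \pi^{-i})$

• An equilibrium is a $\pi$ satisfying, for all i, $\pi^i \in b^i(\pi^{-i})$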
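A small sketch of the best-response and equilibrium conditions for a two-player game, again assuming rock-paper-scissors as the example. The helper names `best_responses` and `is_equilibrium` are hypothetical. The check relies on the fact that a mixed strategy is a best response exactly when its expected reward equals the best pure-action value.

```python
import numpy as np

# Player 1's rewards in rock-paper-scissors (rows: own action, columns: opponent).
R1 = np.array([[0, -1, 1],
               [1, 0, -1],
               [-1, 1, 0]], dtype=float)


def best_responses(reward_matrix, opponent_strategy, tol=1e-9):
    """Return the maximising pure actions and the value of every pure action."""
    values = reward_matrix @ opponent_strategy
    return np.flatnonzero(values >= values.max() - tol), values


def is_equilibrium(pi1, pi2, tol=1e-9):
    """Check that each strategy is a best response to the other."""
    _, v1 = best_responses(R1, pi2)
    # Player 2's rewards, re-indexed so that rows are player 2's own action.
    _, v2 = best_responses(-R1.T, pi1)
    return bool(pi1 @ v1 >= v1.max() - tol and pi2 @ v2 >= v2.max() - tol)


uniform = np.ones(3) / 3
rock = np.array([1.0, 0.0, 0.0])

print(best_responses(R1, rock)[0])      # index of the best reply to Rock
print(is_equilibrium(uniform, uniform))
print(is_equilibrium(rock, rock))
```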

Page 8

Fictitious play

Estimate strategies of other players

Game matrix

Select best action given estimates

Update estimates

Page 9

Belief updates

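• Belief about strategy of player i is the MLE $\sigma^i_t(a^i) = \kappa^i_t(a^i) / t$, where $\kappa^i_t(a^i)$ is the number of times player i has played $a^i$ up to time t

• Online updating $\sigma_{t+1} \in (1 - \alpha_{t+1})\,\sigma_t + \alpha_{t+1}\, b(\sigma_t)$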
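The belief update can be sketched as a short simulation of two-player fictitious play in rock-paper-scissors. This is an illustration under stated assumptions rather than the authors' code: ties are broken by the lowest action index, the initial belief is uniform (it is overwritten after the first round), and the step size is 1/(t+1) so that beliefs equal empirical frequencies.

```python
import numpy as np

R1 = np.array([[0, -1, 1],
               [1, 0, -1],
               [-1, 1, 0]], dtype=float)
# payoff[i][own_action, opponent_action] is the reward to player i.
payoff = [R1, -R1.T]


def fictitious_play(num_rounds):
    # beliefs[i] is the empirical distribution of player i's past actions.
    beliefs = [np.ones(3) / 3, np.ones(3) / 3]
    for t in range(num_rounds):
        # Each player best-responds to its belief about the other player.
        actions = [int(np.argmax(payoff[i] @ beliefs[1 - i])) for i in range(2)]
        for i, action in enumerate(actions):
            indicator = np.zeros(3)
            indicator[action] = 1.0
            # Online update with step size 1 / (t + 1).
            beliefs[i] += (indicator - beliefs[i]) / (t + 1)
    return beliefs


final_beliefs = fictitious_play(20000)
print(final_beliefs[0], final_beliefs[1])
```

Because rock-paper-scissors is zero-sum, the beliefs drift towards the uniform mixed strategy, although play itself keeps cycling through the three actions in ever longer runs.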

Page 10

Stochastic approximation

• Processes of the form $X_{t+1} \in X_t + \alpha_{t+1}\big(F(X_t) + M_{t+1} + e_{t+1}\big)$, where $E(M_{t+1} \mid X_t) = 0$ and $e_t \to 0$

• F is set-valued (convex-valued and upper semi-continuous)

• Limit points are chain-recurrent sets of the differential inclusion $\dot{X} \in F(X)$

Page 11

Best-response dynamics

• Fictitious play has M and e identically 0, and $\alpha_t = 1/t$

• Limit points are limit points of the best-response differential inclusion $\dot{\sigma} \in b(\sigma) - \sigma$

• In potential games (and zero-sum games and some others) the limit points must be Nash equilibria
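As a worked step, the link between empirical-frequency beliefs and the stochastic approximation form can be written out. The notation is chosen here for illustration: $\sigma_t$ is the belief after t rounds and $B_s$ is the (pure) joint strategy played in round s, assumed to be a best response to the previous belief.

```latex
\begin{align*}
\sigma_{t+1}
  &= \frac{1}{t+1}\sum_{s=1}^{t+1} B_s
   = \frac{t}{t+1}\,\sigma_t + \frac{1}{t+1}\,B_{t+1} \\
  &= \sigma_t + \frac{1}{t+1}\bigl(B_{t+1} - \sigma_t\bigr),
  \qquad B_{t+1} \in b(\sigma_t).
\end{align*}
```

Reading off the terms gives step size $\alpha_{t+1} = 1/(t+1)$, mean field $F(\sigma) = b(\sigma) - \sigma$, and no noise or error term.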

Page 12

Generalised weakened fictitious play

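• Consider any process such that $\sigma_{t+1} \in (1 - \alpha_{t+1})\,\sigma_t + \alpha_{t+1}\big(b^{\epsilon_t}(\sigma_t) + M_{t+1}\big)$, where $\alpha_t \to 0$ and $\epsilon_t \to 0$, and also an interplay between $\alpha$ and $M$. Here $b^{\epsilon}(\sigma)$ is the set of $\epsilon$-best responses to $\sigma$.

• Convergence properties are unchanged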

Page 13

Fictitious play

Estimate strategies of other players

Game matrix

Select best action given estimates

Update estimates

Page 14

Learning the game

            Rock       Paper      Scissors
Rock        (?, ?)     (?, ?)     (?, ?)
Paper       (?, ?)     (?, ?)     (?, ?)
Scissors    (?, ?)     (?, ?)     (?, ?)

$R^i_t = r^i(a_t) + e^i_t$

Page 15

Reinforcement learning

• Track the average reward for each joint action

• Play each joint action frequently enough

• Estimates will be close to the expected value

• Estimated game converges to the true game
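The four points above can be sketched as a running average of noisy rewards for each joint action. The Gaussian noise, its scale, the uniform exploration and the names `Q` and `counts` are assumptions made for this illustration, not details taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# True rewards to player 1 in rock-paper-scissors; treated as unknown by the learner.
R1 = np.array([[0, -1, 1],
               [1, 0, -1],
               [-1, 1, 0]], dtype=float)


def noisy_reward(a1, a2, noise_sd=0.5):
    """Observed reward: true reward plus zero-mean noise (assumed Gaussian)."""
    return R1[a1, a2] + rng.normal(scale=noise_sd)


Q = np.zeros((3, 3))                  # estimated reward for each joint action
counts = np.zeros((3, 3), dtype=int)  # number of times each joint action was played

for _ in range(9000):
    # Uniform exploration guarantees every joint action is played frequently.
    a1, a2 = rng.integers(3), rng.integers(3)
    counts[a1, a2] += 1
    # Incremental form of the sample mean of the observed rewards.
    Q[a1, a2] += (noisy_reward(a1, a2) - Q[a1, a2]) / counts[a1, a2]

print(np.round(Q, 2))
print(np.abs(Q - R1).max())
```

Each entry of `Q` is a sample mean, so by the law of large numbers it approaches the expected reward as the visit count for that joint action grows.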

Page 16

Q-learned fictitious play

Estimate strategies of other players

Game matrix

Select best action given estimates

Estimated game matrix

Select best action given estimates

Update estimates

Page 17

Theoretical result

Theorem – If all joint actions are played infinitely often then beliefs follow a GWFP

Proof: The estimated game converges to the true game, so selected strategies are $\epsilon$-best responses.
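A rough sketch of how reward learning and fictitious play might be combined, in the spirit of the theorem's condition that all joint actions are played infinitely often. Everything specific here is an assumption for illustration: the decaying random exploration schedule, the Gaussian noise, and the function name `q_learned_fictitious_play`. The talk does not say which exploration rule the authors use.

```python
import numpy as np

rng = np.random.default_rng(1)
R1 = np.array([[0, -1, 1],
               [1, 0, -1],
               [-1, 1, 0]], dtype=float)
# true_reward[i][own, other] is the reward to player i; unknown to the players.
true_reward = [R1, -R1.T]


def q_learned_fictitious_play(num_rounds, noise_sd=0.5):
    Q = [np.zeros((3, 3)) for _ in range(2)]               # estimated game matrices
    visits = [np.zeros((3, 3), dtype=int) for _ in range(2)]
    beliefs = [np.ones(3) / 3 for _ in range(2)]           # empirical play of each player
    for t in range(num_rounds):
        # Assumed schedule: exploration decays to zero, but slowly enough that
        # every joint action keeps being tried.
        explore_prob = (t + 1) ** -0.25
        actions = []
        for i in range(2):
            if rng.random() < explore_prob:
                actions.append(int(rng.integers(3)))
            else:
                # Best response in the estimated game to the current belief.
                actions.append(int(np.argmax(Q[i] @ beliefs[1 - i])))
        for i in range(2):
            own, other = actions[i], actions[1 - i]
            observed = true_reward[i][own, other] + rng.normal(scale=noise_sd)
            visits[i][own, other] += 1
            Q[i][own, other] += (observed - Q[i][own, other]) / visits[i][own, other]
            indicator = np.zeros(3)
            indicator[own] = 1.0
            beliefs[i] += (indicator - beliefs[i]) / (t + 1)
    return Q, visits, beliefs


Q, visits, beliefs = q_learned_fictitious_play(5000)
print(visits[0])
print(np.round(beliefs[0], 3), np.round(beliefs[1], 3))
```

As the estimates in `Q` improve and exploration fades, the chosen actions become approximate best responses in the true game, which is the property the proof sketch relies on.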

Page 18

Playing games?

Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment.

Berkeley Engineering

Page 19

It’s impossible!

• N players, each with A actions

• Game matrix has $A^N$ entries to learn

• Each individual must estimate the strategy of every other individual

• It’s just not possible for realistic game scenarios
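The growth of the game matrix is easy to check with a few lines of arithmetic. The choice of three actions per player is just an example value.

```python
def joint_action_count(num_players, num_actions):
    """Number of joint actions, i.e. entries each player's reward table must hold."""
    return num_actions ** num_players


for n in (2, 5, 10, 20):
    print(n, joint_action_count(n, 3))
# 2 9
# 5 243
# 10 59049
# 20 3486784401
```

With only three actions each, twenty players already give more than three billion joint actions, each of which would have to be played repeatedly to estimate its reward.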

Page 20

Marginal contributions

• Marginal contribution of player i is

total system reward – system reward if i absent

• If every player maximises its marginal contribution, the system is at a (local) optimum

• Marginal contribution might depend only on the actions of a small number of neighbours
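A toy sketch of a marginal contribution that depends only on neighbours. The coverage model is hypothetical and is not the sensor model from the talk: sensor `pos` on a line covers cell `pos` (action 0) or cell `pos + 1` (action 1), and the system reward is the number of distinct cells covered.

```python
import itertools


def covered_cells(actions, absent=None):
    """Cells covered by all sensors, optionally leaving one sensor out."""
    return {pos + act for pos, act in enumerate(actions) if pos != absent}


def global_reward(actions, absent=None):
    return len(covered_cells(actions, absent))


def marginal_contribution(i, actions):
    """Total system reward minus the system reward if sensor i is absent."""
    return global_reward(actions) - global_reward(actions, absent=i)


def depends_only_on_neighbours(i, num_sensors):
    """Check that sensor i's marginal contribution ignores non-neighbours."""
    neighbours = {i - 1, i, i + 1}
    seen = {}
    for actions in itertools.product((0, 1), repeat=num_sensors):
        local = tuple(a for pos, a in enumerate(actions) if pos in neighbours)
        value = marginal_contribution(i, actions)
        # The same local action profile must always give the same value.
        if seen.setdefault(local, value) != value:
            return False
    return True


actions = [0, 1, 1, 0, 0]
print(global_reward(actions))             # 4 distinct cells covered
print(marginal_contribution(0, actions))  # 1: only sensor 0 covers cell 0
print(marginal_contribution(2, actions))  # 0: sensor 3 also covers cell 3
print(depends_only_on_neighbours(2, 5))   # True
```

In this toy model each sensor needs a reward table over its own action and those of at most two neighbours, at most 2^3 = 8 entries, rather than 2^N entries for the whole system.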

Page 21

Graphical games

Page 22

Sensors – rewards

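• Global reward for action a is $U_g(a)$, a sum over events $j$ involving $E_j$ and an indicator $I\{\cdot\}$ of the event being observed [exact expression not recoverable from the extracted text]

• Marginal reward for i is $r^i(a) = U_g(a) - U_g(a^{-i})$, which reduces to a sum over the events $j$ observed by $i$ [exact expression not recoverable]

• Actually use $R^i_t$, the corresponding sum over the events $j$ observed by $i$ at time $t$ [exact expression not recoverable]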

Page 23

Marginal contributions

Page 24

Local learning

Estimate strategies of other players

Game matrix

Select best action given estimates

Estimated game matrix

Select best action given estimates

Update estimates

Page 25

Local learning

Estimate strategies of neighbours

Game matrix

Select best action given estimates

Estimated game matrix for local interactions

Select best action given estimates

Update estimates

Page 26

Theoretical result

Theorem – If all joint actions of local games are played infinitely often then beliefs follow a GWFP

Proof: The estimated game converges to the true game, so selected strategies are $\epsilon$-best responses.

Page 27

Sensing results

Page 28

So what?!

• Convergence to (local) optimum with only noisy information and local communication

• Individual rationality: always choose an action to maximise expected reward

• Robustness: If an individual doesn’t “play cricket”, the others will reach a “Nash response”.

Page 29

Summary

• Learning the game while playing is essential

• This can be accommodated within the GWFP framework

• Exploiting the neighbourhood structure of marginal contributions is essential for feasibility