Empirical Game-Theoretic Analysis for Practical Strategic Reasoning
Keynote by Michael Wellman at the 15th International Conference on Principles and Practice of Multi-Agent Systems (PRIMA-2012), held during Knowledge Technology Week (KTW2012), September 3-7, 2012, Kuching, Sarawak, Malaysia.
Transcript
M. Wellman, 6 Sep 12, PRIMA-12
Empirical Game-Theoretic Analysis for Practical Strategic Reasoning
Michael P. Wellman University of Michigan
Planning in Strategic Environments
• Planning problem
– find agent behavior satisfying/optimizing objectives w.r.t. environment
– strives for rationality
• When the environment contains other agents
– model them as rational planners as well
– the problem is a game
– search is now multi-dimensional, with a different (global) objective
[Figure: an agent (Agent 1) planning within an environment that also contains Agent 2]
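The shift from single-agent planning to a game can be made concrete with a tiny normal-form example. The payoff matrix below is illustrative (not from the talk); the check implements exactly the rationality condition described above: a profile is stable only if no player gains by unilateral deviation.

```python
import numpy as np

# Illustrative 2x2 symmetric game (numbers are made up for this sketch).
payoffs_row = np.array([[3, 0],
                        [5, 1]])   # row player's payoffs
payoffs_col = payoffs_row.T        # symmetric game: column player mirrors

def is_pure_nash(r, c):
    """(r, c) is a pure Nash equilibrium if neither player can gain
    by deviating unilaterally."""
    row_ok = payoffs_row[r, c] >= payoffs_row[:, c].max()
    col_ok = payoffs_col[r, c] >= payoffs_col[r, :].max()
    return bool(row_ok and col_ok)
```

Note that the rational profile (1, 1) is not the one maximizing joint payoff, which is the "different (global) objective" point on the slide.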
Real-World Games
• rich strategy space
– strategy: obs* × time → action
• severely incomplete information
– interdependent types (signals)
– info partially revealed over time
• Revealed payoff model
– sample provides exact payoff
– minimum-regret-first search (MRFS): attempts to refute the best current candidate
• Noisy payoff model
– sample drawn from a payoff distribution
– information gain search (IGS): sample the profile maximizing entropy difference w.r.t. probability of being the min-regret profile
• Simplest approach: direct estimation
– employ control variates and other variance reduction techniques
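A minimal sketch of the regret quantity that MRFS targets, using a made-up 3-strategy symmetric payoff table. Only the regret definition follows the slide; the numbers and the candidate-selection line are illustrative assumptions.

```python
import numpy as np

# Illustrative symmetric empirical game:
# payoff[i, j] = payoff to a player using strategy i against strategy j.
payoff = np.array([[2.0, 0.5, 1.0],
                   [3.0, 1.0, 0.0],
                   [1.0, 2.0, 1.5]])

def regret(i):
    """Regret of the symmetric pure profile (i, i): the most any single
    player could gain by deviating to another strategy."""
    return payoff[:, i].max() - payoff[i, i]

# MRFS would next spend samples on the candidate with the smallest
# estimated regret, attempting to refute it as an equilibrium.
candidate = min(range(3), key=regret)
```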
Payoff Function Regression (FPSB2 Example)
[Figure: workflow: payoff data (s1, u(s1)), …, (sL, u(sL)) from selected profiles is generated by simulation; a regression of the unknown payoff function u(·) is learned over strategy space Si = [0, 1]; solving the learned game yields eq = (0.32, 0.32). A 3×3 example payoff matrix and the learned payoff surface are also shown.]
Vorobeychik et al., ML 2007
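The generate-data / learn-regression / solve-learned-game pipeline can be sketched end to end. Everything below is a toy stand-in: the "true" payoff function, the quadratic feature set, and the grid-based best-response iteration are assumptions for illustration, not the FPSB2 setup of Vorobeychik et al.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_payoff(s, t):                 # assumed ground truth, for the sketch
    return s * (1 - s) + 0.5 * s * t   # payoff of playing s against t

# 1. generate data: sample profiles (s, t) in [0,1]^2 with noisy payoffs
S = rng.uniform(0, 1, (200, 2))
y = true_payoff(S[:, 0], S[:, 1]) + rng.normal(0, 0.01, 200)

# 2. learn regression: quadratic features, ordinary least squares
def feats(s, t):
    return np.stack([np.ones_like(s), s, t, s * s, s * t, t * t], axis=-1)

w, *_ = np.linalg.lstsq(feats(S[:, 0], S[:, 1]), y, rcond=None)
u_hat = lambda s, t: feats(s, t) @ w

# 3. solve learned game: symmetric equilibrium by best-response iteration
grid = np.linspace(0, 1, 101)
s_eq = 0.5
for _ in range(50):
    s_eq = grid[np.argmax(u_hat(grid, np.full_like(grid, s_eq)))]
```

For this toy payoff the symmetric equilibrium has a closed form, s* = 2/3, so the learned-game solution can be checked against it.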
Generalization Risk Approach
• Model variations
– functional forms, relationship structures, parameters
– strategy granularity
• Approach:
– treat candidate game model as a predictor for payoff data
– adopt a loss function for the predictor
– select the model candidate minimizing expected loss
Cross Validation
[Figure: observation data split into Fold 1, Fold 2, Fold 3; each fold serves once as validation data while the remaining folds are used for training]
Jordan et al., AAMAS-09
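The selection step can be sketched with candidate game models stood in for by polynomial payoff predictors of different degrees; the synthetic data and the model family are illustrative assumptions, not the variations considered by Jordan et al.

```python
import numpy as np

# Synthetic payoff observations: a quadratic payoff curve plus noise.
rng = np.random.default_rng(1)
s = rng.uniform(0, 1, 120)
u = 1.0 + 2.0 * s - 3.0 * s ** 2 + rng.normal(0, 0.05, 120)

def cv_loss(degree, k=3):
    """Mean squared validation loss of a degree-`degree` payoff model,
    estimated by k-fold cross-validation."""
    idx = np.arange(len(s))
    losses = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef = np.polyfit(s[train], u[train], degree)
        losses.append(np.mean((np.polyval(coef, s[fold]) - u[fold]) ** 2))
    return float(np.mean(losses))

# Select the model candidate minimizing expected (validation) loss.
best_degree = min([1, 2, 6], key=cv_loss)
```

The underfit linear model pays a large bias penalty on held-out data, so cross-validation steers selection toward the correct complexity.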
Iterative EGTA Process
[Figure: flowchart: a simulator produces profile payoff data (sampling control problem: select which profiles to sample further), which induces the empirical game over the current strategy set and profile space (game model induction problem); game analysis computes a Nash equilibrium; to refine the strategy space, either gather more samples or add a strategy (strategy exploration problem) and repeat; otherwise end]
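The iterative EGTA loop can be rendered as a schematic, runnable skeleton. The simulator, the strategy pool, and the refinement rule (always add the next pooled strategy) are toy stand-ins for the components named in the diagram.

```python
import random

random.seed(0)
BASE = {"ZI": 1.0, "Kaplan": 1.2, "GD": 1.5}   # hypothetical strengths

def simulate(a, b):
    """Stand-in simulator: noisy payoff to strategy a when facing b."""
    return BASE[a] - 0.1 * BASE[b] + random.gauss(0, 0.01)

strategies, pool = ["ZI"], ["Kaplan", "GD"]    # exploration queue
while True:
    # sampling: estimate payoffs for every profile in the current space
    payoff = {(a, b): sum(simulate(a, b) for _ in range(25)) / 25
              for a in strategies for b in strategies}
    # game analysis: symmetric pure profile with minimum regret
    regret = {s: max(payoff[(a, s)] for a in strategies) - payoff[(s, s)]
              for s in strategies}
    eq = min(strategies, key=regret.get)
    # refine? here: add one more strategy until the pool is exhausted
    if not pool:
        break
    strategies.append(pool.pop(0))
```

In a real system the "refine?" decision would trade off more samples against more strategies, rather than simply draining a fixed pool.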
Learning New Strategies: EGTA+RL
[Figure: flowchart: the iterative EGTA loop augmented with online learning. RL trains a best response to the current Nash equilibrium; if the new strategy deviates profitably, it is added to the strategy set and the loop repeats; if not, either improve the RL model and retry, or end]
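The RL step can be caricatured as policy search over a one-parameter strategy space against opponents sampled from the current mixed equilibrium. All numbers, the opponent models, and the payoff function below are illustrative assumptions; a real system would use full reinforcement learning rather than grid search.

```python
import random

random.seed(1)
ne = {"GD": 0.6, "L9": 0.4}                    # hypothetical mixed equilibrium

def episode(theta, opp):
    """Toy single-episode payoff: the best offset differs by opponent."""
    target = 0.3 if opp == "GD" else 0.7
    return 1.0 - (theta - target) ** 2 + random.gauss(0, 0.01)

def value(theta, n=500):
    """Expected payoff of offset theta vs. opponents drawn from the NE."""
    opps = random.choices(list(ne), weights=list(ne.values()), k=n)
    return sum(episode(theta, o) for o in opps) / n

thetas = [i / 20 for i in range(21)]
new_strategy = max(thetas, key=value)          # approximate best response
```

The learned response would then be simulated as a deviation against the equilibrium; only if it earns more than the equilibrium payoff is it added to the strategy set.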
CDA Learning Problem Setup
State space:
• History of recent trades: H1 (moving average), H2 (frequency weighted ratio, threshold = V), H3 (frequency weighted ratio, threshold = A)
• Quotes: Q1 (opposite role), Q2 (same role)
• Time: T1 (total), T2 (since last trade)
• Pending trades: U (number of trades left), V (value of next unit to be traded)
Actions:
• A: offset from V
Rewards:
• R: difference between unit valuation and trade price
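The setup above transcribes directly into a state/action/reward interface. Field names follow the slide; the container itself, the helper functions, and any values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CDAState:
    h1: float   # history: moving average of recent trade prices
    h2: float   # history: frequency weighted ratio, threshold = V
    h3: float   # history: frequency weighted ratio, threshold = A
    q1: float   # quote from the opposite role
    q2: float   # quote from the same role
    t1: float   # time: total elapsed
    t2: float   # time: since last trade
    u: int      # pending: number of trades left
    v: float    # pending: value of next unit to be traded

def action_to_offer(state: CDAState, a: float) -> float:
    """Action A is an offset from V, the value of the next unit."""
    return state.v + a

def reward(valuation: float, trade_price: float) -> float:
    """R: difference between unit valuation and trade price."""
    return valuation - trade_price
```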
EGTA/RL Round 1

Strategies        | Payoff | NE       | Learning (strategy: dev. payoff)
Kaplan, ZI, ZIbtq | 248.1  | 1.000 ZI | L1: 268.7
+ L1              | 242.5  | 1.000 L1 |
EGTA/RL Round 2

Strategies        | Payoff | NE                 | Learning (strategy: dev. payoff)
Kaplan, ZI, ZIbtq | 248.1  | 1.000 ZI           | L1: 268.7
+ L1              | 242.5  | 1.000 L1           |
+ ZIP             | 248.0  | 1.000 ZIP          |
+ GD              | 248.6  | 1.000 GD           | L2-L8: ---; L9: 251.8
+ L9              | 246.1  | 0.531 GD, 0.469 L9 | L10: 252.1
EGTA/RL Rounds 3+

Strategies | Payoff | NE                   | Learning (strategy: dev. payoff)
…          | …      | …                    | …
+ L10      | 248.0  | 0.191 GD, 0.809 L10  | L11: 251.0
+ L11      | 246.2  | 1.000 L11            |
+ GDX      | 245.8  | 0.192 GDX, 0.808 L11 | L12: 248.3
+ L12      | 245.8  | 0.049 L11, 0.951 L12 | L13: 245.9
+ L13      | 245.6  | 0.872 L12, 0.128 L13 | L14: 245.6
+ RB       | 245.6  | 0.872 L12, 0.128 L13 |
Final champion
Strategy Exploration Problem
• Premise:
– limited ability to cover the profile space
– expectation to reasonably evaluate all considered strategies
• Need a deliberate policy to decide which strategies to introduce
• RL for strategy exploration
– attempt at best response to current equilibrium
– is this a good heuristic (even assuming an ideal BR calculation)?