Computing Nash Equilibrium
Post on 02-Feb-2016
1
Computing Nash Equilibrium
Presenter: Yishay Mansour
2
Outline
• Problem Definition
• Notation
• Last week: Zero-Sum game
• This week:
  – Zero Sum: Online algorithm
  – General Sum Games
    • Multiple players: approximate Nash
    • 2 players: exact Nash
3
Model
• Multiple players $N = \{1, \ldots, n\}$
• Strategy set
  – Player i has m actions $S_i = \{s_{i1}, \ldots, s_{im}\}$
  – $S_i$ are the pure actions of player i
  – $S = \times_i S_i$ (the set of joint action profiles)
• Payoff functions
  – Player i: $u_i : S \to \mathbb{R}$
4
Strategies
• Pure strategies: actions
• Mixed strategy
  – Player i: $p_i$ is a distribution over $S_i$
  – Game: $P = \times_i p_i$
    • Product distribution
• Modified distribution
  – $P_{-i}$ = the distribution $P$ for all players except player i
  – $(q, P_{-i})$ = player i plays $q$ and every other player j plays $p_j$
5
Notations
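• Average payoff
  – Player i: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_{s} P(s)\, u_i(s)$
  – $P(s) = \prod_i p_i(s_i)$
• Nash equilibrium
  – $P^*$ is a Nash equilibrium if, for every player i and any distribution $q_i$,
  – $u_i(q_i, P^*_{-i}) \le u_i(P^*)$
• Best response
  – equivalently, each $p^*_i$ maximizes $u_i(\cdot, P^*_{-i})$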
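The definitions above can be checked numerically on a small game. The following is a minimal sketch, not part of the original slides: the helper names `expected_payoff` and `is_nash` and the matching-pennies payoffs are illustrative assumptions. It relies on the fact that, by linearity of the expected payoff, it is enough to test deviations to pure actions.

```python
from itertools import product

def expected_payoff(u_i, P):
    """u_i(P) = sum over pure profiles s of P(s) * u_i(s), with P(s) = prod_i p_i(s_i)."""
    total = 0.0
    for s in product(*(range(len(p)) for p in P)):
        prob = 1.0
        for p, action in zip(P, s):
            prob *= p[action]
        total += prob * u_i(s)
    return total

def is_nash(utils, P, tol=1e-9):
    """True if no player gains more than tol by deviating to a pure action."""
    for i, u_i in enumerate(utils):
        base = expected_payoff(u_i, P)
        for a in range(len(P[i])):
            pure = [1.0 if b == a else 0.0 for b in range(len(P[i]))]
            deviated = P[:i] + [pure] + P[i + 1:]
            if expected_payoff(u_i, deviated) > base + tol:
                return False
    return True

# Illustrative game (an assumption, not from the slides): matching pennies.
u1 = lambda s: 1.0 if s[0] == s[1] else -1.0
u2 = lambda s: -u1(s)
uniform = [[0.5, 0.5], [0.5, 0.5]]
pure_profile = [[1.0, 0.0], [1.0, 0.0]]

print(is_nash([u1, u2], uniform))       # uniform mixing is an equilibrium
print(is_nash([u1, u2], pure_profile))  # player 2 would deviate
```

The enumeration over all pure profiles is exponential in the number of players, so this is only meant for very small games.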
6
Two player games
• Payoff matrices (A, B)
  – m rows and n columns
  – player 1 has m actions, player 2 has n actions
• Strategies p and q
• Payoffs: $u_1(p,q) = p A q^T$ and $u_2(p,q) = p B q^T$
• Zero-sum game
  – $A = -B$
7
Online learning
• Playing with an unknown payoff matrix
• Online algorithm:
  – at each step selects an action
    • can be stochastic or fractional
  – observes all possible payoffs
  – updates its parameters
• Goal: achieve the value of the game
  – the payoff matrix of the “game” is defined only at the end
8
Online learning - Algorithm
• Notation:
  – Opponent distribution $Q_t$
  – Our distribution $P_t$
  – Observed cost $M(i, Q_t)$
    • i.e., entry i of $M Q_t$, and $M(P_t, Q_t) = P_t M Q_t$
    • costs lie in [0,1]
  – Goal: minimize cost
• Algorithm: exponential weights
  – Action i has weight proportional to $b^{L(i,t)}$
  – $L(i,t)$ = loss of action i until time t
9
Online algorithm: Notations
• Formally:
  – The total number of steps T is known
  – Parameter b, with $0 < b < 1$
  – $w_{t+1}(i) = w_t(i)\, b^{M(i,Q_t)}$
  – $Z_{t+1} = \sum_i w_{t+1}(i)$
  – $P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$
  – Initially, $P_1(i) > 0$ for every i
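The update rule can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name `exponential_weights`, the uniform initial weights, and the tiny 2x2 cost matrix are not from the slides. Each weight is multiplied by b raised to the observed cost, and the weights are renormalized to obtain the next distribution.

```python
def exponential_weights(M, opponent_plays, b):
    """Run the multiplicative update against a fixed sequence Q_1..Q_T.

    M[i][j] is the cost in [0,1] of our action i against column j.
    Returns the list of distributions P_1..P_T and the total cost sum_t M(P_t, Q_t).
    """
    n = len(M)
    w = [1.0] * n                      # uniform start, so P_1(i) > 0 for every i
    history, total_cost = [], 0.0
    for Q in opponent_plays:
        Z = sum(w)
        P = [wi / Z for wi in w]       # current distribution P_t
        history.append(P)
        # M(i, Q_t): expected cost of each action against Q_t
        cost = [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(n)]
        total_cost += sum(P[i] * cost[i] for i in range(n))
        w = [w[i] * b ** cost[i] for i in range(n)]
    return history, total_cost

# Hypothetical example: the opponent always plays column 0, so action 1 always costs 1.
M = [[0, 1], [1, 0]]
history, total = exponential_weights(M, [[1, 0]] * 3, b=0.5)
print(history)
print(total)
```

In this example the probability of the costly action halves in weight every round, so the distribution shifts toward action 0.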
10
Online algorithm: Theorem
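• Theorem
  – For any matrix M with entries in [0,1]
  – and any sequence of distributions $Q_1, \ldots, Q_T$,
  – the algorithm generates $P_1, \ldots, P_T$ such that

$\sum_{t=1}^{T} M(P_t, Q_t) \le \min_{P} \left[ \frac{\ln(1/b)}{1-b} \sum_{t=1}^{T} M(P, Q_t) + \frac{1}{1-b}\, RE(P \| P_1) \right]$

  – where $RE(A \| B) = E_{x \sim A}[\ln(A(x) / B(x))]$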
11
Relative Entropy
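• For any two distributions A and B
• $RE(A \| B) = E_{x \sim A}[\ln(A(x) / B(x))]$
  – can be infinite
    • when $B(x) = 0$ and $A(x) \ne 0$ for some x
  – always non-negative
    • log is concave: $\sum_i a_i \log b_i \le \log \sum_i a_i b_i$ for weights $a_i \ge 0$ that sum to 1
    • $-RE(A \| B) = \sum_x A(x) \ln \frac{B(x)}{A(x)} \le \ln \sum_x A(x) \frac{B(x)}{A(x)} \le \ln 1 = 0$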
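As a small illustration of the two properties above, here is a sketch of the relative entropy for finite distributions given as lists. The function name `relative_entropy` is an assumption for this example; terms with A(x) = 0 are skipped, following the usual convention that 0 ln 0 = 0.

```python
import math

def relative_entropy(A, B):
    """RE(A||B) = sum_x A(x) * ln(A(x) / B(x)) for finite distributions."""
    total = 0.0
    for a, b in zip(A, B):
        if a == 0:
            continue            # convention: 0 * ln(0 / b) = 0
        if b == 0:
            return math.inf     # B(x) = 0 while A(x) > 0
        total += a * math.log(a / b)
    return total

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # identical distributions
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))  # infinite case
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))  # point mass against uniform
```

The last call shows the case that matters for the online analysis: against a uniform distribution over n elements, the relative entropy of any distribution is at most ln n.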
12
Online algorithm: Analysis
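• Lemma
  – For any mixed strategy P:

$RE(P \| P_{t+1}) - RE(P \| P_t) \le \ln(1/b)\, M(P, Q_t) + \ln\left(1 - (1-b)\, M(P_t, Q_t)\right)$

• Corollary
  – with $P_1$ uniform, so that $RE(P \| P_1) \le \ln n$:

$\sum_{t=1}^{T} M(P_t, Q_t) \le \frac{\ln(1/b)}{1-b} \min_{P} \sum_{t=1}^{T} M(P, Q_t) + \frac{\ln n}{1-b}$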
13
Online Algorithm: Optimization
• $b = 1 / (1 + \sqrt{2 (\ln n) / T})$
  – additional loss per step: $O(\sqrt{(\ln n)/T})$
• Zero-sum game:
  – average loss is at most v, the value of the game, plus an additional loss of $O(\sqrt{(\ln n)/T})$
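The corollary and the choice of b can be checked empirically. The sketch below is an illustration only: the names `tuned_b` and `run_hedge`, the random cost sequence, and the sizes n = 5, T = 2000 are assumptions. It runs the multiplicative update with uniform initial weights and compares the algorithm's total cost with the bound from the corollary, using the fact that the minimum over mixed strategies P is attained at a pure action.

```python
import math
import random

def tuned_b(n, T):
    """The parameter choice from the slide: b = 1 / (1 + sqrt(2 ln n / T))."""
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))

def run_hedge(costs, b):
    """costs[t][i] in [0,1] plays the role of M(i, Q_t).

    Returns (total cost of the algorithm, total cost of the best fixed action).
    """
    n = len(costs[0])
    w = [1.0] * n
    alg = 0.0
    for c in costs:
        Z = sum(w)
        alg += sum(wi / Z * ci for wi, ci in zip(w, c))
        w = [wi * b ** ci for wi, ci in zip(w, c)]
    best = min(sum(c[i] for c in costs) for i in range(n))
    return alg, best

random.seed(0)
n, T = 5, 2000
costs = [[random.random() for _ in range(n)] for _ in range(T)]
b = tuned_b(n, T)
alg, best = run_hedge(costs, b)
bound = math.log(1 / b) / (1 - b) * best + math.log(n) / (1 - b)
print(alg / T, best / T, bound / T)
```

On a random sequence the observed gap is typically far below the worst-case bound; the bound is what matters against an adversarial opponent.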
14
Example: Zero Sum
[Payoff matrix from the slide; entries as extracted: 15, 23, 32, 43]
15
Two players General sum games
• Input matrices (A, B)
• No unique value
• Computational issues:
  – find some Nash
  – find all Nash
    • can be exponentially many
    • e.g., the identity matrix (A = B = I has $2^n - 1$ equilibria)
• Example: 2×N
16
Computational Complexity
• The complexity of finding a sample equilibrium is unknown
  – “…no proof of NP-completeness seems possible” (Papadimitriou, 94)
• Finding equilibria with certain properties is NP-hard
  – e.g., max-payoff, max-support
• (Even) for symmetric 2-player games:
  – NE with expected social welfare at least k?
  – NE with least payoff at least k?
  – Pareto-optimal NE?
  – NE with player 1 EU of at least k?
  – multiple NE?
  – NE where player 1 plays (or not) a particular strategy?
(Gilboa & Zemel; Conitzer & Sandholm)
17
Two players General sum games
• Player 1 best response:
  – As in the zero-sum case:
  – fix a strategy q of player 2
  – maximize $p (A q^T)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
  – dual LP: minimize u such that $u \ge A q^T$ (u is compared with every component)
  – Strong duality: $p (A q^T) = u = \sum_j p_j u$
    • hence $p (u - A q^T) = 0$
    • complementary system
• Player 2: $q (v - p B)^T = 0$
18
Nash: Linear Complementary System
• Find distributions p and q and values u and v such that
  – $u \ge A q^T$
  – $v \ge p B$
  – $p (u - A q^T) = 0$
  – $q (v - p B)^T = 0$
  – $\sum_j p_j = 1$ and $p_j \ge 0$
  – $\sum_j q_j = 1$ and $q_j \ge 0$
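The system can be verified directly for a candidate pair (p, q). The sketch below is illustrative: the function name `satisfies_complementarity` is an assumption, and the matrices are the 3x2 example that appears later in these slides, read as A = [[0,6],[2,5],[3,3]] and B = [[1,0],[0,2],[4,3]]. Any feasible u must equal the largest entry of Aq^T (and v the largest entry of pB), so the check fixes u and v to those values and tests complementarity term by term with exact fractions.

```python
from fractions import Fraction as F

def satisfies_complementarity(A, B, p, q):
    """Check the linear complementary system for a candidate pair (p, q)."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u, v = max(Aq), max(pB)            # smallest u >= Aq^T and v >= pB
    feasible = sum(p) == 1 and sum(q) == 1 and min(p) >= 0 and min(q) >= 0
    # All terms are non-negative, so the products vanish only term by term.
    comp_p = all(p[i] * (u - Aq[i]) == 0 for i in range(m))
    comp_q = all(q[j] * (v - pB[j]) == 0 for j in range(n))
    return feasible and comp_p and comp_q

A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]

print(satisfies_complementarity(A, B, [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]))
print(satisfies_complementarity(A, B, [F(1), F(0), F(0)], [F(1), F(0)]))
```

The first pair passes because every action played with positive probability attains the maximum payoff; the second fails because player 1's first action is not a best response to column 1.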
19
Two players General sum games
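• Assume the supports of the strategies are known.
  – p has support $S_p$ and q has support $S_q$
  – Nash can then be formulated as an LP (a linear feasibility problem in p, q, u, v):

$\sum_i p_i = 1$
$p_i \ge 0$ for $i \in S_p$
$p_i = 0$ for $i \notin S_p$
$\sum_j a_{ij} q_j = u$ for $i \in S_p$
$\sum_j a_{ij} q_j \le u$ for $i \notin S_p$

$\sum_j q_j = 1$
$q_j \ge 0$ for $j \in S_q$
$q_j = 0$ for $j \notin S_q$
$\sum_i p_i b_{ij} = v$ for $j \in S_q$
$\sum_i p_i b_{ij} \le v$ for $j \notin S_q$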
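For equal-size supports in a non-degenerate game, the equalities above form a square linear system, so a plain linear solve can replace a general LP solver. The following is a sketch under that assumption; the helper names `solve` and `nash_for_supports` are illustrative, actions are 0-indexed, and the matrices are the 3x2 example used later in the slides.

```python
from fractions import Fraction

def solve(M, rhs):
    """Gauss-Jordan elimination over exact fractions; None if the system is singular."""
    k = len(M)
    T = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(k):
        piv = next((r for r in range(c, k) if T[r][c] != 0), None)
        if piv is None:
            return None
        T[c], T[piv] = T[piv], T[c]
        d = T[c][c]
        T[c] = [x / d for x in T[c]]
        for r in range(k):
            if r != c and T[r][c] != 0:
                f = T[r][c]
                T[r] = [a - f * b for a, b in zip(T[r], T[c])]
    return [row[-1] for row in T]

def nash_for_supports(A, B, Sp, Sq):
    """Return (p, q, u, v) for supports Sp, Sq with |Sp| == |Sq|, or None."""
    m, n, k = len(A), len(A[0]), len(Sp)
    # Unknowns (q_j for j in Sq, u): rows in Sp are indifferent, q sums to 1.
    Mq = [[A[i][j] for j in Sq] + [-1] for i in Sp] + [[1] * k + [0]]
    # Unknowns (p_i for i in Sp, v): columns in Sq are indifferent, p sums to 1.
    Mp = [[B[i][j] for i in Sp] + [-1] for j in Sq] + [[1] * k + [0]]
    rhs = [0] * k + [1]
    sq, sp = solve(Mq, rhs), solve(Mp, rhs)
    if sq is None or sp is None:
        return None
    p, q = [Fraction(0)] * m, [Fraction(0)] * n
    for i, val in zip(Sp, sp):
        p[i] = val
    for j, val in zip(Sq, sq):
        q[j] = val
    u, v = sq[-1], sp[-1]
    ok = (min(p) >= 0 and min(q) >= 0
          and all(sum(A[i][j] * q[j] for j in range(n)) <= u for i in range(m))
          and all(sum(p[i] * B[i][j] for i in range(m)) <= v for j in range(n)))
    return (p, q, u, v) if ok else None

A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
print(nash_for_supports(A, B, [1, 2], [0, 1]))  # supports {a2, a3} and {a4, a5}
print(nash_for_supports(A, B, [0, 2], [0, 1]))  # no equilibrium on these supports
```

Running this over every pair of equal-size supports gives a simple, exponential-time way to enumerate the equilibria of a small non-degenerate game.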
20
Approximate Nash
• Assume we are given a Nash equilibrium
  – strategies (p, q)
• Show that there exists:
  – a small support
  – an epsilon-Nash equilibrium on it
• Brute-force search
  – enumerate all small supports!
  – each one requires only polynomial time
• Proof!
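One way to make the brute-force search concrete is to enumerate strategies that are uniform over a multiset of k actions and return the first pair whose regret is at most epsilon. This is a sketch of that idea, not necessarily the construction used in the lecture's proof; the names `k_uniform`, `regret`, and `find_eps_nash` are assumptions, and the test game is the 3x2 example from later slides.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def k_uniform(num_actions, k):
    """All mixed strategies that are uniform over a multiset of k actions."""
    for multiset in combinations_with_replacement(range(num_actions), k):
        yield [Fraction(multiset.count(a), k) for a in range(num_actions)]

def regret(A, B, p, q):
    """Largest gain any player can obtain by deviating to a pure action."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u1 = sum(p[i] * Aq[i] for i in range(m))
    u2 = sum(pB[j] * q[j] for j in range(n))
    return max(max(Aq) - u1, max(pB) - u2)

def find_eps_nash(A, B, k, eps):
    """First pair of k-uniform strategies with regret at most eps, or None."""
    for p in k_uniform(len(A), k):
        for q in k_uniform(len(A[0]), k):
            if regret(A, B, p, q) <= eps:
                return p, q
    return None

A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
print(find_eps_nash(A, B, k=1, eps=0))  # pure strategies only
```

Each candidate pair is checked in polynomial time; the cost of the search comes from the number of multisets, which grows quickly with k.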
22
Lemke & Howson
• Define a labeling
• For a strategy p (player 1):
  – Label i: if $p_i = 0$, where i is an action of player 1
  – Label j: if action j (of player 2) is a best response to p
    • $b_j p \ge b_k p$ for every k
• Similarly for a strategy q (player 2):
  – Label j: if $q_j = 0$, where j is an action of player 2
  – Label i: if action i (of player 1) is a best response to q
    • $a_i q \ge a_k q$ for every k
23
LM algo
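• A strategy pair (p, q) is Nash if and only if:
  – each label k is a label of p or of q (or both)
• Proof!
• Example

$A = \begin{pmatrix} 0 & 6 \\ 2 & 5 \\ 3 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 4 & 3 \end{pmatrix}$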
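The labeling condition can be tested on the example game, read here as A = [[0,6],[2,5],[3,3]] and B = [[1,0],[0,2],[4,3]]. This is a minimal sketch: the function names are assumptions, and labels 1 to 3 stand for player 1's actions while labels 4 and 5 stand for player 2's actions, matching a1 to a5 in the slides.

```python
from fractions import Fraction as F

A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]

def labels_p(p, B):
    """Labels of p: player 1's unplayed actions plus player 2's best responses to p."""
    m, n = len(B), len(B[0])
    pay = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    unplayed = {i + 1 for i in range(m) if p[i] == 0}
    best = {m + j + 1 for j in range(n) if pay[j] == max(pay)}
    return unplayed | best

def labels_q(q, A):
    """Labels of q: player 2's unplayed actions plus player 1's best responses to q."""
    m, n = len(A), len(A[0])
    pay = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    unplayed = {m + j + 1 for j in range(n) if q[j] == 0}
    best = {i + 1 for i in range(m) if pay[i] == max(pay)}
    return unplayed | best

def completely_labeled(p, q, A, B):
    """True when every label 1..m+n is a label of p or of q."""
    return labels_p(p, B) | labels_q(q, A) == set(range(1, len(A) + len(A[0]) + 1))

p, q = [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]
print(labels_p(p, B), labels_q(q, A), completely_labeled(p, q, A, B))
```

Exact fractions are used so that ties between best responses are detected without floating-point tolerance.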
24
Lemke-Howson: Example
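[Figure: labeled graphs G1 (strategies of player 1) and G2 (strategies of player 2); labels 1-5 correspond to actions a1-a5]

U1 (payoffs of player 1):
      a4  a5
a1    0   6
a2    2   5
a3    3   3

U2 (payoffs of player 2):
      a4  a5
a1    1   0
a2    0   2
a3    4   3

G1 vertices: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3)
G2 vertices: (0,1), (1,0), (2/3,1/3), (1/3,2/3)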
26
LM: non-degenerate
• A two-player game is non-degenerate if,
  – given a strategy (p or q) with support of size k,
  – there are at most k pure best responses
• Many equivalent definitions
• Theorem: for a non-degenerate game there are
  – a finite number of p with m labels
  – a finite number of q with n labels
27
LM: Graphs
• Consider distributions where:
  – player 1 has m labels
  – player 2 has n labels
• Graph (per player):
  – join nodes that share all but 1 label
• Product graph:
  – nodes are pairs of nodes (p, q)
  – edges: if (p, p') is an edge then (p, q)-(p', q) is an edge (and symmetrically for q)
28
LM
• Completely labeled node:
  – a node that has all m+n labels
  – Nash!
• k-almost completely labeled node:
  – has all labels except label k
• k-almost completely labeled edge:
  – all labels on both sides except label k
• Artificial node: (0,0)
29
LM : Paths
• Any Nash equilibrium
  – is connected to exactly one vertex that is k-almost completely labeled
• Any k-almost completely labeled node
  – has two neighbors in the graph
• Follows from non-degeneracy!
30
LM: algo
• start at (0,0)
• drop label k
• follow a path
• end of the path is a Nash
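The path-following steps above are usually implemented with complementary pivoting on two tableaux rather than on an explicit graph. The sketch below is one such tableau variant, offered as an illustration and not as the lecture's own formulation: the function names are assumptions, labels are 0-indexed (0..m-1 for player 1, m..m+n-1 for player 2), payoffs are shifted to be positive, and the game is assumed non-degenerate so the minimum-ratio test has no ties.

```python
from fractions import Fraction

def pivot(T, basis, enter):
    """Bring column `enter` into the basis; return the label that leaves."""
    rows = [r for r in range(len(T)) if T[r][enter] > 0]
    r = min(rows, key=lambda i: T[i][-1] / T[i][enter])   # minimum-ratio test
    piv = T[r][enter]
    T[r] = [x / piv for x in T[r]]
    for i in range(len(T)):
        if i != r and T[i][enter] != 0:
            f = T[i][enter]
            T[i] = [a - f * b for a, b in zip(T[i], T[r])]
    leave, basis[r] = basis[r], enter
    return leave

def lemke_howson(A, B, k=0):
    """Start at the artificial node (0,0), drop label k, follow the path to a Nash pair."""
    m, n = len(A), len(A[0])
    shift = 1 - min(min(map(min, A)), min(map(min, B)))   # make all payoffs positive
    A = [[Fraction(a + shift) for a in row] for row in A]
    B = [[Fraction(b + shift) for b in row] for row in B]
    one, zero = Fraction(1), Fraction(0)
    # TP encodes B^T x + s = 1; TQ encodes r + A y = 1. Column c carries label c.
    TP = [[B[i][j] for i in range(m)] + [one if c == j else zero for c in range(n)] + [one]
          for j in range(n)]
    TQ = [[one if c == i else zero for c in range(m)] + A[i] + [one] for i in range(m)]
    bP, bQ = list(range(m, m + n)), list(range(m))
    order = [(TP, bP), (TQ, bQ)] if k < m else [(TQ, bQ), (TP, bP)]
    enter, turn = k, 0
    while True:
        T, basis = order[turn % 2]
        leave = pivot(T, basis, enter)
        if leave == k:                 # the dropped label is picked up again
            break
        enter, turn = leave, turn + 1  # the duplicate label enters in the other tableau
    x, y = [zero] * m, [zero] * n
    for r, var in enumerate(bP):
        if var < m:
            x[var] = TP[r][-1]
    for r, var in enumerate(bQ):
        if var >= m:
            y[var - m] = TQ[r][-1]
    return [v / sum(x) for v in x], [v / sum(y) for v in y]

A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
print(lemke_howson(A, B, k=0))
print(lemke_howson(A, B, k=1))
```

Different dropped labels can lead to different equilibria, which is how the example's other equilibria can be reached; the method is not guaranteed to find all of them.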
31
Lemke-Howson: Algorithm
[Figure: labeled graphs G1 and G2 with labels 1-5 and actions a1-a5. G1 vertices: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3). G2 vertices: (0,1), (1,0), (2/3,1/3), (1/3,2/3)]
34
Lemke-Howson: Other Equilibria
[Figure: labeled graphs G1 and G2 with labels 1-5 and actions a1-a5. G1 vertices: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3). G2 vertices: (0,1), (1,0), (2/3,1/3), (1/3,2/3)]
35
LM: Theorem
• Consider a non-degenerate game
• Graph consists of disjoint paths and cycles
• End points of paths are Nash
  – or (0,0)
• Number of Nash is odd.
36
LM: Sketch of Proof
• Deleting a label k
  – makes the support larger, or
  – makes the BR set smaller
• Smaller BR
  – solve for the smaller BR
  – subtract from the distribution until one component is zero
• Larger support
  – unique solution (since non-degenerate)