Computing Nash Equilibrium

Presenter: Yishay Mansour

Outline

• Problem Definition
• Notation
• Last week: Zero-Sum game
• This week:
  – Zero Sum: Online algorithm
  – General Sum Games
    • Multiple players – approximate Nash
    • 2 players – exact Nash

• Multiple players $N = \{1, \ldots, n\}$
• Strategy set
  – Player i has m actions $S_i = \{s_{i1}, \ldots, s_{im}\}$
  – $S_i$ are the pure actions of player i
  – $S = \times_i S_i$
• Payoff functions
  – Player i: $u_i : S \to \mathbb{R}$

Strategies

• Pure strategies: actions
• Mixed strategy
  – Player i: $p_i$, a distribution over $S_i$
  – Game: $P = \times_i p_i$
    • Product distribution
• Modified distribution
  – $P_{-i}$ = the distribution P without player i's component
  – $(q, P_{-i})$ = player i plays q, every other player j plays $p_j$

Notations

• Average Payoff
  – Player i: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_{s \in S} P(s)\, u_i(s)$
  – $P(s) = \prod_i p_i(s_i)$
• Nash Equilibrium
  – $P^*$ is a Nash equilibrium if, for every player i and any distribution $q_i$,
  – $u_i(q_i, P^*_{-i}) \le u_i(P^*)$
• Best Response
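
A minimal Python sketch of the two definitions above, assuming payoffs are given as functions of a pure profile. The names `expected_payoff` and `is_nash` and the matching-pennies example are illustrative choices, not from the slides. Since $u_i(q_i, P_{-i})$ is linear in $q_i$, checking pure deviations is enough.

```python
from itertools import product

def expected_payoff(u_i, P):
    """u_i(P) = sum over pure profiles s of P(s) * u_i(s), with P(s) = prod_i p_i(s_i)."""
    total = 0.0
    for s in product(*(range(len(p)) for p in P)):
        prob = 1.0
        for i, a in enumerate(s):
            prob *= P[i][a]
        total += prob * u_i(s)
    return total

def is_nash(payoffs, P, tol=1e-9):
    """Check u_i(q_i, P_-i) <= u_i(P) for every player i and every pure q_i."""
    for i, u_i in enumerate(payoffs):
        base = expected_payoff(u_i, P)
        for a in range(len(P[i])):
            pure = [1.0 if b == a else 0.0 for b in range(len(P[i]))]
            deviation = P[:i] + [pure] + P[i + 1:]
            if expected_payoff(u_i, deviation) > base + tol:
                return False
    return True

# Matching pennies (assumed example): player 0 wins on a match, player 1 on a mismatch.
u0 = lambda s: 1.0 if s[0] == s[1] else -1.0
u1 = lambda s: -u0(s)
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(is_nash([u0, u1], uniform))                    # True
print(is_nash([u0, u1], [[1.0, 0.0], [1.0, 0.0]]))   # False
```

The loop over all pure profiles is exponential in the number of players, so this is only meant for tiny games.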

Two player games

• Payoff matrices (A, B)
  – m rows and n columns
  – player 1 has m actions, player 2 has n actions
• Strategies p and q
• Payoffs: $u_1(p,q) = p A q^t$ and $u_2(p,q) = p B q^t$
• Zero sum game
  – $A = -B$

Online learning

• Playing with an unknown payoff matrix
• Online algorithm:
  – At each step selects an action
    • can be stochastic or fractional
  – Observes all possible payoffs
  – Updates its parameters
• Goal: Achieve the value of the game
  – Payoff matrix of the “game” is defined at the end

Online learning - Algorithm

• Notations:
  – Opponent distribution $Q_t$
  – Our distribution $P_t$
  – Observed cost $M(i, Q_t)$
    • That is, entry i of $M Q_t$; and $M(P_t, Q_t) = P_t M Q_t$
    • cost in [0,1]
  – Goal: minimize cost
• Algorithm: Exponential weights
  – Action i has weight proportional to $b^{L(i,t)}$
  – $L(i,t)$ = loss of action i until time t

Online algorithm: Notations

• Formally:
  – Number of total steps T is known
  – Parameter: b, with $0 < b < 1$
  – $w_{t+1}(i) = w_t(i)\, b^{M(i, Q_t)}$
  – $Z_t = \sum_i w_t(i)$
  – $P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$, so that each $P_t$ sums to 1
  – Initially, $P_1(i) > 0$ for every i
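
One round of the update can be sketched in Python as follows. The function name `hedge_step` and the cost vectors are made up for illustration; the weights are normalized by their own sum so that the result is a distribution.

```python
def hedge_step(weights, costs, b):
    """One round of exponential weights: w_{t+1}(i) = w_t(i) * b**cost_i, then normalize."""
    new_w = [w * b ** c for w, c in zip(weights, costs)]
    z = sum(new_w)
    return new_w, [w / z for w in new_w]

weights = [1.0, 1.0, 1.0]          # uniform start, so P_1(i) > 0 for every i
b = 0.5
for costs in ([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]):   # hypothetical observed costs M(i, Q_t)
    weights, dist = hedge_step(weights, costs, b)
print(weights)   # [0.25, 1.0, 0.5]
print(dist)      # action 1, which never paid a cost, now has the largest probability
```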

Online algorithm: Theorem

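• Theorem
  – For any matrix M with entries in [0,1]
  – Any sequence of distributions $Q_1, \ldots, Q_T$
  – The algorithm generates $P_1, \ldots, P_T$ such that
    $\sum_{t=1}^{T} M(P_t, Q_t) \le \min_P \left[ a_b \sum_{t=1}^{T} M(P, Q_t) + c_b\, RE(P \,\|\, P_1) \right]$
    where $a_b = \frac{\ln(1/b)}{1-b}$ and $c_b = \frac{1}{1-b}$
  – $RE(A \,\|\, B) = E_{x \sim A}[\ln(A(x)/B(x))]$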

Relative Entropy

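• For any two distributions A and B
• $RE(A \,\|\, B) = E_{x \sim A}[\ln(A(x)/B(x))]$
  – Can be infinite
    • $B(x) = 0$ and $A(x) \neq 0$
  – Always non-negative
    • log is concave: $\sum_i a_i \log b_i \le \log \sum_i a_i b_i$ (weights $a_i \ge 0$ summing to 1)
    • $\sum_x A(x) \ln \frac{B(x)}{A(x)} \le \ln \sum_x A(x) \frac{B(x)}{A(x)} = 0$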
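
A small Python sketch of relative entropy, covering the infinite case and the non-negativity claim. The function name and the test distributions are illustrative.

```python
import math

def relative_entropy(A, B):
    """RE(A||B) = sum_x A(x) ln(A(x)/B(x)); terms with A(x) = 0 contribute 0."""
    total = 0.0
    for a, b in zip(A, B):
        if a == 0:
            continue
        if b == 0:
            return math.inf      # B(x) = 0 while A(x) != 0
        total += a * math.log(a / b)
    return total

uniform = [0.25] * 4
point = [1.0, 0.0, 0.0, 0.0]
print(relative_entropy(point, uniform))    # ln 4, about 1.386
print(relative_entropy(uniform, point))    # inf
print(relative_entropy(uniform, uniform))  # 0.0
```

Against a uniform distribution over n points the value never exceeds ln n, which is where the ln n term in the regret bound comes from.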

Online algorithm: Analysis

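• Lemma
  – For any mixed strategy P:
    $RE(P \,\|\, P_{t+1}) - RE(P \,\|\, P_t) \le \ln(1/b)\, M(P, Q_t) + \ln\!\left(1 - (1-b)\, M(P_t, Q_t)\right)$
• Corollary
  – With $P_1$ uniform over the n actions:
    $\sum_{t=1}^{T} M(P_t, Q_t) \le \frac{\ln(1/b)}{1-b} \min_P \sum_{t=1}^{T} M(P, Q_t) + \frac{\ln n}{1-b}$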

Online Algorithm: Optimization

• $b = 1/(1 + \sqrt{2 (\ln n)/T})$
  – additional loss per step: $O(\sqrt{(\ln n)/T})$
• Zero sum game:
  – Average loss: at most v (the value of the game)
  – plus additional loss $O(\sqrt{(\ln n)/T})$
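
The effect of this choice of b can be checked numerically. The sketch below runs exponential weights on an arbitrary random cost sequence (an assumption made only for the demo) and compares the average cost with the best fixed action. The explicit slack $\sqrt{2 (\ln n)/T} + (\ln n)/T$ is the standard constant behind the O-term for this tuning, and is not stated on the slide.

```python
import math
import random

def run_exponential_weights(cost_rows, n, b):
    """cost_rows[t][i] plays the role of M(i, Q_t); returns the total expected cost."""
    weights = [1.0] * n
    total = 0.0
    for costs in cost_rows:
        z = sum(weights)
        total += sum(w / z * c for w, c in zip(weights, costs))   # M(P_t, Q_t)
        weights = [w * b ** c for w, c in zip(weights, costs)]
    return total

n, T = 5, 2000
rng = random.Random(0)                       # arbitrary seed, for repeatability
cost_rows = [[rng.random() for _ in range(n)] for _ in range(T)]
b = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))
avg_alg = run_exponential_weights(cost_rows, n, b) / T
# The minimum over mixed strategies P is attained at a single action.
avg_best = min(sum(row[i] for row in cost_rows) for i in range(n)) / T
extra = math.sqrt(2.0 * math.log(n) / T) + math.log(n) / T
print(avg_alg - avg_best, "<=", extra)
```

When the costs come from a zero-sum game, the best fixed strategy pays at most v on average, which gives the second bullet.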

Example: Zero Sum

Two players General sum games

• Input matrices (A, B)
• No unique value
• Computational issues:
  – find some Nash
  – find all Nash
    • Can be exponentially many
    • e.g., identity matrix
• Example 2xN
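
The identity-matrix remark can be made concrete with a short Python sketch, assuming the game A = B = I: for every non-empty support S, both players mixing uniformly on S is an equilibrium, which already gives $2^n - 1$ of them. Function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def is_nash(A, B, p, q):
    """(p, q) is Nash iff neither player has a pure deviation with a higher payoff."""
    m, n = len(A), len(A[0])
    row_pay = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    col_pay = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u = sum(p[i] * row_pay[i] for i in range(m))
    v = sum(q[j] * col_pay[j] for j in range(n))
    return max(row_pay) <= u and max(col_pay) <= v

def identity_game_equilibria(n):
    """Supports S for which 'both players uniform on S' is Nash when A = B = identity."""
    I = [[int(i == j) for j in range(n)] for i in range(n)]
    found = []
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            p = [Fraction(1, k) if i in S else Fraction(0) for i in range(n)]
            if is_nash(I, I, p, p):
                found.append(S)
    return found

print(len(identity_game_equilibria(4)))   # 15 = 2**4 - 1
```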

Computational Complexity

• Complexity of finding a sample equilibrium is unknown
  – “…no proof of NP-completeness seems possible” (Papadimitriou, 94)
• Equilibria with certain properties are NP-Hard
  – e.g., max-payoff, max-support
• (Even) for symmetric 2-player games:
  – NE with expected social welfare at least k?
  – NE with least payoff at least k?
  – Pareto-optimal NE?
  – NE with player 1 EU of at least k?
  – multiple NE?
  – NE where player 1 plays (or not) a particular strategy?
  (Gilboa & Zemel; Conitzer & Sandholm)

• Player 1 best response:
  – Like for zero sum:
  – Fix strategy q of player 2
  – maximize $p (A q^t)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
  – dual LP: minimize u such that $u \ge (A q^t)_i$ for every row i
  – Strong duality: $p (A q^t) = u = p\,u$ (u read as the vector (u, ..., u))
    • $p (u - A q^t) = 0$
    • complementary system
• Player 2: $q (v - p B) = 0$
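
Written out, the primal and dual programs behind these bullets look roughly as follows. This is a sketch for a fixed q; the row index i is introduced here for readability.

```latex
\begin{aligned}
\text{(primal)}\quad & \max_{p}\; p\,(A q^t)
  && \text{s.t. } \textstyle\sum_i p_i = 1,\; p_i \ge 0 \\
\text{(dual)}\quad & \min_{u}\; u
  && \text{s.t. } u \ge (A q^t)_i \text{ for every row } i \\
\text{(strong duality)}\quad & p\,(A q^t) = u = \textstyle\sum_i p_i\, u
  && \Longrightarrow\; \textstyle\sum_i p_i \bigl(u - (A q^t)_i\bigr) = 0
\end{aligned}
```

Every term $p_i (u - (A q^t)_i)$ is non-negative, so each one must vanish: a row played with positive probability earns exactly the best-response payoff u.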

Nash: Linear Complementary System

• Find distributions p and q and values u and v such that
  – $u \ge A q^t$
  – $v \ge p B$
  – $p (u - A q^t) = 0$
  – $q (v - p B) = 0$
  – $\sum_j p_j = 1$ and $p_j \ge 0$
  – $\sum_j q_j = 1$ and $q_j \ge 0$
• Assume the support of strategies is known.
  – p has support $S_p$ and q has support $S_q$
  – Can then formulate Nash as an LP
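
Once the supports are fixed, the complementarity conditions become linear: actions in the support must earn equal payoff, and actions outside must not earn more. The sketch below handles the simplest case, supports of equal size, by solving the indifference equations exactly and then checking the inequalities. The helper names and the 3x2 example game are assumptions for illustration, not taken from the slides.

```python
from fractions import Fraction

def solve(M, rhs):
    """Gauss-Jordan elimination over Fractions; returns None if the system is singular."""
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(k):
        piv = next((r for r in range(c, k) if aug[r][c] != 0), None)
        if piv is None:
            return None
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def nash_for_supports(A, B, Sp, Sq):
    """Return (p, q) if supports Sp, Sq (equal size) carry a Nash equilibrium, else None."""
    m, n, k = len(A), len(A[0]), len(Sp)
    # Unknowns q_j, j in Sq: all rows of Sp earn the same payoff, and q sums to 1.
    Mq = [[A[Sp[0]][j] - A[i][j] for j in Sq] for i in Sp[1:]] + [[1] * k]
    # Unknowns p_i, i in Sp: all columns of Sq earn the same payoff, and p sums to 1.
    Mp = [[B[i][Sq[0]] - B[i][j] for i in Sp] for j in Sq[1:]] + [[1] * k]
    rhs = [0] * (k - 1) + [1]
    qs, ps = solve(Mq, rhs), solve(Mp, rhs)
    if qs is None or ps is None or min(qs) < 0 or min(ps) < 0:
        return None
    p, q = [Fraction(0)] * m, [Fraction(0)] * n
    for i, val in zip(Sp, ps):
        p[i] = val
    for j, val in zip(Sq, qs):
        q[j] = val
    row_pay = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    col_pay = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    if max(row_pay) > row_pay[Sp[0]] or max(col_pay) > col_pay[Sq[0]]:
        return None   # an action outside the support does strictly better
    return p, q

# Assumed 3x2 example game (rows: player 1, columns: player 2).
A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
print(nash_for_supports(A, B, [1, 2], [0, 1]))   # p = (0, 1/3, 2/3), q = (2/3, 1/3)
print(nash_for_supports(A, B, [0, 2], [0, 1]))   # None
```

Trying every pair of supports this way finds all equilibria of a non-degenerate game, at exponential cost in the number of actions.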

Approximate Nash

• Assume we are given Nash
  – strategies (p, q)
• Show that there exists:
  – small support
  – epsilon-Nash
• Brute force search
  – enumerate all small supports!
  – Each one requires only poly. time
• Proof!
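
The proof is not reproduced on the slide. One common route, assumed here, is a sampling argument: draw k actions independently from each equilibrium strategy and play the empirical distribution, whose support has at most k actions. The sketch below only illustrates that construction on a made-up coordination game; it measures the resulting epsilon rather than proving a bound.

```python
import random

def payoff(M, p, q):
    return sum(p[i] * M[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))

def epsilon(A, B, p, q):
    """Largest gain from a pure deviation; (p, q) is an epsilon-Nash for this value."""
    m, n = len(A), len(A[0])
    best_row = max(sum(A[i][j] * q[j] for j in range(n)) for i in range(m))
    best_col = max(sum(p[i] * B[i][j] for i in range(m)) for j in range(n))
    return max(best_row - payoff(A, p, q), best_col - payoff(B, p, q))

def sampled_strategy(p, k, rng):
    """Empirical distribution of k independent draws from p: support size at most k."""
    counts = [0] * len(p)
    for a in rng.choices(range(len(p)), weights=p, k=k):
        counts[a] += 1
    return [c / k for c in counts]

# Hypothetical example: a 6-action coordination game with a uniform full-support equilibrium.
n = 6
A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
B = A
p = q = [1.0 / n] * n
rng = random.Random(1)     # arbitrary seed
for k in (3, 30, 300):
    ps, qs = sampled_strategy(p, k, rng), sampled_strategy(q, k, rng)
    print(k, round(epsilon(A, B, ps, qs), 3))
```

In this construction the sample size needed for a given epsilon grows only logarithmically with the number of actions, which is what makes enumerating all small supports feasible.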

Lemke & Howson

• Define labeling
• For strategy p (player 1):
  – Label i: if $p_i = 0$, where i is an action of player 1
  – Label j: if action j (player 2) is a best response to p
    • $b_j p \ge b_k p$ for every k (where $b_j$ is column j of B)
• Similar for player 2 (strategy q):
  – Label j: if $q_j = 0$, where j is an action of player 2
  – Label i: if action i (player 1) is a best response to q
    • $a_i q \ge a_k q$ for every k (where $a_i$ is row i of A)

LM algo

• Strategy (p, q) is Nash if and only if:
  – Each label k is either a label of p or q (or both)

• Proof!

• Example
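
The labeling and the "completely labeled" test can be sketched in Python. Labels are numbered 0..m-1 for player 1's actions and m..m+n-1 for player 2's. The payoff matrices are an assumption (a standard 3x2 textbook game), since the matrices on the example slide did not survive.

```python
from fractions import Fraction as F

def labels_of_p(p, B):
    """Labels of player 1's strategy p: own unplayed actions, plus player 2's best responses."""
    m, n = len(B), len(B[0])
    pay = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    own = {i for i in range(m) if p[i] == 0}
    best = {m + j for j in range(n) if pay[j] == max(pay)}
    return own | best

def labels_of_q(q, A):
    """Labels of player 2's strategy q: own unplayed actions, plus player 1's best responses."""
    m, n = len(A), len(A[0])
    pay = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    own = {m + j for j in range(n) if q[j] == 0}
    best = {i for i in range(m) if pay[i] == max(pay)}
    return own | best

def completely_labeled(p, q, A, B):
    return labels_of_p(p, B) | labels_of_q(q, A) == set(range(len(A) + len(A[0])))

# Assumed example game; exact fractions keep the best-response ties exact.
A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
p, q = [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]
print(sorted(labels_of_p(p, B)), sorted(labels_of_q(q, A)))   # [0, 3, 4] [1, 2]
print(completely_labeled(p, q, A, B))                          # True
print(completely_labeled([F(1), F(0), F(0)], [F(1), F(0)], A, B))   # False
```

In the last call label 0 is missing: player 1 plays action 0, but it is not a best response to q.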

Lemke-Howson: Example

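[Figure: payoff matrices U1 and U2 (entries not recoverable) and labeled strategy diagrams G1 and G2. Marked points: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3) for the player with three actions; (2/3,1/3), (1/3,2/3) for the player with two actions.]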

LM: non-degenerate

• A two player game is non-degenerate if
  – given a strategy (p or q) with support of size k
  – there are at most k pure best responses
• Many equivalent definitions
• Theorem: For a non-degenerate game
  – finite number of p with m labels
  – finite number of q with n labels

LM: Graphs

• Consider distributions where:
  – player 1 has m labels
  – player 2 has n labels
• Graph (per player):
  – join nodes that share all but 1 label
• Product graph:
  – nodes are pairs of nodes (p, q)
  – edges: if (p, p’) is an edge then (p, q)-(p’, q) is an edge
• Completely labeled node:
  – node that has m+n labels
  – Nash!
• Node: k-almost completely labeled
  – has all labels except label k
• Edge: k-almost completely labeled
  – all labels on both sides except label k
• Artificial node: (0,0)

LM : Paths

• Any Nash Eq.
  – is connected to exactly one vertex that is k-almost completely labeled
• Any k-almost completely labeled node
  – has two neighbors in the graph
• Follows from the non-degeneracy!

LM: algo

• start at (0,0)

• drop label k

• follow a path

• end of the path is a Nash
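
The four steps above can be sketched with complementary pivoting on two tableaux, one per player. This is a minimal sketch, assuming a non-degenerate game (ties in the ratio test are not handled) and using 0-based labels: 0..m-1 for player 1's actions, m..m+n-1 for player 2's. The function names and the example game are illustrative assumptions.

```python
from fractions import Fraction

def pivot(tableau, basis, col):
    """Bring column `col` into the basis (minimum ratio test); return the label that leaves."""
    rows = [r for r in range(len(tableau)) if tableau[r][col] > 0]
    row = min(rows, key=lambda r: tableau[r][-1] / tableau[r][col])
    piv = tableau[row][col]
    tableau[row] = [v / piv for v in tableau[row]]
    for r in range(len(tableau)):
        if r != row:
            f = tableau[r][col]
            tableau[r] = [a - f * b for a, b in zip(tableau[r], tableau[row])]
    leaving, basis[row] = basis[row], col
    return leaving

def lemke_howson(A, B, k=0):
    m, n = len(A), len(A[0])
    lo = min(min(map(min, A)), min(map(min, B)))
    shift = max(0, 1 - lo)                  # make all payoffs positive; equilibria are unchanged
    A = [[Fraction(v) + shift for v in row] for row in A]
    B = [[Fraction(v) + shift for v in row] for row in B]
    unit = lambda size, at: [Fraction(int(c == at)) for c in range(size)]
    # Player 1's side: B^T x + s = 1. Columns 0..m-1 are x, columns m..m+n-1 are slacks.
    TP = [[B[i][j] for i in range(m)] + unit(n, j) + [Fraction(1)] for j in range(n)]
    # Player 2's side: r + A y = 1. Columns 0..m-1 are slacks, columns m..m+n-1 are y.
    TQ = [unit(m, i) + [A[i][j] for j in range(n)] + [Fraction(1)] for i in range(m)]
    basis_P, basis_Q = [m + j for j in range(n)], list(range(m))   # artificial node (0,0)
    sides = [(TP, basis_P), (TQ, basis_Q)]
    side, entering = (0 if k < m else 1), k  # drop label k
    while True:
        leaving = pivot(*sides[side], entering)
        if leaving == k:                     # label k is picked up again: completely labeled
            break
        side, entering = 1 - side, leaving   # drop the duplicate label on the other side
    x, y = [Fraction(0)] * m, [Fraction(0)] * n
    for r, lab in enumerate(basis_P):
        if lab < m:
            x[lab] = TP[r][-1]
    for r, lab in enumerate(basis_Q):
        if lab >= m:
            y[lab - m] = TQ[r][-1]
    return [v / sum(x) for v in x], [v / sum(y) for v in y]

# Assumed 3x2 example game; dropping label 1 (player 1's second action).
A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
x, y = lemke_howson(A, B, k=1)
print(x, y)   # x = (2/3, 1/3, 0), y = (1/3, 2/3)
```

Different starting labels k may end at different equilibria, and some equilibria may not be reachable from (0,0) at all.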

Lemke-Howson: Algorithm

[Figure, three steps: the path followed by the algorithm on diagrams G1 and G2. Marked points: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3) in one diagram; (2/3,1/3), (1/3,2/3) in the other.]

Lemke-Howson: Other Equilibria

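[Figure: the other equilibria shown on diagrams G1 and G2. Marked points: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3) in one diagram; (2/3,1/3), (1/3,2/3) in the other.]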

LM: Theorem

• Consider a non-degenerate game

• Graph consists of disjoint paths and cycles

• End points of paths are Nash
  – or (0,0)

• Number of Nash is odd.

LM: Sketch of Proof

• Deleting a label k
  – making support larger
  – making BR smaller
• Smaller BR
  – solve for the smaller BR
  – subtract from dist. until one component is zero
• Larger support
  – unique solution (since non-degenerate)
