Preliminaries State of the art Framework Application to EFGs Experimental evaluation
Learning Correlation in Multi-Player General-Sum Games with Regret Minimization
Tommaso Bianchi
Advisor: Professor Nicola Gatti
CSE Track
September 30, 2019
Learning Correlation in Multi-Player General-Sum Games with Regret Minimization September 30, 2019 1 / 29
Goal
Develop novel algorithms to efficiently compute game-theoretic equilibria that enable correlation among players.
General approach for all multi-player, general-sum games.
Online and decentralized computation via regret minimization.
Game representations - Normal-form game
                player 2
                L       R
player 1   T    4, 4    1, 5
           B    5, 1    2, 2
Model simultaneous, one-shot interactions.
Each player's goal is to play so as to maximize its own utility.
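A minimal Python sketch, assuming the payoffs of the game above are stored in a dictionary keyed by action profile (the names `PAYOFFS`, `expected_utility` and `best_responses_1` are illustrative, not from the thesis). It computes expected utilities under independent mixed strategies and player 1's pure best responses.

```python
# Payoffs (player 1, player 2) of the 2x2 game above.
ACTIONS_1 = ["T", "B"]
ACTIONS_2 = ["L", "R"]
PAYOFFS = {
    ("T", "L"): (4, 4), ("T", "R"): (1, 5),
    ("B", "L"): (5, 1), ("B", "R"): (2, 2),
}


def expected_utility(player, x1, x2):
    """Expected utility of `player` (0 or 1) when the players independently
    play the mixed strategies x1 and x2 (dicts action -> probability)."""
    return sum(
        x1[a1] * x2[a2] * PAYOFFS[(a1, a2)][player]
        for a1 in ACTIONS_1
        for a2 in ACTIONS_2
    )


def best_responses_1(x2):
    """Pure actions of player 1 that maximize its utility against x2."""
    values = {
        a1: expected_utility(0, {b: float(b == a1) for b in ACTIONS_1}, x2)
        for a1 in ACTIONS_1
    }
    best = max(values.values())
    return [a for a, v in values.items() if v == best]


uniform_2 = {"L": 0.5, "R": 0.5}
print(expected_utility(0, {"T": 0.5, "B": 0.5}, uniform_2))  # 3.0
print(best_responses_1(uniform_2))  # ['B']
```

Since B gives player 1 exactly one more unit than T against both L and R, B is a best response to every strategy of player 2.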
Game representations - Extensive-form game
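player 1 (root) chooses T or B
  after T, player 2 chooses L → (4, 4) or R → (1, 5)
  after B, player 2 chooses L → (5, 1) or R → (2, 2)
  the two player 2 nodes form information set I1; payoffs are (player 1, player 2)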
Model sequential interactions among players.
Can explicitly model imperfect information through information sets, i.e., sets of nodes of a player that the player cannot distinguish.
Game representations - Equivalence
player 1 (root) chooses T or B
  after T, player 2 chooses L1 → (4, 4) or R1 → (1, 5)
  after B, player 2 chooses L2 → (5, 1) or R2 → (2, 2)

                player 2
                L1L2    L1R2    R1L2    R1R2
player 1   T    4, 4    4, 4    1, 5    1, 5
           B    5, 1    2, 2    5, 1    2, 2
Equivalence by enumerating all the possible action plans, which specify an action for each information set.
The set of action plans has a cardinality that is exponential in the size of the extensive-form game.
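The enumeration can be sketched in Python for player 2 in the tree above; `INFOSETS_2`, `LEAVES` and the helper names are hypothetical. Action plans are the Cartesian product of the per-information-set action sets, and each plan yields one column of the induced table.

```python
from itertools import product

# Player 2's information sets in the tree above, with their actions.
INFOSETS_2 = {"after T": ["L1", "R1"], "after B": ["L2", "R2"]}

# Leaf payoffs (player 1, player 2), indexed by player 1's action and the
# action player 2 takes at the information set that is reached.
LEAVES = {("T", "L1"): (4, 4), ("T", "R1"): (1, 5),
          ("B", "L2"): (5, 1), ("B", "R2"): (2, 2)}


def action_plans(infosets):
    """All action plans: one action chosen at every information set."""
    names = list(infosets)
    return [dict(zip(names, combo)) for combo in product(*infosets.values())]


def induced_normal_form():
    """Payoff of each (player 1 action, player 2 action plan) pair."""
    reached = {"T": "after T", "B": "after B"}
    table = {}
    for plan in action_plans(INFOSETS_2):
        label = "".join(plan.values())
        for a1 in ("T", "B"):
            table[(a1, label)] = LEAVES[(a1, plan[reached[a1]])]
    return table


table = induced_normal_form()
print(table[("B", "L1R2")])  # (2, 2)

# The number of plans is the product of the action counts:
# ten binary information sets already give 2**10 plans.
print(len(action_plans({f"I{k}": ["a", "b"] for k in range(10)})))  # 1024
```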
A behavioural strategy $\pi_i$ for player $i$ is a function specifying a probability distribution over the actions available at each information set $I \in \mathcal{I}_i$.
In extensive-form games, behavioural strategies allow for a much more compact representation than the normal-form strategies of the equivalent normal-form game.
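To illustrate the difference in size, a hedged sketch that maps a behavioural strategy of player 2 (tree of the equivalence slide, illustrative probabilities) to a normal-form strategy. It uses one standard realization-equivalent construction under perfect recall: each action plan receives the product of its local action probabilities. All names are hypothetical.

```python
from itertools import product
from math import prod

# A behavioural strategy for player 2: one probability distribution per
# information set (the numbers are only an example).
behavioural = {
    "after T": {"L1": 0.3, "R1": 0.7},
    "after B": {"L2": 0.6, "R2": 0.4},
}


def to_normal_form(pi):
    """Map a behavioural strategy to a normal-form (mixed) strategy by
    treating the choices at different information sets as independent:
    each action plan gets the product of its local action probabilities."""
    names = list(pi)
    mixed = {}
    for plan in product(*(pi[name] for name in names)):
        mixed[plan] = prod(pi[name][a] for name, a in zip(names, plan))
    return mixed


x2 = to_normal_form(behavioural)
print(round(x2[("R1", "L2")], 2))  # 0.42
```

A behavioural strategy needs $\sum_I |A(I)|$ numbers, the induced normal-form strategy $\prod_I |A(I)|$; in this tiny example both are 4, but the second grows exponentially with the number of information sets.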
Strategy representations - Joint strategies
player 1:   T                           B
player 2:   L1L2  L1R2  R1L2  R1R2      L1L2  L1R2  R1L2  R1R2
x:          0.1   0.1   0     0.2       0.4   0     0     0.2
(the eight probabilities sum to 1)
A normal-form joint strategy $x$ is a probability distribution over the set $A = \times_{i \in P} A_i$ of action profiles of the players.
Joint strategies specify how players correlate their play.
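It is always possible to construct a joint strategy from a set of marginal normal-form strategies (one for each player) by taking their product; the converse is not always true, since a joint strategy may correlate the players' actions in a way that no product of marginals reproduces.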
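A short Python sketch of both directions on the 2x2 game from the normal-form slide (function names are hypothetical): a product of marginals is always a valid joint strategy, while a joint strategy can be factored only if it equals the product of its own marginals.

```python
from itertools import product

ACTIONS_1 = ["T", "B"]
ACTIONS_2 = ["L", "R"]


def product_joint(x1, x2):
    """Joint strategy from independent marginals: x(a1, a2) = x1(a1) * x2(a2)."""
    return {(a1, a2): x1[a1] * x2[a2] for a1, a2 in product(ACTIONS_1, ACTIONS_2)}


def marginals(x):
    """Marginal distribution of each player under the joint strategy x."""
    m1 = {a1: sum(x[(a1, a2)] for a2 in ACTIONS_2) for a1 in ACTIONS_1}
    m2 = {a2: sum(x[(a1, a2)] for a1 in ACTIONS_1) for a2 in ACTIONS_2}
    return m1, m2


def is_product(x, tol=1e-9):
    """True if x equals the product of its own marginals."""
    rebuilt = product_joint(*marginals(x))
    return all(abs(x[a] - rebuilt[a]) <= tol for a in x)


# Perfectly correlated play: half the time (T, L), half the time (B, R).
correlated = {("T", "L"): 0.5, ("T", "R"): 0.0, ("B", "L"): 0.0, ("B", "R"): 0.5}
print(is_product(product_joint({"T": 0.3, "B": 0.7}, {"L": 0.5, "R": 0.5})))  # True
print(is_product(correlated))  # False
```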
Solution concepts - Nash equilibrium
A Nash equilibrium (Nash, 1951) is a strategy profile $\hat{x} = (\hat{x}_1, \dots, \hat{x}_n)$ such that no player has any incentive to deviate (i.e., to change its strategy), given that none of the other players deviates.
Nash equilibria model the way in which perfectly rational, selfish agents act when they are completely isolated from each other.
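A sketch of the Nash condition for the 2x2 game from the normal-form slide (helper names are hypothetical). Checking pure deviations suffices, because against fixed opponents the best unilateral deviation can always be taken to be a pure strategy.

```python
# Payoffs (player 1, player 2) of the 2x2 game from the normal-form slide.
PAYOFFS = {("T", "L"): (4, 4), ("T", "R"): (1, 5),
           ("B", "L"): (5, 1), ("B", "R"): (2, 2)}
ACTIONS = (["T", "B"], ["L", "R"])


def utility(player, x1, x2):
    """Expected utility of `player` under independent mixed strategies."""
    return sum(x1[a1] * x2[a2] * PAYOFFS[(a1, a2)][player]
               for a1 in ACTIONS[0] for a2 in ACTIONS[1])


def pure(actions, a):
    """Mixed strategy that plays action `a` with probability 1."""
    return {b: float(b == a) for b in actions}


def nash_gap(x1, x2):
    """Largest gain any single player can obtain by deviating unilaterally;
    the profile is a Nash equilibrium exactly when this is <= 0."""
    gain_1 = max(utility(0, pure(ACTIONS[0], a), x2) for a in ACTIONS[0]) - utility(0, x1, x2)
    gain_2 = max(utility(1, x1, pure(ACTIONS[1], a)) for a in ACTIONS[1]) - utility(1, x1, x2)
    return max(gain_1, gain_2)


print(nash_gap(pure(ACTIONS[0], "B"), pure(ACTIONS[1], "R")))  # 0.0
print(nash_gap(pure(ACTIONS[0], "T"), pure(ACTIONS[1], "L")))  # 1.0
```

(B, R), with payoffs (2, 2), passes the check, even though both players would prefer (T, L).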
Introducing correlation in solution concepts
Correlation is introduced through a mediator, a central device whose role is to send recommendations to the players on how to play.
The mediator draws a sample from a publicly known joint strategy and privately tells each player how they should play.
Players are free to follow the recommendation or to deviate and play differently.
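A minimal sketch of one round of mediation, assuming the joint strategy is a dictionary over action profiles of the 2x2 game (the probabilities and names are illustrative): the mediator samples a profile and each player receives only its own component.

```python
import random

# A publicly known joint strategy over action profiles (illustrative numbers).
JOINT = {("T", "L"): 0.5, ("B", "R"): 0.3, ("T", "R"): 0.2}


def mediator_round(joint, rng):
    """Sample one action profile from `joint` and return the private
    recommendation for each player (player index -> recommended action)."""
    profiles = list(joint)
    weights = [joint[p] for p in profiles]
    profile = rng.choices(profiles, weights=weights, k=1)[0]
    return {player: action for player, action in enumerate(profile)}


rng = random.Random(0)
recommendation = mediator_round(JOINT, rng)
# Player 0 only sees recommendation[0]; player 1 only sees recommendation[1].
# Each player may then follow its recommendation or deviate.
```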
Solution concepts - Coarse-correlated equilibrium
In a Coarse-correlated equilibrium (Moulin and Vial, 1978), knowing a priori only the probability distribution from which recommendations will be sampled, no player has an incentive to deviate from the correlation plan, given that the other players also commit to following it.
Coarse-correlated equilibria are well suited to scenarios where the players have limited communication capabilities and can only communicate before the game starts.
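The CCE condition can be checked numerically. The following sketch (hypothetical names, 2x2 game from the normal-form slide) compares, for each player, the expected utility of following the plan with that of committing in advance to a fixed action while the other player follows the plan.

```python
# Payoffs (player 1, player 2) of the 2x2 game from the normal-form slide.
PAYOFFS = {("T", "L"): (4, 4), ("T", "R"): (1, 5),
           ("B", "L"): (5, 1), ("B", "R"): (2, 2)}
ACTIONS = (["T", "B"], ["L", "R"])


def cce_gap(x):
    """Largest gain a player could get by ignoring the mediator and committing
    in advance to a fixed action, against the other's recommended play.
    The joint strategy x (profile -> probability) is a CCE iff this is <= 0."""
    gaps = []
    for i in (0, 1):
        follow = sum(p * PAYOFFS[a][i] for a, p in x.items())
        for dev in ACTIONS[i]:
            deviate = 0.0
            for a, p in x.items():
                profile = (dev, a[1]) if i == 0 else (a[0], dev)
                deviate += p * PAYOFFS[profile][i]
            gaps.append(deviate - follow)
    return max(gaps)


print(cce_gap({("B", "R"): 1.0}))  # 0.0
print(cce_gap({("T", "L"): 1.0}))  # 1.0
```

In this game B and R are strictly dominant, so any joint strategy putting weight on T or L has a positive gap.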
Regret minimization
Regret measures how much a player would have preferred to play a different strategy instead of the one it actually used.
$$R_i^T := \max_{a_i \in A_i} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) - \sum_{t=1}^{T} u_i(x^t)$$
A regret minimizer is a device providing player $i$'s strategy $x_i^{t+1}$ for the next iteration $t+1$ on the basis of the past history of play.
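A direct transcription of $R_i^T$ for player 1 of the 2x2 game, as a sketch (names are hypothetical; strategies are dictionaries, and a pure strategy is a one-entry dictionary).

```python
# Payoffs of player 1 in the 2x2 game from the normal-form slide.
U1 = {("T", "L"): 4, ("T", "R"): 1, ("B", "L"): 5, ("B", "R"): 2}
ACTIONS_1 = ["T", "B"]


def regret(history_1, history_2):
    """R_1^T for player 1: the best fixed action in hindsight against the
    opponent's observed play, minus the utility actually obtained.
    history_1[t] / history_2[t] are the mixed strategies (dicts) at time t."""
    def u(x1, x2):
        return sum(x1[a] * x2[b] * U1[(a, b)] for a in x1 for b in x2)

    obtained = sum(u(x1, x2) for x1, x2 in zip(history_1, history_2))
    best_fixed = max(
        sum(u({a: 1.0}, x2) for x2 in history_2) for a in ACTIONS_1
    )
    return best_fixed - obtained


# Player 1 always played T while player 2 alternated between L and R.
h1 = [{"T": 1.0}] * 4
h2 = [{"L": 1.0}, {"R": 1.0}] * 2
print(regret(h1, h2))  # 4.0
```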
Regret matching (Hart and Mas-Colell, 2001)
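$$x_i^{T+1}(a_i) = \begin{cases} \dfrac{[R_i^T(a_i)]^+}{\sum_{a'_i \in A_i} [R_i^T(a'_i)]^+} & \text{if } \sum_{a'_i \in A_i} [R_i^T(a'_i)]^+ > 0 \\[4pt] \dfrac{1}{|A_i|} & \text{otherwise} \end{cases}$$
where $R_i^T(a_i) := \sum_{t=1}^{T} \big( u_i(a_i, x_{-i}^t) - u_i(x^t) \big)$ is the cumulative regret of not having played $a_i$, and $[\cdot]^+ := \max\{\cdot, 0\}$.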
Regret matching is a regret minimizer for normal-form games based on a simple idea: the probability of playing an action is proportional to how 'good' it would have been to play it in the past (i.e., to the positive part of the regret of not having played it).
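The rule can be sketched as two small Python functions (hypothetical names): one computes the next strategy from the cumulative per-action regrets $R_i^T(a_i)$, the other adds one iteration's regrets given the utility $u_i(a_i, x_{-i}^t)$ of each action.

```python
def regret_matching(cumulative_regret):
    """Next strategy of a player from its cumulative regrets R_i^T(a_i)
    (dict action -> regret): proportional to the positive parts."""
    positive = {a: max(r, 0.0) for a, r in cumulative_regret.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    # No action has positive regret: play uniformly at random.
    return {a: 1.0 / len(cumulative_regret) for a in cumulative_regret}


def update_regrets(cumulative_regret, action_utilities, strategy):
    """Add one iteration's regrets u_i(a_i, x_-i^t) - u_i(x^t) for each a_i."""
    expected = sum(strategy[a] * action_utilities[a] for a in strategy)
    return {a: cumulative_regret[a] + action_utilities[a] - expected
            for a in cumulative_regret}


print(regret_matching({"T": -1.0, "B": 3.0}))  # {'T': 0.0, 'B': 1.0}
print(regret_matching({"T": -1.0, "B": -2.0}))  # {'T': 0.5, 'B': 0.5}
```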
CFR - Counterfactual regret minimization (Zinkevich et al., 2008)
Counterfactual regret minimization (CFR) is a regret minimizerfor extensive-form games.
Regret is decomposed into local terms at each information set, so as to guarantee that minimizing the local regrets implies minimizing the overall regret.
CFR uses simpler regret minimizers at each information set, such as regret matching.
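A compact sketch of vanilla CFR with regret matching at each information set, for two players and no chance nodes, run on the tree from the equivalence slide. The tuple encoding and all names are assumptions made for illustration. Regret updates are buffered and applied after each full traversal, so every information set uses a fixed strategy during an iteration.

```python
from collections import defaultdict

# Hypothetical encoding: a decision node is ("node", player, infoset, {action: child});
# a leaf is ("leaf", (u1, u2)). Players are indexed 0 and 1; no chance nodes.
TREE = ("node", 0, "root", {
    "T": ("node", 1, "after T", {"L1": ("leaf", (4, 4)), "R1": ("leaf", (1, 5))}),
    "B": ("node", 1, "after B", {"L2": ("leaf", (5, 1)), "R2": ("leaf", (2, 2))}),
})

regret = defaultdict(lambda: defaultdict(float))        # cumulative local regrets
strategy_sum = defaultdict(lambda: defaultdict(float))  # for the average strategy


def local_strategy(infoset, actions):
    """Regret matching on the local regrets of one information set."""
    pos = {a: max(regret[infoset][a], 0.0) for a in actions}
    total = sum(pos.values())
    return {a: pos[a] / total if total > 0 else 1.0 / len(actions) for a in actions}


def traverse(node, reach, pending):
    """Return the expected utilities of `node` and accumulate counterfactual
    regrets in `pending`; reach[p] is player p's own reach probability."""
    if node[0] == "leaf":
        return node[1]
    _, player, infoset, children = node
    sigma = local_strategy(infoset, list(children))
    child_values, value = {}, [0.0, 0.0]
    for a, child in children.items():
        new_reach = list(reach)
        new_reach[player] *= sigma[a]
        child_values[a] = traverse(child, new_reach, pending)
        for p in (0, 1):
            value[p] += sigma[a] * child_values[a][p]
    opponent_reach = reach[1 - player]  # counterfactual weight (no chance)
    for a in children:
        pending[infoset][a] += opponent_reach * (child_values[a][player] - value[player])
        strategy_sum[infoset][a] += reach[player] * sigma[a]
    return value


def cfr(iterations):
    for _ in range(iterations):
        pending = defaultdict(lambda: defaultdict(float))
        traverse(TREE, [1.0, 1.0], pending)
        # Apply all regret updates together (simultaneous updates).
        for infoset, updates in pending.items():
            for a, r in updates.items():
                regret[infoset][a] += r
    # Average strategies, weighted by each player's own reach probability.
    return {I: {a: s / sum(d.values()) for a, s in d.items()}
            for I, d in strategy_sum.items()}


average = cfr(1000)
print({I: max(d, key=d.get) for I, d in average.items()})
# {'after T': 'R1', 'after B': 'R2', 'root': 'B'}
```

Player 2's R actions strictly dominate at both information sets, so the average strategies concentrate on R1 and R2 and, in response, on B for player 1.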
Empirical frequency of play (Hart and Mas-Colell, 2000)
Definition. The empirical frequency of play $\bar{x}$ is the joint probability distribution defined as $\bar{x}(\sigma) := \frac{|\{t \le T : \sigma^t = \sigma\}|}{T}$ for each normal-form action plan $\sigma$.
Proposition. If $\limsup_{T \to \infty} \frac{1}{T} R_i^T \le 0$ almost surely for each player $i$, then, as $T \to \infty$, the empirical frequency of play $\bar{x}$ approaches the set of CCE almost surely.
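A sketch of the definition in Python (hypothetical names; action plans are written as strings and profiles as tuples with one plan per player).

```python
from collections import Counter


def empirical_frequency(history):
    """x̄(σ) = |{t <= T : σ^t = σ}| / T, where history[t] is the profile of
    normal-form action plans (one per player) played at iteration t."""
    counts = Counter(history)
    T = len(history)
    return {profile: n / T for profile, n in counts.items()}


history = [("T", "L1R2"), ("B", "R1R2"), ("B", "R1R2"), ("B", "R1R2")]
print(empirical_frequency(history))  # {('T', 'L1R2'): 0.25, ('B', 'R1R2'): 0.75}
```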
Framework - General idea
Use a regret minimizer for each player to ensure that their play approaches the set of CCE over time.
Combine it with a polynomial-time oracle that maps the players' strategies into the space of normal-form strategies, so as to explicitly keep track of the empirical frequency of play.
CCE computation with a sampling oracle
Use a sampling oracle to generate, at each iteration, a normal-form action plan from the more compact strategies of the players.
The sampled action plans can be stored to explicitly keep track of the empirical frequency of play.
Polynomial-time sampling is often trivial, but it can be inefficient if the strategies to sample from have some symmetries.
CCE computation with a marginal reconstruction oracle
Use a reconstruction oracle to generate normal-form strategies that are equivalent to the compact strategies of the players.
The reconstructed strategies are multiplied together to obtain a joint strategy.
We proved that the time average of the reconstructed joint strategies behaves like the empirical frequency of play.
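A hedged sketch of the joint-reconstruction bookkeeping (hypothetical names): at each iteration the per-player normal-form strategies, which in the framework would come from the reconstruction oracle and are hand-written here, are multiplied into a joint strategy, and the joint strategies are averaged over time.

```python
from collections import defaultdict
from itertools import product


def joint_from_marginals(marginals):
    """Product of per-player normal-form strategies (dicts plan -> prob)."""
    joint = {}
    for combo in product(*(m.items() for m in marginals)):
        profile = tuple(plan for plan, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        joint[profile] = prob
    return joint


class JointAverage:
    """Running time average of the joint strategies built at each iteration."""

    def __init__(self):
        self.total = defaultdict(float)
        self.iterations = 0

    def add(self, marginals):
        for profile, p in joint_from_marginals(marginals).items():
            self.total[profile] += p
        self.iterations += 1

    def average(self):
        return {s: v / self.iterations for s, v in self.total.items()}


avg = JointAverage()
avg.add([{"T": 1.0}, {"L1R2": 0.5, "R1R2": 0.5}])  # iteration 1
avg.add([{"B": 1.0}, {"R1R2": 1.0}])                # iteration 2
print(avg.average()[("B", "R1R2")])  # 0.5
```

The average is taken over the per-iteration products, not formed as the product of the per-player averages.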
CFR with Sampling (CFR-S)
Use CFR as the regret minimizer; CFR employs behavioural strategies as its compact strategy representation.
Sampling a normal-form action plan from a behavioural strategy simply requires sampling one action at each information set.
Fast iterations, but many of them might be required before reaching a good approximation of the empirical frequency of play.
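A sketch of the CFR-S sampling oracle for player 2 of the tree from the equivalence slide (the probabilities and names are illustrative): one action is drawn independently at each information set.

```python
import random

# Behavioural strategy of player 2 (illustrative numbers):
# one distribution per information set.
BEHAVIOURAL = {"after T": {"L1": 0.3, "R1": 0.7},
               "after B": {"L2": 0.6, "R2": 0.4}}


def sample_action_plan(pi, rng):
    """Sampling oracle: draw one action independently at every information set."""
    plan = {}
    for infoset, dist in pi.items():
        actions = list(dist)
        plan[infoset] = rng.choices(actions, weights=[dist[a] for a in actions])[0]
    return plan


rng = random.Random(42)
plans = [tuple(sample_action_plan(BEHAVIOURAL, rng).values()) for _ in range(10_000)]
# The frequency of ("R1", "L2") should be close to 0.7 * 0.6 = 0.42.
print(plans.count(("R1", "L2")) / len(plans))
```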
Marginal reconstruction oracle
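Algorithm 1 Reconstruct $x_i$ from $\pi_i$
function NF-RECONSTRUCT($\pi_i$)
    $X \leftarrow \emptyset$  ▷ $X$ is a dictionary defining $x_i$
    $\omega_z \leftarrow \rho^{\pi_i}_z \quad \forall z \in Z$
    while $\omega > 0$ do
        $\bar{\sigma}_i \leftarrow \arg\max_{\sigma_i \in \Sigma_i} \min_{z \in Z(\sigma_i)} \omega_z$
        $\bar{\omega} \leftarrow \min_{z \in Z(\bar{\sigma}_i)} \omega_z$
        $X \leftarrow X \cup \{(\bar{\sigma}_i, \bar{\omega})\}$
        $\omega \leftarrow \omega - \bar{\omega}\, \rho^{\bar{\sigma}_i}$
    return $x_i$ built from the pairs in $X$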
Main idea: assign probabilities to normal-form action plans $\sigma_i$ so as to match the probability $\omega_z$ of reaching each terminal node $z$ induced by the behavioural strategy $\pi_i$.
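A runnable sketch of Algorithm 1 for player 2 of the tree from the equivalence slide. It rests on our reading of the notation, which the slide does not spell out: $\rho^{\pi_i}_z$ is player $i$'s own contribution to the probability of reaching $z$, $Z(\sigma_i)$ is the set of terminal nodes consistent with action plan $\sigma_i$, and $\rho^{\bar{\sigma}_i}$ is the indicator vector of $Z(\bar{\sigma}_i)$. The tolerance `eps` and all names are assumptions.

```python
from itertools import product
from math import prod

# Player 2's information sets and, for each terminal node z,
# the action player 2 takes on the path to z.
INFOSETS = {"after T": ["L1", "R1"], "after B": ["L2", "R2"]}
TERMINALS = {"T-L1": {"after T": "L1"}, "T-R1": {"after T": "R1"},
             "B-L2": {"after B": "L2"}, "B-R2": {"after B": "R2"}}


def nf_reconstruct(pi, eps=1e-12):
    """Greedy reconstruction of a normal-form strategy x_i from behavioural pi."""
    plans = [dict(zip(INFOSETS, c)) for c in product(*INFOSETS.values())]

    def reachable(plan):  # Z(sigma): terminals consistent with the plan
        return [z for z, path in TERMINALS.items()
                if all(plan[I] == a for I, a in path.items())]

    # omega_z: player i's own contribution to the probability of reaching z.
    omega = {z: prod(pi[I][a] for I, a in path.items())
             for z, path in TERMINALS.items()}
    x = {}
    while max(omega.values()) > eps:
        best = max(plans, key=lambda s: min(omega[z] for z in reachable(s)))
        weight = min(omega[z] for z in reachable(best))
        if weight <= eps:  # numerical safeguard, not part of Algorithm 1
            break
        key = tuple(best.values())
        x[key] = x.get(key, 0.0) + weight
        for z in reachable(best):
            omega[z] -= weight
    return x


pi_2 = {"after T": {"L1": 0.3, "R1": 0.7}, "after B": {"L2": 0.6, "R2": 0.4}}
print({k: round(v, 3) for k, v in nf_reconstruct(pi_2).items()})
# {('R1', 'L2'): 0.6, ('L1', 'R2'): 0.3, ('R1', 'R2'): 0.1}
```

The three plans reproduce the behavioural probabilities: L1 has probability 0.3, R1 0.6 + 0.1 = 0.7, L2 0.6 and R2 0.3 + 0.1 = 0.4.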
CFR with Joint reconstruction (CFR-Jr)
Use CFR as the regret minimizer; CFR employs behavioural strategies as its compact strategy representation.
Use the reconstruction oracle to build realization-equivalent normal-form strategies from the behavioural strategies computed by CFR.
Iterations are slower because of the more complex oracle, but usually even a few reconstruction steps suffice to build a good approximation of the empirical frequency of play.
Non-convergence of product of marginal strategies
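Variant of the Shapley game (payoffs: player 1, player 2):
1, 0   0, 1   0, 0
0, 0   2, 0   0, 1
0, 1   0, 0   1, 0
[Figure: ε versus iterations (log scale, from 10^0 to 10^5) for the empirical frequency of play $\bar{x}^T$ and for the product of average marginal strategies $\bar{x}_1^T \otimes \bar{x}_2^T$.]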
The naïve solution of keeping track of each player's marginal strategy and building the product of the average strategies might lead to cyclic behaviour.
This happens, for example, when regret matching is employed (right figure) in a variant of the Shapley game (Shapley, 1964; left figure).
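A hedged sketch of this setup (not the authors' code): two regret-matching players in self-play on the Shapley variant above, comparing the CCE gap of the time average of the per-iteration products $x_1^t \otimes x_2^t$ with that of the product of the average marginals. The row/column orientation of the matrix is an assumption, and the sketch asserts nothing about how large the second gap turns out to be.

```python
from itertools import product

# U[i][r][c]: player i's payoff when player 1 plays row r and player 2
# plays column c (orientation assumed from the printed matrix).
U = ([[1, 0, 0], [0, 2, 0], [0, 0, 1]],
     [[0, 1, 0], [0, 0, 1], [1, 0, 0]])
N = 3
CELLS = list(product(range(N), range(N)))


def regret_matching(regrets):
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / N] * N


def cce_gap(joint):
    """Largest gain of committing to a fixed action instead of following joint[r][c]."""
    gaps = []
    for i in (0, 1):
        follow = sum(joint[r][c] * U[i][r][c] for r, c in CELLS)
        for d in range(N):
            deviate = sum(joint[r][c] * (U[0][d][c] if i == 0 else U[1][r][d])
                          for r, c in CELLS)
            gaps.append(deviate - follow)
    return max(gaps)


def self_play(T):
    regrets = [[0.0] * N, [0.0] * N]
    joint_sum = [[0.0] * N for _ in range(N)]  # sum of x_1^t ⊗ x_2^t
    marg_sum = [[0.0] * N, [0.0] * N]          # sums of x_1^t and x_2^t
    for _ in range(T):
        x1, x2 = regret_matching(regrets[0]), regret_matching(regrets[1])
        for r, c in CELLS:
            joint_sum[r][c] += x1[r] * x2[c]
        for k in range(N):
            marg_sum[0][k] += x1[k]
            marg_sum[1][k] += x2[k]
        # Utility of each pure action against the opponent's current strategy.
        v1 = [sum(U[0][r][c] * x2[c] for c in range(N)) for r in range(N)]
        v2 = [sum(U[1][r][c] * x1[r] for r in range(N)) for c in range(N)]
        e1 = sum(x1[k] * v1[k] for k in range(N))
        e2 = sum(x2[k] * v2[k] for k in range(N))
        regrets[0] = [regrets[0][k] + v1[k] - e1 for k in range(N)]
        regrets[1] = [regrets[1][k] + v2[k] - e2 for k in range(N)]
    avg_joint = [[v / T for v in row] for row in joint_sum]
    m1 = [v / T for v in marg_sum[0]]
    m2 = [v / T for v in marg_sum[1]]
    product_of_averages = [[m1[r] * m2[c] for c in range(N)] for r in range(N)]
    return cce_gap(avg_joint), cce_gap(product_of_averages)


gap_joint, gap_product = self_play(20_000)
print(f"average joint: {gap_joint:.4f}, product of averages: {gap_product:.4f}")
```

The first number equals the largest average regret, which regret matching keeps below $\Delta\sqrt{|A_i|/T}$ with $\Delta$ the payoff range; here that is at most $2\sqrt{3/20000} \approx 0.025$.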
Non-convergence of product of marginal strategies
[Figure, left: $\epsilon_\Delta$ (×10^-2, from 0 to 8) versus iterations (×10^4, from 0 to 4) for CFR-Jr, CFR and CFR-S. Right: $sw_{Apx}/sw_{Opt}$ (from 0.85 to 1) versus iterations for CFR-Jr, CFR and CFR-S.]
Cyclic behaviour of the product of marginal strategies in an instance of the Goofspiel card game (Ross, 1971).
CFR-Jr clearly outperforms CFR-S both in convergence speed (left figure) and in attained social welfare (right figure).
Comparison with the prior state-of-the-art technique, a column-generation algorithm (Celli et al., 2019).
Both CFR-Jr and CFR-S vastly outperform it and can be used effectively in much larger game instances.
Comparison between CFR-S and CFR-Jr
[Figure: four panels, one for each α ∈ {10^-1, 10^-2, 10^-3, 10^-4}, plotting the running time [s] of CFR-Jr and CFR-S against the depth of the tree (from 4 to 8).]
Comparison between the running times of CFR-S and CFR-Jr on random game instances.
Thanks to its faster iterations, CFR-S reaches a rough approximation of a solution in less time, but as higher accuracy is required, CFR-Jr performs better.
Conclusions
There exist general regret minimization approaches that guarantee convergence to the set of CCE in general-sum, multi-player games.
The best algorithm derived through this method vastly outperforms the prior state of the art in reasonably sized extensive-form games.
No optimality guarantee, but high social welfare in practice.
Future work
Compute approximate Coarse-correlated equilibria in other classes of structured games by employing our regret minimization framework.
Employ a CCE strategy profile as a starting point to approximate tighter solution concepts that admit some form of correlation.
Give theoretical guarantees on the approximation of the optimal social welfare.
Define regret-minimizing procedures for general, multi-player extensive-form games leading to refinements of CCE, such as Correlated equilibria and Extensive-form Correlated equilibria.
Bibliography
Andrea Celli, Stefano Coniglio, and Nicola Gatti. Computing optimal ex ante correlated equilibria in two-player sequential games. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019.
Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 2000.
Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 2001.
H. Moulin and J.-P. Vial. Strategically zero-sum games: the class of games whose completely mixed equilibria cannot be improved upon. International Journal of Game Theory, 1978.
John Nash. Non-cooperative games. Annals of Mathematics, 1951.
Sheldon M. Ross. Goofspiel – the game of pure strategy. Journal of Applied Probability, 1971.
Lloyd Shapley. Some topics in two-person games. Advances in Game Theory, 1964.
Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. Proceedings of the Annual Conference on Neural Information Processing, 2008.