Introduction to Natural Computation, Lecture 10: Games

Peter Lewis - University of Birmingham

Feb 28, 2022

Page 1: Peter Lewis - University of Birmingham

Introduction to Natural Computation

Lecture 10

Games

Peter Lewis

1 / 24

Page 2: Peter Lewis - University of Birmingham

Overview of the Lecture

What is a game?

What is the theory of games?

What are strategies for playing a game and which strategies should we play?

Equilibria, dynamics and what happens when people play their best strategies.

Are the players really rational?

Repeated games.

Some applications of game theory.


Page 3: Peter Lewis - University of Birmingham

All about games

What is a game?

Two people playing chess are playing a game.

Criminals and the police are playing a game.

Labour and the Conservatives are playing a game.

Stock market traders are playing a game.

A shopkeeper and his customers are playing a game.

Khrushchev and Kennedy played a very dangerous game.

A game is being played whenever two or more individuals interact. Game theory is concerned with how the outcomes of these interactions relate to the individuals’ preferences and the structure of the game.


Page 4: Peter Lewis - University of Birmingham

What is a game?

The origins of game theory

People have always tried to predict what the outcome of various game-like situations might be, and what they should do in order to achieve the outcomes they want.

But formal game theory was developed by John Von Neumann and Oskar Morgenstern and described in their 1944 book Theory of Games and Economic Behavior.

Yes, that John Von Neumann...


Page 5: Peter Lewis - University of Birmingham

A formal game

So, what is a game?

According to Von Neumann and Morgenstern, a game can be completely described by:

The players of a game,

For every player, every opportunity they have to move,

What each player can do at each of their moves,

What each player knows for every move, and

The payoffs received by every player for every possible combination of moves.

Extensive form

Usually used to denote sequential play games.

Normal (strategic) form

                      Player 2
                   Red         Blue
Player 1   Red     1 , 1       2 , 1
           Blue    1 , 2       0.5 , 0.5

Usually used to denote simultaneous play games.
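As a minimal sketch, the Red/Blue normal-form table above can be stored as a mapping from each pair of moves to a pair of payoffs. The names `PAYOFFS` and `payoff` are illustrative and not from the lecture, and the sketch assumes each cell lists player 1's payoff first.

```python
# Normal-form (strategic-form) game from the Red/Blue table.
# Keys are (player 1 move, player 2 move).
# Values are assumed to be (payoff to player 1, payoff to player 2).
PAYOFFS = {
    ("Red", "Red"): (1, 1),
    ("Red", "Blue"): (2, 1),
    ("Blue", "Red"): (1, 2),
    ("Blue", "Blue"): (0.5, 0.5),
}

def payoff(move1, move2):
    """Return the pair of payoffs for one simultaneous play."""
    return PAYOFFS[(move1, move2)]

print(payoff("Red", "Blue"))
```

Because both players move at once, a single lookup on the pair of moves fully determines the outcome. No ordering of moves needs to be stored.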


Page 6: Peter Lewis - University of Birmingham

Strategies for playing a game

So how does an individual play the game?

According to their strategy.

A player’s strategy is “a predetermined programme of play that tells her what actions to take in response to every possible strategy other players might use”.

Defines a player’s entire behaviour.

Example: Noughts and Crosses / Tic Tac Toe

As a player, in 3x3 Noughts and Crosses we have 9! = 362,880 possible games (leaf nodes) to consider.

Discounting unreachable subtrees (play stops as soon as one player wins), we have 255,168 possible games remaining.

If our player is clever enough to understand rotations and reflections, we can reduce this to only 26,830!
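The first two counts can be checked by brute force. The sketch below is an illustration, not the lecture's own code: it walks the whole game tree and counts every move sequence that ends in a win or a full board. The function names are made up for this example, and the symmetry-reduced count is not attempted.

```python
import math

# Count complete games of 3x3 Noughts and Crosses by exhaustive search.
# A game ends as soon as one player completes a line or the board is full.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def has_won(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)

def count_games(board, player):
    """Number of distinct move sequences that finish the game from `board`."""
    total = 0
    for cell in range(9):
        if board[cell] is None:
            board[cell] = player
            if has_won(board, player) or all(c is not None for c in board):
                total += 1          # leaf node: a win or a full board
            else:
                total += count_games(board, "O" if player == "X" else "X")
            board[cell] = None      # undo the move before trying the next cell
    return total

upper_bound = math.factorial(9)     # every ordering of the nine cells
games = count_games([None] * 9, "X")
print(upper_bound, games)
```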

Example: Chess

So in case you think we’re about to use game theory to solve chess... There are in the order of 10^123 possible unique games of chess for a complete strategy to consider.


Page 7: Peter Lewis - University of Birmingham

Simpler games

Let’s think of some simpler games...

Prisoner’s Dilemma

Two people have been arrested for robbing a bank and are in separate isolation cells. Both care much more about their own freedom than about the welfare of their accomplice. They are each presented with a choice by the prosecutor:

“You may either confess or remain silent. If you confess and your accomplice remains silent I will drop all charges against you and use your testimony to ensure that your accomplice gets 10 years in prison. If your accomplice confesses and you remain silent, they will go free while you get the 10 years. If you both confess you both go to prison, but I’ll make sure that you both get out in 5. If you both remain silent, you’ll both get 1-year sentences on firearms possession charges.”

What should they do?

Payoffs to Person 1, in years of prison (negative):

                           Person 2
                      Stay silent   Confess
Person 1  Stay silent     -1          -10
          Confess          0           -5


Page 8: Peter Lewis - University of Birmingham

The Prisoner’s Dilemma

More generally...

The payoff matrix for the two-player prisoner’s dilemma game

Each cell lists (Player 1’s payoff, Player 2’s payoff):

                          Player 2
                     Cooperate   Defect
Player 1  Cooperate    R , R      S , T
          Defect       T , S      P , P

The values S, P, R, T must satisfy T > R > P > S and R > (S + T)/2.

From each prisoner’s perspective:

If he were to cooperate, then I should defect and get away free.

And if he were to defect, then I should also defect, as I’d halve my sentence!

So for both prisoners, it is always rational to defect.

Defection is a dominant strategy. It is always the best strategy to play and neither player will deviate from it. Mutual defection is a Nash equilibrium.
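The dominance argument can be checked mechanically. The sketch below plugs in the prison sentences from the prosecutor's offer as negative payoffs (T = 0, R = -1, P = -5, S = -10). The dictionary layout and the helper `strictly_dominates` are assumptions made for illustration.

```python
# Prisoner's Dilemma with the sentences from the story, as negative years.
T, R, P, S = 0, -1, -5, -10   # temptation, reward, punishment, sucker's payoff

# Conditions that make the game a Prisoner's Dilemma.
assert T > R > P > S
assert R > (S + T) / 2

# my_payoff[my_move][their_move]; "C" = cooperate (stay silent), "D" = defect (confess).
my_payoff = {"C": {"C": R, "D": S},
             "D": {"C": T, "D": P}}

def strictly_dominates(a, b):
    """True if move `a` pays strictly more than `b` whatever the other player does."""
    return all(my_payoff[a][other] > my_payoff[b][other] for other in "CD")

print(strictly_dominates("D", "C"))
```

The check compares defection with cooperation column by column, which is exactly the "if he were to cooperate... and if he were to defect..." reasoning.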


Page 9: Peter Lewis - University of Birmingham

Nash equilibria

A Nash equilibrium, named after John Nash who proposed it, is:

a solution concept of a game involving two or more players,

in which each player is assumed to know the equilibrium strategies of the other players,

and no player has anything to gain by unilaterally changing only his or her own strategy.

“We cannot predict the result of the choices of multiple decision makers if we analyze those decisions in isolation. Instead, we must ask what each player would do, taking into account the decision-making of the others.”

Verifying whether a strategy profile is a Nash equilibrium

Each player asks themselves “knowing the strategies of the other players, and treating those strategies as set in stone, can I benefit by changing my strategy?”

If everyone answers no, it is a Nash equilibrium.

If anyone answers yes, it is not a Nash equilibrium.
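The verification procedure translates directly into a loop over unilateral deviations. This is a minimal sketch for two-player games with pure strategies only. The function `is_nash` and the payoff dictionary are hypothetical names, and the example reuses the prison sentences from the Prisoner's Dilemma slide.

```python
from itertools import product

def is_nash(payoffs, profile):
    """Check a pure strategy profile in a two-player game.

    `payoffs[(a1, a2)]` is the pair (payoff to player 1, payoff to player 2).
    """
    a1, a2 = profile
    moves1 = {m1 for m1, _ in payoffs}
    moves2 = {m2 for _, m2 in payoffs}
    # Can player 1 gain by deviating while player 2's move is held fixed?
    if any(payoffs[(d, a2)][0] > payoffs[(a1, a2)][0] for d in moves1):
        return False
    # Can player 2 gain by deviating while player 1's move is held fixed?
    if any(payoffs[(a1, d)][1] > payoffs[(a1, a2)][1] for d in moves2):
        return False
    return True

# Prisoner's Dilemma with sentences as negative years; "C" = stay silent, "D" = confess.
pd = {("C", "C"): (-1, -1), ("C", "D"): (-10, 0),
      ("D", "C"): (0, -10), ("D", "D"): (-5, -5)}

equilibria = [p for p in product("CD", repeat=2) if is_nash(pd, p)]
print(equilibria)
```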


Page 10: Peter Lewis - University of Birmingham

The Final Problem

In The Final Problem, Professor Moriarty is pursuing Sherlock Holmes, who leaves from London on the train.

Holmes must choose whether to alight at Newhaven or Canterbury.

Moriarty must also choose a station, at which to lie in wait, in order to catch Holmes.

[Figure: the rail route, showing London Victoria, Canterbury and Newhaven]

If they both select the same station, then Moriarty will kill Holmes and get away with his crimes. If they choose different stations, then Holmes will escape, and provide evidence that will convict Moriarty.


Page 11: Peter Lewis - University of Birmingham

The Final Problem


What might the payoff matrix look like?

Each cell lists (Holmes’s payoff, Moriarty’s payoff):

                            Moriarty
                     Canterbury   Newhaven
Holmes  Canterbury    -1 , 1       1 , -1
        Newhaven       1 , -1     -1 , 1

It is clear that for both players neither choice dominates the other, and each player may as well flip a coin, accepting a 50 / 50 chance of winning.


Page 12: Peter Lewis - University of Birmingham

Mixed strategies

Surely flipping a coin is not a useful strategy to follow?

Well, it turns out that it is!

In fact, both players choosing either station with probability 0.5 is a Nash equilibrium.

This is an example of a mixed strategy, as opposed to pure strategies which make a single choice with probability 1.

The equilibrium is known as a mixed strategy Nash equilibrium.

Pure strategy Nash equilibria are a special case of these.

This particular game is known as Matching Pennies.

Mixed strategies

A mixed strategy defines a probability distribution over a set of possible actions.

When the player plays, he chooses an action according to this distribution.

Mixed strategy Nash equilibria

When no player can improve his payoff by unilaterally changing his probability distribution.

Nash showed that for any game with a finite number of players and a finite set of actions, at least one mixed strategy Nash equilibrium must exist.
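To see why the coin flip is an equilibrium in the Holmes and Moriarty game, it helps to compute expected payoffs. The sketch below is an illustration with made-up names (`HOLMES`, `expected_payoff`). It uses Holmes's payoffs from the station game, with Moriarty's being the negatives.

```python
# Holmes's payoffs in the station game (Moriarty's are the negatives).
# Rows: Holmes picks Canterbury, Newhaven; columns: Moriarty picks Canterbury, Newhaven.
HOLMES = [[-1, 1],
          [1, -1]]

def expected_payoff(p_holmes, p_moriarty):
    """Holmes's expected payoff when each player picks Canterbury with the given probability."""
    h = [p_holmes, 1 - p_holmes]
    m = [p_moriarty, 1 - p_moriarty]
    return sum(h[i] * m[j] * HOLMES[i][j] for i in range(2) for j in range(2))

# Against a 50/50 Moriarty, every Holmes strategy earns the same, so no deviation helps.
print([expected_payoff(p, 0.5) for p in (0.0, 0.25, 0.5, 1.0)])

# If Moriarty leans towards Canterbury, Holmes gains by always choosing Newhaven.
print(expected_payoff(0.0, 0.7))
```

The same indifference holds for Moriarty by symmetry, which is why neither player can improve by changing his own distribution.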


Page 13: Peter Lewis - University of Birmingham

Another common example

The Driving Game

In the Driving Game two players are driving in opposite directions along the same road.

They have to choose on which side of the road to drive.

If they both choose the same side, they both get to their destinations.

If they choose opposite sides, they crash.

We assume that they don’t have enough notice to switch sides upon seeing the other car.

Possible payoff matrix

Payoff to each player:

                      Player 2
                  Left          Right
Player 1  Left     10          -1,000,000
          Right   -1,000,000    10

There are two obvious Nash equilibria in this game, which we can express as probability distributions over the strategy space (Left, Right):

(1.0, 0.0) and (1.0, 0.0)

(0.0, 1.0) and (0.0, 1.0)

But there’s also one more: (0.5, 0.5) and (0.5, 0.5).
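All three equilibria can be verified numerically. The sketch below assumes both drivers receive the same payoff in every cell (10 if their sides match, -1,000,000 otherwise). It relies on the standard fact that checking deviations to pure strategies is enough. The function names are illustrative only.

```python
# Driving Game payoffs, assumed identical for both drivers.
def payoff(a, b):
    return 10 if a == b else -1_000_000

def expected(p_left_me, p_left_other):
    """Expected payoff when each driver keeps Left with the given probability."""
    probs_me = {"L": p_left_me, "R": 1 - p_left_me}
    probs_other = {"L": p_left_other, "R": 1 - p_left_other}
    return sum(probs_me[a] * probs_other[b] * payoff(a, b)
               for a in "LR" for b in "LR")

def is_equilibrium(p1, p2, tol=1e-6):
    """Neither driver can gain by switching to a pure strategy."""
    best1 = max(expected(1.0, p2), expected(0.0, p2))
    best2 = max(expected(1.0, p1), expected(0.0, p1))
    return expected(p1, p2) >= best1 - tol and expected(p2, p1) >= best2 - tol

# Each profile gives the probability of Left for driver 1 and driver 2.
for profile in [(1.0, 1.0), (0.0, 0.0), (0.5, 0.5), (0.9, 0.9)]:
    print(profile, is_equilibrium(*profile), expected(*profile))
```

The last profile, (0.9, 0.9), is included as a counterexample: a driver facing someone who mostly keeps left does better by always keeping left.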


Page 14: Peter Lewis - University of Birmingham

Stupid questions?

Possible payoff matrix for the Driving Game

Payoff to each player:

                      Player 2
                  Left          Right
Player 1  Left     10          -1,000,000
          Right   -1,000,000    10

Three Nash equilibria:

(1.0, 0.0) and (1.0, 0.0)

(0.0, 1.0) and (0.0, 1.0)

(0.5, 0.5) and (0.5, 0.5)

So why don’t we see the third one in real life?

Well, at the third Nash equilibrium, the expected payoffs are not good! Each driver crashes half the time, for an expected payoff of 0.5 × 10 + 0.5 × (−1,000,000) = −499,995.

Not all Nash equilibria are the same; some may appear non-rational from an external perspective.

A Nash equilibrium is not guaranteed to be Pareto optimal.

Cooperative or multilateral decision making can allow players to move from one Nash equilibrium to another one.

Other forces, capable of changing multiple players’ strategies at the same time, can achieve the same thing.

This might be a move to a worse equilibrium.


Page 15: Peter Lewis - University of Birmingham

On rationality

Rock Paper Scissors

Rock Paper Scissors can be thought of as a three choice extension of Matching Pennies.

There is no dominant pure strategy.

The Nash equilibrium is when both players’ strategies are (1/3, 1/3, 1/3).

So why all the fuss?

Tim Conrad won $7,000 for winning the Rock Paper Scissors World Championships!

Humans are not particularly good at keeping to the equilibrium strategy.

Any deviation on the part of your opponent can be exploited.

Psychology plays a large part.

In many games, humans are not rational anyway.
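A small calculation shows what "exploiting a deviation" means. The sketch below scores a win as +1, a loss as -1 and a draw as 0, which is an assumed scoring. It compares each pure move against the uniform equilibrium strategy and against a hypothetical opponent who plays rock slightly too often.

```python
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def score(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a draw."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_score(mine, opponent_probs):
    """Expected score of a pure move against an opponent's mixed strategy."""
    return sum(p * score(mine, theirs) for theirs, p in opponent_probs.items())

uniform = {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3}
# Hypothetical opponent who plays rock a little too often.
biased = {"rock": 0.4, "paper": 0.3, "scissors": 0.3}

print({m: expected_score(m, uniform) for m in MOVES})
print({m: expected_score(m, biased) for m in MOVES})
```

Against the uniform strategy every move has the same expected score, so there is nothing to exploit. Against the biased opponent, paper has a positive expected score.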


Page 16: Peter Lewis - University of Birmingham

On Rationality

The Ultimatum Game

Two players interact to decide how to divide a sum of money that is given to them by a third party.

First, player 1 proposes how to divide the sum between the two players,

Subsequently, player 2 can either accept or reject this proposal.

If the second player accepts, the money is split according to the proposal.

If the second player rejects, neither player receives anything.

The game is played only once so that reciprocation is not an issue.

What should the players do?

Rationally, the second player should accept any proposal which offers her a positive amount of money.

Even if the proposal is to offer nothing, at this point the second player is only indifferent.

So, the first player has nothing to lose by offering the smallest amount which convinces the second player to accept.

An offer of the smallest possible division of the money is the (subgame perfect) Nash equilibrium.
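The reasoning above is backward induction: work out what the second player will do, then choose the first player's best offer given that. The sketch below makes two assumptions that are not in the lecture: the sum is 100 indivisible units, and an indifferent responder rejects an offer of zero.

```python
TOTAL = 100  # hypothetical sum to divide, in indivisible units (e.g. pence)

def responder_accepts(offer):
    """A purely payoff-maximising responder accepts any strictly positive offer.

    Assumption: when offered nothing she is indifferent, and here she rejects.
    """
    return offer > 0

def proposer_payoff(offer):
    """What the proposer keeps, given the responder's predicted reaction."""
    return TOTAL - offer if responder_accepts(offer) else 0

# Backward induction: the proposer anticipates the responder's rule and
# picks the offer that maximises their own payoff.
best_offer = max(range(TOTAL + 1), key=proposer_payoff)
print(best_offer, proposer_payoff(best_offer))
```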


Page 17: Peter Lewis - University of Birmingham

On Rationality

In reality

Would you really dare to offer 1p out of £100 to the second player?

In one set of experiments, 43% of those playing first offered an even split.

Those playing second on average rejected offers falling below 1/3 of the total amount.

Over half of the offers below 1/4 were rejected!

Why?

Is homo economicus, the model of humans as rational economic actors, false?

Does the payoff not take into account a positive psychological reaction to offering more money?

Are the second players attempting to punish those going first?

Does it illustrate a human unwillingness to accept injustice and social inequality?

Are empathy and perspective driving the generosity?

Is this kin selection in action?

Experimentally...

Externally administered oxytocin, used to increase levels of emotion in the subject, increased generous offers by 80% relative to a placebo, though it did not affect the minimum acceptance threshold.


Page 18: Peter Lewis - University of Birmingham

Repeated Games

One shot and repeated games

So far, we have just considered games which are played only once. These are known as one shot games.

In many situations of course, we interact multiple times with the same individual.

When a game is played multiple times by the same players, it is called a repeated game.

The payoff in a repeated game is just the sum of the payoffs from each round.


Page 19: Peter Lewis - University of Birmingham

Iterated Prisoner’s Dilemma

A very commonly studied repeated game is the Iterated Prisoner’s Dilemma. Recall the Prisoner’s Dilemma payoff matrix, where each cell lists (Player 1’s payoff, Player 2’s payoff):

                          Player 2
                     Cooperate   Defect
Player 1  Cooperate    R , R      S , T
          Defect       T , S      P , P

The values S, P, R, T must satisfy T > R > P > S and R > (S + T)/2.

The Nash equilibrium in a one shot game was (Defect, Defect).

What happens if we play n games with the same opponent?


Page 20: Peter Lewis - University of Birmingham

Iterated Prisoner’s Dilemma


If we both defect for n rounds, then we each get a payoff of nP.

But if we were to both cooperate, we’d get nR each!

Except I think I might defect in the nth round, since then I’d get (n − 1)R + T.

But my opponent will think the same thing so we’ll both be left with (n − 1)R + P.

But I can’t cooperate in the last round, since if he defects I’ll only get (n − 1)R + S, so I must defect in the nth round.

The problem is that this logic can now be applied to the (n − 1)th round, and the (n − 2)th round, and so on...

Until we’re left defecting in round 1 and throughout the whole game.
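Plugging numbers into the totals above makes the last-round argument concrete. The values T = 5, R = 3, P = 1, S = 0 and n = 10 are illustrative choices that satisfy the Prisoner's Dilemma conditions. They are not taken from the lecture.

```python
# Illustrative values (not from the lecture) satisfying T > R > P > S and R > (S + T) / 2.
T, R, P, S = 5, 3, 1, 0
n = 10  # number of rounds, chosen arbitrarily

both_cooperate = n * R                    # nR
both_defect = n * P                       # nP
i_defect_last = (n - 1) * R + T           # I defect only in round n, opponent cooperates
both_defect_last = (n - 1) * R + P        # we both defect in round n
i_am_suckered_last = (n - 1) * R + S      # I cooperate in round n, opponent defects

# Whatever the opponent does in the last round, defecting there pays more.
print(i_defect_last > both_cooperate, both_defect_last > i_am_suckered_last)

# Yet unravelling all the way back to round 1 leaves both players worse off.
print(both_defect < both_cooperate)
```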


Page 21: Peter Lewis - University of Birmingham

Learning to play games

Okay, so the Nash equilibrium is for both players to always defect.

But the IPD models real world scenarios where people do cooperate!

And their payoffs are higher as a result.

Is this human irrationality again?

Mutual defection throughout the game is the Nash equilibrium, but it is not a dominant strategy, i.e. it is not the best response to every other strategy that your opponent could play.

Furthermore, for the Iterated Prisoner’s Dilemma there is no single best strategy against all possible opponents.

Can we develop strategies which can perform well against a good range of other strategies?

We could search the strategy space... but it’s very big! For an n round game there are 2^(2^n − 1) possible strategies!

Learning game playing strategies

Strategies can be encoded in many ways: neural networks, bitstrings, finite state machines etc.

Learning is typically done through the (co)evolution of a population of strategies.

Surprisingly, strategies can emerge which bring about cooperation through the threat of retaliation if the opponent defects (e.g. tit for tat).
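As a rough sketch of how such strategies behave (hand-written here rather than evolved), the code below plays tit for tat and always-defect against each other in a 10 round game. Each strategy is encoded as a function of the opponent's past moves. The payoff values T = 5, R = 3, P = 1, S = 0 and all names are assumptions for illustration.

```python
# Illustrative payoffs (not from the lecture) satisfying T > R > P > S and R > (S + T) / 2.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def play(strategy1, strategy2, rounds):
    """Return the total payoffs of two strategies over a repeated game."""
    moves1, moves2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        m1 = strategy1(moves2)   # each strategy sees only the other's past moves
        m2 = strategy2(moves1)
        p1, p2 = PAYOFF[(m1, m2)]
        total1, total2 = total1 + p1, total2 + p2
        moves1.append(m1)
        moves2.append(m2)
    return total1, total2

print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_defect, 10))
print(play(always_defect, always_defect, 10))
```

Two tit for tat players sustain cooperation and earn more than two defectors do, while tit for tat loses only the first round to a defector before retaliating.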


Page 22: Peter Lewis - University of Birmingham

So why is all this interesting?

Game theory has been used to try to understand:

Pricing and the formation of cartels in business,

Why people vote in certain ways,

Evolutionary dynamics in populations of animals,

How to maintain biodiversity,

Why humans appear to behave altruistically,

How to win competitions of Rock Paper Scissors,

Bacterial strain diversity,

How to allocate resources in computer networks,

Why countries spend billions on nuclear weapons and (almost) never use them.

And it’s also the basis of a lot of economics-inspired computation.


Page 23: Peter Lewis - University of Birmingham

Summary

We have learnt:

What formal games are and how they can be described. We looked at several examples, including particularly the Prisoner’s Dilemma.

That pure and mixed strategies define how a player plays a game. Some games have a dominant strategy, i.e. one which is always best.

How Nash equilibria describe a certain type of “solution” for a game, where no player can unilaterally improve his payoff.

That there are pure strategy Nash equilibria and mixed strategy Nash equilibria, and that all finite games have at least one (mixed strategy) Nash equilibrium.

That Nash equilibria can lead to either good or bad outcomes for the players!

That in many (especially repeated) games, such as the Iterated Prisoner’s Dilemma, there is no dominant strategy.

That in these cases, we can learn high performing strategies.


Page 24: Peter Lewis - University of Birmingham

Further reading

Stanford Encyclopedia of Philosophy. Game Theory; 2010. http://plato.stanford.edu/entries/game-theory/.

Binmore K. Game Theory: A Very Short Introduction. Oxford University Press; 2007.

Kendall G, Yao X, Chong SY. The Iterated Prisoners’ Dilemma: 20 Years On. World Scientific; 2007.
