Lecture 3 - Decision Making

May 13, 2015

Luke Dicken

This is the 3rd of an 8 lecture series that I presented at University of Strathclyde in 2011/2012 as part of the final year AI course.

This lecture moves beyond the Game Theoretic definition of a game, and demonstrates how algorithms can be used not only to find a single good choice, but a sequence of choices that will eventually reach a winning state.
Transcript
Page 1: Lecture 3 - Decision Making

Making Decisions in Games

1

Page 2: Lecture 3 - Decision Making

Theory of Real Games

• We’ve been talking about “games” as single instances of choice - heads/tails, odds/evens etc.

• We’ve talked about how we can repeat the game (iterating) and interesting things happen.

• Are most games the same choice repeatedly?

2

Page 3: Lecture 3 - Decision Making

Real Games

• At a much less abstract level, a game is not one

choice repeated.

• A sequence of different choices.

• Delayed reward

3

Page 4: Lecture 3 - Decision Making

Delayed Reward

• Last week we could see the payoffs for each choice

pair in the games.

• Does a single move in chess have a “reward”?

• The reward is whether the game is won or lost -

the combined result of the choice sequence

4

Page 5: Lecture 3 - Decision Making

Evaluating Delayed Rewards

• We need to evaluate what the expected payoff of a given choice is.

• Typically we can only do this at the end of the game.

• How can we decide what to do now if we won’t

know if it was a good decision until later?

5

Page 6: Lecture 3 - Decision Making

Chess

• Opening move is one choice.

• Opponent makes their move.

• You reply.

• Note that your 2nd move is a totally different

theoretical “game” to the first move.

6

Page 7: Lecture 3 - Decision Making

Chess

• Initially there are 20 opening moves

• Your opponent has 20 responding moves

• 2 moves in, the size of the potential statespace is

400 states.

• The game gets more complicated later

‣ Average number moves per turn : 35

‣ Average game length : 80

• State space size (Shannon's number) : roughly 35^80 ≈ 3 x 10^123 - HUGE
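The arithmetic on this slide can be checked directly. This is only a rough sketch: it takes the slide's averages (35 moves per turn, 80 moves per game) at face value, and the variable names are made up for illustration.

```python
# Rough game-tree size estimates using the figures quoted on the slide.
opening_moves = 20
after_two_moves = opening_moves * opening_moves  # one move by each player

avg_branching = 35  # average legal moves per turn (slide figure)
avg_length = 80     # average game length in moves (slide figure)
tree_size = avg_branching ** avg_length

# Python integers are arbitrary precision, so count digits for the magnitude.
exponent = len(str(tree_size)) - 1

print(after_two_moves)         # 400
print(f"about 10^{exponent}")  # about 10^123
```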

Page 8: Lecture 3 - Decision Making

Search

• This state space is way too big for an exhaustive

search approach like mini-max

• Any brute force approach is not going to work

• We need some mechanism to guide the search towards areas of the game tree that are useful

8

Page 9: Lecture 3 - Decision Making

Heuristics

• A heuristic is formally a “strategy using readily

accessible, though loosely applicable, information to

control problem solving in human beings and

machines”

• Less formally, it’s a guess-timate of the value of a

state, typically based on the distance to the goal

(planning) or likelihood of winning (games)

9

Page 10: Lecture 3 - Decision Making

Using Heuristics

• Heuristics guide search across spaces that are too

complex to fully enumerate.

• Estimate potential of the next set of states using the

heuristic and go with the best looking one.

• Can be combined with a search strategy like Best

First Search or Enforced Hill Climbing

10

Page 11: Lecture 3 - Decision Making

Heuristic Example - A*

• A* search for path planning is a great example of

heuristics in use.

• In a world of tiles, find an optimal path from A to B.

• A* uses two metrics :

‣ Concrete metric of the work to get to a location (g)

‣ Estimate of work to get from location to goal (h)

• Search strategy always chooses the location that minimises (g + h)
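A minimal sketch of the idea, under some assumptions that are not on the slides: tiles are (row, column) pairs, movement is in four directions at a cost of 1 per step, '#' marks a blocked tile, and h is the Manhattan distance to the goal. The function names and the example grid are invented for illustration.

```python
import heapq

def manhattan(a, b):
    """h: estimated work left, assuming 4-way movement with unit cost."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(grid, start, goal):
    """Return a shortest list of tiles from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}                 # g: concrete work done to reach each tile
    parent = {start: None}
    frontier = [(manhattan(start, goal), 0, start)]  # (g + h, g, tile)
    closed = set()
    while frontier:
        _, cost, tile = heapq.heappop(frontier)      # smallest g + h first
        if tile in closed:
            continue
        closed.add(tile)
        if tile == goal:
            path = []
            while tile is not None:                  # walk parents back to start
                path.append(tile)
                tile = parent[tile]
            return path[::-1]
        r, c = tile
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            new_cost = cost + 1
            if new_cost < g.get((nr, nc), float("inf")):
                g[(nr, nc)] = new_cost
                parent[(nr, nc)] = tile
                f = new_cost + manhattan((nr, nc), goal)
                heapq.heappush(frontier, (f, new_cost, (nr, nc)))
    return None

# Hypothetical 3 x 4 world with a two-tile wall in the middle row.
grid = ["....",
        ".##.",
        "...."]
path = a_star(grid, (2, 0), (0, 3))
```

Because the Manhattan estimate never overestimates the remaining work on this kind of grid, the first time the goal is popped from the queue the path found is a shortest one.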

Pages 12-37: Lecture 3 - Decision Making

Heuristic Example - A*

[Figure: step-by-step A* walkthrough on a tile grid from start tile A to goal tile B, built up one expansion at a time across these slides. Each explored tile is labelled "g + h". Labels in the final frame, row by row from the top:]

4 + 4, 5 + 3, 6 + 2, B

3 + 5, 4 + 4, 5 + 3, 7 + 1

2 + 6, 5 + 3, 6 + 2

1 + 7, A, 4 + 4, 5 + 3

2 + 8, 1 + 7, 2 + 6, 3 + 5, 4 + 4

6 + 4

Page 38: Lecture 3 - Decision Making

Heuristics

• Heuristics can guide our search

• Help us understand what states are bringing us

closer to our goals

• Allow us to backtrack when a promising route

becomes problematic

• Do they work well for games?

38

Page 39: Lecture 3 - Decision Making

The Maths of Choice

• Common (basic) Combinatorics problem:

‣ How many X element sub-sets can I make from this set of Y elements?

• Less formally :

‣ How many different ways can I pick X things from Y things?

39

Page 40: Lecture 3 - Decision Making

Choice

• We can refer to this as “Choosing”

• “I have 5 things, I choose 2”

• We can write it as : 5 C 2

40

Page 41: Lecture 3 - Decision Making

Binomials

• Mathematically, n C k is equivalent to the binomial coefficient ("n choose k")

• This can be re-written as

‣ n (n - 1) ... (n - k + 1) / k! = n! / (k! (n - k)!)

41

Page 42: Lecture 3 - Decision Making

Permutations

• The choose operator tells you how many sets there are with

unique elements.

• What if the order that the elements are in matters?

• For this we use Permutation

‣ n P k

• Equivalent to :

‣ n! / (n - k)!
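Both operators are available in Python's standard library (math.comb and math.perm, Python 3.8 or later), so the "I have 5 things, I choose 2" example can be checked against the factorial formulas. The variable names are just for illustration.

```python
import math

n, k = 5, 2

choose = math.comb(n, k)   # n C k: order does not matter
permute = math.perm(n, k)  # n P k: order matters

# The same values written out with factorials.
choose_by_formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
permute_by_formula = math.factorial(n) // math.factorial(n - k)

print(choose, permute)  # 10 20
```

Every unordered pair can be arranged in k! = 2 orders, which is why n P k is k! times n C k.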

42

Page 43: Lecture 3 - Decision Making

Poker

• Card game.

• Typically involves gambling.

• “Poker” is technically an entire set of different games that

share similar structure.

• For the purposes of this lecture, Poker refers specifically to

“Limit Texas Hold ‘Em”

43

Page 44: Lecture 3 - Decision Making

Texas Hold ‘Em

• Variant of poker created in the 1900s

• Typically 2-10 player games

• Popular recently - Poker on TV and online is typically Texas Hold ‘Em

• Aim is to make the best 5 card hand possible using any of the two private “hole” cards and 5 public “community” cards

44

Page 45: Lecture 3 - Decision Making

Phases of the Game

• The game is broken into four phases.

• Initial or “Pre-flop” - Hole cards are dealt and a round of

betting occurs.

• Flop - The first three community cards are dealt, another

round of betting.

• Turn - A fourth community card is dealt, and a round of

betting

• River - Final community card dealt, final betting

45

Page 46: Lecture 3 - Decision Making

Some Terminology

• Raise - Increase the bet amount

• Fold - Give up on this game, losing any money already bet

• Call - Put in an amount of money to equal what others are

wagering

• Blinds - An initial mandatory wager by two players, Small and Big. The players responsible for the blinds rotate each game.

46

Page 47: Lecture 3 - Decision Making

Poker in Research

• Poker has been a major research area for AI for many years.

• Characteristics in common with many real world problems

‣ Hidden information

‣ Bluffing

‣ Loss minimisation

47

Page 48: Lecture 3 - Decision Making

Poker at SAIG

• Major research area for us for many years

• Under my supervision for the last 2 years as

honours projects and Summer internships.

• Much of what you’re going to hear about this week

is based on current research happening right now at

SAIG

48

Page 49: Lecture 3 - Decision Making

Strathclyde Poker Research Environment

• SPREE was developed to overcome two challenges we face.

‣ Training data sets obtained from online casinos contain only imperfect information. This leads to bad machine learning

‣ Every research project wasted significant time re-implementing a framework for Poker

• SPREE is open source client/server implementation

in Java, with AI-based client and GUI client.

• http://sourceforge.net/projects/spree-poker

49

Page 50: Lecture 3 - Decision Making

Limit or No Limit?

• Two types of game - Limit and No Limit

• No Limit - Classical movie Poker.

‣ Raises can be any amount

‣ Any number of raises

• Limit - Common rule set

‣ Raises are a single fixed amount

‣ Limited number of raises allowed per round

50

Page 51: Lecture 3 - Decision Making

Limit or No Limit?

• Focus on Limit

• Significantly reduces complexity of the problem.

• Also means we can focus on the game, rather than the

psychological aspects.

51

Page 52: Lecture 3 - Decision Making

Poker State Space

• At each point, each player has typically 3 options

‣ Raise, Call, Fold

• We can approximate the size of the search space at point k as 3^k

• We can also determine lower and upper bounds for k since

in Limit there are a fixed number of raises.

52

Page 53: Lecture 3 - Decision Making

Dealing Cards

• For a game of N players, 2N + 5 cards are required.

• There are 52 C (2N + 5) different sets of cards that could be

dealt.

• But who gets which card is important, so we need to use

Permutation not Choose

• 52 P (2N + 5)

‣ For a 5 player game (15 cards) - 5.86 x 10^24

‣ For a standard 10 player game (25 cards) - 7.4 x 10^39
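The difference between Choose and Permutation here is easy to see numerically. A small sketch (the function name deals is made up): for N players it counts the 2N + 5 cards both ways. Evaluated like this, 25 cards (10 players) gives about 7.4 x 10^39 ordered deals, which is the card figure used on the upper bound slide, while 5.86 x 10^24 is the value for 15 cards (5 players).

```python
import math

def deals(players):
    """Return (unordered, ordered) counts of the 2N + 5 cards dealt from 52."""
    cards = 2 * players + 5
    return math.comb(52, cards), math.perm(52, cards)

sets_10, ordered_10 = deals(10)  # 25 cards
sets_5, ordered_5 = deals(5)     # 15 cards

print(f"{ordered_10:.2e}")  # 7.41e+39
print(f"{ordered_5:.2e}")   # 5.86e+24
```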

53

Page 54: Lecture 3 - Decision Making

Length of a Poker Game (Lower)

• In the shortest game possible, all players fold.

• The last player (who put in the Big Blind) wins by default

• N-1 choices to reach this point

• 2N cards are required

• 3^(N-1) * 52 P 2N

• For a standard 10 player game :

‣ 19683 * 3 x 10^32 = 6 x 10^36

54

Page 55: Lecture 3 - Decision Making

Length of a Poker Game (Upper)

• In the longest game possible

• All players initially call, final player to call instead raises.

• 4N-4 turns per round, 4 rounds = 16N-16 turns total

• 2N + 5 cards required

• 3^(16N-16) * 52 P (2N + 5)

• Again for a 10 player game

‣ 5 x 10^68 * 7.4 x 10^39 = 3.7 x 10^108
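Both bounds can be reproduced in a few lines. This sketch assumes the counts stated on the slides: N - 1 choices and 2N cards for the shortest game, 16N - 16 turns and 2N + 5 cards for the longest, with 3 options per choice. The function names are invented for illustration.

```python
import math

def lower_bound(players):
    """Shortest game: everyone folds to the Big Blind."""
    return 3 ** (players - 1) * math.perm(52, 2 * players)

def upper_bound(players):
    """Longest game: 4 rounds of 4N - 4 turns each, all cards dealt."""
    return 3 ** (16 * players - 16) * math.perm(52, 2 * players + 5)

low = lower_bound(10)
high = upper_bound(10)

print(f"{low:.1e}")   # 6.0e+36
print(f"{high:.1e}")  # 3.8e+108
```

The upper bound prints as 3.8e+108 rather than 3.7 because nothing is rounded before multiplying; either way it sits well below 10^120.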

55

Page 56: Lecture 3 - Decision Making

Total State Space Size

• The total state space is smaller than Shannon’s number

• Still completely unwieldy for any kind of exhaustive search

• Note that we’ve considered the lower and upper bounds of

the state space.

• Actual values will typically fall somewhere between.

• Also note that the upper bound hinges on the restrictions imposed by Limit, so we don’t need to consider the extra state complexity that variable raise sizes would introduce.

56

Page 57: Lecture 3 - Decision Making

Abstraction

• There are some things we can do to trim this down (a bit)

• Firstly, we can simplify our view of the starting position

• We don’t need to consider all possible cards that could be

dealt

‣ Cards that will help us change the situation

‣ Cards that don’t help us can be grouped together

57

Page 58: Lecture 3 - Decision Making

Starting Hands

• There are 52 C 2 = 1,326 potential opening hands

• But we can reduce this

‣ Suit doesn’t matter except for matching

‣ We can reduce it to 2 card “suited” or “unsuited”

‣ 2c, 7d is equivalent to 2d, 7c or 2s, 7h

• This gives a total number of abstract hands as

‣ 13 (pairs) + 13 C 2 (suited) + 13 C 2 (unsuited) = 169

• We’ll see tomorrow there are more abstractions.
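The 1,326 to 169 reduction can be confirmed by enumerating every two-card hand and mapping it to an abstract class. This is a sketch only: the card encoding (rank character followed by suit character) and the class labels such as "72o" are assumptions made for the example.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
deck = [rank + suit for rank in RANKS for suit in SUITS]

def abstract(hand):
    """Map two cards to a class: pair, suited ("s") or unsuited ("o")."""
    (r1, s1), (r2, s2) = hand  # each card is a 2-character string
    high, low = sorted((r1, r2), key=RANKS.index, reverse=True)
    if r1 == r2:
        return high + low      # e.g. "77"
    return high + low + ("s" if s1 == s2 else "o")

hands = list(combinations(deck, 2))
classes = {abstract(hand) for hand in hands}

print(len(hands), len(classes))  # 1326 169
```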

58

Page 59: Lecture 3 - Decision Making

Heuristics for Poker

• “Every hand’s a winner and every hand’s a loser”

• Heuristics for Poker are tricky because of this.

• Analysis is largely based on your own hand - if my hand at a

point is such-and-such a type or better, it is worth playing

• Kind of naive

59

Page 60: Lecture 3 - Decision Making

“Expert” Poker Systems

• You can make a somewhat capable agent by combining a

bunch of these naive heuristics.

• It’s known which of the starting hands are strong and which

are weak.

• You can make a guess as to what you should do based on

your hand strength.

‣ This is not massively informed

• Basic, functional approach, attempts to lift out general rules

that will lead to good results.

60

Page 61: Lecture 3 - Decision Making

Evaluating Delayed Reward in Poker

• I’ve mentioned delayed reward a few times

• How does this fit into Poker?

• We know that the strength of our hand alone won’t

decide the game.

• We know that opponents can bluff about their hand

strength.

• Need to find out “what happens if” for possible actions

Page 62: Lecture 3 - Decision Making

Monte Carlo Tree Search

• The Monte Carlo method was initially used, without being formally defined, by Buffon and Fermi (among others)

• Developed at Los Alamos by our Game Theory friend John von Neumann

• For a large enough sample size, random sampling can often

take the place of exhaustive enumeration

62

Page 63: Lecture 3 - Decision Making

Samples and Probes

• When we say a “random sample” we want to sample

the potential outcomes

‣ And find the potential rewards

• The leaf nodes of the game tree have the final value

of the game.

• By randomly walking from the current node to leaf nodes, we can build up a picture of where our actions might lead us.
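A toy sketch of random probing. Everything concrete here is invented for illustration: the tree, its payoffs, and the assumption that every branch below the first action is equally likely. Inner nodes are dictionaries, leaves are the final value of the game.

```python
import random

# Hypothetical game tree: action -> subtree, with numeric payoffs at the leaves.
TREE = {
    "raise": {"called": {"win": 4, "lose": -4}, "they fold": 1},
    "call": {"showdown": {"win": 2, "lose": -2}},
    "fold": -1,
}

def probe(node, rng):
    """Walk randomly from node down to a leaf and return the leaf's payoff."""
    while isinstance(node, dict):
        node = rng.choice(list(node.values()))
    return node

def estimate(tree, samples=10_000, seed=0):
    """Average payoff of random probes below each action available now."""
    rng = random.Random(seed)
    return {
        action: sum(probe(subtree, rng) for _ in range(samples)) / samples
        for action, subtree in tree.items()
    }

values = estimate(TREE)
```

With uniform branching the true averages are 0.5 for raise, 0 for call and -1 for fold, and the sampled estimates should land close to those without ever enumerating the tree.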

Page 64: Lecture 3 - Decision Making

Exploration vs Exploitation

• We can sample at random, and we'll get coverage in all areas

• Some areas are more promising than others

• We want to "exploit" these areas and inspect them closely

‣ Ensure that they are as good as they look

• At the same time, we want to keep "exploring" in case there

are better areas in the game tree.

• Balancing these two contradictory goals falls to the UCT

heuristic.
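The slides do not give the formula, so the following uses the commonly quoted UCB1 form that UCT is built on: the average reward seen so far plus an exploration bonus that shrinks as an action is sampled more often. The function names, the constant c = sqrt(2) and the assumption that rewards are scaled to roughly 0..1 are all choices made for this sketch.

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Exploitation term (mean reward) plus exploration bonus."""
    if visits == 0:
        return math.inf  # untried actions are always explored first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def select(stats, c=math.sqrt(2)):
    """stats maps action -> (total_reward, visits); return the action to sample next."""
    parent_visits = sum(visits for _, visits in stats.values())
    return max(stats, key=lambda a: ucb1(stats[a][0], stats[a][1], parent_visits, c))

# Equal sampling so far: the better-looking action is exploited.
choice_a = select({"raise": (8.0, 10), "call": (2.0, 10)})
# A rarely sampled action gets a large bonus and is explored.
choice_b = select({"raise": (60.0, 100), "call": (0.5, 1)})
```

A larger c pushes the search towards exploring, a smaller c towards exploiting what already looks good.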

64

Page 65: Lecture 3 - Decision Making

Reward Evaluation

• We can use the Monte Carlo samples to simulate down to

the end of the game.

• Establish whether we win or lose (and how much).

• Bubble this value back up the tree.

• Build a picture of the amount we can expect to win based on

the actions we are considering this turn.
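"Bubbling the value back up" can be sketched with a tiny node type that records how often it was visited and the total reward seen below it. The class and function names are hypothetical, and the rewards in the example are made up.

```python
class Node:
    """One state in the search tree, with running statistics."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total = 0.0

    def mean(self):
        """Average reward of all simulations that passed through this node."""
        return self.total / self.visits if self.visits else 0.0

def backpropagate(leaf, reward):
    """Add one simulation result to every node from the leaf up to the root."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.total += reward
        node = node.parent

root = Node()
raise_node = Node(parent=root)
call_node = Node(parent=root)

backpropagate(raise_node, 10)  # simulated win after raising
backpropagate(raise_node, -2)  # simulated loss after raising
backpropagate(call_node, 1)    # simulated small win after calling
```

After these three simulations, raise_node.mean() is 4.0 and call_node.mean() is 1.0, which is the "amount we can expect to win" for each action being considered this turn.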

65

Page 66: Lecture 3 - Decision Making

Caveat Emptor

• What we’ve seen today is just ONE approach to tackling Poker.

• It’s an open challenge in AI to find a good solution

• The techniques used are important

• More important is the reasoning for using these

approaches.

• AI as a toolkit, not a definitive solution.

66

Page 67: Lecture 3 - Decision Making

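Sampling the St Petersburg Paradox

67

[Chart: number of sampled games ending at each payout.]

Payouts (£): 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024

Game counts, in the order shown on the chart: 2,147,483,647; 834,532,607; 435,781,603; 222,566,052; 108,347,756; 54,225,257; 27,184,330; 13,605,016; 6,792,164; 3,393,086; 1,698,228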

Page 68: Lecture 3 - Decision Making

Sampling the St Petersburg Paradox

68

• If we repeatedly play out the St Petersburg game we

see that it behaves much as we expect

• Half the games end immediately, a quarter after 1

turn and so on.

• After 4,000,000,000 samples, the average payout is only £14.50

• Where the Expected Value metric didn't inform our decision making, we can use sampling to see what actually happens!
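A small simulation in the same spirit, at a far smaller scale than the run reported on the slide. It assumes the version of the game where the payout starts at £1 and doubles on every head until the first tail; the function names and the seed are arbitrary.

```python
import random

def play(rng):
    """Play one St Petersburg game and return its payout."""
    payout = 1
    while rng.random() < 0.5:  # heads: the pot doubles and the game continues
        payout *= 2
    return payout

def sample(games, seed=0):
    """Return (average payout, fraction of games that ended immediately)."""
    rng = random.Random(seed)
    results = [play(rng) for _ in range(games)]
    return sum(results) / games, results.count(1) / games

average, ended_immediately = sample(100_000, seed=1)
```

The fraction of games ending immediately should come out close to one half. The average stays modest and drifts upward only slowly as more games are sampled, even though the theoretical expected value is unbounded.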

Page 69: Lecture 3 - Decision Making

Summary

• Understanding real games

• Delayed reward systems

• Poker

• Monte Carlo with UCT (in brief)

69

Page 70: Lecture 3 - Decision Making

Next Lecture

• More on Monte Carlo

• Describing a player mathematically

• Categorising players into types

• Using this classification for better decisions

70