Game Playing

Transcript
Page 1: Game Playing

Game Playing

Perfect decisions

Heuristically based decisions

Pruning search trees

Games involving chance

Page 2: Game Playing

What is a game? A search problem with:

Initial state: board position and whose turn it is

Successor function: what are the possible moves from here?

Terminal test: is the game over?

Utility function: how good is this terminal state?

Page 3: Game Playing

Differences from problem solving

Multiagent environment: the opponent makes its own choices!

Playing quickly may be important – we need a good way of approximating solutions and improving the search

Page 4: Game Playing

Starting point: look at the entire tree

Page 5: Game Playing

Simple game: let’s play a game! This motivates minimax.

Page 6: Game Playing

Minimax Decision

Assign a utility value to each possible ending

Assures the best possible ending, assuming the opponent also plays perfectly (the opponent tries to give you the worst possible ending)

Depth-first search tree traversal that updates utility values as it recurses back up the tree (see the sketch below)
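
A minimal sketch of this recursion, written in the same plain Lisp style as the alpha-beta code on pages 42-43; terminal-p, utility, and successors are hypothetical helpers that the slides do not define:

;; Minimax sketch (assumed helpers: terminal-p, utility, successors).
;; Returns the backed-up utility of STATE; MAX-PLAYER-P is true on MAX's turn.
(defun minimax-value (state max-player-p)
  (if (terminal-p state)
      (utility state)
      (let ((best (if max-player-p most-negative-fixnum most-positive-fixnum)))
        (dolist (child (successors state) best)
          (let ((value (minimax-value child (not max-player-p))))
            (setf best (if max-player-p
                           (max best value)
                           (min best value))))))))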

Page 7: Game Playing

Simple game for example: minimax decision

[Game-tree figure: MAX (player) at the root, MIN (opponent) nodes below, leaf utilities 3 12 8 2 4 6 14 5 2]

Page 8: Game Playing

Simple game for example: minimax decision

[Same game-tree figure with backed-up values: the MIN (opponent) nodes back up 3, 2, and 2 from the leaves 3 12 8 2 4 6 14 5 2, and MAX (player) at the root backs up 3]

Page 9: Game Playing

Properties of Minimax

Time complexity: O(b^m)

Space complexity: O(bm) (or O(m) if you can just generate the next successor)

Same complexity as depth-first search

Page 10: Game Playing

Multiplayer games

Exactly the same strategy, but each node has a utility for each player involved

Assume that each player maximizes its own utility at each node

Page 11: Game Playing
Page 12: Game Playing

Typical tree size

For chess, b ≈ 35 and m ≈ 100 for a “reasonable” game – completely intractable!

Page 13: Game Playing

So what can you do? Cut off the search early and apply a heuristic evaluation function

The evaluation function can represent point values of pieces, board position, and/or other characteristics

The evaluation function represents, in some sense, the “probability” of winning

In practice, the evaluation function is often a weighted sum, e.g.

Eval(s) = w1 × (number of white queens − number of black queens) + w2 × (number of white rooks − number of black rooks) + …
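
A sketch of such a weighted sum in the same Lisp style; count-pieces is a hypothetical helper and the weights 9 and 5 are just conventional piece values, not taken from the slides. Something like this would play the role of the evaluate function called by the alpha-beta code on pages 42-43.

;; Illustrative weighted-sum evaluation from White's point of view.
;; count-pieces is assumed to return how many pieces of a given color/type remain.
(defun evaluate (state)
  (+ (* 9 (- (count-pieces state :white :queen)
             (count-pieces state :black :queen)))
     (* 5 (- (count-pieces state :white :rook)
             (count-pieces state :black :rook)))))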

Page 14: Game Playing

When do you cut off the search?

Most straightforward: a depth limit ... or even iterative deepening

Bad in some cases: what if a catastrophic move happens just beyond the depth limit?

One fix: only apply the evaluation function to quiescent positions, i.e. ones unlikely to have wild swings in the evaluation function

Example: no pieces about to be captured

Run the test on the state – if it is not quiescent, run a quiescence search for a nearby suitable state (one possible cutoff test is sketched below)
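
One possible cutoff test along these lines, as a sketch only; quiescent-p is a hypothetical predicate, and the extra depth/limit parameters mean this is not the one-argument cutoff-test used in the code on pages 42-43.

;; Cut off only once the depth limit is reached AND the position is quiet;
;; otherwise the search continues past the limit (a quiescence search).
(defun cutoff-test-with-quiescence (state depth limit)
  (and (>= depth limit)
       (quiescent-p state)))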

Page 15: Game Playing

Horizon Effect

One piece is about to transform the game, e.g. a pawn becoming a queen

The opponent can prevent this for a long time, but not forever

Minimax places this stellar move “beyond the horizon” – procrastination

Resolved (somewhat) with singular extensions: go much deeper on the best moves (related to quiescence search)

Page 16: Game Playing

How much lookahead for chess?

Ply = half-move

Human novice: 4 ply

Typical PC, human master: 8 ply

Deep Blue, Deep Fritz: 10-20 ply

Kasparov, Kramnik: 20-30 ply, but only on select strategies

But if b = 35, m = 10 (for example): time ~ O(b^m) = 35^10 ≈ 2.8 × 10^15

Need to cut this down

Page 17: Game Playing

Alpha-Beta Pruning: Example

[Game-tree figure: MAX (player) at the root, MIN (opponent) nodes below; the first MIN node backs up the value 3 from the leaves 3 12 8, and the next leaf examined is 2]

Page 18: Game Playing

Alpha-Beta Pruning: Example

[Same figure: the first MIN node has backed up 3 (so MAX has at least 3 at the root so far), and the second MIN node has just seen the leaf 2]

Stop right here when evaluating this node: the opponent takes the minimum of these nodes, and the player will take the maximum of the nodes above

Page 19: Game Playing

Alpha-Beta Pruning: Concept

[Figure: two nodes labeled m and n, with m higher up the tree on the path to n]

If m > n, the Player would choose the m-node to get a guaranteed utility of at least m

The n-node would never be reached; stop evaluation of the n-node as soon as you find a child with smaller utility

Page 20: Game Playing

Alpha-Beta Pruning: Concept

[Figure: two nodes labeled m and n, with m higher up the tree on the path to n]

If m < n, the Opponent would choose the m-node to get a guaranteed utility of at most m

The n-node would never be reached; stop evaluation of the n-node as soon as you find a child > m

Page 21: Game Playing

The Alpha and the Beta

At any given point in time…

α = largest utility found so far for MAX

β = smallest utility found so far for MIN

Page 22: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = -inf, β = inf

D: α = -inf, β = inf

E: α = 10, β = 10, utility = 10

Page 23: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = -inf, β = inf

D: α = -inf, β = 10

E: α = 10, β = 10

Page 24: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = -inf, β = inf

D: α = -inf, β = 10

F: α = 11, β = 11

Page 25: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = -inf, β = inf

D: α = -inf, β = 10, utility = 10

F: α = 11, β = 11, utility = 11

Page 26: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = 10, β = inf

D: α = -inf, β = 10, utility = 10

Page 27: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = 10, β = inf

G: α = 10, β = inf

Page 28: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = 10, β = inf

G: α = 10, β = inf

H: α = 9, β = 9, utility = 9

Page 29: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = 10, β = inf

G: α = 10, β = 9, utility = ?

At an opponent node, with α > β: stop here and backtrack (never visit I)

H: α = 9, β = 9

Page 30: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = inf

C: α = 10, β = inf, utility = 10

G: α = 10, β = 9, utility = ?

Page 31: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = 10

C: α = 10, β = inf, utility = 10

Page 32: Game Playing

Originally from http://yoda.cis.temple.edu:8080/UGAIWWW/lectures95/search/alpha-beta.html

A: α = -inf, β = inf

B: α = -inf, β = 10

J: α = -inf, β = 10

... and so on!

Page 33: Game Playing

How effective is alpha-beta in practice?

Pruning does not affect the final result

With some extra heuristics (good move ordering), the effective branching factor becomes b^(1/2): 35 → 6

Can look ahead twice as far for the same cost

Can easily reach depth 8 and play good chess (one simple move-ordering heuristic is sketched below)
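
A sketch of one simple move-ordering heuristic: examine the children with the best static evaluation first, so that alpha-beta prunes earlier. neighbors and evaluate are the names used in the code on pages 42-43; real programs add killer moves, transposition tables, and other ordering tricks.

;; Order successors by static evaluation (best first for MAX, worst first for MIN).
(defun ordered-neighbors (state maximizing-p)
  (sort (copy-list (neighbors state))
        (if maximizing-p #'> #'<)
        :key #'evaluate))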

Page 34: Game Playing

Deterministic games today

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.

Othello: human champions refuse to compete against computers, which are too good.

Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Page 35: Game Playing

Deterministic games today

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searched 197 million positions per second, used very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.

Page 36: Game Playing

More on Deep Blue

Garry Kasparov, world champion, beat IBM’s Deep Blue in 1996; in 1997 they played a rematch

Game 1: Kasparov won

Game 2: Kasparov resigned when he could have had a draw

Game 3: Draw

Game 4: Draw

Game 5: Draw

Game 6: Kasparov made some bad mistakes and resigned

Info from http://www.mark-weeks.com/chess/97dk$$.htm

Page 37: Game Playing

Kasparov said... “Unfortunately, I based my preparation for this match ... on the conventional wisdom of what would constitute good anti-computer strategy.

Conventional wisdom is -- or was until the end of this match -- to avoid early confrontations, play a slow game, try to out-maneuver the machine, force positional mistakes, and then, when the climax comes, not lose your concentration and not make any tactical mistakes.

It was my bad luck that this strategy worked perfectly in Game 1 -- but never again for the rest of the match. By the middle of the match, I found myself unprepared for what turned out to be a totally new kind of intellectual challenge.”

http://www.cs.vu.nl/~aske/db.html

Page 38: Game Playing

Some technical details on Deep Blue

32-node IBM RS/6000 supercomputer; each node had a Power Two Super Chip (P2SC) processor and 8 specialized chess processors

Total of 256 chess processors working in parallel; could calculate 60 billion moves in 3 minutes

Evaluation function (tuned via neural networks) considers:

material: how much the pieces are worth

position: how many safe squares the pieces can attack

king safety: some measure of king safety

tempo: have you accomplished little while the opponent has gotten a better position?

Written in C under the AIX operating system; uses MPI to pass messages between nodes

http://www.research.ibm.com/deepblue/meet/html/d.3.3a.html

Page 39: Game Playing

Deep Fritz

Played world champion Vladimir Kramnik in 2002

A more “fair” contest: Kramnik could play with the Deep Fritz software in advance

Ran on a $40k 8-processor Compaq server running Windows XP – essentially the same software sold for normal computers

Searched fewer moves per second than Deep Blue, but its heuristics were better


Page 40: Game Playing

Kramnik starts strong

Game 1: Kramnik black, Fritz white. Black typically plays for a draw. Fritz ended up in the “Berlin endgame,” which Kramnik knows better than anyone. Kramnik sealed a draw.

Game 2: Kramnik white, Fritz black. Fritz made a dreadfully stupid mistake that even beginners don’t make. Kramnik wins. http://www.chessbase.com/images2/2002/bahrain/games/bahrain2.htm

Game 3: Kramnik black, Fritz white. Fritz traded queens but couldn’t fight this kind of battle. Kramnik wins.

Page 41: Game Playing

But later…

Game 4: Kramnik white, Fritz black. Kramnik ended up in a long, drawn-out ending that resulted in a draw.

Game 5: Kramnik black, Fritz white. Deep in a difficult game, Kramnik made the worst mistake of his career and resigned. Fritz wins.

Game 6: Kramnik white, Fritz black. Kramnik resigned, but after-the-fact analysis hasn’t found a certain win for black. Fritz wins.

Game 7: Kramnik black, Fritz white. Kramnik plays to a draw.

Game 8: Kramnik white, Fritz black. 21 moves in, Kramnik can’t do anything, offers a draw, and Fritz accepts.

Page 42: Game Playing

Alpha-Beta Pruning: Coding It

;; MAX's turn: return the heuristic value at the cutoff, otherwise the best
;; (largest) value among the successors, pruning as soon as alpha >= beta.
(defun max-value (state alpha beta)
  (if (cutoff-test state)
      (evaluate state)
      (let ((node-value 0))
        (dolist (new-state (neighbors state) alpha)
          (setf node-value (min-value new-state alpha beta))
          (setf alpha (max alpha node-value))
          (when (>= alpha beta)
            (return beta))))))

Page 43: Game Playing

Alpha-Beta Pruning: Coding It

;; MIN's turn: return the heuristic value at the cutoff, otherwise the best
;; (smallest) value among the successors, pruning as soon as beta <= alpha.
(defun min-value (state alpha beta)
  (if (cutoff-test state)
      (evaluate state)
      (let ((node-value 0))
        (dolist (new-state (neighbors state) beta)
          (setf node-value (max-value new-state alpha beta))
          (setf beta (min beta node-value))
          (when (<= beta alpha)
            (return alpha))))))
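
A possible top-level driver, not part of the original slides, showing how max-value and min-value might be used to pick MAX's move; it assumes the same neighbors helper:

;; Return the successor of STATE with the highest backed-up value,
;; seeding alpha with the best value found so far.
(defun alpha-beta-decision (state)
  (let ((best-state nil)
        (best-value most-negative-fixnum))
    (dolist (new-state (neighbors state) best-state)
      (let ((value (min-value new-state best-value most-positive-fixnum)))
        (when (> value best-value)
          (setf best-value value
                best-state new-state))))))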

Page 44: Game Playing

Nondeterministic Games

Games with an element of chance (e.g., dice, drawing cards), such as backgammon, Risk, RoboRally, Magic, etc.

Add chance nodes to the tree

Page 45: Game Playing

Example with a coin flip instead of dice (simple)

[Game-tree figure: MAX at the root, chance nodes with probability 0.5 on each branch, MIN nodes below them, and leaf utilities 2 4 7 4 6 0 5 -2]

Expected value for a chance node = Σ over children of P(child) × utility(child)
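
A sketch of that sum in the same Lisp style, under the assumption that each child of a chance node is a (probability . state) pair; expectiminimax is the hypothetical recursive evaluator sketched after the page 47 methodology slide.

;; Probability-weighted average of the children's values at a chance node.
;; Each element of CHANCE-CHILDREN is assumed to be a (probability . state) pair.
(defun expected-value (chance-children)
  (loop for (probability . child) in chance-children
        sum (* probability (expectiminimax child))))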

Page 46: Game Playing

Example with a coin flip instead of dice (simple)

[Same game-tree figure with backed-up values: the MIN nodes back up 2, 4, 0, and -2 from the leaf pairs (2 4), (7 4), (6 0), and (5 -2); the chance nodes (probability 0.5 on each branch) back up 0.5×2 + 0.5×4 = 3 and 0.5×0 + 0.5×(-2) = -1; MAX at the root picks 3]

Page 47: Game Playing

Expectiminimax Methodology

For each chance node, determine the expected value (sketched below)

The evaluation function should be linear with value, otherwise the expected-value calculations are wrong – i.e. the evaluation should be linearly proportional to the expected payoff

Complexity: O(b^m n^m), where n = number of random states (distinct dice rolls)

Alpha-beta pruning can be done, but it requires a bounded evaluation function (need to calculate upper/lower bounds on utilities) and is less effective
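
A minimal sketch of the full recursion in the same Lisp style; terminal-p, utility, node-type, successors, and chance-successors (the last returning (probability . state) pairs) are hypothetical helpers, not from the slides.

;; Expectiminimax over three node types: MAX, MIN, and chance.
(defun expectiminimax (state)
  (cond ((terminal-p state) (utility state))
        ((eq (node-type state) :max)
         (loop for child in (successors state)
               maximize (expectiminimax child)))
        ((eq (node-type state) :min)
         (loop for child in (successors state)
               minimize (expectiminimax child)))
        (t ; chance node: probability-weighted average of the children
         (loop for (probability . child) in (chance-successors state)
               sum (* probability (expectiminimax child))))))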

Page 48: Game Playing

Real World

Most game-playing systems start with these concepts, then apply various hacks and tricks to get around computability problems

Databases of stored game configurations

Learning (coming up next): Chapter 18