Page 1:

Notes 5: Game-Playing

ICS 270a Winter 2003

Page 2:

Summary

• Computer programs which play 2-player games

– game-playing as search

– with the complication of an opponent

• General principles of game-playing and search

– evaluation functions

– minimax principle

– alpha-beta-pruning

– heuristic techniques

• Status of Game-Playing Systems

– in chess, checkers, backgammon, Othello, etc, computers routinely defeat leading world players

• Applications?

– think of “nature” as an opponent

– economics, war-gaming, medical drug treatment

Page 3:

Chess Rating Scale

[Chart: chess ratings on a 1200-3000 scale over the years 1966-1997, tracking Deep Thought and Deep Blue against Garry Kasparov (then World Champion).]

Page 4:

Solving 2-Player Games

• Two players, perfect information

• Examples:

– e.g., chess, checkers, tic-tac-toe

• configuration of the board = unique arrangement of “pieces”

• Statement of Game as a Search Problem

– States = board configurations

– Operators = legal moves

– Initial State = current configuration

– Goal = winning configuration

– Payoff function = gives a numerical value to the outcome of the game

• A working example: Grundy's game

– Starting from a pile of coins, each player in turn divides one pile into two unequal piles. The player who moves last loses.
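The move operator described above can be sketched in a few lines. Representing a state as a tuple of pile sizes is an assumption for illustration; the split must produce two unequal piles, so piles of size 1 or 2 cannot be split.

```python
# Sketch of the legal-move operator for Grundy's game: split one pile
# into two unequal piles. States are tuples of pile sizes.

def moves(piles):
    """Return all states reachable in one move from the given pile sizes."""
    results = []
    for i, n in enumerate(piles):
        for a in range(1, (n + 1) // 2):      # a < n - a, so the split is unequal
            rest = piles[:i] + piles[i + 1:]  # the other, untouched piles
            results.append(tuple(sorted(rest + (a, n - a))))
    return results

moves((7,))   # [(1, 6), (2, 5), (3, 4)]
moves((2,))   # []  -- a 2-pile can only split into 1+1, which is equal
```

A state with no moves is terminal, which is where the payoff (win/loss) is assigned.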

Page 5:

Game Tree Representation

• New aspect to the search problem: there is an opponent we cannot control. How can we handle this?

[Diagram: game tree alternating levels of computer moves and opponent moves from start state S, with a possible goal state G lower in the tree (a winning situation for the computer).]

Page 6:

Game Trees

Page 7:

Game Trees

Page 8:

Grundy’s Game

• Search tree: represents Max's moves.

• Goal: evaluate the root node - the value of the game:

• 0 - loss

• 1 - win

• In complex games, search to termination is impossible. Instead:

• Find a good first move.

• Make it, then wait for Min's response.

• Find a good move from the new state.

Page 9:

Grundy’s game - special case of nim

Page 10:

An optimal procedure: The Min-Max method

• Designed to find the optimal strategy for Max and find best move:

– 1. Generate the whole game tree to leaves

– 2. Apply utility (payoff) function to leaves

– 3. Back-up values from leaves toward the root:

• a Max node computes the max of its children's values

• a Min node computes the min of its children's values

– 4. When the values reach the root, choose the max value and the corresponding move.

• However: it is impossible to develop the whole search tree. Instead, develop part of the tree and evaluate the promise of the leaves using a static evaluation function.
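The four steps above can be sketched directly as a recursion. The Node class and the example tree are assumptions for illustration; leaf nodes carry the payoff values of step 2.

```python
# Minimal minimax sketch: generate to the leaves, apply the payoff
# function there, and back values up toward the root.

class Node:
    def __init__(self, value=None, children=()):
        self.value = value              # payoff at a leaf, unused otherwise
        self.children = list(children)

def minimax(node, is_max):
    """Back up values from the leaves toward the root (step 3)."""
    if not node.children:               # leaf: apply the payoff function (step 2)
        return node.value
    values = [minimax(c, not is_max) for c in node.children]
    return max(values) if is_max else min(values)

# Step 4: at the root (a Max node), pick the child with the highest value.
root = Node(children=[Node(3), Node(children=[Node(5), Node(2)]), Node(0)])
best = max(root.children, key=lambda c: minimax(c, False))
```

Here `minimax(root, True)` is 3: the middle child backs up min(5, 2) = 2, so the first child (value 3) is the best move.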

Page 11:

Complexity of Game Playing

• Imagine we could predict the opponent’s moves given each computer move

• How complex would search be in this case?

– worst case, it will be O(b^d)

– Chess:

• b ~ 35 (average branching factor)

• d ~ 100 (depth of game tree for typical game)

• b^d ~ 35^100 ~ 10^154 nodes!!

– Tic-Tac-Toe

• ~5 legal moves, total of 9 moves

• 5^9 = 1,953,125

• 9! = 362,880 (Computer goes first)

• 8! = 40,320 (Computer goes second)

• well-known games can produce enormous search trees
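The counts above can be verified with a few lines of standard-library arithmetic:

```python
# Quick check of the tic-tac-toe and chess figures above.
import math

assert 5 ** 9 == 1_953_125            # ~5 legal moves over 9 moves
assert math.factorial(9) == 362_880   # move sequences, computer goes first
assert math.factorial(8) == 40_320    # move sequences, computer goes second
assert len(str(35 ** 100)) == 155     # 35^100 is on the order of 10^154
```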

Page 12:

Static (Heuristic) Evaluation Functions

• An Evaluation Function:

– estimates how good the current board configuration is for a player.

– Typically, one estimates how good it is for the player and how good it is for the opponent, and subtracts the opponent's score from the player's

– Othello: Number of white pieces - Number of black pieces

– Chess: Value of all white pieces - Value of all black pieces

• Typical values from -infinity (loss) to +infinity (win) or [-1, +1].

• If the board evaluation is X for a player, it’s -X for the opponent

• Example:

– Evaluating chess boards,

– Checkers

– Tic-tac-toe
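A material-difference evaluation in the chess style above fits in a few lines. The board encoding (a list of piece letters, uppercase = White) is an assumption for illustration; the piece values are the conventional 1/3/3/5/9.

```python
# Minimal sketch of a static evaluation function:
# value of all White pieces minus value of all Black pieces.

PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def evaluate(board):
    """Positive scores favor White; the same board scores -X for Black."""
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.upper(), 0)   # kings count 0 here
        score += value if piece.isupper() else -value
    return score

# White has a queen and a pawn (10); Black has a rook and a knight (8).
evaluate(['Q', 'P', 'r', 'n'])   # +2, slightly better for White
```

Swapping the colors negates the score, matching the "X for a player, -X for the opponent" property above.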

Page 13:

General Minimax Procedure on a Game Tree

For each move:

1. expand the game tree as far as possible

2. assign state evaluations at each open node

3. propagate the minimax choices upwards:

if the parent is a Min node (opponent), propagate up the minimum value of the children

if the parent is a Max node (computer), propagate up the maximum value of the children

Page 14:

Minimax Principle

• “Assume the worst”

– say each configuration has an evaluation number

– high numbers favor the player (the computer)

• so we want to choose moves which maximize evaluation

– low numbers favor the opponent

• so they will choose moves which minimize evaluation

• Minimax Principle

– you (the computer) assume that the opponent will choose the minimizing move next (after your move)

– so you now choose the best move under this assumption

• i.e., the maximum (highest-value) option considering both your move and the opponent’s optimal move.

– we can extend this argument more than 2 moves ahead: we can search ahead as far as we can afford.

Page 15:

Applying Minimax to tic-tac-toe

• The static evaluation function heuristic

Page 16:

Backup Values

Page 17:

Page 18:

Page 19:

Pruning with Alpha/Beta

• In Min-Max there is a separation between node generation and evaluation. Alpha-Beta interleaves the two, so branches that cannot affect the final value need not be generated at all.


Page 20:

Alpha Beta Procedure

• Idea:

– Do Depth first search to generate partial game tree,

– Give static evaluation function to leaves,

– compute bound on internal nodes.

• Alpha, Beta bounds:

– An alpha value for a Max node means that Max's real value is at least alpha.

– A beta value for a Min node means that Min can guarantee a value of at most beta.

• Computation:

– Alpha of a Max node is the maximum value of its children seen so far.

– Beta of a Min node is the minimum value of its children seen so far.

Page 21:

When to Prune

• Pruning

– Below a Min node whose beta value is less than or equal to the alpha value of any of its Max-node ancestors.

– Below a Max node whose alpha value is greater than or equal to the beta value of any of its Min-node ancestors.

Page 22:

Effectiveness of Alpha-Beta Search

• Worst-Case

– branches are ordered so that no pruning takes place. In this case alpha-beta gives no improvement over exhaustive search

• Best-Case

– each player’s best move is the left-most alternative (i.e., evaluated first)

– in practice, performance is closer to best rather than worst-case

• In practice we often get O(b^(d/2)) rather than O(b^d)

– this is the same as having a branching factor of sqrt(b),

• since (sqrt(b))^d = b^(d/2)

• i.e., we have effectively gone from b to square root of b

– e.g., in chess go from b ~ 35 to b ~ 6

• this permits much deeper search in the same amount of time
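The identity and the chess figure above check out numerically:

```python
# Sanity check: b^(d/2) equals (sqrt b)^d, and for chess sqrt(35) ~ 6.
import math

b, d = 35, 10
assert math.isclose(math.sqrt(b) ** d, b ** (d / 2))
round(math.sqrt(35), 1)   # 5.9, the effective branching factor for chess
```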

Page 23:

Iterative (Progressive) Deepening

• In real games, there is usually a time limit T on making a move

• How do we take this into account?

– using alpha-beta we cannot use “partial” results with any confidence unless the full breadth of the tree has been searched

– So, we could set a conservative depth-limit which guarantees that we will find a move in time < T

• disadvantage is that we may finish early, could do more search

• In practice, iterative deepening search (IDS) is used

– IDS runs depth-first search with an increasing depth-limit

– when the clock runs out we use the solution found at the previous depth limit
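The loop above can be sketched in a few lines. Here `search_to_depth` stands in for a full fixed-depth alpha-beta search and is an assumption for illustration; only completed depths are kept, so the move from the previous depth limit survives when the clock runs out.

```python
# Sketch of iterative deepening under a time limit T.
import time

def iterative_deepening(root, time_limit, search_to_depth):
    deadline = time.monotonic() + time_limit
    best_move = None
    depth = 1
    while time.monotonic() < deadline:
        best_move = search_to_depth(root, depth)  # completed depth-limited search
        depth += 1
    return best_move
```

A real engine would also poll the clock inside the search and discard the unfinished iteration, since a single deep iteration can overrun the deadline.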

Page 24:

Heuristics and Game Tree Search

• The Horizon Effect

– sometimes there’s a major “effect” (such as a piece being captured) which is just “below” the depth to which the tree has been expanded

– the computer cannot see that this major event could happen

– it has a “limited horizon”

– there are heuristics to try to follow certain branches more deeply, to detect such important events

– this helps to avoid catastrophic losses due to “short-sightedness”

• Heuristics for Tree Exploration

– it may be better to explore some branches more deeply in the allotted time

– various heuristics exist to identify “promising” branches

Page 25:

Computers can play GrandMaster Chess

• “Deep Blue” (IBM)

– parallel processor, 32 nodes

– each node has 8 dedicated VLSI “chess chips”

– each chip can search 200 million configurations/second

– uses minimax, alpha-beta, heuristics: can search to depth 14

– memorizes openings and end-games

– power based on speed and memory: no common sense

• Kasparov v. Deep Blue, May 1997

– 6 game full-regulation chess match (sponsored by ACM)

– Kasparov lost the match (2.5 to 3.5)

– a historic achievement for computer chess: the first time a computer became the best chess player on the planet

• Note that Deep Blue plays by “brute-force”: there is relatively little which is similar to human intuition and cleverness

Page 26:

Status of Computers in Other Games

• Checkers/Draughts

– current world champion is Chinook, which can beat any human

– uses alpha-beta search

• Othello

– computers can easily beat the world experts

• Backgammon

– a system which learns is ranked in the top 3 in the world

– uses neural networks to learn from playing many, many games against itself

• Go

– branching factor b ~ 360: very large!

– $2 million prize for any system which can beat a world expert

Page 27:

Summary

• Game playing is best modeled as a search problem

• Game trees represent alternate computer/opponent moves

• Evaluation functions estimate the quality of a given board configuration for the Max player.

• Minimax is a procedure which chooses moves by assuming that the opponent will always choose the move which is best for them

• Alpha-Beta is a procedure which can prune large parts of the search tree and allow search to go deeper

• For many well-known games, computer algorithms based on heuristic search match or out-perform human world experts.

• Reading: Nilsson Chapter 12, R&N Chapter 5.

Page 28:

Minimax Search Example

• Look ahead several turns (we’ll use 2 for now)

• Evaluate resulting board configurations

• The computer will make the move such that, when the opponent makes their best move, the resulting board configuration is the best possible for the computer

Page 29:

Propagating Minimax Values up the Game Tree

• Starting from the leaves

– Assign a value to the parent node as follows

• Children are Opponent’s moves: Minimum of all immediate children

• Children are Computer’s moves: Maximum of all immediate children

Page 30:

Deeper Game Trees