Page 1:

CS 480: GAME AI
ADVERSARIAL SEARCH 2

5/29/2012, Santiago Ontañón, [email protected]
https://www.cs.drexel.edu/~santi/teaching/2012/CS480/intro.html

Page 2:

Reminders
• Check the BBVista site for the course regularly
• Also: https://www.cs.drexel.edu/~santi/teaching/2012/CS480/intro.html
• The Project 4 description is available
• Project 4 is due June 7th

Page 3:

Outline
• Student Presentations:
  • "Game AI as Storytelling"
  • "Computational Approaches to Story-telling and Creativity"
• Monte-Carlo Search Algorithms
• UCT
• Strategy Simulation


Page 5:

Board Games
• Main characteristic: turn-based
  • The AI has a lot of time to decide the next move

Page 6:

Board Games
• Not just chess…

Page 7:

Board Games
• From an AI point of view:
  • Turn-based
  • Discrete actions
  • Complete information (mostly)
• Those features make these games amenable to game tree search!

Page 8:

Game Tree Search in Complex Games
• Classic minimax assumes (Chess, Checkers, Go…):
  • 2 players
  • Perfect information
  • Turn-taking game
  • Given a state and an action, we can predict the next state
• It is easily generalizable to multiplayer turn-taking games (the max^n algorithm)
• Complex games (like RTS games):
  • Real-time, not turn-taking, simultaneous actions
  • Lots of possible actions: branching factor too large!
  • We cannot exactly predict the next state
  • Imperfect information

Page 9:

Game Tree Search in RTS Games
• Problem:
  • Lots of possible actions, branching factor too large!
• Solution:
  • ???
• Problem:
  • Real-time, no turn taking, simultaneous actions
• Solution:
  • ???

Page 10:

Game Tree Search in RTS Games
• Problem:
  • Lots of possible actions, branching factor too large!
• Solution:
  • Sampling (Monte-Carlo Search)
• Problem:
  • Real-time, no turn taking, simultaneous actions
• Solution:
  • ???

Page 11:

Monte-Carlo Methods
• Idea: use sampling instead of exact calculations
• Simplest Monte-Carlo method: integration
  • Imagine a very complex function f(x); we want to compute the definite integral of f(x) between a and b:

    ∫_a^b f(x) dx ≈ (b − a) · (1/N) Σ_{i=1}^{N} f(x_i)

  • Generate N random numbers x_1, …, x_N between a and b. For each x_i, compute f(x_i), take the average, and multiply by (b − a).
  • For large values of N, this converges to the actual integral!
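This estimator can be sketched in a few lines (the function, bounds, and sample count below are just illustrative):

```python
import random

def mc_integrate(f, a, b, n=100_000):
    """Estimate the definite integral of f over [a, b]:
    average f at n uniform random samples, then scale by (b - a)."""
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Illustration: integrate x^2 over [0, 1]; the exact value is 1/3.
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0)
```

As N grows, the error of the estimate shrinks roughly as 1/sqrt(N), the usual Monte-Carlo convergence rate.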

Page 12:

Monte-Carlo Tree Search
• Monte-Carlo Search:
  • Instead of opening the whole minimax tree
  • Approximate it by sampling (same idea as for the integral)
  • For each possible action: play N games at random until the end, starting with that action
  • If N is large, the average win ratio converges to the expected utility of the action
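The per-move sampling described above can be sketched as follows; the game-state interface (legal_moves, apply, is_terminal, winner, current_player) is an assumed one for illustration, not an API from the course:

```python
import random

def monte_carlo_move(state, n_playouts=100):
    """Pick the move whose random playouts win most often.
    Assumes a hypothetical immutable game interface:
    state.legal_moves(), state.apply(move) -> new state,
    state.is_terminal(), state.winner(), state.current_player."""
    best_move, best_ratio = None, -1.0
    player = state.current_player
    for move in state.legal_moves():
        wins = 0
        for _ in range(n_playouts):
            s = state.apply(move)
            # Play the rest of the game completely at random.
            while not s.is_terminal():
                s = s.apply(random.choice(s.legal_moves()))
            if s.winner() == player:
                wins += 1
        ratio = wins / n_playouts
        if ratio > best_ratio:
            best_move, best_ratio = move, ratio
    return best_move
```

Note that no utility function appears anywhere: each playout is scored only by who eventually wins.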

Page 13:

Minimax vs Monte-Carlo

[Figure: two game trees side by side, Minimax and Monte-Carlo, with utility values U at the leaves]

Page 14:

Minimax vs Monte-Carlo

[Figure: two game trees side by side, Minimax and Monte-Carlo, with utility values U at the leaves]

Minimax opens the complete tree (all possible moves) up to a fixed depth. Then, the Utility function is applied to the leaves.

Page 15:

Minimax vs Monte-Carlo

[Figure: the Monte-Carlo tree, in which each branch below the root is a complete game]

Monte-Carlo search runs, for each possible move at the root node, a fixed number K of random complete games. No need for a Utility function (but one can be used).

Page 16:

Monte-Carlo Search
• Advantages:
  • Scales up better than minimax (less sensitive to branching factors)
  • No need for a utility function! Just play till the end and return the move with the highest probability of winning.
• Disadvantages:
  • Brittle: a good move by the opponent might never be sampled

Page 17:

Monte-Carlo Search Improvements
• Each branch of a Monte-Carlo search tree is a random game.
• Instead of generating games uniformly at random, bias the probability of each move:
  • Example: in chess, favor capturing moves. This is more likely to generate move sequences that make sense!
  • In general: use game-play data to learn which moves are more frequent, and use those probabilities when generating random games.
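A biased playout step amounts to weighted sampling; the weight function below is a hypothetical, game-specific heuristic (e.g. large weights for capturing moves in chess), not something defined in the slides:

```python
import random

def biased_playout_move(moves, weight):
    """Pick one move for a playout, biased by a weight function.
    `weight(move)` is an assumed, game-specific scoring function:
    moves with larger weights are proportionally more likely."""
    weights = [weight(m) for m in moves]
    return random.choices(moves, weights=weights, k=1)[0]
```

The weights can be hand-tuned or estimated as move frequencies from recorded games, as the slide suggests.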

Page 18:

Monte-Carlo Search Uses
• Extremely useful in complex games where minimax cannot be used
• When trying to decide between a set of actions:
  • Just play random games with each action, and select the best one
• Can be used, for example, in:
  • RTS games
  • RPG game battles
  • Board games
  • Etc.

Page 19:

Outline
• Student Presentations:
  • "Game AI as Storytelling"
  • "Computational Approaches to Story-telling and Creativity"
• Monte-Carlo Search Algorithms
• UCT
• Strategy Simulation

Page 20:

Monte-Carlo Tree Search: UCT
• Upper Confidence Tree (UCT) is a state-of-the-art, simple variant of Monte-Carlo Search, responsible for the recent success of Computer Go programs
• Ideas:
  • Sample optimally (UCB)
  • Instead of opening the whole minimax tree, or playing N random games, open only the upper part of the tree and play random games from there

Page 21:

UCT

[Figure: the root node, labeled 0/0, at the current state; the upper part of the diagram is labeled "Tree Search" and the lower part "Monte-Carlo Search"]

Each node stores a count w/t: how many of the games starting from this state have been found to be won (w) out of the total games explored (t) in the current search.

Page 22:

UCT

[Figure: after the first playout, a win, the root count becomes 1/1]

Page 23:

UCT

[Figure: a child node (0/1) has been added below the root, whose count becomes 1/2; the playout from the new node is a loss]

At each iteration, one node of the tree (upper part) is selected and expanded (one node added to the tree). From this new node, a complete game is played out at random (Monte-Carlo).

Page 24:

UCT

[Figure: another node (1/1) is added; the root count becomes 2/3 and the playout is a win]

Page 25:

UCT

[Figure: tree with root 3/4, children 0/1 and 2/2, and a new node 1/1 below; the playout is a win]

The counts w/t are used to determine which nodes to explore next.

Naïve exploration/exploitation policy: 50% expand the best node in the tree, 50% expand a node at random.

Page 26:

UCT

[Figure: same tree as before, root 3/4]

Instead of the naïve 50/50 policy above, UCT uses an optimal sampling policy called UCB (Upper Confidence Bounds), which comes from reinforcement learning.
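The slide does not give the formula; the standard UCB1 rule selects the child i maximizing w_i/n_i + c * sqrt(ln t / n_i), where w_i/n_i are the child's win/visit counts, t is the parent's visit count, and c is an exploration constant (often sqrt(2)). A minimal sketch:

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for a child node: average win rate plus an
    exploration bonus that shrinks as the node is visited more.
    Unvisited nodes score infinity so they are tried first."""
    if visits == 0:
        return float("inf")
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """Return the index of the (wins, visits) pair with the highest
    UCB1 score."""
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1],
                                  parent_visits))
```

The first term exploits moves that look good so far; the second guarantees that rarely visited moves are still explored, which is exactly the balance the naïve 50/50 policy approximates crudely.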

Page 27:

UCT

[Figure: root 3/5 with children 0/1 and 2/3, and nodes 1/1 and 0/1 below; the playout is a loss]

The tree ensures all relevant actions are explored (this greatly alleviates the randomness that affects Monte-Carlo methods).

Page 28:

UCT

[Figure: same tree, root 3/5]

The random games played from each node of the tree serve to estimate the Utility function. They can be random, or use an opponent model (if available).

Page 29:

UCT
• After a fixed number of iterations K (or after the assigned time is over), UCT analyzes the resulting tree, and the selected action is the one that has been explored most often.
• UCT can search games with much larger state spaces than minimax. It is the standard algorithm for modern (2008 to present) Go-playing programs.
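Putting the four steps together (selection via UCB1, expansion, random playout, backpropagation of the w/t counts), a minimal UCT sketch might look like this; the game-state interface (legal_moves, apply, is_terminal, winner, current_player) is an assumed one for illustration:

```python
import math
import random

class Node:
    """One node of the UCT tree, holding win/visit counts (w/t)."""
    def __init__(self, state, move=None, parent=None):
        self.state, self.move, self.parent = state, move, parent
        self.children = []
        self.untried = [] if state.is_terminal() else list(state.legal_moves())
        self.wins = 0
        self.visits = 0

def uct_search(root_state, iterations=200, c=math.sqrt(2)):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend with UCB1 while fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch:
                       ch.wins / ch.visits +
                       c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one new child to the tree.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.apply(move), move, node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout to the end of the game.
        s = node.state
        while not s.is_terminal():
            s = s.apply(random.choice(s.legal_moves()))
        winner = s.winner()
        # 4. Backpropagation: update w/t counts along the path.
        # A child's wins are counted for the player who moved into it,
        # i.e. the player to move at its parent.
        while node is not None:
            node.visits += 1
            if node.parent is not None and \
               winner == node.parent.state.current_player:
                node.wins += 1
            node = node.parent
    # Select the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

Returning the most-visited root move (rather than the highest win rate) matches the selection rule described on this slide.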

Page 30:

Outline
• Student Presentations:
  • "Game AI as Storytelling"
  • "Computational Approaches to Story-telling and Creativity"
• Monte-Carlo Search Algorithms
• UCT
• Strategy Simulation

Page 31:

Game Tree Search in RTS Games
• Problem:
  • Lots of possible actions, branching factor too large!
• Solution:
  • Sampling (Monte-Carlo Search)
• Problem:
  • Real-time, no turn taking, simultaneous actions
• Solution:
  • Strategy simulation, rather than turn-based action taking

Page 32:

Strategy Simulation: Example
• Assume we want to use UCT for the Strategy module of an RTS game AI
• Define a collection of "high-level actions" (or strategies) that make sense for the game. For example, in S3:
  • S1: Attack with the units we have
  • S2: Train 4 footmen
  • S3: Train 4 archers
  • S4: Train 4 catapults
  • S5: Train 4 knights
  • S6: Build 2 defense Towers
  • S7: Build 2 defense Towers around a Gold Mine
  • S8: Build 2 defense Towers around a group of Trees
  • S9: Bring units back to the base
  • S10: Train 2 more peasants to gather resources

Page 33:

Strategy Simulation: Example
• Instead of taking turns in executing actions, we assign a "strategy" to each player, and simulate it until completion:

[Figure: on one side, standard minimax alternates single actions (Player 1, Action 1; Player 2, Action 2; Player 1, Action 3). On the other, strategy simulation: each node assigns a strategy and an estimated completion time to each player, with labels such as Player 1: S2 (ETA 240) / Player 2: S3 (ETA 400), Player 1: S1 (ETA 400) / Player 2: S3 (ETA 160), Player 1: S1 (ETA 240) / Player 2: S1 (ETA 400); the tree branches when a strategy completes]
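The simulation step sketched in the figure can be expressed as: jump the abstract game state forward to the earliest strategy completion, which is the next decision point. All names below (including apply_strategy) are hypothetical, not from the slides:

```python
def advance(sim_state, strategies, etas):
    """Advance the abstract simulation to the next decision point:
    jump forward by the smallest remaining ETA, apply the effect of
    each strategy that finishes, and report which players must now
    pick a new strategy.  `apply_strategy` is an assumed name for
    the simplified-model update."""
    dt = min(etas.values())
    for player in etas:
        etas[player] -= dt
    finished = [p for p, remaining in etas.items() if remaining == 0]
    for player in finished:
        sim_state = sim_state.apply_strategy(player, strategies[player])
    return sim_state, finished
```

The search tree then branches only at these completion points, which is why strategy simulation sidesteps the turn-taking assumption of minimax.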

Page 34:

Strategy Simulation
• Requires:
  • A way to simulate strategies: typically a very simplified model
    • E.g., battles decided just by who has more units, or by the added damage of units (taking into account air/ground units)
    • No pathfinding, etc.
    • An abstracted version of the game, e.g.: divide the map into regions, and just count the number of unit types in each region
  • Utility function (optional):
    • If available, there is no need to simulate games till the end when using Monte-Carlo
    • If not available, simply simulate games to the end

Page 35:

UCT for RTS Games
• Applicable to:
  • Strategy (previous example)
  • Attack: where the high-level actions are things like "attack enemy X", "retreat", etc.
  • Economy
• In turn-based games, minimax is executed each turn
• For RTS games: execute every K cycles (e.g. once per second), or once the current action has finished, or when an important event happens (e.g. a new enemy is sighted)
• State of the art:
  • No current commercial games use it
  • Research in experimental games shows its potential

Page 36:

Projects 3 & 4
• Project 4 (and last): Rule-based Strategy for an RTS Game (S3)
• Idea:
  • Create a perception layer that builds a simple knowledge base (logical terms)
  • Create a simple unification algorithm with variable bindings
  • Define a set of actions the rule-based system can execute
  • Define a small set of rules (do not overdo it!)
  • RETE is optional (extra credit)
  • See how well it plays and how easy it is to make the AI play well!
• Anyone want to do a different Project 4? Any ideas?

Page 37:

Next Thursday
• Machine Learning in games (last lecture!)