COMP-4640: Intelligent & Interactive Systems
Game Playing
Feb 08, 2016
A game can be formally defined as a search problem with:
- an initial state
- a set of operators (actions or moves)
- a terminal test
- a utility (payoff) function
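The four components above map naturally onto a small interface. The sketch below is illustrative only: the take-away game, the class name `TakeAwayGame`, and the method names are assumptions made for this example and do not come from the course material.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A state is (stones left, player to move), where the player is "MAX" or "MIN".
State = Tuple[int, str]


@dataclass(frozen=True)
class TakeAwayGame:
    """Hypothetical toy game: players alternately remove 1-3 stones,
    and whoever takes the last stone wins."""
    start_stones: int = 5

    def initial_state(self) -> State:
        # Initial state: full pile, MAX moves first.
        return (self.start_stones, "MAX")

    def actions(self, state: State) -> List[int]:
        # Operators: the legal numbers of stones to remove.
        stones, _ = state
        return [n for n in (1, 2, 3) if n <= stones]

    def result(self, state: State, action: int) -> State:
        # Applying an operator removes stones and passes the turn.
        stones, player = state
        return (stones - action, "MIN" if player == "MAX" else "MAX")

    def is_terminal(self, state: State) -> bool:
        # Terminal test: the pile is empty.
        return state[0] == 0

    def utility(self, state: State) -> int:
        # Utility from MAX's point of view. The player to move at a
        # terminal state did not take the last stone, so that player lost.
        _, player = state
        return -1 if player == "MAX" else +1


game = TakeAwayGame()
start = game.initial_state()
after_max = game.result(start, 3)        # MAX takes 3 of the 5 stones
after_min = game.result(after_max, 2)    # MIN takes the last 2 stones
```

Here `after_min` is a terminal state in which MIN took the last stone, so its utility for MAX is a loss.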
1. Multi-agent environment
– Multi-player games involve planning and acting in environments populated by other active agents
– Agents use a sense/plan/act architecture that does not plan too far into the unpredictable future
– But with proper information an agent can construct a plan that considers the effects of the actions of other agents
– In AI we will consider the special case of games that are:
• deterministic
• turn-taking
• two-player
• zero-sum, with perfect information
2. Zero-Sum Games
– Either one player wins (and the other loses), or a draw results
– +1 win, -1 loss, 0 draw
3. The agents' utility functions make the games adversarial
Multi-agent environment: Robot Soccer
Game tree (2-player, deterministic, turns)
The Minimax Algorithm
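A minimal sketch of minimax as it is usually stated: MAX nodes take the largest child value, MIN nodes take the smallest, and leaves return their utility. The nested-list tree encoding, the function names `minimax` and `best_move`, and the example leaf values are assumptions chosen for illustration.

```python
from typing import List, Union

# A tree is either a leaf utility (int) or a list of subtrees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> int:
    """Return the minimax value of `node`, with levels alternating
    between MAX and MIN."""
    if isinstance(node, int):
        return node  # leaf: utility from MAX's point of view
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(root: List[Tree]) -> int:
    """Index of the move at a MAX root whose subtree has the highest
    minimax value (the minimax decision)."""
    values = [minimax(child, False) for child in root]
    return values.index(max(values))


# Hypothetical two-ply tree: MAX moves at the root, MIN replies below.
example_tree: Tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = minimax(example_tree, True)
chosen = best_move(example_tree)
```

In this example the three MIN nodes evaluate to 3, 2, and 2, so MAX picks the first move and the root value is 3.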
The evaluation function:
• Must agree with the utility function on terminal (goal) states
• Must be of reasonable complexity so that it can be computed quickly (a trade-off between accuracy and time)
• Should be accurate
• The performance of the game-playing system depends on the accuracy ("goodness") of the evaluation function
• One problem with using minimax is that it may not be feasible to search the whole game tree for a minimax decision (move or action)
• Using depth-limited search may speed up the minimax decision process, but instead of using the utility function one would need to construct an evaluation function.
• This evaluation function would provide an estimate of the expected utility of a game position
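To make the idea of an evaluation function concrete, here is a minimal sketch of a weighted material count for chess. The piece weights are the conventional textbook values and, like the names `PIECE_VALUES` and `material_eval`, are assumptions for this example and not something specified in these slides.

```python
from typing import Dict

# Conventional material weights (an assumption, not from the slides).
PIECE_VALUES: Dict[str, int] = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_eval(max_pieces: Dict[str, int], min_pieces: Dict[str, int]) -> int:
    """Estimate a position as MAX's material minus MIN's material.

    Each argument maps a piece letter to how many of that piece the
    side still has. Positive results favour MAX.
    """
    def total(counts: Dict[str, int]) -> int:
        return sum(PIECE_VALUES[piece] * n for piece, n in counts.items())

    return total(max_pieces) - total(min_pieces)


# MAX has a rook and four pawns; MIN has a knight and four pawns.
estimate = material_eval({"R": 1, "P": 4}, {"N": 1, "P": 4})
```

A function like this is cheap to compute, which is the accuracy-versus-time trade-off the slides mention: it ignores position entirely and only approximates the expected utility.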
Properties of minimax
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible
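The chess figures can be checked with a few lines of arithmetic. This is only a sanity check of the order of magnitude; the variable names are illustrative.

```python
import math

b, m = 35, 100                 # branching factor and depth quoted for chess
leaf_count = b ** m            # exact integer; Python handles big ints
exponent = m * math.log10(b)   # log10 of b^m, i.e. the power of ten
digits = len(str(leaf_count))  # number of decimal digits in b^m

print(f"b^m is roughly 10^{exponent:.0f} ({digits} digits)")
```

A tree with on the order of 10^154 leaves cannot be searched exhaustively, which is why the slides turn to depth limits and evaluation functions.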
Once we have developed a good evaluation function, we must also consider:
• The depth-limit
• The Horizon Problem
– Difficult to eliminate
– Arises when a program is facing a move by the opponent that causes serious damage and is ultimately unavoidable
– Stalling pushes the move over the horizon to a place where it can't be detected
• Once we have an evaluation function and a depth-limit we can then re-apply minimax search.
• However, for depth-limited search minimax may still be inefficient.
• Minimax will expand nodes that need not be searched.
• By making our search method more efficient, we will be able to search at deeper levels of our game tree.
Alpha-Beta Pruning
1. Search below a MIN node may be alpha-pruned if its beta value is less than or equal to the alpha value of some MAX ancestor.
2. Search below a MAX node may be beta-pruned if its alpha value is greater than or equal to the beta value of some MIN ancestor.
Alpha-Beta Pruning (αβ prune): Rules of Thumb
– α is the highest MAX value found so far
– β is the lowest MIN value found so far
– If Min is on top, alpha prune
– If Max is on top, beta prune
– You will only have alpha prunes at the Min level
– You will only have beta prunes at the Max level
– See detailed algorithm p167
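The rules of thumb can be sketched in code as follows. This is a minimal illustration and not the textbook listing referenced above: the nested-list tree encoding, the `visited` bookkeeping list, and the example leaf values are assumptions made for this example.

```python
import math
from typing import List, Optional, Union

# A tree is either a leaf utility (int) or a list of subtrees.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[int]] = None) -> float:
    """Minimax value of `node` with alpha-beta pruning.

    alpha: highest MAX value found so far on the path to the root.
    beta:  lowest MIN value found so far on the path to the root.
    visited: optional list that records each leaf actually evaluated.
    """
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta prune: happens at a MAX level
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:
            break  # alpha prune: happens at a MIN level
    return value


# Hypothetical two-ply tree: MAX at the root, MIN below.
example_tree: Tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
leaves_seen: List[int] = []
root_value = alphabeta(example_tree, True, visited=leaves_seen)
```

On this tree the second MIN node is abandoned after its first leaf, because its beta value (2) is already no better than the alpha value (3) held by the MAX root, so 7 of the 9 leaves are evaluated and the root value is unchanged.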
[Figure: alpha-beta pruning worked examples, shown step by step on game trees with α and β annotations]
Game of Chance: Expecti-minimax
• Initial values of the leaves indicate the board state
• Use the percentage chance based upon the roll for the first calculated value
• Min eval f(n) selects the Min value
• The second roll uses a different assigned percentage chance
• Max eval f(n) selects the Max value
[Figure: expecti-minimax worked example, built up step by step. Chance-node calculations shown in the tree: (3*1.0) = 3 and (0*0.67 + 6*0.33) ≈ 2]
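A minimal sketch of expecti-minimax, reusing the chance-node arithmetic shown on the slides, (3*1.0) and (0*0.67 + 6*0.33). The tuple-based tree encoding, the function name, and the way the two chance nodes are placed under a single parent are assumptions for illustration; the full tree from the slides is not reconstructed here.

```python
from typing import Any


def expectiminimax(node: Any) -> float:
    """Value of a game tree with MAX, MIN, and chance nodes.

    A node is either a number (leaf value) or a pair (kind, children):
      ("max", [child, ...])            -> largest child value
      ("min", [child, ...])            -> smallest child value
      ("chance", [(prob, child), ...]) -> probability-weighted average
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    if kind == "chance":
        return sum(prob * expectiminimax(child) for prob, child in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# Chance nodes using the probabilities and leaf values from the slides.
certain = ("chance", [(1.0, 3)])                 # 3*1.0
risky = ("chance", [(0.67, 0), (0.33, 6)])       # 0*0.67 + 6*0.33

value_for_max = expectiminimax(("max", [certain, risky]))
value_for_min = expectiminimax(("min", [certain, risky]))
```

With these numbers a MAX parent prefers the certain outcome, while a MIN parent prefers the risky branch, whose expected value is 1.98 (shown rounded to 2 on the slides).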
Cutting off search
MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?
b^m = 10^6, b = 35 ⇒ m = 4
4-ply lookahead is a hopeless chess player!
– 4-ply ≈ human novice
– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue, Kasparov
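The two substitutions can be sketched on a toy game. Everything concrete here is an assumption for illustration: the take-away game (remove 1-3 stones, last stone wins), the depth-based `cutoff` test, and the deliberately uninformative `eval_fn` that returns 0 for any non-terminal position.

```python
from typing import List, Tuple

# State: (stones left, True if MAX is to move).
State = Tuple[int, bool]


def actions(state: State) -> List[int]:
    return [n for n in (1, 2, 3) if n <= state[0]]


def result(state: State, take: int) -> State:
    return (state[0] - take, not state[1])


def terminal(state: State) -> bool:
    return state[0] == 0


def utility(state: State) -> int:
    # The player to move at an empty pile did not take the last stone
    # and has therefore lost.
    return -1 if state[1] else +1


def cutoff(state: State, depth: int, limit: int) -> bool:
    # Cutoff? replaces Terminal?: stop at terminal states or at the limit.
    return terminal(state) or depth >= limit


def eval_fn(state: State) -> float:
    # Eval replaces Utility: exact on terminal states, a crude guess
    # (0 = "unknown") everywhere else.
    return utility(state) if terminal(state) else 0.0


def minimax_cutoff(state: State, depth: int, limit: int) -> float:
    if cutoff(state, depth, limit):
        return eval_fn(state)
    values = [minimax_cutoff(result(state, a), depth + 1, limit)
              for a in actions(state)]
    return max(values) if state[1] else min(values)


start: State = (5, True)
shallow = minimax_cutoff(start, 0, limit=1)   # 1-ply lookahead
deeper = minimax_cutoff(start, 0, limit=3)    # 3-ply lookahead
```

With a 1-ply limit every successor looks the same to this evaluation function, so the search returns 0; with a 3-ply limit it finds the forced win for MAX and returns +1. This is the same effect, in miniature, as the ply comparison above.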
Deterministic games in practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and uses undisclosed methods for extending some lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, which are too good.
• Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
http://www.research.ibm.com/deepblue/