COMP 4106
AI Project: Go
Project Report
Vladimir Menshikov (100840927)
May 15, 2016
Submitted to:
Professor: Dr. John Oommen
TA: Dave McKenney
COMP 4106 Artificial Intelligence
School of Computer Science
Carleton University
1 Introduction
Go is an ancient Chinese board game for 2 players. It involves placing “stones” on an
empty board of 19x19 intersecting lines. Each player puts 1 stone down per turn and tries to
occupy as much territory as possible without being surrounded and captured. There are many
strategies, complex and simple, to increase winning chances. Go is easy to learn and hard to
master. People spend decades improving and mastering this game.
Figure 1: An opening sequence. [1]
There are a few big problems in creating a strong artificial intelligence for Go. The main
issue is the number of possibilities per move on a 19x19 board: every turn there are roughly
361 minus the turn number possible moves, which makes computing deep trees take an
extremely long time. There are about 10^170 possible legal board positions. Another problem is
evaluating the strength of a move, as there is no clear score at any stage of the game until the
very end. While the board pieces are static and generally do not get removed throughout the
game, their influence and potential are very dynamic. A move can look very bad initially but
turn out to be the winning one, while another move can lead to short term advantages and a
long term loss.
An opponent may also "pass" and not play a move. Estimating the score like a human player is
difficult, as the AI cannot "keep in mind" areas that will definitely survive until the end of
the game.
2 Motivation
I am very interested in Go and Go AI. I have been playing Go for over 10 years,
sometimes competitively, sometimes casually, and AI opponents have always been my favorite
to play against. I have seen many interesting plays from weaker and stronger AIs that humans
would not generally make. Beginner human opponents quickly learned from their mistakes and
improved as the game progressed, while an AI opponent would keep making the same mistakes,
leading to exploitable wins as well as practice in what I should not play like. When I tried to
research how Go AI worked before starting to study computer science and artificial
intelligence, I never really understood it. With my new knowledge on the topic I decided to try
to implement an AI opponent for the game I enjoy so much.
3 AI Techniques
I have implemented 4 heuristics: 2 main types of AI search, each with an improved
version. They are Minimax and its improvement AlphaBeta, and Monte Carlo and its
improvement UCT (Upper Confidence bound 1 applied to Trees).
Minimax searches the game's move tree depth first, playing moves alternately for
each player. When the end of the game or a given depth limit is reached, the game state is
evaluated and a value is returned. For each node on a branch, if it is a minimizing node (played
by the opponent) then the returned value will be that of the smallest child node. Similarly, if a
node is a maximizing node (played by the player) then the returned value will be that of the
largest child node. The final move selected at the root is based on the maximum value of the
given minimums (applied recursively).
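As a sketch only, the max-of-mins recursion above can be illustrated over an explicit game tree. The Node class and leaf values here are hypothetical stand-ins, not the project's actual board representation; in the real program the children would come from move generation and the leaf values from the score evaluation function.

```java
public class Minimax {
    // A node is either a leaf (holding a static evaluation) or an
    // interior node with children (the moves available from it).
    public static class Node {
        final int value;          // used only at leaves
        final Node[] children;    // null at leaves
        public Node(int value) { this.value = value; this.children = null; }
        public Node(Node... children) { this.value = 0; this.children = children; }
    }

    // Returns the best achievable value for the side to move at this node.
    public static int search(Node n, boolean maximizing) {
        if (n.children == null) return n.value;      // leaf: evaluate the state
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Node c : n.children) {
            int v = search(c, !maximizing);          // the opponent moves next
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }
}
```

For the tree with minimizing children {3, 5} and {2, 9}, the root receives max(min(3, 5), min(2, 9)) = 3.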
AlphaBeta pruning is an improved version of Minimax. It adds 2 variables that are passed
down with the recursive calls: alpha (initialized to negative infinity) and beta (positive
infinity). For a maximizing node, alpha grows toward a maximum value (of recursive
minimums). For a minimizing node, beta shrinks toward a minimum value (of recursive
maximums). The two values slowly approach each other from their respective sides. If at some
node alpha becomes greater than or equal to beta, the rest of that tree branch is pruned and
not evaluated. This means that if the best value the opponent will allow is no better than a
maximum you have already secured elsewhere, there is no need to continue exploring that
path, as your score will not get any higher. In practice this is significantly faster than
Minimax while still having the same worst-case runtime.
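A minimal sketch of the pruning described above, again over a hypothetical explicit tree rather than the project's real board and move generation. The only change from plain Minimax is threading alpha and beta through the calls and breaking out of the loop once alpha meets beta:

```java
public class AlphaBeta {
    public static class Node {
        final int value; final Node[] children;
        public Node(int value) { this.value = value; this.children = null; }
        public Node(Node... children) { this.value = 0; this.children = children; }
    }

    public static int search(Node n, int alpha, int beta, boolean maximizing) {
        if (n.children == null) return n.value;           // leaf: evaluate
        if (maximizing) {
            int best = Integer.MIN_VALUE;
            for (Node c : n.children) {
                best = Math.max(best, search(c, alpha, beta, false));
                alpha = Math.max(alpha, best);
                if (alpha >= beta) break;  // prune: opponent will avoid this branch
            }
            return best;
        } else {
            int best = Integer.MAX_VALUE;
            for (Node c : n.children) {
                best = Math.min(best, search(c, alpha, beta, true));
                beta = Math.min(beta, best);
                if (alpha >= beta) break;  // prune: we would never allow this branch
            }
            return best;
        }
    }
}
```

The initial call passes alpha = negative infinity and beta = positive infinity (here, the integer extremes), and the result matches plain Minimax while visiting fewer nodes.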
Monte Carlo search is based on random game playouts. Initially the game tree is
explored to some depth, at which point random games are simulated from each node until the
game is over. At the final state of a simulation a value is determined based on a win or a
loss: 1 is returned for a win and 0 for a loss. The value is backpropagated, updating the win
rate of the simulated node and then its parents recursively until it reaches the root. The
move with the highest win rate is then selected to be played.
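The win-rate bookkeeping can be sketched as below. This is an illustration under a loud assumption: the random Go playout is replaced by a coin flip with a fixed per-move win probability, purely so the sketch is self-contained. The Stats/record/winRate names are hypothetical, not the project's actual identifiers.

```java
import java.util.Random;

public class MonteCarlo {
    // Simulation statistics for one candidate move.
    public static class Stats {
        int wins, visits;
        void record(boolean won) { visits++; if (won) wins++; }
        double winRate() { return visits == 0 ? 0.0 : (double) wins / visits; }
    }

    // Run n random playouts per move and return the index of the move
    // with the highest empirical win rate. winProb[m] stands in for the
    // true (unknown) strength of move m; a real playout would play the
    // game of Go to the end instead of flipping a biased coin.
    public static int bestMove(double[] winProb, int n, Random rng) {
        Stats[] stats = new Stats[winProb.length];
        for (int m = 0; m < winProb.length; m++) {
            stats[m] = new Stats();
            for (int i = 0; i < n; i++) {
                stats[m].record(rng.nextDouble() < winProb[m]); // one playout
            }
        }
        int best = 0;
        for (int m = 1; m < stats.length; m++) {
            if (stats[m].winRate() > stats[best].winRate()) best = m;
        }
        return best;
    }
}
```

With enough playouts the empirical win rates converge on the underlying probabilities, which is why the endgame (few moves, short playouts) is where this approach shines.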
UCT is an improved version of Monte Carlo. It creates a balance between exploration
and exploitation of nodes. Instead of picking nodes to simulate completely at random, it
selects a node to explore and expand based on its UCT value. The formula for the UCT value is:
v = winrate + sqrt( ln(parent.visits) / visits )
This means higher win rate moves will be picked for simulation more frequently, while still
allowing less promising moves to be selected periodically. If a simulation of a lower win rate
move wins, that move may increase in priority enough to be expanded and explored more. For a
selected node, if it has a child that has not yet been simulated, that child is simulated
first, bypassing the UCT value check.
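The selection rule above can be written out directly. This sketch implements exactly the formula given (with no extra exploration constant), and models the "unvisited children go first" rule by giving them an infinite value; the method and array names are illustrative only.

```java
public class Uct {
    // UCT value as in the formula above:
    //   v = winrate + sqrt( ln(parent.visits) / visits )
    public static double uctValue(int wins, int visits, int parentVisits) {
        if (visits == 0) return Double.POSITIVE_INFINITY; // unvisited: simulate first
        return (double) wins / visits
             + Math.sqrt(Math.log(parentVisits) / visits);
    }

    // Index of the child to simulate next: the one with the highest UCT value.
    public static int select(int[] wins, int[] visits, int parentVisits) {
        int best = 0;
        double bestV = uctValue(wins[0], visits[0], parentVisits);
        for (int i = 1; i < wins.length; i++) {
            double v = uctValue(wins[i], visits[i], parentVisits);
            if (v > bestV) { bestV = v; best = i; }
        }
        return best;
    }
}
```

Note how two children with the same win rate are tie-broken toward the less-visited one: the sqrt term shrinks as a child's visit count grows, which is what produces the exploration/exploitation balance.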
4 Design and Implementation
This section will describe how I implemented each heuristic for compatibility with Go,
how the game is represented, and the features implemented. The board size used for most
testing is 9x9, a standard smaller alternative to the 19x19 board. For terminology on the game
of Go used below refer to Appendix B at the end of this report. For snippets of AI games refer to
Appendix C.
4.1 Heuristic Implementation
Minimax is implemented with a depth limit and a growing maximum node visit limit.
Because the search is very slow on the Go board, it should not exceed 2 ply; this is the
default limit and can be configured in the program based on board size. The node visit limit
was implemented for the same reason: it can take several minutes to explore all the moves
during the opening, and the limit makes the AI play sooner. The limit grows for the next move
every time it is reached, since once reached, future moves would otherwise be searched at
effectively 0 depth and be very bad. This allows for a stronger mid and endgame while not
sacrificing the early game too much.
AlphaBeta pruning is also implemented with a depth limit and a growing maximum node
visit limit. AlphaBeta's search depth setting is generally 2 higher than Minimax's, since the
search is significantly faster. While the node visit limit is implemented, it is rarely reached.
Monte Carlo search begins node simulation from the children of the root without
exploring further down, due to the already large number of moves. Monte Carlo search has a
non-growing node visit limit, but it resets on every call. Every node is simulated at least once
regardless of the limit. This means that once every node has been simulated, another
simulation iteration follows if the limit has not yet been reached. This leads to a
significantly stronger endgame with a fairly random opening.
UCT is implemented similarly to Monte Carlo; both share the random game simulation
function. The node visit limit is implemented differently: the search terminates upon reaching
the limit, returning the highest win rate move found so far. During the game opening, not all
of the root's child moves may get simulated, so there is no expansion of higher-winning nodes;
there are too many moves to simulate as it is.
For all searches, the depth and maximum node visit values for each board size are
selected manually so that the game progresses in a timely manner. Generally a move should be
made within a few seconds.
4.2 Game Representation
Representing state for the game of Go to an AI is very complex compared to the
simplicity of the game state from a human perspective. A good game state representation and
move generation must be implemented for faster searches.
4.2.1 Basic Features
● Board: Represented as a 2D array of numbers. 0 is empty, 1 is black, 2 is white.
● Move: Represented as a 2D coordinate on the board with a color.
Move and Board are the most important objects of the game and they are passed to
many functions together. Everything is done on the board with respect to the move, for
example finding neighboring allies or enemies, finding eyes, counting liberties, etc.
● End of a game: Based on black and white making a "pass" move successively, which skips
their turn. A pass move has the color 1.
● Move generation: A played move must be legal. According to the GNU Go [2]
documentation, a legal move is not a suicide. A move is a suicide if:
1. There is no neighboring empty intersection,
2. There is no neighboring enemy group with exactly 1 liberty, and
3. There is no neighboring ally group with more than 1 liberty.
A move should also not retake a KO.
The legal move function is used to verify all moves first, including user-made moves.
● KO: A KO arises when exactly 1 stone is captured. The captured location is not playable
for a single turn, which prevents the opponent from immediately capturing back.
● Capturing stones: If a group of stones has 0 liberties, it is captured and removed
from the game.
● Score: Scoring is done based on the Chinese ruleset, where each played stone gives a
point, each captured stone gives a point, and each point of territory gives a point. In case
of a draw, white is given half a point of komi to make up for the disadvantage of playing
second, so white wins if the game would otherwise end in a draw.
● Player Input: A human player can be selected to play.
● Move evaluation: The value of a game state is given by the score of the player minus the
score of the opponent.
● Territory and Eye detection: In my implementation, territory is defined as an eye: an
empty spot surrounded by 4 stones of the same color. Board edges are accounted for; a
corner eye only needs 2 stones.
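The eye/territory check described above can be sketched as follows. The board encoding matches the report (0 empty, 1 black, 2 white), but the class and method names are hypothetical, not the project's actual code; skipping off-board neighbours is what makes edge and corner eyes need fewer stones.

```java
public class Eyes {
    // An empty point is an eye for `color` if every on-board orthogonal
    // neighbour is a stone of that color. Off-board neighbours are skipped,
    // so a corner eye needs only 2 stones and an edge eye only 3.
    public static boolean isEye(int[][] board, int row, int col, int color) {
        if (board[row][col] != 0) return false;              // must be empty
        int[][] dirs = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
        for (int[] d : dirs) {
            int r = row + d[0], c = col + d[1];
            if (r < 0 || r >= board.length || c < 0 || c >= board[0].length) {
                continue;                                    // off the board
            }
            if (board[r][c] != color) return false;          // must be an ally
        }
        return true;
    }
}
```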
4.2.2 Extra features
● Efficient move: A function that performs further checks during move generation and
prevents moves that would be considered bad, such as filling one's own eyes, which
eventually leads to suicide.
● Group management
○ Liberty counting: A group can find all of its liberties, determining their location
and total number.
■ Atari detection: If a group has 1 liberty, it is in atari.
○ Group size.
○ Group merging: If a stone is played next to an existing group, merge it into that group.
■ If a stone connects multiple groups, merge them all into 1.
○ An option to print all group information per turn is selectable in the main menu.
● Node counting for the AI.
● Notification when an AI finds a better move (Minimax/AlphaBeta).
● Printing information on the win rate of a move (Monte Carlo/UCT).
○ The UCT value of a move.
● Selecting one of six predefined game settings for convenience, and which AIs will
play each other, in the main menu.
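Liberty counting, the core of the group management above, is naturally a flood fill: walk the connected group of one colour and collect the distinct empty points touching it. A sketch under the report's 0/1/2 board encoding, with illustrative (not actual) names:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

public class Liberties {
    // Count the liberties of the group containing the stone at (row, col).
    // A result of 1 means the group is in atari; 0 means it is captured.
    public static int count(int[][] board, int row, int col) {
        int color = board[row][col];
        int cols = board[0].length;
        Set<Integer> group = new HashSet<>();   // stones already visited
        Set<Integer> libs = new HashSet<>();    // distinct empty neighbours
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{row, col});
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            int r = p[0], c = p[1];
            if (r < 0 || r >= board.length || c < 0 || c >= cols) continue;
            int key = r * cols + c;                       // flatten coordinate
            if (board[r][c] == 0) { libs.add(key); continue; }   // a liberty
            if (board[r][c] != color || !group.add(key)) continue;
            stack.push(new int[]{r - 1, c}); stack.push(new int[]{r + 1, c});
            stack.push(new int[]{r, c - 1}); stack.push(new int[]{r, c + 1});
        }
        return libs.size();
    }
}
```

Using a set for the liberties matters: a single empty point touching two stones of the group must be counted only once.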
5 Results
5.1 Minimax and AlphaBeta
● Table 1. Board size 9x9. Average node count.

  Algorithm (depth)   First Move (node visits)   Total Moves (node visits)
  Minimax (2)         500k                       6m
  AlphaBeta (2)       40k                        600k
  AlphaBeta (4)       4m                         330m

● Table 2. Board size 5x5. Average node count.

  Algorithm (depth)   First Move (node visits)   Total Moves (node visits)
  Minimax (2)         13k                        60k
  Minimax (4)         6m                         16m
  AlphaBeta (2)       3k                         10k
  AlphaBeta (4)       100k                       400k
  AlphaBeta (6)       7m                         20m
AlphaBeta pruning is significantly faster. In games where I want both algorithms to
produce moves at roughly the same pace, AlphaBeta can easily be set to a higher depth.
AlphaBeta also wins more frequently than Minimax, which is impractical to use in a realistic
scenario. A running example can be found in Appendix C.
5.2 Monte Carlo and UCT
Both algorithms are set to the same node limits. From my observations, the opening play
is random enough that neither algorithm has an advantage. There seems to be a threshold node
limit past which UCT wins more in the endgame. For a 9x9 board the threshold appears to be
40k nodes per move, past which UCT generally performs more consistently and better than
Monte Carlo. At lower node counts the outcome appears very random, and UCT may rely on a
poorly tested move that only appears to be winning a lot because not enough games were
simulated. By default the node limit is set to 50k, which results in 1 move per second.
Spectating many games between these algorithms reminds me of how 2 beginners would play. It
is amazing how efficient and realistic these random algorithms are.
5.3 AlphaBeta vs Monte Carlo and UCT
At a small board size such as 5x5, AlphaBeta can compete with the random algorithms
and win, since it can search deep enough to produce good results. However, for 9x9 and
anything larger, AlphaBeta cannot search deep enough in a timely manner, and Monte Carlo and
UCT win very frequently.
6 Difficulties and Future Work
The main problem I encountered during simulations is multi-KO. It happens when there
are multiple single stones that can be captured, and they are captured successively by each
player, resulting in an infinite loop of the same game states. In human play each player would
simply fill in one of his own KOs and the game would continue normally; it was never an issue.
I have tried to generate moves that would fill in multi-KOs, but the only success so far is
detecting that a multi-KO is happening. A full fix would require expanding the link between
move generation and KO states. So far I have been unable to find a solution.
The territory counting mechanism could be improved to be applicable to larger areas.
This would require an influence map generated from the board state. Moves would then not be
played in areas of high influence, and score would be counted for those areas.
The main improvement that would increase the performance of all algorithms is better
move generation. According to the GNU Go [2] documentation, moves are generated based on the
need to attack, defend, expand, board symmetry, etc. Increasing the initial quality of each
candidate move and reducing the total number of moves to be simulated would greatly improve
the performance of any algorithm. Implementing an influence map would also greatly improve
move generation and selection while not being too difficult a task.
7 References
[1] Figure 1 by "361points", "Beyond Joseki".
http://361points.com/articles/20/1/
[2] GNU Go Documentation.
https://www.gnu.org/software/gnugo/gnugo_toc.html
Florin Chelaru, "Artificial Intelligence in Computer Go".
https://www.cs.umd.edu/~florinc/files/2008_06_11_BSc_Thesis.pdf
Atomic Object, "Monte Carlo Tree Search for Game AI".
https://spin.atomicobject.com/2015/12/12/monte-carlo-tree-search-algorithm-game-ai/
Class notes on Intelligent Game Playing.
Personal Go experience. 1 dan.
Appendix A Running the program
The Java application can be run from a command line by typing:
java -jar ProjectGo.jar
Alternatively, a batch file is provided to run it.
The program will greet you with game mode selection, followed by AI selection and
group info display selection. Enter a number, followed by the Enter key, according to the
given options. The game will be played and a lot of information will be printed. When done,
the program will ask if you want to run a new simulation. Refer to Appendix C for screenshots
of the output.
The source code is also clear and well documented.
Appendix B Terminology
● GNU Go: A free Go program developed by the public.
● Pass: A player can pass his turn, which indicates he is ready to end the game. If both
players pass consecutively, the game is over and scoring takes place.
● Stone: A playing piece. Can be black or white.
● Group: A group consists of 1 or more connected stones.
● Liberty: An empty intersection next to a group. Can be thought of as "life" points.
● Atari: A group with only 1 liberty left.
● Suicide: Playing a move which would cause your own group to be captured automatically.
● Captured Group: A group is captured by the opponent and removed from play if it has 0
liberties as a result of a move. Self-capture is suicide and not allowed.
● KO: A rule that prevents moves which would revert the board to its previous state.
● Territory: An area surrounded by stones of 1 color.
● Eye: A small territory. If a group has 2 eyes it can never be captured.
● Score: Determined by adding stones on the board, captured stones, and territory.
● Komi: Bonus points given to white for playing second. Involves a 0.5 fraction to break
ties. Officially the komi on a 19x19 board is 6.5 points.
Appendix C Simulation examples
● Main Menu.
● AB playing each other with depth 4. Moves not shuffled initially (mode 1).
● Monte Carlo is playing vs UCT. 30k nodes per move.