Project GO

COMP 4106

AI Project: Go

Project Report

Vladimir Menshikov (100840927)

May 15, 2016

Submitted to:

Professor: Dr. John Oommen

TA: Dave McKenney

COMP 4106 Artificial Intelligence

School of Computer Science

Carleton University

1 Introduction

Go is an ancient Chinese board game for two players. Players place “stones” on the intersections of an initially empty board of 19x19 lines. Each player puts down one stone per turn and tries to occupy as much territory as possible without being surrounded and captured. There are many strategies, both simple and complex, for increasing one's chances of winning. Go is easy to learn and hard to master; people spend decades improving at the game.

Figure 1: An opening sequence. [1]

There are a few major problems in creating a strong Artificial Intelligence for Go. The main one is the number of possible moves on a 19x19 board: on turn t there are roughly 362 - t choices (361 intersections plus a pass, minus the points already occupied), so computing deep trees takes an extremely long time. There are roughly 10^170 legal board positions. Another problem is evaluating the strength of a move, as there is no clear score at any stage of the game until the very end. While the stones are static and generally are not removed during the game, their influence and potential are very dynamic. A move can look very bad initially yet turn out to be part of the winning position, while another move can give a short-term advantage that leads to a long-term loss.
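To make this growth concrete, the sketch below multiplies the approximate number of choices over a few plies, using the roughly 362 - (turn number) estimate and ignoring captures, which free points again. `TreeSize.leafCount` is a hypothetical helper for illustration, not part of the project's code.

```java
final class TreeSize {
    // Approximate number of move sequences 'depth' plies deep when
    // 'movesAvailable' choices exist at the first ply and each ply
    // occupies one more point (captures and repeated passes are ignored).
    static long leafCount(int movesAvailable, int depth) {
        long total = 1;
        for (int i = 0; i < depth; i++) {
            total *= movesAvailable - i; // one fewer empty point per ply
        }
        return total;
    }
}
```

On an empty 19x19 board, leafCount(362, 2) is 130,682 and leafCount(362, 4) is 16,889,341,680, about 1.7 × 10^10 sequences after only four plies. A 9x9 board (81 points plus a pass) gives leafCount(82, 2) = 6,642.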

A player may also “pass” instead of playing a move. Estimating the score the way a human player does is difficult, because the AI cannot “keep in mind” areas that will definitely survive until the end of the game.

2 Motivation

I am very interested in Go and Go AI. I have been playing Go for over 10 years, sometimes competitively and sometimes casually, and I have played against many AI opponents, which are my favorite. I have seen many interesting plays from weaker and stronger AIs that humans would not generally make. Beginner human opponents quickly learned from their mistakes and improved as the game progressed, while an AI opponent would keep making the same mistakes; this led to exploitable wins and also showed me what not to play. When I tried to research how these programs worked before starting to study computer science and artificial intelligence, I never really understood it. With my new knowledge of the topic, I decided to try to implement an AI opponent for the game I enjoy so much.

3 AI Techniques

I have implemented four heuristics: two main types of AI search, each with an improved version. These are Minimax and its improvement AlphaBeta, and Monte Carlo and its improvement UCT (Upper Confidence Bound 1 applied to Trees).

Minimax searches the game’s move tree depth first, playing moves alternately for each player. When the end of the game or a given depth limit is reached, the game state is evaluated and a value is returned. At each node on a branch, if it is a minimizing node (the opponent's move), the returned value is that of its smallest child; similarly, if it is a maximizing node (the player's move), the returned value is that of its largest child. The move finally selected at the root is therefore the one with the maximum of these minimums, applied recursively down the tree.
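A minimal sketch of the recursion described above, written over a hypothetical explicit tree (`Node`) rather than real Go positions. Leaf values stand in for the board evaluation; this is an illustration, not the project's actual code.

```java
import java.util.List;

// Hypothetical explicit game tree: a leaf stores its evaluation
// (player's score minus opponent's score); an inner node stores the
// positions reachable in one move.
final class Node {
    final int value;
    final List<Node> children;

    Node(int value) {             // leaf
        this.value = value;
        this.children = List.of();
    }

    Node(Node... children) {      // inner node
        this.value = 0;
        this.children = List.of(children);
    }
}

final class Minimax {
    // Depth-limited minimax: maximizing levels take the largest child value,
    // minimizing levels the smallest. When the depth limit is hit at an inner
    // node, its stored value stands in for a real board evaluation.
    static int minimax(Node node, int depth, boolean maximizing) {
        if (depth == 0 || node.children.isEmpty()) {
            return node.value;
        }
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Node child : node.children) {
            int v = minimax(child, depth - 1, !maximizing);
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }
}
```

For a root with two replies whose leaves are {3, 5} and {2, 9}, minimax(root, 2, true) returns 3: the opponent would answer the first move with 3 and the second with 2, and the player takes the larger of those minimums.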

AlphaBeta pruning is an improved version of Minimax. It adds two variables that are passed down with the recursive calls: alpha (initially negative infinity) and beta (initially positive infinity). At a maximizing node, alpha grows toward the maximum value found so far (of the recursive minimums); at a minimizing node, beta shrinks toward the minimum value found so far (of the recursive maximums). The two values gradually approach each other from either side. If at some node alpha reaches or exceeds beta, the rest of that branch is pruned and not evaluated. In other words, if the opponent can already hold you to a value lower than one you are guaranteed elsewhere, there is no need to explore that path any further, since it cannot improve your score. In practice this is significantly faster than Minimax, while the worst-case runtime stays the same.
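The same search with alpha-beta cutoffs can be sketched as follows, again over a hypothetical explicit `Node` tree; the static `visited` counter exists only to show the saving. The cutoff uses alpha >= beta, the usual textbook condition. This is an illustration, not the project's code.

```java
import java.util.List;

// Hypothetical explicit game tree: leaves hold evaluations,
// inner nodes hold the replies reachable in one move.
final class Node {
    final int value;
    final List<Node> children;

    Node(int value) { this.value = value; this.children = List.of(); }
    Node(Node... children) { this.value = 0; this.children = List.of(children); }
}

final class AlphaBeta {
    static int visited = 0; // nodes visited; only here to show the effect of pruning

    // alpha: value the maximizing player is already guaranteed;
    // beta: value the minimizing player is already guaranteed.
    static int search(Node node, int depth, int alpha, int beta, boolean maximizing) {
        visited++;
        if (depth == 0 || node.children.isEmpty()) {
            return node.value;
        }
        if (maximizing) {
            int best = Integer.MIN_VALUE;
            for (Node child : node.children) {
                best = Math.max(best, search(child, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, best); // alpha only grows
                if (alpha >= beta) break;      // the opponent will avoid this branch
            }
            return best;
        } else {
            int best = Integer.MAX_VALUE;
            for (Node child : node.children) {
                best = Math.min(best, search(child, depth - 1, alpha, beta, true));
                beta = Math.min(beta, best);   // beta only shrinks
                if (alpha >= beta) break;      // the player already has something better
            }
            return best;
        }
    }
}
```

On a tree whose root has two replies with leaves {3, 5} and {2, 9}, this returns 3, the same value plain Minimax gives, but visits 6 nodes instead of 7: once the second reply's first leaf (2) is seen, that branch can never beat the 3 already guaranteed, so the leaf 9 is skipped.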

Monte Carlo search is based on simulating random games. The game tree is first explored to some depth, and from there random games are simulated for all nodes until each game is over. At the final state of a simulation, a value is determined from the result: 1 for a win and 0 for a loss. This value is backpropagated: the win rate of the simulated node is updated, then that of its parents, recursively up to the root. The move with the highest win rate is then selected to be played.
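The win-rate bookkeeping can be sketched with a hypothetical `McNode` that stores wins, visits and a parent link. Results are recorded from the root player's point of view, which is enough when only the root's children are simulated; a deeper tree search would usually record each node's wins for the player who made that move.

```java
// Hypothetical search-tree node holding Monte Carlo statistics.
final class McNode {
    final McNode parent; // null for the root
    int wins;            // simulations through this node that ended in a win
    int visits;          // simulations through this node

    McNode(McNode parent) {
        this.parent = parent;
    }

    double winRate() {
        return visits == 0 ? 0.0 : (double) wins / visits;
    }

    // Back-propagation: a finished simulation scores 1 for a win and 0 for a
    // loss, and the result is added to the simulated node and every ancestor
    // up to the root.
    static void backpropagate(McNode simulated, boolean won) {
        for (McNode n = simulated; n != null; n = n.parent) {
            n.visits++;
            if (won) {
                n.wins++;
            }
        }
    }
}
```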

UCT is an improved version of Monte Carlo that balances exploration and exploitation of nodes. Instead of simulating nodes completely at random, it selects the node to explore and expand based on its UCT value. The formula for the UCT value is:

v = winrate + sqrt( ln(parent.visits) / visits )

where winrate is the node's wins divided by its visits. Moves with a higher win rate are therefore picked for simulation more often, while moves with a lower win rate are still selected periodically. If simulations of a lower-win-rate move start winning, its priority may rise enough for it to be expanded and explored further. For a selected node, any child that has not yet been simulated is simulated first, bypassing the UCT value check.
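The UCT value and the "unsimulated children first" rule fit in a small selection function. This is a sketch over hypothetical parallel arrays of per-child statistics; it uses the formula exactly as given above, i.e. an exploration constant of 1 (the original UCB1 formula uses sqrt(2 ln N / n) instead).

```java
final class Uct {
    // UCT value as given in the text: winrate + sqrt(ln(parentVisits) / visits).
    static double value(int wins, int visits, int parentVisits) {
        double winRate = (double) wins / visits;
        return winRate + Math.sqrt(Math.log(parentVisits) / visits);
    }

    // Index of the child to simulate next; -1 if there are no children.
    // A child that has never been simulated is picked immediately,
    // bypassing the UCT value check.
    static int select(int[] wins, int[] visits, int parentVisits) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < visits.length; i++) {
            if (visits[i] == 0) {
                return i;
            }
            double v = value(wins[i], visits[i], parentVisits);
            if (v > bestValue) {
                bestValue = v;
                best = i;
            }
        }
        return best;
    }
}
```

With wins {6, 1}, visits {8, 2} and 10 parent visits, the first child has the higher win rate (0.75 against 0.5), but its UCT value is about 1.29 against about 1.57 for the second, so the less-explored move is simulated next.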

4 Design and Implementation

This section describes how I adapted each heuristic to Go, how the game is represented, and which features were implemented. The board size used for most testing is 9x9, a standard smaller alternative to the 19x19 board. For the Go terminology used below, refer to Appendix B at the end of this report; for snippets of AI games, refer to Appendix C.

4.1 Heuristic Implementation

Minimax is implemented with a depth limit and a growing maximum node-visit limit. Because it is very slow on a Go board, the search should not exceed 2 ply. This is the default limit, and it can be configured in the program based on board size. The node-visit limit was added for the same reason: exploring all moves during the opening can take several minutes, and the limit makes the AI play sooner. Each time the limit is reached, it grows for the next move, since a search that keeps being cut off effectively plays at a search depth of 0 and produces very bad moves. This allows a stronger mid- and endgame without sacrificing the early game too much.
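The growing node-visit limit can be sketched as a small counter object. The report does not say by how much the limit grows, so the doubling below is an assumption, and `NodeBudget` is a hypothetical name.

```java
final class NodeBudget {
    private int limit;          // current maximum node visits per move
    private int used;           // node visits spent on the current move
    private boolean exhausted;  // whether the limit was hit during this move

    NodeBudget(int initialLimit) {
        this.limit = initialLimit;
    }

    // Call once per node visit; returns false when the search must stop.
    boolean tryVisit() {
        if (used >= limit) {
            exhausted = true;
            return false;
        }
        used++;
        return true;
    }

    // Call after each move: if the limit was hit, raise it for the next move.
    void endMove() {
        if (exhausted) {
            limit *= 2; // assumed growth factor; not specified in the report
        }
        used = 0;
        exhausted = false;
    }

    int limit() {
        return limit;
    }
}
```

A search loop would call tryVisit() before expanding each node and stop when it returns false; after the move is played, endMove() prepares the (possibly larger) budget for the next one.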

AlphaBeta pruning is also implemented with a depth limit and a growing maximum node-visit limit. Its search depth is generally set 2 ply higher than Minimax's, since the search is significantly faster. The node-visit limit is implemented but is rarely reached.

Monte Carlo search begins simulating from the children of the root and does not explore further down, because the number of moves is already large. It has a fixed (non-growing) node-visit limit that resets on every call. Every node is simulated at least once regardless of the limit; once every node has been simulated, another simulation pass is started if the limit has not yet been reached. This leads to a significantly stronger endgame, with a fairly random opening.
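The Monte Carlo loop just described can be sketched as follows: every child of the root is simulated once per pass, and passes continue until the node limit has been spent, so each child gets at least one simulation even with a tiny limit. The random playout is replaced by a hypothetical stand-in in which move i wins with a fixed probability; that is enough to exercise the statistics but is not a Go simulation.

```java
import java.util.Random;
import java.util.function.IntPredicate;

final class FlatMonteCarlo {
    // Returns the index of the root child with the highest win rate, or -1
    // if there are no children. Children are simulated in full passes until
    // 'nodeLimit' simulations have been spent; do/while guarantees one pass.
    static int bestMove(int childCount, int nodeLimit, IntPredicate playout) {
        if (childCount == 0) {
            return -1;
        }
        int[] wins = new int[childCount];
        int[] visits = new int[childCount];
        int spent = 0;
        do {
            for (int child = 0; child < childCount; child++) {
                if (playout.test(child)) {
                    wins[child]++; // win = 1, loss = 0
                }
                visits[child]++;
                spent++;
            }
        } while (spent < nodeLimit);

        int best = 0;
        for (int child = 1; child < childCount; child++) {
            // wins[child] / visits[child] > wins[best] / visits[best], without division
            if ((long) wins[child] * visits[best] > (long) wins[best] * visits[child]) {
                best = child;
            }
        }
        return best;
    }

    // Hypothetical stand-in for "play this move, then random moves to the end":
    // move i is reported as a win with probability winProbability[i].
    static IntPredicate standInPlayout(double[] winProbability, long seed) {
        Random rng = new Random(seed);
        return move -> rng.nextDouble() < winProbability[move];
    }
}
```

With stand-in win probabilities {0.2, 0.5, 0.8} and a limit of 3,000 simulations, each move is simulated 1,000 times, which in practice selects the third move.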

UCT is implemented similarly to Monte Carlo, and both share the random game simulation function. The node-visit limit, however, is handled differently: the search terminates as soon as the limit is reached and returns the highest-win-rate move found so far. During the opening, not all of the root's child moves may have been simulated by then, so higher-winning nodes are not expanded; there are already too many moves to simulate.

For all searches, the depth and maximum node-visit values for each board size were selected manually so that the game progresses in a timely manner. Generally, a move should be made within a few seconds.

4.2 Game Representation

Representing the state of a Go game for an AI is very complex compared to how simple the game state looks from a human perspective. A good game-state representation and efficient move generation are needed for fast searches.

4.2.1 Basic Features

● Board: Represented as a 2D array of numbers: 0 for empty, 1 for black, 2 for white.

● Move: Represented as a 2D coordinate on the board, together with a color.

Move and Board are the most important objects of the game, and they are passed together to many functions. Everything is done on the board with respect to the move, for example finding neighboring allies or enemies, finding eyes, and counting liberties.

● End of a game: The game ends when black and white make successive “pass” moves, which skip their turn. A pass move has the color -1.

● Move generation: A played move must be legal. According to the GNU Go [2] documentation, a legal move is not a suicide. A move is a suicide when all of the following hold:

1. There is no neighboring empty intersection.

2. There is no neighboring enemy group with exactly 1 liberty (which the move would capture).

3. There is no neighboring ally group with more than 1 liberty.

A move must also not violate the KO rule.

The legal move function is used to verify every move first, including moves entered by the user.

● KO: A KO is defined as the capture of a single stone. The point where the stone was captured cannot be played by the opponent on the next turn, which prevents them from immediately capturing back.

● Capturing stones: If a group of stones has 0 liberties, it is captured and removed from the game.

● Score: Scoring is based on the Chinese ruleset, in which each stone on the board also counts as a point. Each captured stone gives a point, and each point of territory gives a point. White receives half a point of komi to make up for the disadvantage of playing second, so a game that would otherwise be a draw is won by white.

● Player input: A human player can be selected to play.

● Move evaluation: The value of a game state is the player's score minus the opponent's score.

● Territory and eye detection: In my implementation, territory is defined as an eye: an empty point surrounded on all 4 sides by stones of the same color. Board edges are accounted for, so an edge eye needs only 3 stones and a corner eye only 2.
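The checks in the list above can be sketched directly on the int[][] board (0 empty, 1 black, 2 white). This is a hypothetical illustration rather than the project's code: `GoRules` and its methods are made-up names, `isSuicide` assumes the target point is empty, captured-stone counts are assumed to be tracked elsewhere, and the 0.5 komi is left out because it only matters when deciding the winner.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Rule checks on a square board: 0 empty, 1 black, 2 white.
final class GoRules {
    static final int EMPTY = 0, BLACK = 1, WHITE = 2;
    private static final int[][] DIRS = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

    static boolean onBoard(int[][] b, int r, int c) {
        return r >= 0 && r < b.length && c >= 0 && c < b.length;
    }

    // Flood-fills the group containing (r, c); returns its number of distinct liberties.
    static int liberties(int[][] b, int r, int c) {
        int color = b[r][c], n = b.length;
        Set<Integer> seen = new HashSet<>(), libs = new HashSet<>();
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{r, c});
        seen.add(r * n + c);
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            for (int[] d : DIRS) {
                int nr = p[0] + d[0], nc = p[1] + d[1];
                if (!onBoard(b, nr, nc)) continue;
                if (b[nr][nc] == EMPTY) libs.add(nr * n + nc);
                else if (b[nr][nc] == color && seen.add(nr * n + nc)) stack.push(new int[]{nr, nc});
            }
        }
        return libs.size();
    }

    // Suicide: no empty neighbor (1), no capture of an enemy group with
    // exactly 1 liberty (2), no friendly group with a spare liberty (3).
    static boolean isSuicide(int[][] b, int r, int c, int color) {
        for (int[] d : DIRS) {
            int nr = r + d[0], nc = c + d[1];
            if (!onBoard(b, nr, nc)) continue;
            int v = b[nr][nc];
            if (v == EMPTY) return false;                               // condition 1
            if (v != color && liberties(b, nr, nc) == 1) return false;  // condition 2
            if (v == color && liberties(b, nr, nc) > 1) return false;   // condition 3
        }
        return true;
    }

    // Eye: an empty point whose on-board neighbors are all 'color'
    // (4 in the middle, 3 on an edge, 2 in a corner).
    static boolean isEye(int[][] b, int r, int c, int color) {
        if (b[r][c] != EMPTY) return false;
        for (int[] d : DIRS) {
            int nr = r + d[0], nc = c + d[1];
            if (onBoard(b, nr, nc) && b[nr][nc] != color) return false;
        }
        return true;
    }

    // Score: stones on the board + captured stones + eye points.
    static int score(int[][] b, int color, int captured) {
        int total = captured;
        for (int r = 0; r < b.length; r++)
            for (int c = 0; c < b.length; c++)
                if (b[r][c] == color || isEye(b, r, c, color)) total++;
        return total;
    }

    // Evaluation: the player's score minus the opponent's score.
    static int evaluate(int[][] b, int player, int playerCaptures, int opponentCaptures) {
        int opponent = player == BLACK ? WHITE : BLACK;
        return score(b, player, playerCaptures) - score(b, opponent, opponentCaptures);
    }
}
```

For example, with black stones at (0,1) and (1,0) on a 3x3 board, the corner (0,0) is a black eye, a white move there is suicide, and evaluate(board, BLACK, 0, 0) is 3 (two stones plus one eye point).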

4.2.2 Extra features

● Efficient move: A function that performs further checks during move generation and prevents moves that would be considered bad, such as filling one's own eyes, which eventually leads to suicide.

● Group management

○ Liberty counting: A group can find all of its liberties, determining their locations and total number.

■ Atari detection: If a group has 1 liberty, it is in atari.

○ Group size.

○ Group merging: If a stone is played next to an existing group of the same color, it is merged into that group.

■ If a stone connects multiple groups, they are all merged into one.

○ An option to print all group information every turn can be selected in the main menu.

● Node counting for the AIs.

● Notification when an AI finds a better move (Minimax/AB).

● Printed information on the win rate of a move (Monte Carlo/UCT).

○ UCT value of a move.

● Selection, in the main menu, of one of six predefined game settings for convenience, and of which AIs will play each other.
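For group merging, one common technique is a union-find (disjoint-set) structure over board points, sketched below. The report does not say how its groups are stored, so this is an assumed design, not the project's implementation. It covers placing and merging only: union-find cannot split groups, so captures would need extra work (for example, rebuilding the structure from the board).

```java
// Union-find over board points; each set is one group of connected stones.
final class Groups {
    private final int size;
    private final int[][] board;    // 0 empty, 1 black, 2 white
    private final int[] parent;     // parent[p] == p for a group's root
    private final int[] groupSize;  // valid at root points only

    Groups(int size) {
        this.size = size;
        this.board = new int[size][size];
        this.parent = new int[size * size];
        this.groupSize = new int[size * size];
        for (int i = 0; i < parent.length; i++) parent[i] = i;
    }

    int find(int p) {
        while (parent[p] != p) {
            parent[p] = parent[parent[p]]; // path halving
            p = parent[p];
        }
        return p;
    }

    private void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (groupSize[ra] < groupSize[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;                 // attach the smaller group under the larger
        groupSize[ra] += groupSize[rb];
    }

    // Places a stone on an empty point and merges it with every same-colored
    // neighboring group, so a stone touching several groups joins them all.
    void place(int r, int c, int color) {
        board[r][c] = color;
        int p = r * size + c;
        groupSize[p] = 1;
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        for (int[] d : dirs) {
            int nr = r + d[0], nc = c + d[1];
            if (nr >= 0 && nr < size && nc >= 0 && nc < size && board[nr][nc] == color) {
                union(p, nr * size + nc);
            }
        }
    }

    int groupSizeAt(int r, int c) {
        return groupSize[find(r * size + c)];
    }

    boolean sameGroup(int r1, int c1, int r2, int c2) {
        return find(r1 * size + c1) == find(r2 * size + c2);
    }
}
```

Placing black stones at (0,0) and (0,2) creates two groups; a third black stone at (0,1) connects them into a single group of size 3.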

5 Results

5.1 Minimax and AlphaBeta

● Table 1. Board size 9x9. Average number of node visits (k = thousand, M = million).

| Algorithm (depth) | First move | All moves (total) |
|---|---|---|
| Minimax (2) | 500k | 6M |
| AlphaBeta (2) | 40k | 600k |
| AlphaBeta (4) | 4M | 330M |

● Table 2. Board size 5x5. Average number of node visits (k = thousand, M = million).

| Algorithm (depth) | First move | All moves (total) |
|---|---|---|
| Minimax (2) | 13k | 60k |
| Minimax (4) | 6M | 16M |
| AlphaBeta (6) | 7M | 20M |

AlphaBeta pruning is significantly faster: at depth 2 on a 9x9 board it visits roughly a tenth as many nodes as Minimax (Table 1). In games where both algorithms should produce moves at roughly the same pace, AB can easily be set to a higher depth. AB also wins more frequently than Minimax. Minimax is impractical to use in a realistic scenario. A running example can be found in Appendix C.

5.2 Monte Carlo and UCT

Both algorithms are set to the same node limits. From my observations, the opening play is random enough that neither algorithm has an advantage. There seems to be a threshold node limit past which UCT wins more in the endgame. For a 9x9 board, the threshold appears to be 40k nodes per move, beyond which UCT generally performs more consistently and better than Monte Carlo. At lower node counts the outcome appears to be very random, and UCT may rely on a poorly tested move that appears to win a lot only because too few games were simulated. By default the node limit is set to 50k, which results in about 1 move per second. Watching many games between these algorithms reminds me of how two beginners would play. It is amazing how efficient and realistic these random algorithms are.

5.3 AlphaBeta vs Monte Carlo and UCT

On a small board such as 5x5, AlphaBeta can compete with the random algorithms and win, since it can search deep enough to provide good results. However, on 9x9 and larger boards AlphaBeta cannot search deep enough in a timely manner, and Monte Carlo and UCT win very frequently.

6 Difficulties and Future Work

The main problem I encountered during simulations is MultiKO. It happens when there are several single stones that can be captured (several KOs) and they are captured in turn by each player, resulting in an infinite loop through the same game states. In human play, one player would simply fill in one of the KOs and the game would continue normally, so it was never an issue. I have tried to generate moves that would fill in multiKOs, but so far I have only succeeded in detecting that a multiKO is happening. A fix would require extending the link between move generation and KO states; so far I have not been able to find a solution.
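Detecting the loop, which the report says already works, can be done by remembering every whole-board position seen so far; this is the idea behind the superko rule used in some rulesets. The sketch below is not the project's code: `RepetitionDetector` is a hypothetical name, and it stores exact position strings for simplicity (a Zobrist hash would be cheaper).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Detects a repeated whole-board position, which is how a multi-KO loop shows up.
final class RepetitionDetector {
    private final Set<String> seen = new HashSet<>();

    // Records the position after a stone is played; returns true if the
    // same position has occurred before in this game.
    boolean recordAndCheck(int[][] board) {
        String key = Arrays.deepToString(board); // exact, but slower than a hash
        return !seen.add(key);
    }
}
```

It should be called only after moves that place a stone, since a pass leaves the board unchanged and would otherwise be reported as a repetition.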

The territory counting mechanism could be improved so that it applies to larger areas. This would require an influence map generated from the board state: moves would not be played in areas of high influence, and those areas would be counted toward the score.
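A very simple influence map could look like the sketch below. The linear falloff with Manhattan distance, the radius and the threshold are all assumptions for illustration; the report does not specify how its influence map would be built, and `Influence` is a hypothetical name.

```java
// Hypothetical influence map: each stone adds influence to every point within
// Manhattan distance 'radius', decreasing linearly with distance.
// Black (1) contributes positive values, white (2) negative values.
final class Influence {
    static int[][] map(int[][] board, int radius) {
        int n = board.length;
        int[][] inf = new int[n][n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++) {
                if (board[r][c] == 0) continue;
                int sign = board[r][c] == 1 ? 1 : -1;
                for (int r2 = 0; r2 < n; r2++)
                    for (int c2 = 0; c2 < n; c2++) {
                        int dist = Math.abs(r - r2) + Math.abs(c - c2);
                        if (dist <= radius) inf[r2][c2] += sign * (radius + 1 - dist);
                    }
            }
        return inf;
    }

    // Counts empty points whose influence favors 'color' by at least 'threshold'.
    static int territory(int[][] board, int[][] inf, int color, int threshold) {
        int count = 0;
        for (int r = 0; r < board.length; r++)
            for (int c = 0; c < board.length; c++) {
                int favor = color == 1 ? inf[r][c] : -inf[r][c];
                if (board[r][c] == 0 && favor >= threshold) count++;
            }
        return count;
    }
}
```

With a black stone at (0,0) and a white stone at (2,2) on a 3x3 board and a radius of 2, each side controls 2 empty points at threshold 1, while the center point is neutral (influence 0).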

The main improvement that would increase the performance of all algorithms is better move generation. According to the GNU Go [2] documentation, moves are generated based on the need to attack, defend, expand, board symmetry, and so on. Improving the initial quality of each candidate move and reducing the total number of moves to be simulated would greatly improve the performance of any algorithm. Implementing an influence map would also greatly improve move generation and selection, while not being too difficult a task.
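Board symmetry, one of the factors listed for GNU Go above, gives a small example of reducing the candidate moves: on an empty board, moves that are rotations or reflections of each other are equivalent, so only one representative per class needs to be searched. This sketch is an illustration only, not taken from GNU Go or the project; `Symmetry` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Keeps one representative move per symmetry class on an n x n board,
// valid only while the position itself is symmetric (e.g. an empty board).
final class Symmetry {
    static List<int[]> canonicalMoves(int n) {
        List<int[]> moves = new ArrayList<>();
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                if (isCanonical(n, r, c)) moves.add(new int[]{r, c});
        return moves;
    }

    // A point is canonical if it is the lexicographically smallest of its
    // 8 images under the board's rotations and reflections.
    static boolean isCanonical(int n, int r, int c) {
        int m = n - 1;
        int[][] images = {
            {r, c}, {c, m - r}, {m - r, m - c}, {m - c, r},  // rotations
            {r, m - c}, {m - r, c}, {c, r}, {m - c, m - r}   // reflections
        };
        for (int[] p : images)
            if (p[0] < r || (p[0] == r && p[1] < c)) return false;
        return true;
    }
}
```

This leaves 15 candidate points on an empty 9x9 board instead of 81, and 55 instead of 361 on 19x19. The reduction only applies while the position is symmetric, which is typically just the very start of the game.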

7 References

[1] Figure 1 by “361points”, “Beyond Joseki”

http://361points.com/articles/20/1/

[2] GNU Go Documentation

https://www.gnu.org/software/gnugo/gnugo_toc.html

[3] Florin Chelaru, “Artificial Intelligence in Computer Go”

https://www.cs.umd.edu/~florinc/files/2008_06_11_BSc_Thesis.pdf

[4] Atomic Object, “Monte Carlo Tree Search for Game AI”

https://spin.atomicobject.com/2015/12/12/monte-carlo-tree-search-algorithm-game-ai/

[5] Class notes on Intelligent Game Playing.

[6] Personal Go experience (1 dan).

Appendix A Running the program

The Java application can be run from a command line by typing:

java -jar ProjectGo.jar

Alternatively, a batch file is provided to run it.

The program will greet you with a game mode selection, followed by AI selection and group info display selection. Type a number from the given options and press Enter. The game will then be played while printing a lot of information. When it is done, it will ask whether you want to run a new simulation. Refer to Appendix C for screenshots of the output.

The source code is also clear and well documented.

Appendix B Terminology

● GNU Go: A free Go program, developed openly as part of the GNU Project.

● Pass: A player can pass their turn, which indicates they are ready to end the game. If both players pass consecutively, the game is over and the score is counted.

● Stone: A playing piece. Can be black or white.

● Group: A group consists of 1 or more connected stones of the same color.

● Liberty: An empty intersection adjacent to a stone or group. Can be thought of as “life” points.


● Atari: A group that has only 1 liberty left.

● Suicide: Playing a move that would immediately leave your own stones captured.

● Captured group: A group that has 0 liberties as a result of a move is captured by the opponent and removed from play. Capturing one's own group is suicide and is not allowed.

● KO: A rule that prevents moves which would revert the board to its previous state.

● Territory: An area surrounded by stones of one color.

● Eye: A small territory. A group with 2 eyes can never be captured.

● Score: Determined by adding up stones on the board, captured stones and territory.

● Komi: Bonus points given to white for playing second. It includes a 0.5 fraction to break ties. On a 19x19 board the komi is 6.5 points under Japanese and Korean rules, and 7.5 points under Chinese rules.

Appendix C Simulation examples

● Main menu.

● AlphaBeta playing against itself at depth 4, with moves not shuffled initially (mode 1).

● Monte Carlo playing against UCT, 30k nodes per move.
