Root Parallelization in Monte Carlo Tree Search: Time vs Threads vs Trees
Erik Steinmetz, Daniel Boley, and Maria Gini
Department of Computer Science and Engineering
University of Minnesota
Statistics
To study performance in terms of winning rates in an adversarial
game, we use the binomial confidence interval
p ± 1.96 sqrt(p(1-p)/n), where p is the observed winning rate and
n is the number of games in the sample. In about 95% of repeated
experiments, an interval computed this way contains the true
winning rate. Detecting an improvement in the software’s
performance therefore requires a large number of games in a
tournament. With winning rates near 50%, for example, a 1000-game
tournament yields a confidence interval of about ±3%.
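The interval arithmetic above can be checked with a short Python sketch. The function names (half_width, games_needed) are illustrative and not from the poster; the sample-size formula is the confidence interval formula solved for n.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def half_width(p, n, z=Z_95):
    """Half-width of the binomial confidence interval p +/- z*sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1.0 - p) / n)


def games_needed(p, margin, z=Z_95):
    """Smallest number of games n whose half-width is at most `margin`."""
    return math.ceil(p * (1.0 - p) * (z / margin) ** 2)


if __name__ == "__main__":
    print(round(half_width(0.5, 1000), 4))  # about 0.031, i.e. roughly +/- 3%
    print(games_needed(0.5, 0.03))          # games needed for a +/- 3% margin
```

For a winning rate of 50%, half_width(0.5, 1000) is about 0.031, matching the ±3% figure quoted above, and a margin of exactly ±3% would take 1068 games.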
Future Directions
• Quantify relationship between problem difficulty and optimal
number of nodes to find a best solution
• Experiment with a different number of nodes depending on the
turn number
Monte Carlo Tree Search (MCTS) is used effectively in many
domains, but getting good results by building larger trees takes
time, which in many cases is impractical or disadvantageous.
Building multiple smaller trees in parallel can improve results
without requiring a longer run-time.
In this work we compare three ways of giving MCTS more resources:
building multiple independent trees (root parallelization), using
multiple threads on a single tree (tree parallelization), and, as
a baseline, simply allowing a longer search time. Our experiments
used the game of Go as the domain and measured results by the win
rates of a parallelized MCTS-based game engine playing in
tournaments against other Go game engines.
The Game of Go
• Go is a non-stochastic perfect-information game played by humans
for over 1000 years.
• Stones are placed on grid intersections
• Win by controlling the most area on the board
• Board sizes are 19x19 (full-size), 9x9, and 13x13
Monte Carlo Tree Search Techniques
Monte Carlo Tree Search (MCTS) techniques were first developed for
playing the game of Go. MCTS can build game trees without a static
evaluation function. MCTS samples subtrees by playing games to the
end, where a winner can be determined, using random moves.
• Select a tree node to expand using win/loss rates
• Expand a node, adding one new node to tree
• Simulate a game to the end with random moves, determine winner
• Propagate the win or loss to all parent nodes
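The four steps listed above can be sketched in Python on a toy game. This is a minimal illustration under stated assumptions, not the Go engines' implementation: the game is a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins), the selection rule is UCB1 (the poster only says selection uses win/loss rates), and all names (Node, best_move, and so on) are hypothetical.

```python
import math
import random


class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones    # stones left in the pile
        self.player = player    # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move        # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.wins = 0           # wins for the player who moved INTO this node
        self.visits = 0


def select(node, c=1.4):
    """Step 1: descend while the node is fully expanded and not terminal."""
    while not node.untried and node.children:
        parent = node
        # UCB1 (an assumption): win rate plus an exploration bonus.
        node = max(parent.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(parent.visits) / ch.visits))
    return node


def expand(node, rng):
    """Step 2: add one new child node (terminal nodes are returned as-is)."""
    if not node.untried:
        return node
    move = node.untried.pop(rng.randrange(len(node.untried)))
    child = Node(node.stones - move, 1 - node.player, parent=node, move=move)
    node.children.append(child)
    return child


def simulate(node, rng):
    """Step 3: play random moves to the end and return the winner."""
    stones, player = node.stones, node.player
    if stones == 0:
        return 1 - player       # the previous player took the last stone
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player


def backpropagate(node, winner):
    """Step 4: record the result in the node and all of its ancestors."""
    while node is not None:
        node.visits += 1
        if node.parent is not None and winner == node.parent.player:
            node.wins += 1
        node = node.parent


def best_move(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player=0)
    for _ in range(iterations):
        leaf = expand(select(root), rng)
        backpropagate(leaf, simulate(leaf, rng))
    # Play the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, best_move(3) returns 3, since taking all three stones wins immediately and that child wins every simulation. Changing the seed gives a different tree for the same position, which is the property root parallelization relies on.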
Overview
This study looks at parallelizing MCTS by constructing multiple
search trees independently of each other and combining the results
once the search is complete. This is known as root parallelization.
The results are combined by giving each tree one vote for the
final move. Because each tree is built using a stochastic sampling
technique, the trees will differ from each other. Combining the end
results of the trees allows a larger sample without the large
overhead of combining results during tree construction (known
as tree parallelization). A third technique, leaf parallelization,
runs only the randomized playouts in parallel and has been found
not to scale well.
Parallelization of MCTS
The Itasca cluster allowed the use of almost 1000 nodes, each with
an 8-core CPU and 24 GB of memory. The software was modified to
construct a single search tree on each available node and to use
OpenMPI to combine the results of each tree into a single move
choice. The tournament-managing software was also able to use
multiple nodes in the cluster to run more than one game at a
time.
At each turn in a game, the software must decide which move to
make. Using root parallelization, each node in the cluster
independently creates a search tree to determine a best move. The
move chosen by the most nodes is considered the overall best
move for the current turn.
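The voting step can be sketched in a few lines of Python, using the tally from the poster's Root-Parallel Player example as input. The function name choose_move and the tie-breaking rule (alphabetically first move among those tied) are assumptions made for illustration; the poster does not say how ties are resolved, and the actual engine gathers the votes over MPI rather than from a local list.

```python
from collections import Counter


def choose_move(node_moves):
    """One tree, one vote: return the most frequently chosen move and the tally."""
    tally = Counter(node_moves)
    top = max(tally.values())
    # Tie-break (assumption): pick the alphabetically first of the tied moves.
    winner = min(move for move, count in tally.items() if count == top)
    return winner, tally


# Votes reconstructed from the poster's example tally (16 nodes in total).
poster_votes = (["C-16"] * 3 + ["C-17"] * 4 + ["D-16"] * 7
                + ["D-17"] + ["E-17"])

winner, tally = choose_move(poster_votes)
print(winner, dict(tally))
```

With these 16 votes the winner is D-16 with 7 votes, which agrees with the move Node 0 chooses in the example.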
Improvements in the winning rates were recorded over an increasing
amount of time, an increasing number of threads, and an increasing
number of nodes. This was done for both the 9x9 game and the more
complex 19x19 game.
Each game in a tournament takes about one hour. The cluster was
used not only to show the effect of an increasing number of nodes
on the win rates of the software, but also to run large
tournaments on the order of 1000 games each. This produced results
with greater statistical significance than previous similar
studies.
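Root-Parallel Player
[Figure: each node reports the move chosen by its own tree (for
example, Node 3: C-17 and Node 4: D-16); Node 0 tallies the votes
and chooses the winner, D-16.]
Tally: C-16: 3, C-17: 4, D-16: 7, D-17: 1, E-17: 1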
[Results figure, two panels: winning rate versus time multiplier /
number of threads / number of nodes (1, 2, 4, 8, 16, 32, 64, 128),
with one curve each for Nodes, Threads, and Time. Panel titles:
Board size 9x9, Fuego vs Pachi; Board size 19x19, Fuego vs Pachi.]