
Root Parallelization in Monte Carlo Tree Search: Time vs Threads vs Trees
Erik Steinmetz, Daniel Boley, and Maria Gini
Department of Computer Science and Engineering
University of Minnesota
Statistics
To study performance in terms of winning rates in an adversarial game, we use the binomial confidence interval p ± 1.96 sqrt(p(1-p)/n), where p is the observed proportion of wins and n is the number of games in the sample. If the experiment were repeated many times, about 95% of the intervals computed this way would contain the true winning rate. Detecting an improvement in the software's performance therefore requires a large number of games in a tournament. With winning rates near 50%, for example, a 1000-game tournament yields a confidence interval of about ± 3%.
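The interval above can be checked numerically. The following is a minimal sketch using the same normal-approximation formula; the function names `win_rate_interval` and `games_needed` are illustrative and do not come from the poster.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Return (low, high) for the interval p +/- z * sqrt(p(1-p)/n)."""
    p = wins / games                      # observed proportion of wins
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width

def games_needed(margin, p=0.5, z=1.96):
    """Smallest number of games whose half-width is at most `margin`."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

# A 1000-game tournament with a 50% win rate.
low, high = win_rate_interval(wins=500, games=1000)
print(f"{low:.3f} .. {high:.3f}")         # prints "0.469 .. 0.531"
```

Under these assumptions the half-width at 1000 games is about 3.1 percentage points, consistent with the ± 3% quoted above. Calling `games_needed(0.03)` gives the number of games at which the half-width drops to exactly 3 points or below.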
Future Directions
• Quantify the relationship between problem difficulty and the optimal number of nodes to find a best solution
• Experiment with a different number of nodes depending on the turn number
Monte Carlo Tree Search (MCTS) is used effectively in many domains, but obtaining better results by building larger trees takes time, which in many cases is impractical or disadvantageous. Building multiple smaller trees in parallel can improve results without requiring a longer run-time.
In this work we compare three ways of giving MCTS more computation: building multiple independent trees in parallel (root parallelization), building one tree with multiple threads (tree parallelization), and, as a baseline, simply running for a longer time. Our experiments used the game of Go as the domain and measured results as the win rates of a parallelized MCTS-based game engine playing tournaments against other Go game engines.
The Game of Go
• Go is a non-stochastic perfect-information game played by humans for over 1000 years.
• Stones are placed on grid intersections
• Win by controlling the most area on the board
• Board sizes are 19x19 (full-size), 13x13, and 9x9
Monte Carlo Tree Search Techniques
Monte Carlo Tree Search (MCTS) techniques were first developed for playing the game of Go. MCTS can build game trees without a static evaluation function. MCTS samples subtrees by playing games to the end, where a winner can be determined, using random moves.
• Select a tree node to expand using win/loss rates
• Expand a node, adding one new node to tree
• Simulate a game to the end with random moves, determine winner
• Propagate the win or loss to all parent nodes
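The four steps above can be sketched on a toy game. This is an illustrative sketch only, not the Go engine's implementation: it assumes a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins) in place of Go, and it assumes UCB1 as the selection rule, which the poster describes only as "using win/loss rates". The names `Node`, `ucb1`, and `mcts` are hypothetical.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player about to move
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.wins = 0.0           # wins for the player who made `move`
        self.visits = 0

def ucb1(child, parent_visits, c=1.4):
    # Win rate plus an exploration bonus (assumed selection rule).
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.wins / child.visits + bonus

def mcts(root_stones, iterations, rng):
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Select: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expand: add one new node to the tree.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulate: play random moves to the end and determine the winner.
        stones = node.stones
        mover_wins = True         # with no stones left, the mover took the last one
        mover_to_play = False     # the opponent of the mover plays next
        while stones > 0:
            stones -= rng.randint(1, min(3, stones))
            mover_wins = mover_to_play
            mover_to_play = not mover_to_play
        # 4. Propagate: update every ancestor, flipping perspective each level.
        result = 1.0 if mover_wins else 0.0
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    # Report the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move

best = mcts(5, 5000, random.Random(0))
print("take", best)
```

In this Nim variant the winning move from 5 stones is to take 1, leaving the opponent a multiple of 4, so the search should settle on that move given enough iterations. Passing an explicit `random.Random` instance keeps each search independent, which matters when several trees are built side by side.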

Overview
This study looks at parallelizing MCTS by constructing multiple search trees independently of each other and combining the results once the search is complete. This is known as root parallelization. The results are combined by giving each tree one vote for the final move. Because each tree is built using a stochastic sampling technique, the trees differ from each other. Combining the end results of the trees allows a larger sample without the large overhead of combining results during tree construction, as happens when multiple threads build one tree (known as tree parallelization). A third technique, leaf parallelization, runs only the randomized playouts in parallel and has been found not to scale well.
Parallelization of MCTS
The Itasca cluster allowed the use of almost 1000 nodes, each with an 8-core CPU and 24 GB of memory. The software was modified to construct a single search tree on each node available, and utilized the OpenMPI interface to combine the results of each tree into a single move choice. The tournament managing software was also able to use multiple nodes in the cluster to run more than one game at a time.
At each turn in a game, the software must decide which move to make. Using root parallelization, each node in the cluster independently creates a search tree to determine a best move. The move chosen by the most nodes is considered the overall best move for the current turn.
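The voting step can be sketched as a simple tally. This is a minimal illustration, not the engine's code: the function name `choose_move` is hypothetical, the vote counts are taken from the poster's Root-Parallel Player example, and the tie-break rule (first move seen wins) is an assumption because the poster does not state one.

```python
from collections import Counter

def choose_move(votes):
    """Return the move named by the most nodes, plus the full tally."""
    tally = Counter(votes)
    # most_common orders by count; equal counts keep first-seen order,
    # which serves here as an assumed tie-break rule.
    move, _count = tally.most_common(1)[0]
    return move, tally

# One vote per node, using the counts from the poster's example.
votes = ["C-16"] * 3 + ["C-17"] * 4 + ["D-16"] * 7 + ["D-17"] + ["E-17"]
winner, tally = choose_move(votes)
print(winner, dict(tally))
```

With these 16 votes, D-16 wins with 7. In the cluster setting the list of votes would be gathered from the nodes over MPI before the tally is taken; that communication step is left out of this sketch.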
Improvements in the winning rates were recorded over an increasing amount of time, an increasing number of threads, and an increasing number of nodes available. This was done in the domains of the 9x9 game and the more complex 19x19 game.
Each game in a tournament consumes about one hour of time. The cluster was used not only to show the effects of an increasing number of nodes on the win rates of the software, but also allowed us to run large tournaments on the order of 1000 games each. This created results with greater statistical significance than previous similar studies.
Root-Parallel Player
[Figure: example of one turn. Individual choices shown include Node 0: D-16, Node 3: C-17, Node 4: D-16. Tally: C-16: 3, C-17: 4, D-16: 7, D-17: 1, E-17: 1. Tally winner: D-16.]
Results
[Figure: two plots of winning rate (%) against time multiplier / number of threads / number of nodes (1, 2, 4, 8, 16, 32, 64, 128), each with three curves: Nodes, Threads, and Time. Panels: board size 9x9, Fuego vs Pachi; board size 19x19, Fuego vs Pachi.]
