Results
While increasing the number of nodes improved performance up through 32 nodes on the 9x9 board before leveling out, in the 19x19 game performance appears to have kept improving through 64 nodes.
Jan 28, 2021

Root Parallelization in Monte Carlo Tree Search: Time vs Threads vs Trees
Erik Steinmetz, Daniel Boley, and Maria Gini
Department of Computer Science and Engineering, University of Minnesota
Statistics
To study performance in terms of winning rates in an adversarial game, we use the binomial confidence interval p ± 1.96 · sqrt(p(1 - p)/n), where p is the observed proportion of wins and n is the number of games in the sample. Roughly 95% of the intervals constructed this way will contain the true winning rate. Detecting an improvement in the software’s performance therefore requires a large number of games in a tournament. For example, with winning rates near 50%, a 1000-game tournament yields a confidence interval of about ± 3%.
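As a quick check of the arithmetic, the interval can be computed directly. This is a minimal sketch; the function name `win_rate_interval` and its parameters are illustrative and not part of the poster.

```python
import math

Z_95 = 1.96  # normal quantile used in the 95% interval formula


def win_rate_interval(wins, games, z=Z_95):
    """Return (low, high) for p +/- z * sqrt(p(1-p)/n), with p = wins/games."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width


# A 1000-game tournament with a 50% win rate.
low, high = win_rate_interval(wins=500, games=1000)
print(f"{low:.3f} .. {high:.3f}")
```

For 500 wins in 1000 games the half-width is about 0.031, which matches the ± 3% quoted for a 1000-game tournament.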
Future Directions
• Quantify the relationship between problem difficulty and the optimal number of nodes to find a best solution
• Experiment with a different number of nodes depending on the turn number
Monte Carlo Tree Search (MCTS) is being effectively used in many domains, but acquiring good results from building larger trees takes time that can in many cases be impractical or disadvantageous. Building multiple smaller trees in parallel can improve results without requiring a longer run-time.
In this work we compare two ways of parallelizing the tree building process in Monte Carlo Tree Search: root parallelization, which uses multiple independent trees, and tree parallelization, which uses multiple threads. Both are measured against a baseline of simply allowing a longer search time. Our experiments used the game of Go as the domain and measured results by the win rates of a parallelized MCTS-based game engine playing in tournaments against other Go game engines.
The Game of Go
• Go is a non-stochastic perfect-information game played by humans for over 1000 years.
• Stones are placed on grid intersections
• Win by controlling the most area on the board
• Board sizes are 19x19 (full-size), 9x9, and 13x13
Monte Carlo Tree Search Techniques
Monte Carlo Tree Search (MCTS) techniques were first developed for playing the game of Go. MCTS can build game trees without a static evaluation function. MCTS samples subtrees by playing games to the end, where a winner can be determined, using random moves.
• Select a tree node to expand using win/loss rates
• Expand a node, adding one new node to tree
• Simulate a game to the end with random moves, determine winner
• Propagate the win or loss to all parent nodes
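The four steps above can be sketched on a toy game. This is an illustration under stated assumptions, not the engine used in the study: the game is a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins) rather than Go, selection uses the UCB1 formula as one common way to act on win/loss rates, and every name in the code is hypothetical.

```python
import math
import random


class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left in the pile
        self.player = player      # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move          # move that led here from the parent
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.wins = 0.0           # wins for the player who moved INTO this node
        self.visits = 0


def ucb1(parent, child, c=1.4):
    # Win rate plus an exploration bonus (UCB1 is an assumption of this sketch).
    return child.wins / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)


def select(node):
    # Step 1: descend while the node is fully expanded and not terminal.
    while not node.untried and node.children:
        parent = node
        node = max(parent.children, key=lambda ch: ucb1(parent, ch))
    return node


def expand(node):
    # Step 2: add one new child; terminal nodes are returned unchanged.
    if not node.untried:
        return node
    move = node.untried.pop()
    child = Node(node.stones - move, 1 - node.player, parent=node, move=move)
    node.children.append(child)
    return child


def simulate(stones, player, rng):
    # Step 3: play random moves to the end and return the winner.
    if stones == 0:
        return 1 - player  # the previous player took the last stone
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player


def propagate(node, winner):
    # Step 4: update visit and win counts on every node up to the root.
    while node is not None:
        node.visits += 1
        if node.parent is not None and winner == node.parent.player:
            node.wins += 1
        node = node.parent


def mcts_best_move(stones, player=0, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player)
    for _ in range(iterations):
        leaf = expand(select(root))
        winner = simulate(leaf.stones, leaf.player, rng)
        propagate(leaf, winner)
    # Pick the most-visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

With three stones left, `mcts_best_move(3)` returns 3, since taking all three stones wins immediately. No static evaluation function appears anywhere: positions are scored only by the outcomes of random playouts.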

Overview
This study looks at parallelizing MCTS by constructing multiple search trees independently of each other and combining the results once the search is complete. This is known as root parallelization. The results are combined by a one-tree, one-vote scheme to produce the final result. Because each tree is built using a stochastic sampling technique, the trees will differ from each other. Combining the end results of the trees allows a larger sampling without the large overhead of combining the results during tree construction (known as tree parallelization). A third technique, leaf parallelization, runs only the randomized playouts in parallel and has been found not to scale well.
Parallelization of MCTS
The Itasca cluster allowed the use of almost 1000 nodes, each with an 8-core CPU and 24 GB of memory. The software was modified to construct a single search tree on each node available, and utilized the OpenMPI interface to combine the results of each tree into a single move choice. The tournament managing software was also able to use multiple nodes in the cluster to run more than one game at a time.
At each turn in a game, the software must decide which move to make. Using root parallelization, each node in the cluster independently creates a search tree to determine a best move. The move chosen by the most nodes is considered the overall best move for the current turn.
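The voting step can be sketched in a few lines. This is a minimal illustration in plain Python rather than the OpenMPI-based code used in the study; the function name `choose_move` and the tie-breaking rule (first move seen wins a tie) are assumptions. The vote counts are those shown in the poster's Root-Parallel Player tally.

```python
from collections import Counter


def choose_move(votes):
    """Return (move, tally), where move received the most votes.

    Ties go to the move that appears first in `votes` (an assumption
    of this sketch, not a rule stated in the poster).
    """
    tally = Counter(votes)
    best_move, _ = tally.most_common(1)[0]
    return best_move, tally


# One vote per tree, reconstructed from the tally in the poster's figure.
votes = ["C-16"] * 3 + ["C-17"] * 4 + ["D-16"] * 7 + ["D-17"] + ["E-17"]
move, tally = choose_move(votes)
print(move, dict(tally))
```

With these sixteen votes the chosen move is D-16, which received seven votes.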
Improvements in the winning rates were recorded over an increasing amount of time, an increasing number of threads, and an increasing number of nodes available. This was done in the domains of the 9x9 game and the more complex 19x19 game.
Each game in a tournament consumes about one hour of time. The cluster not only let us show the effects of an increasing number of nodes on the win rates of the software, but also allowed us to run large tournaments on the order of 1000 games each. This produced results with greater statistical significance than previous similar studies.
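To see why tournaments of this size are needed, the confidence interval formula can be inverted to estimate how many games a given margin requires. This is a rough sketch; the function names and the `concurrent_games` parameter are hypothetical, and the one-hour game length is the approximate figure quoted above.

```python
import math


def games_needed(margin, p=0.5, z=1.96):
    """Smallest n such that z * sqrt(p(1-p)/n) <= margin."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))


def wall_clock_hours(games, concurrent_games, hours_per_game=1.0):
    """Hours to finish a tournament when games run in equal-sized batches."""
    return math.ceil(games / concurrent_games) * hours_per_game


n = games_needed(0.03)  # margin of +/- 3% at a 50% win rate
print(n, wall_clock_hours(n, concurrent_games=50))
```

A ± 3% margin near a 50% win rate needs 1068 games, so running games concurrently on the cluster is what keeps such a tournament practical.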
Root-Parallel Player
[Figure: each node reports its preferred move (for example, Node 3: C-17, Node 4: D-16). Node 0 tallies the votes (C-16: 3, C-17: 4, D-16: 7, D-17: 1, E-17: 1) and chooses the tally winner, D-16.]
[Figure: two plots of Winning Rate (%) against Time Multiplier / Number of Threads / Number of Nodes (1, 2, 4, 8, 16, 32, 64, 128), each with the series Nodes, Threads, and Time. Panels: Board size 9x9, Fuego vs Pachi; Board size 19x19, Fuego vs Pachi.]