GoGoGo: Improving Deep Neural Network Based Go Playing AI with Residual Networks

Xingyu Liu

Introduction

• Go playing AIs using traditional search: GNU Go, Pachi, Fuego, Zen, etc.
• Powered by deep learning: Zen → Deep Zen Go, darkforest, AlphaGo
• Goal: move from vanilla CNNs to ResNets

Network Architecture

Fig. 2. (a) Policy Network: Input (19x19x5), followed by a stack of CONV3, 64 layers, producing the output P. (b) Value Network: Input (19x19x5), followed by a stack of CONV3, 64 layers and a fully connected layer (FC, 23104 → 1). (c) Residual Module: CONV3, Batch Norm, ReLU, CONV3, Batch Norm, ReLU, with an element-wise add (Eltwise Add) of the input data.

Training Methodology and Data

• From Kifu to Input Feature Maps. Channels: 1) Space Positions; 2) Black Positions; 3) White Positions; 4) Current Player; 5) Ko Positions
• Dynamic Board State Expansion: board states, including Ko fights, are expanded from the kifu on the fly. Saves disk space and keeps memory usage small.
• Use the Ing Chang-ki rule: board state + Ko is the game state, so there is no need to remember the number of captured stones.
• Two Levels of Batches (kifus, moves): random shuffling, with small memory usage and good locality.
• Training stages: SL on Policy Network, RL on Policy Network, RL on Value Network

Fig. 1. Ko fight explicit expansion

Hyperparameters

Hyperparameter        Value
Base learning rate    2E-4
Decay Policy          Exp
Decay Rate            0.95
Decay Step (kifu)     200
Loss Function         Softmax

Experiment Result

• Training Accuracy ~ 32%
• Testing Accuracy ~ 26%

[Plot: Supervised Learning Training Loss, one point per 100 batches]

Fig. 3. GoGoGo plays against itself, policy network only

Future Work

• Monte Carlo Tree Search: $a = \arg\max_a \big( Q(s, a) + u(s, a) \big)$, with $u(s, a) \propto \dfrac{P(s, a)}{1 + N(s, a)}$
• Reinforcement Learning of Value Network
• Network Architecture Exploration
• Real Match Testing against Human Players

References

[1] David Silver et al., “Mastering the game of Go with deep neural networks and tree search”, Nature, 529:484–503, 2016.
[2] Kaiming He et al., “Deep residual learning for image recognition”, CoRR, abs/1512.03385, 2015.
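
The five input channels (space, black, white, current player, Ko) can be illustrated with a small encoder. This is only a sketch under stated assumptions: the stone codes `EMPTY`/`BLACK`/`WHITE`, the function name `encode_board`, and the choice to fill the current-player plane with ones when Black is to move are all hypothetical and are not taken from the poster.

```python
import numpy as np

BOARD_SIZE = 19
EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical stone encoding


def encode_board(board, current_player, ko_point=None):
    """Return a (19, 19, 5) array of feature planes for one board state."""
    board = np.asarray(board)
    planes = np.zeros((BOARD_SIZE, BOARD_SIZE, 5), dtype=np.float32)
    planes[:, :, 0] = (board == EMPTY)   # 1) space (empty) positions
    planes[:, :, 1] = (board == BLACK)   # 2) black positions
    planes[:, :, 2] = (board == WHITE)   # 3) white positions
    # 4) current player. Assumption: all ones if Black is to move, else zeros.
    planes[:, :, 3] = 1.0 if current_player == BLACK else 0.0
    # 5) Ko position: mark the single point that may not be retaken, if any.
    if ko_point is not None:
        row, col = ko_point
        planes[row, col, 4] = 1.0
    return planes
```

A call such as `encode_board(board, BLACK, ko_point=(4, 4))` yields a tensor with the 19x19x5 shape of the network input shown in Fig. 2.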
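
The residual module (CONV3, Batch Norm, ReLU, CONV3, Batch Norm, ReLU, Eltwise Add) can be sketched in plain NumPy. Several points here are assumptions rather than details from the poster: the add is placed after the second ReLU, batch norm uses per-channel statistics of a single sample with no learned scale or shift, and the helper names `conv3x3`, `batch_norm` and `residual_module` are hypothetical.

```python
import numpy as np


def conv3x3(x, w):
    """Same-padded 3x3 convolution. x: (H, W, C_in), w: (3, 3, C_in, C_out)."""
    h, wd, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            # Each kernel tap is a per-pixel matrix product over channels.
            out += padded[dy:dy + h, dx:dx + wd, :] @ w[dy, dx]
    return out


def batch_norm(x, eps=1e-5):
    """Simplified batch norm: normalize each channel over the spatial dims."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def residual_module(x, w1, w2):
    """CONV3 -> BN -> ReLU -> CONV3 -> BN -> ReLU, then add the input."""
    y = np.maximum(batch_norm(conv3x3(x, w1)), 0.0)
    y = np.maximum(batch_norm(conv3x3(y, w2)), 0.0)
    return x + y  # element-wise add with the module input ("Data")
```

Because the module preserves the 19x19x64 shape, flattening its output gives 19 * 19 * 64 = 23104 values, which matches the input width of the FC layer in the value network.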
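
The learning-rate hyperparameters (base rate 2E-4, exponential decay, rate 0.95, step of 200 kifus) suggest a schedule like the one below. The poster does not say whether the decay is applied in discrete steps or continuously, so the `staircase` flag and the function name `learning_rate` are assumptions made for illustration.

```python
def learning_rate(kifus_seen, base_lr=2e-4, decay_rate=0.95,
                  decay_step=200, staircase=True):
    """Exponentially decayed learning rate after `kifus_seen` game records."""
    if staircase:
        # Assumption: the rate drops once every `decay_step` kifus.
        exponent = kifus_seen // decay_step
    else:
        # Alternative: smooth decay between steps.
        exponent = kifus_seen / decay_step
    return base_lr * decay_rate ** exponent
```

Under the staircase assumption the rate stays at 2E-4 for the first 200 kifus and is multiplied by 0.95 after each further block of 200.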
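
The Monte Carlo Tree Search selection rule, which picks the action maximizing Q(s, a) + u(s, a) with u(s, a) proportional to P(s, a) / (1 + N(s, a)), can be sketched for a single tree node as follows. The proportionality constant `c_puct` and the function name `select_action` are hypothetical; the poster gives only the proportionality.

```python
import numpy as np


def select_action(q_values, priors, visit_counts, c_puct=1.0):
    """Pick the child action maximizing Q(s, a) + u(s, a) at one node."""
    q = np.asarray(q_values, dtype=np.float64)
    p = np.asarray(priors, dtype=np.float64)
    n = np.asarray(visit_counts, dtype=np.float64)
    # Exploration bonus: large for high-prior, rarely visited moves.
    u = c_puct * p / (1.0 + n)
    return int(np.argmax(q + u))
```

The bonus shrinks as a move accumulates visits, so the search gradually shifts from moves favored by the policy network prior toward moves with high estimated value.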