Sep 14, 2018


Monte-Carlo Tree Search (MCTS) for Computer Go

Bruno Bouzy

bruno.bouzy@parisdescartes.fr

Université Paris Descartes

AOA class


Outline

● The game of Go: a 9x9 game
● The « old » approach (*-2002)
● The Monte-Carlo approach (2002-2005)
● The MCTS approach (2006-today)
● Conclusion



The game of Go

● 4000 years
● Originated from China
● Developed by Japan (20th century)
● Best players in Korea, Japan, China
● 19x19: official board size
● 9x9: beginners' board size


A 9x9 game

● The board has 81 « intersections ». Initially, it is empty.


A 9x9 game

● Black moves first. A « stone » is played on an intersection.


A 9x9 game

● White moves second.


A 9x9 game

● Moves alternate between Black and White.


A 9x9 game

● Two adjacent stones of the same color build a « string » with « liberties ».
● 4-adjacency


A 9x9 game

● Strings are created.


A 9x9 game

● A white stone is in « atari » (one liberty).


A 9x9 game

● The white string has five liberties.


A 9x9 game

● The black stone is in « atari ».


A 9x9 game

● White « captures » the black stone.


A 9x9 game

● For experienced players, the game is over.
– Huh?
– Why?


A 9x9 game

● What happens if White contests black « territory »?


A 9x9 game

● White has invaded. Two strings are in atari!


A 9x9 game

● Black captures!


A 9x9 game

● White insists but its string is in atari...


A 9x9 game

● Black has proved its « territory ».


A 9x9 game

● Black may contest white territory too.


A 9x9 terminal position

● The game is over for computers.
– Huh?
– Who won?


A 9x9 game

● The game ends when both players pass.
● One black (resp. white) point for each black (resp. white) stone and each black (resp. white) « eye » on the board.
● One black (resp. white) eye = an empty intersection surrounded by black (resp. white) stones.


A 9x9 game

● Scoring:
– Black = 44
– White = 37
– Komi = 7.5
– Score = 44 - 37 - 7.5 = -0.5
● White wins!
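The arithmetic of this slide can be checked with a short sketch. The function name `final_score` and its parameter names are illustrative assumptions, not taken from the slides; the sign convention (a positive score means Black wins) is inferred from the slide's result.

```python
def final_score(black_points: int, white_points: int, komi: float) -> float:
    """Black's margin after komi; a positive value means Black wins (assumed convention)."""
    return black_points - white_points - komi

# Numbers from the slide: 44 points for Black, 37 for White, komi 7.5.
score = final_score(black_points=44, white_points=37, komi=7.5)
winner = "Black" if score > 0 else "White"
print(score, winner)  # -0.5 White
```

A half-point komi such as 7.5 rules out draws, since the point counts are integers.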


Go ranking: « kyu » and « dan »

[Figure: ranking scale, from weakest to strongest]
● Amateur ranking: 30 kyu, 20 kyu, 10 kyu, 1 kyu, 1 dan, 6 dan, 9 dan
● Pro ranking: 1 dan to 9 dan
● Labels: very beginners, beginners, average players, strong players, very strong players, top professional players


Computer Go (old history)

● First go program (Lefkovitz 1960)
● Zobrist hashing (Zobrist 1969)
● Interim2 (Wilcox 1979)
● Life and death model (Benson 1988)
● Patterns: Goliath (Boon 1990)
● Mathematical Go (Berlekamp 1991)
● Handtalk (Chen 1995)


The old approach

● Evaluation of non-terminal positions
– Knowledge-based
– Breaking-down of a position into sub-positions
● Fixed-depth global tree search
– Depth = 0: action with the best value
– Depth = 1: action leading to the position with the best evaluation
– Depth > 1: alpha-beta or minimax


The old approach

[Figure: search tree rooted at the current position. Labels: bounded-depth tree search (361; 2 or 3), evaluation of non-terminal positions, terminal positions.]

Huhu?


Position evaluation

● Break-down
– Whole game (win/loss or score)
– Goal-oriented sub-game
  ● String capture
  ● Connections, dividers, eyes, life and death
● Local searches
– Alpha-beta and enhancements
– Proof-number search


A 19x19 middle-game position


A possible black break-down


A possible white break-down


Possible local evaluations (1)

[Figure labels: alive and territory; unstable; alive; dead; alive; not important]


Possible local evaluations (2)

[Figure labels: alive; unstable; unstable; alive + big territory; unstable]


Position evaluation

● Local results
– Obtained with local tree search
– Result if white plays first (resp. black)
– Combinatorial game theory (Conway)
– Switches {a|b}, >, <, *, 0
● Global recomposition
– Move generation and evaluation
– Position evaluation


Drawbacks (1/2)

● The break-down is not unique
● Performing a (wrong) local tree search on a (possibly irrelevant) local position
● Misevaluating the size of the local position
● Different kinds of local information
– Symbolic (group: dead, alive, unstable)
– Numerical (territory size, reduction, increase)


Drawbacks (2/2)

● Local positions interact
● Complicated
● Domain-dependent knowledge
● Need for human expertise
● Difficult to program and maintain
● Gaps in knowledge
● Erratic behaviour


Upsides

● Feasible on 1990s computers
● Execution is fast
● Some specific local tree searches are accurate and fast


The old approach

[Figure: the same ranking scale (amateur 30 kyu to 9 dan, pro 1 dan to 9 dan), with « Old approach » marked at 10 kyu.]


End of part one!

● Next: the Monte-Carlo approach...


The Monte-Carlo (MC) approach

● Games containing chance
– Backgammon (Tesauro 1989)
● Games with hidden information
– Bridge (Ginsberg 2001)
– Poker (Billings & al. 2002)
– Scrabble (Sheppard 2002)


The Monte-Carlo approach

● Games with complete information
– A general model (Abramson 1990)
● Simulated annealing Go
– (Brügmann 1993)
– 2 sequences of moves
– « all moves as first » heuristic
– Gobble on 9x9


The Monte-Carlo approach

● Position evaluation:
    Launch N random games
    Evaluation = mean value of outcomes
● Depth-one MC algorithm:
    For each move m {
        Play m on the reference position
        Launch N random games
        Move value (m) = mean value
    }
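The depth-one algorithm above can be sketched in Python. Go itself is too large for a short example, so this sketch assumes a toy game instead (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins). The function names and the toy game are illustrative assumptions, not part of the slides.

```python
import random

def legal_moves(stones):
    """Toy game (assumption): remove 1, 2 or 3 stones from the pile."""
    return [k for k in (1, 2, 3) if k <= stones]

def random_game(stones, to_move, rng):
    """Play uniformly random moves to the end; return the winner (0 or 1)."""
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return to_move              # the player who took the last stone wins
        to_move = 1 - to_move

def depth_one_mc(stones, player, n_games, rng):
    """For each move m: play m, launch n_games random games, keep the mean outcome."""
    values = {}
    for m in legal_moves(stones):
        rest = stones - m
        if rest == 0:
            values[m] = 1.0             # m takes the last stone: immediate win
            continue
        wins = sum(random_game(rest, 1 - player, rng) == player
                   for _ in range(n_games))
        values[m] = wins / n_games
    best = max(values, key=values.get)
    return best, values

best, values = depth_one_mc(stones=5, player=0, n_games=2000, rng=random.Random(0))
print(best, values)
```

With 5 stones, taking 1 leaves the opponent with 4, which is the position where random replies do worst, so move 1 receives the highest mean value.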


Depth-one Monte-Carlo

[Figure: depth-one tree, with random games launched below each move of the current position.]


Progressive pruning

● (Billings 2002, Sheppard 2002, Bouzy & Helmstetter 2003)

[Figure labels: current best move; second best; pruned; still explored]


Upper bound

● Optimism in the face of uncertainty
– Interval estimation (Kaelbling 1993)
– UCB multi-armed bandit (Auer & al 2002)

[Figure labels: current best promising move; second best promising; current best proven move]
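A minimal sketch of the UCB1 rule of Auer & al on a Bernoulli bandit may help to see « optimism in the face of uncertainty » at work. The arm probabilities, the function names and the default constant `c = sqrt(2)` (the UCB1 value) are assumptions chosen for the illustration.

```python
import math
import random

def ucb1_choose(counts, sums, t, c=math.sqrt(2)):
    """Pick the arm maximising mean + c * sqrt(ln(t) / n); untried arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a]
               + c * math.sqrt(math.log(t) / counts[a]))

def run_bandit(probs, steps, rng):
    """Pull arms for `steps` rounds; arm a pays 1 with probability probs[a]."""
    counts = [0] * len(probs)
    sums = [0.0] * len(probs)
    for t in range(1, steps + 1):
        arm = ucb1_choose(counts, sums, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = run_bandit([0.2, 0.5, 0.8], steps=3000, rng=random.Random(1))
print(counts)  # the arm with probability 0.8 collects most of the pulls
```

Weak arms are still tried from time to time, because their upper bound grows with log(t) while their visit count stays small.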


All-moves-as-first heuristic (1/3)

[Figure: tree below the root r with the moves A, B, C, D.]

All-moves-as-first heuristic (2/3)

[Figure: actual simulation. Labels: r, o, A, B, C, D.]

All-moves-as-first heuristic (3/3)

[Figure: actual simulation and virtual simulation side by side. Labels: r, o, A, B, C, D.]

● Virtual simulation = actual simulation assuming C is played « as first »


The Monte-Carlo approach

● Upsides
– Robust evaluation
– Global search
– Move quality increases with computing power
● Way of playing
– Good strategic sense but weak tactically
● Easy to program
– Follow the rules of the game
– No break-down problem


Monte-Carlo and knowledge

● Pseudo-random simulations using Go knowledge (Bouzy 2003)
– Moves played with a probability depending on specific domain-dependent knowledge
● 2 basic concepts
– String capture
– 3x3 shapes


Monte-Carlo and knowledge

● Results are impressive
– MC(random) << MC(pseudo random)
– Size:    9x9   13x13   19x19
– % wins:  68    93      98
● Other works on simulations
– Patterns in MoGo, proximity rule (Wang & al 2006)
– Simulation balancing (Silver & Tesauro 2009)


Monte-Carlo and knowledge

● Pseudo-random player
– 3x3 pattern urgency table with 3^8 patterns
– Only a few dozen relevant patterns
– Patterns gathered by
  ● Human expertise
  ● Reinforcement Learning (Bouzy & Chaslot 2006)
● Warning
– p1 better than p2 does not mean MC(p1) better than MC(p2)


Monte-Carlo Tree Search (MCTS)

● How to integrate MC and TS?
● UCT = UCB for Trees
– (Kocsis & Szepesvari 2006)
– Superposition of UCB (Auer & al 2002)
● MCTS
– Selection, expansion, updating (Chaslot & al) (Coulom 2006)
– Simulation (Bouzy 2003) (Wang & Gelly 2006)


MCTS (1/2)

while (hasTime) {
    playOutTreeBasedGame()
    expandTree()
    outcome = playOutRandomGame()
    updateNodes(outcome)
}

then choose the node with...
    ... the best mean value
    ... the highest visit number


MCTS (2/2)

playOutTreeBasedGame() {
    node = getNode(position)
    while (node) {
        move = selectMove(node)
        play(move)
        node = getNode(position)
    }
}


UCT move selection

● Move selection rule to browse the tree:
    move = argmax (s*mean + C*sqrt(log(t)/n))
● Mean value for exploitation
– s (= +1 or -1): color to move
● UCT bias for exploration
– C: constant term set up by experiments
– t: number of visits of the parent node
– n: number of visits of the current node
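The loop of the MCTS slides and the UCT rule above can be put together in a compact sketch. It assumes a toy game rather than Go (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins). The class `Node`, the helper names, the value `c=0.7` and the final choice by highest visit number are assumptions made for the illustration; outcomes are stored from player 0's point of view, and `s` flips the sign for player 1 as on the slide.

```python
import math
import random

class Node:
    def __init__(self, stones, to_move):
        self.stones, self.to_move = stones, to_move   # to_move: 0 (max) or 1 (min)
        self.children = {}                            # move -> Node
        self.visits, self.total = 0, 0.0              # total: sum of outcomes for player 0

def moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]

def select_move(node, c):
    """UCT rule: argmax of s*mean + c*sqrt(log(t)/n)."""
    s = 1 if node.to_move == 0 else -1
    def uct(m):
        child = node.children[m]
        return (s * child.total / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(node.children, key=uct)

def rollout(stones, to_move, rng):
    """Random playout; returns 1.0 if player 0 takes the last stone, else 0.0."""
    if stones == 0:                       # the previous player already took the last stone
        return 1.0 if to_move == 1 else 0.0
    while True:
        stones -= rng.choice(moves(stones))
        if stones == 0:
            return 1.0 if to_move == 0 else 0.0
        to_move = 1 - to_move

def mcts(stones, iterations, c=0.7, seed=0):
    rng = random.Random(seed)
    root = Node(stones, 0)
    for _ in range(iterations):
        node, path = root, [root]
        # Tree-based part: descend while the node is fully expanded.
        while node.stones > 0 and len(node.children) == len(moves(node.stones)):
            node = node.children[select_move(node, c)]
            path.append(node)
        # Expansion: add one untried child.
        if node.stones > 0:
            m = rng.choice([k for k in moves(node.stones) if k not in node.children])
            node.children[m] = Node(node.stones - m, 1 - node.to_move)
            node = node.children[m]
            path.append(node)
        # Random game, then update of every node on the path.
        outcome = rollout(node.stones, node.to_move, rng)
        for n in path:
            n.visits += 1
            n.total += outcome
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(5, 2000), mcts(6, 2000))  # leaving a multiple of 4 is the winning plan
```

Every child receives one visit when it is created, so `child.visits` is never zero when the UCT formula is evaluated.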


Example

Win/visit counts in the tree after each iteration (tree diagrams; nodes listed per level):

● 1 iteration (outcome 1): root 1/1
● 2 iterations (outcome 0): root 1/2; depth 1: 0/1
● 3 iterations (outcome 1): root 2/3; depth 1: 0/1, 1/1
● 4 iterations: root 2/4; depth 1: 0/1, 1/1, 0/1
● 5 iterations: root 3/5; depth 1: 0/1, 1/1, 0/1, 1/1
● 6 iterations: root 3/6; depth 1: 0/1, 1/2, 0/1, 1/1; depth 2: 0/1
● 7 iterations: root 3/7; depth 1: 0/1, 1/2, 0/1, 1/2; depth 2: 0/1, 0/1
● 8 iterations: root 4/8; depth 1: 0/1, 2/3, 0/1, 1/2; depth 2: 0/1, 0/1, 1/1
● 9 iterations: root 4/9; depth 1: 0/1, 2/4, 0/1, 1/2; depth 2: 0/1, 0/1, 1/1, 0/1
● 10 iterations: root 5/10; depth 1: 0/1, 2/4, 0/1, 2/3; depth 2: 0/1, 0/1, 1/1, 0/1, 1/1
● 11 iterations: root 6/11; depth 1: 0/1, 2/4, 0/1, 3/4; depth 2: 0/1, 0/1, 1/1, 0/1, 1/1, 1/1
● 12 iterations: root 7/12; depth 1: 0/1, 2/4, 0/1, 4/5; depth 2: 0/1, 1/2, 1/1, 0/1, 1/1, 1/1; depth 3: 1/1


Example

● Clarity
– C = 0
● Notice
– With C != 0 a node cannot stay unvisited
– Min or max rule according to the node depth
– Not visited children have an infinite mean
● Practice
– Mean initialized optimistically


MCTS enhancements

● The raw version can be enhanced
– Tuning UCT C value
– Outcome = score or win loss info (+1/-1)
– Doubling the simulation number
– RAVE
– Using Go knowledge
  ● In the tree or in the simulations
– Speed-up
  ● Optimizing, pondering, parallelizing


Assessing an enhancement

● Self-play
– The new version vs the reference version
– % wins with a few hundred games
– 9x9 (or 19x19) boards
● Against differently designed programs
– GTP (Go Text Protocol)
– CGOS (Computer Go Server)
● Competitions
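How much a self-play percentage means depends on the number of games. The sketch below uses the usual normal approximation of a binomial proportion; the function name `win_rate_interval` and the 95% level (`z = 1.96`) are assumptions for the illustration, not something the slides prescribe.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Approximate 95% confidence interval for a win rate (normal approximation)."""
    p = wins / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width

# 55-45% over 200 games: the interval still contains 50%.
print(win_rate_interval(110, 200))
# 70-30% over 200 games: the interval lies clearly above 50%.
print(win_rate_interval(140, 200))
```

Under this approximation, a 55-45% result needs well over a few hundred games before it can be separated from an even match.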


Move selection formula tuning

● Using UCB
– Best value for C?
– 60-40%
● Using « UCB-tuned » (Auer & al 2002)
– C replaced by min(1/4, variance)
– 55-45%
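As a sketch of the « UCB-tuned » bullet: in the UCB1-TUNED rule of Auer & al, the exploration term becomes sqrt((ln t / n) * min(1/4, V)), where V is an upper estimate of the arm's variance. The function and parameter names below are assumptions for the illustration.

```python
import math

def ucb1_tuned_bound(reward_sum, reward_sq_sum, n, t):
    """Upper bound of one arm: n visits of the arm, t visits of the parent."""
    mean = reward_sum / n
    # Empirical variance plus a confidence term, as in UCB1-TUNED.
    variance_bound = reward_sq_sum / n - mean * mean + math.sqrt(2.0 * math.log(t) / n)
    return mean + math.sqrt(math.log(t) / n * min(0.25, variance_bound))

# An arm with 0/1 rewards, 5 wins in 10 visits, parent visited 100 times.
print(ucb1_tuned_bound(reward_sum=5, reward_sq_sum=5, n=10, t=100))
```

The cap of 1/4 is the largest possible variance of a reward in [0, 1], so the tuned bound is never wider than the capped one.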


Exploration vs exploitation

● General idea: explore at the beginning and exploit at the end of the thinking time
● Diminishing C linearly in the remaining time
– (Vermorel & al 2005)
– 55-45%
● At the end:
– Argmax over the mean value or over the number of visits?
– 55-45%


Kind of outcome

● 2 kinds of outcomes
– Score (S) or win loss information (WLI)?
– Probability of winning or expected score?
– Combining both (S+WLI) (score +45 if win)
● Results
– WLI vs S: 65-35%
– S+WLI vs S: 65-35%
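The three kinds of outcome can be written as one small function. This is a sketch under assumptions: the function name and the `kind` labels are invented for the illustration, WLI is taken as +1/-1, and « score +45 if win » is read literally as adding a bonus of 45 to the score of a won game.

```python
def outcome(score: float, kind: str, bonus: float = 45.0) -> float:
    """Value backed up after a simulation, from the final score of the game."""
    win = score > 0
    if kind == "S":            # raw score
        return score
    if kind == "WLI":          # win/loss information only
        return 1.0 if win else -1.0
    if kind == "S+WLI":        # score plus a bonus for a win (literal reading of the slide)
        return score + (bonus if win else 0.0)
    raise ValueError(f"unknown outcome kind: {kind}")

for kind in ("S", "WLI", "S+WLI"):
    print(kind, outcome(3.5, kind), outcome(-0.5, kind))
```

With WLI, a win by 0.5 point and a win by 30 points back up the same value, so the search maximizes the probability of winning rather than the expected score.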


Doubling the number of simulations

● N = 100,000
● Results
– 2N vs N: 60-40%
– 4N vs 2N: 58-42%


Tree management

● Transposition tables
– Tree -> Directed Acyclic Graph (DAG)
– Different sequences of moves may lead to the same position
– Interest for MC Go: merge the results
– Result: 60-40%
● Keeping the tree from one move to the next
– Result: 65-35%
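A transposition table is commonly keyed by a Zobrist hash (cited earlier in the history slide). The sketch below shows how two move orders reaching the same position end up in one entry, so their results are merged. Captures, ko and the side to move are ignored here, and all names (`ZOBRIST`, `play`, `update`, `table`) are assumptions for the illustration.

```python
import random

SIZE = 9
EMPTY, BLACK, WHITE = 0, 1, 2
_rng = random.Random(42)
# One random 64-bit number per (intersection, color).
ZOBRIST = [[_rng.getrandbits(64) for _ in range(3)] for _ in range(SIZE * SIZE)]

def play(key, point, color):
    """Hash key after placing `color` on an empty `point` (captures ignored)."""
    return key ^ ZOBRIST[point][color]

table = {}   # hash key -> [visits, wins]

def update(key, win):
    """Merge one simulation result into the entry of the position."""
    entry = table.setdefault(key, [0, 0])
    entry[0] += 1
    entry[1] += int(win)

# Two move orders that lead to the same position share one entry.
k1 = play(play(play(0, 40, BLACK), 41, WHITE), 30, BLACK)
k2 = play(play(play(0, 30, BLACK), 41, WHITE), 40, BLACK)
update(k1, True)
update(k2, False)
print(k1 == k2, table[k1])  # True [2, 1]
```

Because XOR is commutative, the key depends only on the stones on the board, not on the order in which they were played.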


RAVE (1/3)

● Rapid Action Value Estimation
– MoGo 2007
– Use the AMAF heuristic (Brügmann 1993)
– There are « many » virtual sequences that are transposed from the actually played sequence
● Result:
– 70-30%


RAVE (2/3)

● AMAF heuristic
● Which nodes to update?
● Actual
– Sequence ACBD
– Nodes
● Virtual
– BCAD, ADBC, BDAC
– Nodes

[Figure: tree of the actual and virtual sequences over the moves A, B, C, D, showing the nodes to update.]
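The virtual sequences of this slide can be enumerated mechanically: keep each player's moves on that player's turns, but permute their order. The sketch below assumes moves are single letters and that every reordering is legal (no captures), which is an idealization; the function names are invented for the illustration.

```python
from itertools import permutations

def interleave(own, opp):
    """Rebuild a sequence by alternating the first player's and the opponent's moves."""
    out = []
    for i, move in enumerate(own):
        out.append(move)
        if i < len(opp):
            out.append(opp[i])
    return "".join(out)

def virtual_sequences(actual):
    """All reorderings that keep each move with its own player, minus the actual one."""
    own, opp = actual[0::2], actual[1::2]
    found = {interleave(p, q) for p in permutations(own) for q in permutations(opp)}
    found.discard(actual)
    return found

def amaf_first_moves(actual):
    """Moves credited by AMAF at the root: every move of the player to move."""
    return sorted(set(actual[0::2]))

print(sorted(virtual_sequences("ACBD")))  # ['ADBC', 'BCAD', 'BDAC']
print(amaf_first_moves("ACBD"))           # ['A', 'B']
```

For the actual sequence ACBD this reproduces the three virtual sequences listed on the slide, and shows that both A and B are updated « as first » at the root.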


RAVE (3/3)

● 3 variables
– Usual mean value M_u
– AMAF mean value M_amaf
– M = β M_amaf + (1-β) M_u
– β = sqrt(k/(k+3N)), with N the number of visits of the node
– k set up experimentally
● M varies from M_amaf (N small) to M_u (N large)
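The blend can be checked numerically with a few lines. The function name `rave_value` and the sample values (k = 1000, means 0.4 and 0.8) are assumptions for the illustration; the formula is the one on the slide.

```python
import math

def rave_value(m_u, m_amaf, n, k):
    """M = beta * M_amaf + (1 - beta) * M_u, with beta = sqrt(k / (k + 3n))."""
    beta = math.sqrt(k / (k + 3 * n))
    return beta * m_amaf + (1 - beta) * m_u

for n in (0, 100, 1000, 100000):
    print(n, round(rave_value(m_u=0.4, m_amaf=0.8, n=n, k=1000), 3))
```

At n = 0 the value is the AMAF mean, at n = k both means weigh the same (β = 1/2), and for large n the usual mean takes over.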


Knowledge in the simulations

● High urgency for...
– Capture/escape: 55-45%
– 3x3 patterns: 60-40%
– Proximity rule: 60-40%
● Mercy rule
– Interrupt the game when the difference of captured stones is greater than a threshold (Hillis 2006)
– 51-49%


Knowledge in the tree

● Virtual wins for good-looking moves
● Automatic acquisition of patterns of pro games (Coulom 2007) (Bouzy & Chaslot 2005)
● Matching has a high cost
● Progressive widening (Chaslot & al 2008)
● Interesting under strong time constraints
● Result: 60-40%


Speeding up the simulations

● Fully random simulations (2007)
– 50,000 games/second (Lew 2006)
– 20,000 (commonly heard)
– 10,000 (my program)
● Pseudo-random
– 5,000 (my program in 2007)
● Rough optimization is worthwhile


Pondering

● Think on the opponent's time
– 55-45%
– Possible doubling of thinking time
– The move of the opponent may not be the planned move on which you think
– Side effect: play quickly to think on the opponent's time


Summing up the enhancements

● MCTS with all enhancements vs raw MCTS
– Exploration and exploitation: 60-40%
– Win/loss outcome: 65-35%
– Rough optimization of simulations: 60-40%
– Transposition table: 60-40%
– RAVE: 70-30%
– Knowledge in the simulations: 70-30%
– Knowledge in the tree: 60-40%
– Pondering: 55-45%
– Parallelization: 70-30%
● Result: 99-1%


Parallelization

● Computer Chess: Deep Blue
● Multi-core computer
– Symmetric MultiProcessor (SMP)
– One thread per processor
– Shared memory, low latency
– Mutual exclusion (mutex) mechanism
● Cluster of computers
– Message Passing Interface (MPI)


Parallelization

while (hasTime) {
    playOutTreeBasedGame()
    expandTree()
    outcome = playOutRandomGame()
    updateNodes(outcome)
}


Leaf parallelization

Page 90: Monte-Carlo Tree Search (MCTS) for Computer Gobouzy/Doc/AA2/MCTSGo-Bouzy.pdf · Monte-Carlo Tree Search (MCTS) for Computer Go Bruno Bouzy bruno.bouzy@parisdescartes.fr Université

MCTS for Computer Go 90

Leaf parallelization

● (Cazenave & Jouandeau 2007)

● Easy to program

● Drawbacks

– Wait for the longest simulation

– When part of the simulation outcomes are already losses, running the remaining simulations may not be worthwhile.
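
A minimal sketch of leaf parallelization, assuming one playout per worker from the same leaf and a single pooled update afterwards. The playout is a placeholder coin flip, and `leaf_parallel_playouts` is a hypothetical helper name. In standard CPython, threads do not speed up CPU-bound playouts, so this only illustrates the control flow.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def play_out_random_game(seed):
    """Placeholder playout: returns 1 for a win, 0 for a loss.

    ASSUMPTION: a real program would play a random game from the leaf
    position; a coin flip stands in for it here.
    """
    rng = random.Random(seed)
    return 1 if rng.random() < 0.5 else 0

def leaf_parallel_playouts(n_workers, base_seed=0):
    """Run one playout per worker from the same leaf and pool the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list(...) blocks until every playout has finished, which is the
        # "wait for the longest simulation" drawback noted above.
        outcomes = list(pool.map(play_out_random_game,
                                 range(base_seed, base_seed + n_workers)))
    return sum(outcomes), len(outcomes)   # (wins, visits) for one update

wins, visits = leaf_parallel_playouts(n_workers=4)
```

The tree itself is only touched by one thread, which is why this scheme needs no mutex.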

Root parallelization

● (Cazenave & Jouandeau 2007)

● Easy to program

● No communication

● At completion, merge the trees

● 4 MCTS for 1 sec > 1 MCTS for 4 sec

● Good choice for short time settings and a small number of threads
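
A minimal sketch of the merge step, assuming that only the visit counts of the root's children are summed rather than whole trees. The function names and the visit counts are made up for illustration.

```python
from collections import Counter

def merge_root_statistics(per_search_visits):
    """Sum the root-level visit counts of independent searches.

    `per_search_visits` is a list of {move: visits} dicts, one per search.
    ASSUMPTION: only the root's children are merged, not whole trees.
    """
    total = Counter()
    for visits in per_search_visits:
        total.update(visits)      # Counter.update adds counts per key
    return total

def best_move(per_search_visits):
    total = merge_root_statistics(per_search_visits)
    return max(total, key=total.get)

# Four independent searches over the same position (made-up counts).
searches = [
    {"D4": 400, "E5": 350, "C3": 250},
    {"D4": 300, "E5": 450, "C3": 250},
    {"D4": 500, "E5": 300, "C3": 200},
    {"D4": 420, "E5": 380, "C3": 200},
]
move = best_move(searches)        # "D4": 1620 visits vs 1480 and 900
```

Because the searches never communicate while running, the only shared step is this final sum.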

Tree parallelization – global mutex

Tree parallelization – local mutex

Tree parallelization

● One shared tree, several threads

● Mutex

– Global: the whole tree has a mutex

– Local: each node has a mutex

● « Virtual loss »

– Given to a node browsed by a thread

– Removed at update stage

– Prevents threads from running similar simulations
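
A minimal sketch of a node with a local mutex and a virtual loss. It assumes one common way of implementing the idea: the virtual loss is a visit counted with no win during the descent, and the update stage then adds only the real outcome. The `Node` layout and method names are hypothetical.

```python
import threading

class Node:
    def __init__(self):
        self.lock = threading.Lock()   # local mutex: one per node
        self.visits = 0
        self.wins = 0.0

    def add_virtual_loss(self):
        # Called when a thread passes through the node during descent:
        # count a visit with no win, so the node looks worse to other threads.
        with self.lock:
            self.visits += 1

    def update(self, outcome):
        # Called at the update stage: the visit was already counted by the
        # virtual loss, so only the real outcome (1.0 win, 0.0 loss) is added.
        with self.lock:
            self.wins += outcome

    def mean(self):
        with self.lock:
            return self.wins / self.visits if self.visits else 0.0

node = Node()
node.visits, node.wins = 10, 6.0       # mean 0.6 before any thread arrives
node.add_virtual_loss()                # thread A descends through the node
mean_during = node.mean()              # 6 / 11: less attractive to thread B
node.update(1.0)                       # thread A's playout was a win
mean_after = node.mean()               # 7 / 11
```

With a global mutex, the same methods would share a single lock owned by the tree instead of one lock per node.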

Computer-computer results

● Computer Olympiads

Year   19x19                           9x9
2010   Erica, Zen, MFGo                MyGoFriend
2009   Zen, Fuego, Mogo                Fuego
2008   MFGo, Mogo, Leela               MFGo
2007   Mogo, CrazyStone, GNU Go        Steenvreter
2006   GNU Go, Go Intellect, Indigo    CrazyStone
2005   Handtalk, Go Intellect, Aya     Go Intellect
2004   Go Intellect, MFGo, Indigo      Go Intellect

Human-computer results

● 9x9

– 2009: Mogo beat a pro with black

– 2009: Fuego beat a pro with white

● 19x19

– 2008: Mogo beat a pro with 9 handicap stones

– 2008: Crazy Stone beat a pro with 8 handicap stones

– 2008: Crazy Stone beat a pro with 7 handicap stones

– 2009: Mogo beat a pro with 6 handicap stones

MCTS and the old approach

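[Figure: ranking ladder from very beginners, beginners, average players, strong players and very strong players up to top professional players, with the amateur ranking scale (30 kyu, 20 kyu, 10 kyu, 1 kyu, 1 dan, 6 dan, 9 dan) and the pro ranking scale (1 dan to 9 dan), marking the levels reached by the old approach and by MCTS on 9x9 go and 19x19 go.]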

Computer Go (MC history)

● Monte-Carlo Go (Brugmann 1993)

● Monte-Carlo Go developments (Bouzy & Helmstetter 2003)

● MC + knowledge (Bouzy 2003)

● UCT (Kocsis & Szepesvari 2006)

● Crazy Stone (Coulom 2006)

● Mogo (Wang & Gelly 2006)

Conclusion

● Monte-Carlo brought a big improvement in Computer Go over the last decade!

– No more programs based on the old approach!

– All go programs are MCTS-based!

– Professional level on 9x9!

– Dan level on 19x19!

● Unbelievable 10 years ago!

Some references

● PhD, MCTS and Go (Chaslot 2010)

● PhD, Reinforcement Learning and Go (Silver 2010)

● PhD, Reinforcement Learning: application to Go (Gelly 2007)

● UCT (Kocsis & Szepesvari 2006)

● 1st MCTS go program (Coulom 2006)

Web links

● http://www.grappa.univ-lille3.fr/icga/

● http://cgos.boardspace.net/

● http://www.gokgs.com/

● http://www.lri.fr/~gelly/MoGo.htm

● http://remi.coulom.free.fr/CrazyStone/

● http://fuego.sourceforge.net/

● ...

Thank you for your attention!