
Monte Carlo Tree Search for the Super Mario Bros

Jan 21, 2017

Chih-Sheng Lin
Page 1: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search for the Super Mario Bros

Chih-Sheng Lin Advisor: Dr. rer. nat. Chuan-Kang Ting

Department of Computer Science and Information Engineering, National Chung Cheng University


Page 2: Monte Carlo Tree Search for the Super Mario Bros

Outline

• Introduction
• Monte Carlo Tree Search (MCTS)
  o Basic Algorithm
  o Upper Confidence Bounds for Trees (UCT)
• Controller Design using MCTS
  o Problem Formulation
  o MCTS-based Controller
  o Improvements on UCT
• Experimental Results
• Conclusions

Page 3: Monte Carlo Tree Search for the Super Mario Bros

Introduction

The Mario AI Benchmark

• Designing a Mario-playing controller

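[Figure: the controller receives percepts from the game environment through its sensors and returns actions through its actuators; the decision procedure, marked "?", must run within 42 ms.]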

Page 4: Monte Carlo Tree Search for the Super Mario Bros

Introduction

The Mario AI Benchmark

• Scoring method in the Mario AI competition

o Maximizing the multi-objective weighted sum

    Objective       Weight   Objective     Weight   Objective    Weight
    distance             1   hiddenBlocks      24   marioStatus    1024
    flowers             64   killsByStomp      12   timeLeft          8
    mushrooms           58   killsByFire        4   marioMode        32
    greenMushrooms      58   killsByShell      17
    coins               16   killsTotal        42
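As a rough illustration of the weighted sum above, the following Python sketch multiplies each objective by its weight and adds the results. Only the weights come from the slide; the function name `weighted_score` and the sample statistics are made up for the example.

```python
# Weights as listed on the slide (Mario AI competition scoring).
COMPETITION_WEIGHTS = {
    "distance": 1, "hiddenBlocks": 24, "marioStatus": 1024,
    "flowers": 64, "killsByStomp": 12, "timeLeft": 8,
    "mushrooms": 58, "killsByFire": 4, "marioMode": 32,
    "greenMushrooms": 58, "killsByShell": 17,
    "coins": 16, "killsTotal": 42,
}


def weighted_score(stats, weights=COMPETITION_WEIGHTS):
    """Return the sum of weight * value over all objectives.

    Objectives missing from `stats` are treated as 0 (an assumption
    made here for convenience).
    """
    return sum(w * stats.get(name, 0) for name, w in weights.items())


# Hypothetical end-of-level statistics, for illustration only.
example_stats = {
    "distance": 200, "coins": 5, "killsByStomp": 2,
    "killsTotal": 2, "timeLeft": 100,
}
print(weighted_score(example_stats))
```

Passing a different `weights` dictionary lets the same function evaluate any other set of objectives and weights.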

Page 5: Monte Carlo Tree Search for the Super Mario Bros

Introduction

Related Work: A∗-based Controller

• Robin Baumgarten proposed an A∗-based controller in 2009 [*].

  o States : The full game information observed by Mario

  o Successor function : Mario’s possible actions

  o Heuristic function : Estimated time to reach the rightmost side

[*] J. Togelius, S. Karakovskiy, and R. Baumgarten. The 2009 Mario AI competition. In Proceedings of the 2010 IEEE Congress on Evolutionary Computation, 2010.

[Figure: A∗ search tree expanding the current node by action combinations such as "left, jump, speed", "right, speed", "jump", "right, jump, speed", and "left, speed".]

Page 6: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Motivation

• Problems of A∗ search in real-time games

o Lack of effective heuristic functions

o Complexity is exponential in the worst case.

• Characteristics of MCTS

o Using the generality of random sampling

o Performing asymmetric tree growth


Page 7: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm

• Selection

• function MCTS(s_0)
    create root node i_0 with state s_0
    while within computational budget do
        i_E ← Selection(i_0)          (highlighted step)
        i_L ← Expansion(i_E)
        R_k ← Simulation(i_L)
        Backpropagation(i_L, R_k)
    return BestChild(i_0, 0).action

[Figure: search tree rooted at i_0; selection reaches the node E.]
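The four steps of the loop can be sketched in Python as follows. This is a minimal illustration, not the author's controller. The toy walking game (`legal_actions`, `step`, `score`), the UCB-style tree policy inside `best_child`, the running-mean update, and the fixed iteration count that stands in for the computational budget are all assumptions of this sketch.

```python
import math
import random

HORIZON = 5  # toy game: walk left (-1) or right (+1) for HORIZON steps


def legal_actions(state):
    _, steps_left = state
    return [-1, +1] if steps_left > 0 else []


def step(state, action):
    pos, steps_left = state
    return (pos + action, steps_left - 1)


def score(state):
    pos, _ = state
    return (pos + HORIZON) / (2 * HORIZON)  # final position scaled to [0, 1]


class Node:
    """Search node holding a game state, a utility v, and a visit count n."""

    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.untried = legal_actions(state)
        self.v, self.n = 0.0, 0


def best_child(node, c):
    # Assumed tree policy: utility plus an exploration bonus scaled by c.
    return max(node.children,
               key=lambda ch: ch.v + c * math.sqrt(math.log(node.n) / ch.n))


def selection(node, c):
    # Descend while the node is fully expanded and not terminal.
    while not node.untried and node.children:
        node = best_child(node, c)
    return node


def expansion(node, rng):
    if not node.untried:  # terminal node: nothing to add
        return node
    action = node.untried.pop(rng.randrange(len(node.untried)))
    child = Node(step(node.state, action), parent=node, action=action)
    node.children.append(child)
    return child


def simulation(node, rng):
    state = node.state
    while legal_actions(state):  # random actions until the game ends
        state = step(state, rng.choice(legal_actions(state)))
    return score(state)


def backpropagation(node, reward):
    while node is not None:
        node.n += 1
        node.v += (reward - node.v) / node.n  # running mean (assumed rule)
        node = node.parent


def mcts(s0, iterations=2000, c=1 / math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node(s0)
    for _ in range(iterations):  # stand-in for "within computational budget"
        i_e = selection(root, c)
        i_l = expansion(i_e, rng)
        r_k = simulation(i_l, rng)
        backpropagation(i_l, r_k)
    return best_child(root, 0).action  # c = 0: pure exploitation


print(mcts((0, HORIZON)))
```

A real-time controller would replace the iteration count with a wall-clock check, since the budget is given in milliseconds.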

Page 8: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm

• Expansion

  Highlighted step: i_L ← Expansion(i_E)

[Figure: search tree with the newly expanded node L.]

Page 9: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm

• Simulation

  Highlighted step: R_k ← Simulation(i_L)

  o A random sampling (a simulated game)

  o R_k : score

[Figure: a simulated game starts from node L and yields the score R_k.]

Page 10: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm

• Backpropagation

  Highlighted step: Backpropagation(i_L, R_k)

  o R_k : score

[Figure: the score R_k is passed to three nodes in the search tree.]

Page 11: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm

• Final action selection

  Highlighted step: return BestChild(i_0, 0).action

  o v_i : utility

[Figure: root state s_0 with actions a_1 and a_2 leading to children with utilities v_1 and v_2.]

Page 12: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• Selection

  Highlighted step: i_E ← Selection(i_0)

  Example settings: bf : 2, win : 1, draw : 0.5, lose : 0

[Figure: search tree with the root i_0.]

Page 13: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• Expansion, Simulation

  Highlighted steps: i_L ← Expansion(i_E), R_k ← Simulation(i_L)

[Figure: search tree; value shown: 1.]

Page 14: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• Backpropagation

  Highlighted step: Backpropagation(i_L, R_k)

[Figure: search tree; values shown: 1, 1, 1.]

Page 15: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• Selection (2nd loop)

  Highlighted step: i_E ← Selection(i_0)

[Figure: search tree with the root i_0; values shown: 1, 1.]

Page 16: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• Expansion, Simulation (2nd loop)

  Highlighted steps: i_L ← Expansion(i_E), R_k ← Simulation(i_L)

[Figure: search tree; values shown: 1, 1, 0.]

Page 17: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• Backpropagation (2nd loop)

  Highlighted step: Backpropagation(i_L, R_k)

[Figure: search tree; values shown: 1, 0, 0, 0.5.]

Page 18: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• 3rd loop

[Figure: search tree; values shown: 1, 0, 1, 0.75, 1.]

Page 19: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• 4th loop

[Figure: search tree; values shown: 1, 1, 0.63, 0.5, 0.5, 0.25.]

Page 20: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Basic Algorithm : Example

• nth loop

  Highlighted step: return BestChild(i_0, 0).action

[Figure: root state s_0 with actions a_1 and a_2; values shown: 0.56, 0.32, 0.49, 0.75, 0.87.]

Page 21: Monte Carlo Tree Search for the Super Mario Bros

Monte Carlo Tree Search (MCTS)

Upper Confidence Bounds for Trees (UCT)

• UCT = MCTS + UCB [*]

• Selecting a child node c such that

  $$c \in \arg\max_{i \in I} \left( v_i + C_p \sqrt{\frac{\ln n_p}{n_i}} \right)$$

  o v_i term : exploitation
  o C_p term : exploration
  o p : c’s parent node
  o I : the set of p’s children
  o v_i : i’s approximate utility
  o n_i : i’s visit count
  o n_p : p’s visit count
  o C_p : a tunable constant

• Example : 4th loop

[Figure: search tree with values 1, 0, 1, 0.75; "?" marks the child still to be selected.]

[*] L. Kocsis and C. Szepesvári. Bandit based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning, 2006.
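To show how the tunable constant shifts the choice between exploitation and exploration, here is a small Python sketch that scores two children with a utility term plus an exploration bonus of the form C_p * sqrt(ln n_p / n_i). The visit counts and utilities are hypothetical numbers chosen for illustration, and the helper names `uct_value` and `select_child` are not from the slides.

```python
import math


def uct_value(v_i, n_i, n_p, c_p):
    """Utility (exploitation) plus a visit-count bonus (exploration)."""
    return v_i + c_p * math.sqrt(math.log(n_p) / n_i)


def select_child(children, n_p, c_p):
    """Return the index of the child with the highest UCT value.

    `children` is a list of (v_i, n_i) pairs.
    """
    return max(range(len(children)),
               key=lambda i: uct_value(*children[i], n_p, c_p))


# Hypothetical situation: the parent was visited 3 times; the first child
# has utility 1 after 2 visits, the second has utility 0 after 1 visit.
children = [(1.0, 2), (0.0, 1)]

print(select_child(children, n_p=3, c_p=1.0))  # small C_p favours utility
print(select_child(children, n_p=3, c_p=4.0))  # large C_p favours rare visits
```

With the small constant the well-performing child is chosen again, while the large constant sends the search to the less-visited child.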

Page 22: Monte Carlo Tree Search for the Super Mario Bros

Controller Design using MCTS

Problem Formulation

• States
  o The full game information in a 15 × 19 grid
    • Mario’s position, speed, and so on
    • Enemy’s position, speed, and so on
    • Object’s position, …

• Successor function
  o Mario’s possible actions

• A search node i contains
  o Game state s_i
  o Approximate utility v_i (average score, winning rate, …)
  o Visit count n_i (the number of updates)

Page 23: Monte Carlo Tree Search for the Super Mario Bros

Controller Design using MCTS

MCTS-based Controller

[Figure: the controller interacts with the game environment through sensors and actuators; its decision procedure is MCTS(s_0), run within 42 ms.]
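One way to keep a search inside a fixed time slot such as the 42 ms in the figure is to check a wall-clock deadline before every iteration. The sketch below is an assumption about how that could be done, not the author's code; `run_within_budget` and its `iteration` callback are hypothetical names.

```python
import time


def run_within_budget(iteration, budget_s=0.042):
    """Call `iteration()` repeatedly until the time budget is used up.

    Returns how many iterations were started. The deadline is only
    checked between iterations, so one long iteration can overrun it.
    """
    deadline = time.perf_counter() + budget_s
    count = 0
    while time.perf_counter() < deadline:
        iteration()
        count += 1
    return count


# Usage: each call to the callback would be one
# selection/expansion/simulation/backpropagation pass.
iterations_done = run_within_budget(lambda: None, budget_s=0.042)
print(iterations_done)
```

Because the check happens only between iterations, a practical controller would leave some safety margin below the real limit.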

Page 25: Monte Carlo Tree Search for the Super Mario Bros

Controller Design using MCTS

Improvements on UCT

• How the result of a simulated game is calculated

  o Modified multi-objective weighted sum

    Objective       Weight   Objective     Weight   Objective    Weight
    distance           0.1   hiddenBlocks      24   marioStatus    1024
    flowers             64   killsByStomp      12   timeLeft          2
    mushrooms           58   killsByFire        4   marioMode        32
    greenMushrooms       1   killsByShell      17
    coins               16   killsTotal        42
    hurts              −42   stomps             1   carries           1

Page 26: Monte Carlo Tree Search for the Super Mario Bros

Controller Design using MCTS

Improvements on UCT : Review UCT

• Simulation step
  o A random sampling
    • Performing random actions until the simulated game is terminated

• Example

[Figure: a random simulation from node L yields the result R_k.]

Page 27: Monte Carlo Tree Search for the Super Mario Bros

Controller Design using MCTS

Improvements on UCT : UCT-best

• Best-of-N simulation strategy [*]
  o N candidates selected from possible actions
  o Evaluating candidates
  o Performing the best one

• Example : N = 3

[Figure: simulation starting from node L.]

[*] T. Kozelek. Methods of MCTS and the game Arimaa. Master's thesis, Charles University, 2009.

Page 31: Monte Carlo Tree Search for the Super Mario Bros

Controller Design using MCTS

Improvements on UCT : UCT-best

• Best-of-N simulation strategy [*]
  o N candidates selected from possible actions
  o Evaluating candidates
  o Performing the best one

• Drawback
  o Loses the generality of random sampling

• Example : N = 3

[Figure: a best-of-N simulation from node L yields the result R_k.]

[*] T. Kozelek. Methods of MCTS and the game Arimaa. Master's thesis, Charles University, 2009.

Page 32: Monte Carlo Tree Search for the Super Mario Bros

Controller Design using MCTS

Improvements on UCT : UCT-multi

• Multi-simulation simulation strategy
  o Performing N random samplings
  o Selecting the best result to propagate

• Advantage
  o Improving the accuracy of random sampling

• Example : N = 3

[Figure: three random simulations from node L yield the results R_k1, R_k2, and R_k3.]
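The difference between the plain random simulation, the best-of-N strategy, and the multi-simulation strategy can be sketched in Python on a toy game. Everything here is an assumption for illustration: the walking game, the use of `score` to evaluate intermediate candidate states, and sampling candidates with replacement. The slides do not specify these details.

```python
import random

HORIZON = 5  # toy game: walk left (-1) or right (+1) for HORIZON steps


def legal_actions(state):
    _, steps_left = state
    return [-1, +1] if steps_left > 0 else []


def step(state, action):
    pos, steps_left = state
    return (pos + action, steps_left - 1)


def score(state):
    pos, _ = state
    return (pos + HORIZON) / (2 * HORIZON)  # scaled to [0, 1]


def random_rollout(state, rng):
    """Plain UCT simulation: random actions until the game ends."""
    while legal_actions(state):
        state = step(state, rng.choice(legal_actions(state)))
    return score(state)


def best_of_n_rollout(state, rng, n=3):
    """Best-of-N: at each step draw n candidate actions, evaluate the
    resulting states, and perform the best one."""
    while legal_actions(state):
        actions = legal_actions(state)
        candidates = [rng.choice(actions) for _ in range(n)]
        state = max((step(state, a) for a in candidates), key=score)
    return score(state)


def multi_rollout(state, rng, n=3):
    """Multi-simulation: run n random rollouts and keep the best result."""
    return max(random_rollout(state, rng) for _ in range(n))


s0 = (0, HORIZON)
rng = random.Random(0)
print(random_rollout(s0, rng), best_of_n_rollout(s0, rng), multi_rollout(s0, rng))
```

Best-of-N biases every step of a single playout, whereas multi-simulation keeps each playout fully random and only filters the results before propagation.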

Page 33: Monte Carlo Tree Search for the Super Mario Bros

Experimental Results

Performance Measurement

• Small problem : Parameter tuning
  o 15 different levels
  o Computational budget : 40 ms
  o Results behave similarly to those on the big problem.

• Big problem : Performance comparison
  o 512 different levels (used in the Mario AI competition)
  o Computational budget : 5 ms to 40 ms

Page 34: Monte Carlo Tree Search for the Super Mario Bros

Experimental Results

Small Problem: Parameter Tuning

• The UCT-multi controller
  o UCT uses the multi-simulation simulation strategy.

Page 35: Monte Carlo Tree Search for the Super Mario Bros

Experimental Results

Small Problem: Parameter Tuning

• The UCT-best controller

o UCT uses the best-of-N simulation strategy.

Page 36: Monte Carlo Tree Search for the Super Mario Bros

Experimental Results

Big Problem: Performance Comparison

• A∗-based controllers :
  o plainAstar : Robin Baumgarten’s A∗-based controller without improvements
  o refinedAstar : Robin Baumgarten’s A∗-based controller

• MCTS-based controllers :
  o UCT : UCT without modifying its simulation strategy
  o UCT-best : UCT using the best-of-N simulation strategy
  o UCT-multi : UCT using the multi-simulation simulation strategy
  o UCT +carr : UCT with the additional objective carries
  o UCT-multi +carr : UCT-multi with the additional objective carries

• Parameter settings :
  o UCT-best : Increasing the number of candidate actions
  o UCT-multi : Increasing the number of random samplings

Page 37: Monte Carlo Tree Search for the Super Mario Bros

Experimental Results

Big Problem: Performance Comparison


Page 38: Monte Carlo Tree Search for the Super Mario Bros

Experimental Results

Big Problem: Performance Comparison


Page 39: Monte Carlo Tree Search for the Super Mario Bros

Conclusions

• Challenges of the Mario AI benchmark
  o Large state space
  o Lack of effective heuristic functions

• Contributions of this study
  o Showing the applicability of MCTS
  o The MCTS-based controllers outperform the A∗-based controller.
  o Improving the MCTS-based controller by
    • Improving random sampling’s accuracy (the multi-simulation simulation strategy)
    • Applying a good calculation of a simulated game’s result (the additional objective carries)

Page 40: Monte Carlo Tree Search for the Super Mario Bros

Thank you for your attention