Evaluation Function in Game Playing Programs

M1 Yasubumi Nozawa

Chikayama & Taura Lab.

Jan 01, 2016

Transcript
Page 1:

Evaluation Function in Game Playing Programs

M1 Yasubumi Nozawa

Chikayama & Taura Lab

Page 2:

Outline

1. Introduction

2. Parameter tuning

1. Supervised learning

2. Comparison training

3. Reinforcement learning

3. Conclusion

Page 3:

Introduction

Page 4:

Game Playing Program

Games are a simple model of real-world problems: they are zero-sum, so if one player wins, the other must lose.

Very large search spaces: complete information cannot be obtained in the limited time available.

Page 5:

Game tree search

[Figure: minimax game tree. The leaf values 9 8 7 6 5 4 3 2 1 are backed up as 7, 4, 1 at the layer above, and 7 at the root.]

Root node: current position. Node: position. Branch: legal move.

MINIMAX SEARCH
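The minimax backup shown on this slide can be sketched in a few lines. This is a minimal illustration, not code from any of the programs discussed: the game tree is modeled as nested lists, with integer leaves standing in for static evaluation values.

```python
def minimax(node, maximizing):
    """Back up leaf values through the game tree.

    A node is either a leaf score (int) or a list of child nodes.
    """
    if isinstance(node, int):  # leaf: return its static evaluation value
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# The tree from the slide: leaves 9..1 under three minimizing nodes.
tree = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
# The minimizing layer backs up 7, 4, 1; the maximizing root picks 7.
print(minimax(tree, maximizing=True))  # -> 7
```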

Page 6:

Static Evaluation Function

Requirements for an evaluation function: accuracy and efficiency.

[Figure: leaf positions of the search tree, each evaluated by the static evaluation function, with example values 4 5 2 -1 6 3 7 -2 1.]

Page 7:

Features and Weights

Feature f (the number of pieces of each side, etc.)

Weight w (the weight of an important feature must be large.)

Linear: F(x) = w1·f1(x) + w2·f2(x) + … + wn·fn(x)

Non-linear: neural network, etc.
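A linear evaluation function of this form is just a dot product of the weight vector and the feature vector. A minimal sketch, with feature names and values invented purely for illustration:

```python
def evaluate(weights, features):
    """Linear static evaluation: F(x) = w1*f1(x) + ... + wn*fn(x)."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical example: material weighted heavily, mobility lightly.
weights = [1.0, 0.1]   # w1: material, w2: mobility
features = [3, 5]      # f1(x) = +3 pieces, f2(x) = 5 legal moves
print(evaluate(weights, features))  # -> 3.5
```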

Page 8:

Parameter tuning

Page 9:

Machine learning in games

In simple games like Othello and backgammon, parameter tuning by machine learning has been successful.

In complex games like Shogi, hand-crafted evaluation functions are still better. Machine learning is used only in limited domains (e.g., only the value of material).

Page 10:

Outline

1. Introduction

2. Parameter tuning

1. Supervised learning

2. Comparison training

3. Reinforcement learning

3. Conclusion

Page 11:

Supervised learning

Training sample: (position, score)

Minimize the error of the evaluation function on these positions.
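Minimizing the error on scored positions can be sketched as fitting a linear evaluation function by gradient descent on the squared error. The feature vectors and expert scores below are invented for illustration; the actual programs discussed later use neural networks and back-propagation.

```python
# (position features, expert score) pairs -- invented for illustration.
samples = [([1.0, 0.0], 2.0),
           ([0.0, 1.0], -1.0),
           ([1.0, 1.0], 1.0)]

w = [0.0, 0.0]
alpha = 0.1  # learning rate
for _ in range(500):
    for x, score in samples:
        pred = sum(wi * xi for wi, xi in zip(w, x))  # F(x) = w . x
        err = score - pred
        # Gradient step on the squared error (score - F(x))^2.
        w = [wi + alpha * err * xi for wi, xi in zip(w, x)]

# The fitted weights now reproduce the training scores.
print([round(wi, 2) for wi in w])
```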

Page 12:

Supervised learning (1)

Backgammon program [Tesauro 1989]
Scores are given by human experts. Standard back-propagation. The result was far from human expert level.

Input: position and move (459 hand-crafted Boolean features)

Output: score of the move

[Figure: feed-forward neural network with inputs In1, In2, …, In458, In459, weights w1 … w5, and a single output unit.]

Page 13:

Supervised learning

Difficulties in supplying training data by experts:

Creating a database consumes much of the experts' time.

Human experts don't think in terms of absolute scores.

Page 14:

Supervised learning (2)

Bayesian learning [Lee et al. 1988]

Training positions are labeled win or lose.

Estimate the mean feature vector and the covariance matrix for each label from the training data.

[Figure: feature space with training samples x1 … x4, the class means μwin and μlose, and a test sample classified by the nearer class.]
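In the simplest special case of this classifier, with a shared identity covariance, the decision rule reduces to comparing a test sample's distance to the two class means. The sketch below shows that nearest-mean rule; the feature vectors are invented, and the full method in [Lee et al. 1988] also estimates covariance matrices.

```python
def mean(vectors):
    """Component-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Invented training positions, as feature vectors labeled win / lose.
wins = [[2.0, 3.0], [3.0, 4.0]]
loses = [[-1.0, 0.0], [0.0, -2.0]]

mu_win, mu_lose = mean(wins), mean(loses)

def classify(x):
    """Nearest-mean rule: identity-covariance special case of the
    Gaussian classifier whose parameters are estimated on the slide."""
    return "win" if sq_dist(x, mu_win) < sq_dist(x, mu_lose) else "lose"

print(classify([2.0, 2.0]))    # -> win
print(classify([-1.0, -1.0]))  # -> lose
```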

Page 15:

Supervised learning (3)

LOGISTELLO [Buro 1998]

Builds different classifiers for different stages of the game (Othello is a game of a finite number of plies).

Training proceeds from the last stage toward the middle and first stages, because scores from the last stage are more reliable.

Page 16:

Outline

1. Introduction

2. Parameter tuning

1. Supervised learning

2. Comparison training

3. Reinforcement learning

3. Conclusion

Page 17:

Comparison training

Training sample: (position_1, position_2, which is preferable)

The evaluation function learns to satisfy the constraints of these training samples.

The expert's move is preferred above all other moves.

Page 18:

Backgammon program [Tesauro 1989]

The learned preferences are consistent and transitive:

a ≻ b ⇒ b ≺ a (consistency)

a ≻ b and b ≻ c ⇒ a ≻ c (transitivity)

Standard back-propagation. Simpler and stronger than preceding versions based on supervised learning.

[Figure: a network scores final position (a) and final position (b) and outputs which is preferable; weight sharing with W1 = W2 and W3 = −W4 enforces consistency.]
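The constraint "the preferred position must score higher" can be sketched as a perceptron-style update on a linear evaluator: whenever the expert-preferred position does not evaluate higher, the weights shift toward its features. This is only an illustration of the constraint idea, not Tesauro's comparison-trained network, and the feature vectors are invented.

```python
# Each sample says position a is preferable to position b.
# A linear evaluator F(x) = w . x is nudged until F(a) > F(b) everywhere.
pairs = [([2.0, 1.0], [1.0, 1.0]),  # (preferred features, other features)
         ([0.0, 2.0], [0.0, 1.0])]

w = [0.0, 0.0]
alpha = 0.5
for _ in range(100):
    for a, b in pairs:
        fa = sum(wi * xi for wi, xi in zip(w, a))
        fb = sum(wi * xi for wi, xi in zip(w, b))
        if fa <= fb:  # constraint violated: move weights toward a, away from b
            w = [wi + alpha * (ai - bi) for wi, ai, bi in zip(w, a, b)]

print(w)  # after training, every preferred position evaluates higher
```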

Page 19:

Comparison training

Problems of comparison training:

Is the assumption that the human expert's move is the best correct?

A program trained on experts' games will imitate a human playing style, which makes it harder for the program to surprise a human being.

Page 20:

Outline

1. Introduction

2. Parameter tuning

1. Supervised learning

2. Comparison training

3. Reinforcement learning

3. Conclusion

Page 21:

Reinforcement learning

No training information from a domain expert.

The program explores different actions.

It receives feedback (reward) from the environment: win or lose, and by what margin the program won or lost.

[Figure: the program (learner) sends an action to the environment and receives a reward and the next position.]

Page 22:

TD(λ)

Temporal Difference Learning

w_{t+1} = w_t + α (F(x_{t+1}) − F(x_t)) Σ_{k=1}^{t} λ^{t−k} ∇_w F(x_k)

wt: weight vector at time t.

F: evaluation function (a function of the weight vector w and the position x).

xt: position at time t.

α: learning rate.

λ: influence of the current evaluation-function value on the weight updates of previous moves.
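For a linear evaluation function F(x) = w·x, the gradient ∇_w F(x_k) is just the feature vector x_k, and the sum over previous positions can be accumulated incrementally as an eligibility trace. A single-game sketch under that linear assumption (terminal-reward handling omitted, positions invented):

```python
def td_lambda(positions, w, alpha=0.1, lam=0.5):
    """TD(lambda) updates over one game for a linear evaluator F(x) = w . x.

    positions: feature vectors x_0 .. x_T of successive game positions.
    """
    trace = [0.0] * len(w)              # eligibility trace: accumulates
    for t in range(len(positions) - 1):  # lam^(t-k) * grad_w F(x_k)
        x_t, x_next = positions[t], positions[t + 1]
        trace = [lam * e + xi for e, xi in zip(trace, x_t)]
        delta = (sum(wi * xi for wi, xi in zip(w, x_next))
                 - sum(wi * xi for wi, xi in zip(w, x_t)))  # F(x_t+1) - F(x_t)
        w = [wi + alpha * delta * ei for wi, ei in zip(w, trace)]
    return w

game = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # positions x_0, x_1, x_2
print(td_lambda(game, w=[0.2, -0.1]))
```

With lam=0 the trace keeps only the current position's features, and with lam=1 all earlier positions share each update equally, matching the spectrum illustrated on the next slide.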

Page 23:

Temporal Difference Learning

[Figure: positions x_{t−3} … x_{t+1} along one game. The temporal difference F(x_{t+1}) − F(x_t) is propagated back to earlier positions with weight λ^{t−k}: with λ = 0 only F(x_t) is updated; with λ = 0.5 the updates decay geometrically over F(x_{t−1}), F(x_{t−2}), F(x_{t−3}); with λ = 1 all previous positions are updated equally.]

Page 24:

Temporal Difference Learning (1)

TD-Gammon [Tesauro 1992-]

Neural network (input: raw board information). TD(λ). Self-play (300,000 games). Reached human expert level.

[Figure: two copies of the program play against each other, alternating actions.]

Page 25:

Self-play in other games

None of those successors achieved a performance as impressive as TD-Gammon's.

In the case of backgammon, the dice roll before each move ensured a sufficient variety of positions; self-play in deterministic games instead faces the exploration-exploitation dilemma.

[Figure: two copies of the program play against each other, alternating actions.]

Page 26:

Temporal Difference Learning (2)

KnightCap [Baxter et al. 1998]

Learned on an Internet chess server. 1468 features (linearly combined) × 4 game stages. TDLeaf(λ).

Page 27:

KnightCap's rating

All but the material parameters are initially set to zero.

After about 1000 games on the server, its rating rose to exceed 2150, an improvement from average amateur to strong expert.

Page 28:

Conclusion

Page 29:

Conclusion

Machine learning in games:

Successful in simple games.

Used only in limited domains in complex games such as Shogi.

Reinforcement learning is successful in stochastic games such as backgammon.