2004/11/13 GPW2004 1

What Shogi Programs Still Cannot Do - A New Test Set for Shogi -

Reijer Grimbergen and Taro Muraoka

Department of Informatics

Yamagata University

Outline

The importance of testing

Test sets for chess

Test sets for shogi

A new test set for shogi

Problem area analysis

Some new results

Differences between humans and computers

Conclusions and future work

The importance of testing

Game programming
• A program should play strongly
• More common is the reverse approach: minimize the number of bad moves
• Testing can help determine problem areas

Incremental testing
• Save positions that the program did not handle well

Drawbacks
• Test set is program-specific
• Positions selected subjectively

The importance of testing

The requirements of a test set
• Testing a wide variety of potential problem areas
• Not specific to one program

Test design in games
• Mainly done for chess
• Current test sets for shogi have shortcomings
• Shogi research is at a point where focusing the effort could be a great help

Proposing a new test set for shogi

Test sets for chess

The Bratko-Kopec test set
• 12 tactical positions and 12 strategic positions
• Designed to compare human and computer performance in chess
• Thus far, no program can solve all positions

Reinfeld’s Win at Chess
• 300 tactical positions
• Used as a first test for new programs

LCT II
• 35 positions
• Good balance between strategic, tactical and endgame positions
• An Elo rating can be calculated from the solved positions

The Lindner test set
• A set of positions that are considered hard for computers to solve

Test sets for shogi

The Matsubara-Iida test set
• 48 positions taken from professional games
• Selected by an expert player
• Aims at judging the strength of shogi programs
• First given to human players to establish a connection with playing strength

Problems with the Matsubara-Iida test set
• Program strength can be judged more accurately by playing on the Internet
• No Elo calculation like in LCT II
• Subjective selection leaves doubts about test balance
• What is difficult for computers is not necessarily difficult for humans and vice versa, so the connection with playing strength is unreliable

Test sets for shogi

Other test sets for shogi
• Yamashita’s test set (10 positions)
• Tanase’s test set (19 positions)

Problems with these test sets
• Too small
• Program-specific
• Unclear if there is only one solution

A new test set for shogi

What do we want from a test set?
1. As general as possible
2. Points to as many problem areas as possible

Find positions that cannot be solved by the best programs

Finding weaknesses instead of measuring strength

A new test set for shogi

Positions selected from Shukan Shogi
• Every week six next-move problems
• Middle game positions and endgame positions
• Different tactical themes: winning material, attack, defense and mating
• Our goal: create a test set of 100 positions

The programs we used
• AI Shogi 2003
• Todai Shogi 5
• Gekisashi 2

Conditions
• 30 seconds on 2 GHz Pentium 4
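
The selection step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Engine` interface, the `canned` stub and the problem "demo-1" are hypothetical, and the rule "keep a position only if none of the programs finds the solution move within the time limit" is our reading of the slides, not a documented procedure. The replies for problem 750-3 are the ones reported later in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Problem:
    problem_id: str   # Shukan Shogi issue and problem number, e.g. "750-3"
    solution: str     # first move of the published solution


# Hypothetical engine interface: given a problem and a time limit in
# seconds, return the move the program would play.
Engine = Callable[[Problem, float], str]


def select_unsolved(problems: List[Problem],
                    engines: Dict[str, Engine],
                    time_limit_s: float = 30.0) -> List[Problem]:
    """Keep only the problems that no engine solves within the time limit."""
    kept = []
    for problem in problems:
        replies = [engine(problem, time_limit_s) for engine in engines.values()]
        if all(reply != problem.solution for reply in replies):
            kept.append(problem)
    return kept


def canned(replies: Dict[str, str]) -> Engine:
    """Stub engine that looks up its reply in a table instead of searching."""
    return lambda problem, _limit: replies[problem.problem_id]


# "750-3" uses the replies reported in the talk; "demo-1" is made up.
problems = [Problem("750-3", "2四銀"), Problem("demo-1", "7六歩")]
engines = {
    "Todai Shogi": canned({"750-3": "1五歩", "demo-1": "7六歩"}),
    "Gekisashi": canned({"750-3": "3二角成", "demo-1": "2六歩"}),
    "AI Shogi": canned({"750-3": "3五金", "demo-1": "2六歩"}),
}
test_set = select_unsolved(problems, engines)
print([p.problem_id for p in test_set])
```

In a real run the stubs would be replaced by calls to the actual programs. Comparing only the first move is a simplification that suits next-move problems with a single published answer.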

A new test set for shogi

This was not easy!
• More than 1500 positions needed to be checked to find our test set

Additional feature
• The percentage of respondents who solved the problem is given
• This shows the differences between what is difficult for humans and what is difficult for computers

Problem area analysis

Why are the positions difficult?
• Using the analysis tools in Todai Shogi, Gekisashi and AI Shogi to find problem areas

Our first analysis indicates seven problem areas
• Horizon effect due to consecutive checks
• Not calling the tsume shogi solver deep in the search tree
• Inaccurate evaluation function
• Incorrect forward pruning
• Mate with unpromoted pieces
• Insufficient hardware speed
• Problems with time allocation

Problem area analysis: Horizon effect and tsume shogi

Problem 750-3
Solved: 16%

Solution
2四銀、1四玉(同歩、2三金、同玉、3二角成)、3五金

Program replies
• Todai: 1五歩 (losing position)
• Gekisashi: 3二角成 (White has the advantage)
• AI Shogi: 3五金

Problem area analysis: Horizon effect and tsume shogi

The problem
• Horizon checks after 2四銀、1四玉、3五金
• The same position without horizon checks can be solved by all programs

Problem area analysis: Horizon effect and tsume shogi

Another problem: tsume shogi deep in the search tree

Gekisashi with more time
• 2四銀、1四玉、3五金、7九銀、同玉、2五桂、1五歩、同馬、同銀 (-1192)
• White has a mate in 9 after 同玉 and Black has a mate in 3 after 2五桂!

Problem area analysis: Evaluation and forward pruning

Problem 755-3
Solved: 51%

Solution
2二金、同金、2三角成、3三金、同馬

Program replies
• Todai: 2一角成、4一玉、6一金 (winning position)
• Gekisashi: 6八銀、5六成銀、3七桂、6六銀、2五桂、5四歩、2一角成、4一玉 (Black is winning)
• AI Shogi: 6八銀、5八成銀、2一角成、4一玉

Problem area analysis: Evaluation and forward pruning

The problem: an incorrect evaluation
• After 2一角成、4一玉 the white king can escape, but the programs cannot assess this
• Evaluating the chances of escaping an attack is difficult?

Another problem: forward pruning
• Consecutive sacrifices 2二金 and 2三角成
• Multiple sacrifices not searched deep enough?

Problem area analysis: Unpromoted pieces

Problem 935-2
Solved: 95%

Solution
1三歩不成、2六銀直、(1四歩 is illegal) 1四玉

Program replies
• Todai: 5二と (losing position)
• Gekisashi: 8四桂 (White is winning)
• AI Shogi: resigns (!)

Problem area analysis: Unpromoted pieces

The problem here seems to be a special case of forward pruning
• Promoting a major piece or a pawn is almost always better than not promoting
• Non-promotions of these pieces are pruned to improve search efficiency

Not a high priority problem, but could have consequences for thinking in opponent time
• When there is no difference between promoting and not promoting a piece, playing the non-promotion makes the opponent program's thinking in opponent time useless
• My advice: play the non-promotion to win some time!
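
As a rough illustration of the pruning described on this slide, the sketch below shows how a move generator might drop non-promotions of pawns and major pieces. It is not taken from any of the tested programs: the piece letters, the function name and the `prune_non_promotions` switch are assumptions made for the example, and mandatory promotions (for instance a pawn reaching the last rank) are ignored.

```python
# Illustrative piece letters: P pawn, R rook, B bishop, S silver, N knight, L lance.
# Per the slide, pawns and major pieces are the ones whose non-promotion is pruned.
ALWAYS_PROMOTE = {"P", "R", "B"}


def promotion_choices(piece: str, can_promote: bool,
                      prune_non_promotions: bool = True) -> list:
    """Return the promotion flags to search for one move.

    True means "search the promoting version of the move",
    False means "search the non-promoting version".
    """
    if not can_promote:
        return [False]
    if prune_non_promotions and piece in ALWAYS_PROMOTE:
        # Forward pruning: the non-promotion is never generated, so a
        # solution such as 1三歩不成 in problem 935-2 cannot be found.
        return [True]
    return [True, False]


print(promotion_choices("P", True))                              # pruned search
print(promotion_choices("P", True, prune_non_promotions=False))  # full search
print(promotion_choices("S", True))                              # silver is never pruned
```

With pruning switched on, the pawn move is searched only as a promotion, which matches the failure on problem 935-2 reported above. Switching it off restores the missing move at the cost of a larger search tree.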

Problem area analysis: Other problem areas

Insufficient hardware speed
• Some positions could be solved by giving the program more time
• Improved hardware speed will automatically solve these positions

Time allocation
• In some positions, the programs would play very quickly
• These positions were deleted from our test set
• However, it might be a different problem area: when to cut off the search?

Problem area analysis: Overview

Problem area                      Positions
Insufficient hardware speed       31
Inaccurate evaluation function    20
Incorrect forward pruning         19
Horizon effect                    18
Tsume shogi                       11
Mate using unpromoted pieces      6
Reason unclear                    7
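
A quick tally of the overview table, written as a small Python check. The counts are copied from the table; the variable names are ours. The counts add up to more than the 100 positions in the test set, which suggests (this is an inference, not something the slides state) that some positions were attributed to more than one problem area.

```python
# Counts copied from the overview table.
problem_areas = {
    "Insufficient hardware speed": 31,
    "Inaccurate evaluation function": 20,
    "Incorrect forward pruning": 19,
    "Horizon effect": 18,
    "Tsume shogi": 11,
    "Mate using unpromoted pieces": 6,
    "Reason unclear": 7,
}

TEST_SET_SIZE = 100  # size of the proposed test set

total_labels = sum(problem_areas.values())
extra_labels = total_labels - TEST_SET_SIZE
most_common = max(problem_areas, key=problem_areas.get)

print(total_labels, extra_labels, most_common)
```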

Some new results

New program versions have been released
• Todai Shogi 6 and 7, Gekisashi 3 and AI Shogi 2004

Results of Todai 6 on the test set
• Solved 6 positions
• The problem areas of these positions were different:
  • Inaccurate evaluation function (2 positions)
  • Insufficient hardware speed (2 positions)
  • Horizon effect (1 position)
  • Reason unclear (1 position)

Differences between humans and computers

How difficult are the positions for human players?
• Almost half of the positions (46) can be solved by more than 50% of the human respondents
• There are 14 positions that cannot be solved by computers, but can be solved by more than 80% of the humans

Human percentage    Positions
0 – 10%             0
11 – 20%            12
21 – 30%            18
31 – 40%            10
41 – 50%            13
51 – 60%            16
61 – 70%            7
71 – 80%            9
81 – 90%            9
91 – 100%           5
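
The two figures quoted on this slide (46 and 14 positions) can be re-derived from the histogram. The snippet below is a small sanity check in Python; the bin counts are copied from the table and the helper name is ours. Note that the bin counts as transcribed add up to 99, one short of the 100 positions in the test set.

```python
# (lower bound %, upper bound %, number of positions), copied from the table.
human_bins = [
    (0, 10, 0), (11, 20, 12), (21, 30, 18), (31, 40, 10), (41, 50, 13),
    (51, 60, 16), (61, 70, 7), (71, 80, 9), (81, 90, 9), (91, 100, 5),
]


def positions_above(threshold: int) -> int:
    """Count positions in bins that lie entirely above the given percentage."""
    return sum(count for low, _high, count in human_bins if low > threshold)


total_positions = sum(count for _low, _high, count in human_bins)

print(positions_above(50))  # positions solved by more than 50% of respondents
print(positions_above(80))  # positions solved by more than 80% of respondents
print(total_positions)      # sum of all bins as transcribed
```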

Conclusions and future work
• We have proposed a set of 100 positions that is general and points to specific problem areas in computer shogi
• As more positions get solved, we intend to replace them with new positions
• Further investigation of the unsolved positions for which the problem area could not be determined
• Making further comparisons between what is difficult for humans and what is difficult for computers

Finally

Download the test set here

gamelab.yz.yamagata-u.ac.jp/RESEARCH/shogitestset.zip

Let me know about your results