

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering, University of Toronto


National Library of Canada

Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.



Abstract

Parallel Alpha-Beta Search on Shared Memory Multiprocessors

Valavan Manohararajah

Master of Applied Science

Graduate Department of Electrical and Computer Engineering

University of Toronto

2001

The alpha-beta algorithm is a well known method for the sequential search of game trees. Two methods, young brothers wait concept and dynamic tree splitting, have been used successfully in parallel game tree search. First, this work introduces the notion of an exponentially ordered game tree as a model for the game trees encountered in practice. Second, exponentially ordered trees are used in the study of the tree splitting methods used by young brothers wait concept and dynamic tree splitting. Finally, a new tree splitting method based on neural networks is introduced and is found to outperform the other two methods on certain types of trees.


Dedication

To my parents, Amma and Appa.


Acknowledgements

First, I would like to thank my supervisor, Professor Z. G. Vranesic, for his advice, guidance and support. His continual encouragement when I was faced with technical difficulties helped me push through and discover the right solutions. He was always willing to set aside any amount of time for a discussion on thesis matters.

I am grateful to both OGS and NSERC for providing the means to further my education.

Last, but certainly not least, I would like to thank my wife, Abiramy, for her love and support during the course of this thesis.


  5.4 Neural Network Structure and Back-Propagation
  5.5 Performance of Node Classification Schemes

6 Experiments on a Parallel Alpha-Beta Simulator
  6.1 Introduction
  6.2 A Simplified Shared Memory Multiprocessor
  6.3 Multiprocessor Simulation
  6.4 The Parallel Alpha-Beta Simulator (PABSim)
  6.5 Split-Point Selection Schemes
  6.6 Performance of Split-Point Selection Schemes

7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work

A Seeds Used For Artificial Tree Generation
  A.1 Set 1.tg
  A.2 Set 2.tg

Bibliography


Chapter 1

Introduction

The seminal paper by Claude E. Shannon [Z] introduced the concept of game trees along with a simple algorithm (MiniMax) for searching them. Shannon was primarily interested in creating a computer player for the game of chess. Here we consider the trees that arise from a particular class of games. Each game in this class has the following properties:

• There are two players involved in the game: player 1 and player 2.
• The two players take turns making moves. At any position in the game, a finite number of moves are available to the player on move.
• The game is deterministic - there are no elements of chance in the game.
• It is a game of perfect information. That is, both players know the entire state of the game at all times. For example, chess is a game of perfect information. For the duration of the game, both players know the board position. However, 2-player poker is not a game of perfect information. Although player 1 can see his cards, he cannot see the cards that player 2 holds.
• There are three possible outcomes in the game: a win for player 1, a win for player 2 or a draw. Games that do not end in a draw are also included in the class. These games have two possible outcomes: a win for player 1 or a win for player 2.

The construction and the subsequent search of game trees forms the basis of many computer programs designed to play two player strategy games. A game tree is a way of representing the possibilities that are available to the players involved in the game. The search of the game tree yields the optimal sequence of play for both sides.

Efficient algorithms for the sequential search of game trees have been in existence for a long time (since 1963 [12]). In a tree where good moves are searched before bad ones, a good sequential tree search algorithm will examine a fraction of the entire tree. On a parallel system, sequential algorithms are extended to allow several processors to share the search effort. Initial attempts at parallel search had limited success on a large number of processors [17] but more recent efforts have demonstrated that a large number of processors can be used effectively [8]. One of the biggest difficulties in parallel search is in determining where multiple processors can be used to split up the search effort. The main focus of this work is to explore alternative ways of choosing where to split the tree. A comparison of the tree splitting techniques in current use was not previously available. This work produces such a comparison using artificially generated trees. A new tree splitting technique is also introduced and its performance is compared to previously existing techniques.

Chapter 2 explores the different algorithms available for sequential tree search. Three approaches to parallel search are examined in chapter 3. Chapter 4 introduces the concept of artificially generated trees. A new node classification technique is introduced in chapter 5 and its performance is compared to existing techniques using a sequential tree searcher. Chapter 6 studies the performance of three tree splitting techniques within a parallel tree searcher.


Chapter 2

Game Tree Search

2.1 Introduction

A simple tree for a two-player game is presented in Figure 2.1. A node in the tree represents a position in the game while a branch represents a move available at a particular position. Player 1 is on move at nodes with the rectangular graphic and player 2 is on move at nodes with a circle graphic. For example, at the Root node, player 1 is on move and the player has two moves available: a and b. Each leaf node has been assigned a score that indicates how valuable that position is. A positive score indicates that player 1 is winning while a negative score indicates that player 2 is winning; a score of 0 indicates a draw. The magnitude of the score conveys important information as well. The higher the score, the more favorable a position is for player 1. Similarly, the lower the score, the more favorable a position is for player 2. Note that this scoring scheme is arbitrary and there are several possible schemes that can be used as long as the scoring scheme allows one to distinguish which of the two players is winning.

While the above discussion involves a tree for a two-player game, a game tree can be constructed for a large class of problems that doesn't involve games. All that a game tree requires is that there be two opposing forces and that they occupy alternate levels in the tree.

The problem in game tree search is to find the game tree value. The value of a game tree is the score of the leaf node that is reached when both sides exercise their best options. From a practical viewpoint, what one really needs to find is the option at the root that leads to the game tree value. In the case of the tree in Figure 2.1, the best option is the move that gives player 1 the best chance of winning assuming that both players are playing perfectly. Keeping track of the path that leads to the game tree value is a trivial enhancement to an algorithm that determines this value. In the discussions that follow, for simplicity it is assumed that the discovery of the game tree value is equivalent to the discovery of the path that leads to that value.


Figure 2.1: A simple game tree.

Consider the problem of finding the game tree value for the tree in Figure 2.1. Ideally, player 1 wants to guide the game towards position T because that has the highest score from his standpoint. Assume that player 1 plays move b in order to start making progress toward position T. As far as player 2 is concerned, move f plays right into player 1's hands. Player 2 obtains a better position by playing move g. If player 2 chooses move g then player 1 follows it up with move u and the final position, U, has a score of -1. Thus, player 1's original plan of guiding the game towards T can be easily thwarted by a careful player 2. Consider what happens when player 1 chooses move a. Now, if player 2 picks c then player 1 chooses i and we obtain a score of 7. However, if player 2 plays d then player 1 chooses l and the final position has a score of 4. A similar analysis on e shows that it leads to a score of 5. Clearly, player 2 should choose d. So the sequence of moves, assuming perfect play by both sides, is a, d, l. This sequence will be referred to as the principal variation. The term principal variation is used to describe the sequence of moves that lead to the game tree value. For the tree in Figure 2.1, the game tree value is 4.
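The walk-through above can be reproduced with a short Python sketch. The nested-dictionary tree is an assumption made for illustration: the leaf scores below move a follow Table 2.1 where they are legible (the score at j is a guess), and the scores below move b are invented apart from the -1 at U and the requirement that T carries the highest score. The name `minimax_pv` is hypothetical and does not come from the thesis.

```python
# Hypothetical encoding of a tree shaped like Figure 2.1: inner nodes are
# dicts keyed by move label, leaves are integer scores (player 1's view).
TREE = {
    "a": {
        "c": {"i": 7, "j": 2, "k": -9},    # score at j is assumed
        "d": {"l": 4, "m": -2, "n": -3},
        "e": {"o": 5, "p": -1, "q": 3},
    },
    "b": {                                 # scores below b are assumed
        "f": {"r": 3, "s": 6, "t": 8},     # t leads to position T
        "g": {"u": -1, "v": -4, "w": -7},  # u leads to position U
        "h": {"x": 2, "y": 1, "z": 0},
    },
}


def minimax_pv(node, maximizing=True):
    """Return (value, principal variation) for the subtree rooted at node."""
    if not isinstance(node, dict):          # leaf: the score is already known
        return node, []
    best_value, best_line = None, []
    for move, child in node.items():        # left-to-right expansion
        value, line = minimax_pv(child, not maximizing)
        better = (
            best_value is None
            or (maximizing and value > best_value)
            or (not maximizing and value < best_value)
        )
        if better:
            best_value, best_line = value, [move] + line
    return best_value, best_line


value, pv = minimax_pv(TREE)
print(value, pv)
```

Under these assumed scores the sketch reports a game tree value of 4 along the moves a, d, l, which agrees with the analysis in the text.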

2.2 Definitions

All algorithms described in this work expand the branches at a node in a left to right order. A branch or child node, m, is said to come before another branch or child node, n, if m is to the left of n. A branch or child node is said to be the first at a node if it is in the leftmost position.

In addition to the search order defined above, the algorithms search the tree in a depth first manner. Depth first techniques for game tree search require very little memory and the memory requirement does not grow exponentially with the tree size.


MINIMAX(node)
  if node.depth = 0 then
    return EVALUATE(node)
  if node.type = max then
    score ← -∞
  else
    score ← +∞
  for i ← 1 to node.branch.length
    new_node ← TRAVERSE(node, node.branch[i])
    value ← MINIMAX(new_node)
    if node.type = max then
      if value > score then
        score ← value
    else
      if value < score then
        score ← value
  return score

Figure 2.2: The mini-max algorithm.

2.3 The Mini-Max Algorithm

In the tree of Figure 2.1, at the nodes where player 1 is on move, player 1 will select the move that maximizes his/her score. Similarly, at the nodes where player 2 is on move, player 2 will select the move that minimizes his/her score. Thus, we can classify the nodes of a game tree as being one of two types: maximizing or minimizing. This observation leads directly to the mini-max algorithm in Figure 2.2.

Depending on whether a node is maximizing or minimizing, the algorithm keeps track of the largest or the smallest score, respectively. A leaf node is reached when the remaining depth (node.depth) is equal to zero. At a leaf node, the EVALUATE function is called to determine the score associated with the node.

In performing its work, the mini-max algorithm explores every node in the game tree. Consider the application of this algorithm to chess. On average, a chess position has 32 possible moves (refer to Section 4.3.2). A tree of depth n would contain 32^n leaf nodes. Clearly, the mini-max algorithm is not practical for a chess tree when the depth exceeds 5.
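The growth in the number of leaves can be checked with a few lines of Python. This is only an arithmetic illustration that assumes a uniform tree with the branching factor of 32 quoted above; the helper name `leaf_count` is hypothetical.

```python
def leaf_count(branching: int, depth: int) -> int:
    """Leaves in a uniform tree with the given branching factor and depth."""
    return branching ** depth


# Branching factor 32 is the chess average quoted in the text.
for depth in range(1, 7):
    print(f"depth {depth}: {leaf_count(32, depth):,} leaves")
```

At depth 5 the count is 33,554,432, and one more level brings it past one billion, which is why exhaustive mini-max search becomes impractical so quickly.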


NEGAMAX(node)
  if node.depth = 0 then
    return EVALUATENEGAMAX(node)
  score ← -∞
  for i ← 1 to node.branch.length
    new_node ← TRAVERSE(node, node.branch[i])
    value ← -NEGAMAX(new_node)
    if value > score then
      score ← value
  return score

Figure 2.3: The nega-max algorithm.

2.4 The Nega-Max Formulation

The mini-max algorithm can be simplified by eliminating the distinction between maximizing and minimizing nodes. By simply negating the result returned from the recursive call in Figure 2.2, each node can be treated as a maximizing node. However, another modification is necessary to achieve the same result as the mini-max algorithm. At a leaf node, the EVALUATE function has to return a score from the viewpoint of the player on move. For example, consider the case of a leaf node that has a score of -6 in the original mini-max scheme. The score indicates that the position favors player 2. In the new scheme, if player 1 is on move, a score of -6 would be returned. However, if player 2 is on move, then a score of 6 would be returned. The evaluation function that implements this functionality will be referred to as EVALUATENEGAMAX. Figure 2.3 illustrates the nega-max algorithm [13] that is obtained when one implements the changes described.
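The equivalence between the two formulations can be illustrated with a small Python sketch. The tree below is an assumed stand-in for the subtree at node A of Figure 2.1 (player 2 on move at its root, scores stored from player 1's viewpoint, score at j guessed), and the function names are hypothetical. Because player 2 is on move at the root of this subtree, the nega-max result is the negation of the mini-max value.

```python
# Assumed subtree at node A; leaf scores are from player 1's viewpoint.
TREE = {
    "c": {"i": 7, "j": 2, "k": -9},
    "d": {"l": 4, "m": -2, "n": -3},
    "e": {"o": 5, "p": -1, "q": 3},
}


def minimax(node, maximizing):
    """Plain mini-max with explicit maximizing and minimizing nodes."""
    if not isinstance(node, dict):
        return node
    values = [minimax(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)


def evaluate_negamax(score, player1_on_move):
    """Leaf score from the viewpoint of the player on move."""
    return score if player1_on_move else -score


def negamax(node, player1_on_move):
    """Every node maximizes; child results are negated on the way up."""
    if not isinstance(node, dict):
        return evaluate_negamax(node, player1_on_move)
    score = float("-inf")
    for child in node.values():
        value = -negamax(child, not player1_on_move)
        if value > score:
            score = value
    return score


print(minimax(TREE, maximizing=False))       # value seen by player 1
print(negamax(TREE, player1_on_move=False))  # value seen by player 2
```

The first call yields 4 and the second yields -4: the same position, scored once for player 1 and once for the player on move.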

2.5 The Alpha-Beta Algorithm

A closer examination of the mini-max algorithm reveals possible enhancements to the basic technique. Table 2.1 illustrates the progress of the algorithm on the tree of Figure 2.1. It is assumed that the expansion at each node proceeds in a left to right order. We start at the Root node, which initially has a score of -∞, and the exploration begins with branch a. Node A starts with a score of +∞ since it is a minimizing node. The process of recursive calls continues until leaf node I is expanded. Here the recursion stops and a value of 7 is returned. At node C, the old value, -∞, is overwritten with the 7 that is returned by node I. Eventually, node A obtains a score of 4 at step 18. Search at node A proceeds with the traversal of branch e. Node E, being a maximizing node, has an initial score of -∞. On exploring branch o, node E obtains a score of 5. This is where the enhancement can be made. Since node E is a maximizing node, the score can only go higher than 5. However, it is also known that at node A, a minimizing node, the score is 4. Node A will reject any value that is greater than or equal to 4. Thus, the unexplored branches rooted at node E can be eliminated from the search since they will have no effect on the score at node A. Such an elimination is termed a cut-off in game tree parlance.

There must be communication between the adjacent levels in the tree in order to determine when search at a node becomes no longer necessary. The enhanced version of the mini-max algorithm, which is referred to as the weak alpha-beta algorithm [El], is illustrated in Figure 2.4. The best score obtained at any node is passed down to the successor as a bound so that cut-offs can be made.

The weak alpha-beta algorithm still misses some cut-offs. Consider the application of the algorithm to the tree of arbitrary depth as illustrated in Figure 2.5. After node A has been explored, the Root will have a score of 4. Node B will receive 4 as a bound from the Root. Since no search has been done at node B, node C receives an infinite bound. Similarly node E also receives an infinite bound from node C. However, the bound that was applied at node B still applies to node E since the Root will reject any score that is less than or equal to 4. In this particular case, additional cut-offs can be made during the search of the subtree rooted at E if the highest score obtained at a maximizing node is carried downwards as a bound. A similar argument can be made for the smallest score at minimizing nodes. Thus, the algorithm can be enhanced further by maintaining two bounds:

• alpha (lower bound): Keeps track of the highest score obtained at a maximizing node higher up in the tree and is used to perform cut-offs at minimizing nodes.
• beta (upper bound): Keeps track of the lowest score obtained at a minimizing node higher up in the tree and is used to perform cut-offs at maximizing nodes.
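The effect of carrying both bounds down the tree can be seen in a small Python sketch. The tree is a hypothetical one in the spirit of Figure 2.5 (its leaf scores are invented), and the two functions are informal transcriptions of the single-bound and two-bound searches rather than the thesis implementation. Each function records the leaf scores it evaluates so that the missed cut-offs become visible.

```python
import math

# Hypothetical tree: nested lists are inner nodes, integers are leaf
# scores, and the root is a maximizing node.
TREE = [
    [4, 6],                 # node A (min): establishes a score of 4
    [[[3, 9, 8], [2, 7]]],  # node B (min) -> node C (max) -> two min nodes
]


def weak_alpha_beta(node, bound, maximizing, visited):
    """Single-bound search; visited collects the evaluated leaf scores."""
    if isinstance(node, int):
        visited.append(node)
        return node
    score = -math.inf if maximizing else math.inf
    for child in node:
        value = weak_alpha_beta(child, score, not maximizing, visited)
        if maximizing:
            if value >= bound:
                return bound
            score = max(score, value)
        else:
            if value <= bound:
                return bound
            score = min(score, value)
    return score


def alpha_beta(node, alpha, beta, maximizing, visited):
    """Two-bound search: alpha and beta are both passed to every successor."""
    if isinstance(node, int):
        visited.append(node)
        return node
    score = alpha if maximizing else beta
    for child in node:
        if maximizing:
            value = alpha_beta(child, score, beta, False, visited)
            if value >= beta:
                return beta
            score = max(score, value)
        else:
            value = alpha_beta(child, alpha, score, True, visited)
            if value <= alpha:
                return alpha
            score = min(score, value)
    return score


weak_leaves, full_leaves = [], []
print(weak_alpha_beta(TREE, math.inf, True, weak_leaves), weak_leaves)
print(alpha_beta(TREE, -math.inf, math.inf, True, full_leaves), full_leaves)
```

Both searches return 4 on this tree. The single-bound search evaluates six of the seven leaves, while the two-bound search stops after four because the bound of 4 established at the root is still in force three levels down.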

The resulting technique is referred to as the alpha-beta [12] algorithm and is summarized in Figure 2.6. To determine the game tree value, the algorithm is invoked with the call ALPHABETA(Root, -∞, +∞). The pair of numbers, (-∞, +∞), defines the search window.

An illustration of the cut-offs achieved by the alpha-beta algorithm is shown in Figure 2.7. At node E, branches p and q are eliminated. When the search arrives at node B, there is a lower bound of 4. On exploring F, node B obtains a score of 8. A similar exploration of G yields -1. This falls below the lower bound, so the search is terminated at node B; node H is then cut off.
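The cut-offs described above can be traced with a Python sketch of the two-bound search. The tree is an assumed reconstruction of Figure 2.1: scores below move a follow Table 2.1 where legible (the score at j is a guess), while the scores below move b are invented except for the 8 reached through F and the -1 at U. The helper names are hypothetical.

```python
import math

# Assumed reconstruction of the tree in Figure 2.1 (player 1's viewpoint).
TREE = {
    "a": {
        "c": {"i": 7, "j": 2, "k": -9},
        "d": {"l": 4, "m": -2, "n": -3},
        "e": {"o": 5, "p": -1, "q": 3},
    },
    "b": {
        "f": {"r": 3, "s": 6, "t": 8},
        "g": {"u": -1, "v": -4, "w": -7},
        "h": {"x": 2, "y": 1, "z": 0},
    },
}


def alpha_beta(node, alpha, beta, maximizing, visited):
    """Two-bound search; visited records every move that is explored."""
    if not isinstance(node, dict):
        return node
    score = alpha if maximizing else beta
    for move, child in node.items():
        visited.append(move)
        if maximizing:
            value = alpha_beta(child, score, beta, False, visited)
            if value >= beta:
                return beta
            score = max(score, value)
        else:
            value = alpha_beta(child, alpha, score, True, visited)
            if value <= alpha:
                return alpha
            score = min(score, value)
    return score


def all_moves(node):
    """Yield every move label in the tree in left-to-right, depth-first order."""
    if isinstance(node, dict):
        for move, child in node.items():
            yield move
            yield from all_moves(child)


visited = []
value = alpha_beta(TREE, -math.inf, math.inf, True, visited)
skipped = [move for move in all_moves(TREE) if move not in visited]
print(value, skipped)
```

With these assumed scores the search returns 4 and never explores p, q, or anything below h, matching the cut-offs at nodes E and B described in the text.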


Step  Node  Action
1     Root  Explore a
2     A     Explore c
3     C     Explore i
4     I     Return 7
5     C     Explore j
6     J     Return '1
7     C     Explore k
8     K     Return -9
9     C     Return 7
10    A     Explore d
11    D     Explore l
12    L     Return 4
13    D     Explore m
14    M     Return -2
15    D     Explore n
16    N     Return -3
17    D     Return 4
18    A     Explore e
19    E     Explore o
20    O     Return 5
21    E     Explore p
22    P     Return -1
23    E     Explore q
24    Q     Return 3
25    E     Return 5
26    A     Return 4

Table 2.1: Partial analysis of how the mini-max algorithm explores the tree of Figure 2.1.


WEAKALPHABETA(node, bound)
  if node.depth = 0 then
    return EVALUATE(node)
  if node.type = max then
    score ← -∞
  else
    score ← +∞
  for i ← 1 to node.branch.length
    new_node ← TRAVERSE(node, node.branch[i])
    value ← WEAKALPHABETA(new_node, score)
    if node.type = max then
      if value ≥ bound then
        return bound
      if value > score then
        score ← value
    else
      if value ≤ bound then
        return bound
      if value < score then
        score ← value
  return score

Figure 2.4: The weak alpha-beta algorithm.


Figure 2.5: A game tree where the weak alpha-beta algorithm misses cut-offs.

The alpha-beta algorithm can be simplified using the nega-max formulation in a manner similar to the simplification of the mini-max algorithm. The reformulated algorithm is presented in Figure 2.8. Each node is treated as a maximizing node, thus beta is used as the bound that determines when cut-offs are possible. Furthermore, when making a recursive call, the bounds are reversed (the lower bound becomes the upper bound and vice versa) and negated. This allows the sub-node to be treated as a maximizing node. Note that evaluations are computed by EVALUATENEGAMAX as required by the nega-max scheme.
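A Python sketch of the nega-max form follows. The tree is an assumed reconstruction of Figure 2.1 with partly invented scores (stored from player 1's viewpoint), and the function name and the `stats` counter are hypothetical additions used only to count leaf evaluations.

```python
import math

# Assumed reconstruction of Figure 2.1; scores are from player 1's viewpoint.
TREE = {
    "a": {
        "c": {"i": 7, "j": 2, "k": -9},
        "d": {"l": 4, "m": -2, "n": -3},
        "e": {"o": 5, "p": -1, "q": 3},
    },
    "b": {
        "f": {"r": 3, "s": 6, "t": 8},
        "g": {"u": -1, "v": -4, "w": -7},
        "h": {"x": 2, "y": 1, "z": 0},
    },
}


def alpha_beta_negamax(node, alpha, beta, player1_on_move, stats):
    """Nega-max alpha-beta: every node maximizes for the player on move."""
    if not isinstance(node, dict):
        stats["leaves"] += 1
        # Leaf score from the viewpoint of the player on move.
        return node if player1_on_move else -node
    for child in node.values():
        # Swap and negate the bounds for the opponent, then negate the result.
        value = -alpha_beta_negamax(
            child, -beta, -alpha, not player1_on_move, stats
        )
        if value >= beta:
            return beta
        if value > alpha:
            alpha = value
    return alpha


stats = {"leaves": 0}
print(alpha_beta_negamax(TREE, -math.inf, math.inf, True, stats), stats)
```

On this tree the search returns 4 after evaluating 13 of the 18 leaves. A single comparison against beta handles cut-offs at every level, which is the simplification the reformulation is meant to provide.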

The efficiency of the alpha-beta algorithm is dependent on the order in which the branches are searched. If branches that lead to high scores (or low scores in the case of a minimizing node) are expanded first, then tighter bounds will be obtained for the rest of the search. This will result in a higher number of cut-offs. For some problems, the quality of a branch is not known until the leaves are reached, thus it is difficult to control the algorithm so that it searches "good" branches first. However, in many cases, an educated guess can be made regarding the quality of a branch from some preliminary information. Where such information is available, the efficiency of the search is greatly enhanced if the branches are explored in order of decreasing quality.
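The sensitivity to branch order can be demonstrated on a toy example. The Python sketch below searches the same hypothetical depth-2 tree twice, once with the best branches listed first and once with every branch list reversed, and counts the leaves evaluated. The leaf scores and helper names are invented for illustration.

```python
import math

# Hypothetical depth-2 tree: the root maximizes, its children minimize.
# Children and leaves are listed best-first for the player on move.
GOOD_ORDER = [[4, 7, 9], [1, 5, 8], [2, 3, 6]]


def reverse_order(node):
    """Return a copy of the tree with every branch list reversed."""
    if isinstance(node, int):
        return node
    return [reverse_order(child) for child in reversed(node)]


def alpha_beta(node, alpha, beta, maximizing, counter):
    """Two-bound alpha-beta; counter[0] counts the leaves evaluated."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    score = alpha if maximizing else beta
    for child in node:
        if maximizing:
            value = alpha_beta(child, score, beta, False, counter)
            if value >= beta:
                return beta
            score = max(score, value)
        else:
            value = alpha_beta(child, alpha, score, True, counter)
            if value <= alpha:
                return alpha
            score = min(score, value)
    return score


def search(tree):
    """Run a full-window search and return (value, leaves evaluated)."""
    counter = [0]
    value = alpha_beta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]


print(search(GOOD_ORDER))                 # best branches first
print(search(reverse_order(GOOD_ORDER)))  # worst branches first
```

Both runs return the value 4, but the best-first ordering evaluates 5 leaves while the reversed ordering evaluates all 9.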

2.6 A Perfectly Ordered Game Tree and Node Classification

Consider the game tree in Figure 2.9. It is a revised version of the tree in Figure 2.1. The branches have been arranged so that the best branch at each node appears in the leftmost position. Such a tree is perfectly ordered for an alpha-beta search that expands branches in a left to right order. The nodes of a perfectly ordered game tree can be classified into three types


ALPHABETA(node, alpha, beta)
  if node.depth = 0 then
    return EVALUATE(node)
  if node.type = max then
    score ← alpha
  else
    score ← beta
  for i ← 1 to node.branch.length
    new_node ← TRAVERSE(node, node.branch[i])
    if node.type = max then
      value ← ALPHABETA(new_node, score, beta)
      if value ≥ beta then
        return beta
      if value > score then
        score ← value
    else
      value ← ALPHABETA(new_node, alpha, score)
      if value ≤ alpha then
        return alpha
      if value < score then
        score ← value
  return score

Figure 2.6: The alpha-beta algorithm.


Figure 2.7: Cut-offs made by the alpha-beta algorithm.

ALPHABETA(node, alpha, beta)
  if node.depth = 0 then
    return EVALUATENEGAMAX(node)
  for i ← 1 to node.branch.length
    new_node ← TRAVERSE(node, node.branch[i])
    value ← -ALPHABETA(new_node, -beta, -alpha)
    if value ≥ beta then
      return beta
    if value > alpha then
      alpha ← value
  return alpha

Figure 2.8: The alpha-beta algorithm reformulated using the nega-max scheme.


Figure 2.10: Cut-offs achieved by the alpha-beta algorithm on the tree of Figure 2.9.

[12. 171:

ir Tj-pe I or Principal Variation (PV) Nocles

+ Type 2 or CUT Nodes

0 Type 3 or ALL Nocles

The type of each rlocle is iriclicatecl in Figure '2.10. The figure czlso illustrates the cut-offs made

by an alpha-beta a lgo r i t h on the perfectly ordered tree-

2.6.1 Type 1 or PV Nodes

In a perfectly ordered tree, the first sequence of moves searched by the alpha-beta algorithm is also the principal variation. This is the case for any perfectly ordered tree. Each node of the principal variation is described as being type 1 or PV.

Recall that a full alpha-beta search is initiated with the call ALPHABETA(Root, -∞, +∞). Each PV node receives infinite lower and upper bounds. This stems from the fact that PV nodes constitute the start of the search and scores have yet to be established. Since the bounds are infinite, a cut-off never occurs at a PV node and all branches are searched. However, the search order at a PV node is important. For example, consider a PV node of the maximizing type. Initially the lower bound is -∞ and the upper bound is +∞. The score returned by the first branch will be used as the lower bound for the next branch to be expanded. Clearly, if the score returned by the first branch is the highest possible at that node, the bound will not change for the duration of the search at that node. Furthermore, the search benefits from the high lower bound that was established right at the start.

The first successor to a PV node is also a PV node while the other successors are CUT nodes.

2.6.2 Type 2 or CUT Nodes

CUT nodes are successors to PV nodes and ALL nodes. Since a CUT node is not the first successor at a PV node, it will have a bound as established by the PV node's first branch. Once again, consider a PV node of the maximizing type. The second branch to be expanded at the PV node will lead to a minimizing CUT node. The score returned by the PV node's first branch serves as the lower bound at the CUT node. Note that the CUT node does not have an upper bound since that was never determined at the maximizing PV node - a PV node only determines one of the two bounds. Since the tree being searched is perfectly ordered, the first branch searched at a CUT node immediately leads to a cut-off. Just as with PV nodes, search order is important at CUT nodes.

The first successor to a CUT node is an ALL node whereas the rest of the successors are cut-off.

2.6.3 Type 3 or ALL Nodes

An ALL node is a successor to a CUT node. Being the first branch at a CUT node, the ALL node obtains the same bound information as its parent. Consider the example of a minimizing CUT node. The CUT node will have a valid lower bound, however, the upper bound will be infinite. The first branch to be expanded at the CUT node leads to a maximizing ALL node. Since the upper bound is infinite at the ALL node, no cut-offs can be made and all branches are searched. Due to the perfect ordering of the tree, the scores returned by the ALL node's successors are not high enough to increase the lower bound - the scores returned are worse than that established by the PV node higher up in the tree. In fact, the search order at an ALL node is irrelevant.

The successors at an ALL node are all CUT nodes.
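The successor rules for the three node types are enough to count how much of a perfectly ordered tree an alpha-beta search visits. The Python sketch below applies them to a uniform tree; the uniform branching factor, the function name `classify` and the extra "leaf" counter are assumptions made for the illustration. The leaf counts it produces agree with the closed form b^ceil(d/2) + b^floor(d/2) - 1 that is commonly quoted for perfectly ordered uniform trees of branching factor b and depth d.

```python
from collections import Counter


def classify(branching, depth, node_type="PV", counts=None):
    """Count the node types visited in a perfectly ordered uniform tree."""
    if counts is None:
        counts = Counter()
    counts[node_type] += 1
    if depth == 0:
        counts["leaf"] += 1                # visited leaf, whatever its type
        return counts
    if node_type == "PV":
        # First successor is PV, the remaining successors are CUT nodes.
        child_types = ["PV"] + ["CUT"] * (branching - 1)
    elif node_type == "CUT":
        # First successor is an ALL node; the rest are cut off.
        child_types = ["ALL"]
    else:
        # Every successor of an ALL node is a CUT node.
        child_types = ["CUT"] * branching
    for child_type in child_types:
        classify(branching, depth - 1, child_type, counts)
    return counts


for depth in range(1, 5):
    counts = classify(3, depth)
    print(depth, counts["leaf"], 3 ** depth)
```

For a branching factor of 3 and a depth of 4, only 17 of the 81 leaves are visited.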

2.7 Aspiration Search

When the alpha-beta algorithm is invoked with the call ALPHABETA(Root, alpha, beta), it returns a value between alpha and beta - note that alpha and beta are valid return values as well. Normally, to determine the game tree value, one would make a call to the algorithm with alpha and beta set to -∞ and +∞ respectively. Consider the case where the final game tree value is known with some certainty. Greater efficiency can be achieved by employing aspiration search [16, 15]. Figure 2.11 illustrates the process. The estimated game tree value is V. An error factor e is used to determine the two initial bounds alpha and beta. The lower bound is set at one error factor below V and the upper bound is set at one error factor above V. When the alpha-beta algorithm is called with these bounds, the return value is one of three types:

• A return value between alpha and beta: In this case the game tree value has been determined and further search is not necessary. The tree search was highly efficient due to the narrow search window.

• A return value equal to alpha: The search has failed-low. The real game tree value is not known; all that is known is that the tree value is less than or equal to alpha.

• A return value equal to beta: The search has failed-high. The game tree value is greater than or equal to beta.

In a fail-low or fail-high situation, a new search must be performed with new bounds in order to determine the real tree value. During a new search, many of the nodes that were visited by the initial search may be revisited. Clearly, the new search reduces efficiency; however, note that this situation only arises when the original search fails - a situation that does not arise too often since there is an estimate for the final game tree value.
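The three return cases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is a nested list whose leaves are integer scores from the point of view of the side to move (nega-max convention), and the names `alphabeta` and `aspiration` are hypothetical, not taken from the thesis.

```python
import math


def alphabeta(node, alpha, beta):
    """Fail-hard nega-max alpha-beta; the result always lies in [alpha, beta]."""
    if not isinstance(node, list):          # leaf: score for the side to move
        return max(alpha, min(beta, node))
    for child in node:
        value = -alphabeta(child, -beta, -alpha)
        if value >= beta:
            return beta                     # cut-off
        if value > alpha:
            alpha = value
    return alpha


def aspiration(root, estimate, error):
    """Search with a narrow window around the estimate, re-searching on failure."""
    alpha, beta = estimate - error, estimate + error
    score = alphabeta(root, alpha, beta)
    if score == beta:                       # failed high: true value >= beta
        score = alphabeta(root, beta, math.inf)
    elif score == alpha:                    # failed low: true value <= alpha
        score = alphabeta(root, -math.inf, alpha)
    return score
```

A good estimate settles the value in one narrow search, while a poor estimate costs one extra search with a half-open window.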

2.8 The Negascout Algorithm

As mentioned in Section 2.5, when there is some preliminary "quality" information about the branches at a node, the alpha-beta algorithm benefits from the expansion of the branches in order of decreasing quality. If this preliminary information can be used to predict which of the branches is the best at a node with reasonable accuracy, a technique similar to aspiration search can be employed recursively to obtain a highly efficient tree searcher.


    alpha ← V - e
    beta ← V + e
    score = ALPHABETA(Root, alpha, beta)
    if score = beta then
        score = ALPHABETA(Root, beta, +∞)
    elsif score = alpha then
        score = ALPHABETA(Root, -∞, alpha)

Figure 2.11: Aspiration search.

The negascout algorithm [20]¹ is illustrated in Figure 2.12. Only the first branch at each node is searched with the full window. The rest of the branches are searched with a null-window. A null-window describes the case where alpha and beta are separated by 1 unit. In this case no real search is performed - the null-window search amounts to a test on the subtree to determine whether its value is less than or equal to alpha or greater than or equal to beta. In the negascout algorithm, after searching the first branch with the full window, the rest of the branches are simply tested to see whether they have a value that exceeds the best score so far at that node. If any of the branch tests fails-high, a new search is performed with an expanded window to establish the true score of that particular branch.
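The null-window test and re-search described above can be sketched in Python as follows. The sketch assumes integer scores (so that a window of width 1 contains no interior value) and a tree given as a nested list with leaf scores from the point of view of the side to move. The function name and tree representation are illustrative assumptions; like the simplified version in the text, it is fail-hard.

```python
import math


def negascout(node, alpha, beta):
    """Fail-hard negascout over a nested-list tree with integer leaf scores."""
    if not isinstance(node, list):              # leaf
        return max(alpha, min(beta, node))
    # The first branch is searched with the full window.
    value = -negascout(node[0], -beta, -alpha)
    if value >= beta:
        return beta
    if value > alpha:
        alpha = value
    for child in node[1:]:
        # Null-window test: is this branch better than the best score so far?
        value = -negascout(child, -alpha - 1, -alpha)
        if alpha < value < beta:
            # The test failed high: re-search to establish the true score.
            value = -negascout(child, -beta, -alpha - 1)
        if value >= beta:
            return beta
        if value > alpha:
            alpha = value
    return alpha
```

When the first branch really is the best, every later branch is dismissed by a cheap null-window test and no re-search is needed.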

2.9 Iterative Deepening

Rather than tackle the entire tree at once, depending on the application, it may be advantageous to determine the tree value in steps. First, the game tree value and principal variation for a tree of depth one is determined. Then, a similar process is carried out on a tree of depth two. This iteratively deepening process [16, 15] continues until the tree of the required depth has been searched. There are two advantages to this scheme. In a situation where there is a time restriction on the search, the search of the entire tree in a single pass may be too time consuming. However, if the search is carried out in steps, even if the search needs to be stopped at some depth, the previous iteration provides a reasonable solution albeit at a lower search depth. The second advantage is that each iteration can collect useful information about the tree for the next iteration. Consider some examples of this iteration-to-iteration information:

¹The algorithm presented here is a simplified version of the one presented in [20]. In particular, this version does not make use of the fail-soft extension and performs new searches even when the height of the subtree is less than two.


NEGASCOUT(node, alpha, beta)
    if node.depth = 0 then
        return EVALUATENEGAMAX(node)
    new_node ← TRAVERSE(node, node.branch[1])
    value ← -NEGASCOUT(new_node, -beta, -alpha)
    if value ≥ beta then
        return beta
    if value > alpha then
        alpha ← value
    for i ← 2 to node.branch_length
        new_node ← TRAVERSE(node, node.branch[i])
        value ← -NEGASCOUT(new_node, -alpha - 1, -alpha)
        if value > alpha and value < beta then
            value ← -NEGASCOUT(new_node, -beta, -alpha - 1)
        if value ≥ beta then
            return beta
        if value > alpha then
            alpha ← value
    return alpha

Figure 2.12: The negascout algorithm.


    node ← Root
    V ← initial_estimate
    for d ← 1 to depth
        node.depth ← d
        alpha ← V - e
        beta ← V + e
        score = ALPHABETA(node, alpha, beta)
        if score = beta then
            score = ALPHABETA(node, beta, +∞)
        elsif score = alpha then
            score = ALPHABETA(node, -∞, alpha)
        V ← score

Figure 2.13: An iteratively deepening alpha-beta search that uses the game tree value from the previous iteration in an aspiration search of the current iteration.

• The principal variation from the previous iteration is usually a good indicator of what the principal variation from the current iteration will look like. Search efficiency usually improves if the branches from the last principal variation are expanded first.

• The game tree value from the last iteration is a good estimate of what the game tree value will be at the end of the current iteration. Therefore, the game tree value from the previous iteration can be used to guide an aspiration search of the tree to the depth required by the current iteration. Figure 2.13 combines iterative deepening with aspiration search. Aspiration search uses a small window centered around the game tree value from the preceding iteration for the search during the current iteration.

• The previous iteration may have retained quality information about certain key branches in the tree. The nature of this information is usually application dependent. This information is then used by the current iteration to determine the ordering of the branches at a node so that good branches will be expanded first.
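The combination of iterative deepening with aspiration search described above might be sketched in Python as follows. The static evaluator `evaluate` is a hypothetical placeholder (exact at leaves, a neutral 0 for interior nodes cut off by the depth limit), and the nested-list tree with nega-max leaf scores is likewise an assumption made only for this illustration.

```python
import math


def evaluate(node):
    """Hypothetical static evaluator: exact at leaves, neutral 0 elsewhere."""
    return node if not isinstance(node, list) else 0


def alphabeta(node, depth, alpha, beta):
    """Depth-limited, fail-hard nega-max alpha-beta."""
    if depth == 0 or not isinstance(node, list):
        return max(alpha, min(beta, evaluate(node)))
    for child in node:
        value = -alphabeta(child, depth - 1, -beta, -alpha)
        if value >= beta:
            return beta
        if value > alpha:
            alpha = value
    return alpha


def iterative_deepening(root, max_depth, initial_estimate, error):
    """Each iteration aspirates around the value found by the previous one."""
    v = initial_estimate
    for depth in range(1, max_depth + 1):
        alpha, beta = v - error, v + error
        score = alphabeta(root, depth, alpha, beta)
        if score == beta:                   # failed high
            score = alphabeta(root, depth, beta, math.inf)
        elif score == alpha:                # failed low
            score = alphabeta(root, depth, -math.inf, alpha)
        v = score
    return v
```

If the search has to be stopped early, the value of `v` from the last completed iteration is still available as a shallower answer.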


Chapter 3

Parallel Alpha-Beta Search

3.1 Introduction

Parallelization of the alpha-beta algorithm and its variants has proven to be difficult. The overheads in a parallel implementation of the algorithm can be classified into three categories [21]:

• Communication overhead

• Synchronization overhead

• Search overhead

First, consider the communication overhead. Communication between processors is normal in any parallel algorithm; however, there is one type of communication that is unique to parallel alpha-beta. The sequential alpha-beta algorithm updates its two bounds, alpha and beta, as the search of a game tree progresses. When searching in parallel, if one processor finds an improvement to alpha or beta, it informs the other processors working below that node so that they can make use of the tighter bound that was just discovered.

Synchronization overhead results when a processor sits idle while waiting for some event to occur. For example, if four processors, P1, P2, P3 and P4, are working together at a node, processors P2, P3 and P4 may be waiting for the result of a search being conducted by processor P1 on some branch at that node.

Search overhead is a consequence of the parallel alpha-beta algorithm examining nodes that would have been avoided by the sequential version. When parallel search is initiated at a node, the best score might not have been discovered as yet. As a result, parallel search is conducted with a wider window than in the sequential case. Furthermore, after parallel search has been initiated, one processor may discover that search is no longer necessary at the node due to a cut-off condition - the other processors have essentially performed useless work.


The three overheads are not independent of each other; they are related in a complex manner. Let us consider a few examples of this complex relationship. To reduce search overhead, the processors may continuously update each other with the latest search information, thereby increasing communication overhead. In another strategy to reduce search overhead, we may require that a certain number of branches be explored in a sequential fashion before attempting parallel search at a node. When the parallel search is actually started, it is highly likely that the score at the node will have stabilized. Communication overhead is also reduced as messages carrying bound information will be less frequent. However, synchronization overhead will increase since there will be several processors waiting idly while a single processor completes the required number of branches.

In this chapter, important work in the area of alpha-beta parallelization is presented. While the literature describes several methods [1], only three are described here. These three methods have been chosen carefully from the several that are available. The first method, principal variation splitting (PVSplit) [16], is the result of some of the earliest attempts at parallelizing alpha-beta. Although it is a relatively old idea, PVSplit has been the subject of much research and it has been the source of inspiration for several newer methods. The other two methods, young brothers wait concept (YBWC) [6] and dynamic tree splitting (DTS) [9], are more recent and spectacular speed-ups have been reported. These two methods embody two different design goals: YBWC was developed in a distributed environment where communication costs are high, whereas DTS was developed in an environment where communication is quite cheap.

3.2 Super-Linear Speed-Up?

Although it rarely ever happens in practice, parallel alpha-beta may yield super-linear speed-ups. Consider the tree of Figure 3.1. It is assumed that this tree is part of a much bigger tree. When the search arrives at the Root node, alpha has a value of -5 and beta has a value of 0.¹ It is assumed that each major operation in alpha-beta can be completed in one time unit. Table 3.1 illustrates the progress of a single processor executing the alpha-beta algorithm on the tree. It takes 25 time units to complete the search. Now, consider the situation where two processors, P1 and P2, are working together at the Root node. The first processor handles the subtree rooted at node A while the second handles the subtree rooted at node B. The progress of processors P1 and P2 is illustrated in Tables 3.2 and 3.3 respectively. Since move ordering is bad within the subtree rooted at A, all nodes will have to be explored. However, there is perfect ordering in the subtree rooted at B and cut-offs are plentiful. The second processor completes its search in a short period of time and it discovers that a cut-off condition exists at the Root.

¹This discussion uses mini-max conventions as opposed to nega-max conventions.


Figure 3.1: Parallel search on this tree yields super-linear speed-ups.

Processor 1 is then stopped and the search of the Root node is complete. With two processors, the search takes a mere 11 time units. Clearly, super-linear speed-up is a possibility, but the rather contrived nature of the tree in Figure 3.1 and the bounds used should be indicative of the fact that super-linear speed-up rarely occurs in practice.
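The arithmetic behind this claim can be written out using the time-unit counts quoted in the text. The symbols S, T_1 and T_2 are notation introduced here for illustration only: T_1 is the single-processor time and T_2 the two-processor time.

```latex
S = \frac{T_1}{T_2} = \frac{25}{11} \approx 2.27 > 2
```

Since the speed-up exceeds the number of processors, it is super-linear for this particular tree and choice of bounds.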

3.3 Principal Variation Splitting

In PVSplit [16], the rule is that the first branch at a PV node must be searched before parallel search of the remaining branches may begin. All processors travel down the first branch at each PV node until they reach the PV node that is one level above the leaf nodes. Here one processor searches the first branch while the other processors wait. Once the first branch has been examined, all processors join the search effort. Each processor takes away a branch at a time and determines its value. If a processor discovers an improvement to the score at the node, it informs the other processors of the updated value. When there aren't any unassigned branches, a processor that runs out of work remains idle until the other processors finish. Once all branches have been examined, the search effort moves to the current node's parent. Since the value of the parent node's first branch was just computed, parallel search can be started at the parent as well. This process continues upwards in the tree until all the branches at the Root node have been examined.

Figure 3.2 illustrates the progress of two processors, P1 and P2, as they use PVSplit on a small tree. Both processors travel down the leftmost path until they reach node D. At this node, P2 remains idle while P1 explores branch g. Once the exploration of g is complete, parallel search is started; P1 explores h while P2 explores i. When D has been completely evaluated,


Time  Node  Beta  Action
0     Root   0    Explore a
1     A      0    Explore c
2     C      0    Explore g
3     G      0    Return -2
4     C      0    Explore h
5     H      0    Return -1
6     C      0    Return -1
7     A      0    Explore d
8     D     -1    Explore i
9     I     -1    Return -4
10    D     -1    Explore j
11    J     -1    Return -3
12    D     -1    Return -3
13    A      0    Return -3
14    Root   0    Explore b
15    B      0    Explore e
16    E      0    Explore k
17    K      0    Return 2
18    E      0    Cut-off
19    B      0    Explore f
20    F      0    Explore m
21    M      0    Return 4
22    F      0    Cut-off
23    B      0    Return 0
24    Root   0    Cut-off

Table 3.1: Progress of the alpha-beta algorithm on the tree of Figure 3.1.


Beta  Action
 0    Explore a
 0    Explore c
 0    Explore g
 0    Return -2
 0    Explore h
 0    Return -1
 0    Return -1
 0    Explore d
-1    Explore i
-1    Return -4
-1    Explore j

Table 3.2: Progress of P1 on the tree of Figure 3.1.

Time  Node  Action
0     Root  Explore b
1     B     Explore e
2     E     Explore k
3     K     Return 2
4     E     Cut-off
5     B     Explore f
6     F     Explore m
7     M     Return 4
8     F     Cut-off
9     B     Return 0
10    Root  Cut-off

Table 3.3: Progress of P2 on the tree of Figure 3.1.


Figure 3.2: Two processors using PVSplit to divide up the work in a small tree.

the search moves to its parent, A. Since the eldest brother, d, has been examined at A, parallel search can be employed once again. Processor P1 evaluates e while P2 evaluates f. Once A has been evaluated, parallel search begins at the Root node with P1 evaluating b and P2 evaluating c.

Let us examine some of the reasoning behind PVSplit. First, at a PV node all branches have to be searched, thus parallel search is a good idea at this type of node. Second, by requiring that the first branch be examined before parallel search is started at a node, parallel search starts only when a bound has been determined. If the branches have been ordered according to some preliminary quality information, then the score returned by the first branch may be the best possible at that node. In fact, the original work [16] described PVSplit as a method for searching strongly-ordered trees (refer to Section 4.3) where the first branch at any node is the best 70 percent of the time. If the best possible score is obtained when the first branch is examined, parallel search will examine precisely the same nodes as the sequential version. Therefore, there is no search overhead if at each PV node the first branch is also the best. Third, search at a PV node examines more nodes than the search of any other node of equivalent height because a PV node has no bound information - alpha and beta are at negative infinity and positive infinity respectively. Thus, a PV node is a reasonable choice as a site for parallel search.
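The control flow of PVSplit can be sketched in Python with a thread pool. This is a simplified illustration under several assumptions, not the method as implemented in [16]: the tree is a nested list with integer nega-max leaf scores, the workers do not exchange improved bounds while the parallel phase is running, and the names `pvsplit` and `alphabeta` are hypothetical. Python threads also give no real speed-up for CPU-bound work, so the sketch shows only the order in which work is released.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def alphabeta(node, alpha, beta):
    """Sequential fail-hard nega-max alpha-beta used below each split."""
    if not isinstance(node, list):
        return max(alpha, min(beta, node))
    for child in node:
        value = -alphabeta(child, -beta, -alpha)
        if value >= beta:
            return beta
        if value > alpha:
            alpha = value
    return alpha


def pvsplit(node, alpha, beta, pool):
    """Search the first branch first, then the remaining branches in parallel."""
    if not isinstance(node, list):
        return max(alpha, min(beta, node))
    # Everyone follows the first branch of the PV node before any split.
    value = -pvsplit(node[0], -beta, -alpha, pool)
    if value >= beta:
        return beta
    alpha = max(alpha, value)
    # The remaining branches start with the bound set by the first branch.
    futures = [pool.submit(alphabeta, child, -beta, -alpha) for child in node[1:]]
    for future in futures:
        value = -future.result()
        if value >= beta:
            return beta
        alpha = max(alpha, value)
    return alpha
```

A call such as `pvsplit(root, -math.inf, math.inf, pool)` with a `ThreadPoolExecutor` as `pool` returns the same value as the sequential search. Only the main thread submits tasks, so workers never wait on each other.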

The PVSplit method is not without its faults. When the first branch is not the best, search overhead increases as parallel search is conducted with a bound that is not as tight as in a sequential search. Depending on the branching factor, the method may not be able to use a large number of processors effectively. For example, in chess, where the average branching factor is 32, the method does not have a mechanism for handling more than 31 processors. Synchronization overhead is a significant problem in PVSplit. Consider what happens as parallel search at a node nears its end. Most of the processors in the search effort will be idle waiting for a few processors that are searching "difficult" branches. A difficult branch is one that requires a larger search effort because more nodes are examined in the subtree generated by that branch compared to the other branches at the node.

Processors   1     2     4     8     16
Speed-Up     1.0   1.8   3.0   4.1   4.6

Table 3.4: Performance of PVSplit on chess trees.

Processors   1     2     4     8     16
Speed-Up     1.0   1.9   3.4   5.4   6.0

Table 3.5: Performance of EPVS on chess trees.

Experiments with PVSplit have shown that speed-up is limited to a large extent by synchronization overhead [17, 31, 9]. Searching chess trees on a Cray C90, the technique produces the speed-ups in Table 3.4 [9]. The speed-up seems to be limited to an upper bound of 5. This has led to some interesting work that tries to reduce the synchronization overhead in PVSplit. In enhanced principal variation splitting (EPVS) [9], when a processor becomes idle, all processors move to the subtree being searched by one of the busy processors and a site for parallel search is created two levels below the original site. Experiments with this method produced the speed-ups in Table 3.5 [9]. Although this method is a little better than plain PVSplit, it is still not very efficient when a large number of processors are involved in the search effort. In Dynamic PVSplit (DPVS) [21], each processor runs a version of PVSplit. However, the difference is that a controller process dynamically assigns idle processors to help the busy processors in the system. Searching chess trees, this technique obtained a speed-up of 7.64 on a network of 19 Sun 3/75s.

3.4 Young Brothers Wait Concept (YBWC)

There are two different versions of YBWC. The earliest one is referred to as the weak YBWC [6]. A more recent version that modifies the technique slightly is referred to as the strong YBWC [5]. Whenever a distinction needs to be made between the two versions, the qualifications weak and strong will be used. However, if the discussion applies to both techniques, then the method will be referred to as simply YBWC.

At any node, the first branch to be expanded is referred to as the eldest brother while the other branches are referred to as the younger brothers. In weak YBWC, the rule is as follows: the eldest brother has to be examined before parallel search of the younger brothers is possible. Although this is similar to PVSplit, in YBWC parallel search is possible at any node, not just at PV nodes.

Before delving into the method's details, the concept of node ownership is introduced. A processor that owns a node is responsible for its evaluation. It is also responsible for returning the node's evaluation to its parent. Note that this may involve communication if the owner of the evaluated node is different from that of its parent. Usually, a node and its successors have the same owner. However, multiple processors can collaborate on a node if some of the successors have different owners. In YBWC, once a processor is given ownership of a node, the ownership of that node is not transferable to another processor.

At the start, one processor is given ownership of the Root node while the other processors remain in an idle state. A processor, P1, that is idle selects another processor, P2, at random and transmits a message requesting work. Processor P2 has work available if there is at least one node in the subtree it is examining that satisfies the weak YBWC criterion. That is, P2 has work available if it owns a node at which the eldest brother has been evaluated. The node that satisfies the criterion becomes the split-point; if there are many nodes that satisfy the criterion, then the node that is the highest in the tree is selected as the split-point. A split-point is a node that has been chosen as a site for parallel search.
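The split-point selection rule above can be sketched as a small Python function. The `OwnedNode` record and its field names are hypothetical, introduced only for illustration. The extra requirement that a candidate still has unassigned branches is an assumption of this sketch and is not stated in the criterion itself.

```python
from dataclasses import dataclass


@dataclass
class OwnedNode:
    name: str
    depth: int                  # distance from the Root; 0 is the Root itself
    eldest_evaluated: bool      # weak YBWC criterion
    unassigned_branches: int    # assumption: work must remain to be shared


def select_split_point(owned):
    """Return the highest owned node satisfying the weak YBWC criterion, or None."""
    candidates = [
        n for n in owned
        if n.eldest_evaluated and n.unassigned_branches > 0
    ]
    if not candidates:
        return None             # no work available for the requester
    return min(candidates, key=lambda n: n.depth)
```

Returning `None` corresponds to the case where the processor asked for work has none to offer.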

If P2 has work available then a master-slave relationship is established between P2 and P1. Note that P1 may be one of many slaves to P2. The master and its slaves share the search effort at the split-point. Each processor takes away a branch at a time until the search at the split-point is complete. As in PVSplit, if one processor finds an improvement to the score at the split-point then the new score is transmitted to the other processors involved. A processor may also discover a cut-off condition at the split-point. In this case, the search is complete and the slaves return to their idle state. A slave may also return to its idle state if there isn't any work left at the split-point. If the master returns from the search of some branch to find no work at the split-point, it should not remain idle while waiting for the busy processors to finish because this would increase synchronization overhead. Instead, the master acts as a slave to one of the busy processors. This is referred to as the helpful master concept.

When an idle processor P1 transmits a message requesting work to a processor P2, the latter may not have any work available. Processor P2 forwards the request message to another randomly selected processor. However, if the message has already travelled through a certain number of processors, P2 throws away the message and informs P1 that no work is available. Processor P1 then begins requesting again.

In strong YBWC, the nodes of a game tree are classified into three types:

• Y-PV: The root node is of type Y-PV. The first successor at a Y-PV node is of type Y-PV while the rest of the successors are Y-CUT.

• Y-CUT: The first successor is a Y-ALL node while the rest are Y-CUT nodes.

• Y-ALL: All successors are Y-CUT nodes.

Note that the definition of a Y-PV node is the same as that of a PV node except that it produces Y-PV and Y-CUT nodes as successors. Furthermore, a Y-ALL node is similar to an ALL node except that it produces Y-CUT nodes as successors. However, a Y-CUT node is quite different compared to a CUT node. Recall that a CUT node is defined as having only one successor and that the lone successor is of type ALL. While that definition is suitable for a tree that is perfectly ordered, when a tree is imperfectly ordered, a CUT node may have more than one successor. The new node classification specifies that these additional successors at CUT nodes are of type Y-CUT.
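The successor rules of the strong YBWC classification can be restated as a short Python function. The function name and the use of plain strings for the node types are choices made for this sketch only.

```python
def successor_types(node_type, num_successors):
    """Return the strong YBWC type of each successor, in search order."""
    if num_successors == 0:
        return []
    if node_type == "Y-PV":
        # First successor continues the principal variation.
        return ["Y-PV"] + ["Y-CUT"] * (num_successors - 1)
    if node_type == "Y-CUT":
        # Unlike a CUT node, a Y-CUT node may have further successors.
        return ["Y-ALL"] + ["Y-CUT"] * (num_successors - 1)
    if node_type == "Y-ALL":
        return ["Y-CUT"] * num_successors
    raise ValueError(f"unknown node type: {node_type}")
```

Applying the function repeatedly from a Y-PV root labels every node of a tree, whether or not the tree is perfectly ordered.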

Strong YBWC uses the weak YBWC criterion at Y-PV and Y-ALL nodes. However, at Y-CUT nodes, strong YBWC enforces a different rule: all "promising" branches must be examined before parallel search is possible. A promising branch is one that is likely to produce a cut-off based on some preliminary quality information. The exact definition of a promising branch is application dependent. In strong YBWC there is a longer wait at CUT nodes before parallel search is possible. Although this reduces the potential parallelism, the search overhead is greatly reduced and in practice, strong YBWC produces better speed-ups than weak YBWC.

On a Parsytec SC 320 machine (based on the T800 Transputer), weak YBWC obtained a speed-up of 137 when searching chess trees with 256 processors [5]. Strong YBWC obtained a speed-up of 142 on the same system. Experiments were also conducted on a Parsytec GCel machine (based on the T805 Transputer) with 1024 processors. With 1024 processors, strong YBWC produced a speed-up of 344.

3.5 Dynamic Tree Splitting (DTS)

DTS [8, 9] uses a peer-to-peer approach rather than a master-slave approach as in YBWC. Node ownership takes on a different meaning in DTS. While many processors may collaborate on a node, the processor that finishes its search last is responsible for returning the node's evaluation to its parent.

At the start, one processor is set to search the Root node while the other processors are in an idle state. An idle processor consults a global list of active split-points (SP-LIST) to find work to do. If a split-point with work is found, the idle processor joins the other processors that are working at that split-point and the work at that node is shared. However, if no work can be found in SP-LIST, the idle processor broadcasts a HELP message to all processors. On receipt of the HELP message, a processor that is busy copies the state of the subtree it is examining to a shared area. The idle processor then examines the shared area to find a suitable split-point. If a split-point can be found, the split-point is first copied into SP-LIST and the idle processor then shares the work at the split-point with the processor that originally expanded the node. If a suitable split-point cannot be found in the shared area, the idle processor rebroadcasts the HELP message after a small delay.

When a processor returns from the search of some branch to find no work at a split-point, the processor simply enters the idle state where it can try to find work at another node. However, if a processor returning from the search is the last processor at the split-point, then the processor is responsible for returning the node's evaluation to its parent. Furthermore, this processor does not enter the idle state but continues working at the parent node.

Similar to PVSplit and YBWC, if one processor discovers an improved score, the score is shared with the other processors working at the split-point. Instead of an improved score, if a cut-off condition is discovered, a single processor is left at the node as its owner while the other processors return to their idle states.

Finding a suitable split-point after having broadcast the HELP command is rather complicated. The selection procedure is not as simple as the one found in YBWC. First, the type of each node is determined. The set of rules that is used to determine node type in DTS is quite different from YBWC, therefore different names are used to avoid any confusion. A node is classified into three types:

• D-PV: A node that has the same alpha and beta values as the Root.

• D-CUT: A minimizing node with the same beta as the Root or a maximizing node with the same alpha as the Root.

• D-ALL: Any node that does not fit the D-PV and D-CUT criteria.
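The three-way classification above translates directly into a small Python function. The function name and argument names are hypothetical, and the sketch covers only this initial classification under mini-max conventions, not any later adjustment of the node type.

```python
def dts_node_type(alpha, beta, root_alpha, root_beta, maximizing):
    """Classify a node as D-PV, D-CUT or D-ALL from its bounds (mini-max view)."""
    if alpha == root_alpha and beta == root_beta:
        return "D-PV"
    if (not maximizing and beta == root_beta) or (maximizing and alpha == root_alpha):
        return "D-CUT"
    return "D-ALL"
```

The D-PV test comes first because a node sharing both bounds with the Root would otherwise also match the D-CUT rule.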

The types, D-PV ancl D-CUT, are eq-uivalcnt to the normal types. PV ancl CUT. respectively

Hoivever. the D-ALL node is mucli broader in scope than the ALL type. Xlthough every XLL

nocle is rilso a D-ALL nocle. the D-AILL type abo encompasses those noctes that are seûrched

due to imperfect ordering. After cletertnini~ig the node type, there are two override pllises.

During the Grst overricle phase. a nocle's G)*pe is changed from D-CUT into D-ALL if more than

three nodes have been esamined at t h e node witiiout having achieved a ciit-off. The second

override pliase cleals with the situation where there are several D-ALL nocles at consecutive

levels in the tree. DTS only ailows txwo D-ALL nodes to be consecutive in the tree. Following

the second D-ALL node. nodes <are fozrced into an alternating sequence of D-CUT ancl D-ALL

nodes. There is also a confidence factor associated with each D-CUT and D-ALL node. If many moves (up to a limit of three) have been searched at a D-CUT node, then the confidence that it is a D-CUT node is lowered. If several moves have been searched at a D-ALL node, then the confidence that it is a D-ALL node increases. A node's suitability as a split-point is based on four factors:

• The node must be of type D-PV or D-ALL.

• The height of the node. Nodes that are higher up in the tree (closer to the root) represent more work.

• If it is a D-PV node, its first branch must have been searched.

• If it is a D-ALL node, the confidence factor should be relatively high.

The process of selecting a split-point is quite complicated, but all this effort is an attempt to reduce the search overhead.

Searching chess trees on a Cray C916/1024 machine, DTS produced the speed-ups in Table 3.6 [9]. Since DTS was designed with shared memory in mind, experiments with a large number of processors are not available, as shared-memory multiprocessors are hard to find in large configurations.

Processors   1     2     4     8     16
Speed-Up     1.0   2.0   3.7   6.6   11.1

Table 3.6: Performance of DTS on chess trees.
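As a small worked example, the speed-ups of Table 3.6 can be converted into parallel efficiency (speed-up divided by the number of processors). Efficiency is not reported in the text; it is computed here purely as an illustration, using the speed-up values as read from the table.

```python
# Speed-ups as read from Table 3.6 (processors -> speed-up).
speedups = {1: 1.0, 2: 2.0, 4: 3.7, 8: 6.6, 16: 11.1}


def efficiency(processors, speedup):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / processors


efficiencies = {p: efficiency(p, s) for p, s in speedups.items()}
for p, e in efficiencies.items():
    print(f"{p:2d} processors: efficiency {e:.2f}")
```

The efficiency falls as processors are added, which is consistent with the search overhead that the split-point selection rules try to limit.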


Chapter 4

Artificial Game Trees

4.1 Introduction

Although a wide variety of game trees arise in practice, as far as the alpha-beta algorithm is concerned, a tree's average branching factor and the quality of its branch ordering usually govern the length of time spent searching the tree. Searching artificially generated trees allows one to examine the algorithm's behavior for a wide variety of branching factors and branch orderings. When artificial trees are used as a model for the trees encountered in practice, they should mimic the behavior exhibited by real trees. This chapter introduces the notion of an exponentially ordered tree. A method for generating such trees is also described.

4.2 Generating Artificial Trees

Any method used in artificial tree generation should satisfy three requirements. First, the method must be able to generate a wide variety of trees. Second, when identical input parameters are used, identical trees should be produced. This is particularly important because the artificial trees are used to compare different techniques and identical trees must be presented to each technique. Third, the order of branch expansion should not affect the tree generated. In sequential search this requirement has no effect, but in parallel search the order of branch expansion is usually dependent on the parallel search method. The method described below satisfies all three requirements.

Three routines used for random number generation in the artificial tree generator are examined first, before a treatment of the method itself. Figure 4.1 shows these three routines. The RANDOM function is the main random number generator. It is of the linear congruential type. The RANDOMSEED function is used to seed the main generator. In reality, this is also another linear congruential generator. Normally, it would have been sufficient to simply copy the seed


constant random_limit ← 2147483647

RANDOMSEED(rnd, seed)
    rnd.state ← (16807 * seed) mod random_limit

RANDOM(rnd)
    rnd.state ← (48271 * rnd.state) mod random_limit
    return rnd.state

RANDOMRANGE(rnd, lower, upper)
    return RANDOM(rnd) * (upper − lower + 1) / random_limit + lower

Figure 4.1: Random number routines used by the artificial tree generator.

into the generator's state variable. Furthermore, if the seeding function had to use a multiplier, it could have used the same multiplier as the main generator. However, a standard seeding function is not adequate for the artificial tree generator because it uses multiple random number streams and each stream is initialized using a seed generated by another stream. If either no multiplier or the same multiplier was used, then the same sequence of random numbers would be generated by a stream and the stream that produced its seed. The random tree method [4], which is used for parallel random number generation, provides the necessary insight to solve the problem of similar sequences: a second linear congruential generator can be used to separate the two sequences so that they appear different. Additionally, note that the multipliers used, 16807 and 48271, and the modulus, 2147483647, are known to have good randomness properties [11]. The RANDOMRANGE function is an extension of RANDOM to provide random numbers in a certain range, [lower, upper]. It simply scales and shifts a random number generated by RANDOM into the appropriate range before returning the result.
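The three routines can be sketched in Python as follows. This is a minimal transcription under stated assumptions: the names `RandomStream`, `random_seed`, `random_next` and `random_range` are chosen here for illustration, integer division stands in for the integer arithmetic of the original, and a seed that is a multiple of the modulus (which would leave the state stuck at zero) is assumed never to be supplied.

```python
RANDOM_LIMIT = 2147483647  # modulus, 2**31 - 1


class RandomStream:
    """State of one linear congruential stream."""

    def __init__(self):
        self.state = 0


def random_seed(rnd, seed):
    # Seeding uses its own multiplier (16807) so that a stream seeded from
    # another stream does not simply replay that stream's sequence.
    rnd.state = (16807 * seed) % RANDOM_LIMIT


def random_next(rnd):
    # Main generator, multiplier 48271.
    rnd.state = (48271 * rnd.state) % RANDOM_LIMIT
    return rnd.state


def random_range(rnd, lower, upper):
    # Scale and shift a raw value into the closed range [lower, upper].
    return random_next(rnd) * (upper - lower + 1) // RANDOM_LIMIT + lower


parent = RandomStream()
random_seed(parent, 12345)
child = RandomStream()
random_seed(child, random_next(parent))   # child seeded by the parent stream
print(random_next(parent), random_next(child))
```

Because the child's state is multiplied by 16807 before its first use, its first output differs from the parent's next output, which is the separation the second generator is meant to provide.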

Virtually all of the tasks carried out by the artificial tree generator can be combined into a single routine. This routine will be referred to as BRANCHGENERATE and is presented in Figure 4.2. Every internal node invokes BRANCHGENERATE to generate its successors. The generation of successors at each node is controlled by two variables, seed and pscore. Both of these variables are set by a node's parent. The generation of successors is as follows. First,


a new stream of random numbers is initialized using seed (lines 3-4). Each successor needs values for its seed and pscore variables. It also needs values for its depth (which represents the height of the tree at the node) and type (maximizing or minimizing) variables. Strictly speaking, depth and type are not related to artificial tree generation; they are integral to any tree expansion routine, artificial or not. Each successor's seed is generated by a call to RANDOM (line 18). Determining a value for a successor's pscore is more involved. A node's pscore value represents the score that is obtained when the results returned by the successors are maximized or minimized as required at the node. The pscore for the first successor is set equal to the node's pscore. The smallest and largest scores that can be generated are held in the constants score_min and score_max, respectively. If a node is of the maximizing type, then the value of the pscore variable for each of the remaining successors is obtained by generating random numbers in the range [score_min, pscore] (line 12). However, if the node is of the minimizing type, then random numbers in the range [pscore, score_max] are used instead (line 16). Consider what has been accomplished so far. The best score at a node is always in its first position. If the node being expanded is of the maximizing type, the remaining successors will all have scores that are smaller than or equal to the first successor. If the node is of the minimizing type, the remaining successors will have scores that are greater than or equal to the first successor. This essentially produces a perfectly ordered tree. However, the ordering produced by the artificial tree generator is of no concern since the BRANCHGENERATE routine will be followed by a BRANCHORDER routine whose sole purpose is to order the successors in a more realistic manner.

In Figure 4.3, the artificial tree generator has been incorporated into an ALPHABETA search routine. At a leaf node, the value of pscore is returned as the node's score. At all other nodes, BRANCHGENERATE is called to generate the node's successors. Once the execution of BRANCHGENERATE is complete, BRANCHORDER is called to order the successors. A possible BRANCHORDER routine is described in Section 4.3.3.

Note that the entire tree is dependent on only two values: the Root node's seed and pscore. In the experiments conducted, different seeds are used to generate different trees; however, the pscore value is fixed. The value of the pscore variable is fixed to the value (score_max − score_min)/2. If the Root node's pscore is too low, then the range used at the root, [score_min, pscore], will be too small. Similarly, if the value of pscore is too high, then the first successor's range, [pscore, score_max], will be too small. Therefore, a value for pscore somewhere in the middle of the range of scores that are possible offers the greatest flexibility. In the experiments described in this text, two sets of seeds are used to generate artificial trees. The sets will be referred to as tg1 and tg2. The first set, tg1, contains 100 seeds while the second set, tg2, contains 200 seeds. Both sets are described in Appendix 1.


 1: score_min ← 0
 2: score_max ← random_limit − 1

BRANCHGENERATE(node)
 3: node.rnd ← new random_type
 4: RANDOMSEED(node.rnd, node.seed)
 5: for i ← 1 to node.child.length
 6:     node.child[i] ← new node_type
 7:     if i = 1 then
 8:         node.child[i].pscore ← node.pscore
 9:     if node.type = max then
10:         node.child[i].type ← min
11:         if i ≠ 1 then
12:             node.child[i].pscore ← RANDOMRANGE(node.rnd, score_min, node.pscore)
13:     else
14:         node.child[i].type ← max
15:         if i ≠ 1 then
16:             node.child[i].pscore ← RANDOMRANGE(node.rnd, node.pscore, score_max)
17:     node.child[i].depth ← node.depth − 1
18:     node.child[i].seed ← RANDOM(node.rnd)

Figure 4.2: Generation of successors using the artificial tree generator.


ALPHABETA(node, alpha, beta)
    if node.depth = 0 then
        return node.pscore
    BRANCHGENERATE(node)
    BRANCHORDER(node)
    if node.type = max then
        score ← alpha
    else
        score ← beta
    for i ← 1 to node.child.length
        if node.type = max then
            value ← ALPHABETA(node.child[i], score, beta)
            if value ≥ beta then
                return beta
            if value > score then
                score ← value
        else
            value ← ALPHABETA(node.child[i], alpha, score)
            if value ≤ alpha then
                return alpha
            if value < score then
                score ← value
    return score

Figure 4.3: Incorporating the artificial tree generator into an alpha-beta search procedure.
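The generator and the search of Figures 4.2 and 4.3 can be combined into a short runnable sketch. The following Python version makes several assumptions that are not in the figures: the branching factor is passed as a parameter (`branching`) instead of being read from `node.child.length`, the BRANCHORDER step is omitted so the tree stays perfectly ordered, and a `stats` dictionary is added to count visited nodes. The class and function names are illustrative only.

```python
RANDOM_LIMIT = 2147483647
SCORE_MIN = 0
SCORE_MAX = RANDOM_LIMIT - 1


class Rnd:
    def __init__(self, seed):
        self.state = (16807 * seed) % RANDOM_LIMIT          # RANDOMSEED

    def next(self):                                          # RANDOM
        self.state = (48271 * self.state) % RANDOM_LIMIT
        return self.state

    def in_range(self, lower, upper):                        # RANDOMRANGE
        return self.next() * (upper - lower + 1) // RANDOM_LIMIT + lower


class Node:
    def __init__(self, seed, pscore, depth, is_max):
        self.seed, self.pscore = seed, pscore
        self.depth, self.is_max = depth, is_max


def branch_generate(node, branching):
    """Create all successors at once, so expansion order cannot matter."""
    rnd = Rnd(node.seed)
    children = []
    for i in range(branching):
        if i == 0:
            pscore = node.pscore                 # best successor comes first
        elif node.is_max:
            pscore = rnd.in_range(SCORE_MIN, node.pscore)
        else:
            pscore = rnd.in_range(node.pscore, SCORE_MAX)
        children.append(Node(rnd.next(), pscore, node.depth - 1, not node.is_max))
    return children


def alphabeta(node, alpha, beta, branching, stats):
    stats["nodes"] += 1
    if node.depth == 0:
        return node.pscore
    score = alpha if node.is_max else beta
    for child in branch_generate(node, branching):
        if node.is_max:
            value = alphabeta(child, score, beta, branching, stats)
            if value >= beta:
                return beta
            if value > score:
                score = value
        else:
            value = alphabeta(child, alpha, score, branching, stats)
            if value <= alpha:
                return alpha
            if value < score:
                score = value
    return score


root = Node(seed=1, pscore=(SCORE_MAX - SCORE_MIN) // 2, depth=4, is_max=True)
stats = {"nodes": 0}
result = alphabeta(root, SCORE_MIN - 1, SCORE_MAX + 1, 4, stats)
print(result == root.pscore, stats["nodes"])
```

With a full search window the value returned for the Root equals its pscore, since each node's pscore is by construction the minimax value of its subtree. Repeating the search with the same seed visits the same nodes, which is the second requirement of Section 4.2.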


4.3 Modeling Branch Ordering

To mimic the branch orderings observed in practice, Marsland and Campbell [16] propose the model of a strongly ordered tree where the branches are ordered according to two rules:

• The first branch at any node is the best 70 percent of the time.

• There is a 90 percent chance that the branch with the best score is located within the first quarter of the branches at a node.

In [18], to generate a strongly ordered tree with a branching factor of 20, each branch is given a weight. Each weight represents the probability of that move being chosen as the best at that node. Experiments with the author's chess program, RajahX, indicate that the model of a strongly ordered tree is not an accurate representation of the branch orderings observed in practice. A new model, the exponentially ordered model, is introduced to overcome the deficiencies in the strongly ordered model. If necessary, the exponential model can be adjusted so that it satisfies the criteria of the strongly ordered model.

4.3.1 A Description of RajahX

Experience with RajahX has shown that the program is quite a strong chess player. It has competed successfully in two tournaments. At the 1996 Dutch Computer Chess Championships the program placed 13th in a field of 20 participants. After several modifications to the program, at the 1997 Aegon Man-Machine tournament the program finished [rank illegible] in a field of 100. Playing on a Pentium 166, the program's blitz rating hovers around the 2500 mark on the Internet Chess Club.

RajahX uses a negascout search routine with numerous enhancements. At a leaf node, a quiescence search routine is used to refine the node before it is evaluated. The structure of the tree generated by the program is not uniform. Variations with moves that seem "interesting" are searched more deeply in order to uncover any gains or losses that are outside the normal search depth. The program uses several methods to improve the efficiency of its search routine:

• Iterative deepening.

• Aspiration search.


• Killer table [16, 15]: A move that is found to produce a cut-off is stored in a killer table. If the move comes up again in another part of the tree, it should be placed near the front of the move list since it is likely to produce a cut-off again.

• History list [15]: A history list contains 4096 entries (64 × 64). It contains an entry for each (from-square, to-square) pair in chess. The entry indicates how often a move from one square to another is the best or is able to produce a cut-off at a node. This table is maintained as the search progresses and is used to order the branches at every node in the tree.

• Transposition table [16, 15]: The score determined for each position encountered during the search can be stored in a hash table that is accessed using a key generated from that board position. If the position arises again through a different sequence of moves, the table can immediately provide a score for the position and search is completely avoided.

• Null move search [3]: In chess, a "pass" is not a legal move. However, if such a move was legal, in most positions the player on move would not pass, since the other player could potentially improve his position greatly. This observation can be used during the search as follows. At a node in the tree, if the player on move makes a pass and the resulting score is higher than the upper bound, then making a non-pass move would result in a significantly higher score. Therefore, when the score for a pass type move is higher than the upper bound, the node can be cut off right away since it is likely to contribute nothing to the overall search effort.
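Of the methods listed above, the history list is the simplest to illustrate. The sketch below is not RajahX's implementation; it only assumes that squares are numbered 0 to 63, that a move is a (from-square, to-square) pair, and that each success adds one to the corresponding entry. The class name `HistoryTable` is hypothetical.

```python
class HistoryTable:
    """64 x 64 counters indexed by (from_square, to_square)."""

    def __init__(self):
        self.counts = [[0] * 64 for _ in range(64)]

    def record(self, move):
        """Call when `move` was the best or produced a cut-off at a node."""
        from_sq, to_sq = move
        self.counts[from_sq][to_sq] += 1

    def order(self, moves):
        """Return the moves with historically successful ones first."""
        return sorted(moves, key=lambda m: self.counts[m[0]][m[1]], reverse=True)


history = HistoryTable()
history.record((12, 28))
history.record((12, 28))
history.record((6, 21))
ordered = history.order([(1, 18), (6, 21), (12, 28)])
print(ordered)
```

The move recorded twice is tried first, followed by the move recorded once, and the move with no recorded success is left for last.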

RajahX also implements a simple learning method [reference number illegible]. Positions that are found to be problematic for the program are stored in a table and the program "learns" to avoid these positions when they arise again during tree search.

4.3.2 Quality of Branch Ordering in RajahX

To measure the quality of the first branch expanded at each node, a method suggested in [10] measures the frequency with which the first branch is the best or is able to produce a cut-off. This frequency is referred to as f_best. The method assumes that the branch that causes the cut-off is also the best. This assumption produces a highly optimistic estimate of the quality of branch ordering because a branch that produces a cut-off at a node is not necessarily the node's best. There is a large degree of uncertainty in the ranking of the cut-off branch among the other branches at the node, because an exact score is unavailable for that branch and several branches may be unexplored when the cut-off occurs. However, f_best still provides useful information


about the tree structure. Specifically, f_best can be used as a criterion for comparing the trees produced by RajahX with the trees produced by the artificial tree generator.

To compute the value of f_best for RajahX, a minor program modification was necessary. The negascout search algorithm is replaced by the simpler alpha-beta algorithm. Most of the time, the negascout algorithm searches with a null window. Cut-offs are more frequent in a null window search and the greater frequency of cut-offs will skew the value of f_best even further. Statistics are collected while RajahX conducts searches on the Bratko-Kopec set of 24 test positions [13]. The program is asked to search each position for 10 seconds. Tables 4.1, 4.2 and 4.3 summarize the statistics collected. Not every node in the tree contributes to the computation of f_best. For example, at a maximizing node, if every branch returns a score equal to the lower bound then it is not clear which branch is the best. Furthermore, a cut-off never occurs at such a node. The second column in Table 4.1 gives the number of nodes that contribute to the computation of f_best. To determine the average branching factor, the number of pseudo-legal moves at every node that contributes to f_best is totaled. For efficiency reasons, most chess programs generate pseudo-legal moves. The set of moves generated is described as being pseudo-legal since the set may include illegal moves which leave the king in check. Moves that are illegal are detected and removed during the search process. The number of pseudo-legal moves is a good estimate of the average branching factor since illegal moves are not generated for most positions. However, this estimate is slightly higher than the real branching factor due to the inclusion of the illegal moves. The third column in Table 4.1 indicates the total number of pseudo-legal moves for each test position. Using the data in the second and third columns, the average branching factor is determined to be approximately 33.

An expanded version of the original method in [10] was implemented to obtain a more detailed view of the tree structure. The expanded version calculates the frequency with which a move in one of the first 10 positions is the best or is able to produce a cut-off. This produces a set of frequencies: f_best(1), f_best(2), ..., f_best(10). Table 4.2 presents the first five values of f_best and Table 4.3 presents the last five. The term p_best(n) refers to the probability with which a move in a position, n, is the best and is obtained by dividing f_best(n) by the number of nodes that contributed to f_best(n) (Table 4.1).

For the trees searched by RajahX, the first branch has a very high probability (p_best(1) = 0.875521) of being the best or being able to produce a cut-off. On a perfectly ordered game tree, p_best(1) = 1.00. The branch ordering techniques used by RajahX help it achieve something quite close to a perfectly ordered tree. In the next section, the challenge is to reproduce the behavior observed in RajahX using artificially generated trees. If the artificially generated trees are to be used as a model of the trees encountered in practice, they should produce similar values for p_best(1), p_best(2), ..., p_best(10).

Table 4.1: A count of the nodes that contribute to f_best and the average branching factor in the Bratko-Kopec test positions. [Table body not recoverable; surviving headings: Position, Pseudo-Legal Moves, Average Branching Factor; final row: Total.]

Table 4.2: The first five values of f_best derived from the search of the Bratko-Kopec test positions. [Table body largely not recoverable; surviving headings: Position; Frequency With Which Move is Best or Produces a Cut-off; f_best(4); f_best(5); final row: Total. Entries surviving from a single column, f_best(4) or f_best(5), in page order: 1383, 1285, 1310, 1209, 1113, 1165, 2331, 1369, 985, 1660, 1427, 1615, 2032, 937, 1312, 1081, 1135, 1267, 2869, 571, 418, 1961, 390; followed by 31261 and 0.014957.]

Table 4.3: The last five values of f_best derived from the search of the Bratko-Kopec test positions. [Table body not recoverable; surviving headings: Position; Frequency With Which Move is Best or Produces a Cut-off; f_best(7); f_best(9); final row: Total.]

4.3.3 An Exponential Branch Ordering Model

Imagine a function whose sole purpose is to order the branches at a node in a best to worst order. Although such perfect ordering is impossible, the function makes an educated attempt at creating a best to worst order. There are two branch lists: one is unordered and the other is ordered. Initially the unordered list is full and has b (branching factor) members. The ordered list is empty. Until the unordered list is empty, the function moves branches, one at a time, from the unordered list into the ordered list. Branches must be carefully selected from the unordered list since they are being moved into an ordered list. To aid the selection process, the function uses domain-dependent knowledge to predict which of the remaining moves in the unordered list is the best. In the exponential branch ordering model, two weights, w_a and w_b, represent the effect of the domain-dependent knowledge on the resultant ordering. When selecting the first move, the knowledge-assisted selection mechanism is successful in picking the best move with a probability of w_a/100. For all other moves, the knowledge-assisted selection mechanism is successful in picking the best remaining move from the unordered list with a probability of w_b/100. When the domain-dependent knowledge is not good enough to select the next best move, the probability that the move picked is still the best is given by 1/(b − n + 1). This is the probability when a simple random pick is used to select the nth branch. The routine in Figure 4.4 produces an exponential branch ordering. Both the ordered and unordered lists are stored within the node.child array. When selecting the first move, the ordered list is empty and the entire array stores the elements of the unordered list. When selecting branch n, the elements in the range [1, n − 1] are members of the ordered list and elements in the range [n, b] are members of the unordered list.
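The selection process described above can be sketched in Python. This is an illustration, not the routine of Figure 4.4 itself: it works on a plain list of child scores, uses zero-based positions, and draws from Python's `random.Random` instead of the generator of Figure 4.1. The function name `branch_order` and its parameters are chosen here for the example.

```python
import random


def branch_order(pscores, is_max, w_a, w_b, rng):
    """Order child scores in place; w_a and w_b are integers in [0, 100]."""
    b = len(pscores)
    for i in range(b):
        threshold = w_a if i == 0 else w_b
        if rng.randrange(100) < threshold:
            # Knowledge succeeds: take the best of the remaining branches.
            pick = max if is_max else min
            p = pick(range(i, b), key=lambda j: pscores[j])
        else:
            # Knowledge fails: take any remaining branch at random.
            p = rng.randrange(i, b)
        pscores[i], pscores[p] = pscores[p], pscores[i]
    return pscores


rng = random.Random(1)
print(branch_order([5, 9, 1, 7, 3], True, 100, 100, rng))   # perfect ordering
print(branch_order([5, 9, 1, 7, 3], True, 0, 0, rng))       # random ordering
```

With both weights at 100 the result is a best to worst order, and with both at 0 it is a uniformly random permutation; intermediate weights give orderings between these two extremes.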

An exponential branch ordering has the following properties:

• The ordering generated is controlled by three parameters: b, w_a and w_b. Parameter b represents the branching factor. Parameters w_a and w_b represent the effect of the knowledge component in the ordering function and are values in the range [0, 100]. In Figure 4.4, the code in the first path (lines 8-19) selects the branch with the best score among the remaining branches, while the second path (line 21) selects one of the remaining branches at random. The selected branch is then placed in position i. Parameters w_a and w_b are used to determine which of the two paths is taken.

• The first branch at a node is best with probability p_1, given by:

    p_1 = \frac{w_a}{100} + \left(1 - \frac{w_a}{100}\right) \frac{1}{b}    (4.1)

• A branch at position n, where 2 ≤ n ≤ b, is best with probability p_n, given by:

    p_n = \left(1 - \sum_{k=1}^{n-1} p_k\right) \left[\frac{w_b}{100} + \left(1 - \frac{w_b}{100}\right) \frac{1}{b - n + 1}\right]    (4.2)

Table 4.4: Values of f_best and p_best derived from the search of 100 artificially generated trees. [Table body not recoverable; surviving headings: Branch Position; Branch is Best or Produces a Cut-off.]

We now address the question of whether an exponentially ordered model is an adequate representation of the trees encountered in practice. In other words, can an exponentially ordered tree produce the same values for f_best and p_best as the trees searched by RajahX? Table 4.4 illustrates the values of f_best and p_best collected during an alpha-beta search of 100 artificially generated trees (generated using set tg1). Each tree was searched to a depth of 7. The exponential ordering employed by the artificial tree generator used the parameters b = 32, w_a = 79 and w_b = 5. Figure 4.5 compares the probability curve produced by RajahX (generated using Tables 4.2 and 4.3) with the probability curve produced by the artificial tree generator. Clearly, when the appropriate parameters are used, the trees produced by an exponential ordering closely resemble the trees searched by RajahX.

The exponential model can satisfy the strongly ordered model's criteria when supplied with the right parameters. A strongly ordered tree with a branching factor of 20 is produced when the parameters b = 20, w_a = 69 and w_b = 19 are used. The data in Table 4.5 was generated using these parameters. The second column is the result of applying Equations 4.1 and 4.2 for various values of n; this column presents the probability with which a move in a position, n, is the best. A running total of the probabilities in the second column is maintained in the third column; this column presents the probability with which a move in one of the first n positions is the best. The probability of the first move being the best is 0.705500. This is slightly higher than the 0.70 required by the strongly ordered model. The best branch is within the first quarter of the branches at a node with a probability of 0.899916. This figure is slightly lower


BRANCHORDER(node)
 1: for i ← 1 to node.child.length
 2:     if i = 1 then
 3:         w_t ← w_a
 4:     else
 5:         w_t ← w_b
 6:     w ← RANDOMRANGE(node.rnd, 0, 99)
 7:     if w < w_t then
 8:         if node.type = max then
 9:             v ← score_min − 1
10:             for j ← i to node.child.length
11:                 if node.child[j].pscore > v then
12:                     p ← j
13:                     v ← node.child[j].pscore
14:         else
15:             v ← score_max + 1
16:             for j ← i to node.child.length
17:                 if node.child[j].pscore < v then
18:                     p ← j
19:                     v ← node.child[j].pscore
20:     else
21:         p ← RANDOMRANGE(node.rnd, i, node.child.length)
22:     swap node.child[i].pscore ↔ node.child[p].pscore

Figure 4.4: A routine that produces an exponential branch ordering.

Figure 4.5: Comparing the values of p_best from RajahX with the values of p_best from artificially generated trees. [Plot not recoverable; horizontal axis: Branch Position.]

Table 4.5: An exponential ordering with parameters b = 20, w_a = 69 and w_b = 19. [Table body not recoverable.]

than the required value of 0.90. Highly precise values for w_a and w_b can be used to produce the exact values required by the strongly ordered model, but the routine in Figure 4.4 uses integer arithmetic and it only accepts integral values in the range [0, 100] for w_a and w_b. This is only a minor limitation and it does not have a significant impact on the performance experiments to be described in Chapters 5 and 6.
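The two probabilities quoted above can be checked with a short calculation. The sketch below assumes only the selection process described in this section: at position n the best remaining branch is picked with probability w/100 (w = w_a for the first position, w_b otherwise), and a random pick among the b − n + 1 remaining branches is used when the knowledge fails. The function name `best_probabilities` is chosen here for illustration.

```python
def best_probabilities(b, w_a, w_b):
    """Return p where p[n-1] is the chance that position n holds the best branch."""
    probs = []
    remaining = 1.0                          # chance the best branch is still unordered
    for n in range(1, b + 1):
        w = (w_a if n == 1 else w_b) / 100.0
        hit = w + (1.0 - w) / (b - n + 1)    # knowledge succeeds, or a random pick is lucky
        probs.append(remaining * hit)
        remaining *= 1.0 - hit
    return probs


p = best_probabilities(20, 69, 19)
print(f"p_1 = {p[0]:.6f}")                   # prints p_1 = 0.705500
print(f"first quarter = {sum(p[:5]):.6f}")   # prints first quarter = 0.899916
```

For b = 20, w_a = 69 and w_b = 19 this reproduces the values 0.705500 and 0.899916 given in the text, and the probabilities over all 20 positions sum to one.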

The performance experiments in this work will consider five different branch orderings as generated by the following parameter sets:

Figure 4.6: The probability curves generated by the five parameter sets used in the experiments. [Plot not recoverable; horizontal axis: Branch Position.]

The first parameter set generates trees that resemble the trees searched by RajahX, while the second generates strongly ordered trees. The last three parameter sets are designed to illustrate the effect of decreasing branch ordering accuracy. The fifth parameter set generates trees that are randomly ordered. Equations 4.1 and 4.2 can be used to compute the probability curves for the five branch orderings; the resulting curves are plotted in Figure 4.6.


Chapter 5

Neural Network Based Prediction

5.1 Introduction

In a parallel alpha-beta search, a split-point must be chosen carefully if large increases in search overhead are to be avoided. If a node that will be cut off is selected as a split-point, useless work will be performed when many processors collaborate on the node. On the other hand, a node that examines all of its successors makes a perfect split-point. In addition, for efficiency reasons, a node should be selected as a split-point only when it is reasonably certain that its score will not change further.

Given a large set of training data, a simple feed-forward neural network [7] can be taught to approximate a wide variety of functions. For the purposes of a parallel alpha-beta searcher, a feed-forward neural network can be taught to predict when a node's score is going to stabilize and whether a node is going to cut off. A neural network that is capable of performing these two tasks is presented here. Its application to parallel alpha-beta search is examined using both sequential and parallel experiments.

5.2 Neural Network Inputs and Outputs

The neural network is called as a new child node is being generated. Three inputs are required by the network:

• pv: The value of this input is 1 if the parent node is among the first nodes to be expanded by the search (i.e. a PV node). It is 0 otherwise.

• type: This input represents the parent node's type and it is a value in the range [0, 100].

• index: The position of the child being expanded among the other children at this node.

The network produces two outputs:


• type: Represents the child node's type and it is a value in the range [0, 100].

• wait: This output represents the number of moves that need to be searched before the score at the child node stabilizes.

Strong YBWC (Section 3.4) classifies a node into one of three types. Furthermore, in YBWC, given a node and its type, one can determine the type for the node's children as well. The network presented here performs a similar task: given the parent's type, the network generates the child's type. Note that a node's type may be one of 101 different types. This generality is necessary to allow the neural net to discover interesting patterns in the training data.

A parallel alpha-beta searcher can wait until the number of moves specified by the output, wait, is searched sequentially before trying parallel search. However, using the output, type, is slightly more involved. If the value for a node's type falls within a range of numbers that are known to represent cut-off nodes, extra care can be used before the node is selected as a split-point or the node can be avoided altogether.

5.3 Generating Training Data

A single piece of training data (also referred to as a pattern) consists of a group of input values and the expected output values for the given inputs. Generating training data is a three-step process. First, a basic sequential alpha-beta search is modified to generate the following data at every node:

• $(p_1, p_2, \ldots, p_{n-1})$: Specifies the "address" of the node in the tree. Here, n is equal to the height of the tree being searched. Each p value specifies the move that was selected at the tree level indicated by the subscript. For example, the node labeled B in Figure 5.1 has the address (1, 2, 2). When addressing a node, if a particular level has not been reached as yet, the corresponding p value is set to 0. For example, the node labeled A has the address (2, 1, 0).

• t: A type value associated with the node. If a cut-off occurs at the node, then its type value is 50. If a node updates its score but does not cut off, then its type value is 0. If a node does not update its score and does not cut off, then its type value is 100.

• u: The index of the move at which the final score update occurred or the index of the move that produced a cut-off.
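The type values and zero-padded addresses described above can be illustrated with a short sketch. The function names and the list-of-moves representation are assumptions made for illustration; they are not taken from the original search code.

```python
def type_value(cut_off: bool, score_updated: bool) -> int:
    """Return the type value t recorded for a node."""
    if cut_off:
        return 50                       # a cut-off occurred at the node
    return 0 if score_updated else 100  # updated score vs. never updated


def node_address(path, height):
    """Pad the moves selected so far with zeros up to height - 1 components.

    `path` holds the 1-based move chosen at each level reached so far.
    """
    return tuple(path) + (0,) * (height - 1 - len(path))
```

With a tree of height 4, `node_address([2, 1], 4)` gives the address of the node labeled A in the text, `(2, 1, 0)`.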

Figure 5.1: Node addressing.

Values for $(p_1, p_2, \ldots, p_{n-1})$, t and u are collected as the 200 trees from set tg2 are searched to a depth of 4. Since the data is collected by traversing several different trees, there may be several different values of t and u for a node address (i, j, k). This problem is rectified in the second step by computing the average value of t and u for each node address in the tree. The third and final step defines a mapping between adjacent levels in the tree. Let us consider how a mapping may be formed between the second and third levels in the tree. A node at the second level of the tree has an address of the form (i, 0, 0), i ≠ 0. The average values of t and u at this node are denoted $t_{avg}(i, 0, 0)$ and $u_{avg}(i, 0, 0)$, respectively. A node at the third level of the tree has an address of the form (i, j, 0), i ≠ 0, j ≠ 0. A mapping between a parent at the second level and a child at the third level can be defined as follows:
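One form of this mapping that is consistent with the definitions above, and with the inputs and outputs listed in Section 5.2, is sketched below. Pairing the parent's average type with the child index j on the left-hand side is an assumption inferred from those definitions.

```latex
\[
\bigl(t_{avg}(i,0,0),\; j\bigr) \;\longmapsto\; \bigl(t_{avg}(i,j,0),\; u_{avg}(i,j,0)\bigr)
\]
```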

This mapping is computed for every possible value of i and j. The same process is applied to define a mapping between the nodes at the first and second levels and between the nodes at the third and fourth levels.
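The second and third steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: samples are `(address, t, u)` tuples, a parent address is obtained by zeroing the deepest non-zero component, and the pv input is left out for brevity. None of the names below come from the original implementation.

```python
from collections import defaultdict


def average_by_address(samples):
    """Step 2: average t and u over all trees for each node address."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for address, t, u in samples:
        entry = sums[address]
        entry[0] += t
        entry[1] += u
        entry[2] += 1
    return {a: (s[0] / s[2], s[1] / s[2]) for a, s in sums.items()}


def parent_of(address):
    """Return (parent address, child index), or None for the root."""
    for level in range(len(address) - 1, -1, -1):
        if address[level] != 0:
            parent = address[:level] + (0,) + address[level + 1:]
            return parent, address[level]
    return None


def build_patterns(averages):
    """Step 3: pair (parent t_avg, child index) with (child t_avg, child u_avg)."""
    patterns = []
    for address, (t_child, u_child) in averages.items():
        link = parent_of(address)
        if link is None:
            continue
        parent, index = link
        if parent in averages:
            patterns.append(((averages[parent][0], index), (t_child, u_child)))
    return patterns
```

Each resulting pattern pairs the values at a parent with the values at one of its children, which is the shape of data the network is trained on.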

The data generated by the three-step process just described can be used as training data for a neural network. The task of the neural network is to discover the function that maps a pair of values at a parent into a pair of values at the child. For additional accuracy, a boolean value, pv, is also used as input to the function. This boolean value specifies whether the parent node is among the first nodes to be expanded by the search. Using the notation of Section 5.2, the task of the neural network is to discover the mapping that produces (type, wait) as output given (pv, type, index) as input.

Five sets of training data are generated, one for each of the five ordering types that are considered in this test (see Section 4.3.3).

5.4 Neural Network Structure and Back-Propagation

The structure of the neural network used is shown in Figure 5.2. It consists of three layers: an input layer ($i_1$, $i_2$, $i_3$, $i_b$), a hidden layer ($h_1$, $h_2$, ..., $h_b$) and an output layer ($o_1$, $o_2$). The nodes $i_b$ and $h_b$ are bias neurons which always output 1. Except for the bias neurons and the neurons in the input layer, the output of any other neuron is given by the following equation:
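The behaviour described for Figure 5.3 (values near 0 for large negative arguments and near 1 for large positive arguments) matches the standard logistic function, which is assumed here:

```latex
\[
f(\beta) = \frac{1}{1 + e^{-\beta}}
\]
```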

Figure 5.3 provides a plot of $f(\beta)$ for $-6 \le \beta \le 6$. If $\beta$ is a large negative value, the function returns a value very close to 0. However, if $\beta$ is a large positive value, the function returns a value close to 1. For the k-th neuron in the hidden layer, $\beta$ is given by:
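Based on the description of the weights in this section, the weighted sum for a hidden neuron can be written as below. The index range (three inputs plus one bias weight) is an assumption drawn from the input layer listed for Figure 5.2.

```latex
\[
\beta_k = \sum_{j=1}^{3} wh_{k,j}\, i_j + wh_{k,b}
\]
```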

Similarly, the value of $\beta$ for the k-th neuron in the output layer is given by:
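By analogy with the hidden layer, and assuming the sum runs over every non-bias hidden neuron, the output-layer sum can be written as:

```latex
\[
\beta_k = \sum_{j} wo_{k,j}\, h_j + wo_{k,b}
\]
```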

A weight value, wh or wo, is associated with each input to a neuron and it is used to scale the input before it is summed. Note that each neuron has its own set of weights; one neuron's set of weights is not necessarily the same as another neuron's. The weights $wh_{k,b}$ and $wo_{k,b}$ are used to scale the bias input.
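The forward computation described above can be sketched as follows. This is an illustrative sketch only: it assumes a logistic threshold function and stores each neuron's weights as a list whose last element is the bias weight. The names `forward`, `wh` and `wo` are chosen to mirror the text and are not the original code.

```python
import math


def sigmoid(beta):
    """Logistic threshold function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-beta))


def forward(inputs, wh, wo):
    """Compute hidden and output activations for one input pattern.

    inputs: [i1, i2, i3], already scaled.
    wh[k], wo[k]: weights of the k-th hidden/output neuron, bias weight last.
    """
    x = list(inputs) + [1.0]  # the bias neuron always outputs 1
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in wh]
    h = hidden + [1.0]        # hidden-layer bias neuron
    outputs = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in wo]
    return hidden, outputs
```

With all weights set to zero every neuron outputs 0.5, which is a convenient sanity check before training.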

The output of a neuron is a value between 0 and 1. In keeping with this convention, the inputs to the neural net are scaled and shifted to fit the range [0.1, 0.9] before being placed in the input layer. Scaled and shifted versions of pv, type and index are placed in neurons $i_1$, $i_2$ and $i_3$, respectively. Similarly, the values in the output layer are scaled and shifted from the range [0.1, 0.9] into the range required by the output variables.
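A linear map is one straightforward way to perform this scaling and shifting. The sketch below assumes such a map; the helper names and the explicit `lo`/`hi` bounds are illustrative.

```python
def to_unit(value, lo, hi):
    """Map a value from [lo, hi] into the network range [0.1, 0.9]."""
    return 0.1 + 0.8 * (value - lo) / (hi - lo)


def from_unit(y, lo, hi):
    """Map a network output in [0.1, 0.9] back into [lo, hi]."""
    return lo + (y - 0.1) * (hi - lo) / 0.8
```

For the type input, whose range is [0, 100], `to_unit(50, 0, 100)` lands in the middle of the network range.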

To train the neural network, a set of input values is applied to the network. The network's outputs are compared to the expected outputs to determine the error associated with the outputs. The error associated with the k-th output neuron is given by:
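Using the symbols defined in the surrounding text, the error can be written as the difference between desired and actual output. The symbol name $e_k$ and the sign convention are assumptions, chosen so that the update in Equation 5.6 reduces the error.

```latex
\[
e_k(p) = d_k(p) - o_k(p)
\]
```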

Here, $o_k(p)$ represents the output of the k-th output neuron for the input pattern p and $d_k(p)$ represents the desired output for the k-th output neuron. Error in the neural network output can be reduced by applying the back-propagation method [7] to adjust the weights in the hidden and output layers. The adjustment to the weights at the k-th neuron in the output layer is given by:

\[ \Delta wo_{k,j} = \eta\, \gamma_k(p)\, h_j(p) \qquad (5.6) \]

Figure 5.2: Neural network structure.

Figure 5.3: The threshold function, $f(\beta)$, used for each neuron in the neural network.

Here, $\eta$ is the learning rate and $\gamma_k(p)$ is given by:
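For a logistic threshold function, the standard back-propagation form of this output-layer term, assumed here, is:

```latex
\[
\gamma_k(p) = \bigl(d_k(p) - o_k(p)\bigr)\, o_k(p)\, \bigl(1 - o_k(p)\bigr)
\]
```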

Similarly, the weights for the k-th neuron in the hidden layer are adjusted using:
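By analogy with Equation 5.6, and assuming the hidden-layer inputs are the input-layer values $i_j(p)$, the hidden-layer adjustment can be written as:

```latex
\[
\Delta wh_{k,j} = \eta\, \delta_k(p)\, i_j(p)
\]
```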

Although $\delta_k(p)$ and $\gamma_k(p)$ play similar roles, the computation of $\delta_k(p)$ is more involved:
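In standard back-propagation with a logistic threshold function, the hidden-layer term is $\delta_k(p) = h_k(p)\,(1 - h_k(p)) \sum_l \gamma_l(p)\, wo_{l,k}$; that form is assumed here. The Python sketch below applies it together with the output-layer update to a single pattern. The function name, the list-of-lists weight layout (bias weight last) and the returned squared error are illustrative choices, not the original implementation.

```python
import math


def sigmoid(beta):
    return 1.0 / (1.0 + math.exp(-beta))


def train_step(x, desired, wh, wo, eta=0.1):
    """Apply one back-propagation update for a single pattern (in place).

    Returns the summed squared output error measured before the update.
    """
    xi = list(x) + [1.0]  # inputs plus bias neuron
    h = [sigmoid(sum(w * v for w, v in zip(row, xi))) for row in wh] + [1.0]
    o = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in wo]

    # Output-layer term: gamma_k = (d_k - o_k) * o_k * (1 - o_k)
    gamma = [(d - ok) * ok * (1.0 - ok) for d, ok in zip(desired, o)]
    # Hidden-layer term, computed with the weights from before the update
    delta = [h[k] * (1.0 - h[k]) * sum(g * wo[l][k] for l, g in enumerate(gamma))
             for k in range(len(wh))]

    for k, row in enumerate(wo):
        for j in range(len(row)):
            row[j] += eta * gamma[k] * h[j]
    for k, row in enumerate(wh):
        for j in range(len(row)):
            row[j] += eta * delta[k] * xi[j]
    return sum((d - ok) ** 2 for d, ok in zip(desired, o))
```

Calling `train_step` repeatedly on the same pattern should drive the returned error down, which gives a quick check that the signs of the updates are right.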

A training iteration consists of feeding every training input to the network. Any error in the output is used to adjust the network weights. As the iterations progress, the average squared error in each output is measured. The average squared error for output k is given by:

\[ E_k = \frac{1}{|P|} \sum_{p \in P} \bigl(d_k(p) - o_k(p)\bigr)^2 \]

Here, P is the set of all training patterns. One neural network was created for every type of ordering considered. Each network was trained over 10000 iterations and the learning rate, $\eta$, was set to 0.1 during the training. Table 5.1 summarizes the average squared error for the final network obtained after training.

Table 5.1: Average squared error for the final neural network obtained after training. (Columns: Ordering, $E_{type}$, $E_{wait}$. Legible entries: (32, 79, 5), 37.460133, 0.633316.)

5.5 Performance of Node Classification Schemes

Consider three different schemes that predict whether a node will cut off or will examine all its successors. The first scheme, which will be referred to as scheme A, is taken from strong YBWC. A node in this scheme is classified as follows (Section 3.4):

• The root node is of type Y-PV. The first successor at a Y-PV node is of type Y-PV, while the rest of the successors are Y-CUT.

• The first successor to a Y-CUT node is a Y-ALL node, while the rest are Y-CUT nodes.

• All successors at Y-ALL nodes are Y-CUT nodes.
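The three rules of scheme A translate directly into a small function. The function name and the use of a 0-based successor index are assumptions made for this sketch.

```python
def scheme_a_child_type(parent_type: str, index: int) -> str:
    """Return the scheme A type of the successor at `index` (0 = first)."""
    if parent_type == "Y-PV":
        return "Y-PV" if index == 0 else "Y-CUT"
    if parent_type == "Y-CUT":
        return "Y-ALL" if index == 0 else "Y-CUT"
    if parent_type == "Y-ALL":
        return "Y-CUT"
    raise ValueError(f"unknown node type: {parent_type}")
```

Starting from a root of type Y-PV, repeated calls assign a type to every node in the tree.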

The second scheme, scheme B, is derived from the description of the DTS algorithm (Section 3.5):

• A node that has the same alpha and beta values as the root is classified as D-PV.

• A minimizing node with the same beta as the root or a maximizing node with the same alpha as the root is classified as D-CUT.

• Any node that doesn't fit the first two categories is classified as a D-ALL node.

• If more than some number of moves, n, have been searched at a D-CUT node without an occurrence of a cut-off, then the type of the node is changed to D-ALL.

• Two D-ALL nodes are allowed at subsequent levels in the tree exactly once. Following the two D-ALL nodes, if they exist, nodes at subsequent levels are forced into an alternating sequence of D-CUT and D-ALL nodes.

Both scheme A and scheme B have types that correspond to the ALL and CUT types in a perfectly ordered game tree. Although PV nodes are named differently in each scheme, a node that would be classified Y-PV in scheme A would be classified as D-PV in scheme B. Each scheme considers its PV type to be the first sequence of nodes expanded.

In scheme B a node may change from being a D-CUT node to a D-ALL node. In this case, only the initial prediction is used in determining performance. Note that when the D-CUT node changes, it affects the classification of nodes that are below it in the tree.

The final scheme, scheme C, is an application of the neural networks that were described in the preceding section. In this scheme, a node may be classified as being one of 101 types. The neural network generates a type value for a child node given the parent's type value and the location of the child among the parent's other children. If the type value for a node falls within a range [50 − m, 50 + m] centered around 50, the node is defined to be an N-CUT node. If the type falls outside this range, the node is said to be an N-ALL node. If a node is among the first nodes to be expanded by the search, the neural network output is not used and the node is defined to be an N-PV node.
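The classification rule of scheme C can be sketched as a single function. The function name and the `on_first_path` flag are illustrative; the type value is assumed to be the (rescaled) network output in [0, 100].

```python
def scheme_c_class(type_value: float, m: float, on_first_path: bool = False) -> str:
    """Classify a node from its predicted type value."""
    if on_first_path:
        return "N-PV"  # network output is ignored for the first nodes expanded
    if 50 - m <= type_value <= 50 + m:
        return "N-CUT"
    return "N-ALL"
```

Note that choosing m = 50 makes the range cover every possible type value, so no node is ever classified as N-ALL.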

To compare the performance of schemes A, B and C, each scheme was used to categorize the internal nodes of 100 artificially generated trees. In each tree, the first sequence of nodes expanded (PV nodes) are not included in the performance measure because all three schemes generate the same type for these nodes. The trees were generated using set tg1 and were searched to a depth of 6. Tables 5.2, 5.3 and 5.4 summarize the results of the experiments.

Table 5.2: Performance of Scheme A.

In each table, the True_CUT column indicates how often the prediction was correct for a CUT type node (Y-CUT, D-CUT or N-CUT). In other words, this column indicates how often a CUT type node was found to cut off without having searched all of its branches. Similarly, the True_ALL column indicates how often an ALL type node (Y-ALL, D-ALL or N-ALL) searched all of its branches. On the other hand, the False_CUT column indicates the number of CUT nodes that did not cut off and the False_ALL column indicates the number of ALL nodes that did not search all of their branches. The Error column is obtained by summing the number of mispredicted CUT nodes and the number of mispredicted ALL nodes. When using scheme B, a value must be specified for parameter n. Recall that n is the number of moves after which a D-CUT node changes into a D-ALL node. The second column in Table 5.3 specifies the value of n that was found to give the lowest prediction error for a given ordering. As in scheme B, scheme C requires a value for parameter m. Recall that this parameter defines a range of type values [50 − m, 50 + m] that are used to check if a node is of the CUT type. The second column in Table 5.4 specifies the value of m that was found to give the lowest prediction error.
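The columns described above can be computed with a simple tally. In this sketch each prediction is assumed to be a pair of the predicted kind ("CUT" or "ALL") and a flag saying whether the node actually cut off before searching all of its branches; the representation is illustrative.

```python
def tally(predictions):
    """Count correct and incorrect CUT/ALL predictions and the total error."""
    counts = {"True_CUT": 0, "False_CUT": 0, "True_ALL": 0, "False_ALL": 0}
    for kind, cut_off in predictions:
        if kind == "CUT":
            counts["True_CUT" if cut_off else "False_CUT"] += 1
        else:
            counts["False_ALL" if cut_off else "True_ALL"] += 1
    # Error = mispredicted CUT nodes + mispredicted ALL nodes
    counts["Error"] = counts["False_CUT"] + counts["False_ALL"]
    return counts
```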

The lowest error, regardless of branch ordering, is exhibited by scheme C. Scheme A, which is based on strong YBWC, has the next highest error. The worst performance is exhibited by scheme B. Scheme B obtains its best results with n = 1. When n = 1, scheme B employs a rule that is quite similar to a rule used by scheme A. With n = 1, when one successor has been examined at a D-CUT node without an occurrence of a cut-off, the node's type is changed to D-ALL. A successor to this new D-ALL node will be of the D-CUT type if the rule regarding two consecutive D-ALL nodes is used. Note the similarity between this and scheme A's rule for Y-CUT nodes. In scheme A, the first successor to a Y-CUT node is of type Y-ALL while the rest are of type Y-CUT.

In Table 5.4, note that the last row has two entries that are 0. This means that for ordering (32, 0, 0) scheme C does not use the N-ALL classification at all! Consider what happens in scheme A for ordering (32, 0, 0). Scheme A places 2227973 nodes into the Y-ALL category. However, only 971304 were found to be true Y-ALL nodes. Scheme C takes the approach of eliminating the N-ALL category altogether, thereby eliminating the problem of inaccurate classification. Although this may seem simplistic, this is a safer approach for a parallel alpha-beta searcher since a CUT node is subject to more stringent rules before it is picked as a site for parallel search.

Figure 5.4 presents 5 bar graphs comparing the performance of schemes A, B and C. Scheme B has a significantly higher error than either scheme A or C for all orderings considered. The performances of schemes A and C are relatively close to each other except for those orderings where $w_b$ is of reasonable size. For example, when the ordering (32, 66, 66) is considered, scheme A produces an error value of 149611 while scheme C produces an error value of 314338. In this case, scheme C produced an error value that is over 40 percent lower than that produced by scheme A. In Figure 5.5, the improvement of C over A is plotted against the value of $w_b$ that produced it. For higher values of $w_b$, scheme C dominates scheme A by a significant margin.

Table 5.3: Performance of Scheme B.

Table 5.4: Performance of Scheme C.

Figure 5.4: Comparing the performance of schemes A, B and C. (Bar graphs of error for schemes A, B and C, one panel per ordering.)

Figure 5.5: Comparing the performance improvement of scheme C over scheme A for various values of $w_b$. (Vertical axis: improvement of C over A in %.)

Chapter 6

Experiments on a Parallel Alpha-Beta Simulator

6.1 Introduction

In Section 5.5, the performance of three different node classification schemes was considered within the framework of a sequential alpha-beta searcher. Here, the three schemes from that section are extended to create three split-point selection schemes whose performance is studied using a parallel alpha-beta simulator. The parallel alpha-beta simulator (PABSim) is a partial implementation of the DTS method described in Section 3.5 on a simplified shared memory multiprocessor system. The implementation is partial in the sense that there is no split-point selection mechanism in PABSim. It is the responsibility of the user to specify the details of the split-point selection mechanism. This facility will be used to compare the performance of the three split-point selection schemes.

6.2 A Simplified Shared Memory Multiprocessor

The simplified multiprocessor that is considered here has 48 processors. An illustration of the system is presented in Figure 6.1. Each processor has a portion of the total memory in the system. A cache is attached to each processor primarily to provide fast access to both local and remote memory. Each cache line is 128 bytes wide. Memory in the system is cache coherent and coherency is ensured using the 3-state MSI protocol in the caches along with a directory protocol [2] in the memory units. Details of how the processors are interconnected and the nature of the interconnection medium do not have a significant impact on the parallel alpha-beta searcher that is going to be described, thus this issue will not be considered further. However, the timing of the various types of memory accesses is of great significance and this topic is treated next.

Figure 6.1: The simplified shared memory multiprocessor system.

Timing in the simplified multiprocessor system is expressed in ticks. The value of a tick is not important as long as all timing in the system is expressed in ticks. A tick is defined to be the length of time it takes for a block of memory (cache line) to return from the local memory unit.

A block in the cache may be in one of three states: modified (M), shared (S) or invalid (I). Furthermore, to ensure cache coherency, the home node has a flag indicating whether a block is clean or dirty along with a list of nodes that have a copy of the block. The delays associated with reading a block that is in one of several possible states are given in Table 6.1. For example, when reading a block that is either in state M or state S, the read can be performed directly from the cache and there is virtually no delay. If the block is in state I and is flagged as being clean at the home node, it can be read in 1 or 2 ticks. The read takes 1 tick if the block is local and 2 ticks if the block is from a remote node. However, if the block is flagged as being dirty, the block has to be read from the node that modified the block and extra delays are incurred. A set of delays for block writes is presented in Table 6.2. If a block is in state M in the cache, it can be modified without any delay. However, if a block is in state S or state I, invalidations have to be sent to all processors that have a cached copy of the block.
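The read delays stated in the text can be captured in a small lookup function. This is a sketch only: "virtually no delay" is modelled as 0 ticks, and the delay for a dirty block is left as a caller-supplied parameter because its value is not stated in the text above. The function and parameter names are illustrative.

```python
def read_delay(cache_state: str, home_dirty: bool, is_local: bool, dirty_delay: int) -> int:
    """Return the delay in ticks for reading a block."""
    if cache_state in ("M", "S"):
        return 0                      # served directly from the cache
    if not home_dirty:
        return 1 if is_local else 2   # clean block: local or remote memory
    return dirty_delay                # dirty block: fetched from the modifying node
```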

Table 6.1: Delay for a memory read under various conditions in the simple multiprocessor. (Columns: cache state (M/S/I), home state (dirty/clean), home node (local/remote), delay in ticks.)

Table 6.2: Delay for a memory write under various conditions in the simple multiprocessor. (Columns: cache state (M/S/I), home state (dirty/clean), home node (local/remote), delay in ticks.)

The cache in the simplified multiprocessor is infinite in size. This simplification means that capacity and conflict misses do not occur in the caches. However, this assumption produces results that are drastically different from reality if the program being studied has a large working set. Fortunately, in the case of an alpha-beta search routine, the working set is quite small. A single block is usually enough to store all the data at a node. For example, a chess move is represented using 2 bytes of memory: 6 bits are used to represent the square that the piece is moving from, another 6 bits are used to represent the square that the piece is moving to and the remaining 4 bits are used as flags. On average, a chess position has 32 moves available, thus 64 bytes are enough to represent the move list at a node. A few extra bytes are required to hold the value of alpha, beta and any other temporary variables. If the height of the tree is 6 and the data at any node can be stored in a single block of memory, 6 blocks are enough to store the data for the nodes along the path from the root to the leaf of the tree. Therefore, the working set is no more than 6 blocks.
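The 2-byte move encoding can be illustrated with a pair of packing helpers. The text gives only the field widths (6, 6 and 4 bits); the bit positions chosen below, and the function names, are assumptions made for this sketch.

```python
def pack_move(from_sq: int, to_sq: int, flags: int = 0) -> int:
    """Pack a move into 16 bits: flags (4) | to square (6) | from square (6)."""
    if not (0 <= from_sq < 64 and 0 <= to_sq < 64 and 0 <= flags < 16):
        raise ValueError("field out of range")
    return (flags << 12) | (to_sq << 6) | from_sq


def unpack_move(move: int):
    """Return (from_sq, to_sq, flags) from a packed 16-bit move."""
    return move & 0x3F, (move >> 6) & 0x3F, (move >> 12) & 0xF
```

At 2 bytes per move, a list of 32 moves occupies 64 bytes, which leaves half of a 128-byte block for alpha, beta and temporary variables.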

6.3 Multiprocessor Simulation

Normally a multiprocessor simulator and the program being run on the simulator are two separate entities. However, the approach that is taken in PABSim is to embed the multiprocessor simulation within the code that performs the parallel alpha-beta search. Before detailing the simulation process itself, a description of fibers [19] is given. Fibers are of primary importance to the simulation process because they provide the mechanism by which several processors are simulated on a uniprocessor system.

Most operating systems today provide support for threads. A process running on such an operating system may create threads that are then scheduled by the operating system. All threads have access to the data within the process. The Win32 subsystem that is found in Microsoft's Windows 98 and Windows NT operating systems provides support for threads. Starting with version 4.0 of the subsystem, support is also provided for an entity referred to as a fiber. Fibers are similar to threads except that they are scheduled by the user rather than the operating system. In a multiprocessor simulation, a fiber can be used to simulate the activity of a processor in the system. Fibers are more natural than threads for a multiprocessor simulation because precise control is required over the process of selecting which processor to simulate next.

Only three Win32 functions are needed to perform the basic fiber operations. A fiber can be created with a call to CreateFiber. The function takes three arguments: a value for the size of the stack used for the fiber, a pointer to a function where the fiber begins execution and a parameter to be passed to the function where execution begins. The function returns a handle to the newly created fiber. During cleanup, a fiber can be deleted with a call to DeleteFiber. A fiber that is executing can suspend itself and can schedule another fiber for execution with a call to SwitchToFiber. The function takes the handle of the fiber that is to be scheduled as an argument.

6.3.2 Simulation Support Routines

Five routines are defined to help create multiprocessor simulations. The first function, YIELD, advances the simulation time for the fiber that calls the function and allows the execution of other fibers that are waiting to run. Scheduling of the other fibers is done in a round-robin fashion. That is, if fiber 1 calls the YIELD function, fiber 1 is suspended and fiber 2 is executed.

In order to model accesses to memory blocks, two functions are provided: CREAD and CWRITE. A read request to a memory block can be simulated using a call to CREAD and a write request to a memory block can be simulated using a call to CWRITE. Both functions properly model the state transitions for the block in the cache and the home node. The functions also account for the delays associated with reading and writing a block as indicated in Tables 6.1 and 6.2. For example, if a fiber executes the CREAD function to access a block that is set to state I in the cache, there will be delays in accessing the block. When such delays occur inside CREAD or CWRITE, a call is made to YIELD to advance the simulation time for the fiber. Furthermore, the call to YIELD suspends the fiber that initiated the read and another fiber is allowed to run. Both CREAD and CWRITE account for memory block contention. If one processor requests access to a memory block that is busy serving another processor's request, the request is rejected and the processor that initiated the request will have to try again. Although CREAD and CWRITE account for delays due to network traversal, they do not account for delays due to network contention. Recall that the network connecting the processors was left unspecified in Section 6.2. There may be delays due to network contention depending on the network structure used. Both CREAD and CWRITE assume that they can access the network as soon as a memory request requires communication with a remote node.

The last two support routines allow the creation of regions that are accessible to one fiber at a time. A lock is essentially a block of memory that is modified atomically. A test-and-set operation is used to create the lock entry function, LOCKENTER. The converse of the lock entry function is the LOCKLEAVE function that is used to release a lock.
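The round-robin behaviour of YIELD can be imitated in Python with generators, which, like fibers, are scheduled explicitly by the program rather than by the operating system. This is an analogy rather than the Win32 mechanism used by PABSim: each `yield ticks` below stands in for a call to YIELD, and all names are illustrative.

```python
from collections import deque


def run_round_robin(fibers):
    """Run generator-based 'fibers' in round-robin order.

    fibers: dict mapping a name to a generator. Each value yielded is the
    number of ticks by which that fiber's simulated clock advances.
    Returns the per-fiber clocks and the order in which fibers ran.
    """
    clock = {name: 0 for name in fibers}
    trace = []
    ready = deque(fibers.items())
    while ready:
        name, fiber = ready.popleft()
        try:
            ticks = next(fiber)          # run until the fiber yields
        except StopIteration:
            continue                     # fiber finished, drop it
        clock[name] += ticks
        trace.append(name)
        ready.append((name, fiber))      # rejoin at the back of the queue
    return clock, trace


def worker(steps, ticks_per_step=1):
    """A toy processor that performs `steps` units of work."""
    for _ in range(steps):
        yield ticks_per_step
```

Running two workers shows the alternation: each one advances its own clock, then hands control to the next fiber in the queue.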

To illustrate how these simulation support routines are used, a simple matrix multiplication example is examined next. The routine in Figure 6.2 multiplies two 64 × 64 matrices, a and b, to produce a third matrix, c. It is assumed that four processors are executing the routine illustrated, with each processor's id (0, 1, 2 or 3) stored in tid. The rows in matrix c are divided evenly among the four processors. Each processor is responsible for determining the values to be placed in the rows assigned to it. All three matrices are shared by the four processors performing the computation.

    for r ← 16 · tid to 16 · (tid + 1) − 1
        for c ← 0 to 63
            t ← 0
            for i ← 0 to 63
                t ← t + a[r][i] · b[i][c]
            c[r][c] ← t

Figure 6.2: Parallel matrix multiplication.

Now, consider using the simulation support routines to simulate the execution of the matrix multiplication routine on the simple multiprocessor system. In the simulation, the matrix multiplication routine is executed by four fibers. Each fiber executes the routine illustrated in Figure 6.3. The variables r, c, i and t fit into the same block of memory and are local to each processor. The array, BlockT, contains the blocks where these variables are stored. There is an entry in the array corresponding to each processor in the system. Since each of these blocks is accessed by a single processor, a call to CREAD or CWRITE is not required each time r, c, i or t is accessed. However, the initial write request to r needs to be modeled since the block may not be within the cache of the processor requesting the write. After the initial write, the block will remain in the cache in the correct state and all accesses to it will not incur extra delays. Recall that the size of the cache is assumed to be infinite, thus the block will never be replaced. If each value in the matrix is a four-byte integer, 32 entries fit into a single block of memory (recall that a block is 128 bytes wide). Assuming that the matrices are aligned to block boundaries, the row and column counters can be used to determine the block containing a matrix entry. The blocks corresponding to matrix entries in a, b and c can be stored in BlockA, BlockB and BlockC, respectively. A read or write request is sent to the appropriate block as required (lines 7, 8 and 14). Note that a call to YIELD is made when time reaches 10 (line 12). The assumption is that time proportional to a tick has elapsed when time reaches a count of 10.

Consider what happens as four fibers execute the routine in Figure 6.3. If a fiber executes any read or write request that has an associated delay, execution of the fiber is suspended and another fiber is run. Similarly, if a fiber has executed a tick worth of operations, it is suspended and another fiber takes over the simulation system for a tick. In this manner the four fibers, each of which represents a processor in the multiprocessor system, execute on a uniprocessor system.
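The row partitioning and block arithmetic used in this example can be checked with a few lines of Python. The sketch assumes row-major storage aligned to a block boundary, as the text does; the constant and function names are illustrative.

```python
N = 64                 # matrices are N x N
ENTRY_BYTES = 4        # four-byte integers
BLOCK_BYTES = 128      # cache line size
ENTRIES_PER_BLOCK = BLOCK_BYTES // ENTRY_BYTES  # 32 entries per block


def rows_for(tid: int, processors: int = 4) -> range:
    """Rows of c assigned to processor `tid` (an even split)."""
    per = N // processors
    return range(per * tid, per * (tid + 1))


def block_index(row: int, col: int) -> int:
    """Index of the block holding entry [row][col] of a row-major matrix."""
    return (N * row + col) // ENTRIES_PER_BLOCK
```

Each row spans two blocks, so a full 64 × 64 matrix occupies 128 blocks.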

6.4 The Parallel Alpha-Beta Simulator (PABSim)

In the following description of PABSim, calls to the first three simulation support routines, YIELD, CWRITE and CREAD, are omitted for clarity. These calls would never be present in any parallel program; they are only present in PABSim for simulation purposes. Furthermore, code that is used to generate the artificial tree is also omitted. Note that the artificial tree generation code in PABSim is identical to the one described in Section 4.2. The resulting description details the DTS-like parallel alpha-beta search procedure that forms the core of PABSim without the added clutter of simulation support and artificial tree generation code.

6.4.1 Node Structure

The node structure is fundamental to all routines in PABSim. Figure 6.4 illustrates the fields within the NodeT structure that is used to hold all node data. The first field, lock, is a lock variable that controls access to the node. A node may be accessed by several processors at the same time, therefore a lock is necessary to ensure that only one processor is allowed to modify the node at any time. The second field, threads, is an integer that indicates the number of threads working at the node. Recall that in DTS node ownership may transfer from one processor to another. A pointer to the parent node is maintained in parent so that the node's current owner knows where to return the score determined at the node when it is closed. When returning the score determined at a node to its parent, certain data structures have to be cleared at the parent so that the node isn't visited again. The integer field, parentidx, contains an index that specifies the position of the child among the other children at the parent. For example, if the current node is the fifth node expanded at the parent, parentidx would contain 4 (indexing starts at 0). A count of the moves that remain unexamined at a node is held in


 1: time ← 0
 2: CWRITE(tid, BlockT[tid])
 3: for r ← 16 · tid to 16 · (tid + 1)
 4:   for c ← 0 to 63
 5:     t ← 0
 6:     for i ← 0 to 63
 7:       CREAD(tid, BlockA[(64 · r + i)/32])
 8:       CREAD(tid, BlockB[(64 · i + r)/32])
 9:       t ← t + a[r][i] · b[i][r]
10:       time ← time + 1
11:       if time = 10 then
12:         YIELD(tid, 1)
13:         time ← 0
14:     CWRITE(tid, BlockC[(64 …
15:     c[r][c] ← t

Figure 6.3: Simulating matrix multiplication.


 1: record NodeT
 2:   lock
 3:   threads
 4:   parent
 5:   parentidx
 6:   mvsleft
 7:   mvsfini
 8:   type
 9:   alpha
10:   beta
11:   score
12:   branch[32]
13:   child[32]

Figure 6.4: The NodeT structure.

mvsleft. Note that a move that is currently being searched by a processor is not included in this count. A similar count for moves that have been completely searched is held in mvsfini. PABSim uses the minimax formulation as opposed to the negamax formulation. Recall that in negamax every node is of the maximizing type. In minimax, a distinction is made between a node that is of the minimizing type and a node that is of the maximizing type. A flag that specifies the node type is stored in type. The lower and upper bound at a node are stored in alpha and beta, respectively. At a minimizing node, the lowest value found so far is stored in score. The same field is used at a maximizing node to store the highest value found. A list of moves at a node is stored in the array branch. As branches are expanded, the newly allocated node data structures are stored in child. Note that the highest branching factor at a node is assumed to be 32.
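For readers who prefer concrete code, the node record can be sketched as a Python dataclass. The field names follow the description above; everything else is an assumption made for illustration, including the string encoding of the node type, the initial values of alpha, beta and score, and the hypothetical helper `add_child`.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional

MAX_BRANCH = 32  # highest branching factor assumed in the text


@dataclass(eq=False, repr=False)
class NodeT:
    lock: threading.Lock = field(default_factory=threading.Lock)
    threads: int = 0                    # processors currently working at the node
    parent: Optional["NodeT"] = None    # where the score is returned on close
    parentidx: int = 0                  # position among the parent's children
    mvsleft: int = 0                    # moves not yet assigned to any processor
    mvsfini: int = 0                    # moves completely searched
    type: str = "max"                   # "max" or "min" (assumed encoding)
    alpha: float = float("-inf")        # lower bound
    beta: float = float("inf")          # upper bound
    score: float = float("-inf")        # best value found so far
    branch: List[int] = field(default_factory=lambda: [0] * MAX_BRANCH)
    child: List[Optional["NodeT"]] = field(
        default_factory=lambda: [None] * MAX_BRANCH)


def add_child(parent: NodeT, idx: int) -> NodeT:
    """Hypothetical helper: attach a child of the opposite type at slot idx."""
    kid_type = "min" if parent.type == "max" else "max"
    start = float("inf") if kid_type == "min" else float("-inf")
    kid = NodeT(parent=parent, parentidx=idx, type=kid_type,
                alpha=parent.alpha, beta=parent.beta, score=start)
    parent.child[idx] = kid
    return kid
```

With this layout, the fifth child expanded at a node is created by `add_child(node, 4)`, matching the zero-based parentidx example in the text.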

6.4.2 Node Allocation

Nodes are allocated and deallocated as the search of the tree progresses. The allocation and deallocation routines use two heaps of nodes: a global heap and a local heap. The global heap is shared by all processors and it initially contains $p \cdot d \cdot 2$ nodes. Here, $p$ is the number of processors involved in the search and $d$ is the depth of the tree being searched. Each processor also has a local heap of nodes. Although the local heaps are initially empty, the local heaps


Figure 6.5: All possible state transitions for a processor performing parallel alpha-beta search.

can contain up to $d \cdot 2$ nodes. When a processor allocates a node, the local heap is checked for a free node before the global heap is used. The global heap is shared by all processors and access to it is controlled by a lock, thus it is better to satisfy node allocation requests locally whenever possible. When a node is deallocated, it is returned to the local heap if possible. If the local heap has reached its maximum size, the node is returned to the global heap. At the start of the search, all allocations will use the global heap. However, as the search progresses, all node allocation requests will be satisfied locally.
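The two-level allocation policy can be sketched as follows. This is an illustrative sketch only: the class name `NodePool` and its methods are hypothetical, nodes are stand-in objects, and the heap sizes assume the reading of the text in which the global heap starts with p · d · 2 nodes and each local heap holds at most d · 2.

```python
import threading


class NodePool:
    """Per-processor local free lists backed by one lock-protected global list."""

    def __init__(self, processors, depth):
        self.global_heap = [object() for _ in range(processors * depth * 2)]
        self.global_lock = threading.Lock()
        self.local_cap = depth * 2
        self.local_heaps = [[] for _ in range(processors)]  # initially empty

    def alloc(self, pid):
        local = self.local_heaps[pid]
        if local:                      # cheap path: no lock needed
            return local.pop()
        with self.global_lock:         # contended path: shared heap
            return self.global_heap.pop()

    def dealloc(self, pid, node):
        local = self.local_heaps[pid]
        if len(local) < self.local_cap:
            local.append(node)         # keep the node close to its last user
        else:
            with self.global_lock:     # local heap is full
                self.global_heap.append(node)
```

The sketch does not handle an exhausted global heap; `alloc` would raise `IndexError` in that case. The design point it illustrates is that the lock is only taken when the local heap cannot serve the request.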

6.4.3 Search Procedure

A processor that is participating in the parallel alpha-beta search procedure defined by PABSim may be in one of seven possible states: expand, search, eval, return, split, update and exit. Figure 6.5 illustrates all possible state transitions that can occur. Except for the last state, exit, there is a routine in PABSim corresponding to each of the six other states. These routines are referred to as state routines. Each processor also maintains a pointer to the "current" node. The target of all work performed by a processor is its current node. A processor may be viewed as a state machine whose behavior is governed by its state and its current node. Parallel search is then a result of several interacting state machines.

Figure 6.6 illustrates the TDataT data structure that is used to hold search data local to a processor. A pointer to the current node is stored in node and the processor's current state is stored in state. An additional field called value is used as a temporary data storage area when a processor switches states. A variable of type TDataT is passed into each state routine.
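The state-machine view of a processor can be sketched as a small dispatch loop. The names `TData`, `run_processor` and `demo` are hypothetical, and the toy routines in `demo` only change the state; they stand in for the real state routines, which also operate on the current node.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class TData:
    node: Any = None         # pointer to the current node
    state: str = "expand"    # current state of this processor
    value: Any = None        # scratch storage used across state changes


def run_processor(tdata: TData,
                  routines: Dict[str, Callable[[TData], None]]) -> int:
    """Call the routine for the current state until the exit state is reached."""
    steps = 0
    while tdata.state != "exit":
        routines[tdata.state](tdata)   # each routine sets the next state
        steps += 1
    return steps


def demo() -> int:
    """Toy run: expand -> search -> eval -> return -> exit."""
    order = {"expand": "search", "search": "eval",
             "eval": "return", "return": "exit"}
    routines = {s: (lambda td, nxt=n: setattr(td, "state", nxt))
                for s, n in order.items()}
    return run_processor(TData(), routines)
```

In a parallel run, each processor would execute its own copy of this loop with its own `TData`, and the processors would interact only through the shared nodes and their locks.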

State: expand

The routine corresponding to the expand state is illustrated in Figure 6.7. When a node is initially created, its move list is empty. In the expand state, a call is made to a move generation


1: record TDataT
2:   node
3:   state
4:   value

Figure 6.6: The TDataT data structure.

1: B ~ x ~ ~ ~ r G ~ ~ ~ ~ ~ - i ~ ~ ( t d a t a . n o d e )

2: tdata.state ← search

Figure 6.7: The expand state routine.

routine to create a valid move list. Since PABSim searches artificial trees, the move generation routine that it uses creates a set of seeds that are to be passed on to the child nodes (see Section 4.2). Once the move generation is complete, a processor that is in the expand state changes to the search state.

The node being expanded is already locked before the expand routine is executed. It is locked when it is initially created by the search state. When the state changes from expand to search, the node remains locked.

State: search

The search state routine is given in Figure 6.8. At the start of the routine, two checks are performed to ensure that the node is searchable. The first check verifies that the node score is within the bounds at the node (lines 1-10) and the second check verifies that there are unassigned moves at the node (lines 11-13). If either of the two checks fails, a transition is made to the return state. If both checks pass, an unassigned move is expanded to create a new node. A node created by NODEALLOC is initially locked. This is to prevent other processors from entering the newly created node before the node's move list is populated. Note that when the call to NODEALLOC returns, the processor executing the search routine holds two locks: one lock controls access to the current node and the other lock controls access to the newly created node. Search now moves to the new node: the lock corresponding to the current node is released and the newly created node becomes the current node. If the new node is a leaf, the


processor's state changes from search to eval. However, if the new node is an internal node, the state changes to expand.

State: eval

The eval state routine is illustrated in Figure 6.9. In the eval state, a score is determined for the current node. Normally this would involve the execution of an application-specific routine that computes a score for the node. This routine usually accounts for a significant percentage of the total search time. However, since PABSim searches artificial trees, the score is available instantaneously. In PABSim, EVALUATE does not return the score immediately but delays the processor by a specified amount of time before returning. Once evaluation is complete, the processor's state is changed to return.

State: return

In the return state, the current node has been completely searched and the score determined at the node is ready to be returned to its parent. The return state routine is presented in Figure 6.10. Recall that in DTS, a node is owned by the last processor that is left searching the node. The responsibility of returning the node's score to its parent falls on the node's owner. At the start of the return routine, a check is performed to see if the current processor is the last one at the node. If the current processor is not the last, the thread count is simply decremented and the processor enters the split state where it searches for another node to work on. However, if the current processor is the last one at the node, it begins the process of deallocating the node (lines 6-12). If the node being deallocated is the root node itself, then the search is complete and the processor enters the exit state. In addition, a global flag called searchend is set to indicate that the search of the tree is complete. For any other node, the node's score is placed within the processor's value field. The value field is used as temporary storage for a node's score before it is actually used in the update state. The current node is then deallocated, the parent node becomes the current node and the processor's state is changed to update.

State: update

In the update state, the search has just returned from the search of a child node. The child node's score is found in the processor's value field. Figure 6.11 shows the update state routine. If the score returned by the child is better than the score at the node, the score is updated. When an update occurs, the new score is actually a bound for the nodes lower in the tree. If the new score is a lower bound, TRANSLB is used to transmit the new bound to the subtree rooted at the current node. TRANSLB performs a depth-first traversal on the subtree. At each


SEARCH(tdata)
 1: if tdata.node.type = max then
 2:   if tdata.node.score ≥ tdata.node.beta then
 3:     tdata.node.score ← tdata.node.beta
 4:     tdata.state ← return
 5:     return
 6: else
 7:   if tdata.node.score ≤ tdata.node.alpha then
 8:     tdata.node.score ← tdata.node.alpha
 9:     tdata.state ← return
10:     return
11: if tdata.node.mvsleft = 0 then
12:   tdata.state ← return
13:   return
14: temp ← NODEALLOC(tdata.node)
15: tdata.node.mvsleft ← tdata.node.mvsleft - 1
16: LOCKLEAVE(tdata.node.lock)
17: tdata.node ← temp
18: if tdata.node.depth = 0 then
19:   tdata.state ← eval
20: else
21:   tdata.state ← expand

Figure 6.8: The search state routine.

EVAL(tdata)
1: tdata.node.score ← EVALUATE(tdata.node)
2: tdata.state ← return

Figure 6.9: The eval state routine.


 1: if tdata.node.threads > 1 then
 2:   tdata.node.threads ← tdata.node.threads - 1
 3:   LOCKLEAVE(tdata.node.lock)
 4:   tdata.state ← split
 5: else
 6:   if tdata.node = root then
 7:     LOCKLEAVE(tdata.node.lock)
 8:     tdata.state ← exit
 9:     searchend ← true
10:   else
11:     tdata.value ← tdata.node.score
12:     tdata.node ← NODEDEALLOC(tdata.node)
13:     tdata.state ← update

Figure 6.10: The return state routine.


 1: tdata.node.mvsfini ← tdata.node.mvsfini + 1
 2: if tdata.node.type = max then
 3:   if tdata.value > tdata.node.score then
 4:     tdata.node.score ← tdata.value
 5:     if tdata.node.threads > 1 then
 6:       TRANSLB(tdata.node)
 7: else
 8:   if tdata.value < tdata.node.score then
 9:     tdata.node.score ← tdata.value
10:     if tdata.node.threads > 1 then
11:       TRANSUB(tdata.node)
12: tdata.state ← search

Figure 6.11: The update state routine.

node, the new lower bound is compared to the existing bound. If the new bound is better, the node's bound is changed. A similar routine exists for transmitting upper bounds and it is called TRANSUB. Once the current node's score is updated, the processor enters the search state.
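The bound propagation performed by TRANSLB can be sketched as a recursive depth-first walk. This is a simplified illustration under stated assumptions: the `Node` class and `trans_lb` function are hypothetical, locking is omitted, the walk visits every node in the subtree, and "better" is taken to mean a larger lower bound.

```python
class Node:
    """Stand-in node carrying only a lower bound and its children."""

    def __init__(self, alpha=float("-inf"), children=()):
        self.alpha = alpha
        self.children = list(children)


def trans_lb(node, bound):
    """Depth-first push of a new lower bound; returns how many nodes changed."""
    updated = 0
    if bound > node.alpha:      # the new bound is tighter than the old one
        node.alpha = bound
        updated += 1
    for child in node.children:
        updated += trans_lb(child, bound)
    return updated
```

An upper-bound counterpart in the spirit of TRANSUB would mirror this with a `beta` field and a `<` comparison.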

State: split

The split state is a state in which idle processors try to find a node where work is available. Figure 6.12 illustrates the split state routine. At the start of the routine, the global variable searchend is checked to determine if the tree search is complete. If the search is complete, the processor in the split state enters the exit state. However, if the search is not complete, the processor tries to find a split-point where the search effort can be shared. The SPLITPNTSEARCH routine can be used to search through the entire tree to find a split-point. However, SPLITPNTSEARCH is too expensive to execute each time a split-point is needed. The previous result of executing SPLITPNTSEARCH can be found in splitpnt. When a split-point is needed, the node in splitpnt is checked for validity before SPLITPNTSEARCH is invoked. A valid split-point is one that has some unassigned moves and one where the node score is within the node bounds. Note that even after executing SPLITPNTSEARCH, a valid split-point may not be found. In this case, the processor delays for some time before retrying.

Recall that we are trying to compare the performance of three different split-point selection


 1: for ever
 2:   if searchend = true then
 3:     tdata.state ← exit
 4:     return
 5:   LOCKENTER(splitpntlock)
 6:   if SPLITPNTVALID(splitpnt) = true then
 7:     splitpnt.threads ← splitpnt.threads + 1
 8:     tdata.node ← splitpnt
 9:     tdata.state ← search
10:     LOCKLEAVE(splitpntlock)
11:     return
12:   splitpnt ← SPLITPNTSEARCH()
13:   if SPLITPNTVALID(splitpnt) = true then
14:     splitpnt.threads ← splitpnt.threads + 1
15:     tdata.node ← splitpnt
16:     tdata.state ← search
17:     LOCKLEAVE(splitpntlock)
18:     return
19:   LOCKLEAVE(splitpntlock)
20:   DELAY(rechecktime)

Figure 6.12: The split state routine.

schemes. In Figure 6.12, SPLITPNTSEARCH is actually a function pointer. The function that is executed changes depending on the split-point selection scheme being used.

State Timing

Apart from the delays due to memory access, lock entry and lock release, there are finite delays associated with the computation that takes place within each state. The search and return states each consume a tick of processor time. The split state is one tick in duration if a valid split-point is found immediately. However, if a split-point search routine is started, an extra tick of processor time is consumed for every node that is examined during the search. If the search does not find a split-point, another SPLITPNTSEARCH is not started until 36 ticks have


elapsed. The update state is one tick in duration if no score updates are performed. However, if an update has to be performed, an extra tick is spent for every child that has to be informed of the score change. The expand and eval states are 3 and 8 ticks in duration, respectively.
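The per-state costs listed above can be collected into a small lookup. This is a sketch for illustration only: the function `state_ticks` and its parameters are hypothetical names, and the delays due to memory access and locking, as well as the retry delay after a failed split-point search, are not modelled.

```python
# Fixed computation costs, in ticks, as stated in the text.
FIXED_TICKS = {"search": 1, "return": 1, "expand": 3, "eval": 8}


def state_ticks(state, nodes_examined=0, children_informed=0):
    """Computation time charged to one visit of a state, in ticks."""
    if state in FIXED_TICKS:
        return FIXED_TICKS[state]
    if state == "split":
        # one tick, plus one per node examined if a split-point search runs
        return 1 + nodes_examined
    if state == "update":
        # one tick, plus one per child informed of a score change
        return 1 + children_informed
    raise ValueError(f"unknown state: {state}")
```

For example, a processor that expands a node, searches it, evaluates a leaf and returns is charged 3 + 1 + 8 + 1 = 13 ticks of computation under this model.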

6.5 Split-Point Selection Schemes

Three different split-point selection schemes are considered. The first scheme, A, is based on the YBWC search procedure. Node classification is identical to the one described in Section 3.4. In scheme A, a node can be selected as a split-point only if it meets the following criteria:

• At least one branch at the node must have been examined.

• If the node is of the Y-CUT type then at least $m_A$ branches must have been examined. Here, $m_A$ is a value that is varied during the performance experiments.

At any point in the search, there will be several nodes in the tree that meet these criteria. Three guidelines are used to determine which node gets selected as the split-point. In order of decreasing priority, the guidelines are as follows:

1. A node that is higher up in the tree is preferable as it represents more work.

2. A node that has many branches already examined is a good split-point since the score at the node may have stabilized. This guideline is not applicable once more than a quarter of the moves at a node have been examined. For example, on a tree with a branching factor of 32, a node that has 4 branches examined is preferable to one that has 3 branches examined. However, a node that has 10 branches examined is considered equally good as a node that has 11 branches examined.

3. A node that has several unexamined branches makes a good split-point as a lot of parallel work is available at the node.
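The eligibility criteria and the three prioritized guidelines of scheme A can be expressed as a filter followed by a lexicographic sort key. This is one possible reading, not the thesis's implementation: nodes are plain dictionaries with hypothetical keys (`ply` for distance from the root, `examined`, `unexamined`, `type`), and guideline 2 is modelled by capping the examined count at a quarter of the branching factor.

```python
def eligible(node, m_a):
    """Scheme A criteria: one branch examined, and m_a of them at Y-CUT nodes."""
    if node["examined"] < 1:
        return False
    if node["type"] == "Y-CUT" and node["examined"] < m_a:
        return False
    return True


def rank_key(node, branching):
    """Smaller key is better; tuple order encodes guideline priority."""
    cap = branching // 4                        # guideline 2 stops mattering here
    return (node["ply"],                        # 1: prefer nodes nearer the root
            -min(node["examined"], cap),        # 2: prefer more examined branches
            -node["unexamined"])                # 3: prefer more remaining work


def pick_split_point(nodes, m_a, branching=32):
    """Return the best eligible node, or None if no node qualifies."""
    candidates = [n for n in nodes if eligible(n, m_a)]
    return min(candidates, key=lambda n: rank_key(n, branching), default=None)
```

With a branching factor of 32 the cap is 8, so nodes with 10 and 11 examined branches tie on guideline 2 and are separated by guideline 3, while a node with 4 examined branches still beats one with 3.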

Scheme B is based on the DTS search procedure. Nodes in the tree are classified according to the scheme described in Section 3.5. In selecting a split-point, scheme B uses guidelines 1 and 3 from scheme A. However, guideline 2 is replaced by a new guideline that favors nodes with higher confidence. At a D-PV or D-ALL node, confidence is measured by the number of moves examined. As the number of moves examined increases, the confidence that the node is a D-PV or D-ALL node increases. The number of moves examined is also used at a D-CUT node as a measure of confidence. However, at a D-CUT node, the confidence decreases as more moves are examined. Once more than $m_B$ moves have been examined at a D-CUT node, the node's type is changed from D-CUT to D-ALL and the confidence that it is a D-ALL node


begins increasing. A node can be selected as a split-point only if it is a D-PV or a D-ALL node with a confidence of at least 1. Furthermore, in a manner similar to guideline 2 in scheme A, the confidence measure is not applicable if the nodes being compared have a confidence value that exceeds $b/4$ on a tree with a branching factor of $b$.

The third scheme, C, uses the neural network described in Section 5.4 to classify the nodes in the tree. A node's type is determined using the procedure described in Section 5.5. The range used to determine if a node is of the N-CUT type is identical to that used in Section 5.5. In addition to the value used for node classification, the neural network also provides a constraint, $w$, on the number of branches that must be searched sequentially before the node can be selected as a split-point. For greater flexibility, two variables, $m_{C,a}$ and $m_{C,c}$, are used to adjust the constraint that the neural network produces. At all N-ALL and N-PV nodes, $w + m_{C,a}$ branches have to be searched sequentially before the node may become a split-point. At an N-CUT node, the constraint is increased even higher: $w + m_{C,a} + m_{C,c}$ nodes have to be searched sequentially before parallel search is possible. When a few processors are used for the parallel search, $m_{C,a}$ and $m_{C,c}$ are usually large: greater accuracy in split-point selection is obtained at the cost of reduced parallelism. However, when a large number of processors are participating in the search, $m_{C,a}$ and $m_{C,c}$ are set to a small value: split-point selection accuracy is reduced but the parallelism that is available is increased. From the set of nodes that pass these constraints, the node that is finally selected as the split-point is based on the same set of guidelines as in scheme A.

6.6 Performance of Split-Point Selection Schemes

The performance of the three split-point selection schemes, A, B and C, was compared using the trees generated by set tg1. The results reported here are totals for the 100 trees generated by the set. During the performance experiments, variables $m_A$, $m_B$, $m_{C,a}$ and $m_{C,c}$ were varied to find the values at which the highest speed-up was obtained. The figures reported here correspond to the highest speed-up obtained for each scheme.

6.6.1 Performance Measures

Before introducing the measures used to compare the performance of the split-point selection schemes, a few definitions are introduced. On a $p$ processor system, the time taken to search the 100 trees generated by tg1 is denoted $t_p$. On the same set of trees, the number of nodes examined by a $p$ processor search is denoted $n_p$.

Three measures are used to compare performance. The first measure, speed-up, is the


traditional metric used to judge the efficiency of a parallel program relative to a sequential one:

\[ SU_p = \frac{t_1}{t_p} \tag{6.1} \]

A parallel search usually examines a larger tree compared to a sequential search. The search overhead associated with a $p$ processor search is given by:

\[ SO_p = \frac{n_p}{n_1} - 1 \tag{6.2} \]

Ideally, each processor participating in a parallel search spends all of its time searching the tree. However, there are several obstacles that prevent the realization of this ideal behavior. Communication is necessary in PABSim to ensure that each processor has the latest bounds. There are synchronization overheads associated with lock entry and release. Furthermore, the process of selecting a split-point for an idle processor adds extra overheads. These three overheads are grouped into a single measurement referred to as communication-synchronization-split (CSS) overhead. On a shared memory multiprocessor, communication and synchronization costs are usually small. Much of the CSS overhead will be due to the time spent searching for a split-point. The CSS overhead can be approximated as the extra effort required to evaluate each node in a parallel search:

\[ CSS_p = \frac{p \, t_p / n_p}{t_1 / n_1} - 1 \tag{6.3} \]

Note that the CSS overhead should not be used as a figure of merit without taking SU and SO into account. For example, a parallel search that spends most of its time evaluating subtrees that are usually cut off will have a small CSS overhead. However, this particular search will have a small speed-up and a large search overhead.
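The three measures can be computed directly from the tick and node counts. The sketch below is illustrative: the function names are hypothetical, and the CSS formula is an assumption, namely the per-node processor time of the parallel search divided by that of the uniprocessor search, minus one, which is consistent with the description of CSS as the extra effort per node. The sample figures are the uniprocessor totals and the scheme A, 24-processor totals reported for ordering (32,79,5).

```python
def speed_up(t1, tp):
    """SU_p: uniprocessor time over p-processor time."""
    return t1 / tp


def search_overhead(n1, np_):
    """SO_p: fraction of extra nodes examined by the parallel search."""
    return np_ / n1 - 1


def css_overhead(p, t1, n1, tp, np_):
    """CSS_p (assumed form): extra processor time spent per node examined."""
    return (p * tp / np_) / (t1 / n1) - 1


# Uniprocessor totals and scheme A totals with 24 processors, ordering (32,79,5).
T1, N1 = 220380112, 22123250
TP, NP = 19873555, 33626938

su = speed_up(T1, TP)
so = search_overhead(N1, NP)
css = css_overhead(24, T1, N1, TP, NP)
```

Rounded to three decimals, these inputs give a speed-up of 11.089, a search overhead of 0.520 and a CSS overhead of 0.424. The three measures are linked by the identity p = SU_p · (1 + SO_p) · (1 + CSS_p), which makes explicit why a small CSS overhead alone says little about speed-up.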

6.6.2 Uniprocessor Search

Table 6.3 presents the results of a uniprocessor search on the trees generated by tg1. These results will be used in evaluating $SU_p$, $SO_p$ and $CSS_p$ as $p$ takes on values in the set {12, 24, 48}.

6.6.3 Trees With Ordering (32, 79, 5)

The performance of schemes A, B and C under ordering (32,79,5) is presented in Table 6.4. Figure 6.13 compares the performance of the three schemes on the basis of speed-up, search overhead and CSS overhead. Schemes A and B perform similarly on all three measures. When using 24 and 48 processors, scheme C's behavior is remarkably different from that exhibited by schemes A and B. With 24 processors, scheme C has a lower speed-up. Furthermore, it has a higher search overhead as well as a higher CSS overhead. The higher search overhead can


be interpreted as a reduction in split-point selection accuracy and the higher CSS overhead can be interpreted as an increase in the time spent performing communication and split-point selection. With 48 processors, scheme C has a much higher speed-up and an even higher search overhead. However, its CSS overhead is significantly lower than the CSS overhead calculated for schemes A and B. Here, scheme C keeps all processors busy by reducing the split-point selection accuracy. Although there is increased inaccuracy in the split-point selection, there is also an increase in the useful work performed by the processors participating in the search.

Table 6.3: Performance of alpha-beta search on a single processor.

Ordering     t_1 (ticks)    n_1 (nodes)
(32,79,5)    220380112      22123250

6.6.4 Trees With Ordering (20,69,19)

Table 6.5 presents the performance of the three split-point selection schemes under the ordering (20,69,19). Figure 6.14 compares the speed-up, search overhead and CSS overhead observed for the three schemes. Although schemes A and B perform similarly on the speed-up measurement, there is a difference in the way in which they achieve their individual speed-up values. Scheme A spends less time selecting a split-point and searches more nodes, whereas scheme B spends more time in split-point selection and examines fewer nodes. For both schemes, the speed-up with 48 processors is less than the speed-up obtained with 24 processors. With 48 processors, both schemes exhibit a large CSS overhead. As described in Section 6.6.1, CSS overhead is primarily due to the time spent searching for a split-point. Using a modified version of PABSim that provides extra timing information, the split state is found to account for over 60 percent of the total search time using scheme A. When using scheme B, the split state is found to account for over 70 percent of the total search time. The split state has effectively become the limiting factor in the parallel search for the following reasons:

• More processors are actively searching for a split-point.

• There are fewer split-points as a result of the smaller tree size and the constraints imposed by the schemes A and B.


Figure 6.13: Performance of schemes A, B and C under ordering (32,79,5).


With 12 and 24 processors, scheme C has similar speed-up values as schemes A and B. However, when searching with 48 processors, scheme C performs significantly better than schemes A and B. To achieve this speed-up, scheme C trades split-point selection accuracy in favor of keeping its processors busy.

Table 6.4: Performance of schemes A, B and C under ordering (32,79,5).

Scheme A
p     m_A             t_p (ticks)    SU_p      n_p (nodes)    SO_p     CSS_p
12    4               29687828       7.423     32457819       0.467    0.102
24    5               19873555       11.089    33626938       0.520    0.424
48    2               17194575       12.817    41103307       0.858    1.016

Scheme B
p     m_B             t_p (ticks)    SU_p      n_p (nodes)    SO_p     CSS_p
12    6               28601791       7.705     30927242       0.398    0.114
24    5               19356414       11.385    33797226       0.528    0.380
48    1               17202647       12.811    41280623       0.866    1.008

Scheme C
p     m_C,a, m_C,c    t_p (ticks)    SU_p      n_p (nodes)    SO_p     CSS_p
12    -1, 3           29539208       7.461     31415747       0.420    0.133
24
48

6.6.5 Trees With Ordering (32,66,66)

The performance of schemes A, B and C under ordering (32,66,66) is presented in Table 6.6. A comparison of the speed-up, search overhead and CSS overhead exhibited by each scheme is presented in Figure 6.15. The performance of schemes A and B on all three measures is similar. Scheme C outperforms both scheme A and scheme B on the speed-up measurement. When searching with 12 processors, scheme C achieves a small search overhead at the cost of a small increase in CSS overhead. With 24 processors, the search overhead is still quite small but the increase in CSS overhead is much larger. When searching with 48 processors, scheme C reduces split-point selection accuracy to keep all its processors busy.


I Scheme A

1 Scheme B

I Scheme C

P

13

24

48

Table 6.5: Performance of schemes A, B and C under ordering (20,69,19).

. rn~

4

2

1

p

12

24

48

Scheme B

t p (ticks)

QQ2S327

6751780

8695562

rnCsa7 mcTC

-1,3

-3:3

-4.3

Scheme C

P

12

24

48

S U p 6-75?

5.828

6.855

tp (ticlis)

9298153

6699466

5331472

Table 6.6: Performance of schemes A, B and C under ordering (32,66,66).

r n ~

5

3

2

p

13

np (nodes)

8'7108'14

9168454

10482967

Surp

6.410

5-89,

tp (ticks)

31281023

18527219

14879228

rnc,., mcVc

0,3

SOp O

0.519

0.731

np (nodes)

8803379

14298996

CSSp

0.231

0.789

3.031

11-150 , 17930200

SU, 6.473

10.928

13.607

tp (ticks)

SUp 0.459

1.3'70

CSSp

0.283

0.135

L.971

np (nocles)

35130206

35449416

29993951

S U p 1 np (nodes)

0.445

23623744 5.571 ( 244.129533 I

SOp 0.714

0.730

0.463

0.192 0.175 1

SOp

CSS,

0.082

0.270

1.410

CSS,


Figure 6.14: Performance of schemes A, B and C under ordering (20,69,19).


Figure 6.15: Performance of schemes A, B and C under ordering (32,66,66).


1 Scheme A 1

1 Scheme B 1

P

12 SOp

0.766

Table 6.7: Performance of schemes A, B and C under ordering (32,33,33).

CSSp

0.0'75

P

12

24

48

p

13

24

48

6.6.6 Trees With Ordering (32,33,33)

Table 6.7 presents the performance of schemes A, B and C under the ordering (32,33,33).
Figure 6.16 compares the performance of the three schemes on the basis of speed-up, search
overhead and CSS overhead. Once again, schemes A and B exhibit similar performance
on all three measures. Scheme C has a higher speed-up than schemes A and B when
searching with 12 and 48 processors. With 24 processors, all three schemes exhibit a similar
speed-up. When searching with 12 and 24 processors, scheme C favors a higher split-point
selection accuracy at the cost of a higher CSS overhead. When searching with 48 processors,
it favors a lower CSS overhead so that all the processors can be kept busy.
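The speed-up and search-overhead measures used in these comparisons can be illustrated with a short sketch. It assumes the definitions that are standard in the parallel game-tree search literature: speed-up as the ratio of sequential to parallel search time, and search overhead as the relative increase in the number of nodes searched. The function names and the sample numbers are hypothetical and are not taken from the tables. CSS overhead is left out because its definition is not restated in this section.

```python
def speed_up(t_1: float, t_p: float) -> float:
    """Speed-up SU_p: sequential time divided by the time on p processors."""
    if t_p <= 0:
        raise ValueError("t_p must be positive")
    return t_1 / t_p


def search_overhead(n_1: int, n_p: int) -> float:
    """Search overhead SO_p: relative growth in nodes searched, (n_p - n_1) / n_1."""
    if n_1 <= 0:
        raise ValueError("n_1 must be positive")
    return (n_p - n_1) / n_1


if __name__ == "__main__":
    # Hypothetical measurements, chosen only to illustrate the arithmetic.
    t_1, t_p = 1_000_000, 125_000      # ticks
    n_1, n_p = 2_000_000, 3_000_000    # nodes
    print(speed_up(t_1, t_p))          # 8.0
    print(search_overhead(n_1, n_p))   # 0.5
```

Under these assumed definitions, a scheme that searches extra nodes has a positive search overhead, but it can still reach a higher speed-up if the extra work keeps otherwise idle processors busy.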

6.6.7 Trees With Ordering (32,0,0)

The performance of schemes A, B and C under the ordering (32,0,0) is presented in
Table 6.8. Figure 6.17 compares the performance of the three schemes on the basis of speed-up,
search overhead and CSS overhead. All three schemes exhibit similar speed-ups. Scheme C,
however, achieves its speed-up in a significantly different way from schemes A and B.
With 12 and 24 processors, scheme C favors an increased split-point selection accuracy
at the cost of increased CSS overhead. With 48 processors, the scenario is reversed: split-point
selection accuracy is allowed to decrease in order to keep all processors busy.

Figure 6.16: Performance of schemes A, B and C under ordering (32,33,33).

Table 6.8: Performance of schemes A, B and C under ordering (32,0,0).

6.6.8 The Significance of the Neural Network Approach

In the experiments, the neural network approach, as embodied by scheme C, exhibits behavior
that is quite unlike that of the other two schemes. In particular, for non-random trees (an
exponential ordering where both w_b and w_h are non-zero), scheme C has a higher speed-up than
schemes A and B when a large number of processors is used. Scheme C achieves this by reducing
the split-point selection accuracy to keep all of its processors busy. In scheme A, when m_s = 1, a
node may be selected as a split-point if at least one branch has been examined. Since scheme
C achieves a higher degree of parallelism, it must have less stringent criteria for split-point
selection. Specifically, scheme C may select a node as a split-point even before a branch has
been fully examined. When searching a tree with a large w_b using a few processors, scheme C
obtains a higher speed-up than schemes A and B by increasing the split-point selection accuracy
at the cost of an increase in CSS overhead. The advantage that the neural network approach
holds over the other two schemes is primarily due to its flexibility. It is flexible enough to
increase parallelism when a large number of processors is being used and to increase split-point
selection accuracy when only a few processors are being used.

Figure 6.17: Performance of schemes A, B and C under ordering (32,0,0).


Chapter 7

Conclusions and Future Work

7.1 Conclusions

This work offers three main contributions. First, the concept of an exponentially ordered game
tree was introduced as a model for the game trees encountered in practice. Various statistics
arising from the search of exponentially ordered trees were compared to the statistics arising
from the search of chess trees. With a certain set of parameters, the exponential ordering
was found to produce trees that were quite similar to chess trees. Second, neural networks were
introduced as a method of solving the node classification problem. The neural network approach
was found to outperform all other node classification techniques. The prediction accuracy of the
neural network approach increased as the tree ordering parameter w_b increased. Third,
the neural network approach was used as a split-point selection scheme for parallel search. Here,
the neural network approach was found to perform better than techniques based on YBWC and
DTS when a large number of processors was used. Furthermore, the neural network approach
performed well on a few processors if the tree ordering parameter w_h was relatively large.

7.2 Future Work

There is a lot of room for experimentation with the neural network approach. In this work,
we explored the performance of the neural network approach on exponentially ordered game
trees. While exponentially ordered trees mimic most of the behavior of real game trees, there
is no substitute for experimentation on real game trees. Chess has long served researchers as
the main source of game trees, and it can be used once again as a testbed for the neural
network approach. The parallel search results described here are based on simulations. This
work can be extended by performing the same experiments on a real shared memory multiprocessor
with real game trees.


Appendix A

Seeds Used For Artificial Tree Generation

The artificial tree generation method presented in Section 4.2 generates a tree based on the
value of the seed parameter. The experiments conducted in this text use two sets of seeds: tg1
and tg2. The first set consists of 100 seed values while the second set consists of 200 seed values.

A.1 Set 1, tg1

A simple linear congruential random number generator produces the seed values for the first
set, tg1. The random number generator is based on the equation: r_n = (r_{n-1} × 16807) mod
2147483647. Note that this is one of the two linear congruential generators used in Section 4.2.
To start the computation of r_n, r_0 is initially set to 12345678. The values of r_n, where
1 ≤ n ≤ 100, make up set tg1. Table A.1 presents the seeds in the set.
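The recurrence above can be sketched in a few lines of Python. The function name `tg1_seeds` and its parameters are hypothetical. Only the multiplier, the modulus, the starting value r_0 and the count of 100 come from the text.

```python
from typing import List

MULTIPLIER = 16807        # multiplier from the recurrence in the text
MODULUS = 2147483647      # 2**31 - 1, modulus from the recurrence in the text
R0 = 12345678             # starting value r_0 given in the text


def tg1_seeds(count: int = 100, r0: int = R0) -> List[int]:
    """Return r_1 .. r_count for r_n = (r_{n-1} * 16807) mod 2147483647."""
    seeds = []
    r = r0
    for _ in range(count):
        r = (r * MULTIPLIER) % MODULUS
        seeds.append(r)
    return seeds


if __name__ == "__main__":
    seeds = tg1_seeds()
    print(len(seeds))   # 100
    print(seeds[:3])    # the first three values r_1, r_2, r_3
```

Because r_0 itself is not part of the set, the list starts at r_1. Python integers do not overflow, so the product can be reduced directly without the overflow-avoiding tricks needed in 32-bit arithmetic.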

A.2 Set 2, tg2

The second set, tg2, is generated using the digits of π. A seed is formed using 9 sequential
digits of π. For example, the first seed in this set is 314159265. Tables A.2 and A.3 present the
seeds in this set.
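One way to produce seeds of this form is sketched below. It streams the decimal digits of π with Gibbons' unbounded spigot algorithm, a technique chosen here only to keep the example self-contained and not one named in the thesis, and groups the digits into 9-digit integers. The sketch assumes that consecutive seeds use non-overlapping blocks of digits, which the text does not state. The names `pi_digits` and `tg2_seeds` are hypothetical.

```python
from itertools import islice
from typing import Iterator, List


def pi_digits() -> Iterator[int]:
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is certain: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: absorb one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


def tg2_seeds(count: int = 200, digits_per_seed: int = 9) -> List[int]:
    """Group successive digits of pi into `count` integers (assumed non-overlapping)."""
    digits = pi_digits()
    seeds = []
    for _ in range(count):
        block = islice(digits, digits_per_seed)
        seeds.append(int("".join(str(d) for d in block)))
    return seeds


if __name__ == "__main__":
    print(tg2_seeds(2))   # [314159265, 358979323]
```

Under the non-overlapping assumption, a block that begins with the digit 0 yields an integer with fewer than nine digits. Whether the seeds in Tables A.2 and A.3 follow this convention cannot be confirmed from the text alone.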

Table A.1: The seed values in set tg1

Table A.2: The first hundred seed values in set tg2

Table A.3: The second hundred seed values in set tg2
