Clustered Indexing for Conditional Branch Predictors Veerle Desmet Ghent University Belgium

Dec 16, 2015

Conditional Branches

    if (i > 0)
        /* something */
    else
        /* something else */

    for (i = 0; i < 50; i++) {
        /* a loop... */
    }
    /* next statements */

How frequently do conditional branches occur? 1/8, i.e. about one instruction in eight.

Program Execution

Fetch = take next instruction
Decode = analyze type and read operands
Execute
Write Back = write result

Example, R1=R2+R3:
    Fetch:      R1=R2+R3
    Decode:     addition, operands 4 and 3
    Execute:    computation
    Write Back: R1 contains 7

Pipelined architectures

Parallel versus sequential:
    Constant flow of instructions possible
    Faster applications
    Limitation due to conditional branches

Cycle   Fetch      Decode     Execute    Write Back
1       R1=R2+R3
2       R5=R2+1    R1=R2+R3
3       R4=R3-1    R5=R2+1    R1=R2+R3
4       R7=2*R1    R4=R3-1    R5=R2+1    R1=R2+R3
5       R5=R6      R7=2*R1    R4=R3-1    R5=R2+1
6       R1>0       R5=R6      R7=2*R1    R4=R3-1

Problem: Branches

Branches introduce bubbles
Affects pipeline throughput

[Figure: pipeline diagram (Fetch, Decode, Execute, Write Back). While the branch "if R1>0" is unresolved, the pipeline slots behind it are marked "?".]

[Figure: control flow of the example code: R1=R2+R3, R5=R2+1, "if R1>0" with a "then" path and an "else" path; the remaining instructions are R7=0, R7=2*R1, R5=R6 and R2=R2-1.]

Solution: Prediction

Fetch those instructions that are likely to be executed

[Figure: the example code (R1=R2+R3, R5=R2+1, "if R1>0" with "then" and "else" paths) and its pipeline diagram (Fetch, Decode, Execute, Write Back); the pipeline keeps fetching instructions after "if R1>0" instead of waiting for the branch outcome.]

Correct prediction = gain
Misprediction = penalty

Today's Architecture

[Figure: block diagram of a processor: instruction cache, fetch, decode, register rename, dispatch, instruction window, four functional units, register file, re-order logic, and a branch predictor. A label reads IPC (instructions per cycle).]

Bimodal Branch Predictor

Predict outcome of condition, e.g. if or else, based on unique branch address
Update prediction table

[Figure: k bits of the branch address index the prediction table.]
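A minimal sketch of the scheme above, under assumptions the slide does not state: the table holds 2-bit saturating counters, they start at "weakly taken", and the index is the low k bits of the branch address. The class and function names are illustrative only.

```python
class BimodalPredictor:
    """Prediction table indexed by the low k bits of the branch address."""

    def __init__(self, k):
        self.mask = (1 << k) - 1
        # Assumption: 2-bit saturating counters, initialised to weakly taken (2).
        self.table = [2] * (1 << k)

    def index(self, branch_address):
        return branch_address & self.mask

    def predict(self, branch_address):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.table[self.index(branch_address)] >= 2

    def update(self, branch_address, taken):
        i = self.index(branch_address)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, trace):
    """trace: iterable of (branch_address, taken) pairs."""
    misses = 0
    for address, taken in trace:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses
```

In this sketch a branch that is always taken is never mispredicted, while a branch that is never taken costs one misprediction before its counter drops below the threshold.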

Global History Branch Predictor

Predict outcome of condition, e.g. a for loop, based on global history
    111101111011110
Update prediction table and global history

[Figure: k bits of global history index the prediction table.]
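The loop-like history shown above can be replayed in a small simulation. This is only a sketch: the 2-bit counters, their initial value, and the function name are assumptions, not details taken from the slides.

```python
def simulate_global_history(outcomes, k):
    """Predict each outcome from the last k outcomes; return the misprediction count."""
    mask = (1 << k) - 1
    table = [2] * (1 << k)  # assumption: 2-bit counters, start weakly taken
    history = 0
    misses = 0
    for taken in outcomes:
        prediction = table[history] >= 2
        if prediction != taken:
            misses += 1
        # Update the prediction table first, then the global history.
        if taken:
            table[history] = min(3, table[history] + 1)
        else:
            table[history] = max(0, table[history] - 1)
        history = ((history << 1) | int(taken)) & mask
    return misses


# Four taken outcomes followed by one not taken, repeated 20 times.
pattern = [bit == "1" for bit in "11110" * 20]
misses_k4 = simulate_global_history(pattern, k=4)
misses_k2 = simulate_global_history(pattern, k=2)
```

With k = 4 every 4-bit history in this pattern is followed by a single outcome, so the sketch mispredicts only the first loop exit. With k = 2 the history 11 precedes both outcomes, and mispredictions keep recurring.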

Gshare Branch Predictor

[Figure: k bits of global history are XORed with the branch address to form the original index into the prediction table.]

[McFarling]
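The XOR indexing in the figure can be written in a few lines. The function name and the choice to use the low k bits of the address are assumptions made for illustration.

```python
def gshare_index(branch_address, global_history, k):
    """XOR the branch address with the global history and keep k bits."""
    mask = (1 << k) - 1
    return (branch_address ^ global_history) & mask


# The same branch reaches different table entries under different histories.
index_a = gshare_index(0b1100, 0b1010, 4)
index_b = gshare_index(0b1100, 0b0101, 4)
```

For example, address bits 1100 and history 1010 give index 0110, while the same address with history 0101 gives index 1001.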

Misprediction rate: gshare

[Figure: gshare misprediction rate (axis 0 to 25) versus predictor size (10 to 1,000,000 bytes) for SPEC INT 2000; an arrow marks the "better" direction.]

Aliasing

Resource limitations: 8 entries, index = 3 bits, e.g. index 101
Two different branches using the same prediction information

[Figure: branches A and B both produce the 3-bit index 101 and share one entry of the prediction table.]
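As an illustration of how two branches end up on the same entry, the sketch below groups branch addresses by their 3-bit index. The addresses are hypothetical, chosen so that two of them end in the bits 101. Indexing by the low address bits is an assumption.

```python
from collections import defaultdict


def find_aliases(branch_addresses, index_bits):
    """Group addresses by table index; keep only entries shared by several branches."""
    mask = (1 << index_bits) - 1
    by_index = defaultdict(list)
    for address in branch_addresses:
        by_index[address & mask].append(address)
    return {index: group for index, group in by_index.items() if len(group) > 1}


# Hypothetical addresses: the first two stand for branches A and B.
aliases = find_aliases([0b00101101, 0b01110101, 0b01000010], index_bits=3)
```

Here the first two addresses collide on index 101, and the third maps to its own entry.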

Aliasing

[Figure: alias rate (%), axis 0 to 50, versus predictor size (16 to 524288 bytes, doubling at each step), broken down into destructive, constructive and neutral aliasing; SPEC INT 2000.]

Basic Observations

Branches with similar behavior can share prediction information
    1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 1
    1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 1
Branches can use same table entry, e.g.
    1 1 1 1 0 0 0 0 1 1 1 1 0 1 0
(outcomes shown along a time axis)

Time Varying Behavior

[Figure: outcome sequences of branches A to D along a time axis divided into four phases, e.g. A: 1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 1 and D: 1 1 1 1 0 1 0.]

Branch   phase 1   phase 2   phase 3   phase 4
A        100%      0%        100%      50%
B        100%      0%        100%      60%
C        100%      25%       0%        NE
D        NE        NE        100%      33%

NE = not executed
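The per-phase percentages can be computed from outcome sequences as in the sketch below. It assumes the outcomes have already been split into phases. How phase boundaries are chosen is not stated in the slides, and the split used here for branches A and D is illustrative.

```python
def phase_profile(outcomes_per_phase):
    """Return the fraction of 1 outcomes in each phase.

    A phase in which the branch was not executed (NE) is reported as None.
    """
    profile = []
    for outcomes in outcomes_per_phase:
        if outcomes:
            profile.append(sum(outcomes) / len(outcomes))
        else:
            profile.append(None)
    return profile


# Illustrative phase split for branches A and D.
branch_a = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
branch_d = [[], [], [1, 1, 1, 1], [0, 1, 0]]
profile_a = phase_profile(branch_a)
profile_d = phase_profile(branch_d)
```

Each profile is one row of the table: branch A yields 100%, 0%, 100%, 50%, and branch D yields NE, NE, 100%, 33%.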

Branch Clustering

Each branch represents a point in N-dim space
Clusters formed by k-means algorithm

Branch   phase 1   phase 2   phase 3   phase 4
A        100%      0%        100%      50%
B        100%      0%        100%      60%
C        100%      25%       0%        NE
D        NE        NE        100%      33%

k-Means Cluster Algorithm

1. Initial centers
2. Calculate nearest center for each point
3. Redefine centers
4. Restart with new centers

Steps 2 to 4 repeat until the solution is stable.

[Figure: points and centers (marked X) after each step, ending in a stable solution.]
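The four steps can be sketched in plain Python as below. This is a generic k-means, not the author's implementation. The slides do not say how NE entries are encoded as coordinates, so the sketch works on ordinary numeric points, and the function name is illustrative.

```python
import math


def k_means(points, centers, max_iterations=100):
    """Plain k-means; returns (centers, assignment) once the assignment is stable."""
    assignment = None
    for _ in range(max_iterations):
        # Step 2: assign each point to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))
            for point in points
        ]
        if new_assignment == assignment:
            break  # stable solution
        assignment = new_assignment
        # Step 3: redefine each center as the mean of its points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        centers = new_centers  # step 4: restart with the new centers
    return centers, assignment


# Step 1: initial centers (chosen by hand here).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers, final_assignment = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

On these four points the assignment is stable after the second pass, with centers at (0, 0.5) and (10, 10.5).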

Determining k of k-Means

k is chosen by BIC score (Bayesian Information Criterion)
Tradeoff between k and goodness of a clustering

[Figure: the same points with a stable solution for k=2 and a stable solution for k=3. Which is best?]
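For reference, the textbook form of the Bayesian Information Criterion is given below. The slides do not show which formulation or sign convention was used, so the symbols are assumptions: p is the number of free parameters (which grows with k), n is the number of points, and \hat{L} is the maximized likelihood of the clustering model.

```latex
\mathrm{BIC} = p \ln n - 2 \ln \hat{L}
```

In this form a lower value is better: the first term penalizes a larger k and the second rewards a better fit. Some clustering tools report the negated quantity as a score, where higher is better.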

Branch Clustering

SPEC INT 2000: from 8 to 33 clusters
    mcf: 8
    gcc, parser: 33
Each branch belongs to exactly one cluster

Branch   phase 1   phase 2   phase 3   phase 4
A        100%      0%        100%      50%
B        100%      0%        100%      60%
C        100%      25%       0%        NE
D        NE        NE        100%      33%

[Figure: each of the branches A to D is labeled with its cluster.]

Subtables Example

8 entries, index = 3 bits
4 clusters, 2 bits
Original index 101

[Figure: the 3-bit index into the prediction table is built from the 2 cluster bits plus 1 bit ("1") of the original index.]

3 to 6 bits for cluster [SPECint2000]
Can be used in every predictor scheme
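One way to build such an index is sketched below. The bit layout is an assumption: the cluster number is placed in the high-order bits, so each cluster selects its own subtable, and the low-order bits of the original index select the entry within it. The slides do not confirm this layout, and the function name is illustrative.

```python
def subtable_index(original_index, cluster_id, index_bits, cluster_bits):
    """Combine a cluster number with the low bits of the original index."""
    entry_bits = index_bits - cluster_bits
    # Keep only as many low-order bits of the original index as still fit.
    entry = original_index & ((1 << entry_bits) - 1)
    # Assumption: the cluster number occupies the high-order bits.
    return (cluster_id << entry_bits) | entry


# The example from the slide: 3-bit index, 2 cluster bits, original index 101.
in_cluster_3 = subtable_index(0b101, 3, index_bits=3, cluster_bits=2)
in_cluster_0 = subtable_index(0b101, 0, index_bits=3, cluster_bits=2)
```

With 4 clusters in an 8-entry table, each subtable has 2 entries: original index 101 maps to entry 111 for cluster 3 and to entry 001 for cluster 0.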

Subtables for Bimodal

[Figure: cluster bits and branch address bits together index the prediction table.]

[Figure: misprediction rate (axis 0 to 25) versus predictor size (10 to 1,000,000 bytes) for bimodal original and bimodal clustered.]

Subtables for Gshare

[Figure: cluster bits, branch address and global history together index the prediction table.]

[Figure: misprediction rate (axis 0 to 25) versus predictor size (10 to 1,000,000 bytes) for gshare original and gshare clustered.]

19% better for SMALL predictors

Why Clustered Indexing Works

[Figure: two plots of alias rate (%), axis 0 to 50, versus predictor size (16 to 524288 bytes), each broken down into destructive, constructive and neutral aliasing.]

Subtabling
    Uses smaller predictors
    More aliasing expected… but
    More constructive aliasing

Hashing: Alternative to Subtables

Keeps original global history length

[Figure: global history and branch address form the gshare index; the gshare index is combined with the cluster to give the index into the prediction table.]
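The slides do not name the hash function that combines the cluster with the gshare index. The sketch below assumes a second XOR, purely for illustration. The function name and the alignment of the cluster bits to the low end of the index are hypothetical as well.

```python
def hashed_gshare_index(branch_address, global_history, cluster_id, k):
    """Fold a cluster number into a k-bit gshare index."""
    mask = (1 << k) - 1
    # Ordinary gshare index: the full k-bit history is still used.
    gshare_ix = (branch_address ^ global_history) & mask
    # Assumption: the cluster number is mixed in with a second XOR.
    return (gshare_ix ^ cluster_id) & mask


hashed = hashed_gshare_index(0b1100, 0b1010, cluster_id=0b0011, k=4)
```

Unlike the subtable layout, no index bits are reserved for the cluster here, which is why the global history keeps its original length.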

Hashing for Gshare

[Figure: misprediction rate (axis 0 to 25) versus predictor size (10 to 1,000,000 bytes) for gshare original, gshare clustered: subtables, and gshare clustered: hashed.]

[Figure: detail of the same three curves for predictor sizes from 1,000 to 1,000,000 bytes, misprediction rate axis 3.5 to 7.5.]

5% better for LARGE predictors

Self Profile-Based Clustering

Limit study
Identified clusters optimal for given execution

Branch   phase 1   phase 2   phase 3   phase 4
A        100%      0%        100%      50%
B        100%      0%        100%      60%
C        100%      25%       0%        NE
D        NE        NE        100%      33%

[Figure: each of the branches A to D is labeled with its cluster.]

Cross Profile-Based Clustering

SELF:

Branch   phase 1   phase 2   phase 3   phase 4
A        100%      0%        100%      50%
B        100%      0%        100%      60%
C        100%      25%       0%        NE
D        NE        NE        100%      33%

SPEC-train inputs:

Branch   phase 1   phase 2   phase 3   phase 4
A        90%       10%       100%      60%
B        NE        NE        NE        NE
C        100%      25%       NE        NE
D        NE        NE        100%      33%
E        0%        0%        10%       20%

Additional cluster for unseen branches

[Figure: branches in both profiles are labeled with their clusters; one cluster assignment is marked OK.]
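A cross-profile lookup with one extra cluster for unseen branches might look like the sketch below. The dictionary-based representation, the function name, and the example addresses are assumptions. The slides only state that unseen branches get an additional cluster.

```python
def build_cluster_lookup(training_clusters):
    """training_clusters: dict mapping branch address to cluster id from a training run."""
    # The additional cluster gets the next unused cluster id.
    unseen_cluster = max(training_clusters.values(), default=-1) + 1

    def lookup(branch_address):
        # Branches never executed during training share the additional cluster.
        return training_clusters.get(branch_address, unseen_cluster)

    return lookup, unseen_cluster


# Hypothetical clustering obtained from a training input.
lookup, unseen_cluster = build_cluster_lookup({0x400: 0, 0x408: 1, 0x410: 1})
seen = lookup(0x408)
unseen = lookup(0x999)
```

Here the training run produced clusters 0 and 1, so a branch that appears only in the measured run falls into cluster 2.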

Cross Profile-Based Clustering

[Figure: misprediction rate (axis 0 to 25) versus predictor size (10 to 1,000,000 bytes) for bimodal original, bimodal self clustered and bimodal cross clustered.]

[Figure: the same axes for gshare original, gshare self clustered and gshare cross clustered.]

[Figure: detail for predictor sizes from 1,000 to 1,000,000 bytes, misprediction rate axis 3.5 to 7.5: gshare original, gshare self clustered: subtables, gshare self clustered: hashed, gshare cross clustered: subtables, gshare cross clustered: hashed.]

Cross clustered still good
GSHARE
    @ small budgets: subtables, 12.3% fewer mispredictions (19% self clustered)
    @ large budgets: hashing, 3% better (5% self clustered)

Conclusion

Small branch predictors suffer from aliasing, which is frequently destructive
Exploit constructive aliasing by clustering branches
Implementation
    subtables (can be used in all branch prediction schemes)
    hashing (specific for gshare)
Gshare misprediction rate
    @ 1KiB: reduced by 19% (self), 12.3% (cross)
    @ 256KiB: reduced by 5% (self), 3% (cross)

Questions?

The End