Chen-Yong Cher & T. N. Vijaykumar School of Electrical and Computer Engineering Purdue University http://www.ece.purdue.edu/~vijay

Chen-Yong Cher & T. N. Vijaykumar - Microarch · Chen-Yong Cher & T. N. Vijaykumar School of Electrical and Computer Engineering Purdue University vijay

May 31, 2020

Page 1

Chen-Yong Cher & T. N. Vijaykumar

School of Electrical and Computer Engineering, Purdue University

http://www.ece.purdue.edu/~vijay

Page 2

Copyright © 2001 by Chen-Yong Cher & T. N. Vijaykumar 2

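Branch prediction accuracy is not 100% because of difficult branches:
- Complex branching patterns
- Conflicts in prediction tables

Trends show deeper pipelines (e.g., the 20-stage Pentium 4). One misprediction squash:
- Costs at least 15 cycles, or 15 × 4 = 60 instructions on a 4-wide machine
- At a 5% misprediction rate, CPI = 0.25 + 0.2 × 0.05 × 15 = 0.40, where 0.25 is the ideal CPI of the 4-wide machine, 0.2 the fraction of instructions that are branches, 0.05 the misprediction rate, and 15 the squash penalty in cycles
- Squashes actually cost more, because branch outcomes are known late

Branch mispredictions cause significant performance loss.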
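The CPI arithmetic above can be checked with a small additive model. The function names are ours, and the model assumes a fixed penalty per misprediction, which, as the slide notes, understates the real cost of late-resolving branches.

```python
def cpi_with_mispredictions(base_cpi: float, branch_frac: float,
                            mispredict_rate: float, penalty_cycles: int) -> float:
    """Additive model: each mispredicted branch adds a fixed squash penalty."""
    return base_cpi + branch_frac * mispredict_rate * penalty_cycles


def squashed_slots(penalty_cycles: int, width: int) -> int:
    """Instruction slots lost per squash on a machine fetching `width` per cycle."""
    return penalty_cycles * width


if __name__ == "__main__":
    cpi = cpi_with_mispredictions(0.25, 0.2, 0.05, 15)
    print(f"CPI = {cpi:.2f}")                                # CPI = 0.40
    print(f"slowdown = {cpi / 0.25:.1f}x")                   # slowdown = 1.6x
    print("lost slots per squash:", squashed_slots(15, 4))   # 60
```

With no mispredictions the model falls back to the ideal CPI of 0.25, so the extra 0.15 is entirely squash cost.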

Page 3

[Figure: a branch to PC2 splits control flow into A (not taken) and B (taken), which reconverge at C. A and B are control-flow dependent; C is control-flow independent, executed irrespective of the branch outcome.]

Page 4

Skip over control-flow dependent code:
- Only for difficult branches
- Without even fetching the control-flow dependent code
- Execute the control-flow independent code
- Execute the control-flow dependent code after the branch resolves
- Conserve hardware resources

Today's out-of-order (OoO) pipelines routinely exploit data independence, but not control-flow independence directly.

Page 5

- Introduction
- Skipper: When and Where to Skip
- Skipper: How to Skip
- Results
- Conclusions

Page 6

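[Figure: branch PC2 with paths A and B reconverging at C; for this branch, predicting A is correct and predicting B is incorrect.]

Timeline (time runs downward), with the branch resolving not taken in every case:
- Correct prediction: predict not taken; execute some of A & C; resolve not taken; execute the rest of A & C
- Incorrect prediction: predict taken; execute some of B & C; resolve not taken; squash ALL of B & C; re-execute ALL of A & C
- Skipper: skip; execute some of the data-independent C; resolve not taken; execute A & the rest of C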

Page 7

Execution is out of order, but fetch and rename are in order; the instruction window maintains precise interrupts.

This relies on fetching in program order.

[Figure: pipeline stages predict/fetch, decode, rename, OoO issue, reg read, execute, branch or cache, writeback.]

Page 8

Skipping results in out-of-order fetching:
- First fetch the control-flow independent code
- Then fetch the control-flow dependent code

Convince an in-order fetch pipeline to fetch out of order!

Page 9

- Introduction
- Skipper: When and Where to Skip
- Skipper: How to Skip
- Results
- Conclusions

Page 10

When: only for difficult branches
- JRS low-confidence predictor [MICRO '96]
- Counts consecutive correct predictions
- Identifies a branch as difficult if it was recently mispredicted
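A JRS-style estimator can be sketched as a table of resetting counters. The 4K-entry, 4-bit sizing comes from the simulation setup later in the deck; the gshare-style index (PC XOR global history) and the "difficult unless saturated" threshold are our assumptions, and the class and method names are hypothetical.

```python
class JRSConfidence:
    """Table of resetting counters that count consecutive correct predictions."""

    def __init__(self, entries: int = 4096, bits: int = 4, threshold: int = 15):
        self.entries = entries
        self.max_count = (1 << bits) - 1
        self.threshold = threshold  # assumption: counts below this mean low confidence
        self.table = [0] * entries

    def _index(self, pc: int, ghist: int) -> int:
        # Assumption: gshare-style indexing with PC XOR global history.
        return (pc ^ ghist) % self.entries

    def is_difficult(self, pc: int, ghist: int) -> bool:
        """Low confidence: too few correct predictions since the last miss."""
        return self.table[self._index(pc, ghist)] < self.threshold

    def update(self, pc: int, ghist: int, correct: bool) -> None:
        i = self._index(pc, ghist)
        if correct:
            self.table[i] = min(self.table[i] + 1, self.max_count)  # saturate
        else:
            self.table[i] = 0  # reset on a misprediction


if __name__ == "__main__":
    jrs = JRSConfidence()
    for _ in range(15):
        jrs.update(0x400, 0b1011, correct=True)
    print(jrs.is_difficult(0x400, 0b1011))  # False: 15 consecutive hits
    jrs.update(0x400, 0b1011, correct=False)
    print(jrs.is_difficult(0x400, 0b1011))  # True: recently mispredicted
```

Resetting (rather than decrementing) on a miss is what makes "recently mispredicted" branches immediately look difficult.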

Page 11

Hardware heuristic based on if-then-else; learned and kept in a table:

        Branch PC2    # difficult branch (step 1)
        A
        …
        Jump PC3      # jump instruction (step 2)
PC2:    B             # target of difficult branch
        …
PC3:    C             # target of jump instruction

Reconvergence PC: PC3 for if-then-else, PC2 for if (step 3)
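The three steps above can be sketched over a toy instruction map. The layout (instructions at consecutive integer addresses, the jump sitting immediately before PC2) and all names are hypothetical; real hardware learns this from the fetched stream and stores the result in a table rather than scanning a program.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Inst:
    op: str                       # "branch", "jump", or "other" in this toy ISA
    target: Optional[int] = None  # target address for branch/jump


def reconvergence_pc(program: Dict[int, Inst], branch_pc: int) -> Optional[int]:
    """Guess the reconvergence PC of a forward conditional branch."""
    br = program[branch_pc]  # step 1: the difficult branch to PC2
    if br.op != "branch" or br.target is None or br.target <= branch_pc:
        return None  # not a forward if / if-then-else hammock
    last_of_a = program.get(br.target - 1)  # step 2: instruction just before PC2
    if (last_of_a is not None and last_of_a.op == "jump"
            and last_of_a.target is not None and last_of_a.target > br.target):
        return last_of_a.target  # step 3, if-then-else: reconverge at PC3
    return br.target             # step 3, plain if: reconverge at PC2


if __name__ == "__main__":
    # 0: branch PC2(=4); 1-2: A; 3: jump PC3(=6); 4-5: B; 6: C
    ite = {0: Inst("branch", 4), 1: Inst("other"), 2: Inst("other"),
           3: Inst("jump", 6), 4: Inst("other"), 5: Inst("other"), 6: Inst("other")}
    print(reconvergence_pc(ite, 0))      # 6 (PC3, if-then-else)
    # 0: branch PC2(=3); 1-2: A; 3: C
    if_only = {0: Inst("branch", 3), 1: Inst("other"), 2: Inst("other"), 3: Inst("other")}
    print(reconvergence_pc(if_only, 0))  # 3 (PC2, plain if)
```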

Page 12

- Introduction
- Skipper: When and Where to Skip
- Skipper: How to Skip
- Results
- Conclusions

Page 13

- Create a gap in the instruction window
- Fill the gap later, when fetching the skipped instructions

- Learn the gap length from the past
- Conservatively use the larger of the if and else path lengths
- Squash if the actual instruction count exceeds the gap length

Despite out-of-order fetch, the I-window holds instructions in program order.
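Gap-length learning can be sketched as a per-branch table. Keeping the longest length ever observed on each side is our assumption (the slide says only that the length is learned from the past), as is treating a branch with no history as unskippable; the names are hypothetical.

```python
from typing import Dict, Optional


class GapLengthTable:
    """Per-branch record of the longest path seen on each side of the hammock."""

    def __init__(self) -> None:
        self.longest: Dict[int, Dict[bool, int]] = {}  # branch PC -> {taken: length}

    def train(self, branch_pc: int, taken: bool, path_len: int) -> None:
        sides = self.longest.setdefault(branch_pc, {True: 0, False: 0})
        sides[taken] = max(sides[taken], path_len)

    def gap_length(self, branch_pc: int) -> Optional[int]:
        """Conservative gap: the larger of the if and else path lengths."""
        sides = self.longest.get(branch_pc)
        return None if sides is None else max(sides.values())

    def must_squash(self, branch_pc: int, actual_len: int) -> bool:
        """Skipped code that does not fit in the reserved gap forces a squash."""
        gap = self.gap_length(branch_pc)
        return gap is None or actual_len > gap


if __name__ == "__main__":
    table = GapLengthTable()
    table.train(0x40, taken=True, path_len=3)   # B was 3 instructions
    table.train(0x40, taken=False, path_len=5)  # A was 5 instructions
    print(table.gap_length(0x40))               # 5
    print(table.must_squash(0x40, 6))           # True: 6 > 5
```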

Page 14

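[Figure: the instruction window from head to tail in program order: Branch PC2, then a gap for the control-flow dependent code (A or B, fetched later), then the control-flow independent code C (fetched first). Fetching is out of order, but the window stays in program order.]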
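The gap can be sketched as a circular window whose tail moves past reserved slots, so C lands in its program-order position before A or B is fetched. The class, the string "instructions", and turning unused gap slots into no-ops are our assumptions; the slides state only that the gap is sized conservatively and filled later.

```python
from typing import Any, List, Optional

GAP = object()  # marker for a reserved slot whose instruction is not fetched yet


class InstructionWindow:
    """Circular window kept in program order; a gap reserves slots for skipped code."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[Any] = [None] * size
        self.head = 0   # oldest entry, next to commit
        self.count = 0  # occupied entries, including unfilled gap slots

    def _push(self, item: Any) -> int:
        if self.count == self.size:
            raise RuntimeError("instruction window full")
        idx = (self.head + self.count) % self.size
        self.slots[idx] = item
        self.count += 1
        return idx

    def insert(self, inst: Any) -> int:
        return self._push(inst)

    def reserve_gap(self, length: int) -> List[int]:
        """Skip: reserve `length` slots right after the difficult branch."""
        return [self._push(GAP) for _ in range(length)]

    def fill_gap(self, gap_slots: List[int], insts: List[Any]) -> None:
        """Later: place the skipped instructions in their program-order slots."""
        if len(insts) > len(gap_slots):
            raise ValueError("skipped path longer than the gap: squash")
        for slot, inst in zip(gap_slots, insts):
            self.slots[slot] = inst
        for slot in gap_slots[len(insts):]:
            self.slots[slot] = "nop"  # assumption: unused gap slots become no-ops

    def commit(self) -> Optional[Any]:
        """Retire the head in program order; stall on an unfilled gap slot."""
        if self.count == 0 or self.slots[self.head] is GAP:
            return None
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return item


if __name__ == "__main__":
    w = InstructionWindow(8)
    w.insert("branch PC2")
    gap = w.reserve_gap(3)          # fetch skips A/B ...
    w.insert("C1")                  # ... and continues at the reconvergence PC
    w.insert("C2")
    print(w.commit(), w.commit())   # branch PC2 None (stalls at the gap)
    w.fill_gap(gap, ["A1", "A2"])   # branch resolved: fetch the skipped path
    print([w.commit() for _ in range(5)])  # ['A1', 'A2', 'nop', 'C1', 'C2']
```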

Page 15

[Figure: the instruction window in program order (Branch PC2 at the head, the gap for A or B fetched later, then C fetched first, toward the tail), annotated "Input regs (2)" and "Output regs (1)" for the gap.]

Page 16

How will data-dependent instructions wait for skipped instructions?
- Learn the outputregs written by control-dependent instructions
- Preallocate and preassign physical registers for the outputregs, and mark them "busy"
- Insert pmove instructions after the gap is filled; the pmoves copy the values into the preallocated registers
- If an actual output is not in outputregs, squash

This uses the normal rename and wake-up mechanism.
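Preallocation and pmoves can be sketched with a dictionary rename map. Architectural registers as strings, physical registers as integers, and choosing each pmove's source as the mapping left at the end of the skipped path are our assumptions; the function names are hypothetical, and in hardware the pmoves are instructions flowing through the pipeline, not function calls.

```python
from typing import Dict, List, Set, Tuple


def preallocate(rename_map: Dict[str, int], free: List[int], busy: Set[int],
                outputregs: List[str]) -> Dict[str, int]:
    """At the difficult branch: preassign a fresh physical register to each
    learned outputreg and mark it busy, so later consumers wait on it."""
    prealloc = {}
    for r in outputregs:
        p = free.pop(0)
        busy.add(p)
        rename_map[r] = p  # control-independent code now sources p
        prealloc[r] = p
    return prealloc


def make_pmoves(prealloc: Dict[str, int], skipped_map: Dict[str, int],
                written: Set[str]) -> List[Tuple[int, int]]:
    """After the gap is filled: one pmove per outputreg, copying from the register
    the skipped path left mapped (the pre-branch one if the path did not write it)."""
    if not set(written) <= set(prealloc):
        raise RuntimeError("actual output not in learned outputregs: squash")
    return [(prealloc[r], skipped_map[r]) for r in prealloc]


def execute_pmoves(pmoves: List[Tuple[int, int]], regfile: Dict[int, int],
                   busy: Set[int]) -> None:
    for dst, src in pmoves:
        regfile[dst] = regfile[src]
        busy.discard(dst)  # normal wake-up: dependents of dst can now issue


if __name__ == "__main__":
    rename_map = {"r1": 1, "r2": 2}
    regfile = {1: 5, 2: 7, 10: 0, 11: 0, 12: 0}
    free, busy = [10, 11, 12], set()
    backup = dict(rename_map)  # pre-branch mappings used by the skipped path
    pre = preallocate(rename_map, free, busy, ["r1", "r2"])  # {'r1': 10, 'r2': 11}
    # The skipped path (fetched later) writes only r1, into new register 12.
    skipped_map = dict(backup)
    skipped_map["r1"] = free.pop(0)
    regfile[skipped_map["r1"]] = 42
    execute_pmoves(make_pmoves(pre, skipped_map, {"r1"}), regfile, busy)
    print(regfile[10], regfile[11], busy)  # 42 7 set()
```

Because r2 was not written on the skipped path, its pmove simply copies the pre-branch value, so the preallocated register is correct whichever path executed.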

Page 17

How will control-flow dependent instructions know the correct registers to source?
- Learn the inputregs read by control-dependent instructions
- All rename maps cannot be backed up in a single cycle, so back up only the inputregs' and outputregs' maps
- Skipped instructions use the backup rename table
- If an actual input is not in inputregs, squash

This uses the normal rename backup mechanism.
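The partial backup can be sketched as a small table holding only the learned registers. Allowing a skipped instruction to source a register written earlier on the same skipped path is our reading of the squash rule, and the class and exception names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class SkipSquash(Exception):
    """Raised when the skipped code does not match what was learned."""


class BackupRenameTable:
    """Copy of the rename map for the learned inputregs/outputregs only."""

    def __init__(self, rename_map: Dict[str, int], inputregs: List[str],
                 outputregs: List[str]) -> None:
        self.inputregs = set(inputregs)
        self.outputregs = set(outputregs)
        # Back up only the learned registers instead of the whole map.
        self.map = {r: rename_map[r] for r in self.inputregs | self.outputregs}
        self.written = set()

    def rename(self, srcs: List[str], dst: Optional[str],
               free: List[int]) -> Tuple[List[int], Optional[int]]:
        """Rename one skipped instruction against the backup table."""
        for r in srcs:
            if r not in self.inputregs and r not in self.written:
                raise SkipSquash(f"input {r} not in learned inputregs")
        phys_srcs = [self.map[r] for r in srcs]
        phys_dst = None
        if dst is not None:
            if dst not in self.outputregs:
                raise SkipSquash(f"output {dst} not in learned outputregs")
            phys_dst = free.pop(0)
            self.map[dst] = phys_dst
            self.written.add(dst)
        return phys_srcs, phys_dst


if __name__ == "__main__":
    rename_map = {"r1": 1, "r2": 2, "r3": 3, "r4": 4}
    backup = BackupRenameTable(rename_map, inputregs=["r1", "r2"], outputregs=["r3"])
    rename_map["r1"] = 9  # control-independent code renames r1 meanwhile
    free = [20, 21]
    print(backup.rename(["r1", "r2"], "r3", free))  # ([1, 2], 20): pre-branch r1
    print(backup.rename(["r3"], None, free))        # ([20], None)
```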

Page 18

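[Figure: Skipper's actions along the pipeline (fetch, decode, rename, OoO issue, reg read, execute, mem/branch, writeback).
- Usual instructions: no change.
- Difficult branch: fetch next from the reconvergence PC; create the Inst-Window gap; preassign registers for outputregs and mark them busy; back up the in/outputregs' rename maps.
- Skipped instructions (fetched later): look up the backup rename table; allocate new registers; are placed in the Inst-Window gap; the last skipped instruction inserts the pmoves.]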

Page 19

- Introduction
- Skipper: When and Where to Skip
- Skipper: How to Skip
- Results
- Conclusions

Page 20

- SimpleScalar simulator
- 8K/8K/8K-entry hybrid predictor, updated at commit
- 9-cycle misprediction penalty
- 4K-entry, 4-bit JRS
- 64KB 2-way L1 I- and D-caches, 2MB L2 cache
- 128-entry information table, 3KB total

Page 21

- Speedup of 10% over the base superscalar
  - compress: deep data dependences
  - cc1, go: mispredictions in the control-dependent path
  - perl, vortex: low misprediction rate and low coverage

[Figure: speedup (y-axis from 0.90 to 1.20) for cc1, compress, go, ijpeg, li, m88ksim, perl, and vortex, with bars for the 128 and 256 configurations.]

Page 22

- Speedup of 8% over Polypath
  - Polypath executes both the if and else paths
  - Equal I-cache bandwidth for all machines

[Figure: speedup (y-axis from 0.90 to 1.20) for cc1, compress, go, ijpeg, li, m88ksim, perl, and vortex; series: Skipper 128, Polypath 128, Skipper 256, Polypath 256.]

Page 23

- Actual coverage (mean): 23% of mispredictions
- Overshoot (mean): 4.3% of all branches
- Mean branch misprediction rate:
  - Skipper: 4.06%
  - Superscalar: 6.53%
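The mean misprediction rates on this slide are consistent with geometric means of the per-benchmark misprediction ratios in the table at the end of the deck. The values below are transcribed from that table, and the choice of a geometric mean is our inference, not something the slides state.

```python
import math

# Per-benchmark misprediction ratios (%), transcribed from the per-benchmark table.
SUPERSCALAR = {"cc1": 10, "compress": 12, "go": 24, "ijpeg": 9,
               "li": 8, "m88ksim": 4, "perl": 4, "vortex": 1}
SKIPPER = {"cc1": 8, "compress": 8, "go": 16, "ijpeg": 3,
           "li": 4, "m88ksim": 2, "perl": 3, "vortex": 1}


def geomean(values) -> float:
    """Geometric mean computed in log space."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    base = geomean(SUPERSCALAR.values())
    skip = geomean(SKIPPER.values())
    print(f"superscalar {base:.2f}%, Skipper {skip:.2f}%")  # superscalar 6.53%, Skipper 4.06%
    print(f"relative reduction {1 - skip / base:.0%}")      # relative reduction 38%
```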

Page 24

Skipper exploits control-flow independence for difficult branches:
- Fetches control-independent code while the branch resolves
- Fetches control-dependent code after the branch is resolved
- Out-of-order instruction fetch
- Mechanisms: Inst-Window gap, preallocation, pmoves
- Performs better: 10% over superscalar, 8% over Polypath

Page 25

The predictor relies on fetching in order.

[Figure: Branch PC2 and Branch PC6 along program order over code A, B, C…; labels: missing pattern histories, shift in prediction history, pattern history.]
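A small sketch of why out-of-order fetch disturbs a global-history predictor: outcomes are shifted into the history register in fetch order, so a branch fetched before the skipped code sees a different history than it would under in-order fetch. PC4 is a hypothetical branch inside the skipped path, we assume PC6 lies in C, and the 4-bit history width and function name are ours; how Skipper repairs the history is not modeled.

```python
from typing import Dict, List, Tuple


def histories_seen(fetch_order: List[Tuple[str, bool]], bits: int = 4) -> Dict[str, int]:
    """Global-history value each branch sees at fetch, when outcomes are
    shifted into the history register in fetch order."""
    mask = (1 << bits) - 1
    ghist, seen = 0, {}
    for pc, taken in fetch_order:
        seen[pc] = ghist
        ghist = ((ghist << 1) | int(taken)) & mask
    return seen


if __name__ == "__main__":
    in_order = [("PC2", False), ("PC4", True), ("PC6", True)]
    skipper = [("PC2", False), ("PC6", True), ("PC4", True)]
    print(histories_seen(in_order)["PC6"], histories_seen(skipper)["PC6"])  # 1 0
```

Under in-order fetch PC6 sees PC4's taken outcome; when PC4 is skipped, that bit is missing, so PC6 indexes a different pattern-history entry.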

Page 26

The compiler might confuse the reconvergence-PC heuristic:
1. The compiler changes code patterns (e.g., trace scheduling)
   - But this is performed only on non-difficult branches
2. The compiler changes control instructions (e.g., a branch into a jump)
3. The compiler increases the number of control-dependent instructions (e.g., tail duplication)
   - This increases the gap length to an unacceptably large number

Page 27

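Per-benchmark statistics (all in %): coverage (JRS, Heuristic, Actual), overshoot, heuristic accuracy, and misprediction ratio (Skipper's, Superscalar's)

Benchmark   JRS   Heuristic   Actual   Overshoot   Heuristic accuracy   Skipper's   Superscalar's
cc1          92       75        19          7              92               8            10
compress     98      100        25          9             100               8            12
go           98       87        20          1              89              16            24
ijpeg        90       96        58          8              98               3             9
li           96       77        17          6             100               4             8
m88ksim      78       90        32         11              99               2             4
perl         94       98        16          2             100               3             4
vortex       88       79        17          2              98               1             1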

Page 28

Benchmark   #gaps   #in   #out   #inst   #slot
cc1          1.4     6     4       7      14
compress     1.5     4     3       4      10
go           1.4     8     8      10      21
ijpeg        2.0     6     6       5      13
li           1.2     5     5       9      16
m88ksim      2.1     4     4       5       9
perl         1.3     5     5       4       8
vortex       1.0     5     5       8      13