Top Banner
EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British Columbia EECE 476 © 2005 Guy Lemieux
27

EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

Dec 19, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

EECE476: Computer Architecture

Lecture 18: PipeliningControl Hazards

Chapter 6.6

The University ofBritish Columbia EECE 476 © 2005 Guy Lemieux

Page 2: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

2

Administratia• Project

– Phase 1 now online– Phases 2, 3 coming soon…

• Partner signup deadline: OCTOBER 21– MUST work in pairs

• EMAIL ME IMMEDIATELY IF YOU’RE STILL ALONE– MUST email to project TA: 2 names, stud #s, emails– Late penalty: 5% of final grade

• Phase 1: single-cycle CPU– Suggest you finish by November 1

• Phase 2: convert to 5-stage pipelined CPU with FDU & HDU– Do phase 1 first !!!!– While doing phase 1, keep in mind that you’ll be adding pipelining later

Page 3: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

3

Review: Hazards• 3 Types of Hazards

– Structural Hazard– Data Hazard– Control Hazard

• Structural – functional unit is busy– Eg, multicycle multiplier which is not fully pipelined

• Data Hazards– Cause: Dependency between instructions– 3 Types of Data Hazards (RAW, WAR, WAW)

• Control Hazards– Today!

Page 4: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

4

Review: Dependencies

• Dependency definition– Two instructions

• Executed closely together in time• Share linkage by using a common register

– Desired outcome is clear• Defined by ORIGINAL PROGRAM ORDER

• Data hazard problem– Hardware sometimes overlaps or reorders instructions– Potential violation of dependency

• Solution– Original dependency must be preserved!– Hardware must obey ORIGINAL PROGRAM ORDER semantics

Page 5: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

5

Review: Data Hazards• Three kinds of data hazards: RAW, WAR, WAW

– In each case below, instruction pairs form a dependency– Consider what may happen if the two instructions are re-ordered

– Read after Write (aka true dependence)ADD $s0, $s1,$s2 <- writes $s0ADD $s3,$s0,$s0 <- reads $s0

– Write after Read (aka false dependence)• Doesn’t occur in our simple MIPS pipelineADD $s0,$s1,$s2 <- reads $s1ADD $s1,$s2,$s2 <- writes $s1

– Write after Write (aka output dependence)• Doesn’t occur in our simple MIPS pipelineADD $s0,$s1,$s2 <- writes $s0ADD $s0,$s3,$s4 <- writes $s0

• WAR, WAW occur in more complex CPUs with multiple pipelines

Page 6: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

6

Review: RAW Hazards

• Main data hazard in our pipeline– Read-After-Write (RAW)

• Problem– Write to register in first instruction– Read register in subsequent instruction– Read gets stale value from RegFile due to pipelining

• Possible solutions– Forwarding, stalls, compiler (NOPs or reordering code)

• Forwarding doesn’t always work– Load-use delay– Result after “M” stage is too late, must stall

Page 7: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

7

Control Hazards

• Control hazards arize due to changes in control flow• This is a software thing• Not the same as the “control unit” in hardware

• Control flow

– Normal software execution is PC+4 (straight-line)

– Only branches, jumps change the execution path• We say the “flow of control” has changed

– Often, path taken depends on outcome of operation (eg, beq)• Data-dependent• Hard to predict in advance

Page 8: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

8

Pipelined Branching Logic, “beq”

Page 9: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

9

Control Hazards

• Consider “beq” instructionIf (Rs – Rt) == 0 then

PC (PC+4) + SgnExt(Imm16)

ElsePC (PC+4)

• (Rs – Rt) computed by ALU, available after “X”– ALU generates “Zero” output– “Zero” decides next PC

• Controls PCSrc mux: PC+4 or PC+4+SgnExt(Imm16)

– “Zero” computed in X stage

Page 10: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

10

Branching Example

• Control hazard example due to branch• Consider the machine code

40 BEQ $1, $3, 744 AND $12,$2,$548 OR $13, $6,$252 ADD $14, $2,$2…72 LW $4,50($7)

• Let’s see the pipeline details

BEQ target is ahead by7*4=28 bytes,

44+28=72

Page 11: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

11

Branching Details, Cycle 1BEQ $1,$3, 7

4044

Page 12: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

12

Branching Details, Cycle 2BEQ $1,$3, 7

[$1]

[$3]

7

44

AND $12,$2,$5

4448

Page 13: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

13

Branching Details, Cycle 3AND $12,$2,$5

[$2]

[$5]

?

BEQ $1,$3, 7

[$1]

[$3]

7*4

48

OR $13,$6,$2

48 44+2852

Page 14: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

14

Branching Details, Cycle 4aOR $13,$6,$2

[$2]

[$5]

?

AND $12,$2,$5

[$2]

[$5]

?

52

ADD $14,$2,$2

52 48+?

BEQ $1,$3, 7

7256

Page 15: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

15

Branching Details, Cycle 4bOR $13,$6,$2

[$2]

[$5]

?

AND $12,$2,$5

[$2]

[$5]

?

52

ADD $14,$2,$2

52 48+?

BEQ $1,$3, 7

72

72

56

Page 16: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

16

Branching Details, Cycle 5ADD $14,$2,$2

[$2]

[$2]

?

OR $13,$6,$2

[$6]

[$2]

?

56

LW $4,50($7)

72 ?

AND $12,$2,$5

?

76

76

BEQ$1,$3,7(nothingleft todo here)

Page 17: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

17

BEQ Executes in “M” Stage

• PCSrc generated in “M” stage– Like “lw”, “beq” outcome not available until “M”

• Consequence?– 3 instructions follow “BEQ” into the pipeline– Instructions from PC+4, PC+8, PC+12

• What if we take the branch?– 3 instructions in pipeline must NOT be executed!– Control hazard arises!– How to prevent their execution?

Page 18: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

18

Control Hazard Solution 1: Stall

• Stall after every branch/jump– Stop fetching new instructions– Let branch/jump advance in pipeline

• Wait for branch/jump outcome to be decided– After branch/jump decided…– ….fetch from final target PC– Resume normal pipelining

• Performance impact– Always wastes 3 cycles

• Problem– We can’t recognize the branch instruction until it reaches D stage– Already fetched PC+4 in I stage– Must now flush instruction in I stage

Page 19: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

19

Control Hazard Solution 2: Nullify

• Branch result is in “M” stage– Next 3 instructions (AND,OR,ADD) after branch are fetched– Only partially executed– They alter state of machine (eg, RegFile or DataMem contents)

• State altered only if they reach M stage or W stage– If branch taken

• We can squash, cancel or nullify them before they reach M or W

• How to nullify?– Change control signals to “NOP” instruction (usually all zeros)– Extra control logic!

• Note– 4th instruction fetched (LW) is correct and is always executed

Page 20: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

20

Nullify Performance Impact?

• If branch is taken– We must nullify 3 instructions– Wasted CPU effort (3 CPU cycles!)

• If branch is not taken– We did useful work

• What is frequency of taken vs not taken ?– About 80-90% backward branches are taken (loops)– About 50% of forward branches are taken (if/else statements)

• Taken is most frequent, so we often nullify !!!– Only a small performance gain

Page 21: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

21

Control Hazard Solution 3:Branch Delay Slots

• Basic idea– Be optimistic and always execute – No stalls, no nullify

• Propose New ISA Rule– Always execute 3 instructions after branch– Here, we say branch has “3 Branch Delay Slots”

• Compiler places useful instructions after a branch– Utilizes “wasted” CPU cycles– Turns “disadvantage” into a “feature” ??

Page 22: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

22

Code Scheduling for Branch Delay Slots

• Compiler moves instructions to fill delay slots

UnusedDelaySlots

FilledDelaySlot

Delay Slot

Delay Slot

Delay Slot

Sub $t4,$t5,$t6

Sub $t4,$t5,$t6

Add $s1,$s2,$s3

Page 23: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

23

Code Scheduling for Branch Delay Slots

• Compiler checks dependencies to verify it is safe to Compiler checks dependencies to verify it is safe to move instructionsmove instructions

UnusedDelaySlots

FilledDelaySlot

Delay Slot

Delay Slot

Delay Slot

Sub $t4,$t5,$t6

Sub $t4,$t5,$t6

Add $s1,$s2,$s3

Page 24: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

24

Compiler Checks

• When we move an instruction– Check move is valid – involves checking

dependencies– Reordering code is tricky subject

• Eg, Consider moving instruction forward in code/time from location A to B– Destination register must be ok to move

• Check no instructions between A and B use the destination register

– Source register(s) must be ok to move• Check no instructions between A and B change the source

registers

Page 25: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

25

Branch Delay SlotPerformance Impact?

Cool idea, but…• Benefit

– Compiler can put 3 useful instructions after each branch

• Compiler difficulty– Only ~50% of 1st delay slots filled with “useful” instruction– Remaining delay slots are even harder!

• Another problem…– A pipeline detail is now exposed to software

• ISA Rule is now tied to current pipeline organization

– Difficult to change pipeline organization in future• Deeper pipeline will need more branch delay slots• If we add more branch delay slots, old software will be incompatible

Page 26: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

26

Final Word on Branch Delay Slots

• Delay slots once touted as a “feature”– But are now heavily discouraged

• MIPS ISA rules (older design)– Delay slots defined in ISA

• 1 branch delay slot• 1 jump delay slot

– Compiler must insert “NOP” or independent instruction

• NIOS-II ISA rules (more recent design)– 0 branch delay slots– 0 jump delay slots– Hardware choice: either stall or nullify to handle control hazards

Page 27: EECE476: Computer Architecture Lecture 18: Pipelining Control Hazards Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

27

Directly Reducing Penaltyof Control Hazards

• Control hazards demand solutions:– Stalling– Nullify– Branch delay slots– All of these negatively impact performance (in various ways)

• Alternative– Can we directly reduce the negative impact of control hazards?

• Yes!– Execute branch/jump instruction earlier in pipeline– Outcome known sooner– Fetch fewer instructions enter pipeline after branch (before outcome

known)– For BEQ, we must detect “equals” earlier