
What We Have Learned About Pipelining So Far

Mar 21, 2016

Page 1

Savio Chau

What We Have Learned About Pipelining So Far
• Pipelining Helps the Throughput of the Entire Workload But Doesn’t Help the Latency of a Single Task
• Pipeline Rate Is Limited by the Slowest Pipeline Stage
• Multiple Instructions Are Operating Simultaneously
• Potential Speedup = Number of Pipeline Stages, Under the Ideal Situation That All Instructions Are Independent and There Are No Branch Instructions
• Soon, We Will Learn About Hazards That Degrade the Performance of the Ideal Pipeline
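As a rough illustration of the speedup bullet, the sketch below assumes k equal-length stages, an unpipelined datapath that spends k clocks per instruction, and no hazards. Then n instructions take n·k clocks unpipelined versus k + (n − 1) clocks pipelined, so the ratio approaches k only for long workloads, while a single instruction still takes k clocks either way. The function name `pipeline_speedup` is illustrative, not from the slides.

```python
def pipeline_speedup(n_instructions: int, n_stages: int) -> float:
    """Ideal speedup of a pipelined datapath over an unpipelined one.

    Assumes equal-length stages, one clock per stage, no hazards or stalls,
    and an unpipelined datapath that needs n_stages clocks per instruction.
    """
    unpipelined = n_instructions * n_stages        # one instruction at a time
    pipelined = n_stages + (n_instructions - 1)    # fill the pipe, then one finishes per clock
    return unpipelined / pipelined


for n in (1, 5, 100, 1_000_000):
    print(n, round(pipeline_speedup(n, n_stages=5), 3))
```

With 5 stages this gives 1.0 for a single instruction (its latency is unchanged), about 2.78 for 5 instructions, about 4.81 for 100, and approaches 5 for very long workloads.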

Page 2

Pipeline Hazards
• Pipelining Limitations: Hazards Are Situations That Prevent the Next Instruction from Executing During Its Designated Cycle
  – Structural Hazard: Resource Conflict When Several Pipelined Instructions Need the Same Functional Unit Simultaneously
  – Data Hazard: An Instruction Depends on the Result of a Prior Instruction That Is Still in the Pipeline
  – Control Hazard: Pipelining of Branches and Other Instructions That Change the PC
• Common Solution: Stall the Pipeline by Inserting “Bubbles” Until the Hazard Is Resolved

Page 3

Graphical Representation to Analyze Pipeline Hazards

[Figure: each instruction drawn as its pipeline stages (Instruction Mem, Reg, ALU, Mem, Reg) across clock cycles; labels: Operations, Bypass]

Page 4

Structural Hazard: Conflict in Resources

Example: Assuming Instructions and Data Share the Same Memory
[Figure: the Load is reading data from memory in the same clock cycle that Instruction 3 is fetching an instruction from the same memory]
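A minimal sketch of why this conflict appears, assuming a classic 5-stage pipeline (IF, ID, EX, MEM, WB), one instruction started per clock, no stalls, and a single shared memory with one port. Instruction i fetches in clock i and accesses data in clock i + 3, so a load or store collides with the fetch of the 3rd instruction after it. `memory_conflicts` is a hypothetical helper, not part of the slides.

```python
# Stage offsets in an assumed 5-stage pipeline: IF=0, ID=1, EX=2, MEM=3, WB=4.
IF_STAGE = 0
MEM_STAGE = 3


def memory_conflicts(ops):
    """List (clock, mem_instr, fetch_instr) for every clock in which a load/store
    uses the single shared memory while another instruction is being fetched.

    Instruction i is in IF during clock i + IF_STAGE and in MEM during clock
    i + MEM_STAGE, because one instruction starts per clock and nothing stalls.
    """
    conflicts = []
    for i, op in enumerate(ops):
        if op not in ("lw", "sw"):      # only loads and stores touch data memory
            continue
        clock = i + MEM_STAGE
        j = clock - IF_STAGE            # index of the instruction fetched in that clock
        if j < len(ops):
            conflicts.append((clock, i, j))
    return conflicts


print(memory_conflicts(["lw", "add", "sub", "and", "or"]))  # [(3, 0, 3)]
```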

Page 5

Resolution Option 1: Don’t Share the Memory

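[Figure: pipeline diagram in which each instruction fetches from an instruction memory (IM) and accesses a separate data memory (DM)]

Use different memories for instructions and data (just like the single-cycle datapath)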

Page 6

Resolution Option 2: Using a Two-Port Memory
Use a 2-port memory that has two read output ports and can be read and written at the same time
[Figure: two reads of the same memory (Mem) in one clock cycle. Labels: "Load instruction reading memory from output port #1"; "Load instruction reading memory from output port #2 of the same memory"]

Page 7

Resolution Option 2: Using a Two-Port Memory

[Figure: Store writing data to memory in the same clock cycle that Instruction 3 is fetching an instruction from the same memory]

Use a 2-port memory that has two read output ports and can be read and written at the same time

Page 8

Resolution Option 3: Stall the Pipeline
Delay the start of conflicting successor instructions (e.g., for Load instructions, delay the 3rd succeeding instruction by 3 clocks)

Page 9

To Insert a Bubble
Don’t Change the PC and Keep Fetching the Same Instruction; Set All Control Signals in the ID/EX Pipeline Register to Benign Values (0). More discussion about the implementation later.

[Figure: the PC is not updated, so sub r4, r1, r3 is refetched until it can execute. Each refetch creates a bubble whose control signals are all set to 0 (i.e., do nothing); three bubbles are shown]

Page 10

Data Hazard: Dependencies Backwards in Time

[Figure: add writes r1 while the following sub, and, or, and xor instructions read it]
• Sub needs r1 2 clocks before add can supply it
• And needs r1 1 clock before add can supply it
• Or gets the data in the same clock when add is done
• r1 is ready for xor
Note: The register file design allows data to be written in the first half of the clock cycle and read in the second half of the clock cycle

Page 11

Data Hazard Example

Instruction sequence (no forwarding, no stalls):
add r1, r2, r3
sub r4, r1, r3
sub r6, r1, r7
sub r8, r1, r9
sub r10, r1, r11
Initial values: r1 = 3, r2 = 4, r3 = 6, r7 = 40, r9 = 60, r11 = 80

Register | Correct Answer | Current Value
r1       | 10             | 10 (4 + 6)
r4       | 4              | -3 (3 - 6, old r1)
r6       | -30            | -37 (3 - 40, old r1)
r8       | -50            | -50 (10 - 60)
r10      | -70            | -70 (10 - 80)
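The sketch below replays this example under stated assumptions: a 5-stage pipeline with no forwarding and no stalls, register reads in ID and writes in WB, and the split-cycle register file from the previous slide (write in the first half of a clock, read in the second half). Unlisted registers start at 0. The names `Instr`, `run_sequential`, and `run_pipelined_no_forwarding` are illustrative, not from the slides.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str  # "add" or "sub"
    rd: str
    rs: str
    rt: str


def alu(op: str, a: int, b: int) -> int:
    return a + b if op == "add" else a - b


def run_sequential(prog, regs):
    """Correct Answer: execute one instruction at a time."""
    regs = dict(regs)
    for ins in prog:
        regs[ins.rd] = alu(ins.op, regs[ins.rs], regs[ins.rt])
    return regs


def run_pipelined_no_forwarding(prog, regs):
    """Current Value: 5-stage pipeline, no forwarding, no stalls.

    Instruction i reads registers in ID (clock i + 1) and writes rd in WB
    (clock i + 4). A write in clock w is seen by a read in clock r if w <= r.
    """
    regfile = dict(regs)
    pending = []  # (wb_clock, rd, value) in program order
    for i, ins in enumerate(prog):
        read_clock = i + 1
        while pending and pending[0][0] <= read_clock:  # retire finished writes
            _, rd, value = pending.pop(0)
            regfile[rd] = value
        value = alu(ins.op, regfile[ins.rs], regfile[ins.rt])
        pending.append((i + 4, ins.rd, value))
    for _, rd, value in pending:  # drain the pipeline
        regfile[rd] = value
    return regfile


prog = [Instr("add", "r1", "r2", "r3"), Instr("sub", "r4", "r1", "r3"),
        Instr("sub", "r6", "r1", "r7"), Instr("sub", "r8", "r1", "r9"),
        Instr("sub", "r10", "r1", "r11")]
regs = {f"r{k}": 0 for k in range(12)}
regs.update(r1=3, r2=4, r3=6, r7=40, r9=60, r11=80)

correct = run_sequential(prog, regs)
current = run_pipelined_no_forwarding(prog, regs)
for r in ("r1", "r4", "r6", "r8", "r10"):
    print(r, correct[r], current[r])
```

Each printed line shows a destination register, its correct value, and the value the unprotected pipeline produces, matching the slide's Correct Answer and Current Value columns: only the first two sub instructions read the stale r1 = 3.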

Page 12

Resolution Option 1: HW Stalls

See structural hazard resolution option 3 (To Insert a Bubble) for how to generate a bubble

Page 13

Resolution Option 2: Reordering of Instructions

[Figure: the independent instructions add r5, r6, r7 and sub r8, r9, r10 are scheduled between the dependent instructions]
Software inserts independent instructions instead of bubbles. It may have to insert NOP instructions if no independent instructions are found.

Page 14

Resolution Option 3: Forwarding
• Insight: The Needed Data Is Actually Available! It Is Contained in the Pipeline Registers.

Page 15

Hardware Change for Forwarding
• Add Paths From Pipeline Registers to Stages That Need the Data
• Add Multiplexors to Select the Pipeline Registers
• Register File Forwarding: Register Read During Write Gets New Value (write in 1st half of the clock cycle and read in 2nd half)
[Figure: pipelined datapath with forwarding paths from the pipeline registers and the RegFile]

Page 16

Data Hazard Detection For Forwarding
4 types of instruction dependencies cause data hazards:
1a. Rd of instruction in execution = Rs of instruction in operand fetch (EX/MEM.RegisterRd = ID/EX.RegisterRs)
1b. Rd of instruction in execution = Rt of instruction in operand fetch (EX/MEM.RegisterRd = ID/EX.RegisterRt)
2a. Rd of instruction writing back = Rs of instruction in execution (MEM/WB.RegisterRd = ID/EX.RegisterRs)
2b. Rd of instruction writing back = Rt of instruction in execution (MEM/WB.RegisterRd = ID/EX.RegisterRt)
[Figure: add r1, r2, r3 followed by sub r4, r1, r3 (Type 1a) and and r6, r1, r7 (Type 2a); r1 is not valid yet in the register file when either dependent instruction reads it]
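The four comparisons listed for this slide can be written as a small check. This is a sketch under assumptions: register numbers are plain integers, the container `PipeRegs` and the function `hazard_types` are hypothetical, and, as on this slide, only the register-number equalities are tested (the RegWrite and rd ≠ 0 qualifiers belong to the forwarding control).

```python
from typing import NamedTuple


class PipeRegs(NamedTuple):
    """Register-number fields compared by the hazard check (hypothetical container)."""
    ex_mem_rd: int  # EX/MEM.RegisterRd
    mem_wb_rd: int  # MEM/WB.RegisterRd
    id_ex_rs: int   # ID/EX.RegisterRs
    id_ex_rt: int   # ID/EX.RegisterRt


def hazard_types(p: PipeRegs) -> list[str]:
    """Return the dependency types (1a, 1b, 2a, 2b) whose register numbers match."""
    checks = [
        ("1a", p.ex_mem_rd, p.id_ex_rs),
        ("1b", p.ex_mem_rd, p.id_ex_rt),
        ("2a", p.mem_wb_rd, p.id_ex_rs),
        ("2b", p.mem_wb_rd, p.id_ex_rt),
    ]
    return [name for name, rd, src in checks if rd == src]


# add r1, r2, r3 in EX/MEM, sub r4, r1, r3 in ID/EX (MEM/WB holds an unrelated r9).
print(hazard_types(PipeRegs(ex_mem_rd=1, mem_wb_rd=9, id_ex_rs=1, id_ex_rt=3)))  # ['1a']
# One clock later: add in MEM/WB, sub in EX/MEM, and r6, r1, r7 in ID/EX.
print(hazard_types(PipeRegs(ex_mem_rd=4, mem_wb_rd=1, id_ex_rs=1, id_ex_rt=7)))  # ['2a']
```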

Page 17

Forwarding Control

• For Mux A
  – Select ALU operands from previous ALU result in EX/MEM (Type 1a): if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs))
  – Select ALU operands from MEM/WB (Type 2a): if (MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (MEM/WB.RegRd = ID/EX.RegRs))
• For Mux B
  – Same as Mux A except replacing Rs with Rt

[Figure: Control Output of the Forwarding Unit. The Forwarding Unit compares rs and rt from ID/EX with rd from EX/MEM and MEM/WB and drives Fwd A and Fwd B, which select input 0, 1, or 2 of Mux A and Mux B at the ALU inputs]

Page 18

Forwarding Example
add r1, r2, r3
sub r4, r1, r3
and r6, r7, r1
[Figure: sub r4, r1, r3 in EX takes r1 (A = R[rs]) from the add result in EX/MEM through Mux A (Type 1a Hazard); one clock later, and r6, r7, r1 in EX takes r1 (B = R[rt]) from MEM/WB through Mux B (Type 2b Hazard)]

Page 19

One More Problem

Question: If Rd is used repeatedly such that rd in all three stages is the same (i.e., MEM/WB.RegRd = EX/MEM.RegRd = ID/EX.RegRs (or ID/EX.RegRt)), should EX/MEM or MEM/WB be forwarded?

Answer: Forward EX/MEM, because it is more up to date than MEM/WB. Therefore, MEM/WB is forwarded only if rd in all three stages is not the same. That is:
  – For Mux A, select ALU operands from MEM/WB (Type 2a): if (MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (EX/MEM.RegRd ≠ ID/EX.RegRs) and (MEM/WB.RegRd = ID/EX.RegRs))
  – For Mux B: same as Mux A except replacing Rs with Rt (Type 2b)

[Figure: the same forwarding datapath, with the Forwarding Unit driving Fwd A and Fwd B for Mux A and Mux B]
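The forwarding control from the last three slides can be sketched as one function per ALU operand. Assumptions: register numbers are integers, `Stage` and `forward_select` are hypothetical names, and the mux inputs are returned as named strings because the mapping of the figure's inputs 0, 1, 2 to sources is not recoverable. Testing EX/MEM first gives it priority whenever it writes a matching nonzero rd, which is one way of expressing the "more up to date" rule; it also falls back to MEM/WB when EX/MEM does not write a register at all.

```python
from dataclasses import dataclass

# Named mux sources (hypothetical); the figure numbers the mux inputs 0, 1, 2.
FROM_REGFILE = "regfile"
FROM_EX_MEM = "ex_mem"
FROM_MEM_WB = "mem_wb"


@dataclass
class Stage:
    reg_write: bool  # RegWrite control bit held in this pipeline register
    rd: int          # destination register number held in this pipeline register


def forward_select(src: int, ex_mem: Stage, mem_wb: Stage) -> str:
    """Pick the source of one ALU operand.

    Pass src = ID/EX.RegRs for Mux A or src = ID/EX.RegRt for Mux B.
    EX/MEM is tested first because it holds the more recent result, so MEM/WB
    is used only when EX/MEM does not supply this register.
    """
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return FROM_EX_MEM   # Type 1a (Mux A) or 1b (Mux B)
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return FROM_MEM_WB   # Type 2a (Mux A) or 2b (Mux B)
    return FROM_REGFILE


# add r1, r1, r2 / add r1, r1, r3 / add r1, r1, r4: when the third add is in EX,
# both EX/MEM and MEM/WB hold rd = r1, and the newer EX/MEM value must win.
print(forward_select(1, Stage(True, 1), Stage(True, 1)))  # ex_mem
```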

Page 20

Forwarding Removes Data Hazard in Most Cases

add r1, r2, r3

add r4, r1, r3

add r5, r4, r1

sw r5, 0(r4)

Page 21

Except in One Case: lw Instruction
Problem: The lw instruction is still reading memory when the sub instruction needs the data for EX. We still need to handle the 1 hazard cycle.
Why not forward the EX/MEM register to the ALU? Would that remove the 1 hazard cycle?

Page 22

The Case Forwarding Can’t Avoid Stalling
lw r1, 0(r2)
sub r4, r1, r3
and r6, r7, r1

[Figure: lw r1, 0(r2) is in MEM reading Mem[addr] while the dependent instruction that reads r1 is in EX; the EX/MEM register holds the memory address, and the memory data for lw reaches MEM/WB one clock later]
• Problem: lw followed by an R-type instruction. The lw instruction is still reading memory when the sub instruction needs the data for EX. Need to stall 1 cycle
• Type 1a hazard, but the EX/MEM output cannot be forwarded: for lw it is the memory address, not the data
• After the stall, the memory data for lw is forwarded as Type 2a

Page 23

Option 1: Software Solution
• Software inserts independent instructions; in the worst case, it inserts NOP instructions

Page 24

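Option 2: Hardware Solution
[Figure: a one-cycle bubble follows lw; each stage of the bubble does nothing, and a later dependent instruction finds the loaded value already in the reg file]
• Control logic checks for a data hazard and stalls one cycle (i.e., inserts a bubble) if necessary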

Page 25

Hardware to Stall The Pipeline

• Step 1: Detect the hazard (check if lw is being executed and if the memory data read by lw will be loaded into one of the operands of the next instruction)
  – Stall = if (ID/EX.MemRead and ((ID/EX.rt = IF/ID.rs) or (ID/EX.rt = IF/ID.rt)))
• Step 2: If Stall is true
  – Do not fetch the next instruction, by disabling writes to the PC and IF/ID registers
  – Disable all control signals of the current instruction

[Figure: the Hazard Detect unit reads ID/EX.MemRead, ID/EX.rt, IF/ID.rs, IF/ID.rt, and IF/ID.opcode; it drives PCWr and IF/IDWr and a mux that can replace the Control outputs sent to ID/EX with 0, alongside the Forwarding Unit]
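The two steps can be sketched as follows. Assumptions: register numbers are integers, `IdEx`, `IfId`, `load_use_stall`, and `hazard_detect` are hypothetical names, and the outputs mirror the figure's PCWr and IF/IDWr signals plus a flag for the mux that forces the controls to 0. Like the slide's equation, this compares against IF/ID.rt even for instructions that do not read rt, so it can occasionally stall when it need not.

```python
from dataclasses import dataclass


@dataclass
class IdEx:
    mem_read: bool  # ID/EX.MemRead: the instruction now in EX is a load (lw)
    rt: int         # ID/EX.rt: the register that load will write


@dataclass
class IfId:
    rs: int  # IF/ID.rs of the instruction being decoded
    rt: int  # IF/ID.rt of the instruction being decoded


def load_use_stall(id_ex: IdEx, if_id: IfId) -> bool:
    """Step 1: Stall = ID/EX.MemRead and (ID/EX.rt == IF/ID.rs or ID/EX.rt == IF/ID.rt)."""
    return id_ex.mem_read and (id_ex.rt == if_id.rs or id_ex.rt == if_id.rt)


def hazard_detect(id_ex: IdEx, if_id: IfId) -> dict:
    """Step 2: on a stall, hold the PC and IF/ID and send all-zero controls into ID/EX."""
    stall = load_use_stall(id_ex, if_id)
    return {
        "PCWr": not stall,       # False keeps the PC, so the same instruction is fetched again
        "IF/IDWr": not stall,    # False keeps IF/ID, so it is decoded again next clock
        "zero_controls": stall,  # True selects the constant 0 controls: a bubble enters ID/EX
    }


# lw r1, 0(r2) is in EX and sub r4, r1, r3 is being decoded.
print(hazard_detect(IdEx(mem_read=True, rt=1), IfId(rs=1, rt=3)))
# {'PCWr': False, 'IF/IDWr': False, 'zero_controls': True}
```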

Page 26

Stalling The Pipeline
lw r1, 0(r2)
sub r4, r1, r3
and r6, r7, r1
or r8, r1, r9
[Figure: lw is in ID/EX (ID/EX.MemRead = 1 for the lw instruction, ID/EX.rt = r1; controls MemRead = 1, MemWr = 0, RegWr = 1) while sub is in IF/ID (IF/ID.rs = r1), so the Hazard Detect unit finds a load-use hazard]

Page 27

Stalling The Pipeline

lw r1, 0(r2)
sub r4, r1, r3
and r6, r7, r1
or r8, r1, r9
[Figure: with ID/EX.MemRead = 1, ID/EX.rt = r1, and IF/ID.rs = r1, Hazard Detect sets PCWr = 0 and IF/IDWr = 0, so neither the PC nor IF/ID is updated; lw still carries MemRead = 1, MemWr = 0, RegWr = 1]

Page 28

Stalling The Pipeline

lw r1, 0(r2)
sub r4, r1, r3
and r6, r7, r1
or r8, r1, r9
[Figure: lw moves to EX/MEM (MemRead = 1, MemWr = 0, RegWr = 1). The sub in ID/EX has all control signals zeroed (MemRead = 0, MemWr = 0, RegWr = 0), so it is not doing anything (i.e., it is a bubble!). IF/ID is reading the same instruction, sub, again (IF/ID.rs = r1)]

Page 29

Stalling The Pipeline

lw r1, 0(r2)
sub r4, r1, r3
and r6, r7, r1
or r8, r1, r9
[Figure: lw is in MEM/WB (RegWr = 1), the bubble is in EX/MEM (MemRead = 0, MemWr = 0, RegWr = 0), sub is in ID/EX (MemRead = 0, MemWr = 0, RegWr = 1), and the and instruction is in IF/ID]

Page 30

Stalling The Pipeline

lw r1, 0(r2)
sub r4, r1, r3
and r6, r7, r1
or r8, r1, r9
[Figure: the lw data has been written back; the bubble is in MEM/WB (RegWr = 0), sub is in EX/MEM (MemRead = 0, MemWr = 0, RegWr = 1), the and instruction is in ID/EX, and the or instruction is in IF/ID]

Page 31

Stalling The Pipeline

lw r1, 0(r2)
sub r4, r1, r3
and r6, r7, r1
or r8, r1, r9
[Figure: the bubble has left the pipeline; sub is in MEM/WB, the and instruction is in EX/MEM, and the or instruction is in ID/EX (all with RegWr = 1); the lw data is in the register file]
The bubble has not changed any state of the pipeline

Page 32

Stalling The Pipeline

lw r1, 0(r2)
sub r4, r1, r3
and r6, r7, r1
or r8, r1, r9
[Figure: the and instruction and the or instruction continue through the pipeline; the register file now holds the lw data and the sub data]
The bubble has not changed any state of the pipeline

Page 33

Control Hazard: Change in Control Flow Due to Branching

[Figure: beq $1, $3, 36 is followed by three cycles in which all control signals are set to 0 while waiting for the result of the comparison; the result of the comparison is to branch, so the next instruction fetched is the branch target, lw $4, 100($7)]
Have to stall 3 cycles before the branch decision is made

Page 34

Option 1: Static Branch Prediction
Predict Branch Not Taken
[Figure: the instructions at PC = 16, 20, and 24 are fetched assuming the branch at PC = 12 is not taken; the result of the comparison is not to branch, so execution continues with or $15, $7, $3 at PC = 28]
Prediction is correct; branching does not cause any penalty

Page 35

Penalty of Wrong Prediction

[Figure: the instructions at PC = 16, 20, and 24 are fetched assuming the branch at PC = 12 is not taken; the result of the comparison is branch taken, so the next fetch is the branch target at PC = 36]
Prediction is incorrect: the pipe must be flushed, and the penalty is the same as without branch prediction (3 cycles). Note: We need to make sure no instruction after beq has updated the register file or memory.

Page 36

Example of Wrong Prediction Penalty (e.g., Incorrectly Predict Branch Not Taken)
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB, ALU, Data Memory, Branch Hazard Detect) over Clock 1 to Clock 5; when beq is found to be taken, the instructions fetched after it are flushed. See Set 8 Class Example]

Clock | PC | Instruction
1     | 00 | lw $2, 0($3)
2     | 04 | add $4, $0, $5
3     | 08 | sw $6, 4($3)
4     | 12 | beq $7, $2, 5
5     | 16 | add $8, $2, $5
6     | 20 | add $9, $2, $4
7     | 24 | sub $10, $4, $7
8     | 28 | add $11, $7, $8
9     | 32 | j 11
8     | 36 | sub $8, $2, $5
9     | 40 | sub $9, $2, $4

Page 37

Example of Wrong Prediction Penalty (e.g., Incorrectly Predict Branch Not Taken)
[Figure: the pipe is flushed; the flushed pipeline registers are reset by (Branch and Zero), so there is no permanent change to the Register File or Memory]

Page 38

To Reduce Branch Penalty

Let’s Take a Look at Why 3 Cycles of Penalty Are Needed: the decision is made after the execution stage
[Figure: the branch address is computed from PC+4 and Ext(imm16), and the decision becomes available only after a 1st, 2nd, and 3rd clock delay. Label: Need a separate comparator since ALU is computing branch address]

Page 39

To Reduce Branch Penalty

[Figure: the branch address is computed from PC+4 and Ext(imm16) with only a 1st clock delay]
But in fact all the information required to make the decision can be obtained in the “decode/operand fetch” stage. That means we can move the address calculation forward.
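A minimal sketch of resolving beq in the decode/operand-fetch stage, assuming MIPS conventions: the 16-bit immediate is sign-extended (Ext(imm16)) and shifted left by 2 because branch offsets count words, then added to PC+4; a plain equality test stands in for the separate comparator. 32-bit wraparound is ignored, and the function names are illustrative. With the listing from the wrong-prediction example (beq $7, $2, 5 at PC 12), a taken branch goes to 12 + 4 + 5 × 4 = 36.

```python
def sign_extend16(imm16: int) -> int:
    """Ext(imm16): interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def decode_stage_branch(pc: int, imm16: int, rs_val: int, rt_val: int) -> tuple[bool, int]:
    """Resolve a beq during decode; return (taken, next_pc).

    The target is PC+4 plus the sign-extended word offset shifted left 2.
    The equality test models a dedicated comparator, so the ALU is not needed.
    """
    pc_plus_4 = pc + 4
    target = pc_plus_4 + (sign_extend16(imm16) << 2)
    taken = rs_val == rt_val
    return taken, (target if taken else pc_plus_4)


print(decode_stage_branch(12, 5, rs_val=7, rt_val=7))  # (True, 36)
print(decode_stage_branch(12, 5, rs_val=1, rt_val=2))  # (False, 16)
```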

Page 40

Pipeline After Branch Penalty Reduction
Branch Decision and Address Calculation Done in Decode Stage
[Figure: pipeline diagram for the instructions at PC = 20, 24, 56, 60, and 64 (including lw $4, 100($7), add $5, $6, $7, and sw $5, 100($7)), with labels Predict branch taken and Predict branch not taken. The prediction is wrong, but only IF/ID needs to be flushed (all ctrl set to 0)]
Penalty = 1 cycle, instead of 3 cycles

Page 41

Flushing Pipe in the New Approach if Prediction Is Wrong
[Figure: beq (Addr = 20) is in decode and the and instruction is fetched as if the branch is not taken. But Zero has decided that the branch should have been taken, so Branch Hazard Detection flushes the and instruction (its control signals become 00…0) and fetching continues at the target: lw (Addr = 56), then add (Addr = 60) and sw (Addr = 64)]

Page 42

Need Both Styles of Branch Prediction: Branch Taken and Branch Not Taken
• MIPS code that favors branch taken (branch to Loop most of the time):
  Loop: mult $19, $10        # (HI, LO) regs = i * 4
        mflo $9              # reg $9 = least sig. 32 product bits
        lw   $8, Aaddr($9)   # Temporary reg $8 = A[i*4]
        add  $17, $17, $8    # g = g + A[i]
        add  $19, $19, $20   # i = i + j
        bne  $19, $18, Loop  # goto Loop if i != h
• MIPS code that favors branch not taken (do not branch to Exit most of the time):
  Loop: mult $19, $10        # (HI, LO) regs = i * 4
        mflo $9              # reg $9 = least sig. 32 product bits
        lw   $8, Aaddr($9)   # Temporary reg $8 = A[i*4]
        add  $17, $17, $8    # g = g + A[i]
        add  $19, $19, $20   # i = i + j
        beq  $19, $18, Exit  # if i = h, get out of the loop
        j    Loop            # otherwise, goto Loop

Page 43

Option 2: Dynamic Branch Prediction
• Rather than always assuming branch not taken, use a branch history table (also called a branch prediction buffer) to achieve better prediction
• Each entry of the branch history table is implemented as a one- or two-bit register
Example: state transition of a 2-bit history table
[Figure: four states, State 00 predict taken, State 01 predict taken, State 10 predict not taken, and State 11 predict not taken, connected by taken and not-taken transitions]
If the branch test is in Instruction N, then:
• predict taken means the PC is set to the target address by default, and set to N+4 if wrong
• predict not taken means the PC is set to N+4 by default, and set to the target address if wrong
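The slide's transition arrows did not survive extraction, so the sketch below assumes the common saturating-counter form of a 2-bit entry, using the slide's encoding (00 and 01 predict taken, 10 and 11 predict not taken): each taken outcome moves the state one step toward 00 and each not-taken outcome one step toward 11. `TwoBitPredictor` and `count_mispredictions` are hypothetical names.

```python
class TwoBitPredictor:
    """One 2-bit branch history table entry, assumed to act as a saturating counter.

    States 0b00 and 0b01 predict taken; 0b10 and 0b11 predict not taken.
    The +1/-1 transitions are an assumption, not recovered from the slide.
    """

    def __init__(self, state: int = 0b00):
        self.state = state

    def predict_taken(self) -> bool:
        return self.state <= 0b01

    def update(self, taken: bool) -> None:
        # A taken branch moves toward 00, a not-taken branch toward 11.
        if taken:
            self.state = max(0b00, self.state - 1)
        else:
            self.state = min(0b11, self.state + 1)


def count_mispredictions(outcomes, predictor) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict_taken() != taken:
            misses += 1
        predictor.update(taken)
    return misses


outcomes = ([True] * 9 + [False]) * 2  # a loop branch: taken 9 times, then exits; run twice
print(count_mispredictions(outcomes, TwoBitPredictor()))  # 2
```

For a loop-closing branch like the bne to Loop above, the entry mispredicts only the two loop exits: one not-taken outcome moves it from 00 to 01, which still predicts taken when the loop is re-entered.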

Page 44

Option 3: Delayed Branch
• Make use of the time while the branch decision is being made: execute an unrelated instruction subsequent to the branch instruction
• Where to get instructions to fill the branch delay slot? Three strategies:
  – Before Branch Instruction (best if possible)
  – From Target (good if always branch)
  – From Fall Through (good if always don’t branch)
[Figure: one example per strategy. Before branch: add $s1, $s2, $s3, which precedes "If $s2=0 then", moves into the delay slot. From target: sub $t4, $t5, $t6 at the branch target is copied into the delay slot after add $s1, $s2, $s3 and "If $s1=0 then". From fall through: sub $t4, $t5, $t6, the instruction after "If $s1=0 then", fills the delay slot]
• Compiler Effectiveness for Single Branch Delay Slot:
  – Fills About 60% of Branch Delay Slots
  – About 80% of Instructions Executed in Branch Delay Slots Useful in Computation
  – About 50% (60% x 80%) of Slots Usefully Filled
• Worst Case, Compiler Inserts NOP into Branch Delay Slot
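The compiler-effectiveness figures combine as a product: the share of delay slots the compiler fills, times the share of filled slots whose instruction does useful work.

```latex
\underbrace{0.60}_{\text{slots filled}} \times \underbrace{0.80}_{\text{filled slots that are useful}} = 0.48 \approx 50\%
```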