CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs152

Dec 19, 2015


CS 152 Computer Architecture and Engineering

Lecture 4 - Pipelining

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley

http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152


2/3/2009 CS152-Spring’09 2

Last time in Lecture 3

• Microcoding became less attractive as gap between RAM and ROM speeds reduced

• Complex instruction sets difficult to pipeline, so difficult to increase performance as gate count grew

• Iron-law explains architecture design space

– Trade instructions/program, cycles/instruction, and time/cycle

• Load-Store RISC ISAs designed for efficient pipelined implementations

– Very similar to vertical microcode

– Inspired by earlier Cray machines


“Iron Law” of Processor Performance

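$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$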

– Instructions per program depends on source code, compiler technology, and ISA

– Cycles per instruction (CPI) depends upon the ISA and the microarchitecture

– Time per cycle depends upon the microarchitecture and the base technology

Microarchitecture          CPI   Cycle time
Microcoded                 >1    short
Single-cycle unpipelined   1     long
Pipelined                  1     short   (this lecture)
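The Iron Law can be checked with a few lines of arithmetic. The sketch below is only an illustration: the instruction count, CPI values, and cycle times are hypothetical numbers chosen to match the qualitative table (CPI > 1 or 1, short or long cycle), not measurements from the lecture.

```python
# Iron Law: time/program = instructions/program * cycles/instruction * time/cycle.
# Every number below is a made-up assumption used only to illustrate the trade-off.

def execution_time_ns(instructions: int, cpi: float, cycle_time_ns: float) -> float:
    """Return total execution time in nanoseconds."""
    return instructions * cpi * cycle_time_ns

INSTRUCTIONS = 1_000_000  # assumed dynamic instruction count

designs = {
    # name: (CPI, cycle time in ns), all hypothetical
    "microcoded": (4.0, 2.0),
    "single-cycle unpipelined": (1.0, 10.0),
    "pipelined": (1.0, 2.0),
}

for name, (cpi, cycle_ns) in designs.items():
    total_ns = execution_time_ns(INSTRUCTIONS, cpi, cycle_ns)
    print(f"{name:26s} {total_ns / 1e6:6.1f} ms")
```

With these assumed numbers the pipelined design wins because it combines the low CPI of the single-cycle machine with the short cycle of the microcoded one.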


An Ideal Pipeline

• All objects go through the same stages

• No sharing of resources between any two stages

• Propagation delay through all pipeline stages is equal

• The scheduling of an object entering the pipeline is not affected by the objects in other stages

stage1 → stage2 → stage3 → stage4

These conditions generally hold for industrial assembly lines. But can an instruction pipeline satisfy the last condition?


Pipelined MIPS

To pipeline MIPS:

• First build MIPS without pipelining with CPI=1

• Next, add pipeline registers to reduce cycle time while maintaining CPI=1


Pipelined Datapath

Clock period can be reduced by dividing the execution of an instruction into multiple cycles

tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)

However, CPI will increase unless instructions are pipelined
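A small calculation makes the trade-off concrete. This is a sketch under stated assumptions: the component delays are hypothetical, pipeline-register overhead is ignored, and the clock period is taken to be exactly the slowest phase rather than strictly greater than it.

```python
# Hypothetical component delays in nanoseconds (not taken from the lecture).
delays_ns = {"tIM": 2.0, "tRF": 1.0, "tALU": 1.5, "tDM": 2.5, "tRW": 1.0}

def single_cycle_period(delays):
    """One long cycle must cover every component in sequence."""
    return sum(delays.values())

def multi_cycle_period(delays):
    """With one phase per cycle, the slowest phase sets the clock."""
    return max(delays.values())

def time_per_instruction(period_ns, cpi):
    """Average time per instruction for a given clock period and CPI."""
    return period_ns * cpi

phases = len(delays_ns)
print("single-cycle:          ", time_per_instruction(single_cycle_period(delays_ns), 1))
print("multi-cycle, no overlap:", time_per_instruction(multi_cycle_period(delays_ns), phases))
print("pipelined (ideal):      ", time_per_instruction(multi_cycle_period(delays_ns), 1))
```

Splitting execution into phases without overlapping instructions shortens the clock but raises CPI to the number of phases, so it can be slower overall; only overlapping (pipelining) recovers CPI = 1.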

[Figure: datapath divided into five phases (fetch, decode & register-fetch, execute, memory, write-back), showing the PC, the 0x4 adder, Inst. Memory, IR, GPRs, ImmExt, ALU, and Data Memory.]


Technology Assumptions

• A small amount of very fast memory (caches) backed up by a large, slower memory

• Fast ALU (at least for integers)

• Multiported Register files (slower!)

Thus, the following timing assumption is reasonable:

tIM ≈ tRF ≈ tALU ≈ tDM ≈ tRW

A 5-stage pipelined Harvard architecture will be the focus of our detailed design


5-Stage Pipelined Execution

time          t0   t1   t2   t3   t4   t5   t6   t7   . . . .
instruction1  IF1  ID1  EX1  MA1  WB1
instruction2       IF2  ID2  EX2  MA2  WB2
instruction3            IF3  ID3  EX3  MA3  WB3
instruction4                 IF4  ID4  EX4  MA4  WB4
instruction5                      IF5  ID5  EX5  MA5  WB5
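The staircase pattern in this diagram follows from one rule: instruction i occupies stage s in cycle i + s. A minimal sketch that regenerates the diagram for a stall-free pipeline (the function names are illustrative, not from the lecture):

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

def ideal_schedule(num_instructions):
    """Map (instruction index, stage) -> cycle for a stall-free pipeline."""
    return {
        (i, stage): i + s
        for i in range(num_instructions)
        for s, stage in enumerate(STAGES)
    }

def render(num_instructions):
    """Return the timing diagram as text, one row per instruction."""
    sched = ideal_schedule(num_instructions)
    last_cycle = max(sched.values())
    lines = ["time          " + " ".join(f"t{c:<3d}" for c in range(last_cycle + 1))]
    for i in range(num_instructions):
        cells = ["    "] * (last_cycle + 1)
        for stage in STAGES:
            cells[sched[(i, stage)]] = f"{stage}{i + 1:<2d}"
        lines.append(f"instruction{i + 1:<3d}" + " ".join(cells).rstrip())
    return "\n".join(lines)

print(render(5))
```

Because no two (stage, cycle) pairs collide, every resource is used by at most one instruction per cycle, which is what allows CPI = 1.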

[Figure: 5-stage datapath with stages I-Fetch (IF), Decode and Reg. Fetch (ID), Execute (EX), Memory (MA), and Write-Back (WB), showing the PC, the 0x4 adder, Inst. Memory, IR, GPRs, ImmExt, ALU, and Data Memory.]


5-Stage Pipelined Execution: Resource Usage Diagram

time  t0   t1   t2   t3   t4   t5   t6   t7   . . . .
IF    I1   I2   I3   I4   I5
ID         I1   I2   I3   I4   I5
EX              I1   I2   I3   I4   I5
MA                   I1   I2   I3   I4   I5
WB                        I1   I2   I3   I4   I5

(rows: resources)

[Figure: 5-stage datapath with stages I-Fetch (IF), Decode and Reg. Fetch (ID), Execute (EX), Memory (MA), and Write-Back (WB), showing the PC, the 0x4 adder, Inst. Memory, IR, GPRs, ImmExt, ALU, and Data Memory.]


Pipelined Execution: ALU Instructions

[Figure: pipelined datapath showing the PC, the 0x4 adder, Inst Memory, GPRs, ImmExt, ALU, and Data Memory, with pipeline registers A, B, Y, R, MD1, and MD2, a copy of IR in each later stage, and the constant 31.]

Not quite correct!

We need an Instruction Reg (IR) for each stage


Pipelined MIPS Datapath without jumps

[Figure: pipelined datapath with stages F, D, E, M, W, showing the PC, the 0x4 adder, Inst Memory, GPRs, ImmExt, ALU, Data Memory, the per-stage IR copies, and pipeline registers A, B, Y, R, MD1, MD2, annotated with the control points OpSel, ExtSel, BSrc, WBSrc, MemWrite, RegDst, and RegWrite.]

Control Points Need to Be Connected


How Instructions can Interact with each other in a pipeline

• An instruction in the pipeline may need a resource being used by another instruction in the pipeline ⇒ structural hazard

• An instruction may depend on something produced by an earlier instruction

– Dependence may be for a data value ⇒ data hazard

– Dependence may be for the next instruction’s address ⇒ control hazard (branches, exceptions)


Data Hazards

...
r1 ← r0 + 10
r4 ← r1 + 17
...

r1 is stale. Oops!

[Figure: pipelined datapath (PC, Inst Memory, per-stage IRs, GPRs, ImmExt, ALU, Data Memory, pipeline registers A, B, Y, R, MD1, MD2) with the two in-flight instructions labeled "r4 ← r1 …" and "r1 ← …".]


CS152 Administrivia

• PS 1 due Tuesday Feb 10 in class

• Section covering PS 1 on Wednesday Feb 11

– Room/time TBD

• First Quiz on Thursday Feb 12

– In class, closed-book, no computers or calculators

– Covers lectures 1-5 (this week’s lectures)

• Lecture 7, Tuesday Feb 17 in 320 Soda

• Lecture 8, Thursday Feb 19 back in 306 Soda

• See website for full schedule


Resolving Data Hazards (1)

Strategy 1:

Wait for the result to be available by freezing earlier pipeline stages ⇒ interlocks


Feedback to Resolve Hazards

• Later stages provide dependence information to earlier stages which can stall (or kill) instructions

[Figure: stage1 → stage2 → stage3 → stage4, with feedback signals FB1, FB2, FB3, and FB4 running from the stages back toward earlier stages.]

• Controlling a pipeline in this manner works provided the instruction at stage i+1 can complete without any interference from instructions in stages 1 to i (otherwise deadlocks may occur)


Interlocks to resolve Data Hazards

...
r1 ← r0 + 10
r4 ← r1 + 17
...

[Figure: pipelined datapath in which a Stall Condition signal selects a nop into the pipeline in place of the stalled instruction.]


Stalled Stages and Pipeline Bubbles

time                  t0   t1   t2   t3   t4   t5   t6   t7   . . . .
(I1) r1 ← (r0) + 10   IF1  ID1  EX1  MA1  WB1
(I2) r4 ← (r1) + 17        IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                            IF3  IF3  IF3  IF3  ID3  EX3  MA3  WB3
(I4)                                                IF4  ID4  EX4  MA4  WB4
(I5)                                                     IF5  ID5  EX5  MA5  WB5

(the repeated ID2 and IF3 entries are the stalled stages)

Resource Usage

time  t0   t1   t2   t3   t4   t5   t6   t7   . . . .
IF    I1   I2   I3   I3   I3   I3   I4   I5
ID         I1   I2   I2   I2   I2   I3   I4   I5
EX              I1   nop  nop  nop  I2   I3   I4   I5
MA                   I1   nop  nop  nop  I2   I3   I4   I5
WB                        I1   nop  nop  nop  I2   I3   I4   I5

nop ⇒ pipeline bubble
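The stalled timing can be reproduced with a small scheduling model. This is a sketch, not the lecture's control logic: it assumes an in-order pipeline with no bypassing, where a register written in WB becomes readable in the following cycle (the behavior the diagram shows), and the `Instr` type and function name are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Instr:
    dest: Optional[int]       # register written, or None
    srcs: Tuple[int, ...]     # registers read in the decode stage

def interlocked_schedule(program):
    """Per-instruction stage cycles for an in-order, unbypassed 5-stage pipeline."""
    last_writer_wb = {}       # register -> WB cycle of its most recent writer
    rows = []
    prev = None
    for ins in program:
        if prev is None:
            if_start, id_start = 0, 1
        else:
            if_start = prev["id_start"]     # IF frees when the predecessor enters ID
            id_start = prev["id_end"] + 1   # ID frees when the predecessor leaves ID
        # Stay in ID until the cycle after every source has been written back.
        ready = max((last_writer_wb.get(r, -1) + 1 for r in ins.srcs), default=0)
        id_end = max(id_start, ready)
        row = {"if_start": if_start, "id_start": id_start, "id_end": id_end,
               "ex": id_end + 1, "ma": id_end + 2, "wb": id_end + 3}
        if ins.dest is not None and ins.dest != 0:   # r0 is never a real destination
            last_writer_wb[ins.dest] = row["wb"]
        rows.append(row)
        prev = row
    return rows

program = [
    Instr(dest=1, srcs=(0,)),   # (I1) r1 <- (r0) + 10
    Instr(dest=4, srcs=(1,)),   # (I2) r4 <- (r1) + 17
    Instr(dest=5, srcs=(0,)),   # (I3) to (I5): assumed independent instructions
    Instr(dest=6, srcs=(0,)),
    Instr(dest=7, srcs=(0,)),
]
for n, row in enumerate(interlocked_schedule(program), start=1):
    print(f"I{n}: {row}")
```

For this program the model holds I2 in ID for three extra cycles, which corresponds to the three nop bubbles in the resource usage table.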


Interlock Control Logic

Compare the source registers of the instruction in the decode stage with the destination register of the uncommitted instructions.

[Figure: pipelined datapath with a Cstall block that compares rs and rt of the decode-stage instruction against ws of the later stages and drives the stall signal and the nop mux.]


Interlock Control Logic, ignoring jumps & branches

Should we always stall if the rs field matches some rd?

[Figure: pipelined datapath in which Cre derives re1 and re2 for the decode-stage instruction, a Cdest block for each later stage derives ws and we, and Cstall combines rs, rt, re1, and re2 with each stage's ws and we to produce stall.]

not every instruction writes a register ⇒ we

not every instruction reads a register ⇒ re


Source & Destination Registers

                                           source(s)   destination
ALU    rd ← (rs) func (rt)                 rs, rt      rd
ALUi   rt ← (rs) op imm                    rs          rt
LW     rt ← M[(rs) + imm]                  rs          rt
SW     M[(rs) + imm] ← (rt)                rs, rt
BZ     cond (rs)
         true:  PC ← (PC) + imm            rs
         false: PC ← (PC) + 4              rs
J      PC ← (PC) + imm
JAL    r31 ← (PC), PC ← (PC) + imm                     31
JR     PC ← (rs)                           rs
JALR   r31 ← (PC), PC ← (rs)               rs          31

R-type: op rs rt rd func

I-type: op rs rt immediate16

J-type: op immediate26


Deriving the Stall Signal

Cdest

ws = Case opcode
       ALU         ⇒ rd
       ALUi, LW    ⇒ rt
       JAL, JALR   ⇒ R31

we = Case opcode
       ALU, ALUi, LW   ⇒ (ws ≠ 0)
       JAL, JALR       ⇒ on
       ...             ⇒ off

Cre

re1 = Case opcode
        ALU, ALUi, LW, SW, BZ, JR, JALR   ⇒ on
        J, JAL                            ⇒ off

re2 = Case opcode
        ALU, SW   ⇒ on
        ...       ⇒ off

Cstall

stall = ((rsD = wsE).weE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
      + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D

This is not the full story!
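The Cdest, Cre, and Cstall equations translate almost line for line into code. The sketch below is one possible software rendering, assuming opcode classes are given as strings and an empty stage (bubble) is represented by `None`; the `Instr` type is illustrative, and like the slide it ignores the effect of jumps and branches on the pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    opcode: str      # one of ALU, ALUi, LW, SW, BZ, J, JAL, JR, JALR
    rs: int = 0
    rt: int = 0
    rd: int = 0

def ws(i):
    """Cdest: destination register specifier."""
    if i.opcode == "ALU":
        return i.rd
    if i.opcode in ("ALUi", "LW"):
        return i.rt
    if i.opcode in ("JAL", "JALR"):
        return 31
    return 0

def we(i):
    """Cdest: does this stage hold an instruction that writes a register?"""
    if i is None:                       # bubble
        return False
    if i.opcode in ("ALU", "ALUi", "LW"):
        return ws(i) != 0               # writes to r0 are ignored
    return i.opcode in ("JAL", "JALR")

def re1(i):
    """Cre: does the instruction read the register named by rs?"""
    return i.opcode in ("ALU", "ALUi", "LW", "SW", "BZ", "JR", "JALR")

def re2(i):
    """Cre: does the instruction read the register named by rt?"""
    return i.opcode in ("ALU", "SW")

def stall(d, e, m, w):
    """Cstall: d is the decode-stage instruction; e, m, w are the later stages."""
    def pending_write(reg):
        return any(we(x) and ws(x) == reg for x in (e, m, w))
    return (re1(d) and pending_write(d.rs)) or (re2(d) and pending_write(d.rt))

i1 = Instr("ALUi", rs=0, rt=1)          # r1 <- r0 + 10
i2 = Instr("ALUi", rs=1, rt=4)          # r4 <- r1 + 17
print(stall(i2, i1, None, None))        # I1 still in E while I2 is in D
```

The `re1`/`re2` and `we` qualifiers answer the earlier question: a bare field match is not enough, because the matching field may not be read by the decoding instruction or written by the older one.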


Hazards due to Loads & Stores

...
M[(r1)+7] ← (r2)
r4 ← M[(r3)+5]
...

[Figure: pipelined datapath with the Stall Condition logic and nop mux.]

Is there any possible data hazard in this instruction sequence?

What if (r1)+7 = (r3)+5?


Load & Store Hazards

...
M[(r1)+7] ← (r2)
r4 ← M[(r3)+5]
...

(r1)+7 = (r3)+5 ⇒ data hazard

However, the hazard is avoided because our memory system completes writes in one cycle!

Load/Store hazards are sometimes resolved in the pipeline and sometimes in the memory system itself.

More on this later in the course.


Resolving Data Hazards (2)

Strategy 2:

Route data as soon as possible after it is calculated to the earlier pipeline stage ⇒ bypass


Bypassing

Each stall or kill introduces a bubble in the pipeline ⇒ CPI > 1

time               t0   t1   t2   t3   t4   t5   t6   t7   . . . .
(I1) r1 ← r0 + 10  IF1  ID1  EX1  MA1  WB1
(I2) r4 ← r1 + 17       IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                         IF3  IF3  IF3  IF3  ID3  EX3  MA3
(I4)                                             IF4  ID4  EX4
(I5)                                                  IF5  ID5

(the repeated ID2 and IF3 entries are the stalled stages)

time               t0   t1   t2   t3   t4   t5   t6   t7   . . . .
(I1) r1 ← r0 + 10  IF1  ID1  EX1  MA1  WB1
(I2) r4 ← r1 + 17       IF2  ID2  EX2  MA2  WB2
(I3)                         IF3  ID3  EX3  MA3  WB3
(I4)                              IF4  ID4  EX4  MA4  WB4
(I5)                                   IF5  ID5  EX5  MA5  WB5

A new datapath, i.e., a bypass, can get the data from the output of the ALU to its input


Adding a Bypass

...
(I1) r1 ← r0 + 10
(I2) r4 ← r1 + 17

[Figure: pipelined datapath with stages D, E, M, W and a bypass path from the ALU output back to the ALU's A input through a mux selected by ASrc; the stall signal and nop mux are retained. The in-flight instructions are labeled "r4 ← r1 …" and "r1 ← …".]

When does this bypass help?

r1 ← r0 + 10        r1 ← M[r0 + 10]        JAL 500
r4 ← r1 + 17        r4 ← r1 + 17           r4 ← r31 + 17
yes                 no                     no


The Bypass Signal: Deriving it from the Stall Signal

stall = ((rsD = wsE).weE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
      + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D

ws = Case opcode
       ALU         ⇒ rd
       ALUi, LW    ⇒ rt
       JAL, JALR   ⇒ R31

we = Case opcode
       ALU, ALUi, LW   ⇒ (ws ≠ 0)
       JAL, JALR       ⇒ on
       ...             ⇒ off

ASrc = (rsD = wsE).weE.re1D

Is this correct?

No, because only ALU and ALUi instructions can benefit from this bypass.

Split weE into two components: we-bypass, we-stall


Bypass and Stall Signals

Split weE into two components: we-bypass, we-stall

we-bypassE = Case opcodeE
               ALU, ALUi   ⇒ (ws ≠ 0)
               ...         ⇒ off

we-stallE = Case opcodeE
              LW          ⇒ (ws ≠ 0)
              JAL, JALR   ⇒ on
              ...         ⇒ off

ASrc = (rsD = wsE).we-bypassE.re1D

stall = ((rsD = wsE).we-stallE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
      + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D
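The split of weE can be exercised directly. The following sketch renders the ASrc and stall equations of this slide in Python, under the same illustrative assumptions as before: opcode classes are strings, an empty stage is `None`, and only the single ALU-output-to-A-input bypass exists, so the rt term still uses the full weE.

```python
from collections import namedtuple

# Illustrative instruction record; rs, rt, rd default to 0.
Instr = namedtuple("Instr", ["opcode", "rs", "rt", "rd"], defaults=(0, 0, 0))

def ws(i):
    if i.opcode == "ALU":
        return i.rd
    if i.opcode in ("ALUi", "LW"):
        return i.rt
    if i.opcode in ("JAL", "JALR"):
        return 31
    return 0

def we(i):
    if i is None:
        return False
    if i.opcode in ("ALU", "ALUi", "LW"):
        return ws(i) != 0
    return i.opcode in ("JAL", "JALR")

def we_bypass(i):
    """Result is available at the ALU output, so it can be bypassed."""
    return i is not None and i.opcode in ("ALU", "ALUi") and ws(i) != 0

def we_stall(i):
    """Result is not at the ALU output (load data, link address): must stall."""
    if i is None:
        return False
    if i.opcode == "LW":
        return ws(i) != 0
    return i.opcode in ("JAL", "JALR")

def re1(i):
    return i.opcode in ("ALU", "ALUi", "LW", "SW", "BZ", "JR", "JALR")

def re2(i):
    return i.opcode in ("ALU", "SW")

def asrc(d, e):
    """Select the bypass path for the ALU's A input."""
    return we_bypass(e) and d.rs == ws(e) and re1(d)

def stall(d, e, m, w):
    rs_hazard = ((we_stall(e) and d.rs == ws(e))
                 or (we(m) and d.rs == ws(m))
                 or (we(w) and d.rs == ws(w)))
    rt_hazard = ((we(e) and d.rt == ws(e))
                 or (we(m) and d.rt == ws(m))
                 or (we(w) and d.rt == ws(w)))
    return (rs_hazard and re1(d)) or (rt_hazard and re2(d))

alu = Instr("ALUi", rs=0, rt=1)      # r1 <- r0 + 10
use = Instr("ALUi", rs=1, rt=4)      # r4 <- r1 + 17
print(asrc(use, alu), stall(use, alu, None, None))
```

The three cases from "When does this bypass help?" can be checked by replacing `alu` with a load or a JAL: in both of those cases `asrc` is off and `stall` is on.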


Fully Bypassed Datapath

[Figure: fully bypassed datapath with stages D, E, M, W: muxes selected by ASrc and BSrc feed the ALU inputs, the PC is carried along for JAL, ..., and the stall signal and nop mux are retained.]

Is there still a need for the stall signal?

stall = (rsD = wsE).(opcodeE = LWE).(wsE ≠ 0).re1D
      + (rtD = wsE).(opcodeE = LWE).(wsE ≠ 0).re2D
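With full bypassing, the only remaining stall is the load-use case: a load in the execute stage followed immediately by an instruction that reads its destination. A minimal sketch of that reduced equation, again using string opcode classes, `None` for a bubble, and an illustrative `Instr` record:

```python
from collections import namedtuple

# Illustrative instruction record; rs, rt, rd default to 0.
Instr = namedtuple("Instr", ["opcode", "rs", "rt", "rd"], defaults=(0, 0, 0))

def re1(i):
    return i.opcode in ("ALU", "ALUi", "LW", "SW", "BZ", "JR", "JALR")

def re2(i):
    return i.opcode in ("ALU", "SW")

def stall_fully_bypassed(d, e):
    """Stall only when a load in E produces a register that D reads."""
    if e is None or e.opcode != "LW" or e.rt == 0:   # wsE = rt for LW; r0 ignored
        return False
    return (d.rs == e.rt and re1(d)) or (d.rt == e.rt and re2(d))

load = Instr("LW", rs=0, rt=1)       # r1 <- M[r0 + 10]
use = Instr("ALUi", rs=1, rt=4)      # r4 <- r1 + 17
print(stall_fully_bypassed(use, load))
```

Only the execute stage is examined: once the load reaches the memory stage its data can be bypassed like any other result, so older stages no longer force a stall.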


Resolving Data Hazards (3)

Strategy 3:

Speculate on the dependence. Two cases:

Guessed correctly ⇒ do nothing

Guessed incorrectly ⇒ kill and restart


Next Time: Control Hazards

• Branches/Jumps

• Exceptions/Interrupts


Acknowledgements

• These slides contain material developed and copyright by:

– Arvind (MIT)

– Krste Asanovic (MIT/UCB)

– Joel Emer (Intel/MIT)

– James Hoe (CMU)

– John Kubiatowicz (UCB)

– David Patterson (UCB)

• MIT material derived from course 6.823

• UCB material derived from course CS252