Pipelining III

Post on 24-Feb-2016



Andreas Klappenecker

CPSC321 Computer Architecture

Administrative Issues

Talk by Laszlo Kish

Quantum Computing Seminar, Thursday 10:00am-11:00am, HRBB 302

Projects: Get started!!!

Pipelined Datapath

Pipeline separation registers, width varies

Control Lines

Instruction fetch: the control signals to read instruction memory and to write the PC are always asserted, so there is nothing special here.

Instruction decode/register file read: the same thing happens every clock cycle, so there are no optional control lines to set.

Execution/address calculation: RegDst selects the result register, ALUOp selects the ALU operation, and ALUSrc selects Read data 2 or the sign-extended immediate.

Pipelined Datapath w/ Control Signals

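Control Lines

Memory access: Branch is set by branch equal (beq), MemRead is set by load instructions, and MemWrite is set by store instructions.

Write back: MemtoReg selects whether the ALU result or the memory value is sent to the register file, and RegWrite enables the write to the selected register.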

Pipelined Datapath w/ Control Signals

Pass control signals along just like the data

Pipeline Control

              Execution/address calculation   Memory access              Write-back
              stage control lines             stage control lines        stage control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format      1       1       0       0       0       0        0         1         0
lw            0       0       0       1       0       1        0         1         1
sw            X       0       0       1       0       0        1         0         X
beq           X       0       1       0       1       0        0         0         X
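The control table can be written down as a small lookup, grouped by the stage that consumes each signal. This is only a sketch: the names `Control`, `CONTROL`, and `split_by_stage` are illustrative and not from the slides, and the don't-care entries (X) are represented as `None`.

```python
from typing import NamedTuple, Optional

X = None  # don't care


class Control(NamedTuple):
    reg_dst: Optional[int]     # EX
    alu_op: int                # EX, two bits: ALUOp1 ALUOp0
    alu_src: int               # EX
    branch: int                # MEM
    mem_read: int              # MEM
    mem_write: int             # MEM
    reg_write: int             # WB
    mem_to_reg: Optional[int]  # WB


# One row per instruction class, copied from the table above.
CONTROL = {
    "R-format": Control(1, 0b10, 0, 0, 0, 0, 1, 0),
    "lw":       Control(0, 0b00, 1, 0, 1, 0, 1, 1),
    "sw":       Control(X, 0b00, 1, 0, 0, 1, 0, X),
    "beq":      Control(X, 0b01, 0, 1, 0, 0, 0, X),
}


def split_by_stage(c):
    """Group the nine control bits into the EX, M, and WB fields."""
    ex = {"RegDst": c.reg_dst, "ALUOp": c.alu_op, "ALUSrc": c.alu_src}
    m = {"Branch": c.branch, "MemRead": c.mem_read, "MemWrite": c.mem_write}
    wb = {"RegWrite": c.reg_write, "MemtoReg": c.mem_to_reg}
    return ex, m, wb


ex, m, wb = split_by_stage(CONTROL["lw"])
print(ex, m, wb)
```

Grouping the bits this way mirrors how they are used: the EX field is consumed in the third stage, while the M and WB fields have to travel further down the pipeline.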

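[Figure: the Control unit derives the EX, M, and WB control fields from the instruction in IF/ID. ID/EX carries EX, M, and WB; EX/MEM carries M and WB; MEM/WB carries WB.]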

Datapath with Control

[Figure: pipelined datapath with control. Labeled components: PC, instruction memory, register file, sign extend, shift left 2, adders, ALU and ALU control, data memory, multiplexers, Control, and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled control signals: RegDst, ALUOp, ALUSrc, Branch, PCSrc, MemRead, MemWrite, RegWrite, MemtoReg. Instruction fields shown: Instruction[20-16], Instruction[15-11], Instruction[15-0].]

Assume that the compiler has to guarantee that no hazards occur

Where do we insert the “nops”?

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Data Hazards

Data hazard: a dependency that “goes backward in time”

Dependencies

[Figure: pipeline diagram over clock cycles CC 1 to CC 9, with IM, Reg, DM, and Reg boxes for each instruction.]

Program execution order (in instructions):

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Value of register $2 by clock cycle:

CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
10    10    10    10    10/-20  -20   -20   -20   -20
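The dependencies that go backward in time can be found mechanically. The sketch below assumes no forwarding and a register file that writes in the first half of a cycle and reads in the second, so only the two instructions directly after a producer can read a stale value. The helper names `parse` and `data_hazards` are illustrative, and the parser only handles the instruction forms used in this example.

```python
import re


def parse(instr):
    """Return (destination, sources) for the few MIPS forms used here."""
    op, rest = instr.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op in ("sw", "beq"):      # these read all their registers, write none
        return None, regs
    return regs[0], regs[1:]     # R-format and lw write the first register


def data_hazards(program):
    """List (producer index, consumer index) pairs that read a stale value."""
    hazards = []
    for i, instr in enumerate(program):
        dest, _ = parse(instr)
        if dest is None or dest == "$0":
            continue
        # Only the next two instructions reach ID before the write-back.
        for j in (i + 1, i + 2):
            if j < len(program) and dest in parse(program[j])[1]:
                hazards.append((i, j))
    return hazards


program = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]
print(data_hazards(program))  # [(0, 1), (0, 2)]
```

The `add` and `sw` instructions also depend on `$2`, but they read the register in CC 5 or later, when the new value is already available.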

Solution

sub $2, $1, $3
nop
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Problem: this slows us down!
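A compiler pass that produces this solution could look roughly like the sketch below. It assumes no forwarding and that a consumer must sit at least three slots after its producer (two instructions in between). The names `regs_of` and `insert_nops` are hypothetical, and only the instruction forms from the example are parsed.

```python
import re


def regs_of(instr):
    """Return (destination, sources); a nop has neither."""
    parts = instr.split(None, 1)
    if len(parts) < 2:
        return None, []
    op, rest = parts
    regs = re.findall(r"\$\d+", rest)
    if op in ("sw", "beq"):
        return None, regs
    return regs[0], regs[1:]


def insert_nops(program, gap=2):
    """Pad the program so no instruction reads a register written by one of
    the `gap` instructions immediately before it."""
    out = []
    for instr in program:
        _, sources = regs_of(instr)
        needed = 0
        for back in range(1, gap + 1):
            if back > len(out):
                break
            dest, _ = regs_of(out[-back])
            if dest is not None and dest in sources:
                # A producer `back` slots away needs gap - back + 1 nops.
                needed = max(needed, gap - back + 1)
        out.extend(["nop"] * needed)
        out.append(instr)
    return out


program = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]
for line in insert_nops(program):
    print(line)
```

For the example program this inserts two nops after `sub` and none elsewhere, which makes the cost visible: seven issue slots for five useful instructions.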

Resolution of Data Hazards

Forwarding

Do not wait until the result has been written. Use temporary results!

Use register file forwarding to handle a read and a write to the same register

ALU forwarding

Forwarding

[Figure: pipeline diagram over clock cycles CC 1 to CC 9, with IM, Reg, DM, and Reg boxes for each instruction; the result of sub is forwarded from the pipeline registers.]

Program execution order (in instructions):

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Values by clock cycle:

                       CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:  10    10    10    10    10/-20  -20   -20   -20   -20
Value of EX/MEM:       X     X     X     -20   X       X     X     X     X
Value of MEM/WB:       X     X     X     X     -20     X     X     X     X

Forwarding

[Figure: pipelined datapath with a forwarding unit. Labeled components: PC, instruction memory, Registers, Control, ALU, data memory, multiplexers, forwarding unit, and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled signals: IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, Rs, Rt, Rd, EX/MEM.RegisterRd, MEM/WB.RegisterRd.]
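The decision made by the forwarding unit can be sketched as a pure function of the register numbers it compares. The conditions and the 2-bit mux encodings (00 = register file, 10 = EX/MEM, 01 = MEM/WB) follow the usual textbook formulation and are an assumption here, since the slides only show the signal names; the function name and parameters are illustrative.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_rd, ex_mem_reg_write,
                    mem_wb_rd, mem_wb_reg_write):
    """Return (ForwardA, ForwardB) mux selects for the two ALU inputs."""

    def select(src):
        # Prefer the most recent result (EX/MEM) over the older one (MEM/WB).
        if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
            return 0b10
        if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
            return 0b01
        return 0b00

    return select(id_ex_rs), select(id_ex_rt)


# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM:
print(forwarding_unit(2, 5, ex_mem_rd=2, ex_mem_reg_write=1,
                      mem_wb_rd=0, mem_wb_reg_write=0))   # (2, 0)

# or $13, $6, $2 in EX while and is in MEM and sub is in WB:
print(forwarding_unit(6, 2, ex_mem_rd=12, ex_mem_reg_write=1,
                      mem_wb_rd=2, mem_wb_reg_write=1))   # (0, 1)
```

The check against register 0 keeps a result "written" to `$0` from being forwarded, since `$0` always reads as zero.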

Load word can still cause a hazard: an instruction trying to read a register following a load instruction writing to the same register.

Need a hazard detection unit to “stall” pipeline

Obstructions to Forwarding

[Figure: pipeline diagram over clock cycles CC 1 to CC 9, with IM, Reg, DM, and Reg boxes for each instruction; the value loaded by lw is not available in time for the following instruction.]

Program execution order (in instructions):

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

Stalling

We can stall the pipeline by keeping an instruction in the same stage

[Figure: pipeline diagram over clock cycles CC 1 to CC 10, with IM, Reg, DM, and Reg boxes for each instruction; a bubble is inserted after lw, and the following instructions repeat a stage.]

Program execution order (in instructions):

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
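The cost of the bubble can be checked with a little arithmetic. This sketch assumes a five-stage pipeline with full forwarding, so that the only stalls are load-use stalls of one cycle each. The tuple format `(op, dest, sources)` and the function names are illustrative.

```python
STAGES = 5  # IF, ID, EX, MEM, WB


def load_use_stalls(program):
    """Count instructions that read a register loaded by the instruction
    directly before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


def total_cycles(program):
    """Cycles until the last instruction leaves the pipeline."""
    return len(program) + STAGES - 1 + load_use_stalls(program)


program = [
    ("lw", 2, (1,)),       # lw  $2, 20($1)
    ("and", 4, (2, 5)),    # and $4, $2, $5
    ("or", 8, (2, 6)),     # or  $8, $2, $6
    ("add", 9, (4, 2)),    # add $9, $4, $2
    ("slt", 1, (6, 7)),    # slt $1, $6, $7
]
print(load_use_stalls(program), total_cycles(program))  # 1 10
```

Without the stall the five instructions would finish in CC 9; the single bubble pushes completion to CC 10.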

Hazard Detection Unit

Stall by letting an instruction that won’t write anything go forward

[Figure: pipelined datapath with a hazard detection unit and a forwarding unit. Labeled components: PC, instruction memory, Registers, Control, ALU, data memory, multiplexers (one with a constant 0 input), hazard detection unit, forwarding unit, and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled signals: ID/EX.MemRead, IF/IDWrite, PCWrite, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, Rs, Rt, Rd, EX/MEM.RegisterRd, MEM/WB.RegisterRd.]
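The hazard detection unit can be sketched as a function of the signals named in the figure. The stall condition is the usual textbook one for a load-use hazard and is an assumption here; `ZeroControl` is a hypothetical name for the select of the multiplexer that replaces the control fields with 0.

```python
def hazard_detection_unit(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Decide whether the instruction in ID must wait one cycle."""
    # A load is in EX and the instruction in ID reads the loaded register.
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,      # hold the PC: refetch the same instruction
        "IF/IDWrite": not stall,   # hold IF/ID: decode the same instruction again
        "ZeroControl": stall,      # send all-zero control bits into ID/EX (bubble)
    }


# lw $2, 20($1) in EX, and $4, $2, $5 in ID:
print(hazard_detection_unit(1, 2, if_id_rs=2, if_id_rt=5))
```

Zeroing the control bits is what makes the bubble harmless: an instruction with RegWrite and MemWrite deasserted changes no state as it moves forward.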

When we decide to branch, other instructions are in the pipeline!

We are predicting “branch not taken”, so we need to add hardware for flushing instructions if we are wrong

Branch Hazards

[Figure: pipeline diagram over clock cycles CC 1 to CC 9, with IM, Reg, DM, and Reg boxes for each instruction; the instructions after the branch are already in the pipeline when the branch is decided.]

Program execution order (in instructions), with instruction addresses:

40 beq $1, $3, 7
44 and $12, $2, $5
48 or $13, $6, $2
52 add $14, $2, $2
72 lw $4, 50($7)
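The addresses in this example follow from the MIPS branch target calculation: the 16-bit offset counts words relative to the instruction after the branch. A short check, with illustrative function names and assuming the branch is resolved in the 4th stage:

```python
def branch_target(pc, offset_words):
    """Target of a MIPS beq at address pc: (pc + 4) + (offset << 2)."""
    return pc + 4 + (offset_words << 2)


def wrong_path(pc, decision_stage=4):
    """Addresses fetched after the branch before it is resolved."""
    return [pc + 4 * k for k in range(1, decision_stage)]


print(branch_target(40, 7))  # 72
print(wrong_path(40))        # [44, 48, 52]
```

So if the `beq` at address 40 is taken, the instructions at 44, 48, and 52 have been fetched in vain and execution continues with the `lw` at 72.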

Flushing

Move the branch decision from the 4th pipeline stage to the second; then only one instruction following the branch will be in the pipeline

IF.Flush turns the fetched instruction into a nop by zeroing the IF/ID pipeline register
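One cycle of the fetch stage with IF.Flush can be sketched as follows. This is a simplified model, not the actual hardware: instruction memory is a dictionary from address to instruction word, the example instruction words are made up, and `fetch_step` is a hypothetical name. It relies on the fact that the all-zero word is a nop in MIPS (it encodes `sll $0, $0, 0`).

```python
NOP = 0x00000000  # all-zero word: sll $0, $0, 0


def fetch_step(pc, imem, branch_taken, target):
    """Return (next PC, value latched into IF/ID) for one clock cycle.

    branch_taken is the decision made in ID during this same cycle for the
    branch that was fetched one cycle earlier.
    """
    fetched = imem[pc]
    if branch_taken:
        # IF.Flush: discard the instruction just fetched and redirect the PC.
        return target, NOP
    return pc + 4, fetched


# Hypothetical instruction words, keyed by address.
imem = {44: 0x11111111, 48: 0x22222222, 72: 0x33333333}

print(fetch_step(44, imem, branch_taken=True, target=72))   # (72, 0)
print(fetch_step(44, imem, branch_taken=False, target=72))  # next PC 48, IF/ID keeps the word fetched at 44
```

With the decision in the second stage, a taken branch costs exactly the one slot that was turned into a nop.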

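Flushing Instructions

[Figure: pipelined datapath with the branch decision moved to the second stage. Labeled components: PC, instruction memory, adder with constant 4, Registers, equality comparator (=), sign extend, shift left 2, Control, ALU, data memory, multiplexers (one with a constant 0 input), hazard detection unit, forwarding unit, and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled signal: IF.Flush.]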

Improving Performance

Try to avoid stalls by reordering instructions

Add a “branch delay slot”: the next instruction after a branch is always executed, and we rely on the compiler to “fill” the slot with something useful

Superscalar: start more than one instruction in the same cycle

Dynamic Scheduling

The hardware performs the “scheduling”:

hardware tries to find instructions to execute

out-of-order execution is possible

speculative execution and dynamic branch prediction

All modern processors are very complicated:

DEC Alpha 21264: 9-stage pipeline, 6-instruction issue

PowerPC and Pentium: branch history table

Compiler technology is important

This class has given you the background you need to learn more - read Chapter 6!

More material will be posted!
