Department of Computer and IT Engineering, University of Kurdistan
Pipelining (Multi-Cycle)

By: Dr. Alireza Abdollahpouri

Pipelined MIPS processor

Any instruction set can be implemented in many different ways

MIPS ISA implementations:

Single-cycle: short CPI, long clock cycle time (CCT)

Multi-cycle: long CPI, short CCT

Pipelined: short CPI, short CCT

Getting the Best of Both Datapaths

Single-cycle: clock rate = 125 MHz, CPI = 1

Multi-cycle: clock rate = 500 MHz, CPI ≈ 4

Pipelined: clock rate = 500 MHz, CPI ≈ 1
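As a rough check of the comparison above, throughput in millions of instructions per second (MIPS) is the clock rate divided by the CPI. This is a sketch that treats the approximate CPIs as exact values. The function name `mips` and the `designs` table are illustrative, not from the slides.

```python
# Throughput of the three datapaths, using the slide's numbers.
# Assumption: "CPI ≈ 4" and "CPI ≈ 1" are taken as exactly 4 and 1.

def mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second = clock rate (MHz) / CPI."""
    return clock_mhz / cpi


designs = {
    "single-cycle": (125, 1),
    "multi-cycle": (500, 4),
    "pipelined": (500, 1),
}

for name, (clock_mhz, cpi) in designs.items():
    print(f"{name:>12}: {mips(clock_mhz, cpi):.0f} MIPS")
```

Under these assumptions the single-cycle and multi-cycle designs reach the same 125 MIPS by opposite routes, while the pipelined design combines the fast clock with a CPI near 1 and reaches about 500 MIPS.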

The concept of pipelining

Example: doing laundry. Ali, Bahram, Cathy, and Dara (A, B, C, D)

Each of them has a load of clothes to wash, dry, and iron.

Washing takes 30 minutes.

Drying takes 40 minutes.

Ironing takes 20 minutes.

Sequential laundry

Done sequentially, these steps take 6 hours for all four people.

[Figure: task order (A, B, C, D) vs. time from 6 PM to midnight; each load runs 30 + 40 + 20 minutes before the next one starts.]

Pipelined laundry

Done in a pipelined fashion, the same steps take 3.5 hours for all four people.

[Figure: task order (A, B, C, D) vs. time from 6 PM to midnight; successive intervals along the time axis are 30, 40, 40, 40, 40, and 20 minutes.]
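The 6-hour and 3.5-hour figures can be reproduced with a small simulation. This sketch assumes one machine per stage and lets a load wait between machines (for example, washed clothes waiting for the dryer), so a machine is free again as soon as it finishes. `sequential_time` and `pipelined_time` are hypothetical helper names.

```python
# Laundry example, times in minutes.
STAGES = [30, 40, 20]  # wash, dry, iron
LOADS = 4              # Ali, Bahram, Cathy, Dara


def sequential_time(stages, n):
    """Each load runs through all stages before the next load starts."""
    return n * sum(stages)


def pipelined_time(stages, n):
    """A load starts a stage once it has finished the previous stage AND
    that stage's machine is free (assumption: loads may wait in between)."""
    machine_free = [0] * len(stages)  # time each machine becomes free
    for _ in range(n):
        ready = 0  # time this load is ready for its next stage
        for s, duration in enumerate(stages):
            start = max(ready, machine_free[s])
            ready = start + duration
            machine_free[s] = ready
    return machine_free[-1]  # when the last load leaves the last stage


print(sequential_time(STAGES, LOADS) / 60)  # 6.0 hours
print(pipelined_time(STAGES, LOADS) / 60)   # 3.5 hours
```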

Basic concept

Pipeline: several instructions are being executed at the same time. The pipeline is divided into stages (segments). The machine cycle is determined by the slowest pipeline stage.

Usually, machine cycle = clock cycle.

Pipelining

Suppose we have $n$ tasks, each taking $t_n$ to execute without pipelining, so all tasks together take $n\,t_n$. Assume the pipeline has $k$ segments and each segment completes in $t_p$ (clock cycle $= t_p$):

The first task completes after $k$ clock cycles ($k\,t_p$).

Each remaining task completes one clock cycle after the previous one, so the other $n - 1$ tasks need $(n - 1)\,t_p$.

The total pipelined time is therefore $(k + n - 1)\,t_p$, and the speedup of pipelined over non-pipelined processing is:

$$S = \frac{n\,t_n}{(k + n - 1)\,t_p}$$
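The formula can be explored numerically. As $n$ grows, $k + n - 1 \approx n$, so $S$ approaches $t_n / t_p$; if the segments are perfectly balanced, $t_n = k\,t_p$ and the limit is $k$, the number of segments. The sketch below assumes such a balanced 5-segment pipeline; the values of `k` and `t_p` are illustrative.

```python
def pipeline_speedup(n, k, t_n, t_p):
    """S = n * t_n / ((k + n - 1) * t_p) for n tasks on a k-segment pipeline."""
    return (n * t_n) / ((k + n - 1) * t_p)


# Assumed example: 5 balanced segments of t_p = 1 time unit, so t_n = 5.
k, t_p = 5, 1.0
for n in (1, 10, 100, 10_000):
    print(n, round(pipeline_speedup(n, k, k * t_p, t_p), 3))
```

With a single task there is no speedup at all (S = 1), and the speedup climbs toward 5 only as the number of tasks grows.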

Notes on pipelining

Pipelining does not make any single task faster; it improves overall throughput.

The pipeline rate is limited by the slowest stage.

Several tasks execute simultaneously, each using a different resource.

Ideally, the speedup equals the number of pipeline stages.

Unbalanced stages (stages with unequal execution times) reduce the pipeline's speed and efficiency.

The time spent filling and draining the pipeline also reduces the speedup.

[Figure: pipelined laundry, task order (A, B, C, D) vs. time, axis starting at 6 PM; intervals of 30, 40, 40, 40, 40, and 20 minutes.]
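To see the effect of unbalanced stages, compare the steady-state speedup (time per task without pipelining divided by the clock period, which is set by the slowest stage) for the laundry stages and for a hypothetical balanced split of the same 90 minutes. This is an illustrative sketch; `steady_state_speedup` is not from the slides.

```python
def steady_state_speedup(stage_times):
    """Speedup once the pipeline is full: a task takes sum(stage_times)
    unpipelined, but a new task finishes every max(stage_times)."""
    return sum(stage_times) / max(stage_times)


print(steady_state_speedup([30, 30, 30]))  # 3.0, hypothetical balanced split
print(steady_state_speedup([30, 40, 20]))  # 2.25, the laundry stages
```

With three stages the ideal speedup would be 3, but the 40-minute dryer caps it at 2.25. For only four loads, filling and draining lower it further: 360 / 210 ≈ 1.71 from the laundry timelines.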

Five stages of the instruction cycle

Ifetch: Instruction Fetch. Fetch the instruction from the instruction memory.

Reg/Dec: Register Fetch and Instruction Decode.

Exec: Calculate the memory address.

Mem: Read the data from the data memory.

Wr: Write the data back to the register file.

lw: Cycle 1 Ifetch, Cycle 2 Reg/Dec, Cycle 3 Exec, Cycle 4 Mem, Cycle 5 Wr

The five stages of the MIPS processor datapath

IF: Instruction Fetch

ID: Instruction Decode / Register Read

EX: Execute / Address Calculation

MEM: Memory Access

WB: Write Back

Registers between pipeline stages

Registers are needed between stages to hold the information produced in the previous cycle.

Visualizing the pipeline

[Figure: instruction order vs. time (clock cycles 1 to 7); four instructions each pass through Ifetch, Reg, ALU, DMem, and Reg, with each instruction starting one cycle after the previous one.]
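A diagram like this one can be generated for an ideal five-stage pipeline with no hazards: instruction i (counting from 0) is in stage s during cycle i + s + 1. This is a minimal sketch using the stage names IF, ID, EX, MEM, WB; `pipeline_diagram` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return (name, row) pairs; row[c] is the stage the instruction occupies
    in clock cycle c + 1, or "." if it is not in the pipeline then.
    Assumes an ideal pipeline: one new instruction per cycle, no stalls."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        row = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is in stage s during cycle i + s + 1
        rows.append((name, row))
    return rows


rows = pipeline_diagram(["instr 1", "instr 2", "instr 3", "instr 4"])
print(" " * 8 + " ".join(f"C{c:<3}" for c in range(1, len(rows[0][1]) + 1)))
for name, row in rows:
    print(f"{name:<8}" + " ".join(f"{cell:<4}" for cell in row))
```

With four instructions the last one leaves WB in cycle 8, that is, after n + k - 1 cycles in total.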

Problems that arise in pipelined processing

Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.

Structural hazards: the hardware cannot support this combination of instructions.

Data hazards: an instruction depends on the result of a prior instruction still in the pipeline.

Control hazards: caused by the delay between fetching instructions and making decisions about changes in control flow (branches and jumps).

One Memory Port / Structural Hazards

[Figure: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) vs. time (clock cycles), stages Ifetch, Reg, ALU, DMem, Reg. In the same cycle, the Load is reading data from memory while Instr 3 is reading its instruction from memory, which a single memory port cannot serve.]

One Memory Port / Structural Hazards

[Figure: instruction order (Load, Instr 1, Instr 2, Stall, Instr 3) vs. time (clock cycles). The stall inserts a row of bubbles so that Instr 3 fetches its instruction one cycle later, after the Load's data-memory access.]
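The stall can be modeled by tracking which cycles the single memory port is busy with data accesses. This sketch assumes the five-stage timing above (a load or store uses memory in MEM, three cycles after its own fetch) and gives the earlier instruction's data access priority; `fetch_cycles` is a hypothetical helper.

```python
def fetch_cycles(uses_data_memory):
    """Return the fetch (IF) cycle of each instruction, in program order,
    when instructions and data share a single memory port.

    uses_data_memory[i] is True for a load/store, which accesses memory in
    its MEM stage, three cycles after its own fetch. Assumption: that data
    access has priority, so a fetch that collides with it is delayed
    (a stall) until the port is free.
    """
    port_busy = set()  # cycles already claimed by MEM-stage data accesses
    fetches = []
    cycle = 1
    for is_mem_op in uses_data_memory:
        while cycle in port_busy:
            cycle += 1  # bubble: the fetch waits one more cycle
        fetches.append(cycle)
        if is_mem_op:
            port_busy.add(cycle + 3)  # IF -> ID -> EX -> MEM
        cycle += 1
    return fetches


# Load, Instr 1, Instr 2, Instr 3, Instr 4 (only the Load uses data memory).
print(fetch_cycles([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```

Instr 3 moves from cycle 4 to cycle 5, matching the single stall on the slide.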

Data Hazard on $1


add $1,$3,$0

sub $4,$1,$5

and $6,$1,$7

xor $4,$1,$5

or $8,$1,$9

Dependencies backward in time

add $1,$3,$0

sub $4,$1,$3

and $6,$1,$7

or $8,$1,$9

xor $10,$1,$11

[Figure: instruction order vs. time (clock cycles) for the instructions above, stages IF, ID/RF, EX, MEM, WB (Im, Reg, ALU, Dm, Reg); add writes $1 only in its WB stage, so the arrows to instructions that read $1 earlier point backward in time.]
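Which of these instructions actually read a stale $1 depends on the timing assumption. This sketch assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so only the next two instructions after a producer are affected. The `Instr` record and the `raw_hazards` function are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: tuple


def raw_hazards(program, window=2):
    """Return (producer, consumer, register) for each RAW hazard.

    Assumption: no forwarding; the register file is written in the first
    half of WB and read in the second half of ID, so only the `window`
    instructions right after a producer read a stale value.
    """
    hazards = []
    for i, producer in enumerate(program):
        for consumer in program[i + 1 : i + 1 + window]:
            if producer.dest in consumer.srcs:
                hazards.append((producer.text, consumer.text, producer.dest))
            if consumer.dest == producer.dest:
                break  # a newer write to the same register takes over
    return hazards


program = [
    Instr("add $1,$3,$0", "$1", ("$3", "$0")),
    Instr("sub $4,$1,$3", "$4", ("$1", "$3")),
    Instr("and $6,$1,$7", "$6", ("$1", "$7")),
    Instr("or $8,$1,$9", "$8", ("$1", "$9")),
    Instr("xor $10,$1,$11", "$10", ("$1", "$11")),
]

for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer} reads {reg} before {producer} writes it back")
```

Under this assumption only sub and and are flagged; or reads $1 in the same cycle that add writes it, and xor reads it later.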

Forwarding to resolve data hazards

Forward the result of a stage as soon as it is ready.

add $1,$3,$0

sub $4,$1,$3

and $6,$1,$7

or $8,$1,$9

xor $10,$1,$11

[Figure: the same instructions in stages IF, ID/RF, EX, MEM, WB, with the result of add forwarded to the ALU of the following instructions instead of waiting for the register write.]

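Hardware changes to support forwarding

[Figure: datapath with pipeline registers ID/EX, EX/MEM, and MEM/WR; multiplexers in front of the ALU choose between values from the registers and results fed back from EX/MEM and MEM/WR. Other labels: Data Memory, Registers, NextPC, Immediate.]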
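The control for those multiplexers can be sketched as a small forwarding unit. The conditions below follow the usual textbook formulation (check EX/MEM first, then MEM/WR, and never forward register 0, which is hardwired to zero in MIPS); the slide's figure does not spell them out, so treat them as an assumption, and `forward_select` and its return strings as hypothetical names.

```python
def forward_select(src_reg, ex_mem_regwrite, ex_mem_rd,
                   mem_wr_regwrite, mem_wr_rd):
    """Choose where the ALU operand for register `src_reg` comes from.

    Returns "EX/MEM" or "MEM/WR" when a newer value must be forwarded,
    otherwise "REG" (the value read from the register file in ID).
    The EX/MEM check comes first because it holds the most recent result.
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    if mem_wr_regwrite and mem_wr_rd != 0 and mem_wr_rd == src_reg:
        return "MEM/WR"
    return "REG"


# sub $4,$1,$3 in EX while add $1,$3,$0 is in MEM.
print(forward_select(1, True, 1, False, 0))  # EX/MEM
# and $6,$1,$7 in EX while add is in WR and sub ($4) is in MEM.
print(forward_select(1, True, 4, True, 1))   # MEM/WR
```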

Three types of data hazards

Read After Write (RAW): Instr J tries to read an operand before Instr I writes it.

Caused by a “dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

I: add r1,r2,r3
J: sub r4,r1,r3

Write After Read (WAR): Instr J writes an operand before Instr I reads it.

Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

Write After Write (WAW): Instr J writes an operand before Instr I writes it.

Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.

I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
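The three cases can be told apart by comparing the destination and source registers of an earlier instruction I and a later instruction J. This sketch only names the dependence; whether it actually causes a hazard depends on the pipeline's timing. `classify` is a hypothetical helper.

```python
def classify(i_dest, i_srcs, j_dest, j_srcs):
    """Name the dependences between an earlier instruction I and a later
    instruction J, each given by its destination and source registers."""
    kinds = []
    if i_dest in j_srcs:
        kinds.append("RAW")  # J reads what I writes: true dependence
    if j_dest in i_srcs:
        kinds.append("WAR")  # J writes what I reads: anti-dependence
    if j_dest == i_dest:
        kinds.append("WAW")  # both write the same register: output dependence
    return kinds


# The I/J pairs from the RAW, WAR, and WAW examples.
print(classify("r1", ("r2", "r3"), "r4", ("r1", "r3")))  # ['RAW']
print(classify("r4", ("r1", "r3"), "r1", ("r2", "r3")))  # ['WAR']
print(classify("r1", ("r4", "r3"), "r1", ("r2", "r3")))  # ['WAW']
```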


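Data hazard even with forwarding

Forwarding cannot remove every data hazard: when a load is followed immediately by an instruction that uses the loaded register, the data is available only at the end of the load's MEM stage, after the dependent instruction needs it in EX, so the pipeline must still stall for one cycle.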

Software Scheduling to Avoid Load Hazards

Try producing fast code for

a = b + c;

d = e - f;

assuming a, b, c, d, e, and f are in memory.

Slow code:

LW Rb,b

LW Rc,c

ADD Ra,Rb,Rc

SW a,Ra

LW Re,e

LW Rf,f

SUB Rd,Re,Rf

SW d,Rd

Fast code:

LW Rb,b

LW Rc,c

LW Re,e

ADD Ra,Rb,Rc

LW Rf,f

SW a,Ra

SUB Rd,Re,Rf

SW d,Rd
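The benefit of the reordering can be counted. This sketch assumes full forwarding, so the only stall is the one-cycle load-use case: an instruction that reads the register loaded by the LW immediately before it. Each instruction is written as a hypothetical (opcode, destination, sources) tuple, and `load_use_stalls` is an illustrative name.

```python
# Each instruction: (opcode, destination register or None, source registers).
SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]


def load_use_stalls(program):
    """Count one-cycle stalls: an instruction right after an LW that reads
    the LW's destination register (assumes forwarding covers all others)."""
    stalls = 0
    for (op, dest, _), (_, _, srcs) in zip(program, program[1:]):
        if op == "LW" and dest in srcs:
            stalls += 1
    return stalls


print("slow:", load_use_stalls(SLOW), "stalls")  # slow: 2 stalls
print("fast:", load_use_stalls(FAST), "stalls")  # fast: 0 stalls
```

Both versions execute the same eight instructions; the reordering only moves each use of a loaded register at least one instruction away from its load.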


Control Hazard on Branches - Three Stage Stall

10: beq r1,r3,36

14: and r2,r3,r5

18: or r6,r1,r7

22: add r8,r1,r9

36: xor r10,r1,r11

[Figure: pipeline diagram of the instructions above (stages Ifetch, Reg, ALU, DMem, Reg) over clock cycles, illustrating the three-cycle delay before the branch to 36 is resolved.]

Branch Stall Impact

If the ideal CPI is 1, 30% of instructions are branches, and each branch stalls 3 cycles, then the new CPI = 1 + 0.3 × 3 = 1.9!

Two-part solution:

Determine whether the branch is taken or not sooner, AND

Compute the taken-branch address earlier.
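The CPI figure is the ideal CPI plus the average stall cycles per instruction contributed by branches. A minimal sketch, with `effective_cpi` as a hypothetical name:

```python
def effective_cpi(ideal_cpi, branch_fraction, stall_cycles):
    """CPI when every branch stalls the pipeline for `stall_cycles` cycles."""
    return ideal_cpi + branch_fraction * stall_cycles


print(round(effective_cpi(1.0, 0.30, 3), 2))  # 1.9
```

Relative to the ideal CPI of 1, the machine runs 1.9 times slower, which is why both parts of the solution aim to shrink the 3-cycle penalty.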

Four Branch Hazard Alternatives

1: Stall until branch direction is clear

2: Predict Branch Not Taken

3: Predict Branch Taken

4: Delayed Branch
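Alternative 2 helps because only taken branches pay the penalty: the pipeline keeps fetching the fall-through instructions and flushes them only when the branch turns out to be taken. This sketch keeps the slide's 30% branches and 3-cycle penalty and assumes, hypothetically, that 60% of branches are taken; `cpi_predict_not_taken` is an illustrative name.

```python
def cpi_predict_not_taken(ideal_cpi, branch_fraction, taken_fraction, penalty):
    """CPI with predict-not-taken: fall-through instructions are fetched
    speculatively, and only taken branches flush them and pay `penalty`."""
    return ideal_cpi + branch_fraction * taken_fraction * penalty


# Slide's 30% branches and 3-cycle penalty; 60% taken is an assumed value.
print(round(cpi_predict_not_taken(1.0, 0.30, 0.60, 3), 2))  # 1.54
```

Compared with stalling on every branch (CPI 1.9), the saving grows as the fraction of taken branches falls.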


Superscalar processing

Using several pipelines in parallel.

Summary: Control and Pipelining

Just overlap tasks; this is easy if tasks are independent.

Speedup ≤ pipeline depth; if the ideal CPI is 1, then:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle time}_{\text{unpipelined}}}{\text{Cycle time}_{\text{pipelined}}}$$

Hazards limit performance on computers:

Structural: need more hardware resources

Data (RAW, WAR, WAW): need forwarding and compiler scheduling

Control: delayed branch, prediction
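The summary formula can be evaluated directly. This sketch assumes a 5-stage pipeline with the same cycle time as the unpipelined design and reuses the 0.9 stall cycles per instruction from the branch example (0.3 × 3); `speedup_with_stalls` is a hypothetical name.

```python
def speedup_with_stalls(depth, stall_cpi, cycle_unpipelined, cycle_pipelined):
    """Speedup = depth / (1 + stall CPI) * (unpipelined / pipelined cycle time)."""
    return depth / (1 + stall_cpi) * (cycle_unpipelined / cycle_pipelined)


print(speedup_with_stalls(5, 0.0, 1.0, 1.0))            # 5.0, ideal: no stalls
print(round(speedup_with_stalls(5, 0.9, 1.0, 1.0), 2))  # 2.63, with branch stalls
```

Without stalls the speedup equals the pipeline depth; the branch stalls alone cut it roughly in half.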

Single-Cycle vs. Multi-Cycle vs. Pipeline

[Figure: timing diagrams for lw, sw, and an R-type instruction.
Single-cycle implementation: one long clock cycle per instruction (lw in Cycle 1, sw in Cycle 2); sw does not need the whole cycle, so part of it is wasted.
Multiple-cycle implementation: lw takes five short cycles (IFetch, Dec, Exec, Mem, WB), sw takes four (IFetch, Dec, Exec, Mem), and the R-type instruction is fetched in Cycle 10.
Pipeline implementation: lw, sw, and R-type each go through IFetch, Dec, Exec, Mem, WB, starting one cycle apart.]

Questions
