Top Banner
COMP25212 Further Pipeline Issues
14

COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

Dec 14, 2015

Download

Documents

Brooklynn Evens
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Further Pipeline Issues

Page 2: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

Cray 1

COMP25212

•Designed in 1976

•Cost $8,800,000

•8MB Main Memory

•Max performance 160 MFLOPS

•Weight 5.5 Tons

•Power 115 KW (250KW inc Storage

and cooling)

Page 3: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Further Pipeline Issues

Page 4: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

More Pipeline Detail

Register B

ank

Data

Cache

PC

Instruction

Cache

MU

X

ALU

IF ID EX MEM WB

Page 5: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Data Hazards

• Pipeline can cause other problems

• ConsiderADD R1,R2,R3

MUL R0,R1,R1

• The ADD instruction is producing a value in R1

• The following MUL instruction uses R1 as input

Page 6: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Instructions in the Pipeline

Register B

ank

Data

Cache

PC

Instruction

Cache

MU

X

ALU

IF ID EX MEM WB

ADD R1,R2,R3MUL R0,R1,R1

Page 7: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

The Data isn’t Ready

• At end of ID cycle, MUL instruction should have selected value in R1 to put into buffer at input to EX stage

• But the correct value for R1 from ADD instruction is being put into the buffer at output of EX stage at this time

• It won’t get to input of Register Bank until one cycle later – then probably another cycle to write into R1

Page 8: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Insert Delays?

• One solution is to detect such data dependencies in hardware and hold instruction in decode stage until data is ready – ‘bubbles’ & wasted cycles again

• Another is to use the compiler to try to reorder instructions

• Only works if we can find something useful to do – otherwise insert NOPs - waste

Page 9: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Forwarding

Register B

ank

Data

Cache

PC

Instruction

Cache

MU

X

ALU

ADD R1,R2,R3MUL R0,R1,R1

• We can add extra paths for specific cases• Control becomes more complex

Page 10: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Why did it Occur?

• Due to the design of our pipeline• In this case, the result we want is ready

one stage ahead of where it was needed, why pass it down the pipeline?

• But what if we have the sequenceLDR R1,[R2,R3]MUL R0,R1,R1

• LDR instruction means load R1 from memory address R2+R3

Page 11: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Pipeline Sequence for LDR

• Fetch

• Decode and read registers (R2 & R3)

• Execute – add R2+R3 to form address

• Memory access, read from address

• Now we can write the value into register R1

• We have designed the ‘worst case’ pipeline to work for all instructions

Page 12: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

Forwarding

Register B

ank

Data

Cache

PC

Instruction

Cache

MU

X

ALU

NOPMUL R0,R1,R1

• We can add extra paths for specific cases• Control becomes more complex

LDR R1,[R2,R3]

Page 13: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Longer Pipelines

• As mentioned previously we can go to longer pipelines– Do less per pipeline stage– Each step takes less time– So can increase clock frequency– But greater penalty for hazards– More complex control

• Negative returns?

Page 14: COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.

COMP25212

Where Next?

• Despite these difficulties it is possible to build processors which approach 1 cycle per instruction (cpi)

• Given that the computational model is one of serial instruction execution can we do any better than this?