
Computer Architecture Pipelines & Superscalars Sunset over the Pacific Ocean Taken from Iolanthe II about 100nm north of Cape Reanga.

Jan 04, 2016

Page 1: Computer Architecture Pipelines & Superscalars Sunset over the Pacific Ocean Taken from Iolanthe II about 100nm north of Cape Reanga.

Computer Architecture

Pipelines & Superscalars

Sunset over the Pacific Ocean
Taken from Iolanthe II about 100nm north of Cape Reanga

Page 2:

Pipelines

• Data Hazards
  • Code:

lw  $4, 0($1)
add $15, $1, $1
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

The last four instructions all depend on the result ($2) produced by the sub instruction!

MIPS arithmetic instructions have the format

op dest, srca, srcb

Page 3:

Pipelines - Data hazards

• Examine the pipeline (ignore the first 2 instructions!)

• r2 ($2) is only updated in time for the add (add $14, $2, $2)!

Page 4:

Pipelines - Data Hazards

• Compiler solution
  • Insert NOOPs
  • Inefficient!
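
The NOOP-insertion idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the lecture's own algorithm: the tuple encoding of instructions, the name insert_nops and the constant MIN_DISTANCE are all hypothetical. It assumes a pipeline without forwarding in which a result can be read by the third instruction after its producer (consistent with r2 only being updated in time for the add), and it ignores dependencies through memory.

```python
# Each instruction is (opcode, destination register or None, source registers).
# ASSUMPTION: without forwarding, a consumer must be at least MIN_DISTANCE
# instruction slots after the instruction that produces its operand.
MIN_DISTANCE = 3
NOP = ("nop", None, ())

PROGRAM = [
    ("lw",  "$4",  ("$1",)),
    ("add", "$15", ("$1", "$1")),
    ("sub", "$2",  ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or",  "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw",  None,  ("$15", "$2")),
]

def insert_nops(program):
    """Return a copy of program padded with NOPs so every RAW gap is safe."""
    out = []
    for instr in program:
        sources = instr[2]
        needed = 0
        # Only the last MIN_DISTANCE - 1 emitted instructions can be too close.
        for back in range(1, MIN_DISTANCE):
            if back > len(out):
                break
            dest = out[-back][1]
            if dest is not None and dest in sources:
                needed = max(needed, MIN_DISTANCE - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out

padded = insert_nops(PROGRAM)
for op, dest, srcs in padded:
    print(op, dest, srcs)
```

On the example program this sketch places two NOOPs between sub and and, so nine issue slots are spent on seven useful instructions, which is the inefficiency the slide points to.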

Page 5:

Pipelines - Data Hazards

• Second compiler solution
  • Reorder

Original order:

lw  $4, 0($1)
add $15, $1, $1
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

Reordered (sub moved to the front):

sub $2, $1, $3
lw  $4, 0($1)
add $15, $1, $1
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

The two instructions that sub moves past (lw and add) must not define $1 or $3!

For sub: $1 and $3 are read, $2 is written

Page 6:

Pipelines - Data Hazards

• Second compiler solution
  • Reorder

sub $2, $1, $3
lw  $4, 0($1)
add $15, $1, $1
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

For sub: $1 and $3 are read, $2 is written

First use of $2: and $12, $2, $5

Page 7:

Pipelines - Data Hazards

• Compiler analyses dependencies
  • Register definitions
  • Register use
  • Read After Write (RAW) dependency
• No dependencies
  • Instruction can be moved!

sub $2, $1, $3
lw  $4, 0($1)
add $15, $1, $1
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

Written: $2 (by sub)

Uses of $2: and, or, add, sw
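
The register definition/use analysis can be illustrated with a small Python sketch. The instruction encoding and the function name raw_dependencies are hypothetical, chosen only for this example, and the sketch tracks register dependencies only (dependencies through memory between lw and sw are ignored).

```python
# Each instruction is (opcode, destination register or None, source registers).
REORDERED = [
    ("sub", "$2",  ("$1", "$3")),
    ("lw",  "$4",  ("$1",)),
    ("add", "$15", ("$1", "$1")),
    ("and", "$12", ("$2", "$5")),
    ("or",  "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw",  None,  ("$15", "$2")),
]

def raw_dependencies(program):
    """Return (producer index, consumer index, register) for each RAW dependency."""
    last_writer = {}  # register -> index of the most recent instruction defining it
    deps = []
    for i, (_, dest, sources) in enumerate(program):
        for reg in dict.fromkeys(sources):  # drop repeated sources, keep order
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

deps = raw_dependencies(REORDERED)
for producer, consumer, reg in deps:
    print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

In the reordered code the earliest use of $2 is at index 3, three instructions after the sub at index 0, while lw and add depend on nothing that sub defines.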

Page 8:

Pipelines - Data Hazards

• Hardware solution
  • Value forwarding
• Hardware detects dependency
  • scoreboard
• Forwards result from WB to EX for subsequent use
• Hardware
  • Transparent to software!

Page 9:

Data Hazards - classification

• Read after Write (RAW)
  • Instruction 1 must write before instruction 2 reads
• Write after Write (WAW)
  • Instructions 1 and 2 both write
  • Instruction 2 must write after 1
• Write after Read (WAR)
  • Instruction 1 reads
  • Instruction 2 writes (overwrites)
  • Instruction 2 must not write before 1 reads

Reordering algorithms must consider all three!
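
The three hazard classes can be expressed as a short check on a pair of instructions. This is a sketch only: the tuple encoding and the names classify and can_swap are hypothetical, and only register operands are considered (memory dependencies are ignored).

```python
# Each instruction is (opcode, destination register or None, source registers).

def classify(first, second):
    """Return the hazard types that force `second` to stay after `first`."""
    _, dest1, srcs1 = first
    _, dest2, srcs2 = second
    hazards = set()
    if dest1 is not None and dest1 in srcs2:
        hazards.add("RAW")  # second reads what first writes
    if dest1 is not None and dest1 == dest2:
        hazards.add("WAW")  # both write the same register
    if dest2 is not None and dest2 in srcs1:
        hazards.add("WAR")  # second overwrites what first reads
    return hazards

def can_swap(first, second):
    """Two adjacent instructions may be exchanged only if no hazard links them."""
    return not classify(first, second)

sub = ("sub", "$2", ("$1", "$3"))
and_ = ("and", "$12", ("$2", "$5"))
lw = ("lw", "$4", ("$1",))

print(classify(sub, and_))  # the pair from the lecture's example code
print(can_swap(lw, sub))    # lw and sub share no written register
```

A reordering pass would apply a check of this kind to every instruction that a moved instruction passes, which is why all three classes have to be considered.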

Page 10:

Lecture 5 - Key Points

• Data Hazards
  • RAW - most common
  • WAW
  • WAR
• Compiler looks for dependencies
  • then re-orders
• Hardware
  • Scoreboard
    • Monitors dependencies
    • ensures correct operation
  • Value forwarding hardware
    • Forwards results from EX stage

Page 11:

Pipelines - Exceptions

• Caused by overflow, underflow
• Example

add $1, $2, $1

  • Overflow detected in EX stage
  • Causes jump to exception handler
    • as branch - remainder of pipeline flushed

but

• Compiler needs the original $1 that caused the overflow
  • Register must not be overwritten
  • EX stage needs to squash the WB operation
• Precise Exception problem - more later!

Page 12:

Superpipelines

Page 13:

Superpipelines

• Time to complete each instruction = t
  • Total: Fetch + decode + fetch operands + operation + write-back
• Clock frequency: f = 1/t
• An n-stage pipeline allows n instructions ‘in flight’ simultaneously
• Each pipeline stage does 1/n of the work
  • Each stage requires time t/n
• Assumes a perfectly balanced pipeline!
  • Balanced = each stage requires the same time

Clock frequency: f_pipe = 1/(t/n) = n/t

Increasing n increases processor power?

Page 14:

Pipelines - Depth

• Pipeline can’t be too deep
  • Hazards are frequent
  • many stalls in deep pipelines

[Figure: Relative Performance (0.5 to 2.5) plotted against Pipeline Depth (1, 2, 4, 8, 16), with a region marked “Too Deep!”]

Page 15:

Pipelines - Depth

[Figure: Relative Performance (0.5 to 2.5) plotted against Pipeline Depth (1, 2, 4, 8, 16), with regions marked “Too Deep!” and “Superpipelined”]

Page 16:

Pipeline depth

• Increasing number of stages
  • Each stage adds overheads
• Problems balancing pipeline
  • Require t_pd1 ≈ t_pd2 ≈ t_pd3
• Stage time is t_pdj + t_pdreg
• n stages means n t_pdreg overhead

[Figure: three pipeline stages, each an Operation (work) block with delay t_pd1, t_pd2, t_pd3, separated by Registers that each add t_pdreg]
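
A small numeric sketch shows how the register overhead limits the gain from extra stages. The function name pipelined_frequency and the delay values (10 ns of logic, 0.5 ns per register) are assumptions chosen purely for illustration; the model is the perfectly balanced pipeline described above, with stage time t/n + t_pdreg.

```python
def pipelined_frequency(t_total, n_stages, t_reg=0.0):
    """Clock frequency (Hz) of a perfectly balanced n-stage pipeline.

    t_total: time for one instruction to pass through all the logic (seconds)
    t_reg:   propagation delay added by each pipeline register (seconds)
    """
    stage_time = t_total / n_stages + t_reg
    return 1.0 / stage_time

T = 10e-9       # ASSUMED: 10 ns of logic per instruction
T_REG = 0.5e-9  # ASSUMED: 0.5 ns per pipeline register

for n in (1, 2, 4, 8, 16):
    ideal = pipelined_frequency(T, n)        # f_pipe = n/t
    real = pipelined_frequency(T, n, T_REG)  # with register overhead
    print(f"n={n:2d}  ideal {ideal / 1e6:7.1f} MHz  with registers {real / 1e6:7.1f} MHz")
```

With these assumed numbers the ideal frequency doubles every time n doubles, but the frequency with register overhead grows more slowly because the fixed t_pdreg becomes a larger share of each stage.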

Page 17:

CISC and pipelines

• High Speed CISC processors are pipelined
  • Overlap IF, EX
• Variable
  • instruction length
  • running time (number of microcode cycles)
  leading to
  • pipeline imbalance
  • “backup” in pipe stages
  • more complicated hazard detection
• Complex addressing modes
  • auto-increment updates address register
  • multiple memory accesses required

Smooth pipeline flow is more difficult!

Page 18:

Instruction Queues

• Vital performance determinant
  • Rate of instruction fetch
• High Performance processors
  • Fetch multiple instructions in each cycle
    • 2 - 4 common
  • Use wide datapath to memory
    • PowerPC 604: 128 bits = 4 instructions
• Despatch unit
  • Examine dependencies
  • Determine which instructions can be despatched

Page 19:

Instruction Queues

• Queue “matches” fetch/despatch rates
• General Strategy for matching Producers - Consumers
  • Use of FIFO-style Queues
  • Absorb Asynchronous Delivery / Consumption Rates
• Provides Elasticity in pipelines

[Figure: Producer → FIFO → Consumer, with differing instantaneous rates]
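
The elasticity a FIFO provides can be seen in a toy simulation. Everything here is an assumption for illustration: the function name simulate, the per-cycle fetch and despatch patterns, and the simplification that fetched instructions which do not fit in a full queue are simply lost (a real fetch unit would stall and retry).

```python
from collections import deque

def simulate(fetch_per_cycle, despatch_per_cycle, capacity):
    """Return the number of instructions despatched in each cycle."""
    queue = deque()
    despatched = []
    next_id = 0
    for fetched, wanted in zip(fetch_per_cycle, despatch_per_cycle):
        # Producer: the fetch unit delivers instructions while there is room.
        for _ in range(fetched):
            if len(queue) < capacity:
                queue.append(next_id)
                next_id += 1
        # Consumer: the despatch unit removes up to `wanted` in FIFO order.
        count = min(wanted, len(queue))
        for _ in range(count):
            queue.popleft()
        despatched.append(count)
    return despatched

bursty_fetch = [4, 0, 4, 0, 4, 0]    # ASSUMED: 4 instructions every other cycle
steady_demand = [2, 2, 2, 2, 2, 2]   # ASSUMED: despatch wants 2 per cycle

print(simulate(bursty_fetch, steady_demand, capacity=8))
print(simulate(bursty_fetch, steady_demand, capacity=2))
```

With room for eight entries the queue absorbs each burst and the consumer is fed every cycle; with room for only two, the consumer starves on the cycles where nothing is fetched.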

Page 20:

Superscalar Processors

Page 21:

PowerPC organisation

PowerPC 601 ~1993

[Figure: block diagram of the PowerPC 601, with the boundary of the Si die marked]

New - Look in the “Example Processors” section of the Web notes

3-way SuperScalar
• Integer
• Branch
• Floating Point

A newer machine will have more functional units here!

Page 22:

Superscalar Processors

• Multiple Functional Units
  • PowerPC 604
    • 6-way superscalar
• Despatch Unit
  • Sends “ready” instructions to all free units
  • PowerPC 604:
    • potential 4 instructions/cycle (pipeline lengths are different!)
    • reality: 2-3 instructions/cycle? (program dependent!)

Functional units: Branch Unit, Load/Store Unit, 3 Integer Units, Floating Point Unit

Page 23:

Superscalar Processors

• Mix of functional units
  • Up to 8-way superscalar common now
    • 2 Floating point units
      • Usually have ~3 cycle latency
    • 3 Integer Arithmetic
    • Branch unit
    • Load / store unit
    • + ….?
• Marketing departments can play some games with the ‘n’ of an n-way superscalar!

Page 24:

Pentium Quad Core - 2008

• Distinguish between
  • Multiple ‘cores’ (separate processors) – later –
  and
  • Superscalars – multiple functional units per processor
    ☺ “Wide dynamic execution” in Intel-speak
• Quad core
  • 4 cores
  • Complete up to 4 instructions / cycle each
  • IIU can issue four instructions / cycle
  • 3 MB L2 cache / processor (total 12 MB)
  • Master clock 3.2 GHz, front side bus 1.6 GHz
  • 771 pins

Page 25:

Superscalar Limitations

• To achieve maximum performance
  • Instruction mix must match Functional Unit mix
    • eg if we have 2 Integer ALUs, 2 FPUs, 1 branch unit, 1 load/store unit
    • Instruction issue unit (IIU) can issue 4 instructions
    • Each four instructions should be able to use 4 of the functional units
  • If instruction stream doesn’t have right mix
    • Some functional units will remain idle
• FPUs require multiple cycles
  • Additional stalls
• Pipeline hazards stall pipeline
  • 4-way superscalar gets 1.8-3 instructions completed per cycle
  • Program dependent!
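
The effect of the instruction mix can be shown with a toy issue model. This is a sketch under strong assumptions, not a model of any real processor: the name cycles_to_issue and the unit labels are hypothetical, issue is strictly in order and stops at the first instruction whose functional unit is already taken in that cycle, every unit is free again on the next cycle, and data hazards and multi-cycle FPU latency are ignored.

```python
# Functional unit mix from the example above: 2 integer ALUs, 2 FPUs,
# 1 branch unit, 1 load/store unit; the IIU issues up to 4 per cycle.
UNITS = {"int": 2, "fp": 2, "branch": 1, "mem": 1}
ISSUE_WIDTH = 4

def cycles_to_issue(stream, units=UNITS, width=ISSUE_WIDTH):
    """Count the cycles needed to issue every instruction in `stream`, in order."""
    cycles = 0
    i = 0
    while i < len(stream):
        free = dict(units)  # ASSUMPTION: every unit is free again each cycle
        issued = 0
        # Issue until the width is used up or the next instruction's unit is busy.
        while i < len(stream) and issued < width and free[stream[i]] > 0:
            free[stream[i]] -= 1
            issued += 1
            i += 1
        cycles += 1
    return cycles

balanced = ["int", "fp", "mem", "int"] * 4  # mix matches the functional units
integer_only = ["int"] * 16                 # only the 2 integer ALUs are used

for name, stream in (("balanced", balanced), ("integer only", integer_only)):
    cycles = cycles_to_issue(stream)
    print(f"{name}: {len(stream)} instructions in {cycles} cycles")
```

Even in this idealised model the integer-only stream leaves the FPUs, branch unit and load/store unit idle and issues at half the rate of the balanced stream; hazards and FPU latency would reduce both figures further.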