CSC 4250 Computer Architectures September 29, 2006 Appendix A. Pipelining

Feb 04, 2016

Amos

Page 1: CSC 4250 Computer Architectures

CSC 4250 Computer Architectures

September 29, 2006

Appendix A. Pipelining

Page 2: CSC 4250 Computer Architectures

Static Pipeline Scheduling

Simple pipeline fetches an instruction, decodes it, and checks for hazards (structural and data)

If there is no hazard, then issue the instruction

If there is a hazard, then stall the pipeline ─ no new instructions will be fetched or issued

The compiler may schedule instructions to avoid the hazard ─ static scheduling
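A toy model can make the benefit of static scheduling concrete. The sketch below assumes an in-order pipeline that issues at most one instruction per cycle and stalls until every source register is ready. The `stall_cycles` helper and the latencies are illustrative assumptions, and the program reuses the DIV.D / ADD.D / MUL.D sequence from the dynamic scheduling slide.

```python
def stall_cycles(program, latency):
    """Count stall cycles for in-order issue (toy model, one issue per cycle).

    program: list of (op, dest, sources); latency: op -> cycles until the
    result is usable. Structural, WAW and WAR hazards are ignored.
    """
    ready = {}          # register -> first cycle in which its value can be used
    cycle = 0           # cycle in which the previous instruction issued
    stalls = 0
    for op, dest, sources in program:
        earliest = cycle + 1
        issue = max([earliest] + [ready.get(r, 0) for r in sources])
        stalls += issue - earliest
        ready[dest] = issue + latency[op]
        cycle = issue
    return stalls


LATENCY = {"DIV.D": 4, "ADD.D": 2, "MUL.D": 3}   # assumed, for illustration only

original = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F10", ["F0", "F8"]),
    ("MUL.D", "F6", ["F6", "F14"]),
]
# A compiler may hoist the independent MUL.D above the dependent ADD.D.
scheduled = [original[0], original[2], original[1]]

print(stall_cycles(original, LATENCY), stall_cycles(scheduled, LATENCY))
```

Under these assumed latencies the reordered sequence stalls for fewer cycles, because MUL.D fills one of the cycles in which ADD.D would otherwise wait for F0.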

Page 3: CSC 4250 Computer Architectures

Dynamic Pipeline Scheduling

Hardware rearranges instruction execution to reduce stalls

Scoreboarding technique of the CDC 6600

Tomasulo’s algorithm (Chapter 3)

With in-order instruction issue, if an instruction is stalled in the pipeline, then no later instructions can proceed

What if later instructions are independent?

Example:

DIV.D F0,F2,F4

ADD.D F10,F0,F8

MUL.D F6,F6,F14

We want to issue and execute the MUL instruction while the ADD instruction waits for the result of DIV

Page 4: CSC 4250 Computer Architectures

Scoreboarding

In a dynamically scheduled pipeline, all instructions pass through the issue stage in order (in-order issue); however, they can be stalled or they can bypass each other in the second stage (read operands) and enter execution out of order

Scoreboarding is a technique for allowing instructions to execute out of order when there are sufficient resources and no data dependences; it is named after the CDC 6600 scoreboard, which developed this capability

Page 5: CSC 4250 Computer Architectures

First Supercomputer

CDC = Control Data Corporation

In 1964 CDC delivered the first CDC 6600

The machine was unique in many ways

It introduced scoreboarding

It was the first processor to make extensive use of multiple functional units. It had 16 separate FUs, including 4 FP units, 5 units for memory references and 7 units for integer operations

It had peripheral processors that used multithreading

The interaction between pipelining and instruction set design was understood, and a simple, load-store instruction set was used to promote pipelining

Page 6: CSC 4250 Computer Architectures

Structural and Data Hazards

Before, no instruction was issued if there was either a structural or a data hazard

Data hazards include WAW, RAW and WAR

Now, issue the instruction if there is no structural hazard and no WAW data hazard

Example:

DIV.D F0,F2,F4

ADD.D F10,F0,F8

MUL.D F6,F6,F14

So, all three instructions will be issued

Read operands when there are no RAW hazards
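The three kinds of data hazard can be told apart by comparing the destination and source registers of two instructions in program order. The following is a minimal sketch. The `hazards` helper and the tuple layout `(op, dest, sources)` are assumptions made for illustration.

```python
def hazards(first, second):
    """Return the data hazards the later instruction has on the earlier one.

    Each instruction is (op, dest, sources), given in program order.
    """
    _, d1, s1 = first
    _, d2, s2 = second
    found = set()
    if d1 in s2:
        found.add("RAW")   # second reads what first writes
    if d2 in s1:
        found.add("WAR")   # second overwrites what first still has to read
    if d1 == d2:
        found.add("WAW")   # both write the same register
    return found


div = ("DIV.D", "F0", ("F2", "F4"))
add = ("ADD.D", "F10", ("F0", "F8"))
mul = ("MUL.D", "F6", ("F6", "F14"))

print(hazards(div, add))   # ADD.D reads F0, which DIV.D writes
print(hazards(div, mul))   # MUL.D shares no registers with DIV.D
print(hazards(add, mul))   # MUL.D shares no registers with ADD.D
```

Only the DIV.D / ADD.D pair reports a hazard (RAW on F0), which is why all three instructions can issue and only ADD.D has to wait before reading its operands.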

Page 7: CSC 4250 Computer Architectures

Record Keeping

Every instruction goes through the scoreboard, where a record of the data dependences is constructed; this step corresponds to instruction issue and replaces part of the ID step in the MIPS pipeline

The scoreboard determines when the instruction can read its operands and begin operation (RAW hazards)

If the scoreboard decides that the instruction cannot execute immediately, it monitors every change in the hardware and decides when the instruction can execute

The scoreboard controls when an instruction can write its result into the destination register (WAR hazards)

Page 8: CSC 4250 Computer Architectures

Split ID Stage into Two Stages

1. Issue ─ Decode instructions; check for structural and WAW hazards

2. Read operands ─ Wait until no RAW hazards; then read operands

No Issue:

DIV.D F0,F2,F4

ADD.D F10,F0,F8

SUB.D F6,F6,F14 (why no issue?)

No Issue:

DIV.D F0,F2,F4

ADD.D F10,F0,F8

MUL.D F0,F6,F14 (why no issue?)

Page 9: CSC 4250 Computer Architectures

MIPS Processor with Scoreboard

Page 10: CSC 4250 Computer Architectures

Four Steps in Execution

1. Issue ─ if no structural or WAW hazards

2. Read operands ─ if no RAW hazards

3. Execute ─ if both operands are received

4. Write result ─ if no WAR hazards

We concentrate on FP operations and do not consider a step for memory access

Page 11: CSC 4250 Computer Architectures

Step One. Issue

If a functional unit (FU) for the instruction is free and no other active instruction has the same destination register, the scoreboard issues the instruction to the FU and updates its internal data structure

By ensuring that no other active FU wants to write its result into the destination register, we guarantee that WAW hazards cannot be present

If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared
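The issue test can be sketched with two small tables: which FUs are busy, and which FU (if any) is going to write each register. The helper names `can_issue` and `issue` are hypothetical, and the sketch assumes a single Add unit shared by ADD.D and SUB.D, as in the functional unit list on the Scoreboard Tables slides.

```python
def can_issue(fu, dest, busy, result):
    """Issue test: the FU must be free (no structural hazard) and no active
    instruction may already be going to write `dest` (no WAW hazard)."""
    return not busy.get(fu, False) and result.get(dest) is None


def issue(fu, dest, busy, result):
    """Record the issue in the simplified scoreboard tables."""
    busy[fu] = True
    result[dest] = fu


busy, result = {}, {}
issue("Divide", "F0", busy, result)    # DIV.D F0,F2,F4
issue("Add", "F10", busy, result)      # ADD.D F10,F0,F8

print(can_issue("Add", "F6", busy, result))     # SUB.D F6,F6,F14: Add unit is busy
print(can_issue("Mult1", "F0", busy, result))   # MUL.D F0,F6,F14: Divide will write F0
print(can_issue("Mult1", "F6", busy, result))   # MUL.D F6,F6,F14: free FU, no pending write
```

The first two calls return False (a structural hazard and a WAW hazard, respectively); the third returns True.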

Page 12: CSC 4250 Computer Architectures

Step Two. Read Operands

The scoreboard monitors the availability of the source operands. A source operand is available if no earlier issued active instruction is going to write it.

When the source operands are available, the scoreboard tells the FU to proceed to read the operands from the registers and begin execution.

The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.

The operands for an instruction are read only when both operands are available in the register file. The scoreboard does not take advantage of forwarding.

Issue and Read Operands together replace the ID stage of the simple MIPS pipeline.
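A sketch of the read-operands bookkeeping follows. At issue, each source operand records which FU (if any) will produce it; the operand becomes ready when that FU writes its result. The function names are hypothetical, the WAR check on write is left out, and the reset of Rj/Rk after the operands are read is not modelled.

```python
def issue(fu, dest, src_j, src_k, result, status):
    """Capture at issue which FU (if any) will produce each source operand."""
    qj, qk = result.get(src_j), result.get(src_k)
    status[fu] = {"Fi": dest, "Fj": src_j, "Fk": src_k,
                  "Qj": qj, "Qk": qk, "Rj": qj is None, "Rk": qk is None}
    result[dest] = fu


def write_result(fu, result, status):
    """The FU writes its register; waiting consumers now see the operand as ready."""
    for entry in status.values():
        if entry["Qj"] == fu:
            entry["Qj"], entry["Rj"] = None, True
        if entry["Qk"] == fu:
            entry["Qk"], entry["Rk"] = None, True
    result.pop(status[fu]["Fi"], None)
    del status[fu]


def can_read_operands(fu, status):
    """Read-operands test: both source operands must be ready (Rj and Rk)."""
    return status[fu]["Rj"] and status[fu]["Rk"]


result, status = {}, {}
issue("Divide", "F0", "F2", "F4", result, status)   # DIV.D F0,F2,F4
issue("Add", "F10", "F0", "F8", result, status)     # ADD.D F10,F0,F8
issue("Mult1", "F6", "F6", "F14", result, status)   # MUL.D F6,F6,F14

print(can_read_operands("Add", status))     # must wait for Divide to write F0
print(can_read_operands("Mult1", status))   # independent, may read right away
write_result("Divide", result, status)
print(can_read_operands("Add", status))     # F0 is now in the register file
```

MUL.D can read its operands while ADD.D is still waiting, which is how instructions enter execution out of order.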

Page 13: CSC 4250 Computer Architectures

Step Three. Execution

The FU begins execution upon receiving operands

When the result is ready, the FU notifies the scoreboard that it has completed execution

This step replaces the EX stage in the MIPS pipeline and takes multiple cycles in the MIPS FP pipeline

Page 14: CSC 4250 Computer Architectures

Step Four. Write Result

Once it is aware that the FU has completed execution, the scoreboard checks for WAR hazards and stalls the completing instruction, if necessary

In general, a completing instruction cannot be allowed to write its results when

There is an instruction that has not read its operands that precedes (i.e., in order of issue) the completing instruction, and

One of the operands is the same register as the result of the completing instruction

If WAR hazard does not exist, or when it clears, the scoreboard tells the FU to store its result to the destination register

This step replaces the WB step in the simple MIPS pipeline

Page 15: CSC 4250 Computer Architectures

Example (p. A-72)

L.D F6,34(R2)

L.D F2,45(R3)

MUL.D F0,F2,F4

SUB.D F8,F6,F2

DIV.D F10,F0,F6

ADD.D F6,F8,F2

Page 16: CSC 4250 Computer Architectures

Scoreboard

Three parts:

1. Instruction status ─ indicates which of the four steps the instruction is in

2. Functional unit status ─ busy, op, Fi, Fj, Fk, Qj, Qk, Rj, Rk

3. Register result status ─ indicates which functional unit will write each register, if an instruction is active
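The three parts map naturally onto a small data structure. The sketch below is one possible layout, not the CDC 6600 hardware design: the class names are assumptions, and the field comments follow the usual reading of the names (Fi destination, Fj/Fk sources, Qj/Qk producing FUs, Rj/Rk ready flags).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STEPS = ("issue", "read_operands", "exec_complete", "write_result")


@dataclass
class FunctionalUnit:
    busy: bool = False
    op: Optional[str] = None
    Fi: Optional[str] = None    # destination register
    Fj: Optional[str] = None    # source registers
    Fk: Optional[str] = None
    Qj: Optional[str] = None    # FUs that will produce Fj and Fk, if any
    Qk: Optional[str] = None
    Rj: bool = False            # Fj / Fk ready and not yet read
    Rk: bool = False


@dataclass
class Scoreboard:
    # 1. Instruction status: steps completed so far, per instruction
    instruction_status: Dict[str, List[str]] = field(default_factory=dict)
    # 2. Functional unit status: one record per FU
    fu_status: Dict[str, FunctionalUnit] = field(default_factory=dict)
    # 3. Register result status: register -> FU that will write it
    register_result: Dict[str, str] = field(default_factory=dict)


sb = Scoreboard(fu_status={name: FunctionalUnit()
                           for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")})

# State right after issuing L.D F6,34(R2) to the Integer unit.
sb.instruction_status["L.D F6,34(R2)"] = [STEPS[0]]
sb.fu_status["Integer"] = FunctionalUnit(busy=True, op="Load", Fi="F6", Fk="R2", Rk=True)
sb.register_result["F6"] = "Integer"

print(sb.fu_status["Integer"])
```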

Page 17: CSC 4250 Computer Architectures

Example

Code:

L.D F6,34(R2)

L.D F2,45(R3)

MUL.D F0,F2,F4

SUB.D F8,F6,F2

DIV.D F10,F0,F6

ADD.D F6,F8,F2

Page 18: CSC 4250 Computer Architectures

Scoreboard Tables 1 (Fill in blanks)

Instruction status:

Instruction        Issue  Read operands  Exec. complete  Write result
L.D F6,34(R2)      √      √
L.D F2,45(R3)
MUL.D F0,F2,F4
SUB.D F8,F6,F2
DIV.D F10,F0,F6
ADD.D F6,F8,F2

Functional unit status:

Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
Integer
Mult1
Mult2
Add
Divide

Register result status:

     F0  F2  F4  F6  F8  F10  F12  …  F30
FU

Page 19: CSC 4250 Computer Architectures

Scoreboard Tables 2 (Fill in blanks)

Instruction status:

Instruction        Issue  Read operands  Exec. complete  Write result
L.D F6,34(R2)      √      √              √               √
L.D F2,45(R3)      √      √              √
MUL.D F0,F2,F4     √
SUB.D F8,F6,F2     √
DIV.D F10,F0,F6    √
ADD.D F6,F8,F2

Functional unit status:

Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
Integer
Mult1
Mult2
Add
Divide

Register result status:

     F0  F2  F4  F6  F8  F10  F12  …  F30
FU

Page 20: CSC 4250 Computer Architectures

Scoreboard Tables 3 (Fill in blanks)

Instruction status:

Instruction        Issue  Read operands  Exec. complete  Write result
L.D F6,34(R2)      √      √              √               √
L.D F2,45(R3)      √      √              √               √
MUL.D F0,F2,F4     √      √              √
SUB.D F8,F6,F2     √      √              √               √
DIV.D F10,F0,F6    √
ADD.D F6,F8,F2     √      √              √

Functional unit status:

Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
Integer
Mult1
Mult2
Add
Divide

Register result status:

     F0  F2  F4  F6  F8  F10  F12  …  F30
FU

Page 21: CSC 4250 Computer Architectures

Scoreboard Tables 4 (Fill in blanks)

Instruction status:

Instruction        Issue  Read operands  Exec. complete  Write result
L.D F6,34(R2)      √      √              √               √
L.D F2,45(R3)      √      √              √               √
MUL.D F0,F2,F4     √      √              √               √
SUB.D F8,F6,F2     √      √              √               √
DIV.D F10,F0,F6    √      √              √
ADD.D F6,F8,F2     √      √              √               √

Functional unit status:

Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
Integer
Mult1
Mult2
Add
Divide

Register result status:

     F0  F2  F4  F6  F8  F10  F12  …  F30
FU

Page 22: CSC 4250 Computer Architectures

Required Checks

Instruction status    Wait until
Issue                 Not Busy[FU] and not Result[D]
Read operands         Rj and Rk
Execution complete    Functional unit done
Write result          For every f: ( ( Fj[f] ≠ Fi[FU] or Rj[f] = No ) & ( Fk[f] ≠ Fi[FU] or Rk[f] = No ) )

Page 23: CSC 4250 Computer Architectures

WAR Hazard

A WAR hazard exists if another instruction has this instruction’s destination (Fi[FU]) as a source (Fj[f] or Fk[f]), and that source operand is ready but has not yet been read (Rj = Yes or Rk = Yes)

The test on write-result prevents the write if a WAR hazard exists
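The write-result test from the Required Checks slide translates almost directly into code. The sketch below assumes the functional unit status is kept as a dictionary of records (a hypothetical layout) and uses the point in the running example where ADD.D F6,F8,F2 has finished executing while DIV.D F10,F0,F6 is still waiting for F0.

```python
def can_write_result(fu, status):
    """Write-result test: for every FU f,
    (Fj[f] != Fi[fu] or Rj[f] == No) and (Fk[f] != Fi[fu] or Rk[f] == No)."""
    dest = status[fu]["Fi"]
    return all(
        (f["Fj"] != dest or not f["Rj"]) and (f["Fk"] != dest or not f["Rk"])
        for f in status.values()
    )


# DIV.D F10,F0,F6: F0 not ready yet (Rj = No), F6 ready but not yet read (Rk = Yes).
# ADD.D F6,F8,F2: operands already read (Rj = Rk = No), execution complete.
status = {
    "Divide": {"Fi": "F10", "Fj": "F0", "Fk": "F6", "Rj": False, "Rk": True},
    "Add":    {"Fi": "F6",  "Fj": "F8", "Fk": "F2", "Rj": False, "Rk": False},
}
print(can_write_result("Add", status))   # DIV.D still has to read the old F6

# Once DIV.D reads its operands, its Rj and Rk are reset to No.
status["Divide"]["Rj"] = status["Divide"]["Rk"] = False
print(can_write_result("Add", status))
```

The first call returns False, so ADD.D is held in the write-result step; after DIV.D has read F6 the same call returns True.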

Page 24: CSC 4250 Computer Architectures

Costs and Benefits of Scoreboarding

Reported performance improvement by a factor of 1.7 for FORTRAN programs and 2.5 for hand-coded assembly language.

Scoreboard had about as much logic as a FU ─ surprisingly low.

Main cost was large number of buses ─ about four times as many as would be required if CPU only executed instructions in order.

Page 25: CSC 4250 Computer Architectures

Factors Limiting Scoreboarding

1. Amount of Parallelism Available among the Instructions ─ This determines whether independent instructions can be found to execute. If each instruction depends on its predecessor, no dynamic scheduling scheme can reduce stalls.

2. Number of Scoreboard Entries ─ This determines how far ahead the pipeline can look for independent instructions. The set of instructions examined as candidates for potential execution is called the window. The size of the scoreboard determines the size of the window.

3. Number and Types of FUs ─ This determines the importance of structural hazards.

4. Presence of Antidependences and Output Dependences ─ These lead to WAR and WAW stalls.

Page 26: CSC 4250 Computer Architectures

A.9. Fallacies and Pitfalls

Unexpected execution sequences may cause unexpected hazards. It might seem that WAW hazards should never occur in a code sequence, because no compiler would ever generate two writes to the same register without an intervening read. But they can occur when the sequence is unexpected. For example, the first write might be in the delay slot of a taken branch. Here is an example:

BNEZ R1,foo

DIV.D F0,F2,F4 ; moved into delay slot from fall through

…..

…..

foo: L.D F0,qrs

If the branch is taken, then before DIV.D can complete, the L.D will reach WB, causing a WAW hazard.
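The pitfall can be checked mechanically by comparing write-back times along the executed path. This is a minimal sketch: the `waw_violations` helper, the one-issue-per-cycle model without hazard checking, and the latencies are all assumptions chosen for illustration.

```python
def waw_violations(trace, latency):
    """Find pairs of writes to one register that complete out of program order.

    trace: executed instructions as (op, dest) in order, one issued per cycle
    with no hazard checking; dest is None for instructions that write nothing.
    """
    writes = [(i, op, dest, i + latency[op])
              for i, (op, dest) in enumerate(trace) if dest is not None]
    return [(a_op, b_op, a_dest)
            for n, (_, a_op, a_dest, a_wb) in enumerate(writes)
            for (_, b_op, b_dest, b_wb) in writes[n + 1:]
            if a_dest == b_dest and b_wb <= a_wb]


LATENCY = {"BNEZ": 1, "DIV.D": 40, "L.D": 2}   # assumed cycles until write-back

# Dynamic path when the branch is taken: the delay-slot DIV.D, then the L.D at foo.
taken = [("BNEZ", None), ("DIV.D", "F0"), ("L.D", "F0")]
print(waw_violations(taken, LATENCY))
```

With these assumed latencies the L.D writes F0 long before the DIV.D does, so the pair is reported; without a WAW check, F0 would end up holding the stale DIV.D result.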

Page 27: CSC 4250 Computer Architectures

How Extensive Pipelining Affects Performance

Extensive pipelining can impact other aspects of a design, leading to overall worse cost-performance

The best example of this phenomenon comes from two implementations of the VAX, the 8600 and the 8700

When the 8600 was initially delivered, it had a cycle time of 80 ns. Subsequently, a redesigned version called the 8650 with a 55 ns clock was introduced.

The 8700 had a much simpler pipeline that operated at the microinstruction level, yielding a smaller CPU with a faster clock cycle of 45 ns.

The overall outcome is that the 8650 had a CPI advantage of about 20%, but the 8700 had a clock rate that was about 20% faster. Thus, the 8700 achieved the same performance with much less hardware.
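The comparison follows from CPU time = instruction count x CPI x clock cycle time. The sketch below reads "CPI advantage of about 20%" as the 8700 needing 1.2 times the CPI of the 8650, which is an interpretation; the instruction count and the absolute CPI are placeholders, since only the ratios matter.

```python
def cpu_time(instruction_count, cpi, cycle_ns):
    """CPU time in ns = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_ns


IC = 1_000_000             # arbitrary; both machines run the same VAX code
CPI_8650 = 10.0            # placeholder value, only the ratio matters
CPI_8700 = 1.2 * CPI_8650  # assumed reading of the 20% CPI advantage

t_8650 = cpu_time(IC, CPI_8650, 55)   # 55 ns cycle
t_8700 = cpu_time(IC, CPI_8700, 45)   # 45 ns cycle

print(f"clock rate ratio 8700/8650: {55 / 45:.2f}")
print(f"execution time ratio 8700/8650: {t_8700 / t_8650:.2f}")
```

The clock rate ratio comes out near 1.22 and the execution time ratio near 0.98, so under this reading the two effects roughly cancel.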