Chapter One Introduction to Pipelined Processors


Principle of Designing Pipeline Processors

(Design Problems of Pipeline Processors)

Instruction Prefetch and Branch Handling

• The instructions in computer programs can be classified into 4 types:
  – Arithmetic/Load Operations (60%)
  – Store Type Instructions (15%)
  – Branch Type Instructions (5%)
  – Conditional Branch Type (Yes – 12% and No – 8%)


• Arithmetic/Load Operations (60%):
  – These operations require one or two operand fetches.
  – The execution of different operations requires a different number of pipeline cycles.


• Store Type Instructions (15%):
  – A store requires a memory access to store the data.

• Branch Type Instructions (5%):
  – This type corresponds to an unconditional jump.

• Conditional Branch Type (Yes – 12% and No – 8%):
  – The Yes path requires the calculation of the new address.
  – The No path proceeds to the next sequential instruction.


• Arithmetic-load and store instructions do not alter the execution order of the program.

• Branch instructions and Interrupts cause some damaging effects on the performance of pipeline computers.

Interrupts

• If an interrupt occurs while instruction I is being executed, instruction I+1 is postponed until the interrupt service routine (ISR) has been serviced.

• There are two types of interrupt:
  – Precise: caused by illegal operation codes; can be detected at the decoding stage.
  – Imprecise: caused by faults in storage, address and execution functions.

Handling Interrupts

• Precise: Since decoding is the first stage, instruction I prohibits I+1 from entering the pipeline, and all preceding instructions are executed before the ISR.

• Imprecise: No new instructions are allowed to enter, and all incomplete instructions, whether they precede or follow I, are executed before the ISR.
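
The two handling rules can be sketched as a small function. This is only an illustrative model, not a description of any particular machine: the function name, the list of instruction labels, and the choice to leave the fate of instruction I itself unmodelled are all assumptions made for the example.

```python
def completed_before_isr(in_flight, interrupting, kind):
    """Return the in-flight instructions that finish before the ISR runs.

    in_flight:    labels of instructions currently in the pipeline,
                  in program order (hypothetical representation).
    interrupting: label of instruction I, which raised the interrupt.
    kind:         "precise" or "imprecise".
    """
    position = in_flight.index(interrupting)
    if kind == "precise":
        # I is caught at decode and blocks I+1, so only the
        # instructions that precede I drain from the pipeline.
        return in_flight[:position]
    if kind == "imprecise":
        # No new instructions enter; every incomplete instruction,
        # preceding or following I, drains. I itself is not modelled.
        return in_flight[:position] + in_flight[position + 1:]
    raise ValueError(f"unknown interrupt kind: {kind!r}")
```

For example, with I1 to I4 in flight and an imprecise interrupt raised by I2, the function returns I1, I3 and I4.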

Handling Example – Interrupt System of the Cray-1

• The interrupt system is built around an exchange package.

• When an interrupt occurs, the Cray-1 saves 8 scalar registers, 8 address registers, the program counter and the monitor flags.

• These are packed into 16 words and swapped with a block whose address is specified by a hardware exchange address register.

• Since the exchange package does not hold all state information, the software interrupt handler has to store the remaining state.


• In general, the higher the percentage of branch type instructions in a program, the slower a program will run on a pipeline processor.

Effect of Branching on Pipeline Performance

• Consider a linear pipeline of 5 stages

Fetch Instruction → Decode → Fetch Operands → Execute → Store Results

[Figure: Overlapped execution of instructions I1 to I8 without branching]

[Figure: Execution of instructions I1 to I8 when I5 is a branch instruction]
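
The cost of the branch in the second case can be estimated with a short calculation. This is a minimal sketch under stated assumptions: every stage takes one clock period, a successful branch is resolved only in the last stage, and each one costs n-1 extra clock periods to refill the pipeline (the same delay used in the estimate later in this section). The function names are hypothetical, and the numbers are computed from this model rather than read from the original figures.

```python
def cycles_without_branch(m, n=5):
    """Clock periods for m instructions on an n-stage pipeline:
    n to fill the pipeline, then one result per clock."""
    return n + m - 1


def cycles_with_taken_branches(m, taken, n=5):
    """Same count, plus n-1 clock periods lost per successful branch
    (assumed: the branch is resolved in the last stage)."""
    return cycles_without_branch(m, n) + taken * (n - 1)


print(cycles_without_branch(8))          # 12
print(cycles_with_taken_branches(8, 1))  # 16
```

Under these assumptions, a single successful branch among eight instructions raises the total from 12 to 16 clock periods.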

Estimation of the effect of branching on an n-segment instruction pipeline


• Consider an instruction cycle with n pipeline clock periods.

• Let
  – p = probability that an instruction is a conditional branch (20%)
  – q = probability that a conditional branch is successful (12/20 = 0.6)

• Suppose there are m instructions.

• Then the number of instructions that are successful branches = m × p × q (here m × 0.2 × 0.6).

• Each successful branch requires a delay of (n-1)/n instruction cycles to flush the pipeline.


• Thus, the total number of instruction cycles required for m instructions is

$$\frac{1}{n}(n + m - 1) + \frac{m\,p\,q\,(n-1)}{n}$$


• As m becomes large, the average number of instructions per instruction cycle is given as

$$\lim_{m \to \infty} \frac{m}{\dfrac{n + m - 1}{n} + \dfrac{m\,p\,q\,(n-1)}{n}} = \frac{n}{1 + p\,q\,(n-1)}$$


• When p = 0 (no conditional branches), the above measure reduces to n, which is ideal.

• In reality, some branches are successful, so it is always less than n.
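
The estimate can be checked numerically. The sketch below assumes the total of (n + m - 1)/n + m·p·q·(n-1)/n instruction cycles for m instructions and the limiting value n / (1 + p·q·(n-1)); the function names are hypothetical, and the values n = 5, p = 0.2, q = 0.6 are the ones used in this section.

```python
def total_instruction_cycles(m, n, p, q):
    """Instruction cycles needed for m instructions on an n-stage pipeline,
    where a fraction p*q of the instructions are successful branches."""
    return (n + m - 1) / n + m * p * q * (n - 1) / n


def limiting_throughput(n, p, q):
    """Average instructions per instruction cycle as m grows large."""
    return n / (1 + p * q * (n - 1))


n, p, q = 5, 0.2, 0.6
print(limiting_throughput(n, p, q))    # about 3.38, against an ideal of 5
print(limiting_throughput(n, 0.0, q))  # 5.0 when there are no conditional branches

# For a large but finite m, the ratio approaches the limit.
m = 1_000_000
print(m / total_instruction_cycles(m, n, p, q))
```

With the instruction mix given earlier, branching alone reduces a five-stage pipeline from 5 to roughly 3.4 instructions per instruction cycle.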

Solution = ?

Multiple Prefetch Buffers

• Buffers can be used to match the instruction fetch rate to the pipeline consumption rate:
  1. Sequential Buffers: for in-sequence pipelining
  2. Target Buffers: instructions from a branch target (for out-of-sequence pipelining)

• A conditional branch causes both the sequential and the target buffers to fill; once the condition is known, one is selected and the other is discarded.
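
The selection step can be sketched as follows. This is a simplified software model, not a hardware description: the class name, the use of a deque for each buffer, and the instruction labels are assumptions made for illustration.

```python
from collections import deque


class PrefetchBuffers:
    """Holds instructions prefetched along both paths of a conditional branch."""

    def __init__(self, sequential, target):
        self.sequential = deque(sequential)  # in-sequence instructions
        self.target = deque(target)          # instructions from the branch target

    def resolve(self, branch_taken):
        """Keep the buffer on the resolved path and discard the other."""
        if branch_taken:
            self.sequential.clear()
            return self.target
        self.target.clear()
        return self.sequential


buffers = PrefetchBuffers(sequential=["I6", "I7"], target=["T1", "T2"])
selected = buffers.resolve(branch_taken=True)
print(list(selected))  # ['T1', 'T2']
```

Because both paths are already buffered when the condition resolves, the pipeline can continue from the selected buffer instead of waiting for a fresh fetch.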

Data Buffering and Busing Structures

Speeding up of pipeline segments

• The processing speeds of pipeline segments are usually unequal.

• Consider the example given below:

S1 (T1) → S2 (T2) → S3 (T3)

• If T1 = T3 = T and T2 = 3T, S2 becomes the bottleneck and we need to remove it.

• How?

• One method is to subdivide the bottleneck. Two divisions are possible:


• First Method: S2 is split into two parts, taking T and 2T.

S1 (T) → S2 part 1 (T) → S2 part 2 (2T) → S3 (T)


• Second Method: S2 is split into three parts, each taking T.

S1 (T) → S2 part 1 (T) → S2 part 2 (T) → S2 part 3 (T) → S3 (T)


• If the bottleneck is not sub-divisible, we can duplicate S2 in parallel:

S1 (T) → three copies of S2 in parallel (3T each) → S3 (T)


• Control and synchronization are more complex with parallel segments.
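
The effect of each remedy on throughput can be compared with a short calculation. The sketch assumes that a linear pipeline delivers one result per clock period, that the clock period is set by the slowest segment, and that k parallel copies of a segment accept a new item every t/k on average. The function name is hypothetical, and times are expressed in units of T.

```python
from fractions import Fraction


def throughput(stage_times, copies=None):
    """Results per unit time T for a linear pipeline.

    stage_times: time taken by each segment, in units of T.
    copies:      number of parallel copies of each segment (default: 1 each).
    """
    if copies is None:
        copies = [1] * len(stage_times)
    # Effective time between items for each segment.
    effective = [Fraction(t, k) for t, k in zip(stage_times, copies)]
    # The slowest segment sets the pipeline rate.
    return 1 / max(effective)


print(throughput([1, 3, 1]))                    # 1/3  original pipeline
print(throughput([1, 1, 2, 1]))                 # 1/2  first method
print(throughput([1, 1, 1, 1, 1]))              # 1    second method
print(throughput([1, 3, 1], copies=[1, 3, 1]))  # 1    three copies of S2
```

Under these assumptions, the second subdivision and the threefold duplication both restore one result every T, while the first subdivision still leaves a 2T bottleneck.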

Data Buffering

• Instruction and data buffering provides a continuous flow to the pipeline units.

• Example: 4X TI ASC

Example: 4X TI ASC

• This system uses a memory buffer unit (MBU), which:
  – supplies the arithmetic unit with a continuous stream of operands
  – stores results in memory

• The MBU has three double buffers X, Y and Z (one octet per buffer):
  – X and Y for input, Z for output

• This provides pipeline processing at a high rate and alleviates the bandwidth mismatch between memory and the arithmetic pipeline.
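
The idea behind a double buffer can be sketched in a few lines. This is a generic illustration and not the actual MBU design: the class and method names are hypothetical, and the size of eight words per half is an assumption about what "octet" means here.

```python
class DoubleBuffer:
    """Two halves: one is loaded from memory while the other is read."""

    def __init__(self, size=8):
        self.size = size      # words per half (assumed: one octet = 8 words)
        self.filling = []     # half currently being loaded from memory
        self.draining = []    # half currently read by the arithmetic unit

    def load(self, words):
        """Memory side: add words to the half being filled."""
        if len(self.filling) + len(words) > self.size:
            raise ValueError("half buffer overflow")
        self.filling.extend(words)

    def swap(self):
        """Exchange roles once the arithmetic unit has emptied its half."""
        if self.draining:
            raise RuntimeError("draining half is not empty yet")
        self.filling, self.draining = [], self.filling

    def read(self):
        """Arithmetic side: take the next operand."""
        return self.draining.pop(0)


x = DoubleBuffer()
x.load(list(range(8)))      # memory fills the first half
x.swap()
x.load(list(range(8, 16)))  # memory fills the next half ...
operands = [x.read() for _ in range(8)]  # ... while the pipeline drains the first
```

Since loading and reading work on different halves, the arithmetic unit does not have to wait for memory as long as each refill finishes before the other half is emptied.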

Busing Structures

• Problem: Ideally, the subfunctions in a pipeline should be independent; otherwise the pipeline must be halted until the dependency is removed.

• Solution: An efficient internal busing structure.

• Example: TI ASC

• In the TI ASC, once an instruction dependency is recognized, an update capability is provided by transferring the contents of the Z buffer to the X or Y buffer.
