Chapter One Introduction to Pipelined Processors
Feb 24, 2016
Principle of Designing Pipeline Processors
(Design Problems of Pipeline Processors)
Instruction Prefetch and Branch Handling
• The instructions in computer programs can be classified into four types:
– Arithmetic/load operations (60%)
– Store-type instructions (15%)
– Branch-type instructions (5%)
– Conditional branch type (yes: 12%, no: 8%)
• Arithmetic/load operations (60%):
– These operations require one or two operand fetches.
– The execution of different operations requires a different number of pipeline cycles.
• Store-type instructions (15%):
– These require a memory access to store the data.
• Branch-type instructions (5%):
– These correspond to an unconditional jump.
• Conditional branch type (yes: 12%, no: 8%):
– The yes path requires calculation of the new branch address.
– The no path proceeds to the next sequential instruction.
• Arithmetic/load and store instructions do not alter the execution order of the program.
• Branch instructions and interrupts disrupt the sequential instruction flow and can seriously degrade the performance of pipelined computers.
Interrupt Handling Example – Interrupt System of the Cray-1
• The Cray-1 interrupt system is built around an exchange package.
• When an interrupt occurs, the Cray-1 saves eight scalar registers, eight address registers, the program counter, and the monitor flags.
• These are packed into 16 words and swapped with a block whose address is specified by a hardware exchange address register.
• In general, the higher the percentage of branch type instructions in a program, the slower a program will run on a pipeline processor.
Effect of Branching on Pipeline Performance
• Consider a linear pipeline of five stages: Fetch Instruction, Decode, Fetch Operands, Execute, Store Results.
[Space-time diagram: overlapped execution of instructions I1 through I8 in the five pipeline stages, without branching]
[Space-time diagram: I5 is a branch instruction; the instructions I6 through I8 fetched behind it are flushed, and fetching resumes from the branch target after I5 completes]
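The penalty visible in the two diagrams can be tallied with a small cycle-count sketch (a simplified model I am assuming here: one clock per stage, and a single taken branch flushes and refetches everything behind it; the function name is illustrative):

```python
def total_cycles(m, n, taken_branch_at=None):
    """Clock periods to run m instructions on an n-stage linear pipeline.

    Fully overlapped execution finishes in n + (m - 1) clocks.  If the
    instruction at position `taken_branch_at` (1-based) is a taken branch,
    the instructions fetched behind it are flushed and refetched, which
    costs an extra n - 1 clocks.
    """
    cycles = n + (m - 1)
    if taken_branch_at is not None and taken_branch_at < m:
        cycles += n - 1  # flush penalty for one taken branch
    return cycles

print(total_cycles(8, 5))                     # 12 clocks without branching
print(total_cycles(8, 5, taken_branch_at=5))  # 16 clocks when I5 is taken
```

The extra four clocks are exactly the n - 1 = 4 flush delay of a five-stage pipeline.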
Estimation of the effect of branching on an n-segment instruction pipeline
• Consider an instruction cycle of n pipeline clock periods.
• Let
– p = probability that an instruction is a conditional branch (20%)
– q = probability that a conditional branch is successful, i.e. taken (12% out of 20%, so q = 12/20 = 0.6)
• Suppose there are m instructions.
• Then the number of successfully taken branches = m·p·q (m × 0.2 × 0.6).
• A delay of (n - 1)/n of an instruction cycle is required for each successful branch to flush the pipeline.
• Thus, the total number of instruction cycles required to execute m instructions is

  1 + (m - 1)/n + mpq(n - 1)/n

  (the first instruction takes one full cycle, each of the remaining m - 1 instructions adds 1/n of a cycle, and each of the mpq successful branches adds a flush delay of (n - 1)/n of a cycle).
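As a quick numerical check of this expression, a one-line helper (the function name is mine, not from the text) evaluated with the percentages quoted earlier, p = 0.2 and q = 0.6, on a five-stage pipeline:

```python
def instruction_cycles(m, n, p, q):
    """Total instruction cycles for m instructions on an n-segment
    pipeline: 1 + (m - 1)/n + m*p*q*(n - 1)/n."""
    return 1 + (m - 1) / n + m * p * q * (n - 1) / n

# 100 instructions, 5 stages, with the instruction mix from the text:
print(instruction_cycles(100, 5, 0.2, 0.6))  # about 30.4 instruction cycles
```

Of the roughly 30.4 cycles, 9.6 are pure branch penalty (100 × 0.12 successful branches, each costing 4/5 of a cycle).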
• As m becomes large, the average number of instructions executed per instruction cycle is

  lim(m→∞) m / [1 + (m - 1)/n + mpq(n - 1)/n]
• Evaluating this limit gives

  lim(m→∞) m / [1 + (m - 1)/n + mpq(n - 1)/n] = n / [1 + pq(n - 1)]
• When p = 0 (no conditional branches), this measure reduces to n, the ideal throughput.
• In reality p > 0, so the throughput is always less than n.
Solution = ?
Multiple Prefetch Buffers
• Three types of buffers can be used to match the instruction fetch rate to the pipeline consumption rate:
1. Sequential buffers: hold in-sequence instructions (for in-sequence pipelining).
2. Target buffers: hold instructions fetched from a branch target (for out-of-sequence pipelining).
• A conditional branch causes both the sequential and the target buffers to fill; based on the branch outcome, one buffer is selected and the other is discarded.
3. Loop buffers: hold the sequential instructions contained within a small loop.
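The sequential/target scheme can be sketched as a toy model (all names here are illustrative, not from any real machine's documentation):

```python
def resolve_branch(sequential_path, target_path, branch_taken):
    """Toy model of sequential + target prefetch buffers.

    On a conditional branch, both paths are prefetched into separate
    buffers; once the condition is resolved, one buffer feeds the
    pipeline and the other is simply discarded.
    """
    sequential_buffer = list(sequential_path)  # fall-through instructions
    target_buffer = list(target_path)          # branch-target instructions
    chosen = target_buffer if branch_taken else sequential_buffer
    return chosen  # the other buffer is dropped at no extra time cost

print(resolve_branch(["I6", "I7"], ["T1", "T2"], branch_taken=True))
# ['T1', 'T2']
```

The point of the scheme is that the fetch work for both outcomes happens before the condition resolves, so neither outcome stalls the pipeline waiting for instruction fetch.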
Data Buffering and Busing Structures
Speeding Up Pipeline Segments
• The processing speeds of pipeline segments are usually unequal.
• Consider the example below: three segments S1, S2, S3 with delays T1, T2, T3 respectively.
• If T1 = T3 = T and T2 = 3T, then S2 becomes the bottleneck, and we need to remove it.
• How? One method is to subdivide the bottleneck. Two subdivisions are possible:
• First method: subdivide S2 into two subsegments with delays T and 2T, giving the stage sequence S1 (T), S2a (T), S2b (2T), S3 (T).
• Second method: subdivide S2 into three subsegments of delay T each, giving the stage sequence S1 (T), S2a (T), S2b (T), S2c (T), S3 (T).
• If the bottleneck is not subdivisible, we can instead duplicate S2 in parallel: three copies of S2 (each of delay 3T) are placed between S1 (T) and S3 (T), and successive instructions are distributed across the copies.
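A back-of-the-envelope comparison of the remedies, under the usual assumption (mine, for this sketch) that a linear pipeline's clock period is set by its slowest stage and that k parallel copies of a stage of delay d accept one input every d/k on average:

```python
T = 1.0  # base stage delay

original   = max(T, 3 * T, T)     # S1, S2 (bottleneck), S3
first_sub  = max(T, T, 2 * T, T)  # S2 split into subsegments of T and 2T
second_sub = max(T, T, T, T, T)   # S2 split into three subsegments of T
# Three parallel copies of the 3T stage give an effective delay of 3T/3 = T.
duplicated = max(T, 3 * T / 3, T)

for name, clock in [("original", original),
                    ("subdivide into T + 2T", first_sub),
                    ("subdivide into three T", second_sub),
                    ("duplicate S2 three times", duplicated)]:
    print(f"{name}: clock = {clock}T, throughput = {1 / clock:.2f} per T")
```

Only the three-way subdivision and the three-way duplication reach one result per T; the T + 2T split still leaves a 2T bottleneck.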
• Control and synchronization are more complex with parallel segments.
Data Buffering
• Instruction and data buffering provides a continuous flow of work to the pipeline units.
• Example: the TI ASC (in its four-pipeline configuration).
• The TI ASC uses a memory buffer unit (MBU), which:
– supplies the arithmetic unit with a continuous stream of operands, and
– stores results back into memory.
• The MBU has three double buffers X, Y, and Z (one octet per buffer): X and Y for input, Z for output.
• This supports pipeline processing at a high rate and alleviates the bandwidth mismatch between memory and the arithmetic pipelines.
Busing Structures
• Problem: Ideally, the subfunctions in a pipeline should be independent; otherwise, the pipeline must be halted until the dependency is removed.
• Solution: an efficient internal busing structure.
• Example: the TI ASC.
• In the TI ASC, once an instruction dependency is recognized, an update capability is provided by transferring the contents of the Z (output) buffer directly to the X or Y (input) buffer.