Chapter 4 The Processor
Jan 04, 2016
§4.1 Introduction
We will examine two MIPS implementations:
- A simplified version
- A more realistic pipelined version
Simple subset, shows most aspects:
- Memory reference: lw, sw
- Arithmetic/logical: add, sub, and, or, slt
- Control transfer: beq, j
Log in at Uoh.blackboard.com using:
- Username: your username
- Password: your email password
Go to the "Courses" menu.
Select "201401_COE308_001_3646: Computer Architecture".
Select "Content".
Slides
First Task
§4.5 An Overview of Pipelining
Pipelining Analogy
Pipelined laundry: overlapping execution
- Parallelism improves performance
Four loads: Speedup = 8/3.5 ≈ 2.3
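The 8/3.5 figure can be checked with a little arithmetic. This sketch assumes the usual form of the laundry analogy, which the extracted slide does not show: four stages (wash, dry, fold, put away) of 0.5 hours each, and four loads. The function names are hypothetical.

```c
#include <assert.h>

/* Laundry analogy (assumed): every stage takes the same time,
 * stage_hours, and each load must pass through all stages. */
double sequential_hours(int loads, int stages, double stage_hours) {
    assert(loads > 0 && stages > 0);
    /* One load finishes all its stages before the next load starts. */
    return loads * stages * stage_hours;
}

double pipelined_hours(int loads, int stages, double stage_hours) {
    assert(loads > 0 && stages > 0);
    /* The first load needs all stages; each later load finishes
     * one stage time after the previous one. */
    return (stages + loads - 1) * stage_hours;
}
```

With four loads and four 0.5-hour stages this gives 8 hours sequentially and 3.5 hours pipelined, a speedup of about 2.29.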
MIPS Pipeline
Five stages, one step per stage:
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
Pipeline Performance
Assume the time for stages is:
- 100 ps for register read or write
- 200 ps for other stages
Compare pipelined datapath with single-cycle datapath
Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
lw       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sw       | 200 ps      | 100 ps        | 200 ps | 200 ps        | -              | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps | -             | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps | -             | -              | 500 ps
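A minimal sketch of how the "Total time" column and the two clock periods follow from the per-stage latencies. It assumes, as the Tc values on the next slide imply, that the single-cycle clock must fit the slowest instruction and the pipelined clock must fit the slowest stage. The `InstrClass` type and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_STAGES = 5 };  /* IF, register read, ALU, memory, register write */

typedef struct {
    const char *name;
    int stage_ps[NUM_STAGES];  /* 0 = stage not used by this class */
} InstrClass;

/* Latencies copied from the table above. */
static const InstrClass CLASSES[] = {
    {"lw",       {200, 100, 200, 200, 100}},
    {"sw",       {200, 100, 200, 200,   0}},
    {"R-format", {200, 100, 200,   0, 100}},
    {"beq",      {200, 100, 200,   0,   0}},
};
#define NUM_CLASSES (sizeof CLASSES / sizeof CLASSES[0])

/* Total latency of one instruction class. */
int total_ps(const InstrClass *c) {
    assert(c != NULL);
    int sum = 0;
    for (int s = 0; s < NUM_STAGES; s++)
        sum += c->stage_ps[s];
    return sum;
}

/* Single-cycle design: the clock must fit the slowest instruction. */
int single_cycle_tc_ps(void) {
    int tc = 0;
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        int t = total_ps(&CLASSES[i]);
        if (t > tc) tc = t;
    }
    return tc;
}

/* Pipelined design: the clock must fit the slowest single stage. */
int pipelined_tc_ps(void) {
    int tc = 0;
    for (size_t i = 0; i < NUM_CLASSES; i++)
        for (int s = 0; s < NUM_STAGES; s++)
            if (CLASSES[i].stage_ps[s] > tc) tc = CLASSES[i].stage_ps[s];
    return tc;
}
```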
Pipeline Performance
Single-cycle (Tc = 800 ps)
Pipelined (Tc = 200 ps)
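To compare the two designs, the sketch below times a run of instructions (for example, three `lw`) on each. It assumes an ideal pipeline with no hazards, in which all five stages use the 200 ps clock. The names are hypothetical.

```c
#include <assert.h>

/* Single-cycle datapath: every instruction takes one full clock. */
long single_cycle_time_ps(long count, long tc_ps) {
    assert(count >= 0 && tc_ps > 0);
    return count * tc_ps;
}

/* Ideal pipeline with `stages` stages: fill the pipeline once,
 * then one instruction completes per clock. */
long pipelined_time_ps(long count, int stages, long tc_ps) {
    assert(count > 0 && stages > 0 && tc_ps > 0);
    return (stages + count - 1) * tc_ps;
}
```

Under these assumptions three `lw` instructions take 2400 ps single-cycle and 1400 ps pipelined. For very long runs the ratio approaches 800/200 = 4 rather than 5, because the stages are not balanced: the 100 ps register stages still occupy a full 200 ps clock.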
Basic Idea
Assembly line:
- Divide the execution of a task among a number of stages.
- A task is divided into subtasks to be executed in sequence.
- Performance improves compared to sequential execution.
Pipeline
[Figure: a task divided into sub-tasks 1, 2, ..., n; a stream of tasks enters an n-stage pipeline]
5 Tasks on a 4-Stage Pipeline
[Figure: Gantt chart of Tasks 1 to 5 over time units 1 to 8]
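The chart follows from one rule: on an ideal pipeline, task k enters stage s at time k + s - 1, so the last of m tasks leaves an n-stage pipeline at time n + m - 1 (here 4 + 5 - 1 = 8). The sketch assumes 1-based task, stage, and time numbering; `stage_at` and `print_gantt` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Stage (1..stages) that `task` occupies during time unit `time` on an
 * ideal pipeline, or 0 if it is not in the pipeline then. */
int stage_at(int task, int time, int stages) {
    assert(task >= 1 && time >= 1 && stages >= 1);
    int s = time - task + 1;
    return (s >= 1 && s <= stages) ? s : 0;
}

/* Print a text Gantt chart: one row per task, one column per time unit. */
void print_gantt(int tasks, int stages) {
    int total = stages + tasks - 1;  /* time unit in which the last task leaves */
    printf("Time   ");
    for (int t = 1; t <= total; t++) printf("%3d", t);
    printf("\n");
    for (int k = 1; k <= tasks; k++) {
        printf("Task %d ", k);
        for (int t = 1; t <= total; t++) {
            int s = stage_at(k, t, stages);
            if (s) printf("%3d", s); else printf("  .");
        }
        printf("\n");
    }
}
```

Calling `print_gantt(5, 4)` prints one row per task, showing the stage number in each time unit the task occupies.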
Speedup
[Figure: a stream of m tasks enters an n-stage pipeline; each stage takes time t]
T(Seq) = n * m * t
T(Pipe) = n * t + (m - 1) * t
Speedup = T(Seq) / T(Pipe) = (n * m) / (n + m - 1)
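The speedup formula translates directly into code; since every stage takes the same time t, t cancels out. A sketch with a hypothetical function name:

```c
#include <assert.h>

/* Speedup of an n-stage pipeline over sequential execution for m tasks:
 * (n * m * t) / ((n + m - 1) * t) = n * m / (n + m - 1). */
double pipeline_speedup(int n, int m) {
    assert(n >= 1 && m >= 1);
    return (double)n * m / (n + m - 1);
}
```

For example, 4 stages and 5 tasks give 20/8 = 2.5. As m grows, the speedup approaches n, the number of stages.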
Efficiency
Efficiency = Speedup / n = m / (n + m - 1)
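Efficiency is the fraction of stage-time slots doing useful work: of the n * (n + m - 1) slots, n * m are busy. The sketch computes it both directly and as Speedup / n to show that the two forms agree; the function names are hypothetical.

```c
#include <assert.h>

/* Efficiency computed directly: busy slots / total slots. */
double pipeline_efficiency(int n, int m) {
    assert(n >= 1 && m >= 1);
    return (double)m / (n + m - 1);
}

/* Same quantity computed from the speedup formula. */
double efficiency_via_speedup(int n, int m) {
    assert(n >= 1 && m >= 1);
    double speedup = (double)n * m / (n + m - 1);
    return speedup / n;
}
```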
Throughput
Throughput = number of tasks executed per unit of time = m / ((n + m - 1) * t)
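A sketch of the throughput formula, assuming t is given in the time unit you want the result expressed in (for example, t = 0.2 ns for a 200 ps stage). The function name is hypothetical.

```c
#include <assert.h>

/* m tasks complete in (n + m - 1) stage times of length t.
 * Result is in tasks per unit of t. */
double pipeline_throughput(int n, int m, double t) {
    assert(n >= 1 && m >= 1 && t > 0.0);
    return m / ((n + m - 1) * t);
}
```

With n = 5 and t = 0.2 ns, throughput approaches 1/t = 5 tasks per ns as m grows.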
Instruction Pipeline
Pipeline stall: some stages might need more time to perform their function. For example, I2 needs 3 time units to perform its function. This is called a “bubble” or “pipeline hazard”.
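One way to see how a single slow stage delays everything behind it is a finish-time recurrence: instruction i finishes stage s at max(its own finish of stage s-1, instruction i-1's finish of stage s) plus its time in stage s. The model tracks only finish times, not where a stalled instruction physically waits. The slide does not say which stage I2 spends 3 units in; the usage note assumes the third stage of a 4-stage pipeline. All names are hypothetical.

```c
#include <assert.h>

enum { MAX_INSTR = 16, MAX_STAGES = 8 };

/* dur[i][s]: time units instruction i spends in stage s (0-based).
 * finish[i][s] is filled with the time at which i completes stage s.
 * Returns the completion time of the last instruction. */
int simulate(int m, int n, int dur[][MAX_STAGES], int finish[][MAX_STAGES]) {
    assert(m >= 1 && m <= MAX_INSTR && n >= 1 && n <= MAX_STAGES);
    for (int i = 0; i < m; i++) {
        for (int s = 0; s < n; s++) {
            int ready = 0;
            /* must have left the previous stage itself */
            if (s > 0 && finish[i][s - 1] > ready) ready = finish[i][s - 1];
            /* the instruction ahead must have finished this stage */
            if (i > 0 && finish[i - 1][s] > ready) ready = finish[i - 1][s];
            finish[i][s] = ready + dur[i][s];
        }
    }
    return finish[m - 1][n - 1];
}
```

With 5 instructions on 4 stages and every duration 1, the last instruction finishes at time 8. Setting `dur[1][2] = 3` (I2 taking 3 units in the third stage) pushes it to time 10: two bubbles.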
Pipeline and Instruction Dependency
Instruction dependency: the operation performed by a stage depends on the operation(s) performed by other stage(s).
For example, with a conditional branch, instruction I4 cannot be executed until the branch condition in I3 is evaluated and stored. The branch takes 3 units of time.
Group Activity
Show a Gantt chart for 10 instructions that enter a four-stage pipeline (IF, ID, IE, and IS). Assume that fetching I5 depends on the result of evaluating I4.
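A sketch of the timing for this activity, under assumptions the slide does not state: every stage takes one time unit, I4's result is known at the end of its IE stage, I5 can be fetched in the next time unit, and nothing else stalls. `schedule` and its parameters are hypothetical.

```c
#include <assert.h>

enum { STAGES = 4, IE_STAGE = 2 };  /* IF = 0, ID = 1, IE = 2, IS = 3 */

/* Fills fetch[1..count] (array needs count + 1 entries) with the time unit
 * in which each instruction is fetched; stage k then runs at fetch + k.
 * If dep_on > 0, instruction dep_on + 1 is not fetched until instruction
 * dep_on has finished IE. Returns the time unit of the last IS. */
int schedule(int count, int dep_on, int fetch[]) {
    assert(count >= 1);
    for (int i = 1; i <= count; i++) {
        fetch[i] = (i == 1) ? 1 : fetch[i - 1] + 1;
        if (dep_on > 0 && i == dep_on + 1) {
            int ie_time = fetch[dep_on] + IE_STAGE;  /* unit in which IE runs */
            if (fetch[i] <= ie_time) fetch[i] = ie_time + 1;
        }
    }
    return fetch[count] + STAGES - 1;
}
```

Under these assumptions I5 is fetched at time 7 instead of 5, and the tenth instruction finishes at time 15 instead of 10 + 4 - 1 = 13.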
Answer
Pipeline and Data Dependency
Data dependency: a source operand of instruction Ii depends on the result of executing a preceding instruction Ij, where i > j.
For example, the operand of Ii cannot be fetched until the result of Ij is saved.
Group Activity
ADD R1, R2, R3    R3 ← R1 + R2    (Ii)
SL R3             R3 ← SL(R3)     (Ii+1)
SUB R5, R6, R4    R4 ← R5 - R6    (Ii+2)
Assume that we have five stages in the pipeline: IF (Instruction Fetch), ID (Instruction Decode), OF (Operand Fetch), IE (Instruction Execute), and IS (Instruction Store).
Show a Gantt chart for this code.
Answer
R3 must be written by both Ii and Ii+1; therefore, the problem is a write-after-write (WAW) data dependency. (Ii+1 also reads R3, which Ii writes, so there is a read-after-write dependency as well.)
When do stalls occur in the pipeline?
- Write after write (WAW)
- Read after write (RAW)
- Write after read (WAR)
Read after read does not cause a stall.
Group Activity
Consider the execution of the following sequence of instructions on a five-stage pipeline consisting of IF, ID, OF, IE, and IS. Show the succession of these instructions in the pipeline. Show all types of data dependency, and compute the speedup and efficiency.
Answer
No Operation (NOP) Method
- Inserting NOPs prevents fetching the wrong instruction or operand.
- A NOP is equivalent to doing nothing.
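A rule of thumb for how many NOPs to insert, as a sketch: number the stages from 0 (IF), so stage k of an instruction issued at time T runs at time T + k. If the producer makes a value available in stage p and the next instruction needs it in stage c, the consumer already trails by one time unit, so it needs max(0, p - c) extra slots. This assumes the use must come strictly after the producing stage; `nops_needed` is a hypothetical name.

```c
#include <assert.h>

/* ready_stage: stage in which the producer makes the value available.
 * use_stage:   stage in which the following instruction needs it.
 * Returns the number of NOPs to place between the two instructions. */
int nops_needed(int ready_stage, int use_stage) {
    assert(ready_stage >= 0 && use_stage >= 0);
    int extra = ready_stage - use_stage;  /* lag needed beyond the 1 unit it already has */
    return extra > 0 ? extra : 0;
}
```

For a branch resolved in IE (stage 2) of the IF, ID, IE, IS pipeline, the next fetch (stage 0) needs 2 NOPs; for a result written in IS (stage 4) and read in OF (stage 2) of the five-stage pipeline, it also needs 2.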
Group Activity
Consider the execution of ten instructions I1–I10 on a pipeline consisting of four stages: IF, ID, IE, and IS. Assume that instruction I4 is a conditional branch instruction and that, when it is executed, the branch is not taken; that is, the branch condition is not satisfied. Draw a Gantt chart showing the NOPs.
Answer (NOPs prevent fetching the wrong instruction)
Group Activity
Consider the execution of the following piece of code on a five-stage pipeline (IF, ID, OF, IE, IS). Draw a Gantt chart with NOPs.
Answer (NOPs prevent fetching the wrong operands)
Reducing the Stalls Due to Instruction Dependency
Unconditional Branch Instructions
- Reordering of instructions
- Use of dedicated hardware in the fetch unit to speed up instruction fetching
- Precomputing the branch and reordering the instructions
- Instruction prefetch: instructions can be fetched ahead of time and stored in an instruction queue
Conditional Branch Instructions
The target address of a conditional branch is not known until the execution of the branch has been completed.
Delayed branch: fill the pipeline with some instructions (ones that must execute regardless of the branch outcome) until the branch instruction is executed.
Prediction of the next instruction to be fetched
- It is based on the assumption that the branch outcome is random.
- Assume that the branch is not taken.
- If the prediction is correct, we save time; otherwise, the instructions fetched after the branch are discarded and fetching restarts at the branch target.
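With a predict-not-taken policy, time is lost only when a branch is actually taken. The sketch counts that cost for a list of branch outcomes; the penalty per taken branch is a parameter (for example, 2 time units if the branch resolves in IE of a four-stage pipeline). The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Time units lost under "predict not taken": each taken branch flushes
 * the instructions fetched after it, costing `penalty` units. */
int predict_not_taken_penalty(const bool taken[], int count, int penalty) {
    assert(count >= 0 && penalty >= 0);
    int lost = 0;
    for (int i = 0; i < count; i++)
        if (taken[i]) lost += penalty;
    return lost;
}
```

A loop-exit branch that falls through 10 times and is taken once costs a single penalty under this policy.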
Example (figures: before delaying and after delaying)
Reducing Pipeline Stalls due to Data Dependency
Hardware Operand Forwarding
Allows the result of an ALU operation to be available to another ALU operation.
SUB cannot start until R3 is stored. If we can forward R3 to SUB at the same time as the store operation, we save stall time.
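A sketch of the forwarding example for the five-stage IF, ID, OF, IE, IS pipeline, with the producer fetched at time 1 and the dependent instruction at time 2. The timing rules in the comments are assumptions made for illustration, and `consumer_finish` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Stages: IF = 0, ID = 1, OF = 2, IE = 3, IS = 4; stage k of an
 * instruction fetched at time F runs at time F + k.
 * Assumptions:
 *  - without forwarding, the consumer's OF must come after the
 *    producer's IS (the register is written first, then read);
 *  - with forwarding, the ALU result goes straight into the consumer's
 *    IE, which may run in the same time unit as the producer's IS.
 * Returns the time unit in which the consumer finishes IS. */
int consumer_finish(bool forwarding) {
    const int producer_ie = 1 + 3;  /* time 4 */
    const int producer_is = 1 + 4;  /* time 5 */
    int of = 2 + 2;                 /* consumer OF with no stall: time 4 */
    if (!forwarding && of <= producer_is)
        of = producer_is + 1;       /* wait for the store */
    int ie = of + 1;
    if (forwarding && ie <= producer_ie)
        ie = producer_ie + 1;       /* the ALU result must exist first */
    assert(ie > producer_ie);       /* never execute before the producer */
    return ie + 1;                  /* IS follows IE */
}
```

Under these assumptions the dependent instruction finishes at time 8 without forwarding and at time 6 with it, so forwarding removes two stall time units here.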
Group Activity
Group Activity
int i, X = 3;
for (i = 0; i < 10; i++) {
    X = X + 5;
}
Assume that we have five stages in the pipeline: IF (Instruction Fetch), ID (Instruction Decode), OF (Operand Fetch), IE (Instruction Execute), and IS (Instruction Store).
Show a Gantt chart for this code.
MIPS Code
        li   $t0, 10          # t0 is the constant 10
        li   $t1, 0           # t1 is our counter (i)
        li   $t2, 3           # t2 is our X
loop:
        beq  $t1, $t0, end    # if t1 == 10, we are done
        addi $t2, $t2, 5      # add 5 to X
        addi $t1, $t1, 1      # add 1 to t1
        j    loop             # jump back to the top
end:
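Before drawing the Gantt chart it helps to know how many instructions the MIPS loop actually executes. The sketch walks the same control flow as the assembly, assuming each `li` assembles to a single instruction (as it does for small constants) and ignoring stalls. `run_loop` and `ideal_pipeline_time` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    int x;          /* final value of X ($t2) */
    long executed;  /* dynamic instruction count */
} LoopResult;

/* Mirrors the assembly: 3 x li, then per iteration beq (not taken),
 * addi, addi, j, and finally one beq that is taken. */
LoopResult run_loop(int limit, int x0, int step) {
    LoopResult r = {x0, 3};  /* the three li instructions */
    int i = 0;
    for (;;) {
        r.executed++;            /* beq $t1, $t0, end */
        if (i == limit) break;   /* branch taken: leave the loop */
        r.x += step;             /* addi $t2, $t2, 5 */
        i++;                     /* addi $t1, $t1, 1 */
        r.executed += 3;         /* the two addi plus j loop */
    }
    return r;
}

/* Ideal time units on an n-stage pipeline with no stalls. */
long ideal_pipeline_time(long executed, int stages) {
    assert(executed > 0 && stages > 0);
    return stages + executed - 1;
}
```

It reports X = 53 and 44 executed instructions (3 `li`, 10 iterations of 4 instructions, and the final taken `beq`), so an ideal five-stage pipeline would need 44 + 5 - 1 = 48 time units before any branch or data stalls are added.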