CMPE 421 Parallel Computer Architecture Part 1 Pipeline: HAZARD
Jan 05, 2016
Pipelining MIPS
Let us examine why the pipeline cannot run at full speed.
There are some cases where the next instruction cannot begin executing immediately.
These limits to pipelining are known as hazards.
What makes it hard?
• structural hazards: different instructions, at different stages in the pipeline, want to use the same hardware resource (resource conflict)
• control hazards: the succeeding instruction to put into the pipeline depends on the outcome of a previous branch instruction already in the pipeline. The control decision determines the execution path, such as when the instruction changes the PC
• data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline
Before actually building the pipelined datapath and control we first briefly examine these potential hazards individually…
Structural Hazards
Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline in the same clock cycle
E.g., suppose the pipeline below has a single (not separate) instruction and data memory with one read port; then there is a structural hazard between the first and fourth lw instructions
MIPS was designed to be pipelined: structural hazards are easy to avoid!
[Figure: pipelined execution over time (2 to 14 ns), program execution order top to bottom; each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg, and a new instruction starts every 2 ns]
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)
lw $4, 400($0)
Hazard if single memory
Structural Hazards
Ex 1: Suppose we have one memory unit instead of separate instruction and data memories
[Figure: four overlapped instructions, each passing through Inst Fetch, Reg Read, ALU, Data Access, Reg Write]
When a load word or store word instruction reaches the MEM stage, it accesses memory in the same cycle in which a later instruction is being fetched; because there is only a single memory, a conflict occurs
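The conflict described above can be located with a small model. This is only a sketch under stated assumptions: a five-stage pipeline, one instruction entering per cycle, no stalls, and one memory shared by IF and MEM. The function name `memory_conflicts` and the stage list are illustrative, not part of the course material.

```python
# Sketch: find cycles in which one shared memory is needed twice.
# Assumes a 5-stage pipeline, one instruction issued per cycle, no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(instructions):
    """Return (cycle, fetching_index, accessing_index) for each clash."""
    users = {}  # cycle -> list of (stage, instruction index)
    for i, op in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i enters IF in cycle i + 1
            uses_memory = stage == "IF" or (stage == "MEM" and op in ("lw", "sw"))
            if uses_memory:
                users.setdefault(cycle, []).append((stage, i))
    conflicts = []
    for cycle in sorted(users):
        fetchers = [i for stage, i in users[cycle] if stage == "IF"]
        accessors = [i for stage, i in users[cycle] if stage == "MEM"]
        for f in fetchers:
            for a in accessors:
                conflicts.append((cycle, f, a))
    return conflicts

# Four back-to-back loads, as in the lw example.
conflicts = memory_conflicts(["lw", "lw", "lw", "lw"])
print(conflicts)
```

With four loads the only clash reported is in cycle 4, between the fetch of the fourth instruction (index 3) and the data access of the first (index 0).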
Structural Hazard
• Consider a load followed immediately by an R-type instruction
• The register file has only a single write port
Clock: Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
R-type: IF RF/ID EX WB
R-type: IF RF/ID EX WB
Load: IF RF/ID EX MEM WB
R-type: IF RF/ID EX WB
R-type: IF RF/ID EX WB
A bubble is needed where the Load and the R-type instruction after it would both write the register file in the same cycle
Structural Hazard: Solutions
• Delay instruction until functional unit is ready
• Hardware inserts a pipeline stall or a bubble that delays execution of all instructions that follow (previous instructions continue)
• Increases CPI from the ideal value of 1
• Build more sophisticated functional units so that all combinations of instructions can be accommodated
• Example: Allow two simultaneous writes to the register file
Structural Hazard Solution
Write Back Stall Solution: Delay R-type register write by one cycle
R-type: IF RF/ID EX MEM WB (the MEM stage is idle for R-type instructions)
Clock: Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
R-type: IF RF/ID EX MEM WB
R-type: IF RF/ID EX MEM WB
Load: IF RF/ID EX MEM WB
R-type: IF RF/ID EX MEM WB
R-type: IF RF/ID EX MEM WB
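The effect of delaying the R-type write can be checked by counting how many instructions reach WB in each cycle. The sketch below assumes one instruction issues per cycle and that WB is always an instruction's last stage; `wb_conflicts` and the two stage-count tables are hypothetical names used only for illustration.

```python
# Sketch: count register-file write-port clashes.
# Assumption: one instruction issues per cycle and WB is the last stage.
def wb_conflicts(program, stage_count):
    """Return {cycle: [instruction indices]} for cycles with more than one WB."""
    wb_cycles = {}
    for i, kind in enumerate(program):
        wb = i + stage_count[kind]  # issue cycle (i + 1) plus stages, minus 1
        wb_cycles.setdefault(wb, []).append(i)
    return {c: idx for c, idx in wb_cycles.items() if len(idx) > 1}

program = ["R-type", "R-type", "Load", "R-type", "R-type"]
short_r = {"R-type": 4, "Load": 5}   # R-type writes back right after EX
uniform = {"R-type": 5, "Load": 5}   # R-type write delayed by one cycle

print(wb_conflicts(program, short_r))
print(wb_conflicts(program, uniform))
```

With four-stage R-type instructions, the Load and the instruction after it both write in cycle 7; once every instruction takes five stages, no two instructions share a WB cycle.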
Control Hazards
Control hazard: need to make a decision based on the result of a previous instruction still executing in the pipeline
Solution 1: Stall the pipeline
[Figure: pipelined execution over time (2 to 16 ns), program execution order top to bottom; each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg; a bubble (pipeline stall) delays the instruction after the branch, giving one 4 ns gap among the otherwise 2 ns issue intervals]
beq $1, $2, 40
add $4, $5, $6
lw $3, 300($0)
Note that the branch outcome is computed in the ID stage with added hardware (later…)
Control Hazards
Solution 2: Predict the branch outcome, e.g., predict branch-not-taken
[Figure, prediction success: beq $1, $2, 40; add $4, $5, $6; lw $3, 300($0); time axis 2 to 14 ns, instructions issue 2 ns apart with no bubble]
[Figure, prediction failure: beq $1, $2, 40; add $4, $5, $6; or $7, $8, $9; time axis 2 to 14 ns, a row of bubbles and one 4 ns gap]
Prediction success
Prediction failure: undo (= flush) lw
Control Hazards
Solution 3: Delayed branch: always execute the sequentially next statement, with the branch executing after a one-instruction delay. It is the compiler's job to find a statement that is independent of the branch outcome and can be put in the slot
MIPS does this, but it is an option in SPIM (Simulator -> Settings)
[Figure: pipelined execution over time (2 to 14 ns), program execution order top to bottom; instructions issue 2 ns apart; add occupies the delayed branch slot]
beq $1, $2, 40
add $4, $5, $6 (delayed branch slot)
lw $3, 300($0)
Delayed branch: beq is followed by an add that is independent of the branch outcome
Review: Pipelining Multiple Instructions
The instructions in Figures 6-19, 6-20 and 6-21 were independent
None of them used the results calculated by any of the others (register numbers are different)
Data Hazards
Problem with starting the next instruction before the first is finished: dependencies that “go backward in time” are data hazards
[Figure: pipeline diagram (IM, Reg, DM, Reg) over clock cycles CC 1 to CC 9, program execution order top to bottom]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Value of register $2 in CC 1 to CC 9: 10, 10, 10, 10, 10/-20, -20, -20, -20, -20
Solution to Data Hazards
Data hazard: an instruction needs data from the result of a previous instruction still executing in the pipeline
• Occurs when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions
• Caused by several different types of dependencies
Solution 1: Forward data if possible…
Solution 2: Or change the relative timing of instructions (insert stalls)
[Figure: instruction pipeline diagram for add $s0, $t0, $t1 (IF ID EX MEM WB, time 2 to 10); shade indicates use: left = write, right = read]
[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, program execution order top to bottom, time 2 to 10]
Without forwarding (blue line), data has to go back in time; with forwarding (red line), data is available in time
Data Hazards SOLUTION 1
• Don’t wait for the instruction to complete before trying to resolve the data hazard
• As soon as the ALU creates the sum for “add”, we can supply it as an input for the sub
• Adding extra H/W to retrieve the missing item early from the internal resources is called forwarding or bypassing
Remark: A forwarding path from the output of the memory access stage of the first instruction to the input of the execution stage of the following instruction is invalid (backward in time)
Data Dependency Types
Three classifications of data dependencies for instruction j following instruction i
• Read after Write (RAW): Instr. j tries to read an operand before instr. i writes it
• Write after Write (WAW): Instr. j tries to write an operand before i writes its value. Since register writes only occur in WB, the pipeline we have been discussing does not have this type of dependency
• Write after Read (WAR): Instr. j tries to write a destination before it is read by i. This also does not occur in the pipeline we have been discussing, since all reads happen early in the ID/RF stage and all writes are late in the WB stage
WAW and WAR hazards occur in later, more complicated pipelines
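The three classes above depend only on which registers each instruction reads and writes, so they can be identified mechanically. The sketch below is illustrative: the `(dest, sources)` tuple encoding and the name `classify` are assumptions, and the special case of register $0 is ignored.

```python
def classify(i_instr, j_instr):
    """Classify the dependencies of later instruction j on earlier instruction i.

    Each instruction is (dest_register_or_None, [source_registers]).
    """
    i_dest, i_srcs = i_instr
    j_dest, j_srcs = j_instr
    kinds = []
    if i_dest is not None and i_dest in j_srcs:
        kinds.append("RAW")  # j reads what i writes
    if i_dest is not None and i_dest == j_dest:
        kinds.append("WAW")  # both write the same register
    if j_dest is not None and j_dest in i_srcs:
        kinds.append("WAR")  # j writes what i reads
    return kinds

sub = ("$2", ["$1", "$3"])         # sub $2, $1, $3
and_ = ("$12", ["$2", "$5"])       # and $12, $2, $5
print(classify(sub, and_))
```

For the sub/and pair the only dependency found is RAW, which is the one kind that produces hazards in the five-stage pipeline discussed here.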
Data Hazards
Forwarding may not be enough (a hybrid solution is required), e.g., if an R-type instruction following a load uses the result of the load; this is called a load-use data hazard
lw $s0, 20($t1)
sub $t2, $s0, $t3
[Figure: two pipeline diagrams (IF ID EX MEM WB, time 2 to 14) for this pair, program execution order top to bottom: one without a stall, one with a bubble inserted before sub]
Without a stall it is impossible to provide input to the sub instruction in time
With a one-stage stall (solution 2), forwarding can get the data to the sub instruction in time (solution 1)
Reordering Code to Avoid Pipeline Stall (Alternative Software Solution)
Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)    (data hazard: $t2 is used right after it is loaded)
sw $t0, 4($t1)
Reordered code (the two sw instructions are interchanged):
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
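The benefit of the reordering can be checked by counting load-use stalls in both versions. This is a rough sketch that assumes forwarding is present and that any instruction reading a register loaded by the instruction immediately before it costs exactly one stall; the tuple encoding and the name `load_use_stalls` are illustrative.

```python
def load_use_stalls(program):
    """Count one stall for each instruction that reads a register loaded
    by the instruction immediately before it.

    Each entry is (opcode, dest_or_None, [source_registers]).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls

original = [
    ("lw", "$t0", ["$t1"]),          # lw $t0, 0($t1)
    ("lw", "$t2", ["$t1"]),          # lw $t2, 4($t1)
    ("sw", None, ["$t2", "$t1"]),    # sw $t2, 0($t1)
    ("sw", None, ["$t0", "$t1"]),    # sw $t0, 4($t1)
]
# Interchange the two stores, as in the reordered code.
reordered = [original[0], original[1], original[3], original[2]]

print(load_use_stalls(original), load_use_stalls(reordered))
```

Under these assumptions the original order needs one stall and the reordered code needs none, while both versions store the same values to the same addresses.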
Revisiting Hazards
So far our datapath and control have ignored hazards
We shall revisit data hazards and control hazards and enhance our datapath and control to handle them in hardware…
Data Hazards and Forwarding
Problem with starting an instruction before previous ones are finished: data dependencies that go backward in time, called data hazards
[Figure: pipeline diagram (IM, Reg, DM, Reg) over clock cycles CC 1 to CC 9, program execution order top to bottom]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Value of register $2 in CC 1 to CC 9: 10, 10, 10, 10, 10/-20, -20, -20, -20, -20
$2 = 10 before sub; $2 = -20 after sub
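Which of the four followers actually see the old value of $2 can be worked out from stage timing alone. The sketch below assumes a five-stage pipeline with one instruction issued per cycle, register reads in stage 2, register writes in stage 5, no forwarding, and a register file whose write is visible to a read in the same cycle. The name `stale_reads` and the tuple encoding are illustrative.

```python
# Assumptions: reads happen in stage 2 (ID), writes in stage 5 (WB),
# and a write is visible to a read in the same cycle. No forwarding.
def stale_reads(program):
    """Return indices of instructions that read a register before the
    producing instruction has written it back."""
    stale = []
    for j, (_, _, srcs) in enumerate(program):
        read_cycle = j + 2
        for i in range(j):
            _, dest, _ = program[i]
            write_cycle = i + 5
            if dest in srcs and read_cycle < write_cycle:
                stale.append(j)
                break
    return stale

program = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]
print(stale_reads(program))
```

Under these assumptions only `and` and `or` (indices 1 and 2) read $2 too early; `add` reads in CC 5, the cycle in which sub writes, and `sw` reads afterwards.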
Software Solution
Have the compiler guarantee that there are never any data hazards
• by rearranging instructions to insert independent instructions between instructions that would otherwise have a data hazard between them,
• or, if such rearrangement is not possible, by inserting nops
Such compiler solutions may not always be possible, and nops slow the machine down
sub $2, $1, $3
lw $10, 40($3)
slt $5, $6, $7
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
or
sub $2, $1, $3
nop
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
MIPS: nop = “no operation” = 00…0 (32 bits) = sll $0, $0, 0
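The nop version above can be produced mechanically. The following is a minimal sketch of such a pass, assuming that two instructions must separate a register write from any later read of that register (the spacing used in the nop example); `insert_nops`, its `gap` parameter and the tuple encoding are hypothetical and do not describe any real compiler.

```python
def insert_nops(program, gap=2):
    """Insert nops so that at least `gap` instructions separate a register
    write from any later read of that register.

    Each entry is (opcode, dest_or_None, [source_registers]).
    """
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Look back over the last `gap` emitted slots for a producer.
        for distance, (_, prev_dest, _) in enumerate(reversed(out[-gap:]), start=1):
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, gap - distance + 1)
        out.extend([("nop", None, [])] * needed)
        out.append((op, dest, srcs))
    return out

program = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]
padded = [op for op, _, _ in insert_nops(program)]
print(padded)
```

For this sequence the pass places two nops between sub and and, and none elsewhere, matching the second listing.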
REVIEW: Solution to HAZARDS
How About Register File Access?
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) over time in clock cycles, instruction order top to bottom: add $1, ; Inst 1; Inst 2; add $2,$1, ; labels mark the clock edge that controls register writing and the clock edge that controls loading of pipeline state registers]
Fix the register file access hazard by doing reads in the second half of the cycle and writes in the first half
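The split-cycle convention can be modelled by a register file that applies a cycle's write before serving that cycle's reads. This is only a behavioural sketch; the class `RegisterFile` and its `cycle` method are illustrative names, not a description of the actual hardware.

```python
class RegisterFile:
    """Register file in which a cycle's write happens before its reads."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, write=None, reads=()):
        # First half of the cycle: perform the write, if any.
        if write is not None:
            number, value = write
            if number != 0:  # MIPS register $0 always reads as 0
                self.regs[number] = value
        # Second half of the cycle: perform the reads.
        return [self.regs[r] for r in reads]

rf = RegisterFile()
rf.regs[2] = 10
# One instruction writes $2 while another reads $2 and $1 in the same cycle.
values = rf.cycle(write=(2, -20), reads=(2, 1))
print(values)
```

The read of $2 returns the newly written -20 rather than the old 10, so an instruction in its register-read stage during the producer's write-back cycle needs no stall.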
Register Usage Can Cause Data Hazards
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the five instructions below]
add $1,
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9
Dependencies backward in time cause hazards
Read before write data hazard
Loads Can Cause Data Hazards
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) in instruction order for the five instructions below]
lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9
Dependencies backward in time cause hazards
Load-use data hazard
One Way to “Fix” a Data Hazard
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) in instruction order: add $1, ; two stall cycles; sub $4,$1,$5; and $6,$1,$7]
Can fix the data hazard by waiting (stall), but this impacts CPI
Another Way to “Fix” a Data Hazard
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) in instruction order for the five instructions below, with forwarding paths drawn]
add $1,
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9
Fix data hazards by forwarding results as soon as they are available to where they are needed
Forwarding paths are valid only if the destination stage is later in time than the source stage.
Forwarding is harder if there are multiple results to forward per instruction or if they need to write a result early in the pipeline
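The choice a forwarding unit makes for one ALU operand can be sketched as a priority check against the two instructions ahead in the pipeline. The labels "EX/MEM" and "MEM/WB" for the pipeline registers, the function name `forward_select`, and the `(reg_write, dest)` pairs are conventional or illustrative and do not come from the slides; this is a sketch, not the course's datapath.

```python
def forward_select(src, ex_mem, mem_wb):
    """Pick where one ALU operand should come from.

    src is the register number the instruction in EX wants to read.
    ex_mem and mem_wb are (reg_write, dest) pairs for the instructions
    one and two slots ahead in the pipeline.
    """
    ex_write, ex_dest = ex_mem
    wb_write, wb_dest = mem_wb
    if ex_write and ex_dest != 0 and ex_dest == src:
        return "EX/MEM"   # the most recent result takes priority
    if wb_write and wb_dest != 0 and wb_dest == src:
        return "MEM/WB"
    return "REGFILE"      # no hazard: use the value read in ID

# sub $4,$1,$5 in EX while the add that writes $1 is one slot ahead.
print(forward_select(1, (True, 1), (False, 0)))
# and $6,$1,$7 in EX: sub (dest $4) is one ahead, add (dest $1) two ahead.
print(forward_select(1, (True, 4), (True, 1)))
```

Checking the nearer instruction first matters when both earlier instructions write the same register, because the newer value must win. Register $0 is excluded because it is never really written.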
Forwarding with Load-use Data Hazards
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) in instruction order for the five instructions below]
lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9
Will still need one stall cycle even with forwarding
Branch Instructions Cause Control Hazards
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) in instruction order for the instructions below]
beq
lw
Inst 3
Inst 4
Dependencies backward in time cause hazards
One Way to “Fix” a Control Hazard
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) in instruction order: beq; three stall cycles; lw; Inst 3]
Fix the branch hazard by waiting (stall), but this affects CPI
Another “solution” is to put in enough extra hardware so that we can test registers, calculate the branch address, and update the PC during the second stage of the pipeline. That would reduce the number of stalls to only one.
A third approach is to use prediction to handle branches, e.g., always predict that branches will be untaken. When right, the pipeline proceeds at full speed. When wrong, we have to stall (and make sure that nothing that shouldn’t have completed, i.e., changed machine state, does so).
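The CPI cost of each approach can be estimated as the ideal CPI of 1 plus the average number of stall cycles per instruction. The numbers below are hypothetical: the 20% branch fraction and the 50% misprediction rate are assumptions chosen only to make the arithmetic concrete, and `cpi_with_branches` is an illustrative name.

```python
def cpi_with_branches(branch_fraction, penalty_cycles, mispredict_rate=1.0):
    """Ideal CPI of 1 plus the average branch penalty per instruction."""
    return 1.0 + branch_fraction * mispredict_rate * penalty_cycles

BRANCH_FRACTION = 0.20  # hypothetical instruction mix, not from the slides

# Stall three cycles on every branch.
stall_always = cpi_with_branches(BRANCH_FRACTION, 3)
# Resolve the branch in the second stage: one stall per branch.
early_resolve = cpi_with_branches(BRANCH_FRACTION, 1)
# Predict untaken, pay one cycle only when wrong (assumed 50% of branches).
predict_untaken = cpi_with_branches(BRANCH_FRACTION, 1, mispredict_rate=0.5)

print(round(stall_always, 2), round(early_resolve, 2), round(predict_untaken, 2))
```

Under these assumed figures the three approaches give CPIs of about 1.6, 1.2 and 1.1, which shows why moving the branch decision earlier and predicting are both worthwhile.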