Lecture 11: Pipelining and Branch Prediction
EEN 312: Processors: Hardware, Software, and Interfacing
Department of Electrical and Computer Engineering
Spring 2014, Dr. Rozier (UM)
Feb 05, 2016
THE QUIZ SHOW!
Today’s class will be a quiz show
• We will be solving puzzles involving pipelining, branch prediction, and the stack.
• Form up into groups of 8 individuals
• Points for correct solutions, with extra credit points awarded to the top teams:
– 4 pts for 1st place
– 3 pts for 2nd place
– 2 pts for 3rd place
– 1 pt for 4th place
The Rules!
• Each group will elect a “buzzer”. When the buzzer raises his hand, your group will be called on to solve the puzzle.
• One representative will be sent up per group. They will give their answer and explain it.
• Once the buzzer has raised his hand, your group must stop discussing the answer!
PIPELINING
Pipelining
• Assume r5 != r4
• Assume there is one memory for instructions and data.
• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.
(100) A structural hazard exists. What is it?
str r0, [r1, #16]
ldr r0, [r1, #8]
cmp r5, r4
beq label
add r5, r2, r4
add r5, r5, r0
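One way to explore the shared-memory assumption is to tabulate which cycles ask the single memory for both an instruction and data. The sketch below is only an illustration under stated assumptions: an ideal 5-stage pipeline with no stalls, instruction i fetched in cycle i + 1, and the branch not taken (since r5 != r4). The `program` list and `memory_conflicts` helper are hypothetical names introduced here.

```python
# Hypothetical model: instruction i (0-based) is fetched in cycle i + 1 and,
# with no stalls, is in the Memory stage three cycles later (F, D, E, M, W).
program = [
    ("str r0, [r1, #16]", True),   # True = accesses data memory
    ("ldr r0, [r1, #8]", True),
    ("cmp r5, r4", False),
    ("beq label", False),          # assumed not taken, since r5 != r4
    ("add r5, r2, r4", False),
    ("add r5, r5, r0", False),
]


def memory_conflicts(program):
    """Return (cycle, data_instr, fetched_instr) for each shared-memory clash."""
    conflicts = []
    for i, (text, uses_data_memory) in enumerate(program):
        if not uses_data_memory:
            continue
        mem_cycle = (i + 1) + 3      # cycle in which instruction i is in MEM
        fetched = mem_cycle - 1      # index of the instruction fetched that cycle
        if fetched < len(program):
            conflicts.append((mem_cycle, text, program[fetched][0]))
    return conflicts


for cycle, data_instr, fetch_instr in memory_conflicts(program):
    print(f"cycle {cycle}: MEM of '{data_instr}' vs fetch of '{fetch_instr}'")
```

The model deliberately ignores stalls, so it only shows where the first collisions would occur, not the final schedule.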
Pipelining
• Assume r5 != r4
• Assume there is one memory for instructions and data.
• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.
(200) Can this structural hazard be eliminated by adding “bubbles” to the pipeline in the form of NOP instructions?
str r0, [r1, #16]
ldr r0, [r1, #8]
cmp r5, r4
beq label
add r5, r2, r4
add r5, r5, r0
Pipelining
• Assume r5 != r4
• Assume there is one memory for instructions and data.
• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.
(300) To guarantee forward progress, how must this hazard be resolved? In favor of data access, or instruction fetching? Why?
str r0, [r1, #16]
ldr r0, [r1, #8]
cmp r5, r4
beq label
add r5, r2, r4
add r5, r5, r0
Pipelining
• Assume r5 != r4
• Assume there is one memory for instructions and data.
• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.
(400) Draw the 5-stage pipeline for this code. Assume the stages are:
Fetch, Decode, Execute, Memory, Writeback.
What is the total execution time?
str r0, [r1, #16]
ldr r0, [r1, #8]
cmp r5, r4
beq label
add r5, r2, r4
add r5, r5, r0
Pipelining
• Assume r5 != r4
• Assume there is one memory for instructions and data.
• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.
(500) Assume we have a new processor such that when the offset is zero on a memory operation, the Execute stage (ALU) can be skipped. The Memory and Execute stages can now be overlapped in the pipeline. What speedup is achieved with this new architecture?
str r0, [r1, #0]
ldr r0, [r10, #0]
cmp r5, r4
beq label
add r5, r2, r4
add r5, r5, r0
DATA DEPENDENCIES
Data Dependencies
(100) Find all data dependencies in this sequence.
ldr r1, [r1, #0]
and r1, r1, r2
ldr r2, [r1, #0]
ldr r1, [r3, #0]
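As a rough aid for checking answers, true (read-after-write) dependences can be found mechanically by remembering the most recent writer of each register. The sketch below assumes each instruction is summarized by hand as a `(dest, sources)` pair; `raw_dependences` is a hypothetical helper, and it reports only read-after-write pairs. Anti- and output dependences would need similar bookkeeping.

```python
def raw_dependences(instrs):
    """instrs: list of (dest, sources). Returns (producer, consumer, reg) triples."""
    last_writer = {}   # register name -> index of the latest instruction writing it
    deps = []
    for idx, (dest, sources) in enumerate(instrs):
        # Sources are checked before this instruction's own write is recorded.
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], idx, reg))
        last_writer[dest] = idx
    return deps


# Hand-written summary of the four instructions on the slide (an assumption
# about how each one reads and writes registers).
sequence = [
    ("r1", ("r1",)),        # ldr r1, [r1, #0]
    ("r1", ("r1", "r2")),   # and r1, r1, r2
    ("r2", ("r1",)),        # ldr r2, [r1, #0]
    ("r1", ("r3",)),        # ldr r1, [r3, #0]
]

for producer, consumer, reg in raw_dependences(sequence):
    print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```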
Data Dependencies
(200) Find all hazards in this sequence, with and without forwarding, for a 5-stage pipeline. Assume the stages are:
Fetch, Decode, Execute, Memory, Writeback.
ldr r1, [r1, #0]
and r1, r1, r2
ldr r2, [r1, #0]
ldr r1, [r3, #0]
Data Dependencies
(300) To reduce the clock cycle time, we are considering a split of the MEM stage into two stages.
Find all hazards in this sequence for a 5-stage pipeline, with and without forwarding. Assume the stages are:
Fetch, Decode, Execute, Memory, Writeback.
add r1, r2, r1
ldr r2, [r1, #0]
ldr r1, [r1, #4]
or r3, r1, r2
Data Dependencies
• Assume all data memory values are 0’s.
• Assume:
– r0 = 0
– r1 = -1
– r2 = 31
– r3 = 1500
• Assume the processor has forwarding logic for hazards.
(400) What value is the first one to be forwarded, and what is the value it overrides?
add r1, r2, r1
ldr r2, [r1, #0]
ldr r1, [r1, #4]
or r3, r1, r2
Data Dependencies
• Assume all data memory values are 0’s.
• Assume:
– r0 = 0
– r1 = -1
– r2 = 31
– r3 = 1500
(500) The hazard detection unit assumes forwarding was implemented, but the processor designers (UF students) forgot to implement it!
What are the final register values? What should they be?
Add NOPs to this sequence to ensure correct execution despite UF’s screw-up!
add r1, r2, r1
ldr r2, [r1, #0]
ldr r1, [r1, #4]
or r3, r1, r2
BRANCH PREDICTION
Branch Prediction
(100) When building a branch prediction unit, define for the following cases if the best choice is “branch not taken” or “branch taken” for the prediction:
1. Branches associated with “If” statements
2. Branches associated with “Else if” statements
3. Branches associated with “Else” statements
4. Branches associated with “For” statements
Branch Prediction
(200) Design a dynamic branch predictor for if statements and loops. Describe how to implement it in hardware. What new hardware might it require?
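One common design that could serve as a starting point is a table of 2-bit saturating counters indexed by low bits of the branch address. The sketch below is a software model of that idea, not the expected answer. The table size, the starting state, and the names `TwoBitPredictor` and `count_correct` are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries   # assumed start state: weakly not taken

    def _index(self, pc):
        # Drop the two low bits (word-aligned instructions) and wrap to the table size.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(predictor, pc, outcomes):
    """Replay actual branch outcomes and count how many were predicted correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# A loop branch that is taken four times and then falls through.
print(count_correct(TwoBitPredictor(), 0x20, [True] * 4 + [False]))
```

In hardware, each table entry would be a 2-bit register with increment and decrement logic, read during fetch and updated once the branch resolves.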
Branch Prediction
• Assume branch prediction is handled by branch not taken.
• Assume one element of the array at r2 is equal to 100.
(300) How many times is the branch predicted correctly versus incorrectly?
    mov r1, #0
    mov r2, #0xDEADBEEF
LOOP:
    ldr r3, [r2, r0, lsl #2]
    cmp r3, #100
    beq LABEL
    mov r4, r3
LABEL:
    add r0, r0, #1
    cmp r0, #5
    beq LOOP
    mov r0, r4
    add r0, r0, #1
Branch Prediction
• Assume branch prediction is handled by branch not taken.
• Assume one element of the array at r2 is equal to 100.
• Assume the PC pipeline is three instructions deep.
• Assume the PC pipeline can be flushed in one cycle, and on a misprediction must be fully flushed.
• Assume a pipeline with the phases: Fetch, Decode, Issue, Execute, Memory, and Writeback.
• Assume branches are evaluated in the issue step, and the pipeline is flushed during execute.
(400) How many cycles does the loop take?
    mov r1, #0
    mov r2, #0xDEADBEEF
LOOP:
    ldr r3, [r2, r0, lsl #2]
    cmp r3, #100
    beq LABEL
    mov r4, r3
LABEL:
    add r0, r0, #1
    cmp r0, #5
    beq LOOP
    mov r0, r4
    add r0, r0, #1
Branch Prediction
• Assume branch prediction is handled by branch not taken.
• Assume the PC pipeline is three instructions deep.
• Assume the PC pipeline can be flushed in one cycle, and on a misprediction must be fully flushed.
• Assume a pipeline with the phases: Fetch, Decode, Issue, Execute, Memory, and Writeback.
• Assume branches are evaluated in the issue step, and the pipeline is flushed during execute.
(500) Act as the compiler. Optimize the code for branch not taken. How many cycles does it take?
    mov r1, #0
    mov r2, #0xDEADBEEF
LOOP:
    ldr r3, [r2, r0, lsl #2]
    cmp r3, #100
    beq LABEL
    mov r4, r3
LABEL:
    add r0, r0, #1
    cmp r0, #5
    beq LOOP
    mov r0, r4
    add r0, r0, #1
PROCESSOR ARCHITECTURE
Processor Architecture
(100) For a five stage pipeline with stages: Fetch, Decode, Execute, Memory, and Writeback, describe what happens in each stage.
Processor Architecture
(200) Describe the purpose of a clock signal in a processor. Why do processors need clock signals?
Processor Architecture
(300) Describe how registers are selected from the register file during the Decode phase. How is this accomplished in hardware?
Processor Architecture
(400) Why must we allocate new registers in the datapath for the writeback register instead of reading it from the decode phase?
Processor Architecture
(500) Design a one bit full adder.
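A gate-level answer can be sanity-checked against a small software model. The sketch below models one standard formulation (two XORs for the sum, AND/OR for the carry) on 0/1 integers. The function name `full_adder` is introduced here for illustration, and other gate arrangements are equally valid.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder on 0/1 inputs. Returns (sum_bit, carry_out)."""
    partial = a ^ b                               # first XOR gate
    sum_bit = partial ^ carry_in                  # second XOR gate
    carry_out = (a & b) | (partial & carry_in)    # two AND gates feeding an OR gate
    return sum_bit, carry_out


# Print the full truth table.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, full_adder(a, b, c))
```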
REPRESENTATION OF DATA
Representation of Data
(100) Describe the difference between big endian and little endian representations.
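For a concrete picture of the two byte orders, the following sketch lays out one 32-bit value both ways, listing bytes from the lowest memory address upward. The value 0x12345678 is an arbitrary example chosen here, not one of the quiz values.

```python
value = 0x12345678

# Big endian stores the most significant byte at the lowest address;
# little endian stores the least significant byte there.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12
```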
Representation of Data
(200) Represent the following data in big endian and little endian formats:
1. 00ac8eff
2. 54897743
3. be88fac8
Representation of Data
(300) Represent the following data as hexadecimal numbers in big and little endian formats. Assume unsigned integers.
1. 128
2. 976
Representation of Data
(400) Represent the following data as hexadecimal numbers in big and little endian formats. Assume signed integers.
1. -55
2. 99
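Negative values need a fixed width before they can be written in hexadecimal, because the two's complement encoding depends on the number of bits. The sketch below assumes 32-bit integers, a width the slide does not state, and uses -2 as an arbitrary example. The helper name `hex_bytes` is hypothetical.

```python
def hex_bytes(value, width=4, signed=True):
    """Return (big_endian, little_endian) hex strings for a fixed-width integer."""
    big = value.to_bytes(width, byteorder="big", signed=signed)
    little = value.to_bytes(width, byteorder="little", signed=signed)
    return big.hex(" "), little.hex(" ")


# -2 in 32-bit two's complement is 0xFFFFFFFE.
print(hex_bytes(-2))
```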
Representation of Data
(500) Write assembly code which takes data from one register in Big Endian format and stores it in a new register in Little Endian format.
You may use temporary registers.
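Before writing the assembly, it can help to spell out the shift-and-mask steps at a higher level. The sketch below reverses the four bytes of a 32-bit word. Each masked term corresponds to work that could be done with a temporary register. The function name `swap_endianness32` and the 32-bit width are assumptions.

```python
def swap_endianness32(word):
    """Reverse the byte order of a 32-bit word using masks and shifts."""
    word &= 0xFFFFFFFF                       # keep only the low 32 bits
    return (((word & 0x000000FF) << 24) |    # lowest byte moves to the top
            ((word & 0x0000FF00) << 8) |     # second byte moves up one position
            ((word & 0x00FF0000) >> 8) |     # third byte moves down one position
            ((word & 0xFF000000) >> 24))     # highest byte moves to the bottom


print(hex(swap_endianness32(0x12345678)))
```

Applying the function twice returns the original word, which makes a quick self-check for any assembly translation.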
FINAL QUESTION
Final Question
• Each team should decide an amount of points to bid.
• Write down your bids on a sheet of paper and hand them in.
• You will have only 60 seconds to answer the next question as a team. Write your answers down by the time limit.
– Answer correctly and you will add your bid to your score.
– Answer incorrectly and you will lose those points.
Final Question
In order to detect data hazards, new hardware must be added. Assuming that the register IDs involved in an instruction are available during the decode stage, what hardware would be necessary to check for data hazards?
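One way to reason about the question is to model the check in software first. The sketch below assumes the destination register ID of each later-stage instruction is carried along in pipeline registers and is compared against the source IDs of the instruction in Decode. In hardware, each comparison would be an equality comparator. The function name `detect_hazard` and the use of `None` for "no destination" are assumptions for illustration.

```python
def detect_hazard(decode_sources, inflight_dests):
    """Return True if the instruction in Decode reads a register that an
    earlier, still-in-flight instruction has yet to write.

    decode_sources: register IDs read by the instruction in Decode.
    inflight_dests: destination register IDs of the instructions in later
                    stages, or None for instructions that write no register.
    """
    return any(dest is not None and dest == src
               for src in decode_sources
               for dest in inflight_dests)


# Decode reads r1 and r2 while an instruction in Execute will write r1.
print(detect_hazard((1, 2), [1, None]))
```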
WRAP UP
For next time
• Enjoy your spring break!
• Read Chapter 5, sections 5.1 – 5.3