Hardware Based Speculation
Hardware-Based Speculation
•Branch prediction reduces the stalls attributable to branches
• For a processor executing multiple instructions per clock cycle, just predicting branches is not enough
• A multiple-issue processor may execute a branch every clock cycle
•Exploiting more parallelism requires that we overcome the limitation of control dependence
Hardware-Based Speculation
•Greater ILP: overcome control dependence by speculating in hardware on the outcome of branches and executing the program as if the guesses were correct
•If a prediction is wrong, the hardware must be able to undo the effects of the speculated instructions
•An extension of branch prediction with dynamic scheduling
• Speculation fetches, issues, and executes instructions as if branch predictions were always correct
• Dynamic scheduling alone only fetches and issues such instructions
•Essentially a data flow execution model: operations execute as soon as their operands are available
Hardware-Based Speculation
•Three components of HW-based speculation:
• Dynamic branch prediction to choose which instructions to execute
• Speculation to allow execution of instructions before control dependences are resolved
• Dynamic scheduling to deal with scheduling of different combinations of basic blocks
Hardware-Based Speculation
•Extending Tomasulo’s algorithm
• To support speculation, we must separate the bypassing of results among instructions (needed to execute an instruction speculatively) from the actual completion of an instruction
•Out-of-order execution but in-order commit
•The register file is not updated until the instruction commits
Hardware-Based Speculation
Stores
• A store still takes two steps; the second step (updating memory) is performed at instruction commit
•Instruction commit: once an instruction has executed completely and is no longer speculative, its result is written to memory or to the register file
• Every instruction holds an entry in the reorder buffer (ROB) until it commits
Hardware-Based Speculation in Tomasulo
•The key idea
• Allow instructions to execute out of order
• Force instructions to commit in order
• Prevent any irrevocable action (such as updating state or taking an exception) until an instruction commits
•Hence:
• Must separate completing execution from allowing the instruction to finish, or “commit”
• Instructions may finish execution considerably before they are ready to commit
•This additional step is called instruction commit
Hardware-Based Speculation in Tomasulo
•When an instruction is no longer speculative, allow it to update the register file or memory
Reorder buffer (ROB)
•An additional set of hardware buffers that holds the results of instructions that have finished execution but have not committed
•Also used to pass results among instructions that may be speculated
Reorder Buffer
•In Tomasulo’s algorithm, once an instruction writes its result, any subsequently issued instructions will find the result in the register file
•With speculation, the register file is not updated until the instruction commits
• (that is, until we know definitively that the instruction should execute)
•Thus, the ROB supplies operands in the interval between completion of instruction execution and instruction commit
• The ROB is a source of operands for instructions, just as reservation stations (RS) provide operands in Tomasulo’s algorithm
• Like the RS, the ROB extends the set of architectural registers
Reorder Buffer Structure (Four fields)
•Instruction type field
• Indicates whether the instruction is a branch (which has no destination result), a store (which has a memory address destination), or a register operation (an ALU operation or load, which has a register destination)
•Destination field
• Supplies the register number (for loads and ALU operations) or the memory address (for stores) where the instruction result should be written
•Value field
• Holds the value of the instruction result until the instruction commits
•Ready field
• Indicates that the instruction has completed execution and the value is ready
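The four fields map naturally onto a small record type. The following is a minimal sketch, not a description of any real hardware layout; the names `InstrType`, `ROBEntry`, and `write_result` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class InstrType(Enum):
    BRANCH = "branch"      # no destination result
    STORE = "store"        # destination is a memory address
    REGISTER = "register"  # ALU operation or load; destination is a register


@dataclass
class ROBEntry:
    instr_type: InstrType                    # instruction type field
    dest: Optional[Union[str, int]] = None   # register name, memory address, or None for a branch
    value: Optional[float] = None            # value field: result held until commit
    ready: bool = False                      # ready field: True once execution has completed

    def write_result(self, value: float) -> None:
        """Record a completed result. The register file is not touched here."""
        self.value = value
        self.ready = True


entry = ROBEntry(InstrType.REGISTER, dest="F0")
entry.write_result(3.5)
print(entry.ready, entry.value)  # True 3.5
```

Keeping the result in `value` rather than writing the register file directly is what lets a later commit step decide whether the result ever becomes architecturally visible.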
Reorder Buffer Operation
•Holds instructions in FIFO order, exactly as issued
•When instructions complete, results are placed into the ROB
• Supplies operands to other instructions between execution complete and commit, so it acts as additional registers, like the RS
• Results are tagged with the ROB entry number instead of the reservation station number
•When instructions commit, values at the head of the ROB are placed in the registers
•As a result, it is easy to undo speculated instructions on mispredicted branches or on exceptions
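[Figure: FP op queue, FP registers, reservation stations feeding two functional units labeled FP Adder, and the reorder buffer, with a commit path from the reorder buffer to the FP registers]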
Where is the store queue?
4 Steps of Speculative Tomasulo
1. Issue: get an instruction from the instruction queue
If a reservation station and a reorder buffer slot are free, issue the instruction. Send the operands to the reservation station if they are available (from either the ROB or the FP registers), and send the number of the reorder buffer entry allocated for the result to the reservation station, so that the result can be tagged when it is placed on the common data bus (CDB).
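The operand lookup at issue can be sketched in a few lines. This is an illustrative model only: the function names, the dictionary layout, the buffer sizes, and the register values are all assumptions, the load is modelled as already issued into ROB1, and ROB numbers are simply assigned in order because nothing commits in this sketch.

```python
def read_operand(reg, rob, reg_status, regs):
    """Return ('value', v) if the operand is available, else ('tag', n) to wait on ROB n."""
    tag = reg_status.get(reg)             # ROB entry that will write reg, if any
    if tag is None:
        return ("value", regs[reg])       # no pending writer: read the register file
    entry = rob[tag]
    if entry["ready"]:
        return ("value", entry["value"])  # finished but not committed: read the ROB
    return ("tag", tag)                   # still executing: watch the CDB for this tag


def issue(instr, rob, stations, reg_status, regs, rob_size=7, rs_size=3):
    """Issue one instruction (op, dest, src1, src2); return its ROB number or None on a stall."""
    op, dest, src1, src2 = instr
    if len(rob) >= rob_size or len(stations) >= rs_size:
        return None                       # no free ROB slot or reservation station
    tag = len(rob) + 1                    # ROB1, ROB2, ... (valid only without commits)
    # Read sources before renaming dest, so an instruction may read its own dest register.
    operands = [read_operand(s, rob, reg_status, regs) for s in (src1, src2)]
    rob[tag] = {"instr": instr, "dest": dest, "value": None, "ready": False}
    stations.append({"op": op, "operands": operands, "dest_tag": tag})
    reg_status[dest] = tag                # later readers of dest now wait on this ROB entry
    return tag


regs = {"F4": 4.0, "F6": 6.0}             # hypothetical register contents
rob = {1: {"instr": ("LD", "F0", "16", "R2"), "dest": "F0", "value": None, "ready": False}}
reg_status = {"F0": 1}                    # F0 will be produced by ROB1 (the load)
stations = []
issue(("ADDD", "F10", "F4", "F0"), rob, stations, reg_status, regs)
issue(("DIVD", "F2", "F10", "F6"), rob, stations, reg_status, regs)
print(stations[0]["operands"])  # [('value', 4.0), ('tag', 1)]
print(stations[1]["operands"])  # [('tag', 2), ('value', 6.0)]
```

In the notation of the slides, the two reservation station entries correspond to ADDD R(F4),ROB1 and DIVD ROB2,R(F6).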
4 Steps of Speculative Tomasulo
2. Execute: operate on operands (EX)
When both operands are ready in the reservation station, execute; if an operand is not ready, watch the CDB for its result. Waiting for operands in this way checks for RAW hazards.
Loads still require a two-step process, and instructions may take multiple cycles. For a store, execution at this step is only the effective address calculation, so it needs only the base register Rs.
3. Write result: finish execution (WB)
Write the result on the CDB, tagged with its reorder buffer number, into the ROB and into any reservation stations waiting for it; mark the reservation station as available.
4 Steps of Speculative Tomasulo
4. Commit
a) When an instruction reaches the head of the ROB and its result is present in the buffer:
• Update the register with the result and remove the instruction from the ROB.
b) Committing a store is similar, except that memory is updated rather than a result register.
c) If a branch with an incorrect prediction reaches the head of the ROB:
• It indicates that the speculation was wrong.
• The ROB is flushed and execution is restarted at the correct successor of the branch.
d) If the branch was correctly predicted, the branch is finished.
•Once an instruction commits, its entry in the ROB is reclaimed. If the ROB fills, we simply stop issuing instructions until an entry is made free.
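The commit cases above can be sketched as a single function that only ever looks at the head of the ROB. This is a simplified model under stated assumptions: the entry keys (`type`, `dest`, `value`, `ready`, `mispredicted`, `correct_pc`) and the function name `commit` are hypothetical, and restarting fetch is reduced to returning the correct target.

```python
from collections import deque


def commit(rob, regs, memory):
    """Try to commit the instruction at the head of the ROB.

    Returns the PC to restart from after a mispredicted branch, else None.
    """
    if not rob or not rob[0]["ready"]:
        return None                       # head not finished: nothing may commit
    head = rob.popleft()                  # the entry is reclaimed
    if head["type"] == "branch":
        if head["mispredicted"]:
            rob.clear()                   # flush every younger, speculative entry
            return head["correct_pc"]
    elif head["type"] == "store":
        memory[head["dest"]] = head["value"]   # memory is updated only at commit
    else:
        regs[head["dest"]] = head["value"]     # register file is updated only at commit
    return None


regs, memory = {}, {}
rob = deque([
    {"type": "register", "dest": "F0", "value": 1.0, "ready": True},
    {"type": "branch", "ready": True, "mispredicted": True, "correct_pc": 64},
    {"type": "register", "dest": "F2", "value": 2.0, "ready": True},  # wrong-path result
])
print(commit(rob, regs, memory))  # None
print(commit(rob, regs, memory))  # 64
print(regs, len(rob))             # {'F0': 1.0} 0
```

The wrong-path write to F2 never reaches the register file, even though it had already finished executing, because it was still sitting in the ROB when the branch committed.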
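Tomasulo With Reorder buffer:
Instruction sequence: LD F0,16(R2); ADDD F10,F4,F0; DIVD F2,F10,F6
After issuing LD F0,16(R2):
•Reorder buffer (ROB1 is the oldest entry)
• ROB1: Dest F0, Instruction LD F0,16(R2), Done? N
• ROB2 to ROB7: empty
•Load buffer (from memory)
• Dest ROB1, address 16+R2
•Reservation stations (FP adders, FP multipliers): empty
[Figure: FP op queue, registers, reorder buffer, reservation stations, FP adders and FP multipliers, with paths to and from memory]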
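Tomasulo With Reorder buffer:
Instruction sequence: LD F0,16(R2); ADDD F10,F4,F0; DIVD F2,F10,F6
After issuing ADDD F10,F4,F0:
•Reorder buffer (ROB1 is the oldest entry)
• ROB1: Dest F0, Instruction LD F0,16(R2), Done? N
• ROB2: Dest F10, Instruction ADDD F10,F4,F0, Done? N
• ROB3 to ROB7: empty
•Load buffer (from memory)
• Dest ROB1, address 16+R2
•Reservation stations
• FP adders: Dest ROB2, ADDD R(F4),ROB1
• FP multipliers: empty
[Figure: FP op queue, registers, reorder buffer, reservation stations, FP adders and FP multipliers, with paths to and from memory]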
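Tomasulo With Reorder buffer:
Instruction sequence: LD F0,16(R2); ADDD F10,F4,F0; DIVD F2,F10,F6
After issuing DIVD F2,F10,F6:
•Reorder buffer (ROB1 is the oldest entry)
• ROB1: Dest F0, Instruction LD F0,16(R2), Done? N
• ROB2: Dest F10, Instruction ADDD F10,F4,F0, Done? N
• ROB3: Dest F2, Instruction DIVD F2,F10,F6, Done? N
• ROB4 to ROB7: empty
•Load buffer (from memory)
• Dest ROB1, address 16+R2
•Reservation stations
• FP adders: Dest ROB2, ADDD R(F4),ROB1
• FP multipliers: Dest ROB3, DIVD ROB2,R(F6)
[Figure: FP op queue, registers, reorder buffer, reservation stations, FP adders and FP multipliers, with paths to and from memory]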
Avoiding Memory Hazards
• WAW and WAR hazards through memory are eliminated with speculation
• RAW hazards through memory are avoided by two restrictions: