Using Virtual Load/Store Queues (VLSQs) to Reduce the Negative Effects of Reordered Memory Instructions
Aamer Jaleel and Bruce Jacob
Electrical and Computer Engineering, University of Maryland, College Park
{ajaleel, blj} @ eng.umd.edu
A. Jaleel and B. Jacob. “Using Virtual Load/Store Queues to Reduce the Negative Effects of Reordered Memory Instructions”
Paper Motivation
• Maximizing Application ILP:
  – OoO performance depends on the size of the instruction window or reorder buffer (ROB)
  – Improve ILP with larger ROB sizes
• Before This Paper:
  – Many studies have shown large performance gains with large ROBs
  – Most have discounted real effects in the memory subsystem
Paper Contributions
• Uncovering A Problem:
  – Increasing OoO capability degrades memory system performance
    • Increase in replay traps
    • Increase in L1 cache misses
• The Reason:
  – OoO scheduler reordering memory instructions
• The Solution:
  – Restrict reordering of memory instructions
  – Virtual Load/Store Queue (VLSQ)
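One way to picture "restrict reordering of memory instructions" is a window over the load/store queue: only the oldest few incomplete memory operations are visible to the scheduler. The sketch below is a toy model under that assumption, not the hardware described in the paper. The names `MemOp`, `issue_candidates`, and `window` are hypothetical, and these slides do not state how the real design selects eligible entries.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    seq: int            # program-order sequence number (hypothetical field)
    kind: str           # "load" or "store"
    ready: bool         # operands available, so the scheduler could issue it
    done: bool = False  # already completed and waiting to retire


def issue_candidates(lsq, window):
    """Return ready memory ops among the `window` oldest incomplete entries.

    `lsq` is assumed to be in program order (oldest first). `window=None`
    models an unrestricted queue (assumed here to match the "Inf" setting).
    """
    pending = [op for op in lsq if not op.done]
    visible = pending if window is None else pending[:window]
    return [op for op in visible if op.ready]


lsq = [
    MemOp(0, "store", ready=False),  # address not yet computed
    MemOp(1, "load", ready=True),
    MemOp(2, "load", ready=False),
    MemOp(3, "load", ready=True),
]

# Compare how many younger loads may pass the stalled store at each setting.
for window in (None, 2, 1):
    print(window, [op.seq for op in issue_candidates(lsq, window)])
```

In this toy model a smaller window lets fewer younger loads run ahead of the stalled store, and a window of 1 degenerates to in-order memory issue. That is the trade-off the slides describe: less reordering, at the cost of some memory-level parallelism.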
Background – Replay Traps
• Hardware events to ensure correct
VLSQ Performance
• Applications show three different behaviors
  – Group I: Performance same (non-memory intensive apps)
  – Group II: Performance loss (memory intensive apps)
  – Group III: Performance benefit (alleviating negative effects)
• VLSQ of size 16 or 32 is ideal across all apps
[Figure: CPI breakdown (Memory, ALU, Other) versus VLSQ size (1, 4, 8, 16, 32, 64, Inf) at ROB-512, with one panel each for Group I, Group II, and Group III.]
Power Savings with VLSQs
• Reducing Replay Traps
  – 5-60% power savings in fetch/map/exec hardware
• Reducing Cache Accesses and Misses
  – 5-65% savings in L1 data cache
• Savings of 25-30% using VLSQs of 16 or 32
[Figure: two panels, Execution Units and L1 Cache, each normalized to Inf, for VLSQ sizes 1, 4, 8, 16, 32, and 64 at ROB sizes 80, 128, 256, and 512.]
Windowing of Load/Store Queue
• Static Mechanism (This Study):
  – Statically set the size of the virtual window
  – Drawback: Memory ILP lost during execution phases where negative effects do not exist
• Dynamic Mechanism (Future Work):
  – Intuition that negative effects do not always exist
  – Dynamically vary virtual window size based on