(C) 2004 Daniel Sorin Duke Architecture
Using Speculation to Simplify Multiprocessor Design
Daniel J. Sorin¹, Milo M. K. Martin², Mark D. Hill³, David A. Wood³
¹Dept. of Electrical & Computer Engineering, Duke University
²Dept. of Computer & Information Science, Univ. of Pennsylvania
³Computer Sciences Dept., University of Wisconsin-Madison
IPDPS 2004 – Daniel Sorin
My Talk in One Slide
• Shared-memory multiprocessors are complicated
– Difficult to design for every possible corner case
• Proposal: Use speculation to target the common case
– Speculate that corner cases won’t happen
– Detect if they do occur and recover the system
– Ensure forward progress
• Case studies
– Simplify cache coherence protocols
– Simplify the interconnection network
Speculation for Simplicity
• Why we want to avoid complexity
– Time and money for design and verification
• Design for the common case
– But we have to make ALL cases work correctly
• Examples of this philosophy in uniprocessors
– Trapping to software for infrequent/obsolescent instructions
– Pentium 4 recovers from edge-case scheduler deadlocks
• But this idea hadn’t been used for multiprocessors
– Key: we now have efficient multiprocessor recovery
Framework for Speculation
• Four keys to design simplification with speculation
1) Ensure that mis-speculations are rare
2) Detect all mis-speculations
3) Recover from mis-speculations
4) Ensure forward progress, even in the worst case
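The four keys above can be read as one control loop. The sketch below is a hypothetical software model, not the hardware mechanism from the talk: `System`, `run_speculatively`, the checkpoint interval, and the slow-start length are illustrative assumptions, and a full copy of the state stands in for a real checkpoint. Key 1 (rarity) is a property of the design rather than of the loop, so it shows up only as the expectation that `MisSpeculation` is raised infrequently. The loop also assumes that slow-start mode never mis-speculates, which is what key 4 requires.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class MisSpeculation(Exception):
    """Raised by a detector when a speculated-away corner case occurs."""


@dataclass
class System:
    state: List[int] = field(default_factory=list)

    def checkpoint(self) -> List[int]:
        # Hypothetical: a full copy stands in for a hardware checkpoint.
        return list(self.state)

    def restore(self, snapshot: List[int]) -> None:
        self.state = list(snapshot)


Op = Callable[[System, bool], None]  # op(system, slow_mode)


def run_speculatively(system: System, ops: List[Op],
                      checkpoint_interval: int = 4,
                      slow_start_ops: int = 4) -> int:
    """Run ops in order, recovering from mis-speculations.
    Returns the number of recoveries performed."""
    recoveries = 0
    i = 0
    slow_left = 0
    snap_index, snapshot = 0, system.checkpoint()
    while i < len(ops):
        try:
            # Key 2: detectors inside an op raise MisSpeculation.
            ops[i](system, slow_left > 0)
        except MisSpeculation:
            # Key 3: roll back to the last checkpoint.
            recoveries += 1
            system.restore(snapshot)
            # Key 4: stay in slow-start mode at least until the failing op
            # has been re-executed, so the same corner case cannot recur.
            slow_left = max(slow_start_ops, i - snap_index + 1)
            i = snap_index
            continue
        i += 1
        if slow_left > 0:
            slow_left -= 1
        if i % checkpoint_interval == 0:
            snap_index, snapshot = i, system.checkpoint()
    return recoveries


def append(value: int) -> Op:
    def op(system: System, slow: bool) -> None:
        system.state.append(value)
    return op


def racy(system: System, slow: bool) -> None:
    # Hypothetical corner case that only manifests in fast (speculative) mode.
    if not slow:
        raise MisSpeculation("corner case hit")
    system.state.append(99)


system = System()
ops = [append(v) for v in (1, 2, 3, 4, 5)] + [racy, append(6)]
recoveries = run_speculatively(system, ops)
print(recoveries, system.state)  # 1 [1, 2, 3, 4, 5, 99, 6]
```

The restore step is what keeps the re-executed appends from being duplicated: the state rolls back to the checkpoint taken after the fourth op before ops five and six run again.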
SafetyNet Checkpoint/Recovery
• We use SafetyNet [ISCA 2002] for system recovery
• All-hardware checkpoint/recovery for shared-memory multiprocessors
• Periodically takes logical checkpoints of the system
– Including caches, coherence state, memory, directory state
– Implements checkpointing with incremental logging
– Consistent checkpoints using logical time coordination
• Can roll back 100,000+ cycles
• Negligible performance impact
– Incremental logging performed off the critical path
• Small log buffers (512 KB) at caches & memories
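To illustrate incremental logging, here is a minimal sketch assuming a single outstanding checkpoint and a flat map of memory blocks. SafetyNet itself keeps several checkpoints, logs into hardware buffers, and coordinates checkpoints in logical time, none of which is modeled. `LoggedMemory` and its methods are hypothetical names. The point it shows: taking a checkpoint copies nothing, and only the first write to each block per checkpoint interval costs a log entry.

```python
from typing import Dict, List, Optional, Set, Tuple


class LoggedMemory:
    """Undo-log checkpointing: only the first write to a block after a
    checkpoint saves the block's old value (incremental logging)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, int] = {}
        # (address, old value, or None if the block did not exist yet)
        self.log: List[Tuple[int, Optional[int]]] = []
        self.logged: Set[int] = set()

    def take_checkpoint(self) -> None:
        # A new checkpoint just starts an empty log; no state is copied.
        self.log.clear()
        self.logged.clear()

    def write(self, addr: int, value: int) -> None:
        if addr not in self.logged:
            # First update since the checkpoint: remember the old copy.
            self.log.append((addr, self.blocks.get(addr)))
            self.logged.add(addr)
        self.blocks[addr] = value

    def rollback(self) -> None:
        # Undo logged updates newest-first to restore the checkpoint state.
        for addr, old in reversed(self.log):
            if old is None:
                del self.blocks[addr]
            else:
                self.blocks[addr] = old
        self.take_checkpoint()


mem = LoggedMemory()
mem.write(0x40, 1)
mem.take_checkpoint()
mem.write(0x40, 2)
mem.write(0x40, 3)   # not logged again: 0x40 is already logged this interval
mem.write(0x80, 7)
print(len(mem.log))  # 2
mem.rollback()
print(mem.blocks)    # {64: 1}
```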
The Need for Multiprocessor Recovery
• Assumption: multiprocessors will have system-wide recovery mechanisms for availability
– As fault rates keep increasing, recovery is crucial
• Will be all-hardware (like SafetyNet) for performance
– But many alternative designs are possible
• We leverage this recovery mechanism for recovering from mis-speculations
Outline
• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
• Evaluation
• Conclusions
Directory Protocol Complexity
• We want adaptive routing in the interconnection network
– Better performance and availability
– But adaptive routing precludes point-to-point ordering
• So what?
– Point-to-point ordering simplifies protocol design
– Eliminates several potential corner-case races
Race Case in Directory Protocol
• Example race if the network provides no point-to-point ordering
[Figure: message timeline among P1, Dir, and P2 with RequestReadWrite, Writeback, and Forwarded RequestReadWrite]
RequestReadWrite arrives first at Dir and is forwarded to P1
Race Case in Directory Protocol
[Figure: message timeline among P1, Dir, and P2 with RequestReadWrite, Writeback, Ack Writeback, and Forwarded RequestReadWrite]
Forwarded RequestReadWrite arrives after the Writeback Ack
Race Case in Directory Protocol
• Problem: P1 sees the Forwarded RequestReadWrite in state Invalid
[Figure: the same message timeline among P1, Dir, and P2]
Not possible if the interconnection network provides point-to-point ordering
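The race can be reproduced with a toy controller for P1. This sketch assumes P1 sits in a hypothetical transient state `WB_PENDING` after sending its Writeback and can still serve a forwarded request there; those state and message names are illustrative, not taken from the protocol. Dir sends the Forwarded RequestReadWrite before the AckWriteback, so the detector fires only when the network delivers the two out of order.

```python
class MisSpeculation(Exception):
    """Signals the corner case the simplified protocol does not handle."""


def p1_handle(state: str, msg: str) -> str:
    """Hypothetical controller for P1, which owns the block and has
    already sent its Writeback to Dir (state 'WB_PENDING')."""
    if state == "WB_PENDING" and msg == "FwdRequestReadWrite":
        return "WB_PENDING"   # assumed: P1 still has the data and serves P2
    if state == "WB_PENDING" and msg == "AckWriteback":
        return "I"
    if state == "I" and msg == "FwdRequestReadWrite":
        # Detector: only reachable if the network reordered Dir's messages.
        raise MisSpeculation("Forwarded RequestReadWrite in state Invalid")
    raise ValueError(f"unexpected {msg} in state {state}")


def misspeculated(arrival_order: list) -> bool:
    state = "WB_PENDING"
    try:
        for msg in arrival_order:
            state = p1_handle(state, msg)
    except MisSpeculation:
        return True
    return False


# Dir sends the forwarded request first, then the writeback ack.
sent = ["FwdRequestReadWrite", "AckWriteback"]
print(misspeculated(sent))        # False: point-to-point order preserved
print(misspeculated(sent[::-1]))  # True: adaptive routing reordered them
```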
Simplifying a Directory Protocol
• Speculate that adaptive network provides ordering
1) Why is mis-speculation rare?
– Not many reorderings
– Most reorderings don’t matter!
2) How do we detect all mis-speculations?
– A cache receives a Forwarded RequestReadWrite while in state Invalid
3) How do we recover?
– SafetyNet
4) How do we ensure forward progress?
– Slow-start operation for a while after recovery
– Guarantees that this race can’t keep recurring
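The slides do not say how slow-start works. One plausible reading, sketched below as an assumption, is to cap the number of outstanding coherence transactions for a while after a recovery. If a cap of one is enforced system-wide, P1’s Writeback and P2’s RequestReadWrite can never be in flight together, so the race cannot recur during the window. `SlowStartThrottle` and its parameters are hypothetical.

```python
class SlowStartThrottle:
    """Hypothetical issue limiter, assumed to be enforced system-wide:
    after a recovery, allow only one outstanding coherence transaction
    for `slow_cycles` cycles, then return to `normal_limit`."""

    def __init__(self, normal_limit: int, slow_cycles: int) -> None:
        self.normal_limit = normal_limit
        self.slow_cycles = slow_cycles
        self.slow_until = 0   # cycle at which normal operation resumes

    def on_recovery(self, now: int) -> None:
        self.slow_until = now + self.slow_cycles

    def limit(self, now: int) -> int:
        return 1 if now < self.slow_until else self.normal_limit

    def may_issue(self, now: int, outstanding: int) -> bool:
        return outstanding < self.limit(now)


throttle = SlowStartThrottle(normal_limit=16, slow_cycles=1000)
print(throttle.may_issue(now=0, outstanding=5))     # True: normal operation
throttle.on_recovery(now=100)
print(throttle.may_issue(now=200, outstanding=1))   # False: serialized
print(throttle.may_issue(now=1100, outstanding=1))  # True: window over
```

Serializing transactions trades performance for simplicity, which is acceptable only because the window follows a recovery that should be rare.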
Simplifying a Snooping Coherence Protocol
• During design, we missed a corner case
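[Figure: snooping-protocol states M, trans1, and trans2 around a Writeback, with RequestReadWrite messages arriving along the way; the RequestReadWrite that arrives in trans2 is marked ???]
• Solution: it’s rare, so treat it as a mis-speculation
• Detect it by observing a RequestReadWrite in state trans2
• Recover with SafetyNet
• Ensure forward progress with slow-start after recovery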
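One way to read “treat it as mis-speculation” is a table-driven controller in which any (state, event) pair the designers did not enumerate raises a mis-speculation instead of being designed and verified. In the sketch below only the missing (trans2, RequestReadWrite) entry comes from the slide; the other transitions and the event names `WritebackOnBus` and `WritebackDone` are invented for illustration.

```python
class MisSpeculation(Exception):
    """A (state, event) pair the designers chose not to handle."""


# Hypothetical transition table listing only the cases the designers handled.
# Everything except the absent ("trans2", "RequestReadWrite") entry is
# invented for illustration.
HANDLED = {
    ("M", "RequestReadWrite"): "I",
    ("M", "Writeback"): "trans1",
    ("trans1", "RequestReadWrite"): "trans1",
    ("trans1", "WritebackOnBus"): "trans2",
    ("trans2", "WritebackDone"): "I",
}


def next_state(state: str, event: str) -> str:
    try:
        return HANDLED[(state, event)]
    except KeyError:
        # Unhandled corner case: report it as a mis-speculation so the
        # system can roll back and re-execute in slow-start mode.
        raise MisSpeculation(f"{event} in state {state}") from None


state = "M"
for event in ("Writeback", "WritebackOnBus"):
    state = next_state(state, event)
print(state)  # trans2
try:
    next_state(state, "RequestReadWrite")
except MisSpeculation as err:
    print("recover:", err)  # recover: RequestReadWrite in state trans2
```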
Outline
• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
– Deadlock
– Avoiding deadlock
• Evaluation
• Conclusions
Two Causes of Deadlock
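[Figure: Endpoint deadlock: P1 and P2 each have input buffers full of requests, and neither can deliver its Response to the other. Switch deadlock: switch1 and switch2 each have buffers full of messages, and Message M1 and Message M2 each wait for space in the other switch.]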
Avoiding Deadlock
• Simple but wasteful solution: full buffering
– But full buffering is rarely needed
• More efficient solution: virtual channels (networks)
• For endpoint deadlock
– Need a virtual network per type of message
• For switch deadlock
– Need some number of virtual channels per virtual network
– Depends on network topology and routing scheme
• A major source of design complexity
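The endpoint-deadlock case and its virtual-network fix can be seen in a small model. This is a hypothetical sketch assuming two endpoints with FIFO input queues of equal capacity, where serving a request produces exactly one response; switch deadlock and virtual channels are not modeled, and `served_requests` is an illustrative name. With one shared queue per endpoint, queues full of requests leave no room for any response; giving responses their own queue (a second virtual network) lets every request drain.

```python
from collections import deque


def served_requests(capacity: int, separate_response_network: bool,
                    max_steps: int = 1000) -> int:
    """Two endpoints whose input queues start full of requests.
    Serving a request sends one response to the other endpoint.
    Returns the number of requests served before progress stops."""
    inq = [deque(["req"] * capacity) for _ in range(2)]
    respq = [deque() for _ in range(2)]  # used only as a separate network
    served = 0
    for _ in range(max_steps):
        progressed = False
        for me in (0, 1):
            other = 1 - me
            # Responses are sinks: consuming one never requires sending.
            for q in (respq[me], inq[me]):
                if q and q[0] == "resp":
                    q.popleft()
                    progressed = True
            # Serving the head request needs buffer space for its response.
            if inq[me] and inq[me][0] == "req":
                dest = respq[other] if separate_response_network else inq[other]
                if len(dest) < capacity:
                    inq[me].popleft()
                    dest.append("resp")
                    served += 1
                    progressed = True
        if not progressed:
            break
    return served


print(served_requests(2, separate_response_network=False))  # 0: deadlock
print(served_requests(2, separate_response_network=True))   # 4: all served
```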
Simplifying Deadlock Avoidance
• Speculate that deadlock won’t occur, despite using less than full buffering and no virtual channels
1) Why is mis-speculation rare?
– Reasonable buffering usually suffices to avoid deadlock
2) How do we detect all mis-speculations?
– Timeout mechanism for cache coherence transactions
3) How do we recover?
– SafetyNet
4) How do we ensure forward progress?
– Slow-start operation for a while after recovery
– Guarantees that deadlock can’t keep recurring
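A timeout detector for the deadlock speculation might look like the sketch below. `TransactionTimeout` and the 10,000-cycle threshold are hypothetical choices, not values from the talk. Because recovery is always safe, a timeout that fires on a slow but live transaction only costs performance, which is why such a simple detector can suffice.

```python
from typing import Dict


class TransactionTimeout:
    """Hypothetical deadlock detector: flags any coherence transaction
    that has been outstanding for at least `timeout_cycles`."""

    def __init__(self, timeout_cycles: int) -> None:
        self.timeout_cycles = timeout_cycles
        self.issued_at: Dict[int, int] = {}  # transaction id -> issue cycle

    def issue(self, txn: int, now: int) -> None:
        self.issued_at[txn] = now

    def complete(self, txn: int) -> None:
        self.issued_at.pop(txn, None)

    def suspect_deadlock(self, now: int) -> bool:
        # A false alarm only triggers an unnecessary recovery, so the
        # threshold can be generous rather than tuned exactly.
        return any(now - t >= self.timeout_cycles
                   for t in self.issued_at.values())


timer = TransactionTimeout(timeout_cycles=10_000)
timer.issue(txn=1, now=0)
timer.issue(txn=2, now=50)
timer.complete(txn=1)
print(timer.suspect_deadlock(now=9_000))   # False
print(timer.suspect_deadlock(now=10_050))  # True: txn 2 timed out
```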
Outline
• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
• Evaluation
– Goals
– Methodology
– Results
• Conclusions
Goals
• Find the mis-speculation rate at which recoveries begin to degrade performance
– Determines whether our simplified snooping protocol and our simplified interconnection network are viable
• Determine whether our simplified directory protocol can usefully speculate on point-to-point ordering
Methodology
• Full-system simulation
– Simics provides full-system functionality
– We added a detailed timing model for the memory system
• Workloads
– Online transaction processing (OLTP) with DB2
– SPECjbb2000 Java middleware
– Apache static web serving
– Slashcode dynamic web serving
– Barnes-Hut scientific simulation
How Rare Must Mis-speculation Be?
We can tolerate high mis-speculation rates; these tolerable rates are much higher than the rates our simplified designs actually incur
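A first-order way to reason about how rare mis-speculation must be: if each recovery discards a roughly fixed amount of work, the slowdown is about one plus the recovery rate times the cycles lost per recovery. This back-of-envelope model is an assumption, not the methodology behind the measured results, and the 100,000-cycle loss per recovery is a hypothetical figure chosen to match the rollback depth quoted for SafetyNet.

```python
def slowdown(recoveries_per_million_cycles: float,
             cycles_lost_per_recovery: float) -> float:
    """First-order model: each recovery wastes a fixed number of cycles,
    so runtime grows by (recovery rate) x (cycles lost per recovery)."""
    rate_per_cycle = recoveries_per_million_cycles / 1_000_000
    return 1.0 + rate_per_cycle * cycles_lost_per_recovery


def max_tolerable_rate(overhead_budget: float,
                       cycles_lost_per_recovery: float) -> float:
    """Recoveries per million cycles that keep the slowdown within budget."""
    return overhead_budget / cycles_lost_per_recovery * 1_000_000


# Hypothetical: each recovery discards about 100,000 cycles of work.
print(f"{slowdown(1.0, 100_000):.2f}")             # 1.10, a 10% slowdown
print(f"{max_tolerable_rate(0.01, 100_000):.2f}")  # 0.10 per million cycles
```

Under this model, halving the work lost per recovery doubles the tolerable mis-speculation rate for the same overhead budget.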
Adaptive Routing with Speculative Ordering
Adaptive routing can provide better performance by routing around congestion, even with mis-speculations
Conclusions
• Simplify multiprocessor design with speculation
– Treat corner cases as mis-speculations & recover from them
• Must be able to ensure that
– Mis-speculations are sufficiently rare
– All mis-speculations are detected
– The system can recover from mis-speculations
– Forward progress is guaranteed in all cases
• Showed how to simplify
– Cache coherence protocols
– Interconnection network deadlock avoidance