1
Lecture 21: Coherence and Interconnection Networks
Papers:
• Flexible Snooping: Adaptive Filtering and Forwarding in Embedded Ring Multiprocessors, UIUC, ISCA-06
• Coherence-Ordering for Ring-Based Chip Multiprocessors, Wisconsin, MICRO-06 (very brief)
• In-Network Cache Coherence, Princeton, MICRO-06
2
Motivation
• CMPs are the standard
• cheaper to build medium size machines
  – 32 to 128 cores
• shared memory, cache coherent
  – easier to program, easier to manage
• cache coherence is a necessary evil
3
Cache coherence solutions
strategy                 | ordered network? | pros     | cons
snoopy broadcast bus     | yes              | simple   | difficult to scale
snoopy embedded ring     | no               | simple   | long latencies
directory based protocol | no               | scalable | indirection, extra hardware
In-Network Directory     | no               | scalable | complexity
4
Ring Interconnect based snoop protocols
• Investigated by Barroso et al. in early nineties
• Why?
  – Short, fast point-to-point link
  – Fewer (data) ports
  – Less complex than packet-switched
  – Simple, distributed arbitration
  – Exploitable ordering for coherence
5
Ring in action
[Figure: the Lazy, Eager, and Oracle schemes on a ring. Labels: R, S, cmp, supplier predictor, request, snoop, response, data.]
Courtesy: UIUC-ISCA06
6
Ring in action
[Figure: the Lazy, Eager, and Oracle schemes on a ring with nodes labeled R and S, compared on latency, snoops, and messages.]
• goal: adaptive schemes that approximate Oracle’s behavior
Courtesy: UIUC-ISCA06
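A toy cost model can make the three metrics concrete. The sketch below is not taken from the UIUC paper. It assumes a single supplier located `distance` hops downstream of the requester, unit hop and snoop delays, and the following simplified behavior: Lazy snoops at each node before forwarding, Eager forwards before snooping and sends a separate trailing reply, and Oracle snoops only at the supplier. The names `ring_cost` and `Cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    latency: int   # time until the supplier has snooped the request
    snoops: int    # snoop operations performed around the ring
    messages: int  # messages that circulate for this request


def ring_cost(scheme, n_nodes, distance, hop=1, snoop=1):
    """Toy cost of one miss whose supplier sits `distance` hops downstream."""
    if not 1 <= distance < n_nodes:
        raise ValueError("supplier must be another node on the ring")
    if scheme == "lazy":
        # Snoop, then forward: nodes up to and including the supplier
        # snoop one after another, so snoop delays add up.
        return Cost(distance * (hop + snoop), distance, 1)
    if scheme == "eager":
        # Forward, then snoop: snoops overlap with forwarding, but every
        # other node snoops and a reply trails the request (assumption).
        return Cost(distance * hop + snoop, n_nodes - 1, 2)
    if scheme == "oracle":
        # Forward only until the supplier, the single node that snoops.
        return Cost(distance * hop + snoop, 1, 1)
    raise ValueError(f"unknown scheme: {scheme}")
```

Under these assumptions, Oracle matches Eager's latency and Lazy's message count while performing a single snoop, which is the behavior the adaptive schemes try to approximate.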
7
Primitive snooping actions
• snoop and then forward
  + fewer messages
• forward and then snoop
  + shorter latency
• forward only
  + fewer snoops
  + shorter latency
  – false negative predictions not allowed
Courtesy: UIUC-ISCA06
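One way a node might pick among the three primitives is sketched below. This policy is an illustrative assumption, not the paper's algorithm: the only constraint taken from the slide is that "forward only" is unsafe when the predictor can produce false negatives. The names `Action` and `choose_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SNOOP_THEN_FORWARD = "snoop and then forward"
    FORWARD_THEN_SNOOP = "forward and then snoop"
    FORWARD_ONLY = "forward only"


def choose_action(predicts_supplier: bool, may_have_false_negatives: bool) -> Action:
    """Pick a primitive for an incoming request (hypothetical policy)."""
    if predicts_supplier:
        # Likely supplier: snoop first so a single message continues.
        return Action.SNOOP_THEN_FORWARD
    if may_have_false_negatives:
        # A "no" cannot be trusted, so the snoop is still required;
        # forwarding first keeps it off the critical path.
        return Action.FORWARD_THEN_SNOOP
    # A trustworthy "no": skip the snoop entirely.
    return Action.FORWARD_ONLY
```

For example, `choose_action(False, False)` returns `Action.FORWARD_ONLY`, while the same negative prediction from a predictor that can miss suppliers returns `Action.FORWARD_THEN_SNOOP`.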
8
Predictor implementation
• Subset
  – associative table: subset of addresses that can be supplied by node
• Superset
  – bloom filter: superset of addresses that can be supplied by node
  – associative table (exclude cache): addresses that recently suffered false positives
• Exact
  – associative table: all addresses that can be supplied by node
  – downgrading: if address has to be evicted from predictor table, corresponding line in node has to be downgraded
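The Superset and Exact predictors above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware design from the paper: the Bloom filter is a counting variant so that lines can be removed, an unbounded Python set stands in for the finite exclude cache, and the Exact table uses LRU replacement. All class and method names are hypothetical.

```python
import hashlib
from collections import OrderedDict


class SupersetPredictor:
    """Counting Bloom filter plus exclude table: no false negatives."""

    def __init__(self, n_counters=64, n_hashes=2):
        self.counters = [0] * n_counters
        self.n_hashes = n_hashes
        self.exclude = set()  # stand-in for the small exclude cache

    def _slots(self, addr):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % len(self.counters)

    def line_gained(self, addr):
        # Drop any stale exclude entry so a real supplier is never hidden.
        self.exclude.discard(addr)
        for slot in self._slots(addr):
            self.counters[slot] += 1

    def line_lost(self, addr):
        # Must only be called for addresses previously gained.
        for slot in self._slots(addr):
            self.counters[slot] -= 1

    def note_false_positive(self, addr):
        self.exclude.add(addr)

    def may_supply(self, addr):
        if addr in self.exclude:
            return False
        return all(self.counters[slot] > 0 for slot in self._slots(addr))


class ExactPredictor:
    """Bounded table of every suppliable address (LRU is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = OrderedDict()

    def line_gained(self, addr):
        """Track addr; return an address whose line must be downgraded, or None."""
        self.table[addr] = True
        self.table.move_to_end(addr)
        if len(self.table) > self.capacity:
            victim, _ = self.table.popitem(last=False)
            return victim  # caller downgrades this line in the node's cache
        return None

    def may_supply(self, addr):
        return addr in self.table
```

The return value of `ExactPredictor.line_gained` models the downgrading rule: when the table overflows, the evicted address is handed back so the caller can downgrade the corresponding cache line and keep the table exact.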