Transparent Dynamic Binding with Fault-Tolerant Cache Coherence Protocol for Chip Multiprocessors
Shuchang Shan † ‡, Yu Hu †, Xiaowei Li †
† Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
‡ Graduate University of Chinese Academy of Sciences (GUCAS)
2
Outline
Introduction
TDB execution model
Experimental results
Conclusion
3
[Figure: two identical out-of-order pipelines (Fetch, Decode/Rename, Issue Queue, Register File, FUs, Reorder Buffer, Writeback/Commit) compared for equality (=), above a memory system of private L1 caches; further panels show leading/trailing instructions with re-execution (EX’) and checking (CHK), leading/trailing threads, and redundant copies A/A’ and B/B’]
Architectural-level Dual Modular Redundancy
– Instruction-level DMR: DIVA [MICRO’99], SHREC [MICRO’04], EDDI [TR’02]
– Thread-level DMR: AR-SMT [FTCS’99], SRT [ISCA’00]
– Core-level DMR: CRTR [ISCA’03], Reunion [MICRO’06], DCC [DSN’07]
For CMP systems, building Core-level DMR makes use of the abundant hardware resources!
4
Core-level Dual Modular Redundancy (DMR)
– Using coupled cores to verify each other’s execution
– Static binding
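How a coupled pair might "verify each other’s execution" can be sketched as follows: each core compresses its retired results into a fingerprint per checking interval, and the pair compares fingerprints at interval boundaries. This is a minimal sketch assuming a hash-based fingerprint; the slides do not specify the comparison mechanism, and the names `fingerprint`, `check_interval` and the trace format are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple

# Hypothetical retired-instruction record: (program counter, result value).
Retired = Tuple[int, int]


def fingerprint(retired: Iterable[Retired]) -> str:
    """Compress a stream of retired results into one digest (assumed scheme)."""
    h = hashlib.sha256()
    for pc, value in retired:
        h.update(pc.to_bytes(8, "little"))
        h.update(value.to_bytes(8, "little", signed=True))
    return h.hexdigest()


def check_interval(master: List[Retired], slave: List[Retired]) -> bool:
    """Return True if master and slave retired identical results in this interval."""
    return fingerprint(master) == fingerprint(slave)


master_trace = [(0x400, 7), (0x404, 42)]
slave_ok = [(0x400, 7), (0x404, 42)]
slave_faulty = [(0x400, 7), (0x404, 43)]  # a single flipped bit in one result

print(check_interval(master_trace, slave_ok))      # True
print(check_interval(master_trace, slave_faulty))  # False: mismatch, trigger recovery
```

Comparing one digest per interval rather than every result keeps inter-core checking traffic small; under static binding the two cores exchanging fingerprints are fixed at design time.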
5
uar-LRU: update the MRU position only after instruction retirement, to prevent wrong-path (WP) memory references from violating the consistency
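The uar-LRU rule can be sketched as an LRU cache set in which a speculative hit only records a pending recency update: the line moves to MRU when the load retires, and the update is discarded if the load is squashed on the wrong path. This is a minimal sketch under stated assumptions: the class `UarLruSet` and its methods are hypothetical, and fills on a miss are installed immediately for simplicity.

```python
from collections import OrderedDict
from typing import Dict


class UarLruSet:
    """One cache set whose LRU order is updated only at instruction retirement."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines: "OrderedDict[int, None]" = OrderedDict()  # first = LRU, last = MRU
        self.pending: Dict[int, int] = {}  # load id -> tag awaiting retirement

    def fill(self, tag: int) -> None:
        """Install a line at MRU, evicting the LRU line if the set is full."""
        if tag in self.lines:
            return
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)
        self.lines[tag] = None

    def speculative_access(self, load_id: int, tag: int) -> bool:
        """Hit check only; the recency update is deferred until retirement."""
        hit = tag in self.lines
        if hit:
            self.pending[load_id] = tag
        return hit

    def retire(self, load_id: int) -> None:
        """The load committed: promote its line to MRU now."""
        tag = self.pending.pop(load_id, None)
        if tag is not None and tag in self.lines:
            self.lines.move_to_end(tag)

    def squash(self, load_id: int) -> None:
        """Wrong-path load: drop its update so the LRU order is unaffected."""
        self.pending.pop(load_id, None)

    def lru_victim(self) -> int:
        """Tag that would be evicted next."""
        return next(iter(self.lines))


s = UarLruSet(ways=2)
s.fill(0xA)
s.fill(0xB)                    # LRU order: A, B
s.speculative_access(1, 0xA)   # wrong-path hit on A
s.squash(1)
print(hex(s.lru_victim()))     # 0xa: the squashed access did not touch recency
s.speculative_access(2, 0xA)
s.retire(2)
print(hex(s.lru_victim()))     # 0xb: the retired access promoted A to MRU
```

Because master and slave may speculate down different wrong paths, letting only retired references steer replacement keeps the replacement state driven by the committed instruction stream, which is the property the slide attributes to uar-LRU.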
23
Master-slave memory consistency violation
– External writes violate the master-slave memory consistency
– Atomicity of master-slave data access behavior
– Lacks scalability as external writes become more frequent
Master-slave input coherence: (a) external writes violate the consistency; (b) the master-slave consistency window in DCC
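Case (a) can be reproduced with a toy model: the master loads a value, an external core writes the same address before the slave performs its redundant load, and the two cores disagree even though no fault occurred. The event format and the `run` function are illustrative only.

```python
from typing import Dict, List, Tuple

# Each event: (actor, op, address, value); the value is ignored for loads.
Event = Tuple[str, str, int, int]


def run(events: List[Event]) -> Tuple[int, int]:
    """Replay events on shared memory; return the (master, slave) loaded values."""
    memory: Dict[int, int] = {0x100: 1}
    loaded: Dict[str, int] = {}
    for actor, op, addr, value in events:
        if op == "ST":
            memory[addr] = value
        else:  # "LD"
            loaded[actor] = memory[addr]
    return loaded["master"], loaded["slave"]


# Atomic master-slave access: no external write between LD1 and LD1'.
atomic = [("master", "LD", 0x100, 0), ("slave", "LD", 0x100, 0),
          ("other", "ST", 0x100, 2)]
# External store ST3 lands inside the master-slave window.
interleaved = [("master", "LD", 0x100, 0), ("other", "ST", 0x100, 2),
               ("slave", "LD", 0x100, 0)]

print(run(atomic))       # (1, 1): outputs match
print(run(interleaved))  # (1, 2): a mismatch is reported although no fault occurred
```

Keeping each master-slave load pair atomic with respect to external writes, as in the consistency window of (b), avoids the false mismatch, but this is exactly the behavior the slide says scales poorly as external writes become more frequent.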
24
Transparent Input Coherence Strategy
– Take advantage of transparent dynamic binding
– Break the atomicity of master-slave data access behavior
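[Figure: timeline of master load LD1, external store ST3 and slave load LD1’ with the checker, showing cache-line states (D, I) before and after optimization]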
25
Outline
Introduction
TDB execution model
Experimental results
Conclusion
26
Experimental Setup
Full-system simulator: Simics + GEMS
Parallel workloads: SPLASH-2
The baseline Dual Modular Redundancy system
– N active cores and another N disabled cores
– Simulates the DMR system in which the slaves work without interfering with the masters
27
The Performance of TDB Proposal
[Figure: normalized runtime (y-axis 0.8 to 1.1) for 4P, 8P, 16P and 32P]
Normalized runtime is 97.2%, 99.8%, 101.2% and 105.4% of the baseline for 4, 8, 16 and 32 cores, respectively
The conservative private-cache ingress rule helps filter out the WP effects
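Reading the normalized values as the ratio of TDB runtime to baseline runtime (an assumption about how the plot is normalized), they convert to percent changes as follows; values below 1.0 mean TDB finished faster than the baseline in that configuration. The names `normalized_runtime` and `overhead_pct` are illustrative.

```python
# Normalized runtime values from the slide, assumed to be TDB runtime / baseline runtime.
normalized_runtime = {4: 0.972, 8: 0.998, 16: 1.012, 32: 1.054}


def overhead_pct(ratio: float) -> float:
    """Percent change in runtime relative to the baseline (negative = faster)."""
    return (ratio - 1.0) * 100.0


for cores, ratio in normalized_runtime.items():
    print(f"{cores:>2} cores: {overhead_pct(ratio):+.1f}%")
```

This prints -2.8%, -0.2%, +1.2% and +5.4% for 4, 8, 16 and 32 cores.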
28
Network Traffic of TDB Proposal
[Figure: normalized network traffic (y-axis 0.8 to 1.1) for 4P, 8P, 16P and 32P]
Total network traffic increases by 5.2%, 3.6%, 1.3% and 2.5% for 4-, 8-, 16- and 32-core CMP systems, respectively
29
Comparison against DCC [DSN’07]
[Figure: two panels, normalized runtime and normalized network traffic, comparing TDB and DCC for 4P, 8P, 16P and 32P; annotated values: 9.2%, 10.4%, 18% and 37.1%]
Transparent Dynamic Binding (TDB): a scalable and flexible Core-level DMR solution!
30
Conclusion
Transparent Dynamic Binding
– Reduces the SoC to the scale of the private caches
Techniques to maintain the consistency
– Consumer-consumer data access pattern
– Victim-buffer-assisted conservative ingress rule
– uar-LRU replacement policy
– Transparent input coherence policy