Cache Coherence “Can we do a better job of supporting cache coherence?” Ross Daly Chan Kim

Apr 01, 2015


Lee Brundage

Cache Coherence

“Can we do a better job of supporting cache coherence?”

Ross Daly, Chan Kim


Definition of CC

• “For any given memory location, at any given moment in time, there is either a single core that may write it (and that may also read it) or some number of cores that may read it.”

• “Data-Value Invariant: the value of a memory location at the start of an epoch is the same as the value of the memory location at the end of its last read-write epoch”

- D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence, volume 6 of Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, May 2011.
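
The first quoted invariant can be checked mechanically. The following is a minimal sketch, not taken from the slides or the primer: it assumes each access permission is recorded as an epoch with a logical start and end time, and the names `Epoch` and `satisfies_swmr` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epoch:
    core: int        # core that holds the permission during the epoch
    start: int       # logical start time (inclusive)
    end: int         # logical end time (exclusive)
    writable: bool   # True for a read-write epoch, False for read-only


def overlaps(a: Epoch, b: Epoch) -> bool:
    """Two half-open intervals overlap if each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def satisfies_swmr(epochs: list) -> bool:
    """Return True if no read-write epoch overlaps an epoch of another core."""
    for i, a in enumerate(epochs):
        for b in epochs[i + 1:]:
            if a.core != b.core and overlaps(a, b) and (a.writable or b.writable):
                return False
    return True


# One writer, then two concurrent readers: allowed.
ok = [Epoch(0, 0, 5, True), Epoch(1, 5, 9, False), Epoch(2, 5, 9, False)]
# A reader overlapping a writer on another core: violates the invariant.
bad = [Epoch(0, 0, 5, True), Epoch(1, 3, 9, False)]
print(satisfies_swmr(ok), satisfies_swmr(bad))  # True False
```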


Goals

• Improve cache coherence performance on multi-core/many-core systems.

• Scale the number of cores to increase performance.

• Scale the number of cores without increasing cache coherence complexity.


Xpoint Cache

• Motivation:


Xpoint: Architecture (2D)

[Figure: typical bus-based architecture vs. Xpoint architecture]


Xpoint: Architecture (3D)


Xpoint: Results

• 29x speedup for a 32-core system

• 45x speedup for a 64-core system

• 2.1 improvement over a 64-core conventional bus
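
To put the speedups in perspective, they can be converted to parallel efficiency (speedup divided by core count). This is a small sketch that assumes the speedups are measured against a single-core baseline, which the slide does not state; `parallel_efficiency` is a hypothetical helper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / cores


# Speedup figures from the slide; the single-core baseline is an assumption.
for cores, speedup in [(32, 29.0), (64, 45.0)]:
    eff = parallel_efficiency(speedup, cores)
    print(f"{cores} cores: efficiency = {eff:.2f}")
# 32 cores: efficiency = 0.91
# 64 cores: efficiency = 0.70
```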


Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Motivation

• Keeping track of every block in the directory entails huge storage requirements.

• A directory cache requires less storage, but it suffers from directory cache misses.

• Most of the accessed blocks (about 75% on average) are private.
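
The storage concern in the first bullet can be made concrete with a rough calculation. The sketch below assumes a full-map directory with one presence bit per core for every block and 64-byte blocks; the slide does not name a directory organization, so both the format and the `full_map_overhead` helper are illustrative assumptions.

```python
def full_map_overhead(cores: int, block_bytes: int, state_bits: int = 0) -> float:
    """Directory bits per block divided by data bits per block."""
    return (cores + state_bits) / (block_bytes * 8)


# Assumed 64-byte blocks; the overhead grows linearly with the core count.
for cores in (16, 64, 256):
    print(cores, "cores:", full_map_overhead(cores, block_bytes=64))
```

Under these assumptions a 64-core system spends one directory bit for every eight data bits, which is why tracking fewer blocks is attractive.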


Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Private vs. Shared Blocks

• Coarse-grain strategy (page granularity)

• The OS detects when a private page must become shared.

• Every newly loaded page starts out private.

• When another processor accesses a block in a private page, the page becomes shared.
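
The classification rules above can be sketched as a toy page table. This is an illustration only: the `PageTable` class, the 4 KiB page size, and the first-touch bookkeeping are assumptions, not the OS mechanism from the paper.

```python
class PageTable:
    """Toy OS page table that classifies pages as private or shared."""

    def __init__(self, page_size: int = 4096):
        self.page_size = page_size
        self.owner = {}       # page number -> first core that touched it
        self.shared = set()   # page numbers that have become shared

    def access(self, core: int, address: int) -> str:
        page = address // self.page_size
        if page in self.shared:
            return "shared"
        first = self.owner.setdefault(page, core)  # new pages start private
        if first != core:
            self.shared.add(page)                  # second core: private -> shared
            return "shared"
        return "private"


pt = PageTable()
print(pt.access(0, 0x1000))  # private (core 0 loads the page)
print(pt.access(0, 0x1040))  # private (same core, same page)
print(pt.access(1, 0x1080))  # shared  (another core touches the page)
print(pt.access(0, 0x1000))  # shared  (the page never reverts here)
```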


Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks


Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Coherence Recovery Mechanism

• Flushing-based Recovery Mechanism

- Flushing all the blocks within a page may increase the miss rate.

• Updating-based Recovery Mechanism
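
The two recovery options can be contrasted in a small sketch. The slide gives no detail on the updating-based mechanism, so the code assumes it keeps the blocks cached and registers them in the directory; the block numbering, `BLOCKS_PER_PAGE`, and both function names are hypothetical.

```python
BLOCKS_PER_PAGE = 64  # assumed: 4 KiB page / 64 B block


def blocks_of(page: int) -> range:
    """Block numbers that belong to the given page."""
    return range(page * BLOCKS_PER_PAGE, (page + 1) * BLOCKS_PER_PAGE)


def flush_recovery(l1: set, page: int) -> int:
    """Evict every cached block of the page; return how many were evicted."""
    victims = {b for b in l1 if b in blocks_of(page)}
    l1.difference_update(victims)
    return len(victims)


def update_recovery(l1: set, directory: dict, page: int, owner: int) -> int:
    """Keep the blocks cached and register them in the directory instead."""
    kept = [b for b in l1 if b in blocks_of(page)]
    for b in kept:
        directory.setdefault(b, set()).add(owner)
    return len(kept)


l1_a = {64, 65, 66, 200}   # blocks 64-66 belong to page 1, block 200 does not
l1_b = set(l1_a)
directory = {}
print(flush_recovery(l1_a, page=1), sorted(l1_a))                       # 3 [200]
print(update_recovery(l1_b, directory, page=1, owner=0), sorted(l1_b))  # 3 [64, 65, 66, 200]
```

In the flushing case the three evicted blocks must be re-fetched on their next use, which is the miss-rate cost noted above.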


Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Results

• Directory caches can avoid tracking about 57% of the accessed blocks.

• Shortens the runtime of parallel applications by 15% while keeping the directory cache size unchanged, or maintains system performance while using directory caches 8 times smaller.


Complexity-Effective Multicore Coherence

• Similarity

- Motivation

- Private and shared blocks

• Difference

- Simplifying the protocol

- Directory-less


Complexity-Effective Multicore Coherence: Simplifying the protocol

• Dynamic write policy

- Write-back vs. write-through

• VIPS cache coherence protocol

- Valid/Invalid – Private/Shared
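
A dynamic write policy driven by the private/shared classification might look like the sketch below. It is a simplified illustration rather than the protocol from the paper: the `Line` and `Sharing` types are hypothetical, the classification is assumed to be supplied from outside, and the write-through is performed immediately although a real design may delay or coalesce it.

```python
from enum import Enum


class Sharing(Enum):
    PRIVATE = "P"
    SHARED = "S"


class Line:
    def __init__(self, sharing: Sharing):
        self.valid = True        # V/I: the only per-line coherence state
        self.sharing = sharing   # P/S: classification supplied from outside
        self.dirty = False
        self.data = 0


def write(line: Line, value: int, llc: dict, addr: int) -> str:
    """Apply the write policy that matches the line's classification."""
    line.data = value
    if line.sharing is Sharing.SHARED:
        llc[addr] = value        # write-through: the LLC is updated right away
        return "write-through"
    line.dirty = True            # write-back: the LLC is updated on eviction
    return "write-back"


llc = {}
print(write(Line(Sharing.PRIVATE), 7, llc, 0x10), llc)  # write-back {}
print(write(Line(Sharing.SHARED), 9, llc, 0x20), llc)   # write-through {32: 9}
```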


Complexity-Effective Multicore Coherence: Directory-less

• Self-invalidation

- Readers are allowed to make unregistered copies of a memory location, as long as they promise to invalidate these at the next synchronization point.

- Does this follow cache coherency?

• Selective flushing

• Write-through at word granularity with a per-word dirty bit
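
The three ideas above fit together as in the following sketch. It assumes that "selective" means only shared lines are flushed and invalidated, and that a line holds four words; `L1Line`, `write_through`, and `synchronize` are hypothetical names, not the paper's implementation.

```python
WORDS_PER_LINE = 4  # assumed line size, in words


class L1Line:
    def __init__(self, words, shared):
        self.words = list(words)                # private copy of the line
        self.dirty = [False] * WORDS_PER_LINE   # one dirty bit per word
        self.shared = shared
        self.valid = True

    def store(self, index, value):
        self.words[index] = value
        self.dirty[index] = True


def write_through(line, llc_words):
    """Merge only the dirty words into the LLC copy, then clear the bits."""
    for i, is_dirty in enumerate(line.dirty):
        if is_dirty:
            llc_words[i] = line.words[i]
            line.dirty[i] = False


def synchronize(l1, llc):
    """At a synchronization point: flush dirty shared words, then self-invalidate."""
    for addr, line in l1.items():
        if line.valid and line.shared:          # private lines are left alone
            write_through(line, llc[addr])
            line.valid = False                  # no invalidation message is needed


# Two cores write different words of the same line.
llc = {0: [0, 0, 0, 0]}
core_a = {0: L1Line(llc[0], shared=True)}
core_b = {0: L1Line(llc[0], shared=True)}
core_a[0].store(0, 11)
core_b[0].store(3, 44)
synchronize(core_a, llc)
synchronize(core_b, llc)
print(llc[0], core_a[0].valid, core_b[0].valid)  # [11, 0, 0, 44] False False
```

Because only dirty words are merged, neither core overwrites the other's update even though both held the whole line.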


Complexity-Effective Multicore Coherence: Simplifying the protocol: Synchronization

• Synchronization relies on data races.

• A core executing an atomic instruction spins locally in its L1 until another core changes the condition.

• In this paper, a core does not send invalidation signals to other cores when it executes a write instruction.

• Solution?
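
The problem raised by the bullets above can be reproduced in a few lines. This is a toy model under stated assumptions: the L1 copy is a plain integer that nothing ever invalidates, the spin is bounded so the example terminates, and re-reading the shared level is shown only as a contrast, not as the answer the slide has in mind.

```python
def spin_on_l1(l1_copy: int, expected: int, max_spins: int) -> bool:
    """Spin on the local copy; it never changes because nothing invalidates it."""
    for _ in range(max_spins):
        if l1_copy == expected:
            return True
    return False


def spin_on_llc(llc: dict, addr: int, expected: int, max_spins: int) -> bool:
    """Spin by re-reading the shared level on every iteration."""
    for _ in range(max_spins):
        if llc[addr] == expected:
            return True
    return False


llc = {0x40: 0}       # lock word, initially 0
l1_copy = llc[0x40]   # core B caches the value 0
llc[0x40] = 1         # core A writes through; no invalidation reaches core B
print(spin_on_l1(l1_copy, expected=1, max_spins=1000))        # False
print(spin_on_llc(llc, 0x40, expected=1, max_spins=1000))     # True
```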


Complexity-Effective Multicore Coherence: Simplifying the protocol: Results

• Outperformed the MESI directory protocol by 4.8%

• Reduced network energy consumption by 14.2%

• Simulated for 15 parallel benchmarks, on 16 cores