http://www.cs.utsa.edu
Achieving Non-Inclusive Cache Performance with Inclusive Caches
Temporal Locality Aware (TLA) Cache Management Policies
Aamer Jaleel, Eric Borch, Malini Bhandaru, Simon Steely Jr., Joel Emer
In International Symposium on Microarchitecture (MICRO), December 2010
Presented by: Yingying Tian
Temporal Locality Hints (TLH)
• conveys the temporal locality of hot blocks in the core caches by sending a hint to the LLC on each core cache hit, so that the LLC updates the replacement state of that block
• Significantly reduces the number of inclusion victims
• The number of requests to the LLC is extremely large and does not scale well with an increasing number of cores (even with filter optimizations)
• Limit study
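A minimal Python sketch of the TLH idea, assuming a single core with one private LRU core cache (`l1`) and an inclusive LLC modeled as one fully associative LRU set. The function `simulate`, the cache sizes, and the trace `a, x0, a, x1, ...` are hypothetical illustrations, not the paper's simulator. The `tlh` flag forwards every core cache hit to the LLC as a hint, and the returned hint count makes the traffic cost visible.

```python
from collections import OrderedDict


def simulate(trace, l1_size=2, llc_size=4, tlh=False):
    """Toy model (hypothetical): one core, private L1, inclusive LLC, all LRU.

    Returns (LLC misses per block, number of TLH hint messages sent).
    """
    l1, llc = OrderedDict(), OrderedDict()   # first key = LRU, last key = MRU
    llc_misses, hints = {}, 0
    for blk in trace:
        if blk in l1:                        # core cache hit
            l1.move_to_end(blk)
            if tlh:                          # TLH: forward the hit to the LLC
                llc.move_to_end(blk)         # inclusion guarantees blk is in the LLC
                hints += 1
            continue
        if blk in llc:                       # core miss, LLC hit
            llc.move_to_end(blk)
        else:                                # LLC miss: fetch from memory
            llc_misses[blk] = llc_misses.get(blk, 0) + 1
            if len(llc) == llc_size:
                victim, _ = llc.popitem(last=False)
                l1.pop(victim, None)         # back-invalidate to keep inclusion
            llc[blk] = True
        if len(l1) == l1_size:               # fill the core cache
            l1.popitem(last=False)
        l1[blk] = True
    return llc_misses, hints


trace = [b for i in range(20) for b in ("a", f"x{i}")]   # a, x0, a, x1, a, x2, ...
base, _ = simulate(trace)
with_tlh, hints = simulate(trace, tlh=True)
print(base["a"], with_tlh["a"], hints)
```

Without hints, `a` is repeatedly evicted from the LLC (and back-invalidated from the core cache) even though it is referenced every other access; with hints it misses in the LLC only once, at the cost of one LLC message per core cache hit.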
Early Core Invalidation (ECI)
• derives the temporal locality of a block before it becomes LRU in the LLC: the LLC chooses the block at the [LRU-1] position and invalidates it in the core caches while keeping it in the LLC
• by observing the core's subsequent requests for that block, the LLC derives its temporal locality
• triggered on each LLC miss
Early Core Invalidation (ECI) cont.
• Early-invalidated block: ECI block
• If the ECI block is hot in some core cache, that core re-requests it (L1 miss but LLC hit), and the LLC moves it back to MRU to preserve its temporal locality
• If the ECI block is not hot (not re-requested, or re-requested only after a long time), it is evicted from the LLC on the next LLC miss to the corresponding set
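• Lower-traffic solution (LLC misses are far less frequent than core cache hits, so far fewer messages are needed)
• Low prediction accuracy (in predicting whether the ECI block is hot in the core caches)
• What if the ECI block is hot, but not hot enough to be re-requested before the next LLC miss to its set?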
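The ECI flow can be sketched with a similar toy model, assuming a single core, a private LRU core cache, and an inclusive LLC modeled as one fully associative LRU set. `simulate` and the `eci` flag are hypothetical names. Two simplifications are assumptions, not taken from the paper: early invalidation is applied only on misses that actually evict a block, and the re-request path is simply an ordinary LLC hit that promotes the block to MRU.

```python
from collections import OrderedDict


def simulate(trace, l1_size=2, llc_size=4, eci=False):
    """Toy model (hypothetical): one core, private L1, inclusive LLC, all LRU.

    Returns (LLC misses per block, number of early invalidations that removed
    a block still resident in the core cache).
    """
    l1, llc = OrderedDict(), OrderedDict()   # first key = LRU, last key = MRU
    llc_misses, early = {}, 0
    for blk in trace:
        if blk in l1:                        # core hit: invisible to the LLC
            l1.move_to_end(blk)
            continue
        if blk in llc:                       # core miss, LLC hit: a re-requested
            llc.move_to_end(blk)             # ECI block moves back to MRU here
        else:
            llc_misses[blk] = llc_misses.get(blk, 0) + 1
            if len(llc) == llc_size:
                victim, _ = llc.popitem(last=False)
                l1.pop(victim, None)         # inclusion: back-invalidate the victim
                if eci and llc:
                    # The old LRU-1 block is now at LRU: invalidate it in the core
                    # cache but keep it in the LLC (assumption: only on evicting misses).
                    eci_blk = next(iter(llc))
                    if l1.pop(eci_blk, None) is not None:
                        early += 1
            llc[blk] = True
        if len(l1) == l1_size:               # fill the core cache
            l1.popitem(last=False)
        l1[blk] = True
    return llc_misses, early


trace = [b for i in range(20) for b in ("a", f"x{i}")]   # a, x0, a, x1, ...
base, _ = simulate(trace)
with_eci, early = simulate(trace, eci=True)
print(base["a"], with_eci["a"], early)
```

In this toy run, the early invalidations replace most of the baseline's LLC misses on `a` with core cache misses that hit in the LLC; each such re-request is how the LLC learns that `a` is still hot.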
Query Based Selection (QBS)
• infers the temporal locality of a block in the LLC by querying the core caches on each LLC miss
• The LLC selects a replacement candidate and queries all core caches to check whether the block is present in any of them.
• Only a block that is not present in any core cache is replaced.
• If the QBS candidate is present in some core cache, the LLC updates its replacement state to MRU, then selects and queries another replacement candidate.
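A minimal sketch of QBS victim selection under the same kind of toy assumptions (one core, a private LRU core cache, an inclusive LLC modeled as one fully associative LRU set). `qbs_victim`, `simulate`, and `max_queries` are hypothetical names; `max_queries` models a controller-imposed cap on queries per LLC miss. What happens when every queried candidate is still resident is an assumption here: the sketch falls back to evicting the current LRU block and back-invalidating it.

```python
from collections import OrderedDict


def qbs_victim(llc, l1, max_queries):
    """Pick an LLC victim, skipping candidates the core cache still holds.

    Returns (victim, number of queries sent to the core cache).
    """
    for q in range(max_queries):
        cand = next(iter(llc))               # current LRU candidate
        if cand not in l1:                   # query answer: not present
            return cand, q + 1
        llc.move_to_end(cand)                # present: promote to MRU, re-select
    return next(iter(llc)), max_queries      # assumption: budget exhausted, take LRU


def simulate(trace, l1_size=2, llc_size=4, max_queries=2):
    """Toy model (hypothetical); max_queries=0 is the baseline inclusive LLC."""
    l1, llc = OrderedDict(), OrderedDict()   # first key = LRU, last key = MRU
    llc_misses, queries = {}, 0
    for blk in trace:
        if blk in l1:
            l1.move_to_end(blk)              # core hit: invisible to the LLC
            continue
        if blk in llc:
            llc.move_to_end(blk)
        else:
            llc_misses[blk] = llc_misses.get(blk, 0) + 1
            if len(llc) == llc_size:
                victim, sent = qbs_victim(llc, l1, max_queries)
                queries += sent
                del llc[victim]
                l1.pop(victim, None)         # removes a live copy only if budget ran out
            llc[blk] = True
        if len(l1) == l1_size:               # fill the core cache
            l1.popitem(last=False)
        l1[blk] = True
    return llc_misses, queries


trace = [b for i in range(20) for b in ("a", f"x{i}")]   # a, x0, a, x1, ...
base, _ = simulate(trace, max_queries=0)
with_qbs, sent = simulate(trace, max_queries=2)
print(base["a"], with_qbs["a"], sent)
```

Because `max_queries=0` degenerates to the baseline, comparing the two policies is a one-argument change; the `sent` count is the query traffic the policy adds.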
Query Based Selection (QBS) Cont.
• The QBS victim selection process is hidden by the memory latency of the LLC miss.
• The cache controller can limit the number of queries issued on an LLC miss.
• Experiments show that sending 2 queries is sufficient to achieve the performance benefits.
• Performs similarly to a non-inclusive cache hierarchy.
• The on-chip communication overhead is extremely large. [not mentioned in the paper]
An example: ... a, b, a, c, a, d, a, e, a, f, a, ...
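One way to see what this access pattern does to a baseline inclusive hierarchy, assuming a 2-entry core cache and a 4-entry LLC, both fully associative with LRU replacement (sizes chosen only for illustration). Because every hit to `a` is filtered by the core cache, the LLC never sees its reuse, `a` ages to LRU, and its eviction back-invalidates the live copy in the core cache. `inclusion_victims` is a hypothetical helper that logs each such event.

```python
from collections import OrderedDict


def inclusion_victims(trace, l1_size=2, llc_size=4):
    """Baseline inclusive hierarchy (LRU everywhere, core hits invisible to the LLC).

    Returns a list of (step, accessed block, back-invalidated block) for every
    LLC eviction that removes a block still resident in the core cache.
    """
    l1, llc = OrderedDict(), OrderedDict()   # first key = LRU, last key = MRU
    victims = []
    for step, blk in enumerate(trace):
        if blk in l1:
            l1.move_to_end(blk)              # hit filtered by the core cache
            continue
        if blk in llc:
            llc.move_to_end(blk)
        else:
            if len(llc) == llc_size:
                old, _ = llc.popitem(last=False)
                if l1.pop(old, None) is not None:   # back-invalidation hits a live block
                    victims.append((step, blk, old))
            llc[blk] = True
        if len(l1) == l1_size:               # fill the core cache
            l1.popitem(last=False)
        l1[blk] = True
    return victims


print(inclusion_victims("a b a c a d a e a f a".split()))   # [(7, 'e', 'a')]
```

The miss on `e` evicts `a` from the LLC while `a` is the hottest block in the core cache, so the next reference to `a` has to go all the way to memory.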
• Benchmarks: 15 benchmarks selected from the SPEC CPU2006 suite based on program behavior (core-cache-fitting, LLC-fitting, and LLC-thrashing; 5 benchmarks of each)