Adaptive Cache Compression for High-Performance Processors
Alaa R. Alameldeen and David A. Wood
Computer Sciences Department, University of Wisconsin-Madison
The increasing performance gap between processors and memory calls for faster memory access.
Cache memories reduce average memory latency.
Cache compression improves the performance of cache memories.
Adaptive cache compression is the theme of this discussion.
Motivation
Cache compression can improve the effectiveness of cache memories (increase effective cache capacity).
Increasing effective cache capacity reduces the miss rate.
Performance will improve!
Adaptive Cache Compression: An Overview
Use the past to predict the future.
How likely is compression to help, hurt, or make no difference to the next reference? Feedback from previous compressions helps decide whether to compress the next write to the cache.
Use a compression information tag stored with each address tag.
32 segments (8 bytes each) in each set.
An uncompressed line comprises 8 segments (at most 4 uncompressed lines per set).
Compressed lines are 1 to 7 segments long.
Maximum number of lines in each set = 8.
Least-recently-used (LRU) lines are evicted.
Compaction may be used to make room for a new line.
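The set organization above can be sketched in a few lines. This is a hypothetical model, not the paper's hardware: it tracks one set as an MRU-ordered list of (tag, size) pairs and evicts LRU lines until a new allocation fits within the 32-segment and 8-line budgets.

```python
# Sketch of one set in a variable-segment compressed cache, using the
# parameters above: 32 data segments of 8 bytes, at most 8 lines per set,
# uncompressed lines take 8 segments, compressed lines take 1-7.

SEGMENTS_PER_SET = 32
MAX_LINES_PER_SET = 8
UNCOMPRESSED_SEGMENTS = 8

class CacheSet:
    def __init__(self):
        self.lines = []  # (tag, size_in_segments), ordered MRU -> LRU

    def used_segments(self):
        return sum(size for _, size in self.lines)

    def allocate(self, tag, size_in_segments):
        """Insert a line as MRU, evicting LRU lines until it fits."""
        assert 1 <= size_in_segments <= UNCOMPRESSED_SEGMENTS
        while (self.used_segments() + size_in_segments > SEGMENTS_PER_SET
               or len(self.lines) >= MAX_LINES_PER_SET):
            self.lines.pop()                           # evict the LRU line
        self.lines.insert(0, (tag, size_in_segments))  # insert as MRU

s = CacheSet()
for t in range(4):
    s.allocate(t, 8)   # four uncompressed lines fill all 32 segments
s.allocate(4, 4)       # a 4-segment compressed line forces an LRU eviction
```

After the last allocation the set holds lines 4, 3, 2, and 1 (line 0, the LRU, was evicted), using 28 of the 32 segments.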
Adaptive Cache Compression: To compress or not to compress?
While compression eliminates L2 misses, it increases the latency of L2 hits (which are more frequent). However, the penalty for an L2 miss is usually large, and the extra latency due to decompression is usually small. Compression helps if:
(avoided L2 misses) x (L2 miss penalty)
> (penalized L2 hits) x (decompression penalty)
Example: For a 5-cycle decompression penalty and a 400-cycle L2 miss penalty, compression wins if it eliminates at least one L2 miss for every 400/5 = 80 penalized L2 hits.
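The break-even inequality above can be checked directly. A minimal sketch, using the slide's example cycle counts (5-cycle decompression, 400-cycle miss penalty) as defaults rather than fixed hardware parameters:

```python
# Break-even check for cache compression: compare cycles saved by
# avoided L2 misses against cycles lost decompressing on L2 hits.

def compression_helps(avoided_misses, penalized_hits,
                      miss_penalty=400, decompression_penalty=5):
    """True if compression saved more cycles than it cost."""
    return (avoided_misses * miss_penalty
            > penalized_hits * decompression_penalty)

# Break-even ratio: one avoided miss pays for 400/5 = 80 penalized hits.
print(compression_helps(avoided_misses=1, penalized_hits=79))  # True
print(compression_helps(avoided_misses=1, penalized_hits=80))  # False
```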
Adaptive Cache Compression: Classification of Cache References
Classifications of hits:
Unpenalized hit (e.g. a reference to address A)
Penalized hit (e.g. a reference to address C)
Avoided miss (e.g. a reference to address E)
Classifications of misses:
Avoidable miss (e.g. a reference to address G)
Unavoidable miss (e.g. a reference to address H)
Adaptive Cache Compression: Hardware used in decision-making
The Global Compression Predictor (GCP) estimates the recent cost or benefit of compression.
On a penalized hit, the controller biases against compression by decrementing the counter (subtracted value = decompression penalty).
On an avoided or avoidable miss, the controller increments the counter by the L2 miss penalty.
The controller consults the GCP when allocating a line in the L2 cache:
Positive value -> compression has helped, so compress.
Negative value -> compression has been penalizing, so don't compress.
The size of the GCP determines its sensitivity to changes. In this paper, a 19-bit counter is used (saturates at 262143 or -262144).
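The GCP update rules above amount to a 19-bit saturating counter. A minimal sketch, again using the example penalties (5 and 400 cycles) as assumed defaults; the treatment of a counter value of exactly zero is not specified on the slide, so this sketch arbitrarily compresses only on strictly positive values:

```python
# Sketch of the saturating Global Compression Predictor (GCP):
# a 19-bit signed counter, decremented on penalized hits and
# incremented on avoided/avoidable misses.

GCP_MAX = 2**18 - 1   # 262143
GCP_MIN = -2**18      # -262144

class GlobalCompressionPredictor:
    def __init__(self):
        self.counter = 0

    def _saturating_add(self, delta):
        self.counter = max(GCP_MIN, min(GCP_MAX, self.counter + delta))

    def penalized_hit(self, decompression_penalty=5):
        self._saturating_add(-decompression_penalty)    # compression hurt

    def avoided_or_avoidable_miss(self, miss_penalty=400):
        self._saturating_add(miss_penalty)              # compression helped

    def should_compress(self):
        return self.counter > 0   # positive -> compression has helped

gcp = GlobalCompressionPredictor()
for _ in range(79):
    gcp.penalized_hit()              # 79 * 5 = 395 cycles of penalty
gcp.avoided_or_avoidable_miss()      # 400 cycles saved: net benefit
print(gcp.should_compress())         # True (counter = 5)
```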
To understand the utility of adaptive compression, two extreme policies (Never compress and Always compress) were compared against it.
'Never' strives to reduce hit latency.
'Always' strives to reduce miss rate.
'Adaptive' strives to optimize both.
Figure: Runtime for the three compression alternatives (normalized to “Never”)
Reported Performance (sensitivity of adaptive compression to benchmark phase changes)
Top: temporal changes in Global Compression Predictor values
Bottom: effective cache size
Review Conclusion
Compressing all compressible cache lines only improves memory-intensive applications; applications with a low miss rate or low compressibility suffer.
Optimizations achieved by the adaptive scheme:
Up to 26% speedup (over the uncompressed scheme) for memory-intensive, highly compressible benchmarks.
Performance degradation for other benchmarks is below 0.4%.
Critiques/Suggestions
Data inconsistency: a 17% performance improvement for memory-intensive commercial workloads is claimed on page 2, but 26% is claimed on page 11.
Miscalculation on page 4: the text states that the sum of the compressed sizes at stack depths 1 through 7 totals 29, yet then claims the miss cannot be avoided "because the sum of compressed sizes exceeds the total number of segments (i.e. 35 > 32)".
All in all, the proposed technique doesn't seem to enhance performance significantly with respect to 'Always'.