Page 1
1Mrinmoy Ghosh
CoolPression: A Hybrid Significance Compression Technique for Reducing Energy in Caches
Mrinmoy Ghosh, Weidong Shi, Hsien-Hsin (Sean) Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
September 15, 2004
Page 2
Hot Caches
[Figure: two pie charts giving each component's percentage share for two processors.]
Alpha 21264:
  Data Cache            14%
  Bus Interface Unit    12%
  Integer Units         16%
  Data Path             32%
  Mem. Controller       19%
  Instruction Cache      7%
ARM 920T:
  I Cache               25%
  D MMU                  5%
  I MMU                  4%
  SysCtl                 3%
  Other                  4%
  Clocks                 4%
  BIU                    8%
  PATag RAM              1%
  CP15                   2%
  ARM 9                 25%
  D Cache               19%
Page 3
Motivation
[Figure: Occurrences of Leading Zeroes for SPECint2000. x-axis: # of Leading Zeroes (1 to 64); y-axis: # of Instances (log scale, 1 to 1,000,000,000); series: Max, Min, Avg.]
Uniform distribution of occurrences of leading zeroes across the 64 bit space
Page 4
Salient Features of CoolPression
● Energy saving at bit granularity
● Compresses both leading 1’s and leading 0’s
● Reuses the most significant byte, minimizing overhead
● CoolPression is a hybrid of two schemes
    ○ Dynamic Zero Compression
    ○ CoolCount Scheme
● Chooses the better scheme dynamically
Page 5
CoolPression Cache
[Figure: SRAM Cell Array with Sense Amps; 32 bits.]
Page 6
CoolPression Cache: DZC
Dynamic Zero Compression Technique [Villa et al. 2000]
[Figure: SRAM Cell Array with Sense Amps; 36 bits, ZIBs.]
Example word: 11111111 00000000 11111111 00000000
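As a rough illustration of the DZC idea on this slide, the Python sketch below encodes one 32-bit word with one zero indicator bit (ZIB) per byte, which is consistent with the 36-bit row shown (32 data bits plus 4 ZIBs). The function names `dzc_encode` and `dzc_decode`, the list used to hold stored bytes, and the byte ordering are assumptions made for this example, not details taken from the slides.

```python
def dzc_encode(word):
    """Return (zibs, stored_bytes) for a 32-bit word.

    zibs is a 4-bit mask; bit i is set when byte i (0 = least significant)
    is all zero. stored_bytes holds only the non-zero bytes, lowest first.
    Hypothetical layout, chosen for illustration only.
    """
    zibs = 0
    stored = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        if byte == 0:
            zibs |= 1 << i          # zero byte: only its ZIB needs to be accessed
        else:
            stored.append(byte)     # non-zero byte: all 8 bits are accessed
    return zibs, stored


def dzc_decode(zibs, stored):
    """Rebuild the 32-bit word from the ZIB mask and the stored bytes."""
    word = 0
    remaining = iter(stored)
    for i in range(4):
        if not (zibs >> i) & 1:
            word |= next(remaining) << (8 * i)
    return word


# The slide's example word: 11111111 00000000 11111111 00000000
zibs, stored = dzc_encode(0xFF00FF00)
bits_accessed = 4 + 8 * len(stored)   # 4 ZIBs plus the non-zero bytes
```

In this simplified model the example word touches 20 bits (4 ZIBs and two non-zero bytes) instead of 32; real savings depend on the circuit details, which the sketch does not capture.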
Page 7
CoolPression Cache
CoolCount Technique
[Figure: SRAM Cell Array with Sense Amps; 36 bits, ZIBs.]
Example word: 11111111 00000000 11111111 00000000
Page 8
CoolPression Cache
CoolCount Technique
Step 1: Read in the first 7 bits and the ZIBs
Step 2a: Read only 32 - count bits and pad with leading zeroes or ones
[Figure: SRAM Cell Array (36 bits, ZIBs) feeding Sense Amps; a CoolCount Circuit with inputs labelled 6 bits and CE Bit drives the Bitline Enable Lines; other labels: Data from Cache, Data Out, 32 - count; widths shown on the diagram: 37, 33, 32.]
Example word: 11111111 00000000 11111111 00000000
Page 9
Counting Leading 0’s And 1’s
[Figure, first build: the input bits feed a Priority Encoder that outputs the “# of Leading Zeroes or Ones”; the intermediate bit values are only partly drawn on this build.]
Input bits: 0 0 0 0 1 1 0 1
Page 10
Counting Leading 0’s And 1’s
[Figure, second build: the input bits feed a Priority Encoder that outputs the “# of Leading Zeroes or Ones”.]
Input bits:        0 0 0 0 1 1 0 1
Intermediate bits: 0 0 0 1 0 1 1
Page 11
Counting Leading 0’s And 1’s
[Figure, third build: the input bits feed a Priority Encoder that outputs the “# of Leading Zeroes or Ones”.]
Input bits:              0 0 0 0 1 1 0 1
Intermediate bits:       0 0 0 1 0 1 1
Priority Encoder output: 0 1 1
Page 12
Counting Leading 0’s And 1’s
[Figure, final build: the input bits feed a Priority Encoder that outputs the “# of Leading Zeroes or Ones”.]
Input bits:                   0 0 0 0 1 1 0 1
Priority Encoder output:      0 1 1
# of Leading Zeroes or Ones:  1 0 0
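The bit rows on these slides are consistent with XOR-ing each pair of adjacent input bits, passing the result to a priority encoder, and adding one to the encoder output. The Python sketch below follows that reading for the 8-bit example; the names `priority_encode` and `leading_run` are hypothetical, and the XOR stage is an inference from the bit values rather than something the slides state.

```python
def priority_encode(bits):
    """Index of the first 1 in an MSB-first list, or len(bits) if there is none."""
    for index, bit in enumerate(bits):
        if bit:
            return index
    return len(bits)


def leading_run(word, width=8):
    """Return (fill_bit, count, transitions) for the leading run of equal bits.

    transitions[i] is 1 where input bit i differs from bit i + 1 (MSB first),
    so the first 1 marks the end of the leading run of zeroes or ones.
    """
    bits = [(word >> (width - 1 - i)) & 1 for i in range(width)]
    transitions = [bits[i] ^ bits[i + 1] for i in range(width - 1)]
    count = priority_encode(transitions) + 1
    return bits[0], count, transitions


# Slide example: 0 0 0 0 1 1 0 1
fill, count, transitions = leading_run(0b00001101)
# transitions == [0, 0, 0, 1, 0, 1, 1]; the encoder reports index 3 (0 1 1),
# and adding one gives a run of 4 (1 0 0) leading zeroes.
```

Because the XOR stage only detects where the bit value changes, the same circuit handles leading zeroes and leading ones; the most significant bit tells which of the two was counted.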
Page 13
Bitline Precharge Enabling Circuit
[Figure: inputs C2, C1, C0 and outputs Y7 to Y0 form the Precharge Enable from the CoolCount Decoder Circuit; each enable gates a Precharge Control Transistor between VDD and the Bitline Precharge of a column of SRAM Cells (bitlines labelled b, wordlines labelled wl).]
Page 14
Read Data From CoolPression Cache
Read in the Count Enable (CE) bit and the first 6 bits of data.
CE == 1?
  Yes: Enable the least significant 64 - count bit lines, read the data from them, and pad it with count leading zeroes or ones.
  No: Read the data for bytes whose ZIB is not set and make the other bytes zero.
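The read flow above can be sketched in Python as follows. The slides do not give the exact bit layout, so this example assumes a 64-bit word whose top 7 bits hold one fill bit followed by a 6-bit count when CE is 1; `read_word`, the constants, and that layout are all assumptions made for illustration.

```python
WIDTH = 64                      # word size used in the flowchart ("64 - count")
COUNT_BITS = 6                  # assumed width of the stored count field
HEADER_BITS = 1 + COUNT_BITS    # assumed header: one fill bit, then the count


def read_word(ce, zibs, cells):
    """Reconstruct a word from the CE bit, the ZIB mask and the raw row bits."""
    if ce:
        # CoolCount path: decode the header, then read only the low bit lines.
        header = cells >> (WIDTH - HEADER_BITS)
        fill = header >> COUNT_BITS
        count = header & ((1 << COUNT_BITS) - 1)
        low_bits = WIDTH - count
        payload = cells & ((1 << low_bits) - 1)
        prefix = (((1 << count) - 1) << low_bits) if fill else 0
        return prefix | payload
    # DZC path: keep bytes whose ZIB is clear, force the others to zero.
    word = 0
    for i in range(WIDTH // 8):
        if not (zibs >> i) & 1:
            word |= cells & (0xFF << (8 * i))
    return word


# 48 leading ones followed by 0x1234, stored under the assumed layout.
row = (((1 << COUNT_BITS) | 48) << (WIDTH - HEADER_BITS)) | 0x1234
value = read_word(1, 0, row)    # 0xFFFFFFFFFFFF1234
```

The bits between the header and the payload are never examined on the CoolCount path, which models the bit lines that stay disabled during the read.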
Page 15
Write Data To CoolPression Cache
Count the number of leading zeroes or ones, and check for bytes which are zero.
Count > Zero Bytes?
  Yes: Set the CE bit to 1, enable the most significant 6 bit lines and the least significant 64 - count bit lines, and write the encoded data to the cache.
  No: Set the CE bit to 0 and write the data to the cache, setting ZIBs where necessary.
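A minimal Python sketch of this write decision is given below. Several details are assumptions, since the slides leave them open: the comparison is taken to be between the leading-run length and the number of bits in zero bytes, the header is assumed to be one fill bit plus a 6-bit count stored in the top bits, and the names `leading_run` and `write_word` are hypothetical.

```python
WIDTH = 64                      # word size used in the flowchart
COUNT_BITS = 6                  # assumed width of the stored count field
HEADER_BITS = 1 + COUNT_BITS    # assumed header: one fill bit, then the count


def leading_run(word):
    """Return (fill_bit, count) for the run of equal bits starting at the MSB."""
    fill = (word >> (WIDTH - 1)) & 1
    count = 1
    while count < WIDTH and ((word >> (WIDTH - 1 - count)) & 1) == fill:
        count += 1
    return fill, count


def write_word(word):
    """Return (ce, zibs, cells) for one word, choosing CoolCount or DZC."""
    fill, count = leading_run(word)
    count = min(count, (1 << COUNT_BITS) - 1)   # cap so the count fits its field
    zibs = 0
    for i in range(WIDTH // 8):
        if ((word >> (8 * i)) & 0xFF) == 0:
            zibs |= 1 << i
    zero_bits = 8 * bin(zibs).count("1")
    # Assumed rule: use CoolCount when the run is longer than the zero-byte
    # bits and long enough to hold the header in otherwise redundant bits.
    if count >= HEADER_BITS and count > zero_bits:
        header = (fill << COUNT_BITS) | count
        payload = word & ((1 << (WIDTH - count)) - 1)
        return 1, 0, (header << (WIDTH - HEADER_BITS)) | payload
    return 0, zibs, word


ce, zibs, cells = write_word(0xFFFFFFFFFFFF1234)   # CoolCount path, count = 48
```

A word such as 0x00AB000000000000 goes the other way in this model: its leading run is only 8 bits long, while seven zero bytes cover 56 bits, so the sketch falls back to DZC with CE set to 0.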
Page 16
Simulation Methodology
● Simulator: SimpleScalar with Wattch
● Benchmarks: SPEC INT 2000
● Power Numbers for Cache Structures: CACTI
● Power Numbers for Priority Encoder: J.S. Wang, C.H. Huang. “High Speed and low power CMOS priority encoders”. IEEE Journal of Solid-State Circuits, 35(10), 2000
For a 64 KB Cache, the Priority Encoder consumes around 0.1% of the Cache Power
Page 17
Results
[Figure: 16K Data Cache. y-axis: Norm. Total Power (0 to 1); x-axis: Bzip2, Crafty, GCC, GZIP, MCF, Parser, Vortex, Vpr, Avg; series: Dcache Base, Dcache CoolCount, Dcache DZC, Dcache CoolPression.]
[Figure: 16K Instruction Cache. y-axis: Norm. Total Power (0.75 to 1); x-axis: Bzip2, Crafty, GCC, GZIP, MCF, Parser, Vortex, Vpr, Avg; series: Icache Base, Icache CoolCount, Icache DZC, Icache CoolPression.]
Page 18
Results
[Figure: 32K Data Cache. y-axis: Norm. Total Power (0 to 1); x-axis: Bzip2, Crafty, GCC, GZIP, MCF, Parser, Vortex, Vpr, Avg; series: Dcache Base, Dcache CoolCount, Dcache DZC, Dcache CoolPression.]
[Figure: 32K Instruction Cache. y-axis: Norm. Total Power (0 to 1); x-axis: Bzip2, Crafty, GCC, GZIP, MCF, Parser, Vortex, Vpr, Avg; series: Icache Base, Icache CoolCount, Icache DZC, Icache CoolPression.]
Page 19
Potential Performance Impact
[Figure: y-axis: IPC (0 to 2.5); x-axis: Crafty, Gcc, Gzip, Mcf, Parser, Twolf, Vortex, VPR, Avg; series: Normal Cache, CoolPression Cache.]
Page 20
Conclusions
● System Transparent Hybrid Zero Compression Scheme
● Bit level and Byte level compressibility used to save power
● Energy Savings of over 35% over baseline cache
● Potential Use at other places where data transfer takes place
Page 21
Thank You