Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti and David A. Wood University of Wisconsin-Madison

Jul 29, 2018

Page 1: Decoupled Compressed Cache - Microarch.org · Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti and David A. Wood ...

Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching

Somayeh Sardashti and David A. Wood University of Wisconsin-Madison


The PowerPoint presentation is available at: http://www.cs.wisc.edu/multifacet/papers/micro13_dcc.pptx


Communication vs. Computation

[Keckler, Micro 2011]

Improving cache utilization is critical for energy-efficiency!

~200X


Compressed Cache: Compress and Compact Blocks
+ Higher effective cache size
+ Small area overhead
+ Higher system performance
+ Lower system energy

Previous work limits compression effectiveness:
- Limited number of tags
- High internal fragmentation
- Energy-expensive re-compaction


Decoupled Compressed Cache (DCC)

Saving system energy by improving LLC utilization through cache compression.

Non-Contiguous Sub-Blocks

Previous work limits compression effectiveness:
- Limited number of tags
- High internal fragmentation
- Energy-expensive re-compaction

Decoupled Super-Blocks


Decoupled Compressed Cache (DCC)

Saving system energy by improving LLC utilization through cache compression.

Outperforms a 2X LLC
1.08X LLC area
14% higher performance
12% lower energy


Outline

Motivation
Compressed caching
Our Proposals: Decoupled compressed cache
Experimental Results
Conclusions


Uncompressed Caching

A fixed one-to-one tag/data mapping

Tags Data


Compressed Caching

Compress cache blocks.

Tags Data

Compact compressed blocks, to make room. Add more tags to increase effective capacity.


Compression

(1) Compression: how to compress blocks?
• There are different compression algorithms.
• Not the focus of this work.
• But which algorithm is used matters!

[Figure: a compressor reduces a 64-byte block to 20 bytes]


Compression Potentials

A high compression ratio potentially yields a large normalized effective cache capacity.

[Figure: labeled values: 1.5, 2.8, 3.9]

Compression Ratio = Original Size / Compressed Size
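As a worked instance of the formula above, the following minimal sketch computes the ratio for the 64-byte block compressed to 20 bytes shown on the Compression slide. The function name is illustrative only and does not come from the talk.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Compression Ratio = Original Size / Compressed Size."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


# The 64-byte block from the Compression slide, compressed to 20 bytes.
ratio = compression_ratio(64, 20)
print(f"{ratio:.1f}")  # prints 3.2
```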

[Figure axes: Cycles to Decompress; Compression Algorithm]

We use C-PACK+Z for the rest of the talk!


Compaction

(2) Compaction: how to store and find blocks?
• Critical to achieve the compression potentials.
• This work focuses on compaction.

Tags Data

Fixed Sized Compressed Cache (FixedC) [Kim, WMPI ’02; Yang, Micro ’02]

Internal Fragmentation!


Compaction

(2) Compaction: how to store and find blocks?

Tags Data

Variable Sized Compressed Cache (VSC) [Alameldeen, ISCA 2002]

Sub-block
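To make the sub-block granularity concrete, here is a small sketch of how many sub-blocks a compressed block occupies and how many bytes are lost to internal fragmentation. It assumes 64-byte blocks and 16-byte sub-blocks (four per uncompressed block); the sub-block size and the function names are assumptions for illustration, not values stated on this slide.

```python
import math

BLOCK_BYTES = 64      # uncompressed block size, as on the Compression slide
SUB_BLOCK_BYTES = 16  # assumption: four sub-blocks per uncompressed block


def sub_blocks_needed(compressed_bytes: int) -> int:
    """Number of whole sub-blocks needed to hold a compressed block."""
    return math.ceil(compressed_bytes / SUB_BLOCK_BYTES)


def internal_fragmentation(compressed_bytes: int) -> int:
    """Bytes allocated but unused in the last sub-block."""
    return sub_blocks_needed(compressed_bytes) * SUB_BLOCK_BYTES - compressed_bytes


# A block compressed to 20 bytes occupies 2 sub-blocks and wastes 12 bytes.
print(sub_blocks_needed(20), internal_fragmentation(20))  # prints 2 12
```

Under this assumption, smaller sub-blocks waste fewer bytes per block but need more metadata to track.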


Previous Compressed Caches


(Limit 1) Limited Tag/Metadata – High Area Overhead by Adding 4X/more Tags

(Limit 2) Internal Fragmentation – Low Cache Capacity Utilization

[Figure: normalized effective capacity of previous compressed caches; labels: 10B, 16B; values: 2.6, 2.3, 2.0, 1.7, 3.1; Potential: 3.9]

Normalized Effective Capacity = LLC Number of Valid Blocks / MAX Number of (Uncompressed) Blocks
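The metric above can be illustrated with a short calculation. The sketch assumes an 8-MB LLC with 64-byte blocks; the count of valid blocks is a made-up number chosen only to show the arithmetic, not a measured result.

```python
def normalized_effective_capacity(valid_blocks: int, max_uncompressed_blocks: int) -> float:
    """LLC number of valid blocks / MAX number of (uncompressed) blocks."""
    return valid_blocks / max_uncompressed_blocks


LLC_BYTES = 8 * 1024 * 1024  # assumption: 8-MB LLC
BLOCK_BYTES = 64             # assumption: 64-byte blocks
max_blocks = LLC_BYTES // BLOCK_BYTES  # 131072 uncompressed blocks

# Hypothetical snapshot: the compressed cache currently holds 262144 valid blocks.
print(normalized_effective_capacity(262144, max_blocks))  # prints 2.0
```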


(Limit 3) Energy-Expensive Re-Compaction


3X higher LLC dynamic energy!

Tags Data

VSC requires energy-expensive re-compaction.

Update B: B needs 2 sub-blocks
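The cost of re-compaction can be sketched with a simplified model. It assumes that the blocks of a set are stored contiguously in order, so that changing the size of one block forces every block stored after it to move. The layout, the names, and the omission of capacity checks and evictions are all simplifications made for illustration.

```python
def vsc_update(layout, block, new_size):
    """Resize `block` in a contiguous layout.

    layout: list of (name, n_sub_blocks) in storage order.
    Returns (new_layout, sub_blocks_moved), where sub_blocks_moved counts
    the sub-blocks of later blocks that must shift to stay contiguous.
    """
    names = [name for name, _ in layout]
    idx = names.index(block)  # raises ValueError if the block is absent
    old_size = layout[idx][1]
    new_layout = list(layout)
    new_layout[idx] = (block, new_size)
    if new_size == old_size:
        return new_layout, 0
    moved = sum(size for _, size in layout[idx + 1:])
    return new_layout, moved


# B grows from 1 to 2 sub-blocks, so C (2 sub-blocks) and D (1) must shift.
layout = [("A", 1), ("B", 1), ("C", 2), ("D", 1)]
new_layout, moved = vsc_update(layout, "B", 2)
print(moved)  # prints 3
```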


Outline

Motivation
Compressed caching
Our Proposals: Decoupled compressed cache
Experimental Results
Conclusions


Decoupled Compressed Cache

(1) Exploiting Spatial Locality
– Low Area Overhead

(2) Decoupling tag/data mapping
– Eliminate energy-expensive re-compaction
– Reduce internal fragmentation

(3) Co-DCC: Dynamically co-compacting super-blocks
– Further reduce internal fragmentation


(1) Exploiting Spatial Locality

Neighboring blocks co-reside in LLC.


89%


(1) Exploiting Spatial Locality

DCC tracks LLC blocks at Super-Block granularity.


4X Tags

Tags Data

2X Tags
Quad (Q): A, B, C, D
Singleton (S): E

Super tag entry for Q: Super-Block Tag, state A, state B, state C, state D

Super Tags

Up to 4X blocks with low area overheads!
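One way to picture super-block tracking is to split a physical address into a super-block number, shared by four neighboring blocks, and a block ID within that super-block. The sketch below assumes 64-byte blocks and four aligned, contiguous blocks per super-block; the constants and names are illustrative, and the set-index bits are left inside the super-block number for simplicity.

```python
BLOCK_BYTES = 64            # assumption: 64-byte cache blocks
BLOCKS_PER_SUPER_BLOCK = 4  # assumption: a quad of aligned neighboring blocks


def split_address(addr: int):
    """Return (super_block_number, block_id) for a byte address."""
    block_number = addr // BLOCK_BYTES
    super_block_number = block_number // BLOCKS_PER_SUPER_BLOCK
    block_id = block_number % BLOCKS_PER_SUPER_BLOCK
    return super_block_number, block_id


# The first four addresses are neighbors and share one super-block number;
# the fifth falls into the next super-block.
for addr in (0x1000, 0x1040, 0x1080, 0x10C0, 0x1100):
    print(hex(addr), split_address(addr))
```

Because the four neighbors share a single tag, only the per-block state has to be replicated, which is where the low area overhead comes from.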


(2) Decoupling tag/data mapping

DCC decouples mapping to eliminate re-compaction.

[Figure: Super Tags and data array holding Quad (Q): A, B, C, D and Singleton (S): E; flexible allocation on Update B]


(2) Decoupling tag/data mapping


Back pointers identify the owner block of each sub-block.
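A minimal sketch of this idea follows: each sub-block slot carries a back pointer (Tag ID, Blk ID), so a block's sub-blocks can sit in any free slots, and growing a block does not move its neighbors. The class and method names, the first-fit allocation policy, and the error raised when the set is full are assumptions for illustration rather than the design in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackPointer:
    tag_id: int  # which super tag owns this sub-block
    blk_id: int  # which block within that super-block


class DecoupledSet:
    """One cache set: a back pointer per sub-block slot (None means free)."""

    def __init__(self, n_sub_blocks: int):
        self.back_pointers: List[Optional[BackPointer]] = [None] * n_sub_blocks

    def allocate(self, tag_id: int, blk_id: int, n: int) -> List[int]:
        free = [i for i, bp in enumerate(self.back_pointers) if bp is None]
        if len(free) < n:
            raise MemoryError("not enough free sub-blocks (a real cache would evict)")
        for i in free[:n]:
            self.back_pointers[i] = BackPointer(tag_id, blk_id)
        return free[:n]

    def release(self, tag_id: int, blk_id: int) -> None:
        owner = BackPointer(tag_id, blk_id)
        for i, bp in enumerate(self.back_pointers):
            if bp == owner:
                self.back_pointers[i] = None

    def update(self, tag_id: int, blk_id: int, n: int) -> List[int]:
        """Re-allocate a block with a new size; other blocks stay in place."""
        self.release(tag_id, blk_id)
        return self.allocate(tag_id, blk_id, n)


s = DecoupledSet(8)
s.allocate(0, 0, 1)              # A -> slot 0
s.allocate(0, 1, 1)              # B -> slot 1
c_slots = s.allocate(0, 2, 2)    # C -> slots 2 and 3
print(s.update(0, 1, 2))         # B now needs 2 sub-blocks: prints [1, 4]
```

In this example B ends up in non-contiguous slots 1 and 4 while C stays in slots 2 and 3.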

[Figure: Super Tags, data array, and Back Pointers for Quad (Q): A, B, C, D and Singleton (S): E; each back pointer holds a Tag ID and a Blk ID]


(3) Co-compacting super-blocks

Co-DCC dynamically co-compacts super-blocks, reducing internal fragmentation.

A sub-block

Quad (Q): A, B, C, D


Outline

Motivation
Compressed caching
Our Proposals: Decoupled compressed cache
Experimental Results
Conclusions


Experimental Methodology

Integrated DCC with AMD Bulldozer Cache.
– We model the timing and allocation constraints of sequential regions at LLC in detail.
– No need for an alignment network.

Verilog implementation and synthesis of the tag match and sub-block selection logic.
– One additional cycle of latency due to sub-block selection.


Experimental Methodology

Full-system simulation with a simulator based on GEMS.

Wide range of applications with different levels of cache sensitivity:
– Commercial workloads: apache, jbb, oltp, zeus
– Spec-OMP: ammp, applu, equake, mgrid, wupwise
– Parsec: blackscholes, canneal, freqmine
– Spec 2006 mixes (m1-m8): bzip2, libquantum-bzip2, libquantum, gcc, astar-bwaves, cactus-mcf-milc-bwaves, gcc-omnetpp-mcf-bwaves-lbm-milc-cactus-bzip, omnetpp-lbm

Cores         Eight OOO cores, 3.2 GHz
L1I$/L1D$     Private, 32-KB, 8-way
L2$           Private, 256-KB, 8-way
L3$           Shared, 8-MB, 16-way, 8 banks
Main Memory   4GB, 16 Banks, 800 MHz bus frequency DDR3


Effective LLC Capacity

Components            FixedC/VSC-2X   DCC     Co-DCC
Tag Array             6.3%            2.1%    11.3%
Back Pointer Array    0               4.4%    5.4%
(De-)Compressors      1.8%            1.8%    1.8%
Total Area Overhead   8.1%            8.3%    18.5%

[Figure: Normalized LLC Area versus Normalized Effective LLC Capacity for Baseline, 2X Baseline, FixedC, VSC, DCC, and Co-DCC]



(Co-)DCC Performance

[Figure: performance chart; labeled values: 0.93, 0.96, 0.95, 0.90, 0.86]

(Co-)DCC boost system performance significantly.


(Co-)DCC Energy Consumption

[Figure: energy chart; labeled values: 0.93, 0.96, 0.97, 0.91, 0.88]

(Co-)DCC reduce system energy by reducing the number of accesses to main memory.


Summary

Analyze the limits of compressed caching
• Limited number of tags
• Internal fragmentation
• Energy-expensive re-compaction

Decoupled Compressed Cache
• Improving performance and energy of compressed caching
• Decoupled super-blocks
• Non-contiguous sub-blocks

Co-DCC further reduces internal fragmentation

Practical designs [details in the paper]


Backup

(De-)Compression overhead
DCC data array organization with AMD Bulldozer
DCC Timing
DCC Lookup
Applications
Co-DCC design
LLC effective capacity
LLC miss rate
Memory dynamic energy
LLC dynamic energy


(De-)Compression Overhead

Parameters               Compressor   Decompressor
Pipeline Depth           6            2
Latency (cycles)         16           9
Area (mm²)               0.016        0.016
Power Consumption (mW)   25.84        19.01


DCC Data Array Organization AMD Bulldozer

A0: uncompressed; B1 and C2 are compressed to 2 sub-blocks

[Figure: DCC data array organization with SR0 to SR3, each with its own address input (Set Addr, SR0 Addr to SR3 Addr), A and B phase flops, and a Read Data output; sub-blocks shown: A0.0 to A0.3, B1.0, B1.1, C3.0, C3.1]


DCC Timing


DCC Lookup

1. Access Super Tags and Back Pointers in parallel
2. Find the matched Back Pointers
3. Read corresponding sub-blocks and decompress
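The three lookup steps can be sketched in software as follows. This is a simplified model under stated assumptions: super tags are a list of names, back pointers are (tag_id, blk_id) tuples or None, sub-blocks of a block are concatenated in data-array order, and `decompress` is an identity placeholder standing in for the real decompressor. Hardware performs step 1 in parallel; the sketch does it sequentially.

```python
def decompress(payload: bytes) -> bytes:
    """Placeholder: stands in for the real decompressor (assumption)."""
    return payload


def dcc_lookup(super_tags, back_pointers, data, super_block, blk_id):
    """Return the block's data, or None on a miss."""
    # Step 1: access super tags and back pointers.
    if super_block not in super_tags:
        return None
    tag_id = super_tags.index(super_block)
    # Step 2: find the back pointers that match (tag_id, blk_id).
    matched = [i for i, bp in enumerate(back_pointers) if bp == (tag_id, blk_id)]
    if not matched:
        return None  # super-block is tracked, but this block is not present
    # Step 3: read the corresponding sub-blocks and decompress.
    compressed = b"".join(data[i] for i in matched)
    return decompress(compressed)


# Quad Q holds blocks A (0) through D (3); singleton S holds E (0).
super_tags = ["Q", "S"]
back_pointers = [(0, 0), (0, 2), (1, 0), (0, 2), None]
data = [b"A0", b"C0", b"E0", b"C1", b""]

print(dcc_lookup(super_tags, back_pointers, data, "Q", 2))  # prints b'C0C1'
print(dcc_lookup(super_tags, back_pointers, data, "Q", 3))  # prints None
```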

[Figure: lookup of block C (Read C) through the Super Tags (Quad (Q): A, B, C, D; Singleton (S): E), Back Pointers, and Data array]


Applications


Spec2006 (m1-m8)

bzip2, libquantum-bzip2, libquantum, gcc, astar-bwaves, cactus-mcf-milc-bwaves, gcc-omnetpp-mcf-bwaves-lbm-milc-cactus-bzip, omnetpp-lbm

[Figure: applications classified as Sensitive to Cache Capacity and Latency, Sensitive to Cache Capacity, Cache Insensitive, or Sensitive to Cache Latency]


Co-DCC Design

[Figure: Co-DCC design. Data array: Sub-blocks 0 to 7 holding the sub-blocks of A0, A1, and A2 (A0.0, A0.1, A1, A2.0, A2.1, A2.2), listed as A: <A2, A1, A0>, with markers A0-Begin, A1-Begin, and A-END. Back pointer fields: Tag ID, Sharers. Tag entry fields: Super-Block Tag; Cstate0 to Cstate3; Comp0 to Comp3; Begin0 to Begin3; END. Field widths shown: 1b, 3b, 4b, 7b.]


LLC Effective Cache Capacity

[Figure: Normalized LLC Effective Capacity; y-axis from 1.0 to 4.0]


LLC Miss Rate

[Figure: Normalized LLC Miss Rate; y-axis from 0.20 to 1.00]


Memory Dynamic Energy

[Figure: Normalized Memory Dynamic Energy; y-axis from 0.40 to 1.00]


LLC Dynamic Energy

[Figure: Normalized LLC Dynamic Energy; y-axis from 0 to 6]