Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency
Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry, Phillip B. Gibbons, Michael A. Kozuch

Linearly Compressed Pages: A Main Memory Compression ...omutlu/pub/linearly... · Executive Summary 2 Main memory is a limited shared resource Observation: Significant data redundancy

May 29, 2020

Executive Summary

• Main memory is a limited shared resource
• Observation: Significant data redundancy
• Idea: Compress data in main memory
• Problem: How to avoid inefficiency in address computation?
• Solution: Linearly Compressed Pages (LCP): fixed-size cache line granularity compression
1. Increases memory capacity (62% on average)
2. Decreases memory bandwidth consumption (24%)
3. Decreases memory energy consumption (4.9%)
4. Improves overall performance (13.9%)

Potential for Data Compression

Significant redundancy in in-memory data:

0x00000000 0x0000000B 0x00000003 0x00000004 …

How can we exploit this redundancy?

• Main memory compression helps
• Provides effect of a larger memory without making it physically larger

Challenges in Main Memory Compression

1. Address Computation
2. Mapping and Fragmentation

Challenge 1: Address Computation

Uncompressed Page: cache lines L0, L1, L2, ..., LN-1 (64B each) at address offsets 0, 64, 128, ..., (N-1)*64

Compressed Page: cache lines L0, L1, L2, ..., LN-1 at address offsets 0, ?, ?, ..., ?
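The difference between the two offset rows can be sketched in a few lines of Python. This is only an illustration: the function names and the per-line sizes in `example_sizes` are hypothetical and do not come from the slides.

```python
from typing import Sequence

LINE_SIZE = 64  # bytes per uncompressed cache line (from the slide)


def uncompressed_offset(index: int) -> int:
    """Offset of cache line `index` in an uncompressed page: a single multiply."""
    return index * LINE_SIZE


def variable_size_offset(index: int, compressed_sizes: Sequence[int]) -> int:
    """Offset of line `index` when each line has its own compressed size.

    The sizes of all earlier lines must be known and summed, which is the
    extra work hidden behind the "?" offsets on the slide.
    """
    return sum(compressed_sizes[:index])


# Hypothetical per-line compressed sizes in bytes (not from the source).
example_sizes = [8, 20, 64, 12]

print(uncompressed_offset(2))                  # 128
print(variable_size_offset(2, example_sizes))  # 28
```

With variable sizes, the offset of a line depends on every line before it, so the lookup cost grows with the line index instead of staying constant.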

Challenge 2: Mapping & Fragmentation

[Figure: a virtual page (4KB) at a virtual address maps to a physical page of unknown size (? KB) at a physical address, leading to fragmentation]

Outline

• Motivation & Challenges

• Shortcomings of Prior Work

• LCP: Key Idea

• LCP: Implementation

• Evaluation

• Conclusion and Future Work

Key Parameters in Memory Compression

• Compression Ratio
• Address Comp. Latency
• Decompression Latency
• Complexity and Cost

Shortcomings of Prior Work

Compression Mechanisms | Compression Ratio | Address Comp. Latency | Decompression Latency | Complexity and Cost

IBM MXT [IBM J.R.D. ’01]: 2X, 64 cycles

Shortcomings of Prior Work (2)

Compression Mechanisms | Compression Ratio | Address Comp. Latency | Decompression Latency | Complexity and Cost

IBM MXT [IBM J.R.D. ’01]
Robust Main Memory Compression [ISCA’05]

Shortcomings of Prior Work (3)

Compression Mechanisms | Compression Ratio | Address Comp. Latency | Decompression Latency | Complexity and Cost

IBM MXT [IBM J.R.D. ’01]
Robust Main Memory Compression [ISCA’05]
LCP: Our Proposal

Linearly Compressed Pages (LCP): Key Idea

[Figure: an uncompressed page (4KB: 64*64B cache lines) is compressed 4:1 into 1KB of compressed data, with a fixed compressed size for every cache line; the offsets 128 and 32 are marked]

LCP effectively solves challenge 1: address computation
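A minimal Python sketch of why a fixed compressed size makes address computation simple again. The helper name `lcp_line_offset` is illustrative only; the 16-byte slot follows from the slide's 4:1 compression of 64B lines.

```python
PAGE_SIZE = 4096                         # uncompressed page size from the slide
LINE_SIZE = 64                           # uncompressed cache line size
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE  # 64 lines per page


def lcp_line_offset(index: int, compressed_line_size: int) -> int:
    """Offset of cache line `index` when every line uses the same slot size."""
    if not 0 <= index < LINES_PER_PAGE:
        raise IndexError("cache line index out of range")
    return index * compressed_line_size


# 4:1 compression: each 64B line occupies a 16B slot.
compressed_line_size = LINE_SIZE // 4

print(lcp_line_offset(2, compressed_line_size))  # 32
print(LINES_PER_PAGE * compressed_line_size)     # 1024
```

The third line sits at offset 128 in the uncompressed page and at offset 32 in the compressed one, which appears to be what the 128 and 32 labels in the figure refer to.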

LCP: Key Idea (2)

[Figure: the uncompressed page (4KB: 64*64B) is compressed 4:1 into compressed data (1KB), followed by metadata (M, 64B) and exception storage (E); the metadata holds an index (idx) that points to an exception slot such as E0]
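One way to picture the layout is the following Python sketch. It assumes the regions are laid out in the order compressed data, metadata, exception storage, and that the metadata records per line whether it is an exception and which exception slot it uses. The class `LcpPage` and its field names are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LINES_PER_PAGE = 64  # 4KB page of 64B lines
LINE_SIZE = 64       # an exception is stored uncompressed
METADATA_SIZE = 64   # metadata region size from the slide


@dataclass
class LcpPage:
    slot_size: int  # fixed compressed size per cache line, in bytes
    # Assumed metadata contents: one exception flag and one slot index per line.
    is_exception: List[bool] = field(
        default_factory=lambda: [False] * LINES_PER_PAGE)
    exception_index: List[int] = field(
        default_factory=lambda: [0] * LINES_PER_PAGE)

    def line_location(self, index: int) -> Tuple[int, int]:
        """Return (offset within the page, size in bytes) for cache line `index`."""
        compressed_region = LINES_PER_PAGE * self.slot_size
        if self.is_exception[index]:
            exception_base = compressed_region + METADATA_SIZE
            return (exception_base + self.exception_index[index] * LINE_SIZE,
                    LINE_SIZE)
        return (index * self.slot_size, self.slot_size)


page = LcpPage(slot_size=16)
page.is_exception[5] = True   # line 5 did not fit into its 16B slot
page.exception_index[5] = 0   # it lives in exception slot E0

print(page.line_location(2))  # (32, 16)
print(page.line_location(5))  # (1088, 64)
```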

But, wait …

[Figure: the uncompressed page (4KB: 64*64B) and its 4:1 compressed form with compressed data (1KB), metadata (M), and exception storage (E)]

How to avoid 2 accesses (one for the metadata, one for the data)?

Metadata (MD) cache

Key Ideas: Summary

• Fixed compressed size per cache line
• Metadata (MD) cache

Outline

• Motivation & Challenges

• Shortcomings of Prior Work

• LCP: Key Idea

• LCP: Implementation

• Evaluation

• Conclusion and Future Work

LCP Overview

• Page Table entry extension
  – compression type and size (fixed encoding)
• OS support for multiple page sizes
  – 4 memory pools (512B, 1KB, 2KB, 4KB)
• Handling uncompressible data
• Hardware support
  – memory controller logic
  – metadata (MD) cache

Page Table Entry Extension

[Figure: a page table entry extended with the fields c-bit (1b), c-type (3b), c-size (2b), and c-base (3b)]

• c-bit (1b) – compressed or uncompressed page

• c-type (3b) – compression encoding used

• c-size (2b) – LCP size (e.g., 1KB)

• c-base (3b) – offset within a page
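The four fields add up to 9 bits. The Python sketch below packs and unpacks them; the bit order (c-bit in the highest bit, c-base in the lowest) and the function names are assumptions made for illustration, since the slide gives only the field widths.

```python
from typing import Tuple

# Field widths from the slide; the bit positions are an assumed layout.
C_BASE_BITS = 3
C_SIZE_BITS = 2
C_TYPE_BITS = 3


def pack_pte_extension(c_bit: int, c_type: int, c_size: int, c_base: int) -> int:
    """Pack the four fields into a 9-bit integer."""
    assert c_bit in (0, 1)
    assert 0 <= c_type < (1 << C_TYPE_BITS)
    assert 0 <= c_size < (1 << C_SIZE_BITS)
    assert 0 <= c_base < (1 << C_BASE_BITS)
    value = c_bit
    value = (value << C_TYPE_BITS) | c_type
    value = (value << C_SIZE_BITS) | c_size
    value = (value << C_BASE_BITS) | c_base
    return value


def unpack_pte_extension(value: int) -> Tuple[int, int, int, int]:
    """Inverse of pack_pte_extension: returns (c_bit, c_type, c_size, c_base)."""
    c_base = value & ((1 << C_BASE_BITS) - 1)
    value >>= C_BASE_BITS
    c_size = value & ((1 << C_SIZE_BITS) - 1)
    value >>= C_SIZE_BITS
    c_type = value & ((1 << C_TYPE_BITS) - 1)
    value >>= C_TYPE_BITS
    return (value & 1, c_type, c_size, c_base)


packed = pack_pte_extension(c_bit=1, c_type=2, c_size=1, c_base=4)
print(packed)                        # 332
print(unpack_pte_extension(packed))  # (1, 2, 1, 4)
```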

Physical Memory Layout

[Figure: physical memory divided into 4KB, 2KB, 1KB, and 512B pages. Page table entries hold the base addresses PA0, PA1, and PA2; with the values 4 and 1, the compressed pages are located at PA1 + 512*4 and PA2 + 512*1.]
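The address arithmetic in the figure can be written as a short Python sketch. The function name and the example base address are hypothetical; the 512-byte unit and the 3-bit range of c-base come from the slides.

```python
SUBPAGE_UNIT = 512  # c-base counts in 512-byte units, as in "PA1 + 512*4"
C_BASE_LIMIT = 8    # c-base is a 3-bit field, so 8 * 512B spans one 4KB region


def compressed_page_address(pa: int, c_base: int) -> int:
    """Start address of a compressed page, given the page table's base address."""
    if not 0 <= c_base < C_BASE_LIMIT:
        raise ValueError("c-base must fit in 3 bits")
    return pa + SUBPAGE_UNIT * c_base


pa1 = 0x10000  # hypothetical physical base address
print(hex(compressed_page_address(pa1, 4)))  # 0x10800
```

Eight 512-byte steps cover exactly 4KB, which is consistent with c-base being described as an offset within a page.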

Memory Request Flow

1. Initial Page Compression
2. Cache Line Read
3. Cache Line Writeback

Memory Request Flow (2)

[Figure, animated in three steps: 1. Initial Page Compression (1/3); 2. Cache Line Read (2/3); 3. Cache Line Writeback (3/3). Components shown: Processor, Core, TLB, Last-Level Cache, Memory Controller, Compress/Decompress, MD Cache, DRAM, Disk. Labels: 4KB, 1KB, 2KB, LD, $Line.]

Handling Page Overflows

• Happens after writebacks, when all slots in the exception storage are already taken
• Two possible scenarios:
  – Type-1 overflow: requires larger physical page size (e.g., 2KB instead of 1KB)
  – Type-2 overflow: requires decompression and full uncompressed physical page (e.g., 4KB)
• Happens infrequently: once per ~2M instructions

[Figure: a cache line ($ line) written to a page made of metadata (M), compressed data, and exception storage (E0, E1, E2)]
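The two scenarios can be illustrated with a rough Python sketch. The decision rule here is an assumption made for the example: a page needs room for 64 fixed-size slots, 64 bytes of metadata, and one 64-byte slot per exception; it moves to the next compressed pool that fits (type-1) or falls back to an uncompressed 4KB page (type-2). The slides do not spell out the exact rule, and `classify_writeback` is a hypothetical name.

```python
from typing import Tuple

LINES_PER_PAGE = 64
LINE_SIZE = 64
METADATA_SIZE = 64
COMPRESSED_POOLS = (512, 1024, 2048)  # compressed page sizes from the slides
UNCOMPRESSED_PAGE = 4096


def classify_writeback(page_size: int, slot_size: int,
                       exceptions: int) -> Tuple[str, int]:
    """Classify a page after a writeback that leaves it with `exceptions`.

    Returns (kind, physical page size to use).
    """
    needed = (LINES_PER_PAGE * slot_size + METADATA_SIZE
              + exceptions * LINE_SIZE)
    if needed <= page_size:
        return ("no overflow", page_size)
    for pool in COMPRESSED_POOLS:
        if pool > page_size and needed <= pool:
            return ("type-1", pool)       # larger compressed page suffices
    return ("type-2", UNCOMPRESSED_PAGE)  # decompress into a full 4KB page


print(classify_writeback(1024, 8, 7))    # ('no overflow', 1024)
print(classify_writeback(1024, 8, 8))    # ('type-1', 2048)
print(classify_writeback(2048, 16, 16))  # ('type-2', 4096)
```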

Compression Algorithms

• Key requirements:
  – Low hardware complexity
  – Low decompression latency
  – High effective compression ratio
• Frequent Pattern Compression [ISCA’04]
  – Uses simplified dictionary-based compression
• Base-Delta-Immediate Compression [PACT’12]
  – Uses low dynamic range in the data

Base-Delta Encoding [PACT’12]

32-byte Uncompressed Cache Line: 0xC04039C0 0xC04039C8 0xC04039D0 … 0xC04039F8

12-byte Compressed Cache Line: Base 0xC04039C0, followed by 1-byte deltas 0x00 0x08 0x10 … 0x38

20 bytes saved

• Fast Decompression: vector addition
• Simple Hardware: arithmetic and comparison
• Effective: good compression ratio

BDI [PACT’12] has two bases:
1. zero base (for narrow values)
2. arbitrary base (first non-zero value in the cache line)
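The example can be reproduced with a small Python sketch of single-base base-delta encoding. It is a simplification: it uses the first value as the base and unsigned 1-byte deltas, and it leaves out the zero base that BDI adds. The function names are illustrative. The sizes assume eight 4-byte values, which matches the slide's 32-byte line and 12-byte result (4-byte base plus eight 1-byte deltas).

```python
from typing import List, Optional, Tuple

VALUE_BYTES = 4  # eight 4-byte values make up the 32-byte line
DELTA_BYTES = 1  # each delta is stored in 1 byte


def base_delta_compress(values: List[int]) -> Optional[Tuple[int, List[int]]]:
    """Return (base, deltas) if every delta fits in DELTA_BYTES, else None."""
    base = values[0]
    deltas = [v - base for v in values]
    limit = 1 << (8 * DELTA_BYTES)
    if any(not 0 <= d < limit for d in deltas):
        return None  # the line is not compressible with this base and width
    return base, deltas


def base_delta_decompress(base: int, deltas: List[int]) -> List[int]:
    """Decompression is one addition per value (a vector addition in hardware)."""
    return [base + d for d in deltas]


line = [0xC04039C0 + 8 * i for i in range(8)]  # 0xC04039C0 ... 0xC04039F8
base, deltas = base_delta_compress(line)
compressed_size = VALUE_BYTES + DELTA_BYTES * len(deltas)

print(hex(base), [hex(d) for d in deltas])
print(compressed_size)                           # 12
print(len(line) * VALUE_BYTES - compressed_size)  # 20
```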

LCP-Enabled Optimizations

• Memory bandwidth reduction:

[Figure: four 64B cache lines; 1 transfer instead of 4]

• Zero pages and zero cache lines
  – Handled separately in TLB (1-bit) and in metadata (1-bit per cache line)

Outline

• Motivation & Challenges

• Shortcomings of Prior Work

• LCP: Key Idea

• LCP: Implementation

• Evaluation

• Conclusion and Future Work

Methodology

• Simulator: x86 event-driven based on Simics
• Workloads (32 applications)
  – SPEC2006 benchmarks, TPC, Apache web server
• System Parameters
  – L1/L2/L3 cache latencies from CACTI [Thoziyoor+, ISCA’08]
  – 512KB - 16MB L2 caches
  – DDR3-1066, 1 memory channel
• Metrics
  – Performance: Instructions per cycle, weighted speedup
  – Capacity: Effective compression ratio
  – Bandwidth: Bytes per kilo-instruction (BPKI)
  – Energy: Memory subsystem energy

Evaluated Designs

Design | Description
Baseline | Baseline (no compression)
RMC | Robust main memory compression [ISCA’05] (RMC) and FPC [ISCA’04]
LCP-FPC | LCP framework with FPC
LCP-BDI | LCP framework with BDI [PACT’12]
LZ | Lempel-Ziv compression (per page)

Effect on Memory Capacity

32 SPEC2006, databases, web workloads, 2MB L2 cache

[Figure: compression ratio by design: Baseline 1.00, RMC 1.59, LCP-FPC 1.52, LCP-BDI 1.62, LZ 2.60]

LCP-based designs achieve average compression ratios competitive with prior work

Effect on Bus Bandwidth

32 SPEC2006, databases, web workloads, 2MB L2 cache

[Figure: normalized BPKI by design (lower is better): Baseline 1.00, RMC 0.79, LCP-FPC 0.80, LCP-BDI 0.76]

LCP-based designs significantly reduce bandwidth (24%) (due to data compression)

Effect on Performance

[Figure: performance improvement (axis from 0% to 15%) of RMC, LCP-FPC, and LCP-BDI for 1-core, 2-core, and 4-core systems]

LCP-based designs significantly improve performance over RMC

Effect on Memory Subsystem Energy

32 SPEC2006, databases, web workloads, 2MB L2 cache

[Figure: normalized energy by design (lower is better): Baseline 1.00, RMC 1.06, LCP-FPC 0.97, LCP-BDI 0.95]

LCP framework is more energy efficient than RMC

Effect on Page Faults

32 SPEC2006, databases, web workloads, 2MB L2 cache

[Figure: normalized number of page faults by DRAM size, Baseline vs. LCP-BDI. Labeled reductions: 8% (256MB), 14% (512MB), 23% (768MB), 21% (1GB)]

LCP framework significantly decreases the number of page faults (up to 23% on average for 768MB)

Other Results and Analyses in the Paper

• Analysis of page overflows
• Compressed page size distribution
• Compression ratio over time
• Number of exceptions (per page)
• Detailed single-/multicore evaluation
• Comparison with stride prefetching
  – performance and bandwidth

Conclusion

• Old Idea: Compress data in main memory
• Problem: How to avoid inefficiency in address computation?
• Solution: A new main memory compression framework called LCP (Linearly Compressed Pages)
  – Key idea: fixed size for compressed cache lines within a page
• Evaluation:
  1. Increases memory capacity (62% on average)
  2. Decreases bandwidth consumption (24%)
  3. Decreases memory energy consumption (4.9%)
  4. Improves overall performance (13.9%)

Backup Slides

Large Pages (e.g., 2MB or 1GB)

• Splitting large pages into smaller 4KB sub-pages (compressed individually)
• 64-byte metadata chunks for every sub-page

[Figure: 2KB sub-page blocks with a metadata (M) chunk]

Physically Tagged Caches

[Figure: the core sends a virtual address to the TLB; address translation is on the critical path; the resulting physical address is matched against the tags of the L2 cache lines (tag and data pairs)]

Changes to Cache Tagging Logic

[Figure: cache lines as tag and data pairs. Before: tag. After: p-base, c-idx.]

• p-base – physical page base address
• c-idx – cache line index within the page

Analysis of Page Overflows

[Figure: Type-1 overflows per instruction (log scale, 1.E-08 to 1.E-03) for apache, astar, bzip2, cactusADM, gcc, GemsFDTD, gromacs, h264ref, lbm, leslie3d, libquantum, mcf, omnetpp, perlbench, sjeng, soplex, sphinx3, tpcc, tpch6, xalancbmk, zeusmp, and GeoMean]

Frequent Pattern Compression

Idea: encode cache lines based on frequently occurring patterns, e.g., first half of a word is zero

Frequent Patterns:
000 – All zeros
001 – First half zeros
010 – Second half zeros
011 – Repeated bytes
100 – All ones
…
111 – Not a frequent pattern

Example: 0x00000001 0x00000000 0xFFFFFFFF 0xABCDEFFF

0x00000001 → 001 0x0001
0x00000000 → 000
0xFFFFFFFF → 011 0xFF
0xABCDEFFF → 111 0xABCDEFFF

Encoded: 001 0x0001 000 011 0xFF 111 0xABCDEFFF
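The worked example can be checked with a short Python sketch of per-word pattern matching. It covers only the patterns the example exercises plus "second half zeros"; the elided patterns and the 100 (all ones) code are left out, and the check order is chosen so that 0xFFFFFFFF is classified as repeated bytes, as in the example. The function name `fpc_encode_word` is illustrative, not the encoder from the FPC paper.

```python
from typing import List, Optional, Tuple


def fpc_encode_word(word: int) -> Tuple[str, Optional[int]]:
    """Encode one 32-bit word as (3-bit prefix, payload or None)."""
    if word == 0:
        return ("000", None)                  # all zeros: prefix only
    if word >> 16 == 0:
        return ("001", word & 0xFFFF)         # first half zeros: keep low 16 bits
    if word & 0xFFFF == 0:
        return ("010", word >> 16)            # second half zeros: keep high 16 bits
    low_byte = word & 0xFF
    if word == low_byte * 0x01010101:
        return ("011", low_byte)              # repeated bytes: keep one byte
    return ("111", word)                      # not a frequent pattern: keep word


words: List[int] = [0x00000001, 0x00000000, 0xFFFFFFFF, 0xABCDEFFF]
encoded = [fpc_encode_word(w) for w in words]
for prefix, payload in encoded:
    print(prefix, "" if payload is None else hex(payload))
```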

GPGPU Evaluation

• GPGPU-Sim v3.x
• Card: NVIDIA GeForce GTX 480 (Fermi)
• Caches:
  – DL1: 16 KB with 128B lines
  – L2: 768 KB with 128B lines
• Memory: GDDR5

Effect on Bandwidth Consumption

[Figure: normalized BPKI (axis from 0.0 to 3.0) for BDI and LCP-BDI. Benchmarks by suite: CUDA (BFS, MUM, JPEG, NN, LPS, STO, CONS, SCP), Parboil (spmv, sad), Rodinia (backprop, hotspot, streamcluster), Mars (PVC, PVR, InvIdx, SS), Lonestar (bfs, bh, dmr, mst, sp, sssp), and GeoMean]

Effect on Throughput

[Figure: normalized performance (axis from 0.8 to 1.8) for Baseline and BDI. Benchmarks by suite: CUDA (BFS, MUM, JPEG, NN, LPS, STO, CONS, SCP), Parboil (spmv, sad), Rodinia (backprop, hotspot, streamcluster), Mars (PVC, PVR, InvIdx, SS), Lonestar (bfs, bh, dmr, mst, sp, sssp), and GeoMean]

Physical Memory Layout

[Figure: physical memory divided into 4KB, 2KB, 1KB, and 512B pages. Page table entries hold the base addresses PA0, PA1, and PA2; the c-base values 4 and 1 select the compressed pages at PA1 + 512*4 and PA2 + 512*1.]

Page Size Distribution

Compression Ratio Over Time

IPC (1-core)

Weighted Speedup

Bandwidth Consumption

Page Overflows

Stride Prefetching - IPC

Stride Prefetching - Bandwidth