PAY-AS-YOU-GO STORAGE-EFFICIENT HARD ERROR CORRECTION Moinuddin K. Qureshi ECE, Georgia Tech Research done while at: IBM T. J. Watson Research Center New York MICRO 2011 Dec 6, 2011

Jan 13, 2016

Page 1

PAY-AS-YOU-GO STORAGE-EFFICIENT HARD ERROR CORRECTION

Moinuddin K. Qureshi
ECE, Georgia Tech

Research done while at: IBM T. J. Watson Research Center New York

MICRO 2011 Dec 6, 2011

Page 2

PAY-AS-YOU-GO, MICRO-2011

Introduction

PCM is a scalable technology. Device state changed by heating.

Over time, write operations break the heater → cell gets stuck

Reported write endurance: 10-100 million writes/cell

With good wear leveling, it is still possible to have 8+ years lifetime

Page 3

Not All Cells Are Created Equal

Variability in lifetime due to process variation: weak vs. strong cells

Weak cells fail much earlier → reduce system lifetime greatly

Lifetime usually modeled as Gaussian with SDEV of 10-30% of mean
We use SDEV=20% of mean

P (5 SDEV from mean) ≈ 10^-6

For 1GB memory bank, 8K bits fail at time 0, more as we write!

PCM needs significant amount of error correction to handle variability
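The arithmetic behind the "8K bits fail at time 0" estimate can be sketched as follows. With SDEV at 20% of the mean, a cell 5 SDEV below the mean has zero endurance, so it is dead before the first write. The function name `lower_tail` is an illustrative choice, not from the talk; the exact one-sided Gaussian tail is somewhat smaller than the slide's order-of-magnitude figure of 10^-6, but both put thousands of failed bits in a 1GB bank.

```python
import math

def lower_tail(k_sdev):
    """Probability that a Gaussian sample falls more than k_sdev below the mean."""
    return 0.5 * math.erfc(k_sdev / math.sqrt(2))

BITS_PER_BANK = 8 * 2**30  # 1GB bank

# Slide's rounded probability: roughly 8.6K bits fail at time 0.
print(BITS_PER_BANK * 1e-6)

# Exact one-sided tail at 5 SDEV: a few thousand bits, same order of magnitude.
print(BITS_PER_BANK * lower_tail(5.0))
```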

Page 4

Write Efficient Code

Traditional ECC codes are write intensive → more wear

Endurance-related (hard) faults are identified with a checker read

Write-efficient code: Error Correcting Pointers [ISCA’10]

ECP needs 10 bits per entry. Handles multiple faults (needs 1 Full bit)

[Figure: a 512-bit cache line (bits 0 to 511) with one ECP entry, made of a 9-bit pointer to the faulty bit and a 1-bit replacement data cell (D).]

For correcting N errors, ECP needs (10N+1) bits
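A minimal sketch of how an ECP entry is applied on a read, assuming each entry is a (pointer, replacement bit) pair as in the figure. The names `ecp_read` and `ecp_storage_bits` are illustrative, not taken from the talk or the ECP paper.

```python
LINE_BITS = 512
POINTER_BITS = 9  # 2**9 == 512, enough to address any bit in the line

def ecp_storage_bits(n_errors):
    """Bits needed to correct n_errors: (9-bit pointer + 1 data bit) each, plus 1 Full bit."""
    return n_errors * (POINTER_BITS + 1) + 1

def ecp_read(raw_bits, entries):
    """Patch the raw line: each entry replaces the stuck cell it points to."""
    corrected = list(raw_bits)
    for pointer, replacement in entries:
        corrected[pointer] = replacement
    return corrected

# Cell 5 is stuck at 0 but should hold 1; one ECP entry repairs it on read.
raw = [0] * LINE_BITS
fixed = ecp_read(raw, [(5, 1)])
print(fixed[5], ecp_storage_bits(6))
```

Because the replacement bit lives in the entry itself, a hard fault costs no extra writes to the rest of the line, which is the write-efficiency point made above.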

Page 5

Expensive to Correct Many Errors

To get 6+ years lifetime, we need to correct six errors per line

Storage: 61 bits/line (about 12%, 1GB for 8GB) → expensive

Unlike ECC in current DRAM chips, this overhead is not optional

[Figure: baseline system lifetime in years (axis 0 to 7) for NoECP and ECP-1 through ECP-6.]

Goal: Reduce storage significantly (3X-6X) while retaining lifetime

Page 6

Motivation

Uniformly allocating error correction entries is inefficient (by ~20X)

We do not need to pay for error correction of each line upfront

Pay-As-You-Go: Give error correction entries in proportion to errors

Utilization of error correction entries per line

Num Writes     No ECP    Only ECP-1   ECP-2 to      Average
(Normalized)   used      used         ECP-6 used    ECP Used
50%            99.02%    0.97%        0.01%         0.01
95%            79.63%    18.14%       2.23%         0.23
100%           73.24%    22.82%       3.95%         0.31

Key insight: Very few lines have large number of errors
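The "~20X" inefficiency figure follows from the last row of the utilization table: uniform ECP-6 allocates six entries per line, while on average only 0.31 are in use even at end of life. A quick check (variable names are illustrative):

```python
ALLOCATED_PER_LINE = 6   # uniform ECP-6
AVERAGE_USED = 0.31      # "Average ECP Used" at 100% of writes

# Ratio of allocated to used entries: a little over 19, i.e. roughly 20X.
over_provisioning = ALLOCATED_PER_LINE / AVERAGE_USED
print(round(over_provisioning, 1))
```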

Page 7

Outline

• Introduction & Motivation
• PAYG Design
• Results
• Even More Storage Efficiency
• Related Work
• Summary

Page 8

Naïve Design for PAYG

[Figure: each 64B memory line carries an OFB. The Global Error Correction (GEC) Pool is organized as sets by ways (number of GEC entries per set). Each GEC entry holds V, TAG, and ECP-N.]

Given 73% of lines have no error, why not give ECP-6 only on error?

GEC Pool structure: Set associative vs. Fully associative (impractical)

Page 9

Three Key Problems

1. Set associative structure is inefficient (by ~8X for 8-way)

2. If we allocate six ECP entries per GEC entry, most error correction entries still remain unused

3. Given >25% of lines are likely to have at least one error, the latency impact of GEC is significant

Page 10

Inefficiency of Set Associative GEC

There are 10s/100s of thousands of sets → any set could overflow

How many entries are used before one set overflows? Buckets-and-Balls

An 8-way GEC is only 12% full when one set overflows → need 8x entries
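The buckets-and-balls question can be explored with a small Monte Carlo sketch. It assumes faulty lines map to sets uniformly at random, which is a simplification, and the function name, set count and seed are illustrative. The exact fraction varies from run to run and with the number of sets, but it stays far below 100%.

```python
import random

def fill_at_first_overflow(num_sets, ways, seed=0):
    """Throw entries into random sets until one set would exceed `ways`.

    Returns the fraction of total capacity in use at that moment.
    """
    rng = random.Random(seed)
    counts = [0] * num_sets
    placed = 0
    while True:
        s = rng.randrange(num_sets)
        if counts[s] == ways:
            # This set is already full: the pool has effectively overflowed.
            return placed / (num_sets * ways)
        counts[s] += 1
        placed += 1

# 128K sets, 8 ways: only a small fraction of the pool is used at first overflow.
print(fill_at_first_overflow(128 * 1024, 8, seed=1))
```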

Page 11

Scalable Structure for GEC Pool

“Hash-Table With Chaining” structure for flexibility & low latency

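[Figure: memory lines with OFB map into the Set Associative Table (SAT) of GEC entries. A SAT set links through a pointer (PTR) to a set in the Global Collision Table (GCT). Labels shown: GCT-HEAD, "taken by some other set". Note: *PTR is two-way replicated.]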

Page 12

Scalable Structure for GEC Pool

Let's say we want to store N entries

Structure               Total Entries   Latency
Fully Associative       N               Very High
8-way Set Associative   8*N             1
8-way (SAT+GCT)         1.5*N           1+epsilon

Global Collision Table (GCT) with half as many sets as SAT is sufficient

Proposed GEC structure has latency similar to Set Associative Table while needing 5X fewer entries
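The "hash table with chaining" idea can be sketched as a toy model: a full SAT set is chained to a GCT set handed out from a head counter, so capacity goes where the collisions are. This is an illustration under stated assumptions, not the hardware design: the class name `GecPool`, the modulo hash, and the choice to let pointers take no space in a set are all hypothetical simplifications.

```python
class GecPool:
    """Toy SAT + GCT pool: set-associative table with overflow chaining."""

    def __init__(self, sat_sets, gct_sets, ways):
        self.ways = ways
        self.sat = [[] for _ in range(sat_sets)]
        self.gct = [[] for _ in range(gct_sets)]
        self.sat_ptr = [None] * sat_sets  # PTR from a SAT set to a GCT set
        self.gct_ptr = [None] * gct_sets  # PTR from a GCT set to the next one
        self.gct_head = 0                 # next GCT set not yet handed out

    def insert(self, line_addr, entry):
        table, ptrs, i = self.sat, self.sat_ptr, line_addr % len(self.sat)
        while True:
            if len(table[i]) < self.ways:
                table[i].append((line_addr, entry))
                return True
            if ptrs[i] is None:
                if self.gct_head == len(self.gct):
                    return False          # no GCT set left: pool exhausted
                ptrs[i] = self.gct_head   # chain a fresh GCT set
                self.gct_head += 1
            nxt = ptrs[i]
            table, ptrs, i = self.gct, self.gct_ptr, nxt

    def lookup(self, line_addr):
        table, ptrs, i = self.sat, self.sat_ptr, line_addr % len(self.sat)
        found = []
        while i is not None:
            found += [e for a, e in table[i] if a == line_addr]
            nxt = ptrs[i]
            table, ptrs, i = self.gct, self.gct_ptr, nxt
        return found

pool = GecPool(sat_sets=2, gct_sets=1, ways=2)
for addr in (0, 2, 4, 6):                 # all four collide in SAT set 0
    pool.insert(addr, "ecp-for-%d" % addr)
print(pool.lookup(4))
```

Most lookups finish in the SAT; only sets that overflowed pay for following a pointer, which matches the "1+epsilon" latency in the table.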

Page 13

Solving Other Two Problems

2. Fine Grained Allocation for effectively utilizing ECP entries
• Each GEC entry has only ECP-1
• Each line can have multiple GEC entries
• We guarantee that all entries are in the same set of (SAT/GCT)
• A faulty line can get more than ECP-6 as well

3. Local Error Correction (LEC) for low latency in common case
• Each line has dedicated ECP-1 (handles 95% of lines)
• Ensures extra accesses (GEC) are needed for only a few lines

Page 14

PAYG: Tying it All Together

PAYG performs on-demand allocation of error correction entries

PAYG has 3 levels. LEC is the first line of defense (lowers latency). SAT is second and GCT is third (flexible).
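On-demand allocation across the levels can be sketched as a toy model. The first fault in a line goes to that line's dedicated LEC entry; later faults each take one ECP-1 entry from the shared GEC pool. The class name `PaygSketch` is hypothetical, and the SAT and GCT are collapsed into a single entry budget here, which is a simplification.

```python
from collections import defaultdict

class PaygSketch:
    """Toy model: one LEC entry per line, then shared GEC entries on demand."""

    def __init__(self, gec_entries):
        self.lec = {}                 # line -> faulty bit position (dedicated ECP-1)
        self.gec = defaultdict(list)  # line -> further faulty bit positions
        self.gec_free = gec_entries   # shared budget (SAT + GCT combined)

    def record_fault(self, line, bit_pos):
        if line not in self.lec:
            self.lec[line] = bit_pos  # common case: handled locally
            return "LEC"
        if self.gec_free > 0:
            self.gec[line].append(bit_pos)
            self.gec_free -= 1
            return "GEC"
        return "UNCORRECTABLE"

    def needs_extra_access(self, line):
        """Only lines that spilled into the GEC pay for an extra lookup."""
        return line in self.gec

p = PaygSketch(gec_entries=2)
print(p.record_fault(7, 100), p.needs_extra_access(7))
print(p.record_fault(7, 200), p.needs_extra_access(7))
```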

Page 15

Outline

• Introduction & Motivation
• PAYG Design
• Results
• Even More Storage Efficiency
• Related Work
• Summary

Page 16

Evaluation Settings

Assumptions:
1. Mean writes 32 Million, SDEV=20%, no correlation
2. Perfect wear leveling → all lines get same number of writes
3. Writes are converted into write-read to detect faults

Configuration:
PCM bank of 1GB with 64B lines, so 16 million lines per bank
Write latency of 1 microsecond
At 100% write traffic, lifetime is 18 years (if zero variance)

Figure of Merit:
Uniform ECP-6 gets 35% of ideal lifetime, so 6.5 years
We report lifetime with respect to Uniform ECP-6
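The 18-year ideal lifetime can be reproduced from the configuration. With perfect wear leveling and no variance, the bank absorbs (lines × writes per cell) line writes, one per microsecond. This sketch assumes "32 Million" means 2^25 and "16 million" means 2^24; the constant and function names are illustrative.

```python
LINES_PER_BANK = 2**30 // 64      # 1GB bank, 64B lines -> 2**24 lines
MEAN_WRITES = 2**25               # assumption: "32 Million" read as 2**25
WRITE_LATENCY_S = 1e-6            # 1 microsecond per line write
SECONDS_PER_YEAR = 365 * 24 * 3600

def ideal_lifetime_years(lines=LINES_PER_BANK, writes=MEAN_WRITES,
                         latency_s=WRITE_LATENCY_S):
    """Years until every line has absorbed `writes` writes at 100% write traffic."""
    return lines * writes * latency_s / SECONDS_PER_YEAR

# Just under 18 years, in line with the slide's figure.
print(round(ideal_lifetime_years(), 1))
```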

Page 17

Importance of Scalable GEC Pool

[Figure: lifetime with respect to ECP-6 (%), axis 0 to 110. Bars for uniform ECP-1 through ECP-6, for NoFGA-NoGCT with 32K to 1024K SAT sets, and for NoFGA-wGCT with 2K to 64K GCT sets (SAT sets = 128K). Total sets: 128K+64K=192K.]

Proposed structure reduces storage overhead of GEC by more than 5X

Page 18

Importance of Fine-Grained Alloc.

Num ECP Entries in Each GEC Entry     5    4    3    2    1
Num GEC Entry per Set (64B line)      8    9    12   16   24
Total ECP Entries per Set             40   36   36   32   24

[Figure: lifetime normalized to ECP-6 (%), axis 100 to 116, versus number of ECP entries in each GEC entry (5, 4, 3, 2, 1).]

Fine-Grained Allocation improves the effectiveness of PAYG

Page 19

Importance of LEC

We can get higher lifetime by increasing GEC size but we still need LEC

[Figure: plot not recoverable; annotation at 5 years.]

For first 5 years, PAYG incurs on avg 1 extra access for < 0.4% accesses

Without LEC, latency impact is significant. With LEC, not so much

Page 20

Storage Overhead

LEC Storage 13 bits/line (10 bit ECP + 1 valid + 2 OFB)

GEC Storage 6.5 bits/line on average

Total 19.5 bits/line

Scheme                    Storage Overhead (bits/line)   Lifetime
Uniform ECP-6             61                             1X
Uniform ECP-8             81                             1.13X
PAYG with ECP-1 in LEC    19.5                           1.13X

PAYG provides lifetime similar to ECP-8 at 3.1X less storage than ECP-6

(Total storage overhead to protect 1GB reduces from 122MB to 39MB, down 83MB)
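The 122MB and 39MB totals follow directly from the bits/line figures and the 16 million lines in a 1GB bank. A short check (names are illustrative; MB here means 2^20 bytes):

```python
LINES_PER_GB = 2**30 // 64  # 64B lines

def overhead_mb(bits_per_line, lines=LINES_PER_GB):
    """Total error-correction storage in MB (2**20 bytes) for one bank."""
    return bits_per_line * lines / 8 / 2**20

ecp6 = overhead_mb(61)     # uniform ECP-6
payg = overhead_mb(19.5)   # PAYG: 13 bits LEC + 6.5 bits GEC on average
print(ecp6, payg, ecp6 - payg, round(61 / 19.5, 1))
```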

Page 21

Outline

• Introduction & Motivation
• PAYG Design
• Results
• Even More Storage Efficiency
• Related Work
• Summary

Page 22

Efficient Single Bit Correction

LEC is responsible for most of the storage overhead (13 bits out of 19.5 bits)

Need efficient schemes for single-bit hard faults → Alternate Data Retry (ADR)

ADR: Mask hard fault by storing data in either normal or inverted form

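[Figure: a 4-bit example with one stuck-at-0 (SA-0) cell. Data 1101 stored in normal form with INV = 0, and its inverted form 0010 stored with INV = 1.]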

ADR needs only 1 bit to mask a single stuck-at fault (caveat: double write)

Reduce storage overhead of PAYG by using ADR instead of ECP-1 in LEC
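A minimal sketch of the ADR idea, using the 4-bit example with a stuck-at-0 cell. For clarity the sketch is told which cell is stuck; real hardware would instead write, read back, and retry with inverted data on a mismatch, which is the "double write" caveat. The function names are illustrative.

```python
def adr_write(data_bits, stuck_pos, stuck_val):
    """Pick normal or inverted form so the stuck cell already holds the right value.

    Returns (stored_bits, inv_flag).
    """
    if data_bits[stuck_pos] == stuck_val:
        return list(data_bits), 0          # normal form works as-is
    return [b ^ 1 for b in data_bits], 1   # inverted form matches the stuck cell

def adr_read(stored_bits, inv_flag):
    """Undo the inversion recorded in the INV bit."""
    return [b ^ inv_flag for b in stored_bits]

# Data 1101 with the last cell stuck at 0: stored inverted as 0010, INV = 1.
stored, inv = adr_write([1, 1, 0, 1], stuck_pos=3, stuck_val=0)
print(stored, inv, adr_read(stored, inv))
```

With two stuck cells that disagree about which form to use, neither form may fit, which is why ADR is hard to scale beyond a single fault.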

Page 23

Comparisons

Scheme                    Storage Overhead (bits/line)   Lifetime
Uniform ECP-6             61                             1X
Uniform ECP-8             81                             1.13X
PAYG with ECP-1 in LEC    19.5                           1.13X
PAYG with ADR in LEC      9.5                            1.02X

PAYG with heterogeneous error correction reduces storage by 6X

Hard to scale ADR to multiple faults. SAFER [MICRO’10] partitions lines with multiple faults into groups with single-bit faults. SAFER needs 55 bits/line and provides lifetime similar to ECP-6

Page 24

Outline

• Introduction & Motivation
• PAYG Design
• Results
• Even More Storage Efficiency
• Related Work
• Summary

Page 25

Non Uniform Error Correction

Variable Strength ECC (VS-ECC) by Alameldeen+ ISCA’11
• Proposed for cache reliability at low voltages
• ECC-4 is provided for one quarter of the ways, allocated based on testing
• Difference: Cache line disabling works. Only set associative structure.

Layered ECP by Schechter+ ISCA’10
• ECP-1 for each line, and some ECP entries for each page
• In essence, this is a set-associative GEC with ECP-1 in LEC
• Difference: Set associative GEC requires 5X more entries (inefficient)

Line Sparing with FREE-p by Yoon+ HPCA’11
• A faulty line is remapped to a spare area using an embedded pointer
• Sparing needs 1 good line for 1 uncorrectable fault
• Difference: PAYG is much more storage efficient than sparing

Page 26

FREE-p: Sparing vs. Correction

For 1 extra error bit, PAYG needs a 20-bit GEC entry, FREE-p needs 512 bits

PAYG is more effective than line sparing with FREE-p

Page 27

Outline

• Introduction & Motivation
• PAYG Design
• Results
• Even More Storage Efficiency
• Related Work
• Summary

Page 28

Summary

PCM: limited endurance, variability across cells reduces lifetime

Need to correct many (six) errors per line

Uniform allocation is expensive and inefficient (only 0.3 out of 6 used)

Pay-As-You-Go (PAYG): Allocate error correction entries on demand

PAYG has LEC + GEC Pool (Set Associative Table + Global Collision Table)

Provides 1.13X lifetime compared to ECP-6 at 3.1X lower overhead

Heterogeneous scheme (ADR for LEC) reduces storage by 6X

PAYG useful for efficient hard-error correction in other technologies too