LEVERAGING ACCESS LOCALITY FOR THE EFFICIENT USE OF MULTIBIT ERROR-CORRECTING CODES IN L2 CACHE
By Hongbin Sun, Nanning Zheng, and Tong Zhang
Joseph Schneider, March 23, 2010
Feb 23, 2016

Page 1: Leveraging Access Locality for the Efficient Use of Multibit Error-Correcting Codes in L2 Cache

By Hongbin Sun, Nanning Zheng, and Tong Zhang

Joseph Schneider

March 23, 2010

Page 2: The Problem

As CMOS technology shrinks, random defects increase

Traditionally, these defects are handled with redundant rows, columns, and words that replace the defective ones

As random defects increase, this traditional repair strategy may no longer be sufficient

Page 4: The Solution

Extend the role of Error-Correcting Codes (ECC) to compensate for defects

ECC is already used to compensate for transient soft errors

Find a method that allows ECC to be used for both defects and soft errors

Page 5: Multi-bit ECC

Multi-bit ECC: ECC that can correct multiple errors in one codeword

Suffers larger latency and higher coding redundancy than single error correction

Therefore unusable in L1 cache without major performance penalties
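To put a rough number on the "higher coding redundancy" point, the sketch below counts check bits for one data word under SEC-DED and under a BCH-style DEC-TED code. The 64-bit word size matches the subblock size given in the evaluation slide. The function names and the textbook bounds used here (the Hamming condition for SEC, at most 2m check bits for a double-error-correcting binary BCH code over GF(2^m), plus one overall parity bit in each case) are assumptions of this illustration and are not taken from the slides.

```python
def sec_ded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC code plus one overall parity bit (DED)."""
    r = 1
    # Hamming condition: 2^r syndromes must cover every single-bit error
    # position (data + check bits) plus the "no error" case.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # the extra parity bit upgrades SEC to SEC-DED


def dec_ted_check_bits(data_bits: int) -> int:
    """Check bits for a shortened binary BCH DEC code plus one parity bit (TED)."""
    m = 1
    # A t=2 binary BCH code over GF(2^m) has length 2^m - 1 and needs
    # at most 2*m check bits; pick the smallest field that fits the data.
    while 2 ** m - 1 < data_bits + 2 * m:
        m += 1
    return 2 * m + 1  # the extra parity bit upgrades DEC to DEC-TED


if __name__ == "__main__":
    k = 64
    print("SEC-DED check bits:", sec_ded_check_bits(k))
    print("DEC-TED check bits:", dec_ted_check_bits(k))
```

Under these assumptions a 64-bit word needs roughly twice as many check bits for DEC-TED as for SEC-DED, which is one reason to avoid paying that cost on every block.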

Page 6: Overall Goal

Implement multi-bit ECC in the L2 cache design to correct L2 cache defects without significant IPC degradation, area overhead, or energy cost

Page 7: Steps to Success

1. Apply multi-bit ECC only to cache blocks that require it

2. Implement buffers to limit repeated use of multi-bit ECC

3. Ensure data integrity against soft errors where ECC alone can no longer compensate for them

Page 8: Limited multi-bit ECC

Cache blocks with one or more defective cells are identified during memory testing; multi-bit ECC is then applied selectively

Content-Addressable Memory (CAM) is then used to identify blocks requiring multi-bit ECC (referred to as m-blocks)

ISSUE: CAM lookups consume a large amount of energy
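The selective assignment can be sketched as a small classification step run over the defect map produced by memory testing. The thresholds follow statements elsewhere in the deck (single defects stay on SEC-DED, multi-bit ECC only where more than one defect is found, more than two errors repaired by redundancy). The names `Protection`, `classify_subblock`, and `build_m_block_table`, and the dictionary-based defect map, are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Protection(Enum):
    SEC_DED = "SEC-DED only"
    M_ECC = "multi-bit ECC (m-block)"
    REDUNDANCY = "repair by redundancy"


def classify_subblock(defect_count: int, max_correctable: int = 2) -> Protection:
    """Pick a protection scheme from the defect count found at memory test."""
    if defect_count < 0:
        raise ValueError("defect_count must be non-negative")
    if defect_count <= 1:
        # Zero or one defect: the SEC-DED code of the standard L2 core suffices.
        return Protection.SEC_DED
    if defect_count <= max_correctable:
        # More than one defect, still within the multi-bit code's correction power.
        return Protection.M_ECC
    # Too many defects for the code: fall back to spare rows/columns/words.
    return Protection.REDUNDANCY


def build_m_block_table(defect_map: dict[int, int]) -> set[int]:
    """Return the addresses that would need an entry in the m-block tag store."""
    return {addr for addr, count in defect_map.items()
            if classify_subblock(count) is Protection.M_ECC}


if __name__ == "__main__":
    # Hypothetical test result: address -> number of defective cells.
    defects = {0x10: 0, 0x20: 1, 0x30: 2, 0x40: 3}
    print(sorted(hex(a) for a in build_m_block_table(defects)))
```

Only the addresses returned by `build_m_block_table` would have to be searchable in the CAM, which keeps the energy-hungry structure small.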

Page 9: Proposed Architecture

Standard L2 cache core protecting all subblocks with single error correction, double error detection (SEC-DED) codes

Multi-bit ECC core consisting of a fully associative multi-bit ECC cache (M-ECC cache), an ECC encoder/decoder, and two buffers. The M-ECC cache contains location tags and the corresponding check bits

Dirty Replication Cache to ensure soft error tolerance

Page 10: Proposed Architecture

Page 11: Multi-bit ECC Core

On a write, the subblock data is encoded and the check bits are stored

On a read, the check bits are fetched and the data is decoded

ISSUE: Constant use of multi-bit ECC increases latency and energy consumption at higher defect densities

Solution: Two additional buffers

Page 12: Multi-bit ECC Core Buffers

Pre-decoding Buffer: Small cache that keeps copies of the most recently accessed m-blocks; searched before accessing the M-ECC cache

Employs a least recently used (LRU) replacement policy when full; effective because of the temporal locality of cache accesses

Eliminates a large amount of ECC decoding and some M-ECC cache accesses
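A minimal sketch of how an LRU pre-decoding buffer cuts down on decodes, assuming a software model in which each miss stands for one slow multi-bit ECC decode. The class name `PreDecodingBuffer`, the helper `count_decodes`, and the address trace are hypothetical; the real structure is a hardware cache whose organization the slides do not detail.

```python
from collections import OrderedDict


class PreDecodingBuffer:
    """Tiny LRU store of already-decoded m-blocks, keyed by block address."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def lookup(self, addr):
        """Return the decoded data on a hit (and refresh its recency), else None."""
        if addr in self._entries:
            self._entries.move_to_end(addr)
            return self._entries[addr]
        return None

    def insert(self, addr, decoded_data):
        """Store a freshly decoded block, evicting the LRU entry when full."""
        self._entries[addr] = decoded_data
        self._entries.move_to_end(addr)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


def count_decodes(trace, capacity):
    """Count how many multi-bit decodes a trace of m-block reads would trigger."""
    buf = PreDecodingBuffer(capacity)
    decodes = 0
    for addr in trace:
        if buf.lookup(addr) is None:
            decodes += 1  # stand-in for one slow multi-bit ECC decode
            buf.insert(addr, f"decoded:{addr:#x}")
    return decodes


if __name__ == "__main__":
    trace = [1, 2, 1, 2, 3, 1, 2, 3]  # hypothetical m-block read addresses
    for size in (2, 3):
        print(size, "entries ->", count_decodes(trace, size), "decodes")
```

With this trace a buffer that holds the whole working set only decodes each block once, while a buffer one entry too small starts thrashing, which is why the buffer-size study later in the deck matters.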

Page 13: Multi-bit ECC Core Buffers

FLU buffer: small CAM that keeps the addresses of recently accessed cache blocks that are NOT m-blocks

Also employs an LRU policy

Further reduces M-ECC cache accesses
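The combined effect of the two buffers on a read stream can be sketched as follows. The lookup order used here (pre-decoding buffer, then FLU buffer, then the M-ECC cache) is an assumption, since the flow chart itself is not reproduced in this transcript. `LruSet`, `simulate`, and the counters are hypothetical names.

```python
from collections import OrderedDict


class LruSet:
    """Fixed-capacity set with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def touch(self, key) -> bool:
        """Return True on a hit and mark the key as most recently used."""
        if key in self._items:
            self._items.move_to_end(key)
            return True
        return False

    def add(self, key) -> None:
        self._items[key] = True
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # capacity 0 behaves as "no buffer"


def simulate(trace, m_blocks, pbuf_size, flu_size):
    """Count M-ECC cache lookups and multi-bit decodes for a read trace."""
    pbuf, flu = LruSet(pbuf_size), LruSet(flu_size)
    stats = {"mecc_lookups": 0, "decodes": 0}
    for addr in trace:
        if pbuf.touch(addr):   # decoded copy of an m-block is already buffered
            continue
        if flu.touch(addr):    # address is known not to be an m-block
            continue
        stats["mecc_lookups"] += 1   # search the M-ECC cache tags
        if addr in m_blocks:
            stats["decodes"] += 1    # slow multi-bit ECC decode
            pbuf.add(addr)
        else:
            flu.add(addr)
    return stats


if __name__ == "__main__":
    trace = [1, 2, 1, 2, 1, 2]       # hypothetical block addresses
    print(simulate(trace, m_blocks={1}, pbuf_size=0, flu_size=0))
    print(simulate(trace, m_blocks={1}, pbuf_size=4, flu_size=4))
```

Setting both sizes to zero models the buffer-free configuration, so the same function can compare the variants evaluated later in the deck.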

Page 14: M-ECC Core Flow Chart

Page 15: Soft Error Tolerance

ISSUE: When ECC is devoted to defect tolerance, the defective subblock is vulnerable to soft errors

Extra protection is only necessary for blocks containing defects (including blocks with a single defect protected by SEC-DED rather than multi-bit ECC)

Further, it is only necessary when the cache block is dirty; clean blocks can be re-read from memory when a soft error is detected

Page 16: Dirty Replication Cache

Use of a Dirty Replication (DR) cache

When a cache block is made dirty, its data is also kept in the DR cache

When data leaves the DR cache, a write to main memory is performed

Ensures a backup is always available
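The backup guarantee can be sketched with a small software model: every dirty write leaves a replica either in the DR cache or, once evicted, in main memory. The class `DirtyReplicationCache`, its method names, the dictionary standing in for main memory, and the LRU eviction order are all assumptions of this sketch; the slides do not state the DR cache's replacement policy.

```python
from collections import OrderedDict


class DirtyReplicationCache:
    """Keeps replicas of dirty data; evicted replicas are written to memory."""

    def __init__(self, capacity: int, memory: dict):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.memory = memory          # dict standing in for main memory
        self._copies = OrderedDict()  # oldest replica first (LRU is assumed)

    def on_dirty_write(self, addr, data) -> None:
        """Record a replica of data just written into a dirty L2 block."""
        self._copies[addr] = data
        self._copies.move_to_end(addr)
        if len(self._copies) > self.capacity:
            old_addr, old_data = self._copies.popitem(last=False)
            # Write back on eviction so memory now holds the backup.
            self.memory[old_addr] = old_data

    def recover(self, addr):
        """Return a good copy after an uncorrectable soft error in L2."""
        if addr in self._copies:
            return self._copies[addr]
        return self.memory[addr]


if __name__ == "__main__":
    memory = {}
    dr = DirtyReplicationCache(capacity=2, memory=memory)
    for addr, data in [(0xA, "a1"), (0xB, "b1"), (0xA, "a2"), (0xC, "c1")]:
        dr.on_dirty_write(addr, data)
    print(dr.recover(0xA), dr.recover(0xB), dr.recover(0xC), memory)
```

In this model a recovery never fails for an address that has been written, because the newest value is always in one of the two places.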

Page 18: Evaluation

Cache defect density set at 0.5%

Multi-bit ECC: BCH-based DEC-TED code (double error correction, triple error detection); subblocks with more than two errors are repaired by redundancy

Cache subblocks contain 64 bits

BCH DEC-TED decoder has a parallelism of 2 and uses the PGZ (Peterson-Gorenstein-Zierler) decoding algorithm, resulting in a latency of 82 cycles

CACTI 5 used to model the caches; a Verilog implementation showed that the extra logic is 0.2% of the area of the L2 cache core
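For a feel of how many subblocks fall into each protection class at this defect density, the sketch below evaluates a binomial model. It assumes that the 0.5% figure is an independent per-cell defect probability and that only the 64 data bits count (check-bit cells are ignored); neither assumption is stated in the slides, and the function name is hypothetical.

```python
from math import comb


def defect_distribution(bits: int = 64, p: float = 0.005) -> dict[str, float]:
    """Probability that a subblock has 0, 1, 2, or more than 2 defective cells."""

    def exactly(k: int) -> float:
        # Binomial probability of exactly k defective cells out of `bits`.
        return comb(bits, k) * p ** k * (1 - p) ** (bits - k)

    p0, p1, p2 = exactly(0), exactly(1), exactly(2)
    return {
        "0 defects": p0,
        "1 defect (SEC-DED)": p1,
        "2 defects (DEC-TED)": p2,
        ">2 defects (redundancy)": 1.0 - p0 - p1 - p2,
    }


if __name__ == "__main__":
    for label, prob in defect_distribution().items():
        print(f"{label:26s} {prob:.4%}")
```

Under this model most subblocks have zero or one defect, a small share need the multi-bit code, and only a tiny remainder has to be repaired by redundancy.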

Page 19: Evaluation

Four configurations compared:

Base: Defect-free L2 cache with no defect-tolerance functions

M-ECC only: no buffers

M-ECC-pbuf: Use of the pre-decoding buffer

M-ECC-pfbuf: Use of the pre-decoding and FLU buffers

First, determine the best buffer sizes; then compare IPC and power consumption

Page 20: Size of Pre-decoding Buffer

Page 21: Size of FLU Buffer

Page 22: Normalized IPC Comparison

Page 23: Normalized Power Consumption

Page 24: Results

Similar IPC performance; M-ECC core power consumption is about 30% of that of the L2 cache core, which itself is about 10% of the entire system cache

Page 25: DR Write-back Hit Rates

L2 cache fixed at 1 MB, 8-way associative; DR cache varies

Page 26: DR Write-back Hit Rates

DR cache fully associative with 64 blocks; 1 MB L2 cache varies

Page 27: Conclusions

Goal was to use multi-bit ECC effectively for L2 cache defect tolerance at minimal performance and implementation cost

Multi-bit ECC applied only where more than one defect is found

Two small buffers included to reduce the performance impact of multi-bit ECC

Dirty Replication Cache included to ensure soft error tolerance

Page 28: Conclusions

IPC performance nearly the same as that of a defect-free cache

M-ECC cache has less than 2.5% area overhead and 36% energy consumption overhead

Dirty replication cache has an area overhead of only 0.3% while storing 96.4% of write-back data from the L1 cache