FLASH Mitigation Strategies for Space Applications Charles Howard Southwest Research Institute

Jan 16, 2016


FLASH Mitigation Strategies for Space Applications

Charles Howard
Southwest Research Institute


Abstract

The MMS mission requires a high-density non-volatile solid state recorder (SSR). The SSR will be implemented with screened commercial FLASH devices, characterized for radiation effects (both TID and SEE). In an extensive collaborative effort by NEPP and SWRI, multiple manufacturers and devices have been characterized. The additional SEU failure modes exhibited by FLASH devices compel mitigation techniques to extend beyond the traditional bit error correction. A discussion of mitigation techniques and tradeoffs between FPGA complexity/utilization, bandwidth and total memory will be presented.


Why FLASH?

“I am your density” – George McFly, Back to the Future

SDRAM
  – 512Mx8 in an MCM?

SRAM
  – Yeah, right…

FLASH
  – 512Mx8 discrete parts
      – 1Gx8 available
  – 4Gx8 MCMs (8Gx8 possible)
  – NON-VOLATILE…


Why NOT FLASH?

Space qualified parts?
  – General availability sorely lacking
  – No Rad foundry providing FLASH

Legacy / Lack thereof
  – Radiation testing of commercial products is a strenuous process…
  – Each wafer lot must be tested
  – “Long term” availability for commodity parts? NOT!


NEPP/SWRI testing of FLASH

SEE response is generally excellent for all flash products
  – Error cross-sections orders of magnitude lower than for standard volatile memories

None of the parts suffered SEL
  – There were other destructive effects, usually failure of the erase circuit.

The SEFI rate is a concern with flash memories.
  – What do you call a SEFI that won’t clear after a power cycle?


FLASH Memory in Space Environment

“The SEFI (Single Event Functional Interrupt) rate is of greater concern for space applications than the bit error rate”
  – TID and SEE Response of Advanced 4G NAND Flash Memories, NSREC08, T.R. Oldham


Mitigation Considerations

Class of Error
  – SEUs
  – SEL
  – SEFI
  – “Permanent” SEFI

Cost of implementation/mitigation
  – Area
  – Mass
  – Power
  – Required FPGA logic


Error Classes

SEU
  – Address to satisfy MAR
      – Some form of ECC

SEL
  – Sufficiently low to neglect
      – Component design issue

SEFI (part becomes nominal after power cycle/reset)
  – More likely than SEU, must address
      – Detect & power cycle/reset

Permanent SEFI
  – More likely than SEU, must address
  – Different mitigation approach!
      – ???
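
The distinction between a SEFI and a permanent SEFI can be sketched as a small recovery routine. This is only an illustration of "detect & power cycle/reset": the `responds()` and `power_cycle()` methods, the `FakeDevice` class and the retry limit are hypothetical names and values, not part of the presentation or of any real flash driver.

```python
from enum import Enum


class Health(Enum):
    NOMINAL = "nominal"
    RECOVERED = "recovered after power cycle"
    PERMANENT_SEFI = "permanent SEFI"


def recover(device, max_power_cycles=2):
    """Classify a device by whether a power cycle clears its fault.

    `device` is a hypothetical object exposing responds() and power_cycle().
    """
    if device.responds():
        return Health.NOMINAL
    for _ in range(max_power_cycles):
        device.power_cycle()
        if device.responds():
            return Health.RECOVERED
    # The fault survived every power cycle: treat the part as lost.
    return Health.PERMANENT_SEFI


class FakeDevice:
    """Test stand-in: the fault clears after `cycles_needed` power cycles."""

    def __init__(self, cycles_needed):
        self.cycles_needed = cycles_needed  # None means the fault never clears
        self.cycles_seen = 0

    def responds(self):
        return (self.cycles_needed is not None
                and self.cycles_seen >= self.cycles_needed)

    def power_cycle(self):
        self.cycles_seen += 1


print(recover(FakeDevice(0)).value)
print(recover(FakeDevice(1)).value)
print(recover(FakeDevice(None)).value)
```

A device that ends up in the `PERMANENT_SEFI` state is the case that ECC alone does not cover, which is why it needs a different mitigation approach.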


Module Topology

[Figure: 4 GByte module block diagram. Eight FLASH 512Mx8 devices form one FLASH 4Gx8 module, with shared Power Control and Ctrl+Data connections.]


CAVEAT STATEMENTS

I am not doing the probability calculations

Consider a DWORD storage system for reference

Permanent SEFIs are not recoverable:
  – Loss of Erase, Write or Read Circuit
  – Can approximate the loss of a component

Block based failures and permanent SEFIs are roughly equivalent
  – Lose a “unit” of data (BLOCK x 4 x n) ~ “component”

Simple addressing and memory management
  – No exotic stuff like link listing


Design Options

Unmitigated
SEC/DED (Traditional EDAC)
Reed-Solomon
Parallel Reed-Solomon
TMR
Redundancy
ECC “Plus”


Unmitigated

0% more memory
  – Area / Power / Mass: 1x

Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Simple

Utilization -- logic required to implement
  – I/O count: 1x
  – Gates: Baseline

Susceptibility
  – Bit: Any Single Bit Error
  – Byte or component: NOPE…


SEC/DED

25% more memory
  – Area / Power / Mass: 1.25x

Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Simple

Utilization -- logic required to implement
  – I/O count: 1.25x
  – Gates: Hamming cost

Susceptibility (Immunity)
  – Bit: Any Single Bit Error
  – Byte or component: NOPE…
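
To make the "Hamming cost" concrete, here is a minimal software sketch of SEC/DED on one DWORD. It assumes an extended Hamming (39,32) layout: six Hamming check bits plus one overall parity bit, which fit in a single extra byte-wide device and so match the 1.25x figure. The bit layout and the function names are illustrative assumptions; the slides do not specify the code used in the FPGA.

```python
PARITY_POS = (1, 2, 4, 8, 16, 32)              # Hamming check-bit positions
DATA_POS = [p for p in range(1, 39) if p not in PARITY_POS]  # 32 data positions


def encode(word):
    """Encode a 32-bit word into a 39-bit SEC/DED codeword (bit 0 = overall parity)."""
    code = 0
    syndrome = 0
    for i, pos in enumerate(DATA_POS):
        if (word >> i) & 1:
            code |= 1 << pos
            syndrome ^= pos
    # Set check bits so the XOR of all set positions is zero.
    for p in PARITY_POS:
        if syndrome & p:
            code |= 1 << p
    # Overall parity makes the total number of ones even.
    if bin(code).count("1") & 1:
        code |= 1
    return code


def decode(code):
    """Return (word, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 39):
        if (code >> pos) & 1:
            syndrome ^= pos
    odd_parity = bin(code).count("1") & 1
    status = "ok"
    if odd_parity:
        if syndrome > 38:
            return None, "uncorrectable"       # points outside the codeword
        code ^= 1 << syndrome                  # syndrome 0 means bit 0 itself flipped
        status = "corrected"
    elif syndrome:
        return None, "uncorrectable"           # even parity, nonzero syndrome: double error
    word = 0
    for i, pos in enumerate(DATA_POS):
        if (code >> pos) & 1:
            word |= 1 << i
    return word, status


codeword = encode(0xDEADBEEF)
print(decode(codeword))
print(decode(codeword ^ (1 << 17)))
print(decode(codeword ^ 0b11))
```

A single flipped bit anywhere in the codeword is corrected and two flipped bits are detected, but a failed byte or a failed component corrupts many bits at once, which is the "NOPE…" entry above.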


Reed Solomon (Block)

25% more memory
  – Area / Power / Mass: 1.25x

Implementation concerns
  – Addressing scheme: Straightforward
  – Memory management metrics: Simple

Utilization -- logic required to implement
  – I/O count: 1.25x
  – Gates: Encoder/Decoder/RAM
  – Bandwidth: Likely Adverse

Susceptibility (Immunity)
  – Bit, byte: Many/codeblock
  – Component failures: NOPE…


Parallel Reed Solomon

50% more memory
  – Area / Power / Mass: 1.5x

Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics

Utilization -- logic required to implement
  – I/O count: 1.5x
  – Gates: Encoder/Decoder

Susceptibility (Immunity)
  – Bit, byte, byte “plus”: YEAH!
  – SOME component failures: 2/3 (NOT IN THE RS)


TMR

200% more memory
  – Area / Power / Mass: 3x

Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Simple

Utilization -- logic required to implement
  – I/O count: 3X or TDM
  – Bus loading / signal integrity: Ouch…
  – Gates: Voters (plus)

Susceptibility (Immunity)
  – Bit, byte or component: OH, YEAH! We can handle anything!
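
The "Voters" entry refers to majority voting across the three copies. A minimal sketch of a bitwise two-out-of-three voter follows; the function name and the byte-wide example values are illustrative assumptions, and a real design would implement this in FPGA logic rather than software.

```python
def vote(a, b, c):
    """Bitwise majority of three copies: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


good = 0xA5
print(hex(vote(good, good, 0x00)))              # one copy lost entirely
print(hex(vote(good ^ 0x01, good ^ 0x80, good)))  # different bits upset in two copies
```

Because the vote is taken bit by bit, the result is correct as long as no bit position is wrong in more than one copy, which covers a whole failed component in one of the three banks.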


Redundant Memory

X% more memory
  – Area / Power / Mass: X

Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Simple

Utilization -- logic required to implement
  – I/O count: X
  – Gates: Minimal

Susceptibility (Immunity)
  – Bit, byte or component: Nope.


ECC with Warm Spare

25-50% more memory per dword
  – Area / Power / Mass: 1.5x

Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Straightforward

Utilization -- logic required to implement
  – I/O count: 1.5x
  – Bus loading / signal integrity
  – Gates: ECC & steering

Susceptibility (Immunity)
  – Bit, byte or component: OH, YEAH! We can handle anything!


Memory Topology

[Figure: Mass Memory block diagram. The Flash Array (48GB + EDAC) consists of three Power Sectors (16GB + EDAC each), each with its own Power Control, on a shared Ctrl+Data bus. Device labels: BYTE0, BYTE1, BYTE2, BYTE3, ECC, SPARE. Revised 07/25/09.]
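
The capacities in the topology can be checked with simple arithmetic. The sketch below assumes that each labeled position (BYTE0 to BYTE3, ECC, SPARE) is one 4 GByte module of eight 512Mx8 parts and that "GB" is meant in binary units; both are assumptions made for illustration, not statements from the slides.

```python
MIB = 2**20
GIB = 2**30

DEVICE_BYTES = 512 * MIB        # one 512Mx8 part holds 512 Mi bytes
DEVICES_PER_MODULE = 8
MODULE_BYTES = DEVICE_BYTES * DEVICES_PER_MODULE

DATA_LANES = 4                  # BYTE0..BYTE3 (assumed one module each)
OVERHEAD_LANES = 2              # ECC and SPARE (assumed one module each)
SECTORS = 3

sector_data = DATA_LANES * MODULE_BYTES
sector_raw = (DATA_LANES + OVERHEAD_LANES) * MODULE_BYTES
array_data = SECTORS * sector_data

print(MODULE_BYTES // GIB)       # module capacity in GiB
print(sector_data // GIB)        # data capacity per power sector in GiB
print(array_data // GIB)         # data capacity of the flash array in GiB
print(sector_raw / sector_data)  # raw-to-data memory multiplier
```

Under these assumptions the numbers line up with the slides: a 4 GByte module, 16GB of data per power sector, 48GB for the array, and a 1.5x memory multiplier for ECC with Warm Spare.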


Failure 1

[Figure: Mass Memory block diagram after a first failure. Three Power Sectors (16GB + EDAC each) with Power Control and shared Ctrl+Data; Flash Array (48GB + EDAC). Device labels: BYTE0, BYTE1, BYTE2, BYTE3, ECC, with the former SPARE position now labeled BYTE0. Revised 07/25/09.]


Failure 2

[Figure: Mass Memory block diagram after a second failure case. Three Power Sectors (16GB + EDAC each) with Power Control and shared Ctrl+Data; Flash Array (48GB + EDAC). Device labels: BYTE0, BYTE1, BYTE2, BYTE3, ECC, with the former SPARE position now labeled ECC. Revised 07/25/09.]
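
The two failure figures show the spare taking over the role of a failed position, either a data byte or the ECC byte. A minimal sketch of that steering logic is given below. The class name, slot numbering and the single-spare-per-sector limit are assumptions for illustration; the sketch also leaves out how the spare is repopulated with the lost lane's contents.

```python
ROLES = ("BYTE0", "BYTE1", "BYTE2", "BYTE3", "ECC")


class SectorSteering:
    """Maps logical roles to physical slots; slot 5 starts as the warm spare."""

    def __init__(self):
        self.slot_for = {role: slot for slot, role in enumerate(ROLES)}
        self.spare_slot = len(ROLES)
        self.failed_slots = []

    def retire(self, role):
        """Steer `role` to the spare after its device is declared lost."""
        if self.spare_slot is None:
            raise RuntimeError("no spare left in this sector")
        self.failed_slots.append(self.slot_for[role])
        self.slot_for[role] = self.spare_slot
        self.spare_slot = None


first = SectorSteering()
first.retire("BYTE0")            # as in "Failure 1": spare becomes BYTE0
print(first.slot_for)

second = SectorSteering()
second.retire("ECC")             # as in "Failure 2": spare becomes ECC
print(second.slot_for)
```

Keeping the mapping per role rather than per address is what keeps the addressing scheme simple: the steering logic only selects which physical lane drives each byte of the DWORD.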


Observations

ECC covers SEU errors

Warm Spare compensates for SEFIs and block errors

ECC with Warm Spare is a superior option
  – Susceptibility to permanent SEFIs plummets
  – Memory availability remains near 100%
      – Block based errors mapped to spare
      – SEFI based errors map to spare

ECC with Warm Spare is roughly equivalent to full TMR at half the power, mass, area, and cost
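
The "half of TMR" claim follows from the memory multipliers quoted on the individual option slides. The short tabulation below collects them; the dictionary layout and the four-devices-per-DWORD baseline are assumptions for illustration, and the Redundant Memory option is left out because its multiplier is given only as X.

```python
# Area / Power / Mass multipliers as quoted on the option slides.
MULTIPLIER = {
    "Unmitigated": 1.0,
    "SEC/DED": 1.25,
    "Reed-Solomon (block)": 1.25,
    "Parallel Reed-Solomon": 1.5,
    "TMR": 3.0,
    "ECC with Warm Spare": 1.5,
}

DATA_DEVICES_PER_DWORD = 4      # assumed: one byte-wide device per byte of the DWORD


def devices_per_dword(multiplier):
    """Byte-wide devices needed to store one DWORD under a given multiplier."""
    return DATA_DEVICES_PER_DWORD * multiplier


for name, m in MULTIPLIER.items():
    print(f"{name:24s} {m:4.2f}x  {devices_per_dword(m):4.1f} devices per DWORD")

ratio = MULTIPLIER["ECC with Warm Spare"] / MULTIPLIER["TMR"]
print(ratio)
```

With these figures, ECC with Warm Spare needs six devices per DWORD against twelve for TMR, which is the factor of two cited above.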


Summary

Memory modules allow highest density/area

Mitigation is user’s choice depending upon design goals but must cover SEFI and SEU

ECC with Warm Spare is roughly equivalent to full TMR at half the power, mass, area, and cost

TID and SEE Response of an Advanced Samsung 4Gb NAND Flash Memory (NSREC07); T. R. Oldham, M. Friendlich, J. W. Howard, Jr., M. D. Berg, H. S. Kim, T. L. Irwin, and K. A. LaBel

TID and SEE Response of Advanced 4G NAND Flash Memories (NSREC08); T. R. Oldham, M. Suhail, M. R. Friendlich, M. A. Carts, R. L. Ladbury, H. S. Kim, M. D. Berg, C. Poivey, S. P. Buchner, A. B. Sanders, C. M. Seidleck, and K. A. LaBel

SEE and TID of Emerging Non-Volatile Memories; D.N. Nguyen and L.Z. Scheick, Jet Propulsion Laboratory California Institute of Technology, http://parts.jpl.nasa.gov/docs/PID16621.pdf

A Case Study of Single Event Functional Interrupts (SEFIs) in COTS SDRAMS (NSREC08); Joe Benedetto and George Ott, Radiation Assured Devices