Cost-Efficient Soft Error Protection for Embedded Microprocessors
Jason Blome¹, Shuguang Feng¹, Shantanu Gupta¹, Scott Mahlke¹, Daryl Bradley²
¹ University of Michigan, Electrical Engineering and Computer Science
² ARM, Ltd.

Dec 21, 2015

Page 1:

Cost-Efficient Soft Error Protection for Embedded Microprocessors

Jason Blome¹, Shuguang Feng¹, Shantanu Gupta¹, Scott Mahlke¹, Daryl Bradley²

¹ University of Michigan

² ARM, Ltd.

Page 2:

The Soft Error Problem

[Figure: flip-flop with inputs D and CLK and output Q; a transient fault that is latched becomes a soft error]

Page 3:

Fault Masking

• Logical: the faulted value does not affect the logical operation of the circuit

• Latching-Window: the fault pulse does not reach a state element within the latching window (tsetup before to thold after the CLK edge)

• Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit

• Architectural/Software: incorrect state is overwritten before it is read

[Figure: decoder and register file (entries 0-5) executing mov r5, 8; mov r2, 4; add r6, r2, r5]
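The masking effects listed on this slide can be illustrated with a small sketch. It is not taken from the presentation: the function names, the half-open pulse interval, and the single-bit fault model are all assumptions made for illustration. Electrical masking is omitted because it depends on analog gate behavior.

```python
# Toy models of three masking effects. All names are hypothetical.

def logically_masked(a, b, flip_a=True):
    """True if flipping input `a` of a 2-input AND gate leaves its output unchanged."""
    faulty_a = a ^ 1 if flip_a else a
    return (a & b) == (faulty_a & b)

def latching_window_masked(pulse_start, pulse_end, clk_edge, t_setup, t_hold):
    """True if the pulse [pulse_start, pulse_end) misses the latching window."""
    window_start = clk_edge - t_setup
    window_end = clk_edge + t_hold
    return pulse_end <= window_start or pulse_start >= window_end

def run(program, fault=None, num_regs=7):
    """Run mov/add instructions; fault=(step, reg, bit) flips one register
    bit just before the given step executes."""
    regs = {f"r{i}": 0 for i in range(num_regs)}
    for step, (op, dst, *src) in enumerate(program):
        if fault is not None and fault[0] == step:
            regs[fault[1]] ^= 1 << fault[2]
        if op == "mov":
            regs[dst] = src[0]
        elif op == "add":
            regs[dst] = regs[src[0]] + regs[src[1]]
    return regs

# The instruction sequence shown on the slide.
program = [("mov", "r5", 8), ("mov", "r2", 4), ("add", "r6", "r2", "r5")]
golden = run(program)
overwritten = run(program, fault=(1, "r2", 0))  # r2 is rewritten before use
consumed = run(program, fault=(2, "r2", 0))     # r2 is read while still wrong
```

In this sketch a flip in r2 before `mov r2, 4` disappears (architectural masking), while the same flip just before the `add` corrupts r6.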

Page 4:

Soft Error Rate Trends

[Figure: soft error rate trends (Shivakumar 2002)]

Soft Error Rate Contributions

[Figure: pie chart (Mitra 2005). Segment labels: Static Combinational Logic, Unprotected SRAMs, Sequential Elements. Segment values as extracted: 49%, 11%, 40%.]

Increasing contribution of faults in combinational logic to the overall soft error rate

Page 5:

Outline

• Soft error analysis setup

• Summary of fault analysis results

• Fault tolerance techniques

► Register value cache

► Strategic deployment of fault detectors

• Conclusion

Page 6:

Fault Analysis Framework

[Figure: fault injection/error analysis framework. Components: testbench, benchmark, reference design, test design, fault injection scheduler, error checking and logging, report generation. Both designs are ARM926EJ-S cores with the blocks Instruction Fetch, Instruction Decode, Register Bank, Mux Array, Multiply, ALU, Shift, Instruction Address Logic, Data Address Logic, Data Interface, Instruction cache and MMU, Data cache and MMU, Bus Interface, Write Buffer/Bus Interface.]
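The reference-design/test-design arrangement in the figure can be sketched as a lockstep comparison. This is a minimal illustration, not the authors' framework: `ToyCore` is a hypothetical two-register stand-in for the processor, and the single-bit flip and per-cycle comparison are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ToyCore:
    """Hypothetical stand-in for a design: one pipeline latch feeding an accumulator."""
    latch: int = 0  # microarchitectural state
    acc: int = 0    # architectural state

    def step(self, data_in):
        # Only the low 4 bits of the latch are consumed, so flips in the
        # upper bits are logically masked before reaching `acc`.
        self.acc = (self.acc + (self.latch & 0x0F)) & 0xFF
        self.latch = data_in & 0xFF

def run_trial(inputs, fault_cycle, fault_bit):
    """Run both designs in lockstep, flip one latch bit in the test design
    at `fault_cycle`, and log every cycle where state diverges."""
    ref, test = ToyCore(), ToyCore()
    log = []
    for cycle, value in enumerate(inputs):
        ref.step(value)
        test.step(value)
        if cycle == fault_cycle:
            test.latch ^= 1 << fault_bit  # scheduled fault injection
        if test.latch != ref.latch:
            log.append((cycle, "microarchitectural"))
        if test.acc != ref.acc:
            log.append((cycle, "architectural"))
    return log

propagated = run_trial([1, 2, 3, 4], fault_cycle=1, fault_bit=0)
masked = run_trial([1, 2, 3, 4], fault_cycle=1, fault_bit=7)
```

Here `propagated` shows a latch error that reaches the accumulator one cycle later, while `masked` shows a latch error that never becomes architecturally visible.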

Page 7:

Observed Error Rates

Faults Occurring in Registers

Error Site                  Error Rate
Microarchitectural State    94%
Architectural State         7%

Faults Occurring in Combinational Logic

Error Site                  Error Rate
Microarchitectural State    16%
Architectural State         4%

At the software interface (architectural state), the error rates for the two fault sources are within 3 percentage points of each other (7% vs. 4%)

Page 8:

Impact of Fault Injection

[Figure: number of errors (y-axis, 0 to 50) versus cycle (x-axis, 0 to 20), with four series: Comb. Logic: Microarchitectural State Errors; Comb. Logic: Architectural State Errors; Seq. State: Microarchitectural State Errors; Seq. State: Architectural State Errors]

Page 9:

Targeting the Faults that Count

• ARM926EJ-S register file consumes 8.7% of total core area

► Responsible for 57.4% of architectural errors

• Register file area dominated by combinational logic

► ECC cost, efficacy?

Page 10:

The Register Value Cache

[Figure: block diagram. Blocks: Register Value Cache, Register File with decoder (entries 0-5), three comparators (CMP). Signals: Read/Write Addr/Data, Read Result, Stall/Check CRC.]

Page 11:

The Register Value Cache

[Figure: RVC internals. Blocks: Valid bits, Index Array, Value Array (previous read values), CRC units, comparator (CMP). Signals: Read/Write Addr, Read Data, Write Data, Error. Three paths are shown: Read Operation, Write Operation, Check Operation.]

Page 12:

Example

[Figure: worked example on the register file (decoder, entries 0-5) and the register cache. Instruction sequences shown: mov r5, 8; mov r2, 4; add r3, r1, r4 and mov r5, 8; mov r2, 4; add r3, r2, r5. The values 4 and 8 are stored with their CRCs, and a Check CRC step is shown.]
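The read, write, and check paths named on the preceding slides can be sketched as follows. This is a rough model under stated assumptions, not the authors' design: the class and exception names are hypothetical, `zlib.crc32` stands in for whatever CRC the hardware uses, replacement is least-recently-used, and values are cached on both reads and writes.

```python
import zlib
from collections import OrderedDict

class SoftErrorDetected(Exception):
    """Raised when a register read disagrees with the RVC or its CRC."""

class RegisterFileWithRVC:
    def __init__(self, num_regs=16, cache_size=8):
        self.regs = [0] * num_regs
        self.crcs = [self._crc(0)] * num_regs
        self.rvc = OrderedDict()  # register index -> cached value, oldest first
        self.cache_size = cache_size
        self.stalls = 0           # reads that fell back to the CRC check

    @staticmethod
    def _crc(value):
        return zlib.crc32(value.to_bytes(4, "little"))

    def _cache(self, idx, value):
        self.rvc[idx] = value
        self.rvc.move_to_end(idx)
        if len(self.rvc) > self.cache_size:
            self.rvc.popitem(last=False)  # evict the least recently used entry

    def write(self, idx, value):
        value &= 0xFFFFFFFF
        self.regs[idx] = value
        self.crcs[idx] = self._crc(value)
        self._cache(idx, value)

    def read(self, idx):
        value = self.regs[idx]
        if idx in self.rvc:
            ok = self.rvc[idx] == value               # hit: compare, no stall
        else:
            self.stalls += 1                          # miss: stall and check CRC
            ok = self.crcs[idx] == self._crc(value)
        if not ok:
            raise SoftErrorDetected(f"register r{idx} is corrupted")
        self._cache(idx, value)
        return value

# The sequence from the example slide: mov r5, 8; mov r2, 4; add r3, r2, r5
rf = RegisterFileWithRVC(num_regs=8, cache_size=2)
rf.write(5, 8)
rf.write(2, 4)
rf.write(3, rf.read(2) + rf.read(5))
```

With a cache this small, a later `rf.read(2)` misses and takes the CRC path; flipping a bit in `rf.regs` before a read makes either path raise `SoftErrorDetected`.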

Page 13:

RVC Fault Coverage

57.4%

Page 14:

RVC Overhead

Page 15:

What About the Rest?

• Leverage fault fanout to place detectors at likely targets
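One plausible reading of "place detectors at likely targets" is a ranking of flip-flops by how often they captured an injected fault. The sketch below is an assumption for illustration only: the slides do not describe the selection procedure, the flip-flop names and counts are made up, and each fault is assumed to be counted at exactly one flip-flop.

```python
def place_detectors(capture_counts, budget):
    """Pick the `budget` flip-flops that captured the most injected faults.

    capture_counts maps a (hypothetical) flip-flop name to the number of
    fault-injection trials in which it latched the fault.
    Returns the chosen flip-flops and the fraction of captures they cover.
    """
    ranked = sorted(capture_counts, key=capture_counts.get, reverse=True)
    chosen = ranked[:budget]
    total = sum(capture_counts.values())
    covered = sum(capture_counts[ff] for ff in chosen)
    coverage = covered / total if total else 0.0
    return chosen, coverage

# Made-up counts, purely for illustration.
counts = {"ff_a": 50, "ff_b": 30, "ff_c": 15, "ff_d": 5}
chosen, coverage = place_detectors(counts, budget=2)
```

Sweeping `budget` in such a model traces out a coverage-versus-overhead curve, since each added detector costs area and power.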

Page 16:

Fault Fanout

Page 17:

Transient Fault Detector

[Figure: a main flip-flop (D, CLK, Q) paired with a shadow latch clocked through a delay; the circuit produces an Error signal]

Source: S. Das, "A Self-Tuning DVS Processor Using Delay-Error Detection and Correction," 2006
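The main flip-flop and delayed shadow latch in the figure can be modeled with two samples of the same input. This is a behavioral sketch under assumptions: the comparison of the two samples as the Error signal, the function names, and the time units are all illustrative rather than taken from the slides.

```python
def d_with_glitch(correct, glitch_start, glitch_end):
    """Return D(t): the correct bit, inverted during [glitch_start, glitch_end)."""
    def d(t):
        return correct ^ 1 if glitch_start <= t < glitch_end else correct
    return d

def sample_with_detector(d, clk_edge, shadow_delay):
    """Sample D at the clock edge (main flip-flop) and again after
    `shadow_delay` (shadow latch); flag an error if the samples differ."""
    main_q = d(clk_edge)
    shadow_q = d(clk_edge + shadow_delay)
    return main_q, main_q != shadow_q

# Glitch overlaps the clock edge but ends before the shadow sample: detected.
caught = sample_with_detector(d_with_glitch(1, 9.8, 10.1), clk_edge=10, shadow_delay=0.5)
# Glitch ends well before the clock edge: nothing latched, nothing flagged.
quiet = sample_with_detector(d_with_glitch(1, 5.0, 6.0), clk_edge=10, shadow_delay=0.5)
# Glitch outlasts the shadow delay: both samples are wrong, so it escapes.
missed = sample_with_detector(d_with_glitch(1, 9.8, 10.8), clk_edge=10, shadow_delay=0.5)
```

The `missed` case shows the limit of this model: a pulse wider than the shadow delay corrupts both samples and goes unflagged.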

Page 18:

Glitch Detector Coverage

[Figure: two panels, Power and Area, each plotting coverage (y-axis) against percent overhead (x-axis)]

Page 19:

Combined Technique Coverage

[Figure: two panels, Power and Area, each plotting coverage (y-axis) against percent overhead (x-axis)]

Page 20:

Conclusion

• Circuit-level soft error analysis offers significant insight

• Faults in combinational logic do not require structural duplication

► Coverage versus cost tradeoffs available

► Significant benefits in compromise

• 85% fault coverage for only 5.5% area overhead

► 2-3x increase in MTTF

Page 21:

Questions?

Page 22:

RVC Hit Rates

[Figure: hit rate (y-axis, 0.7 to 1) versus cache size (x-axis, 6 to 16) for cjpeg, djpeg, epic, unepic, g721decode, g721encode, pegwitdecode, pegwitencode, rawcaudio, rawdaudio, and the average]