
UW-Madison Computer Sciences Vertical Research Group© 2010 Relax: An Architectural Framework for Software Recovery of Hardware Faults Marc de Kruijf Shuou.

Dec 17, 2015


UW-Madison Computer Sciences Vertical Research Group © 2010

Relax: An Architectural Framework for Software Recovery of Hardware Faults

Marc de Kruijf, Shuou Nomura, Karthikeyan Sankaralingam


ISCA 2010 - 3

Executive Summary

Problem
  Technology is driving simple hardware
  Fault recovery requires complex hardware

Software Recovery
  Enables simple hardware
  High energy efficiency

Relax: An Architectural Framework for Software Recovery
  ISA: a well-defined interface for software recovery
  Software: support to use the ISA
  Hardware: support to implement the ISA


Architecture Trend
  Energy efficiency
  Hardware simplification


Applications Trend
  Data-intensive, error-tolerant applications
  Search, Computer Vision, Data Mining, Media Processing, Scientific Computing, …


CMOS Trend
  Device variability, wear-out, soft errors


Recovery Support Is Needed: Hardware Recovery or Software Recovery?

Hardware Recovery
  Complex hardware: speculative state
  Inefficient
  No flexibility
  Checkpoints conservative

Software Recovery
  Simple hardware: no speculative state
  Efficient
  Error tolerance
  Natural recovery points


Relax

Software Recovery

Hardware Detection

ISA


Relax: ISA, Software, Hardware


ISA

Software defines recovery handler
Hardware detects and jumps to handler on fault, and is allowed to commit corrupted state*

    rlx RECOVER
    ...
  RECOVER:
    ...

[Figure labels: application, ISA, SIMPLE HARDWARE; error tolerance, software-defined recovery, simplicity, energy efficiency, flexibility]

*Details in paper


-- WARNING -- SOURCE CODE AHEAD

Software

SAD (Sum of Absolute Differences) Example (adapted from an H.264 video encoder)

    #include <stdlib.h>  /* abs */

    int sad(int *left, int *right, int len) {
      int sum = 0;
      for (int i = 0; i < len; ++i) {
        sum += abs(left[i] - right[i]);
      }
      return sum;
    }


The same function wrapped in a relax block:

    int sad(int *left, int *right, int len) {
      relax {
        int sum = 0;
        for (int i = 0; i < len; ++i) {
          sum += abs(left[i] - right[i]);
        }
        return sum;
      } recover { retry; }    /* “discard” variant: recover { return INT_MAX; } */
    }

Corresponding pseudo-assembly:

    ENTRY:
      rlx RECOVER                    # Relax on
      mv 0 -> $sum
      ble $len, 0, EXIT
    LOOP_PREHEADER:
      mv 0 -> $i
    LOOP:
      ld [$left + $i * 4] -> $tmp1
      ld [$right + $i * 4] -> $tmp2
      abs $tmp1, $tmp2 -> $tmp3
      add $sum, $tmp3 -> $sum
      add $i, 1 -> $i
      blt $i, $len, LOOP
    EXIT:
      rlx 0                          # Relax off
      ret $sum
    RECOVER:
      jmp ENTRY                      # “retry”
                                     # “discard” variant: ret 0x7FFFFFFF
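The retry and discard behaviors can be imitated in plain C, which may help when reading the relax block above. This is only a sketch under stated assumptions, not the authors' LLVM-based implementation: `setjmp`/`longjmp` stand in for the hardware jump to the handler, and `recover_point`, `faults_to_inject` and `check_fault` are hypothetical names introduced here to simulate fault detection.

```c
#include <assert.h>
#include <limits.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the hardware (not part of Relax itself). */
static jmp_buf recover_point;     /* plays the role of the recovery target */
static int faults_to_inject = 0;  /* number of detections to simulate */

/* Simulated detection, polled once per loop iteration (an assumption). */
static void check_fault(void) {
    if (faults_to_inject > 0) {
        --faults_to_inject;
        longjmp(recover_point, 1);  /* "hardware" jumps to the handler */
    }
}

/* Imitates: relax { ... } recover { retry; } */
int sad_retry(const int *left, const int *right, int len) {
    assert(len >= 0);
    (void)setjmp(recover_point);  /* region entry; a fault lands back here */
    int sum = 0;                  /* re-initialized on every (re)entry */
    for (int i = 0; i < len; ++i) {
        check_fault();
        sum += abs(left[i] - right[i]);
    }
    return sum;
}

/* Imitates: relax { ... } recover { return INT_MAX; } */
int sad_discard(const int *left, const int *right, int len) {
    assert(len >= 0);
    if (setjmp(recover_point) != 0) {
        return INT_MAX;  /* discard: give up on this result */
    }
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        check_fault();
        sum += abs(left[i] - right[i]);
    }
    return sum;
}
```

With `faults_to_inject` set to 2, `sad_retry` restarts the loop twice and still returns the fault-free sum; with it set to 1, `sad_discard` returns `INT_MAX` without finishing the loop.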

[Figure labels: raw, encoded]

1. No writes to memory
2. Idempotent
3. Recoverable by re-execution

SIMPLE + INTUITIVE + FLEXIBLE
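The three properties listed above can be illustrated with a small contrast. The sketch below is an illustration only, with hypothetical function names: each function simulates a first attempt that is cut short after `fault_at` elements, followed by a retry from the start. The read-only region gives the fault-free answer, while the region that overwrites its own input does not.

```c
#include <assert.h>
#include <stddef.h>

/* Reads its input only; partial work lives in a local that retry resets. */
int sum_with_retry(const int *in, size_t n, size_t fault_at) {
    assert(fault_at <= n);
    int sum = 0;
    for (size_t i = 0; i < fault_at; ++i) sum += in[i];  /* faulty attempt */
    sum = 0;                                             /* retry from start */
    for (size_t i = 0; i < n; ++i) sum += in[i];
    return sum;
}

/* Overwrites its own input, so the retry sees already-modified data. */
void increment_with_retry(int *data, size_t n, size_t fault_at) {
    assert(fault_at <= n);
    for (size_t i = 0; i < fault_at; ++i) data[i] += 1;  /* faulty attempt */
    for (size_t i = 0; i < n; ++i) data[i] += 1;         /* retry from start */
}
```

For input `{1, 2, 3}` and a fault after two elements, `sum_with_retry` returns 6 as if no fault had occurred, whereas `increment_with_retry` leaves `{3, 4, 4}` instead of the intended `{2, 3, 4}`.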


Hardware

Microarchitecture
1. Fine-grained hardware detection (e.g. Argus)
2. Recovery PC register + control logic

SIMPLE MICROARCHITECTURE
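A toy software model of the second item may make the control logic concrete. It is a sketch under assumptions, not the design from the paper: the `Core` struct, `exec_rlx` and `on_fault` are hypothetical names, instructions are assumed to be 4 bytes wide, a recovery PC of 0 is taken to mean "relax off" (mirroring `rlx 0`), and what the hardware does on a fault outside a relax region is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;           /* program counter */
    uint32_t recovery_pc;  /* 0 means relax is off (assumption) */
} Core;

/* rlx <target>: record the handler address, or 0 to turn relax off. */
void exec_rlx(Core *core, uint32_t target) {
    assert(core != NULL);
    core->recovery_pc = target;
    core->pc += 4;  /* assumes fixed 4-byte instructions */
}

/* Called when the detector flags a fault. Returns true if control was
   redirected to the software recovery handler. */
bool on_fault(Core *core) {
    assert(core != NULL);
    if (core->recovery_pc == 0) {
        return false;  /* not inside a relax region */
    }
    core->pc = core->recovery_pc;  /* jump to the handler */
    return true;
}
```

The point of the sketch is how little state is involved: one register written by `rlx` and one redirect of the program counter on detection.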


Hardware Organization

Legend
  “Relaxed” cores: no hardware recovery
  Normal cores: hardware recovery

Homogeneous Relax: all cores with no hardware recovery support
Statically Heterogeneous Relax: some cores with; some cores without
Dynamically Heterogeneous Relax: hardware recovery adaptively disabled

FLEXIBLE DESIGN


ISA, Software, Hardware, Evaluation


Evaluation

Is it useful?

How useful is it?


Is it Useful?

7 Applications
  Language support using LLVM
  One relax region per application (most dominant function)
  Retry and discard behavior

Application Name         Percent Execution Time Contribution of Function
BarnesHut (Lonestar)     >99.9%
bodytrack (PARSEC)       21.9%
canneal (PARSEC)         89.4%
ferret (PARSEC)          15.7%
kmeans (MineBench)       83.3%
raytrace (PARSEC)        49.4%
x264 (PARSEC)            49.2%

IT WORKS!


How Useful Is It?

Software recovery for timing speculation


Methodology

Instruction-level fault injection

Execution time model
  Statically Heterogeneous Architecture

Energy model
  Energy-delay product (EDP)
  Analytical model for hardware efficiency
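For readers unfamiliar with the metric, energy-delay product is energy multiplied by execution time, usually reported relative to a baseline. The helper below is a minimal sketch of that definition only; it does not reproduce the analytical hardware-efficiency model from the paper, and the function names are hypothetical.

```c
#include <assert.h>

/* Energy-delay product: energy multiplied by execution time. */
double edp(double energy, double delay) {
    assert(energy >= 0.0 && delay >= 0.0);
    return energy * delay;
}

/* EDP relative to a baseline; values below 1.0 are an improvement. */
double normalized_edp(double energy, double delay,
                      double base_energy, double base_delay) {
    assert(base_energy > 0.0 && base_delay > 0.0);
    return edp(energy, delay) / edp(base_energy, base_delay);
}
```

As a made-up illustration, a design that uses 0.85 times the baseline energy but takes 1.05 times as long has a normalized EDP of about 0.89, so a small slowdown can still be a net win.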


Results – Execution Time

[Figure: bar chart of execution time (y-axis, 0 to 1.2) for barneshut, bodytrack, canneal, ferret, kmeans, raytrace and x264, with one bar each for retry and discard]

*error rates range from 10^-3 to 10^-6 errors/cycle

Execution time overhead is less than 10%, and 1% is typical

Discard performance is comparable to retry


Results – Energy-delay

[Figure: bar chart of normalized EDP (y-axis, -0.2 to 1.2) for barneshut, bodytrack, canneal, ferret, kmeans, raytrace and x264, with one bar each for retry and discard]

*error rates range from 10^-3 to 10^-6 errors/cycle

Relax achieves energy improvements for timing speculation


Future Work

Better software support
  Compiler automation?
  Binary instrumentation?
  Nesting relax blocks?

Hardware support
  What are the chip-level area and power savings?
  Is Relax hardware truly simpler?

Other domains
  Software rollback for hardware transactional memory?

Tools to assist analysis of “discard”
  Discard is hard to reason about; non-deterministic


Summary

Emerging Architectures
  Many-core architectures are simple
  Hardware fault recovery is complex

Emerging Applications
  Error tolerant
  Large idempotent regions

Software Recovery is a natural fit

Relax: an architectural framework for software recovery
  ISA: an interface to define it
  Software: support for applications to use it
  Hardware: hardware that enables it


Questions?


ISA Semantics

Errors must be “spatially contained” to the target resources of a relax block
  Misdirected stores and register writes are not recoverable by Relax!

Errors must be “temporally contained” to the scope of a relax block
  ECC (or other technique) necessary for memory
  Cache coherence, cache writeback, etc. require other mechanisms

Control flow must be “legal” (follow static control flow edges)
  Includes hardware exceptions (must wait on detection before trap)

Atomic operations (e.g. atomic increment) are problematic
  Not supported (sorry)


Fault Detection

Short latencies important for
  Detecting misdirected stores
  Detecting misdirected register writes

Otherwise, latencies depend on region sizes
  50 cycle regions + 5 cycle latency = 10% overhead
  Average region sizes in paper = 1000 cycles
    Then, 10 cycle latency = 1% overhead
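The arithmetic on this slide is consistent with a simple model in which each relax region pays the detection latency once, so overhead is latency divided by region size. That model is an assumption read off the slide's two data points, and `detection_overhead` is a hypothetical helper name.

```c
#include <assert.h>

/* Fractional overhead if every region waits out the detection latency once. */
double detection_overhead(double region_cycles, double latency_cycles) {
    assert(region_cycles > 0.0 && latency_cycles >= 0.0);
    return latency_cycles / region_cycles;
}
```

`detection_overhead(50, 5)` gives 0.10 and `detection_overhead(1000, 10)` gives 0.01, matching the 10% and 1% figures above.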


“Optimal” Error Rate

[Figure: three panels, each plotted against error rate: Hardware Efficiency (y-axis: EDP), Execution Time (y-axis: Time), and Overall Efficiency (y-axis: EDP, with the optimum marked)]