Exploiting Quality-Efficiency Tradeoffs with Arbitrary Quantization Special Session - CODES+ISSS Thierry Moreau, Felipe Augusto, Patrick Howe Armin Alaghi, Luis Ceze
Source: homes.cs.washington.edu/~moreau/media/slides/qappa-codes2017-slides...

Apr 14, 2018

Page 1

Exploiting Quality-Efficiency Tradeoffs with Arbitrary Quantization

Special Session - CODES+ISSS

Thierry Moreau, Felipe Augusto, Patrick Howe, Armin Alaghi, Luis Ceze

Page 2

Internet of Things Revolution

[Figure labels: noisy, real-world sensory input; processing; aggregate analytics, consumed by humans, etc.]

… double temp = sensor_acquire();

Approximate computing: eliminate inefficiencies in systems by producing just-the-right quality

Page 3

Quantization: going back to basics

[Figure labels: noisy, real-world sensory input; processing; aggregate analytics, consumed by humans, etc.; SRAM; ALU]

Page 4

This Talk: A “Limit Study” on Precision Scaling

[Figure: precision scale with labels double, float, n, 1]

Assumption: hardware that can dynamically and arbitrarily scale its precision

SW Scope: compute heavy, regular applications

HW Scope: hardware accelerators

Page 5

Talk Overview

1. How much precision is needed at different stages of a program?

2. How much energy can be saved (upper bound)?

3. How does this inform approximate computing research?

Page 6

Talk Overview

1. How much precision is needed at different stages of a program?

QAPPA - Precision Autotuner

2. How much energy can be saved?

3. How does this inform approximate computing research?

Page 7

QAPPA: Quality Autotuner for Precision-Programmable Accelerators

Goal: Minimize instruction-level precision requirements given a quality target

[Figure: kernel.c and a desired quality target feed the QAPPA framework, which outputs instruction-level precision requirements and quality & energy savings.]

Built on top of ACCEPT, the approximate C/C++ compiler http://accept.rocks

Page 8

QAPPA Autotuner Overview

[Figure: instruction 0, instruction 1, instruction 2, …, instruction n-1, instruction n, with indicators for application quality (bad / OK) and savings. State shown: Default (no savings).]

Page 9

QAPPA Autotuner Overview

[Figure: instruction 0, instruction 1, instruction 2, …, instruction n-1, instruction n, with indicators for application quality (bad / OK) and savings. State shown: Optimized, extraneous precision is shaved off.]

Page 10

QAPPA 5-Step Description

[Flow diagram labels:]

Annotated Program
Program Inputs & Quality Metrics
ACCEPT static analysis
ILPC*
ACCEPT error injection & instrumentation
Approximate Binary
Execution & Quality Assessment
Quality Autotuner
Output Configuration
Quality Results & Bit Savings

* Instruction-level Precision Configuration

Page 11

1. Program Annotation

void conv2d (APPROX pix *in, APPROX pix *out, APPROX flt *filter) {
  for (row) {
    for (col) {
      APPROX flt sum = 0;
      int dstPos = …;
      for (row_offset) {
        for (col_offset) {
          int srcPos = …;
          int fltPos = …;
          sum += in[srcPos] * filter[fltPos];
        }
      }
      out[dstPos] = sum / normFactor;
    }
  }
}

Key: use the APPROX type qualifier [*]

[*] EnerJ, Sampson et al., PLDI’11

Page 12

2. Static Analysis

[Figure: the annotated conv2d kernel is passed through ACCEPT to produce the configuration below.]

Instruction-Level Precision Configuration (ILPC)

conv2d:13:7:load:Int32
conv2d:13:10:load:Float
conv2d:13:11:fmul:Float
conv2d:13:12:fadd:Float
conv2d:15:1:fdiv:Float
conv2d:15:7:store:Int32

ACCEPT identifies safe-to-approximate instructions from data annotations using flow analysis

Page 13

3. Error Injection

Instruction-Level Precision Configuration (ILPC)

conv2d:13:7:load:Int4
conv2d:13:10:load:Fix2.3
conv2d:13:11:fmul:Fix2.3
conv2d:13:12:fadd:Fix4.5
conv2d:15:1:fdiv:Fix2.3
conv2d:15:7:store:Int4

Instrumentation & Compilation

Approximate Binary

Each instruction in the ILPC acts as a quality knob that the autotuner can use to maximize bit-savings
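The reduced-precision types in the configuration (Int4, Fix2.3, Fix4.5) can be illustrated with a small quantization sketch. The slides do not define the notation, so the code below assumes that "FixI.F" means a signed fixed-point value with I integer bits (sign included) and F fractional bits, and that "IntN" is a signed N-bit integer. The function names, the round-half-up rule, and the saturating overflow behavior are all illustrative choices, not QAPPA's implementation.

```python
import math


def quantize_fixed(value, int_bits, frac_bits):
    """Round `value` onto a signed fixed-point grid and saturate on overflow.

    Assumed format (not confirmed by the slides): `int_bits` integer bits,
    sign included, plus `frac_bits` fractional bits.
    """
    scale = 1 << frac_bits                    # grid steps per unit
    total_bits = int_bits + frac_bits
    lowest = -(1 << (total_bits - 1))         # most negative raw code
    highest = (1 << (total_bits - 1)) - 1     # most positive raw code
    raw = math.floor(value * scale + 0.5)     # round half up
    raw = max(lowest, min(highest, raw))      # saturate instead of wrapping
    return raw / scale


def quantize_int(value, bits):
    """Round `value` to a signed integer of `bits` bits, with saturation."""
    return int(quantize_fixed(value, bits, 0))
```

Under these assumptions, quantize_fixed(0.3, 2, 3) lands on the nearest multiple of 1/8, and values outside the representable range are clamped to its ends.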

Page 14

4. Quality Assessment

The programmer provides a quality assessment script to evaluate quality on the program output

[Figure: outputs of the Reference Binary and the Approximate Binary go to eval.py; example result shown: 10dB SNR]
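A quality assessment script of this kind could look roughly like the sketch below. It compares the outputs of the reference and approximate runs and reports SNR in dB. The function names and the list-of-numbers output format are assumptions for illustration; the slides do not show the contents of eval.py.

```python
import math


def snr_db(reference, approximate):
    """Signal-to-noise ratio in dB of `approximate` relative to `reference`."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approximate))
    if noise == 0:
        return math.inf       # outputs are identical
    if signal == 0:
        return -math.inf      # all-zero reference, non-zero error
    return 10.0 * math.log10(signal / noise)


def meets_target(reference, approximate, target_db):
    """True when the approximate output reaches the requested quality."""
    return snr_db(reference, approximate) >= target_db
```

For example, a reference output of [10.0, 0.0] and an approximate output of [9.0, 0.0] give a signal-to-noise power ratio of 100, which is 20 dB.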

Page 15

5. Autotuning Algorithm

config k: error = 0.10%

config [k+1, i-1]: error = 5.91%

config [k+1, i]: error = 0.30%

config [k+1, i+1]: error = 0.12%

config [k+2, i-1]: error = 5.91%

config [k+2, i]: error = 0.33%

config [k+2, i+1]: error = 1.6%

Greedy iterative algorithm [*]: reduces precision requirement of the instruction that impacts quality the least

Finds a solution in O(m^2 n) in the worst case, where m is the number of static safe-to-approximate instructions and n is the number of precision levels available to each instruction

[*] Precimonious, Rubio-Gonzalez et al., SC’13
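The greedy loop described on this slide can be sketched as follows. This is a minimal illustration, not the QAPPA code: `measure_error` stands in for running the instrumented binary and the quality script, `toy_error` is a made-up error model used only to exercise the loop, and reducing precision one bit at a time is an assumption about the step size.

```python
def greedy_autotune(num_instructions, max_bits, measure_error, max_error):
    """Greedily lower per-instruction bit-widths while error stays in budget.

    `measure_error(config)` is a hypothetical callback that evaluates a
    configuration (a list of bit-widths) and returns the output error.
    """
    config = [max_bits] * num_instructions
    while True:
        best_index, best_error = None, None
        for i in range(num_instructions):
            if config[i] <= 1:
                continue                  # nothing left to remove here
            trial = list(config)
            trial[i] -= 1                 # drop one bit from instruction i
            error = measure_error(trial)
            within_budget = error <= max_error
            if within_budget and (best_error is None or error < best_error):
                best_index, best_error = i, error
        if best_index is None:
            return config                 # no single-bit reduction fits
        config[best_index] -= 1           # commit the least harmful reduction


def toy_error(config):
    """Made-up model: instruction 0 is 100x more sensitive than instruction 1."""
    weights = [1.0, 0.01]
    return sum(w * 2.0 ** -bits for w, bits in zip(weights, config))
```

Each pass tries up to m candidates and at most m(n-1) reductions can be committed, which matches the O(m^2 n) bound quoted above. With the toy model, greedy_autotune(2, 8, toy_error, 0.1) strips the insensitive instruction down first and then trims the sensitive one until the budget would be exceeded.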

Page 16

5. Autotuning Algorithm

[Figure: output at successive quality targets: precise, 60dB, 40dB, 20dB, 10dB]

The autotuner greedily maximizes bit-savings as the quality target is lowered

Page 17

PERFECT Application Study

Application Domain: Kernels

PERFECT Application 1: Discrete Wavelet Transform, 2D Convolution, Histogram Equalization
Space Time Adaptive Processing: Outer Product, System Solve, Inner Product
Synthetic Aperture Radar: Interpolation 1, Interpolation 2, Back Projection
Wide Area Motion Imaging: Debayer, Image Registration, Change Detection
Required Kernels: FFT 1D, FFT 2D

Metric: Signal to Noise Ratio (SNR), [120dB to 10dB] (0.0001% to 31.6% MSE)
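The two endpoints quoted for the metric (120dB with 0.0001%, 10dB with 31.6%) are consistent with the relation SNR = -20 log10(e), where e is the relative error as a fraction. That relation is inferred from the two pairs and is not stated on the slide. The helper names below are illustrative.

```python
import math


def snr_db_to_error_percent(snr_db):
    """Relative error in percent for a given SNR, assuming SNR = -20*log10(e)."""
    return 100.0 * 10.0 ** (-snr_db / 20.0)


def error_percent_to_snr_db(error_percent):
    """Inverse of snr_db_to_error_percent under the same assumption."""
    return -20.0 * math.log10(error_percent / 100.0)
```

Under the same assumption, a 10% error corresponds to 20dB and a 0.001% error to 100dB.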

Page 18

Opportunity of Approximations

QAPPA Analyzes PERFECT Dynamic Instruction Mix

[Pie chart slices: load/store 27%, int arith 4%, fp arith 31%, math 1%, int arith 25%, control 11%. Legend: Safe to approximate, Precise.]

Page 19

Average Precision Reduction Achieved Across PERFECT Kernels

[Bar chart: dynamic precision reduction on safe-to-approximate instructions (0% to 100%) versus target application SNR (dB) at 10, 20, 40, 60, 80, 100, 120. Bar labels: 26%, 32%, 40%, 48%, 57%, 74%, 83%. Annotations: High Quality, Approximate, More savings.]

Page 20

Average Precision Reduction Achieved Across PERFECT Kernels

[Bar chart: dynamic precision reduction on safe-to-approximate instructions (0% to 100%) versus average SNR (dB) at 10, 20, 40, 60, 80, 100, 120. Bar labels: 26%, 32%, 40%, 48%, 57%, 74%, 83%. Annotation: PERFECT Manual 0.001% MSE.]

Page 21

Average Precision Reduction Achieved Across PERFECT Kernels

[Bar chart: dynamic precision reduction on safe-to-approximate instructions (0% to 100%) versus average SNR (dB) at 10, 20, 40, 60, 80, 100, 120. Bar labels: 26%, 32%, 40%, 48%, 57%, 74%, 83%. Annotation: Approximate Computing 10% MSE.]

Page 22

Talk Overview

1. How much precision is needed at different stages of a program?

QAPPA - Precision Autotuner

2. How much energy can be saved (upper bound)?

Case Study of Precision Scaling Hardware Mechanisms

3. How does this inform approximate computing research?

Page 23

Translating Precision Reduction into Energy Savings (Compute)

[Figure: ALU datapath panels (a) to (d) with example 8-bit operands; block labels include quant, ser, de-ser, q. Highlighted: Baseline ALU (No savings).]

Page 24

Translating Precision Reduction into Energy Savings (Compute)

[Figure: ALU datapath panels (a) to (d) with example 8-bit operands; block labels include quant, ser, de-ser, q. Highlighted: Baseline ALU (No savings); Value Truncation, QUORA [MICRO’13] (Less Power).]

Page 25

Translating Precision Reduction into Energy Savings (Compute)

[Figure: ALU datapath panels (a) to (d) with example 8-bit operands; block labels include quant, ser, de-ser, q. Highlighted: Baseline ALU (No savings); Value Truncation, QUORA [MICRO’13] (Less Power); Bit-Sliced, Stripes [MICRO’16] (Higher Throughput).]
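The two mechanisms named here, value truncation and bit slicing, can be mimicked in software to show what each one does to the operands. This is a behavioral sketch only: it does not model the QUORA or Stripes hardware, the function names are made up, and it assumes unsigned operands whose width and active bit count are multiples of the slice width.

```python
def truncate_lsbs(value, width, keep_bits):
    """Zero the (width - keep_bits) least-significant bits of an unsigned value."""
    drop = width - keep_bits
    return (value >> drop) << drop


def sliced_add(a, b, width, slice_width, active_bits):
    """Add the top `active_bits` of two unsigned `width`-bit operands slice by slice.

    Slices are processed from the lowest active one upward, with the carry
    passed between slices. Returns (sum modulo 2**width, slice steps used).
    """
    mask = (1 << slice_width) - 1
    start = width - active_bits           # lowest bit position processed
    result, carry, steps = 0, 0, 0
    for pos in range(start, width, slice_width):
        chunk = ((a >> pos) & mask) + ((b >> pos) & mask) + carry
        result |= (chunk & mask) << pos   # keep this slice of the sum
        carry = chunk >> slice_width      # carry into the next slice
        steps += 1
    return result & ((1 << width) - 1), steps
```

With operands 0b01001001 and 0b01100010, a full-precision add with 4-bit slices takes two steps, while keeping only the top 4 bits takes one step and matches the sum of the truncated operands. Fewer active bits means fewer slice steps, which is the lever a bit-sliced design uses to trade precision for work.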

Page 26

Case Study: Precision Scaled Adder

Methodology: Post-place-and-route PrimeTime power analysis on a 65nm TSMC library

Goal: Design a precision-scalable adder that can gracefully trade lower precision for energy savings

Exploration: Combine value truncation and bit slicing techniques, and vary the slice width in powers of 2

Page 27

Precision Scaled Adder Study

[Plot: energy cost (pJ, axis 0.00 to 1.80) versus input bit-width (0 to 64). Series: 64. Annotations: technique 1: value truncation; offset due to static power.]

Page 28

Precision Scaled Adder Study

[Plot: energy cost (pJ, axis 0.00 to 1.80) versus input bit-width (0 to 64). Series: 1, 64. Annotation: technique 2: bit slicing.]

Page 29

Case Study: Precision-Scaled Adder

[Plot: energy cost (pJ, axis 0.00 to 1.80) versus input bit-width (0 to 64). Series by slice width: 1, 2, 4, 8, 16, 32, 64.]

We look at different slice widths in power-of-2 increments

A 2-bit slice seems to be the energy-optimal design point

Page 30

PERFECT Study: Compute Energy Savings

[Plot: Average Compute Energy Savings vs. Application SNR. Y axis: energy savings (x), 0 to 10, higher is better. X axis: application SNR (dB), 20 to 60, higher is better. Series by slice width: 1, 2, 4, 8, 16, 32; reference points labeled quora and stripes. Data labels as extracted: 2.5, 2.6, 2.9, 3.1, 3.8, 3.6, 4.3, 4.8, 5.6, 7.1, 2.8, 3.0, 3.2, 3.6, 7.7, 1.4, 1.5, 1.7, 2.0, 2.5.]

Page 31

PERFECT Study: Compute Energy Savings

[Plot: Average Compute Energy Savings vs. Application SNR. Y axis: energy savings (x), 0 to 10, higher is better. X axis: application SNR (dB), 20 to 60, higher is better. Series by slice width: 1, 2, 4, 8, 16, 32.]

At 40dB a 16b sliced ALU can achieve a 4.8x energy reduction!

Page 32

PERFECT Study: Compute Energy Savings

[Plot: Average Compute Energy Savings vs. Application SNR. Y axis: energy savings (x), 0 to 10, higher is better. X axis: application SNR (dB), 20 to 60, higher is better. Series by slice width: 1, 2, 4, 8, 16, 32.]

At 20dB the optimal design point shifts to an 8-bit slice

Page 33

Talk Overview

1. How much precision is needed at different stages of a program?

QAPPA - Precision Autotuner

2. How much energy can be saved (upper bound)?

Case Study of Precision Scaling Hardware Mechanisms

3. How does this inform approximate computing research?

Comparative Study of Approximation Techniques

Page 34

Comparative Study

Many papers on approximate computing state:

“Our technique provided n times speedup at x% error”

Problem: This gives us a data point but says little about the merits of the technique at trading accuracy for efficiency

Solution: Use QAPPA to produce quick comparison results to assess the effectiveness of a technique

Page 35

Comparative Study - Voltage Overscaling

Methodology (1/2): SPICE simulation of an ALU/FPU design under different voltage overscaling factors.

[3D plot, fp adder example: error probability (%, axis 0 to 40) as a function of bit position (0 to 31) and overscaling factor (0.8 to 1).]

Page 36

Comparative Study - Voltage Overscaling

Results: Precision scaling always produces better quality/efficiency

Methodology (2/2): Then we feed the error model into QAPPA’s error injection framework to assess application error.

[Bar chart: SNR (dB, axis 0 to 40, higher is better) per kernel (2dconv, dwt, histeq, outer, systemsolve, inner, interp1, interp2, bp, debayer, lucaskanade, changedet, fft1d, fft2d) for VOF=0.95, VOF=0.90, VOF=0.84.]
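An error model expressed as one error probability per bit position, as in the fp adder plot, can be applied to a value along the lines of the sketch below. How QAPPA's injection framework does this internally is not shown in the slides, so the function name, the independent-flip assumption, and the use of a caller-supplied random generator are illustrative only.

```python
import random


def inject_bit_errors(value, flip_probabilities, rng):
    """Flip bit i of an unsigned integer with probability flip_probabilities[i].

    Each bit is treated as independent, which is an assumption of this sketch.
    `rng` is a random.Random instance so that runs can be made reproducible.
    """
    for bit, probability in enumerate(flip_probabilities):
        if rng.random() < probability:
            value ^= 1 << bit             # flip this bit position
    return value
```

A call such as inject_bit_errors(result_bits, probabilities_for_vof, random.Random(0)) would then stand in for one faulty ALU operation at a given overscaling factor, where both argument names are placeholders for data the SPICE study would supply.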

Page 37

Future Directions in Architecture/CAD

Precision Scaling Architectures: Need to see more precision-scaled accelerators for more applications, along the lines of QUORA [MICRO’13] and Stripes [MICRO’16]

CAD tools with Quality Awareness: Need to see more tools that can leverage quantization, especially in the FPGA community, along the lines of AHLS [DATE’17]

Page 38

Conclusion

1. How much precision is needed at different stages of a program?

QAPPA - Precision Autotuner

2. How much energy can be saved (upper bound)?

Case Study of Precision Scaling Hardware Mechanisms

3. How does this inform approximate computing research?

Comparative Study of Approximation Techniques

Page 39

Special Session - CODES+ISSS

Thierry Moreau, Felipe Augusto, Patrick Howe, Armin Alaghi, Luis Ceze

Exploiting Quality-Efficiency Tradeoffs with Arbitrary Quantization

Thank you!