Energy Efficient Computing Systems Exploiting Online Tuning and Output Quality Management Gianluca Palermo

Mar 18, 2022

Page 1: Energy Efficient Computing Systems Exploiting Online ...

Energy Efficient Computing Systems Exploiting Online Tuning and Output Quality Management

Gianluca Palermo

Page 2: Energy Efficient Computing Systems Exploiting Online ...

Extra-Functional Properties

Functional Description: what to do…

Extra-Functional Properties: how it is done…

Page 3: Energy Efficient Computing Systems Exploiting Online ...

Extra-Functional Properties


Page 4: Energy Efficient Computing Systems Exploiting Online ...

Simple Power-Performance trade-off

Power = f * C * V^2 + Pstatic

Energy = Power * Delay

[Figure: Power and Delay versus Platform Frequency, showing the Dynamic Power curve and the Power, Delay, and Energy constraints.]

Page 5: Energy Efficient Computing Systems Exploiting Online ...

Energy Vs Power

•  Power poses constraints
   •  E.g. power delivery or cooling solution

•  Energy is in most cases the ultimate metric
   •  It measures the cost of performing a fixed task

•  How can I reduce Energy?
   •  Voltage and Frequency Scaling
   •  Dynamic Power Switching
   •  […]

Page 6: Energy Efficient Computing Systems Exploiting Online ...

(Dynamic) Voltage and Frequency Scaling (DVFS)

•  Small reductions in voltage can be very significant
   •  Power is proportional to frequency and to the square of voltage
•  Effective way to reduce energy by providing just-enough computing power
•  F and V are not independent variables
•  Choosing the right VF operating point is not straightforward

K. Choi, R. Soma, M. Pedram "Dynamic voltage and frequency scaling based on workload decomposition" ISLPED 2004

[Figure: two Power versus Execution Time plots of a workload W against a DEADLINE, one at Freq = F and one at Freq = F/3.]
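As a rough illustration of the F versus F/3 comparison, the sketch below evaluates Power = f * C * V^2 + Pstatic and Energy = Power * Delay for a fixed workload. All numbers (capacitance, voltages, static power, cycle count) are hypothetical, and the assumption that the voltage can be lowered together with the frequency is ours, not something stated on the slide.

```python
def power(freq_hz, cap_f, volt, p_static):
    """Total power: dynamic part f * C * V^2 plus a constant static part."""
    return freq_hz * cap_f * volt ** 2 + p_static


def energy(cycles, freq_hz, cap_f, volt, p_static):
    """Energy = Power * Delay, where Delay = cycles / frequency."""
    delay_s = cycles / freq_hz
    return power(freq_hz, cap_f, volt, p_static) * delay_s


# Hypothetical workload and platform parameters (illustration only).
CYCLES = 3e9      # work to be done, in clock cycles
CAP = 1e-9        # effective switched capacitance, in farads
P_STATIC = 0.1    # static power, in watts

# Freq = F at nominal voltage versus Freq = F/3 at a reduced voltage.
e_full = energy(CYCLES, 3e9, CAP, 1.2, P_STATIC)
e_scaled = energy(CYCLES, 1e9, CAP, 0.8, P_STATIC)

print(f"Energy at F:   {e_full:.2f} J")
print(f"Energy at F/3: {e_scaled:.2f} J")
```

With these made-up numbers the scaled run needs about half the energy, even though the static contribution grows from 0.1 J to 0.3 J because the task takes three times longer.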

Page 7: Energy Efficient Computing Systems Exploiting Online ...

Dynamic Power Switching (DPS)

•  Keeping devices powered-on consumes power
   •  Cut or reduce power on idle device portions
•  Significant overheads for switching to deep low-power states
•  Different low-power states
   •  Some of them need the context save/restore
   •  E.g. Retention states -> off states
•  Wakeup latency reduces the responsiveness
•  In some contexts also called Race-to-Halt

[Figure: two Power versus Execution Time plots of the workload W, one of them showing the switching Overhead.]
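The switching overhead means that powering down only pays off when the idle interval is long enough. A minimal sketch of that break-even reasoning follows; the power figures and the overhead energy and latency are hypothetical values chosen for illustration.

```python
def energy_stay_on(idle_s, p_idle):
    """Energy spent if the device simply stays powered during the idle time."""
    return p_idle * idle_s


def energy_power_down(idle_s, p_sleep, e_overhead, t_overhead):
    """Energy spent if the device enters a low-power state.

    e_overhead and t_overhead model the cost of entering and leaving the
    state (e.g. context save/restore). Returns None if the idle interval
    is shorter than the transition itself.
    """
    if idle_s < t_overhead:
        return None
    return e_overhead + p_sleep * (idle_s - t_overhead)


def break_even_idle_time(p_idle, p_sleep, e_overhead, t_overhead):
    """Idle time above which powering down consumes less energy."""
    # Solve p_idle * t = e_overhead + p_sleep * (t - t_overhead) for t.
    t = (e_overhead - p_sleep * t_overhead) / (p_idle - p_sleep)
    return max(t, t_overhead)


# Hypothetical numbers: 0.1 W idle, 0.01 W asleep, 0.05 J and 0.1 s overhead.
t_be = break_even_idle_time(0.1, 0.01, 0.05, 0.1)
print(f"Break-even idle time: {t_be:.3f} s")
```

Idle intervals shorter than the break-even time are cheaper to spend powered on; the wakeup latency still has to be accounted for separately when responsiveness matters.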

Page 8: Energy Efficient Computing Systems Exploiting Online ...

Is there any way to reduce the amount of computation to be performed to consume less?

… Longer Week-Ends are my Dream

[Figure: Power versus Execution Time plots comparing the original workload W with a reduced workload W'.]

Page 9: Energy Efficient Computing Systems Exploiting Online ...

Approximate Computing

[Figure: Performance, Energy, and Accuracy, with Accuracy traded against Performance and Energy Reduction.]

Page 10: Energy Efficient Computing Systems Exploiting Online ...

Approximate Computing Applications

•  Approximation adds complexity
   ‒  Not all codes can handle it
•  … but many expensive applications or kernels are naturally error-tolerant
   ‒  Make use of analog inputs
      ‒  E.g. operating on noisy real-world data from noisy sensors
   ‒  Provide analog output
      ‒  E.g. targeting human perception
   ‒  Provide multiple good-enough results or no unique answer
      ‒  E.g. web search
   ‒  Compute iteratively towards convergence
      ‒  E.g. applications that converge over the number of iterations

V. Chippa et al. "Analysis and characterization of inherent application resilience for approximate computing" DAC 2013.

Page 11: Energy Efficient Computing Systems Exploiting Online ...

Possible Application Domains…

•  Image Processing
•  Big Data Analytics
•  Robotics
•  Drug Discovery
•  Graph Analytics
•  Traffic Prediction
•  Multimedia

… where 100% accuracy is not always required

HOWEVER …

Page 12: Energy Efficient Computing Systems Exploiting Online ...

… we have to pay attention


No Traffic

Page 13: Energy Efficient Computing Systems Exploiting Online ...

Let’s have an example…

2 eyes = 3 dimensions

[Figure: Left camera and Right camera images with the Reference disparity.]

K. Zhang et al. "Cross-Based Local Stereo Matching Using Orthogonal Integral Images". IEEE Transactions on Circuits and Systems for Video Technology 2009

Page 14: Energy Efficient Computing Systems Exploiting Online ...

Tunable Stereo Matching

[Figure: Left camera and Right camera images and the Reference disparity, next to the disparity maps of configurations 1, 2, and 3. QoR: Disparity Error.]

5 Application Knobs*

* Paone et al. "An Exploration Methodology for a Customizable OpenCL Stereo-Matching Application Targeted to an Industrial Multi-Cluster Architecture" In CODES+ISSS 2012

Page 15: Energy Efficient Computing Systems Exploiting Online ...

Trading-off Accuracy

[Figure: Accuracy (QoR) versus Performance, with axis marks from 0.5FPS to 10FPS and configurations 1, 2, and 3 highlighted.]

Extra-functional requirements: What if …
1.  Performance = 4FPS
2.  Performance = F(Speed)
3.  Min Energy; QoR>50%; Perf>=1

[Figure: Power versus Execution Time between consecutive New Frame events, once with an Idle interval and once Applying VFS.]

D. Gadioli et al. "Application Autotuning to support runtime adaptivity in multicore architectures" SAMOS 2015

Page 16: Energy Efficient Computing Systems Exploiting Online ...

State of the Art…

In the literature, AC has been addressed from different perspectives and at different levels:

•  Applications: possible application domains and knobs to trade-off Accuracy and Performance
•  Compilers and Code Transformations: strategies for approximation due to code manipulation
•  Architectures and Circuits: use of Inexact HW (e.g. Neural accelerators, voltage over scaling, approximate memories)
•  Frameworks: DSL and Language supports for specifying quality requirements and approximation possibilities

S. Mittal, "A survey of techniques for approximate computing," ACM Computing Surveys, pp. 1–34, 2016.

Page 17: Energy Efficient Computing Systems Exploiting Online ...

Application-level Approximations

•  Application parameters
   •  E.g. Video Resolution, #MC Samples
•  Task skipping
•  Multiversioning
   •  Considering different performance/accuracy trade-offs
•  Approximate Memoization
   •  Use partial key for the lookup

Computation model that represents an approximate application as a pipeline:
J. San Miguel et al. "The anytime automaton." ISCA 2016.

V. Vassiliadis et al. "Exploiting Significance of Computations for Energy-Constrained Approximate Computing" IJPP 2016.

Page 18: Energy Efficient Computing Systems Exploiting Online ...

Application-level Approximations (cont…)

DSL or annotation based approaches:
J. Ansel et al. "PetaBricks: a language and compiler for algorithmic choice" PLDI 2009
W. Baek et al. "Green: a framework for supporting energy-conscious programming using controlled approximation." PLDI 2010

Page 19: Energy Efficient Computing Systems Exploiting Online ...

Application-level Approximations (cont…)

Approximate Memoization with a partial key, e.g. (p & 0xffff0000):
M. Alvarez et al. "Fuzzy Memoization for Floating-Point Multimedia Applications" TCOM 2005.
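The partial-key lookup can be sketched as follows. This is a minimal illustration, not the scheme of the cited paper: it assumes the operand is a 32-bit float, masks its bit pattern with 0xffff0000, and uses the result as the key of a software cache. The names partial_key, fuzzy_memoize, and slow_sqrt are hypothetical.

```python
import math
import struct

# Keep the sign, the 8 exponent bits, and the top 7 mantissa bits of a float32.
MASK = 0xFFFF0000


def partial_key(x):
    """Return the masked float32 bit pattern of x, used as lookup key."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits & MASK


def fuzzy_memoize(func):
    """Reuse a cached result for any input that shares the same partial key."""
    cache = {}

    def wrapper(x):
        key = partial_key(x)
        if key not in cache:
            cache[key] = func(x)
        return cache[key]

    return wrapper


stats = {"calls": 0}  # counts how often the real function is evaluated


@fuzzy_memoize
def slow_sqrt(x):
    stats["calls"] += 1
    return math.sqrt(x)


a = slow_sqrt(2.0)     # computed
b = slow_sqrt(2.001)   # same partial key as 2.0, so the cached value is reused
c = slow_sqrt(2.5)     # different partial key, computed again
print(a, b, c, stats["calls"])
```

A wider mask gives more accurate results and fewer cache hits; a narrower mask does the opposite, which makes the mask itself a tuning knob.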

Page 20: Energy Efficient Computing Systems Exploiting Online ...

Compiler Approaches

-  Precision Scaling
   -  FP64->FP32->FP16
   -  Float2Int
   -  Custom precision

N. Ho et al. "Efficient floating point precision tuning for approximate computing," ASP-DAC 2017.

-  Loop Perforation
   -  Modulo Perforation
   -  Truncation Perforation
   -  Randomized Perforation (rand( ))

S. Misailovic et al. "Quality of service profiling." ICSE 2010
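The three perforation variants can be illustrated on a simple averaging loop. The sketch below is our own simplified reading of the terms (execute every stride-th iteration, execute only a leading part of the iterations, execute each iteration with a given probability); function names and parameters are hypothetical and not taken from the cited work.

```python
import random


def modulo_indices(n, stride):
    """Modulo perforation: execute only every stride-th iteration."""
    return [i for i in range(n) if i % stride == 0]


def truncation_indices(n, keep):
    """Truncation perforation: execute only the first `keep` iterations."""
    return list(range(min(n, keep)))


def randomized_indices(n, keep_prob, rng):
    """Randomized perforation: execute each iteration with probability keep_prob."""
    return [i for i in range(n) if rng.random() < keep_prob]


def perforated_mean(data, indices):
    """Mean computed only over the iterations that were not skipped."""
    if not indices:
        raise ValueError("all iterations were skipped")
    return sum(data[i] for i in indices) / len(indices)


data = [float(i % 10) for i in range(1000)]
exact = sum(data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
for name, idx in [
    ("modulo", modulo_indices(len(data), 3)),
    ("truncation", truncation_indices(len(data), 500)),
    ("randomized", randomized_indices(len(data), 0.5, rng)),
]:
    approx = perforated_mean(data, idx)
    print(f"{name}: {len(idx)} iterations, error {abs(approx - exact):.3f}")
```

On this regular input all three variants stay close to the exact mean; on data with trends or outliers, truncation in particular can introduce a much larger error.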

Page 21: Energy Efficient Computing Systems Exploiting Online ...

Approximate Hardware

Very large literature … Dual VDD architectures, Approximate Adders/Multipliers, NPU etc…

Truffle Core (CPU with VDDH and VDDL supplies)

    ld    0x04 r1
    ld    0x08 r2
    add.a r1 r2 r3
    st.a  0x0c r3

H. Esmaeilzadeh et al. "Architecture support for disciplined approximate programming" ASPLOS 2012

PARROT (NPU)
•  Efficient hardware implementations
•  Capable of mimicking many APPROXIMABLE computations (after training)
•  Fault tolerant

H. Esmaeilzadeh et al. "Neural Acceleration for General-Purpose Approximate Programs" MICRO 2012

EDA perspective

S. Venkataramani et al. "SALSA: systematic logic synthesis of approximate circuits" DAC 2012
I. Scarabottolo et al. "Circuit Carving: A Methodology for the Design of Approximate Hardware" DATE 2018

Page 22: Energy Efficient Computing Systems Exploiting Online ...

Approximate Frameworks (1)

EnerJ
•  Java extension for Approximate Computation
•  Including Compiler and Runtime envisioning the usage of approximate HW

    @approx int a = ...;
    @precise int p = ...;

A. Sampson et al. "EnerJ: Approximate Data Types for Safe and General Low-Power Computation" PLDI 2011

Page 23: Energy Efficient Computing Systems Exploiting Online ...

Approximate Frameworks (2)

•  Functional Description: C/C++ w/ OpenMP, MPI, OpenCL, Matlab
•  Extra-Functional Requirements and Approximation-aware transformations (DSL: LARA, AOP)
   •  E.g. Adaptivity features, Tuning Knobs, Code transformations, Power/Performance constraints
•  DSL WEAVER
•  Multiversioning aspect including Modulo Perforation

Silvano et al. "Autotuning and adaptivity in energy efficient HPC systems: the ANTAREX toolbox" Computing Frontiers 2018

D. Gadioli et al. "SOCRATES: A seamless online compiler and system runtime autotuning framework for energy-aware applications" DATE 2017

Page 24: Energy Efficient Computing Systems Exploiting Online ...

Controlling the Approximation

Need for monitoring the Quality of Result (QoR)

[Figure: the Application Inputs feed a Golden Model and an Approximate Model, whose outputs are compared by a Checker.]

It has to be noted that establishing a suitable error measure is highly application dependent… and the value is data dependent

•  Off-line
•  On-line
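A checker of this kind can be sketched in a few lines. The example below is only an illustration: mean relative error is one possible error measure among many, and the golden and approximate models (math.sin and a two-term Taylor expansion) are hypothetical stand-ins for a real application.

```python
import math


def mean_relative_error(golden, approx, eps=1e-12):
    """Average of |g - a| / |g| over all outputs (eps avoids division by zero)."""
    if len(golden) != len(approx):
        raise ValueError("outputs must have the same length")
    total = 0.0
    for g, a in zip(golden, approx):
        total += abs(g - a) / max(abs(g), eps)
    return total / len(golden)


def checker(golden_model, approx_model, inputs, threshold):
    """Run both models on the same inputs and compare their outputs."""
    golden = [golden_model(x) for x in inputs]
    approx = [approx_model(x) for x in inputs]
    err = mean_relative_error(golden, approx)
    return err, err <= threshold


def approx_sin(x):
    """Hypothetical approximate model: two-term Taylor expansion of sin."""
    return x - x ** 3 / 6.0


small_inputs = [0.1 * k for k in range(1, 11)]   # 0.1 .. 1.0
large_inputs = [2.0, 2.5, 3.0]

err_small, ok_small = checker(math.sin, approx_sin, small_inputs, 0.01)
err_large, ok_large = checker(math.sin, approx_sin, large_inputs, 0.01)
print(f"small inputs: error {err_small:.4f}, acceptable: {ok_small}")
print(f"large inputs: error {err_large:.4f}, acceptable: {ok_large}")
```

The same approximation passes the check on one input set and fails on the other, which is the data dependence noted on the slide.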

Page 25: Energy Efficient Computing Systems Exploiting Online ...

Online Monitoring: an Example

[Figure: Rumba flow from the Application Inputs through the NPU and the CPU, with Detect and Recover stages, to Approximate & Robust Results.]

Khudia et al. "Rumba: An Online Quality Management System for Approximate Computing" ISCA 2015

Error-Aware Input classification Output Temporal Similarity

Page 26: Energy Efficient Computing Systems Exploiting Online ...

Off-line Quality Profiling

•  Pure autotuning approaches
   •  Error = f(x,i)

At design time:
1.  Instrument the application
2.  Perform a Design Space Exploration
3.  Store the Pareto front

Operating Point
knob1 = 5, knob2 = 1
Objective1 = 1000, Objective2 = 50
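The last design-time step, keeping only the Pareto front of the explored operating points, can be sketched as below. The OperatingPoint fields and the sample values are hypothetical, and both objectives are assumed to be minimized.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    knobs: dict        # knob name -> value, e.g. {"knob1": 5, "knob2": 1}
    objective1: float  # assumed to be minimized (e.g. execution time)
    objective2: float  # assumed to be minimized (e.g. error)


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.objective1 <= b.objective1 and a.objective2 <= b.objective2
    better = a.objective1 < b.objective1 or a.objective2 < b.objective2
    return no_worse and better


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical results of a design space exploration.
explored = [
    OperatingPoint({"knob1": 5, "knob2": 1}, 1000, 50),
    OperatingPoint({"knob1": 1, "knob2": 1}, 400, 80),
    OperatingPoint({"knob1": 9, "knob2": 2}, 2500, 10),
    OperatingPoint({"knob1": 5, "knob2": 2}, 1200, 60),  # dominated by the first
]

front = pareto_front(explored)
for p in front:
    print(p.knobs, p.objective1, p.objective2)
```

Only the stored front needs to be shipped with the application, since a dominated point is never the best choice for any requirement on these two objectives.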

Page 27: Energy Efficient Computing Systems Exploiting Online ...

Configuring a Tunable Application

[Figure: at Deploy-Time, a Configuration file sets the App-Knobs of the Tunable Program; at Run-Time, the Tunable Program processes the Input on the Target Platform.]

Page 28: Energy Efficient Computing Systems Exploiting Online ...

Off-line Quality Profiling (cont…)

•  Pure autotuning approaches
   •  Error = f(x,i)

•  Proactive and/or reactive approaches:
   •  Error < K => x' : f(x',i) < K

[Figure: EFP Models built at Design-Time feed a Dynamic Autotuner that, at Run-Time, sets the Knobs of the Tunable Program on the Target Platform according to the Goal, the Input Features, and the Monitors.]
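The rule "Error < K => x' : f(x',i) < K" amounts to picking, for the current input i, a knob configuration x' whose predicted error stays below K. A minimal sketch follows, assuming hypothetical error and cost models; real autotuners build these models from profiling data, and the fallback policy used here is our own choice.

```python
def select_knobs(candidates, error_model, cost_model, features, max_error):
    """Pick the cheapest configuration whose predicted error is below max_error.

    If no configuration satisfies the bound, fall back to the one with the
    lowest predicted error (an assumption of this sketch).
    """
    feasible = [x for x in candidates if error_model(x, features) < max_error]
    if not feasible:
        return min(candidates, key=lambda x: error_model(x, features))
    return min(feasible, key=lambda x: cost_model(x, features))


# Hypothetical models: x is the number of samples used per output element.
def error_model(x, features):
    """Predicted error f(x, i): grows with input noise, shrinks with x."""
    return 10.0 * features["noise"] / x


def cost_model(x, features):
    """Predicted cost: proportional to the number of samples."""
    return float(x)


candidates = [1, 2, 4, 8, 16]
K = 3.0

quiet = select_knobs(candidates, error_model, cost_model, {"noise": 1.0}, K)
noisy = select_knobs(candidates, error_model, cost_model, {"noise": 2.0}, K)
print(quiet, noisy)
```

Because the predicted error depends on the input features, the selected configuration changes with the input even though the bound K stays the same.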

Page 29: Energy Efficient Computing Systems Exploiting Online ...

Off-line Quality Profiling (cont…)

mARGOt
D. Gadioli et al. "Application autotuning to support runtime adaptivity in multicore architectures" SAMOS15
https://gitlab.com/margot_project/

CAPRI
X. Sui et al. "Proactive Control of Approximate Programs" ASPLOS 2016
•  Use machine-learning based approach to build error model fe and cost model fc
•  Assume p(i) is equal for all inputs
   –  easy to change assumption if needed

[Figure: Offline, a Profiler and a Model Builder use the Training Inputs, the Error Metric, and the Cost Metric to build the Error Model fe and the Cost Model fc; Online, a Controller receives the Input and (ε, π) and drives the Tunable Program. Blue boxes are provided by programmers.]

PowerDial
H. Hoffmann et al. "Dynamic knobs for responsive power-aware computing." ASPLOS 2012

H. Hoffmann "JouleGuard: […]" SOSP 2015

Page 30: Energy Efficient Computing Systems Exploiting Online ...

Removing Offline Quality Profiling

IRA – Input Responsive Approximation
M. Laurenzano et al. "Input responsiveness: using canary inputs to dynamically steer approximation." PLDI 16.

mARGOt - AGORA
D. Gadioli et al. "mARGOt: a Dynamic Autotuning Framework Targeting Adaptivity and Controllable Approximation" SoftwareX

Page 31: Energy Efficient Computing Systems Exploiting Online ...

Conclusions… Why Take Care of Approximation?