Energy-Precision Tradeoffs in the Graphics Pipeline, Jeff Pool, March 19th, 2012

Energy-Precision Tradeoffs in the Graphics Pipeline

Feb 25, 2016

Page 1: Energy-Precision Tradeoffs in the Graphics Pipeline

Energy-Precision Tradeoffs in the Graphics Pipeline

Jeff Pool
March 19th, 2012

Page 2: Energy-Precision Tradeoffs in the Graphics Pipeline

2

Motivation
Why energy?

It matters everywhere:
- Mobile devices
- Desktop computers
- Servers, data centers

It’s a bottleneck to performance!

http://img717.imageshack.us/img717/3936/1101771coolitomni.jpg

http://www.ornl.gov/ornlhome/images/casl/TVA%20Watts%20Bar.jpg

Page 3: Energy-Precision Tradeoffs in the Graphics Pipeline

3

Motivation
Why precision?

Sign Exponent Mantissa

IEEE 754-2008 Single-Precision Floating-Point Representation

Page 4: Energy-Precision Tradeoffs in the Graphics Pipeline

4

Don’t do Unnecessary Work
• Max precision isn’t needed:
  – 8-10 bit color buffers
  – FP32 => 24 bits of precision
  – Potentially lots of wasted effort!
• It’s certainly more complicated, but worth exploring

Page 5: Energy-Precision Tradeoffs in the Graphics Pipeline

5

My Approach
Variable-precision computations
- Reduce the precision when possible: 12.5 mantissa bits used
- Save energy in arithmetic: 70% less energy
- Low errors: 0.086% difference

[Images: Full-Precision Arithmetic, Reduced-Precision Arithmetic]
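A minimal software sketch of what "reducing the precision" of an FP32 value can mean, assuming the unused low-order mantissa bits are simply cleared (truncation). The function name `reduce_precision` is hypothetical, and the hardware described in this talk may round rather than truncate. Precision is counted as on the slides, so 24 bits means full FP32 precision including the implicit leading 1.

```python
import struct

def reduce_precision(value, precision_bits):
    """Return `value` as FP32 with only `precision_bits` significand bits kept.

    Assumption: low-order mantissa bits are cleared (truncation toward zero).
    """
    if not 1 <= precision_bits <= 24:
        raise ValueError("precision_bits must be in [1, 24]")
    # Reinterpret the FP32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    dropped = 24 - precision_bits              # stored mantissa bits to clear
    bits &= (0xFFFFFFFF << dropped) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(reduce_precision(3.14159265, 12))  # 3.140625
```

With 12 bits the result differs from the FP32 input by less than one part in 2^11, which illustrates why a display with 8-10 bit color channels may not show the difference.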

Page 6: Energy-Precision Tradeoffs in the Graphics Pipeline

6

My Approach
Communicate fewer bits
- Since fewer bits are used in computation
- Most DRAM traffic is already compressed

[Image: Crysis, 2007]

Variable-precision compression (on sample frame):
- Geometry improved by 12%
- Depth improved by 83%

Page 7: Energy-Precision Tradeoffs in the Graphics Pipeline

7

Background
The Graphics Pipeline

[Figure: pipeline stages Vertex Shader, Rasterization, Pixel Shader; GPU Global Memory holding Texture Data and the Frame-Buffer]

Page 8: Energy-Precision Tradeoffs in the Graphics Pipeline

8

GPUs: A Brief History

[Figure: GPU capability over time (NOT to scale!), moving from Fixed-Function to Programmability to GPGPU (CUDA, Stream, OpenCL); labels: Shader Program, Compute Program, "1.53, 32.8, …"]

Page 9: Energy-Precision Tradeoffs in the Graphics Pipeline

10

Thesis Statement
Reducing the work done in the modern graphics pipeline through novel communication and variable-precision computation techniques can enable a tradeoff between energy savings and image fidelity, leading to significant energy savings without perceptible loss of image quality.

Page 10: Energy-Precision Tradeoffs in the Graphics Pipeline

11

How?
Proving this thesis:
– Show that induced errors are imperceptible
– Show significant energy savings
  • Find energy consumed by entire pipeline
  • Find energy savings possible in each stage

Page 11: Energy-Precision Tradeoffs in the Graphics Pipeline

12

Roadmap
• My work
  – Energy model
  – Energy savings in computation
  – Energy savings in communication
• Conclusions
• Future work

Page 12: Energy-Precision Tradeoffs in the Graphics Pipeline

13

Roadmap
• My work
  – Energy model
  – Energy savings in computation
  – Energy savings in communication
• Conclusions
• Future work

Page 13: Energy-Precision Tradeoffs in the Graphics Pipeline

14

Why an Energy Model?
• So I’ll know how much difference saving energy in different stages actually makes, and know where to focus
• Provides researchers/developers a tool to predict energy usage

Past work (compared on: Validated? Graphics? Simple?):
- Brooks et al., 2000
- Eisley et al., 2006
- Shaeffer et al., 2004
- Ramani et al., 2007
- Nagasaka et al., 2010
- Hong and Kim, 2010
- Zhang et al., 2011

Page 14: Energy-Precision Tradeoffs in the Graphics Pipeline

15

Strategy
• Model construction
  – Experimentally measure energy for each operation
• Energy prediction
  – Profile a scene for operations performed
  – Predict total energy consumption (dot product)
• Validation
  – Compare prediction with measured energy
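The prediction step can be sketched as a dot product between profiled operation counts and per-operation energies. The energies below are taken from the results slides of this talk (in nJ; rasterization is 238.2 pJ per pixel). The operation names, the `predict_energy_mj` helper, and the example counts are hypothetical and are not profiled data.

```python
# Per-operation energy in nanojoules (values quoted in this talk's results slides).
ENERGY_NJ = {
    "add": 0.443,
    "mul": 0.357,
    "mad": 0.455,
    "local_load": 1.490,
    "raster_pixel": 0.2382,
}

def predict_energy_mj(op_counts, energy_nj=ENERGY_NJ):
    """Dot product of operation counts and per-operation energy, in millijoules."""
    total_nj = sum(count * energy_nj[op] for op, count in op_counts.items())
    return total_nj * 1e-6  # 1 nJ = 1e-6 mJ

# Hypothetical operation counts for one frame (illustrative only).
frame = {"mad": 40_000_000, "mul": 10_000_000, "raster_pixel": 2_000_000}
print(f"{predict_energy_mj(frame):.2f} mJ")
```

Validation then amounts to comparing this number with the energy measured on the card for the same frame.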

Page 15: Energy-Precision Tradeoffs in the Graphics Pipeline

16

What Operations?
• Arithmetic
  – ADD, MUL, SIN/COS, POW, LOG, …
• Memory
  – Local/Global Load/Store
• Programmable
  – Vertex/Pixel Shaders
• Fixed-function
  – Rasterization, Texture filtering

(Side labels on the slide: Explicit, Implicit)

Page 16: Energy-Precision Tradeoffs in the Graphics Pipeline

17

Measuring Energy in the GPU

Explicit
• GPGPU
  – Runs on same hardware as graphics
  – No ambiguity in operations
• Simple microkernels
  – Little/no overhead
  – 10s runtime
  – Directed tests per operation

Implicit
• OpenGL
• Enable/Disable operation in question
  – Difference in energy is the operation’s contribution
  – Not as straightforward
• Ex.: Texture filtering

Page 17: Energy-Precision Tradeoffs in the Graphics Pipeline

18

Experimental Setup
• NVIDIA 8300GS graphics card
• Adex Electronics’ PEX16LX PCI riser to interrupt power from motherboard
• Supply metered power to the card
  – 12V
  – 3.3V
  – 12V (fan, not counted in energy)
• Log runtimes/framerates, measure current as tests run

http://www.pretaktovanie.sk/obr/spotreba/eng/PICTURES/P1010283_ENG.jpg
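Given a fixed supply voltage and logged current samples, the energy drawn on one rail is the integral of V·I over the run. The sketch below assumes evenly spaced current samples and a simple rectangle-rule sum. The function names and the sample values are hypothetical, and the fan rail is left out, as on the slide.

```python
def rail_energy_joules(voltage_v, current_samples_a, sample_period_s):
    """Energy on one supply rail: sum of V * I * dt over evenly spaced samples."""
    return voltage_v * sum(current_samples_a) * sample_period_s

def card_energy_joules(rails, sample_period_s):
    """Total energy over all metered rails; `rails` is a list of (voltage, samples)."""
    return sum(rail_energy_joules(v, samples, sample_period_s)
               for v, samples in rails)

# Hypothetical current logs (amps) sampled every 0.5 s on the 12V and 3.3V rails.
rails = [(12.0, [1.0, 1.0]), (3.3, [0.5, 0.5])]
print(card_energy_joules(rails, 0.5))
```

Dividing the total by the number of operations a microkernel executes in that time gives a per-operation figure of the kind listed in the results.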

Page 18: Energy-Precision Tradeoffs in the Graphics Pipeline

19

Results

Operation                       Energy (nJ)
Arithmetic                      0.4 – 22.9
Memory
  Local load                    1.49
  Local store                   1.49
  Global load*                  8.39 – 67.40
  Global store*                 5.19 – 42.70
Rasterization (per pixel)       0.24
Texture filtering (per pixel)   7.0 – 13.8

*Depending on type of access

Page 19: Energy-Precision Tradeoffs in the Graphics Pipeline

20

Profiling Operations Performed

• Use Microsoft’s PIX to log a frame of a running application:
  – Framebuffer contents
  – Vertex data
  – Render states
  – Vertex shaders
  – Pixel shaders
  – Per draw call (100-1000s per frame)
• From all this data, extract operations

Page 20: Energy-Precision Tradeoffs in the Graphics Pipeline

21

Validation
• Three different applications, four scenes
  – Real-world games to test the developed model
• Harvested data to predict energy usage
• Measured real energy usage to compare

Test scenes:
- Half Life 2: Lost Coast (High/Low Rendering Qualities)
- Batman: Arkham Asylum
- Mass Effect

Page 21: Energy-Precision Tradeoffs in the Graphics Pipeline

22

Validation Results

[Figure: bar chart of Measured vs. Predicted energy (mJ, axis 0 to 700) for test scenes Batman, HL2_low, HL2_high, Mass Effect; annotation: Overheads]

Page 22: Energy-Precision Tradeoffs in the Graphics Pipeline

23

What Uses the Energy?

[Figure: bar chart of estimated energy (mJ, axis 0 to 700) for test scenes Batman, HL2_low, HL2_high, Mass Effect, broken down into FB-Write, FB-Read, Z-Write, Z-Read, PS-Memory, PS-Arithmetic, Rasterization, VS, Read Geometry]

Page 23: Energy-Precision Tradeoffs in the Graphics Pipeline

24

Roadmap
• My work
  – Energy model
  – Energy savings in computation
  – Energy savings in communication
• Conclusions
• Future work

Page 24: Energy-Precision Tradeoffs in the Graphics Pipeline

25

Where Does the Power Go?

Ptotal = Pdynamic + Pstatic

[Figure: CMOS Inverter between Power and Ground]

Page 25: Energy-Precision Tradeoffs in the Graphics Pipeline

26

Energy-Saving Techniques

• Clock gating (Park et al., 2010)
• Signal gating (Huang and Ercegovac, 2003)
• Power gating
  – Coarse (Usami et al., 2009, Sjalander et al., 2005)
  – Fine (My work)

Ptotal = Pdynamic + Pstatic

Page 26: Energy-Precision Tradeoffs in the Graphics Pipeline

27

Example: 1-Bit Adder

[Figure: 1-bit full adder circuit with inputs A, B, Cin, outputs S, Cout, and a !Enable control signal]

Page 27: Energy-Precision Tradeoffs in the Graphics Pipeline

28

HW Results

SPICE simulations of:
- Adders: linear savings
- Multipliers: quadratic savings
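To get a feel for what linear and quadratic savings mean, here is an idealized scaling sketch. It assumes adder energy is proportional to the number of active bits and multiplier energy to its square, and it ignores any overhead of the gating circuitry. These are not the SPICE results from the talk, and the function names are hypothetical.

```python
def adder_savings(bits, full_bits=24):
    """Fraction of energy saved if adder energy scales linearly with active bits."""
    return 1.0 - bits / full_bits

def multiplier_savings(bits, full_bits=24):
    """Fraction of energy saved if multiplier energy scales with bits squared."""
    return 1.0 - (bits / full_bits) ** 2

for bits in (24, 16, 12, 8):
    print(bits, f"{adder_savings(bits):.0%}", f"{multiplier_savings(bits):.0%}")
```

Under these assumptions, halving the precision to 12 bits saves half of the adder energy but three quarters of the multiplier energy.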

Page 28: Energy-Precision Tradeoffs in the Graphics Pipeline

29

Precision in Rendering
• Variable-precision fixed-function CPU rendering
  – Hao and Varshney, 2001
  – 3 key differences: GPU, FP32, programmability
• Depth buffer comparator
  – Hensley, Singh, and Lastra, 2005
• Triangle separation for correct occlusion
  – Akeley and Su, 2006

Page 29: Energy-Precision Tradeoffs in the Graphics Pipeline

30

So, we have hardware, let’s see what happens in

VARIABLE-PRECISION PIXEL SHADERS

Page 30: Energy-Precision Tradeoffs in the Graphics Pipeline

31

A Pixel Shader

Page 31: Energy-Precision Tradeoffs in the Graphics Pipeline

32

Exaggerated Texture Coordinate Errors

Blocky textures (8 mantissa bits)

Original frame (24 mantissa bits)

Page 32: Energy-Precision Tradeoffs in the Graphics Pipeline

33

Arithmetic Errors

… Different? (8 mantissa bits)

Original frame (24 mantissa bits)

Page 33: Energy-Precision Tradeoffs in the Graphics Pipeline

34

Exaggerated Arithmetic Errors

Clearly different (4 mantissa bits)

Original frame (24 mantissa bits)

Page 34: Energy-Precision Tradeoffs in the Graphics Pipeline

35

Different Errors, Different Tolerances

• Colors can be pushed far lower
  – 12, 10, 8 bits for color components (plus one for rounding)
• Texture coordinates may need to be fully precise!
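A small worked example shows why texture coordinates are less tolerant than colors. With p bits of precision, the spacing between representable values near a coordinate grows as p shrinks, and multiplying by the texture size gives that spacing in texels. The helper `texel_step` and the 1024-texel texture are hypothetical choices for illustration.

```python
import math

def texel_step(coord, precision_bits, texture_size):
    """Spacing between representable coordinates near `coord`, measured in texels."""
    if coord <= 0:
        raise ValueError("coord must be positive")
    # coord = m * 2**exponent with 0.5 <= m < 1, so the leading bit weighs 2**(exponent - 1).
    _, exponent = math.frexp(coord)
    spacing = 2.0 ** (exponent - precision_bits)
    return spacing * texture_size

print(texel_step(0.75, 8, 1024))   # 4.0
print(texel_step(0.75, 24, 1024))  # 6.103515625e-05
```

At 8 bits, neighboring representable coordinates near 0.75 are four texels apart on a 1024-texel texture, which is consistent with the blocky textures shown earlier; an 8-bit color, by contrast, already matches a typical color buffer.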

Page 35: Energy-Precision Tradeoffs in the Graphics Pipeline

36

So, Treat Them Separately

Page 36: Energy-Precision Tradeoffs in the Graphics Pipeline

37

So, Treat Them Separately

A: Could contribute to texture coordinates

Page 37: Energy-Precision Tradeoffs in the Graphics Pipeline

38

So, Treat Them Separately

A: Could contribute to texture coordinates

B: Will NOT contribute to texture coordinates
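One way to separate the two groups is a backward dataflow pass over the shader: start from the coordinate operand of every texture fetch and mark each instruction whose result can reach it. The sketch below assumes a straight-line shader with no branches and a made-up three-field instruction format; it illustrates the idea and is not the analysis used in the talk.

```python
def split_by_texcoord_dependence(program):
    """Split instruction indices into (A, B).

    A: results could contribute to a texture coordinate.
    B: results will not contribute to a texture coordinate.
    `program` is a list of (dest, op, sources); for op == "tex",
    sources[0] is assumed to be the coordinate register.
    """
    tainted = set()  # registers whose current value may reach a texture coordinate
    group_a, group_b = [], []
    for index in range(len(program) - 1, -1, -1):  # walk backward
        dest, op, sources = program[index]
        contributes = dest in tainted
        if contributes:
            tainted.discard(dest)      # this instruction defines the tainted value
            tainted.update(sources)    # so its inputs are tainted above this point
        if op == "tex":
            tainted.add(sources[0])    # the fetch coordinate itself is sensitive
        (group_a if contributes else group_b).append(index)
    return sorted(group_a), sorted(group_b)

# Hypothetical four-instruction pixel shader.
shader = [
    ("uv",    "mul", ["uv_in", "scale"]),
    ("light", "dot", ["normal", "light_dir"]),
    ("texel", "tex", ["uv", "sampler0"]),
    ("color", "mul", ["texel", "light"]),
]
print(split_by_texcoord_dependence(shader))  # ([0], [1, 2, 3])
```

Instructions in group A would keep a high precision, while those in group B are candidates for the lower color precisions.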

Page 38: Energy-Precision Tradeoffs in the Graphics Pipeline

39

Precision Selection Strategies

• Statically
• Artist-directed
• Automatic closed-loop

Page 39: Energy-Precision Tradeoffs in the Graphics Pipeline

40

Static Program Analysis

[Figure: program annotated with precisions: 9 bits, 10 bits, 12 bits, 9 bits, 10 bits, 11 bits, and so on…]

Page 40: Energy-Precision Tradeoffs in the Graphics Pipeline

41

Artist-Directed Precisions
Precisions are chosen as the effect is designed

Page 41: Energy-Precision Tradeoffs in the Graphics Pipeline

42

Automatic Closed-Loop Precision Selection

• Run-time feedback control
• Per-shader error detection and precision control

[Figure: feedback loop with blocks Renderer, Error Detection, Controller, Display; signals Reduced Pixel, Full Pixel (sparsely sampled), Error, Precision]
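The loop in the diagram can be sketched as follows: compare a sparse sample of full-precision pixels against their reduced-precision counterparts, then nudge the shader's precision up or down. This is one simple step controller under assumed thresholds; the talk applies several feedback control schemes that are not detailed here, and all names and constants below are hypothetical.

```python
def sampled_error(reduced_pixels, full_pixels):
    """Mean absolute difference over the sparsely sampled pixel pairs."""
    pairs = list(zip(reduced_pixels, full_pixels))
    return sum(abs(r - f) for r, f in pairs) / len(pairs)

def update_precision(precision, error, threshold, min_bits=4, max_bits=24):
    """Raise precision when the error is too high, lower it when there is headroom."""
    if error > threshold:
        precision += 1
    elif error < 0.5 * threshold:
        precision -= 1
    return max(min_bits, min(max_bits, precision))

# One controller step for a single shader (illustrative numbers).
error = sampled_error([0.50, 0.25], [0.50, 0.75])
print(error, update_precision(12, error, threshold=0.1))  # 0.25 13
```

The dead band between half the threshold and the threshold keeps the precision from oscillating every frame.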

Page 42: Energy-Precision Tradeoffs in the Graphics Pipeline

43

Experimental Setup
• Static analysis
  – Analyze shaders to find minimum safe operating precision
• Artist-directed
  – Modify several demo applications
  – Allow the artist to choose precisions
• Automatic closed-loop
  – Modify the ATTILA GPU simulator
  – Apply several feedback control schemes
  – Several test scenes

Page 43: Energy-Precision Tradeoffs in the Graphics Pipeline

44

Data Sets

Data sets (slide columns: Static, Directed, Automatic Closed-Loop):
- Depth of Field
- Parallax Mapping
- SSAO
- Half Life 2: Lost Coast
- Doom 3
- Need for Speed: Undercover
- Metaballs

Page 44: Energy-Precision Tradeoffs in the Graphics Pipeline

45

Data Sets

Page 45: Energy-Precision Tradeoffs in the Graphics Pipeline

46

Results: Precisions

Data Set                     Static   Directed   Automatic Closed-Loop
Depth of Field               18.5     12.0       -
Parallax Mapping             23.3     15.2       -
SSAO                         20.1     13.0       -
Half Life 2: Lost Coast      19.1     -          13.2
Doom 3                       19.7     -          14.7
Need for Speed: Undercover   21.8     -          16.5
Metaballs                    9.7      -          8.9

Lower is Better!

Page 46: Energy-Precision Tradeoffs in the Graphics Pipeline

47

Results: Closed-Loop Errors
Unnoticeable in practice

Page 47: Energy-Precision Tradeoffs in the Graphics Pipeline

48

Results: % Energy Savings

Data Set                     Static   Directed   Automatic Closed-Loop
Depth of Field               33%      79%        -
Parallax Mapping             -2%      61%        -
SSAO                         49%      71%        -
Half Life 2: Lost Coast      33%      -          75%
Doom 3                       15%      -          69%
Need for Speed: Undercover   2%       -          50%
Metaballs                    87%      -          90%

Higher is Better!

Overall Energy: 2/3 1/5

Page 48: Energy-Precision Tradeoffs in the Graphics Pipeline

49

Which Precision Selection Method?

Approach                Savings   HW Complexity   Artist Effort
Static                  Low       Low             Low
Directed                High      Low             Medium
Automatic Closed-Loop   High      High            Low

Page 49: Energy-Precision Tradeoffs in the Graphics Pipeline

50

Directed Approach
• High savings
  – 70-80% in arithmetic
  – 10-20% overall GPU energy
    • (by arithmetic alone!)
• Low errors
  – Acceptable by design
  – Quantitatively low (PSNR, % error)
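PSNR, one of the quantitative error measures named above, compares a reduced-precision frame against the full-precision reference. A minimal sketch, assuming frames are flat sequences of 8-bit channel values with a peak of 255; the function name and the sample values are hypothetical.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(test) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    # Identical frames have zero error, so PSNR is unbounded.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

full_frame = [120, 64, 200, 33]
reduced_frame = [121, 64, 199, 33]
print(f"{psnr(full_frame, reduced_frame):.1f} dB")
```

Higher values mean the reduced-precision frame is closer to the reference.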

Page 50: Energy-Precision Tradeoffs in the Graphics Pipeline

51

Variable Precision Geometry
• Vertex shaders
• Similarly high savings (55-80%)
• Different types of errors
  – XY Screen-space
  – Depth

Page 51: Energy-Precision Tradeoffs in the Graphics Pipeline

52

XY Screen-Space Errors
8 bits of precision

Page 52: Energy-Precision Tradeoffs in the Graphics Pipeline

53

Depth Errors
16 bits of precision

Page 53: Energy-Precision Tradeoffs in the Graphics Pipeline

54

Depth Matters (Some)
• Far before XY errors
• Even in unmodified commercial games

http://underpop.free.fr/j/java/developing-games-in-java/1592730051_ch10lev1sec5.html
https://encrypted-tbn3.google.com/images?q=tbn:ANd9GcRWhmviKHKMGVAU1ooXrAzJxa_2IlknTI6cRT4MGfJyTpaZNYw-MA

Page 54: Energy-Precision Tradeoffs in the Graphics Pipeline

55

Variable-precision computation works. Let’s look at

COMMUNICATING LESS DATA

Page 55: Energy-Precision Tradeoffs in the Graphics Pipeline

56

Energy Savings in Communication

• Off-chip: compression (most data!)
  – Ström et al. (2008), Rasmussen et al. (2007, 2009)
    • 16 bit positive color/depth values
    • I adapt their approach to my needs
      – Negative numbers
      – 32 bits
      – General values
• On-chip: bus encoding, caching
  – Reduced precision data freeze unused lines

Page 56: Energy-Precision Tradeoffs in the Graphics Pipeline

57

Unified Compressor
Key idea behind compression:
• Encode numbers as differences between them
• Similar numbers lead to smaller representations

What’s tricky about adding geometry, GPGPU data?
• Negative values
• Arbitrary data/attribute layout
• Random access
• Each will limit how complicated the compressor’s design can be
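The key idea can be sketched with plain delta encoding: store the first value, then only the difference to the previous value. This is a generic illustration on integers, not the predictor used by the unified compressor, and the function names are hypothetical.

```python
def delta_encode(values):
    """First value as-is, then the difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def signed_bits_needed(n):
    """Bits needed to store n in two's complement."""
    return (n.bit_length() if n >= 0 else (~n).bit_length()) + 1

depths = [1000, 1003, 1001, 1010]
deltas = delta_encode(depths)
print(deltas)                                   # [1000, 3, -2, 9]
print([signed_bits_needed(d) for d in deltas])  # [11, 3, 2, 5]
```

Similar neighboring values turn into small differences, which need far fewer bits than the original values.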

Page 57: Energy-Precision Tradeoffs in the Graphics Pipeline

58

Handling Negative Values
Color and depth data is all positive – sign bit unused! Not so for general data.
• Overflow can occur during prediction and difference encoding
• My approach
  – Generalized past work to handle negative values
  – Drastically simplified processing of differences
• Limitation: can’t do any processing

[Figure: blocks labeled 33 Bit Adder, Encoder, Processing]
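The 33-bit width follows from simple arithmetic: the difference of two signed 32-bit values can fall outside the 32-bit range once the sign bit is in use. The sketch below assumes the values being differenced are FP32 bit patterns reinterpreted as signed integers; whether the compressor in the talk forms differences exactly this way is an assumption, and the helper names are hypothetical.

```python
import struct

def float_to_int32(x):
    """Reinterpret the FP32 bit pattern of x as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]

def signed_bits_needed(n):
    """Bits needed to store n in two's complement."""
    return (n.bit_length() if n >= 0 else (~n).bit_length()) + 1

def bit_pattern_difference(a, b):
    """Difference between the signed bit patterns of two FP32 values."""
    return float_to_int32(b) - float_to_int32(a)

near = bit_pattern_difference(1.0, 1.5)    # similar positive values
far = bit_pattern_difference(3.0e38, -0.0) # sign bit set on one side
print(signed_bits_needed(near), signed_bits_needed(far))  # 24 33
```

With all-positive color and depth data the sign bit is never set, so differences always fit in 32 bits; general data needs the extra bit.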

Page 58: Energy-Precision Tradeoffs in the Graphics Pipeline

59

Arbitrary Data Layout

Color is simple:

X   Y   Z
x1  y1  z1
x2  y2  z2
…   …   …

Geometry?

X   Y   Z   U   V   Nx  Ny  Nz  …
x1  y1  z1  u1  v1  …
x2  y2  z2  u2  v2  …
…   …   …   …   …

Page 59: Energy-Precision Tradeoffs in the Graphics Pipeline

60

Arbitrary Data Layout
Our approach: encode vectors of data (rather than blocks)
• Color
  – Alpha channel for free!
• Geometry
  – Intra-attribute coherence!

Limitation: no 2D coherence

X   Y   Z
x1  y1  z1
x2  y2  z2
…   …   …
xN  yN  zN

Page 60: Energy-Precision Tradeoffs in the Graphics Pipeline

61

Random Access
Random access is necessary for graphics
• Color data maps well – 4x4, 8x8 tiles
• Geometry?
  – Simply encode a subset, C, of the vertices at a time

X  x1 x2 … … xC  xC+1 … … x2C  ...
Y  y1 y2 … … yC  yC+1 … … y2C  ...
Z  z1 z2 … … zC  zC+1 … … z2C  ...
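Encoding C vertices at a time keeps random access cheap: to read one vertex, only its chunk has to be decoded. The sketch below applies this to a single attribute stream, using integer values and plain delta encoding inside each chunk as stand-ins for the real compressor. The function names and the chunk size are hypothetical.

```python
def encode_chunks(attribute, chunk_size):
    """Delta-encode one attribute stream in independent chunks of `chunk_size`."""
    chunks = []
    for start in range(0, len(attribute), chunk_size):
        block = attribute[start:start + chunk_size]
        chunks.append([block[0]] + [b - a for a, b in zip(block, block[1:])])
    return chunks

def read_vertex(chunks, chunk_size, index):
    """Random access: decode only the chunk that holds vertex `index`."""
    chunk = chunks[index // chunk_size]
    return sum(chunk[: index % chunk_size + 1])

xs = [10, 12, 15, 15, 20, 21, 30]
chunks = encode_chunks(xs, chunk_size=3)
print(chunks)                     # [[10, 2, 3], [15, 5, 1], [30]]
print(read_vertex(chunks, 3, 4))  # 20
```

A larger C gives the encoder more coherence to exploit, while a smaller C reduces the work per random read.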

Page 61: Energy-Precision Tradeoffs in the Graphics Pipeline

62

Unified Compressor
Compared to (Ström 2008) (“Tiled”)
Smaller is better!

[Figure: bar chart of compressed size (%, axis 0 to 100), Tiled vs. Unified, for Color and Depth data of data sets DoF, HDR_1, HDR_2, Parallax Map, Smoke, Crysis, NFS:U Car, Caravan, Soldier; annotation: Color channel coherence!]

Page 62: Energy-Precision Tradeoffs in the Graphics Pipeline

63

Unified Compressor
Geometric data sets – uncompressed in past work!

Data Set                            Compressed Bandwidth (%)
Crysis                              30.3
Crysis: Warhead                     55.6
Need For Speed: Undercover          37.0
Half Life 2: Lost Coast (Scene 1)   28.6
Half Life 2: Lost Coast (Scene 2)   23.8

Page 63: Energy-Precision Tradeoffs in the Graphics Pipeline

64

Improvements to Existing Compressors

Just a brief mention of my other work:
• Dynamic Bucket Selection
  – Average of 1.25x improvement
• Fibonacci Encoding
  – Up to 1.7x improvement
  – Average of 1.12x for unified compressor
• Dynamic Range Reduction
  – Extra 5-20%, depending on application
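The slide names Fibonacci encoding without giving details. As background, here is a sketch of standard Fibonacci coding, a variable-length code in which small positive integers get short codewords and every codeword ends in "11". How the compressor in this talk maps differences onto such codes is not stated, so the functions below illustrate the general technique only.

```python
def fibonacci_encode(n):
    """Fibonacci code of a positive integer, as a string of bits."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first Fibonacci number greater than n
    bits = []
    for f in reversed(fibs):  # greedy Zeckendorf decomposition, largest first
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    # Emit smallest Fibonacci number first, then the terminating "1".
    return "".join(reversed(bits)) + "1"

def fibonacci_decode(code):
    """Invert fibonacci_encode for a single codeword."""
    total, a, b = 0, 1, 2
    for bit in code[:-1]:  # the final "1" is only the terminator
        if bit == "1":
            total += a
        a, b = b, a + b
    return total

print([fibonacci_encode(n) for n in (1, 2, 3, 4, 11)])
# ['11', '011', '0011', '1011', '001011']
```

Because no valid codeword contains "11" before its end, a decoder can find codeword boundaries without a separate length field.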

Page 64: Energy-Precision Tradeoffs in the Graphics Pipeline

65

On-Chip Communication
Freeze (signal gate) unused bus lines from register file to L1 cache

Application                   Average Precision   Energy (%)
Half-Life 2: Lost Coast (1)   10.9                63.7
Half-Life 2: Lost Coast (2)   10.2                55.9
Doom 3                        9.8                 62.5
Need For Speed: Undercover    19.4                86.9
Metaballs                     8.2                 52.7

Page 65: Energy-Precision Tradeoffs in the Graphics Pipeline

66

SUMMARY

Page 66: Energy-Precision Tradeoffs in the Graphics Pipeline

67

Thesis Statement
Reducing the work done in the modern graphics pipeline through novel communication and variable-precision computation techniques can enable a tradeoff between energy savings and image fidelity, leading to significant energy savings without perceptible loss of image quality.

Page 67: Energy-Precision Tradeoffs in the Graphics Pipeline

68

How Did I Do?
• Show that induced errors are imperceptible
  – Vertex and pixel shader precisions can be reduced significantly without loss of quality
• Show significant energy savings
  – Find energy consumed by entire pipeline
    • Energy model accurate to within 5% for tested applications
  – Find energy savings possible in each stage
    • Designed hardware that saves energy
    • Used this hardware and the reduced precisions to find energy savings in computation
    • Used precision information to enable further savings in on- and off-chip communication

Page 68: Energy-Precision Tradeoffs in the Graphics Pipeline

69

Estimated Energy Savings

[Figure: bar chart of estimated energy (mJ, axis 0 to 700) for test scenes Batman, HL2_low, HL2_high, Mass Effect, with savings labels 54%, 49%, 46%, 57%]

Page 69: Energy-Precision Tradeoffs in the Graphics Pipeline

70

Future Work
Along the same lines…
– Variable-precision FPU
– Other sections of the memory hierarchy
– Recently-introduced stages (geometry, tessellation, compute shaders)
– GPGPU applications

Larger scale…
– 2-bit granularity precision control
– Scheduling for dynamic voltage/frequency scaling (DVFS)
– Architectural studies

Page 70: Energy-Precision Tradeoffs in the Graphics Pipeline

71

List of Papers
• Jeff Pool, Anselmo Lastra, and Montek Singh, “Lossless Compression of Variable-Precision Floating-Point Buffers on GPUs,” ACM Interactive 3D Graphics and Games (I3D), 9-11 March 2012.
• Jeff Pool, Anselmo Lastra, and Montek Singh, “Precision Selection for Energy-Efficient Pixel Shaders,” High Performance Graphics, 5-7 Aug. 2011.
• Jeff Pool, Anselmo Lastra, and Montek Singh, “Power-Gated Arithmetic Circuits for Energy-Precision Tradeoffs in Mobile Graphics Processing Units,” Journal of Low Power Electronics, Vol. 7, No. 2, 2011.
• Jeff Pool, Anselmo Lastra, and Montek Singh, “An Energy Model for Graphics Processing Units,” IEEE International Conference on Computer Design, 3-6 Oct. 2010.
• Jeff Pool, Anselmo Lastra, and Montek Singh, “Energy-Precision Tradeoffs in Mobile Graphics Processing Units,” IEEE International Conference on Computer Design, 12-15 Oct. 2008.

Page 71: Energy-Precision Tradeoffs in the Graphics Pipeline

72

Acknowledgments
Advisers: Anselmo and Montek
Committee members: Dinesh Manocha, Steve Molnar, John Poulton
Justin Hensley for starting the variable-precision work
Various folks around the department for their feedback
Family and friends for their support and encouragement
The NSF for funding

Page 72: Energy-Precision Tradeoffs in the Graphics Pipeline

73

THANKS, QUESTIONS?

Page 73: Energy-Precision Tradeoffs in the Graphics Pipeline

74

BACKUP

Page 74: Energy-Precision Tradeoffs in the Graphics Pipeline

75

Programmer-Directed

           Directed                       Static
Scene      Precision   PSNR   Savings    Precision   Savings
SSAO       13.0        53.4   71%        20.1        49%
Parallax   15.2        39.7   61%        23.3        -2%
DoF        12.0        45.6   79%        18.5        33%

Page 75: Energy-Precision Tradeoffs in the Graphics Pipeline

76

Results – Programmable Units

Operation                      Energy (nJ)
Arithmetic
  add                          0.443
  mul                          0.357
  mad                          0.455
  rcp                          2.440
  exp                          1.512
  log                          5.177
  sin/cos                      22.997
  pow                          16.366
Memory
  Local load                   1.490
  Local store                  1.490
  Global load (coalesced)      8.390
  Global store (coalesced)     5.190
  Global load (uncoalesced)    67.400
  Global store (uncoalesced)   42.700

Page 76: Energy-Precision Tradeoffs in the Graphics Pipeline

77

Results – Fixed-Function Units

Rasterization               Off     On
Energy/Pixel (pJ/P)         166.4   404.6
Rasterization Cost (pJ/P)   -       238.2

Texturing   Mipmapping   Energy/pixel (nJ/p)
Nearest     -            13.3
Bilinear    -            13.8
Nearest     Nearest      7.07
Bilinear    Nearest      7.76
Bilinear    Linear       10.6

(Side label on the slide: Quality, Low to High)

Page 77: Energy-Precision Tradeoffs in the Graphics Pipeline

78

Typical Cell Phone Energy Consumption

http://www.androidcentral.com/android-quick-app-juice-defender-ultimate

Varies drastically depending on workload
More efficient GPU == more time watching movies, playing games, HTML5 …

Advertised talk times dwarf video playback/game times!

http://www.howtogeek.com/wp-content/uploads/2010/08/image207.png
http://tapatalk.com/mu/5adc833a-beea-1bf3.jpg