
Synergy.cs.vt.edu Power and Performance Characterization of Computational Kernels on the GPU Yang Jiao, Heshan Lin, Pavan Balaji (ANL), Wu-chun Feng.

Dec 24, 2015

Page 1

synergy.cs.vt.edu

Power and Performance Characterization of Computational Kernels on the GPU

Yang Jiao, Heshan Lin, Pavan Balaji (ANL), Wu-chun Feng

Page 2

Graphics Processing Units (GPUs) are Powerful

* Data and image source, http://people.sc.fsu.edu/~jburkardt/latex/ajou_2009_parallel/ajou_2009_parallel.html

Page 3

GPU is Increasingly Popular in HPC

Three of the top five supercomputers are GPU-based

Page 4

GPUs are Power Hungry

[Figure: bar chart of thermal design power (Watts), y-axis 0 to 350, for Xeon, GTX280, and Fermi]

It is imperative to investigate Green GPU computing

Page 5

Green Computing with DVFS on CPUs

Mechanism

$\text{Power} \propto \text{Voltage}^2 \times \text{Frequency}$

Minimizing performance impact

• Lower voltage and frequency when the CPU is not in the critical path

What about GPUs?
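As a rough illustration of the DVFS relation on this slide (power proportional to voltage squared times frequency), the sketch below computes how dynamic power would change when voltage and frequency are scaled. The function name and the example scale factors are illustrative assumptions, not values from the talk, and the proportionality ignores static (leakage) power.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Return dynamic power relative to a baseline, using P ∝ V^2 × f.

    Both arguments are ratios to the baseline setting (1.0 means unchanged).
    """
    if voltage_scale < 0 or frequency_scale < 0:
        raise ValueError("scale factors must be non-negative")
    return voltage_scale ** 2 * frequency_scale


# Illustrative values only: 10% lower voltage and 20% lower frequency.
ratio = relative_dynamic_power(0.9, 0.8)
print(f"relative dynamic power: {ratio:.3f}")
```

Because voltage enters quadratically, lowering it saves more power than lowering frequency by the same fraction, which is why DVFS lowers both together.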

Page 6

What is this Paper about?

Characterize performance and power for various kernels on GPUs
• Kernels with different compute and memory intensiveness
• Various core and memory frequencies

Contributions
• Reveal unique frequency scaling behaviors on GPUs
• Provide useful hints for green GPU computing

Page 7

Outline

• Introduction
• GPU Overview
• Characterization Methodology
• Experimental Results
• Conclusion & Future Work

Page 8

NVIDIA GTX280 Architecture

On-chip memory
• Small sizes
• Fast access

Off-chip memory
• Large size
• High access latency

Device (Global) Memory

Page 9

OpenCL

• Write once, run on any GPU
• Allows programmers to fully exploit the power of GPUs
• Compute kernel: function executed on a GPU

OpenCL Device Abstraction

Page 10

GPU Frequency Scaling

Two dimensional
• Compute core frequency and memory frequency

Semi-automatic
• Dynamic configuration not supported
• User can only control peak frequencies
• Automatically switches to idle mode when there is no computation

Details not available to public
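Because the scaling space has two dimensions, a characterization has to cover pairs of core and memory frequencies. The sketch below enumerates such a grid. The frequency values are assumptions read from the axis ticks (400 to 700) and legend entries (600 to 1200) of the result charts later in the deck, and treating the legend entries as memory frequencies in MHz is itself an assumption.

```python
from itertools import product

# Assumed values, read from the axis ticks and legends of the result charts.
CORE_FREQS_MHZ = range(400, 701, 50)      # 400, 450, ..., 700
MEMORY_FREQS_MHZ = range(600, 1201, 100)  # 600, 700, ..., 1200


def frequency_grid(core_freqs, memory_freqs):
    """Return every (core, memory) frequency pair in the two-dimensional space."""
    return list(product(core_freqs, memory_freqs))


grid = frequency_grid(CORE_FREQS_MHZ, MEMORY_FREQS_MHZ)
print(len(grid), grid[0], grid[-1])
```

Under these assumed values the grid holds 49 pairs; each would be set as a peak frequency before a kernel run, since dynamic configuration is not supported.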

Page 11

Outline

• Introduction
• GPU Overview
• Characterization Methodology
• Experimental Results
• Conclusion & Future Work

Page 12

Kernel Selection

High performance of GPUs
• Massive parallelism (e.g., 240 cores)
• High memory bandwidth (e.g., 140 GB/s)

Three kernels of computational diversity

Compute Intensive

Memory Intensive

Matrix Multiplication

Matrix Transpose

Fast Fourier Transform (FFT)

Page 13

Kernel Characteristics

Memory to compute ratio

$R_{mem} = \dfrac{\#\text{Global\_Memory\_Transactions}}{\#\text{Computation\_Instructions}}$

Instruction throughput

$R_{ins} = \dfrac{\#\text{Computation\_Instructions}}{\text{GPU\_Time}}$
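The two metrics on this slide are simple ratios of profiler counters. A minimal sketch follows, assuming the counters are already available as plain numbers; the function names and the example counter values are hypothetical and do not come from the talk.

```python
def memory_to_compute_ratio(global_memory_transactions: int,
                            computation_instructions: int) -> float:
    """R_mem = #Global_Memory_Transactions / #Computation_Instructions."""
    if computation_instructions <= 0:
        raise ValueError("computation_instructions must be positive")
    return global_memory_transactions / computation_instructions


def instruction_throughput(computation_instructions: int, gpu_time: float) -> float:
    """R_ins = #Computation_Instructions / GPU_Time."""
    if gpu_time <= 0:
        raise ValueError("gpu_time must be positive")
    return computation_instructions / gpu_time


# Hypothetical counter values, for illustration only.
r_mem = memory_to_compute_ratio(1_000, 20_000)
r_ins = instruction_throughput(20_000, 0.5)
print(f"R_mem = {r_mem:.1%}, R_ins = {r_ins:.0f}")
```

A higher R_mem indicates a more memory-intensive kernel. The unit of R_ins depends on the unit in which GPU_Time is reported, which the slide does not state.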

Page 14

Kernel Profile

        Matrix Multiplication    Matrix Transpose    FFT
Rmem    5.6%                     53.7%               8.3%
Rins    203215711                12095895            145165788
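To make the profile easier to compare, the sketch below orders the three kernels by their memory to compute ratio. The numbers are copied from the Kernel Profile table; the dictionary layout and the function name are illustrative choices.

```python
# Values copied from the Kernel Profile table (R_mem as a fraction).
KERNEL_PROFILE = {
    "Matrix Multiplication": {"r_mem": 0.056, "r_ins": 203215711},
    "Matrix Transpose": {"r_mem": 0.537, "r_ins": 12095895},
    "FFT": {"r_mem": 0.083, "r_ins": 145165788},
}


def rank_by_memory_intensity(profile):
    """Return kernel names ordered from least to most memory intensive."""
    return sorted(profile, key=lambda name: profile[name]["r_mem"])


print(rank_by_memory_intensity(KERNEL_PROFILE))
```

The ordering places matrix multiplication at the compute-intensive end, matrix transpose at the memory-intensive end, and FFT in between.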

Page 15

Measurement

Performance
• Matrix multiplication, FFT: GFLOPS
• Matrix transpose: MB/s

Energy
• Whole system when executing the kernel on the GPU

Power
• Reported using the average power

Energy Efficiency
• Performance / power
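The measurement definitions above can be sketched as follows, assuming the power meter yields a series of whole-system readings in Watts while the kernel runs. The function names and the sample readings are hypothetical; the talk does not describe how readings were collected or averaged.

```python
def average_power(samples_watts):
    """Mean of power-meter readings (Watts) taken while the kernel runs."""
    samples = list(samples_watts)
    if not samples:
        raise ValueError("at least one power sample is required")
    return sum(samples) / len(samples)


def energy_efficiency(performance: float, avg_power_watts: float) -> float:
    """Performance per Watt, e.g. GFLOPS/Watt or (MB/s)/Watt."""
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    return performance / avg_power_watts


# Hypothetical readings and performance, for illustration only.
power = average_power([250.0, 260.0, 270.0])
efficiency = energy_efficiency(130.0, power)
print(f"average power: {power:.1f} W, efficiency: {efficiency:.2f} GFLOPS/Watt")
```

Since the power figure covers the whole system, the resulting efficiency includes the host CPU and other components, not only the GPU.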

Page 16

Outline

• Introduction
• GPU Overview
• Characterization Methodology
• Experimental Results
• Conclusion & Future Work

Page 17

Experimental Setup

System
• Intel Core 2 Quad Q6600
• NVIDIA GTX280 1GB memory

Power Meter
• Watts Up? Pro ES

Page 18

Matrix Multiplication - Performance

Mostly affected by core frequency, barely affected by memory frequency

[Figure: performance (GFLOPS), y-axis 85 to 155, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100]

Page 19

Matrix Multiplication - Power

Mostly affected by core frequency, slightly affected by memory frequency

[Figure: power (Watts), y-axis 245 to 315, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100]

Page 20

Matrix Multiplication - Efficiency

Best efficiency achieved at highest core frequency and relatively high memory frequency

[Figure: power efficiency (MFLOPS/Watt), y-axis 340 to 500, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100]

Page 21

Matrix Transpose - Performance

Performance dominated by memory frequency

[Figure: performance (MB/s), y-axis 150 to 270, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100]

Page 22

Matrix Transpose - Power

Higher core frequency increases power consumption (not performance)

[Figure: power (Watts), y-axis 195 to 240, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100]

Page 23

Matrix Transpose - Efficiency

Best efficiency achieved at highest memory frequency and lowest core frequency

[Figure: power efficiency (KBPS/Watt), y-axis 650 to 1250, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100]
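Finding the best setting for a kernel amounts to picking the frequency pair with the highest performance per Watt from a sweep. The sketch below shows that selection step. The function name is hypothetical, and the toy data is synthetic rather than measured: it is built so that performance depends only on memory frequency while power grows with both frequencies, which mimics the matrix transpose behavior described on this slide.

```python
def most_efficient_setting(measurements):
    """Return the (core_mhz, memory_mhz) key with the highest performance / power.

    `measurements` maps (core_mhz, memory_mhz) -> (performance, average_power_watts).
    """
    if not measurements:
        raise ValueError("measurements must not be empty")
    return max(measurements,
               key=lambda cfg: measurements[cfg][0] / measurements[cfg][1])


# Synthetic toy model, NOT measured data: performance tracks memory frequency
# only, while power grows with both core and memory frequency.
toy_measurements = {
    (core, mem): (mem / 5.0, 150.0 + 0.05 * core + 0.04 * mem)
    for core in (400, 550, 700)
    for mem in (600, 900, 1200)
}

print(most_efficient_setting(toy_measurements))
```

With this toy model the selected pair is the lowest core frequency combined with the highest memory frequency, the same shape of result the slide reports for matrix transpose.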

Page 24

FFT - Performance

Affected by both core and memory frequencies

[Figure: performance (GFLOPS), y-axis 40 to 85, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100]

Page 25

FFT - Power

Affected by both core and memory frequencies

[Figure: power (Watts), y-axis 225 to 285, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100]

Page 26

FFT - Efficiency

Best efficiency at highest core and memory frequencies

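[Figure: power efficiency, y-axis 185 to 305, versus GPU core frequency (MHz), x-axis 400 to 700; legend entries 600 to 1200 in steps of 100. The axis label reads GFLOPS/W, but the values match MFLOPS/Watt given the FFT performance and power ranges]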

Page 27

FFT – Two Dimensional Effect

[Figure: power (Watts) and efficiency (MFLOPS/Watt) for three frequency pairs, <550, 1200>, <600, 1000>, and <700, 800>; y-axis 225 to 270; annotation: 7%]

Page 28

Power and Efficiency Range

[Figure: bar chart of power range and efficiency range, y-axis 0% to 45%, for Matrix Multiplication, Matrix Transpose, and FFT]

Page 29

Conclusion & Future Work

To take away
• Green computing on GPUs is important
• GPU frequency scaling is considerably different from frequency scaling on CPUs

Next
• Finer-grained level of characterization (e.g., different types of operations)
• Experiments on Fermi and AMD GPUs

Page 30

Acknowledgment

NSF Center for High Performance Reconfigurable Computing (CHREC) for their support through NSF I/UCRC Grant IIP-0804155;

National Science Foundation for their partial support through CNS-0915861 and CNS-0916719.

Page 31

Questions?