GPGPU and Stream Computing Julian Fietkau University of Hamburg June 30th, 2011

GPGPU and Stream Computing

May 10, 2015


Julian Fietkau

This talk was held in the seminar for the course Parallel Programming and dealt with general purpose computation on graphics hardware and fundamentals of stream computing. Building on previous knowledge about computer architecture and parallelization strategies, I contextualized GPGPU and introduced stream computing as its background. I then demonstrated a few modern languages and technologies (CUDA, OpenCL) and briefly touched upon compilation processes (NVIDIA PTX, AMD IL). The talk ended with perspectives on programmability and efficiency of the technologies and a short overview of the latest trends.
Page 1: GPGPU and Stream Computing

GPGPU and Stream Computing

Julian Fietkau

University of Hamburg

June 30th, 2011

Page 2: GPGPU and Stream Computing


Things to clear up beforehand...

These slides are published under the CC-BY-SA 3.0 license. Sources for the numbered figures are in the → list of figures.

Non-numbered pictures and illustrations are from the OpenClipArt Project or are based on content from there.

Download these slides and give feedback: http://www.julian-fietkau.de/gpgpu_and_stream_computing

2 / 21

Page 3: GPGPU and Stream Computing

Agenda

Introduction
    General Idea of GPGPU
    Stream Computing

Languages
    Common Ideas
    OpenCL
    CUDA
    Others
    Compilation to Intermediary Languages

Properties
    Programmability
    Efficiency

Prospects and Conclusions
    Future Developments
    Conclusion

Page 4: GPGPU and Stream Computing

Introduction: General Idea of GPGPU

Flynn’s Taxonomy

                 Single instruction   Multiple instruction
Single data      SISD                 MISD
Multiple data    SIMD                 MIMD

Page 5: GPGPU and Stream Computing

Introduction: General Idea of GPGPU

Why Does It Exist?

• How long can Moore’s law hold true? → parallelism as a possible answer to computational demands
• a “Swiss army knife” (generally optimal solution) for parallel programming has not been found
• idea: exploit consumer-grade graphics hardware

Figure 1: Moore’s law – 2011

Page 6: GPGPU and Stream Computing

Introduction: General Idea of GPGPU

About Graphics Hardware

• games need to display increasingly realistic objects/scenes in real time
• need to calculate a lot of vertices and a lot of pixels very quickly → Pixel/Vertex Shaders, later Unified Shader Model
• consumer market ensures that graphics adapters remain (relatively) cheap
• General Purpose computation on Graphics Processing Units

Page 7: GPGPU and Stream Computing

Introduction: Stream Computing

Stream Computing

• idea: operate on a “stream” of data passing through different “kernels”
• related to SIMD
• mitigates some of the difficulties of parallelism on von Neumann architectures as well as simple SIMD implementations like SSE or AltiVec
• first came up in the 1970s; “pure” implementations didn’t gain much traction, but hybrid architectures survived

Page 8: GPGPU and Stream Computing

Introduction: Stream Computing

Stream Computing Example

Input: u, v, w;

x = u - (v + w);
y = u * (v + w);

Output: x, y;

Figure 2: Stream Computing Example
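The example above can be mimicked in plain Python to make the data flow explicit. This is only a sketch under assumptions: the function names (add_kernel, sub_kernel, mul_kernel, run) are invented for illustration, and the loop runs sequentially on the CPU rather than on a stream processor.

```python
# Hypothetical sketch: three small "kernels" applied to the input streams u, v, w.
# All names are illustrative and not taken from the slides.

def add_kernel(v, w):
    return v + w

def sub_kernel(u, s):
    return u - s

def mul_kernel(u, s):
    return u * s

def run(us, vs, ws):
    """Push the streams through the kernel graph, one element at a time."""
    xs, ys = [], []
    for u, v, w in zip(us, vs, ws):
        s = add_kernel(v, w)          # shared intermediate: v + w is computed once
        xs.append(sub_kernel(u, s))   # x = u - (v + w)
        ys.append(mul_kernel(u, s))   # y = u * (v + w)
    return xs, ys

x, y = run([10, 20, 30], [1, 2, 3], [4, 5, 6])
print(x)  # [5, 13, 21]
print(y)  # [50, 140, 270]
```

No element depends on any other element of the stream, which is why a stream processor is free to run the loop iterations in parallel.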

Page 9: GPGPU and Stream Computing

Languages: Common Ideas

Common Ideas

modern streaming programming languages...
• ... are verbose about different usage scenarios for memory
• ... help with partitioning problem spaces in a multitude of ways
• ... are not afraid to introduce limitations to facilitate optimization

Page 10: GPGPU and Stream Computing

Languages: OpenCL

OpenCL™

• Open Computing Language, free standard by Khronos™ Group

[Figure 3 labels: Context, Application, Kernel, Command Queue, Device]

Figure 3: OpenCL™ Application Model

Page 11: GPGPU and Stream Computing

Languages: OpenCL

OpenCL™ in Detail

[Figure 4 labels: Host, Application, Device, NDRange. The NDRange is divided into work groups with two-dimensional indices, e.g. (0,0), (1,0), (2,0), (0,1), (1,1), (1,2); each work group contains work items with three-dimensional indices, e.g. (0,0,0), (1,0,0), (2,0,0), (0,1,0), (1,1,0), (2,1,0), (2,0,1), (2,1,1).]

Figure 4: OpenCL™ Problem Partitioning
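The partitioning in Figure 4 can be sketched in plain Python. The function name partition and the example sizes are assumptions made for illustration. The index relation mirrors how OpenCL's get_global_id relates to get_group_id, get_local_size and get_local_id when the global offset is zero. The sketch also assumes that every global size is a multiple of the corresponding local size.

```python
from itertools import product

def partition(global_size, local_size):
    """Yield (group_id, local_id, global_id) for every work item in an NDRange.

    Hypothetical helper: it only enumerates indices on the CPU.
    """
    if len(global_size) != len(local_size):
        raise ValueError("global and local sizes need the same number of dimensions")
    if any(g % l for g, l in zip(global_size, local_size)):
        raise ValueError("each global size must be a multiple of the local size")
    groups = [g // l for g, l in zip(global_size, local_size)]
    for group_id in product(*(range(n) for n in groups)):
        for local_id in product(*(range(l) for l in local_size)):
            # global id = group id * local size + local id, per dimension
            global_id = tuple(
                g * l + i for g, l, i in zip(group_id, local_size, local_id)
            )
            yield group_id, local_id, global_id

items = list(partition((6, 4), (3, 2)))
print(len(items))   # 24 work items in 4 work groups of 6 items each
print(items[-1])    # ((1, 1), (2, 1), (5, 3))
```

Every global index appears exactly once, so each work item can use its global id to select the piece of data it is responsible for.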

Page 12: GPGPU and Stream Computing

Languages: CUDA

CUDA

• NVIDIA’s custom framework for high-level GPGPU
    • (it’s actually older than OpenCL though)
• same basic idea, but specific to NVIDIA GPUs
• conceptually only minor differences between CUDA and OpenCL
• biggest one: CUDA is compiled at application compile time while OpenCL is (typically) compiled at application run time
• also, annoying nomenclature differences (e.g. shared vs. local vs. private memory)
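The nomenclature differences can be summarized as a small lookup table. The sketch below lists the commonly used correspondences between CUDA and OpenCL terms. The dictionary and the helper to_opencl are illustrative names only and belong to neither framework.

```python
# Commonly used correspondences between CUDA and OpenCL terminology.
# CUDA_TO_OPENCL and to_opencl are hypothetical names for this sketch.
CUDA_TO_OPENCL = {
    "thread": "work item",
    "thread block": "work group",
    "grid": "NDRange",
    "global memory": "global memory",
    "constant memory": "constant memory",
    "shared memory": "local memory",    # shared within one block / work group
    "local memory": "private memory",   # private to one thread / work item
}

def to_opencl(cuda_term):
    """Translate a CUDA term into its usual OpenCL counterpart."""
    return CUDA_TO_OPENCL[cuda_term.lower()]

print(to_opencl("shared memory"))  # local memory
print(to_opencl("local memory"))   # private memory
```

The word "local" is the main trap: it names per-work-group memory in OpenCL but per-thread memory in CUDA.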

Page 13: GPGPU and Stream Computing

Languages: Others

Others

There are several more stream processing languages, some of them long in development. Notable:
• Brook (and Brook+)
• Cilk, compare also Intel Array Building Blocks

Page 14: GPGPU and Stream Computing

Languages: Compilation to Intermediary Languages

Intermediary Languages

Problem: The actual binary code that runs on devices needs to “know” the exact numbers of cores, memory, registers etc., information that is generally not known at compile time.

→ compilation to an intermediary language like NVIDIA’s PTX or AMD’s IL, which is low-level and assembly-like yet abstracts away some hardware limitations

Page 15: GPGPU and Stream Computing

Languages: Compilation to Intermediary Languages

PTX and AMD IL

PTX example

    .reg .b32 r1, r2;
    .global .f32 array[N];

    start: mov.b32 r1, %tid.x;
           shl.b32 r1, r1, 2;            // shift thread id by 2 bits
           ld.global.b32 r2, array[r1];  // thread[tid] gets array[tid]
           add.f32 r2, r2, 0.5;          // add 1/2

AMD IL example

    sample_resource(0)_sampler(0) r0.x, v0.xy00
    mov r2.x, r0.xxxx
    dcl_output_generic o0
    ret
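To make the PTX example easier to follow, here is a rough Python emulation of what each thread computes. It is a sketch under assumptions: the function name emulate is invented, the loop stands in for threads that a GPU would run in parallel, and Python adds in double precision where the PTX instruction uses 32-bit floats.

```python
import struct

def emulate(array_bytes, num_threads):
    """Return the value each thread would hold in r2 after the four instructions.

    Hypothetical helper: array_bytes plays the role of global memory holding
    little-endian 32-bit floats.
    """
    results = []
    for tid in range(num_threads):      # one iteration per GPU thread
        r1 = tid                        # mov.b32 r1, %tid.x
        r1 = r1 << 2                    # shl.b32 r1, r1, 2 (tid * 4 = byte offset)
        (r2,) = struct.unpack_from("<f", array_bytes, r1)  # ld.global.b32 r2, array[r1]
        r2 = r2 + 0.5                   # add.f32 r2, r2, 0.5
        results.append(r2)
    return results

data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
print(emulate(data, 4))  # [1.5, 2.5, 3.5, 4.5]
```

The shift by 2 bits multiplies the thread id by 4, the size of one 32-bit float in bytes, so thread tid ends up reading element tid of the array.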

Page 16: GPGPU and Stream Computing

Properties: Programmability

Programmability

• as they’re mostly custom versions of C, GPGPU languages are rather simple to pick up for someone with C experience
• OpenCL™ and CUDA both look slightly boilerplate-y for small tasks
    • hypothesis: they might not be designed for small tasks
• disadvantage of the cutting edge: toolchain maturity might be lacking
• watch out for vendor dependencies!

Page 17: GPGPU and Stream Computing

Properties: Efficiency

Efficiency

• hard to find actual data
• optimizations and proficiency might skew the results
• conceptual similarities indicate that implementations would also be similar
• CUDA can get a (constant) head start vs. OpenCL™ due to being precompiled
• CUDA might generally perform faster, sometimes significantly, than OpenCL (but take this with a grain of salt)

Page 18: GPGPU and Stream Computing

Prospects and Conclusions: Future Developments

Things to Come

The future remains notoriously hard to predict.
• at the moment, we see increased interest in specialized GPGPU boards (cf. NVIDIA Tesla and AMD FireStream)
• OpenCL promotes device flexibility at the cost of efficiency – no way to know if this strategy will win
• Intel pushes for integrated solutions with more processing power (cf. Sandy Bridge, Ivy Bridge)

Page 19: GPGPU and Stream Computing

Prospects and Conclusions: Conclusion

Conclusion

• GPGPU is a viable way to do massively parallel work even on a home PC
• the technology will be further developed and refined, so knowledge of it may prove valuable

Page 20: GPGPU and Stream Computing

External Links: Weblinks

Weblinks

AMD Developer Central: Introduction to OpenCL™ Programming
http://developer.amd.com/zones/openclzone/...-may-2010.aspx

GPGPU: OpenCL™ (Università di Catania)
http://www.dmi.unict.it/~bilotta/gpgpu/notes/11-opencl.html

NVIDIA: PTX ISA Version 2.1
http://developer.download.nvidia.com/compute/.../ptx_isa_2.1.pdf

AMD: High Level Programming for GPGPU
http://coachk.cs.ucf.edu/courses/CDA6938/s08/AMD_IL.pdf

Page 21: GPGPU and Stream Computing

External Links: List of figures

List of figures

1 Moore’s Law – 2011, by Wgsimon via Wikimedia Commons, CC-BY-SA
2 Stream Computing Example, by Kallistratos via German Wikipedia, public domain
3 OpenCL – Simple Kernel Exec, by Joachim Weging, CC-BY-SA
4 OpenCL – Problem Partitioning, by Joachim Weging, CC-BY-SA