Algorithm Engineering „GPGPU“ Stefan Edelkamp


Dec 31, 2015


Algorithm Engineering

„GPGPU“

Stefan Edelkamp


Graphics Processing Units

GPGPU = (GP)²U

General Purpose Programming on the GPU
„Parallelism for the masses“
Applications: Fourier transformation, model checking, bioinformatics; see CUDA Zone


Programming the Graphics Processing Unit with CUDA


Overview

• Cluster / Multicore / GPU comparison
• Computing on the GPU
• GPGPU languages
• CUDA
• Small example


Cluster / Multicore / GPU

Cluster system
• many individual systems, each one with
  • one (or more) processors
  • internal memory
  • often an HDD
• communication over the network
  • slow compared to internal communication
  • no shared memory

[Figure: three nodes, each with CPU, RAM and HDD, connected through a switch]


Cluster / Multicore / GPU

Multicore systems
• multiple CPUs
• RAM
• external memory on HDD
• communication over RAM

[Figure: four CPUs (CPU1 to CPU4) sharing one RAM, with an HDD attached]


Cluster / Multicore / GPU

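System with a Graphics Processing Unit
• many (240) parallel processing units
• hierarchical memory structure: RAM, video RAM (VRAM), shared RAM (SRAM)
• communication over the PCI bus

[Figure: CPU with RAM and hard disk drive, connected to the graphics card, which holds the GPU with SRAM and VRAM]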


Computing on the GPU

Hierarchical execution
• Groups
  • executed sequentially
• Threads
  • executed in parallel
  • lightweight (creation / switching nearly free)
• one kernel function executed by each thread

[Figure: Group 0]


Computing on the GPU

Hierarchical memory
• Video RAM
  • 1 GB
  • comparable to RAM
• Shared RAM in the GPU
  • 16 KB
  • comparable to registers
  • parallel access by threads

[Figure: graphics card with the GPU, its SRAM, and the video RAM]


Example architecture: G200, e.g. in the GTX 280


Example problems


Ranking and unranking with parity


2-Bit BFS


1-Bit BFS


Sliding-tile puzzle


Some Results…


Further results …


GPGPU Languages

• RapidMind
  • supports multicore, ATI, NVIDIA and Cell
  • C++ analysed and compiled for the target hardware
• Accelerator (Microsoft)
  • library for .NET languages
• BrookGPU (Stanford University)
  • supports ATI, NVIDIA
  • own language, a variant of ANSI C


CUDA

Programming language
• similar to C
• file suffix .cu
• own compiler called nvcc
• can be linked to C


CUDA

Build flow
• C++ code: compile with GCC
• CUDA code: compile with nvcc
• link both with ld
• result: one executable


CUDA

Additional variable types
• dim3
• int3
• char3
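A minimal sketch of these vector types in use, assuming the CUDA runtime header is available. The helper names total_threads and sum3 are made up for illustration. Components of a dim3 that are not given default to 1, while int3 and char3 are plain structs built with make_int3 and make_char3.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: number of threads described by a grid/block pair.
unsigned int total_threads(dim3 grid, dim3 block)
{
    unsigned int groups  = grid.x * grid.y * grid.z;
    unsigned int threads = block.x * block.y * block.z;
    return groups * threads;
}

// Hypothetical helper: sum of the three components of an int3.
int sum3(int3 v)
{
    return v.x + v.y + v.z;
}
```

For instance, total_threads(dim3(4), dim3(8, 2)) describes 4 groups of 8 x 2 threads, and make_char3('a', 'b', 'c').y reads the middle component.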


CUDA

Different types of functions
• __global__: invoked from the host
• __device__: called from the device

Different types of variables
• __device__: located in VRAM
• __shared__: located in SRAM
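The qualifiers can be seen together in a small sketch. It is an illustration only: the names square, square_and_reverse and GROUP_SIZE are assumptions, and the input length is assumed to be a multiple of the group size. A __global__ kernel calls a __device__ helper and stages its values in a __shared__ buffer before writing them back to VRAM.

```cuda
#include <cuda_runtime.h>

#define GROUP_SIZE 64   // assumed number of threads per group

// __device__ function: callable only from code running on the GPU.
__device__ int square(int x)
{
    return x * x;
}

// __global__ function (kernel): invoked from the host, runs on the GPU.
// Each group squares its slice and writes it back in reversed order,
// staging the values in SRAM (__shared__) in between.
__global__ void square_and_reverse(int *data)
{
    __shared__ int buffer[GROUP_SIZE];     // lives in SRAM, one copy per group
    int base = blockIdx.x * blockDim.x;
    buffer[threadIdx.x] = square(data[base + threadIdx.x]);
    __syncthreads();                       // wait until the whole group has written
    data[base + threadIdx.x] = buffer[blockDim.x - 1 - threadIdx.x];
}

// Host wrapper; n must be a multiple of GROUP_SIZE in this sketch.
void square_and_reverse_on_gpu(int *host, int n)
{
    int *dev;                              // pointer into VRAM
    cudaMalloc((void **)&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    square_and_reverse<<<n / GROUP_SIZE, GROUP_SIZE>>>(dev);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```

The __syncthreads() call matters here: without it a thread could read a buffer slot that its neighbour has not filled yet.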


CUDA

Calling the kernel function
• name<<<dim3 grid, dim3 block>>>(...)
• grid dimensions (groups)
• block dimensions (threads)
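To make the launch syntax concrete, the following sketch launches a kernel in which every thread adds 1 to a counter, so the result equals groups times threads per group. The names count_threads and launched_threads are hypothetical, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Every launched thread adds 1 to a counter in VRAM.
__global__ void count_threads(int *counter)
{
    atomicAdd(counter, 1);
}

// Launches `groups` groups of `threads_per_group` threads
// and returns how many threads actually ran.
int launched_threads(unsigned int groups, unsigned int threads_per_group)
{
    int *counter_d;
    int result = 0;
    cudaMalloc((void **)&counter_d, sizeof(int));
    cudaMemcpy(counter_d, &result, sizeof(int), cudaMemcpyHostToDevice);

    dim3 grid(groups, 1, 1);               // grid dimensions (groups)
    dim3 block(threads_per_group, 1, 1);   // block dimensions (threads)
    count_threads<<<grid, block>>>(counter_d);

    // This blocking copy waits for the kernel to finish first.
    cudaMemcpy(&result, counter_d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(counter_d);
    return result;
}
```

Unused dimensions are set to 1 rather than 0, since a dimension of 0 would describe an empty launch.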


CUDA

Memory handling
• cudaMalloc(...): allocating VRAM
• cudaMemcpy(...): copying memory
• cudaFree(...): freeing VRAM
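A minimal sketch of the three calls working together, copying an array to VRAM and back. The function name round_trip is an assumption for illustration. Sizes are given in bytes, and the copy direction is selected with cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Copies n ints to VRAM and back again; returns 0 on success, -1 on error.
int round_trip(const int *src, int *dst, int n)
{
    int *buf_d = NULL;
    size_t bytes = n * sizeof(int);

    if (cudaMalloc((void **)&buf_d, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return -1;
    }
    cudaError_t err = cudaMemcpy(buf_d, src, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dst, buf_d, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));

    cudaFree(buf_d);
    return err == cudaSuccess ? 0 : -1;
}
```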


CUDA

Distinguish threads
• blockDim: number of threads per group
• blockIdx: ID of the group (starting with 0)
• threadIdx: ID of the thread within its group (starting with 0)
• id = blockDim.x*blockIdx.x+threadIdx.x
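The ID formula can be checked with a small sketch in which each thread writes its own global ID into an array. The names write_ids and ids_on_gpu are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Each thread stores its own global ID at the matching array position.
__global__ void write_ids(int *out, int n)
{
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if (id < n)          // threads beyond the end of the array do nothing
        out[id] = id;
}

// Fills host array `out` (length n) using groups of `group_size` threads.
void ids_on_gpu(int *out, int n, int group_size)
{
    int *out_d;
    cudaMalloc((void **)&out_d, n * sizeof(int));
    int groups = (n + group_size - 1) / group_size;   // round up
    write_ids<<<groups, group_size>>>(out_d, n);
    cudaMemcpy(out, out_d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(out_d);
}
```

With a group size of 4, thread 1 of group 1 computes 4*1+1 = 5 and therefore writes position 5.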


CUDA

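// CPU version: one loop over all N elements
void inc(int *a, int b, int N)
{
    for (int i = 0; i < N; i++)
        a[i] = a[i] + b;
}

int main()
{
    ...
    inc(a, b, N);
}

// GPU version: one thread per element
__global__ void inc(int *a, int b, int N)
{
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if (id < N)                 // the last group may have spare threads
        a[id] = a[id] + b;
}

int main()
{
    ...
    int *a_d;
    cudaMalloc((void **)&a_d, N * sizeof(int));                    // allocate VRAM
    cudaMemcpy(a_d, a, N * sizeof(int), cudaMemcpyHostToDevice);
    dim3 dimBlock(blocksize, 1, 1);                                // unused dimensions are 1
    dim3 dimGrid((N + blocksize - 1) / blocksize, 1, 1);           // round up
    inc<<<dimGrid, dimBlock>>>(a_d, b, N);
    cudaMemcpy(a, a_d, N * sizeof(int), cudaMemcpyDeviceToHost);   // fetch the result
    cudaFree(a_d);
}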


Real-World Example

LTL model checking
• traversing an implicit graph G=(V,E)
• vertices are called states
• edges are represented by transitions
• duplicate removal needed


Real-World Example

External model checking
• generate the graph with external BFS
• each BFS layer needs to be sorted
• GPUs have proven to be fast at sorting
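Why sorting helps with duplicate removal can be sketched as follows: once a layer is sorted, equal states sit next to each other and can be dropped in one pass. This sketch is not the sorting method of the talk. It swaps in the Thrust library that ships with the CUDA toolkit, keeps the whole layer in memory, and assumes that a state fits into one 64-bit integer. The function name sort_and_deduplicate is hypothetical.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/unique.h>
#include <vector>

// Sorts one BFS layer on the GPU and drops duplicate states.
// Assumption: each state is encoded as one 64-bit integer.
std::vector<unsigned long long>
sort_and_deduplicate(const std::vector<unsigned long long> &layer)
{
    // Copy the layer into VRAM.
    thrust::device_vector<unsigned long long> d(layer.begin(), layer.end());

    thrust::sort(d.begin(), d.end());

    // After sorting, equal states are neighbours, so unique() removes them.
    d.erase(thrust::unique(d.begin(), d.end()), d.end());

    // Copy the surviving states back to the host.
    std::vector<unsigned long long> result(d.size());
    thrust::copy(d.begin(), d.end(), result.begin());
    return result;
}
```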


Real-World Example

Challenges
• millions of states in one layer
• huge state size
• fast access only in SRAM
• elements need to be moved


Real-World Example

Solutions:
• Gpuqsort
  • Quicksort optimized for GPUs
  • intensive swapping in VRAM
• Bitonic-based sorting
  • fast for subgroups
  • concatenating groups is slow
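The "fast for subgroups" part of bitonic sorting can be sketched with a kernel in which every group sorts its own slice entirely in SRAM. This is a textbook bitonic network, not necessarily the implementation evaluated in the talk. GROUP_SIZE and the function names are assumptions, and the input length must be a multiple of the group size.

```cuda
#include <cuda_runtime.h>

#define GROUP_SIZE 256   // assumed group size; must be a power of two

// Each group sorts its own GROUP_SIZE elements in ascending order in SRAM.
__global__ void bitonic_sort_groups(int *data)
{
    __shared__ int s[GROUP_SIZE];
    unsigned int t = threadIdx.x;
    unsigned int base = blockIdx.x * GROUP_SIZE;
    s[t] = data[base + t];
    __syncthreads();

    for (unsigned int k = 2; k <= GROUP_SIZE; k <<= 1) {     // bitonic sequence length
        for (unsigned int j = k >> 1; j > 0; j >>= 1) {      // compare distance
            unsigned int partner = t ^ j;
            if (partner > t) {                               // lower thread handles the pair
                bool ascending = ((t & k) == 0);
                if ((s[t] > s[partner]) == ascending) {
                    int tmp = s[t];
                    s[t] = s[partner];
                    s[partner] = tmp;
                }
            }
            __syncthreads();                                 // finish this step everywhere
        }
    }
    data[base + t] = s[t];
}

// Host wrapper; n must be a multiple of GROUP_SIZE in this sketch.
void sort_groups_on_gpu(int *host, int n)
{
    int *dev;
    cudaMalloc((void **)&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    bitonic_sort_groups<<<n / GROUP_SIZE, GROUP_SIZE>>>(dev);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```

Each slice comes back sorted, but the slices are not merged with each other. That merging step across groups is the part the slide describes as slow.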


Real-World Example

Our solution
• states S presorted by hash H(S)
• each bucket sorted in SRAM by a group

[Figure: VRAM and SRAM]


Real-World Example

Our solution
• order given by (H(S), S)
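The ordering by (H(S), S) can be written down as a comparator: compare the hash first and use the state itself as a tie-breaker. The sketch below only illustrates this order. It uses Thrust's general-purpose sort instead of the hash-bucket scheme of the talk, the hash function H is a placeholder, and states are assumed to fit into 32-bit integers. All names are hypothetical.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Placeholder hash H(S); an assumption for illustration only.
__host__ __device__ inline unsigned int H(unsigned int s)
{
    return s % 4u;
}

// Orders states by the pair (H(S), S): hash first, state as tie-breaker.
struct ByHashThenState
{
    __host__ __device__ bool operator()(unsigned int a, unsigned int b) const
    {
        unsigned int ha = H(a), hb = H(b);
        return ha < hb || (ha == hb && a < b);
    }
};

std::vector<unsigned int>
sort_by_hash_then_state(const std::vector<unsigned int> &states)
{
    thrust::device_vector<unsigned int> d(states.begin(), states.end());
    thrust::sort(d.begin(), d.end(), ByHashThenState());
    std::vector<unsigned int> result(d.size());
    thrust::copy(d.begin(), d.end(), result.begin());
    return result;
}
```

With this placeholder hash, the states 7, 4, 1, 8, 5, 3 have hash values 3, 0, 1, 0, 1, 3, so states with equal hash end up next to each other and are ordered by value inside each bucket.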


Real-World Example

Results


Questions?

Programming the GPU