Programming with CUDA, WS 08/09
Lecture 3, Thu, 30 Oct, 2008
Transcript
Page 1:

Programming with CUDA, WS 08/09
Lecture 3, Thu, 30 Oct, 2008

Page 2:

Previously

CUDA programming model
– GPU as co-processor
– Kernel definition and invocation
– Thread blocks: 1D, 2D, 3D
– Thread ID and threadIdx
– Global/shared memory for threads
– Compute capability

Page 3:

Today

Theory/practical course?
CUDA programming model
– Limitations on number of threads
– Grids of thread blocks

Page 4:

Today

Theory/practical course?
– The course is meant to be practical
– Programming with CUDA
– Is that a problem for some of you?
– Should we change something?

Page 5:

The CUDA Programming Model (cont'd)

Page 6:

Number of threads

A kernel is executed on the device simultaneously by many threads.

dim3 blockSize(Dx,Dy,Dz);
// for a 1D block, Dy = Dz = 1
// for a 2D block, Dz = 1
kernel<<<1,blockSize>>>(...);

– # threads = block size = Dx*Dy*Dz
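As a concrete illustration (not from the slides), here is a minimal, self-contained sketch of such a launch; the kernel name recordThreadId and the particular block dimensions are assumptions made for this example. Each thread computes its linear index within the block and writes it to global memory.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (name is an assumption): each thread records its
// linear index within the block in global memory.
__global__ void recordThreadId(int *out)
{
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;
    out[tid] = tid;
}

int main()
{
    const int Dx = 4, Dy = 2, Dz = 1;    // a 2D block, so Dz = 1
    const int nThreads = Dx * Dy * Dz;   // # threads = Dx*Dy*Dz = 8

    int *d_out;
    cudaMalloc(&d_out, nThreads * sizeof(int));

    dim3 blockSize(Dx, Dy, Dz);
    recordThreadId<<<1, blockSize>>>(d_out);   // one block of blockSize threads

    int h_out[nThreads];
    cudaMemcpy(h_out, d_out, nThreads * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < nThreads; ++i)
        printf("%d ", h_out[i]);                // prints 0 1 2 3 4 5 6 7
    printf("\n");

    cudaFree(d_out);
    return 0;
}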

Page 7:

A bit about the hardware

The GPU consists of several multiprocessors.
Each multiprocessor consists of several processors.
Each processor in a multiprocessor has its local memory in the form of registers.
All processors in a multiprocessor have access to a shared memory.

Page 9:

Threads and processors

All threads in a block run on the same multiprocessor.
– They might not all run at the same time.
– Therefore, threads should be independent of each other.
– __syncthreads() causes all threads to reach the same execution point before carrying on.
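A minimal sketch (not from the slides) of how __syncthreads() is typically used; the kernel name reverseInBlock and the data layout are assumptions. The barrier guarantees that every thread has finished writing to shared memory before any thread reads from it.

__global__ void reverseInBlock(int *data, int n)
{
    extern __shared__ int buf[];      // shared memory, sized at launch time
    int i = threadIdx.x;

    if (i < n)
        buf[i] = data[i];             // each thread stages one element

    __syncthreads();                  // wait until the whole block has written

    if (i < n)
        data[i] = buf[n - 1 - i];     // safe: buf is now fully populated
}

// Launch (in main), assuming n is no larger than the block size:
//   reverseInBlock<<<1, n, n * sizeof(int)>>>(d_data, n);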

Page 10:

Threads and processors

How many threads can run on a multiprocessor? That depends on
– how much memory the multiprocessor has
– how much memory each thread requires

Page 11:

Threads and processors

How many threads can a block have? That depends on
– how much memory the multiprocessor has
– how much memory each thread requires

Page 12:

Grids of Blocks

What if I want to run more threads?
– Call multiple blocks of threads
– These form a grid of blocks

A grid can be 1D or 2D.

Page 13:

Grids of Blocks

Example of a 1D grid. Invoke (in main):

int N;
// assign some value to N
dim3 blockDimension(N,N);
kernel<<<N, blockDimension>>>(...);

Example of a 2D grid. Invoke (in main):

int N;
// assign some value to N
dim3 blockDimension(N,N);
dim3 gridDimension(N,N);
kernel<<<gridDimension, blockDimension>>>(...);

Page 14:

Grids of Blocks

Invoking a grid:

kernel<<<gridDimension, blockDimension>>>(...);

– # threads = gridDimension * blockDimension
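To make the count concrete (the specific dimensions below are assumptions, not from the slides), the total is the product of all grid and block components:

dim3 gridDimension(4, 2);     // 4*2 = 8 blocks in the grid
dim3 blockDimension(8, 8);    // 8*8 = 64 threads per block

// Total threads launched by kernel<<<gridDimension, blockDimension>>>(...):
int totalThreads = (gridDimension.x * gridDimension.y * gridDimension.z)
                 * (blockDimension.x * blockDimension.y * blockDimension.z);
// = 8 * 64 = 512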

Page 17:

Accessing block information

Grids can be 1D or 2D.
The index of a block in a grid is available through the blockIdx variable.
The dimension of a block is available through the blockDim variable.
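A common pattern built from these variables (a sketch; the kernel name addOne and the 256-thread block size are assumptions) combines blockIdx, blockDim and threadIdx into a unique global index for each thread:

__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // unique global index
    if (i < n)            // guard: the last block may have spare threads
        data[i] += 1.0f;
}

// Launch enough blocks to cover n elements, e.g. 256 threads per block:
//   int threadsPerBlock = 256;
//   int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
//   addOne<<<numBlocks, threadsPerBlock>>>(d_data, n);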

Page 18:

Arranging blocks

Threads in a block should be independent of other threads in the block.
Blocks in a grid should be independent of other blocks in the grid.

Page 19:

Memory available to threads

Each thread has a local memory.
Threads in a block share a shared memory.
All threads can access the global memory.
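A sketch illustrating the three kinds of memory named on this slide (the kernel name and variable names are assumptions, and it assumes a launch with 64 threads per block):

__global__ void memoryKinds(float *globalData)   // globalData lives in global memory
{
    __shared__ float tile[64];    // shared memory, visible to the whole block
    int i = threadIdx.x;
    float x;                      // local (register) memory, private to this thread

    tile[i] = globalData[blockIdx.x * blockDim.x + i];
    __syncthreads();

    x = tile[i] * 2.0f;
    globalData[blockIdx.x * blockDim.x + i] = x;
}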

Page 21:

Memory available to threads

All threads have read-only access to the constant and texture memories.
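A minimal sketch of constant memory (texture memory is omitted here); the __constant__ array name coeffs and the kernel are assumptions. The host writes the constants, and device code can only read them:

__constant__ float coeffs[4];    // written by the host, read-only on the device

__global__ void applyCoeffs(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = coeffs[0] * data[i] + coeffs[1];   // read-only access
}

// On the host:
//   float h_coeffs[4] = {2.0f, 1.0f, 0.0f, 0.0f};
//   cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));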

Page 23:

Memory available to threads

An application is expected to manage
– the global, constant and texture memory spaces
– data transfer between host and device memories
– (de)allocating host and device memory
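A sketch of the host-side management the slide describes: allocate on the host and on the device, copy data across, and free both. The buffer names and sizes are assumptions made for the example.

#include <cuda_runtime.h>
#include <cstdlib>

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float *h_data = (float*)malloc(bytes);   // allocate host memory
    float *d_data = 0;
    cudaMalloc(&d_data, bytes);              // allocate device (global) memory

    // ... fill h_data ...
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);   // host -> device

    // ... launch kernels that work on d_data ...

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);   // device -> host

    cudaFree(d_data);                        // deallocate device memory
    free(h_data);                            // deallocate host memory
    return 0;
}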

Page 24:

Have a nice weekend.
See you next time.