(1)
GPU Programming using OpenCL
Blaise Tine, School of Electrical and Computer Engineering
Georgia Institute of Technology
(2)
Outline
v What's OpenCL?
v The OpenCL Ecosystem
v OpenCL Programming Model
v OpenCL vs CUDA
v OpenCL VectorAdd Sample
v Compiling OpenCL Programs
v Optimizing OpenCL Programs
v Debugging OpenCL Programs
v The SPIR Portable IL
v Other Compute APIs (DirectX, C++ AMP, SyCL)
v Resources
(3)
What’s OpenCL?
• Low-level programming API for data parallel computation
v Platform API: device query and pipeline setup
v Runtime API: resources management + execution
• Cross-platform API
v Windows, MAC, Linux, Mobile, Web…
• Portable device targets
v CPUs, GPUs, FPGAs, DSPs, etc…
• Implementation based on C99
• Maintained by the Khronos Group (www.khronos.org)
• Current version: 2.2 with C++ support (classes & templates)
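The Platform API mentioned above is the entry point for device query. A minimal sketch, assuming an installed OpenCL SDK (link with -lOpenCL); the calls used are the standard OpenCL C API:

```c
// Minimal Platform API sketch: enumerate the first platform and device.
// Error checking is omitted for brevity.
#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);          // take the first platform

    char name[256];
    clGetPlatformInfo(platform, CL_PLATFORM_NAME, sizeof(name), name, NULL);
    printf("Platform: %s\n", name);

    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL);
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("Device: %s\n", name);
    return 0;
}
```

The Runtime API calls (contexts, queues, buffers, kernels) then operate on the device handles obtained here.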
(4)
OpenCL Implementations
(5)
OpenCL Front-End APIs
(6)
OpenCL Platform Model
• Multiple compute devices attached to a host processor
• Each compute device has multiple compute units
• Each compute unit has multiple processing elements
• Each processing element within a compute unit executes work-items in lock step
An Introduction to the OpenCL Programming Model (2012), Jonathan Thompson, Kristofer Schlachter
(7)
OpenCL Execution Model
• A kernel is a logical unit of instructions to be executed on a compute device.
• Kernels are executed over a multi-dimensional index space: the NDRange.
• For every element of the index space, a work-item is executed.
• The index space is tiled into work-groups.
• Work-items within a work-group are synchronized using barriers or fences.
• Work-Items Indexing

  OpenCL Terminology    CUDA Terminology
  get_num_groups()      gridDim
  get_local_size()      blockDim
  get_group_id()        blockIdx
  get_local_id()        threadIdx
  get_global_id()       blockIdx * blockDim + threadIdx
  get_global_size()     gridDim * blockDim
(12)
OpenCL vs CUDA (4)
• Threads Synchronization

  OpenCL Terminology     CUDA Terminology
  barrier()              __syncthreads()
  No direct equivalent*  __threadfence()
  mem_fence()            __threadfence_block()
  No direct equivalent*  __threadfence_system()
  No direct equivalent*  __syncwarp()
  read_mem_fence()       No direct equivalent*
  write_mem_fence()      No direct equivalent*
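To illustrate how barrier() plays the role of __syncthreads(), here is a hypothetical OpenCL C reduction kernel (device code, not host C) that uses a local-memory barrier between steps:

```c
// OpenCL C device code: partial-sum reduction per work-group.
// barrier() ensures every work-item sees the shared writes before reading them.
__kernel void partial_sum(__global const float* in,
                          __global float* out,
                          __local float* scratch) {
    int lid = get_local_id(0);
    int lsz = get_local_size(0);          // assumed to be a power of two
    scratch[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);         // ~ __syncthreads() in CUDA

    for (int stride = lsz / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);     // wait after each reduction step
    }
    if (lid == 0)
        out[get_group_id(0)] = scratch[0]; // one partial sum per work-group
}
```

Note that barrier() only synchronizes work-items within one work-group; there is no cross-group barrier inside a kernel in either API.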
(13)
OpenCL vs CUDA (5)
• API Terminology
  OpenCL Terminology          CUDA Terminology
  clGetContextInfo()          cuDeviceGet()
  clCreateCommandQueue()      No direct equivalent*
  clBuildProgram()            No direct equivalent*
  clCreateKernel()            No direct equivalent*
  clCreateBuffer()            cuMemAlloc()
  clEnqueueWriteBuffer()      cuMemcpyHtoD()
  clEnqueueReadBuffer()       cuMemcpyDtoH()
  clSetKernelArg()            No direct equivalent*
  clEnqueueNDRangeKernel()    kernel<<<...>>>()
  clReleaseMemObject()        cuMemFree()
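As a sketch of the order in which these calls are typically issued, here is an abbreviated host-side flow for a vector-add kernel. Error checking and platform/device setup are omitted, and the variable names (src, host_a, host_b, host_c, n, bytes) are illustrative placeholders:

```c
// Abbreviated OpenCL host flow for a hypothetical "vector_add" kernel.
cl_int err;
cl_context       ctx   = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
clBuildProgram(prog, 1, &device, NULL, NULL, NULL);  // runtime compilation
cl_kernel kern  = clCreateKernel(prog, "vector_add", &err);

cl_mem a = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, NULL, &err);
cl_mem b = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, NULL, &err);
cl_mem c = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, &err);
clEnqueueWriteBuffer(queue, a, CL_TRUE, 0, bytes, host_a, 0, NULL, NULL);
clEnqueueWriteBuffer(queue, b, CL_TRUE, 0, bytes, host_b, 0, NULL, NULL);

clSetKernelArg(kern, 0, sizeof(cl_mem), &a);         // bind kernel arguments
clSetKernelArg(kern, 1, sizeof(cl_mem), &b);
clSetKernelArg(kern, 2, sizeof(cl_mem), &c);
size_t global = n;
clEnqueueNDRangeKernel(queue, kern, 1, NULL, &global, NULL, 0, NULL, NULL);
clEnqueueReadBuffer(queue, c, CL_TRUE, 0, bytes, host_c, 0, NULL, NULL);
clReleaseMemObject(a);                               // cuMemFree() analogue
```

In CUDA, the program-build and kernel-creation steps have no runtime equivalent because kernels are compiled offline by nvcc and launched directly with the <<<...>>> syntax.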
(14)
OpenCL vs CUDA (6)
• Which is Best?

  Strengths            API
  Performance          CUDA is better on Nvidia cards
  Device Capabilities  CUDA has an edge
  Portability          CUDA is not portable