CUDA Workshop, Week 4: NVVP, Existing Libraries, Q/A

Feb 23, 2016


erek


Agenda
- Textbook / resources
- Eclipse Nsight, NVIDIA Visual Profiler
- Available libraries
- Questions
- Certificate dispersal
- (Optional) Multiple GPUs: Where's Pixel-Waldo?

Text Book / Resources
- Textbook: Programming Massively Parallel Processors: A Hands-on Approach, by David Kirk and Wen-mei Hwu

Text Book / Resources
- NVIDIA Developer Zone: developer.nvidia.com
- Early access to updated drivers / updates
- Heavily curated help forum
- Requires registration and approval (nearly automated)

Text Book / Resources
- Us! We're pretty passionate about this GPU computing stuff.
- Collaboration is cool.
- If you think you've got a problem that can benefit from GPU computation, we may have some ideas.

Eclipse Nsight, NVVP
- Nsight: an IDE with an Eclipse foundation
- CUDA-aware syntax highlighting / suggestions / recognition
- Hooked into NVVP

Eclipse Nsight, NVVP
- NVVP: deep profiling of every aspect of GPU execution (memory bandwidth, branch divergence, bank conflicts, compute / transfer overlap, and more!)
- Provides suggestions for optimization
- Graphical view of GPU performance

Eclipse Nsight, NVVP
- Nsight and NVVP are available on our cuda# machines
- ssh -X @

Nsight demo on Week 3 code

Available Libraries
- Why re-invent the wheel?
- There are many GPU-enabled tools built on CUDA that are already available
- These tools have been extensively tested for efficiency and in most cases will outperform custom solutions
- Some require CUDA-like code structure

Available Libraries
Linear Algebra, cuBLAS
- CUDA-enabled basic linear algebra subroutines
- GPU-accelerated version of the complete standard BLAS library
- Provided with the CUDA toolkit; code examples are also provided
- Callable from C and Fortran
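
Because cuBLAS is callable from both C and Fortran, it follows the Fortran convention of column-major storage, which is easy to trip over when calling it from C. The host-only Python sketch below is not cuBLAS code and runs on the CPU; `idx2c` and `gemm_colmajor` are illustrative names chosen here, not library functions. It shows how a flat column-major array is indexed and what a GEMM call computes, C = alpha*A*B + beta*C.

```python
def idx2c(i, j, ld):
    """Flat offset of element (i, j) in a column-major array with leading dimension ld."""
    return j * ld + i


def gemm_colmajor(m, n, k, alpha, a, b, beta, c):
    """Reference for what a BLAS GEMM computes: C = alpha*A*B + beta*C.

    a is m x k, b is k x n, c is m x n, all flat lists in column-major order.
    Returns a new flat list; the inputs are left untouched.
    """
    out = list(c)
    for j in range(n):          # column of C
        for i in range(m):      # row of C
            acc = 0.0
            for p in range(k):  # dot product of row i of A with column j of B
                acc += a[idx2c(i, p, m)] * b[idx2c(p, j, k)]
            out[idx2c(i, j, m)] = alpha * acc + beta * c[idx2c(i, j, m)]
    return out


# A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], stored column by column.
A = [1.0, 3.0, 2.0, 4.0]
B = [5.0, 7.0, 6.0, 8.0]
C = gemm_colmajor(2, 2, 2, 1.0, A, B, 0.0, [0.0] * 4)
print(C)  # the product [[19, 22], [43, 50]], again stored column by column
```

A row-major C array handed to a column-major routine is read as its transpose, so it helps to decide on the layout before writing the first call.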

Available Libraries
Linear Algebra, CULA, MAGMA
- CULA and MAGMA extend BLAS
- CULA (Paid)
  - CULA-dense: LAPACK and BLAS implementations, solvers, decompositions, basic matrix operations
  - CULA-sparse: sparse-matrix specialized routines, specialized storage structures, iterative methods
- MAGMA (Free, BSD) (Fortran bindings)
  - LAPACK and BLAS implementations, developed by the same dev. team as LAPACK

Available Libraries
IMSL Fortran/C Numerical Library
- Large collection of GPU-accelerated mathematical and statistical functions
- Free evaluation, paid extension
- http://www.roguewave.com/products/imsl-numerical-libraries/fortran-library.aspx

Available Libraries
Image/Signal Processing: NVIDIA Performance Primitives
- 1900 image processing and 600 signal processing algorithms
- Free and provided with the CUDA toolkit; code examples included
- Can be used in tandem with visualization libraries like OpenGL and DirectX

Available Libraries
CUDA without the CUDA: Thrust Library
- Thrust is a high-level interface to GPU computing
- Offers template-interface access to sort, scan, reduce, etc.
- A production-tested version is provided with the CUDA toolkit
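
For anyone new to these primitives, the host-only Python sketch below shows what sort, scan, and reduce each compute on a small sequence. It is not Thrust code and runs on the CPU; Thrust exposes the same operations through C++ templates that execute on the GPU. The variable names are illustrative only.

```python
from functools import reduce
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5, 9, 2, 6]

# sort: reorder the elements into ascending order
sorted_data = sorted(data)

# inclusive scan (prefix sum): element i is the sum of data[0..i]
inclusive_scan = list(accumulate(data, operator.add))

# exclusive scan: element i is the sum of data[0..i-1], starting from 0
exclusive_scan = [0] + inclusive_scan[:-1]

# reduce: combine every element into a single value
total = reduce(operator.add, data, 0)

print(sorted_data)
print(inclusive_scan)
print(exclusive_scan)
print(total)
```

The last element of the inclusive scan always equals the reduction, which is a handy sanity check when moving the same computation to the GPU.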

Available Libraries
Python and CUDA
- PyCUDA
  - Python interface to CUDA functions
  - Simply a collection of wrappers, but effective
- NumbaPro (Paid)
  - Announced this year at GTC 2013; native CUDA Python compiler
  - Python = 4th major CUDA language

Available Libraries
R and CUDA
- R+GPU: package with accelerated alternatives for common R statistical functions
- rpud / rpudplus: package with accelerated alternatives for common R statistical functions
- Rcuda: package with accelerated alternatives for common R statistical functions

Questions?

Certificate Dispersal

Multiple GPUs
Where's Pixel-Waldo?
- Motivation: Given two images which contain a unique suspect and a number of distinct bystanders, identify the suspect by pairwise comparison.

Multiple GPUs
- This is hard
- We'll simplify the problem by reducing the targets to pixel triples.

Multiple GPUs
0: Upload an image and a list to store targets to each GPU.
[Diagram: GPU0 and GPU1; f.bmp and s.bmp; two empty target lists, 0 | 0 | 0 each]

Multiple GPUs
1: Find all positions of potential targets (triples) within each image using both GPUs independently.
[Diagram: GPU0 and GPU1; f.bmp and s.bmp; target lists 11 | 143 | 243 and 3 | 1632 | 54321]

Multiple GPUs
2: Allow GPU0 to access GPU1 memory; use both images and target lists to compare potential suspects.
[Diagram: GPU0 reaches GPU1 across the PCI bus; f.bmp and s.bmp; target lists 11 | 143 | 243 and 3 | 1632 | 54321; result list 0 | 0]

Multiple GPUs
3: Print the positions of the single matching suspect.
[Diagram: f.bmp; target list 11 | 143 | 243; PCI bus; GPU0; CPU; 132 | 629]

Multiple GPUs
- Walk through the source code.
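
The workshop source is not reproduced here, so the following is only a CPU reference sketch of steps 1 to 3 in Python, under loudly stated assumptions: an image is a flat list of integers with 0 as background, and a "pixel triple" is three consecutive non-background pixels. The names `make_image`, `find_triples`, and `find_suspect` are hypothetical. In the multi-GPU version, each GPU would run the step 1 search on its own image, and the pairwise comparison would run on GPU0 with peer access to GPU1's memory.

```python
def make_image(size, targets):
    """Build a toy image: background 0, with each triple written at its start position."""
    pixels = [0] * size
    for pos, triple in targets.items():
        pixels[pos:pos + 3] = triple
    return pixels


def find_triples(pixels):
    """Step 1: return the start position of every run of three non-background pixels."""
    positions = []
    i = 0
    while i + 2 < len(pixels):
        if pixels[i] and pixels[i + 1] and pixels[i + 2]:
            positions.append(i)
            i += 3  # skip past this target
        else:
            i += 1
    return positions


def find_suspect(first, second):
    """Steps 2 and 3: compare every pair of targets and return the matching positions."""
    for p in find_triples(first):
        for q in find_triples(second):
            if first[p:p + 3] == second[q:q + 3]:
                return p, q
    return None  # no target appears in both images


# The triple (1, 2, 3) is the suspect; every other triple is a bystander.
first = make_image(20, {2: (7, 7, 9), 8: (1, 2, 3), 15: (4, 4, 4)})
second = make_image(20, {1: (5, 6, 5), 12: (1, 2, 3), 16: (8, 1, 8)})
print(find_triples(first), find_triples(second))
print(find_suspect(first, second))
```

The pairwise loop is quadratic in the number of targets, which is acceptable for a handful of bystanders and keeps the comparison step easy to map onto one thread per pair.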

Things to note:
- This is un-optimized and known to be inefficient, but the concepts of asynchronous streams, GPU context switching, unified virtual addressing, and peer-to-peer access are covered.
- Source code requires the tclap library to compile appropriately.
- Source code will be made available in a GitHub repository after the workshop.