HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.

HPCC Mid-Morning Break

High Performance Computing on a GPU cluster

Dirk Colbry, Ph.D.

Research Specialist

Institute for Cyber Enabled Discovery

What is a GPU?

• Graphics Processing Unit

• Originally designed to make video games

• Uses many processing cores to parallelize the math required for real-time game play.

• Early researchers wrote general programs that looked like graphics so they could run on the GPU.

• In 2006 Nvidia released the CUDA programming interface to allow users to easily make scalable general-purpose programs that run on the GPU (GPGPU).

GPU vs CPU

CPU and GPU working together

Running on the GPU

• Program starts on the CPU
   • Copy data to GPU (slow-ish)
   • Run kernel threads on GPU (very fast)
   • Copy results back to CPU (slow-ish)

• There are a lot of clever ways to fully utilize both the GPU and CPU.
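The copy, run, copy-back cycle above can be sketched as a simple cost model. This is a minimal sketch in plain Python, not a measurement: the function names `gpu_total_time` and `gpu_is_worthwhile` and every timing value are hypothetical, chosen only to show that the two copies add a fixed overhead that the fast kernel has to outweigh.

```python
def gpu_total_time(copy_in_s, kernel_s, copy_out_s):
    """Total wall time for one copy-in, run-kernel, copy-out cycle (seconds)."""
    return copy_in_s + kernel_s + copy_out_s


def gpu_is_worthwhile(cpu_s, copy_in_s, kernel_s, copy_out_s):
    """True when the whole GPU round trip beats running on the CPU alone."""
    return gpu_total_time(copy_in_s, kernel_s, copy_out_s) < cpu_s


# Hypothetical timings in seconds, for illustration only.
small_job = dict(cpu_s=0.010, copy_in_s=0.020, kernel_s=0.001, copy_out_s=0.020)
large_job = dict(cpu_s=10.0, copy_in_s=0.020, kernel_s=0.5, copy_out_s=0.020)

print(gpu_is_worthwhile(**small_job))  # False: the copies dominate
print(gpu_is_worthwhile(**large_job))  # True: the kernel speedup dominates
```

Under these assumed numbers, small jobs lose to the CPU because of transfer time, which is why it pays to keep data on the GPU across several kernels where possible.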

Pros and Cons

• Benefits
   • Lots of processing cores.
   • Works with the CPU as a co-processor
   • Very fast local memory bandwidth
   • Large online community of developers

• Drawbacks
   • Can be difficult to program.
   • Memory transfers between GPU and CPU are costly (time).
   • Cores typically run the same code.
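The last drawback, that cores typically run the same code, can be pictured with a small plain-Python sketch. It does not use a GPU: the loop only stands in for the hardware starting one thread per index, and the names `scale_kernel` and `launch` are hypothetical.

```python
def scale_kernel(thread_id, data, out, factor):
    """Body every simulated thread executes; only thread_id differs."""
    out[thread_id] = data[thread_id] * factor


def launch(kernel, n_threads, *args):
    """Stand-in for a GPU launch: run the same kernel once per thread index."""
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


data = [1.0, 2.0, 3.0, 4.0]
out = [0.0] * len(data)
launch(scale_kernel, len(data), data, out, 2.0)
print(out)  # [2.0, 4.0, 6.0, 8.0]
```

Every simulated thread runs identical code on a different element, which is the pattern that maps well to a GPU; work that needs a different code path per element fits less naturally.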

gfx-000 Test hardware

• Single quad-core 2.4 GHz Intel processor.

• 8 GB of CPU RAM

• Three Nvidia GTX 280 video cards:
   • 1 GB of RAM per card
   • 240 CUDA processing cores per card
   • 1.3 GHz processor clock speed

• Total of 724 cores on a single machine
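The 724-core total can be reproduced from the figures listed above. A minimal sketch in Python; the constant names are hypothetical, the values come from the slide.

```python
GPU_CARDS = 3              # Nvidia GTX 280 cards in gfx-000
CUDA_CORES_PER_CARD = 240  # CUDA processing cores per card
CPU_CORES = 4              # single quad-core Intel processor

gpu_cores = GPU_CARDS * CUDA_CORES_PER_CARD
total_cores = gpu_cores + CPU_CORES
print(gpu_cores, total_cores)  # 720 724
```

The total adds GPU cores and CPU cores together, so it counts processing units rather than equivalent amounts of compute.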

Installed Software on gfx-000

• CUDA toolkit 2.2 and 2.3
   • For programming in C/C++ and Fortran

• cublas – CUDA version of BLAS libraries

• cufft – CUDA version of FFT libraries

• pycuda – Python CUDA interface

• Zephyr – Molecular dynamics program optimized for GPUs

Other Available Software

• OpenCL
   • C/C++ interface

• Jacket
   • Matlab GPU wrapper

• Lattice Boltzmann
   • PDE solver

• OpenVIDIA
   • Machine vision

• Many, many others

• CUDA Zone
   • ~90 thousand CUDA developers.
   • Lots of software examples
   • Developer forums
   • Tutorials

• http://www.nvidia.com/object/cuda_home.html

New GPU Cluster Buy-In

• Rack units: 1U

• CPU: 2x Intel Xeon E5530 quad-core 2.40 GHz

• Memory: 18 GB of RAM

• Hard drive: 250 GB disk for OS and local scratch

• Network: Ethernet only (no InfiniBand support)

• GPU: Two Nvidia Tesla M1060 GPUs

• Support: Four-year, next-business-day hardware support

• Cost: $5,224

Each Nvidia Tesla M1060

• Number of streaming processor cores: 240

• Frequency of processor cores: 1.3 GHz

• Single-precision peak floating point performance: 933 gigaflops

• Double-precision peak floating point performance: 78 gigaflops

• Dedicated memory: 4 GB GDDR3

• Memory speed: 800 MHz

• Memory interface: 512-bit

• Memory bandwidth: 102 GB/sec

• System interface: PCIe
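Two of the figures above can be roughly cross-checked from the others. The sketch below is a back-of-the-envelope calculation in Python, not a vendor formula. It rests on two assumptions that are not on the slide: each core issues 3 single-precision floating point operations per clock (a multiply-add plus a multiply), and GDDR3 transfers data twice per memory clock. The constant names are hypothetical.

```python
CORES = 240                # from the slide
CLOCK_GHZ = 1.3            # from the slide
FLOPS_PER_CORE_CLOCK = 3   # assumption: multiply-add plus multiply per clock

MEM_CLOCK_MHZ = 800        # from the slide
BUS_WIDTH_BITS = 512       # from the slide
TRANSFERS_PER_CLOCK = 2    # assumption: GDDR3 is double data rate

# Peak single precision: cores x clock x operations per clock.
peak_sp_gflops = CORES * CLOCK_GHZ * FLOPS_PER_CORE_CLOCK

# Bandwidth: transfers per second x bytes moved per transfer.
bytes_per_transfer = BUS_WIDTH_BITS / 8
bandwidth_gb_s = MEM_CLOCK_MHZ * 1e6 * TRANSFERS_PER_CLOCK * bytes_per_transfer / 1e9

print(round(peak_sp_gflops), round(bandwidth_gb_s, 1))  # 936 102.4
```

The bandwidth matches the listed 102 GB/sec. The single-precision estimate lands slightly above the listed 933 gigaflops, which would be consistent with the 1.3 GHz on the slide being a rounded clock figure.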

What are we buying

• 240 cores * 2 GPUs + 4 cores * 2 CPUs = 488 Cores / node

• 31 Nodes (minimum) * 488 Cores / node = 15,128 cores in our new cluster

• However, 20 of these nodes are dedicated buy-in nodes, so only the remaining 11 nodes (5,368 cores) will be available in the general cluster
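The core counts above follow from the per-node hardware. A minimal sketch in Python reproducing the arithmetic; the constant names are hypothetical, the values come from the slides.

```python
GPU_CORES, GPUS_PER_NODE = 240, 2  # Tesla M1060 cores, GPUs per node
CPU_CORES, CPUS_PER_NODE = 4, 2    # quad-core Xeon E5530, CPUs per node
TOTAL_NODES = 31                   # minimum number of nodes
BUY_IN_NODES = 20                  # dedicated buy-in nodes

cores_per_node = GPU_CORES * GPUS_PER_NODE + CPU_CORES * CPUS_PER_NODE
cluster_cores = TOTAL_NODES * cores_per_node
general_nodes = TOTAL_NODES - BUY_IN_NODES
general_cores = general_nodes * cores_per_node

print(cores_per_node, cluster_cores, general_nodes, general_cores)
# 488 15128 11 5368
```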
