Page 1

Applications of Berkeley’s Dwarfs on Nvidia GPUs

Seminar: Topics in High-Performance and Scientific Computing

Team N2: Yang Zhang, Haiqing Wang

05.02.2015

Page 2

Overview


CUDA

The Dwarfs:
• Dynamic Programming
• Sparse Linear Algebra
• Unstructured Grids
• Combinational Logic
• Graphical Model

Summary

Page 3

CUDA

• Parallel computing platform and programming model for GPGPU

• Supports various languages, including C/C++ and Fortran

• Many libraries available (e.g. cuSPARSE, cuBLAS, NPP)


Page 4

CUDA : Execution Model

• Each thread gets an ID
• A group of threads forms a block
• A group of blocks forms a grid
• Each thread is executed by a core
• Each block is executed by an SM (streaming multiprocessor)
• A block is further split into warps
• Blocks are independent of each other
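As a minimal illustration of this execution model (my sketch, not from the slides), the kernel below computes a global thread ID from the built-in index variables; the kernel name and sizes are arbitrary.

```cuda
#include <cuda_runtime.h>

// Each thread derives a unique global ID from its block and thread indices.
__global__ void whoAmI(int *out, int n) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread ID
    if (gid < n)
        out[gid] = gid;                               // each thread writes its own ID
}

int main() {
    const int n = 1 << 10;
    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));

    // A grid of blocks, each block a group of threads (here 256 threads per block).
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    whoAmI<<<blocks, threadsPerBlock>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```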


Page 5

CUDA : Memory Model


• Each thread has private local memory
• Each block has shared memory, which allows communication between the threads of the block
• All threads can access the global memory
• Constant memory is read-only
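A minimal sketch of how these memory spaces appear in code (my illustration, not from the slides): each block stages a tile in shared memory, synchronizes, and writes results back to global memory. The kernel name and sizes are assumptions, and the host would set `scale` with cudaMemcpyToSymbol before launching.

```cuda
__constant__ float scale;  // constant memory: read-only for all threads

// Reverses each 256-element tile of `in` via the block's shared memory.
// Assumes blockDim.x == 256.
__global__ void reverseTiles(const float *in, float *out, int n) {
    __shared__ float tile[256];                      // shared memory: visible to the whole block
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    float x = (gid < n) ? in[gid] : 0.0f;            // read from global memory
    tile[threadIdx.x] = x;                           // stage in shared memory
    __syncthreads();                                 // threads of the block communicate here

    int rev = blockDim.x - 1 - threadIdx.x;
    if (gid < n)
        out[gid] = tile[rev] * scale;                // write back to global memory
}
```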

Page 6

Overview


CUDA

The Dwarfs:
• Dynamic Programming
• Sparse Linear Algebra
• Unstructured Grids
• Combinational Logic
• Graphical Model

Summary

Page 7

Dynamic Programming [1] : Matrix Chain Product


Goal: Minimize the total number of scalar multiplications.

An example (matrix dimensions implied by the costs: A1: 2×9, A2: 9×3, A3: 3×1, A4: 1×4, A5: 4×11, A6: 11×5):

((A1 A2 A3 A4) (A5 A6)): 2*9*3 + 2*3*1 + 2*1*4 + 4*11*5 + 2*4*5 = 328 multiplications

(((A1 (A2 A3)) (A4 A5)) A6): 9*3*1 + 2*9*1 + 1*4*11 + 2*1*11 + 2*11*5 = 221 multiplications

Page 8

Dynamic Programming [1] : Algorithm

Page 9

[Figure: tables m and s for the example (n = 6)]

Dynamic Programming [1] : Algorithm
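For reference, the recurrence behind tables m and s (reconstructed from the per-entry formulas shown on the implementation slides, where A_k is a p_{k-1} × p_k matrix):

```latex
% m_{i,j}: minimum number of scalar multiplications for the chain A_i ... A_j
% s_{i,j}: the split point k that achieves this minimum
\[
  m_{i,i} = 0, \qquad
  m_{i,j} = \min_{i \le k < j}\bigl( m_{i,k} + m_{k+1,j} + p_{i-1}\, p_k\, p_j \bigr) \quad (i < j),
\]
\[
  s_{i,j} = \operatorname*{arg\,min}_{i \le k < j}\bigl( m_{i,k} + m_{k+1,j} + p_{i-1}\, p_k\, p_j \bigr).
\]
```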

Page 10

The entries on the same diagonal of table m (i.e. with the same chain length l) are independent of each other, so they can be computed in parallel.

[Figure: table m for n = 8]

Dynamic Programming [1] : Implementation

Page 11

Three different kernels are used: OneThreadPerOneEntry, OneBlockPerOneEntry, and BlocksPerOneEntry.

Which kernel performs best depends on several factors: the number of entries (i, j) for each l, and the number of split points k for each (i, j), i.e. the amount of computation for each l.

Dynamic Programming [1] : Implementation

Page 12

OneThreadPerOneEntry: allocates one thread to compute one entry.

e.g. m[1,5], m[2,6], m[3,7], m[4,8]: each entry is computed concurrently, and all of them read previously computed entries.

Memory mapping direction: the memory mapping of table m is changed to match this access pattern.

Dynamic Programming [1] : Implementation

Page 13

On the CUDA architecture (table m stored in global memory):

OneThreadPerOneEntry: allocates one thread to compute one entry.

e.g. m[1,5], m[2,6], m[3,7], m[4,8]: each entry is computed concurrently by one core; previously computed entries are used (via shared memory), and the result is stored back to global memory after computing.

Dynamic Programming [1] : Implementation
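A minimal sketch of the one-thread-per-entry idea (my illustration, not the authors' code): for a fixed chain length l, each thread computes one entry m[i][i+l] by scanning all split points k. The row-major table layout and all names are assumptions.

```cuda
#include <cfloat>

// Tables m and s live in global memory as n x n row-major arrays (assumed layout).
// One kernel launch per diagonal l; entries on the same diagonal are independent.
__global__ void oneThreadPerOneEntry(float *m, int *s, const float *p,
                                     int n, int l) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // 1-based chain index
    int j = i + l;
    if (j > n) return;

    float best = FLT_MAX;
    int bestK = i;
    for (int k = i; k < j; ++k) {                       // scan all split points for this entry
        float cost = m[(i - 1) * n + (k - 1)]
                   + m[k * n + (j - 1)]
                   + p[i - 1] * p[k] * p[j];
        if (cost < best) { best = cost; bestK = k; }
    }
    m[(i - 1) * n + (j - 1)] = best;                    // earlier diagonals already sit in global memory
    s[(i - 1) * n + (j - 1)] = bestK;
}
```

The host would loop over l = 1 … n-1, launching ceil((n-l)/threadsPerBlock) blocks per diagonal.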

Page 14

OneBlockPerOneEntry: allocates one block to compute one entry.

e.g. m[1,5] = min over 1 ≤ k < 5 of (m[1,k] + m[k+1,5] + p0·pk·p5) is computed by one streaming multiprocessor: each term (m[1,k] + m[k+1,5] + p0·pk·p5) is computed by one core, and another core performs the selection of the minimum.

[Figure: mapping onto the CUDA architecture]

Dynamic Programming [1] : Implementation
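A sketch of the one-block-per-entry variant (again my illustration, with assumed names and layout): the threads of a block evaluate different split points k in parallel and then reduce to the minimum in shared memory, which plays the role of the selection step. Recording of table s is omitted for brevity; blockDim.x is assumed to be 256 (a power of two).

```cuda
#include <cfloat>

// One block computes one entry m[i][j]; the block's threads share the k-loop.
__global__ void oneBlockPerOneEntry(float *m, const float *p, int n, int l) {
    __shared__ float partial[256];            // per-thread partial minima
    int i = blockIdx.x + 1;                   // one block per entry on diagonal l
    int j = i + l;
    if (j > n) return;

    float best = FLT_MAX;
    for (int k = i + threadIdx.x; k < j; k += blockDim.x) {
        float cost = m[(i - 1) * n + (k - 1)]
                   + m[k * n + (j - 1)]
                   + p[i - 1] * p[k] * p[j];
        best = fminf(best, cost);
    }
    partial[threadIdx.x] = best;
    __syncthreads();

    // Tree reduction in shared memory selects the overall minimum.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] = fminf(partial[threadIdx.x], partial[threadIdx.x + stride]);
        __syncthreads();
    }
    if (threadIdx.x == 0)
        m[(i - 1) * n + (j - 1)] = partial[0];
}
```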

Page 15

BlocksPerOneEntry: allocates multiple blocks to compute one entry.

e.g. m[1,5] = min over 1 ≤ k < 5 of (m[1,k] + m[k+1,5] + p0·pk·p5) is computed by several streaming multiprocessors: each term (m[1,k] + m[k+1,5] + p0·pk·p5) is computed by one core, possibly on different streaming multiprocessors, and another core on any streaming multiprocessor performs the selection of the minimum.

[Figure: mapping onto the CUDA architecture]

Dynamic Programming [1] : Implementation

Page 16

GPU: Nvidia GeForce GTX 480 with 480 processing cores (15 streaming multiprocessors with 32 cores each), 1.4 GHz, 3 GB memory.

[Table: total time of each kernel for different numbers of threads per block and blocks (n = 16384)]

Dynamic Programming [1] : Evaluation

Page 17

GPU: Nvidia GeForce GTX 480 with 480 processing cores (15 streaming multiprocessors with 32 cores each), 1.4 GHz, 3 GB memory.

[Figure: running time of each kernel (OneThreadPerOneEntry, OneBlockPerOneEntry, BlocksPerOneEntry) as a function of l, and the fastest kernel for each l]

Dynamic Programming [1] : Evaluation

Page 18

GPU: Nvidia GeForce GTX 480 with 480 processing cores (15 streaming multiprocessors with 32 cores each), 1.4 GHz, 3 GB memory, running a combination of the three kernels (the fastest kernel for each l).

CPU: Intel Core i7 870, 2.93 GHz, 8 GB memory, running a sequential program in C.

[Table: total computing time for n = 16384]

The resulting speedup factor is not entirely fair, since a parallel GPU implementation is compared against a sequential CPU program.

Dynamic Programming [1] : Evaluation GPU vs. CPU

Page 19

Overview


CUDA

The Dwarfs:
• Dynamic Programming
• Sparse Linear Algebra
• Unstructured Grids
• Combinational Logic
• Graphical Model

Summary

Page 20

Sparse Linear Algebra [2]

Goal:

Accelerate sparse matrix-matrix (SpMM) product on GPU

SpMM product: compute C = AB, where A is a sparse matrix and B is a dense matrix.


Page 21

Sparse Linear Algebra [2] : FastSpMM

Approach:

• FastSpMM: an extension of the ELLR-T kernel

• Relies on ELLPACK-R storage format

• Outperforms common libraries for SpMM (e.g. cuSPARSE)
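The paper's ELLR-T/FastSpMM kernels are more involved; as a rough sketch of how ELLPACK-R storage is traversed (my simplified version, one thread per (row of A, column of B) pair, all names assumed):

```cuda
// ELLPACK-R storage (assumed names): val and colIdx are N x maxLen arrays stored
// column-major (element s of row r sits at index s * N + r); rl[r] holds the number
// of nonzeros actually present in row r.
// Computes C = A * B with A sparse (N x N) and B, C dense (N x K, row-major).
__global__ void spmmEllpackR(const float *val, const int *colIdx, const int *rl,
                             const float *B, float *C, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row of A and C
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // column of B and C
    if (row >= N || col >= K) return;

    float acc = 0.0f;
    int len = rl[row];                                // only the real nonzeros of this row
    for (int s = 0; s < len; ++s) {
        float a = val[s * N + row];                   // column-major layout coalesces over rows
        int   c = colIdx[s * N + row];
        acc += a * B[c * K + col];
    }
    C[row * K + col] = acc;
}
// Launch, e.g.: dim3 block(16, 16); dim3 grid((K + 15) / 16, (N + 15) / 16);
```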


Page 22

Sparse Linear Algebra [2] : Evaluation SpMM


Three versions of SpMM routines evaluated on two Nvidia GPUs:

FastSpMM vs. ELLR-T (ELLPACK-R storage format) vs. cuSPARSE (CRS storage format)

[Figures: performance on N×N test sparse matrices for the GTX480 and the Tesla C2050]

Page 23

Sparse Linear Algebra [2] : Evaluation GPU vs. CPU


GTX480 and Tesla C2050 using FastSpMM vs. an Intel Xeon E5640 with 4 cores using the MKL library.

[Table: runtimes (in seconds) on the test matrices]

Speedups compared to the CPU: GTX480: 2.8×–6.2×; Tesla C2050: 1.7×–3.8×

Page 24

Overview


CUDA

The Dwarfs:
• Dynamic Programming
• Sparse Linear Algebra
• Unstructured Grids
• Combinational Logic
• Graphical Model

Summary

Page 25

Simulation of compressible flows on 3-D unstructured grids.

Compressible flow: the branch of fluid mechanics that deals with flows having significant changes in fluid density.

An example: subsonic flow past a sphere [figure].

Unstructured Grids [3] : Compressible Flows

Page 26

Discontinuous Galerkin (DG) methods: a class of numerical methods for solving differential equations.

The DG method can be implemented in parallel.

An example: subsonic flow past a sphere [figure].

Unstructured Grids [3] : DG Method

Page 27

Timing measurements for subsonic flow past a sphere

GPU: NVIDIA Tesla K20c with 2496 CUDA cores (OpenACC-based program)

CPU: AMD Opteron 6128, 16 cores used (MPI-based parallel program)

[Table: Nelem = number of elements, Ntime = number of time steps]

Unstructured Grids [3] : Evaluation GPU vs. CPU

Page 28

Overview


CUDA

The Dwarfs:
• Dynamic Programming
• Sparse Linear Algebra
• Unstructured Grids
• Combinational Logic
• Graphical Model

Summary

Page 29

Combinational Logic [4] : Parallel AES

Goal:

Efficient encryption/decryption of data streams in web server applications

Approach:

Design of a parallel AES implementation on the GPU

Two design choices:

• Fine-grained: focus on thread-level parallelism; requires a lot of communication and synchronization

• Coarse-grained: focus on higher-level parallelism, i.e. independent blocks (see the sketch below)
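A coarse-grained sketch (my illustration, NOT the paper's kernel): each thread processes one independent 16-byte block of the stream, so no inter-thread communication is needed. The function encryptBlock is only a placeholder standing in for a real AES-128 round implementation.

```cuda
// Placeholder only: a real implementation would run the AES-128 rounds
// (SubBytes, ShiftRows, MixColumns, AddRoundKey) here.
__device__ void encryptBlock(const unsigned char *in, unsigned char *out,
                             const unsigned char *roundKeys) {
    for (int i = 0; i < 16; ++i) out[i] = in[i] ^ roundKeys[i];
}

// Coarse-grained parallelism: one independent 16-byte block per thread,
// e.g. for a mode without inter-block dependencies such as ECB or CTR.
__global__ void aesCoarseGrained(const unsigned char *plaintext,
                                 unsigned char *ciphertext,
                                 const unsigned char *roundKeys, int numBlocks) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;  // one AES block per thread
    if (b < numBlocks)
        encryptBlock(plaintext + 16 * b, ciphertext + 16 * b, roundKeys);
}
```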


Page 30

Combinational Logic [4] : Evaluation

Comparison: fine-grained vs. coarse-grained on an Nvidia 8800 GT (112 cores)


Page 31

Combinational Logic [4] : Evaluation GPU vs. CPU


Throughput (in Mbps) comparisons on two Nvidia GPUs and two high-end CPUs (in 2009):

• CPU implementation from the OpenSSL toolkit

Page 32

Overview


CUDA

The Dwarfs:
• Dynamic Programming
• Sparse Linear Algebra
• Unstructured Grids
• Combinational Logic
• Graphical Model

Summary

Page 33

ANN: Artificial Neural Network

HMM: Hidden Markov Model

ANN model: recognizes the acoustic content of a time frame (a word or a phoneme)

HMM model: warps and adjusts the whole acoustic sequence by combining the words or phonemes recognized by the ANN

Graphical Model [5] : Speech Recognition System

Page 34

Input: a vector representing the acoustics of one time frame

Output: a vector representing the most likely word or phoneme

Hidden vector = Input vector × weight matrix 1
Output vector = Hidden vector × weight matrix 2
(each output element is an inner product)

Training is the process of adjusting weight matrices 1 and 2.

Graphical Model [5] : ANN Training


Page 35

Training can be formulated in terms of linear algebra.

Input: a matrix made up of many input vectors

Output: a matrix made up of many output vectors

Hidden matrix = Input matrix × weight matrix 1
Output matrix = Hidden matrix × weight matrix 2

Graphical Model [5] : Block ANN Training
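A sketch of this blocked forward pass as two GEMM calls with cuBLAS (my illustration, not the authors' code; the element-wise activation between the layers is omitted, and all names are assumptions):

```cuda
#include <cublas_v2.h>

// All matrices are column-major device arrays:
//   input  : N x D  (N frames, D features per frame)
//   W1     : D x H  (first weight matrix),  hidden : N x H
//   W2     : H x O  (second weight matrix), output : N x O
void forwardPass(cublasHandle_t handle,
                 const float *input, const float *W1, const float *W2,
                 float *hidden, float *output, int N, int D, int H, int O) {
    const float alpha = 1.0f, beta = 0.0f;

    // hidden = input * W1
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, H, D, &alpha, input, N, W1, D, &beta, hidden, N);

    // output = hidden * W2
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, O, H, &alpha, hidden, N, W2, H, &beta, output, N);
}
```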

Page 36

GPU system: 1600 MHz FSB, 8 GB RAM, NVIDIA GTX 280 GPU (cuBLAS library)

CPU: a quad-core 3.0 GHz CPU (Intel MKL library)

[Table: training time and relative speed-up for the WSJ0 corpus]

Speedup factor of about 5×.

Graphical Model [5] : Evaluation GPU vs. CPU

Page 37

Summary

• What is it good for?
  • Provides extremely high parallelism
  • Accelerates scientific computations by a considerable factor
  • Reduces CPU workload
  • Achieves high performance at low cost

• Learning curve?
  • Rather smooth, since familiar languages like C/C++ are supported
  • But: precise knowledge of the hardware architecture is necessary

• Given a scalar α and two vectors x and y, is the operation x := αx + y easy to implement?
  • Fairly easy: basically a C implementation with a few added keywords plus CPU/GPU memory management (see the sketch below)
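As a concrete illustration of that answer (my sketch; kernel and variable names are arbitrary), a minimal saxpy in CUDA:

```cuda
#include <cuda_runtime.h>

// x := alpha * x + y, one element per thread.
__global__ void saxpy(float alpha, float *x, const float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = alpha * x[i] + y[i];
}

void saxpyOnGpu(float alpha, float *hx, const float *hy, int n) {
    float *dx, *dy;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dx, bytes);                              // GPU memory management
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);   // CPU -> GPU
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    saxpy<<<(n + threads - 1) / threads, threads>>>(alpha, dx, dy, n);

    cudaMemcpy(hx, dx, bytes, cudaMemcpyDeviceToHost);   // GPU -> CPU
    cudaFree(dx);
    cudaFree(dy);
}
```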

Disclaimer: some of the comparisons against CPUs are not really representative, or not clearly specified.


Page 38

References 1

[1] K. Nishida, Y. Ito, K. Nakano.

Accelerating the Dynamic Programming for the Matrix Chain Product on the GPU. Networking and Computing (ICNC), 2011 Second International Conference on, pp. 320-326, Nov. 30 - Dec. 2, 2011.

[2] F. Vazquez, G. Ortega, J. J. Fernandez, I. Garcia and E. M. Garzon.

Fast sparse matrix matrix product based on ELLR-T and GPU computing. Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on, pp. 669-674, 10-13 July 2012.

[3] Y. Xia, H. Luo, L. Luo, J. Edwards, J. Lou and F. Mueller.

OpenACC-based GPU Acceleration of a 3-D Unstructured Discontinuous Galerkin Method. 52nd Aerospace Sciences Meeting. January 2014.


Page 39

References 2

[4] A. di Biagio, A. Barenghi, G. Agosta, G. Pelosi.

Design of a Parallel AES for Graphics Hardware using the CUDA framework. Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pp. 1-8, 23-29 May 2009.

[5] S. Scanzio, S. Cumani, R. Gemello, F. Mana, P. Laface.

Parallel implementation of artificial neural network training. Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 4902-4905, 14-19 March 2010.

Image Sources:

http://rtcmagazine.com/files/images/5985/RTC08-ERTW-Nvidia-FigX_original_large.jpg

https://www.pgroup.com/images/insider/v2n4a1i2.png

http://pic002.cnblogs.com/images/2011/63234/2011030722152125.png

http://3dgep.com/wp-content/uploads/2011/11/CUDA-memory-model.gif


Page 40

Credits

Yang Zhang: CUDA, Sparse Linear Algebra, Combinational Logic, Summary

Haiqing Wang: Dynamic Programming (in detail), Unstructured Grids, Graphical Model
