A codeless introduction to GPU parallelism

Will Landau

Iowa State University

September 23, 2013

Outline

A review of GPU parallelism

Examples of parallelism
  - Vector addition
  - Pairwise summation
  - Matrix multiplication
  - K-means clustering
  - Markov chain Monte Carlo

A review of GPU parallelism

The single instruction, multiple data (SIMD) paradigm

- SIMD: apply the same command to multiple places in a dataset.

    for (i = 0; i < 1e6; ++i)
      a[i] = b[i] + c[i];

- On CPUs, the iterations of the loop run sequentially.

- With GPUs, we can easily run all 1,000,000 iterations simultaneously.

    i = threadIdx.x;
    a[i] = b[i] + c[i];

- We can similarly parallelize a lot more than just loops.

CPU / GPU cooperation

- The CPU (“host”) is in charge.
- The CPU sends computationally intensive instruction sets to the GPU (“device”) just like a human uses a pocket calculator.

How GPU parallelism works

1. The CPU sends a command called a kernel to a GPU.
2. The GPU executes several duplicate realizations of this command, called threads.
   - These threads are grouped into bunches called blocks.
   - The sum total of all threads in a kernel is called a grid.

- Toy example:
  - CPU says: “Hey, GPU. Sum pairs of adjacent numbers. Use the array, (1, 2, 3, 4, 5, 6, 7, 8).”
  - GPU thinks: “Sum pairs of adjacent numbers” is a kernel that I need to apply to the array, (1, 2, 3, 4, 5, 6, 7, 8).
  - The GPU spawns 2 blocks, each with 2 threads:

    Block  |   0   |   0   |   1   |   1
    Thread |   0   |   1   |   0   |   1
    Action | 1 + 2 | 3 + 4 | 5 + 6 | 7 + 8

  - I could have also used 1 block with 4 threads and given the threads different pairs of numbers.
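The toy example can be mimicked in plain C. This is only a sketch under stated assumptions: the function names and the `block_dim` parameter are hypothetical, and the nested loops stand in for a kernel launch, so the "threads" run one after another rather than at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by ONE simulated thread: sum the adjacent pair it owns.
   block_dim is the number of threads per block (hypothetical name). */
static void sum_pair_thread(const int *in, int *out,
                            size_t block, size_t thread, size_t block_dim)
{
    size_t pair = block * block_dim + thread; /* which pair this thread owns */
    out[pair] = in[2 * pair] + in[2 * pair + 1];
}

/* Stand-in for a kernel launch: visit every (block, thread) of the grid.
   On a GPU these calls would run concurrently; here they run in a loop. */
void sum_adjacent_pairs(const int *in, int *out,
                        size_t num_blocks, size_t block_dim)
{
    assert(num_blocks > 0 && block_dim > 0);
    for (size_t block = 0; block < num_blocks; ++block)
        for (size_t thread = 0; thread < block_dim; ++thread)
            sum_pair_thread(in, out, block, thread, block_dim);
}
```

Calling `sum_adjacent_pairs(in, out, 2, 2)` corresponds to 2 blocks of 2 threads, and `sum_adjacent_pairs(in, out, 1, 4)` to 1 block of 4 threads. Both layouts give each thread a different pair, so they should produce the same sums.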

Examples of parallelism

Vector addition

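- Say I have 2 vectors,

  $$
  a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
  \qquad
  b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
  $$

- I want to compute their component-wise sum,

  $$
  c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
    = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{bmatrix}
  $$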
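As a rough illustration of how the component-wise sum maps onto threads, the C sketch below gives each simulated thread one component. The names `vector_add`, `add_thread`, and `block_dim` are assumptions made for this example, and the loops only imitate what a GPU would do concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* One simulated thread adds one component of the vectors. */
static void add_thread(const double *a, const double *b, double *c, size_t n,
                       size_t block, size_t thread, size_t block_dim)
{
    size_t i = block * block_dim + thread; /* global index of this thread */
    if (i < n)                             /* the last block may overshoot n */
        c[i] = a[i] + b[i];
}

/* Stand-in for a kernel launch with enough blocks to cover all n components. */
void vector_add(const double *a, const double *b, double *c,
                size_t n, size_t block_dim)
{
    assert(block_dim > 0);
    size_t num_blocks = (n + block_dim - 1) / block_dim; /* ceil(n / block_dim) */
    for (size_t block = 0; block < num_blocks; ++block)
        for (size_t thread = 0; thread < block_dim; ++thread)
            add_thread(a, b, c, n, block, thread, block_dim);
}
```

The bounds check matters whenever `n` is not a multiple of `block_dim`: the final block then contains threads with no component to add, and they must do nothing.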

Pairwise summation

- Let’s take the pairwise sum of the vector, (5, 2, −3, 1, 1, 8, 2, 6), using 1 block of 4 threads.

[Figure sequence: first round of the pairwise sum. Threads 0, 1, 2, and 3 each write one partial sum.]

    5   2  -3   1   1   8   2   6
    6  10  -1   7

Synchronize threads.

Synchronizing threads

- Synchronization: waiting for all parallel tasks to reach a checkpoint before allowing any of them to continue.
- Threads from the same block can be synchronized easily.
- In general, do not try to synchronize threads from different blocks. It’s possible, but extremely inefficient.

[Figure sequence: remaining rounds of the pairwise sum. Threads 0 and 1 write the second row of partial sums, the threads synchronize, and thread 0 writes the total.]

    5   2  -3   1   1   8   2   6
    6  10  -1   7
    5  17

Synchronize threads.

    22

Compare the pairwise sum to the sequential sum

- The pairwise sum requires only log2(n) sequential steps, while the sequential sum requires n − 1 sequential steps.
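The step counts can be checked with a small C sketch. It assumes n is a power of two and pairs element i with element i + stride, which reproduces the partial sums 6, 10, −1, 7, then 5, 17, then 22 from the slides; the function names are hypothetical, and the inner loop only simulates what the threads would do at once.

```c
#include <assert.h>
#include <stddef.h>

/* Pairwise (tree) sum of x[0..n-1], computed in place.
   Assumes n is a power of two. The total ends up in x[0].
   Returns the number of rounds, which equals log2(n). */
size_t pairwise_sum(int *x, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    size_t rounds = 0;
    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        /* On a GPU, threads 0..stride-1 would do these additions at once. */
        for (size_t thread = 0; thread < stride; ++thread)
            x[thread] += x[thread + stride];
        /* Finishing the inner loop plays the role of synchronizing threads:
           the next round starts only after every addition above is done. */
        ++rounds;
    }
    return rounds;
}

/* Sequential sum for comparison; *steps receives the number of additions. */
int sequential_sum(const int *x, size_t n, size_t *steps)
{
    assert(n > 0);
    int total = x[0];
    *steps = 0;
    for (size_t i = 1; i < n; ++i) {
        total += x[i];
        ++*steps;
    }
    return total;
}
```

For the 8-element vector from the slides, the pairwise version finishes in 3 rounds while the sequential version needs 7 additions in a row. The pairwise version still performs 7 additions in total; the saving is in how many must happen one after another.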

Reductions and scans

- Reductions
  - Pairwise sum and pairwise multiplication are examples of reductions.
  - Reduction: an algorithm that applies some binary operation on a vector to produce a scalar.

- Scans
  - Scan (prefix sum): an operation on a vector that produces a sequence of partial reductions.
  - Example: computing the sequence of partial sums in pairwise fashion.
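One way to compute partial sums in pairwise fashion is the Hillis-Steele scan, sketched below in C. The slides do not name a particular scan algorithm, so this choice is an assumption, as are the function name and the caller-provided `scratch` buffer. Each round doubles the offset, and the inner loop simulates n threads working at once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Inclusive prefix sum of in[0..n-1], written to out[0..n-1].
   scratch must hold n ints; in, out, and scratch must not overlap. */
void pairwise_scan(const int *in, int *out, int *scratch, size_t n)
{
    assert(n > 0);
    int *src = out;
    int *dst = scratch;
    memcpy(src, in, n * sizeof *in);
    for (size_t offset = 1; offset < n; offset *= 2) {
        /* One simulated thread per element; all read src and write dst,
           so no thread sees a value that another thread just updated. */
        for (size_t thread = 0; thread < n; ++thread) {
            if (thread >= offset)
                dst[thread] = src[thread] + src[thread - offset];
            else
                dst[thread] = src[thread];
        }
        /* Swapping buffers here stands in for synchronizing threads. */
        int *tmp = src;
        src = dst;
        dst = tmp;
    }
    if (src != out)
        memcpy(out, src, n * sizeof *out);
}
```

The last entry of the scan equals the reduction of the whole vector, which is why a scan can be described as a sequence of partial reductions.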

Matrix multiplication

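- Take an $m \times n$ matrix, $A = (a_{ij})$, and an $n \times p$ matrix, $B = (b_{jk})$. Compute $C = A \cdot B$:

- Write $A$ in terms of its rows:

  $$
  A = \begin{bmatrix} a_{1.} \\ \vdots \\ a_{m.} \end{bmatrix}
  \quad \text{where} \quad
  a_{i.} = \begin{bmatrix} a_{i1} & \cdots & a_{in} \end{bmatrix}.
  $$

- Write $B$ in terms of its columns:

  $$
  B = \begin{bmatrix} b_{.1} & \cdots & b_{.p} \end{bmatrix}
  \quad \text{where} \quad
  b_{.k} = \begin{bmatrix} b_{1k} \\ \vdots \\ b_{nk} \end{bmatrix}
  $$

- Compute $C = A \cdot B$ by taking the product of each row of $A$ with each column of $B$:

  $$
  C = A \cdot B =
  \begin{bmatrix}
  (a_{1.} \cdot b_{.1}) & \cdots & (a_{1.} \cdot b_{.p}) \\
  \vdots & \ddots & \vdots \\
  (a_{m.} \cdot b_{.1}) & \cdots & (a_{m.} \cdot b_{.p})
  \end{bmatrix}
  $$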

Parallelizing matrix multiplication

- Entry $(i, k)$ of matrix $C$ is

  $$
  c_{ik} = \underbrace{a_{i1} b_{1k}}_{c_{i1k}} + \underbrace{a_{i2} b_{2k}}_{c_{i2k}} + \cdots + \underbrace{a_{in} b_{nk}}_{c_{ink}}
         = c_{i1k} + c_{i2k} + \cdots + c_{ink}
  $$

- Assign block $(i, k)$ to compute $c_{ik}$.

  1. Spawn $n$ threads.
  2. Tell the $j$'th thread to compute $c_{ijk} = a_{ij} \cdot b_{jk}$.
  3. Synchronize threads to make sure we have finished calculating $c_{i1k}, c_{i2k}, \ldots, c_{ink}$ before continuing.
  4. Compute $c_{ik} = \sum_{j=1}^{n} c_{ijk}$ as a pairwise sum.
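The four steps can be sketched in C as follows. This is an illustration rather than a GPU implementation: the function names and the `MAX_N` cap are assumptions, matrices are stored row-major, and loops stand in for blocks and threads.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64 /* hypothetical cap on the inner dimension n for this sketch */

/* Work of block (i, k): compute c_ik from row i of A and column k of B.
   A is m x n and B is n x p, both stored row-major. */
static double block_dot(const double *A, const double *B,
                        size_t n, size_t p, size_t i, size_t k)
{
    double partial[MAX_N];

    /* Steps 1-2: simulated thread j computes c_ijk = a_ij * b_jk. */
    for (size_t j = 0; j < n; ++j)
        partial[j] = A[i * n + j] * B[j * p + k];

    /* Step 3: finishing the loop above stands in for synchronizing threads. */

    /* Step 4: pairwise sum of the partial products. Each round folds the
       upper part of the active range onto the lower part, for any n >= 1. */
    for (size_t len = n; len > 1; len = (len + 1) / 2) {
        size_t half = len / 2;
        for (size_t j = 0; j < half; ++j)
            partial[j] += partial[len - half + j];
    }
    return partial[0];
}

/* One simulated block per entry of C. The blocks never read each other's
   results, so they could run in any order or all at once. */
void matmul_by_blocks(const double *A, const double *B, double *C,
                      size_t m, size_t n, size_t p)
{
    assert(n >= 1 && n <= MAX_N);
    for (size_t i = 0; i < m; ++i)
        for (size_t k = 0; k < p; ++k)
            C[i * p + k] = block_dot(A, B, n, p, i, k);
}
```

With a 3 × 2 matrix A and a 2 × 3 matrix B, this launches 9 simulated blocks of 2 threads each.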

Matrix multiplication

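- Say I want to compute $A \cdot B$, where:

  $$
  A = \begin{bmatrix} 1 & 2 \\ -1 & 5 \\ 7 & -9 \end{bmatrix}
  \qquad
  B = \begin{bmatrix} 8 & 8 & 7 \\ 3 & 5 & 2 \end{bmatrix}
  $$

- I write the multiplication as an array of products:

  $$
  C = \begin{bmatrix}
  \left( \begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix} \right) &
  \left( \begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 5 \end{bmatrix} \right) &
  \left( \begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 7 \\ 2 \end{bmatrix} \right) \\
  \left( \begin{bmatrix} -1 & 5 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix} \right) &
  \left( \begin{bmatrix} -1 & 5 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 5 \end{bmatrix} \right) &
  \left( \begin{bmatrix} -1 & 5 \end{bmatrix} \cdot \begin{bmatrix} 7 \\ 2 \end{bmatrix} \right) \\
  \left( \begin{bmatrix} 7 & -9 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix} \right) &
  \left( \begin{bmatrix} 7 & -9 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 5 \end{bmatrix} \right) &
  \left( \begin{bmatrix} 7 & -9 \end{bmatrix} \cdot \begin{bmatrix} 7 \\ 2 \end{bmatrix} \right)
  \end{bmatrix}
  $$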

- We don’t need to synchronize blocks because they operate independently.

- Consider block (0, 0), which computes $\begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix}$
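For reference, the arithmetic for block (0, 0) can be written out by hand. The two products would come from the block's two threads, and the final addition is the one-round pairwise sum. The full matrix below applies the same calculation to the other eight blocks, using the A and B given earlier in this section; it is a worked check, not something stated on the slides.

```latex
\begin{aligned}
c_{11} &= \begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix}
        = \underbrace{1 \cdot 8}_{\text{thread 1}} + \underbrace{2 \cdot 3}_{\text{thread 2}}
        = 8 + 6 = 14 \\[6pt]
C &= A \cdot B =
\begin{bmatrix}
14 & 18 & 11 \\
7 & 17 & 3 \\
29 & 11 & 31
\end{bmatrix}
\end{aligned}
```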

Lloyd’s K-means algorithm

- Cluster N vectors in Euclidean space into K groups.

Step 1: choose initial cluster centers.

- The circles are the cluster means, the squares are the data points, and the color indicates the cluster.

Step 2: assign each data point (square) to its closest center (circle).

Step 3: update the cluster centers to be the within-cluster data means.

Repeat step 2: reassign points to their closest cluster centers.

- ... and repeat until convergence.

Parallel K-means

- Step 2: assign points to closest cluster centers.
  - Spawn N blocks with K threads each.
  - Let thread (n, k) compute the distance between data point n and cluster center k.
  - Synchronize threads.
  - Let thread (n, 1) assign data point n to its nearest cluster center.

- Step 3: recompute cluster centers.
  - Spawn one block for each cluster.
  - Within each block, compute the mean of the data in the corresponding cluster.
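A C sketch of one iteration of these two steps follows. Several things are assumptions made for illustration: the points are 2-dimensional (`DIM`), squared Euclidean distance is used to find the nearest center, an empty cluster keeps its old center, and the function names and the `dist` scratch buffer are hypothetical. Loops stand in for blocks and threads.

```c
#include <assert.h>
#include <stddef.h>

#define DIM 2 /* assumed dimension of each data point (hypothetical) */

static double squared_distance(const double *x, const double *y)
{
    double total = 0.0;
    for (size_t d = 0; d < DIM; ++d)
        total += (x[d] - y[d]) * (x[d] - y[d]);
    return total;
}

/* Step 2: simulated block n handles data point n. dist must hold K doubles. */
void assign_points(const double *points, const double *centers,
                   double *dist, size_t *label, size_t N, size_t K)
{
    assert(N > 0 && K > 0);
    for (size_t n = 0; n < N; ++n) {
        /* Simulated thread (n, k) computes one point-to-center distance. */
        for (size_t k = 0; k < K; ++k)
            dist[k] = squared_distance(&points[n * DIM], &centers[k * DIM]);
        /* Finishing the loop stands in for synchronizing threads. */
        /* A single thread then picks the nearest center. */
        size_t best = 0;
        for (size_t k = 1; k < K; ++k)
            if (dist[k] < dist[best])
                best = k;
        label[n] = best;
    }
}

/* Step 3: simulated block k recomputes the mean of cluster k. */
void update_centers(const double *points, const size_t *label,
                    double *centers, size_t N, size_t K)
{
    for (size_t k = 0; k < K; ++k) {
        double sum[DIM] = {0.0};
        size_t count = 0;
        for (size_t n = 0; n < N; ++n) {
            if (label[n] != k)
                continue;
            for (size_t d = 0; d < DIM; ++d)
                sum[d] += points[n * DIM + d];
            ++count;
        }
        if (count == 0)
            continue; /* assumption: an empty cluster keeps its old center */
        for (size_t d = 0; d < DIM; ++d)
            centers[k * DIM + d] = sum[d] / (double)count;
    }
}
```

Calling `assign_points` and then `update_centers` repeatedly until the labels stop changing gives the usual Lloyd iteration. On a GPU, the within-cluster sums in step 3 could themselves be computed as pairwise sums.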

Markov chain Monte Carlo

- Consider a bladder cancer data set:
  - Available from http://ratecalc.cancer.gov/.
  - Rates of death from bladder cancer of white males from 2000 to 2004 in each county in the USA.

- Let:
  - $y_k$ = number of observed deaths in county $k$.
  - $n_k$ = the number of person-years in county $k$ divided by 100,000.
  - $\theta_k$ = expected number of deaths per 100,000 person-years.

- The model:

  $$
  \begin{aligned}
  y_k &\overset{\text{ind}}{\sim} \text{Poisson}(n_k \cdot \theta_k) \\
  \theta_k &\overset{\text{iid}}{\sim} \text{Gamma}(\alpha, \beta) \\
  \alpha &\sim \text{Uniform}(0, a_0) \\
  \beta &\sim \text{Uniform}(0, b_0)
  \end{aligned}
  $$

- Also assume $\alpha$ and $\beta$ are independent and fix $a_0$ and $b_0$.

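Full conditional distributions

- We want to sample from the joint posterior,

  $$
  \begin{aligned}
  p(\theta, \alpha, \beta \mid y)
  &\propto p(y \mid \theta, \alpha, \beta)\, p(\theta, \alpha, \beta) \\
  &\propto p(y \mid \theta, \alpha, \beta)\, p(\theta \mid \alpha, \beta)\, p(\alpha, \beta) \\
  &\propto p(y \mid \theta, \alpha, \beta)\, p(\theta \mid \alpha, \beta)\, p(\alpha)\, p(\beta) \\
  &\propto \prod_{k=1}^{K} \left[ p(y_k \mid \theta_k, n_k)\, p(\theta_k \mid \alpha, \beta) \right] p(\alpha)\, p(\beta) \\
  &\propto \prod_{k=1}^{K} \left[ e^{-n_k \theta_k} \theta_k^{y_k} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta_k^{\alpha - 1} e^{-\theta_k \beta} \right] I(0 < \alpha < a_0)\, I(0 < \beta < b_0)
  \end{aligned}
  $$

- We iteratively sample from the full conditional distributions.

  $$
  \begin{aligned}
  \alpha &\leftarrow p(\alpha \mid y, \theta, \beta) \\
  \beta &\leftarrow p(\beta \mid y, \theta, \alpha) \\
  \theta_k &\leftarrow p(\theta_k \mid y, \theta_{-k}, \alpha, \beta) \quad \Leftarrow \text{IN PARALLEL!}
  \end{aligned}
  $$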

Full conditional distributions

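$$
\begin{aligned}
p(\theta_k \mid y, \theta_{-k}, \alpha, \beta)
&\propto p(\theta, \alpha, \beta \mid y) \\
&\propto e^{-n_k \theta_k} \theta_k^{y_k} \theta_k^{\alpha - 1} e^{-\theta_k \beta} \\
&= \theta_k^{y_k + \alpha - 1} e^{-\theta_k (n_k + \beta)} \\
&\propto \text{Gamma}(y_k + \alpha,\ n_k + \beta)
\end{aligned}
$$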

Conditional distributions of α and β

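$$
\begin{aligned}
p(\alpha \mid y, \theta, \beta)
&\propto p(\theta, \alpha, \beta \mid y) \\
&\propto \prod_{k=1}^{K} \left[ \theta_k^{\alpha - 1} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \right] I(0 < \alpha < a_0) \\
&= \left( \prod_{k=1}^{K} \theta_k \right)^{\alpha - 1} \beta^{K \alpha}\, \Gamma(\alpha)^{-K}\, I(0 < \alpha < a_0)
\end{aligned}
$$

$$
\begin{aligned}
p(\beta \mid y, \theta, \alpha)
&\propto p(\theta, \alpha, \beta \mid y) \\
&\propto \prod_{k=1}^{K} \left[ e^{-\theta_k \beta} \beta^{\alpha} \right] I(0 < \beta < b_0) \\
&= \beta^{K \alpha} e^{-\beta \sum_{k=1}^{K} \theta_k}\, I(0 < \beta < b_0) \\
&\propto \text{Gamma}\left( K \alpha + 1,\ \sum_{k=1}^{K} \theta_k \right) I(0 < \beta < b_0)
\end{aligned}
$$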

Summarizing the Gibbs sampler

1. Sample $\theta$ from its full conditional.
   - Draw the $\theta_k$'s in parallel from independent Gamma($y_k + \alpha$, $n_k + \beta$) distributions.
   - In other words, assign each thread to draw an individual $\theta_k$ from its Gamma($y_k + \alpha$, $n_k + \beta$) distribution.
2. Sample $\alpha$ from its full conditional using a random walk Metropolis step.
3. Sample $\beta$ from its full conditional (truncated Gamma) using the inverse cdf method if $b_0$ is low or a non-truncated Gamma if $b_0$ is high.


Preview: a bare bones CUDA C workflow

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void some_kernel(...) { ... }

int main(void) {
  // Declare all variables.
  ...
  // Allocate host memory.
  ...
  // Dynamically allocate device memory for GPU results.
  ...
  // Write to host memory.
  ...
  // Copy host memory to device memory.
  ...


  // Execute kernel on the device.
  some_kernel<<<num_blocks, num_threads_per_block>>>(...);

  // Copy GPU results from device memory back to host memory.
  ...
  // Free dynamically-allocated host memory.
  ...
  // Free dynamically-allocated device memory.
  ...
}


Resources

1. J. Sanders and E. Kandrot. CUDA by Example. Addison-Wesley, 2010.

2. Prof. Jarad Niemi’s STAT 544 lecture notes.


That’s all for today.

- Series materials are available at http://will-landau.com/gpu.
