Massively Parallel GPU-friendly Algorithms for PET Szirmay-Kalos László, http://cg.iit.bme.hu , Budapest, University of Technology and Economics

Mar 18, 2018

Page 1

Massively Parallel GPU-friendly Algorithms for PET

Szirmay-Kalos László, http://cg.iit.bme.hu,

Budapest, University of Technology and Economics

Page 2

(GP)GPU: CUDA (OpenCL)

[Figure: GPU architecture. Labels: Multiprocessor 1, 2, …, N; Scalar Processor 1, …, M; Shared memory; Instruction Execution Unit; Cache; Device memory.]

• Massively parallel: #threads > $10^4$
• Independence: synchronization and write collisions should be avoided
• SIMD: conditional statements are not welcome
• Coalesced memory access

Page 3

PET physics

[Figure: positron emission and annihilation. Labels: P, N, e+, e-, Line Of Response (LOR).]

Page 4

Mediso nanoScan PET, AnyScan PET

Page 5

Maximum-likelihood reconstruction

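[Figure: four voxels with activities $x_1$, $x_2$, $x_3$, $x_4$ and their weights $A_{11}$, $A_{12}$, $A_{13}$, $A_{14}$ toward one LOR.]

Expected number of hits:

$\tilde{y} = A \cdot x$, for example $\tilde{y}_1 = A_{11} x_1 + A_{12} x_2 + A_{13} x_3 + A_{14} x_4$

Maximize the probability of the measurement data y:

$x = \arg\max_x \Pr\{\, y \mid \tilde{y}(x) \,\}$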
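A minimal sketch of the expected-hits product and of the probability being maximized. It assumes a toy system of 2 LORs and 4 voxels with made-up matrix entries, and it assumes independent Poisson-distributed hits, a model the slide does not state. The names `forward_project` and `poisson_log_likelihood` are illustrative only.

```python
import math

def forward_project(A, x):
    """Expected hits per LOR: y_tilde[L] = sum over V of A[L][V] * x[V]."""
    return [sum(a_lv * x_v for a_lv, x_v in zip(row, x)) for row in A]

def poisson_log_likelihood(y, y_tilde):
    """log Pr{y | y_tilde} for independent Poisson counts (assumed model)."""
    return sum(y_l * math.log(yt_l) - yt_l - math.lgamma(y_l + 1)
               for y_l, yt_l in zip(y, y_tilde))

# Hypothetical toy system: 2 LORs, 4 voxels, made-up probabilities.
A = [[0.4, 0.3, 0.2, 0.1],
     [0.1, 0.2, 0.3, 0.4]]
y = [6, 9]                                         # made-up measured hits

good = forward_project(A, [10.0, 0.0, 0.0, 20.0])  # expected hits match y
flat = forward_project(A, [1.0, 1.0, 1.0, 1.0])    # expected hits near 1
print(poisson_log_likelihood(y, good) > poisson_log_likelihood(y, flat))
```

The activity whose expected hits reproduce the measurement scores the higher likelihood, which is what the arg max selects.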

Page 6

Iterative solution

[Figure: iteration loop. Labels: Activity in voxels, Physics Simulation, Expected values, Measured values, Correction.]

Page 7

Computational challenges

• Numbers of LORs and voxels: hundreds of millions!
• System matrix A: $10^{16}$ elements (petabytes)
  – Probability that a positron emitted in a voxel is detected by a LOR
  – Patient dependent
  – Not sparse if accurate simulation is needed
  – Do not store, estimate on-the-fly
• Matrix elements are high dimensional integrals
  – Monte Carlo quadrature
  – Reuse of computation
  – High performance (parallel) computational platform
• Minimize the effect of estimation error
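The storage claim can be checked with a few lines of arithmetic. This sketch assumes round orders of magnitude (1e8 LORs, 1e8 voxels) and 4-byte single-precision elements; none of these exact values come from the slide.

```python
def system_matrix_size(n_lors, n_voxels, bytes_per_element=4):
    """Return (number of elements, storage in petabytes) of a dense matrix."""
    elements = n_lors * n_voxels
    return elements, elements * bytes_per_element / 10**15

# Assumed orders of magnitude: 1e8 LORs, 1e8 voxels, 4-byte floats.
elements, petabytes = system_matrix_size(10**8, 10**8)
print(elements, petabytes)
```

Under these assumptions the dense matrix has 10^16 elements and needs tens of petabytes, which is why its elements are estimated on the fly instead of stored.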

Page 8

Numerical integration

[Figure: integrand $f$ and target density $g$ over the interval from 0 to 1, with samples.]

Page 9

Quadrature error

[Figure: plots of $f$, $g$ and $f/g$; panels: importance sampling, stratification.]
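A small sketch of why the ratio f/g matters for the quadrature error: the Monte Carlo estimate averages f(z)/g(z) over samples drawn with density g, and its variance shrinks when g follows f. The integrand 3z² and the density g(z) = 2z are arbitrary choices for illustration, not taken from the slides.

```python
import math
import random

def f(z):
    return 3.0 * z * z            # toy integrand; its integral over [0, 1] is 1

def uniform_ratios(n, rng):
    """f/g samples for uniform sampling, g(z) = 1."""
    return [f(rng.random()) for _ in range(n)]

def importance_ratios(n, rng):
    """f/g samples for g(z) = 2z, drawn by inverting its CDF: z = sqrt(u)."""
    ratios = []
    for _ in range(n):
        z = math.sqrt(1.0 - rng.random())   # 1 - u lies in (0, 1], so z > 0
        ratios.append(f(z) / (2.0 * z))
    return ratios

def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(1)
uni_mean, uni_var = mean_and_variance(uniform_ratios(20000, rng))
imp_mean, imp_var = mean_and_variance(importance_ratios(20000, rng))
print(uni_mean, uni_var)
print(imp_mean, imp_var)
```

Both means estimate the same integral; the flatter ratio f/g of the second strategy gives the smaller sample variance.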

Page 10

Effect of stratification

[Figure labels: undersampling, oversampling.]
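Stratification avoids the clusters and gaps of purely random samples by placing one sample in each equal-width cell. The sketch below assumes jittered sampling on [0, 1] and the same arbitrary toy integrand 3z²; it is an illustration, not the sampling scheme of the talk.

```python
import random

def f(z):
    return 3.0 * z * z            # toy integrand; its integral over [0, 1] is 1

def stratified_estimate(n, rng):
    """One jittered sample per stratum [i/n, (i+1)/n)."""
    return sum(f((i + rng.random()) / n) for i in range(n)) / n

def plain_estimate(n, rng):
    """Independent uniform samples, which may cluster or leave gaps."""
    return sum(f(rng.random()) for _ in range(n)) / n

rng = random.Random(2)
strat_err = abs(stratified_estimate(1000, rng) - 1.0)
plain_err = abs(plain_estimate(1000, rng) - 1.0)
print(strat_err, plain_err)
```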

Page 11

Direct physical simulation

Input driven, scattering type algorithm
• Thread = photon
• Photons scatter a different number of times
• The same detector is hit: write collision
• Random memory access
• Cannot mimic the detectors

[Figure labels: source, detectors.]

Page 12

GPU friendly approach

Output driven, gathering type
• Thread = importon
• SIMD: grouping importons
• No write collision: LOR-driven
• Cannot mimic the source

[Figure labels: source, detectors.]
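The difference between the scattering (input driven) and gathering (output driven) organization can be sketched on a toy list of photon hits. This is a sequential illustration with made-up data: in the talk's approach the gathering threads trace importons from the LOR instead of filtering a photon list, so only the memory-write pattern is shown here.

```python
def scatter_hits(photons, n_detectors):
    """Input driven: loop over photons, each WRITES to the detector it hits.
    Run in parallel, two photons hitting one detector would collide."""
    hits = [0.0] * n_detectors
    for detector, weight in photons:
        hits[detector] += weight          # would need an atomic add on a GPU
    return hits

def gather_hits(photons, n_detectors):
    """Output driven: loop over detectors, each READS what contributes to it
    and writes only its own element, so no write collision can occur."""
    return [sum(w for d, w in photons if d == detector)
            for detector in range(n_detectors)]

# Made-up (detector index, weight) pairs.
photons = [(0, 1.0), (2, 0.5), (0, 0.25), (1, 2.0), (2, 0.5)]
print(scatter_hits(photons, 3))
print(gather_hits(photons, 3))
```

Both functions produce the same totals; only the gathering form gives each output element a single writer.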

Page 13

Multiple Importance Sampling

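[Figure: integrand $f(x)$ with two sampling densities $g_a(x)$ and $g_b(x)$.]

Sample contributions:
• with $g_a$ alone: $\frac{f(x_i)}{g_a(x_i)}$
• with $g_b$ alone: $\frac{f(x_i)}{g_b(x_i)}$
• combined: $\frac{f(x_i)}{g_a(x_i) + g_b(x_i)}$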
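A hedged sketch of the combined weight f(x_i)/(g_a(x_i) + g_b(x_i)). It assumes that g_a and g_b denote sample densities, that is, the probability density of each strategy multiplied by its number of samples; the two strategies (pdf 2z and uniform) and the integrand are arbitrary illustrations.

```python
import math
import random

def f(z):
    return 3.0 * z * z                      # toy integrand, integral is 1

def mis_estimate(n_a, n_b, rng):
    """Combine two sampling strategies; g_a and g_b are sample densities
    (probability density times number of samples of that strategy)."""
    def g_a(z):
        return n_a * 2.0 * z                # strategy a: pdf 2z
    def g_b(z):
        return n_b * 1.0                    # strategy b: uniform pdf
    samples = [math.sqrt(1.0 - rng.random()) for _ in range(n_a)]
    samples += [rng.random() for _ in range(n_b)]
    # Every sample, whichever strategy produced it, gets f / (g_a + g_b).
    return sum(f(z) / (g_a(z) + g_b(z)) for z in samples)

estimate = mis_estimate(10000, 10000, random.Random(3))
print(estimate)
```

Because the denominator is the total density of samples at a point, the sum stays unbiased even where one of the strategies alone would sample poorly.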

Page 14

Direct gamma photon contribution

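[Figure labels: Detector module 1, Detector module 2, voxel, LOR, $l$.]

[Equation: an integral of the activity $x(v)$ with two differentials; the remaining symbols did not survive extraction.]

5D Integration
• Accuracy for a given number of samples
• Cost of a sample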

Page 15

Output- or LOR-driven sampling

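[Figure labels: Detector module 1, Detector module 2, LOR, $l$, $u$, $w$.]

[Equation: sampling density $g_1$ of LOR-driven sampling, written with $N_{\text{ray}}$, $D$ and $G(u, w)$; the full expression did not survive extraction.]

Pros:
• Gathering
• Thread coherence
• Texture coherence
• Uniform on detectors
• Low-cost samples due to reuse

Cons:
• Cannot mimic activity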

Page 16

Input or voxel-driven sampling

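[Figure labels: Detector module 1, Detector module 2, LOR, $l$, $u$.]

Pros:
• Can mimic activity

Cons:
• Write collisions
• Less coherence

[Equation: sampling density $g_2$ of voxel-driven sampling, written with $N_{\text{voxel}}$, the activity $x$, $D$ and a cosine term; the full expression did not survive extraction.]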

Page 17

Multiple Importance Sampling

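[Figure: two voxel illustrations joined by "+".]

The LOR-driven density $g_1$ (written with $N_{\text{ray}}$, $D$, $G(u, w)$) and the voxel-driven density $g_2$ (written with $N_{\text{voxel}}$, the activity $x$, $D$, a cosine term) are combined:

$g = g_1 + g_2$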

Page 18

Multiple Importance Sampling

Page 19

Input-driven scattered photon transport

• Monte Carlo simulation:
  – Free path
  – Absorption?
  – Scattering direction

[Figure labels: source, absorption, voxel fetches, scattering.]

Page 20

Free Path Sampling

[Figure labels: $u$, $s$.]

For special cross section functions $\sigma(t)$, it can be solved analytically.
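The simplest analytically solvable case is a constant cross section. The sketch below assumes the standard inversion of the free-path distribution, 1 - exp(-sigma * s) = u with u uniform in [0, 1), which gives s = -ln(1 - u) / sigma; the slide's own equation did not survive extraction, so this form is an assumption.

```python
import math
import random

def sample_free_path(sigma, u):
    """Solve 1 - exp(-sigma * s) = u for s (constant cross section sigma)."""
    return -math.log(1.0 - u) / sigma

rng = random.Random(4)
sigma = 2.0                                   # made-up cross section, 1/length
paths = [sample_free_path(sigma, rng.random()) for _ in range(20000)]
mean_path = sum(paths) / len(paths)
print(mean_path)
```

The sample mean approaches the mean free path 1/sigma, a quick sanity check for the inversion.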

Page 21

Ray marching

• Complexity grows with the resolution
• Slow in high-resolution, low-density media

$1 - \exp\left(-\sum_i \sigma_i \Delta s\right) = u$
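A minimal ray-marching sketch for a medium that is not analytically solvable. It assumes a ray that crosses voxels of equal length `ds` with cross sections `sigmas[i]`, accumulates the optical depth step by step, and stops where it reaches -ln(1 - u). The function name and the interpolation inside the last voxel are illustrative choices.

```python
import math

def ray_march_free_path(sigmas, ds, u):
    """Walk along the ray voxel by voxel until the accumulated optical depth
    sum(sigma_i * ds) reaches -ln(1 - u); return the distance, or None if the
    photon leaves the volume first."""
    target = -math.log(1.0 - u)
    tau = 0.0
    for i, sigma in enumerate(sigmas):
        step = sigma * ds
        if step > 0.0 and tau + step >= target:
            # Interpolate inside the voxel where the threshold is crossed.
            return i * ds + (target - tau) / sigma
        tau += step
    return None

dense = ray_march_free_path([2.0] * 1000, 0.01, 0.5)
empty = ray_march_free_path([0.0] * 1000, 0.01, 0.5)
print(dense, empty)
```

The loop visits every voxel along the ray, so the cost grows with the resolution, and in an empty medium all voxels are fetched without ever finding a collision.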

Page 22

Mix virtual particles to obtain a density that can be solved analytically

[Figure labels: photon, virtual collision, real collision, virtual particle, real material particle.]

$4096^3$ effective resolution
64 billion sample points
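Mixing in virtual particles is commonly known as Woodcock (delta) tracking. The sketch below assumes a 1D ray, a user-supplied function `sigma_at(s)` for the real cross section, and a constant upper bound `sigma_max`; these names are hypothetical, and the talk's actual implementation may differ.

```python
import math
import random

def woodcock_free_path(sigma_at, sigma_max, max_dist, rng):
    """Free path in a heterogeneous medium via virtual collisions.
    sigma_at(s) is the real cross section at distance s, sigma_max an upper
    bound on it; returns None if the photon travels beyond max_dist."""
    s = 0.0
    while True:
        # Analytic free path in the homogeneous "real + virtual" medium.
        s += -math.log(1.0 - rng.random()) / sigma_max
        if s > max_dist:
            return None
        # Real collision with probability sigma(s) / sigma_max, else virtual.
        if rng.random() * sigma_max < sigma_at(s):
            return s

rng = random.Random(5)
paths = [woodcock_free_path(lambda s: 1.0, 4.0, 1e9, rng) for _ in range(20000)]
mean_path = sum(paths) / len(paths)
print(mean_path)
```

Each step fetches the medium only at the tentative collision point, so the cost depends on the density bound and not on the voxel resolution.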

Page 23

Output-driven single-scattered photon transport with reuse

1. Scattering points
2. Ray marching between scattering points and detectors
3. Combination

[Figure labels: $s_1$, $s_2$, $z_1$, $z_2$.]

Page 24

Multiple Importance Sampling

• LOR-driven unscattered contribution: 0 scattering
• Voxel-driven unscattered contribution: 0 scattering
• Photon transport: 0, 1, 2, … scattering
• LOR-driven single scatter: 1 scattering

All combined with MIS.

Page 25

Detector response

[Figure labels: photon, scattering, absorption, Electronics, hits.]

Problem: The domain is 4 dimensional.

Page 26

Quasi-Monte Carlo filtering

$\int X_L(x)\, w(x)\, \mathrm{d}x = \int_0^1 X_L(x(t))\, \mathrm{d}t$

[Figure labels: $L$, $w$, $L$, =.]

Page 27

Detector Scattering Compensation

[Figure panels: without, with.]

Page 28

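[Figure: iteration loop. Labels: voxels, Physics Simulation, Expected values, Measured values, Ratio of measured and computed values, Correction, Back projection; annotations: Monte Carlo simulation or simplification, Random or deterministic estimation.]

Physics simulation (expected values):

$\tilde{y}^{(n)} = A \cdot x^{(n)}$

Ratio of measured and computed values: $y / \tilde{y}^{(n)}$

Back projection (correction):

$\frac{x^{(n+1)}}{x^{(n)}} = \frac{A^T \cdot \frac{y}{\tilde{y}^{(n)}}}{A^T \cdot 1}$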
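One iteration of this loop can be sketched as follows, assuming a tiny dense system matrix with made-up entries and noiseless data; in the talk the matrix is never stored and the products are estimated by simulation. All function names are illustrative.

```python
import math

def forward(A, x):
    """Forward projection: y_tilde = A . x"""
    return [sum(a * xv for a, xv in zip(row, x)) for row in A]

def back(A, r):
    """Back projection: A^T . r"""
    return [sum(A[l][v] * r[l] for l in range(len(A))) for v in range(len(A[0]))]

def mlem_step(A, x, y):
    """x_new = x * (A^T . (y / y_tilde)) / (A^T . 1), element-wise."""
    y_tilde = forward(A, x)
    ratio = [yl / yt for yl, yt in zip(y, y_tilde)]
    num = back(A, ratio)
    sens = back(A, [1.0] * len(A))
    return [xv * n / s for xv, n, s in zip(x, num, sens)]

def kl(y, y_tilde):
    """Distance between measured and expected hits (zero when they agree)."""
    return sum(yl * math.log(yl / yt) - yl + yt for yl, yt in zip(y, y_tilde))

# Hypothetical toy system matrix and noiseless data y = A . x_true.
A = [[0.6, 0.2, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.2, 0.6]]
y = forward(A, [5.0, 1.0, 3.0])

x = [1.0, 1.0, 1.0]
start = kl(y, forward(A, x))
for _ in range(50):
    x = mlem_step(A, x, y)
end = kl(y, forward(A, x))
print(x, start, end)
```

The multiplicative update keeps the activities positive and moves the expected values toward the measured ones at every iteration.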

Page 29

2D study

[Figure label: Measured sinogram.]

Page 30

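Backprojection with unbiased forward projection

$\frac{x^{(n+1)}}{x^{(n)}} = \frac{A^T \cdot \frac{y}{\tilde{y}^{(n)}}}{A^T \cdot 1}$

[Figure labels: Contribution to $x_V$, Samples, outlier, $\tilde{y}_L$, $y_L$, $y_L / \hat{y}_L$, $y_L / \tilde{y}_L$.]

Ratio of measured and computed values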

Page 31

Reduce bias and outliers

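• Averaging iteration: $\tilde{y}_L^{(n)} = (1 - \alpha_n)\, \tilde{y}_L^{(n-1)} + \alpha_n\, \hat{y}_L$ [the weight symbol did not survive extraction; $\alpha_n$ is a stand-in]
• Metropolis iteration: ignore outliers randomly. Acceptance with probability $a_L = \min\{\hat{y}_L / \tilde{y}_L^{(n)},\ 1\}$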
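Both remedies can be sketched in a few lines. The averaging weight 1/n used here is an assumption (it turns the blend into a running mean); the slide's own weight is not recoverable. The acceptance rule follows the min{ŷ/ỹ, 1} form, read here as rejecting unusually low forward-projection estimates, which would otherwise produce large ratios in the back projection.

```python
import random

def averaged_estimate(previous, sample, n):
    """y_tilde^(n) = (1 - a_n) * y_tilde^(n-1) + a_n * y_hat, with the assumed
    weight a_n = 1/n, which makes y_tilde the running mean of the samples."""
    a_n = 1.0 / n
    return (1.0 - a_n) * previous + a_n * sample

def acceptance_probability(y_hat, y_tilde):
    """a_L = min(y_hat / y_tilde, 1): low estimates are likely rejected."""
    return min(y_hat / y_tilde, 1.0)

def accept(y_hat, y_tilde, rng):
    """Randomly keep or ignore the new estimate y_hat."""
    return rng.random() < acceptance_probability(y_hat, y_tilde)

samples = [4.0, 6.0, 5.0, 1.0]         # made-up forward-projection estimates
y_tilde = 0.0
for n, y_hat in enumerate(samples, start=1):
    y_tilde = averaged_estimate(y_tilde, y_hat, n)
print(y_tilde, acceptance_probability(1.0, y_tilde))
print(accept(1.0, y_tilde, random.Random(6)))
```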

Page 32

Recons

$338^2 \times 845$ resolution, 0.1 mm voxels

$432^2 \times 654$ resolution, 1.3 mm voxels

Page 33

Conclusions

• GPU is an effective tool for computing tens of thousands of parallel threads that have no conditionals or write collisions.
• The problem must be formulated and solved with this requirement in mind.
• Randomization (Monte Carlo) can help structure the problem in this way.