Page 1: Computation of Mutual Information Metric for Image Registration on Multiple GPUs

Member of the Helmholtz Association

Andrew V. Adinetz¹, Markus Axer², Marcel Huysegoms², Stefan Köhnen², Jiri Kraus³, Dirk Pleiter¹

26.08.2013

¹ JSC, Forschungszentrum Jülich
² INM-1, Forschungszentrum Jülich
³ NVIDIA GmbH

Page 2: Outline

• Brain Image Registration
• Multi-GPU Implementation
  • system memory
  • listupdate
• Performance Evaluation
• Conclusion

Page 3: Preparation of the brain

Page 4: Pushing the limits for a cellular brain model

BigBrain: first high-resolution brain model at microscopic scale

• 7404 histological sections stained for cell bodies
• scanned with a flatbed scanner
• original resolution 10 × 10 × 20 μm³ (11,000 × 13,000 pixels)
• downscaling to 20 μm isotropic
• removal of artifacts
• 1 Terabyte

in cooperation with Alan Evans, McGill, Montreal

Amunts et al. (2013) Science

Page 5: (no text on slide)

Page 6: (no text on slide)

Page 7: (no text on slide)

Page 8: Image Registration

• The process of aligning images is called registration

[Figure: ITK Workflow]

Page 9: Mutual Information Metric

$$\mathrm{MI}(I_f, I_m) = \sum_{i,j} p(i,j) \, \log_2 \frac{p(i,j)}{p_f(i) \, p_m(j)}$$

$$p_f(i) = \sum_{j} p(i,j), \qquad p_m(j) = \sum_{i} p(i,j)$$

• i, j: pixel values (0 .. 255)
• successful for multi-modal registration
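
To make the metric concrete, the following is a minimal C++ sketch that evaluates MI from an already accumulated joint histogram. The function name `mutual_information`, the row-major histogram layout, and the convention that an empty histogram yields 0 are assumptions for illustration and are not taken from the slides.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mutual information in bits from an nbins x nbins joint histogram.
// Assumed layout: hist[i * nbins + j] counts pixel pairs whose fixed-image
// value falls in bin i and whose moving-image value falls in bin j.
double mutual_information(const std::vector<std::uint64_t>& hist, std::size_t nbins) {
    double total = 0.0;
    for (std::uint64_t c : hist) total += static_cast<double>(c);
    if (total == 0.0) return 0.0;  // assumption: empty histogram gives MI = 0

    // Marginals: p_f(i) = sum_j p(i,j) and p_m(j) = sum_i p(i,j).
    std::vector<double> pf(nbins, 0.0), pm(nbins, 0.0);
    for (std::size_t i = 0; i < nbins; ++i)
        for (std::size_t j = 0; j < nbins; ++j) {
            double p = hist[i * nbins + j] / total;
            pf[i] += p;
            pm[j] += p;
        }

    // MI = sum_{i,j} p(i,j) * log2( p(i,j) / (p_f(i) * p_m(j)) ).
    // Empty cells contribute nothing and are skipped to avoid log2(0).
    double mi = 0.0;
    for (std::size_t i = 0; i < nbins; ++i)
        for (std::size_t j = 0; j < nbins; ++j) {
            double p = hist[i * nbins + j] / total;
            if (p > 0.0) mi += p * std::log2(p / (pf[i] * pm[j]));
        }
    return mi;
}
```

With two bins, a purely diagonal histogram (the images determine each other) gives 1 bit, while a uniform histogram (independent images) gives 0.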

Page 10: Two Image Cross-Histogram

• main computational kernel
• transform can be complex (1000+ parameters)
• GPU implementation: 1 pixel/thread, atomics

    // Accumulate the joint histogram over all pixels of the fixed image.
    for (int y = 0; y < fixed_sz_y; y++)
        for (int x = 0; x < fixed_sz_x; x++) {
            int i = bin(fixed[y][x]);                  // bin of the fixed-image pixel
            float x1 = transform_x(x, y);              // position in the moving image
            float y1 = transform_y(x, y);
            int j = bin(interpolate(moving, x1, y1));  // bin of the interpolated value
            histogram[i][j]++;                         // atomic on GPU
        }
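
The loop above can be sketched as a self-contained CPU version. Everything specific here is an assumption for illustration: the `Image` struct, bilinear interpolation, a rigid rotation about the image centre as the transform (the slides only say the transform can be complex), 256 bins, and skipping pixels that map outside the moving image.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image, row-major.
struct Image {
    int width, height;
    std::vector<std::uint8_t> pixels;
    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Bilinear interpolation; returns false if (x, y) lies outside the image.
bool interpolate(const Image& img, float x, float y, float& value) {
    if (x < 0.0f || y < 0.0f || x > img.width - 1 || y > img.height - 1) return false;
    int x0 = static_cast<int>(x), y0 = static_cast<int>(y);
    int x1 = x0 + 1 < img.width ? x0 + 1 : x0;    // clamp at the right edge
    int y1 = y0 + 1 < img.height ? y0 + 1 : y0;   // clamp at the bottom edge
    float fx = x - x0, fy = y - y0;
    float top = img.at(x0, y0) * (1 - fx) + img.at(x1, y0) * fx;
    float bot = img.at(x0, y1) * (1 - fx) + img.at(x1, y1) * fx;
    value = top * (1 - fy) + bot * fy;
    return true;
}

// 256 x 256 cross-histogram; the moving image is sampled at the fixed-image
// coordinates rotated by `angle` radians about the fixed-image centre.
std::vector<std::uint64_t> cross_histogram(const Image& fixed, const Image& moving,
                                           float angle) {
    std::vector<std::uint64_t> hist(256 * 256, 0);
    float c = std::cos(angle), s = std::sin(angle);
    float cx = 0.5f * (fixed.width - 1), cy = 0.5f * (fixed.height - 1);
    for (int y = 0; y < fixed.height; ++y)
        for (int x = 0; x < fixed.width; ++x) {
            float x1 = c * (x - cx) - s * (y - cy) + cx;
            float y1 = s * (x - cx) + c * (y - cy) + cy;
            float v;
            if (!interpolate(moving, x1, y1, v)) continue;  // mapped outside: skip
            std::size_t i = fixed.at(x, y);
            std::size_t j = static_cast<std::size_t>(v + 0.5f);  // round to nearest bin
            ++hist[i * 256 + j];  // this increment is the atomic add on a GPU
        }
    return hist;
}
```

Registering an image against itself with a zero angle puts every count on the diagonal of the histogram, which is a quick sanity check for any implementation of this kernel.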

Page 11: Large Data Size

Large-area Polarimeter
• size: 3,000 × 3,000 px
• pixel size: 60 × 60 μm
• file size: 30 MB

Polarizing Microscope
• size: 100,000 × 100,000 px
• pixel size: 1.6 × 1.6 μm
• file size: 40 GB

Page 12: Multi-GPU Mutual Information

• Domain decomposition
  • distribute fixed and moving images
  • histogram contributions summed up
• Moving image: how to handle?
  • irregular access pattern
• Approaches
  • System memory replication (sysmem)
  • Listupdate (listupdate)
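
The domain decomposition can be sketched in two small helpers. The slides do not say how the images are split, so the horizontal row strips below are an assumption, as are the names `row_range` and `sum_histograms`.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open row range [begin, end) of the fixed image owned by `rank` when
// `height` rows are split as evenly as possible over `nranks` GPUs.
// Assumption: the decomposition is by horizontal strips.
std::pair<int, int> row_range(int rank, int nranks, int height) {
    int base = height / nranks;
    int extra = height % nranks;  // the first `extra` ranks get one more row
    int begin = rank * base + (rank < extra ? rank : extra);
    int end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}

// Element-wise sum of the per-GPU partial histograms. Each GPU only sees its
// own fixed-image pixels, so the full cross-histogram is the sum of the parts.
std::vector<std::uint64_t> sum_histograms(
        const std::vector<std::vector<std::uint64_t>>& partial) {
    std::vector<std::uint64_t> total(partial.empty() ? 0 : partial[0].size(), 0);
    for (const auto& h : partial)
        for (std::size_t k = 0; k < total.size(); ++k) total[k] += h[k];
    return total;
}
```

The fixed image splits cleanly because each thread reads exactly one of its pixels. The moving image does not: the transform can send a fixed-image pixel to any moving-image location, which is the irregular access pattern the two approaches address.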

Page 13: System Memory Replication

• Replicate entire moving image in pinned host RAM
  • accessible to GPU
+ easy to implement
– system memory accesses are slower
– cannot use texture interpolation
• Optimizations
  • moving image halo in GPU RAM
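
The halo optimization amounts to a three-way decision for every moving-image access. The sketch below shows only that decision, under assumptions not found in the slides: a row-strip decomposition, a halo given as a number of extra rows on each side, and the hypothetical names `Strip`, `Source` and `classify_access`.

```cpp
#include <algorithm>

// Where a moving-image sample is read from (names are hypothetical).
enum class Source { LocalTile, LocalHalo, HostReplica };

// Rows [row_begin, row_end) of the moving image that this GPU owns.
struct Strip { int row_begin; int row_end; };

// Classify an access at row coordinate y. Rows inside the own strip and
// inside the halo are in GPU RAM; everything else falls back to the full
// copy of the moving image in pinned host memory, which is slower to read.
// A real kernel would also have to account for the neighbouring row that
// bilinear interpolation touches; that is ignored here.
Source classify_access(const Strip& own, int halo_rows, int image_height, float y) {
    if (y >= own.row_begin && y < own.row_end) return Source::LocalTile;
    int lo = std::max(0, own.row_begin - halo_rows);
    int hi = std::min(image_height, own.row_end + halo_rows);
    if (y >= lo && y < hi) return Source::LocalHalo;
    return Source::HostReplica;
}
```

A larger halo moves more accesses from `HostReplica` to `LocalHalo` at the cost of GPU memory. How the halo percentages used in the benchmarks map to rows is not stated in the slides.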

Page 14: Listupdate

• Processing
  • buffer remote accesses
  • exchange buffers
  • compute contributions remotely
+ computation-communication overlap
– hard to implement
– requires chunk processing (otherwise the messages won't fit into the buffer)
• Optimizations
  • buffers: AoS vs. SoA, atomics vs. grouping
  • using multiple streams

    typedef struct {
        float movingCoords[2];   /* coordinates in the moving image */
        short destRank;          /* destination rank */
        unsigned char fixedBin;  /* fixed-image bin (0 .. 255) */
    } message_t;
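
The AoS vs. SoA choice and the remote processing step can be sketched as follows. `MessageAoS` mirrors the `message_t` record from the slide; `MessageBufferSoA`, `handle_messages` and the `sample_bin` callback are hypothetical names, and the receiver is assumed to own the moving-image region that each message points into.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Array-of-structures: one record per remote access, as in message_t.
struct MessageAoS {
    float movingCoords[2];
    std::int16_t destRank;
    std::uint8_t fixedBin;
};

// Structure-of-arrays: each field lives in its own contiguous array, so
// threads writing consecutive messages touch consecutive addresses.
struct MessageBufferSoA {
    std::vector<float> x, y;
    std::vector<std::int16_t> destRank;
    std::vector<std::uint8_t> fixedBin;

    void push(const MessageAoS& m) {
        x.push_back(m.movingCoords[0]);
        y.push_back(m.movingCoords[1]);
        destRank.push_back(m.destRank);
        fixedBin.push_back(m.fixedBin);
    }
    std::size_t size() const { return x.size(); }
};

// Receiver side: for every message, sample the locally held moving image at
// the requested coordinates and add the contribution to the local histogram.
// `sample_bin(x, y)` stands in for interpolation plus binning and must
// return a bin in 0..255.
template <class Sampler>
void handle_messages(const MessageBufferSoA& buf, Sampler sample_bin,
                     std::vector<std::uint64_t>& hist) {
    for (std::size_t k = 0; k < buf.size(); ++k) {
        std::size_t j = sample_bin(buf.x[k], buf.y[k]);
        ++hist[static_cast<std::size_t>(buf.fixedBin[k]) * 256 + j];
    }
}
```

Because the sender ships the fixed-image bin along with the coordinates, the receiver can complete the histogram update on its own, and the partial histograms are summed at the end as in the domain decomposition.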

Page 15: Chunk Processing and Overlap

[Figure: timeline in which the stage sequence Process chunk, Group, Exchange, Handle messages is repeated for successive chunks; rows labelled 1 and 2. Inset: fixed image with axes x, y and origin (0,0).]

Page 16: Buffer Writeout: Atomics vs. Group

• atomics
  • each writing thread increments atomic counter
+ simpler
– atomics can be a bottleneck
– one buffer per receiver required
• grouping
  • each thread writes to fixed location
  • buffers grouped before sending
+ single buffer, less memory
+ optimized grouping (shared-memory atomics, prefix sum)
– more complicated (separate kernel required)
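
Both writeout strategies can be sketched on the CPU. This is an illustration under assumptions, not the authors' kernels: messages are reduced to a single `int` payload, `std::atomic` stands in for GPU atomics, and the grouping pass is a sequential counting sort where a GPU version would use a parallel prefix sum. All type and function names are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Variant 1 (atomics): one fixed-capacity buffer and one atomic counter per
// receiver. Each writer reserves a slot by incrementing the counter.
struct PerReceiverBuffers {
    std::vector<std::vector<int>> slots;
    std::vector<std::atomic<std::size_t>> count;

    PerReceiverBuffers(std::size_t nreceivers, std::size_t capacity)
        : slots(nreceivers, std::vector<int>(capacity)), count(nreceivers) {
        for (auto& c : count) c.store(0);
    }
    // Returns false if the receiver's buffer is already full.
    bool write(std::size_t receiver, int payload) {
        std::size_t pos = count[receiver].fetch_add(1);
        if (pos >= slots[receiver].size()) return false;
        slots[receiver][pos] = payload;
        return true;
    }
};

// Variant 2 (grouping): every thread k writes to its own fixed slot k
// (dest[k] < 0 means "no message"); a separate pass then groups the single
// buffer by receiver so each receiver's messages are contiguous.
struct Grouped {
    std::vector<int> payload;         // messages ordered by receiver
    std::vector<std::size_t> offset;  // receiver r owns [offset[r], offset[r+1])
};

Grouped group_by_receiver(const std::vector<int>& dest,
                          const std::vector<int>& payload,
                          std::size_t nreceivers) {
    Grouped g;
    g.offset.assign(nreceivers + 1, 0);
    for (int d : dest)
        if (d >= 0) ++g.offset[static_cast<std::size_t>(d) + 1];  // count per receiver
    for (std::size_t r = 0; r < nreceivers; ++r)
        g.offset[r + 1] += g.offset[r];                            // prefix sum
    g.payload.resize(g.offset[nreceivers]);
    std::vector<std::size_t> cursor(g.offset.begin(), g.offset.end() - 1);
    for (std::size_t k = 0; k < dest.size(); ++k)
        if (dest[k] >= 0) g.payload[cursor[dest[k]]++] = payload[k];  // scatter
    return g;
}
```

The first variant needs worst-case capacity for every receiver, which is where the extra memory goes. The second keeps one buffer but pays for the additional grouping pass.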

Page 17: Benchmark setup

[Figure: fixed image and moving image with axes x, y and origin (0,0). Labels: Remote access, Mask.]

Page 18: Test Hardware

• JUDGE
  • 256-node GPU cluster
  • Each M2070 node:
    • 2x M2070 (Fermi) GPU, each 6 GB RAM
    • 12-core X5650 CPU @ 2.67 GHz, 96 GB RAM
• JuHydra
  • single-node Kepler machine
    • 2x K20X (Kepler) GPU, each 6 GB RAM
    • 16-core E5-2650 CPU @ 2 GHz, 64 GB RAM

Page 19: Baseline: Full Replication (M2070)

[Plot: runtime in seconds (0 to 1) vs. rotation angle (0 to 180 degrees, in steps of 9). Series: 1 GPU, 2 GPUs, 4 GPUs. Annotation: ideal scalability.]

Page 20: Sysmem on Fermi

[Plot: runtime in seconds (0 to 1.2) vs. rotation angle (0 to 180 degrees, in steps of 9). Series: 1 GPU, 2 GPUs baseline, 2 GPUs.]

Page 21: Sysmem on Fermi: Explanation

[Figure with four annotated cases:]
• No sysmem access, good coalescing
• Few sysmem accesses, bad coalescing
• Many sysmem accesses, bad coalescing
• Most sysmem access, good coalescing

Page 22: Sysmem on Fermi: PCI-E Queries

[Plot: runtime in seconds (left axis, 0 to 1.2) and Sysmem_queries (right axis, 0 to 120,000,000) vs. rotation angle (0 to 180 degrees, in steps of 9). Series: 2 GPUs baseline, 2 GPUs, total Sysmem_queries.]

Page 23: Sysmem: Halo Sizes

[Plot: time in seconds (0 to 0.8) vs. angle in degrees (0 to 180, in steps of 18). Series, all on 2 K20X: baseline, sysmem, 5% halo, 10% halo, 15% halo, 20% halo, 25% halo.]

mostly quantitative, not qualitative difference

Page 24: Listupdate: Multiple Streams

[Plot: time in seconds (0 to 0.9) vs. angle in degrees (0 to 180, in steps of 18). Series, all on 2 K20X: 1 stream, 2 streams, 3 streams, 4 streams.]

4 streams look the best

Page 25: Listupdate: AoS vs SoA, Atomics vs Group

[Plot: time in seconds (0 to 1.2) vs. angle in degrees (0 to 180, in steps of 18). Series, all on 2 K20X: SoA, AoS, compress.]

SoA + atomics looks best

Page 26: Sysmem vs. Listupdate: Fermi

[Plot: time in seconds (0 to 2.5) vs. angle in degrees (0 to 180, in steps of 18). Series, all on 4 M2070: SoA, baseline, sysmem, 25% halo.]

on Fermi, sysmem is better

Page 27: Sysmem vs. Listupdate: Kepler (Closeup)

[Plot: time in seconds (0 to 0.8) vs. angle in degrees (0 to 180, in steps of 18). Series, all on 2 K20X: SoA, baseline, sysmem, 25% halo.]

on Kepler, listupdate is better

Page 28: Conclusions

• Fermi
  • performance limited by atomics
  • system memory replication is better
• Kepler
  • order of magnitude faster than Fermi
  • no longer dominated by atomics
  • listupdate (atomic, SoA, 4 streams) is better
• Future work
  • Compression
  • Trials on real images

Page 29: Questions?

• INM-1 at FZJ: http://www.fz-juelich.de/inm/inm-1/EN/Home/home_node.html
• NVIDIA Application Lab at FZJ: http://www.fz-juelich.de/ias/jsc/nvlab
• Andrew V. Adinetz: [email protected]
• Jiri Kraus: [email protected]
• Dirk Pleiter: [email protected]