Efficient Parallelization of MATLAB Stencil Applications for Multi-Core Clusters
Johannes Spazier, Steffen Christgau, Bettina Schnor
University of Potsdam, Germany
WOLFHPC 2016, Salt Lake City, USA
November 13, 2016
Johannes Spazier (University Potsdam) Parallelization of MATLAB codes for Multi-Core Clusters WOLFHPC 16, Nov 13 1/29
Outline
1 Introduction
2 Message Passing Interface
3 Hybrid Programming
4 Conclusion
Introduction
MATLAB
established as a high-level language for scientific computing
significantly reduced implementation effort
enables fast prototyping of mathematical models
well-suited for stencil applications
drawback: slow execution through interpreter
no out-of-the-box parallelization
⇒ insufficient performance for large data sets
StencilPaC Overview
MATLAB to parallel C compiler
[Figure: compilation pipeline: MATLAB source → StencilPaC compiler → generated C code → C compiler (gcc) → final executable; further inputs shown: command-line options (CMD), MPI headers, MPI libraries]
automatic parallelization for matrix operations
$B(X, Y) = M_1(X_1, Y_1) \circ \dots \circ M_n(X_n, Y_n)$
support different architectures
  - shared and distributed memory systems, accelerators
build on common programming APIs
  - OpenMP, MPI and OpenACC
[Figure: matrices B, M1, ..., Mn with the selected index ranges (X, Y), (X1, Y1), ..., (Xn, Yn) marked]
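As a minimal sketch of the statement form above, assuming that ∘ stands for an element-wise binary operator and that all index ranges have the same shape, the following Python function evaluates such a statement sequentially. The name `combine_ranges` and the representation of a range by its upper-left corner are illustrative choices for this sketch, not part of StencilPaC.

```python
import operator
from functools import reduce


def combine_ranges(op, operands, shape):
    """Evaluate M1(X1,Y1) op ... op Mn(Xn,Yn) element-wise.

    operands: list of (matrix, first_row, first_col) tuples, zero-based.
    shape:    (rows, cols) shared by all index ranges (an assumption).
    Returns the resulting block as a new list of lists.
    """
    rows, cols = shape
    return [
        [reduce(op, (m[r0 + i][c0 + j] for m, r0, c0 in operands))
         for j in range(cols)]
        for i in range(rows)
    ]


M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Corresponds to the MATLAB-style statement B = M(1:2, 1:2) + M(2:3, 2:3).
B = combine_ranges(operator.add, [(M, 0, 0), (M, 1, 1)], (2, 2))
```

Shifted ranges of the same matrix, as in the usage line, are how stencil accesses typically appear in this notation.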
Applications
two grid-based stencil applications
domain update over multiple iterations
manual reference implementations in C/C++
EasyWave
tsunami simulation developed at the German Research Center for Geosciences
access pattern: 5-point stencil
Cellular Automaton
idealized model for biological systems
access pattern: 9-point stencil (Moore neighborhood)
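The two access patterns can be illustrated with a small Python sketch. The update rule used here, a plain sum over the stencil points, is only a placeholder: it is neither the EasyWave equations nor the rule of the cellular automaton, and the names `FIVE_POINT`, `NINE_POINT` and `apply_stencil` are hypothetical.

```python
# Offsets (row, col) relative to the updated cell.
FIVE_POINT = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
# Moore neighborhood (8 neighbors) plus the cell itself.
NINE_POINT = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def apply_stencil(grid, offsets):
    """Return a new grid in which every interior cell holds the sum of the
    old values at the given offsets; boundary cells are copied unchanged."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = sum(grid[i + di][j + dj] for di, dj in offsets)
    return new


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

five = apply_stencil(grid, FIVE_POINT)
nine = apply_stencil(grid, NINE_POINT)
```

A domain update over multiple iterations would call `apply_stencil` repeatedly, feeding each result into the next call.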
StencilPaC Overview
generated C code is much faster than MATLAB for both applications
for the memory-bound tsunami simulation EasyWave, improvements over MATLAB of more than
  - 7 times with sequential code
  - 21 times on an 8-core shared memory system
  - 187 times with an NVIDIA Tesla K40m
even better results for the Cellular Automaton
distributed systems are most challenging
  - automatic partitioning of matrices
  - generic handling of communication between processes
  - partial computation
small runtime overhead is essential
focus on MPI one-sided API in previous work
today: concepts of and comparison with
  - two-sided communication
  - hybrid programming
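One way to picture "partial computation", assuming matrices are distributed by columns and ranges are half-open and zero-based (both assumptions of this sketch): each process restricts the column range written by a statement to the columns it owns. The function name `local_part` is hypothetical.

```python
def local_part(stmt_cols, owned_cols):
    """Intersect a statement's global column range with the columns a
    process owns.

    Both arguments are half-open (start, stop) ranges. Returns None if the
    process has nothing to compute for this statement.
    """
    start = max(stmt_cols[0], owned_cols[0])
    stop = min(stmt_cols[1], owned_cols[1])
    return (start, stop) if start < stop else None


# A statement that writes columns 1..8 of a 10-column matrix,
# executed by three processes owning 4, 3 and 3 columns.
statement = (1, 9)
owned = [(0, 4), (4, 7), (7, 10)]
parts = [local_part(statement, cols) for cols in owned]
```

Each process then loops only over its entry in `parts`, so the union of all local loops covers the statement's range exactly once.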
Message Passing Interface
Principles
degree of parallelization is given by the number of processes
distribute matrices evenly among the processes
one-dimensional domain decomposition (block of columns)
compute local parts in parallel
set up communication at runtime
provide appropriate ghost zones
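The even distribution into blocks of columns can be sketched as follows. How the remainder columns are assigned is not stated on the slide; giving the first ranks one extra column is one common convention and is an assumption here, as is the name `column_block`.

```python
def column_block(n_cols, n_procs, rank):
    """Half-open column range [start, stop) owned by `rank`.

    Columns are spread as evenly as possible: the first
    n_cols % n_procs ranks receive one extra column.
    """
    base, extra = divmod(n_cols, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# 10 columns distributed among 3 processes.
blocks = [column_block(10, 3, rank) for rank in range(3)]
```

With this convention the block sizes differ by at most one column, and the ranges tile the matrix without gaps or overlap.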
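[Figure: matrix with axes X and Y, split into column blocks owned by Process 0, Process 1 and Process 2, with left and right ghost zones (Lghost, Rghost) marked]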
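To make the role of the ghost zones concrete, the sketch below simulates one ghost-zone update inside a single Python process. In an MPI program the copies would be messages between neighboring processes; here they are replaced by list copies. The parameter names `lghost` and `rghost` follow the figure labels, everything else (function name, data layout, ghost widths no larger than a neighbor's block) is an assumption of this sketch.

```python
def exchange_ghosts(blocks, lghost, rghost):
    """Simulate one ghost-zone update for column blocks.

    blocks: list indexed by rank; blocks[r] is the list of columns owned
            by rank r (each column is itself a list of values).
    Returns, per rank, a tuple (left_ghost, right_ghost) holding copies of
    the last `lghost` columns of the left neighbor and the first `rghost`
    columns of the right neighbor. Ranks at the domain edge get an empty
    ghost zone on that side.
    """
    ghosts = []
    for r in range(len(blocks)):
        left, right = [], []
        if r > 0 and lghost > 0:
            left = [col[:] for col in blocks[r - 1][-lghost:]]
        if r + 1 < len(blocks) and rghost > 0:
            right = [col[:] for col in blocks[r + 1][:rghost]]
        ghosts.append((left, right))
    return ghosts


# Three processes owning 4, 3 and 3 single-row columns with values 0..9.
owned = [
    [[0], [1], [2], [3]],
    [[4], [5], [6]],
    [[7], [8], [9]],
]
ghosts = exchange_ghosts(owned, lghost=1, rghost=1)
```

For a 5-point or 9-point stencil a ghost width of one column per side is enough, since each update reads only the directly adjacent columns.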