State of programming models and code transformations on heterogeneous platforms
Boyana Norris, [email protected]
Computer Scientist, Mathematics and Computer Science Division, Argonne National Laboratory
Senior Fellow, Computation Institute, University of Chicago
High-level talk on programming models for parallel heterogeneous architectures at the second workshop organized by the NSF-funded Conceptualization of Software Institute for Abstractions and Methodologies for HPC Simulations Codes on Future Architectures, http://flash.uchicago.edu/site/NSF-SI2/
Parallel programming for heterogeneous architectures
– Challenges
– Example approaches
Help set the stage for subsequent panel discussions on issues related to programming heterogeneous architectures
– Need your input, please do interrupt
Heterogeneity
Hardware heterogeneity (different devices with different capabilities), e.g.:
– Multicore x86 CPUs with GPUs
– Multicore x86 CPUs with Intel Phi accelerators
– big.LITTLE (coupling slower, low-power ARM cores with faster, power-hungry ARM cores)
– A cluster with different types of nodes
– x86 CPUs with FPGAs (e.g., Convey)
– …
Software heterogeneity (e.g., OS, languages)
– Not part of this talk
Similarities among heterogeneous platforms
Typically each processor has several, and sometimes many, execution units:
– NVIDIA Fermi GPUs have 16 streaming multiprocessors (SMs)
– AMD GPUs have 20 or more SIMD units
– Intel Phi has >50 x86 cores
Each execution unit typically has SIMD or vector execution:
– NVIDIA GPUs execute threads in SIMD-like groups of 32 (what NVIDIA calls warps)
– AMD GPUs execute in wavefronts that are 64 threads wide
– Intel Phi has 512-bit wide SIMD instructions (16 floats or 8 doubles)
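The vector widths above follow directly from register size divided by element size; a quick sanity check, assuming 32-bit single-precision floats and 64-bit doubles:

```python
# SIMD lane count = register width in bits / element width in bits
def lanes(register_bits, element_bits):
    return register_bits // element_bits

# Intel Phi: 512-bit SIMD registers
print(lanes(512, 32))  # 16 floats per instruction
print(lanes(512, 64))  # 8 doubles per instruction
```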
Cons:
– Manual reimplementation required in most cases
– Hard to balance user control with resource management automation
– Interoperability
Recall host-directed MPI+X model
Image by Yili Zheng, LBL
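In the host-directed MPI+X pattern, host code on each rank partitions the data, launches compute kernels on an attached device, and exchanges boundary data through explicit two-sided messages. A minimal sketch in plain Python (no real MPI or GPU; the kernel and halo exchange are stand-ins for an offload region and matched sends/receives):

```python
def kernel(local_data):
    """Stand-in for an offloaded device kernel: scale each element."""
    return [2 * x for x in local_data]

def exchange_halos(partitions):
    """Stand-in for explicit, paired MPI sends/receives of boundary values."""
    halos = []
    for rank, part in enumerate(partitions):
        left = partitions[rank - 1][-1] if rank > 0 else None
        right = partitions[rank + 1][0] if rank < len(partitions) - 1 else None
        halos.append((left, right))
    return halos

# Host code: partition, offload, then communicate explicitly
partitions = [[0, 1], [2, 3], [4, 5]]         # one block per "rank"
partitions = [kernel(p) for p in partitions]  # offload step
halos = exchange_halos(partitions)            # communication step
print(partitions)  # [[0, 2], [4, 6], [8, 10]]
print(halos)       # [(None, 4), (2, 8), (6, None)]
```

The key point of the pattern: the host orchestrates every device launch and every data transfer, which gives full control but also makes the programmer responsible for all of it.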
PGAS model
Image by Yili Zheng, LBL
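By contrast, in the PGAS model there is one logical global address space partitioned across ranks, and any rank can read or write any global index with one-sided get/put operations, with no matching receive on the owner's side. A toy illustration (plain Python; `ToyPGASArray` is a hypothetical class, not a real PGAS API):

```python
class ToyPGASArray:
    """Toy partitioned global array: a flat index space block-distributed
    across nranks, accessed by one-sided get/put."""

    def __init__(self, global_size, nranks):
        self.block = global_size // nranks
        # each rank's partition (held in that rank's memory in a real runtime)
        self.parts = [[0] * self.block for _ in range(nranks)]

    def owner(self, i):
        """Map a global index to (owning rank, local offset)."""
        return i // self.block, i % self.block

    def get(self, i):          # one-sided remote read
        r, off = self.owner(i)
        return self.parts[r][off]

    def put(self, i, value):   # one-sided remote write
        r, off = self.owner(i)
        self.parts[r][off] = value

a = ToyPGASArray(global_size=8, nranks=4)
a.put(5, 42)          # owner of index 5 is computed, never messaged
print(a.get(5))       # 42
print(a.owner(5))     # (2, 1): index 5 lives on rank 2, local offset 1
```

The locality is still visible (the programmer can ask who owns an index and keep computation near its data), but communication is expressed as ordinary reads and writes rather than paired sends and receives.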
High-level frameworks and libraries
Domain-specific problem-solving environments and mathematical libraries can encapsulate the specifics of mapping to heterogeneous architectures (e.g., PETSc, Trilinos, Cactus)
Advantages
– Efficient implementations of common functionality
– Different levels of APIs to hide or expose different levels of the implementation and runtime (unlike pure language approaches)
– Relatively rapid support of new hardware
Disadvantages
– Learning curves, deep software dependencies
Ongoing efforts attempting to balance scalability with productivity
DOE X-Stack program pursues fundamental advances in programming models, languages, compilers, runtime systems and tools to support the transition of applications to exascale platforms:
– DEGAS (Dynamic, Exascale Global Address Space): a PGAS approach
– SLEEC (Semantics-rich Libraries for Effective Exascale Computation): annotations and cost models to compile into optimized low-level implementations
– X-Tune: model-based code generation and optimization of algorithms written in GPLs
– D-TEC: compilers for both new general-purpose languages and embedding DSLs into other languages
Summary
Many traditional programming models can be used on heterogeneous architectures, with vendor support for compilers, libraries and runtimes
No clear cross-platform winner among programming models, languages, or frameworks
Many new efforts on deepening the software stack to enable a better balance of programmability, performance, and portability