The H4H project
Hybrid programming models for heterogeneous architectures
Optimize HPC Applications on Heterogeneous Architectures
Jean-Marc Morel - Bull
Workshop on Tools for Exascale, TGCC at CEA, Bruyères-le-Châtel, 2 October 2012

Outline

• H4H objectives, partnership, and organisation

• The H4H Application Development Process

• Programming Methods & Tools

• The optimized software stack

• Application domains and use cases

• Mid-term achievements

• PerfCloud: a complementary set of activities

What Is H4H Set For?

Provide developers of compute-intensive applications with a highly efficient hybrid programming environment for heterogeneous computing clusters composed of a mix of classical processors and hardware accelerators

• Facilitate the development process of HPC applications

• Maximize the overall performance of these applications

=> Empower technical and scientific computing

=> Accelerate research and innovation in many domains

=> Improve the competitiveness and independence of Europe

The Project Consortium

The H4H Partnership

• HPC Platform Provider: Bull
• Industrial HPC users: REPSOL, BMAT, DataLab, ATEME, Dassault-Aviation
• Simulation Software Editors: Efield, GNS, INTES, MAGMA, RECOM
• Software Tools Editors: Rogue Wave, GWT, CAPS
• Supercomputing Centres: JSC
• Research Labs & Software Institutes: SCAI, CEA LIST, Scilab Enterprises
• Academic partners: UAB, HLRS, ZIH, Telecom-SudParis, UVSQ
• Countries: Sweden, Spain, Germany, France

How Our Work Is Organized

WP1: Project Management & Dissemination

WP2: Programming Models, Methods, and Tools => Design a robust programming environment which allows programmers to develop efficient parallel programs for heterogeneous architectures.

WP3: Platforms => Develop, integrate, set up, and optimise the appropriate heterogeneous HPC platforms together with optimized software packages such as Scilab and SAMG.

WP4: Applications => Evaluate the H4H technology using industrial test cases.

WP2 and WP3 together provide the H4H Technology; WP4 gives feedback to the technology providers (WP2 & WP3).

The H4H Application Development Process

[Diagram: iterative development cycle]

• Existing Program
• Restructure, add or extend hybrid programming pragmas => High-level hybrid source code for heterogeneous architecture
• Generate low-level source code and binary => Hybrid binary code for heterogeneous architecture
• Execute on the optimized software stack for heterogeneous execution platform (numerical libraries and solvers, OpenMP and accelerator runtimes, Open MPI Library, job & resource mgt, OS)
• Analyze correctness and performance
• Fix, restructure, optimize

The H4H development process and tools

[Diagram: the development cycle annotated with program forms and tools]

• Program forms: OpenMP program, HMPP program, OpenMP + HMPP program, MPI + HMPP program
• Source-level steps: Add data distribution directives; Add directives for hybrid regions; Transform (OMP2HMPP); Transform (STEP)
• Result: High-level hybrid source code for heterogeneous architecture
• Code Generation: Accelerator (HMPP); MPI / OpenMP => Hybrid binary code for heterogeneous architecture
• Execute on the optimized software stack for heterogeneous execution platform (numerical libraries and solvers, OpenMP and accelerator runtimes, Open MPI Library, job & resource mgt, OS)
• Execution Analysis: Correctness checkers and debuggers; Memory / Threading Performance; Parallel Performance; Monitoring; Performance Prediction
• Fix, Restructure, Optimize

Execution Analysis in Detail

• Correctness checkers and debuggers: Ayudame / Temanejo, Valgrind
• Memory / Threading Performance: ThreadSpotter, MAQAO
• Parallel Performance: Vampir, Scalasca, Score-P
• Monitoring: LWM2
• Performance Prediction: PAS2P

WP2 Partners: Programming Methods & Tools

CAPS entreprise
• HMPP

Jülich Supercomputing Centre
• Scalasca
• Score-P (“SILC measurement system” in FPP)

Rogue Wave AB
• ThreadSpotter

TU Dresden / GWT
• Vampir
• VampirTrace
• Score-P (“SILC measurement system” in FPP)

UAB / CAOS
• PAS2P

University of Stuttgart / HLRS
• Open MPI + Valgrind
• Ayudame / Temanejo

UVSQ
• MAQAO

WP3: Optimized software stack and libraries

Bull

• HPC software stack (bullx supercomputing suite)

Scilab Enterprises

• Scilab (open source numerical package)

CEA-LIST

• JIT compilation for Scilab on GPU

UAB / CAOS

• RADIC (Fault tolerance architecture)

WP4: Ten Application Partners

ATEME
• Video compression/processing (e.g. Motion estimator)

BMAT
• Music recognition & identification (Vericast)

Dassault Aviation
• CFD for aerodynamic design (AETHER solver)

Efield
• Electromagnetic fields modeling and simulation

GNS
• Metal forming simulation

GWT
• Open source simulation codes (e.g. molecular modeling)

INTES
• General purpose implicit finite element analysis system

MAGMA
• Casting process simulation

RECOM
• 3D combustion simulation

REPSOL
• Seismic imaging and reservoir simulation

Main achievements so far (1 / 4)

• Hybrid Programming Model:
  – HMPP directives have been extended:
    • HMPPAlt (HMPP Alternative) to replace calls to libraries executed on CPUs by calls to their equivalent on GPUs.
    • Multi-device programming extension to enable the use of multiple accelerators, distributing data and computation efficiently between them.
  – Contribution to the creation of the new open standard OpenACC (Members: PGI, Cray, NVIDIA, CAPS)
    • Directives that specify loops and regions of code to be offloaded
    • Portability across operating systems, host CPUs and accelerators
  – Two prototypes of source-to-source translators:
    • OpenMP => HMPP+MPI (to distribute data processing)
    • OpenMP => HMPP (trade-off between performance & energy consumption)
  – Investigation of PGAS approach and porting of OpenSHMEM on bullx
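
As a rough illustration of what distributing data between several accelerators involves, the sketch below splits an index range into contiguous, near-equal blocks, one per device. It is a plain-Python model, not HMPP code: the function name `block_partition` and the example sizes are hypothetical, and the actual multi-device extension may distribute data differently.

```python
def block_partition(n_items, n_devices):
    """Split range(n_items) into n_devices contiguous, near-equal blocks.

    Returns a list of (start, stop) half-open index pairs, one per device.
    Hypothetical helper for illustration; not part of HMPP or OpenACC.
    """
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    if n_items < 0:
        raise ValueError("n_items must be non-negative")

    # Every device gets `base` items; the first `extra` devices get one more.
    base, extra = divmod(n_items, n_devices)
    blocks = []
    start = 0
    for dev in range(n_devices):
        size = base + (1 if dev < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


# Example: 10 array elements over 3 accelerators.
print(block_partition(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

With 10 items and 3 devices, the first device receives one extra item, so block sizes never differ by more than one.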

Main achievements so far (2 / 4)

• Performance measurement and analysis tools:

– Definition & Implementation of the new Open Trace Format (OTF2)

– Contribution to the development of the Score-P measurement infrastructure

– Enhancement & extension of tools (Scalasca, Vampir), e.g. support of HMPP, CUDA, OpenMP 3.0 tasks, …, better filtering, more scalability, …

– Enhancement of the performance prediction framework of MAQAO

– Enhancement of ThreadSpotter to optimize cache sharing

Main achievements so far (3 / 4)

• Software stack

– Contribution to the enhancement of the bullx supercomputing suite Advanced Edition (e.g. bullxMPI, power management framework, bi-rail IB Interconnect, cluster installation & management, etc.)

– Development of sciGPGPU (Scilab on GPU):

• Taking advantage of CUDA and OpenCL features and of important functions of the cuBLAS and cuFFT libraries

• Adding GPU-based functions required by applications (gpuFFT for fftw, gpuInterp & gpuInter2d for cubic spline evaluation, svd for singular value decomposition, and spec for eigenvalues of matrices and pencils)

– Development of Scilab MPI to provide distributed features based on MPI

– Extension & enhancement of the SAMG solver and LAMA (Library for Accelerated Math Applications) combining the power of CPUs and GPUs

Main achievements so far (4 / 4)

• Applications: 36 test cases in 10 domains
  – Performance analysis, code restructuring / porting on GPU
  – Significant performance improvements, e.g.:
    • ATEME: Hierarchical motion estimation on GPU 33% faster
    • RECOM: Control of placement on NUMA enabled a 3.8x speedup
    • MAGMA:
      – Solving a large system of equations on GPU can be 2x faster
      – Combined GPU-MPI yielded a 3x speedup
      – ThreadSpotter enabled a 1.6x speedup by solving memory hierarchy problems
    • Efield: HMPP enables rapid porting of critical code on GPU: 5x speedup of the FDTD code (finite-difference time-domain method)
    • …
  – Still many challenges:
    • Data transfer often impedes the overall performance improvement
    • It is difficult to achieve a good load balance between CPUs and GPUs
    • How to overlap communication and computation
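
The three challenges above can be made concrete with a small back-of-the-envelope model. The sketch below is illustrative only: the function names, the assumption of uniform per-chunk times, and all numbers are hypothetical and are not H4H measurements.

```python
def effective_speedup(t_cpu, kernel_speedup, t_transfer):
    """Speedup of an offloaded kernel once host/device copies are counted.

    t_cpu: CPU-only run time; kernel_speedup: raw GPU kernel speedup;
    t_transfer: total time spent copying data to and from the device.
    """
    t_gpu = t_cpu / kernel_speedup + t_transfer
    return t_cpu / t_gpu


def balanced_split(cpu_rate, gpu_rate):
    """Fraction of the work to give the GPU so CPU and GPU finish together.

    Rates are in work items per second (assumed constant).
    """
    return gpu_rate / (cpu_rate + gpu_rate)


def pipeline_time(n_chunks, t_copy, t_compute, overlap):
    """Total time to copy and process n_chunks equal chunks (n_chunks >= 1).

    Without overlap every copy and every computation is serialized.
    With overlap (double buffering) the copy of chunk i+1 runs during the
    computation of chunk i, so only the slower stage is exposed.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not overlap:
        return n_chunks * (t_copy + t_compute)
    return t_copy + (n_chunks - 1) * max(t_copy, t_compute) + t_compute


# A 5x faster kernel drops to 2x overall when transfers cost 3 s of a 10 s run.
print(effective_speedup(10.0, 5.0, 3.0))             # 2.0
# A GPU three times faster than the CPU should take 75% of the work.
print(balanced_split(1.0, 3.0))                      # 0.75
# Four chunks, 1 s copy and 2 s compute each.
print(pipeline_time(4, 1.0, 2.0, overlap=False))     # 12.0
print(pipeline_time(4, 1.0, 2.0, overlap=True))      # 9.0
```

In this toy model the transfer term appears directly in the denominator of the speedup, which is why reducing or hiding copies can matter as much as speeding up the kernel itself.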

PerfCloud: A complementary set of activities

• Objectives

– Develop advanced technologies to build a new generation of HPC systems that can be used in cloud infrastructures

• A new HPC architecture based on MIC

• A new cooling technology to reduce energy consumption

• An advanced software development environment for MIC

• A new software infrastructure to manage such a heterogeneous cluster and, in particular, to help monitor energy consumption

– Demonstrate the usefulness of these technologies on large applications:

• Atmospheric dispersion and weather simulation

• Image retrieval in large databases of images & videos

– Identify respective advantages & constraints of GPU and MIC

PerfCloud Partners and Agenda

• Partners:
  – Bull, CAPS, CEA-LIST, UVSQ (already in H4H)
  – Astrium, CEA-DAM, Numtech, XediX (new)

• Agenda:
  – Start date: July 2012
  – Duration: 28 months
  – End date: Oct. 2014