Top Banner
EU H2020 Centre of Excellence (CoE) 1 October 2015 – 31 March 2018 Grant Agreement No 676553 Performance Optimisation and Productivity Nick Dingle [email protected]
16

Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

Jun 26, 2020

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

EU H2020 Centre of Excellence (CoE) 1 October 2015 – 31 March 2018

Grant Agreement No 676553

Performance Optimisation and Productivity

Nick Dingle [email protected]

Page 2: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

Why?

• Complexity of machines and codes

Frequent lack of quantified understanding of actual behaviour

Not clear most productive direction of code refactoring

• Important to maximize efficiency (performance, power) of compute intensive applications and productivity of the development efforts

What?

• Parallel programs, mainly MPI/OpenMP• Although also CUDA, OpenCL, OpenACC, Python, …

2

Motivation

28/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 3: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

POP CoE

• A Centre of Excellence• On Performance Optimisation and Productivity

• Promoting best practices in parallel programming

• Providing Services • Precise understanding of application and system behaviour

• Suggestion/support on how to refactor code in the most productive way

• Horizontal• Transversal across application areas, platforms, scales

• For academic and industrial codes and users

• FREE !

328/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 4: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

? Parallel Application Performance Audit Report• Primary service

• Identify performance issues of customer code (at customer site)

• Small effort (< 1 month)

! Parallel Application Performance Plan Report• Follow-up on the audit service

• Identifies the root causes of the issues found andqualifies and quantifies approaches to address them

• Longer effort (1-3 months)

Proof-of-Concept Software Demonstrator• Experiments and mock-up tests for customer codes

• Kernel extraction, parallelisation, mini-apps experiments to showeffect of proposed optimisations

• 6 months effort

Services provided by the CoE

28/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 5: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

• Application Structure • (if appropriate) Region of Interest• Scalability Information• Application Efficiency

• E.g. time spent outside MPI

• Load Balance• Whether due to internal or external factors

• Computational Performance• Identification of areas for improvement

• Communications• E.g. sensitivity to network performance

• Summary and Recommendation

5

Outline of a typical audit report

28/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 6: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

The process …

When?

October 2015 – March 2018

How?

• Apply

• Fill in small questionnairedescribing application and needshttps://pop-coe.eu/request-service-form

• Questions? Ask [email protected]

• Selection/assignment process

• Install tools @ your production machine

• Interactively: Gather data Analysis Report

628/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 7: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

Customer feedback

7

• What was the quality of their answers?• How responsive have the POP experts been to your questions or concerns about the analysis and the report?28/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 8: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

• Who?• BSC (coordinator), ES

• HLRS, DE

• JSC, DE

• NAG, UK

• RWTH Aachen, IT Center, DE

• TERATEC, FR

A team with

• Excellence in performance tools and tuning

• Excellence in parallel programming models and practices

• Research and development background AND proven commitment in application to real academic and industrial use cases

8

Partners

28/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 9: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

Target customers

• Code developers

• Assessment of detailed actual behaviour

• Suggestion of most productive directions to refactor code

• Users

• Assessment of achieved performance in specific production conditions

• Possible improvements modifying environment setup

• Evidence to interact with code provider

• Infrastructure operators

• Assessment of achieved performancein production conditions

• Possible improvements from modifying environment setup

• Information for time computer time allocation processes

• Training of support staff

• Vendors

• Benchmarking

• Customer support

• System dimensioning/design

928/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 10: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

Tools

• Install and use already available monitoring and analysis technology

• Open-source toolsets

• Extrae + Paraver

• Score-P + Cube + Scalasca/TAU

• Dimemas, Extra-P

• SimGrid

• Commercial toolsets (if available at customer site)

• Intel tools

• Cray tools

• Allinea tools

1028/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 11: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

Code Audit Examples

1128/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 12: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

12

GITM (Cefas)

• An offline particle tracking model code

• Written in Fortran with OpenMP

• Key audit results:• Current performance: 16 threads offer 5x speed-up vs 1 thread

• Load imbalance amongst OpenMP threads

• Maximise opportunities for vectorisation by aligning arrays and refactoring Fortan array operations to use DO loops

• Large number of floating-point divisions

28/10/2016

Page 13: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

• Extracted from ECMWF's Integrated Forecasting System (IFS) code

• Mixed Fortran/C++ code with hybrid MPI+OpenMP

• Key audit results:

13

dwarf-D-ellipticSolver-GCR (ECMWF)

• Good scalability as communication pattern introduces little overhead from synchronisation or serialisation

• Minor load imbalance due to variability in IPC across nodes

• Opportunities to further increase performance as IPC is 0.8 on average

28/10/2016

Page 14: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

• Undertaken as part of the ESCAPE (Energy-efficient Scalable Algorithms for Weather Prediction at Exascale) project

• Written in Fortran with MPI

• Traces gathered by user and analysed by POP

• Key audit findings:

14

NEMO (Atos)

• Good computational load balance • Observed superlinear speed-up of

analysed region of the program, which was due to improved cache efficiency when strong scaling

28/10/2016

Page 15: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

Other POP activities

• Customer advocacy• Gather customers feedback, ensure satisfaction, steer activities

• Sustainability• Explore business models

• Training• Best practices on the use of the tools and programming models

1528/10/2016 ECMWF 17th Workshop on HPC in Meteorology [email protected]

Page 16: Performance Optimisation - ECMWF · •On Performance Optimisation and Productivity •Promoting best practices in parallel programming •Providing Services •Precise understanding

28/10/2016 16

Contact:https://[email protected]

This project has received funding from the European Union‘s Horizon 2020 research and innovation programme under grant agreement No 676553.

Performance Optimisation and Productivity A Centre of Excellence in Computing Applications