CGAM
Running the Met Office Unified Model on HPCx

Paul Burton

CGAM, University of Reading
Paul@met.rdg.ac.uk

www.cgam.nerc.ac.uk/~paul

Overview

• CGAM : Who, what, why and how

• The Met Office Unified Model

• Ensemble Climate Models

• High Resolution Climate Models

• Unified Model Performance

• Future Challenges and Directions

Who is CGAM?

[Organisation chart of N.E.R.C. and its centres; the boxes are listed below]

• N.E.R.C.
• NERC Centres for Atmospheric Science
• Centre for Global Atmospheric Modelling
• Atmospheric Chemistry Modelling Support Unit
• Universities’ Weather and Environment Research Network
• Distributed Institute for Atmospheric Composition
• British Atmospheric Data Centre
• University Facilities for Atmospheric Measurement
• Facility for Airborne Atmospheric Measurements
• Data Assimilation Research Centre
• British Geological Survey
• Centre for Ecology and Hydrology
• Proudman Oceanographic Laboratory
• Southampton Oceanography Centre
• Centre for Terrestrial Carbon Dynamics
• Environmental Systems Science Centre
• British Antarctic Survey
• Tyndall Centre for Climate Change Research
• National Institute for Environmental e-Science
• Centre for Polar Observations and Modelling

What does CGAM do?

• Climate Science
  – UK Centre of expertise for climate science
  – Lead UK research in climate science
    • Understand and simulate the highly non-linear dynamics and feedbacks of the climate system
    • Earth System Modelling
    • From seasonal to hundreds of years
    • Close links to Met Office
• Computational Science
  – Support scientists using Unified Model
  – Porting and optimisation
  – Development of new tools

Why does CGAM exist?

• Will there be an El Niño this year?
  – How severe will it be?
• Are we seeing increases in extreme weather events in the UK?
  – 2000 Autumn floods
  – Drought?
• Will the milder winters of the last decade continue?
• Can we reproduce and understand past abrupt changes in climate?

How does CGAM answer such questions?

• Models are our laboratory
  – Investigate predictability
  – Explore forcings and feedbacks
  – Test hypotheses

Met Office Unified Model

• Standardise on using a single model
• Met Office’s Hadley Centre recognised as world leader in climate research
• Two way collaboration with the Met Office
• Very flexible model
  – Forecast
  – Climate
  – Global or Limited Area
  – Coupled ocean model
  – Easy configuration via a GUI
  – User configurable diagnostic output

Unified Model : Technical Details

• Climate configuration uses “old” vn4.5
  – Vn5 has an updated dynamical core
  – Next generation “HadGEM” climate configuration will use this
• Grid-point model
  – Regular latitude/longitude grid
• Dynamics
  – Split-explicit finite-difference scheme
  – Diffusion and polar filtering
• Physical Parameterisation
  – Almost all constrained to a vertical column

Unified Model : Parallelisation

• Domain decomposition
  – Atmosphere : 2D regular decomposition
  – Ocean : 1D (latitude) decomposition
• GCOM library for communications
  – Interface to selectable communications library: MPI, SHMEM, ???
  – Basic communication primitives
  – Specialised communications for UM
• Communication Patterns
  – Halo update (SWAPBOUNDS)
  – Gather/scatter
  – Global/partial summations
• Designed/optimised for Cray T3E!
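
The 2D regular decomposition and halo update can be illustrated with a small serial sketch. This is not GCOM or the UM's SWAPBOUNDS routine: the function names, the one-point halo width, the cyclic east-west wrap and the untouched corner points are all assumptions made for illustration, and real code would exchange the edges by message passing rather than by reading a neighbour's list.

```python
def split_extent(n, parts):
    """Split n points into `parts` contiguous (start, stop) ranges of near-equal size."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def make_tiles(field, px, py):
    """Cut a global field (list of rows) into px*py tiles, each padded with a one-point halo."""
    ny, nx = len(field), len(field[0])
    xr, yr = split_extent(nx, px), split_extent(ny, py)
    tiles = {}
    for j, (y0, y1) in enumerate(yr):
        for i, (x0, x1) in enumerate(xr):
            width = x1 - x0
            padded = [[None] * (width + 2)]
            padded += [[None] + row[x0:x1] + [None] for row in field[y0:y1]]
            padded += [[None] * (width + 2)]
            tiles[(i, j)] = padded
    return tiles


def swap_bounds(tiles, px, py):
    """Fill each tile's halo from the edge points of its neighbours.

    East-west is treated as cyclic (longitude wraps around); north-south is
    not, so halos beyond the first and last rows of tiles stay None.
    Corner halo points are left unset in this sketch.
    """
    for (i, j), t in tiles.items():
        west = tiles[((i - 1) % px, j)]
        east = tiles[((i + 1) % px, j)]
        for r in range(1, len(t) - 1):
            t[r][0] = west[r][-2]   # neighbour's last interior column
            t[r][-1] = east[r][1]   # neighbour's first interior column
    for (i, j), t in tiles.items():
        if j > 0:
            t[0][1:-1] = tiles[(i, j - 1)][-2][1:-1]
        if j < py - 1:
            t[-1][1:-1] = tiles[(i, j + 1)][1][1:-1]


# Tiny demonstration field: value encodes (row, column) as 100*row + column.
field = [[100 * y + x for x in range(8)] for y in range(6)]
tiles = make_tiles(field, px=2, py=2)
swap_bounds(tiles, px=2, py=2)
for row in tiles[(0, 0)]:
    print(row)
```

Each tile only needs its neighbours' edge points, which is why the halo update is the dominant point-to-point communication pattern in a grid-point model of this kind.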

Model Configurations

• Currently
  – HadAM3 / HadCM3
    • Low resolution (270km : 96 x 73 x 19L)
    • Running on ~10-40 CPUs
  – Turing (T3E1200), Green (O3800), Beowulf cluster
• Over the next year
  – More of the same
  – Ensembles
    • Low resolution (HadAM3/HadCM3)
    • 10-100 members
  – High resolution
    • 90km : 288 x 217 x 30L
    • 60km : 432 x 325 x 40L
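
As a rough check on the quoted resolutions, the grid spacing and point counts can be worked out from the grid dimensions. The sketch below assumes that the latitude rows include both poles (so there are one fewer intervals than rows) and uses a mean Earth radius of 6371 km; the function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value for this estimate)


def grid_spacing(n_lon, n_lat):
    """Return (dlon_deg, dlat_deg, dlat_km) for a regular latitude/longitude grid.

    Assumes n_lon points around a full circle and n_lat rows that include
    both poles, giving n_lat - 1 latitude intervals.
    """
    dlon = 360.0 / n_lon
    dlat = 180.0 / (n_lat - 1)
    dlat_km = math.radians(dlat) * EARTH_RADIUS_KM
    return dlon, dlat, dlat_km


def n_points(n_lon, n_lat, n_levels):
    """Total number of grid points in the 3D grid."""
    return n_lon * n_lat * n_levels


configs = {
    "270km": (96, 73, 19),
    "90km": (288, 217, 30),
    "60km": (432, 325, 40),
}

for name, (nx, ny, nz) in configs.items():
    dlon, dlat, km = grid_spacing(nx, ny)
    print(f"{name}: {dlon:.3f} x {dlat:.3f} deg, "
          f"~{km:.0f} km in latitude, {n_points(nx, ny, nz):,} points")
```

Under these assumptions the latitude spacing comes out close to the nominal 270 km, 90 km and 60 km labels, and the 90 km grid has roughly 14 times as many points as the low resolution grid.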

Ensemble Methods in Weather Forecasting

• Have been used operationally for many years (e.g. ECMWF)
  – Perturbed starting conditions
  – Reduced resolution
• Multi-model ensembles
  – Perturbed starting conditions
  – Different models
• Why are they used?
  – Give some indication of predictability
  – Allow objective assessment of weather-related risks
  – More chance of seeing extreme events

Climate Ensembles

• Predictability
• What confidence do we have in climate change?
• What effect do different forcings have?
  – CO2 – different scenarios
  – Volcanic eruptions
  – Deforestation
• How sensitive is the model?
  – Twiddle the knobs and see what happens
• How likely are extreme events?
  – Allows governments to take defensive action now

Ensembles Implementation

• Setup
  – Allow users to specify and design an ensemble experiment
• Runtime
  – Allow the ensemble to run as a single job on the machine for easy management
• Analysis
  – How to view and process vast amounts of data produced

Setup : Normal UM workflow

[Workflow diagram; the boxes are listed below]

UMUI
UM Job
  Shell script [poe executable]
  Fortran Namelists
Data
  Starting data
  Forcing data
Output
  Diagnostics
  Restart data

Setup : UM Ensemble workflow

[Workflow diagram, drawn for three members; the boxes are listed below. Other labels in the diagram: ect, ecdt]

UM Job
  Shell script [poe executable]
  Fortran Namelists
Config
  N_MEMBERS=3
  Differences
Control
  poe UM_Job
UM_Job
  $MEMBERid=…
  cd “Job.$MEMBERid”
  Run script
Job.1, Job.2, Job.3
  Shell script
  Fortran Namelists
Data.1, Data.2, Data.3
  Starting data
  Forcing data
Out.1, Out.2, Out.3
  Diagnostics
  Restart data

UM Ensemble : Runtime (1)

• “poe” called at top level
  – Calls a “top_level_script”
  – Works out which CPU it’s on
  – Hence which member it is
  – Hence which directory/model script to run
• Model scripts run in a separate directory for each member
• Each model script calls the executable
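
The CPU-to-member-to-directory chain can be sketched as follows. The slides do not show how the top-level script finds its task number under poe, nor how CPUs are shared out, so this sketch takes the task rank as an argument and assumes equal, contiguous blocks of ranks per member with members numbered from 1; the function names are hypothetical. Only the "Job.N" directory naming comes from the workflow diagram.

```python
def member_for_rank(rank, n_procs, n_members):
    """Map a global task rank to (member_id, rank_within_member).

    Assumes every member gets the same number of tasks and that each
    member's tasks are a contiguous block of global ranks.
    """
    if n_procs % n_members != 0:
        raise ValueError("n_procs must be a multiple of n_members")
    if not 0 <= rank < n_procs:
        raise ValueError("rank out of range")
    per_member = n_procs // n_members
    return rank // per_member + 1, rank % per_member


def job_directory(member_id):
    """Directory holding the model script for one member (Job.1, Job.2, ...)."""
    return f"Job.{member_id}"


# Example: 24 tasks shared between 3 members, 8 tasks each.
for rank in (0, 7, 8, 23):
    member, local_rank = member_for_rank(rank, n_procs=24, n_members=3)
    print(rank, "->", job_directory(member), "local rank", local_rank)
```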

UM Ensemble : Runtime (2)

• Uses “MPH” to change the global communicator
  – http://www.nersc.gov/research/SCG/acpi/MPH/
  – Freely available tool from NERSC
  – MPH designed for running coupled multi-model experiments
• Each member has a unique MPI communicator replacing the global communicator
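
The effect of giving each member its own communicator can be shown without MPI. The sketch below does not reproduce the MPH interface; it emulates, in plain Python, the grouping rule of an MPI_Comm_split-style call (ranks that choose the same colour form one group, ordered by key and then by global rank). Using the member number as the colour is an assumption made for illustration.

```python
from collections import defaultdict


def split_communicator(colors, keys=None):
    """Emulate the grouping performed by an MPI_Comm_split-style call.

    colors[r] is the colour chosen by global rank r. Ranks sharing a colour
    form one group, ordered by key and then by global rank.
    Returns {global_rank: (color, local_rank, group_size)}.
    """
    if keys is None:
        keys = [0] * len(colors)
    groups = defaultdict(list)
    for rank, color in enumerate(colors):
        groups[color].append((keys[rank], rank))
    result = {}
    for color, members in groups.items():
        for local_rank, (_, rank) in enumerate(sorted(members)):
            result[rank] = (color, local_rank, len(members))
    return result


# 12 global ranks, 3 members of 4 tasks each; colour = member number.
colors = [rank // 4 + 1 for rank in range(12)]
layout = split_communicator(colors)
for rank in range(12):
    member, local_rank, size = layout[rank]
    print(f"global {rank:2d} -> member {member}, local rank {local_rank} of {size}")
```

Inside each group the model sees local ranks 0 to 3, so code written against a single "global" communicator runs unchanged once that communicator is replaced by the member's own.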

UM Ensemble : Future Work

• Run time tools
• Control and monitoring of ensemble members
• Real-time production of diagnostics
  – Currently each member writes its own diagnostics files
    • Lots of disk space
    • I/O performance?
  – Have a dedicated diagnostics process
    • Only output statistical analysis
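
One way a dedicated diagnostics process could output only statistics is to accumulate them as member fields arrive, without keeping the fields themselves. The sketch below uses Welford's online algorithm for the mean and sample variance at each grid point. It is an illustration of the idea, not part of the UM; the class and function names are hypothetical.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance without storing the samples."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 for fewer than two samples."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def ensemble_statistics(member_fields):
    """Reduce an iterable of equally sized fields (one flat list per member)
    to per-point (mean, variance) pairs, holding only one field's worth of state."""
    stats = None
    for field in member_fields:
        if stats is None:
            stats = [RunningStats() for _ in field]
        for s, value in zip(stats, field):
            s.add(value)
    return [(s.mean, s.variance) for s in (stats or [])]


# Three made-up member fields with two grid points each.
fields = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
print(ensemble_statistics(fields))
```

The state held is proportional to one field rather than to the number of members, which is the property that would reduce the disk space and I/O load.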

UK-HiGEM

• National “Grand Challenge” Programme for High Resolution Modelling of the Global Environment
• Collaboration between a number of academic groups and the Met Office’s Hadley Centre
• Develop high resolution version of HadGEM (~1° atmosphere, 1/3° ocean)
• Better understanding and prediction of
  – Extreme events
  – Predictability
  – Feedbacks and interactions
  – Climate “surprises”
• Regional Impacts of climate change

UK-HiGEM Status

• Project only just starting
• Plan to use Earth Simulator for production runs
• Preliminary runs carried out
  – Earth Simulator
  – Very encouraging results
• HPCx is a useful platform
  – For development
  – Possibly for some production runs

UM Performance

• Two configurations
  – Low resolution 96x73x19L
  – High resolution 288x217x30L
• Built in comprehensive timer diagnostics
  – Wallclock time
  – Communications
  – Not yet implemented
    • I/O, memory, hardware counters, ???
• Outputs an XML file
• Analysed using PHP web page
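
A section timer that writes its results as XML might look like the sketch below. The UM's own timer and its XML layout are not shown in the slides, so the class name, the element and attribute names, and the choice to record only wallclock time and call counts are assumptions made for illustration.

```python
import time
import xml.etree.ElementTree as ET
from contextlib import contextmanager


class SectionTimer:
    """Accumulate wallclock time per named section and export it as XML."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.totals = {}  # section name -> accumulated seconds
        self.calls = {}   # section name -> number of times entered

    @contextmanager
    def section(self, name):
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed
            self.calls[name] = self.calls.get(name, 0) + 1

    def to_xml(self):
        """Return the timings as an XML string (hypothetical layout)."""
        root = ET.Element("timings")
        for name in sorted(self.totals):
            ET.SubElement(root, "section", {
                "name": name,
                "calls": str(self.calls[name]),
                "wallclock": f"{self.totals[name]:.6f}",
            })
        return ET.tostring(root, encoding="unicode")


timer = SectionTimer()
with timer.section("dynamics"):
    sum(i * i for i in range(10000))  # stand-in for real work
with timer.section("physics"):
    sum(i * i for i in range(10000))
print(timer.to_xml())
```

Writing a structured file rather than free-form text is what lets a separate tool, such as the PHP page mentioned above, compare runs on different processor counts.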

LowRes Scalability

[Figure: Total Wallclock Time. Time (seconds, log scale from 1.00E+01 to 1.00E+03) against Nproc (0 to 25). Series: Overall, Dynamics, Physics.]

LowRes : Communication Time

[Figure: Send/Receive Time. % of Section (0% to 25%) against Nproc (0 to 25). Series: Overall, Dynamics, Physics.]

LowRes : Load Imbalance

[Figure: Barrier Time. % of Section (0% to 30%) against Nproc (0 to 25). Series: Overall, Dynamics, Physics.]

LowRes : Relative Costs

[Figure: % of Overall Time (0% to 90%) against Nproc (0 to 25). Series: Dynamics, Physics.]

HiRes Scalability

[Figure: Total Wallclock Time. Time (seconds, log scale from 1.00E+01 to 1.00E+03) against Nproc (0 to 130). Series: Overall, Dynamics, Physics.]

HiRes Communication Time

[Figure: Send/Receive Time. % of Section (0% to 40%) against Nproc (0 to 130). Series: Overall, Dynamics, Physics.]

HiRes Load Imbalance

[Figure: Barrier Time. % of Section (0% to 25%) against Nproc (0 to 130). Series: Overall, Dynamics, Physics.]

HiRes Relative Costs

[Figure: % of Overall Time (0% to 100%) against Nproc (0 to 130). Series: Dynamics, Physics.]

HiRes Exclusive Timer

• QT_POS has large “Collective” time
  – Unexpected!
• Call to global_MAX routine in gather/scatter
  – Not needed, so deleted!

HiRes : After “optimisation”

• QT_POS reduced from 65s to 35s
• Improved scalability
• And repeat…

Optimisation Strategy

• Low Res
  – Aiming for 8 CPU runs as ensemble members (typically ~50 members)
  – Physics optimisation a priority
    • Load Imbalance (SW radiation)
    • Single processor optimisation
• Hi Res
  – As many CPUs as is feasible
  – Dynamics optimisation a priority
    • Remove/optimise collective operations
    • Increase average message length
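
Why longer messages help can be seen from a simple latency/bandwidth cost model: every message pays a fixed start-up cost, so sending the same data in fewer, larger messages saves one latency per message removed. The latency and bandwidth figures below are hypothetical round numbers, not measurements from HPCx, and the choice of "one message per level versus one combined message" is only an example of aggregation.

```python
def transfer_time(n_messages, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Latency/bandwidth cost model for sending data between two processors."""
    return n_messages * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical machine parameters (assumed, not measured).
LATENCY_S = 20e-6   # start-up cost per message
BANDWIDTH = 1.0e9   # bytes per second

levels = 19                   # vertical levels in the low resolution model
halo_bytes_per_level = 73 * 8  # one column of 73 eight-byte values per level
total_bytes = levels * halo_bytes_per_level

per_level = transfer_time(levels, total_bytes, LATENCY_S, BANDWIDTH)
combined = transfer_time(1, total_bytes, LATENCY_S, BANDWIDTH)

print(f"one message per level: {per_level * 1e6:.1f} microseconds")
print(f"one combined message:  {combined * 1e6:.1f} microseconds")
```

With short halo messages the latency term dominates, which is consistent with making longer messages a priority when running on as many CPUs as possible.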

Future Challenges

• Diagnostics and I/O
  – UM does huge amounts of diagnostic I/O in a typical climate run
  – All I/O through a single processor
    • Cost of gather
    • Non-parallel I/O
• Ocean models
  – Only 1D decomposition, so limited scalability
  – T3E optimised!
• Next generation UM5.x
  – Much more expensive
  – Better parallelisation for dynamics scheme