Operational Numerical Weather Prediction in Switzerland and Evolution Towards
New Supercomputer Architectures
Philippe Steiner (MeteoSwiss)
Michele De Lorenzi (CSCS)
Angelo Mangili (CSCS)
14th ECMWF Workshop on the Use of HPC in Meteorology, November 2010
Outline
• Current operational suite of MeteoSwiss at CSCS and its performance
• Performance on different Cray architectures
• Swiss HP2C Initiative
• HP2C Project for COSMO
– Performance Analysis
– Investigated Hybrid Parallelization via OpenMP/MPI
– Parallelizing I/O
– New Dynamical Core
• Outlook on future suites of MeteoSwiss at CSCS
• Conclusion
MeteoSwiss-CSCS Collaboration
• The Swiss National Supercomputing Centre (CSCS), a unit of ETH Zurich, hosts the operational model of MeteoSwiss.
• More than 10 years of collaboration, which has proven fruitful for both institutes.
• MeteoSwiss benefits from the HPC expertise of CSCS and can focus on the NWP tasks.
• CSCS handles the technological and engineering aspects and provides knowledge transfer to other scientific domains.
Current Operational Suite
COSMO-7: 6.6 km mesh size, 393 x 338 x 60 grid points, 3 x 72h forecasts/day
COSMO-2: 2.2 km mesh size, 520 x 350 x 60 grid points, 8 x 24h forecasts/day
Production at CSCS:
• Main machine: Cray XT4 "Buin", 1056 cores
• Fall-back: Cray XT4 "Dole", 688 cores
• Service nodes used for front-end functions
• Separate servers for the database
[Figure: model chain IFS driving COSMO-7, which in turn drives COSMO-2.]
Production Scheme
[Timeline figure: eight 3-hourly cycles per day (00, 03, 06, 09, 12, 15, 18, 21 UTC), each processing 3h of assimilation in about 45 min of elapsed time.
Long production cycle (+72h COSMO-7 and +24h COSMO-2): 3h assimilation (21 UTC), 0-24h forecast (00 UTC) and TC products, 25-72h forecast (00 UTC) and TC products. Bars show the COSMO-7 and COSMO-2 assimilation, forecast and TC-product steps against elapsed time in minutes.]
COSMO 1 km Performance on Cray XT4/XT5/XE6 – Benchmark
• 1142 x 765 grid points in lon/lat direction
• 90 vertical levels
• Time step Δt = 8 s, 1h integration (no output)
COSMO 1 km Performance on Cray XT4/XT5/XE6 – Considered Systems
Some technical details:

                          Cray XT4 (Buin)      Cray XT5 (Rosa)      Cray XE6 (Palu)
# cores (total)           1'056                22'128               4'224
# sockets                 264                  3'688                352
# cores/socket (name)     quad-core AMD        hexa-core AMD        12-core AMD
                          Opteron Barcelona    Opteron Istanbul     Opteron Magny-Cours
core freq. [GHz]          2.4                  2.4                  2.1
MEM/core [GB]             2                    1.33                 1.33
MEM total [TB]            2.1 (DDR)            28.8 (DDR2-800)      5.6 (DDR3-1333)
MEM bandw./core [GB/s]    2.6                  2.6                  3.6
Interconnect [GB/s]       9.6 (<5 µs)          9.6 (<5 µs)          10.6 (~1.2 µs)
(latency, name)           SeaStar2             SeaStar2             Gemini
# racks                   3                    20                   2
COSMO 1 km Performance on Cray XT4/XT5/XE6 – Benchmark Results (# Cores)
[Chart: wall-clock time in seconds (0-4500 s) for a 1h COSMO 1 km simulation (no I/O) versus number of cores (0-4000), for Buin XT4, Rosa XT5 and Palu XE6.]
COSMO 1 km Performance on Cray XT4/XT5/XE6 – Benchmark Results (# Nodes)
[Chart: wall-clock time in seconds (0-4500 s) for a 1h COSMO 1 km simulation (no I/O) versus number of nodes (0-400), for Buin XT4, Rosa XT5 and Palu XE6.]
Swiss Platform for High-Performance and High-Productivity Computing (HP2C, www.hp2c.ch) – Initiative Overview
• Prepare the Swiss HPC community for disruptive and potentially revolutionary developments in computer architectures during the coming decade.
• Supported in the framework of the Swiss National Initiative for High-Performance Computing and Networking by:
– Swiss University Conference
– ETH Domain (the two Swiss Federal Institutes of Technology)
HP2C Initiative – Framework
• Based on a co-design concept, which brings together:
– domain scientists,
– computer engineers,
– hardware architects
to design a "system" (= application + software + hardware).
HP2C Initiative – Projects Granted (2010-2012)
• BigDFT - Large-Scale Density Functional Electronic Structure Calculations in a Systematic Wavelet Basis Set
• Cardiovascular - HPC for Cardiovascular System Simulations
• COSMO - Regional Climate and Weather Modeling on the Next Generation of High-Performance Computers: Towards Cloud-Resolving Simulations
• Cosmology - Computational Cosmology on the Petascale
• CP2K - New Frontiers in ab initio Molecular Dynamics
• Ear Modeling - Towards the Building of New Hearing Devices
• Gyrokinetic - Gyrokinetic Numerical Simulations of Turbulence in Fusion Plasmas
• MAQUIS - Modern Algorithms for Quantum Interacting Systems
• Petaquake - Large-Scale Parallel Nonlinear Optimization for High-Resolution 3D Seismic Imaging
• Selectome - Looking for Darwinian Evolution in the Tree of Life
• Supernova - Productive 3D Models of Stellar Explosions
HP2C Initiative – COSMO Project
Task 1: New high-resolution cloud-resolving climate model.
Task 2: Refactoring of the COSMO model:
– Performance analysis of the current version.
– Investigate hybrid parallelization using OpenMP/MPI.
– Investigate parallel I/O possibilities.
Task 3: Rewrite of the COSMO dynamical core:
– Rewrite the dynamical core for current/emerging HPC architectures.
– Develop one code base for both CPU and GPU.
– Adapt physical parametrizations to the new code design.
Collaboration between ETH Zurich, CSCS, MeteoSwiss, DWD, Supercomputing Systems AG and other partners.
COSMO Project – Performance Analysis: COSMO Scaling
[Chart: COSMO parallel speedup on the Cray XT5 (reference: 334 cores; y-axis 0-35) versus number of cores (0-12'000), for the total model, the FastWaves solver, and ideal linear scaling.]
COSMO Project – Performance Analysis: COSMO Run-time Distribution
[Chart: run-time distribution in percent on the Cray XT5, split into User, MPI and Sync time, for 334, 664, 1324, 2644, 5284 and 10564 cores.]
COSMO Project – Performance Analysis: COSMO Output Overhead (7 & 2 km)
[Chart: percentage of total execution time spent writing output, for DOLE 7 km, ROSA 7 km, DOLE 2 km (4 ioc), ROSA 2 km (4 ioc) and ROSA 2 km (2 ioc); values lie between about 2% and 7%, comparing the YUTIMINGS figures with the computation / write data / gather data breakdown.]
COSMO Project – Hybrid OpenMP/MPI: Investigated Hybrid Parallelization
• Inserted OpenMP PARALLEL DO directives on the outermost loops (see the sketch below):
– Over 600 directives inserted.
– Also attempted to use the OpenMP 3.0 COLLAPSE clause.
– Also enabled use of SSE instructions in all routines (previously only used in some routines).
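As an illustration of the directive pattern described above (a minimal sketch with a hypothetical routine and field names, not actual COSMO source), an OpenMP PARALLEL DO is placed on the outermost loop of a typical k/j/i loop nest, with the OpenMP 3.0 COLLAPSE clause merging the two outer loops:

```fortran
! Minimal sketch (hypothetical routine/field names, not COSMO code):
! OpenMP PARALLEL DO on the outermost loop, with COLLAPSE(2) merging
! the k and j loops to expose more parallelism per directive.
SUBROUTINE relax_field(phi, rhs, ie, je, ke, alpha)
  IMPLICIT NONE
  INTEGER, INTENT(IN)    :: ie, je, ke
  REAL,    INTENT(IN)    :: alpha
  REAL,    INTENT(IN)    :: rhs(ie, je, ke)
  REAL,    INTENT(INOUT) :: phi(ie, je, ke)
  INTEGER :: i, j, k

  !$OMP PARALLEL DO COLLAPSE(2) PRIVATE(i)
  DO k = 1, ke
    DO j = 1, je
      DO i = 1, ie
        ! pointwise relaxation update: little work per memory access
        phi(i,j,k) = phi(i,j,k) + alpha * (rhs(i,j,k) - phi(i,j,k))
      END DO
    END DO
  END DO
  !$OMP END PARALLEL DO
END SUBROUTINE relax_field
```

With several hundred such loops, each directive contributes fork/join and scheduling overhead, which limits the achievable gain.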
COSMO Project – Hybrid OpenMP/MPI: Main Outcome
• Loop-level parallelism can achieve some modest performance gains, but:
– It can require many threaded loops -> OpenMP overhead.
– It can require a lot more software engineering to maintain.
COSMO Project – Parallelizing I/O: Approach Using Parallel I/O Libraries
• Used properly, parallel I/O might alleviate I/O bottlenecks.
• NetCDF-4, pNetCDF and PIO are parallel I/O libraries that can be used quite easily to improve I/O performance (see the sketch below).
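As a hedged sketch of what using such a library involves (this is not the COSMO output scheme; the file, variable and dimension names are invented, and a NetCDF-4/HDF5 build with MPI support is assumed), each MPI rank below writes its own slab of a global 2-D field collectively into a single file:

```fortran
! Sketch of a collective NetCDF-4 parallel write (assumes netcdf-fortran
! built against parallel HDF5; field and dimension names are hypothetical).
PROGRAM parallel_output_sketch
  USE mpi
  USE netcdf
  IMPLICIT NONE
  INTEGER :: ierr, rank, nranks, ncid, dim_x, dim_y, varid
  INTEGER, PARAMETER :: nx_glob = 520, ny_glob = 350   ! COSMO-2-sized grid
  INTEGER :: nx_loc, x0
  REAL, ALLOCATABLE :: field(:,:)

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  CALL MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! trivial 1-D decomposition along x (assumes nx_glob divisible by nranks)
  nx_loc = nx_glob / nranks
  x0     = rank * nx_loc + 1
  ALLOCATE(field(nx_loc, ny_glob))
  field = REAL(rank)

  ! every rank creates/opens the same file for parallel access
  ierr = nf90_create("example_out.nc", IOR(NF90_NETCDF4, NF90_MPIIO), ncid, &
                     comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)
  ierr = nf90_def_dim(ncid, "x", nx_glob, dim_x)
  ierr = nf90_def_dim(ncid, "y", ny_glob, dim_y)
  ierr = nf90_def_var(ncid, "field", NF90_FLOAT, (/dim_x, dim_y/), varid)
  ierr = nf90_enddef(ncid)

  ! collective write: each rank writes its own slab of the global array
  ierr = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)
  ierr = nf90_put_var(ncid, varid, field, start=(/x0, 1/), &
                      count=(/nx_loc, ny_glob/))

  ierr = nf90_close(ncid)
  CALL MPI_Finalize(ierr)
END PROGRAM parallel_output_sketch
```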
COSMO Project – Parallelizing I/O: Investigate In-situ Visualization Techniques
The traditional post-processing model "compute-store-analyze" does not scale, because I/O to disk is the slowest component.
Consequences:
• Datasets are often under-sampled on disk.
• Many time steps are never archived.
• It often takes a supercomputer to re-load and visualize supercomputer results.
COSMO Project – Parallelizing I/O: VisIt (https://wci.llnl.gov/codes/visit)
[Diagram: a VisIt GUI and viewer on a desktop machine connects to an HPC system (nodes node220-node223); each node runs the simulation code linked with the VisIt library and communicates via MPI; commands are sent to the simulation and rendered images are returned.]
No visualization scenario needs to be pre-defined.
COSMO Project – Parallelizing I/O: Using In-situ Techniques for Data Postprocessing
[Diagram: COSMO compute nodes pass selected fields through a communication interface for memory access to COSMO I/O nodes and on to postprocessing nodes, which produce derived products.]
• Substantial I/O reduction is expected.
• Parallel I/O techniques can easily be applied.
COSMO Project – New Dynamical Core: Motivation
• Arithmetic intensity (= FLOPs per memory access):
– high arithmetic intensity -> processor bound -> high % of peak
– low arithmetic intensity -> memory bound -> low % of peak
• COSMO-2 runs at ~3% of peak on the Cray XT4 (see the illustrative kernel below).
[Diagram: arithmetic-intensity spectrum from O(1) to O(log n) to O(n), ranging from stencils, particle methods and sparse linear algebra, over FFT, BLAS 1 and BLAS 2, to dense linear algebra (BLAS 3) and lattice methods. Top500 (Linpack) and hence the focus of HPC system design sit at the high-intensity end; the COSMO dynamical core (stencils on a structured grid) sits at the low-intensity end.]
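To make the low-arithmetic-intensity point concrete, here is a minimal illustrative kernel (not taken from COSMO) of the stencil type that dominates the dynamical core: per grid point it performs about 5 floating-point operations but references 6 array elements, so it stays well below one FLOP per memory access and is bound by memory bandwidth rather than by peak FLOP rate.

```fortran
! Illustrative only: a five-point horizontal Laplacian, the algorithmic
! motif typical of a stencil-based dynamical core.  Each inner iteration
! does ~5 FLOPs against ~6 array references (5 loads + 1 store), i.e.
! less than 1 FLOP per memory access -> memory bound, low % of peak.
SUBROUTINE laplace_2d(phi, lap, ie, je)
  IMPLICIT NONE
  INTEGER, INTENT(IN)  :: ie, je
  REAL,    INTENT(IN)  :: phi(ie, je)
  REAL,    INTENT(OUT) :: lap(ie, je)
  INTEGER :: i, j

  DO j = 2, je - 1
    DO i = 2, ie - 1
      lap(i,j) = phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1) &
                 - 4.0 * phi(i,j)
    END DO
  END DO
END SUBROUTINE laplace_2d
```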
COSMO Project – New Dynamical Core: Analysis of Memory Scaling
• Benchmark on Cray XT4 (quad-core AMD Opteron Budapest, 2.3 GHz).
• Keep the total number of cores constant but change the number of cores used per CPU.
• Cores on a CPU have to share the memory bandwidth.
• Baseline (100%, indicated by red bars) is all cores used (4 per CPU).
[Chart: relative run time (4 cores per CPU = 100%) for the different core placements.]
COSMO Project – New Dynamical Core: Feasibility Study
• Goals
– Rewrite a reduced version of the dynamical core (containing all important algorithmic motifs) as a prototype code.
– Reduce the number of memory accesses (see the sketch below for one such technique).
– Do not use any optimizations that could not be accepted by the COSMO community.
• Results
[Chart: speedup as a function of the number of cores (on a hexa-core AMD Opteron) for different problem sizes per core; a level of 2.0 is marked.]
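One generic way to reduce memory accesses, sketched below under the assumption that this is the kind of transformation meant (it is not the HP2C prototype itself), is to fuse stencil passes so that intermediate fields stay in registers instead of being stored to and re-read from main memory:

```fortran
! Illustrative only (not the HP2C prototype): two passes that would each
! stream full 2-D arrays through memory (write gx/gy, then read them back)
! are fused into one pass; the intermediates become scalars in registers,
! substantially reducing the memory traffic of the combined operation.
SUBROUTINE fused_gradient_update(phi, out, ie, je, alpha)
  IMPLICIT NONE
  INTEGER, INTENT(IN)  :: ie, je
  REAL,    INTENT(IN)  :: alpha
  REAL,    INTENT(IN)  :: phi(ie, je)
  REAL,    INTENT(OUT) :: out(ie, je)
  REAL    :: gx, gy          ! previously full 2-D temporary arrays
  INTEGER :: i, j

  DO j = 2, je - 1
    DO i = 2, ie - 1
      gx = 0.5 * (phi(i+1,j) - phi(i-1,j))     ! former pass 1: x-gradient
      gy = 0.5 * (phi(i,j+1) - phi(i,j-1))     ! former pass 1: y-gradient
      out(i,j) = phi(i,j) + alpha * (gx + gy)  ! former pass 2, now fused
    END DO
  END DO
END SUBROUTINE fused_gradient_update
```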
Outlook on Future Suites at MeteoSwiss: Improve Regionalization & Provide Probability Information
COSMO-1: 1 km mesh size, 8 x 24h deterministic forecasts
COSMO-E: 3 km mesh size, 2 x 5d ensemble forecasts
• Ensemble data assimilation (LETKF) with 40 members for both versions.
• Possible implementation on twin systems:
– System 1 for the operational deterministic parts, System 2 as fail-over.
– Ensemble size reduced in case of failure of one system.
[Diagram: distribution of the suites across System 1 and System 2.]
Conclusion
• We are investing in preparing the future COSMO for higher resolution and for new HPC systems.
• A co-design project involves domain scientists, computer engineers and hardware architects.
• Approach:
– Detailed performance analysis of the existing code.
– Refactoring of the code on current hardware architectures.
– Rewriting of the application core to run on CPU and GPU (even moving to new programming languages).
– Addressing the I/O problem with innovative approaches.
Acknowledgements
Oliver Fuhrer (MeteoSwiss)
Matthew Cordery (CSCS)
Jean-Guillaume Piccinali (CSCS)
Neil Stringfellow (CSCS)
William Sawyer (CSCS)
Davide Tacchella (CSCS)
Petra Baumann (MeteoSwiss)
Ulrich Schättler (DWD)
Andre Walser (MeteoSwiss)
Jean-Marie Bettems (MeteoSwiss)
Jean Favre (CSCS)
Sadaf Alam (CSCS)
Tobias Gysi (SCS)
Thomas Schulthess (CSCS)
David Müller (SCS)