Page 1: 1 Monday, 26 October 2015 © Crown copyright Met Office Computing Update Paul Selwood, Met Office.

1 Thursday 20 April 2023 © Crown copyright

Met Office Computing Update

Paul Selwood, Met Office

Page 2:

The Met Office

National Weather Service

– Global and Local Area

Climate Prediction (Hadley Centre)

Operational and Research activities

Computers:

– 1991-1996 : Cray Y-MP/C90

– 1996-present : Cray T3E

Currently Relocating to Exeter

Page 3:

Relocation: Exeter 2003

Page 4:

Relocation: Exeter 2003

~500 staff already working in Exeter

1 T3E, 1 mainframe, 30 NEC nodes, many servers already moved

1 T3E + mass storage system moving now.

Completion due end November 2003.

Page 5:

Major Applications

Unified Model

– Single code used for NWP forecast and climate prediction

– Submodels (atmosphere, ocean …)

– Grid-point model (regular lat-long)

» non-hydrostatic, semi-implicit, semi-Lagrangian dynamics

» Arakawa C-grid, Charney-Phillips vertical staggering

Variational Assimilation

– Currently 3D-Var, shortly moving to 4D-Var

– Six hour time window

– Increase in satellite observations

Page 6:

What is the Met Office getting?

2003 (Exeter site)

– 30 nodes of NEC SX-6

– Two computer halls for resiliency

– Front end redundancy for failover

– 6x current capability

2005

– additional 15 nodes of next generation machine

– 12.5x current capability

Page 7:

Supercomputer Statistics

                  T3E900    T3E1200   SX-6
Processors        880       640       240
Peak Gflops/PE    0.9 Gf    1.2 Gf    8 Gf
Sustained         10%       10%       40%
Total Sustained   79.2 Gf   76.8 Gf   770 Gf
Memory            120 Gb    170 Gb    1.9 Tb
Disk              1.4 Tb    1.4 Tb    36 Tb
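The "Total Sustained" row appears to be the product of the three rows above it: processors, peak Gflops per PE, and sustained fraction. The sketch below checks that reading using only the figures from the table; the dictionary keys and the function name are illustrative, not taken from the slides.

```python
# Figures transcribed from the Supercomputer Statistics table.
machines = {
    "T3E900":  {"processors": 880, "peak_gflops_per_pe": 0.9, "sustained_fraction": 0.10},
    "T3E1200": {"processors": 640, "peak_gflops_per_pe": 1.2, "sustained_fraction": 0.10},
    "SX-6":    {"processors": 240, "peak_gflops_per_pe": 8.0, "sustained_fraction": 0.40},
}


def total_sustained_gflops(machine):
    """Processors x peak per processor x fraction of peak sustained."""
    return (machine["processors"]
            * machine["peak_gflops_per_pe"]
            * machine["sustained_fraction"])


totals = {name: total_sustained_gflops(m) for name, m in machines.items()}

for name, gflops in totals.items():
    print(f"{name}: {gflops:.1f} Gflops sustained")
```

The two T3E products match the table exactly (79.2 and 76.8 Gflops). The SX-6 product is 768 Gflops, which the table presumably rounds to 770. The 240 SX-6 processors are consistent with 30 nodes of 8 CPUs each.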

Page 8:

The bits (one hall)

[Diagram. Labels: Met Office Networks; Gigabit Ethernet; TX7 12xIA64 (x2); FC Switch; FibreChannel; Mirrored filesystems; User filesystems; NODE (8 CPUs) (x15); IXS (Interconnect)]

Page 9:

Porting

Initial focus has been the porting of operational codes.

Basic port completed for all with no major issues encountered (some minor ones…)

Much easier than C90 to T3E!

Trial suite being assembled for parallel running from October.

Porting system very stable (4 months without reboot!)

Page 10:

Optimisation

Vectorisation

– T3E optimisations not too drastic

– T3E streams encouraged vector-like code

– Decomposition can affect vector length

Memory

– Avoid bank conflicts

Communication

– Relatively slower compared to T3E

Typically 0.5% of lines of code inserted, deleted or changed.
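One way to see why decomposition can affect vector length: if the vectorised inner loop runs along the east-west direction, splitting that direction across more processors shortens the loop each processor executes. The sketch below is illustrative only. It assumes 432 east-west points for the global grid, a 256-element vector register, and an east-west inner loop; none of these values is stated on the slides, and the function names are hypothetical.

```python
import math

# Assumptions for illustration only (not stated on the slides).
GLOBAL_EW_POINTS = 432   # assumed number of east-west grid points
REGISTER_LENGTH = 256    # assumed vector register length in elements


def local_row_length(global_points, n_procs_ew):
    """Largest number of east-west points owned by any one processor."""
    return math.ceil(global_points / n_procs_ew)


def vector_fill(loop_length, register_length=REGISTER_LENGTH):
    """Average fraction of a vector register that is filled when a loop of
    loop_length iterations is processed in register-sized chunks."""
    chunks = math.ceil(loop_length / register_length)
    return loop_length / (chunks * register_length)


for n_procs_ew in (1, 2, 4, 8):
    length = local_row_length(GLOBAL_EW_POINTS, n_procs_ew)
    print(f"{n_procs_ew} processors east-west: "
          f"loop length {length}, register fill {vector_fill(length):.2f}")
```

Under these assumptions, splitting the east-west direction four ways leaves each processor with rows of 108 points, well under half a register. Decomposing mainly in the north-south direction, or collapsing loops, would keep the vectorised loop longer.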

Page 11:

Challenges

How do we schedule work?

– Operational

– Climate Production

– Research / Development

OpenMP within a node?

I/O - needs a rework

– Current access patterns are inefficient

– Packing vectorises poorly

– Packed data sizes too small to utilise best I/O connections

Page 12:

Initial Results (N216L38)

Machine   #PEs   Time (s)
T3E       144    6750
SX6       8      3185
SX6       16     1890

Real job including I/O

Load balance problem?

Can run operationally with just 1 node for current resolutions

Better scalability has been observed for higher resolutions
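The scaling behind the "Load balance problem?" question can be quantified from the three timings in the table. This is a minimal sketch using the standard definitions of speedup and parallel efficiency; the function and variable names are illustrative.

```python
# Timings transcribed from the Initial Results table: (machine, PEs, seconds).
timings = [("T3E", 144, 6750), ("SX6", 8, 3185), ("SX6", 16, 1890)]


def speedup(t_base, t_new):
    """How many times faster the new run is than the baseline run."""
    return t_base / t_new


def parallel_efficiency(p_base, t_base, p_new, t_new):
    """Speedup achieved divided by the increase in processor count."""
    return speedup(t_base, t_new) / (p_new / p_base)


# Doubling the SX6 processor count from 8 to 16.
sx6_speedup = speedup(3185, 1890)
sx6_efficiency = parallel_efficiency(8, 3185, 16, 1890)

# One 8-CPU SX6 node against 144 T3E processors.
t3e_to_sx6 = speedup(6750, 3185)

print(f"SX6 8 -> 16 PEs: speedup {sx6_speedup:.2f}, efficiency {sx6_efficiency:.0%}")
print(f"8 SX6 PEs vs 144 T3E PEs: {t3e_to_sx6:.2f}x faster")
```

Doubling from 8 to 16 PEs gives a speedup of about 1.69, or roughly 84% parallel efficiency, and a single 8-CPU node runs the job about 2.1 times faster than 144 T3E processors.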

Page 13:

Operational Codes

Code                  T3E CPUs   SX6 CPUs
Global Atmosphere     288        8
3D-Var                144        8
1/3 Ocean             60         2
European Shelf Seas   128        4
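The table can be read as the number of T3E CPUs that each SX6 CPU stands in for, per code. The sketch below computes that ratio. It assumes the two CPU counts for each code give comparable run times, which the slide does not state; the variable names are illustrative.

```python
# CPU counts transcribed from the Operational Codes table: (T3E, SX6).
operational_codes = {
    "Global Atmosphere":   (288, 8),
    "3D-Var":              (144, 8),
    "1/3 Ocean":           (60, 2),
    "European Shelf Seas": (128, 4),
}

# T3E CPUs per SX6 CPU for each code.
ratios = {code: t3e / sx6 for code, (t3e, sx6) in operational_codes.items()}

for code, ratio in ratios.items():
    print(f"{code}: {ratio:.0f} T3E CPUs per SX6 CPU")
```

The ratios range from 18 (3D-Var) to 36 (Global Atmosphere).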

Page 14:

Opportunities

More complex climate models

– Higher resolution

– More physical interactions represented (eg more complex chemistry)

Satellite data volumes

Introduce 4D-Var

– Increase observational window to 12 hours

Increase resolutions of models

– Global, Euro-LAM, UK-Mes

– Many different scenarios

Page 15:

Questions?

