Visualization for High-End Weather and Climate Modeling on the NEC SX Series
The 3rd International Workshop on Next Generation Climate Models for Advanced High Performance Computing Facilities
March 29, 2001
Hiroshi Takahara & Toshifumi Takei, NEC Corporation
E-mail: [email protected]
Vector --- tailored to large-scale simulations and huge data (meteorology/climate, CFD, crash analysis, …)
Scalar --- suitable for small-to-medium-sized problems; limited performance scalability due to inter-PE communications
Merits and Demerits of Each Architecture

Vector / Shared memory
  Merits: excellent effective performance; ease of use (auto parallelization)
  Demerits: high cost; limited scalability

Vector / Distributed memory
  Merits: excellent effective performance; high scalability
  Demerits: high cost; difficult parallelization (requires high skills)

Scalar / Shared memory
  Merits: excellent cost/peak performance; ease of use (auto parallelization); wide application range
  Demerits: poor effective performance; limited scalability

Scalar / Distributed memory
  Merits: excellent cost/peak performance; high scalability; wide application range
  Demerits: poor effective performance; difficult parallelization (requires high skills)
Some Views from the Weather & Climate Community on Vector Computers
Shared-memory vector computers manufactured in Japan have a combination of usability and performance...
The purchase of Japanese vector computers would have an immediate impact on climate and weather science in the U.S.
The use of distributed-memory, commodity-processor-based parallel computers increases the needed software investment...
--- USGCRP Report (Dec. 2000)
Pros and Cons of the Validity of the TOP500
Pros: a ranking covering worldwide high-performance computers that carries much sway
Cons: does NOT represent a complete range of applications; too much impact on policy making; changing acceptance among the HPC community because of the increased dominance of business computing vendors (particularly in the lower rankings)
[Chart: number of systems per vendor in the TOP500 (Nov. 2000) --- IBM 215, Sun 92, SGI 67, Cray 47, NEC 23, Fujitsu 17, Hitachi 16, Compaq 11, HP 5, self-made 5, HPTi 1, Intel 1]
IBM: finance, DB, Web, etc. at 72 sites --- #15/34 Charles Schwab, #53 European Patent Office, #93 Sobeys, #102 Deutsche Telekom, #112 Bank Administration Institute (BAI), #120 State Farm, #177 NTT, #213 Chase Manhattan
Sun: finance, DB, Web, etc. at 54 sites --- #136 New York City - Human Resources, #139 Bank Westboro, #140 E-commerce Santa Clara, #169 Airline London, #170 Bank Milano, #171 Bank Munich, #173 Chase GlobalNet, #176 Rakuten **
TOP500 Top 10 (Nov. 2000; Rmax in GFLOPS)
 1. IBM ASCI White                4938  Lawrence Livermore National Laboratory
 2. Intel ASCI Red                2379  Sandia National Labs
 3. IBM ASCI Blue-Pacific         2144  Lawrence Livermore National Laboratory
 4. SGI ASCI Blue Mountain        1608  Los Alamos National Laboratory
 5. IBM SP Power3 375 MHz         1417  Naval Oceanographic Office (NAVOCEANO)
 6. IBM SP Power3 375 MHz         1179  National Centers for Environmental Prediction
 7. Hitachi SR8000-F1/112         1035  Leibniz Rechenzentrum
 8. IBM SP Power3 375 MHz 8-way    929  UCSD/San Diego Supercomputer Center
 9. Hitachi SR8000-F1/100          917  High Energy Accelerator Research Organization (KEK)
10. Cray Inc. T3E1200              892  Government
** Rank 176: Rakuten is the largest cyber mall in Japan!
SX-5 Series / SX-5S (HPC Server) Products
[Chart: peak performance (FLOPS) of the SX-5 product line, from the single-node models up to 5 TFLOPS in multi-node configurations connected via IXS or HIPPI switch]
Single-node SX-5 Series: A Model 64-128 GFLOPS, B Model 32-64 GFLOPS, C Model 16-32 GFLOPS, D Model 8-16 GFLOPS; Be/Ce Models (4 GFLOPS per CPU)
HPC Server SX-5S: 4-8 GFLOPS
Performance of Mission-Critical NWP Codes on the SX Series
[Chart: sustained performance as a percentage of machine peak ("...what you pay for") versus number of CPUs (1 to 47) for mission-critical NWP codes --- IFS, BOM GASP, BOM LAPS, CHMI ALADIN, DMI HIRLAM. Sustained fractions range from about 36% to 55% of peak.]
SX Series in Meteorology / Environmental Science

SX Series at Worldwide Major Meteorological Institutions

Europe
・Danish Meteorological Institute (DMI)
・Czech Hydrometeorological Institute (CHMI)
・Institute for Atmospheric Physics, Germany (IAP)
・Deutsches Klimarechenzentrum (DKRZ)
・Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University (ICMW)
・Istituto Nazionale di Geofisica e Vulcanologia (INGV)

North America
・Atmospheric Environment Service (AES)
・IRI Lamont Doherty

South America
・Instituto Nacional de Pesquisas Espaciais (INPE)

Asia
・Korea Meteorological Administration (KMA)

Japan
・National Institute for Environmental Studies (NIES)
・Japan Marine Science and Technology Center (JAMSTEC)
・Frontier Research System for Global Change

Australia
・Bureau of Meteorology (BOM)/CSIRO
Disk space problem
  Storing all the computational results for each parameter set needs more than several GBytes of disk space.
  Ex. 100*100*100 grid points * 10000 time steps, 5 variables at each grid point --- 200 GBytes

Data transfer bottleneck
  Transferring GB-order data over a network is next to impossible.
  Ex. At an effective rate of 1 MB/sec, 100 GB / (1 MB/sec) = 28 hours

Memory capacity problem
  Loading the large volume of data output by a supercomputer may be difficult on the local machine.

--> Intensive need to grasp simulated results on the fly
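These back-of-the-envelope figures are easy to reproduce. Below is a minimal Fortran sketch that recomputes them; the 4-byte (single-precision) word size is an assumption, since the slide does not state it:

      PROGRAM ESTIM
C     Recompute the slide's disk-space and transfer-time estimates.
C     Assumes 4-byte (single-precision) values per variable.
      REAL*8 BYTES, GBYTES, HOURS
C     100*100*100 grid points, 10000 steps, 5 variables per point
      BYTES  = 100.D0*100.D0*100.D0 * 10000.D0 * 5.D0 * 4.D0
      GBYTES = BYTES / 1.D9
C     Transfer time for 100 GB at an effective rate of 1 MB/sec
      HOURS  = (100.D9 / 1.D6) / 3600.D0
      WRITE(*,*) 'Data volume (GB):  ', GBYTES
      WRITE(*,*) 'Transfer time (h): ', HOURS
      END

This prints a data volume of 200 GB and a transfer time of about 27.8 hours, matching the figures on the slide.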
Memory capacity
  - NWP code: 100-200 array elements per grid point
  - Increasing demand with model resolution and complexity
  - A T319L50 model (40 km mesh) requires 20-40 GBytes
  - Ensemble forecasting with 50 members --> well over 1 TByte
  - Data assimilation / chemical models are much more demanding
  - Climate code: a 1-year simulation takes 30-60 GBytes (T213L50), 2-4 TBytes (T1280L100)
Disk space
  - NCAR: empirically 114 Bytes per MFLOP
  - 5 TBytes/month net growth* (*RCI Workshop, April 2000)
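As a rough cross-check (the grid dimensions and 8-byte word size here are assumptions inferred from the 40 km mesh, not taken from the slide): a 40 km global mesh has on the order of 1000 * 500 horizontal points, so

  1000 * 500 points * 50 levels * (100 to 200) arrays * 8 bytes ≈ 20-40 GBytes,

consistent with the T319L50 figure above.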
Approach A: Conventional Post-processing (Vis5D, GrADS, and many off-the-shelf packages)
Graphical mapping and rendering on the client side --- the approach adopted by many conventional post-processors
◆ Advantages: full exploitation of the server for number crunching and of local machine resources for graphical processing
◆ Drawbacks: challenges in transferring a huge volume of (polygon) data across the network and manipulating it on the local machine
[Diagram: numerical simulation runs on the computing server; mapping, rendering, and image display run on the user's terminal]
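To make the drawback concrete (the figures below are illustrative assumptions, not taken from the slide): an isosurface of 1 million triangles, with 3 vertices of 3 single-precision coordinates each, already amounts to 1,000,000 * 3 * 3 * 4 bytes = 36 MB per frame before normals or scalar values are added. At an effective bandwidth of 1 MB/sec that is about 36 seconds per frame, whereas a compressed 640*480 image is typically tens of KB and transfers in well under a second.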
Approach B: Server-side Visualization (the approach of NEC RVSLIB)
Both mapping and rendering processes on the server side
◆ Advantages: efficient usage of the network, because image data (NOT massive polygon data) is transferred
  Image compression techniques are available to reduce the data further
◆ Drawbacks: increased load on server-side computing and memory resources for the graphical mapping and rendering processes
• Monitoring of an on-going simulation (tracking) and alteration of its parameters (steering) while continuing the simulation
- Constant, reduced data transfer rate between the server and client regardless of the scale of the simulation
--> Reduced cost and effort; efficient use of network bandwidth
Network (LAN/WAN)
RVSLIB/Server: SX, WS; RVSLIB/Client: PC, WS (Java)
Usage of RVSLIB
Movie generation (batch mode)
RVSLIB entry points called from the user's code (user's code / RVSLIB server structure):

      CALL RVS_INIT   --> initialization: in interactive mode, handshake
                          with the client; in batch mode, loading a scenario
      CALL RVS_BFC    --> data management (no data copy)
      [main loop body: time integration]
         CALL RVS_MAIN --> rendering and C/S communication
      CALL RVS_TERM   --> termination
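A minimal self-contained sketch of this structure follows; the slide does not show the argument lists of the RVS_* routines, so the no-argument form and the loop details here are assumptions for illustration:

      PROGRAM SOLVER
C     Minimal sketch of a solver instrumented with RVSLIB.
C     The RVS_* argument lists are not given on the slide; the
C     no-argument form used here is an assumption.
      INTEGER ISTEP, NSTEP
      PARAMETER (NSTEP = 10000)

C     Initialization: handshake with the client (interactive mode)
C     or loading of a scenario (batch mode)
      CALL RVS_INIT
C     Register grid and field data with RVSLIB (no data copy)
      CALL RVS_BFC

      DO 10 ISTEP = 1, NSTEP
C        ... advance the solution by one time step ...
C        Rendering and client/server communication
         CALL RVS_MAIN
   10 CONTINUE

C     Termination
      CALL RVS_TERM
      END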
[Diagram: in batch mode, the server generates movies (AVI etc.) according to a scenario script; an off-line converter produces movies in AVI or MPEG-2]
RVSLIB/Client (GUI, interactive mode): tracking and steering of the user code
- UNIX version based on X/Motif
- Java version for Windows / UNIX
C/S communication protocols:
- Intranet: TCP/IP socket
- Internet/firewall: HTTP
- Single machine: shared memory
Interactive Mode
Batch Mode
Data interfaces with the GrADS and NetCDF formats are supported for post-processing
Best Benefits Gained from RVSLIB

Reduced cost and effort on a trial-and-error basis
- Monitoring of an on-going simulation (tracking) and alteration of its parameters (steering) while continuing the simulation
- Conventional post-processing and batch-mode graphics also available

Efficient use of vector/parallel facilities and the network
- Efficient graphical processing and image creation capitalizing on vector/parallel computing capabilities
- Reduced and almost constant network traffic exploiting image data compression

Animation based on a scenario
- Easily navigable visualization based on a plot described in a scenario file

Library format tailored to a wide spectrum of simulation programs (BFC grid, FEM, multi-block grid, particle simulation, …)
Visualization of Flow Around a Baseball
- Collaboration with the Physical & Chemical Research Institute, Japan
Computation: finite difference method; unsteady, incompressible, viscous flow
Number of grid points: 169 * 92 * 101
Reynolds number: 100,000 - 200,000
Ball speed: 75 - 150 km/h
Applications
Computation timing data (10000 time steps):
- Solver only (no visualization): 27150 sec (7.54 h)
- + Visualization with the same viewing: 28000 sec (7.78 h)
- + Visualization with variable viewing: 28150 sec (7.82 h)
---> Almost no additional CPU time is required for visualization, thanks to high-speed visualization on the SX Series
# Computation on an SX-5S1 (4 GFLOPS)
# Visualization every 10 time steps (contours and tracers)
# Tracer movement calculated at each time step
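In relative terms (derived directly from the timings above): visualization with the same viewing adds 28000 - 27150 = 850 sec, i.e. about 3.1% overhead over the 10000 steps, and variable viewing adds 28150 - 27150 = 1000 sec, about 3.7%.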
Post-processing with RVSLIB - Collaboration with BoM/Australia -
[Diagram: at the Bureau of Meteorology (Australia), the user's solver and RVSLIB Server run on an SX-4/32 server for numerical weather prediction; compressed image data is sent to the RVSLIB Client, and NetCDF format files are written for off-line visualization]
Oceanic circulation simulated with ACOM2
Atmospheric simulation --- relative humidity around Australia represented by isosurfaces