Visualization for High-End Weather and Climate Modeling on the NEC SX Series
The 3rd International Workshop on Next Generation Climate Models for Advanced High Performance Computing Facilities
March 29, 2001
Hiroshi Takahara & Toshifumi Takei, NEC Corporation
E-mail: [email protected]
Vector --- tailored to large-scale simulations and huge data (meteorology/climate, CFD, crash analysis, …)
Scalar --- suitable for small-to-medium-sized problems; limited performance scalability due to inter-PE communications
Merits and Demerits of Each Architecture

Vector / Shared memory
  Merits: excellent effective performance; ease of use (auto parallelization)
  Demerits: high cost; limited scalability

Vector / Distributed memory
  Merits: excellent effective performance; high scalability
  Demerits: high cost; difficult parallelization (requires high skills)

Scalar / Shared memory
  Merits: excellent cost/peak performance; ease of use (auto parallelization); wide application range
  Demerits: poor effective performance; limited scalability

Scalar / Distributed memory
  Merits: excellent cost/peak performance; high scalability; wide application range
  Demerits: poor effective performance; difficult parallelization (requires high skills)
Some Views from the Weather & Climate Community on Vector Computers
Shared-memory vector computers manufactured in Japan have a combination of usability and performance...
The purchase of Japanese vector computers would have an immediate impact on climate and weather science in the U.S.
The use of distributed-memory, commodity-processor-based parallel computers increases the needed software investment...
--- USGCRP Report (Dec. 2000)
Pros and Cons of the Validity of the TOP500
Pros: a ranking covering worldwide high-performance computers that carries much sway
Cons: does NOT represent a complete range of applications; too much impact on policy making; changing acceptance among the HPC community because of the increased dominance of business computing vendors (particularly in the lower rankings)
[Chart: number of systems per vendor in the TOP500 (Nov. 2000) --- IBM 215, Sun 92, SGI 67, Cray 47, NEC 23, Fujitsu 17, Hitachi 16, Compaq 11, HP 5, self-made 5, HPTi 1, Intel 1]
IBM: finance, DB, Web, etc. at 72 sites --- #15/34 Charles Schwab, #53 European Patent Office, #93 Sobeys, #102 Deutsche Telekom, #112 Bank Administration Institute (BAI), #120 State Farm, #177 NTT, #213 Chase Manhattan
Sun: finance, DB, Web, etc. at 54 sites --- #136 New York City - Human Resources, #139 Bank Westboro, #140 E-commerce Santa Clara, #169 Airline London, #170 Bank Milano, #171 Bank Munich, #173 Chase GlobalNet, #176 Rakuten **
TOP500 Top 10 (Nov. 2000; Rmax in GFLOPS)
 1. IBM ASCI White                4938  Lawrence Livermore National Laboratory
 2. Intel ASCI Red                2379  Sandia National Labs
 3. IBM ASCI Blue-Pacific         2144  Lawrence Livermore National Laboratory
 4. SGI ASCI Blue Mountain        1608  Los Alamos National Laboratory
 5. IBM SP Power3 375 MHz         1417  Naval Oceanographic Office (NAVOCEANO)
 6. IBM SP Power3 375 MHz         1179  National Centers for Environmental Prediction
 7. Hitachi SR8000-F1/112         1035  Leibniz Rechenzentrum
 8. IBM SP Power3 375 MHz 8-way    929  UCSD/San Diego Supercomputer Center
 9. Hitachi SR8000-F1/100          917  High Energy Accelerator Research Organization (KEK)
10. Cray Inc. T3E1200              892  Government
** Rank 176: Rakuten is the largest cyber mall in Japan!
SX-5 Series / SX-5S (HPC Server) Products
[Chart: peak performance (FLOPS) of the SX-5 product line, from the single-node models up to 5 TFLOPS in multi-node configurations connected via IXS or HIPPI switch]
Single-node SX-5 Series: A Model 64-128 GFLOPS, B Model 32-64 GFLOPS, C Model 16-32 GFLOPS, D Model 8-16 GFLOPS; Be/Ce Models (4 GFLOPS per CPU)
HPC Server SX-5S: 4-8 GFLOPS
Performance of Mission-Critical NWP Codes on the SX Series
[Chart: sustained performance as a percentage of machine peak ("...what you pay for") versus number of CPUs (1 to 47) for mission-critical NWP codes --- IFS, BOM GASP, BOM LAPS, CHMI ALADIN, DMI HIRLAM. Sustained fractions range from about 36% to 55% of peak.]
SX Series in Meteorology / Environmental Science

SX Series at Worldwide Major Meteorological Institutions

Europe
・Danish Meteorological Institute (DMI)
・Czech Hydrometeorological Institute (CHMI)
・Institute for Atmospheric Physics, Germany (IAP)
・Deutsches Klimarechenzentrum (DKRZ)
・Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University (ICMW)
・Istituto Nazionale di Geofisica e Vulcanologia (INGV)

North America
・Atmospheric Environment Service (AES)
・IRI Lamont Doherty

South America
・Instituto Nacional de Pesquisas Espaciais (INPE)

Asia
・Korea Meteorological Administration (KMA)

Japan
・National Institute for Environmental Studies (NIES)
・Japan Marine Science and Technology Center (JAMSTEC)
・Frontier Research System for Global Change

Australia
・Bureau of Meteorology (BOM)/CSIRO
Disk space problem
  Storing all the computational results for each parameter set needs more than several GBytes of disk space.
  Ex. 100*100*100 grid points * 10000 time steps, 5 variables at each grid point --- 200 GBytes

Data transfer bottleneck
  Transferring GB-order data over a network is next to impossible.
  Ex. At an effective rate of 1 MB/sec, 100 GB / (1 MB/sec) = 28 hours

Memory capacity problem
  Loading the large volume of data output by a supercomputer may be difficult on the local machine.

--> Intensive need to grasp simulated results on the fly
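These back-of-the-envelope figures are easy to reproduce. Below is a minimal Fortran sketch that recomputes them; the 4-byte (single-precision) word size is an assumption, since the slide does not state it:

      PROGRAM ESTIM
C     Recompute the slide's disk-space and transfer-time estimates.
C     Assumes 4-byte (single-precision) values per variable.
      REAL*8 BYTES, GBYTES, HOURS
C     100*100*100 grid points, 10000 steps, 5 variables per point
      BYTES  = 100.D0*100.D0*100.D0 * 10000.D0 * 5.D0 * 4.D0
      GBYTES = BYTES / 1.D9
C     Transfer time for 100 GB at an effective rate of 1 MB/sec
      HOURS  = (100.D9 / 1.D6) / 3600.D0
      WRITE(*,*) 'Data volume (GB):  ', GBYTES
      WRITE(*,*) 'Transfer time (h): ', HOURS
      END

This prints a data volume of 200 GB and a transfer time of about 27.8 hours, matching the figures on the slide.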
Memory capacity
  - NWP code: 100-200 array elements per grid point
  - Increasing demand with model resolution and complexity
  - A T319L50 model (40 km mesh) requires 20-40 GBytes
  - Ensemble forecasting with 50 members --> well over 1 TByte
  - Data assimilation / chemical models are much more demanding
  - Climate code: a 1-year simulation takes 30-60 GBytes (T213L50), 2-4 TBytes (T1280L100)
Disk space
  - NCAR: empirically 114 Bytes per MFLOP
  - 5 TBytes/month net growth* (*RCI Workshop, April 2000)
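As a rough cross-check (the grid dimensions and 8-byte word size here are assumptions inferred from the 40 km mesh, not taken from the slide): a 40 km global mesh has on the order of 1000 * 500 horizontal points, so

  1000 * 500 points * 50 levels * (100 to 200) arrays * 8 bytes ≈ 20-40 GBytes,

consistent with the T319L50 figure above.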
Approach A: Conventional Post-processing (Vis5D, GrADS, and many off-the-shelf packages)
Graphical mapping and rendering on the client side --- the approach adopted by many conventional post-processors
◆ Advantages: full exploitation of the server for number crunching and of local machine resources for graphical processing
◆ Drawbacks: challenges in transferring a huge volume of (polygon) data across the network and manipulating it on the local machine
[Diagram: numerical simulation runs on the computing server; mapping, rendering, and image display run on the user's terminal]
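To make the drawback concrete (the figures below are illustrative assumptions, not taken from the slide): an isosurface of 1 million triangles, with 3 vertices of 3 single-precision coordinates each, already amounts to 1,000,000 * 3 * 3 * 4 bytes = 36 MB per frame before normals or scalar values are added. At an effective bandwidth of 1 MB/sec that is about 36 seconds per frame, whereas a compressed 640*480 image is typically tens of KB and transfers in well under a second.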
Approach B: Server-side Visualization (the approach of NEC RVSLIB)
Both mapping and rendering processes on the server side
◆ Advantages: efficient usage of the network, because image data (NOT massive polygon data) is transferred
  Image compression techniques are available to reduce the data further
◆ Drawbacks: increased load on server-side computing and memory resources for the graphical mapping and rendering processes
• Monitoring of an on-going simulation (tracking) and alteration of its parameters (steering) while continuing the simulation
- Constant, reduced data transfer rate between the server and client regardless of the scale of the simulation
--> Reduced cost and effort; efficient use of network bandwidth
Network (LAN/WAN)
RVSLIB/Server: SX, WS; RVSLIB/Client: PC, WS (Java)
Usage of RVSLIB
Movie generation (batch mode)
RVSLIB entry points called from the user's code (user's code / RVSLIB server structure):

      CALL RVS_INIT   --> initialization: in interactive mode, handshake
                          with the client; in batch mode, loading a scenario
      CALL RVS_BFC    --> data management (no data copy)
      [main loop body: time integration]
         CALL RVS_MAIN --> rendering and C/S communication
      CALL RVS_TERM   --> termination
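A minimal self-contained sketch of this structure follows; the slide does not show the argument lists of the RVS_* routines, so the no-argument form and the loop details here are assumptions for illustration:

      PROGRAM SOLVER
C     Minimal sketch of a solver instrumented with RVSLIB.
C     The RVS_* argument lists are not given on the slide; the
C     no-argument form used here is an assumption.
      INTEGER ISTEP, NSTEP
      PARAMETER (NSTEP = 10000)

C     Initialization: handshake with the client (interactive mode)
C     or loading of a scenario (batch mode)
      CALL RVS_INIT
C     Register grid and field data with RVSLIB (no data copy)
      CALL RVS_BFC

      DO 10 ISTEP = 1, NSTEP
C        ... advance the solution by one time step ...
C        Rendering and client/server communication
         CALL RVS_MAIN
   10 CONTINUE

C     Termination
      CALL RVS_TERM
      END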
[Diagram: in batch mode, the server generates movies (AVI etc.) according to a scenario script; an off-line converter produces movies in AVI or MPEG-2]
RVSLIB/Client (GUI, interactive mode): tracking and steering of the user code
- UNIX version based on X/Motif
- Java version for Windows / UNIX
C/S communication protocols:
- Intranet: TCP/IP socket
- Internet/firewall: HTTP
- Single machine: shared memory
Interactive Mode
Batch Mode
Data interfaces with the GrADS and NetCDF formats are supported for post-processing
Best Benefits Gained from RVSLIB

Reduced cost and effort on a trial-and-error basis
- Monitoring of an on-going simulation (tracking) and alteration of its parameters (steering) while continuing the simulation
- Conventional post-processing and batch-mode graphics also available

Efficient use of vector/parallel facilities and the network
- Efficient graphical processing and image creation capitalizing on vector/parallel computing capabilities
- Reduced and almost constant network traffic exploiting image data compression

Animation based on a scenario
- Easily navigable visualization based on a plot described in a scenario file

Library format tailored to a wide spectrum of simulation programs (BFC grid, FEM, multi-block grid, particle simulation, …)
Visualization of Flow Around a Baseball
- Collaboration with the Physical & Chemical Research Institute, Japan
Computation: finite difference method; unsteady, incompressible, viscous flow
Number of grid points: 169 * 92 * 101
Reynolds number: 100,000 - 200,000
Ball speed: 75 - 150 km/h
Applications
Computation timing data (10000 time steps):
- Solver only (no visualization): 27150 sec (7.54 h)
- + Visualization with the same viewing: 28000 sec (7.78 h)
- + Visualization with variable viewing: 28150 sec (7.82 h)
---> Almost no additional CPU time is required for visualization, thanks to high-speed visualization on the SX Series
# Computation on an SX-5S1 (4 GFLOPS)
# Visualization every 10 time steps (contours and tracers)
# Tracer movement calculated at each time step
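In relative terms (derived directly from the timings above): visualization with the same viewing adds 28000 - 27150 = 850 sec, i.e. about 3.1% overhead over the 10000 steps, and variable viewing adds 28150 - 27150 = 1000 sec, about 3.7%.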
Post-processing with RVSLIB - Collaboration with BoM/Australia -
[Diagram: at the Bureau of Meteorology (Australia), the user's solver and RVSLIB Server run on an SX-4/32 server for numerical weather prediction; compressed image data is sent to the RVSLIB Client, and NetCDF format files are written for off-line visualization]
Oceanic circulation simulated with ACOM2
Atmospheric simulation --- relative humidity around Australia represented by isosurfaces