Transcript
http://www.jamstec.go.jp
The Architecture and the Application Performance of the Earth Simulator
Ken’ichi Itakura (JAMSTEC)
15 Dec. 2011, ICTS-TIFR Discussion Meeting 2011
Location of Earth Simulator Facilities
Yokohama
Earth Simulator Site
Tokyo
HQ
Earth Simulator Building
Cross‐sectional View of the Earth Simulator Building
Double Floor for Cables
Air Return Duct
Lightning Conductor
Power Supply System
Air Conditioning System
Seismic Isolation System
Earth Simulator System
Earth Simulator
Earth Simulator
March 2002 – March 2009 (full system until Sep. 2008, half system thereafter)
Peak Performance: 40 TFLOPS
Main Memory: 10 TB
Earth Simulator II (ES2)
March 2009 – present
Peak Performance: 131 TFLOPS
Main Memory: 20 TB
Development of ES started in 1997 with the aim of achieving a comprehensive understanding of global environmental changes such as global warming.
TOP500 List: the Earth Simulator took the top position at ISC02 (June 2002) and held it for two and a half years.
Its construction was completed at the end of February 2002, and operation started on March 1, 2002 at the Earth Simulator Center.
The new Earth Simulator system (ES2) was installed in late 2008 and started operation in March 2009.
- NQS2 batch job system on PNs
- Agent request support
- Usage statistics and resource information management
- Automatic power-saving management
ES2 System Outline
Maximum power consumption: 3,000 kVA
New System Layout
50m
65m
New Earth Simulator (ES2)
Original Earth Simulator (stopped)
The original Earth Simulator opened in March 2002. The new system started operation in March 2009.
Calculation nodes: 160 nodes
  L batch nodes: 156 nodes
  S batch nodes: 2 nodes
  Interactive nodes: 2 nodes
Storage: WORK area; HOME/DATA areas (possible to refer)
Login server (login)
Clustering of nodes to control the system (transparent to users). A cluster consists of 32 nodes; 156 nodes are used for batch jobs (batch clusters).
Four special nodes are provided for TSS and small batch jobs.
Configuration of the TSS cluster:
- TSS nodes: 2 nodes → 1 node (changed in 2010)
- Nodes for single-node batch jobs: 2 nodes → 3 nodes
Configuration of the batch clusters:
- Nodes for multi-node batch jobs
- System disks for user-file staging
Storage of user files for batch jobs on a mass-storage system. Automated file recall (Stage-In) and migration (Stage-Out). All clusters are connected to the mass-storage system through IOCS (Linux workstations).
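The Stage-In/Stage-Out ordering described above can be sketched as follows. This is an illustrative sketch only, with hypothetical paths and helper names; it mimics the ordering (recall inputs, run, migrate outputs), not the actual NQS2/IOCS interfaces.

```python
# Illustrative sketch of staged batch-job I/O; names and paths are hypothetical.
import shutil
from pathlib import Path

def run_batch_job(job_cmd, inputs, work_dir, mass_storage):
    """Stage-In from mass storage, run the job in WORK, Stage-Out the results."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    # Stage-In: recall input files from mass storage into the cluster's WORK area
    for name in inputs:
        shutil.copy(Path(mass_storage) / name, work / name)
    # Run the job against the fast local WORK area (job_cmd is a placeholder)
    outputs = job_cmd(work)
    # Stage-Out: migrate result files back to mass storage, freeing WORK space
    for name in outputs:
        shutil.move(str(work / name), str(Path(mass_storage) / name))
    return outputs
```

The point of the pattern is that compute nodes only ever touch the fast WORK disks; the slower mass-storage system is accessed before and after the job runs.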
70–80%; most of the rest is used for pre/post-processing.
ES2 Operation
Projects
28th Sep. 2010, DOE HPC Best Practices Workshop
FY2011 ES2 Projects
○ Proposed Research Projects: 29 (Earth Science: 18, Innovation: 11)
○ Contract Research Projects: KAKUSHIN: 5, The Strategic Industrial Use (Industrial): 13, CREST: 1
○ JAMSTEC Research Projects: 14 (JAMSTEC, Collaboration Research, Industrial fee-based usage; new projects are accepted at any time)
Users: 565; Organizations: 125 (University: 57, Government: 15, Company: 34, International: 19)
Resource Allocation
Computing Resource Distribution (based on job size, FY2010):
  1–4 nodes: 28.7%
  5–8 nodes: 17.6%
  9–16 nodes: 12.4%
  17–32 nodes: 21.8%
  33–64 nodes: 16.7%
  65+ nodes: 2.9%
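A quick check on the distribution above; the values are from the FY2010 chart, while the grouping "jobs that fit inside one 32-node batch cluster" is my own observation:

```python
# Sanity-check the FY2010 job-size shares quoted above (percent of resources).
shares = {"1-4": 28.7, "5-8": 17.6, "9-16": 12.4,
          "17-32": 21.8, "33-64": 16.7, "65+": 2.9}

total = sum(shares.values())  # should be ~100; rounding gives 100.1
# Jobs of 32 nodes or fewer fit inside a single 32-node batch cluster
small = shares["1-4"] + shares["5-8"] + shares["9-16"] + shares["17-32"]
print(round(total, 1), round(small, 1))  # 100.1 80.5
```

So roughly 80% of the computing resources went to jobs that fit within a single batch cluster.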
ES2 Application Fields (FY2010)
  Atmospheric and Oceanographic Science: 28%
  Solid Earth Science: 16%
  Global Warming (IPCC): 41%
  Epoch-Making Simulation: 11%
  Industrial Use: 4%
ES2 Node Utilization (FY2010)
Stopped Operation on 14 Mar.
ES2 Node Utilization (FY2011)
※Reduced-node (degraded) operation is carried out for power saving.
Month            April    May      June     July     Aug.     Sep.     Oct.
ES2, 2009 (kWh)  3,065    3,105    2,944    3,084    2,973    3,042    3,091
ES, 2008 (kWh)   3,987    4,013    3,978    4,015    4,138    3,986    1,752 (half system)
ES2/ES ratio     76.9%    77.4%    74.0%    76.8%    71.9%    76.3%    (176.5%)
・ES2 power consumption is about 75% of that of ES (roughly a 25% reduction). ・The ratio of peak performance to power consumption is 4.34 times better than that of ES.
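Both claims can be re-derived from the monthly figures above, using the April–September months during which ES ran as a full system:

```python
# Re-derive the power and performance-per-watt claims from the table above.
es2_2009 = [3065, 3105, 2944, 3084, 2973, 3042]  # kWh, Apr-Sep 2009
es_2008  = [3987, 4013, 3978, 4015, 4138, 3986]  # kWh, Apr-Sep 2008

power_ratio = sum(es2_2009) / sum(es_2008)  # ES2 consumption relative to ES
perf_ratio = 131 / 40                       # peak TFLOPS, ES2 vs ES
perf_per_watt_gain = perf_ratio / power_ratio
print(round(power_ratio, 2), round(perf_per_watt_gain, 2))  # 0.76 4.34
```

The 4.34x figure on the slide is exactly the peak-performance ratio divided by the power-consumption ratio.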
Application Performance
ES2 Application ‐1
AFES, OFES, CFES
ES2 Application ‐2
ES2 Application ‐3
ES2 Application ‐4
Performance Evaluation Results with Real Applications on ES

Code Name   Elapsed Time on ES [sec]   #CPUs on ES   Elapsed Time on ES2 [sec] (Speedup)   #CPUs on ES2
PHASE       135.3                      4096          62.2 (2.18)                           1024
NICAM-K*    214.7                      2560          109.3 (1.97)                          640
MSSG        173.9                      4096          86.5 (2.01)                           1024
SpecFEM3D   96.3                       4056          45.5 (2.12)                           1014
Seism3D     48.8                       4096          15.6 (3.13)                           1024

Harmonic mean of speedup ratios: 2.22
ES2 is 2.22 times faster, using a quarter of the CPUs.
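The quoted harmonic mean can be reproduced directly from the per-application speedup ratios in the table:

```python
# Reproduce the harmonic-mean speedup quoted above.
speedups = [2.18, 1.97, 2.01, 2.12, 3.13]  # ES2 vs ES, per application

harmonic_mean = len(speedups) / sum(1 / s for s in speedups)
print(round(harmonic_mean, 2))  # 2.22
```

The harmonic mean is the appropriate average here because speedup is a rate; it weights the slower-scaling codes more heavily than an arithmetic mean would.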
WRF
• WRF (Weather Research and Forecasting Model) is a mesoscale meteorological simulation code developed collaboratively by US institutions, including NCAR (National Center for Atmospheric Research) and NCEP (National Centers for Environmental Prediction). JAMSTEC has optimized WRFV2 on the Earth Simulator (ES2), renewed in 2009, and measured its computational performance.
• As a result, we successfully demonstrated that WRFV2 runs on ES2 with outstanding sustained performance.
• The competition focuses on four of the most challenging benchmarks in the suite:
– Global HPL: the LINPACK TPP benchmark, which measures the floating-point rate of execution for solving a linear system of equations. (DGEMM, also in the suite, measures the floating-point rate of double-precision real matrix-matrix multiplication.)
– Global RandomAccess: measures the rate of integer random updates of memory (GUPS).
– EP STREAM (Triad) per system: a simple synthetic benchmark that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for a simple vector kernel.
– Global FFT: measures the floating-point rate of execution of a double-precision complex one-dimensional Discrete Fourier Transform (DFT).
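For illustration, the Triad kernel and its bandwidth accounting can be sketched in a few lines. The real EP-STREAM benchmark is a tuned C/Fortran code, so this Python version only shows what is being measured, not actual hardware bandwidth:

```python
# A Python sketch of the STREAM Triad kernel and its bandwidth accounting.
# The rate computed here reflects Python overhead, not hardware bandwidth.
import time

def stream_triad(n=1_000_000, q=3.0):
    b = [1.0] * n
    c = [2.0] * n
    t0 = time.perf_counter()
    a = [b[i] + q * c[i] for i in range(n)]  # Triad: a[i] = b[i] + q * c[i]
    dt = time.perf_counter() - t0
    # Triad touches three 8-byte doubles per iteration (two reads, one write)
    gb_per_s = 3 * 8 * n / dt / 1e9
    return a, gb_per_s

a, bw = stream_triad()
print(a[0])  # 7.0  (1.0 + 3.0 * 2.0)
```

Vector machines like the SX-9 do well on this kernel because it is a pure, stride-1 vector operation limited only by memory bandwidth.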
The 2009 HPC Challenge Class 1 Awards:

G-HPL            Achieved       System     Affiliation
1st place        1533 Tflop/s   Cray XT5   ORNL
1st runner-up    736 Tflop/s    Cray XT5   UTK
2nd runner-up    368 Tflop/s    IBM BG/P   LLNL

G-RandomAccess   Achieved       System     Affiliation
1st place        117 GUPS       IBM BG/P   LLNL
1st runner-up    103 GUPS       IBM BG/P   ANL
2nd runner-up    38 GUPS        Cray XT5   ORNL

G-FFT            Achieved       System     Affiliation
1st place        11 Tflop/s     Cray XT5   ORNL
1st runner-up    8 Tflop/s      Cray XT5   UTK
2nd runner-up    7 Tflop/s      NEC SX-9   JAMSTEC

EP-STREAM-Triad (per system)
                 Achieved       System     Affiliation
1st place        398 TB/s       Cray XT5   ORNL
1st runner-up    267 TB/s       IBM BG/P   LLNL
2nd runner-up    173 TB/s       NEC SX-9   JAMSTEC
XT5@ORNL is about 2.3 PF at peak and ES2 is 131 TF, a factor of about 17. However, XT5's G-FFT performance is only about 1.5 times that of ES2.
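The two ratios quoted above can be checked directly from the peak figures and the 2009 award list:

```python
# Quantify the comparison above: peak ratio vs. measured G-FFT ratio.
peak_ratio = 2300 / 131   # XT5@ORNL ~2.3 PF vs ES2 131 TF peak
gfft_ratio = 11 / 7       # G-FFT Tflop/s from the 2009 award list
print(round(peak_ratio, 1), round(gfft_ratio, 2))  # 17.6 1.57
```

The gap between a 17x peak advantage and a 1.5x measured advantage illustrates how strongly G-FFT depends on memory and network bandwidth rather than raw FLOPS.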
New Earth Simulator (ES2, SX-9/E): HPC Challenge Awards 2010