KAIST: Brief Presentation of Earth Simulation Center (Jang, Jae-Wan)
Transcript
Page 1:

Brief Presentation of Earth Simulation Center

Jang, Jae-Wan

Page 2:

Hardware configuration

Highly parallel vector supercomputer of the distributed-memory type

640 Processor nodes (PNs)

Each PN: 8 vector-type arithmetic processors (APs)

16 GB of main memory per PN

Remote control and I/O parts
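
A quick consistency check (my own arithmetic, using the commonly quoted figure of 8 Gflops per AP, which is not stated on this slide): 640 PNs × 8 APs per PN = 5,120 APs in total, 640 PNs × 16 GB = roughly 10 TB of aggregate main memory, and 5,120 APs × 8 Gflops gives a theoretical peak of about 40 Tflops.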

Page 3:

Arithmetic processor

Page 4:

Processor node

Page 5:

Processor node

Page 6:

Interconnection network

Page 7:

Interconnection network

Page 8:

Earth Simulator Research and Development Center (building approximately 65 m × 50 m)

Page 9:

Software

OS: NEC's UNIX-based SUPER-UX

Programming model

Supported languages: Fortran 90, C, C++ (modified for the ES)

Parallelization level    hybrid                    flat
Inter-PN                 HPF / MPI                 HPF / MPI
Intra-PN                 Microtasking / OpenMP
AP                       Automatic vectorization
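
To make the hybrid model concrete, here is a minimal sketch (my illustration, not code from the slides) of the programming style this table describes: MPI between processor nodes, OpenMP threads inside a node, and a simple inner loop the compiler can vectorize on each AP.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    static double a[N], b[N];
    double local = 0.0, global = 0.0;
    int rank, size, i;

    MPI_Init(&argc, &argv);               /* inter-PN level: one MPI process per node */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* intra-PN level: OpenMP threads (one per AP); the stride-1 inner loop
       is the kind of loop the automatic vectorizer handles on each AP */
    #pragma omp parallel for reduction(+:local)
    for (i = 0; i < N; i++)
        local += a[i] * b[i];

    /* combine the per-node partial sums across all PNs */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("dot product = %f (from %d nodes)\n", global, size);

    MPI_Finalize();
    return 0;
}

In the flat model the same dot product would instead be decomposed over MPI processes only, one per AP, with no OpenMP layer.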

Page 10:

Earth Simulator Center

First results from the Earth Simulator

Resolution: 300 km

Page 11:

Earth Simulator Center

First results from the Earth Simulator

Resolution: 120 km

Page 12:

Earth Simulator Center

First results from the Earth Simulator

Resolution: 20 km

Page 13:

Earth Simulator Center

First results from the Earth Simulator

Resolution: 10 km

Page 14:

First results from the Earth Simulator

Ocean Circulation Model (MOM3, developed by GFDL)
Resolution: 0.1° × 0.1° (about 10 km)
Initial condition: Levitus data (1982)
Computer resources: 175 nodes, elapsed time 8,100 hours

Page 15:

First results from the Earth Simulator

Ocean Circulation Model (MOM3, developed by GFDL)
Comparison figure: resolution 0.1° × 0.1° (about 10 km) vs. 1° × 1° (about 100 km)

Page 16:

Terascale Cluster: System X

Virginia Tech, Apple, Mellanox, Cisco, and Liebert

2003. 3. 16

Daewoo Lee

Page 17:

Terascale Cluster: System X

A Groundbreaking Supercomputer Cluster with Industrial Assistance

Apple, Mellanox, Cisco, and Liebert

$5.2 million for hardware

10,280 GFlops sustained (Linpack) out of 17,600 GFlops peak, with 1,100 nodes (ranked 3rd on the TOP500 supercomputer list)
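
A back-of-the-envelope check on the peak figure (my arithmetic, assuming the 2.0 GHz clock of the original G5 nodes, which is not stated on this slide): each PowerPC 970 can retire 4 double-precision floating-point results per cycle via its two fused multiply-add units, i.e. about 8 Gflops per CPU; 1,100 nodes × 2 CPUs × 8 Gflops = 17,600 GFlops peak, with 10,280 GFlops being the measured Linpack value.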

Page 18:

Goals

Computational Science and Engineering Research: Nanoscale Electronics, Quantum Chemistry, Molecular Statistics, Fluid Dynamics, Large-Scale Network Emulation, Optimal Design

Experimental System: Fault Tolerance and Migration, Queuing System and Scheduler, Distributed Operating System, Parallel Filesystem, Middleware for Grids, Authentication/Security System

Dual usage mode (90% of computational cycles devoted to production use)

Page 19:

Hardware Architecture

Node: Apple G5 platform

Dual IBM PowerPC 970 (64-bit CPUs)

Primary communication:

InfiniBand by Mellanox (20 Gbps full duplex, fat-tree topology)

Secondary Communication

Gigabit Ethernet by Cisco

Cooling System by Liebert

Page 20:

Software

Mac OS X (FreeBSD based)

MPI-2 (MPICH-2)

Supports C/C++/Fortran compilation

Déjà vu: transparent fault-tolerance system. Maintains the computation by transparently migrating a failed application to another node, keeping the application intact.
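
Since the message-passing layer is MPI-2 (MPICH-2), a small sketch of one MPI-2-specific feature, one-sided communication, illustrates what the "2" adds over classic send/receive. This is my generic illustration, not System X code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value = -1;
    int payload;                 /* kept unmodified until the closing fence */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* expose one integer per process as a window for one-sided access */
    MPI_Win_create(&value, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    payload = rank;
    MPI_Win_fence(0, win);
    /* each rank writes its rank directly into its right neighbour's window,
       without the neighbour posting a matching receive */
    MPI_Put(&payload, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);

    printf("rank %d received %d from its left neighbour\n", rank, value);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}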

Page 21:

Reference

Terascale Cluster web site: http://computing.vt.edu/research_computing/terascale

Page 22:

4th fastest supercomputer

Tungsten

PAK, EUNJI

Page 23:

4th: NCSA Tungsten

Top500.org

National Center for Supercomputing Applications (NCSA)

University of Illinois at Urbana-Champaign

Page 24:

Tungsten Architecture [1/3]

Tungsten

Xeon 3.0 GHz Dell cluster

2,560 processors

3 GB memory/node

Peak performance: 15.36 TF

Top 500 list debut: #4 (9.819 TF, November 2003)

Currently 4th fastest supercomputer in the world
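
A rough check of the peak figure (my arithmetic, not from the slide): a Xeon of this generation peaks at 2 double-precision flops per cycle, so about 6 Gflops at 3.0 GHz; 2,560 processors × 6 Gflops = 15.36 TF, matching the number above.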

Page 25:

Tungsten Architecture [2/3]

Components (architecture diagram):

Compute nodes: 1,280 nodes (2,560 processors), Dell PowerEdge 1750 with 3 GB DDR SDRAM and dual Intel Xeon 3.06 GHz, connected by Myrinet

I/O nodes: 104 nodes, 122 TB of shared storage

Software stack: Linux 2.4.20 (Red Hat 9.0), cluster file system, compilers (Intel Fortran 77/90/95, C, C++; GNU Fortran 77, C, C++), LSF + Maui scheduler, user applications

Page 26:

Tungsten Architecture [3/3]

1,450 nodes: Dell PowerEdge 1750 servers

Intel Xeon 3.06 GHz: peak performance 6.12 GFLOPS per processor

1,280 compute nodes, 104 I/O nodes

Parallel I/O: 11.1 gigabytes per second (GB/s) of I/O throughput

Complements the cluster's 9.8 TFLOPS of computational capability

104-node I/O sub-cluster with more than 120 TB (node-local: 73 GB, shared: 122 TB)

Page 27:

Applications on Tungsten [1/3]

PAPI and PerfSuite

PAPI: portable interface to hardware performance counters

PerfSuite: set of tools for performance analysis on Linux platforms
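
A minimal sketch of how a code might read hardware counters through PAPI (my illustration; which preset events are available, and whether they can be counted together, depends on the platform):

#include <stdio.h>
#include <papi.h>

#define N 1000000

int main(void)
{
    int event_set = PAPI_NULL;
    long_long counts[2];           /* PAPI's 64-bit counter type */
    static double a[N], sum = 0.0;
    int i;

    /* initialise the library and build an event set with two common presets */
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) return 1;
    PAPI_create_eventset(&event_set);
    PAPI_add_event(event_set, PAPI_TOT_CYC);   /* total cycles */
    PAPI_add_event(event_set, PAPI_FP_OPS);    /* floating-point operations */

    for (i = 0; i < N; i++) a[i] = 0.5 * i;

    PAPI_start(event_set);
    for (i = 0; i < N; i++) sum += a[i] * a[i];   /* region being measured */
    PAPI_stop(event_set, counts);

    printf("sum=%g  cycles=%lld  fp ops=%lld\n", sum, counts[0], counts[1]);
    return 0;
}

PerfSuite layers whole-program tools (such as psrun) on top of this kind of counter access, so no source changes are needed for a first profile.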

Page 28:

Applications on Tungsten [2/3]

PAPI and PerfSuite

Page 29:

Applications on Tungsten [3/3]

CHARMM (Harvard version): Chemistry at HARvard Macromolecular Mechanics

General-purpose molecular mechanics, molecular dynamics, and vibrational analysis package

Amber 7.0: a set of molecular mechanical force fields for the simulation of biomolecules

Package of molecular simulation programs
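
As an illustration of the kind of inner loop such molecular dynamics packages spend their time in (a generic velocity-Verlet sketch of my own, not CHARMM or Amber code):

/* One velocity-Verlet step for n particles in 1D (toy illustration).
   x: positions, v: velocities, f: forces, m: mass, dt: time step.
   Real MD packages do the same update in 3D, with a force field
   (bonds, angles, electrostatics, ...) supplying f. */
void verlet_step(int n, double *x, double *v, double *f,
                 double m, double dt,
                 void (*compute_forces)(int, const double *, double *))
{
    int i;
    for (i = 0; i < n; i++) {
        v[i] += 0.5 * dt * f[i] / m;   /* half-kick with the old forces */
        x[i] += dt * v[i];             /* drift */
    }
    compute_forces(n, x, f);           /* new forces from the force field */
    for (i = 0; i < n; i++)
        v[i] += 0.5 * dt * f[i] / m;   /* second half-kick */
}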

Page 30:

MPP2 Supercomputer
The world's largest Itanium2 cluster

Molecular Science Computing Facility, Pacific Northwest National Laboratory

2004. 3. 16

Presentation : Kim SangWon

Page 31:

Contents

MPP2 Supercomputer Overview

Configuration

HP rx2600 (Longs Peak) Node

QsNet ELAN Interconnect Network

System/Application Software

File System

Future Plan

Page 32:

MPP2 Overview

MPP2: the High Performance Computing System-2

At the Molecular Science Computing Facility in the William R. Wiley Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory

The fifth-fastest supercomputer in the world on the November 2003 TOP500 list

Page 33:

MPP2 Overview

System name: MPP2
Linux supercomputer cluster
11.8 (8.633) teraflops
6.8 terabytes of memory

Purpose: production

Platform: HP Integrity rx2600, dual Itanium2 1.5 GHz

Nodes: 980 (processors: 1,960)

¾ megawatt of power, 220 tons of air conditioning, 4,000 sq. ft.

Cost: $24.5 million (estimated)

Backup power: generator and UPS
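
A quick consistency check on the peak number (my arithmetic, not from the slide): each 1.5 GHz Itanium2 can retire 4 flops per cycle, i.e. 6 Gflops; 1,960 processors × 6 Gflops ≈ 11.8 teraflops, with 8.633 teraflops being the measured Linpack figure.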

Page 34:

Configuration (Phase 2b)

Configuration diagram: 1,856 Madison batch CPUs (928 compute nodes); 4 login nodes with 4 Gb-Enet; 2 system management nodes; 1,900 next-generation Itanium® processors in total; 11.4 TF peak, 6.8 TB of memory; SAN / 53 TB with Lustre; Elan3 interconnect operational, Elan4 not yet operational

Operational: September 2003

Page 35:

HP rx2600 Longs Peak Node Architecture

Each node has: 2 Intel Itanium 2 processors (1.5 GHz)

6.4 GB/s system bus

8.5 GB/s memory bus

12 GB of RAM

1 1000T Connection

1 100T Connection

1 Serial Connection

2 Elan3 Connections

Node diagram: two Elan3 adapters on PCI-X2 (1 GB/s); 2 × SCSI160

Page 36:

QsNet ELAN Interconnect Network

High bandwidth, ultra-low latency, and scalability: 900 Mbytes/s user-space-to-user-space bandwidth

1,024 nodes for a standard QsNet configuration, rising to 4,096 in QsNetII systems

Optimized libraries for common distributed memory programming models exploit the full capabilities of the base hardware.
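
Bandwidth and latency figures like these are usually measured with a simple ping-pong test between two nodes; a minimal sketch (generic MPI of my own, not Quadrics-specific code):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define NBYTES (1 << 20)   /* 1 MB message */
#define REPS   100

int main(int argc, char **argv)
{
    static char buf[NBYTES];
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof(buf));

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {            /* rank 0 sends, then waits for the echo */
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {     /* rank 1 echoes every message back */
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double per_msg = (t1 - t0) / (2.0 * REPS);   /* one-way time */
        printf("one-way time %.1f us, bandwidth %.1f MB/s\n",
               per_msg * 1e6, NBYTES / per_msg / 1e6);
    }

    MPI_Finalize();
    return 0;
}

Run with two ranks placed on different nodes; small messages expose latency, large ones expose bandwidth.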

Page 37:

Software on MPP2 (1/2)

System software: Operating System: Red Hat Linux 7.2 Advanced Server

NWLinux: tailored to IA64 clusters (2.4.18 kernel with various patches)

Cluster management: Resource Management System (RMS) by Quadrics, a single-point interface to the system for resource management (monitoring, fault diagnosis, data collection, allocating CPUs, parallel job execution, ...)

Job management software: LSF (Load Sharing Facility) batch scheduler; QBank: controls and manages CPU resources allocated to projects or users

Compiler software: C (ecc), F77/F90/F95 (efc), g++

Code development: Etnus TotalView

A parallel and multithreaded application debugger

Vampir: a GUI-driven front end used to visualize profile data from a program run

gdb

Page 38:

Software on MPP2 (2/2)

Application software: quantum chemistry codes

GAMESS (the General Atomic and Molecular Electronic Structure System): performs a variety of ab initio molecular orbital (MO) calculations

MOLPRO: an advanced ab initio quantum chemistry software package

NWChem: computational chemistry software developed by EMSL

ADF (Amsterdam Density Functional) 2000: software for first-principles electronic structure calculations via Density Functional Theory (DFT)

General molecular modeling software: Amber

Unstructured mesh modeling codes: NWGrid (grid generator)

hybrid mesh generation, mesh optimization, and dynamic mesh maintenance; NWPhys (unstructured mesh solvers)

a 3D, full-physics, first principles, time-domain, free-Lagrange code for parallel processing using hybrid grids.

Page 39:

File System on MPP2

Four file systems available on the cluster:

Local filesystem (/scratch)

On each of the compute nodes

Non-persistent storage area provided to a parallel job running on that node.

NFS filesystem (/home): where user home directories and files are located

Uses RAID-5 for reliability

Lustre global filesystem (/dtemp): designed for the world's largest high-performance compute clusters

Aggregate write rate of 3.2 Gbyte/s.

Restart files and files needed for post analysis.

Long term global scratch space

AFS filesystem (/msrc): on the front-end (non-compute) nodes
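
The Lustre /dtemp area is where a parallel job would typically write the shared restart files mentioned above. A minimal MPI-IO sketch of that pattern (my illustration; the path and sizes are purely for show):

#include <mpi.h>

#define LOCAL_N 1024   /* doubles written by each rank (illustrative) */

int main(int argc, char **argv)
{
    double data[LOCAL_N];
    int rank, i;
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < LOCAL_N; i++) data[i] = rank + i * 1e-3;

    /* every rank writes its own slice of one shared restart file on /dtemp */
    MPI_File_open(MPI_COMM_WORLD, "/dtemp/restart.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    offset = (MPI_Offset)rank * LOCAL_N * sizeof(double);
    MPI_File_write_at_all(fh, offset, data, LOCAL_N, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

Per-rank temporary data that does not need to survive the job would go to the node-local /scratch instead.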

Page 40:

Future Plan

MPP2 will be upgraded with the faster Quadrics QsNetII interconnect in early 2004

Configuration diagram (after the upgrade): 1,856 Madison batch CPUs (928 compute nodes); Elan4 interconnect; 4 login nodes with 4 Gb-Enet; 2 system management nodes; SAN / 53 TB with Lustre

Page 41:

Bluesky Supercomputer

Top 500 Supercomputers

CS610 Parallel Processing

Donghyouk Lim

(Dept of Computer Science, KAIST)

Page 42:

Contents

Introduction

National Center for Atmospheric Research

Scientific Computing Division

Hardware

Software

Recommendations for usage

Related Link

Page 43:

Introduction

Bluesky: 13th-ranked supercomputer in the world

Clustered Symmetric Multi-Processing (SMP) system

1,600 IBM POWER4 processors

Peak of 8.7 TFLOPS

Page 44:

National Center for Atmospheric Research

Established in 1960

Located in Boulder, Colorado

Research areas: Earth system

Climate change

Changes in atmospheric composition

Page 45:

Scientific Computing Division

Research on high-performance supercomputing

Computing resources: Bluesky (IBM Cluster 1600 running AIX): 13th place

blackforest (IBM SP RS/6000 running AIX) : 80th place

Chinook complex: Chinook (SGI Origin3800 running IRIX) and Chinook (SGI Origin2100 running IRIX)

Page 46:

Hardware

Processors: 1,600 POWER4 processors at 1.3 GHz

each can perform up to 4 fp operations per cycle

Peak of 8.7 TFLOPS

Memory: 2 GB of memory per processor

memory on a node is shared between processors on that node

Memory caches:

L1 cache: 64 KB I-cache, 32 KB D-cache, direct mapped

L2 cache: shared per pair of processors, 1.44 MB, 8-way set associative

L3 cache: 32 MB, 512-byte cache line, 8-way set associative

Page 47:

Hardware

Computing nodes: 8-way processor nodes: 76

32-way processor nodes: 25

32-processor nodes for running interactive jobs: 4

Separate nodes for user logins

System support nodes: 12 nodes dedicated to the General Parallel File System (GPFS)

Four nodes dedicated to HiPPI communications to the Mass Storage System

Two master nodes dedicated to controlling LoadLeveler operations

One dedicated system monitoring node

One dedicated test node for system administration, upgrades, testing

Page 48:

Hardware

Storage: RAID disk storage capacity: 31.0 TB total

Each user application can access 120 GB of temporary space

Interconnect fabric: SP Switch2 (“Colony” switch)

Two full-duplex network paths to increase throughput

Bandwidth: 1.0 GB per second bidirectional

Worst-case latency: 2.5 microseconds

HiPPI (High-Performance Parallel Interface) to the Mass Storage System

Gigabit Ethernet network

Page 49:

Software

Operating System: AIX (IBM-proprietary UNIX)

Compilers: Fortran (95/90/77), C, C++

Batch subsystem: LoadLeveler, managing serial and parallel jobs over a cluster of servers

File System: General Parallel File System (GPFS)

System information commands: spinfo for general information, lslpp for information about libraries

Page 50:

Related Links

NCAR : http://www.ncar.ucar.edu/ncar/

SCD : http://www.scd.ucar.edu/

Bluesky : http://www.scd.ucar.edu/computers/bluesky/

IBM p690 : http://www-903.ibm.com/kr/eserver/pseries/highend/p690.html

Page 51:

About Cray X1

Kim, SooYoung ([email protected])

(Dept of Computer Science, KAIST)

Page 52:

Features (1/2)

Contributing areas: weather and climate prediction, aerospace engineering, automotive design, and a wide variety of other applications important in government and academic research

Army High Performance Computing Research Center (AHPCRC), Boeing, Ford, Warsaw Univ., U.S. Government, Department of Energy's Oak Ridge National Laboratory (ORNL)

Operating System: UNICOS/mp (derived from UNICOS and UNICOS/mk)

True single system image (SSI)

Scheduling algorithms for parallel applications

Accelerated application mode and migration

Variable processor utilization: each CPU has four internal processors, which can run together as a closely coupled multistreaming processor (MSP)

Individually as four single-streaming processors (SSPs)

Flexible system partitioning

Page 53:

Features (2/2)

Scalable system architecture: distributed shared memory (DSM)

Scalable cache coherence protocol

Scalable address translation

Parallel programming models (a small sketch follows this list): shared-memory parallel models

Traditional distributed-memory parallel models: MPI and SHMEM

Up-and-coming global distributed-memory parallel models: Unified Parallel C (UPC)

Programming environments: Fortran compiler, C and C++ compilers

High-performance scientific library (LibSci), language support libraries, system libraries

Etnus TotalView debugger, CrayPat (Cray Performance Analysis Tool)
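
For the global distributed-memory model mentioned above, a tiny Unified Parallel C sketch (my illustration, not Cray sample code) shows the partitioned global address space style: shared arrays are spread across all threads, and each thread works on the elements it owns.

#include <upc.h>
#include <stdio.h>

#define N_PER 256

/* shared arrays distributed round-robin across all UPC threads */
shared double a[N_PER * THREADS];
shared double partial[THREADS];

int main(void)
{
    int i;
    double sum = 0.0;

    /* each thread touches only the elements it has affinity to */
    upc_forall (i = 0; i < N_PER * THREADS; i++; &a[i]) {
        a[i] = i * 0.5;
        sum += a[i];
    }
    partial[MYTHREAD] = sum;
    upc_barrier;

    if (MYTHREAD == 0) {
        double total = 0.0;
        for (i = 0; i < THREADS; i++) total += partial[i];
        printf("total = %g\n", total);
    }
    return 0;
}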

Page 54:

Node Architecture

Figure 1. Node, Containing Four MSPs

Page 55:

System Configuration Examples

Cabinets   CPUs   Memory             Peak Performance
1 (AC)     16     64 – 256 GB        204.8 Gflops
1          64     256 – 1024 GB      819.0 Gflops
4          256    1024 – 4096 GB     3.3 Tflops
8          512    2048 – 8192 GB     6.6 Tflops
16         1024   4096 – 16384 GB    13.1 Tflops
32         2048   8192 – 32768 GB    26.2 Tflops
64         4096   16384 – 65536 GB   52.4 Tflops
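
The peak column is simply the CPU count times the per-CPU peak quoted on the next page (my check): 16 × 12.8 Gflops = 204.8 Gflops, and 4,096 × 12.8 Gflops ≈ 52.4 Tflops.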

Page 56:

Technical Data (1/2)

Technical specifications:

Peak performance: 52.4 Tflops in a 64-cabinet configuration

Architecture: scalable vector MPP with SMP nodes

Processing element / processor: Cray custom-designed vector CPU; 16 vector floating-point operations per clock cycle; 32- and 64-bit IEEE arithmetic

Memory size: 16 to 64 GB per node

Data error protection: SECDED

Vector clock speed: 800 MHz

Peak performance: 12.8 Gflops per CPU

Peak memory bandwidth: 34.1 GB/sec per CPU

Peak cache bandwidth: 76.8 GB/sec per CPU

Packaging: 4 CPUs per node; up to 4 nodes per AC cabinet, up to 4 interconnected cabinets; up to 16 nodes per LC cabinet, up to 64 interconnected cabinets

Page 57:

Technical Data (2/2)

Memory technology: RDRAM with 204 GB/sec peak bandwidth per node

Memory architecture: cache coherent, physically distributed, globally addressable

Total system memory size: 32 GB to 64 TB

Interconnect network topology: modified 2D torus

Peak global bandwidth: 400 GB/sec for a 64-CPU liquid-cooled (LC) system

I/O system port channels: 4 per node

Peak I/O bandwidth: 1.2 GB/sec per channel