Two Types of Supercomputer developments
Yutaka Ishikawa, RIKEN AICS
University of Tokyo
2014/09/03
http://computing.ornl.gov/workshops/exascale14/
Session 2: Deployed Ecosystems and Roadmaps for the Future
Smoky Mountains Computational Sciences and Engineering Conference
Supercomputers in Japan
FLAGSHIP Machine: K Computer (10 PF) at RIKEN, plus machines at 9 universities and national laboratories
HPCI (High Performance Computing Infrastructure) is formed from those machines, called leading machines
Features: single sign-on, shared storage (distributed file system)
As of Jun 2012
Each supercomputer center has one, two or more supercomputers.
Each supercomputer center replaces its machines every 4.5 to 6 years.
Procurement Policies in Supercomputer Centers
• Flagship-Aligned Commercial Machine (FAC)
– Acquiring a machine whose architecture is the same as that of the flagship machine.
• Complementary Function Leading Machine (CFL-M, CFL-D)
– Acquiring a machine whose architecture is different from the flagship machine, e.g. a vector machine.
– CFL-M: a commercial machine provided by a vendor
– CFL-D: a new machine developed by both a vendor and
• Operating System (Linux and McKernel)
• Programming Languages (Fortran, C/C++, XcalableMP)
• Communication Library (MPI-3)
• Math Libraries
• File System
• Batch Job System
Development and Procurement
• McKernel
– Lightweight microkernel
• XcalableMP
– Parallel programming language
• MPICH with a low-level communication facility
Linux + McKernel
• Concerns
– Reducing memory contention
– Reducing data movement among cores
– Providing new memory management
– Providing fast communication
– Parallelizing OS functions while achieving less data movement
• New OS mechanisms and APIs are created revolutionarily/evolutionarily, examined, and selected
• Linux with a lightweight micro kernel
– IHK (Interface for Heterogeneous Kernels)
• Loading a kernel into cores
• Communication between Linux and the kernel
– McKernel
• Customizable OS environment
– E.g. an environment without a CPU scheduler (without timer interrupts)
[Figure: Linux kernel and McKernel running on separate cores. Daemons run beside the Linux kernel; user processes run on the McKernel cores. The Interface for Heterogeneous Kernels connects the two, routing some system calls to the LWK (McKernel) and others to Linux.]
Running in both Xeon and Xeon Phi environments.
IHK and McKernel have been developed at the University of Tokyo and RIKEN with Hitachi, NEC, and Fujitsu.
PostT2K OS Environment (under development)
• Linux Kernel + McKernel
– Several variations of McKernel are provided for applications
– The Linux kernel resides permanently, but a McKernel is selectively loaded for each application
[Figure: the Linux kernel is resident throughout; App A on McKernel without a CPU scheduler, App B on McKernel with a CPU scheduler, App C on McKernel with segmentation, and App D on Linux are each invoked and run to completion.]
XcalableMP(XMP) http://www.xcalablemp.org
What is XcalableMP (XMP for short)? A PGAS programming model and language for distributed memory, proposed by the XMP Spec WG.
The XMP Spec WG is a special interest group to design and draft the specification of the XcalableMP language. It is now organized under the PC Cluster Consortium, Japan. Mainly active in Japan, but open to everybody.
Project status (as of Nov. 2013):
XMP Spec Version 1.2 is available at the XMP site. New features: mixed OpenMP and OpenACC, libraries for collective communications.
Reference implementation by U. Tsukuba and RIKEN AICS: Version 0.7 (C and Fortran90) is available for PC clusters, Cray XT, and the K computer. A source-to-source compiler translating to code with the runtime on top of MPI and GASNet.
[Figure (shown twice): programming models — MPI, automatic parallelization, HPF, Chapel, PGAS, XcalableMP — positioned by possibility of performance tuning versus programming cost.]
Code example:

    int array[YMAX][XMAX];

    #pragma xmp nodes p(4)
    #pragma xmp template t(YMAX)
    #pragma xmp distribute t(block) on p
    #pragma xmp align array[i][*] to t(i)

    main(){
      int i, j, res;
      res = 0;
    #pragma xmp loop on t(i) reduction(+:res)
      for(i = 0; i < 10; i++)
        for(j = 0; j < 10; j++){
          array[i][j] = func(i, j);
          res += array[i][j];
        }
    }

The directives are simply added to the serial code (incremental parallelization): the nodes/template/distribute/align directives specify data distribution, and the loop directive specifies work sharing and data synchronization.
Language Features
• Directive-based language extensions for Fortran and C for the PGAS model
• Global-view programming with globally distributed data structures for data parallelism
– SPMD execution model, as in MPI
– Pragmas for the data distribution of global arrays
– Work-mapping constructs to map work and iterations explicitly, with affinity to data
– Rich communication and synchronization directives such as "gmove" and "shadow"
– Many concepts are inherited from HPF
• The co-array feature of CAF is adopted as part of the language spec for local-view programming (also defined in C)
XMP provides a global view for data-parallel programs in the PGAS model.
Roles of PC Cluster Consortium
Development, Maintenance and Promotion
Members: Univ. of Tsukuba, Univ. of Tokyo, Titech, AMD, Intel, Fujitsu, Hitachi, NEC, Cray, …
• Developed software: IHK, McKernel, LLC, XMP
• Integration of other open sources, e.g., MPICH
• Distribution as open source; promotion
[Figure: the consortium supports both the PostT2K and PostK efforts, with vendor contributions flowing into each.]
The PC Cluster Consortium was established in 2001. Its original mission was to contribute to the PC cluster market through the development, maintenance, and promotion of cluster system software based on the SCore cluster system software and the Omni OpenMP compiler, developed by the Real World Computing Partnership, funded by the Japanese government for 10 years from 1992.
International Collaboration between DOE and MEXT
PROJECT ARRANGEMENT UNDER THE IMPLEMENTING ARRANGEMENT BETWEEN THE MINISTRY OF EDUCATION, CULTURE, SPORTS, SCIENCE AND TECHNOLOGY OF JAPAN AND THE DEPARTMENT OF ENERGY OF THE UNITED STATES OF AMERICA CONCERNING COOPERATION IN RESEARCH AND DEVELOPMENT IN ENERGY AND RELATED FIELDS, CONCERNING COMPUTER SCIENCE AND SOFTWARE RELATED TO CURRENT AND FUTURE HIGH PERFORMANCE COMPUTING FOR OPEN SCIENTIFIC RESEARCH
Yoshio Kawaguchi (MEXT, Japan) and William Harrod (DOE, USA)
Purpose: work together where it is mutually beneficial to expand the HPC ecosystem and improve system capability
– Each country will develop its own path for next-generation platforms
– Countries will collaborate where it is mutually beneficial
• Joint Activities
– Pre-standardization interface coordination
– Collection and publication of open data
– Collaborative development of open source software
– Evaluation and analysis of benchmarks and architectures
– Standardization of mature technologies
Technical Areas of Cooperation
• Kernel System Programming Interface
• Low-level Communication Layer
• Task and Thread Management to Support Massive Concurrency
• Power Management and Optimization
• Data Staging and Input/Output (I/O) Bottlenecks
• File System and I/O Management
• Improving System and Application Resilience to Chip Failures and Other Faults
• Mini-Applications for Exascale Component-Based Performance Modelling
Concluding Remarks
• Ecosystem
– Co-development of the system software stack for a leading machine (PostT2K) and the flagship machine (PostK)
– Beneficial to users
• Continuity of system software and programming languages from the leading machines to the flagship machine
– Contribution to the open source community
• Shared and enhanced by the community
• Schedule
[Figure: timeline — PostT2K: procurement, software development, operation; PostK: basic design, design and implementation, manufacturing/installation/tuning, operation.]
1. The overall theme of SMC2014 is "Integration of Computing and Data into Instruments of Science and Engineering".
2. Our session is focused on "Deployed Ecosystems and Roadmaps for the Future". We will be focusing on current experiences and challenges in deploying large-scale computing capabilities, and on our plans and expectations for how future systems will be made available to our scientists and engineers.
3. Consistent with this topic, we are inviting you to share your vision for how the computational ecosystem may continue to develop to serve the scientific and engineering challenges of the future.
4. The three other panels in our conference will focus on "Strategic Science: Drivers of Future Innovation", "Future Architectures to Co-Design for Science", and "Math and Computer Science Challenges for Big Data, Analytics, and Scalable Applications".