National Institute of Advanced Industrial Science and Technology
Running flexible, robust and scalable grid application: Hybrid QM/MD Simulation
Hiroshi Takemiya, Yusuke Tanimura and Yoshio Tanaka
Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology, Japan
National Institute of Advanced Industrial Science and Technology
Running flexible, robust and scalable grid application: Hybrid QM/MD Simulation
Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology, Japan
Goals of the experiment
To clarify the functions needed to execute large-scale grid applications
Such applications require many computing resources for a long time: 1,000 to 10,000 CPUs for 1 month to 1 year
3 requirements:
Scalability: managing a large number of resources effectively
Robustness: fault detection and fault recovery
Flexibility: dynamic resource switching, since we can't assume all resources are always available during the experiment
Difficulty in satisfying these requirements
Existing grid programming models can hardly satisfy these requirements
GridRPC
Dynamic configuration: does not need co-allocation, so it is easy to switch computing resources dynamically
Good fault tolerance (detection): on a remote executable fault, the client can retry or use another remote executable
Hard to manage a large number of servers: the client becomes a bottleneck
Grid-enabled MPI
Flexible communication: possible to avoid communication bottlenecks
Static configuration: needs co-allocation and cannot change the number of processes during execution
Poor fault tolerance: one process fault makes all processes fail, and fault-tolerant MPI is still in the research phase
Gridifying applications using GridRPC and MPI
Combining GridRPC and MPI
GridRPC: allocating server (MPI) programs dynamically, supporting loose communication between a client and servers, and managing only tens to hundreds of server programs
MPI: supporting scalable execution of a parallelized server program
Suitable for gridifying applications consisting of loosely-coupled parallel programs
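The combined pattern above can be sketched as follows. This is a minimal illustration, not the Ninf-G API: each remote, internally MPI-parallel server is treated as one coarse-grained function call, and a failed call is simply retried on another server, which is the GridRPC-side fault handling described above. All names (cluster names, `qm_force`) are invented for the example; threads stand in for remote clusters.

```python
from concurrent.futures import ThreadPoolExecutor

SERVERS = ["cluster_a", "cluster_b", "cluster_c"]  # hypothetical names

def qm_force(server, region):
    """Stand-in for an RPC to a parallelized (MPI) QM server program."""
    if server == "cluster_a":          # simulate one faulty cluster
        raise ConnectionError(f"{server} unreachable")
    return sum(region) * 0.5           # dummy "force" result

def call_with_failover(region):
    # GridRPC-style fault handling: on a server fault the client
    # retries the same call on the next available server.
    for server in SERVERS:
        try:
            return qm_force(server, region)
        except ConnectionError:
            continue
    raise RuntimeError("all servers failed")

# The client manages only a handful of coarse calls, so it does not
# become a bottleneck even though each server may use many CPUs.
with ThreadPoolExecutor(max_workers=2) as pool:
    regions = [[1.0, 2.0], [3.0, 4.0]]
    forces = list(pool.map(call_with_failover, regions))

print(forces)  # [1.5, 3.5]
```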
Scalability: large-scale experiment at SC2004
Gridifying the QM/MD simulation program based on our approach; executing a simulation using ~1800 CPUs of 3 clusters; our approach can manage a large number of computing resources
Robustness: long-run experiment on the PRAGMA testbed
Executing the TDDFT program for over a month; Ninf-G can detect server faults and return errors correctly
Conducting an experiment to show the validity of our approach
Long-run QM/MD simulation on the PRAGMA testbed, implementing a scheduling mechanism as well as a fault-tolerance mechanism
Using 1793 CPUs in total on 3 clusters; succeeded in running the QM/MD program for over 11 hours; our approach can manage a large number of resources
Related Work
Scalability: large-scale experiment at SC2004
Gridifying the QM/MD simulation program based on our approach; executing a simulation using ~1800 CPUs of 3 clusters; our approach can manage a large number of computing resources
Robustness: long-run experiment on the PRAGMA testbed
Executing the TDDFT program for over a month; Ninf-G can detect server faults and return errors correctly
Conducting an experiment to show the validity of our approach
Long-run QM/MD simulation on the PRAGMA testbed, implementing a scheduling mechanism as well as a fault-tolerance mechanism
Long run Experiment on the PRAGMA testbed
Purpose: evaluate the quality of Ninf-G2 and gain experience on how GridRPC can adapt to faults
Ninf-G stability
Number of executions: 43
Execution time (total): 50.4 days, (max): 6.8 days, (ave): 1.2 days
Number of RPCs: more than 2,500,000
Number of RPC failures: more than 1,600 (an error rate of about 0.064%)
Ninf-G detected these failures and returned errors to the application
[Figure: number of alive servers (0 to 30) vs. elapsed time (0 to 150 hours) for the AIST, SDSC, KISTI, KU and NCHC clusters]
Related Work
Scalability: large-scale experiment at SC2004
Gridifying the QM/MD simulation program based on our approach; executing a simulation using ~1800 CPUs of 3 clusters; our approach can manage a large number of computing resources
Robustness: long-run experiment on the PRAGMA testbed
Executing the TDDFT program for over a month; Ninf-G can detect server faults and return errors correctly
The present experiment reinforces the evidence of the validity of our approach
Long-run QM/MD simulation on the PRAGMA testbed, implementing a scheduling mechanism for flexibility as well as fault tolerance
Necessity of Large-scale Atomistic Simulation
Modern material engineering requires detailed knowledge based on microscopic analysis
Future electronic devices, micro electro mechanical systems (MEMS)
Features of the analysis: nano-scale phenomena
A large number of atoms
Sensitive to the environment, so very high precision is needed
Quantum description of bond breaking
[ Deformation process ][ Stress distribution ]
Large-scale Atomistic Simulation
Stress enhances the possibility of corrosion?
Hybrid QM/MD Simulation (1)
Enabling large-scale simulation with quantum accuracy
Combining classical MD simulation with QM simulation
MD simulation: simulating the behavior of atoms in the entire region, based on classical MD using an empirical inter-atomic potential
QM simulation: modifying the energy calculated by the MD simulation only in the interesting regions, based on density functional theory (DFT)
[Figure: MD simulation of the entire region, with embedded QM simulations based on DFT]
Hybrid QM/MD Simulation (2)
Suitable for grid computing: additive hybridization
QM regions can be set at will and calculated independently; computation dominant
MD and QMs are loosely coupled; communication cost between QM and MD: ~O(N)
Very large computational cost of QM: computation cost of QM ~O(N^3), computation cost of MD ~O(N)
A lot of sources of parallelism: the MD simulation is executed in parallel (with tight communication); each QM simulation is executed in parallel (with tight communication); the QM simulations are executed independently (without communication); the MD and QM simulations are executed in parallel (loosely coupled)
[Figure: the MD simulation is loosely coupled to QM simulations QM1 and QM2, which are independent of each other; communication is tight within the MD simulation and within each QM simulation]
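The additive hybridization above can be sketched in a few lines. This shows a common form of the additive scheme, not the exact functional in the real code: the cheap MD energy covers the entire region, and each QM region contributes an independent correction term, which is why the QM terms can run on separate clusters without communicating with each other. The energy functions here are dummies; only the structure of the sum is the point.

```python
def e_md(atoms):
    """Dummy O(N) empirical potential over a list of atoms."""
    return 0.1 * len(atoms)

def e_qm(atoms):
    """Dummy stand-in for the O(N^3) DFT energy of one small QM region."""
    return 0.3 * len(atoms)

def hybrid_energy(all_atoms, qm_regions):
    # E = E_MD(entire system) + sum_i [ E_QM(region_i) - E_MD(region_i) ]
    e = e_md(all_atoms)
    for region in qm_regions:      # each term is independent, so parallel
        e += e_qm(region) - e_md(region)
    return e

atoms = list(range(1728))          # 1728 atoms, as in the experiment
regions = [[i] for i in range(5)]  # 5 QM regions of 1 atom each
print(round(hybrid_energy(atoms, regions), 3))  # 173.8
```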
Modifying the Original Program
Eliminating the initial set-up routine in the QM program
Adding an initialization function
Eliminating the loop structure in the QM program
Tailoring the QM simulation as a function
Replacing MPI routines with Ninf-G function calls
[Flowchart: the MD part performs the initial set-up, calculates MD forces of the QM+MD regions, and updates atomic positions and velocities; the QM part calculates the QM force of each QM region]
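The restructured program can be sketched as below. This is an illustrative outline, not the Fortran/Ninf-G code: the QM solver, stripped of its own set-up and time loop, is now a plain function the MD driver calls once per step (via Ninf-G RPC in the real code; here a thread pool mimics asynchronous calls so the QM regions run concurrently). All function names and numbers are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def qm_init(region_id):
    # the added initialization function, replacing the old set-up routine
    return {"region": region_id}

def qm_force(state, positions):
    # formerly the body of the QM program's loop; now a callable function
    return [0.01 * p for p in positions]

def md_step(positions, qm_forces):
    # toy position update; the real MD also uses empirical MD forces
    return [p + f for p, f in zip(positions, qm_forces)]

states = [qm_init(i) for i in range(2)]
positions = [[1.0], [2.0]]

with ThreadPoolExecutor() as pool:
    for step in range(3):                        # MD time loop on the client
        futures = [pool.submit(qm_force, s, p)   # async RPC-style dispatch
                   for s, p in zip(states, positions)]
        forces = [f.result() for f in futures]   # wait for all QM servers
        positions = [md_step(p, fo) for p, fo in zip(positions, forces)]

print(positions)
```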
Implementation of a scheduling mechanism
Inserting a scheduling layer between the application and GRPC layers in the client program
The application does not need to care about scheduling
Functions of the layer:
Dynamic switching of target clusters, checking the availability of clusters (available period, maximum execution time)
Error detection and recovery: detecting server errors and time-outs
Time-outs prevent the application from waiting too long (long waits in the batch queue, long data transfer times)
Trying to continue the simulation on other clusters
Implemented using Ninf-G
Client program
QM/MD simulation layer (Fortran)
Scheduling layer
GRPC layer (Ninf-G system)
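The scheduling layer's behavior can be sketched as follows. All names are invented for illustration; this is not the production code. The layer sits below the application, picks an available cluster, enforces a per-call time-out, and on an error or time-out transparently retries on the next cluster, so the Fortran simulation layer never sees scheduling decisions.

```python
import time

class ClusterDown(Exception):
    pass

class Scheduler:
    def __init__(self, clusters, timeout_s=60.0):
        # clusters: list of (name, call_fn, available_fn) tuples
        self.clusters = clusters
        self.timeout_s = timeout_s

    def call(self, *args):
        for name, fn, available in self.clusters:
            if not available():        # e.g. outside its available period
                continue
            start = time.monotonic()
            try:
                result = fn(*args)
            except ClusterDown:
                continue               # error recovery: try the next cluster
            if time.monotonic() - start > self.timeout_s:
                continue               # too slow (long queue or transfer)
            return name, result
        raise RuntimeError("no cluster could run the call")

def broken(x):
    raise ClusterDown()

def healthy(x):
    return x * 2

sched = Scheduler([("AIST", broken, lambda: True),
                   ("SDSC", healthy, lambda: True)])
print(sched.call(21))  # falls back from AIST to SDSC: ('SDSC', 42)
```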
Long run experiment on the PRAGMA testbed
Goals: continue the simulation as long as possible and check the availability of our programming approach
Experiment time: started on the 18th of April; will end at the end of May (hopefully)
Target simulation: 5 QM atoms inserted in box-shaped Si, 1728 atoms in total, with 5 QM regions each consisting of only 1 atom
Entire region Central region Time evolution of the system
Testbed for the experiment
AIST: UME
NCHC: ASE
SINICA: PRAGMA
SDSC: Rocks-52, Rocks-47
UNAM: Malicia
KU: AMATA
NCSA: TGC
8 clusters of 7 institutes in 5 countries: AIST, KU, NCHC, NCSA, SDSC, SINICA and UNAM
Porting is underway for 5 other clusters: CNIC, KISTI, BII, TITECH and USM
Using 2 CPUs for each QM simulation
Changing the target cluster every 2 hours
Porting the application
5 steps to port our application: (1) check accessibility using ssh; (2) execute a sequential program using globus-job-run; (3) execute an MPI program using globus-job-run; (4) execute a Ninfied program; (5) execute our application
Troubles
jobmanager-sge had bugs in executing MPI programs; a fixed version was released from AIST
An inappropriate MPI was specified in jobmanagers: LAM/MPI does not support execution through Globus, and MPICH-G is not available due to the certificate problem; it is recommended to use the MPICH library
[Diagram: the client holds a full certificate and submits through GRAM to the front end, which launches the job on the back end via PBS/SGE and mpirun using a limited certificate]
Executing the application
Expiration of certificates: we had to take care of many kinds of Globus-related certificates
User cert, host cert, CA cert, CRL…
The Globus error message is unhelpful: "check host and port"
Poor I/O performance: programs compiled by the Intel Fortran compiler take a lot of time for I/O
2 hours to output several Mbytes of data! Remedied by specifying buffered I/O
Using an NFS file system is another cause of poor I/O performance
Remaining processes: server processes remain on the back-end nodes even when the job is deleted from the batch queue; SCMS Web is very convenient for finding such remaining processes
Preliminary result of the experiment
Succeeded in calculating ~10,000 time steps during 2 weeks
Number of GRPCs executed: 47,593; number of failures/time-outs: 524
Most of them (~80%) occurred in the connection phase
Due to connection failures, batch system downtime, or queuing time-out (the time-out for queuing is ~60 sec)
Other failures include: exceeding the max execution time (2 hours), exceeding the max execution time per time step (5 min), and exceeding the max CPU time specified by a cluster (900 sec)
Giving a demonstration!!
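The time-out limits above can be made concrete with a small sketch. The limit values come from the slide; the checking logic itself is illustrative, not the production code.

```python
LIMITS = {
    "total_execution_s": 2 * 3600,  # max execution time per cluster: 2 hours
    "per_step_s":        5 * 60,    # max execution time per time step: 5 min
    "cpu_time_s":        900,       # max CPU time some clusters enforce
}

def violated(total_s, step_s, cpu_s):
    """Return which limits a running RPC has exceeded, if any."""
    measured = {"total_execution_s": total_s,
                "per_step_s":        step_s,
                "cpu_time_s":        cpu_s}
    return [k for k, limit in LIMITS.items() if measured[k] > limit]

print(violated(total_s=7300, step_s=100, cpu_s=950))
```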
Execution Profile: Scheduling
Example of exceeding the maximum execution time
(~60 sec) (~80 sec)
Execution Profile: Error Recovery
Examples of error recovery: batch system fault, queuing time-out, execution time-out