National Center for Supercomputing Applications
Cloud Resources in Production Cyberenvironments for E-Science Virtual Organizations: GridChem/ParamChem Interoperability
NSF Cloud Computing Workshop, Arlington, VA, 17-18 Mar 2011
Sudhakar Pamidighantam, NCSA, University of Illinois at Urbana-Champaign
[email protected]

Acknowledgments

• Jayeeta Ghosh, NCSA, ParamChem
• Suresh Marru, Indiana U., OGCE
• Ye Fan, Indiana U., OGCE
• Kenno Vanommeslaeghe, U. Maryland, ParamChem
• Narendra Polani, UKy, Middleware/ParamChem
• Michael Sheetz, UKy, Application Interfaces/ParamChem
• Vikram Gazula, UKy, Server Administration
• Tom Roney, NCSA, Server and Database Maintenance
• Nikhil Singh, NCSA, ParamChem
• Liu Yang, NCSA, GridChem
• Scott Brozell, OSC, Applications and Testing
• Rion Dooley, TACC, Middleware Infrastructure
• Stelios Kyriacou, OSC, Middleware Scripts
• Chona Guiang, TACC, Databases and Applications
• Kent Milfeld, TACC, Database Integration
• Kailash Kotwani, NCSA, Applications and Middleware

Outline
• Historical Background: Grid Computational Chemistry
• Production Environments
• Current Status: Web Services
• Usage: Grid and Science Achievements
• Cloud in Hybrid Environments
• Interoperability
• Future

Motivation
Integrating Services for E-Science and Engineering in Research, Education and Training
Software - Reasonably mature and easy to use for addressing chemists' questions of interest
Community of Users - Need the software and are capable of using it; some are non-traditional computational chemists
Resources - Varied in capacity and capability; distributed and heterogeneous

Extended TeraGrid Facility

www.teragrid.org

NSF Petascale Road Map
• Track 1: Multi-petaflop single-site system to be deployed by 2011 at NCSA (Blue Waters) http://www.ncsa.illinois.edu/BlueWaters/
• Track 2: Sub-petaflop systems; several to be deployed until Track 1 is online

System                      Architecture   Cores
Dell PowerEdge (NCSA)       EM64T          9600
SGI Altix (PSC)             IA64           768
SGI UV-Ice (NCSA)           EM64T          1568
IBM Power4 Cluster (NCSA)   Pwr4           48
IBM PowerPC (Indiana)       Pwr4           1536
Sun Constellation (TACC)    EM64T          50000

Additional systems to be online soon (currently being allocated):
SGI UV-Ice (PSC)            EM64T          4096
FutureGrid                  Diverse        on demand

Grids and New Opportunities
Alliance to TeraGrid
A homogeneous grid with a predefined, fixed software and system stack was planned (TeraGrid), but it was difficult to keep it homogeneous.
Local preferences and diversity now lead to heterogeneous grids (operating systems, schedulers, policies, software and services).
Openness and standards that lead to interoperability are critical for successful services.

[Diagram: Grid Hardware, Middleware and Scientific Applications, connected by Interfaces]

User Community
Chemistry and Computational Biology

Allocations as of Oct 04
        NRAC        AAB         Small Allocations
#PIs    26          23          64
#SUs    5,953,100   1,374,100   640,000

TeraGrid Allocations in 2010
Discipline                  # PIs   Initial Alloc. SUs
Physics                     125     920,254,700
Molecular Biosciences       308     689,733,465
Chemistry                   264     255,479,494
Chemical, Thermal Systems   143     232,905,769
Materials Research          207     210,602,367

2101 Users using Chemistry Software

230 ASC 30 AST 18 ATM 8 BCS 30 CCR 28 CDA 653 CHE 11 CTS

1 DBS 2 DEB 805 DMR 10 DMS 18 EAR 1 ECS 23 IBN 2 IRI

153 MCB 10 MSS 3 NCR 4 OCE 37 PHY 6 SEE 5 SES 3 STA

ASC: Advanced Scientific Computing
CHE: Chemistry
AST: Astronomy
CCR
DMR: Materials Research
MCB: Molecular and Cellular Biology
PHY: Physics

Computational Chemistry Grid

This is a Virtual Organization
Integrated Cyberinfrastructure for Computational Chemistry
Integrates Applications, Middleware, HPC resources, Scheduling and Data management
Allocations, User services and Training

Other Resources

Extant HPC resources at various Supercomputer Centers, Cloud resources (Interoperable)
Optionally other Grids and Hubs/local/personal resources
These may require existing allocations/authorization

GridChem System
[Diagram components: users, Portal Client, Grid Middleware Proxy Server, Grid Services, Grid, applications, Mass Storage]
http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0438312

Applications

• GridChem already supports several applications
  – Gaussian, GAMESS, NWChem, Aces3, Molpro, ADF, Quild, QMCPack, Castep, DMol3, Amber, Charmm
• Schedule for integration of additional software
  – Crystal
  – Q-Chem
  – Wien2K
  – MCCCS Towhee
  – Others...

Workflows

GridChem Middleware Service (GMS)

GridChem Resources Monitoring

http://portal.gridchem.org:8080/gridsphere/gridsphere?cid=home

Application Software Resources
Currently Supported

Suite         Version       Location
Gaussian 03   C.02/D.01     Many Platforms
MolPro        2006.1        NCSA
NWChem        5.0/4.7       Many Platforms
Gamess        Jan 06        Many Platforms
Amber         8.0           Many Platforms
QMCPack       2.0           NCSA

GridChem Software Resources
New Applications: Integration Underway

• ADF: Amsterdam Density Functional Theory
• Wien2K: Linearized Augmented Plane Wave (DFT)
• CPMD: Car-Parrinello Molecular Dynamics
• QChem: Molecular Energetics (Quantum Chemistry)
• Aces3: Parallel Coupled Cluster Quantum Chemistry
• Gromacs: Nano/Bio Simulations (Molecular Dynamics)
• NAMD: Molecular Dynamics
• DMol3: Periodic Molecular Systems (Quantum Chemistry)
• Castep: Quantum Chemistry
• MCCCS-Towhee: Molecular Conformation Sampling (Monte Carlo)
• Crystal98/06: Crystal Optimizations (Quantum Chemistry)
• ...

GridChem User Services
• Allocation
  https://www.gridchem.org/allocations/index.shtml
  Community and External Registration Reviews, PI Registration and Access Creation, Community User Norms Established
• Consulting/User Services
  https://www.gridchem.org/consult
  Ticket tracking, Allocation Management
• Documentation, Training and Outreach
  https://www.gridchem.org/doc_train/index.shtml
  FAQ Extraction, Tutorials, Dissemination

Help is integrated into the GridChem client

Users and Usage

• 433 users under 221 projects
• Includes academic PIs, two graduate classes, and about 15 training users
• More than 2,000,000 CPU wall hours
• More than 35,500 jobs processed
• 5 dissertations, more than 50 publications

Diversity of User Research

NH3 on Si Surfaces
CytP450 Catalysis
Zeolite Chemistry
Phosphinoborane pericyclics
Semiquinone reactions
Si Surface IR
Disulfide cleavage by P-
Thiolate –SS interchange
V in photocatalysts
PES of diphenylbutadienes
FTIR of Heptanedione on Si

Science Enabled

• Azide Reactions for Controlling Clean Silicon Surface Chemistry: Benzylazide on Si(100)-2 × 1. Semyon Bocharov et al. J. Am. Chem. Soc., 128 (29), 9300-9301, 2006.
• Chemistry of Diffusion Barrier Film Formation: Adsorption and Dissociation of Tetrakis(dimethylamino)titanium on Si(100)-2 × 1. Rodriguez-Reyes, J. C. F.; Teplyakov, A. V. J. Phys. Chem. C, 111 (12), 4800-4808, 2007.
• Computational Studies of [2+2] and [4+2] Pericyclic Reactions between Phosphinoboranes and Alkenes. Steric and Electronic Effects in Identifying a Reactive Phosphinoborane that Should Avoid Dimerization. Thomas M. Gilbert and Steven M. Bachrach. Organometallics, 26 (10), 2672-2678, 2007.

Science Enabled
• Chemical Reactivity of the Biradicaloid (HO...ONO) Singlet States of Peroxynitrous Acid. The Oxidation of Hydrocarbons, Sulfides, and Selenides. Bach, R. D. et al. J. Am. Chem. Soc. 2005, 127, 3140-3155.
• The "Somersault" Mechanism for the P-450 Hydroxylation of Hydrocarbons. The Intervention of Transient Inverted Metastable Hydroperoxides. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128 (5), 1474-1488.
• The Effect of Carbonyl Substitution on the Strain Energy of Small Ring Compounds and their Six-member Ring Reference Compounds. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128 (14), 4598.

Distribution of GridChem User Community

Job Distribution

System Wide Usage

HPC System          Usage (SUs)
Tungsten (NCSA)     5507
Copper (NCSA)       86484
CCGcluster (NCSA)   55709
Condor (NCSA)       30
SDX (UKy)           116143
CCGCluster (UKy)    0.5
Longhorn (TACC)     54
CCGCluster (OSC)    62000
TGCluster (OSC)     36936
Cobalt (NCSA)       2485
Champion (TACC)     11
Mike4 (LSU)         14537

Force Field Parameterization
Molecular force fields require constant improvement as:
• New reference data becomes available (that cannot be accommodated easily with existing sets)
• New molecular systems become amenable to computational analysis
• New models/potential energy functions/Hamiltonians for force fields are established
Coverage of force fields should constantly be extended to cover new fields of research/new functionality (nanomaterials, biomaterials and medicine, ...)

Cyberenvironments for Molecular Force Fields

• Extension of currently available models, with the resulting parameter sets to be made publicly available

• Databases of experimental and quantum mechanical reference data to be used in the parameterization process

• Integration of computational resources for data acquisition, automation of QM reference data generation

• Automation: extensible infrastructure for parameterization management, for rapid and systematic parameterization of novel Hamiltonians (empirical and semi-empirical)

• Systematic improvement of parameter optimization processes

Accurate Force Fields Are needed

Published by AAAS

A. J. Stone Science 321, 787 -789 (2008)

Fig. 1. Errors (V) in electrostatic potential on a surface at 1.8 times van der Waals radii around N-methyl propanamide for two models. (Left) Point charges; (right) charge, dipole, and quadrupole on C, N, and O; charge and dipole on H. The errors are much reduced in the multipole approach

Science Gateways Layer Cake
[Diagram layers, top to bottom]
User Interfaces: Web/Gadget Container, Web Enabled Desktop Applications
Web/Gadget Interfaces and Gateway Abstraction Interfaces
Gateway Services: User Management, Auditing & Reporting, Fault Tolerance, Application Abstractions, Workflow System, Information Services, Application Monitoring, Registry, Security, Provenance & Metadata Management
Resource Middleware: Cloud Interfaces, Grid Middleware, SSH & Resource Managers
Compute Resources: Computational Clouds, Computational Grids, Local Resources
Color coding: dependent resource provider components; complementary gateway components; OGCE gateway components

GFac Current & Future Features
[Diagram components]
Features: Input Handlers, Scheduling Interface, Auditing, Monitoring Interface, Data Management Abstraction, Job Management Abstraction, Fault Tolerance, Output Handlers, Registry Interface, Checkpoint Support
Resources: Globus, Campus Resources, Unicore, Condor, Amazon/Eucalyptus
Color coding: planned/requested features; existing features
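
The job management abstraction named on this slide can be pictured as one submit interface with a pluggable implementation per back end. The sketch below is only an illustration of that idea and is not GFac code: the class names, the `submit_via` helper, and the string job handles are all hypothetical, and a real provider would call the relevant grid or cloud middleware instead of building a placeholder string.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass
class JobRequest:
    """Hypothetical description of one application run."""
    application: str
    input_file: str


class Provider(ABC):
    """Hypothetical common interface over each kind of resource."""

    name: str

    @abstractmethod
    def submit(self, job: JobRequest) -> str:
        """Submit the job and return a back-end-specific handle."""


class GridProvider(Provider):
    name = "grid"

    def submit(self, job: JobRequest) -> str:
        # Placeholder only: a real provider would call grid middleware here.
        return f"grid:{job.application}:{job.input_file}"


class CloudProvider(Provider):
    name = "cloud"

    def submit(self, job: JobRequest) -> str:
        # Placeholder only: a real provider would call a cloud interface here.
        return f"cloud:{job.application}:{job.input_file}"


def submit_via(providers: Dict[str, Provider], target: str, job: JobRequest) -> str:
    """Dispatch a job to the provider registered under the target name."""
    if target not in providers:
        raise KeyError(f"no provider registered for {target!r}")
    return providers[target].submit(job)


# Registry keyed by provider name; adding a back end means adding one class.
PROVIDERS = {p.name: p for p in (GridProvider(), CloudProvider())}
handle = submit_via(PROVIDERS, "cloud", JobRequest("g09", "water.com"))
```

Because callers only see `submit_via`, the same request can be sent to a grid or a cloud back end without changing the calling code, which is the interoperability property the slide is after.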

OGCE Layered Workflow Architecture: Derived from LEAD Workflow System
[Diagram layers]
Workflow Interfaces (Design & Definition): XBaya GUI (Composition, Deploying, Steering & Monitoring), Gadget Interface for Input Binding, Flex/Web Composition
Workflow Specification: Python, BPEL 2.0, BPEL 1.0, Java Code, Pegasus DAG, Scufl
Workflow Execution & Control Engines: Apache ODE, Jython Interpreter, GBPEL, Condor DAGMan, Taverna, Dynamic Enactor

Putting It All Together

Pegasus WMS

ParamChem-XBaya-Pegasus
• The input workflow for GridChem/ParamChem is created using the Pegasus Java DAX API.
  -- A DAX can combine tasks (such as one Charmm task and multiple Gaussian tasks), each taking its respective input file.
• The tasks can be mapped to their respective specific applications (such as charmm/amber/g03 or g09) based on a simple configuration.
• Input data (instructions, structure, topology, parameters) will be staged from middleware using GridFTP to the execute clusters (such as TeraGrid systems Mercury and Abe at NCSA).
• Jobs will be distributed across the multiple execute clusters using round-robin or another scheme.
  -- Any heuristics-based scheduling is also possible.
• Output files will be staged back from the execute clusters to middleware using GridFTP for post-processing/archiving.
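
The task mapping and round-robin distribution described above can be sketched in a few lines. This is a minimal illustration, not the Pegasus DAX API or the ParamChem implementation: the `APP_CONFIG` dictionary, the `plan_jobs` function, the lowercase cluster labels, and the input file names are all hypothetical, and staging with GridFTP is left out entirely.

```python
from itertools import cycle

# Hypothetical "simple configuration": task type -> executable name.
APP_CONFIG = {"charmm": "charmm", "gaussian": "g09"}

# Hypothetical labels for the execute clusters named in the slide.
CLUSTERS = ["mercury", "abe"]


def plan_jobs(tasks, app_config, clusters):
    """Map each task to an executable and assign clusters round-robin."""
    cluster_cycle = cycle(clusters)  # repeats the cluster list endlessly
    plan = []
    for task_type, input_file in tasks:
        plan.append({
            "executable": app_config[task_type],
            "input": input_file,
            "cluster": next(cluster_cycle),
        })
    return plan


# One Charmm task plus two Gaussian tasks, each with its own input file.
tasks = [
    ("charmm", "mol.inp"),
    ("gaussian", "scan1.com"),
    ("gaussian", "scan2.com"),
]
plan = plan_jobs(tasks, APP_CONFIG, CLUSTERS)
for job in plan:
    print(job["executable"], job["input"], "->", job["cluster"])
```

Swapping `cycle(clusters)` for a function that picks a cluster from queue length or load would give the heuristics-based scheduling the slide mentions, without touching the task mapping.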

Some New GridChem Infrastructure
• Workflow Editors
• Coupled Application Execution
• Large Scale Computing
• Metadata and Archiving
• Rich Client Platform Refactorization
• Intergrid Interactions

• Open Source Distribution http://cvs.gridchem.org/cvs/

• Open Architecture and Implementation details http://www.gridchem.org/wiki

ParamChem Apache Axis2 Services

• NotificationService
• ResourceService
• TriggerService
• SessionService
• SoftwareService
• JobService
• WorkflowService
• FileService
• UserService
• ProjectService

Cloud HPC Interoperability

The Cloud in our case is a part of the overall resources for computing and storage.
Cloud resources have to be usable interoperably along with other HPC and local resources.
Particular use will be for on-demand computing and high-throughput computing.
Certain routine, sensor-enabled, data-dependent computing (for example, hydrological event monitoring and simulation) could be handled by clouds for rapid on-demand prediction of short-term events.
The interoperability requirements that enable data and computation movement from one resource to another should be explored.

Imaginations unbound

Questions?

sarvE janAh SukhinO bhavantu

May every person be happy