Ian Foster on behalf of the Globus Alliance Computation Institute Argonne National Lab & University of Chicago Globus: State of the Union
Dec 30, 2015
What’s New with Globus?
Globus applications are larger-scale and more mission critical
Globus tools are increasingly sophisticated (e.g., GridWay, Introduce, OGSA-DAI, UniCluster, Workspaces)
Globus core software is more robust, functional, performant, and easy to use
Globus community is increasingly diverse and international
Why Grid (and Globus)? The Changing Nature of Work
Collaborative and Dynamic
Project focused, globally distributed teams, spanning organizations within and beyond enterprise boundaries
Distributed and Heterogeneous
Each team member/group brings own data, compute, & other resources into the project
Data & Computation Intensive
Access to computing and data resources must be coordinated across the collaboration
Concurrent Innovation Cycles
Resources must be available to projects with strong QoS, & also reflect system-wide priorities
IT must adapt to this new reality
Bridging the Application-Resource Gap
[Figure: user applications, tools, and workflows reach computers, storage, and specialized resources through a service layer (GRAM, GridFTP, DAIS, registry, credential services, and hosting environments for user services) that provides uniform interfaces, security mechanisms, Web service transport, and monitoring.]
Drug Discovery: In Silico Screening
2M+ ligands x protein target(s)
(Mike Kubal, Benoit Roux, and others)
[Figure: in silico screening workflow, from start to report.]
Inputs: PDB protein descriptions, 1 protein (1 MB); ZINC 3-D structures, 2M structures (6 GB)
Receptor preparation: manually prep DOCK6 rec file and FRED rec file (1 receptor per protein: defines pocket to bind to)
DOCK6 and FRED (ligands to complexes): ~4M x 60s x 1 cpu, ~60K cpu-hrs
Select best ~5K
Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs
Amber prep: 2. AmberizeReceptor; 4. perl: gen nabscript
Amber Score: 1. AmberizeLigand; 3. AmberizeComplex; 5. RunNABScript
BuildNABScript: NAB script template plus NAB script parameters (defines flexible residues, #MD steps) produce the NABScript
Select best ~500
GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
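The per-stage estimates above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation: the stage names and the (tasks, seconds per task, CPUs per task) triples are read off the slide, while the dictionary layout and the `cpu_hours` helper are illustrative assumptions. Exact arithmetic lands somewhat above the slide's rounded totals, and the GCMC stage dominates the cost.

```python
# Stage sizes as read from the slide: (tasks, seconds per task, CPUs per task).
# The dictionary layout and helper name are illustrative, not from the talk.
STAGES = {
    "DOCK6/FRED": (4_000_000, 60, 1),
    "Amber": (10_000, 20 * 60, 1),
    "GCMC": (500, 10 * 3600, 100),
}


def cpu_hours(tasks, seconds_per_task, cpus_per_task):
    """CPU-hours consumed by one stage."""
    return tasks * seconds_per_task * cpus_per_task / 3600


per_stage = {name: cpu_hours(*spec) for name, spec in STAGES.items()}
total_cpu_hours = sum(per_stage.values())
total_tasks = sum(spec[0] for spec in STAGES.values())
cpu_years = total_cpu_hours / (24 * 365)

for name, hours in per_stage.items():
    print(f"{name}: {hours:,.0f} cpu-hrs")
print(f"total: {total_tasks:,} tasks, {total_cpu_hours:,.0f} cpu-hrs, "
      f"{cpu_years:.0f} cpu-years")
```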
Second-Generation Grids: Service-Oriented Science
People create services (data or functions) …
which I discover (& decide whether to trust) …
& compose to create a new function ...
and then publish as a new service.
I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!
I hope that this “someone else” can manage security, reliability, scalability, …
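The create, discover, compose, and publish cycle described above can be sketched with a toy in-memory registry. Everything here is hypothetical and for illustration only: the `Registry` class, the provider names, and the two sample services are assumptions, and a real deployment would use hosted Web services and a real index service rather than Python callables.

```python
from typing import Callable, Dict, Iterable, List, Set


class Registry:
    """Hypothetical in-memory stand-in for a service registry."""

    def __init__(self) -> None:
        self._services: Dict[str, dict] = {}

    def publish(self, name: str, func: Callable, provider: str,
                keywords: Iterable[str]) -> None:
        """People create services (data or functions) and publish them."""
        self._services[name] = {
            "func": func,
            "provider": provider,
            "keywords": set(keywords),
        }

    def discover(self, keyword: str, trusted: Set[str]) -> List[str]:
        """Find services by keyword, keeping only trusted providers."""
        return sorted(
            name for name, svc in self._services.items()
            if keyword in svc["keywords"] and svc["provider"] in trusted
        )

    def invoke(self, name: str, *args):
        return self._services[name]["func"](*args)


def compose(registry: Registry, first: str, second: str) -> Callable:
    """Build a new function by chaining two published services."""
    def composed(value):
        return registry.invoke(second, registry.invoke(first, value))
    return composed


registry = Registry()
registry.publish("tokenize", lambda text: text.split(), "lab-a", ["text"])
registry.publish("count", lambda items: len(items), "lab-b", ["text", "stats"])

found = registry.discover("text", trusted={"lab-a", "lab-b"})
# Compose the discovered services and publish the result as a new service.
registry.publish("word_count", compose(registry, "tokenize", "count"),
                 "me", ["text"])
print(found, registry.invoke("word_count", "grid services compose well"))
```

The trust decision is reduced here to a set of provider names; hosting, security, reliability, and scalability are exactly the parts this sketch leaves to the "someone else" mentioned above.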
NIH’s Cancer Biomedical Informatics Grid (caBIG)
caBIG: sharing of infrastructure, applications, and data.
Data Integration!
caBIG Under the Covers
[Figure: research centers and NCICB expose tools (Tool 1 to Tool 4, caArray) and data sources (microarray, gene database, protein database, image) as Grid data services and analytical services; Grid-enabled clients and a Grid portal reach them through the Grid Services Infrastructure (metadata, registry, query, invocation, security, etc.).]
Earth System Grid
Main ESG Portal
198 TB of data at four locations; 1,150 datasets; 1,032,000 files
Includes the past 6 years of joint DOE/NSF climate modeling experiments
8,000 registered users
Downloads to date: 49 TB, 176,000 files
CMIP3 (IPCC AR4) ESG Portal
35 TB of data at one location; 74,700 files
Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
Data from 13 countries, representing 25 models
1,900 registered projects
Downloads to date: 387 TB, 1,300,000 files, 500 GB/day (average)
400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data
ESG usage: over 500 sites worldwide
[Figure: ESG monthly download volumes]
MEDICUS Under the Covers
DICOM images: send (publish), query/retrieve (discover)
Grid archive: fault tolerant, bandwidth
Security: authentication, authorization, cryptography
Access: Web portal
Applications: computing, data mining
Components: DICOM Grid Interface Service (DGIS); Meta Catalog Service (OGSA-DAI); Data Replication Service (DRS); Grid Web Portal, OGCE / GridSphere; Globus Toolkit Release 4: GRAM, OGSA-DAI; X.509 Certificates + MyProxy Delegation
LIGO Gravitational Wave Observatory
Data Replication Service
Replicating >1 Terabyte/day to 8 sites
770 TB replicated to date: >120 million replicas
MTBF = 1 month
[Figure: site map; labels include Birmingham, Cardiff, AEI/Golm]
Ann Chervenak et al., ISI; Scott Koranda et al., LIGO
Creating Services in 2005
“This full-day tutorial provides an introduction to programming Java services with the latest version of the Globus Toolkit version 4 (GT4). The tutorial teaches how to build a Java Service that makes use of GT4 mechanisms for state management, security, registry and related topics.”
Creating Services in 2008: Introduce and gRAVI
Introduce: define service, create skeleton, discover types, add operations, configure security
gRAVI (Grid Remote Application Virtualization Interface): wrap executables
[Figure: Introduce creates an application service, stores it in a repository service, and advertises it in an index service; the GAR is transferred and deployed to a container; clients discover the service, invoke it, and get results.]
Ohio State University and Argonne/U.Chicago
Creating Services: E.g., Introduce Authoring Tool
Define service
Create skeleton
Discover types
Add operations
Configure security
Modify service
Targets GT4
New GT4 services created in five minutes …
Introduce: Hastings, Saltz, et al., Ohio State University
Metascheduling in 2005
“Writing software that dispatches jobs to many sites via GRAM interfaces is left as an exercise for the reader.”
Metascheduling in 2008: GridWay
Architecture Examples
[Figure: layered architecture. Applications: users, DRMAA interface, science gateways. Middleware: GridWay over Globus. Infrastructure: SGE, PBS, and LSF clusters. Variants shown: multiple administrative domains and multiple organizations; multiple metaschedulers with (virtual) organization-wide policies.]
EGEE-II: gLite-LHC interoperability; Virtual Organizations; Fusion: Massive Ray Tracing; Biomed: CD-HIT (Workflow)
AstroGrid-D, German Astronomy Community Grid: supercomputing resources; astronomy-specific resources; GRAM interface
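As a rough illustration of what a metascheduler does, the sketch below dispatches jobs to whichever site currently has the lowest queue load. It is a toy under stated assumptions, not GridWay's algorithm or the GRAM API: the `Site` class, its `submit` method, the site names, and the slot counts are all hypothetical stand-ins for real remote submission.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    """Hypothetical stand-in for a cluster reachable through a job interface."""
    name: str
    slots: int
    queued: List[str] = field(default_factory=list)

    def load(self) -> float:
        # Queued jobs per available slot; lower means more free capacity.
        return len(self.queued) / self.slots

    def submit(self, job_id: str) -> None:
        # A real metascheduler would make a remote submission here.
        self.queued.append(job_id)


def dispatch(jobs: List[str], sites: List[Site]) -> Dict[str, str]:
    """Send each job to the least-loaded site, breaking ties by site name."""
    placement = {}
    for job_id in jobs:
        target = min(sites, key=lambda site: (site.load(), site.name))
        target.submit(job_id)
        placement[job_id] = target.name
    return placement


sites = [Site("sge-cluster", 4), Site("pbs-cluster", 2), Site("lsf-cluster", 2)]
placement = dispatch([f"job-{i}" for i in range(8)], sites)
for site in sites:
    print(site.name, len(site.queued))
```

With eight jobs and eight slots in total, this greedy policy fills each site in proportion to its capacity. Production metaschedulers also have to handle failures, data staging, and organization-wide policies, none of which is modeled here.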
Globus Software: dev.globus.org
[Figure: map of Globus projects, grouped into Globus Toolkit, Globus Projects, and Incubator Projects.]
Areas: Security, Execution Mgmt, Info Services, Common Runtime, Data Mgmt, Incubation Mgmt, Other
Projects shown: MPICH G2, GridWay, Cog WF, LRMA, GAARDS, OGRO, GDTE, UGP, HOC-SA, PURSE, GridShib, Introduce, Dyn Acct, WEEP, Gavia JSC, Gavia MS, DDM, Virt WkSp, SGGC, Metrics, ServMark, GridFTP, Reliable File Transfer, OGSA-DAI, GRAM, MDS4, CAS, Data Rep, Delegation, Replica Location, Java Runtime, C Runtime, Python Runtime, C Sec, GT4 Docs, MEDICUS, GSI-OpenSSH, MyProxy, Swift, MonMan, NetLogger, GEMLCA, gRAVI
Some Recent Globus GridFTP Enhancements
Performance
Dynamic data mover management
Small-files optimization
Ease of use
SSH authentication
Robustness
Connection mgmt
Space reservation
TeraGrid’s Information Systems Architecture
[Figure: information services users (user documentation, user portal, gateways, peer grids, user applications, others; "Database?" marked as an open question) act as clients of info.teragrid.org. TeraGrid central services (WebMDS on Tomcat, Apache 2.0, caches) answer WS/REST HTTP GET and WS/SOAP requests and draw on WS MDS4 data published over WS/SOAP by resource provider services, TeraGrid repositories, and partners.]
GRAM Scalability: E.g., AstroGrid-D Performance
#1 as reported on Einstein@home top users http://einstein.phys.uwm.edu/top_users.php
Examples of Globus-Based Production Scientific Grids
APAC (Australia)
China Grid
China National Grid
CROWN Grid
DGrid (Germany)
EGEE
Open Science Grid
Taiwan Grid
TeraGrid
ThaiGrid
UK Natl Grid Service
dev.globus: Community Driven Improvement of Globus Software, NSF OCI
http://dev.globus.org
Guidelines (Apache)
Infrastructure (CVS, email, bugzilla, Wiki)
Projects Include …
Selected Globus Content: Tuesday
Tuesday morning: GT Java WS Core; Authoring Services Using Introduce; Grid Remote Application Virtualization Interface (gRAVI)
Tuesday afternoon: What's New in the Data Area?; GridFTP and Cluster Meltdown: When No Means 'Maybe Later'; Grid Information Management using MDS
Selected Globus Content: Wednesday Morning
GridWay: The Open Source Metascheduling Technology for Grid Computing
Using Taverna to Orchestrate Grid Services in a Workflow
MyProxy-based Short Lived Credential CA Service at NERSC
Configuring and Deploying GridFTP for Managing Data Movement in Grid/HPC Environments
Selected Globus Content: Wednesday Afternoon
Globus Execution Services: What's New in 4.0 and 4.2, Future Plans; Virtual Machine Management Services; Experiences with GRAM in the LEAD portal; Swift & Falkon
Globus Security Update and Futures: GAARDS; Attribute-based Authorization with GridShib
Virtualization and Cloud Computing with Globus
Selected Globus Content: Thursday
Innovative Grid Applications: Earth System Grid; Southern California Earthquake Center; MEDICUS and Children’s Oncology Grid
Globus Administration Tutorial
Porting Applications with Globus GridWay
Service Oriented Science Tutorial
Examples of What Globus Lets You Do
Build secure & stateful Web services: Web Services core, service authoring tools
Configure distributed authorization structures: powerful standards-based security tools
Deploy services/run jobs on remote systems: GRAM, virtual workspace, dynamic services
Move data fast & reliably among many sites: Globus data services
Discover and monitor services & resources: Globus information service