Jan 13, 2016
The Grid: Opportunities, Achievements,
and Challenges for (Computer) Science
Ian Foster
Argonne National Laboratory
University of Chicago
Globus Alliance
www.mcs.anl.gov/~foster
[email protected] ARGONNE CHICAGO
The Grid: “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
1. Enable integration of distributed resources
2. Using general-purpose protocols & infrastructure
3. To achieve useful qualities of service
The Grid Phenomenon: An Abbreviated History
Early 90s: Gigabit testbeds, metacomputing
Mid to late 90s: Early experiments (e.g., I-WAY), academic software (Globus, Condor, Legion)
2003:
◊ Hundreds of application communities & projects in scientific and technical computing
◊ Major infrastructure deployments
◊ Open source technology: Globus Toolkit®, etc.
◊ Global Grid Forum: ~2000 people, 30+ countries
◊ Growing industrial adoption
[email protected] ARGONNE CHICAGO
Context (1): Dramatic Technological Evolution
Ubiquitous Internet: 100+ million hosts
◊ Collaboration & resource sharing the norm
Ultra-high-speed networks: 10+ Gb/s
◊ Global optical networks
Enormous quantities of data: petabytes
◊ For an increasing number of communities, the gating step is not collection but analysis
Huge quantities of computing: 100+ Top/s
◊ Ubiquitous computing via clusters
Moore’s law everywhere: 1000x/decade
◊ Instruments, detectors, sensors, scanners
[email protected] ARGONNE CHICAGO
Context (1): Technological Evolution
Internet revolution: 100M+ hosts
◊ Collaboration & sharing the norm
Universal Moore’s law: ×10³ per 10 yrs
◊ Sensors as well as computers
Petascale data tsunami
◊ Gating step is analysis
… & our old infrastructure?
[Chart: 114 genomes sequenced, 735 in progress; annotation “You are here”]
Context (2): A Powerful New Three-way Alliance
◊ Computing science: systems, notations & formal foundation → architecture, algorithms
◊ Theory: models & simulations → shared data
◊ Experiment & advanced data collection → shared data
Requires much engineering and innovation
Changes culture, mores, and behaviours
CS as the “new mathematics” – George Djorgovski
[email protected] ARGONNE CHICAGO(Based on a slide from HP)
Context (3): New Commercial Computing Models
[Diagram (HP): value vs. shared, traded resources; an evolution from clusters (OpenVMS clusters, TruCluster, MC ServiceGuard; Tru64, HP-UX, Linux) through grid-enabled systems (today) and the programmable data center (switch fabric, compute, storage; UDC) to the virtual data center and a computing utility, or GRID]
Utility computing ◊ On-demand ◊ Service-orientation ◊ Virtualization
[email protected] ARGONNE CHICAGO
Problem-Driven, Collaborative Research Methodology
[Cycle diagram: Design → Build → Deploy → Apply → Analyze → Design …, linking Computer Science, Software & Standards, Infrastructure, Discipline Advances, and the Global Community]
[email protected] ARGONNE CHICAGO
Resource/Service Integration as a Fundamental Challenge
◊ Discovery: many sources of data, services, computation; registries organize services of interest to a community
◊ Access: data integration activities may require access to, & exploration/analysis of, data at many locations
◊ Workflow: exploration & analysis may involve complex, multi-step workflows
◊ Resource management: needed to ensure progress & arbitrate competing demands
◊ Security & policy services must underlie access & management decisions
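The discovery step above can be sketched in miniature. This is a toy illustration, not the MDS/OGSI registry interface; the `Registry` class and the example service entries are hypothetical.

```python
# Toy sketch of community registries and discovery (illustrative only;
# real Grid registries used MDS / OGSI service groups, not this API).

class Registry:
    """A community registry: services register entries with metadata."""
    def __init__(self):
        self._entries = {}  # service name -> metadata dict

    def register(self, name, metadata):
        self._entries[name] = metadata

    def discover(self, **required):
        """Return names of services whose metadata matches all criteria."""
        return [n for n, md in self._entries.items()
                if all(md.get(k) == v for k, v in required.items())]

# A virtual organization might run one registry per community:
vo = Registry()
vo.register("storage.anl.example", {"kind": "storage", "protocol": "gridftp"})
vo.register("compute.ncsa.example", {"kind": "compute", "cpus": 512})

print(vo.discover(kind="storage"))   # ['storage.anl.example']
```

A client would first query the registry, then access the matching services; resource managers and policy services sit underneath both steps.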
[email protected] ARGONNE CHICAGO
Scale Metrics: Participants, Data, Tasks, Performance, Interactions, …
[Chart: CPU count (10–100,000, log scale) vs. collaboration size (0–2500). Points include the Earth Simulator, an atmospheric chemistry group, LHC experiments, astronomy, gravitational wave detection, nuclear experiments, and current accelerator experiments.]
[email protected] ARGONNE CHICAGO
Profound Technical Challenges
How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings:
◊ Negotiate & manage trust
◊ Access & integrate data
◊ Construct & reuse workflows
◊ Plan complex computations
◊ Detect & recover from failures
◊ Capture & share knowledge
◊ Represent & enforce policies
◊ Achieve end-to-end QoX
◊ Move data rapidly & reliably
◊ Support collaborative work
◊ Define primitive protocols
◊ Build reusable software
◊ Package & deliver software
◊ Deploy & operate services
◊ Operate & upgrade infrastructure
◊ Perform troubleshooting
◊ Etc., etc., etc.
[email protected] ARGONNE CHICAGO
Grid Technology R&D Seeks to Identify Enabling Mechanisms
Infrastructure (“middleware”) for establishing, managing, and evolving multi-organizational federations
◊ Dynamic, autonomous, domain independent
◊ On-demand, ubiquitous access to computing, data, and services
Mechanisms for creating and managing workflow within such federations
◊ New capabilities constructed dynamically and transparently from distributed services
◊ Service-oriented, virtualization
[email protected] ARGONNE CHICAGO
Grid as Computer Science Integrator and Contributor
Grid technologies and applications
Networking, security, distributed systems
Databases and knowledge representation
Computer supported collaborative work
Compilers, algorithms, formal methods
[email protected] ARGONNE CHICAGO
Computer Science Contributions
Protocols and/or tools for use in dynamic, scalable, multi-institutional, computationally & data-rich settings for:
◊ Large-scale distributed system architecture
◊ Cross-org authentication
◊ Scalable community-based policy enforcement
◊ Robust & scalable discovery
◊ Wide-area scheduling
◊ High-performance, robust, wide-area data management
◊ Knowledge-based workflow generation
◊ High-end collaboration
◊ Resource & service virtualization
◊ Distributed monitoring & manageability
◊ Application development
◊ Wide-area fault tolerance
◊ Infrastructure deployment & management
◊ Resource provisioning & quality of service
◊ Performance monitoring & modeling
GriPhyN Virtual Data Technology
www.griphyn.org/chimera
[Virtual Data System diagram: transformations, derivations (each the execution-of a transformation), and data (created-by a derivation, consumed-by/generated-by derivations)]
“I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
“I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
“I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
“I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”
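The calibration-error use case amounts to a reachability query over the provenance graph. Here is a minimal sketch, assuming a hypothetical in-memory record of which datasets each derivation consumed and generated; the real system was GriPhyN's Chimera, whose catalog and interfaces differ.

```python
# Illustrative sketch: given provenance records, find all derived data
# downstream of a bad input, i.e. everything that must be recomputed.
# (The derivation names and datasets below are invented for illustration.)

from collections import deque

# derivation -> (datasets consumed, datasets generated)
derivations = {
    "calibrate": (["raw_scan"], ["calibrated"]),
    "extract":   (["calibrated"], ["catalog"]),
    "cluster":   (["catalog"], ["cluster_list"]),
}

def downstream(bad_dataset):
    """All datasets transitively derived from bad_dataset."""
    tainted, queue = set(), deque([bad_dataset])
    while queue:
        d = queue.popleft()
        for consumed, generated in derivations.values():
            if d in consumed:
                for g in generated:
                    if g not in tainted:
                        tainted.add(g)
                        queue.append(g)
    return tainted

print(sorted(downstream("raw_scan")))
# ['calibrated', 'catalog', 'cluster_list']
```

The same graph, traversed backwards, answers the first use case: what transformations and corrections produced the data I am about to trust.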
[email protected] ARGONNE CHICAGO
Problem-Driven, Collaborative Research Methodology
[Methodology cycle diagram, repeated: Design → Build → Deploy → Apply → Analyze]
[email protected] ARGONNE CHICAGO
Evolution of Open Grid Standards and Software
[Chart: increased functionality & standardization over time, 1990–2010]
◊ Custom solutions (~1990)
◊ Internet standards → Globus Toolkit: de facto standard, single implementation
◊ Web services, etc. → Open Grid Services Architecture: real standards, multiple implementations
◊ Managed shared virtual systems: research, toward 2010
[email protected] ARGONNE CHICAGO
Open Grid Services Architecture
Service-oriented architecture
◊ Key to virtualization, discovery, composition, local-remote transparency
Leverage industry standards
◊ Internet, Web services
Distributed service management
◊ A “component model for Web services”
A framework for the definition of composable, interoperable services
“The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002
[email protected] ARGONNE CHICAGO
OGSI: Standard Web Services Interfaces & Behaviors
Naming and bindings (basis for virtualization)
◊ Every service instance has a unique name, from which one can discover supported bindings
Lifecycle (basis for fault-resilient state management)
◊ Service instances created by factories
◊ Destroyed explicitly or via soft state
Information model (basis for monitoring & discovery)
◊ Service data (attributes) associated with GS instances
◊ Operations for querying and setting this info
◊ Asynchronous notification of changes to service data
Service Groups (basis for registries & collective svcs)
◊ Group membership rules & membership management
Base Fault type
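The factory and soft-state lifecycle behaviors can be illustrated with a toy sketch. This is not the OGSI interface: the `Factory` class, handle format, and lease times below are invented for illustration of the idea that state is reclaimed unless a client keeps renewing it.

```python
# Sketch of the soft-state lifecycle idea: a factory creates service
# instances with a termination time; clients must renew the lease or
# the instance is reclaimed. (Toy code, not the OGSI API.)

import itertools
import time

class Factory:
    _ids = itertools.count()

    def __init__(self):
        self.instances = {}  # handle -> lease expiry timestamp

    def create(self, lifetime_s):
        handle = f"service-{next(self._ids)}"   # unique name per instance
        self.instances[handle] = time.time() + lifetime_s
        return handle

    def renew(self, handle, lifetime_s):
        self.instances[handle] = time.time() + lifetime_s

    def sweep(self):
        """Destroy instances whose lease expired. Fault resilience: the
        state of a crashed or departed client is reclaimed automatically."""
        now = time.time()
        for h, expiry in list(self.instances.items()):
            if expiry < now:
                del self.instances[h]

f = Factory()
h = f.create(lifetime_s=0.05)
time.sleep(0.1)          # client "crashes" and never renews
f.sweep()
print(h in f.instances)  # False: lease expired, instance reclaimed
```

Explicit destruction is then just an optimization; correctness never depends on a departing client cleaning up after itself.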
[email protected] ARGONNE CHICAGO
Web Services: Basic Functionality
OGSA: Open Grid Services Architecture
OGSI: Interface to Grid Infrastructure
Virtual Integration Architecture
[Layered diagram: Users in Problem Domain X → Applications in Problem Domain X → Application & Integration Technology for Problem Domain X → Generic Virtual Service Access and Integration Layer → OGSA/OGSI → distributed compute, data & storage resources]
Generic services include: structured data access & integration (relational, XML, semi-structured), transformation, registry, job submission, data transport, resource usage, banking, brokering, workflow, authorisation
[email protected] ARGONNE CHICAGO
The Globus Alliance & Toolkit (Argonne, USC/ISI, Edinburgh, PDC)
An international partnership dedicated to creating & disseminating high-quality open source Grid technology: the Globus Toolkit
◊ Design, engineering, support, governance
Academic Affiliates make major contributions
◊ EU: CERN, Imperial, MPI, Poznan
◊ AP: AIST, TIT, Monash
◊ US: NCSA, SDSC, TACC, UCSB, UW, etc.
Significant industrial contributions
1000s of users worldwide, many of whom contribute
[email protected] ARGONNE CHICAGO
Globus Toolkit History: An Unreliable Memoir
[Chart: Globus Toolkit downloads/month from Globus.Org, 0–30,000, 1997–2002. Only Globus.Org; not downloads from NMI, UK eScience, EU DataGrid, IBM, Platform, etc.]
Timeline milestones:
◊ DARPA, NSF begin funding Grid work
◊ NASA initiates Information Power Grid
◊ Globus Project wins Global Information Infrastructure Award
◊ MPICH-G released
◊ The Grid: Blueprint for a New Computing Infrastructure published
◊ GT 1.0.0 released; early application successes reported
◊ GT 1.1.1, 1.1.2, 1.1.3 released
◊ NSF & European Commission initiate many new Grid projects
◊ GT 1.1.4 and MPICH-G2 released
◊ Anatomy of the Grid paper released
◊ First EuroGlobus Conference held in Lecce
◊ Significant commercial interest in Grids
◊ NSF GRIDS Center initiated
◊ GT 2.0 beta released; Physiology of the Grid paper released
◊ GT 2.0, then GT 2.2, released
[email protected] ARGONNE CHICAGO
Globus Toolkit Contributors Include
◊ Grid Packaging Technology (GPT): NCSA
◊ Persistent GRAM Jobmanager: Condor
◊ GSI/Kerberos interchangeability: Sandia
◊ Documentation: NASA, NCSA
◊ Ports: IBM, HP, Sun, SDSC, …
◊ MDS stress testing: EU DataGrid
◊ Support: IBM, Platform, UK eScience
◊ Testing and patches: many
◊ Interoperable tools: many
◊ Replica location service: EU DataGrid
◊ Python hosting environment: LBNL
◊ Data access & integration: UK eScience
◊ Data mediation services: SDSC
◊ Tooling, Xindice, JMS: IBM
◊ Brokering framework: Platform
◊ Management framework: HP
◊ Funding: DARPA, DOE, NSF, NASA, Microsoft, EU
[email protected] ARGONNE CHICAGO
Problem-Driven, Collaborative Research Methodology
[Methodology cycle diagram, repeated: Design → Build → Deploy → Apply → Analyze]
[email protected] ARGONNE CHICAGO
Infrastructure
Broadly deployed services in support of virtual organization formation and operation
◊ Authentication, authorization, discovery, …
Services, software, and policies enabling on-demand access to important resources
◊ Computers, databases, networks, storage, software services, …
Operational support for 24x7 availability
Integration with campus infrastructures
Distributed, heterogeneous, instrumented systems can be wonderful CS testbeds
[email protected] ARGONNE CHICAGO
Infrastructure Status
>100 infrastructure deployments worldwide
◊ Community-specific & general-purpose
◊ From campus to international
◊ Most based on GT technology
U.S. examples: TeraGrid, Grid2003, NEESgrid, Earth System Grid, BIRN
Major open issues include practical aspects of operations and federation
Scalability issues (number of users, sites, resources, files, jobs, etc.) also arising
[email protected] ARGONNE CHICAGO
[email protected] ARGONNE CHICAGO
TeraGrid
[Site diagram, sites connected by 4 lambdas: ANL, NCSA, Caltech, SDSC, PSC. Resources include 2p Madison (Itanium-2) clusters with Myrinet (roughly 20 to 667 nodes per site), 230–500 TB FCS SANs, a 100 TB DataWulf, a 1.1 TF Power4 Federation system, 96 GeForce4 graphics pipes with Pentium4 nodes, 4p visualization nodes with 75 TB of storage, 750 4p Alpha EV68 nodes with Quadrics, and a 128p EV7 Marvel.]
Grid2003: Towards a Persistent U.S. Open Science Grid
Status on 11/19/03(http://www.ivdgl.org/grid2003)
NEESgrid
www.neesgrid.org
(FY 2005 – FY 2014)
[Diagram: field equipment, laboratory equipment, remote users (including K-12 faculty and students), high-performance network(s), instrumented structures and sites, leading-edge computation, curated data repository, simulation tools repository, global connections]
[email protected] ARGONNE CHICAGOwww.earthsystemgrid.org
DOE Earth System Grid
Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models
[email protected] ARGONNE CHICAGO
Earth System Grid
[email protected] ARGONNE CHICAGO
EGEE: Enabling Grids for E-Science in Europe
[Diagram: an Operations Center; Resource Centers (processors, disks) running Grid server nodes; Regional Support Centers providing support for applications & local resources]
[email protected] ARGONNE CHICAGO
Problem-Driven, Collaborative Research Methodology
[Methodology cycle diagram, repeated: Design → Build → Deploy → Apply → Analyze]
[email protected] ARGONNE CHICAGO
Applications
100s of projects applying Grid technologies in science, engineering, and industry
Many are exploratory, but a significant number are delivering real value, in such areas as:
◊ Remote access to computers, data, services, instrumentation
◊ Federation of computers, data, instruments
◊ Collaborative environments
No single recipe for success, but well-defined goals, modest ambition, & skilled staff help
Sloan Galaxy Cluster Analysis
[Log-log chart: number of clusters (1–100,000) vs. number of galaxies (1–100), showing the galaxy cluster size distribution]
[DAG-structured workflow applied to Sloan data]
Jim Annis, Steve Kent, Vijay Sehkri, Fermilab; Michael Milligan, Yong Zhao, Chicago
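A DAG-structured analysis like this can be sketched as a topological ordering of pipeline stages. The stage names below are hypothetical, and the real workflow was expressed with GriPhyN virtual data tools rather than this code; the sketch only shows why a DAG is the right abstraction: each step runs only after its inputs exist.

```python
# Sketch of a DAG workflow: map each task to its prerequisites, then
# compute an execution order. (Hypothetical stage names; illustrative only.)

from graphlib import TopologicalSorter  # Python 3.9+

# task -> set of prerequisite tasks
dag = {
    "extract_galaxies": set(),
    "find_neighbors":   {"extract_galaxies"},
    "find_clusters":    {"find_neighbors"},
    "size_histogram":   {"find_clusters"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)
# ['extract_galaxies', 'find_neighbors', 'find_clusters', 'size_histogram']
```

In a Grid setting the same ordering also exposes which independent tasks can be dispatched to different sites in parallel, and, combined with virtual data records, which results already exist and need not be recomputed.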
[email protected] ARGONNE CHICAGO
NEESgrid Multi-site Online Simulation Test (MOST)
[Diagram: UIUC experimental model (m1, f1), U. Colorado experimental model (f2), and NCSA computational model coupled via gx. All computational models written in Matlab.]
[email protected] ARGONNE CHICAGO
MOST: A Grid Perspective
[Diagram: a Simulation Coordinator connects through NTCP servers to the UIUC experimental model (f1, x1), the U. Colorado experimental model (f2, m1), and the NCSA computational model (m1, f1, f2), exchanging forces and displacements (F1, F2, e, gx)]
[email protected] ARGONNE CHICAGO
MOST: User Perspective
[Chart: number of participants (0–70) vs. time of day (8:00–18:30), for UIUC and Colorado]
[email protected] ARGONNE CHICAGO
Industry Adopts Grid Technology
[email protected] ARGONNE CHICAGO
Concluding Remarks
[Methodology cycle diagram, repeated: Design → Build → Deploy → Apply → Analyze]
[email protected] ARGONNE CHICAGO
“Grid” R&D Has Produced Significant Success Stories
Computer science results
◊ Metrics: papers, citations, students
Widely used software
◊ Globus Toolkit, Condor, NMI, etc.
International cooperation & community
◊ Science, technology, infrastructure
Interdisciplinary science and engineering
◊ Effective partnerships & community
Industrial adoption
◊ Broad spectrum of large & small companies
[email protected] ARGONNE CHICAGO
Significant Challenges Remain
Scaling in multiple dimensions
◊ Ambition and complexity of applications
◊ Number of users, datasets, services, …
◊ From technologies to solutions
The need for persistent infrastructure
◊ Software and people as well as hardware
◊ Currently no long-term commitment
Institutionalizing the 3-way alliance
◊ Understand implications for the practice of computer science research
[email protected] ARGONNE CHICAGO
Thanks, in particular, to:
◊ Carl Kesselman and Steve Tuecke, my long-time Globus co-conspirators
◊ Gregor von Laszewski, Kate Keahey, Jennifer Schopf, Mike Wilde, Argonne colleagues
◊ Globus Alliance members at Argonne, U.Chicago, USC/ISI, Edinburgh, PDC
◊ Miron Livny, U.Wisconsin Condor project; Rick Stevens, Argonne & U.Chicago
◊ Other partners in Grid technology, application, & infrastructure projects
◊ DOE, NSF, NASA, IBM for generous support
[email protected] ARGONNE CHICAGO
For More Information
The Globus Alliance®
◊ www.globus.org
Global Grid Forum
◊ www.ggf.org
Background information
◊ www.mcs.anl.gov/~foster
The Grid: Blueprint for a New Computing Infrastructure, 2nd edition: just out