The Anatomy of the Grid Enabling Scalable Virtual Organizations Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department of Computer Science The University of Chicago http://www.mcs.anl.gov/~foster

Mar 26, 2015

Page 1

The Anatomy of the Grid: Enabling Scalable Virtual Organizations

Ian Foster

Mathematics and Computer Science Division

Argonne National Laboratory

and

Department of Computer Science

The University of Chicago

http://www.mcs.anl.gov/~foster

Page 2

[email protected] ARGONNE CHICAGO

Grids are “hot” …

Computational, Data, Information, Access, Knowledge

DISCOM, SinRG, APGrid, TeraGrid

… but what are they really about?

Page 3

Issues I Propose to Address

Problem statement
Architecture
Globus Toolkit
Futures

Page 4

The Grid Problem

Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

Page 5

Elements of the Problem

Resource sharing
– Computers, storage, sensors, networks, …
– Sharing always conditional: issues of trust, policy, negotiation, payment, …

Coordinated problem solving
– Beyond client-server: distributed data analysis, computation, collaboration, …

Dynamic, multi-institutional virtual orgs
– Community overlays on classic org structures
– Large or small, static or dynamic

Page 6

Grid Communities & Applications: Data Grids for High Energy Physics

[Figure: tiered computing model for high energy physics data, with levels labeled Tier 0, Tier 1, Tier 2, and Tier 4]

Sites and capacities shown:
– Online System
– Offline Processor Farm, ~20 TIPS
– CERN Computer Centre
– FermiLab, ~4 TIPS
– France Regional Centre, Italy Regional Centre, Germany Regional Centre
– Caltech, ~1 TIPS, and four Tier2 Centres, ~1 TIPS each
– Four Institutes, ~0.25 TIPS
– Physics data cache
– Physicist workstations

Link rates shown: ~PBytes/sec; ~100 MBytes/sec (two links); ~622 Mbits/sec (two links); ~622 Mbits/sec or Air Freight (deprecated); ~1 MBytes/sec

There is a “bunch crossing” every 25 nsecs.

There are 100 “triggers” per second.

Each triggered event is ~1 MByte in size.

Physicists work on analysis “channels”.

Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.

1 TIPS is approximately 25,000 SpecInt95 equivalents.

Image courtesy Harvey Newman, Caltech
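
The figures on this slide fit together with simple arithmetic. The following is a minimal sketch that only restates the slide's numbers (25 ns bunch crossings, 100 triggers per second, ~1 MByte per event, 25,000 SpecInt95 per TIPS); the variable names are illustrative and not part of the talk.

```python
# Figures quoted on the slide.
BUNCH_CROSSING_INTERVAL_NS = 25   # one bunch crossing every 25 ns
TRIGGERS_PER_SECOND = 100         # triggered events per second
EVENT_SIZE_MBYTES = 1             # approximate size of one triggered event
SPECINT95_PER_TIPS = 25_000       # 1 TIPS is approximately 25,000 SpecInt95

NS_PER_SECOND = 1_000_000_000

# How many bunch crossings happen each second, and how many per trigger.
crossings_per_second = NS_PER_SECOND // BUNCH_CROSSING_INTERVAL_NS
crossings_per_trigger = crossings_per_second // TRIGGERS_PER_SECOND

# Data rate of triggered events; matches the ~100 MBytes/sec links in the figure.
triggered_mbytes_per_second = TRIGGERS_PER_SECOND * EVENT_SIZE_MBYTES

# The ~20 TIPS offline farm expressed in SpecInt95 equivalents.
offline_farm_specint95 = 20 * SPECINT95_PER_TIPS

print(crossings_per_second, crossings_per_trigger)
print(triggered_mbytes_per_second, offline_farm_specint95)
```

In other words, roughly one bunch crossing in 400,000 is kept, and the kept events alone account for the ~100 MBytes/sec rate in the figure.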

Page 7

Grid Communities and Applications: Network for Earthquake Eng. Simulation

NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other

On-demand access to experiments, data streams, computing, archives, collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

Page 8

Grid Communities and Applications: Mathematicians Solve NUG30

Community = an informal collaboration of mathematicians and computer scientists

Condor-G delivers 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)

Solves NUG30 quadratic assignment problem

14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23

MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
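
To put the NUG30 numbers in perspective, here is a minimal sketch that derives the average processor count from the slide's figures, checks that the listed sequence is a permutation of 1..30, and shows the general shape of a quadratic assignment cost function. The 3x3 flow and distance matrices are toy data, not the NUG30 instance, and reading the permutation as "facility i goes to location p[i]" is an assumption.

```python
CPU_SECONDS = 3.46e8              # delivered by Condor-G, per the slide
WALL_CLOCK_SECONDS = 7 * 24 * 3600
PEAK_PROCESSORS = 1009

# Average number of processors kept busy over the 7 days.
average_processors = CPU_SECONDS / WALL_CLOCK_SECONDS

# The sequence listed on the slide.
SLIDE_PERMUTATION = [14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25,
                     22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23]
is_permutation = sorted(SLIDE_PERMUTATION) == list(range(1, 31))


def qap_cost(flow, distance, assignment):
    """Quadratic assignment cost: sum of flow[i][j] * distance between the
    locations assigned to facilities i and j (assignment is 1-based)."""
    n = len(assignment)
    return sum(
        flow[i][j] * distance[assignment[i] - 1][assignment[j] - 1]
        for i in range(n)
        for j in range(n)
    )


# Toy 3-facility instance (hypothetical data, for illustration only).
TOY_FLOW = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
TOY_DISTANCE = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

print(round(average_processors), is_permutation)
print(qap_cost(TOY_FLOW, TOY_DISTANCE, [1, 2, 3]))
print(qap_cost(TOY_FLOW, TOY_DISTANCE, [2, 1, 3]))
```

The average works out to a bit under 60% of the 1009-processor peak, which is consistent with an opportunistic pool whose size varied over the run.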

Page 9

Grid Communities and Applications: Home Computers Evaluate AIDS Drugs

Community =
– 1000s of home computer users
– Philanthropic computing vendor (Entropia)
– Research group (Scripps)

Common goal = advance AIDS research

Page 10

Grid Architecture

Page 11

Why Discuss Architecture?

Descriptive
– Provide a common vocabulary for use when describing Grid systems

Guidance
– Identify key areas in which services are required

Prescriptive
– Define standard “Intergrid” protocols and APIs to facilitate creation of interoperable Grid systems and portable applications

Page 12

What Sorts of Standards?

Need for interoperability when different groups want to share resources
– E.g., IP lets me talk to your computer, but how do we establish & maintain sharing?
– How do I discover, authenticate, authorize, describe what I want to do, etc., etc.?

Need for shared infrastructure services to avoid repeated development, installation, e.g.
– One port/service for remote access to computing, not one per tool/application
– X.509 enables sharing of Certificate Authorities

Page 13

So, in Defining Grid Architecture, We Must Address …

Development of Grid protocols & services
– Protocol-mediated access to remote resources
– New services: e.g., resource brokering
– “On the Grid” = speak Intergrid protocols
– Mostly (extensions to) existing protocols

Development of Grid APIs & SDKs
– Facilitate application development by supplying higher-level abstractions

The (hugely successful) model is the Internet

The Grid is not a distributed OS!

Page 14

The Role of Grid Services (aka Middleware) and Tools

[Figure: tools and Grid services layered over the network (“net”)]

Tools: Collaboration Tools, Data Mgmt Tools, Distributed simulation, . . .

Grid services: Remote monitor, Remote access, Information services, Fault detection, Resource mgmt, . . .

Page 15

Layered Grid Architecture (By Analogy to Internet Architecture)

Grid layers, from top to bottom:

Application

Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services

Resource: “Sharing single resources”: negotiating access, controlling use

Connectivity: “Talking to things”: communication (Internet protocols) & security

Fabric: “Controlling things locally”: access to, & control of, resources

Internet Protocol Architecture, from top to bottom: Application, Transport, Internet, Link
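
One way to keep the layering straight is to write it down as an ordered type. The following is a minimal sketch, assuming only the layer names on this slide and the example protocols and services named elsewhere in this talk; `GridLayer`, `EXAMPLES`, and `layers_below` are illustrative names, not part of any Globus API.

```python
from enum import IntEnum


class GridLayer(IntEnum):
    """Grid layers, numbered from the bottom of the stack upward."""
    FABRIC = 1
    CONNECTIVITY = 2
    RESOURCE = 3
    COLLECTIVE = 4
    APPLICATION = 5


# Examples taken from other slides in this talk.
EXAMPLES = {
    GridLayer.FABRIC: ["computers", "Condor pools", "file systems", "networks", "sensors"],
    GridLayer.CONNECTIVITY: ["IP", "DNS", "GSI"],
    GridLayer.RESOURCE: ["GRAM", "GridFTP", "MDS (GRRP, GRIP)"],
    GridLayer.COLLECTIVE: ["index servers", "resource brokers", "replica management"],
    GridLayer.APPLICATION: [],
}


def layers_below(layer):
    """Return the layers underneath the given layer, bottom first."""
    return [candidate for candidate in GridLayer if candidate < layer]


print([layer.name for layer in layers_below(GridLayer.RESOURCE)])
```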

Page 16

Protocols, Services, and Interfaces Occur at Each Level

[Figure: layer stack, from top to bottom]

Applications
Languages/Frameworks
Collective Service APIs and SDKs
Collective Service Protocols
Collective Services
Resource APIs and SDKs
Resource Service Protocols
Resource Services
Connectivity APIs
Connectivity Protocols
Local Access APIs and Protocols
Fabric Layer

Page 17

Where Are We With Architecture?

No “official” standards exist
– Nor is it clear what this would mean

But:
– Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
– GGF has an architecture working group
– Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information

Page 18

The Globus Toolkit

Page 19

Grid Services Architecture (1): Fabric Layer

Just what you would expect: the diverse mix of resources that may be shared
– Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.

Few constraints on low-level technology: connectivity and resource level protocols form the “neck in the hourglass”

Globus Toolkit provides a few selected components (e.g., bandwidth broker)

Page 20

Grid Services Architecture (2): Connectivity Layer Protocols & Services

Communication
– Internet protocols: IP, DNS, routing, etc.

Security: Grid Security Infrastructure (GSI)
– Uniform authentication & authorization mechanisms in multi-institutional setting
– Single sign-on, delegation, identity mapping
– Public key technology, SSL, X.509, GSS-API
– Supporting infrastructure: Certificate Authorities, key management, etc.

Page 21

[Figure: GSI across two sites]

Components shown:
– User, holding a CREDENTIAL
– User Proxy, holding a Globus Credential
– Site 1 (Kerberos): GRAM, GSI, Ticket, three Processes
– Site 2 (Public Key): GRAM, GSI, Certificate, three Processes

Annotations:
– Single sign-on via “grid-id”
– Assignment of credentials to “user proxies”
– Mutual user-resource authentication
– Mapping to local ids
– Authenticated interprocess communication
– Authorization
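
The “mapping to local ids” step can be illustrated with a toy lookup. This is a minimal sketch under loud assumptions: it performs no cryptography and is not GSI; the `UserProxy` class, the site names, the grid-id string, and the local account names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserProxy:
    """Stand-in for a user proxy: it only carries the global grid-id."""
    grid_id: str


# Hypothetical per-site tables from grid-id to a local account.
LOCAL_ID_MAPS = {
    "site1-kerberos": {"/O=Example/CN=Jane Doe": "jdoe@SITE1.EXAMPLE"},
    "site2-public-key": {"/O=Example/CN=Jane Doe": "jd042"},
}


def sign_on(grid_id):
    """Single sign-on: the user authenticates once and gets a proxy."""
    return UserProxy(grid_id)


def map_to_local_id(site, proxy):
    """Each site maps the global identity to its own local id, or refuses."""
    try:
        return LOCAL_ID_MAPS[site][proxy.grid_id]
    except KeyError:
        raise PermissionError(f"{proxy.grid_id} is not known at {site}") from None


proxy = sign_on("/O=Example/CN=Jane Doe")
print(map_to_local_id("site1-kerberos", proxy))
print(map_to_local_id("site2-public-key", proxy))
```

The point of the sketch is that one sign-on yields different local identities at each site, without the user authenticating to each site separately.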

Page 22

GSI Futures

Scalability in numbers of users & resources
– Credential management
– Online credential repositories (“MyProxy”)
– Account management

Authorization
– Policy languages
– Community authorization

Protection against compromised resources
– Restricted delegation, smartcards

Page 23

GSI Futures: Community Authorization

[Figure: message flow among User, CAS, and Resource]

CAS holds: user/group membership, resource/collective membership, collective policy information

Resource holds: local policy information

1. CAS request, with resource names and operations
– CAS checks: Does the collective policy authorize this request for this user?

2. CAS reply, with capability and resource CA info

3. Resource request, authenticated with capability
– Resource checks: Is this request authorized for the CAS? Is this request authorized by the capability?

4. Resource reply
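
The four-step exchange can be walked through with a toy model. This is a minimal sketch, assuming plain Python objects in place of signed credentials; the class names, the policy layout, and the reply strings are hypothetical, and no real CAS interface is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    user: str
    resource: str
    operations: frozenset
    issuer: str


class CommunityAuthorizationService:
    def __init__(self, name, collective_policy):
        self.name = name
        # {(user, resource): set of operations the community allows}
        self.collective_policy = collective_policy

    def request(self, user, resource, operations):
        """Steps 1 and 2: check collective policy, then reply with a capability."""
        allowed = self.collective_policy.get((user, resource), set())
        if not set(operations) <= allowed:
            raise PermissionError("collective policy does not authorize this request")
        return Capability(user, resource, frozenset(operations), self.name)


class Resource:
    def __init__(self, name, trusted_cas, local_policy):
        self.name = name
        self.trusted_cas = trusted_cas
        self.local_policy = local_policy  # operations this resource grants to the CAS

    def handle(self, capability, operation):
        """Steps 3 and 4: check the CAS against local policy, then the capability."""
        if capability.issuer != self.trusted_cas or operation not in self.local_policy:
            return "denied: not authorized for the CAS"
        if capability.resource != self.name or operation not in capability.operations:
            return "denied: not authorized by the capability"
        return f"ok: {operation} on {self.name} for {capability.user}"


cas = CommunityAuthorizationService(
    "physics-cas", {("alice", "store1"): {"read", "write"}})
store = Resource("store1", trusted_cas="physics-cas", local_policy={"read"})
cap = cas.request("alice", "store1", ["read", "write"])
print(store.handle(cap, "read"))
print(store.handle(cap, "write"))
```

Note that the resource keeps the final say: the community grants alice write access, but the resource's local policy only extends read access to the CAS, so the write is refused.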

Page 24

Grid Services Architecture (3): Resource Layer Protocols & Services

Resource management: GRAM
– Remote allocation, reservation, monitoring, control of [compute] resources

Data access: GridFTP
– High-performance data access & transport

Information: MDS (GRRP, GRIP)
– Access to structure & state information

& others emerging: catalog access, code repository access, accounting, …

All integrated with GSI

Page 25

GRAM Resource Management Protocol

Grid Resource Allocation & Management
– Allocation, monitoring, control of computations

Simple HTTP-based RPC
– Job request:
  > Returns a “job contact”: Opaque string that can be passed between clients, for access to job
– Job cancel, Job status, Job signal
– Event notification (callbacks) for state changes
  > Pending, active, done, failed, suspended

Servers for most schedulers; C and Java APIs
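
The request/contact/callback pattern described above can be mimicked in a few lines. This is a minimal in-process sketch, not the GRAM protocol or any Globus API: `ToyJobManager`, the `toy-job:` contact format, and the signal names are hypothetical, and reporting a cancelled job as failed is an assumption made because the slide lists no separate cancelled state.

```python
import uuid
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"
    SUSPENDED = "suspended"


class ToyJobManager:
    """In-memory stand-in for a job service with status and callbacks."""

    # Hypothetical signals that drive state changes in this toy.
    _SIGNALS = {
        "start": JobState.ACTIVE,
        "suspend": JobState.SUSPENDED,
        "resume": JobState.ACTIVE,
        "finish": JobState.DONE,
    }

    def __init__(self):
        self._jobs = {}  # job contact -> [description, state, callback]

    def job_request(self, description, callback=None):
        """Create a job and return an opaque contact string for it."""
        contact = "toy-job:" + uuid.uuid4().hex
        self._jobs[contact] = [description, None, callback]
        self._set_state(contact, JobState.PENDING)
        return contact

    def job_status(self, contact):
        return self._jobs[contact][1]

    def job_signal(self, contact, signal):
        self._set_state(contact, self._SIGNALS[signal])

    def job_cancel(self, contact):
        # Assumption: a cancelled job is reported as FAILED.
        self._set_state(contact, JobState.FAILED)

    def _set_state(self, contact, state):
        job = self._jobs[contact]
        job[1] = state
        if job[2] is not None:
            job[2](contact, state)  # event notification on every state change


events = []
manager = ToyJobManager()
contact = manager.job_request({"executable": "/bin/hostname"},
                              callback=lambda c, s: events.append(s))
for signal in ("start", "suspend", "resume", "finish"):
    manager.job_signal(contact, signal)
print([state.value for state in events])
```

Because the contact is just a string, any client that holds it can query or signal the job, which is the property the slide highlights.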

Page 26

Resource Management Futures

GRAM-2 protocol (ETA late 2001)
– Advance reservations & multiple resource types

– Recoverable requests, timeout, etc.

– Use of SOAP (RPC using HTTP + XML)

– Policy evaluation points for restricted proxies

Page 27

Data Access & Transfer

GridFTP: extended version of popular FTP protocol for Grid data access and transfer

Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.:
– Third-party data transfers, partial file transfers
– Parallelism, striping (e.g., on PVFS)
– Reliable, recoverable data transfers

Reference implementations
– Existing clients and servers: wuftpd, nicftp
– Flexible, extensible libraries
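
Partial file transfers and parallelism combine naturally: a file is cut into byte ranges and each range moves independently. The following is a minimal local sketch of that idea using threads and in-memory buffers; it does not speak GridFTP, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_ranges(file_size, streams):
    """Cut [0, file_size) into up to `streams` contiguous (offset, length) ranges."""
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for index in range(streams):
        length = base + (1 if index < extra else 0)
        if length:
            ranges.append((offset, length))
            offset += length
    return ranges


def parallel_copy(source, streams):
    """Copy `source` (bytes) by moving each byte range on its own worker."""
    destination = bytearray(len(source))

    def copy_range(byte_range):
        offset, length = byte_range
        # Each worker writes a disjoint slice, so no locking is needed here.
        destination[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(copy_range, split_into_ranges(len(source), streams)))
    return bytes(destination)


print(split_into_ranges(10, 3))
```

The same range list also suggests how a recoverable transfer could work: after a failure, only the ranges that did not complete need to be sent again.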

Page 28

Grid Services Architecture (4): Collective Layer Protocols & Services

Index servers aka metadirectory services
– Custom views on dynamic resource collections assembled by a community

Resource brokers (e.g., Condor Matchmaker)
– Resource discovery and allocation

Replica management and replica selection
– Optimize aggregate data access performance

Co-reservation and co-allocation services
– End-to-end performance

Etc., etc.

Page 29

The Grid Information Problem

Large numbers of distributed “sensors” with different properties

Need for different “views” of this information, depending on community membership, security constraints, intended purpose, sensor type

Page 30

The Globus Toolkit Solution: MDS-2

Registration & enquiry protocols, information models, query languages
– Provides standard interfaces to sensors
– Supports different “directory” structures supporting various discovery/access strategies
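
A registration-plus-enquiry service can be pictured as a small directory that sensors register with and that clients query through filters. This is a minimal sketch under stated assumptions: `ToyDirectory` is hypothetical, it is not the MDS-2 protocols or data model, and the expiring registrations (the `ttl` parameter) are an illustrative design choice rather than something stated on the slide.

```python
class ToyDirectory:
    """In-memory directory: sensors register, clients enquire with a filter."""

    def __init__(self):
        self._entries = {}  # sensor id -> (attributes, expiry time)

    def register(self, sensor_id, attributes, now, ttl=60.0):
        """Record (or refresh) a sensor; the entry lapses after `ttl` seconds."""
        self._entries[sensor_id] = (dict(attributes), now + ttl)

    def enquire(self, predicate, now):
        """Return ids of live sensors whose attributes satisfy the predicate."""
        return sorted(
            sensor_id
            for sensor_id, (attributes, expires) in self._entries.items()
            if expires > now and predicate(attributes)
        )


directory = ToyDirectory()
directory.register("cpu-load@node1", {"type": "load", "site": "A"}, now=0.0)
directory.register("disk-free@node1", {"type": "disk", "site": "A"}, now=0.0)
directory.register("cpu-load@node7", {"type": "load", "site": "B"}, now=0.0, ttl=10.0)

# Two different "views" of the same registrations.
load_view = directory.enquire(lambda a: a["type"] == "load", now=5.0)
site_a_view = directory.enquire(lambda a: a["site"] == "A", now=5.0)
print(load_view, site_a_view)
```

Each predicate acts as a different view over the same set of sensors, which is the requirement stated on the previous slide.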

Page 31

Resource Management Architecture

[Figure: resource management pipeline]

– Application submits RSL to a Broker
– Broker performs RSL specialization, using Queries & Info from the Information Service, and produces Ground RSL
– Co-allocator turns Ground RSL into Simple ground RSL for each GRAM
– Each GRAM fronts a local resource manager: LSF, Condor, NQE

Examples labeled in the figure:

ASCI DISCOM, Condor-G, Nimrod-G, Poznan*, U. Lecce

DUROC, MPICH-G2

* See talk by Jarek Nabrzyski et al.
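
The broker and co-allocator stages can be sketched as two small functions. This is a minimal illustration in which plain dictionaries stand in for RSL; the host names, the `free_cpus` field, and the greedy placement rule are all hypothetical, and nothing here is the syntax or behavior of a real broker.

```python
# Hypothetical information-service contents.
INFO_SERVICE = [
    {"contact": "gram-a.example.org", "manager": "LSF", "free_cpus": 64},
    {"contact": "gram-b.example.org", "manager": "Condor", "free_cpus": 32},
    {"contact": "gram-c.example.org", "manager": "NQE", "free_cpus": 8},
]


def broker_specialize(request, info):
    """Turn an abstract request into a ground request naming concrete resources.

    Greedy rule (an assumption): fill the resources with the most free CPUs first.
    """
    remaining = request["cpus"]
    ground = []
    for resource in sorted(info, key=lambda r: r["free_cpus"], reverse=True):
        take = min(remaining, resource["free_cpus"])
        if take == 0:
            continue
        ground.append({
            "contact": resource["contact"],
            "executable": request["executable"],
            "count": take,
        })
        remaining -= take
    if remaining:
        raise RuntimeError("not enough free CPUs for this request")
    return ground


def co_allocate(ground, submit):
    """Hand each single-resource part to its resource manager via `submit`."""
    return [submit(part) for part in ground]


ground = broker_specialize({"executable": "/bin/sim", "cpus": 80}, INFO_SERVICE)
receipts = co_allocate(ground, submit=lambda part: (part["contact"], part["count"]))
print(receipts)
```

The application never names a machine; the broker adds that detail from the information service, and the co-allocator only ever deals with parts that each target one resource manager.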

Page 32

Data Grid Architecture (See talk by Sudharshan Vazhkudai)

[Figure: replica selection in a data grid]

– Application sends an Attribute Specification to the Metadata Catalog, which yields a Logical Collection and Logical File Name
– Replica Catalog maps the logical name to Multiple Locations
– Replica Selection uses Performance Information & Predictions from MDS and NWS to pick the Selected Replica
– Application issues GridFTP commands to the selected location
– Replica Location 1, Replica Location 2, Replica Location 3; storage shown: Tape Library, Disk Cache, Disk Array, Disk Cache

+ “Virtual data”: transparency wrt location and materialization (www.griphyn.org)
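
Replica selection reduces to choosing, among the locations a catalog returns, the one with the best predicted performance. This is a minimal sketch under stated assumptions: the catalog contents, location names, and predicted rates are made-up data, and ranking by estimated transfer time is one plausible rule rather than the method used by any particular service.

```python
# Hypothetical replica catalog: logical file name -> physical locations.
REPLICA_CATALOG = {
    "lfn://run42/events": ["location1", "location2", "location3"],
}

# Hypothetical bandwidth predictions in MBytes/sec; location3 has none.
PREDICTED_MBYTES_PER_SEC = {"location1": 2.0, "location2": 10.0}


def select_replica(logical_name, file_size_mbytes, catalog, predictions):
    """Return (location, estimated seconds) for the fastest predicted replica."""
    best = None
    for location in catalog.get(logical_name, []):
        rate = predictions.get(location)
        if not rate:
            continue  # skip locations with no usable prediction
        estimate = file_size_mbytes / rate
        if best is None or estimate < best[1]:
            best = (location, estimate)
    if best is None:
        raise LookupError(f"no usable replica for {logical_name}")
    return best


choice = select_replica("lfn://run42/events", 500,
                        REPLICA_CATALOG, PREDICTED_MBYTES_PER_SEC)
print(choice)
```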

Page 33

Grid Futures

Page 34

Large Grid Projects are in Place

DOE ASCI DISCOM
DOE Particle Physics Data Grid
DOE Earth Systems Grid
DOE Science Grid
DOE Fusion Collaboratory
European Data Grid
Egrid (see talk by G. Allen et al.)
NASA Information Power Grid
NSF National Technology Grid
NSF Network for Earthquake Eng Simulation
NSF Grid Application Development Software
NSF Grid Physics Network

Page 35

Problem Evolution

Past-present: O(10^2) high-end systems; Mb/s networks; centralized (or entirely local) control
– I-WAY (1995): 17 sites, week-long; 155 Mb/s
– GUSTO (1998): 80 sites, long-term experiment
– NASA IPG, NSF NTG: O(10) sites, production

Present: O(10^4-10^6) data systems, computers; Gb/s networks; scaling, decentralized control
– Scalable resource discovery; restricted delegation; community policy; GriPhyN Data Grid: 100s of sites, O(10^4) computers; complex policies

Future: O(10^6-10^9) data, sensors, computers; Tb/s networks; highly flexible policy, control

Page 36

The Future: All Software is Network-Centric

We don’t build or buy “computers” anymore, we borrow or lease required resources
– When I walk into a room, need to solve a problem, need to communicate

A “computer” is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, networks
– Similar observations apply for software

Page 37

And Thus …

Reduced barriers to access mean that we do much more computing, and more interesting computing, than today => Many more components (& services); massive parallelism

All resources are owned by others => Sharing (for fun or profit) is fundamental; trust, policy, negotiation, payment

All computing is performed on unfamiliar systems => Dynamic behaviors, discovery, adaptivity, failure

Page 38

Summary

The Grid problem: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

Grid architecture: Emphasize protocol and service definition to enable interoperability and resource sharing

Globus Toolkit as a source of protocol and API definitions, reference implementations

For more info: www.globus.org, www.griphyn.org, www.gridforum.org