April 2003 1 Overview of Grid Computing J. Charles Kesler MCNC

Overview of Grid Computing

Jan 18, 2016

Page 1: Overview of Grid Computing

April 2003 1

Overview of Grid Computing

J. Charles Kesler
MCNC

Page 2: Overview of Grid Computing

Overview

Introduction: Why Grids?
Applications for Grids
Basic Grid Architecture
Grid Platforms
  Market Segments
  Examples: Globus, OGSA, AVAKI
Building a Grid
  Project Manager’s View
  System Administrator’s View
  Example: The North Carolina BioGrid Project
Grid Reference Resources

Page 3: Overview of Grid Computing

Why Grids? From the Viewpoint of Research Computing

Researchers are buying clusters
  A cluster for every researcher in many cases
  Of course, a cluster comes with a non-trivial amount of storage
Computational power is like commodity Internet bandwidth: all readily available capacity will be consumed
But there is a lot of capacity sitting idle in these cluster islands across organizations
Maintenance of clusters is often done inefficiently
  …by someone who would prefer to be doing something other than systems administration

Page 4: Overview of Grid Computing

Current State of Research Computing

Researchers are asking IT to…
  Host and/or administer compute clusters
  Host applications and datasets
  Provide update and backup utilities for datasets
  Optimize and/or port applications
  Provide a front end for simplified access to resources
  Provide tools for workflow automation
That is, IT could benefit from a "utility computing" model to deliver services to researchers

Page 5: Overview of Grid Computing

Collaboration in the Research Community

Researchers at multiple universities are often working together on the same grants, so they need to share:
  Hardware resources
  Applications
  Data sets
  Results
This sharing has to happen across multiple, mutually distrustful administrative domains
The buzzword: Virtual Organization (“VO”)

Page 6: Overview of Grid Computing

Grid Computing’s Potential for Research

[Figure: virtual computers and virtual databases spanning North Carolina campuses: UNC-CH, NCSU, Duke, WFU, WSSU, NCArts, NCAT, UNC-C, UNC-A, ECSU, WCU, ASU, ECU, UNC-G, NCCU, UNC-W, UNC-P, FSU]

Unified view of data and computers
  Computers and data appear to be local
Efficient access to large data sets
  Caching
  Replication
Attributes
  Single sign-on, security
  Policy-based resource sharing

Page 7: Overview of Grid Computing

Grids According to the Experts

“Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources.”
  From The Anatomy of the Grid by Foster, Kesselman and Tuecke
“A grid is all about gathering together resources and making them accessible to users and applications.”
  Dr. Andrew Grimshaw, CTO Avaki

Page 8: Overview of Grid Computing

Grids Are By Definition Heterogeneous

It’s about legacy resources, infrastructure, applications, policies, and procedures
The grid and its administrators must integrate in stealth mode…with
  Firewalls
  Filesystems
  Queuing systems
  Grumpy systems administrators
  Tried and true applications

Page 9: Overview of Grid Computing

What It Means To…

The end user:
  Can transparently access resources in multiple VOs
  Can more easily collaborate with other researchers
The IT administrator:
  Has a secure framework for implementing distributed resource sharing
  Local resource administrators can control access to their resources
The manager:
  Sees better utilization of capital resources
  Has a tool that helps break down organizational barriers

Page 10: Overview of Grid Computing

Challenges in Grid Computing

Reliable performance
Trust relationships between multiple security domains
Deployment and maintenance of grid middleware across hundreds or thousands of nodes
Access to data across WANs
Access to state information of remote processes
Workflow / dependency management
Distributed software and license management
Accounting and billing

Page 12: Overview of Grid Computing

Applications for a Grid

Generally, apps that work well on clusters can work well on grids
Non-interactive / batch jobs
Parallel computations with minimal interprocess communication and workflow dependencies
Reasonable data transfer requirements
Sensible economics

Page 13: Overview of Grid Computing

Non-Interactive / Batch Jobs

Difficult to get a real-time UI for jobs running on the grid
  A possible interactive application: spreadsheet computation
Want to take advantage of off-peak free cycles
  Jobs run for several days, weeks or months
  The user might prefer to be sleeping while the job runs!
Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine
  Idle thread / “screensaver” computing

Page 14: Overview of Grid Computing

Parallel Computations

Application needs to be able to run as multiple, mostly independent pieces
Good Example: Parameter space study
  Thousands++ of input files
  Processed independently by the same application
  Output file generated for each run (corresponding to an input file)
  Analysis of the results reported in the output files to find the optimal solution
  Need to build workflow management and results analysis tools around the grid-based components
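
The parameter space study pattern can be sketched locally in Python. This is only an illustration of the workflow shape, not grid middleware: `simulate` is a hypothetical stand-in for the real application, in-memory values stand in for input and output files, and a thread pool stands in for grid compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(x):
    """Hypothetical stand-in for the application: score one input on its own."""
    return (x - 3.0) ** 2


def parameter_sweep(inputs, workers=4):
    """Process every input independently, then analyze the outputs for the optimum."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so outputs[i] corresponds to inputs[i]
        outputs = list(pool.map(simulate, inputs))
    # Results analysis step: pick the input with the lowest score
    best_index = min(range(len(inputs)), key=lambda i: outputs[i])
    return inputs[best_index], outputs[best_index]


best_input, best_score = parameter_sweep([0.0, 1.0, 2.0, 3.0, 4.0])
```

Because no run depends on another, the order of completion does not matter; only the final analysis step needs all outputs.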

Page 15: Overview of Grid Computing

Minimal Interprocess Communications and Dependencies

Can’t depend on the network’s QoS
Can’t rely upon the order of execution and completion
Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems)
Grid can still be useful as a meta-scheduler and data source for such apps
  e.g. the user submits the job to the grid queue and asks for the best available SMP resource

Page 16: Overview of Grid Computing

Reasonable Data Transfer Requirements

It is usually necessary to “stage” files and executables as part of running a grid job
Data transfer time should be small relative to each component job’s run time
Solution: caching and replication, but these are not perfect and can be non-trivial to implement
Another solution: schedule the job where the data is (instead of bringing the data to the job)
  Might be required if the data is only licensed for some nodes
  But if instead the application is only licensed to run on particular nodes, then the data has to be brought to where the application is
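
The rule that transfer time should be small relative to run time can be checked with simple arithmetic. The sketch below is illustrative only: the 10% threshold (`max_ratio`) and the example sizes and link speeds are assumptions, not figures from the talk.

```python
def transfer_seconds(size_bytes, bandwidth_bits_per_s):
    """Ideal time to move the data, ignoring latency and protocol overhead."""
    return size_bytes * 8 / bandwidth_bits_per_s


def staging_is_reasonable(size_bytes, bandwidth_bits_per_s, run_seconds, max_ratio=0.1):
    """True if staging takes at most max_ratio of the job's run time (assumed threshold)."""
    return transfer_seconds(size_bytes, bandwidth_bits_per_s) <= max_ratio * run_seconds


# 1 GB over 100 Mbps Ethernet takes 80 seconds in the ideal case
stage_time = transfer_seconds(1_000_000_000, 100_000_000)
ok_for_hour_job = staging_is_reasonable(1_000_000_000, 100_000_000, run_seconds=3600)
ok_for_short_job = staging_is_reasonable(1_000_000_000, 100_000_000, run_seconds=300)
```

With these assumed numbers the one-hour job passes and the five-minute job fails, which is the case where scheduling the job next to the data becomes attractive.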

Page 17: Overview of Grid Computing

The Bottom Line: Sensible Economics

To Grid or Not To Grid:

Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
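
The decision rule is a single inequality, and a tiny sketch makes the comparison explicit. All dollar figures below are hypothetical, chosen only to show one case on each side of the inequality.

```python
def should_build_grid(productivity_gains, build_cost, opportunity_cost):
    """Apply the rule: gains must exceed build cost plus opportunity cost."""
    return productivity_gains > build_cost + opportunity_cost


# Hypothetical figures, in dollars
worth_it = should_build_grid(500_000, build_cost=300_000, opportunity_cost=100_000)
not_worth_it = should_build_grid(350_000, build_cost=300_000, opportunity_cost=100_000)
```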

Page 18: Overview of Grid Computing

Some Costs and Benefits

Costs:
  Grid Middleware
  Architects and Developers
  User Training
  Infrastructure Hardware
  Opportunity Costs
    Would a big SMP box return better results for your problem?
Benefits:
  Better Utilization of Existing Capital Resources
  More Efficient Users
    Ability to complete more work in the same amount of time
  Performance near or sometimes as good as the big SMP box

Page 20: Overview of Grid Computing

The Single System Model

[Figure: layered single-system stack, top to bottom]
User Interface / API
Resource Discovery, Process Management, Authentication/Authorization/Accounting, Message Passing, Data Management
Operating System
Storage, Compute

Page 21: Overview of Grid Computing

What Makes a Cluster a Cluster?

Uses a Distributed Resource Manager (DRM) to manage job scheduling
Tightly coupled: high-speed, low-latency interconnect network
Shared storage for home directories, high-throughput scratch space, applications
Fairly homogeneous: configuration management is important!
Single administrative domain
User accounts managed with traditional mechanisms

Page 22: Overview of Grid Computing

The Cluster Model

[Figure: a master node (User Interface/API, Cluster DRM) and four cluster nodes (each with Cluster DRM, Operating System, Storage and Compute) joined by a high-speed interconnect, with shared storage and configuration management. Every node carries the RD, PM, 3A, DM and MP functions of the Single System Model.]

Page 23: Overview of Grid Computing

How is an Enterprise Grid Different from a Cluster?

Heterogeneous: clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer
Lightly coupled: connected via 100 or 1000 Mbps Ethernet
Introduces a resource registry and grid security service
  But usually only a single registry and security service for the grid
Not necessarily a single administrative domain

Page 24: Overview of Grid Computing

The Enterprise Grid Model

[Figure: two clusters (each a Cluster DRM fronting three cluster nodes, exposed through a Grid Interface), two SMP systems and a User Interface/API node, each with its own Grid Interface, connected over an enterprise LAN or WAN to a shared Security Infrastructure and Resource Registry. Every node carries the RD, PM, 3A, DM and MP functions over Operating System, Storage and Compute.]

Page 25: Overview of Grid Computing

How is a Global Grid Different from an Enterprise Grid?

"Grid of Grids": collection of enterprise grids
Loosely coupled between sites: not much control over QoS*
Mutually distrustful administrative domains
Multiple grid resource registries and grid security services

*Not true for grids in the NCREN network!

Page 26: Overview of Grid Computing

The Global Grid Model

[Figure: Sites A, B and C, each with a LAN connecting clusters, SMP systems and UI/API nodes through grid interfaces, plus a site-local resource registry (RR) and security infrastructure (SI); the sites are joined over a WAN.]

Page 28: Overview of Grid Computing

Grid Platforms: Market Segments

One Way to Categorize Grids:
  Toolkits
  Integrated Environments
Or Another Way to Look at Grids:
  Server Aggregation
  Desktop Aggregation

Page 29: Overview of Grid Computing

Where Platforms Fit in the Market

[Figure: platforms placed on a two-axis chart, Desktop Aggregation vs. Server Aggregation (horizontal) and Toolkits vs. Integrated Environments (vertical)]

• Globus
• OGSA
• Avaki
• United Devices
• Data Synapse
• Entropia
• Parabon
• NMI
• IBM Grid Toolbox
• Platform LSF Multi-Cluster
• BOINC

Page 30: Overview of Grid Computing

The Early Adopter Market for Grid Technology

[Figure: early adopters placed on the same two-axis chart, Desktop Aggregation vs. Server Aggregation (horizontal) and Toolkits vs. Integrated Environments (vertical)]

Private Sector
  Pharmaceuticals
  Banking & Finance
  Energy
(does anyone want this?)
Mix of Industry and Academia
  Life Sciences
  Entertainment
Public Sector
  Academia
  Government
  National Labs

Page 31: Overview of Grid Computing

Grid Platform Example: Globus Toolkit V2

Primary development occurred at Argonne National Laboratory
  Principals were Ian Foster and Carl Kesselman
Open source
  But architecture development was a closed process
Toolkit approach: different “bundles” that can be installed depending upon what functions are desired
API through CoG (Commodity Grid) kits
  Java, Python, CORBA, Perl, Matlab, Web services, JSP

Page 32: Overview of Grid Computing

Globus Toolkit V2

Majority of its use is in university and government research environments
Some vendors offer value-added versions
  IBM Grid Toolbox
  Platform Globus
NSF Middleware Initiative (NMI) is packaging pre-built Globus with other relevant components
  NWS (Network Weather Service)
  KX.509/KCA (Kerberos-X.509 integration)
  Condor-G as a “metascheduler”
  GSI-enabled OpenSSH

Page 33: Overview of Grid Computing

Globus Toolkit V2 “Pillars”

[Figure: three pillars resting on a common base]
Information Services (MDS)
Data Management (GASS)
Resource Management (GRAM)
Base: Grid Security Infrastructure (GSI)

Page 34: Overview of Grid Computing

Globus Toolkit V2 Stack

[Figure: protocol stack, top to bottom]
MDS, GASS/GridFTP, GRAM
GSI
HTTP, LDAP, FTP
TLS/SSL
TCP/IP

Page 35: Overview of Grid Computing

Globus Toolkit V2 Key Components: GRAM, MDS and GASS

Globus Resource Allocation Manager (GRAM)
  Server-side: “gatekeeper” process that controls execution of job managers
  Client-side: “globusrun” UI to launch jobs
Monitoring and Discovery Service (MDS)
  GRIS: Grid Resource Information Service collects local info
  GIIS: Grid Index Information Service collects GRIS info
Global Access to Secondary Storage (GASS)
  GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command
  Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
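
To make the URI form concrete, here is a minimal Python sketch that splits the example gsiftp URI into its parts and assembles a `globus-url-copy` argument list (source URL followed by destination URL). The helper names and the local destination `file:///tmp/ecoli.nt` are assumptions for illustration; the sketch only builds the command and does not run it.

```python
from urllib.parse import urlsplit


def parse_grid_uri(uri):
    """Split a grid file URI into (scheme, host, path)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.hostname, parts.path


def url_copy_command(source_uri, dest_uri):
    """Build the argv for a globus-url-copy transfer (not executed here)."""
    return ["globus-url-copy", source_uri, dest_uri]


SOURCE = "gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt"
scheme, host, path = parse_grid_uri(SOURCE)
# Hypothetical local destination for the staged copy
command = url_copy_command(SOURCE, "file:///tmp/ecoli.nt")
```

A real transfer would also need a valid proxy certificate on the client, which this sketch does not model.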

Page 36: Overview of Grid Computing

Globus Toolkit V2 Key Components: GSI

Uses a TLS/SSL-based PKI
All server resources (i.e. gatekeeper, GRIS) and users have a public key that has been digitally signed by the CA (the “certificate”) and a private key
  “grid-cert-request” to generate key pair
  User/sysadmin sends the public key to CA
  CA signs the public key with its private key and returns the signed certificate to the user/sysadmin
  The user/sysadmin stores the signed certificate in the local filesystem
  Certificate contains: the subject name, the subject’s public key, the CA’s name, and the CA’s signature

Page 37: Overview of Grid Computing

Globus Toolkit V2 Key Components: GSI

Logging in to the grid (“grid-proxy-init”):
  User creates a temporary public-private key pair
  User’s private key is used to digitally sign the temporary public key; this becomes the “proxy” certificate
  This creates a chain of trust from the CA to the user to the proxy certificate
  The proxy certificate and associated private key are transmitted with a job
The proxy certificate can be used to issue commands on remote servers on the user’s behalf (“delegation”)
On remote servers, there is a “grid-mapfile” that maps user cert subject names to local userids
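
The grid-mapfile lookup can be sketched as a small parser. This assumes the common one-entry-per-line layout of a quoted certificate subject name followed by a local userid; the subject names and userids below are hypothetical, and real files can carry variations this simplified sketch skips.

```python
import shlex


def parse_grid_mapfile(text):
    """Return {certificate subject name: local userid} from grid-mapfile text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honors the quotes around the subject name
        if len(fields) != 2:
            continue  # ignore entries this simplified sketch does not handle
        subject, local_user = fields
        mapping[subject] = local_user
    return mapping


# Hypothetical entries, for illustration only
SAMPLE = '''
# subject name                             local userid
"/O=Grid/OU=Example/CN=Jane Researcher" jresearch
"/O=Grid/OU=Example/CN=Bob Admin" badmin
'''
users = parse_grid_mapfile(SAMPLE)
```

A gatekeeper-style check then reduces to looking up the subject name from the presented certificate chain and running the job under the mapped local account.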

Page 38: Overview of Grid Computing

Globus Toolkit V2 Additional Components

Grid Packaging Tools (GPT)
  Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components
MPICH-G2
  A Globus V2 enabled version of MPI (Message Passing Interface)
  Based on MPICH
  Utilizes GSI, MDS and GRAM

Page 39: Overview of Grid Computing

Globus Toolkit V2 Network Services

[Figure: a Certificate Authority, a GIIS server, a client node (GRAM client) and four grid nodes (each running GRIS, gatekeeper and in.ftpd), all connected by a network.]

Page 40: Overview of Grid Computing

GRAM, MDS and GASS Interactions

[Figure: the client (user / proxy) talks to GRAM (gatekeeper, job manager, processes) for job allocation and job management via RSL/DUROC/HTTP 1.1, to MDS (GIIS, GRIS, resources) for resource discovery via LDAP, and to GASS (GridFTP in.ftpd) for data transfer and data control via gsiftp.]

Page 41: Overview of Grid Computing

Globus Toolkit V2 Strengths and Weaknesses

Strengths:
  Mindshare and collaboration in both industry & academia
  Open source
  Standards-based underpinnings (e.g. SSL, LDAP)
  Flexibility and CoG APIs
  Driving OGSA with heavy resource commitment from IBM
Weaknesses:
  Significant effort required to get applications working on a grid
  Not production quality at this time
  No “metascheduler”: user has to explicitly tell their jobs where to run

Page 42: Overview of Grid Computing

Grid Platform Example: OGSA

Merging Grid and Web Services technologies
Developing open standards for grid computing
  Sponsored by the GGF (organization modeled after IETF)
  Primary working groups: OGSA and OGSI
  Many vendors involved: IBM, Sun, Oracle, AVAKI, UD, etc. (But ANL and IBM seem to have the upper hand)
  Working with the W3C to extend web services
Still in alpha / early beta form
There will be open source and commercial implementations
  Open source: Globus 3
  Commercial: IBM (WebSphere), AVAKI, UD, etc.

Page 43: Overview of Grid Computing

Some Key OGSA Concepts

Grid Service Handle (GSH)
  GSH is a globally unique name assigned to every resource
  Does not contain any protocol or instance specific information such as network address
Grid Service Reference (GSR)
  Contains the instance-specific information (e.g. network address)
  Only valid for a limited lifespan
Factory
  Creates and manages grid services per user request
  Returns the GSH and GSR for a new instance
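
The relationship between handle, reference and factory can be illustrated with a toy Python model. Everything here is hypothetical: the class names, the `gsh:` handle format, the example address and the 60-second lifetime are invented for illustration and are not the OGSI interfaces; the counter also only makes handles unique within one factory.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class GridServiceReference:
    handle: str        # the GSH this reference resolves
    address: str       # instance-specific network address
    expires_at: float  # a reference is only valid for a limited lifespan

    def is_valid(self, now=None):
        now = time.time() if now is None else now
        return now < self.expires_at


class Factory:
    """Creates service instances and returns a (GSH, GSR) pair for each."""

    def __init__(self, address, lifetime_seconds=60.0):
        self._address = address
        self._lifetime = lifetime_seconds
        self._ids = itertools.count(1)

    def create_service(self, now=None):
        now = time.time() if now is None else now
        # The handle carries no address; only the reference does
        handle = f"gsh:example-service/{next(self._ids)}"
        reference = GridServiceReference(handle, self._address, now + self._lifetime)
        return handle, reference


factory = Factory("http://node1.example.org:8080/service")
handle, reference = factory.create_service(now=1000.0)
```

The handle stays stable while references expire, so a client holding only the handle would need to obtain a fresh reference once the old one lapses.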

Page 44: Overview of Grid Computing

OGSA / Globus 3.0 Preview Release

Implementation of the Grid Service Specification
Built on top of Apache Axis and Java CoG
Based in J2EE environment; limited .NET and C support at this point
Globus Toolkit 3.0 expected release
  Alpha: Jan 13, 2003 @ GlobusWorld
  Final: June 2003

Page 45: Overview of Grid Computing

OGSA / Globus 3.0 Stack

[Figure: stack with MDS, GASS/GridFTP and GRAM on top of a Grid Services Abstraction layer, carried by SOAP + GSI over TLS/SSL and TCP/IP, or by other transports]

Page 46: Overview of Grid Computing

OGSA Example

[Figure: components are a Registry, Mapping Service, Application Factory Service, Application Service Instance, Auth Factory Service, Auth Service Instance, User A and User B. Labeled interactions: Request to Create Auth Service, Request to Auth User, User Auth Info, GSH, GSR.]

Page 47: Overview of Grid Computing

Grid Platform Example: AVAKI

Original technology came from the Legion project at UVa (which was also used as part of NPACI); principal is Andrew Grimshaw (now CTO)
Integrated solution: load and run
Object-oriented architecture
Data Grid (v3.0): new architecture meant as the stepping stone to OGSA; implemented with J2EE
Compute Grid (v2.6): latest release of Legion-based technology; has compute and data grid integrated
Comprehensive Grid: 3.0 Data + 2.6 Compute Grids

Page 48: Overview of Grid Computing

AVAKI 3.0 Data Grid Architecture

[Figure: an AVAKI Domain Controller backed by LDAP (user info), grid servers (metadata), a data access server (NFS) and four share servers, with an interconnect to other grids. Share servers export local directories (/dmf/edu, /local/data, /home/edu, /data/ncbi, /data/riceblast) into the grid namespace: /grid, /grid/dmf/edu, /grid/home/edu, /grid/data, /grid/data/ncbi, /grid/data/riceblast.]

Page 49: Overview of Grid Computing

AVAKI Strengths and Weaknesses

Strengths
  Vendor support
  Easy to deploy
  Data grid
  Comprehensiveness
  Works through firewalls (w/ its Proxy server)
  Moving towards OGSA
Weaknesses
  Vendor is a relatively small company
  Doesn't have significant mindshare
  Currently does not publish its APIs

Page 51: Overview of Grid Computing

Building a Grid - The Project Manager’s View

Keys to success:
  Realize that grids are built, not bought!
  Early identification of business drivers and potential applications for the grid project
  Have a brainstorming session with stakeholders (e.g. power users, sys admins, managers)
Doing these things should help you quickly identify:
  Is there a good business case for building a grid?
  What’s the right kind of grid to build?
    Desktop or Server Aggregation?
    Integrated or Toolkit?

Page 52: Overview of Grid Computing

Building a Grid - The Project Manager’s View

Use a Lifecycle Project Model, e.g.
  Requirements: identify apps, users and their needs
  Initial Planning: scope out hardware, middleware
  Prototype: build a testbed
  Review results with stakeholders
  Final Planning: gap analysis for production implementation
  Deploy: purchase & install hw, sw; training for users
  Maintain: break-fix, identify and gridify other apps
  (Iterate!)

Page 53: Overview of Grid Computing

Building a Grid - The Systems Administrator’s View

Establish installation and operational standards
Establish security infrastructure to manage grid identities
Establish resource registry infrastructure
Install grid middleware and configure for appropriate services, e.g.
  Compute engines
  Data sources
Establish grid identities for services and users
Work with users to gridify their applications

Page 54: Overview of Grid Computing

Building a Grid - Example: The North Carolina BioGrid Testbed

Objective was to develop testbed environment to serve as:
  A staging area for the production NC BioGrid
  A research platform for Grid researchers
  An interoperability testbed for the computing hardware, middleware, and application software vendor community
Testbed representative of production environment
  Hardware and software platforms
  User client platforms
  Location dynamics
Testbed needs to be persistent

Page 55: Overview of Grid Computing

NC BioGrid Key Decisions

Focus on data grid
  The best way to deploy a petabyte of storage for bio applications is to aggregate existing pools of storage (no one has $50M to $80M to spend on storage!)
  But is a data grid useful without a compute grid? Probably not
Focus on server aggregation
  Although there are a lot of idle UNIX workstations and PCs on the campus, desktop aggregation is a problem we will look at later
Not picking a horse (yet) on Grid middleware
  Testing AVAKI and Globus

Page 56: Overview of Grid Computing

NC BioGrid Testbed (Phase 1)

[Figure: NCSC / RTP hosts an IBM p690, SunFire 3800, IBM eServer 1300, Sun T3 and IBM LTO library on an FC switch, plus a development & staging client workstation on a 10/100 LAN. NC State / Raleigh, UNC / Chapel Hill and Duke / Durham each have a server on Gig-E (one SunFire V880 and two IBM eServer 1300 across the three campuses) and a client workstation on the campus network. The sites connect over NCREN (OC-48) via Gig-E.]

Page 57: Overview of Grid Computing

Site Connection & Data Transport
North Carolina Research & Education Network

[Figure: NCREN map. Sites: Charlotte, Pembroke, NCSU, NCSU Centennial Campus, NCCU, Duke, UNC-CH, Wilmington, Elizabeth City, Asheville, Cullowhee, Greenville, MCNC, Boone, Morehead City, Rocky Mount, Fayetteville, Greensboro, RTP, Winston-Salem. Also labeled: RTP rPoP, Qwest, Abilene (OC-48).]

NCREN3
  High bandwidth (OC-3, OC-12, OC-48)
  High reliability (multiple paths to rPoPs)
  Very resilient (all new equipment)

Page 60: Overview of Grid Computing

Some Selected Grid Reference Resources

NC BioGrid: http://www.ncbiogrid.org/
  Also: http://www.ncbiogrid.org/resources/grid/index.html
The Global Grid Forum: http://www.gridforum.org/
AVAKI: http://www.avaki.com/
The Globus Project: http://www.globus.org/
IBM RedBook on Globus Computing: http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg246895.pdf
NSF Middleware Initiative: http://www.nsf-middleware.org/

Page 61: Overview of Grid Computing

Overview of Grid Computing

Questions?