Grid Technology B Different Flavors of Grids CERN Geneva April 1-3 2003 Geoffrey Fox Community Grids Lab Indiana University [email protected]
Page 1: Cerngridtech Bapril

Grid Technology B: Different Flavors of Grids
CERN, Geneva, April 1-3 2003
Geoffrey Fox, Community Grids Lab, Indiana University
[email protected]

Page 2: Cerngridtech Bapril

Different Types of Grids
• Compute and File-oriented Grids
• “Internet Computing” Grids (Desktop Grids)
• Peer-to-peer Grids
• Information Grids: to distinguish between File, database and “Perl Filter” based Grids
• Semantic Grids
• Integrated (Hybrid, Complexity) Grids
– Bio and Geocomplexity
• Campus Grids
• Enterprise Grids

Page 3: Cerngridtech Bapril

Compute and File-oriented Grids
• Different Grids have different structures
• Compute/File-oriented Grids are well represented by the “production part of particle physics”, either in
– Monte Carlo
– Production of Data Summary Tapes
• This is nearer the “Globus GT2” than the “Web Service” vision of the Grid
• Strongly supported, of course, by EDG (European Data Grid) and the Trillium project in the US (Virtual Data Toolkit)
• The Physics Analysis phase of particle physics requires more collaboration and is more dynamic

Page 4: Cerngridtech Bapril

What do HEP experiments want to do on the Grid in the long term?
Production:
• Simulation (Monte Carlo generators).
• Reconstruction (including detector geometry …).
• Event Mixing (bit-wise superposition of Signal and Backgrounds).
• Reprocessing (Refinement, improved reconstruction data production).
• Production (production of AODs and ESDs starting from Raw data).
• Very organized activity, generally centrally managed by production teams.
Physics analysis:
• Searches for specific event signatures or particle types (data access can be very sparse, perhaps on the order of one event out of each million).
• Measurement of inclusive and exclusive cross sections for a given physics channel.
• Measurement of relevant kinematical quantities.
• I/O: it is not feasible to organize the input data in a convenient fashion unless one constructs new files containing the selected events.
• The activities are also uncoordinated (not planned in advance) and (often) iterative.
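The sparse-access point above (selecting perhaps one event in a million and writing the selected events to new files) can be sketched in Python. This is only an illustration under stated assumptions: the event records, the names `skim` and `write_skim`, and the in-memory list standing in for a new output file are all hypothetical and do not come from any experiment framework.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]  # hypothetical event record, e.g. {"id": 7, "signal": True}


def skim(events: Iterable[Event], selects: Callable[[Event], bool]) -> Iterator[Event]:
    """Lazily yield only the events that pass the selection."""
    return (event for event in events if selects(event))


def write_skim(events: Iterable[Event],
               selects: Callable[[Event], bool],
               sink: List[Event]) -> int:
    """Append selected events to `sink` (a stand-in for a new file).

    Returns the number of events written.
    """
    before = len(sink)
    sink.extend(skim(events, selects))
    return len(sink) - before


if __name__ == "__main__":
    # Assumed toy data: one "signal" event in every thousand.
    toy_events = ({"id": i, "signal": i % 1000 == 0} for i in range(5000))
    selected: List[Event] = []
    count = write_skim(toy_events, lambda e: bool(e["signal"]), selected)
    print(count, [e["id"] for e in selected])
```

Later iterations of an analysis would then read the small skimmed collection instead of rescanning the full input.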

Page 5: Cerngridtech Bapril

EDG “Compute/File” Grid Work Packages

• WP1: Work Load (Resource) Management System

• WP2: Data (Replication/Caching) Management

• WP3: Grid Monitoring / Grid Information Systems (general meta-data lookup)

• WP4: Fabric Management (software etc. on cluster)

• WP5: Storage Element (Grid Interface to mass storage)

• WP6: Security

• WP7: Network Monitoring

Page 6: Cerngridtech Bapril

Compute/File Grid Requirements I
• Called Data Grid by Globus team
• Terabytes or petabytes of data
– Often read-only data, “published” by experiments
• Large data storage and computational resources shared by researchers around the world
– Distinct administrative domains
– Respect local and global policies governing how resources may be used
• Access raw experimental data
• Run simulations and analysis to create “derived” data products

Page 7: Cerngridtech Bapril

Compute/File Grid Requirements II
• Locate data
– Record and query for existence of data
• Data access based on metadata
– High-level attributes of data
• Support high-speed, reliable data movement
– e.g., for efficient movement of large experimental data sets
• Support flexible data access
– e.g., databases, hierarchical data formats (HDF), aggregation of small objects
• Data Filtering
– Process data at storage system before transferring

Page 8: Cerngridtech Bapril

Compute/File Grid Requirements III
• Planning, scheduling and monitoring execution of data requests and computations
• Management of data replication
– Register and query for replicas
– Select the best replica for a data transfer
• Security
– Protect data on storage systems
– Support secure data transfers
– Protect knowledge about existence of data
• Virtual data
– Desired data may be stored on a storage system (“materialized”) or created on demand
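The replica-management and virtual-data requirements above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `ReplicaCatalog` class, its method names, the use of bandwidth as the "best replica" criterion, and the `recipes` table for on-demand data are hypothetical and are not the Globus replica services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Replica:
    site: str               # hypothetical storage site name
    bandwidth_mbps: float   # assumed measured bandwidth to the requester


@dataclass
class ReplicaCatalog:
    replicas: Dict[str, List[Replica]] = field(default_factory=dict)
    # Virtual data: logical name -> function that creates the data on demand.
    recipes: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        """Register one physical replica of a logical file."""
        self.replicas.setdefault(logical_name, []).append(replica)

    def query(self, logical_name: str) -> List[Replica]:
        """Return all known replicas (possibly none)."""
        return list(self.replicas.get(logical_name, []))

    def best(self, logical_name: str) -> Replica:
        """Select the replica with the highest assumed bandwidth."""
        candidates = self.query(logical_name)
        if not candidates:
            raise LookupError(f"no replica registered for {logical_name!r}")
        return max(candidates, key=lambda r: r.bandwidth_mbps)

    def resolve(self, logical_name: str) -> Tuple[str, str]:
        """Use a materialized replica if one exists, else derive the data."""
        if self.replicas.get(logical_name):
            return ("materialized", self.best(logical_name).site)
        if logical_name in self.recipes:
            return ("derived", self.recipes[logical_name]())
        raise LookupError(f"{logical_name!r} is neither stored nor derivable")
```

A real planner would weigh more than bandwidth (load, policy, cost); a single number is used here only to keep the selection step visible.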

Page 9: Cerngridtech Bapril

Functional View of Compute/File Grid
[Figure: functional view of a Compute/File Grid. Components and their labels:]
• Application
• Metadata Service: location based on data attributes
• Replica Location Service: location of one or more physical replicas
• Information Services: state of grid resources, performance measurements and predictions
• Planner: data location, replica selection, selection of compute and storage nodes
• Security and Policy
• Executor: initiates data transfers and computations
• Data Movement
• Data Access
• Compute Resources
• Storage Resources

Page 10: Cerngridtech Bapril

Layered C/F Grid Architecture
[Figure: layered architecture diagram. Layers from top to bottom, with the boxes recoverable from the figure:]
• Collective 2: services specific to application domain or virtual org.
– Request Interpretation and Planning Services
– Workflow or Request Management Services
– Application-Specific Data Discovery Services
– Community Authorization Services
– Consistency Services (e.g., Update Subscription, Versioning, Master Copies)
• Collective 1: general services for coordinating multiple resources
– Data Transport Services
– Data Federation Services
– Data Filtering or Transformation Services
– General Data Discovery Services
– Storage Management (Brokering)
– Compute Scheduling (Brokering)
– Monitoring/Auditing Services
• Resource: sharing single resources
– Data Access Protocol or Service
– Storage Resource Management
– Data Filtering or Transformation Services
– Database Management Services
– Compute Resource Management
– Resource Monitoring/Auditing
• Connectivity
– Communication Protocols (e.g., TCP/IP stack)
– Authentication and Authorization Protocols (e.g., GSI)
• Fabric
– Storage systems
– Compute Systems
– Networks

Page 11: Cerngridtech Bapril

C/F Grid Architecture I (from the bottom up)
• Fabric Layer
– Storage systems
– Compute systems
– Networks
• Connectivity Layer
– Communication protocols (e.g., TCP/IP protocol stack)
– Authentication and Authorization protocols (e.g., GSI)

Page 12: Cerngridtech Bapril

C/F Grid Architecture II
• Resource Layer: sharing single resources
– Data Access Protocol or Service (e.g., Globus GridFTP)

– Storage Resource Management (e.g., SRM/DRM/HRM from Lawrence Berkeley Lab)

– Data Filtering or Transformation Services (e.g., DataCutter from Ohio State University)

– Database Management Services (e.g., local RDBMS)

– Compute Resource Management Services (e.g., local supercomputer scheduler)

– Resource Monitoring/Auditing Service

Page 13: Cerngridtech Bapril

C/F Grid Architecture III
• Collective 1 Layer: General Services for Coordinating Multiple Resources
– Data Transport Services (e.g., Globus Reliable File Transfer and Multiple File Transfer Service from LBNL)
– Data Federation Services
– Data Filtering or Transformation Service (e.g., Active ProxyG from Ohio State University)
– General Data Discovery Services (e.g., Globus Replica Location Service and Globus Metadata Catalog Service)
– Storage management/brokering
– Compute management/brokering (e.g., Condor from University of Wisconsin, Madison)
– Monitoring/auditing service

Page 14: Cerngridtech Bapril

C/F Grid Architecture IV
• Collective 2 Layer: Services for Coordinating Multiple Resources that are Specific to an Application Domain or a Virtual Organization

– Request Interpretation and Planning Services (e.g., Globus Chimera and Pegasus for Physics Applications and Condor DAGMan)

– Workflow management service (e.g., Globus Pegasus)

– Application-Specific Data Discovery Services (e.g., Earth Systems Grid Metadata Catalog)

– Community Authorization service (e.g., Globus CAS)

– Consistency Services with varying levels of consistency, including data versioning, subscription, distributed file systems or distributed databases

Page 15: Cerngridtech Bapril

Composing These Services To Provide Higher-Level Functionality

• For example, a Grid File System might compose:
– Fabric layer: storage components, compute elements
– Connectivity layer: security and communication protocols
– Resource layer: data access protocols or services and storage resource management
– Collective layers: transport and discovery services, collective storage management, monitoring and auditing, authorization and consistency services

Page 16: Cerngridtech Bapril

Peer to Peer Network
[Figure: six peers, each combining User, Resource, Service and Routing functions.]
Peers are Jacks of all Trades linked to “all” peers in community
Typically Integrated Clients, Servers and Resources

Page 17: Cerngridtech Bapril

Peer to Peer (Hybrid) Grid
[Figure: six peers, each combining User, Resource, Service and Routing functions, plus a block labeled “Services” and “NB Routing”.]
Dynamic Message or Event Routing from Peers or Servers

Page 18: Cerngridtech Bapril

Peer to Peer Grid
[Figure: Peers and Databases linked through Event/Message Brokers, with User Facing Web Service Interfaces and Service Facing Web Service Interfaces.]
A democratic organization
Chapter 18 and 19 Grid Book

Page 19: Cerngridtech Bapril

Entropia: Desktop Grid
• Entropia (chapter 12 of book), United Devices, Parabon, SETI@Home etc. have demonstrated “Internet Computing” or Desktop Grid very successfully
• Used to be called peer-to-peer computing but that fell out of favor due to Napster’s bad name
• Condor has similar types of utility but Entropia is optimized for
– Huge number of clients
– Providing a secure “sandbox” for application to run in, which guarantees that application will not harm client

Page 20: Cerngridtech Bapril

Scaling of Entropia Application

Page 21: Cerngridtech Bapril

Entropia Architecture
[Figure: three layers (Job Management, Resource Scheduling, Physical Node Management) containing the Job Manager, Subjob Scheduler and Node Manager, with the End-user and the Entropia Clients; arrows numbered 1 to 8 and lettered a, b carry computation, resource and resource description.]
Application Execution on the Entropia System. End-user submits computation to Job Management (1). The Job Manager breaks up the computation into many independent “subjobs” (2) and submits the subjobs to the resource scheduler. In the meantime, the available resources of a client are periodically reported to the Node Manager (a) that informs the Subjob Scheduler (b) using the resource descriptions. The Subjob Scheduler matches the computation needs with the available resources (3) and schedules the computation to be executed by the clients (4,5,6). Results of the computation are sent to the Job Manager (7), put together, and handed back to the end-user (8).
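The numbered flow in the caption can be mimicked by a toy Python sketch. It is not Entropia code: the `Client` record, the "free slots" resource description, the round-robin matching rule and all function names are assumptions chosen to keep the example short, and execution is local rather than on remote desktops.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Client:
    name: str
    free_slots: int  # assumed resource description reported via the Node Manager (a, b)


def split_job(work_items: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Step 2: break the computation into independent subjobs."""
    return [list(work_items[i:i + chunk_size])
            for i in range(0, len(work_items), chunk_size)]


def schedule(subjobs: List[List[int]], clients: List[Client]) -> Dict[str, List[List[int]]]:
    """Step 3: match subjobs to clients, round-robin over their free slots."""
    slots = [c.name for c in clients for _ in range(c.free_slots)]
    if not slots:
        raise RuntimeError("no client reported free resources")
    assignment: Dict[str, List[List[int]]] = {c.name: [] for c in clients}
    for index, subjob in enumerate(subjobs):
        assignment[slots[index % len(slots)]].append(subjob)
    return assignment


def run_job(work_items: Sequence[int], clients: List[Client], chunk_size: int,
            compute: Callable[[List[int]], int]) -> int:
    """Steps 1 to 8: submit, split, schedule, execute, gather, return."""
    subjobs = split_job(work_items, chunk_size)
    assignment = schedule(subjobs, clients)
    # Steps 4 to 6: each client executes its subjobs (run locally in this sketch).
    partial_results = [compute(sj) for sjs in assignment.values() for sj in sjs]
    # Steps 7 and 8: the Job Manager combines the results for the end-user.
    return sum(partial_results)
```

For instance, `run_job(list(range(10)), [Client("a", 1), Client("b", 2)], 3, sum)` splits ten items into four subjobs and combines their partial sums.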

Page 22: Cerngridtech Bapril

Information Grids I
• Actually nearly all Grids consist of composing access to data with processing of that data in some computer program
• In Compute/File Grids (Data Grids for Globus), one naturally allowed database access from programs, although in some cases the dominant access is to files
• In Information Grids, we consider access to databases but of course view files as a special case of databases
• Real difference is what tier we are looking at:
– Compute/File Grids are looking at “backend resources”
– Information Grids are looking at “middle tier” because typically data volumes are not large enough to stress typical middle-tier mechanisms

Page 23: Cerngridtech Bapril

Information Grids II
• Should use Middle tier where possible and adopt hybrid model with control always in middle tier and using backend only where needed
– This would require reworking a lot of tools, e.g. Condor should schedule services not jobs
• Most programming models either specify “program view” or “service view” and do not separate the two
– Developments like GT3 will allow changes but it will take a long time before key tools are implemented in hybrid mode
• Note Bioinformatics and many other Information Grids only require service view
– These applications have in UK e-Science started with “Web Service” and not “Globus” view

Page 24: Cerngridtech Bapril

[Figure: Grid Computing Environments. Labels: Raw (HPC) Resources, Middleware, Database, Portal Services, System Services (repeated), Application Service, User Services, “Core” Grid, Service View, Program View.]

Page 25: Cerngridtech Bapril

OGSA-DAI (Malcolm Atkinson, Edinburgh)

UK e-Science Grid Core Programme

Development of Data Access and Integration Services for OGSA

http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI

- Access to XML Databases -

- Access to Relational Databases -

- Distributed Query Processing -

- XML Schema Support for e-Science -

Page 26: Cerngridtech Bapril

DAI Key Services

• GridDataService (GDS): Access to data & DB operations
• GridDataServiceFactory (GDSF): Makes GDS & GDSF
• GridDataServiceRegistry (GDSR): Discovery of GDS(F) & Data
• GridDataTranslationService (GDTS): Translates or Transforms Data
• GridDataTransportDepot (GDTD): Data transport with persistence

• Integrated Structured Data Transport
• Relational & XML models supported
• Role-based Authorisation
• Binary structured files (later)

Page 27: Cerngridtech Bapril

1a. Request to Registry for sources of data about “x”

1b. Registry responds with Factory handle

2a. Request to Factory for access to database

2b. Factory creates GridDataService to manage access

2c. Factory returns handle of GDS to client

3a. Client queries GDS with XPath, SQL, etc

3b. GDS interacts with database

3c. Results of query returned to client as XML

[Figure: Client, Registry, Factory, Grid Data Service and XML / Relational database; arrows are marked as SOAP/HTTP, service creation or API interactions.]
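The registry, factory and service steps (1a to 3c) can be traced in a small in-process Python sketch. This is not the OGSA-DAI API: the class and method names are hypothetical, plain method calls stand in for SOAP/HTTP, and an in-memory SQLite database stands in for the backing database.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS managing access to one database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def perform(self, sql: str) -> str:
        cursor = self._connection.execute(sql)            # 3b: GDS talks to the database
        columns = [description[0] for description in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_element, column).text = str(value)
        return ET.tostring(root, encoding="unicode")      # 3c: results returned as XML


class Factory:
    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def create_service(self) -> GridDataService:
        return GridDataService(self._connection)          # 2b, 2c: create GDS, return handle


class Registry:
    def __init__(self) -> None:
        self._factories = {}

    def publish(self, topic: str, factory: Factory) -> None:
        self._factories[topic] = factory

    def find(self, topic: str) -> Factory:
        return self._factories[topic]                     # 1a, 1b: look up a Factory handle


if __name__ == "__main__":
    database = sqlite3.connect(":memory:")
    database.execute("CREATE TABLE runs (id INTEGER, detector TEXT)")
    database.execute("INSERT INTO runs VALUES (1, 'tracker')")
    registry = Registry()
    registry.publish("runs", Factory(database))
    service = registry.find("runs").create_service()      # 2a
    print(service.perform("SELECT id, detector FROM runs"))  # 3a
```

The client only ever holds handles (registry, factory, service), which is the point of the indirection shown on the slide.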

Page 28: Cerngridtech Bapril

[Figure: three Clients use one Grid Data Service in front of a Relational database, an XML database and a Directory / File system.]
Interface transparency: one GDS supports multiple database types

Page 29: Cerngridtech Bapril

Software Availability
Available now
• Phase 1 prototype of GDS, GDSF & GDSR for XML
– Java implementations for the axis/tomcat platform and the Xindice database
• Globus-2 Relational database support
• BinX Schema v0.2: www.epcc.ed.ac.uk/gridserve/WP5
– An XML Schema for describing the structure of binary datafiles – the power of XML for terabyte files
Software Q1 2003
• Reference implementation 1
• Access & Update
– XML databases
– Relational databases
• To be released as Basic Services in Globus Toolkit 3
umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI/products

Page 30: Cerngridtech Bapril

Advanced Components
[Figure: components labeled DB, GDS, GDT, Translation (twice), Client and Consumer; the call is labeled GDS:PerformScript.]

Page 31: Cerngridtech Bapril

Composed Components
[Figure: a Client and a Consumer connected through several GDS, GDT and Translation components; the calls are labeled GDS:performScript.]

Page 32: Cerngridtech Bapril

Futures of OGSA-DAI
• Allow querying of distributed databases – this is using Grid to federate multiple databases
• Grid is “intrinsically” federation technology – need to mimic classic database federation ideas in a Grid language
– Form composite Schema from integration of those of individual databases (OGSA-DAI allows you to query each database web service to find schema)
• Decide how to deal with very important case where user view is a complex filter run on database query
– Hardest when need to dynamically assign resource to perform filter
– Could view as a “simulation Web Service” outside OGSA-DAI
[Figure: DB, Filter, WSDL of Filter.]
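The "composite Schema" idea can be sketched in Python, assuming each database service has already been asked for its schema and answered with a simple table-to-columns mapping. The data shapes and the function names `composite_schema` and `databases_with_column` are illustrative assumptions, not part of OGSA-DAI.

```python
from typing import Dict, List

Schema = Dict[str, List[str]]  # table name -> column names (assumed shape)


def composite_schema(schemas: Dict[str, Schema]) -> Dict[str, List[str]]:
    """Merge per-database schemas, qualifying each table with its database name."""
    merged: Dict[str, List[str]] = {}
    for database, schema in schemas.items():
        for table, columns in schema.items():
            merged[f"{database}.{table}"] = list(columns)
    return merged


def databases_with_column(schemas: Dict[str, Schema], column: str) -> List[str]:
    """List the databases that a federated query on `column` must visit."""
    return sorted(
        database
        for database, schema in schemas.items()
        if any(column in columns for columns in schema.values())
    )
```

Qualifying table names avoids clashes when two databases use the same table name; real federation would also have to reconcile types and semantics, which this sketch ignores.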

Page 33: Cerngridtech Bapril

“The Semantic Web is an extension of the current Web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation. It is the idea of having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration and reuse across various applications. The Web can reach its full potential if it becomes a place where data can be processed by automated tools as well as people”

From the W3C Semantic Web Activity statement

Semantic Grid starts with the Semantic Web which is a “dream” and a project of W3C

Digital Brilliance is phase transition coming from “collective effect” in the Grid Spin Glass.
• The Hosting environment is the “Ether”
• The Resources are the Spins
• The forces are the meta-data linking resources
• Knowledge (The Higgs) will emerge when we get enough meta-data to force phase transition

Page 34: Cerngridtech Bapril

Resource Description Framework

Page 35: Cerngridtech Bapril

[Figure: arrow from Classical Web to Semantic Web, labeled “Richer semantics”.]

Page 36: Cerngridtech Bapril

OWL Web Ontology Language

“The World Wide Web as it is currently constituted resembles a poorly mapped geography. Our insight into the documents and capabilities available are based on keyword searches, abetted by clever use of document connectivity and usage patterns. The sheer mass of this data is unmanageable without powerful tool support. In order to map this terrain more precisely, computational agents require machine-readable descriptions of the content and capabilities of web accessible resources. These descriptions must be in addition to the human-readable versions of that information.”

The OWL Guide

Page 37: Cerngridtech Bapril

SW Tools

Good Tools for recording meta-data (OWL) but not so advanced in looking at their implications

Page 38: Cerngridtech Bapril

[Figure: arrow from Classical Web to Classical Grid, labeled “More computation”.]
• Semantic Web requires a metadata-enabled Web
• Where will the metadata come from?
• How about from the linked rich resources of a virtual organization?
• A Grid …….

Page 39: Cerngridtech Bapril

Astronomy Sky Survey Data Grid
[Figure: numbered layers of the data grid. Numbered labels that survive: 1. Portals and Workbenches; 2. Knowledge & Resource Management; 4. Grid Security, Caching, Replication, Backup, Scheduling; 7. Derived Collections. Numbers 3, 5 and 6 appear without recoverable text. Other labels: Bulk Data Analysis, Catalog Analysis, Metadata View, Data View, Concept space, Standard Metadata format, Data model, Wire format, Standard APIs and Protocols, Catalog/Image Specific Access, Catalog Mediator, Data mediator, Information Discovery, Metadata delivery, Data Discovery, Data Delivery, Compute Resources, Catalogs, Data Archives.]
Grid is metadata based middleware

Page 40: Cerngridtech Bapril

An Example of RDF and Dublin Core
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description about="http://www.dlib.org">
    <dc:Title>D-Lib Program - Research in Digital Libraries</dc:Title>
    <dc:Description>The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing. </dc:Description>
    <dc:Publisher>Corporation For National Research Initiatives</dc:Publisher>
    <dc:Date>1995-01-07</dc:Date>
    <dc:Subject>
      <rdf:Bag>
        <rdf:li>Research; statistical methods</rdf:li>
        <rdf:li>Education, research, related topics</rdf:li>
        <rdf:li>Library use Studies</rdf:li>
      </rdf:Bag>
    </dc:Subject>
    <dc:Type>World Wide Web Home Page</dc:Type>
    <dc:Format>text/html</dc:Format>
    <dc:Language>en</dc:Language>
  </rdf:Description>
</rdf:RDF>
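A record like the one above can be read with Python's standard `xml.etree.ElementTree`. The sketch below is only an illustration: it embeds a shortened copy of the slide's example, the function name `dublin_core` is hypothetical, and it treats the document as plain namespaced XML rather than using an RDF library.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's example (assumption: same namespaces as the slide).
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description about="http://www.dlib.org">
    <dc:Title>D-Lib Program - Research in Digital Libraries</dc:Title>
    <dc:Date>1995-01-07</dc:Date>
    <dc:Subject>
      <rdf:Bag>
        <rdf:li>Research; statistical methods</rdf:li>
        <rdf:li>Education, research, related topics</rdf:li>
        <rdf:li>Library use Studies</rdf:li>
      </rdf:Bag>
    </dc:Subject>
    <dc:Format>text/html</dc:Format>
  </rdf:Description>
</rdf:RDF>
"""

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/metadata/dublin_core#",
}


def dublin_core(rdf_xml: str) -> dict:
    """Return the described resource and its Dublin Core fields as a dict."""
    root = ET.fromstring(rdf_xml)
    description = root.find("rdf:Description", NAMESPACES)
    record = {"about": description.get("about")}
    for child in description:
        field_name = child.tag.split("}", 1)[1]           # strip the "{namespace}" prefix
        items = child.findall("rdf:Bag/rdf:li", NAMESPACES)
        if items:                                          # multi-valued field (rdf:Bag)
            record[field_name] = [item.text.strip() for item in items]
        else:
            record[field_name] = (child.text or "").strip()
    return record


if __name__ == "__main__":
    for key, value in dublin_core(RDF_XML).items():
        print(key, "=", value)
```

The `rdf:Bag` under `dc:Subject` becomes a Python list, while single-valued elements become strings.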

Page 41: Cerngridtech Bapril

• Annotations of results, workflows and database entries could be represented by RDF graphs using controlled vocabularies described in RDF Schema and OWL

• Personal notes can be XML documents annotated with metadata or RDF graphs linked to results or experimental plans

• Exporting results as RDF makes them available to be reasoned over

• RDF graphs can be the “glue” that associates all the components (literature, notes, code, databases, intermediate results, sketches, images, workflows, the person doing the experiment, the lab they are in, the final paper)

• The provenance trails that keep a record of how a collection of services were orchestrated so they can be replicated or replayed, or act as evidence

For example…

Page 42: Cerngridtech Bapril

• Represent the syntactic data types of e-Science objects using XML Schema data types

• Represent domain ontologies for the semantic mediation between database schema, an application’s inputs and outputs, and workflow work items

• Represent domain ontologies and rules for parameters of machines or algorithms to reason over allowed configurations

• Use reasoning over execution plans, workflows and other combinations of services to ensure the semantic validity of the composition

• Use RDF as a common data model for merging results drawn from different resources or instruments

• Capture the structure of messages that are exchanged between components

More meta-data …
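The bullet on using RDF as a common data model for merging results can be made concrete with a toy Python sketch that treats a graph as a set of (subject, predicate, object) triples. The `urn:example:` identifiers, the predicate names and the helper functions are all hypothetical; a real system would use an RDF library and proper vocabularies.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def merge_graphs(*graphs: Iterable[Triple]) -> Set[Triple]:
    """Merging RDF graphs is set union: identical triples collapse into one."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def objects(graph: Iterable[Triple], subject: str, predicate: str) -> List[str]:
    """All objects recorded for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Two instruments describe the same (hypothetical) sample independently.
instrument_a = {
    ("urn:example:sample42", "ex:mass", "12.5"),
    ("urn:example:sample42", "ex:measuredBy", "urn:example:instrumentA"),
}
instrument_b = {
    ("urn:example:sample42", "ex:temperature", "291"),
    ("urn:example:sample42", "ex:measuredBy", "urn:example:instrumentB"),
}
combined = merge_graphs(instrument_a, instrument_b)
```

Because both sources name the sample with the same identifier, the merged graph answers questions neither source could answer alone, such as which instruments measured it.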

Page 43: Cerngridtech Bapril

• At the data/computation layer: classification of computational and data resources, performance metrics, job control, management of physical and logical resources

• At the information layer: schema integration, workflow descriptions, provenance trail

• At the knowledge layer: problem solving selection, intelligent portals

• Governance of the Grid, for example access rights to databases, personal profiles and security groupings

• Charging infrastructure, computational economy, support for negotiation; e.g. through auction model

And more meta-data …

Page 44: Cerngridtech Bapril

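[Figure: two axes. “Richer semantics” runs from Classical Web to Semantic Web and from Classical Grid to Semantic Grid; “More computation” runs from Classical Web to Classical Grid and from Semantic Web to Semantic Grid.]
Source: Norman Paton
http://www.semanticgrid.org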

Page 45: Cerngridtech Bapril

Summary of Grid Types
• Compute/File Grid: The “Linux workstation view of distributed system” – need planning, scheduling of 10,000’s of jobs, efficient movement of data to processors

• Desktop Grid: as above but use huge numbers of “foreign” compute resources

• Information Grids: Web service access to meta-data rich data repositories

• Hybrid (complexity) Grids: Combination of Information and Compute/File Grids

• Peer-to-peer Grid: Unstructured general purpose access to other style grids

• Semantic Grid: Enables knowledge discovery in all Grids