The Grid: a brief briefing
Carole Goble, Information Management Group

Feb 25, 2016


Transcript
Page 1: The Grid a brief briefing

The Grid: a brief briefing
Carole Goble
Information Management Group

Page 2: The Grid a brief briefing

Roadmap
• What is the Grid?
• Example projects
• Relationship to the Semantic Web
• Example architectures
• The international programme

Page 3: The Grid a brief briefing

Take Home
• The Grid is an international activity
• The Grid has attracted high-profile industrial and government support and funding
• The Information/Knowledge Grid is in many ways indistinguishable from the Semantic Web
• The Grid community’s understanding of generic and theoretical issues for the IK Grid is immature and hackery

Page 4: The Grid a brief briefing

So what’s the Grid?

Isn’t it just High Performance Computing for High Energy Physicists?

Page 5: The Grid a brief briefing

Why Grids?
• Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed.
• The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.

From Bill Johnston 27 July 01

Page 6: The Grid a brief briefing

CERN: Large Hadron Collider (LHC)
Raw data: 1 Petabyte / sec
Filtered: 100 Mbyte / sec = 1 Petabyte / year = 1 million CD-ROMs

CMS Detector
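
The conversion on the LHC slide can be checked with a little arithmetic. The sketch below is not from the briefing. It assumes decimal units (1 Petabyte = 10^15 bytes) and roughly 10^7 seconds of data taking per year; that live-time figure is an assumption, chosen because it makes the slide's numbers agree. All constant names are illustrative.

```python
# Back-of-envelope check of the LHC data-rate figures quoted on the slide.
# Assumptions (not stated in the briefing): decimal units, and roughly
# 1e7 seconds of data taking per calendar year.

MB = 10**6
GB = 10**9
PB = 10**15

RAW_RATE = 1 * PB                  # bytes per second before filtering (from the slide)
FILTERED_RATE = 100 * MB           # bytes per second after filtering (from the slide)
LIVE_SECONDS_PER_YEAR = 10**7      # assumed live time per year
CALENDAR_SECONDS_PER_YEAR = 365 * 24 * 3600

# Volume stored per year under the assumed live time, and if running non-stop.
yearly_volume = FILTERED_RATE * LIVE_SECONDS_PER_YEAR
always_on_volume = FILTERED_RATE * CALENDAR_SECONDS_PER_YEAR

# How aggressively the filter reduces the raw stream.
reduction_factor = RAW_RATE // FILTERED_RATE

# The slide equates 1 Petabyte with 1 million CD-ROMs; this is the disc size that implies.
implied_cd_bytes = yearly_volume // 10**6

print(f"stored per year (assumed live time): {yearly_volume / PB:.2f} PB")
print(f"stored per year (running non-stop):  {always_on_volume / PB:.2f} PB")
print(f"filtering keeps 1 byte in {reduction_factor:,}")
print(f"implied capacity per CD-ROM: {implied_cd_bytes / GB:.1f} GB")
```

Under these assumptions the filtered stream comes to exactly 1 Petabyte per year, whereas running non-stop would give about 3.15 Petabytes, so the slide's equality only holds for a partial duty cycle. The implied 1 GB per disc is somewhat more than a typical CD-ROM holds, so "1 million" is best read as a round figure.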

Page 7: The Grid a brief briefing

Why Grids?
• A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
• A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions
• 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
• Civil engineers collaborate to design, execute, & analyze shake table experiments

From Steve Tuecke 12 Oct. 01

Page 8: The Grid a brief briefing

Why Grids? (contd.)
• Climate scientists visualize, annotate, & analyze terabyte simulation datasets
• An emergency response team couples real-time data, weather model, population data
• A multidisciplinary analysis in aerospace couples code and data in four companies
• A home user invokes architectural design functions at an application service provider

From Steve Tuecke 12 Oct. 01

Page 9: The Grid a brief briefing

Why Grids? (contd.)
• An application service provider purchases cycles from compute cycle providers
• Scientists working for a multinational soap company design a new product
• A community group pools members’ PCs to analyze alternative designs for a local road

From Steve Tuecke 12 Oct. 01

Page 10: The Grid a brief briefing

The Grid Vision
“…flexible, secure, coordinated resource-sharing among dynamic collections of individuals, institutions, and resources–what we refer to as virtual organisations”

“The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, Foster, Kesselman and Tuecke, 2001

Page 11: The Grid a brief briefing

The Grid Problem
Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of:
• central location
• central control
• omniscience
• existing trust relationships

From Steve Tuecke 12 Oct. 01

Page 12: The Grid a brief briefing

[Figure. Application labels: large-scale multi-disciplinary simulation; decision support and optimization; virtual prototyping; collaborative analysis and visualization. Capability labels: large-scale distributed data management; large-scale distributed computation; high-speed communications; dynamic collaborative virtual organisations. Other labels: Visualisation, Data, Computation, stretch.]

Page 13: The Grid a brief briefing

[Figure. Labels: interrogation, workflows, results; Collaboration Grid, Technology Grid; Governance & Control.]
Questions posed: What is it? Where is it? How to get it? When did it happen? Who knows it? Why does it? What are you doing?

Page 14: The Grid a brief briefing

Online Access to Scientific Instruments
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago

[Figure. Labels: Advanced Photon Source; real-time collection; tomographic reconstruction; archival storage; wide-area dissemination; desktop & VR clients with shared controls.]

From Steve Tuecke 12 Oct. 01

Page 15: The Grid a brief briefing

Supernova Cosmology

Page 16: The Grid a brief briefing

Network for Earthquake Engineering Simulation
• NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
• On-demand access to experiments, data streams, computing, archives, collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
From Steve Tuecke 12 Oct. 01

Page 17: The Grid a brief briefing

Home Computers Evaluate AIDS Drugs
• Community = 1000s of home computer users
• Philanthropic computing vendor (Entropia)
• Research group (Scripps)
• Common goal = advance AIDS research

From Steve Tuecke 12 Oct. 01

Page 18: The Grid a brief briefing

myGrid
Personalised extensible environments for data-intensive in silico experiments in biology
• Straightforward discovery, interoperation, sharing
• Workflow oriented, provenance, propagating change
• Individual creativity & collaborative working, personalisation

Page 19: The Grid a brief briefing

myGrid resources
Question: Nucleotide binding protein in mouse
Answer:
• P12345 in Swiss-Prot is an ATPase
• Terri Attwood is an expert on this
• Jackson Labs have a database but you need to register
• A paper has just been published in Proteins by the Stanford lab on this

Page 20: The Grid a brief briefing

GeoDISE – engineering design optimisation
• Access to knowledge repository
• Access to optimisation and search tools
• Industrial analysis codes
• Distributed computing and data resources in design optimisation
• Applied to industrial problems - large scale CFD codes
• Demonstrate scalability across distributed computational and data resources and teams of designers

Page 21: The Grid a brief briefing

GeoDISE
Modern engineering firms are global and distributed
“Not just a problem of using HPC”
• CAD and analysis tools, user interfaces, PSEs, and visualization
• Optimisation methods
• Data archives (e.g. design/system usage)
• Knowledge repositories & knowledge capture and reuse tools
• Management of distributed compute and data resources

How to … ?
• … improve design environments
• … cope with legacy code / systems
• … integrate large-scale systems in a flexible way
• … produce optimized designs
• … archive and re-use design history
• … capture and re-use knowledge

Page 22: The Grid a brief briefing

Virtual Sky http://virtualsky.org/

Page 23: The Grid a brief briefing

Broader Context
• “Grid Computing” has much in common with major industrial thrusts
  • Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing…
• Sharing issues not adequately addressed by existing technologies
  • Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
  • High performance: unique demands of advanced & high-performance systems

From Steve Tuecke 12 Oct. 01
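
The “complicated requirements” example on this slide can be made concrete with a toy model. The sketch below is purely illustrative and comes neither from the briefing nor from any Grid toolkit. The `Request` type, the `authorise` function, and the idea of representing a policy as a plain predicate are all hypothetical choices made for this example. It shows only that a single request has to satisfy two independently owned policies, one for the compute site and one for the data.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model: a policy is just a predicate over a request.
Policy = Callable[["Request"], bool]


@dataclass(frozen=True)
class Request:
    user: str       # who is asking
    program: str    # "program X"
    site: str       # "site Y", where the program should run
    data: str       # "data at Z", which the program needs to read


def authorise(request: Request, community_policy: Policy, data_policy: Policy) -> bool:
    """Allow the request only if both policy P and policy Q accept it."""
    return community_policy(request) and data_policy(request)


# Example policies, invented for illustration.
def policy_p(request: Request) -> bool:
    # Community policy: only members may run programs at the shared site.
    members = {"alice", "bob"}
    return request.user in members


def policy_q(request: Request) -> bool:
    # Data owner's policy: the dataset may only be read from approved sites.
    approved_sites = {"site-y"}
    return request.site in approved_sites


ok = Request(user="alice", program="x", site="site-y", data="z")
outsider = Request(user="mallory", program="x", site="site-y", data="z")
wrong_site = Request(user="alice", program="x", site="site-w", data="z")

print(authorise(ok, policy_p, policy_q))          # True
print(authorise(outsider, policy_p, policy_q))    # False
print(authorise(wrong_site, policy_p, policy_q))  # False
```

Keeping the two policies as separate callables mirrors the point of the slide: the community and the data owner each state their own conditions, and neither needs to know the other's rules.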

Page 24: The Grid a brief briefing

Elements of the Problem
• Resource sharing
  • Computers, storage, sensors, networks, …
  • Sharing always conditional: issues of trust, policy, negotiation, payment, …
• Coordinated problem solving
  • Beyond client-server: distributed data analysis, computation, collaboration, …
• Dynamic, multi-institutional virtual organisations
  • Community overlays on classic org structures
  • Large or small, static or dynamic
• Problem Solving Environments

From Steve Tuecke 12 Oct. 01


Page 26: The Grid a brief briefing

The Globus Project™
• Close collaboration with real Grid projects in science and industry
• Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
• Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
• The Globus Toolkit™: Open source, reference software base for building grid infrastructure and applications
• Global Grid Forum: Development of standard protocols and APIs for Grid computing

From Steve Tuecke 12 Oct. 01

Page 27: The Grid a brief briefing

Doesn’t Globus solve it all?
• Globus Toolkit is focused on the Data/Computational layer
• No database connectivity
• Little brokering, and static not dynamic
• Weak metadata management, workflow
• Trashes firewalls
• No, not everything is JCL, FTP and LDAP
• Distributed computation dominates
• etc…etc…

Page 28: The Grid a brief briefing

Is it done?
• NASA Information Power Grid is the only one really working http://www.ipg.nasa.gov
  • Linking similar supercomputers owned by the same organisation
  • Computation-focused
• High Energy Physics is atypical

Page 29: The Grid a brief briefing

Example Application Projects
• AstroGrid: astronomy, etc. (UK)
• Earth Systems Grid: environment (US DOE)
• EU DataGrid: physics, environment, etc. (EU)
• EuroGrid: various (EU)
• Fusion Collaboratory (US DOE)
• GridLab: astrophysics, etc. (EU)
• Grid Physics Network (US NSF)
• MetaNEOS: numerical optimization (US NSF)
• NEESgrid: civil engineering (US NSF)
• RealityGrid (UK)
• DAME (UK)
• Comb-e-Chem (UK)
• GeoDISE (UK)
• iVDGL, StarLight (US/EU)
• DiscoveryNet (UK)
• myGrid (UK)
• GridPP (UK)
• Particle Physics Data Grid (US DOE)
• etc…

Page 30: The Grid a brief briefing

“… Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. …”

Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems”, Ph.D. thesis, July 1983.

Page 31: The Grid a brief briefing

Every Community needs a Matchmaker!

Condor uses Matchmakers to build Computing Communities out of Commodity Components

… someone has to bring together community members who have requests for goods and services with members who offer them.
• Both sides are looking for each other
• Both sides have constraints
• Both sides have preferences
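
The matchmaking idea on this slide (both sides have constraints, both sides have preferences) can be illustrated with a toy matchmaker. This is a minimal sketch in plain Python. It is not Condor's ClassAd language or its actual matchmaking algorithm. The `Ad` type, the `match` function, and the example attributes are all hypothetical. Each side advertises its attributes, a constraint the other side must satisfy, and a rank expressing its preference.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Attributes = dict[str, object]


@dataclass
class Ad:
    """An advertisement from either a requester or a provider (hypothetical type)."""
    name: str
    attributes: Attributes
    constraint: Callable[[Attributes], bool]   # must hold for the other side
    rank: Callable[[Attributes], float]        # higher means more preferred


def match(request: Ad, offers: list[Ad]) -> Optional[Ad]:
    """Return the offer the requester prefers among mutually acceptable ones."""
    acceptable = [
        offer for offer in offers
        if request.constraint(offer.attributes)      # requester accepts the offer
        and offer.constraint(request.attributes)     # provider accepts the request
    ]
    if not acceptable:
        return None
    return max(acceptable, key=lambda offer: request.rank(offer.attributes))


job = Ad(
    name="job-1",
    attributes={"owner": "alice", "memory_needed": 512},
    constraint=lambda m: m["memory"] >= 512 and m["os"] == "linux",
    rank=lambda m: m["mips"],                       # prefer faster machines
)

machines = [
    Ad("slow-linux", {"memory": 1024, "os": "linux", "mips": 100},
       constraint=lambda j: True, rank=lambda j: 0.0),
    Ad("fast-linux", {"memory": 1024, "os": "linux", "mips": 400},
       constraint=lambda j: j["owner"] != "alice", rank=lambda j: 0.0),
    Ad("small-linux", {"memory": 256, "os": "linux", "mips": 900},
       constraint=lambda j: True, rank=lambda j: 0.0),
]

chosen = match(job, machines)
print(chosen.name if chosen else "no match")   # slow-linux
```

The fastest machine is too small for the job and the next fastest refuses jobs from this owner, so the match falls to the slowest one. For brevity the sketch only uses the requester's rank to break ties; a fuller matchmaker would presumably also weigh the provider's preferences.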

Page 32: The Grid a brief briefing

Let’s look at some Architectures

Page 33: The Grid a brief briefing

Desiderata (adapted from Globus)
• Software development toolkits, e.g. Globus Toolkit
• Standard protocols, services & APIs
• A modular “bag of technologies”
• Enable incremental development of grid-enabled tools and applications
• Reference implementations
• Learn through deployment and applications
• Open source

[Figure. Layer labels: Applications; Diverse global services; Core services; Local OS.]

Page 34: The Grid a brief briefing
Page 35: The Grid a brief briefing

Globus Layered Grid Architecture
CERN - High Energy Physics

Layers, top to bottom:
• Application
• Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
• Resource: “Sharing single resources”: negotiating access, controlling use
• Connectivity: “Talking to things”: communication (Internet protocols) & security
• Fabric: “Controlling things locally”: access to, & control of, resources

Shown alongside the Internet Protocol Architecture: Application, Transport, Internet, Link

From Steve Tuecke 12 Oct. 01

Page 36: The Grid a brief briefing

Keith Jeffery

Page 37: The Grid a brief briefing

Three Layer Grid Abstraction

[Figure. Labels: Scientific Problems; Processes; Knowledge; Information; Jobs and Data; Data; Raw Resources; Knowledge / capability; Semantics / process; Data / applications; Value chain; Fulfillment Grid; Interoperability, higher level ontologies, reasoning, discovery; Reasoning services; Discovery services.]

"Reproduced by permission of the IT Innovation Centre, University of Southampton." http://www.it-innovation.soton.ac.uk

Page 38: The Grid a brief briefing

Architecture of a Grid

[Figure: layered diagram. Labels by layer:]
• Discipline Specific Portals and Scientific Workflow Management Systems
• Applications: Simulations, Data Analysis, etc.
• Toolkits: Visualization, Data Publication/Subscription, etc.
• Grid Common Services: Standardized Services and Resources Interfaces
  • Grid Information Service; Uniform Resource Access; Brokering; Global Queuing; Global Event Services; Co-Scheduling; Data Cataloguing; Uniform Data Access; Collaboration and Remote Instrument Services; Network Cache; Communication Services; Authentication; Authorization; Security Services; Auditing; Fault Management; Monitoring
• Distributed Resources
  • Condor pools; network caches; tertiary storage; national user facilities; clusters; national supercomputer facilities
• High-speed Networks and Communications Services

Legend: a marker in the figure denotes Globus services.

Page 39: The Grid a brief briefing

Architecture of a Grid – upper layers

Problem Solving Environments
• Knowledge based query
• Tools to implement the human interfaces, e.g. SciRun, ECCE, WebFlow, .....
• Mechanisms to express, organize, and manage the workflow of problem solutions (“frameworks”)
• Access control

Applications and Supporting Tools
• application codes
• visualization toolkits
• collaboration toolkits
• instrument management toolkits
• data publish and subscribe toolkits

Application Development and Execution Support
• Grid enabled libraries (security, communication services, data access, global event management, etc.)
• Globus MPI, CORBA, Condor-G, Java/Jini, DCOM

Grid Common Services
Distributed Resources

Page 40: The Grid a brief briefing

“Knowledge Based” Data Grids

[Figure. Row labels: Knowledge; Information; Data (with Attributes, Semantics). Column labels: Ingest Services; Management; Access Services. Other labels: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse; Information Repository; Attribute-based Query; Feature-based Query; Fields, Containers, Folders; Storage (Replicas, Persistent IDs); (Model-based Access); (Data Handling System - SRB). Standards shown: MCAT/HDF; Grids; XML DTD; SDLIP; XTM DTD; Rules - KQL.]

National Partnership for Advanced Computational Infrastructure

Page 41: The Grid a brief briefing

Astronomy Sky Survey Data Grid

[Figure. Numbered components: 1. Portals and Workbenches; 2. Knowledge & Resource Management; 4. Grid Security, Caching, Replication, Backup, Scheduling; 7. Derived Collections. The labels for 3, 5 and 6 are not recoverable. Other labels: Bulk Data Analysis; Catalog Analysis; Metadata View; Data View; Standard APIs and Protocols; Concept space; Information Discovery; Metadata delivery; Data Discovery; Data Delivery; Standard Metadata format, Data model, Wire format; Catalog Mediator; Data mediator; Catalog/Image Specific Access; Compute Resources; Catalogs; Data Archives.]

Page 42: The Grid a brief briefing

NSDL

[Figure. Labels: User Interfaces; Portals & Clients; Usage Enhancement; CI Services (visualization, discussion, personalization, topic-map registry, query transform); Core Services (annotation, metadata normalizing); Collection Building; Core Collection-Building Services (persistent storage, metadata harvesting); NSDL Services; Other NSDL Services; NSDL Collections; Referenced Items & Collections; Metadata & data access-based services; Core NSDL Bus; Meta-data delivery; Data delivery; Query; Global Ids; Security; Network; Virtual Collections & Mediators; Information about collections; Delivery; Presentation; Aggregation - Channels.]

Page 43: The Grid a brief briefing

ERA Concept model

Mediation of Information using XML
Storage Resource Broker/Extensible Meta-data CATalog

ERA: Archival Components Concept

[Figure. Labels: Accessioning Workbench; Accession; Verify; Wrap & Containerize; Describe; Archival Repository; Metadata; Collection; Disks; Tapes; Reference Workbench; Query; Rebuild; Present; Order Fulfillment System; Archival Research Catalog; Records Schedules; Internet; Grid Security Infrastructure.]

Page 44: The Grid a brief briefing
Page 45: The Grid a brief briefing

The De Roure Triangle

[Figure. Labels: Agents; Web Services; Semantic Web; Grid Computing; e-Business; e-Science; ?]

Page 46: The Grid a brief briefing

California Institute of Technology
Roy Williams, Paul Messina

Page 48: The Grid a brief briefing

E-Science Programme

[Figure: organisation and funding chart. Labels:]
• DG Research Councils; E-Science Steering Committee; Grid TAG
• Director: Director’s Management Role; Director’s Awareness and Co-ordination Role
• Generic Challenges: EPSRC (£15m), DTI (£15m)
• Industrial Collaboration (£40m)
• Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
  • PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
• £80m Collaborative projects

From Tony Hey 27 July 01
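
The funding figures on the E-Science Programme slide can be cross-checked by summing the per-council amounts. The short sketch below uses only the numbers shown on the slide. The variable names are illustrative, and treating the DTI £5m as an addition to the Research Councils total is an assumption about how the chart should be read.

```python
# Research Council allocations as listed on the slide, in millions of pounds.
allocations = {
    "PPARC": 26,
    "BBSRC": 8,
    "MRC": 8,
    "NERC": 7,
    "ESRC": 3,
    "EPSRC": 17,
    "CLRC": 5,
}

research_councils_total = sum(allocations.values())

# Assumption: the DTI figure printed beside the Research Councils total is additive.
dti_contribution = 5
application_programme_total = research_councils_total + dti_contribution

print(f"Research Councils total: £{research_councils_total}m")
print(f"With DTI contribution:   £{application_programme_total}m")
for council, amount in sorted(allocations.items(), key=lambda item: -item[1]):
    share = 100 * amount / research_councils_total
    print(f"{council:6s} £{amount:>2}m  {share:4.1f}%")
```

The per-council figures add up to the £74m shown for the Research Councils. Adding the DTI's £5m gives £79m, which is close to the “£80m Collaborative projects” label, although the slide itself does not say the two are the same quantity.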

Page 49: The Grid a brief briefing

Key Elements of UK Grid Development Plan
• Development of Generic Grid Middleware
• Network of Grid Core Programme e-Science Centres
  • National Centre http://www.nesc.ac.uk/
  • Regional Centres http://www.esnw.ac.uk/
• Grid IRC Grand Challenge Project
• Support for e-Science Pilots
• Short term funding for e-Science demonstrators
• Grid Network Team *
• Grid Engineering Team
• Grid Support Centre *
• Task Forces

Adapted from Tony Hey 27 July 01

Page 50: The Grid a brief briefing

Take Home
• The Grid is an international activity
• The Grid has attracted high-profile industrial and government support and funding
• The Information/Knowledge Grid is in many ways indistinguishable from the Semantic Web
• The Grid community’s understanding of generic and theoretical issues for the IK Grid is immature and hackery

Page 51: The Grid a brief briefing

Spares

Page 52: The Grid a brief briefing

Supernova Cosmology

Page 53: The Grid a brief briefing

Home Computers Evaluate AIDS Drugs
• Community = 1000s of home computer users
• Philanthropic computing vendor (Entropia)
• Research group (Scripps)
• Common goal = advance AIDS research

From Steve Tuecke 12 Oct. 01

Page 54: The Grid a brief briefing

Grid viewpoints

[Figure. Labels: interrogation, workflows, results; Access Grid, New Biology, Technology Grid; private, public; Governance & Control.]
Questions posed: What is it? Where is it? How to get it? When did it happen? Who knows it? Why does it? What are you doing?

Page 55: The Grid a brief briefing

Particle Physics and Astronomy Research Council (PPARC)
• GridPP (http://www.gridpp.ac.uk/): to develop the Grid technologies required to meet the LHC computing challenge
• ASTROGRID (http://www.astrogrid.ac.uk/): a ~£4M project aimed at building a data-grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory

Page 56: The Grid a brief briefing

Infrastructure Deployments
• Institutional Grid deployments: deploying services and network infrastructure
  • DISCOM, IPG, TeraGrid, DOE Science Grid, DOD Grid, NEESgrid, ASCI (Netherlands)
• International deployments: supporting international experiments and science
  • iVDGL, StarLight
• Support centers
  • U.K. Grid Center
  • U.S. GRIDS Center