The EU DataGrid Architecture, The European DataGrid Project Team, http://www.eu-datagrid.org, Peter.Kunszt@cern.ch
Page 1: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EU DataGrid Architecture

The European DataGrid Project Team

http://www.eu-datagrid.org

Peter.Kunszt@cern.ch

Page 2: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 2

Contents

Middleware architecture overview

EDG structure

Job scheduling

Fabric management

Data Management

Monitoring

Storage

Networking

Summary

Page 3: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 3

EDG middleware architecture: the Globus hourglass

Current EDG architectural functional blocks: the Basic Services (authentication, authorization, Replica Catalog, secure file transfer, Info Providers) rely on Globus 2.0 (GSI, GRIS/GIIS, GRAM, MDS).

[Figure: hourglass of layers, listed from the bottom up:]

- OS & Net services
- Basic Services (GLOBUS 2.0)
- High level GRID middleware (GRID middleware)
- LHC VO common application layer; other apps
- Specific application layer: ALICE, ATLAS, CMS, LHCb; other apps

Page 4: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 4

DataGrid Architecture

[Figure: layered architecture diagram with three regions: Local Computing, Grid and Fabric.]

Local Computing:
- Local Application
- Local Database

Grid Application Layer:
- Data Management
- Job Management
- Metadata Management
- Object to File Mapping

Collective Services:
- Information & Monitoring
- Replica Manager
- Grid Scheduler

Underlying Grid Services:
- Computing Element Services
- Authorization, Authentication & Accounting
- Replica Catalog
- Storage Element Services
- Database Services

Fabric services:
- Configuration Management
- Node Installation & Management
- Monitoring and Fault Tolerance
- Resource Management
- Fabric Storage Management

Also shown on the diagram: Logging & Book-keeping

Page 5: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 5

EDG middleware architecture: EDG interfaces

[Figure: the DataGrid layered architecture diagram from the previous slide (Grid Application Layer, Collective Services, Underlying Grid Services including SQL Database Services, Fabric services), surrounded by the external entities the middleware interfaces with: Scientists, Application Developers, System Managers, Certificate Authorities, User Accounts, Computing Elements, Batch Systems, Operating Systems, File Systems, Storage Elements and Mass Storage Systems (HPSS, Castor).]

Page 6: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 6

EDG middleware architecture: The Workload Management System (WP1)

WP1 is responsible for the Workload Management System (WMS).

The WMS is currently composed of the following parts:

- User Interface (UI): access point for the user to the GRID (using JDL)
- Resource Broker (RB): the broker of GRID resources; performs matchmaking
- Job Submission System (JSS): Condor-G; interfaces with batch systems
- Information Index (II): an LDAP server used as a filter to select resources
- Logging and Bookkeeping services (LB): MySQL databases that store job information
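
The matchmaking role of the Resource Broker can be pictured with a minimal sketch. This is not the EDG implementation (the RB is written in C++ and works from JDL job descriptions); the `ComputingElement` record, the `matchmake` function and the ranking rule are all hypothetical and only illustrate the idea of filtering resources against job requirements and then ranking the survivors.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    """Hypothetical resource record, as an information index might expose it."""
    name: str
    free_cpus: int
    max_wall_minutes: int
    software: frozenset


def matchmake(resources, min_cpus, wall_minutes, required_software):
    """Return the names of suitable resources, best first.

    Step 1 (requirements): drop resources that cannot run the job.
    Step 2 (rank): prefer the resource with the most free CPUs (assumed rule).
    """
    suitable = [
        ce for ce in resources
        if ce.free_cpus >= min_cpus
        and ce.max_wall_minutes >= wall_minutes
        and required_software <= ce.software  # subset test
    ]
    suitable.sort(key=lambda ce: ce.free_cpus, reverse=True)
    return [ce.name for ce in suitable]


resources = [
    ComputingElement("ce-a", 4, 600, frozenset({"lhcb-sim"})),
    ComputingElement("ce-b", 32, 120, frozenset({"lhcb-sim", "atlas-reco"})),
    ComputingElement("ce-c", 16, 1440, frozenset({"lhcb-sim"})),
]
ranked = matchmake(resources, min_cpus=2, wall_minutes=300,
                   required_software=frozenset({"lhcb-sim"}))
print(ranked)
```

Here `ce-b` is filtered out because its wall-time limit is too short, and the remaining two are ordered by free CPUs.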

Page 7: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 7

WP1: Work Load Management

Components:
- Job Description Language
- Resource Broker
- Job Submission Service
- Information Index
- User Interface
- Logging & Bookkeeping Service

Implementation:
- UI: Python (LB client: C++)
- RB: C++
- JSS: C++, Python
- II: LDAP server
- LB: MySQL, C++
- Input/Output Sandboxes: GridFTP

WMS main interfaces:
- Globus Gatekeeper
- WP2 Replica Catalog APIs
- WP3 Information Systems
- WP7 network monitoring info providers
- End User (using JDL files, on the UI)

[Figure: the DataGrid layered architecture diagram shown earlier (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services).]

Page 8: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 8

EDG middleware architecture: WP1 (WMS)

Page 9: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 10

File Management

[Figure: Site A hosts Storage Element A, holding File A, File B, File X and File Y; Site B hosts Storage Element B, holding File A, File B, File C and File D. The two Storage Elements are linked by File Transfer.]

Page 10: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 11

File Management

Same two-site figure, with one service added:

- Replica Catalog: maps logical files to site files

Page 11: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 12

File Management

Same two-site figure, with one service added:

- Replica Selection: get the 'best' file

Page 12: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 13

File Management

Same two-site figure, with one service added:

- Pre-/Post-processing: prepare files for transfer; validate files after transfer

Page 13: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 14

File Management

Same two-site figure, with one service added:

- Replication Automation: data source subscription

Page 14: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 15

File Management

Same two-site figure, with one service added:

- Load balancing: replicate based on usage

Page 15: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 16

File Management

Same two-site figure, with one service added:

- Replica Manager: 'atomic' replication operation; single client interface; orchestrator

Page 16: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 17

File Management

Same two-site figure, with one service added:

- Metadata: LFN metadata; transaction information; access patterns

Page 17: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 18

File Management

[Figure: Site A (Storage Element A) and Site B (Storage Element B), each holding files, some of which are replicated at both sites.]

Services involved:

- File Transfer
- Replica Catalog: maps logical files to site files
- Replica Selection: get the 'best' file
- Pre-/Post-processing: prepare files for transfer; validate files after transfer
- Replication Automation: data source subscription
- Load balancing: replicate based on usage
- Replica Manager: 'atomic' replication operation; single client interface; orchestrator
- Metadata: LFN metadata; transaction information; access patterns
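
As a rough illustration of how a Replica Catalog lookup and Replica Selection fit together, here is a minimal sketch. The catalog contents, the site names, the `access_cost` table and both functions are hypothetical; the slides do not say how the 'best' file is chosen, so lowest estimated access cost is an assumption made for the example.

```python
# Hypothetical catalog: logical file name -> physical replicas (site, path).
replica_catalog = {
    "lfn:fileA": [("siteA", "/se-a/fileA"), ("siteB", "/se-b/fileA")],
    "lfn:fileC": [("siteB", "/se-b/fileC")],
}

# Hypothetical cost of reading from each site, for example derived from
# network monitoring data (lower is better).
access_cost = {"siteA": 5.0, "siteB": 1.5}


def list_replicas(lfn):
    """Replica Catalog lookup: logical name -> all known site files."""
    return replica_catalog.get(lfn, [])


def select_best_replica(lfn):
    """Replica Selection: pick the replica with the lowest access cost."""
    replicas = list_replicas(lfn)
    if not replicas:
        raise KeyError(f"no replica registered for {lfn}")
    return min(replicas, key=lambda replica: access_cost[replica[0]])


best = select_best_replica("lfn:fileA")
print(best)
```

Keeping lookup and selection as separate functions mirrors the slide, where the catalog only maps names and a distinct service decides which replica to use.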

Page 18: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 19

Current State

- File Transfer: GridFTP (deployed)
  - Close collaboration with Globus NetLogger (Brian Tierney and John Bresnahan)
- Replication: GDMP (deployed)
  - Wrapper around Globus ReplicaCatalog
  - All functionality in one integrated package
  - Uses Globus 2
  - Uses GridFTP for transferring files
- Replication: edg-replica-manager (deployed)
- Replication: Replica Location Service, Giggle (in testing)
  - Distributed Replica Catalog
- Replication: Replica Manager, Reptor (in testing)
- Optimization: Replica Selection, OptorSim (in simulation)
- Metadata Storage: SQL Database Service, Spitfire (deployed)
  - Servlets on HTTP(S) with XML (XSQL)
  - GSI-enabled access + extensions
- GSI interface to CASTOR (delivered)

Page 19: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 20

WP2: Data Management

Deployed components:
- GridFTP
- Replica Manager: edg-replica-manager
- Replica Catalog: globus-replica-catalog
- GDMP
- Spitfire

Implementation:
- RM: C++ classes (under development)
- RC: Globus Replica Catalog wrapper
- GDMP: C++
- Spitfire: Java, Web Services

WP2 main interfaces:
- The GRID Storage Element
- WP1 Resource Broker APIs
- WP3 GRID Info services
- WP7 network monitoring info providers
- End User (using GDMP)

[Figure: the DataGrid layered architecture diagram shown earlier (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services).]

Page 20: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 21

Example of Data Management by LHCb

Copy data file to storage element:

    globus-url-copy file:///${chemin}/L69999 \
        gsiftp://lxshare0219.cern.ch/flatfiles/SE1/lhcb/L69999

Register stored data in the catalog:

    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c \
        "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_register_local_file -d /flatfiles/SE1/lhcb"

Publish catalog:

    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c \
        "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_publish_catalogue -n"

Copy output to MSS:

    rfcp L1600061 /castor/cern.ch/lhcb/mc/L1600061

Page 21: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 22

The Replica Manager APIs

[Figure: a Client and two Replica Managers, each with its own Replica Optimiser, Storage Element (SE) and Computing Element (CE), plus a Replica Catalogue. The legend distinguishes physical file transfer from communication.]

Page 22: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 23

The Replica Manager APIs

RM.copy(PhysicalFileName source,
        PhysicalFileName destination,
        String protocol):Status

- Allows for third-party transfer.
- Transfer between two StorageElements, or between a ComputingElement and a StorageElement.
- Space management policies are under development.

Page 23: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 24

The Replica Manager APIs

RM.add/deletePhysicalFileName(LogicalFileName lfn,
                              PhysicalFileName pfn)

- Replica Catalogue operations only; no file transfer.

RM.copyAndAddPhysicalFile(PhysicalFileName source,
                          PhysicalFileName destination,
                          LogicalFileName lfn,
                          String protocol):Status

- Third-party transfer, but files can only be registered in the Replica Catalogue if the destination PFN contains a valid SE (i.e. the SE needs to be registered in the RC).

RM.deletePhysicalFile(LogicalFileName lfn,
                      PhysicalFileName pfn)
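
The difference between catalogue-only operations and the combined copy-and-register call can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the class `ReplicaManagerSketch`, the snake_case method names (which only mirror the slide's operations), the host names, and the choice to reject the whole operation when the destination SE is unknown. It is not the EDG C++ API.

```python
from urllib.parse import urlparse


class ReplicaManagerSketch:
    """Toy in-memory model; names mirror the slide, behaviour is assumed."""

    def __init__(self, known_storage_elements):
        self.known_ses = set(known_storage_elements)
        self.catalogue = {}   # lfn -> set of pfns
        self.files = set()    # pfns that "physically" exist

    def copy(self, source, destination, protocol="gsiftp"):
        """Transfer only: the catalogue is not updated."""
        if source not in self.files:
            return False
        self.files.add(destination)
        return True

    def add_physical_file_name(self, lfn, pfn):
        """Catalogue operation only: no file transfer."""
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def delete_physical_file_name(self, lfn, pfn):
        """Catalogue operation only: the file itself is untouched."""
        self.catalogue.get(lfn, set()).discard(pfn)

    def copy_and_add_physical_file(self, source, destination, lfn,
                                   protocol="gsiftp"):
        """Copy, then register, but only for a destination on a known SE."""
        if urlparse(destination).hostname not in self.known_ses:
            return False
        if not self.copy(source, destination, protocol):
            return False
        self.add_physical_file_name(lfn, destination)
        return True


rm = ReplicaManagerSketch(known_storage_elements={"se1.example.org"})
rm.files.add("gsiftp://ce1.example.org/tmp/f1")

ok = rm.copy_and_add_physical_file(
    "gsiftp://ce1.example.org/tmp/f1",
    "gsiftp://se1.example.org/data/f1",
    "lfn:f1",
)
rejected = rm.copy_and_add_physical_file(
    "gsiftp://ce1.example.org/tmp/f1",
    "gsiftp://unknown.example.org/data/f1",
    "lfn:f1",
)
print(ok, rejected, rm.catalogue)
```

The second call fails because its destination host is not a registered Storage Element, which is the constraint the slide states for registration.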

Page 24: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 25

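WP2 next generation Replication Services

[Figure: a Client calls the Replica Manager, which is shown together with these services: Replica Metadata, Replica Location, File Transfer, Optimization, Transaction, Consistency, Preprocessing, Postprocessing and Subscription. Component names shown on the figure: Reptor, Giggle, RepMeC, Optor, GDMP.]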

Page 25: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 26

Replication Services Architecture

[Figure: each Site contains a Replica Manager, an Optimiser, Pre-/Post-processing, a Local Replica Catalog, a Storage Element and a Computing Element. Outside the sites are the Replica Location Indices and the Replica Metadata Catalog. The Resource Broker and the User Interface also appear. The figure labels a Core API, an Optimisation API and a Processing API.]

Page 26: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 27

Metadata Management and Security

Project Spitfire

- 'Simple' Grid Persistency
  - Grid Metadata
  - Application Metadata
  - Unified Grid-enabled front end to relational databases
- Metadata Replication and Consistency
  - Publish information on the metadata service
- Secure Grid Services
  - Grid authentication, authorization and access control mechanisms enabled in Spitfire
  - Modular design, reusable by other Grid Services

Page 27: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 28

Spitfire Architecture

- Atomic RDBMS is always consistent
- No local replication of data
- Role-based authorization
- XSQL Servlet as one access mode, for 'simple' web access
- Web/Grid Services paradigm
  - SOAP interfaces
  - JDBC interface to RDBMS
- Pluggability and extensibility

[Figure: a Global Spitfire Layer, a Connecting Layer and a Local Spitfire Layer, linked by SOAP. The local layer holds one component per database: OracleLayer (Oracle), DB2Layer (DB2), PGLayer (PostgreSQL), MyLayer (MySQL).]

Page 28: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 29

WP3: GRID monitoring and Info Providers

WP3's task is to provide information about:

- The Grid itself. This includes information about resources (ComputingElements, StorageElements and the Network), for which the Globus MDS is a common solution, and job status information (as implemented by WP1's Logging and Bookkeeping).
- Grid applications. This is information published by user jobs, used for performance monitoring.

Page 29: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 30

WP3: GRID monitoring and Info Providers

Main WP3 components:

- MDS v2.1: the Globus Monitoring and Discovery Services, based on soft-state registration protocols and LDAP aggregate directory services.
- FTree: an EDG-developed directory service based on OpenLDAP plus caching, built to address shortcomings in MDS v1 and to optimize data access performance.
- R-GMA: Relational GMA, an implementation of the GGF Grid Monitoring Architecture (Consumers, Producers and Directory Services) that makes information from producers available to consumers as relations (tables). It also uses relations to handle the registration of producers. R-GMA is consistent with GMA principles.
- GRM / PROVE: application monitoring and visualization tools from the P-GRADE graphical parallel programming environment, modified for application monitoring in the DataGrid. The instrumentation library of GRM is generalized for flexible trace event specification. The components of GRM will be connected to R-GMA through its Producer and Consumer APIs.

Page 30: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 31

R-GMA

- Uses the GMA from GGF
- A relational implementation
- Applied to both information and monitoring
- Creates the impression that you have one RDBMS per VO

[Figure: Producer, Consumer and Registry, connected by "subscribe" and "lookup" arrows.]

Page 31: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 32

Relational Approach

Producers:
- announce: SQL "CREATE TABLE"
- publish: SQL "INSERT"

Consumers:
- collect: SQL "SELECT"
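
The announce / publish / collect cycle maps directly onto three SQL statements. The sketch below plays it through with Python's built-in `sqlite3` module standing in for the relational view; the `serviceStatus` table and its rows are hypothetical, and a single in-memory database is of course a simplification of a distributed system of producers and consumers.

```python
import sqlite3

# One in-memory database stands in for the relational view of the Grid.
db = sqlite3.connect(":memory:")

# Producer announces the table it will publish into: CREATE TABLE.
db.execute("CREATE TABLE serviceStatus (site TEXT, service TEXT, up INTEGER)")

# Producer publishes measurements: INSERT.
db.executemany(
    "INSERT INTO serviceStatus VALUES (?, ?, ?)",
    [("CERN", "SE", 1), ("CERN", "CE", 0), ("RAL", "CE", 1)],
)

# Consumer collects only what it is interested in: SELECT.
down = db.execute(
    "SELECT site, service FROM serviceStatus WHERE up = 0"
).fetchall()
print(down)
```

The consumer never needs to know which producer supplied a row; it only phrases its interest as a query.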

Page 32: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 33

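R-GMA

API to Servlet communication: HTTP(S) in, XML back.

[Figure: Sensor Code uses the Producer API and Application Code uses the Consumer API. The servlets shown are the Producer Servlet, Consumer Servlet, Registry Servlet and Schema Servlet, with the Registry API and Schema API in front of the latter two.]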

Page 33: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 34

Schema & Contributions

CPULoad (Global Schema)

Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002

CPULoad (Producer 1)

UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002

CPULoad (Producer 2)

UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002

CPULoad (Producer 3)

CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002

Page 34: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 35

Contributions are Views

CPULoad (Producer 1)

UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002

    SELECT * FROM cpuLoad
    WHERE country = 'UK' AND site = 'RAL'

CPULoad (Producer 2)

UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002

    SELECT * FROM cpuLoad
    WHERE country = 'UK' AND site = 'GLA'
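
The idea that each producer's contribution is a view over the global table can be tried out with `sqlite3`. Note the simplification: this sketch materialises the global `cpuLoad` table and derives the contributions from it, purely to show that the two `WHERE` predicates carve out exactly the rows listed for each producer. The view names `producer1` and `producer2` are hypothetical; the rows are the ones from the slide.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cpuLoad ("
    "country TEXT, site TEXT, facility TEXT, load REAL, timestamp TEXT)"
)
db.executemany(
    "INSERT INTO cpuLoad VALUES (?, ?, ?, ?, ?)",
    [
        ("UK", "RAL", "CDF", 0.3, "19055711022002"),
        ("UK", "RAL", "ATLAS", 1.6, "19055611022002"),
        ("UK", "GLA", "CDF", 0.4, "19055811022002"),
        ("UK", "GLA", "ALICE", 0.5, "19055611022002"),
    ],
)

# Each contribution is described by a predicate on the global table.
db.execute(
    "CREATE VIEW producer1 AS "
    "SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'"
)
db.execute(
    "CREATE VIEW producer2 AS "
    "SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'"
)

p1 = db.execute(
    "SELECT facility, load FROM producer1 ORDER BY facility"
).fetchall()
p2 = db.execute(
    "SELECT facility, load FROM producer2 ORDER BY facility"
).fetchall()
print(p1)
print(p2)
```

Because a contribution is characterised by its predicate, a query that asks only for `site = 'RAL'` rows needs to involve only the first producer.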

Page 35: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 36

WP3: GRID Monitoring

Components:
- MDS / FTree
- R-GMA
- GRM / PROVE

Implementation:
- MDS: LDAP, Globus GRIS, GIIS
- FTree: OpenLDAP, caching
- R-GMA: Java, C++, MySQL, Tomcat
- GRM / PROVE: P-GRADE

WP3 main interfaces:
- WP1 Resource Broker (InfoIndex)
- WP2 RM optimizer
- All GRID services producing info (SE, CE, ...)
- WP7 network monitoring

[Figure: the DataGrid layered architecture diagram shown earlier (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services).]

Page 36: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 37

EDG middleware architecture: WP4: Fabric Management

WP4 is responsible for delivering a computing fabric comprising all the tools needed to manage a center that provides grid services on clusters of thousands of nodes. The computing fabric is called the Computing Element in EDG.

Components:

- User Job Control and Management (Grid and local jobs) on fabric batch and/or interactive CPU services
  - Gridification: Grid interface to fabric resources
  - Resource Management: manages the underlying batch services
- Automated System Administration for Computing Fabric Elements. These subsystems are reserved for system administrators and operators performing system maintenance.
  - Configuration Management
  - Installation Management
  - Fabric Monitoring

Page 37: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 38

WP4 Architecture logical overview

[Figure: the WP4 subsystems (Fabric Gridification, Resource Management, Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance) sit between the Grid User and Local User on one side and the farms on the other: Farm A (LSF), Farm B (PBS), and mass storage and disk pools. They connect to other WPs: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3) and Grid Data Storage (WP5).]

Page 38: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 39

WP4 Architecture logical overview: Fabric Gridification

- Interface between Grid-wide services and the local fabric.
- Provides local authentication, authorization and mapping of grid credentials.

Page 39: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 40

WP4 Architecture logical overview: Resource Management

- Provides transparent access (both job and admin) to different cluster batch systems.
- Enhanced capabilities (extended scheduling policies, advanced reservation, local accounting).

Page 40: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 41

WP4 Architecture logical overview: Installation & Node Mgmt

- Provides the tools to install and manage all software running on the fabric nodes.
- Agent to install, upgrade, remove and configure software packages on the nodes.
- Bootstrap services and software repositories.

Page 41: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 42

WP4 Architecture logical overview: Configuration Management

- Provides central storage and management of all fabric configuration information.
- Compiles HLD templates to LLD node profiles.
- Central DB and a set of protocols and APIs to store and retrieve information.
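
One way to picture "compiling" high-level templates into a per-node profile is to flatten a chain of templates, letting the more specific ones override the more general. The sketch below does only that. The template format, the `include` key, the setting names and the `compile_profile` function are all hypothetical; the slides do not describe the actual template language or profile format used by WP4.

```python
def compile_profile(node, templates):
    """Flatten a node's template chain into one low-level profile.

    Each template may name a parent under the "include" key; parent
    settings are applied first, so more specific templates override them.
    Assumes the include chain has no cycles.
    """
    chain = []
    name = node
    while name is not None:
        template = templates[name]
        chain.append(template)
        name = template.get("include")
    profile = {}
    for template in reversed(chain):  # most general template first
        for key, value in template.items():
            if key != "include":
                profile[key] = value
    return profile


templates = {
    "site-default": {"os": "linux", "batch": "pbs", "ntp": "ntp.example.org"},
    "worker-node": {"include": "site-default", "role": "worker"},
    "node042": {"include": "worker-node", "batch": "lsf"},
}
profile = compile_profile("node042", templates)
print(profile)
```

The node-level template only states what differs (`batch`), and the compiled profile carries every setting the node needs without further lookups.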

Page 42: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 43

WP4 Architecture logical overview: Monitoring & Fault Tolerance

- Provides the tools for gathering monitoring information on fabric nodes.
- A central measurement repository stores all monitoring information.
- Fault tolerance correlation engines detect failures and trigger recovery actions.
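
The "detect failures and trigger recovery actions" step can be sketched as a set of rules evaluated against stored measurements. The metric names, thresholds, action names and the `correlate` function below are hypothetical and much simpler than a real correlation engine, which would typically also look at history and at combinations of metrics.

```python
def correlate(measurements, rules):
    """Return (node, action) pairs for every rule whose condition holds."""
    triggered = []
    for node, metrics in sorted(measurements.items()):
        for condition, action in rules:
            if condition(metrics):
                triggered.append((node, action))
    return triggered


# Hypothetical snapshot from the measurement repository.
measurements = {
    "node001": {"load": 0.7, "disk_free_gb": 40.0, "daemon_alive": True},
    "node002": {"load": 0.2, "disk_free_gb": 0.5, "daemon_alive": True},
    "node003": {"load": 0.0, "disk_free_gb": 35.0, "daemon_alive": False},
}

# Hypothetical rules: (condition on a node's metrics, recovery action name).
rules = [
    (lambda m: m["disk_free_gb"] < 1.0, "clean_scratch"),
    (lambda m: not m["daemon_alive"], "restart_daemon"),
]

actions = correlate(measurements, rules)
print(actions)
```

Separating the rules from the engine means operators can add a new failure pattern without touching the code that walks the measurements.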

Page 43: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 44

User job management (Grid and local)

[Figure: the WP4 subsystems (Fabric Gridification, Resource Management, Monitoring) between the Grid User and Local User and the farms: Farm A (LSF), Farm B (PBS), and mass storage and disk pools. Other WPs shown: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3) and Grid Data Storage (WP5).]

Page 44: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 45

User job management (Grid and local)

Same figure, with the callout:

- Submit job

Page 45: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 46

User job management (Grid and local)

Same figure, with the callout:

- Publish resource and accounting information

Page 46: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 47

User job management (Grid and local)

Same figure, with the callout:

- Optimized selection of site

Page 47: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 48

User job management (Grid and local)

Same figure, with the callouts:

- Authorize
- Map grid credentials to local credentials

Page 48: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 49

User job management (Grid and local)

Same figure, with the callouts:

- Select an optimal batch queue and submit
- Return job status and output

Page 49: The EU DataGrid Architecture The European DataGrid Project Team  Peter.Kunszt@cern.ch.

The EDG Architecture Tutorial - n° 50

Automated management of large clusters

[Figure: the WP4 subsystems (Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance, Resource Management) above Farm A (LSF) and Farm B (PBS). The legend distinguishes Information flows from Invocation flows.]

The EDG Architecture Tutorial - n° 51

Automated management of large clusters

- Node malfunction detected

The EDG Architecture Tutorial - n° 52

Automated management of large clusters

- Remove node from queue
- Wait for running jobs(?)

The EDG Architecture Tutorial - n° 53

Automated management of large clusters

- Update configuration templates

The EDG Architecture Tutorial - n° 54

Automated management of large clusters

- Trigger repair

The EDG Architecture Tutorial - n° 55

Automated management of large clusters

- Repair (e.g. restart, reboot, reconfigure, …)

The EDG Architecture Tutorial - n° 56

Automated management of large clusters

- Node OK detected

The EDG Architecture Tutorial - n° 57

Automated management of large clusters

- Put node back in queue

The EDG Architecture Tutorial - n° 58

Automated management of large clusters

Automation
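The repair cycle shown step by step on the preceding slides (malfunction detected, node removed from queue, running jobs awaited, configuration templates updated, repair triggered and carried out, node OK detected, node put back in queue) can be sketched as a fixed sequence of steps. This is an illustrative sketch, not the WP4 implementation: the `Step` names, the dictionary used to represent a node, and `run_repair_cycle` are hypothetical.

```python
from enum import Enum, auto


class Step(Enum):
    """Steps of the repair cycle, in the order the slides present them."""
    MALFUNCTION_DETECTED = auto()
    REMOVED_FROM_QUEUE = auto()
    RUNNING_JOBS_AWAITED = auto()
    CONFIG_TEMPLATES_UPDATED = auto()
    REPAIR_TRIGGERED = auto()
    REPAIRED = auto()
    NODE_OK_DETECTED = auto()
    BACK_IN_QUEUE = auto()


def run_repair_cycle(node):
    """Walk one node through the cycle and return the step names in order."""
    history = []
    for step in Step:  # Enum members iterate in definition order
        if step is Step.REMOVED_FROM_QUEUE:
            node["in_queue"] = False   # stop scheduling new jobs on the node
        elif step is Step.REPAIRED:
            node["healthy"] = True     # e.g. restart, reboot, reconfigure
        elif step is Step.BACK_IN_QUEUE:
            if not node["healthy"]:
                raise RuntimeError("node not healthy, cannot re-queue")
            node["in_queue"] = True
        history.append(step.name)
    return history


if __name__ == "__main__":
    node = {"name": "node01", "in_queue": True, "healthy": False}
    for name in run_repair_cycle(node):
        print(name)
```

Keeping the node out of the queue until a health check passes is the point of the sequence: no new job lands on a node that is still being repaired.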

The EDG Architecture Tutorial - n° 59

LCFG (Local ConFiGuration system)

Widely used fabric tool that handles automated installation and configuration in a very diverse and evolving environment.

Mechanism:

- Abstract configuration parameters are stored in a central repository located on the LCFG server.
- Scripts on the host machine (LCFG client) read these configuration parameters and either generate traditional configuration files or directly manipulate various services.
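The mechanism can be sketched in a few lines: a central store of abstract parameters, and a client-side script that turns the parameters for one component into a traditional configuration file. This is not the LCFG API. The in-memory `REPOSITORY` dictionary stands in for the server, and the host name, parameter keys, and the `fetch_profile` and `render_config` helpers are hypothetical.

```python
# Stand-in for the central repository on the server:
# abstract parameters per host (all values are made up).
REPOSITORY = {
    "node01.example.org": {
        "ntp.server": "time.example.org",
        "batch.system": "pbs",
        "batch.queue": "short",
    },
}


def fetch_profile(host, repository=REPOSITORY):
    """Client side: read the abstract parameters for this host."""
    return dict(repository[host])


def render_config(profile, component):
    """Generate a key=value configuration file for one component."""
    lines = []
    for key in sorted(profile):
        prefix, _, option = key.partition(".")
        if prefix == component:
            lines.append(f"{option}={profile[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    profile = fetch_profile("node01.example.org")
    print(render_config(profile, "batch"), end="")
```

Because the host only consumes abstract parameters, the same repository entry could drive a script that rewrites a file or one that reconfigures a running service directly.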

The EDG Architecture Tutorial - n° 60

WP4: Fabric Management

Components:

- LCFG
- Fabric Monitoring
- PBS & LSF info providers
- Image installation
- Config. Cache Mgr

Implementation: LCFG: C++, XML, HTTP

WP4 main interfaces:

- WP1 Resource Broker (InfoIndex)
- WP2 Data management
- WP5 Storage Element
- WP3 Grid Info Services

[Diagram: DataGrid architecture layers: Local Application and Local Database, Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services]

The EDG Architecture Tutorial - n° 61

WP5: Mass Storage Management

WP5 delivers the Grid interface to storage.

Its service, the Storage Element (SE), interfaces to underlying Mass Storage Systems or simple storage services.

The EDG Architecture Tutorial - n° 62

The SE architecture

Clients (RB, JSS, RM, GDMP, Info Services (WP3), user applications running on CEs, CLIs)

Storage Element:

- Top layer: Interface1, Interface2, Interface3
- Core: Message Queue, Session Manager, System Log, House Keeping, MetaData
- Bottom layer: MSS Interface (to MSS1), MSS Interface (to MSS2)
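The layering can be read as an adapter arrangement: client-facing interfaces on top, a core that records and routes requests, and one adapter per mass storage system at the bottom. The sketch below illustrates that reading under stated assumptions. The class names (`MSSInterface`, `DiskPool`, `TapeStore`, `Core`, `FileInterface`) and their methods are hypothetical and do not reflect the actual WP5 code.

```python
from abc import ABC, abstractmethod


class MSSInterface(ABC):
    """Bottom layer: one adapter per mass storage system."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class DiskPool(MSSInterface):
    """Adapter for a simple disk-based store (hypothetical)."""

    def __init__(self, files):
        self._files = files

    def read(self, path):
        return self._files[path]


class TapeStore(MSSInterface):
    """Adapter for a tape-backed store that stages files before reading."""

    def __init__(self, tapes):
        self._tapes = tapes
        self.staged = set()

    def read(self, path):
        self.staged.add(path)  # pretend to stage the file from tape to disk
        return self._tapes[path]


class Core:
    """Core: logs each request and routes it to the right MSS adapter."""

    def __init__(self, backends):
        self._backends = backends
        self.system_log = []

    def handle(self, operation, mss, path):
        self.system_log.append((operation, mss, path))
        return self._backends[mss].read(path)


class FileInterface:
    """Top layer: what a client calls, independent of the storage system."""

    def __init__(self, core):
        self._core = core

    def get(self, mss, path):
        return self._core.handle("get", mss, path)


if __name__ == "__main__":
    core = Core({"MSS1": DiskPool({"/a": b"disk"}),
                 "MSS2": TapeStore({"/b": b"tape"})})
    print(FileInterface(core).get("MSS2", "/b"))
```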

The EDG Architecture Tutorial - n° 63

SE Interactions

[Diagram: Client, Replica Manager/Catalog, SE and Storage, with arrows numbered 1 to 6 for the steps below]

1. The client asks a catalog to provide the location of a file.
2. The catalog responds with the name of an SE.
3. The client asks the SE for the file.
4. The SE asks the storage system to provide the file.
5. The storage system sends the file to the client through the SE, or
6. directly.
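The numbered interaction can be traced in code. The sketch below follows steps 1 to 5 (lookup in the catalog, request to the SE, retrieval from storage, return through the SE); the direct transfer of step 6 is not modelled. All class names, the SE name, and the file name are hypothetical and chosen only for illustration.

```python
class Storage:
    """The storage system behind an SE."""

    def __init__(self, files):
        self._files = files

    def fetch(self, filename):
        return self._files[filename]


class StorageElement:
    def __init__(self, name, storage):
        self.name = name
        self._storage = storage

    def get(self, filename):
        # Step 4: the SE asks the storage system for the file.
        data = self._storage.fetch(filename)
        # Step 5: the file is returned to the client through the SE.
        return data


class ReplicaCatalog:
    def __init__(self, locations):
        self._locations = locations

    def locate(self, filename):
        # Steps 1 and 2: the catalog answers with the name of an SE.
        return self._locations[filename]


def client_get(filename, catalog, storage_elements):
    se_name = catalog.locate(filename)      # steps 1-2
    se = storage_elements[se_name]
    return se.get(filename)                 # steps 3-5


if __name__ == "__main__":
    se = StorageElement("se01", Storage({"run42.dat": b"event data"}))
    catalog = ReplicaCatalog({"run42.dat": "se01"})
    print(client_get("run42.dat", catalog, {"se01": se}))
```

The client never needs to know which storage system holds the file; it only needs the catalog and the SE named in the reply.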

The EDG Architecture Tutorial - n° 64

WP5: Mass Storage Management Achievements

- Definition of architecture and design for the DataGrid Storage Element
- Collaboration with Globus on GridFTP/RFIO
- Collaboration with PPDG on control API
- Staging from/to CASTOR at CERN successfully implemented and tested
- Successfully interfaced to GDMP

Supported storage systems:

- UNIX disk systems
- HPSS (High Performance Storage System)
- CASTOR (through RFIO)
- GridFTP servers
- DMF
- Enstore

WP5 (SE) main interfaces:

- WP1 Resource Broker & JSS
- WP2 RM, RC
- WP7 for GridFTP monitoring
- WP3 Grid Info Services

[Diagram: DataGrid architecture layers: Local Application and Local Database, Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services]

The EDG Architecture Tutorial - n° 65

WP6: TestBed Integration and demonstrators

WP6 goals: the EDG testbed

- Integration of EDG software releases (currently 1.2) and deployment across the EDG testbed: the integration team
- Working implementation of multiple VOs and basic security infrastructure
- Definition of acceptable usage contracts and creation of the Certification Authorities group
- Set-up of the Authorization Working Group to manage authorization policies on the testbed

Components:

- Support for test-VO, mkgridmap tools
- Globus packaging & EDG config
- Build tools, CVS central software repository
- End-user documents

[Diagram: DataGrid architecture layers: Local Application and Local Database, Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services]

The EDG Architecture Tutorial - n° 66

Further Information

DataGrid Dx.2 Deliverables: x=1..5

DataGrid D12.4 Deliverable