E. Ronchieri – n° 1 EDG release 2 Elisabetta Ronchieri INFN CNAF - DataGrid WP1 – Workload Management System elisabetta.ronchieri@cnaf.infn.it

Dec 19, 2015

E. Ronchieri – n° 1

EDG release 2

Elisabetta Ronchieri INFN CNAF - DataGrid WP1 – Workload Management System

elisabetta.ronchieri@cnaf.infn.it

E. Ronchieri – n° 2

Outline

What is Grid?

Grid Projects – Focus on the EU DataGrid Project

Selected Areas + Technologies
- Security
- Information and Monitoring Services
- Storage Management
- Data Management
- Workload Management

Installation

E. Ronchieri – n° 3

Grid Vision

Researchers, Grid middleware, scientific instruments and experiments, and resources are the main actors

- Researchers interact with colleagues, and share and access data
- Grid middleware provides part of the software infrastructure
- Experiments provide huge amounts of data

Grid is a special form of distributed computing

- Computing and storage resources are distributed over several sites
- Sites are typically connected via wide-area network links

It is best applied to applications that have the following features:

- Distributed user community
- Need for lots of computing power (Computational Grid)
- Need for lots of storage capacity (Data Grid)

Currently, it is applied mainly in computing sciences

E. Ronchieri – n° 4

Grid Today

Many steps remain (especially to make the Grid accessible to conventional users)

Considerable expertise is still required (especially to use Grid technology efficiently)

There is no single Grid (several projects, …)

Grids need to work together towards standardization: Global Grid Forum (GGF, http://www.ggf.org)

- Its mission is to promote and develop Grid technologies and applications
- There are many working groups in several different areas (Scheduling and Resource Management, Security, …)

E. Ronchieri – n° 6

Major US & European Grid Projects, many with strong HEP participation

[Figure: US projects | European projects]

Many national and regional Grid projects: GridPP (UK), INFN-Grid (I), NorduGrid, Dutch Grid, …

The Virtual Data Toolkit (VDT)

The DataGrid Toolkit

E. Ronchieri – n° 8

EDG Globus-based middleware architecture

EDG is built on emerging Grid technology

Start: Jan 1, 2001. End: Dec 31, 2003

Current EDG architectural functional blocks:

- Basic Services provided by Globus 2.2.x (such as authentication, authorization, info providers, replica catalog, secure file transfers) and Condor (such as job submission, effective job cancellation, event monitoring, support for monitoring)
- Higher-level EDG middleware developed within EDG
- Applications (such as HEP, BIO, and EO)

[Figure: layered architecture with the layers OS & Net services; Basic Services; High-level Grid middleware; VOs common application layer (LHC, other apps); Specific application layer (ALICE, ATLAS, CMS, LHCb, other apps). Side labels: GLOBUS 2.2.x and Condor; Grid middleware]

E. Ronchieri – n° 10

Selected Areas for Grid Technologies in EU DataGrid (and partly Globus)

Security
- All access to and interaction with Grid resources must be done in a secure way
- Major technologies: PKI (Public Key Infrastructure) and GSS

Information and Monitoring Services
- Before you start using the Grid, you need to know what resources are there and what you can use
- Major technologies: LDAP-based or Web Service approach

Data Management
- Main focus of a Data Grid
- Major technologies: LDAP-based or Web Service approach

Workload Management
- Submit your application to the Grid, where it is executed

E. Ronchieri – n° 12

Security in EDG

Why:
- User jobs might access several remote resources
- Users need to be authenticated (Who am I?) and authorized (What can I do?)

Mainly uses:
- The security infrastructure provided by Globus
- Based on PKI (Public Key Infrastructure) and GSS

E. Ronchieri – n° 13

Grid Security Requirements

User View

1) Easy to use
2) Single sign-on
3) Run applications

Resource Owner View

1) Specify local access control
2) Auditing, accounting, etc.
3) Integration with local systems: Kerberos, AFS, license manager

E. Ronchieri – n° 14

Grid Security Infrastructure (GSI)

Extensions to existing standard protocols & APIs
- Standards: SSL/TLS, X.509 & CA, GSS
- Extensions for single sign-on and delegation

Globus Toolkit reference implementation of GSI
- SSLeay/OpenSSL + GSS-API + delegation + single sign-on

E. Ronchieri – n° 15

Example of GSI usage

[Figure labels: Site A (Unix), Site B, Site N (Unix); User; Proxy Credential; Computer; Grid Service; Restricted Proxy; Remote file access request; GridFTP Server; Storage system]

E. Ronchieri – n° 16

VO-LDAP Architecture

[Figure: a VO Directory rooted at o=xyz,dc=eu-datagrid,dc=org with branches ou=People, ou=Testbed1 and ou=???; user entries CN=Mario Rossi, CN=Franz Elmer and CN=John Smith, each with an Authentication Certificate; mkgridmap, together with the local users and the ban list, yields the grid-mapfile]

Adopted by:
- DataGrid Testbed0 (2001/02)
- DataGrid Testbed1 (2003)
- DataTAG Testbed (2003)
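
The path from the VO directory to a grid-mapfile can be sketched in a few lines of Python. This is only an illustration, not the mkgridmap implementation: the function name `build_gridmap`, the subject strings and the local account `xyz001` are hypothetical. The one thing taken from Globus is the layout of a grid-mapfile line, a quoted certificate subject followed by a local account name.

```python
def build_gridmap(vo_members, banned, local_account):
    """Return grid-mapfile lines for every VO member not on the ban list.

    vo_members    -- certificate subjects read from the VO directory
    banned        -- set of subjects the site refuses to map
    local_account -- local Unix account the accepted members are mapped to
    """
    lines = []
    for subject in sorted(set(vo_members)):   # de-duplicate, stable order
        if subject in banned:
            continue                          # the site ban list wins
        lines.append(f'"{subject}" {local_account}')
    return lines


# Hypothetical subjects; real ones come from the users' certificates.
members = [
    "/O=xyz/CN=Mario Rossi",
    "/O=xyz/CN=Franz Elmer",
    "/O=xyz/CN=John Smith",
]
gridmap = build_gridmap(
    members,
    banned={"/O=xyz/CN=John Smith"},
    local_account="xyz001",
)
print("\n".join(gridmap))
```

Keeping the ban list as a filter applied at generation time means a site can refuse a user without touching the VO directory itself.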

E. Ronchieri – n° 18

Grid Information and Monitoring Services

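Comparison of MDS 2.x and R-GMA:

- Data model: MDS 2.x is LDAP (hierarchical); R-GMA is relational
- Communication: MDS 2.x uses LDAP; R-GMA uses HTTP
- Information storage: MDS 2.x uses LDAP-based backends re-written by Globus; R-GMA uses a relational database
- Queries: MDS 2.x takes LDAP queries; R-GMA takes SQL queries
- Components: MDS 2.x has a GRIS on the SE, a GRIS on the CE (with its worker nodes, WN) and a GIIS; R-GMA has Producer, Consumer and Registry

Example LDAP query:

    ldapsearch -x -H ldap://lxshare0225.cern.ch:2135 \
        -b 'Mds-Vo-name=datagrid,o=grid' \
        'objectclass=StorageElement' \
        seId SEsize

Example SQL query:

    SELECT * FROM StorageElement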
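
To make the relational side of the comparison concrete, the sketch below runs the same kind of query against an in-memory SQLite table. SQLite only stands in for a relational view here and says nothing about how R-GMA is implemented. The table name `StorageElement` and the attributes `seId` and `SEsize` come from the queries on the slide; the two rows and the helper `storage_elements` are invented for illustration.

```python
import sqlite3

# In-memory database standing in for a relational information service.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StorageElement (seId TEXT PRIMARY KEY, SEsize INTEGER)")
conn.executemany(
    "INSERT INTO StorageElement VALUES (?, ?)",
    [("se1.example.org", 500), ("se2.example.org", 1200)],  # made-up rows
)


def storage_elements(min_size=0):
    """Return (seId, SEsize) rows with at least min_size, ordered by seId."""
    cur = conn.execute(
        "SELECT seId, SEsize FROM StorageElement WHERE SEsize >= ? ORDER BY seId",
        (min_size,),
    )
    return cur.fetchall()


print(storage_elements())               # every storage element
print(storage_elements(min_size=1000))  # only the larger one
```

The filter that an LDAP client would express as a search filter becomes an ordinary `WHERE` clause in the relational model.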

E. Ronchieri – n° 19

Grid Information and Monitoring Services in EDG

EDG release 1.x is entirely based on MDS 2.x
- Due to stability problems with this component, we have recently been deploying a plain LDAP server in front of a top-level GIIS

EDG release 2.x is based on both MDS 2.x and R-GMA
- Since the GIS is a vital service for the WM, the Broker will rely on MDS 2.x until R-GMA proves to be reliable

E. Ronchieri – n° 21

Interfaces to SE

First release of the SE control system

The three interfaces to the outside world are:
- Data transfer: GridFTP will be used to transfer files over the WAN, and the files will be available to local nodes via NFS
- Information: existing MDS information providers will be extended to provide the extra information in the GLUE storage schema
- Control: functions such as reservation for reading and writing, metadata modification, and access via GridFTP

It is an implementation of the Storage Resource Management (SRM) specification (http://sdm.lbl.gov/srm-wg)

The SE control interface to a generic MSS has already been tailored for CERN and RAL

Work is under way with IN2P3, WP10 and WP9 to adapt it to their MSS

E. Ronchieri – n° 23

Naming Schemes

GUID – Global Unique Identifier
    guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

LFN – Logical File Name
    lfn://event20030612

SFN – Storage File Name (host + path + filename)
    sfn://ibm139.cnaf.infn.it/edg/storageelement/dev/wpsix/pippo

[Figure: one GUID linked to LFN1, LFN2, LFN3 on one side and to SFN1, SFN2, SFN3 on the other]
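
As a small illustration of the three name forms, the sketch below splits an SFN into host, path and filename, and checks whether a string has the `guid:` form. It uses only the Python standard library; the helper names `split_sfn` and `is_guid` are hypothetical and are not part of any EDG tool.

```python
import posixpath
import uuid
from urllib.parse import urlparse


def split_sfn(sfn):
    """Split an SFN into (host, directory path, file name)."""
    parsed = urlparse(sfn)
    if parsed.scheme != "sfn" or not parsed.netloc:
        raise ValueError(f"not an SFN: {sfn!r}")
    directory, filename = posixpath.split(parsed.path)
    return parsed.netloc, directory, filename


def is_guid(name):
    """Return True if name has the form 'guid:<UUID>'."""
    prefix, _, value = name.partition(":")
    if prefix != "guid":
        return False
    try:
        uuid.UUID(value)      # raises ValueError for a malformed UUID
    except ValueError:
        return False
    return True


print(split_sfn("sfn://ibm139.cnaf.infn.it/edg/storageelement/dev/wpsix/pippo"))
print(is_guid("guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
print(is_guid("lfn://event20030612"))
```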

E. Ronchieri – n° 24

Replication Services: EDG Replica Manager

Components of the Replica Manager:
- Replica Metadata Catalog (RMC): used for querying and assigning LFNs
- Replica Location Service (RLS): used for locating replicas and assigning SFNs
- File Transfer (GridFTP): used for transferring files
- Optimization

Client: edg-replica-manager

E. Ronchieri – n° 25

Replication Services Architecture

[Figure labels: VO; Replica Metadata Catalog (LFNs -> GUID); Replica Location Service (GUID -> SFNs); Local Replica Catalog; User Interface; Resource Broker; two Sites, each with a Replica Manager, Optimiser, Storage Element and Computing Element]
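
The two mappings in the figure (LFNs -> GUID and GUID -> SFNs) can be pictured as two dictionaries consulted one after the other. The sketch below is a toy model under that assumption: the class `ReplicaCatalogs` and its methods are hypothetical and do not reproduce the edg-replica-manager interface, and the second SFN (on `se.example.org`) is invented.

```python
class ReplicaCatalogs:
    """Toy model of the two catalog roles shown in the figure."""

    def __init__(self):
        self.lfn_to_guid = {}    # role of the Replica Metadata Catalog
        self.guid_to_sfns = {}   # role of the Replica Location Service

    def register(self, guid, sfn, lfn=None):
        """Record a physical replica and, optionally, a logical alias."""
        self.guid_to_sfns.setdefault(guid, set()).add(sfn)
        if lfn is not None:
            self.lfn_to_guid[lfn] = guid

    def list_replicas(self, lfn):
        """Resolve LFN -> GUID, then GUID -> all known SFNs."""
        guid = self.lfn_to_guid[lfn]
        return sorted(self.guid_to_sfns[guid])


catalogs = ReplicaCatalogs()
guid = "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
catalogs.register(
    guid,
    "sfn://ibm139.cnaf.infn.it/edg/storageelement/dev/wpsix/pippo",
    lfn="lfn://event20030612",
)
# A second, made-up replica of the same file on another storage element.
catalogs.register(guid, "sfn://se.example.org/data/pippo")

replicas = catalogs.list_replicas("lfn://event20030612")
print(replicas)
```

Because the GUID sits between the two maps, a file can gain new logical names or new physical copies without the other side of the mapping changing.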

E. Ronchieri – n° 27

Review of WMS architecture

The WMS architecture was reviewed:
- To apply the lessons learned and address the shortcomings that emerged with the first release of the software
- To address the scalability problems
- To increase the reliability of the system
- To favor interoperability with other Grid frameworks, by allowing WP1 modules (e.g. the RB) to be used also outside the EDG WMS

E. Ronchieri – n° 28

WMS Revised Architecture

[Figure labels: UI; Replica Manager; Information Service; RB node; Network Server; Workload Manager; Match-Maker/Broker; Job Adapter; Job Controller (CondorG); Log Monitor; RB storage; CE characteristics & status; SE characteristics & status; Logging & Bookkeeping]

E. Ronchieri – n° 29

Improvements

Duplication of persistent job-related information is avoided
- LB is the only repository of job information
- Possible to have multiple LB servers per RB (to avoid bottlenecks)

Techniques to recover quickly from failures
- E.g. communication among WMS components is much more reliable (done via persistent queues in the file system)
- Also less exposed to memory leaks (coming not only from EDG software)

Flexibility and interoperability increased
- E.g. RB-Matchmaker as a pluggable module
- GLUE schema compliance

Other enhancements in design and implementation
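
The idea of a persistent queue in the file system can be sketched as a directory holding one file per message, so that pending requests survive a restart of either side. The code below is a minimal illustration assuming a single producer and a single consumer; the class `FileQueue`, the `.msg` naming and the JSON message layout are all hypothetical and are not taken from the EDG sources.

```python
import json
import os
import tempfile


class FileQueue:
    """Directory-backed FIFO: one file per message, so work survives a crash."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _pending(self):
        # Zero-padded names sort in arrival order.
        return sorted(n for n in os.listdir(self.directory) if n.endswith(".msg"))

    def put(self, message):
        seq = 1 + max((int(n[:-4]) for n in self._pending()), default=0)
        final = os.path.join(self.directory, f"{seq:010d}.msg")
        tmp = final + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(message, fh)
            fh.flush()
            os.fsync(fh.fileno())   # make sure the bytes reach the disk
        os.replace(tmp, final)      # atomic rename: no partially written message

    def get(self):
        names = self._pending()
        if not names:
            return None
        path = os.path.join(self.directory, names[0])
        with open(path) as fh:
            message = json.load(fh)
        os.remove(path)             # a real consumer would remove only after handling
        return message


queue_dir = tempfile.mkdtemp()
FileQueue(queue_dir).put({"job": "job-1", "action": "submit"})
FileQueue(queue_dir).put({"job": "job-2", "action": "cancel"})

# A fresh object on the same directory plays the role of a restarted component.
restarted = FileQueue(queue_dir)
first = restarted.get()
second = restarted.get()
print(first, second)
```

Writing to a temporary name and renaming is what keeps a reader from ever seeing a half-written message, which is the property that makes this kind of queue useful for failure recovery.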

E. Ronchieri – n° 30

New functionalities

User APIs
- Including a Java GUI

Trivial job checkpointing service
- The user can save the state of the job (defined by the application) from time to time
- A job can be restarted from an intermediate (i.e. previously saved) job state

Gang-matching
- Allows both CE and SE information to be taken into account in the matchmaking
- For example, to require a job to run on a CE close to an SE with enough space

Support for parallel MPI jobs

Support for interactive jobs
- Jobs running on some CE worker node where a channel to the submitting (UI) node is available for the standard streams (by integrating the Condor Bypass software)
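
The gang-matching example above (a CE close to an SE with enough space) can be illustrated with a plain Python filter over two resource lists. This is a sketch of the idea only, not the matchmaking code used by the Resource Broker: the dataclasses, the `close_ses` field, the function `gang_match` and all resource names and numbers are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StorageElement:
    name: str
    free_gb: int


@dataclass
class ComputingElement:
    name: str
    free_cpus: int
    close_ses: list = field(default_factory=list)  # names of SEs "close" to this CE


def gang_match(ces, ses, min_cpus, min_free_gb):
    """Return (CE name, SE name) pairs that satisfy both requirements at once."""
    se_by_name = {se.name: se for se in ses}
    matches = []
    for ce in ces:
        if ce.free_cpus < min_cpus:
            continue                      # CE requirement fails
        for se_name in ce.close_ses:
            se = se_by_name.get(se_name)
            if se is not None and se.free_gb >= min_free_gb:
                matches.append((ce.name, se.name))
    return matches


# Made-up resources for illustration.
ses = [StorageElement("se-a", 50), StorageElement("se-b", 800)]
ces = [
    ComputingElement("ce-1", free_cpus=16, close_ses=["se-a"]),
    ComputingElement("ce-2", free_cpus=4, close_ses=["se-b"]),
    ComputingElement("ce-3", free_cpus=0, close_ses=["se-b"]),
]
matches = gang_match(ces, ses, min_cpus=2, min_free_gb=100)
print(matches)
```

Here `ce-1` has the most free CPUs but is rejected because its close SE is too small, which is exactly the case that matching on CE information alone would miss.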

E. Ronchieri – n° 32

Installation

EDG software:
- Is delivered via RPMs
- Is maintained in a CVS repository

Globus + Condor software:
- Provided via VDT (delivered as RPMs)
- Upgraded to Globus 2.2.4 and Condor 6.5.1

LCFGng:
- Is an automatic installation tool based on RPMs
- Is also used for the configuration of the middleware components
- Works for RH 6.2 and RH 7.3

Sites:
- Development testbed

E. Ronchieri – n° 33

EDG Deployment

R-GMA, RM, RLS, ROS, RMC, and WMS + GLUE schema

EDG release 2.0
- A temporary tag contains the functionalities for EDG 2.0 (deployed at CERN, NIKHEF, CNAF, and RAL)
- It will not be officially tagged as EDG 2.0 until the basic functionalities work (e.g. job submission, data transfers, etc.)
- The first EDG 2.0 tag is hoped for at the end of this week

The move to gcc 3.2.2 for all software is planned for this September

The integration of more functionalities is entirely at the mercy of LCG

E. Ronchieri – n° 34

Conclusion

Many improvements and many new functionalities

Preliminary results are encouraging

A more comprehensive evaluation is still needed, with real tests performed by real users on the large-scale testbed