Storage Management in LHC Computing Grid
Università degli Studi di Pisa — EGEE is a project funded by the European Union under contract IST-2003-508833
Flavia Donno, PhD candidate in Computer Engineering, Universität Wien and University of Pisa
Forschungsprivatissimum # 415040, 27 June 2005

Transcript
Page 1: Title

Università degli Studi di Pisa
EGEE is a project funded by the European Union under contract IST-2003-508833

Storage Management in LHC Computing Grid

Flavia Donno, PhD candidate in Computer Engineering, Universität Wien and University of Pisa

Forschungsprivatissimum # 415040, 27 June 2005

Page 2: Outline

The Grid: what is it? Why the Grid at CERN?
The LHC Computing Grid
The LCG Architecture
The Storage Element
Hardware/software solutions on LAN
Parallel filesystems
The SRM protocol
StoRM: a Storage Resource Manager for filesystems
The StoRM architecture
StoRM as Policy Enforcement Point (PEP) for storage
Status of the StoRM project
Conclusions

Page 3: The Grid: what is it?

• Many definitions:
  • It is an aggregation of geographically dispersed computing, storage, and network resources, coordinated to deliver improved performance, higher quality of service, better utilization, and easier access to data.
  • It enables virtual, collaborative organizations to share applications and data in an open, heterogeneous environment.

• Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data.
• Scientific instruments and experiments produce huge amounts of data.
• The Grid: networked data processing centres, with "middleware" software as the "glue" binding the resources together.

Page 4: Compute and Data Grids

• A compute grid is essentially a collection of distributed computing resources, within or across locations, which are aggregated to act as a unified processing resource or virtual supercomputer. Collecting these resources into a unified pool involves coordinated usage policies, job scheduling and queuing characteristics, grid-wide security, and user authentication.

• A data grid provides wide area, secure access to current data. Data grids enable users and applications to manage and efficiently use database information from distributed locations. Much like compute grids, data grids also rely on software for secure access and usage policies. Data grids can be deployed within one administrative domain or across multiple domains.

[Diagram: a Compute Grid and a Data Grid, each aggregating distributed Computing Elements]

Page 5: The Grid: clusters, intra-grids, extra-grids

[Diagram only]

Page 6: Why the Grid?

• Scale of the problems: frontier research in many different fields today requires world-wide collaborations (i.e. multi-domain access to distributed resources).
• Grids provide access to large data-processing power and huge data-storage capacity.
• As the grid grows, its usefulness increases (more resources become available).
• Large communities of potential Grid users: High Energy Physics; environmental studies (earthquake forecasting, geologic and climate changes, ozone monitoring); biology, genetics, Earth observation; astrophysics; research on new composite materials; astronautics; etc.

Page 7: Why the Grid @ CERN?

The LHC experiments (CMS, ATLAS, LHCb):
• ~10 petabytes of data per year (~10^8 events/year)
• ~10^3 batch and interactive users

Page 8: Why the Grid @ CERN?

• High-throughput computing (based on reliable "commodity" technology)
• More than 3,000 (dual-processor) PCs running Linux
• More than 3 petabytes of data (on disk and tape)

Nowhere near enough!

Page 9: Why the Grid @ CERN?

• Problem: CERN alone can provide only a fraction of the necessary resources.
• Solution: computing centres, which were isolated in the past, should now be connected, uniting the computing resources of particle physicists world-wide!

Europe: 267 institutes, 4,603 users
Elsewhere: 208 institutes, 1,632 users

Page 10: The Grid Projects at CERN: LCG

The LCG (LHC Computing Grid) project started in 2002. Its goal is to build a world-wide computing infrastructure based on Grid middleware, offering a computing platform for the LHC experiments.

http://www.cern.ch/lcg

More than 23,000 HEP jobs running concurrently in a day.

Page 11: The LCG Architecture

• It is currently based on the Globus Toolkit version 3 (not Web Service Resource Framework (WSRF) based).
• Features: single sign-on (Globus Security Infrastructure), delegation, remote submission (Globus Resource Allocation Manager), GridFTP, Monitoring and Discovery Service (MDS).
• Other projects contributing to the LCG middleware: European DataGrid, DataTAG, PPDG, GriPhyN, OSG, Condor.
• Services: Resource Broker, Virtual Organization Management Service, Data Management Service, Data Catalogues, Fabric Management and Configuration, Monitoring and Control, Storage Management Solutions.

[Photo: Ian Foster, Carl Kesselman, Steve Tuecke]

Page 12: The Middleware components

[Diagram: the job flow through the middleware. A job submitted from the UI reaches the Network Server on the Resource Broker (RB) node; the Workload Manager, with the Match-Maker/Broker, matches it against CE and SE characteristics and status from the Information Service, the Replica Location Service, and VOMS; the Job Controller/CondorG submits it to a Computing Element; the running job performs "Grid-enabled" data transfers/accesses against Storage Elements, and the result is returned to the user.]

Page 13: The Storage Element

[Diagram: Storage Elements at Site A and Site B handle data storage and access; data movement/replication between the sites uses GridFTP; replicas are registered in the Globus Replica Catalog (RLS = Replica Location Service).]
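To make the replication flow concrete, here is a minimal Python sketch of the two steps in the diagram: a third-party GridFTP copy between sites, followed by registration of the new replica in the catalog. All class, function, and file names below are hypothetical illustrations; the real Globus RLS and GridFTP client APIs differ.

```python
# Minimal sketch of the Site A -> Site B replication flow shown above.
# All names here are hypothetical illustrations, not the real Globus APIs.

from dataclasses import dataclass, field

@dataclass
class ReplicaCatalog:
    """Stand-in for the Globus Replica Location Service (RLS)."""
    replicas: dict[str, list[str]] = field(default_factory=dict)

    def register(self, lfn: str, surl: str) -> None:
        # Map a logical file name (LFN) to one more physical replica (SURL).
        self.replicas.setdefault(lfn, []).append(surl)

def gridftp_copy(src_surl: str, dst_surl: str) -> None:
    # Placeholder for a third-party GridFTP transfer between two SEs.
    print(f"GridFTP: {src_surl} -> {dst_surl}")

def replicate(catalog: ReplicaCatalog, lfn: str, src: str, dst: str) -> None:
    gridftp_copy(src, dst)          # 1. move the data between sites
    catalog.register(lfn, dst)      # 2. register the new replica

catalog = ReplicaCatalog()
catalog.register("lfn:/lhc/run1/events.root", "srm://siteA/events.root")
replicate(catalog, "lfn:/lhc/run1/events.root",
          "srm://siteA/events.root", "srm://siteB/events.root")
print(catalog.replicas)
```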

Page 14: Requirements for Storage

• Users on the Grid share resources and access them concurrently; this imposes the requirements below (an interface sketch follows the list):
  • Transparent access to files (migration to/from disk pools, other site storage, Mass Storage Systems)
  • File pinning
  • File locking
  • Space reservation and management
  • File status notification
  • Lifetime management
  • Security and privacy
  • Local policy enforcement
  • High I/O performance
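As an illustration of how these requirements might surface in an API, here is a minimal sketch of a storage-element interface; every method name and signature below is hypothetical, not taken from any real SE implementation.

```python
from abc import ABC, abstractmethod

class StorageElement(ABC):
    """Hypothetical interface; methods mirror the requirements listed above."""

    @abstractmethod
    def pin(self, path: str, lifetime_s: int) -> str:
        """Keep a file on disk (no migration back to tape) for lifetime_s
        seconds; returns a pin identifier."""

    @abstractmethod
    def lock(self, path: str, exclusive: bool) -> str:
        """Shared or exclusive lock, so concurrent Grid users do not conflict."""

    @abstractmethod
    def reserve_space(self, size_bytes: int, lifetime_s: int) -> str:
        """Reserve space before writing; returns a space token."""

    @abstractmethod
    def status(self, path: str) -> str:
        """File status notification, e.g. 'online', 'nearline', 'migrating'."""
```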

Page 15: HW/SW solutions on LAN

LCG has a hierarchical structure: CERN is the Tier-0 centre, where data are collected; Tier-1 centres need to be able to serve ~petabytes of data; Tier-2 centres are smaller centres that give users access to ~100 terabytes of data; Tier-3 centres are small university sites.

Mass Storage Systems (MSS) are normally hosted at Tier-0 and Tier-1 centres. Through robotic tape systems and home-developed solutions, data are transparently spooled from tape to disk servers and made available to users (CASTOR, ENSTORE, HPSS, JasMINE, HSM UniTree, …). Protocols for file access are normally "proprietary": rfio, dcap, ftp, …

Disk pool servers are based on low-cost parallel or serial ATA disks; they can operate at the block or file level and aggregate RAID (Redundant Array of Independent Disks) controllers and capacity. The arrays load-balance across self-contained storage modules, allowing performance to grow linearly (CASTOR DiskPool, dCache, LCG DPM, SRB, SAM, …). Access to files is guaranteed via POSIX-like calls. Management is quite hard.

Page 16: HW/SW solutions on LAN

A Storage Area Network (SAN) is a high-speed special-purpose network (or sub-network) that interconnects different kinds of data storage devices with associated data servers. SANs use Fibre Channel over high-speed fibre-optic or copper cabling and can reach data transfer rates of up to 200 MB/s. SANs support disk mirroring; backup and restore; archival and retrieval of archived data; data migration from one storage device to another; and the sharing of data among different servers in a network. SAN solutions operate at the block level.

Network Attached Storage (NAS) is a product concept that packages file system hardware and software with a complete storage I/O subsystem as an integrated file-server solution. NAS servers are normally specialized servers that can handle a number of network protocols, including Microsoft's NetBEUI and CIFS, Novell's NetWare Internetwork Packet Exchange, and Sun Microsystems' NFS. NAS systems provide dynamic load balancing, dynamic volume and file system expansion, and a single, global namespace. NAS systems can deliver performance of tens of gigabytes/sec in a standard sequential read/write test.

Page 17: HW/SW solutions on LAN

Grid Storage refers to a topology for scaling the capacity of NAS in response to application requirements, and a technology for enabling and managing a single file system so that it can span an increasing volume of storage. NAS heads are the components containing a thin operating system optimized for NFS (or proprietary) protocol support and storage-device attachment. NAS heads are joined together using clustering technology to create one virtual head.

The Distributed Storage Tank (DST) project by IBM aims, within the Global Grid Forum, to produce a standards-based Lightweight Directory Access Protocol (LDAP) server to act as the master namespace server.

Page 18: Distributed and Parallel File Systems

• Cluster and distributed file systems are an alternative form of shared file system technology.
• They do not use a separate metadata server, are designed to work only in homogeneous server environments, and do not aim at improving storage manageability.
• Using very high-speed interconnects (switched Gigabit Ethernet, InfiniBand, etc.), such solutions provide POSIX I/O, centralized management, load balancing, monitoring, and fail-over capabilities.

Page 19: Distributed and Parallel File Systems

• IBM GPFS, Lustre, and PVFS-2:
  • Capacity: large files (10-50 GB), 100 TB file systems
  • High throughput: wide striping, large blocks, throughput of many GB/s (a back-of-the-envelope sketch follows this list)
  • Reliability and fault tolerance: survive node and disk failures
  • Online centralized system management: dynamic configuration and monitoring
  • Parallel data and metadata access: shared disks and distributed locking
  • Space allocation at the file level
  • Quota, metadata, and file lifetime management
  • Access Control Lists (ACLs)
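As a rough back-of-the-envelope for why wide striping yields many GB/s: a file striped over many disks can be read in parallel, so aggregate bandwidth scales with the stripe width until the interconnect saturates. Every figure below is an invented assumption for illustration, not a measurement of any of these filesystems.

```python
# Back-of-the-envelope: aggregate read bandwidth of a striped file.
# All figures are illustrative assumptions, not GPFS/Lustre/PVFS-2 numbers.

disk_bw_mb_s = 60          # one 2005-era ATA disk, sequential read
stripe_width = 64          # file striped across 64 disks/servers
network_cap_mb_s = 10_000  # assumed aggregate fabric capacity

aggregate = min(disk_bw_mb_s * stripe_width, network_cap_mb_s)
print(f"~{aggregate / 1000:.1f} GB/s aggregate read bandwidth")  # ~3.8 GB/s
```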

Page 20: The SRM Protocol

Storage Resource Manager (SRM)
• Storage resource managers are middleware components that manage shared storage resources on the Grid and provide management functionality such as:
  • Uniform access to heterogeneous types of storage
  • File pinning
  • Disk space allocation and advance disk space reservation
  • Protocol negotiation (a sketch follows this list)
  • Lifetime management of files
  • Management of security
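Protocol negotiation, for instance, boils down to matching the client's preference-ordered protocol list against what the storage manager supports. A minimal sketch, with hypothetical function and protocol lists:

```python
def negotiate_protocol(client_prefs: list[str], server_supported: set[str]) -> str:
    """Return the client's most-preferred protocol that the server can serve."""
    for proto in client_prefs:
        if proto in server_supported:
            return proto
    raise RuntimeError("no transfer protocol in common")

# A CASTOR-backed SRM might offer rfio and gsiftp; the client prefers local access:
print(negotiate_protocol(["file", "dcap", "rfio", "gsiftp"], {"rfio", "gsiftp"}))  # rfio
```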

Page 21: The SRM Protocol

SRM functionality. The SRM interface specification describes:
• Space management functions: space reservation, dynamic space management
• Permission functions: permission setting over storage resources
• Data transfer functions: protocol negotiation, pinning of files, file lifetime management
• Status functions: status of asynchronous requests

SRM missing functionality:
• File locking
• Quota management
• Local policy enforcement
• Security/privacy (not fully defined)

[Diagram: the uniform SRM interface fronts several implementations (StoRM, SRM-dCache, SRM-Castor); each manages its own storage system (GPFS, dCache, CASTOR) and exposes its own access protocol (posix, dcap, rfio); clients use SRM for management and the negotiated protocol for data access.]

Page 22: SRM interface

Methods definition

Space Management Functions: srmReserveSpace, srmReleaseSpace, srmUpdateSpace, srmCompactSpace, srmGetSpaceMetaData, srmChangeFileStorageType, srmGetSpaceToken

Permission Functions: srmSetPermission, srmReassignToUser, srmCheckPermission

Directory Functions: srmMkdir, srmRmdir, srmRm, srmLs

Data Transfer Functions: srmPrepareToGet, srmPrepareToPut, srmCopy, srmRemoveFiles, srmReleaseFiles, srmPutDone, srmAbortRequest, srmAbortFiles, srmSuspendRequest, srmResumeRequest

Status Functions: srmStatusOfGetRequest, srmStatusOfPutRequest, srmStatusOfCopyRequest, srmGetRequestSummary, srmGetRequestID
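The data transfer and status functions are asynchronous: a prepare call returns a request token, and the client polls a status function until the file is staged. The sketch below uses SRM method names from the list above, but the `SrmClient` class itself is a hypothetical stand-in for a stub generated from the SRM WSDL, and the hosts and return values are invented.

```python
import time

class SrmClient:
    """Hypothetical stand-in for a client stub generated from the SRM WSDL."""

    def srmPrepareToGet(self, surls: list[str], protocols: list[str]) -> str:
        # Returns immediately with a request token; staging happens asynchronously.
        return "req-42"

    def srmStatusOfGetRequest(self, request_token: str) -> dict:
        # A real server reports queued/in-progress states before success.
        return {"state": "SRM_SUCCESS", "turl": "rfio://pool3.example.org/data/events.root"}

    def srmReleaseFiles(self, request_token: str) -> None:
        # Tells the SE that the pinned replica may be reclaimed.
        pass

srm = SrmClient()
token = srm.srmPrepareToGet(["srm://se.example.org/data/events.root"], ["rfio", "gsiftp"])
while (status := srm.srmStatusOfGetRequest(token))["state"] not in ("SRM_SUCCESS", "SRM_FAILURE"):
    time.sleep(5)  # poll until the file is staged and pinned on disk
print("access the data via", status["turl"])  # then read via the negotiated protocol
srm.srmReleaseFiles(token)  # release the pin when finished
```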

Page 23: StoRM for performing filesystems

• StoRM is a Storage Resource Manager.
• It exposes a web service interface; the StoRM web service description (WSDL) is compliant with SRM specification version 2.1.1.
• It is built on top of GPFS (which provides POSIX I/O).
• It guarantees coherent access to storage for both Grid and local applications, using VOMS certificates.
• It extends the SRM interface with quota management, locking, ACLs, and policy enforcement.
• It is integrated with:
  • the Replica Consistency service
  • the Workload Management Service (WMS)
  • an Agreement Provider for advance reservation of storage resources
  • third-party SRM service implementations (SRM compliant)

Page 24: StoRM

[Diagram: StoRM-based Storage Elements expose the SRM interface to the Workload Management Service, the Replica Management Service, the Storage Agreement Provider, and the Replica Consistency Service.]

Page 25: StoRM: WMS with reserveSpace

[Diagram: the SRM associates a space token with a reserved space area in any SE; the space token is passed as a job parameter (PUSH or PULL scheduling), and the job writes into the reserved area at the end of its execution.]
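A minimal sketch of this flow, assuming a hypothetical SOAP stub: reserve space, pass the returned token to the job as a parameter, and have the job write its output into the reservation at the end. Only the srm* method names come from the SRM spec; the hosts, sizes, and stub are invented for illustration.

```python
class SrmSpaceClient:
    """Hypothetical stub; only the srm* method names come from the SRM spec."""

    def srmReserveSpace(self, size_bytes: int, lifetime_s: int) -> str:
        return "SPACE-TOKEN-7f3a"  # identifies the reserved space area on the SE

    def srmPrepareToPut(self, surl: str, space_token: str) -> str:
        return "gsiftp://se.example.org/reserved/out.root"  # transfer URL to write to

    def srmPutDone(self, surl: str) -> None:
        pass  # finalizes the file inside the reserved space

srm = SrmSpaceClient()
token = srm.srmReserveSpace(50 * 10**9, 24 * 3600)  # e.g. 50 GB for one day
job_params = {"OutputSE": "se.example.org", "SpaceToken": token}  # token travels with the job
# ... the job runs (PUSH or PULL); at the end it writes into the reservation:
turl = srm.srmPrepareToPut("srm://se.example.org/reserved/out.root", job_params["SpaceToken"])
srm.srmPutDone("srm://se.example.org/reserved/out.root")
print("output written via", turl)
```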

Page 26: StoRM: The Server Architecture

[Architecture diagram]

Page 27: StoRM: The Server Architecture

[Architecture diagram]

Page 28: StoRM: The Server Architecture

[Architecture diagram]

Page 29: StoRM as Policy Enforcement Point

[Diagram: the StoRM server's Permission Functions use a Permission Component and Permission Catalog, with a PBox (SE instance) as the policy source. On the Computing Element side, the Gatekeeper (with LCAS/LCMAPS and a PBox CE instance) and the Job Manager hand the user job to worker node WN#k; a Privilege Enforcer sets the ACL on the PFN, and the GPFS file system acts as the ACL enforcement mechanism for the job's file accesses.]
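The enforcement step can be pictured as just-in-time manipulation of filesystem ACLs for the local account that LCMAPS mapped the Grid identity to. The sketch below shells out to the standard Linux setfacl tool on a GPFS path; it illustrates the mechanism only and is not actual StoRM code, and the user name and path are made up.

```python
import subprocess

def grant_access(pfn: str, local_user: str, perms: str = "rw") -> None:
    # e.g. setfacl -m u:atlas001:rw /gpfs/storm/data/file.root
    subprocess.run(["setfacl", "-m", f"u:{local_user}:{perms}", pfn], check=True)

def revoke_access(pfn: str, local_user: str) -> None:
    subprocess.run(["setfacl", "-x", f"u:{local_user}", pfn], check=True)

# Just-in-time ACLs: grant before the job touches the file, revoke afterwards.
grant_access("/gpfs/storm/data/file.root", "atlas001")
# ... the user job on WN#k reads/writes the file through GPFS ...
revoke_access("/gpfs/storm/data/file.root", "atlas001")
```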

Page 30: Status of StoRM

• Main functionality is available. The request manager has been stress-tested, and integration tests have been performed.
• The database schema is now stable.
• A first demo with WS-Agreement has been run successfully.
• Integration with just-in-time ACL management is proceeding.
• Intense collaboration with IBM on GPFS functionality, the SRM definition, and the GGF File System WG.
• Big interest from Grid research communities in using StoRM; it will be deployed by EGEE/LCG.

Page 31: Conclusion

• Grid storage access and management is still an open issue.
• Many solutions exist, but none covers all needs.
• Storage needs to be well characterized in the Grid information system.
• Integration with vendor hardware/software solutions is still not accomplished.
• The Global Grid Forum is trying to establish the basis for a standard Grid open filesystem; competition among vendors still makes the effort hard.
• StoRM is a step forward in this direction, proposing a Grid interface to distributed and parallel filesystems.
• StoRM exercises the software development cycle for the proposed standard SRM Grid interface and extends it.
• StoRM is in its testing phase. It will be adopted by the EGEE/LCG Grid for the High Energy Physics communities, biology, and other e-sciences.

Page 32: Closing

Forschungsprivatissimum # 415040
Storage Management in LHC Computing Grid
Flavia Donno, PhD candidate in Computer Engineering, Universität Wien and University of Pisa

Hope you enjoyed this lecture. Thank you for your attention!