Università degli Studi di Pisa
EGEE is a project funded by the European Union under contract IST-2003-508833
Storage Management in LHC Computing Grid
Flavia Donno, PhD candidate in Computer Engineering, Universität Wien and University of Pisa
Forschungsprivatissimum # 415040 27 June 2005
Outline
• The Grid: what is it? Why the Grid at CERN?
• The LHC Computing Grid; the LCG Architecture
• The Storage Element: hardware/software solutions on LAN; parallel filesystems
• The SRM Protocol
• StoRM: A Storage Resource Manager for Filesystems; the StoRM Architecture; StoRM as Policy Enforcement Point (PEP) for Storage
• Status of the StoRM Project
• Conclusions
The Grid: what is it ?
• Many definitions:
• It's an aggregation of geographically dispersed computing, storage, and network resources, coordinated to deliver improved performance, higher quality of service, better utilization, and easier access to data.
• It enables virtual, collaborative organizations, sharing applications and data in an open, heterogeneous environment.
Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data.
Scientific instruments and experiments produce huge amounts of data.
The GRID: networked data processing centres and "middleware" software as the "glue" of resources.
Compute and Data Grids
• A compute grid is essentially a collection of distributed computing resources, within or across locations, which are aggregated to act as a unified processing resource or virtual supercomputer. Collecting these resources into a unified pool involves coordinated usage policies, job scheduling and queuing characteristics, grid-wide security, and user authentication.
• A data grid provides wide area, secure access to current data. Data grids enable users and applications to manage and efficiently use database information from distributed locations. Much like compute grids, data grids also rely on software for secure access and usage policies. Data grids can be deployed within one administrative domain or across multiple domains.
(Diagram: a Compute Grid made of Computing Elements alongside a Data Grid.)
The Grid: clusters, intra-grids, extra-grids
Why the Grid ?
• Scale of the problems: frontier research in many different fields today requires world-wide collaborations (i.e., multi-domain access to distributed resources).
• GRIDs provide access to large data processing power and huge data storage possibilities.
• As the grid grows, its usefulness increases (more resources become available).
• Large communities of possible GRID users: High Energy Physics; environmental studies (earthquake forecasting, geologic and climate changes, ozone monitoring); Biology, Genetics, Earth Observation; Astrophysics; new composite materials research; Astronautics; etc.
Why the Grid @ CERN ?
CMS
ATLAS
LHCb
~10 PetaBytes/year, ~10^8 events/year
~10^3 batch and interactive users
Why the Grid @ CERN ?
• High-throughput computing (based on reliable "commodity" technology)
• More than 3,000 (dual-processor) PCs with Linux
• More than 3 Petabytes of data (on disk and tapes)
Nowhere near enough!
Why the Grid @ CERN ?
• Problem: CERN alone can provide only a fraction of the necessary resources.
• Solution: computing centers, which were isolated in the past, should now be connected, uniting the computing resources of particle physicists around the world!
Europe: 267 institutes, 4603 users
Elsewhere: 208 institutes, 1632 users
The Grid Projects at CERN: LCG
The LCG (LHC Computing Grid) project started in 2002.
Its goal is to build a world-wide computing infrastructure based on Grid middleware to offer a computing platform for the LHC experiments.
http://www.cern.ch/lcg
More than 23,000 HEP jobs running concurrently in a day.
The LCG Architecture
It is currently based on the Globus Toolkit version 3 (not based on the Web Service Resource Framework, WSRF).
Requirements for Storage
(Diagram: data storage and access at Site A.)
• Users on the Grid share resources and access them concurrently. Storage therefore has to provide (see the interface sketch below):
• Transparent access to files (migration to/from disk pools, other site storage, Mass Storage Systems)
• File pinning
• File locking
• Space reservation and management
• File status notification
• Lifetime management
• Security
• Privacy
• Local policy enforcement
• High I/O performance
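These requirements can be read as an abstract storage-element interface. The Python sketch below is purely illustrative (all names are hypothetical and belong to no LCG component); it simply enumerates the operations such a Grid storage service would have to expose.

```python
from abc import ABC, abstractmethod

class GridStorageElement(ABC):
    """Hypothetical interface collecting the storage requirements listed above."""

    @abstractmethod
    def reserve_space(self, size_bytes: int, lifetime_s: int) -> str:
        """Reserve space and return a token identifying the reservation."""

    @abstractmethod
    def pin_file(self, sfn: str, lifetime_s: int) -> None:
        """Keep a file on disk (e.g. staged from tape) for the given lifetime."""

    @abstractmethod
    def lock_file(self, sfn: str) -> None:
        """Take an exclusive lock before the file is modified."""

    @abstractmethod
    def status(self, request_id: str) -> str:
        """Report the status of an asynchronous request (staging, copy, ...)."""

    @abstractmethod
    def release(self, sfn: str) -> None:
        """Release pins/locks so the space can be reclaimed when lifetimes expire."""
```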
HW/SW solutions on LAN
LCG: hierarchical structure – CERN is the Tier-0 center where data are collected. Tier-1 centers need to be able to serve ~Petabytes of data. Tier-2 centers are smaller and allow users to access ~100 Terabytes of data. Tier-3 centers are small university sites.
Mass Storage Systems (MSS) are normally hosted at Tier-0 and Tier-1 centers. Through robotic tape systems and home-developed solutions, data are transparently spooled from tape to disk servers and made available to users (CASTOR, ENSTORE, HPSS, JasMINE, HSM UniTree, …). Protocols for file access are normally "proprietary": rfio, dcap, ftp, …
Disk Pool Servers are based on low-cost parallel or serial ATA disks, can operate at the block or file level, and aggregate RAID (Redundant Array of Independent Disks) controllers and capacity. The arrays load-balance among self-contained storage modules so that performance grows linearly (Castor DiskPool, dCache, LCG DPM, SRB, SAM, ...). Access to files is guaranteed via POSIX-like calls. Management is quite hard.
HW/SW solutions on LAN
A Storage Area Network (SAN) is a high-speed special-purpose network (or sub-network) that interconnects different kinds of data storage devices with associated data servers. SANs use Fibre Channel over high-speed fibre-optic or copper cabling and can reach data transfer rates of up to 200 MB/s. SANs support disk mirroring; backup and restore; archival and retrieval of archived data; data migration from one storage device to another; and the sharing of data among different servers in a network. SAN solutions operate at the block level.
Network Attached Storage (NAS) is a product concept that packages file-system hardware and software with a complete storage I/O subsystem as an integrated file-server solution. NAS servers are normally specialized servers that can handle a number of network protocols, including Microsoft's NetBEUI and CIFS, Novell's NetWare Internetwork Packet Exchange (IPX), and Sun Microsystems' NFS. NAS systems provide dynamic load-balancing capabilities and dynamic volume and file-system expansion, and offer a single, global namespace. NAS systems can deliver performance of tens of Gigabytes/sec in a standard sequential read/write test.
HW/SW solutions on LAN
Grid Storage refers to a topology for scaling the capacity of NAS in response to application requirements, and a technology for enabling and managing a single file system so that it can span an increasing volume of storage. NAS heads are the components containing a thin operating system optimized for NFS (or proprietary) protocol support and storage device attachment. NAS heads are joined together using clustering technology to create one virtual head.
The Distributed Storage Tank (DST) project by IBM aims, within the Global Grid Forum, to produce a standards-based Lightweight Directory Access Protocol (LDAP) server to act as the master namespace server.
Distributed and Parallel File Systems
• Cluster and distributed file systems are an alternative form of shared file system technology
• They do not use a separate metadata server, are designed to work only in homogeneous server environments, and improving storage manageability is not a goal.
• Using very high-speed interconnects (switched Gigabit Ethernet, InfiniBand, etc.), such solutions provide POSIX I/O, centralized management, load balancing, monitoring, and fail-over capabilities.
Distributed and Parallel File Systems
• Examples: IBM GPFS, Lustre, and PVFS-2. Their main characteristics (a POSIX I/O sketch follows this list):
• Capacity: large files (10-50 GB), 100 TB file systems
• High throughput: wide striping, large blocks, many GB/s
• Reliability and fault tolerance: node and disk failures
• Online centralized system management: dynamic configuration and monitoring
• Parallel data and metadata access: shared disks and distributed locking
• Space allocation at file level
• Quota, metadata, and file lifetime management
• Access Control Lists (ACLs)
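Because these file systems are mounted like a local file system, applications access them with ordinary POSIX I/O; the striping across disk servers is transparent. A minimal Python sketch, assuming a hypothetical GPFS mount point /gpfs/scratch:

```python
import os

MOUNT = "/gpfs/scratch"             # assumed parallel-filesystem mount point
path = os.path.join(MOUNT, "demo.dat")
block = b"\0" * (4 * 1024 * 1024)   # 4 MiB writes to match large filesystem blocks

# Ordinary POSIX calls: no special API is needed for the parallel filesystem.
fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o640)
for _ in range(16):                 # 64 MiB in total
    os.write(fd, block)
os.close(fd)

fd = os.open(path, os.O_RDONLY)
total = 0
while chunk := os.read(fd, len(block)):
    total += len(chunk)
os.close(fd)
print(f"read back {total} bytes")
```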
The SRM Protocol
Storage Resource Manager (SRM)
• Storage resource managers are middleware components that manage shared storage resources on the Grid and provide management functionalities such as (protocol negotiation is sketched below):
• Uniform access to heterogeneous types of storage
• File pinning
• Disk space allocation and advance disk space reservation
• Protocol negotiation
• Lifetime management of files
• Management of security
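Protocol negotiation, for example, amounts to matching the client's ordered list of preferred transfer protocols against what the storage system supports. A toy Python sketch (the function and protocol sets are illustrative, not the SRM wire format):

```python
def negotiate_protocol(client_prefs, server_supported):
    """Return the first protocol the client prefers that the server also supports."""
    for proto in client_prefs:
        if proto in server_supported:
            return proto
    raise ValueError("no common transfer protocol")

# e.g. a client preferring local POSIX access talking to a dCache-backed SE
print(negotiate_protocol(["file", "rfio", "gsiftp"], {"dcap", "gsiftp"}))  # -> gsiftp
```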
SRM Interface
• Space management functions: space reservation; dynamic space management.
• Permission functions: permission setting over storage resources.
• Data transfer functions: protocol negotiation; pinning of files; file lifetime management.
• Status functions: status of asynchronous requests.
• Missing SRM functionality (not fully defined): file locking; quota management; local policy enforcement; security/privacy.
(Diagram: the SRM interface in front of different storage systems: StoRM on GPFS with POSIX access, SRM-dCache on dCache with dcap access, SRM-Castor on CASTOR with rfio access. SRM provides the management interface; the access protocols provide data access.)
SRM interface
Methods definition
• Space Management Functions: srmReserveSpace, srmReleaseSpace, srmUpdateSpace, srmCompactSpace, srmGetSpaceMetaData, srmChangeFileStorageType, srmGetSpaceToken
• Permission Functions: srmSetPermission, srmReassignToUser, srmCheckPermission
• Directory Functions: srmMkdir, srmRmdir, srmRm, srmLs
• Data Transfer Functions: srmPrepareToGet, srmPrepareToPut, srmCopy, srmRemoveFiles, srmReleaseFiles, srmPutDone, srmAbortRequest, srmAbortFiles, srmSuspendRequest, srmResumeRequest (a typical put sequence is sketched after this list)
• Status Functions: srmStatusOfGetRequest, srmStatusOfPutRequest, srmStatusOfCopyRequest, srmGetRequestSummary, srmGetRequestID
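Data-transfer requests are asynchronous: the client submits the request, polls its status, transfers the data to the returned transfer URL (TURL), and finally confirms. The Python sketch below shows a typical put sequence using the method names above; the SrmClient stub, the endpoints and the return values are hypothetical stand-ins for a SOAP client generated from the SRM WSDL.

```python
import time

class SrmClient:
    """Hypothetical stand-in for a SOAP client built from the SRM 2.x WSDL."""
    def srmPrepareToPut(self, surls, protocols):
        return "req-42"
    def srmStatusOfPutRequest(self, request_id):
        return ("SRM_SUCCESS", "gsiftp://se.example.org/data/file1")
    def srmPutDone(self, request_id, surls):
        return "SRM_SUCCESS"

srm = SrmClient()
req = srm.srmPrepareToPut(["srm://se.example.org/data/file1"], ["gsiftp"])

# Poll the asynchronous request until a transfer URL (TURL) is assigned.
while True:
    status, turl = srm.srmStatusOfPutRequest(req)
    if status != "SRM_REQUEST_QUEUED":
        break
    time.sleep(5)

print(f"transfer URL: {turl}")
# ... transfer the data to `turl` with the negotiated protocol (e.g. GridFTP) ...
srm.srmPutDone(req, ["srm://se.example.org/data/file1"])
```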
StoRM for performing filesystems
• StoRM is a Storage Resource Manager.
• It exposes a web service interface; the StoRM web service description (WSDL) is compliant with SRM specification version 2.1.1.
• It is built on top of GPFS (which provides POSIX I/O).
• It guarantees coherent access to storage for both Grid and local applications (VOMS certificates).
• It extends the SRM interface with quota management, locking, ACLs and policy enforcement.
• It is integrated with: the Replica Consistency service; the Workload Management Service (WMS); an Agreement Provider for advance reservation of storage resources; third-party SRM service implementations (SRM compliant).
StoRM
(Diagram: StoRM Storage Elements exposing the SRM interface and interacting with the Workload Management Service, the Replica Management Service, the Storage Agreement Provider, and the Replica Consistency Service.)
StoRM: WMS with reserveSpace
(Diagram: the SRM associates a space token with the reservation; the WMS schedules the job via PUSH or PULL; the space token is a job parameter; at the end of the job the output is written into the space area on any SE.)
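In code terms, the interaction amounts to: reserve space on an SE, obtain a space token, pass the token to the job as a parameter, and let the job write its output into that space area at the end. A minimal Python sketch in which every function is a hypothetical placeholder for the real SRM and WMS services:

```python
def reserve_space(se_endpoint: str, size_gb: int, lifetime_h: int) -> str:
    """Placeholder for an srmReserveSpace call: returns a space token from the SE."""
    return "SPACE-TOKEN-0001"

def submit_job(executable: str, space_token: str) -> None:
    """Placeholder for submission through the WMS (PUSH) or pull scheduling;
    the space token travels with the job as a parameter."""
    print(f"submitting {executable} with space token {space_token}")

def job_writes_output(space_token: str) -> None:
    """At the end of the job, the output is written into the reserved space area."""
    print(f"writing output into the space area identified by {space_token}")

token = reserve_space("srm://se.example.org:8443", size_gb=10, lifetime_h=24)
submit_job("analysis.sh", token)
job_writes_output(token)
```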
StoRM: The Server Architecture
StoRM as Policy Enforcement Point
(Diagram: on the Storage Element, the StoRM server hosts the StoRM service (permission functions), a Permission Component and a Permission Catalog, with a PBox (SE instance); the GPFS file system acts as the ACL enforcement mechanism on the PFNs. On the Computing Element, the Gatekeeper uses LCAS/LCMAPS and a PBox (CE instance); a Privilege Enforcer and the JobManager dispatch the user job to a worker node (WN#k), which accesses GPFS directly.)
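The enforcement step at the bottom of the chain is conceptually simple: once the permission component has decided that a mapped local user may access a given physical file name (PFN), an ACL entry is set on that file. A minimal sketch, assuming the file system exposes POSIX ACLs through the standard setfacl tool; the user mapping and the PFN shown are made up for illustration:

```python
import subprocess

def enforce_acl(pfn: str, local_user: str, perms: str = "rw") -> None:
    """Grant `local_user` the requested permissions on the file via a POSIX ACL."""
    subprocess.run(["setfacl", "-m", f"u:{local_user}:{perms}", pfn], check=True)

def revoke_acl(pfn: str, local_user: str) -> None:
    """Remove the user's ACL entry once the grant (e.g. a pin lifetime) expires."""
    subprocess.run(["setfacl", "-x", f"u:{local_user}", pfn], check=True)

# Example: the Grid identity was mapped by LCMAPS to the pool account `cms001`.
enforce_acl("/gpfs/lhc/cms/file1.root", "cms001")
```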
Status of StoRM
• Main functionalities available. The request manager has been stress-tested; integration tests performed.
• The database schema is now stable.
• A first demo with WS-Agreement has been demonstrated successfully.
• Integration with just-in-time ACL management is now proceeding.
• Intense collaboration with IBM on GPFS functionalities, the SRM definition and the GGF File System WG.
• Strong interest from Grid research communities in using StoRM; it will be deployed by EGEE/LCG.
Conclusion
• Grid storage access and management is still an open issue.
• Many solutions exist but do not cover all needs.
• Storage needs to be well characterized in the Grid information system.
• Integration with vendor hardware/software solutions is still not accomplished.
• The Global Grid Forum is trying to establish the basis of a standard for a Grid open filesystem; vendor competition still makes the effort hard.
• StoRM is a step forward in this direction, proposing a Grid interface to distributed and parallel filesystems.
• StoRM exercises the software development cycle for the proposed standard SRM Grid interface and extends it.
• StoRM is in its testing phase. It will be adopted by the EGEE/LCG Grid for the High Energy Physics communities, Biology and other e-sciences.
Forschungsprivatissimum # 415040
Storage Management in LHC Computing Grid
Flavia Donno, PhD candidate in Computer Engineering
Universität Wien and University of Pisa
And ….
Hope you enjoyed this lecture. Thank you for your attention!