Overview of the SDSC Storage Resource Broker Wayne Schroeder (and other SRB team members) May, 2004 San Diego Supercomputer Center, University of California San Diego SDSC/UCSD/NPACI

Mar 28, 2015


Jason McGuire
Page 1:

Overview of the SDSC Storage Resource Broker

Wayne Schroeder (and other SRB team members)

May, 2004

San Diego Supercomputer Center, University of California San Diego

SDSC/UCSD/NPACI

Page 2:

What is SRB? (1 of 3)

• The SDSC Storage Resource Broker (SRB) is client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and accessing unique or replicated data objects.

• SRB, in conjunction with the Metadata Catalog (MCAT), provides a way to access data sets and resources based on their logical names or attributes rather than their names and physical locations.

Page 3:

What is SRB? (2 of 3)

• The SDSC SRB system is a comprehensive distributed data management solution, with features to support the management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections.

• The SRB also serves as middleware via a rich set of APIs available to higher-level applications and by providing a management layer on top of a wide variety of storage systems.

Page 4:

What is SRB? (3 of 3)

The SRB is an integrated solution which includes:
– a logical namespace,
– interfaces to a wide variety of storage systems,
– high performance data movement (including parallel I/O),
– fault-tolerance and fail-over,
– WAN-aware performance enhancements (bulk operations),
– storage-system-aware performance enhancements ('containers' to aggregate files),
– metadata ingestion and queries (a MetaData Catalog (MCAT)),
– user accounts, groups, access control, audit trails, GUI administration tool,
– data management features, replication,
– user tools (including a Windows GUI tool (inQ), a set of SRB Unix commands, and Web (mySRB)), and APIs (including C, C++, Java, and Python).

SRB Scales Well (many millions of files, terabytes)

Supports Multiple Administrative Domains / MCATs (srbZones)

And includes SDSC Matrix: SRB-based data grid workflow management system to create, access and manage workflow process pipelines.

Page 5:

SRB Projects

• Digital Libraries
  – UCB, Umich, UCSB, Stanford, CDL
  – NSF NSDL - UCAR / DLESE
• NASA Information Power Grid
• Astronomy
  – National Virtual Observatory
  – 2MASS Project (2 Micron All Sky Survey)
• Particle Physics
  – Particle Physics Data Grid (DOE)
  – GriPhyN
  – SLAC Synchrotron Data Repository
• Medicine
  – Digital Embryo (NLM)
• Earth Systems Sciences
  – ESIPS
  – LTER
• Persistent Archives
  – NARA
  – LOC
• Neuro Science & Molecular Science
  – TeleScience/NCMIR, BIRN
  – SLAC, AfCS, …

Over 90 terabytes in 16 million files

Page 6:

SRB Scalability

Storage Resource Broker (SRB): data brokered by SDSC instances of SRB**

                 As of 7/24/2003             | As of 9/12/2003             | As of 10/01/2003            | As of 11/14/2003
Project instance Size (GB)      Files Users | Size (GB)      Files Users | Size (GB)      Files Users | Size (GB)      Files Users
NPACI             6,050.00  2,317,368   367 |  8,350.00  2,903,386   372 |  8,570.00  2,946,526   375 |  8,822.00  2,995,432   377
Digsky           46,100.00  5,719,025    68 | 46,100.00  5,719,025    68 | 46,100.00  5,719,025    68 | 42,786.00  6,076,982    69
DigEmbryo           720.00     45,365    23 |    720.00     45,365    23 |    720.00     45,365    23 |    720.00     45,365    23
HyperLter           215.00      5,097    27 |    215.00      5,097    28 |    215.00      5,097    28 |    215.00      5,097    28
Hayden            7,078.00     59,399   142 |  7,130.00     59,781   158 |  7,830.00     59,983   158 |  7,835.00     60,001   168
Portal              968.00     27,250   316 |  1,133.00     31,717   339 |  1,141.00     32,396   342 |  1,244.00     34,094   352
SLAC              1,790.00    254,974    43 |  1,790.00    254,974    43 |  1,790.00    254,974    43 |  2,108.00    294,149    43
NARA/Collection      52.80     79,195    51 |     53.10     79,112    55 |     53.90     79,118    55 |     67.00     82,031    56
NSDL/SIO Exp        232.00     15,809    23 |    315.00     27,170    23 |    418.00     39,253    23 |    603.00     87,191    26
TRA                  90.60      2,385    25 |     92.00      2,378    26 |     92.00      2,387    26 |     92.00      2,387    26
LDAS/SALK           498.00      9,858    60 |    737.00     12,898    66 |    767.00     12,906    66 |    824.00     13,016    66
BIRN                121.00    237,283   138 |    273.00    675,531   145 |    382.00  2,048,132   156 |    389.00  1,084,749   167
AfCS                 95.30     18,762    20 |     99.00     19,714    20 |    102.00     20,260    20 |    107.00     21,295    21
UCSDLib           1,084.00    138,415    29 |  1,085.00    138,421    29 |  1,085.00    138,421    29 |  1,085.00    138,421    29
NSDL/CI             278.00    993,886   113 |    379.00  2,596,090   114 |    379.00  2,596,090   114 |    465.00  2,948,903   114
SCEC                 12.60     18,660    38 |  7,561.00  1,249,144    39 |  9,680.00  1,561,396    40 | 12,274.00  1,721,241    43
TeraGrid            623.00     36,508 1,978 |  1,664.00     47,644 1,942 |  1,745.00     49,106 2,073 | 10,603.00    433,938 2,229

TOTAL            66,008.30  9,979,239 3,461 | 77,696.10 13,867,447 3,490 | 81,069.90 15,610,435 3,639 | 90,239.00 16,044,292 3,837

Rounded totals:
– 7/24/2003: 66 TB, 9.97 million files, 3 thousand users
– 9/12/2003: 77 TB, 13.9 million files, 3 thousand users
– 10/01/2003: 81 TB, 15.6 million files, 3 thousand users
– 11/14/2003: 90 TB, 16 million files, 3 thousand users

Comments:
– Does not cover shadow-
– Does not cover databases; covers only files stored in file systems and archival storage systems.
– ** Does not cover data brokered by SRB spaces administered outside SDSC.

Page 7:

Case Study: SRB in BIRN

[Figure: BIRN architecture diagram. Labels: BIRN Toolkit; Mediator; Applications; Viewing/Visualization; Queries/Results; Data Management; Collaboration; Data Model; Data Access; Data Grid; Computational Grid; NMI; Grid Management; Globus; GridPort; Scheduler; SRB; MCAT; Distributed Resources; File System; HPSS; Database.]

Page 8:

SRB History

• A DataGrid since SRB 1.0, Production 1997
• SDSC Started by General Atomics, 1985
  – GA/UCSD Staff
  – On UCSD Campus
  – SRB by GA Employees
• Today, SDSC no longer GA, all UCSD
  – All staff UCSD employees
• GA Commercial SRB Version (Nirvana)
  – Based on SRB 1.1.8 (2001)
  – Nirvana and SDSC versions diverged
  – SDSC SRB free to academic organizations
  – License from Nirvana for commercial use

Page 9:

SRB – A Data Grid Solution

• Storage Resource Broker
  – Collaborative client-server system that federates distributed heterogeneous resources using uniform interfaces and metadata
  – Provides a simple tool to integrate data and metadata handling – attribute-based access
  – Blends browsing and searching
  – Developed at SDSC
    - Operational for 5+ years;
    - Under continual development since 1997;
    - Customer-driven;
    - Brokering over 90 terabytes in over 16 million files at SDSC

Page 10:

Using a Data Grid - Details

[Figure: data grid diagram showing several SRB servers, the MCAT, and a database (DB).]

• Data Grid has arbitrary number of servers
• Complexity is hidden from users

Page 11:

SDSC Storage Resource Broker & Meta-data Catalog

[Figure: SRB and MCAT architecture diagram. Labels: Application; C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Web; Prolog Predicate; Third-party copy; DataCutter; Remote Proxies; SRB; MCAT; Dublin Core; Resource, User; User Defined; Application Meta-data; HRM; Archives (HPSS, ADSM, UniTree, DMF); Databases (DB2, Oracle, Sybase); File Systems (Unix, NT, Mac OSX).]

Page 12:

Federated SRB Operation

[Figure: federated SRB operation diagram with numbered steps 1 to 6. Labels: Read Application in Boston; Logical Name Or Attribute Condition; SRB server (two); SRB agent (two); MCAT; 1. Logical-to-Physical mapping, 2. Identification of Replicas, 3. Access & Audit Control; Peer-to-peer Brokering; Server(s) Spawning; Data Access; Parallel Data Access; replicas R1 and R2; San Diego; Durham.]

Page 13:

inQ Windows GUI

Page 14:

Virtual Hierarchical Collection Management

Page 15:

Attributes

• SRB metadata
  – Location, protocol
  – Unix semantics
  – Authorization, authentication
  – Latency management
  – Container aggregation
• Administrative
  – Dublin core, provenance
  – Annotations, comments
• Discipline specific attributes
  – Collection
  – User defined

Page 16:

Authentication Management

• Grid Security Infrastructure (GSI)
• Encrypted Password
• GSS-API for Kerberos or DCE
• Collection-owned Data
  – Collection ID installed at each storage system
  – Users authenticate themselves to the SRB
  – SRB authenticates to local server
  – Or GSI Delegation (Ananta Manandhar, CCLRC)

Page 17:

Logical File Name

One of the major functions of SRB is the mapping between a logical file name and its physical file. The mapped info of a logical filename includes:

– Location of name in collection hierarchy
– Physical file location: host name and path
– Protocol: for fetching ‘local’ file
– Unix semantics for file manipulation
– Location in container
– Audit trail
– Access control list
– Locking status
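
The mapping listed on this slide can be pictured as one catalog record per logical file name. The following is only a minimal Python sketch under stated assumptions: `LogicalFileEntry`, `Catalog`, the field names, and the example host and paths are all hypothetical and are not part of the SRB or MCAT API. It shows a logical name resolving to a physical location, with an access control check, a lock check, and an audit trail entry on each attempt.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class LogicalFileEntry:
    """Hypothetical catalog record for one logical file name (not the MCAT schema)."""
    collection_path: str                      # location of the name in the collection hierarchy
    host: str                                 # physical file location: host name
    physical_path: str                        # physical file location: path
    protocol: str                             # protocol for fetching the 'local' file
    container_offset: Optional[int] = None    # location in container, if any
    acl: Dict[str, Set[str]] = field(default_factory=dict)   # user -> allowed actions
    locked_by: Optional[str] = None           # locking status
    audit_trail: List[str] = field(default_factory=list)


class Catalog:
    """Toy stand-in for a metadata catalog keyed by logical name."""

    def __init__(self) -> None:
        self._entries: Dict[str, LogicalFileEntry] = {}

    def register(self, entry: LogicalFileEntry) -> None:
        self._entries[entry.collection_path] = entry

    def resolve(self, logical_name: str, user: str, action: str = "read") -> str:
        """Map a logical name to a physical location, recording the attempt."""
        entry = self._entries[logical_name]
        allowed = action in entry.acl.get(user, set())
        locked = action == "write" and entry.locked_by not in (None, user)
        if not allowed or locked:
            entry.audit_trail.append(f"DENIED {action} by {user}")
            raise PermissionError(f"{user} may not {action} {logical_name}")
        entry.audit_trail.append(f"{action} by {user}")
        return f"{entry.protocol}://{entry.host}{entry.physical_path}"
```

In this sketch the caller only ever supplies the logical name; the host, path, and protocol stay inside the catalog, which is the point the slide makes about hiding physical locations.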

Page 18:

Replica Management

Files can be replicated into any valid physical storage resource registered in SRB.

Each replica is managed by the same logical filename as the original one and a unique replication number. Each replica can have unique metadata.

1-to-many Replication: A logical resource can contain several physical storage resources.

Multiple replicas can be made to the same storage resource.

Many Modes of Replication:
– Synchronous Replication: Sput to a logical resource
– Asynchronous Replication: Sput then later Sreplicate
– Out of Band Replication: Outside SRB, then register
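
The three replication modes can be illustrated with a small bookkeeping model. This is a hedged Python sketch, not SRB code: `ReplicaCatalog`, its method names, the resource names, and the choice to number replicas from 0 are assumptions made for illustration, and no data is actually moved. It only records which physical resource holds which replica number of a logical file.

```python
from typing import Dict, List, Tuple


class ReplicaCatalog:
    """Toy model: tracks (replica number, physical resource) per logical file name."""

    def __init__(self, logical_resources: Dict[str, List[str]]) -> None:
        # 1-to-many: a logical resource names several physical storage resources
        self.logical_resources = logical_resources
        self.replicas: Dict[str, List[Tuple[int, str]]] = {}

    def _add(self, logical_name: str, physical_resource: str) -> int:
        entries = self.replicas.setdefault(logical_name, [])
        replica_number = len(entries)          # assumed numbering scheme, starting at 0
        entries.append((replica_number, physical_resource))
        return replica_number

    def put(self, logical_name: str, logical_resource: str) -> List[int]:
        """Synchronous mode: storing to a logical resource creates one
        replica on each of its physical resources in the same operation."""
        return [self._add(logical_name, r)
                for r in self.logical_resources[logical_resource]]

    def replicate(self, logical_name: str, physical_resource: str) -> int:
        """Asynchronous mode: add a replica of an already stored file later."""
        if logical_name not in self.replicas:
            raise KeyError(f"{logical_name} has not been stored yet")
        return self._add(logical_name, physical_resource)

    def register(self, logical_name: str, physical_resource: str) -> int:
        """Out-of-band mode: the copy was made outside the system;
        only the catalog record is created here."""
        return self._add(logical_name, physical_resource)
```

Every replica shares the logical file name and differs only in its replica number, so the same physical resource may appear more than once in the list, matching the slide's note that multiple replicas can be made to the same storage resource.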

Page 19:

Containers

• Physical Grouping of Objects
• Similar to tar but has significant differences
• Multiple Uses:
  – To take advantage of resource characteristics
  – To aid access patterns
  – Move data sets together
  – Tie together logically different files
  – Automatic Archiving/Caching
• Chaining of Containers
• Sharing of metadata
• Containers for Collections
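
As a rough illustration of physical grouping, the sketch below packs several small objects into one byte stream and keeps an index of where each one sits. It is an assumption-laden Python toy: the `Container` class and its layout are hypothetical and do not reflect SRB's actual container format. The index plays the role of the "location in container" attribute kept for each logical file.

```python
import io
from typing import Dict, Tuple


class Container:
    """Toy container: many logical objects stored in one physical byte stream."""

    def __init__(self) -> None:
        self._blob = io.BytesIO()                      # stands in for one large archive file
        self._index: Dict[str, Tuple[int, int]] = {}   # logical name -> (offset, length)

    def add(self, logical_name: str, data: bytes) -> Tuple[int, int]:
        """Append an object and remember where it landed."""
        offset = self._blob.seek(0, io.SEEK_END)       # seek returns the new position
        self._blob.write(data)
        self._index[logical_name] = (offset, len(data))
        return self._index[logical_name]

    def read(self, logical_name: str) -> bytes:
        """Fetch a single object without scanning the whole container."""
        offset, length = self._index[logical_name]
        self._blob.seek(offset)
        return self._blob.read(length)

    def size(self) -> int:
        return len(self._blob.getvalue())
```

Storing one large object instead of many small ones suits archival systems that handle few large files better than many small ones, and moving the container moves the whole data set together.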

Page 20:

Proxy Operation

• Proxy operation
  – server performs operations on behalf of client
  – performs operations where the data are located
  – subset and filter operations – datacutter
  – Metadata extraction and ingestion checks
  – srbExecCommand() API and Spcommand utility
    • request a specific server to execute a specific command and stream the result to stdin
    • used by the NVO (National Virtual Observatory) cutout service

Page 21:

SRB – More Features

• Client Support
  – Pure Java Client
  – Web Services - WSDL, Matrix workflow system
  – Web Support - MySRB Extensions
  – Pure Java Client & Browser
  – inQ Version 3.1 and more Windows Support
• Administrative Support
  – GUI-based Administration
  – More Features - Resource, User, Method Management
  – User-friendly Installation Procedures

Page 22:

Metadata Management

• Metadata Insertion Through User Interfaces
• Bulk Metadata Insertion
• Template Based Metadata Extraction
• Metadata Search
  – system data
  – user-defined metadata
• File Content Search: Key words are pre-extracted by a template and saved as user-defined metadata.
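
Template-based extraction followed by attribute search might look roughly like the Python sketch below. Everything in it is an illustrative assumption: the `TEMPLATE` format (attribute name mapped to a regular expression), the header keywords, and the `extract_metadata` and `search` helpers are hypothetical and are not SRB's template language or query interface.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical template: attribute name -> regex with one capture group.
TEMPLATE: Dict[str, Pattern[str]] = {
    "instrument": re.compile(r"^INSTRUMENT\s*=\s*(.+)$", re.MULTILINE),
    "exposure_s": re.compile(r"^EXPTIME\s*=\s*(\d+(?:\.\d+)?)$", re.MULTILINE),
}


def extract_metadata(text: str, template: Dict[str, Pattern[str]] = TEMPLATE) -> Dict[str, str]:
    """Pre-extract key words from file content using the template."""
    metadata: Dict[str, str] = {}
    for attribute, pattern in template.items():
        match = pattern.search(text)
        if match:
            metadata[attribute] = match.group(1).strip()
    return metadata


def search(catalog: Dict[str, Dict[str, str]], **conditions: str) -> List[str]:
    """Return logical names whose user-defined metadata meets every condition."""
    return sorted(
        name for name, metadata in catalog.items()
        if all(metadata.get(key) == value for key, value in conditions.items())
    )
```

Because the key words are extracted once at ingestion and stored as user-defined metadata, a later content search only consults the catalog and never has to reopen the files.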

Page 23:

Storage Resource Broker

• SRB wears many hats:
  – It is a distributed but unified file system
  – It is a database access interface
  – It is a digital library
  – It is a semantic web
  – It is a data grid system
  – It is an advanced archival system

Page 24:

Criticisms of SRB

• Not completely open source
  – But semi-open and available to academics
• Not standards-based
  – But internal protocols need not be
• Monolithic
  – Integrated
  – And well partitioned

Page 25:

Some SRB Weaknesses (my view)

• Difficult to explain and understand
  – SRB does so much, people tend to learn subsets and are often unaware of useful features
  – Different groups are interested in different sets of features
  – An “elevator speech” is either vague or incomplete
• Not completely open source
• Collaborations difficult
  – Need to expand
• Limited Staff
  – Feature-focused projects (+/-): docs, error messages

Page 26:

Some SRB Strengths

• Integrated solution
  – High performance
  – Highly functional
  – Relatively easy to enhance
• Middle-ware and Complete-ware
• Customer driven
• Sound architecture
• Mature, but also being actively developed
• Growing user base
• Highly coordinated centralized team

Page 27:

TeamSRB, San Diego

• Reagan Moore (Program Director, DAKS)
• Arcot Rajasekar (Director)
• Michael Wan (Chief Architect)
• Wayne Schroeder
• George Kremenek
• Bing Zhu
• Sheau-Yen Chen
• Charles Cowart
• Arun Jagatheesan (GriPhyN)
• Lucas Gilbert
• Roman Olsachnowsky (BIRN)
• Tim Warnock (BIRN)

Page 28:

Contacts

• For Additional Information:

Web: http://www.npaci.edu/dice/srb

Mail: [email protected]

Mailing-list: [email protected]