Dec 17, 2015

Page 1: Federating Archives in the DELAMAN Network Reagan W. Moore San Diego Supercomputer Center moore@sdsc.edu  Storage Resource.

Federating Archives in the DELAMAN Network

Reagan W. Moore

San Diego Supercomputer Center

moore@sdsc.edu

http://www.npaci.edu/DICE/SRB

Storage Resource Broker

Page 2:

Distributed Data Management Using Data Grids

• Build a shared collection
• Authenticate users independently of the storage systems
• Control access independently of the storage systems
• Organize the file name space independently of the storage systems
• Manage context (metadata) independently of content (files)
• Maintain consistency between context and operations on content

Page 3:

Storage Resource Broker

• Generic distributed data management technology
  • Data grids - sharing
  • Digital libraries - publication
  • Persistent archives - preservation
• Federated server architecture / thin client
  • 250,000 lines of “C” code
  • Supports all major compute and storage platforms
• All requirements listed on the following Scenario slides are supported

Page 4:

Scenario 1 - Data Migration

• Provide URIDs (logical file names) that are independent of storage system
• Provide metadata for each file
• Support browse and discovery on collection hierarchy
• Support access interfaces to the data
• Support registration of existing files into a shared collection
• Single sign-on environment
  • GSI / challenge response / tickets

Page 5:

Managing Distributed Data

Storage Repository

• Storage location

• User name

• File name

• File context (creation date,…)

• Access constraints

Data Access Methods (Web Browser, DSpace, OAI-PMH)

Naming conventions provided by storage systems

Page 6:

Data Grids Provide a Level of Indirection for Each Naming Convention

Storage Repository

• Storage location

• User name

• File name

• File context (creation date,…)

• Access constraints

Data Grid

• Logical resource name space

• Logical user name space

• Logical file name space (URID)

• Logical context (metadata)

• Control/consistency constraints

Data Collection

Data Access Methods (C library, Unix, Web Browser)

Data is organized as a shared collection
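The indirection on this slide can be illustrated with a minimal sketch of a logical file name space. It is not the SRB implementation: the class names, resource names, and paths below are all hypothetical, chosen only to show that a logical name (URID) can stay fixed while the physical copy moves between storage systems.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    resource: str        # logical resource name (hypothetical, e.g. "disk-a")
    physical_path: str   # file name as the storage system knows it


@dataclass
class LogicalNameSpace:
    """Toy catalog mapping logical file names (URIDs) to physical replicas."""
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        self.entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str) -> List[Replica]:
        return list(self.entries[logical_name])

    def migrate(self, logical_name: str, old_resource: str, new: Replica) -> None:
        """Replace the copy on old_resource; the logical name does not change."""
        kept = [r for r in self.entries[logical_name] if r.resource != old_resource]
        self.entries[logical_name] = kept + [new]


ns = LogicalNameSpace()
urid = "/delaman/corpus/rec01.wav"   # made-up logical name
ns.register(urid, Replica("disk-a", "/vol1/u42/rec01.wav"))
ns.migrate(urid, "disk-a", Replica("tape-b", "/hpss/proj/0007.dat"))
print(ns.resolve(urid))
```

Applications that hold only the logical name keep working after the migration, which is the point of separating the two name spaces.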

Page 7:

Provide Context for Data

• Properties of files
  • Provenance - source
  • Descriptive attributes
  • State information resulting from operations on files
• Organize properties as metadata in a collection hierarchy
  • Define operations on file properties
  • Manage state information - location, replicas, containers, checksums
• Separate context management from content management
  • Maintain consistency of context as operations are done on content
• Support context management
  • Schema extension, automated SQL generation, bulk metadata load
  • Metadata extraction through a remote procedure parsing the file

Page 8:

Federated Server Architecture

[Figure: Federated Server Architecture. Recoverable labels: Read Application; Logical Name or Attribute Condition; SRB server (two); SRB agent (two); MCAT; Peer-to-peer Brokering; Server(s) Spawning; Data Access; Parallel Data Access; R1, R2; numbered steps 1 to 6]

1. Logical-to-physical mapping
2. Identification of replicas
3. Access & audit control
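The three numbered catalog functions above can be sketched as a single lookup. This is a toy model under stated assumptions, not the MCAT schema or the SRB protocol: the `Catalog` and `CatalogEntry` classes, the attribute names, and the sample data are hypothetical. A request names a file either by logical name or by an attribute condition; the catalog checks access, records the attempt, and returns the physical replica locations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class CatalogEntry:
    logical_name: str
    attributes: Dict[str, str]   # descriptive metadata
    replicas: List[str]          # physical locations, e.g. "R1:/d/a"
    readers: Set[str]            # users allowed to read


@dataclass
class Catalog:
    entries: List[CatalogEntry] = field(default_factory=list)
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def lookup(self, user: str, logical_name: Optional[str] = None,
               condition: Optional[Dict[str, str]] = None) -> List[str]:
        """Return replica locations for entries matching a name or attributes."""
        condition = condition or {}
        locations: List[str] = []
        for entry in self.entries:
            if logical_name is not None and entry.logical_name != logical_name:
                continue
            if any(entry.attributes.get(k) != v for k, v in condition.items()):
                continue
            allowed = user in entry.readers                 # access control
            self.audit_log.append((user, entry.logical_name, allowed))  # audit
            if allowed:
                locations.extend(entry.replicas)            # mapping and replicas
        return locations


catalog = Catalog([
    CatalogEntry("/coll/a", {"type": "audio"}, ["R1:/d/a", "R2:/d/a"], {"alice"}),
])
print(catalog.lookup("alice", condition={"type": "audio"}))
```

Denied requests are logged as well as granted ones, so the audit trail covers every attempt.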

Page 9:

Storage Resource Broker - Data Grid

[Figure: layered architecture diagram; labels listed from the application layer down to storage]

Application

Access interfaces: C, C++, Java Libraries; Unix Shell; Linux I/O; DLL / Python, Perl; Java, NT Browser; Kepler Actors; OAI, WSDL, WSRF; HTTP, DSpace, OpenDAP

Federation Management

Consistency & Metadata Management / Authorization, Authentication, Audit

Logical Name Space; Latency Management; Data Transport; Metadata Transport

Catalog Abstraction
• Databases - DB2, Oracle, Sybase, Postgres, mySQL, Informix

Storage Repository Virtualization
• Archives - Tape, Sam-QFS, DMF, HPSS, ADSM, UniTree, ADS
• Databases - DB2, Oracle, Sybase, SQLserver, Postgres, mySQL, Informix
• File Systems - Unix, NT, Mac OSX
• ORB

Page 10:

Page 11:

Scenario 2 - Data Exchange

• Support access controls on the URIDs
  • Java administration GUI to support owner control of access controls
  • Can delegate permission to set access controls
  • Access controls apply on all replicas independent of storage system
• Support latency management for moving files across wide area networks
  • Parallel I/O, replication, staging, aggregation of data / metadata / I/O commands
• Support integrity validation
  • Manage checksums for each file
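Integrity validation with per-file checksums can be sketched as follows. The slide does not say which checksum algorithm is used, so SHA-256 here is an assumption, and the function names are hypothetical. The idea is to record a checksum once and later compare every replica against it.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List


def checksum(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()   # algorithm choice is an assumption
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def corrupted_replicas(recorded: str, replicas: Dict[str, BinaryIO]) -> List[str]:
    """Return the names of replicas whose content no longer matches."""
    return sorted(name for name, s in replicas.items() if checksum(s) != recorded)


original = b"field recording"
recorded = checksum(io.BytesIO(original))   # stored in the catalog at ingest
bad = corrupted_replicas(recorded, {
    "r1": io.BytesIO(original),
    "r2": io.BytesIO(b"field recordinh"),   # simulated corruption
})
print(bad)
```

In-memory streams stand in for files on remote storage; real replicas would be opened in binary mode and passed the same way.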

Page 12:

Latency Management - Bulk Operations

• Bulk register
  • Create a logical name for a file
• Bulk load
  • Create a copy of the file on a data grid storage repository
• Bulk unload
  • Provide containers to hold small files and pointers to each file location
• Bulk delete
  • Mark as deleted in metadata catalog
  • After specified interval, delete file
• Bulk metadata load
  • Support parsing of metadata from a remote file at remote storage
• Requests for bulk operations for access control setting, …
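The two-step bulk delete above (mark in the catalog, remove after an interval) can be sketched like this. The `TrashCatalog` class and its fields are hypothetical, and time is passed in explicitly so the behaviour is easy to follow; this is a sketch of the pattern, not SRB code.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class TrashCatalog:
    retention_seconds: float
    live: Dict[str, bytes] = field(default_factory=dict)         # name -> content
    deleted_at: Dict[str, float] = field(default_factory=dict)   # name -> mark time

    def bulk_delete(self, names: Iterable[str], now: float) -> None:
        """Step 1: mark files as deleted in the catalog; data stays in place."""
        for name in names:
            if name in self.live:
                self.deleted_at[name] = now

    def visible(self) -> List[str]:
        return sorted(n for n in self.live if n not in self.deleted_at)

    def purge(self, now: float) -> List[str]:
        """Step 2: physically remove files whose retention interval has passed."""
        expired = [n for n, t in self.deleted_at.items()
                   if now - t >= self.retention_seconds]
        for name in expired:
            del self.live[name]
            del self.deleted_at[name]
        return sorted(expired)


cat = TrashCatalog(retention_seconds=60.0, live={"a": b"1", "b": b"2", "c": b"3"})
cat.bulk_delete(["a", "b"], now=0.0)
print(cat.visible())
```

Marking is a cheap catalog update, so a large request returns quickly, and the interval leaves room to undo a mistaken delete before data is lost.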

Page 13:

Scenario 3 - Community Access

• Within the shared collection, the digital entities are owned and managed by the data grid
  • Files, URLs, SQL commands, database binary large objects can be registered into the shared collection
• Access controls for
  • Files / metadata / storage systems
• Access controls are defined for multiple roles
  • Schema extension, create new metadata
  • Modify metadata
  • Add annotations
  • Turn on audit trails
  • Write data
  • Read data
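One way to model the roles listed above is as ordered privilege levels. The slide does not state that the roles are cumulative, so the ordering below (each role implying all lower ones) is an assumption, and the `Role` enum, the `grants` table, and the user names are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Role(IntEnum):
    # Ordered from least to most privileged; the ordering is an assumption.
    READ = 1
    WRITE = 2
    AUDIT = 3             # turn on audit trails
    ANNOTATE = 4
    MODIFY_METADATA = 5
    EXTEND_SCHEMA = 6     # schema extension, create new metadata


# (user, collection) -> highest role granted; sample data is made up.
grants: Dict[Tuple[str, str], Role] = {
    ("alice", "/delaman/corpus"): Role.MODIFY_METADATA,
    ("bob", "/delaman/corpus"): Role.READ,
}


def permitted(user: str, collection: str, needed: Role) -> bool:
    """True if the user's granted role is at least the role the action needs."""
    granted = grants.get((user, collection))
    return granted is not None and granted >= needed


print(permitted("alice", "/delaman/corpus", Role.ANNOTATE))
```

If the roles were independent rather than cumulative, a set of flags per user would be the more natural representation.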

Page 14:

Scenario 4 - Explorative Studies

• Uniform access mechanisms to data across all storage systems
  • Support for queries on databases
  • Support for formatting results (XML, HTML)
  • Support audit trails, encryption
• Support user-defined collection hierarchy
  • Soft links (build a logical collection of pointers to data within the data grid)
• Support for multiple types of discovery
  • By URID (Logical File Name)
  • By query on metadata (may be unique to a single file)
  • By GUID (handle system)

Page 15:

Scenario 5 - Education

• SRB is used to build digital libraries
  • Assemble class material
  • Manage student reports
  • Display material through web browsers
• Federation of digital libraries
  • Controlled sharing across independent data grids or digital libraries
  • Support for cross-registration of logical name spaces
  • Authentication done by “home” data grid
  • Access controls managed by both data grids
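The federation rules above can be read as a four-part check, sketched here under explicit assumptions. The `DataGrid` class, the `peer_zones` sets, and the `user@zone` identity format are hypothetical modelling choices. Authentication is reduced to a membership test standing in for a real mechanism, and "access controls managed by both data grids" is interpreted as: each grid must have registered the other, and the remote grid's access list must name the user.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DataGrid:
    zone: str
    users: Set[str] = field(default_factory=set)            # locally registered users
    peer_zones: Set[str] = field(default_factory=set)       # grids federated with
    acl: Dict[str, Set[str]] = field(default_factory=dict)  # logical name -> "user@zone"

    def authenticate(self, user: str) -> bool:
        # Stand-in for a real mechanism such as GSI or challenge-response.
        return user in self.users


def federated_read(home: DataGrid, remote: DataGrid, user: str, logical_name: str) -> bool:
    identity = f"{user}@{home.zone}"
    return (
        home.authenticate(user)                  # authentication by the home grid
        and remote.zone in home.peer_zones       # home grid shares with remote
        and home.zone in remote.peer_zones       # remote grid accepts home
        and identity in remote.acl.get(logical_name, set())  # remote access control
    )


grid_a = DataGrid("A", users={"alice"}, peer_zones={"B"})
grid_b = DataGrid("B", peer_zones={"A"}, acl={"/B/lectures/week1.pdf": {"alice@A"}})
print(federated_read(grid_a, grid_b, "alice", "/B/lectures/week1.pdf"))
```

Either grid can end the sharing on its own by dropping the other from its peer set, which keeps the two grids independent.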

Page 16:

Federation

Data Grid

• Logical resource name space

• Logical user name space

• Logical file name space

• Logical context (metadata)

• Control/consistency constraints

Data Collection B

Data Access Methods (Web Browser, DSpace, OAI-PMH)

Data Grid

• Logical resource name space

• Logical user name space

• Logical file name space

• Logical context (metadata)

• Control/consistency constraints

Data Collection A

Access controls and consistency constraints on cross registration of digital entities

Page 17:

Scenario 6 - Updating Resources

• Maintain system level metadata
  • Owner of registered file
  • Creation time, modification time, size, audit trails
  • Replica locations
• Support for synchronization of replicas
  • Can modify a replica, subsequent reads are to the modified copy
  • Can synchronize copies to the modified version
• Support for physical file containers
  • Aggregate small files before storage
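The replica synchronization behaviour described above can be sketched with a "dirty" flag on the modified copy. This is only an illustration: the class names and the flag are hypothetical and say nothing about how SRB tracks replica state internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    resource: str
    data: bytes
    dirty: bool = False   # True when this copy is newer than the others


class ReplicatedFile:
    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def _modified(self) -> Optional[Replica]:
        return next((r for r in self.replicas if r.dirty), None)

    def write(self, resource: str, data: bytes) -> None:
        """Modify one replica and mark it as the current copy."""
        target = next((r for r in self.replicas if r.resource == resource), None)
        if target is None:
            raise KeyError(resource)
        for r in self.replicas:
            r.dirty = False
        target.data, target.dirty = data, True

    def read(self) -> bytes:
        """Reads go to the modified copy when one exists."""
        current = self._modified()
        return (current if current is not None else self.replicas[0]).data

    def synchronize(self) -> int:
        """Copy the modified version to the other replicas; return how many changed."""
        current = self._modified()
        if current is None:
            return 0
        updated = 0
        for r in self.replicas:
            if r is not current and r.data != current.data:
                r.data = current.data
                updated += 1
        current.dirty = False
        return updated


f = ReplicatedFile([Replica("sdsc", b"v1"), Replica("umd", b"v1")])
f.write("umd", b"v2")
print(f.read())
```

Because reads follow the modified copy, synchronization can be deferred without readers seeing stale data.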

Page 18:

Scenario 7 - Web-based Editions

• Support for digital library interfaces on top of the data grid
  • Transana - technology to manipulate, edit, and manage classroom video (University of Wisconsin)
  • DSpace - digital library system to manage ingestion of material into a collection
  • OAI-PMH - Open Archives Initiative protocol for metadata harvesting
  • OpenDAP - Data Access Protocol that supports both semantic and structural manipulation of registered files
  • Windows browser, Web browser, Java, WSDL interfaces
  • Collaborating on development of portlet interface

Page 19:

Storage Resource Broker - Data Grid

[Figure: layered architecture diagram, repeated from Page 9]

Page 20:

Scenario 8 - Unconnected Editions

• Ability to download data from shared collection to local resource
  • Support for PCs, workstations, supercomputers
• Generalization of anonymous FTP
  • Can issue a ticket permitting
    • Limited number of read accesses valid for specified time interval
  • Can set public access to a sub-collection
  • Can restrict access by user name/domain/zone
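A ticket of the kind described above, good for a limited number of reads within a time window, can be sketched as follows. The `Ticket` class, its fields, and the token format are hypothetical and do not describe SRB's ticket format; time is passed in explicitly to keep the example deterministic.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Ticket:
    collection: str      # sub-collection the ticket covers
    reads_left: int      # remaining number of read accesses
    expires_at: float    # end of the validity interval (seconds)
    token: str = field(default_factory=lambda: secrets.token_hex(16))

    def redeem(self, path: str, now: float) -> bool:
        """Allow one read if the ticket is unexpired, unspent, and covers path."""
        if now >= self.expires_at or self.reads_left <= 0:
            return False
        prefix = self.collection.rstrip("/") + "/"
        if path != self.collection and not path.startswith(prefix):
            return False
        self.reads_left -= 1
        return True


ticket = Ticket("/delaman/public", reads_left=2, expires_at=100.0)
print(ticket.redeem("/delaman/public/a.wav", now=10.0))
```

The holder needs only the token, not an account, which is what makes this a generalization of anonymous FTP; the count and expiry bound how far a leaked ticket can be used.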

Page 21:

Local Archives

• Maintain files in local file system
  • Register existence of the files into the data grid
  • Issue synchronization command to replicate into the archive
• Maintain a data grid on the local system
  • Entire environment can be installed on a Mac in 15 minutes (Perl install script)
  • Use data grid federation to synchronize name spaces, files, metadata from local data grid to archives data grid

Page 22:

Scenario 9 - Collaborative Commentary

• Comments can be added by owner
• Annotations can be added by authorized persons
  • Annotations marked by person name, date
  • Can restrict annotation right by group
• Can choose to create explicit metadata attributes to manage comments
  • Can store multiple comments per object
  • Can search across metadata
• Or can use digital library interfaces to manage comments

Page 23:

Sites Using the SRB

• CiteSeer, Penn State
• City Univ. of New York
• Geospatial Environment, UCSD
• Drexel University
• EOSDIS Distributed Active, NASA Goddard
• Georgia Tech
• Kentucky State Libraries & Archives
• Library of Congress
• Los Alamos National Lab
• NASA Ames
• NASA Goddard Space Flight Center
• NCSA Grid Computing
• NIH (NCI Center for Bioinformatics)
• Penn State University
• Pittsburgh Supercomputing Center
• Purdue University, Indiana
• Stanford University
• TACC, University of Texas
• Texas A & M
• UC Santa Cruz
• UCLA
• UCSD Neuroscience
• University of Maryland
• University of Michigan, CAC department
• University of New Mexico
• University of Washington
• University of Wisconsin
• USC
• Yale University

• Academia Sinica, Taiwan
• ASCC, Computing Centre, Taiwan
• Australian National University
• Bedford Oceanography, Canada
• Bioinformatics Institute, Singapore
• CSIRO, Australia
• Data Storage Institute, Singapore
• EGEE, French National Center
• GeoForschungsZentrum, Germany
• James Cook University, Australia
• KEK High Energy Physics, Japan
• Max Planck Institute, Netherlands
• Parallab, Norway
• South Australian Advanced Computing
• UIB (Parallab), Norway
• University of Amsterdam
• University of Cambridge, Astronomy
• University of Cambridge, e-Science
• University of Edinburgh
• University of Genoa, Italy
• University of Hong Kong
• University of Manchester
• University of Oslo
• University of Southampton
• York Univ (UK)

Page 24:

Storage Resource Broker Collections at SDSC (11/2/2004)

Collection | GBs of data stored | Number of files | Number of users

Data Grid
NSF/ITR - National Virtual Observatory | 53,858 | 9,536,698 | 80
NSF - National Partnership for Advanced Computational Infrastructure | 24,738 | 5,754,890 | 380
Hayden Planetarium - Evolution of the Solar System visualizations | 7,201 | 113,600 | 178
NSF/NPACI - Joint Center for Structural Genomics | 5,228 | 652,031 | 50
NSF/NPACI - Biology and Environmental collections | 8,851 | 33,340 | 67
NSF - TeraGrid, ENZO Cosmology simulations | 121,550 | 1,096,947 | 3,247
NIH - Biomedical Informatics Research Network | 6,002 | 4,107,508 | 214

Digital Library
NLM - Digital Embryo image collection | 720 | 45,365 | 23
NSF/NPACI - Long Term Ecological Reserve | 253 | 8,436 | 36
NSF/NPACI - Grid Portal | 2,211 | 51,227 | 407
NIH - Alliance for Cell Signaling microarray data | 856 | 62,291 | 21
NSF - National Science Digital Library SIO Explorer collection | 2,080 | 808,901 | 27
NSF/NPACI - Transana education research video collection | 92 | 2,387 | 26
NSF/ITR - Southern California Earthquake Center | 91,040 | 1,791,494 | 62

Persistent Archive
UCSD Libraries archive | 128 | 204,828 | 29
NARA - Research Prototype Persistent Archive | 166 | 316,813 | 58
NSF - National Science Digital Library persistent archive | 3,571 | 26,908,350 | 122

TOTAL | 328 TB | 51 million | 4,900

Page 25:

Generic Infrastructure

• SDSC developed the Storage Resource Broker (SRB) to support access to distributed data
  • Effort started in 1996 as a DARPA funded project
  • Now supports over 30 national/international projects
• Development team of 12 staff is led by
  • Michael Wan, data management systems
  • Arcot Rajasekar, information management systems

Page 26:

SDSC SRB Team (left to right)

• Arun Jagatheesan
• George Kremenek
• Sheau-Yen Chen
• Arcot Rajasekar (SRB development lead)
• Reagan Moore (SRB PI)
• Michael Wan (SRB architect)
• Roman Olschanowsky (BIRN)
• Bing Zhu
• Charlie Cowart
• Lucas Gilbert
• Tim Warnock
• Wayne Schroeder (SRB product)
• Adam Birnbaum (SRB production)
• Antoine De Torcy
• Vicky Rowley (BIRN)
• Marcio Faerman (SCEC)
• Students & emeritus
  • Erik Vandekieft
  • Reena Mathew
  • Xi (Cynthia) Sheng
  • Allen Ding
  • Grace Lin
  • Qiao Xin
  • Daniel Moore
  • Ethan Chen
  • Jon Weinburg
• Supported by over 20 projects (NSF, DOE, NASA, NARA, NIH, LOC, NHPRC)

[Figure: team photographs, not recoverable]

Page 27:

Data Grid Capabilities

• Data manipulation
  • Containers
  • Parallel I/O
  • Firewall interactions
• Resource interactions
  • Fault tolerance
  • Load leveling
  • Replication
• HIPAA security requirements
  • Authentication of all users
  • Access controls on data and metadata
  • Audit trails
  • Data encryption
  • Centralized control
• Application interfaces
  • C library, Shell commands, Java, Perl, Python, WSDL, workflow

Page 28:

Data Management System Features

• Data grid for managing distributed data
  • Latency management for bulk analyses of collections
  • Infrastructure independent name spaces for describing data, resources, users, and state information
• Digital library for managing data context
  • Curation services for managing collections
  • Descriptive metadata for discovery
• Persistent archive to manage technology evolution
  • Interoperability mechanisms between heterogeneous storage systems and user access mechanisms

Page 29:

BIRN - Biomedical Informatics Research Network Data Grid

[Figure: BIRN network diagram. Recoverable labels: Duke; UCLA; Cal Tech; Wash U.; Harvard; NIH/NCRR Centers for Imaging and Computing; Cal-(IT)2; NPACI/SDSC; “Deep Web”; “Surface Web”; Wireless “Pad” Web Interface]

Integrating Cyber Infrastructure to Link:
• Advanced Imaging Instruments
• Data Intensive Computing
• Multi-Scale Brain Databases

Page 30:

Digital Library

• Collection hierarchy for organizing data
  • User-defined metadata
  • Collection level metadata
• Metadata manipulation
  • Schema extension
  • Bulk metadata processing
  • Queries on metadata
  • Access controls on metadata
  • Views on collections
• Digital library APIs
  • DSpace, Fedora, OAI-PMH, web browsers
  • METS metadata XML schema

Page 31:

Southern California Earthquake Center

Store seismic data
• Managing over 90 TBs, over 1.7 million files
• Store community models for seismic velocity
• Data distributed between USC, SDSC

SCEC community digital library
• Storage Resource Broker data grid technology
• NMI portal interface
• Digital library services to display seismograms
• Visualizations of seismic waves at the surface
• Visualization of seismic wave propagation through the volume

[Figure: SCEC Community Library. Recoverable labels: Select Receiver (Lat/Lon); Output Time History Seismograms; Select Scenario; Fault Model; Source Model]

Page 32:

National Virtual Observatory

Provide access to large star catalogs and large image sky surveys

• 2MASS
• SDSS
• DPOSS
• USNO-B
• Macho

[Figure: Virtual Observatory Architecture. Recoverable labels: Portals, User Interfaces, Tools; SkyQuery; VOPlot; OASIS; conVOT; Topcat; Mirage; Aladin; DIS; Discover, Compute, Publish, Collaborate; Registry Layer; Semantics (UCD); OpenSkyQuery; Data Services; Compute Services; data mining; visualization; image; source detection; crossmatch; Bulk Access; interfaces to data; SIAP, SSAP; VOTable; FITS, GIF, …; HTTP Services, SOAP Services, Grid Services; stateless, registered, self-describing, persistent, authenticated; Workflow (pipelines); Virtual Data; Authentication & Authorization; My Space storage services; Databases, Persistency, Replication; Digital Library; Other registries; XML, DC, METS; OAI; ADS; Existing Data Centers; Grid Middleware: SRB, Globus, OGSA, SOAP, GridFTP; Disks, Tapes, CPUs, Fiber]

Page 33:

National Science Digital Library

Web Interface to Persistent Archive

Preserve educational material that has been registered into a central repository at Cornell through URLs
• Crawl web and retrieve material, 10 levels of indirection
• Convert internal URLs into data grid handles
• Aggregate files into containers for storage
• Preserve using SRB data grid technology
• Currently housing over 26 million files
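Aggregating many small files into a container, with a pointer to each file's location, can be sketched as below. The byte-concatenation layout, the function names, and the sample files are assumptions made for illustration; the slides do not describe the SRB container format.

```python
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]   # file name -> (offset, length)


def pack(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one container and record where each sits."""
    blob = bytearray()
    index: Index = {}
    for name in sorted(files):
        index[name] = (len(blob), len(files[name]))
        blob.extend(files[name])
    return bytes(blob), index


def extract(container: bytes, index: Index, name: str) -> bytes:
    """Read one file back out of the container using its recorded location."""
    offset, length = index[name]
    return container[offset:offset + length]


container, index = pack({"a.html": b"<p>a</p>", "b.css": b"p{}"})
print(extract(container, index, "b.css"))
```

Storing one large object instead of millions of small ones reduces the number of operations against the archive, which matters most for tape-based storage.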

Page 34:

National Archives and Records Administration - Research Prototype Persistent Archive

Federation of Three Independent Data Grids

• NARA (MCAT): Principal copy stored at NARA with complete metadata catalog
• U Md (MCAT): Replicated copy at U Md for improved access, load balancing and disaster recovery
• SDSC (MCAT): Deep Archive at SDSC, no user access, but complete copy

Demonstrate preservation environment
• Authenticity
• Integrity
• Management of technology evolution
• Mitigation of risk of data loss
  • Replication of data
  • Federation of catalogs
• Management of preservation metadata
• Scalability
  • EAP collection
  • 350,000 files
  • 1.2 TBs in size

Page 35:

For More Information

Reagan W. Moore
San Diego Supercomputer Center

moore@sdsc.edu

http://www.npaci.edu/DICE

http://www.npaci.edu/DICE/SRB

http://www.npaci.edu/dice/srb/mySRB/mySRB.html