SAN DIEGO SUPERCOMPUTER CENTER UC SAN DIEGO LIBRARIES NDIIPP PARTNERS MEETING 07.09.08 David Minor SDSC Robert H. McDonald SDSC Sangchul Song UMIACS Bryan Beecher ICPSR Justin Littman LC Chronopolis in Practice

Dec 22, 2015


David Minor, SDSC

Robert H. McDonald, SDSC

Sangchul Song, UMIACS

Bryan Beecher, ICPSR

Justin Littman, LC

Chronopolis in Practice


Outline

• Current Chronopolis Implementation

• Accomplishments (2/08 – Present)
  – Ingested Content
  – Transmission

• Technologies for Ingest
  – ICPSR – SRB
  – CDL – BagIt
  – NCSU – BagIt

• Technologies for Integrity
  – Audit Control Environment

• Questions


Chronopolis Implementation

[Architecture diagram, adapted from Bryan Banister (SDSC). Labels shown: Sun 6140 62 TB storage; Sun SAM-QFS; Apple Xsan; Tape Silos; three sets of SRB D-Brokers, each with an SRB MCAT; CDL Server; ICPSR Server; SDSC, NCAR, UMD (Maryland), ICPSR and UC Berkeley networks; Chronopolis Data 12-25 TB; Chronopolis Data 12 TB.]


Key Deliverables 07/08

7.2 - A well-integrated network and data grid for content sharing among CDL and ICPSR supporting sustained high-capacity transfer rates.

7.3 - An integrated set of monitoring tools for the Chronopolis Data Grid using the replication monitor, ACE, and INCA for the Library community.

7.5 - A Dissemination Information Package (DIP) for content submitted by both ICPSR and CDL will be available for both ICPSR and CDL to retrieve their content from the Chronopolis gateway.

7.7 - An ingested content collection from ICPSR of 12-15 TB

7.8 - An ingested content collection from CDL of 25 TB


7.5 Deliverable Refinements

7.5 A Dissemination Information Package (DIP) for content submitted by both ICPSR and CDL will be available for both ICPSR and CDL to retrieve their content from the Chronopolis gateway.

Two Components Emerging

• Component 1: DIP based on BagIt structure

• Component 2: DIP that supports a transmission package for loading into Fedora repository software


Accomplishments (2/08-Present)

• NDIIPP Client Ingested Content
  – ICPSR – 5 TB (Staging)
  – CDL – 4 TB (Staging)

• Chronopolis Replicated Content
  – SDSC to UMIACS – 3 TB (Copy 2)
  – SDSC to NCAR (forthcoming)

• Transmission Speed – Ingest
  – ICPSR – Approx. 1 TB per day
  – CDL – BagIt tests using LC Python scripts (15 processes)
    • City Bag – 46.22 Mb/sec – 498.96 GB per day
    • State Bag – 42.88 Mb/sec – 463.10 GB per day
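The per-day figures follow from the sustained rates if "Mb" is read as megabits and decimal units are used (1 GB = 1000 MB). The slide does not define the unit, so that reading is an assumption. A minimal Python sketch of the arithmetic (the function name is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day(megabits_per_second: float) -> float:
    """Convert a sustained rate in megabits/sec to decimal gigabytes per day."""
    megabytes_per_second = megabits_per_second / 8  # 8 bits per byte
    return megabytes_per_second * SECONDS_PER_DAY / 1000  # 1000 MB per GB


if __name__ == "__main__":
    for name, rate in [("City Bag", 46.22), ("State Bag", 42.88)]:
        print(f"{name}: {rate} Mb/sec is about {gb_per_day(rate):.2f} GB per day")
```

Under this reading the State Bag rate reproduces the slide's 463.10 GB per day. The City Bag rate works out to roughly 499.18 GB per day, slightly above the 498.96 GB on the slide; that figure would correspond to 46.20 Mb/sec, so one of the two City Bag numbers was presumably rounded.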


New Partners

• N.C. State
  – GIS data @ 5 TB
  – Already working with BagIt format

• Scripps Institution of Oceanography
  – Data @ 2 TB
  – Already working with SRB


Technologies for Ingest/Replication

• SRB to SRB Connections
  – ICPSR – Client
  – Scripps – Client
  – UMIACS – Chronopolis Partner
  – NCAR – Chronopolis Partner

• BagIt Transfers
  – CDL
  – NC State


Transfer Methodology (ICPSR – Client)

• Synchronize collections of content with SDSC’s storage grid
  – Original scope was just our web-delivered content
    • Compressed
    • 400 GB
    • Tens of thousands of files
  – Since then we have copied our complete holdings
    • Uncompressed
    • 5000 GB
    • Millions of files


Transfer method

• SRB utilities are the base
  – Sput
  – Srsync

• Cannot use the utilities “out of the box”
  – Too many files
  – Too many timeouts

• Wrap the utilities with some simple shell script grouping


Example

• Metadata resides in Oracle; dump it nightly to SRB

    Sput -fK /path/to/oracle/export s:/SDSC-chron/icpsr.umich/database

• Files reside elsewhere and there are LOTS
  – Wrap Sinit, Srsync and Sexit in a script, Ssend
  – Invoke via a mechanism like this:

    find /archive <criteria> | xargs -n 3 -P 0 Ssend

  – Select a bunch of “just big enough” directories to feed into Ssend, and not too many at a time
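The grouping idea above can be sketched in Python as follows. This is an illustration under assumptions, not the script ICPSR used: `batches` and `send_all` are hypothetical names, and the demo runs a harmless stand-in command rather than the real Ssend wrapper. `per_call` plays the role of `xargs -n`, and `max_parallel` caps concurrent invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items (like xargs -n)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def send_all(command: List[str], directories: Sequence[str],
             per_call: int = 3, max_parallel: int = 4) -> List[int]:
    """Run `command` once per batch of directories, a few batches at a time.

    Returns the exit code of each invocation, in batch order.
    """
    def run(batch: List[str]) -> int:
        return subprocess.run(command + batch).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, batches(directories, per_call)))


if __name__ == "__main__":
    # Placeholder directory names; in real use the command would be ["Ssend"].
    dirs = ["/archive/a", "/archive/b", "/archive/c", "/archive/d"]
    stand_in = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
    print(send_all(stand_in, dirs, per_call=3, max_parallel=2))
```

One design difference: with GNU xargs, `-P 0` starts as many processes as it can, whereas this sketch bounds concurrency explicitly, in the spirit of "not too many at a time". A non-zero exit code in the returned list marks a batch worth retrying.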


BagIt

• Motivating use cases:
  – Transfer of content internally and between preservation partners
  – Long-term storage of content

• Needs:
  – Minimally self-identifying and self-describing packages
  – Support for error detection and transfer optimization

• Characteristics:
  – Low overhead
  – Content agnostic
  – Supported by off-the-shelf tools (e.g., MD5Deep)


• Informed by
  – LC's eDeposit Pilot Project
  – NDIIPP Archive Ingest and Handling Test (AIHT)
  – Tabata et al., “Enclose-and-Deposit Method,” IWAW ’05

• Documented at
  – www.ietf.org/internet-drafts/draft-kunze-bagit-01.txt
  – www.cdlib.org/inside/diglib/bagit/bagitspec.html


• Basic bag:

    <bag_dir>/
        bagit.txt
        manifest-<algorithm>.txt
        [optional additional tag files]
        data/
            [content file hierarchy]

• Bag parts:
  – bagit.txt: Bag signature
  – manifest-<algorithm>.txt: List of content files and fixities
    • Example, manifest-md5.txt:

        49afbd86a1ca9f34b677a3f09655eae9 data/27613-h/images/q172.png
        408ad21d50cef31da4df6d9ed81b01a7 data/27613-h/images/q172.txt

  – package-info.txt: Bag contents metadata (optional)
  – fetch.txt: Bag contents included by reference (optional)
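A receiver can use the manifest for error detection by recomputing each listed checksum. The Python sketch below is a minimal illustration, not the LC transfer scripts: `md5_of` and `verify_bag` are hypothetical names, it handles only `manifest-md5.txt`, and it does not inspect `bagit.txt`, `fetch.txt`, or unlisted files under `data/`.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_dir: Path) -> Dict[str, str]:
    """Check every entry in manifest-md5.txt and return {path: problem}.

    An empty result means each listed file exists and matches its checksum.
    """
    problems: Dict[str, str] = {}
    manifest = bag_dir / "manifest-md5.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split(None, 1)  # "<checksum> <path>"
        relative = relative.strip()
        target = bag_dir / relative
        if not target.is_file():
            problems[relative] = "missing"
        elif md5_of(target) != expected.lower():
            problems[relative] = "checksum mismatch"
    return problems


if __name__ == "__main__":
    # Build a tiny throwaway bag and verify it.
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        payload = bag / "data" / "hello.txt"
        payload.write_bytes(b"hello\n")
        (bag / "manifest-md5.txt").write_text(
            f"{md5_of(payload)} data/hello.txt\n", encoding="utf-8")
        print(verify_bag(bag))
```

Reading files in chunks keeps memory use flat for large content files, which matters for multi-terabyte collections like those described earlier.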


UNIVERSITY of MARYLAND INSTITUTE for ADVANCED COMPUTER STUDIES

ACE – Auditing Control Environment

• Software to ensure the long-term integrity of digital objects.

• Underpinnings are based on rigorous cryptographic techniques and on third-party integrity management and auditing.

• Automatic regular audits based on policies set by the archive manager.

• Scalable, cost-effective, and can interoperate with any archiving architecture.


ACE – System Architecture

[Architecture diagram. Labels shown: two Archiving Nodes (hdd, cd-rom, tape drive), each with an ACE Audit Manager, an Audit Policy and a Token Registry; request and reply arrows; a Third-Party Integrity Management System holding Crypto Summary Information; witnesses.]


ACE Audit

Each digital object is periodically audited using the integrity token, according to the policy set by the local manager.

Cryptographic summaries are audited as necessary by the archive or an independent party using the published witness values.
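The slides do not specify the format of an integrity token or a cryptographic summary. One common way to build such a pair is a hash tree: object digests are combined into a single summary value, and each object's token holds the sibling hashes needed to recompute it. The Python sketch below illustrates that generic technique under this assumption; it is not ACE's actual implementation, and every name in it is hypothetical.

```python
import hashlib
from typing import List, Tuple

Token = List[Tuple[str, bytes]]  # (side of the sibling: "L" or "R", sibling hash)


def h(data: bytes) -> bytes:
    """SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, from the leaf hashes up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:  # odd count: pair the last node with itself
            level = level + [level[-1]]
        levels.append([h(level[i] + level[i + 1])
                       for i in range(0, len(level), 2)])
    return levels


def make_token(levels: List[List[bytes]], index: int) -> Token:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    token: Token = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # neighbour within the same pair
        token.append(("L" if sibling < index else "R", level[sibling]))
        index //= 2
    return token


def audit(obj: bytes, token: Token, summary: bytes) -> bool:
    """Recompute the summary from the object and its token, then compare."""
    node = h(obj)
    for side, sibling in token:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node == summary


if __name__ == "__main__":
    objects = [b"object-1", b"object-2", b"object-3"]
    levels = build_levels([h(o) for o in objects])
    summary = levels[-1][0]
    token = make_token(levels, 1)
    print(audit(objects[1], token, summary))   # intact object
    print(audit(b"tampered", token, summary))  # altered object
```

In this arrangement the archive needs only the object and its token for a routine audit, while the summary value is the piece an independent party would hold or publish.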


ACE Screen Shots

[Screenshots: Adding a Collection; Auditing a Collection; Viewing an Error Report. Callouts: Status Pane (Overview), showing "Last audit: successful"; Action Pane (Collection Specific). Actions shown: Start Auditing, Edit Collection Location, Remove Collection, Browse Collection, View Events, View Error Report.]


Q and A
