Data Publishing Service Indiana University Stacy Kowalczyk April 9, 2010
Transcript
Page 1

Data Publishing Service Indiana University

Stacy Kowalczyk

April 9, 2010

Page 2

Questions

• Which phases of the data life cycle are managed by your repository?

• How do data management requirements differ across the data life cycle?

• What systems do you use to support the data life cycle?

• Can you generalize the mechanisms used to migrate data between different phases of the data life cycle?

Page 3

Data Publishing Service

• A new service of the IUScholarWorks institutional repository and the Scholarly Data Services

• Providing data management support and data access

• Data will have a persistent URL so it can be linked to publications

• The service will combine our DSpace repository with IU’s Scholarly Data system (formerly known as MDSS), a system that researchers already use

• Allows discovery over the Web

• Preservation – bit level
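Bit-level preservation is commonly verified with fixity checks: a checksum is recorded when a file is deposited and recomputed later to confirm that the bits are unchanged. The sketch below illustrates that general idea only; it is not the service's implementation, and the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading in chunks so large datasets fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_checksum: str, algorithm: str = "sha256") -> bool:
    """Compare the current checksum with the one recorded at deposit time (hypothetical helper)."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

A repository would typically store the recorded checksum alongside the item's metadata and rerun the check on a schedule.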

Page 4

Current Data Lifecycle Model Implementation

Scholarly Data Service

Data creation

• research design

• data management planning

• data collection (surveying, experimentation, measuring etc.)

• data checking and cleaning

↓

Data analysis

• analysis

• derived data creation

• creation of data documentation

↓

End of research

• research outputs

• preparing data for preservation

IU ScholarWorks

Preservation of data

• storage of data

• migration to suitable format/medium

• metadata creation

↓

Distribution/publication of data

↓

Re-use of data

• by same researcher

• by other researchers

http://www.data-archive.ac.uk/sharing/lifecycle.asp
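The division of the life cycle between the two systems can be written down as a small lookup table. This is a sketch based on one reading of the slide (the first three phases under the Scholarly Data Service, the last three under IU ScholarWorks); the names `System`, `LIFECYCLE`, and `handoffs` are hypothetical.

```python
from enum import Enum


class System(Enum):
    SCHOLARLY_DATA_SERVICE = "Scholarly Data Service"
    IU_SCHOLARWORKS = "IU ScholarWorks"


# Phases in slide order, each paired with the system assumed to manage it.
LIFECYCLE = [
    ("Data creation", System.SCHOLARLY_DATA_SERVICE),
    ("Data analysis", System.SCHOLARLY_DATA_SERVICE),
    ("End of research", System.SCHOLARLY_DATA_SERVICE),
    ("Preservation of data", System.IU_SCHOLARWORKS),
    ("Distribution/publication of data", System.IU_SCHOLARWORKS),
    ("Re-use of data", System.IU_SCHOLARWORKS),
]


def handoffs():
    """Return the consecutive phase pairs where the managing system changes."""
    return [
        (a_phase, b_phase)
        for (a_phase, a_sys), (b_phase, b_sys) in zip(LIFECYCLE, LIFECYCLE[1:])
        if a_sys is not b_sys
    ]


print(handoffs())  # [('End of research', 'Preservation of data')]
```

Under this reading there is a single handoff point, which is where data would migrate from one system to the other.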

Page 5

Scholarly Data Service

• Massive Data Storage System

• Current system for research data storage

• Installed in 1998

• Based on IBM developed High Performance Storage System (HPSS) software

• It offers over 2.8 petabytes of disk- and tape-based storage, distributed between the Indianapolis and Bloomington campuses

Page 6

[Diagram: storage system distributed between IUB and IUPUI. Labeled components: IUB Subsystem and IUPUI Subsystem; Bloomington Users and Indianapolis Users; IUB Campus Network and IUPUI Campus Network; Research Network; TCP/IP Wide Area Network; HPSS Movers; HPSS Core Servers; SAN; Disk Arrays; Tape Library.]

Page 7

Data Publishing in IUScholarWorks

• Discovery and access of datasets and related publications through the IUScholarWorks Repository service

• DSpace records that are searchable, indexed, and harvested and available at stable URLs

• DSpace records that contain DSpace bitstreams for small datasets

• DSpace records that link via stable URLs to large datasets in IU MDSS
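The split between small and large datasets can be pictured as a routing decision made when a dataset is attached to a record. The following is a minimal sketch of that decision, not DSpace code: the `ItemRecord` class, the size cutoff, and the base URL are all hypothetical, since the slides do not state how "small" and "large" are distinguished.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff; the actual threshold is not given in the slides.
SMALL_DATASET_LIMIT_BYTES = 2 * 1024**3


@dataclass
class ItemRecord:
    """Simplified stand-in for a repository record."""
    title: str
    bitstreams: list = field(default_factory=list)    # files stored with the record
    dataset_urls: list = field(default_factory=list)  # stable links to externally stored data


def attach_dataset(record: ItemRecord, filename: str, size_bytes: int,
                   mdss_base_url: str = "https://mdss.example.org/data") -> ItemRecord:
    """Store small datasets as bitstreams; link large ones by URL (placeholder base URL)."""
    if size_bytes <= SMALL_DATASET_LIMIT_BYTES:
        record.bitstreams.append(filename)
    else:
        record.dataset_urls.append(f"{mdss_base_url.rstrip('/')}/{filename}")
    return record
```

Either way the record itself stays searchable and keeps a stable URL; only the location of the data differs.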

Page 8

[Diagram: IUScholarWorks Data: Linking to MDSS and delivery via HTTP. An item record holds the URLs of datasets in MDSS. Labeled components on the IU MDSS side: MDSS web server, HTTP Server, hpssfs filesystem.]

Page 9

Data Publishing in IUScholarWorks

• Facilitating the submission process for both the researcher and collection manager

• We facilitate the process for submitters via the DSpace Configurable Submission system

• We facilitate the data collection manager’s process via steps in the DSpace workflow system

Page 10

[Diagram: IUScholarWorks Data: Item submission user interface. Phase 2, automated workflow. Labeled steps in the DSpace Configurable Submission System: Instructions and preparation; Describe item metadata form(s); Review step; File upload step; MDSS and dataset info/form; Finalize/Accept License. Labeled non-interactive processing steps, connected to IU MDSS: Initiate MDSS actions (move datasets, etc.); Query MDSS technical metadata (checksum, etc.); Update metadata.]
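The non-interactive processing steps can be sketched as a short pipeline that runs after the submitter finishes. This is an illustration only: the step order (initiate MDSS actions, query technical metadata, update metadata) is an assumption, and `InMemoryMDSS` is a hypothetical stand-in rather than a real MDSS or DSpace interface.

```python
class InMemoryMDSS:
    """Hypothetical stand-in for MDSS that keeps checksums in a dict."""

    def __init__(self, checksums):
        self._checksums = dict(checksums)
        self.moved = []

    def move_dataset(self, name):
        """Pretend to move a dataset into its published location."""
        self.moved.append(name)
        return f"/published/{name}"

    def technical_metadata(self, name):
        """Return technical metadata for a dataset (only a checksum here)."""
        return {"checksum": self._checksums[name]}


def process_submission(item, mdss):
    """Run the three non-interactive steps and return the updated item metadata."""
    name = item["dataset_name"]
    location = mdss.move_dataset(name)         # initiate MDSS actions
    technical = mdss.technical_metadata(name)  # query MDSS technical metadata
    return {**item, "mdss_location": location, **technical}  # update metadata
```

Returning a new dictionary rather than mutating the input keeps the submitted metadata available if a later step has to be retried.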

Page 11

Planning for a More Curated Life Cycle Model

http://libraries.mit.edu/guides/subjects/data-management/cycle.html

Page 12

Active and Social Curation

• Engage researchers during projects, not at the end

• Use immediate benefits to drive automatic capture and 'volunteering' of metadata

• Reduce costs by re-engineering curation processes to leverage this rich metadata and volunteered effort

Page 13

[Diagram: Active Curation and an OAIS Repository Federation, separated by a Curation Boundary. Labeled components: User; Contributor; Active Data Systems; Data Acquisition, Analysis and Simulation; Search, Browse, Annotation, Visualization Tools; Use, Reuse, Repurposing Tools; Wide-Area File System; Metadata Management (DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, …); Automated Curation Workflow/Rule Engine (operates on metadata, content objects and trigger events); Compound Objects - OAI-ORE; Ingest, AIPs; Dissemination Packages; Trusted Digital Repository Federation (OAIS compliant); Appraisal and Selection; Preservation Actions; Migration and Emulation Tools; Access Mechanisms and E-Scholarship Services.]