Active Data Curation in Libraries: Issues and Challenges ASEE ELD Presentation June 27, 2011 William H. Mischo & Mary C. Schlembach


Dec 21, 2015

Page 1

Active Data Curation in Libraries: Issues and Challenges

ASEE ELD Presentation, June 27, 2011

William H. Mischo & Mary C. Schlembach

Page 2

Active Data Curation
• Curation is the active use of data. It is a lifecycle process.
• Curation requires discipline-specific knowledge and experience.
• Domain-dependent curation rules and preservation actions must be merged into the scientific workflow processes.
• Data ingest, descriptive metadata creation, preservation, and the management of digital object relationships need to be automated.
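To make "automated ingest and descriptive metadata creation" concrete, here is a minimal sketch in Python. It is not the Grainger Library's code. The record layout is hypothetical: it uses Dublin Core-style element names for descriptive metadata and stores a SHA-256 fixity value of the kind PREMIS records for preservation. A real Fedora/Hydra ingest would map these values into METS/PREMIS/MODS instead.

```python
import hashlib
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest(path: Path, creator: str, title: str) -> dict:
    """Build a minimal ingest record for one file (hypothetical record layout)."""
    data = path.read_bytes()
    mime, _ = mimetypes.guess_type(path.name)
    return {
        # Descriptive metadata: Dublin Core-style element names, filled automatically.
        "dc:title": title,
        "dc:creator": creator,
        "dc:format": mime or "application/octet-stream",
        "dc:date": datetime.now(timezone.utc).isoformat(),
        # Preservation metadata: a fixity checksum, the kind of value PREMIS records,
        # so later audits can detect corrupted or altered files.
        "fixity_sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "original_name": path.name,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "run01.csv"
        sample.write_bytes(b"t,v\n0,1.2\n")
        print(ingest(sample, creator="Example lab", title="Run 01"))
```

The only human input is the title and creator. Everything else is derived from the file, and that is the part worth automating.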

Page 3

The Grainger Library Active Data Curation Lifecycle Elements

(Lifecycle diagram; elements as recovered from the slide:)
• Scientific Workflow
• Fedora/Hydra Trusted Digital Repository (OAIS compliant)
• Preservation Actions
• Metadata Management: METS, PREMIS, MODS, DC, XSLT
• Curation Rule Engine: operates on metadata and content objects
  – Domain dependent
  – Can be invoked explicitly, but can also be automated based on system trigger events
• AIPs, OAI-ORE
• CI-3, CI-5 Responses
• Access Mechanisms and E-Scholarship Services, GrIPs
• DIP Packages
• SIP Packages
• Appraisal and Selection
• Migration and Emulation Tools
• Use, Reuse, Repurposing Tools
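The curation rule engine is described as domain dependent, invocable explicitly, and also triggered automatically by system events. The following is a minimal Python sketch of that dispatch pattern under those assumptions only. The `Rule` and `CurationRuleEngine` classes, the event names, and the netCDF migration rule are hypothetical illustrations, not the actual engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    name: str
    domain: str                        # e.g. "atmospheric-sciences" (hypothetical label)
    triggers: set[str]                 # system events that fire this rule automatically
    condition: Callable[[dict], bool]  # tested against an object's metadata
    action: Callable[[dict], str]      # preservation action; returns a log note


@dataclass
class CurationRuleEngine:
    rules: list[Rule] = field(default_factory=list)

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def _apply(self, rules: list[Rule], obj: dict) -> list[str]:
        # Run each rule's action only when its condition holds for this object.
        return [f"{r.name}: {r.action(obj)}" for r in rules if r.condition(obj)]

    def on_event(self, event: str, obj: dict) -> list[str]:
        """Automated path: run rules subscribed to this event for the object's domain."""
        matching = [r for r in self.rules
                    if event in r.triggers and r.domain == obj.get("domain")]
        return self._apply(matching, obj)

    def invoke(self, rule_name: str, obj: dict) -> list[str]:
        """Explicit path: a curator runs one named rule by hand."""
        return self._apply([r for r in self.rules if r.name == rule_name], obj)


if __name__ == "__main__":
    engine = CurationRuleEngine()
    engine.register(Rule(
        name="migrate-netcdf3",
        domain="atmospheric-sciences",
        triggers={"ingest", "format-alert"},
        condition=lambda o: o.get("format") == "netCDF-3",
        action=lambda o: f"queue migration of {o['id']} to netCDF-4",
    ))
    obj = {"id": "obj-42", "domain": "atmospheric-sciences", "format": "netCDF-3"}
    print(engine.on_event("ingest", obj))
```

Keeping the domain on each rule is what makes the engine "domain dependent". The same trigger can fire different rules for atmospheric and biophysics collections.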

Page 4

Say What?
• What is the role of the library? The engineering librarian? The campus? The subject discipline?
• Libraries are creating content asset preservation systems (Trusted Digital Repositories), e.g., Fedora/Hydra/Archivematica at the UIUC Library.
• Role for the science/engineering library: connecting data to literature.
• The knowledge creation process and libraries.
• GrIPs (Group Information Profiles).
• NSF Data Management Plans.
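One way to "connect data to literature" is an OAI-ORE aggregation, which the lifecycle slide lists. It groups an article with its datasets and cross-links them. The Python sketch below emits such links as N-Triples. The ORE and Dublin Core terms (`ore:ResourceMap`, `ore:describes`, `ore:Aggregation`, `ore:aggregates`, `dcterms:references`, `dcterms:isReferencedBy`) are real vocabulary terms. The `example.org` URIs and the function names are hypothetical. This is not presented as the library's implementation.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def link_data_to_article(rem: str, aggregation: str, article: str,
                         datasets: list[str]) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples for a simple ORE aggregation."""
    triples = [
        (rem, RDF_TYPE, ORE + "ResourceMap"),
        (rem, ORE + "describes", aggregation),
        (aggregation, RDF_TYPE, ORE + "Aggregation"),
        (aggregation, ORE + "aggregates", article),
    ]
    for ds in datasets:
        triples.append((aggregation, ORE + "aggregates", ds))
        # Cross-link both ways so data can be found from the paper and vice versa.
        triples.append((article, DCTERMS + "references", ds))
        triples.append((ds, DCTERMS + "isReferencedBy", article))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    base = "http://example.org/"  # hypothetical identifiers
    print(to_ntriples(link_data_to_article(
        base + "rem/1", base + "agg/1", base + "article/1", [base + "data/level4"])))
```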

Page 5

Page 6

What Data Should Be Curated?
• Defining data curation: DataNet projects: Data Conservancy (Johns Hopkins), DataONE (University of New Mexico).
• Purdue Data Curation Profiles.
• Raw data and processed data.
• We surveyed several groups in specific disciplines:
  – Atmospheric Sciences (experimental data)
  – Biophysics (simulation data)

Page 7

Atmospheric Science: Experimental Data
• Five levels and two data streams:
  – Level 1: raw voltages from an instrument
  – Level 2: calibrated data derived from raw voltages
  – Level 3: image products displaying the data
  – Level 4: derived parameters, statistics, etc. from calibrated data
  – Level 5: analysis of Level 4 data that ends up in papers, publications, etc.
• Two other necessary data streams: ancillary instrument information and metadata.
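Because each level is derived from an earlier one, a curator needs to record and trace those derivations. The sketch below models products with a level number and `derived_from` links, then walks a Level 5 result back to its raw Level 1 input. The `DataProduct` class, the `lineage` function, and the product ids are hypothetical. The level numbering follows the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    id: str
    level: int                          # 1..5, following the slide's scheme
    derived_from: tuple[str, ...] = ()  # ids of the products this one was made from


def lineage(product_id: str, catalog: dict[str, DataProduct]) -> list[str]:
    """Walk derived_from links back to raw inputs (depth-first, each id once)."""
    seen: list[str] = []
    stack = [product_id]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.append(pid)
        stack.extend(catalog[pid].derived_from)
    return seen


# Hypothetical catalog: raw voltages -> calibrated -> images / statistics -> analysis.
CATALOG = {
    "raw-001": DataProduct("raw-001", 1),
    "cal-001": DataProduct("cal-001", 2, ("raw-001",)),
    "img-001": DataProduct("img-001", 3, ("cal-001",)),
    "stats-001": DataProduct("stats-001", 4, ("cal-001",)),
    "paper-fig": DataProduct("paper-fig", 5, ("stats-001",)),
}

if __name__ == "__main__":
    print(lineage("paper-fig", CATALOG))
```

With this catalog, the trace for `paper-fig` visits `stats-001`, `cal-001`, and then `raw-001`. The Level 3 images are not on the path, because the analysis was derived from the statistics rather than from the images.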

Page 8

Biophysics: Simulation Data
• Modeling of atomic-level molecular interactions.
• Three levels:
  – Level 1: raw data from the simulation run (positions and velocities of particles); the simulation software is widely used.
  – Level 2: various raw data extracts of subsets of the particle run data.
  – Level 3: visualization files (movies, images) and analysis products generated from the visualization data for publication.
• Also necessary are input parameters (starting coordinates, etc.) and other metadata.
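A Level 2 extract is only reusable if it carries the input parameters and a pointer back to its Level 1 run. The Python sketch below assumes a toy in-memory frame layout, not any real simulation file format. It selects a subset of particles and returns a JSON sidecar alongside the extract. The `Frame` class, `extract_subset`, and the sidecar field names are hypothetical.

```python
import json
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Frame:
    positions: list[Vec3]   # one entry per particle
    velocities: list[Vec3]  # same ordering as positions


def extract_subset(frames: list[Frame], atom_ids: list[int],
                   run_id: str, input_params: dict) -> tuple[list[Frame], str]:
    """Make a Level 2 extract plus a JSON sidecar that keeps it interpretable."""
    subset = [
        Frame([f.positions[i] for i in atom_ids],
              [f.velocities[i] for i in atom_ids])
        for f in frames
    ]
    sidecar = {
        "derived_from": run_id,            # the Level 1 run this came from
        "atom_ids": atom_ids,              # which particles were kept
        "n_frames": len(subset),
        "input_parameters": input_params,  # e.g. starting coordinates, timestep
    }
    return subset, json.dumps(sidecar, indent=2, sort_keys=True)
```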

Page 9

Data Management Plan
• The Data Management Plan (DMP) is a new mandatory NSF supplementary document for all research proposals.
  – http://www.nsf.gov/bfa/dias/policy/dmp.jsp
• Each directorate, including the Engineering Directorate (ENG), is providing specific directions and required elements.
• The ENG document: http://nsf.gov/eng/general/ENG_DMP_Policy.pdf

Page 10

Data Management Plan
• The digital data to be archived include analyzed data (typically the data that go into articles and papers) and the metadata that describes the data that were generated.
• For Engineering Directorate grants, raw data from sensors or other instruments are not required to be archived.
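The two ENG statements above can be expressed as a simple appraisal rule. The Python sketch below encodes only what this slide says. Analyzed data and the metadata describing it are required. Raw sensor or instrument data is not required, so it is listed as optional rather than discarded. The `Item` class, the kind labels, and `archive_plan` are hypothetical, and the sketch is no substitute for the ENG policy text.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    kind: str  # hypothetical labels: "raw", "analyzed", or "metadata"


def archive_plan(items: list[Item]) -> dict[str, list[str]]:
    """Split items into required vs. optional archiving for an ENG grant."""
    plan: dict[str, list[str]] = {"required": [], "optional": []}
    for it in items:
        if it.kind in ("analyzed", "metadata"):
            plan["required"].append(it.name)
        elif it.kind == "raw":
            # Not required by ENG, but a lab may still choose to keep it.
            plan["optional"].append(it.name)
        else:
            raise ValueError(f"unknown kind {it.kind!r} for {it.name}")
    return plan


if __name__ == "__main__":
    print(archive_plan([Item("volts.bin", "raw"),
                        Item("stats.csv", "analyzed"),
                        Item("run-meta.xml", "metadata")]))
```

Raising an error on an unknown kind forces someone to classify each item, instead of silently dropping it from the plan.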

Page 11

Data Management Plan
• Maximum of two pages; the DMP does not count against the 15-page limit for proposals.
• The UIUC Grainger Library has prepared an overview document and template for DMPs and is working on a wizard.
• As part of the NSF Ethics CORE Digital Library, we are working on an RCR (Responsible Conduct of Research) requirement database and wizard.
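A DMP wizard essentially walks a researcher through template sections and flags gaps. The Python sketch below makes three assumptions. The section list is hypothetical, loosely paraphrasing the kinds of elements NSF guidance asks for, so check the NSF and ENG documents for the actual requirements. `max_words` is only a rough stand-in for the two-page limit. `build_dmp` and `DraftDMP` are illustrative names, not the Grainger Library's wizard.

```python
from dataclasses import dataclass

# Hypothetical section list for illustration only.
SECTIONS = [
    ("data_types", "What types of data will the project produce?"),
    ("standards", "What data and metadata formats and standards will be used?"),
    ("access", "How will data be accessed and shared?"),
    ("reuse", "What are the policies for reuse and redistribution?"),
    ("archiving", "How will data be archived and preserved?"),
]


@dataclass
class DraftDMP:
    text: str
    missing: list[str]
    word_count: int
    over_budget: bool


def build_dmp(answers: dict[str, str], max_words: int) -> DraftDMP:
    """Assemble answered sections in template order and flag gaps.

    max_words approximates the two-page limit; choose a value that matches
    your font and margins.
    """
    parts, missing = [], []
    for key, prompt in SECTIONS:
        answer = answers.get(key, "").strip()
        if not answer:
            missing.append(key)
            continue
        parts.append(f"{prompt}\n{answer}")
    text = "\n\n".join(parts)
    words = len(text.split())
    return DraftDMP(text, missing, words, words > max_words)
```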