RDAP13 John Kunze: The Data Management Ecosystem

The Data Management Ecosystem

4 April 2013

University of California Curation Center, California Digital Library

The research data problem

• Journal article

– Uniquely and persistently identified

– Concept of “publish”

– Multiple copies

– Easily findable

– Services: impact metrics, citation tracking, etc.

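• Research data

– Uniquely and persistently identified? Nope

– Concept of “publish”? Not really

– Multiple copies? Typically one

– Easily findable? Difficult

– Services (impact metrics, citation tracking, etc.)? Nope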

Research data is seen as a second-class citizen in the scholarly record.

An ecosystem of inter-dependent partners

Besides data repository and publisher partners...

• researchers

• educators

• citizen science groups

• funders

• tenure and promotion committees

Libraries as neutral connection partners

Where can libraries make a difference?

Research & Scholarship Lifecycle

Collect > Publish > Share > Save > Research (create knowledge)

Capture today’s web; build tomorrow’s archives

Create, edit, share, and save data management plans

Open source curation add-in for Microsoft Excel


Create and manage persistent identifiers: ARKs, DOIs, etc.

An infrastructure to publish and get credit for sharing research data


Curation repository: store, manage, preserve, and share research data

Open deposit, open access repository for spreadsheet data

Data Observation Network for Earth


What’s missing to complete the “incentive” circuit?

• Impact measures, citation tracking

“Connecting the data to the research it informs”

Altmetrics tools to measure non-traditional products and uses, etc.

Stable storage: Merritt repository

• Curation repository open to the UC community and beyond

• Discipline / content agnostic

• Micro-services architecture

• Easy-to-use UI or API

• Hosted or locally deployed

Primary Functions

1. Deposit

2. Manage (metadata, versions, etc.)

3. Access (expose)

4. Share (with other researchers)

5. Preserve
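
The deposit, manage, access, and preserve functions listed above can be illustrated with a small sketch. This is not Merritt's actual API: `ToyRepository` and its method names are hypothetical, and the store lives in memory only. It assumes that each deposit creates a new version and that preservation is approximated by re-checking a SHA-256 checksum.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyRepository:
    """Hypothetical in-memory stand-in for a curation repository."""

    # object id -> list of versions; each version is (metadata, payload, sha256)
    objects: dict = field(default_factory=dict)

    def deposit(self, object_id: str, payload: bytes, metadata: dict) -> int:
        """Store a new version and return its 1-based version number."""
        digest = hashlib.sha256(payload).hexdigest()
        versions = self.objects.setdefault(object_id, [])
        versions.append((dict(metadata), payload, digest))
        return len(versions)

    def access(self, object_id: str, version: int = -1) -> bytes:
        """Return the payload of one version (the latest by default)."""
        index = version - 1 if version > 0 else -1
        return self.objects[object_id][index][1]

    def verify(self, object_id: str) -> bool:
        """Preservation check: recompute each checksum and compare."""
        return all(
            hashlib.sha256(payload).hexdigest() == digest
            for _, payload, digest in self.objects[object_id]
        )
```

Depositing the same object id twice yields versions 1 and 2; `access` returns the latest unless a version number is given, and `verify` reports whether every stored version still matches its recorded checksum.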

EZID: Long term identifiers made easy

• Precise identification of a dataset (DOI or ARK)

• Credit to data producers and data publishers

• A link from the traditional literature to the data (DataCite)

• Exposure and research metrics for datasets (Web of Knowledge, Google)

Primary Functions

1. Create persistent identifiers

2. Manage identifiers (and associated metadata) over time

3. Resolve identifiers
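
The create, manage, and resolve functions can be sketched as a toy registry. This is not the EZID API: `ToyIdentifierRegistry`, its methods, and the `ark:/99999/fk4` prefix are placeholders chosen for illustration. The point it tries to show is that the identifier string stays fixed while the location it resolves to can change.

```python
import itertools


class ToyIdentifierRegistry:
    """Hypothetical in-memory identifier registry (illustration only)."""

    def __init__(self, prefix="ark:/99999/fk4"):
        self.prefix = prefix              # placeholder prefix, an assumption
        self._counter = itertools.count(1)
        self._records = {}                # identifier -> metadata dict

    def create(self, target_url, **metadata):
        """Mint a new identifier that points at target_url."""
        identifier = f"{self.prefix}{next(self._counter):04d}"
        self._records[identifier] = {"target": target_url, **metadata}
        return identifier

    def manage(self, identifier, **changes):
        """Update the target or other metadata; the identifier is unchanged."""
        self._records[identifier].update(changes)

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._records[identifier]["target"]
```

If a dataset moves, calling `manage(identifier, target=new_url)` keeps existing citations working, since `resolve` returns the new location for the same identifier.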

Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation

Discovery: DataCite consortium

• Technische Informationsbibliothek (TIB), Germany

• Australian National Data Service (ANDS)

• The British Library

• California Digital Library, USA

• Canada Institute for Scientific and Technical Information (CISTI)

• L’Institut de l’Information Scientifique et Technique (INIST), France

• Library of the ETH Zürich

• Library of TU Delft, The Netherlands

• Office of Scientific and Technical Information, US Department of Energy

• Purdue University, USA

• Technical Information Center of Denmark

Member Nodes

• diverse institutions

• serve local community

• provide resources for managing their data

New distributed framework

Coordinating Nodes

• retain complete metadata catalog

• subset of all data

• perform basic indexing

• provide network-wide services

• ensure data availability (preservation)

• provide replication services
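
The split between member nodes and coordinating nodes can be sketched roughly as follows. This is not DataONE software: the class names, the `min_copies` parameter, and the replication policy are assumptions made for illustration. The sketch assumes a coordinating node keeps a metadata catalog and copies each object to further member nodes until a minimum number of replicas exists.

```python
class MemberNode:
    """Hypothetical member node holding data for its local community."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # object id -> payload


class CoordinatingNode:
    """Hypothetical coordinating node: metadata catalog plus replication."""

    def __init__(self, members, min_copies=2):
        self.members = members
        self.min_copies = min_copies  # assumed replication policy
        self.catalog = {}             # object id -> metadata

    def register(self, object_id, metadata):
        """Record metadata for an object in the network-wide catalog."""
        self.catalog[object_id] = metadata

    def holders(self, object_id):
        """List the member nodes that currently hold a copy."""
        return [m for m in self.members if object_id in m.store]

    def replicate(self):
        """Copy under-replicated objects to other member nodes."""
        for object_id in self.catalog:
            holders = self.holders(object_id)
            if not holders:
                continue  # nothing to copy from
            payload = holders[0].store[object_id]
            for node in self.members:
                if len(holders) >= self.min_copies:
                    break
                if node not in holders:
                    node.store[object_id] = payload
                    holders.append(node)
```

With three member nodes, one object held by a single node, and `min_copies=2`, a call to `replicate()` copies the object to one additional node and leaves the third untouched.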

Flexible, scalable, sustainable network

The rest of the story

www.cdlib.org/uc3

[email protected]

[email protected] for service questions
