The Data Management Ecosystem
4 April 2013
University of California Curation Center
California Digital Library
The research data problem
• Journal article
– Uniquely and persistently identified
– Concept of “publish”
– Multiple copies
– Easily findable
– Services: impact metrics, citation tracking, etc.
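• Research data
– Uniquely and persistently identified: nope
– Concept of “publish”: not really
– Multiple copies: typically one
– Easily findable: difficult
– Services (impact metrics, citation tracking, etc.): nope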
Research data is seen as a second-class citizen in the
scholarly record.
An ecosystem of inter-dependent partners
Besides data repository and publisher partners...
• researchers
• educators
• citizen science groups
• funders
• tenure and promotion committees
Libraries as neutral connection partners
Where can libraries make a difference?
[Figure: Research & Scholarship Lifecycle. Labels: Collect, Publish, Share, Save, Research, Create Knowledge]
Collect > Publish > Share > Save > Research
Capture today’s web; build tomorrow’s archives
Create, edit, share, and save data management plans
Open source curation add-in for Microsoft Excel
Collect > Publish > Share > Save > Research
Create and manage persistent identifiers: ARKs, DOIs, etc.
An infrastructure to publish and get credit for sharing research data
Collect > Publish > Share > Save > Research
Curation repository: store, manage, preserve, and share research data
Open deposit, open access repository for spreadsheet data
Data Observation Network for Earth
Collect > Publish > Share > Save > Research
What’s missing to complete the “incentive” circuit?
• Impact measures, citation tracking
“Connecting the data to the research it informs”
Altmetrics tools to measure non-traditional products and uses, etc.
Stable storage: Merritt repository
• Curation repository open to the UC community and beyond
• Discipline / content agnostic
• Micro-services architecture
• Easy-to-use UI or API
• Hosted or locally deployed
Primary Functions
1. Deposit
2. Manage (metadata, versions, etc.)
3. Access (expose)
4. Share (with other researchers)
5. Preserve
EZID: Long term identifiers made easy
• Precise identification of a dataset (DOI or ARK)
• Credit to data producers and data publishers
• A link from the traditional literature to the data (DataCite)
• Exposure and research metrics for datasets (Web of Knowledge, Google)
Primary Functions
1. Create persistent identifiers
2. Manage identifiers (and associated metadata) over time
3. Resolve identifiers
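As a rough illustration of the "resolve" function, the sketch below turns a DOI or ARK string into a URL at a public resolver (doi.org for DOIs, n2t.net for ARKs). It does not call EZID itself; the function name `resolver_url` and the identifiers in the usage note are hypothetical placeholders chosen for illustration.

```python
from urllib.parse import quote

# Public resolver base URLs: doi.org for DOIs, n2t.net for ARKs.
RESOLVERS = {"doi": "https://doi.org/", "ark": "https://n2t.net/"}


def resolver_url(identifier: str) -> str:
    """Build a resolver URL for an identifier written as 'doi:...' or 'ark:/...'.

    Hypothetical helper for illustration; not part of any EZID client.
    """
    scheme, sep, rest = identifier.strip().partition(":")
    scheme = scheme.lower()
    if not sep or not rest or scheme not in RESOLVERS:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    # Percent-encode unusual characters but keep the slashes that
    # separate the prefix/NAAN from the local name.
    encoded = quote(rest, safe="/")
    if scheme == "doi":
        # DOIs resolve without the 'doi:' label in the URL path.
        return RESOLVERS["doi"] + encoded
    # ARKs keep the 'ark:' label in the URL path.
    return RESOLVERS["ark"] + "ark:" + encoded
```

For example, `resolver_url("doi:10.5072/FK2TEST")` yields `https://doi.org/10.5072/FK2TEST`, and `resolver_url("ark:/99999/fk4test")` yields `https://n2t.net/ark:/99999/fk4test`; both identifiers are made-up test values, not real datasets.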
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
Discovery: DataCite consortium
• Technische Informationsbibliothek (TIB), Germany
• Australian National Data Service (ANDS)
• The British Library
• California Digital Library, USA
• Canada Institute for Scientific and
Technical Information (CISTI)
• L’Institut de l’Information Scientifique et
Technique (INIST), France
• Library of the ETH Zürich
• Library of TU Delft, The Netherlands
• Office of Scientific and Technical
Information, US Department of Energy
• Purdue University, USA
• Technical Information Center of Denmark
Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
New distributed framework
Coordinating Nodes
• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services
Flexible, scalable, sustainable network
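The division of labor between member nodes and coordinating nodes can be pictured with a toy in-memory model. This is only a sketch under simplifying assumptions, not the network's actual software or API: the class names, method names, and the `copies` parameter are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Holds data for its local community (toy model)."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> data bytes


@dataclass
class CoordinatingNode:
    """Keeps the full metadata catalog and tracks where copies live (toy model)."""
    catalog: dict = field(default_factory=dict)    # identifier -> metadata
    locations: dict = field(default_factory=dict)  # identifier -> node names

    def register(self, node, identifier, data, metadata):
        # The member node stores the data; the coordinating node indexes it.
        node.objects[identifier] = data
        self.catalog[identifier] = metadata
        self.locations.setdefault(identifier, []).append(node.name)

    def replicate(self, identifier, nodes, copies=2):
        # Copy the object to further member nodes until `copies` nodes hold it.
        holders = self.locations[identifier]
        source = next(n for n in nodes if n.name in holders)
        for node in nodes:
            if len(holders) >= copies:
                break
            if node.name not in holders:
                node.objects[identifier] = source.objects[identifier]
                holders.append(node.name)
        return list(holders)
```

In this model, registering a dataset on one member node and then calling `replicate` with `copies=2` leaves the data on two member nodes, while the coordinating node holds only the metadata and the list of locations.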