Beyond text: New roles for libraries DataCite – Persistent links to scientific data using the DOI system Jan Brase, DataCite

Jan 12, 2016

Page 1: Beyond text: New roles for libraries DataCite –

Beyond text: New roles for libraries

DataCite –Persistent links to scientific data using the DOI system

Jan Brase, DataCite

Page 2: Beyond text: New roles for libraries DataCite –

Thousand years ago: science was empirical

describing natural phenomena

Last few hundred years: theoretical branch

using models, generalizations

Last few decades: a computational branch

simulating complex phenomena

Today: data exploration (eScience)

unify theory, experiment, and simulation

Jim Gray, eScience Group, Microsoft Research

[Equation graphic illustrating the theoretical branch; not recoverable from the transcript]

Science Paradigms

Page 3: Beyond text: New roles for libraries DataCite –

Scientific Information is more than a journal article or a book

Libraries should open their catalogues to any kind of information

The catalogue of the future is NOT ONLY a window to the library's holdings, but

a portal in a net of trusted providers of scientific content

Consequences for Libraries

Page 4: Beyond text: New roles for libraries DataCite –

We do not have it

BUT

We know where you can find it

And here is the link to it!

Page 5: Beyond text: New roles for libraries DataCite –

Simulation

Scientific Films

3D Objects

Grey Literature

Research Data

Software

Including non-classical publications

Page 6: Beyond text: New roles for libraries DataCite –

Why is this a role for libraries?

• Libraries have a history in bringing scientific information to the public

• Libraries have a tendency to be persistent

• A project will be forgotten in 40 years; the library will very likely still exist then

• Libraries are very trustworthy organisations

Page 7: Beyond text: New roles for libraries DataCite –

DataCite

Page 8: Beyond text: New roles for libraries DataCite –

High visibility of the content

Easy re-use and verification.

Scientific reputation for the collection and documentation of content (Citation Index)

Encouraging the Brussels declaration on STM publishing

Avoiding duplications

Motivation for new research

What if any kind of scientific content were citable?

Page 9: Beyond text: New roles for libraries DataCite –

Digital Object Identifiers (DOI names) offer a solution

Most widely used identifier for scientific articles

Researchers, authors, publishers know how to use them

Put datasets on the same playing field as articles

Dataset: Yancheva et al. (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840

URLs are not persistent

(e.g. Wren JD: URL decay in MEDLINE- a 4-year follow-up study. Bioinformatics. 2008, Jun 1;24(11):1381-5).

DOI names for citations
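
A DOI name printed in a citation becomes a working link once it is appended to the address of a DOI proxy. The following is a minimal sketch in Python, assuming the proxy http://dx.doi.org that this talk mentions for the content service; the function name doi_to_url is hypothetical and only serves the illustration.

```python
from urllib.parse import quote

DOI_PROXY = "http://dx.doi.org/"  # assumption: the proxy address used in this talk


def doi_to_url(doi: str) -> str:
    """Turn a DOI name such as 'doi:10.1594/PANGAEA.587840' into a proxy URL."""
    name = doi.strip()
    # Printed citations often carry a 'doi:' label in front of the name.
    if name.lower().startswith("doi:"):
        name = name[4:]
    # Every DOI name starts with the directory indicator '10.'.
    if not name.startswith("10."):
        raise ValueError(f"not a DOI name: {doi!r}")
    # Keep '/' readable and escape characters that are unsafe in a URL path.
    return DOI_PROXY + quote(name, safe="/")


print(doi_to_url("doi:10.1594/PANGAEA.587840"))
```

The link stays valid as long as the DOI record is maintained, even if the data centre moves the landing page to a different URL.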

Page 10: Beyond text: New roles for libraries DataCite –

Global consortium carried by local institutions

focused on improving the scholarly infrastructure around datasets and other non-textual information

focused on working with data centres and organisations that hold content

Providing standards, workflows and best-practice

Initially, but not exclusively, based on the DOI system

Founded December 1st 2009 in London

DataCite

Page 11: Beyond text: New roles for libraries DataCite –

[Diagram: International DOI Foundation; DataCite with its Managing Agent (TIB), Members and Associate Stakeholders; Member Institutions, each with several Data Centres. Edge labels: "Carries", "Works with".]

DataCite structure

Page 14: Beyond text: New roles for libraries DataCite –

How to achieve this?

Science is global
• It needs global standards
• Global workflows
• Cooperation of global players

Science is carried out locally
• By local scientists
• Being part of local infrastructures
• Having local funders

Page 15: Beyond text: New roles for libraries DataCite –

Global consortium carried by local institutions

focused on improving the scholarly infrastructure around datasets and other non-textual information

focused on working with data centres and organisations that hold content

Providing standards, workflows and best-practice

Initially, but not exclusively, based on the DOI system

Founded December 1st 2009 in London

DataCite

Page 16: Beyond text: New roles for libraries DataCite –

DataCite members:
• Technische Informationsbibliothek (TIB)
• Canada Institute for Scientific and Technical Information (CISTI)
• California Digital Library, USA
• Purdue University, USA
• Office of Scientific and Technical Information (OSTI), USA
• Library of the TU Delft, The Netherlands
• Technical Information Center of Denmark
• The British Library
• ZB Med, Germany
• ZBW, Germany
• Gesis, Germany
• Library of the ETH Zürich
• L’Institut de l’Information Scientifique et Technique (INIST), France
• Swedish National Data Service (SND)
• Australian National Data Service (ANDS)
• Conferenza dei Rettori delle Università Italiane (CRUI)
• National Research Council of Thailand (NRCT)
• Hungarian Academy of Sciences

Affiliated members:
• Digital Curation Center (UK)
• Microsoft Research
• Interuniversity Consortium for Political and Social Research (ICPSR)
• Korea Institute of Science and Technology Information (KISTI)
• Beijing Genomic Institute (BGI)
• Institute of Electrical and Electronics Engineers (IEEE)
• Harvard University Library
• World Data System (WDS)
• GWDG

Page 17: Beyond text: New roles for libraries DataCite –

[Figure: profiles of IRD (grav/10 cm3), Sand (%), CaCO3 (%), TOC (%), Radio (%/sand) and Smect (%/clay) against age (0 to 200 kyr; max. 233.55 kyr) for cores PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1]

[Figure: map covering 54°0' to 55°30' latitude and 11° to 15° longitude. Legend: World vector shore line; Grain size class KOLP A; Grain size class KOEHN2; Grain size class KOEHN; Geochemistry; Grain size class KOLP B; Grain size class KOLP DIN; 20 m. Scale: 1:2695194 at Latitude 0°.]

Source: Baltic Sea Research Institute, Warnemünde.

Earthquake events => doi:10.1594/GFZ.GEOFON.gfz2009kciu

Climate models => doi:10.1594/WDCC/dphase_mpeps

Sea bed photos => doi:10.1594/PANGAEA.757741

Distributed samples => doi:10.1594/PANGAEA.51749

Medical case studies => doi:10.1594/eaacinet2007/CR/5-270407

Computational model => doi:10.4225/02/4E9F69C011BC8

Audio record => doi:10.1594/PANGAEA.339110

Grey Literature => doi:10.2314/GBV:489185967

Videos => doi:10.3207/2959859860

What type of data are we talking about?

Anything that is the foundation of further research

is research data

Data is evidence

Page 18: Beyond text: New roles for libraries DataCite –

Over 2,00,000 DOI names registered so far.

262 data centers.

5,600,000 resolutions in 2013 so far.

DataCite Metadata schema published (in cooperation with all members) http://schema.datacite.org

DataCite MetadataStore

http://search.datacite.org

DataCite in 2013

Page 20: Beyond text: New roles for libraries DataCite –

OAI and Statistics

OAI Harvester

http://oai.datacite.org

DataCite statistics (resolution and registration)

http://stats.datacite.org
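
Harvesting over OAI-PMH comes down to HTTP GET requests whose query string names a protocol verb and its arguments. The sketch below only builds such a request URL. The path /oai on the host named above is an assumption, and oai_request_url is a hypothetical helper, not part of any DataCite client.

```python
from urllib.parse import urlencode

# Assumption: the harvesting endpoint sits under /oai on the host from the slide.
OAI_BASE = "http://oai.datacite.org/oai"

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL; arguments travel in the query string."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **params}
    return OAI_BASE + "?" + urlencode(query)


# oai_dc (unqualified Dublin Core) must be offered by every OAI-PMH repository.
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```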

Page 21: Beyond text: New roles for libraries DataCite –

DataCite Content Service

Service for displaying DataCite metadata

Different formats (BibTeX, RIS, RDF, etc.)

Content negotiation (through MIME type)

• Access through DOI proxy (http://dx.doi.org)

• First implemented by CNRI and CrossRef

Documentation: http://www.crosscite.org/cn/

Page 22: Beyond text: New roles for libraries DataCite –

Content negotiation

Optimized for machine-to-machine (m2m) communication using the Accept header of the HTTP protocol

curl -L -H "Accept: MIME_TYPE" http://dx.doi.org/DOI

Try out a shortcut in any web browser:

http://data.datacite.org/MIME_TYPE/DOI

http://data.crossref.org/DOI
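
The curl call and the browser shortcut above can be sketched in Python with the standard library alone. The MIME types in FORMATS are assumptions to be checked against the documentation at http://www.crosscite.org/cn/, the helper names are hypothetical, and the services may have changed since this talk. The script only prepares the request; fetch would go out to the network.

```python
from urllib.request import Request, urlopen

# Assumption: MIME types for some formats offered by the content service.
FORMATS = {
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "rdf": "application/rdf+xml",
}


def negotiation_request(doi: str, mime_type: str) -> Request:
    """Prepare a GET request to the DOI proxy that asks for one representation."""
    return Request("http://dx.doi.org/" + doi, headers={"Accept": mime_type})


def shortcut_url(doi: str, mime_type: str) -> str:
    """Browser-friendly form: the MIME type moves from the header into the path."""
    return "http://data.datacite.org/" + mime_type + "/" + doi


def fetch(doi: str, mime_type: str, timeout: float = 10.0) -> str:
    """Send the request; urlopen follows redirects, much like curl -L."""
    with urlopen(negotiation_request(doi, mime_type), timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # assumption: the body is UTF-8 text


req = negotiation_request("10.5524/100005", FORMATS["rdf"])
print(req.full_url)
print(req.get_header("Accept"))
print(shortcut_url("10.5524/100005", FORMATS["rdf"]))
```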

Page 23: Beyond text: New roles for libraries DataCite –

Resolving to the citation

http://data.datacite.org/application/x-datacite+text/10.5524/100005

Li, J; Zhang, G; Lambert, D; Wang, J (2011): Genomic data from Emperor penguin. GigaScience. http://dx.doi.org/10.5524/100005
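
The text citation above follows a simple pattern: creators, year, title, publisher, DOI link. The sketch below reproduces that pattern from metadata fields. The pattern is inferred from this single example and is not claimed to be how the DataCite service builds its output; format_citation is a hypothetical name.

```python
def format_citation(creators, year, title, publisher, doi):
    """Assemble 'Creators (Year): Title. Publisher. DOI link' from metadata fields."""
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. http://dx.doi.org/{doi}"


print(format_citation(
    ["Li, J", "Zhang, G", "Lambert, D", "Wang, J"],
    2011,
    "Genomic data from Emperor penguin",
    "GigaScience",
    "10.5524/100005",
))
```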

Page 24: Beyond text: New roles for libraries DataCite –

Resolving to the RDF metadata

http://data.datacite.org/application/rdf+xml/10.5524/100005

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:j.0="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://dx.doi.org/10.5524/100005">
    <j.0:identifier>10.5524/100005</j.0:identifier>
    <j.0:creator>Li, J</j.0:creator>
    <j.0:creator>Zhang, G</j.0:creator>
    <j.0:creator>Wang, J</j.0:creator>
    <owl:sameAs>doi:10.5524/100005</owl:sameAs>
    <owl:sameAs>info:doi/10.5524/100005</owl:sameAs>
    <j.0:publisher>GigaScience</j.0:publisher>
    <j.0:creator>Lambert, D</j.0:creator>
    <j.0:date>2011</j.0:date>
    <j.0:title>Genomic data from the Emperor penguin (Aptenodytes forsteri)</j.0:title>
  </rdf:Description>
</rdf:RDF>
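
A machine can read the fields of this record directly. The sketch below parses the RDF/XML shown above with Python's generic XML parser (xml.etree.ElementTree), not a dedicated RDF library, so it relies on the flat layout of this particular record. The function name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"

# The record from the slide, embedded so that the example runs offline.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:j.0="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://dx.doi.org/10.5524/100005">
    <j.0:identifier>10.5524/100005</j.0:identifier>
    <j.0:creator>Li, J</j.0:creator>
    <j.0:creator>Zhang, G</j.0:creator>
    <j.0:creator>Wang, J</j.0:creator>
    <owl:sameAs>doi:10.5524/100005</owl:sameAs>
    <owl:sameAs>info:doi/10.5524/100005</owl:sameAs>
    <j.0:publisher>GigaScience</j.0:publisher>
    <j.0:creator>Lambert, D</j.0:creator>
    <j.0:date>2011</j.0:date>
    <j.0:title>Genomic data from the Emperor penguin (Aptenodytes forsteri)</j.0:title>
  </rdf:Description>
</rdf:RDF>"""


def describe(rdf_xml: str) -> dict:
    """Pull the resource URI and a few Dublin Core terms out of a flat record."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF_NS}}}Description")

    def values(term: str) -> list:
        # Collect the text of all child elements with this Dublin Core term.
        return [el.text for el in desc.findall(f"{{{DCT_NS}}}{term}")]

    return {
        "about": desc.get(f"{{{RDF_NS}}}about"),
        "creators": values("creator"),
        "title": values("title")[0],
        "year": values("date")[0],
    }


info = describe(SAMPLE)
print(info["about"])
print(info["creators"])
```

Creators come back in document order, which in this record differs from the author order of the text citation.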

Page 25: Beyond text: New roles for libraries DataCite –

Example of use

This allows persistent identification of RDF statements!

Implemented for all of the more than 45 million CrossRef and DataCite DOI names

Example of use:

DOI Citation Formatter

http://www.crosscite.org/citeproc/

Page 26: Beyond text: New roles for libraries DataCite –

2012: STM, CrossRef and DataCite Joint Statement

1. To improve the availability and findability of research data, the signers encourage authors of research papers to deposit researcher validated data in trustworthy and reliable Data Archives.

2. The Signers encourage Data Archives to enable bi-directional linking between datasets and publications by using established and community endorsed unique persistent identifiers such as database accession codes and DOI's.

3. The Signers encourage publishers and data archives to make visible or increase visibility of these links from publications to datasets and vice versa

Page 27: Beyond text: New roles for libraries DataCite –

Example

The dataset: Storz, D et al. (2009): Planktic foraminiferal flux and faunal composition of sediment trap L1_K276 in the northeastern Atlantic. http://dx.doi.org/10.1594/PANGAEA.724325

Is supplement to the article: Storz, David; Schulz, Hartmut; Waniek, Joanna J; Schulz-Bull, Detlef; Kucera, Michal (2009): Seasonal and interannual variability of the planktic foraminiferal flux in the vicinity of the Azores Current. Deep-Sea Research Part I-Oceanographic Research Papers, 56(1), 107-124, http://dx.doi.org/10.1016/j.dsr.2008.08.009
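
Bi-directional linking means that the relation is recorded on both sides: the dataset points to the article and the article points back to the dataset. A minimal sketch, assuming the relation names IsSupplementTo and IsSupplementedBy from the relationType vocabulary of the DataCite Metadata Schema; the Link class and the helper are hypothetical.

```python
from dataclasses import dataclass

# Assumption: relation names taken from the DataCite relationType vocabulary.
INVERSE = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}


@dataclass(frozen=True)
class Link:
    source: str    # DOI of the object that carries the statement
    relation: str  # how the source relates to the target
    target: str    # DOI of the related object


def both_directions(link: Link) -> list:
    """Return the link together with its inverse, one statement per side."""
    back = Link(link.target, INVERSE[link.relation], link.source)
    return [link, back]


dataset_to_article = Link(
    "10.1594/PANGAEA.724325", "IsSupplementTo", "10.1016/j.dsr.2008.08.009"
)
for statement in both_directions(dataset_to_article):
    print(statement.source, statement.relation, statement.target)
```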

Page 28: Beyond text: New roles for libraries DataCite –

Next steps

ODIN project with ORCID.

http://datacite.labs.orcid-eu.org/

MoU with Thomson Reuters to cooperate on the Data Citation Index

DataCite plugin for the next DSpace release (early 2014)

Page 29: Beyond text: New roles for libraries DataCite –

Let us get back to libraries

Page 30: Beyond text: New roles for libraries DataCite –

The wave

Growth of information

Diversity of media types and formats

User requirements, e.g. Science 2.0, collaborative networks, social media

Page 31: Beyond text: New roles for libraries DataCite –

A threat?

Information overload is only a problem for manual curation.

Google is not complaining about data deluge—they’re constantly trying to get more data.

The more data you throw, the better the filter gets.

Developing and maintaining these tools is a classical task for libraries!

Don’t turn off the taps, build boats.

Page 32: Beyond text: New roles for libraries DataCite –

It is not only a challenge …

… it is an opportunity

Libraries should ride the wave …

Page 33: Beyond text: New roles for libraries DataCite –

Thank you!