© S.J. Coles 2006 Enabling the reusability of scientific data: Experiences with designing an open access infrastructure for sharing datasets Simon J. Coles EPSRC National Crystallography Service School of Chemistry University of Southampton

© S.J. Coles 2006 Usability WS, NeSC Jan 06 Enabling the reusability of scientific data: Experiences with designing an open access infrastructure for sharing.

Mar 28, 2015


Riley Byrne
Transcript
Page 1:

Enabling the reusability of scientific data: Experiences with designing an open access infrastructure for sharing datasets

Simon J. Coles

EPSRC National Crystallography Service

School of Chemistry

University of Southampton

Page 2:

Data & the Publication Problem

[Figure: chemical structure diagrams; only atom labels (Cl, O, N, N+) survive in the transcript]

25,000,000

2,000,000

450,000

Page 3:

A Different Approach to Data Publication?

Underlying data

Intellect & Interpretation

Page 4:

Requirements

• Capture of all digital data and information generated during the course of an experiment

• Data validation

• Adding value

• Archival system for data with attached bibliographic and chemical metadata

• Automatic report generation

• Schema and protocols for publication and dissemination of a dataset

Page 5:

Open Access Crystal Structure Archive

ecrystals.chem.soton.ac.uk

Page 6:

Access to the Underlying Data

Page 7:

Publicising Content

Page 8:

Harvesting, Linking and Aggregating

Page 9:

Usability: Quality & Uniformity of data

• Different laboratories, practices & instruments present a heterogeneous body of data

• Publish according to IUCr ratified schema

• To support publication according to this schema, a toolbox add-on to the archive has been developed

• Toolbox requires only 2 mandatory files and is capable of performing file format conversions and generating value-added files

Page 10:

Usability: Ease of Deposition & Metadata Quality

• Minimal number of manual metadata entries – many can be hardwired into the system

• Deposition guidelines initially prepared by students to provide impartial feedback

• Full documentation and in-line help/examples

• Restrained lists, e.g. Keywords

• Data deposited automatically by toolbox

• Automated generation of metadata for report and OAI interface
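The combination of hardwired entries and restrained lists can be sketched roughly as follows. This is a minimal illustration only: the function name, the field names and the keyword list are hypothetical and are not taken from the archive software.

```python
# Hypothetical sketch: names, fields and keyword list are illustrative only.

# Entries that never change for a given laboratory can be fixed once
# ("hardwired") instead of being typed in at every deposition.
HARDWIRED = {
    "institution": "University of Southampton",
    "publisher": "EPSRC National Crystallography Service",
}

# A restrained (controlled) keyword list; the real list is not given in the slides.
ALLOWED_KEYWORDS = {"organic", "inorganic", "organometallic"}


def build_metadata(manual: dict, keywords: list) -> dict:
    """Merge manual entries over the hardwired defaults and check keywords."""
    rejected = [k for k in keywords if k not in ALLOWED_KEYWORDS]
    if rejected:
        raise ValueError(f"keywords not in restrained list: {rejected}")
    record = dict(HARDWIRED)      # start from the fixed entries
    record.update(manual)         # add the few manual entries
    record["keywords"] = sorted(keywords)
    return record
```

With this shape the depositor supplies only the entries that genuinely vary, and a keyword outside the list is rejected at deposition time rather than discovered later by an aggregator.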

Page 11:

Usability: Data Validation

• Peer review removed from self deposit publication

• Simple checks for consistency made by the toolbox

• Checks for crystallographic integrity made through a web service (IUCr, ‘CHECKCIF’)

• Introduction of data ‘editor’ for the archive; a deposition must be signed-off by a recognised professional before going live

• Quality indicators automatically taken from dataset and presented in HTML jump-off page
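The validation steps listed above amount to a gate that a deposition must pass before it goes live. A minimal sketch of that gate, assuming the outcome of each check is already known; the class and field names are hypothetical and the calls to the toolbox and the web service are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Deposition:
    """Hypothetical record of a deposition's validation state."""
    consistency_ok: bool = False          # simple checks made by the toolbox
    integrity_ok: bool = False            # crystallographic integrity check
    signed_off_by: Optional[str] = None   # the data 'editor', once signed off

    def blocking_reasons(self) -> List[str]:
        """List every reason the deposition cannot go live yet."""
        reasons = []
        if not self.consistency_ok:
            reasons.append("consistency checks not passed")
        if not self.integrity_ok:
            reasons.append("integrity check not passed")
        if self.signed_off_by is None:
            reasons.append("not signed off by an editor")
        return reasons

    def can_go_live(self) -> bool:
        return not self.blocking_reasons()
```

Keeping the reasons as a list, rather than a single yes/no flag, lets the depositor see at once which step is still outstanding.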

Page 12:

Usability: Identifiers

• URL of deposited dataset provides an identifier

• Persistent only if the Institutional support model is accepted / adopted

• Signed up with an agency to register metadata relating to datasets with a DOI

• Pay registry to ensure that DOI always resolves to associated dataset (10 cents to register, 1 cent per annum to maintain)

• InChI chemical identifier - a unique text descriptor for a molecule
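The DOI charges quoted above can be turned into a simple cost estimate. The sketch below assumes the maintenance charge is levied once per full year for every registered dataset; the function name is hypothetical and amounts are kept in cents, since the slide does not state a currency.

```python
REGISTER_CENTS = 10           # one-off registration charge per DOI (from the slide)
MAINTAIN_CENTS_PER_YEAR = 1   # annual maintenance charge per DOI (from the slide)


def doi_cost_cents(datasets: int, years: int) -> int:
    """Total cost in cents of registering `datasets` DOIs and keeping them for `years`."""
    per_dataset = REGISTER_CENTS + MAINTAIN_CENTS_PER_YEAR * years
    return datasets * per_dataset


# 1,000 datasets kept resolvable for 10 years: 1000 * (10 + 10) cents
print(doi_cost_cents(1000, 10))  # prints 20000
```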

Page 13:

Usability: Dissemination & Aggregation

• OAI metadata schema; ratified by IUCr & chemical community

• OAI covers bibliographic terms; must introduce chemical terms

• Both library and subject specific aggregators satisfied

• Chemical linking; InChI, chemical classifications and restricted keywords list
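One way to picture a record that carries both bibliographic and chemical terms is sketched below. The Dublin Core namespace is the standard one used by OAI metadata; the chemical namespace, the element names under it and the function name are hypothetical, as the ratified schema itself is not reproduced in the slides.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core (bibliographic terms)
CHEM = "http://example.org/chem-terms/"   # hypothetical namespace for chemical terms


def build_record(title, creator, inchi, keywords):
    """Build one metadata record mixing bibliographic and chemical terms."""
    root = ET.Element("record")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}creator").text = creator
    # Chemical term: the InChI string is what lets aggregators link by molecule.
    ET.SubElement(root, f"{{{CHEM}}}inchi").text = inchi
    for kw in keywords:
        ET.SubElement(root, f"{{{DC}}}subject").text = kw
    return root


# Illustrative values only; the InChI shown is that of water.
record = build_record("Example dataset", "A. Depositor",
                      "InChI=1S/H2O/h1H2", ["inorganic"])
print(ET.tostring(record, encoding="unicode"))
```

A library aggregator can read the Dublin Core elements and ignore the rest, while a subject-specific aggregator can also index the chemical elements.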

Page 14:

Usability: Endorsement

• Feedback during development from technical publishing arm of IUCr

• Designed for automatic incorporation into CSD (global database operated by CCDC)

• Accepted by Executive Committee of IUCr

• Reuse of data achieved in collaboration with Leverhulme Centre for Molecular Informatics

Page 15:

Usability: Community Uptake

• Southampton archive about to publish routinely via the archive

• Five crystallography laboratories in UK agreed to adopt philosophy, install and populate archives

• CCDC will harvest required data from all archives

• IUCr will harvest and curate all data

• Develop aggregator services in collaboration with IUCr
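Harvesting from several archives is typically done with OAI-PMH requests. The sketch below only builds the request URL; it assumes each archive exposes a standard OAI-PMH endpoint, and the base URL and function name are placeholders, not addresses or code from the project.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for one archive."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoints: a harvester would loop over every participating archive.
archives = ["https://archive-a.example.org/oai", "https://archive-b.example.org/oai"]
for base in archives:
    print(list_records_url(base, from_date="2006-01-01"))
```

The `from` argument lets a harvester request only records added or changed since its last visit, which keeps repeated harvests of a growing archive cheap.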

Page 16:

Usability: The Next Challenges

• Full acceptance by chemical community

– Validation worries

– Curation worries

– The requirement for as many peer reviewed publications as possible (despite quality)

• Full acceptance by wider chemistry publishing community

– Loss of control over underlying data

– Faith in Open Archives replacing experimental descriptions in articles

• Development of fully functional aggregator services