© S.J. Coles 2006 Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data Simon Coles School of Chemistry, University of Southampton, U.K. [email protected]


Mar 28, 2015


Brian O'Neil
Transcript
Page 1:

© S.J. Coles 2006

Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data

Simon Coles

School of Chemistry,

University of Southampton, U.K.

[email protected]

Page 2:

A Data-Rich Subject – the Crystallography Problem

[Figure: chemical structure diagram; only the atom labels (Cl, O, N, N+) survive extraction]

30,000,000

1,500,000

450,000

Page 3:

Funding Body Viewpoint

Page 4:

Open Access as the Answer?

Page 5:

Separating Data from Interpretations

Underlying data

Intellect & Interpretation

Page 6:

Research & e-Science workflows

Aggregator services: national, commercial

Repositories: institutional, e-prints, subject, data, learning objects

Data curation: databases & databanks

Validation

Harvesting metadata

Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media

Deposit / self-archiving

Peer-reviewed publications: journals, conference proceedings

Publication

Validation

Data analysis, transformation, mining, modelling

Searching, harvesting, embedding

Presentation services: subject, media-specific, data, commercial portals

Resource discovery, linking, embedding

Linking

The scholarly knowledge cycle.

Liz Lyon, eBankUK article. Ariadne, July 2003.

Page 7:

Workflow Capture and Analysis

RAW DATA DERIVED DATA RESULTS DATA

Page 8:

The eCrystals Data Archive

http://ecrystals.chem.soton.ac.uk

Page 9:

Access to the underlying data

Page 10:

Metadata Publication

• Using simple Dublin Core
  • Crystal structure
  • Title (Systematic IUPAC Name)
  • Authors
  • Affiliation
  • Creation Date

• Additional chemical information through Qualified Dublin Core
  • Empirical formula
  • International Chemical Identifier (InChI)
  • Compound Class & Keywords

• Specifies which ‘datasets’ are present in an entry

• DOI

• Rights

• Citation

http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
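The metadata fields listed on this slide could be assembled into a record roughly as follows. This is only a minimal sketch: the `dc` namespace is the standard simple Dublin Core element set, but the mapping of slide fields onto `dc` elements, the `chem` namespace, its element names, and the example values are all assumptions made for illustration. The actual eBank-UK schema is the one published at the URL above.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # simple Dublin Core element set
CHEM_NS = "http://example.org/chem-sketch/"  # hypothetical; NOT the eBank-UK schema

ET.register_namespace("dc", DC_NS)
ET.register_namespace("chem", CHEM_NS)


def build_record(entry):
    """Build one metadata record for a crystal-structure entry (illustrative)."""
    record = ET.Element("record")
    # Assumed mapping of the slide's simple Dublin Core fields onto DC elements.
    simple_fields = [
        ("type", entry["type"]),
        ("title", entry["title"]),
        ("date", entry["created"]),
        ("identifier", entry["doi"]),
        ("rights", entry["rights"]),
    ]
    for name, value in simple_fields:
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    # One dc:creator element per author.
    for author in entry["authors"]:
        ET.SubElement(record, f"{{{DC_NS}}}creator").text = author
    # Chemical fields; these element names are invented for the sketch.
    for name in ("empiricalFormula", "inchi", "compoundClass"):
        ET.SubElement(record, f"{{{CHEM_NS}}}{name}").text = entry[name]
    return record


# Placeholder entry (benzene), not taken from the eCrystals archive.
example = {
    "type": "Crystal structure",
    "title": "benzene",
    "created": "2006-01-01",
    "doi": "10.0000/placeholder",
    "rights": "placeholder rights statement",
    "authors": ["A. Researcher", "B. Researcher"],
    "empiricalFormula": "C6H6",
    "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    "compoundClass": "aromatic hydrocarbon",
}

record = build_record(example)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the generic fields in plain Dublin Core means a general-purpose harvester can read them without knowing any chemistry, while the chemical fields sit in a separate namespace for subject-specific services.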

Page 11:

Metadata and Data Quality Control

Data manipulation toolbox

Associated Metadata

Value added

Format conversion

Page 12:

Harvesting & Aggregating: Google

Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k

Page 13:

Harvesting: OAIster

Page 14:

Linking and aggregating

Page 15:

Embedded in a science portal

Page 16:

eBank/eCrystals Future

• Full embedding in daily laboratory practice
• Roll out to other institutions
• Full support from host institution
• Community acceptance
• Federation of repositories
• Specialised aggregator services (Crystallography)
• Generic aggregator services (Chemistry / Science)

Page 17:

The Information Environment

Institutional Data Sources

Page 18:

Data and Information Loss

Page 19:

Repositories Supporting Laboratory Working Practice

• eBank-UK concentrating on dissemination of data compiled once a study is complete

• To fully assure the quality and accuracy of metadata, it is essential to capture it as it is generated

• Repository architecture has the potential to store data and metadata as they are generated

• Repository also has capability to manage data and provide report generation and analysis tools

Page 20:

Laboratory Repositories

Page 21:

Workflow Analysis

Researcher, Compound, Experiment type, Timestamp

Sample preparation

Data acquisition

Deposit current dataset

Analyse: Refine experiment?

Complete experiment deposit
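The loop on this slide (acquire data, deposit the current dataset, analyse, refine if needed, then complete the deposit) can be sketched in a few lines. This is an illustrative sketch only: `ExperimentDeposit`, `run_experiment`, the callback names, the `max_rounds` cap, and the toy quality score are all hypothetical and do not describe the R4L software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExperimentDeposit:
    """Metadata captured at the start, plus every dataset deposited along the way."""
    researcher: str
    compound: str
    experiment_type: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    datasets: list = field(default_factory=list)
    complete: bool = False

    def deposit(self, dataset):
        # Refuse further deposits once the experiment has been closed.
        if self.complete:
            raise RuntimeError("experiment deposit is already complete")
        self.datasets.append(dataset)


def run_experiment(record, acquire, needs_refinement, max_rounds=10):
    """Deposit each dataset as it is acquired; stop when no refinement is needed."""
    for round_no in range(1, max_rounds + 1):
        dataset = acquire(round_no)       # data acquisition
        record.deposit(dataset)           # deposit current dataset
        if not needs_refinement(dataset): # analyse: refine experiment?
            break
    record.complete = True                # complete experiment deposit
    return record


# Toy stand-ins for the instrument and the analysis step.
def toy_acquire(round_no):
    return {"round": round_no, "quality": 0.4 * round_no}


def toy_needs_refinement(dataset):
    return dataset["quality"] < 1.0


result = run_experiment(
    ExperimentDeposit("A. Researcher", "placeholder compound", "single-crystal diffraction"),
    toy_acquire,
    toy_needs_refinement,
)
print(len(result.datasets), result.complete)
```

Depositing inside the loop, rather than once at the end, is what keeps the intermediate datasets and their metadata from being lost if the experiment is refined or abandoned.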

Page 22:

The R4L Repository

Deposit

Search

Page 23:

R4L Essentials

• Continual deposition and metadata capture from the very start of the experiment

• Prior Assertion Service – a legally sound protection of IPR

• Laboratory data management and analysis of heterogeneous datasets

• Production of reports – Individual experiment

• Production of reports – Study involving several experiments

• Panel of publishers to direct requirements for data publication

Page 24:

Something to take home!

• Open access to data does not harm or hinder publication of ideas and interpretation in a conventional fashion

• Open access to data, when linked to a publication containing interpretations, enhances the value of the publication

• Open access to ALL data underpinning a publication enables efficient assessment and reuse of that data

• Essential to embed repository deposition into ALL aspects of (laboratory) working procedures