Page 1

UKOLN is supported by:

Enhancing access to research data: the challenge of crystallography

Rachel Heery, Monica Duke, Michael Day

UKOLN, University of Bath

Leslie Carr, Simon Coles

University of Southampton

www.bath.ac.uk

A centre of expertise in digital information management

JCDL 2005, June 7-11, Denver

Page 2

Enhancing access to research data: overview

• Crystallography as an exemplar

• Impact of digital technologies on scientific research process

• Need new modes of data curation

• eBank project: applying digital library techniques to support data curation

• Next steps

Page 3

Changes in scientific research process

• Increasing data volumes from eScience / Grid-enabled / cyber-infrastructure applications, “big science”

• Changing research methods: high throughput technologies, automation, ‘smart labs’

• Potential for re-use of data, new inter-disciplinary research

• Different types of data (observational, experimental, computational), each with different stewardship requirements

Page 4

Data Overload!

How do we disseminate?

EPSRC National Crystallography Service

The data deluge: crystallography

Page 5

Data overload & the publication bottleneck

[Figure: chemical structure diagrams, shown with the figures 25,000,000, 2,000,000 and 300,000]

Page 6

Current Publishing Process

• Journal articles: aims, ideas, context, conclusions – only most significant data

• Raw & underlying data required by peers not readily available

Page 7

Context: existing data repositories

• National data archives:

– UK Data Archive, Arts and Humanities Data Service, US National Archives and Records Administration (NARA), Atlas Datastore

• Discipline specific archives:

– GenBank, Protein Data Bank

• Crystallography archives

– Cambridge Crystallographic Data Centre (Cambridge Structural Database), Indiana University Molecular Structure Center (Crystal Data Server, Reciprocal Net), FIZ Karlsruhe (Inorganic crystals), Toth Information Systems (CRYSTMET)

• Journals require deposit of data to support articles

– Typically deposit of summary data…. partial coverage

Page 8

Crystallography workflow

RAW DATA, DERIVED DATA, RESULTS DATA

• Initialisation: mount new sample on diffractometer & set up data collection

• Collection: collect data

• Processing: process and correct images

• Solution: solve structures

• Refinement: refine structure

• CIF: produce CIF (Crystallographic Information File)

• Validation: chemical & crystallographic checks
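
The workflow stages listed on this slide form a fixed sequence, which can be sketched as an ordered enumeration. This is an illustrative sketch only: the names `Stage`, `next_stage` and `remaining` are assumptions introduced here, not part of any eBank software.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Workflow stages in the order given on the slide (names are illustrative)."""
    INITIALISATION = 1
    COLLECTION = 2
    PROCESSING = 3
    SOLUTION = 4
    REFINEMENT = 5
    CIF = 6
    VALIDATION = 7


def next_stage(stage):
    """Return the stage that follows `stage`, or None after validation."""
    if stage is Stage.VALIDATION:
        return None
    return Stage(stage + 1)


def remaining(stage):
    """List the stages still to be completed after `stage`."""
    return [s for s in Stage if s > stage]


print(next_stage(Stage.COLLECTION).name)
print([s.name for s in remaining(Stage.REFINEMENT)])
```

Recording the stage reached for each file is one way a repository could expose the status of an experiment as it moves from raw to results data.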

Page 9

eBank UK project overview

• JISC funded in 2003, now in Phase 2 to 2006

• Joint effort between crystallographers, computer scientists, digital library researchers

• Investigating contribution of existing digital library technologies to enable ‘publication at source’

• Partners have interest in dissemination of chemistry research data, open access, OAI, institutional repositories

http://www.ukoln.ac.uk/projects/ebank-uk/

Page 10

eBank project team

University of Bath, UKOLN

• Michael Day, Monica Duke, Rachel Heery, Liz Lyon, Traugott Koch

University of Southampton, School of Chemistry

• Simon Coles, Jeremy Frey, Mike Hursthouse

University of Southampton, School of Electronics and Computer Science

• Leslie Carr, Chris Gutteridge

University of Manchester, PSIgate

• John Blunden-Ellis

Page 11

eBank phase one: achievements

• Gathered requirements from crystallographers

• Established pilot institutional repository for crystallography data at Southampton with web interface

• Developed a demonstrator aggregator service at UKOLN (CCDC exploring aggregation service)

• Developed appropriate schema

• Demonstrated a search interface as an embedded service at PSIgate portal

• Demonstrated an added value service linking research data to papers (one-off)

Page 12

Institutional repositories…publication at source

• Institution establishes repository(s)

• Institution pro-actively supports deposit process

• OAI provides basis for interoperability

• Potential for added value services

• And/Or ….international subject based archives?

Page 13

Crystallography good fit….

• Crystallography has well defined data creation workflow

• Tradition of sharing using standard file format

• Crystallographic Information File (CIF)

• What about other chemistry sub-disciplines? other scientific disciplines?

Page 14

Data Flow in eBank UK

[Figure: data flow diagram. Labels: Create, Submit, Store/link, Data files, Metadata, Institutional repository, OAI-PMH, Harvest (XML), eBank aggregator, Index and Search, present (HTML)]
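
The harvest step in this data flow uses OAI-PMH, where an aggregator issues HTTP requests such as `ListRecords` against a repository's base URL. The following is a minimal sketch of building such a request. The endpoint `repository.example.org` and the function name `list_records_url` are assumptions for illustration; the slides do not give the actual endpoint. `oai_dc` is the metadata format every OAI-PMH repository must support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; the slides do not give one.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    `from_date` is an optional UTC date (YYYY-MM-DD) for selective harvesting.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, from_date="2005-06-01")
print(url)
```

An aggregator would fetch this URL, parse the XML response, and follow any resumption token the repository returns; those steps are omitted here.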

Page 15

Southampton digital repository

http://ecrystals.chem.soton.ac.uk

Page 16

Access to ALL underlying data

Page 18

Embedded search service at PSIgate

PSIgate subject gateway: service provider

Page 19

Schema for records made available for harvesting

• Data holding (collection of files associated with experiment)

• Qualified Dublin Core data elements plus additional chemical properties

– Empirical formula

– International Chemical Identifier (InChI)

– Compound Class

• Individual data files

• Separate records for stage status of each file

• Description set wrapped into one XML record using METS

• Research metadata/data as a complex object
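
As a rough illustration of a data holding description that combines Dublin Core elements with the additional chemical properties, the sketch below builds a small XML record. It is not the actual eBank schema: the `EBANK` namespace URI, the element names `empiricalFormula`, `inchi` and `compoundClass`, the identifier, and the benzene values are all assumptions chosen for illustration, and the METS wrapper is not shown.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
EBANK = "http://example.org/ebank/terms/"  # hypothetical namespace for chemical properties

ET.register_namespace("dc", DC)
ET.register_namespace("ebank", EBANK)


def build_record(identifier, title, formula, inchi, compound_class):
    """Build an illustrative data holding record (element names are assumptions)."""
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}identifier").text = identifier
    ET.SubElement(record, f"{{{DC}}}title").text = title
    ET.SubElement(record, f"{{{DC}}}type").text = "CrystalStructure"
    ET.SubElement(record, f"{{{EBANK}}}empiricalFormula").text = formula
    ET.SubElement(record, f"{{{EBANK}}}inchi").text = inchi
    ET.SubElement(record, f"{{{EBANK}}}compoundClass").text = compound_class
    return record


# Illustrative values only (benzene), not taken from the repository.
record = build_record(
    identifier="https://repository.example.org/holding/1",
    title="Benzene",
    formula="C6 H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    compound_class="Organic",
)
print(ET.tostring(record, encoding="unicode"))
```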

Page 20

eBank data model

[Figure: data model diagram. Model input: Andy Powell, UKOLN. Diagram labels:

– ebank_dc record (XML), dc:type=“CrystalStructure”; Crystal structure (data holding); Crystal structure report (HTML); Dataset (three boxes)

– Eprint oai_dc record (XML), dc:type=“Eprint” and/or “Text”; Eprint “jump-off” page (HTML); Eprint manifestation (e.g. PDF)

– Linking: dc:identifier, dcterms:references, dcterms:isReferencedBy

– Deposit; Institutional repositories

– eBank UK aggregator service; ePrint UK aggregator service; Other aggregators and services

– Harvesting arrows: OAI-PMH ebank_dc; OAI-PMH oai_dc, ebank_dc; OAI-PMH oai_dc]
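
The linking properties named in the data model (`dc:identifier`, `dcterms:references`, `dcterms:isReferencedBy`) suggest that a data record and an eprint record can point at each other. The sketch below checks that such a pair of links is reciprocal. It assumes, from the position of the labels in the diagram, that the data record carries `dcterms:references` and the eprint record carries `dcterms:isReferencedBy`. The records are plain dictionaries with hypothetical identifiers, and `links_are_reciprocal` is a name introduced here.

```python
def links_are_reciprocal(data_record, eprint_record):
    """Return True if each record links to the other's dc:identifier."""
    data_id = data_record["dc:identifier"]
    eprint_id = eprint_record["dc:identifier"]
    return (
        eprint_id in data_record.get("dcterms:references", [])
        and data_id in eprint_record.get("dcterms:isReferencedBy", [])
    )


# Hypothetical identifiers, for illustration only.
data_record = {
    "dc:identifier": "https://data.example.org/holding/1",
    "dc:type": "CrystalStructure",
    "dcterms:references": ["https://eprints.example.org/42"],
}
eprint_record = {
    "dc:identifier": "https://eprints.example.org/42",
    "dc:type": "Eprint",
    "dcterms:isReferencedBy": ["https://data.example.org/holding/1"],
}

print(links_are_reciprocal(data_record, eprint_record))
```

An aggregator that harvests both record types could use a check like this to decide when to offer a link between a paper and its underlying data.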

Page 21

Creating the metadata

• Potential to embed ‘deposit and disseminate’ into workflow of chemist in automated way

Page 22

[Figure: data collection flowchart. Labels: Sample Tray, Setup via GUI, BruNo Mount, PreScans, Diffraction, Unit Cell, Success, Strategy, Data Collection, Data Process, System Y, BruNo Unmount; decision branches marked Yes and No]

Page 23

eBank phase two work areas

• Sub-disciplines of chemistry and physical sciences

• Pursue generic data model

• Use of identifiers for citing datasets

• Subject approach to discovering research data

• Access to research data in teaching and learning context

• Liaise with other digital repository initiatives

Page 24

For the future…

• Who provides added value services?

– Authority files, automated subject indexing, annotation, data mining, visualisation

• What are the preservation issues?

– UK Digital Curation Centre http://www.dcc.ac.uk

– National Science Board Draft report on long-lived data collections http://www.nsf.gov/nsb/meetings/2005/LLDDC_draftreport.pdf

• How to manage complex object descriptions within OAI?

• Digital curation of research data presents new roles for scientists, computer scientists, data managers…. ‘data scientists’

Page 25

Thank you. Comments, questions?

http://www.ukoln.ac.uk/projects/ebank-uk/

Acknowledgement to all project partners for their contributions to this presentation.