Post on 28-Mar-2015

eCrystals Federation: Open Repositories for Data-driven Science

Dr Liz Lyon, UKOLN, University of Bath, UK

Dr Simon Coles, University of Southampton, UK

Chemical Informatics Workshop, Manchester, March 2008

This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 3.0

http://creativecommons.org/licenses/by-sa/3.0/

Themes

1. Context: Institutional data repositories, crystallography exemplar
2. Scale: repository federations
3. Longevity: Digital curation and preservation
4. Integration: Semantic challenges

eBank Project – building the eCrystals Data Repository

ePrints platform @ Southampton

Institutional Repository exemplar

Embedded in workflow

http://ecrystals.chem.soton.ac.uk

Started Sept 2003

Scholarly knowledge cycle context

UKOLN-led interdisciplinary team

Scaling Up Report

Phase 3 findings:

Data policy should reflect lab practice & institutional model

Diverse lab practice

LIMS proprietary formats

Data quality criteria/validation

“Prior publication” problem

We need automated assignment of terms for data discovery

No discipline preservation model

nλ = 2 d sinθ
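As a worked illustration of the Bragg relation above, the following minimal sketch solves nλ = 2 d sinθ for the lattice spacing d. The wavelength (approximately 0.7107 Å for Mo Kα radiation) and the angle of 10° are illustrative assumptions, not values taken from the slides, and the function name is hypothetical.

```python
import math


def d_spacing(wavelength_angstrom, theta_degrees, order=1):
    """Solve Bragg's law, n * lambda = 2 * d * sin(theta), for d (in angstrom)."""
    return order * wavelength_angstrom / (2.0 * math.sin(math.radians(theta_degrees)))


# Assumed inputs: Mo K-alpha wavelength and a first-order reflection at theta = 10 degrees.
d = d_spacing(0.7107, 10.0)
print(f"d = {d:.3f} angstrom")
```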

The eCrystals Repository

ePrints.org v3.0

Repository Foundations

• Using simple Dublin Core
  • Crystal structure
  • Title (Systematic IUPAC Name)
  • Authors
  • Affiliation
  • Creation Date
• Additional chemical information through Qualified Dublin Core
  • Empirical formula
  • International Chemical Identifier (InChI)
  • Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry

• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/

• DOI links http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145

• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html
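The metadata elements listed above could be modelled roughly as follows. This is only a sketch: the class, field names, and sample values are hypothetical and do not reproduce the eBank-UK application profile; the DOI prefix is the one shown in the DOI link above.

```python
from dataclasses import dataclass, field

# Prefix copied from the DOI link in the slides.
DOI_PREFIX = "10.1594/ecrystals.chem.soton.ac.uk"


@dataclass
class CrystalRecord:
    # Simple Dublin Core style elements (hypothetical field names).
    title: str                # systematic IUPAC name
    authors: list
    affiliation: str
    creation_date: str
    # Qualified elements carrying the additional chemical information.
    empirical_formula: str = ""
    inchi: str = ""
    compound_class: str = ""
    keywords: list = field(default_factory=list)
    # Which 'datasets' are present in the entry.
    datasets: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of the core elements that are still empty."""
        core = ("title", "authors", "affiliation", "creation_date")
        return [name for name in core if not getattr(self, name)]


def doi_url(entry_id):
    """Build a resolver URL in the pattern shown on the slide."""
    return f"http://dx.doi.org/{DOI_PREFIX}/{entry_id}"


# Illustrative values only; not a real repository entry.
record = CrystalRecord(
    title="benzene",
    authors=["A. Researcher"],
    affiliation="",
    creation_date="2008-03-01",
    empirical_formula="C6H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    datasets=["CIF"],
)
print(record.missing_fields())
print(doi_url(145))
```

Keeping the core and chemistry-specific elements in separate groups mirrors the simple versus qualified Dublin Core split described on the slide.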

Learned society + subject repository support

Federation interoperability & linking services

• Roll-out in 2 phases led by University of Southampton
• Establish Federation policies, application profile, mappings
• Bi-directional links with derived articles in “publisher repositories”, IUCr, Royal Society of Chemistry (RSC), Chemistry Central: scholarly knowledge cycle

• StOReLink project - Test linking options: StORe middleware and CLADDIER

• OAI-ORE Testbed

eChemistry project

Laboratory practice & workflow

• Community standard CIF
• Mixed lab practice – central service facility versus single “staff crystallographer” in department
• Achieve end-to-end workflow
• Challenge of instrument manufacturers with proprietary formats
• “Repository Lite” for smaller lab operations?

X-ray diffractometers

eBank-UK Phase 3 Curation & Preservation Study: Sustainability issues

http://www.ukoln.ac.uk/projects/ebank-uk/curation/

Examined four main areas

1. Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)
2. The Open Archival Information System (OAIS) and Representation Information (RI)
3. eBank-UK application profile and preservation metadata
4. ePrints.org repository platform

Recommendations:

Self-assessment using DRAMBORA

Consider Representation Information in wider context

Develop preservation strategy

Capture preservation metadata - PREMIS

Crystallographic schema underpins CIF (Crystallographic Information Framework), but is limited to data parameters e.g. cell_length_a
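To make the point about data parameters concrete, here is a minimal sketch that reads single-line tag/value items such as the cell length from a CIF-like text. The sample values are invented for illustration, the function names are hypothetical, and the parser deliberately ignores loops, multi-line values, and comments, so it is not a full CIF reader. Note that CIF data names carry a leading underscore (for example `_cell_length_a`).

```python
import re

# Invented sample; not taken from any real structure.
SAMPLE_CIF = """\
data_example
_cell_length_a    10.234(2)
_cell_length_b    11.500(3)
_cell_length_c    7.8910(10)
_cell_angle_beta  95.12(1)
_symmetry_space_group_name_H-M  'P 21/c'
"""


def parse_simple_cif(text):
    """Collect single-line 'tag value' pairs; everything else is skipped."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        tag, value = parts
        items[tag] = value.strip().strip("'\"")
    return items


def numeric_value(raw):
    """Drop a trailing standard uncertainty: '10.234(2)' becomes 10.234."""
    return float(re.sub(r"\(\d+\)$", "", raw))


items = parse_simple_cif(SAMPLE_CIF)
print(numeric_value(items["_cell_length_a"]))
print(items["_symmetry_space_group_name_H-M"])
```

The parsed items are purely numerical or symbolic parameters; nothing in them says what kind of compound this is or why it matters, which is the chemistry context the slide notes is missing.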

Semantic issues

IUCr Acta Cryst 1992

Limited set of keywords describing methods, properties & applications, compounds, attributes

No established crystallography dictionary or controlled vocabulary to give chemistry context

What do we want to do?

• Support depositors’ keyword/term assignment
• Facilitate and improve automated indexing
• Support advanced search / browse
• Allow metadata validation & enhancement
• Apply across a heterogeneous Federation
• Cross search, cross browse functionality
• Link data to all associated digital objects
• Develop domain semantics / vocabulary
• Use domain-specific authority files
• Mine to “discover” rather than “find”
• Achieve full inter-disciplinary integration
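One way to picture keyword support and metadata validation is a lookup that maps depositor-supplied terms onto preferred terms from a controlled vocabulary. The sketch below assumes a tiny, invented vocabulary; no such crystallography vocabulary is established, as the slides point out, and all names here are hypothetical.

```python
# Invented vocabulary: preferred term -> accepted variants.
VOCABULARY = {
    "organometallic": {"organo-metallic"},
    "polymorph": {"polymorphs", "polymorphism"},
    "hydrogen bonding": {"h-bonding", "hydrogen bond"},
}


def build_lookup(vocab):
    """Map every variant (and the preferred term itself) to the preferred term."""
    lookup = {}
    for preferred, variants in vocab.items():
        for variant in variants | {preferred}:
            lookup[variant.lower()] = preferred
    return lookup


def normalise_keywords(keywords, lookup):
    """Split keywords into recognised preferred terms and unrecognised leftovers."""
    accepted, rejected = [], []
    for keyword in keywords:
        key = " ".join(keyword.lower().split())  # fold case and extra spaces
        preferred = lookup.get(key)
        if preferred is None:
            rejected.append(keyword)
        elif preferred not in accepted:
            accepted.append(preferred)
    return accepted, rejected


lookup = build_lookup(VOCABULARY)
accepted, rejected = normalise_keywords(
    ["H-bonding", "Polymorphs", "zeolite", "hydrogen  bond"], lookup
)
print(accepted)
print(rejected)
```

Unrecognised terms are returned rather than discarded, so they could be reviewed as candidates for the vocabulary instead of being lost.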

Some (semantic) issues…

• How are terms assigned?
• Informal tags and/or structured KOS?
• How is a vocabulary curated and maintained?
• Can a vocabulary be transformed into a (Semantic Web related understanding) ontology?
• Disambiguation, acronyms, IUPAC names
• Persistent identification for data citation
• Granularity of data citation
• Data (and metadata) quality, provenance, validation
• Embedding within complex workflows
• Use collaborative social approaches?
• Community adoption: becomes part of the culture

Questions?

Slides will be available at:

http://wiki.ecrystals.chem.soton.ac.uk/index.php

http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html
