EPrints Workshop, January 2005. eBank UK: Dissemination of research data using EPrints. Simon Coles, School of Chemistry, University of Southampton

Dec 19, 2015


EPrints Workshop, January 2005 1

eBank UK:

Dissemination of research data using EPrints

Simon Coles, School of Chemistry, University of Southampton


EPrints Workshop, January 2005 2

Overview

• Scholarly communications in Chemistry
  – Data, information, workflows and provenance
• The data publication bottleneck
  – e-Science and chemistry
• eBank UK
  – Information architecture, data flow and interoperability
• Challenges for the future
  – Expansion into other disciplines and data formats


EPrints Workshop, January 2005 3

Research & e-Science workflows

Aggregator services: national, commercial

Repositories: institutional, e-prints, subject, data, learning objects

Data curation: databases & databanks

Validation

Harvesting metadata

Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media

Deposit / self-archiving

Peer-reviewed publications: journals, conference proceedings

Publication

Validation

Data analysis, transformation, mining, modelling

Searching, harvesting, embedding

Presentation services: subject, media-specific, data, commercial portals

Resource discovery, linking, embedding

Linking

The scholarly knowledge cycle.

Liz Lyon, eBankUK article. Ariadne, July 2003.


EPrints Workshop, January 2005 4

Learning & Teaching workflows

Research & e-Science workflows

Aggregator services:

eBank UK

Repositories: institutional, e-prints, subject, data, learning objects

Data curation: databases & databanks

Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules

Validation

Harvesting metadata

Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media

Resource discovery, linking, embedding

Deposit / self-archiving

Peer-reviewed publications: journals, conference proceedings

Publication

Validation

Data analysis, transformation, mining, modelling

Resource discovery, linking, embedding

Deposit / self-archiving

Learning object creation, re-use

Searching, harvesting, embedding

Quality assurance bodies

Validation

Presentation services: subject, media-specific, data, commercial portals

Resource discovery, linking, embedding

Linking


EPrints Workshop, January 2005 5

Current chemistry publishing protocols

Ideas and interpretations

Hooks into the literature

Results & derived data

Raw data!


EPrints Workshop, January 2005 6


EPrints Workshop, January 2005 7

Data Overload!

How do we disseminate?

EPSRC National Crystallography Service

The data deluge


EPrints Workshop, January 2005 8

CombeChem: eScience testbed

[Diagram labels: X-Ray e-Lab, Properties e-Lab, Analysis, Properties, Simulation, Video, Diffractometer, Grid Middleware, Structures Database]


EPrints Workshop, January 2005 9

Establishing common ground…

• Understand the data creation process
• Terminology and definitions
  – Data
  – Metadata
  – Datafile
  – Dataset
  – Data holding
• Different views
  – Digital library researchers, computer scientists, chemists
  – Generic vs specific
  – Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema


EPrints Workshop, January 2005 10

Crystallography workflow

• Initialisation: mount new sample on diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File format)
• Report: generate Crystal Structure Report

RAW DATA / DERIVED DATA / RESULTS DATA
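
As an illustration only, the workflow stages listed above could be modelled as an ordered list in which each stage is tagged with one of the three data categories. The stage names and activities come from the slide. The category assigned to each stage is an assumption made for this sketch, since the slide does not state where the boundaries fall, and the names `DataCategory`, `Stage`, `WORKFLOW` and `stages_for` are hypothetical.

```python
from enum import Enum
from typing import List, NamedTuple


class DataCategory(Enum):
    RAW = "raw data"
    DERIVED = "derived data"
    RESULTS = "results data"


class Stage(NamedTuple):
    name: str
    activity: str
    category: DataCategory


# Stage names and activities follow the slide. The category given to each
# stage is an assumption made for this sketch, not something the slide states.
WORKFLOW: List[Stage] = [
    Stage("Initialisation", "mount new sample and set up data collection", DataCategory.RAW),
    Stage("Collection", "collect data", DataCategory.RAW),
    Stage("Processing", "process and correct images", DataCategory.DERIVED),
    Stage("Solution", "solve structures", DataCategory.DERIVED),
    Stage("Refinement", "refine structure", DataCategory.DERIVED),
    Stage("CIF", "produce CIF", DataCategory.RESULTS),
    Stage("Report", "generate Crystal Structure Report", DataCategory.RESULTS),
]


def stages_for(category: DataCategory) -> List[str]:
    """Return the names of the stages tagged with a category, in workflow order."""
    return [stage.name for stage in WORKFLOW if stage.category is category]


if __name__ == "__main__":
    for category in DataCategory:
        print(category.value, "->", ", ".join(stages_for(category)))
```

Tagging each stage in this way would let an archive entry group its files by category, whatever mapping a real deployment settles on.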


EPrints Workshop, January 2005 11

Deposition into the archive


EPrints Workshop, January 2005 12

An Archive entry

ecrystals.chem.soton.ac.uk


EPrints Workshop, January 2005 13

Access to the underlying data


EPrints Workshop, January 2005 14

Some metadata issues

• Using simple and qualified Dublin Core
• Additional chemical information in schema for harvesting, e.g. empirical formula
• Schema contains International Chemical Identifier (InChI)
• Links to all datasets associated with an experiment
• Links to individual datasets within an experiment
• Links to EPrints (and other published literature) derived from the data
• Using vocabularies specific to crystallography
• Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge
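
A minimal sketch of how such a record might be assembled, assuming Python's standard `xml.etree.ElementTree`. The Dublin Core namespaces are the published DCMI ones. Everything else is hypothetical and is not the eBank schema: the `chem` namespace and its `empiricalFormula` and `inchi` elements, the choice of `dcterms:hasPart` for dataset links and `dcterms:isReferencedBy` for publication links, the `CrystalRecord` class, and the example URLs.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

DC_NS = "http://purl.org/dc/elements/1.1/"    # simple Dublin Core elements
DCTERMS_NS = "http://purl.org/dc/terms/"      # qualified Dublin Core terms
CHEM_NS = "http://example.org/chem-terms/"    # placeholder, NOT the eBank namespace


@dataclass
class CrystalRecord:
    title: str
    creator: str
    empirical_formula: str
    inchi: str
    dataset_links: List[str] = field(default_factory=list)
    publication_links: List[str] = field(default_factory=list)


def to_xml(record: CrystalRecord) -> str:
    """Serialise a record as XML mixing Dublin Core and chemistry-specific fields."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("dcterms", DCTERMS_NS)
    ET.register_namespace("chem", CHEM_NS)

    root = ET.Element("record")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = record.title
    ET.SubElement(root, f"{{{DC_NS}}}creator").text = record.creator

    # Chemistry-specific additions; the element names are hypothetical.
    ET.SubElement(root, f"{{{CHEM_NS}}}empiricalFormula").text = record.empirical_formula
    ET.SubElement(root, f"{{{CHEM_NS}}}inchi").text = record.inchi

    # One link per dataset, and one per publication derived from the data.
    for url in record.dataset_links:
        ET.SubElement(root, f"{{{DCTERMS_NS}}}hasPart").text = url
    for url in record.publication_links:
        ET.SubElement(root, f"{{{DCTERMS_NS}}}isReferencedBy").text = url

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = CrystalRecord(
        title="Illustrative crystal structure record",
        creator="Example, A.",
        empirical_formula="C6H6",
        inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
        dataset_links=["http://example.org/data/1/raw", "http://example.org/data/1/cif"],
        publication_links=["http://example.org/eprint/42"],
    )
    print(to_xml(example))
```

Keeping the chemistry fields in their own namespace means a harvester that only understands Dublin Core can still read the title, creator and links.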


EPrints Workshop, January 2005 16

Harvesting: OAIster


EPrints Workshop, January 2005 17

Linking and aggregating


EPrints Workshop, January 2005 18

Embedded in a science portal


EPrints Workshop, January 2005 19

Current situation

• Version 2.0 eBank metadata schema
• Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints.org software
• Exports records as ebank_dc and oai_dc
• Validation of schema & discussion with International Union of Crystallography for final developments and wider deployment
• Pilot eBank UK aggregator service
• Developing search interface Version 1.0
• Testing with PSIgate physical sciences portal – embedding eBank UK
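
For readers unfamiliar with harvesting, the sketch below shows how an aggregator might build an OAI-PMH `ListRecords` request for either export format. The `verb`, `metadataPrefix` and `from` arguments are part of OAI-PMH. The base URL is a placeholder rather than the real repository endpoint, and `list_records_url` is a hypothetical helper. No request is sent.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint for illustration; the real repository's OAI base URL
# is not given in the slides.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix selects the export format (e.g. "oai_dc" or "ebank_dc");
    from_date, if given, restricts the harvest to records changed since then.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Plain Dublin Core for generic harvesters.
    print(list_records_url(BASE_URL))
    # The richer eBank format, harvested incrementally.
    print(list_records_url(BASE_URL, "ebank_dc", from_date="2005-01-01"))
```

Offering both prefixes lets generic harvesters take the plain Dublin Core records, while an aggregator that understands the chemistry-specific fields can ask for `ebank_dc`.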


EPrints Workshop, January 2005 20

What’s next?

• Progress towards generic metadata schemas
• Validation against other schema (CCLRC Model)
• Eprints.org software: allow for more generic scientific data and schemas?
• Metadata enhancement: keywords based on knowledge of keywords in related publications?
• Investigate identifiers: International Chemical Identifier
• Explore context sensitive linking
• Full embedding into chemical and crystallographic research and publishing
• e-Learning embedding and pedagogic evaluation
• Feasibility study in related domains


EPrints Workshop, January 2005 21

Breakout Session?

• Describing non ‘Dublin Core’ terms
  – Qualified Dublin Core
  – Complex object formats: METS vs MPEG-21 DIDL
  – Set & Friends containers
• Compliance between schemas
  – One generic schema
  – Develop multiple schemas
• Rights
  – Use / reuse
  – Publisher
• Linking & aggregating
  – DOI
  – Keyword ontologies
  – Identifiers
  – Context sensitive linking