Curation of Chemistry Data from the Laboratory to Publication. Jeremy Frey & Simon Coles, School of Chemistry, University of Southampton

Jan 04, 2016

Page 1

Curation of Chemistry Data from the Laboratory to Publication

Jeremy Frey & Simon Coles
School of Chemistry
University of Southampton

Page 2

AHM2006 Data Curation Workshop

The CombeChem Project

End-to-end linking of data and information

Laboratory to publication and back again

Very long data chains can be involved, e.g. from a chemistry lab to mouse genetic expression

The exponential world of combinatorial synthesis and high-throughput analysis meets the exponentially growing power of computing: “Automation, Semantics & the Grid”

Page 3

[Diagram labels: Plan & COSHH; Digital Model; Information Integration; Report; Knowledge; Goal; Literature; Synthesis; Analysis; Smart Laboratory; Smart Storage; Smart Dissemination; Smart HCI]

Not just one laboratory but many co-laboratories working together

Page 4

Problems with ‘Small Laboratory’ Working Practice

“Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant”

“Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits”

“To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data”

“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”

‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)

Page 5

The concept of Publication@Source

Trace all the way back from publication to the original data – provenance

The data is the key - DataGrid

Start as you mean to go on – ELNs are a necessity

Curation of subsequently produced data
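Tracing from a publication back to the original data can be pictured as following "derived from" links until nothing further is recorded. The sketch below is only an illustration of that idea: the record identifiers and the `DERIVED_FROM` mapping are hypothetical and do not come from the CombeChem software.

```python
# Hypothetical provenance links: each item maps to the items it was derived from.
DERIVED_FROM = {
    "paper:figure-2": ["analysis:fit-results"],
    "analysis:fit-results": ["processed:spectrum-17"],
    "processed:spectrum-17": ["raw:instrument-run-17", "eln:entry-42"],
}


def trace_to_source(item, links=DERIVED_FROM):
    """Return every item reachable from `item`, in depth-first order."""
    seen = []
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push in reverse so the first listed source is visited first.
        stack.extend(reversed(links.get(current, [])))
    return seen


def original_sources(item, links=DERIVED_FROM):
    """Items in the trace with no recorded parent, i.e. the original data."""
    return [node for node in trace_to_source(item, links) if node not in links]


if __name__ == "__main__":
    for node in trace_to_source("paper:figure-2"):
        print(node)
```

Starting the record in an electronic lab notebook matters here because the trace can only go as far back as the links that were captured at the time.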

Page 6

Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook.

If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA.

Page 7

Lab books are a big block to publication@source: if it’s not digital, it is more difficult to share

Need a usable digital lab book. Design by analogy to help Chemists and Computer Scientists work together.

Only some equipment is networked

This is where it all starts: The Lab & The Lab Book

Page 8

COSHH: leverage off things we already have to do

Page 9

[Diagram: To Do List, Plan and Process Record for a synthesis. Combechem, 30 January 2004, gvh, hrm, gms]

Ingredient List
Fluorinated biphenyl 0.9 g
Br11OCB 1.59 g
Potassium Carbonate 2.07 g
Butanone 40 ml

Plan
Dissolve 4-fluorinated biphenyl in butanone
Add K2CO3 powder
Heat at reflux for 1.5 hours
Cool and add Br11OCB
Heat at reflux until completion
Cool and add water (30 ml)
Liquid-liquid extraction: extract with DCM (3 x 40 ml)
Combine organics, dry over MgSO4 & filter
Remove solvent in vacuo
Fuse compound to silica & column in ether/petrol

Process Record
Inputs: sample of 4-fluorinated biphenyl; butanone; sample of K2CO3 powder; sample of Br11OCB; water; DCM; MgSO4; silica; ether/petrol
Processes: Add; Reflux; Cool; Liquid-liquid extraction; Dry; Filter (Buchner); Remove solvent by rotary evaporation; Fuse; Column chromatography
Weigh (4-fluorinated biphenyl): 0.9031 grammes
Measure (butanone): 40 ml
Weigh (K2CO3 powder): 2.0719 g
Weigh (Br11OCB): 1.5918 g
Measure (water): 30 ml
Measure (DCM): 3 of 40 ml
Measure (MgSO4): excess
Ether/petrol ratio
Annotate: Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container.
Annotate: Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45 min, next step 14:15.
Annotate: Inorganics dissolve, 2 layers. Added brine ~20 ml.
Annotate: Organics are yellow solution
Annotate: Washed MgSO4 with DCM ~50 ml

Observation Types
weight - grammes
measure - ml, drops
annotate - text
temperature - K, °C

Key: Process, Input, Literal, Observation

Future Questions
Whether to have many subclasses of processes or fewer with annotations
How to depict destructive processes
How to depict taking lots of samples
What is the observation/process boundary? e.g. MRI scan
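One way to read the pairing of a plan with its process record is that each planned amount gets a matching measured observation, so the two can be compared later. The following is a minimal sketch of that reading, not the CombeChem data model: the `Observation` class is hypothetical, the numbers are copied from the slide, and pairing 1.5918 g with Br11OCB is an assumption based on the ingredient list.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    material: str
    planned: float   # amount from the ingredient list
    measured: float  # amount recorded at the bench
    unit: str

    def deviation_percent(self):
        """Relative difference between measured and planned amounts."""
        return 100.0 * (self.measured - self.planned) / self.planned


# Values copied from the slide; pairing 1.5918 g with Br11OCB is an assumption.
observations = [
    Observation("Fluorinated biphenyl", 0.9, 0.9031, "g"),
    Observation("Potassium carbonate", 2.07, 2.0719, "g"),
    Observation("Br11OCB", 1.59, 1.5918, "g"),
    Observation("Butanone", 40.0, 40.0, "ml"),
]

for obs in observations:
    print(f"{obs.material}: planned {obs.planned} {obs.unit}, "
          f"measured {obs.measured} {obs.unit} "
          f"({obs.deviation_percent():+.2f}%)")
```

Free-text annotations, such as the shortened reflux time, would sit alongside these numeric observations rather than replace them.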

Page 10

[Diagram: detail of the first three steps of the plan and process record]

Plan
Dissolve 4-fluorinated biphenyl in butanone
Add K2CO3 powder
Heat at reflux for 1.5 hours

Process Record
Weigh (sample of 4-fluorinated biphenyl): 0.9031 grammes
Measure (butanone): 40 ml
Weigh (sample of K2CO3 powder): 2.0719 g
Annotate: Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container.
Annotate: Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45 min, next step 14:15.

Ingredient List
Fluorinated biphenyl 0.9 g
Br11OCB 1.59 g
Potassium Carbonate 2.07 g
Butanone 40 ml

Page 11

[Diagram: RDF graph of the “Making Tea” example, with Steps, Plan and Process Record layers. Smarttea.org]

File: combechem/process/tea.rdf
Ontology: combechem/process/process-record.rdfs
13:41:36 14 July 2004 © 2004 University of Southampton

Key: Process, Input, Literal, Observation

Experiment: MakingTea
experiment-pretty-name: The basic tea experiment
experiment-description: Add tea leaves to hot water, refluxing, filtering, drinking (maybe)
experimenter: http://www.ecs.soton.ac.uk/info/#person-00389

Steps (step-text)
step1: Add tea to hot water
step2: Heat tea for 5 minutes
step3: Filter off tea leaves

Plan nodes: planned_tea_leaves; plan-to-weigh_tea_leaves; planned-weight_of_tea_leaves (5, &cec;massunit-gramme); planned_some_water; plan-to-measure_some_water; planned-volume_of_some_water (300, &cec;volumeunit-millilitre); plan-to-add_tea_to_water; plan-to-tea_in_water; plan-to-heat_tea_in_water; plan-to-hot_tea; plan-to-filter_tea; plan-to-finished_tea

Process Record nodes: tea_leaves; weighing_tea_leaves; weight_of_tea_leaves (5.021, &cec;massunit-gramme); some_water; measuring_some_water; volume_of_some_water (310, &cec;volumeunit-millilitre); add_tea_to_water; tea_in_water; heat_tea_in_water; hot_tea; filter_tea; finished_tea; watching_tea_boil; heat_tea_notes (value: <tabletscribble>)

Properties: starting-process; starting-step; experiment-goal; next-step; part-of-step; step-text; processed-by; processed-by-iv; produces-substance; produces-observation; material-observed-by; process-observed-by; has-unitvalue; material-is-ingredient-of; material-record-of; process-record-of

Namespaces
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
xsd http://www.w3.org/2001/XMLSchema#
akt http://www.aktors.org/ontology/portal#
cml http://www.xml-cml.org/schema/cml2/core
cec http://www.combechem.org/ontology/process/0.1#
st http://smarttea.org/#

getRecord()

There is a potential containment problem in pulling back partial RDF graphs from the triple store.

Solved by using multiple triple stores but boundaries are a major issue for the future.
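The containment problem can be illustrated with a toy graph: following links from one node may pull in an ever larger part of the store unless a boundary is imposed. The sketch below uses plain tuples rather than a real triple store. The node and property names are taken from the tea example, but which node connects to which, the `st:` and `cec:` prefixes on each name, and the `subgraph` function with its hop limit are all assumptions made for illustration.

```python
from collections import deque

# A few triples loosely modelled on the tea example. Which node connects to
# which is an assumption; only the names come from the slide.
TRIPLES = [
    ("st:MakingTea", "cec:starting-process", "st:weighing_tea_leaves"),
    ("st:weighing_tea_leaves", "cec:produces-observation", "st:weight_of_tea_leaves"),
    ("st:weight_of_tea_leaves", "cec:has-unitvalue", "5.021"),
    ("st:tea_leaves", "cec:processed-by", "st:add_tea_to_water"),
    ("st:add_tea_to_water", "cec:produces-substance", "st:tea_in_water"),
    ("st:tea_in_water", "cec:processed-by", "st:heat_tea_in_water"),
    ("st:heat_tea_in_water", "cec:produces-substance", "st:hot_tea"),
    ("st:hot_tea", "cec:processed-by", "st:filter_tea"),
    ("st:filter_tea", "cec:produces-substance", "st:finished_tea"),
]


def subgraph(triples, start, max_depth):
    """Collect triples reachable from `start` by following subject -> object
    links, stopping after `max_depth` hops so the result stays bounded."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    result = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # boundary reached: do not expand this node further
        for triple in by_subject.get(node, []):
            result.append(triple)
            obj = triple[2]
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, depth + 1))
    return result


if __name__ == "__main__":
    for triple in subgraph(TRIPLES, "st:tea_leaves", max_depth=2):
        print(triple)
```

A fixed hop count is the crudest possible boundary; it shows why the choice of where a record ends is a design decision rather than something the graph supplies by itself.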

Page 12

Architecture

[Diagram labels: Applications; Weights & Measures; Bench; Planner0; Viewer0; PH; PJava; “Client” Libraries; SOAP; Jena; SURIG; Data stores; Semantic Data; Other services; Institutional archives and metadata publication]

Page 13

The Analytical Laboratory

Capture information from places you would not want to put your eyes

Capture environmental data automatically

Capture people and movements

Provide this information in real time as well as for the laboratory record

Page 14

[Diagram labels: Data Source; Data Source; Message Broker; Translator Service; Archive Client; Web Client; Mobile phone; PDA; BLOG]

Pub-Sub systems provide the flexible & extensible approach to distribution
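The publish/subscribe pattern behind the message broker can be sketched in a few lines. This is an in-process toy, not the broker used in the project: the `MessageBroker` class, the topic name `lab/temperature` and the 30 °C alert threshold are all hypothetical.

```python
from collections import defaultdict


class MessageBroker:
    """Minimal in-process publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to every subscriber of this topic; return how many received it.
        for callback in self._subscribers[topic]:
            callback(topic, message)
        return len(self._subscribers[topic])


broker = MessageBroker()
archive = []   # stands in for the archive client
alerts = []    # stands in for a mobile phone or PDA client

# The archive keeps everything; the alert client only keeps hot readings.
broker.subscribe("lab/temperature", lambda t, m: archive.append((t, m)))
broker.subscribe("lab/temperature",
                 lambda t, m: alerts.append(m) if m["celsius"] > 30 else None)

broker.publish("lab/temperature", {"sensor": "room", "celsius": 21.5})
broker.publish("lab/temperature", {"sensor": "room", "celsius": 34.0})
```

The data source never needs to know which clients exist, so a new consumer such as a blog feed can be added by subscribing it, without touching the publisher.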

Page 15

Temperature – room, laser

Door & interlock, Motion Sensors

Air Conditioning failed

Page 16

Databases - Our experience

What do you do when the actual users keep changing their mind?

Is a traditional relational database suitable?

Danger of reinforcing scientific bias against relational databases for laboratory data.

RDF & Triple stores were again the solution
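Why triples cope with changing requirements can be shown with a toy store: a new kind of fact is just another triple, so nothing resembling a table definition has to change. The sketch below is an assumption-laden illustration; the identifiers, predicates and values are invented, and a real triple store would offer far more than a Python set.

```python
# Each fact is a (subject, predicate, object) triple; all names are illustrative.
store = set()


def add(subject, predicate, obj):
    store.add((subject, predicate, obj))


def describe(subject):
    """Return all predicate/object pairs recorded for a subject."""
    return sorted((p, o) for s, p, o in store if s == subject)


# Initial requirement: record a melting point (invented value).
add("sample:42", "has-melting-point-celsius", 118.5)

# Later the users also want solvent and operator recorded. No table has to be
# altered; new predicates are simply new triples.
add("sample:42", "recrystallised-from", "ethanol")
add("sample:42", "measured-by", "person:0001")

for predicate, value in describe("sample:42"):
    print(predicate, value)
```

The trade-off is that the structure a relational schema would enforce now has to be supplied by an ontology or by convention.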

Page 17

RDF/RDFS High level Schema for chemical properties

Page 18

Page 19

Triple Stores - The Heart of the Semantic Web

Scaling - 3Store response

Memory leak in testing program!

Page 20

Scaling the triplestores

Moved from…

A model of harvesting data from multiple sources into one scalable store

to

A model of distributed RDF sources and caching what is needed for the task at hand into multiple stores fit-for-purpose

The Semantic Web!
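The second model, distributed sources with a task-specific cache, might look roughly like the sketch below. It is a simplified illustration under stated assumptions: `RemoteSource` and `TaskCache` are hypothetical classes, the sources are in-memory lists rather than networked RDF endpoints, and the example triples are invented.

```python
class RemoteSource:
    """Stands in for a distributed RDF source; counts how often it is queried."""

    def __init__(self, triples):
        self._triples = list(triples)
        self.requests = 0

    def triples_about(self, subject):
        self.requests += 1
        return [t for t in self._triples if t[0] == subject]


class TaskCache:
    """Small local store holding only the triples a given task has asked for."""

    def __init__(self, sources):
        self._sources = sources
        self._cache = {}

    def triples_about(self, subject):
        if subject not in self._cache:
            found = []
            for source in self._sources:
                found.extend(source.triples_about(subject))
            self._cache[subject] = found
        return self._cache[subject]


# Invented example data held by two separate sources.
lab = RemoteSource([("compound:1", "has-melting-point", "118 C")])
crystal = RemoteSource([("compound:1", "has-space-group", "P21/c")])
cache = TaskCache([lab, crystal])

first = cache.triples_about("compound:1")
second = cache.triples_about("compound:1")  # served from the local cache
```

Each task builds its own small store, so no single store has to scale to hold everything; the cost is deciding when cached triples have gone stale, which this sketch ignores.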

Page 21

Experiments on the Grid: The NCS Service

HTTPS

Page 22

Binary raw data archived in Atlas Datastore

x300

ADS£’s

Page 23

A Data-Rich Subject – the Crystallography Problem

[Figure: chemical structure diagrams (atom labels Cl, O, N, N+)]

30,000,000

1.5,000,000

450,000

Page 24

The eCrystals Digital Repository

http://ecrystals.chem.soton.ac.uk

Page 25

Access to the underlying data

Page 26

The eCrystals ‘Global’ Model

[Diagram labels: Laboratory repository; Deposit; Institutional data repositories; Validation; Deposit; Publishers: peer-review journals, conference proceedings, etc.; Publication; Validation; Aggregator services; Search, harvest; Data analysis, transformation, mining, modelling; Presentation services / portals; Data discovery, linking, citation; Preservation and curation]

Page 27

Laboratory Repositories and Information Management

Page 28

Need for a data archive in the laboratory

Not just the published spectra!

Page 29

The R4L Repository

Deposit

Search / Browse

Create new compound

Add experiment data and metadata

Page 30

Several groups making and analysing; the library Administrative Domains transfer or share the data

[Diagram labels: Researcher; Research Group; Research Group; Institution; National Archive; International Database]

Page 31

SVG “active” graphics

Link to data, follow links back to the raw data archive

Link to simulation, full simulation data archived in BioSimGrid

R4L

Paper organized using RDF

Page 32

Summary:

Making sure other people can find, understand and re-use your data easily and with confidence (even when there is a huge amount of it!)

Make use of Plans to inform the digital context - metadata in advance

Have concern for the “End-to-End life cycle” of chemistry information from the start.

Understanding Usability and Human Computer Interaction is vital for adoption