21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Dec 22, 2015

Page 1: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

The curation of laboratory experimental data as part of the overall data lifecycle

Jeremy G. Frey
School of Chemistry, University of Southampton, UK

21 Nov 2006

DCC Conference, Glasgow

Page 2: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

If you do things right at the start then all the following processes are much easier!

Exponentially growing amount of data - the future overwhelms the past

Page 3: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

The CombeChem Project

End to End linking of data and information
Publication@Source
So collect data with regard to how it could eventually be used
Make sure the metadata is of high quality
Record properly at source in Digital Form

The Chemistry Lab
People & Machines working together

Page 4: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Combechem

Smart Lab

R4L

e-Bank

E-Malaria

Instruments on the Grid

BioSimGrid

Statistics

Page 5: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

The concept of Publication @ Source

Diagram labels: Plan & COSHH, Digital Model, Information Integration, Report, Knowledge, Goal, Literature, Synthesis, Analysis

not just one laboratory but many co-laboratories working together

Smart Laboratory
Smart Storage
Smart Dissemination
Smart HCI
Smart Workflow

Page 6: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

If only I knew exactly how she did this experiment

I know all this supplementary information could be useful but will people really remember the format? Is it worth all the hassle?

I wish I could get the numbers from this graph - the pdf is not much use.

I wish I had recorded things at the start the way I do now…..

Typical Laboratory

Page 7: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

First, they do an online search

Need to make the data available

Need to be able to find it

But how to expose it?

Page 8: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

I am sure we collected that information a few years ago…

The details should be in her thesis…..

Can you read what he says here….?

Can you find the file of data that were used to make the plot?

Some of these problems are due to the lack of information recorded at the time. Others are due to loss of information over time.

Page 9: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

What are the people up to?

Capture Data and Context
People
Process
Environment

Page 10: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Permanent, documented and primary record of laboratory observations

Page 11: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook

If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA

Page 12: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

COSHH

Leverage off things we already have to do – “We have a cunning plan”

Page 13: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

[Figure: CombeChem process diagram for a synthesis, pairing each planned step with the recorded process, its inputs and its observations. Combechem, 30 January 2004; gvh, hrm, gms.]

Plan (To Do List)

Dissolve 4-fluorinated biphenyl in butanone
Add K2CO3 powder
Heat at reflux for 1.5 hours
Cool and add Br11OCB
Heat at reflux until completion
Cool and add water (30 ml)
Extract with DCM (3 x 40 ml)
Combine organics, dry over MgSO4 & filter
Remove solvent in vacuo
Fuse compound to silica & column in ether/petrol

Process (Record)

Add, Add, Reflux, Cool, Add, Reflux, Cool, Add, Liquid-liquid extraction, Dry, Filter (Buchner), Remove Solvent by Rotary Evaporation, Fuse, Column Chromatography

Inputs and measurements

Sample of 4-fluorinated biphenyl: Weigh, 0.9031 grammes
Butanone: Measure, 40 ml
Sample of K2CO3 powder: Weigh, 2.0719 g
Sample of Br11OCB: Weigh, 1.5918 g
Water: Measure, 30 ml
DCM: Measure, 3 of 40 ml
MgSO4
Silica
Ether/Petrol Ratio
Measure: excess

Annotations (text)

Butanone dried via silica column and measured into 100 ml RB flask.
Used 1 ml extra solvent to wash out container.
Started reflux at 13.30. (Had to change heater stirrer.) Only reflux for 45 min, next step 14:15.
Inorganics dissolve, 2 layers. Added brine ~20 ml.
Organics are yellow solution.
Washed MgSO4 with DCM ~50 ml.

Observation Types

weight - grammes
measure - ml, drops
annotate - text
temperature - K, °C

Key

Process
Input
Literal
Observation

Future Questions

Whether to have many subclasses of processes or fewer with annotations
How to depict destructive processes
How to depict taking lots of samples
What is the observation/process boundary? e.g. MRI scan

Ingredient List

Fluorinated biphenyl 0.9 g
Br11OCB 1.59 g
Potassium Carbonate 2.07 g
Butanone 40 ml
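
The key on this slide (Process, Input, Literal, Observation) and its list of observation types suggest a record structure in which each process step carries its planned description, its inputs, and typed observations. The following is only a minimal sketch of that idea, not the CombeChem schema: the class names `Observation` and `ProcessStep`, the field names, and the unit check are all assumptions made for illustration. The values are the ones shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Units permitted for each observation type, taken from the slide's
# "Observation Types" list. The dictionary layout itself is an assumption.
OBSERVATION_UNITS = {
    "weight": {"grammes"},
    "measure": {"ml", "drops"},
    "annotate": {"text"},
    "temperature": {"K", "°C"},
}


@dataclass(frozen=True)
class Observation:
    """A literal value recorded at source, tagged with its type and unit."""
    kind: str
    value: Union[float, str]
    unit: str

    def __post_init__(self) -> None:
        allowed = OBSERVATION_UNITS.get(self.kind)
        if allowed is None:
            raise ValueError(f"unknown observation type: {self.kind!r}")
        if self.unit not in allowed:
            raise ValueError(f"unit {self.unit!r} not valid for {self.kind!r}")


@dataclass
class ProcessStep:
    """One process node: what was planned, what went in, what was observed."""
    process: str
    planned: str
    inputs: List[str] = field(default_factory=list)
    observations: List[Observation] = field(default_factory=list)

    def annotations(self) -> List[str]:
        # Free-text notes are the observations of type "annotate".
        return [str(o.value) for o in self.observations if o.kind == "annotate"]


# The first three steps of the synthesis, using values from the slide.
record = [
    ProcessStep(
        process="Add",
        planned="Dissolve 4-fluorinated biphenyl in butanone",
        inputs=["Sample of 4-fluorinated biphenyl", "Butanone"],
        observations=[
            Observation("weight", 0.9031, "grammes"),
            Observation("measure", 40, "ml"),
            Observation("annotate", "Used 1ml extra solvent to wash out container.", "text"),
        ],
    ),
    ProcessStep(
        process="Add",
        planned="Add K2CO3 powder",
        inputs=["Sample of K2CO3 powder"],
        observations=[Observation("weight", 2.0719, "grammes")],
    ),
    ProcessStep(
        process="Reflux",
        planned="Heat at reflux for 1.5 hours",
        observations=[
            Observation("annotate", "Started reflux at 13.30. Only reflux for 45min, next step 14:15.", "text"),
        ],
    ),
]
```

Keeping the planned text beside the annotations is what makes a deviation visible later, such as the reflux that ran for 45 minutes rather than the planned 1.5 hours.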

Page 14: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

[Figure: detail of the process diagram on the previous page, showing the first three steps.]

Plan

Dissolve 4-fluorinated biphenyl in butanone
Add K2CO3 powder
Heat at reflux for 1.5 hours

Process

Add, Add, Reflux

Inputs and measurements

Sample of 4-fluorinated biphenyl: Weigh, 0.9031 grammes
Butanone: Measure, 40 ml
Sample of K2CO3 powder: Weigh, 2.0719 g

Annotations (text)

Butanone dried via silica column and measured into 100 ml RB flask.
Used 1 ml extra solvent to wash out container.
Started reflux at 13.30. (Had to change heater stirrer.) Only reflux for 45 min, next step 14:15.

Ingredient List

Fluorinated biphenyl 0.9 g
Br11OCB 1.59 g
Potassium Carbonate 2.07 g
Butanone 40 ml

Page 15: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Pub-Sub systems provide the flexible & extensible approach to distribution of real time laboratory monitoring & archiving

Diagram labels: Data Source, Data Source, Message Broker, Archive Client, Web Client, Mobile phone, PDA, Translator Service, BLOG

Air Conditioning failed

Smart Laboratory Spaces
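
The publish-subscribe arrangement on this slide can be illustrated with a small in-process broker: data sources publish to a topic, and any number of clients (archive, web, phone) subscribe without the source knowing about them. This is a sketch under stated assumptions, not the system used in the project. The `MessageBroker` class, the topic name `lab/temperature`, the message fields, and the 30 °C alert threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


class MessageBroker:
    """Minimal in-memory broker: routes each message to a topic's subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message and return how many handlers received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


archive: List[tuple] = []   # stands in for the archive client
alerts: List[str] = []      # stands in for the mobile phone client


def archive_client(topic: str, message: dict) -> None:
    # The archive keeps every reading.
    archive.append((topic, message))


def phone_client(topic: str, message: dict) -> None:
    # Hypothetical rule: only readings above 30 degrees C raise an alert.
    if message["celsius"] > 30.0:
        alerts.append(f"{topic}: {message['celsius']} C, air conditioning failed?")


broker = MessageBroker()
broker.subscribe("lab/temperature", archive_client)
broker.subscribe("lab/temperature", phone_client)

# A data source publishes readings without knowing who is listening.
broker.publish("lab/temperature", {"celsius": 21.5})
broker.publish("lab/temperature", {"celsius": 35.0})
```

Adding a further client, such as a blog poster, needs only another `subscribe` call, which is the flexibility and extensibility the slide attributes to this approach.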

Page 16: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

But what about the laboratory environment?

“I just realized, Howard, that everything in this apartment is more sophisticated than we are”

Page 17: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Semantic DataGrid

CombeChem used, tested & strained the Semantic Web for
Enhanced (annotated) DataGrid over multiple diverse stores
Storage of Provenance Information
Some Data Storage
Annotated multimedia streams
Units & Properties Ontology
Multiple Triple Stores
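
A triple store holds statements of the form subject, predicate, object, which is how a measured value, its unit, and its provenance can all be attached to the same item. The sketch below is a toy in-memory store written for illustration only; it is not an RDF library and not the CombeChem ontology. The `TripleStore` class and every `ex:` identifier are hypothetical, and only the mass value comes from the slides.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy store of (subject, predicate, object) statements with pattern matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# A weighing described as a value, a unit, and where it came from.
store.add("ex:sample1", "ex:hasMass", "ex:mass1")
store.add("ex:mass1", "ex:value", "0.9031")
store.add("ex:mass1", "ex:unit", "ex:gramme")
store.add("ex:mass1", "ex:recordedIn", "ex:notebookEntry1")

unit_of_mass = store.match(subject="ex:mass1", predicate="ex:unit")
```

Because the unit and the provenance link are statements in their own right, they can live in different stores from the value and still be joined on the shared identifier.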

Page 18: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Laboratory “Blogs”

Laboratory notebook is a Blog
Encourage and facilitate collaboration
Need a data repository behind the Blog
R4L
E-Bank
Flexible Service oriented approach being developed
A VRE

Page 19: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Instrument Blog

‘Blog-jects’

Page 20: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

The ‘Scientific Blog’ is being tried in an attempt to combine laboratory notebooks and publication

Page 21: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Format Issues – everyday and for the long term

Page 22: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Note the use of “YouTube”

An experiment that failed… Publishable? Useful?

Page 23: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Record the ‘Scientific Conversation’ – this part of the record often exists only in the ‘grey literature’

CoAKTing

Memetic

Page 24: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Laboratory IRs and Information Management

Page 25: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Repositories

Page 26: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Validation

Increasing the value of data
How to bring all the necessary information together to enable appropriate validation
Increasingly difficult & expensive to achieve
Need provenance and context
Essential step otherwise just a collection of items

Page 27: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Why?

Publishing Data and Information

Loss

Page 28: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

SVG “active” graphics

Link to data, follow links back to the raw data archive

Link to simulation, full simulation data archived in BioSimGrid

R4L

Paper organized using RDF

Page 29: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Access to information requires crossing administrative domains

Diagram labels: Researcher, Research Group, Research Group, Institution, National Archive, International Database

Page 30: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Subversive and furtive sharing & exploitation of data in virtual space

Diagram labels: Data, CAS, RDF, OAI Taxi, E-Labs, user, Digital Repository

Page 31: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

He is charged with expressing contempt for meta-data

Page 32: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Metadata Lifecycle

Creation and maintenance of metadata
Need a metadata infrastructure as well as a data infrastructure
Capture process as well as results
Automatic metadata generation when possible
Human annotation will always be needed
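
One way to picture "automatic metadata generation when possible" alongside human annotation is a capture function that derives what it can from the data itself (checksum, size, timestamp) and leaves a slot for the note only a person can supply. This is a sketch under assumptions: the function name `make_metadata`, the field names, and the example instrument and operator identifiers are all hypothetical and not taken from the project.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def make_metadata(
    data: bytes,
    instrument: str,
    operator: str,
    annotation: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build a metadata record for one captured data item.

    Checksum, size and timestamp are generated automatically; the
    annotation is the part a person has to supply.
    """
    captured_at = now if now is not None else datetime.now(timezone.utc)
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "captured_at": captured_at.isoformat(),
        "instrument": instrument,
        "operator": operator,
        "annotation": annotation,
    }


# Example capture with a fixed timestamp so the record is reproducible.
reading = b"0.9031 g\n"
metadata = make_metadata(
    reading,
    instrument="balance-1",      # hypothetical instrument identifier
    operator="researcher-a",     # hypothetical operator identifier
    annotation="Used 1ml extra solvent to wash out container.",
    now=datetime(2004, 1, 30, 13, 30, tzinfo=timezone.utc),
)
```

The checksum lets a later reader confirm that the archived file is the one the record describes, while the annotation field carries the context that cannot be generated.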

Page 33: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Plans

Plans are useful
This is the way things are supposed to be done
The Plan provides a digital context so increases the value of planning
Key to our ‘Smart Lab’ approach….
Is it the best way?

Page 34: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Who is responsible

Context is crucial for curation
Every person, on each step of the process of converting data to knowledge
Need to consider the future access to this information by themselves and others.

Page 35: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Information Providers

Information Consumers

These are the same people – if we can ‘talk’ to ourselves efficiently over time then that is a good start to be able to ‘talk’ to others

Page 36: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

All I am saying is that now is the time to develop the technology to deflect an asteroid

We must speed up the knowledge discovery process

Page 37: 21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

PEOPLE

Southampton ECS, MATHS & CHEMISTRY
IT-INNOVATION
BRISTOL
UKOLN
CCLRC
INDIANA
SYDNEY
MANCHESTER

EPSRC e-Science & Chemistry Programmes

JISC e-Infrastructure

DTI

See web site for full details and links

www.combechem.org