21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.
21 Nov 2006 Jeremy G. Frey University of Southampton
DCC Conference Glasgow
The curation of laboratory experimental data as part of the overall data lifecycle
Jeremy G. Frey
School of Chemistry, University of Southampton, UK
21 Nov 2006
DCC Conference, Glasgow
If you do things right at the start then all the following processes are much easier!
Exponentially growing amount of data - the future overwhelms the past
The CombeChem Project
End to End linking of data and
So collect data with regard to how it could eventually be used
Make sure the metadata is of high quality
Record properly at source in Digital Form
The Chemistry Lab
People & Machines working together
CombeChem
Smart Lab
R4L
e-Bank
E-Malaria
Instruments on the Grid
BioSimGrid
Statistics
Plan & COSHH
Digital Model
Information Integration
Report
Knowledge
Goal
Literature
Synthesis
Not just one laboratory but many co-laboratories working together
Analysis
Smart Laboratory
Smart Storage
Smart Dissemination
Smart HCI
The concept of Publication @ Source
Smart Workflow
If only I knew exactly how she did this experiment
I know all this supplementary information could be useful but will people really remember the format? Is it worth all the hassle?
I wish I could get the numbers from this graph - the pdf is not much use.
I wish I had recorded things at the start the way I do now…
Typical Laboratory
First, they do an online search
Need to make the data available
Need to be able to find it
But how to expose it?
I am sure we collected that information a few years ago…
The details should be in her thesis…
Can you read what he says here…?
Can you find the file of data that were used to make the plot?
Some of these problems are due to the lack of information recorded at the time. Others are due to loss of information over time.
What are the people up to?
Capture Data and Context
People
Process
Environment
Permanent, documented and primary record of laboratory observations
Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook.
If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA.
COSHH
Leverage off things we already have to do – “We have a cunning plan”
[Figure: experiment plan drawn as a numbered process graph, with the notebook record attached to the steps]
Materials: sample of 4-fluorinated biphenyl; butanone; sample of K2CO3 powder; sample of Br11OCB; water; DCM; MgSO4; silica; ether/petrol (ratio)
Quantities shown: 0.9031 g; 2.0719 g; 40 ml; 30 ml; 3 of 40 ml
Operations: Weigh; Measure; Add; Reflux; Cool; Liquid-liquid extraction; Dry; Filter (Buchner); Remove solvent by rotary evaporation; Fuse; Column chromatography
Butanone dried via silica column and measured into 100 ml RB flask.
Used 1 ml extra solvent to wash out container.
Started reflux at 13.30. (Had to change heater stirrer) Only reflux
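The plan-plus-record idea in the process graph above can be sketched as a small data structure. This is only an illustration: the class `PlanStep`, its field names and the helper `unrecorded` are hypothetical, and the pairing of materials with step numbers is invented for the example rather than read off the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanStep:
    """One planned operation; every field name here is hypothetical."""
    number: int
    action: str                       # e.g. "Weigh", "Measure", "Reflux"
    material: Optional[str] = None
    planned: Optional[str] = None     # planned amount, free text with units
    observed: Optional[str] = None    # value recorded at the bench
    notes: List[str] = field(default_factory=list)

    def record(self, observed=None, note=None):
        """Attach what actually happened to the step that planned it."""
        if observed is not None:
            self.observed = observed
        if note is not None:
            self.notes.append(note)


def unrecorded(plan):
    """Return the numbers of steps with neither an observation nor a note."""
    return [s.number for s in plan if s.observed is None and not s.notes]


# Illustrative plan; step numbers and pairings are assumptions for the sketch.
plan = [
    PlanStep(1, "Measure", "butanone", planned="40 ml"),
    PlanStep(2, "Reflux"),
    PlanStep(3, "Cool"),
]
plan[0].record(note="Used 1 ml extra solvent to wash out container.")
plan[1].record(note="Started reflux at 13.30. (Had to change heater stirrer)")
```

Because each note hangs off the step that planned it, the plan supplies the context for the record, and `unrecorded(plan)` lists the steps where nothing was captured at the time.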
Pub-Sub systems provide a flexible & extensible approach to the distribution of real-time laboratory monitoring & archiving
Data Source
Archive Client
Web Client
Mobile phone
Data Source
PDA
Message Broker
Translator Service BLOG
Air Conditioning failed
Smart Laboratory Spaces
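A minimal sketch of the publish-subscribe pattern named on this slide, assuming an in-process broker. The `MessageBroker` class, the topic name `lab/temperature` and the 30 degree alert threshold are all made up for illustration; the talk does not say which broker or protocol the project used.

```python
from collections import defaultdict


class MessageBroker:
    """Minimal in-process publish-subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return the count."""
        callbacks = self._subscribers[topic]
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


archive = []   # stands in for the archive client
alerts = []    # stands in for a mobile phone or PDA client


def archive_client(topic, message):
    archive.append((topic, message))


def alert_client(topic, message):
    # Hypothetical rule: warn when the room gets too warm.
    if message["celsius"] > 30:
        alerts.append(message)


broker = MessageBroker()
broker.subscribe("lab/temperature", archive_client)
broker.subscribe("lab/temperature", alert_client)

broker.publish("lab/temperature", {"celsius": 21.5})
broker.publish("lab/temperature", {"celsius": 34.0})  # e.g. air conditioning failed
```

The data source only knows the topic, not who is listening, so a new client (a blog, a web page) can be added without touching the instrument code.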
But what about the laboratory environment?
“I just realized, Howard, that everything in this apartment is more sophisticated than we are”
Semantic DataGrid
CombeChem used, tested & strained the Semantic Web for
Enhanced (annotated) DataGrid over multiple diverse stores
Storage of Provenance Information
Some Data Storage
Annotated multimedia streams
Units & Properties Ontology
Multiple Triple Stores
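To make the triple-store idea concrete, here is a minimal in-memory sketch of storing a value, its unit and its provenance as subject-predicate-object triples. It is not the CombeChem implementation: the `TripleStore` class and all identifiers such as `measurement:1`, `hasUnit` and `producedBy` are hypothetical, and a real deployment would use RDF URIs and a proper store.

```python
class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) string triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# A value is stored together with its unit, never as a bare number.
store.add("measurement:1", "hasValue", "0.9031")
store.add("measurement:1", "hasUnit", "unit:gram")
# Provenance: which step produced it, and who carried out that step.
store.add("measurement:1", "producedBy", "step:weigh")
store.add("step:weigh", "performedBy", "person:researcher-a")


def who_produced(store, measurement):
    """Follow producedBy then performedBy to find the responsible person."""
    step = store.match(measurement, "producedBy")[0][2]
    return store.match(step, "performedBy")[0][2]
```

Calling `who_produced(store, "measurement:1")` walks two links back from the number to the person, which is the kind of provenance query the slide has in mind.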
Laboratory “Blogs”
Laboratory notebook is a Blog
Encourage and facilitate collaboration
Need a data repository behind the Blog
R4L
E-Bank
Flexible
Service oriented approach being developed
A VRE
Instrument Blog
‘Blog-jects’
The ‘Scientific Blog’ is being tried in an attempt to combine laboratory notebooks and publication
Format Issues – everyday and for the long term
Note the use of “YouTube”
An experiment that failed… Publishable? Useful?
Record the ‘Scientific Conversation’ – this part of the record often exists only in the ‘grey literature’
CoAKTing
Memetic
Laboratory IRs and Information Management
Repositories
Validation
Increasing the value of data
How to bring all the necessary information together to enable appropriate validation
Increasingly difficult & expensive to achieve
Need provenance and context
Essential step otherwise just a collection of items
Why?
Publishing Data and Information
Loss
SVG “active” graphics
Link to data, follow links back to the raw data archive
Link to simulation, full simulation data archived in BioSimGrid
R4L
Paper organized using RDF
Access to information requires crossing administrative domains
Researcher
National Archive
Research Group
Institution
International Database
Research Group
Subversive and furtive sharing & exploitation of data in virtual space
Data
CAS
RDF
OAI Taxi
E-Labs
user
Digital Repository
He is charged with expressing contempt for meta-data
Metadata Lifecycle
Creation and maintenance of metadata
Need a metadata infrastructure as well as a data infrastructure
Capture process as well as results
Automatic metadata generation when possible
Human annotation will always be needed
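A rough sketch of what "automatic metadata generation when possible" plus human annotation might look like for a single data file. The functions `auto_metadata` and `annotate`, the record layout and the sample file contents are assumptions made for this example, not something described in the talk.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def auto_metadata(path):
    """Derive metadata that needs no human input from the file itself."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "filename": os.path.basename(path),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),   # detects later changes
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "annotations": [],   # left empty for a person to fill in
    }


def annotate(record, author, text):
    """Add a human annotation without touching the automatic fields."""
    record["annotations"].append({"author": author, "text": text})
    return record


# Usage with a throwaway file standing in for an instrument output.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"time_s,temperature_c\n0,21.5\n")
record = auto_metadata(tmp.name)
annotate(record, "researcher-a", "Heater stirrer was changed before this run.")
os.remove(tmp.name)
```

The automatic fields cost the researcher nothing and are filled in at capture time; the annotation list is where the context that only a person knows gets recorded.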
Plans
Plans are useful
This is the way things are supposed to be done
The Plan provides a digital context so increases the value of planning
Key to our ‘Smart Lab’ approach…
Is it the best way?
Who is responsible?
Context is crucial for curation
Every person, at each step of the process of converting data to knowledge
Need to consider the future access to this information by themselves and others.
Information Providers
Information Consumers
These are the same people – if we can ‘talk’ to ourselves efficiently over time then that is a good start to be able to ‘talk’ to others
All I am saying is that now is the time to develop the technology to deflect an asteroid
We must speed up the knowledge discovery process