oreChem: Planning and Enacting Chemistry on the Semantic Web
Microsoft Research eScience Workshop 2010, Berkeley, CA, USA
Mark Borkum, Simon Coles and Jeremy Frey
12 October 2010
2
Overview
• Introduction
• Ontology
• Case Study: X-ray Crystallography
• Future Work
• Summary
3
The Scientific Method
• A systematic process for knowledge acquisition
• Becoming increasingly data-intensive
4
The Data Deluge
• In Haiku:
– Lots of producers;
  Generating more data
  than ever before.
• 40 years ago, a PhD student would determine 3 structures over the entire course of their study!
The Great Wave off Kanagawa by Katsushika Hokusai
5
The Scientific Method (on the Web)
6
Provenance (The Elephant in the Room)
• The 7 W’s [Goble 2002]
– Who, What, Where, Why, When, Which, & (W)How
• The Why aspect is often ignored
7
The oreChem Project
• Funded by Microsoft Research
• Investigating the design and deployment of a semantic-based eScience infrastructure for Chemistry
• Project website:
– http://research.microsoft.com/en-us/projects/orechem/
oreChem
Dublin Core, FOAF, SIOC, OWL Time, GeoNames, etc…
8
oreChem Core Ontology
9
Planning
• Prospective provenance
• Describes a scientific experiment that will be enacted (in the future)
• Three entity types:
– Plan
– Plan Stage
– Plan Object
10
Enactment
• Retrospective provenance
• Describes a scientific experiment that was enacted
• Three entity types:
– Run
– Stage
– Object
“In theory, there is no difference between theory and practice. But, in practice, there is.” Unknown (possibly Yogi Berra)
12
Realisation (is not Instantiation)
• Each ‘run thing’ is linked to zero or one ‘plan thing’
– Deviation from the plan is allowed
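The link between enactment and planning entities can be sketched with plain data classes. This is only an illustration, not the oreChem ontology itself: the class and field names (such as `realises`) are assumptions chosen to mirror the slide terms, and the stage names in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanObject:
    name: str


@dataclass
class PlanStage:
    name: str


@dataclass
class Plan:
    name: str
    stages: List[PlanStage] = field(default_factory=list)
    objects: List[PlanObject] = field(default_factory=list)


@dataclass
class Object:
    name: str
    # Zero or one 'plan thing': None records an unplanned object.
    realises: Optional[PlanObject] = None


@dataclass
class Stage:
    name: str
    realises: Optional[PlanStage] = None


@dataclass
class Run:
    name: str
    realises: Optional[Plan] = None
    stages: List[Stage] = field(default_factory=list)
    objects: List[Object] = field(default_factory=list)


def unplanned(run: Run) -> List[str]:
    """Names of run stages and objects that realise nothing in the plan."""
    things = list(run.stages) + list(run.objects)
    return [t.name for t in things if t.realises is None]


# Hypothetical example: one planned stage, plus one deviation.
measure = PlanStage("measure diffraction")
plan = Plan("example plan", stages=[measure])
run = Run(
    "example run",
    realises=plan,
    stages=[Stage("measurement", realises=measure), Stage("re-measurement")],
)
print(unplanned(run))
```

Modelling the link as an optional reference rather than a class/instance relationship is what lets a run record stages that the plan never anticipated.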
13
X-RAY CRYSTALLOGRAPHY
Case Study
14
Current Practice in Crystallography
• Crystallography data is highly structured
– The de facto standard adopted by the community is the CIF (Crystallographic Information File)
• Relatively few crystal structures are openly available online
http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
15
Crystallography and Fraud
16
The eCrystals Federation
• JISC project
• Network of crystallography resources
• All published records are available as Open Data
• Based on EPrints repository
http://ecrystals.chem.soton.ac.uk/
17
eCrystal #20
• Each eCrystals record contains:
– Bibliographic metadata
– Fundamental and derived data (excluding raw images)
– Final structure solution
18
Single Crystal Structure Determination
1. Take powder specimen of chemical substance
2. Measure diffraction of X-rays
3. Compute electron densities
4. Solve for crystal structure
19
oreChem Plan for eCrystals
• Machine-readable representation of methodology
• Describes requirements for software and data products
• Available online at:
– http://ecrystals.chem.soton.ac.uk/plan.rdf
20
oreChem Run for eCrystal #20
• Exported by “oreChem” plug-in for EPrints 3.1
– RDF/XML serialisation
– Uses SWRL rules to infer causal relationships
• Describes:
– Software
– Data products
http://ecrystals.chem.soton.ac.uk/cgi/export/20/ORE_Chem/ecrystals-eprint-20.xml?include_xsl=1
21
Retrospective Provenance Graphs for eCrystal #20
[Figure: two graphs. “Stages and Objects” shows used (dashed) and emitted (solid) edges; “Objects” shows derivedFrom (solid) edges.]
Rule: used(?s, ?o1) ∧ emitted(?s, ?o2) → derivedFrom(?o2, ?o1)
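The rule on this slide (a stage that used ?o1 and emitted ?o2 implies that ?o2 is derived from ?o1) can be sketched as a join over triples. This is a minimal Python illustration and not the SWRL machinery used by the plug-in; the triple layout and the stage and file names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_derived_from(triples: Set[Triple]) -> Set[Triple]:
    """Apply: used(?s, ?o1) and emitted(?s, ?o2) implies derivedFrom(?o2, ?o1)."""
    used = [(s, o) for (s, p, o) in triples if p == "used"]
    emitted = [(s, o) for (s, p, o) in triples if p == "emitted"]
    return {
        (o2, "derivedFrom", o1)
        for (s1, o1) in used
        for (s2, o2) in emitted
        if s1 == s2  # both facts must concern the same stage
    }


# Hypothetical stage and file names, for illustration only.
triples = {
    ("stage_a", "used", "input.dat"),
    ("stage_a", "emitted", "output.dat"),
    ("stage_b", "used", "output.dat"),
    ("stage_b", "emitted", "report.dat"),
}
for triple in sorted(infer_derived_from(triples)):
    print(triple)
```

Applying the rule removes the stages from the picture and leaves an object-only graph, which is the shape the “Objects” view describes.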
22
Crystallography and Fraud – SPARQL
PREFIX orechem: <http://www.openarchives.org/2010/05/24-orechem-ns#>
PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>
SELECT ?run ?raw ?derived ?reported
WHERE {
  ?run a orechem:Run ;
       orechem:hasPlan ecrystals:Ecrystals ;
       orechem:containsObject ?raw ;
       orechem:containsObject ?derived ;
       orechem:containsObject ?reported .
  ?raw a orechem:File ;
       orechem:hasPlanObject ecrystals:HKL .
  ?derived a orechem:File ;
           orechem:derivedFrom ?raw .
  ?reported a orechem:File ;
            orechem:hasPlanObject ecrystals:CIF ;
            orechem:derivedFrom ?derived .
}
23
Crystallography and Fraud – SPARQL (2)
24
Crystallography and Fraud – SPARQL (3)
[Figure: provenance graph annotated with the query variables ?run, ?raw, ?derived and ?reported]
http://ecrystals.chem.soton.ac.uk/cgi/export/20/ORE_Chem/ecrystals-eprint-20.xml?include_xsl=1
25
Crystallography and Fraud – SPARQL (4)
?run               ?raw          ?derived      ?reported
_:eCrystal_20_Run  02sot126.hkl  02sot126.prp  02sot126.cif
_:eCrystal_20_Run  02sot126.hkl  02sot126.lst  02sot126.cif
_:eCrystal_20_Run  02sot126.hkl  02sot126.res  02sot126.cif
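The join performed by the query can be imitated in a few lines of Python. This is a rough sketch under stated assumptions: the derivation edges are reconstructed from the result rows above rather than read from the exported RDF, the `orechem:File` type checks are omitted, and the names `DERIVED_FROM`, `PLAN_OBJECT` and `chains` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Edges reconstructed from the result rows above (an assumption, not
# read from the exported RDF): (derived file, source file).
DERIVED_FROM: Set[Tuple[str, str]] = {
    ("02sot126.prp", "02sot126.hkl"),
    ("02sot126.lst", "02sot126.hkl"),
    ("02sot126.res", "02sot126.hkl"),
    ("02sot126.cif", "02sot126.prp"),
    ("02sot126.cif", "02sot126.lst"),
    ("02sot126.cif", "02sot126.res"),
}

# Which plan object each file realises (hasPlanObject in the query).
PLAN_OBJECT: Dict[str, str] = {
    "02sot126.hkl": "HKL",
    "02sot126.cif": "CIF",
}


def chains(run: str) -> List[Tuple[str, str, str, str]]:
    """Find raw -> derived -> reported chains, as the query does."""
    rows = []
    for reported, derived in DERIVED_FROM:
        if PLAN_OBJECT.get(reported) != "CIF":
            continue
        for derived2, raw in DERIVED_FROM:
            if derived2 == derived and PLAN_OBJECT.get(raw) == "HKL":
                rows.append((run, raw, derived, reported))
    return sorted(rows)


for row in chains("_:eCrystal_20_Run"):
    print(row)
```

A reported CIF for which such a search returns no rows has no recorded path back to raw HKL data, which is presumably the situation the fraud slides are concerned with.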
26
Future Work
• oreChem Core Ontology
– Support for conditionals and continuations
• oreChem Lower Ontology
– Specialised for Physical and Computational Chemistry
• Applications and Services
– oreChem Plan Designer and Enactor
– oreChem Run Inspector
27
Summary
• <summary/>
28
Acknowledgements
• Microsoft Research
– Tony Hey
– Lee Dirks
– Savas Parastatidis
– Alex Wade
• oreChem Project
– Carl Lagoze, Theresa Velden
– Jeremy Frey, Simon Coles
– Peter Murray-Rust, Nick Day, Jim Downing
– C. Lee Giles, Prasenjit Mitra, William Brouwer, Na Li
– Marlon Pierce, Sashi Kiran Challa
29
Thank You
• Questions?