Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source and Open Standards
Egon Willighagen <http://chem-bla-ics.blogspot.com/>
Bioclipse & Proteochemometric Group (Prof. Wikberg)
Department of Pharmaceutical Biosciences
Uppsala University
2009-08-31
Problem
Solution
Results
Discussions
Conclusion
The Setting...
1998: Organic chemistry... beautiful science! But... why, how, what, ...
P.J.J.A. Buijnsters et al., Eur. J. Org. Chem., 2002, 1397–1406
2009-08-31 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com
Reliable Knowledge: Trust
How to build trust:
• track record
• transparency: citation
• reproducibility: details
Open {Data|Standards|Source|...}
Knowledge Representation...
What are the "normal conditions" in organic chemistry?
The Problem: Reproducibility...
Where reproducibility is severely hampered:
• recalculating basic atom and bond properties
• access to QSAR/QSPR data
• well-defined algorithms
• publications destroy information
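The first bullet can be made concrete with a small example. Even a trivial descriptor is reproducible only when every input is stated, and that includes the reference data. The Python sketch below, offered as an illustration under assumptions, computes a molecular weight from explicit element counts and an explicitly supplied mass table. The table values (rounded standard atomic weights and nominal integer masses) and the function name are illustrative assumptions. They are not taken from any toolkit named in this talk.

```python
from typing import Dict

# Reference data stated explicitly: rounded standard atomic weights (g/mol).
ATOMIC_WEIGHTS: Dict[str, float] = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# A coarser alternative table: nominal (integer) masses.
NOMINAL_MASSES: Dict[str, float] = {"H": 1.0, "C": 12.0, "N": 14.0, "O": 16.0}


def molecular_weight(formula: Dict[str, int], masses: Dict[str, float]) -> float:
    """Sum element count times mass, using only the table the caller supplies."""
    missing = set(formula) - set(masses)
    if missing:
        # Fail loudly rather than silently guessing a mass.
        raise KeyError(f"no mass given for: {sorted(missing)}")
    return sum(count * masses[element] for element, count in formula.items())


benzene = {"C": 6, "H": 6}
print(round(molecular_weight(benzene, ATOMIC_WEIGHTS), 3))  # 78.114
print(molecular_weight(benzene, NOMINAL_MASSES))            # 78.0
```

The same molecule yields two different "molecular weights". Publishing the value without the table it came from therefore loses exactly the detail needed to reproduce it.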
Solutions...
Openness
• a license that allows modification and redistribution
• hiding behind public domain is not helpful
Semantic Web
• be explicit in what you mean
• both in facts and in algorithms
Reproducibility needs ODOSOS
Open Data
• No Intellectual Monopoly
Open Source
• algorithms are complex
• implementations even more so
• strong interaction with representation
Open Standards
• Semantic Web
• formats
• unique identifiers
http://en.wikipedia.org/wiki/Glyn_Moody
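A deliberately naive Python sketch shows how strongly an implementation interacts with the representation it works on. The hypothetical function below counts double bonds by counting `=` characters in a SMILES string. Benzene written in Kekulé form gives 3 and benzene written in aromatic form gives 0, although both strings describe the same molecule. No real toolkit perceives bonds this way. The point is that a result can only be checked when both the algorithm and the input representation are open.

```python
def count_double_bonds_naive(smiles: str) -> int:
    """Count explicit '=' bond symbols; aromaticity is ignored on purpose."""
    return smiles.count("=")


kekule = "C1=CC=CC=C1"  # benzene with alternating single/double bonds written out
aromatic = "c1ccccc1"   # benzene with lowercase aromatic atoms, no explicit bond orders

print(count_double_bonds_naive(kekule))    # 3
print(count_double_bonds_naive(aromatic))  # 0
```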
Jmol
Started in 1997 by Dan Gezelter (Notre Dame)
Leaders: Bradley Smith, me, Miguel Howard, Bob Hanson
E.L. Willighagen, M. Howard, Nature Precedings, 2005
http://www.jmol.org/
The Chemistry Development Kit
A Family of Projects
• CDK-Taverna (chemoinformatics workflows)
• JChemPaint (semantic 2D editor)
• ChemoJava (GPL-ed extension)
Goals
• library of cheminformatics algorithms
• educational
Usage
• CDK 2003 paper: cited 75+ times in the literature
• Bioclipse, KNIME, Jumbo (CML), AMBIT, ...
C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Des., 2006
CDK: an Open Project
Features
• open mailing list and bug tracker
• open source repository
• release soon, release often
Offer Review
• senior developers review patches
Bioclipse
O. Spjuth et al., BMC Bioinformatics 2007, 8:59
Integration
Services
• databases: PubChem
• web services
• Google Spreadsheets
• MyExperiment.org: Bioclipse Scripting Language
• Twitter, ...
• journals, ...
Techniques
• SOAP, REST, XMPP, ...
• Resource Description Framework
• dedicated APIs
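The REST technique listed above can be sketched in Python. The snippet only builds a request URL for PubChem and never contacts the server. It assumes PubChem's PUG REST URL layout (`/rest/pug/compound/name/<name>/property/<properties>/JSON`), which may differ from the interface Bioclipse used in 2009. The helper name `pubchem_property_url` is hypothetical.

```python
from typing import List
from urllib.parse import quote

# Base URL of PubChem's PUG REST service (assumed layout, see lead-in).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name: str, properties: List[str]) -> str:
    """Build (but do not fetch) a URL requesting compound properties by name as JSON."""
    # Percent-encode the name so spaces and slashes cannot break the path.
    encoded_name = quote(name, safe="")
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{','.join(properties)}/JSON"


print(pubchem_property_url("benzene", ["MolecularFormula", "MolecularWeight"]))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/benzene/property/MolecularFormula,MolecularWeight/JSON
```

Fetching that URL, for example with `urllib.request.urlopen`, would return JSON. That step is left out so the sketch runs offline.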
MyExperiment: Bioclipse Scripting Language
XMPP
XMPP
• Jabber protocol
• alternative to HTTP
Features
• asynchronous
• XML-based: improved semantics
J. Wagener et al., BMC Bioinformatics, 2009, in production
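XMPP messages are XML stanzas, and that is what lets the protocol carry improved semantics. A structured payload can sit in its own namespace next to the human-readable body. The Python sketch below uses the standard `xml.etree.ElementTree` module. The addresses, the `urn:example:chem` namespace and the `molecule` element are invented for illustration. They are not part of the XMPP specification or of the Bioclipse XMPP services. The stanza is also shown without the enclosing `<stream:stream>` that normally supplies the default `jabber:client` namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a machine-readable chemistry payload.
PAYLOAD_NS = "urn:example:chem"
ET.register_namespace("chem", PAYLOAD_NS)


def build_message(sender: str, recipient: str, smiles: str) -> str:
    """Serialize an XMPP <message/> stanza with a text body plus a namespaced payload."""
    msg = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    # Human-readable part, shown by any Jabber client.
    ET.SubElement(msg, "body").text = f"Structure: {smiles}"
    # Machine-readable part: explicit about what the string is and which format it uses.
    payload = ET.SubElement(msg, f"{{{PAYLOAD_NS}}}molecule", {"format": "smiles"})
    payload.text = smiles
    return ET.tostring(msg, encoding="unicode")


print(build_message("client@example.org/bioclipse", "service@example.org", "c1ccccc1"))
```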
Resource Description Framework
Facts as Triples
• subject
• predicate (relation)
• object
Examples
• wp:Benzene chem:hasSMILES "c1ccccc1"
• wp:Benzene owl:sameAs chemspider:123
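The two example facts can be handled without an RDF library, as the Python sketch below shows. Each fact is a (subject, predicate, object) tuple. Prefixed names are expanded to full URIs and written out as N-Triples, and `owl:sameAs` links are followed to find other identifiers for the same thing. Only the OWL namespace URI is real. The URIs for the `wp:`, `chem:` and `chemspider:` prefixes are placeholders. A real application would use an RDF toolkit rather than this code.

```python
from typing import Dict, List, Set, Tuple

# Prefix map. Only the OWL namespace is real; the other three are placeholders.
PREFIXES: Dict[str, str] = {
    "wp": "http://example.org/wp/",
    "chem": "http://example.org/chem#",
    "chemspider": "http://example.org/chemspider/",
    "owl": "http://www.w3.org/2002/07/owl#",
}


class Literal(str):
    """A plain string value, as opposed to a prefixed resource name."""


Triple = Tuple[str, str, str]

# The two example facts from the slide, as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("wp:Benzene", "chem:hasSMILES", Literal("c1ccccc1")),
    ("wp:Benzene", "owl:sameAs", "chemspider:123"),
]


def expand(curie: str) -> str:
    """Expand 'prefix:local' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples lines: <s> <p> <o> ."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            # Minimal escaping (backslash, double quote); newlines etc. are not handled.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{expand(o)}>"
        lines.append(f"<{expand(s)}> <{expand(p)}> {obj} .")
    return "\n".join(lines)


def same_as(resource: str, triples: List[Triple]) -> Set[str]:
    """Identifiers linked to `resource` by one owl:sameAs statement, in either direction."""
    linked: Set[str] = set()
    for s, p, o in triples:
        if p == "owl:sameAs" and not isinstance(o, Literal):
            if s == resource:
                linked.add(o)
            elif o == resource:
                linked.add(s)
    return linked


print(to_ntriples(TRIPLES))
print(same_as("chemspider:123", TRIPLES))  # {'wp:Benzene'}
```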
OpenMolecules RDF
http://rdf.openmolecules.net/
Blue Obelisk
R. Guha et al., J. Chem. Inf. Model., 2006
Which License?
Choice
• GPL v2 or v3, LGPL v2 or v3, Apache, BSD, MIT, ...
• FDL, CC0, PDDL
• Important: redistribution, modification
Bad Practice
• not explicitly stating your intentions
• Public Domain
Mixing Data?
License Incompatibility
• Ask about the copyright holder's intention!
Use Open Standard Interfaces
• Resource Description Framework
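An RDF graph is a set of triples, so mixing data from two sources that both expose RDF reduces to a set union. The Python sketch below also records which source stated each triple, which is the idea behind named graphs and N-Quads. That keeps the licensing question answerable after the merge. The source names and license labels are hypothetical placeholders, and the code does not judge license compatibility. It only flags facts whose sole sources have no stated license, so the copyright holder can be asked.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def merge_with_provenance(sources: Dict[str, Iterable[Triple]]) -> Dict[Triple, Set[str]]:
    """Union the triples of all sources, remembering every source that stated each one."""
    merged: Dict[Triple, Set[str]] = defaultdict(set)
    for source, triples in sources.items():
        for triple in triples:
            merged[triple].add(source)
    return dict(merged)


def needs_clarification(merged: Dict[Triple, Set[str]], licenses: Dict[str, str]) -> List[Triple]:
    """Triples for which every source has an 'unknown' license label."""
    return [t for t, origins in merged.items()
            if all(licenses[o] == "unknown" for o in origins)]


# Hypothetical sources and license labels, not statements about real datasets.
LICENSES = {"source-a": "CC0", "source-b": "unknown"}
SOURCES = {
    "source-a": [("wp:Benzene", "chem:hasSMILES", '"c1ccccc1"')],
    "source-b": [("wp:Benzene", "chem:hasSMILES", '"c1ccccc1"'),
                 ("wp:Benzene", "owl:sameAs", "chemspider:123")],
}

merged = merge_with_provenance(SOURCES)
print(len(merged))                              # 2 distinct facts after the union
print(needs_clarification(merged, LICENSES))    # the owl:sameAs fact, only from source-b
```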
Conclusions
No Intellectual Monopoly Achieved
• Jmol, CDK, JChemPaint, Bioclipse
• A huge success!
Open Data in chemistry is still way behind
• Open Access trap
• Public Domain trap
Semantics is showing up
• in RDF
• in Publishing
The Details
http://www.citeulike.org/user/egonw/tag/papers
http://chem-bla-ics.blogspot.com
mailto: [email protected]