HYQUE: EVALUATING SCIENTIFIC HYPOTHESES USING SEMANTIC WEB TECHNOLOGIES
MICHEL DUMONTIER, PHD
ASSOCIATE PROFESSOR OF BIOINFORMATICS, DEPARTMENT OF BIOLOGY, INSTITUTE OF BIOCHEMISTRY AND SCHOOL OF COMPUTER SCIENCE @ CARLETON UNIVERSITY
PROFESSEUR ASSOCIÉ, DÉPARTEMENT D’INFORMATIQUE ET DE GÉNIE LOGICIEL, UNIVERSITÉ LAVAL
HYQUE IS A COLLABORATIVE WORK
Work performed by Alison Callahan, a PhD student under my supervision @ Carleton University
Partnership with Dr. Nigam Shah, Assistant Professor at Stanford University
Computationally augmented method for hypothesis evaluation
• developed by Racunas et al. [1]
• minimum event-based vocabulary
• uses consistency checking to evaluate hypotheses
• constraints to ensure valid claims
• rules to evaluate evidence
• compares hypotheses using neighborhood functions
• incremental hypothesis improvement
[1] Racunas S. A., Shah N. H., Albert I. and Fedoroff N. V. (2004). HyBrow: A prototype system for computer-aided hypothesis evaluation. Bioinformatics 20(S. 1): i1-i8.
THE GAL GENE NETWORK IN YEAST
• Genes that encode proteins that transport and metabolize galactose
• permease – gal2p – transports galactose into cells
conjunctive hypothesis – must satisfy two expressions
conjunctive hypothesis with conditional expression
HYBROW
• small, manually generated knowledge base
• hard coded Perl rules
• challenging to apply to a new domain
• needs access to a greater KB
SEMANTIC WEB TECHNOLOGIES FOR KNOWLEDGE MANAGEMENT?
Semantic Web technologies are promising for automating hypothesis evaluation
• Languages for formal knowledge representation
• Automated reasoning
• Querying over distributed resources
• Growing number of biological resources available in SW formats
• Ontologies
• Data
Bio2RDF is one of the largest resources of linked life data on the Web
• ~40 data sets available
• Globally distributed
• Dataset-specific SPARQL endpoints
BIO2RDF IS PART OF A GROWING WEB OF LINKED DATA
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
The Semantic Web is a web of knowledge
It is about standards for publishing, sharing and querying knowledge drawn from diverse sources
It enables the answering of sophisticated questions
Ontology as a strategy to formally represent knowledge
The Web Ontology Language (OWL) Has Explicit Semantics
Can therefore be used to capture knowledge in a machine understandable way
HYBROW → HYQUE
• Hypothesis query and evaluation system
• Built on Semantic Web technologies
• Background knowledge encoded as OWL ontologies
• Queries against SPARQL endpoints
• Context-specific rules that consider experimental conditions
• Consumes and produces RDF
• Can be accessed via web or semantic web services
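As a rough illustration of what a query against a SPARQL endpoint looks like at the HTTP level, the sketch below builds a GET URL with the query passed in the standard `query` parameter. The endpoint address and the query itself are placeholders chosen for illustration; they are not HyQue's actual endpoints or queries.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# A generic placeholder query: fetch any ten triples.
QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
""".strip()


def sparql_get_url(endpoint, query):
    """Build a SPARQL-protocol GET URL carrying the query string."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded == QUERY)
```

The URL is only constructed here, not fetched, so the sketch runs without network access.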
HYQUE IS COMPOSED OF …
• HyQue hypothesis ontology
• Describes generic input hypothesis and output hypothesis evaluation classes
• Uses upper level classes e.g. ‘proposition’, ‘measurement value’, ‘event’
• HyQue Data
• Experimentally determined interactions between the GAL proteins (GAL knowledge base from HyBrow project)
• Literature-based evidence (citations)
• Knowledge about cellular localization and biological processes (GO)
• Types of evidence supporting these interactions (ECO)
• Yeast gene/protein/function data (SGD)
A HYQUE HYPOTHESIS IS A COLLECTION OF PROPOSITIONS
• HyQue hypotheses are composed of one or more propositions connected using logical operators (AND, OR, XOR…)
• proposition: “a statement expressing something true or false”
• HyQue propositions only specify events
HyQue hypothesis ≡ ‘proposition’ that ‘specifies’ only ‘event’
HyQue hypothesis ≡ ‘proposition’ that ‘has component part’ only (‘proposition’ that ‘specifies’ only ‘event’)
HYQUE EVENTS
1. protein-protein binding
2. protein-nucleic acid binding
3. molecular activation
4. molecular inhibition
5. gene induction
6. gene repression
7. transport
HYQUE EVENTS
Events are composed of conditional assertions on a relation between ‘actor’ and ‘target’
induces(agent, target, context, location)
For decidable logic (OWL), an n-ary object is used
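One way to picture the n-ary object: the event becomes a node of its own, and each argument of induces(agent, target, context, location) hangs off it as a separate binary statement. The sketch below shows that general pattern only; the namespace, the property names (`hasAgent`, `hasTarget`, `hasContext`, `isLocatedIn`) and the class name `GeneInduction` are invented for illustration and are not taken from the HyQue ontology.

```python
from itertools import count

# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/hyque-sketch#"
_event_ids = count(1)


def reify_event(event_type, agent, target, context=None, location=None):
    """Express an n-ary assertion as binary triples about one event node."""
    event = f"{EX}event{next(_event_ids)}"
    triples = [
        (event, "rdf:type", EX + event_type),
        (event, EX + "hasAgent", EX + agent),
        (event, EX + "hasTarget", EX + target),
    ]
    # Optional arguments are only stated when they are known.
    if context is not None:
        triples.append((event, EX + "hasContext", EX + context))
    if location is not None:
        triples.append((event, EX + "isLocatedIn", EX + location))
    return triples


# "Gal4p induces expression of GAL1", with no context or location given.
triples = reify_event("GeneInduction", "Gal4p", "GAL1")
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Because every statement is binary, the result can be written as ordinary RDF triples while still keeping all arguments attached to the same event.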
Final hypothesis score is 1.6. Events e2 and e3 have the strongest experimental support.
(e1: Gal4p induces expression of GAL1)
OR
(e2: Gal3p induces expression of GAL2
AND e3: Gal4p induces expression of GAL7)
OR
(e4: Gal4p induces expression of GAL7
AND e5: Gal80p inhibits production of Gal4p when GAL3 is over-expressed
AND e6: Gal80p induces expression of GAL7)
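The logical shape of the example hypothesis above can be sketched as a small tree of events joined by AND/OR. The class names `Event` and `Connective` are hypothetical and chosen for illustration; this models only the structure of the hypothesis, not how HyQue scores it.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Event:
    label: str
    text: str


@dataclass
class Connective:
    operator: str  # "AND" or "OR"
    parts: List["Node"]


Node = Union[Event, Connective]

# The example hypothesis: e1 OR (e2 AND e3) OR (e4 AND e5 AND e6).
hypothesis = Connective("OR", [
    Event("e1", "Gal4p induces expression of GAL1"),
    Connective("AND", [
        Event("e2", "Gal3p induces expression of GAL2"),
        Event("e3", "Gal4p induces expression of GAL7"),
    ]),
    Connective("AND", [
        Event("e4", "Gal4p induces expression of GAL7"),
        Event("e5", "Gal80p inhibits production of Gal4p "
                    "when GAL3 is over-expressed"),
        Event("e6", "Gal80p induces expression of GAL7"),
    ]),
])


def event_labels(node):
    """Collect event labels in left-to-right order."""
    if isinstance(node, Event):
        return [node.label]
    return [label for part in node.parts for label in event_labels(part)]


def render(node):
    """Render the tree as a parenthesised logical expression."""
    if isinstance(node, Event):
        return node.label
    return "(" + f" {node.operator} ".join(render(p) for p in node.parts) + ")"


print(render(hypothesis))  # (e1 OR (e2 AND e3) OR (e4 AND e5 AND e6))
```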
HYPOTHESIS EVALUATION REPRESENTED AS RDF
BROWSE HYPOTHESIS AND EVALUATION AS LINKED DATA
http://sadiframework.org
Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB
The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web services using OWL classes as service inputs and outputs
Users can post a hypothesis in RDF and receive the hypothesis evaluation RDF
HyQue can become part of a workflow for investigations
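Posting a hypothesis and receiving its evaluation amounts to an HTTP POST with an RDF body. The sketch below only builds such a request and does not send it. The service URL, the example hypothesis URI and the class URI in the payload are all placeholders invented for illustration; the real HyQue service address and input class are not given here.

```python
from urllib.request import Request

# Hypothetical service address, used only for illustration.
SERVICE_URL = "http://example.org/hyque/evaluate"

# Minimal RDF/XML payload with placeholder URIs (not the HyQue ontology).
HYPOTHESIS_RDF = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.org/hypothesis/1">
    <rdf:type rdf:resource="http://example.org/hyque-sketch#Hypothesis"/>
  </rdf:Description>
</rdf:RDF>
"""


def build_request(service_url, rdf_xml):
    """Prepare a POST request that carries RDF/XML and asks for RDF/XML back."""
    return Request(
        service_url,
        data=rdf_xml.encode("utf-8"),
        headers={
            "Content-Type": "application/rdf+xml",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


request = build_request(SERVICE_URL, HYPOTHESIS_RDF)
print(request.get_method(), request.full_url)

# Sending is left out because the URL above is a placeholder:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     evaluation_rdf = response.read().decode("utf-8")
```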
FUTURE DIRECTIONS
• Investigate alternative, finer grained scoring systems
• Expand beyond the GAL network with network reconstructions and NLP facilitated data curation
• Collaborative social environment to engineer, share, compare and evaluate hypotheses, and format the results
CONCLUSION
HyQue is a new system to construct and evaluate (automatically obtain support for) hypotheses using formalized background knowledge and data on the Semantic Web
Stephen Racunas and Amar Das for helpful discussions
Bio2RDF: Peter Ansell, Francois Belleau, Alison Callahan, Jacques Corbeil, Jose Cruz-Toledo, Alex De Leon, Steve Etlinger, James Hogan, Nichealla Keith, Jean Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, and Paul Roe
SADI: Christopher Baker, Melanie Courtot, Jose Cruz-Toledo, Steve Etlinger, Nichealla Keith, Artjom Klein, Luke McCarthy, Silvane Paixao, Ben Vandervalk, Natalia Villanueva-Rosales, Mark Wilkinson