Real World Applications of OWL 1 Michel Dumontier, Ph.D. Associate Professor of Bioinformatics Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University Ottawa Institute of Systems Biology Ottawa-Carleton Institute of Biomedical Engineering Professeur Associé, Université Laval Visiting Associate Professor, Stanford University Protege Short Course::Dumontier:March 2012
Real World Applications of OWL
Michel Dumontier, Ph.D.
Associate Professor of Bioinformatics
Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University
Ottawa Institute of Systems Biology
Ottawa-Carleton Institute of Biomedical Engineering
Professeur Associé, Université Laval
The intent is to express that the species represents a substance composed of glucose molecules.
We also know from the SBML model that this substance is located in the cytosol and has an (initial) concentration of 0.09765 M.
The annotation element stores the RDF
subject
Implicit subject and xml attributes
OWL axiom: M SubClassOf: represents some MaterialEntity
Conversion rule: a Model M annotated with class C represents:
If C is a SubClassOf MaterialEntity then M SubClassOf: represents some C
If C is a SubClassOf Function then M SubClassOf: represents some (has-function some C)
If C is a SubClassOf Process then M SubClassOf: represents some (has-function some (realized-by only C))
For each model annotation, we make a commitment to what it represents
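The conversion rule above can be sketched in a few lines of Python. This is only an illustration: the toy class hierarchy and the function names are assumptions made for the example, and a real conversion would query the annotation ontologies rather than a hard-coded dictionary.

```python
# Hypothetical toy hierarchy (child -> parent). Only MaterialEntity, Function
# and Process come from the slide; the other class names are illustrative.
SUPERCLASS = {
    "Glucose": "MaterialEntity",
    "CatalyticActivity": "Function",
    "Glycolysis": "Process",
}


def is_subclass_of(cls, ancestor):
    """Walk up the toy hierarchy to test whether cls is (a subclass of) ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def annotation_axiom(model, cls):
    """Return the Manchester-style axiom for a model annotated with class cls."""
    if is_subclass_of(cls, "MaterialEntity"):
        filler = cls
    elif is_subclass_of(cls, "Function"):
        filler = f"(has-function some {cls})"
    elif is_subclass_of(cls, "Process"):
        filler = f"(has-function some (realized-by only {cls}))"
    else:
        raise ValueError(f"no conversion rule for class {cls!r}")
    return f"{model} SubClassOf: represents some {filler}"


print(annotation_axiom("M", "Glycolysis"))
```

Keeping the three cases in one function makes the commitment explicit: every annotation is turned into exactly one statement about what the model element represents.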
Model verification
After reasoning, we found 27 models to be inconsistent
Reasons:
1. Our representation - functions sometimes found in the place of physical entities (e.g. entities that secrete insulin). Better to constrain with appropriate relations.
2. SBML abused - e.g. species used as a measure of time
3. Incorrect annotations - constraints in the ontologies themselves mean that the annotation is simply not possible
Finding inconsistencies with axiomatically enhanced ontologies
ATPase activity (GO:0004002) is a catalytic activity that has water and ATP as input, ADP and phosphate as output, and is part of an ATP catabolic process.
To this, we add:
• GO: ATP and water are the only inputs (universal quantification)
• ChEBI: water, ATP and alpha-D-glucose 6-phosphate are all different (disjointness)
• The species “ATP” is an input to the “ATPase” reaction, which is annotated with ATPase activity. The species “ATP”, however, is mis-annotated with alpha-D-glucose 6-phosphate (CHEBI:17665), not with ATP.
• Unsatisfiable -> curation error in the BIOMD0000000176 and BIOMD0000000177 models of anaerobic glycolysis in yeast.
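The clash described above can be imitated with plain sets. This is a simplified stand-in for what a DL reasoner derives, not the reasoning procedure itself: the dictionary layout, the function name and the species identifier "H2O" are assumptions made for the example.

```python
# Universal restriction from the slide: ATPase activity has only ATP and water as inputs.
ALLOWED_INPUTS = {"ATPase activity": {"ATP", "Water"}}

# Chemical classes declared pairwise disjoint (from the ChEBI bullet).
DISJOINT = {"Water", "ATP", "alpha-D-glucose 6-phosphate"}


def violations(reaction_annotation, input_annotations):
    """Return (species, chemical class) pairs that contradict the 'only' restriction.

    An input clashes when its class is declared disjoint from every allowed
    input class, so it cannot also be an instance of one of them.
    """
    allowed = ALLOWED_INPUTS[reaction_annotation]
    clashes = []
    for species, chem_class in input_annotations.items():
        if chem_class in DISJOINT and chem_class not in allowed:
            clashes.append((species, chem_class))
    return clashes


# The mis-annotated species "ATP" from the slide; "H2O" is a hypothetical species id.
inputs = {"ATP": "alpha-D-glucose 6-phosphate", "H2O": "Water"}
print(violations("ATPase activity", inputs))
```

Without the disjointness declaration nothing would be flagged, because under the open-world assumption the reasoner could not rule out that alpha-D-glucose 6-phosphate and ATP denote the same class.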
Classification: Phosphatases
• Bioinformaticians use tools to identify functional domains (e.g., InterProScan)
• Tools simply show the presence of domains - they do not classify proteins
• Experts classify proteins according to domain arrangements - the presence and number of each domain is important
PhosphaBase: an ontology-driven database resource for protein phosphatases. Wolstencroft KJ, Stevens R, Tabernero L, Brass A. Proteins. 2005 Feb 1;58(2):290-4.
Phosphatase Functional Domains
Defining Protein Phosphatases
• Necessary and sufficient conditions are stipulated using EquivalentClass axioms
• A protein phosphatase is exactly a protein that has exactly one transmembrane domain and at least one phosphatase catalytic domain
ProteinPhosphatase EquivalentTo: Protein AND hasDomain exactly 1 TransmembraneDomain AND hasDomain min 1 PhosphataseCatalyticDomain
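As a rough illustration of what this definition asks of a protein, the check below counts domains in a list such as a domain-detection tool might report. It is a closed-world approximation written for this example: the function name and input format are assumptions, and an OWL reasoner would additionally need the domain list to be stated as complete before it could conclude "exactly one".

```python
from collections import Counter


def is_protein_phosphatase(domains):
    """Check the cardinality conditions of the definition against a domain list.

    domains: list of domain type names found in one protein (hypothetical format).
    """
    counts = Counter(domains)
    return (
        counts["TransmembraneDomain"] == 1
        and counts["PhosphataseCatalyticDomain"] >= 1
    )


print(is_protein_phosphatase(
    ["TransmembraneDomain", "PhosphataseCatalyticDomain", "PhosphataseCatalyticDomain"]
))
```

The example shows why both presence and number matter: a protein with two transmembrane domains fails the first condition even though it carries a catalytic domain.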
More precise class expressions can be formulated for subtypes
Inclusion of the universal quantifier now restricts the domains to only the types listed
R2A EquivalentTo: Protein
AND hasDomain exactly 2 ProteinTyrosinePhosphataseDomain
AND hasDomain exactly 1 TransmembraneDomain
AND hasDomain exactly 4 FibronectinDomain
AND hasDomain exactly 1 ImmunoglobulinDomain
AND hasDomain exactly 1 MAMDomain
AND hasDomain exactly 1 Cadherin-LikeDomain
AND hasDomain only (ProteinTyrosinePhosphataseDomain OR TransmembraneDomain OR FibronectinDomain OR ImmunoglobulinDomain OR Cadherin-LikeDomain OR MAMDomain)
hydroxyl group
methyl group
Knowledge of functional groups is important in chemical synthesis,
pharmaceutical design and lead optimization.
Functional groups describe chemical reactivity in terms of
atoms and their connectivity, and exhibit characteristic chemical
behavior when present in a compound.
Describing chemical functional groups in OWL-DL for the classification of chemical compounds
N Villanueva-Rosales, M Dumontier. 2007. OWLED, Innsbruck, Austria.
Ethanol
Describing Functional Groups in DL
HydroxylGroup: CarbonGroup that (hasSingleBondWith some (OxygenAtom that hasSingleBondWith some HydrogenAtom))
[Figure: hydroxyl group, with labels O, H and R (the R group)]
Fully Classified Ontology
35 functional groups (FG)
And, we define certain compounds
Alcohol: OrganicCompound that (hasPart some HydroxylGroup)
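The two definitions (a hydroxyl group as a carbon singly bonded to an oxygen that is singly bonded to a hydrogen, and an alcohol as a compound with such a group) can be mimicked on a small molecular graph. This is a sketch under assumptions: the atom/bond representation and function names are invented for the example, carbon-bound hydrogens are left out for brevity, and the published work performs this classification with an OWL-DL reasoner rather than graph code.

```python
def single_bond_neighbours(bonds, atom):
    """Atoms joined to `atom` by a single bond. bonds: list of (a, b, order)."""
    result = []
    for a, b, order in bonds:
        if order != 1:
            continue
        if a == atom:
            result.append(b)
        elif b == atom:
            result.append(a)
    return result


def has_hydroxyl_group(atoms, bonds):
    """True if some carbon is singly bonded to an O that is singly bonded to an H."""
    for atom, element in atoms.items():
        if element != "C":
            continue
        for oxygen in single_bond_neighbours(bonds, atom):
            if atoms[oxygen] != "O":
                continue
            if any(atoms[n] == "H" for n in single_bond_neighbours(bonds, oxygen)):
                return True
    return False


def is_alcohol(atoms, bonds):
    """Mirror of 'OrganicCompound that (hasPart some HydroxylGroup)'."""
    return has_hydroxyl_group(atoms, bonds)


# Ethanol skeleton (C-C-O-H); hydrogens on carbon omitted.
ethanol_atoms = {"C1": "C", "C2": "C", "O1": "O", "H1": "H"}
ethanol_bonds = [("C1", "C2", 1), ("C2", "O1", 1), ("O1", "H1", 1)]
print(is_alcohol(ethanol_atoms, ethanol_bonds))
```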
Organic Compound Ontology
28 organic compounds (OC)
Question Answering: Classes as self-contained queries
• Query PubChem, DrugBank and dbPedia
Querying Kidney and Urinary Knowledge Base and Ontology
[Figure: RDF graph combining three sources.
Entrez Gene: Gene X go:biological_process GO:0054426.
KUPO Ontology: kupo:002444 (rdfs:label "PT epithelial cell") ro:part_of MA:00345 (proximal tubule); kupo:004672 (rdfs:label "DT epithelial cell") ro:part_of MA:00456 (distal tubule).
Higgins dataset: Gene X and Gene Y linked to the anatomy terms via kupo:expressed_in.]
Query: Which genes involved in protein transport are expressed in proximal tubule epithelial cells?
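The query above amounts to a join across the three sources. The sketch below answers it over a handful of triples. The identifiers are copied from the figure labels, but which gene is expressed where is an assumption made for the example, as are the helper names. A real deployment would run SPARQL against the knowledge base.

```python
# Illustrative triples; the expressed_in assignments are assumed, not taken
# from the source figure.
TRIPLES = {
    ("GeneX", "go:biological_process", "GO:0054426"),
    ("GeneX", "kupo:expressed_in", "MA:00345"),
    ("GeneY", "kupo:expressed_in", "MA:00456"),
    ("kupo:002444", "rdfs:label", "PT epithelial cell"),
    ("kupo:002444", "ro:part_of", "MA:00345"),
    ("kupo:004672", "rdfs:label", "DT epithelial cell"),
    ("kupo:004672", "ro:part_of", "MA:00456"),
}


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def genes_for(process, cell_label):
    """Genes annotated with `process` and expressed where the labelled cell is located."""
    genes = set()
    for cell in subjects("rdfs:label", cell_label):
        for region in objects(cell, "ro:part_of"):
            expressed = subjects("kupo:expressed_in", region)
            genes |= expressed & subjects("go:biological_process", process)
    return genes


print(genes_for("GO:0054426", "PT epithelial cell"))
```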
Semantic Annotation and Query
[Figure: ArrayExpress (AE/GEO acquire; >250,000 assays, >10,000 experiments) -> curation -> ATLAS (re-annotate & summarize)]
Ontologically Modeling Sample Variables in Gene Expression Data
Ontology-based data exploration
Query for Cell adhesion genes in all ‘organism parts’
‘View on EFO’
Ontology-based query expansion for ArrayExpress Archive @ www.ebi.ac.uk/arrayexpress
Search and Co-Occurrence
Semantic Assistant
Services relevant for the user's current task are offered directly within a desktop application. This approach relies on ontology-described semantic web services to provide external natural language processing (NLP) pipelines.
Leverage of OWL-DL axioms in a Contact Centre for Technical Product Support. Alex Kouznetsov, Bradley Shoebottom, René Witte, Christopher JO Baker. OWLED 2010.
Plug-in for Open Office Client
• HyQue helps construct and evaluate (automatically obtain support for) hypotheses using formalized background knowledge and data using the Semantic Web
• HyQue makes it possible to develop a reliability model around data based on our scientific expectations of corroborating evidence
Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012. Accepted.
Hypothesis
h1 (simple event-based expression):
  e1: Gal4p induces expression of GAL1
h2 (conjunctive hypothesis, must satisfy two expressions):
  e2: Gal3p induces expression of GAL2
  AND e3: Gal4p induces expression of GAL7
h3 (conjunctive hypothesis with conditional expression):
  e4: Gal4p induces expression of GAL7
  AND e5: Gal80p inhibits production of Gal4p when GAL3 is over-expressed
  AND e6: Gal80p induces expression of GAL7
HYQUE ARCHITECTURE
Rule-based assessment of evidence
• ‘induce’ rule (maximum score: 5):
– Is event negated?
  • If yes, subtract 2
– Is logical operator ‘induce’?
  • If yes, add 1; if no, subtract 1
– Is agent of type ‘protein’ or ‘RNA’?
  • If yes, add 1; if of type ‘gene’, subtract 1
– Is target of type ‘gene’?
  • If yes, add 1; if no, subtract 1
– Does agent have known ‘transcription factor activity’?
  • If yes, add 1
– Is event located in the ‘nucleus’?
  • If yes, add 1; if no, subtract 1
GO:0010628
CHEBI:36080
SO:0000236
GO:0003700
GO:0005634
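The ‘induce’ rule listed above translates directly into a scoring function. The version below is a plain-Python sketch for illustration only: the event dictionary and its field names are hypothetical, and the cited papers describe the rules being evaluated with Semantic Web technologies (SPARQL-based rules) rather than with code like this.

```python
def score_induce_event(event):
    """Score one event against the 'induce' rule (maximum score: 5).

    event: dict with hypothetical keys 'negated', 'operator', 'agent_type',
    'target_type', 'agent_has_tf_activity' and 'location'.
    """
    score = 0
    if event["negated"]:
        score -= 2
    score += 1 if event["operator"] == "induce" else -1
    if event["agent_type"] in ("protein", "RNA"):
        score += 1
    elif event["agent_type"] == "gene":
        score -= 1
    score += 1 if event["target_type"] == "gene" else -1
    if event["agent_has_tf_activity"]:
        score += 1
    score += 1 if event["location"] == "nucleus" else -1
    return score


# An event that satisfies every check, in the spirit of e1 (Gal4p induces GAL1).
best_case = {
    "negated": False,
    "operator": "induce",
    "agent_type": "protein",
    "target_type": "gene",
    "agent_has_tf_activity": True,
    "location": "nucleus",
}
print(score_induce_event(best_case))
```

Because each check contributes at most one point and negation only subtracts, an event that passes all five positive checks reaches the stated maximum of 5.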
Linked Open Results: from hypothesis to evidence
Literature-Based Enrichment Analysis
• Enrichment analysis on terms extracted using a target ontology for associated articles.
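The slide does not say which statistic is used. A common choice for enrichment analysis is the one-sided hypergeometric test, so the sketch below assumes it. The function and parameter names are illustrative.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k annotated items in a sample of size n.

    N: population size (e.g. all articles considered)
    K: items in the population annotated with the ontology term
    n: sample size (e.g. articles associated with the gene set of interest)
    k: annotated items observed in the sample
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers, chosen only to exercise the function.
print(hypergeom_pvalue(2, 2, 2, 4))
```

A small p-value suggests the term occurs in the sample more often than chance would predict. In practice the test is repeated for every ontology term, so a multiple-testing correction would normally follow.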
Enabling enrichment analysis with the Human Disease Ontology. Paea LePendu, Mark A. Musen, Nigam H. Shah. Journal of Biomedical Informatics. Volume 44, Supplement 1, December 2011, Pages S31–S38
Phenotype-based predictions
Phenotypes can be used as a substrate to cluster similar diseases, identify potential model systems, predict potential disease-treating drugs or their adverse events, support drug repurposing, etc.
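One simple way to compare phenotype profiles is set overlap. The sketch below ranks candidate profiles by Jaccard similarity to a query profile. This is a deliberately simplified stand-in: the approaches cited on this slide use ontology-aware semantic similarity, and the phenotype labels and profile names here are placeholders rather than real data.

```python
def jaccard(a, b):
    """Jaccard similarity of two phenotype sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query, profiles):
    """Return profile names ordered from most to least similar to the query."""
    return sorted(profiles, key=lambda name: jaccard(query, profiles[name]), reverse=True)


# Placeholder phenotype identifiers, not real ontology terms.
query = {"p1", "p2", "p3"}
profiles = {
    "model_A": {"p1", "p2"},
    "model_B": {"p3", "p8", "p9"},
    "model_C": {"p7"},
}
print(rank_by_similarity(query, profiles))
```

Plain set overlap ignores the ontology hierarchy, so two profiles annotated with closely related but not identical terms score zero. That limitation is the motivation for the semantic similarity measures used in practice.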
Robert Hoehndorf, Paul N. Schofield and Georgios V. Gkoutos. PhenomeNET: a whole-phenome approach to disease gene discovery. Nucleic Acids Research, 2011.
Linking PharmGKB to phenotype studies and animal models of disease for drug repurposing. Hoehndorf R, Oellrich A, Rebholz-Schuhmann D, Schofield PN, Gkoutos GV. Pac Symp Biocomput. 2012:388-99.
CK Chen, CJ Mungall, GV Gkoutos et al. MouseFinder: candidate disease genes from mouse phenotype data. Human Mutation, 2012.
Tetralogy of Fallot
OMIM
Human Phenotype Ontology
Phenotype ontologies should contain descriptions of