Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Development Network Connectivity
Post on 26-Jun-2015
Chimezie Ogbuji (1) and Rong Xu (2)
(1) Metacognition LLC
(2) Case Western Reserve University
Outline
◦ Background
◦ Motivation
◦ Literature review / related work
◦ Opportunity / specific example
◦ Hypothesis
◦ Method
◦ Evaluation
◦ Discussion
Background
Controlled biomedical vocabulary systems (and ontologies) play a key role in the analysis of genetic disease
◦ Structured, interoperable, and machine-readable
◦ Facilitate reproducibility of scientific results and use of intelligent software that can leverage underlying meaning
◦ Scientific results and the structured biomedical knowledge they are based on may be used for multiple, even unanticipated, purposes
Motivation
Want descriptive relations that comprise terminology paths between (congenital) diseases and the anatomical entities that become malformed
Want to use these as the basis for analysis and classification of congenital disorders according to their underlying molecular mechanism
Opportunity
The Gene Ontology (GO) is arguably the most prominent example of how highly organized and structured medical knowledge can be leveraged to facilitate medical genetics
◦ Has a hierarchy of biological processes involving organ development
The Foundational Model of Anatomy (FMA) is a vast ontology with an objective to conceptualize the physical objects and spaces that constitute the human body
◦ Macroscopic, microscopic, and sub-cellular canonical anatomy
Their skeletal relations (is_a, part_of, and has_part) have the same meaning
Literature review
Cellular components function via interaction with each other in a highly complex and interconnected network
Interdependencies among a cell's molecular components lead to functional, molecular, and causal relationships among distinct phenotypes
Network-based approaches to disease have the potential to provide a framework for classifying disease, defining susceptibility, predicting disease outcome, and identifying tailored therapeutic strategies
Barabási et al. Network Medicine: A Network-based Approach to Human Disease, Nature Reviews Genetics 2011.
Barabási et al. 2011
For over a decade, analysis of biological networks via network and graph theory has revealed the importance of locally dense and well-connected subgraphs (hubs).
Schwikowski et al. A network of protein-protein interactions in yeast, 2000
Related work
Investigation of structural and lexical concordance between anatomy terms in the FMA and SNOMED-CT
◦ Bodenreider & Zhang 2006
Leveraging this concordance for integrating modules from each for a specific domain
◦ Ogbuji et al. 2010
Discussion of logical consequences of using part_of between both anatomical entities (in the FMA) and biological processes (the GO)
◦ Jimenez-Ruiz et al. 2010
Marfan Syndrome (MFS)
[…] mainly characterized by aneurysm formation in the proximal ascending aorta, leading to aortic dissection or rupture at a young age when left untreated. The identification of the underlying genetic cause of MFS, namely mutations in the fibrillin-1 gene (FBN1), has further enhanced [...] insights into the complex pathophysiology of aneurysm formation
In the UMLS Metathesaurus
• Finding site: connective tissue structure (SNOMED-CT)
• Category: congenital skeletal disorder (CRISP Thesaurus and NLM MTH)
Marfan Syndrome example
In the GO, FBN1 is annotated with the GO_0001501 (skeletal system development) and GO_0007507 (heart development) concepts (amongst others)
The former coincides with the more common finding site and classification of MFS as a congenital skeletal disorder
This is despite the well-documented associations (causal and otherwise) in the medical literature between MFS and cardiovascular diseases such as aortic root dilation
Hypothesis
A high-quality integration of the GO's development process hierarchy with the FMA will have several benefits:
◦ New biological pathways from genetic diseases to the anatomical entities whose development is involved in their underlying molecular mechanisms
◦ Graph and network analysis can benefit from an increase in connectivity for discovering biologically meaningful motifs
◦ Similarly, classification algorithms can also take advantage of this
Figure legend: copper = annotates human gene; gold = does not annotate human gene
Method and materials
Integration is performed on the following GO development process hierarchies
◦ Anatomical structure development
◦ Anatomical structure arrangement
◦ Anatomical structure morphogenesis
Only GO concepts that annotate human genes are considered
In processing the GO, the logical properties (transitivity, for example) of the relations are fully considered
◦ This is the case throughout what follows
Method and materials (continued)
The FMA ontology is loaded (as OWL/RDF) into a triple store for remote querying via SPARQL
The prefix of the human-readable label for each GO concept in the development hierarchies is stemmed and used as a basis for case-insensitive, lexical matching on primary labels and exact synonyms of FMA classes via a SPARQL query
FMA classes that match exactly are considered to denote the anatomical entities that participate in the corresponding GO biological process
Example
GO_0007507 (heart development)
Prefix: heart
Matching FMA concept: FMA_7088 (Heart)
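The matching step can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the slides describe a SPARQL query against a triple store, whereas this sketch uses plain dictionaries. The suffix list, the function names, and the reading of "stemmed prefix" as "label with the process suffix removed" are all assumptions made for illustration.

```python
# Process suffixes assumed to follow the anatomical prefix in a GO label.
PROCESS_SUFFIXES = (" development", " morphogenesis", " arrangement")


def anatomical_prefix(go_label):
    """Return the lower-cased label without its process suffix, or None."""
    lowered = go_label.lower().strip()
    for suffix in PROCESS_SUFFIXES:
        if lowered.endswith(suffix):
            return lowered[: -len(suffix)]
    return None


def build_label_index(fma_classes):
    """Map each lower-cased primary label and exact synonym to FMA ids."""
    index = {}
    for fma_id, (label, synonyms) in fma_classes.items():
        for name in (label, *synonyms):
            index.setdefault(name.lower(), set()).add(fma_id)
    return index


def match_go_to_fma(go_terms, fma_classes):
    """Return (GO id, FMA id) pairs whose prefix matches a label exactly."""
    index = build_label_index(fma_classes)
    pairs = []
    for go_id, go_label in go_terms.items():
        prefix = anatomical_prefix(go_label)
        for fma_id in sorted(index.get(prefix, ())):
            pairs.append((go_id, fma_id))
    return pairs


# The one pairing given on the slide; the empty synonym list is a placeholder.
go_terms = {"GO_0007507": "heart development"}
fma_classes = {"FMA_7088": ("Heart", [])}
print(match_go_to_fma(go_terms, fma_classes))
# → [('GO_0007507', 'FMA_7088')]
```

Matching on an exact, case-insensitive key keeps the sketch aligned with the "match exactly" criterion stated in the method; partial or fuzzy matches are deliberately not attempted.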
Evaluation
Result: 1644 development process and anatomical entity pairs
We calculate the Jaccard coefficient of the overlap between hierarchies for 6 major organs and the anatomical development processes they participate in
Evaluation (continued)
Using the GO development process for some FMA organ O as the starting point, the set of all subordinate terms is calculated: GOsubgraph(O)
Example:
◦ GO_0007507 (heart development) has GO_0003170 (heart valve development) as a component (via has_part)
◦ GO_0003170 subsumes GO_0003176 (aortic valve development) and has GO_0003179 (heart valve morphogenesis) as a component
◦ Each of these would be considered as subordinates of GO_0007507
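The subordinate-term calculation amounts to a transitive traversal over subsumption and has_part edges. The sketch below is one way to do it, using only the four GO terms from the example; the edge dictionary and the function name are illustrative and not taken from the authors' code.

```python
from collections import deque

# Edges point from a term to the terms directly beneath it, either because
# it subsumes them (inverse is_a) or has them as a component (has_part).
EDGES = {
    # heart development has_part heart valve development
    "GO_0007507": {"GO_0003170"},
    # heart valve development subsumes aortic valve development
    # and has_part heart valve morphogenesis
    "GO_0003170": {"GO_0003176", "GO_0003179"},
}


def subordinates(root, edges):
    """Collect every term reachable from root, excluding root itself."""
    seen = set()
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in edges.get(term, ()):
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(subordinates("GO_0007507", EDGES)))
# → ['GO_0003170', 'GO_0003176', 'GO_0003179']
```

Treating both relation types as one edge set reflects the statement that transitivity of the relations is fully considered; a fuller treatment would keep the relation type on each edge.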
Evaluation (continued)
In a similar fashion, the subordinate anatomical entities for each O amongst the 6 chosen organs are calculated:
◦ FMAsubgraph(O)
For each O, we calculate the GO terms that are both in GOsubgraph(O) and were matched with an FMA class that is in FMAsubgraph(O)
This resulting set of GO terms is considered the intersecting set and the Jaccard coefficient is calculated with respect to this, FMAsubgraph(O), and GOsubgraph(O)
Jaccard Coefficient (overlap)
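The transcript does not preserve the formula from this slide. One plausible reading, assumed here, is the standard Jaccard coefficient with the intersecting set I as the numerator: J(O) = |I| / (|GOsubgraph(O)| + |FMAsubgraph(O)| - |I|). The sketch below follows that assumption. All identifiers except FMA_7088 and the four GO ids are hypothetical placeholders.

```python
def jaccard_overlap(go_subgraph, fma_subgraph, go_to_fma):
    """Jaccard coefficient under the assumed definition above.

    go_to_fma maps a GO id to the set of FMA ids it was matched with.
    """
    # GO terms in the GO subgraph matched to an FMA class in the FMA subgraph.
    intersecting = {
        go_id
        for go_id in go_subgraph
        if go_to_fma.get(go_id, set()) & fma_subgraph
    }
    union_size = len(go_subgraph) + len(fma_subgraph) - len(intersecting)
    return len(intersecting) / union_size if union_size else 0.0


# Toy data: FMA_A, FMA_B, FMA_C and the second mapping are placeholders.
go_subgraph = {"GO_0007507", "GO_0003170", "GO_0003176", "GO_0003179"}
fma_subgraph = {"FMA_7088", "FMA_A", "FMA_B", "FMA_C"}
go_to_fma = {"GO_0007507": {"FMA_7088"}, "GO_0003170": {"FMA_A"}}

print(jaccard_overlap(go_subgraph, fma_subgraph, go_to_fma))
```

With two of the four GO terms matched into a four-class FMA subgraph, the toy coefficient is 2 / (4 + 4 - 2), or one third.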
Evaluation: network connectivity
We calculate the number of new paths from OMIM diseases through their genes to the anatomical entities in the FMA:
◦ P+dgo
Similarly, we calculate the number of new paths starting from the genes to additional FMA anatomical entities:
◦ P+go
Network connectivity: continued
Only genes that are annotated with anatomical development processes matched to FMA classes, and the OMIM diseases associated with these genes, were considered
◦ Genesdev
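The slides do not define precisely what counts as a path, so the following sketch makes an assumption: a P+go path is a distinct (gene, GO term, FMA class) chain, and a P+dgo path is a distinct (disease, gene, GO term, FMA class) chain. Only direct GO-to-FMA matches are counted. The function names and the FMA id for the skeletal system are hypothetical; the FBN1 annotations and FMA_7088 come from the Marfan syndrome example.

```python
def count_gene_paths(gene_to_go, go_to_fma):
    """Count gene -> GO term -> FMA class chains for each gene (P+go)."""
    counts = {}
    for gene, go_ids in gene_to_go.items():
        n = sum(len(go_to_fma.get(go_id, ())) for go_id in go_ids)
        if n:  # keep only genes that gained at least one path
            counts[gene] = n
    return counts


def count_disease_paths(disease_to_genes, gene_to_go, go_to_fma):
    """Count disease -> gene -> GO term -> FMA class chains (P+dgo)."""
    gene_counts = count_gene_paths(gene_to_go, go_to_fma)
    counts = {}
    for disease, genes in disease_to_genes.items():
        n = sum(gene_counts.get(gene, 0) for gene in genes)
        if n:
            counts[disease] = n
    return counts


disease_to_genes = {"Marfan syndrome": {"FBN1"}}
gene_to_go = {"FBN1": {"GO_0001501", "GO_0007507"}}
go_to_fma = {
    "GO_0007507": {"FMA_7088"},
    "GO_0001501": {"FMA_SKELETAL_PLACEHOLDER"},  # hypothetical match
}

print(count_gene_paths(gene_to_go, go_to_fma))
print(count_disease_paths(disease_to_genes, gene_to_go, go_to_fma))
```

Because each disease sums over all of its genes, a gene shared by several diseases contributes its paths to each of them.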
Number of additional P+dgo paths on a logarithmic scale
Log-scaled histogram of additional paths from Genesdev to FMA classes, only for those genes that had additional paths
Evaluation summary
On average, the mapping introduces 9,549 additional P+dgo paths per OMIM disease
On average, each Genesdev gene had 17,037 additional paths to FMA classes
Caveat in normalizing the number of P+dgo paths by number of genes
◦ Paths from diseases to anatomical entities introduce a combinatorial factor of disease-gene pairings
Discussion
Overlap results indicate little overlap between the GO hierarchies and the corresponding FMA hierarchies
Not surprising, as the two cover disparate domains within medicine and one is specific to humans while the other is not
Discussion (continued)
This, along with the size of the FMA as a whole and of the portions mapped to the GO hierarchies, indicates an opportunity to build on the mapping and to integrate both ontologies in a meaningful way
Connectivity results demonstrate a significant increase in biological paths from genetic diseases (and their genes) to the anatomical entities participating in the development process
Discussion (continued)
As these paths are at least as logically and biologically sound as the ontologies they were forged from, we expect that an appreciable number of them will be useful for analysis
To our knowledge, this is the first attempt of this kind to integrate the anatomical structural development, morphogenesis, and organization hierarchies in the GO with the FMA
Limitations
Regarding deductions (formal or otherwise) that follow from an integration of the FMA and GO
◦ Need to be careful to consider only annotations for humans, or to have a robust way to manage the uncertainty introduced by not doing so