Enriching Bio-ontologies with Non-hierarchical Relations
Workshop on bio-ontologies, October 28, 2005
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
Acknowledgments
• Marc Aubry, UMR 6061 CNRS, Rennes, France
• Anita Burgun, University of Rennes, France
• Explicit relations to other terms within the same hierarchy
• No (explicit) relations
  • To terms across hierarchies
  • To concepts from other biological ontologies
Gene Ontology
• Molecular functions (MF)
• Cellular components (CC)
• Biological processes (BP)
Example of related terms across hierarchies:
  BP: metal ion transport
  MF: metal ion transporter activity
Related work
• Ontologizing GO
  • GONG
• Identifying relations among GO terms across hierarchies
  • Lexical approach
  • Non-lexical approaches
• Identifying relations between GO terms and OBO terms
  • ChEBI
• Representing relations among GO terms and between GO terms and OBO terms
  • Obol
Three non-lexical approaches
• All based on annotation databases
  • Similarity in the vector space model
  • Statistical analysis of co-occurring GO terms
  • Association rule mining
Similarity in the vector space model
[Figure: the annotation database represented as a GO terms (t1, t2, …, tn) × genes (g1, g2, …, gn) matrix, and as the corresponding genes × GO terms matrix]
Similarity in the vector space model
[Figure: rows ti and tj of the GO terms × genes matrix (genes g1, g2, …, gn) yield cell (ti, tj) of the GO terms × GO terms similarity matrix]
Sim(ti, tj) = ti · tj
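The dot-product similarity on this slide can be illustrated with a minimal Python sketch. The toy annotation data and the binary (0/1) vector weights are assumptions made for illustration; the slide only states Sim(ti, tj) = ti · tj and does not say how the vector components are weighted.

```python
# Toy annotation database: gene -> set of GO terms (hypothetical data).
annotations = {
    "g1": {"t1", "t2"},
    "g2": {"t1", "t2", "t3"},
    "g3": {"t3"},
}

genes = sorted(annotations)
terms = sorted({t for ts in annotations.values() for t in ts})

# One vector per GO term, with one component per gene.
# Binary weights are an assumption; other weighting schemes are possible.
term_vectors = {
    t: [1 if t in annotations[g] else 0 for g in genes] for t in terms
}


def sim(ti, tj):
    """Sim(ti, tj) = ti . tj, the dot product of the two term vectors."""
    return sum(a * b for a, b in zip(term_vectors[ti], term_vectors[tj]))


# GO terms x GO terms similarity matrix, stored as a dictionary of pairs.
similarity = {(ti, tj): sim(ti, tj) for ti in terms for tj in terms}

for ti in terms:
    print(ti, [similarity[(ti, tj)] for tj in terms])
```

With binary weights, the dot product is simply the number of genes annotated with both terms, so the matrix is symmetric.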
Analysis of co-occurring GO terms
[Figure: from the annotation database (genes g1, g2, …, gn × GO terms t1, t2, …, tn), the GO terms annotating each gene are listed, then term pairs and single terms are counted]
GO terms per gene:
  g2: t2 t7 t9
  …: t3 t7 t9
  g5: t5
Co-occurrence counts:
  t2-t7: 1
  t2-t9: 1
  t7-t9: 2
  …
Term frequencies:
  t5: 1
  t7: 2
  t9: 2
  …
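The counting step shown on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. The label "gene_x" is a placeholder for the gene whose name is elided on the slide.

```python
from collections import Counter
from itertools import combinations

# Annotations taken from the slide's example.
# "gene_x" is a hypothetical placeholder for the gene elided on the slide.
annotations = {
    "g2": ["t2", "t7", "t9"],
    "gene_x": ["t3", "t7", "t9"],
    "g5": ["t5"],
}

term_freq = Counter()  # number of genes annotated with each term
pair_freq = Counter()  # number of genes annotated with both terms of a pair

for terms in annotations.values():
    unique = sorted(set(terms))  # sort so each pair has one canonical order
    term_freq.update(unique)
    pair_freq.update(combinations(unique, 2))

for (a, b), count in sorted(pair_freq.items()):
    print(f"{a}-{b}: {count}")
for term, count in sorted(term_freq.items()):
    print(f"{term}: {count}")
```

Sorting the terms of each gene before pairing ensures that t7-t9 and t9-t7 are counted as the same pair.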
Analysis of co-occurring GO terms
• Statistical analysis: test independence
  • Likelihood ratio test (G2)
  • Chi-square test (Pearson's χ2)
• Example from GOA (22,720 annotations)
  • GO:0006955 [BP] immune response, Freq. = 588
  • GO:0008009 [MF] chemokine activity, Freq. = 53
  • Co-oc. = 46

Contingency table (rows: GO:0006955 immune response; columns: GO:0008009 chemokine activity):

              present    absent     Total
  present          46       542       588
  absent            7    22,125    22,132
  Total            53    22,667    22,720

G2 = 298.7, p < 0.001
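As a worked check of the likelihood ratio test, the sketch below computes G2 for a 2 × 2 table derived from the figures stated on the slide (22,720 annotations, frequencies 588 and 53, 46 co-occurrences). The function name and structure are illustrative assumptions. The p-value uses the chi-square distribution with one degree of freedom, which is the usual reference distribution for G2 on a 2 × 2 table.

```python
import math


def g2_statistic(n, freq_a, freq_b, cooc):
    """Likelihood ratio statistic G2 for the independence of two terms.

    n: total number of annotations; freq_a, freq_b: term frequencies;
    cooc: number of co-occurrences of the two terms.
    """
    observed = [
        [cooc, freq_a - cooc],                       # A present
        [freq_b - cooc, n - freq_a - freq_b + cooc],  # A absent
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += 2 * o * math.log(o / e)
    return g2


g2 = g2_statistic(n=22720, freq_a=588, freq_b=53, cooc=46)
# Upper tail of the chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(g2 / 2))
print(f"G2 = {g2:.1f}, p = {p_value:.3g}")
```

The result should come out close to the G2 value reported on the slide, with a p-value far below 0.001.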
Association rule mining
[Figure: in the annotation database (genes g1, g2, …, gn × GO terms t1, t2, …, tn), the set of GO terms annotating a gene, e.g. g2: t2 t7 t9, is treated as a transaction and passed to apriori]
apriori
• Rules: t1 => t2
• Confidence: > .9
• Support: .05
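To make support and confidence concrete, here is a simplified Python sketch that scores single-term rules of the form t1 => t2. It is not the apriori program named on the slide, which also handles larger itemsets and prunes candidates. The transactions are hypothetical, and reading the slide's ".05" as a minimum support threshold is an assumption.

```python
from itertools import permutations

# Hypothetical transactions: the set of GO terms annotating each gene.
transactions = [
    {"t1", "t2"},
    {"t1", "t2", "t3"},
    {"t1", "t2"},
    {"t3"},
]


def mine_pair_rules(transactions, min_support=0.05, min_confidence=0.9):
    """Return rules (a, b, support, confidence) for a => b.

    support    = fraction of transactions containing both a and b
    confidence = fraction of transactions containing a that also contain b
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ab = sum(1 for t in transactions if a in t and b in t)
        support = n_ab / n
        confidence = n_ab / n_a if n_a else 0.0
        # Thresholds as on the slide: confidence > .9, support of at least .05.
        if support >= min_support and confidence > min_confidence:
            rules.append((a, b, support, confidence))
    return rules


rules = mine_pair_rules(transactions)
for a, b, support, confidence in rules:
    print(f"{a} => {b} (support {support:.2f}, confidence {confidence:.2f})")
```

Rules are directional: t1 => t2 and t2 => t1 are scored separately because their confidence values can differ.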
Examples of associations
• Vector space model
  • MF: ice binding
  • BP: response to freezing
(generated singular forms from plurals)
Examples
iron [CHEBI:18248]
uronic acid [CHEBI:27252]
carbon [CHEBI:27594]
BP iron ion transport [GO:0006826]
MF iron superoxide dismutase activity [GO:0008382]
CC vanadium-iron nitrogenase complex [GO:0016613]
BP response to carbon dioxide [GO:0010037]
MF carbon-carbon lyase activity [GO:0016830]
BP uronic acid metabolism [GO:0006063]
MF uronic acid transporter activity [GO:0015133]
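The examples above pair ChEBI entities with GO terms whose names contain the entity name. A minimal sketch of such name matching is shown below, using only the identifiers and names listed on this slide. The whole-word, case-insensitive matching rule is an assumption made for illustration, not a description of the authors' matching procedure.

```python
import re

# ChEBI entities and GO terms exactly as listed on the slide.
chebi = {
    "CHEBI:18248": "iron",
    "CHEBI:27252": "uronic acid",
    "CHEBI:27594": "carbon",
}
go_terms = {
    "GO:0006826": "iron ion transport",
    "GO:0008382": "iron superoxide dismutase activity",
    "GO:0016613": "vanadium-iron nitrogenase complex",
    "GO:0010037": "response to carbon dioxide",
    "GO:0016830": "carbon-carbon lyase activity",
    "GO:0006063": "uronic acid metabolism",
    "GO:0015133": "uronic acid transporter activity",
}


def find_associations(chebi, go_terms):
    """Return (ChEBI id, GO id) pairs where the entity name occurs
    as a whole word (or phrase) in the GO term name."""
    associations = []
    for chebi_id, name in chebi.items():
        # Word boundaries are an assumed matching rule; a hyphen counts
        # as a boundary, so "vanadium-iron" matches "iron".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for go_id, term in go_terms.items():
            if pattern.search(term):
                associations.append((chebi_id, go_id))
    return associations


associations = find_associations(chebi, go_terms)
for chebi_id, go_id in associations:
    print(f"{chebi[chebi_id]} [{chebi_id}] -> {go_terms[go_id]} [{go_id}]")
```

Each GO term is associated with an entity at most once, even when the name occurs twice, as in "carbon-carbon lyase activity".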
Quantitative results
• 2,700 ChEBI entities (27%) identified in some GO term
• 9,431 GO terms (55%) include some ChEBI entity in their names
• 10,516 ChEBI entities, 17,250 GO terms
• 20,497 associations
Generalization
[Figure: the GO hierarchies (MF, CC, BP) shown together with other OBO ontologies: ChEBI, FMA, Cell types, REX, FIX, Mouse Pathology]
Examples: cell membrane viscosity, enzymatic reaction, Leydig cell tumor, iron deposition, cerebellar aplasia
Conclusions
Conclusions (1)
• Links across OBO ontologies need to be made explicit
  • Between GO terms across GO hierarchies
  • Between GO terms and OBO terms
  • Between terms across OBO ontologies
• Automatic approaches
  • Effective (GO-GO, GO-ChEBI)
  • At least to bootstrap the process
  • Need to be refined
Conclusions (2)
• Affordable relations
  • Computer-intensive, not labor-intensive
• Methods must be combined
  • Cross-validation
    • Redundancy as a surrogate for reliability
  • Relations identified specifically by one approach
    • False positives
    • Specific strength of a particular method
• Requires (some) manual curation
  • Biologists must be involved
References
• Bodenreider O, Aubry M, Burgun A. Non-lexical approaches to identifying associative relations in the Gene Ontology. In: Altman RB, Dunker AK, Hunter L, Jung TA, Klein TE, editors. Pacific Symposium on Biocomputing 2005: World Scientific; 2005. p. 91-102.
  http://mor.nlm.nih.gov/pubs/pdf/2005-psb-ob.pdf
• Burgun A, Bodenreider O. An ontology of chemical entities helps identify dependence relations among Gene Ontology terms. Proceedings of the First International Symposium on Semantic Mining in Biomedicine (SMBM-2005). Electronic proceedings: CEUR-WS/Vol-148.
  http://mor.nlm.nih.gov/pubs/pdf/2005-smbm-ab.pdf