Construction And Evaluation Of OWL-DL Ontologies
Mark Wilkinson, Assistant Professor, Department of Medical Genetics, University of British Columbia
iCAPTURE Centre, St. Paul’s Hospital
Presenting the work of Benjamin Good, M.Sc., Wilkinson Laboratory, Bioinformatics Doctoral Programme, UBC
“We believe that [centralized ontology building] efforts are unsustainable and that the Semantic Web will eventually be built in the same way as the WWW was – by its users”
Good and Wilkinson, “The Life Sciences Semantic Web is Full of Creeps!”, Briefings in Bioinformatics (in press)
Why Do We Think This Way?
BioMoby: Mass collaborative ontology building to support Web Services Interoperability
What Does BioMoby Do?
The MOBY Plan
Create an ontology of bioinformatics data-types
Define an XML representation of this ontology
Create an ontology of bioinformatics operations
Open these ontologies to public input
Define Web Service interfaces in terms of these two ontologies
Register interfaces in an ontology-aware registry
A machine can find an appropriate service
A machine can execute that service unattended
The ontology is community-extensible
Take-home message: this was built by a community of non-expert ontologists!
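The discovery step in the plan above, where an ontology-aware registry lets a machine find an appropriate service, can be sketched in a few lines of Python. This is a minimal illustration, not BioMoby's actual registry code. The data-type names, the service names, and the rule that a service registered for a parent type also accepts any of its subtypes are all assumptions made for the example.

```python
# Hypothetical data-type ontology: child type -> parent type (is-a). Illustrative names only.
DATATYPE_PARENT = {
    "GenericSequence": "Object",
    "NucleotideSequence": "GenericSequence",
    "DNASequence": "NucleotideSequence",
    "AminoAcidSequence": "GenericSequence",
}

# Hypothetical registry: service name -> the input data type it declares.
REGISTRY = {
    "getSequenceLength": "GenericSequence",
    "translateDNA": "DNASequence",
    "runBlastp": "AminoAcidSequence",
    "storeAnything": "Object",
}


def ancestors(datatype: str) -> list[str]:
    """Return the type itself followed by all of its is-a ancestors, nearest first."""
    chain = [datatype]
    while chain[-1] in DATATYPE_PARENT:
        chain.append(DATATYPE_PARENT[chain[-1]])
    return chain


def find_services(datatype: str) -> list[str]:
    """Services whose declared input is the given type or any supertype of it."""
    acceptable = set(ancestors(datatype))
    return sorted(name for name, input_type in REGISTRY.items() if input_type in acceptable)


print(find_services("DNASequence"))  # ['getSequenceLength', 'storeAnything', 'translateDNA']
```

Because the lookup walks the is-a chain, any community member who adds a new subtype under an existing type immediately gains access to every service registered for its ancestors.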
Open Kimono Time
The BioMoby ontology is quite messy…
…communal brains can build useful ontologies, but we will need better tooling
How are ontologies usually constructed?
By A Few People With Lots Of Moola!
$15 Billion(?) (Smith, Barry, KBB Workshop, Montreal, 2005)
Why does it cost so much??
To build the Semantic Web for Life Sciences we need to encode knowledge from EVERY domain of biology, from barley root apex structure and function to HIV clinical-trial outcomes… and this knowledge is constantly changing!
At >>$25M a pop, can we afford the Semantic Web???
The iCAPTURer Method
Template-Assisted Ontology Construction
Pre-iCAPTURer: Extract the brain of one or a very few experts (expensive and time-consuming…)
iCAPTURer: Consume as many brains as possible
The iCAPTURer Experiment
Hypotheses
With a starting thesaurus of concepts, and with a clear, simple interface for linking them, “wet” researchers can create a robust ontology themselves
Using carefully-defined templates, a Knowledge Engineer can control the structure of an ontology without controlling, or even understanding, the content
Domain: Cardiovascular and Pulmonary disease, both clinical and molecular
Capture Scope: Hyponymy (is a) relations
Ontology Task: Ontological classification of conference abstracts to aid in semantic searching
Interface: Chatterbot
“I’ve heard that a cardiac myocyte is a type of cardiac cell. Is this true?”
“I’ve heard that STEMI means the same thing as ST Elevated Myocardial Infarction. Is that nonsense, or is it correct?”
“How do you feel about your mother?”
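The chatterbot prompts above illustrate the second hypothesis. In each prompt, the relation (is-a, or same-meaning) is fixed by the knowledge engineer, and the user only confirms or rejects the terms. Below is a minimal Python sketch of that split between structure and content. The template wording is taken from the prompts above, but the `Template` class, the relation labels `is_a` and `synonym`, and the yes/no parsing rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    relation: str  # the only relation this template can ever assert (the "structure")
    question: str  # natural-language prompt with {a} and {b} slots for the "content"


# Hypothetical templates modelled on the chatterbot prompts.
TEMPLATES = {
    "is_a": Template("is_a", "I've heard that {a} is a type of {b}. Is this true?"),
    "synonym": Template(
        "synonym",
        "I've heard that {a} means the same thing as {b}. Is that nonsense, or is it correct?",
    ),
}


def ask(template_key: str, a: str, b: str) -> str:
    """Fill a template's slots with candidate terms to produce the question."""
    return TEMPLATES[template_key].question.format(a=a, b=b)


def record_answer(ontology: set, template_key: str, a: str, b: str, answer: str) -> None:
    """Add a triple only when the user affirms; the template, not the user, picks the relation."""
    if answer.strip().lower() in {"yes", "true", "correct"}:
        ontology.add((a, TEMPLATES[template_key].relation, b))


ontology: set[tuple[str, str, str]] = set()
print(ask("is_a", "a cardiac myocyte", "cardiac cell"))
record_answer(ontology, "is_a", "cardiac myocyte", "cardiac cell", "Yes")
record_answer(ontology, "synonym", "STEMI", "ST Elevated Myocardial Infarction", "nonsense")
print(ontology)  # {('cardiac myocyte', 'is_a', 'cardiac cell')}
```

The knowledge engineer controls which relations can ever appear in the ontology by choosing the templates, without needing to judge whether a cardiac myocyte really is a cardiac cell.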
Results Over 5 Days
Concepts accepted and expert-validated: 661
Humans had an extremely difficult time classifying things into pre-existing categories
Humans had an extremely difficult time defining new categories and placing them into the existing classification system
How Do We Know If It Is Any Good?
Templates control structure, but not content
Structurally sound, logically valid ontologies can still be nonsensical!
How do we measure the quality of an ontology?
Domain specific
“Fit” to text
Similarity to a gold standard
Task-based
Slow, subjective
Fast, questionable value
Fast, useful, not enough
Fast in theory, useful…
Fast, dependent on NLP
Fast to run, extremely slow to set up
Real, but not generalizable
Problem: Evaluating the metrics
No clear winner has yet emerged from the morass of metrics
A “global” winner is unlikely to be found
Each seems to have some benefits and some disadvantages
Each may be useful for one ontology but not another
How do we evaluate which metrics are useful for evaluating our ontologies?
Ontology Permutation As A Metrics-Evaluation Tool
Take an ontology that everyone agrees is “good”
Make it worse by systematically adding random changes (noise)
A quality metric should correlate with the amount of noise added
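One way to make "should correlate" concrete is to compute a rank correlation between the amount of noise and each metric's score. A metric that tracks quality should fall as noise rises, giving a coefficient near -1. The slides do not name a statistic, so Spearman's rank correlation is an assumption here. The scores below are synthetic values chosen only to exercise the code; they are not measurements.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one input is constant
    return cov / (sx * sy)


# Synthetic scores for six increasingly noisy copies of a source ontology.
noise = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
metric_1 = [0.95, 0.90, 0.81, 0.74, 0.60, 0.52]  # falls steadily with noise
metric_2 = [0.70, 0.72, 0.69, 0.71, 0.70, 0.73]  # barely reacts to noise

print(spearman(noise, metric_1))  # close to -1: tracks the injected noise
print(spearman(noise, metric_2))  # weak and positive: not useful for this ontology
```

A rank-based coefficient rewards any metric that falls monotonically with noise, even if the decline is not linear. That matches the goal of ranking metrics rather than calibrating them.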
An Objective Comparison Of Ontology Quality Metrics
[Figure: measured ontology quality (y-axis) plotted against the amount of noise added, i.e. decreasing ontology quality (x-axis), for Quality Metric 1 and Quality Metric 2]
Adding Noise To Ontologies
Maintain the same number of classes and relationships, as well as satisfiability
Add noise by swapping relationships attached to pairs of classes:
Sub/superclass
Domain/range, etc.
Validate with the Pellet reasoner
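The swapping step can be sketched as follows. The sketch assumes the ontology is reduced to a set of (subclass, superclass) edges plus a set of disjointness axioms. Swapping two classes exchanges every sub/superclass edge attached to them, so the number of classes and relationships is unchanged. Pellet is not called here. Instead, `is_consistent` is a hypothetical and much cruder stand-in that only rejects swaps placing a class under two disjoint classes. Domain/range swaps are not modelled, and the class names are illustrative.

```python
import random

Edge = tuple[str, str]  # (subclass, superclass)


def swap_classes(subclass_of: set[Edge], a: str, b: str) -> set[Edge]:
    """Exchange every sub/superclass edge attached to classes a and b."""
    swap = {a: b, b: a}
    return {(swap.get(c, c), swap.get(p, p)) for c, p in subclass_of}


def superclasses(cls: str, subclass_of: set[Edge]) -> set[str]:
    """All classes reachable upward from cls through subclass edges."""
    found: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for child, parent in subclass_of:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_consistent(subclass_of: set[Edge], disjoint: set[Edge]) -> bool:
    """Crude stand-in for a DL reasoner: no class may sit under two disjoint classes."""
    classes = {c for edge in subclass_of for c in edge}
    for cls in classes:
        up = superclasses(cls, subclass_of) | {cls}
        if any(x in up and y in up for x, y in disjoint):
            return False
    return True


def add_noise(subclass_of: set[Edge], disjoint: set[Edge], n_swaps: int,
              rng: random.Random, max_attempts: int = 1000) -> set[Edge]:
    """Apply up to n_swaps random class swaps, keeping only swaps that stay consistent."""
    classes = sorted({c for edge in subclass_of for c in edge})
    noisy = set(subclass_of)
    done = 0
    for _ in range(max_attempts):
        if done == n_swaps:
            break
        a, b = rng.sample(classes, 2)
        candidate = swap_classes(noisy, a, b)
        if is_consistent(candidate, disjoint):
            noisy, done = candidate, done + 1
    return noisy


# Hypothetical toy taxonomy and disjointness axiom.
source = {
    ("DNASequence", "NucleotideSequence"),
    ("RNASequence", "NucleotideSequence"),
    ("NucleotideSequence", "Sequence"),
    ("ProteinSequence", "Sequence"),
}
disjoint = {("NucleotideSequence", "ProteinSequence")}

noisy = add_noise(source, disjoint, n_swaps=2, rng=random.Random(42))
print(len(noisy) == len(source), is_consistent(noisy, disjoint))  # True True
```

Swapping NucleotideSequence with ProteinSequence passes this check but makes DNASequence a kind of ProteinSequence. The result is logically valid yet nonsensical, which is why the permuted ontologies remain useful test cases for quality metrics.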
Quantifying Noise
The simple number of changes is misleading, and not a good measure of “noise”
Noise is better quantified by the degree of (dis)similarity between the permuted ontology and the source ontology
Maedche, A. and Staab, S., “Measuring Similarity between Ontologies”, Lecture Notes in Computer Science, 2002, 251
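Below is a simplified similarity measure in the spirit of Maedche and Staab's taxonomic overlap. For each class shared by the two taxonomies, the sketch compares its semantic cotopy (the class plus all of its super- and subclasses) in each taxonomy using the Jaccard ratio, then averages over the shared classes. The published measure is more elaborate. This reduced version, the toy edges, and the idea of reporting noise as one minus the similarity are assumptions for illustration.

```python
Edge = tuple[str, str]  # (subclass, superclass)


def reachable(cls: str, edges: set[Edge], upward: bool) -> set[str]:
    """Transitive superclasses (upward=True) or subclasses (upward=False) of cls."""
    found: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for child, parent in edges:
            src, dst = (child, parent) if upward else (parent, child)
            if src == current and dst not in found:
                found.add(dst)
                stack.append(dst)
    return found


def semantic_cotopy(cls: str, edges: set[Edge]) -> set[str]:
    """The class together with all of its super- and subclasses."""
    return {cls} | reachable(cls, edges, True) | reachable(cls, edges, False)


def taxonomic_overlap(edges_1: set[Edge], edges_2: set[Edge]) -> float:
    """Mean Jaccard overlap of semantic cotopies over classes present in both taxonomies."""
    shared = {c for e in edges_1 for c in e} & {c for e in edges_2 for c in e}
    if not shared:
        return 0.0
    total = 0.0
    for cls in shared:
        sc_1 = semantic_cotopy(cls, edges_1)
        sc_2 = semantic_cotopy(cls, edges_2)
        total += len(sc_1 & sc_2) / len(sc_1 | sc_2)
    return total / len(shared)


# Toy source taxonomy and a permuted copy in which dolphins and sharks swap places.
source = {
    ("dolphins", "air breathing"), ("fishermen", "air breathing"),
    ("sharks", "water breathing"),
    ("air breathing", "aquatic things"), ("water breathing", "aquatic things"),
}
permuted = {
    ("sharks", "air breathing"), ("fishermen", "air breathing"),
    ("dolphins", "water breathing"),
    ("air breathing", "aquatic things"), ("water breathing", "aquatic things"),
}

similarity = taxonomic_overlap(source, permuted)
print(taxonomic_overlap(source, source), similarity, 1 - similarity)
```

Swapping dolphins and sharks in this toy taxonomy lowers the overlap from 1.0 to about 0.68, so the permuted copy would be scored at roughly 0.32 noise. That is a single swap measured by how much it actually disturbs the hierarchy rather than by a raw count of changes.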
Example Of Similarity Measurement: Semantic Distance
[Figure: two taxonomies of the same “Aquatic things” (fishermen, dolphins, fish, anchovies, tuna, sharks, seaweed, ships, sand, water). The air-centric ontology divides them into air breathing, water breathing, and non breathing things; the leg-centric ontology divides them into things with legs and things with no legs.]
Air-centric ontology, semantic distance: Dolphins to Fishermen 0; Dolphins to Fish 4
Leg-centric ontology, semantic distance: Dolphins to Fishermen 4; Dolphins to Fish 0
Conclusions
Communities can build useful ontologies
Better tools make better ontologies
Chatterbot templates seem to work well, and could easily be incorporated into existing software tools for dynamic, organization-wide knowledge capture!
Ontology evaluation is hard!
Some non-task-based evaluation metrics are showing promise
Genome Canada
Genome Alberta
Genome British Columbia
GA: A Bioinformatics Platform for Genome Canada
GBC: Better Biomarkers in Transplantation
Canadian Institutes For Health Research
Bioinformatics Training Program