Page 1

@Interontology08, February 27, 2008

The Semantic Web for Scientific Research: A ‘perfect storm’ for the development of Ontology

Alan Ruttenberg, Principal Scientist

Page 2

Weather conditions

• Open source ethic is mainstream

• Beginnings of a viable Semantic Web

• Funders: products of public science not optimally used

• Burgeoning quality-focused developer community

Page 3

Beginnings of a viable Semantic Web

• Initial standardizations
  • OWL 1.0 (OWL 1.1 WG in progress)
  • SPARQL

• Viable tools
  • Scalable triple stores, e.g. Virtuoso, Oracle…
  • Reasoners: Pellet, FaCT++, CEL, QuOnto…

Page 4

Funders: Products of public science not optimally used

• Both government and philanthropies

• Data sharing mandates

• Open access publication mandates

• Recognition that Ontology can play key role (and funding)
  • Wonderweb, NCBO, JCOR (more in Europe, beginnings in Australia, China)

• E.g. NIH Ontology grants

Page 5

Burgeoning quality-focused developer community

• W3C Semantic Web for Life Sciences Interest Group
  • Brings together scientists, medical researchers, science writers and informaticians from academia, government, non-profit organizations - health care, pharmaceuticals and industry vendors
  • Chartering of second phase in progress

• OBO Foundry
  • Principle-based development of science-based ontologies with the goal of creating a suite of interoperable reference ontologies for biomedicine.
  • Process and governance are being refined
  • Groups are lining up to join

Page 6

Some projects I’m involved in

• The challenge of data integration at Web scales
  • The Neurocommons

• Collaborative Ontology Development
  • OBI – The Ontology for Biomedical Investigations

• Identifying and working through aspects of Ontology
  • Working with, and on, the Basic Formal Ontology
  • What is a Gene Ontology Annotation?

Page 7

The Neurocommons

[Figure: data sources brought together in the Neurocommons: AddGene Plasmids, NeuronDB, BAMS, Neurocommons text mining, Homologene, SWAN, Entrez Gene, Gene Ontology annotations, Mammalian Phenotype, PDSPki, BrainPharm, AlzGene, Antibodies, PubChem, MeSH, Reactome, Allen Brain Atlas, Publications, CCDB, Neuronbank, OBO Ontologies, NeuroMorpho, SAO, Coriell cells]

Page 8

What’s a (Science) Commons?

• Built on open resources: public domain, open databases, open literature

• Encoded in open architectures and technical standards

Page 9

Science Commons

• Science Commons is a project of Creative Commons
  • Creative Commons provides free tools that let authors, scientists, artists, and educators easily mark their creative work with the freedoms they want it to carry
  • 140,000,000 objects on the Web under CC licenses in 40+ countries
  • 700+ peer-reviewed journals carry CC licensing, including Public Library of Science

• Science Commons specializes CC to science
  • For consumers of knowledge: make it easy to use and re-use information and increase chances for discovery
  • For providers of knowledge: provide legal certainty and automated attribution and tracking
  • For funders: provide new metrics for tracking return on investment based on re-use

Page 10

Neurocommons approach

• From OBO Foundry: Carefully model biology to enable integration of data sources. “Audit trail to reality”

• From Web: Assign all biological entities URIs (lots already provided by OBO) and translate to OWL/RDF

• From OWL: Add triples inferred by reasoner to increase expressiveness of queries with even simple query engine

• From software engineering: Provide data via SPARQL first (API). Build tools on top of that.

• From open source movement: Make it freely available, reproducible
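
A minimal sketch of the "add inferred triples" idea, not taken from the talk: a reasoner's output is stood in for by a transitive closure over rdfs:subClassOf, and the class names under the ex: prefix are hypothetical. Once the implied triples are stored, a query engine that only does exact pattern lookups can still find indirect subclasses.

```python
SUBCLASS = "rdfs:subClassOf"

# Hypothetical asserted hierarchy (the ex: names are illustrative only).
ASSERTED = {
    ("ex:DendriteDevelopment", SUBCLASS, "ex:NeuronDevelopment"),
    ("ex:NeuronDevelopment", SUBCLASS, "ex:BiologicalProcess"),
}


def materialize_subclass_closure(triples):
    """Return the triples plus every subClassOf triple implied by transitivity."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, p, o in closed if p == SUBCLASS]
        for child, parent in pairs:
            for other_child, grandparent in pairs:
                inferred = (child, SUBCLASS, grandparent)
                if parent == other_child and inferred not in closed:
                    closed.add(inferred)
                    changed = True
    return closed


def direct_subclasses(triples, parent):
    """A 'simple query engine': exact lookup, no reasoning at query time."""
    return sorted(s for s, p, o in triples if p == SUBCLASS and o == parent)


before = direct_subclasses(ASSERTED, "ex:BiologicalProcess")
after = direct_subclasses(
    materialize_subclass_closure(ASSERTED), "ex:BiologicalProcess"
)
print(before)
print(after)
```

The same lookup returns one class before materialization and two after, which is the sense in which stored inferences increase what a simple engine can answer.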

Page 11

The Gene Ontology

The Gene Ontology names many biological processes and tells us which genes are known to be involved in those processes.

Page 12

The Gene Ontology (a small portion)

[Figure: a small portion of the Gene Ontology graph. Node labels: Biological Process; Activation of innate immune response; Cell surface pattern recognition receptor signaling pathway. Edge labels: is_a, part_of]

Page 13

A simple query: Biological processes in dendrites?

Alzheimer’s disease is characterized by neural degeneration. Among other things, there is damage to dendrites and axons, parts of nerve cells.

What resources do we have available to learn more about biological processes in dendrites?

Page 14

Biological processes naming dendrites

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX obo: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?name ?class ?definition
from <http://purl.org/commons/hcls/20070416>
where
{ graph <http://purl.org/commons/hcls/20070416/classrelations>
    {?class rdfs:subClassOf go:GO_0008150}
  ?class rdfs:label ?name.
  ?class obo:hasDefinition ?def.
  ?def rdfs:label ?definition
  filter(regex(?name,"[Dd]endrite"))
}

go:GO_0008150 is the URI for Biological Process (OBO Foundry principles guarantee unique names for each Universal)

Page 15

From the “console”

Page 16

But answers are also available by a “GET”

• /sparql/?query=PREFIX%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0APREFIX%20go%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2FGO%23%3E%0APREFIX%20obo%3A%20%3Chttp%3A%2F%2Fwww.geneontology.org%2Fformats%2FoboInOwl%23%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0Aselect%20%20%3Fname%20%20%3Fclass%20%3Fdefinition%0Afrom%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%3E%0Awhere%0A%7B%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%2Fclassrelations%3E%0A%20%20%20%20%20%7B%3Fclass%20rdfs%3AsubClassOf%20go%3AGO_0008150%7D%0A%20%20%20%20%3Fclass%20rdfs%3Alabel%20%3Fname.%0A%20%20%20%20%3Fclass%20obo%3AhasDefinition%20%3Fdef.%0A%20%20%20%20%3Fdef%20rdfs%3Alabel%20%3Fdefinition%20%0A%20%20%20%20filter(regex(%3Fname%2C%22%5BDd%5Dendrite%22))%0A%7D%0A&format=&maxrows=50

So someone, somewhere else, can build something better

*Note: Different query than previous slide
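
A hedged sketch of how such a GET URL could be assembled with Python's standard library. The parameter names query, format and maxrows and the /sparql/ path come from the slide; the host name is a placeholder, since the slide shows only the path, and the exact percent-encoding may differ in detail from the slide (for example, this code also encodes parentheses).

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# The dendrite query from the earlier slide, verbatim apart from line breaks.
QUERY = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX obo: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?name ?class ?definition
from <http://purl.org/commons/hcls/20070416>
where
{ graph <http://purl.org/commons/hcls/20070416/classrelations>
    {?class rdfs:subClassOf go:GO_0008150}
  ?class rdfs:label ?name.
  ?class obo:hasDefinition ?def.
  ?def rdfs:label ?definition
  filter(regex(?name,"[Dd]endrite"))
}
"""

# Hypothetical host: the slide gives only the path "/sparql/".
ENDPOINT = "http://sparql.example.org/sparql/"


def build_get_url(endpoint, query, maxrows=50):
    """Encode a SPARQL query into a GET URL using the slide's parameter names."""
    params = {"query": query, "format": "", "maxrows": maxrows}
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return endpoint + "?" + urlencode(params, quote_via=quote)


url = build_get_url(ENDPOINT, QUERY)

# Decode the URL again to confirm the query survives the round trip.
decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url[:80])
print(decoded["query"][0] == QUERY)
```

Because the whole request is just a URL, any client that can issue an HTTP GET can reuse the query, which is the point of "someone, somewhere else, can build something better".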

Page 17

Three levels of representing scientific knowledge

• Record level: Represent database records. Inconsistent if two sources disagree about contents of a field.

• Statement level: Represent what researchers say. Inconsistent if two people disagree about what a paper said

• Domain level: OBO Foundry approach. Represent your best understanding of consensus. Inconsistent if facts contradict.

• We need all three (but make clear which is which)

• Next slide query is hybrid of Record/Domain

Page 18

A SPARQL query for processes involved in pyramidal neurons

prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

select ?genename ?processname
where
{ graph <http://purl.org/commons/hcls/pubmesh>
    { ?paper ?p mesh:D017966 .
      ?article sc:identified_by_pmid ?paper.
      ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
    }
  graph <http://purl.org/commons/hcls/goa>
    { ?protein rdfs:subClassOf ?res.
      ?res owl:onProperty ro:has_function.
      ?res owl:someValuesFrom ?res2.
      ?res2 owl:onProperty ro:realized_as.
      ?res2 owl:someValuesFrom ?process.
      graph <http://purl.org/commons/hcls/20070416/classrelations>
        { {?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166}
          union
          {?process rdfs:subClassOf go:GO_0007166 } }
      ?protein rdfs:subClassOf ?parent.
      ?parent owl:equivalentClass ?res3.
      ?res3 owl:hasValue ?gene.
    }
  graph <http://purl.org/commons/hcls/gene>
    { ?gene rdfs:label ?genename }
  graph <http://purl.org/commons/hcls/20070416>
    { ?process rdfs:label ?processname}
}

MeSH: Pyramidal Neurons

PubMed: Journal Articles

Entrez Gene: Genes

GO: Signal Transduction

Inference required

Page 19

Google: 223,000 results

Page 20

Results

DRD1, 1812 adenylate cyclase activation
ADRB2, 154 adenylate cyclase activation
ADRB2, 154 arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632 dopamine receptor signaling pathway
DRD1, 1812 dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813 dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917 G-protein coupled receptor protein signaling pathway
GNG3, 2785 G-protein coupled receptor protein signaling pathway
GNG12, 55970 G-protein coupled receptor protein signaling pathway
DRD2, 1813 G-protein coupled receptor protein signaling pathway
ADRB2, 154 G-protein coupled receptor protein signaling pathway
CALM3, 808 G-protein coupled receptor protein signaling pathway
HTR2A, 3356 G-protein coupled receptor protein signaling pathway
DRD1, 1812 G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755 G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543 G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269 G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362 G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898 glutamate signaling pathway
GRIN1, 2902 glutamate signaling pathway
GRIN2A, 2903 glutamate signaling pathway
GRIN2B, 2904 glutamate signaling pathway
ADAM10, 102 integrin-mediated signaling pathway
GRM7, 2917 negative regulation of adenylate cyclase activity
LRP1, 4035 negative regulation of Wnt receptor signaling pathway
ADAM10, 102 Notch receptor processing
ASCL1, 429 Notch signaling pathway
HTR2A, 3356 serotonin receptor signaling pathway
ADRB2, 154 transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793 transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043 transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902 transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500 Wnt receptor signaling pathway

Many of the genes are indeed related to Alzheimer’s Disease through gamma secretase (presenilin) activity

Page 21

What happens when data is discoverable, queryable, and accessible on the open web?

[Figure: diagram of a web mashup. Labels: Allen Brain Institute Servers; Javascript; SPARQL AJAX Query; URL http://www.brainmap.org://….0205032816_B.aff/TileGroup3/1-0-1.jpg; GoogleMaps API; http://hcls1.csail.mit.edu/map/#Kcnip3@2850,Kcnd1@2800; Neurocommons Servers]

Page 22

Others can “view source”, use our code in their own applications

Page 23

Background Technology

So far about 350M triples in OpenLink Virtuoso (~20 GB)

Commodity Hardware: 2x2core duo/2 disks/8 GB RAM

Biggest so far is MeSH associations to articles (200M triples)

Smaller, from 10K to 10M triples/source

A small fraction of biological knowledge

(another element of the perfect storm is that computer hardware is so cheap and powerful)
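
A back-of-the-envelope calculation from the figures on this slide, assuming "~20 GB" means roughly 20 × 10^9 bytes (the slide does not say whether the size includes indexes):

```python
TOTAL_TRIPLES = 350e6   # "about 350M triples"
STORE_BYTES = 20e9      # "~20 GB", taken as decimal gigabytes (assumption)
MESH_TRIPLES = 200e6    # MeSH associations to articles

# Average storage cost per triple, and the share taken by the MeSH source.
bytes_per_triple = STORE_BYTES / TOTAL_TRIPLES
mesh_share = MESH_TRIPLES / TOTAL_TRIPLES

print(f"{bytes_per_triple:.0f} bytes per triple")
print(f"{mesh_share:.0%} of triples are MeSH associations")
```

Under that assumption the store averages a few tens of bytes per triple, and the single MeSH source accounts for more than half of all triples.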

Page 24

Results are success, but process more so

• Sample of three interesting cases on the way to the Neurocommons
  • Integration of Senselab
  • Finding and addressing inconsistency
  • Modeling Gene Ontology Annotations

Page 25

Process(1): NeuronDB

• Started with homegrown ontology. Problem: How to link with anything else

• E.g. no links to evidence, “receptors” versus proteins with receptor activity (like GOA)

• Process, iterate many times, fixing OWL, GO understanding/conformance, augmenting what is in ontology.

• Ends with something that links with GO Function. Accepted process for how to move both NeuronDB and GO forward.

• Next slides – in detail how the discussion/teaching goes

Page 26

Words mix up functions and objects

Ligand

Neurotransmitter

Hormone

Peptide

Looking for peptides?

Page 27

Foundry approach connects words to their corresponding entities in reality

PeptideReceptorLigand - A peptide that has a function which makes it able to bind to a receptor

PeptideNeurotransmitter - A peptide expressed in a neuron that has a function which makes it able to regulate another neuron

PeptideHormone - A peptide that is produced in one organ and has a regulatory effect in another.

Peptide - A “short” polymer of amino acids

Looking for peptides?

Page 28

Peptides from ChEBI (Chemical Entities of Biological Interest)

Page 29

Hormone Activity from GO Molecular Function

Page 30
Page 31

Towards RDF/OWL(1)

ALL instances of PeptideHormone are an instance of Peptide that has_role SOME instance of HormoneActivity

Page 32

Towards RDF/OWL(2)

ALL instances of PeptideHormone are an instance of Peptide that has_role SOME instance of HormoneActivity

Page 33

Towards RDF/OWL(3) - Instances

Page 34

Towards RDF/OWL(4) URIs

chebi:25905 = <http://purl.org/obo/owl/CHEBI#CHEBI_25905>

Page 35

Towards OWL(5) : triples

chebi:25905 rdfs:subClassOf chebi:16670.

chebi:25905 rdfs:subClassOf _:1.

_:1 owl:onProperty ro:hasRole.

_:1 owl:someValuesFrom go:GO_00179.

…

Page 36

SPARQLing: Put ?variables where you are looking for matches

chebi:25905 rdfs:subClassOf chebi:16670.

chebi:25905 rdfs:subClassOf _:1.

_:1 owl:onProperty ro:hasRole.

_:1 owl:someValuesFrom go:GO_00179.

select ?moleculeClass
where {
  ?moleculeClass rdfs:subClassOf chebi:16670.
  ?moleculeClass rdfs:subClassOf ?res.
  ?res owl:onProperty ro:hasRole.
  ?res owl:someValuesFrom go:GO_00179.
}

?moleculeClass = chebi:25905
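
To make the "put ?variables where you are looking for matches" idea concrete, here is a toy pattern matcher in plain Python. It is an illustrative sketch, not how a real SPARQL engine is implemented: triples are tuples of strings copied from the slide, and any term starting with "?" is treated as a variable.

```python
# Triples from the slide as (subject, predicate, object) tuples.
# "_:1" is the blank node standing for the OWL restriction.
TRIPLES = [
    ("chebi:25905", "rdfs:subClassOf", "chebi:16670"),
    ("chebi:25905", "rdfs:subClassOf", "_:1"),
    ("_:1", "owl:onProperty", "ro:hasRole"),
    ("_:1", "owl:someValuesFrom", "go:GO_00179"),
]

# The query from the slide: the same triples with variables substituted.
PATTERNS = [
    ("?moleculeClass", "rdfs:subClassOf", "chebi:16670"),
    ("?moleculeClass", "rdfs:subClassOf", "?res"),
    ("?res", "owl:onProperty", "ro:hasRole"),
    ("?res", "owl:someValuesFrom", "go:GO_00179"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new_binding.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant term does not match
    return new_binding


def solve(patterns, triples):
    """Return every variable binding that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


solutions = solve(PATTERNS, TRIPLES)
molecule_classes = sorted({s["?moleculeClass"] for s in solutions})
print(molecule_classes)  # ['chebi:25905']
```

The second pattern initially also binds ?res to chebi:16670, but that candidate is dropped by the third pattern, leaving the single answer shown on the slide.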

Page 37

Process(2): Inconsistency!

• Once NeuronDB is coded properly, and an OWL reasoner is run, it declares the ontology inconsistent

• Problem: There are contradictory assertions about whether a particular ionic current occurs in a particular cell type.

• What to do? “Three levels of representing scientific knowledge” tell us how inconsistency arises in each

• Inconsistency is NOT acceptable, but might this be an issue of confusion over desired level?

Page 38

The dispute: Ionic current? Yes or No

Another investigation

One investigation

Illustration – not the particular cell/current

Page 39

Resolving the inconsistency

• If at the statement level, there need be no inconsistency if the assertions are qualified as being statements of someone. Choice 1: Rework representation to make this so

• If at the domain level, then only one can be right. Choice 2: As curator, make a judgement about which is right, or see if information is missing in the representation that would have this not be a contradiction.

• Resolution: Domain level is desired. Closer examination of the papers finds results from different species.

• Example of “ontological commitment” and dealing with consequences.
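
A toy illustration of the resolution above. This is not an OWL reasoner and the data are placeholders (the slides note that the actual cell and current are not shown); it only shows how two findings that collide when species is left out of the representation stop colliding once species is recorded.

```python
from collections import defaultdict

# Hypothetical assertions: (cell type, ionic current, species, occurs?)
ASSERTIONS = [
    ("cell type X", "current Y", "species A", True),
    ("cell type X", "current Y", "species B", False),
]


def contradictions(assertions, key):
    """Group assertions by `key`; report groups asserted both True and False."""
    seen = defaultdict(set)
    for assertion in assertions:
        seen[key(assertion)].add(assertion[3])
    return sorted(k for k, values in seen.items() if values == {True, False})


# Representation that omits species: the two investigations contradict.
without_species = contradictions(ASSERTIONS, key=lambda a: (a[0], a[1]))

# Representation that records species: no contradiction remains.
with_species = contradictions(ASSERTIONS, key=lambda a: (a[0], a[1], a[2]))

print(without_species)
print(with_species)
```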

Page 40

Process(3): What is a GO Annotation?

Page 41

Problems with integrating annotations with other knowledge

• What are the entities?

• What are the relationships between the process and the entities?

• How can we make All-Some statements involving annotations?

Page 42

A closer look

Ask me about evidence?

Page 43

Semantic Web technology and ontology in the service of science

Let our tools help us find mistakes (and other insights) by having representation that is good enough to be wrong.

Expressed formally, and in conjunction with a reasoner, we might find that there cannot possibly be any instances of this class (it is unsatisfiable)

Page 44

Public science: What we’d like to do better

• Broader knowledge base - cells, anatomy, physiology, behavior, protocols, reagents

• Beyond simple interaction: More precise representations of mechanism to be able to query and exploit computationally

• Built in an open, scalable, scientifically credible way, to encourage sustained contribution, and to take advantage of “web effects”

Page 45

How do we get there?

• Interoperation is paramount, but modeling is hard: Work with the OBO Foundry

• Build a skilled community

• Use (open!) Semantic Web Technologies to enable web effects

• Support and nurture a growing and vigorous community (SWAN, BIRN, OBI) all of whom build on the rest and enable others to build more

• Work to advance key technologies and infrastructure - text mining, structured abstracts, query, reasoning.

• Recruit more ontologists! (That’s you)