Provenance information in biomedical knowledge repositories
A use case

Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Westfields Conference Center, Washington D.C., USA. October 25, 2009

Advanced Library Services project

Biomedical Knowledge Repository
  Knowledge extracted from
    Textual sources (e.g., biomedical literature) using Natural Language Processing (NLP) techniques
    Structured knowledge bases (e.g., Entrez)
    Terminological resources (e.g., UMLS)
  Support services including
    Enhanced information retrieval
    Multi-document summarization
    Question answering
    Knowledge discovery

Outline

Examples of provenance information in biomedical knowledge bases

Examples of applications requiring provenance information

Issues and challenges

Examples of provenance information in biomedical knowledge bases

References for the examples

Entrez System, National Center for Biotechnology Information (NCBI)
  Entrez Gene: http://www.ncbi.nlm.nih.gov/gene/7068
  PubMed: http://www.ncbi.nlm.nih.gov/pubmed/17177139

Mouse Genome Informatics (MGI), The Jackson Laboratory
  Mammalian Orthology: http://www.informatics.jax.org/searches/homology_form.shtml

Statements
  EG:7068 hasSymbol THRB
  EG:7068 hasSymbol GRTH
  EG:7068 hasSymbol PHRT
Provenance
  providedBy HGNC

Statements
  EG:7068 interactsWith EG:8125 (ANP32A)
  EG:7068 interactsWith EG:9318 (COPS2)
Provenance
  providedBy BIND, HPRD
  supportedBy PMID:7776974, PMID:10207062

Statements
  EG:7068 hasFunction GO:0003707 (steroid hormone receptor activity)
  EG:7068 hasFunction GO:0004887 (thyroid hormone receptor activity)
Provenance
  providedBy GOA
  hasEvidence TAS, IEA
  supportedBy PMID:1618799

Statements
  EG:7068 orthologousWith EG:21384
Provenance
  providedBy MGI
  hasEvidence NT, AA
  supportedBy J:90500
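
One way to hold a statement together with its provenance can be sketched in Python. This is only an illustration, not the repository's actual data model: the class names `Statement` and `Provenance` and the helper `statements_from` are assumptions, while the identifiers and link names (providedBy, hasEvidence, supportedBy) come from the orthology example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """A subject-predicate-object statement (identifiers kept as plain strings)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Provenance:
    """Provenance attached to one statement (hypothetical structure)."""
    provided_by: set = field(default_factory=set)
    has_evidence: set = field(default_factory=set)
    supported_by: set = field(default_factory=set)


# The orthology statement from the example, keyed to its provenance.
orthology = Statement("EG:7068", "orthologousWith", "EG:21384")
provenance = {
    orthology: Provenance(
        provided_by={"MGI"},
        has_evidence={"NT", "AA"},
        supported_by={"J:90500"},
    )
}


def statements_from(source, store):
    """Return the statements whose provenance names the given provider."""
    return [s for s, p in store.items() if source in p.provided_by]
```

Keeping provenance keyed by the statement itself, rather than by the gene, reflects the point that provenance describes statements, not entities.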

Statements
  PMID:17177139 indexedBy MESH:D009154 (Mutation)
  PMID:17177139 indexedBy MESH:D037042 (Thyroid Hormone Receptor Beta)
  PMID:17177139 creationDate 2006/12/21
Provenance
  isMajorTopic Yes
  providedBy MEDLINE

Examples of applications requiring provenance information

Types of applications

Information retrieval
Multi-document summarization
Question answering
Knowledge discovery

Information retrieval

Application
  Search by statements
    e.g., find all documents asserting that “IL-13 inhibits COX-2”
Provenance information
  Publication date
  Origin of indexing
  …
  (Similar to traditional search engines)

Multi-document summarization

Application
  Extract and prioritize statements from multiple documents to create a summary
Provenance information
  Level of confidence (e.g., for automatic extraction using NLP techniques)

Question answering

Application
  Find answers to templated questions (e.g., “what genes does IL-13 inhibit?”)
Provenance information
  Select reputable sources (provenance information associated with the documents: source)
  Select recent documents (provenance information associated with the documents: publication date)
  Select valid statements (provenance information associated with the statements: level of confidence)
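
The three selection criteria can be sketched as a simple filter over candidate answers. This is a minimal illustration under stated assumptions: the `Candidate` class, the `select_answers` function, and every data value below (sources, dates, confidence scores, gene names other than COX-2) are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    """A candidate answer with document- and statement-level provenance."""
    answer: str
    source: str        # document provenance: where the document comes from
    published: date    # document provenance: publication date
    confidence: float  # statement provenance: extraction confidence in [0, 1]


def select_answers(candidates, trusted_sources, not_before, min_confidence):
    """Keep reputable, recent, confident candidates, most confident first."""
    kept = [
        c for c in candidates
        if c.source in trusted_sources
        and c.published >= not_before
        and c.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


# Hypothetical candidates for "what genes does IL-13 inhibit?"
candidates = [
    Candidate("COX-2", "MEDLINE", date(2008, 5, 1), 0.90),
    Candidate("COX-2", "MEDLINE", date(1995, 1, 1), 0.95),           # too old
    Candidate("GENE-Q", "unvetted-source", date(2009, 1, 1), 0.99),  # source not trusted
    Candidate("GENE-R", "MEDLINE", date(2009, 3, 1), 0.40),          # low confidence
]
selected = select_answers(candidates, {"MEDLINE"}, date(2005, 1, 1), 0.8)
```

Each filter maps to one bullet: source and publication date are properties of the document, while confidence is a property of the individual statement.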

Knowledge discovery

Application
  Find path in a graph between entities of interest, using patterns of link types
Provenance information
  Origin of the statements (not entities)
  Required for both asserted and inferred statements
    Compute provenance information for inferred statements
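
A path search constrained by a pattern of link types, with provenance computed for the inferred statement, might look like the sketch below. It assumes one simple policy, namely that an inferred statement inherits the union of the sources of the asserted statements on its path; the talk does not prescribe a policy. The entities, sources, and the function names `find_path` and `inferred_provenance` are all hypothetical.

```python
from collections import deque

# Asserted statements as (subject, link type, object, sources). Hypothetical data.
EDGES = [
    ("gene:X", "interactsWith", "gene:Y", {"sourceA"}),
    ("gene:Y", "hasFunction", "function:Z", {"sourceB"}),
    ("gene:X", "hasSymbol", "symbol:W", {"sourceC"}),
]


def find_path(edges, start, goal, pattern):
    """Breadth-first search for a path whose link types match `pattern` in order."""
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == len(pattern):
            if node == goal:
                return path
            continue
        wanted = pattern[len(path)]
        for edge in edges:
            if edge[0] == node and edge[1] == wanted:
                queue.append((edge[2], path + [edge]))
    return None


def inferred_provenance(path):
    """Union of the sources of every asserted statement on the path."""
    sources = set()
    for _, _, _, edge_sources in path:
        sources |= edge_sources
    return sources


path = find_path(EDGES, "gene:X", "function:Z", ["interactsWith", "hasFunction"])
```

Here the inferred link between `gene:X` and `function:Z` rests on two asserted statements, so its provenance names both of their sources. Other policies (for example, taking the lowest confidence along the path) would fit the same structure.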

Issues and challenges

Limitations of naïve implementation

Reification through blank nodes
  Not intuitive to users
    Further away from the domain model
    Increases the complexity of queries
  Inefficient
    Increases the number of triples
    Scalability issues
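
The growth in triple count can be illustrated with a small sketch of standard RDF reification, where a blank node of type rdf:Statement stands in for the statement. Triples are modeled as plain Python tuples rather than with an RDF library, and the `reify` helper and the blank-node naming scheme are assumptions made for illustration; the statement and its provenance values come from the orthology example.

```python
from itertools import count

_blank_ids = count(1)  # generator of blank-node labels for this sketch


def reify(statement, provenance):
    """Reify one triple through a blank node and attach provenance to that node."""
    subject, predicate, obj = statement
    node = f"_:b{next(_blank_ids)}"
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]
    # Every provenance link hangs off the blank node, not off the gene.
    triples += [(node, link, value) for link, value in provenance]
    return triples


reified = reify(
    ("EG:7068", "orthologousWith", "EG:21384"),
    [
        ("providedBy", "MGI"),
        ("hasEvidence", "NT"),
        ("hasEvidence", "AA"),
        ("supportedBy", "J:90500"),
    ],
)
```

One domain statement becomes four reification triples plus one triple per provenance link (eight in total here), and any query about orthology now has to go through the blank node.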

Lack of support for provenance

No native support for provenance information in
  Mainstream triple stores
  Major query languages for triple stores
Many variants of SPARQL and RQL provide limited support
Named graphs (supported in quad stores) do not offer the required level of granularity
Standardization of emerging provenance models
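
The granularity issue with named graphs can be sketched with quads modeled as plain tuples. This is an illustration only: it assumes the common practice of using one graph per source, and the graph name, metadata, entities, and the `provenance_of` helper are hypothetical.

```python
# Quads: (subject, predicate, object, graph). All names are hypothetical.
QUADS = [
    ("gene:X", "hasFunction", "function:P", "graph:sourceA"),
    ("gene:X", "hasFunction", "function:Q", "graph:sourceA"),
]

# Provenance is recorded once per named graph.
GRAPH_METADATA = {"graph:sourceA": {"providedBy": "sourceA"}}


def provenance_of(statement, quads, graph_metadata):
    """Look up provenance through the named graph that holds the statement."""
    for subject, predicate, obj, graph in quads:
        if (subject, predicate, obj) == statement:
            return graph_metadata.get(graph, {})
    return {}


first = provenance_of(("gene:X", "hasFunction", "function:P"), QUADS, GRAPH_METADATA)
second = provenance_of(("gene:X", "hasFunction", "function:Q"), QUADS, GRAPH_METADATA)
```

Both statements resolve to the same graph-level metadata, so statement-specific details such as an evidence code or a supporting publication cannot be told apart unless each statement is placed in its own graph.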

Linked data
http://linkeddata.org

Linked data

Linked biomedical data
[Tim Berners-Lee, TED 2009 conference]
http://www.w3.org/2009/Talks/0204-ted-tbl/#(1)

Linked data vs. provenance

Currently
  No provenance information in Linked Data
  Does Bio2RDF’s “Banff manifesto” exclude provenance de facto? (no blank nodes allowed)
  Ability to link datasets outweighs absence of provenance information
Limitations
  Applications cannot select/exclude specific statements
  Navigation vs. knowledge discovery

Summary

Need for systems handling provenance information
  Transparently for the user
    Directly in the triple stores / query languages
  At different levels of granularity
    e.g., resource vs. statement within a resource
  For both asserted and inferred statements
  Scalability
Not exposing provenance information in Linked Data is a major limitation

Medical Ontology Research

Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact:Web:

[email protected]