The UMLS and the Semantic Web. W3C Semantic Web Health Care and Life Sciences Interest Group, BioRDF Teleconference, September 22, 2008. Olivier Bodenreider, Lister Hill National Center for Biomedical Communications, Bethesda, Maryland - USA.
Page 1: The UMLS and the Semantic Web

The UMLS and the Semantic Web

W3C Semantic Web Health Care and Life Sciences Interest Group

BioRDF Teleconference
September 22, 2008

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Page 2: The UMLS and the Semantic Web

Outline

The UMLS (in a nutshell)
  Lexical resources
  Metathesaurus
  Semantic Network
Why is the UMLS relevant to the Semantic Web?
Issues and challenges

Page 3: The UMLS and the Semantic Web

Unified Medical Language System (UMLS)

Page 4: The UMLS and the Semantic Web

UMLS: 3 components

SPECIALIST Lexicon (lexical resources)
  200,000 lexical items
  Part of speech and variant information
Metathesaurus (terminological resources)
  5M names from over 100 terminologies
  1M concepts
  16M relations
Semantic Network (ontological resources)
  135 high-level categories
  7000 relations among them

Page 5: The UMLS and the Semantic Web

UMLS Characteristics (1)

Current version: 2008AA (2-3 annual releases)
Type: Terminology integration system
Domain: Biomedicine
Developer: NLM
Funding: NLM (intramural)
Availability
  Publicly available: Yes* (cost-free license required)
  Repositories: UMLS
URL: http://umlsks.nlm.nih.gov/

Page 6: The UMLS and the Semantic Web

UMLS Characteristics (2)

Number of
  Concepts: 1.5M (2008AA)
  Terms: ~6M
Major organizing principles (Metathesaurus):
  Concept orientation
  Source transparency
  Multi-lingual through translation
Formalism: Proprietary format (RRF)

Page 7: The UMLS and the Semantic Web

UMLS: Integrating subdomains

[Figure: the UMLS at the center, linking subdomains through their vocabularies: biomedical literature (MeSH), genome annotations (GO), model organisms (NCBI Taxonomy), genetic knowledge bases (OMIM), clinical repositories (SNOMED CT), anatomy (FMA), and other subdomains.]

Page 8: The UMLS and the Semantic Web

Trans-namespace integration

[Figure: the subdomain diagram centered on the UMLS (GO, NCBI Taxonomy, OMIM, FMA, MeSH, SNOMED CT, other subdomains). "Addison Disease" (D000224) in MeSH and "Addison's disease" (363732003) in SNOMED CT both map to UMLS concept C0001403.]
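
The Addison's disease example can be sketched in a few lines of Python. This is only an illustration of the idea, not NLM tooling: it assumes the pipe-delimited column layout of the Metathesaurus file MRCONSO.RRF, and the two sample rows are invented placeholders except for the CUI, the source codes, and the names shown on the slide. The source abbreviations `MSH` and `SNOMEDCT` and the helper names are assumptions as well.

```python
# Column order of MRCONSO.RRF (pipe-delimited; every row ends with a trailing "|").
MRCONSO_FIELDS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

# Illustrative rows only: the CUI, the source codes and the names come from the
# slide; all other values (LUI, SUI, AUI, term types, ...) are made-up placeholders.
SAMPLE_ROWS = [
    "C0001403|ENG|P|L0000001|PF|S0000001|Y|A0000001||||MSH|MH|D000224|Addison Disease|0|N||",
    "C0001403|ENG|S|L0000002|PF|S0000002|Y|A0000002||||SNOMEDCT|PT|363732003|Addison's disease|0|N||",
]


def parse_row(line):
    """Split one RRF row into a dict keyed by column name."""
    parts = line.rstrip("\n").split("|")
    if parts and parts[-1] == "":
        parts = parts[:-1]  # drop the empty item produced by the trailing "|"
    if len(parts) != len(MRCONSO_FIELDS):
        raise ValueError(f"expected {len(MRCONSO_FIELDS)} fields, got {len(parts)}")
    return dict(zip(MRCONSO_FIELDS, parts))


def build_index(lines):
    """Map (source abbreviation, source code) to the UMLS concept identifier."""
    index = {}
    for line in lines:
        row = parse_row(line)
        index[(row["SAB"], row["CODE"])] = row["CUI"]
    return index


def same_concept(index, key_a, key_b):
    """True when two source identifiers resolve to the same known CUI."""
    cui_a = index.get(key_a)
    return cui_a is not None and cui_a == index.get(key_b)


index = build_index(SAMPLE_ROWS)
print(index[("MSH", "D000224")])
print(same_concept(index, ("MSH", "D000224"), ("SNOMEDCT", "363732003")))
```

The two source identifiers share no string in common; they are linked only because both rows carry the same concept identifier, which is the sense in which the integration goes beyond shared identifiers.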

Page 9: The UMLS and the Semantic Web

[Figure: "Heart" example. Metathesaurus concepts (Heart, Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors) are linked to semantic types in the Semantic Network (Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group). Numeric labels in the figure: 22, 225, 97, 4, 12, 9, 31.]

Page 10: The UMLS and the Semantic Web

Why is the UMLS relevant to the Semantic Web?

Page 11: The UMLS and the Semantic Web

Relevance to the SW: Metathesaurus

Terminology integration system
  Trans-namespace integration
  Integration beyond shared identifiers
Repository of biomedical terminologies/ontologies
  Many UMLS vocabularies used for the annotation of datasets (including clinical records)

Page 12: The UMLS and the Semantic Web

Relevance to the SW: Metathesaurus

Broad coverage of biomedicine
Large user base
Tooling available
  E.g., visualization, named entity recognition, etc.

Page 13: The UMLS and the Semantic Web

Relevance to the SW: Semantic Network

Top-level ontology of the biomedical domain
Broad biomedical categories
Helps partition biomedical concepts
Semantic relations
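
A small Python sketch of what "partitioning concepts" by semantic type can look like. The concept and semantic type names are taken from the "Heart" figure earlier in the deck; the specific concept-to-type assignments and is-a links below are an illustrative reading of that figure, not an extract of the Semantic Network files, and the function names are hypothetical.

```python
# is-a links among semantic types (child -> parent); illustrative assumption.
SEMANTIC_TYPE_PARENT = {
    "Body Part, Organ or Organ Component": "Fully Formed Anatomical Structure",
    "Fully Formed Anatomical Structure": "Anatomical Structure",
    "Embryonic Structure": "Anatomical Structure",
}

# Concept -> semantic type assignments; illustrative assumption.
CONCEPT_TYPE = {
    "Heart": "Body Part, Organ or Organ Component",
    "Fetal Heart": "Embryonic Structure",
    "Angina Pectoris": "Disease or Syndrome",
    "Cardiotonic Agents": "Pharmacologic Substance",
    "Tissue Donors": "Population Group",
}


def ancestors(semantic_type):
    """Return the chain of broader semantic types, nearest first."""
    chain = []
    current = SEMANTIC_TYPE_PARENT.get(semantic_type)
    while current is not None:
        chain.append(current)
        current = SEMANTIC_TYPE_PARENT.get(current)
    return chain


def concepts_under(category):
    """Concepts whose semantic type is the category or one of its descendants."""
    return sorted(
        concept
        for concept, semantic_type in CONCEPT_TYPE.items()
        if semantic_type == category or category in ancestors(semantic_type)
    )


print(concepts_under("Anatomical Structure"))
print(concepts_under("Disease or Syndrome"))
```

Selecting by a broad category such as "Anatomical Structure" picks up concepts typed with any of its narrower types, which is how a small set of high-level categories can slice a very large set of concepts.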

Page 14: The UMLS and the Semantic Web

Issues and Challenges

Page 15: The UMLS and the Semantic Web

Issues and challenges

Availability
  Mandatory license agreement
Discoverability
  No metadata
Formalism
  No easy conversion to SKOS/RDF(S)/OWL
Identifiers
Steep learning curve

Page 16: The UMLS and the Semantic Web

Availability

Some source vocabularies have intellectual property restrictions
  E.g., most drug vocabularies
  Complex agreement for SNOMED CT: available at no cost for member countries of the IHTSDO
Mandatory license agreement
  No cost for research
  May require negotiation with the vocabulary developer for production applications
MetamorphoSys helps extract selected sources from the UMLS

Page 17: The UMLS and the Semantic Web

Discoverability

Discoverability of individual concepts
  UMLSKS web services
  Search all UMLS source vocabularies at the same time
  Named entity recognition/normalization (e.g., MetaMap)
Discoverability of terminologies/ontologies
  No comprehensive registries
  No rich registries
    With rich metadata supporting the discoverability of terminologies/ontologies

Page 18: The UMLS and the Semantic Web

Formalism

UMLS: Proprietary format
  Rich Release Format (RRF)
  All terminologies/ontologies represented in the same format
No easy conversion to SKOS/RDF(S)/OWL
  Underspecified semantics
    Child/parent vs. subClassOf
  Complex semantics
    Descriptors / concepts / terms
  Rich attribute set
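
To make the conversion question concrete, here is a minimal Python sketch that writes one concept as SKOS statements in N-Triples. It is one possible modelling choice, not an official UMLS mapping: the base namespace `http://example.org/umls/` is hypothetical, the parent identifier `C9999999` is a placeholder, the choice of preferred label is arbitrary, and child/parent links are emitted as `skos:broader` precisely because they cannot safely be read as `rdfs:subClassOf`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/umls/"  # hypothetical namespace, for illustration only


def literal(text, lang="en"):
    """Serialize a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_ntriples(cui, preferred, synonyms=(), parents=()):
    """Return N-Triples lines describing one concept as a skos:Concept."""
    subject = f"<{BASE}{cui}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{subject} <{SKOS}prefLabel> {literal(preferred)} .",
    ]
    for name in synonyms:
        lines.append(f"{subject} <{SKOS}altLabel> {literal(name)} .")
    for parent_cui in parents:
        # skos:broader makes no subsumption claim, unlike rdfs:subClassOf.
        lines.append(f"{subject} <{SKOS}broader> <{BASE}{parent_cui}> .")
    return lines


triples = concept_to_ntriples(
    "C0001403",
    "Addison Disease",
    synonyms=["Addison's disease"],
    parents=["C9999999"],  # placeholder parent, not a real assignment
)
print("\n".join(triples))
```

Even this small sketch has to flatten the descriptor / concept / term distinction into labels and leaves out the attribute set entirely, which is where the difficulty named on the slide comes from.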

Page 19: The UMLS and the Semantic Web

Identifiers for biomedical entities

What is identified?
  Entity vs. resource about the entity
Which identifier to pick?
  E.g., Addison’s disease
    363732003 (SNOMED CT)
    D000224 (MeSH)
    C0001403 (UMLS Metathesaurus)
Which format?
  URI vs. LSID
Which authoritative source for minting URIs?
  Ontology developers vs. (e.g.) Bio2RDF
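
The format question can be illustrated by rendering the three identifiers for Addison's disease both ways. This Python sketch assumes the general LSID shape `urn:lsid:authority:namespace:object[:revision]`; the authority and base URL `example.org`, the namespace labels, and the `namespace:id` URI pattern are placeholders chosen for illustration, not identifiers minted by any of the sources.

```python
# Identifiers for Addison's disease as listed on the slide.
ADDISON_IDS = {
    "snomedct": "363732003",
    "mesh": "D000224",
    "umls": "C0001403",
}


def http_uri(base, namespace, local_id):
    """Build an HTTP URI of the form <base>/<namespace>:<id> (assumed pattern)."""
    return f"{base.rstrip('/')}/{namespace}:{local_id}"


def lsid(authority, namespace, object_id, revision=None):
    """Build a Life Science Identifier (LSID) URN."""
    parts = ["urn", "lsid", authority, namespace, object_id]
    if revision is not None:
        parts.append(revision)
    return ":".join(parts)


for namespace, local_id in ADDISON_IDS.items():
    print(http_uri("http://example.org/", namespace, local_id))
    print(lsid("example.org", namespace, local_id))
```

Whichever format is used, the authority part of the identifier encodes who minted it, so the same disease ends up with different identifiers depending on whether the ontology developer or a third party publishes them.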

Page 20: The UMLS and the Semantic Web

Steep learning curve

Large resource
  1.5M concepts
  6M terms
  Over 20M relations
Complex structure
  Metathesaurus
  Semantic Network
Rich set of attributes
Rich set of relations
  Terminological
  Semantic
  Statistical
  Mapping
Multiple languages
Complex domain

Page 21: The UMLS and the Semantic Web

Conclusions

Page 22: The UMLS and the Semantic Web

Conclusions

UMLS as a terminology integration system
  Helps bridge across namespaces
  Helps integrate information sources
    Beyond shared identifiers
UMLS as a repository of terminologies/ontologies
  Single source, single format for 143 vocabularies
  Issues with availability, discoverability and formalism
Identifiers for biomedical entities

Page 23: The UMLS and the Semantic Web

References

UMLS
  umlsinfo.nlm.nih.gov
UMLS browsers (free, but UMLS license required)
  Knowledge Source Server: umlsks.nlm.nih.gov
  Semantic Navigator: http://mor.nlm.nih.gov/perl/semnav.pl
  RRF browser (standalone application distributed with the UMLS)

Page 24: The UMLS and the Semantic Web

References

Recent overviews
  Bodenreider O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research; D267-D270.
  Bodenreider O. From terminology integration to information integration: Unified Medical Language System (UMLS). BioRDF Teleconference, W3C Semantic Web Health Care and Life Sciences Interest Group, June 5, 2006. http://mor.nlm.nih.gov/pubs/pres/060605-BioRDF.pdf

Page 25: The UMLS and the Semantic Web

Medical Ontology Research

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact: [email protected]
Web: mor.nlm.nih.gov