Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
The Unified Medical Language System
Overview
Ontology and Taxonomy Coordinating Working Group
MITRE - McLean, VA
October 5, 2005
What does UMLS stand for?
Unified
Medical
Language
System
UMLS®
Unified Medical Language System®
UMLS Metathesaurus®
Motivation
Started in 1986
National Library of Medicine
“Long-term R&D project”
«[…] the UMLS project is an effort to overcome two significant barriers to effective retrieval of machine-readable information.
• The first is the variety of ways the same concepts are expressed in different machine-readable sources and by different people.
• The second is the distribution of useful information among many disparate databases and systems.»
~80 families of vocabularies (2005AB)
  multiple translations (e.g., MeSH, ICPC, ICD-10)
  variants (American-English equivalents, Australian extension/adaptation)
  subsequent editions usually considered distinct families (ICD-9 and ICD-10; DSM-III-R and DSM-IV)
Broad coverage of biomedicine
Common presentation
Biomedical terminologies
General vocabularies
  anatomy (UWDA, Neuronames)
  drugs (RxNorm, First DataBank, Micromedex)
  medical devices (UMD, SPN)
Several perspectives
  clinical terms (SNOMED CT)
  information sciences (MeSH, CRISP)
Example concept: Addison's disease
Names in other languages: MALADIE D'ADDISON (French), Addison-Krankheit (German), Morbo di Addison (Italian), DOENCA DE ADDISON (Portuguese), ADDISONOVA BOLEZN' (Russian), ENFERMEDAD DE ADDISON (Spanish)
Definition: A disease characterized by hypotension, weight loss, anorexia, weakness, and sometimes a bronze-like melanotic hyperpigmentation of the skin. It is due to tuberculosis- or autoimmune-induced disease (hypofunction) of the adrenal glands that results in deficiency of aldosterone and cortisol. In the absence of replacement therapy, it is usually fatal.
Source vocabularies: SNOMED, MeSH, AOD, Read Codes, …
Semantic type: Disease or Syndrome
Metathesaurus Concepts
Concept (~1.2 M), identifier CUI: set of synonymous concept names
Term (~4.2 M), identifier LUI: set of normalized names
String (~4.8 M), identifier SUI: distinct concept name
Atom (~5.6 M), identifier AUI: concept name in a given source
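The four levels above nest inside one another: atoms are grouped into strings, strings into terms, and terms into concepts. A minimal Python sketch of that nesting, assuming each atom record carries all four identifiers; apart from the two CUIs quoted in these slides, the identifiers, source assignments, and field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One concept name as it occurs in one source vocabulary."""
    aui: str     # atom identifier: this name in this source
    sui: str     # string identifier: the exact name string
    lui: str     # term identifier: the normalized form of the name
    cui: str     # concept identifier: the meaning shared by synonyms
    source: str  # source vocabulary
    name: str


# Hypothetical sample: the two CUIs come from the slides; every other
# identifier and the source assignments are invented for illustration.
ATOMS = [
    Atom("A1", "S1", "L1", "C0001403", "MeSH", "Addison Disease"),
    Atom("A2", "S1", "L1", "C0001403", "SNOMED CT", "Addison Disease"),
    Atom("A3", "S2", "L1", "C0001403", "Read Codes", "Addison's disease"),
    Atom("A4", "S3", "L2", "C0271735", "SNOMED CT", "Addison's disease, NOS"),
]


def level_counts(atoms):
    """Count distinct identifiers at each level of the hierarchy."""
    return {
        "concepts": len({a.cui for a in atoms}),
        "terms": len({a.lui for a in atoms}),
        "strings": len({a.sui for a in atoms}),
        "atoms": len({a.aui for a in atoms}),
    }


def names_by_concept(atoms):
    """Collect the distinct names (strings) recorded under each CUI."""
    grouped = {}
    for a in atoms:
        grouped.setdefault(a.cui, set()).add(a.name)
    return grouped


print(level_counts(ATOMS))
# {'concepts': 2, 'terms': 2, 'strings': 3, 'atoms': 4}
```

Because each level only groups members of the level below, counting distinct identifiers per level always gives concepts ≤ terms ≤ strings ≤ atoms, the same ordering as the Metathesaurus figures above.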
Metathesaurus Evolution over time
Concepts never die (in principle): CUIs are permanent identifiers
What happens when they do die (in reality)? Concepts can merge or split, resulting in new concepts and deletions
[Figure: the concepts Addison's disease (C0001403) and Addison's disease, NOS (C0271735) traced across Metathesaurus releases, 1992 to 2004…]
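Because CUIs are meant to be permanent, an application holding an old CUI has to follow merges and splits forward to the identifiers in use today. A minimal sketch, assuming a hypothetical history table that maps each retired CUI to its successor CUIs (one for a merge, several for a split, none for a deletion); the table layout and the placeholder identifiers are not taken from the UMLS distribution.

```python
# Hypothetical history table: retired CUI -> CUIs it now maps to.
# One target = merge, several targets = split, empty list = deletion.
HISTORY = {
    "C_OLD_1": ["C_MID_1"],             # merged
    "C_MID_1": ["C_NEW_1"],             # merged again in a later release
    "C_OLD_2": ["C_NEW_2", "C_NEW_3"],  # split
    "C_OLD_3": [],                      # deleted, no replacement
}


def resolve(cui, history, max_depth=50):
    """Return the set of current CUIs that a possibly retired CUI maps to."""
    current, frontier, depth = set(), [cui], 0
    while frontier:
        depth += 1
        if depth > max_depth:
            raise ValueError("history chain too long or cyclic")
        next_frontier = []
        for c in frontier:
            if c in history:
                next_frontier.extend(history[c])  # retired: follow successors
            else:
                current.add(c)  # not retired: still a live identifier
        frontier = next_frontier
    return current


print(sorted(resolve("C_OLD_2", HISTORY)))  # ['C_NEW_2', 'C_NEW_3']
```

A CUI that was never retired resolves to itself, so the same call can be applied to every stored identifier after each new release.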
Metathesaurus Relations
Symbolic relations: ~9 M pairs of concepts
Statistical relations: ~7 M pairs of concepts (co-occurring concepts)
Mapping relations: 100,000 pairs of concepts
Categorization: relationships between concepts and semantic types from the Semantic Network
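The slides describe statistical relations only as co-occurring concepts. As a generic illustration (not NLM's actual procedure), co-occurrence can be counted per document once each document has been reduced to the set of concepts it mentions; the documents and concept labels below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, each reduced to the set of concepts it mentions.
DOCS = [
    {"C_ADDISON", "C_ADRENAL_CORTEX", "C_CORTISOL"},
    {"C_ADDISON", "C_CORTISOL"},
    {"C_ADRENAL_CORTEX", "C_CORTISOL"},
]


def cooccurrence(docs):
    """Count, for each unordered concept pair, the documents mentioning both."""
    counts = Counter()
    for concepts in docs:
        # Sorting gives each unordered pair one canonical (a, b) key.
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts


for pair, n in sorted(cooccurrence(DOCS).items()):
    print(pair, n)
```

Counting documents rather than mentions keeps a single document that repeats a concept from inflating the pair counts.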
Symbolic relations
Relation:
  pair of “atom” identifiers
  type
  attribute (if any)
  list of sources (for type and attribute)
Semantics of the relationship: defined by its type [and attribute]
Source transparency: the information is recorded at the “atom” level
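Since each symbolic relation is recorded between atoms, it keeps track of which source asserted it, and a concept-level view can be derived by mapping each atom to its CUI. A minimal sketch of that projection; the field names, the relation type labels ("parent", "isa"), and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AtomRelation:
    """A symbolic relation recorded between two atoms."""
    aui1: str
    aui2: str
    rel_type: str              # defines the semantics of the relationship
    attribute: Optional[str]   # refines the type, if any
    sources: FrozenSet[str]    # sources asserting this type/attribute


# Hypothetical atom -> concept assignment and relations.
AUI_TO_CUI = {"A1": "C1", "A2": "C1", "A3": "C2"}
RELATIONS = [
    AtomRelation("A1", "A3", "parent", None, frozenset({"SRC_X"})),
    AtomRelation("A2", "A3", "parent", None, frozenset({"SRC_Y"})),
    AtomRelation("A2", "A3", "parent", "isa", frozenset({"SRC_Y"})),
]


def concept_level(relations, aui_to_cui):
    """Project atom-level relations onto concept pairs, keeping the sources."""
    lifted = {}
    for r in relations:
        key = (aui_to_cui[r.aui1], aui_to_cui[r.aui2], r.rel_type, r.attribute)
        lifted.setdefault(key, set()).update(r.sources)
    return lifted


for key, srcs in concept_level(RELATIONS, AUI_TO_CUI).items():
    print(key, sorted(srcs))
```

Keeping the atom-level records and deriving the concept-level view on demand preserves source transparency: each concept pair can be traced back to the sources and atoms that contributed it.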
Organize concepts
Inter-concept relationships: hierarchies from the source vocabularies
Relationships can inherit semantics
[Figure: Semantic Network types linked by "isa" relationships (Disease or Syndrome isa Pathologic Function isa Biologic Function; Body Part, Organ, or Organ Component isa Fully Formed Anatomical Structure) and by "location of" relationships, with the Metathesaurus concepts Adrenal Cortex and Adrenal Cortical hypofunction categorized under them]
[Figure: the Metathesaurus concept Heart and related concepts (Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors), categorized under Semantic Network semantic types (Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ, or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group)]
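The Semantic Network fragment in the slide combines two ideas: semantic types form an isa hierarchy, and a relationship defined between two types also holds for their descendants. A minimal sketch of both, using the isa links named in the figures; the single "location of" link below is placed for illustration only and is not claimed to be the Semantic Network's actual definition.

```python
# Fragment of Semantic Network "isa" links (child type -> parent type).
ISA = {
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
    "Body Part, Organ, or Organ Component": "Fully Formed Anatomical Structure",
    "Fully Formed Anatomical Structure": "Anatomical Structure",
    "Embryonic Structure": "Anatomical Structure",
}

# Illustrative only: one relationship defined high in the hierarchy.
SN_RELATIONS = {
    ("Fully Formed Anatomical Structure", "location of", "Pathologic Function"),
}


def ancestors_or_self(sem_type):
    """Walk the isa links upward, starting with the type itself."""
    chain = [sem_type]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


def holds(src, rel, dst):
    """True if rel is defined between src and dst or inherited from ancestors."""
    return any((a, rel, b) in SN_RELATIONS
               for a in ancestors_or_self(src)
               for b in ancestors_or_self(dst))


print(holds("Body Part, Organ, or Organ Component", "location of",
            "Disease or Syndrome"))  # True (inherited)
```

Inheritance lets the network state a relationship once, at the most general pair of types, instead of repeating it for every descendant pair.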
Lexical tools
To manage lexical variation in biomedical terminologies
Major tools:
  Normalization
  Indexes
  Lexical Variant Generation program (lvg)
Based on the SPECIALIST Lexicon
Used by noun phrase extractors and search engines
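Normalization maps lexical variants of a name to one normalized form, which is how strings are grouped into terms (LUIs). lvg's normalization roughly removes genitives and punctuation, lowercases, uninflects words using the SPECIALIST Lexicon, removes stop words, and sorts the words. The sketch below imitates those steps with tiny, hypothetical stop-word and uninflection tables; it is not the lvg implementation.

```python
import re

# Hypothetical stand-ins for resources lvg draws from the SPECIALIST Lexicon.
STOP_WORDS = {"of", "the", "and", "in"}
UNINFLECT = {"diseases": "disease", "glands": "gland"}


def normalize(name):
    """Simplified normalization in the spirit of lvg (not the real tool)."""
    s = name.lower()
    s = re.sub(r"'s\b", "", s)         # drop the genitive 's
    s = re.sub(r"[^a-z0-9]+", " ", s)  # replace punctuation with spaces
    words = [UNINFLECT.get(w, w) for w in s.split() if w not in STOP_WORDS]
    return " ".join(sorted(words))     # word order no longer matters


print(normalize("Disease, Addison"))  # addison disease
```

Names that normalize identically, such as "Addison's disease", "Addison Disease", and "Disease, Addison", would fall under the same term.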
UMLS distribution
License agreement: part of the content is copyrighted
3-4 releases per year: some components are released more frequently
Availability of UMLS data: distributed on DVD; downloaded from the NLM website
APIs available for Java and XML-TCP/IP
UMLS tools developed at NLM
Several browsers
MetamorphoSys: install and customize the UMLS; part of the UMLS distribution
Lexical Variant Generation tools: manage term variation; part of the UMLS distribution and available separately
MetaMap (MMTx): identify Metathesaurus concepts in text; available separately (requires UMLS license)
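MetaMap identifies Metathesaurus concepts in free text with far richer machinery (lexical variants, candidate scoring) than can be shown here. As a much cruder illustration of the task only, the sketch below does greedy longest-match lookup of lowercase names in a hypothetical name-to-CUI lexicon; it is not MetaMap's algorithm.

```python
import re

# Hypothetical lexicon: lowercase name -> CUI. Only C0001403 comes from the
# slides; the other identifier is a placeholder.
LEXICON = {
    "addison's disease": "C0001403",
    "addison disease": "C0001403",
    "adrenal cortex": "C_ADRENAL_CORTEX",
}
MAX_WORDS = max(len(k.split()) for k in LEXICON)


def spot_concepts(text):
    """Greedy longest-match lookup of lexicon names in text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                found.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no match starting here; move to the next token
    return found


print(spot_concepts("Addison's disease affects the adrenal cortex."))
```

Exact lookup misses any variant absent from the lexicon, which is precisely the gap that lexical variant generation and scoring are meant to close.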
RRF browser (standalone application distributed with the UMLS)