Mapping biomedical literature into UMLS concepts MetaMap Presented By: Osama Jomaa Miami University
Page 1: Unified Medical Language System & MetaMap

Mapping biomedical literature into UMLS concepts

MetaMap

Presented By: Osama Jomaa

Miami University

Page 2: Unified Medical Language System & MetaMap

Unified Medical Language System

Page 3: Unified Medical Language System & MetaMap

Motivation

“... to facilitate the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health.”

National Library of Medicine

Page 4: Unified Medical Language System & MetaMap

UMLS Components

1. Metathesaurus

+1 Million biomedical concepts from over 100 vocabularies

2. Semantic Network

133 categories & 54 relationships.

3. Specialist Lexicon & Lexical Tools

Software programs to aid in NLP

Page 5: Unified Medical Language System & MetaMap

Metathesaurus

[Diagram: source vocabularies merged into the Metathesaurus]

● Patient Care Controlled Terms

● Biomedical Vocabs from Different Languages

● Clinical/Health Services Research

● Health Services Billing

● Biomedical Literature Catalogs

● Public Health Statistics

● ...

5,000,000 biomedical terms

1,000,000 concepts

+100 source vocabs

Relational DB tables

Page 6: Unified Medical Language System & MetaMap

Metathesaurus

● Concepts are classified into categories:

– Diagnosis

– Procedures & Supplies

– Diseases

– ….

● Concepts have unique identifiers.

● Concepts have preferred terms.

● Concepts can be grouped into subsets by applying filters.

Page 7: Unified Medical Language System & MetaMap

Source Vocabularies Categories

Page 8: Unified Medical Language System & MetaMap

One Concept Many Terms

One concept can have many terms in multiple vocabularies.

Example: Atrial Fibrillation

Page 9: Unified Medical Language System & MetaMap

Preferred Terms

Concept: Hodgkin's Disease

Page 10: Unified Medical Language System & MetaMap

Unique Identifiers

● Concept Unique Identifier (CUI)

All names in all source vocabs that mean the same thing are linked to one concept, which is assigned a unique identifier, the CUI.

● Lexical Unique Identifier (LUI)

Links strings that are lexical variants of one another, as detected by the Lexical Variant Generator (LVG) program.

● String Unique Identifier (SUI)

Identifies each distinct string; any variation in character set, upper/lower case, or punctuation yields a separate SUI.

● Atom Unique Identifier (AUI)

Every occurrence of a string in each source vocab is assigned a unique identifier, the AUI.
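The nesting of the four identifiers can be sketched as a small data structure. This is only an illustration: the class, the record, and every identifier value below (C1, L1, S1, A1, SRC_A, ...) are made-up placeholders, not real UMLS identifiers or source names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of how CUI, LUI, SUI and AUI nest. All identifier values are placeholders. */
class UmlsIdentifiers {

    /** One occurrence of a string in one source vocabulary. */
    record Atom(String aui, String source, String sui, String text, String lui, String cui) {}

    /** Groups atoms as CUI -> LUI -> SUI -> list of AUIs. */
    static Map<String, Map<String, Map<String, List<String>>>> group(List<Atom> atoms) {
        Map<String, Map<String, Map<String, List<String>>>> tree = new LinkedHashMap<>();
        for (Atom a : atoms) {
            tree.computeIfAbsent(a.cui(), k -> new LinkedHashMap<>())
                .computeIfAbsent(a.lui(), k -> new LinkedHashMap<>())
                .computeIfAbsent(a.sui(), k -> new ArrayList<>())
                .add(a.aui());
        }
        return tree;
    }

    /** Hypothetical atoms: the same string in two sources, a plural form, and a synonym. */
    static List<Atom> sampleAtoms() {
        return List.of(
            new Atom("A1", "SRC_A", "S1", "Atrial Fibrillation", "L1", "C1"),
            new Atom("A2", "SRC_B", "S1", "Atrial Fibrillation", "L1", "C1"),
            new Atom("A3", "SRC_A", "S2", "Atrial Fibrillations", "L1", "C1"),
            new Atom("A4", "SRC_B", "S3", "Auricular Fibrillation", "L2", "C1"));
    }

    public static void main(String[] args) {
        // One concept, two lexical groups, three strings, four atoms.
        System.out.println(group(sampleAtoms()));
    }
}
```

The same string appearing in two sources shares one SUI but gets two AUIs, while the plural form gets its own SUI under the same LUI.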

Page 11: Unified Medical Language System & MetaMap
Page 12: Unified Medical Language System & MetaMap

Semantic Network

● Semantic Types

133+ types; each Metathesaurus concept is assigned at least one semantic type.

● Semantic Relationships

54 relationships. Is-A is the most important.
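Because Is-A links form a hierarchy, checking whether one semantic type falls under another is a walk up the parent chain. The sketch below hard-codes a hand-copied slice of the hierarchy for one type; the class and method names are hypothetical, and the chain should be checked against the Semantic Network files of the release in use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of Is-A traversal over a small, hand-copied slice of the semantic type hierarchy. */
class SemanticTypeHierarchy {

    // child -> parent links (assumed slice; not loaded from UMLS files)
    private static final Map<String, String> IS_A = new HashMap<>();
    static {
        IS_A.put("Disease or Syndrome", "Pathologic Function");
        IS_A.put("Pathologic Function", "Biologic Function");
        IS_A.put("Biologic Function", "Natural Phenomenon or Process");
        IS_A.put("Natural Phenomenon or Process", "Phenomenon or Process");
        IS_A.put("Phenomenon or Process", "Event");
    }

    /** Returns the ancestors of a type, nearest parent first. */
    static List<String> ancestors(String type) {
        List<String> chain = new ArrayList<>();
        String current = IS_A.get(type);
        while (current != null) {
            chain.add(current);
            current = IS_A.get(current);
        }
        return chain;
    }

    /** True if type equals ancestor or reaches it by following Is-A links. */
    static boolean isA(String type, String ancestor) {
        return type.equals(ancestor) || ancestors(type).contains(ancestor);
    }

    public static void main(String[] args) {
        System.out.println(ancestors("Disease or Syndrome"));
        System.out.println(isA("Disease or Syndrome", "Event"));
    }
}
```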

Page 13: Unified Medical Language System & MetaMap

Semantic Network

Semantic Types Examples:

✔ Organisms

✔ Anatomical structures

✔ Biologic function

✔ Chemicals

✔ Physical objects

Entity

Event

Semantic Relationships Examples:

✔ Physically related to

✔ Spatially related to

✔ Temporally related to

✔ Functionally related to

✔ Conceptually related to

Page 14: Unified Medical Language System & MetaMap

Lexical Tools

● The Specialist Lexicon

An English lexicon (dictionary) that includes over 200,000 biomedical terms from a variety of sources to aid in NLP.

● Lexical Variant Generator (LVG)

● Norm

Normalizer

● Wordind

Tokenizer

Page 15: Unified Medical Language System & MetaMap

MetaMap

Page 16: Unified Medical Language System & MetaMap

Why Concept Identification?

● Information extraction/Data mining

● Classification/Categorization

● Text summarization

● Question answering

● Literature-based Knowledge Discovery

Page 17: Unified Medical Language System & MetaMap

Example

Phrase: “lung cancer.”

Meta Candidates (8):

1000 Lung Cancer {MDR,DXP} (Malignant neoplasm of lung) [Neoplastic Process]

1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]

861 Cancer (Malignant Neoplasms) [Neoplastic Process]

861 Lung [Body Part, Organ, or Organ Component]

861 Cancer (Cancer Genus) [Invertebrate]

861 Lung (Entire lung) [Body Part, Organ, or Organ Component]

861 Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]

768 Pneumonia [Disease or Syndrome]

Meta Mapping (1000):

1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]

Meta Mapping (1000):

1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
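Each candidate line above follows the pattern score, matched name, optional {sources}, optional (preferred name), [semantic types]. A minimal sketch of reading such a line is shown here; the class, record, and regular expression are assumptions made for illustration and are not part of MetaMap. Names that themselves contain parentheses or braces would need a more careful parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parse one line of the human-readable candidate list shown on this slide. */
class CandidateLineParser {

    /** semanticTypes is kept as raw text because type names themselves contain commas. */
    record Candidate(int score, String matched, String preferred, String semanticTypes) {}

    // score, matched name, optional {sources}, optional (preferred name), [semantic types]
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d+)\\s+(.+?)(?:\\s+\\{([^}]*)\\})?(?:\\s+\\(([^)]*)\\))?\\s+\\[([^\\]]*)\\]\\s*$");

    static Optional<Candidate> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty(); // e.g. the "Meta Candidates (8):" header lines
        }
        String matched = m.group(2);
        // When no preferred name is printed, the matched name is used as the preferred name.
        String preferred = m.group(4) != null ? m.group(4) : matched;
        return Optional.of(new Candidate(Integer.parseInt(m.group(1)), matched, preferred, m.group(5)));
    }

    public static void main(String[] args) {
        System.out.println(parse("1000 Lung Cancer {MDR,DXP} (Malignant neoplasm of lung) [Neoplastic Process]"));
        System.out.println(parse("861 Lung [Body Part, Organ, or Organ Component]"));
    }
}
```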

Page 18: Unified Medical Language System & MetaMap

The Algorithm

Page 19: Unified Medical Language System & MetaMap

MetaMap Options

● Word Sense Disambiguation (-y)

Determines which concept is the best choice using surrounding context.

● Negation (--negex)

Identifies negated entities.

Page 20: Unified Medical Language System & MetaMap

Examples

● WSD Examples

– “Fifteen (6.4%) of 234 colds treated with placebo ..”

● Cold (cold temperature) [npop]

● Cold (Common cold) [dsyn]

● Cold (Cold Sensation) [phsf]

– “.. the drugs were compared in two four-point, double-blind bioassays.”

● Double (Diplopia) [dsyn] vs. Double (Duplicate) [ftcn]

● Blind (Blind Vision) [dsyn] vs. BLIND (Blinded) [resa] vs. Blind (Visually impaired persons) [podg]

● Bioassays (Biological Assay) [lbpr]

Page 21: Unified Medical Language System & MetaMap

Examples

● Negation Example

– “There is no focal infiltrate or pleural effusion.”

– --negex output (in addition to normal output):

NEGATIONS:

Negation Type:nega

Negation Trigger: no

Negation PosInfo: 9/2

Negated Concept: C0332448:Infiltrate

Concept PosInfo: 18/10

Negation Type:nega

Negation Trigger: no

Negation PosInfo: 9/2

Negated Concept: C2073625:pleural effusion, C0032227:Pleural Effusion

Concept PosInfo: 32/16
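The PosInfo values in this output read as start/length pairs. Assuming zero-based character offsets into the input sentence (which is consistent with the numbers on this slide), the trigger and negated concepts can be recovered with a substring. The class and record names below are hypothetical.

```java
/** Sketch: turn a "start/length" PosInfo value into the text span it points at. */
class NegationPosInfo {

    /** A span given as zero-based start offset and length in characters (assumed convention). */
    record Span(int start, int length) {

        static Span parse(String posInfo) {
            String[] parts = posInfo.trim().split("/");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected start/length, got: " + posInfo);
            }
            return new Span(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        }

        String extract(String text) {
            return text.substring(start, start + length);
        }
    }

    public static void main(String[] args) {
        String sentence = "There is no focal infiltrate or pleural effusion.";
        System.out.println(Span.parse("9/2").extract(sentence));   // negation trigger
        System.out.println(Span.parse("18/10").extract(sentence)); // first negated concept
        System.out.println(Span.parse("32/16").extract(sentence)); // second negated concept
    }
}
```

With the sentence from the slide, the three spans are "no", "infiltrate" and "pleural effusion".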

Page 22: Unified Medical Language System & MetaMap

Other Options

● -@ --WSD <hostname> : which WSD server to use.

● -8 --dynamic_variant_generation : dynamic variant generation

● -D --all_derivational_variants : all derivational variants

● -J --restrict_to_sts <semtypelist> : restrict to semantic types

● -K --ignore_stop_phrases : ignore stop phrases.

● -R --restrict_to_sources <sourcelist> : restrict to sources

● -V --mm_data_version <name> : version of MetaMap data to use.

● -X --truncate_candidates_mappings : truncate candidates mapping

● -Y --prefer_multiple_concepts : prefer multiple concepts

● -Z --mm_data_year <name> : year of MetaMap data to use.

● -a --all_acros_abbrs : allow Acronym/Abbreviation variants

● -b --compute_all_mappings : compute/display all mappings

● -d --no_derivational_variants : no derivational variants

● -e --exclude_sources <sourcelist> : exclude sources

● -g --allow_concept_gaps : allow concept gaps

● -i --ignore_word_order : ignore word order

● -k --exclude_sts <semtypelist> : exclude semantic types

● -o --allow_overmatches : allow overmatches

● -r --threshold <integer> : Threshold for displaying candidates.

● -y --word_sense_disambiguation : use WSD

Page 23: Unified Medical Language System & MetaMap

MetaMap Output Formats

● Human-readable output

● MetaMap Machine Output (MMO)

● XML output

● Colorized MetaMap output (MetaMap 3D)

● Fielded (MMI) Outputs

Page 24: Unified Medical Language System & MetaMap

Human Readable

Phrase: "heart attack"

Meta Candidates (8):

1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]

861 Heart [Body Part, Organ, or Organ Component]

861 Attack, NOS (Onset of illness) [Finding]

861 Attack (Attack device) [Medical Device]

861 attack (Attack behavior) [Social Behavior]

861 Heart (Entire heart) [Body Part, Organ, or Organ Component]

861 Attack (Observation of attack) [Finding]

827 Attacked (Assault) [Injury or Poisoning]

Meta Mapping (1000):

1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]

Page 25: Unified Medical Language System & MetaMap

Machine Output

candidates([

ev(-1000, 'C0027051', 'Heart attack', 'Myocardial Infarction', [heart,attack], [dsyn], [[[1,2],[1,2],0]], yes, no, ['MEDLINEPLUS'], [0/12]),

ev(-861, 'C0018787', 'Heart', 'Heart', [heart], [bpoc], [[[1,1],[1,1],0]], yes, no, ['AIR'], [0/5]),

ev(-861, 'C0277793', 'Attack, NOS', 'Onset of illness', [attack], [fndg], [[[2,2],[1,1],0]], yes, no, ['MTH'], [6/6]),

ev(-861, 'C0699795', 'Attack', 'Attack device', [attack], [medd], [[[2,2],[1,1],0]], yes, no, ['MTH','MMSL'], [6/6]),

ev(-861, 'C1261512', attack, 'Attack behavior', [attack], [socb], [[[2,2],[1,1],0]], yes, no, ['MTH','PSY','AOD'], [6/6]),

ev(-861, 'C1281570', 'Heart', 'Entire heart', [heart], [bpoc], [[[1,1],[1,1],0]], yes, no, ['MTH','SNOMEDCT'], [0/5]),

ev(-861, 'C1304680', 'Attack', 'Observation of attack', [attack], [fndg], [[[2,2],[1,1],0]], yes, no, ['MTH','SNOMEDCT'], [6/6]),

ev(-827, 'C0004063', 'Attacked', 'Assault', [attacked], [inpo], [[[2,2],[1,1],1]], yes, no, ['ICD10AM'], [6/6])]).

Page 26: Unified Medical Language System & MetaMap

Unformatted XML

<Candidate><CandidateScore>-1000</CandidateScore><CandidateCUI>C0027051</CandidateCUI><CandidateMatched>Heart attack</CandidateMatched><CandidatePreferred>Myocardial Infarction</CandidatePreferred><MatchedWords Count="2"><MatchedWord>heart</MatchedWord><MatchedWord>attack</MatchedWord></MatchedWords><SemTypes Count="1"><SemType>dsyn</SemType></SemTypes><MatchMaps Count="1"><MatchMap><TextMatchStart>1</TextMatchStart><TextMatchEnd>2</TextMatchEnd><ConcMatchStart>1</ConcMatchStart><ConcMatchEnd>2</ConcMatchEnd><LexVariation>0</LexVariation></MatchMap></MatchMaps><IsHead>yes</IsHead><IsOverMatch>no</IsOverMatch><Sources Count="24"><Source>MEDLINEPLUS</Source></Sources><ConceptPIs Count="1"><ConceptPI><StartPos>0</StartPos><Length>12</Length></ConceptPI></ConceptPIs></Candidate>

Page 27: Unified Medical Language System & MetaMap

Formatted XML

<Candidate>
  <CandidateScore>-1000</CandidateScore>
  <CandidateCUI>C0027051</CandidateCUI>
  <CandidateMatched>Heart attack</CandidateMatched>
  <CandidatePreferred>Myocardial Infarction</CandidatePreferred>
  <MatchedWords Count="2"><MatchedWord>heart</MatchedWord><MatchedWord>attack</MatchedWord></MatchedWords>
  <SemTypes Count="1"><SemType>dsyn</SemType></SemTypes>
  <MatchMaps Count="1">
    <MatchMap>
      <TextMatchStart>1</TextMatchStart>
      <TextMatchEnd>2</TextMatchEnd>
      <ConcMatchStart>1</ConcMatchStart>
      <ConcMatchEnd>2</ConcMatchEnd>
      <LexVariation>0</LexVariation>
    </MatchMap>
  </MatchMaps>
  <IsHead>yes</IsHead>
  <IsOverMatch>no</IsOverMatch>
  <Sources Count="24"><Source>MEDLINEPLUS</Source></Sources>
  <ConceptPIs Count="1"><ConceptPI><StartPos>0</StartPos><Length>12</Length></ConceptPI></ConceptPIs>
</Candidate>
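A Candidate element like this one can be read with the JDK's built-in DOM parser. The sketch below works on a trimmed copy of the element and assumes attribute values are quoted, as well-formed XML requires; the class and helper names are hypothetical. The score is printed negated in the XML and machine output, so the sketch flips the sign to match the human-readable 1000.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch: read a few fields from one Candidate element using the JDK DOM parser. */
class CandidateXmlReader {

    // Trimmed copy of the slide's Candidate element (only the fields read below).
    static final String SAMPLE =
        "<Candidate>"
        + "<CandidateScore>-1000</CandidateScore>"
        + "<CandidateCUI>C0027051</CandidateCUI>"
        + "<CandidateMatched>Heart attack</CandidateMatched>"
        + "<CandidatePreferred>Myocardial Infarction</CandidatePreferred>"
        + "<SemTypes Count=\"1\"><SemType>dsyn</SemType></SemTypes>"
        + "</Candidate>";

    /** Parses the XML text and returns its root element. */
    static Element parseCandidate(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse candidate XML", e);
        }
    }

    /** Text of the first descendant element with the given tag name. */
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    /** The stored score is negative; negate it to get the displayed score. */
    static int displayScore(Element candidate) {
        return -Integer.parseInt(childText(candidate, "CandidateScore"));
    }

    public static void main(String[] args) {
        Element c = parseCandidate(SAMPLE);
        System.out.println(childText(c, "CandidateCUI") + " "
            + childText(c, "CandidatePreferred") + " "
            + displayScore(c) + " "
            + childText(c, "SemType"));
    }
}
```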

Page 28: Unified Medical Language System & MetaMap

MetaMap 3D

Page 29: Unified Medical Language System & MetaMap

MetaMap: Technical Aspect

● Download

– MetaMap API Underlying Architecture.

– MetaMap Java API.

● Extract and Install

– $ bzip2 -dc public_mm_linux_javaapi_{four-digit-year}.tar.bz2 | tar xvf -

– $ ./bin/install.sh

● Starting MetaMap Server

$ ./bin/skrmedpostctl start # Start SKR Server

$ ./bin/wsdserverctl start # Start WSD Server (Optional)

$ ./bin/mmserver{two-digit-year} # Start MetaMap Server

Page 30: Unified Medical Language System & MetaMap

MetaMap Java API

Two jar files contain the API:

✔ /src/javaapi/dist/MetaMapApi.jar

✔ /src/javaapi/dist/prologbeans.jar

Page 31: Unified Medical Language System & MetaMap

Code Time :)

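import java.util.List;
import gov.nih.nlm.nls.metamap.*;

// Connect to the MetaMap server on this machine and process one input file.
MetaMapApi api = new MetaMapApiImpl("localhost");
List<Result> resultList = api.processCitationsFromFile("Abstract.txt");
Result result = resultList.get(0);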

Page 32: Unified Medical Language System & MetaMap

Code Time :)

// Loop over the utterances of the result (the loop is closed on the last code slide).
for (Utterance utterance: result.getUtteranceList()) {
  System.out.println("Utterance:");
  System.out.println(" Id: " + utterance.getId());
  System.out.println(" Utterance text: " + utterance.getString());
  System.out.println(" Position: " + utterance.getPosition());

Page 33: Unified Medical Language System & MetaMap

Code Time :)

// Loop over the phrases of the utterance (the loop is closed on the next slide).
for (PCM pcm: utterance.getPCMList()) {
  System.out.println("Phrase:");
  System.out.println(" text: " + pcm.getPhrase().getPhraseText());
  System.out.println("Candidates:");
  for (Ev ev: pcm.getCandidateList()) {
    System.out.println(" Candidate:");
    System.out.println(" Score: " + ev.getScore());
    System.out.println(" Concept Id: " + ev.getConceptId());
    System.out.println(" Concept Name: " + ev.getConceptName());
    System.out.println(" Preferred Name: " + ev.getPreferredName());
    System.out.println(" Matched Words: " + ev.getMatchedWords());
    System.out.println(" Semantic Types: " + ev.getSemanticTypes());
    System.out.println(" MatchMap: " + ev.getMatchMap());
    System.out.println(" MatchMap alt. repr.: " + ev.getMatchMapList());
    System.out.println(" is Head?: " + ev.isHead());
    System.out.println(" is Overmatch?: " + ev.isOvermatch());
    System.out.println(" Sources: " + ev.getSources());
    System.out.println(" Positional Info: " + ev.getPositionalInfo());
  }

Page 34: Unified Medical Language System & MetaMap

Code Time :)

  System.out.println("Mappings:");
  for (Mapping map: pcm.getMappingList()) {
    System.out.println(" Map Score: " + map.getScore());
    for (Ev mapEv: map.getEvList()) {
      System.out.println(" Score: " + mapEv.getScore());
      System.out.println(" Concept Id: " + mapEv.getConceptId());
      System.out.println(" Concept Name: " + mapEv.getConceptName());
      System.out.println(" Preferred Name: " + mapEv.getPreferredName());
      System.out.println(" Matched Words: " + mapEv.getMatchedWords());
      System.out.println(" Semantic Types: " + mapEv.getSemanticTypes());
      System.out.println(" MatchMap: " + mapEv.getMatchMap());
      System.out.println(" MatchMap alt. repr.: " + mapEv.getMatchMapList());
      System.out.println(" is Head?: " + mapEv.isHead());
      System.out.println(" is Overmatch?: " + mapEv.isOvermatch());
      System.out.println(" Sources: " + mapEv.getSources());
      System.out.println(" Positional Info: " + mapEv.getPositionalInfo());
    } // end of mapEv loop
  } // end of mapping loop
} // end of phrase (PCM) loop
} // end of utterance loop