The Use of Semantic Graphs for Modeling Biomedical Text Laura Plaza NIL- Natural Interaction based on Language Universidad Complutense de Madrid

Feb 11, 2016

Transcript
Page 1: The  Use of  Semantic Graphs for Modeling Biomedical Text

The Use of Semantic Graphs for Modeling Biomedical Text

Laura Plaza
NIL - Natural Interaction based on Language
Universidad Complutense de Madrid

Page 2: The  Use of  Semantic Graphs for Modeling Biomedical Text

Semantic graph based representation

Information Retrieval

Automatic Indexing

Text summarization

Page 3: The  Use of  Semantic Graphs for Modeling Biomedical Text

Why semantic?

Synonymy
Cerebrovascular diseases during pregnancy may result from hemorrhage
= Brain vascular disorders during gestation may result from hemorrhage

Polysemy
The common cold is more common in cold weather than in summer

Page 4: The  Use of  Semantic Graphs for Modeling Biomedical Text

Why graphs?

Pneumococcal infection is a lung infection caused by Streptococcus pneumoniae.
Mycoplasma pneumonia is another type of atypical pneumonia.

The patient reported feeling short of breath and was diagnosed with pneumonia

[Figure: concept graph with the nodes Pneumonia, Pneumococcal pneumonia and influenza; the edge labels include "Co-occurs with" and "Symptom"]

Page 5: The  Use of  Semantic Graphs for Modeling Biomedical Text

Our Proposal

Using concepts and relations from external knowledge sources to represent the text as a graph

Exploiting the topology of the network to identify groups of semantically related concepts that represent different topics

Page 6: The  Use of  Semantic Graphs for Modeling Biomedical Text

Representation Process

1. Document pre-processing
2. Concept identification
3. Document representation
4. Concept clustering and topic recognition

Page 7: The  Use of  Semantic Graphs for Modeling Biomedical Text

Document preprocessing

Page 8: The  Use of  Semantic Graphs for Modeling Biomedical Text

Concept Identification

The goal of the trial was to assess cardiovascular mortality for stroke

Concepts
Goals (Intellectual Product)
Clinical Trials (Research Activity)
Cardiovascular system (Body System)
Mortality vital statistics (Quantitative Concept)
Cerebrovascular accident (Disease or Syndrome)

Page 9: The  Use of  Semantic Graphs for Modeling Biomedical Text

Concept Identification - Ambiguity

Tissues are often cold

Phrase: “Tissues”
Meta Mapping (1000)
  1000 C0040300:Tissues (Body tissue)
Phrase: “are”
Phrase: “often cold”
Meta Mapping (888)
  694 C0332183:Often (Frequent)
  861 C0234192:Cold (Cold Sensation)
Meta Mapping (888)
  694 C0332183:Often (Frequent)
  861 C0009443:Cold (Common Cold)
Meta Mapping (888)
  694 C0332183:Often (Frequent)
  861 C0009264:Cold (Cold temperature)

WSD

• Personalized PageRank (PPR)
• Journal Descriptor Indexing (JDI)
• Machine Readable Dictionary (MRD)
• Automatic Extracted Corpus (AEC)
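
The ambiguity in the output above can be made concrete with a small sketch. It stores the three candidate mappings for "often cold" exactly as listed, finds the terms that map to more than one concept, and applies a "first mapping" baseline, read here as simply keeping the first candidate returned. The data layout and the function names are assumptions made for illustration; they are not part of MetaMap.

```python
# The three candidate mappings for the phrase "often cold", copied from the
# output above. Each concept is (CUI, matched term, preferred name).
MAPPINGS = [
    [("C0332183", "Often", "Frequent"), ("C0234192", "Cold", "Cold Sensation")],
    [("C0332183", "Often", "Frequent"), ("C0009443", "Cold", "Common Cold")],
    [("C0332183", "Often", "Frequent"), ("C0009264", "Cold", "Cold temperature")],
]


def ambiguous_terms(mappings):
    """Return {term: sorted CUIs} for terms that received more than one CUI."""
    senses = {}
    for mapping in mappings:
        for cui, term, _name in mapping:
            senses.setdefault(term, set()).add(cui)
    return {term: sorted(cuis) for term, cuis in senses.items() if len(cuis) > 1}


def first_mapping(mappings):
    """Baseline without WSD (an assumption): keep the first candidate mapping."""
    return mappings[0]
```

Only "Cold" is ambiguous in this example, which is the case a WSD method such as PPR, JDI, MRD or AEC would have to resolve.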

Page 10: The  Use of  Semantic Graphs for Modeling Biomedical Text

Document Representation

The goal of the trial was to assess cardiovascular mortality and morbidity for stroke, coronary heart disease and congestive heart failure, as an evidence-based guide for clinicians who treat hypertension.

[Figure: sentence graph for the sentence above. Nodes: Activity; Clinical or Research Activity; Research Activity; Study; Clinical Study; Clinical Trials; Anatomic Structure, System or Substance; Organ System; Cardiovascular System; Disease, Disorder or Finding; Disease or Disorder; Non-Neoplastic Disorder; Non-Neoplastic Disorder by Site; Non-Neoplastic Cardiovascular Disorder; Non-Neoplastic Vascular Disorder; Cerebrovascular Disorder; Cerebrovascular Accident; Disorder by Site; Respiratory and Thoracic Disorder; Thoracic Disorder; Heart Disorder; Coronary Heart Disease; Non-Neoplastic Heart Disorder; Congestive Heart Failure; Finding by Site or System; Cardiovascular System Finding; Blood Pressure Finding; Hypertensive Disease; Personnel; Professional Personnel; Clinicians]

Page 11: The  Use of  Semantic Graphs for Modeling Biomedical Text

Document Representation

All the sentence graphs are merged into a single Document Graph
The graph is extended with more semantic relations
Each edge is assigned a weight in [0, 1]
Different relations may be assigned different weights
The more specific the concepts, the greater the weight assigned to the edge
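
A minimal sketch of the merging and weighting steps follows. The slide does not give the weighting formula, so the code assumes that an is-a edge from a concept at depth d to its child at depth d + 1 gets weight d / (d + 1). This keeps every weight in [0, 1], gives more weight to more specific concepts, and yields values such as 1/2, 2/3 and 3/4, which the labels on the document-graph figure appear to follow. The function names and the path-based input format are hypothetical.

```python
from fractions import Fraction


def path_edges(path):
    """Yield (parent, child, weight) along a root-to-leaf is-a path.

    Assumption: the edge leaving the concept at depth d (root = depth 1)
    gets weight d / (d + 1), so deeper, more specific edges weigh more.
    """
    for depth, (parent, child) in enumerate(zip(path, path[1:]), start=1):
        yield parent, child, Fraction(depth, depth + 1)


def merge_sentence_graphs(sentence_paths):
    """Merge per-sentence is-a paths into one document graph.

    sentence_paths: one list of paths per sentence.
    Returns {(parent, child): weight}; edges shared by several sentences
    collapse into a single entry, which is what merges the graphs.
    """
    graph = {}
    for paths in sentence_paths:
        for path in paths:
            for parent, child, weight in path_edges(path):
                graph[(parent, child)] = weight
    return graph
```

Edges for other relation types could be added to the same dictionary with their own weights.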

Page 12: The  Use of  Semantic Graphs for Modeling Biomedical Text

The goal of the trial was to assess cardiovascular mortality and morbidity for stroke, coronary heart disease and congestive heart failure, as an evidence-based guide for clinicians who treat hypertension.

While event rates for fatal cardiovascular disease were similar, there was a disturbing tendency for stroke to occur more often in the doxazosin group than in the group taking chlorthalidone.

[Figure: document graph for the two sentences above. Edge types: "is a" relations, "associated with" relations, other related relations. Edge weights shown: 1/2, 2/3, 3/4, 1. Nodes: Clinicians; Research Activity; Study; Clinical Study; Clinical Trials; Organ System; Cardiovascular System; Disease or Disorder; Non-Neoplastic Disorder; Non-Neoplastic Disorder by Site; Non-Neoplastic Cardiovascular Disorder; Non-Neoplastic Vascular Disorder; Cerebrovascular Disorder; Cerebrovascular Accident; Disorder by Site; Respiratory and Thoracic Disorder; Thoracic Disorder; Heart Disorder; Coronary Heart Disease; Non-Neoplastic Heart Disorder; Congestive Heart Failure; Finding by Site or System; Cardiovascular System Finding; Blood Pressure Finding; Hypertensive Disease; Disorder of Cardiovascular System; Cardiovascular Diseases; Cardiovascular Drug; Alpha-Adrenergic Blocking Agent; Doxazosin; Pharmaceutical Adjuvant; Diuretic; Thiazide Diuretics; Chlorthalidone]

Page 13: The  Use of  Semantic Graphs for Modeling Biomedical Text

Concept Clustering & Topic Recognition

[Figure: document graph with its hub vertices labeled "hubs"]

Page 14: The  Use of  Semantic Graphs for Modeling Biomedical Text

Concept Clustering & Topic Recognition

Concepts are ranked by salience

$$\mathrm{Salience}(v_i) = \sum_{\forall e_j \,\mid\, \exists v_k \,\wedge\, e_j \ \mathrm{connect}(v_i, v_k)} \mathrm{weight}(e_j)$$

The n vertices with the highest salience are called hub vertices
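
As a rough illustration, the salience of a vertex can be computed as the sum of the weights of the edges incident to it, and the hubs are then the n top-ranked vertices. The edge-dictionary format and the alphabetical tie-breaking rule are assumptions of this sketch.

```python
def salience(edges):
    """edges: {(u, v): weight}. Returns {vertex: sum of incident edge weights}."""
    scores = {}
    for (u, v), weight in edges.items():
        scores[u] = scores.get(u, 0.0) + weight
        scores[v] = scores.get(v, 0.0) + weight
    return scores


def hub_vertices(edges, n):
    """Return the n vertices with the highest salience.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    scores = salience(edges)
    ranked = sorted(scores, key=lambda vertex: (-scores[vertex], vertex))
    return ranked[:n]
```

The value of n is a parameter of the method; the slides do not state which value was used.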

Page 15: The  Use of  Semantic Graphs for Modeling Biomedical Text

Concept Clustering & Topic Recognition

The hub vertices are grouped into Hub Vertex Sets (HVSs)

The remaining vertices are assigned to the cluster to which they are most connected

The number and properties of the clusters depend strongly on the parameter values
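
The assignment step can be sketched as follows, taking the HVSs as given. The slide does not define "most connected", so the code assumes it means the largest total weight of edges linking the vertex to the hubs of a cluster. All names are hypothetical.

```python
def assign_to_clusters(edges, hvs_list):
    """Attach every non-hub vertex to the HVS it is most connected to.

    edges: {(u, v): weight}; hvs_list: non-empty list of sets of hub vertices.
    Connectivity is taken to be the total weight of the edges between the
    vertex and the hubs of a cluster (an assumption of this sketch). A vertex
    with no edge to any hub falls into the first cluster.
    """
    clusters = [set(hvs) for hvs in hvs_list]
    hubs = set().union(*hvs_list)
    vertices = {vertex for edge in edges for vertex in edge}
    for vertex in sorted(vertices - hubs):
        strength = [0.0] * len(hvs_list)
        for (a, b), weight in edges.items():
            # The neighbour reached through this edge, if the edge touches vertex.
            other = b if a == vertex else a if b == vertex else None
            for i, hvs in enumerate(hvs_list):
                if other in hvs:
                    strength[i] += weight
        best = max(range(len(hvs_list)), key=lambda i: strength[i])
        clusters[best].add(vertex)
    return clusters
```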

Page 16: The  Use of  Semantic Graphs for Modeling Biomedical Text

Concept Clustering & Topic Recognition

[Figure: example clusters. Concepts shown: Chlorthalidone; Congestive heart failure; Drug pseudoallergen by function; Adverse reactions; Amlodipine; Blood pressure finding; Hepatic; Cerebrovascular accident; Health personnel; Clinicians; Persons; Elderly; Patients; Organism; Population group; ...]

Page 17: The  Use of  Semantic Graphs for Modeling Biomedical Text

Semantic graph based representation

Information Retrieval

Automatic Indexing

Text summarization

Page 18: The  Use of  Semantic Graphs for Modeling Biomedical Text

Text Summarization

Creating a condensed version of one or more documents

Motivation
Summaries as an indication of what a document is about
Improving indexing, categorization, and IR

Types
Extracts vs. abstracts
Single vs. multi-document
Generic vs. application-oriented

Page 19: The  Use of  Semantic Graphs for Modeling Biomedical Text

Text Summarization

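[Figure: each sentence (Sentence 1 ... Sentence n) is compared with each cluster (Cluster 1 ... Cluster m). Example values: Similarity = 35.0, Similarity = 12.0, Similarity = 4.0, Similarity = 86.0]

$$\mathrm{similarity}(C_i, S_j) = \sum_{v_k \,\mid\, v_k \in S_j} w_{k,j}$$

$$w_{k,j} = \begin{cases} 0 & \text{if } v_k \notin C_i \\ 1.0 & \text{if } v_k \in HVS(C_i) \\ 0.5 & \text{if } v_k \in C_i \text{ and } v_k \notin HVS(C_i) \end{cases}$$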
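
A small sketch of the sentence-to-cluster similarity, under the reading that each concept of the sentence contributes 1.0 if it is a hub vertex of the cluster, 0.5 if it belongs to the cluster without being a hub, and 0 otherwise. The function and parameter names are hypothetical.

```python
def similarity(cluster, hvs, sentence_concepts, hub_weight=1.0, other_weight=0.5):
    """Similarity between a cluster and a sentence.

    cluster: set of all concepts in the cluster (hubs included).
    hvs: set of hub vertices of the cluster (assumed to be a subset of cluster).
    sentence_concepts: concepts identified in the sentence.
    """
    total = 0.0
    for concept in sentence_concepts:
        if concept in hvs:
            total += hub_weight       # hub vertex of the cluster
        elif concept in cluster:
            total += other_weight     # ordinary member of the cluster
        # concepts outside the cluster contribute 0
    return total
```

Computing this value for every sentence and every cluster gives the scores from which sentences are selected.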

Page 20: The  Use of  Semantic Graphs for Modeling Biomedical Text

Text Summarization

Cluster 1            ...   Cluster n
Sentence 1 (98.0)    ...   Sentence 6 (18.0)
Sentence n (28.0)    ...   Sentence 3 (1.0)
...                  ...   ...

Sentence selection

H.1: Selecting the top n ranked sentences from the biggest cluster
H.2: Selecting n_i sentences from each cluster
H.3: Weighting the sentence-to-cluster similarity by the clusters’ sizes

+ other traditional criteria: frequency, position, similarity with the title, etc.
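
Two of the selection heuristics can be sketched as below. H.1 follows the slide directly. For H.3 the slide does not give the formula, so the code assumes that each similarity is divided by the size of its cluster and that the results are summed into one score per sentence. The names and the matrix layout are hypothetical.

```python
def select_h1(scores, cluster_sizes, n):
    """H.1: the n best-ranked sentences of the biggest cluster.

    scores[i][j] is the similarity between cluster i and sentence j.
    Returns sentence indices in document order.
    """
    biggest = max(range(len(cluster_sizes)), key=lambda i: cluster_sizes[i])
    row = scores[biggest]
    ranked = sorted(range(len(row)), key=lambda j: -row[j])
    return sorted(ranked[:n])


def select_h3(scores, cluster_sizes, n):
    """H.3 (assumed form): score(j) = sum over clusters of similarity / size."""
    num_sentences = len(scores[0])
    total = [
        sum(scores[i][j] / cluster_sizes[i] for i in range(len(scores)))
        for j in range(num_sentences)
    ]
    ranked = sorted(range(num_sentences), key=lambda j: -total[j])
    return sorted(ranked[:n])
```

H.2 would additionally need a rule for choosing n_i per cluster, which the slide leaves open.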

Page 21: The  Use of  Semantic Graphs for Modeling Biomedical Text

Text Summarization

Evaluation: How well is the important content preserved in the summary?
ROUGE automatic evaluation metrics
Comparison with the abstracts of the articles

                ROUGE-2   ROUGE-SU4
H.3*            0.3538    0.3267
H.2*            0.3421    0.3205
H.1*            0.3453    0.3189
LexRank         0.3248    0.3097
SUMMA           0.3187    0.2989
AutoSummarize   0.2446    0.2318

Page 22: The  Use of  Semantic Graphs for Modeling Biomedical Text

Text Summarization

Evaluation: How does ambiguity affect summarization?

                ROUGE-2   ROUGE-SU4
AEC             0.3670    0.3379
MRD             0.3611    0.3341
JDI             0.3538    0.3267
First mapping   0.3283    0.3117

Page 23: The  Use of  Semantic Graphs for Modeling Biomedical Text

Summarization of Biological Entity-related Information

Multi-document, application-oriented summarization

Given a list of genes (or proteins):
1. Retrieving documents related to the genes
2. Building a semantic graph-based representation of the corpus
3. Identifying groups of genes/proteins
4. Generating a summary for each group that describes the functionality of the entities

Page 24: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of Biomedical Literature using Summaries

Title + Abstract

MTI

Ordered list of MeSH main headings

Full text

Refined list of MeSH Headings

Page 25: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of Biomedical Literature using Summaries

What about using the full texts?
◦ Recall increases but precision decreases

What about using automatic summaries of different lengths?
◦ As the length increases, recall improves but precision worsens
◦ There is a summary length which maximizes F-measure
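
The trade-off can be illustrated by scoring a suggested list of MeSH headings against a reference list. The function uses the standard definitions of precision, recall and balanced F-measure; the heading lists in the usage note are invented for illustration and are not results from the study.

```python
def precision_recall_f1(suggested, gold):
    """Precision, recall and F1 of suggested headings against reference headings."""
    suggested, gold = set(suggested), set(gold)
    true_positives = len(suggested & gold)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With a hypothetical reference list of four headings, a short input that yields two correct headings has perfect precision but only half the recall, while a longer input that yields three correct headings plus three spurious ones gains recall and loses precision. Sweeping the summary length and keeping the length with the highest F1 is one way to locate the maximum mentioned above.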

Page 26: The  Use of  Semantic Graphs for Modeling Biomedical Text

Semantic graph based representation

Information Retrieval

Automatic Indexing

Text summarization

Page 27: The  Use of  Semantic Graphs for Modeling Biomedical Text

Retrieval of Similar Patient Cases

Motivation: Facilitating access to previous cases

Problem: Given a reference patient record, retrieve other records from the clinical database that are similar to it

Page 28: The  Use of  Semantic Graphs for Modeling Biomedical Text

Retrieval of Similar Patient Cases

When can we consider that two patient records are similar?

Same symptom or sign (e.g., fever)
Same diagnosis (e.g., bacterial pneumonia)
Same test or procedure (e.g., endoscopic biopsy)
Same medication (e.g., clopidogrel)

But absent criteria are not relevant!

Page 29: The  Use of  Semantic Graphs for Modeling Biomedical Text

Retrieval of Similar Patient Cases

The records are represented using UMLS graphs
Concepts are filtered by semantic types
Negated concepts are ignored

Category             UMLS Semantic Types
Symptoms and Signs   Sign or Symptom; Finding
Diseases             Disease or Syndrome; Pathologic Function
Procedures           Therapeutic or Preventive Procedure; Diagnostic Procedure
Body Parts           Body Location or Region; Body Part, Organ, or Organ Component
Medicaments          Pharmacologic Substance
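
The filtering step might look like the sketch below: a lookup table from UMLS semantic type to category, and a filter that drops negated concepts and concepts of any other type. The tuple format of the input and the function name are assumptions.

```python
# Semantic types kept for each category, following the table above.
# "Diagnostic Procedure" is the name of this semantic type in the UMLS.
CATEGORY_BY_SEMANTIC_TYPE = {
    "Sign or Symptom": "Symptoms and Signs",
    "Finding": "Symptoms and Signs",
    "Disease or Syndrome": "Diseases",
    "Pathologic Function": "Diseases",
    "Therapeutic or Preventive Procedure": "Procedures",
    "Diagnostic Procedure": "Procedures",
    "Body Location or Region": "Body Parts",
    "Body Part, Organ, or Organ Component": "Body Parts",
    "Pharmacologic Substance": "Medicaments",
}


def filter_concepts(concepts):
    """Keep non-negated concepts whose semantic type belongs to a category.

    concepts: iterable of (name, semantic_type, negated) tuples.
    Returns a list of (name, category) pairs.
    """
    kept = []
    for name, semantic_type, negated in concepts:
        category = CATEGORY_BY_SEMANTIC_TYPE.get(semantic_type)
        if category is not None and not negated:
            kept.append((name, category))
    return kept
```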

Page 30: The  Use of  Semantic Graphs for Modeling Biomedical Text

Retrieval of Similar Patient Cases

We compute the similarity between the reference record and every record in the database

$$\mathrm{Similarity} = \frac{\mathrm{Votes}}{\mathrm{MaxSimilarity}} = \frac{\frac{1}{11} + \frac{2}{11} + \dots + \frac{9}{11}}{\frac{1}{11} + \frac{2}{11} + \dots + \frac{11}{11} + \frac{3}{5} + \frac{4}{5} + \frac{5}{5}} = 0.4869$$

[Figure: Graph A and Graph B, the two concept graphs being compared. Edge weights: 1/11, 2/11, 3/11, ..., 8/11, 9/11, 10/11, 11/11 along one branch and 3/5, 4/5, 5/5 along another. Nodes include: Finding by site; Clinical finding; Disease; Infectious disease; Disorder by body site; Bacterial pneumonia; Pneumococcal pneumonia; Pneumonia due to Streptococcus; Mycoplasma pneumonia; Pneumonia due to anaerobic bacteria; Pneumonia due to pleuropneumonia; Respiratory finding; Functional finding of respiratory tract; Coughing; Virus Diseases; ...]
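
One possible reading of this computation is sketched below: every weighted edge of the reference graph that also appears in the candidate graph casts a vote equal to its weight, and the total is normalized by the sum of all edge weights in the reference graph. The slide does not spell out the procedure, so the vote rule, the function name and the edge representation are all assumptions.

```python
from fractions import Fraction


def record_similarity(reference, candidate_edges):
    """Weighted share of the reference graph that the candidate also contains.

    reference: {edge: weight} for the reference record's graph.
    candidate_edges: set of edges present in the candidate record's graph.
    """
    max_similarity = sum(reference.values())
    votes = sum(w for edge, w in reference.items() if edge in candidate_edges)
    return votes / max_similarity


# Reference graph: an 11-edge branch weighted 1/11 ... 11/11 and a second
# branch weighted 3/5, 4/5, 5/5. Edge names here are placeholders.
reference = {("a", k): Fraction(k, 11) for k in range(1, 12)}
reference.update({("b", k): Fraction(k, 5) for k in range(3, 6)})

# The candidate shares only the first nine edges of the long branch.
candidate = {("a", k) for k in range(1, 10)}
score = record_similarity(reference, candidate)
```

Using `Fraction` keeps the arithmetic exact; `float(score)` can be compared with the value shown on the slide.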

Page 31: The  Use of  Semantic Graphs for Modeling Biomedical Text

Semantic graph based representation

Information Retrieval

Automatic Indexing

Text summarization

Page 32: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of EHR

Discovering relevant SNOMED-CT concepts in health records

4 steps
1. Spell checking
2. Acronym expansion and WSD
3. Negation detection
4. Concept identification

Page 33: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of EHR

1. Spell Checking
◦ Hunspell + Levenshtein + keyboard + phonetic distance
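
Of the distances listed, the Levenshtein distance is the easiest to show in isolation. The sketch below implements the standard dynamic-programming algorithm and uses it to pick the closest word from a vocabulary. How the system combines it with Hunspell suggestions, keyboard distance and phonetic distance is not described on the slide, so that part is left out; `closest` is a hypothetical helper.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def closest(word, vocabulary):
    """Vocabulary entry with the smallest distance to word (ties: alphabetical)."""
    return min(vocabulary, key=lambda entry: (levenshtein(word, entry), entry))
```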

Page 34: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of EHR

2. Acronym expansion and WSD
◦ A list of abbreviations + Machine Learning + expert rules

Page 35: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of EHR

3. Negation detection
◦ NegEx algorithm Spanish adaptation
◦ Negation cue + Negation scope
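
The cue-plus-scope idea can be sketched in a heavily simplified form. The code below is not the NegEx algorithm or its Spanish adaptation: it only handles cues that precede the negated terms, and the cue list, the scope terminators and the five-token window are assumptions chosen for illustration.

```python
NEGATION_CUES = {"no", "sin", "niega"}             # hypothetical Spanish cue list
SCOPE_TERMINATORS = {"pero", "aunque", ".", ","}   # hypothetical scope terminators
MAX_SCOPE = 5                                      # assumed window, in tokens


def negated_tokens(tokens):
    """Return the indices of the tokens that fall inside a negation scope.

    A scope opens after a negation cue and closes at a terminator or after
    MAX_SCOPE tokens, whichever comes first.
    """
    negated = set()
    remaining = 0
    for index, token in enumerate(tokens):
        lowered = token.lower()
        if lowered in NEGATION_CUES:
            remaining = MAX_SCOPE
        elif lowered in SCOPE_TERMINATORS:
            remaining = 0
        elif remaining > 0:
            negated.add(index)
            remaining -= 1
    return negated
```

For the tokens of "Paciente sin fiebre pero con tos" ("Patient without fever but with cough"), only "fiebre" falls inside the scope, so a concept mapped from it would be ignored.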

Page 36: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of EHR

4. Concept identification

Query: El recién nacido fue ingresado ("The newborn was admitted")

SNOMED-CT concept descriptions

Candidate mappings
- Recién nacido ("Newborn")
- Recién nacido prematuro ("Premature newborn")
- Ingreso del paciente ("Patient admission")

Scoring function

Final mappings
- Recién nacido
- Ingreso del paciente

Page 37: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of EHR

Page 38: The  Use of  Semantic Graphs for Modeling Biomedical Text

Automatic Indexing of EHR

Future work

◦ Representing the EHR as a graph using different relations from SNOMED-CT

◦ Computing the salience of the concepts to obtain the most representative ones

◦ Using such representation in different NLP tasks (e.g., categorization, IR, etc.)

Page 39: The  Use of  Semantic Graphs for Modeling Biomedical Text

Further Readings

Summarization
Plaza, L., Díaz, A., Gervás, P. (2011). A semantic graph-based approach to biomedical summarization. Artificial Intelligence in Medicine, 53.
Plaza, L. (2012). Evaluating the importance of sentence position for automatic summarization of biomedical literature. Submitted to Bioinformatics.

Word Sense Disambiguation
Plaza, L., Stevenson, M., Díaz, A. (2012). Resolving Ambiguity in Biomedical Text to Improve Summarization. Information Processing & Management, 48(4).
Plaza, L., Jimeno-Yepes, A., Díaz, A., Aronson, A. (2011). Studying correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts. BMC Bioinformatics, 12.

Automatic Indexing
Jimeno-Yepes, A., Plaza, L., Mork, J., Díaz, A., Aronson, A. (2012). Using automatic summaries to improve automatic indexing. To appear in BMC Bioinformatics.

Retrieval of Similar Cases
Plaza, L., Díaz, A. (2010). Retrieval of Similar Electronic Health Records using UMLS Concept Graphs. 15th International Conf. on Applications of Natural Language to Information Systems.