Top Banner
Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu * , Shu-Yu Tung, and Yun- Ling Lu * Dept. of Medical Informatics Tzu Chi University Taiwan, R.O.C.
21

Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Jan 03, 2016

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Identifying Disease Diagnosis Factors by

Proximity-based Mining of Medical Texts

Rey-Long Liu*, Shu-Yu Tung, and Yun-Ling Lu*Dept. of Medical Informatics

Tzu Chi University

Taiwan, R.O.C.

Page 2: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Outline

• Research background

• Problem definition

• The proposed approach: PDFI

• Empirical evaluation

• Conclusion

PDFI@ACIIDS2011 2

Page 3: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Research Background

PDFI@ACIIDS2011 3

Page 4: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Diagnosis Knowledge Map: Fundamental of Diagnosis

Support & Education

PDFI@ACIIDS2011 4

r5

r4

r3

r2

r1

d3

d2

d1

Symptoms & Signs (and examinations & tests)DiseasesRisk Factors

m1

m2

m3

m4

m5

Page 5: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Basic Properties

• Diagnosis factors of a disease– Risk factors, symptoms, and signs of the disease

• A diagnosis knowledge map consist of many-to-many relationships between diseases and their diagnosis factors– May have different capability of discriminating

the diseases, and may evolve

• Construction of a diagnosis knowledge map is essential but costly

PDFI@ACIIDS2011 5

Page 6: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Problem Definition

PDFI@ACIIDS2011 6

Page 7: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Goal• Explore how the identification of the diagnosis

factors may be supported by text mining

• Develop a technique PDFI (Proximity-based Diagnosis Factors Identifier) that – Employs term proximity to improve diagnosis

factors identifiers– Serves as a supplement to improve existing

identifiers

PDFI@ACIIDS2011 7

Page 8: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Related Work• Extract relationships by parsing or template

matching– Weakness: Relationships between diseases and

diagnosis factors are seldom expressed in individual sentences

• Select key features by text classification– Weakness: Term proximity is NOT considered

• Proximity-based retrieval– Weakness: NOT applicable to diagnosis factor

identificationPDFI@ACIIDS2011 8

Page 9: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

The Proposed Approach: PDFI

PDFI@ACIIDS2011 9

Page 10: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Basic Observation

• In a medical text talking about the diagnosis of a disease, the diagnosis factors often appear in a nearby area of the text

PDFI@ACIIDS2011 10

Page 11: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

The Approach

• For a candidate diagnosis factor u, PDFI– Measures how other candidate diagnosis factors

appear in the areas near to u in the medical texts, and then

– Encodes the term proximity information into the discriminating capability of u measured by the underlying discriminative factors identifiers.

PDFI@ACIIDS2011 11

Page 12: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

System Overview

PDFI@ACIIDS2011 12

Encode term proximity contexts to revise the strengths of candidate factors

Measure discriminating strengths of candidate factors

Underlying identifierPDFI

Ranked factors for individual diseases

Texts about individual diseases

Discriminating strengths of candidate factors

Page 13: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Scoring for a Candidate Factor

PDFI@ACIIDS2011 13

||1

1

),(Pr ;)( ,

Ne

cureoximitySco unNnMinDist nu

),(1

1),(

cuRankcuScoreIdentifier

MinDistu,c= Minimum distance between u and n in the texts about disease c, and α is set to 30

• For a candidate diagnosis factor u for disease c

Rank(u,c) = Rank of u w.r.t. c by the underlying identifier

Finalscore(u, c) = ProximityScore(u,c)+IdentifierScore(u,c)

Page 14: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Empirical Evaluation

PDFI@ACIIDS2011 14

Page 15: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Experimental Data• Medical dictionary: from MeSH

– Each MeSH term and its retrieval equivalence terms, resulting in a dictionary of 164,354 medical terms

• Medical texts for disease: from MedlinePlus– All the diseases for which MedlinePlus tags

diagnosis/symptoms texts, resulting in a text database of 420 medical texts for 131 diseases

– Each medical text is manually read and cross-checked to extract target diagnosis factor terms from the texts, resulting in 2,797 target terms

PDFI@ACIIDS2011 15

Page 16: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Underlying Diagnosis Factor Identifier

• The chi-square feature scoring technique – Produces a discriminating strength for each

feature (candidate factor) with respect to each disease, and

– For each disease, all positively-correlated features are sent to PDFI for re-ranking

PDFI@ACIIDS2011 16

Page 17: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Evaluation Criteria

• Mean average precision (MAP)– Measuring how target diagnosis factors are ranked

high for the medical expert to check and validate

– Example• Targets ranked 1st, 3rd, 5th AP=(1/1+2/3+3/5)/3=0.76

• Targets ranked 1st, 2nd, 3rd AP=(1/1+2/2+3/3)/3=1.00

PDFI@ACIIDS2011 17

k

jfj

iAPC

iAPMAP

m

j i

C

i

1

||

1 )()(,

||

)(

Page 18: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Results• MAP: chi-square: 0.2136; chi-square+PDFI: 0.2996

PDFI@ACIIDS2011 18

Page 19: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

An Example• Parasitic diseases

– AP: chi-square:0.3003; chi-square+PDFI:0.3448• PDFI promotes the ranks of several target

diagnosis factors (e.g., parasite, antigen, diarrhea, and MRI scan)

– They appear at some place(s) where more other candidate terms occur in a nearby area

• PDFI lowers the ranks of a few target diagnosis factors (e.g., serology)

– Serology only appears at one place where the author used lots of words to explain serology

PDFI@ACIIDS2011 19

Page 20: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Conclusion

PDFI@ACIIDS2011 20

Page 21: Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

• Diagnosis factors to discriminate diseases are the fundamental basis for – Diagnosis decision support, diagnosis skill

training, medical research, & health education

• Text mining is a good way to identify and maintain the huge amount of diagnosis factors for diseases

• By encoding term proximity information, PDFI may be a good supplement to existing technique to identify the diagnosis factors for individual diseases

PDFI@ACIIDS2011 21