Determining the Syntactic Structure of Medical Terms in Clinical Notes
Bridget T. McInnes, Ted Pedersen, Serguei V. Pakhomov
bthomson@cs.umn.edu

Dec 20, 2015

Goal

The goal of this presentation is to present a simple but effective approach for identifying the syntactic structure of three-word terms

Importance

Potentially improve the analysis of unrestricted medical text, such as:
  Mapping of medical text to standardized terminologies
  Unsupervised syntactic parsing

Syntactic Structure of Terms

[Figure: the four possible structures of a three-word term w1 w2 w3: Monolithic, Non-branching, Right-branching, and Left-branching. Legend: blue = independence, green = dependence.]

Example

small bowel obstruction

Syntactic Structure of Example

small bowel obstruction

[Figure: "small bowel obstruction" drawn in each of the four structures: Monolithic, Non-branching, Right-branching, and Left-branching.]

Method used to determine the structure of a term

The Log Likelihood Ratio is based on the ratio between the observed probability of a term occurring and the probability it would be expected to have:

$$\frac{\text{Probability of term occurring}}{\text{Expected probability of term}}$$

Log Likelihood Ratio

The expected probability of a term is often based on the Non-branching (Independence) Model

$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})} = \frac{\text{observed probability}}{\text{expected probability}}$$

Extended Log Likelihood Ratio

The expected probability can also be calculated under two other hypotheses (models):

Non-branching: P(small) P(bowel) P(obstruction)
Right-branching: P(small bowel) P(obstruction)
Left-branching: P(small) P(bowel obstruction)

Three Log Likelihood Ratio Equations

Non-branching:
$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$

Right-branching:
$$\frac{P(\text{small bowel obstruction})}{P(\text{small bowel})\,P(\text{obstruction})}$$

Left-branching:
$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel obstruction})}$$
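A minimal Python sketch of the three ratios, assuming maximum-likelihood probabilities (each count divided by the number of trigrams N) and purely hypothetical counts. The class and function names are illustrative, not taken from the authors' software, and the models are labeled by the bracketing their denominators imply.

```python
from dataclasses import dataclass


@dataclass
class TermCounts:
    """Hypothetical corpus counts for a three-word term w1 w2 w3.

    Every count is taken over the same N trigram positions, e.g. n_x is the
    number of trigrams whose first word is w1.
    """
    n_xyz: int  # w1 w2 w3
    n_xy: int   # w1 w2 in positions 1-2
    n_yz: int   # w2 w3 in positions 2-3
    n_x: int    # w1 in position 1
    n_y: int    # w2 in position 2
    n_z: int    # w3 in position 3
    total: int  # N, number of trigrams in the corpus


def observed_to_expected(c: TermCounts) -> dict[str, float]:
    """Ratio of observed to expected probability under each model."""
    def p(count: int) -> float:
        return count / c.total  # maximum-likelihood estimate

    observed = p(c.n_xyz)
    return {
        "w1 w2 w3": observed / (p(c.n_x) * p(c.n_y) * p(c.n_z)),  # non-branching
        "[w1 w2] w3": observed / (p(c.n_xy) * p(c.n_z)),
        "w1 [w2 w3]": observed / (p(c.n_x) * p(c.n_yz)),
    }


# Toy numbers only; they do not come from the Mayo Clinic notes.
counts = TermCounts(n_xyz=10, n_xy=20, n_yz=40, n_x=50, n_y=40, n_z=50, total=1000)
for model, ratio in observed_to_expected(counts).items():
    print(f"{model}: {ratio:.1f}")
```

The numerator is shared by all three models; only the expected probability in the denominator changes.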

Expected Probability

The expected probability of a term differs from model to model, and so does its Log Likelihood Ratio:

Non-branching: P(small) P(bowel) P(obstruction), LL = 11,635.45
Right-branching: P(small bowel) P(obstruction), LL = 5,169.81
Left-branching: P(small) P(bowel obstruction), LL = 8,532.90

Model Fitting

The model with the lowest Log Likelihood Ratio best describes the underlying structure of the term:

Non-branching: P(small) P(bowel) P(obstruction), LL = 11,635.45
Right-branching: P(small bowel) P(obstruction), LL = 5,169.81
Left-branching: P(small) P(bowel obstruction), LL = 8,532.90

Recap

The Log Likelihood Ratio is calculated for each possible model:
  Non-branching
  Right-branching
  Left-branching

The probabilities for each model are obtained from a corpus

The term is assigned the structure whose model has the lowest Log Likelihood Ratio
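The final step amounts to picking the model with the minimum score. A small Python sketch using the three Log Likelihood Ratio values reported for "small bowel obstruction"; the dictionary keys and the function name are illustrative.

```python
# Log Likelihood Ratio values reported on the slides for "small bowel obstruction",
# keyed by the expected-probability model each value was computed under.
ll_small_bowel_obstruction = {
    "P(small) P(bowel) P(obstruction)": 11_635.45,
    "P(small bowel) P(obstruction)": 5_169.81,
    "P(small) P(bowel obstruction)": 8_532.90,
}


def best_model(ll_by_model: dict[str, float]) -> str:
    """Return the model with the lowest Log Likelihood Ratio (the best fit).

    Ties are broken by dictionary order, since min() keeps the first minimum.
    """
    return min(ll_by_model, key=ll_by_model.__getitem__)


print(best_model(ll_small_bowel_obstruction))  # P(small bowel) P(obstruction)
```

The selected model treats "small bowel" as a unit, i.e. [[small bowel] obstruction].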

Test Set

Contains 708 three-word terms from SNOMED-CT:
  Monolithic: 73 terms
  Non-branching: 6 terms
  Right-branching: 378 terms
  Left-branching: 251 terms

Test Set (cont)

The syntactic structure of each term was determined by the consensus of two medical text indexing experts (kappa = 0.704)

The probabilities were obtained from over 10,000 Mayo Clinic clinical notes
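The slides do not describe how the counts were collected. A hedged Python sketch of one way to gather them from tokenized notes, assuming trigrams are counted within sentences only and that tokens are already normalized (e.g. lowercased). The function name and key spellings are hypothetical, modeled on the n_{xy+} notation used for the expected values.

```python
from collections import Counter
from typing import Iterable


def trigram_marginals(sentences: Iterable[list[str]],
                      term: tuple[str, str, str]) -> dict[str, int]:
    """Count a three-word term and its marginals over all corpus trigrams."""
    w1, w2, w3 = term
    trigrams: Counter = Counter()
    for tokens in sentences:
        # Trigrams never cross a sentence boundary (an assumption).
        trigrams.update(zip(tokens, tokens[1:], tokens[2:]))

    counts = dict.fromkeys(
        ["n_xyz", "n_xy+", "n_+yz", "n_x++", "n_+y+", "n_++z"], 0)
    for (a, b, c), k in trigrams.items():
        if a == w1:
            counts["n_x++"] += k
        if b == w2:
            counts["n_+y+"] += k
        if c == w3:
            counts["n_++z"] += k
        if a == w1 and b == w2:
            counts["n_xy+"] += k
            if c == w3:
                counts["n_xyz"] += k
        if b == w2 and c == w3:
            counts["n_+yz"] += k
    counts["n_+++"] = sum(trigrams.values())  # N, total number of trigrams
    return counts


# Toy sentences, not real clinical notes.
notes = [
    ["small", "bowel", "obstruction", "noted"],
    ["small", "bowel", "loops"],
    ["no", "bowel", "obstruction"],
]
print(trigram_marginals(notes, ("small", "bowel", "obstruction")))
```

Dividing each count by n_+++ gives the maximum-likelihood probabilities used in the ratios.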

Results Including Monolithic Terms

[Bar chart: percentage agreement with human experts, by technique]
Left branching: 35.5%
Right branching: 53.4%
Our Method: 74.8%

Results without Monolithic Terms

[Bar chart: percentage agreement with human experts, by technique]
Left branching: 39.5%
Right branching: 59.5%
Our Method: 83.5%
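The baseline bars are consistent with a technique that assigns the same structure to every term, so that its agreement is simply that structure's share of the test set. A short Python check under that assumption, using the test-set counts from the slides (the function name is illustrative).

```python
# Expert-assigned structures of the 708 SNOMED-CT test terms, as given on the slides.
TEST_SET = {
    "Monolithic": 73,
    "Non-branching": 6,
    "Right-branching": 378,
    "Left-branching": 251,
}


def constant_baseline(label: str, exclude: tuple[str, ...] = ()) -> float:
    """Percent agreement if every remaining term is assigned `label`."""
    kept = {k: v for k, v in TEST_SET.items() if k not in exclude}
    return round(100 * kept[label] / sum(kept.values()), 1)


for label in ("Left-branching", "Right-branching"):
    print(label,
          constant_baseline(label),                           # all 708 terms
          constant_baseline(label, exclude=("Monolithic",)))  # 635 terms
```

The four printed values match the Left branching and Right branching bars in the two charts.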

Limitations

Monolithic structures: these could possibly be identified through collocation extraction or dictionary lookup
As the number of words in a term grows, so does the number of hypotheses (models) to be evaluated:
  only consider adjacent models
  limit the length of terms to 5 or 6 words
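The slides do not spell out which models would be evaluated for longer terms. Assuming every way of grouping adjacent words (flat, non-branching groups included) is a candidate model, the Python sketch below counts them; for three words it gives 3, the number of models fitted here. Strictly binary bracketings (Catalan numbers) are counted for comparison. All function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def groupings(n: int) -> int:
    """Trees over n ordered words in which every internal node joins two or
    more adjacent constituents (so flat, non-branching groups are allowed)."""
    if n == 1:
        return 1
    # The root's first child covers k words; the rest form one or more children.
    return sum(groupings(k) * child_sequences(n - k) for k in range(1, n))


@lru_cache(maxsize=None)
def child_sequences(m: int) -> int:
    """Ways to split m ordered words into consecutive constituents."""
    if m == 0:
        return 1
    return sum(groupings(k) * child_sequences(m - k) for k in range(1, m + 1))


@lru_cache(maxsize=None)
def binary_groupings(n: int) -> int:
    """Strictly binary bracketings of n words (Catalan numbers)."""
    if n == 1:
        return 1
    return sum(binary_groupings(k) * binary_groupings(n - k) for k in range(1, n))


for n in range(3, 7):
    print(n, groupings(n), binary_groupings(n))
```

Under this assumption a six-word term already has 197 candidate models (42 if only binary bracketings are allowed), which illustrates why restricting the models or the term length helps.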

Conclusions

Presented a simple but effective method to identify the structure of three-word terms

The method uses the Log Likelihood Ratio

Could be extended to identify the structure of four-, five- and six-word terms

Future Work

Improve the accuracy of the method:
  explore other measures of association (Chi-squared, Phi, Dice coefficient ...)
  incorporate multiple measures together
Extend our method to four- and five-word terms
  difficulty: finding a test set
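As an illustration of swapping in another measure, a Python sketch of Pearson's chi-squared statistic, X^2 = sum of (n - m)^2 / m over all cells, used the same way as the Log Likelihood Ratio: the model whose expected counts score lowest is selected. This is an assumption about how such an extension might look, not the authors' implementation; the names and toy counts are hypothetical.

```python
from typing import Callable, Mapping, Sequence

Measure = Callable[[Sequence[float], Sequence[float]], float]


def pearson_chi_squared(observed: Sequence[float], expected: Sequence[float]) -> float:
    """X^2 = sum over cells of (n - m)^2 / m; expected counts must be positive."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((n - m) ** 2 / m for n, m in zip(observed, expected))


def best_model(observed: Sequence[float],
               expected_by_model: Mapping[str, Sequence[float]],
               measure: Measure = pearson_chi_squared) -> str:
    """Return the model whose expected counts give the lowest score."""
    scores = {name: measure(observed, exp) for name, exp in expected_by_model.items()}
    return min(scores, key=scores.__getitem__)


# Toy 4-cell example; real use would pass the 8 cells of a 2x2x2 trigram table.
observed = [10, 20, 30, 40]
candidates = {"model A": [10, 20, 30, 40], "model B": [25, 25, 25, 25]}
print(pearson_chi_squared(observed, candidates["model B"]))  # 20.0
print(best_model(observed, candidates))                      # model A
```

Because the measure is a parameter, several measures could be scored side by side, which is one possible route to combining them.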

Thank you

Software:

Ngram Statistics Package (NSP): www.d.umn.edu/~tpederse/nsp.html

Log Likelihood Ratio Models: www.cs.umn.edu/~bthomson/mti.html

Log Likelihood Equation

$$LL = 2 \sum_{x,y,z} n_{xyz} \log\frac{n_{xyz}}{m_{xyz}}$$
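A direct Python transcription of the equation over flat lists of observed counts n and expected counts m. It assumes the natural logarithm (the slide does not state the base; another base rescales every score by the same constant, so the model ranking is unchanged) and the usual convention that cells with n = 0 contribute nothing. The function name is illustrative.

```python
import math
from typing import Sequence


def log_likelihood_ratio(observed: Sequence[float], expected: Sequence[float]) -> float:
    """LL = 2 * sum over cells of n * ln(n / m); cells with n == 0 add 0."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return 2 * sum(n * math.log(n / m) for n, m in zip(observed, expected) if n > 0)


print(log_likelihood_ratio([10, 20, 30, 40], [10, 20, 30, 40]))  # 0.0
print(log_likelihood_ratio([10, 20, 30, 40], [25, 25, 25, 25]))
```

A perfect fit scores 0, and worse fits score higher, which is why the lowest value identifies the best model.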

Expected Values

$$LL = 2 \sum_{x,y,z} n_{xyz} \log\frac{n_{xyz}}{m_{xyz}}$$

Non-branching:
$$m_{xyz} = \frac{n_{x++}\, n_{+y+}\, n_{++z}}{n_{+++}^{2}}$$

Left-branching:
$$m_{xyz} = \frac{n_{xy+}\, n_{++z}}{n_{+++}}$$

Right-branching:
$$m_{xyz} = \frac{n_{x++}\, n_{+yz}}{n_{+++}}$$
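A Python sketch that builds the expected counts for each model from a 2x2x2 trigram table and picks the model with the lowest LL. It assumes a table layout n[x][y][z] with index 0 for "the term's word is in this position" and 1 for "some other word", and divides the non-branching product by n_{+++} squared so that, like the other two models, its expected counts sum to N. Models are labeled by the bracketing each formula implies; the names and the toy table are hypothetical.

```python
import math
from itertools import product

# Cells (x, y, z): 0 = the term's word is in that position, 1 = some other word.
CELLS = list(product((0, 1), repeat=3))


def expected_counts(n, model):
    """Expected counts m_xyz for a 2x2x2 table n[x][y][z] under one model."""
    N = sum(n[x][y][z] for x, y, z in CELLS)                                # n_+++
    nx = [sum(n[x][y][z] for y in (0, 1) for z in (0, 1)) for x in (0, 1)]   # n_x++
    ny = [sum(n[x][y][z] for x in (0, 1) for z in (0, 1)) for y in (0, 1)]   # n_+y+
    nz = [sum(n[x][y][z] for x in (0, 1) for y in (0, 1)) for z in (0, 1)]   # n_++z
    nxy = [[n[x][y][0] + n[x][y][1] for y in (0, 1)] for x in (0, 1)]        # n_xy+
    nyz = [[n[0][y][z] + n[1][y][z] for z in (0, 1)] for y in (0, 1)]        # n_+yz
    formulas = {
        "w1 w2 w3": lambda x, y, z: nx[x] * ny[y] * nz[z] / N**2,  # non-branching
        "[w1 w2] w3": lambda x, y, z: nxy[x][y] * nz[z] / N,
        "w1 [w2 w3]": lambda x, y, z: nx[x] * nyz[y][z] / N,
    }
    m = formulas[model]
    return {(x, y, z): m(x, y, z) for x, y, z in CELLS}


def log_likelihood(n, m):
    """LL = 2 * sum n_xyz * ln(n_xyz / m_xyz); empty cells contribute 0."""
    return 2 * sum(n[x][y][z] * math.log(n[x][y][z] / m[x, y, z])
                   for x, y, z in CELLS if n[x][y][z] > 0)


def fit(n):
    """Score every model and return (lowest-LL model, all scores)."""
    scores = {model: log_likelihood(n, expected_counts(n, model))
              for model in ("w1 w2 w3", "[w1 w2] w3", "w1 [w2 w3]")}
    return min(scores, key=scores.__getitem__), scores


# Toy table in which w1 and w2 always occur together and w3 is independent of them.
table = [[[2, 8], [0, 0]],
         [[0, 0], [18, 72]]]
best, scores = fit(table)
print(best)  # [w1 w2] w3
```

With this toy table the [w1 w2] w3 model reproduces every cell exactly, so its LL is 0 and it is selected.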