Extracting BI-RADS Features from Portuguese Clinical Texts H. Nassif, F. Cunha, I.C. Moreira, R. Cruz-Correia, E. Sousa, D. Page, E. Burnside, and I. Dutra University of Wisconsin – Madison, and University of Porto, Portugal

Dec 31, 2015


Lucas Rogers

Extracting BI-RADS Features from Portuguese Clinical Texts

H. Nassif, F. Cunha, I.C. Moreira, R. Cruz-Correia, E. Sousa, D. Page, E. Burnside, and I. Dutra

University of Wisconsin – Madison, and University of Porto, Portugal


The American Cancer Society, Cancer Facts & Figures 2009.


[Figure labels: Mammogram; Radiologist; Impression (free text); Structured Database; Predictive Model; Benign / Malignant]


BI-RADS Lexicon

Concepts


            Lobular Shape   Oval Shape   Obscured Margin   …
Report 1    0               1            0                 …
Report 2    1               0            1                 …
…           …               …            …                 …

Example

• In the right breast, an approximately 1.0 cm mass is identified in the right upper slightly inner breast. This mass is noncalcified and partially obscured and lobulated in appearance.
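As a minimal sketch of how a report's extracted concepts could become one row of a 0/1 feature table like the one on this slide: the concept identifiers and the `to_feature_row` helper below are hypothetical names chosen for illustration, not taken from the authors' system.

```python
# Hypothetical concept identifiers, modeled on the slide's column headers.
CONCEPTS = ["lobular_shape", "oval_shape", "obscured_margin"]

def to_feature_row(found_concepts):
    """Return a 0/1 list with one entry per concept in CONCEPTS."""
    found = set(found_concepts)
    return [1 if concept in found else 0 for concept in CONCEPTS]

# Concepts a reader might extract from the example sentence
# ("partially obscured and lobulated in appearance").
example_row = to_feature_row(["obscured_margin", "lobular_shape"])
print(example_row)
```

Each report yields one such row, and stacking the rows gives the structured table that a predictive model can consume.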

Concepts


Nassif 09


Syntax Analyzer

• Tokenize sentences
• Discard punctuation
• Keep stop words
• Stem words
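The four steps above can be sketched as follows. This is only an illustration: the suffix list is a crude stand-in for a real Portuguese stemmer, and the function names are hypothetical rather than the authors' code.

```python
import re

# Toy suffix list: a stand-in for a real Portuguese stemmer, NOT the authors' stemmer.
SUFFIXES = ("ções", "ção", "mente", "as", "os", "es", "a", "o", "s")

def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Return one list of lowercased, stemmed tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"\w+", sentence.lower())   # punctuation is dropped here
        result.append([crude_stem(t) for t in tokens])  # stop words are kept
    return result

print(analyze("Sem nódulos. Margem obscurecida!"))
```

Stop words such as "sem" are deliberately kept because they can later act as negation triggers.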


Information from Lexicon

• Translate lexicon into Portuguese
• Lexicon specifies synonyms:
– E.g.: Equal density, Isodense
• Lexicon allows for ambiguous wording:

Text                       Concept
indistinct margin          indistinct margin
indistinct calcification   amorphous calcification
indistinct image           not a concept
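One way to picture the synonym and ambiguity handling is a pair of lookup tables. The sketch below uses the English wording from the slide as a stand-in for the translated Portuguese entries; `canonical` and `to_concept` are hypothetical helper names.

```python
# Synonyms collapse to one canonical term (example pair from the slide).
SYNONYMS = {"isodense": "equal density"}

# An ambiguous descriptor maps to a concept only in combination with the noun it modifies.
CONTEXT_RULES = {
    ("indistinct", "margin"): "indistinct margin",
    ("indistinct", "calcification"): "amorphous calcification",
}

def canonical(term):
    """Replace a known synonym by its canonical lexicon term."""
    return SYNONYMS.get(term, term)

def to_concept(descriptor, noun):
    """Return the lexicon concept for a descriptor/noun pair, or None if it is not a concept."""
    return CONTEXT_RULES.get((descriptor, noun))

print(canonical("isodense"))
print(to_concept("indistinct", "calcification"))
print(to_concept("indistinct", "image"))
```

The last call returns `None`, matching the "not a concept" row of the table.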


Experts

• Provide domain specific information
– Synonyms: Oval, Ovoid
– Acronyms, abbreviations
– Domain idiosyncrasies

• Interact with and modify semantic rules


Concept Finder

• Regular expression rules
• Extract concepts from text
• Rule formation:
– Initial rules based on lexicon
– Rules refined by experts


Rule Generation Example 1

• Aim: Regional Distribution concept
• Lexicon specifies the word “regional”
• Initial rule: presence of the word “regional”
• Run on training set, experts see results
• Many false positives:
– “regional medical center”, “regional hospital”
• Rule refined by experts:
– “regional .* !(medical|hospital)”
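The refined rule on the slide is written in an informal notation. One possible reading, assumed here, is "the word regional, unless it is directly followed by medical or hospital", which Python's `re` module can express with a negative lookahead. The pattern and the `has_regional_distribution` helper are an illustrative sketch, not the authors' actual rule.

```python
import re

# Assumed reading of the slide's rule: "regional" not directly followed by
# "medical" or "hospital". (?!...) is a negative lookahead.
REGIONAL_RULE = re.compile(r"\bregional\b(?!\s+(?:medical|hospital)\b)", re.IGNORECASE)

def has_regional_distribution(text):
    """True if the refined rule fires anywhere in the text."""
    return REGIONAL_RULE.search(text) is not None

print(has_regional_distribution("calcifications in a regional distribution"))
print(has_regional_distribution("films from the regional medical center"))
print(has_regional_distribution("seen at the regional hospital"))
```

The first call fires the rule, while the two false-positive phrases from the slide no longer match.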


Rule Generation Example 2

• Aim: Skin Thickening concept
• Lexicon specifies “skin thickening”
• Try “skin” and “thickening” in same sentence
– “skin retraction and thickening”
– “thickening of the overlying skin”
– “A BB placed on the skin overlying a palpable focal area of thickening in the upper outer right breast”
• Experts suggest “skin” and “thickening” in close proximity
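A proximity rule of this kind can be sketched by comparing token positions. The `within_scope` helper and the cutoff of 4 tokens are assumptions made for illustration; the slides do not state the distance the experts chose.

```python
import re

def within_scope(sentence, word_a, word_b, max_distance):
    """True if word_a and word_b occur at most max_distance token positions apart."""
    tokens = re.findall(r"\w+", sentence.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == word_a]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

sentences = [
    "skin retraction and thickening",
    "thickening of the overlying skin",
    "A BB placed on the skin overlying a palpable focal area of thickening "
    "in the upper outer right breast",
]
for s in sentences:
    # 4 is an illustrative cutoff, not the value used in the study.
    print(within_scope(s, "skin", "thickening", 4), "-", s)
```

With this cutoff the first two phrases match and the third, where the two words are 7 tokens apart, does not.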


Scope

• Scope: distance between two words
• Start with a large scope:
– assess number of true and false positives
• Move to smaller scopes:
– assess number of false negatives
• Check precision and recall estimates
• Experts decide on the best distance
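The scope search above can be illustrated with a small sweep. The data below are made-up toy values (word distance and expert label per sentence), not results from the study, and `precision_recall` is a hypothetical helper.

```python
# Made-up toy data, NOT from the study: (distance between the two words, expert label).
EXAMPLES = [(3, True), (4, True), (7, False), (9, True)]

def precision_recall(scope, examples=EXAMPLES):
    """Fire the rule when distance <= scope; return (precision, recall)."""
    tp = sum(1 for d, label in examples if d <= scope and label)
    fp = sum(1 for d, label in examples if d <= scope and not label)
    fn = sum(1 for d, label in examples if d > scope and label)
    precision = tp / (tp + fp) if tp + fp else 0.0  # rule never fired: report 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0     # no positive labels: report 0.0
    return precision, recall

for scope in (10, 7, 4, 3):  # start with a large scope, then shrink it
    p, r = precision_recall(scope)
    print(f"scope={scope:2d} precision={p:.2f} recall={r:.2f}")
```

Shrinking the scope removes false positives at the cost of more false negatives, which is the trade-off the experts weigh when choosing the final distance.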


Negation Detector

• Negation triggers (Mutalik 01, Gindl 08):
– “não” (not) when not preceded by “onde” (where)
– “sem” (without)
– “nem” (nor)
• Precedes or appears within the subsentence
• Establish negation scope
• “without evidence of suspicious cluster of microcalcifications”
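The trigger rules above can be sketched with a token-based check. Two simplifications are assumed here: subsentences are split at commas, semicolons and colons, and only triggers that appear inside the same subsentence as the concept are considered. The Portuguese test sentences are invented for illustration, and the helper names are hypothetical.

```python
import re

def split_subsentences(sentence):
    """Assumed scope boundary: commas, semicolons and colons."""
    return [part.strip() for part in re.split(r"[,;:]", sentence) if part.strip()]

def has_negation(subsentence):
    """True if the subsentence contains 'sem', 'nem', or 'não' not preceded by 'onde'."""
    tokens = re.findall(r"\w+", subsentence.lower())
    for i, token in enumerate(tokens):
        if token in ("sem", "nem"):
            return True
        if token == "não" and (i == 0 or tokens[i - 1] != "onde"):
            return True
    return False

def is_concept_negated(sentence, concept_pattern):
    """Check negation only in the subsentence where the concept pattern matches."""
    for sub in split_subsentences(sentence):
        if re.search(concept_pattern, sub, re.IGNORECASE):
            return has_negation(sub)
    return False

report = "Mama direita sem microcalcificações, com nódulo oval"
print(is_concept_negated(report, r"microcalcifica"))
print(is_concept_negated(report, r"nódulo"))
```

Here the microcalcification concept is flagged as negated, while the nodule in the second subsentence is not.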


Dataset

• Training set: 1,129 reports, unlabeled
• Testing set: 153 pairs, labeled by radiologist
– Basic screening report
– Detailed diagnostic report
• Perform three refinement passes
– Double blind, based on lexicon
– Refine rules
– Refine manual labeling and rules


Results


Conclusion

• Out of 48 disputed cases, parser correctly classified 25 (52.1%)

• First Portuguese BI-RADS extractor
– Discovers features missed or misclassified
– Similar performance to manual annotation

• Method portable to other languages