EliXR: An Approach to Eligibility Criteria Extraction and Representation
Chunhua Weng, PhD, Zhihui Luo, PhD, Stephen B. Johnson, PhD
Department of Biomedical Informatics, Columbia University
March 10, 2011
Chunhua Weng, PhD - EliXR

Oct 14, 2014

Vadim Pavlov

Transcript
Page 1: Chunhua Weng, PhD - EliXR

EliXR: An Approach to Eligibility Criteria Extraction and Representation

Chunhua Weng, PhD, Zhihui Luo, PhD, Stephen B. Johnson, PhD
Department of Biomedical Informatics
Columbia University
March 10, 2011

Page 2: Chunhua Weng, PhD - EliXR

Problem

Free-text clinical research eligibility criteria are not amenable to machine processing.

Computational representations (e.g., ontologies) are much needed to support electronic eligibility determination, clinical evidence application, clinical research knowledge management, etc.

Page 3: Chunhua Weng, PhD - EliXR

Related Work

• Eligibility Rule Grammar and Ontology (ERGO)
• Agreement on Standardized Protocol Inclusion Requirements for Eligibility (ASPIRE)
• Many other prior efforts [1]

1. Weng C, Tu SW, Sim I, Richesson R. Formal Representations of Eligibility Criteria: A Literature Review. Journal of Biomedical Informatics 43 (2010), 451-467.

Page 4: Chunhua Weng, PhD - EliXR

The Research Gap

• Plethora of representations, no canonical model

• Ontology for human annotation vs. ontology for NLP

• Lacking ontology and NLP symbiosis

Page 5: Chunhua Weng, PhD - EliXR

Our Research Question

Can we induce templates that can facilitate both representation and extraction from criteria text?

(A template is a “world model” for eligibility criteria as a semantic network)

Page 6: Chunhua Weng, PhD - EliXR

From Text to Templates

TEXT
• Phrases
• Sentence
• Phrase Co-occurrence Frequency

TEMPLATES
• Concepts
• Semantic Relationships

Template development = segmentation of UMLS Semantic Network for the eligibility criteria domain

Page 7: Chunhua Weng, PhD - EliXR

Methods: The EliXR Framework

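[Figure: the EliXR framework. Box labels are listed in the order they appear on the slide; bracketed numbers are the superscripts shown on the boxes.]
• Criteria Corpus
• Lexicon Creation [1]
• Semantic Annotation [1]
• Semantic Dependency Parsing [4]
• Template Induction [6]
• Semantic Pattern Mining [5]
• Template Filling
• Dynamic Criteria Categorization [2,3]
• Structured Criteria
• UMLS
• Automatic Template Selection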

Page 8: Chunhua Weng, PhD - EliXR

Semantic Annotation Compared with MMTx

Example: Patients with complications such as serious cardiac, renal and hepatic disorders.

EliXR Annotation: {Patients | Patient or Disabled Group} {with|} {complications | Pathologic Function} {such|} {as|} {serious | Qualitative Concept} {cardiac | Body Part, Organ, or Organ Component} {renal | Body Part, Organ, or Organ Component} {and|} {hepatic | Body Location or Region} {disorders | Disease or Syndrome} {.|.|}

MMTx 2.4C Annotation: {Patients | Patient or Disabled Group} {with complications | Pathologic Function} {such as serious cardiac, renal | Idea or Concept} {and|} {hepatic disorders | Disease or Syndrome}

[Luo, CRI-10]
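The EliXR annotation above assigns one UMLS semantic type per token. A minimal sketch of that token-level lookup follows, assuming a tiny hand-made lexicon copied from the example. The `LEXICON` dictionary and the `annotate` and `render` helpers are illustrative names, not part of EliXR, and punctuation is simply skipped here.

```python
import re

# Hypothetical mini-lexicon (token -> UMLS semantic type), copied from the
# example annotation. The actual EliXR lexicon is not reproduced here.
LEXICON = {
    "patients": "Patient or Disabled Group",
    "complications": "Pathologic Function",
    "serious": "Qualitative Concept",
    "cardiac": "Body Part, Organ, or Organ Component",
    "renal": "Body Part, Organ, or Organ Component",
    "hepatic": "Body Location or Region",
    "disorders": "Disease or Syndrome",
}


def annotate(sentence):
    """Return (token, semantic_type) pairs; unknown tokens get an empty type."""
    tokens = re.findall(r"[A-Za-z]+", sentence)  # assumption: words only
    return [(tok, LEXICON.get(tok.lower(), "")) for tok in tokens]


def render(pairs):
    """Format pairs in the {token | type} notation used on the slide."""
    parts = []
    for token, sem_type in pairs:
        if sem_type:
            parts.append("{%s | %s}" % (token, sem_type))
        else:
            parts.append("{%s|}" % token)
    return " ".join(parts)


sentence = ("Patients with complications such as serious cardiac, "
            "renal and hepatic disorders.")
pairs = annotate(sentence)
print(render(pairs))
```

Looking up one token at a time is what keeps "cardiac", "renal" and "hepatic" as separate annotations, in contrast to the longer phrase chunks in the MMTx output.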

Page 9: Chunhua Weng, PhD - EliXR

The 27 Eligibility Criteria Categories [Luo, AMIA-10]

Page 10: Chunhua Weng, PhD - EliXR

Dependency Parsing for “at least 1 week since discontinuation of prior pulmonary hypertension medication”

Page 11: Chunhua Weng, PhD - EliXR

Frequent Semantic Patterns

Groups                         Semantic patterns in each group
Disease Criteria               60
Lab Results Criteria           36
Cancer Criteria                28
Medication Criteria            23
Therapy or Surgery Criteria    23
Temporal Expression            9
Total Unique Patterns          175

Page 12: Chunhua Weng, PhD - EliXR

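Aggregated Patterns for Disease Criteria

[Figure: semantic network of UMLS semantic type groups]
Nodes:
• CLAS | FTCN | QLCO | QNCO | CLNA
• DSYN | NEOP
• PATF | SOSY | FNDG
• TOPP | PHSU | CLDG
• BLOR | BPOC
• LBPR | DIAP
• TMCO
• VIRS
• PODG
• ORGA
Edge labels: Causes, Manifestation Of, Occurs in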

Page 13: Chunhua Weng, PhD - EliXR

Disease Criteria Template

[Figure: template as a semantic network]
Nodes:
• Modifier
• Disease
• Manifestation
• Treatment (Therapy or Drug)
• Body Location
• Diagnostic Procedure
• Temporal Constraints
• Etiology
• Population Group
• History
Edge labels: Causes, Manifestation of, Occurs in
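Template filling can be pictured as routing each annotated token to a template slot by its semantic type. The sketch below is only an illustration: the type-to-slot mapping in `SLOT_BY_TYPE` is an assumption inferred from the parallel layout of the aggregated-pattern figure and the template figure, the ORGA type and the History slot are left out because their pairing is unclear, and `fill_template` is a hypothetical helper name.

```python
from collections import defaultdict

# Assumed mapping from UMLS semantic type abbreviations to template slots.
# Inferred from the two figures; not confirmed by the slides.
SLOT_BY_TYPE = {
    "CLAS": "Modifier", "FTCN": "Modifier", "QLCO": "Modifier",
    "QNCO": "Modifier", "CLNA": "Modifier",
    "DSYN": "Disease", "NEOP": "Disease",
    "PATF": "Manifestation", "SOSY": "Manifestation", "FNDG": "Manifestation",
    "TOPP": "Treatment", "PHSU": "Treatment", "CLDG": "Treatment",
    "BLOR": "Body Location", "BPOC": "Body Location",
    "LBPR": "Diagnostic Procedure", "DIAP": "Diagnostic Procedure",
    "TMCO": "Temporal Constraints",
    "VIRS": "Etiology",
    "PODG": "Population Group",
}


def fill_template(annotated_tokens):
    """Group tokens by template slot; tokens with unmapped types are skipped."""
    slots = defaultdict(list)
    for token, sem_type in annotated_tokens:
        slot = SLOT_BY_TYPE.get(sem_type)
        if slot is not None:
            slots[slot].append(token)
    return dict(slots)


# Tokens from the annotation example, tagged with type abbreviations.
annotated = [
    ("Patients", "PODG"), ("complications", "PATF"), ("serious", "QLCO"),
    ("cardiac", "BPOC"), ("renal", "BPOC"), ("hepatic", "BLOR"),
    ("disorders", "DSYN"),
]
filled = fill_template(annotated)
print(filled)
```

A flat slot dictionary ignores the Causes, Manifestation of, and Occurs in relationships; representing those would need edges between slots, which this sketch does not attempt.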

Page 14: Chunhua Weng, PhD - EliXR

Micro-Templates for Temporal Expressions

[Figure: class diagram for temporal expressions]
Legend: Class, attribute, Has-a
Nodes:
• Temporal Expression
• temporal relationship
• reference interval
• temporal pattern
• intrinsic temporal pattern
• intrinsic duration
• anchor event
• event
• cycle
• frequency
Edge cardinalities shown: 0:m on most Has-a links, 1:m on one link
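One way to read the micro-template is as a record whose slots each hold zero or more values. The sketch below assumes that reading; the `TemporalExpression` class and its field names are illustrative, every slot is modeled as a list because the figure residue does not say which link carries the 1:m cardinality, and the slot assignment for the example phrase is a guess rather than the authors' encoding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TemporalExpression:
    # Each slot is a list to reflect the 0:m style cardinalities.
    temporal_relationship: List[str] = field(default_factory=list)
    reference_interval: List[str] = field(default_factory=list)
    anchor_event: List[str] = field(default_factory=list)
    event: List[str] = field(default_factory=list)
    temporal_pattern: List[str] = field(default_factory=list)
    intrinsic_temporal_pattern: List[str] = field(default_factory=list)
    intrinsic_duration: List[str] = field(default_factory=list)
    cycle: List[str] = field(default_factory=list)
    frequency: List[str] = field(default_factory=list)

    def filled_slots(self):
        """Return only the slots that hold at least one value."""
        return {name: values for name, values in vars(self).items() if values}


# Assumed filling for the dependency-parsing example phrase:
# "at least 1 week since discontinuation of prior pulmonary
#  hypertension medication"
expr = TemporalExpression(
    temporal_relationship=["since"],
    reference_interval=["at least 1 week"],
    anchor_event=["discontinuation of prior pulmonary hypertension medication"],
)
print(expr.filled_slots())
```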

Page 15: Chunhua Weng, PhD - EliXR

AQUA Parsing Accuracy

• Only 900 criteria sentences were used for training
• A human review served as the gold standard

Five test sets (100 criteria each)

Test set    Tree Structure Correctness
1           90.60%
2           94.30%
3           92.80%
4           93.00%
5           93.00%
Avg.        92.70%
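As a quick check of the reported average, the unweighted mean of the five test-set scores can be recomputed. This assumes the average on the slide is a simple mean over the five sets, which all have the same size.

```python
from statistics import mean

# Tree structure correctness (%) for the five test sets, from the table.
scores = [90.60, 94.30, 92.80, 93.00, 93.00]

avg = mean(scores)
print(f"{avg:.2f}%")  # unweighted mean of the five sets
```

The mean comes out at 92.74%, which agrees with the reported 92.70% to one decimal place.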

Page 16: Chunhua Weng, PhD - EliXR

Evaluation of Semantic Patterns

Semantic Type Patterns (min. support = 2)
  Frequent Sub-trees:          Total Patterns 1825; Total Binary Patterns 669; Coverage 91.30%
  Maximal Frequent Sub-trees:  Total Patterns 175; Coverage 81.30%
  Min. Pattern Cover:          Total Number 183; Overlap with Patterns 90

Semantic Group Patterns (min. support = 2)
  Frequent Sub-trees:          Total Patterns 2378; Total Binary Patterns 120; Coverage 92.50%
  Maximal Frequent Sub-trees:  Total Patterns 39; Coverage 90.60%
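The role of minimum support and coverage can be illustrated on a toy corpus. The sketch below is a simplification, not the tree-mining algorithm cited in the references: it only counts binary patterns (one head type and one dependent type), the four example trees are invented, and coverage is assumed to mean the share of criteria that contain at least one frequent pattern.

```python
from collections import Counter

# Toy corpus: each criterion is the set of (head_type, dependent_type) edges
# from a hypothetical semantic dependency parse, using UMLS abbreviations.
trees = [
    {("DSYN", "BPOC"), ("DSYN", "QLCO")},
    {("DSYN", "BPOC"), ("DSYN", "TMCO")},
    {("PHSU", "TMCO"), ("DSYN", "QLCO")},
    {("LBPR", "QNCO")},
]


def frequent_binary_patterns(trees, min_support=2):
    """Keep edges that occur in at least min_support trees."""
    # Each tree is a set, so an edge is counted at most once per tree.
    counts = Counter(edge for tree in trees for edge in tree)
    return {edge: n for edge, n in counts.items() if n >= min_support}


def coverage(trees, patterns):
    """Fraction of trees containing at least one frequent pattern."""
    covered = sum(1 for tree in trees if any(e in patterns for e in tree))
    return covered / len(trees)


patterns = frequent_binary_patterns(trees, min_support=2)
print(sorted(patterns.items()))
print(coverage(trees, patterns))
```

With a minimum support of 2, two of the five distinct edges survive, and they cover three of the four toy criteria.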

Page 17: Chunhua Weng, PhD - EliXR

Contributions

1. Templates with rich semantics that can be mapped to UMLS and semantically aligned with text;

2. A method for segmenting UMLS for bootstrapping knowledge representation for eligibility criteria;

3. A method combining machine learning and dependency tree pattern mining for iterative, (semi-)automatic knowledge acquisition.

Page 18: Chunhua Weng, PhD - EliXR

Acknowledgements

• NLM R01 LM009886 (04/01/09 - ) “Bridging the semantic gap between eligibility criteria and clinical data” (PI: Weng)

• Colleagues on the AQUA and EliXR team
– Eneida Mendonça, MD, PhD
– Robert Duffy, MS
– Xiaoying Wu, PhD

• Feedback from Ida Sim, Samson Tu, James J. Cimino, Nigam Shah, GQ Zhang, Albert Lai

Page 19: Chunhua Weng, PhD - EliXR

Resources for Sharing & Collaboration

1. A UMLS-based semantic lexicon for eligibility criteria

2. A semantic annotator

3. A dynamic semantic classifier for criteria sentences

4. A dependency parser with enriched semantic information

5. A temporal expression ontology for eligibility criteria

6. A tool for temporal expression extraction and encoding

Page 20: Chunhua Weng, PhD - EliXR

References

1. Johnson SB. Conceptual graph grammar--a simple formalism for sublanguage. Methods of Information in Med. 1998 Nov;37(4-5):345-52.
2. Johnson SB. A semantic lexicon for medical language processing. J Am Med Inform Assoc, 6:205-218, 1999.
3. Campbell DA, Johnson SB. A transformational-based learner for dependency grammars in discharge summaries. Proc. of ACL-02 workshop on BioNLP, 37-44, 2002.
4. Zaki MJ. Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Trans. Knowl. Data Eng., 17(8):1021-1035, 2005.
5. Luo Z, Duffy R, Johnson SB, Weng C. Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS. Proc of AMIA Summit on Clinical Research Informatics. 2010: 26-31.
6. Luo Z, Johnson SB, Chase HS, Weng C. Semi-automatically Inducing Semantic Classes of Clinical Research Eligibility Criteria Using UMLS and Hierarchical Clustering. Proc of AMIA Symp 2010, 487-91.
7. Weng C, Luo Z. Dynamic Categorization of Eligibility Criteria. Proc of AMIA Fall Symp 2010, 1306.