Recent Efforts in Clinical NLP: Clinical Text Analysis and Knowledge Extraction System (cTAKES) Guergana K. Savova, PhD Children’s Hospital Boston and Harvard Medical School

Dec 17, 2015

Steven Bradford
Transcript
Page 1:

Recent Efforts in Clinical NLP: Clinical Text Analysis and Knowledge Extraction System (cTAKES)

Guergana K. Savova, PhD
Children’s Hospital Boston and Harvard Medical School

Page 2:

Acknowledgements

Software developers and contributors at different times (in no specific order)
• James Masanz, Mayo Clinic
• Patrick Duffy, Mayo Clinic
• Philip Ogren, University of Colorado
• Sean Murphy, Mayo Clinic
• Vinod Kaggal, Mayo Clinic
• Jiaping Zheng, Children’s Hospital Boston
• Pei Chen, Children’s Hospital Boston
• Jinho Choi, University of Colorado

Investigators (in no specific order)
• Christopher Chute, MD, DrPH, Mayo Clinic
• James Buntrock, MS, Mayo Clinic
• Guergana Savova, PhD, Children’s Hospital Boston

Page 3:

Overview

• Background
• Clinical Text Analysis and Knowledge Extraction System (cTAKES)
• cTAKES for developers
  • Download and install of cTAKES
  • How to build the dictionary
• cTAKES: graphical user interface

Page 4:

Definitions

• Information Extraction (IE)
  • Extracting existing facts from unstructured or loosely structured text into a structured form
• Information Retrieval (IR)
  • Finding documents relevant to a user query
• Named Entity Recognition (NER)
  • Discovery of groups of textual mentions that belong to a certain semantic class
• Natural Language Processing (NLP)
  • Computational methods for text processing based on linguistically sound principles
• Clinical NLP – NLP for the clinical narrative
• Biomedical NLP – NLP for the clinical narrative and biomedical literature

Page 5:

Problem Space

• Structured information
  • Relational databases
  • Easy to extract information from them
• Semi-structured information
  • Loosely formatted XML, CSV tables
  • Not challenging to extract information
• Unstructured information
  • Scholarly literature, clinical notes, research reports, webpages
  • Majority of information is unstructured!
  • Real challenge to extract the information

Page 6:

Overarching Goal

• Open-source, general-purpose clinical NLP toolkit
  • Phenotype extraction from unstructured data
  • Library of modules
  • Cohesive with other initiatives
  • Cutting edge methodologies
  • Best software development practices
• Our principles
  • Open source
  • Scalable and robust
  • Modular and expandable
  • Based on existing standards and conventions
  • Scalable, adaptable methodologies through open collaboration in the open-source development

Page 7:

Processing Clinical Notes

A 43-year-old woman was diagnosed with type 2 diabetes mellitus by her family physician 3 months before this presentation. Her initial blood glucose was 340 mg/dL. Glyburide 2.5 mg once daily was prescribed. Since then, self-monitoring of blood glucose (SMBG) showed blood glucose levels of 250-270 mg/dL. She was referred to an endocrinologist for further evaluation.

On examination, she was normotensive and not acutely ill. Her body mass index (BMI) was 18.7 kg/m2 following a recent 10 lb weight loss. Her thyroid was symmetrically enlarged and ankle reflexes absent. Her blood glucose was 272 mg/dL, and her hemoglobin A1c (HbA1c) was 10.3%. A lipid profile showed a total cholesterol of 261 mg/dL, triglyceride level of 321 mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid function was normal. Urinalysis showed trace ketones.

She adhered to a regular exercise program and vitamin regimen, smoked 2 packs of cigarettes daily for the past 25 years, and limited her alcohol intake to 1 drink daily. Her mother's brother was diabetic.

Page 8:

Clinical Element Model
http://intermountainhealthcare.org/cem/Pages/home.aspx

Disorder CEM
text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated

Disorder CEM
text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (empty); negation indicator: not negated

Tobacco Use CEM
text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated

Medication CEM
text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg
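
Each CEM instance on this slide is a flat attribute-value record. A minimal sketch of one way to hold such records in code follows; the class name `ClinicalElement` and its field names are illustrative assumptions and are not taken from the Intermountain CEM schema or from any cTAKES type.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ClinicalElement:
    """Hypothetical flat record mirroring the attributes listed on the slide."""
    cem_type: str                 # e.g. "Disorder", "Medication"
    text: str
    code: str                     # kept as a string so codes are never reformatted
    subject: str
    negated: bool = False
    relative_temporal_context: Optional[str] = None
    frequency: Optional[str] = None
    strength: Optional[str] = None


# The four instances shown for the example note.
CEMS = [
    ClinicalElement("Disorder", "diabetes mellitus", "73211009", "patient",
                    relative_temporal_context="3 months ago"),
    ClinicalElement("Disorder", "diabetes mellitus", "73211009", "family member"),
    ClinicalElement("Tobacco Use", "smoking", "365981007", "patient",
                    relative_temporal_context="25 years"),
    ClinicalElement("Medication", "Glyburide", "315989", "patient",
                    frequency="once daily", strength="2.5 mg"),
]


def patient_elements(cems):
    """Keep only elements whose subject is the patient and that are not negated."""
    return [c for c in cems if c.subject == "patient" and not c.negated]
```

Filtering on `subject` is what separates the patient's own diabetes from the family-history mention, even though both carry the same text and code.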

Page 9:

Comparative Effectiveness

Disorder CEM
text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated

Disorder CEM
text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (empty); negation indicator: not negated

Tobacco Use CEM
text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated

Medication CEM
text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg

Compare the effectiveness of different treatment strategies (e.g., modifying target levels for glucose, lipid, or blood pressure) in reducing cardiovascular complications in newly diagnosed adolescents and adults with type 2 diabetes.

Compare the effectiveness of traditional behavioral interventions versus economic incentives in motivating behavior changes (e.g., weight loss, smoking cessation, avoiding alcohol and substance abuse) in children and adults.

Page 10:

Meaningful Use

Disorder CEM
text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated

Disorder CEM
text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (empty); negation indicator: not negated

Tobacco Use CEM
text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated

Medication CEM
text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg

• Maintain problem list

• Maintain active med list

• Record smoking status

• Provide clinical summaries for each office visit

• Generate patient lists for specific conditions

• Submit syndromic surveillance data
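
The first three items in this list can be read directly off the CEM records. The following is a rough sketch under stated assumptions: the dictionary keys and the `summarize` function are hypothetical, and a real smoking-status classifier distinguishes more categories than the single flag used here.

```python
# CEM records copied from the slide; the dictionary keys are illustrative.
CEMS = [
    {"type": "Disorder", "text": "diabetes mellitus", "code": "73211009",
     "subject": "patient", "negated": False},
    {"type": "Disorder", "text": "diabetes mellitus", "code": "73211009",
     "subject": "family member", "negated": False},
    {"type": "Tobacco Use", "text": "smoking", "code": "365981007",
     "subject": "patient", "negated": False},
    {"type": "Medication", "text": "Glyburide", "code": "315989",
     "subject": "patient", "negated": False},
]


def summarize(cems):
    """Derive a problem list, a medication list, and a tobacco flag for the patient."""
    own = [c for c in cems if c["subject"] == "patient" and not c["negated"]]
    return {
        "problems": sorted({c["text"] for c in own if c["type"] == "Disorder"}),
        "medications": sorted({c["text"] for c in own if c["type"] == "Medication"}),
        "tobacco_use_mentioned": any(c["type"] == "Tobacco Use" for c in own),
    }


summary = summarize(CEMS)
```

The family-member disorder is excluded from the problem list because its subject is not the patient.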

Page 11:

Clinical Practice

Disorder CEM
text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated

Medication CEM
text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg

• Provide problem list and meds from the visit

Page 12:

Applications

• Meaningful use of the EMR
• Comparative effectiveness
• Clinical investigation
  • Patient cohort identification
  • Phenotype extraction
• Epidemiology
• Clinical practice
• and many more…

With deep semantic processing, the sky is the limit for applications

Page 13:

Partnerships

• NCBC-funded initiatives
  • Integrating Data for Analysis, Anonymization and Sharing (iDASH)
  • Ontology Development and Information Extraction (ODIE)
• Veterans Administration
• Strategic Health Advanced Research Projects (SHARP)
  • SHARP 3: SMaRT app (http://www.smartplatforms.org/)
  • SHARP 4: www.sharpn.org
• R01s
  • Shared annotated lexical resource
  • Temporal relation discovery for the clinical domain
  • Multi-source integrated platform for answering clinical questions
• eMERGE, PGRN (Pharmacogenomics Research Network)
• Linguistic Data Consortium and Penn Treebank
• MITRE Corporation

Page 14:

Integrating cTAKES within i2b2

• Querying encrypted clinical notes stored in the i2b2 database
• Processing the resulting notes through cTAKES
• Persisting extracted concepts into the i2b2 database
• Thus, the concepts are now searchable by the researcher
• Enabling the training and running of classifiers directly from the i2b2 workbench

https://www.i2b2.org/events/slides/i2b2_AMIA_Tutorial_20100310.pdf

i2b2: “…a scalable informatics framework that will enable clinical researchers to use existing clinical data for discovery research and, when combined with IRB-approved genomic data, facilitate the design of targeted therapies for individual patients with diseases having genetic origins.”
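
The query, process, and persist steps on this slide can be sketched as a small loop. This is only an illustration: the two-table schema, the `extract_concepts` stand-in, and the in-memory SQLite database are assumptions, not the i2b2 data model or the cTAKES API, and decryption of the notes is omitted.

```python
import sqlite3

# Hypothetical stand-in for the NLP step: a tiny term-to-code dictionary.
# A real deployment would call cTAKES here instead.
TERM_CODES = {"diabetes mellitus": "73211009", "glyburide": "315989"}


def extract_concepts(note_text):
    """Return (term, code) pairs for dictionary terms found in the note."""
    lowered = note_text.lower()
    return [(term, code) for term, code in TERM_CODES.items() if term in lowered]


def run(conn):
    """Query notes, extract concepts, and persist them so they become searchable."""
    notes = conn.execute("SELECT note_id, note_text FROM notes").fetchall()
    for note_id, text in notes:
        rows = [(note_id, term, code) for term, code in extract_concepts(text)]
        conn.executemany(
            "INSERT INTO concepts (note_id, term, code) VALUES (?, ?, ?)", rows
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (note_id INTEGER PRIMARY KEY, note_text TEXT)")
conn.execute("CREATE TABLE concepts (note_id INTEGER, term TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO notes (note_id, note_text) VALUES (?, ?)",
    [(1, "Diagnosed with type 2 diabetes mellitus. Glyburide 2.5 mg once daily."),
     (2, "Thyroid function was normal.")],
)
run(conn)

# A researcher can now search by concept code rather than by raw text.
hits = conn.execute(
    "SELECT DISTINCT note_id FROM concepts WHERE code = ?", ("73211009",)
).fetchall()
```

Storing the extracted concepts in their own table is what makes them queryable without re-running the NLP step for every search.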

Page 15:

clinical Text Analysis and Knowledge Extraction System (cTAKES)

Page 16:

Page 17:

cTAKES Adoption

• May 2011: 2306 downloads*
• eMERGE (SGH, NW)
• PGRN (HMS, NW)
• Extensions: Yale (YATEX), MITRE

* Source: http://sourceforge.net/project/stats/?group_id=255545&ugn=ohnlp&type=&mode=alltime

Page 18:

cTAKES Technical Details

• Open source
  • Apache v2.0 license
  • http://sourceforge.net/projects/ohnlp/
  • Java 1.5
  • Dependency on UMLS which requires a UMLS license (free)
• Framework
  • IBM’s Unstructured Information Management Architecture (UIMA) open source framework, Apache project
• Methods
  • Natural Language Processing methods (NLP)
  • Based on standards and conventions to foster interoperability
• Application
  • High-throughput system

Page 19:

cTAKES: Components

• Sentence boundary detection (OpenNLP technology)
• Tokenization (rule-based)
• Morphologic normalization (NLM’s LVG)
• POS tagging (OpenNLP technology)
• Shallow parsing (OpenNLP technology)
• Named Entity Recognition
  • Dictionary mapping (lookup algorithm)
  • Machine learning (MAWUI)
  • types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
• Negation and context identification (NegEx)
• Dependency parser
• Drug Profile module
• Smoking status classifier
• CEM normalization module (soon to be released)
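
To make the ordering of the first components concrete, here is a toy pipeline covering three of the stages: sentence boundary detection, tokenization, and dictionary-based named entity recognition. It is a sketch under loud assumptions. The regular expressions, the two-entry dictionary, and all function names are illustrative and are not cTAKES code; cTAKES uses trained OpenNLP models for sentence detection, whereas this sketch substitutes a simple rule.

```python
import re

# Toy dictionary: surface form -> (semantic type, code as shown on earlier slides).
DICTIONARY = {
    "diabetes mellitus": ("disease/disorder", "73211009"),
    "glyburide": ("medication", "315989"),
}


def split_sentences(text):
    """Naive rule: break after ., ! or ? when followed by whitespace and a capital."""
    return [s for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip()) if s]


def tokenize(sentence):
    """Keep forms like 2.5, mg/dL, 43-year-old whole; emit other punctuation singly."""
    return re.findall(r"\w+(?:[./-]\w+)*|[^\w\s]", sentence)


def lookup(tokens, max_len=3):
    """Greedy longest-match dictionary lookup over lowercased token n-grams."""
    mentions, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in DICTIONARY:
                sem_type, code = DICTIONARY[phrase]
                mentions.append((phrase, sem_type, code))
                i += n          # skip past the matched span
                break
        else:
            i += 1              # no entry starts at this token
    return mentions


def annotate(text):
    """Run the stages in order and collect mentions per sentence."""
    return [(s, lookup(tokenize(s))) for s in split_sentences(text)]


NOTE = ("A 43-year-old woman was diagnosed with type 2 diabetes mellitus. "
        "Glyburide 2.5 mg once daily was prescribed.")
result = annotate(NOTE)
```

Each stage consumes the output of the previous one, which is the same chaining idea that lets individual components be swapped out in a modular pipeline.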

Page 20:

Output Example: Drug Object

• “Tamoxifen 20 mg po daily started on March 1, 2005.”
• Drug
  • Text: Tamoxifen
  • Associated code: C0351245
  • Strength: 20 mg
  • Start date: March 1, 2005
  • End date: null
  • Dosage: 1.0
  • Frequency: 1.0
  • Frequency unit: daily
  • Duration: null
  • Route: Enteral Oral
  • Form: null
  • Status: current
  • Change Status: no change
  • Certainty: null
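
A few of the Drug object's fields can be recovered from the example sentence with a single pattern. The sketch below is an assumption-laden illustration, not the Drug Profile module: the regular expression, the `DrugMention` class, and the one-entry route table are hypothetical, and fields such as the associated code, dosage, and status are left out because they need a terminology lookup or further logic.

```python
import re
from dataclasses import dataclass
from typing import Optional

ROUTES = {"po": "Enteral Oral"}   # abbreviation -> normalized route, as on the slide

PATTERN = re.compile(
    r"(?P<drug>[A-Z][a-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*mg)\s+"
    r"(?P<route>po)\s+"
    r"(?P<freq_unit>daily)"
    r"(?:\s+started on\s+(?P<start>[A-Z][a-z]+ \d{1,2}, \d{4}))?"
)


@dataclass
class DrugMention:
    text: str
    strength: str
    route: str
    frequency_unit: str
    start_date: Optional[str] = None
    end_date: Optional[str] = None   # stays None when the sentence gives no end date


def parse_drug(sentence):
    """Return a DrugMention for the first match, or None if the pattern does not apply."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return DrugMention(
        text=m.group("drug"),
        strength=m.group("strength"),
        route=ROUTES[m.group("route")],
        frequency_unit=m.group("freq_unit"),
        start_date=m.group("start"),
    )


drug = parse_drug("Tamoxifen 20 mg po daily started on March 1, 2005.")
```

Attributes the sentence does not mention stay `None`, matching the `null` entries in the slide's output.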

Page 21:

Output Example: Disorder Object

• “No evidence of cholangiocarcinoma.”
• Disorder
  • Text: cholangiocarcinoma
  • Associated code: SNOMED 70179006
  • Certainty: 1
  • Context: current
  • Relatedness to patient: true
  • Status: negated
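
The "Status: negated" value comes from negation detection. The following is a simplified sketch in the spirit of NegEx, not the NegEx algorithm itself: the short trigger list, the five-token window, the `DisorderMention` class, and the "affirmed" label for the non-negated case are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# A handful of pre-negation triggers; real trigger lists are much longer.
NEGATION_TRIGGERS = ["no evidence of", "denies", "without", "no"]
WINDOW = 5  # how many tokens before the mention are searched for a trigger


@dataclass
class DisorderMention:
    text: str
    status: str  # "negated" or "affirmed" (the latter label is an assumption)


def is_negated(sentence, mention):
    """True if a trigger occurs within WINDOW tokens before the first occurrence of the mention."""
    tokens = re.findall(r"\w+", sentence.lower())
    target = mention.lower().split()
    for start in range(len(tokens) - len(target) + 1):
        if tokens[start:start + len(target)] == target:
            scope = " ".join(tokens[max(0, start - WINDOW):start])
            return any(re.search(rf"\b{re.escape(t)}\b", scope)
                       for t in NEGATION_TRIGGERS)
    return False  # mention not found in the sentence


def annotate(sentence, mention):
    """Wrap the mention with its negation status."""
    status = "negated" if is_negated(sentence, mention) else "affirmed"
    return DisorderMention(text=mention, status=status)


d = annotate("No evidence of cholangiocarcinoma.", "cholangiocarcinoma")
```

Matching triggers on word boundaries keeps a word such as "normal" from being mistaken for the trigger "no".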

Page 22:

(1) cTAKES for developers
• Download and install of cTAKES
• Building the dictionary

Jiaping Zheng
Children’s Hospital Boston

Page 23:

Introduction

See separate pdf for the slides

Page 24:

Graphical User Interface (GUI) to cTAKES: a Prototype

Pei J. Chen
Children’s Hospital Boston

Page 25:

cTAKES as a Service

Objectives
1. Demo cTAKES prototype web application
   • Empower end users to leverage cTAKES
2. Gather feedback for future cTAKES GUI
3. Potential system integrations with other applications (e.g., i2b2, ARC, Web Annotator)

Developed within i2b2 to integrate cTAKES in the i2b2 NLP cell

Page 26:

cTAKES Web Application: a Prototype

http://chipweb2.chip.org/cTakes_webservice_trunk/index.html

Page 27:

Single clinical note

Pages 28-39: (no transcribed text)

Page 40:

Technologies

• Front-End: Web GUI, ExtJS, JavaScript
• Back-End: cTAKES, Java, UIMA
• Middleware: Web Services, Java, Apache CXF, JSON

Page 41:

Deployment Considerations

• Deployment Model
• Security
• Performance
• Licensing (UMLS, Apache, GPL v.3)

Page 42: