Intro to Clinical NLP Part 2: Demonstrations of open source tools
The American Medical Informatics Association Annual Meeting, October 22, 2011
Overview
• Move beyond theory & concepts of Part 1
• Demonstrations / presentations of software in use
• By researchers building & using systems
• Software shown is representative – NOT comprehensive
• NOT a series of tool-specific tutorials
• Attempt to make the concepts “real”
• PLEASE INTERRUPT WITH QUESTIONS
• First time presenting this
• You’re probably not the only one wondering what we’re babbling on about
• Don’t forget your survey
Your Presenters
• Leonard D’Avolio • Dept of Veterans Affairs • Harvard Medical School
• Brett South • University of Utah • Dept of Veterans Affairs
• Scott DuVall • University of Utah • Dept of Veterans Affairs
• Dina Demner-Fushman • National Library of Medicine
• Guergana Savova • Children’s Hospital Boston • Harvard Medical School
• Wendy Chapman • University of San Diego • Dept of Veterans Affairs
Developing / using NLP is a process
The NLP Process
Overview
• Annotation using eHOST • Brett
• Using rules & regular expressions • Scott
• A tour of some NLM resources • Dina
• Concept mapping using cTAKES • Guergana
• Evaluation workbench • Wendy
• Finding “cases like this one” using ARC • Len
Getting Started
Data & Tools for Today
75 pre-annotated documents from i2b2 Challenge
• Thanks to Susanne Churchill & i2b2/VA Challenge Team
iDASH NLP Ecosystem
• Created to increase access to NLP tools / data
• A virtual machine with software & data from today is available: http://nlp-ecosystem.sdsc.edu/vm-download.html
To Find NLP-Related Resources
• ORBIT Project – www.orbit.nlm.nih.gov
• Created by / for the clinical NLP community
• Track resources of interest
Annotation using eHOST
Brett South, Shuying Shen
Annotation tools
• What tool will be used?
o For this demo we will use an alpha version of an open source annotation tool called eHOST (Extensible Human Oracle Suite of Tools).
o http://code.google.com/p/ehost/
• Tool functionalities:
o eHOST Alpha: Interactive annotation approaches (pre-annotation, interactive annotation), semi-automated curation, enhanced adjudication, workspace features.
o ChartReader: Joint development effort between VA CHIR, VINCI, iDASH and others to integrate eHOST alpha with a tool that supports scalability for large annotation tasks, administrative mode, security protocols, syncing with a server, audit trails, standardized data storage via database connectivity.
Demo annotation task
• Use Case: Extract as many explicitly mentioned diagnoses as possible from a collection of 75 discharge summaries selected from one of the i2b2 Challenge tasks.
• Goals:
o Illustrate level of difficulty involved with annotation and building reference standards.
o Demonstrate annotating clinical texts using an annotation tool and guidelines to build a reference standard.
o Calculate evaluation metrics in terms of task reliability (IAA) and accuracy (Precision, Recall, F-measure).
o Annotation Strategy: One commonly used approach to creating a reference standard involves double annotation.
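The reliability and accuracy metrics named in the goals can be sketched as follows. This is a minimal illustration and not eHOST's implementation: the `Ann` tuple, the exact-span matching, and the choice of pairwise F-measure as the IAA statistic are all assumptions made for the example.

```python
from typing import NamedTuple, Set, Tuple


class Ann(NamedTuple):
    """Hypothetical annotation record: document id, character span, class."""
    doc: str
    start: int
    end: int
    label: str


def prf(reference: Set[Ann], predicted: Set[Ann]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure with exact-span matching."""
    tp = len(reference & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def pairwise_iaa(a1: Set[Ann], a2: Set[Ann]) -> float:
    """IAA as F-measure between two annotators (symmetric in a1 and a2)."""
    return prf(a1, a2)[2]


# Double annotation: two annotators mark the same documents independently.
a1 = {Ann("d1", 10, 22, "Diagnosis"), Ann("d1", 40, 52, "Diagnosis"),
      Ann("d2", 5, 17, "Diagnosis")}
a2 = {Ann("d1", 10, 22, "Diagnosis"), Ann("d2", 5, 17, "Diagnosis"),
      Ann("d2", 30, 38, "Diagnosis")}
print(round(pairwise_iaa(a1, a2), 2))
```

Once disagreements are adjudicated into a reference standard, the same `prf` function can score a system's output against it.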
Demo annotation task
• Things we have built for you:
o We don’t expect you to infer clinical diagnoses (no discourse or linking of concepts across sentences).
o We have already developed an annotation guideline and schema for this task.
o Diagnoses are loosely based on semantic types from the UMLS:
Demo annotation task
• The challenge:
o One of the attributes we will identify is negation status.
- “[No evidence of] peripheral arterial disease”.
- “The patient has [a known history of] “pulmonary edema”, “congestive heart failure”, “hypertension”, and “diabetes”.”
o This task does have a certain level of difficulty, but will be a good demonstration of annotating texts to build a reference standard for a practical application of clinical NLP.
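To make the negation attribute concrete, here is a deliberately simplified sketch of a trigger-based check. It is not the annotation guideline and not NegEx: the trigger list, the five-word window, and the "negated" / "affirmed" values are assumptions chosen for illustration.

```python
import re

# Illustrative trigger phrases only; a real trigger lexicon is much larger.
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without"]
_TRIGGER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in NEGATION_TRIGGERS) + r")\b",
    re.IGNORECASE,
)


def negation_status(sentence: str, mention: str, window: int = 5) -> str:
    """Return 'negated' if a trigger precedes the mention within `window` words."""
    pos = sentence.lower().find(mention.lower())
    if pos < 0:
        raise ValueError("mention not found in sentence")
    for match in _TRIGGER_RE.finditer(sentence[:pos]):
        words_between = sentence[match.end():pos].split()
        if len(words_between) <= window:
            return "negated"
    return "affirmed"


print(negation_status("No evidence of peripheral arterial disease.",
                      "peripheral arterial disease"))
print(negation_status("The patient has a known history of pulmonary edema, "
                      "congestive heart failure, hypertension, and diabetes.",
                      "hypertension"))
```

The word-boundary anchors keep the trigger "no" from firing inside "known" in the second sentence.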
Hands-on component (your homework):
• Sign i2b2 DUA and obtain permission to iDASH VM
• Review the annotation guideline for this task and experiment using the eHOST tool.
– Annotate the first 5 documents using the guideline.
http://code.google.com/p/ehost/
Starting eHOST-Alpha (stand-alone client)
Double click
Assigning a workspace
Click “change”
“Browse” to workspace
Displaying documents
Click on the “project corpus”
Available text in the Viewer under “Text Display”
Creating an annotation schema
Negation attribute with “values”
“Attribute” editor
“Markable” Diagnoses
“Markable” editor
Creating an annotation schema
“Relationship” editor
Build a “Relationship”
Annotating Texts
Assign an “annotator”
Current “annotator”
Annotating Texts
Annotation
Text selector
Grow, shrink annotation span
Delete
Annotating Texts
“Save as”
XML output to specific location
Adjudication
Read in XML output from A2
Add annotations
Adjudication
Annotations A1 and A2
Show side by side comparison
Diff A1 and A2 shown by red underline
Adjudication
Show side by side comparison
Accept, reject, modify
Reporting
HTML style reporting
Reporting
Reliability (task consistency)
Validity (task accuracy)
Reporting
Summary detail
Show details for unmatched annotations
Increasing Annotation Efficiency and Quality
• Interactive annotation using “Oracle mode”
• Semi-automated curation
• Pre-annotation
• Documentation for these functions is found on the eHOST wiki site: http://code.google.com/p/ehost/
Interactive Annotation “Oracle”
Adjudication Mode
“Oracle” mode
See the same types of annotations in context
Semi-automated curation
Find other instances of the same string
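The idea behind "find other instances of the same string" can be sketched in a few lines. This is only an illustration of the concept, not eHOST code; the function name and the case-insensitive, whole-word matching are assumptions.

```python
import re
from typing import List, Tuple


def find_other_instances(text: str, annotated_span: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Return spans of every other occurrence of the annotated string.

    Spans are 0-based (start, end) with an exclusive end, as in Python slicing.
    """
    start, end = annotated_span
    target = text[start:end]
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    return [
        (m.start(), m.end())
        for m in pattern.finditer(text)
        if (m.start(), m.end()) != (start, end)
    ]


note = "History of hypertension. Hypertension is controlled on medication."
first = note.find("hypertension")
candidates = find_other_instances(note, (first, first + len("hypertension")))
print([note[s:e] for s, e in candidates])
```

Each candidate span would then be offered to the annotator to accept or reject rather than being added automatically.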
Pre-annotation using eHOST
Pre-annotation using pre-defined dictionaries (e.g., UMLS concepts)
Pre-annotation using custom regular expressions
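Both pre-annotation routes can be sketched together. The dictionary entries, the class names, and the blood-pressure pattern below are hypothetical stand-ins; a real dictionary would come from a terminology such as the UMLS, and eHOST's own configuration format is not shown here.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical term -> class dictionary.
DICTIONARY: Dict[str, str] = {
    "congestive heart failure": "Diagnosis",
    "pulmonary edema": "Diagnosis",
    "hypertension": "Diagnosis",
    "diabetes": "Diagnosis",
}

# Hypothetical custom regular expression for values that look like a blood pressure.
CUSTOM_PATTERNS: Dict[str, Pattern[str]] = {
    "BloodPressure": re.compile(r"\b\d{2,3}/\d{2,3}\b"),
}


def pre_annotate(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, class, covered text) suggestions sorted by position."""
    suggestions = []
    for term, label in DICTIONARY.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            suggestions.append((m.start(), m.end(), label, m.group()))
    for label, pattern in CUSTOM_PATTERNS.items():
        for m in pattern.finditer(text):
            suggestions.append((m.start(), m.end(), label, m.group()))
    return sorted(suggestions)


sample = "History of hypertension and diabetes; blood pressure 140/80."
for suggestion in pre_annotate(sample):
    print(suggestion)
```

The output is a list of suggestions for a human annotator to confirm, correct, or delete.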
Future directions
Chart Reader: Web application
eHOST: Client app on your computer
Administrative Functions
“Administrative” functions
Extended functionalities
Assign annotators to tasks
Available annotators
Potential roles
Assigned projects and set time expectations
Thank you for your attention!
For more information: [email protected] [email protected] [email protected] [email protected] [email protected] [email protected]
Using Rules & Regular Expressions
Scott DuVall
Acknowledgements
Tom Ginter, Balaji Soundrarajan
This work was supported using resources and facilities at the VA Salt Lake City Health Care System with funding support from the VA Informatics and Computing Infrastructure (VINCI), VA HSR HIR 08-204 and the Consortium for Healthcare Informatics Research (CHIR), VA HSR HIR 08-374
What Rules and Patterns Look Like
Rules: If <you see this> Then <do that>
Patterns: Find things that <look like this>
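A small sketch of each form follows. The date pattern and the section-header rule are invented examples for illustration and are not taken from the presenters' systems.

```python
import re
from typing import List, Optional

# Pattern: find things that look like a date written as M/D/YYYY.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def find_dates(text: str) -> List[str]:
    return DATE_PATTERN.findall(text)


def section_rule(line: str) -> Optional[str]:
    """Rule: if a line is upper case and ends with a colon, then treat it as a section header."""
    stripped = line.strip()
    if stripped.endswith(":") and stripped.isupper():
        return stripped[:-1].title()
    return None


print(find_dates("Admitted 10/22/2011, discharged 11/2/2011."))
print(section_rule("PAST MEDICAL HISTORY:"))
print(section_rule("The patient was seen today:"))
```

The pattern describes what a target looks like; the rule attaches an action to a condition.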
When To Use Rules and Patterns
1. Simple, Unsolved Problems
2. Problems That Dictionary Lookup Doesn’t Solve
3. Finding Structured Data or Using Structure in Text
4. As a Complement to Any Problem
Where Rules and Patterns Fit
[Figure: NLP processing levels (sentence, word, part of speech, phrase, concept) with the points where rules and patterns apply: pre-processing, word patterns, project-specific concepts, inference, post-processing, output.]
How To Develop Rules and Patterns
[Flowchart: An annotated document corpus is loaded, randomized, and split into a training set and a validation set. Training: start from an initial rule set; extract concepts using the current rule set; compare extracted concepts with the reference standard for accuracy (relevant concepts vs. irrelevant and missed concepts); use failure analysis to identify needed rule changes; update / add to the rule set; repeat until recall and precision target levels are reached. Validation: extract concepts using the final rule set and compare relevant concepts vs. irrelevant and missed concepts.]
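One pass of this development loop might be sketched as below. Everything here is a toy assumption: rules are plain regular expressions, concepts are compared as lower-cased strings, and the two training documents are invented.

```python
import random
import re
from typing import Dict, List, Set, Tuple

Doc = Tuple[str, Set[str]]  # (text, reference-standard concepts)


def split_corpus(docs: List[Doc], seed: int = 0) -> Tuple[List[Doc], List[Doc]]:
    """Load and randomize, then split into training and validation halves."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def extract(rules: List[str], text: str) -> Set[str]:
    """Extract concepts using the current rule set."""
    return {m.group().lower() for rule in rules
            for m in re.finditer(rule, text, re.IGNORECASE)}


def failure_analysis(rules: List[str], docs: List[Doc]) -> Dict[str, object]:
    """Compare extracted concepts with the reference standard."""
    tp = 0
    irrelevant: List[str] = []
    missed: List[str] = []
    for text, reference in docs:
        found = extract(rules, text)
        tp += len(found & reference)
        irrelevant += sorted(found - reference)
        missed += sorted(reference - found)
    recall = tp / (tp + len(missed)) if tp + len(missed) else 0.0
    precision = tp / (tp + len(irrelevant)) if tp + len(irrelevant) else 0.0
    return {"recall": recall, "precision": precision,
            "irrelevant": irrelevant, "missed": missed}


training = [
    ("History of hypertension and diabetes.", {"hypertension", "diabetes"}),
    ("No fever. Known diabetes mellitus.", {"diabetes mellitus"}),
]
rules_v1 = [r"\bhypertension\b", r"\bdiabetes\b"]
print(failure_analysis(rules_v1, training))  # shows what was missed
rules_v2 = [r"\bhypertension\b", r"\bdiabetes(?: mellitus)?\b"]
print(failure_analysis(rules_v2, training))
```

In practice the loop stops when recall and precision reach their targets on the training set, and only then is the final rule set scored once on the held-out validation set.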
Building on Rules and Patterns
Her vital signs were temperature of 100.8 , heart rate 96 , blood pressure 140/80 , respirations 20 . The patient had a sodium of 133 , potassium 5.1 , chloride 102 , bicarbonate 11.4 , BUN and creatinine 23 and 2.0 .
Building on Rules and Patterns
Her vital signs were <vital sign> of <#> , <vital sign> <#>, <vital sign> <#>/<#>, <vital sign> <#> . The patient had a <lab test> of <#> , <lab test> <#>, <lab test> <#> , <lab test> <#> , <lab test> and <lab test> <#> and <#> .
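The template on this slide translates fairly directly into regular expressions. The sketch below is one possible encoding under stated assumptions: the name lists are limited to the terms in the example sentence, and the coordinated form "<lab test> and <lab test> <#> and <#>" is handled by a separate pattern that is applied first.

```python
import re
from typing import List, Tuple

# <#>: an integer or decimal, optionally followed by /<#> (as in 140/80).
NUMBER = r"\d+(?:\.\d+)?(?:/\d+)?"


def extract_values(text: str, names: List[str]) -> List[Tuple[str, str]]:
    """Return (name, value) pairs in order of appearance."""
    alt = "|".join(re.escape(n) for n in names)
    paired = re.compile(
        rf"\b({alt})\s+and\s+({alt})\s+({NUMBER})\s+and\s+({NUMBER})", re.IGNORECASE)
    single = re.compile(rf"\b({alt})\s+(?:of\s+)?({NUMBER})", re.IGNORECASE)
    results = []

    def take_pair(m: "re.Match[str]") -> str:
        results.append((m.start(1), m.group(1).lower(), m.group(3)))
        results.append((m.start(2), m.group(2).lower(), m.group(4)))
        return " " * len(m.group())  # blank out so `single` cannot re-match

    remaining = paired.sub(take_pair, text)
    for m in single.finditer(remaining):
        results.append((m.start(1), m.group(1).lower(), m.group(2)))
    return [(name, value) for _, name, value in sorted(results)]


vitals_text = ("Her vital signs were temperature of 100.8 , heart rate 96 , "
               "blood pressure 140/80 , respirations 20 .")
labs_text = ("The patient had a sodium of 133 , potassium 5.1 , chloride 102 , "
             "bicarbonate 11.4 , BUN and creatinine 23 and 2.0 .")
VITALS = ["temperature", "heart rate", "blood pressure", "respirations"]
LABS = ["sodium", "potassium", "chloride", "bicarbonate", "BUN", "creatinine"]
print(extract_values(vitals_text, VITALS))
print(extract_values(labs_text, LABS))
```

Without the paired pattern, the simple `<lab test> <#>` form would read "creatinine 23", which is why the slide's template spells out the coordinated case.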
A Tour of National Library of Medicine NLP Resources
Dina Demner-Fushman
Overview
• The Unified Medical Language System (UMLS)
– >98 controlled vocabularies linked with concept unique identifiers
• MetaMap
– Grammatically-based concept-mapping software
• SemRep
– Discovering / exploring relationships between concepts
• RxNav
– Drug information browser
This is a 72-year-old male with a history of thymoma resected in 1996 , chronic obstructive pulmonary disease , hypothyroidism who was transferred …for an myocardial infarction and cardiac catheterization . The patient developed shortness of breath at home and the EMTs were called and the patient was found to be in respiratory distress .
user terminology
sources
Related terms, relations, co-occurring terms
Can we get to “thymoma resection”?
UMLS-based tools
This is a 72-|year|-|old| male| with a |history| of |thymoma| resected| in 1996 , |chronic obstructive pulmonary disease| , |hypothyroidism| who was |transferred| …for an |myocardial infarction| and |cardiac catheterization| .The |patient| developed |shortness| of |breath| at |home| and the EMTs were |called| and the |patient |was |found| to be in |respiratory distress|
Interactive MetaMap Results
Composite phrases
Word sense disambiguation
Restrict to disorders
Negation
UMLS-sanctioned relations
Matching drugs and diseases through UMLS relations
1) search the disease in NDF-RT (get the NUI)
2) get all drugs that treat/prevent the disease
Finding these resources
• https://uts.nlm.nih.gov/home.html
• http://skr.nlm.nih.gov/index.shtml
• http://rxnav.nlm.nih.gov/
Concept Mapping Using the Clinical Text Analysis and Knowledge Extraction System (cTAKES)
Guergana Savova
Overview
• cTAKES: Aims
• cTAKES: Use cases
• cTAKES: High-level overview
• cTAKES: Examples of annotation layers
Aims
• Information extraction (IE): transformation of unstructured text into structured representations and merging clinical data extracted from free text with structured data
– Entity and Event discovery
– Relation discovery
– Normalization template: Clinical Element Model (CEM)
• Overarching goal – high-throughput phenotype extraction from clinical free text based on standards and the principles of interoperability
– general purpose clinical NLP tool with applications to the majority of all imaginable use cases
A 43-year-old woman was diagnosed with type 2 diabetes mellitus by her family physician 3 months before this presentation. Her initial blood glucose was 340 mg/dL. Glyburide 2.5 mg once daily was prescribed. Since then, self-monitoring of blood glucose (SMBG) showed blood glucose levels of 250-270 mg/dL. She was referred to an endocrinologist for further evaluation.
On examination, she was normotensive and not acutely ill. Her body mass index (BMI) was 18.7 kg/m2 following a recent 10 lb weight loss. Her thyroid was symmetrically enlarged and ankle reflexes absent. Her blood glucose was 272 mg/dL, and her hemoglobin A1c (HbA1c) was 10.3%. A lipid profile showed a total cholesterol of 261 mg/dL, triglyceride level of 321 mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid function was normal. Urinalysis showed trace ketones.
She adhered to a regular exercise program and vitamin regimen, smoked 2 packs of cigarettes daily for the past 25 years, and limited her alcohol intake to 1 drink daily. Her mother's brother was diabetic.
Processing Clinical Notes
Clinical Element Model http://intermountainhealthcare.org/CEM
Disorder CEM
  text: diabetes mellitus
  code: 73211009
  subject: patient
  relative temporal context: 3 months ago
  negation indicator: not negated
Disorder CEM
  text: diabetes mellitus
  code: 73211009
  subject: family member
  relative temporal context:
  negation indicator: not negated
Tobacco Use CEM
  text: smoking
  code: 365981007
  subject: patient
  relative temporal context: 25 years
  negation indicator: not negated
Medication CEM
  text: Glyburide
  code: 315989
  subject: patient
  frequency: once daily
  negation indicator: not negated
  strength: 2.5 mg
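As a rough illustration of why this normalization matters downstream, the two Disorder CEM records above can be held in a small data structure and filtered. The class and field names are hypothetical simplifications; the actual Clinical Element Model definitions are considerably richer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DisorderCEM:
    """Hypothetical, simplified stand-in for a Disorder CEM instance."""
    text: str
    code: str
    subject: str
    relative_temporal_context: Optional[str]
    negation_indicator: str


patient_dm = DisorderCEM("diabetes mellitus", "73211009", "patient",
                         "3 months ago", "not negated")
family_dm = DisorderCEM("diabetes mellitus", "73211009", "family member",
                        None, "not negated")


def patient_problem_list(cems: List[DisorderCEM]) -> List[str]:
    """Keep only disorders asserted about the patient and not negated."""
    return [c.text for c in cems
            if c.subject == "patient" and c.negation_indicator == "not negated"]


print(patient_problem_list([patient_dm, family_dm]))
```

The same code appears twice in the note with different subjects, so the subject and negation fields are what keep the uncle's diabetes off the patient's problem list.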
Comparative Effectiveness
Compare the effectiveness of different treatment strategies (e.g., modifying target levels for glucose, lipid, or blood pressure) in reducing cardiovascular complications in newly diagnosed adolescents and adults with type 2 diabetes.
Compare the effectiveness of traditional behavioral interventions versus economic incentives in motivating behavior changes (e.g., weight loss, smoking cessation, avoiding alcohol and substance abuse) in children and adults.
Meaningful Use
• Maintain problem list
• Maintain active med list
• Record smoking status
• Provide clinical summaries for each office visit
• Generate patient lists for specific conditions
• Submit syndromic surveillance data
Clinical Practice
Disorder CEM
  text: diabetes mellitus
  code: 73211009
  subject: patient
  relative temporal context: 3 months ago
  negation indicator: not negated
Medication CEM
  text: Glyburide
  code: 315989
  subject: patient
  frequency: once daily
  negation indicator: not negated
  strength: 2.5 mg
• Provide problem list and meds from the visit
Applications
• Meaningful use of the EMR
• Comparative effectiveness
• Clinical investigation
– Patient cohort identification
– Phenotype extraction
• Epidemiology
• Clinical practice
• Decision support systems
• …..
Overview
• Goal:
– Phenotype extraction
– Generic – to be used for a variety of retrievals and use cases
– Expandable – at the information model level and methods
– Modular
– Cutting edge technologies – best methods combining existing practices and novel research with rapid technology transfer
– Terminology agnostic: able to plug in any terminology
– Best software practices
– Stand-alone tool easily pluggable within other platforms/toolsets
• Apache v2.0 license
• Goal: cTAKES as a top-level Apache project
• http://sourceforge.net/projects/ohnlp/
cTAKES Adoption
• May, 2011:
– 2306 downloads*
• i2b2 NLP cell integration; relevance to CTSAs
• eMERGE (SGH, NW)
• PGRN (HMS, NW)
• SHARPn
• Extensions: Yale (YTEX), MITRE
• Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ)
* Source: http://sourceforge.net/project/stats/?group_id=255545&ugn=ohnlp&type=&mode=alltime
cTAKES Technical Details
• Open source
– Apache v2.0 license
– http://sourceforge.net/projects/ohnlp/
– Java 1.5
• Framework
– IBM’s Unstructured Information Management Architecture (UIMA) open source framework, Apache project
• Methods
– Natural Language Processing methods (NLP)
– Based on standards and conventions to foster interoperability
• Application
– High-throughput system
cTAKES: Components (all trained on clinical data)
• Sentence boundary detection (OpenNLP technology)
• Tokenization (rule-based)
• Morphologic normalization (NLM’s LVG)
• POS tagging (OpenNLP technology)
• Shallow parsing (OpenNLP technology)
• Named Entity Recognition
– Dictionary mapping (lookup algorithm)
– Machine learning (MAWUI)
– UMLS semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
• Negation and context identification (NegEx)
• Dependency parser
• Drug Profile module
• Smoking status classifier
• CEM normalization module
• Constituency parser (release in November, 2011)
• Coreference module (release in November, 2011)
• UMLS relation discovery module (release in December, 2011)
• Semantic role labeler (release in January, 2012)
Extra slides
Output Example: Drug Object
• “Tamoxifen 20 mg po daily started on March 1, 2005.”
• Drug
– Text: Tamoxifen
– Associated code: C0351245
– Strength: 20 mg
– Start date: March 1, 2005
– End date: null
– Dosage: 1.0
– Frequency: 1.0
– Frequency unit: daily
– Duration: null
– Route: Enteral Oral
– Form: null
– Status: current
– Change Status: no change
– Certainty: null
Courtesy of David Carrell
Finding “Cases Like This” Using ARC
Leonard D’Avolio
Automated Retrieval Console (ARC)
• Reduce custom software & rules development
– 90/90 goal
• Reduce process to smallest possible effort
• Free up researchers to do research
Underlying Approach
• Import Knowtator files
– Open source, widely used annotation package
– eHOST uses Knowtator
• Turn NLP output into “features” for supervised machine learning
– cTAKES
– MALLET
• Use n-fold cross validation to try several models behind the scenes
• Present top scores to researchers who can then deploy on larger collection
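The cross-validation step can be sketched generically. This is not ARC's code: the fold assignment, the `train_and_score` callback, and the toy "models" below are assumptions used only to show the mechanics of scoring several candidates and keeping the best.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[List[int], List[int]], float]  # (train indices, test indices) -> score


def k_fold_indices(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split item indices into k folds; each fold serves once as the test set."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def cross_validate(train_and_score: Scorer, n_items: int, k: int) -> float:
    """Average the score returned for each train/test split."""
    scores = [train_and_score(train, test) for train, test in k_fold_indices(n_items, k)]
    return sum(scores) / len(scores)


def best_model(models: Dict[str, Scorer], n_items: int, k: int) -> Tuple[str, float]:
    """Try several models and return the name and mean score of the top one."""
    results = {name: cross_validate(fn, n_items, k) for name, fn in models.items()}
    top = max(results, key=results.get)
    return top, results[top]


# Toy stand-ins: each "model" just returns a fixed score per fold.
toy_models: Dict[str, Scorer] = {
    "model_a": lambda train, test: 0.80,
    "model_b": lambda train, test: 0.90,
}
print(best_model(toy_models, n_items=10, k=5))
```

In a real setting each callback would train a classifier on the training indices and return, for example, the F-measure on the held-out fold.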
“Out of Box” Performance
Document Retrieval
                                   Recall   Precision   F-Measure
Prostate Cancer Path Reports       0.97     0.95        0.94
Colorectal Cancer Path Reports     0.90     0.92        0.89
Lung Cancer Imaging                0.76     0.80        0.75
PTSD Psychotherapy Notes           0.98     0.90        0.93
Breast Cancer Operative Reports    0.88     0.90        0.88
Concept Retrieval (inexact span matching)
                                   Recall   Precision   F-Measure
2010 i2b2/VA Medical Problems      0.75     0.93        0.83
2010 i2b2/VA Medical Treatments    0.76     0.89        0.82
2010 i2b2/VA Medical Tests         0.76     0.90        0.83
Demonstration
• Using the demo data set
• Find vascular disease
– including cerebro, cardio, intestinal or peripheral
For more information, tutorials, download, etc.: http://research.maveric.org/mig/arc.html
Or search “ARC” on ORBIT Project
Evaluation Using the Evaluation Workbench
Wendy Chapman
Evaluation Workbench
• What is it?
o A tool for comparing the output of two NLP annotators on clinical text
o NLP system vs human annotation
o View annotations
o Calculate outcome measures
o Drill down to all levels of annotation
o Document-level
o Perform error analysis
o Future versions will support formal error analysis
Levels of Annotation
• Document – Report classified as Shigellosis
• Group – Section classified as Past Medical History Section
• Utterance – Group of text classified as Sentence
• Snippet – “chest pain” classified as CUI 058273
• Word – “pain” classified as noun
• Token – “.” classified as EOS marker
Demo evaluation workbench
• Use Case: Evaluate single system (Topaz) performance at identification of a specific subset of disorders relevant for disease surveillance:
– 101 conditions
• Classify document
– Condition Acute
– Condition Chronic
– Condition Absent
• Classify snippets → map to Core Concept code
– Directionality → negated, affirmed
– Experiencer → patient, other
– Temporality → historical, acute, hypothetical/conditional
Diarrhea, Abdominal pain, Wheezing, Fever, Cough
Information Model for Workbench
Every annotation has the following meta-data: ID, Level, Span, Classification (with properties), Attributes, Related Annotations
Example text: “Patient denies diarrhea”
ID: 0034 | Level: Snippet | Span: 9-14
  Classification: Negation trigger (Source: NegEx)
  Attributes: Direction: Forward
  Related Annotations: 0035 (negates)
ID: 0035 | Level: Snippet | Span: 16-23
  Classification: CoreConceptInstance
  Attributes: Directionality: absent; Experiencer: patient; Temporality: recent
  Related Annotations: 0034 (negated by)
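The meta-data listed for this information model maps naturally onto a small record type. The sketch below is an assumption-laden illustration rather than the Workbench's actual classes: the field names are invented, and the spans are read as 1-based inclusive character offsets because that interpretation reproduces "denies" and "diarrhea" in the example text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    """Hypothetical record mirroring the slide: ID, level, span, classification,
    classification properties, attributes, and related annotations."""
    id: str
    level: str
    span: Tuple[int, int]  # assumed 1-based, inclusive character offsets
    classification: str
    properties: Dict[str, str] = field(default_factory=dict)
    attributes: Dict[str, str] = field(default_factory=dict)
    related: List[Tuple[str, str]] = field(default_factory=list)  # (relation, target id)


def covered_text(text: str, ann: Annotation) -> str:
    start, end = ann.span
    return text[start - 1:end]


TEXT = "Patient denies diarrhea"
trigger = Annotation("0034", "Snippet", (9, 14), "Negation trigger",
                     properties={"Source": "NegEx"},
                     attributes={"Direction": "Forward"},
                     related=[("negates", "0035")])
concept = Annotation("0035", "Snippet", (16, 23), "CoreConceptInstance",
                     attributes={"Directionality": "absent",
                                 "Experiencer": "patient",
                                 "Temporality": "recent"},
                     related=[("negated by", "0034")])
print(covered_text(TEXT, trigger), "|", covered_text(TEXT, concept))
```

Keeping relations as (relation, target id) pairs lets the same structure express both directions of the negation link shown in the example.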
Relationships – user specifies
Component relationship: Annotations comprising another annotation
Example text: “This patient presented with periumbilical abdominal pain with nausea, vomiting, and diarrhea…The patient’s nausea, vomiting, and diarrhea did resolve during his hospital course.”
ID: 0035 | Level: Snippet | Span: 84-91
  Classification: Core Concept Instance: Diarrhea
  Attributes: Directionality: present; Experiencer: patient; Temporality: recent
ID: 0041 | Level: Snippet | Span: 218-225
  Classification: Core Concept Instance: Diarrhea
  Attributes: Directionality: absent; Experiencer: patient; Temporality: recent
ID: 0086 | Level: Document | Span: 1-973
  Classification: Document Core Concept: Diarrhea
  Attributes: Status: acute
  Components: [0035, 0041]
Document & annotations
Outcome Measures for Selected Annotations
Select Classifications to View
Report List
Attributes for Selected Annotation
Relationships for Selected Annotation
Status of Workbench
• Ultimately will read in output of any system that maps to our information model
• Currently reads in output of single system - Topaz
• Creating tool to read in UIMA type system description and assist user in mapping to information model
– Working on cTAKES medication annotation type system
• Available on the iDASH VM
– Will be available on GitHub in January
Contact
• Leonard D’Avolio • [email protected]
• Brett South • [email protected]
• Scott DuVall • [email protected]
• Dina Demner-‐Fushman • [email protected]
• Guergana Savova • [email protected]
• Wendy Chapman • [email protected]