Using Patient Data to Retrieve Health Knowledge
James J. Cimino, Mark Meyer, Nam-Ju Lee, Suzanne Bakken
Columbia University
AMIA Fall Symposium
October 25, 2005
Dec 22, 2015
Automated Retrieval with Clinical Data
1. Understand Information Needs
2. Get Information From EMR
3. Resource Selection
4. Resource Terminology
5. Automated Translation
6. Querying
7. Presentation
Example term shown in the diagram: MRSA
What’s Hardest about Infobuttons?
• It’s not knowing the questions
• It’s not integrating clinical info systems
• It’s not linking to resources
• It’s translating source data to target terms
Types of Source Terminologies
• Uncoded (narrative):
– Radiology reports (?)
"…infiltrate is seen in the left upper lobe."
• Coded
– Lab tests (6,133)
AMIKACIN, PEAK LEVEL
– Sensitivity tests (476)
AMI 6 MCG/ML
– Microbiology results (2,173)
ESCHERECHIA COLI
– Medications (15,311)
UD AMIKACIN 1 GM VIAL
Types of Target Terminologies
• Narrative search:
– PubMed
– RxList
– Up to Date
– Micromedex
– Lab Tests Online
– OneLook
– National Guideline Clearinghouse
• Coded resource:
– Lexicomp
– CPMC Lab Manual
• Coded search:
– PubMed
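The difference between a narrative search and a coded search of the same resource can be sketched as two query URLs. This is only an illustration: the slides do not say how PubMed was queried, so the use of the public NCBI E-utilities `esearch` endpoint and the helper names `narrative_query` and `coded_query` are assumptions.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (an assumption for illustration;
# the slides do not state which PubMed interface was used).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def narrative_query(term: str) -> str:
    """Narrative search: send the source term as free text."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term})


def coded_query(mesh_heading: str) -> str:
    """Coded search: restrict the term to the MeSH heading field."""
    tagged = f'"{mesh_heading}"[MeSH Terms]'
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": tagged})


# The narrative query uses the source term as it appears in the data;
# the coded query needs the term translated to a MeSH heading first.
print(narrative_query("ESCHERECHIA COLI"))
print(coded_query("Escherichia coli"))
```

The coded form only works once the source term has been translated into the target terminology, which is the step the talk identifies as the hardest.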
Term Samples
• 100 terms from radiology reports using MedLEE
• 100 Medication ingredients
• 100 Lab test analytes
• 100 Microbiology results
• 94 Sensitivity test reagents
The Experiments
• Identify sources of patient data
• Get random sample of terms for each source
• Translate terms if needed (multiple methods)
• Perform automated retrieval with terms
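The four steps above can be sketched as a small harness. This is a hypothetical outline, not the authors' code: `sample_terms`, `run_experiment`, the toy translation table, and the toy search function are all invented for illustration.

```python
import random


def sample_terms(terms, k=100, seed=0):
    """Draw up to k distinct terms at random (seed fixed for repeatability)."""
    rng = random.Random(seed)
    unique = sorted(set(terms))
    return rng.sample(unique, min(k, len(unique)))


def run_experiment(terms, search, translate=None):
    """Translate each term if a translator is given, then search with it.

    Returns {source term: number of results}. A term whose translation
    fails (the translator returns None) is recorded with 0 results.
    """
    results = {}
    for term in terms:
        query = translate(term) if translate else term
        results[term] = search(query) if query is not None else 0
    return results


# Toy stand-ins (not study data): a one-entry translation table and a
# fake search function that knows a single query.
toy_map = {"AMIKACIN, PEAK LEVEL": "amikacin"}


def toy_search(query):
    return {"amikacin": 12}.get(query, 0)


sample = sample_terms(["AMIKACIN, PEAK LEVEL", "UD AMIKACIN 1 GM VIAL"], k=2)
print(run_experiment(sample, toy_search, translate=toy_map.get))
```

Passing `translate=None` models the searches that use the source term directly; passing a translator models the runs that map terms to the target terminology first.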
Searches Performed
Data Source | Narrative Resource | Concept Resource | Concept Search
Un-Coded:
Radiology Terms | PubMed, NGC, OneLook, UptoDate | - | -
Coded:
Medications | RxList, Micromedex | Lexicomp | -
Lab Tests | LabtestsOnline, PubMed | CPMC Lab Manual | PubMed
Sensitivity Tests | RxList, Micromedex | - | -
Microbiology Results | UptoDate, PubMed | - | PubMed
Mapping Methods
• Microbiology results to MeSH:– Semi-automated
• Lab tests to MeSH analytes:– Automated, using UMLS
• Medications to Lexicomp:– Natural language processing
• Lab tests to CPMC Lab Manual:– Manual matching
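An automated mapping of this kind can be pictured as chained table lookups. The sketch below assumes a two-hop route (local lab term to LOINC, then LOINC to MeSH), which the slides only hint at; the table contents and codes are placeholders, not real LOINC or UMLS content, and `map_to_mesh` and `coverage` are hypothetical helpers.

```python
from typing import Optional

# Hypothetical lookup tables. The codes are placeholders, not real
# LOINC identifiers or UMLS content.
LOCAL_TO_LOINC = {
    "AMIKACIN, PEAK LEVEL": "LOINC-PLACEHOLDER-1",
    "LOCAL PANEL X": "LOINC-PLACEHOLDER-2",
}
LOINC_TO_MESH = {
    "LOINC-PLACEHOLDER-1": "Amikacin",
}


def map_to_mesh(local_term: str) -> Optional[str]:
    """Return a MeSH heading for a local lab term, or None if either hop fails."""
    loinc = LOCAL_TO_LOINC.get(local_term)
    if loinc is None:
        return None
    return LOINC_TO_MESH.get(loinc)


def coverage(terms):
    """Return (number of terms mapped to MeSH, number of terms tried)."""
    mapped = [t for t in terms if map_to_mesh(t) is not None]
    return len(mapped), len(terms)


print(map_to_mesh("AMIKACIN, PEAK LEVEL"))
print(coverage(["AMIKACIN, PEAK LEVEL", "LOCAL PANEL X", "UNMAPPED TERM"]))
```

A term can drop out at either hop, so overall coverage is bounded by the weaker of the two mappings.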
Results: Multiple Documents
Terms from Data Source | Searches Performed | Retrieval Success
100 Findings and Diagnoses from 20 Radiology Reports | 100 PubMed | 100% (92,440)
 | 100 Up to Date | 82% (28.6)
 | 100 NGC | 95% (119)
 | 100 One Look | 81% (25.8)
100 Microbiology Result Terms | 100 Up to Date | 94% (1.4)
 | 100 PubMed | 100% (3,328)
 | 100 PubMed (using MeSH translation) | 100% (18,036)
100 Lab Test Terms (using analyte names) | 100 Lab Tests Online | 73% (133)
 | 100 PubMed | 99% (84,633)
 | 100 PubMed (using MeSH translation) | 100% (90,656)
Retrieval success is represented as percent of terms that successfully retrieved any results; numbers in parentheses indicate average numbers of results (citations, documents, topics, definitions, etc., depending on the target resource) for those searches that retrieved at least one result.
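The two figures reported per search (percent of terms with any result, and the average count among searches that returned something) can be computed as in this minimal sketch. The function name `retrieval_success` and the sample counts are illustrative, not taken from the study.

```python
def retrieval_success(result_counts):
    """Summarize a list of per-term result counts.

    Returns (percent of terms with at least one result,
             mean result count over only those successful terms).
    """
    if not result_counts:
        raise ValueError("need at least one search result count")
    hits = [n for n in result_counts if n > 0]
    percent = 100.0 * len(hits) / len(result_counts)
    mean_hits = sum(hits) / len(hits) if hits else 0.0
    return percent, mean_hits


# Toy counts, not study data: 4 of 5 terms retrieve something.
print(retrieval_success([3, 0, 10, 1, 6]))  # (80.0, 5.0)
```

Because failed searches are excluded from the average, a resource can show a high mean result count and a low success percentage at the same time.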
Uncoded versus Coded Searches
• 1,028/2,173 (47.3%) of microbiology result terms mapped to MeSH
• 940/1,041 (90.3%) of lab analytes mapped to LOINC
• 485/940 (51.6%) of LOINC analytes mapped to MeSH
Result Type Number Ratio
Identical 33 1.00
Slight Diff 7 1.44
Large Diff 60 29.92
Result Type Number Ratio
Identical 72 1.00
Slight Diff 16 1.05
Large Diff 12 3.28
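The mapping percentages on this slide can be rechecked with a few lines of arithmetic. The last figure, the end-to-end share of lab analytes that reach MeSH, is not on the slide; it is derived here on the assumption that the 485 MeSH-mapped analytes are a subset of the original 1,041.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


micro_to_mesh = pct(1028, 2173)   # microbiology result terms -> MeSH
lab_to_loinc = pct(940, 1041)     # lab analytes -> LOINC
loinc_to_mesh = pct(485, 940)     # LOINC analytes -> MeSH
lab_to_mesh = pct(485, 1041)      # derived: lab analytes -> MeSH end to end

print(micro_to_mesh, lab_to_loinc, loinc_to_mesh, lab_to_mesh)
# 47.3 90.3 51.6 46.6
```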
Results: Single Document
Terms from Data Source | Searches Performed | Retrieval Success
100 Medication Terms | 100 Lexicomp (using document identifiers) | 96% (1)
100 Laboratory Test Terms | 100 Lab Manual (using document identifiers) | 94% (1)
Results: Page of Links
Terms from Data Source | Searches Performed | Retrieval Success
100 Medication Terms (using ingredient names) | 100 Rx List | 95% [.88/.04]
 | 100 Micromedex | 100% [.89/.06]
94 Sensitivity Test Terms (using antibiotic names) | 94 Rx List | 85% [.79/.06]
 | 94 Micromedex | 97% [.96/.01]
Results for Rx List and Micromedex are difficult to quantify, because they provided heterogeneous lists of links; rather than provide link counts, we assessed the true positive and false negative rates, shown in brackets.
Micromedex versus RxList
194 Terms
RxList: 163; Micromedex: 180
• 158 found by both
• 22 found by Micromedex but missed by RxList
• 5 found by RxList but missed by Micromedex
• 9 missed by both
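The overlap counts can be checked for internal consistency with simple arithmetic. The variable names are illustrative; the numbers are the ones on this slide.

```python
total_terms = 194  # 100 medication terms + 94 sensitivity test terms
both = 158
micromedex_only = 22
rxlist_only = 5
neither = 9

# The four disjoint regions must account for every term.
assert both + micromedex_only + rxlist_only + neither == total_terms

rxlist_found = both + rxlist_only          # 163, matches the slide
micromedex_found = both + micromedex_only  # 180, matches the slide
found_by_either = total_terms - neither    # 185

print(rxlist_found, micromedex_found, found_by_either)
# 163 180 185
```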
See For Yourself!
www.dbmi.columbia.edu/cimino/2005amia-data.html
Discussion
• 7 sources, 894 terms, 11 resources, 1,592 searches
• Automated retrieval is technically possible
– Found something 73-100% of the time
– 12/16 experiments “succeeded” 94-100%
• Translation often unsuccessful
• Automated indexing works
• Usefulness of translation to MeSH is marginal
• Good quality when retrieving pages of links (Micromedex and RxList)
• Good quality with concept-indexed resources
• Recall/precision of document retrievals unknown
– Need to define the question
– Additional evaluation needed
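The "73-100%" range and the "12/16" count can be reproduced by tallying the retrieval success figures from the three results slides. The dictionary below copies those percentages; treating 94% as the cutoff for "succeeded" is read off the slide's own "94-100%" wording.

```python
# Retrieval success (%) per experiment, copied from the results slides.
SUCCESS = {
    ("Radiology", "PubMed"): 100,
    ("Radiology", "Up to Date"): 82,
    ("Radiology", "NGC"): 95,
    ("Radiology", "One Look"): 81,
    ("Microbiology", "Up to Date"): 94,
    ("Microbiology", "PubMed"): 100,
    ("Microbiology", "PubMed via MeSH"): 100,
    ("Lab tests", "Lab Tests Online"): 73,
    ("Lab tests", "PubMed"): 99,
    ("Lab tests", "PubMed via MeSH"): 100,
    ("Medications", "Lexicomp"): 96,
    ("Lab tests", "Lab Manual"): 94,
    ("Medications", "Rx List"): 95,
    ("Medications", "Micromedex"): 100,
    ("Sensitivity tests", "Rx List"): 85,
    ("Sensitivity tests", "Micromedex"): 97,
}

THRESHOLD = 94  # lower bound of the "succeeded" band on the slide
succeeded = [key for key, rate in SUCCESS.items() if rate >= THRESHOLD]

print(len(SUCCESS), len(succeeded), min(SUCCESS.values()), max(SUCCESS.values()))
# 16 12 73 100
```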
Next Steps
• Creation of terminology management and indexing suite
• Formal analysis of qualities of answers