Mining and Processing of Unstructured Medical Data Cindy Perscheid Festival of Genomics London, Jan 19, 2016
Jan 09, 2017
Unstructured Medical Data: Information Hidden in Text
■ Doctor's letters and discharge letters
■ Clinical trial descriptions
■ Scientific publications
Perscheid, Schapranow
Processing of Unstructured Medical Data
Chart 2
Unstructured Medical Data: Challenges and Limitations
■ Huge amount of data: PubMed references more than 25 million articles
■ Restricted querying: limited to keyword search
■ Multilingual
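To illustrate why plain keyword search is a restriction, here is a minimal sketch in Python. The three sentences, the synonym table, and both function names are invented for illustration; they are assumptions, not taken from PubMed or from the talk. The sketch only shows that a literal keyword match misses a document that uses a different wording for the same concept.

```python
# Toy corpus; the sentences are invented for illustration only.
DOCUMENTS = [
    "Mirtazapine is used to treat major depression.",
    "Patients with major depressive disorder received an antidepressant.",
    "Breast cancer screening in patients 65 years or older.",
]

# Hypothetical synonym table; a real system would rely on a curated terminology.
SYNONYMS = {"major depression": ["major depressive disorder"]}


def keyword_search(query, documents):
    """Return documents that contain the query as a case-insensitive substring."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc.lower()]


def expanded_search(query, documents):
    """Keyword search that also accepts the synonyms listed for the query."""
    terms = [query.lower()] + SYNONYMS.get(query.lower(), [])
    return [doc for doc in documents if any(term in doc.lower() for term in terms)]


plain_hits = keyword_search("major depression", DOCUMENTS)
expanded_hits = expanded_search("major depression", DOCUMENTS)
print(len(plain_hits), len(expanded_hits))
```

The plain search finds only the first sentence, while the synonym-aware variant also finds the second one. Maintaining such synonym tables by hand does not scale, which is one motivation for the language processing methods that follow.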
Natural Language Processing: Selected Methods
■ Named Entity Recognition: Identify entities such as persons or diseases
■ Part-Of-Speech Tagging: Identify grammatical function of words
■ Parsing: Identify sentence structure and components
□ Chunking: Combine words and POS tags into chunks
□ Relation Extraction: Identify relations between sentence parts
■ Semantic Role Labeling: Identify specific roles in sentence
■ …
Chunking example: [Patients 65 years]NP or [older]ADJP [with]PP [breast cancer]NP ...
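Part-of-speech tagging and chunking can be sketched in a few lines of Python. This is a toy illustration, not the tooling used in the talk: the hand-written lexicon `POS_LEXICON`, the tag-to-chunk mapping `CHUNK_OF_TAG`, and all function names are assumptions made up for this example. Real taggers are trained statistically rather than looked up from a table.

```python
from itertools import groupby

# Hypothetical lexicon covering only the example sentence.
POS_LEXICON = {
    "patients": "NOUN", "65": "NUM", "years": "NOUN", "or": "CONJ",
    "older": "ADJ", "with": "PREP", "breast": "NOUN", "cancer": "NOUN",
}

# Assumed mapping from POS tag to chunk label; tags not listed stay outside chunks.
CHUNK_OF_TAG = {"NOUN": "NP", "NUM": "NP", "ADJ": "ADJP", "PREP": "PP"}


def pos_tag(tokens):
    """Look up the part-of-speech tag of each token in the toy lexicon."""
    return [(token, POS_LEXICON[token.lower()]) for token in tokens]


def chunk(tagged):
    """Merge adjacent tokens whose tags map to the same chunk label."""
    chunks = []
    for label, group in groupby(tagged, key=lambda pair: CHUNK_OF_TAG.get(pair[1])):
        words = [word for word, _ in group]
        if label is None:
            # Tokens without a chunk label are kept as single, unlabeled items.
            chunks.extend((None, [word]) for word in words)
        else:
            chunks.append((label, words))
    return chunks


def render(chunks):
    """Format chunks in bracket notation, e.g. [breast cancer]NP."""
    return " ".join(
        f"[{' '.join(words)}]{label}" if label else " ".join(words)
        for label, words in chunks
    )


sentence = "Patients 65 years or older with breast cancer"
bracketed = render(chunk(pos_tag(sentence.split())))
print(bracketed)
# prints: [Patients 65 years]NP or [older]ADJP [with]PP [breast cancer]NP
```

Grouping adjacent tokens by chunk label is the simplest possible chunking rule; it reproduces the bracketed example here but would not handle nested or ambiguous phrases.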
[Figure: the example sentence annotated with part-of-speech tags (Noun, Adjective, Preposition) and entity labels (Person, Disease)]
Outlook: IMDB Textual Analysis Features
■ The in-memory database (IMDB) provides text analysis features, e.g.
□ Full-text indexing
□ Entity recognition
□ Tokenization/chunking
□ Fuzzy search
■ Mechanisms can be made domain-specific by specifying
□ Dictionaries
□ CGUL rules containing regular expressions with linguistic attributes
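The idea behind dictionary-based entity recognition combined with fuzzy search can be sketched with the Python standard library. This is not the database's own API: the dictionary contents, the entity type labels, the function names, and the similarity cutoff of 0.9 are all assumptions chosen for illustration, and `difflib` stands in for whatever fuzzy matching the database implements.

```python
import difflib
import re

# Hypothetical domain dictionary mapping terms to entity types.
DICTIONARY = {
    "mirtazapine": "DRUG",
    "major depression": "DISEASE",
    "breast cancer": "DISEASE",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def recognize_entities(text, dictionary=DICTIONARY, cutoff=0.9):
    """Return (matched text, dictionary term, entity type) triples.

    Candidate phrases are compared to dictionary terms with difflib's
    similarity ratio, so small spelling variations still match.
    """
    tokens = tokenize(text)
    terms = list(dictionary)
    max_len = max(len(term.split()) for term in terms)
    found = []
    i = 0
    while i < len(tokens):
        step = 1
        # Prefer the longest candidate phrase starting at position i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            close = difflib.get_close_matches(candidate, terms, n=1, cutoff=cutoff)
            if close:
                found.append((candidate, close[0], dictionary[close[0]]))
                step = n
                break
        i += step
    return found


# The misspelling "Mirtazapin" is deliberate, to exercise the fuzzy match.
entities = recognize_entities("Mirtazapin is predominantly used for major depression.")
print(entities)
```

In this sketch the misspelled drug name is still mapped to its dictionary entry, and the two-word disease term is matched as one entity. Swapping the dictionary is what makes the same mechanism domain-specific.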
[Figure: IMDB features: Text Retrieval and Extraction, Multi-Core and Parallelization, Reduction of Layers]
Natural Language Processing: Applications
■ Text Summarization
■ Question Answering Systems
■ Machine Translation, e.g. "Hello" → "Bonjour"
■ Information Retrieval and Extraction
■ Doctor's Letter Explanation
Application Example, Question Answering: Still a Lot to Improve…
Example question: "What disease is mirtazapine predominantly used for?"
Answer: major depression
■ In short: Slow tools, wrong results
□ Too hard: Natural language is complex
□ Too much data: >25 million papers in PubMed…
Credit: Dr. Mariana Neves, Hasso Plattner Institute
Thanks!
Hasso Plattner Institute Enterprise Platform & Integration Concepts
August-Bebel-Str. 88 14482 Potsdam, Germany
Dr. Matthieu-P. Schapranow [email protected]
http://we.analyzegenomes.com/
Cindy Perscheid, M. Sc. [email protected]