Biomedical Named Entity Recognition
Post on 24-Feb-2016
Biomedical Informatics: Data ➜ Information ➜ Knowledge
BMI
Biomedical Named Entity Recognition
Ramakanth Kavuluru
NLP Seminar – 8/21/2012
What are named entities?
• The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes.
• Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells
Biologically Active Substance
Drug
Disorder
Organic Chemical
Enzyme
Cell
Cholesterol lowering drugs
Drug
Biological Function
Why do we need to extract them?
• To provide effective semantic search
– Clinical trial recruitment: find all discharge summaries of patients who have a history of diabetes and obesity and have taken statins as part of their treatment.
– Literature review: find all biomedical articles that discuss the dopamine neurotransmitter in the context of depressive disorders.
Why do we need to extract them?
• To use as features in machine learning for effective text classification
• To build semantic clusters of textual documents to understand evolving themes
• To reduce noise by avoiding keywords that are not indicative of the classes or clusters
• More recently, as a first step in relation extraction and hence in knowledge discovery
A major task in text mining
• Extract information from textual data
• Use this information to solve problems
• What type of information?
– Relevant concepts: a medical condition or finding, a drug, a gene or protein, an emotion (hope, love, …)
– Relevant (binary) relations: drug TREATS a condition, protein CAUSES a disease
• What are the typical questions?
– Does a pathology report indicate a reportable case?
– Which patients satisfy the criteria for a clinical trial?
Knowledge Discovery
• VIP Peptide – increases – Catecholamine Biosynthesis
• Catecholamines – induce – β-adrenergic receptor activity
• β-adrenergic receptors – are involved – fear conditioning
VIP Peptide – affects – fear conditioning?
In Cattle
In Rats
In Humans
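The chain of relations on this slide can be sketched as a path search over extracted triples. This is a minimal illustration, not a method from the talk: the `CANONICAL` table that merges "catecholamine biosynthesis" with "catecholamines" is a hand-written assumption standing in for proper concept normalization, and `find_chains` is a hypothetical helper name.

```python
# Triples from the slide, written as (subject, predicate, object).
TRIPLES = [
    ("VIP peptide", "increases", "catecholamine biosynthesis"),
    ("catecholamines", "induce", "beta-adrenergic receptor activity"),
    ("beta-adrenergic receptors", "are involved in", "fear conditioning"),
]

# Assumption: a hand-made table that collapses related surface forms onto
# one label. A real pipeline would map mentions to concept identifiers.
CANONICAL = {
    "catecholamine biosynthesis": "catecholamines",
    "beta-adrenergic receptor activity": "beta-adrenergic receptors",
}


def canon(term):
    """Return the shared label for a term, or the term itself."""
    return CANONICAL.get(term, term)


def find_chains(triples, start, goal):
    """Return every path of triples that leads from start to goal."""
    graph = {}
    for subj, pred, obj in triples:
        graph.setdefault(canon(subj), []).append((pred, canon(obj)))

    chains = []

    def walk(node, path, seen):
        if node == goal:
            chains.append(path)
            return
        for pred, nxt in graph.get(node, []):
            if nxt not in seen:  # avoid cycles
                walk(nxt, path + [(node, pred, nxt)], seen | {nxt})

    walk(canon(start), [], {canon(start)})
    return chains


chains = find_chains(TRIPLES, "VIP peptide", "fear conditioning")
for step in chains[0]:
    print(" - ".join(step))
```

A path found this way is only a candidate hypothesis. As the slide notes, the individual facts may come from different species, so the chain still needs expert review.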
Clinical NER
Concept types
• Disorder/Symptom
• Medication
• Procedures
Attributes
• Present/historical/absent, Acute? Uncertain?
• Present/historical/future
Why is NER Hard?
Linguistic Variation
• Derivational variation: cranial, cranium
• Inflectional variation: coughed, coughing
• Synonymy
– neurofibromin 2, merlin, NF2 protein, and schwannomin
– Addison’s disease, adrenal insufficiency, hypocortisolism, bronzed disease
– “Feeding problems in newborn” vs. “The mother said she was having trouble feeding the baby.”
Polysemy
• Merlin – both a bird and a protein in UMLS
• Discharge
– Patient was prescribed codeine upon discharge
– The discharge was yellow and purulent
• Abbreviations
– APC: activated protein C, adenomatosis polyposis coli, antigen presenting cell, aerobic plate count, advanced pancreatic cancer, age period cohort, antibody producing cells, atrial premature complex
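The two "discharge" sentences above can be told apart by looking at the surrounding words. The sketch below is a toy illustration of that idea. The cue-word sets in `SENSE_CUES` and the function name `disambiguate` are assumptions made up for this example, not part of any tool mentioned in the talk.

```python
import re

# Assumption: tiny hand-picked cue-word sets for each sense of "discharge".
SENSE_CUES = {
    "release of the patient": {"prescribed", "upon", "home", "instructions"},
    "bodily secretion": {"yellow", "purulent", "wound", "drainage"},
}


def disambiguate(sentence, sense_cues=SENSE_CUES):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when no cue word is present at all.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in sense_cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Patient was prescribed codeine upon discharge"))
print(disambiguate("The discharge was yellow and purulent"))
```

Word overlap is the simplest possible signal. Abbreviations such as APC, with eight or more expansions, usually need richer context than a handful of cue words.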
Negation
• Nearly half of all clinical concepts in dictated narratives are negated
– There is no maxillary sinus tenderness
• Implied absence without negation
– Lungs are clear upon auscultation
So,
– Rales: Absent
– Rhonchi: Absent
– Wheezing: Absent
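Explicit negation like the first example can be caught with a simple trigger-word check. The sketch below is a deliberately small illustration. The cue list, the five-token window, and the name `is_negated` are assumptions for this example, and real systems rely on much longer curated cue lists.

```python
import re

# Assumption: a handful of negation cues, far from exhaustive.
NEGATION_CUES = ("no", "not", "without", "denies", "negative for")


def is_negated(sentence, concept, window=5):
    """True if a negation cue occurs within `window` tokens before the concept.

    Returns False when the concept does not appear in the sentence.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = re.findall(r"[a-z]+", concept.lower())
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            preceding = " ".join(tokens[max(0, i - window):i])
            return any(
                re.search(rf"\b{re.escape(cue)}\b", preceding)
                for cue in NEGATION_CUES
            )
    return False


print(is_negated("There is no maxillary sinus tenderness",
                 "maxillary sinus tenderness"))
print(is_negated("Lungs are clear upon auscultation", "rales"))
```

The second call returns False because "rales" is never mentioned. That is exactly the implied-absence case from the slide, which a trigger-word approach cannot detect on its own.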
Controlled Terminologies
Controlled vocabularies or taxonomies
– Gene Ontology (gene products)
• most cited, 450 per year in PubMed
• Total of 33000+ terms
– SNOMED CT (300K+ concepts)
– NCI Thesaurus, ICD-9/10, ICD-O-3, LOINC, MedlinePlus
– UMLS Metathesaurus (integration of 140+ vocabularies)
• 2.3 million concepts
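More on the Metathesaurus
• CUIs (concept unique identifiers)
• LUIs (lexical unique identifiers)
• SUIs (string unique identifiers)
• AUIs (atom unique identifiers)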
Semantic Types and Relations
• NLM Semantic Network, the type system behind the UMLS Metathesaurus
– Semantic Types (135)
• Semantic Groups (15)
– Semantic Relations (54)
• SPECIALIST Lexicon
– Malaria, malarial
– Hyperplasia, hyperplastic
How do we extract named entities?
MetaMap from NLM
Identify phrases: Use SPECIALIST parser
Map to CUIs: Use SPECIALIST Lexicon, Metathesaurus and Semantic Network
Output of syntactic analysis
• Syntactic analysis: “ocular complications of myasthenia gravis”
– Ocular (adj), complications (noun), of (prep), myasthenia (noun), gravis (noun)
– Gives noun phrases (NP): “Ocular complications” and “Myasthenia gravis”
– Prepositions are ignored
– In a given NP, you have a head and modifiers:
• Ocular (mod) and complications (head)
• How about “male pattern baldness”?
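The phrase splitting and head/modifier labeling above can be sketched on hand-tagged tokens. This is not the SPECIALIST parser: the tags are typed in by hand, and the "last noun is the head" rule in `head_and_modifiers` is a naive assumption used only for illustration.

```python
# Tokens tagged by hand with the tags shown on the slide.
TAGGED = [
    ("ocular", "adj"), ("complications", "noun"), ("of", "prep"),
    ("myasthenia", "noun"), ("gravis", "noun"),
]


def noun_phrases(tagged):
    """Split a tagged token list into noun phrases at prepositions."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag == "prep":  # prepositions are dropped and end the phrase
            if current:
                phrases.append(current)
            current = []
        else:
            current.append((word, tag))
    if current:
        phrases.append(current)
    return phrases


def head_and_modifiers(phrase):
    """Naive heuristic (an assumption): the last noun is the head."""
    nouns = [i for i, (_, tag) in enumerate(phrase) if tag == "noun"]
    head_idx = nouns[-1] if nouns else len(phrase) - 1
    head = phrase[head_idx][0]
    modifiers = [w for i, (w, _) in enumerate(phrase) if i != head_idx]
    return head, modifiers


phrases = noun_phrases(TAGGED)
for phrase in phrases:
    print(head_and_modifiers(phrase))
```

Under this heuristic "male pattern baldness" gets "baldness" as its head with "male" and "pattern" as modifiers. Multi-word names such as "myasthenia gravis" show the limits of the rule, since the whole phrase names one condition.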
BMI
Variant Generation
Candidate identification
• Look up all variants in Metathesaurus strings and identify the candidate concepts (CUIs) whose strings contain at least one variant as a substring
• Example: For ocular complication, obtain all Metathesaurus strings that contain any of the following as substrings
– Optic complication
– Eyes complication
– Ophthalmic complicated
– ….
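The substring lookup described above can be sketched with a toy dictionary. Everything in `CONCEPT_STRINGS` is made up for illustration (the identifiers X001 to X003 are not real CUIs and the strings are not taken from the Metathesaurus), and `find_candidates` is a hypothetical helper name.

```python
# Assumption: a made-up miniature vocabulary, concept ID -> strings.
CONCEPT_STRINGS = {
    "X001": ["Ocular complication of diabetes"],
    "X002": ["Optic complication"],
    "X003": ["Myasthenia gravis"],
}

# Variants of the phrase "ocular complication", as listed on the slide.
VARIANTS = [
    "ocular complication",
    "optic complication",
    "eyes complication",
    "ophthalmic complicated",
]


def find_candidates(variants, concept_strings):
    """Return {concept_id: matched variants} for every concept that has
    at least one string containing a variant as a substring."""
    lowered = [v.lower() for v in variants]
    hits = {}
    for concept_id, strings in concept_strings.items():
        matched = sorted(
            {v for s in strings for v in lowered if v in s.lower()}
        )
        if matched:
            hits[concept_id] = matched
    return hits


candidates = find_candidates(VARIANTS, CONCEPT_STRINGS)
print(candidates)
```

A linear scan like this only works for a toy vocabulary. With millions of concept strings, an index over words or substrings would be needed.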
Mapping and Evaluation
• We now have a set of candidate CUIs, based on the presence of variants of the given phrase in Metathesaurus strings. How do we select the best candidate?
• Use several measures to compute a rank
– Centrality (involvement of head)
– Variation (average of inverse distance scores)
– Coverage
– Cohesiveness
Final Score
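One way the four measures could be combined is sketched below. It assumes the weighting given in Aronson's description of MetaMap: coverage and cohesiveness count twice as much as centrality and variation, and the result is scaled to 0 to 1000. The inverse-distance mapping 4/(D+4) for variation is assumed from the same description. Treat both formulas as assumptions to verify against the paper. The function names are hypothetical.

```python
def variation_value(distance):
    """Map a variant distance D (0 = exact match) to a value in (0, 1].

    Assumption: the inverse-distance form 4 / (D + 4).
    """
    return 4.0 / (distance + 4.0)


def final_score(centrality, variation, coverage, cohesiveness):
    """Weighted average of the four measures, scaled to 0..1000.

    Assumption: coverage and cohesiveness are weighted twice.
    All inputs are expected to lie between 0 and 1.
    """
    weighted = (
        centrality + variation + 2.0 * coverage + 2.0 * cohesiveness
    ) / 6.0
    return round(1000 * weighted)


# A candidate that involves the head, matches with no variation,
# but covers only half of the phrase.
print(final_score(centrality=1.0, variation=variation_value(0),
                  coverage=0.5, cohesiveness=0.5))
```

Candidates are then ranked by this score, highest first.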
MetaMap Options
• Types of variants: include or exclude derivational variants
• Word sense disambiguation
– Discharge (bodily secretion vs. release of the patient)
• Concept gaps
– Obstructive apnea mapping to “obstructive sleep apnea” or “obstructive neonatal apnea”
• Term processing
– Process the input string as a single concept, that is, don’t split it into noun phrases
Output options
• Human readable format
• XML format
• Restrictions based on certain vocabularies: consider only ICD-9
• Restrictions based on certain types: consider only pharmacological substances (i.e., drugs)
DEMO TIME: Daniel Harris
References
• An Overview of MetaMap: Historical Perspective and Recent Advances, Alan Aronson and Francois Lang
• Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program, Alan Aronson
• Comparison of LVG and MetaMap Functionality, Alan Aronson
• Lexical, Terminological, and Ontological Resources for Biological Text Mining, Olivier Bodenreider