Lars Juhl Jensen
Pragmatic text mining
From literature to electronic health records
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
dictionary-based approach
identification required
dictionary
cyclin dependent kinase 1
CDC2
expansion rules
CDC2
hCdc2
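The expansion rules above turn a dictionary entry such as CDC2 into additional variants such as hCdc2. Below is a minimal sketch assuming two hypothetical rules: a capitalized form, and an "h" (human) prefix on that form. Both rules are applied only to single-token names like gene symbols. These are illustrations, not the rule set of any real tagger.

```python
def expand_name(name: str) -> set:
    """Return a dictionary name plus variants generated by simple rules.

    The two rules below are hypothetical examples, applied only to
    single-token names such as gene symbols.
    """
    variants = {name}
    if " " in name:
        # Multi-word names like "cyclin dependent kinase 1" are left alone.
        return variants
    # Rule 1: capitalized form, e.g. "CDC2" -> "Cdc2".
    capitalized = name[:1].upper() + name[1:].lower()
    variants.add(capitalized)
    # Rule 2: prefix "h" (for human) on the capitalized form, e.g. "hCdc2".
    variants.add("h" + capitalized)
    return variants


print(sorted(expand_name("CDC2")))  # ['CDC2', 'Cdc2', 'hCdc2']
```

Restricting the rules to short symbols stops them from producing nonsense such as "hCyclin dependent kinase 1".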
flexible matching
hyphens and spaces
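One simple way to make hyphens and spaces interchangeable is to strip them from both dictionary names and candidate text before lookup. The sketch below does this. The names `match_key` and `build_lookup` and the entity ID "CDK1" are hypothetical. Matching stays case-sensitive.

```python
import re

# Runs of hyphens and/or whitespace are treated as one interchangeable separator.
SEPARATORS = re.compile(r"[\s\-]+")


def match_key(name: str) -> str:
    """Canonical lookup key in which hyphens and spaces are removed."""
    return SEPARATORS.sub("", name)


def build_lookup(dictionary: dict) -> dict:
    """Index a {name: entity_id} dictionary by its canonical key."""
    return {match_key(name): entity_id for name, entity_id in dictionary.items()}


lookup = build_lookup({"cyclin dependent kinase 1": "CDK1"})
print(lookup.get(match_key("cyclin-dependent kinase 1")))  # CDK1
```

Removing separators entirely is the most permissive choice. It can also make distinct names collide, so it works best alongside a blacklist.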
“black list”
SDS
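A blacklist removes names that are valid dictionary synonyms but usually mean something else in running text. SDS, for example, most often refers to the detergent sodium dodecyl sulfate. Below is a minimal sketch, assuming matches arrive as (surface text, entity ID) pairs. The blacklist contents and the IDs are hypothetical.

```python
# Hypothetical blacklist of ambiguous surface forms.
BLACKLIST = {"SDS"}


def filter_matches(matches):
    """Drop matches whose surface text is blacklisted.

    `matches` is a list of (surface_text, entity_id) pairs.
    """
    return [(text, eid) for text, eid in matches if text not in BLACKLIST]


print(filter_matches([("SDS", "GENE:SDS"), ("CDC2", "GENE:CDK1")]))
# [('CDC2', 'GENE:CDK1')]
```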
efficient tagger
Pafilis et al., PLOS ONE, 2013
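This is not the tagger described by Pafilis et al. It is a simplified sketch of one common strategy for fast dictionary tagging: tokenize once, then take a greedy longest match of up to `max_tokens` tokens using constant-time hash lookups. Splitting on non-word characters makes hyphens and spaces equivalent, and a blacklist removes ambiguous names. All names, IDs and the `max_tokens` default are assumptions.

```python
import re

TOKEN = re.compile(r"\w+")


def key(name: str) -> str:
    # Tokenizing on non-word characters makes hyphens and spaces equivalent.
    return " ".join(TOKEN.findall(name))


def build_index(dictionary: dict) -> dict:
    """Map normalized names to entity IDs for O(1) lookup."""
    return {key(name): eid for name, eid in dictionary.items()}


def tag(text, index, blacklist=frozenset(), max_tokens=6):
    """Greedy longest-match tagging; returns (start, end, surface, entity_id)."""
    tokens = list(TOKEN.finditer(text))
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, then shrink it.
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            candidate = " ".join(t.group() for t in window)
            if candidate in index and candidate not in blacklist:
                start, end = window[0].start(), window[-1].end()
                matches.append((start, end, text[start:end], index[candidate]))
                i += n
                break
        else:
            i += 1  # No match starts at this token.
    return matches


index = build_index({"cyclin dependent kinase 1": "CDK1", "CDC2": "CDK1", "SDS": "SDS"})
for match in tag("cyclin-dependent kinase 1 (CDC2) was run on SDS gels", index, {"SDS"}):
    print(match)
```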
the formal way
benchmark
manually annotated corpus
automatic tagging
precision
recall
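Against a manually annotated corpus, precision is the fraction of automatic tags that are correct, and recall is the fraction of gold-standard tags that were found. Below is a minimal sketch, assuming each tag is a hashable (document, start, end, entity) tuple and that only exact matches count as correct. Real benchmarks often also credit partial overlaps.

```python
def precision_recall(gold: set, predicted: set) -> tuple:
    """Compare automatic tags with a manually annotated corpus.

    Each tag is a (document_id, start, end, entity_id) tuple; a prediction
    counts as correct only if it matches a gold tag exactly.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {("d1", 0, 4, "A"), ("d1", 10, 14, "B")}
predicted = {("d1", 0, 4, "A"), ("d1", 20, 24, "C")}
print(precision_recall(gold, predicted))  # (0.5, 0.5)
```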
natural language processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
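The bracketed sentence shows chunks labeled nxexpr, nxgene and nxpg, with "is controlled by" as the relation verb. Below is a minimal sketch, assuming this bracket notation. It parses the chunks into a tree, collects the names inside nxpg chunks, and turns the passive cue verb into regulator-target pairs. The cue phrase and the relation label "regulates" are illustrative assumptions, not an actual NLP pipeline.

```python
import re

SENTENCE = ("[nxexpr The expression of [nxgene the cytochrome genes "
            "[nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]")


def parse_chunks(annotated: str):
    """Parse "[label ...]" notation into nested (label, children) tuples."""
    stack = [("ROOT", [])]
    for tok in re.findall(r"\[\w+|\]|[^\s\[\]]+", annotated):
        if tok.startswith("["):
            stack.append((tok[1:], []))  # open a chunk
        elif tok == "]":
            chunk = stack.pop()  # close the innermost chunk
            stack[-1][1].append(chunk)
        else:
            stack[-1][1].append(tok)  # plain word
    return stack[0]


def gene_names(node):
    """Collect words inside nxpg chunks, skipping the coordinator 'and'."""
    label, children = node
    names = []
    for child in children:
        if isinstance(child, tuple):
            names.extend(gene_names(child))
        elif label == "nxpg" and child != "and":
            names.append(child)
    return names


def extract_passive(tree, cue=("is", "controlled", "by")):
    """Split the top-level chunks at the cue; right side regulates left side."""
    _, children = tree
    words = [c if isinstance(c, str) else None for c in children]
    for i in range(len(children) - len(cue) + 1):
        if tuple(words[i:i + len(cue)]) == cue:
            targets = gene_names(("ROOT", children[:i]))
            regulators = gene_names(("ROOT", children[i + len(cue):]))
            return [(r, "regulates", t) for r in regulators for t in targets]
    return []


print(extract_passive(parse_chunks(SENTENCE)))
# [('HAP1', 'regulates', 'CYC1'), ('HAP1', 'regulates', 'CYC7')]
```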
hard work
the pragmatic way
“benchmark light”
requires fewer calories
non-annotated corpus
automatic tagging
random inspection
precision
no recall
relative recall
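Without an annotated corpus, precision can still be estimated by manually judging a random sample of tagger output, but absolute recall is unknown. One plausible reading of "relative recall", assumed here rather than taken from the slides, compares the estimated number of true positives (precision times number of matches) of two taggers or dictionary versions run on the same corpus. The function names below are hypothetical.

```python
import random


def sample_for_inspection(matches, k, seed=0):
    """Draw a reproducible random sample of tagger output for manual review."""
    rng = random.Random(seed)
    return rng.sample(matches, min(k, len(matches)))


def estimated_precision(judgements):
    """Fraction of manually inspected matches judged correct (True)."""
    return sum(judgements) / len(judgements)


def relative_recall(precision_a, n_a, precision_b, n_b):
    """Estimated true positives of tagger A relative to tagger B (assumed definition)."""
    return (precision_a * n_a) / (precision_b * n_b)


# Example: version A finds 1000 matches at 90% precision, B 900 at 80%.
print(relative_recall(0.9, 1000, 0.8, 900))
```

A value above 1 suggests that A finds more correct mentions than B, even though neither tagger's absolute recall is known.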
co-mentioning
within documents
within paragraphs
within sentences
weighted score
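The co-mentioning slides count pairs of entities mentioned together in the same document, paragraph or sentence, and combine these counts into a weighted score. Below is a minimal sketch under stated assumptions. The weights are placeholders, not tuned values. The final score, a geometric blend (exponent `alpha`) of the weighted count and its observed/expected ratio, is one reasonable choice and not necessarily the formula used by any specific resource. A corpus is a list of documents, each a list of paragraphs, each a list of sentences given as sets of entity IDs.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder weights: a pair in the same sentence collects all three.
W_DOC, W_PAR, W_SENT = 1.0, 2.0, 4.0


def weighted_counts(corpus):
    """Sum weighted co-mentions per unordered entity pair."""
    counts = defaultdict(float)
    for document in corpus:
        doc_entities, par_pairs, sent_pairs = set(), set(), set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                sent_pairs.update(combinations(sorted(sentence), 2))
                par_entities |= set(sentence)
            par_pairs.update(combinations(sorted(par_entities), 2))
            doc_entities |= par_entities
        # Each level counts a pair at most once per document.
        for pair in combinations(sorted(doc_entities), 2):
            counts[pair] += W_DOC
        for pair in par_pairs:
            counts[pair] += W_PAR
        for pair in sent_pairs:
            counts[pair] += W_SENT
    return counts


def co_mention_scores(counts, alpha=0.6):
    """Blend the raw weighted count with its observed/expected ratio."""
    marginal = defaultdict(float)
    for (a, b), c in counts.items():
        marginal[a] += c
        marginal[b] += c
    total = sum(marginal.values())
    return {
        (a, b): c ** alpha * (c * total / (marginal[a] * marginal[b])) ** (1 - alpha)
        for (a, b), c in counts.items()
    }


corpus = [
    [[{"A", "B"}, {"C"}]],  # one paragraph; A and B share a sentence
    [[{"A"}], [{"C"}]],     # A and C only share the document
]
counts = weighted_counts(corpus)
print(dict(counts))
print(max(co_mention_scores(counts).items(), key=lambda kv: kv[1]))
```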
unifying text & data
web resources
text mining
curated knowledge
Letunic & Bork, Trends in Biochemical Sciences, 2008
experimental data
von Mering et al., Nucleic Acids Research, 2005
computational predictions
common identifiers
quality scores
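Unifying text mining, curated knowledge, experimental data and predictions depends on mapping every source to common identifiers and putting their evidence on comparable quality scores. Below is a minimal sketch, assuming each source already provides a calibrated score between 0 and 1. Scores for the same pair are combined as if the channels were independent, using 1 - Π(1 - s). The alias table and this combination rule are illustrative assumptions, not a description of how any particular resource combines its channels.

```python
from collections import defaultdict
from math import prod

# Hypothetical alias table mapping source-specific names to common identifiers.
ALIASES = {"CDC2": "CDK1", "Cdc2": "CDK1", "CDK1": "CDK1",
           "CCNB1": "CCNB1", "Cyclin B1": "CCNB1"}


def unify(evidence):
    """Combine (source, name_a, name_b, score) evidence per identifier pair."""
    per_pair = defaultdict(list)
    for _source, a, b, score in evidence:
        pair = tuple(sorted((ALIASES[a], ALIASES[b])))
        per_pair[pair].append(score)
    # Probability that at least one independent channel is right.
    return {pair: 1 - prod(1 - s for s in scores) for pair, scores in per_pair.items()}


evidence = [("text mining", "CDC2", "Cyclin B1", 0.5),
            ("experiments", "CCNB1", "CDK1", 0.6)]
print(unify(evidence))  # one pair, ('CCNB1', 'CDK1'), with a score of about 0.8
```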
proteins
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
small molecules
Kuhn et al., Nucleic Acids Research, 2012
compartments
compartments.jensenlab.org
tissues
tissues.jensenlab.org
diseases
electronic health records
Jensen et al., Nature Reviews Genetics, 2012
structured data
Jensen et al., Nature Reviews Genetics, 2012
unstructured data
clinical narrative
Danish
busy doctors
psychiatric patients
pharmacovigilance
structured data
medication
text mining
drug indications
adverse drug events
temporal correlation
complex filtering
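The pharmacovigilance slides combine structured medication data with text-mined terms from the clinical narrative, then use temporal correlation and filtering. Below is a minimal sketch of that idea with several assumptions. A term counts for a drug only if the patient's first mention of it falls within a hypothetical 90-day window after the drug was started. Earlier mentions are treated as pre-existing conditions, and terms listed as indications for the drug are discarded. The data layout, window and indication table are all illustrative.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def candidate_ades(prescriptions, mentions, indications, window_days=90):
    """Count (drug, term) pairs where the term first appears after drug start.

    prescriptions: list of (patient, drug, start_date) from structured data
    mentions: list of (patient, term, date) text-mined from clinical notes
    indications: dict drug -> set of terms that are indications (filtered out)
    """
    first_seen = {}
    for patient, term, when in mentions:
        key = (patient, term)
        if key not in first_seen or when < first_seen[key]:
            first_seen[key] = when
    by_patient = defaultdict(list)
    for (patient, term), when in first_seen.items():
        by_patient[patient].append((term, when))
    window = timedelta(days=window_days)
    counts = Counter()
    for patient, drug, start in prescriptions:
        for term, when in by_patient[patient]:
            if term in indications.get(drug, set()):
                continue  # indication, not an adverse event
            if start < when <= start + window:
                counts[(drug, term)] += 1
    return counts


prescriptions = [("p1", "citalopram", date(2010, 1, 1)),
                 ("p2", "citalopram", date(2010, 1, 1))]
mentions = [("p1", "psychosis", date(2010, 2, 1)),
            ("p1", "depression", date(2010, 2, 1)),
            ("p2", "psychosis", date(2009, 12, 1)),  # before start: excluded
            ("p2", "psychosis", date(2010, 2, 1))]
print(candidate_ades(prescriptions, mentions, {"citalopram": {"depression"}}))
```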
Eriksson et al., submitted, 2013
Eriksson et al., submitted, 2013
Drug substance       ADE                  p-value
Chlordiazepoxide     Nystagmus            4.0e-8
Simvastatin          Personality changes  8.4e-8
Dipyridamole         Visual impairment    4.4e-4
Citalopram           Psychosis            8.8e-4
Bendroflumethiazide  Apoplexy             8.5e-3
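The slides do not say how these p-values were computed. As a generic illustration only, suppose each drug-ADE pair is summarized as a 2x2 table of patient counts (exposed or not, ADE or not). A one-sided Fisher's exact test for over-representation of the ADE among exposed patients can then be computed from hypergeometric probabilities, as sketched below. Treat this as an assumed method, not the one behind the table.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table

        [[a, b],   exposed:   with ADE, without ADE
         [c, d]]   unexposed: with ADE, without ADE

    Returns P(X >= a) for X hypergeometric with the observed margins.
    """
    n_exposed = a + b
    n_ade = a + c
    total = a + b + c + d
    upper = min(n_exposed, n_ade)
    tail = sum(comb(n_ade, k) * comb(total - n_ade, n_exposed - k)
               for k in range(a, upper + 1))
    return tail / comb(total, n_exposed)


print(fisher_one_sided(1, 0, 0, 1))  # 0.5
```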
Acknowledgments
Protein networks
Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Jean Muller, Tobias Doerks, Alexander Roth, Milan Simonovic, Berend Snel, Martijn Huynen, Peer Bork
Localization and disease
Sune Frankild, Alberto Santos, Kalliopi Tsafou, Janos Binder, Reinhard Schneider, Sean O’Donoghue
Electronic health records
Robert Eriksson, Thomas Werge, Søren Brunak