Bibliological data science and drug discovery: Knowing the knowns*
Effectively Harnessing the World’s Literature To Inform Rational Compound Design - ACS National Meeting, Philadelphia, Aug 21-24, 2016
Jeremy J Yang
Translational Informatics Division, School of Medicine, University of New Mexico
Integrative Data Science Lab, School of Informatics & Computing, Indiana University
*phrase borrowed from Edgar Jacoby, Janssen.
In science, luck favors the prepared. - Louis Pasteur
The main thing was not to . . . "foul up." - The Right Stuff, by Tom Wolfe, about John Glenn.
Overview of talk
● Formulation of problem
● Resources and examples:
TIN-X, Target Importance and Novelty Explorer (&IDG)
Chem2Bio2RDF
OPDDR, Open Phenotypic Drug Discovery Resource
DrugCentral
Formulation of problem
● "World's Literature" redefined by online revolution
● Rational Compound Design = improving our odds
● For given research question, what are the known knowns?
● Connect the dots and weigh the evidence from global knowledge graph.
TIN-X
TIN-X Target Importance & Novelty Explorer
● Bibliometric application developed for Illuminating the Druggable Genome (IDG) project
● Text mining from the lab of Lars Juhl Jensen, Novo Nordisk Foundation Center for Protein Research (U. Copenhagen).
● Algorithm and client developed at UNM (Cristian Bologa, Daniel Cannon)
● Disease Ontology (DO) classification
● Drug Target Ontology (DTO) protein classification
Illuminating the Druggable Genome (IDG)
Knowledge Mgmt Center PI:
Tudor Oprea, MD, PhD
pharos.nih.gov
TIN-X
http://newdrugtargets.org
Target Novelty:
F_k = 1 / T_k
● T_k = # targets in paper (k)
● F_k = fractional score of paper (k)
● for papers where T_k > 0
N_i = 1 / ∑(F_k)
● N_i = novelty, target (i)
● sum over papers where target (i) mentioned
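The novelty score can be sketched in a few lines of Python. This is a minimal illustration of the formulas as stated on the slide, not the TIN-X implementation: the function name `target_novelty`, the input layout (one set of target IDs per paper), and the toy target labels are all assumptions made for the example.

```python
from collections import defaultdict

def target_novelty(papers):
    """Return {target: N_i} given an iterable of per-paper target sets.

    Hypothetical helper: each paper k contributes F_k = 1 / T_k to every
    target it mentions, and N_i = 1 / sum(F_k) over papers mentioning i.
    """
    score_sum = defaultdict(float)
    for targets in papers:
        t_k = len(targets)
        if t_k == 0:          # only papers where T_k > 0 are scored
            continue
        f_k = 1.0 / t_k       # fractional score of this paper
        for target in targets:
            score_sum[target] += f_k
    return {target: 1.0 / total for target, total in score_sum.items()}

# Toy corpus (made-up target labels): four papers, one with no targets.
papers = [{"A", "B"}, {"A"}, set(), {"B", "C", "D", "E"}]
novelty = target_novelty(papers)
for target in sorted(novelty):
    print(target, round(novelty[target], 3))
```

A target mentioned often, and in papers focused on few targets, accumulates a large sum and therefore a low novelty; a target seen once in a crowded paper scores high.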
Target-Disease Importance:
F_k = 1 / (T_k * D_k)
● T_k = # targets in paper (k)
● D_k = # diseases in paper (k)
● F_k = fractional score of paper (k)
I_ij = ∑(F_k)
● I_ij = importance, target (i) for disease (j)
● sum over papers where both mentioned
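The importance score follows the same pattern. The sketch below is only an illustration of the stated formulas under assumed inputs: the function name `target_disease_importance`, the (targets, diseases) pair layout, and the toy labels are hypothetical, and papers lacking either a target or a disease are skipped because they cannot contribute to any target-disease pair.

```python
from collections import defaultdict

def target_disease_importance(papers):
    """Return {(target, disease): I_ij} from (targets, diseases) set pairs.

    Hypothetical helper: each paper k contributes F_k = 1 / (T_k * D_k)
    to every target-disease pair it co-mentions.
    """
    importance = defaultdict(float)
    for targets, diseases in papers:
        t_k, d_k = len(targets), len(diseases)
        if t_k == 0 or d_k == 0:   # no co-mention possible
            continue
        f_k = 1.0 / (t_k * d_k)    # fractional score of this paper
        for target in targets:
            for disease in diseases:
                importance[(target, disease)] += f_k
    return dict(importance)

# Toy corpus (made-up labels): targets A, B and diseases X, Y.
papers = [
    ({"A", "B"}, {"X"}),
    ({"A"}, {"X", "Y"}),
    ({"B"}, set()),
]
importance = target_disease_importance(papers)
for pair in sorted(importance):
    print(pair, importance[pair])
```

Because each paper's score is split evenly across its target-disease pairs, every paper with at least one co-mention distributes a total weight of exactly 1.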
Target Importance and Novelty Explorer (TIN-X), Daniel Cannon, Jeremy Yang, Stephen Mathias, Oleg Ursu, Subramani Mani, Anna Waller, Stephan Schürer, Lars Juhl Jensen, Larry Sklar, Cristian Bologa, and Tudor Oprea (manuscript in preparation).
TIN-X
TIN-X Target Importance & Novelty Explorer
● Text mining is a valuable tool for monitoring literature, filtering and ranking, and detecting trends.
● Automation can infer patterns regarding community trends and consensus.
● Interactive visualization tools help navigate big data.
● Good big data text miners care about small data too!
TIN-X Key contributors
Cristian Bologa, Daniel Cannon, Lars Juhl Jensen
Chem2Bio2RDF
● 24 sources, 52 datasets, 78M triples
● Semantically linked
● Chen, B, et al, BMC Bioinformatics (2010).
● Chen, B et al, PLoS Comp Bio (2012).
● Fu, G et al, BMC Bioinfo (2016).
● Related projects: Bio2RDF, LOD
http://chem2bio2rdf.org
Classes: biological, chemical, chemogenomics, literature, phenotype, systems, disease, pathway, polypharmacology, PPI, side effect
Sources: BindingDB, BindingMOAD, IU, ChEBI, ChEMBL, CTD, DCDB, DIP, DrugBank, HGNC, HPRD, KEGG, MATADOR, OMIM, PDBe, PDSP, PharmGKB, PubChem, PubMed, Reactome, SIDER, TTD, UniProt
Linked Open Data (LOD)
http://linkeddata.org/
Chem2Bio2RDF apps: (1) SLAP (2012), (2) Metapaths (2016)
● Data semantics essential for integration of heterogeneous sources
● Strong evidence requires strong semantics
● Semantic Web technologies: a common framework enabling, but not assuring, community progress
● Chem2Bio2RDF v2.0 to leverage major community advances (esp. Open PHACTS)
● Data ecosystems, co-opetition & prisoner's dilemma
Key contributors
Bin Chen, Ying Ding, David Wild
OPDDR
Open Phenotypic Drug Discovery Resource
https://ncats.nih.gov/expertise/preclinical/pd2
OPDDR collaboration
Example: OIDD HeLa cell based assay
Integrated RDF
bioassay:AID1117350 skos:exactMatch oidd_assay:17 .
bioassay:AID1117350 dcterms:source source:ID846 ; dcterms:title "Increased chromatin condensation in HeLa cells-IC50"@en .
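To make the linkage concrete, the statements above can be read as subject-predicate-object triples and queried with plain Python. This is a rough sketch only: it assumes the first statement splits as `bioassay:AID1117350`, `skos:exactMatch`, `oidd_assay:17`, keeps the prefixed names as opaque strings (the slide does not give the namespace URIs), drops the `@en` language tag, and uses a hypothetical helper named `objects`. A real pipeline would more likely load the Turtle into an RDF library and query it with SPARQL.

```python
# Triples transcribed from the slide, kept as prefixed-name strings.
triples = [
    ("bioassay:AID1117350", "skos:exactMatch", "oidd_assay:17"),
    ("bioassay:AID1117350", "dcterms:source", "source:ID846"),
    ("bioassay:AID1117350", "dcterms:title",
     "Increased chromatin condensation in HeLa cells-IC50"),
]

def objects(triples, subject, predicate):
    """Hypothetical helper: all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

assay = "bioassay:AID1117350"
print("same assay in OIDD:", objects(triples, assay, "skos:exactMatch"))
print("source:", objects(triples, assay, "dcterms:source"))
print("title:", objects(triples, assay, "dcterms:title"))
```

The `skos:exactMatch` statement is what ties the bioassay record to its OIDD counterpart, so following that one predicate is enough to move between the two resources.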
● OPDDR phenotypic assays have been linked and integrated via community semantics to both phenotypic (cell lines) and molecular (genomic/protein targets) entities
● New phenotypic knowledge domain offers additional value in drug discovery and pharmacological informatics
● Open PHACTS is an excellent, well-suited platform
DrugCentral
● DrugCentral is a free, open, curated resource about approved drugs, designed for research
● Compounds, products, labels, targets, IDs, names
● DrugCentral developed over several years at UNM
● DrugCentral recently released with new interface
● License: CC-BY-SA
http://drugcentral.org
DrugCentral
● Free, open, accurate, comprehensive drug reference for biomolecular and biomedical informatics research
Compounds 4444
Products 84787
Synonyms 20522
Structures 4231
Targets 3651
Bioactivities 15620
MoA 3484
SNOMED 45349
"DrugCentral: online drug compendium", Oleg Ursu, Jayme Holmes, Jeffrey Knockel, Cristian Bologa, Jeremy Yang, Stephen Mathias, Stuart Nelson, Tudor Oprea (manuscript submitted).
In Conclusion
● New resources continue to emerge and evolve, providing opportunities for knowledge driven drug discovery
● Community standards → more intelligent web
● Adapt to new data environment for success
● Private + public data must be integrated to
○ Be prepared (like Pasteur)
○ Not "foul up" (like Glenn)