Dual-Event Machine Learning Models to Accelerate Drug Discovery
Sean Ekins1,2*, Robert C. Reynolds3,4*, Hiyun Kim5, Mi-Sun Koo5, Marilyn Ekonomidis5, Meliza Talaue5, Steve D. Paget5, Lisa K. Woolhiser6, Anne J. Lenaerts6, Barry A. Bunin1, Nancy Connell5 and Joel S. Freundlich5,7*
1 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
3 Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA.
4 Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, Alabama 35294-1240, USA.
5 Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
6 Department of Microbiology, Immunology and Pathology, Colorado State University, 200 West Lake Street, CO 80523, USA.
7 Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
Tuberculosis kills 1.6-1.7 million people per year (roughly 1 every 19 seconds)
One third of the world's population is infected
Multidrug resistance in 4.3% of cases
Extensively drug-resistant TB is increasing in incidence
Only one new drug (bedaquiline) in 40 years
Drug-drug interactions and co-morbidity with HIV
Collaboration between groups is rare
These groups may work on existing or new targets
Use of computational methods with TB is rare
TB facts
streptomycin (1943)
para-aminosalicylic acid (1949)
isoniazid (1952) (Bayer, Roche, Squibb)
pyrazinamide (1954)
cycloserine (1955)
ethambutol (1962)
rifampicin (1967)
~20 public datasets for TB
Including Novartis data on TB hits
>300,000 compounds
Patents and papers annotated by CDD
Open to browse by anyone
http://www.collaborativedrug.com/register
Phenotypic screening HTS hit rates
(SRI papers)
Usually less than 1%
Bayesian Model Construction: Mtb Whole-Cell HTS
• Learning from 3,779 compounds from an NIAID library
- active: MIC < 5 µM
- inactive: MIC ≥ 5 µM
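The activity cutoff above amounts to a binary labelling step before model building. A minimal sketch follows, assuming the MIC values are expressed in the same concentration unit as the cutoff. The compound names and MIC values are made up for illustration, and `label_active` is a hypothetical helper name, not something taken from the slides.

```python
from typing import Dict

MIC_CUTOFF = 5.0  # cutoff from the slide; unit assumed to match the MIC data


def label_active(mic: float, cutoff: float = MIC_CUTOFF) -> bool:
    """Return True when the MIC is strictly below the cutoff (active)."""
    return mic < cutoff


# Hypothetical compounds and MIC values, for illustration only
mics: Dict[str, float] = {"cpd_A": 0.8, "cpd_B": 5.0, "cpd_C": 12.5}
labels = {name: label_active(mic) for name, mic in mics.items()}
print(labels)
```

A compound sitting exactly at the cutoff is labelled inactive, matching the "≥" in the inactive definition.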
Bayesian machine learning
Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
Bayesian classification is a simple probabilistic classification model. It is based on
Bayes' theorem:
p(h|d) = p(d|h) p(h) / p(d)
where
h is the hypothesis or model
d is the observed data
p(h) is the prior belief (probability of hypothesis h before observing any data)
p(d) is the data evidence (marginal probability of the data)
p(d|h) is the likelihood (probability of data d if hypothesis h is true)
p(h|d) is the posterior probability (probability of hypothesis h being true given the
observed data d)
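As a worked illustration of how the prior, likelihood, and evidence defined above combine into a posterior, here is a small sketch for a binary hypothesis ("this compound is active"). All numbers are assumptions chosen for illustration: a 1% prior, and a feature seen in 80% of actives and 10% of inactives. The function name `posterior` is hypothetical.

```python
def posterior(prior: float, likelihood: float, likelihood_if_false: float) -> float:
    """Bayes' theorem for a binary hypothesis h.

    p(d) is expanded by total probability:
    p(d) = p(d|h) p(h) + p(d|not h) (1 - p(h)).
    """
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence


# Assumed numbers, for illustration only
p_h = 0.01             # prior: 1% of screened compounds are active
p_d_given_h = 0.8      # feature present in 80% of actives
p_d_given_not_h = 0.1  # feature present in 10% of inactives

p_h_given_d = posterior(p_h, p_d_given_h, p_d_given_not_h)
print(f"{p_h_given_d:.3f}")
```

Under these assumed numbers, observing the feature raises the probability of activity from 1% to roughly 7.5%, which shows how strongly a low prior tempers even an informative feature.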
A weight is calculated for each feature using a Laplacian-adjusted probability
estimate to account for the different sampling frequencies of different features.
The weights are summed to provide a probability estimate.
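The weighting scheme can be sketched as follows. The slides do not give the formula, so this uses one commonly described form of the Laplacian correction as an assumption: for a feature seen in N_f training samples, A_f of them active, with overall active fraction P, the weight is log((A_f + 1) / (N_f * P + 1)). The feature names, the toy data, and the helper names `train_weights` and `score` are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Sample = Tuple[FrozenSet[str], bool]  # (features present, is_active)


def train_weights(samples: List[Sample]) -> Dict[str, float]:
    """Per-feature weight = log((A_f + 1) / (N_f * base_rate + 1))."""
    n_total = len(samples)
    base_rate = sum(active for _, active in samples) / n_total
    n_f: Counter = Counter()  # samples containing each feature
    a_f: Counter = Counter()  # active samples containing each feature
    for features, active in samples:
        for f in features:
            n_f[f] += 1
            if active:
                a_f[f] += 1
    return {f: math.log((a_f[f] + 1) / (n_f[f] * base_rate + 1)) for f in n_f}


def score(features: FrozenSet[str], weights: Dict[str, float]) -> float:
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy training set with hypothetical feature names
train = [
    (frozenset({"nitro", "amide"}), True),
    (frozenset({"nitro"}), True),
    (frozenset({"amide", "ester"}), False),
    (frozenset({"ester"}), False),
]
weights = train_weights(train)
print(score(frozenset({"nitro"}), weights) > score(frozenset({"ester"}), weights))
```

In this form a rarely seen feature is pulled toward a weight of zero, which is the point of the correction. Under this assumed formulation the summed value is best read as a relative score for ranking compounds rather than a calibrated probability.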