Dual-Event Machine Learning Models to Accelerate Drug Discovery

Sean Ekins 1,2*, Robert C. Reynolds 3,4*, Hiyun Kim 5, Mi-Sun Koo 5, Marilyn Ekonomidis 5, Meliza Talaue 5, Steve D. Paget 5, Lisa K. Woolhiser 6, Anne J. Lenaerts 6, Barry A. Bunin 1, Nancy Connell 5 and Joel S. Freundlich 5,7*

1 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
3 Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA.
4 Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, Alabama 35294-1240, USA.
5 Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
6 Department of Microbiology, Immunology and Pathology, Colorado State University, 200 West Lake Street, CO 80523, USA.
7 Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.

Jan 28, 2015


Sean Ekins

ACS talk 2013
Page 1: dual-event machine learning models to accelerate drug discovery


Page 2: dual-event machine learning models to accelerate drug discovery

Tuberculosis kills 1.6-1.7 million people/yr (~1 every 8 seconds)
1/3rd of the world's population infected!

Multi-drug resistance in 4.3% of cases
Extensively drug-resistant TB: increasing incidence
One new drug (bedaquiline) in 40 yrs

Drug-drug interactions and co-morbidity with HIV

Collaboration between groups is rare
These groups may work on existing or new targets
Use of computational methods with TB is rare

TB facts

streptomycin (1943)
para-aminosalicylic acid (1949)
isoniazid (1952) (Bayer, Roche, Squibb)
pyrazinamide (1954)
cycloserine (1955)
ethambutol (1962)
rifampicin (1967)

Page 3: dual-event machine learning models to accelerate drug discovery

~20 public datasets for TB
Including Novartis data on TB hits
>300,000 cpds

Patents, Papers Annotated by CDD

Open to browse by anyone

http://www.collaborativedrug.com/

register

Page 4: dual-event machine learning models to accelerate drug discovery

Phenotypic screening HTS Hit rates

SRI papers

Usually less than 1%

Page 5: dual-event machine learning models to accelerate drug discovery

Bayesian Model Construction: Mtb Whole-Cell HTS

• Learning from 3,779 compounds from an NIAID library
 - active: MIC < 5 uM
 - inactive: MIC ≥ 5 uM

Page 6: dual-event machine learning models to accelerate drug discovery

Bayesian machine learning

Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010

Bayesian classification is a simple probabilistic classification model. It is based on Bayes' theorem:

p(h|d) = p(d|h) p(h) / p(d)

h is the hypothesis or model
d is the observed data
p(h) is the prior belief (probability of hypothesis h before observing any data)
p(d) is the data evidence (marginal probability of the data)
p(d|h) is the likelihood (probability of data d if hypothesis h is true)
p(h|d) is the posterior probability (probability of hypothesis h being true given the observed data d)

A weight is calculated for each feature using a Laplacian-adjusted probability estimate to account for the different sampling frequencies of different features.

The weights are summed to provide a probability estimate
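The weighting and summing described above can be sketched as follows. This is a minimal Python sketch of the commonly published form of the Laplacian-corrected estimate, not necessarily the exact Discovery Studio implementation. The function names, the feature names, and the base rate of 0.019 are assumptions chosen for illustration; the counts for `feat_a` and `feat_b` mirror the G1 and B1 entries on page 9.

```python
import math

def laplacian_weight(n_active_with_feature, n_with_feature, base_rate):
    """Log of the Laplacian-corrected relative activity estimate for one feature.

    base_rate is the overall fraction of actives in the training set.
    The +1 terms pull the estimate for rarely sampled features toward the
    base rate, so their weight tends to 0.
    """
    corrected = (n_active_with_feature + 1) / (n_with_feature * base_rate + 1)
    return math.log(corrected)

def bayesian_score(features_in_molecule, weights):
    """Sum the weights of the features present in a molecule.

    Features never seen in training contribute 0 (no evidence either way).
    """
    return sum(weights.get(f, 0.0) for f in features_in_molecule)

# Hypothetical counts: feature -> (actives containing it, molecules containing it)
counts = {"feat_a": (73, 165), "feat_b": (0, 1158), "feat_c": (1, 50)}
BASE_RATE = 0.019  # assumed fraction of actives; not stated in the slides

weights = {f: laplacian_weight(a, n, BASE_RATE) for f, (a, n) in counts.items()}
score = bayesian_score({"feat_a", "feat_c", "unseen"}, weights)
print({f: round(w, 3) for f, w in weights.items()}, round(score, 3))
```

Under these assumed inputs, a feature found in 73 actives out of 165 molecules receives a weight of about 2.88, one found in 0 actives out of 1158 receives about -3.14, and a sparsely sampled feature (1 of 50) stays close to 0.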

Page 7: dual-event machine learning models to accelerate drug discovery

Novel Bayesian Models for Mtb Whole-Cell Efficacy

SRI MLSMR 220K single point model
active: ≥90% inhibition @ 10 uM; inactive: <90% inhibition @ 10 uM

SRI MLSMR 2.5K dose response model
active: IC50 ≤ 5 uM; inactive: IC50 > 5 uM

Ekins, S. et al., Mol. Biosyst. 2010, 6, 840-51; Ekins, S. et al., Mol. Biosyst. 2010, 6, 2316-2324.

• Laplacian-corrected Bayesian classifier models (Accelrys Discovery Studio)

• Molecular function class fingerprints of maximum diameter 6 (FCFP_6)

• Simple molecular descriptors chosen including AlogP, molecular weight, # rotatable bonds, # rings, # hydrogen bond acceptors, # hydrogen bond donors, and polar surface area

• Validated w/ leave-one-out cross-validation & leave-50%-out cross-validation

Model Building and Validation
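The leave-50%-out validation step can be illustrated with a short Python sketch: split the data in half at random, fit on one half, evaluate on the other, and repeat. The threshold "model", the toy data, and all function names here are hypothetical stand-ins for the Bayesian classifier, used only to make the loop runnable.

```python
import random
from statistics import mean, stdev

def leave_half_out(records, fit, evaluate, n_rounds=100, seed=0):
    """Repeat a random 50/50 train/test split and summarise the test metric.

    records  : list of training examples (any structure)
    fit      : callable(train) -> model
    evaluate : callable(model, test) -> float metric
    Returns (mean, standard deviation) of the metric over n_rounds splits.
    """
    rng = random.Random(seed)
    half = len(records) // 2
    results = []
    for _ in range(n_rounds):
        shuffled = list(records)
        rng.shuffle(shuffled)
        model = fit(shuffled[:half])
        results.append(evaluate(model, shuffled[half:]))
    return mean(results), stdev(results)

# Toy stand-in for a real model: classify by a threshold on one descriptor.
def fit_threshold(train):
    actives = [x for x, is_active in train if is_active]
    inactives = [x for x, is_active in train if not is_active]
    if not actives or not inactives:  # degenerate split: fall back to the mean
        return sum(x for x, _ in train) / len(train)
    return (sum(actives) / len(actives) + sum(inactives) / len(inactives)) / 2

def concordance(threshold, test):
    """Fraction of held-out examples classified correctly."""
    correct = sum(1 for x, is_active in test if (x > threshold) == is_active)
    return correct / len(test)

# Hypothetical data: (descriptor value, active?) pairs
data = [(float(i), i >= 10) for i in range(20)]
avg, sd = leave_half_out(data, fit_threshold, concordance)
print(f"concordance = {avg:.2f} ± {sd:.2f} over 100 splits")
```

Reporting the mean and standard deviation over many random splits is what produces "value ± spread" entries in a validation table.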

Page 8: dual-event machine learning models to accelerate drug discovery

Bayesian Classification TB Models

Dataset (number of molecules) | External ROC Score | Internal ROC Score | Concordance | Specificity | Sensitivity
MLSMR all single point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96

We can use the public data for machine learning model building
Using Discovery Studio Bayesian model
Leave out 50% x 100

Ekins et al., Mol BioSyst, 6: 840-851, 2010
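For readers unfamiliar with the columns in the table above, the following Python sketch shows one common way such statistics are computed from model scores and known labels. It assumes concordance means overall percent correct and that a score cutoff has been chosen; the slides do not state how Discovery Studio defines or thresholds these, so treat the definitions and function names as assumptions.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outscores a random inactive."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    # Ties count as half a win (Mann-Whitney formulation).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def classification_stats(scores, labels, cutoff):
    """Percent concordance, specificity and sensitivity at a given score cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y)
    tn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and not y)
    fp = sum(1 for s, y in zip(scores, labels) if s > cutoff and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y)
    return {
        "concordance": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
    }

# Hypothetical scores and activity labels for four molecules
scores = [0.9, 0.8, 0.3, 0.1]
labels = [True, False, True, False]
print(roc_auc(scores, labels))
print(classification_stats(scores, labels, cutoff=0.5))
```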

Page 9: dual-event machine learning models to accelerate drug discovery

Bayesian Classification Models for TB

Good features:
G1: 1704324327 - 73 out of 165 good - Bayesian Score: 2.885
G2: -2092491099 - 57 out of 120 good - Bayesian Score: 2.873
G3: -1230843627 - 75 out of 188 good - Bayesian Score: 2.811
G4: 940811929 - 35 out of 65 good - Bayesian Score: 2.780
G5: 563485513 - 123 out of 357 good - Bayesian Score: 2.769

Bad features:
B1: 1444982751 - 0 out of 1158 good - Bayesian Score: -3.135
B2: 274564616 - 0 out of 1024 good - Bayesian Score: -3.018
B3: -1775057221 - 0 out of 982 good - Bayesian Score: -2.978
B4: 48625803 - 0 out of 740 good - Bayesian Score: -2.712
B5: 899570811 - 0 out of 738 good - Bayesian Score: -2.709

Active compounds: MIC < 5 uM

Laplacian-corrected Bayesian classifier models were generated using FCFP_6 and simple descriptors. Two models: 220,000 and >2,000 compounds.

Ekins et al., Mol BioSyst, 6: 840-851, 2010

Page 10: dual-event machine learning models to accelerate drug discovery

Bayesian Classification Dose response

[Figure: good and bad features, dose response model]

Ekins et al., Mol BioSyst, 6: 840-851, 2010

Page 11: dual-event machine learning models to accelerate drug discovery

Test sets: 100K library, Novartis Data, FDA drugs

Additional test sets

Suggests models can predict data from the same and independent labs
Enrichments 4-10 fold
Initial enrichment: enables screening few compounds to find actives

21 hits in 2108 cpds
34 hits in 248 cpds
1702 hits in >100K cpds

Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011.
Ekins et al., Mol BioSyst, 6: 840-851, 2010
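The fold enrichment quoted above compares the hit rate in the model-selected subset with the hit rate of the whole test set. A minimal Python sketch follows; the 21 hits in 2108 compounds come from the slide, while the "10 hits in the top 100" is a hypothetical selection invented for illustration, and the function names are assumptions.

```python
def enrichment_factor(hits_selected, n_selected, hits_total, n_total):
    """Hit rate in the selected subset divided by the hit rate of the full set."""
    if n_selected <= 0 or n_total <= 0 or hits_total <= 0:
        raise ValueError("counts must be positive")
    return (hits_selected / n_selected) / (hits_total / n_total)

def hits_in_top(scores, labels, n_selected):
    """Count known hits among the n_selected highest-scoring molecules."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, is_hit in ranked[:n_selected] if is_hit)

# Baseline from the slide: 21 hits in 2108 compounds (about a 1% hit rate).
# Hypothetical outcome: the 100 top-scoring compounds contain 10 of those hits.
ef = enrichment_factor(hits_selected=10, n_selected=100, hits_total=21, n_total=2108)
print(f"enrichment = {ef:.1f}-fold")
```

An enrichment near 10-fold in this hypothetical case would mean that testing about 5% of the set recovers roughly half of its hits.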

Page 12: dual-event machine learning models to accelerate drug discovery

Testing to date has been retrospective

Can we use our models to select compounds and influence design?

Prospective prediction

Do it enough times to show robustness

Testing prospectively

Page 13: dual-event machine learning models to accelerate drug discovery

Ranked Asinex 25K library with MLSMR dose response model: Bayesian score range -28.4 to 15.3

99 compounds screened (Bayesian score 9.4 – 15.3).

12 cpds were identified with IC90 < 30 ug/mL

~12% hit rate

Most active SYN 22269076

Pyrazolo[1,5-a]pyrimidine

IC50 1.1 ug/mL (3.2 uM)

Bayesian Machine Learning Models – testing

Bayesian Score: 14.9, 10.6, 9.8
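Because the slides quote potency in both ug/mL and uM, a small conversion helper is useful when comparing cutoffs. The Python sketch below assumes a molecular weight of 343.75 g/mol, which is simply back-calculated from the pair "1.1 ug/mL (3.2 uM)" and is not stated in the slides; the function names are also assumptions.

```python
def ug_per_ml_to_um(conc_ug_per_ml, mol_weight):
    """Convert ug/mL to uM: 1 ug/mL equals 1000 / MW micromolar."""
    return conc_ug_per_ml * 1000.0 / mol_weight

def um_to_ug_per_ml(conc_um, mol_weight):
    """Convert uM to ug/mL (inverse of ug_per_ml_to_um)."""
    return conc_um * mol_weight / 1000.0

ASSUMED_MW = 343.75  # g/mol, back-calculated for illustration only

print(round(ug_per_ml_to_um(1.1, ASSUMED_MW), 2))   # potency of the hit in uM
print(round(um_to_ug_per_ml(10.0, ASSUMED_MW), 2))  # a 10 uM cutoff expressed in ug/mL
```

For a molecule of this assumed size, a 10 uM cutoff and a 10 ug/mL cutoff differ by roughly a factor of three, which is worth keeping in mind when activity thresholds are defined in different units for different datasets.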

Page 14: dual-event machine learning models to accelerate drug discovery

Some follow up compounds for the Asinex hit

Page 15: dual-event machine learning models to accelerate drug discovery

Principal component analysis (PCA) of all SRI data sets to illustrate overlap of chemistry space using the datasets from this study (red = TAACF-CB2, green = MLSMR, black = kinase dataset). 3 PCs explain 72% of the variance.

Page 16: dual-event machine learning models to accelerate drug discovery

Top scoring molecules assayed for Mtb growth inhibition

Mtb screening molecule database

High-throughput phenotypic Mtb screening

Descriptors + Bioactivity (+Cytotoxicity)

Bayesian Machine Learning Mtb Model

Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models

New bioactivity data may enhance models

Identify in vitro hits

Increased hit/lead discovery efficiency

Dual-Event models

Page 17: dual-event machine learning models to accelerate drug discovery

Dual-Event models

Become more stringent in what we call an ACTIVE

IC90 < 10 ug/mL (CB2) or < 10 uM (MLSMR) and a selectivity index (SI) greater than ten.

SI was calculated as SI = CC50/IC90, where CC50 is the concentration that resulted in 50% inhibition of Vero cells.
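The dual-event definition above can be written as a simple labelling rule. This is a minimal Python sketch assuming IC90 and CC50 are expressed in the same units for a given dataset; the function names and the example values are hypothetical.

```python
def selectivity_index(cc50, ic90):
    """SI = CC50 / IC90, with both values in the same concentration units."""
    if ic90 <= 0:
        raise ValueError("IC90 must be positive")
    return cc50 / ic90

def is_dual_event_active(ic90, cc50, ic90_cutoff=10.0, si_cutoff=10.0):
    """Active only if potent (IC90 below cutoff) AND selective (SI above cutoff)."""
    return ic90 < ic90_cutoff and selectivity_index(cc50, ic90) > si_cutoff

# Hypothetical compounds: (name, IC90, Vero CC50)
compounds = [
    ("potent_and_selective", 2.0, 40.0),   # SI = 20 -> active
    ("potent_but_cytotoxic", 2.0, 10.0),   # SI = 5  -> inactive
    ("weak_but_non_toxic", 15.0, 500.0),   # IC90 too high -> inactive
]
for name, ic90, cc50 in compounds:
    print(name, is_dual_event_active(ic90, cc50))
```

Compounds that are potent but cytotoxic are labelled inactive under this rule, which is the point of training on both events rather than on whole-cell potency alone.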

Page 18: dual-event machine learning models to accelerate drug discovery

Bayesian Classification TB Models

Dataset (number of molecules) | External ROC Score | Internal ROC Score | Concordance | Specificity | Sensitivity
MLSMR all single point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
NEW dose resp and cytotoxicity (N = 2273) | 0.82 ± 0.02 | 0.84 ± 0.02 | 82.61 ± 4.68 | 83.91 ± 5.48 | 65.99 ± 7.47

ROC XV AUC: single pt = 0.88; dose resp = 0.78; dose resp + cyto = 0.86

Ekins et al., PLoS ONE, in press 2013

Page 19: dual-event machine learning models to accelerate drug discovery

A new dataset to use as a test set for models

Page 20: dual-event machine learning models to accelerate drug discovery

Bayesian Machine Learning Models – blind testing

Dual event model shows increased enrichment

Ekins et al., Chem Biol 20, 370–378, 2013

Page 21: dual-event machine learning models to accelerate drug discovery

1. Virtually screen 13,533-member GSK antimalarial hit library
2. Model = SRI TAACF-CB dose response + cytotoxicity model
3. Top 46 commercially available compounds visually inspected
4. 7 compounds chosen for Mtb testing based on
 - drug-likeness
 - chemotype diversity
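The selection steps above can be sketched in Python as a rank, filter, and diversify pipeline. This is only an illustration: in the talk the final choice relied on visual inspection for drug-likeness, which is not automated here, and the record fields, function names, and toy library are all hypothetical.

```python
def shortlist(library, score, is_available, top_n):
    """Rank by model score (highest first) and keep the top_n purchasable molecules."""
    ranked = sorted(library, key=score, reverse=True)
    return [mol for mol in ranked if is_available(mol)][:top_n]

def pick_diverse(candidates, chemotype, n_picks):
    """Walk a ranked list and keep the best-scoring molecule of each chemotype."""
    seen, picks = set(), []
    for mol in candidates:
        if len(picks) >= n_picks:
            break
        key = chemotype(mol)
        if key not in seen:
            seen.add(key)
            picks.append(mol)
    return picks

# Hypothetical library records with a precomputed Bayesian score
library = [
    {"id": "m1", "score": 9.0, "available": True, "chemotype": "A"},
    {"id": "m2", "score": 8.0, "available": False, "chemotype": "B"},
    {"id": "m3", "score": 7.0, "available": True, "chemotype": "A"},
    {"id": "m4", "score": 6.0, "available": True, "chemotype": "C"},
    {"id": "m5", "score": 1.0, "available": True, "chemotype": "D"},
]
top = shortlist(library, lambda m: m["score"], lambda m: m["available"], top_n=3)
chosen = pick_diverse(top, lambda m: m["chemotype"], n_picks=2)
print([m["id"] for m in top], [m["id"] for m in chosen])
```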

Prospective prediction of antimalarial compounds vs Mtb

Dataset (number of molecules) | External ROC Score | Internal ROC Score | Concordance | Specificity | Sensitivity
TAACF-CB2 IC90 and cytotoxicity (1783) | 0.64 | 0.59 ± 0.01 | 0.63 ± 0.02 | 55.74 ± 1.31 | 61.61 ± 8.96

Page 22: dual-event machine learning models to accelerate drug discovery

Prospective prediction of antimalarial compounds vs Mtb

7 tested, 5 active (~71% hit rate)

Ekins et al., Chem Biol 20, 370–378, 2013

Page 23: dual-event machine learning models to accelerate drug discovery

Bayesian Model Follow-up: Do we have a lead?

• BAS00521003/TCMDC-125802 reported to be a P. falciparum lactate dehydrogenase inhibitor
• Only one report (that we were unaware of when picking the compound) of antitubercular activity, from 1969
 - solid agar MIC = 1 mg/mL ("wild strain")
 - "no activity" in mouse model up to 400 mg/kg
 - however, activity was judged solely by extension of survival!

Bruhin, H. et al., J. Pharm. Pharmac. 1969, 21, 423-433.

SRI MLSMR 220K library contains:
107 hits with this substructure
 - 3 nitrofuryl hydrazones
 - 10 furyl hydrazones
 - 19 nitrophenyl hydrazones
32 inactives with this substructure

Maddry et al., Tuberculosis 2009, 89, 354.

MIC of 0.0625 ug/mL

Page 24: dual-event machine learning models to accelerate drug discovery

Efficacy Profiling of TCMDC-125802

• 64X MIC affords 6 logs of kill
• Resistance and/or drug instability beyond 14 d

Vero cells: CC50 = 4.0 ug/mL

Selectivity Index SI = CC50/MIC(Mtb) = 16-64
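The SI range quoted above follows from dividing the Vero cell CC50 by the lowest and highest MIC values. A small Python check is given below, assuming CC50 and MIC are in the same units (ug/mL). The 0.0625 MIC is the value reported earlier in the talk; the 0.25 upper MIC is back-calculated from SI = 16 and is an assumption, as is the function name.

```python
def si_range(cc50, mic_values):
    """Return (lowest SI, highest SI) for a CC50 and a set of MIC values."""
    ratios = [cc50 / mic for mic in mic_values]
    return min(ratios), max(ratios)

CC50_VERO = 4.0              # assumed to be in ug/mL, same units as the MIC
MIC_VALUES = [0.0625, 0.25]  # 0.25 is inferred from SI = 16, not stated

low, high = si_range(CC50_VERO, MIC_VALUES)
print(f"SI = {low:g} to {high:g}")
```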

Ekins et al., Chem Biol 20, 370–378, 2013

Page 25: dual-event machine learning models to accelerate drug discovery

In vivo Evaluation of TCMDC-125802
Goal: Evaluate the in vivo safety and efficacy of JSF-2019 in mouse models of TB infection

Step #2: 7-day Maximum Tolerated Dose study in mice
 - formulated in 0.5% methyl cellulose
 - single dose p.o. @ 30, 100, and 300 mg/kg in B6D2F1 mice
 - no overt toxicity

Lisa Woolhiser and Anne Lenaerts (CSU)

Step #3: evaluation in GKO mouse model of TB infection
 - Five 12-week-old female C57BL/6 mice infected with Mtb Erdman via low-dose aerosol exposure
 - Days 16-23: dosed w/ 300 mg/kg JSF-2019 p.o. OR 25 mg/kg INH OR untreated
 - Sacrificed day 24; lung and spleen homogenates were cultured
 - no difference in lungs and spleens vs. control

Page 26: dual-event machine learning models to accelerate drug discovery

http://goo.gl/UujRX
Ballell et al., Fueling Open-Source drug discovery: 177 small-molecule leads against tuberculosis, ChemMedChem 2013.

GSK screened 2M compounds
3 yrs ago: Bayesian predictions for 14,000 cpds exposed
11/15 (73%) correct when paper was published
Further prospective validation example

Why screen cpds?

Page 27: dual-event machine learning models to accelerate drug discovery

Conclusions

>38,000 molecules screened through Bayesian models

106 molecules were tested in vitro

17 actives were identified (22.5 % hit rate)

Identified several novel potent lead series with good cytotoxicity & selectivity
Some series have been missed in SRI screening data

Took a non-toxic molecule quickly in vivo; have made analogs in an attempt to overcome the in vivo efficacy failure

All Bayesian models shared with Abbott and Merck in TB Accelerator project

All Bayesian models are freely available to researchers

Ekins et al., Chem Biol 20, 370–378, 2013

Page 28: dual-event machine learning models to accelerate drug discovery

Acknowledgments

The project described was supported by Award Number R43 LM011152-01 “Biocomputation across distributed private datasets to enhance drug discovery” from the National Library of Medicine (PI: S. Ekins)

Accelrys

The CDD TB has been developed thanks to funding from the Bill and Melinda Gates Foundation (Grant#49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”)

Allen Casey (IDRI)

Joel Freundlich Lab

Page 29: dual-event machine learning models to accelerate drug discovery

You can find me @... CDD Booth 205

PAPER ID: 13433
PAPER TITLE: “Dispensing processes profoundly impact biological assays and computational and statistical analyses”
April 8th 8.35am Room 349

PAPER ID: 14750
PAPER TITLE: “Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models”
April 9th 1.30pm Room 353

PAPER ID: 21524
PAPER TITLE: “Navigating between patents, papers, abstracts and databases using public sources and tools”
April 9th 3.50pm Room 350

PAPER ID: 13358
PAPER TITLE: “TB Mobile: Appifying Data on Anti-tuberculosis Molecule Targets”
April 10th 8.30am Room 357

PAPER ID: 13382
PAPER TITLE: “Challenges and recommendations for obtaining chemical structures of industry-provided repurposing candidates”
April 10th 10.20am Room 350

PAPER ID: 13438
PAPER TITLE: “Dual-event machine learning models to accelerate drug discovery”
April 10th 3.05pm Room 350