Accident and misadventure in property-based molecular design. Peter W Kenny (blog), NEQUIMED-IQSC-USP. Funding: FAPESP and CNPq
BrazMedChem2014

Jul 02, 2015

Transcript
Page 1: BrazMedChem2014

Accident and misadventure in property-based molecular design

Peter W Kenny (blog)

NEQUIMED-IQSC-USP

Funding: FAPESP and CNPq

Page 2: BrazMedChem2014

Hypothesis-driven molecular design and relationships between structures as framework for analysing activity and properties

Date of analysis   N    ΔlogFu   SE     SD     %increase
2003               7    -0.64    0.09   0.23   0
2008               12   -0.60    0.06   0.20   0

Mining the PPB database for carboxylate/tetrazole pairs suggested that bioisosteric replacement would lead to a decrease in Fu. Tetrazoles were not synthesised even though their logP values are expected to be 0.3 to 0.4 units lower than for the corresponding carboxylic acids.

Hypothesis-driven versus prediction-driven molecular design: Kenny JCIM 2009 49:1234-1244 DOI

Relationships between structures as framework for analyzing SAR/SPR: Kenny & Sadowski (2005) Methods and Principles in Medicinal Chemistry (Chemoinformatics in Drug Discovery, ed T Oprea) 23:271-285 DOI

Tetrazole/carboxylate matched molecular pair analysis: Birch et al (2009) BMCL 19:850-853 DOI

Page 3: BrazMedChem2014

Some things that make drug discovery difficult

• Having to exploit targets that are weakly linked to human disease

• Poor understanding and predictability of toxicity

• Inability to measure free (unbound) physiological concentrations of drug for remote targets (e.g. intracellular or on far side of blood brain barrier)

Dans la merde, FBDD & Molecular Design blog

Page 4: BrazMedChem2014

Target engagement potential (TEP): a basis for pharmaceutical molecular design?

$$\mathrm{TEP} = \frac{[\mathrm{Drug}(X,t)]_{\mathrm{free}}}{K_d}$$

Design objectives

• Low Kd for target(s)

• High (hopefully undetectable) Kd for antitargets

• Ability to control [Drug(X,t)]free

Kenny, Leitão & Montanari JCAMD 2014 28:699-710 DOI
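
The TEP definition above can be sketched in a few lines. The function names are hypothetical, and the occupancy helper assumes simple 1:1 binding with no depletion of free drug, which is an added assumption rather than something stated on the slide.

```python
def target_engagement_potential(free_conc_molar: float, kd_molar: float) -> float:
    """TEP: free drug concentration at site X and time t divided by Kd (both in mol/L)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return free_conc_molar / kd_molar


def fractional_occupancy(tep: float) -> float:
    """Occupancy for 1:1 binding (assumption): [L] / ([L] + Kd) = TEP / (1 + TEP)."""
    return tep / (1.0 + tep)


# Hypothetical numbers: 10 nM free drug against a 1 nM Kd target
tep = target_engagement_potential(1e-8, 1e-9)
print(f"TEP = {tep:.1f}, occupancy = {fractional_occupancy(tep):.2f}")
```

Lowering Kd and raising the free concentration both increase TEP, which is why the design objectives list affinity and control of free concentration side by side.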

Page 5: BrazMedChem2014

Property-based design as search for ‘sweet spot’

Green and red lines represent probability of achieving ‘satisfactory’ affinity and ‘satisfactory’ ADMET characteristics respectively. The blue line shows the product of these probabilities and characterizes the ‘sweet spot’. This way of thinking about the ‘sweet spot’ has similarities with the molecular complexity model proposed by Hann et al.

Kenny & Montanari, JCAMD 2013 27:1-13 DOI

Page 6: BrazMedChem2014

I prefer my food cooked and my data raw…

Page 7: BrazMedChem2014

Correlation

• Strong correlation implies good predictivity

– Beware of ‘experts’ who say, “I have observed a correlation so you must use my rule” (Actually, beware of experts and rules).

• Multivariate data analysis (e.g. PCA) usually involves transformation to orthogonal basis

• Applying cutoffs (e.g. MW restriction) to data can distort correlations

• Noise in measurement and dynamic range impose limits on strength of correlation

Page 8: BrazMedChem2014

Quantifying strengths of relationships between continuous variables

• Correlation measures

– Pearson product-moment correlation coefficient (R)

– Spearman's rank correlation coefficient (ρ)

– Kendall rank correlation coefficient (τ)

• Quality of fit measures

– Coefficient of determination (R²) is the fraction of the variance in Y that is explained by the model

– Root mean square error (RMSE)
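
The measures listed above can be computed without any external library. What follows is a minimal sketch with hypothetical function names; the Kendall variant is tau-a, which ignores tie corrections, so a statistics package may give slightly different values for tied data.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient (R)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def r2_and_rmse(y_obs, y_pred):
    """Coefficient of determination and root mean square error of a model."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# A monotonic but non-linear relationship: rank measures are 1, Pearson R is not
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y))
```

The toy data show why the choice of measure matters: the rank-based coefficients only ask whether Y increases with X, while Pearson R asks how well a straight line describes the relationship.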

Page 9: BrazMedChem2014

Drug-likeness ‘experts’ are usually shy about sharing their data but there is a way forward…

Page 10: BrazMedChem2014

Preparation of synthetic data sets

Add Gaussian noise (SD=10) to Y

Kenny & Montanari (2013) JCAMD 27:1-13 DOI

Page 11: BrazMedChem2014

Correlation inflation by hiding variation

See Hopkins, Mason & Overington (2006) Curr Opin Struct Biol 16:127-136 DOI

Leeson & Springthorpe (2007) NRDD 6:881-890 DOI

Data is naturally binned (X is an integer) and mean value of Y is calculated for each value of X. In some studies, averaged data is only presented graphically and it is left to the reader to judge the strength of the correlation.

R = 0.34 R = 0.30 R = 0.31

R = 0.67 R = 0.93 R = 0.996
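
The effect of averaging within bins can be reproduced with synthetic data. This is a sketch under stated assumptions: the slope, the range of integer X values, the number of points per X value and the seed are all hypothetical choices, and only the noise SD of 10 comes from the slides.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_data(n_per_x, x_values, slope, noise_sd, seed):
    """Integer X with Y = slope * X + Gaussian noise (hypothetical generator)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for x in x_values:
        for _ in range(n_per_x):
            xs.append(x)
            ys.append(slope * x + rng.gauss(0.0, noise_sd))
    return xs, ys


def bin_means(xs, ys):
    """Mean of Y for each distinct value of X."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]


xs, ys = make_data(n_per_x=1000, x_values=range(1, 11), slope=1.0, noise_sd=10.0, seed=0)
r_raw = pearson(xs, ys)
bx, by = bin_means(xs, ys)
r_binned = pearson(bx, by)
print(f"R for raw data: {r_raw:.2f}; R for bin means: {r_binned:.3f}")
```

Averaging removes the within-bin variation from view, so the correlation between bin means says little about how well X predicts Y for an individual compound.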

Page 12: BrazMedChem2014

Correlation Inflation in Flatland

See Lovering, Bikker & Humblet (2009) JMC 52:6752-6756 DOI

N = 1202, R = 0.247 (95% CI: 0.193 to 0.299)

N = 8, R = 0.972 (95% CI: 0.846 to 0.995)

Kenny & Montanari (2013) JCAMD 27:1-13 DOI

Page 13: BrazMedChem2014

Masking variation with standard error

“In each plot provided, the width of the errors bars and the difference in the mean values of the different categories are indicative of the strength of the relationship between the parameters.” Gleeson (2008) JMC 51:817-834 DOI

Partition by value of X into four bins with equal numbers of data points and display 95% confidence interval for mean (green) and mean ± SD (blue) for each bin.

R = 0.12 R = 0.29 R = 0.28

Kenny & Montanari (2013) JCAMD 27:1-13 DOI
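
The difference between the two kinds of error bar can be illustrated by drawing samples of different sizes from a single distribution. This is only a sketch: the sample sizes, seed and function names are hypothetical, and the 95% interval uses the normal approximation.

```python
import math
import random
import statistics


def spread_summary(sample):
    """Return (SD, SE of the mean, approximate 95% CI half-width)."""
    n = len(sample)
    sd = statistics.stdev(sample)   # estimates the spread of the population
    se = sd / math.sqrt(n)          # uncertainty in the mean; shrinks as n grows
    return sd, se, 1.96 * se        # normal approximation for the 95% CI


def demo(sizes=(40, 400, 4000), noise_sd=10.0, seed=1):
    """Summaries for samples of different sizes drawn from one distribution."""
    rng = random.Random(seed)
    return {n: spread_summary([rng.gauss(0.0, noise_sd) for _ in range(n)])
            for n in sizes}


results = demo()
for n, (sd, se, ci) in results.items():
    print(f"N={n:5d}  SD={sd:6.2f}  SE={se:5.2f}  95% CI half-width={ci:5.2f}")
```

The SD stays close to 10 whatever the sample size, while the SE and the confidence interval can be made as narrow as desired by adding data, which is why narrow error bars on a mean do not indicate a strong relationship.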

Page 14: BrazMedChem2014

ANOVA for binned synthetic data sets

ANOVA tests whether differences in mean values for different categories are significant

N      Bins   Degrees of freedom   F        P
40     4      3                    0.2596   0.8540
400    4      3                    12.855   < 0.0001
4000   4      3                    115.35   < 0.0001
4000   2      1                    270.91   < 0.0001
4000   8      7                    50.075   < 0.0001

Kenny & Montanari (2013) JCAMD 27:1-13 DOI

This analysis does not take account of ordering of categories (e.g. high, medium and low)
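
A one-way ANOVA F statistic is straightforward to compute by hand. The sketch below uses hypothetical synthetic data (uniform X, unit slope, seed 0) rather than the data sets behind the slide; only the noise SD of 10 and the equal-count binning follow the slides. It does not compute P values, which would need the F distribution.

```python
import random


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def binned_f(n, n_bins=4, slope=1.0, noise_sd=10.0, seed=0):
    """F for equal-count bins of synthetic data (all settings hypothetical)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        points.append((x, slope * x + rng.gauss(0.0, noise_sd)))
    points.sort()                      # order by X before cutting into bins
    size = n // n_bins
    groups = [[y for _, y in points[i * size:(i + 1) * size]]
              for i in range(n_bins)]
    return one_way_anova_f(groups)


for n in (40, 400, 4000):
    f_value, df_b, df_w = binned_f(n)
    print(f"N={n:5d}  df=({df_b}, {df_w})  F={f_value:.2f}")
```

The underlying trend is equally weak in all three data sets, so the growth in F reflects sample size alone.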

Page 15: BrazMedChem2014

Know your data

• Assays are typically run in replicate making it possible to estimate assay variance

• Every assay has a finite dynamic range and it may not always be obvious what this is for a particular assay

• Dynamic range may have been sacrificed for throughput but this, by itself, does not make the assay bad

• We are likely to need to be able to analyse in-range and out-of-range data within a single unified framework. See Lind (2010) QSAR analysis involving assay results which are only known to be greater than, or less than some cut-off limit. Mol Inf 29:845-852 DOI

Page 16: BrazMedChem2014

Correlation inflation: some stuff to think about

• Model continuous data as continuous data

• To be meaningful, a measure of the spread of a distribution must be independent of sample size

• Don’t confuse statistical significance with strength of a trend

• When selecting training data think in terms of Design of Experiments (e.g. evenly spaced values of X)

• Try to achieve normally distributed Y (e.g. use pIC50 rather than IC50)

Page 17: BrazMedChem2014

Ligand efficiency metrics (LEMs) considered harmful

Page 18: BrazMedChem2014

Introduction to ligand efficiency metrics (LEMs)

• We use LEMs to normalize activity with respect to risk factors such as molecular size and lipophilicity

• What do we mean by normalization?

• We make assumptions about underlying relationship between activity and risk factor(s) when we define an LEM

• LEM as measure of extent to which activity beats a trend?

Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI

Ligand efficiency metrics considered harmful, FBDD & Molecular design blog

Page 19: BrazMedChem2014

Scale activity/affinity by risk factor

LE = −ΔG/HA

Offset activity/affinity by risk factor

LipE = pIC50 − ClogP

Ligand efficiency metrics

There is no reason that normalization of activity with respect to risk factor should be restricted to either of these functional forms.

Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI
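
The two functional forms can be written out as follows. This is a minimal sketch: the function names, the temperature default of 298.15 K and the example values are assumptions, and LE is taken with the usual sign convention so that tighter binding gives a larger value.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def delta_g(kd_molar, temperature_k=298.15, c_std_molar=1.0):
    """Standard free energy of binding, RT ln(Kd/C0), in kcal/mol."""
    return R_KCAL * temperature_k * math.log(kd_molar / c_std_molar)


def ligand_efficiency(kd_molar, heavy_atoms, **kwargs):
    """Scaled metric: affinity divided by the risk factor (heavy atom count)."""
    return -delta_g(kd_molar, **kwargs) / heavy_atoms


def lipophilic_efficiency(pic50, clogp):
    """Offset metric: the risk factor (ClogP) is subtracted from activity."""
    return pic50 - clogp


# Hypothetical compound: Kd = 1 nM, 30 heavy atoms, pIC50 = 8, ClogP = 3
print(f"LE   = {ligand_efficiency(1e-9, 30):.2f} kcal/mol per heavy atom")
print(f"LipE = {lipophilic_efficiency(8.0, 3.0):.1f}")
```

Note that `c_std_molar` has to be supplied before LE can be evaluated at all, whereas LipE needs no such choice.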

Page 20: BrazMedChem2014

Use trend actually observed in data for normalization

rather than some arbitrarily assumed trend

Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI

Can we accurately claim to have normalized a data set if we have

made no attempt to analyse it?

Page 21: BrazMedChem2014

There’s a reason why we say standard free energy of binding

$$\Delta G = \Delta H - T\Delta S = RT\ln\left(K_d/C^{0}\right)$$

• Adoption of 1 M as standard concentration is arbitrary

• A view of a chemical system that changes with the choice of standard concentration is thermodynamically invalid (and, with apologies to Pauli, is ‘not even wrong’)

Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI

Efficient voodoo thermodynamics, FBDD & Molecular design blog

Page 22: BrazMedChem2014

Effect on LE of changing standard concentration

NHA   Kd/M    C/M   −(1/NHA) log10(Kd/C)
10    10^-3   1     0.30
20    10^-6   1     0.30
30    10^-9   1     0.30
10    10^-3   0.1   0.20
20    10^-6   0.1   0.25
30    10^-9   0.1   0.27
10    10^-3   10    0.40
20    10^-6   10    0.35
30    10^-9   10    0.33

Analysis from Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI

Note that our article overlooked a similar analysis from 5 years earlier by Zhou & Gilson (2009) Chem Rev 109:4092-4107 DOI
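
The values in the standard concentration table can be regenerated directly from the column definition. The function name is hypothetical; the Kd, heavy atom and concentration values are those shown in the table.

```python
import math


def scaled_affinity(kd_molar, heavy_atoms, c_std_molar):
    """-(1/NHA) * log10(Kd/C): ligand efficiency in log units."""
    return -math.log10(kd_molar / c_std_molar) / heavy_atoms


compounds = [(10, 1e-3), (20, 1e-6), (30, 1e-9)]  # (NHA, Kd/M) from the table

for c_std in (1.0, 0.1, 10.0):
    for nha, kd in compounds:
        value = scaled_affinity(kd, nha, c_std)
        print(f"NHA={nha:2d}  Kd={kd:.0e} M  C={c_std:4.1f} M  LE={value:.2f}")
```

With C = 1 M the three compounds look equally efficient; with C = 0.1 M the largest looks most efficient and with C = 10 M the smallest does, so the ranking depends on an arbitrary choice.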

Page 23: BrazMedChem2014

Scaling transformation of parallel lines by dividing Y by X

(This is how ligand efficiency is calculated)

Size dependency of LE in this example is consequence of non-zero intercept

Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI
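
The scaling transformation can be checked numerically. For a line Y = a + bX, dividing by X gives Y/X = a/X + b, which is constant only when the intercept a is zero. The slopes, intercepts and X values below are hypothetical.

```python
def scaled(intercept, slope, x):
    """Y/X for a point on the line Y = intercept + slope * X."""
    return (intercept + slope * x) / x


# Three parallel lines (same slope, different intercepts)
for intercept in (-1.0, 0.0, 1.0):
    values = [scaled(intercept, 0.3, x) for x in (10, 20, 40)]
    print(f"intercept={intercept:+.1f}:", [f"{v:.3f}" for v in values])
```

Only the line through the origin gives a size-independent value; the other two drift in opposite directions as X grows.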

Page 24: BrazMedChem2014

Affinity plotted against molecular weight for minimal binding elements against various targets in inhibitor deconstruction study showing variation in intercept term

Data from Hajduk (2006) JMC 49:6972–6976 DOI

Each line corresponds to a different target and no attempt has been made to indicate targets for individual data points. Is it valid to combine results from different assays in LE analysis?

Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI

Page 25: BrazMedChem2014

Offsetting transformation of lines with different slope and common intercept by subtracting X from Y

(This is how lipophilic efficiency is calculated)

Thankfully (hopefully?) lipophilicity-dependent lipophilic efficiency has not yet been ‘discovered’

Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI
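
The offsetting transformation can be checked in the same way. For a line Y = a + bX, subtracting X gives Y − X = a + (b − 1)X, which is independent of X only when the slope is exactly one. The intercept, slopes and X values below are hypothetical.

```python
def offset(intercept, slope, x):
    """Y - X for a point on the line Y = intercept + slope * X."""
    return (intercept + slope * x) - x


# Lines with a common intercept and different slopes
for slope in (0.5, 1.0, 1.5):
    values = [offset(5.0, slope, x) for x in (1, 2, 4)]
    print(f"slope={slope:.1f}:", values)
```

Any other slope leaves a residual dependence on X in the offset metric, which is the lipophilicity-dependent lipophilic efficiency the slide jokes about.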

Page 26: BrazMedChem2014

Linear fit of ΔG to HA for published PKB ligands

Data from Verdonk & Rees (2008) ChemMedChem 3:1179-1180 DOI

[Figure: ΔG/kcal mol-1 plotted against HA, with −ΔGrigid marked]

ΔG/kcal mol-1 = 0.87 − (0.44 × HA); R² = 0.98; RMSE = 0.43

Page 27: BrazMedChem2014

Ligand efficiency, group efficiency and residuals plotted for PKB binding data

[Figure: LE, GE and residuals (Resid) for the PKB binding data]

Residuals and group efficiency values show similar trends with pyrazole (HA = 5) appearing as outlier (GE is calculated using ΔGrigid). Using residuals to compare activity eliminates need to use ΔGrigid estimate (see Murray & Verdonk 2002 JCAMD 16:741-753 DOI) which is subject to uncertainty.

Page 28: BrazMedChem2014

Use residuals to quantify extent to which activity beats trend

• Normalize activity using trend(s) actually observed in data (this means we have to model the data)

• All risk factors are treated within the same data-analytic framework

• Residuals are invariant with respect to choice of standard concentration

• Uncertainty in residuals is not explicitly dependent on value of risk factor (not the case for scaled LEMs)

• Residuals can be used with other functional forms (e.g. non-linear and multi-linear)

Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI
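
The invariance of residuals can be demonstrated with a least-squares line fitted to affinity against heavy atom count. This is a sketch under stated assumptions: the data points are hypothetical, the fit is ordinary least squares with a single risk factor, and affinity is expressed in log units as −log10(Kd/C).

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def residuals(x, y):
    """Observed minus fitted value for each point."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def affinity(kd_molar, c_std_molar):
    """-log10(Kd/C) for a chosen standard concentration C."""
    return -math.log10(kd_molar / c_std_molar)


heavy_atoms = [10, 15, 20, 25, 30]          # hypothetical compounds
kd_molar = [1e-3, 2e-5, 1e-6, 3e-8, 1e-9]   # hypothetical Kd values

res_1M = residuals(heavy_atoms, [affinity(k, 1.0) for k in kd_molar])
res_1uM = residuals(heavy_atoms, [affinity(k, 1e-6) for k in kd_molar])

for nha, r1, r2 in zip(heavy_atoms, res_1M, res_1uM):
    print(f"HA={nha:2d}  residual (C=1 M)={r1:+.3f}  residual (C=1 uM)={r2:+.3f}")
```

Changing the standard concentration shifts every affinity by the same constant, which the fitted intercept absorbs, so the residuals are identical in both columns.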

Page 29: BrazMedChem2014

LEMs: some stuff to think about

• Ligand efficiency as response of activity to risk factor

• Need to model activity data if you want to normalize it

• Using LEMs distorts data analysis unnecessarily