The use of Design of Experiments to develop efficient Arrays for … · Can only explore chemical space defined by R-group combinations in the training set. Assessing Additivity Assumptions

The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration

Chris Luscombe

Computational Chemistry

GlaxoSmithKline

Summary of Talk

Traditional approaches

SAR

Free-Wilson

Design of experiments

Examples

Learnings

Conclusions

Objectives of Lead Optimisation

Design Array experiments to answer SAR questions to enhance potency

Improve physicochemical properties

Discover new monomer groups of interest.

[Figure: molecular template divided into LHS, Core/Linker and RHS regions]

How do we traditionally determine SAR? Array design

Optimisation at a single position allows

– Easy synthesis planning

– Detailed understanding of SAR

Assumes FW type additivity

– Substituent contributions at different positions are independent and additive

This approach is widely used and very successful

Free-Wilson theory: R1-Core-R2

First mathematical technique for quantitative SAR

Response = effect of Core + effect R1 substituent + effect of R2 substituent

Assumptions

– Core makes a constant contribution

– All contributions are additive

– No interactions between core and substituent

– No interaction between substituents

Can only explore chemical space defined by R-group combinations in the training set
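The additive model stated on this slide can be written compactly as below. The symbols are assumptions introduced here for illustration, not notation from the talk: \(y_{ij}\) is the measured response (e.g. pIC50) of the compound carrying R1 substituent \(i\) and R2 substituent \(j\), \(\mu\) is the constant core contribution, \(a_i\) and \(b_j\) are the substituent contributions, and \(\varepsilon_{ij}\) is the residual.

```latex
\[
  y_{ij} = \mu + a_i + b_j + \varepsilon_{ij}
\]
```

Under the assumptions listed above there is no \(a_i b_j\) interaction term, so once every \(a_i\) and \(b_j\) has been estimated from the training set, any untested combination of those same substituents can be predicted, but nothing outside them.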

Assessing Additivity Assumptions

Yogendra Patel, Valerie J. Gillet, Trevor Howe, Joaquin Pastor, Julen Oyarzabal, and Peter Willett. Assessment of Additive/Nonadditive Effects in Structure-Activity Relationships: Implications for Iterative Drug Design. J. Med. Chem. 2008, 51, 7552–7562.

Design of Experiments (DOE)

Experimental Design approaches are well established for the optimization of multi-factor experiments, such as reaction conditions.

Typically these domains utilize 'continuous' variables such as temperature, addition rate, time etc.

Can these same techniques be used where each variable is categorical?

DOE in Medicinal Chemistry?

We propose that Design of Experiments (DOE) based approaches can be applied to array scenarios where the full (e.g. M x N) array cannot be synthesized for practical reasons.

By treating each monomer in the array as a categorical factor of the design, a balanced fractional (“Sparse”) array design can be generated.

This novel approach can be successfully used to understand and exploit the SAR of a late stage optimisation programme.

Example of a Sparse Array: 1/3 fraction from a 6 x 12 array

[Figure: two scatter plots of Phenol_ID against amine (amine1 to amine6)]

Questions

Is the fraction selected sufficient to explore the chemistry space?

Can we adequately assess monomer potential?

Can we predict the 'missing' compounds?

Is it a practical way to direct chemistry synthesis?

Is it an efficient process?

Does it work?

Sparse Array: What are the key steps?

Monomers

• Identify/select which monomers to incorporate into the design

Design

• Create an experimental design template appropriate to the investigational space defined by the monomer numbers

Optimise

• Allocate monomers into the design to define which compounds to actually synthesize

Analyse

• Measure assay endpoints and build Free-Wilson models to understand the SAR

Monomer Selection

• Identify appropriate monomers at each position

• Use diversity, physico-chemical, ADME and scientific rationale to reduce the monomer lists

Calculate the average desirability score for each monomer across the whole virtual library.

Select the higher scoring ones to be included in the final DOE array design
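The scoring step above can be sketched as follows. This is a minimal illustration, not the workflow used in the talk: the function names, the toy monomer labels and the score values are all hypothetical, and the per-compound desirability is assumed to be a number between 0 and 1 that has already been calculated for every virtual product.

```python
from collections import defaultdict

def mean_score_by_monomer(scores, position):
    """Average a per-compound score over every virtual product containing each monomer.

    scores:   dict mapping (amine, aldehyde) -> desirability score (assumed 0..1)
    position: 0 to group by amine, 1 to group by aldehyde
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pair, score in scores.items():
        monomer = pair[position]
        totals[monomer] += score
        counts[monomer] += 1
    return {m: totals[m] / counts[m] for m in totals}

def top_monomers(means, k):
    """Return the k monomers with the highest mean score, best first."""
    return sorted(means, key=means.get, reverse=True)[:k]

# Toy virtual library of 3 amines x 2 aldehydes (invented scores).
toy_scores = {
    ("A1", "B1"): 0.6, ("A1", "B2"): 0.4,
    ("A2", "B1"): 0.3, ("A2", "B2"): 0.5,
    ("A3", "B1"): 0.7, ("A3", "B2"): 0.7,
}
amine_means = mean_score_by_monomer(toy_scores, position=0)
selected_amines = top_monomers(amine_means, k=2)
print(selected_amines)
```

Averaging over the whole enumerated library, rather than scoring the monomer on its own, means a monomer is judged by the products it actually leads to.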

[Figure: bar chart of mean NR profile by amine (A1 to A24), axis 0.35 to 0.6]

[Figure: bar chart of mean NR profile by aldehyde (B1 to B22), axis 0.4 to 0.6]

[Figure: histogram of binned NR profile (nr score), bin edges 0 to 0.90 in steps of 0.09; bin counts 2, 109, 901, 1298, 897, 1382, 2437, 2820, 1624, 146]

[Figure: landscape plot of NR profile, axes 0 to 0.9 and 0.3 to 1]



Design Creation (Sparse arrays)

Create an incomplete balanced D-Optimal design

– Even numbers of monomers at each R position

– D Optimality

– Force balance

[Figure: scatter plot of the design, Amine ID (Level 1 to Level 6 of B) against Level 1 to Level 12 of A]

Many software packages are available that can generate these types of experimental design
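To make the balance requirement concrete, here is a minimal sketch that builds a 1/3 fraction of a 12 x 6 array by a simple cyclic rule and checks that every level is used equally often. It is an assumption-laden stand-in: it does not run a D-optimal search, which is what the DOE packages mentioned above would do, and the function name and offset rule are invented for illustration.

```python
from collections import Counter

def cyclic_sparse_design(n_a=12, n_b=6):
    """Pair each level of factor A with two levels of factor B.

    Row i takes B level (i mod n_b) plus one shifted level. The shift is 1 on
    the first pass through the B levels and 2 on the second pass, so rows i
    and i + n_b do not repeat the same pair of B levels.
    """
    design = []
    for i in range(n_a):
        shift = 1 + i // n_b          # 1 for rows 0..5, 2 for rows 6..11
        design.append((i, i % n_b))
        design.append((i, (i + shift) % n_b))
    return design

design = cyclic_sparse_design()
a_counts = Counter(a for a, _ in design)   # uses per level of A
b_counts = Counter(b for _, b in design)   # uses per level of B
print(len(design), set(a_counts.values()), set(b_counts.values()))
```

With these defaults the design contains 24 of the 72 possible combinations, each A level appearing twice and each B level four times.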


Which Monomer at which Position?

In principle monomers could be allocated in any order, including random, into the DOE array

GSK use an in-house algorithmic approach to allocate monomers into the defined positions in the DOE array so as to optimise the compounds to be synthesized against another property

– e.g. diversity,

– lead-likeness,

– logP etc.
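The in-house allocation algorithm is not described in the talk, so the following is only a hypothetical sketch of the general idea: keep the design template fixed and search over which real monomer is assigned to which design level so that the selected compounds score well on a secondary property. A plain swap-based hill climb is used here; the function name, the random toy scores and the acceptance rule are all assumptions.

```python
import random

def allocate_monomers(template, score, n_a, n_b, n_steps=2000, seed=0):
    """Hill-climb over monomer-to-level assignments.

    template:    list of (a_level, b_level) pairs fixed by the experimental design
    score[a][b]: property score of the product of monomer a and monomer b
                 (higher is better)
    Returns (perm_a, perm_b, best), where perm_a[level] is the monomer index
    assigned to that level of A, and likewise for perm_b.
    """
    rng = random.Random(seed)
    perm_a = list(range(n_a))
    perm_b = list(range(n_b))

    def total(pa, pb):
        return sum(score[pa[a]][pb[b]] for a, b in template)

    best = total(perm_a, perm_b)
    for _ in range(n_steps):
        perm = perm_a if rng.random() < 0.5 else perm_b
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]        # propose a swap
        new = total(perm_a, perm_b)
        if new >= best:
            best = new                              # keep the swap
        else:
            perm[i], perm[j] = perm[j], perm[i]    # revert the swap
    return perm_a, perm_b, best

# Toy problem: 4 x 3 monomers, 8 compounds in the template, invented scores.
toy_rng = random.Random(1)
n_a, n_b = 4, 3
score = [[toy_rng.random() for _ in range(n_b)] for _ in range(n_a)]
template = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 0), (3, 0), (3, 1)]
start = sum(score[a][b] for a, b in template)
perm_a, perm_b, best = allocate_monomers(template, score, n_a, n_b)
print(round(start, 3), round(best, 3))
```

Because only the labels are permuted, the balance of the template is untouched; the search changes which compounds are made, not how many times each monomer is used.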

Compound   A: Phenols   B: Amines
1          Phenol 3     Amine 2
2          Phenol 6     Amine 6
3          Phenol 7     Amine 5
4          Phenol 3     Amine 3
5          Phenol 9     Amine 5
6          Phenol 6     Amine 1
7          Phenol 10    Amine 4
8          Phenol 10    Amine 2
9          Phenol 8     Amine 4
10         Phenol 5     Amine 3
11         Phenol 4     Amine 6
12         Phenol 5     Amine 1
13         Phenol 1     Amine 1
14         Phenol 11    Amine 5

17         Phenol 1     Amine 5
18         Phenol 8     Amine 3
19         Phenol 7     Amine 4

21         Phenol 2     Amine 1
22         Phenol 9     Amine 3
23         Phenol 4     Amine 2
24         Phenol 2     Amine 4

[Figure: scatter plot of the design, Amine ID (Level 1 to Level 6 of B) against Level 1 to Level 12 of A]


FW analysis of monomer contribution

A Free-Wilson analysis is a regression-based approach to establish monomer contributions to a predictive model

A high degree of fit suggests that the potency profile could be additive in nature.

– The presence of outliers may imply non-additive behaviour

– Assess potential interaction terms between monomers if the output appears to be non-additive
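A minimal sketch of such a regression, assuming two R-group positions and ordinary least squares on dummy-coded monomers, is given below. The function names, the toy monomer labels, the invented contributions and the 0.3 log-unit outlier threshold are illustrative assumptions only; the talk used commercial DOE software for this step.

```python
import numpy as np

def build_columns(r1, r2):
    """Model columns: intercept plus one dummy per non-reference monomer."""
    r1_levels = sorted(set(r1))
    r2_levels = sorted(set(r2))
    # The first level of each factor is the reference, which keeps X full rank.
    return ([("mu", None)]
            + [("r1", m) for m in r1_levels[1:]]
            + [("r2", m) for m in r2_levels[1:]])

def design_row(cols, m1, m2):
    """One row of the model matrix for the compound made from m1 and m2."""
    return [1.0 if kind == "mu" or (kind == "r1" and m == m1)
            or (kind == "r2" and m == m2) else 0.0
            for kind, m in cols]

def fit_free_wilson(r1, r2, y):
    """Least-squares fit of y ~ mu + a[r1] + b[r2]; returns columns, coefficients, residuals."""
    cols = build_columns(r1, r2)
    X = np.array([design_row(cols, m1, m2) for m1, m2 in zip(r1, r2)])
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return cols, coef, y - X @ coef

def predict(cols, coef, m1, m2):
    return float(np.dot(design_row(cols, m1, m2), coef))

# Toy sparse array: 6 of 9 compounds, generated from perfectly additive contributions.
true_r1 = {"P1": 0.0, "P2": 0.5, "P3": -0.3}
true_r2 = {"Q1": 0.0, "Q2": 0.4, "Q3": 0.2}
made = [("P1", "Q1"), ("P1", "Q2"), ("P2", "Q2"),
        ("P2", "Q3"), ("P3", "Q3"), ("P3", "Q1")]
r1 = [a for a, _ in made]
r2 = [b for _, b in made]
y = [6.0 + true_r1[a] + true_r2[b] for a, b in made]

cols, coef, residuals = fit_free_wilson(r1, r2, y)
# Flag possible non-additive compounds (threshold is an arbitrary assumption).
outliers = [pair for pair, res in zip(made, residuals) if abs(res) > 0.3]
missing_pred = predict(cols, coef, "P1", "Q3")   # a compound that was not made
print(outliers, round(missing_pred, 2))
```

On real data the residuals will not be zero; large, isolated residuals are the outliers that would prompt a look at interaction terms.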

[Figure: Design-Expert® Software plot of Predicted vs. Actual GTPgS_CCR4_pIC50, points coloured by GTPgS_CCR4_pIC50 value (5.5 to 7.7); actual axis 5.00 to 7.74, predicted axis 5.10 to 7.80]

Example 1: Sparse array to evaluate a defined N x M combinatorial space with a fractional subset

Design

– 12 Indazoles (R1), identified using classical SAR approaches

– 48 sulphonyl chloride monomers (R2), selected from the library using a variety of criteria

– Lead-likeness score

[Figure: scatter plot of the sparse array, Indazoles (R1) against R2]

• Each R1 monomer is combined with 12 R2 monomers

• Each R2 monomer is combined with 3 R1 monomers

Measured Potency for the Sparse array

142 of 144 compounds from the patchwork array were synthesised and tested

Coloured for potency, sized by ligand efficiency

Clear that some Indazoles are more promising than others

[Figure: sparse array plotted by Indazole R1]

Sparse Array Data Analysis

Statistical analysis was done to evaluate 'additivity'

Free-Wilson model: predicted potencies were plotted against measured potencies

The FW model shows potentially excellent additivity with no outliers.

[Figure: scatter plot of predicted potency from the FW model (FW-Fit, GTPgS_CCR4_Human_Antagonist_pic50_Value) against measured potency; both axes 5 to 7.5]

Predicted Potency for the complete array of 576 compounds (Fit and Predict), only Actives (pIC50 > 6.5) shown

[Figure: full array, RG-R1 (12 variants) against RG-R2 (48 variants), by Indazole R1]

Find the predicted most potent compounds that haven't already been synthesized

[Figure: the same array, RG-R1 (12 variants) against RG-R2 (48 variants), with compounds C1 to C7 marked]
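The "fit and predict" step can be sketched as a simple enumeration: once additive contributions are in hand, every R1 x R2 combination gets a predicted potency, and the unmade compounds above the activity cutoff are ranked. The monomer labels and contribution values below are invented for illustration; only the 6.5 cutoff comes from the slide.

```python
def rank_unmade(mu, r1_contrib, r2_contrib, made, cutoff=6.5):
    """Predict mu + a + b for every combination and keep unmade predicted actives.

    Returns a list of (predicted pIC50, R1 monomer, R2 monomer), most potent first.
    """
    hits = []
    for m1, a in r1_contrib.items():
        for m2, b in r2_contrib.items():
            if (m1, m2) in made:
                continue                      # already synthesized and measured
            predicted = mu + a + b
            if predicted > cutoff:
                hits.append((round(predicted, 2), m1, m2))
    return sorted(hits, reverse=True)

# Toy contributions (hypothetical values, not data from the talk).
mu = 6.0
r1_contrib = {"I1": 0.8, "I2": 0.1}
r2_contrib = {"S1": 0.5, "S2": -0.7, "S3": 0.6}
made = {("I1", "S1"), ("I2", "S2")}

hits = rank_unmade(mu, r1_contrib, r2_contrib, made)
for predicted, m1, m2 in hits:
    print(m1, m2, predicted)
```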

Predicted potent compounds

All compounds subsequently synthesized had measured potencies within +/- 0.2 pIC50 of the predicted value

Validated the Additivity assumption

Identified promising alternatives which were sent for further PK analysis – potential back up to the current pre-candidate

Compound   Predicted GTPgS   BEI    Measured
C1         7.6               16.0   7.6
C2         7.5               13.5   7.6
C3                           14.8   7.3
C4         7.5               14.2   7.4
C5                           15.6   7.5

CAT friendly example: Sparse Array Automation

CAT: Automated array chemistry system

A particular design (nicknamed the Tetris array) is 'array automation' friendly and thus allows these investigational approaches to be carried out efficiently from a synthetic perspective.

Exploration of chemical space coverage for a dual targeting programme

[Figure: scatter plot of R4 monomers (96) against R6 monomers (67); RG-R6 axis labels shown: [4*]C, [4*]C(C)C1(C)CC1, [4*]C(CCO)C1CC1, [4*]CC(=C)C, [4*]CCC, [4*]CCCOC, [4*]CCN1CCOCC1, [4*]CCc1c[nH]c(n1)C(C)C, [4*]CCc1ccccc1OCC, [4*]CCn1cccn1, [4*]Cc1ccc(Cl)cc1, [4*]Cc1cccnc1]

Status of exploration

32 R4 and 12 R6 monomers were chosen for inclusion in the Sparse array

CAT friendly 8 (from 32) x 12 Tetris Array

The experimental design chosen is an 8 x 12 array selected from a potential 32 x 12 fully enumerated array (384 potential compounds).

– (¼ fraction)

Each coloured block represents one of the 32 R4 monomers

– Each R4 monomer is used 3 times

– Each R6 monomer is used 8 times
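The exact block layout of the Tetris array is only shown as a figure, so the sketch below is a hypothetical layout that satisfies the stated constraints: an 8 x 12 grid whose 12 columns are the R6 monomers, with each of the 32 R4 monomers filling a block of three wells in one row. Shifting the blocks by one column per row (with wrap-around) is an assumption made here so that neighbouring rows overlap and the R4/R6 graph stays connected, which an additive model needs in order to compare all monomers on one scale.

```python
from collections import Counter

def tetris_array(n_rows=8, n_cols=12, block=3):
    """Map each well (row, R6 column) to an R4 monomer index.

    Each row holds n_cols // block R4 monomers, each filling `block` wells.
    Blocks are shifted by one column per row, wrapping round the row end.
    """
    per_row = n_cols // block
    wells = {}
    for row in range(n_rows):
        for b in range(per_row):
            r4 = row * per_row + b
            for j in range(block):
                col = (b * block + j + row) % n_cols
                wells[(row, col)] = r4
    return wells

def is_connected(pairs):
    """True if the bipartite graph of (R4, R6) pairs forms a single component."""
    adj = {}
    for r4, r6 in pairs:
        adj.setdefault(("R4", r4), set()).add(("R6", r6))
        adj.setdefault(("R6", r6), set()).add(("R4", r4))
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(adj)

wells = tetris_array()
pairs = [(r4, col) for (row, col), r4 in wells.items()]
r4_counts = Counter(r4 for r4, _ in pairs)    # 96 wells / 32 R4 monomers
r6_counts = Counter(r6 for _, r6 in pairs)    # 96 wells / 12 R6 monomers
print(len(wells), set(r4_counts.values()), set(r6_counts.values()), is_connected(pairs))
```

Each row is a 1 x 12 linear array covering every R6 monomer once, which is what makes the layout convenient for row-wise automated synthesis.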

[Figure: bar chart of Factor 2 usage, each of the 12 levels used 8 times]

[Figure: scatter plot of the design, R6 monomers against R4 monomers (Level 1 to Level 32 of A)]

Sparse array results

Using the CAT the synthesis was done efficiently and effectively

– Synthesis was actually done using 8 linear (1 x 12) arrays

For the Sparse array, synthesis of 77 of the 96 compounds was achieved and the compounds were delivered to screening.

– This is approximately 80% of the full sparse array

– Only 20% of the fully enumerated array

[Figure: bar chart of compounds delivered per RG_R6 code (C15, C18, C21, C23, C30, C36, C39, C53, C54, C57, C60, C66); bar labels in extracted order: 7, 8, 8, 6, 5, 8, 8, 4, 7, 7, 2, 7]

[Figure: scatter plot of the delivered design, RG_R6 code (C15 to C66) against R4 monomer code (B9, B11, B13, B16, B33, B34, B35, B37, B39, B40, B41, B44, B46, B54, B55, B56, B57, B58, B60, B61, B63, B66, B70, B73, B76, B80, B87, B88, B91, B92, B93, B94)]

Monomer contribution

The Programme team concluded that the chemistry within this area of chemical space was well understood with respect to target potency.

The Programme team predicted potent analogues with targeted physchem profiles for synthesis

[Figure: bar chart of R4 coefficients by RG_R4 code (B11 to B93), axis -1.5 to 2]

[Figure: bar chart of R6 coefficients by RG_R6 code (C15 to C66), axis -2 to 0.5]

Further Developments: Dual target

The monomers chosen in the array were selected to create Primary actives but were not thought likely to have any potential in the Secondary target assay

However, surprisingly, 14 compounds were found to be active in the second assay

– Currently being followed up in the programme team as potential dual antagonists

[Figure: FW plot of orthogonal dual assay potency (5.2 to 6.4) by RG_R6 code (C15 to C66), sized by Primary potency, coloured by R4 group; annotation: further analogue expansion around this monomer]

Example 3: Extended 3 RG Tetris Sparse Array

3 points of change on the molecule: Acids, Core, Phenols

Extended 3 RG Tetris Sparse array

Cores (A) = 3 (These were used to explore a stereochemistry question)

Phenols (B) = 4

Acids (C) = 24

All acids represented 3 times

3 x 4 x 24 = 288 compounds

25% of full array synthesised

Distribution 'balanced'

[Figure: extended Tetris array, coloured by Acid monomer group]
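The numbers on this slide are consistent with a 72-compound design (24 acids x 3 uses each, 25% of 288). One hypothetical way to obtain such a balanced fraction is sketched below: each acid is paired once with every core, and the phenol index is cycled with the acid and core indices. The actual layout used in the talk is not given, so the construction rule and function name are assumptions.

```python
from collections import Counter

def three_factor_sparse(n_cores=3, n_phenols=4, n_acids=24):
    """Return (core, phenol, acid) triples.

    Every acid appears once with each core; the phenol index is
    (acid + core) mod n_phenols, which spreads phenols evenly.
    """
    return [(core, (acid + core) % n_phenols, acid)
            for acid in range(n_acids)
            for core in range(n_cores)]

design3 = three_factor_sparse()
acid_counts = Counter(acid for _, _, acid in design3)
core_counts = Counter(core for core, _, _ in design3)
phenol_counts = Counter(phenol for _, phenol, _ in design3)
core_phenol_counts = Counter((core, phenol) for core, phenol, _ in design3)
print(len(design3), set(acid_counts.values()), set(core_counts.values()),
      set(phenol_counts.values()), set(core_phenol_counts.values()))
```

With the default sizes every acid is used 3 times, every core 24 times, every phenol 18 times and every core/phenol pair 6 times.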

Other design types

Latin Squares: Symmetrical design spaces

Useful for n x n x n problems, where n = number of monomers in each RG position

e.g. 6 R1 x 6 R2 x 6 R3

A 1/n fraction is selected

[Figure: template with three RG positions, R1, R2 and R3]

Three RG positions – Latin Squares

[Figure: scatter plot of pie charts, Factor 1 (A1 to A6) against B1 to B6]

Each possible pair of monomers is present once and each monomer is present an equal number of times, defined by the array dimensions.

Predictive Array Design: Lipkin, Rose, SAR and QSAR in Environmental Research, 2002, Vol 13 (3-4), pp 425-432
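A cyclic Latin square is one standard way to build such a design, and it makes the "each pair once" property easy to verify. The sketch below is an illustration under that assumption, not the construction used in the cited paper; the function name is invented.

```python
from itertools import combinations

def latin_square_design(n=6):
    """Cyclic Latin square: for R1 level i and R2 level j, use R3 level (i + j) mod n."""
    return [(i, j, (i + j) % n) for i in range(n) for j in range(n)]

latin = latin_square_design(6)

# For every pair of positions (R1/R2, R1/R3, R2/R3), count the distinct monomer pairs.
pair_coverage = {
    (p, q): len({(row[p], row[q]) for row in latin})
    for p, q in combinations(range(3), 2)
}
print(len(latin), pair_coverage)
```

For n = 6 this selects 36 of the 216 possible compounds, a 1/n fraction, and each of the 36 possible monomer pairs at any two positions occurs exactly once.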

Pros and Cons of Sparse array approaches

Pros

– Objective exploration generates an optimal data set for ANOVA / Free-Wilson analysis.

– Complete evaluation of potency response within the design space from only a fraction of the possible compounds

– Defined endpoint to the work

– Excellent data set for QSAR

Cons

– Chemistry may be more difficult to carry out

– Needs a reasonable resource commitment upfront

– Needs the majority of compounds chosen in the array to be made and measured for the analysis to be robust

– Assumes Additivity (but then so does Linear SAR exploration)

Learnings from experience

Ideally 3 examples minimum for each monomer within the design, although 2 will work for a robust assay and chemistry

Need to have confidence in getting some active compounds

– If all the compounds are inactive it's difficult to fit a model!

Confidence in ability to synthesize compounds

– Some loss of particular compounds can be tolerated but if whole reactions fail then the array design will be compromised

Summary

Experimental Design may provide an alternative/complementary strategy which may be suitable in some circumstances

– E.g. initial exploration of new monomer space

– Identification of back up compounds

– Establishing Additivity in the series

Efficient Lead Optimisation by exploring more than one point of change at the same time on the molecular template

Can unearth some surprises which may never have been found by traditional processes

There are different design types for different situations

– Software is available to create the designs

– Works well in situations where the bespoke synthesis is contracted out

Acknowledgements

Tony Cooper

Heather Hobbs

Nick Barton

Stephen Pickett

Darren Green

Medicinal Chemistry

Computational Chemistry

References to literature to date

How Design Concepts Can Improve Experimentation: Mager, 1997

– Use of iterative search techniques to find monomers which have the required levels of particular physico-chemical properties to fit into an ideal experimental design

Statistical Molecular Design of BB for CombiChem: Linusson, Wold et al., 1999

– Selection of BBs using t-scores of the enumerated library as variables and then applying D-Optimal, Space Filling or Cluster based selection strategies

Predictive Array Design: Lipkin, Rose et al., 2002

– Use of Latin Squares

– Simulated Annealing for Monomer assignment
