The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration
Chris Luscombe
Computational Chemistry
GlaxoSmithKline
Summary of Talk
Traditional approaches
SAR
Free-Wilson
Design of experiments
Examples
Learnings
Conclusions
Objectives of Lead Optimisation
Design Array experiments to answer SAR
questions to enhance potency
Improve physicochemical properties
Discover new monomer groups of interest.
[Figure: example structure split into LHS, Core/Linker and RHS regions]
How do we traditionally determine SAR
array design
Optimisation at a single position
allows
– Easy synthesis planning
– Detailed understanding of SAR
Assumes FW type additivity
– Substituent contributions at
different positions are
independent and additive
This approach is widely used and
very successful
Free Wilson theory R1-Core-R2
First mathematical technique for quantitative SAR
Response = effect of Core + effect R1 substituent + effect of R2 substituent
Assumptions
– Core makes a constant contribution
– All contributions are additive
– No interactions between core and substituent
– No interaction between substituents
Can only explore chemical space defined by R-group combinations in the
training set
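A minimal sketch of the additive model above. The monomer names and contribution values are invented for illustration and are not data from the talk; the point is only that the response is a plain sum, and that a substituent absent from the training set has no term to add.

```python
# Hypothetical contributions on a pIC50 scale; none of these values come from the talk.
CORE = 6.0
R1_EFFECT = {"R1_a": 0.4, "R1_b": -0.2}
R2_EFFECT = {"R2_x": 0.3, "R2_y": 0.0}


def free_wilson_response(r1, r2):
    """Response = effect of core + effect of R1 substituent + effect of R2 substituent."""
    if r1 not in R1_EFFECT or r2 not in R2_EFFECT:
        # The model has no term for a substituent it was never trained on.
        raise KeyError(f"no contribution known for {r1!r} / {r2!r}")
    return CORE + R1_EFFECT[r1] + R2_EFFECT[r2]


print(free_wilson_response("R1_a", "R2_x"))
```

Calling `free_wilson_response("R1_new", "R2_x")` raises `KeyError`, which mirrors the limitation that only R-group combinations built from the training set can be explored.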
Assessing Additivity Assumptions
Yogendra Patel, Valerie J. Gillet, Trevor Howe, Joaquin Pastor, Julen Oyarzabal and Peter Willett. Assessment of Additive/Nonadditive Effects in Structure-Activity Relationships: Implications for Iterative Drug Design. J. Med. Chem. 2008, 51, 7552–7562
Design of Experiments (DOE)
Experimental Design approaches are well established for the optimization of multi-factor experiments, such as reaction conditions.
Typically these domains utilize "continuous" variables such as temperature, addition rate, time etc.
Can these same techniques be used where each variable is categorical?
DOE in Medicinal Chemistry?
We propose that Design of Experiments (DOE) based
approaches can be applied to array scenarios where the
full (e.g. M x N) array cannot be synthesized for practical
reasons.
By treating each monomer in the array as a categorical
factor of the design, a balanced fractional (“Sparse”) array
design can be generated.
This novel approach can be successfully used to
understand and exploit the SAR of a late stage
optimisation programme
Example of a Sparse Array: 1/3rd fraction from a 6 x 12 array
[Figure: two scatter plots of Phenol_ID against amine (amine1 to amine6)]
Questions
Is the fraction selected sufficient to explore the
chemistry space?
Can we adequately assess monomer potential?
Can we predict the "missing" compounds?
Is it a practical way to direct chemistry synthesis?
Is it an efficient process?
Does it work?
Sparse Array: What are the key steps?
Monomers
• Identify and select which monomers to incorporate into the design
Design
• Create an experimental design template appropriate to the investigational space defined by the monomer numbers
Optimise
• Allocate monomers into the design to define which compounds to actually synthesize
Analyse
• Measure assay endpoints and build Free-Wilson models to understand the SAR
Monomer Selection
• Identify appropriate monomers at each position
• Use diversity, physico-chemical, ADME and scientific rationale to reduce the monomer lists
• Calculate the average desirability score for each monomer across the whole virtual library
• Select the higher-scoring monomers for inclusion in the final DOE array design
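The scoring step above can be sketched as follows. This is an illustration only: the toy library, its scores and the function names are assumptions, and the talk does not say how the desirability score itself is computed.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_monomer(library):
    """library: iterable of (amine, aldehyde, score) rows, one per virtual compound."""
    by_amine, by_aldehyde = defaultdict(list), defaultdict(list)
    for amine, aldehyde, score in library:
        by_amine[amine].append(score)
        by_aldehyde[aldehyde].append(score)
    return ({m: mean(s) for m, s in by_amine.items()},
            {m: mean(s) for m, s in by_aldehyde.items()})


def top_monomers(mean_scores, n):
    """Return the n monomers with the highest mean score."""
    return sorted(mean_scores, key=mean_scores.get, reverse=True)[:n]


# Toy virtual library with invented desirability scores in [0, 1].
library = [
    ("A1", "B1", 0.25), ("A1", "B2", 0.75),
    ("A2", "B1", 0.50), ("A2", "B2", 1.00),
    ("A3", "B1", 0.00), ("A3", "B2", 0.50),
]
amine_means, aldehyde_means = mean_score_by_monomer(library)
print(top_monomers(amine_means, 2))
```

Averaging over the whole virtual library scores a monomer by how it behaves across all of its partners rather than in a single compound.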
[Figures: mean NR profile score by amine (A1 to A24) and by aldehyde (B1 to B22); histogram of binned NR profile scores; NR profile landscape]
Design Creation (Sparse arrays)
Create an incomplete, balanced D-optimal design
– Even numbers of monomers at each R position
– D-optimality
– Force balance
[Figure: scatter plot of the design, Amine ID (levels 1 to 6 of B) against levels 1 to 12 of A]
Many software packages are available that can generate these types of experimental design
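To make "incomplete but balanced" concrete, here is a small sketch that builds a balanced 1/3 fraction of a 12 x 6 array. It does not run a D-optimal search as the DOE packages do; it swaps in a simple cyclic rule, which is an assumption of this example, and then checks that every level of each factor is used equally often.

```python
from collections import Counter


def cyclic_sparse_design(n_a, n_b, per_a):
    """Pair each level of factor A with `per_a` consecutive levels of factor B.

    Balanced in B only when n_b divides n_a (an assumption of this sketch).
    """
    if n_a % n_b != 0 or per_a > n_b:
        raise ValueError("sketch requires n_b to divide n_a and per_a <= n_b")
    return [(a, (a + step) % n_b) for a in range(n_a) for step in range(per_a)]


# 12 x 6 array, two compounds per A level: 24 of 72 cells, a 1/3 fraction.
design = cyclic_sparse_design(n_a=12, n_b=6, per_a=2)
a_counts = Counter(a for a, _ in design)
b_counts = Counter(b for _, b in design)
print(len(design), set(a_counts.values()), set(b_counts.values()))
```

Each A level appears twice and each B level four times. A D-optimal design would additionally choose the cells to minimise the uncertainty of the fitted monomer effects.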
Which Monomer at which Position?
In principle monomers could be allocated in any order, including at random, into the DOE array
GSK use an in-house algorithmic approach to allocate monomers into the defined positions in the DOE array so as to optimise the compounds to be synthesized against another property
– e.g. diversity
– lead-likeness
– logP etc.
Compound  A: Phenol   B: Amine
1         Phenol 3    Amine 2
2         Phenol 6    Amine 6
3         Phenol 7    Amine 5
4         Phenol 3    Amine 3
5         Phenol 9    Amine 5
6         Phenol 6    Amine 1
7         Phenol 10   Amine 4
8         Phenol 10   Amine 2
9         Phenol 8    Amine 4
10        Phenol 5    Amine 3
11        Phenol 4    Amine 6
12        Phenol 5    Amine 1
13        Phenol 1    Amine 1
14        Phenol 11   Amine 5
15        Phenol 12   Amine 2
16        Phenol 12   Amine 6
17        Phenol 1    Amine 5
18        Phenol 8    Amine 3
19        Phenol 7    Amine 4
20        Phenol 11   Amine 6
21        Phenol 2    Amine 1
22        Phenol 9    Amine 3
23        Phenol 4    Amine 2
24        Phenol 2    Amine 4
[Figure: scatter plot of the allocated design, Amine ID (levels 1 to 6 of B) against levels 1 to 12 of A]
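The allocation step can be sketched as a search over which real monomer occupies which level of the design template. The in-house GSK algorithm is not described in the talk, so the code below is a stand-in: a plain random search, with an invented 4 x 4 template, invented logP contributions and a deliberately crude "sum of contributions" property.

```python
import random

# Hypothetical design template: 4 levels of A x 4 levels of B, 2 compounds per A level.
TEMPLATE = [(a, (a + step) % 4) for a in range(4) for step in range(2)]

# Invented monomer logP contributions; compound logP is crudely taken as their sum.
A_LOGP = {"P1": 1.0, "P2": 2.0, "P3": 3.0, "P4": 0.5}
B_LOGP = {"N1": 0.5, "N2": 1.5, "N3": 2.5, "N4": 1.0}


def n_within_limit(a_order, b_order, limit=3.5):
    """Count template compounds whose summed logP does not exceed the limit."""
    return sum(A_LOGP[a_order[a]] + B_LOGP[b_order[b]] <= limit for a, b in TEMPLATE)


def allocate(n_trials=200, seed=0):
    """Random search over monomer-to-level assignments; returns the best one found."""
    rng = random.Random(seed)
    best = (list(A_LOGP), list(B_LOGP))      # start from the listed order
    best_score = n_within_limit(*best)
    for _ in range(n_trials):
        candidate = (rng.sample(list(A_LOGP), 4), rng.sample(list(B_LOGP), 4))
        score = n_within_limit(*candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


(a_order, b_order), best_score = allocate()
print(a_order, b_order, best_score)
```

The design template, and therefore its balance, is unchanged by the search; only the labelling of levels moves. A more directed optimiser such as simulated annealing could replace the random search without changing the interface.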
FW analysis of monomer contribution
A Free-Wilson analysis is a
regression-based approach that
estimates each monomer's contribution
within a predictive model
A high degree of fit suggests that the
potency profile could be additive in
nature.
– The presence of outliers may imply
non-additive behaviour
– Assess potential interaction terms
between monomers if the output
appears to be non-additive
[Figure: Design-Expert plot of predicted vs. actual GTPgS_CCR4_pIC50, points coloured by value from 5.5 to 7.7]
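A Free-Wilson fit on a sparse array can be sketched in a few lines. The design below is the 24-compound phenol/amine selection listed earlier; the "measured" values are synthetic and perfectly additive (invented for this example), and the fit uses alternating residual averages instead of a regression package, which reaches the same least-squares solution for this two-position model.

```python
# The 24 (phenol, amine) pairs of the 12 x 6 sparse design listed earlier in the talk.
DESIGN = [(3, 2), (6, 6), (7, 5), (3, 3), (9, 5), (6, 1), (10, 4), (10, 2),
          (8, 4), (5, 3), (4, 6), (5, 1), (1, 1), (11, 5), (12, 2), (12, 6),
          (1, 5), (8, 3), (7, 4), (11, 6), (2, 1), (9, 3), (4, 2), (2, 4)]


def truth(phenol, amine):
    """Synthetic, perfectly additive pIC50 (invented numbers, not assay data)."""
    return 5.0 + 0.1 * phenol + 0.2 * amine


def fit_free_wilson(data, n_iter=300):
    """Least-squares fit of y = mu + a[r1] + b[r2] by alternating residual averages."""
    mu = sum(data.values()) / len(data)
    a = {r1: 0.0 for r1, _ in data}
    b = {r2: 0.0 for _, r2 in data}
    for _ in range(n_iter):
        for r1 in a:
            res = [y - mu - b[k2] for (k1, k2), y in data.items() if k1 == r1]
            a[r1] = sum(res) / len(res)
        for r2 in b:
            res = [y - mu - a[k1] for (k1, k2), y in data.items() if k2 == r2]
            b[r2] = sum(res) / len(res)
    return mu, a, b


measured = {pair: truth(*pair) for pair in DESIGN}
mu, a, b = fit_free_wilson(measured)


def predict(phenol, amine):
    return mu + a[phenol] + b[amine]


# Residuals on the measured compounds: large values would hint at non-additivity.
max_residual = max(abs(y - predict(*pair)) for pair, y in measured.items())
# Error over all 72 cells, including the 48 compounds that were never "made".
max_error = max(abs(truth(p, q) - predict(p, q))
                for p in range(1, 13) for q in range(1, 7))
print(f"max residual {max_residual:.2e}, max error over full array {max_error:.2e}")
```

Because the synthetic data are exactly additive, the 24 measured compounds are enough to reproduce all 72 cells. With real assay data, the size and pattern of the residuals is what indicates whether interaction terms are worth assessing.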
Example 1: Sparse array to evaluate a defined N x M combinatorial space with a fractional subset
Design
– 12 indazoles (R1), identified using classical SAR approaches
– 48 sulphonyl chloride monomers (R2), selected from the library using a variety of criteria, e.g. lead-likeness score
• Each R1 is combined with 12 R2 monomers
• Each R2 is combined with 3 R1 monomers
[Figure: scatter plot of the sparse array, indazoles (R1) against R2]
Measured Potency for the Sparse array
142 of 144 compounds from the patchwork array were synthesised and tested
Coloured for potency, sized by ligand efficiency
Clear that some indazoles are more promising than others
[Figure: sparse array scatter plot against Indazole R1, coloured by potency and sized by ligand efficiency]
Sparse Array Data Analysis
Statistical analysis was done to evaluate "additivity"
Free-Wilson model: predicted potencies were plotted against measured potencies
The FW model suggests excellent additivity, with no outliers.
[Figure: scatter plot of predicted potency from the FW model against measured potency (GTPgS CCR4 human antagonist pIC50)]
Predicted potency for the complete array of 576 compounds (fit and predict); only actives (pIC50 > 6.5) shown
[Figure: predicted-potency map of the full array, RG-R1 (12 variants) against RG-R2 (48 variants)]
Find the predicted most potent compounds that haven’t already been synthesized
[Figure: full array, RG-R1 (12 variants) against RG-R2 (48 variants), with the predicted potent unmade compounds C1 to C7 marked]
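Once monomer contributions have been fitted, finding the most promising unmade compounds is an enumeration and a filter. The sketch below assumes coefficients are already available; the values, monomer names and synthesis record are invented, and only the pIC50 > 6.5 cut-off is taken from the slide.

```python
from itertools import product


def rank_unmade(core, r1_effect, r2_effect, synthesised, threshold):
    """List (predicted, r1, r2) for unmade compounds predicted above the threshold."""
    hits = [
        (core + e1 + e2, r1, r2)
        for (r1, e1), (r2, e2) in product(r1_effect.items(), r2_effect.items())
        if (r1, r2) not in synthesised and core + e1 + e2 > threshold
    ]
    return sorted(hits, reverse=True)


# Invented Free-Wilson coefficients and synthesis record, for illustration only.
r1_effect = {"R1_a": 0.5, "R1_b": 0.0, "R1_c": -0.5}
r2_effect = {"R2_x": 1.0, "R2_y": 0.75}
made = {("R1_a", "R2_y"), ("R1_b", "R2_x"), ("R1_c", "R2_x")}

for predicted, r1, r2 in rank_unmade(6.0, r1_effect, r2_effect, made, threshold=6.5):
    print(r1, r2, predicted)
```

In this toy case two of the three unmade compounds pass the cut-off; the third is predicted at 6.25 and is dropped.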
Predicted potent compounds
All compounds subsequently synthesized had measured potencies
within +/- 0.2 pIC50 of the predicted value
Validated the Additivity assumption
Identified promising alternatives which were sent for further PK
analysis – potential back up to the current pre-candidate
Compound  Predicted GTPgS  Measured  BEI
C1        7.6              7.6       16.0
C2        7.5              7.6       13.5
C3        7.5              7.3       14.8
C4        7.5              7.4       14.2
C5        7.6              7.5       15.6
CAT friendly example Sparse Array Automation
CAT : Automated array chemistry system
A particular design (nicknamed the Tetris array) which is "array
automation" friendly and thus allows these investigational approaches
to be carried out efficiently from a synthetic perspective.
Exploration of chemical space coverage for a dual targeting programme
[Figure: scatter plot of R6 monomers (67, labelled by RG-R6 fragment SMILES) against R4 monomers (96)]
Status of exploration
32 R4 and 12 R6 monomers were chosen for
inclusion in Sparse array
CAT friendly 8 (from 32) x 12 Tetris Array
The experimental design is an 8 x 12 array chosen from a potential 32 x 12 fully enumerated array (384 potential compounds).
– (¼ fraction)
Each coloured block represents one of the 32 R4 monomers
– Each R4 monomer is used 3 times
– Each R6 monomer is used 8 times
[Figures: bar chart showing each of the 12 R6 monomers used 8 times; scatter plot of the Tetris layout, R6 monomers against R4 monomers (levels 1 to 32 of A)]
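The exact Tetris layout is only shown in the figure, so the following is an assumed layout that satisfies the stated counts: 8 rows of 12 wells, 32 R4 monomers in blocks of 3 wells, and every R6 monomer once per row. Shifting the blocks by one column per row is this sketch's own choice; it keeps all monomers linked through shared compounds, which an additive model needs in order to compare their effects.

```python
from collections import Counter

N_ROWS, N_R6, BLOCK = 8, 12, 3   # 8 linear 1 x 12 arrays, blocks of 3 wells


def tetris_layout():
    """Return (r4, r6) index pairs; each R4 monomer fills one 3-well block of one row."""
    pairs = []
    for row in range(N_ROWS):
        for block in range(N_R6 // BLOCK):
            r4 = row * (N_R6 // BLOCK) + block          # 32 distinct R4 monomers
            for step in range(BLOCK):
                # Shift blocks by one column per row so that blocks overlap between rows.
                pairs.append((r4, (block * BLOCK + row + step) % N_R6))
    return pairs


def is_connected(pairs):
    """True if every monomer is linked to every other through shared compounds."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for r4, r6 in pairs:
        parent[find(("R4", r4))] = find(("R6", r6))
    return len({find(node) for node in list(parent)}) == 1


pairs = tetris_layout()
print(len(pairs), "compounds")
print("uses per R4:", set(Counter(r4 for r4, _ in pairs).values()))
print("uses per R6:", set(Counter(r6 for _, r6 in pairs).values()))
print("connected:", is_connected(pairs))
```

Without the per-row shift, the same blocks would fall into four separate groups of R6 columns and the contributions in one group could not be placed on the same scale as those in another.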
Sparse array results
Using the CAT, the synthesis was done efficiently and effectively
– Synthesis was actually done using 8 linear (1 x 12) arrays
For the Sparse array, synthesis of 77 of the 96 compounds was achieved and the compounds were delivered to screening.
– This is approximately 80% of the full sparse array
– Only 20% of the fully enumerated array
[Figures: bar chart of compounds synthesised per R6 monomer (RG_R6 codes C15 to C66, between 2 and 8 each); design grid of RG_R6 codes against the 32 R4 codes (B9 to B94)]
Monomer contribution
The Programme team concluded that the chemistry within this area of chemical space was well understood with respect to target potency.
The Programme team predicted potent analogues with targeted physchem profiles for synthesis
[Figures: Free-Wilson coefficients by RG_R4 code (B11 to B93) and by RG_R6 code (C15 to C66)]
Further Developments Dual target
The monomers chosen for the array were selected to create primary actives and were not thought likely to have any potential in the secondary target assay
However, surprisingly, 14 compounds were found to be active in the second assay
– Currently being followed up in the programme team as potential dual antagonists
[Figure: orthogonal dual assay results by RG_R6 code, sized by primary potency and coloured by R4 group; annotation: further analogue expansion around this monomer]
EXAMPLE 3: EXTENDED 3 RG TETRIS SPARSE ARRAY
3 points of change on the molecule: Acids, Core, Phenols
Extended 3 RG Tetris Sparse array
Cores (A) = 3 (these were used to explore a stereochemistry question)
Phenols (B) = 4
Acids (C) = 24
All acids represented 3 times
3 x 4 x 24 = 288 compounds
25% of full array synthesised
Distribution "balanced"
[Figure: extended Tetris array, coloured by acid monomer group]
OTHER DESIGN TYPES
Latin Squares: Symmetrical design spaces
Useful for n x n x n problems, where n = number of monomers in each RG position
e.g. 6 R1 x 6 R2 x 6 R3
A 1/n fraction is selected
Three RG positions – Latin Squares
[Figure: scatter plot with pie-chart markers for a three-position template (R1, R2, R3), Factor 1 levels A1 to A6 against B1 to B6]
Each possible pair of monomers is present once and each monomer is present an equal number of times, as defined by the array dimensions.
Predictive Array Design: Lipkin, Rose, SAR and QSAR in Environmental Research, 2002, Vol 13 (3-4), pp 425-432
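A Latin-square selection for three RG positions can be sketched with the standard cyclic construction, where the R3 level is (R1 level + R2 level) mod n. The construction is a textbook one and is used here as an assumption; the cited paper may build its squares differently.

```python
from collections import Counter
from itertools import combinations


def latin_square_design(n):
    """Select n*n of the n**3 combinations: R3 level = (R1 level + R2 level) mod n."""
    return [(i, j, (i + j) % n) for i in range(n) for j in range(n)]


design = latin_square_design(6)
for x, y in combinations(range(3), 2):
    pair_counts = Counter((compound[x], compound[y]) for compound in design)
    # Every pair of monomers at positions x and y occurs exactly once.
    assert len(pair_counts) == 36 and set(pair_counts.values()) == {1}
print(len(design), "of", 6 ** 3, "compounds")
```

For the 6 x 6 x 6 case this selects 36 of 216 compounds, the 1/n fraction, with each monomer used 6 times.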
Pros and Cons of Sparse array approaches
Pros
– Objective exploration generates an optimal data set for ANOVA / Free-Wilson analysis
– Complete evaluation of potency response within the design space from only a fraction of the possible compounds
– Defined endpoint to the work
– Excellent data set for QSAR
Cons
– Chemistry may be more difficult to carry out
– Needs a reasonable resource commitment upfront
– Needs the majority of compounds chosen in the array to be made and measured for the analysis to be robust
– Assumes additivity (but then so does linear SAR exploration)
Learnings from experience
Ideally 3 examples minimum for each monomer within the
design, although 2 will work for a robust assay and
chemistry
Need to have confidence in getting some active
compounds
– If all the compounds are inactive it is difficult to fit a
model!
Confidence in ability to synthesize compounds
– Some loss of particular compounds can be tolerated
but if whole reactions fail then the array design will be
compromised
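The replication guideline above can be checked mechanically once synthesis results are in. This is a minimal sketch with an invented 3 x 3 plan; the function name and the default threshold of 3 examples per monomer are assumptions drawn from the guideline, not a prescribed procedure.

```python
from collections import Counter


def under_replicated(planned, made, minimum=3):
    """Return, per position, the monomers with fewer than `minimum` compounds made."""
    flagged = {}
    for pos in range(len(planned[0])):
        counts = Counter(compound[pos] for compound in made)
        monomers = sorted({compound[pos] for compound in planned})
        low = {m: counts[m] for m in monomers if counts[m] < minimum}
        if low:
            flagged[pos] = low
    return flagged


# Toy plan: a 3 x 3 array in which all but one of the "A3" compounds failed.
planned = [(a, b) for a in ("A1", "A2", "A3") for b in ("B1", "B2", "B3")]
made = [c for c in planned if c[0] != "A3" or c[1] == "B1"]
print(under_replicated(planned, made))
```

A monomer flagged at the lower threshold of 2 (`minimum=2`) is represented by a single compound, so its fitted contribution cannot be separated from that compound's measurement error.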
Summary
Experimental Design may provide an alternative / complementary strategy
which may be suitable in some circumstances
– E.g. initial exploration of new monomer space
– Identification of back-up compounds
– Establishing additivity in the series
Efficient Lead Optimisation by exploring more than one point of
change at the same time on the molecular template
Can unearth some surprises which may never have been found by
traditional processes
There are different design types for different situations
– Software is available to create the designs
– Work well in situations where the bespoke synthesis is contracted out
Acknowledgements
Tony Cooper
Heather Hobbs
Nick Barton
Stephen Pickett
Darren Green
Medicinal Chemistry
Computational Chemistry
References to literature to date
How design Concepts can Improve Experimentation: Mager, 1997
– Use of iterative search techniques to find monomers which have the required levels of particular physico-chemical properties to fit into an ideal experimental design
Statistical Molecular Design of BB for CombiChem: Linusson, Wold et al., 1999
– Selection of BBs using t-scores of the enumerated library as variables and then applying D-optimal, space-filling or cluster-based selection strategies
Predictive Array Design: Lipkin, Rose et al., 2002
– Use of Latin Squares
– Simulated annealing for monomer assignment