Leiden University. The university to discover. Desirability Indexes for Soft Constraint Modeling in Drug Design Johannes Kruisselbrink E-mail: jkruisse@liacs.nl.
Post on 14-Dec-2015
Desirability Indexes for Soft Constraint Modeling in Drug Design
Johannes Kruisselbrink
E-mail: jkruisse@liacs.nl

Scope
Context:
- Quality measures for candidate molecular structures for automated optimization
Contents:
- Using the concept of Desirability for modeling soft or fuzzy constraints
- The applicability in automated drug design and examples for integration within a scoring function

Uncertainty and noise in optimization problems
Uncertainty and noise can arise in various parts of the optimization model:
[Diagram: input X and external (uncontrollable) parameters A enter the system (model), which produces output Y. Goals: objectives f1, ..., fm → max / min; constraints g1 ≤ 0, ..., gn ≤ 0.]
A) Uncertainty and noise in the design variables
B) Uncertainty and noise in the environmental parameters
C) Uncertain and/or noisy system output
D) Vagueness / fuzziness in the constraints

Our setup for Automated Molecule Evolution

Automated molecule design
- Search for molecular structures with specific pharmacological or biological activity
- Objectives: Maximization of potency of drug (and minimization of side-effects)
- Constraints: Stability, synthesizability, drug-likeness, etc.
- Aim: provide a set of molecular structures that can be promising candidates for further research

Molecule Evolution
Fragments extracted from Drug Databases
Initialize population P0
While not terminate do
    Generate offspring O from Pt
    Evaluate O
    Pt+1 = select from (Pt ∪ O)
- ‘Normal’ evolution cycle
- Graph-based mutation and recombination operators
- Deterministic elitist (μ+λ) parent selection (NSGA-II with Niching)
“The molecule evoluator. An interactive evolutionary algorithm for the design of drug-like molecules”, E.-W. Lameijer, J.N. Kok, T. Bäck, A.P. IJzerman, J. Chem. Inf. Model., 2006, 46(2): 545-552.
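The evolution cycle can be sketched in Python roughly as follows. This is only an illustration under strong assumptions: it evolves short lists of floats instead of molecular graphs, ranks by a single toy fitness instead of NSGA-II with niching, and the names `fitness`, `mutate` and `evolve` are hypothetical.

```python
import random

def fitness(genome):
    # Toy objective (assumption): maximize the negative squared distance to 0.
    return -sum(g * g for g in genome)

def mutate(genome, rng, sigma=0.3):
    # Gaussian perturbation; stands in for the graph-based operators.
    return [g + rng.gauss(0.0, sigma) for g in genome]

def evolve(mu=8, lam=12, generations=50, dim=3, seed=1):
    rng = random.Random(seed)
    # Initialize population P0
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(mu)]
    history = [max(fitness(p) for p in population)]
    for _ in range(generations):  # while not terminate
        # Generate offspring O from P
        offspring = [mutate(rng.choice(population), rng) for _ in range(lam)]
        # Evaluate O and select P_{t+1} from (P U O): deterministic and elitist
        pool = sorted(population + offspring, key=fitness, reverse=True)
        population = pool[:mu]
        history.append(fitness(population[0]))
    return population, history

population, history = evolve()
print(history[0], history[-1])
```

Because parents compete with their offspring, the best fitness in `history` can never decrease from one generation to the next.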

Objectives and constraints
Objectives
- Activity predictors based on support vector machines:
  - f1: activity predictor based on ECFP6 fingerprints
  - f2: activity predictor based on AlogP2 Estate Counts
  - f3: activity predictor based on MDL
Constraints
- Bounds based on Lipinski’s rule of five and the minimal energy conformation:
  - Number of hydrogen acceptors
  - Number of hydrogen donors
  - Molecular solubility
  - Molecular weight
  - AlogP value
  - Minimized energy

Soft constraints in drug design

Soft constraints in Drug Design
- Estimating the feasibility of candidate structures can be done using boundary values for certain molecule properties
- Examples are Lipinski’s rule-of-five and estimations of the minimal energy conformations
- But…, how strict are those rules?
  - Sometimes violations are easy to fix manually
  - Sometimes violations are not violations in practice

Molecules failing Lipinski
[Figure: structures of Atorvastatin, Liothyronine, Ethopropazine, Olmesartan, Doxycycline, Bexarotene and Acarbose, each labeled with the rule-of-five properties it violates: MW, HA, HD and/or log P (one log P label carries the value 5.088).]

Modeling constraints using desirability functions

The real nature of the constraints
The constraints are of the following forms:
Where
- x denotes a candidate structure
- g(x) denotes the property value of x
- Aj is the lower bound of the property filter
- Bj is the upper bound of the property filter
- the fuzzy inequality between A and B reads: A is preferred to be smaller than B

Modeling constraints as objectives
Constraints can be transformed into ‘objectives’ by mapping their values onto a function with range [0,1], where:
- Values close to 0 correspond to undesirable results
- Values close to 1 correspond to desirable results
- Values between 0 and 1 fall into the grey area
[Figure: two sketches of d(Y) on a 0-to-1 axis. One-sided: violated, grey area, satisfied. Two-sided: violated, grey area, satisfied, grey area, violated. The cutoff bound separates ‘violated’ from the grey area; the constraint bound separates the grey area from ‘satisfied’.]
There are multiple ways to create such mappings!

Constraints in our studies
Fuzzy constraint scores based on Lipinski’s rule of five and bounds on the minimal energy conformation:
(LB, UB: absolute cutoff bounds; A, B: lower and upper constraint bounds)
Descriptor             LB     A      B      UB
Num H-acceptors        0      1      6      10
Num H-donors           0      1      3      5
Molecular solubility   -6     -4     NA     NA
Molecular weight       150    250    450    600
ALogP                  0      1      4      5
Minimized energy       NA     NA     80     150
* Bounds settings were determined based on chemical intuition
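As a small illustration, the bounds in the table can be stored as data and used to label a property value as violated, grey area or satisfied. The dictionary keys, the function name `region` and the treatment of values that lie exactly on a bound are assumptions made for this sketch.

```python
# Bounds from the table (LB, A, B, UB); None stands for "NA".
BOUNDS = {
    "num_h_acceptors":      (0, 1, 6, 10),
    "num_h_donors":         (0, 1, 3, 5),
    "molecular_solubility": (-6, -4, None, None),
    "molecular_weight":     (150, 250, 450, 600),
    "alogp":                (0, 1, 4, 5),
    "minimized_energy":     (None, None, 80, 150),
}

def region(descriptor, value):
    """Return 'violated', 'grey area' or 'satisfied' for a property value."""
    lb, a, b, ub = BOUNDS[descriptor]
    # Beyond an absolute cutoff bound: violated.
    if (lb is not None and value < lb) or (ub is not None and value > ub):
        return "violated"
    # Within the constraint bounds: satisfied.
    if (a is None or value >= a) and (b is None or value <= b):
        return "satisfied"
    # Between a constraint bound and its cutoff bound.
    return "grey area"

for mw in (100, 200, 300, 500, 700):
    print(mw, region("molecular_weight", mw))
```

A desirability function then replaces these three labels by a score between 0 and 1.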

Harrington Desirability Functions
One-sided:
$d(Y') = \exp(-\exp(-Y'))$, with $Y' = -\ln(-\ln d) = b_0 + b_1 Y$
Two-sided:
$d(Y') = \exp(-|Y'|^n)$, with $Y' = \frac{2Y - (U + L)}{U - L}$
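A minimal Python sketch of the two Harrington desirability functions, following the MATLAB expressions at the end of these slides. The function and parameter names are chosen for this illustration, and the overflow guard in the one-sided function is an added assumption.

```python
import math

def harrington_one_sided(y, b0, b1):
    """d(Y') = exp(-exp(-Y')) with Y' = b0 + b1 * Y."""
    inner = -(b0 + b1 * y)
    if inner > 700.0:
        # exp(inner) would overflow a float; d tends to 0 in this limit.
        return 0.0
    return math.exp(-math.exp(inner))

def harrington_two_sided(y, lower, upper, n):
    """d(Y') = exp(-|Y'|^n) with Y' = (2Y - (U + L)) / (U - L)."""
    y_scaled = (2.0 * y - (upper + lower)) / (upper - lower)
    return math.exp(-abs(y_scaled) ** n)

print(harrington_one_sided(0.0, 0.0, 1.0))
print(harrington_two_sided(375.0, 150.0, 600.0, 3.0))
```

For the two-sided function, d is 1 at the midpoint of [L, U] and exp(-1) at Y = L and Y = U for any n; the exponent n only controls how steeply d falls off in between and beyond.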

Example one-sided Harrington DF
Molecular solubility:
- Soft constraint: Y > -4
- Absolute cutoff: Y < -6
Using $d(Y') = \exp(-\exp(-Y'))$ with $Y' = -\ln(-\ln d) = b_0 + b_1 Y$:
$0.01 = \exp(-\exp(-(b_0 + b_1 \cdot (-6))))$
$0.99 = \exp(-\exp(-(b_0 + b_1 \cdot (-4))))$
$-\ln(-\ln(0.01)) = b_0 - 6 b_1$
$-\ln(-\ln(0.99)) = b_0 - 4 b_1$
$-1.5272 = b_0 - 6 b_1$
$4.6001 = b_0 - 4 b_1$
$b_0 = 16.8548, \quad b_1 = 3.0637$
$d(Y) = \exp(-\exp(-(16.8548 + 3.0637\,Y)))$
[Plot of d(Y) with regions: violated, grey area, satisfied]
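The two anchor conditions d(-6) = 0.01 and d(-4) = 0.99 form a linear system in b0 and b1 once -ln(-ln d) is taken on both sides. A short Python sketch of that calculation (the function name `fit_one_sided` is hypothetical):

```python
import math

def fit_one_sided(y_low, d_low, y_high, d_high):
    """Solve -ln(-ln d) = b0 + b1 * Y at two (Y, d) anchor points."""
    t_low = -math.log(-math.log(d_low))
    t_high = -math.log(-math.log(d_high))
    b1 = (t_high - t_low) / (y_high - y_low)
    b0 = t_low - b1 * y_low
    return b0, b1

# Molecular solubility: cutoff at Y = -6 (d = 0.01), constraint at Y = -4 (d = 0.99)
b0, b1 = fit_one_sided(-6.0, 0.01, -4.0, 0.99)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The choice of 0.01 and 0.99 as the desirability values at the two bounds is itself a tuning decision; other values give a steeper or flatter transition.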

Example two-sided Harrington DF
Molecular weight:
- Absolute lower cutoff: Y < 150
- Lower bound constraint: Y > 250
- Upper bound constraint: Y < 450
- Absolute upper cutoff: Y > 600
Problematic!
- No support for non-symmetric boundaries
- No explicit support for ‘completely satisfied’ intervals
$d(Y') = \exp(-|Y'|^n)$, with $Y' = \frac{2Y - (U + L)}{U - L}$
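
Example two-sided Harrington DF
One possibility:
- Make symmetric
- Base d(Y) on cutoff bounds
- Tune n using a constraint bound
Using $d(Y') = \exp(-|Y'|^n)$ with $Y' = \frac{2Y - (U + L)}{U - L}$, $L = 150$, $U = 600$:
$d(250) = \exp\left(-\left|\frac{2 \cdot 250 - (600 + 150)}{600 - 150}\right|^n\right) = 0.99$
$\exp(-0.5556^n) = 0.99$
$0.5556^n = -\ln 0.99$
$n = \log_{0.5556}(-\ln 0.99) = 7.8273$
$d(Y) = \exp\left(-\left|\frac{2Y - (600 + 150)}{600 - 150}\right|^{7.8273}\right)$
[Plot of d(Y) with regions: violated, grey area, satisfied, grey area, violated]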
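
Example two-sided Harrington DF
Or:
- Make symmetric
- Base d(Y) on constraint bounds
- Tune n using a cutoff bound
Using $d(Y') = \exp(-|Y'|^n)$ with $Y' = \frac{2Y - (U + L)}{U - L}$, $L = 250$, $U = 450$:
$d(150) = \exp\left(-\left|\frac{2 \cdot 150 - (450 + 250)}{450 - 250}\right|^n\right) = 0.01$
$\exp(-2^n) = 0.01$
$2^n = -\ln 0.01$
$n = \log_{2}(-\ln 0.01) = 2.2033$
$d(Y) = \exp\left(-\left|\frac{2Y - (450 + 250)}{450 - 250}\right|^{2.2033}\right)$
[Plot of d(Y) with regions: violated, grey area, satisfied, grey area, violated]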
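
Example two-sided Harrington DF
Or:
- Make symmetric
- Base d(Y) on average between constraint bounds and cutoff bounds
- Tune n using a cutoff bound
Using $d(Y') = \exp(-|Y'|^n)$ with $Y' = \frac{2Y - (U + L)}{U - L}$, $L = 200$, $U = 525$:
$d(150) = \exp\left(-\left|\frac{2 \cdot 150 - (525 + 200)}{525 - 200}\right|^n\right) = 0.01$
$\exp(-1.3077^n) = 0.01$
$1.3077^n = -\ln 0.01$
$n = \log_{1.3077}(-\ln 0.01) = 5.6927$
$d(Y) = \exp\left(-\left|\frac{2Y - (525 + 200)}{525 - 200}\right|^{5.6927}\right)$
[Plot of d(Y) with regions: violated, grey area, satisfied, grey area, violated]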
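Tuning n for a symmetric two-sided Harrington DF amounts to solving |Y'|^n = -ln d at one anchor point. The Python sketch below does this for the three molecular weight variants (cutoff bounds, constraint bounds, averaged bounds); the helper name `tune_n` and the dictionary labels are made up for this illustration.

```python
import math

def tune_n(lower, upper, y_anchor, d_anchor):
    """Return n such that exp(-|Y'|^n) equals d_anchor at Y = y_anchor."""
    y_scaled = abs((2.0 * y_anchor - (upper + lower)) / (upper - lower))
    # |Y'|^n = -ln(d)  =>  n = ln(-ln d) / ln|Y'|
    return math.log(-math.log(d_anchor)) / math.log(y_scaled)

variants = {
    "cutoff bounds, d(250) = 0.99": tune_n(150, 600, 250, 0.99),
    "constraint bounds, d(150) = 0.01": tune_n(250, 450, 150, 0.01),
    "averaged bounds, d(150) = 0.01": tune_n(200, 525, 150, 0.01),
}
for name, n in variants.items():
    print(f"{name}: n = {n:.4f}")
```

The slides round |Y'| to four decimals before taking logarithms, so values computed at full precision can differ from the slide values in the third or fourth decimal. The anchor point must not lie at Y = L or Y = U, where |Y'| = 1 and n is undetermined.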

Harrington
- Advantages:
  - Maps onto a continuous function
  - Strictly monotonic mapping
  - Distinction between completely violated points
- Downsides:
  - Tuning the DF is somewhat arbitrary
  - Distinction between completely satisfied solutions
  - Not really suited for ‘completely satisfied intervals’
  - Does not allow non-symmetric constraints
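
Derringer Desirability Functions
One-sided:
$d(Y) = \begin{cases} 0, & Y < U \\ \left(\frac{Y - U}{B - U}\right)^{l}, & U \le Y \le B \\ 1, & Y > B \end{cases}$
Two-sided:
$d(Y) = \begin{cases} 0, & Y < L \\ \left(\frac{Y - L}{T - L}\right)^{l}, & L \le Y \le T \\ \left(\frac{Y - U}{T - U}\right)^{u}, & T < Y \le U \\ 0, & Y > U \end{cases}$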

Example one-sided Derringer DF
Molecular solubility:
- Soft constraint: Y > -4
- Absolute cutoff: Y < -6
$d(Y) = \begin{cases} 0, & Y < -6 \\ \left(\frac{Y + 6}{-4 + 6}\right)^{l}, & -6 \le Y \le -4 \\ 1, & Y > -4 \end{cases}$
Note: l = 1 gives a linear transition
[Plot of d(Y) with regions: violated, grey area, satisfied]

Example two-sided Derringer DF
Molecular weight:
- Absolute cutoff: Y < 150
- Soft constraint: Y > 250
- Soft constraint: Y < 450
- Absolute cutoff: Y > 600
$d(Y) = \begin{cases} 0, & Y < 150 \\ \left(\frac{Y - 150}{250 - 150}\right)^{l}, & 150 \le Y < 250 \\ 1, & 250 \le Y \le 450 \\ \left(\frac{Y - 600}{450 - 600}\right)^{u}, & 450 < Y \le 600 \\ 0, & Y > 600 \end{cases}$
[Plot of d(Y) with regions: violated, grey area, satisfied, grey area, violated]
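A Python sketch of the Derringer desirability functions as used in the solubility and molecular weight examples, including the ‘completely satisfied’ interval of the two-sided case. Function and parameter names are chosen for this illustration; the one-sided version assumes a lower-bound constraint (cutoff below the constraint bound).

```python
def derringer_one_sided(y, cutoff, bound, l=1.0):
    """One-sided Derringer DF for a lower-bound constraint (cutoff < bound)."""
    if y < cutoff:
        return 0.0                      # beyond the cutoff bound
    if y < bound:
        return ((y - cutoff) / (bound - cutoff)) ** l
    return 1.0                          # constraint satisfied

def derringer_two_sided(y, lower_cut, lower, upper, upper_cut, l=1.0, u=1.0):
    """Two-sided Derringer DF with a 'completely satisfied' interval."""
    if y < lower_cut or y > upper_cut:
        return 0.0                      # beyond a cutoff bound
    if y < lower:
        return ((y - lower_cut) / (lower - lower_cut)) ** l
    if y > upper:
        return ((y - upper_cut) / (upper - upper_cut)) ** u
    return 1.0                          # inside [lower, upper]

# Molecular solubility (cutoff -6, constraint -4) and molecular weight
print(derringer_one_sided(-5.0, -6.0, -4.0))
print(derringer_two_sided(525.0, 150.0, 250.0, 450.0, 600.0))
```

Unlike the symmetric Harrington form, the lower and upper slopes here use independent bounds and exponents, so non-symmetric constraints need no special treatment.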

Derringer
- Advantages:
  - Easy straightforward implementation
  - Control for modeling non-symmetric constraints
  - Easy integration for ‘completely satisfied’ intervals
  - No distinction between completely satisfied solutions
- Downsides:
  - Maps onto a non-smooth function (continuous, but in general not differentiable at the bounds)
  - Not strictly monotonic (just monotonic)
  - No distinction between solutions beyond the cutoff bounds

Aggregating the Desirability Functions into score functions

Many objective optimization
- Modeling fuzzy constraints using DFs generates many additional objective functions
- In our case:
  - 3 original objectives + 6 constraints → 9 objectives
- The possibilities:
  - Pareto optimization
  - Aggregation
  - A combination of both

Aggregation
- Desirability functions can be easily integrated into one single scoring function, e.g.:
  - Weighted sum
  - Min performance
  - Geometrical mean
  - Average
Geometrical mean: $F(x) = \left(\prod_{i=1}^{k} D_i(g_i(x))\right)^{1/k}$
Min performance: $F(x) = \min_{i=1 \ldots k} D_i(g_i(x))$
Average: $F(x) = \frac{1}{k} \sum_{i=1}^{k} D_i(g_i(x))$
Weighted sum: $F(x) = \sum_{i=1}^{k} a_i D_i(g_i(x))$
The Desirability Index
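The four aggregation schemes can be written down directly. In this Python sketch the input `d` is a list of already computed desirability values D_i(g_i(x)); the function names and the example scores are hypothetical.

```python
import math

def weighted_sum(d, weights):
    return sum(a * di for a, di in zip(weights, d))

def min_performance(d):
    return min(d)

def geometric_mean(d):
    return math.prod(d) ** (1.0 / len(d))

def average(d):
    return sum(d) / len(d)

scores = [0.9, 0.8, 1.0, 0.5]  # hypothetical desirability values
print("weighted sum   :", weighted_sum(scores, [0.25] * 4))
print("min performance:", min_performance(scores))
print("geometric mean :", geometric_mean(scores))
print("average        :", average(scores))
```

The schemes differ mainly in how strongly one poor score is punished: the minimum and the geometric mean react much more sharply to a single low value than the average or a weighted sum.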

Remodeling the objectives
- Desirability index aggregation of the objectives requires a normalization function that maps the objective function values to the interval [0,1]
- One possibility (for an original objective function that is to be minimized):
  $\hat{f}_i(x) = \exp\left(-\frac{f_i(x) - f_i^*}{d_i^{\max}}\right)$
- Or…, use Harrington or Derringer DFs

The aggregation possibilities
- Full aggregation:
  - Aggregate the constraints and the objectives into one quality score (1 objective)
- Partial aggregation:
  - Aggregate the constraints into one constraint score (1 extra objective → 4 objectives)
  - Aggregate the constraints and the objectives into two separate scoring functions (2 objectives)

A case study

Experiments
Comparison of:
- Complete aggregation (1 objective)
- Separate aggregation of objectives and constraints (2 objectives)
- Only aggregate constraint scores (4 objectives)
Objectives:
- Three activity prediction models for estrogen receptor antagonists
EA settings:
- NSGA-II for the multi-objective test-cases
- 80 parents / 120 offspring
- 1000 generations
- No niching

4D Pareto fronts
The Pareto fronts obtained using three different scoring methods
[Figure: three panels, with the optimization direction indicated. Panels: Complete aggregation (1 objective); Aggregate constraints and objectives separately (2 objectives); Only aggregate constraint scores (4 objectives).]

Random subsets of the results

Separate constraints and objectives
[Figure: plot with axes f1: AlogP2 EC, f2: ECFP and f3: MDL, each with its maximum at 1. Color: constraint scores (white = 0, black = 1). Tamoxifen is labeled in the plot.]

Conclusions

Discussion - Ranking issues
- DFs that can yield 0 values will generate 0 values for the performance when aggregating using the geometric mean
- DFs that make distinctions between completely satisfied constraints might be involved in unnecessary further optimization (maximization while already satisfied)
[Figure: sketch of d(Y) on a 0-to-1 axis with regions: violated, grey area, satisfied]
An ideal DF?
- Never 0 (distinction on the degree of constraint violation)
- When satisfied, 1 (no distinction between satisfied regions)
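Both ranking issues can be made concrete with a small Python sketch. The function `plateau_exp_df` is a purely hypothetical example of a DF with the two ‘ideal’ properties (exactly 1 when satisfied, positive otherwise); it is not a function proposed in these slides, and its `scale` parameter is an assumption.

```python
import math

def plateau_exp_df(y, bound, scale):
    """Hypothetical one-sided DF: 1 when Y >= bound, exponential decay below.

    Mathematically never 0; in floating point it underflows to 0 only for
    violations larger than roughly 745 * scale.
    """
    if y >= bound:
        return 1.0
    return math.exp(-(bound - y) / scale)

def geometric_mean(d):
    return math.prod(d) ** (1.0 / len(d))

# A DF that yields exactly 0 wipes out the geometric mean, whatever the rest is.
print(geometric_mean([0.0, 1.0, 1.0]))

# With the hypothetical DF, a badly violated constraint still gives a ranking.
worse = geometric_mean([plateau_exp_df(-10.0, -4.0, 1.0), 1.0, 1.0])
better = geometric_mean([plateau_exp_df(-8.0, -4.0, 1.0), 1.0, 1.0])
print(worse, better)
```

Because the score is capped at 1 once the bound is met, such a DF gives the optimizer no incentive to push an already satisfied property further.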

Conclusions
- Desirability Functions and Desirability Indexes for modeling soft / fuzzy constraints:
  - Are intuitive and easy to incorporate
  - Allow for easy integration of additional constraints
  - Incorporate the concept of vagueness present in all rule-of-thumb measures
  - Prevent the optimization method from ruling out promising candidate structures

Thank you!
Johannes Kruisselbrink
Natural Computing Group
LIACS, Universiteit Leiden
e-mail: jkruisse@liacs.nl
http://natcomp.liacs.nl

Matlab codes
(no presentation stuff, just for creating the DF plots)

Harrington one-sided example
clf
x = [0:.1:10];
% generic one-sided Harrington DF with b0 = -8, b1 = 2
y = exp(-exp(-(-8 + 2 * x)));
plot(x, y)
ylim([-.1 1.1])
xlabel('Y')
ylabel('d(Y)')

Harrington two-sided example
clf
x = [0:.01:10];
% generic two-sided Harrington DF with L = 4, U = 6, n = 3
y = exp(-abs((2 * x - (6 + 4))/(6 - 4)).^(3));
plot(x, y)
ylim([-.1 1.1])
xlabel('Y')
ylabel('d(Y)')

One-sided Harrington DF in MATLAB
clf
x = [-8:.1:-2];
% molecular solubility example: b0 = 16.8548, b1 = 3.0637
y = exp(-exp(-(16.8548 + 3.0637 * x)));
plot(x, y)
hold on
% piecewise linear reference through the cutoff (-6) and constraint (-4) bounds
plot([-8 -6 -4 -2],[0 0 1 1], '-.r')
ylim([-.1 1.1])
xlabel('Y')
ylabel('d(Y)')
legend('Harrington DF', 'Linear DF', 'Location', 'NorthWest')

Two-sided Harrington DF 1 in MATLAB
clf
x = [0:1:800];
% based on the cutoff bounds (150, 600), n tuned at the constraint bound
y = exp(-abs((2 * x - (600 + 150))/(600 - 150)).^(7.8273));
plot(x, y)
hold on
plot([0 150 250 450 600 850], [0 0 1 1 0 0], '-.r')
ylim([-.1 1.1])
xlabel('Y')
ylabel('d(Y)')
legend('Harrington DF', 'Linear DF', 'Location', 'NorthEast')

Two-sided Harrington DF 2 in MATLAB
clf
x = [0:1:800];
% based on the constraint bounds (250, 450), n tuned at the cutoff bound
y = exp(-abs((2 * x - (450 + 250))/(450 - 250)).^(2.2033));
plot(x, y)
hold on
plot([0 150 250 450 600 850], [0 0 1 1 0 0], '-.r')
ylim([-.1 1.1])
xlabel('Y')
ylabel('d(Y)')
legend('Harrington DF', 'Linear DF', 'Location', 'NorthEast')

Two-sided Harrington DF 3 in MATLAB
clf
x = [0:1:800];
% based on the averaged bounds (200, 525), n tuned at the cutoff bound
y = exp(-abs((2 * x - (525 + 200))/(525 - 200)).^(5.6927));
plot(x, y)
hold on
plot([0 150 250 450 600 850], [0 0 1 1 0 0], '-.r')
ylim([-.1 1.1])
xlabel('Y')
ylabel('d(Y)')
legend('Harrington DF', 'Linear DF', 'Location', 'NorthEast')

One-sided Derringer DF in MATLAB
clf
hold on
x = [-8:.01:-2];
% molecular solubility example for exponents l = 0.5, 1 and 2
y1 = (x >= -4) * 1 + (x < -4) .* (x >= -6) .* ((x + 6)/(-4 + 6)).^0.5;
plot(x, y1, '-.b')
y2 = (x >= -4) * 1 + (x < -4) .* (x >= -6) .* ((x + 6)/(-4 + 6)).^1;
plot(x, y2, '--r')
y3 = (x >= -4) * 1 + (x < -4) .* (x >= -6) .* ((x + 6)/(-4 + 6)).^2;
plot(x, y3, 'g')
ylim([-.1 1.1])
xlabel('Y')
ylabel('d(Y)')
legend('Derringer DF (l=0.5)', 'Derringer DF (l=1)', 'Derringer DF (l=2)', ...
    'Location', 'NorthWest')

Two-sided Derringer DF in MATLAB
clf
hold on
x = [0:.1:800];
% molecular weight example for exponents 0.5, 1 and 2 on both sides
y1 = (x >= 150) .* (x < 250) .* ((x - 150) / (250 - 150)).^(0.5) + (x >= 250) .* (x <= 450) .* 1 + (x > 450) .* (x <= 600) .* ((x - 600) / (450 - 600)).^(0.5);
plot(x, y1, '-.b')
y2 = (x >= 150) .* (x < 250) .* ((x - 150) / (250 - 150)).^(1) + (x >= 250) .* (x <= 450) .* 1 + (x > 450) .* (x <= 600) .* ((x - 600) / (450 - 600)).^(1);
plot(x, y2, '--r')
y3 = (x >= 150) .* (x < 250) .* ((x - 150) / (250 - 150)).^(2) + (x >= 250) .* (x <= 450) .* 1 + (x > 450) .* (x <= 600) .* ((x - 600) / (450 - 600)).^(2);
plot(x, y3, 'g')
ylim([-.1 1.1])
xlabel('Y')
ylabel('d(Y)')
legend('Derringer DF (l=0.5)', 'Derringer DF (l=1)', 'Derringer DF (l=2)', ...
    'Location', 'NorthEast')