• The pharmacophore pattern of a molecule characterizes the relative arrangement of all its pharmacophore types
– What pharmacophore types are represented?
– How are they arranged (spatially, topologically) with respect to each other?
– How can these aspects be captured numerically to yield molecular descriptors of the pharmacophore pattern?
• N-dimensional vector D(M) = [D1(M), D2(M), …, DN(M)]; each Di encodes an element of the pharmacophore pattern
– Allows meaningful quantitative definitions of molecular similarity:
• Neighborhood Behavior: similar molecules, characterized by covariant vectors, are likely to display similar biological properties
• As chemists do not easily perceive the pharmacophore pattern, such covariance may reveal hidden but real molecular relatedness…
– May serve as a starting point for searching for a binding pharmacophore, i.e. the subset of features that really participate in binding to a receptor
• Machine learning can select those elements Di that are systematically present in the actives, but not in the inactives, of a molecular learning set!
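As a rough illustration of the last point, the sketch below ranks descriptor elements Di by how much more often they are non-zero in actives than in inactives. This simple frequency-difference filter is an assumption. It stands in for a machine-learning selection step and is not the method used for 2D-FPT. The function name `rank_elements` and the toy fingerprints are hypothetical.

```python
from typing import List, Sequence

def rank_elements(actives: Sequence[Sequence[float]],
                  inactives: Sequence[Sequence[float]]) -> List[int]:
    """Rank descriptor indices i by (fraction of actives with Di > 0)
    minus (fraction of inactives with Di > 0), best first.

    Hypothetical frequency-difference filter, not a trained model."""
    n_elements = len(actives[0])
    scores = []
    for i in range(n_elements):
        frac_act = sum(1 for d in actives if d[i] > 0) / len(actives)
        frac_inact = sum(1 for d in inactives if d[i] > 0) / len(inactives)
        scores.append((frac_act - frac_inact, i))
    # highest score first; ties broken by the lower index
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores]

# Toy fingerprints (3 elements each), invented for illustration
actives = [[1, 0, 2], [3, 0, 1]]
inactives = [[1, 1, 0], [2, 0, 0]]
print(rank_elements(actives, inactives))  # [2, 0, 1]
```

Element 2 comes first because it is populated in every active and in none of the inactives. Element 0 is neutral because it is populated everywhere.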
Basis triplets:
• all possible feature combinations
• at a given series of distances…
First key improvement: fuzzy mapping of atom triplets onto basis triplets in 2D-FPT
[Figure: an atom triplet of the molecule (edge lengths in bonds) is mapped fuzzily onto several neighboring basis triplets, e.g. Hp3-Hp3-Hp3, Hp3-Hp3-Hp4, Hp3-Hp3-Hp5, Ar4-Hp3-Hp4, Ar4-Hp3-Hp5, Hp7-Ar4-PC6, Hp3-HA5-Ar5, Hp4-HA5-Ar5, …; the resulting fingerprint reads 0 0 0 … 0 0 … +6 … +3 … 0 …]
Di(m) = total occupancy of basis triplet i in molecule m.
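The following is a minimal sketch of how fuzzy mapping turns a molecule's atom triplets into total occupancies Di(m). Several assumptions are not taken from the slides:
• triplets are given as (types, edges) in canonical corner order;
• a molecular triplet only feeds basis triplets with identical corner types;
• the field-overlap matching score of the original method is replaced by a simple Gaussian on edge-length differences, with a hypothetical `width` parameter.

`delta` plays the role of the maximal tolerated edge-length discrepancy.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[Tuple[str, str, str], Tuple[int, int, int]]  # (corner types, opposite edges)

def fuzzy_occupancy(mol_triplets: List[Triplet],
                    basis_triplets: List[Triplet],
                    delta: int = 2,
                    width: float = 1.0) -> Dict[Triplet, float]:
    """Return D_i(m): the summed fuzzy contributions of all atom triplets of m
    to each basis triplet i (basis triplets with zero occupancy are omitted)."""
    occupancy: Dict[Triplet, float] = defaultdict(float)
    for m_types, m_edges in mol_triplets:
        for basis in basis_triplets:
            b_types, b_edges = basis
            if b_types != m_types:
                continue  # simplification: corner types must match one-to-one
            diffs = [abs(a - b) for a, b in zip(m_edges, b_edges)]
            if max(diffs) > delta:
                continue  # edge-length discrepancy too large
            # hypothetical stand-in for the matching score: 1.0 for an exact match
            occupancy[basis] += math.exp(-sum(d * d for d in diffs) / (2 * width ** 2))
    return dict(occupancy)

ar_hp_pc = ("Ar", "Hp", "PC")
basis = [(ar_hp_pc, (4, 7, 6)), (ar_hp_pc, (4, 6, 6)), (ar_hp_pc, (4, 10, 6))]
molecule = [(ar_hp_pc, (4, 7, 6)), (ar_hp_pc, (4, 7, 6))]  # the same triplet seen twice
D = fuzzy_occupancy(molecule, basis)
print(D)
```

In this toy run, the exactly matching basis triplet collects an occupancy of 2. Its neighbor, one bond off on a single edge, collects a smaller fuzzy share. The basis triplet three bonds away is beyond `delta` and receives nothing.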
Combinatorial enumeration of basis triplets
• Example: there are 36796 basis triplets verifying the triangle inequalities when considering 6 pharmacophore types and 11 edge lengths from Emin=3 to Emax=13 with an increment of Estep=1: (3, 4, 5, …, 13)
– Canonical representation: T1d23-T2d13-T3d12 with T3 ≥ T2 ≥ T1 (alphabetically), where dij is the length of the edge joining corners i and j, i.e. the edge opposite the remaining corner
Example (edge lengths 4, 6, 7): Hp7-Ar4-PC6 is written canonically as Ar4-Hp7-PC6
– Of two corners of the same type, priority is given to the one opposite the shorter edge.
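The enumeration and canonical naming rules above can be sketched as follows. The sketch rests on three assumptions:
• the six types are abbreviated Ar, HA, HD, Hp, NC and PC;
• "alphabetical" means plain string ordering of these abbreviations;
• the triangle inequality is strict, so degenerate triplets such as d12 = d13 + d23 are excluded.

Under that reading, the sketch reproduces the 36796 count quoted above and the 4494 basis triplets listed for FPT-1. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Iterable, List, Tuple

TYPES = ["Ar", "HA", "HD", "Hp", "NC", "PC"]

Corner = Tuple[str, int]  # (pharmacophore type, length of the edge opposite this corner)

def canonical(corners: Iterable[Corner]) -> str:
    """Name a triplet as T1d23-T2d13-T3d12: corners sorted by type, and,
    for two corners of the same type, the one opposite the shorter edge first."""
    return "-".join(f"{t}{d}" for t, d in sorted(corners))

def enumerate_basis_triplets(types: List[str], emin: int, emax: int, estep: int) -> List[str]:
    lengths = range(emin, emax + 1, estep)
    corners = sorted((t, d) for t in types for d in lengths)
    basis = []
    # each unordered choice of three corners is one candidate triangle
    for trio in combinations_with_replacement(corners, 3):
        a, b, c = sorted(d for _, d in trio)
        if c < a + b:  # strict triangle inequality
            basis.append(canonical(trio))
    return basis

print(canonical([("Hp", 7), ("Ar", 4), ("PC", 6)]))     # Ar4-Hp7-PC6
print(len(enumerate_basis_triplets(TYPES, 3, 13, 1)))  # 36796
print(len(enumerate_basis_triplets(TYPES, 2, 12, 2)))  # 4494 (FPT-1 setup)
```

Treating a triplet as an unordered set of (type, opposite edge) corners makes the canonical name unique. Sorting those corners implements both naming rules at once.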
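• The triplet matching score represents the optimal degree of pharmacophore field overlap:
– if corner k of the triplet is of pharmacophore type T, i.e. F(k,T)=1, then it contributes to the total pharmacophore field of type T observed at a point P of the plane:
$$\Phi_T(P) = \sum_{k=1}^{3} F(k,T)\,\exp\left(-\frac{d_{P,k}^{2}}{\sigma_T^{2}}\right)$$
where dP,k is the distance between P and corner k, and σT is the Gaussian fuzziness parameter of type T.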
Horvath, D. ComPharm pp. 395-439; in "QSPR/QSAR Studies by Molecular Descriptors", Diudea, M., Editor, Nova Science Publishers, Inc., New York, 2001
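To make the field definition concrete, the sketch below places a triplet in the plane from its three edge lengths and evaluates the type-T field at a point P. The field is taken as a sum of Gaussians exp(−d²/σT²) centered on the corners of type T. That Gaussian form, the function names and the example σ value are assumptions for illustration. Computing the optimal overlay of two triplets, which is the actual matching score, is not attempted here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def place_triangle(d23: float, d13: float, d12: float) -> List[Point]:
    """Planar coordinates of corners 1, 2, 3, given the edge opposite each corner.
    Corner 1 sits at the origin and corner 2 on the positive x axis."""
    x3 = (d13 ** 2 - d23 ** 2 + d12 ** 2) / (2 * d12)
    y3 = math.sqrt(max(d13 ** 2 - x3 ** 2, 0.0))  # clamp rounding noise for flat triangles
    return [(0.0, 0.0), (d12, 0.0), (x3, y3)]

def field(ptype: str, point: Point, corners: Sequence[Point],
          types: Sequence[str], sigma: Dict[str, float]) -> float:
    """Phi_T(P): sum of Gaussians centred on the corners of type T (F(k,T) = 1)."""
    total = 0.0
    for (cx, cy), t in zip(corners, types):
        if t == ptype:
            d2 = (point[0] - cx) ** 2 + (point[1] - cy) ** 2
            total += math.exp(-d2 / sigma[ptype] ** 2)
    return total

corners = place_triangle(3.0, 4.0, 5.0)  # a 3-4-5 triangle
types = ["Hp", "Ar", "PC"]
print(corners)
print(field("Hp", corners[0], corners, types, {"Hp": 0.6}))  # 1.0 at the Hp corner itself
```

A larger σT widens each Gaussian, so slightly displaced corners of the same type still overlap. This is the sense in which the parameter controls fuzziness.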
Control parameters for triplet enumeration & matching in two 2D-FPT versions.
Parameter | Description | FPT-1 | FPT-2
Emin | Minimal edge length of basis triangles (number of bonds between two pharmacophore types) | 2 | 4
Emax | Maximal edge length of basis triangles | 12 | 15
Estep | Edge length increment for enumeration of basis triangles | 2 | 2
e | Edge length excess parameter: in a molecule, triplets with an edge length > Emax+e are ignored | 0 | 2
Max. discrepancy | Maximal edge length discrepancy tolerated when attempting to overlay a molecular triplet on a basis triangle | 2 | 2
Fuzziness, Hp = Ar | Gaussian fuzziness parameter for apolar (Hydrophobic and Aromatic) types | 0.6 | 0.9
Fuzziness, PC = NC | Gaussian fuzziness parameter for charged (Positive and Negative Charge) types | 0.6 | 0.8
Fuzziness, HA = HD | Gaussian fuzziness parameter for polar (Hydrogen bond Donor and Acceptor) types | 0.6 | 0.7
λ | Aromatic-Hydrophobic interchangeability level | 0.6 | 0.5
Number of basis triplets | at the given setup | 4494 | 7155
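The enumeration and tolerance parameters in the table can be read as follows in code. This is a sketch under stated assumptions. A molecular triplet is compared edge by edge with basis triangles listed in the same corner order. Only the excess (e) and maximal-discrepancy rules from the table are applied; types and fuzziness are ignored. The class and function names are hypothetical. With Emin = 4 and Estep = 2, the largest enumerated FPT-2 basis edge is 14.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

@dataclass(frozen=True)
class FPTParams:
    emin: int    # minimal basis edge length (bonds)
    emax: int    # maximal basis edge length
    estep: int   # edge length increment
    excess: int  # e: molecular triplets with an edge > emax + e are ignored
    delta: int   # maximal edge length discrepancy tolerated in an overlay

FPT1 = FPTParams(emin=2, emax=12, estep=2, excess=0, delta=2)
FPT2 = FPTParams(emin=4, emax=15, estep=2, excess=2, delta=2)

def candidate_edges(mol_edges: Tuple[int, int, int], p: FPTParams) -> List[Tuple[int, int, int]]:
    """Basis edge-length triples (same corner order) onto which a molecular triplet may be overlaid."""
    if max(mol_edges) > p.emax + p.excess:
        return []  # triplet ignored altogether
    lengths = range(p.emin, p.emax + 1, p.estep)
    per_edge = [[length for length in lengths if abs(length - d) <= p.delta] for d in mol_edges]
    result = []
    for combo in product(*per_edge):
        a, b, c = sorted(combo)
        if c < a + b:  # basis triangles satisfy the triangle inequality
            result.append(combo)
    return result

print(candidate_edges((5, 5, 5), FPT1))
print(candidate_edges((13, 3, 12), FPT1))  # [] because 13 > 12 + 0
print(candidate_edges((16, 4, 14), FPT2))
```

The excess parameter lets FPT-2 still use a molecular edge of 16 bonds. It does so by mapping that edge onto the longest basis edge within the tolerated discrepancy.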
Second key improvement: protolytic equilibrium dependence of 2D-FPT
[Figure: triplets such as Ar5-NC5-PC8 and Ar8-NC8-PC8, contributed by microspecies present at 12% and 88%]
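One plausible way to make the fingerprint depend on the protolytic equilibrium is a population-weighted average. Each microspecies (as enumerated by pKaPlugin) contributes its triplet occupancies, weighted by its population at the chosen pH. The weighting scheme and the function name below are assumptions. The microspecies fingerprints are toy values with hypothetical triplet names; only the 12%/88% split echoes the figure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def population_weighted_fingerprint(
        microspecies: List[Tuple[float, Dict[str, float]]]) -> Dict[str, float]:
    """microspecies: (population, {basis triplet: occupancy}) pairs.
    Populations may be given in % or as fractions; they are normalized here."""
    total = sum(pop for pop, _ in microspecies)
    fingerprint: Dict[str, float] = defaultdict(float)
    for pop, occupancies in microspecies:
        for triplet, value in occupancies.items():
            fingerprint[triplet] += (pop / total) * value
    return dict(fingerprint)

# Toy example: two microspecies at 12% and 88% (triplet names are hypothetical)
species = [
    (12.0, {"T1": 1.0, "T2": 1.0}),
    (88.0, {"T2": 1.0}),
]
print(population_weighted_fingerprint(species))
```

A triplet present only in the minor microspecies ends up with a small occupancy instead of being switched fully on or off, as a rule-based flagging would do.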
Third key improvement: a novel similarity scoring scheme for 2D-FPT
• Classical Euclidean and Hamming distances increase whenever δk(m,M) = |Dk(M) − Dk(m)| > 0…
– pairs of small & simple molecules (m,m'), with Dk(m) = Dk(m') = 0 for almost all triplets k, have few non-zero contributions
– large & complex compounds (M,M') with common but slightly differently populated triplets, Dk(M) ≈ Dk(M'), have many small contributions that may nevertheless sum up to higher Euclidean scores!
• With correlation coefficients, the importance of common triplets, which contribute to the cross-product Dk(m)×Dk(M), may be overemphasized…
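The following toy example, with invented fingerprints, illustrates the first point numerically. Two small molecules that share no triplet end up closer than two large molecules sharing fifty triplets whose populations differ only slightly. This holds in both Euclidean and Hamming (city-block) terms.

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block distance, the real-valued analogue of the Hamming distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

N = 100  # number of basis triplets in this toy example
# Two small molecules, one triplet each, with nothing in common
m1 = [0.0] * N
m1[0] = 1.0
m2 = [0.0] * N
m2[1] = 1.0
# Two large molecules sharing 50 triplets, populated slightly differently
M1 = [3.0] * 50 + [0.0] * 50
M2 = [3.4] * 50 + [0.0] * 50

print(euclidean(m1, m2), euclidean(M1, M2))  # about 1.41 vs 2.83
print(hamming(m1, m2), hamming(M1, M2))      # 2.0 vs about 20.0
```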
Piecewise monitoring of the differences in the fingerprint…
• A triplet k may, with respect to a pair of molecules, be shared (++), null (--) or exclusive (+-)
– with fuzzy levels of association μk^c(M,m) to each category c ∈ {(++), (--), (+-)}, such that
$$\mu_k^{++}(M,m) + \mu_k^{+-}(M,m) + \mu_k^{--}(M,m) = 1$$
• Specifically, calculate for each category c:
– the fraction fc of triplets in that category,
– weighted, normalized partial Hamming distances Wc:
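$$f_c(M,m) = \frac{1}{N_T}\sum_{k=1}^{N_T} \mu_k^{c}(M,m)$$
$$W_c(m,M) = \frac{\sum_{k=1}^{N_T} W_k\,\mu_k^{c}(m,M)\,\lvert D_k(m) - D_k(M)\rvert}{\sum_{k=1}^{N_T} W_k}$$
where NT is the number of basis triplets, Wk is the weight of triplet k, and μk^c is the fuzzy association level of triplet k with category c.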
• The FPT-specific dissimilarity score FPT(M,m) is the linear combination of fractions and partial Hamming distances with optimal Neighborhood Behavior with respect to a subset of training data:
$$\mathrm{FPT}(m,M) = 0.1323\,W_{+-}(m,M) + 0.6357\,W_{++}(m,M) + 0.2795\,\left[1 - f_{++}(m,M)\right]$$
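The bookkeeping behind fc, Wc and the FPT score can be sketched as below. The slides do not specify how the fuzzy memberships are computed. This sketch therefore assumes a hypothetical "presence" degree p(D) = 1 − exp(−D) for each occupancy. It then combines the two molecules' presences as independent fuzzy events, which makes the three memberships sum to 1. Triplet weights Wk default to 1. Only the three fitted coefficients come from the slide.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

CATEGORIES = ("++", "--", "+-")

def presence(d: float) -> float:
    """Hypothetical fuzzy degree to which a triplet with occupancy d is present."""
    return 1.0 - math.exp(-d)

def memberships(dm: float, dM: float) -> Dict[str, float]:
    p, q = presence(dm), presence(dM)
    return {
        "++": p * q,                      # shared
        "--": (1 - p) * (1 - q),          # null in both molecules
        "+-": p * (1 - q) + (1 - p) * q,  # exclusive to one molecule
    }

def fpt_dissimilarity(Dm: Sequence[float], DM: Sequence[float],
                      weights: Optional[Sequence[float]] = None
                      ) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    n = len(Dm)
    if weights is None:
        weights = [1.0] * n
    f = {c: 0.0 for c in CATEGORIES}  # fractions of triplets per category
    W = {c: 0.0 for c in CATEGORIES}  # weighted partial Hamming distances
    for k in range(n):
        mu = memberships(Dm[k], DM[k])
        diff = abs(Dm[k] - DM[k])
        for c in CATEGORIES:
            f[c] += mu[c] / n
            W[c] += weights[k] * mu[c] * diff
    wsum = sum(weights)
    W = {c: value / wsum for c, value in W.items()}
    score = 0.1323 * W["+-"] + 0.6357 * W["++"] + 0.2795 * (1.0 - f["++"])
    return score, f, W

a = [2.0, 0.0, 1.0]
b = [0.0, 3.0, 0.0]
print(fpt_dissimilarity(a, a)[0], fpt_dissimilarity(a, b)[0])
```

For a molecule compared with itself, both partial Hamming terms vanish. The last term still leaves a small positive score unless every triplet is fully shared (f++ = 1).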
Neighborhood behavior: to what extent does structural similarity guarantee similar activities?
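Compound pairs (M,m) are classified according to their structural dissimilarity (cutoff s) and their BioPrint® activity profile difference (cutoff λ):
• dissimilarity ≤ s and activity difference ≤ λ: True Positives (TP)
• dissimilarity ≤ s and activity difference > λ: False Positives (FP)
• dissimilarity > s and activity difference ≤ λ: False (?) Negatives (FN)
• dissimilarity > s and activity difference > λ: True Negatives (TN)
[Formulas: Optimality, computed from the counts NFP(s) and NFN(s) relative to their random-pairing expectations NFP^rand(s) and NFN^rand(s); Consistency, evaluated at the optimal cutoff s_opt]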
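The 2×2 classification can be computed as sketched below. The optimality ratio is an assumption about the general form of the criterion: false positives plus k times false negatives, divided by the same quantity expected if structural dissimilarity were unrelated to activity. The weight `k`, the random-expectation formula and the function names are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[float, float]  # (structural dissimilarity, activity profile difference)

def neighborhood_counts(pairs: Iterable[Pair], s: float, lam: float) -> Dict[str, int]:
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for dissim, act_diff in pairs:
        if dissim <= s:  # structural neighbours
            counts["TP" if act_diff <= lam else "FP"] += 1
        else:
            counts["FN" if act_diff <= lam else "TN"] += 1
    return counts

def optimality(pairs: Iterable[Pair], s: float, lam: float, k: float = 1.0) -> float:
    """Hypothetical ratio: (N_FP + k N_FN) / (N_FP_rand + k N_FN_rand); 0 is ideal, about 1 is random."""
    pairs = list(pairs)
    n = len(pairs)
    c = neighborhood_counts(pairs, s, lam)
    n_close = c["TP"] + c["FP"]    # pairs below the structural cutoff
    n_similar = c["TP"] + c["FN"]  # pairs below the activity cutoff
    # counts expected if structural dissimilarity carried no information on activity
    fp_rand = n_close * (n - n_similar) / n
    fn_rand = (n - n_close) * n_similar / n
    denominator = fp_rand + k * fn_rand
    if denominator == 0:
        raise ValueError("degenerate split: no errors expected at random")
    return (c["FP"] + k * c["FN"]) / denominator

ideal = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.9), (0.9, 0.8)]
mixed = [(0.1, 0.1), (0.2, 0.9), (0.8, 0.2), (0.9, 0.8)]
print(neighborhood_counts(mixed, s=0.5, lam=0.5))
print(optimality(ideal, 0.5, 0.5), optimality(mixed, 0.5, 0.5))  # 0.0 1.0
```

Setting k below 1 would penalize false negatives less than false positives. That matches the intuition that structurally dissimilar compounds may legitimately share an activity.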
Specific metric significantly improves the Neighborhood Behavior of 2D-FPT (v1)
[Figure: Optimality versus Consistency for Sum of Heavy Atoms in Pair, Dice-N, Dice, Dice-W and FPT-1]
Consistency inversion of the specific FPT metric may be due to top ranking of complex pairs!
[Figure: Optimality versus Consistency for Dice and FPT-1]
Protolytic equilibrium dependence significantly improves the NB of 2D-FPT
[Figure: Optimality versus Consistency for 2D-FPT using a rule-based pharmacophore flagging strategy and for FPT-1]
Some ‘activity cliffs’ in rule-based descriptor space are smoothed out in 2D-FPT space
[Figure: example compound pairs with groups labeled Neutral, Cation or Anion, alongside fractional populations such as 90%, 70%, 50% and 40% Cation]
Neighborhood Behavior of 2D-FPT compares favorably to that of other descriptors/metrics
[Figure: Optimality versus Consistency for Sum of Heavy Atoms in Pair, CF, FBPA, PFR, PF, FPT-1 and FPT-2]
[Figures: % retrieved seed compounds versus selection size (0 to 200), for confirmed actives and confirmed inactives retrieved with PF, OPT3 and FPT-2; panels D2 and TK]
Successful QSAR model construction with 2D-FPT: predicting c-Met TK activity
[Figure: experimental versus calculated pIC50 (range 4 to 9) for learning set and validation set compounds]
• 25 variables entering the nonlinear model
• 153 molecules for training: RMSE = 0.4 (log units), R² = 0.82
• 40 molecules for validation: RMSE = 0.8 (log units), R² = 0.53
• 8 validation molecules out of 40 mispredicted by more than 1 log unit
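The statistics quoted above can be computed as sketched below: RMSE in log units, R², and the number of compounds mispredicted by more than 1 log unit. The slides do not say which R² definition was used. This sketch assumes the coefficient of determination 1 − SS_res/SS_tot. The data are toy values, not the c-Met set, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

def qsar_statistics(y_exp: Sequence[float], y_calc: Sequence[float],
                    threshold: float = 1.0) -> Tuple[float, float, int]:
    """Return (RMSE, R^2 as 1 - SS_res/SS_tot, number of |residual| > threshold)."""
    n = len(y_exp)
    residuals = [e - c for e, c in zip(y_exp, y_calc)]
    ss_res = sum(r * r for r in residuals)
    mean = sum(y_exp) / n
    ss_tot = sum((e - mean) ** 2 for e in y_exp)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    n_mispredicted = sum(1 for r in residuals if abs(r) > threshold)
    return rmse, r2, n_mispredicted

# Toy pIC50 values, invented for illustration
print(qsar_statistics([5.0, 6.0, 7.0, 8.0], [5.0, 6.0, 7.0, 6.5]))
```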
ChemAxon tools used for development…
• Software written in Java, based on the ChemAxon API:
– molecule input and standardization tools
– ShortestPath class used to calculate topological distances
– pKaPlugin used to enumerate all microspecies and their relative concentrations at a given pH value
– PMapper used to set pharmacophore flags in each microspecies, using a customized .xml setup file that relies on the formal charges actually present in the microspecies to set flags
– JChem used for 2D-FPT storage
– Marvin visualizer adapted to display actual occurrences of triplets in molecules
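Topological distance here means the number of bonds on the shortest path between two atoms. The sketch below computes it with a breadth-first search over a plain adjacency list. It is a stand-in that illustrates what the ShortestPath class is used for, not ChemAxon's API, and the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List

def topological_distances(adjacency: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Bonds on the shortest path from `source` to every reachable atom (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist

# Cyclohexane-like ring of six atoms: 0-1-2-3-4-5-0
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(topological_distances(ring, 0))  # {0: 0, 5: 1, 1: 1, 4: 2, 2: 2, 3: 3}
```

Because every bond counts as one step, breadth-first search visits atoms in order of increasing bond count. The first distance recorded for each atom is therefore the shortest one.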
In progress & on the wishlist…
• 3D FPT version under study
– does it pay off to generate conformers? How many would you need to get better results than with 2D-FPT? What’s the best conformational sampler to use?
• Accessibility-weighted fingerprints?
– class to return (topological and/or 3D) estimate of the solvent-accessible fraction of an atom?
• Tautomer-dependent fingerprints?
– if tautomers and their percentage were enumerated like any other microspecies…