Topological, pH-dependent Fuzzy Pharmacophore Triplet Fingerprints 2D-FPT Dragos Horvath, Fanny Bonachera UMR 8576 CNRS Université des Sciences & Technologies.

Mar 26, 2015

Page 1

Topological, pH-dependent Fuzzy Pharmacophore Triplet Fingerprints

2D-FPT

Dragos Horvath, Fanny Bonachera

UMR 8576 CNRS

Université des Sciences & Technologies de Lille

[email protected]@univ-lille1.fr

Page 2

Pharmacophore Patterns

• Ligand-site affinity ~ functional group complementarity
• Functional groups of similar physicochemical behavior represent pharmacophore types:
  – Hydrophobic, Aromatic, Hydrogen Bond (HB) donors, Cations, HB Acceptors, Anions.
• The pharmacophore pattern of a molecule characterizes the relative arrangement of all its pharmacophore types
  – What pharmacophore types are represented?
  – How are they arranged (spatially, topologically) with respect to each other?
  – How can these aspects be captured numerically to yield molecular descriptors of the pharmacophore pattern?

Page 3

Exploiting pharmacophore patterns…

• N-dimensional vector D(M)=[D1(M), D2(M), …, DN(M)]; each Di encodes an element of the pharmacophore pattern
  – Allows meaningful quantitative definitions of molecular similarity:
    • Neighborhood Behavior: similar molecules, characterized by covariant vectors, are likely to display similar biological properties
    • As chemists do not easily perceive the pharmacophore pattern, such covariance may reveal hidden but real molecular relatedness…
  – May serve as starting point for searching a binding pharmacophore: the subset of features that really participate in binding to a receptor
    • Machine learning to select those elements Di that are systematically present in actives, but not in inactives, of a molecular learning set!

Page 4

Tricentric Pharmacophore Fingerprints: monitoring feature arrangement

• Topological: the distance between two features equals the (minimal) number of chemical bonds between them

[Figure: example molecule (atoms N, N, O, N, Cl) with the topological distances between its pharmacophore features annotated.]

• Spatial: if stable conformers are known, use the distance in Å between two features
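
The topological distance defined on this slide is a shortest-path length in the molecular graph. The sketch below is only an illustration under stated assumptions: the molecule is given as a plain list of bonded atom-index pairs, and the function name `topological_distance` and the toy graph are hypothetical. The slides state that the actual software uses the ChemAxon ShortestPath class; this example swaps in a plain breadth-first search instead.

```python
from collections import deque


def topological_distance(bonds, start, goal):
    """Return the minimal number of bonds between two atoms, or None if unconnected.

    bonds: iterable of (atom_index, atom_index) pairs (hypothetical input format).
    """
    # Build an undirected adjacency map from the bond list.
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    if start == goal:
        return 0

    # Breadth-first search: the first time the goal is reached, the path is minimal.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        for nxt in neighbors.get(atom, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy graph (not from the slides): a six-membered ring (atoms 0-5)
# carrying a two-atom side chain (atoms 6 and 7) on atom 0.
RING_WITH_CHAIN = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]

if __name__ == "__main__":
    print(topological_distance(RING_WITH_CHAIN, 3, 7))
```

Because the search runs on the bond graph alone, no conformer is needed, which is the practical difference from the spatial variant.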

Page 5

Example: Binary Pharmacophore Triplets

Basis triplets:
• all possible feature combinations
• at a given series of distances…

Basis triplets (fingerprint columns): Hp3-Hp3-Hp3, Hp3-Hp3-Hp4, Hp3-Hp3-Hp5, …, Ar4-Hp3-Hp4, Ar4-Hp3-Hp5, …, Hp7-Ar4-PC6, …, Hp3-HA5-Ar5, Hp4-HA5-Ar5, …

Binary fingerprint: 0 0 0 … 0 0 … … 1 … … … 0 … … 0 …

[Figure: example molecule with annotated triplets (edge lengths such as 4-6-7, 5-5-3 and 5-5-4) mapped onto the binary fingerprint; one mapping is marked "??" on the slide.]

Pickett, Mason & McLay, J. Chem. Inf. Comp. Sci. 36:1214-1223 (1996)

Page 6

First key improvement: Fuzzy mapping of atom triplets onto basis triplets in 2D-FPT

Basis triplets (fingerprint columns): Hp3-Hp3-Hp3, Hp3-Hp3-Hp4, Hp3-Hp3-Hp5, …, Ar4-Hp3-Hp4, Ar4-Hp3-Hp5, …, Hp7-Ar4-PC6, …, Hp3-HA5-Ar5, Hp4-HA5-Ar5, …

Fuzzy fingerprint: 0 0 0 … 0 0 … +6 … … +3 … … … … 0 …

[Figure: example molecule with annotated triplets (edge lengths such as 4-6-7, 5-5-3 and 5-5-4); molecular triplets contribute graded increments (such as +6 and +3) to basis triplets.]

Di(m) = total occupancy of basis triplet i in molecule m.

Page 7

Combinatorial enumeration of basis triplets

• Example: there are 36796 basis triplets, verifying triangle inequalities, when considering 6 pharmacophore types and 11 edge lengths between Emin=3 and Emax=13 with an increment of Estep=1: (3, 4, 5, …, 13)
  – Canonical representation: T1d23-T2d13-T3d12 with T3≥T2≥T1 (alphabetically).
    [Figure: triangle with edges 4, 6, 7, labeled Hp7-Ar4-PC6 and Ar4-Hp7-PC6.]
  – Out of two corners of a same type, priority is given to the one opposed to the shorter edge.
    [Figure: triangle with edges 4, 6, 7, labeled Ar4-Hp7-Hp6 and Ar5-Hp6-Hp7.]
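
The enumeration above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that each corner is described by its pharmacophore type and the length of the opposite edge, that the canonical form is simply the sorted list of those (type, edge) pairs, and that the triangle inequalities are strict. The function names are hypothetical. Under these assumptions the sketch reproduces the 36796 triplets quoted on this slide, and the 4494 triplets listed for FPT-1 in the parameter table (edges 2 to 12 in steps of 2); it has not been checked against other setups.

```python
from itertools import combinations_with_replacement

# The six pharmacophore type codes used in the slides, in alphabetical order.
PHARMACOPHORE_TYPES = ("Ar", "HA", "HD", "Hp", "NC", "PC")


def enumerate_basis_triplets(e_min, e_max, e_step, types=PHARMACOPHORE_TYPES):
    """List canonical basis triplets as tuples of three (type, opposite_edge) corners."""
    edges = range(e_min, e_max + 1, e_step)
    # Corners sorted by type, then by opposite edge length: this ordering is
    # the assumed canonical priority (alphabetical type, shorter edge first).
    corners = [(t, d) for t in sorted(types) for d in edges]
    basis = []
    # Multisets of three corners, each emitted once in sorted order.
    for triplet in combinations_with_replacement(corners, 3):
        a, b, c = (d for _, d in triplet)
        # Assumption: strict triangle inequalities on the three edge lengths.
        if a < b + c and b < a + c and c < a + b:
            basis.append(triplet)
    return basis


def label(triplet):
    """Format a triplet in the slide notation, e.g. Ar4-Hp7-PC6."""
    return "-".join(f"{t}{d}" for t, d in triplet)


if __name__ == "__main__":
    basis = enumerate_basis_triplets(3, 13, 1)
    print(len(basis), label(basis[0]))
```

Sorting the (type, opposite edge) pairs is what makes each triangle appear exactly once, regardless of the order in which its corners are visited in a molecule.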

Page 8

Triplet matching procedure

• The triplet matching score represents the optimal degree of pharmacophore field overlap:
  – if corner k of the triplet is of pharmacophore type T, e.g. F(k,T)=1, then it contributes to the total pharmacophore field of type T, observed at a point P of the plane:

$$\Phi_T(P) = \sum_{k=1}^{3} F(k,T)\,\exp\left(-\rho_T\, d_{k,P}^{2}\right)$$

where d_{k,P} is the distance from corner k to point P and ρ_T is the Gaussian fuzziness parameter of type T.

Horvath, D. ComPharm pp. 395-439; in "QSPR /QSAR Studies by Molecular Descriptors", Diudea, M., Editor, Nova Science Publishers, Inc., New York, 2001
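
As an illustration of the field contribution described on this slide, the sketch below places a triangle in the plane from its three edge lengths and sums one Gaussian per corner of the requested type. It assumes a Gaussian of the form exp(-rho * d^2), which is this sketch's reading of the slide's formula rather than a confirmed definition; the function names and the parameter name `rho` are hypothetical, and the search for the optimal overlay of two triangles is not covered.

```python
import math


def place_triangle(d23, d13, d12):
    """Plane coordinates of corners 1, 2, 3; d_ij is the edge between corners i and j."""
    # Corner 1 at the origin, corner 2 on the x axis, corner 3 from the law of cosines.
    x3 = (d13 ** 2 + d12 ** 2 - d23 ** 2) / (2.0 * d12)
    y3 = math.sqrt(max(d13 ** 2 - x3 ** 2, 0.0))
    return [(0.0, 0.0), (float(d12), 0.0), (x3, y3)]


def field(point, corners, corner_types, ptype, rho):
    """Field of pharmacophore type `ptype` observed at `point`.

    Assumption: each corner of that type (F(k, T) = 1) contributes
    exp(-rho * d**2), d being its distance to the observation point.
    """
    total = 0.0
    for (x, y), corner_type in zip(corners, corner_types):
        if corner_type == ptype:
            d2 = (point[0] - x) ** 2 + (point[1] - y) ** 2
            total += math.exp(-rho * d2)
    return total


if __name__ == "__main__":
    corners = place_triangle(5, 4, 3)
    print(field((0.0, 0.0), corners, ("Hp", "Hp", "PC"), "Hp", 0.6))
```

A larger `rho` makes each corner's field decay faster, so that only nearly coincident corners of two overlaid triangles would overlap appreciably.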

Page 9

Control parameters for triplet enumeration & matching in two 2D-FPT versions.

Parameter | Description | FPT-1 | FPT-2
Emin | Minimal edge length of basis triangles (number of bonds between two pharmacophore types) | 2 | 4
Emax | Maximal edge length of basis triangles | 12 | 15
Estep | Edge length increment for enumeration of basis triangles | 2 | 2
e | Edge length excess parameter: in a molecule, triplets with edge length > Emax+e are ignored | 0 | 2
Δ | Maximal edge length discrepancy tolerated when attempting to overlay a molecular triplet atop of a basis triangle | 2 | 2
ρ(Hp) = ρ(Ar) | Gaussian fuzziness parameter for apolar (Hydrophobic and Aromatic) types | 0.6 | 0.9
ρ(PC) = ρ(NC) | Gaussian fuzziness parameter for charged (Positive and Negative Charge) types | 0.6 | 0.8
ρ(HA) = ρ(HD) | Gaussian fuzziness parameter for polar (Hydrogen bond Donor and Acceptor) types | 0.6 | 0.7
l | Aromatic-Hydrophobic interchangeability level | 0.6 | 0.5
(result) | Number of basis triplets at given setup | 4494 | 7155

Page 10

Second key improvement: Proteolytic equilibrium dependence of 2D-FPT

[Figure: an example molecule with the triplets Ar5-NC5-PC8 and Ar8-NC8-PC8 highlighted; populations of 12% and 88% are indicated, and a "?" marks the 12% case.]
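
One plausible way to make a fingerprint depend on the ionization equilibrium is to average the fingerprints of the individual microspecies, weighted by their relative populations at the chosen pH. The sketch below shows only that averaging step and is an assumption of this example, not a confirmed description of 2D-FPT; the function name, the input format and the triplet labels in the usage lines are hypothetical. Enumerating microspecies and their populations (done with the ChemAxon pKaPlugin according to the slides) is outside its scope.

```python
def population_weighted_fingerprint(microspecies):
    """Average sparse fingerprints over microspecies.

    microspecies: list of (population, fingerprint) pairs, where a fingerprint
    is a dict mapping a basis triplet label to its occupancy (hypothetical format).
    Populations are renormalized so that they sum to 1.
    """
    total = sum(population for population, _ in microspecies)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")

    averaged = {}
    for population, fingerprint in microspecies:
        weight = population / total
        for triplet, occupancy in fingerprint.items():
            averaged[triplet] = averaged.get(triplet, 0.0) + weight * occupancy
    return averaged


if __name__ == "__main__":
    # Toy input: a 12% / 88% population split with made-up triplet labels.
    species = [
        (0.12, {"triplet_a": 1.0, "triplet_shared": 2.0}),
        (0.88, {"triplet_shared": 2.0}),
    ]
    print(population_weighted_fingerprint(species))
```

With this kind of weighting, a triplet present only in a minor microspecies still contributes, but in proportion to how often that species occurs.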

Page 11

Third key improvement: a novel similarity scoring scheme for 2D-FPT

• Classical Euclidean and Hamming distances increase whenever |Dk(M)-Dk(m)| > 0…
  – pairs of small & simple molecules (m,m'), with Dk(m)=Dk(m')=0 for almost all the triplets k, have few non-zero contributions
  – large & complex compounds (M,M') with common, but slightly differently populated triplets (Dk(M) close to, but not equal to, Dk(M')) have many small contributions that may nevertheless sum up to higher Euclidean scores!
• With correlation coefficients, the importance of common triplets, contributing to the cross-product Dk(m)×Dk(M), may be overemphasized…

Page 12

Piecewise monitoring of the differences in the fingerprint…

• A triplet k may, with respect to a pair of molecules, be shared (++), null (--) or exclusive (+-)
  – fuzzy levels of association τ to each category c={(++),(--),(+-)} such that τ++(M,m) + τ+-(M,m) + τ--(M,m) = 1
• Specifically calculate, for each category c:
  – fractions of triplets fc in that category,
  – weighted, normed partial Hamming distances Wc:

$$f_c(M,m) = \frac{1}{N_T} \sum_{k=1}^{N_T} \tau_k^{c}(M,m)$$

$$W_c(m,M) = \frac{\sum_{k=1}^{N_T} W_k\, \tau_k^{c}(m,M)\, \left|D_k(m) - D_k(M)\right|}{\sum_{k=1}^{N_T} W_k}$$

• The FPT-specific dissimilarity score Σ_FPT(M,m):
  – the linear combination of fractions and partial Hamming distances with optimal Neighborhood Behavior with respect to a subset of training data

$$\Sigma_{FPT}(m,M) = 0.1323\, W_{+-}(m,M) + 0.6357\, W_{++}(m,M) + 0.2795\, \left[1 - f_{++}(m,M)\right]$$
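
The piecewise scoring idea can be illustrated with a simplified, non-fuzzy sketch. Everything beyond the three coefficients is an assumption of this example: each triplet is assigned crisply to one category (shared if populated in both molecules, exclusive if populated in one, null otherwise) instead of using fuzzy association levels, all triplet weights default to 1, the partial Hamming distances are read as weighted sums of absolute occupancy differences divided by the total weight, and the terms are combined as 0.1323·W(+-) + 0.6357·W(++) + 0.2795·(1 - f(++)). The function name is hypothetical.

```python
def fpt_dissimilarity(fp_a, fp_b, weights=None):
    """Crisp-category sketch of a piecewise fingerprint dissimilarity.

    fp_a, fp_b: equal-length sequences of non-negative triplet occupancies.
    weights: optional per-triplet weights (assumed to default to 1.0).
    """
    n = len(fp_a)
    if n == 0 or n != len(fp_b):
        raise ValueError("fingerprints must be non-empty and of equal length")
    if weights is None:
        weights = [1.0] * n
    total_weight = sum(weights)

    shared_count = 0
    w_shared = 0.0      # partial Hamming distance over shared (++) triplets
    w_exclusive = 0.0   # partial Hamming distance over exclusive (+-) triplets
    for a, b, w in zip(fp_a, fp_b, weights):
        diff = abs(a - b)
        if a > 0 and b > 0:
            shared_count += 1
            w_shared += w * diff
        elif a > 0 or b > 0:
            w_exclusive += w * diff
        # Null (--) triplets have zero difference and contribute nothing.

    f_shared = shared_count / n
    return (0.1323 * w_exclusive / total_weight
            + 0.6357 * w_shared / total_weight
            + 0.2795 * (1.0 - f_shared))


if __name__ == "__main__":
    print(fpt_dissimilarity([1.0, 0.0], [0.0, 0.0]))
```

Separating shared from exclusive triplets lets the two kinds of difference carry different weights, which a single Euclidean or Hamming sum cannot do.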

Page 13

Neighborhood behavior: in how far does structural similarity guarantee similar activities?

[Figure: pairs of molecules (M,m) classified by structural dissimilarity Σ(M,m), against a threshold s, and by BioPrint® activity profile difference Λ(m,M), against a threshold l.]

               | Λ(M,m) ≤ l               | Λ(M,m) > l
Σ(M,m) ≤ s     | True Positives (TP)      | False Positives (FP)
Σ(M,m) > s     | False (?) Negatives (FN) | True Negatives (TN)

$$\Omega(s) = \frac{N_{FP}(s) + N_{FN}(s)}{N_{FP}^{rand}(s) + N_{FN}^{rand}(s)}$$

[Equation: a second criterion involving Σ(M,m), Λ(M,m), s and an "opt" subscript appears on the slide; its exact form is not recoverable from the transcript.]

Page 14

Specific metric significantly improves the Neighborhood Behavior of 2D-FPT (v1)

[Figure: Optimality (y axis, 0.8 to 1) versus Consistency (x axis, 0.4 to 0.75). Series: Sum of Heavy Atoms in Pair, Dice-N, Dice, Dice-W, FPT-1.]

Page 15

Consistency inversion of specific FPT metric may be due to top ranking of complex pairs!

[Figure: Optimality (y axis, 0.86 to 1) versus Consistency (x axis, 0.55 to 0.73). Series: Dice, FPT-1.]

Page 16

Proteolytic equilibrium dependence significantly improves the NB of 2D-FPT

[Figure: Optimality (y axis, 0.84 to 1) versus Consistency (x axis, 0.4 to 0.75). Series: 2D-FPT using rule-based pharmacophore flagging strategy, FPT-1.]

Page 17

Some ‘activity cliffs’ in rule-based descriptor space are smoothed out in 2D-FPT-space

[Figure: example molecules annotated with charge states. Labels, in transcript order: Neutral, Cation, Neutral, Anion, Neutral, 90% Cation, Neutral, 50% Cation, Neutral, Anion, Neutral, Neutral, Neutral, 40% Cation, Neutral, 70% Cation.]

Page 18

Neighborhood Behavior of 2D-FPT compares favorably to the one of other descriptors/metrics

[Figure: Optimality (y axis, 0.8 to 1) versus Consistency (x axis, 0.4 to 0.8). Series: Sum of Heavy Atoms in Pair, CF, FBPA, PFR, PF, FPT-1, FPT-2.]

Page 19

Successful Virtual Screening Simulations

[Figure: two panels, labeled D2 and TK, plotting % Retrieved Seed Compounds (y axis) versus Selection Size (x axis, 0 to 200). Series: Confirmed Actives (PF), Confirmed Inactives (PF), Confirmed Actives (FPT-2), Confirmed Inactives (FPT-2); some legend copies read "OPT3" instead of "FPT-2".]

Page 20

Successful QSAR model construction with 2D-FPT: predicting c-Met TK activity

[Figure: Experimental pIC50 (y axis, 4 to 9) versus Calculated pIC50 (x axis, 4 to 9). Series: Learning Set Compounds, Validation Set Compounds.]

• 25 variables entering nonlinear model
• 153 molecules for training: RMSE=0.4 (log units), R2=0.82
• 40 molecules for validation: RMSE=0.8 (log units), R2=0.53
• 8 validation molecules out of 40 mispredicted by more than 1 log
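
For readers who want to reproduce this kind of summary on their own predictions, the statistics quoted above can be computed as follows. This is a generic sketch under stated assumptions: R2 is taken here as the coefficient of determination (1 minus residual over total sum of squares), which may differ from the definition used by the authors (for example a squared correlation coefficient), and the function names and toy values are hypothetical rather than data from the slides.

```python
import math


def rmse(experimental, calculated):
    """Root-mean-square error between experimental and calculated values."""
    squared = [(e - c) ** 2 for e, c in zip(experimental, calculated)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(experimental, calculated):
    """Coefficient of determination: 1 - SS_residual / SS_total (assumed definition)."""
    mean = sum(experimental) / len(experimental)
    ss_total = sum((e - mean) ** 2 for e in experimental)
    ss_residual = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    return 1.0 - ss_residual / ss_total


def count_mispredicted(experimental, calculated, threshold=1.0):
    """Number of compounds whose absolute error exceeds `threshold` log units."""
    return sum(1 for e, c in zip(experimental, calculated) if abs(e - c) > threshold)


if __name__ == "__main__":
    # Toy pIC50 values, for illustration only.
    exp_values = [5.0, 6.0, 7.0, 8.0]
    calc_values = [5.0, 6.0, 7.0, 9.5]
    print(rmse(exp_values, calc_values),
          r_squared(exp_values, calc_values),
          count_mispredicted(exp_values, calc_values))
```

Reporting these figures separately for the learning and validation sets, as the slide does, is what exposes the drop in predictive quality on unseen compounds.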

Page 21

ChemAxon Tools used for development…

• Software written in Java, based on the ChemAxon API:

– molecule input and standardization tools

– ShortestPath class used to calculate topological distances

– pKaPlugin used to enumerate all microspecies and their relative concentrations at given pH value

– PMapper used to set pharmacophore flag in each microspecies – using a customized .xml setup file that relies on the actual formal charges seen in the microspecies to set flags

– JChem used for 2D-FPT storage

– Marvin visualizer adapted to display actual occurrences of triplets in molecules

Page 22

In progress & on the wishlist…

• 3D FPT version under study

– does it pay off to generate conformers? How many would you need to get better results than with 2D-FPT? What’s the best conformational sampler to use?

• Accessibility-weighted fingerprints?

– class to return (topological and/or 3D) estimate of the solvent-accessible fraction of an atom?

• Tautomer-dependent fingerprints?

– if tautomers and their percentage were enumerated like any other microspecies…