Introduction Shape Signatures 1 is a method for compactly encoding

Jan 03, 2016


sopoline-abbott


Introduction

Shape Signatures [1] is a method for compactly encoding the shapes of molecules (or receptor sites), and also their electrostatic properties. The technique uses ray-tracing to explore the volume enclosed by a molecular surface. We begin with the triangulated solvent-accessible molecular surface [2], pick a point on the surface at random to initiate the ray, and then allow it to propagate by the laws of optical reflection. Fig. 1 illustrates ray propagation, and Fig. 2 shows a ray-trace for a small molecule, in this case a protease inhibitor.
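
As a rough illustration of the reflection step, the sketch below applies the standard law of specular reflection, r = d - 2(d·n)n, to a ray direction d and a surface normal n. The function name and the sample vectors are hypothetical and chosen only for illustration; the original text does not say how the authors' code computes reflections.

```python
import math

def reflect(direction, normal):
    """Reflect a ray direction about a surface normal: r = d - 2 (d . n) n."""
    # Normalize the normal so the formula holds for any non-zero input vector.
    norm = math.sqrt(sum(c * c for c in normal))
    n = [c / norm for c in normal]
    d_dot_n = sum(d * c for d, c in zip(direction, n))
    return [d - 2.0 * d_dot_n * c for d, c in zip(direction, n)]

# Hypothetical example: a ray heading down and to the right meets a surface
# element whose normal points straight up.
incoming = [1.0, -1.0, 0.0]
outgoing = reflect(incoming, [0.0, 2.0, 0.0])
```

The formula gives the same result whichever way the normal points, so the sketch does not need to know whether a triangle's normal faces into or out of the molecular volume.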

The “raw data” of a ray-trace is a collection of line segments connecting reflection points. From this information, we derive probability distributions, expressed as histograms, which encode information about the shape of the molecule. The simplest of these shape signatures is just the distribution of observed segment lengths. Fig. 3 shows this 1D signature for the protease inhibitor. ( “1D” indicates a one-dimensional domain for the histogram, here the segment length.)

Signatures converge rapidly as the number of segments increases, and are independent of the starting point for the ray-trace.
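
A minimal sketch of how a 1D signature could be built from the segment lengths of a ray-trace is given below. The bin width, the maximum length, and the sample lengths are assumptions made for illustration; they are not taken from the original work.

```python
def length_signature(segment_lengths, bin_width=0.5, max_length=20.0):
    """Return a normalized histogram (bin probabilities) of segment lengths."""
    if not segment_lengths:
        raise ValueError("need at least one segment")
    n_bins = int(round(max_length / bin_width))
    counts = [0] * n_bins
    for length in segment_lengths:
        # Segments longer than max_length are counted in the last bin.
        index = min(int(length / bin_width), n_bins - 1)
        counts[index] += 1
    total = len(segment_lengths)
    return [count / total for count in counts]

# Made-up segment lengths (in Å) standing in for a real ray-trace.
lengths = [0.4, 1.2, 1.3, 2.8, 1.1, 0.9, 3.6, 1.4]
signature = length_signature(lengths, bin_width=1.0, max_length=5.0)
```

Because the histogram is normalized by the number of segments, signatures from ray-traces of different lengths can be compared directly.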

Fig. 1. Reflection from a single surface element. The normal vector to the surface bisects the reflection angle. (Labels in the figure: θ, l1, l2.)

Fig. 2. Ray-traces for the HIV protease inhibitor indinavir, with 100 and 10,000 reflections. As the number of reflections increases, the volume of the molecule is densely filled with rays.

Fig. 3. 1D shape signature for indinavir. (Plot title: "Indinavir length signature (100000)"; horizontal axis ticks 0 to 20, vertical axis ticks 0 to 10000.)

Fig. 4. Comparing signatures for two compounds. Here two small compounds with similar shape but very different connectivity (sulfurous acid and aziridine) are compared; D = 0.082.


By computing the molecular electrostatic potential (MEP) at each reflection point, we can extend the signatures to include information about both shape and polarity. Fig. 5 shows a 2D-MEP signature for the protease inhibitor. Here the vertical axis represents the MEP measured at a reflection point, the horizontal axis represents the sum of the segment lengths on either side of that reflection, and the color-coded distribution is the probability of simultaneously observing given values of these two parameters at a reflection point.
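
The 2D-MEP signature described above can be sketched as a joint histogram over two quantities per reflection point: the summed length of the two adjoining segments, and the MEP value. The bin sizes are assumptions, the MEP range of -0.3 to 0.2 is read off the axis of Fig. 5, and the sample data are made up.

```python
import math

def mep_signature(reflections, length_bin=1.0, max_length=20.0,
                  mep_bin=0.05, mep_min=-0.3, mep_max=0.2):
    """Joint histogram over (l1 + l2, MEP) at each reflection point.

    reflections: iterable of (l1, l2, mep) tuples, where l1 and l2 are the
    segment lengths on either side of the reflection point.
    Returns a dict mapping (length_index, mep_index) -> probability.
    """
    n_len = int(round(max_length / length_bin))
    n_mep = int(round((mep_max - mep_min) / mep_bin))
    counts = {}
    total = 0
    for l1, l2, mep in reflections:
        # Out-of-range values are clamped into the first or last bin.
        i = min(max(math.floor((l1 + l2) / length_bin), 0), n_len - 1)
        j = min(max(math.floor((mep - mep_min) / mep_bin), 0), n_mep - 1)
        counts[(i, j)] = counts.get((i, j), 0) + 1
        total += 1
    if total == 0:
        raise ValueError("need at least one reflection")
    return {key: count / total for key, count in counts.items()}

# Made-up reflections: (segment 1 length, segment 2 length, MEP).
signature_2d = mep_signature([
    (1.0, 2.5, -0.12),
    (2.0, 1.4, -0.12),
    (0.5, 0.7, 0.07),
    (3.0, 0.6, -0.11),
])
```

A sparse dictionary is used here because most (length, MEP) bins stay empty for a small molecule; a dense 2D array would work equally well.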

Fig. 5. (a) MEP is computed at the reflection point between segments 1 and 2. (Labels in the figure: molecular surface, segment 1, segment 2, reflection point, atom.) (b) 2D-MEP signature for the HIV protease inhibitor indinavir. (Axes: length in Å and MEP, with MEP ticks from -0.3 to 0.2; color legend bins from 0-10 to 140-150.)


Database Searching

Shape Signatures are compared using simple metrics. One approach, shown in Fig. 4, is simply to take the sum of the (absolute) differences between histogram heights, measured at corresponding “bins” in the domain. The smaller this distance between two signatures, the greater the similarity we expect to observe in the molecules the signatures represent.
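
The bin-by-bin comparison can be sketched as follows. The absolute value is an assumption on our part: for two normalized histograms the signed differences always sum to zero, so a usable distance has to take magnitudes. The function name and the sample histograms are hypothetical.

```python
def signature_distance(sig_a, sig_b):
    """Sum of absolute differences between corresponding histogram bins."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same binning")
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

# Two made-up three-bin signatures.
d = signature_distance([0.25, 0.5, 0.25], [0.5, 0.25, 0.25])
```

Identical signatures give a distance of 0, and two normalized signatures with no overlapping bins give the maximum value of 2.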

Our strategy is to “augment” a chemical database with shape signatures, and to then screen the database for compounds of interest by comparing the signature of a query compound against all the molecules in the library. While generating the signatures involves a significant computational expense, comparing them is computationally trivial, and the generation step need be done only once for a given database!
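
The screening strategy amounts to storing one precomputed signature per compound and ranking the library by distance to the query. The sketch below assumes the signatures are already available in a dictionary; the compound names, the signatures, and the `screen` function are all hypothetical.

```python
def l1_distance(sig_a, sig_b):
    """Sum of absolute bin-by-bin differences between two signatures."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

def screen(query_signature, library, top_n=50):
    """Rank library compounds by increasing distance to the query signature.

    library: dict mapping compound name -> precomputed signature.
    Returns a list of (distance, name) pairs, best hit first.
    """
    scored = [(l1_distance(query_signature, sig), name)
              for name, sig in library.items()]
    scored.sort()
    return scored[:top_n]

# Made-up library of precomputed three-bin signatures.
library = {
    "compound_A": [0.5, 0.25, 0.25],
    "compound_B": [0.25, 0.5, 0.25],
    "compound_C": [0.0, 0.25, 0.75],
}
hits = screen([0.25, 0.5, 0.25], library, top_n=2)
```

Each query costs one pass over the stored signatures, which is why the expensive ray-tracing step only has to be run once per database.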

As an initial test, we compared each compound in the Tripos [3] small-molecule database against all the other compounds in the database using the simple 1D signatures. This database is chemically diverse, containing amino acids, carbohydrates, heterocycles, fatty acids, etc. Fig. 6 illustrates the power of the method in discriminating among molecules on the basis of shape.

Fig. 6. Comparison of Tripos small molecule database against itself (selected results). Each query name is followed by its hit names; the query and hit structures and signature plots shown in the figure are not reproduced here.

Query: 1,2,3,4-tetrahydroisoquinoline
Hits: chroman; 1,2,3,4-tetrahydroquinoline; isochroman; 1,2,3,4-tetrahydronaphthalene; isochromene; indan; dihydrophenanthrene; 2,3-dihydrobenzofuran; 1-benzothiophene

Query: 1,3-cycloheptadiene
Hits: 1,8-dihydroazocine; cyclohexene; tetrahydroazepine; chroman; 1,2,3,4-tetrahydroquinoline; cycloheptene; 3-Phenylglycidol(2S,3S)_arco; 1,3-cyclohexadiene; cyclohexane_(boat)

Query: 5H-dibenz[b,f]azepin
Hits: 5a-gonan-3-one; 5(6)-gonen-17-one; 5a-gonane; 5a-gonane-3,17-dione; 4-gonene; 4-gonen-3-one; trans_decalin; vitamin_D3; 1,4-gonadien-3-one

Query: Laurate(C12)
Hits: Myristate(C14); Palmitate(C16); Stearate(C18); Palmitoleate(C16); Oleate(C18); Arachidate(C20); Lysine; Lignocerate(C24); Methionine

Query: Alpha-D-glucopyranose
Hits: Beta-D-galactopyranose; beta-D-mannopyranose; alpha-D-mannopyranose; alpha-D-galactopyranose; 3-Phenylglycidol(2R,3R)_arco; beta-D-glucopyranose

Query: Cyclohexane(boat)
Hits: Cyclohexane(chair); Piperidine; Cyclohexene; Tetrahydroazepine; Cycloheptene; Hexahydroazepine; Thiomorpholine; Cyclohexane(twisted_boat); tetrahydrothiopyran


Application to Estrogenic Compounds

It is now widely recognized that chemical compounds that can mimic the biological effects of sex hormones can pose significant hazards to the health of both humans and wildlife [4]. These endocrine disruptors include as a subset estrogenic compounds, which can interact with estrogen receptors. There is increasing interest in the problem of quickly screening large chemical libraries for potential estrogen mimics [5]. Ideally, it would be possible to rapidly and accurately scan a given database for molecules with a high probability of acting as estrogen mimics; once identified, candidate compounds could be subject to further scrutiny, including assays of biological activity.

A special problem is posed by the character of the estrogen receptor - it is promiscuous, interacting with compounds that feature no obvious structural similarity. It is our hypothesis that shape (and electrostatics) are better descriptors of estrogenicity than chemical structure. The shape signatures method is well-suited to searching chemical databases directly on the basis of shape and polarity.


Here we use shape signatures in “ligand-based” mode, scanning a large database for compounds similar in shape to known endocrine disruptors.

Our target is a large subset (115,000 compounds) of the NCI Database [6]. Coordinates for the compounds in the NCI library are supplied by Tripos, Inc. as part of the UNITY chemical database package. Shape signatures information has been generated for all compounds in NCI with molecular weight less than 800. Computations were carried out using the Beowulf cluster at the West Center for Computer-Aided Drug Discovery at the University of the Sciences in Philadelphia.

Queries

Our queries are four compounds known to be endocrine disruptors: 17β-estradiol, coumestrol, DES, and tamoxifen. A selection of the top hits for the 1D signature searches is presented in Figs. 7-10. (Hits were ranked on the basis of distance between target and query signatures.)

Fig. 7. Selected hits (from top 50), 17β-estradiol as query, using 1D signatures. Each hit is listed by rank with the identifier shown in the figure; the structure drawings are not reproduced here.

Hit #1: 59452-14-1
Hit #4 (QUERY!): 50-28-2
Hit #7: 16205-32-6
Hit #11: 21513-89-3
Hit #15: 547-81-9
Hit #24: 82571-86-6
Hit #35: 1740-19-8
Hit #36: 2686-05-7

Fig. 8. Selected hits (from top 50), coumestrol as query, using 1D signatures. Each hit is listed by rank with the identifier shown in the figure; the structure drawings are not reproduced here.

Hit #1: 6316-25-2
Hit #2 (QUERY!): 479-13-0
Hit #5: 520-28-5
Hit #7: 29980-70-9
Hit #12: 73460-18-3
Hit #15: 23774-13-2
Hit #24: 14191-22-1
Hit #31: 67199-66-0

Fig. 9. Selected hits (from top 50), DES as query, using 1D signatures. Each hit is listed by rank with the identifier shown in the figure; the structure drawings are not reproduced here.

Hit #1: 5465-75-8
Hit #3: 3092-20-4
Hit #6: 83456-29-5
Hit #10: 21323-24-0
Hit #12: 6321-89-7
Hit #15: 5455-89-0
Hit #17: 6960-48-1
Hit #24: 2878-63-9

Fig. 10. Selected hits (from top 50), tamoxifen as query, using 1D signatures. Each hit is listed by rank with the identifier shown in the figure; the structure drawings are not reproduced here.

Hit #1: 341-69-5
Hit #2: 19142-68-8
Hit #4: 65321-78-0
Hit #5: 66421-87-2
Hit #7: 85727-12-4
Hit #14: 6748-91-0
Hit #19: 316-07-4
Hit #42: 3733-63-9


Comments on 1D search:

• If the query compound is present in the database, it is ranked close to the top, although it may not be the #1 hit. (Incomplete convergence of histograms? Sensitivity to conformation?)

• Compounds larger than the query, but which share a common motif, can be selected (Fig. 7/Hit #24), as can “rearrangements” of the query (Fig. 8/Hit #1).

• DES is present in the NCI database, but is NOT selected by the query. The structure in the Tripos-supplied version of NCI is incorrect: the phenol groups are in a cis arrangement about the central double bond, where they should be trans!

• Compounds are selected which are shape-similar to the query, but have distinctly different connectivity (Fig. 7/Hit #36).

• Shape signatures can effectively identify compounds on the basis of shape.


2D-MEP Searching

Searches were carried out against the NCI database using 2D-MEP signatures, with the same set of query compounds.

From our initial 1D search, it was clear that a hydroxyl on 17β-estradiol was positioned with opposite orientation to the database hits, likewise for a hydroxyl on coumestrol. This makes little difference in the shape-only search, but is critical when electrostatics is included. These hydroxyl positions were modified prior to running the 2D search.

Results for the 2D-MEP search clearly show the influence of electrostatics. This is most dramatically seen for 17β-estradiol, and to a lesser extent for coumestrol; the top seven hits for each of these queries are shown in Fig. 11. Compounds identified by the 2D-MEP search are similar to the queries both in polarity and size. In each case, the query finds itself as best hit.

Fig. 11. Best hits for two 2D-MEP searches. Each hit is listed by rank with the identifier shown in the figure; the structure drawings are not reproduced here.

17β-estradiol:
Hit #1 (QUERY!): 50-28-2
Hit #2: 1090-04-6
Hit #3: 1630-83-7
Hit #4: 19882-03-2
Hit #5: 6301-88-8
Hit #6: 3597-38-4
Hit #7: 6301-87-7

Coumestrol:
Hit #1 (QUERY!): 479-13-0
Hit #2: 55977-10-1
Hit #3: 80784-88-9
Hit #4: 54108-08-6
Hit #5: 6468-49-1
Hit #6: 6780-38-7
Hit #7: 1690-63-7


Conclusions

Shape Signatures promises to be a powerful tool for identifying molecules on the basis of shape and polarity. In our initial tests using estrogenic compounds as queries, searching with 1D signatures casts a “wider net”, selecting from the database compounds that are shape-similar to the query in whole or in part. Searching with 2D-MEP signatures appears to yield tighter selectivity, the result of screening on the basis of both shape and electrostatic potential.

References

1. R. Zauhar, J. Fretz & W. Welsh, Shape signatures, a novel technique for ligand- and receptor-based molecular design, in preparation.

2. R.J. Zauhar, SMART: A Solvent-Accessible Triangulated Surface Generator for Molecular Graphics and Boundary Element Applications, J. Comp-Aided Mol. Design, 9, 149-159 (1995).

3. Tripos, Inc., 1699 South Hanley Road, Saint Louis, MO.

4. Kavlock, R.J.; Daston, G.P.; DeRosa, C.; Fenner-Crisp, P.; Gray, L.E.; Kaattari, S.; Lucier, G.; Luster, M.; Mac, M.J.; Maczka, C.; Miller, R.; Moore, J.; Rolland, R.; Scott, G.; Sheehan, D.M.; Sinks, T.; Tilson, H.A. Research needs for the risk assessment of health and environmental effects of endocrine disruptors: A report of the U.S. EPA-sponsored workshop. Environ. Health Perspect. 1996, 104, 715-740.

5. Patlak, M. A testing deadline for endocrine disrupters. Environ. Sci. Technol. 1996, 30, 540A-544A.

6. NCI Database, Developmental Therapeutics Program, National Cancer Institute, National Institutes of Health, Bethesda, MD.