Indian Journal of Chemistry
Vol. 42A, June 2003, pp. 1426-1435
Use of distance-based topological indices in modeling
antihypertensive activity : Case of 2-aryl-imino-imidazolidines
Vijay K Agrawal a,*, Sneha Karmarkar b, Padmakar V Khadikar b & Shachi Shrivastava a
a QSAR and Computer Chemical Laboratories, APS University, Rewa 486 003, India, e-mail: [email protected]
b Research Division, Laxmi Fumigation and Pest Control Pvt. Ltd., 3 Khatipura, Indore 452 007, India
and
Istvan Lukovits
Chemical Research Center, Hungarian Academy of Sciences, H-1525 Budapest, P.O.B. 17, Hungary, e-mail: lukovits@chemres.hu
Received 31 October 2002
This paper describes topological modeling of the antihypertensive
activity of 2-aryl-imino-imidazolidines. A large pool of
distance-based topological indices consisting of W, B, χ, J, Sz,
and log RB is initially used for this purpose. An excellent
model is obtained in multi-parametric regression containing χ
and log RB along with two indicator parameters.
A basic problem in drug design consists of finding a compound
satisfying various constraints defined over a spectrum of chemical
and biological properties. Although the problem of designing drugs
pervades much of pharmaceutical research, statisticians have yet to
become significantly involved in this important realm of
research.
In contrasting graph theoretical schemes with "traditional"
quantitative structure-activity relationship (QSAR) methods 1-5, one
need not be wary of using a complementary approach. Traditional
QSAR is usually based on a large number of empirical parameters
1-7. The graph theoretical approach involves (a rather small set
of) structural or graph invariants. In QSAR, one uses statistical
methods in order to select critical descriptors and demonstrate a
structure-activity correlation. In graph theory, one manipulates a
structure algebraically, using partial order and ranking based on
selected standards. Of course, graph theoretical descriptors also
yield structure-property or structure-activity correlations 8-10.
Application of graph theory in QSAR covers a variety of topics,
from the study of various physicochemical data to biological
activity and toxicity, including graph theoretical descriptors and
pattern recognition 1-10. The prime distinction between graph
theoretical schemes and traditional QSAR is
that the former is "structure explicit" while the latter is
"structure cryptic". The former uses well defined mathematical
invariants which have a direct structural interpretation, while the
latter is mostly expressed in terms of properties that remain to
be interpreted structurally. The combination of topological QSAR
methodology with the experience and intuition of experts in drug design
may result in a much more organized search for novel drugs for
human, animal, and plant therapy.
It has been known for some time that certain invariants of the
molecular graph, usually referred to as topological indices, can
be used to demonstrate QSAR in pharmacology. One such index is the
Wiener index (W), introduced 50 years ago by the American chemist
Harry Wiener 11. W is now considered to represent a measure of the
compactness of molecules. However, only recently has its relation
with the molecular van der Waals area been demonstrated 12.
Recently, a new topological index introduced by Gutman 13,14
was named the Szeged index 14 and
abbreviated as Sz. This index is considered a modification of
W for cyclic graphs. Some of the properties of Sz are reported in
the literature 15-17, but very few applications of Sz in QSAR
studies are known 18-22.
Nowadays, the usual practice in developing QSAR models is to
initially use a large set of topological indices. If needed,
additional structure related molecular descriptors and/or indicator
parameters can also be used. In the present study we have,
therefore, used six distance-based topological indices: the Wiener (W),
branching (B), first-order connectivity (χ), Balaban (J), Szeged
(Sz) and log RB indices. In addition, we have also used initially a
set of four indicator parameters, the details of which are given in
the next section 23.
Materials and Methods
(i) Definitions of Wiener (W), Szeged (Sz), Balaban (J) and the molecular connectivity indices (χ)
Let G be the usual, hydrogen atom depleted, graph representation
of the molecule under consideration 24. Hence, G is a connected
graph without directed and multiple edges and without loops. V(G)
and E(G) denote the vertex and edge sets of G, respectively. If e
is an edge of G, connecting the vertices u and v, then e = uv. The
number of vertices of G is denoted by |G|.
The distance between the vertices of G is defined as usual 24:
the distance d(u,v|G) between two vertices u and v of G is equal
to the length of the shortest path connecting these vertices.
Wiener index (W)
The Wiener index (W) 11 of a graph G is just the
sum of distances of all pairs of vertices of G:
$$W = W(G) = \frac{1}{2} \sum_{v \in V(G)} \sum_{u \in V(G)} d(v,u|G) = \frac{1}{2} \sum_{v \in V(G)} d(v|G) \tag{1}$$
where d(v|G) is called the distance number of vertex v and is
defined as below:
$$d(v|G) = \sum_{u \in V(G)} d(v,u|G) \tag{2}$$
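As a worked illustration of Eqs (1) and (2), the following minimal Python sketch computes W by breadth-first search. The edge list is an assumption: it encodes the hydrogen-depleted graph of the unsubstituted parent compound (R = H, compound 18) as a phenyl ring, an imino nitrogen and an imidazolidine ring, with an arbitrary vertex numbering chosen here for illustration. Under that assumption the result agrees with the value W = 209 listed for compound 18 in Table 2.

```python
from collections import deque

# Assumed hydrogen-depleted graph of the parent compound (R = H):
# vertices 0-5 phenyl ring, 6 imino nitrogen, 7-11 imidazolidine ring.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),   # phenyl ring
         (0, 6), (6, 7),                                    # ring-N=C bridge
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 7)]        # imidazolidine ring


def adjacency(edges):
    """Build an adjacency list from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search: shortest-path lengths d(source, u|G)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def wiener_index(edges):
    """Eq. (1): half the sum of the distance numbers d(v|G) of Eq. (2)."""
    adj = adjacency(edges)
    distance_numbers = [sum(distances_from(adj, v).values()) for v in adj]
    return sum(distance_numbers) // 2


print(wiener_index(EDGES))  # 209
```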
Szeged index (Sz)
Let e = uv ∈ E(G). Then we define two subsets of the vertex set of G
as follows:
$$N_1(e|G) = \{x \in V(G) \mid d(x,u|G) < d(x,v|G)\} \tag{3}$$
$$N_2(e|G) = \{x \in V(G) \mid d(x,v|G) < d(x,u|G)\} \tag{4}$$
The numbers of elements in the sets N1(e|G) and N2(e|G) are denoted
by n1(e|G) and n2(e|G),
respectively. Thus, n1(e|G) counts the vertices of G lying
closer to vertex u than to v. The meaning of n2(e|G) is analogous.
The vertices equidistant from both ends of the edge uv belong
neither to N1(e|G) nor to N2(e|G).
The Szeged index 13,14 of the graph G is the sum of all edge
contributions:
$$Sz(G) = Sz = \sum_{e \in E(G)} n_1(e|G)\, n_2(e|G) \tag{5}$$
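The edge-by-edge counting in Eq. (5) can be sketched in Python as follows. The edge list is again an assumed encoding of the hydrogen-depleted parent compound (R = H, compound 18), and the function names are illustrative. Vertices equidistant from both ends of an edge are left out of both counts, as the definition requires. With this graph the sketch gives Sz = 309, the value listed for compound 18 in Table 2.

```python
from collections import deque

# Assumed hydrogen-depleted graph of the parent compound (R = H):
# vertices 0-5 phenyl ring, 6 imino nitrogen, 7-11 imidazolidine ring.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 7)]


def shortest_paths(edges):
    """Return dist[s][t] for all vertex pairs via breadth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist[s] = d
    return dist


def szeged_index(edges):
    """Eq. (5): sum over edges uv of n1(e|G) * n2(e|G)."""
    dist = shortest_paths(edges)
    total = 0
    for u, v in edges:
        n1 = sum(1 for x in dist if dist[x][u] < dist[x][v])  # closer to u
        n2 = sum(1 for x in dist if dist[x][v] < dist[x][u])  # closer to v
        total += n1 * n2
    return total


print(szeged_index(EDGES))  # 309
```

For an acyclic graph the same function returns the Wiener index, which is the sense in which Sz is regarded as a modification of W for cyclic graphs.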
Balaban index (J)
The Balaban index, J (the average distance sum connectivity
index), is defined 25,26 by:
$$J = \frac{M}{\mu + 1} \sum_{\text{bonds}} (d_i d_j)^{-1/2} \tag{6}$$
where M is the number of bonds in a graph G, μ is the
cyclomatic number of G and the d_i (i = 1, 2, 3, ..., N) are the distance
sums (distance degrees) of atoms in G such that
$$d_i = \sum_{j=1}^{N} (D)_{ij} \tag{7}$$
where (D)_{ij} are the elements of the distance matrix and N is the number of atoms.
The cyclomatic number μ of G indicates the number of
independent cycles in G and is equal to the minimum number of cuts
(removal of bonds) necessary to convert a polycyclic structure into
an acyclic structure:
$$\mu = M - N + 1 \tag{8}$$
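A short Python sketch of the Balaban index built from the distance sums and the cyclomatic number may make the definition concrete. It assumes the standard form J = M/(μ+1) Σ (d_i d_j)^(-1/2) taken over all bonds, uses unmodified topological distances (no hetero-atom weighting), and takes the same assumed hydrogen-depleted graph of the parent compound (R = H). Under these assumptions the value agrees with J = 1.6943 listed for compound 18 in Table 2.

```python
import math
from collections import deque

# Assumed hydrogen-depleted graph of the parent compound (R = H):
# vertices 0-5 phenyl ring, 6 imino nitrogen, 7-11 imidazolidine ring.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 7)]


def distance_sums(edges):
    """Distance sum d_i of every vertex (row sums of the distance matrix)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    sums = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        sums[s] = sum(d.values())
    return sums


def cyclomatic_number(edges):
    """mu = M - N + 1 for a connected graph."""
    vertices = {v for edge in edges for v in edge}
    return len(edges) - len(vertices) + 1


def balaban_index(edges):
    """J = M / (mu + 1) * sum over bonds of (d_i * d_j) ** -0.5."""
    d = distance_sums(edges)
    m = len(edges)
    mu = cyclomatic_number(edges)
    return m / (mu + 1) * sum(1 / math.sqrt(d[u] * d[v]) for u, v in edges)


print(cyclomatic_number(EDGES))        # 2
print(round(balaban_index(EDGES), 4))  # 1.6943
```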
One way to compute the Balaban index (J) for hetero-systems is to
modify the elements of the distance matrix as
follows:
(i) The diagonal elements:
$$(D)_{ii} = 1 - \frac{Z_C}{Z_i} \tag{9}$$
where Z_C = 6 and Z_i = atomic number of the given element.
(ii) The off-diagonal elements:
$$(D)_{ij} = \sum_{r} k_r \tag{10}$$
where the summation is over all bonds. The bond parameter k_r is
given by:
$$k_r = \frac{1}{b_r} \cdot \frac{Z_C^2}{Z_i Z_j}$$
where b_r is the bond weight with values: 1 for a single bond, 2
for a double bond, 1.5 for an aromatic bond and 3 for a triple bond.
Molecular connectivity indices (χ)
The connectivity index χ = χ(G) of a graph G is
defined by Randic 3,4,27 as follows:
$$\chi = \chi(G) = \sum_{ij} (\delta_i \delta_j)^{-1/2} \tag{11}$$
where δ_i and δ_j are the valences of the vertices i and j, equal to
the number of bonds connected to the atoms i and j in G, and the sum is taken over all bonds i-j.
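The first-order connectivity index of Eq. (11) needs only the vertex degrees, as the following Python sketch shows. The two edge lists are assumptions made for illustration: the hydrogen-depleted parent compound (R = H) and a 2,4,6-trisubstituted analogue in which each substituent (Cl or Me) is one extra vertex on the phenyl ring. Because the index is purely topological, chloro and methyl substituents in the same positions give the same value. The results agree with the values 5.9495 and 7.1647 listed in Table 2 for the parent compound and for the 2,4,6-trisubstituted compounds.

```python
import math

# Assumed hydrogen-depleted graph of the parent compound (R = H):
# vertices 0-5 phenyl ring (0 = ring carbon bonded to N), 6 imino nitrogen,
# 7-11 imidazolidine ring.
PARENT = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
          (0, 6), (6, 7),
          (7, 8), (8, 9), (9, 10), (10, 11), (11, 7)]

# One extra vertex per substituent in positions 2, 4 and 6 of the phenyl ring.
TRISUBSTITUTED = PARENT + [(1, 12), (3, 13), (5, 14)]


def randic_index(edges):
    """Eq. (11): sum over bonds i-j of (delta_i * delta_j) ** -0.5."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(1 / math.sqrt(degree[u] * degree[v]) for u, v in edges)


print(round(randic_index(PARENT), 4))          # 5.9495
print(round(randic_index(TRISUBSTITUTED), 4))  # 7.1647
```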
In the case of hetero-systems the connectivity is given in terms
of the valence delta values δ_i^v and δ_j^v of atoms i and j and is denoted
by χ^v. This version of the connectivity index is called the valence
connectivity index and is defined 3,4,27 as:
$$\chi^v = \sum_{ij} (\delta_i^v \delta_j^v)^{-1/2} \tag{12}$$
where the sum is taken over all bonds i-j of the molecule.
Valence delta values are given by the following expression:
$$\delta_i^v = \frac{Z_i^v - H_i}{Z_i - Z_i^v - 1} \tag{13}$$
where Z_i is the atomic number of atom i, Z_i^v is the number of
valence electrons of atom i and H_i is the number of hydrogen
atoms attached to atom i.
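A small Python sketch of Eq. (13) shows how the valence delta value distinguishes hetero-atoms that the simple vertex degree cannot. The function names and the example atoms are illustrative choices, not taken from the paper.

```python
def valence_delta(z, z_valence, n_hydrogens):
    """Eq. (13): (Z^v - H) / (Z - Z^v - 1) for a single atom."""
    return (z_valence - n_hydrogens) / (z - z_valence - 1)


def bond_term(delta_i, delta_j):
    """Contribution of one bond i-j to the valence connectivity index, Eq. (12)."""
    return (delta_i * delta_j) ** -0.5


aromatic_c = valence_delta(6, 4, 0)    # substituted ring carbon, no hydrogen
methyl_c = valence_delta(6, 4, 3)      # CH3 carbon
chlorine = valence_delta(17, 7, 0)     # Cl: 7 / 9

print(aromatic_c, methyl_c)                       # 4.0 1.0
print(round(chlorine, 3))                         # 0.778
print(round(bond_term(aromatic_c, methyl_c), 3))  # 0.5
print(round(bond_term(aromatic_c, chlorine), 3))  # 0.567
```

In the simple connectivity index both substituents would contribute the same bond term, since each is a vertex of degree one.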
Nowadays the connectivity and the valence connectivity indices
expressed by Eqs (11) and (12) are termed first-order
connectivity and first-order valence connectivity indices,
respectively. Lower or higher order indices are also possible
and are defined analogously.
The branching index log RB has been calculated by the method
described by Todeschini et al. 8-10
(ii) Indicator parameters
These are dummy parameters that are sometimes
used to obtain better (i.e. statistically more significant) QSAR
models in multivariate regression
analysis. In the present study we have used four such dummy
parameters (indicator parameters): Ip1, Ip2, Ip3 and Ip4. The
indicator parameter Ip1 is equal to one if a chloro group is
present; otherwise its value is zero. Ip2 is equal to one if a
methyl group is present and zero in the absence of a
methyl group. Ip3 and Ip4 are equal to one if two or
more methyl or chloro groups are present, respectively; otherwise
their values are zero.
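The four definitions above reduce to simple threshold tests on the number of chloro and methyl substituents, as this short Python sketch shows. The function name and its arguments are illustrative.

```python
def indicator_parameters(n_chloro, n_methyl):
    """Return (Ip1, Ip2, Ip3, Ip4) from the substituent counts."""
    ip1 = int(n_chloro >= 1)   # at least one chloro group
    ip2 = int(n_methyl >= 1)   # at least one methyl group
    ip3 = int(n_methyl >= 2)   # two or more methyl groups
    ip4 = int(n_chloro >= 2)   # two or more chloro groups
    return ip1, ip2, ip3, ip4


print(indicator_parameters(2, 0))  # 2,6-Di-Cl      -> (1, 0, 0, 1)
print(indicator_parameters(1, 1))  # 2-Cl,6-Me      -> (1, 1, 0, 0)
print(indicator_parameters(0, 2))  # 2,6-Di-Me      -> (0, 1, 1, 0)
print(indicator_parameters(2, 1))  # 2,6-Di-Cl,4-Me -> (1, 1, 0, 1)
print(indicator_parameters(0, 0))  # H              -> (0, 0, 0, 0)
```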
(iii) Statistics
Multiple regression analysis 28,29 for correlating the
antihypertensive activity of the compounds under present study
was done using Regress-1 software compiled by one of the authors
(IL).
Results
The formulae of the 2-aryl-imino-imidazolidines, their
antihypertensive activity (expressed as log 1/ED50), and the
indicator parameters (Ip1, Ip2, Ip3 and Ip4) are listed in Table
1.
Table 2 shows the distance-based topological indices used in the
present study.
Table 1 - 2-Aryl-imino-imidazolidines used in the present study,
their log(1/ED50) and indicator values.
Compd No.  R                 log(1/ED50)  Ip1  Ip2  Ip3  Ip4
1          2,6-Di-Cl          2.14        1    0    0    1
2          2,4,6-Tri-Cl       1.41        1    0    0    1
3          2,3-Di-Cl          1.37        1    0    0    1
4          2,6-Di-Cl,4-Me     1.22        1    1    0    1
5          2-Cl,6-Me          1.18        1    1    0    0
6          2,6-Di-Me          0.85        0    1    1    0
7          2,4-Di-Cl          0.68        1    0    0    1
8          2-Cl,4-Me          0.68        1    1    0    0
9          2,4-Di-Cl,6-Me     0.57        1    1    0    1
10         2,4-Di-Me,6-Cl     0.52        1    1    1    0
11         2,5-Di-Cl          0.32        1    0    0    1
12         2-Cl               0.15        1    0    0    0
13         2,6-Di-Me,4-Cl    -0.04        1    1    1    0
14         2-Me,4-Cl         -0.05        1    1    0    0
15         2,4,6-Tri-Me      -0.07        0    1    1    0
16         2,4-Di-Me         -0.56        0    1    1    0
17         2-Me              -0.61        0    1    0    0
18         H                 -2.10        0    0    0    0
Table 2 - Values of topological indices calculated for compounds
used in the present study
Compd. No.  W    χ       J       Sz   log RB
1           301  6.7709  1.8619  437   94.6795
2           365  7.1647  1.8830  534  113.8738
3           306  6.7709  1.8270  447   95.5756
4           365  7.1647  1.8830  534  113.8738
5           301  6.7709  1.8619  437   94.6795
6           301  6.7709  1.8619  437   94.6795
7           313  6.7540  1.7867  461   96.8441
8           313  6.7540  1.7867  461   96.8441
9           365  7.1647  1.8830  534  113.8738
10          365  7.1647  1.8830  534  113.8738
11          308  6.7540  1.8156  451   96.0864
12          253  6.3602  1.7759  370   79.0361
13          365  7.1647  1.8830  534  113.8738
14          313  6.7540  1.7867  461   96.8441
15          365  7.1647  1.8830  534  113.8738
16          313  6.7540  1.7867  461   96.8441
17          253  6.3602  1.7759  370   79.0361
18          209  5.9495  1.6943  309   64.7790
Correlations between the aforementioned molecular descriptors
and antihypertensive activity (log 1/ED50) are given in Table
3.
Table 4 records the statistical parameters and quality of
various statistically significant uni- and
multivariate regression equations. Table 5 collects different
significant models. The estimated antihypertensive activity
values
(log 1/ED50) obtained from the most significant QSAR models are
presented in Table 6 and are compared with observed values.
Finally, Fig. 1 displays the correlation between observed and
estimated (obtained by using the most significant correlation
expressions) antihypertensive activities (log 1/ED50).
Table 3 - Correlation matrix for the inter-correlation of
structural descriptors and their correlation with the activity
              log(1/ED50)  W       χ       J       Sz      logRB   Ip1      Ip2      Ip3      Ip4
log(1/ED50)    1.0000
W              0.4844      1.0000
χ              0.5576      0.9930  1.0000
J              0.6704      0.8522  0.9029  1.0000
Sz             0.4662      0.9995  0.9887  0.8346  1.0000
logRB          0.5015      0.9993  0.9962  0.8707  0.9976  1.0000
Ip1            0.6165      0.3819  0.3917  0.3291  0.3805  0.3821   1.0000
Ip2           -0.1216      0.3936  0.3904  0.3344  0.3939  0.3919  -0.2403   1.0000
Ip3           -0.1906      0.3756  0.3751  0.3544  0.3738  0.3775  -0.4462   0.4947   1.0000
Ip4            0.5803      0.3024  0.3141  0.2936  0.2991  0.3060   0.4947  -0.5325  -0.4947  1.0000
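The entries of Table 3 are ordinary Pearson correlation coefficients, so individual entries can be checked with a few lines of Python. In the sketch below the activities and J values are taken from Tables 1 and 2. The chloro-substituent counts are read off the R column of Table 1 and converted to Ip1 and Ip4 by their definitions, which is an assumption about how the indicator columns were filled in. The three coefficients computed agree with the tabulated 0.6704, 0.6165 and 0.4947 to within rounding.

```python
import math

# log(1/ED50) and Balaban J for compounds 1-18 (Tables 1 and 2).
ACTIVITY = [2.14, 1.41, 1.37, 1.22, 1.18, 0.85, 0.68, 0.68, 0.57,
            0.52, 0.32, 0.15, -0.04, -0.05, -0.07, -0.56, -0.61, -2.10]
BALABAN_J = [1.8619, 1.8830, 1.8270, 1.8830, 1.8619, 1.8619, 1.7867,
             1.7867, 1.8830, 1.8830, 1.8156, 1.7759, 1.8830, 1.7867,
             1.8830, 1.7867, 1.7759, 1.6943]

# Number of chloro substituents per compound, read from the R column.
N_CHLORO = [2, 3, 2, 2, 1, 0, 2, 1, 2, 1, 2, 1, 1, 1, 0, 0, 0, 0]
IP1 = [int(n >= 1) for n in N_CHLORO]   # chloro group present
IP4 = [int(n >= 2) for n in N_CHLORO]   # two or more chloro groups


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson_r(ACTIVITY, BALABAN_J), 4))  # activity vs J
print(round(pearson_r(ACTIVITY, IP1), 4))        # activity vs Ip1
print(round(pearson_r(IP1, IP4), 4))             # Ip1 vs Ip4
```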
Discussion
The data in Table 1 show that degeneracy exists in
the antihypertensive activity (log 1/ED50). The data in Table 2
also indicate that a similar type of degeneracy exists in the
distance-based topological indices. The degeneracy in these indices
is obvious because these indices belong to the first generation
topological indices as described by Balaban 30.
It is worth mentioning that the magnitude of all topological
indices used increases if the molecule becomes (through
substitution) bigger. Mono-substituted compounds exhibit lower
while tri-substituted compounds exhibit greater values of these
indices.
Table 4 - Regression parameters and quality of the proposed
models
Model  Parameters  Ai (i = 1,2,3,4)     B          Se      R2      R       F-ratio  Q (R/Se)  Prob.
No.    used
1      J           11.6041(±3.2105)     -20.7965   0.7312  0.4495  0.6704  13.064   0.9168    2.327x10^-3
2      χ           21.8397(±4.6460)     -100.5836  0.5599  0.6973  0.8351  17.280   1.4915    1.280x10^-4
       logRB       -0.4847(±0.1108)
3      χ           1.0620(±0.5653)      -7.5071    0.7210  0.4981  0.7058  7.444    0.9789    5.681x10^-3
       Ip1         0.9753(±0.4124)
4      χ           2.0300(±0.5932)      -12.9252   0.7570  0.4467  0.6684  6.056    0.8829    0.0118
       Ip2         -0.7628(±0.3976)
5      χ           1.1842(±0.5506)      -7.9678    0.7246  0.4931  0.7020  7.295    0.9688    6.124x10^-3
       Ip4         0.8567(±0.3690)
6      J           9.0756(±2.8973)      -16.8373   0.6231  0.6252  0.7907  12.511   1.2690    6.359x10^-4
       Ip1         0.9208(±0.3472)
7      J           14.6081(±2.7900)     -26.0082   0.5942  0.6592  0.8119  14.507   1.3664    3.117x10^-4
       Ip3         1.0158(±0.3344)
8      logRB       0.0211(±0.0139)      -2.3944    0.7459  0.4628  0.6803  6.462    0.9120    9.457x10^-3
       Ip1         1.0320(±0.4248)
9      logRB       0.0440(±0.0151)      -3.4594    0.8072  0.3710  0.6091  4.425    0.7546    0.039
       Ip2         0.7163(±0.4242)
10     logRB       0.0453(±0.0144)      -3.7746    0.7753  0.4198  0.6479  5.426    0.8357    0.0169
       Ip3         0.1190(±0.4406)
11     logRB       0.0242(±0.0136)      -2.3047    0.7531  0.4525  0.6727  6.199    0.8932    0.0101
       Ip4         0.8976(±0.3824)
12     W           0.0062(±0.0044)      -2.2767    0.7530  0.4526  0.6728  6.201    0.8935    0.0109
       Ip1         1.0479(±0.4288)
13     W           0.3584(±0.1134)      -2.2962    0.6976  0.5302  0.7282  8.465    1.0439    3.462x10^-3
       Sz          -0.2389(±0.0778)
14     W           0.0072(±0.0043)      -2.1977    0.7604  0.4418  0.6647  5.936    0.8741    0.0126
       Ip4         0.9100(±0.3857)
15     Sz          0.0039(±0.0030)      -2.1587    0.7597  0.4428  0.6654  5.960    0.8759    0.0125
       Ip1         1.0650(±0.4323)
16     Sz          0.0047(±0.0030)      -2.0870    0.7679  0.4308  0.6563  5.676    0.8547    0.0146
       Ip4         0.9228(±0.3891)
17     W           0.3428(±0.0894)      -1.8070    0.5492  0.7282  0.8533  12.502   1.5537    3.006x10^-4
       Sz          -0.2309(±0.0613)
       Ip1         0.9997(±0.3130)
18     W           0.3296(±0.0956)      -1.7925    0.5845  0.6922  0.8320  10.493   1.4234    7.034x10^-4
       Sz          -0.2210(±0.0654)
       Ip4         0.8088(±0.2980)
19     W           0.0053(±0.0042)      -2.0113    0.7208  0.5319  0.7293  5.303    1.0118    0.0119
       Ip1         0.7464(±0.4547)
       Ip4         0.6237(±0.4051)
20     Sz          0.0033(±0.0029)      -1.8956    0.7268  0.5240  0.7239  5.137    0.9960
       Ip1         0.7591(±0.4585)
       Ip4         0.6310(±0.4083)
21     χ           27.6660(±8.0179)     -85.7528   0.5594  0.7180  0.8474  11.883   1.5148    3.865x10^-4
       J           -27.0832(±13.2711)
       Sz          -0.1140(±0.0320)
22     χ           -1.4589(±0.0424)     -21.0851   0.6041  0.6712  0.8193  9.527    1.3562    1.101x10^-3
       J           16.7831(±6.1820)
       Ip1         1.0337(±0.3462)
Table 5 - Various correlation models and their qualities of
correlations
Model No.  Regression expression
1.   log(1/ED50) = 11.6041(±3.2105)J - 20.7965
2.   log(1/ED50) = 21.8397(±4.6460)χ - 0.4847(±0.1108)log RB - 100.5836
3.   log(1/ED50) = 1.0620(±0.5653)χ + 0.9753(±0.4124)Ip1 - 7.5071
4.   log(1/ED50) = 2.0300(±0.5932)χ - 0.7628(±0.3976)Ip2 - 12.9252
5.   log(1/ED50) = 1.1842(±0.5506)χ + 0.8567(±0.3690)Ip4 - 7.9678
6.   log(1/ED50) = 9.0756(±2.8973)J + 0.9208(±0.3472)Ip1 - 16.8373
7.   log(1/ED50) = 14.6081(±2.7900)J + 1.0158(±0.3344)Ip3 - 26.0082
8.   log(1/ED50) = 0.0211(±0.0139)log RB + 1.0320(±0.4248)Ip1 - 2.3944
9.   log(1/ED50) = 0.0440(±0.0151)log RB + 0.7163(±0.4242)Ip2 - 3.4594
10.  log(1/ED50) = 0.0453(±0.0144)log RB + 0.1190(±0.4406)Ip3 - 3.7746
11.  log(1/ED50) = 0.0242(±0.0136)log RB + 0.8976(±0.3824)Ip4 - 2.3047
12.  log(1/ED50) = 0.0062(±0.0044)W + 1.0479(±0.4288)Ip1 - 2.2767
13.  log(1/ED50) = 0.3584(±0.1134)W - 0.2389(±0.0778)Sz - 2.2962
14.  log(1/ED50) = 0.0072(±0.0043)W + 0.9100(±0.3857)Ip4 - 2.1977
15.  log(1/ED50) = 0.0039(±0.0030)Sz + 1.0650(±0.4323)Ip1 - 2.1587
16.  log(1/ED50) = 0.0047(±0.0030)Sz + 0.9228(±0.3891)Ip4 - 2.0870
17.  log(1/ED50) = 0.3428(±0.0894)W - 0.2309(±0.0613)Sz + 0.9997(±0.3130)Ip1 - 1.8070
18.  log(1/ED50) = 0.3296(±0.0956)W - 0.2210(±0.0654)Sz + 0.8088(±0.2980)Ip4 - 1.7925
19.  log(1/ED50) = 0.0053(±0.0042)W + 0.7464(±0.4547)Ip1 + 0.6237(±0.4051)Ip4 - 2.0113
20.  log(1/ED50) = 0.0033(±0.0029)Sz + 0.7591(±0.4585)Ip1 + 0.6310(±0.4083)Ip4 - 1.8956
21.  log(1/ED50) = 27.6660(±8.0179)χ - 27.0832(±13.2711)J - 0.1140(±0.0320)Sz - 85.7528
22.  log(1/ED50) = -1.4589(±0.0424)χ + 16.7831(±6.1820)J + 1.0337(±0.3462)Ip1 - 21.0851
23.  log(1/ED50) = 11.3886(±2.2930)χ - 0.0532(±0.0117)Sz + 0.9144(±0.2710)Ip1 - 53.2102
24.  log(1/ED50) = 20.1589(±3.5781)χ - 0.4553(±0.0849)log RB + 0.8426(±0.2456)Ip1 - 92.6412
25.  log(1/ED50) = 21.8309(±3.7442)χ - 0.4745(±0.0893)log RB - 0.7152(±0.2371)Ip2 - 101.0870
26.  log(1/ED50) = 21.6818(±3.2809)χ - 0.4698(±0.0783)log RB - 0.9011(±0.2247)Ip3 - 100.7197
27.  log(1/ED50) = 20.4524(±3.5339)χ - 0.4600(±0.0840)log RB + 0.7572(±0.2163)Ip4 - 93.8642
28.  log(1/ED50) = 21.7110(±2.8883)χ - 0.4667(±0.0689)log RB - 0.4507(±0.2003)Ip2 + 0.7027(±0.2166)Ip3 - 101.0070
29.  log(1/ED50) = 19.6818(±2.9852)χ - 0.4466(±0.0708)log RB + 0.5884(±0.2254)Ip1 + 0.5372(±0.2004)Ip4 - 90.2705
30.  log(1/ED50) = 22.2253(±6.5808)χ - 18.8563(±10.8269)J - 0.0944(±0.0261)Sz + 0.8017(±0.2613)Ip1 - 73.3811
31.  log(1/ED50) = 20.6144(±3.2623)χ - 0.4568(±0.0772)log RB + 0.6114(±0.2520)Ip1 - 0.4564(±0.2303)Ip2 - 95.1417
32.  log(1/ED50) = 0.3282(±0.0833)W - 0.2214(±0.0571)Sz + 0.7497(±0.3212)Ip1 + 0.5211(±0.2874)Ip4 - 1.6047
mono-parametric models are not adequate for modeling the
pharmaceutical activity.
At this stage we would like to emphasize that the use of
correlation coefficient, R, or coefficient of determination, R2, as
the sole criterion for a quality of regression is deficient and can
be misleading. Hence, the conclusions based on R or R2 have to be
taken with due reservation. It is desirable to verify such
correlation with some other statistical criteria. One such
criterion is the standard error of estimation (Se) and/or F-ratio.
On the basis of R and Se a quantity named as quality factor (Q) is
proposed in the literature31 , which is the ratio of correlation
coefficient (R) to the standard error of estimate (Se) viz., Q =
R/Se. We have, therefore used Q-values for describing the quality
of statistically significant correlations.
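The statistics discussed here can be recomputed for the mono-parametric Balaban model (model 1 of Tables 4 and 5) with an ordinary least-squares fit. The Python sketch below is not the Regress-1 software used in the study. It is a plain closed-form fit of log(1/ED50) on J using the values of Tables 1 and 2, and the function name and dictionary keys are illustrative. Under these assumptions it reproduces, to within rounding, the slope 11.6041, the intercept -20.7965, Se = 0.7312, R = 0.6704, F = 13.064 and Q = R/Se = 0.9168.

```python
import math

# log(1/ED50) and Balaban J for compounds 1-18 (Tables 1 and 2).
ACTIVITY = [2.14, 1.41, 1.37, 1.22, 1.18, 0.85, 0.68, 0.68, 0.57,
            0.52, 0.32, 0.15, -0.04, -0.05, -0.07, -0.56, -0.61, -2.10]
BALABAN_J = [1.8619, 1.8830, 1.8270, 1.8830, 1.8619, 1.8619, 1.7867,
             1.7867, 1.8830, 1.8830, 1.8156, 1.7759, 1.8830, 1.7867,
             1.8830, 1.7867, 1.7759, 1.6943]


def simple_regression(x, y):
    """Least-squares fit y = slope * x + intercept with quality statistics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = syy - slope * sxy                 # residual sum of squares
    se = math.sqrt(sse / (n - 2))           # standard error of estimate
    r = sxy / math.sqrt(sxx * syy)          # correlation coefficient
    f_ratio = (syy - sse) / (sse / (n - 2))
    return {"slope": slope, "intercept": intercept, "Se": se,
            "R": r, "F": f_ratio, "Q": r / se}


for name, value in simple_regression(BALABAN_J, ACTIVITY).items():
    print(f"{name} = {value:.4f}")
```

Because Q divides R by Se, two models with similar R but different scatter of residuals receive clearly different Q values, which is why Q is used alongside R in the comparisons.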
In view of the above fact we have used the maximum R2
improvement method 28,29 to derive prediction models. This method
finds the "best" one variable model, the "best" two variable model
and so forth for the prediction of property/activity relationship.
Several models (combinations of variables) were examined to
identify combinations of variables with good prediction
capabilities. In all regression models developed, in order to obtain
the most reliable results, we have examined a variety of statistics
associated with residuals, i.e. the Shapiro-Wilk test for normality
and Cook's D statistic for outliers 28,29.
The regression parameters as well as quality of statistically
significant correlations are given in Table 4.
Table 4 shows that the only statistically significant mono-parametric
model is the one using the Balaban index (J):
log 1/ED50 = -20.7965 + 11.6041(±3.2105) J ... (14)
n = 18, Se = 0.7312, R = 0.6704, F = 13.064, Q = 0.9168
Table 6 - Estimated log(1/ED50) values using model-28 and -29 and
their comparison with the observed ones.
Compd No.  Obs log(1/ED50)
1           2.140
2           1.410
3           1.370
4           1.220
5           1.180
6           0.850
7           0.680
8           0.680
9           0.570
10          0.520
11          0.320
12          0.150
13         -0.040
14         -0.050
15         -0.070
16         -0.560
17         -0.610
18         -2.100
common fit in all regression analysis in describing descriptors
that are highly inter-correlated. He further stated that by
discarding one of the descriptors that commonly duplicates another
we may be discarding a descriptor that nevertheless carries useful
structural information in a way that does not parallel other
descriptors.
Thus, following Randic 33, we may say that in the referred
bi-parametric model containing χ and log RB, their information
contents may be different. However, such unknown information
content is yet to be investigated. Randic claims that in spite of
high collinearity between χ and log RB, the bi-parametric
regression equations can be considered statistically justified.
Another result in favor of this finding is that the coefficients of
both χ and log RB are considerably higher than the respective standard
deviations, so that such model(s) are considered statistically
significant.
Successive regression analysis resulted in several
three-parameter models (Table 4). Of these, the model containing χ, log
RB and Ip3 is found to be better than the bi-parametric model
discussed above. The best three-parametric model is:
log 1/ED50 = -100.7197 + 21.6818(±3.2809) χ - 0.4698(±0.0783)
log RB - 0.9011(±0.2247) Ip3 ... (16)
n = 18, Se = 0.3954, R = 0.9269, F = 28.467, Q = 2.3442
Once again this model also contains the highly linearly correlated
χ and log RB indices. However, its statistical relevance follows from
the Randic arguments made above. In addition, this model contains
an indicator parameter, Ip3, which accounts for the presence of
multiple -CH3 groups in the drug moiety. The negative sign
associated with the coefficient of the Ip3 term in the above model
shows that multi-substitution of methyl groups has an adverse effect
on the antihypertensive activity of the compounds.
Further, stepwise regression once again resulted in several
four-parameter models (Table 4), out of which the two models
containing (i) χ, log RB, Ip1, Ip4 and (ii) χ, log RB, Ip2, Ip3,
respectively, were found to be optimal:
log 1/ED50 = -90.2705 + 19.6818(±2.9852) χ - 0.4466(±0.0708) log RB
+ 0.5884(±0.2254) Ip1 + 0.5372(±0.2004) Ip4 ... (17)
n = 18, Se = 0.3557, R = 0.9456, F = 27.443, Q = 2.6584
log 1/ED50 = -101.0050 + 21.7110(±2.8883) χ - 0.4667(±0.0689) log
RB
- 0.4507(±0.2003) Ip2 + 0.7027(±0.2166) Ip3 ... (18)
n = 18, Se = 0.3481, R = 0.9480, F = 28.814, Q = 2.7234
The statistics involved show that this latter model (Eq. 18) is
the most appropriate one for modeling the antihypertensive activity.
This model shows the dominating influence of the methyl group in the
exhibition of antihypertensive activity of the compounds used.
Looking at the size of the sample, we cannot attempt still higher
regression analysis. This is because there is a rule of thumb stating
that the number of descriptors to be used in providing a
statistically significant model should be at most one fourth of
the number of compounds involved in a set. In our case there are 18
compounds; hence, at the most four descriptors can be used.
In order to confirm our findings we have estimated
antihypertensive activities from these two best models and compared
the results with observed activities. Such a comparison is shown in
Table 6. The correlation between estimated activities and
experimental ones and the residue, i.e. the difference between the
observed and estimated activity, support our proposition that the
model expressed by Eq. 18 is the best. The predictive correlation
coefficients (rpred = 0.894 and 0.899) for the models expressed by
Eqs 17 and 18, respectively, confirm our findings.
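To illustrate how such estimates and residues are obtained, the following Python sketch evaluates Eq. 17 for a few compounds. The coefficients are the rounded values printed in Eq. 17, so the estimates carry rounding error of the order of 0.01. The χ and log RB values come from Table 2, the indicator values follow the definitions of Ip1 and Ip4, and the selection of compounds and the function name are illustrative. This is a sketch of the calculation only and does not reproduce the estimated columns of Table 6.

```python
# (compound, chi, log RB, Ip1, Ip4, observed log 1/ED50)
COMPOUNDS = [
    ("1  2,6-Di-Cl", 6.7709, 94.6795, 1, 1, 2.14),
    ("5  2-Cl,6-Me", 6.7709, 94.6795, 1, 0, 1.18),
    ("6  2,6-Di-Me", 6.7709, 94.6795, 0, 0, 0.85),
    ("12 2-Cl",      6.3602, 79.0361, 1, 0, 0.15),
    ("18 H",         5.9495, 64.7790, 0, 0, -2.10),
]


def estimate_eq17(chi, log_rb, ip1, ip4):
    """Estimated log 1/ED50 from Eq. 17 (rounded published coefficients)."""
    return (-90.2705 + 19.6818 * chi - 0.4466 * log_rb
            + 0.5884 * ip1 + 0.5372 * ip4)


for name, chi, log_rb, ip1, ip4, observed in COMPOUNDS:
    estimated = estimate_eq17(chi, log_rb, ip1, ip4)
    residue = observed - estimated   # observed minus estimated activity
    print(f"{name:14s} obs {observed:6.2f}  est {estimated:6.2f}  "
          f"residue {residue:6.2f}")
```

For the compounds shown, the residues stay within about 0.3 log units, which is of the same order as the standard error Se = 0.3557 quoted for Eq. 17.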
Conclusion
From the above study, it may be concluded that out
of the pool of distance-based topological indices, χ and log RB
are the most appropriate indices for modeling, monitoring, and
estimating the antihypertensive activity of the compounds used.
Acknowledgement
One of the authors (PVK) is highly obliged and
thankful to Prof. Ivan Gutman for introducing him to the
fascinating field of Chemical Topology and Graph Theory.
References
1 Kier L B & Hall L H, Advances in drug research, (Academic Press, New York) 1992.
2 Chemical applications of topology and graph theory, edited by R B King, (Elsevier, Amsterdam) 1983.
3 Kier L B & Hall L H, Molecular connectivity in structure-activity relationship, (Wiley, New York), 1986.
4 Kier L B & Hall L H, Molecular connectivity in chemistry and drug research, (Academic Press, New York), 1976.
5 Chemical applications of graph theory, edited by A T Balaban (Academic Press, London) 1976.
6 Trinajstic N, Chemical graph theory, (CRC Press, Boca Raton), 1983.
7 Trinajstic N, Chemical graph theory, (CRC Press, Boca Raton), 1992.
8 Topological indices and related descriptors in QSAR and QSPR, edited by J Devillers & A T Balaban (Gordon & Breach, Amsterdam) 1999.
9 Comparative QSAR, edited by J Devillers (Taylor and Francis, Washington), 1998.
10 Todeschini R & Consonni V, Handbook of molecular descriptors, (Wiley-VCH, Weinheim) 2000.
11 Wiener H, J Am chem Soc, 69 (1947) 17.
12 Gutman I, Yeh Y N, Lee S L & Luo Y L, Indian J Chem, 32A (1993) 651.
13 Gutman I, Graph Theory Notes New York, 27 (1994) 9.
14 Khadikar P V, Deshpande N V, Kale P P, Dobrynin A, Gutman I & Domotor G, J chem Inf Comput Sci, 35 (1995) 547.
15 Gutman I, Popovic L, Khadikar P V, Karmarkar S, Joshi S & Mandloi M, MATCH - Commun math comput Chem, 35 (1997) 91.
16 Gutman I, Khadikar P V, Rajput P V & Karmarkar S, J Serb chem Soc, 60 (1995) 759.
17 Gutman I, Khadikar P V & Khaddar T, MATCH - Commun math comput Chem, 35 (1997) 105.
18 Khadikar P V, Karmarkar S, Joshi S & Gutman I, J Serb chem Soc, 61 (1996) 89.
19 Karmarkar S, Karmarkar S, Joshi S, Das A & Khadikar P V, J Serb chem Soc, 62 (1997) 227.
20 Agrawal V K & Khadikar P V, Bioorg med Chem, 9 (2001) 3035.
21 Agrawal V K, Sharma R & Khadikar P V, Bioorg med Chem, 10 (2002) 2993.
22 Agrawal V K, Sharma R & Khadikar P V, Bioorg med Chem, 10 (2002) 3571.
23 Tinnermals P B & Vanzwieter P A, J med Chem, 20 (1971) 1636.
24 Buckley F & Harary F, Distance in graphs, (Addison-Wesley, Reading), 1990.
25 Balaban A T, Chem Phys Lett, 89 (1982) 399.
26 Khadikar P V, Sharma S, Sharma V, Joshi S, Lukovits I & Kaveeshwar M, Bull Soc chim Belg, 106 (1997) 767.
27 Randic M, J Am chem Soc, 97 (1975) 6609.
28 Box G E B, Hunter W G & Hunter J S, Statistics for experimenters, (Wiley, New York), 1978.
29 Chatterjee S, Hadi A S & Price B, Regression analysis by example, (Wiley, New York), 2000.
30 Balaban A T, Bonchev D, Mekenyan O, Charton M & Motoc I (Eds), Steric effects in drug design, (Akademie-Verlag, Berlin), 1983.
31 Pogliani L, Amino Acids, 6 (1994) 141.
32 Mandloi M, Sikarwar A, Sapre N S, Karmarkar S & Khadikar P V, J chem Inf Comput Sci, 40 (2000) 57.
33 Randic M, Croat chem Acta, 66 (1993) 289.