In the name of GOD Basic Steps of QSAR/QSPR Investigations M.H. FATEMI Mazandaran University mhfatemi@umz.ac.ir

Dec 19, 2015


In the name of GOD

Basic Steps of QSAR/QSPR Investigations

M.H. FATEMI

Mazandaran University

mhfatemi@umz.ac.ir

QSAR

• Quantitative Structure-Activity Relationships

• Can one predict activity (or properties, in QSPR) simply on the basis of knowledge of the structure of the molecule?

• In other words, if one systematically changes a component, will it have a systematic effect on the activity?

What is QSAR?

A QSAR is a mathematical relationship between a biological activity of a molecular system and its geometric and chemical characteristics.

QSAR attempts to find consistent relationships between biological activity and molecular properties, so that these “rules” can be used to evaluate the activity of new compounds.

Why QSAR?

The number of compounds that would have to be synthesized in order to place 10 different groups in 4 positions of a benzene ring is 10^4.

Solution: synthesize a small number of compounds and from their data derive rules to predict the biological activity of other compounds.
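
The count on this slide can be checked by brute-force enumeration. This is only a minimal sketch: it assumes the 4 ring positions are distinguishable and filled independently, and the group labels are hypothetical placeholders.

```python
from itertools import product

# Ten placeholder substituent labels (hypothetical; any 10 groups would do).
groups = [f"G{i}" for i in range(10)]

# One group per position, 4 positions, positions treated as distinguishable.
compounds = list(product(groups, repeat=4))

n_compounds = len(compounds)
print(n_compounds)  # 10**4 = 10000
```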

QSXR

X=A Activity

X=P Property

X=R Retention

$X = b_0 + b_1 D_1 + b_2 D_2 + \dots + b_n D_n$

b_i: regression coefficients

D_i: descriptors

n: number of descriptors
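
A linear model of this form is usually fitted by least squares. The sketch below is one possible way to do that with NumPy; the function names `fit_mlr` and `predict_mlr` and the synthetic data are assumptions made for illustration, not part of the original slides.

```python
import numpy as np

def fit_mlr(D, x):
    """Least-squares estimate of [b0, b1, ..., bn] in x = b0 + sum(bi * Di)."""
    D = np.asarray(D, dtype=float)
    x = np.asarray(x, dtype=float)
    A = np.column_stack([np.ones(len(D)), D])   # leading column of 1s carries b0
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    return coef

def predict_mlr(coef, D):
    """Evaluate the fitted equation for new descriptor rows."""
    D = np.asarray(D, dtype=float)
    return coef[0] + D @ coef[1:]

# Synthetic, noise-free demo data (hypothetical, not from the slides).
rng = np.random.default_rng(0)
D_demo = rng.normal(size=(20, 3))               # 20 compounds, 3 descriptors
true_b = np.array([1.5, 0.8, -0.4, 2.0])        # b0, b1, b2, b3
x_demo = true_b[0] + D_demo @ true_b[1:]
b_demo = fit_mlr(D_demo, x_demo)
```

Because the demo data contain no noise, the fitted coefficients reproduce `true_b`; with real activity data the fit would only approximate the underlying relationship.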


History


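Early Examples

• Hammett (1930s-1940s)

Benzoic acid: C6H5-COOH ⇌ C6H5-COO⁻ + H⁺, equilibrium constant K0

para-substituted: p-X-C6H4-COOH ⇌ p-X-C6H4-COO⁻ + H⁺, equilibrium constant Kp

meta-substituted: m-X-C6H4-COOH ⇌ m-X-C6H4-COO⁻ + H⁺, equilibrium constant Km

$\sigma_{para} = \log_{10}(K_p / K_0)$

$\sigma_{meta} = \log_{10}(K_m / K_0)$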

Hammett (cont.)

• Now suppose we have a related series:

X-C6H4-CH2COOH ⇌ X-C6H4-CH2COO⁻ + H⁺, equilibrium constant K'x

$\log_{10}(K'_X / K'_0) = \rho \sigma$

σ reflects sensitivity to the substituent; ρ reflects sensitivity to the different system.

Free-Wilson Analysis

• $\log(1/C) = \sum_i a_i + \mu$

where log 1/C = predicted activity (C is the concentration that produces a standard response),

a_i = contribution per group, and μ = activity of the reference compound

Free-Wilson example

Log 1/C = -0.30 [m-F] + 0.21 [m-Cl] + 0.43 [m-Br] + 0.58 [m-I] + 0.45 [m-Me] + 0.34 [p-F] + 0.77 [p-Cl] + 1.02 [p-Br] + 1.43 [p-I] + 1.26 [p-Me] + 7.82

[Figure: structure of the analog series, an N- and Br-containing scaffold with substituents X and Y, as the HCl salt; activity of analogs]

Problems: at least two substituent positions are necessary, and the model can only predict new combinations of the substituents used in the analysis.
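
To illustrate how such an equation is applied, the sketch below sums the group contributions quoted in the example equation above for a chosen analog. The function name is hypothetical, and the bracketed indicator terms are assumed to be 1 when the group is present and 0 otherwise.

```python
# Group contributions and reference value copied from the example equation.
CONTRIB = {
    "m-F": -0.30, "m-Cl": 0.21, "m-Br": 0.43, "m-I": 0.58, "m-Me": 0.45,
    "p-F": 0.34, "p-Cl": 0.77, "p-Br": 1.02, "p-I": 1.43, "p-Me": 1.26,
}
REFERENCE = 7.82  # constant term: activity of the unsubstituted reference

def free_wilson_log_inv_c(substituents):
    """Predicted log 1/C for an analog carrying the listed groups."""
    return REFERENCE + sum(CONTRIB[s] for s in substituents)

# Analog with m-Br and p-Cl: 7.82 + 0.43 + 0.77
example = free_wilson_log_inv_c(["m-Br", "p-Cl"])
print(round(example, 2))
```

A group outside the table raises a `KeyError`, which mirrors the limitation stated on the slide: only combinations of substituents already in the analysis can be predicted.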

Hansch Analysis

$\log(1/C) = a\pi + b\sigma + c$

where $\pi(X) = \log P_{RX} - \log P_{RH}$

and log P is the logarithm of the octanol/water partition coefficient

This is also a linear free energy relation


Applications of QSAR

• 1-Drug design

• 2-Prediction of Chemical toxicity

• 3-Prediction of environmental activity

• 4-Prediction of molecular properties

• 5-Investigation of retention mechanism

Steps in QSPR/QSAR

1. Structure Entry & Molecular Modeling

2. Descriptor Generation

3. Feature Selection

4. Construct Model (MLRA or CNN)

5. Model Validation

Data set selection

• 1-Structural similarity of studied molecules

• 2-Data collected under the same conditions

• 3-Data set should be as large as possible


INTRODUCTION to Molecular Descriptors

• Molecular descriptors are numerical values that characterize properties of molecules

• Molecular descriptors encode structural features of molecules as numerical values

• They vary in the complexity of the encoded information and in compute time

• Examples:

– Physicochemical properties (empirical)

– Values from algorithms, such as 2D fingerprints


Classical Classification of Molecular Descriptors

[Figure: 2-D structural formula of a polymer repeat unit]

Constitutional, Topological

2-D structural formula

Physicochemical

Geometrical

3-D shape and structure

Quantum Chemical

Hybrid descriptors

Topological Indexes: Examples

• Wiener Index

  – Counts the number of bonds between pairs of atoms and sums the distances between all pairs

• Molecular Connectivity Indexes

  – Randić branching index

    • Defines a “degree” of an atom as the number of adjacent non-hydrogen atoms

    • Bond connectivity value is the reciprocal of the square root of the product of the degrees of the two atoms in the bond

    • Branching index is the sum of the bond connectivities over all bonds in the molecule

  – Chi indexes: introduce valence values to encode sigma, pi, and lone pair electrons
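
Both indexes can be computed directly from the hydrogen-suppressed molecular graph. The following is a minimal sketch: the adjacency-dictionary representation, the function names, and the two butane isomers used as test molecules are illustrative choices, not taken from the slides.

```python
from collections import deque
from math import sqrt

def wiener_index(adj):
    """Sum of shortest-path bond counts over all unordered atom pairs."""
    total = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2                     # each pair was counted from both ends

def randic_index(adj):
    """Sum over bonds of 1 / sqrt(degree_i * degree_j)."""
    total = 0.0
    for u in adj:
        for v in adj[u]:
            if u < v:                     # visit each bond once
                total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
    return total

# Hydrogen-suppressed carbon skeletons, atoms numbered from 0.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane), wiener_index(isobutane))
print(randic_index(n_butane), randic_index(isobutane))
```

The more branched isomer gets the smaller value for both indexes, which is why these numbers are used to encode branching.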

Electronic descriptors

• Electronic interactions play a very important role in controlling molecular properties.

• Electronic descriptors are calculated to encode aspects of the structures that are related to the electrons

• Electronic interaction is a function of the charge distribution on a molecule

Physicochemical Properties Used in this QSAR

1. Liquid solubility Sw,L in mg/L and mmol/m^3

2. Octanol-water partition coefficient Kow

3. Liquid vapor pressure Pv,L in Pa

4. Henry’s Law constant Hc in Pa·m^3/mol

5. Boiling point


Feature Selection

• E.g. comparing faces first requires the identification of key features.

• How do we identify these?

• The same applies to molecules.

Objective feature selection

• After descriptors have been calculated for each compound, this set must be reduced to a set of descriptors that is as information-rich but as small as possible

1- Deleting constant or near-constant descriptors

2- Pair correlation cut-off selection

3- Cluster analysis

4- Principal component analysis

5- K correlation analysis
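
The first two steps of the list above can be sketched as a simple filter. This is only an illustration: the thresholds `min_std` and `max_abs_r`, the function name, and the rule of keeping the first descriptor of each correlated pair are all assumptions, and the demo matrix is synthetic.

```python
import numpy as np

def objective_filter(X, min_std=1e-8, max_abs_r=0.9):
    """Return indices of descriptor columns that survive both filters."""
    X = np.asarray(X, dtype=float)
    # Step 1: drop constant or near-constant descriptors.
    keep = [j for j in range(X.shape[1]) if X[:, j].std() > min_std]
    # Step 2: pair-correlation cut-off; of two highly correlated
    # descriptors, the one encountered first is kept.
    selected = []
    for j in keep:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_abs_r
               for k in selected):
            selected.append(j)
    return selected

# Synthetic demo: column 1 is constant, column 2 nearly duplicates column 0.
rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = rng.normal(size=50)
X_demo = np.column_stack([a, np.full(50, 3.0), 2.0 * a + 0.01 * b, b])
kept = objective_filter(X_demo)
print(kept)
```

In practice the choice of which member of a correlated pair to keep is often based on interpretability or ease of calculation rather than column order.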

Descriptive Statistics

Descriptor           N    Minimum   Maximum   Mean       Std. Deviation
homo                 55   .01       9.44      .6524      1.66861
lumo                 55   .02       708.00    13.2664    95.41298
dip                  55   .00       7.35      2.7035     2.06794
mw                   55   123.11    307.99    192.4207   42.41658
mia                  55   .02       .19       .0580      .03451
mib                  55   .00       .23       .0270      .03070
mic                  55   .00       312.00    5.6900     42.06771
polar                54   63.45     153.63    95.7878    23.58493
x0                   55   4.07      9.13      5.9576     1.24159
x1p                  55   2.20      4.68      3.1949     .76452
x2p                  55   1.41      4.56      2.3626     .74960
x3p                  55   .79       2.71      1.4072     .49032
x3c                  55   .10       1.14      .2799      .16722
x4p                  55   .43       1.90      .8358      .38795
x4c                  55   .14       1.79      .4958      .27697
noa                  55   12.00     28.00     17.3091    4.11804
pcpa                 55   .05       .58       .3319      .19432
pcna                 55   -.45      -.05      -.2652     .11673
edn                  55   4.05      6.37      5.2470     .99529
edp                  55   .75       6.95      2.5227     1.99339
dspn                 55   .98       6.94      2.2400     1.62828
shape                55   1.42      3.93      2.6579     .43353
volm                 55   106.12    218.34    146.2387   25.62153
surf                 55   129.62    262.24    175.1636   28.52871
s1zy                 55   44.02     80.88     57.0065    8.44310
s2zx                 55   22.66     56.08     31.9507    7.16801
s3xy                 55   18.74     38.74     25.0053    4.42347
ss1                  55   .57       .80       .7089      .05104
ss2                  55   .65       .92       .8291      .07153
ss3                  55   .64       .90       .8080      .05988
logp                 55   1.49      6.63      3.6971     1.19562
bcf                  54   1.02      5.62      3.1893     .84204
number               55   1.00      110.00    37.5636    33.22246
Valid N (listwise)   53


Variable reduction

• Principal Component Analysis

Principal Component

• $PC_1 = a_{1,1} x_1 + a_{1,2} x_2 + \dots + a_{1,n} x_n$

• $PC_2 = a_{2,1} x_1 + a_{2,2} x_2 + \dots + a_{2,n} x_n$

• Keep only those components that possess the largest variance

• PCs are orthogonal to each other
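
One common way to obtain the coefficients a_{k,j} is a singular value decomposition of the mean-centered descriptor matrix. The sketch below assumes centering only; in practice descriptors on very different scales are often autoscaled first. The function name and the random demo data are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Return loadings, scores and explained-variance fractions."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each descriptor
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]             # row k holds a_{k,1} ... a_{k,n}
    scores = Xc @ loadings.T                 # PC values for every compound
    explained = (s ** 2) / np.sum(s ** 2)    # variance fraction per component
    return loadings, scores, explained[:n_components]

# Random demo matrix: 30 compounds, 4 descriptors (hypothetical data).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(30, 4))
loadings, scores, explained = pca(X_demo, 2)
```

The components come out sorted by decreasing variance, so keeping the first few rows of `loadings` matches the rule of retaining the components with the largest variance, and the rows are mutually orthogonal by construction.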

Subjective Feature Selection

• The aim is to reach an optimal model

• 1-Search all possible models (Best MLR)

• 2-Forward, Backward & Stepwise methods

• 3-Genetic algorithm

• 4-Mutation and selection uncover models

• 5-Cluster significance analysis

• 6-Leaps & bounds regression

Feature Selection:

Most existing feature selection algorithms consist of:

Starting point in the feature space

Search procedure

Evaluation function

Criterion of stopping the search


Feature Selection:

Starting point in the feature space

- no features

- all features

- random subset of features

Forward Selection

• 1- Variables are sequentially entered into the model. The first variable considered for entry into the equation is the one with the largest positive or negative correlation with the dependent variable. This variable is entered into the equation only if it satisfies the criterion for entry.

• 2- If the first variable is entered, the independent variable not in the equation that has the largest partial correlation is considered next.

• 3- The procedure stops when there are no variables that meet the entry criterion.
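
The procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not the exact routine of any statistics package: the entry criterion is taken to be a fixed F-to-enter threshold (`f_to_enter=4.0` is an arbitrary choice), and the candidate with the largest partial correlation is found as the one giving the smallest residual sum of squares, which is equivalent.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an MLR model (with intercept) on `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, f_to_enter=4.0):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    current = rss(X, y, selected)            # intercept-only model
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        # Smallest RSS after entry == largest partial correlation with y.
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        new = rss(X, y, selected + [best])
        dof = n - len(selected) - 2          # residual d.o.f. after entry
        if dof <= 0:
            break
        f_stat = (current - new) / max(new / dof, 1e-300)
        if f_stat < f_to_enter:              # entry criterion not met: stop
            break
        selected.append(best)
        current = new
    return selected

# Synthetic demo: only columns 0 and 2 carry signal (hypothetical data).
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(60, 5))
y_demo = 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 2] + 0.1 * rng.normal(size=60)
chosen = forward_selection(X_demo, y_demo)
```

Column 0 enters first because it has the largest correlation with `y_demo`, followed by column 2; whether a noise column also passes the threshold depends on the random draw.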

Forward Selection example

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .704a   .496       .486                .59485
2       .762b   .581       .564                .54785
3       .810c   .655       .634                .50184
4       .834d   .695       .670                .47674

a. Predictors: (Constant), logp
b. Predictors: (Constant), logp, mw
c. Predictors: (Constant), logp, mw, dip
d. Predictors: (Constant), logp, mw, dip, mia

Backward Elimination

• 1- All variables are entered into the equation and then sequentially removed.

• 2- The variable with the smallest partial correlation with the dependent variable is considered first for removal. If it meets the criterion for elimination, it is removed.

• 3- After the first variable is removed, the variable remaining in the equation with the smallest partial correlation is considered next.

• 4- The procedure stops when there are no variables in the equation that satisfy the removal criteria.

Stepwise

• At each step, the independent variable not in the equation that has the smallest probability of F is entered, if that probability is sufficiently small. Variables already in the regression equation are removed if their probability of F becomes sufficiently large. The method terminates when no more variables are eligible for inclusion or removal.

Stepwise Example

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .704a   .496       .486                .59485
2       .762b   .581       .564                .54785
3       .810c   .655       .634                .50184
4       .834d   .695       .670                .47674
5       .824e   .679       .660                .48403

a. Predictors: (Constant), logp
b. Predictors: (Constant), logp, mw
c. Predictors: (Constant), logp, mw, dip
d. Predictors: (Constant), logp, mw, dip, mia
e. Predictors: (Constant), logp, dip, mia

Forward, Backward & Stepwise variable selection methods

• Advantages

• Fast and simple

• Available in most statistical packages

• Limitation

• Risk of local minima

Genetic Algorithm


Search Space

Definition

A genetic algorithm is a general-purpose search and optimization method, based on genetic principles and Darwin’s law, that is applicable to a wide variety of problems.

Darwin’s rules

• Survival of the fittest individuals

• Recombination

• Mutation

Biological background

• Chromosome

• Gene

• Reproduction

• Mutation

• Fitness

GA basic operation

• Population generation (chromosomes)

• Selection (according to fitness)

• Recombination and mutation (offspring)

• Repetition

GA flow chart

• Initialize: population generation

• Evaluate: compute fitness for each chromosome

• Exploit: perform natural selection

• Explore: recombination & mutation operations

Binary Encoding

Chromosome A 1 0 1 1 0 0 1 1 1 0 0 0 0 1

Chromosome B 0 0 1 0 0 1 1 1 0 1 0 0 1 1

Every chromosome is a string of bits, each 0 or 1

Selection

The best chromosomes should survive and create new offspring.

• Roulette wheel selection

• Rank selection

• Steady state selection

Roulette wheel selection

Fitness: 1 > 2 > 3 > 4
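
In roulette wheel selection each chromosome is picked with probability proportional to its fitness. The sketch below is one minimal way to implement that idea; it assumes non-negative fitness values, and the function name and the demo fitness values are hypothetical.

```python
import random

def roulette_select(fitness, rng=random):
    """Return an index chosen with probability proportional to fitness."""
    total = sum(fitness)
    spin = rng.uniform(0.0, total)       # where the wheel stops
    running = 0.0
    for i, f in enumerate(fitness):
        running += f
        if spin <= running:
            return i
    return len(fitness) - 1              # guard against floating-point round-off

# Four chromosomes with fitness 1 > 2 > 3 > 4, as on the slide.
rng_demo = random.Random(0)
fitness_demo = [4.0, 3.0, 2.0, 1.0]
counts = [0, 0, 0, 0]
for _ in range(10000):
    counts[roulette_select(fitness_demo, rng_demo)] += 1
print(counts)
```

Fitter chromosomes are drawn more often, but weaker ones still have a chance, which keeps some diversity in the population.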

Crossover (binary encoding)

* Single point crossover

11001011 + 11011111 = 11001111

* Two point crossover

11001011 + 11011111 = 11011111

Mutation

* Bit inversion (binary encoding)

11001001 => 10001001

* Ordering change (permutation encoding)

(1 2 3 4 5 6 8 9 7) => (1 8 3 4 5 6 2 9 7)
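
The crossover and mutation examples on these slides can be reproduced with simple string operations. In the sketch below the function names and the cut points (5 for single point, 2 and 6 for two point) are assumptions chosen so that the results match the slide; the slides do not state where the cuts were made.

```python
def single_point_crossover(p1, p2, point):
    """Head of parent 1 up to `point`, tail of parent 2 after it."""
    return p1[:point] + p2[point:]

def two_point_crossover(p1, p2, start, end):
    """Parent 1 with the segment [start, end) taken from parent 2."""
    return p1[:start] + p2[start:end] + p1[end:]

def bit_inversion(chrom, index):
    """Flip the bit at `index` in a binary-encoded chromosome."""
    flipped = "1" if chrom[index] == "0" else "0"
    return chrom[:index] + flipped + chrom[index + 1:]

def swap_positions(perm, i, j):
    """Ordering change for permutation encoding: exchange two entries."""
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return tuple(out)

print(single_point_crossover("11001011", "11011111", 5))
print(two_point_crossover("11001011", "11011111", 2, 6))
print(bit_inversion("11001001", 1))
print(swap_positions((1, 2, 3, 4, 5, 6, 8, 9, 7), 1, 6))
```

When a binary chromosome is used for descriptor selection, a natural reading is that each bit marks whether the corresponding descriptor is included in the model, although the slides do not spell this out.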

GA flow chart

Start → Population generation → Fitness → Selection → Crossover → Mutation → Replace → Test → End


Parameters of GA

• Crossover rate

• Mutation rate

• Population size

• Selection type

• Encoding

• Crossover and mutation type

Advantages of GA

• Parallelism

• Provide a group of potential solutions

• Easy to implement

• Search globally, so they are less likely to be trapped in local optima (a global optimum is not guaranteed)

How many descriptors can be used in a QSAR model?

Rule of thumb:

- At least 5 data points (molecules) must be present in the model per descriptor

Otherwise the possibility of finding a coincidental correlation is too high
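
The rule of thumb translates into a one-line upper bound on model size. A minimal sketch follows; the function name is hypothetical, and the value 55 is the number of compounds (N) reported in the descriptive statistics table earlier in the deck.

```python
def max_descriptors(n_compounds, points_per_descriptor=5):
    """Largest number of descriptors allowed by the data-points-per-descriptor rule."""
    return n_compounds // points_per_descriptor

print(max_descriptors(55))
```

For a 55-compound data set this gives at most 11 descriptors, comfortably above the 3 or 4 descriptors in the stepwise models shown earlier.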


Questions?