Introduction on QSAR and modelling of physico-chemical and biological properties
Alessandra Roncaglioni – IRFMN [email protected]
Problems and approaches in computational chemistry
External validation - prediction ability
◦ Test set representative of training set
◦ Tropsha criteria
Applicability domain
Cross validation
Leave One Out
All the data but one compound are used for fitting
Predict the excluded sample
Repeat it for all samples
Calculate Q2 or R2cv similarly to R2 on the basis of these predictions
Problem: too optimistic if there are many samples
Leave Many Out
Use larger groups to obtain a more realistic outcome
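The leave-one-out steps above can be sketched in a few lines. This is only an illustration under stated assumptions: a single descriptor, an ordinary least-squares line as the model, and Q2 computed as 1 - PRESS/SS with the overall mean of Y as reference (other conventions exist). The names `fit_line` and `q2_leave_one_out` and the data are made up for the example.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def q2_leave_one_out(x, y):
    """Q2 = 1 - PRESS / SS_total, each y predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(x)):
        # Fit on all compounds except compound i
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        # Predict the excluded compound and accumulate the squared error
        press += (y[i] - (a + b * x[i])) ** 2
    my = mean(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total


# Made-up descriptor values and responses, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2 = q2_leave_one_out(x, y)
print(f"Q2 (leave-one-out) = {q2:.3f}")
```

A leave-many-out variant would exclude a group of compounds in each round instead of a single one, keeping the rest of the loop unchanged.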
Bootstrapping
Bootstrapping simulates what happens when the data set with n objects is randomly resampled
K groups of n objects each are generated by random drawing, so that some objects are repeated
The model obtained on each group is used to predict the values for the excluded samples
From each bootstrap sample the statistical parameter of interest is calculated
The estimate of accuracy is obtained as the average of all calculated statistics
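A possible sketch of the resampling loop described above. The assumptions are mine, not the slides': one descriptor with distinct values, a least-squares line as the model, and the RMSE on the excluded objects as the "statistical parameter of interest". The names `fit_line` and `bootstrap_rmse` and the data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_rmse(x, y, k=200, seed=0):
    """Average RMSE on the excluded objects over k bootstrap samples of size n."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(k):
        # Draw n objects with replacement: some are repeated, some are left out
        drawn = [rng.randrange(n) for _ in range(n)]
        drawn_set = set(drawn)
        excluded = [i for i in range(n) if i not in drawn_set]
        # Skip degenerate samples (nothing excluded, or fewer than two distinct objects)
        if not excluded or len(drawn_set) < 2:
            continue
        a, b = fit_line([x[i] for i in drawn], [y[i] for i in drawn])
        squared_errors = [(y[i] - (a + b * x[i])) ** 2 for i in excluded]
        scores.append(sqrt(mean(squared_errors)))
    # The accuracy estimate is the average of the per-sample statistics
    return mean(scores)


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8, 7.2, 7.9]
estimate = bootstrap_rmse(x, y)
print(f"bootstrap RMSE estimate = {estimate:.3f}")
```

Fixing the seed makes the estimate reproducible; in practice K is chosen large enough that the average no longer changes appreciably.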
Y-scrambling
Randomly permute the Y responses several times while the X variables are kept in the same order
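The permutation test can be sketched as follows. Models refitted on scrambled responses are expected to perform clearly worse than the real one; if they do not, the original fit may be a chance correlation. The example assumes a single descriptor and uses R2 of a least-squares line as the statistic; `r_squared`, `y_scrambling` and the data are illustrative names and values, not taken from the slides.

```python
import random
from statistics import mean


def r_squared(x, y):
    """R2 of the least-squares line y = a + b*x (the squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def y_scrambling(x, y, n_rounds=100, seed=0):
    """Return R2 of the real model and the R2 values obtained with permuted Y."""
    rng = random.Random(seed)
    r2_real = r_squared(x, y)
    r2_scrambled = []
    for _ in range(n_rounds):
        y_perm = list(y)      # copy, so the original responses are untouched
        rng.shuffle(y_perm)   # permute Y; X keeps its original order
        r2_scrambled.append(r_squared(x, y_perm))
    return r2_real, r2_scrambled


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8, 7.2, 7.9]
r2_real, r2_scrambled = y_scrambling(x, y)
print(f"real R2 = {r2_real:.3f}, mean scrambled R2 = {mean(r2_scrambled):.3f}")
```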
Tropsha criteria*
a) $Q^2 > 0.5$
b) $R^2 > 0.6$
c) $(R^2 - R^2_0)/R^2 < 0.1$ and $0.85 < k < 1.15$, or $(R^2 - R'^2_0)/R^2 < 0.1$ and $0.85 < k' < 1.15$
d) if (c) is not fulfilled, then $|R^2_0 - R'^2_0| < 0.3$
($k$ = slope of the regression line)
($R^2_0$ = $R^2$ related to $y = kx$)
* A. Golbraikh, M. Shen, Z. Xiao, Y.D. Xiao, K.-H. Lee, A. Tropsha, Rational selection of training and test sets for the development of validated QSAR models, JCAMD, 17 (2003) 241-253.
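As a rough sketch, the four conditions can be checked mechanically once the statistics are available. The function below takes the values as precomputed inputs (how R2_0, R'2_0, k and k' are obtained is left to the cited paper) and reads (d) as a fallback applied when (c) fails, which is how the slide words it. The name `tropsha_check` and the numbers in the usage lines are made up.

```python
def tropsha_check(q2, r2, r2_0, r2_0_prime, k, k_prime):
    """Evaluate conditions (a)-(d) as listed on the slide; returns a dict of booleans."""
    a = q2 > 0.5
    b = r2 > 0.6
    if r2 > 0:
        c = (
            ((r2 - r2_0) / r2 < 0.1 and 0.85 < k < 1.15)
            or ((r2 - r2_0_prime) / r2 < 0.1 and 0.85 < k_prime < 1.15)
        )
    else:
        # The ratio in (c) is undefined for R2 <= 0; (b) fails in that case anyway
        c = False
    d = abs(r2_0 - r2_0_prime) < 0.3
    # Assumption: (d) is only needed when (c) is not fulfilled
    accepted = a and b and (c or d)
    return {"a": a, "b": b, "c": c, "d": d, "accepted": accepted}


# Illustrative values only
good = tropsha_check(q2=0.72, r2=0.81, r2_0=0.80, r2_0_prime=0.65, k=0.98, k_prime=1.01)
poor = tropsha_check(q2=0.40, r2=0.81, r2_0=0.80, r2_0_prime=0.65, k=0.98, k_prime=1.01)
print(good["accepted"], poor["accepted"])
```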
Applicability domain
The applicability domain of a (Q)SAR model is the response and chemical structure space in which the model makes predictions with a given reliability.*
* Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. ATLA, 33:1-19, 2005.
[Figure: training data and new compounds]
AD assessment
Similarity measures:
Response range (span of activity data)
Chemometric treatment of the descriptor space
Fragment-based approaches
Chemometric Methods
Descriptor range-based
Geometric methods
Distance-based
Probability density distribution
[Figure: scatter plots of Descr. 2 (0-12) against Descr. 1 (0-20)]
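Two of the approaches listed above, descriptor range-based and distance-based, lend themselves to a short sketch. The choices here are assumptions made for illustration: the range-based check uses the min-max box of the training descriptors, and the distance-based check uses the Euclidean distance to the training centroid with the largest training distance as threshold. Function names and data are hypothetical, and descriptors are assumed to be on comparable scales.

```python
from math import dist


def in_descriptor_ranges(train, query):
    """True if every descriptor of `query` lies within the training min-max range."""
    lows = [min(col) for col in zip(*train)]
    highs = [max(col) for col in zip(*train)]
    return all(lo <= v <= hi for lo, v, hi in zip(lows, query, highs))


def within_centroid_distance(train, query):
    """True if `query` is no farther from the training centroid than the farthest training object."""
    centroid = [sum(col) / len(col) for col in zip(*train)]
    threshold = max(dist(row, centroid) for row in train)
    return dist(query, centroid) <= threshold


# Made-up training set with two descriptors (Descr. 1, Descr. 2)
train = [(2.0, 3.0), (4.0, 5.0), (6.0, 4.0), (8.0, 7.0), (10.0, 6.0)]
inside = (5.0, 5.0)
outside = (18.0, 11.0)
for compound in (inside, outside):
    print(compound, in_descriptor_ranges(train, compound), within_centroid_distance(train, compound))
```

The range-based box is the simplest check but also covers empty corners of the descriptor space; the distance-based check depends strongly on the chosen distance and threshold.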
AMBIT software
http://ambit.acad.bg/main.php
Example of AD assessment
[Figure: bar charts for Test set 1 and Test set 2; y axis: % of compounds (0-100); bars: within 1 log unit and within 2 log units]
% of all compounds in the test set predicted within one or two log units without assessing the AD
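The statistic plotted here is simple to compute. The sketch below assumes observed and predicted values are already expressed in log units; `percent_within` and the numbers are invented for the example.

```python
def percent_within(observed, predicted, tolerance):
    """Percentage of compounds whose absolute prediction error is at most `tolerance`."""
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return 100.0 * hits / len(observed)


# Made-up test set values in log units
observed = [1.0, 2.5, 3.0, 4.2, 5.0]
predicted = [1.4, 2.0, 4.5, 4.0, 7.5]
within_1 = percent_within(observed, predicted, 1.0)
within_2 = percent_within(observed, predicted, 2.0)
print(within_1, within_2)
```

Restricting the same calculation to the compounds that fall inside the applicability domain would show how much the AD assessment changes these percentages.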
Summary - health information
Unsafe for use in cosmetics
Illegal ingredients (EU)
Illegal ingredients (US)
Unsafe in infant products
Potential for harmful impurities
Ingredient(s) not disclosed on label
Sunburn/skin cancer risk
Estrogenic chemicals and other endocrine disruptors
Irritants - eye, skin, or lungs
Fragrance
Persistent/bioaccumulative
Immune system toxicants (allergies, sensitization)
Penetration enhancers
Safety limits on use/purity/manufacturing
Classified as toxic
Potential for infectious disease risk
Hazards for occupational exposures
Industry safety warnings
Illegal for use in food
Illegal for use in drugs
Insufficient safety data
Wildlife/environmental toxicity
Ingredient(s) not assessed for safety
No safety information in 37 regulatory/toxicity data sources
Constraints: time-consuming, expensive, ethical issues
REACH
Registration, Evaluation and Authorisation of CHemicals
Enterprises that manufacture or import more than one tonne of a chemical substance per year would be required to register it in a central database
It is estimated that testing the approximately 30,000 existing substances would cost about 2.1 billion € in total over the next 11 years
Promotion of non-animal testing
REACH
Use of (Q)SARs, read-across      Additional cost
Minimal use                      2.3 billion Euro
Average use (likely scenario)    1.5 billion Euro
Maximal use                      1.1 billion Euro
Cost-saving potential: € 800-1130 million
Pedersen et al. (2003). Assessment of additional testing needs under REACH.
REACH
Use of (Q)SARs, read-across      Additional animals
Minimal use                      3.9 million
Average use (likely scenario)    2.6 million
Maximal use                      2.1 million
Animal-saving potential: 1.3-1.9 million animals
Van der Jagt et al. (2004). Alternative approaches can reduce the use of test animals under REACH.
OECD principles for QSAR validation
Efforts to improve transparency and acceptability of in silico methods:
A defined endpoint
An unambiguous algorithm
A defined domain of applicability
Appropriate measures of goodness-of-fit, robustness and predictivity