Advances in Cheminformatics
Applications in Biotechnology, Drug Design and Bioseparations
Curt M. Breneman
Department of Chemistry and Chemical Biology/Center for Biotechnology and Interdisciplinary Studies
Rensselaer Polytechnic Institute
Presented at Siena College, NY 4/29/05
Synergies
– Complementary to topological descriptors
Surface Property Distribution Histograms (RECON/TAE) Descriptors
Molecular surface property distributions can be represented as RECON/TAE histogram bin descriptors
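A minimal sketch of how a surface property distribution could be turned into histogram bin descriptors. The bin range, the number of bins, and the area weighting are illustrative assumptions; the actual RECON/TAE binning scheme is not described on this slide.

```python
def histogram_descriptors(values, areas, lo, hi, n_bins):
    """Area-weighted histogram of a surface property over [lo, hi].

    values: property value at each surface point (hypothetical input)
    areas:  surface area associated with each point (hypothetical input)
    Returns one descriptor per bin: the total area falling in that bin.
    """
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    width = (hi - lo) / n_bins
    bins = [0.0] * n_bins
    for value, area in zip(values, areas):
        index = int((value - lo) / width)
        # Clamp out-of-range values into the first or last bin.
        index = min(max(index, 0), n_bins - 1)
        bins[index] += area
    return bins


descriptors = histogram_descriptors(
    values=[-0.9, -0.1, 0.1, 0.95],
    areas=[1.0, 2.0, 3.0, 4.0],
    lo=-1.0, hi=1.0, n_bins=4,
)
```

Each molecule then yields a fixed-length descriptor vector regardless of how many surface points it has, which is what makes the histogram form convenient for model building.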
16 coefficients from S7 and D7 portions of the WCD vector represent surface property densities with >95% accuracy.
1024 raw wavelet coefficients capture PIP distribution on molecular surface.
Wavelet Decomposition:
– Creates a set of coefficients that represent a waveform.
– Small coefficients may be omitted to compress data.
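The two points above can be sketched with a Haar wavelet, used here only as a simple stand-in; the wavelet family actually used for the descriptors is not stated on this slide. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal):
    """Orthonormal Haar transform; the length must be a power of two.

    Returns [approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        sums = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
        diffs = [(approx[i] - approx[i + 1]) / SQRT2 for i in pairs]
        details = diffs + details  # coarser levels go in front
        approx = sums
    return approx + details


def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    approx = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        finer = []
        for s, d in zip(approx, level):
            finer.append((s + d) / SQRT2)
            finer.append((s - d) / SQRT2)
        approx = finer
    return approx


def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (compression)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


signal = [4, 6, 10, 12, 8, 6, 5, 5]
coeffs = haar_decompose(signal)
exact = haar_reconstruct(coeffs)
coarse = haar_reconstruct(keep_largest(coeffs, 1))
```

Keeping all coefficients reproduces the signal; keeping only the largest one collapses it to its mean, and intermediate values of `k` trade accuracy for compactness.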
Wavelet Representations of High-Resolution Molecular Surface Property Densities.
(1, 2, 10 and 20 Coefficient Decompositions)
Wavelet Representations of High-Resolution Molecular Surface Property Densities.
Molecular Shape Encoding
Karthigeyan Nagarajan, Randy Zauhar, and William J. Welsh, “Enrichment of Ligands for the Serotonin Receptor Using the Shape Signatures Approach” J. Chem. Inf. Model., 45, 49-57 (2005)
Curt M. Breneman, C. Matthew Sundling, N. Sukumar, Lingling Shen, William P. Katt and Mark J. Embrechts, “New developments in PEST shape/property hybrid descriptors” J. Computer-Aided Mol. Design, 17, 231–240, (2003)
PEST (Property-Encoded Surface Translation)
– Adds shape information to encode the spatial relationships of surface properties
PEST Molecular Ray Tracing Algorithm
PEST Property-Encoded Rays
PEST Hybrid Shape/Property Histogram Convergence: Four sets of initial conditions
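A toy illustration of the general idea behind a hybrid shape/property histogram, not the PEST implementation. It assumes an ellipsoid in place of a molecular surface, specular internal reflection, and a made-up surface property (the scaled z coordinate); every name and parameter below is hypothetical.

```python
import math
import random


def trace_histogram(axes, n_segments, n_len_bins, n_prop_bins, seed=0):
    """Bounce a ray inside an ellipsoid and bin (segment length, property)."""
    rng = random.Random(seed)
    a2 = [s * s for s in axes]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def unit(v):
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    def normal(p):
        # Outward normal: normalized gradient of x^2/a^2 + y^2/b^2 + z^2/c^2.
        return unit([p[k] / a2[k] for k in range(3)])

    def surface_property(p):
        # Hypothetical stand-in for a surface property, in [-1, 1].
        return p[2] / axes[2]

    start = unit([rng.gauss(0, 1) for _ in range(3)])
    p = [axes[k] * start[k] for k in range(3)]
    d = unit([rng.gauss(0, 1) for _ in range(3)])
    if dot(d, normal(p)) > 0:  # make the first ray point inward
        d = [-x for x in d]

    max_len = 2.0 * max(axes)  # longest possible chord
    hist = [[0] * n_prop_bins for _ in range(n_len_bins)]
    for _ in range(n_segments):
        # Non-zero root of the ray/ellipsoid quadratic for a start on the surface.
        qa = sum(d[k] * d[k] / a2[k] for k in range(3))
        qb = 2.0 * sum(p[k] * d[k] / a2[k] for k in range(3))
        t = -qb / qa
        q = [p[k] + t * d[k] for k in range(3)]
        # Re-project onto the surface to limit floating-point drift.
        scale = math.sqrt(sum(q[k] * q[k] / a2[k] for k in range(3)))
        q = [x / scale for x in q]

        i = int(t / max_len * n_len_bins)
        j = int((surface_property(q) + 1.0) / 2.0 * n_prop_bins)
        i = min(max(i, 0), n_len_bins - 1)
        j = min(max(j, 0), n_prop_bins - 1)
        hist[i][j] += 1

        n = normal(q)
        dn = dot(d, n)
        d = unit([d[k] - 2.0 * dn * n[k] for k in range(3)])  # specular reflection
        p = q
    return hist


hist = trace_histogram((1.0, 1.5, 2.0), 500, 8, 4, seed=1)
```

Running the sketch with several seeds and more segments gives a rough feel for histogram convergence from different initial conditions, in the spirit of the slide title above.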
Machine Learning and Model Building
Model Building and Validation
[Figure: model building and validation workflow. Labels: DATASET, Training set, Test set, Bootstrap sample k, Training, Validation, Learning Model, Tuning/Prediction, Predictive Model, Prediction]
Y-scrambling model validation!
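A minimal sketch of two validation ideas named on this slide: a bootstrap split into training and validation indices, and Y-scrambling. The one-variable least-squares model is only a placeholder for whatever learning method is being validated, and all function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def bootstrap_split(n, rng):
    """Sample n indices with replacement; unsampled indices form the validation set."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    validation = [i for i in range(n) if i not in chosen]
    return train, validation


def y_scramble_scores(xs, ys, n_rounds, rng):
    """Refit the model on shuffled responses and collect the R^2 values."""
    scores = []
    for _ in range(n_rounds):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        scores.append(r_squared(xs, shuffled, *fit_line(xs, shuffled)))
    return scores


rng = random.Random(0)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
real_score = r_squared(xs, ys, *fit_line(xs, ys))
scrambled = y_scramble_scores(xs, ys, 25, rng)
train_idx, val_idx = bootstrap_split(len(xs), rng)
```

If the scrambled scores come close to the real score, the model is probably fitting chance correlations rather than a real structure-property relationship.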
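Support Vector Regression
[Figure: ε-insensitive tube around the regression line. Labels: $2\varepsilon$, $\xi$, $\xi^*$, $f(x) = wx + b + \varepsilon$, $f(x) = wx + b - \varepsilon$]
$f(x) = wx + b$
Empirical error, ε-insensitive loss function:
$L_\varepsilon(x) = \max(0,\ |y - f(x)| - \varepsilon)$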
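The ε-insensitive loss can be written in a few lines. This is a sketch with hypothetical function names, using a linear model with a weight vector `w` and offset `b`.

```python
def predict(w, b, x):
    """Linear hypothesis f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def eps_insensitive_loss(y, fx, eps):
    """Zero inside the tube |y - f(x)| <= eps, linear outside it."""
    return max(0.0, abs(y - fx) - eps)


inside = eps_insensitive_loss(1.0, 1.05, eps=0.1)   # error smaller than eps
outside = eps_insensitive_loss(2.0, 1.0, eps=0.25)  # error of 1.0, tube of 0.25
```

Errors smaller than ε cost nothing, so only points outside the tube influence the fit.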
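$$
\begin{aligned}
\min\quad & \sum_{i=1}^{n} (u_i + v_i) + C\nu\varepsilon + \frac{C}{l} \sum_{j=1}^{l} (\xi_j + \xi_j^*) \\
\text{s.t.}\quad & y_j - \sum_{i=1}^{n} (u_i - v_i)\,x_{ji} - b \le \varepsilon + \xi_j, \\
& \sum_{i=1}^{n} (u_i - v_i)\,x_{ji} + b - y_j \le \varepsilon + \xi_j^*, \\
& u_i,\ v_i,\ \xi_j,\ \xi_j^*,\ \varepsilon \ge 0, \quad j = 1, \dots, l, \quad i = 1, \dots, n
\end{aligned}
$$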
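A short note on the variables $u$ and $v$ in the program above. The reading assumed here is that they split the weight vector into non-negative parts, $w = u - v$; the slide does not state this explicitly. Under that assumption the sum of $u_i + v_i$ plays the role of the l1-norm of $w$, which is what keeps the problem linear:

```latex
w_i = u_i - v_i, \quad u_i, v_i \ge 0
\;\Longrightarrow\;
|w_i| \le u_i + v_i,
\quad \text{with equality when } u_i v_i = 0,
\qquad \text{so} \qquad
\|w\|_1 = \sum_{i=1}^{n} |w_i|
        = \min_{\substack{u, v \ge 0 \\ u - v = w}} \sum_{i=1}^{n} (u_i + v_i).
```

Because the objective is being minimized, any optimal solution has $u_i v_i = 0$ for every $i$, so the linear term equals the l1-norm at the optimum.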
Linear hypotheses
Minimize: Empirical error + Complexity
$\min\ C \sum_{i=1}^{m} L_\varepsilon(x_i) + \|w\|_1$
Complexity control, l1-norm of the weight vector:
$\|w\|_1 = \sum_{i=1}^{n} |w_i|$
[Figure: l1-norm vs. l2-norm]
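A sketch of the regularized objective, empirical error plus l1-norm complexity, evaluated for a given weight vector. The function names and the toy data are hypothetical; the l2-norm is included only for comparison with the figure labels.

```python
import math


def l1_norm(w):
    return sum(abs(wi) for wi in w)


def l2_norm(w):
    return math.sqrt(sum(wi * wi for wi in w))


def svr_objective(w, b, samples, targets, eps, c):
    """C * (sum of eps-insensitive losses) + l1-norm of w."""
    total_loss = 0.0
    for x, y in zip(samples, targets):
        fx = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_loss += max(0.0, abs(y - fx) - eps)
    return c * total_loss + l1_norm(w)


w = [3.0, -4.0]
objective = svr_objective(
    w, b=0.0,
    samples=[[1.0, 0.0], [0.0, 1.0]],
    targets=[3.5, -4.0],
    eps=0.1, c=2.0,
)
```

The l1 penalty grows linearly in each weight, so minimizing it tends to drive some weights exactly to zero, which the l2-norm does not do.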
hERG: ROC Curve Comparisons
Leave-one-out results from different models
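A minimal sketch of how leave-one-out scores can be summarized by the area under a ROC curve. The one-dimensional nearest-mean scorer and the toy data are placeholders, not the hERG models compared on this slide.

```python
def roc_auc(scores, labels):
    """Fraction of (active, inactive) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def leave_one_out_scores(xs, labels):
    """Score each sample with a model built from all the other samples.

    Hypothetical scorer: distance to the inactive mean minus distance to
    the active mean. Each class needs at least two members.
    """
    scores = []
    for i, x in enumerate(xs):
        rest = [(v, y) for j, (v, y) in enumerate(zip(xs, labels)) if j != i]
        active = [v for v, y in rest if y == 1]
        inactive = [v for v, y in rest if y == 0]
        mean_active = sum(active) / len(active)
        mean_inactive = sum(inactive) / len(inactive)
        scores.append(abs(x - mean_inactive) - abs(x - mean_active))
    return scores


xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
labels = [0, 0, 0, 1, 1, 1]
auc = roc_auc(leave_one_out_scores(xs, labels), labels)
```

Because every sample is scored by a model that never saw it, the resulting AUC is a less optimistic estimate than one computed on the training data.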
Electron Density-Derived molecular property descriptors contain valuable physicochemical information
TAE descriptors are useful for building virtual high-throughput screening models (ADME, bioassay)
Predictive models can be built using TAE and PEST descriptors
Proteins (or protein binding sites) may be characterized using Protein PEST techniques
Current Software
RECON 5.8 + Analyze w/Outlier detection
– RAD
– Fast KPLS test set mode with low memory footprint
RECON for MOE
– Drop-in interactive or batch RECON 5.8 for MOE 2003
RECON 2001 for protein characterization
– Property moment descriptors (Cramer)
– Binding site/ligand scoring using Universal Descriptor Space (Tropsha)
TAE/DIXEL
– DNA Characterization and bioinformatics (Lawrence)
PEST (Compatible with Gaussian or Jaguar 5.0)
– PAD
– WSAD
– Wavelets
ACKNOWLEDGMENTS
Members of the DDASSL group
– Breneman Research Group (RPI Chemistry)
  N. Sukumar
  M. Sundling
  C. Whitehead (Pfizer)
  L. Shen
  L. Lockwood (Albany Molecular)
  M. Song
  D. Zhuang
  W. Katt
  Q. Luo
– Embrechts Research Group (RPI DSES)
– Tropsha Research Group (UNC Chapel Hill)
– Bennett Research Group (RPI Mathematics)
  Jinbo Bi
Collaborators:
– Lawrence Research Group (NYS Wadsworth Labs)
  Inna Vitol
– Cramer Research Group (RPI Chemical Engineering)