Data Mining in Bioinformatics
Day 8: Graph Mining for Chemoinformatics and Drug Discovery
Chloé-Agathe Azencott
Machine Learning & Computational Biology Research Group
MPIs Tübingen
C.-A. Azencott Graph Mining for Chemoinformatics February 16, 2012 1
Drug Discovery
Modern Therapeutic Research
From serendipity to rationalized drug design
Ancient Greeks treat infections with mould
[Figure: chemical structure diagram; image captioned "Biapenem in PBP-1A"]
Drug Discovery Process
Chemoinformatics
How can computer science help?
→ Chemoinformatics!
“... the mixing of information resources to transform data into information, and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimisation.” – F. K. Brown
“... the application of informatics methods to solve chemical problems.” – J. Gasteiger and T. Engel
The Chemical Space
• 10^60 possible small molecules
• 10^22 stars in the observable universe
(Slide courtesy of Matthew A. Kayala)
Virtual HTS Measuring Performance
Inhibition of DHFR: ROC Curves
method    AUC
IRV       0.71
SVM       0.59
kNN       0.59
MAX-SIM   0.54

[Figure: ROC curves (x-axis: FPR, y-axis: TPR, both from 0.0 to 1.0) for RANDOM, IRV, SVM and MAXSIM]
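As a rough illustration of how an AUC value like those in the table can be obtained from a ranked list of predictions, the sketch below sweeps a threshold down the ranking, records one (FPR, TPR) point per distinct score, and integrates the curve with the trapezoidal rule. The labels and scores are made-up illustration data, not the DHFR results, and the function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(labels, scores):
    """Return the (FPR, TPR) points of the ROC curve for binary labels (1/0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up example: 3 actives (1) and 3 inactives (0).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(f"AUC = {auc(roc_points(labels, scores)):.3f}")  # prints AUC = 0.889
```

A ranking that places every active above every inactive gives an AUC of 1.0, while a ranking that carries no information (for example, all scores equal) gives 0.5, which corresponds to the RANDOM diagonal.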
Precision-Recall Curves
• Precision = # True Positives / # Predicted Positives
• Recall = sensitivity

[Figure: precision-recall plot (x-axis: Recall, ticks 0, 1/4, 2/4, 3/4, 1; y-axis: Precision, ticks 0, 1/5, 2/5, 3/5, 4/5, 1); points labelled 0.95, 0.94, 0.9, 0.81, 0.73, 0.52, 0.2, 0.17, 0.12, 0.09; curves: perfect, real]
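The two definitions above can be sketched in code by walking down a ranked list and computing precision and recall after each prediction. This is only a minimal sketch: the scores are the ten values shown in the plot, but the labels are an assumption made for illustration (the transcript does not say which examples are true positives), and the function name `pr_points` is hypothetical.

```python
def pr_points(labels, scores):
    """Return (recall, precision) after each position of the ranked list."""
    n_pos = sum(labels)
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    tp = 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        precision = tp / k      # true positives / predicted positives
        recall = tp / n_pos     # true positives / all actual positives
        points.append((recall, precision))
    return points


scores = [0.95, 0.94, 0.9, 0.81, 0.73, 0.52, 0.2, 0.17, 0.12, 0.09]
# ASSUMED labels (4 positives), chosen only to illustrate the computation.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for recall, precision in pr_points(labels, scores):
    print(f"recall = {recall:.2f}  precision = {precision:.2f}")
```

With a perfect ranking, all positives come first, so precision stays at 1 until recall reaches 1; a real ranking loses precision each time a negative is ranked above a remaining positive.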
Other Graph Mining Applications
Other Applications
• Database indexing and search
• Prediction of 3D structures of small compounds and proteins
• Reaction prediction