Shape and Color Clustering with SAESAR Norah E. MacCuish, John D. MacCuish, and Mitch Chapman Mesa Analytics & Computing, Inc.
Mar 27, 2015
ABSTRACT
SAESAR identifies potentially interesting patterns in shape and color space for leads from HTS screening data. Analyses of several public datasets are described, along with a successful analysis in an industrial setting.
SAESAR Features
• Data Exploration, Unsupervised and Supervised Learning with Shape, Electrostatics, and 2D Structure and Properties
• Powerful OpenEye Scientific Software and Mesa Analytics & Computing tools with Visualization and 2D and 3D depictions
• Clustering
  • Taylor’s (symmetric, asymmetric, non-disjoint, disjoint versions)
  • Hierarchical (RNN implementations of Ward’s, Complete Link, Group Average)
• Conformer Generation
  • OMEGA
  • User supplied
• Modeling - Model Builder
  • Classification
  • Linear and Quadratic discrimination
  • KNN
• Example Tasks
  • Find Key Shapes
  • Find Key Structures
  • Find Key Color Groups
  • Generate Predictive Model with Shape, Electrostatics, Color, 2D Structure, other variables
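As a rough illustration of the disjoint version of Taylor's clustering listed above, the sketch below works on a precomputed similarity matrix. It is not SAESAR's implementation: the function name, the tie-breaking rule (lowest index wins), and the choice to recount neighbors among unassigned items at each step are all assumptions made for this example.

```python
def taylor_disjoint_clusters(sim, threshold):
    """Exclusion-sphere clustering on a symmetric similarity matrix.

    sim: list of lists with sim[i][j] in [0, 1].
    threshold: two items are neighbors when sim[i][j] >= threshold.
    Returns a list of (centroid_index, sorted_member_indices) tuples.
    """
    n = len(sim)
    # Neighbor sets, excluding the item itself.
    neighbors = [
        {j for j in range(n) if j != i and sim[i][j] >= threshold}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Pick the unassigned item with the most unassigned neighbors.
        # Iterating in sorted order makes ties resolve to the lowest index.
        centroid = max(
            sorted(unassigned),
            key=lambda i: len(neighbors[i] & unassigned),
        )
        members = (neighbors[centroid] & unassigned) | {centroid}
        clusters.append((centroid, sorted(members)))
        # Disjoint version: members cannot join any later cluster.
        unassigned -= members
    return clusters


# Hypothetical toy data: items 0-2 are mutually similar, as are items 3-4.
toy_sim = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 1.0],
]
toy_clusters = taylor_disjoint_clusters(toy_sim, threshold=0.7)
```

A non-disjoint variant would skip the removal of non-centroid members, so an item could appear in more than one cluster.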
SAESAR - 2D & 3D Clustering on Shape and Pharmacophore Features
• 2D Descriptors
  • MACCS ‘drug like’ keys and public keys from PubChem, 768 key fingerprints*
• 3D Descriptors
  • OEShape - volume overlap
  • OEColor - hydrogen-bond donors, hydrogen-bond acceptors, hydrophobes, anions, cations, and rings; can be user defined
*New key-based molecular fingerprinter for visualization and data analysis in compound clustering, similarity searching, and substructure commonality analysis, N. MacCuish, J. D. MacCuish, 233rd ACS, Chicago, March 25-29, 2007.
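For key-based 2D fingerprints such as those above, a common similarity measure is the Tanimoto coefficient over the sets of keys that are switched on. The sketch below assumes each fingerprint is given as a collection of on-bit indices; the function name and the key indices are hypothetical, and this is not a claim about how SAESAR computes 2D similarity.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto coefficient between two fingerprints given as on-bit indices."""
    a, b = set(keys_a), set(keys_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


# Hypothetical key indices for two molecules.
fp_a = {1, 2, 3}
fp_b = {2, 3, 4}
similarity = tanimoto(fp_a, fp_b)  # 2 shared keys out of 4 distinct keys
```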
Mining Primary Screening Data
• Three primary screens - JNK3, Rock2, FAK
• Cluster hits in 3D shape (full, subshape)
• Cluster in 3D color
• Identify ‘Key shape’ clusters
• Identify ‘Key color’ clusters
• Validate with secondary screening data
Datasets
Dataset   Screen      Structures   Conformers
JNK3      Primary     366          1724
JNK3      Secondary   57           256
FAK       Primary     756          3264
FAK       Secondary   189          434
Rock-2    Primary     212          869
Rock-2    Secondary   67           273
Results Summary
Dataset   Descriptor   Secondary Screen Matches   Expected if Random   Significant
JNK3      Shape        21                         14                   Yes
JNK3      Color        23                         12                   Yes
FAK       Shape        24                         11                   Yes
FAK       Color        42                         44                   No
Rock-2    Shape        47                         41                   Marginal
Rock-2    Color        31                         32                   No
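The slides do not say which statistical test produced the "Significant" column. One plausible way to compare observed matches with the count expected at random is an enrichment ratio together with a one-sided binomial tail probability, sketched below. The binomial model, the function names, and the use of the secondary-screen structure count as the number of trials are all assumptions of this example, not the authors' stated method.

```python
from math import comb


def enrichment(observed, expected):
    """Ratio of observed matches to the count expected at random."""
    return observed / expected


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# JNK3 shape row: 21 matches observed, 14 expected at random.
# Assumption: the 57 JNK3 secondary-screen structures are the trials.
n_trials = 57
p_random = 14 / n_trials
jnk3_shape_enrichment = enrichment(21, 14)
jnk3_shape_tail = binomial_upper_tail(21, n_trials, p_random)
```

A small tail probability would indicate that the observed match count is unlikely under random assignment, given the assumed model.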
JNK3 ‘Key Shapes’
JNK3 color & shape: 8 matches
JNK3 Color & Shape Common Hits
Secondary screening hits which group both by shape and color
X-ray Structures and ‘Key Shapes’
• Rock2 (2H9V): matches 1st key shape
• FAK (2ETM): matches 1st key shape
• JNK3 (2EXC): sub-shape match
Lead Hopping For SIRT1 Activators*
SIRT1 Actives and Not Actives: Input to SAESAR
3D ‘Key Shape’ Query
Potential leads are in a different 2D space, but a similar 3D space, as the active SIRT1 compounds
Available Compounds
*See J. Bemis, Bioorganic Gordon Research Conference, June 2008.
Lead Hopping For SIRT1 Activators
• SAESAR was used to identify key shapes which encapsulated 3D shape features of SIRT1 active compounds
• Key shapes were used as queries in a virtual screen against a 3D database of available compounds
• Sets of hits were identified:
  • 20 compounds had the highest overall shape matching Tanimoto scores
  • 47 compounds had shape Tanimoto scores > 0.6
  • 172 compounds had Tversky scores > 0.8
• Compounds were ordered and screened in the SIRT1 assay:
  • one novel scaffold was identified with low micromolar activity
  • optimization lowered SIRT1 activation potency
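The hit sets above use two different scores. A minimal sketch of how shape Tanimoto and Tversky scores can be computed from volumes is given below, assuming the overlap volume and the two self-volumes are already available. The general Tversky form and the weights alpha = 1, beta = 0 are illustrative assumptions; the slides do not state which weights were used, and the function names and volumes are hypothetical.

```python
def shape_tanimoto(overlap, vol_query, vol_target):
    """Symmetric score: shared volume over the combined volume."""
    return overlap / (vol_query + vol_target - overlap)


def shape_tversky(overlap, vol_query, vol_target, alpha=1.0, beta=0.0):
    """Asymmetric score: alpha and beta weight the unmatched query and
    target volume. alpha = beta = 1 reduces to the Tanimoto score."""
    unmatched_query = vol_query - overlap
    unmatched_target = vol_target - overlap
    return overlap / (overlap + alpha * unmatched_query + beta * unmatched_target)


# Hypothetical volumes: a small key-shape query mostly contained in a
# larger database molecule.
overlap, vol_query, vol_target = 90.0, 100.0, 250.0
tanimoto_score = shape_tanimoto(overlap, vol_query, vol_target)
tversky_score = shape_tversky(overlap, vol_query, vol_target)
```

In this toy case the Tversky score is high while the Tanimoto score is low, because the query fits inside the target as a sub-shape. That asymmetry is one reason a Tversky cutoff can retrieve compounds that a Tanimoto cutoff would miss.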
Acknowledgements
Jean Bemis, Sirtris Pharmaceuticals, a GSK Company
Evan Bolton, PubChem, NIH
Software and Databases: CDK, R, PDB, ZINC, PubChem
OpenEye Scientific Software, Inc.