Clustering of Small Molecules Based on Similarity Scores From Flexible 3D Alignment

Adrian Kalaszi, Gabor Imre, Miklos J. Szabo, Timea Polgar, Krisztian Niesz
ChemAxon Ltd., 1031 Budapest, Zahony u. 7, Hungary

Abstract
There are several approaches for clustering chemical structures. Among these, structure-based methods and techniques using classical 2D descriptors (e.g. chemical fingerprints or ECFP) are the most widely used. Taking 3D information into account, such as conformers, 3D pharmacophore maps or molecular shapes, can give researchers more insight into the process and make the corresponding results easier and more natural to interpret. ChemAxon's 3D alignment tool performs automatic, shape-based flexible 3D alignment of small molecules, and the shape similarity scores calculated for the best fits can be used in similarity-based clustering, for example as part of scaffold hopping to find new lead molecules.

Molecular Med TRI-CON 2013, February 11-15, 2013

Introduction
It is generally accepted that molecular shape plays a central role in ligand binding. As the growing number of publications in the field shows, several descriptors and methods have been applied in shape-based similarity screening [1]. These methods compete with other virtual screening techniques, such as ligand- and structure-based methods [2, 3]. It is therefore expected that using 3D shape alignment-based similarity in clustering may also reveal new aspects beyond the information provided by traditional 2D clustering methods.

ChemAxon in 3D

3D structure generation / conformational analysis
Generate3D [4, 5] is the molecular coordinate generation / conformational analysis component of ChemAxon's discovery tools (released in 2002). It is used by the Structure / Clean3D function of the Marvin GUI, by the Conformers Calculator Plugins and by the molconvert command line tool.
3D flexible alignment
The 3D flexible alignment procedure (released in 2009 [6]) overlays two structures by maximizing the intersection of their van der Waals volumes. The volume is partitioned ("colored") according to the underlying atomic properties, such as extended atom types (force field types) or pharmacophore types. Both molecules can be treated as flexible: their rotatable bonds, flexible rings and ring systems are adjusted continuously during the alignment. Because flexibility is handled during the alignment itself, only a single 3D conformer of each structure is needed as input. The method therefore also provides valid 3D similarity scores for 2D / 0D input structures, for which Generate3D is called automatically to produce that conformer. After the alignment is completed, the size of the volume intersection and the 3D Tanimoto (a dimensionless similarity measure between 0 and 1) can be obtained for further processing.

Example alignment workflow: 1) 2D input structures; 2) a 3D conformer is generated and the shape is colored by atom types; 3) the volume intersection, which is maximized during the alignment, is shown together with the resulting pose.

3D similarity – ligand-based virtual screening
Screen3D is a ligand-based 3D similarity calculation tool released in 2010. It calculates the intersection of the colored shapes and the 3D Tanimoto. Apart from these shape-based measures, Screen3D can also return a 3D similarity score calculated from intramolecular distance ranges [7]. The distance ranges are calculated for each molecule by adjusting rotatable bonds to maximize or minimize the distance between every pair of the selected atoms. In screening performance, the distance range-based score is comparable to its shape-based counterpart.

Benchmark results: Venkatraman et al. [2] compared the performance of various 2D and 3D similarity methods on the Directory of Useful Decoys (DUD) [8].
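The colored volume overlap and the 3D Tanimoto described above can be illustrated with a small sketch. It assumes a hard-sphere atom model evaluated on a regular grid (the poster does not say which volume representation the alignment engine uses; Gaussian atom densities are a common alternative), uses the atom-type label as the "color", and computes the 3D Tanimoto as V_AB / (V_A + V_B − V_AB), the usual definition for shape overlap, where V_AB only counts overlap between atoms of the same type. It scores one fixed pose; the pose and torsion optimization performed by the flexible alignment is not included. The names `types_at`, `grid_volumes` and `shape_tanimoto` are hypothetical.

```python
import math
from itertools import product
from typing import List, Set, Tuple

# Hypothetical atom record: (x, y, z, radius, atom_type), coordinates and radii in Å.
Atom = Tuple[float, float, float, float, str]


def types_at(point: Tuple[float, float, float], atoms: List[Atom]) -> Set[str]:
    """Atom types of every hard sphere that contains `point`."""
    x, y, z = point
    return {t for ax, ay, az, r, t in atoms
            if (x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2 <= r * r}


def grid_volumes(mol_a: List[Atom], mol_b: List[Atom],
                 step: float = 0.25) -> Tuple[float, float, float]:
    """Estimate V_A, V_B and the colored overlap V_AB on a cubic grid."""
    atoms = mol_a + mol_b
    lo = [min(a[i] - a[3] for a in atoms) for i in range(3)]
    hi = [max(a[i] + a[3] for a in atoms) for i in range(3)]
    n = [math.ceil((h - l) / step) for l, h in zip(lo, hi)]
    cell = step ** 3
    v_a = v_b = v_ab = 0.0
    for i, j, k in product(range(n[0]), range(n[1]), range(n[2])):
        # Sample the centre of each grid cell.
        p = (lo[0] + (i + 0.5) * step,
             lo[1] + (j + 0.5) * step,
             lo[2] + (k + 0.5) * step)
        ta, tb = types_at(p, mol_a), types_at(p, mol_b)
        if ta:
            v_a += cell
        if tb:
            v_b += cell
        if ta & tb:  # overlap counts only where both shapes share a color
            v_ab += cell
    return v_a, v_b, v_ab


def shape_tanimoto(mol_a: List[Atom], mol_b: List[Atom], step: float = 0.25) -> float:
    """3D Tanimoto of two fixed poses: V_AB / (V_A + V_B - V_AB), in [0, 1]."""
    v_a, v_b, v_ab = grid_volumes(mol_a, mol_b, step)
    return v_ab / (v_a + v_b - v_ab)


if __name__ == "__main__":
    carbon = (0.0, 0.0, 0.0, 1.7, "C")
    print(shape_tanimoto([carbon], [carbon]))  # 1.0 for identical shapes
```

Because V_AB can never exceed V_A or V_B, the score stays between 0 (no same-colored overlap) and 1 (identical colored shapes). An alignment procedure would search for the pose and torsions that maximize V_AB and report the score for that best fit.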
Benchmark figure: blue columns show values reported by Venkatraman et al. [2]; orange columns show Screen3D results measured in house following the same approach (SCREEN3D_S8V: shape similarity with volume intersection score; SCREEN3D_S8T: shape similarity with 3D Tanimoto score; SCREEN3D_H: distance range-based similarity).

Clustering - JKlustor
The JKlustor suite [9, 10] performs similarity- and structure-based clustering of compound libraries and focused sets, in both hierarchical and non-hierarchical fashion. In addition, JKlustor can carry out diversity calculations and library comparisons based on molecular fingerprints and other descriptors. It is an essential tool in combinatorial chemistry, virtual library design and other areas where large numbers of compounds need to be analyzed. The approach presented here adds flexible 3D alignment-based similarity calculation to the JKlustor suite, which allows the existing similarity-based algorithms to use 3D structural information during clustering.

Figure panels (alignment workflow): 2D (0D) input; Aligned shapes; Aligned structure pair; Flexibly aligned results.

ChemAxon, Graphisoft Park, Hx Building, H-1037 Budapest, Hungary. Phone: +36 1 453 2660, Fax: +36 1 453 2659, http://www.chemaxon.com

Figure: An overview of the algorithms and descriptors available in the JKlustor suite. Items shown include 2D (0D) structure-based algorithms (structural frameworks, MCS, MCES, 3D flexible alignment); molecular descriptors (chemical hashed fingerprints, ECFP, FCFP*, pharmacophore 2D fingerprints*, BCUT-like*, calculated property-based*, user defined); similarity metrics (Euclidean, Tanimoto, intersection); and similarity-based and structure-based clustering (sphere exclusion, K-means, Ward's minimum variance*, Jarvis-Patrick*). *Some components are also available as standalone tools.

Proof of concept implementation
The interface to the 3D flexible alignment functionality has been implemented in JKlustor as a transparent pairwise similarity calculation.
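Because the alignment score reaches JKlustor only as a pairwise similarity, a similarity-based algorithm such as sphere exclusion can use it without knowing how it was computed. The sketch below is a generic greedy sphere exclusion, not JKlustor's implementation: it assumes the radius is a threshold on the dissimilarity 1 − similarity and that centroids are picked in input order. `sphere_exclusion` and the toy similarity are hypothetical; in practice the `similarity` callable would wrap the 3D alignment score.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sphere_exclusion(items: Sequence[T],
                     similarity: Callable[[T, T], float],
                     radius: float) -> List[List[int]]:
    """Greedy sphere exclusion clustering.

    Returns clusters as lists of item indices; the first index of each
    cluster is its centroid. Distance is taken as 1 - similarity.
    """
    unassigned = list(range(len(items)))
    clusters: List[List[int]] = []
    while unassigned:
        centroid = unassigned.pop(0)          # next centroid: first unassigned item
        members, remaining = [centroid], []
        for idx in unassigned:
            if 1.0 - similarity(items[centroid], items[idx]) <= radius:
                members.append(idx)           # inside the exclusion sphere
            else:
                remaining.append(idx)         # left for later centroids
        clusters.append(members)
        unassigned = remaining
    return clusters


if __name__ == "__main__":
    # Toy 1D "molecules" with a hypothetical similarity in [0, 1].
    points = [0.0, 0.1, 0.5, 0.55, 1.0]
    toy_sim = lambda a, b: 1.0 - abs(a - b)
    print(sphere_exclusion(points, toy_sim, radius=0.2))  # [[0, 1], [2, 3], [4]]
```

Each item is compared only with the current centroids, so the number of similarity evaluations grows with the number of clusters rather than with all pairs; this matters when every evaluation is a flexible 3D alignment.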
In addition, a visualization tool is provided to compare the results of the alignment-based similarity calculation with those of other descriptor implementations.

Figure: Architecture of the JKlustor extension, depicting the interaction points with the user, the standard JKlustor elements, and the interaction points with the flexible 3D alignment engine. Components shown: input structures (SMILES); Generate3D, aromaticity, H atoms and structure ID handling; descriptor interface with descriptor cache (cache file, DB); 3D flexible alignment interface and flexible 3D alignment engine; algorithm interface (sphere exclusion); orchestration, execution and hierarchy representation; web-enabled visualization UI and UI client; output (structures, clusters, ...).

Clustering results of a small 3D fragment library
Figure: Clusters (centroids) resulting from a sphere exclusion clustering (r = 0.4) of the heat shock protein 90 (hsp90) ligands contained in the DUD database. Bar lengths are proportional to cluster size.

References
[1] Haigh, J. A.; Pickup, B. T.; Grant, J. A.; Nicholls, A.: Small Molecule shape-fingerprints. J. Chem. Inf. Model. 2005, 45, 673−684.
[2] Venkatraman, V.; Perez-Nueno, V. I.; Mavridis, L.; Ritchie, D. W.: Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods. J. Chem. Inf. Model. 2010, 50, 2079−2093.
[3] Hu, G.; Kuang, G.; Xiao, W.; Li, W.; Liu, G.; Tang, Y.: Performance Evaluation of 2D Fingerprint and 3D Shape Similarity Methods in Virtual Screening. J. Chem. Inf. Model. 2012, 52, 1103−1113.
[4] http://www.chemaxon.com/marvin/help/calculations/conformation.html#conformer
[5] http://www.chemaxon.com/conf/Advanced_automatic_generation_of_3D_molecular_structures.pdf
[6] Marvin 5.1.2, 2012, ChemAxon (http://www.chemaxon.com)
[7] Deng, W.; Kalászi, A.: Screen3D: A Ligand-based 3D Similarity Search without Conformational Sampling. International Conference and Exhibition on Computer Aided Drug Design & QSAR, Oct 29, 2012, Chicago, IL, USA.
[8] Irwin, J. J.: Community benchmarks for virtual screening. J. Comput.-Aided Mol. Des. 2008, 22, 193−199.
[9] http://www.chemaxon.com/products/jklustor/
[10] http://www.chemaxon.com/conf/JKlustor.ppt

Using 3D similarity approaches to identify scaffold hopping cases
In most scaffold hopping cases, the compared molecules are very similar in terms of their 3D properties but look quite different in their 2D representation. Such scaffold hopping cases can therefore be captured by comparing the calculated 2D and 3D similarities.

Figure: A) Scaffold hopping cases for antihistamine drugs: 3D shape similarity / 2D ECFP similarity values, together with the corresponding pairwise 3D alignments of the molecules; B) the same molecular pairs shown in the 2D ECFP dissimilarity vs. 3D SHAPE dissimilarity space.
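The proposed comparison of 2D and 3D similarities can be sketched as a filter over molecule pairs, mirroring the 2D ECFP dissimilarity vs. 3D SHAPE dissimilarity plot. The sketch assumes 2D fingerprints given as sets of ECFP feature identifiers compared with the set Tanimoto |A ∩ B| / |A ∪ B|, 3D shape similarities that have already been computed (for example by the flexible alignment), dissimilarity taken as 1 − similarity, and two threshold values chosen purely for illustration, since the poster gives no cut-offs. `set_tanimoto` and `scaffold_hop_candidates` are hypothetical names.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical pair record: (label, ECFP features of A, ECFP features of B, 3D shape similarity)
Pair = Tuple[str, FrozenSet[int], FrozenSet[int], float]


def set_tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    # Assumption: two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


def scaffold_hop_candidates(pairs: Sequence[Pair],
                            min_2d_dissim: float = 0.6,
                            max_3d_dissim: float = 0.3) -> List[Tuple[str, float, float]]:
    """Pairs that are dissimilar in 2D (ECFP) but similar in 3D (shape)."""
    hits = []
    for label, fp_a, fp_b, shape_sim in pairs:
        d2 = 1.0 - set_tanimoto(fp_a, fp_b)   # 2D ECFP dissimilarity
        d3 = 1.0 - shape_sim                  # 3D SHAPE dissimilarity
        if d2 >= min_2d_dissim and d3 <= max_3d_dissim:
            hits.append((label, d2, d3))
    return hits


if __name__ == "__main__":
    pairs = [
        ("hop", frozenset({1, 2, 3, 4}), frozenset({4, 5, 6, 7}), 0.85),
        ("analog", frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}), 0.90),
        ("unrelated", frozenset({1}), frozenset({2}), 0.20),
    ]
    print([h[0] for h in scaffold_hop_candidates(pairs)])  # ['hop']
```

Close analogs fall in the low-2D, low-3D corner and unrelated pairs in the high-2D, high-3D corner; only the high-2D, low-3D corner is reported as a scaffold hopping candidate.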