JKlustor clustering chemical libraries presented by … maintained by Miklós Vargyas Last update: 25 March 2010


Mar 26, 2015

Transcript
Page 1

JKlustor: clustering chemical libraries

presented by … maintained by Miklós Vargyas

Last update: 25 March 2010

Page 2

JKlustor

Chemical clustering by similarity and structure

Page 3

Description of the product

JKlustor performs similarity- and structure-based clustering of compound libraries and focused sets, in both hierarchical and non-hierarchical fashion.

JKlustor availability
• part of JChem
• IJC (parts)
• server version (accessible via API)
• batch application programs
• HTML user interface
• one desktop application with GUI
• GUI is available as an applet

Page 4

Summary of key features

Wide range of methods
• Unsupervised, agglomerative clustering
• Hierarchical and non-hierarchical methods
• Similarity-based and structure-based techniques

Flexible search options
• Tanimoto and Euclidean metrics, with weighting
• Maximum common substructure identification
• Chemical property matching, including atom type, bond type, hybridization and charge

Interactive display
• Interactive hierarchy browser (dendrogram viewer)
• SAR table
• R-table

Efficient
• Running time of the tools scales between linear and quadratic
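The two similarity metrics listed under Flexible search options can be sketched as below. This is a minimal illustration, assuming binary fingerprints are stored as sets of "on" bit positions and descriptors as plain numeric vectors; the function names and the per-dimension weight vector are hypothetical and not JKlustor's API.

```python
import math
from typing import FrozenSet, Optional, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of set-bit positions."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def weighted_euclidean(x: Sequence[float], y: Sequence[float],
                       w: Optional[Sequence[float]] = None) -> float:
    """Euclidean distance between descriptor vectors, optionally weighting each dimension."""
    if w is None:
        w = [1.0] * len(x)  # unweighted: every dimension counts equally
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))
```

For example, `tanimoto(frozenset({1, 2, 3}), frozenset({2, 3, 4}))` gives 0.5 (two shared bits out of four distinct ones), and a zero weight simply removes a descriptor from the Euclidean distance.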

Page 5

Benefits

Versatile
• Choose the most appropriate method for the clustering problem
• Combine methods to achieve the best results
• Use your trusted molecular descriptors in similarity calculation
• Easy integration into corporate discovery pipelines
• Cluster chemical files directly, with no need to import structures into a database

Intuitive
• Cluster formation is self-explanatory

Page 6

Similarity-based clustering

Hierarchical
• Ward

Non-hierarchical
• Sphere exclusion
• k-means
• Jarvis-Patrick

Page 7

Ward Clustering Features

• Ward's minimum variance method results in tight, well-separated clusters
• Murtagh's reciprocal nearest neighbor (RNN) algorithm is used to speed it up
• running time scales quadratically with the number of input structures
• memory consumption scales linearly
• best used with smaller sets (like focused libraries); copes with < 100K structures
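Ward clustering with Murtagh's reciprocal-nearest-neighbour idea can be sketched with the nearest-neighbour-chain algorithm: follow nearest neighbours until two clusters are each other's nearest, then merge them. Only cluster sizes and centroids are stored (linear memory), and each step scans the active clusters, which gives roughly quadratic time overall, in line with the bullets above. Input points are assumed to be numeric vectors (for example fingerprint bits as 0/1 values); the function names and the merge-record layout are hypothetical, not JKlustor's.

```python
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[int, Tuple[float, ...]]  # (size, centroid)


def ward_cost(a: Cluster, b: Cluster) -> float:
    """Increase in within-cluster sum of squares caused by merging clusters a and b."""
    (na, ca), (nb, cb) = a, b
    return na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))


def ward_rnn(points: Sequence[Sequence[float]]) -> List[Tuple[int, int, float, int]]:
    """Ward clustering via the nearest-neighbour chain (reciprocal nearest neighbours).

    Returns merges as (cluster_a, cluster_b, cost, new_size); input points are
    clusters 0..n-1 and each merge creates the next id n, n+1, ...
    """
    clusters: Dict[int, Cluster] = {i: (1, tuple(map(float, p))) for i, p in enumerate(points)}
    next_id = len(points)
    chain: List[int] = []
    merges: List[Tuple[int, int, float, int]] = []
    while len(clusters) > 1:
        if not chain:
            chain.append(next(iter(clusters)))
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # nearest active neighbour of the chain tip
        b = min((c for c in clusters if c != a),
                key=lambda c: ward_cost(clusters[a], clusters[c]))
        # ties favour the previous chain element, so the chain cannot cycle
        if prev is not None and ward_cost(clusters[a], clusters[prev]) <= ward_cost(clusters[a], clusters[b]):
            # a and prev are reciprocal nearest neighbours: merge them
            chain.pop()
            chain.pop()
            (na, ca), (nb, cb) = clusters.pop(a), clusters.pop(prev)
            n = na + nb
            merges.append((min(a, prev), max(a, prev), ward_cost((na, ca), (nb, cb)), n))
            clusters[next_id] = (n, tuple((na * x + nb * y) / n for x, y in zip(ca, cb)))
            next_id += 1
        else:
            chain.append(b)
    return merges
```

Each merge record gives the two clusters joined, the increase in within-cluster sum of squares, and the new cluster size; cutting the hierarchy into k clusters amounts to stopping before the last k-1 merges.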

Page 8

Sphere Exclusion Clustering Features

• based on fingerprints and/or other numerical data
• running time linear with respect to the number of input structures
• memory scales sub-linearly
• can easily cope with millions of structures
• suitable for diverse subset selection
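Sphere exclusion can be sketched as a single leader-style pass: each structure either falls inside the Tanimoto sphere of an existing centre or becomes a new centre itself. This is one simple variant chosen for illustration (JKlustor's own centre selection and ordering are not described here); only centre fingerprints are kept, which is why memory can stay well below one entry per input structure. The function names and the `threshold` parameter are assumptions.

```python
from typing import FrozenSet, List, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of fingerprints given as sets of set-bit positions."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def sphere_exclusion(fps: Sequence[FrozenSet[int]],
                     threshold: float) -> Tuple[List[int], List[int]]:
    """Leader-style sphere exclusion in one pass over the input.

    Returns (indices of centre structures, cluster label of every structure).
    """
    centre_idx: List[int] = []
    centre_fps: List[FrozenSet[int]] = []  # only the centres are held in memory
    labels: List[int] = []
    for i, fp in enumerate(fps):
        for c, cfp in enumerate(centre_fps):
            if tanimoto(fp, cfp) >= threshold:
                labels.append(c)  # inside an existing sphere: join that cluster
                break
        else:
            centre_idx.append(i)  # excluded by every sphere: start a new one
            centre_fps.append(fp)
            labels.append(len(centre_fps) - 1)
    return centre_idx, labels
```

The work per structure grows with the number of centres found so far, so a looser threshold (fewer, larger spheres) makes the pass faster.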

Page 9

k-means Clustering Features

• based on fingerprints and/or other numerical data
• minimises variance within each cluster
• number of clusters can be controlled directly
• finds the centres of natural clusters in the input data
• running time scales exponentially with respect to the number of input structures
• can cope with < 100Ks of structures
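A minimal Lloyd's-algorithm sketch of k-means, where the number of clusters `k` is fixed up front as the bullets describe. The deterministic farthest-point (maximin) seeding is an assumption made so the example is reproducible; JKlustor's seeding strategy is not stated, and the function names are hypothetical. Like any Lloyd-style k-means, it converges to a local minimum of within-cluster variance, not necessarily the global one.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points: Sequence[Sequence[float]], k: int) -> List[Point]:
    """Seed with the first point, then repeatedly add the point farthest from all chosen centres."""
    centres = [tuple(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(tuple(far))
    return centres


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centre assignment and centroid update."""
    centres = maximin_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centre
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        # update step: each centre moves to the mean of its members
        new_centres: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's old centre
        if new_centres == centres:
            break  # converged: assignments no longer move the centres
        centres = new_centres
    return centres, labels
```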

Page 10

Jarp Clustering Features

• variable-length Jarvis-Patrick clustering
• based on fingerprints and/or other numerical data
• takes structures/fingerprints and data values either from files or from database tables
• running time scales better than quadratically but worse than linearly with the number of input structures
• memory scales linearly
• Jarp can cope with 100Ks of structures
• depending on data and parameters, may create a large number of singletons
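The core of Jarvis-Patrick clustering can be sketched as follows: two structures are linked when each is in the other's k-nearest-neighbour list and the two lists share at least `kmin` members; linked structures are then joined transitively. This is the classic fixed-length form, not the variable-length variant Jarp implements, and its brute-force neighbour search is quadratic; `k`, `kmin` and the function names are illustrative assumptions.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def jarvis_patrick(fps: Sequence[FrozenSet[int]], k: int, kmin: int) -> List[int]:
    """Fixed-length Jarvis-Patrick clustering; returns one cluster label per structure."""
    n = len(fps)
    # k most similar neighbours of each structure (ties broken by index)
    nbrs = []
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: (-tanimoto(fps[i], fps[j]), j))
        nbrs.append(set(ranked[:k]))

    parent = list(range(n))  # union-find forest over structures

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in nbrs[i]:
            # mutual neighbours with enough shared neighbours end up in one cluster
            if i < j and i in nbrs[j] and len(nbrs[i] & nbrs[j]) >= kmin:
                parent[find(i)] = find(j)

    labels: List[int] = []
    ids = {}
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))  # number clusters by first appearance
    return labels
```

With strict settings (a large `kmin` relative to `k`), few pairs qualify and most structures end up as singletons, which is the effect the last bullet warns about.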

Page 11

Ward Clustering Example

• 8 different sets of known active compounds mixed together:
  • 5-HT3 antagonists
  • ACE inhibitors
  • angiotensin 2 antagonists
  • D2 antagonists
  • delta antagonists
  • FTP antagonists
  • mGluR1 antagonists
  • thrombin inhibitors
• ChemAxon’s 2D Pharmacophore fingerprint was generated
• Fingerprints of the mixture were clustered by Ward
• 9 clusters were formed
• 8 centroids (cluster representative elements) corresponded to the 8 activity classes
• 1 was a singleton
• All 8 real clusters contained structures only from the activity class of the centroid (over 95% true positive classification)

Page 12

Ward Clustering Example

Centroids

Page 13

Ward Clustering Example

Cluster of the D2 antagonists

Page 14

Structure-based clustering

Non-hierarchical
• Bemis-Murcko frameworks

Hierarchical
• LibraryMCS

Page 15

Bemis-Murcko frameworks

Page 16

Bemis-Murcko frameworks

Page 17

Bemis-Murcko frameworks features

• based on the structure of molecules
• cluster formation is apparent and visual, and meets human expectations
• running time linear with respect to the number of input structures
• memory scales sub-linearly
• can easily cope with millions of structures
• suitable for a quick overview of very large sets
• spots scaffold hops
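Framework-based clustering reduces to a single pass that buckets molecules by framework, which is consistent with the linear running time and the one-entry-per-framework memory claimed above. The sketch below assumes molecules are given as hydrogen-suppressed adjacency lists (a hypothetical input format), computes the generic graph framework (atom and bond types ignored) by repeatedly stripping terminal atoms, and uses a Weisfeiler-Lehman style hash as a stand-in for a canonical framework representation; unlike a true canonical form, that hash can occasionally merge non-isomorphic frameworks. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]  # atom index -> indices of bonded atoms (hydrogens omitted)


def from_edges(edges: Iterable[Tuple[int, int]]) -> Graph:
    g: Dict[int, Set[int]] = defaultdict(set)
    for a, b in edges:
        g[a].add(b)
        g[b].add(a)
    return dict(g)


def murcko_framework(graph: Graph) -> Graph:
    """Strip side chains by repeatedly deleting terminal atoms (degree <= 1).

    What remains is the ring systems plus the linkers joining them;
    an acyclic molecule yields an empty framework.
    """
    g = {a: set(nbrs) for a, nbrs in graph.items()}
    leaves = [a for a, nbrs in g.items() if len(nbrs) <= 1]
    while leaves:
        a = leaves.pop()
        if a not in g:
            continue  # already removed
        for b in g.pop(a):
            g[b].discard(a)
            if len(g[b]) <= 1:
                leaves.append(b)  # neighbour became terminal: strip it too
    return g


def wl_key(g: Graph, rounds: int = 3) -> Tuple[int, ...]:
    """Weisfeiler-Lehman style topology hash: isomorphic graphs always agree."""
    labels = {a: len(nbrs) for a, nbrs in g.items()}  # start from atom degrees
    for _ in range(rounds):
        labels = {a: hash((labels[a], tuple(sorted(labels[b] for b in g[a])))) for a in g}
    return tuple(sorted(labels.values()))


def cluster_by_framework(mols: Dict[str, Graph]) -> Dict[Tuple[int, ...], List[str]]:
    """One pass over the input: one bucket per distinct framework."""
    clusters: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for name, graph in mols.items():
        clusters[wl_key(murcko_framework(graph))].append(name)
    return dict(clusters)
```

In this graph-framework view, toluene and ethylbenzene both reduce to the bare six-membered ring and share a cluster with benzene, while biphenyl (two linked rings) forms its own cluster.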

Page 18

LibraryMCS

Identifies the largest subgraph shared by several molecular structures

Page 19

LibraryMCS: Hierarchical MCS

Page 20

SAR table view

Page 21

R-group decomposition

Page 22

LibraryMCS features

• based on the structure of molecules
• cluster formation is apparent and visual, and meets human expectations
• running time near-linear with respect to the number of input structures
• can cope with 100K-200K structures
• suitable for very thorough analysis
• spots scaffold hops
• substituent-activity (property) analysis

Page 23

LibraryMCS integration at Abbott

“Clustering for the masses…”, presented by Derek Debe at ChemAxon’s US UGM, Boston, 2008

Page 24

Clustering performance comparison

[Chart: running time (min) versus structure count for LibraryMCS, Jarvis-Patrick and Ward-Murtagh]

Page 25

JKlustor roadmap

In the development pipeline
• Bemis-Murcko generalisations
• IJC integration
• KNIME integration
• New GUI
• Manual clustering
• Multiple class membership
• Disconnected MCS (MOS)

Planned
• PipelinePilot integration
• Spotfire integration
• JChemBase, JChemCartridge integration
• JC4XLS integration

Blue sky
• Multitouch gestures
• LibraryMCS for 1M compound libraries