
ChemAxon’s Chemical Fingerprints-Based Clustering to Assess AurSCOPE Databases Chemical Diversity

Mar 26, 2015


Brian Pratt
Transcript
Page 1: ChemAxon’s Chemical Fingerprints-Based Clustering to Assess AurSCOPE Databases Chemical Diversity

Page 2: The Aureus Pharma System

Knowledge Base

Integration Platform

Query Interface

Analysis/Display Applications

Page 3: AurSCOPE Statistics: March 2006

Database                      Publications                        Activities   Ligands
GPCR                          17 250 (including 3 525 patents)    635 000      152 300
Ion Channel                   7 100 (including 2 685 patents)     217 600      58 400
Kinase                        2 565 (including 1 069 patents)     163 700      51 800
ADME/Drug-Drug Interactions   6 530                               179 000      9 100 (parent compounds + metabolites)
HERG                          800                                 14 300       3 530

Page 4: AurQUEST: Query management software for AurSCOPE

Web-based application integrating ChemAxon technology

Powerful Query Builder

- Biological and Chemical Queries

- Structural search using ChemAxon tools

Efficient Navigation

Different Export Formats (SDF, RDF, …)

Page 5: Data Preprocessing

[Flow diagram: AurSCOPE database to 2D unique structures, in four numbered steps]

• Counterions
• MW > 700
• Inorg
• NAS

• Stereo-duplicates
• Identical mol. but different salts…

Page 6: AurSCOPE Ion Channels: Retrieving Active Molecules

11519 molecules (*) (9897 unique)

Protocols: Binding or Electrophysiology

Target: All

Target type: Wild

Parameter filter: Ki, EC50, IC50 < 300 nM

(*) November 2005
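
The retrieval criteria on this slide amount to a record filter. The following is a minimal sketch, assuming each activity record is a dictionary; the keys (`molecule_id`, `protocol`, `target_type`, `parameter`, `value_nm`) are hypothetical and are not AurSCOPE's schema or AurQUEST's query syntax, neither of which is shown in the slides.

```python
# Hypothetical record layout; field names are illustrative, not AurSCOPE's schema.
PROTOCOLS = {"Binding", "Electrophysiology"}
PARAMETERS = {"Ki", "EC50", "IC50"}
THRESHOLD_NM = 300.0


def is_active(record):
    """Return True if a record meets the slide's retrieval criteria."""
    return (
        record["protocol"] in PROTOCOLS
        and record["target_type"] == "Wild"
        and record["parameter"] in PARAMETERS
        and record["value_nm"] < THRESHOLD_NM  # strictly below 300 nM
    )


def active_molecules(records):
    """Return (molecule ids of all passing records, set of unique ids)."""
    hits = [r["molecule_id"] for r in records if is_active(r)]
    return hits, set(hits)


records = [
    {"molecule_id": "M1", "protocol": "Binding", "target_type": "Wild",
     "parameter": "Ki", "value_nm": 12.0},
    {"molecule_id": "M1", "protocol": "Electrophysiology", "target_type": "Wild",
     "parameter": "IC50", "value_nm": 250.0},
    {"molecule_id": "M2", "protocol": "Binding", "target_type": "Wild",
     "parameter": "Ki", "value_nm": 300.0},   # rejected: not below the threshold
    {"molecule_id": "M3", "protocol": "Binding", "target_type": "Mutant",
     "parameter": "EC50", "value_nm": 5.0},   # rejected: not wild type
]
hits, unique = active_molecules(records)
```

Here only the two M1 records pass, so `hits` has two entries and `unique` has one.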

Page 7: AurSCOPE Ion Channels: Activity Distribution

[Bar chart, y-axis from 0 to 4000. Categories: GABA, Nicotinic Acetylcholine receptor, 5HT3, NMDA, Calcium Channel, Potassium Channel, AMPA/KA, Vanilloid receptor, Sodium Channel, Ryanodine receptor, P2X, IP3, Acid Sensing Ion Channel, Glycine receptor, Chloride Channel]

Page 8: Encoding Chemical Space and Clustering

Standardization of molecules.

Generating Chemical Fingerprints (CF).

Optimization of different CF parameters.

CF-based Jarvis-Patrick clustering with various adjusted parameters.

Page 9: Parameters for Generating Hashed Chemical Fingerprints

• Fingerprint length

- The number of bits in the bit string.

- A longer fingerprint has more capacity for storing information about a molecule.

• Maximum pattern length

- The maximum number of atoms in the linear paths considered during fragmentation of the molecule. (The length of cyclic patterns is not limited.)

- Longer and more numerous patterns hold more information about the molecule.

• Bits to be set for patterns

- After a pattern is detected, some bits of the bit string are set to "1". The number of bits used to code each pattern is constant.

- A higher number of bits increases the information coded from a pattern.

• Darkness of the fingerprint

- The percentage of "1" digits in the bit string. Fingerprints with more ones are considered "darker" than those with fewer ones.
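
The four parameters above can be illustrated with a toy path-based fingerprint. This is a sketch under stated assumptions, not ChemAxon's algorithm: the molecule is a plain graph of atom labels, bond orders and cyclic patterns are ignored, path length is counted in bonds (as in the "Max #bonds" column of the parameter table), and SHA-256 is an arbitrary choice of hash. All function names are hypothetical.

```python
import hashlib


def linear_paths(atoms, bonds, max_bonds):
    """Enumerate linear paths (as atom-label strings) with up to max_bonds bonds."""
    neighbors = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    patterns = set()

    def extend(path):
        label = "-".join(atoms[i] for i in path)
        reverse = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse are the same pattern; keep one spelling.
        patterns.add(min(label, reverse))
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return patterns


def fingerprint(atoms, bonds, length=2048, max_bonds=7, bits_per_pattern=5):
    """Return the set of bit positions set to 1 for a molecule graph."""
    on_bits = set()
    for pattern in linear_paths(atoms, bonds, max_bonds):
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{pattern}|{k}".encode()).digest()
            on_bits.add(int.from_bytes(digest[:8], "big") % length)
    return on_bits


def darkness(on_bits, length):
    """Percentage of bits set to 1 in a fingerprint of the given length."""
    return 100.0 * len(on_bits) / length


# Heavy-atom skeleton of ethanol: C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
fp_small = fingerprint(atoms, bonds, length=512, bits_per_pattern=5)
fp_large = fingerprint(atoms, bonds, length=2048, bits_per_pattern=5)
fp_fewer_bits = fingerprint(atoms, bonds, length=2048, bits_per_pattern=3)
```

In this sketch a shorter bit string is never lighter than a longer one for the same molecule, and setting more bits per pattern can only add ones, which matches the direction of the effects described above.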

Page 10: Chemical Fingerprints: Effect of Parameters

FP length   Max #bonds   Max #bits   Aver. Darkness (%)   Max. Darkness (%)
512         7            3           68.5                 97.5
512         7            4           82.2                 99.4
512         7            5           84.9                 99.4
512         8            3           76.1                 99.2
512         8            4           87.7                 99.4
512         8            5           89.8                 99.4
1024        7            3           46.1                 83.3
1024        7            4           61.5                 94.8
1024        7            5           65.5                 95.9
1024        8            3           54.8                 91.9
1024        8            4           70.2                 98.5
1024        8            5           73.8                 98.9
2048        7            3           26.8                 58.6
2048        7            4           39.1                 78.6
2048        7            5           42.4                 81.6   (highlighted on the slide)
2048        8            3           33.4                 73.7
2048        8            4           47.5                 89.6
2048        8            5           50.9                 91.6

Page 11: CF-based Jarvis-Patrick Clustering

For each structure, collect the set of nearest neighbors that have a dissimilarity (distance) less than a threshold value T. Two structures cluster together if:

1. They are in each other's list of nearest neighbors.

2. They have at least Rmin of their nearest neighbors in common, where Rmin is a ratio of the length of the shorter list.
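
The two conditions can be sketched in a few lines. This is an illustration under assumptions, not JKlustor's implementation: fingerprints are represented as sets of on-bit positions, the dissimilarity is taken to be 1 minus the Tanimoto coefficient (the slides do not name the metric), each structure is counted in its own neighbor list so that an isolated close pair can still merge, and clusters are built by transitively joining every pair that satisfies both conditions.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two sets of on-bit positions."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def jarvis_patrick(fingerprints, t, r_min):
    """Cluster fingerprints; returns clusters as lists of indices, largest first."""
    n = len(fingerprints)
    # Neighbor lists: every structure closer than t (each structure includes itself).
    neighbors = [
        {j for j in range(n)
         if tanimoto_distance(fingerprints[i], fingerprints[j]) < t}
        for i in range(n)
    ]

    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in neighbors[i]:
            if j <= i:
                continue
            # Condition 1 holds automatically: the distance is symmetric, so
            # j in neighbors[i] implies i in neighbors[j].
            shorter = min(len(neighbors[i]), len(neighbors[j]))
            shared = len(neighbors[i] & neighbors[j])
            # Condition 2: enough common neighbors relative to the shorter list.
            if shared >= r_min * shorter:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


fps = [
    {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    {1, 2, 3, 4, 5, 6, 7, 8, 9, 11},
    {1, 2, 3, 4, 5, 6, 7, 8, 9, 12},
    {100, 101, 102},
]
loose = jarvis_patrick(fps, t=0.20, r_min=0.5)
tight = jarvis_patrick(fps, t=0.15, r_min=0.5)
```

The first three toy fingerprints are about 0.18 apart, so they merge at T = 0.20 but fall apart into singletons at T = 0.15.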

Page 12: CF-based Jarvis-Patrick Clustering

T      Rmin   # Clusters   # Singletons
0.15   0.2    932          1663
0.15   0.3    938          1663
0.15   0.4    945          1663
0.15   0.5    977          1663
0.16   0.3    865          1499
0.16   0.5    910          1500
0.17   0.3    819          1372
0.17   0.5    860          1373
0.18   0.3    787          1238
0.18   0.5    826          1238
0.19   0.3    752          1140
0.19   0.5    780          1141
0.20   0.3    722          1051
0.20   0.5    752          1051

Chemical fingerprint length in bits: 2048
Maximum number of bonds in patterns: 7
Maximum number of bits to set for each pattern: 5
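
Counts like those in the "# Clusters" and "# Singletons" columns can be derived from any cluster assignment. A minimal sketch, assuming one cluster label per molecule and assuming that "# Clusters" counts only clusters with at least two members (the slides do not say whether singletons are included); `summarize` is a hypothetical helper name.

```python
from collections import Counter


def summarize(labels):
    """Return (multi-member cluster count, singleton count, sizes largest first)."""
    sizes = sorted(Counter(labels).values(), reverse=True)
    n_singletons = sum(1 for size in sizes if size == 1)
    n_clusters = len(sizes) - n_singletons
    return n_clusters, n_singletons, sizes


# One label per molecule; the labels themselves are arbitrary.
labels = ["a", "a", "a", "b", "b", "c", "d"]
n_clusters, n_singletons, sizes = summarize(labels)
```

The sorted `sizes` list is also what a cluster-size plot would display.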

Page 13: CF-based Jarvis-Patrick Clustering, Similarity threshold = 0.85 (*)

[Two bar charts of cluster size ("size"); the x-axis ticks run from 1 to 902 in the first chart and from 1 to 49 in the second.]

(*) Martin Y.C. et al. Do structurally similar molecules have similar biological activity? J. Med. Chem. 2002, 45, 4350-4358.

Page 14: Most Populated Clusters

Page 15: Jarvis-Patrick Clustering: Misclassifications?

Page 16: Jarvis-Patrick Clustering: Diverse Singletons

Page 17: Most Populated Clusters: Biological "Projection"

Gamma aminobutyric acid A receptor; Voltage-gated calcium channel

Nicotinic acetylcholine receptor; Gamma aminobutyric acid A receptor

Gamma aminobutyric acid A receptor; Nicotinic acetylcholine receptor; Gamma aminobutyric acid A receptor

Page 18:

Gamma aminobutyric acid A receptor; Potassium channel; Gamma aminobutyric acid A receptor; Voltage-gated calcium channel

5-HT3; Nicotinic acetylcholine receptor; Gamma aminobutyric acid A receptor

Page 19: Conclusions

JKlustor integrates computationally rapid and efficient clustering tools.

Shortcomings remain to be addressed, in particular artificial singletons.

Future work: combination with the Maximum Common Substructure approach (LibMCS).

Other algorithms (Ward,…)
