Surveying ligand- and target-based similarities within the Kinome

Stephan Schürer & Steven Muskal

Kinase Targets of Clinical Interest from Vieth et al. Drug Disc. Today 10, 839 (2005).

Eidogen-Sertanty KKB SAR Data Point Distribution

Primary targets w/ reported clinical data

Reported secondary targets & targets w/ >60% ID

Kinase SAR Knowledgebase – Hot Targets

>362,000 SAR data points curated from >4,270 journal articles and patents

>130 Bayesian QSAR Models

About Eidogen-Sertanty

• Knowledge-Driven Discovery Solutions Provider
• Formed in March 2005 when Sertanty (Libraria Sertanty 2003) acquired Eidogen (Bionomix 2000)
• >$20M Invested in Technology Development
• 12 FTEs
• Worldwide Customer Base
• Cash-Positive

• DirectDesign™ Discovery Collaborations
• In Silico Target Screening (“Target Fishing” and Repurposing)
• Target and compound prioritization services
• Fast Follower Design: Novel, Patentable Leads

• Chemogenomic Databases & Analysis Software
• TIP™ - Structural Informatics Platform
• KKB™ - Kinase SAR and Chemistry Knowledgebase
• CHIP™ - Chemical Intelligence Platform

> 400K Sequences

> 158K Chains & Models

> 388K Sites

> 33M Sequence Similarities

> 69M Structure Similarities

> 62M Site Similarities

TIP Algorithm Engine

STRUCTFAST™

Basic Principle: Gaps known to exist should not be strongly penalized.

Known Gap

Structure Alignment of Homologous Crystal Structures

STructure Realization Utilizing Cogent Tips From Aligned Structural Templates

Leverages experimental structure and structural alignment data to create better alignments

Known Gap

1) Convergent Island Statistics: A fast method for determining local alignment score significance. Bioinformatics, 2005, 21, 2827-2831

2) STRUCTFAST: Protein Sequence Remote Homology Detection and Alignment Using Novel Dynamic Programming and Profile-Profile Scoring. Proteins, 2006, 64, 960-967
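The basic principle stated for STRUCTFAST (gaps known to exist should not be strongly penalized) can be illustrated with a small dynamic-programming sketch. This is not the STRUCTFAST algorithm, which uses profile-profile scoring as described in the cited papers; it is a plain global alignment with a linear, position-specific gap penalty. The function name `align_score`, the scoring values, and the `template_gap_cost` list are assumptions made for illustration only.

```python
def align_score(query, template, template_gap_cost,
                default_gap=4, match=2, mismatch=-1):
    """Global alignment score of query against template.

    template_gap_cost[j] is a hypothetical penalty for leaving template[j]
    unaligned. Positions where structural alignments of homologs show a
    known gap would get a low value here.
    """
    n, m = len(query), len(template)
    # dp[i][j] = best score aligning query[:i] with template[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - default_gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - template_gap_cost[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == template[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + s,                      # align two residues
                dp[i - 1][j] - default_gap,                # query residue unaligned
                dp[i][j - 1] - template_gap_cost[j - 1],   # template residue skipped
            )
    return dp[n][m]


template = "ACDEFGHIK"
query = "ACDHIK"  # lacks the EFG segment of the template

uniform = align_score(query, template, [4] * 9)
known_gap = align_score(query, template, [4, 4, 4, 1, 1, 1, 4, 4, 4])
```

With the uniform penalty the three-residue deletion cancels out the six matches, while the known-gap penalties leave most of the match score intact, so the same alignment ranks as far more significant.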

SiteSeeker™

Geometric Site-Finding Algorithms Find Many Pockets
But they don’t know which pockets are important!

Evolutionary Trace Approach
Can’t clearly define site boundary

Not all conserved residues are functionally relevant

Reliability & Confidence
We use proteins with apo- & co-crystal structures in the PDB to test the accuracy & reliability of the method

Allows us to map SiteSeeker score to predicted confidence!
(e.g. at this SiteSeeker score, 80% are “real” co-crystal sites)

Sites with <60% confidence are not stored in TIP

SiteSeeker combines both methods
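The score-to-confidence mapping described for SiteSeeker can be sketched as an empirical lookup table: bin the benchmark predictions by score and record the fraction in each bin that coincide with a real co-crystal site. The bin edges, the toy benchmark values, and the function names below are assumptions for illustration; the slides do not describe how SiteSeeker scores are actually binned or calibrated.

```python
from bisect import bisect_right


def build_confidence_table(benchmark, bin_edges):
    """benchmark: list of (score, is_real_site); bin_edges: ascending lower edges."""
    hits = [0] * len(bin_edges)
    totals = [0] * len(bin_edges)
    for score, is_real in benchmark:
        k = bisect_right(bin_edges, score) - 1
        if k < 0:
            continue  # score below the lowest bin edge
        totals[k] += 1
        hits[k] += int(is_real)
    # Fraction of real sites per bin; None where the bin is empty
    return [h / t if t else None for h, t in zip(hits, totals)]


def confidence(score, bin_edges, table):
    k = bisect_right(bin_edges, score) - 1
    return table[k] if k >= 0 else None


# Toy benchmark (made-up numbers): predicted-site score, matches a co-crystal site?
benchmark = [(5, False), (7, False), (8, True),
             (12, True), (15, False), (18, True), (14, True), (16, True),
             (25, True), (22, True)]
edges = [0, 10, 20]
table = build_confidence_table(benchmark, edges)

# Keep only predicted sites whose mapped confidence is at least 60%
predicted_scores = [6, 13, 24]
stored = [s for s in predicted_scores if confidence(s, edges, table) >= 0.6]
```

In this toy data the middle bin maps to 80% confidence, so a site scoring 13 is kept, while a site scoring 6 falls below the 60% cutoff and is dropped.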

SiteSorter™
Weighted Clique Detection Algorithm

Importance of Points Related To Conservation In Multiple Sequence Alignment

Surface Atoms Assigned One of 5 Different Chemical Characters
Matching points increase the SiteSorter similarity score
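Weighted clique detection for site comparison can be sketched as follows, under stated assumptions. Each site is a list of surface points with a chemical character, coordinates, and a conservation-derived weight. A correspondence graph pairs points of the same character, connects pairs whose inter-point distances agree within a tolerance, and the heaviest clique gives the match. The character labels, the tolerance, the weight averaging, and the exhaustive search are all illustrative choices, not the SiteSorter implementation.

```python
from itertools import combinations
from math import dist


def best_weighted_clique(site_a, site_b, tol=1.0):
    """Each site is a list of (chem_type, (x, y, z), weight) surface points."""
    # Nodes: pairs of points (i from A, j from B) with the same chemical character
    nodes = [(i, j) for i, p in enumerate(site_a)
             for j, q in enumerate(site_b) if p[0] == q[0]]
    weight = {n: (site_a[n[0]][2] + site_b[n[1]][2]) / 2 for n in nodes}
    adj = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        if u[0] == v[0] or u[1] == v[1]:
            continue  # a point may be matched only once
        da = dist(site_a[u[0]][1], site_a[v[0]][1])
        db = dist(site_b[u[1]][1], site_b[v[1]][1])
        if abs(da - db) <= tol:  # geometrically compatible matches
            adj[u].add(v)
            adj[v].add(u)

    best = (0.0, [])

    def extend(clique, score, candidates):
        nonlocal best
        if score > best[0]:
            best = (score, clique)
        remaining = set(candidates)
        for n in sorted(candidates):
            remaining.discard(n)  # each clique is enumerated once
            extend(clique + [n], score + weight[n], remaining & adj[n])

    extend([], 0.0, set(nodes))
    return best


site_a = [("donor", (0, 0, 0), 1.0),
          ("acceptor", (3, 0, 0), 0.5),
          ("hydrophobic", (0, 4, 0), 1.0)]
# Same geometry shifted by 10 along x, plus one unrelated acceptor
site_b = [("donor", (10, 0, 0), 1.0),
          ("acceptor", (13, 0, 0), 0.5),
          ("hydrophobic", (10, 4, 0), 1.0),
          ("acceptor", (20, 20, 20), 1.0)]

score, clique = best_weighted_clique(site_a, site_b)
```

The exhaustive search is exponential in the worst case and is only suitable for small examples; a production method would need pruning or a dedicated maximum-weight clique solver.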

TIP Content
>75,000 Human Sequences
>116,000 Total PDB chains (~50K PDBs)
>42,000 Homology Models
>194,000 PDB co-crystal sites
>190,000 Predicted Sites (on PDBs & Models)

>33M Sequence Similarities

>69M Structural Similarities

>62M Site Similarities

Automatically updated with new models as the PDB grows

Updated monthly with new PDBs and models:

e.g. March 2006:
661 new PDBs added
447 new models built
- 153 had no previous structure in TIP
- 294 had “better” models built

e.g. July 2008:
576 new PDBs added
1045 new models built

Kinase Knowledgebase (KKB)
Kinase inhibitor structures and SAR data mined from > 4278 journal articles/patents

KKB Content Summary (Q2-2008):
# of kinase targets: >390
# of SAR data points: >362,000
# of unique kinase molecules with SAR data: >120,000
# of annotated assay protocols: >16,000
# of annotated chemical reactions: >2,300
# of unique kinase inhibitors: >465,000 (~340K enumerated from patent chemistries)

KKB Growth Rate:
• Average 15-20K SAR data points added per quarter
• Average 20-30K unique structures added per quarter

Kinase Knowledgebase (KKB)
Kinase inhibitor structures and SAR data mined from > 4100 journal articles/patents

KKB Content Summary (Q1-2008):
# of kinase targets: >300
# of SAR data points: >345,000
# of unique kinase molecules with SAR data: >118,000
# of annotated assay protocols: >15,350
# of annotated chemical reactions: >2,300
# of unique kinase inhibitors: >463,000 (~340K enumerated from patent chemistries)

KKB Growth Rate:
• Average 15-20K SAR data points added per quarter
• Average 20-30K unique structures added per quarter

Kinase Validation Set

Three sizable datasets freely available to the research community

http://www.eidogen-sertanty.com/kinasednld.php

LIMK1 – ATP binding site comparison LIMK1

AURKA SRC 62% ID in ATP Site

>3000 inhibitors in KKB

58% ID in ATP Site


Hbond donors Hbond acceptors Hbond donor/acceptors

LCK 58% ID in ATP Site


Conserved with LIMK1 Not conserved with LIMK1

The ATP site of LIMK1 shares a high level of homology with several well-studied kinases

Kinome by Sequence

Kinase domain sequence similarities - MST

[Figure: minimum spanning tree of kinase domain sequence similarities. Labeled groups: CAMK, AGC second domain, AGC, TK, others, CK1, CMGC, STE, RGC, TKL.]

Kinome by SAR

Relating kinase targets by SAR

• Relationships derived from Bayesian categorization models
• Adopted from Schuffenhauer, Org Biomol Chem 2004, 3256
• Bayesian categorization models built within PipelinePilot:
  • Kinase enzyme assay data, activity cutoff pIC50 > 6.5; all other compounds “negative”
  • Functional group connectivity fingerprints, length 4
  • ROC > 0.7
• Bayesian feature weights (~10,000 features) extracted for each model
• Correlation matrix determined between Bayesian vectors
• Visualization via minimum spanning trees (Kruskal algorithm)
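The last three steps (feature-weight vectors, correlation matrix, minimum spanning tree) can be sketched in plain Python. The four toy vectors and the model names A to D are made up; real Bayesian weight vectors would have on the order of 10,000 features. Converting a correlation to a distance as `1 - correlation` is an assumption, since the slides do not state how similarities were turned into edge lengths.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length weight vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def kruskal_mst(names, similarity):
    """Kruskal's algorithm on distance = 1 - similarity; returns the tree edges."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    edges = sorted((1 - similarity[a, b], a, b)
                   for i, a in enumerate(names) for b in names[i + 1:])
    tree = []
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # the edge joins two separate components
            parent[ra] = rb
            tree.append((a, b))
    return tree


# Toy feature-weight vectors for four hypothetical models
names = ["A", "B", "C", "D"]
weights = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8.5],
    "C": [4, 3, 2, 1],
    "D": [4, 3, 2, 0],
}
similarity = {(a, b): pearson(weights[a], weights[b])
              for i, a in enumerate(names) for b in names[i + 1:]}
tree = kruskal_mst(names, similarity)
```

Models A and B have nearly proportional weights, as do C and D, so those two pairs are joined first and a single weaker edge connects the two groups.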

Kinase SAR Bayesian models

[Figure: ROC score (y axis, 0.92 to 1) plotted against the number of KKB data points (x axis, "KKB Num DP", 12 to 6000) for 130 kinase enzyme models built with FCFP_4 fingerprints. Points are scaled by number of actives, labeled with kinase names, and keyed by group: CMGC, CAMK, AGC, STE, TK, TKL, others.]
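Each model is summarized by a ROC score, and the model-building criteria were an activity cutoff of pIC50 > 6.5 and ROC > 0.7. A minimal sketch of those two steps follows. The pIC50 values and model scores are made-up numbers, and the ROC area is computed with the standard rank-based definition rather than whatever routine PipelinePilot uses internally.

```python
def label_actives(pic50_values, cutoff=6.5):
    """Compounds above the cutoff are active; all others count as negative."""
    return [v > cutoff for v in pic50_values]


def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, is_active in zip(scores, labels) if is_active]
    neg = [s for s, is_active in zip(scores, labels) if not is_active]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up measurements and model scores for five compounds
pic50 = [8.1, 7.0, 6.5, 5.2, 4.9]
model_scores = [0.9, 0.4, 0.6, 0.3, 0.1]

labels = label_actives(pic50)
auc = roc_auc(model_scores, labels)
keep_model = auc > 0.7  # acceptance criterion quoted in the slides
```

Note that a compound at exactly pIC50 = 6.5 is treated as negative here because the cutoff is a strict inequality.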

Kinase target relationships by SAR – MST

[Figure: minimum spanning tree of 130 kinase models; all “similarities” > 0.27. Labeled clusters: TK, others, PKCs, CMGC, AGC, MAPKs, PI3Ks, CDKs.]

SAR-based similarity vs. Sequence identity

[Figure: scatter plot of SAR feature similarity (y axis) against sequence identity (x axis, 0.1 to 0.9). Highlighted pairs: CDC2A / CHUK and FGFR2 / FGFR3.]

CDC2A and CHUK: > 90 ligands with activity against both targets

FGFR2 / FGFR3: no similar ligands

[Figure: histogram of FCFP4 Tanimoto similarity (all pairs); x axis: similarity bin (0.02 to 0.32), y axis: count. Series: both active, one active, neither active. Insets: top active FGFR2 and top active FGFR3.]
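A pairwise Tanimoto histogram of this kind can be sketched with fingerprints represented as sets of feature identifiers. The toy fingerprints are made up, the 0.02 bin width is read off the axis ticks, and comparing every ligand in one set against every ligand in the other is an assumed reading of "all pairs".

```python
from collections import Counter
from itertools import product


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of feature IDs."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bin_pairs(ligands_x, ligands_y, width=0.02):
    """Count cross-set ligand pairs per similarity bin (bin index = floor(sim / width))."""
    counts = Counter()
    for fx, fy in product(ligands_x, ligands_y):
        counts[int(tanimoto(fx, fy) / width)] += 1
    return counts


# Toy fingerprints: each set holds the feature IDs present in one ligand
ligands_x = [{1, 2, 3, 4}, {1, 2, 3}]
ligands_y = [{3, 4, 5, 6}, {7, 8, 9}]

counts = bin_pairs(ligands_x, ligands_y)
for index in sorted(counts):
    print(f"{index * 0.02:.2f}-{(index + 1) * 0.02:.2f}: {counts[index]}")
```

Similarities that fall exactly on a bin edge can land in either neighbor because of floating-point division, so a real analysis may prefer integer arithmetic on the intersection and union counts.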

Kinome by structure binding site similarities

Relating kinases by ATP binding-site similarity

• Human Kinase domain sequences extracted (Sugen, Swissprot, PFAM)
• Human Kinome (500 sequences) modeled using STRUCTFAST
• Multiple models per sequence (subset of 263 presented here)
• Binding sites for all models computed (SiteSeeker)
• Binding site similarity scores computed (SiteSorter)
• Similarity scores normalized: AB_Norm := AB / (AA + BB - AB)
  • AB – Site Similarity between sites A & B
  • AA / BB – “Self Site” Similarity Scores
• Analysis and visualization with MST
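The normalization AB_Norm := AB / (AA + BB - AB) has the same form as a Tanimoto coefficient: it equals 1 when a site is compared with itself and shrinks as the shared score becomes small relative to the two self-scores. A minimal sketch, with a hypothetical function name and made-up raw scores:

```python
def normalize_site_similarity(ab, aa, bb):
    """AB_Norm = AB / (AA + BB - AB).

    ab: raw similarity score between sites A and B
    aa, bb: raw "self site" scores of A and B
    """
    denom = aa + bb - ab
    if denom <= 0:
        raise ValueError("self scores must exceed the shared score")
    return ab / denom


# Made-up raw scores for illustration
partial = normalize_site_similarity(ab=60, aa=100, bb=80)     # 60 / 120
identical = normalize_site_similarity(ab=100, aa=100, bb=100)  # 100 / 100
```

Normalizing this way makes scores comparable across sites of different size, which matters when they are thresholded for the MST.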

Kinase Site Similarity Relationships – MST

[Figure: minimum spanning tree of 263 kinases; all “similarities” > 0.6. Labeled groups: TK, others, CAMK, STE, TKL, AGC, CK1, CMGC.]

Sequence vs. Site Similarity

[Figure: scatter plot of site similarity (y axis, 0.4 to 1) against sequence identity (x axis, 0 to 0.9). Points are marked as same template or different template. Highlighted pairs: MAP3K8 / NTRK1 and MAST4 / PIK3R4.]

Similar sites – different sequences

• STE_STE11_MAP3K8: template 1u5rA
• TK_Trk_TRKA (NTRK1): template 1ir3A

MAP3K8: LGKGAY.V.A.K.V.E.V.MEFV.GGS.S.D.NN.M.D
NTRK1:  LGEGAF V A K – E V FE-M –GD – D –N L D

Similar sites – different site AA composition

• AGC_MAST_MAST4: template 1z5mA
• Other_VPS15_PIK3R4: template 1z5mA
• Site sequence similarity: 0.2
• Normalized (physicochemical) site similarity: 0.78

MAST4:  .K.ISNG.GAV.A.K.V.MEYVEGGD.T.K.DN.L.TD
PIK3R4: .K.LGST.FKV.K.F.P.FRQYVRDN.D.S.EN.M.TD

What did we learn?

Expected global trend:
Similar sequence results in physicochemical- and fold-similar binding sites

Dissimilar sequences do not always result in different binding sites

Binding site similarities group in “patches” by domain sequence similarity
Subtle differences in site relationships among groups and sub-types

Modeling templates influence results:
For many kinases no experimental structures exist, but they can be modeled
Growing body of structural information will optimize the picture

Body of selective Kinase compounds continues to grow

In principle, small molecules can be optimized to differentiate between very similar (sequence) kinases

Conclusions and Next steps

Quantifying similarity relationships within the Kinome can provide insight in early Kinase drug development

Similarity within the Kinome should consider SAR-based and structure-based binding site similarity (v. domain sequence-based similarity)

Next steps include

Analyze trends with respect to DFG-in/DFG-out
Quantify template effects
Investigate effects of site size and predicted vs. templated sites

Acknowledgements

• Stephan Schürer

• Kevin Hambly

• Joe Danzer

• Brian Palmer

• Derek Debe

• Aleksandar Poleksic

• Accelrys/Scitegic - Shikha Varma-O'Brien/Ton van Daelen

Contact
Steven Muskal
Chief Executive Officer
