ICIC Berlin October 2012 - Dr. Haxel

The ChEMBL Database

ICIC 2012 Berlin, Germany October 2012

John P. Overington

EMBL-EBI

[email protected]

Drug-like compounds

Chemical Space All compounds Available compounds

Only certain molecules have features consistent with good pharmacological properties

Druggable targets

Target Space

Only certain targets have binding sites capable of ligand efficient binding of drug-like ligands

All targets Available targets

Accessible Pharmacological Space

Available compounds for target but non-drug-like

Drug-like compounds but no complementary targets

Druggable targets but no complementary compounds

Druggable targets and complementary compounds

Presented to P&G, Cincinnati, April 2005, © 2005 Inpharmatica Ltd.

All reasonable molecules 10^20

All reasonable proteins 10^6

Screened proteins 10^3

Screened molecules 10^7 to 10^8

ChEMBL

Chemogenomics

Exploration of bioactivity space at genomic scale Structure Activity Relationship (SAR)

Drugs 10^3

Drug targets 10^2

ChEMBL Database

•  http://www.ebi.ac.uk/chembl
•  Funded by a Strategic Award from the Wellcome Trust
•  World’s largest primary source of Open pharmacology/drug discovery data
   –  Contains synthetic small molecules, natural products and biologicals
   –  Strong integration and annotation of chemical and biological data
   –  OSINT approach to data gathering
   –  Tight integration with other EBI resources
      •  Ensembl, 1000 Genomes, UniProt, PDBe, ArrayExpress, Atlas…
   –  Data sharing agreements in place with key public resources, e.g. PubChem
•  Open Data – CC-BY-SA licence
•  Free downloads, secure private searching,…
•  REST web service API
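As an illustration of the REST web service mentioned above, here is a minimal sketch that only builds a request URL and makes no network call. The base URL and path layout are assumptions based on the current public ChEMBL web services, not something stated on the slide (the 2012 service may have used a different address), and `molecule_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumed base URL of the public ChEMBL web services (not given on the slide).
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def molecule_url(chembl_id: str, fmt: str = "json") -> str:
    """Build the URL for one molecule record (hypothetical helper)."""
    # quote() guards against stray characters in the identifier.
    return f"{BASE_URL}/molecule/{quote(chembl_id)}.{fmt}"


# Example identifier; any valid ChEMBL ID can be substituted.
print(molecule_url("CHEMBL25"))
```

The resulting URL can be passed to any HTTP client; check the response format against the service documentation before relying on it.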

Target Discovery

Lead Discovery Lead Optimization

Preclinical Development

Phase 1 Phase 2 Phase 3 Launch (Phase 4)

Drug Discovery

~1,400,000 compound records
>10,000,000 bioactivities
~46,000 abstracted papers
~9,000 targets

~12,000 clinical candidates

~1,600 drugs

•  Target identification
•  Microarray profiling
•  Target validation
•  Assay development
•  Biochemistry
•  Clinical/Animal disease models

•  High-throughput Screening (HTS)
•  Fragment-based screening
•  Focused libraries
•  Screening collection

•  Medicinal Chemistry
•  Structure-based drug design
•  Selectivity screens
•  ADMET screens
•  Cellular/Animal disease models
•  Pharmacokinetics

•  Toxicology
•  In vivo safety pharmacology
•  Formulation
•  Dose prediction

PK tolerability Efficacy

Safety & Efficacy

Indication discovery, repurposing & expansion

Med. Chem. SAR Clinical Candidates Drugs

Discovery Development Use

ChEMBL content

Only ~1% of Genome is a Drug Target

Drug Approvals

FDA Approved Drugs

NFκB Pathway – key control mechanism for inflammation

Affinity of Drugs for their ‘Targets’

Ki, Kd, IC50, EC50, & pA2 endpoints for drugs against their ‘efficacy targets’

[Figure: histogram of frequency (0 to 400) against -log10 affinity (2 to 12, corresponding to 10 mM down to 1 pM)]

Overington et al., Nature Rev. Drug Discov. 5, pp. 993-996 (2006)
Gleeson et al., Nature Rev. Drug Discov. 10, pp. 197-208 (2011)
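The affinity endpoints on this slide are shown on a -log10 scale of the molar concentration. A minimal sketch of that conversion, assuming the endpoint is given in mol/L; the function names are illustrative and the 4.5 nM value is the thrombin Ki quoted on the SAR Data slide:

```python
import math


def p_affinity(concentration_molar: float) -> float:
    """Return -log10 of a molar affinity value (pKi, pKd, pIC50, ...)."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


def molar_from_p(p_value: float) -> float:
    """Inverse conversion: -log10 value back to mol/L."""
    return 10.0 ** (-p_value)


ki_molar = 4.5e-9  # 4.5 nM expressed in mol/L
print(f"pKi = {p_affinity(ki_molar):.2f}")
print(f"-log10 affinity of 9 is {molar_from_p(9.0):.1e} M")  # i.e. 1 nM
```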

Clinical Candidates

•  Collection of clinical development candidates
   –  Contains ~12,000 2-D structures/sequences
      •  Estimated size ~35-45,000 compounds
   –  Work in progress
      •  e.g. Protein kinases, 393 distinct clinical candidates

Different Types of Drugs

Pharma Industry Productivity

File registration number vs. USAN date

[Figure: file registration number (0 to 800,000) plotted against USAN date (1960 to 2010), with "Phase 2b date" and "~Discovery date" marked]

Overington, unpublished

Pharma Industry Productivity

[Figure: bar chart (0 to 70) by file registration number range, in bins of 100,000 from 1-100,000 to 700,001-800,000]

64 USANs/100,000 compounds

1.9 USANs/100,000 compounds

16 Drugs/100,000 compounds

0.4 Drugs/100,000 compounds

Large Pharma needs on average to synthesize and test ~250,000 compounds for each launched drug
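The ~250,000 figure follows directly from the lowest rate on the chart: 0.4 drugs per 100,000 compounds is one drug per 100,000 / 0.4 compounds. A small sketch of that arithmetic for the four rates quoted on the slide; the function name is illustrative only:

```python
def compounds_per_success(rate_per_100k: float) -> float:
    """Compounds registered per success, given successes per 100,000 compounds."""
    return 100_000 / rate_per_100k


# Rates as quoted on the slide (successes per 100,000 registered compounds).
rates = {
    "USANs (high)": 64,
    "USANs (low)": 1.9,
    "Drugs (high)": 16,
    "Drugs (low)": 0.4,
}

for label, rate in rates.items():
    print(f"{label}: ~{compounds_per_success(rate):,.0f} compounds per success")
```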

Overington, unpublished

Patent and Publication Lag

IBM Patent data and ChEMBL

Clinical Candidates

What Is the ChEMBL Data?

SAR Data

Compound

Assay

Ki=4.5 nM

>Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE

ED2=230 nM

What Is the ChEMBL Data?

Inhibition of human Thrombin

PTT (partial thromboplastin time)

ChEMBL Target Types

Protein: e.g. PDE5
Protein complex: e.g. Nicotinic acetylcholine receptor
Protein family: e.g. Muscarinic receptors
Nucleic Acid: e.g. DNA
Sub-cellular fraction: e.g. Mitochondria
Tissue: e.g. Trachea
Cell line: e.g. HEK293 cells
Organism: e.g. Drosophila

Compound Searching


Spreadsheet Views


Ligand Efficiency


•  Ligand efficiency is an objective measure of how much binding energy comes from each atom in a particular interaction
   –  Drugs have high ligand efficiency
   –  Every atom counts
   –  Need to avoid affinity from lipophilicity
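The slide does not give a formula, so the following is a sketch under a stated assumption: the commonly used definition of ligand efficiency as binding free energy per heavy (non-hydrogen) atom, with the free energy estimated from the dissociation constant as RT ln(Kd). The temperature of 298.15 K and the heavy-atom count of 30 are illustrative assumptions, not values from the talk.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(kd_molar: float, heavy_atoms: int,
                      temperature_k: float = 298.15) -> float:
    """Binding free energy per heavy atom, in kcal/mol per atom.

    Assumes delta_G = R * T * ln(Kd), which is negative for Kd below 1 M,
    so the returned efficiency is positive for any real binder.
    """
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("kd_molar and heavy_atoms must be positive")
    delta_g = R_KCAL * temperature_k * math.log(kd_molar)
    return -delta_g / heavy_atoms


# Hypothetical compound: 4.5 nM affinity with 30 heavy atoms.
le = ligand_efficiency(4.5e-9, 30)
print(f"LE = {le:.2f} kcal/mol per heavy atom")
```

Because the score divides by atom count, the same affinity achieved with fewer heavy atoms scores higher, which is the sense in which "every atom counts".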

Target Class Data

Assay Organism Data

Allosteric Regulators

•  Allosteric drugs can have some advantages over orthosteric drugs
   –  Selectivity
   –  Orthosteric site may be undruggable

Allosteric/Orthosteric sites for GPCRs

http://www.chemblog.org

