Virtual Screening: Methods and Applications
Dr Pedro J Ballester
MRC Methodology Research Fellow
EMBL-EBI, Cambridge, United Kingdom
EBI is an Outstation of the European Molecular Biology Laboratory.
UCL MSc in Drug Design, Jan 2013
22/01/2013

Talk outline
1. Introduction
2. Ligand-based Virtual Screening: Methods
3. Ligand-based Virtual Screening: Applications
4. Structure-based Virtual Screening: Methods
5. Structure-based Virtual Screening: Applications
The Drug Discovery Process
• Developing a new drug costs on average US$4 billion and takes 15 years (http://www.forbes.com/sites/matthewherper/2012/02/10/the-truly-staggering-cost-of-inventing-new-drugs/)
• While (pre)clinical trials are the most expensive stages,
the research that determines approval takes place at the early stages:
• Finding a target linked to the disease and a molecule that modulates
the function of the target without triggering harmful side effects.
• New targets are needed, but they are hard to work on, get less funding than traditional ones, etc.
Payne et al. (2007) Nat Rev. Drug Disc. 6:29
Virtual Screening: Why?
• HTS: Main strategy for identifying active molecules (hits)
by wet-lab testing a library of molecules against a target.
• Computational methods (Virtual Screening) are needed:
• HTS is slow: HTS of corporate collections takes many months
• HTS is expensive: average cost US$1M per screen (Payne et al. 2007)
• Growing number of research targets → no HTS until target validation
• Limited diversity: HTS covers ~10^6 compounds... but there are ~10^60 small molecules! (Dobson 2004 Nature)
• Target really undruggable?
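As a rough worked example of the diversity gap, the fraction of small-molecule space that one HTS library can cover follows directly from the two order-of-magnitude figures on the slide (10^6 compounds screened, 10^60 possible small molecules). The variable names below are illustrative only.

```python
import math

hts_library_size = 10**6       # compounds in one HTS campaign (slide figure)
chemical_space_size = 10**60   # estimated small molecules (Dobson 2004 Nature)

# Integer division is exact here, so the gap is computed without rounding error.
orders_of_magnitude_gap = round(math.log10(chemical_space_size // hts_library_size))
coverage_fraction = hts_library_size / chemical_space_size

print(f"HTS samples about 1 in 10^{orders_of_magnitude_gap} small molecules")
print(f"coverage fraction = {coverage_fraction:.1e}")
```

Whatever the exact library size, the screened fraction stays vanishingly small, which is the argument for using VS to choose what goes into the library.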
Virtual Screening (VS): when and which?
VS can complement HTS by
enriching libraries w/ likely ligands
[Figure: two decision flowcharts with Y/N branches.
Left panel, "HTS or Virtual Screening (VS)?": questions "HTS possible?", "Can afford expense and time?", "Diversity essential?"; outcomes: HTS or VS.
Right panel, "Type of Virtual Screening (VS)?": questions "Ligand/s or structure for target?", "Target structure available?"; outcomes: "Ligand- > target-based VS", "Ligand-based VS only", "VS not possible".]
VS to predict target affinity (hit identification)
• Search for molecules that modulate
the function of a therapeutic target.
• Hypothesis: modulating the target's function
cures/alleviates the associated disease
• In silico predictions must be validated in vitro (IC50, Ki, Kd)
• Important requirements:
• Potent: the threshold depends on the target and on [drug]plasma
• Different chemical structure → new IP, avoids previous problems
• Works well in practice vs. looks good on paper
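In vitro measurements such as IC50, Ki and Kd are usually compared on a negative log10 scale (pIC50, pKi, pKd). The sketch below shows that conversion and a potency check against a threshold. The function names, the nanomolar input unit and the example threshold are assumptions made for illustration; the slide only states that the threshold depends on the target and the plasma concentration of the drug.

```python
import math

def to_pk(value_nanomolar: float) -> float:
    """Convert a Kd, Ki or IC50 given in nM to -log10 of its molar value."""
    if value_nanomolar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value_nanomolar * 1e-9)

def is_potent(value_nanomolar: float, pk_threshold: float) -> bool:
    """Compare a measurement with a target-specific threshold chosen by the user."""
    return to_pk(value_nanomolar) >= pk_threshold

# A 2 µM binder has a pK of about 5.70; the threshold of 6 is a placeholder.
print(round(to_pk(2000.0), 2), is_potent(2000.0, pk_threshold=6.0))
```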
VS to predict selectivity (lead identification)
• Drugs must selectively bind to their intended target, as
binding to other proteins may cause harmful side-effects
1. The more chemically diverse the hits found, the
more likely it is that some are directly selective.
2. Structure-based design: e.g. identify
hits that occupy a subpocket that is in
the target but not in related proteins.
3. Ligand-based design: superpose diverse hits
in 3D and use activity against related proteins
to formulate a hypothesis.
VS to predict whole-cell activity (lead identification)
• Drug molecules must also inhibit the
target in the cell environment.
• Whole-cell assay: e.g. cancer cell
growth inhibition (GI50) measured
in the presence of the molecule.
• Phenotypic screening is attractive ∵ many molecules
binding to the target do not have whole-cell activity.
• However, there are very few in silico methods for predicting which
molecules are likely to have whole-cell activity.
• If an X-ray structure of the target is available → Docking:
predicting whether and how a molecule binds to the target.
• Pose generation: estimating the conformation and orientation of
the ligand as bound to the target.
• Scoring: predicting how strongly the ligand binds to the target.
• Many relatively accurate algorithms exist for pose generation,
but imperfections of scoring functions continue to be the
major limiting factor for the reliability of docking.
Pose generation in Docking
• Possible variables:
• Translation and rotation of the ligand relative to the binding site
involve six degrees of freedom (3D position + 3D orientation).
• Torsional/conformational degrees of freedom of both the
ligand (on the fly/stored) and the protein (flexible docking).
• Goal: finding a pose as similar as possible to that of the ligand co-crystallised with the target.
• How: a search algorithm generates several poses of each considered ligand (multimodal optimisation problem).
• Tradeoff between time and search space coverage.
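The six rigid-body degrees of freedom and the "as similar as possible to the co-crystallised ligand" goal can be sketched as follows. This is a minimal illustration, not any docking program's method: the Euler-angle convention, the function names and the three-atom ligand are assumptions, torsional flexibility is left out, and the RMSD is computed without any symmetry correction.

```python
import math

def rotation_matrix(alpha, beta, gamma):
    """Rotation Rz(alpha) * Ry(beta) * Rx(gamma): the three orientation variables."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]

def apply_pose(coords, translation, angles):
    """Rotate the ligand about its centroid, then translate it (6 variables in total)."""
    n = len(coords)
    centroid = [sum(atom[i] for atom in coords) / n for i in range(3)]
    rot = rotation_matrix(*angles)
    posed = []
    for atom in coords:
        local = [atom[i] - centroid[i] for i in range(3)]
        rotated = [sum(rot[r][k] * local[k] for k in range(3)) for r in range(3)]
        posed.append(tuple(rotated[i] + centroid[i] + translation[i] for i in range(3)))
    return posed

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses with matching atom order."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same atoms")
    squared = sum((a[i] - b[i]) ** 2 for a, b in zip(pose_a, pose_b) for i in range(3))
    return math.sqrt(squared / len(pose_a))

# Hypothetical three-atom ligand; coordinates in angstroms.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
candidate = apply_pose(crystal_pose, translation=(1.0, 0.0, 0.0), angles=(0.0, 0.0, 0.0))
print(f"RMSD to crystal pose: {rmsd(crystal_pose, candidate):.2f} A")
```

A search algorithm would call something like `apply_pose` many times with different variable values; sampling more poses improves coverage of the search space at the cost of run time. An RMSD of about 2 Å or less to the crystal pose is often taken as a successful prediction.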
Scoring Functions for Docking: functional forms
• Force Field-based SFs (e.g. DOCK score)
• Empirical SFs (e.g. X-Score)
• Knowledge-based SFs (e.g. PMF)
• SFs are trained on pK data, usually through multiple linear regression (MLR):
• FF (Aij, Bij), Emp (w0, …, w4) and sometimes KB ( )
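To make two of the functional forms concrete, the sketch below shows a 12-6 van der Waals sum parameterised by Aij and Bij, and an empirical score that is a weighted sum of terms with weights fitted by multiple linear regression. It is only an illustration under stated assumptions: the real DOCK and X-Score functions contain further terms, the function names are hypothetical, the training values are made up, and two terms are used instead of the four weights w1 to w4 mentioned on the slide.

```python
def vdw_term(pairs):
    """Sum A_ij / r^12 - B_ij / r^6 over protein-ligand atom pairs.

    Each pair is (r, a_ij, b_ij); the parameter values are placeholders.
    """
    return sum(a / r**12 - b / r**6 for r, a, b in pairs)

def empirical_score(terms, weights):
    """Additive empirical form: w0 + w1*x1 + ... + wk*xk."""
    w0, *rest = weights
    return w0 + sum(w * x for w, x in zip(rest, terms))

def fit_mlr(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    design = [[1.0, *row] for row in rows]  # intercept column gives w0
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * y for d, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting (assumes a non-singular system).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Made-up training data: two interaction terms per complex and a "measured" pK.
terms = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (0.0, 1.0)]
measured_pk = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in terms]
weights = fit_mlr(terms, measured_pk)
print([round(w, 3) for w in weights])
```

Both forms fix the shape of the relationship in advance; regression only tunes the coefficients.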
Scoring Functions for Docking: limitations
• Two major sources of error affecting all SFs:
1. Limited description of protein flexibility.
2. Implicit treatment of solvent.
• This is necessary to make SFs sufficiently fast.
• 3rd source of error has received little attention so far:
• Conventional scoring functions assume a theory-inspired
predetermined functional form for the relationship between:
• the structure-based description of the p-l complex
• and its measured/predicted binding affinity
• Problem: difficulty of explicitly modelling the various
contributions of intermolecular interactions to binding affinity.
• Also, SFs use an additive functional form, but this has been
specifically shown to be suboptimal (Kinnings et al. 2011 JCIM).
A Machine Learning Approach
• Non-parametric machine learning can be used to implicitly
capture the functional form (data-driven, not knowledge-based).
• Main idea: a priori assumptions about the functional
form introduce modelling error → no assumptions!
• reconstruct the physics of the problem implicitly in an
entirely data-driven manner using non-parametric ML.
• Random Forest (Breiman, 2001) to learn how the
atomic-level description of the complex relates to pK:
• Random Forest (RF): a large ensemble of diverse DTs.
• Decision Tree (DT): recursive partition of descriptor space s.t.
training error is minimal within each terminal node.
• But how do we characterise a protein-ligand complex as
a set of numerical descriptors (features)?
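The Random Forest idea described above (an ensemble of diverse decision trees, each a recursive partition of descriptor space) can be sketched in plain Python. This is a teaching sketch and not the RF-Score implementation: it keeps only bootstrap sampling and random feature selection, the function names and hyperparameters are assumptions, and the data are synthetic.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean of a node."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, rng, n_try, min_leaf=2):
    """Recursively split descriptor space to minimise squared error within nodes."""
    if len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return mean(targets)  # terminal node predicts the mean pK
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):  # random subset of features
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or err < best[0]:
                best = (err, f, threshold, left, right)
    if best is None:
        return mean(targets)
    _, f, threshold, left, right = best
    return (
        f,
        threshold,
        build_tree([rows[i] for i in left], [targets[i] for i in left], rng, n_try, min_leaf),
        build_tree([rows[i] for i in right], [targets[i] for i in right], rng, n_try, min_leaf),
    )

def predict_tree(tree, row):
    """Walk from the root to a terminal node and return its mean."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree

def fit_forest(rows, targets, n_trees=25, n_try=1, seed=0):
    """Grow each tree on a bootstrap sample so that the trees differ."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx], rng, n_try))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

# Synthetic descriptors with a non-additive step in the second feature.
rows = [(x1, x2) for x1 in range(0, 100, 10) for x2 in range(0, 50, 10)]
pk = [4.0 + 0.03 * x1 + (1.0 if x2 >= 30 else 0.0) for x1, x2 in rows]
forest = fit_forest(rows, pk)
print(round(predict_forest(forest, (90, 40)), 2), round(predict_forest(forest, (0, 0)), 2))
```

No functional form is written down anywhere in this code; the step in the second descriptor is picked up from the data by the splits.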
Characterising the protein-ligand complex
pKd/i   C.C   …   C.Cl   …   C.I   N.C   …   I.I   PDB ID
5.70    95    …   30     …   0     73    …   0     2p33
(pKd/i: +1 binding affinity column; C.C … I.I: features or descriptors)
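One way to obtain columns such as C.C, C.Cl or N.C is to count protein-ligand atom pairs by element within a distance cutoff. The sketch below does this under several assumptions that are not stated on the slide: the nine-element list, the 12 Å cutoff, the "protein.ligand" naming order and the toy coordinates are all illustrative.

```python
import math
from collections import Counter
from itertools import product

ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")  # assumed element set
CUTOFF = 12.0  # assumed distance cutoff in angstroms

def pair_count_features(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Count protein-ligand element pairs closer than `cutoff`.

    Each atom is (element, (x, y, z)). Keys are "P.L" with the protein element first.
    """
    counts = Counter()
    for (p_el, p_xyz), (l_el, l_xyz) in product(protein_atoms, ligand_atoms):
        if p_el in ELEMENTS and l_el in ELEMENTS and math.dist(p_xyz, l_xyz) < cutoff:
            counts[f"{p_el}.{l_el}"] += 1
    # Emit every pair in a fixed order so all complexes share the same columns.
    return {f"{p}.{l}": counts[f"{p}.{l}"] for p, l in product(ELEMENTS, ELEMENTS)}

# Toy complex: three protein atoms and two ligand atoms.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (1.0, 0.0, 0.0)), ("C", (50.0, 0.0, 0.0))]
ligand = [("C", (2.0, 0.0, 0.0)), ("Cl", (3.0, 0.0, 0.0))]
features = pair_count_features(protein, ligand)
print({name: value for name, value in features.items() if value})
```

Each complex then becomes one row: the measured pKd/i followed by this fixed-length vector of counts.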
PDBbind benchmark
• De facto standard for benchmarking SFs: Cheng, T., Li, X., Li, Y., Liu, Z. & Wang, R. (2009) JCIM 49, 1079-1093
• Refined set: 1300 manually curated protein-ligand
complexes with measured binding affinity (diverse):
• Benchmark: test set error of 16 state-of-the-art SFs
• RF-Score vs 16 SFs on test set error, but:
• Other SFs have an undisclosed number of complexes in common!
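A test set comparison of this kind needs an error measure and a check that training and test complexes do not overlap. The sketch below uses RMSE and Pearson correlation, which are common choices but are assumptions here, since the slide only says "test set error". All values and PDB IDs other than 2p33 are made up for illustration and are not benchmark results.

```python
import math

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted pK values."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))

def pearson(measured, predicted):
    """Pearson correlation coefficient between measured and predicted pK values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)

def shared_complexes(train_ids, test_ids):
    """PDB IDs present in both sets; any overlap inflates apparent test performance."""
    return sorted(set(train_ids) & set(test_ids))

# Made-up numbers, for illustration only.
measured = [5.0, 6.0, 7.0]
predicted = [5.5, 6.5, 7.5]
print(rmse(measured, predicted), pearson(measured, predicted))
print(shared_complexes(["2p33", "xxx1"], ["xxx2", "2p33"]))
```

An SF trained on complexes that also appear in the test set has an advantage that such a comparison cannot quantify when the overlap is undisclosed.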