John Mitchell; James McDonagh; Neetika Nath; Rob Lowe; Richard Marchese Robinson
Jan 15, 2016
RF-Score: a Machine Learning Scoring Function for Protein-Ligand Binding Affinities
• Ballester, P.J. & Mitchell, J.B.O. (2010) Bioinformatics 26, 1169-1175
Calculating the affinities of protein-ligand complexes:
For docking
For post-processing docking hits
For virtual screening
For lead optimisation
For 3D QSAR
Within series of related complexes
For any general complex
Absolute (hard!)
Relative
A difficult, unsolved problem.
Three existing approaches …
1. Force fields
2. Empirical Functions
3. Knowledge-based
How knowledge-based scoring functions have worked …
1. P-L complexes from PDB
2. Assign atoms to types
3. Find histograms of type-type distances
4. Convert to an ‘energy’
5. Add up the energies from all P-L atom pairs
This conversion of the histogram into an energy function uses a “reverse Boltzmann” methodology.
Thus it “assumes” that the atoms of protein and ligand are independent particles in equilibrium at temperature T.
For a variety of reasons, these are poor assumptions …
Molecular connectivity: atom-atom distances are miles from being independent.
Excluded volume effects.
No physical basis for assuming such an equilibrium.
Changes in structure with T are small and not like those implied by the Boltzmann distribution.
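The reverse Boltzmann conversion described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular published scoring function: the bin width, the number of bins, the pooled-over-all-types reference state, the pseudocount and the value of kT are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math
from collections import defaultdict

KT = 0.593  # kcal/mol near 298 K (assumed value for illustration)

def _bin_index(r, bin_width, n_bins):
    # Map a distance to its histogram bin, clamped to the last bin.
    return min(int(r / bin_width), n_bins - 1)

def distance_histograms(pairs, bin_width=1.0, n_bins=6):
    """Count type-type distances.

    pairs: iterable of (protein_type, ligand_type, distance) tuples,
    pooled over all training complexes.
    """
    hist = defaultdict(lambda: [0] * n_bins)
    for p_type, l_type, r in pairs:
        if r < bin_width * n_bins:
            hist[(p_type, l_type)][_bin_index(r, bin_width, n_bins)] += 1
    return dict(hist)

def reverse_boltzmann(hist, kT=KT, pseudocount=1e-6):
    """Convert histograms to 'energies': E(r) = -kT * ln(p_obs(r) / p_ref(r)).

    The reference state here is the distribution pooled over all type
    pairs (an assumption; other reference states are possible).
    """
    n_bins = len(next(iter(hist.values())))
    pooled = [sum(h[b] for h in hist.values()) for b in range(n_bins)]
    pooled_total = sum(pooled)
    energies = {}
    for key, h in hist.items():
        total = sum(h)
        energies[key] = [
            -kT * math.log((h[b] / total + pseudocount)
                           / (pooled[b] / pooled_total + pseudocount))
            for b in range(n_bins)
        ]
    return energies

def score_complex(pairs, energies, bin_width=1.0):
    """Add up the energies from all P-L atom pairs of one complex."""
    total = 0.0
    for p_type, l_type, r in pairs:
        table = energies.get((p_type, l_type))
        if table is not None and r < bin_width * len(table):
            total += table[_bin_index(r, bin_width, len(table))]
    return total
```

Distances seen more often than the reference for a given type pair receive a negative 'energy', and rarely seen distances receive a positive one. The logarithm is what builds the independent-particle equilibrium assumption into the score.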
We thought about this …
… and wrote a paper saying
“It’s not true, but it sort of works”
Then we had a better idea – could we dispense with the reverse Boltzmann formalism?
Instead of assuming a formula that relates the distance distribution to the binding free energy …
… use machine learning to learn the relationship from known structures and binding affinities.
And persuade someone to pay for it!
Random Forest
Predicted binding affinity
Random Forest
● Introduced by Breiman and Cutler (2001)
● Development of Decision Trees (Recursive Partitioning):
● Dataset is partitioned into consecutively smaller subsets
● Each partition is based upon the value of one descriptor
● The descriptor used at each split is selected so as to optimise splitting
● Bootstrap sample of N objects chosen from the N available objects with replacement
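Two of the ingredients listed above, choosing the descriptor and threshold that optimise a split and drawing a bootstrap sample, can be sketched as follows. This is an illustrative sketch for a regression target: the splitting criterion (weighted variance of the two child subsets) is an assumption, since the slides do not say which criterion is optimised, and the function names are hypothetical.

```python
import random

def variance(values):
    # Population variance; an empty subset contributes nothing.
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def best_split(X, y):
    """Find the single-descriptor split with the lowest weighted child variance.

    X: list of descriptor rows, y: list of target values.
    Returns (descriptor_index, threshold, cost), or None if every
    descriptor is constant and no split is possible.
    """
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = (len(left) * variance(left)
                    + len(right) * variance(right)) / n
            if best is None or cost < best[2]:
                best = (j, t, cost)
    return best

def bootstrap_sample(X, y, rng):
    """Draw N objects from the N available, with replacement."""
    n = len(y)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]
```

Applying `best_split` recursively to each resulting subset gives the consecutively smaller partitions of a decision tree. Because the bootstrap draws with replacement, some objects appear more than once in a sample and others not at all.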
The Random Forest is just a forest of randomly generated decision trees …
… whose outputs are averaged to give the final prediction
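A toy version of the whole procedure, written in plain Python to show the idea rather than to reproduce RF-Score, might look like the sketch below. Each tree is grown on its own bootstrap sample, each split considers a random subset of `mtry` descriptors (as in Breiman's formulation), and the forest prediction is the mean of the tree outputs. The function names, the default parameters and the sum-of-squares splitting criterion are assumptions made for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    # Sum of squared deviations from the mean.
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(X, y, rng, mtry, min_leaf=1):
    """Grow a regression tree. A leaf is a float; a node is (j, t, left, right)."""
    if len(y) <= min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    # Random subset of descriptors at this split; mtry must not
    # exceed the number of descriptors.
    for j in rng.sample(range(len(X[0])), mtry):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # all sampled descriptors were constant in this subset
        return mean(y)
    _, j, t, left, right = best
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      rng, mtry, min_leaf),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      rng, mtry, min_leaf))

def predict_tree(node, x):
    # Walk down from the root until a leaf value is reached.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def grow_forest(X, y, n_trees=50, mtry=1, seed=0):
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: N objects drawn from the N available, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                rng, mtry))
    return forest

def predict_forest(forest, x):
    # The final prediction is the average of the individual tree outputs.
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)
```

In a setting like the one described here, each row of `X` would hold the descriptors of one protein-ligand complex and `y` the measured binding affinities. For real work an established implementation would be the sensible choice over a hand-written one.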
Building RF-Score
PDBbind 2007
Validation results: PDBbind set
Following the method of Cheng et al., JCIM 49, 1079 (2009)
Independent test set: PDBbind core 2007, 195 complexes from 65 clusters
RF-Score outperforms competitor scoring functions, at least on our test
RF-Score is available for free from our group website