1 TM PRO & Comparison of Algorithms for “Protein Stability Prediction Upon Mutations” Madhavi Ganapathiraju Graduate student Carnegie Mellon University
Jan 03, 2016
2
Overview
• TMpro evaluations on PDBTM, TMPDB and MPtopo are complete
• Additional inputs to TMpro are being studied
– Yule values (not successful)
– Evolutionary profile (promising)
• TMpro website has been completed
• Evaluation of algorithms to predict protein stability changes upon mutations
3
Part 1: TMpro
4
TMPro Evaluations
Method          Segment level                          Residue level   Misclassified
                Qok   F score   Recall   Precision     Q2              as soluble
MPtopo (101 TM proteins)
2a TMHMM        66    91        89       94            84              5
2b TMpro NN     60    93        92       94            79              0
PDBTM (191 TM proteins)
3a TMHMM        68    90        89       90            84              13
3b TMpro NN     57    93        93       93            81              2
5
TMpro web server is fully functional!
Competition for TMpro logo
Prize: See your logo on the web!
6
Attempts to overcome confusion with globular soluble helices (1)
• Yule value features to be added
– Yule value features that discriminate amino acid neighbor propensities between TM and non-TM helices were computed earlier
– Tried to add these features as input to the NN predictor, but could not achieve quantitative improvement
– I will discuss this in the future when I have results to present
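As a rough illustration of a Yule value feature, the sketch below computes Yule's Q, (ad - bc) / (ad + bc), for one ordered pair of neighboring amino acids from a 2x2 contingency table of neighbor counts. How the features on this slide were actually defined is not stated here; the helper names (yule_q, yule_for_pair), the use of immediately adjacent residues as "neighbors", and the toy segments are all assumptions made for illustration.

```python
from collections import Counter


def yule_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]]; returns 0.0 when undefined."""
    num = a * d - b * c
    den = a * d + b * c
    return num / den if den else 0.0


def yule_for_pair(segments, x, y):
    """Association between residue x being followed by residue y.

    Assumption: a "neighbor" is the immediately following residue.
    """
    pairs = Counter()
    for seg in segments:
        pairs.update(zip(seg, seg[1:]))
    total = sum(pairs.values())
    a = pairs[(x, y)]                                                # x followed by y
    b = sum(n for (p, q), n in pairs.items() if p == x and q != y)   # x followed by other
    c = sum(n for (p, q), n in pairs.items() if p != x and q == y)   # other followed by y
    d = total - a - b - c                                            # neither
    return yule_q(a, b, c, d)


if __name__ == "__main__":
    # Toy segments, not real TM or soluble helices.
    toy_segments = ["ALAL", "GG"]
    print(yule_for_pair(toy_segments, "A", "L"))
```

Computing the same value separately on TM helices and on soluble helices, and keeping the pairs whose values differ most, is one way such features could be made discriminative.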
7
Attempts to overcome confusion with globular soluble helices (2)
• Evolutionary profile information
– It is known that knowledge of the evolutionary profile of a protein can improve prediction accuracy to a great extent
• TMpro is capable of predicting TM segments without requiring knowledge of the profile
– Useful when you cannot extract sequence alignments from known proteins
• But where the profile is known, we would like to use that additional information
8
Profile generation
• Get multiple sequence alignments
• Compute position-specific scoring matrix for each protein
– 21 rows (20 amino acids, and 1 row for gaps)
• Profile is generated for each protein in the training and test sets
Those of you who have worked with evolutionary analysis before, please give feedback
PSSM(i,j) = log(C(i,j) / total counts at position j)
            log(C(i,j) / unigram count of i in the protein)
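The profile computation can be sketched as follows. This is a minimal version that implements only the first term above, log(C(i,j) / total counts at position j), over 21 symbols (20 amino acids plus gap). The pseudocount (added so that unseen symbols do not give log(0)), the function name profile_from_alignment, and the choice to skip unrecognized symbols are assumptions, not part of the slide. The input is assumed to be fully spelled-out aligned sequences of equal length (no "." identity shorthand).

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ROWS = AMINO_ACIDS + "-"  # 20 amino acids plus one gap symbol = 21 rows


def profile_from_alignment(aligned, pseudocount=1.0):
    """Return one list of 21 log-frequencies per alignment position."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for j in range(length):
        # C(i, j): count of symbol i at position j, started at the pseudocount.
        counts = {sym: pseudocount for sym in ROWS}
        for seq in aligned:
            sym = seq[j].upper()
            if sym in counts:  # assumption: unrecognized symbols are skipped
                counts[sym] += 1
        total = sum(counts.values())
        profile.append([math.log(counts[sym] / total) for sym in ROWS])
    return profile


if __name__ == "__main__":
    toy_alignment = ["AC-", "AC-", "AG-"]  # toy data
    prof = profile_from_alignment(toy_alignment)
    print(len(prof), len(prof[0]))
```

Each returned entry is a 21-element vector for one residue position, which matches the 21-input neural network described later in the deck.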
9
Doubts
• We have labels for training sequences
– But when the original sequence has gaps after alignment, how should the labels at the gap positions be interpreted?
                --n------n----n------nnn-----n------n-----------------M-----
2a65        369 --D------E----L------KLS-----R------K-----------------H----- 377
2A65_A      369 --.------.----.------...-----.------.-----------------.----- 377
AAC07817    369 --.------.----.------...-----.------.-----------------.----- 377
YP_001956   364 --E------S----F------G.K-----.------.-----------------T----- 372

                -M------M------M------M-------M----------M---------MM-------
2a65        378 -A------V------L------W-------T----------A---------AI------- 385
2A65_A      378 -.------.------.------.-------.----------.---------..------- 385
AAC07817    378 -.------.------.------.-------.----------.---------..------- 385
YP_001956   373 -S------C------.-----------------------------------IL------- 377
Even TM regions contain gaps, as shown above
What labels to assign to gaps?
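The label rows in the alignment above can be produced mechanically: walk along the aligned sequence, copy the next residue label at every residue, and emit a placeholder at every gap. The sketch below does only that; it does not answer the question of what a gap's label should be. The function name project_labels and the gap_label parameter are hypothetical.

```python
def project_labels(labels, aligned_seq, gap_label="-"):
    """Spread per-residue labels over an aligned (gapped) sequence.

    labels      : one label character per residue of the ungapped sequence
    aligned_seq : the same sequence after alignment, with "-" for gaps
    gap_label   : placeholder written at gap columns (an open choice)
    """
    n_residues = sum(1 for ch in aligned_seq if ch != "-")
    if n_residues != len(labels):
        raise ValueError("label count must equal the number of residues")
    remaining = iter(labels)
    return "".join(gap_label if ch == "-" else next(remaining)
                   for ch in aligned_seq)


if __name__ == "__main__":
    # Toy fragment in the style of the slide: n = non-membrane, M = membrane.
    print(project_labels("nnM", "--D--E-L"))
```

Passing a different gap_label (for example the label of the nearest residue) would be one way to experiment with alternatives.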
10
Doubts
• When nothing is shown (gap/alignment) for some sequences, I am counting those as gaps
XP_659910  47 L-......K.----------...KAP----RSNQV.-..FVAGTMGLASAVGA.AT 86
AAW43619  100    .....A..A-----------KNP----NTTRNV-..FMVGALGALGASSV.ST 136
CAB59195   59                           ----.N.RP.-A..VIGSARFAYMAWTRVA 83
XP_466001 107                                SKRA.-A.FVLSGGRFIYASLLRLL 130
AAA20832  103                                SKRA.-A.FVLTGGRFVYASLVRLL 126
What to do with missing segment information for some sequences?
11
Using profile for prediction
Studied independent of TMpro
Neural network with 21 input, 21 hidden and 1 output neurons
[Figure: predicted output (nonmembrane = 0, membrane = 1) versus residue number, with experimentally observed locations of TM helices marked]
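For readers unfamiliar with the network shape, here is a minimal forward pass for a 21-input, 21-hidden, 1-output network that maps one profile column to a value between 0 (nonmembrane) and 1 (membrane). The sigmoid activations, the random untrained weights, and the names init_network and forward are assumptions; the slide does not specify the activation functions or the training procedure.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in=21, n_hidden=21, seed=0):
    """Random, untrained weights; for illustrating the shapes only."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(column, net):
    """One profile column (21 values) in, one membrane score in (0, 1) out."""
    w1, b1, w2, b2 = net
    hidden = [sigmoid(sum(w * x for w, x in zip(row, column)) + b)
              for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


if __name__ == "__main__":
    net = init_network()
    toy_column = [0.0] * 21  # toy input, not a real profile column
    print(forward(toy_column, net))
```

Applying forward to every column of a protein's profile gives one value per residue, which is the kind of curve plotted against residue number.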
12
Another output
13
NN architecture needs to be modified
But instead I did post-processing of the neural network output
Computed wavelet transform
Mexican hat wavelet, scale = 10
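The post-processing step can be sketched as a single-scale continuous wavelet transform of the per-residue network output. The version below uses the unnormalized Mexican hat wavelet psi(t) = (1 - t^2) exp(-t^2 / 2) and a 1/sqrt(scale) factor. The function names, the omission of the wavelet's constant normalization factor, and the toy input are assumptions; the actual implementation used for the slides is not described.

```python
import math


def mexican_hat(t):
    """Unnormalized Mexican hat (Ricker) wavelet."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def wavelet_transform(signal, scale=10.0):
    """Wavelet coefficient at every position of the signal, at one scale."""
    norm = 1.0 / math.sqrt(scale)
    return [
        norm * sum(x * mexican_hat((k - b) / scale) for k, x in enumerate(signal))
        for b in range(len(signal))
    ]


if __name__ == "__main__":
    # Toy "network output": a 20-residue stretch of 1s flanked by 0s.
    toy_output = [0.0] * 40 + [1.0] * 20 + [0.0] * 40
    coeffs = wavelet_transform(toy_output, scale=10.0)
    peak = max(range(len(coeffs)), key=coeffs.__getitem__)
    print(peak, coeffs[peak])
```

At scale 10 the positive lobe of the wavelet spans about 20 residues, so a stretch of roughly that length gives a strong positive response at its center and negative responses on its flanks.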
14
Some more wavelet outputs
Note that these are from the training data itself. Overall performance is yet to be checked.
15
Part 2: Stability upon Mutations
16
Evaluation of predictions of protein stability changes upon mutations
• Effects of mutations on 2 TM proteins are available in our group
– The two proteins are rhodopsin and bacteriorhodopsin
– Data available for how much misfolding occurs
– How stability of the protein is affected
• There are algorithms that can also predict these changes
• We compared how accurate or reliable the prediction methods are by comparing their results with our experimental data
17
3 Prediction algorithms
• I-Mutant2.0
– Support vector machine
– Features: amino acid neighbors in a 9 Å sphere, temperature, pH, relative solvent accessible surface area
– http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi
• DFIRE
– Knowledge-based statistical potentials
– http://phyyz4.med.buffalo.edu/hzhou/mutation.html
• FOLDX
– Statistical mechanics; accounts for various energy terms
– http://fold-x.embl-heidelberg.de:1100/
18
Authors’ claims in 3 papers
19
Our results
Rhodopsin (PDB: 1U19)
          Number of known mutations   I-Mutant   DFIRE   FOLDX
Folding   52                          54.7       57.7    50
Meta 2    32                          78.1       73.3    46.9
Both      84                          64.3       63.0    50.6

Bacteriorhodopsin (PDB: 1QM8)
          Number of known mutations   I-Mutant   DFIRE   FOLDX
Folding   147                         35.4       37.1    55.7
Meta 2    159                         56.0       47.5    67.2
Both      279                         55.3       38.7    52.7
20
Bias in # of mutations that increase/decrease stability
Database bias affects apparent accuracies of algorithms
I-Mutant, for example, predicts a decrease in stability for a majority of the mutations.
Whether the mutations studied experimentally preserve the natural bias toward stability-decreasing mutations affects the apparent accuracy of the prediction algorithms.
                    Experimental   I-Mutant   DFIRE   FOLDX
Rhodopsin           63             75         46      66
Bacteriorhodopsin   81             97         81      65
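The effect of this bias on apparent accuracy can be seen with a toy calculation: a "predictor" that always answers "decrease" scores exactly the percentage of stability-decreasing mutations in the test set. The values below are invented toy numbers, not the rhodopsin or bacteriorhodopsin data, and the sign convention (negative = decrease in stability) and function names are assumptions.

```python
def percent_destabilizing(values):
    """Percentage of values indicating a decrease in stability (negative sign)."""
    return 100.0 * sum(v < 0 for v in values) / len(values)


def sign_accuracy(predicted, experimental):
    """Percentage of mutations where the predicted and measured signs agree."""
    hits = sum((p < 0) == (e < 0) for p, e in zip(predicted, experimental))
    return 100.0 * hits / len(experimental)


if __name__ == "__main__":
    # Toy stability changes: 7 of 10 are destabilizing.
    experimental = [-1.2, -0.4, -2.0, 0.3, -0.8, 0.5, -1.5, -0.1, 0.9, -0.6]
    always_decrease = [-1.0] * len(experimental)

    print(percent_destabilizing(experimental))
    print(sign_accuracy(always_decrease, experimental))
```

Comparing each method's accuracy against this trivial baseline on the same data set is one way to separate real predictive power from database bias.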
21
Correlation with known data
                    I-Mutant   DFIRE   FOLDX
Rhodopsin           0.11       0.16    0.24
Bacteriorhodopsin   -0.09      0.18    -0.18
Reported correlations for these methods are quite large (>0.7)
On the data compared here, the correlations are quite low
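A correlation of this kind can be computed as below. The slide does not say which correlation coefficient was used; this sketch assumes the Pearson coefficient between predicted and experimentally measured stability changes, and the numbers in the example are toy values rather than the data behind the table.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy predicted vs. measured stability changes.
    predicted = [-1.0, -0.5, 0.2, -2.0, 0.4]
    measured = [-0.8, 0.1, -0.3, -1.5, 0.6]
    print(round(pearson(predicted, measured), 2))
```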
22
Notes
• Local installations of blast and netblast are on cologne:
– /usr1/blast-2.2.13/
– /usr1/netblast-2.2.13/
• Java SDK on cologne
– /usr1/j2sdk1.4.2_11/
23
Acknowledgements
Judith Klein-Seetharaman
Christopher Jon Jursa, Pitt Information Sciences (for developing the web interface)