Introduction to Bioinformatics - Tutorial no. 8 Predicting protein structure PSI-BLAST
Dec 21, 2015
PHDsec and PSIpred
PHDsec (Rost & Sander, 1993): based on sequence family alignments
PSIpred (Jones, 1999): based on PSI-BLAST profiles
Both consider long-range interactions
PSIpred Input
Input sequence
Type of Analysis
PSIpred Input (2)
Filtering Options
Email address
GO!
PSIpred Output
Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
AA: Target sequence

Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE
              10        20        30        40        50        60

Conf: 742888731467888768899999999999999987557888998875227887303678
Pred: HHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHH
  AA: LASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIA
              70        80        90       100       110       120
Confidence level
Predicted structure
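As a rough illustration of how the Conf and Pred lines can be read together, the sketch below groups a prediction string into runs of the same state and reports the mean confidence of each run. The helper name `segments` is hypothetical and is not part of PSIpred; the input is the first 20 residues of the example output.

```python
from itertools import groupby


def segments(pred, conf):
    """Split a prediction string into runs of a single state.

    Returns (state, start, end, mean_confidence) tuples with 1-based,
    inclusive positions. Hypothetical helper, not part of PSIpred.
    """
    if len(pred) != len(conf):
        raise ValueError("Pred and Conf lines must have the same length")
    result = []
    pos = 0
    for state, run in groupby(pred):
        length = len(list(run))
        # Each Conf character is a single digit between 0 and 9.
        digits = [int(c) for c in conf[pos:pos + length]]
        result.append((state, pos + 1, pos + length, sum(digits) / length))
        pos += length
    return result


# First 20 residues of the example output.
pred = "CCCCCCCCCCHHHHHHHHHH"
conf = "98876666763788999987"
for state, start, end, mean_conf in segments(pred, conf):
    print(f"{state} {start:3d}-{end:3d}  mean confidence {mean_conf:.1f}")
```

Segments with a low mean confidence are the ones worth checking by eye against the solved structure.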
PHDsec Input (1)
Email address
Type of prediction
Additional output
Output format
Reduce processing
PHDsec Input (2)
Type (number) of input sequences
Upload file
Enter sequence
Wait for results?
PHDsec Output (1)
Protein classification
Structure proportions
Amino acid proportions
PHDsec Output (2)
Estimated structure
Confidence level
Structure with high confidence
PSI-BLAST
Position-Specific Iterative BLAST
An extension of BLASTP
Finds more distantly related sequences, i.e. distant sequences whose E-values in an ordinary search are insignificant
Even in distantly related sequences, important domains can be highly conserved; PSI-BLAST gives more weight to those
PSI-BLAST Profile
When closely related sequences are aligned, areas of conservation become visible.
The scoring matrix becomes position specific:
Each column has its own set of amino acid (a.a.) frequencies.
The score is column specific, based on a.a. frequency.
A more frequent a.a. gets a higher score.
A new sequence is scored against the new scoring matrix.
123456
AMTYQR
CTTYQS
SMTYQA
Position-Specific Scoring Matrix
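A minimal sketch of the idea, using the three-sequence alignment above: count the amino acids in each column, convert the counts to frequencies, and score each amino acid by its log-odds against a background. The uniform background (1/20), the single pseudocount and the function names are assumptions made for illustration; PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {amino acid: log-odds score} dict per alignment column."""
    n_seqs = len(alignment)
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        scores = {}
        for aa in AMINO_ACIDS:
            # The pseudocount keeps unseen amino acids from scoring -infinity.
            freq = (counts[aa] + pseudocount * background) / (n_seqs + pseudocount)
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score(sequence, pssm):
    """Sum the column-specific scores of an ungapped, same-length sequence."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence and profile must have the same length")
    return sum(column[aa] for aa, column in zip(sequence, pssm))


alignment = ["AMTYQR", "CTTYQS", "SMTYQA"]
pssm = build_pssm(alignment)

# Column 2 holds M, T, M: M outscores T, and an unseen residue scores below zero.
for aa in "MTA":
    print(f"column 2, {aa}: {pssm[1][aa]:+.2f}")
print(f"AMTYQR scores {score('AMTYQR', pssm):.2f}")
```

The fully conserved columns (T, Y, Q at positions 3 to 5) contribute the highest scores, which is how the profile gives more weight to conserved positions.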
A PSI-BLAST Iteration
Collect all database sequence segments that have been aligned with the query sequence with an E-value below the set threshold (default 0.01).
Construct a position-specific scoring matrix for the collected sequences. Rough idea:
Align all sequences to the query sequence as the template.
Assign weights to the sequences.
Construct the position-specific scoring matrix.
Find sequences that match the profile.
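The loop described above can be sketched as a toy program. This is not PSI-BLAST: it assumes ungapped sequences of equal length, weights all sequences equally, uses a uniform background, and replaces the E-value threshold with a raw score cutoff. The function names and the small database are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumed uniform background


def build_pssm(seqs, pseudocount=1.0):
    """Log-odds profile from equal-length, ungapped sequences (equal weights)."""
    total = len(seqs) + pseudocount
    pssm = []
    for column in zip(*seqs):
        counts = Counter(column)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount * BACKGROUND) / total / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(seq, pssm):
    """Sum of column-specific scores for an ungapped sequence."""
    return sum(column[aa] for aa, column in zip(seq, pssm))


def toy_psi_blast(query, database, threshold, max_iterations=5):
    """Iterate profile building and searching until no new sequences are added."""
    included = [query]
    for _ in range(max_iterations):
        pssm = build_pssm(included)            # profile from current hits
        hits = [s for s in database if score(s, pssm) >= threshold]
        new_included = [query] + [s for s in hits if s != query]
        if set(new_included) == set(included):  # converged
            break
        included = new_included
    return included


# Hypothetical database: two close relatives, one distant relative, one unrelated.
database = ["CTTYQS", "SMTYQA", "CTSYQS", "GGGGGG"]
found = toy_psi_blast("AMTYQR", database, threshold=6.0)
print(found)
```

In this toy run the distant sequence CTSYQS scores below the cutoff against the query alone, but is picked up once the profile has been rebuilt from the first round of hits.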
Using PSI-BLAST (1)
Available from the main BLAST page
Or switch on in BLASTP
E value threshold for initial inclusion in multiple alignment for profile
Using PSI-BLAST (2)
Select whether to include in next iteration
New result
Align selected sequences, generate profile, search again
Number of results to show next iteration
Exercise 1
1. There is a protein with an unknown structure:
>some protein
MEAFLGTWKMEKSEGFDKIMERLGVDFVTRKMGNLVKPNLIVTDLGGGKYK
MRSESTFKTTECSFKLGEKFKEVTRFTRGHFFMITVENGVMKHEQDDKTKV
TYIERVVEGNELKATVKVDEVVCVRTYSKVA
Can BLAST help us to predict its SS?
2. Use any secondary structure prediction method to predict the secondary structure of 1O8V and compare it to the solved structure.
NOTICE! The secondary structure definition in PDB is given in a 7-letter code instead of the 3-letter code (H, E, C). For comparison purposes consider G, H and I as H; E as E; and all the rest, including spaces, as C.
3. What can you conclude about the secondary structure prediction in this case?
4. Are the results consistent with the confidence value of the prediction?
5. Can you explain the prediction results based on the real structure?
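For the comparison in item 2, the reduction from the 7-letter code to three states, together with a simple per-residue agreement score, can be sketched as follows. The function names are hypothetical, the agreement score is an addition for illustration rather than something the exercise prescribes, and the two example strings are made up (they are not taken from 1O8V).

```python
def reduce_to_three_states(observed):
    """Map a 7-letter secondary structure string to H/E/C.

    G, H and I become H; E stays E; everything else, including
    spaces, becomes C (the rule given in the exercise).
    """
    helix_codes = set("GHI")
    return "".join(
        "H" if s in helix_codes else "E" if s == "E" else "C"
        for s in observed
    )


def agreement(predicted, observed):
    """Fraction of positions where two 3-state strings agree."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not predicted:
        raise ValueError("strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Made-up example strings, for illustration only.
observed_7 = " SEEEEETT GGGHHHH "
predicted_3 = "CCEEEEECCCCHHHHHHC"

observed_3 = reduce_to_three_states(observed_7)
print(observed_3)
print(f"agreement: {agreement(predicted_3, observed_3):.2f}")
```

Both strings must cover the same residues in the same order, so check that the numbering of the PDB file matches the sequence submitted for prediction before comparing.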
Exercise 2
• The prion is the protein responsible for mad cow disease. In the normal state, the amino acids in a specific region are arranged in an α-helix (H1). In the abnormal state, this region changes to a β-strand conformation.
• This conformational change is thought to be the origin of the disease, which leads to rapid degeneration of the nervous system and usually causes death.
• It is assumed that prion molecules that have changed conformation accelerate the conformational change of additional molecules.
1. Check what conformation is predicted for this protein.
2. The PDB code of the prion protein is 1ag2. The helix is located at positions 21-30 of the sequence in this file. Does the predicted SS correlate with the real one in the region of interest?