A sequence alignment is a representation of a whole series of evolutionary events, which left traces in the sequences.
Things that are more likely to happen during evolution should be most prominently observed in your alignment.
The purpose of a sequence alignment is to line up all residues in the sequence that were derived from the same residue position in the ancestral gene or protein.
To carry over information from a well-studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that reflects the protein structures as they are today: a structural alignment.
The implicit meaning of placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment is that they are at the equivalent position in the 3D structures of the corresponding proteins!!
One can only transfer information if the similarity between the two sequences is sufficiently high.
Schneider (group of Sander) determined the “threshold curve” for transferring structural information from one known protein structure to another protein sequence:
If the sequences are longer than 80 amino acids (aa), more than 25% sequence identity is enough to reliably transfer structural information.
If the sequences are shorter, a higher percentage of identity is needed.
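The rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the sequences are assumed to be already aligned in one-letter code with "-" for gaps, and the length is taken as the number of aligned (non-gap) positions. For short alignments the sketch returns None, because the text only says that a higher identity is needed without giving a number.

```python
from typing import Optional, Tuple


def percent_identity(seq_a: str, seq_b: str) -> Tuple[float, int]:
    """Return (percent identity, number of aligned positions) for two
    already-aligned sequences of equal length; '-' marks a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Only columns where both sequences have a residue count as aligned.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs), len(pairs)


def passes_long_alignment_rule(identity: float, aligned_length: int) -> Optional[bool]:
    """Apply the rule of thumb from the text: more than 80 residues and
    more than 25% identity. Returns None for shorter alignments, where
    the required (higher) identity is not specified here."""
    if aligned_length <= 80:
        return None
    return identity > 25.0
```

For example, percent_identity("ACDE-F", "ACDQGF") compares five aligned positions, four of which are identical.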
The alignment procedure depends on the information available:
1) Use “only” identity of amino acid and its physico-chemical properties. This is more or less what alignment programs do.
2) Also explicitly use the secondary structure preference of the amino acids. Example: aligning 2 helices when sequence identity is low
3) Use 3D information if one or more of the structures in the alignment are known.
In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.
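One way to make "knowledge of the amino acids" concrete is to group residues by physico-chemical character and check whether two aligned residues are identical, similar, or different. The grouping below is a deliberately simplified assumption for illustration, not a classification taken from this lecture, and classify_pair is a made-up helper name.

```python
# Hypothetical, simplified grouping of the 20 amino acids (three-letter code).
PROPERTY_CLASS = {
    "ALA": "aliphatic", "VAL": "aliphatic", "LEU": "aliphatic",
    "ILE": "aliphatic", "MET": "aliphatic",
    "PHE": "aromatic", "TYR": "aromatic", "TRP": "aromatic",
    "SER": "polar", "THR": "polar", "ASN": "polar", "GLN": "polar",
    "LYS": "positive", "ARG": "positive", "HIS": "positive",
    "ASP": "negative", "GLU": "negative",
    # Residues with special roles each get a class of their own.
    "GLY": "glycine", "PRO": "proline", "CYS": "cysteine",
}

GAP = "---"


def classify_pair(res_a: str, res_b: str) -> str:
    """Classify one alignment column of two residues as 'gap',
    'identical', 'conservative' (same class) or 'different'."""
    if res_a == GAP or res_b == GAP:
        return "gap"
    if res_a == res_b:
        return "identical"
    if PROPERTY_CLASS[res_a] == PROPERTY_CLASS[res_b]:
        return "conservative"
    return "different"
```

With this grouping, ILE aligned with VAL counts as a conservative substitution, while LEU aligned with THR does not.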
     1   2   3   4   5   6   7   8   9   10  11
A    ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1   VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
B2   VAL CYS ARG --- --- --- THR PRO GLU ALA ILE
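The two alternative placements of the gap (B1 and B2) in the alignment above can be compared with a short Python sketch. The helper names are made up for this example. It only counts identical residues and lists the gap columns; it does not decide which placement is structurally correct, which is exactly where 3D information or knowledge of the amino acids comes in.

```python
GAP = "---"

# The three rows of the example alignment, one residue per column.
A = "ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL".split()
B1 = "VAL CYS ARG THR PRO --- --- --- GLU ALA ILE".split()
B2 = "VAL CYS ARG --- --- --- THR PRO GLU ALA ILE".split()


def count_identities(ref, other):
    """Return (identical residues, aligned non-gap columns)."""
    aligned = [(r, o) for r, o in zip(ref, other) if r != GAP and o != GAP]
    identical = sum(r == o for r, o in aligned)
    return identical, len(aligned)


def gap_columns(row):
    """Return the 1-based column numbers that contain a gap."""
    return [i for i, res in enumerate(row, start=1) if res == GAP]


for name, row in (("B1", B1), ("B2", B2)):
    identical, aligned = count_identities(A, row)
    print(f"{name}: {identical}/{aligned} identical, gaps at {gap_columns(row)}")
```

Counting identities alone gives B1 five matches and B2 four matches over eight aligned columns, so a purely sequence-based score slightly prefers B1. Whether the three-residue gap really belongs at columns 6 to 8 or at columns 4 to 6 cannot be settled by this count.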
• Applying these lessons to the practical exercises
• Performing your own bioinformatics research project!
Take home lesson:
Please remember to always use all structural information available to you to optimize a sequence alignment. This can be real 3D data, but can also be “just” your own knowledge about the properties and preferences of the amino acids.