•PROSITE - A Dictionary of Protein Sites and Patterns
•1328 patterns and 577 profiles/matrices (Dec 2005)
•For every pattern or profile there is documentation (e.g. PDOC00975) covering:
- taxonomic occurrence
- domain architecture
- function
- 3D structure
- main characteristics of the sequence
- some references
•If a regular expression fails to define the motif properly, we need a profile.
•Profiles are specific representations that incorporate the entire information of a multiple sequence alignment.
•A profile is a position-specific scoring scheme: for each position in the sequence it holds 20 scores, one per residue type, and sometimes also two values for gap opening and gap extension.
•Profiles provide a sensitive means of detecting distant sequence relationships.
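The position-specific scoring idea can be sketched in a few lines. This is a minimal illustration, not the PROSITE profile format: it assumes an ungapped alignment, a uniform background frequency for all 20 residues, and a simple pseudocount, and it ignores gap penalties. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of 20 log-odds scores per column of an ungapped alignment."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, segment):
    """Sum the position-specific scores of one segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    scored = [(score_window(profile, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)


# Hypothetical toy alignment of a five-residue motif.
alignment = ["GKSTL", "GKTTL", "GRSSL", "GKSTI"]
profile = build_profile(alignment)
score, start = best_match(profile, "AAAGKSTLAAA")
print(f"best window starts at {start} with score {score:.2f}")
```

Unlike a regular expression, a residue never seen in the alignment does not rule a match out; it only lowers the score, which is what makes profiles more tolerant of distant relatives.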
The window size of the hydrophobicity plot can be changed. A small window produces "noisier" plots that more accurately reflect highly local hydrophobicity. A window of about 19 residues is generally optimal for recognizing the long hydrophobic stretches that typify transmembrane segments.
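The effect of the window size can be tried out with a small sliding-window average. This sketch assumes the Kyte-Doolittle hydropathy scale (the text above does not name a scale); the function name and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumption: the text does not name a scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every window; entry i covers sequence[i:i + window]."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]  # KeyError on non-standard residues
    if len(values) < window:
        return []
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        # Slide the window one residue: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        profile.append(running / window)
    return profile


# Hypothetical sequence: a 19-residue hydrophobic stretch flanked by charged residues.
sequence = "D" * 10 + "L" * 19 + "D" * 10
wide = hydropathy_profile(sequence, window=19)
narrow = hydropathy_profile(sequence, window=5)
print(f"window 19: peak {max(wide):.2f} at index {wide.index(max(wide))}")
print(f"window 5:  {len(narrow)} values, peak {max(narrow):.2f}")
```

With the wide window the plot peaks only where the whole window sits on the hydrophobic stretch; with the narrow window every short hydrophobic patch reaches the same height, which is the "noise" described above.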
Proteins have intrinsic signals that govern their transport and localization in the cell (nucleus, ER, mitochondria, chloroplasts).
Specific amino acid sequences determine whether a protein will pass through a membrane into a particular organelle, become integrated into the membrane, or be exported out of the cell.
The common structure of signal peptides from various proteins is described as:
• a positively charged (N-terminal) n-region
• followed by a hydrophobic h-region (which can adopt an α-helical conformation in a hydrophobic environment)
• and a neutral but polar c-region (cleavage region; the signal sequence is cleaved off here after delivering the protein at the right site).
The (-3, -1) rule states that the residues at positions –3 and –1 (relative to the cleavage site) must be small and neutral for cleavage to occur correctly.
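The rule can be written as a simple check on a candidate cleavage site. This is only a sketch: the set of "small and neutral" residues used here (Ala, Gly, Ser, Cys, Thr) is an assumption, and real predictors weigh the n-, h- and c-regions as well. The function names and the example sequence are illustrative.

```python
# Assumption: residues treated as "small and neutral" for this sketch.
SMALL_NEUTRAL = frozenset("AGSCT")


def satisfies_rule(sequence, cleavage_index):
    """Check the (-3, -1) rule for a cleavage site.

    cleavage_index is the 0-based index of the +1 residue, i.e. the first
    residue of the mature protein; cleavage happens just before it.
    """
    if not 3 <= cleavage_index < len(sequence):
        raise ValueError("cleavage_index must leave positions -3 and +1 inside the sequence")
    return (sequence[cleavage_index - 3] in SMALL_NEUTRAL
            and sequence[cleavage_index - 1] in SMALL_NEUTRAL)


def candidate_cleavage_sites(sequence, search_limit=40):
    """List every index in the first search_limit residues that passes the rule."""
    upper = min(search_limit, len(sequence))
    return [i for i in range(3, upper) if satisfies_rule(sequence, i)]


# Illustrative N-terminal sequence; cleavage is assumed to occur before index 21.
sequence = "MKKTAIAIAVALAGFATVAQAAPKD"
print(satisfies_rule(sequence, 21))
print(candidate_cleavage_sites(sequence))
```

The rule alone usually passes several positions, so it serves as a filter on candidate sites, not as a predictor by itself.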
Signal Peptides (3)
Feature | Eukaryotes | Prokaryotes: Gram-negative | Prokaryotes: Gram-positive
Total length (average) | 22.6 aa | 25.1 aa | 32.0 aa
n-regions | only slightly Arg-rich | Lys+Arg-rich | Lys+Arg-rich
h-regions | short, very hydrophobic | slightly longer, less hydrophobic | very long, less hydrophobic
c-regions | short, no pattern | short, Ser+Ala-rich | longer, Pro+Thr-rich
-3,-1 positions | small and neutral residues | almost exclusively Ala | almost exclusively Ala
+1 to +5 region | no pattern | rich in Ala, Asp/Glu, and Ser/Thr | rich in Ala, Asp/Glu, and Ser/Thr
•Although they are usually found in non-coding genomic regions, repeating sequences are also found within genes.
•They range from repeats of a single amino acid, through three-residue short tandem repeats (e.g. in collagen), to the repetition of homologous domains of 100 or more residues.
•Duplicated sequence segments occur in 14 % of all proteins, but eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins.
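The two simplest kinds of repeat mentioned above can be located with regular expressions. This sketch finds runs of a single amino acid and collagen-like stretches in which every third residue is glycine (Gly-X-Y); detecting diverged domain repeats needs alignment-based methods and is not attempted here. The function names, thresholds and example sequence are hypothetical.

```python
import re


def single_residue_runs(sequence, min_length=4):
    """Find runs of one amino acid repeated at least min_length times."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(0)) for m in pattern.finditer(sequence)]


def glycine_triplet_stretches(sequence, min_copies=3):
    """Find stretches of at least min_copies consecutive Gly-X-Y triplets."""
    pattern = re.compile(r"(?:G..){%d,}" % min_copies)
    return [(m.start(), m.group(0)) for m in pattern.finditer(sequence)]


# Hypothetical sequence with a glutamine run and a short collagen-like stretch.
sequence = "MKQQQQQLAGPPGAPGEKGSW"
print(single_residue_runs(sequence))
print(glycine_triplet_stretches(sequence))
```

Each hit is reported as a (start index, matched text) pair, so the repeat length is simply the length of the matched text.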
The coiled-coil is a ubiquitous protein motif that is often used to control oligomerisation.
It is found in many types of proteins, including transcription factors, viral fusion peptides, and certain tRNA synthetases.
Most coiled-coil sequences contain heptad repeats - seven residue patterns denoted abcdefg in which the a and d residues (core positions) are generally hydrophobic.
A number of programs are available to predict coiled-coil regions in a protein: COILS, Paircoil, MultiCoil.
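The heptad idea can be illustrated with a crude register test. This is not the method used by COILS, Paircoil or MultiCoil, which score windows against residue statistics from known coiled coils; it only asks, for each of the seven possible registers, what fraction of the a and d positions is hydrophobic. The hydrophobic residue set, the function names and the example sequence are assumptions.

```python
# Assumption: residues counted as hydrophobic for this sketch.
HYDROPHOBIC = frozenset("AILMFVWY")


def heptad_core_fraction(sequence, register_offset):
    """Fraction of a/d positions that are hydrophobic.

    register_offset is the index (0-6) of a residue assumed to sit at
    heptad position 'a'; 'd' is then three residues further along.
    """
    core = [aa for i, aa in enumerate(sequence)
            if (i - register_offset) % 7 in (0, 3)]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


def best_register(sequence):
    """Return the offset (0-6) whose a/d positions are most hydrophobic."""
    return max(range(7), key=lambda offset: heptad_core_fraction(sequence, offset))


# Hypothetical idealized heptad (abcdefg = L K E L E D K), repeated four times.
sequence = "LKELEDK" * 4
offset = best_register(sequence)
print(offset, heptad_core_fraction(sequence, offset))
```

A sequence without heptad periodicity gives similar fractions for all seven registers, whereas a coiled-coil-like sequence shows one register that clearly stands out.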