Assignment of PROSITE motifs to topological regions: Application to a novel database of well characterised transmembrane proteins Tim Nugent 6 Month Report 11-05-07



Assignment of PROSITE motifs to topological regions: Application to a novel database of well characterised transmembrane proteins

Tim Nugent

6 Month Report 11-05-07


Identification of Transmembrane Regions

Most hydrophobic:          Least hydrophobic:

Isoleucine       4.5       Arginine        -4.5
Valine           4.2       Lysine          -3.9
Leucine          3.8       Glutamic acid   -3.5
Phenylalanine    2.8       Asparagine      -3.5
Cysteine         2.5       Glutamine       -3.5

To generate data for a hydropathy plot, the protein sequence is scanned with a moving window of 19-21 residues. At each position, the mean hydrophobic index of the amino acids within the window is calculated and plotted at the window's midpoint.
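The moving-window calculation can be sketched as follows. This is a minimal illustration, not the author's code: the values shown on the slide match the Kyte-Doolittle hydropathy scale, so that scale is assumed here, and the function name `hydropathy_profile` is hypothetical. The window is restricted to odd sizes (19 or 21 from the range given) so that the midpoint falls on a residue.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (assumed scale; the slide lists only its extreme values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return (midpoint, mean_index) pairs; midpoints are 1-based positions."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    profile = []
    # Slide the window one residue at a time along the sequence.
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile
```

Plotting the second element of each pair against the first gives the hydropathy curve; sustained peaks roughly one window wide are candidate transmembrane helices.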

[Figure: Aquaporin]


The Positive Inside Rule

[Figure: bar chart of average amino acid fraction (y-axis, 0 to 0.6) for hydrophobic and positive residues in inside loops, outside loops and helices.]

Hydrophobic: Val, Phe, Ile, Leu, Met. Positive: Lys, Arg, His.

Cytoplasmic loops are enriched in positively charged residues: the 'positive-inside rule' of von Heijne.
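One way the rule could be applied, once helix positions are known, is to count positive residues in alternating loops and place the richer side in the cytoplasm. The sketch below is an assumption about how this might be coded, not a method taken from the slides; `loop_segments` and `predict_n_terminus` are hypothetical names, helices are given as sorted 1-based inclusive `(start, end)` pairs, and the positive class follows the slide's definition (Lys, Arg, His).

```python
POSITIVE = set("KRH")  # positive class as defined on the slide


def loop_segments(seq_len, helices):
    """Return 1-based inclusive loop spans around sorted (start, end) helices."""
    loops, prev_end = [], 0
    for start, end in helices:
        loops.append((prev_end + 1, start - 1))
        prev_end = end
    loops.append((prev_end + 1, seq_len))
    return loops


def predict_n_terminus(sequence, helices):
    """Guess whether the N-terminus is 'in' or 'out' from loop charge."""
    counts = [0, 0]  # [loops on the N-terminal side, loops on the other side]
    for i, (start, end) in enumerate(loop_segments(len(sequence), helices)):
        segment = sequence[start - 1:end]  # empty if the loop has no residues
        counts[i % 2] += sum(aa in POSITIVE for aa in segment)
    if counts[0] == counts[1]:
        return "undetermined"
    # The side carrying more positive charge is assumed to be cytoplasmic.
    return "in" if counts[0] > counts[1] else "out"
```

Loops alternate sides of the membrane, so even-indexed loops share a side with the N-terminus; a real predictor would weigh this evidence alongside other signals rather than use raw counts alone.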


Assembling a novel data set of transmembrane proteins

In order to study and predict features of transmembrane proteins, the use of a high quality data set containing sequences with experimentally confirmed TM regions is essential.

The data set was based on the widely used Möller test set (2001).

Additional data was collected from MPTOPO, OPM, SWISSPROT and from the literature.

Sequences were blasted against the PDB in order to identify entries for which the TM region had complete structural coverage.

This set was then homology reduced at the 40% sequence identity level.

The final data set contains 141 sequences, all with available structures, verifiable topology and N-terminal locations:

111 Alpha-helical proteins

30 Beta-barrel proteins
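The slides do not say which tool or procedure was used for the 40% homology reduction. As a rough illustration of the idea only, a greedy filter keeps a sequence when it falls below the identity threshold against everything already kept. The names `homology_reduce` and `ungapped_identity` are hypothetical, and the ungapped identity measure is a crude stand-in for the alignment-based identity a real pipeline would use.

```python
def ungapped_identity(a, b):
    """Fraction of identical positions over the shorter sequence (no gaps).

    A crude placeholder; real pipelines compute identity from an alignment.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def homology_reduce(sequences, threshold=0.40, identity=ungapped_identity):
    """Greedily keep sequences below `threshold` identity to all kept ones.

    `sequences` maps a name to a sequence; input order decides which
    member of a similar pair survives.
    """
    kept = {}
    for name, seq in sequences.items():
        if all(identity(seq, other) < threshold for other in kept.values()):
            kept[name] = seq
    return kept
```

Because the filter is order-dependent, sorting the input so that preferred entries (for example, those with better structural coverage) come first would be a natural design choice.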


Column                  Example entry

swissprot               ANASP
accession               Q8YSC4
pdb                     1XIO:A
n_terminal_location     Periplasmic
n_terminal_in_out       Outside
membrane_type           Bacterial gram-negative inner membrane
helix_count             7
topology                A.3,26;B.35,56;C.70,89;D.99,121;E.127,148;F.168,185;G.195,218
swissprot_sequence      MNLESLLHWIYVAGMTIGALHFWSLSRNPRGVPQYEYLVAMFI...
sequence_length         261
class                   Alpha-helical
datasource              OPM
description             Bacteriorhodopsin - Anabaena sp. (strain PCC 7120)
taxonomy                Bacteria; Cyanobacteria; Nostocales; Nostocaceae; Nostoc.
domain                  Prokaryotes
id                      1052

MySQL table schema
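To show how the `topology` and `n_terminal_in_out` columns could be turned into labelled regions, here is a small sketch. It assumes each `topology` item reads as `label.start,end` with 1-based inclusive residue numbers, which is an interpretation of the example entry rather than a documented format; `parse_topology` and `label_regions` are hypothetical names.

```python
def parse_topology(topology):
    """Parse 'A.3,26;B.35,56' into [('A', 3, 26), ('B', 35, 56)]."""
    segments = []
    for item in topology.strip().strip(";").split(";"):
        label, span = item.split(".", 1)
        start, end = (int(x) for x in span.split(","))
        segments.append((label, start, end))
    return segments


def label_regions(topology, seq_len, n_terminal_in_out):
    """Return (region, start, end) tuples covering residues 1..seq_len."""
    side = "inside" if n_terminal_in_out.lower() == "inside" else "outside"
    regions, prev_end = [], 0
    for _, start, end in parse_topology(topology):
        if start > prev_end + 1:  # skip zero-length loops
            regions.append((side, prev_end + 1, start - 1))
        regions.append(("membrane", start, end))
        # Each membrane crossing moves the chain to the opposite side.
        side = "outside" if side == "inside" else "inside"
        prev_end = end
    if prev_end < seq_len:
        regions.append((side, prev_end + 1, seq_len))
    return regions
```

For the example entry (7 helices, N-terminus outside, length 261) this yields 15 regions, starting with an outside segment at residues 1-2 and ending with an inside segment at residues 219-261.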


Amino acid composition

[Figure: two bar charts of average amino acid fraction by residue class (hydrophobic, positive, negative, aromatic, polar) in inside loops, outside loops and helices. Panels: Beta-barrel proteins; Alpha-helical proteins.]

Hydrophobic: Val, Phe, Ile, Leu, Met. Positive: Lys, Arg, His. Negative: Asp, Glu. Aromatic: Phe, Trp, Tyr. Polar: Cys, Pro, His, Asn, Gln, Ser, Thr.
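The class fractions behind charts like these could be computed along the following lines. The residue classes are taken from the slide (note that Phe and His each belong to two classes, so the fractions need not sum to one). How the slide averages the fractions is not stated, so pooling all residues of a region type before dividing is an assumption, and `class_fractions` is a hypothetical name.

```python
# Residue classes as defined on the slide, in one-letter codes.
CLASSES = {
    "hydrophobic": set("VFILM"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "aromatic": set("FWY"),
    "polar": set("CPHNQST"),
}


def class_fractions(segments):
    """Fraction of residues in each class, pooled over all given segments."""
    pooled = "".join(segments).upper()
    if not pooled:
        return {name: 0.0 for name in CLASSES}
    return {
        name: sum(aa in members for aa in pooled) / len(pooled)
        for name, members in CLASSES.items()
    }
```

Calling it once per region type (all inside loops, all outside loops, all membrane segments) would give one group of bars per call.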


Assignment of PROSITE motifs to topological regions

We next explored the possibility that motifs from the PROSITE database could be used as constraints in subsequent topology prediction steps, by identifying a bias in their inside/outside frequency.

Extracellular

Cytoplasm
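A scan of this kind can be sketched by translating a PROSITE pattern into a regular expression and checking which topological region each match overlaps. This is an illustrative sketch, not the author's pipeline: `prosite_to_regex`, `assign_region` and `scan` are hypothetical names, only the common pattern elements are handled, and regions are assumed to be `(name, start, end)` tuples with 1-based inclusive coordinates. The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation pattern (PS00001).

```python
import re

ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate e.g. 'N-{P}-[ST]-{P}' into the regex 'N[^P][ST][^P]'."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def assign_region(start, end, regions):
    """Name the single region a match lies in, or 'multiple' if it spans several."""
    hit = {name for name, r_start, r_end in regions
           if start <= r_end and end >= r_start}
    if not hit:
        return "unassigned"
    return hit.pop() if len(hit) == 1 else "multiple"


def scan(sequence, pattern, regions):
    """Return (start, end, region) for every match, overlapping ones included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    results = []
    for m in regex.finditer(sequence):
        start, end = m.start(1) + 1, m.end(1)  # convert to 1-based inclusive
        results.append((start, end, assign_region(start, end, regions)))
    return results
```

Tallying the third element of each result over the whole data set gives per-motif counts for the inside, outside, membrane and multiple categories.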


Prosite ID Description Inside Outside Helix Multiple n χ2

PS00008 N-myristoylation site 20.23% 32.11% 44.82% 2.68% 598 7.89

PS00006 Casein kinase II phosphorylation site 53.97% 38.41% 5.96% 1.66% 302 18.83

PS00005 Protein kinase C phosphorylation site 63.08% 29.23% 7.69% 0.00% 260 50.23

PS00001 N-glycosylation site 49.51% 33.01% 13.59% 3.88% 103 5.32

PS00004 cAMP- and cGMP-dependent protein kinase phosphorylation site 67.86% 28.57% 3.57% 0.00% 28 10.72

PS00009 Amidation site 85.71% 7.14% 0.00% 7.14% 14 7.65

PS00029 Leucine Zipper 27.27% 0.00% 27.27% 45.45% 11 3.47

PS00221 MIP family signature 100.00% 0.00% 0.00% 0.00% 4 4.63

PS50857 Cytochrome c oxidase subunit II signature 0.00% 100.00% 0.00% 0.00% 4 3.46

Alpha-helical protein PROSITE motif assignments


Prosite ID Description Inside Outside Beta-sheet Multiple n χ2

PS00008 N-myristoylation site 30.29% 34.58% 29.49% 5.09% 373 24.88

PS00006 Casein kinase II phosphorylation site 34.50% 50.22% 11.79% 3.49% 229 7.18

PS00005 Protein kinase C phosphorylation site 31.95% 46.75% 21.30% 0.00% 169 4.79

PS00001 N-glycosylation site 31.13% 39.62% 26.42% 2.83% 106 5.18

PS00007 cAMP- and cGMP-dependent protein kinase phosphorylation site 42.11% 42.11% 5.26% 10.53% 19 2.46

PS00009 Amidation site 15.79% 68.42% 15.79% 0.00% 19 1.25

PS00430 TonB-dependent receptor proteins signature 100.00% 0.00% 0.00% 0.00% 4 8.59

PS00013 Prokaryotic membrane lipoprotein lipid attachment site 100.00% 0.00% 0.00% 0.00% 4 8.59

Beta-barrel protein PROSITE motif assignments
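The slides do not state how the χ2 values in these tables were calculated, so the following is only a generic goodness-of-fit sketch and is not claimed to reproduce them. It assumes observed motif counts per region are compared with expected fractions, for example the share of all residues that fall in each region type; `chi_square` is a hypothetical name.

```python
def chi_square(observed, expected_fractions):
    """Pearson chi-square statistic for counts against expected fractions.

    `observed` holds one count per category; `expected_fractions` holds the
    matching expected proportions, which should sum to 1.
    """
    total = sum(observed)
    statistic = 0.0
    for count, fraction in zip(observed, expected_fractions):
        expected = total * fraction
        statistic += (count - expected) ** 2 / expected
    return statistic
```

For instance, 30 inside and 10 outside occurrences against an even expectation give a statistic of 10.0; with one degree of freedom that is well beyond the conventional 3.84 cutoff for p < 0.05.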


# %labels and the OUTPUT filehandle are assumed to be set up earlier in the script.
my $im = DrawTransmembrane->new(
    -title          => 'CLN3 topology prediction using MEMSAT3_6',
    -n_terminal     => 'in',
    -topology       => '37,56,100,119,129,153,200,224,280,299,353,375',
    -labels         => \%labels,
    -outside_label  => 'Lumen',
    -inside_label   => 'Cytoplasm',
    -membrane_label => 'Membrane',
);

# Write the rendered image as PNG data.
print OUTPUT $im->png;

A Bioperl module to draw transmembrane proteins


Conclusions

I have successfully achieved my major goal for the first 6 months - to create a high quality dataset of transmembrane proteins of known topology.

My second goal - to scan the novel data set against motif and domain databases to identify signatures which were consistently located on either inside or outside loops - has also been completed.

In collaboration with Dr Sara Mole (MRC Laboratory for Molecular Cell Biology), I have begun an analysis of CLN3 (Batten's Disease protein) with a view to predicting the protein's topology using a combination of computational and experimental evidence.

I have written a Perl module that creates graphical representations of transmembrane proteins given the positions of their transmembrane helices and the location of the N-terminus. This module has been accepted by the Bioperl project and will be available in the next release.