Eukaryotic Secretome Prediction and Knowledge-Base Development



1

Eukaryotic Secretome Prediction and Knowledge-Base Development

Xiang-Jia “Jack” Min

Ph.D., Assistant Professor

2nd International Conference on Proteomics & Bioinformatics, Las Vegas, July 2-4, 2012

2

DNA → RNA → protein → phenotype

3

Genome
↓ Transcription
Transcriptome: mRNA (protein-coding DNA sequences)
↓ Translation
Proteome: protein sequences
↓ Secretion
Secretome: proteins with secretory signal peptide

4

Günter Blobel

5

6

7

8

Biomaterials Small molecules

Fungi

secreted enzymes

Yeasts, Moulds, Mushrooms

Biomaterials, Bio-fuels, Enzymes

9

How to identify secreted proteins?

Genome → (Transcription) → Transcriptome → (Translation) → Proteome → (Secretion) → Secretome

(1) Direct identification using proteomics methods (Tsang et al. 2009)

(2) Computational prediction from predicted proteome

(3) EST data mining

10

Secreted Proteins

• Classical secreted proteins have a signal peptide at the N-terminus;

• Not all proteins that have a signal peptide are secreted:

• Signal peptide ≠ secreted protein

11

SignalP: predicts whether a protein contains a signal peptide.

Phobius: signal peptide and transmembrane domain prediction.

WolfPsort: a multiple subcellular location predictor.

TargetP: detects proteins targeted to mitochondria.

TMHMM: transmembrane domain prediction.

PS-Scan: detection of ER-retention signals.

12

13

14

Human cytochrome C oxidase subunit 1 (COX1)

15

16

Data

Kingdom    Secreted  Non-secreted
Fungi           241         5,992
Animals       5,568        19,048
Plants          216         7,528
Protists         32         1,979

17

Method

• Sensitivity (%): $\mathrm{Sn} = \frac{TP}{TP + FN} \times 100$

• Specificity (%): $\mathrm{Sp} = \frac{TN}{TN + FP} \times 100$

• Matthews' Correlation Coefficient (MCC, %): $\mathrm{MCC} = \frac{(TP \times TN - FP \times FN) \times 100}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
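As a worked check, the three formulas on this slide can be applied to a single row of counts. The sketch below is a minimal Python illustration rather than the author's code: the function names are hypothetical, the zero-denominator convention is an assumption, and the example counts are the SignalP row for fungi reported in Table 1 (TP 232, FP 329, TN 5663, FN 9).

```python
import math

def sensitivity(tp, fn):
    """Percentage of truly secreted proteins that were predicted as secreted."""
    return tp / (tp + fn) * 100

def specificity(tn, fp):
    """Percentage of non-secreted proteins that were predicted as non-secreted."""
    return tn / (tn + fp) * 100

def mcc(tp, fp, tn, fn):
    """Matthews' correlation coefficient, scaled to a percentage as on the slide."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when any marginal total is zero
    return (tp * tn - fp * fn) * 100 / denom

# SignalP on the fungal data set (counts taken from Table 1).
tp, fp, tn, fn = 232, 329, 5663, 9
print(f"Sn  = {sensitivity(tp, fn):.1f}%")   # Sn  = 96.3%
print(f"Sp  = {specificity(tn, fp):.1f}%")   # Sp  = 94.5%
print(f"MCC = {mcc(tp, fp, tn, fn):.1f}%")   # MCC = 61.2%
```

The printed values agree with the SignalP row of Table 1. The example also shows why MCC is reported alongside Sn and Sp: with 329 false positives against only 232 true positives, Sn and Sp both stay above 94% while MCC drops to about 61%.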

18

Table 1. Prediction accuracies of secreted proteins in fungi

Method                                             TP    FP     TN   FN   Sn (%)   Sp (%)   MCC (%)
SignalP                                           232   329   5663    9     96.3     94.5      61.2
Phobius                                           226   203   5789   15     93.8     96.6      68.8
TargetP                                           228   583   5409   13     94.6     90.3      48.6
WolfPsort                                         230   167   5825   11     95.4     97.2      73.1
SignalP/TMHMM                                     228   168   5824   13     94.6     97.2      72.6
Phobius/TMHMM                                     224   200   5792   17     92.9     96.7      68.6
TargetP/TMHMM                                     224   265   5727   17     92.9     95.6      63.5
WolfPsort/TMHMM                                   227   135   5857   14     94.2     97.7      75.8
SignalP/TMHMM/WolfPsort                           226    86   5906   15     93.8     98.6      81.6
SignalP/TMHMM/WolfPsort/Phobius                   222    69   5923   19     92.1     98.8      83.1
SignalP/TMHMM/WolfPsort/Phobius/PS-Scan           222    67   5925   19     92.1     98.9      83.4
SignalP/TMHMM/WolfPsort/Phobius/TargetP/PS-Scan   218    66   5926   23     90.5     98.9      82.6

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.

Min XJ (2010) JPB 3:143-147.

19

Table 2. Prediction accuracies of secreted proteins in animals

Method                                              TP    FP     TN   FN   Sn (%)   Sp (%)   MCC (%)
SignalP                                           5307  4108  14940  261     95.3     78.4      63.5
Phobius                                           5157  1167  17881  411     92.6     93.9      82.8
TargetP                                           5313  5412  13636  255     95.4     71.6      56.5
WolfPsort                                         5135  1762  17286  433     92.2     90.7      77.3
SignalP/TMHMM                                     5217  1383  17665  351     93.7     92.7      81.6
Phobius/TMHMM                                     5148  1142  17906  420     92.5     94.0      82.9
TargetP/TMHMM                                     5222  1369  17679  346     93.8     92.8      81.8
WolfPsort/TMHMM                                   5093  1084  17964  475     91.5     94.3      82.8
Phobius/WolfPsort                                 4959   555  18493  609     89.1     97.1      86.4
Phobius/WolfPsort/TMHMM                           4952   544  18504  616     88.9     97.1      86.5
Phobius/WolfPsort/TMHMM/SignalP                   4952   544  18504  616     88.9     97.1      86.5
Phobius/WolfPsort/TMHMM/TargetP                   4934   505  18543  634     88.6     97.3      86.7
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan           4931   482  18566  637     88.6     97.5      86.9
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan/SignalP   4931   482  18566  637     88.6     97.5      86.9

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.

Min XJ (2010) JPB 3:143-147.

20

Table 3. Prediction accuracies of secreted proteins in plants

Method                                             TP    FP     TN   FN   Sn (%)   Sp (%)   MCC (%)
SignalP                                           199   364   7164   17     92.1     95.2      55.4
Phobius                                           188   638   6890   28     87.0     91.5      41.9
TargetP                                           198   442   7086   18     91.7     94.1      51.3
WolfPsort                                         108    70   7458  108     50.0     99.1      53.9
SignalP/TMHMM                                     197   237   7291   19     91.2     96.9      63.0
Phobius/TMHMM                                     188   636   6892   28     87.0     91.6      42.0
TargetP/TMHMM                                     195   256   7272   21     90.3     96.6      61.1
WolfPsort/TMHMM                                   106    45   7483  110     49.1     99.4      57.7
SignalP/TMHMM/TargetP                             195   149   7379   21     90.3     98.0      70.6
Phobius/TargetP/TMHMM                             183   122   7406   33     84.7     98.4      70.4
SignalP/TMHMM/WolfPsort                           106    35   7493  110     49.1     99.5      59.9
SignalP/TMHMM/Phobius                             188   183   7345   28     87.0     97.6      65.2
SignalP/TMHMM/Phobius/TargetP                     183   113   7415   33     84.7     98.5      71.5
SignalP/TMHMM/Phobius/TargetP/PS-Scan             183   100   7428   33     84.7     98.7      73.2
SignalP/TMHMM/Phobius/TargetP/WolfPsort/PS-Scan   102    29   7499  114     47.2     99.6      59.8

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.

Min XJ (2010) JPB 3:143-147.
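Because every tool is evaluated on the same sequences within a kingdom, each row's counts must add up to the class sizes given on the Data slide: TP + FN equals the number of secreted proteins, and TN + FP equals the number of non-secreted proteins. The Python sketch below runs that check on the SignalP rows of Tables 1 to 3; the variable and function names are hypothetical and chosen only for illustration.

```python
# Class sizes from the Data slide: kingdom -> (secreted, non_secreted).
DATASETS = {
    "fungi": (241, 5992),
    "animals": (5568, 19048),
    "plants": (216, 7528),
}

# SignalP rows from Tables 1-3: kingdom -> (TP, FP, TN, FN).
SIGNALP_ROWS = {
    "fungi": (232, 329, 5663, 9),
    "animals": (5307, 4108, 14940, 261),
    "plants": (199, 364, 7164, 17),
}

def row_is_consistent(row, dataset):
    """True when the confusion counts add up to the data set's class sizes."""
    tp, fp, tn, fn = row
    secreted, non_secreted = dataset
    return tp + fn == secreted and tn + fp == non_secreted

for kingdom, row in SIGNALP_ROWS.items():
    status = "ok" if row_is_consistent(row, DATASETS[kingdom]) else "MISMATCH"
    print(f"{kingdom}: {status}")
```

All three rows pass. The same check can be applied to any other row when transcribing the tables, since a single mistyped count breaks one of the two sums.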

21

Summary

• Prediction accuracy depends on both the tool and the kingdom: for example, WolfPsort reaches 95.4% sensitivity in fungi but only 50.0% in plants;

• Combining tools often increases prediction accuracy, but the best combination differs between kingdoms;

• An optimal method is proposed for each kingdom.
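The slides do not spell out how the individual tool outputs are merged, so the following is only a sketch under one plausible assumption: a protein is called secreted when every tool in the chosen combination is compatible with that call (a signal peptide is predicted, no transmembrane domain is found, no ER-retention signal is found, and so on). The record type, field names, and function names are hypothetical. The per-kingdom tool sets mirror the highest-MCC rows of Tables 1 to 3, with the "HMM" printed in the plant table read here as TMHMM. Real tool output would first have to be parsed into these booleans.

```python
from dataclasses import dataclass

@dataclass
class ToolCalls:
    """Per-protein tool results, already reduced to booleans (hypothetical layout)."""
    signalp_sp: bool         # SignalP predicts a signal peptide
    phobius_sp: bool         # Phobius predicts a signal peptide
    wolfpsort_extr: bool     # WolfPsort places the protein outside the cell
    targetp_secretory: bool  # TargetP assigns the secretory pathway
    tmhmm_tm: bool           # TMHMM finds a transmembrane domain
    psscan_er: bool          # PS-Scan finds an ER-retention signal

# Each check returns True when that tool's result is compatible with "secreted".
CHECKS = {
    "SignalP": lambda c: c.signalp_sp,
    "Phobius": lambda c: c.phobius_sp,
    "WolfPsort": lambda c: c.wolfpsort_extr,
    "TargetP": lambda c: c.targetp_secretory,
    "TMHMM": lambda c: not c.tmhmm_tm,
    "PS-Scan": lambda c: not c.psscan_er,
}

# Assumed tool sets: the highest-MCC combination in each table.
COMBINATIONS = {
    "fungi": ("SignalP", "TMHMM", "WolfPsort", "Phobius", "PS-Scan"),
    "animals": ("Phobius", "WolfPsort", "TMHMM", "TargetP", "PS-Scan"),
    "plants": ("SignalP", "TMHMM", "Phobius", "TargetP", "PS-Scan"),
}

def is_secreted(calls, kingdom):
    """Consensus call: every tool in the kingdom's combination must agree."""
    return all(CHECKS[tool](calls) for tool in COMBINATIONS[kingdom])

# A protein that every tool except WolfPsort treats as secreted.
example = ToolCalls(
    signalp_sp=True, phobius_sp=True, wolfpsort_extr=False,
    targetp_secretory=True, tmhmm_tm=False, psscan_er=False,
)
for kingdom in COMBINATIONS:
    print(kingdom, is_secreted(example, kingdom))
```

For this example the call is negative under the fungal and animal combinations and positive under the plant combination, because WolfPsort is not part of the plant tool set. Treating any predicted transmembrane domain as disqualifying is a simplification made for the sketch.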

22

23

24

25

[Figure: FunSecKB overview diagram. Recoverable labels: Views (gi, accession, UniProt ID, Keywords, Species); User Inputs; Manual Curation; Subcellular Location; FunSecKB; Prediction Tools (fragAnchor, PS-SCAN, TMHMM, TargetP, WolfPsort, Phobius, SignalP); Database (RefSeq, UniProt); External Links.]

Lum G & Min XJ (2011) Database.

26

Summary of FunSecKB

• Currently the database contains a total of 478,073 fungal protein sequences

• 23,878 predicted and / or curated secreted proteins

• A total of 118 fungal species including 52 fungal species having a complete proteome

27

Lum G & Min XJ (2011) Database.

28

Lum G & Min XJ (2011) Database.

29

Lum G & Min XJ (2011) Database.

30

31

32

33

34

35

36

Plant secretomes and other subcellular proteins

Values are protein counts, with the percentage of total proteins in parentheses.

Category                 Vitis vinifera  Populus trichocarpa  Arabidopsis thaliana  Oryza sativa  Sorghum bicolor
Total proteins           29836           41794                32214                 39997         32796
Secreted proteins        1892 (6.3)      2487 (6.0)           2835 (8.8)            3085 (7.7)    2394 (7.3)
Mitochondria
  Membrane               490 (1.6)       566 (1.4)            415 (1.3)             832 (2.1)     666 (2.0)
  Non-membrane           3877 (13.0)     5238 (12.5)          3729 (11.6)           7187 (18.0)   5768 (17.6)
Chloroplast
  Membrane               565 (1.9)       601 (1.4)            671 (2.1)             720 (1.8)     610 (1.9)
  Non-membrane           3675 (12.3)     4850 (11.6)          4865 (15.1)           6318 (15.8)   5385 (16.4)
ER proteins              29 (0.1)        37 (0.1)             60 (0.2)              32 (0.1)      25 (0.1)
Other membrane proteins  3251 (10.9)     4532 (10.8)          3649 (11.3)           3672 (9.2)    2900 (8.8)
Others (unknown)         16057 (53.8)    23483 (56.2)         15990 (49.6)          18151 (45.4)  15048 (45.9)

37

38

39

Acknowledgements

Gengkon Lum (M. S. Graduate)

Jessica Orr (Undergraduate)

Docylyne Shelton (Undergraduate)

Braden Walters (Undergraduate)
