NP - Positive set Negative Set Full length ORFs Genome Annotated Candidate NPs Top ranked NPs Input Training NP catalogue Negative Set Negative set NP.

Post on 01-Apr-2015


[Figure: NeuroPID workflow diagram. Box labels: Input; Genome; Full-length ORFs; Translated proteome; Training; NP catalogue; NP positive set; Negative set; NP processing tools; ML quality: cross validation; NeuroPID prediction; Candidate NPs; Top ranked NPs; Annotated.]

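[Figure, panels A and B: y-axis 1-log(p-value, t-test) in both panels. One panel shows the amino acids (Q, Y, C, N, H, L, D, R, W, M, T, S, G, A, V, P, F, I, E, K) on a 0 to 100 scale; the other shows the sequence properties GRAVY, Instability, Molecular Weight, PI and Aromaticity on a 0 to 30 scale.]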
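The features named in the figure (amino acid frequencies, GRAVY, aromaticity) can be computed directly from a sequence. The following is a minimal sketch, not the NeuroPID implementation: the function names are hypothetical, GRAVY is assumed to mean the mean Kyte-Doolittle hydropathy, and aromaticity is assumed to mean the fraction of F, W and Y residues. Instability index, molecular weight and pI are left out to keep the example short.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = frozenset("FWY")


def _standard_residues(seq):
    """Upper-case the sequence and drop anything that is not a standard residue."""
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    residues = _standard_residues(seq)
    counts = Counter(residues)
    return {aa: counts.get(aa, 0) / len(residues) for aa in KYTE_DOOLITTLE}


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy over the sequence (assumed GRAVY definition)."""
    residues = _standard_residues(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(seq):
    """Fraction of aromatic residues (F, W, Y) in the sequence."""
    residues = _standard_residues(seq)
    return sum(1 for aa in residues if aa in AROMATIC) / len(residues)


def feature_vector(seq):
    """20 composition values (alphabetical by residue), then GRAVY, then aromaticity."""
    comp = composition(seq)
    return [comp[aa] for aa in sorted(comp)] + [gravy(seq), aromaticity(seq)]
```

A vector like this, computed for every sequence in the positive and negative sets, is the kind of input a per-feature t-test or a classifier would take.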

MRSRTSVLTSSLAFLYFFGIVGRSALAMEETPASSMNLQHYNN

MLNPMVFDDTMPEKRAYTYVSEYKRLPVYNFGIGKRWIDTNDN

KRGRDYSFGLGKRRQYSFGLGKRNDNADYPLRLNLDYLPVDNP

AFHSQENTDDFLEEKRGRQPYSFGLGKRAVHYSGGQPLGSKRP

NDMLSQRYHFGLGKRMSEDEEESSQR
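The precursor above carries a recurring FGLGKR / FGIGKR element. As a rough illustration of how such repeats can be located, here is a minimal sketch. The motif pattern is read off the sequence shown here, and the dibasic cleavage rule (KR, RR, KK, RK) is an assumption made for illustration; the source does not state which rule its NP processing tools apply, and the function names are hypothetical.

```python
import re

# The precursor sequence exactly as shown in the transcript, joined into one string.
PRECURSOR = (
    "MRSRTSVLTSSLAFLYFFGIVGRSALAMEETPASSMNLQHYNN"
    "MLNPMVFDDTMPEKRAYTYVSEYKRLPVYNFGIGKRWIDTNDN"
    "KRGRDYSFGLGKRRQYSFGLGKRNDNADYPLRLNLDYLPVDNP"
    "AFHSQENTDDFLEEKRGRQPYSFGLGKRAVHYSGGQPLGSKRP"
    "NDMLSQRYHFGLGKRMSEDEEESSQR"
)

# Assumed cleavage rule: any pair of basic residues (K or R).
DIBASIC = re.compile(r"KR|RR|KK|RK")
# Repeat motif read off the sequence above: FGLGKR or FGIGKR.
REPEAT = re.compile(r"FG[LI]GKR")


def repeat_positions(seq):
    """1-based start positions of every FG[LI]GKR occurrence."""
    return [m.start() + 1 for m in REPEAT.finditer(seq)]


def split_at_dibasic(seq):
    """Split a precursor at dibasic sites and drop empty fragments."""
    return [frag for frag in DIBASIC.split(seq) if frag]


if __name__ == "__main__":
    print("repeat starts:", repeat_positions(PRECURSOR))
    for fragment in split_at_dibasic(PRECURSOR):
        print(fragment)
```

The fragments produced this way are only candidate peptides; signal peptide removal and other processing steps are not modelled.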


[Figure: cross-validation performance (y-axis, 0 to 1) by classifier: Random Forest Classifier, RBF, Linear SVC, Gradient Boosting, SVC Sigmoid, Polynomial SVM. Measures shown: accuracy, precision, recall, area under ROC curve.]

S. frugiperda (Fall armyworm) 5
H. armigera (Cotton bollworm) 6
S. gregaria (Desert locust) 4
A. florea (Little honeybee) 0
M. rotundata (Alfalfa leafcutter bee) 1
C. floridanus (Florida carpenter ant) 2
A. echinatior (Leafcutter ant) 3

                 SW Arthropods                                   UniProt Arthropods
                 Random Forest  Gradient Boosting  Linear SVC    Random Forest  Gradient Boosting  Linear SVC
Mean Accuracy    0.94           0.95               0.94          0.92           0.92               0.86
Mean Precision   0.94           0.95               0.93          0.93           0.94               0.95
Mean Recall      0.92           0.92               0.92          0.95           0.95               0.85
Mean AUC         0.94           0.95               0.94          0.89           0.90               0.87

                 SW Chordata                                     UniProt Chordata
                 Random Forest  Gradient Boosting  Linear SVC    Random Forest  Gradient Boosting  Linear SVC
Mean Accuracy    0.96           0.97               0.95          0.90           0.91               0.85
Mean Precision   0.94           0.94               0.88          0.91           0.92               0.89
Mean Recall      0.91           0.92               0.93          0.91           0.91               0.83
Mean AUC         0.95           0.95               0.94          0.90           0.91               0.85
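The four measures reported above (accuracy, precision, recall, AUC, each averaged over cross-validation folds) can be computed as in the following minimal sketch. It assumes binary labels with 1 for NP and 0 for the negative set, and a per-fold list of classifier scores; the function names and the fold layout are hypothetical and are not taken from the NeuroPID code.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true, scores):
    """Area under the ROC curve: the probability that a positive outscores a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fold_metrics(y_true, y_pred, scores):
    """Accuracy, precision, recall and AUC for a single fold."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc": auc(y_true, scores),
    }


def mean_over_folds(per_fold):
    """Average each metric over a list of per-fold metric dictionaries."""
    return {k: sum(f[k] for f in per_fold) / len(per_fold) for k in per_fold[0]}
```

Averaging per-fold values, rather than pooling all predictions first, matches the "Mean" rows in the tables.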

Organism          # sequences UniProtKB   # of full length UniProtKB   # of SP   # of NP & SP   # NeuroPID (all methods)   Functional annotation enrichment
B. mori           17908                   17069                        138       6              69                         Innate immunity; Insulin-like; Chorion, Hormone (NP)
S. invicta        14356                   84                           12        2              4                          Innate immunity
D. melanogaster   39961                   31091                        475       21             120                        Innate immunity; Developmental; Channel ligand; Receptor, Hormone (NP)
C. elegans        26005                   25534                        464       21             89                         Hormone (NP), Channel ligand; Receptor, Protease


Columns, in order: organism / taxa; # of UniProt (UniRef90); # of NPP in SW (UniRef90); # of NPP in UniProt (UniRef90); NeuroPID prediction (RBF) (a)

Apis mellifera: 10394; 6; 19; 7
SP in Apis mellifera (b): 2139; 5; 7
Gallus gallus: 20760; 5; 5; 5
SP in Gallus gallus: 701; 1; 1
Bombyx mori: 15250; 5; 17; 9
SP in Bombyx mori: 112; 5; 5; 9
Octopoda: 224; 4; 4; 4
SP in Octopoda: 76; 3; 3

Updates 5 7 2013

[Figure: predictions per classifier (RF, Ext-Tree, SVM-SVC, GBR), with counts and percentages. Panels: Apis mellifera; Thaumeledone gunteri.]


[Figure, panels A and B: % of annotated NPs in taxonomy (axis 0 to 1.5). Groups: Mammalia, Insecta, Caenorhabditis, others. Values shown: 113, 77, 25, 10.]

 66- 97  KRL....YDFG.........LG..............KRA..YsyvSEYKRL.............................pvYN..FGLGKR
 98-120  SKM....YGFG.........LG..............KR.......DG..RM...............................YS..FGLGKR
121-164  DYD....Y.YGeededdqqaIGdedieesdvgdlmdKR..........DRL...............................YS..FGLGKR
165-191  ARP....YSFG.........LG..............KRA..P...SGAQRL...............................YG..FGLGKR
192-216  GGS...lYSFG.........LG..............KR........GDGRL...............................YA..FGLGKRPVN
222-253  GRSsgsrFNFG.........LG..............KRS..D...DIDFRE...............................LEekFAEDKR
254-316  .YPqehrFSFG.........LG..............KREveP...SELEAVrneekdnssvhdkknntndmhsgerikrslhYP..FGIRKL
347-367  RRP....FNFG.........LG..............KRI..P........M...............................YD..FGIGKR

 66- 97  KRL....YDFG.........LG..............KRA..YsyvSEYKRL........pvYN..FGLGKR
 98-120  SKM....YGFG.........LG..............KR.......DG..RM..........YS..FGLGKR
121-164  DYD....Y.YGeededdqqaIGdedieesdvgdlmdKR..........DRL..........YS..FGLGKR
165-191  ARP....YSFG.........LG..............KRA..P...SGAQRL..........YG..FGLGKR
192-220  GGS...lYSFG.........LG..............KR........GDGRL..........YA..FGLGKRPVNS
221-253  GRSsgsrFNFG.........LG..............KRS..D...DIDFRE..........LEekFAEDKR
254-316  YPqehrFSFG.........LG..............KREveP...SELEAVrne(25)slhYP..FGIRKL
346-367  RRP....FNFG.........LG..............KRI..P........M..........YD..FGIGKR
