NP - Positive set

Mar 22, 2016

deliz

Transcript
Page 1: NP - Positive set

[Workflow diagram. Box labels: Input; Genome; Translated proteome; Full length ORFs; Annotated; NP catalogue; NP processing tools; Training; NP - Positive set; Negative set (four boxes); ML quality: Cross validation; NeuroPID prediction; Candidate NPs; Top ranked NPs]
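
The "Genome", "Translated proteome" and "Full length ORFs" boxes suggest a translation step ahead of classification. The slide does not say how that step was done, so the following is only a minimal sketch, assuming a plain six-frame translation with the standard genetic code. The function names and the `min_len` cutoff are hypothetical and chosen for illustration.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def full_length_orfs(dna, min_len=30):
    """Collect proteins that start at M and end at a stop, in all six frames.

    min_len is a hypothetical length cutoff, not a value from the slides.
    """
    dna = dna.upper()
    orfs = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Pieces before each '*' ended at a stop codon; the final piece
            # ran off the end of the sequence, so it is not full length.
            for piece in protein.split("*")[:-1]:
                start = piece.find("M")
                if start != -1 and len(piece) - start >= min_len:
                    orfs.append(piece[start:])
    return orfs
```

For example, `full_length_orfs("ATGGCCAAATAA", min_len=1)` yields the single peptide `MAK`. Only the first methionine in each stop-to-stop stretch is used, which is one of several reasonable conventions.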

Page 2: NP - Positive set
Page 3: NP - Positive set
Page 4: NP - Positive set

[Figure, panels A and B. Y-axis in both panels: 1-log(p-value, t-test). Amino acids shown: Q, Y, C, N, H, L, D, R, W, M, T, S, G, A, V, P, F, I, E, K (axis ticks 0 to 100). Properties shown: GRAVY, Instability, Molecular Weight, PI, Aromaticity (axis ticks 0 to 30).]
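
The per-residue frequencies, GRAVY and aromaticity named in the figure are standard sequence descriptors. As a rough sketch of how such features can be computed from a protein sequence, the code below assumes the Kyte-Doolittle hydropathy scale for GRAVY and the F/W/Y fraction for aromaticity. The slides do not state which definitions were used, and instability index, molecular weight and pI are left out here. All function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, averaged to obtain the GRAVY score.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def gravy(seq):
    """Mean hydropathy; raises KeyError on non-standard residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromaticity(seq):
    """Fraction of aromatic residues (F, W, Y)."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


def features(seq):
    """One feature vector per sequence: 20 frequencies plus two properties."""
    vec = composition(seq)
    vec["GRAVY"] = gravy(seq)
    vec["Aromaticity"] = aromaticity(seq)
    return vec
```

Computing such a vector for every sequence in the positive and negative sets gives the per-feature distributions that a t-test, as named on the axis, could then compare.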

Page 5: NP - Positive set
Page 6: NP - Positive set

MRSRTSVLTSSLAFLYFFGIVGRSALAMEETPASSMNLQHYNN

MLNPMVFDDTMPEKRAYTYVSEYKRLPVYNFGIGKRWIDTNDN

KRGRDYSFGLGKRRQYSFGLGKRNDNADYPLRLNLDYLPVDNP

AFHSQENTDDFLEEKRGRQPYSFGLGKRAVHYSGGQPLGSKRP

NDMLSQRYHFGLGKRMSEDEEESSQR
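
The precursor above contains several copies of a short motif followed by GKR. A glycine followed by a dibasic Lys-Arg site is a common neuropeptide processing signal, so one simple way to locate candidate peptides is to scan for that tripeptide. The sketch below does only that. Treating `GKR` as the sole signal is an assumption made for illustration, and `find_gkr_sites` is a hypothetical helper, not a function from the NP processing tools named on Page 1.

```python
import re

# Sequence copied from the slide, joined into one string.
PRECURSOR = (
    "MRSRTSVLTSSLAFLYFFGIVGRSALAMEETPASSMNLQHYNN"
    "MLNPMVFDDTMPEKRAYTYVSEYKRLPVYNFGIGKRWIDTNDN"
    "KRGRDYSFGLGKRRQYSFGLGKRNDNADYPLRLNLDYLPVDNP"
    "AFHSQENTDDFLEEKRGRQPYSFGLGKRAVHYSGGQPLGSKRP"
    "NDMLSQRYHFGLGKRMSEDEEESSQR"
)


def find_gkr_sites(seq, flank=5):
    """Return (0-based position, upstream residues) for every 'GKR' in seq."""
    sites = []
    for match in re.finditer("GKR", seq):
        start = match.start()
        sites.append((start, seq[max(0, start - flank):start]))
    return sites


for position, upstream in find_gkr_sites(PRECURSOR):
    print(position, upstream)
```

On this sequence the scan finds five sites, whose five upstream residues are YNFGI, YSFGL (three times) and YHFGL.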

Page 7: NP - Positive set

[Same precursor sequence as on Page 6.]

Page 8: NP - Positive set

[Bar chart. Y-axis: Cross validation performance (0 to 1). Classifiers: Random Forest Classifier, RBF, Linear SVC, Gradient Boosting, SVC Sigmoid, Polynomial SVM. Metrics: accuracy, precision, recall, area under ROC curve.]
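
The chart reports accuracy, precision and recall under cross validation. As a minimal sketch of how those numbers are typically obtained, the code below splits the data into k folds, scores each held-out fold, and averages the results. The fold count, the contiguous (unshuffled, unstratified) split, and the `fit`/`predict` callables are assumptions for illustration; the slides do not describe the actual protocol.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    if not 1 <= k <= n_samples:
        raise ValueError("k must be between 1 and n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def scores(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def cross_validate(X, y, fit, predict, k=5):
    """Mean score per metric over k folds.

    fit(X_train, y_train) returns a model; predict(model, X_test) returns
    labels. Both are hypothetical callables supplied by the caller.
    """
    per_fold = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = predict(model, [X[i] for i in test])
        per_fold.append(scores([y[i] for i in test], y_pred))
    return {m: sum(f[m] for f in per_fold) / len(per_fold) for m in per_fold[0]}
```

In practice the samples would be shuffled or stratified before splitting, so that each fold holds a similar ratio of positive to negative sequences.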

Page 9: NP - Positive set
Page 10: NP - Positive set
Page 11: NP - Positive set
Page 12: NP - Positive set

[Figure. Y-axis: Cross validation performance.]

Page 13: NP - Positive set
Page 14: NP - Positive set
Page 15: NP - Positive set
Page 16: NP - Positive set

S. frugiperda (Fall armyworm) 5
H. armigera (Cotton bollworm) 6
S. gregaria (Desert locust) 4
A. florea (Little honeybee) 0
M. rotundata (Alfalfa leafcutter bee) 1
C. floridanus (Florida carpenter ant) 2
A. echinatior (Leafcutter ant) 3

[Figure panels: A, B, C]

Page 17: NP - Positive set
Page 18: NP - Positive set
Page 19: NP - Positive set

[Figure panel: D]

Page 20: NP - Positive set

[Figure panel: D]

Page 21: NP - Positive set

                SW Arthropods                                 UniProt Arthropods
                Random Forest  Gradient Boosting  Linear SVC  Random Forest  Gradient Boosting  Linear SVC
Mean Accuracy   0.94           0.95               0.94        0.92           0.92               0.86
Mean Precision  0.94           0.95               0.93        0.93           0.94               0.95
Mean Recall     0.92           0.92               0.92        0.95           0.95               0.85
Mean AUC        0.94           0.95               0.94        0.89           0.90               0.87

                SW Chordata                                   UniProt Chordata
                Random Forest  Gradient Boosting  Linear SVC  Random Forest  Gradient Boosting  Linear SVC
Mean Accuracy   0.96           0.97               0.95        0.90           0.91               0.85
Mean Precision  0.94           0.94               0.88        0.91           0.92               0.89
Mean Recall     0.91           0.92               0.93        0.91           0.91               0.83
Mean AUC        0.95           0.95               0.94        0.90           0.91               0.85
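
The "Mean AUC" rows refer to the area under the ROC curve. As a small illustration of what that number measures, AUC can be computed directly as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function below is a sketch of that definition and not the code used to produce the tables; `roc_auc` is a hypothetical name.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    y_true holds binary labels (1 = positive); y_score holds classifier
    scores. Ties between a positive and a negative count as 0.5.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A mean AUC like those in the tables would then be the average of this value over the cross-validation folds. The pairwise loop is quadratic in the number of examples, which is fine for illustration but slow for large sets.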

Page 22: NP - Positive set

Organism         # sequences  # of full length  # of SP  # of NP & SP  # NeuroPID     Functional annotation enrichment
                 UniProtKB    UniProtKB                                All methods
B. mori          17908        17069             138      6             69             Innate immunity; Insulin-like; Chorion, Hormone (NP)
S. invicta       14356        84                12       2             4              Innate immunity
D. melanogaster  39961        31091             475      21            120            Innate immunity; Developmental; Channel ligand; Receptor, Hormone (NP)
C. elegans       26005        25534             464      21            89             Hormone (NP), Channel ligand; Receptor, Protease

Page 23: NP - Positive set
Page 24: NP - Positive set

                SW Arthropods                                 UniProt Arthropods
                Random Forest  Gradient Boosting  Linear SVC  Random Forest  Gradient Boosting  Linear SVC
Mean Accuracy   0.94           0.95               0.94        0.92           0.92               0.86
Mean Precision  0.94           0.95               0.93        0.93           0.94               0.95
Mean Recall     0.92           0.92               0.92        0.95           0.95               0.85
Mean AUC        0.94           0.95               0.94        0.89           0.90               0.87

                SW Chordates                                  UniProt Chordates
                Random Forest  Gradient Boosting  Linear SVC  Random Forest  Gradient Boosting  Linear SVC
Mean Accuracy   0.96           0.97               0.95        0.90           0.91               0.85
Mean Precision  0.94           0.94               0.88        0.91           0.92               0.89
Mean Recall     0.91           0.92               0.93        0.91           0.91               0.83
Mean AUC        0.95           0.95               0.94        0.90           0.91               0.85

Page 25: NP - Positive set
Page 26: NP - Positive set

Columns: organism / taxa; # of UniProt (UniRef90); # of NPP in SW (UniRef90); # of NPP in UniProt (UniRef90); NeuroPID prediction, RBF (a)

Apis mellifera 10394 6 19 7
SP in Apis mellifera (b) 2139 5 7
Gallus gallus 20760 5 5 5
SP in Gallus gallus 701 1 1
Bombyx mori 15250 5 17 9
SP in Bombyx mori 112 5 5 9
Octopoda 224 4 4 4
SP in Octopoda 76 3 3

Page 27: NP - Positive set

Updates 5 7 2013

Page 28: NP - Positive set
Page 29: NP - Positive set

[Figure. Method labels: RF, ExtraTree, SVM-SVC, GBR. Numbers shown, in extraction order: 7, 8, 16, 79, 4, 11, 2, 2, 1.]

Page 30: NP - Positive set

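[Figure, two panels: Apis mellifera and Thaumeledone gunteri. Method labels in each panel: RF, Ext-Tree, SVM-SVC, GBR. Numbers in the first panel, in extraction order: 79, 16, 87, 86%, 42%, 100%, 100%, 60%, 75%. Numbers in the second panel: 18, 10, 95, 100%, 60%, 100%, 100%, 33%, 80%.]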

Page 31: NP - Positive set

Updates 5 7 2013

Page 32: NP - Positive set
Page 33: NP - Positive set
Page 34: NP - Positive set
Page 35: NP - Positive set
Page 36: NP - Positive set

S. frugiperda (Fall armyworm) 5
H. armigera (Cotton bollworm) 6
S. gregaria (Desert locust) 4
A. florea (Little honeybee) 0
M. rotundata (Alfalfa leafcutter bee) 1
C. floridanus (Florida carpenter ant) 2
A. echinatior (Leafcutter ant) 3

[Figure panels: A, B, C]

Page 37: NP - Positive set
Page 38: NP - Positive set
Page 39: NP - Positive set
Page 40: NP - Positive set

[Bar chart. Y-axis: % of annotated NPs in taxonomy (ticks 0 to 1.5). Categories: Mammalia, Insecta, Caenorhabditis, others. Bar labels: 113, 77, 25, 10.]

Page 41: NP - Positive set

[Figure panels: A, B]

Page 42: NP - Positive set

 66- 97 KRL....YDFG.........LG..............KRA..YsyvSEYKRL.............................pvYN..FGLGKR
 98-120 SKM....YGFG.........LG..............KR.......DG..RM...............................YS..FGLGKR
121-164 DYD....Y.YGeededdqqaIGdedieesdvgdlmdKR..........DRL...............................YS..FGLGKR
165-191 ARP....YSFG.........LG..............KRA..P...SGAQRL...............................YG..FGLGKR
192-216 GGS...lYSFG.........LG..............KR........GDGRL...............................YA..FGLGKRPVN
222-253 GRSsgsrFNFG.........LG..............KRS..D...DIDFRE...............................LEekFAEDKR
254-316 .YPqehrFSFG.........LG..............KREveP...SELEAVrneekdnssvhdkknntndmhsgerikrslhYP..FGIRKL
347-367 RRP....FNFG.........LG..............KRI..P........M...............................YD..FGIGKR

 66- 97 KRL....YDFG.........LG..............KRA..YsyvSEYKRL........pvYN..FGLGKR
 98-120 SKM....YGFG.........LG..............KR.......DG..RM..........YS..FGLGKR
121-164 DYD....Y.YGeededdqqaIGdedieesdvgdlmdKR..........DRL..........YS..FGLGKR
165-191 ARP....YSFG.........LG..............KRA..P...SGAQRL..........YG..FGLGKR
192-220 GGS...lYSFG.........LG..............KR........GDGRL..........YA..FGLGKRPVNS
221-253 GRSsgsrFNFG.........LG..............KRS..D...DIDFRE..........LEekFAEDKR
254-316  YPqehrFSFG.........LG..............KREveP...SELEAVrne(25)slhYP..FGIRKL
346-367 RRP....FNFG.........LG..............KRI..P........M..........YD..FGIGKR