Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search Zhixin(Michael)

Post on 29-Jan-2016


Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search

Zhixin (Michael) Tian

CNCP

11/15/2012

What is mass?

[Figure: mass spectrum; relative abundance (%) vs. m/z (z = 1), 511 to 517]

L. C. Dias, et al. J. Org. Chem. 2012, 77, 4046.

Monoisotopic mass (m/z, z = +1)

Monoisotopic mass: the monoisotopic peak is missing in proteins

Average mass: error of ±1 u at 16,000 u (variability of the 13C/12C ratio)

Mass of the most abundant isotope: error of ±1 Da or more (mis-assignment of the number of contributing heavy isotopes)

[Figure: isotopic envelope of 70 averagines; relative abundance (%) vs. m/z (z = 1), 7770 to 7795]

[Figure: relative abundance (%) vs. n (averagine), 0 to 120]

: most significant & accurate

Deisotoping (Deconvolution)

Algorithms: AID-MS, ESI-ISOCONV, LASSO, MapQuant, MasSPIKE, MATCHING, msInspect, Peplist, quadratic deisotoping, RAPID, THRASH, Wang’s method, Zhang’s program, and ZSCORE

Steps:

1. Calculate background noise level

2. Determine charge state using FT/Patterson technique

3. Calculate theoretical profile

4. Fit with observed isotopic profile

5. Monoisotopic mass
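The charge-state step can be illustrated with a much simpler stand-in for the FT/Patterson technique: adjacent isotopic peaks of one envelope sit roughly one neutron mass divided by z apart, so the spacing gives the charge. This is a minimal sketch, not the method any of the listed algorithms uses; the function name and the 1.00235 u average spacing are assumptions.

```python
from statistics import median

# Approximate mass difference between adjacent isotopic peaks of a peptide (u).
# Assumed value for this sketch; it is dominated by the 13C - 12C difference.
ISOTOPE_SPACING = 1.00235


def estimate_charge(peak_mzs):
    """Estimate the charge state from the m/z positions of one isotopic envelope."""
    if len(peak_mzs) < 2:
        raise ValueError("need at least two isotopic peaks")
    ordered = sorted(peak_mzs)
    # Gaps between neighboring isotopic peaks; the median resists one bad peak.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return round(ISOTOPE_SPACING / median(gaps))


# Hypothetical envelope: four peaks spaced about 0.1 m/z apart.
print(estimate_charge([857.07, 857.17, 857.27, 857.37]))
```

The heuristic only works when the envelope is isotopically resolved and not overlapped by another envelope.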

Search Engines: ProSightPC, SEQUEST, Mascot, X!Tandem, InsPecT, OMSSA, Andromeda, pFind

[Flowchart nodes: RAW File; MS Spectrum (iE); MS/MS Spectra (iE); Parent (Exp. mass); Parent (Theo. mass); Protein Database; Candidates; Fragments (Theo. mass); Fragments (Exp. mass); Initial IDs; Final IDs. Step labels: A1/P1, A1/P2, A2/P3, A2/P4]

Peptide Mass Fingerprinting (PMF)

Ubiquitin - MS spectrum (profile)

[Figure: abundance (au) vs. m/z, 500 to 2000, with charge states z = 6 to z = 12 labeled; inset: m/z 856.5 to 859.0, z = 10]

Ubiquitin – MS/MS (ETD) Spectrum (Profile)

[Figure: intensity (au) vs. m/z, 400 to 2000; inset: m/z 820 to 845]

[Figure: intensity (a.u.) vs. monoisotopic mass (Da), 0 to 10000]

Database search with PMF using ProSightPC

NMFs = 92, NUMFs = 219, P score = 4.86E-98

Neil L. Kelleher, et al. Nat. Biotechnol. 2001, 19, 952

f - the total number of observed fragments (NMFs + NUMFs)

n - the number of matching fragments (NMFs)

x - the mean probability that the mass of an observed fragment ion will randomly match one from a generic protein

111.1 - the mass of the average amino acid, weighted for its occurrence in proteins

2 - the number of fragment ions generated from each bond cleavage, which is assumed to be 2 (b- and y-type ions or c- and z•-type ions)

Ma - the mass accuracy (a Ma of ±1 Da translates to a 2 Da window)

Definition of P_Score
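The transcript lists the symbols of the P score but not the expression itself. One reading consistent with those definitions, and an assumption here rather than the slide's formula, is a Poisson model in which x = (1/111.1) × 2 × Ma and the expected number of random matches is x·f. The sketch below evaluates the Poisson probability of n random matches in log space so that very small values do not underflow early; the function names and the `window_da` parameter are hypothetical.

```python
import math


def random_match_probability(window_da):
    """x: assumed chance that one observed fragment mass matches at random.

    window_da is the full width of the mass tolerance window in Da
    (an assumption: +/-1 Da accuracy corresponds to window_da = 2).
    """
    return (1.0 / 111.1) * 2 * window_da


def poisson_p_score(n_matched, f_observed, window_da):
    """Poisson probability of exactly n random matches among f observed fragments."""
    lam = random_match_probability(window_da) * f_observed
    # log of lam^n * exp(-lam) / n!, using lgamma(n + 1) = log(n!)
    log_p = n_matched * math.log(lam) - lam - math.lgamma(n_matched + 1)
    return math.exp(log_p)


# Hypothetical inputs: 20 matches out of 100 observed fragments, 0.5 Da window.
print(poisson_p_score(20, 100, 0.5))
```

Under this model, a search with many matching fragments and a narrow mass window gives a very small probability of a random hit.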

Is “MFs” really good?

[Figure: isotopic envelope of fragment c2 (1+); relative abundance (%) vs. m/z, 277.0 to 279.5]

[Figure: isotopic envelope of fragment c7 (1+); relative abundance (%) vs. m/z, 864 to 867]

Is “NUMFs” really good?

PeakPicking: SNRThreshold = 3.0, BackgroundRatio = 5.0, FitType = Lorentzian

DeconvPep: MaxCharge = 25, ThScore = 0.0

AdvDeconv: MaxAbundancePeak = 3, ScanNoModifier = 0, MaxMissPeak = 3, MassErr = 1.0E-05, ThClustExt = 0.0, IntsRangeErr = 0.5

THRASH (92 + 219 = 311), RAPID (28 + 49 = 77)

Better “deisotoping”?

NO “deisotoping”?

What is a mass spectrum?

[Figure: abundance (au) vs. m/z, 500 to 2000, with charge states z = 6 to z = 12 labeled; inset: m/z 856.5 to 859.0, z = 10]

MS of Ubiquitin

x, y coordinates

[Figure (profile): abundance (au) vs. m/z (z = 10), 856.5 to 858.5]

Exp. m/z    Exp. Abundance
856.9821    6061
857.0825    21811
857.1826    52841
857.2809    82342
857.3782    93523
857.4746    96019
857.5714    75857
857.6682    60680
857.7663    42420
857.8669    27294
857.9680    14752
858.0681    5685
858.1685    1120
858.2717    919
858.3671    316
858.4594    147

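Gaussian: $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Lorentzian: $$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}$$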
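The two peak shapes on this slide are the Gaussian and the Lorentzian profile. A minimal sketch of both as plain functions follows; the function names and the example width are illustrative assumptions.

```python
import math


def gaussian(x, mu, sigma):
    """Gaussian profile with center mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def lorentzian(x, x0, gamma):
    """Lorentzian profile with center x0 and half width at half maximum gamma."""
    return 1.0 / (math.pi * gamma * (1.0 + ((x - x0) / gamma) ** 2))


# Hypothetical peak centered at m/z 857.4746 with a width of 0.01.
for offset in (0.0, 0.01, 0.02):
    x = 857.4746 + offset
    print(offset, gaussian(x, 857.4746, 0.01), lorentzian(x, 857.4746, 0.01))
```

The Lorentzian falls off more slowly in the tails than the Gaussian, which matters when neighboring isotopic peaks sit close together.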

Centroid

[Figure (centroid): abundance (au) vs. m/z (z = 10), 856.5 to 859.0]

The nature of the iE of an ion

What is in a protein database?

MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG

C378H630N105O118S1
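A database entry like this one carries enough information to compute masses directly from the elemental composition. The sketch below parses a formula string and returns the monoisotopic and average masses plus the m/z of a protonated ion; the atomic masses are standard reference values rounded for illustration, and the function names are assumptions.

```python
import re

# (monoisotopic mass, average atomic mass) in u; standard values, rounded.
ELEMENTS = {
    "C": (12.0, 12.011),
    "H": (1.00782503, 1.008),
    "N": (14.00307400, 14.007),
    "O": (15.99491462, 15.999),
    "S": (31.97207117, 32.06),
}
PROTON = 1.00727647  # mass of a proton in u


def parse_formula(formula):
    """Turn 'C378H630N105O118S1' into {'C': 378, 'H': 630, ...}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def masses(formula):
    """Return (monoisotopic mass, average mass) of a neutral formula."""
    counts = parse_formula(formula)
    mono = sum(ELEMENTS[s][0] * n for s, n in counts.items())
    avg = sum(ELEMENTS[s][1] * n for s, n in counts.items())
    return mono, avg


def mz_from_mass(mass, z):
    """m/z of an ion carrying z protons."""
    return (mass + z * PROTON) / z


mono, avg = masses("C378H630N105O118S1")
print(round(mono, 4), round(avg, 3), round(mz_from_mass(mono, 10), 4))
```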

[Figure: relative abundance (%) vs. m/z (z = 10), 856.5 to 859.0]

Theo. m/z    Theo. relative abundance (%)
856.9690     3.95
857.0692     18.83
857.1695     45.88
857.2698     76.13
857.3701     96.65
857.4703     100.00
857.5706     87.76
857.6709     67.12
857.7711     45.63
857.8714     27.99
857.9716     15.67
858.0719     8.09
858.1721     3.87
858.2724     1.73
858.3726     0.73
858.4729     0.29
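A theoretical isotopic envelope of this kind can be approximated from the elemental composition by repeatedly convolving the isotope patterns of the individual atoms. The sketch below works at nominal-mass resolution (it ignores isotopic fine structure) and uses rounded natural abundances; it is not how ProteinGoggle builds its database, and all names are assumptions.

```python
# Natural isotope abundances binned by nominal mass offset (+0, +1, +2, ...).
# Standard reference values, rounded; treat them as assumptions of this sketch.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_peaks):
    """Discrete convolution of two abundance lists, truncated to max_peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, max_peaks)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < len(out):
                out[i + j] += ai * bj
    return out


def isotopic_envelope(counts, max_peaks=20):
    """Probability of each isotopic peak (M, M+1, M+2, ...) for an elemental composition."""
    dist = [1.0]
    for element, n in counts.items():
        for _ in range(n):  # add one atom at a time
            dist = convolve(dist, ISOTOPES[element], max_peaks)
    return dist


def relative_abundance(dist):
    """Scale an envelope so that its most abundant peak is 100%."""
    top = max(dist)
    return [100.0 * p / top for p in dist]


ubiquitin = {"C": 378, "H": 630, "N": 105, "O": 118, "S": 1}
for k, rel in enumerate(relative_abundance(isotopic_envelope(ubiquitin))[:16]):
    print(k, round(rel, 2))
```

Adding one atom at a time keeps the code short; for large proteins a power-of-two scheme or FFT-based convolution would be faster.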

x, y coordinates

Centroid

iMEF (isotopic m/z & Envelope Fingerprinting)

[Flowchart nodes: RAW File; MS Spectrum (iE); MS/MS Spectra (iE); Protein Database; Candidates; Initial IDs; Final IDs; Parent (Exp. mass); Fragments (Exp. mass); Parent (Theo. mass); Fragments (Theo. mass); Parent (Theo. iE); Fragments (Theo. iE). Step labels: A1/P1, A1/P2, A2/P3, A2/P4, A/P1, A/P2]

[Flowchart nodes: 1st, 2nd, and 3rd isotopic peak; DB; Parent ion exp. iE; Parent ion theo. iE; Protein candidates; Preliminary protein candidates; Fragment ion exp. iEs; Fragment ion theo. iEs; Preliminary protein IDs; NMFs; PTM_Scores; Initial protein ID; Initial protein IDs; Combined initial protein IDs; Remove duplicates; Final IDs; Isotopic peak exclusion list; Norm. isotopic peaks removed. Step labels: A1/F1, A2/F2, A2/F3. Decision branches: Y/N]

Pre-Step 1: Customized database

MS Precursor ions

MS/MS fragment ions

[Figure: relative abundance (%), 0.0 to 2.0, vs. % of changed points in a step size of 0.1, 0 to 50]

Pre-Step 2: Noise level determination

[Figure: relative abundance (%), 0 to 10, vs. % of changed points in a step size of 0.1, 0 to 10]

[Figure: relative abundance (%), 0 to 10, vs. m/z, 400 to 2000]

[Figure: relative abundance (%), 0.0 to 2.0, vs. m/z, 500 to 2000]

Ubiquitin - MS spectrum (profile)

[Figure: abundance (au) vs. m/z, 500 to 2000, with charge states z = 6 to z = 12 labeled; inset: m/z 856.5 to 859.0, z = 10]

Ubiquitin – MS/MS (HCD) spectrum (profile)

[Figure: intensity (au) vs. m/z, 400 to 2000; inset: m/z 820 to 845]

[Figure: relative int. (au) vs. m/z, 910.7 to 910.8]

Step 1: Profile to centroid (MS & MS2)
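Converting a profile spectrum to centroids reduces each peak to a single (m/z, abundance) pair. The sketch below uses the simplest possible approach, a three-point intensity-weighted centroid around every local maximum; it is an illustrative assumption, not the fitting procedure used by ProteinGoggle, and the function name is hypothetical.

```python
def centroid_profile(mz, intensity, threshold=0.0):
    """Return (centroid m/z, apex intensity) for each local maximum above threshold."""
    peaks = []
    for i in range(1, len(mz) - 1):
        left, apex, right = intensity[i - 1], intensity[i], intensity[i + 1]
        # A local maximum: strictly above the left neighbor, not below the right one.
        if apex > threshold and apex > left and apex >= right:
            weights = (left, apex, right)
            points = (mz[i - 1], mz[i], mz[i + 1])
            center = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            peaks.append((center, apex))
    return peaks


# Hypothetical profile points around a single peak.
profile_mz = [100.00, 100.01, 100.02, 100.03, 100.04]
profile_int = [0.0, 50.0, 100.0, 50.0, 0.0]
print(centroid_profile(profile_mz, profile_int))
```

The `threshold` argument is where a noise level, such as the one determined in Pre-Step 2, would be applied.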

Step 2: iMF of precursor ion candidates

[Figure: abundance (au) vs. m/z, 854 to 860]

isolation window (±3 m/z units)

857.47461 (4 ppm)

… … … … … …

[Figure: experimental vs. theoretical isotopic envelope; abundance (%) vs. m/z, 857 to 859; IPACO (5%)]

IPAD (%) per isotopic peak: 60, 21, 20, 13, 1, 0, -10, -6, -3, 2, -2, -27, -70, -45, -55, -47

IPMD (ppm) per isotopic peak: 15, 15, 15, 13, 9, 5, 1, -3, -6, -5, -4, -4, -4, -1, -6, -16

Step 3: iEF of precursor ion candidates

IPACO 5%, IPMD 15 ppm, IPAD 30%
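The three thresholds can be read as an abundance cutoff on the theoretical peaks (IPACO), an m/z tolerance in ppm (IPMD), and a relative abundance tolerance in percent (IPAD). The transcript does not expand the abbreviations, so that reading is an assumption, as is the nearest-peak pairing used below. The data are the ubiquitin z = 10 values listed earlier in the deck; the function names are hypothetical.

```python
EXP = [(856.9821, 6061), (857.0825, 21811), (857.1826, 52841), (857.2809, 82342),
       (857.3782, 93523), (857.4746, 96019), (857.5714, 75857), (857.6682, 60680),
       (857.7663, 42420), (857.8669, 27294), (857.9680, 14752), (858.0681, 5685),
       (858.1685, 1120), (858.2717, 919), (858.3671, 316), (858.4594, 147)]
THEO = [(856.9690, 3.95), (857.0692, 18.83), (857.1695, 45.88), (857.2698, 76.13),
        (857.3701, 96.65), (857.4703, 100.00), (857.5706, 87.76), (857.6709, 67.12),
        (857.7711, 45.63), (857.8714, 27.99), (857.9716, 15.67), (858.0719, 8.09),
        (858.1721, 3.87), (858.2724, 1.73), (858.3726, 0.73), (858.4729, 0.29)]


def ppm(exp_mz, theo_mz):
    """Signed m/z deviation in parts per million."""
    return (exp_mz - theo_mz) / theo_mz * 1e6


def envelope_matches(exp, theo, ipaco=5.0, ipmd=15.0, ipad=30.0):
    """Check an experimental envelope against a theoretical one.

    exp:  list of (m/z, abundance) centroids
    theo: list of (m/z, relative abundance in %)
    """
    exp_top = max(abundance for _, abundance in exp)
    for theo_mz, theo_rel in theo:
        if theo_rel < ipaco:
            continue  # theoretical peak too weak to be required
        # Pair with the nearest experimental centroid (assumed pairing rule).
        exp_mz, exp_abund = min(exp, key=lambda p: abs(p[0] - theo_mz))
        exp_rel = 100.0 * exp_abund / exp_top
        if abs(ppm(exp_mz, theo_mz)) > ipmd:
            return False
        if abs(exp_rel - theo_rel) / theo_rel * 100.0 > ipad:
            return False
    return True


# Illustrative call with a 20 ppm m/z tolerance.
print(envelope_matches(EXP, THEO, ipaco=5.0, ipmd=20.0, ipad=30.0))
```

Tightening any of the three values makes the match stricter, which is the sense in which the deck later describes confidence as "as-strict-as-you-choose".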

Step 4: iMF of fragment ion candidates

[Figure: abundance (au) vs. m/z, 270 to 280]

277.13278 (5 ppm)

C1;MAX_MZ=149.07431&C2;MAX_MZ=277.132888&C3;MAX_MZ=390.216952&C4;MAX_MZ=537.285366&C5;MAX_MZ=636.353779&C6;MAX_MZ=764.448743&C7;…
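A record like the one above can serve as a sorted index for looking up fragment candidates by m/z. The sketch below assumes that MAX_MZ is the m/z of the most abundant isotopic peak of each fragment and that the value in parentheses on the slide is a ppm tolerance; both are interpretations, not statements from the transcript. It uses binary search from the standard library, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right

# (fragment label, MAX_MZ) pairs copied from the record above, sorted by m/z.
INDEX = [
    ("C1", 149.07431),
    ("C2", 277.132888),
    ("C3", 390.216952),
    ("C4", 537.285366),
    ("C5", 636.353779),
    ("C6", 764.448743),
]
MZS = [mz for _, mz in INDEX]


def candidates(exp_mz, tol_ppm):
    """Return all indexed fragments whose m/z lies within tol_ppm of exp_mz."""
    delta = exp_mz * tol_ppm * 1e-6
    lo = bisect_left(MZS, exp_mz - delta)
    hi = bisect_right(MZS, exp_mz + delta)
    return INDEX[lo:hi]


print(candidates(277.13278, 5))
```

Because the index is sorted, each lookup costs O(log N) comparisons instead of a scan over every theoretical fragment.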

[Figure: isotopic envelope of fragment c2 (1+); relative abundance (%) vs. m/z, 277.0 to 279.5]

Step 5: iEF of fragment ion candidates

IPACO 5%, IPMD 10 ppm, IPAD 50%

Human histone H4_S1acK16acK20me2

Exemplary PTM_Score assignment

ID of ubiquitin from ETD

NMFs = 91 IPACO=10, IPMD=15, IPAD=100

IPMDO=20, IPMDOM=30, IPADO=20, IPADOM=200

[Figure: NMFs vs. IPMD (ppm), 0 to 20]

[Figure: NMFs vs. IPAD (%), 0 to 200]

[Figure: NMFs vs. IPACO (%), 0 to 50]

Pros and Cons

Pros:

As-strict-as-you-choose confidence

•Strict quality control (QC)

Fine discrimination of close iEs

In-situ unwrapping of overlapped iEs

Cons:

More complex and bigger database

More data points for fingerprinting

Comparison with ProSightPC

[Figure: isotopic envelope of fragment c2 (1+); relative abundance (%) vs. m/z, 277.0 to 279.5]

Pros: As-strict-as-you-choose confidence

Layman’s choice of parameters

Default values with statistical significance!

Pros: Fine discrimination of close iEs

              b38-53 (3+)              b18-33 (3+) or b19-34 (3+)    (b6-22-H2O) (3+)
Exp. m/z      Theo. m/z   IPMD (ppm)   Theo. m/z   IPMD (ppm)        Theo. m/z   IPMD (ppm)
599.6575      599.6478    16           599.6511    11                599.6595    -3
599.9919      599.9821    16           599.9855    11                599.9939    -3
600.3242      600.3164    13           600.3197    8                 600.3281    -6
600.6616      600.6506    18           600.6539    13                600.6623    -1

[Figure: experimental envelope compared with b38-53 (3+), b18-33 (3+)/b19-34 (3+), and (b6-22-H2O) (3+); relative abundance (%) vs. m/z, 599.5 to 601.0]

Pros: In-situ unwrapping of overlapped iEs

Proportional partition

k: # of overlapped isotopic peaks

m: # of isotopic peaks in each iE

n: # of overlapped iEs
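The partition formula itself is not in the transcript. One plausible reading, offered only as an assumption, is that the observed abundance of an overlapped isotopic peak is shared among the n overlapping iEs in proportion to the abundance each iE is expected to contribute at that m/z. A minimal sketch with a hypothetical function name:

```python
def partition_overlapped_peak(observed, expected):
    """Split one observed abundance among overlapping envelopes.

    observed: abundance of the overlapped isotopic peak
    expected: expected contribution of each envelope at this m/z
    """
    total = sum(expected)
    if total <= 0:
        raise ValueError("expected contributions must sum to a positive value")
    return [observed * e / total for e in expected]


# Hypothetical case: two envelopes expected to contribute in a 3:1 ratio.
print(partition_overlapped_peak(100.0, [30.0, 10.0]))
```

The shares always add up to the observed abundance, so no signal is created or lost by the split.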

[Figure: overlapped isotopic envelopes of z74 (8+), c73 (8+), and c64 (7+); abundance (au) vs. m/z, 1036.5 to 1039.0; one panel per fragment shows relative abundance (%) vs. m/z]

Other improvements and utilities

Improvements:

Bi-section method for fast indexing of candidates

LASSO-like approach to untangle overlapped iEs

Additional utilities:

A comprehensive confidence score

False discovery rate (FDR)

Customized ion types to look for new dissociation channels

Customized MODs for the search of new modification or labeled proteins

MS/MS spectrum annotation with matching fragments

CONCLUSIONS

An as-confident-as-you-choose protein database search algorithm, iMEF, has been created and implemented in the search engine ProteinGoggle

The principle of iMEF with ProteinGoggle is demonstrated with identification of ubiquitin from its tandem mass spectrum using ETD

iMEF as implemented in ProteinGoggle has been able to unwrap complex overlapping isotopic envelopes and confidently provide embedded fragment ions

iMEF could be adapted for peptide and glycan database search with customized databases

ACKNOWLEDGEMENTS

Funding: DICP “Research Start”; China “Youth 1000-talents Theme”

The KENES. Co. Ltd.

Miao Zhou, Shijin Liu, Bin Yang

DNL2003

Li Li, Bo Wang, Jing Li, Xu Zhao

THANK YOU VERY MUCH!
