How to identify peptides October 2013 Gustavo de Souza IMM, OUS
Dec 18, 2015
Fragmentation
Nomenclature for peptide sequence-ions:
Collision-Induced Dissociation (CID): [MHn]n+* + N2 --> b + y
Electron Capture Dissociation (ECD): [MHn]n+ + e- --> [MHn](n-1)+· --> c + z·
Fragmentation
[Figure: backbone of an eight-residue peptide (side chains R1 to R8) with the N-terminal fragments b1-b7 and the complementary C-terminal fragments y7-y1 marked at each peptide bond]
Roepstorff-Fohlmann-Biemann-Nomenclature
MS/MS of a peptide
[Figure: MS/MS spectrum of the peptide VPTVDVSVVDLTVK (ITMS, scan #11793, RT 84.81), relative abundance (0-100) versus m/z (200-1400), annotated with the b3-b13 and y2-y13 ions and the doubly charged y13 ion]
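The b/y nomenclature can be made concrete with a minimal sketch that lists the theoretical b and y ions of the peptide in the spectrum. It assumes monoisotopic residue masses, unmodified residues and simple protonation; the function name `fragment_ions` is hypothetical and is not taken from any search engine.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to the C-terminal (y) fragments
PROTON = 1.00728   # mass of one charge-carrying proton


def fragment_ions(peptide, charge=1):
    """Return two dicts {index: m/z} for the b and y series (assumed unmodified)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b, y = {}, {}
    for i in range(1, n):
        # b_i holds the first i residues, y_i the last i residues plus water.
        b[i] = (sum(masses[:i]) + charge * PROTON) / charge
        y[i] = (sum(masses[n - i:]) + WATER + charge * PROTON) / charge
    return b, y


b_ions, y_ions = fragment_ions("VPTVDVSVVDLTVK")
for i in sorted(y_ions):
    print(f"b{i:<2} {b_ions[i]:9.3f}    y{i:<2} {y_ions[i]:9.3f}")
```

Calling `fragment_ions("VPTVDVSVVDLTVK", charge=2)` gives the doubly charged series, which is where a y13++ label such as the one in the spectrum would come from.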
How to Identify MS/MS
Steen and Mann, 2004.
Peptide Sequence Tags
Autocorrelation
Probability based match
How does identification happen?
Inputs: your data and a protein database (FASTA)
Step 1: which theoretical peptides have the same mass as the observed ion?
Step 2: of those, which one has the most similar fragmentation pattern?
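The two steps can be sketched in a few lines. This is only an illustration under stated assumptions: the candidate list, the observed peak list and the shared-peak count used as a score are all hypothetical stand-ins, and real search engines use more elaborate scoring.

```python
# Monoisotopic residue masses (Da); only the residues used below are listed.
RESIDUE_MASS = {
    "S": 87.03203, "P": 97.05276, "V": 99.06841, "T": 101.04768,
    "L": 113.08406, "I": 113.08406, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259,
}
WATER, PROTON = 18.01056, 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def y_ions(seq):
    """Singly charged theoretical y ions y1 .. y(n-1)."""
    return [sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]


def search(observed_mass, observed_peaks, database, ppm=10.0, frag_tol=0.5):
    # Step 1: keep theoretical peptides whose mass matches the precursor.
    candidates = [p for p in database
                  if abs(peptide_mass(p) - observed_mass) / observed_mass * 1e6 <= ppm]

    # Step 2: rank candidates by how many theoretical y ions are observed.
    def shared_peaks(p):
        return sum(any(abs(t - o) <= frag_tol for o in observed_peaks)
                   for t in y_ions(p))

    return sorted(((shared_peaks(p), p) for p in candidates), reverse=True)


# Hypothetical database of theoretical peptides and a made-up peak list.
DATABASE = ["ELEEIVQPIISK", "SIIPQVIEELEK", "VPTVDVSVVDLTVK"]
hits = search(1396.7813, [147.1, 234.1, 347.2, 460.3, 557.4], DATABASE)
for score, peptide in hits:
    print(score, peptide)
```

Two of the three peptides pass Step 1 because they have the same mass; only Step 2 separates them.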
High mass accuracy – what is it good for?
All theoretical tryptic peptide masses from human IPI database
Example Tryptic HSP-70 peptide: ELEEIVQPIISK, mass 1396.7813 Da
Instrument  Mass accuracy (ppm)  Calibration  # of tryptic peptides for m/z 1396.7813
LTQ         500                  Ext.         344
QSTAR       20                   Ext.         52
QSTAR       10                   Int.         33
LTQ-FT      2                    Ext.         11
LTQ-FT      1                    Ext-SIM      9
LTQ-FT      0.5                  Int.         3
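To see why the number of candidate peptides drops with better mass accuracy, it helps to convert a ppm tolerance into an absolute mass window. The sketch below does only that arithmetic for the example peptide; the helper name `mass_window` is hypothetical.

```python
def mass_window(mass_da, ppm):
    """Return the (low, high) mass limits for a +/- ppm tolerance."""
    delta = mass_da * ppm / 1e6
    return mass_da - delta, mass_da + delta


# Tolerances taken from the instrument comparison; mass of ELEEIVQPIISK.
for ppm in (500, 20, 10, 2, 1, 0.5):
    low, high = mass_window(1396.7813, ppm)
    print(f"{ppm:>5} ppm: {low:.4f} - {high:.4f} Da (width {high - low:.4f} Da)")
```

Every theoretical peptide whose mass falls inside the window is a candidate, so a window that is hundreds of times narrower admits far fewer sequences.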
The “Search Space”
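[Figure: a protein cut into six tryptic peptides (1-6), searched with 0, 1 or 2 missed cleavages (mcl)]
0 mcl: 1, 2, 3, 4, 5, 6
1 mcl: the above plus 1/2, 2/3, 3/4, 4/5, 5/6
2 mcl: the above plus 1/2/3, 2/3/4, 3/4/5, 4/5/6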
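The growth of the search space with missed cleavages can be reproduced with a small in-silico digest. This is a sketch under a simplified trypsin rule (cleave after K or R, but not before P); the toy protein sequence and the function name `tryptic_peptides` are made up for illustration.

```python
import re


def tryptic_peptides(protein, missed_cleavages=0):
    """List peptides from a simplified tryptic digest (assumed rule: K/R, not before P)."""
    # Split after every K or R that is not followed by P; drop empty pieces.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for start in range(len(pieces)):
        # Join 1 .. missed_cleavages + 1 consecutive pieces starting here.
        for extra in range(missed_cleavages + 1):
            if start + extra < len(pieces):
                peptides.append("".join(pieces[start:start + extra + 1]))
    return peptides


# Toy protein that yields six fully cleaved peptides.
toy_protein = "AAKCCRDDKEERFFKGG"
for mcl in (0, 1, 2):
    print(mcl, "mcl:", len(tryptic_peptides(toy_protein, mcl)), "peptides")
```

For six fully cleaved peptides the count goes from 6 to 11 to 15, matching the 0, 1 and 2 mcl cases listed above.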
Importance of Search Space Size
A search tool does not identify a peptide. It only reports the statistically most suitable theoretical sequence for the experimental data.
If you increase the size of the database or of the search space too much, the false-positive rate also increases.
There is always a chance that two peptides with different sequences have approximately the same Mr and share MS/MS similarities.
More variables in the search --> higher chance of random matches --> higher MOWSE score threshold
Parameters that can modify the MOWSE calculation:
-Database size;
-MMD (measured mass deviation);
-Number of PTMs chosen;
-Data quality.
MOWSE
Mycoplasma sp. sample (Munich 2006):
-Database had ~ 700 entries;
-Average mass accuracy of the data: 0.7 ppm;
-MMD used during search: 3 ppm.
Probability Based Mowse Score
Ions score is -10*Log(P), where P is the probability that the observed match is a random event.
Individual ions scores > 7 indicate identity or extensive homology (p<0.05).
Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits.
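The relation between the ions score and P is a plain logarithmic transform, sketched below. The code only transcribes the formula quoted above, assuming a base-10 logarithm; it does not reproduce how the search engine estimates P or how it derives the reported significance threshold for a given search space.

```python
import math


def ions_score(p_random):
    """Score = -10 * log10(P), with P the probability of a random match."""
    return -10 * math.log10(p_random)


def probability_from_score(score):
    """Invert the score back to the probability P."""
    return 10 ** (-score / 10)


for p in (0.05, 0.01, 0.001):
    print(f"P = {p:<6} -> score {ions_score(p):.1f}")
```

Each additional 10 score units corresponds to a tenfold lower probability that the match is random.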
Example of MMD issue
Peng et al. (2003). Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Proteome Res 2, 43-50.
Reversed database sequence
Strategies to Visualize FDRs
False positive identification using reversed database
HSP-70 tryptic peptide
                      (forward)          (reverse)
Sequence              K ELEEIVQPIISK     K SIIPQVIEELEK
Peptide Mr            1396.7813 Da       1396.7813 Da
Mascot checks both peptides.
Theoretical y series:
y1                    147.1              147.1
y2                    234.1              276.2
y3                    347.2              389.2
....                  ....               ....
y11                   1268.7             1310.8
Expected ions from the reversed hit should not correlate with the observed ions in the experiment.
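The forward/reverse comparison can be reproduced with a short sketch. It assumes the decoy is built by reversing the peptide while keeping the C-terminal residue in place, which is what the example sequences suggest; whole-protein reversal is another common choice. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE_MASS = {
    "S": 87.03203, "P": 97.05276, "V": 99.06841, "L": 113.08406,
    "I": 113.08406, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
}
WATER, PROTON = 18.01056, 1.00728


def reversed_decoy(peptide):
    """Reverse the sequence but keep the C-terminal residue (assumed scheme)."""
    return peptide[:-1][::-1] + peptide[-1]


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def y_series(seq):
    """Singly charged y ions y1 .. y(n-1)."""
    return [sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]


forward = "ELEEIVQPIISK"
reverse = reversed_decoy(forward)
print(forward, f"{peptide_mass(forward):.4f}")
print(reverse, f"{peptide_mass(reverse):.4f}")
for i, (f, r) in enumerate(zip(y_series(forward), y_series(reverse)), start=1):
    print(f"y{i:<2} {f:9.1f} {r:9.1f}")
```

Both sequences have the same precursor mass, so both pass the mass filter, but their y series diverge after y1.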
Typical Result
[Figure: "All peptides Mascot", Mascot score (0-160) plotted against sequence length (5-25)]
Is there any reversed-hit protein with 2 peptides above the MOWSE score threshold?
-No: all proteins identified with 2 peptides scoring above the p<0.05 threshold are good.
-Yes: repeat the Mascot search with more stringent parameters.
What about 1-hit wonders? (Proteins identified with only 1 peptide)
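One way to put a number on the reversed-hit check is to estimate a false discovery rate from the forward and reversed matches that pass a score threshold. The sketch below uses the simple ratio of reversed to forward hits, which is one common estimator and an assumption here, not a formula stated on the slides; all scores are made up.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR = reversed hits / forward hits at or above the threshold."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    return decoys / targets if targets else 0.0


# Hypothetical peptide scores from a forward and a reversed database search.
forward_scores = [50, 40, 30, 20, 10]
reversed_scores = [25, 8]

for threshold in (10, 20, 30):
    fdr = estimate_fdr(forward_scores, reversed_scores, threshold)
    print(f"threshold {threshold}: estimated FDR {fdr:.2f}")
```

Raising the threshold until no reversed hit remains corresponds to the "more stringent parameters" option above, at the cost of losing some forward hits.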
How to Validate the Data
Basically, the idea is to "play around" with the statistics to make your result more reliable.
[Figure: "All peptides Mascot", Mascot score (0-160) plotted against sequence length (5-25)]
Take home message
1. Data quality (mass accuracy) and a well-defined search space are key for reliable peptide identification
2. Reliable identification is a balance between asking enough and not asking too much (be careful when trying to get “as many IDs as I can”!)
PTM abundance in a cell
[Figure: number of peptides versus abundance level, comparing total peptides in a sample with modified peptides]
Differences from 10e2 to 10e4