
Confident Phosphorylation Site Localization Using the Mascot Delta Score

Mikhail M. Savitski‡, Simone Lemeer§, Markus Boesche‡, Manja Lang‡, Toby Mathieson‡, Marcus Bantscheff‡, and Bernhard Kuster§¶

Large-scale phosphorylation analysis is increasingly becoming a focus of proteomic research. Although it is now possible to identify thousands of phosphorylated peptides in a biological system, confident site localization remains challenging. Here we validate the Mascot Delta Score (MD-score) as a simple method that achieves similar sensitivity and specificity for phosphosite localization as the published Ascore, which is mainly used in conjunction with Sequest. The MD-score was evaluated using liquid chromatography-tandem MS data of 180 individually synthesized phosphopeptides with precisely known phosphorylation sites. We tested the MD-score for a wide range of commonly available fragmentation methods and found it to be applicable throughout with high statistical significance. However, the different fragmentation techniques differ strongly in their ability to localize phosphorylation sites. At 1% false localization rate, the highest number of correctly assigned phosphopeptides was achieved by higher energy collision induced dissociation in combination with an Orbitrap mass analyzer, followed very closely by low resolution ion trap spectra obtained after electron transfer dissociation. Both of these methods are significantly better than low resolution spectra acquired after collision induced dissociation and multistage activation. Score thresholds determined from simple calibration functions for each fragmentation method were stable over replicate analyses of the phosphopeptide set. The MD-score outperforms the Ascore for tyrosine-phosphorylated peptides, and we further show that the ability to call sites correctly increases with increasing distance between two candidate sites within a peptide sequence. The MD-score does not require complex computational steps, which makes it attractive in terms of practical utility.
We provide all mass spectra and the synthetic peptides to the community so that the development of present and future localization software can be benchmarked and any laboratory can determine MD-scores and localization probabilities for its individual analytical setup. Molecular & Cellular Proteomics 10: 10.1074/mcp.M110.003830, 1–12, 2011.

Post-translational modifications (PTMs)¹ of proteins are being actively pursued owing to their broad biological significance. In particular, recent advances in liquid chromatography and mass spectrometry have made the large-scale enrichment, identification, and quantification of phosphopeptides feasible (1–6). At the same time, it has become increasingly difficult, if not impossible, to verify both identification and phosphorylation site assignments by manual inspection of tandem mass spectra (7). For reasons of throughput and objectivity, the automatic assignment of phosphorylation to the correct amino acid in a peptide has become an important yet challenging task (2, 8, 9). Owing to the frequently observed loss of the phosphate group in the gas phase, tandem MS spectra of phosphopeptides are often not straightforward to interpret (10). This complicates site localization because, unlike peptide identification, it often requires the detection of one or a few particular fragment ions for an unambiguous result. If a particular fragmentation technique does not generate these ions efficiently, or the mass spectrometer employed cannot detect them efficiently, site localization may be significantly impaired. The situation is further complicated by the presence of multiple potential modification sites in a peptide and by the fact that phosphopeptides are often identified by single spectra only. Many of the common automated peptide identification tools, such as Mascot and Sequest (11, 12), do not explicitly score for correct PTM site assignment. In Mascot, ion scores are computed for each spectrum-to-peptide match, which may include alternative PTM site localizations within a peptide, but the ion score alone generally does not suffice to call a phosphorylation site correctly (9).

To overcome issues of throughput and objectivity in phosphorylation site localization, several computational approaches have been published over the past few years (2, 8, 9, 13–17). The two best known are the Ascore (9) and the highly similar PTM score (2), which both use empirically collected information on the fragmentation behavior of phosphopeptides and check for the presence and intensity order of diagnostic fragment ions in tandem mass spectra. The Ascore, for example, then calculates the localization-specific probability for every possible amino acid site present in a given peptide. Although both the Ascore and PTM score algorithms appear to work well, they have shortcomings. For instance, the Ascore (as implemented in available software) is incompatible with the widely used search engine Mascot. The PTM score was developed on large-scale phosphorylation data sets from cell lines without validating the score performance on phosphopeptides with known phosphorylation sites. Both scores (as published and implemented in available software) are only enabled for CID spectra acquired on low resolution ion trap analyzers, and their performance on other fragmentation types has not been systematically evaluated. The SloMo method (8) is an adaptation of the Ascore for electron capture dissociation and electron transfer dissociation (ETD) data but (as implemented in available software) only accepts Sequest and OMSSA search results.

From the ‡Cellzome AG, Meyerhofstrasse 1, 69117 Heidelberg, Germany, §Chair of Proteomics and Bioanalytics, Technische Universität München, Emil Erlenmeyer Forum 5, 85354 Freising, Germany, ¶Center for Integrated Protein Sciences Munich (CIPSM)

Received July 29, 2010, and in revised form, October 18, 2010. Published, MCP Papers in Press, November 6, 2010, DOI 10.1074/mcp.M110.003830

¹ The abbreviations used are: CID, collision induced dissociation; ETD, electron transfer dissociation; ETDSA, ETD with supplemental activation; FLR, false localization rate; HCD, higher energy collision induced dissociation; LC-MS/MS, liquid chromatography tandem mass spectrometry; MALDI, matrix-assisted laser desorption ionisation; Mgf, Mascot generic format; MSA, multistage activation; PTMs, post-translational modifications; QTOF, quadrupole time of flight.

Technological Innovation and Resources
© 2011 by The American Society for Biochemistry and Molecular Biology, Inc. This paper is available online at http://www.mcponline.org

Molecular & Cellular Proteomics 10.2 10.1074/mcp.M110.003830–1

In light of the above, there is still a need for additional tools that are more widely applicable to multiple mass spectrometry platforms and fragmentation types, or that fill gaps in available methods. Using the results of database search engines directly for phosphorylation site localization would be attractive for this purpose. For ion trap collision induced dissociation (CID) data and the search engine Mascot, Beausoleil et al. (9) evaluated the performance of a normalized Mascot delta ion score (normalized MD-score), which calculates the difference between the top two Mascot ion scores of alternative phosphorylation sites in the same peptide sequence, divided by the ion score of the top-ranking site. The authors found the normalized MD-score to be inferior in performance to the Ascore. The Heck laboratory has recently used the MD-score without normalization but did not evaluate or validate its performance (18, 19).
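In symbols, writing S₁ and S₂ for the highest and second-highest Mascot ion scores among the alternative site assignments of one peptide sequence (notation introduced here only for illustration), the unnormalized and normalized scores are:

$$\mathrm{MD} = S_1 - S_2, \qquad \mathrm{nMD} = \frac{S_1 - S_2}{S_1}$$

For example, ion scores of 45 and 30 for two alternative sites give MD = 15 and nMD = 15/45 ≈ 0.33. An MD-score of zero means the two best site assignments score identically and cannot be told apart.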

In this work, we re-evaluated the ability of the MD-score to estimate the probability of correct phosphorylation site localization for commonly used peptide fragmentation types on three types of mass spectrometers, using 180 synthetic phosphopeptides with precisely known phosphorylation sites. We found the MD-score to be applicable throughout, and we provide MD-score distributions, thresholds, and scoring functions tailored to each fragmentation technique that researchers may use as guidelines to determine the false localization rates (FLRs) of phosphorylation site assignments made by Mascot. In addition, we provide the complete liquid chromatography-tandem MS (LC-MS/MS) data so that developers of localization software can benchmark the performance of their tools against a standard data set. Finally, we make the physical phosphopeptide collection available to the community so that any laboratory can determine and implement MD-score characteristics for its particular analytical setup.

EXPERIMENTAL PROCEDURES

Peptide Synthesis—Based on a list of naturally occurring phosphopeptides (20), 180 peptides including positional p-site isomers (Supplemental Table S1) were synthesized individually by solid phase synthesis at a scale of 2 µmol on a parallel peptide synthesizer (Intavis, Cologne) following the standard Fmoc strategy. Fmoc-protected amino acids were obtained from Intavis. Crude peptides were quality controlled by matrix-assisted laser desorption/ionization time-of-flight MS (MALDI-TOF MS) and used for subsequent LC-MS/MS without further purification. Annotated tandem mass spectra of all synthesized phosphopeptides are documented in Supplemental Fig. S13. Peptides were analyzed either individually by LC-MS/MS or as five pooled mixtures (Supplemental Table S2). For all mixtures, peptides were chosen such that no phosphorylation site isomers were present in any one mixture. The synthesized peptides vary in length between 5 and 28 residues (average 16, Supplemental Table S1) and contain 14% Ser, 6% Thr, and 5% Tyr residues, which is similar to the "homo sapiens EGF data set" in Phosida (av. 14 residues, 16% Ser, 5% Thr, 1% Tyr) (21). Peptides are detected as 2+ (72%), 3+ (26%), and 4+ (2%) precursor ions, and 33% of the peptides contain one missed protease cleavage site (5% contain two such sites), which is similar to other phosphorylation studies using trypsin as the protease and electrospray ionization. The higher incidence of pY-containing peptides in our set compared with that typically found in large-scale studies was driven by the need to investigate a sufficient number of these peptides for statistical analysis. Multiply phosphorylated peptides are under-represented in our study (~10% here versus ~20% in Phosida). Therefore, although the same trends are observed for singly and doubly phosphorylated peptides, MD-score thresholds should be carefully assessed in these cases.

LC-MS/MS of Individual Phosphopeptides on a QTOF Micro—One hundred and eighty synthesized phosphopeptides were each subjected to a 20 min LC-MS/MS run using a 75 µm × 50 mm reversed phase column (ReproSil-Pur C18, Dr. Maisch, Germany) and a CapLC instrument (Waters, UK) coupled online to a QTOF Micro (Waters, UK). Separation was performed within 15 min using a linear gradient from 2% to 35% acetonitrile in 0.1% formic acid at an effective flow rate of 250 nl/min (passive flow splitting). The eluent was sprayed via emitter tips (New Objective, Woburn, MA) butt-connected to the analytical column. Survey spectra were collected for 1 s, followed by collection of CID spectra of the three most abundant signals for 3 s (nitrogen was used as the collision gas; the m/z- and charge-dependent collision energy was between 18 and 46; dynamic precursor exclusion 30 s; minimal precursor intensity 15 counts). Custom in-house software was used to read the centroid data from raw tandem mass spectra and convert them into Mascot generic format (mgf) files. No further peak processing was performed. Mgf files were searched using Mascot (version 2.2) with carbamidomethyl cysteine as a fixed modification and oxidized methionine, acetylated protein N terminus, and phosphorylation of serine, threonine, and tyrosine as variable modifications. Trypsin was specified as the proteolytic enzyme, and up to three missed cleavages were allowed. The mass tolerance of the precursor ion was set to 0.6 Da and that of fragment ions to 0.4 Da. The data were searched against an in-house curated version of the human International Protein Index database combined with a decoy version thereof (22). This database contains a total of 163,476 protein sequences (50% forward, 50% reverse) and represents a nonredundant composite of International Protein Index versions 1.0–3.54 and the sequences of bovine serum albumin, porcine trypsin, and mouse, rat, and sheep keratins.
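The in-house conversion step is not described further. As a minimal sketch, assuming the centroided peaks of each spectrum have already been read from the raw file, the following writes them as MGF text using the standard MGF keywords (BEGIN IONS, TITLE, PEPMASS, CHARGE, END IONS). The `Spectrum` container and the `to_mgf` function are hypothetical names for illustration; this is not the authors' software.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Spectrum:
    """Hypothetical container for one centroided tandem mass spectrum."""
    title: str
    precursor_mz: float
    charge: int
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def to_mgf(spectra: List[Spectrum]) -> str:
    """Serialize spectra as Mascot generic format (MGF) text.

    Peaks are written exactly as given: no deisotoping, filtering or
    other peak processing is applied.
    """
    blocks = []
    for s in spectra:
        lines = [
            "BEGIN IONS",
            f"TITLE={s.title}",
            f"PEPMASS={s.precursor_mz:.5f}",
            f"CHARGE={s.charge}+",
        ]
        # One "m/z intensity" line per centroid
        lines.extend(f"{mz:.5f} {intensity:.1f}" for mz, intensity in s.peaks)
        lines.append("END IONS")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = Spectrum("scan=1", 500.25, 2, [(200.1, 10.0), (300.2, 20.5)])
    print(to_mgf([demo]), end="")
```

Because the peak list passes through unchanged, the output mirrors the "no further peak processing" setting described above.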

LC-MS/MS Analysis of Phosphopeptide Mixtures on an LTQ-Orbitrap XL ETD—Nanoflow LC-MS/MS was performed by coupling a nanoLC Ultra 1D plus (Eksigent, Dublin, CA) to an LTQ Orbitrap XL ETD mass spectrometer (ThermoFisher Scientific), using a custom
packed 20 mm × 75 µm ReproSil-Pur C18 (Dr. Maisch, Germany) trap column followed by a custom packed 400 mm × 75 µm ReproSil-Pur C18 (Dr. Maisch, Germany) analytical column. Separation was performed within 60 min using a gradient from 0% to 40% acetonitrile in 0.1% formic acid. The eluent was sprayed via emitter tips (New Objective) butt-connected to the analytical column. The mass spectrometer was operated in data-dependent acquisition mode, automatically switching between MS and MS2 (dynamic exclusion off, minimal precursor intensity 1000). Full scan MS spectra were acquired in the Orbitrap at a resolution of 60,000 at m/z 400 after accumulating ions to a target value of 1 × 10⁶. In separate runs, the five most intense ions were selected for fragmentation by either collision induced dissociation (CID), multistage activation (MSA), electron transfer dissociation (ETD), ETD with supplemental activation (ETDSA), or higher-energy collision-induced dissociation (HCD). For CID and MSA (activation of neutral losses of 98, 49, 32.6, and 24.5), peptides were fragmented after accumulating ions to a target value of 5000 with a maximum injection time of 500 ms. The fragment ions were recorded in the LTQ ion trap. For ETD with and without supplemental activation, peptides were fragmented after accumulating ions to a target value of 5000 with a maximum injection time of 500 ms. Fluoranthene was used as the ETD reagent, and the reaction time in the ion trap was dependent on the charge state (100 ms for 2+ ions, 66.7 ms for 3+ ions, and 50 ms for 4+ ions). The fragment ions were recorded in the linear trap quadrupole (LTQ) ion trap. For HCD experiments, full scan MS spectra were acquired at a resolution of 30,000 at m/z 400. Peptides were fragmented after accumulating ions to a target setting of 50,000 using a normalized collision energy of 40%. Fragment ions were detected in the Orbitrap at a resolution of 7500 at m/z 400.
Custom in-house software was used to read the centroid data from raw tandem mass spectra and convert them into Mascot generic format (mgf) files. No further peak processing was performed. Mgf files were searched using Mascot (version 2.2) with carbamidomethyl cysteine as a fixed modification and oxidized methionine, acetylated protein N terminus, and phosphorylation of serine, threonine, and tyrosine as variable modifications (allowing a neutral loss of 98 for pS/T but not pY peptides). Trypsin was specified as the proteolytic enzyme, and up to three missed cleavages were allowed. The mass tolerance of the precursor ion was set to 10 ppm and that of fragment ions to either 0.5 Da (CID, MSA, ETD, ETDSA, and HCD) or 0.02 Da (HCD; see Supplemental Fig. S1 for the choice of HCD search tolerance). The data were searched against an in-house curated version of the human International Protein Index database combined with a decoy version thereof (22). This database contains a total of 163,476 protein sequences (50% forward, 50% reverse) and represents a nonredundant composite of International Protein Index versions 1.0–3.54 and the sequences of bovine serum albumin, porcine trypsin, and mouse, rat, and sheep keratins. Searches were performed with and without prior filtering of tandem mass spectra (as described in (23)). Deisotoping and deconvolution of HCD spectra were performed as described (24).
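The search tolerances above are given partly in ppm (precursor, 10 ppm) and partly in Da (fragments, 0.5 or 0.02 Da). Because a ppm tolerance is relative, its absolute width grows with m/z: 10 ppm at m/z 800 is ±0.008 Da. A small sketch of this conversion and of a tolerance check follows; the helper names are hypothetical, and Mascot applies its tolerances internally, so this only illustrates the arithmetic.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute width (in Da) of a relative tolerance given in ppm at a given m/z."""
    return mz * ppm / 1e6


def within_tolerance(observed: float, theoretical: float,
                     tol: float, unit: str = "Da") -> bool:
    """True if |observed - theoretical| <= tol.

    unit="Da" treats tol as an absolute window; unit="ppm" scales it
    with the theoretical m/z.
    """
    if unit == "ppm":
        tol = ppm_to_da(theoretical, tol)
    elif unit != "Da":
        raise ValueError(f"unknown tolerance unit: {unit!r}")
    return abs(observed - theoretical) <= tol


if __name__ == "__main__":
    # 10 ppm at m/z 800 corresponds to a window of +/- 0.008 Da
    print(ppm_to_da(800.0, 10))
```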

LC-MS/MS Analysis of Phosphopeptide Mixtures on an LTQ—Phosphopeptide mixtures (Supplemental Table S2) were analyzed in duplicate on an LTQ ion trap mass spectrometer (ThermoFisher) coupled to a NanoLC 1D plus (Eksigent). Peptides were trapped on a custom made 0.3 mm × 5 mm (ID) trap column (ReproSil-Pur C18, Dr. Maisch, Germany) followed by a custom made 50 cm × 75 µm (ID) reversed phase tip column (ReproSil-Pur C18, Dr. Maisch, Germany), and gradient elution was performed from 2% to 40% acetonitrile in 0.1% formic acid within 2 h. The mass spectrometer was operated using the XCalibur Developers kit 2.0.7 in data-dependent acquisition mode, automatically switching between MS and MS2 (dynamic exclusion 30 s, minimal precursor intensity 1,000). Full scan MS spectra were acquired over a mass range of m/z 400–1200 after accumulating ions to a target value of 30,000 within 50 ms. For the four most intense ions, the charge state was determined by a zoom scan (target value: 7000, max accumulation time: 50 ms), followed by CID fragmentation or MSA (neutral losses of m/z 98, 49, 32.6, and 24.5), both with target values of 10,000 and 35% normalized collision energy. Custom in-house software was used to read the centroid data from raw tandem mass spectra and convert them into Mascot generic format (mgf) files. No further peak processing was performed. Mgf files were searched using Mascot (version 2.2) with carbamidomethyl cysteine as a fixed modification and oxidized methionine, acetylated protein N terminus, and phosphorylation of serine, threonine, and tyrosine as variable modifications (allowing a neutral loss of 98 for pS/T but not pY peptides). Trypsin was specified as the proteolytic enzyme, and up to three missed cleavages were allowed. The mass tolerance of the precursor ion was set to 3 Da and that of fragment ions to 0.6 Da. The data were searched against an in-house curated version of the human International Protein Index database combined with a decoy version thereof (22). This database contains a total of 163,476 protein sequences (50% forward, 50% reverse) and represents a nonredundant composite of International Protein Index versions 1.0–3.54 and the sequences of bovine serum albumin, porcine trypsin, and mouse, rat, and sheep keratins. Searches were performed with and without prior filtering of tandem mass spectra. Filtering was performed as described (23).
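The four MSA values listed in both instrument sections (98, 49, 32.6, and 24.5) appear to correspond to the nominal 98 Da loss of phosphoric acid divided by precursor charges 1 through 4: losing a neutral of mass m from a z+ ion lowers its m/z by m divided by z (98/3 ≈ 32.67, listed as 32.6). A short sketch of that arithmetic, assuming the nominal mass of 98 Da; the function names are hypothetical.

```python
# Nominal mass of neutral phosphoric acid (H3PO4), in Da
H3PO4_LOSS = 98.0


def neutral_loss_offset(charge: int, loss: float = H3PO4_LOSS) -> float:
    """m/z shift produced when a z+ ion loses a neutral of mass `loss`."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return loss / charge


def msa_activation_target(precursor_mz: float, charge: int,
                          loss: float = H3PO4_LOSS) -> float:
    """m/z of the neutral-loss product that MSA would additionally activate."""
    return precursor_mz - neutral_loss_offset(charge, loss)


if __name__ == "__main__":
    for z in range(1, 5):
        print(z, round(neutral_loss_offset(z), 2))
```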

Phosphorylation Site Localization—The MD-score was computed from Mascot search result files by determining the difference between the best and second-best Mascot ion scores for alternative phosphorylation site localizations on an otherwise identical peptide sequence. The normalized MD-score (nMD-score) was calculated by dividing the MD-score by the best Mascot ion score (9). A custom version of the Ascore algorithm was implemented in Python following the manuscript of Beausoleil et al. (9). The false localization rate (FLR) for all scores was calculated as a function of the score by dividing the number of incorrect site assignments by the total number of site assignments. To combine the Ascore and the MD-score, we fitted a straight line (constrained through the origin) to the data in Fig. 2B. The slope coefficient of the fit was 0.33. We then multiplied Ascore values by 0.33 to put both scores on the same scale. Subsequently, we picked the highest scaled Ascore/MD-score pair for each identified phosphorylation site and calculated its FLR.
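A minimal Python sketch of the scores defined in this paragraph, assuming the Mascot ion scores of all alternative site assignments for one peptide sequence have already been parsed from the result file (parsing is not shown). Site labels such as "pS3" are hypothetical, as are the names `md_score`, `combined_score`, and `ASCORE_SCALE`. `combined_score` encodes one reading of "picked the highest scaled Ascore/MD-score pair", namely keeping the larger of 0.33 × Ascore and the MD-score; it is not the authors' code.

```python
from typing import Dict, Tuple

# Slope of the origin-constrained fit relating the two scores (Fig. 2B)
ASCORE_SCALE = 0.33


def md_score(site_scores: Dict[str, float]) -> Tuple[str, float, float]:
    """Return (best site assignment, MD-score, normalized MD-score).

    site_scores maps each alternative phosphosite assignment of one
    peptide sequence to its Mascot ion score.
    """
    if len(site_scores) < 2:
        # With a single possible site there is nothing to discriminate
        raise ValueError("need at least two alternative site assignments")
    ranked = sorted(site_scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_site, best), (_, second) = ranked[0], ranked[1]
    md = best - second  # 0 when the top two assignments tie
    nmd = md / best if best > 0 else 0.0
    return best_site, md, nmd


def combined_score(ascore: float, md: float, scale: float = ASCORE_SCALE) -> float:
    """Rescale the Ascore onto the MD-score scale and keep the larger value."""
    return max(scale * ascore, md)


if __name__ == "__main__":
    print(md_score({"pS3": 45.0, "pT6": 30.0, "pS9": 12.0}))
```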

Data and Reagent Availability—All MS data (raw mass spectrometer output as well as generated peak list files) and a Scaffold result file for the QTOF data are available from the Tranche data repository under the project name: Mascot Delta score; https://proteomecommons.org/tranche/. All synthetic peptides used in this study are available from Intavis AG (Cologne, Germany; http://www.intavis.com).

RESULTS

MD-Score Features and Performance Evaluation—The peptide identification scores of Mascot or other search engines are not in themselves necessarily a good indicator of the correct localization of a phosphorylation site within a peptide sequence. We thus revisited whether the Mascot Delta Score (MD-score), which simply reflects the difference between the highest and second-highest Mascot ion scores for candidate phosphorylation sites on an identical peptide sequence in a database search, would be a suitable criterion for site localization. Based on a set of naturally occurring phosphopeptides (20), a collection of 180 phosphopeptides (129 pS/pT; 48 pY; 3 mixed pS/pT/pY peptides; 164 singly and 16 doubly phosphorylated) with precisely known phosphorylation sites and multiple positional isomers was synthesized individually and analyzed separately by LC-MS/MS employing CID on a QTOF Micro instrument. The properties of these peptides are similar to those found in other phosphorylation studies, so they should be a good set of standards for evaluating the MD-score for p-site localization (see also the Discussion). The LC-MS/MS approach generated both strong and weak tandem MS data for each peptide, as would be the case in a typical analysis of a proteomic sample. In total, 2174 MS/MS spectra were matched to the 180 different phosphopeptides (229 peptides when considering partially oxidized Met residues), corresponding to 9.5 spectra per peptide on average (range of 1–62 spectra per peptide). For all peptide-spectrum matches, Mascot ion score, Ascore, MD-score, and normalized MD-score values were obtained, and the number of correct/incorrect localizations as well as the false localization rates were determined based on the known phosphorylation sites of the synthetic peptides.

Fig. 1 shows that the distributions of all three localization scores strongly discriminate between correct and incorrect phosphorylation site assignments, whereas the Mascot score alone is not a confident measure of the reliability of phosphorylation site assignment (see also Supplemental Fig. S2). Analysis of the data shown in Figs. 1A–C reveals that the Ascore matched a total of 1446 spectra to the correct site (138 incorrect), resulting in a total FLR of 9%. The MD-score (and nMD-score) was slightly more sensitive and made 1639 correct (201 incorrect) assignments, but at the expense of a slightly higher total FLR of 11%. For about 10% of all tandem mass spectra, the MD-score (and nMD-score) was zero, indicating that no judgment between correct and incorrect site localization could be made. From the score distributions, one can easily derive FLR thresholds (say, 1%) that may be used for the analysis of similar samples. For the Ascore, a threshold of 22 is required to reach 1% FLR, at which the Ascore made 884 correct assignments. The respective threshold of the MD-score is 10, at which 899 correct assignments are made. The 1% FLR threshold for the normalized MD-score is 0.36, at which 574 site assignments were correct. We note that the cutoff values calculated for this relatively small peptide set are very similar to the ones originally determined for the Ascore (threshold 20) and normalized MD-score (threshold 0.4) (9). Collectively, these data show that the MD-score and Ascore perform similarly well on our data, and both perform substantially better than the normalized MD-score (Fig. 1D).
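Deriving a cutoff from labeled score distributions, as described above, amounts to scanning candidate thresholds and recomputing the FLR among the assignments that pass. A minimal sketch, assuming each peptide-spectrum match is reduced to a (score, is_correct) pair, which is possible only when the true sites are known, as for synthetic peptides; the function names are hypothetical. Scanning downward from the highest score and stopping at the first violation is a conservative choice made here so that a local dip in the FLR curve cannot yield an overly permissive cutoff; the text does not state how such cases were handled.

```python
from typing import List, Optional, Tuple

# (localization score, True if the assigned site matches the known site)
PSM = Tuple[float, bool]


def flr_at(psms: List[PSM], threshold: float) -> Tuple[int, int, float]:
    """Correct count, incorrect count and FLR among assignments scoring >= threshold."""
    kept = [correct for score, correct in psms if score >= threshold]
    n_correct = sum(kept)
    n_incorrect = len(kept) - n_correct
    flr = n_incorrect / len(kept) if kept else 0.0
    return n_correct, n_incorrect, flr


def threshold_for_flr(psms: List[PSM], target: float = 0.01) -> Optional[float]:
    """Lowest observed score t such that the FLR is <= target at t
    and at every higher observed score; None if no such score exists."""
    best = None
    for t in sorted({score for score, _ in psms}, reverse=True):
        if flr_at(psms, t)[2] <= target:
            best = t
        else:
            break  # stop at the first score where the FLR exceeds the target
    return best
```

Assignments with an MD-score of zero fall below any positive cutoff, so they are excluded automatically, consistent with such spectra allowing no judgment.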

Because the CID fragmentation behavior of pS/pT- and pY-containing phosphopeptides can be quite distinct, we next investigated whether the Ascore and MD-score would be biased in their ability to localize phosphorylation to the different amino acids. To address this, the pS/T and pY data were analyzed separately. As evident from Fig. 2A (and Supplemental Fig. S3), the Ascore works particularly well for pS/pT peptides (883 correct assignments at the 1% FLR threshold of 20; total FLR 6%). At 1% FLR, the Ascore shows 40% higher sensitivity compared with the MD-score (615 correct assignments at the 1% FLR threshold of 10; total FLR 11%). At 3% FLR, both scores have about equal sensitivity. Fig. 2A also shows that the MD-score outperforms the Ascore for pY-peptides by a factor of eight (306 correct assignments at the 1% FLR threshold of 7; total FLR 10% versus 36 correct assignments at the 1% FLR threshold of 39; total FLR 20%). At 3% or 5% FLR, the MD-score still outperforms the Ascore by a factor of three. Taken together, the results show that the Ascore performs extremely well for pS/T peptides but is strongly biased against pY-peptides. Conversely, the MD-score does not show a strong bias between the different phospho-amino acids but is indeed less sensitive than the Ascore for pS/T peptides.

FIG. 1. Phosphorylation site determination by the Ascore, MD-score and normalized MD-score. A, Distribution of Ascores for correctly (blue) and incorrectly (red) assigned phosphorylation sites of synthetic phosphopeptides. B, respective distribution of MD-scores and C, normalized MD-scores. D, Comparison of phosphorylation site assignment performance by the Mascot ion score (orange), Ascore (violet), normalized MD-score (brown) and MD-score (green). FLR: false localization rate.

As expected, a plot of the observed MD-score and Ascore values for the ~2,000 peptide-to-spectrum matches against the determined FLR values shows that the FLR drops rapidly as the score rises (Supplemental Fig. S4). The distribution can be fitted to the sum of two exponentials, $\mathrm{FLR} = A\,e^{-C\cdot\mathrm{MD}} + B\,e^{-D\cdot\mathrm{MD}}$, where MD is the MD-score (see Supplemental Fig. S12 for the values of the constants A–D), which allows calculation of the probability of correct site localization for any given score threshold. This is often a useful alternative to filtering data at fixed FLR cutoffs. Table I shows that the MD-score thresholds derived from the fit are virtually identical to those obtained by counting correct/incorrect peptide-to-spectrum matches, so the numerical FLR values derived from the fit are also very close to those determined by counting.
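The fitted two-exponential model can be evaluated and inverted numerically. A minimal sketch: the constants A to D are reported in Supplemental Fig. S12 and are not reproduced here, so the values in the usage block are arbitrary placeholders, and the fitting step itself (e.g. nonlinear least squares) is not shown. When all four constants are positive the model decreases monotonically with the score, so simple bisection finds the score at which it reaches a target FLR, and 1 − FLR gives the implied probability that an assignment passing that score is correct. All function names are hypothetical.

```python
import math


def flr_model(score: float, a: float, b: float, c: float, d: float) -> float:
    """Two-exponential FLR model: A*exp(-C*score) + B*exp(-D*score)."""
    return a * math.exp(-c * score) + b * math.exp(-d * score)


def localization_probability(score: float, a: float, b: float, c: float, d: float) -> float:
    """1 - FLR: implied fraction of correct sites among assignments at or above `score`."""
    return 1.0 - flr_model(score, a, b, c, d)


def score_for_flr(target: float, a: float, b: float, c: float, d: float,
                  lo: float = 0.0, hi: float = 200.0, tol: float = 1e-6) -> float:
    """Score at which the model reaches `target` FLR, found by bisection.

    Assumes a, b, c, d > 0 so that the model decreases with the score.
    """
    if flr_model(lo, a, b, c, d) <= target:
        return lo
    if flr_model(hi, a, b, c, d) > target:
        raise ValueError("target FLR not reached within [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flr_model(mid, a, b, c, d) > target:
            lo = mid  # FLR still too high: threshold must be larger
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Placeholder constants only; see Supplemental Fig. S12 for fitted values
    A, B, C, D = 0.3, 0.1, 0.5, 0.1
    t = score_for_flr(0.01, A, B, C, D)
    print(round(t, 2), round(localization_probability(t, A, B, C, D), 4))
```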

As shown above, both the MD-score and the Ascore have individual strengths and weaknesses in using information from tandem MS spectra to assign the site of modification to the different amino acids. A prominent difference is that the Ascore calculation considers both the fragment ions derived from the phosphorylated amino acid and the corresponding neutral loss ions, provided both ion signals are of sufficient abundance. In contrast, Mascot mainly considers the best scoring ion series (either the one containing the phosphorylated amino acid or the one containing the neutral loss of phosphoric acid). Consequently, the two scores show a positive albeit not very strong correlation (R² = 0.33, Fig. 2B). This observation led us to try combining the two scores, and indeed, Fig. 2C shows that the combination of both scores leads to ~20% higher sensitivity at 1% FLR (25% higher at 3% FLR). There may be other factors influencing the correlation of the two scores, but there are insufficient data to investigate more subtle effects such as amino acid composition or sequence features. Next, we examined how the spacing of two alternative phosphorylation sites within a peptide sequence influenced the performance of both scores. The synthesized phosphopeptide collection contains many such examples, which enabled us to check for potential bias in the MD-score and Ascore for positional alternatives. Although both scores can generally discriminate alternative phosphorylation sites, Fig. 3 shows a significant bias toward more reliable localization when the two alternative sites are more than one amino acid residue apart. At 1% FLR, an MD-score of 14 is required for discriminating adjacent phosphorylation sites, whereas an MD-score of 7 suffices if the putative phosphorylation sites are further apart. The same effect is observed for the Ascore, and the respective thresholds are 40 for adjacent sites and 18 for sites with larger spacing. This observation highlights that global FLR values apply to the “average” peptide in any data set but should be used with caution when assessing site localizations of individual peptides, or of subsets of peptides, containing particular sequence features. Scarcity of suitable data usually impairs the development of feature-specific score thresholds, e.g. for the specific case of site spacing.
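One way to act on this observation is to apply a stricter cutoff when the two best-scoring candidate sites are adjacent. The sketch below hard-codes the 1% FLR MD-score values quoted above for the QTOF CID data (14 for adjacent sites, 7 otherwise). The function names, the use of 1-based residue positions, and treating the threshold as inclusive (score ≥ cutoff) are illustrative assumptions; because these cutoffs come from a limited peptide set, they would need re-calibration for other data.

```python
# 1% FLR MD-score thresholds quoted in the text for QTOF CID data.
ADJACENT_THRESHOLD = 14.0
SPACED_THRESHOLD = 7.0


def required_md_score(top_site: int, runner_up_site: int) -> float:
    """MD-score needed at 1% FLR given the two best candidate positions (1-based)."""
    if top_site == runner_up_site:
        raise ValueError("the two candidate sites must differ")
    adjacent = abs(top_site - runner_up_site) == 1
    return ADJACENT_THRESHOLD if adjacent else SPACED_THRESHOLD


def accept_site(md_score: float, top_site: int, runner_up_site: int) -> bool:
    """Accept the top-ranked site only if its MD-score clears the spacing-aware cutoff."""
    return md_score >= required_md_score(top_site, runner_up_site)


print(accept_site(10, 4, 5), accept_site(10, 2, 5))
```

An MD-score of 10 is then accepted when the runner-up site is three residues away but rejected when it is the neighboring residue.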

Different Fragmentation Techniques Require Different MD-Score Thresholds—Above, we described the characteristics and properties of the MD-score using CID spectra generated on a QTOF instrument. We next explored the utility of the MD-score for other fragmentation techniques commonly used in proteomics. In particular, we generated five phosphopeptide mixtures (Supplemental Table S2) and analyzed the MD-scores of these pools following LC-MS/MS on an LTQ (CID and MSA) and an LTQ-Orbitrap XL ETD mass spectrometer (CID, MSA, ETD, ETDSA, and HCD). The outcomes of these experiments are summarized in Fig. 4 and Table I. Interestingly, at 1% FLR (spectrum level), HCD (following de-isotoping and charge deconvolution) identified the highest number of unique phosphopeptides (n = 131) with correct phosphorylation site localization, closely followed by low resolution ETD spectra without

FIG. 2. Comparison of MD-score and Ascore. A, The MD-score (red) and Ascore (blue) perform similarly well for S/T-phosphorylated peptides (solid lines), but the MD-score outperforms the Ascore for Y-phosphorylated peptides (dotted lines). B, Site localization scores assigned by the MD-score and the Ascore vary significantly but show a positive correlation. C, Combining the two scores improves the overall performance of phosphorylation site localization at all FLR thresholds: MD-score (red), Ascore (blue) and combined (green).


(n = 127) or with (n = 116) supplemental activation (see also Supplemental Figs. S5 and S11). At 5% FLR, the difference between HCD and ETD diminishes (Supplemental Fig. S11). Spectra collected by multistage activation (MSA) were significantly more successful than resonance activation CID performed on both the LTQ instrument (Fig. 4A and Supplemental Fig. S6) and the Orbitrap instrument (Fig. 4B). However, both methods on both instruments performed significantly worse than HCD and ETD, implying that the high fragment ion mass accuracy afforded by HCD and the lack of neutral losses in ETD spectra provide more specific site localization information than low resolution CID and MSA spectra (see also below).

The overlap of all peptides detected by HCD (de-isotoped, charge deconvoluted) and ETD is very high (91%). However, the correlation of MD-scores between the two techniques was rather weak (R² = 0.33, Supplemental Fig. S7). Not only do the two techniques generate completely different fragment ions, they also have different cleavage preferences with respect to amino acid sequence context (25). Another reason for this behavior turned out to be the charge states of the fragmented precursors. Whereas HCD spectra (de-isotoped, charge deconvoluted) from 2+ and 3+ precursors had average MD-scores of 15.4 and 16.3, respectively, the corresponding ETD spectra showed average MD-scores of 14.0 and 19.8 for 2+ and 3+ precursors. This reflects the general observation that more highly charged precursors (3+ or higher) tend to yield better ETD spectra than doubly charged precursors. As a side note, the MD-scores for unprocessed HCD data are 13.5 for 2+ ions and 10.8 for 3+ ions, highlighting the benefit of de-isotoping and charge deconvolution for Mascot searching in general and p-site localization in particular. The overlap of all peptides detected by CID and MSA is also very high (89%, LTQ instrument, filtered data), and their MD-scores correlate much better than those for ETD and HCD (R² = 0.63, Supplemental Fig. S7), implying that the fragment ions used for successful site localization are not completely distinct.

TABLE I
Summary of MD-score performance data on different fragmentation techniques (LTQ and Orbitrap) for the localization of phosphorylation sites at 1% false localization rate (FLR)

| Fragmentation/search parameter | Precursor m/z measurements | Fragment m/z measurements | MD-score threshold | Spectra, # correct | Spectra, # incorrect | Peptides (clustered spectra^a), # correct | Peptides (clustered spectra^a), # incorrect | Peptides (top score spectra^b), # correct | Peptides (top score spectra^b), # incorrect | MD-score threshold (fit) | Numerical FLR (fit) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CID | Orbitrap | Ion trap | 11 | 408 | 2 | 53 | 2 | 53 | 0 | 11 | 0.005 |
| MSA | Orbitrap | Ion trap | 9 | 535 | 5 | 68 | 5 | 68 | 3 | 10 | 0.006 |
| CID_filtered | Orbitrap | Ion trap | 10 | 503 | 5 | 60 | 5 | 60 | 3 | 12 | 0.003 |
| MSA_filtered | Orbitrap | Ion trap | 7 | 657 | 5 | 77 | 5 | 77 | 3 | 7 | 0.008 |
| ETD | Orbitrap | Ion trap | 7 | 1006 | 10 | 127 | 7 | 124 | 3 | 7 | 0.010 |
| ETD_SA | Orbitrap | Ion trap | 4 | 888 | 9 | 116 | 8 | 115 | 3 | 4 | 0.010 |
| HCD_0.02_Da | Orbitrap | Orbitrap | 13 | 393 | 4 | 85 | 3 | 84 | 1 | 11 | 0.012 |
| HCD_0.02_Da_filtered | Orbitrap | Orbitrap | 11 | 521 | 5 | 101 | 4 | 101 | 2 | 14 | 0.012 |
| HCD_0.02_Da_deisdec | Orbitrap | Orbitrap | 7 | 789 | 7 | 131 | 6 | 131 | 2 | 8 | 0.009 |
| HCD_0.5_Da | Orbitrap | Orbitrap | 22 | 156 | 1 | 44 | 1 | 44 | 1 | 15 | 0.015 |
| CID | Ion trap | Ion trap | 19 | 324 | 3 | 44 | 3 | 44 | 3 | 18 | 0.014 |
| MSA | Ion trap | Ion trap | 14 | 756 | 7 | 68 | 6 | 68 | 1 | 14 | 0.009 |
| CID_filtered | Ion trap | Ion trap | 18 | 437 | 4 | 49 | 3 | 49 | 2 | 18 | 0.009 |
| MSA_filtered | Ion trap | Ion trap | 11 | 975 | 7 | 78 | 6 | 78 | 0 | 11 | 0.007 |

^a All spectra identifying a peptide are used for counting correct and incorrect assignments.
^b Only the best scoring spectrum is used for counting correct and incorrect assignments; numerical FLR (fit) is calculated from the fitted MD-score distribution.

FIG. 3. Influence of phosphorylation site spacing on localization accuracy. MD-score (A) and Ascore (B) phosphorylation site assignments are more reliable if two putative phosphorylation sites are more than one amino acid apart (red lines) compared with sites that are adjacent (blue lines).

Because our phosphopeptide mixtures are not overly complex, multiple tandem MS spectra were generated for each peptide (on average 4–12, depending on the fragmentation technique). Although this redundancy allowed us to generate robust FLR values, typical phosphoproteomics studies probably contain fewer spectra per peptide. To examine whether this would influence the results, we repeated the data analysis using only the best scoring spectrum per peptide. The respective column in Table I shows that this mainly leads to a decrease in incorrect localizations, indicating that the score thresholds determined here would also be useful for data sets with fewer spectra per peptide. An alternative way to test the predictive value of the MD-score thresholds would be to divide the data into two sets, determine the score thresholds for one set, and test whether the same FLR values are found for the other set. Because of the limited number of unique phosphopeptides available to us, we instead chose to address this point by replicate analysis. Fig. 4C shows the FLR versus MD-score distributions of two independently acquired and analyzed CID and MSA experiments using a 2-fold difference in the amount of material injected for analysis (LTQ instrument). Despite the difference in analyte quantity, the two replicates almost perfectly superimpose, showing that FLR thresholds determined for one of the data sets can be transferred to the other.
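Restricting the analysis to the best-scoring spectrum per peptide, as in the top-score columns of Table I, amounts to grouping peptide-to-spectrum matches by peptide and keeping the spectrum with the highest Mascot ion score. A minimal sketch, assuming a hypothetical record holding a peptide identifier, the Mascot ion score, the MD-score and a correctness flag derived from the known synthetic site; the toy records are invented.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Psm:
    peptide_id: str     # identifier of the synthetic phosphopeptide (hypothetical field)
    ion_score: float    # Mascot ion score of the top-ranked assignment
    md_score: float     # difference to the second-best site assignment
    correct: bool       # top-ranked site equals the synthesized site


def best_per_peptide(psms: Iterable[Psm]) -> List[Psm]:
    """Keep only the spectrum with the highest Mascot ion score for each peptide."""
    best: Dict[str, Psm] = {}
    for psm in psms:
        current = best.get(psm.peptide_id)
        if current is None or psm.ion_score > current.ion_score:
            best[psm.peptide_id] = psm
    return list(best.values())


def count_localizations(psms: Iterable[Psm], threshold: float) -> Tuple[int, int]:
    """Return (#correct, #incorrect) among PSMs whose MD-score passes the threshold."""
    passing = [p for p in psms if p.md_score >= threshold]
    correct = sum(1 for p in passing if p.correct)
    return correct, len(passing) - correct


records = [
    Psm("A", 50, 12, True), Psm("A", 40, 3, False),
    Psm("B", 30, 9, True), Psm("B", 35, 8, False),
]
print(count_localizations(records, 7), count_localizations(best_per_peptide(records), 7))
```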

We suspected that the success of the MD-score with HCD data is in part driven by fragment ion mass accuracy. To test this hypothesis, we searched the very same HCD data with either a low (0.5 Da) or a high (0.02 Da) fragment ion mass accuracy setting. At 1% FLR, the number of correct localizations increased from 44 to 85 unique peptides with the tighter setting (Table I, Supplemental Fig. S8). Comparing site assignments for resonance CID data collected on an ion trap with those obtained from CID on a QTOF instrument (Supplemental Fig. S9) reveals that there are significantly fewer mistakes in the QTOF data. Because we did not observe an obvious difference in the average MD-scores of pS/pT/pY peptides on the two instruments, the differences in localization performance are presumably owing to several combined effects. The QTOF offers better fragment ion mass accuracy than an ion trap and records sequence ions that are frequently lost in ion traps owing to their inherent inability to stabilize low m/z fragment ions (low mass cutoff). In addition, the neutral losses typically observed for pS/pT peptides are less pronounced on QTOF-type instruments than on ion traps. On the other hand, ion trap spectra usually contain more abundant b-ions than QTOF spectra, but the net effect of the above factors is that QTOF CID data lead to the matching of more fragment ions relevant for site localization and hence to better localization performance.
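A back-of-the-envelope calculation illustrates why the matching tolerance matters. Under a simplifying assumption (ours, not the paper's) that N observed peaks are scattered independently and uniformly over the m/z range, the chance that a theoretical site-determining fragment falls within ±tol of at least one peak is roughly 1 − (1 − 2·tol/range)^N. The spectrum size and m/z range below are hypothetical.

```python
def random_match_probability(n_peaks: int, tol_da: float, mz_range_da: float) -> float:
    """Approximate chance that one theoretical fragment m/z matches a random peak.

    Assumes n_peaks independent peaks spread uniformly over mz_range_da and a
    symmetric +/- tol_da matching window (edge effects ignored).
    """
    if not 0 < 2 * tol_da <= mz_range_da:
        raise ValueError("tolerance window must be positive and fit in the m/z range")
    p_single = 2 * tol_da / mz_range_da  # chance that a single peak falls in the window
    return 1.0 - (1.0 - p_single) ** n_peaks


# Hypothetical spectrum: 200 peaks spread over m/z 100-2000.
for tol in (0.5, 0.02):
    print(f"+/-{tol} Da: {random_match_probability(200, tol, 1900.0):.3f}")
```

With these made-up numbers, the random-match chance drops from about 0.10 at ±0.5 Da to about 0.004 at ±0.02 Da, which is the direction of the effect reported for the HCD searches.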

Our results also highlight that processing of tandem mass spectra can affect the success of p-site localization (Table I). It has previously been shown that filtering tandem MS spectra to remove low signal-to-noise fragment ions improves the peptide identification rate from low resolution ion trap spectra (26, 27). Such filtering is also used in the Ascore and PTM score algorithms, and our filtered CID, MSA, and HCD data show that it also improves the success of phosphorylation site localization by the MD-score by ~15% (Table I, Supplemental Figs. S10 and S11). An alternative data processing step leading to much improved site localization is to de-isotope and charge deconvolute HCD spectra. Both steps improve the Mascot ion score: de-isotoping reduces the number of signals the search algorithm has to consider, and charge deconvolution reduces the number of random matches caused by splitting the sequence information over two (or more) ion series. Both effects likely drive the improvement not only of the Mascot ion score but also of the MD-score.
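The two processing steps can be illustrated with simple stand-ins. The filter settings used in the study are not given here, so the windowed top-N filter below (keep the N most intense peaks in each 100 m/z window, similar in spirit to the per-window peak filtering of the Ascore) uses assumed parameters. The charge conversion uses the standard relation between an ion's m/z at charge z and its singly protonated m/z; a real deconvolution step would first have to determine each fragment's charge, for example from its isotope spacing, which is not shown. Neither function is the processing software used in the study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PROTON_MASS = 1.007276  # Da

Peak = Tuple[float, float]  # (m/z, intensity)


def top_n_per_window(peaks: List[Peak], n: int = 6, window: float = 100.0) -> List[Peak]:
    """Keep the n most intense peaks in each consecutive m/z window (assumed parameters)."""
    bins: Dict[int, List[Peak]] = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window)].append((mz, intensity))
    kept: List[Peak] = []
    for members in bins.values():
        kept.extend(sorted(members, key=lambda p: p[1], reverse=True)[:n])
    return sorted(kept)  # back in m/z order


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert an [M+zH]z+ fragment m/z to the m/z of the corresponding [M+H]+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - (charge - 1) * PROTON_MASS


print(top_n_per_window([(101.0, 5), (150.0, 50), (180.0, 1), (250.0, 7)], n=2))
print(to_singly_charged(500.0, 2))
```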

Scoring Positional Phosphopeptide Isomers Using the MD-Score—About 50% of the set of 180 synthesized phosphopeptides represent positional isomers. The data presented above and in Fig. 3 illustrate that the MD-score can distinguish the majority of these cases. To illustrate this utility, Fig. 5 shows ETD spectra of the peptide ETTTSPKKYYLAEK (derived from the tyrosine-protein kinase Tec) in which any one of the four adjacent Thr or Ser residues was synthesized to carry the single phosphate group. Evidently, all four spectra are

FIG. 4. False localization rates and reproducibility of MD-score thresholds for different types of tandem mass spectra. A, Phosphorylation site assignments from spectra collected on an LTQ linear ion trap mass spectrometer. B, Site assignments from spectra collected on a hybrid LTQ-Orbitrap mass spectrometer. Fitting a sum of two exponentials of the type $\mathrm{FLR} = A\,e^{-C\cdot\mathrm{MD}} + B\,e^{-D\cdot\mathrm{MD}}$ (MD, MD-score) to these curves allows calculation of FLR values for any phosphopeptide assigned by Mascot, specific to each type of tandem mass spectrum (for values of the constants see Supplemental Fig. S12). C, Fitted FLR versus MD-score curves computed from two independent MSA and CID experiments show that the curves and score thresholds are highly reproducible.


highly similar. Most fragment ions are identical across the four isomers and therefore rarely permit site localization, because they either cover only the C-terminal (i.e. unmodified) part of the peptide or already span all four candidate residues. Instead, correct localization relies primarily on the few c- and z-ions whose cleavage sites fall between the candidate residues in the N-terminal part of the peptide. Still, the MD-score is at least 9 in all cases, which assigns the correct phosphorylation site in each case with >99% confidence (the ETD score threshold at 1% FLR is 7, see Table I). Thus, the MD-score greatly helps to arrive at an objective assessment of the most likely phosphorylation site, either by itself or in conjunction with manual spectrum interpretation. Because phosphopeptide isomers can very often be separated by reversed phase liquid chromatography using shallow gradients, site assignment by the MD-score will generally be possible in an LC-MS/MS experiment. However, if isomeric peptides do happen to co-elute under the chromatographic conditions employed, conclusive site identification may only be possible for the most abundant isomer.
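Which c- and z-ions can separate two candidate sites follows directly from the sequence. The sketch below is an illustration based on standard fragment nomenclature, not something Mascot reports: c_i covers residues 1..i and z_j covers the last j residues, and an ion distinguishes two isomers when it contains exactly one of the two candidate positions. It deliberately ignores ion-type rules such as the absence of ETD cleavage N-terminal to proline.

```python
from typing import List, Tuple


def site_determining_ions(sequence: str, site_a: int, site_b: int) -> Tuple[List[str], List[str]]:
    """Return c- and z-ion labels whose mass differs between phosphosites site_a and site_b.

    Positions are 1-based; an ion is diagnostic if it contains exactly one candidate residue.
    """
    n = len(sequence)
    if not (1 <= site_a <= n and 1 <= site_b <= n) or site_a == site_b:
        raise ValueError("need two different positions inside the sequence")
    # c_i spans residues 1..i
    c_ions = [f"c{i}" for i in range(1, n) if (site_a <= i) != (site_b <= i)]
    # z_j spans residues n-j+1..n
    z_ions = [f"z{j}" for j in range(1, n) if (site_a > n - j) != (site_b > n - j)]
    return c_ions, z_ions


peptide = "ETTTSPKKYYLAEK"
print(site_determining_ions(peptide, 4, 5))  # pT4 vs pS5 (adjacent): (['c4'], ['z10'])
print(site_determining_ions(peptide, 2, 5))  # pT2 vs pS5: (['c2', 'c3', 'c4'], ['z10', 'z11', 'z12'])
```

Adjacent candidates leave a single diagnostic ion per series, which is consistent with the higher MD-score needed for adjacent sites in Fig. 3.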

DISCUSSION

In this study, we have re-evaluated the performance of a Mascot delta score (MD-score) metric for its ability to localize phosphorylation sites in peptides. Instead of using the Mascot ion score itself, the MD-score measures the difference in Mascot ion scores between the two best alternative phosphorylation site assignments suggested by the database search. We generated a substantial number of diverse and individually synthesized phosphopeptides with precisely defined phosphorylation sites and properties similar to those found in typical phosphoproteomics studies. This set of reagents allowed us to explore the merits of the MD-score in detail and to calibrate the score for different use cases. As a result, false localization rates for phosphorylation site assignments made by the Mascot search engine can be computed for phosphopeptide spectra generated by many commonly used tandem mass spectrometry techniques, which we think is a useful extension to the available set of tools for phosphorylation site localization.
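Computing the MD-score requires only the Mascot ion scores of the alternative site assignments for the same spectrum. The sketch below takes the per-site ion scores annotated in Fig. 5, ranks them, and subtracts the runner-up from the best; the dictionary layout is an assumption about how such scores might be exported from a search result, and the ETD cutoff of 7 is the 1% FLR value from Table I.

```python
from typing import Dict, Tuple


def md_score(site_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return (best site, Mascot delta score) from ion scores of alternative site assignments."""
    if len(site_scores) < 2:
        raise ValueError("need at least two alternative site assignments")
    ranked = sorted(site_scores.items(), key=lambda item: item[1], reverse=True)
    (best_site, best), (_, runner_up) = ranked[0], ranked[1]
    return best_site, best - runner_up


# Mascot ion scores of the four alternative sites, one dict per ETD spectrum in Fig. 5.
fig5 = {
    "pT2": {"T2": 57, "T3": 47, "T4": 47, "S5": 37},
    "pT3": {"T3": 82, "T4": 72, "T2": 67, "S5": 60},
    "pT4": {"T4": 84, "S5": 75, "T3": 62, "T2": 51},
    "pS5": {"S5": 66, "T4": 54, "T3": 51, "T2": 48},
}
ETD_THRESHOLD = 7  # 1% FLR, low resolution ETD (Table I)

for isomer, scores in fig5.items():
    site, delta = md_score(scores)
    print(isomer, site, delta, delta >= ETD_THRESHOLD)
```

For all four isomers the top-ranked site is the synthesized one, and the smallest delta is 9 (pT4), above the ETD cutoff.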

[Fig. 5 spectra: ETD spectra (relative abundance versus m/z, 100–1700) of ETTTSPKKYYLAEK phosphorylated at T2, T3, T4 or S5, with annotated c-, z- and y-ions and precursor signals. Mascot ion scores of the alternative site assignments: pT2 spectrum, T2 57, T3 47, T4 47, S5 37; pT3 spectrum, T3 82, T4 72, T2 67, S5 60; pT4 spectrum, T4 84, S5 75, T3 62, T2 51; pS5 spectrum, S5 66, T4 54, T3 51, T2 48.]

FIG. 5. Example ETD spectra of the peptide ETTTSPKKYYLAEK with a single phosphate group on four alternative adjacent S/T sites. Mascot ion scores are all above the identity threshold, confirming the peptide sequence, but only the MD-score allows confident assignment of the correct phosphorylation site in these isomeric phosphopeptides.

We note that the MD-score is not a new idea, but our work suggests that it has more merit than previously appreciated. In the original Ascore publication (9), Beausoleil et al. already evaluated a normalized Mascot delta ion score metric (that is, the difference in the ion scores of the top two ranking peptides divided by the ion score of the first ranking peptide) for low resolution ion trap CID spectra but found it to be inferior to the Ascore. We applied the same method to the analysis of our phosphopeptide LC-MS/MS data and, in addition, evaluated the performance of a straight score difference (that is, the difference in the ion scores of the top two ranking peptides). The results of a direct comparison between the methods are shown in Fig. 1. Our data confirm the previous results obtained by Beausoleil et al. that the normalized MD-score performs significantly worse than the Ascore. We also find a very similar cutoff to reach 99% localization confidence (0.36 in our study versus 0.4 in the Beausoleil study). However, Fig. 1 also clearly shows that the straight MD-score significantly outperforms the normalized MD-score and is very similar in overall performance to the Ascore. The reason for the poor performance of the normalized MD-score is that it does not distinguish between high and low quality database search results. For example, two alternative sites with Mascot scores of 60 and 40, respectively, generate the same normalized MD-score (0.33) as two alternative sites with Mascot scores of 6 and 4. Clearly, such score normalization will negatively impact the ability to call a p-site correctly by allowing too many obviously poor assignments.

The Heck laboratory recently also used the delta ion score of Mascot database search results for assessing alternative phosphorylation sites from CID and ETD data (18, 19) but did not establish whether the statistical assumptions made by the Mascot ion score are equally applicable for scoring p-site localization for these two fragmentation techniques. Even though the MD-score and Ascore show similar overall performance, there are significant differences in detail. The MD-score outperforms the Ascore for tyrosine-phosphorylated peptides, whereas the Ascore does so for S/T phosphorylation (Fig. 2). This observation may not be surprising given that the Ascore was developed on a data set dominated by S/T phosphorylation. For the same reason, phosphorylation sites with high MD-scores may not necessarily also have high Ascores and vice versa. Consequently, combining the MD-score and Ascore leads to a moderate improvement in sensitivity and specificity over using one score alone.

There are other published studies addressing the issue of phosphorylation site identification, either by database searching using different search engines (9, 14) or by other localization scores (2, 8, 9, 14–17). It was beyond the scope of this work to compare these methods systematically to the MD-score, but it can be anticipated that differences between search engines and site localization scores will exist depending on which criteria (and with which weighting) are used for site identification. Most approaches use empirical information about how phosphopeptides fragment in the gas phase, but in X! Tandem (28), for example, phosphorylation motif information can also be used to bias phosphorylation site localization results. A noteworthy feature of the MD-score is that its numerical value and statistical significance are independent of the size of the database searched. The same MD-scores are in fact obtained when searching the human subset of Swiss-Prot (16,000 entries), the complete Swiss-Prot (258,000 entries) or the full NCBInr database (4,627,000 entries), with or without modifications in addition to variable phosphorylation on Ser, Thr, and Tyr residues. Therefore, MD-score values determined for a particular tandem MS method may be used for scoring large or small data sets alike.
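The 60/40 versus 6/4 example given earlier in the Discussion makes the difference between the two delta scores concrete. A minimal sketch; the normalized form follows the description of the Beausoleil et al. metric (difference divided by the top-ranking ion score), and the function names are illustrative.

```python
def straight_delta(top: float, second: float) -> float:
    """MD-score: plain difference of the two best Mascot ion scores."""
    return top - second


def normalized_delta(top: float, second: float) -> float:
    """Normalized delta: the difference divided by the top-ranking ion score."""
    if top <= 0:
        raise ValueError("top-ranking ion score must be positive")
    return (top - second) / top


for top, second in ((60.0, 40.0), (6.0, 4.0)):
    print(top, second, straight_delta(top, second), round(normalized_delta(top, second), 3))
```

Both pairs give a normalized value of 0.333, whereas the straight difference (20 versus 2) keeps the information that the second pair comes from a much weaker search result.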

Using score differences from database search engines is generally attractive because it does not require specialized informatics tools, and our results show that the MD-score can be used for many fragmentation techniques. However, it should be noted that the MD-score is not an absolute or universal measure of phosphorylation site localization probability, because the MD-score distributions and significance thresholds are different for every fragmentation technique, an important point not discovered or addressed by previous studies. Using our set of synthetic phosphopeptides with precisely known sites allowed us to calibrate the MD-score so that false localization rates can be derived for any of the fragmentation techniques investigated (Fig. 4, Table I, Supplemental Fig. S12). To illustrate the point, an MD-score of 11 is required for correct site localization in low resolution resonance CID spectra (1% FLR), whereas an MD-score of 4 suffices for low resolution ETDSA spectra to reach the same level of confidence. We suspect that any site localization score rooted in database searching will be prone to differences depending on the fragmentation technique used, again stressing the importance of deriving FLR values for each technique.
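Because the cutoffs differ by technique, a processing pipeline has to look up the threshold that matches how the spectra were acquired. The sketch below copies the counted 1% FLR MD-score thresholds from Table I for a subset of conditions; the keys use the Table I row labels together with the precursor m/z measurement column (standing in for the instrument), and the values only apply to comparable instruments and settings, so re-calibration with the peptide collection is advisable.

```python
# Counted 1% FLR MD-score thresholds from Table I: (row label, precursor m/z measurement).
MD_THRESHOLDS_1PCT = {
    ("CID", "Orbitrap"): 11,      # fragments measured in the ion trap
    ("MSA", "Orbitrap"): 9,
    ("ETD", "Orbitrap"): 7,
    ("ETD_SA", "Orbitrap"): 4,
    ("HCD_0.02_Da_deisdec", "Orbitrap"): 7,
    ("CID", "Ion trap"): 19,
    ("MSA", "Ion trap"): 14,
}


def passes_1pct_flr(md_score: float, method: str, instrument: str) -> bool:
    """True if the MD-score meets the Table I threshold for this method and instrument."""
    try:
        threshold = MD_THRESHOLDS_1PCT[(method, instrument)]
    except KeyError:
        raise KeyError(f"no calibrated threshold for {method} on {instrument}") from None
    return md_score >= threshold


print(passes_1pct_flr(5, "ETD_SA", "Orbitrap"), passes_1pct_flr(5, "ETD", "Orbitrap"))
```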

A prerequisite for successful phosphorylation site determination is, of course, the ability to identify the underlying peptide in the first place. Because of the gas phase fragmentation behavior of phosphopeptides, not all fragmentation techniques and mass analyzers are equally suitable. We therefore used the phosphopeptide collection to investigate the phosphorylation site localization accuracy of the MD-score for all commonly used fragmentation techniques. The results confirmed earlier observations that low resolution resonance activation CID spectra are neither particularly sensitive nor very accurate in assigning the site of phosphorylation (4, 29). Multistage activation, MS3 or data filtering routines have been shown to moderately improve the number of phosphopeptide identifications and site localizations (30–32), and again, our data agree with these studies. Villén et al. have argued that MSA performs less well than CID for the large-scale identification of phosphopeptides because of the extra time required to record MSA spectra compared with resonance CID spectra (32), so that CID would simply outnumber MSA and thus be more productive overall. However, other reports conclude the opposite (31). Our study focused on the quality of site localization rather than p-peptide identification, and the data clearly suggest that MSA spectra offer benefits. At the same time, we find that this benefit is more pronounced


for an LTQ instrument than for an Orbitrap, which is in line with the Villén study. A somewhat unexpected observation was that HCD fragmentation can be just as successful as ETD fragmentation (Table I), which is commonly regarded as the method of choice for phosphorylation site determination (33). This observation can be attributed to several factors. The HCD spectrum does not suffer from the low mass cutoff inherent to ion trap spectra and thus contains information on parts of the peptide sequence that is not available from ion trap mass analyzers. In addition, the increased fragment ion mass accuracy offered by the Orbitrap analyzer reduces the probability of randomly assigning fragment ions. Further, the low spectral noise in HCD spectra allows Mascot to score low abundance ions important for localization more often than is possible in ion trap spectra. Last, the high resolution of HCD spectra allows de-isotoping and charge deconvolution of fragment ions, both of which lead to higher Mascot scores. The lowest absolute significance threshold was obtained for ETD with supplemental activation, which can be rationalized by the more efficient fragmentation of charge-reduced precursor ions as well as the presence of both ETD- and CID-type fragment ions, which together carry more information than ETD fragment ions alone. At the same time, ETDSA identified fewer peptides with correct phosphorylation site localization than HCD. This is because the majority of the synthetic tryptic peptides were observed as doubly charged ions, which favors their identification by HCD over ETD (see Results section for details). We chose to base the presentation and interpretation of our data on 1% FLR, which is very stringent for the purpose of site localization. Although the same trends are also observed at a less conservative but acceptable level of 5% FLR (Supplemental Fig. S11), the subtle differences among ETD, ETDSA, and the HCD data processing variants diminish.
Given the good performance of ETD and HCD, it would have been interesting to combine ETD with fragment ion recording in the Orbitrap. Unfortunately, the sensitivity of this experiment on our instrument was very poor, so we were unable to generate meaningful data. Future work in this area may include ETD on the latest generation of QTOF and Orbitrap instruments. It should be noted again that the above conclusions are drawn for successful site localization and therefore do not necessarily imply more productive p-peptide identification in, e.g., large-scale studies, because acquiring ETD and HCD spectra takes more time than resonance CID. However, a very recent report suggests that HCD is in fact a very competitive method for large-scale phosphopeptide identification (34). Although we see the same trends for singly and doubly phosphorylated peptides, the rather low number of multiply phosphorylated peptides in our data set currently makes it difficult to anticipate whether HCD or some form of ETD will be more successful for correct site localization of multiply phosphorylated peptides. As a side note, our data and other recent studies (19, 35) show that the reported gas phase phosphorylation site rearrangement in peptides (36) is not a major concern for phosphoproteomic studies.

Despite the success of MD-score-based phosphorylation site localization, several aspects deserve consideration. First, it should be noted that the FLRs we present in this study for both the MD-score and the Ascore are “global” in the sense that they are unaware of parameters such as the spacing between possible phosphorylation sites or the composition or secondary structure of flanking amino acid sequences. Our observation that p-sites that are further apart are more likely to be called correctly indicates that a dependence on spacing indeed exists. In our simple evaluation (adjacent sites versus spaced sites), the effect is actually quite large. Consequently, caution should be applied when assessing the localization information reported for individual peptides containing many potential sites. It would indeed be very interesting to evaluate spacing (and other parameters) more systematically in order to maintain a fixed FLR for all peptides. However, a very large number of peptides (estimated to be in the thousands) with precisely known sites would be required to reach statistically sound conclusions, which was beyond the scope of this study. Another point for consideration relates to the localization of multiple phosphorylations to peptides containing many potential acceptor sites. Mascot tests a maximum of 256 permutations of a modification on a peptide sequence in order to keep the required computation time at a reasonable level. For singly phosphorylated peptides, the 256-permutation cap limits the number of potential sites to 256. For doubly phosphorylated peptides, the limit is 23 possible sites; for triply phosphorylated peptides, 12 sites; and for four, five, and six phosphates on a single peptide, 10 sites. We analyzed our data as well as the data presented in the Ascore and PTM score manuscripts (2, 9) in this regard and found that not a single sequence in our phosphopeptide collection would exceed these limits.
The same is true for the 2872 phosphopeptide sequences listed in the Ascore paper. Of the 18,958 phosphopeptide sequences listed in the PTM score paper, 70 (0.4%) are above the limit. We therefore conclude that one should be aware of the 256 permutation limit imposed by Mascot (other search engines are likely to have similar caps), but that it does not constitute a severe issue for large-scale phosphoproteomics in general or for the MD-score in particular. The limit does, however, become more relevant for "middle-down" proteomic approaches, particularly if multiple modifications are present on a reasonably large peptide.

A third aspect relates to how easily the MD-score thresholds determined here can be transferred to other data sets. We showed that the score thresholds are very reproducible among replicate experiments. However, as for any other localization approach, one cannot rule out that changes to experimental parameters, such as data acquisition settings, alter the content of tandem mass spectra such that score thresholds shift to slightly different values. This is why we make available to the community not only our data but also the peptide collection, so that individual laboratories can determine MD-score thresholds for their individual analytical setup.
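The site limits quoted for Mascot's 256-permutation cap can be reproduced by counting the ways k identical phosphate groups can be placed on n candidate residues, C(n, k), and finding the largest n that stays within the cap. The Python sketch below assumes that Mascot's permutation count corresponds to these unordered site combinations, an assumption that is consistent with the numbers given in the text; the function name `max_candidate_sites` is hypothetical and not part of any Mascot interface.

```python
from math import comb


def max_candidate_sites(n_phospho: int, cap: int = 256) -> int:
    """Largest number n of candidate residues for which placing n_phospho
    identical phosphate groups yields at most `cap` site assignments.

    Assumption (not taken from Mascot documentation): each assignment is one
    unordered combination, so the count is the binomial coefficient
    C(n, n_phospho).
    """
    if n_phospho < 1:
        raise ValueError("n_phospho must be at least 1")
    n = n_phospho  # C(k, k) == 1, which is always within a cap >= 1
    # Grow n while one more candidate residue would still fit under the cap.
    while comb(n + 1, n_phospho) <= cap:
        n += 1
    return n


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, max_candidate_sites(k))
```

Run as a script, this prints 256, 23, 12, 10, 10 and 10 for one to six phosphates, matching the limits stated in the text; changing `cap` shows how quickly the permissible number of candidate sites shrinks as more phosphates are placed.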

CONCLUSIONS

Our data show that the MD-score is a valuable tool for the objective assessment of phosphorylation site assignments made by Mascot, which should further improve the reliability of small- and large-scale phosphoproteomics studies. The use of individual synthetic phosphopeptides with precisely known phosphorylation sites independently validates approaches such as the Ascore, the PTM score and similar scoring schemes that were developed on large phosphopeptide data sets in which the exact sites were not always known a priori. This is particularly important for large-scale studies, in which it is no longer practical to validate each phosphorylation site assignment by manual inspection of the tandem mass spectra. It might in fact be argued that manual inspection of tandem mass spectra is more error prone than an automated, objective scoring scheme such as the MD-score. The MD-score concept is applicable to many fragmentation techniques and can be obtained easily from Mascot database search results. Given that Mascot is one of the most widely used protein identification software tools in proteomics, the MD-score will enable many laboratories to assess their phosphorylation data objectively without relying on somewhat arbitrary identification score thresholds. We are making all LC-MS/MS data as well as the phosphopeptide collection available to the community so that any laboratory can perform similar analyses and adapt the reported scores to its analytical environment.
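Adapting the scores to a local setup with the synthetic peptide collection involves two steps: computing an MD-score for each spectrum, then choosing the lowest score at which the false localization rate (FLR) stays at or below a target such as 1%. The Python sketch below illustrates both under stated assumptions. It takes the MD-score to be the difference between the Mascot ion score of the top-ranked hit and that of the best-ranked hit with the same sequence and number of phosphates but a different site assignment. The `PeptideHit` structure and all function names are hypothetical, parsing of Mascot result files is not shown, and the threshold search is a simple empirical scan rather than the calibration functions used in this study.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PeptideHit:
    """One Mascot peptide match for a spectrum (hypothetical structure)."""
    sequence: str           # unmodified amino acid sequence
    sites: Tuple[int, ...]  # sorted 1-based positions of phosphorylated residues
    ion_score: float        # Mascot ion score


def md_score(hits: Sequence[PeptideHit]) -> Optional[float]:
    """Top ion score minus the best score of a competing site assignment.

    A competitor has the same sequence and number of phosphates as the top
    hit but places them on different residues (assumed MD-score definition).
    """
    if not hits:
        return None
    ranked = sorted(hits, key=lambda h: h.ion_score, reverse=True)
    top = ranked[0]
    for hit in ranked[1:]:
        if (hit.sequence == top.sequence
                and len(hit.sites) == len(top.sites)
                and hit.sites != top.sites):
            return top.ion_score - hit.ion_score
    # No competing site assignment among the reported hits; how to treat
    # such spectra is left to the user.
    return None


def flr(scored: Sequence[Tuple[float, bool]], threshold: float) -> Optional[float]:
    """False localization rate among spectra with MD-score >= threshold.

    Each item is (md_score, top_ranked_site_is_correct); correctness is known
    because the peptides were synthesized with defined sites.
    """
    accepted: List[bool] = [ok for score, ok in scored if score >= threshold]
    if not accepted:
        return None
    return accepted.count(False) / len(accepted)


def threshold_at_flr(scored: Sequence[Tuple[float, bool]],
                     target: float = 0.01) -> Optional[float]:
    """Lowest observed MD-score whose accepted set has FLR <= target."""
    for t in sorted({score for score, _ in scored}):
        rate = flr(scored, t)
        if rate is not None and rate <= target:
            return t
    return None


if __name__ == "__main__":
    hits = [
        PeptideHit("SAPSTPK", (1,), 45.2),
        PeptideHit("TPEGSK", (1,), 40.0),
        PeptideHit("SAPSTPK", (4,), 32.0),
    ]
    print(round(md_score(hits), 1))  # 13.2
    scored = [(2.0, False), (3.0, False), (5.0, True), (8.0, True), (10.0, True)]
    print(threshold_at_flr(scored))  # 5.0
```

In this toy example the unrelated `TPEGSK` hit is skipped, so the MD-score compares the two `SAPSTPK` isoforms, and the 1% FLR threshold lands at 5.0 because both incorrectly localized spectra score below it. Counting how many correct localizations pass the chosen threshold would give the sensitivity side of such a comparison.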

Acknowledgments—We thank Andrea Hubauer for technical assistance and John Cottrell for valuable discussions about the details of Mascot scoring. We further thank the ABRF Proteome Informatics Research Group (iPRG) for coming up with the term "false localization rate."

This article contains Supplementary Figs. S1–S13 and Tables S1–S2.

¶ To whom correspondence should be addressed: Chair of Proteomics and Bioanalytics, Technische Universität München, Emil Erlenmeyer Forum 5, 85354 Freising, Germany. E-mail: [email protected]; Cellzome AG, Meyerhofstrasse 1, 69117 Heidelberg, Germany. E-mail: [email protected].

REFERENCES

1. Beausoleil, S. A., Jedrychowski, M., Schwartz, D., Elias, J. E., Villen, J., Li, J., Cohn, M. A., Cantley, L. C., and Gygi, S. P. (2004) Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc. Natl. Acad. Sci. U.S.A. 101, 12130–12135

2. Olsen, J. V., Blagoev, B., Gnad, F., Macek, B., Kumar, C., Mortensen, P., and Mann, M. (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127, 635–648

3. Pinkse, M. W., Uitto, P. M., Hilhorst, M. J., Ooms, B., and Heck, A. J. (2004) Selective isolation at the femtomole level of phosphopeptides from proteolytic digests using 2D-NanoLC-ESI-MS/MS and titanium oxide precolumns. Anal. Chem. 76, 3935–3943

4. Swaney, D. L., Wenger, C. D., Thomson, J. A., and Coon, J. J. (2009) Human embryonic stem cell phosphoproteome revealed by electron transfer dissociation tandem mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 106, 995–1000

5. Thingholm, T. E., Jensen, O. N., Robinson, P. J., and Larsen, M. R. (2008) SIMAC (sequential elution from IMAC), a phosphoproteomics strategy for the rapid separation of monophosphorylated from multiply phosphorylated peptides. Mol. Cell. Proteomics 7, 661–671

6. Zhang, Y., Wolf-Yadlin, A., Ross, P. L., Pappin, D. J., Rush, J., Lauffenburger, D. A., and White, F. M. (2005) Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Mol. Cell. Proteomics 4, 1240–1250

7. Nichols, A. M., and White, F. M. (2009) Manual validation of peptide sequence and sites of tyrosine phosphorylation from MS/MS spectra. Methods Mol. Biol. 492, 143–160

8. Bailey, C. M., Sweet, S. M., Cunningham, D. L., Zeller, M., Heath, J. K., and Cooper, H. J. (2009) SLoMo: automated site localization of modifications from ETD/ECD mass spectra. J. Proteome Res. 8, 1965–1971

9. Beausoleil, S. A., Villen, J., Gerber, S. A., Rush, J., and Gygi, S. P. (2006) A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292

10. Lehmann, W. D., Kruger, R., Salek, M., Hung, C. W., Wolschin, F., and Weckwerth, W. (2007) Neutral loss-based phosphopeptide recognition: a collection of caveats. J. Proteome Res. 6, 2866–2873

11. Perkins, D. N., Pappin, D. J., Creasy, D. M., and Cottrell, J. S. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567

12. Yates, J. R., 3rd, Eng, J. K., McCormack, A. L., and Schieltz, D. (1995) Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal. Chem. 67, 1426–1436

13. Lu, B., Ruse, C., Xu, T., Park, S. K., and Yates, J., 3rd (2007) Automatic validation of phosphopeptide identifications from tandem mass spectra. Anal. Chem. 79, 1301–1310

14. Payne, S. H., Yau, M., Smolka, M. B., Tanner, S., Zhou, H., and Bafna, V. (2008) Phosphorylation-specific MS/MS scoring for rapid and accurate phosphoproteome analysis. J. Proteome Res. 7, 3373–3381

15. Ruttenberg, B. E., Pisitkun, T., Knepper, M. A., and Hoffert, J. D. (2008) PhosphoScore: an open-source phosphorylation site assignment tool for MSn data. J. Proteome Res. 7, 3054–3059

16. Schlosser, A., Vanselow, J. T., and Kramer, A. (2007) Comprehensive phosphorylation site analysis of individual phosphoproteins applying scoring schemes for MS/MS data. Anal. Chem. 79, 7439–7449

17. Wan, Y., Cripps, D., Thomas, S., Campbell, P., Ambulos, N., Chen, T., and Yang, A. (2008) PhosphoScan: a probability-based method for phosphorylation site prediction using MS2/MS3 pair information. J. Proteome Res. 7, 2803–2811

18. Boersema, P. J., Mohammed, S., and Heck, A. J. (2009) Phosphopeptide fragmentation and analysis by mass spectrometry. J. Mass Spectrom. 44, 861–878

19. Mischerikow, N., Altelaar, A. F., Navarro, J. D., Mohammed, S., and Heck, A. (2010) Comparative assessment of site assignments in CID and ETD spectra of phosphopeptides discloses limited relocation of phosphate groups. Mol. Cell. Proteomics 9, 2140–2148

20. Bantscheff, M., Eberhard, D., Abraham, Y., Bastuck, S., Boesche, M., Hobson, S., Mathieson, T., Perrin, J., Raida, M., Rau, C., Reader, V., Sweetman, G., Bauer, A., Bouwmeester, T., Hopf, C., Kruse, U., Neubauer, G., Ramsden, N., Rick, J., Kuster, B., and Drewes, G. (2007) Quantitative chemical proteomics reveals mechanisms of action of clinical ABL kinase inhibitors. Nat. Biotechnol. 25, 1035–1044

21. Gnad, F., Ren, S., Cox, J., Olsen, J. V., Macek, B., Oroshi, M., and Mann, M. (2007) PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol. 8, R250

22. Elias, J. E., and Gygi, S. P. (2007) Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214

23. Cox, J., and Mann, M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372

24. Savitski, M. M., Mathieson, T., Becher, I., and Bantscheff, M. (2010) H-Score, a Mass Accuracy Driven Rescoring Approach for Improved Peptide Identification in Modification Rich Samples. J. Proteome Res. 9, 5511–5516

25. Zubarev, R. A., Zubarev, A. R., and Savitski, M. M. (2008) Electron capture/transfer versus collisionally activated/induced dissociations: solo or duet? J. Am. Soc. Mass Spectrom. 19, 753–761

26. Beer, I., Barnea, E., Ziv, T., and Admon, A. (2004) Improving large-scale proteomics by clustering of mass spectrometry data. Proteomics 4, 950–960

27. Olsen, J. V., and Mann, M. (2004) Improved peptide identification in proteomics by two consecutive stages of mass spectrometric fragmentation. Proc. Natl. Acad. Sci. U.S.A. 101, 13417–13422

28. Craig, R., and Beavis, R. C. (2004) TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466–1467

29. Molina, H., Horn, D. M., Tang, N., Mathivanan, S., and Pandey, A. (2007) Global proteomic profiling of phosphopeptides using electron transfer dissociation tandem mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 104, 2199–2204

30. Mortensen, P., Gouw, J. W., Olsen, J. V., Ong, S. E., Rigbolt, K. T., Bunkenborg, J., Cox, J., Foster, L. J., Heck, A. J., Blagoev, B., Andersen, J. S., and Mann, M. MSQuant, an open source platform for mass spectrometry-based quantitative proteomics. J. Proteome Res. 9, 393–403

31. Ulintz, P. J., Yocum, A. K., Bodenmiller, B., Aebersold, R., Andrews, P. C., and Nesvizhskii, A. I. (2009) Comparison of MS(2)-only, MSA, and MS(2)/MS(3) methodologies for phosphopeptide identification. J. Proteome Res. 8, 887–899

32. Villen, J., Beausoleil, S. A., and Gygi, S. P. (2008) Evaluation of the utility of neutral-loss-dependent MS3 strategies in large-scale phosphorylation analysis. Proteomics 8, 4444–4452

33. Grimsrud, P. A., Swaney, D. L., Wenger, C. D., Beauchene, N. A., and Coon, J. J. Phosphoproteomics for the masses. ACS Chem. Biol. 5, 105–119

34. Nagaraj, N., D’Souza, R. C., Cox, J., Olsen, J. V., and Mann, M. Feasibility of large scale phosphoproteomics with HCD fragmentation. J. Proteome Res. 9, 6786–6794

35. Aguiar, M., Haas, W., Beausoleil, S. A., Rush, J., and Gygi, S. P. Gas-Phase Rearrangements Do Not Affect Site Localization Reliability in Phosphoproteomics Data Sets. J. Proteome Res. 9, 3103–3107

36. Edelson-Averbukh, M., Shevchenko, A., Pipkorn, R., and Lehmann, W. D. (2009) Gas-phase intramolecular phosphate shift in phosphotyrosine-containing peptide monoanions. Anal. Chem. 81, 4369–4381
