MS Data analysis for Proteomics studies
Suruchi Rao, Harini Chandra
Inferring accurate protein identifications from the thousands of mass spectra generated in mass spectrometry-based proteomics experiments is a complicated and challenging process. Improved computation and greater data storage capability developed over the last decade have considerably simplified it.
Master Layout (Part 1)
This animation consists of 3 parts:
Part 1 – Typical proteomics experiment
Part 2 – Peptide Mass Fingerprinting (PMF)
Part 3 – MS/MS Data analysis
Figure labels: SDS-PAGE; 2-DE; Proteolysis (trypsin digestion); MALDI; Tandem MS/MS; Mass spectra
Definitions of the components: Part 1 – Typical proteomics experiment
1. Typical proteomics experiment: An experiment that uses a mass spectrometer to analyze the content of a proteome, or to elucidate the individual components of a protein complex, after the proteins have been suitably separated by gel-based or chromatographic techniques.
2. SDS-PAGE: A technique that separates proteins under denaturing conditions. It is used extensively alongside quantitative proteomics techniques such as iTRAQ and SILAC. Once the proteins have been separated, the gel can be cut into pieces and the desired bands eluted for identification by MS.
3. 2-DE: A commonly used protein separation technique that fractionates a protein mixture by isoelectric point in the first dimension and by molecular weight in the second. Protein spots can be excised from the gel, eluted with a suitable buffer and analyzed further by MS.
4. Proteolysis: Site-specific digestion of proteins, typically by the proteolytic enzyme trypsin, which generates peptide fragments of a suitable size that are analyzed as positive ions in MS.
5. Tandem MS/MS: An MS technique that combines an ion source with two mass analyzers, separated by a collision cell, to provide improved resolution of the fragment ions. The two mass analyzers may be of the same or of different types. The first analyzer selects a particular ion, which is then fragmented in the collision cell; the fragments are resolved in the second analyzer. This can be used for protein sequencing studies.
6. Matrix Assisted Laser Desorption Ionization (MALDI): An efficient process for generating gas-phase ions of peptides and proteins for mass spectrometric detection. A target plate carrying the dried matrix-protein sample is exposed to short, intense pulses from a UV laser.
7. Mass spectra: Charged peptide fragments are resolved by the mass analyzer on the basis of their mass-to-charge ratios and then detected by the detector, which generates a spectrum of the relative abundances of the ions against their mass-to-charge ratios.
Part 1, Step 1
Action: As shown in animation.
Description of the action: First show the two squares on top with the black patterns on them. Then show the red circle, followed by the tube below and the two arrows. The black dots in the circle must enter the tube. This must then be zoomed into and the violet shape in the box must be shown. The green object must then appear and move along the violet shape, breaking it up into small fragments (shown on the right) as it moves.
Audio narration: Most proteomics experiments involve the separation of a protein mixture by electrophoresis, followed by elution of the protein band of interest. This protein is then digested into small peptide fragments by proteolytic enzymes, the most commonly used being trypsin. These small peptide fragments can then be analyzed further by MS.
Figure labels: SDS-PAGE; 2-DE; Protein of interest; Tube containing trypsin & buffer; Trypsin; Proteolytic digestion; Peptide fragments
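The digestion step can be mimicked in software, which is how search engines build the list of theoretical peptides. The following is a minimal sketch, assuming the trypsin rule quoted later in the Mascot protein view (cut on the C-terminal side of K or R unless the next residue is P). The function name `tryptic_digest` and the example sequence are hypothetical and serve only as an illustration.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0):
    """Return tryptic peptides, allowing up to `missed_cleavages` skipped sites."""
    # Split after K or R, but not when the following residue is P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            # Joining neighbouring fragments models a missed cleavage.
            peptides.append("".join(fragments[start:end]))
    return peptides


if __name__ == "__main__":
    protein = "MKWVTFRPLLKAAGR"  # made-up sequence, not from the slides
    print(tryptic_digest(protein))
    print(tryptic_digest(protein, missed_cleavages=1))
```

Note that the R in "FRP" is not cleaved because it is followed by P, which is why the middle peptide keeps that arginine.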
Part 1, Step 2
Mass Spectrometry analysis – MALDI-TOF
Action: As shown in animation.
Description of the action: First show the tube marked ‘tryptic digest’, followed by the down arrow with its label and the setup shown below that. Next show a light coming out of the red cylinder, which must hit the white plate on the left, then move towards the white ‘reflector’ on the right end of the tube and finally be deflected onto the detector. Next show ions of different sizes appearing, which must move at different speeds across the tube, with the smallest moving fastest and the largest moving slowest. They must move until they reach the detector, after which the graph above must be shown.
Audio narration: The peptide fragments obtained after digestion can be analyzed either by MALDI-TOF or by Tandem MS/MS. In MALDI-TOF, peptide ions are accelerated to different velocities depending on their mass-to-charge ratios, so lighter ions reach the detector sooner. The spectrum generated provides a set of peaks whose masses represent the peptides present in the mixture. These spectra can then be analyzed with various available software tools to obtain more information about the protein.
Figure labels: Tryptic digest; Applied to sample plate; Laser source; MALDI sample plate; TOF tube; Reflector; Detector; Spectra of analyte protein
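The dependence of flight time on mass-to-charge ratio can be illustrated with the basic energy balance z·e·U = ½·m·v². The sketch below is a simplification under stated assumptions: it models a linear, field-free drift tube and ignores the reflector shown in the animation. The acceleration voltage (20 kV) and tube length (1 m) are illustrative values, not instrument settings taken from the slides.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
DALTON = 1.66053906660e-27  # kilograms per dalton


def flight_time(mass_da, charge=1, voltage=20000.0, length_m=1.0):
    """Drift time in seconds for an ion accelerated through `voltage`."""
    # z * e * U = 1/2 * m * v^2, so v = sqrt(2 * z * e * U / m)
    velocity = math.sqrt(2 * charge * E_CHARGE * voltage / (mass_da * DALTON))
    return length_m / velocity


if __name__ == "__main__":
    for mass in (500.0, 1000.0, 2000.0, 4000.0):
        print(f"{mass:7.1f} Da -> {flight_time(mass) * 1e6:6.2f} microseconds")
```

Because flight time scales with the square root of m/z, a fourfold increase in mass only doubles the time taken to reach the detector.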
Part 1, Step 3
Mass Spectrometry analysis – Tandem MS/MS
Action: As shown in animation.
Description of the action: First show the tube on top marked ‘tryptic digest’, followed by the down arrow with its label, the coloured ions and the remaining components. The ions must move towards the first set of rods and only the pink ions must be allowed through the opening. These must enter the orange cube, where they must be fragmented into smaller pieces that come out of the other end as shown. These smaller pieces must fly through the second set of rods and enter the detector. As each of the fragments reaches the detector, the graph on the right must start appearing from left to right until all the fragments have been detected.
Audio narration: Tandem MS/MS is capable of providing more in-depth sequence information. Each selected peptide in the digest is further fragmented in the collision cell and its fragments are analyzed, thereby generating a spectrum for each peptide. These spectra can then be analyzed with various available software tools to obtain more information about the protein.
Figure labels: Tryptic digest; Peptide ions generated; Peptide ions; Q1 – Scanning mode; Ions of selected m/z; Q2 – Collision cell; Fragmented ions; Q3 – RF mode; Detector; Spectra of analyte protein
Master Layout (Part 2)
Figure labels: Spectrum from MALDI analysis; Online search with sequence databases; Open shareware for PMF; Best fit – Score histogram
www.matrixscience.com
Definitions of the components: Part 2 – Peptide Mass Fingerprinting (PMF)
1. Peptide Mass Fingerprinting: A protein analysis method that compares the mass values of peptides generated from the protein analyte to a database of known proteins, to arrive at its probable identity in the form of the “best fit”.
2. Spectrum from MALDI analysis: The peptide fragments generated by proteolytic digestion are analyzed by MALDI-TOF, and the resulting spectrum is used for further analysis against online sequence databases.
3. Online search: Several open source databases are available online, which allow analysis of the MS spectrum generated.
4. Open shareware for PMF: These are database search algorithms that compare experimental masses against theoretical peptide masses, calculated by applying “cleavage rules” to large primary sequence protein databases. The result of the comparison lists a number of proteins in order of most probable identity, as derived from a probability score. The open shareware consists of the following fields, which need to be entered by the user:
i. Name and Email: Used to identify the search entry and to e-mail the results page if the connection is lost, so that the data need not be re-entered.
ii. Search Title: Used to identify and label the search entry; it typically includes the name of the protein whose information is required.
iii. Database/s: The primary sequence protein databases, including NCBInr and SwissProt, against which the query is run. A contaminants database is also recommended, to eliminate contaminants such as keratin, trypsin and BSA.
iv. Taxonomy: Allows the search query to be limited to a particular species or group of species, bringing otherwise weaker hits to notice.
v. Enzyme: The proteolytic enzyme used during sample preparation of the analyte protein before mass spectrometric analysis. The most popular is trypsin; if any other enzyme is used, its site specificity is expected to be equal to or better than that of trypsin.
vi. Missed Cleavages Allowed: Partial digestion during trypsinization, leaving one or two arginine or lysine sites uncleaved, is common and must be accounted for when searching against calculated peptide masses.
vii. Modifications: During sample preparation for mass spectrometric analysis of proteins, the mass of specific residues may change, for example through oxidation of methionine or carboxymethylation of cysteine. To account for these mass changes, the algorithm allows two types of modification to be pre-selected: fixed and variable.
• Fixed Modifications: Modifications that are applied universally across the database to account for the change in mass of specific residue/s. The most common fixed modification is carboxymethylation of cysteine, which replaces the cysteine residue mass with 161 Da.
• Variable Modifications: Mass changes that are suspected to occur during sample handling. They are accounted for by increasing the number of candidate peptide forms compared against the experimental masses. The most common variable modification is oxidation of methionine residues in the analyte protein.
viii. Protein Mass: Mass of the intact protein, taken as a contiguous stretch that includes all matched peptides. If the mass is unknown, this parameter can be left empty and the mass will remain unrestricted.
ix. Peptide Tolerance: The error window allowed between an experimental and a calculated peptide mass. It reflects the accuracy and resolution of the mass spectrometer and accounts for shifts in isotope spacing.
x. Mass Values: Specifies the charge form of the analyte masses being examined by Peptide Mass Fingerprinting, i.e. MH+ or M-H-, or whether the masses are neutral values (Mr).
xi. Monoisotopic Mass vs Average Mass: Depending on the mass accuracy of the spectrometer, the experimental masses used for Peptide Mass Fingerprinting are treated as either monoisotopic or average masses. Choosing monoisotopic mass depends on the ability of the instrument to resolve isotopes and determine peak mass accurately. The average mass is the sum of the abundance-weighted masses of all isotopes, while the monoisotopic mass is the sum of the masses of the most abundant isotope of each element. If the instrument has insufficient mass resolution combined with a poor signal-to-noise ratio, the experimental peptide masses should be treated as average masses to give better identification.
5. Best fit – Score histogram: The “best fit” is the primary identification of the analyte protein made by the database search algorithm. It represents either the exact protein being analyzed or the protein with the closest primary sequence homology, usually with an equivalent function in a related species. The score histogram depicts the distribution of protein scores for all the hits obtained by the query.
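The core comparison behind PMF, experimental masses against calculated peptide masses within a tolerance, can be sketched in a few lines. This is an illustrative sketch only, not the scoring algorithm used by Mascot or any other search engine. It assumes monoisotopic residue masses, singly protonated (MH+) peak values and a tolerance expressed in ppm; the function names and the example peak list are hypothetical.

```python
WATER = 18.010565  # monoisotopic mass of H2O
PROTON = 1.007276  # mass of a proton

# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mh(sequence):
    """Calculated MH+ value: residue masses plus water plus one proton."""
    return sum(MONO[residue] for residue in sequence) + WATER + PROTON


def match_peaks(observed_mh, peptides, tol_ppm=100.0):
    """Pair each observed MH+ value with theoretical peptides within tolerance."""
    matches = []
    for mz in observed_mh:
        for peptide in peptides:
            calc = peptide_mh(peptide)
            ppm = (mz - calc) / calc * 1e6
            if abs(ppm) <= tol_ppm:
                matches.append((mz, peptide, round(ppm, 1)))
    return matches


if __name__ == "__main__":
    theoretical = ["FGEAVWFK", "GPGSSMGVEASYR"]
    peaks = [983.4985, 1500.0]  # made-up peak list
    print(match_peaks(peaks, theoretical))
```

A real search repeats this comparison for every protein in the database and then scores how unlikely the observed number of matches would be by chance.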
Part 2, Step 1
Action: As shown in animation.
Description of the action: First show the computer with the screen having a form on the inside. This must be zoomed into and the form above must be displayed. Each of the fields must be filled in as shown, with some requiring selection using the white mouse pointer as depicted.
Audio narration: Many MS analysis software tools are available online for analyzing data generated from MS. They require inputs from the user regarding the experimental parameters used, such as enzyme cleavage, protein name and fixed modifications, and the desired search criteria, such as taxonomy and peptide tolerance. Commonly used protein databases against which the MS information is processed to retrieve sequence data include NCBI, MSDB and SwissProt. The data file generated from MS is uploaded and the search carried out. We will demonstrate data analysis using Mascot (www.matrixscience.com).
Part 2, Step 2
Description of the action: First show the computer with the screen displaying the search results. This must be zoomed into to clearly depict the report as shown. The arrows with the red text boxes must then appear.
Audio narration: The final results of the search are depicted in a concise report, beginning with a Protein Score Histogram. The protein score is a measure of the statistical significance of the protein hit. The histogram seen here displays the distribution of protein scores. Random matches made during database comparison are generally found in the green shaded region, where the probability of a hit being random is greater than 5%. The single red peak at the end of the histogram is the protein that has less than a 5% chance of being a random hit, making it a statistically significant identification of the unknown protein analyte.
Data output:
Mascot Search Results
User: Proteomics
Email: [email protected]
Search title: Transcription factor
Database: SwissProt
Time stamp: 2 June 2010 at 17:45:35 GMT
Top score: 192 for PML_mouse, probable transcription factor
Figure labels: Mascot Score Histogram; >5% Random match; <5% Random match
www.matrixscience.com
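The link between a score and the 5% boundary on the histogram can be sketched numerically. The sketch assumes the relationship described in Mascot's scoring documentation, score = -10·log10(P), where P is the probability that the match is a random event. The number of candidate database entries used below (250,000) is a made-up figure chosen for illustration and is not taken from the search shown here.

```python
import math


def score_from_probability(p):
    """Score corresponding to a random-match probability p."""
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Random-match probability corresponding to a score."""
    return 10.0 ** (-score / 10.0)


def significance_threshold(n_candidates, alpha=0.05):
    """Score needed for a chance below `alpha` among `n_candidates` entries."""
    return -10.0 * math.log10(alpha / n_candidates)


if __name__ == "__main__":
    print(score_from_probability(1e-5))
    print(probability_from_score(67))
    print(round(significance_threshold(250_000), 1))  # hypothetical database size
```

Under these assumptions, a larger database raises the threshold, because more candidate entries give random matches more opportunities to occur.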
Part 2, Step 3
Action: As shown in animation.
Description of the action: First show the computer with the search results displayed on the screen. This must be zoomed into to depict it clearly. The green box must then appear and flash along with the arrow and label. The user must be allowed to click on this and be taken to the next slide.
Audio narration: The Concise Summary report provides details of the peptide matches made by the algorithm, which deduces the most probable protein match. The first hit is usually the “best fit” to the experimental masses that were entered in the search query. A protein score higher than 67 is considered significant, and a low expect (E) value indicates that the probability of the hit being a random event is extremely low. A significant amount of information about the protein can be obtained from the report by clicking on the corresponding protein link.
Part 2, Step 4 (a)
Protein view
Protein information – data analysis & interpretation
Description of the action: Show all the text output. Next show the green highlighted boxes one at a time, with the corresponding dialogue box appearing for each of the highlighted regions. The results on the next slide must also be displayed along with this page.
Audio narration: On selecting a particular protein link, the protein view provides details such as the protein score, molecular weight, isoelectric point and sequence coverage of the protein. The greater the percentage sequence coverage, the greater the number of matching peptides for that particular protein. All sequences are displayed, with the matching sequences indicated in red.
Data output:
Match to: PML_MOUSE Score: 192 Expect: 1e-14
Probable transcription factor PML
Nominal mass (Mr): 97470; Calculated pI value: 5.88
NCBI BLAST search of PML_MOUSE against nr
Unformatted sequence string for pasting into other applications
Taxonomy: Mus musculus
Cleavage by Trypsin: cuts C-term side of KR unless next residue is P
Number of mass values searched: 18
Number of mass values matched: 15
Sequence Coverage: 22%
Annotations:
Score: The protein score is a sum of the highest ion scores for each sequence, with duplicate matches being excluded. A score above 67 is significant for this hit.
Nominal mass (Mr): Predicted mass of the protein.
Calculated pI value: Predicted isoelectric point of the protein.
Sequence Coverage: Indicates the percentage of the protein sequence covered by matching peptides. All peptides are displayed, with matching peptides indicated in red.
www.matrixscience.com
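The score of 192 and the expect value of 1e-14 in this protein view can be related with a short calculation. The sketch assumes the relation E = α·10^((T − S)/10), where S is the score, T is the significance threshold (67 here) and α is the significance level (0.05). This formula is recalled from Mascot's scoring help and should be treated as an assumption rather than a statement of how this particular report was computed.

```python
def expect_value(score, threshold, alpha=0.05):
    """Expected number of random matches scoring at least `score`."""
    # At score == threshold the expect value equals alpha by construction.
    return alpha * 10.0 ** ((threshold - score) / 10.0)


if __name__ == "__main__":
    print(f"{expect_value(192, 67):.1e}")
    print(f"{expect_value(67, 67):.2f}")
```

With these inputs the result is of the same order of magnitude as the 1e-14 shown in the report, which is consistent with the narration that a low E value marks a hit that is very unlikely to be random.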
Part 2, Step 4 (b)
Protein information – data analysis
Action: As shown in animation.
Description of the action: Show all the text output. Next show the green highlighted boxes one at a time, with the corresponding dialogue box appearing for each of the highlighted regions. The results on the next slide must also be displayed along with this page.
Audio narration: The sequence of each peptide fragment processed in the database is displayed, along with its molecular weight, its starting and ending amino acid numbers and the number of missed cleavages during tryptic digestion. Together, these data provide a comprehensive understanding of the protein being analyzed.
www.matrixscience.com
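Sequence coverage can be computed directly from the start and end positions of the matched peptides. The following is a minimal sketch under the assumption that coverage is the percentage of residues in the protein that fall inside at least one matched peptide. The function name and the example sequence are hypothetical.

```python
def sequence_coverage(protein, peptides):
    """Percentage of residues in `protein` covered by any matched peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for index in range(start, start + len(peptide)):
                covered[index] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


if __name__ == "__main__":
    protein = "MKWVTFRPLLKAAGR"  # made-up sequence, not from the slides
    print(sequence_coverage(protein, ["WVTFRPLLK"]))
```

Overlapping peptides, for example a fully cleaved peptide and its missed-cleavage form, are counted once because coverage is tracked per residue.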
Master Layout (Part 3)
Figure labels: Spectra from MS/MS analysis; Online search with sequence databases; Open shareware for MS/MS analysis; Peptide summary report
www.matrixscience.com
Definitions of the components: Part 3 – MS/MS data analysis
1. Tandem MS/MS analysis: Another protein analysis method, based on the fragmentation spectra of the analyte's peptides. The fragment and parent masses, which are representative of the amino acid sequence of each peptide, are compared to databases of known proteins to identify one peptide at a time. Protein identity is then inferred from the presence of particular peptides.
2. Spectrum from MS/MS analysis: MS/MS analysis generates a fragmentation pattern for each peptide of the proteolytic digest. These patterns are useful for determining the sequence of the protein analyte.
3. Online search: Several open source databases are available online, which allow analysis of the MS spectrum generated.
4. Open shareware for MS/MS analysis: This involves a two-step process: first, peptides are identified by comparing the experimental spectra against theoretical MS/MS spectra generated from primary sequence databases; second, these peptide identifications are collated into a minimal protein list and scored to provide statistical validation. In addition to the fields discussed for PMF, this shareware has the following additional fields, which need to be entered by the user:
i. Database/s: The databases available for MS/MS spectra comparison include NCBInr and SwissProt, as well as several EST databases if the initial search provides no positive IDs. Selecting a contaminants database is also recommended, to eliminate contaminants such as keratin, trypsin and BSA.
ii. Quantitation: A search parameter that selects the protocol used to quantify the protein analyte by mass spectrometry. Examples of the available options include iTRAQ 4plex, SILAC multiplex and ICAT D8.
iii. Precursor Value: The m/z value of the parent peptide, required when the MS/MS data format does not provide it automatically. It is used, together with the charge of the parent peptide, to calculate its relative molecular mass (Mr).
iv. Peptide Charge: The charge state of the precursor peptide, needed so that its Mr can be calculated from the observed m/z value.
v. MS/MS Tolerance: The error window allowed on the MS/MS fragment masses. It reflects the accuracy and resolution of the mass spectrometer and accounts for isotope shifts in the fragment masses.
vi. Instrument: Informing the algorithm about the instrument used for fragmentation helps, especially when ETD or ECD has been used instead of CID. Depending on the instrument, a particular set of ion series is used to find a peptide match.
vii. Data Format: Several data formats, each associated with particular software or instruments, are used for MS/MS fragmentation data, such as SCIEX API III, PerSeptive (.PKS) and Bruker (.XML). Depending on the search type, either an individual MS/MS spectrum or the thousands of spectra from an LC-MS/MS run can be searched.
viii. Error Tolerant Search: This parameter can be used when a large percentage of the experimental MS/MS spectra remains unidentified. This type of search makes it possible to accommodate issues such as the absence of a peptide sequence from the database, non-specific cleavage by the proteolytic enzyme used for digestion, or unknown post-translational modifications that change the mass of the analyte peptides.
5. Peptide summary report: The peptide summary report provides the most probable protein identity by individually identifying each of the peptides and grouping them. The greater the number of peptides, the higher the protein score for the hit, as it is derived from the individual ion scores. Further statistical validation helps to confirm the identification and improve the statistical confidence in the protein hit.
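The Precursor Value and Peptide Charge fields defined above combine into the neutral mass Mr through a simple relation: Mr = (m/z × z) − z × (proton mass). The sketch below checks this against Query 4 of the peptide summary report shown later in this part (observed m/z 492.219980 at charge 2+, Mr(expt) 982.425408) and also computes the ppm error against Mr(calc). The proton mass of 1.007276 Da and the function names are assumptions made for illustration.

```python
PROTON = 1.007276  # proton mass in Da (assumed value)


def neutral_mass(mz, charge):
    """Relative molecular mass Mr from an observed m/z and charge state."""
    return mz * charge - charge * PROTON


def ppm_error(mr_expt, mr_calc):
    """Mass error of the experimental value in parts per million."""
    return (mr_expt - mr_calc) / mr_calc * 1e6


if __name__ == "__main__":
    mr = neutral_mass(492.219980, 2)
    print(f"Mr(expt) = {mr:.6f}")
    print(f"error    = {ppm_error(982.4254, 982.4913):.2f} ppm")
```

The ppm value computed from the rounded masses in the report comes out close to, but not exactly at, the listed -67.02, presumably because the report works from unrounded masses.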
Part 3, Step 1
Action: As shown in animation.
Description of the action: First show the computer with the screen having a form on the inside. This must be zoomed into and the form above must be displayed. Each of the fields must be filled in as shown, with some requiring selection using the white mouse pointer as depicted.
Audio narration: The MS/MS data analysis shareware has some extra inputs, such as quantitation, MS/MS tolerance, peptide charge and instrument, in addition to the fields for PMF. The form requires inputs from the user regarding the experimental parameters used, such as enzyme cleavage, protein name and modifications, and the desired search criteria, such as taxonomy and peptide tolerance. Commonly used protein databases against which the MS information is processed to retrieve sequence data include NCBI, MSDB and SwissProt. The data file generated from MS is uploaded and the search carried out.
Part 3, Step 2
Description of the action: First show the computer with the screen displaying the search results. This must be zoomed into to clearly depict the report as shown. The red box must appear at the region indicated, along with the blue arrow.
Audio narration: The Tandem MS protein analysis is used to obtain protein identities from each of the sequenced peptides. The results page begins with a list of probable protein identities and their respective sources. The score histogram provides details similar to those in the PMF analysis, with the probability distribution displayed graphically. The green shaded region indicates matches that have a greater than 5% chance of being random, while the red peak indicates that the chance of a random match is less than 5%.
Data output:
Mascot Search Results
User: proteomics
Email: [email protected]
Search title: Sample protein
Database: NCBInr
Taxonomy: Mammalia
Time stamp: 2 June 2010 at 17:45:35 GMT
Protein hits:
Figure labels: Mascot Score Histogram; >5% Random match; <5% Random match
www.matrixscience.com
Part 3, Step 3
Action: As shown in animation.
Description of the action: First show the computer with the screen displaying the search results. This must be zoomed into to clearly depict the report as shown. The green highlight boxes must then appear with their labels. The user must be allowed to click on these highlighted regions. Clicking on ‘protein information’ must redirect the user to steps 4 (a) & (b), while ‘peptide information’ must redirect the user to steps 5 (a) & (b).
Audio narration: The summary report lists all the protein matches obtained from the database search, with their respective molecular weight, protein score, source organism and details of each of their fragmented peptides. Further information about any of the protein sequences can be obtained by clicking on the corresponding protein link. Data on each of the peptide fragmentation patterns can also be obtained by clicking on the peptide link indicated by the query number.
Data output:
Peptide summary report
1. gi|31753114 Mass: 30840 Score: 225 Matches: 8(3) Sequences: 3(2)
Unknown (protein for IMAGE:5194336) [Homo sapiens]
Check to include this hit in error tolerant search
Query  Observed   Mr(expt)   Mr(calc)   ppm      Miss  Score  Expect   Rank  Unique  Peptide
4      492.2200   982.4254   982.4913   -67.02   0     66     0.00036  1     U       K.FGEAVWFK.A
2. gi|47522906 Mass: 60550 Score: 33 Matches: 3(0) Sequences: 2(0)
zona pellucida sperm-binding protein 4 [Sus scrofa]
Check to include this hit in error tolerant search
Query  Observed    Mr(expt)    Mr(calc)    ppm      Miss  Score  Expect   Rank  Unique  Peptide
21     649.2406    1296.4666   1296.5768   -85.00   0     31     1.1      1     U       K.GPGSSMGVEASYR.G
22     649.2485    1296.4823   1296.5768   -72.88   0     (21)   10       1     U       K.GPGSSMGVEASYR.G
69     1237.2689   3708.7849   3710.1076   -356.51  1     3      6.4e+02  5     U       K.YSRPPVDSHALWVAGLLGSLIIGALLVSYLVFRK.W
Figure labels: Protein information; Peptide information
www.matrixscience.com
Part 3, Step 4 (a)
Protein information – data analysis & interpretation
Action: As shown in animation.
Description of the action: Show all the text output. Next show the green highlighted boxes one at a time, with the corresponding dialogue box appearing for each of the highlighted regions. The results on the next slide must also be displayed along with this page.
Audio narration: The protein view obtained on selecting a particular protein link is very similar to the protein view observed in PMF. It provides details such as the protein score, molecular weight, isoelectric point and sequence coverage of the protein. Protein scores above 67 are considered significant, and the greater the percentage sequence coverage, the greater the number of matching peptides for that particular protein. All sequences are displayed, with the matching sequences indicated in red.
Data output:
Match to: gi|31753114 Score: 225
Unknown (protein for IMAGE:5194336) [Homo sapiens]
Found in search of C:\Users\harini\Desktop\MS\3C.LC-MS-MS data analysis Raw data file- mgf files\Data file1.mgf
Nominal mass (Mr): 30840; Calculated pI value: 6.00
NCBI BLAST search of gi|31753114 against nr
Unformatted sequence string for pasting into other applications
Taxonomy: Homo sapiens
Links to retrieve other entries containing this sequence from NCBI Entrez:
gi|111494016 from Homo sapiens
Fixed modifications: Carbamidomethyl (C)
Variable modifications: Oxidation (M)
Cleavage by Trypsin: cuts C-term side of KR unless next residue is P
Sequence Coverage: 14%
Annotations:
Score: The protein score is a sum of the highest ion scores for each sequence, with duplicate matches being excluded. In this case, a score above 67 is considered significant.
Nominal mass (Mr): Predicted mass of the protein.
Calculated pI value: Predicted isoelectric point of the protein.
Sequence Coverage: Indicates the percentage of the protein sequence covered by matching peptides. All peptides are displayed, with matching peptides indicated in red.
www.matrixscience.com
Part 3, Step 4 (b)
Protein information – data analysis & interpretation
Action: As shown in animation.
Description of the action: Show all the text output. Next show the green highlighted boxes one at a time, with the corresponding dialogue box appearing for each of the highlighted regions.
Audio narration: Information about each of the matched peptides is also displayed. The start and end amino acid positions, calculated and experimental molecular weights, number of missed tryptic cleavages, sequence of each peptide fragment and the corresponding ion scores are shown. The highest ion scores are used for computing the final protein score.
Annotations:
Ion score: Indicates the score of each ion fragment. Used for calculation of the protein score.
www.matrixscience.com
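The rule stated in the narration, taking the highest ion score for each peptide sequence and summing them with duplicates excluded, can be sketched as follows. This is a simplified illustration and not Mascot's full protein scoring, which may apply further corrections that are not described here. The input rows are the three peptide matches listed for hit 2 (gi|47522906) in the peptide summary report.

```python
def protein_score(peptide_matches):
    """Sum the best ion score per unique peptide sequence."""
    best = {}
    for sequence, ion_score in peptide_matches:
        # Keep only the highest score seen for each sequence.
        best[sequence] = max(ion_score, best.get(sequence, 0))
    return sum(best.values())


if __name__ == "__main__":
    hit_2 = [
        ("GPGSSMGVEASYR", 31),
        ("GPGSSMGVEASYR", 21),  # duplicate match, shown in brackets in the report
        ("YSRPPVDSHALWVAGLLGSLIIGALLVSYLVFRK", 3),
    ]
    print(protein_score(hit_2))
```

The simple sum for these rows is 34, while the report lists a protein score of 33. The difference may reflect rounding of the displayed ion scores or an additional correction in the actual scoring.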
Part 3, Step 5 (a)
Peptide information – data analysis and interpretation
Action: As shown in animation.
Description of the action: Show all the text output. Next show the green highlighted boxes one at a time, with the corresponding dialogue box appearing for each of the highlighted regions.
Audio narration: In Tandem MS/MS, each selected peptide is fragmented in the collision cell, and its fragments pass through the second mass analyzer before reaching the detector. This provides a significantly larger amount of information about each peptide fragment, which can be viewed by clicking on the peptide links provided in the summary report. The fragmentation pattern is displayed graphically and can be zoomed into as required by adjusting the x-axis plot values.
Data output:
Mascot search results
Peptide view
MS/MS Fragmentation of FGEAVWFK
Found in gi|31753114, Unknown (protein for IMAGE:5194336) [Homo sapiens]
Match to Query 4: 982.425408 from(492.219980,2+) intensity(9920.0000)
Title: Sum of 11 scans in range 1333 (rt=1686.21, f=2, i=174) to 1373 (rt=1732.47, f=2, i=184) [\\Qtof\Qtof 17\JAN2004.PRO\Data\6p013-sanjeeva-10.raw]
Data file C:\Users\harini\Desktop\MS\3C.LC-MS-MS data analysis Raw data file- mgf files\Data file1.mgf
Annotations:
MS/MS Fragmentation of FGEAVWFK: Peptide sequence whose fragmentation pattern is shown.
Plot range: Range values for the x-axis, which can be modified by the user to zoom in or out of the graphical representation.
www.matrixscience.com
Part 3, Step 5 (b)
Peptide information – data analysis & interpretation
Action: As shown in animation.
Description of the action: Show all the text output. Next show the green highlighted boxes one at a time, with the corresponding dialogue box appearing for each of the highlighted regions.
Audio narration: At low collision energy, each peptide is cleaved at an amide bond, which can result in the formation of two types of ions: the y-ion and the b-ion. In y-ions, the positive charge is retained on the C-terminal side of the peptide, while in b-ions the charge is retained on the N-terminal side. These ion masses can be used to deduce the amino acid sequence by calculating the mass difference between consecutive ions. Each mass difference corresponds to a particular amino acid, which can be looked up in a standard table of residue masses. The y-ion series and the b-ion series run opposite to each other, as indicated in the example.
Data output:
Mascot search results
Peptide view
Monoisotopic mass of neutral peptide Mr(calc): 982.4913
Fixed modifications: Carbamidomethyl (C) (apply to specified residues or termini only)
Ions Score: 66 Expect: 0.00036
Matches: 23/78 fragment ions using 16 most intense peaks (help)
#  Immon     a         a0        b         b0        Seq  y         y*        y0        #
1  120.0808  120.0808            148.0757            F                                  8
2  30.0338   177.1022            205.0972            G    836.4301  819.4036  818.4196  7
4  44.0495   377.1819  359.1714  405.1769  387.1663  A    650.3661  633.3395            5
5  72.0808   476.2504  458.2398  504.2453  486.2347  V    579.3289  562.3024            4
6  159.0917  662.3297  644.3191  690.3246  672.3140  W    480.2605  463.2340            3
7  120.0808  809.3981  791.3875  837.3930  819.3824  F    294.1812  277.1547            2
8  101.1073                                          K    147.1128  130.0863            1
Annotations:
Mr(calc): Mass of the peptide fragment displayed.
b-ions: Ions formed with the charge retained on the N-terminal side.
y-ions: Ions formed with the positive charge retained on the C-terminal side.
Seq: Amino acid sequence obtained through computation using y-ion and b-ion values.
b2 (205.0972) – b1 (148.0757) = 57.0215, G
y2 (294.1812) – y1 (147.1128) = 147.0684, F
y7 (836.4301) – y6 (779.4087) = 57.0214, G
b7 (837.3930) – b6 (690.3246) = 147.0684, F
www.matrixscience.com
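The b- and y-ion values in the fragment table can be reproduced from residue masses alone. The sketch below assumes singly charged ions and monoisotopic masses: each b-ion is the sum of the N-terminal residue masses plus one proton, and each y-ion is the sum of the C-terminal residue masses plus water plus one proton. The function names are hypothetical, and the constants are standard values rather than figures taken from the slides.

```python
PROTON = 1.007276  # proton mass in Da
WATER = 18.010565  # monoisotopic mass of H2O

# Monoisotopic residue masses needed for the peptide FGEAVWFK.
MONO = {
    "F": 147.06841, "G": 57.02146, "E": 129.04259, "A": 71.03711,
    "V": 99.06841, "W": 186.07931, "K": 128.09496,
}


def b_ions(sequence):
    """Singly charged b-ion m/z values b1 .. b(n-1)."""
    ions, total = [], PROTON
    for residue in sequence[:-1]:
        total += MONO[residue]
        ions.append(round(total, 4))
    return ions


def y_ions(sequence):
    """Singly charged y-ion m/z values y1 .. y(n-1)."""
    ions, total = [], WATER + PROTON
    for residue in reversed(sequence[1:]):
        total += MONO[residue]
        ions.append(round(total, 4))
    return ions


if __name__ == "__main__":
    print("b:", b_ions("FGEAVWFK"))
    print("y:", y_ions("FGEAVWFK"))
```

Differences between consecutive values in either list give back the residue masses, which is the calculation illustrated by the four worked differences in the annotations.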
Interactivity option 1: Step No: 1
Question: Based on the mass values indicated in the graph and the table of average and monoisotopic masses of each amino acid, deduce the sequence of this peptide fragment.
Interactivity type: Choose the correct answer.
Boundary/limits: The graph with all its values and the table shown in the next slide must be displayed. The four options must be shown and the user must be allowed to choose any one of them.
Results: The correct answer is D. If the user chooses this, it must turn green with the message ‘right answer’. If the user chooses any of the others, it must turn red with the message ‘wrong answer’.
Figure: spectrum of relative abundance (0 to 100) against m/z, with peaks labelled 72, 171, 242, 299, 402, 473, 530, 601 and 769.
Interactivity option 2: Step No: 2
Amino acid     3LC  SLC  Average   Monoisotopic
Glycine        Gly  G    57.0519   57.02146
Alanine        Ala  A    71.0788   71.03711
Serine         Ser  S    87.0782   87.03203
Proline        Pro  P    97.1167   97.05276
Valine         Val  V    99.1326   99.06841
Threonine      Thr  T    101.1051  101.04768
Cysteine       Cys  C    103.1388  103.00919
Leucine        Leu  L    113.1594  113.08406
Isoleucine     Ile  I    113.1594  113.08406
Asparagine     Asn  N    114.1038  114.04293
Aspartic acid  Asp  D    115.0886  115.02694
Glutamine      Gln  Q    128.1307  128.05858
Lysine         Lys  K    128.1741  128.09496
Glutamic acid  Glu  E    129.1155  129.04259
Methionine     Met  M    131.1926  131.04049
Histidine      His  H    137.1411  137.05891
Phenylalanine  Phe  F    147.1766  147.06841
Arginine       Arg  R    156.1875  156.10111
Tyrosine       Tyr  Y    163.1760  163.06333
Tryptophan     Trp  W    186.2132  186.07931
Answers:
A) AVAGCGGAF
B) STAGTAGAR
C) AVACCAGAY
D) AVAGCAGAR
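The reasoning behind this exercise, reading residues off the gaps between consecutive peaks, can be sketched as follows. The sketch makes several assumptions: the labelled peaks are treated as a singly charged b-ion series, the nominal peak values are matched to monoisotopic residue masses within 0.5 Da, and only the first eight labelled peaks are used. The gap between 601 and 769 is left out because, as extracted here, it does not correspond to a single residue mass in the table. The function names are hypothetical.

```python
PROTON = 1.007276  # proton mass in Da

# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_for(delta, tol=0.5):
    """Residue(s) whose mass lies within `tol` of a mass difference."""
    hits = [aa for aa, mass in MONO.items() if abs(mass - delta) <= tol]
    return "/".join(hits) if hits else "?"


def read_ladder(peaks, tol=0.5):
    """Read a b-ion ladder: first peak minus a proton, then successive gaps."""
    peaks = sorted(peaks)
    residues = [residue_for(peaks[0] - PROTON, tol)]
    for lower, upper in zip(peaks, peaks[1:]):
        residues.append(residue_for(upper - lower, tol))
    return residues


if __name__ == "__main__":
    ladder = [72, 171, 242, 299, 402, 473, 530, 601]
    print("".join(read_ladder(ladder)))
```

Residues with identical or near-identical masses, such as leucine and isoleucine or glutamine and lysine, cannot be told apart this way at this tolerance and are reported together.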
Questionnaire
1. Which one of these is common across all Mass Spec based proteomics experiments carried out?
A) Liquid Chromatography
B) Proteolysis
C) 2-D Gel Electrophoresis
D) Isoelectric Focusing
2. Peptide Mass Fingerprinting or PMF is defined as?
A) Finding the best fit for peptides identified by fragmentation.
B) Finding the best fit for a protein by sequencing in a Triple Quadrupole Analyzer.
C) Finding fingerprints of proteins on 2-DE Gels.
D) Finding the best fit for masses of peptides identified by MALDI-TOF.
3. Which one of these mass values represents a protein/peptide ion?
A) M-H-
B) M-H+
C) MH+
D) MH-
4. The average mass of which of the following amino acids corresponds to 87.0782?
A) Serine
B) Glycine
C) Alanine
D) Glutamine
Links for further reading
Reference websites:
1. http://www.matrixscience.com – MASCOT, the most popular open shareware for processing PMF and tandem mass spectrometric data, is available here.
Research papers:
1. Henzel, W.J., Watanabe, C., Stults, J.T. (2003). Protein identification: the origins of peptide mass fingerprinting. J. Am. Soc. Mass Spectrom., 14(9), pp. 931-42.
2. Nesvizhskii, A.I., Vitek, O., Aebersold, R. (2007). Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Methods, 4(10), pp. 787-97.
3. Deutsch, E.W., Lam, H., Aebersold, R. (2008). Data analysis and bioinformatics tools for tandem mass spectrometry in proteomics. Physiol. Genomics, 33(1), pp. 18-25.
4. Yates, J.R. (1998). Mass spectrometry and the age of the proteome. J. Mass Spectrom., 33(1), pp. 1-19.