Introduction to high-throughput analysis of proteins and metabolites by Mass Spectrometry: The basic principle; Brief introduction of techniques; Computational issues
Slide 1
Introduction to high-throughput analysis of proteins and
metabolites by Mass Spectrometry
The basic principle
Brief introduction of techniques
Computational issues
Slide 2
Background: high-throughput profiling of biological samples
(Picture edited from
http://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/MolBioReview/;
figure label: Metabolites)
Red line: central dogma. Blue line: interaction.
DNA: genotype, copy number, epigenetics...
RNA: expression levels, alternative splicing, microRNA
Protein: concentration, modification, interaction
Metabolite: concentration, modification, interaction
Slide 3
Why Mass Spectrometry
The question: a biological system contains tens of thousands of
species of proteins and metabolites. How do we identify and
quantify them in a sample?
Which protein is this?
Does its level change significantly between control and disease
samples?
Slide 4
Background
In a complex network, even if we know the entire structure, the
network behavior is hard to predict. Direct profiling gives us
snapshots of the status of the system.
(Picture from KEGG PATHWAY)
Slide 5
Why Mass Spectrometry
Proteins/metabolites can be separated according to their
properties:
  mass/size
  hydrophilicity/hydrophobicity
  binding to specific ligands
  charge
Using:
  Chromatography
  Electrophoresis
(http://en.wikibooks.org/)
Slide 6
Problems with these separation techniques:
  Reproducibility
  Identification / quantification
  Inability to separate tens of thousands of species
Mass Spectrometry:
  Highly accurate, highly reproducible measurements
  Theoretical values are easy to obtain, which enables
  identification
  Can study protein modifications (small ligands attached)
  Measurements are based on the mass/charge ratio (m/z)
Slide 7
Mass Spectrometry: getting ions from solution into the gas phase
  Matrix-assisted laser desorption ionization (MALDI)
  Electrospray ionization (ESI)
(Picture provided by Prof. Junmin Peng (Emory))
Slide 8
Mass Spectrometry: finding m/z
Time-of-flight: when a charged particle is accelerated in an
electric field, its time of flight is $t = k\sqrt{m/z}$, where k is
a constant related to instrument characteristics.
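The time-of-flight relation can be illustrated with a few lines of Python. This is only a sketch: it assumes the usual form t = k * sqrt(m/z), and the value of `K` and the function names are hypothetical, chosen for illustration rather than taken from any instrument.

```python
import math

def time_of_flight(mz, k):
    """Flight time of an ion with mass-to-charge ratio mz, assuming t = k * sqrt(m/z)."""
    if mz <= 0:
        raise ValueError("m/z must be positive")
    return k * math.sqrt(mz)

def mz_from_time(t, k):
    """Invert the relation above: m/z = (t / k) ** 2."""
    return (t / k) ** 2

K = 2.0  # hypothetical instrument constant, arbitrary units

# Heavier ions (per unit charge) arrive later; a 4x larger m/z doubles the time.
for mz in (100.0, 400.0, 1600.0):
    print(mz, time_of_flight(mz, K))
```

In practice the instrument measures t and the second function recovers m/z, so any error in the calibration constant propagates quadratically into the reported m/z.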
Slide 9
Mass Spectrometry: finding m/z
Quadrupole: a radio-frequency voltage is applied to opposing pairs
of poles. Only ions with a specific m/z can pass to the detector at
each frequency.
Slide 10
Mass Spectrometry: finding m/z
Fourier transform MS:
  Ions are detected not by hitting a detector, but by passing by a
  detecting plate.
  Ions are detected simultaneously.
  Very high resolution.
  m/z is determined from the frequency of the ion in the cyclotron.
Slide 11
Why is simple MS not enough?
A biological sample consists of tens of thousands of species of
molecules.
  The resolution is not enough for clear separation.
  Biological interactions between the molecules may interfere with
  ionization.
The solution: multi-dimensional separation, combining MS with
  protein breakage by enzymatic digestion and collision
  decomposition
  electrophoresis
  chromatography
Slide 12
Tandem Mass Spec (MS/MS) for protein identification
(Picture provided by Prof. Junmin Peng (Emory))
Slide 13
2D gel MS/MS
Workflow shown in the figure:
  Control samples and treatment samples
  Differential spots
  In-gel digestion
  MS/MS
  Protein identification
Slide 14
2D gel, differential protein finding, in-gel digestion, MS/MS,
protein identification
(Int J Biol Sci 2007; 3:27-39)
Slide 15
LC/MS
Two dimensions: liquid chromatography retention time and
mass-to-charge ratio (m/z).
Take slices in retention time and send each to the MS.
Slide 16
LC/MS-MS
(Picture provided by Prof. Junmin Peng (Emory))
Slide 17
LC/MS-MS
Here is an example of an LC/MS spectrum: (a) original spectrum;
(b) square-root-transformed spectrum to show smaller peaks; (c) a
portion of the spectrum showing details.
The second MS serves the purpose of protein identification.
Matching the sequence found by the second MS is a problem of
sequence comparison and database search.
Peak quantification is done by the first MS.
Slide 18
Between proteomics and metabolomics
Proteomics uses LC/MS-MS. The second MS is for protein
identification.
Metabolomics uses LC/MS. Sometimes a second MS is used, but data
interpretation for metabolite identification is much harder.
What concerns statisticians:
(1) The shared LC/MS part:
  In metabolomics: quantification, identification
  In proteomics: quantification
(2) The second MS:
  Protein identification: sequence modeling/comparison
  Protein quantification: merging values from different peptides
  from the same protein.
Slide 19
Some computational issues in LC/MS-MS
  Modeling peaks.
  Noise reduction & peak detection.
  Multiple peaks from one molecule, caused by (1) isotopes and
  (2) multiple charge states.
  Retention time correction.
  Peak alignment.
  Peak quantification, especially with overlapping peaks caused by
  m/z sharing (mostly in metabolomics).
  From peptides to proteins.
Slide 20
General workflow for LC/MS
Slide 21
Modeling peaks
In high-resolution LC/MS data, every peak is a thin slice in the
m/z dimension, so there is no need to model the MS dimension.
Modeling the LC dimension is important for quantification. Models
have been developed for traditional LC data, and they can be applied
here. Most empirical peak shape models are derived from the Gaussian
model, with changes made to account for asymmetry in the peak shape.
Slide 22
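Modeling peaks
Asymmetric peak. Asymmetry factor: b/a, the ratio of the back
half-width b to the front half-width a, measured at 10% of the peak
height (0.1h).
(A. Felinger, Data Analysis and Signal Processing in
Chromatography)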
Slide 23
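Modeling peaks
The bi-Gaussian model:
$f(t) = h \exp\left(-\frac{(t - t_R)^2}{2\sigma_1^2}\right)$ for $t < t_R$, and
$f(t) = h \exp\left(-\frac{(t - t_R)^2}{2\sigma_2^2}\right)$ for $t \ge t_R$,
where $h$ is the peak height, $t_R$ the peak location, and
$\sigma_1$, $\sigma_2$ the standard deviations of the left and right
halves.
The area under the peak is:
$A = h \sqrt{\pi/2}\,(\sigma_1 + \sigma_2)$
(A. Felinger, Data Analysis and Signal Processing in
Chromatography)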
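As a quick check on the bi-Gaussian idea, here is a minimal Python sketch. It assumes the common form of the model: a Gaussian with standard deviation `sigma_left` before the apex and `sigma_right` after it, sharing one height. The function and parameter names are illustrative, not taken from the cited book. The closed-form area is compared against a numerical integral.

```python
import math

def bi_gaussian(t, height, t_peak, sigma_left, sigma_right):
    """Bi-Gaussian peak: two half-Gaussians joined at the apex t_peak."""
    sigma = sigma_left if t < t_peak else sigma_right
    return height * math.exp(-((t - t_peak) ** 2) / (2.0 * sigma ** 2))

def bi_gaussian_area(height, sigma_left, sigma_right):
    """Closed-form area: each half contributes height * sigma * sqrt(pi / 2)."""
    return height * math.sqrt(math.pi / 2.0) * (sigma_left + sigma_right)

def numeric_area(height, t_peak, sigma_left, sigma_right, step=0.001):
    """Trapezoidal integral over +/- 10 sigma around the apex."""
    lo = t_peak - 10.0 * sigma_left
    hi = t_peak + 10.0 * sigma_right
    n = int(round((hi - lo) / step))
    total = 0.0
    for i in range(n):
        a = lo + i * step
        fa = bi_gaussian(a, height, t_peak, sigma_left, sigma_right)
        fb = bi_gaussian(a + step, height, t_peak, sigma_left, sigma_right)
        total += 0.5 * step * (fa + fb)
    return total

# Hypothetical tailing peak: the right half is twice as wide as the left.
print(bi_gaussian_area(100.0, 1.0, 2.0))
print(numeric_area(100.0, 50.0, 1.0, 2.0))
```

Because the area is linear in the height and in the sum of the two widths, quantification from a fitted bi-Gaussian needs only those three parameters.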
Slide 24
Modeling peaks
Generalized exponential function
(A. Felinger, Data Analysis and Signal Processing in
Chromatography)
Slide 25
Modeling peaks
Log-normal function
(A. Felinger, Data Analysis and Signal Processing in
Chromatography)
Slide 26
Noise reduction
Reviewed by Katajamaa & Oresic (2007), J. Chromatogr. A 1158:318
Slide 27
Noise reduction
Signal-to-noise (S/N) ratio: where to make the cut? Should the
cutoff be a straight line or a smoothed curve?
(http://www.appliedbiomics.com/Service/Promotions/promotions.html)
Slide 28
Noise reduction & peak detection
Using filters to detect peaks from noise, in conjunction with a
hard cutoff.
(Anal Chem. 2006 Feb 1;78(3):779-87.)
Slide 29
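Noise reduction & peak detection
Matched filter. Calculate the convolution of the signal (x) with the
reverse of the standardized peak shape model (f). Find the peak
height alpha and the peak location tau that minimize the squared
error between the signal and the scaled, shifted model,
$\int [x(t) - \alpha f(t - \tau)]^2 \, dt$, by taking derivatives
with respect to alpha and tau and setting them to zero.
(A. Felinger, Data Analysis and Signal Processing in
Chromatography)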
Slide 30
Noise reduction & peak detection
With data from a Gaussian model, the above equations simplify. The
goal is to find the location tau where the filtered signal is
maximal. The corresponding alpha is the peak intensity.
(A. Felinger, Data Analysis and Signal Processing in
Chromatography)
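The matched-filter idea can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the procedure from the cited book: the peak model is a unit-height Gaussian of known width, the signal is zero outside the recorded range, and all names and the synthetic data are hypothetical. Correlating the signal with the peak shape (equivalently, convolving with its reverse) and dividing by the sum of squared kernel values gives the least-squares height at every candidate location. The best location is where that estimate is largest.

```python
import math
import random

def gaussian_kernel(sigma, half_width):
    """Unit-height Gaussian peak model sampled at integer offsets."""
    return [math.exp(-(j * j) / (2.0 * sigma * sigma))
            for j in range(-half_width, half_width + 1)]

def matched_filter(signal, kernel):
    """Least-squares peak height at each location (cross-correlation / sum of f^2)."""
    half_width = len(kernel) // 2
    norm = sum(v * v for v in kernel)
    n = len(signal)
    out = []
    for tau in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = tau + j - half_width
            if 0 <= idx < n:  # samples outside the record are treated as zero
                acc += signal[idx] * weight
        out.append(acc / norm)
    return out

# Synthetic chromatogram: one Gaussian peak plus small uniform noise.
rng = random.Random(0)
TRUE_TAU, TRUE_ALPHA, SIGMA = 80, 10.0, 3.0
signal = [TRUE_ALPHA * math.exp(-((t - TRUE_TAU) ** 2) / (2.0 * SIGMA ** 2))
          + rng.uniform(-0.2, 0.2) for t in range(200)]

alpha_hat = matched_filter(signal, gaussian_kernel(SIGMA, 12))
tau_hat = max(range(len(alpha_hat)), key=alpha_hat.__getitem__)
print("estimated location:", tau_hat)
print("estimated height:", round(alpha_hat[tau_hat], 2))
```

The filter output is much smoother than the raw signal, which is why a hard cutoff applied after filtering is less sensitive to single noisy samples.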
Slide 31
Noise reduction & peak detection Data Analysis and signal
processing in chromatography. A. Felinger
Slide 32
Retention time correction
With every run, the retention times in the LC dimension fluctuate
somewhat. Identify reliable peaks present in both samples, then use
non-linear curve fitting to adjust the retention time.
(Anal Chem. 2006 Feb 1;78(3):779-87.)
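A minimal sketch of this correction step follows. It assumes that a set of landmark peaks has already been matched between a reference run and a sample run, and it fits a quadratic polynomial to the retention-time deviation. The quadratic is a stand-in chosen for brevity; the cited paper may use a different smoother. All names and the synthetic drift are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear(a, b)

def evaluate(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def correct_retention_times(sample_rt, coeffs):
    """Subtract the fitted deviation from each observed retention time."""
    return [t - evaluate(coeffs, t) for t in sample_rt]

def true_drift(t):
    """Synthetic drift used only to build the example data."""
    return 0.05 + 0.02 * t - 0.001 * t * t

# Hypothetical landmark peaks (minutes) matched between the two runs.
sample_landmarks = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 10.0]
reference_landmarks = [t - true_drift(t) for t in sample_landmarks]

deviations = [s - r for s, r in zip(sample_landmarks, reference_landmarks)]
coeffs = fit_polynomial(sample_landmarks, deviations)
corrected = correct_retention_times(sample_landmarks, coeffs)
print([round(c, 4) for c in coeffs])
```

The fitted curve is applied to every peak in the sample run, not only to the landmarks, so the choice of reliable landmark peaks largely determines the quality of the correction.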
Slide 33
Multiple peaks from one molecule
Caused by multiple charge states (z = 1, 2, 3, ...) and different
numbers of carbon isotopes present in the molecule.
Example: m = 1000 (all C12)
  single charge: 1000, 1001, 1002, 1003
  2 charges: 500, 500.5, 501, 501.5
  3 charges: 333.33, 333.67, 334
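The example above can be reproduced with a short Python sketch. It follows the simplification used on the slide: each extra heavy isotope adds about 1 mass unit, and the mass of the charge carrier is ignored, so m/z = (m + k) / z. The function name and defaults are illustrative.

```python
def mz_ladder(monoisotopic_mass, max_charge=3, max_extra_neutrons=3):
    """m/z values for each charge state z and isotope count k.

    Simplified as on the slide: m/z = (m + k) / z, ignoring the mass of
    the charge carrier and using a spacing of 1 mass unit per isotope.
    """
    ladder = {}
    for z in range(1, max_charge + 1):
        ladder[z] = [round((monoisotopic_mass + k) / z, 2)
                     for k in range(max_extra_neutrons + 1)]
    return ladder

ladder = mz_ladder(1000.0)
for z, values in ladder.items():
    print(z, values)
```

The spacing between isotope peaks is 1/z, so the spacing observed in a spectrum indicates the charge state, which helps group the peaks that belong to one molecule.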
Slide 34
Peak alignment
Reviewed by Katajamaa & Oresic (2007), J. Chromatogr. A 1158:318
Peak alignment
First align the m/z dimension by binning. Then use kernel density
estimation to find meta-peaks.
(Anal Chem. 2006 Feb 1;78(3):779-87.)
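The two alignment steps can be sketched as follows. This is a simplified illustration, not the cited paper's implementation: peaks pooled from several samples are grouped into fixed-width m/z bins, and within each bin a Gaussian kernel density over retention time is evaluated on a grid, with local maxima taken as meta-peaks. The bin width, bandwidth, peak list and function names are all hypothetical.

```python
import math

def bin_by_mz(peaks, bin_width):
    """Group (mz, rt) peaks into m/z bins of fixed width; returns {bin_index: [rt, ...]}."""
    bins = {}
    for mz, rt in peaks:
        bins.setdefault(int(mz // bin_width), []).append(rt)
    return bins

def kde(points, grid, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated on `grid`."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points) / norm
            for g in grid]

def find_meta_peaks(rts, bandwidth=5.0, step=1.0):
    """Retention times at local maxima of the density: the meta-peak locations."""
    lo = min(rts) - 3.0 * bandwidth
    hi = max(rts) + 3.0 * bandwidth
    n = int((hi - lo) / step) + 1
    grid = [lo + i * step for i in range(n)]
    density = kde(rts, grid, bandwidth)
    return [grid[i] for i in range(1, n - 1)
            if density[i] > density[i - 1] and density[i] >= density[i + 1]]

# Hypothetical peaks (m/z, retention time in seconds) pooled from several samples.
peaks = [
    (300.10, 98.0), (300.12, 100.0), (300.14, 101.0), (300.11, 102.0),
    (300.13, 198.0), (300.12, 200.0), (300.10, 203.0),
    (450.30, 150.0), (450.31, 152.0), (450.30, 151.0),
]

bins = bin_by_mz(peaks, bin_width=0.25)
meta_peaks = {b: find_meta_peaks(rts) for b, rts in bins.items()}
print(meta_peaks)
```

The bandwidth controls how far apart two retention times can be and still merge into one meta-peak. Fixed bins can also split a feature that straddles a bin edge, which is a known limitation of this simple scheme.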
Slide 37
Dealing with overlapping peaks
(1) Matched filter.
(2) Some traditional methods.
(A. Felinger, Data Analysis and Signal Processing in
Chromatography)
Slide 38
Dealing with overlapping peaks
(3) Statistical modeling using the EM algorithm:
  Bi-Gaussian mixture
  Gaussian mixture
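A minimal sketch of the Gaussian-mixture case follows; the bi-Gaussian mixture is not covered here. It assumes an intensity profile sampled on a time grid and treats each intensity as a weight on its time point, so this is a weighted EM for two components. The starting values, the synthetic profile and all names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_gaussians(times, intensities, means, sigmas, n_iter=200):
    """Weighted EM for a two-component Gaussian mixture along the time axis."""
    means, sigmas = list(means), list(sigmas)
    weights = [0.5, 0.5]
    total = sum(intensities)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each time point.
        resp = []
        for x in times:
            p = [weights[k] * normal_pdf(x, means[k], sigmas[k]) for k in range(2)]
            s = p[0] + p[1]
            resp.append((p[0] / s, p[1] / s) if s > 0 else (0.5, 0.5))
        # M-step: intensity-weighted updates of mean, spread and mixing weight.
        for k in range(2):
            mass = sum(w * r[k] for w, r in zip(intensities, resp))
            means[k] = sum(w * r[k] * x
                           for w, r, x in zip(intensities, resp, times)) / mass
            var = sum(w * r[k] * (x - means[k]) ** 2
                      for w, r, x in zip(intensities, resp, times)) / mass
            sigmas[k] = math.sqrt(var)
            weights[k] = mass / total
    return weights, means, sigmas

# Synthetic overlapping peaks: areas 200 and 100, centers 40 and 50, sigma 3.
times = [20.0 + 0.5 * i for i in range(101)]
intensities = [200.0 * normal_pdf(t, 40.0, 3.0) + 100.0 * normal_pdf(t, 50.0, 3.0)
               for t in times]

weights, means, sigmas = em_two_gaussians(times, intensities,
                                          means=(35.0, 55.0), sigmas=(5.0, 5.0))
print([round(v, 3) for v in weights])
print([round(v, 2) for v in means])
print([round(v, 2) for v in sigmas])
```

Each mixing weight multiplied by the total intensity estimates the area of one of the overlapping peaks, which is the quantity needed for quantification.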
Slide 39
An example of the overall strategy in LC/MS metabolomics
(Anal Chem. 2006 Feb 1;78(3):779-87.)
Slide 40
Beyond LC/MS-MS
In a complex biological sample (cell, tissue, serum, ...), there
are several thousand proteins, which yield tens of thousands of
peptides after digestion; signal from less-abundant species may be
suppressed.
Solution: complexity must be reduced in order to identify and
quantify proteins.
Incorporate biochemical separation techniques:
  LC-MS/MS
  LC/LC-MS/MS
  2D gel-MS/MS
  2D gel/LC-MS/MS
  Affinity column separation LC-MS/MS
Separate proteins in multiple dimensions: this sacrifices speed.
Analyze a subset of proteins: this sacrifices coverage.