Introduction to Protein Chemistry October 2013 Gustavo de Souza IMM, OUS
Jan 13, 2016
Relevance of the Proteome
«The recipe of life»
Chocolate cake: egg, flour, sugar, baker’s yeast, chocolate
Biological relevance lies in how genes are expressed and translated into proteins, not in whether genes are present or not
Amino acid structure
AA side chains
Protein Translation
Peptide Bond
Primary Structure
Primary Structure
>sp|F2Z333|CA233_HUMAN Fibronectin type-III domain-containing transmembrane protein C1orf233
MRAPPLLLLLAACAPPPCAAAAPTPPGWEPTPDAPWCPYKVLPEGPEAGGGRLCFRSPAR
GFRCQAPGCVLHAPAGRSLRASVLRNRSVLLQWRLAPAAARRVRAFALNCSWRGAYTRFP
CERVLLGASCRDYLLPDVHDSVLYRLCLQPLPLRAGPAAAAPETPEPAECVEFTAEPAGM
QDIVVAMTAVGGSICVMLVVICLLVAYITENLMRPALARPGLRRHP
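The record above is in FASTA format: one header line starting with `>` (here UniProt style, `sp|accession|entry name` followed by a description), then the sequence wrapped over several lines. A minimal Python sketch of splitting such text into header and sequence; `parse_fasta` is a hypothetical helper that only handles the simple layout shown here.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # drop the leading '>'
        else:
            chunks.append(line)  # wrapped sequence lines are joined
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


record = """>sp|F2Z333|CA233_HUMAN Fibronectin type-III domain-containing transmembrane protein C1orf233
MRAPPLLLLLAACAPPPCAAAAPTPPGWEPTPDAPWCPYKVLPEGPEAGGGRLCFRSPAR
GFRCQAPGCVLHAPAGRSLRASVLRNRSVLLQWRLAPAAARRVRAFALNCSWRGAYTRFP
CERVLLGASCRDYLLPDVHDSVLYRLCLQPLPLRAGPAAAAPETPEPAECVEFTAEPAGM
QDIVVAMTAVGGSICVMLVVICLLVAYITENLMRPALARPGLRRHP"""

header, sequence = parse_fasta(record)[0]
accession = header.split("|")[1]  # UniProt-style header: db|accession|entry name
print(accession, len(sequence))
```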
Folding
Primary Structure - Folding
Folding
Proteins can adopt only a limited number of different protein folds
Secondary Structure
Tertiary Structure
Quaternary Structure
Primary to Quaternary
What is a «protein sample» in proteomics?
RNA-binding protein modules
Take home message
1. Proteins are the functionally active molecules in a cell.
2. They possess a high degree of chemical and structural heterogeneity.
3. This heterogeneity affects how a protein sample can be analyzed.
Challenges in Protein and Proteomic Analysis
A dangerous idea…
One gene, one protein
Homo sapiens
Complexity of Protein Samples in Eukaryotes
A less dangerous idea
One gene, several proteins (say, an average of 5 per gene)
Homo sapiens
Complexity of Protein Samples in Eukaryotes
PTMs
(modifications that control conformational changes in histones)
An even less dangerous idea
One protein, possibly 8 modification sites
Homo sapiens
But in reality…
One specific cell does NOT express all genes at once!
- Several transcriptomics studies indicate that the cells studied express ~14,000 transcripts at any given time
Homo sapiens
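The three "ideas" above can be turned into back-of-the-envelope arithmetic. A minimal Python sketch, assuming roughly 20,000 human protein-coding genes (an assumed round figure, not taken from the slides), the slide's average of 5 protein forms per gene, and that each of the 8 modification sites is independently either modified or unmodified, so one protein form yields 2^8 = 256 modification states. Each of the ~14,000 expressed transcripts is treated as one expressed gene, which is a simplification. Real PTM occupancy is far more constrained, so these are upper-bound illustrations only.

```python
def proteoform_count(genes, forms_per_gene=1, ptm_sites=0):
    """Upper-bound count of distinct protein species.

    Each modification site is treated as binary (modified or not),
    so one protein form with n sites gives 2**n modification states.
    """
    return genes * forms_per_gene * 2 ** ptm_sites


HUMAN_GENES = 20_000  # assumed round number of protein-coding genes
EXPRESSED = 14_000    # ~14,000 transcripts expressed at a given time

print(proteoform_count(HUMAN_GENES))        # one gene, one protein
print(proteoform_count(HUMAN_GENES, 5))     # ~5 forms per gene
print(proteoform_count(HUMAN_GENES, 5, 8))  # plus 8 binary PTM sites
print(proteoform_count(EXPRESSED, 5, 8))    # only the expressed genes
```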
Proteome Dynamics
While the genome is a relatively static element of an organism, the proteome changes according to cell type, developmental stage, response to stress, etc.
Proteome dynamics within the same cell
The proteome can change in response to even the slightest stimulus within a cell
Proteome chemical heterogeneity
DNA - Negatively charged molecule
Has the same physicochemical features regardless of its nucleotide sequence, its tissue source, its donor source, the species of the donor, etc.
Amino acid structure
AA side chains
Proteome chemical heterogeneity
Membrane proteins
Proteome dynamic range
Genome: individual genes are mostly present in equimolar amounts in a DNA molecule
Transcription/Translation
Protein concentration within a cell is unique to each individual protein
Difference between the most and least abundant molecule = dynamic range
Proteome dynamic range
Geiger et al., MCP 2012
The dynamic range of a proteome is estimated to be around 10^8 (in serum, it is believed to be over 10^10)
Figure: difference between the most and least abundant proteins (y-axis: protein abundance)
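Dynamic range is simply the ratio between the most and least abundant species, usually quoted in orders of magnitude (10^8 means a factor of one hundred million). A minimal Python sketch; the copy numbers below are hypothetical values chosen for illustration, not measurements from Geiger et al.

```python
import math


def dynamic_range(abundances):
    """Ratio and orders of magnitude between the most and least abundant entries.

    Zero values (undetected proteins) are ignored, since they give no finite ratio.
    """
    detected = [a for a in abundances if a > 0]
    ratio = max(detected) / min(detected)
    return ratio, math.log10(ratio)


# Hypothetical copy numbers per cell, for illustration only
copy_numbers = [1e8, 2.5e6, 4e4, 120.0, 1.0]
ratio, orders = dynamic_range(copy_numbers)
print(f"ratio = {ratio:.0e}, i.e. {orders:.0f} orders of magnitude")
```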
Protein GO classification
Cytoskeleton (actin, tubulin, vimentin)
Chaperones (hsp60, hsp70, calreticulin)
Mitochondria (respiratory chain)
Metabolism (glycolysis, ribosomal)
Structure
Nucleus (histones)
Organelles
Signalling pathway proteins, transcription factors, etc.
Proteome dynamic range
Instrumentation
Aebersold & Mann, Nature 2003
Instrumentation
- Instruments with different hardware generate different types of raw data.
- Different vendors developed different file formats, each requiring its own library to read.
- This led to the development of many specialized software tools using specific computational protocols.
- There is no standard routine.
Take home message
1. Proteomic composition is at least 6x more complex than the genomic composition of a cell, if only the number of entities is considered.
2. It is ever-changing, subject to spatial and temporal constraints.
3. Chemical properties and dynamic range have a significant impact on the success rate of identification by proteomic methods.
4. Instrumentation and analysis are not standardized.
Introduction to Mass Spectrometry
Interpreting peptide/protein data
Figure: 3D quadrupole ion trap and linear quadrupole ion trap
Let’s talk about… physics
What is it?
- An instrument that measures the mass-to-charge ratio (m/z) of ions (or ionized molecules).
a) Ionization must generate ions in the gas phase
b) The detected ion signal is proportional to the ion’s abundance in the sample
c) MS operates at extremely low pressures (vacuum)
- Almost any molecule can be ionized: small organic/inorganic chemicals (less than 300 Da), average-sized peptides or DNA fragments, intact proteins.
Mass Spectrometry Scheme
Inlet → Ion Source → Mass Analyzer → Detector
- Inlet: LC
- Ion source: MALDI, ESI
- Mass analyzer: Time-of-Flight, Quadrupole, Ion Trap
Ion Intensity = Ion abundance
Isotopes
Normally observed in nature.
Mass difference ≈ 1 Da (between 13C and 12C: 1.00335 Da)
What to expect from a mass spectrum
Figure: schematic spectrum (x-axis: m/z; y-axis: intensity)
Avogadro’s number = 6.022 × 10^23 mol^-1
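Avogadro’s number is a reminder that even tiny sample amounts contain many molecules, so a single peak reflects a large population of ions. A minimal Python sketch converting amounts of substance into molecule counts; the pmol/fmol/amol amounts are illustrative orders of magnitude, not values from the slides.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact value in the 2019 SI)


def molecules(amount_mol):
    """Number of molecules in a given amount of substance (in mol)."""
    return amount_mol * AVOGADRO


# Illustrative sample amounts
for label, amount in [("1 pmol", 1e-12), ("1 fmol", 1e-15), ("1 amol", 1e-18)]:
    print(f"{label}: {molecules(amount):.2e} molecules")
```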
Figure: zoomed full MS scan (raw file 100331_Gustavo_Tuberculosis_179rif_Rep1_07, scan #2435, RT 38.32, FTMS + p NSI Full ms [300.00-2000.00]); x-axis m/z 1033-1037.5, y-axis relative abundance; isotope peaks at m/z 1034.49, 1034.99, 1035.49 and 1035.99
- Isotopes (12C, 13C, 14N, 15N)
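Because neighbouring isotope peaks differ by about 1 Da (mostly one 13C instead of 12C), on the m/z axis they sit about 1/z apart. This gives the charge directly from the cluster: the 0.5 spacing in the spectrum above means 2+. A minimal Python sketch; `charge_from_spacing` is a hypothetical helper that assumes all peaks passed in belong to one isotope cluster.

```python
C13_C12_DIFF = 1.003355  # mass difference between 13C and 12C, in Da


def charge_from_spacing(mz_peaks):
    """Estimate the charge z from adjacent isotope peaks of one cluster."""
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        raise ValueError("need at least two isotope peaks")
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(C13_C12_DIFF / mean_gap)  # spacing ≈ 1.003355 / z


print(charge_from_spacing([1034.49, 1034.99, 1035.49, 1035.99]))  # 2
```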
Peptide mass spectrum
How is a sample ionized?
- Electron ionization
- Chemical ionization
- Fast atom/ion bombardment
- Field desorption
- Plasma desorption
- Laser desorption and MALDI
- Thermospray
- Electrospray
- Atmospheric pressure chemical ionization
Matrix Assisted Laser Desorption Ionization
Peptide spectrum on MALDI
Protein spectrum on MALDI
A little history…
1985 – First use: up to a 3 kDa peptide could be ionized
1987 – Method to ionize intact proteins (up to 34 kDa) described
Instruments had no sequencing capability
1989 – ESI is used for biomolecules (peptides)
Sequencing capability, but low sensitivity
1994 – Term «Proteome» is coined
1995 – LC-MS/MS is implemented
«Gold standard» of proteomic analysis
A little history…
- Laborious
- Low reproducibility
- Time-consuming
- Low sensitivity
- Limited number of identifications
Figure: nano-LC-ESI setup
- Sample loading: 500 nl/min
- Gradient elution: 200 nl/min
- Column (75 µm) / spray tip (8 µm), 15 cm long
- Reverse-phase C18 beads, 3 µm
- Platinum wire: 2.0 kV
- No precolumn or split
Fenn et al., Science 246:64-71, 1989.
Electrospray Ionization
ESI multiply charged species
Figure: peptides carry a few positive charges (e.g. on -NH2 groups), while intact proteins carry many
ESI multiply charged species
Figure: the same 1000 Da species observed at m/z 500.5 (2+), 334.0 (3+) and 250.75 (4+) (x-axis: m/z; y-axis: intensity)
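An ion carrying z extra protons appears at m/z = (M + z·m_p)/z, where M is the neutral mass and m_p ≈ 1.007 Da is the proton mass. The values on this slide are consistent with a neutral mass of about 999 Da (a singly protonated ion near 1000 Da) and the proton mass rounded to 1 Da. A minimal Python sketch of that relation; the function name `mz` is illustrative.

```python
PROTON = 1.007276  # proton mass in Da


def mz(neutral_mass, z):
    """m/z of the [M + zH]^z+ ion for a neutral mass M and charge z."""
    return (neutral_mass + z * PROTON) / z


# Neutral mass ~999 Da, i.e. [M+H]+ close to 1000 Da
for z in (1, 2, 3, 4):
    print(z, round(mz(999.0, z), 2))
```

Higher charge states push the same molecule to lower m/z, which is why ESI can bring large peptides and proteins into the m/z range of the analyzer.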
Figure: full MS scan (raw file 100331_Gustavo_Tuberculosis_179rif_Rep1_09, scan #3828, RT 56.72, FTMS + p NSI Full ms [300.00-2000.00]); x-axis m/z 400-2000, y-axis relative abundance; labelled peaks at m/z 766.72, 867.95, 653.33, 578.64, 1149.57, 709.06, 557.31, 1063.09, 728.39, 1227.11, 921.51, 1891.35, 1682.72, 483.80, 1453.23, 1346.65 and 343.21
Figure: zoom on m/z 1148-1152 of the same scan (x-axis: m/z; y-axis: relative abundance); isotope peaks at m/z 1149.07, 1149.57, 1150.07, 1150.57, 1151.08, 1151.57 and 1152.08
Figure: zoom on m/z 765-769.5 of the same scan (x-axis: m/z; y-axis: relative abundance); labelled peaks at m/z 764.90, 765.39, 766.04, 766.38, 766.72, 767.05, 767.38, 767.43, 767.72, 768.05 and 768.39
Isotope peak spacing: 0.5 m/z units (2+), 0.33 m/z units (3+)
Mr = 2297.14 Da
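Working backwards from an observed ion gives the neutral mass: M = z·(m/z) - z·m_p. Both the 3+ peak at m/z 766.72 and the 2+ peak at m/z 1149.57 give the same Mr within about 0.01 Da, confirming they are two charge states of one peptide. A minimal Python sketch; `neutral_mass` is an illustrative name.

```python
PROTON = 1.007276  # proton mass in Da


def neutral_mass(mz_value, z):
    """Neutral mass M from an observed [M + zH]^z+ ion."""
    return z * mz_value - z * PROTON


print(f"{neutral_mass(766.72, 3):.2f}")   # 2297.14 (3+ peak)
print(f"{neutral_mass(1149.57, 2):.2f}")  # 2297.13 (2+ peak)
```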
Peptides on ESI
ESI of intact protein *
How is an ion mass measured?
Time-of-flight
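In a time-of-flight analyzer, ions accelerated through the same potential U acquire the same kinetic energy zeU = ½mv², so their drift time over a field-free length L is t = L·sqrt(m/(2zeU)), which grows with the square root of m/z. A minimal Python sketch; the 1 m flight tube and 20 kV accelerating voltage are illustrative assumptions, not values from the slides, and real instruments add reflectrons and calibration.

```python
import math

E = 1.602176634e-19     # elementary charge, C (exact SI value)
DA = 1.66053906660e-27  # unified atomic mass unit, kg


def flight_time(mass_da, z, length_m=1.0, voltage_v=20_000.0):
    """Field-free drift time (s) of an ion accelerated through voltage_v."""
    m = mass_da * DA
    v = math.sqrt(2 * z * E * voltage_v / m)  # from z*e*U = (1/2)*m*v**2
    return length_m / v


for mass, z in [(1000.0, 1), (2000.0, 1), (2000.0, 2)]:
    print(mass, z, f"{flight_time(mass, z) * 1e6:.2f} us")
```

Doubling the mass at the same charge lengthens the flight time by a factor of sqrt(2), while a 2000 Da ion at 2+ arrives together with a 1000 Da ion at 1+ because both share the same m/z.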
How is an ion mass measured?
Quadrupoles (RF)
How is an ion mass measured?
Orbitraps
Tandem Mass Spectrometry
Single-stage MS: Inlet → Ion Source → Mass Analyzer → Detector
Tandem MS: Ion Source → Mass Analyzer → Collision cell → Mass Analyzer → Detector
Data Dependent Acquisition
Figure: precursor ion at m/z 899.013
MS1 (or MS)
MS2 (or MS/MS)
*
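In data-dependent acquisition, each MS1 survey scan is used to pick precursors (such as m/z 899.013 above) that are then isolated and fragmented to record MS2 spectra. A simplified top-N selection with dynamic exclusion, as a sketch: the function name, the exact-m/z exclusion (real instruments use an m/z tolerance and an exclusion time window) and the intensities are all hypothetical.

```python
def select_precursors(ms1_peaks, top_n, excluded, min_intensity=0.0):
    """Top-N precursor selection with a simple dynamic exclusion list.

    ms1_peaks: (mz, intensity) pairs from one MS1 survey scan.
    excluded:  set of m/z values (rounded to 0.01) already sent to MS2;
               it is updated in place so later cycles skip them.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in ms1_peaks
        if intensity >= min_intensity and round(mz, 2) not in excluded
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)  # most intense first
    chosen = [mz for mz, _ in candidates[:top_n]]
    excluded.update(round(mz, 2) for mz in chosen)
    return chosen


# Hypothetical survey scan: (m/z, intensity)
survey = [(899.013, 5e6), (766.72, 1e7), (1149.57, 2e6), (653.33, 8e5)]
exclusion = set()
print(select_precursors(survey, top_n=2, excluded=exclusion))  # [766.72, 899.013]
print(select_precursors(survey, top_n=2, excluded=exclusion))  # [1149.57, 653.33]
```

Without the exclusion list, the same abundant ions would be fragmented in every cycle and low-abundance peptides would rarely be selected.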
Important Parameters in MS
- Resolution
- Sensitivity
- Dynamic range…
Figure: panels of a 2+ ion (x-axis: m/z)
High resolution in MS
1. Mass accuracy
Figure: expected mass vs. observed mass
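Mass accuracy is usually reported as a relative error in parts per million: (observed - expected)/expected × 10^6. A minimal Python sketch; `ppm_error` and `da_window` are illustrative names. The second function converts a ppm error back into daltons, e.g. 65.8 ppm (the average ion trap error quoted in this section) corresponds to about 0.066 Da at 1000 Da.

```python
def ppm_error(observed, expected):
    """Relative mass error in parts per million (ppm)."""
    return (observed - expected) / expected * 1e6


def da_window(mass, ppm):
    """Absolute mass error (Da) corresponding to a ppm error at a given mass."""
    return mass * ppm / 1e6


print(round(ppm_error(1000.002, 1000.000), 2))  # 2.0
print(round(da_window(1000.0, 65.8), 4))        # 0.0658
```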
High resolution in MS
1. Mass accuracy
Figure: mass error (Error [ppm]) vs. mass (Mass [Da]) for four instruments:
- Ion trap (LTQ): Av. = 65.8 ppm ± 71.5
- qTOF (QSTAR): Av. = 16.5 ppm ± 11.2
- FTICR MS SIM (LTQ-FT, 50K) and FTICR MS (LTQ-FT, 500K): Av. = 2.1 ppm ± 1.9; Av. = 0.68 ppm ± 0.47
Figure: RT vs. m/z maps with 2+ and 3+ ions
2. Peak separation
High resolution in MS
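Resolving power is commonly defined as R = m/Δm, with Δm often taken as the peak width at half maximum. Since isotope peaks of a z-charged ion are about 1.00336/z apart on the m/z axis, a rough minimum R to separate them is (m/z)·z/1.00336. A minimal Python sketch using the 3+ and 2+ ions from the ESI spectrum earlier; the function names are illustrative, and in practice more resolving power is needed to separate peaks of different co-eluting ions.

```python
C13_C12_DIFF = 1.003355  # mass difference between 13C and 12C, in Da


def resolving_power(mz, delta_mz):
    """R = m/Δm needed to separate two peaks delta_mz apart at a given m/z."""
    return mz / delta_mz


def isotope_resolution_needed(mz, z):
    """Rough minimum R to separate adjacent isotope peaks of a z-charged ion."""
    return resolving_power(mz, C13_C12_DIFF / z)


print(round(isotope_resolution_needed(766.72, 3)))   # 3+ cluster
print(round(isotope_resolution_needed(1149.57, 2)))  # 2+ cluster
```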
LC-MS/MS
With all we (hopefully) learned so far
1) Use a strong detergent for cell lysis and protein solubilization (SDS, Triton, NP40, Tween)
2) LysC (cleaves at the C-terminal side of K) and/or trypsin (C-terminal side of K and R)
With all we (hopefully) learned so far
ADFFFSTTHAASRMSHHHGTYYPPHKRFSDDDDT
ADFFFSTTHAASRMSHHHGTYYPPHKFSDDDT
Figure labels: Arg (+), Lys (+)
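The cleavage rules above can be applied in silico to predict which peptides a digest should produce. A minimal Python sketch using the first example sequence; `digest` is a hypothetical helper. It implements only the simple rule stated on the slide (cleave C-terminal to K for LysC, to K or R for trypsin); the "no cleavage before proline" exception often applied to trypsin is included as an option, off by default, and missed cleavages are not modelled.

```python
def digest(sequence, sites="KR", skip_before_proline=False):
    """In-silico digestion: cleave C-terminal to any residue in `sites`.

    sites="KR" mimics trypsin, sites="K" mimics LysC.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue not in sites or i + 1 == len(sequence):
            continue  # not a cleavage site, or already at the C-terminus
        if skip_before_proline and sequence[i + 1] == "P":
            continue  # optional proline rule
        peptides.append(sequence[start:i + 1])
        start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


protein = "ADFFFSTTHAASRMSHHHGTYYPPHKRFSDDDDT"
print(digest(protein))             # trypsin: K and R
print(digest(protein, sites="K"))  # LysC: K only
```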
With all we (hopefully) learned so far
3) Nano-LC (300 nL/min)
5) Quadrupole-Orbitrap (QExactive)
With all we (hopefully) learned so far
Mobile phase A = 5% organic solvent in water; mobile phase B = 95% organic solvent in water
C18 column, 25 cm long
Figure: elution over time (annotated "20 s") with the precursor at m/z 899.013
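With mobile phase A at 5% and B at 95% organic solvent, the organic fraction reaching the column is a weighted mix of the two, and peptides elute from the C18 phase as that fraction rises. A minimal Python sketch; the 5-35% B linear gradient over 90 min is a hypothetical example, not the method used in these slides.

```python
A_ORGANIC = 0.05  # mobile phase A: 5% organic solvent in water
B_ORGANIC = 0.95  # mobile phase B: 95% organic solvent in water


def organic_fraction(percent_b):
    """Organic solvent fraction delivered to the column at a given %B."""
    b = percent_b / 100.0
    return A_ORGANIC * (1 - b) + B_ORGANIC * b


def linear_gradient(t_min, start_b=5.0, end_b=35.0, duration_min=90.0):
    """%B at time t for a hypothetical linear gradient (clamped to its duration)."""
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


for t in (0, 45, 90):
    pb = linear_gradient(t)
    print(t, pb, round(organic_fraction(pb) * 100, 1))
```

Note that %B on the pump is not the same as % organic solvent: 5% B here still delivers 9.5% organic solvent.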
With all we (hopefully) learned so far
MS1 (or MS)
MS2 (or MS/MS)
With all we (hopefully) learned so far
Quadrupole
Orbitrap
From Michalski et al., MCP 10, 2011.
With all we (hopefully) learned so far
172,800
Take home message
- Great diversity of hardware and principles: different forms of ionization and mass measurement.
- For protein identification, the mass of an intact peptide and the masses of its fragments are enough to provide identification.
- Mass spectrometry is used to analyze the molecular mass of molecules.