An ABC of Proteomics
The aim of the lecture is to introduce you to the basic methods used in modern proteomics research. Afterwards you should be able to understand current literature and papers that refer to the use of these techniques. It is a short overview; for a deeper introduction, please look at: Principles of Proteomics, R.M. Twyman, ISBN 978-1859962732, BIOS Scientific Publishers (2004), and Principles and Practice of Biological Mass Spectrometry, C. Dass, ISBN 978-0-471-33053-0, Wiley-Interscience (2006). If you are seriously considering using proteomics techniques in the lab, then the following (expensive) texts are highly recommended: Proteins and Proteomics: A Laboratory Manual, Richard Simpson, ISBN 0-87969-554-4, Cold Spring Harbor Laboratory Press (2002), and Purifying Proteins for Proteomics: A Laboratory Manual, Richard Simpson, ISBN 0-87969-696-6, Cold Spring Harbor Laboratory Press (2004).
An OMICS Glossary
• Genomics - the study of the set of genes contained in the chromosomes
• Transcriptomics - the study of the set of mRNA molecules being expressed in a specific cell at a given time under specified conditions
• Proteomics - the study of the set of proteins being expressed in a specific cell at a given time under specified conditions, and their state of modification
• Metabolomics - the study of the set of small molecules in a specific cell at a given time under specified conditions
These are the basic definitions that cover modern-day approaches to biology. From clinical applications to basic instrument development, these subjects now serve as the cornerstones upon which systems biology is built.
Genomics
The Genome is the set of genes contained in the chromosomes
−The human genome is contained in 23 pairs of chromosomes (1-22 + X/Y)
−Each chromosome is one long molecule of DNA
−Every cell has the same genome (except in cancer)
The genome is relatively stable. Although it is huge (about 3,000 megabases), it is now relatively simple to analyse, though expensive. For example, see the paper describing the first diploid sequence of an individual, the personal genome of Craig Venter (Levy et al. The Diploid Genome Sequence of an Individual Human. PLoS Biol. 2007 Sep 4;5(10):e254).
The Genome
Analogy: the complete works of an author in a partially understood language
Two approaches
Page by page
All at once
Genomics began with the goal of sequencing entire genomes. To accomplish this task, two different sequencing approaches were developed. These methods can be thought of in the following way: Imagine that you have the complete works of an author, written in a language that you studied in school, but never became fluent in. Moreover, the books are in such bad shape that if you open them, they disintegrate. You have two alternatives. You can remove one page at a time, preserve it and decipher it. Or you can open all the books at once and then pick up the fragments of paper and use the words on them to figure out how they fit together.
Page-by-page sequencing strategy
Sequence = determining the letters of each word on each piece of paper
Assembly = fitting the words back together in the correct order
The page-by-page approach to sequencing the human genome was used by the public genome-sequencing consortium. This group first figured out how all the pages fit together and then deciphered all the words on each page. Finally, it assembled the pages back together to produce the whole genome. The advantage of this approach is that it is very precise. The disadvantage is that it takes a long time.
Technical foundations of genomics
Molecular biology: recombinant-DNA technology
DNA sequencing
Library construction
PCR amplification
Hybridisation techniques
[Figure: plot of log MW against distance]
Almost all of the underlying techniques of genomics originated with molecular biology, or recombinant-DNA technology. In particular, almost all DNA sequencing is still performed using the chain-termination approach pioneered by Sanger, for which he won his second Nobel Prize. Also essential to high-throughput sequencing is the ability to generate libraries of genomic clones and then cut portions of these clones and introduce them into other vectors. These techniques were developed in the 1970s by a number of scientists, including Maniatis and Cohen. The polymerase chain reaction (PCR), developed in the 1980s to amplify DNA, is another technique at the core of genomics approaches. Finally, the hybridization of one nucleic acid to another in order to detect and quantify DNA and RNA was pioneered by Southern and Alwine in the mid- to late 1970s. This method remains the basis for genomics techniques such as microarrays.
Genomics relies on high-throughput technology
Automated sequencers
Robotics
Microarray spotters
Colony pickers
High-throughput genetics
What genomics added to these recombinant-DNA techniques was automation. The innovation that made the greatest impact on genomic sequencing was the use of fluorescent dyes and capillaries in an automated sequencing system. Pictured in the slide is Applied Biosystems’ ABI 3700, which has been the most widely used instrument for large-scale sequencing. It has 96 capillaries that are fed by robotic loading from two 384-well microtiter plates. It makes a sequence run every two to three hours and can read, on average, 600–700 bases per capillary per run. Celera, the company that produced a rough draft of the human genome in three years, used 200 of these machines running 24/7 to do so. Similarly, automation was applied to spotting DNA onto slides to make microarrays, and to identifying and isolating bacterial colonies to grow up DNA for sequencing. While initially applied to improving genomics techniques, high-throughput approaches are now permeating much of biology. An example is the use of robots to automate genetic screens for new mutants.
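The throughput figures quoted above can be turned into a rough daily output. The sketch below is a back-of-the-envelope estimate only: it assumes the 600–700 bases are read per capillary, takes the midpoints of the quoted ranges, and assumes the machines run continuously with no downtime.

```python
# Rough daily output of capillary sequencers, using the figures quoted above.
# Assumptions: bases are read per capillary, midpoints of the quoted ranges,
# continuous operation with no time lost to loading or maintenance.
CAPILLARIES = 96
BASES_PER_READ = 650      # midpoint of 600-700 bases
HOURS_PER_RUN = 2.5       # midpoint of two to three hours
MACHINES = 200            # number of machines quoted for Celera


def bases_per_day(capillaries: int, bases_per_read: float, hours_per_run: float) -> float:
    """Bases read by one instrument in 24 hours."""
    runs_per_day = 24 / hours_per_run
    return capillaries * bases_per_read * runs_per_day


per_machine = bases_per_day(CAPILLARIES, BASES_PER_READ, HOURS_PER_RUN)
print(f"one machine:  {per_machine / 1e6:.2f} Mb/day")                    # 0.60 Mb/day
print(f"{MACHINES} machines: {MACHINES * per_machine / 1e6:.0f} Mb/day")  # 120 Mb/day
```

On these assumptions a single instrument reads about 0.6 Mb a day, and a fleet of 200 about 120 Mb a day.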
All-at-once sequencing strategy
Find small pieces of paper
Decipher the words on each fragment
Look for overlaps to assemble
The biotechnology company Celera used the other method, called “whole genome shotgun sequencing,” in its competing effort to sequence the human genome. This method is equivalent to figuring out what’s written on all the fragments of paper from all of the volumes and then figuring out how they piece together. To do this procedure effectively requires starting with several copies of each volume so that overlaps among the fragments can be found. The number of original copies is referred to as “coverage.” To produce a high-quality sequence by this method usually requires eight- to tenfold coverage. The disadvantage of this method is that you rarely get the whole sequence to line up. The advantage is that the portion of the sequence that does line up is acquired much more rapidly than via the page-by-page method.
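The idea of coverage can be made quantitative. The minimal sketch below assumes reads start at uniformly random positions, in which case the expected fraction of bases hit by no read at all is e^(-c) for c-fold coverage (the Lander-Waterman, or Poisson, model). The 3 Gb genome size and the read length are assumptions for illustration.

```python
import math


def fold_coverage(num_reads: int, read_length: int, genome_size: float) -> float:
    """Coverage = total bases sequenced / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: float) -> int:
    """Number of reads needed to reach a target fold coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of bases covered by no read, assuming reads start
    at uniformly random positions (Poisson / Lander-Waterman model)."""
    return math.exp(-coverage)


GENOME_SIZE = 3.0e9   # approximate human genome size in bases (assumption)
READ_LENGTH = 650     # typical capillary read length (assumption)

for c in (1, 8, 10):
    n = reads_needed(c, READ_LENGTH, GENOME_SIZE)
    print(f"{c:>2}x coverage: {n:,} reads, "
          f"{100 * fraction_unsequenced(c):.3f}% of bases expected to be missed")
```

Under this idealised model, 1× coverage leaves about 37% of the bases unread, whereas 8–10× leaves only a few hundredths of a percent or less. In real genomes, repeated sequences make assembly harder than this simple model suggests.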
Shotgun genomics
Generates ca 100 Mb per day; the whole human genome is ca 3 Gb
This instrument was released onto the market in August 2006. The principle is to chop DNA into small strands and to encapsulate one strand with a bead in an oil droplet in an emulsion. The single DNA molecule is amplified and the resulting identical DNA copies are attached to the bead. Bases are then added one at a time; each incorporation releases pyrophosphate (PPi), which drives a light-producing (luciferase) reaction, and by measuring the light output short sequences of ca 100 bp can be read. There is only one bead in each well, but there are hundreds of thousands of wells, so in a single 4-hour instrument run some 25 million bases of sequence can be read. The genome of Mycoplasma genitalium was sequenced in one run at over 99.4% accuracy. Reference for the method: Margulies et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005 Sep 15;437(7057):376-80.
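The run figures above can be checked against one another with simple arithmetic. This sketch assumes one read per occupied well, back-to-back runs with no set-up time, and a human genome of about 3 Gb; the 8× target borrows the shotgun coverage figure given earlier.

```python
BASES_PER_RUN = 25e6   # bases read in one run (quoted above)
HOURS_PER_RUN = 4      # run time (quoted above)
READ_LENGTH = 100      # approximate read length in bases (quoted above)


def days_to_coverage(target_coverage: float, genome_size: float, bases_per_day: float) -> float:
    """Days of sequencing needed to reach a given fold coverage."""
    return target_coverage * genome_size / bases_per_day


runs_per_day = 24 / HOURS_PER_RUN
reads_per_run = BASES_PER_RUN / READ_LENGTH        # ~100-base reads, i.e. occupied wells
max_bases_per_day = BASES_PER_RUN * runs_per_day   # ignores set-up time between runs

print(f"{reads_per_run:,.0f} reads per run")                           # 250,000 reads per run
print(f"at most {max_bases_per_day / 1e6:.0f} Mb per day")             # at most 150 Mb per day
print(f"{days_to_coverage(8, 3.0e9, 100e6):.0f} days for 8x of 3 Gb")  # 240 days for 8x of 3 Gb
```

The 250,000 reads per run agree with the "hundreds of thousands of wells", and six back-to-back runs would give up to 150 Mb a day, consistent with the quoted ca 100 Mb once set-up time is allowed for. At that rate, eight-fold coverage of a human genome would still take the best part of a year on one instrument.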
What is Proteomics?
Expression Proteomics- Define all gene products present in a cell and their modifications
Cell-Map Proteomics- Define the spatial & temporal positions of all proteins and interactions
Functional Proteomics- Define the biological function of all proteins within their networks and complexes
Structural Proteomics- Determine the structure of all proteins, alone and in complexes
Population Proteomics- Large scale version of expression proteomics for disease studies
There has been a rush to define everything using an -omics suffix, most of which is unnecessary and will disappear with time as the focus goes back to understanding the principles of biology that underlie most events studied today.
Proteomics ABC
23,000 genes in the genome, but ca 1,000,000 proteins, owing to exon splicing and 300+ post-translational modifications
Dynamic range: cell 10^6, plasma 10^12
The dynamic proteome: temporal (milliseconds to months),
spatial (cell, organelle),
developmental (100+ cell types in the body, years)
All proteins exist in dynamic complexes
This determines their function and is highly dynamic
The point here is that the genome comprises just 46 DNA molecules (chromosomes) per cell, and mRNA is found at between 10 and 1,000 copies per cell. Both can be amplified using PCR. Proteins, however, cannot be amplified and are found at between 1 and 1,000,000 copies per cell, while in blood plasma their concentrations span a range of about 10^12.
The Two Philosophies
‘Traditional’ proteomics: protein separation, digestion and MS
Usually two-dimensional electrophoresis, spot picking and digestion
Disadvantages: Lab-to-lab reproducibility, depth of coverage, non-automatable
‘Shotgun’ proteomics: digest down to peptides, multidimensional separation and MS
Usually ion-exchange separation followed by reverse phase HPLC-MS
Allows modification-specific targeting (isolation of phosphopeptides etc)
Advantages: depth of coverage, automatable
Disadvantages: uncertainty in identification, slow, expensive
The two philosophies mirror the two genome-sequencing strategies: a one-by-one (page-by-page) approach and a shotgun (all-at-once) approach.
Page-by-page
Protein Separation: 2D Electrophoresis
Proteins are chemically very heterogeneous, and there is no single separation method for them comparable to electrophoresis for DNA, which can resolve DNA fragments over 1,000 bases long that differ in length by only one base. Proteins are therefore hard to separate; the highest-resolution method is to separate them first by charge (isoelectric focusing, the first dimension) and then by size (SDS-PAGE, the second dimension).
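Separation by charge in the first dimension relies on each protein's isoelectric point (pI), the pH at which its net charge is zero. A common way to estimate it from sequence, sketched below, is to sum the Henderson-Hasselbalch charge contributions of the ionisable groups and search for the pH where the total crosses zero. The pKa values are an assumption (one approximate set; published scales differ by a few tenths of a pH unit), and real proteins also deviate because of modifications and folding.

```python
# Approximate pKa values (assumption: one commonly used set; scales differ).
PKA_BASIC = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    # Basic groups: fraction protonated (charge +1) falls as pH rises.
    charge = 1 / (1 + 10 ** (ph - PKA_BASIC["N_term"]))
    for aa in "KRH":
        charge += seq.count(aa) / (1 + 10 ** (ph - PKA_BASIC[aa]))
    # Acidic groups: fraction deprotonated (charge -1) grows as pH rises.
    charge -= 1 / (1 + 10 ** (PKA_ACIDIC["C_term"] - ph))
    for aa in "DECY":
        charge -= seq.count(aa) / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection between pH 0 and 14.
    Net charge falls steadily with pH, so there is a single crossing."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


for peptide in ("KKKK", "DDDD", "ALKGWK"):
    print(f"{peptide}: pI ~ {isoelectric_point(peptide):.2f}")
```

A lysine-rich peptide comes out basic and an aspartate-rich one acidic, which is what places them at opposite ends of the focusing strip.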
The Current High-Tech Analyser
The left-hand photo shows the protein mixture or cell extract being loaded on the first-dimension focusing instrument. The right-hand photo shows the proteins in the gel strip, after removal from the focusing device, being loaded onto the second-dimension gel for separation by size.
Two-Dimensional Gel Based Proteomics
[Figure: 2D gel protein patterns of disease tissue and healthy tissue, with spots numbered 1 to 40]
The images show the protein patterns of extracts from normal and diseased tissue. The pattern alone can show which state the tissue is in, and the spots that change represent the differences between the two. Only those spots which change are cut out for further analysis.
Protein Fingerprinting
Enzyme digest
Peptide masses
Proteins can be identified in simple mixtures by digesting them with an enzyme and then measuring the masses of the peptides formed. The set of masses is called the peptide mass fingerprint. A database is built containing all the proteins encoded in the species' genome, and the masses of all the peptides that a given enzyme would produce from each protein are calculated, so that each protein has a theoretical peptide fingerprint. The experimental fingerprint is then compared with all the theoretical fingerprints and the best match is determined; this should be the identity of the unknown protein.
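The comparison of experimental and theoretical fingerprints can be sketched as a simple count of matching masses. This is a deliberately minimal version: the protein names and masses are made up, the 50 ppm tolerance is an assumption, and real search engines use probability-based scores that also account for how likely a chance match is.

```python
def count_matches(experimental, theoretical, tol_ppm=50.0):
    """Number of experimental masses lying within tol_ppm of a theoretical mass."""
    return sum(
        any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for m in experimental
    )


def rank_candidates(experimental, database, tol_ppm=50.0):
    """Score every protein in the database; best match first."""
    scores = {name: count_matches(experimental, masses, tol_ppm)
              for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up [M+H]+ values (Da), purely for illustration.
database = {
    "protein_A": [842.51, 1045.56, 1179.60, 2211.10],
    "protein_B": [927.49, 1296.68, 1639.94],
}
experimental = [842.52, 1045.55, 2211.12, 1500.00]

print(rank_candidates(experimental, database))
# [('protein_A', 3), ('protein_B', 0)]
```

Three of the four measured masses fall within tolerance of protein_A's theoretical peptides, so it ranks first; the unmatched mass could, for instance, come from a modified peptide or a contaminant.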
Proteases
• There are two types of protease:
– Endoproteases. Specific ones include trypsin (C-terminal to K and R, but not if followed by P), AspN (N-terminal to D), LysC (C-terminal to K) and V8 (C-terminal to D and E, depending on pH). There are also less specific endoproteases, such as chymotrypsin (C-terminal to W, F, Y, L, I etc), pepsin and elastase.
– Exoproteases, e.g. carboxypeptidases C, P and Y and aminopeptidase B
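Because the specific endoproteases cut at predictable residues, their rules can be written as patterns and applied to a sequence in silico. The sketch below encodes the trypsin, LysC, AspN and V8 rules listed above as zero-width regular expressions; the V8 rule uses the D-and-E variant, and real enzymes occasionally deviate from these idealised rules. The test sequence is made up.

```python
import re

# Zero-width patterns marking cut positions:
# (?<=X) = cut C-terminal to X, (?=X) = cut N-terminal to X, (?!P) = not before P.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, but not if followed by P
    "LysC": r"(?<=K)",             # after K
    "AspN": r"(?=D)",              # before D
    "V8": r"(?<=[DE])",            # after D and E (pH/buffer dependent; assumption)
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Cut a protein sequence at every site allowed by the enzyme's rule."""
    pieces = re.split(CLEAVAGE_RULES[enzyme], sequence)
    return [piece for piece in pieces if piece]  # drop empty strings at the ends


for enzyme in CLEAVAGE_RULES:
    print(enzyme, digest("MKPLRDAEGK", enzyme))
```

The proline rule stops trypsin cutting at the first K (it is followed by P), whereas LysC, as written here, cuts there.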
Endoproteinase Red-Case
Specific cuts, n = 0 missed cleavages
Specific cuts, n = 1 missed cleavage
Fingerprints are generated by using specific proteases. These cut at known amino acids, and hence one can predict theoretically which peptides will be formed.
K <EK> DK <ALK> SK <GWK> IK <MEK> GR <GLVK> SR <YVR> AK <SPIK> VR <ADTR> EK <LEHK> DK <AMGYR> VK <GQIVGR> YK <EELFR> S<SIPETQK> GR <YVVDTSK> K <DIVGAVLK> AK <IGDYAGIK> WK <DIPVPKPK> AR <VLGIDGGEGK> ER <EALDFFAR> GK <ANELLINVK> YK <GVIFYESHGK> LK <CCSDVFNQVVK> SK <SISIVGSYVGNR> AR <SIGGEVFIDFTK> ER <ANGTTVLVGMPAGAK> CK <VVGLSTLPEIYEK> MK <LPLVGGHEGAGVVVGMGENVK> GK <ATDGGAHGVINVSVSEAAIEASTR> YK <YSGVCHTDLHAWHGDWPLPVK> L
A specific enzyme, here trypsin, is used to cut the protein into peptides. Trypsin cuts after arginine (R) and lysine (K), except when the next residue is proline (P), and the masses of the resulting peptides can then be calculated.
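The theoretical peptides and masses for a fingerprint database can be generated as sketched below, allowing for missed cleavages (n = 0, n = 1) as on the slide. Assumptions: standard monoisotopic residue masses, unmodified peptides (no oxidation or cysteine alkylation), and singly protonated [M+H]+ ions as typically seen in MALDI. The input is a short made-up sequence built from three tryptic peptides in the slide list.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # charge carrier for the [M+H]+ ion


def tryptic_peptides(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Trypsin digest (after K/R, not before P) with up to n missed cleavages."""
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        for n in range(missed_cleavages + 1):
            if start + n < len(fragments):
                peptides.append("".join(fragments[start:start + n + 1]))
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic mass of the singly protonated, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


for peptide in tryptic_peptides("EKALKGWK", missed_cleavages=1):
    print(f"{peptide:<10} {mh_plus(peptide):.4f}")
```

Allowing one missed cleavage adds the joined peptides (EKALK, ALKGWK) to the theoretical fingerprint, at the cost of more chance matches.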
Ovarian Unsupervised Analysis
- based on outcome (5 year follow-up)
Even without knowing the identities of the proteins on a gel, the patterns can be used to distinguish different states. Here the patterns distinguish between benign and malignant ovarian tumours.
Gel Reproducibility
BRCA: tissue pre-emptively removed from a patient, which nevertheless clusters clearly with the malignant samples, in both the left and right nodes
Unsupervised Pearson Clustering of dye swaps
Here normal tissue was taken from a patient who had hereditary breast cancer and opted to have her ovaries removed. The protein pattern showed that the tissue, although not a tumour, was already in a precancerous state.
All these Proteins Appear in Ovarian, Breast, Prostate, Glioma...
The Hallmarks of Cancer...?
Obviously, one can identify the protein spots in the pattern that help distinguish between benign and malignant. However, the same proteins are involved in many different cancer types. This is quite logical, since tumours face the same problems: avoiding detection by the immune system, growing fast and needing blood vessels to supply oxygen to the interior of the tumour.
Why bother with Proteins?
DNA Microarray is faster and more accurate??
[Figure: mRNA and protein expression patterns for primary GBM, grade II, grade III and normal brain]
Both DNA and Protein expression patterns can determine the cancer state. However, very often the genes used to determine the state are different.
Correlating Proteins and mRNA
Most, ca 58% show a positive correlation
[Figure: panels showing down-regulation and up-regulation]
Correlating Proteins and mRNA
Some, ca 42% show negative or no correlation
[Figure: panels showing negative correlation and no correlation]
Why does the level of protein expression very often not agree with the level of mRNA expression? One reason is time dependence: if the mRNA is unstable, it may disappear quickly, leaving a large amount of protein. Alternatively, the mRNA level may be high but the protein absent, for example if a protease that degrades the protein has been activated. Proteins may also be secreted from the cell.
Whichever way, You Must Validate
Tissue microarray of anti-Integrin alpha 5 Antibodies
Strong staining in Ovarian Tumours
No staining in normal Ovarian Tissues
Whatever experiment you do, you must confirm your findings using a different technique. Very often antibodies can be used to confirm the presence or absence of a protein, and often to show where the protein is in a cell.
Whole Body Summary for Integrin
For the example of a cancer biomarker, antibodies can be used to screen the whole body to try and confirm that the marker is unique to the tissue of origin.
All-at-once sequencing strategy
2D-HPLC: depth and breadth of coverage
Digesting a whole cell extract generates around 100,000 different peptides. These are usually separated in multiple dimensions. Here they are first separated by strong cation exchange chromatography (according to charge), and each of the fractions is then separated by reverse-phase chromatography (according to hydrophobicity). The peptides are usually detected by a mass spectrometer coupled directly to the end of the reverse-phase column.
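Why two dimensions help can be illustrated with the idealised rule that, for completely independent (orthogonal) separations, the peak capacity of the combined system is the product of the peak capacities of the two dimensions. In the sketch below, the number of SCX fractions and the reverse-phase peak capacity are assumptions for illustration; only the figure of about 100,000 peptides comes from the text, and real separations are never fully orthogonal, so the product is an upper bound.

```python
PEPTIDES = 100_000       # peptides in a whole-cell digest (figure quoted above)
SCX_FRACTIONS = 20       # assumption: fractions collected from the SCX column
RP_PEAK_CAPACITY = 200   # assumption: resolvable peaks per reverse-phase run


def peak_capacity_2d(first: int, second: int) -> int:
    """Ideal 2D peak capacity for fully orthogonal separations (upper bound)."""
    return first * second


total = peak_capacity_2d(SCX_FRACTIONS, RP_PEAK_CAPACITY)
print(f"1D (reverse phase only): {PEPTIDES / RP_PEAK_CAPACITY:.0f} peptides per peak")  # 500
print(f"2D (SCX x reverse phase): {PEPTIDES / total:.0f} peptides per peak")            # 25
```

Even under these generous assumptions, several peptides still co-elute in each peak, which is why the mass spectrometer, rather than the chromatography alone, has to pick out individual peptides.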
Non-gel Based Proteomics
[Figure: non-gel proteomics workflow. I: protein mixture digested to peptides, followed by 1D, 2D or 3D peptide separation (chromatogram vs time, min); II: tandem mass spectrum recorded with quadrupoles Q1 and Q3 around a Q2 collision cell (m/z 200-1200); III: correlative sequence database searching, comparing theoretical and acquired spectra for protein identification]
MS scanning modes
The mass spectrometer used for detection operates in two modes. In the first scan, the masses of the intact peptides coming from the HPLC are measured (MS mode). Then, in the second scan, one peptide is selected and broken into pieces by colliding it with gas molecules (collision-induced dissociation, CID), and the fragments are measured (MS/MS mode). The mass spectrometer then goes back to measuring intact masses (MS mode).
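The alternation between MS and MS/MS scans can be sketched as a loop that, after each survey (MS) scan, picks the most intense peptide ion(s) for fragmentation. The `top_n` parameter and the "dynamic exclusion" step, which stops the instrument from fragmenting the same ion again in later cycles, are assumptions added for illustration; real instruments also apply intensity thresholds and let exclusions expire after a set time. The scan data are made up.

```python
def select_precursors(ms1_peaks, top_n=1, excluded=(), mz_tol=0.01):
    """Pick the top_n most intense (m/z, intensity) peaks not already fragmented."""
    candidates = [
        (mz, intensity) for mz, intensity in ms1_peaks
        if all(abs(mz - ex) > mz_tol for ex in excluded)
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


def acquire(ms1_scans, top_n=1):
    """Alternate survey scans with MS/MS; return the precursors chosen per cycle."""
    excluded = []   # precursors already fragmented (never expire in this sketch)
    schedule = []
    for peaks in ms1_scans:
        picks = select_precursors(peaks, top_n, excluded)
        excluded.extend(picks)
        schedule.append(picks)
    return schedule


# Made-up survey scans: (m/z, intensity) of peptides eluting from the column.
scans = [
    [(500.25, 1.0e6), (622.30, 5.0e5), (733.41, 2.0e5)],
    [(500.25, 9.0e5), (622.30, 6.0e5), (733.41, 3.0e5)],
]
print(acquire(scans))   # [[500.25], [622.3]]
```

Without the exclusion step, the most intense peptide at 500.25 would be fragmented in every cycle and the weaker ones would never be identified.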
Peptide Fingerprinting
Peptide disintegration
Fragment ions
Peptides are identified in a similar way to proteins. Perhaps 10 peptides enter the mass spectrometer at the same time. The MS automatically picks the most intense one, isolates it (discarding the other nine) and then breaks it into pieces. The mass of the peptide is used to search the database for all peptides with the same mass. The fragmentation spectra of these candidate peptides are then calculated and compared with the experimental fragments observed, and the best-matching peptide sequence is selected.
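The comparison of calculated and observed fragments can be sketched with the two most common fragment-ion series produced by CID: b ions (N-terminal pieces) and y ions (C-terminal pieces). The sketch assumes singly charged fragments, unmodified peptides and a 0.02 Da tolerance, and uses a simple match count rather than the statistical scores of real search engines. ALK (from the slide list) and LAK have the same composition and hence the same precursor mass, so only the fragments can tell them apart; the "observed" spectrum is simulated.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b ions (N-terminal pieces) and y ions (C-terminal pieces)."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[-i:]) + WATER + PROTON)
    return b_ions, y_ions


def match_count(observed: list[float], peptide: str, tol: float = 0.02) -> int:
    """Number of observed fragment masses explained by the candidate peptide."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def best_candidate(observed: list[float], candidates: list[str], tol: float = 0.02) -> str:
    """Candidate (all of the same precursor mass) explaining most fragments."""
    return max(candidates, key=lambda p: match_count(observed, p, tol))


# Simulated spectrum: the b and y ions of ALK.
observed = [72.044, 185.128, 147.113, 260.197]
print({p: match_count(observed, p) for p in ("LAK", "ALK")})   # {'LAK': 2, 'ALK': 4}
print(best_candidate(observed, ["LAK", "ALK"]))                 # ALK
```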
Quantitation by Isotope Dilution
Sample 1 (reference): incorporate stable heavy isotope
Sample 2: incorporate stable light isotope
Combine samples
Analyze by mass spectrometer
• h/l analytes are chemically identical ⇒ identical specific signal in MS
• Ratio of h/l signals indicates ratio of analytes
The most common way to compare samples is to label one sample with a heavy isotope. For example, oxygen normally occurs as O-16, but the heavy isotope O-18 can be introduced from O-18 water; alternatively, amino acids containing C-13 can be fed to the cells. The proteins from the two samples are mixed together and then separated by 2D-HPLC. The corresponding peptides from each sample elute at the same time, but one is heavier than the other.
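Turning the paired signals into a relative quantity is then a matter of dividing heavy by light intensity. In the minimal sketch below, taking the median of several peptides' ratios as the protein ratio, and reporting it as a log2 fold change, are assumptions (common choices that damp the effect of a single outlying peptide); the intensities are made up.

```python
import math
import statistics


def peptide_ratio(heavy: float, light: float) -> float:
    """Heavy/light signal ratio for one co-eluting peptide pair."""
    return heavy / light


def protein_ratio(pairs):
    """Median heavy/light ratio over all peptide pairs from one protein."""
    return statistics.median(peptide_ratio(heavy, light) for heavy, light in pairs)


# Made-up (heavy, light) intensities for three peptides of one protein.
pairs = [(2.0e6, 1.0e6), (4.1e5, 2.0e5), (9.0e5, 1.0e6)]
ratio = protein_ratio(pairs)
print(f"h/l = {ratio:.2f}, log2 fold change = {math.log2(ratio):.2f}")
# h/l = 2.00, log2 fold change = 1.00
```

The third peptide (ratio 0.9) disagrees with the other two; the median ignores it, whereas a mean would be pulled towards it.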
Protein Labelling in Culture
Cell culture: SILAC (stable isotope labelling by amino acids in cell culture) for MS
–Control with C-12 Arg and/or C-12 Lys
–Experimental with C-13 Arg and/or C-13 Lys
–Extract and combine proteins, then digest
[Figure: metabolic stable-isotope labelling. Protein extracts from culture A and culture B are mixed and digested together; mass spectrometry (intensity vs m/z) shows peptide pairs from A and B]
Here cells are grown in culture with labelled amino acids. The peptides appear as pairs, one light from culture A and one heavy from culture B.
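The spacing of a SILAC pair in the spectrum follows from the number of labelled residues in the peptide. The sketch assumes the common scheme in which every arginine and lysine carries six C-13 atoms (13C6-Arg, 13C6-Lys); other label sets give different shifts. Because trypsin cuts after K and R, nearly every tryptic peptide carries at least one label, and a peptide with a missed cleavage carries an extra one.

```python
C13_MINUS_C12 = 1.0033548   # mass difference between C-13 and C-12 (Da)


def silac_mass_shift(peptide: str, labelled: str = "KR", carbons: int = 6) -> float:
    """Heavy-minus-light mass, assuming every labelled residue is 13C6 (assumption)."""
    n_labels = sum(peptide.count(aa) for aa in labelled)
    return n_labels * carbons * C13_MINUS_C12


def pair_spacing(peptide: str, charge: int) -> float:
    """Distance in m/z between the light and heavy peaks of a peptide pair."""
    return silac_mass_shift(peptide) / charge


for peptide, z in (("ALK", 1), ("ALK", 2), ("ALKGWK", 2)):
    print(f"{peptide} z={z}: shift {silac_mass_shift(peptide):.3f} Da, "
          f"spacing {pair_spacing(peptide, z):.3f} m/z")
```

The pair spacing in m/z shrinks with charge, so the charge state has to be known before a light and a heavy peak can be paired up.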
Peptide Digest Labelling
Peptide labelling strategies:
• Chemical labelling of protein, mix, digest
• Digest, chemical label, mix
• Enzymatic digestion in O-18 or O-16 water, mix
[Figure: isotope tagging by chemical reaction: digest, label, then mass spectrometry (intensity plot)]
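Alternatively, isotope labels can be attached chemically, either to the proteins before digestion or to the peptides after digestion, or introduced enzymatically by digesting in O-18 water; in every case the samples are labelled before they are mixed.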
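For the enzymatic route, trypsin can exchange up to two C-terminal oxygen atoms of each peptide for O-18 from the water, so the fully labelled form is about 4 Da heavier. The sketch assumes complete exchange of both oxygens by default; in practice incomplete exchange also produces a +2 Da form, which complicates quantitation. The peptide mass used is made up.

```python
O16 = 15.9949146   # monoisotopic mass of O-16 (Da)
O18 = 17.9991610   # monoisotopic mass of O-18 (Da)
PROTON = 1.00728   # mass of a proton (Da)


def o18_pair_mz(neutral_mass: float, charge: int, exchanged: int = 2) -> tuple[float, float]:
    """m/z of the O-16 (light) and O-18 (heavy) forms of one peptide ion."""
    shift = exchanged * (O18 - O16)   # about 4.008 Da when both oxygens exchange
    light = (neutral_mass + charge * PROTON) / charge
    heavy = (neutral_mass + shift + charge * PROTON) / charge
    return light, heavy


light, heavy = o18_pair_mz(1500.0, charge=2)
print(f"light {light:.4f}, heavy {heavy:.4f}, spacing {heavy - light:.4f} m/z")
```

At charge 2+ the pair is only about 2 m/z units apart, so the heavy peak overlaps the natural isotope peaks of the light peptide; this is one reason the label has to be interpreted carefully.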
usefulness
This is an excellent example of modern mass spectrometry coupled with intelligent biology. The proteins in the signalling pathway of the DAF-2 receptor are identified.
Experiment
Feed C. elegans on N-15-labelled bacteria
Use data-independent MS/MS
Find DAF-2 targets changed by mutation
Confirm by SRM
The worms are fed on bacteria grown with nitrogen in the form of ammonium salts, containing either the light (N-14) or the heavy (N-15) isotope. Then a worm with a mutation in the DAF-2 receptor, which causes the worm to live longer, was generated. The proteins that were up- or down-regulated in the mutant compared with the wild type were identified by MS.
only orthogonal data
The levels of the proteins were analysed after temperature shifting. The proteins identified as changing in expression level were confirmed by a targeted mass spectrometric technique called SRM (selected reaction monitoring), which detects only the peptides coming from the identified proteins. The proteins were also confirmed by western blotting using antibodies.
Confirm targets by RNAi
To really confirm that the identified proteins were involved in the DAF-2 receptor pathway, RNA interference (RNAi) was used: double-stranded RNA matching each gene was introduced, which leads to destruction of the corresponding mRNA and knocks the protein down temporarily. The effect of this on biological function was then assayed (the formation of a hibernation-like life stage called a dauer).