High-throughput Proteomics David Birnbaum

High-throughput Proteomics David Birnbaum. Introduction What is Proteomics ? Proteomics is the analysis of genomic complements of proteins. Why proteomics.

Jan 02, 2016


Shauna Cain

High-throughput Proteomics

David Birnbaum


Introduction

What is proteomics?

Proteomics is the analysis of genomic complements of proteins.

Why proteomics?

• Until now we have looked at many methods that deal with RNA and DNA.

• However, proteins ultimately define how the cell behaves.

• Therefore, information about RNA and DNA, while a necessary prerequisite for analyzing proteins on a large scale, can’t tell us much about the function of the proteins they encode.

• Proteomics tries to define the functions, quantities and structures of large complements of proteins.


What will we see?

• We will see two studies from Erin O’Shea’s lab, which used the same method to manipulate large complements of proteins.

• They checked localization and expression levels of proteins.

• A study which tries to build a better model of the cell’s dynamics.

• Some proteomics methods


Global analysis of protein localization in budding yeast

Won-Ki Huh et al., Nature 425 (October 2003)

• Goal: to determine, for each protein, the cell compartment(s) in which it resides.

• This would help to understand and verify data about protein-protein interactions and protein function.


Howson et al. Construction, verification and experimental use of two epitope-tagged collections of budding yeast strains Comp Funct Genom 2005; 6: 2–16.



Localization determination

• Micrographs of each strain, with the ORF name withheld, were evaluated independently by two scorers.

• Initial classification of each protein to one or more of 12 subcellular localization categories

• Refinement by co-localization


Example micrographs

• Rox3: nucleus

• Ilv6: mitochondrion

• Hof1: bud neck

• Erg6: lipid particle

• Pho86: ER

• Nic96: nuclear periphery


Co-localization

• Cbf2 was initially localized to the nucleus. After comparison with a nucleolus marker: final localization to the nucleus.

• Utp13 was initially localized to the nucleus. After comparison with a nucleolus marker: final localization to the nucleolus.


Results


Comparison with older data

• The Saccharomyces Genome Database (SGD) contained localization information for 60% of the proteins visualized.

• 80% agreement with SGD

• A mass spectrometric analysis of the nuclear pore complex (NPC) revealed 29 NPC proteins.

• 25 of those were visualized, 23 to the nuclear periphery.


Localization and interaction

• There is large-scale data about protein-protein interactions.

• However, total coverage is poor and false positive rates remain high.

• To interact, proteins need to be in close proximity.

• Therefore, combining the localization and interaction data might help to verify interaction results.


Localization and interaction

• Data about interactions was taken from the GRID database.

• The statistical method:

– For each pair of localization categories, the fraction of the total number of interactions occurring between them was calculated.

– A set of random pairs was chosen and analyzed the same way.

– Fold enrichment was calculated between the two sets.
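
The statistical method above can be sketched in a few lines. This is only a minimal illustration, assuming that each protein has a single localization and that interactions are given as protein pairs. The function names (`pair_fractions`, `fold_enrichment`) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def pair_fractions(pairs, localization):
    """Fraction of all pairs that fall on each (unordered) compartment pair."""
    counts = Counter()
    for a, b in pairs:
        # Sort so that (nucleus, ER) and (ER, nucleus) count as the same pair.
        key = tuple(sorted((localization[a], localization[b])))
        counts[key] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def fold_enrichment(observed_pairs, random_pairs, localization):
    """Observed fraction divided by the fraction found among random pairs."""
    observed = pair_fractions(observed_pairs, localization)
    background = pair_fractions(random_pairs, localization)
    # Assumption: compartment pairs absent from the random set are skipped.
    return {
        key: frac / background[key]
        for key, frac in observed.items()
        if key in background
    }


# Toy data: hypothetical protein names and localizations.
localization = {"A": "nucleus", "B": "nucleus", "C": "ER", "D": "ER"}
observed_pairs = [("A", "B"), ("A", "B"), ("C", "D"), ("A", "C")]

rng = random.Random(0)  # fixed seed so the random set is reproducible
proteins = sorted(localization)
random_pairs = [tuple(rng.sample(proteins, 2)) for _ in range(1000)]

enrichment = fold_enrichment(observed_pairs, random_pairs, localization)
```

In this toy example, interactions within the nucleus are over-represented compared with random pairs, while ER-nucleus interactions are under-represented.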


The interaction matrix


Interaction map


Problems

• Possible destruction of localization signals

• Possible interruption of post-translational modifications

• Fusion proteins might suffer from steric hindrance


Summary

• Localization data for many previously unlocalized proteins was collected.

• Agreement with older data.

• Identification of interacting compartments.


Web Site

• The authors created a web site with all the micrographs and the localization data:

• http://yeastgfp.ucsf.edu/


Global analysis of protein expression in yeast

Ghaemmaghami S. et al., Nature 425 (October 2003)

• Goal: to globally determine protein expression levels in the infamous budding yeast.

• Identification of mis-annotated genes


The experiment

• Creation of fusion library:

– The tag is the Tandem Affinity Purification (TAP) tag

– Marker for histidine


The experiment

• Checking that the proteins are fused and are active:

– Western blot analysis.

– Clb2, Sic1 comparison.

– Successful integration for 98% of ORFs annotated in the Saccharomyces Genome Database.

– Successful integration for 93% of essential genes.


Results

• Detected 79% of essential ORFs.

• 83% of ORFs with assigned gene names.

• Identified protein products for 1,018 functionally uncharacterized ORFs.

• 73% of all annotated ORFs. Possible reasons for the undetected remainder:

– Spurious ORFs

– Genes encoding products not needed in log phase


Eliminating spurious ORFs

• Product detected by neither GFP nor TAP.

• CEC values below an arbitrary cut-off.


Codon Enrichment Correlation

• Codon usage in genuine coding regions deviates systematically from randomly generated ORFs:

– Preference in amino acid composition

– Bias in the usage of synonymous codons

• The CEC evaluates the pattern of codon usage in potential ORFs.


CEC

• Calculation of CEC:

• The prevalence of each codon in the 3,753 named ORFs (the positive set) was calculated.

• The prevalence of each codon in a random sequence was calculated as well.

• The enrichment of each codon: $\text{enrichment} = \frac{\text{prevalence in positive set}}{\text{prevalence in random set}}$


CEC

• The enrichment was calculated in the same way for each test ORF.

• The CEC is the linear correlation coefficient between the codon enrichments of the test ORF and those of the positive set.

• They could have calculated P-values instead.
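
The CEC calculation described on these two slides can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the random-sequence prevalence is assumed to be uniform over the 64 codons, the sequences are toy strings, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_prevalence(sequences):
    """Relative frequency of each of the 64 codons in in-frame sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts[c] for c in CODONS)
    return {c: counts[c] / total for c in CODONS}


def enrichment(prevalence, random_prevalence):
    """Per-codon ratio of observed prevalence to random prevalence."""
    return {c: prevalence[c] / random_prevalence[c] for c in CODONS}


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cec(test_orf, positive_orfs, random_prevalence):
    """Correlate the codon enrichments of a test ORF with the positive set."""
    positive = enrichment(codon_prevalence(positive_orfs), random_prevalence)
    test = enrichment(codon_prevalence([test_orf]), random_prevalence)
    return pearson([test[c] for c in CODONS], [positive[c] for c in CODONS])


# Assumption: a uniform random background (1/64 per codon).
uniform = {c: 1 / 64 for c in CODONS}

# Toy sequences, not real yeast ORFs.
positive_orfs = ["ATGGCTGCTGCTAAATAA", "ATGGCTAAAGCTGAATAA"]
cec_similar = cec("ATGGCTGCTAAATAA", positive_orfs, uniform)
cec_different = cec("ATGCGACGACGATGA", positive_orfs, uniform)
```

A test ORF whose codon usage resembles the positive set scores close to 1, while one with unrelated codon usage scores much lower. A cut-off on this value would then separate likely genuine ORFs from likely spurious ones.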


CEC - example

YDL121C and YKR047W are two small, uncharacterized ORFs that are listed as ‘hypothetical’ in the yeast genome database.

• YDL121C: CEC 0.7, protein observed

• YKR047W: CEC -0.33, no product detected, marked as spurious


Remember Kellis?

Dedicated to Nir & Sonia


Sequencing and comparison of yeast species to identify genes and regulatory elements (*)

• Lead researcher: Manolis Kellis

• Head of lab: Eric S. Lander

(*) Kellis M, et al. (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423: 241–254


Remember Kellis?

• Kellis identified 496 spurious ORFs.

• 489 of those were not observed in this study.

• 381 of the spurious ORFs identified in this study overlap with Kellis’s set.


Results

• Range of abundance: 50 to 10^6 molecules per cell

• Greater sensitivity than previous methods (MS, 2D gels)


Results

• Do the small amounts of protein detected represent functionally significant amounts?

– Proteins that are not needed are not detected at all.

– Same distribution profile for essential proteins and for the full set.

– Transcription factors are present at low levels.


mRNA - Protein

• A significant relationship was observed between mRNA levels (1) and protein levels (rs = 0.57).

• The Codon Adaptation Index (CAI) measures the codon composition of a given ORF in comparison with a predefined set of reference genes.

• Higher CAI values usually mean faster translation.

• A relationship was also observed between protein abundance and CAI values (rs = 0.55).

1. Holstege, F. C. et al. Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728 (1998).
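
To make the CAI bullet concrete, here is a minimal sketch of the commonly used definition (the geometric mean of each codon's usage relative to the most-used synonymous codon in the reference set). The slides do not say which exact variant the study used, so the details are assumptions: codons never seen in the reference set are simply skipped, Met, Trp and stop codons are ignored, and the reference sequences are toy strings. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons(seq):
    """Split an in-frame sequence into codons, dropping a trailing partial one."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Weight of each codon: its count over the top synonymous codon's count."""
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TO_AA
    )
    best = Counter()
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        best[aa] = max(best[aa], n)
    return {codon: n / best[CODON_TO_AA[codon]] for codon, n in counts.items()}


def cai(gene, weights):
    """Geometric mean of the codon weights along the gene."""
    # Assumption: skip codons unseen in the reference set, and skip Met, Trp
    # and stop codons, which offer no synonymous choice.
    ws = [
        weights[c]
        for c in codons(gene)
        if c in weights and CODON_TO_AA[c] not in "MW*"
    ]
    return exp(sum(log(w) for w in ws) / len(ws))


# Toy reference set: GCT is preferred over GCC, AAA over AAG.
weights = relative_adaptiveness(["GCTGCTGCTGCC", "AAAAAAAAG"])
cai_preferred = cai("GCTAAA", weights)
cai_rare = cai("GCCAAG", weights)
```

A gene built only from the preferred codons reaches the maximum CAI of 1, while one built from the rarer synonymous codons scores lower.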


mRNA-Protein

Figure panels: protein levels vs. mRNA, vs. CAI, and vs. mRNA & CAI (axes: mRNA concentration, protein abundance, CAI values).


Summary

• The study demonstrated that protein abundances can be determined globally.

• The results show great variation in expression level between proteins.

• The method also helps in the identification of spurious ORFs.


Post-transcriptional expression regulation in the yeast Saccharomyces cerevisiae on a genomic scale

Beyer et al., Molecular & Cellular Proteomics 3.11

• Goal: to investigate the relation of transcription, translation and protein turnover on a genomic scale.


Motivation

• To make use of the large scale data to create a more complete picture of the cell’s strategy in determining expression profiles of proteins


Source Data

• mRNA abundance data from 36 experiments

• 30 independent measurements for > 6,000 ORFs

• The data was normalized to reduce inconsistencies between the data sets.

• The resulting dataset is characterized by low noise.


Source Data

• Protein abundance data from the previous article (Ghaemmaghami et al.) and from Greenbaum D. et al., Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biol. 4, 117.1-117.8

• Only 1,669 ORFs are contained in both studies.

• Therefore, protein abundances are much less certain than mRNA abundances.

• However, when using the average, the correlation between protein and mRNA levels is most significant.

• With new data the correlations will improve.


Results

• Protein – mRNA abundance

• The median value for the entire cell:

– Protein 2,800

– mRNA 0.7

• Median values are less affected by extreme values

• Median is more stable against variations between the studies.

• The correlation discovered was rs=0.58.


Source Data

• Trying to make a better model.

• Adding two more values which determine the translation rate:

– Ribosome occupancy: the fraction of mRNA bound to ribosomes

– Ribosome density: how many ribosomes are on one mRNA

• Translational rate = ribosome density × ribosome occupancy

• Translational activity = mRNA level × translational rate
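
The two derived quantities are simple products, sketched below for a single ORF. The class name, field names and numeric values are hypothetical, and the units are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class OrfMeasurement:
    mrna_level: float          # mRNA abundance (assumed: copies per cell)
    ribosome_occupancy: float  # fraction of the mRNA bound to ribosomes (0..1)
    ribosome_density: float    # ribosomes per mRNA, as defined on the slide


def translational_rate(m: OrfMeasurement) -> float:
    """Ribosome density multiplied by ribosome occupancy."""
    return m.ribosome_density * m.ribosome_occupancy


def translational_activity(m: OrfMeasurement) -> float:
    """mRNA level multiplied by the translational rate."""
    return m.mrna_level * translational_rate(m)


# Toy values for one ORF.
orf = OrfMeasurement(mrna_level=2.0, ribosome_occupancy=0.5, ribosome_density=4.0)
rate = translational_rate(orf)          # 4.0 * 0.5
activity = translational_activity(orf)  # 2.0 * rate
```

Translational activity, rather than the mRNA level alone, is the quantity the model then compares with protein abundance.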


The advanced model

Diagram labels: mRNA level, ribosome occupancy, ribosome density


Results

• Spearman rank correlation coefficients were computed between mRNA abundance and protein abundance, and between translational activity and protein abundance.

• The rs improved only to 0.596.
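
For readers who have not met the Spearman rank correlation (rs), it is the Pearson correlation computed on ranks instead of raw values, so it rewards any monotonic relationship and is insensitive to the very wide dynamic range of protein abundances. The following is a small self-contained sketch with toy numbers; the function names and data are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rs: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data: five ORFs with mRNA and protein abundances.
mrna = [0.5, 1.0, 2.0, 4.0, 8.0]
protein = [100, 900, 400, 5000, 20000]
rs = spearman(mrna, protein)
```

Only the second and third ORFs swap order between the two lists, so rs stays high even though the protein values span more than two orders of magnitude.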


Results

• The protein-to-mRNA ratio is indicative of post-transcriptional regulation as well as of translational activity.

• The turnover rate of proteins has a significant effect on protein-mRNA ratios.

• The Protein Half-life Descriptor (PHD)


PHD

• [Pi] – Protein concentration of the i-th ORF.

• [mRNAi] – mRNA concentration of the i-th ORF.

• Ktransl,i – Translational rate (ribosome density * ribosome occupancy).

• Kp – Genome-wide translation constant.

• Kd,i – Destruction rate of the i-th ORF.
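
The equation that these symbols belong to did not survive in the transcript. One way to relate them is a first-order balance between production and destruction, which is an assumption here and not necessarily the authors' exact formula:

```latex
\frac{d[P_i]}{dt} = K_p \, K_{transl,i} \, [mRNA_i] - K_{d,i} \, [P_i]
\qquad\Longrightarrow\qquad
\frac{[P_i]}{K_{transl,i} \, [mRNA_i]} = \frac{K_p}{K_{d,i}}
\quad \text{at steady state } \left( \frac{d[P_i]}{dt} = 0 \right)
```

Under this assumption, the ratio of protein concentration to translational activity is inversely proportional to the destruction rate, and therefore proportional to the protein half-life $\ln 2 / K_{d,i}$. This would explain why such a ratio can serve as a half-life descriptor.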


Results


Problems

• The data about protein abundance is much less accurate than the mRNA data.

• The same is true of the data for ribosome density, ribosome occupancy and transcript length.

• Ribosome density was calculated relative to transcript length.

• PHD calculations are based on measurements from different labs, where growth conditions might not be identical.


Advantages

• Opens a new way of analyzing the cell’s activity.

• With more data, which is sure to come, the measurements will become more accurate.


Protein Microarray

• Functional – individually purified proteins are spotted on a surface and analyzed for activity

• Analytical – protein-specific ligands are spotted on the surface and are used to monitor levels of proteins


Microarray pros & cons

• Pros:

– Direct identification of protein.

– Identification of multiple positives.

– Automation.

• Cons:

– Proteins are not in their natural environment.

– Requires individual growth and purification of each protein.

– Proteins may not fold correctly or may take only one isoform.


“Going from sequence to consequence is of course what proteomics is all about.”

Greg Petsko, C & E News, November 26 (2001)