Analysis of Gene Networks and Signaling Pathways Based on Gene Expression and Proteome Data
Marek Kimmel
Rice University
Houston, TX, USA
Chop into short pieces suitable for hybridizing to the 25-mer probes on the GeneChip
Attach chromophore, then inject onto the GeneChip
Affymetrix Hybridization
PM (perfect match) probe:   AGTCGGATTAAGCGCTATACGGTTC
MM (mismatch) probes:       AGTCGGATTAAGAGCTATACGGTTC
                            AGTCGGATTAAGTGCTATACGGTTC
                            AGTCGGATTAAGGGCTATACGGTTC
Target:                     TCAGCCTAATTCGCGATATGCCAAG
[Figure: the target forms a duplex with the complementary PM probe (Match); the MM probes differ from the PM probe only at the central (13th) base and give a Mismatch.]
Probe Cell Intensities
$\text{Average Difference} = \frac{1}{n}\sum_{j=1}^{n} (\mathrm{PM}_j - \mathrm{MM}_j)$, where $n$ is the number of probe pairs in the average
[Figure: probe cell intensities; the value shown is 1,662.]
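The Average Difference formula can be written as a short function. A minimal sketch, assuming one probe set given as paired lists of PM and MM cell intensities; the optional `use` mask is a hypothetical stand-in for whatever rule decides which probe pairs enter the average (the slide does not give that rule), and the intensities are made up.

```python
def average_difference(pm, mm, use=None):
    """Mean of PM - MM over the probe pairs selected for the average.

    pm, mm: intensities of the perfect-match and mismatch cells of one probe set.
    use:    optional booleans marking which pairs enter the average
            (hypothetical stand-in for a pair-selection rule); default: all pairs.
    """
    if len(pm) != len(mm):
        raise ValueError("PM and MM lists must have the same length")
    if use is None:
        use = [True] * len(pm)
    diffs = [p - m for p, m, keep in zip(pm, mm, use) if keep]
    if not diffs:
        raise ValueError("no probe pairs selected")
    return sum(diffs) / len(diffs)


pm = [1200.0, 950.0, 1800.0, 400.0]   # made-up intensities
mm = [300.0, 250.0, 600.0, 380.0]
print(average_difference(pm, mm))      # 705.0
print(average_difference(pm, mm, [True, True, True, False]))
```

Excluding a pair (here the last one, whose PM barely exceeds its MM) changes the estimate, which is why the denominator counts only the pairs actually used.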
Measuring Gene Expression: “Spotted DNA Microarrays”
• Each spot is the cDNA for a specific gene.
• RNA from the experimental sample is labeled with Cy5 red fluorescent dye.
• RNA from the reference sample is labeled with Cy3 green fluorescent dye.
• Fluorescent intensity ratios (Cy5/Cy3) are measured.
http://www.microarrays.org/software.html
http://rana.lbl.gov/ http://www.bioinfo.utmb.edu/
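Because the spotted-array readout is a Cy5/Cy3 ratio, it is usually analysed on a log scale, where up- and down-regulation become symmetric around zero (positive for red, negative for green). A minimal sketch, assuming background-subtracted, strictly positive intensities per spot; the optional median-centering step is one common normalization choice, added here as an assumption rather than taken from the slides.

```python
import math
import statistics


def log2_ratios(cy5, cy3, median_center=True):
    """Per-spot log2(Cy5/Cy3); optionally shift so the median ratio is 0.

    Median-centering assumes most genes are unchanged between the two samples.
    """
    ratios = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    if median_center:
        shift = statistics.median(ratios)
        ratios = [x - shift for x in ratios]
    return ratios


cy5 = [800.0, 200.0, 400.0]   # made-up spot intensities, experimental sample
cy3 = [200.0, 400.0, 400.0]   # made-up spot intensities, reference sample
print(log2_ratios(cy5, cy3, median_center=False))   # [2.0, -1.0, 0.0]
```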
Where Do We Get the Data?
Disease, pathogens, drugs, etc.
→ mRNA expressed in response to stimulus
→ mRNA collected and hybridized onto microarray
→ Microarray analyzed for spot intensities
→ Gene co-expression patterns
[Figure: cDNA gene microarray]
Method
• Get mRNA samples from multiple conditions.
• Hybridize to DNA microarrays.
• Measure intensities.
• Cluster.
• Analyze results.
• Design new experiment.
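The “Cluster” step is not specified further on the slide. One common choice for expression profiles is agglomerative (hierarchical) clustering with a correlation-based distance; the sketch below assumes that choice, uses average linkage with distance 1 − r, and stops when a requested number of clusters remains. Function names and the example profiles are hypothetical.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cluster_genes(profiles, k):
    """Average-linkage agglomerative clustering with distance 1 - r.

    profiles: dict gene -> expression values across conditions.
    Returns k clusters as sorted lists of gene names.
    """
    genes = list(profiles)
    dist = {frozenset((g, h)): 1 - pearson(profiles[g], profiles[h])
            for g, h in combinations(genes, 2)}
    clusters = [[g] for g in genes]
    while len(clusters) > k:
        # Merge the two clusters with the smallest average pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = statistics.fmean(dist[frozenset((g, h))]
                                 for g in clusters[i] for h in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]   # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


profiles = {                      # made-up profiles over four conditions
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.1, 4.0, 2.0],
}
print(cluster_genes(profiles, 2))   # [['g1', 'g2'], ['g3', 'g4']]
```

Correlation distance groups genes by the shape of their response across conditions rather than by absolute level, which is usually what co-expression analysis is after.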
Discrimination between samples
• Green is “down”.
• Red is “up”.
• We can differentiate clearly between tumor and normal tissue.
• Can we find differences between progressing and non-progressing tumors?
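To ask which genes separate two groups of samples (tumor vs. normal, or progressing vs. non-progressing tumors), one simple, generic approach, not taken from the slides, is to rank genes by a two-sample Welch t-statistic computed on their log ratios. A minimal sketch with made-up values; with thousands of genes tested at once, some multiple-testing control would also be needed.

```python
import statistics


def welch_t(a, b):
    """Welch's two-sample t-statistic (needs at least 2 values per group)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def rank_genes(group1, group2):
    """group1/group2: dict gene -> log ratios in that group's samples.

    Returns (gene, t) pairs sorted by |t|, most discriminating first.
    """
    scores = [(g, welch_t(group1[g], group2[g])) for g in group1]
    return sorted(scores, key=lambda gt: abs(gt[1]), reverse=True)


tumor = {"geneA": [2.1, 1.8, 2.4], "geneB": [0.1, -0.2, 0.3]}    # made up
normal = {"geneA": [0.0, 0.2, -0.1], "geneB": [0.0, 0.1, -0.1]}
for gene, t in rank_genes(tumor, normal):
    print(gene, round(t, 2))
```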
Problematic quality of data
• Note the large dynamic range.
• And the very large number of data points.
• And the limited information content.
Proteomics
• Is to protein expression what genomics is to gene expression.
• Because of variations such as post-translational modifications, there are many more distinct proteins than genes.
Proteomics
• Holds new promise for understanding complex biological systems.
• Many challenges remain, e.g. isolating, identifying, characterizing, and quantifying small amounts of a very large number of protein varieties.
• Currently, we primarily use 2D gels and mass spectrometry.
Protein Separation Using 2D Gel Electrophoresis
• Protein analysis uses a diseased or treated sample and a control sample. 2D gel electrophoresis is performed on each sample to separate proteins by charge and molecular weight.
• Black marks on the gel images indicate a protein or cluster of proteins and are referred to as "features."
• The x-axis is the isoelectric point (pI), the pH at which a protein carries no net charge; the y-axis is molecular weight (Mw), i.e. size.
• Gels are fixed and stained with a fluorescent dye, then scanned.
• Expression levels are measured based on the size of each feature on the gel.
• This shows which proteins are up- or down-regulated, including how much their abundance changed.
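The comparison of feature sizes between the treated and control gels can be sketched as below. This assumes features have already been matched between the two gels and that each is summarized by one positive number (its measured size); dividing each feature by its gel's total, and the two-fold cut-off, are illustrative assumptions, not part of the slides.

```python
import math


def feature_changes(treated, control, fold=2.0):
    """Classify matched gel features as up, down, or unchanged.

    treated, control: dict feature_id -> feature size on each gel (> 0).
    Each size is divided by its gel's total to offset loading/staining
    differences (an assumed normalization).
    Returns dict feature_id -> (log2 fold change, label).
    """
    t_total = sum(treated.values())
    c_total = sum(control.values())
    cutoff = math.log2(fold)
    result = {}
    for fid in treated.keys() & control.keys():
        lfc = math.log2((treated[fid] / t_total) / (control[fid] / c_total))
        if lfc >= cutoff:
            label = "up"
        elif lfc <= -cutoff:
            label = "down"
        else:
            label = "unchanged"
        result[fid] = (lfc, label)
    return result


treated = {"f1": 400.0, "f2": 100.0, "f3": 500.0}   # made-up feature sizes
control = {"f1": 100.0, "f2": 400.0, "f3": 500.0}
for fid, (lfc, label) in sorted(feature_changes(treated, control).items()):
    print(fid, round(lfc, 2), label)
```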
• Proteins are excised from the gel and treated with a succession of enzymes that cut amino acid chains into short polypeptides about 5–10 amino acids in length.
• The polypeptide fragments for each protein are then separated by capillary electrophoresis and analyzed using rapid-throughput mass spectrometry.
• At this point, we know the amino acid sequences of the polypeptide fragments and their masses, as well as post-translational modifications that occurred, such as glycosylation and phosphorylation.
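The slides do not name the enzymes; trypsin, mentioned in the Figure 2 caption, is a common choice. An in-silico digest plus a monoisotopic peptide-mass calculation, the two ingredients for matching measured peptide masses against a sequence database, can be sketched as follows. The cleavage rule (after K or R, not before P) is the usual simplified trypsin rule; missed cleavages and modifications are ignored, and the example sequence is made up.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier of a singly protonated ion


def trypsin_digest(protein):
    """Cut after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


for pep in trypsin_digest("MKWVTFISLLRPAKGGR"):   # made-up sequence
    print(pep, round(mh_plus(pep), 4))
```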
Protein Characterization
Systems Biology
• Consolidates genomics and proteomics differential expression data into a systematic description of pathways:
  – Signaling pathways.
  – Inflammatory response pathways.
  – Metabolic pathways.
  – Etc.
• Potential for understanding the interrelationships between genes, proteins, and disease, and for identifying potential therapeutic targets.
Gene Expression vs. Protein Abundance
• What exactly are we measuring?
• What is the relationship between
  – “level of gene expression” and
  – “abundance of proteins”?
Dogma of Molecular Biology
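Balance equations
$\dot{X}(t) = R(t) - T(t), \quad T(t) = \lambda X(t)$  [assembly line]
$\dot{P}(t) = T(t) - E(t), \quad E(t) = \mu P(t)$  [turnover rate]
($R$: transcription rate; $X$: mRNA level; $P$: protein abundance; $T$, $E$: assembly-line and turnover fluxes with rate constants $\lambda$, $\mu$.) Substituting:
$\dot{X}(t) = R(t) - \lambda X(t)$
$\dot{P}(t) = \lambda X(t) - \mu P(t)$
In the steady state, for a given gene $i$:
$X_i = R_i/\lambda_i, \quad P_i = \lambda_i X_i/\mu_i = R_i/\mu_i, \quad P_i/X_i = \lambda_i/\mu_i$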
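The steady-state relations can be checked numerically by integrating the two balance equations dX/dt = R − λX and dP/dt = λX − μP, where λ is the assembly-line rate constant and μ the protein turnover rate constant. A minimal sketch using forward Euler steps; the rate constants, the constant transcription rate R, and the step size are made-up values. Two hypothetical genes with identical R and λ but different μ reach the same mRNA level yet different protein levels.

```python
def simulate(R, lam, mu, t_end=50.0, dt=0.001):
    """Forward-Euler integration of the mRNA/protein balance equations.

    dX/dt = R - lam * X        (transcription in, assembly line out)
    dP/dt = lam * X - mu * P   (assembly line in, turnover out)
    Starts from X = P = 0 and returns (X, P) at time t_end.
    """
    X = P = 0.0
    for _ in range(int(round(t_end / dt))):
        dX = R - lam * X
        dP = lam * X - mu * P
        X += dt * dX
        P += dt * dP
    return X, P


# Two hypothetical genes: same R and lam, different protein turnover mu.
for mu in (0.5, 2.0):
    X, P = simulate(R=10.0, lam=1.0, mu=mu)
    print(f"mu={mu}: X={X:.3f}  P={P:.3f}")
# Expected: X close to R/lam = 10 for both genes; P close to R/mu = 20 and 5.
```

Equal mRNA levels with a four-fold difference in protein abundance is exactly the kind of non-proportionality that makes expression data an imperfect proxy for protein levels.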
Complicating Factors
• For any gene, product (protein) abundance is not necessarily proportional to the relative expression level, even in the “steady state”.
• Products generally do not follow first-order elimination kinetics. Instead, they enter into complicated interactions with each other and with external factors.
Application: Identification of Gene Networks
General ideas:
• The level of expression of a gene affects the levels of expression of other genes.
• Only three levels are possible:
  Normal (0)
  Over-expression (1)
  Under-expression (−1)
• Data: arrays of perturbed expression levels in a set of genes.
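Turning measured expression into the three levels requires a cut-off. A minimal sketch, assuming log2 expression ratios against a reference and a hypothetical symmetric threshold of 1 (a two-fold change); the threshold is an assumption, not a value from the slides.

```python
def ternary(log_ratio, threshold=1.0):
    """Quantize a log2 expression ratio: 1 = over, -1 = under, 0 = normal.

    threshold is a hypothetical cut-off (1.0 means a two-fold change).
    """
    if log_ratio >= threshold:
        return 1
    if log_ratio <= -threshold:
        return -1
    return 0


# One perturbation experiment: made-up log2 ratios for four genes.
profile = {"g1": 1.7, "g2": -0.3, "g3": -2.2, "g4": 0.9}
print({g: ternary(v) for g, v in profile.items()})
# {'g1': 1, 'g2': 0, 'g3': -1, 'g4': 0}
```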
“General nonlinear framework for the analysis of gene interaction via multivariate expression arrays”
Journal of Biomedical Optics 5, 411–424
• Data table
• Perceptron function: g(·) is sigmoidal; the X’s and Y are quantized to 3 levels:
$Y_{\text{pred}} = g(a_1 X_1 + a_2 X_2 + \dots + a_m X_m + b)$
• Training: estimating the coefficients $a$ so that the coefficient of determination (CoD) is maximized.
• Of all possible dependencies, only those with a CoD above a threshold are retained.
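The training criterion can be illustrated with a small sketch. Assumptions, none of them from the slides: g is taken to be tanh, its output is cut into three levels at ±0.5, the CoD is computed as the relative reduction in mean-squared error compared with the best constant three-level predictor, and “training” is a brute-force grid search over small integer coefficients rather than the procedure used in the cited paper. All data are made up.

```python
import itertools
import math

LEVELS = (-1, 0, 1)


def quantize(v, cut=0.5):
    """Map a sigmoid output in (-1, 1) to -1, 0, or 1 (cut point is hypothetical)."""
    if v > cut:
        return 1
    if v < -cut:
        return -1
    return 0


def predict(a, b, x):
    """Y_pred = g(a_1 X_1 + ... + a_m X_m + b) with g = tanh, then quantized."""
    s = sum(ai * xi for ai, xi in zip(a, x)) + b
    return quantize(math.tanh(s))


def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def cod(a, b, xs, ys):
    """(eps0 - eps) / eps0: error reduction relative to the best constant level."""
    eps0 = min(mse([c] * len(ys), ys) for c in LEVELS)
    if eps0 == 0:
        return 0.0  # Y never varies, so there is nothing to explain
    eps = mse([predict(a, b, x) for x in xs], ys)
    return (eps0 - eps) / eps0


def fit_by_grid(xs, ys, grid=(-2, -1, 0, 1, 2), offsets=(-1, 0, 1)):
    """Brute-force search for coefficients (a, b) maximizing the CoD."""
    best = None
    for a in itertools.product(grid, repeat=len(xs[0])):
        for b in offsets:
            score = cod(a, b, xs, ys)
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


# Ternary levels of predictor genes X1, X2 in six experiments;
# here the target gene Y simply follows X1.
xs = [(-1, 0), (0, 1), (1, -1), (1, 1), (0, 0), (-1, -1)]
ys = [x[0] for x in xs]
score, a, b = fit_by_grid(xs, ys)
print(score, a, b)
```

Repeating the fit for every target gene and predictor set, and keeping only those whose best CoD exceeds a chosen threshold, yields the retained dependencies.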
General ideas:
• Use 2-D protein gels and mass spectrometry to measure abundance changes of proteins in cancer cells relative to normal cells.
• Use perturbed systems to draw conclusions about specific signaling pathways.
• Example: signaling pathways of one of the fibroblast growth factors (FGF-2) in breast cancer.
Reference
Hondermarck et al. (2001) “Proteomics of breast cancer for marker discovery and signal pathway profiling”, Proteomics 1, 1216–1232
Figure 2. Silver-stained 2-DE profile of MCF-7 breast cancer cells. The major proteins were determined by MALDI-TOF and MS/MS after trypsin digestion.
Figure 3. MALDI-TOF and MS/MS spectra obtained for HSP70. (A) MALDI-TOF and (B) MS/MS analysis of the peak at m/z 1488.5. The letters labeling the peaks are the single-letter codes for the amino acids identified by MS/MS. Database searching allowed the identification of HSP70.
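The amino-acid letters on MS/MS peaks come from mass differences between consecutive fragment ions of one series: each gap equals the mass of one residue. A minimal sketch of that reading step, assuming a clean, singly charged, consecutive ion ladder (real spectra are noisier and mix ion series); the peak values in the example are generated from chosen residues, not taken from Figure 3, and the 0.02 Da tolerance is an arbitrary choice. Leucine and isoleucine have identical mass and cannot be told apart this way.

```python
# Monoisotopic residue masses in daltons (I and L are isobaric, listed once).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.02):
    """Assign a residue to each gap between consecutive ladder peaks.

    peaks: m/z values of one singly charged fragment-ion series.
    Returns one label per gap, or "?" when no residue lies within tol.
    """
    peaks = sorted(peaks)
    calls = []
    for lo, hi in zip(peaks, peaks[1:]):
        gap = hi - lo
        name, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - gap))
        calls.append(name if abs(mass - gap) <= tol else "?")
    return calls


# Build a made-up ladder: an arbitrary start mass plus residues S, E, Q, K, L.
peaks = [200.0]
for aa in ["S", "E", "Q", "K", "L/I"]:
    peaks.append(peaks[-1] + RESIDUE_MASS[aa])
print(read_ladder(peaks))   # ['S', 'E', 'Q', 'K', 'L/I']
```

Q and K differ by only about 0.036 Da, so separating them as above needs a tight mass tolerance.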
Figure 5. 2-D patterns showing the down-regulation of 14-3-3 sigma (indicated by an arrow) in seven representative breast tumor samples (C–I).
Design of experiments
• Previously depicted: “abundance proteomics”, which gives no clues as to how things work.
• “Functional proteomics”: use perturbations of the hypothetical causal factor. Measure not simply abundance but …
Figure 7. Changes of protein synthesis induced by FGF-2 stimulation in MCF-7 breast cancer cells. ³⁵S-labeled proteins from unstimulated (A, C) or stimulated (B, D) MCF-7 cells were separated by 2-DE, and the 2-D gels were subjected to autoradiography.
Credits
• Bruce Luxon (UTMB, Galveston, TX)
• George Weinstock (BCM, Houston, TX)
• Guy de Maupassant
  [“three major virtues of a French writer: clarity, clarity, and clarity”]