Data Acquisition - Broad Institute
Data Acquisition: DNA microarrays
The functional genomics pipeline
Experimental design affects outcome data analysis
• Supervised analysis: differential analysis, classification, …
• Unsupervised analysis: clustering, bi-clustering, …
• Enrichment analysis: GO annotation, GSEA, …
• "In silico" testing: cross validation, train/test, etc.
• "In vitro" testing: back to the lab
• Data acquisition: microarray processing
• Data preprocessing: scaling/normalization/filtering
• Data analysis/Hypothesis generation
• Validation/Annotation
Microarrays
• A technology that has reshaped molecular biology.
• Traditional methods in molecular biology work on a "one gene in one experiment" basis, so the "whole picture" of gene function is hard to obtain.
  – This is a hypothesis testing approach.
• Microarray technology: put thousands of genes on a chip to measure their activity/expression simultaneously.
  – This is a hypothesis generation approach.
• Main technologies: cDNA arrays and oligonucleotide chips.
• "Gene expression levels" are measured in terms of mRNA abundance.
From Tissues to Microarrays
N Engl J Med, 354: 2463, 2006.
cDNA Microarrays
• For each gene, synthesize its sequence and print/spot it on the chip surface.
• The labeled probes are allowed to bind to complementary DNA strands (targets) on the slides.
• Examination of the fluorescence at each probe tells us which gene is present in which sample.
• Nomenclature: the terms "probe" and "target" are interchanged between cDNA and oligo arrays.
Steps shown in the figure: prepare cDNA targets; hybridize targets to the microarray.
Image
1. red dot: the gene is expressed in the treated sample but not in the control;
2. green dot: the gene is expressed in the control but not in the treated sample;
3. yellow dot: the gene is expressed in both samples;
4. grey dot: the gene is expressed in neither sample.
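The four dot colors above can be read as a simple decision rule on the two channel intensities. The following is a minimal sketch, not the procedure of any particular image-analysis package: the function name `spot_color`, the idea of a single intensity cutoff, and the cutoff value of 200 are all assumptions made for illustration.

```python
def spot_color(treated: float, control: float, threshold: float = 200.0) -> str:
    """Map two channel intensities to the dot color described in the list above.

    Assumption: a gene counts as "expressed" in a sample when its intensity
    in that channel exceeds `threshold` (a hypothetical cutoff).
    """
    in_treated = treated > threshold
    in_control = control > threshold
    if in_treated and in_control:
        return "yellow"  # expressed in both samples
    if in_treated:
        return "red"     # expressed in the treated sample only
    if in_control:
        return "green"   # expressed in the control only
    return "grey"        # expressed in neither sample


# Hypothetical intensities for four spots: (treated, control).
for spot in [(1500.0, 40.0), (30.0, 900.0), (1200.0, 1100.0), (20.0, 35.0)]:
    print(spot, spot_color(*spot))
```

In practice the two intensities are usually kept as numbers (see the ratios in the next slide) instead of being reduced to four categories; the color rule is only a reading aid for the image.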
From Image to Data
One microarray gives measures for genes in two conditions (not necessarily paired): a reference sample (healthy cell) and a sample of interest (tumor cell). Rows are genes; columns are samples.

Gene name                 Healthy   Tumor
transcription terminat    0.72      0.1
selenoprotein P plasm     1.58      1.05
Hs protein (peptidyl-p    1.1       0.97
erythrocyte membran       0.97      1
acid-inducible phosph     1.21      1.29
ESTs                      1.45      1.44
lumican                   1.15      1.1
cathepsin K (pycnody      1.32      1.35
carnitine palmitoyltra    1.01      1.38
KIAA0455 gene prod        0.85      1.03
ESTs                      1.12      0.92
ribosomal protein L5      1.23      1.21

Gene name                 Ratio (Tumor/Healthy)
transcription terminati   0.14
selenoprotein P plasm     0.66
Hs protein (peptidyl-p    0.88
erythrocyte membran       1.03
acid-inducible phosph     1.07
ESTs                      0.99
lumican                   0.96
cathepsin K (pycnody      1.02
carnitine palmitoyltran   1.37
KIAA0455 gene produ       1.21
ESTs                      0.82
ribosomal protein L5      0.98
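The ratio values in this slide are consistent with dividing the Tumor measurement by the Healthy measurement for each gene. A short sketch of that arithmetic follows, using three rows copied from the table (gene names are left truncated as they appear). The log2 transform at the end is an addition for illustration, not something the slide shows; it is commonly used because it makes up- and down-regulation symmetric around zero.

```python
import math

# (healthy, tumor) measurements copied from three rows of the table.
expression = {
    "transcription terminat": (0.72, 0.1),
    "selenoprotein P plasm": (1.58, 1.05),
    "carnitine palmitoyltra": (1.01, 1.38),
}


def tumor_to_healthy_ratio(healthy: float, tumor: float) -> float:
    """Ratio of the sample of interest to the reference sample."""
    return tumor / healthy


ratios = {
    gene: round(tumor_to_healthy_ratio(healthy, tumor), 2)
    for gene, (healthy, tumor) in expression.items()
}

# Assumption (not in the slide): log2 ratios, negative when the gene is
# lower in tumor than in healthy, positive when it is higher.
log2_ratios = {
    gene: math.log2(tumor_to_healthy_ratio(healthy, tumor))
    for gene, (healthy, tumor) in expression.items()
}

for gene in expression:
    print(f"{gene:26s} ratio={ratios[gene]:.2f} log2={log2_ratios[gene]:+.2f}")
```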
Oligonucleotide Microarrays
• Oligonucleotide arrays: Affymetrix GeneChip.
• Represent a gene with a set of 11-20 probe pairs:
  – Each probe (oligonucleotide) is a 25-base sequence characteristic of one gene.
• Each probe pair consists of:
  – Perfect match (PM): a probe that should hybridize.
  – Mismatch (MM): a probe that should not hybridize, because the central base has been replaced by its complement (internal control).
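The probe-pair design above can be sketched in a few lines. This is an illustration under stated assumptions: the 25-mer is a made-up sequence, "inverted" is read as "replaced by the complementary base", and the `average_difference` summary (mean of PM minus MM over a probe set) is just one simple way to combine probe pairs, not a method the slides prescribe.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match: str) -> str:
    """Build the MM probe by complementing the central base of the PM probe."""
    middle = len(perfect_match) // 2  # index 12 (the 13th base) of a 25-mer
    return (
        perfect_match[:middle]
        + COMPLEMENT[perfect_match[middle]]
        + perfect_match[middle + 1:]
    )


def average_difference(pm_intensities, mm_intensities):
    """Mean of PM - MM over the probe pairs of one gene (illustrative summary)."""
    diffs = [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(diffs) / len(diffs)


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-base probe
mm = mismatch_probe(pm)
print(pm)
print(mm)

# Hypothetical intensities for three probe pairs of one gene.
print(average_difference([1200.0, 900.0, 1500.0], [200.0, 300.0, 400.0]))
```

The preprocessing methods compared later in the deck differ mainly in how they replace this naive subtraction and averaging with more robust models.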
– Irizarry et al., Nat Meth 2(5): 1-5
– Larkin et al., Nat Meth 2(5): 337-343
• Multi-lab comparison of three platforms
  – 1-color short oligos (Affy)
  – 2-color cDNA
  – 2-color long oligos
• Results
  – Affymetrix accuracy best overall.
  – Precision comparable among platforms.
  – Lab effect stronger than platform effect.
  – Reproducibility across labs and platforms is good, although not perfect.
ChipMan: A multiplicative model similar to that of dChip is fit to the PM; Linear transformation (Lauren, 2003)
dChip: A multiplicative model is fit; MM intensities are subtracted; Spline fitted to rank invariant set (Li and Wong, 2001)
GL: As RMA; None; Loess fitted to subset (Freudenberg, 2005)
gMOSv.1: Parameters from a gamma model are estimated from the PM and MM. These account for background and signal (Milo et al., 2003)
GCRMA: As RMA; Based on probe sequence; As RMA (Wu et al., 2004)
GSVDmod: Generalized SVD is used; None; Scale normalization (Zuzan, 2003)
MAS5.0: A robust average (Tukey biweight); Spatial effect and MM subtracted; Scale normalization (Affymetrix, 2002)
MMEI: A linear mixed model is fitted; None; Linear mixed model used as well (Deng et al., 2005)
PerfectMatch: Model accounts for background and signal. The non-specific and specific effects are predicted using a free energy model (Zhang et al., 2003)
PLIER: A multiplicative model is fitted to PM-MM. Accounts for heteroskedasticity; As RMA (Hubbell et al., 2004)
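The MAS5.0 entry above mentions a robust average (Tukey biweight). As a rough illustration of what such an average does, here is a one-step Tukey biweight in plain Python. It is a sketch, not the Affymetrix implementation: the tuning constant `c = 5`, the small `epsilon` guard, and the example values are assumptions, and the input is taken to be a list of already background-adjusted probe values.

```python
import statistics


def tukey_biweight(values, c: float = 5.0, epsilon: float = 1e-4) -> float:
    """One-step Tukey biweight: a weighted mean that down-weights outliers.

    Each value is weighted by its distance from the median, measured in
    units of the median absolute deviation (MAD). Values far from the
    median get weight zero.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])

    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - center) / (c * mad + epsilon)  # scaled distance from the median
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical probe-level values with one outlier.
probe_values = [10.0, 11.0, 12.0, 13.0, 100.0]
print("mean:    ", statistics.mean(probe_values))
print("biweight:", tukey_biweight(probe_values))
```

With these example values the outlier at 100 receives zero weight, so the robust average stays near the bulk of the data while the ordinary mean is pulled upward.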