Normalization for cDNA Microarray Data
Yee Hwa Yang, Sandrine Dudoit, Percy Luu and Terry Speed.
SPIE BIOS 2001, San Jose, CA
January 22, 2001
Normalization issues
Within-slide
– What genes to use
– Location
– Scale
Paired-slides (dye swap)
– Self-normalization
Between slides
Within-Slide Normalization
— Normalization balances red and green intensities.
— Imbalances can be caused by
– Different incorporation of dyes
– Different amounts of mRNA
– Different scanning parameters
— In practice, we usually need to increase the red intensity a bit to balance the green.
Methods?
log2(R/G) -> log2(R/G) - c = log2(R/(kG)), where c = log2(k)
Standard Practice (in most software)
c is a constant such that normalized log-ratios have zero mean or median.
Our Preference:
c is a function of overall spot intensity and print-tip-group.
What genes to use?
— All genes on the array
— Constantly expressed genes (housekeeping)
— Controls
– Spiked controls (e.g. plant genes)
– Genomic DNA titration series
— Other set of genes
Experiment
KO #8
Probes: ~6,000 cDNAs, including 200 related to lipid metabolism.
mRNA samples
R = Apo A1 KO mouse liver
G = Control mouse liver
(All C57Bl/6)
M vs. A
M = log2(R / G)
A = log2(R*G) / 2
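The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `ma_values` and the toy intensities are assumptions, not taken from the slides, and all intensities are assumed to be strictly positive.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities (all > 0)."""
    # M is the log-ratio, A is the average log-intensity of the spot.
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [math.log2(r * g) / 2 for r, g in zip(red, green)]
    return m, a

# Hypothetical intensities for three spots.
red = [1024.0, 512.0, 4096.0]
green = [256.0, 512.0, 1024.0]
m, a = ma_values(red, green)
print(m)  # log-ratios
print(a)  # average log-intensities
```

Plotting M against A rotates the usual log2(R) vs. log2(G) scatter by 45 degrees, which makes intensity-dependent dye bias easier to see.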
Normalization - Median
—Assumption: Changes roughly symmetric
—First panel: smooth density of log2G and log2R.
—Second panel: M vs. A plot with median set to zero
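A minimal sketch of median normalization, assuming the log-ratios M have already been computed. The name `median_normalize` and the sample values are illustrative only.

```python
import statistics

def median_normalize(m_values):
    """Subtract the median log-ratio so the normalized values have median zero."""
    c = statistics.median(m_values)       # the constant c
    return [m - c for m in m_values], c

# Hypothetical log-ratios for five spots.
m = [0.5, 0.25, 1.5, -0.25, 0.75]
normalized, c = median_normalize(m)
k = 2 ** c   # equivalent rescaling of the green channel: log2(R/(kG))
```

Because c is a single constant, this shifts the whole M vs. A cloud up or down without changing its shape.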
Normalization - lowess
— Global lowess
— Assumption: changes roughly symmetric at all intensities.
Normalisation - print-tip-group
Assumption: For every print-tip-group, changes roughly symmetric at all intensities.
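The idea of making c a function of A within each print-tip-group can be sketched as follows. This is a simplified stand-in, not the lowess routine used for the slides: it is a single-pass, tricube-weighted local linear fit without robustness iterations, written in plain Python. The function names, the `frac=0.4` span, and the toy data are all assumptions.

```python
from collections import defaultdict

def local_linear_fit(x, y, frac=0.4):
    """Tricube-weighted local linear fit of y on x, evaluated at each x[i]."""
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))    # points used per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xj - x0) for xj in x)
        h = dists[k - 1] or 1e-12               # bandwidth: k-th nearest distance
        w = [max(0.0, 1 - (abs(xj - x0) / h) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def print_tip_normalize(m, a, tip, frac=0.4):
    """Subtract a separate smooth curve c(A) from M within each print-tip-group."""
    groups = defaultdict(list)
    for idx, t in enumerate(tip):
        groups[t].append(idx)
    out = list(m)
    for idxs in groups.values():
        fit = local_linear_fit([a[i] for i in idxs], [m[i] for i in idxs], frac)
        for i, c in zip(idxs, fit):
            out[i] = m[i] - c
    return out

# Toy data: two print-tip-groups, each with its own linear dye bias in A.
a = [8 + 0.5 * i for i in range(12)] * 2
tip = [1] * 12 + [2] * 12
m = [0.5 + 0.1 * ai for ai in a[:12]] + [-0.3 - 0.2 * ai for ai in a[12:]]
norm = print_tip_normalize(m, a, tip)
```

Passing a single group label for every spot gives the global version of the same fit. On real data a robust lowess implementation would be preferable, so that genuinely changed genes do not pull the curve.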
M vs. A - after print-tip-group normalization
Effects of Location Normalisation
Panels: before normalisation; after print-tip-group normalisation
Within print-tip-group box plots for print-tip-group normalized M
Taking scale into account
Assumptions:
– All print-tip-groups have the same spread.
True log-ratio is $\mu_{ij}$, where i indexes print-tip-groups and j indexes spots.
Observed is $M_{ij}$, where
$M_{ij} = a_i \mu_{ij}$
Robust estimate of $a_i$ is
$\hat{a}_i = \mathrm{MAD}_i / \sqrt[I]{\prod_{k=1}^{I} \mathrm{MAD}_k}$
where
$\mathrm{MAD}_i = \mathrm{median}_j \{ |M_{ij} - \mathrm{median}_j(M_{ij})| \}$
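A sketch of the scale adjustment, under the assumption that each print-tip-group's MAD is divided by the geometric mean of all the groups' MADs and the log-ratios are then divided by the resulting factor. It also assumes location normalization has already been applied and that every group has a non-zero MAD. The names `mad` and `scale_normalize` and the data are hypothetical.

```python
import math
import statistics
from collections import defaultdict

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def scale_normalize(m, tip):
    """Rescale M so every print-tip-group has the same MAD (their geometric mean)."""
    groups = defaultdict(list)
    for value, t in zip(m, tip):
        groups[t].append(value)
    mads = {t: mad(vals) for t, vals in groups.items()}
    # Geometric mean of the group MADs, computed on the log scale.
    geo_mean = math.exp(sum(math.log(v) for v in mads.values()) / len(mads))
    a_hat = {t: mads[t] / geo_mean for t in mads}
    return [value / a_hat[t] for value, t in zip(m, tip)], a_hat

# Toy data: group 2 has four times the spread of group 1.
m = [-1.0, -0.5, 0.0, 0.5, 1.0, -4.0, -2.0, 0.0, 2.0, 4.0]
tip = [1] * 5 + [2] * 5
scaled, a_hat = scale_normalize(m, tip)
```

Using the geometric mean keeps the product of the scale factors equal to one, so the overall spread of the slide is left roughly unchanged.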
Effect of location + scale normalization
Comparing different normalisation methods
Follow-up Experiment
— 50 distinct clones with largest absolute t-statistics from the first experiment.
— 72 other clones.
— Spot each clone 8 times.
— Two hybridizations:
Slide 1: treatment (ttt) -> red, control (ctl) -> green.
Slide 2: treatment (ttt) -> green, control (ctl) -> red.
Paired-slides: dye swap
— Slide 1, M = log2 (R/G) - c
— Slide 2, M’ = log2 (R’/G’) - c’
Combine by subtracting the normalized log-ratios:
[ (log2 (R/G) - c) - (log2 (R’/G’) - c’) ] / 2
= [ log2 (R/G) + log2 (G’/R’) ] / 2
= [ log2 (RG’/GR’) ] / 2
provided c = c’.
Assumption: the separate normalizations are the same.
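The cancellation of c can be checked numerically. In this sketch the function name `self_normalize` and the intensities are made up, and the red channel is assumed to read twice too high on both slides (so c = c’ = 1).

```python
import math

def self_normalize(r1, g1, r2, g2):
    """Half the difference of the raw log-ratios from a dye-swap pair."""
    return [
        (math.log2(ra / ga) - math.log2(rb / gb)) / 2
        for ra, ga, rb, gb in zip(r1, g1, r2, g2)
    ]

# Spot 1: treatment is 4x control. Spot 2: no change.
# Slide 1 has treatment in red; slide 2 has treatment in green.
# Red readings are doubled on both slides to mimic a common dye bias.
r1, g1 = [800.0, 200.0], [100.0, 100.0]
r2, g2 = [200.0, 200.0], [400.0, 100.0]
result = self_normalize(r1, g1, r2, g2)
print(result)  # the bias cancels, leaving the true log2 fold changes
```

No explicit normalization constant is estimated here; the method relies entirely on the bias being the same on the two slides.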
Verify Assumption
Result of Self-Normalization
Plot of (M - M’)/2 vs. (A + A’)/2
Summary
Case 1: A few genes that are likely to change
Within-slide:
– Location: print-tip-group lowess normalization.
– Scale: for all print-tip-groups, adjust MAD to equal the geometric mean of the MADs for all print-tip-groups.
Between slides (experiments):
– An extension of within-slide scale normalization (future work).
Case 2: Many genes changing (paired-slides)
– Self-normalization: taking the difference of the two log-ratios.
– Check using controls or known information.
http://www.stat.berkeley.edu/users/terry/zarray/Html/
Technical Reports from Terry’s group:
http://www.stat.Berkeley.EDU/users/terry/zarray/Html/papersindex.html
— Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data
— Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments.
— Comparison of methods for image analysis on cDNA microarray data.
— Normalization for cDNA Microarray Data
Statistical software R
http://lib.stat.cmu.edu/R/CRAN/
Acknowledgments
Terry Speed
Sandrine Dudoit
Natalie Roberts
Ben Bolstad
Matt Callow (LBL)
John Ngai’s Lab (UCB)
Percy Luu
Dave Lin
Vivian Pang
Elva Diaz