Techniques for Analysing Microarrays
Which genes are involved in ovarian and prostate cancer?
Common Questions
(1) Which genes are “up” or “down” in different conditions?
• Cancer patient versus normal
• Non-invasive cancer versus invasive cancer
(2) Which genes can differentiate between cancer sub-types?
(3) Which genes relate to the survival of the patient?
(4) Which genes may be in the same pathway as a gene of interest?
EOS chips
• Use Affymetrix GeneChip technology
• 25mers
• 8 probes in a probe set
• 59,000 probe sets ~ 46,000 gene clusters (all human expressed sequences known at the time)
• Normalised distributions of all chips to each other (gamma distribution)
• Single measure of intensity for each probe set (Tukey’s trimean)
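As a rough illustration of the probe-set summary, Tukey's trimean is (Q1 + 2 × median + Q3) / 4. The sketch below assumes the "inclusive" quartile convention of Python's `statistics.quantiles`; the quartile convention actually used for the EOS chips is not stated here, and the eight probe intensities are made-up values.

```python
import statistics


def tukey_trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4.

    Quartiles use the 'inclusive' method; this is an assumption,
    other quartile conventions give slightly different answers.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * median + q3) / 4


# Hypothetical intensities for the 8 probes of one probe set,
# with one outlying probe (900).
probe_intensities = [120, 135, 150, 160, 170, 185, 200, 900]

summary = tukey_trimean(probe_intensities)
plain_mean = statistics.mean(probe_intensities)
print(summary, plain_mean)
```

The outlying probe pulls the plain mean well above the bulk of the probes, while the trimean stays close to the middle of the distribution.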
Data after “normalisation”: variance increases with mean
After the “fix” (add a constant and take log2)
[Figure: two plots of variance against mean for each probe set, one with variance on a linear scale and one with variance on a log scale.]
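The “fix” can be sketched as a started-log transform. The value of the constant is not given on the slide, so `constant=16.0` below is purely a placeholder assumption.

```python
import math


def started_log2(intensity, constant=16.0):
    """Add a constant, then take log2.

    The default constant is a hypothetical value for illustration,
    not the one used for the EOS data.
    """
    return math.log2(intensity + constant)


# Hypothetical normalised intensities spanning a wide range.
intensities = [0.0, 48.0, 240.0, 4080.0]
transformed = [started_log2(x) for x in intensities]
print(transformed)
```

On the log scale a fixed multiplicative spread becomes a fixed additive spread, which is why the variance depends less strongly on the mean after the transform.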
Which genes are differentially expressed between
ovarian cancer and normal ovaries?
• 6 normal ovaries
• 38 ovarian cancers
  o 3 mucinous
  o 5 endometrioid
  o 30 serous
Statistical techniques
•ranked t-statistics (unequal variance)
•quantile-quantile plots against normal distribution
•Westfall and Young permutation test
http://stat-www.berkeley.edu/users/terry/zarray/Html/
S. Dudoit, Y.H. Yang, M.J. Callow and T.P. Speed. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. August 2000.
•Ratios of Cancer/Normal
t statistic
$$t = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
The t-statistic gets more extreme as:
• the difference in means increases
• the standard deviation of each of the two samples decreases
• the size of the samples increases
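A minimal sketch of the unequal-variance t-statistic and the ranking step, using only the Python standard library. The sign convention (group 2 minus group 1) is a choice made here, and the gene names and expression values are invented for illustration.

```python
import math
import statistics


def welch_t(sample_1, sample_2):
    """Unequal-variance t-statistic: (mean_2 - mean_1) / standard error."""
    n1, n2 = len(sample_1), len(sample_2)
    standard_error = math.sqrt(
        statistics.variance(sample_1) / n1 + statistics.variance(sample_2) / n2
    )
    return (statistics.mean(sample_2) - statistics.mean(sample_1)) / standard_error


# Hypothetical log2 expression values: (normal samples, cancer samples).
genes = {
    "gene_up": ([1.0, 2.0, 3.0], [4.0, 6.0, 8.0]),
    "gene_flat": ([5.0, 6.0, 7.0], [5.5, 6.0, 6.5]),
    "gene_down": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
}

t_stats = {name: welch_t(normal, cancer) for name, (normal, cancer) in genes.items()}

# Rank from most negative (low in cancer) to most positive (high in cancer).
ranked = sorted(t_stats, key=t_stats.get)
print(ranked)
```

Shrinking the within-group spread or adding samples reduces the standard error in the denominator, which makes the statistic more extreme for the same difference in means.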
[Figure: t-statistics ranked from negative (-ve) through 0 to positive (+ve).]
Quantile-Quantile Plot
R library(sma) or R library(base)
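For readers without R to hand, the points of a normal quantile-quantile plot can be sketched as below. The plotting positions (i + 0.5) / n are one common choice and an assumption here; R's own functions use a slightly different rule for small n. The t-statistics are invented values.

```python
from statistics import NormalDist


def normal_qq_points(t_stats):
    """Pair each ordered t-statistic with a standard normal quantile."""
    n = len(t_stats)
    ordered = sorted(t_stats)
    standard_normal = NormalDist()
    # Plotting positions (i + 0.5) / n: an assumed, commonly used convention.
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical t-statistics for five genes.
points = normal_qq_points([0.3, -1.2, 6.5, 0.9, -0.4])
for theoretical_q, observed_t in points:
    print(f"{theoretical_q:6.2f}  {observed_t:6.2f}")
```

Genes whose observed t-statistic lies far from the straight line through the bulk of the points are the candidates for differential expression.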
Westfall and Young Permutation
tpWY program: http://www.cbil.upenn.edu/tpWY/
6 normal ovaries, 38 ovarian cancers
• Randomise labels (OvCa, N)
• Compute tstats
• 100,000 iterations
• Unadjusted p value: proportion of iterations where $|t_{\text{iteration}}| \ge |t|$
• p value adjusted for multiple testing
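The permutation steps can be sketched as follows. This is not the tpWY program: the adjusted p values here use a single-step "max |t|" rule, a simpler relative of the step-down procedure of Westfall and Young, and the data, iteration count and seed are all illustrative assumptions.

```python
import math
import random
import statistics


def welch_t(group_a, group_b):
    """Unequal-variance t-statistic (group_b minus group_a)."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error


def t_per_gene(expression, labels):
    """expression: one row per gene; labels: 'N' or 'OvCa' for each sample."""
    stats = []
    for row in expression:
        normal = [v for v, lab in zip(row, labels) if lab == "N"]
        cancer = [v for v, lab in zip(row, labels) if lab == "OvCa"]
        stats.append(welch_t(normal, cancer))
    return stats


def permutation_p_values(expression, labels, iterations=1000, seed=0):
    """Return (unadjusted, adjusted) p values, one pair of entries per gene."""
    rng = random.Random(seed)
    observed = [abs(t) for t in t_per_gene(expression, labels)]
    unadjusted_hits = [0] * len(observed)
    adjusted_hits = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(iterations):
        rng.shuffle(shuffled)  # randomise the labels
        permuted = [abs(t) for t in t_per_gene(expression, shuffled)]
        max_permuted = max(permuted)
        for g, t_obs in enumerate(observed):
            # Unadjusted: compare with the same gene's permuted statistic.
            unadjusted_hits[g] += permuted[g] >= t_obs
            # Adjusted (single-step): compare with the largest statistic of any gene.
            adjusted_hits[g] += max_permuted >= t_obs
    unadjusted = [h / iterations for h in unadjusted_hits]
    adjusted = [h / iterations for h in adjusted_hits]
    return unadjusted, adjusted


# Hypothetical data: 3 genes, 4 normal and 4 cancer samples.
labels = ["N"] * 4 + ["OvCa"] * 4
expression = [
    [5.1, 5.3, 4.9, 5.2, 9.0, 9.4, 8.8, 9.1],  # clearly higher in cancer
    [7.0, 7.4, 6.8, 7.3, 7.9, 7.2, 8.1, 7.6],  # modest difference
    [6.0, 7.1, 6.4, 6.9, 6.2, 7.0, 6.5, 6.8],  # no real difference
]
unadjusted, adjusted = permutation_p_values(expression, labels)
print(unadjusted, adjusted)
```

Because the largest permuted statistic across genes is never smaller than any single gene's permuted statistic, each adjusted p value is at least as large as the corresponding unadjusted one.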
How many genes were “statistically” significant?
Ovarian Cancer > Normal (Candidates for antibody therapy?)
• 110 candidates (adjusted p < 0.01)
• 181 candidates (adjusted p < 0.05)
Ovarian Cancer < Normal (Candidates for tumor suppressor genes?)
• 7 candidates (adjusted p < 0.01)
• 15 candidates (adjusted p < 0.05)
[Figures (Excel): “High in cancer” and “Low in cancer”.]
How can we deal with
(a) Biological variation?
(b) More than one cause for cancer?
Which genes are differentially expressed between
non-invasive and invasive ovarian cancer?
No. samples
               Non-invasive   Invasive
Mucinous            5             4
Endometrioid        1             7
Serous              2            33
Future: Model all variables together
Now: ranked t-stats, qqplots
Assume equal variance for t-stats?
[Figure: QQ plot of the ratio of variances, S² non-invasive (n=5) / S² invasive (n=4), against theoretical quantiles of the F distribution, e.g. mucinous cancer.]
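The quantity on the vertical axis of such a plot can be sketched per gene as below. If the data are normal and the two groups share a common variance, the ratio follows an F distribution with (n1 - 1, n2 - 1) degrees of freedom, which is (4, 3) for the mucinous comparison. The expression values are invented.

```python
import statistics


def variance_ratio(non_invasive, invasive):
    """Ratio of sample variances and its F degrees of freedom."""
    ratio = statistics.variance(non_invasive) / statistics.variance(invasive)
    degrees_of_freedom = (len(non_invasive) - 1, len(invasive) - 1)
    return ratio, degrees_of_freedom


# Hypothetical expression values for one gene:
# 5 non-invasive and 4 invasive mucinous samples.
ratio, dof = variance_ratio([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0])
print(ratio, dof)
```

Sorting these ratios over all genes and plotting them against F quantiles gives a visual check on whether assuming equal variance is reasonable.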
What to do when n=2?
Assume equal variance?
Error model?
Limitations of Westfall & Young permutation method
No. samples and no. permutations
               Non-invasive   Invasive   No. permut.
Mucinous            5             4          126
Endometrioid        1             7          ---
Serous              2            33          595
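The permutation counts in the table match the number of distinct ways to assign the group labels, a binomial coefficient. The short check below also shows the smallest unadjusted p value a permutation test could report under the simplifying assumption that it equals one over the number of distinct labelings.

```python
from math import comb


def distinct_labelings(n_non_invasive, n_invasive):
    """Number of distinct ways to choose which samples are 'non-invasive'."""
    return comb(n_non_invasive + n_invasive, n_non_invasive)


mucinous = distinct_labelings(5, 4)
serous = distinct_labelings(2, 33)
print(mucinous, serous)       # counts of distinct labelings
print(1 / mucinous, 1 / serous)  # smallest attainable p under the assumption above
```

With a single non-invasive endometrioid sample the within-group variance cannot be estimated at all, so no t-statistic is available for that row.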
Not enough power when sample sizes are small?
Mucinous: non-invasive versus invasive
R library(base)
Which genes relate to prognosis of patients with prostate cancer?
Methods: R survival package & SAS
• 72 patients with prostate cancer
• Treatment: Radical prostatectomy
• 17 relapsed: PSA rise >0.4ng/ml
Cox Proportional Hazards Model
$$h(t, X) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$$
Baseline hazard $h_0(t)$: independent of gene expression or PSA
Exponential $e^{\sum_{i=1}^{p} \beta_i X_i}$: involves gene & PSA, independent of time
Survival Curves: Gene + PSA model
[Figure: survival curves S(t) against time (disease-free months) for probe sets A and B, comparing High (>= 25th percentile) with Low (< 25th percentile); one panel is labelled “relapsed”.]
Probe set   Hazard ratio (95% CI)     Unadjusted p value
A           0.26 (0.12 to 0.54)       0.000351
B           0.32 (0.16 to 0.67)       0.002151
* False discovery rate for top 50 candidates is 20% (SAM)
Hazard Ratio: 75th/25th percentile
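To make the 75th/25th percentile hazard ratio concrete: in a proportional hazards model the baseline hazard cancels, so for one covariate the ratio is exp(beta × (x75 − x25)). The coefficient and percentile values below are hypothetical, chosen only to show the arithmetic; they are not the fitted values behind the table.

```python
import math


def hazard(baseline_hazard, betas, covariates):
    """h(t, X) = h0(t) * exp(sum_i beta_i * X_i) at one fixed time t."""
    linear_predictor = sum(b * x for b, x in zip(betas, covariates))
    return baseline_hazard * math.exp(linear_predictor)


def hazard_ratio(beta, x_high, x_low):
    """Hazard ratio between two values of a single covariate."""
    return math.exp(beta * (x_high - x_low))


# Hypothetical numbers: gene coefficient and expression percentiles.
beta_gene = -0.9
x_75th, x_25th = 9.0, 7.5

ratio = hazard_ratio(beta_gene, x_75th, x_25th)

# The same ratio from the full model, with PSA held fixed; the baseline
# hazard (0.02 here, arbitrary) and the PSA term cancel.
betas = [beta_gene, 0.3]
high = hazard(0.02, betas, [x_75th, 1.2])
low = hazard(0.02, betas, [x_25th, 1.2])
print(ratio, high / low)
```

A ratio below 1 means patients at the 75th percentile of expression have a lower relapse hazard than patients at the 25th percentile.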
Summary
(1) Which genes are “up” or “down” in different conditions?
- ranked t-statistics
- qq plots (normal distribution)
- Westfall & Young permutations (multiple testing)
(2) Which genes relate to the survival of the patient?
- Cox proportional hazards
- SAM multiple testing
Acknowledgements
• Garvan
  – Sue Henshall, Rob Sutherland, Patricia Vanden Bergh
• EOS
  – Jordan Hiller, Daniel Afar, Kurt Gish, David Mack
• Royal Hospital for Women
  – Nigel Hacker
• ANU/John Curtin
  – John Maindonald
  – Yvonne Pittelkow
• Walter and Eliza Hall Institute
  – Terry Speed, Natalie Thorne
• University of Queensland
  – Jessica Marr