Analyzing Cross-Platform Consistency Using Tests Against Ordered Alternatives
CAMDA Emerald Competition
Florian Klinglmueller (1), Thomas Tuechler (2)
(1) Core Unit for Medical Statistics and Informatics, Medical University of Vienna
(2) WWTF Chair for Bioinformatics, BOKU University
05.12.2008 / CAMDA@Boku University
- Compute one-sided permutation test p-values for each animal, on each platform separately, with quantile- and baseline-normalized data.
- Combine the per-animal tests from each platform.
- Combine the per-platform tests from each normalization.
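The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the talk: the trend statistic (sum of dose times value), the function names, and the use of Fisher's method for combining p-values are assumptions. The slides do not state which combination rule was used, and Fisher's method additionally assumes independent tests.

```python
import math
import random


def one_sided_perm_pvalue(values, doses, n_perm=2000, seed=0):
    """One-sided permutation p-value for an increasing trend.

    Statistic (an assumption for illustration): sum of dose * value,
    which grows when higher doses carry higher values.
    """
    rng = random.Random(seed)

    def stat(vals):
        return sum(d * v for d, v in zip(doses, vals))

    observed = stat(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between dose and value
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under H0."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals the statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom.
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


if __name__ == "__main__":
    doses = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]  # hypothetical coding of L, M1, M2, K
    values = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 3.0, 3.1, 2.9, 4.0, 4.1, 3.9]
    p = one_sided_perm_pvalue(values, doses)
    print(p, fisher_combine([p, 0.2, 0.03]))
```

The add-one correction matters for the combination step: a permutation p-value of exactly zero would make the logarithm undefined.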
Results
Finally!
Exploratory Analysis
Distribution of Group Means on Raw Data
[Figure: distribution of group means on raw data for the groups L, M1, M2, and K, one panel each for Affymetrix, Agilent, and Illumina; vertical axis from 0 to 20.]
- Location shift
- Higher messenger-RNA content in kidney?
- Both normalization methods remove any visible trends in location
  - Baseline
  - Quantile: also in scale
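Why quantile normalization also removes differences in scale can be seen from a small sketch: every array is forced onto the same reference distribution. The function name and the tie handling below are illustrative assumptions, not the implementation used for the competition data.

```python
def quantile_normalize(columns):
    """Quantile-normalize equally long columns (one list per array).

    Each value is replaced by the mean, across arrays, of the values
    sharing its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest values across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by increasing value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # toy intensities, two arrays
    print(quantile_normalize(arrays))
```

After this step all arrays contain the same set of values, so location and scale agree by construction; only the ordering of genes within each array is retained.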
Exploration of Trend
Relationship between Increases
[Figure: schematic trend across the groups L, M1, M2, and K.]
- Relationship between first/second increase
- Scatterplot, Illumina: trends are not linear; when the first increase is large, the last increase is small, and vice versa
- Scatterplot, Agilent
- Scatterplot, Affymetrix
- Rightmost point
- Lowest point
- Saturation?
[Figures: scatterplots with axes M1−L and K−M2, shown in turn for Illumina, Agilent, and Affymetrix.]
[Figures: mean expression against mixture (L, M1, M2, K) for NM_052802 and for NM_022519, each labeled "Maximum Mean Expression".]
Test Setup
Settings
- R package IsoGene provided by Lin et al. [2]
- 20000 permutations (1 week on cluster)
- 2 normalization methods × 3 platforms × 6 animals
- 6111 well-annotated genes available on all platforms
- Remove one animal from the Illumina data
- Family-wise error: Bonferroni-Holm
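The Bonferroni-Holm step-down adjustment named above can be sketched as follows. The function name is hypothetical, and this is a plain textbook version rather than the code that was run on the cluster.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    print(holm_adjust([0.01, 0.04, 0.03]))
```

A gene is then declared significant at family-wise level α when its adjusted p-value is at most α.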
Proportions of Significant Genes
General Overview
[Figure: percentage of genes (0 to 100) classified as up, down, or none on Illumina, Agilent, and Affymetrix.]
- Baseline
- Quantile
Agreement Between Platforms
Number of Genes
[Figure: agreement for the pairs Affy−Agil, Affy−Illu, Agil−Illu, and for all platforms, on a scale from 0 to 100, under baseline and quantile normalization.]
- Fleiss' κ-coefficient: agreement across platforms using family-wise error adjusted combined p-values
  - Quantile normalisation: 0.52
  - Baseline normalisation: 0.37
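For reference, Fleiss' κ can be computed as sketched below, treating each platform as a rater that labels every gene as up, down, or none. The function name and the toy labels are assumptions for illustration; how the labels were derived in the talk is not shown on the slides.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for subjects each rated by the same number of raters.

    ratings: list of per-subject label lists, e.g. one label per platform.
    Undefined (division by zero) if every rating falls in a single category.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    totals = {c: 0 for c in categories}
    agreement_sum = 0.0
    for labels in ratings:
        counts = {c: labels.count(c) for c in categories}
        for c in categories:
            totals[c] += counts[c]
        # Proportion of agreeing rater pairs for this subject.
        agreement_sum += (sum(n * n for n in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_subjects  # observed agreement
    p_exp = sum((totals[c] / (n_subjects * n_raters)) ** 2 for c in categories)
    return (p_bar - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    genes = [
        ["up", "up", "up"],        # all three platforms agree
        ["none", "none", "down"],  # one platform disagrees
        ["none", "none", "none"],
    ]
    print(fleiss_kappa(genes, ["up", "down", "none"]))
```

Values near 0 indicate agreement at chance level and 1 indicates perfect agreement.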
Agreement Between Normalizations
Number of Genes Significant
[Figure: Venn diagram of significant genes under quantile and baseline normalization; region counts 711, 1070, 520, and 3810.]
Fleiss' κ-coefficient: 0.57
- Around 2 times more significant genes are exclusive to baseline than to quantile normalized data
- More than 97% of genes exclusive to baseline normalized data are upregulated
- Up:down ratio among genes exclusive to quantile normalized data is 40:60
Summary
Results
Data
- A substantial number of genes show significant monotonicity
- Across-platform agreement exceeds chance levels
- Agreement on baseline normalized data is worse
- Baseline normalized data shows more upward trends: incomplete removal of the total/messenger-RNA effect
- Genes exclusively significant in baseline data mostly show upward trends
Methods
- Isotonic regression as a means to detect monotonic trends
- p-value combination as a means to compare results from different platforms
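Isotonic regression for a simple order such as L ≤ M1 ≤ M2 ≤ K can be computed with the pool-adjacent-violators algorithm. The sketch below is a generic textbook version with hypothetical names, not the IsoGene implementation.

```python
def isotonic_fit(means, weights=None):
    """Non-decreasing least-squares fit via pool-adjacent-violators (PAVA)."""
    if weights is None:
        weights = [1.0] * len(means)
    # Each block holds [weighted mean, total weight, number of pooled points].
    blocks = []
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


if __name__ == "__main__":
    group_means = [1.0, 3.0, 2.0, 4.0]  # toy means for L, M1, M2, K
    print(isotonic_fit(group_means))
```

A decreasing trend can be fitted with the same function by negating the input and the output. Group sizes can be passed as weights when the groups are unbalanced.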
Thanks
- MSI - Martin Posch
- Statistic - Univie: Cluster
References
[1] Richard E. Barlow. Statistical Inference Under Order Restrictions. John Wiley and Sons Ltd, 1972.
[2] D. Lin, Z. Shkedy, D. Yekutieli, T. Burzykowski, H. Gaehlmann, A. Bondt, T. Perera, T. Geerts, and L. Bijnens. Testing for trends in dose-response microarray experiments: a comparison of several testing procedures, multiplicity and resampling-based inference. Statistical Applications in Genetics and Molecular Biology, 2007.
[3] Tim Robertson, F. T. Wright, and R. L. Dykstra. Order Restricted Statistical Inference. John Wiley & Sons Inc, 1988.