Source: bioinf.boku.ac.at/CAMDA2008/05.12.2008/klinglm_talk.pdf

Analyzing Cross-Platform Consistency Using Tests Against Ordered Alternatives

CAMDA Emerald Competition

Florian Klinglmueller (1), Thomas Tuechler (2)

(1) Core Unit for Medical Statistics and Informatics, Medical University of Vienna, florian.klinglmueller@meduniwien.ac.at

(2) WWTF Chair for Bioinformatics, BOKU University, thomas.tuechler@boku.ac.at

05.12.2008 / CAMDA@Boku University

Outline

- Introduction
- Material and Methods: Experimental Design, Methods
- Exploratory Data Analysis: Total-RNA to Messenger-RNA, Saturation
- Results: Monotone Genes, Across Platform, Normalization Effect
- Discussion - Outlook: Summary and Discussion

Titration

Total-RNA mixtures (mixing ratio ... label; K is pure kidney):

- 4:0 ... L
- 3:1 ... M1
- 1:3 ... M2
- 0:4 ... K

Experimental Design: Design Hierarchy

- 3 Platforms: Affymetrix, Illumina, Agilent
- 6 Rats
- 4 Mixtures: 4:0 ... L, 3:1 ... M1, 1:3 ... M2, 0:4 ... K
- 3 Replicates

Main Questions

- Do the measured intensities reflect the titration?
- Agreement across platforms.
- Influence of normalization.

Tests Against Order-Restricted Alternatives

- Dose-response studies
- 70's and 80's literature:
  - Barlow [1]
  - Robertson et al. [3]
- Microarray application: Lin et al. [2]
- 5 statistics: Marcus, Williams, E2, M, modified M
- E2 is the most powerful ⇒ we use E2

Test: Null Hypothesis

We test the null hypothesis of equal means

$$H_{0,g}: \mu_{L,g} = \mu_{M1,g} = \mu_{M2,g} = \mu_{K,g}, \tag{1}$$

against the ordered alternatives

$$H^{up}_{1,g}: \mu_{L,g} \le \mu_{M1,g} \le \mu_{M2,g} \le \mu_{K,g}, \tag{2}$$

$$H^{down}_{1,g}: \mu_{L,g} \ge \mu_{M1,g} \ge \mu_{M2,g} \ge \mu_{K,g}, \tag{3}$$

with at least one strict inequality.

- Main principle: isotonic regression

Isotonic Regression: Fitting Monotone Functions

Isotonic Regression: Formulation

Isotonic function

- Set $T := \{t_1, \dots, t_n\}$ with an order relation
- $m(t_i)$ is called isotonic if $t_i \le t_j \Rightarrow m(t_i) \le m(t_j)$
- $F(T)$: all isotonic functions on $T$
- Direction has to be specified

Isotonic regression

- $y_i = m(t_i) + \varepsilon_i$, $m \in F(T)$
- Least-squares fit:

$$\hat m = \arg\min_{m \in F(T)} \sum_{i=1}^{n} (y_i - m(t_i))^2.$$
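The least-squares problem over isotonic functions can be solved with the pool-adjacent-violators algorithm. The sketch below is one possible implementation, not the code used for the talk; the function name `isotonic_fit` and the example values are made up for illustration.

```python
from typing import List, Optional, Sequence


def isotonic_fit(y: Sequence[float],
                 weights: Optional[Sequence[float]] = None,
                 increasing: bool = True) -> List[float]:
    """Least-squares monotone fit to y via pool-adjacent-violators."""
    n = len(y)
    w = [1.0] * n if weights is None else list(weights)
    # A non-increasing fit is a non-decreasing fit of the negated data.
    sign = 1.0 if increasing else -1.0
    # Each block holds [weighted mean, total weight, number of points].
    blocks: List[List[float]] = []
    for yi, wi in zip(y, w):
        blocks.append([sign * yi, wi, 1])
        # Merge backwards while the last two blocks violate the order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit: List[float] = []
    for mean, _, count in blocks:
        fit.extend([sign * mean] * int(count))
    return fit


# Hypothetical group means of one gene at L, M1, M2, K.
means = [1.0, 3.0, 2.0, 4.0]
print(isotonic_fit(means))                    # [1.0, 2.5, 2.5, 4.0]
print(isotonic_fit(means, increasing=False))  # [2.5, 2.5, 2.5, 2.5]
```

The direction has to be chosen by the caller, which mirrors the requirement that the direction of the order relation be specified.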

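Isotonic Regression: Example

- $T = \{L \le M1 \le M2 \le K\}$
- $y_g(t_i) = m^{up}(t_i) + \varepsilon_i$
- Some gene expressions:

[Figure: expression of a gene across the mixtures L, M1, M2, K, with unrestricted and isotonic fits]

Isotonic Regression: Upwards Trend

- $T = \{L \le M1 \le M2 \le K\}$
- $y_g(t_i) = m^{up}(t_i) + \varepsilon_i$
- Isotonic regression for an upwards trend:

[Figure: isotonic fit across the mixtures L, M1, M2, K, upwards trend]

Isotonic Regression: Downwards Trend

- $T = \{L \ge M1 \ge M2 \ge K\}$
- $y_g(t_i) = m^{down}(t_i) + \varepsilon_i$
- Isotonic regression for a downwards trend:

[Figure: isotonic fit across the mixtures L, M1, M2, K, downwards trend]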

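Statistic: Definition of the E2 Statistic

E2 (Barlow [1], Robertson et al. [3]):

$$E^{2,up}_{01} = 1 - \frac{\sum_{k,j} (y_{kj} - \hat m^{up}(t_k))^2}{\sum_{k,j} (y_{kj} - \bar y)^2}, \tag{4}$$

where $y_{kj}$ is replicate $j$ of mixture $t_k$ and $\bar y$ is the overall mean.

- Likelihood-ratio form: $E^{2,up}_{01} = 1 - \mathrm{ESS}_{up} / \mathrm{ESS}_{0}$, the ratio of the error sums of squares under the isotonic fit and under the null fit in (4)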
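As a rough illustration of the E2 statistic for an upward trend, the sketch below computes one minus the ratio of the residual sum of squares around the isotonic fit to the residual sum of squares around the overall mean. The helper names `pava` and `e2_up` and the data are hypothetical, and the code assumes the observations are not all identical (otherwise the denominator is zero).

```python
from typing import List, Sequence


def pava(means: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted non-decreasing least-squares fit (pool-adjacent-violators)."""
    blocks: List[List[float]] = []  # [mean, weight, count]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return [b[0] for b in blocks for _ in range(int(b[2]))]


def e2_up(groups: Sequence[Sequence[float]]) -> float:
    """E2 for an upward trend; groups are ordered L, M1, M2, K."""
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / sum(sizes)
    # Isotonic fit of the group means, weighted by the group sizes.
    fit = pava(means, sizes)
    ess_iso = sum((y - f) ** 2 for g, f in zip(groups, fit) for y in g)
    ess_null = sum((y - grand) ** 2 for g in groups for y in g)
    return 1.0 - ess_iso / ess_null


# Hypothetical replicates of one gene at L, M1, M2, K.
rising = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [4.0, 5.0, 6.0]]
print(e2_up(rising))        # large: the upward fit explains most of the variation
print(e2_up(rising[::-1]))  # zero: the best upward fit is the flat overall mean
```

The downward statistic follows by reversing the order of the groups before calling `e2_up`.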

p-Value Combination: Capturing the Hierarchical Variance Structure

- Revisit the design hierarchy
- Now we add a new level: Normalization

Design hierarchy:

- 3 Platforms: Affymetrix, Illumina, Agilent
- 6 Rats
- 4 Mixtures: 4:0 ... L, 3:1 ... M1, 1:3 ... M2, 0:4 ... K
- 3 Replicates
- 2 Normalizations

Normalizations: Baseline vs. Quantile Normalization

- Both widely used

Baseline Normalization: align per-array medians

1. From each array, subtract the array-wise median

2. To each array, add the overall median

Removes systematic location shifts
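The two baseline steps can be sketched in a few lines. This is an illustration under assumptions: `baseline_normalize` is a made-up name, each array is a list of log-intensities, and the "overall median" is taken over all values of all arrays (it could also be read as the median of the array medians).

```python
from statistics import median
from typing import List


def baseline_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Shift every array so that its median equals the overall median."""
    # Assumption: overall median = median of all values pooled across arrays.
    overall = median(v for a in arrays for v in a)
    return [[v - median(a) + overall for v in a] for a in arrays]


# Two hypothetical arrays that differ only by a location shift of 10.
arrays = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
print(baseline_normalize(arrays))  # [[6.0, 7.0, 8.0], [6.0, 7.0, 8.0]]
```

Only the location of each array changes; differences in scale between arrays are left untouched.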

Quantile Normalization: align order statistics

1. Per array: reduce expressions to ranks

2. Per array: replace each rank by the corresponding quantile of the mean distribution (means of order statistics)

Removes any systematic disturbance that preserves the order
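A minimal sketch of the two quantile-normalization steps follows. The name `quantile_normalize` is hypothetical, all arrays are assumed to have the same length, and ties are broken by position rather than averaged, which is a simplification.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same distribution: the means of the order statistics."""
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean distribution: average the k-th smallest value across all arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays)
                 for k in range(n_genes)]
    result = []
    for a in arrays:
        # Gene indices ordered by expression; position in this list is the rank.
        order = sorted(range(n_genes), key=lambda i: a[i])
        normalized = [0.0] * n_genes
        for rank, gene in enumerate(order):
            normalized[gene] = reference[rank]
        result.append(normalized)
    return result


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 9.0]]
print(quantile_normalize(arrays))  # [[7.0, 1.5, 3.5], [3.5, 1.5, 7.0]]
```

After this step every array has identical location and scale; only the within-array ordering of genes carries information.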



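p-Value Combination: Inverse Normal Method

- Combine one-sided p-values:

$$p^{C,up}_g = 1 - \Phi\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Phi^{-1}(1 - p^{up}_{ig})\right), \tag{5}$$

where $n$ is the number of p-values combined.

- $p^{C,down}_g$ analogously
- Valid for uniformly distributed or conservative one-sided p-values
- Bonferroni-corrected directional decision: $p^C_g = 2\min(p^{C,up}_g, p^{C,down}_g)$.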
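The inverse normal combination and the directional decision might be coded as below. This is a sketch under assumptions: equal weights for all p-values, hypothetical function names, and a small clipping constant `EPS` (not part of the method) so that p-values of exactly 0 or 1 do not map to infinite z-scores.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence, Tuple

STD_NORMAL = NormalDist()
EPS = 1e-12  # assumption: clip p-values into the open interval (0, 1)


def inverse_normal(pvalues: Sequence[float]) -> float:
    """Combine one-sided p-values with equal weights."""
    clipped = [min(max(p, EPS), 1.0 - EPS) for p in pvalues]
    z = sum(STD_NORMAL.inv_cdf(1.0 - p) for p in clipped) / sqrt(len(clipped))
    return 1.0 - STD_NORMAL.cdf(z)


def directional_decision(p_up: Sequence[float],
                         p_down: Sequence[float]) -> Tuple[float, str]:
    """Two-sided combined p-value and the direction it points to."""
    c_up, c_down = inverse_normal(p_up), inverse_normal(p_down)
    direction = "up" if c_up <= c_down else "down"
    # Bonferroni correction for choosing the direction, capped at 1.
    return min(1.0, 2.0 * min(c_up, c_down)), direction


# Hypothetical per-animal p-values of one gene.
p, direction = directional_decision([0.01, 0.02, 0.03], [0.99, 0.98, 0.97])
print(direction, p)
```

Combining a single p-value returns that p-value, which is a quick sanity check on the formula.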

p-Value Combination: Per Animal p-Values

- 6 Animals × 3 Platforms × 2 Normalizations → 36 times $p^{up}_{Norm,Plat,ig}$, $p^{down}_{Norm,Plat,ig}$, $p_{Norm,Plat,ig}$
- Combine the 6 × 6 $p^{up}_{Norm,Plat,ig}$, $p^{down}_{Norm,Plat,ig}$ to get 6: $p^{C_{Plat},up}_{Norm,g}$, $p^{C_{Plat},down}_{Norm,g}$, and $p^{C_{Plat}}_{Norm,g}$
- Combine the 3 $p^{C_{Plat},up}_{Norm,g}$, $p^{C_{Plat},down}_{Norm,g}$ to get 2: $p^{C_{Norm},up}_{g}$, $p^{C_{Norm},down}_{g}$

p-Value Combination: Summary

- Compute one-sided permutation-test p-values for each animal, on each platform separately, with quantile- and baseline-normalized data.
- Combine the per-animal tests from each platform.
- Combine the per-platform tests from each normalization.
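The first step, a one-sided permutation p-value for a single animal, could look like the sketch below. It is generic in the test statistic; the stand-in statistic `last_minus_first` is hypothetical and much simpler than E2, and the function names, permutation count and seed are illustrative choices only.

```python
import random
from typing import Callable, List, Sequence

Groups = List[List[float]]


def permutation_pvalue(groups: Sequence[Sequence[float]],
                       statistic: Callable[[Groups], float],
                       n_perm: int = 2000,
                       seed: int = 0) -> float:
    """One-sided p-value: share of label permutations with statistic >= observed."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]
    observed = statistic([list(g) for g in groups])
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values reassigns observations to mixtures.
        rng.shuffle(pooled)
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        if statistic(permuted) >= observed:
            hits += 1
    # Counting the observed arrangement keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def last_minus_first(groups: Groups) -> float:
    """Stand-in upward-trend statistic (hypothetical): mean(K) - mean(L)."""
    return sum(groups[-1]) / len(groups[-1]) - sum(groups[0]) / len(groups[0])


# Hypothetical replicates of one gene at L, M1, M2, K.
rising = [[1.0, 1.2, 0.9], [2.0, 2.1, 1.8], [3.1, 2.9, 3.0], [4.2, 3.9, 4.0]]
print(permutation_pvalue(rising, last_minus_first))        # small
print(permutation_pvalue(rising[::-1], last_minus_first))  # close to 1
```

Passing an E2-type statistic instead of the stand-in yields the up p-value, and reversing the group order yields the down p-value.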

Results

Finally!

Exploratory Analysis: Distribution of Group Means on Raw Data

[Figure: distribution of group means; panels: Affymetrix, Agilent, Illumina]

- Location shift
- Higher messenger-RNA content in kidney?
- Both normalization methods remove any visible trends in location
  - Baseline
  - Quantile: also in scale

Exploration of Trend: Relationship between Increases

- Relationship between first/second increase
- Scatterplot, Illumina: trends are not linear; when the first increase is large, the last increase is small, and vice versa
- Scatterplot, Agilent
- Scatterplot, Affymetrix
  - Rightmost point
  - Lowest point
  - Saturation?

[Figure: scatterplots of the increases per platform (Illumina, Agilent, Affymetrix); x-axis: M1−L]

[Figure: mean expression across the mixtures L, M1, M2, K for NM_052802 and NM_022519; panel title: Maximum Mean Expression]

Test Setup: Settings

- R package IsoGene provided by Lin et al.
- 20000 permutations (1 week on cluster)
- 2 normalization methods × 3 platforms × 6 animals
- 6111 well-annotated genes available on all platforms
- One animal removed from the Illumina data
- Family-wise error: Bonferroni-Holm
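For reference, the Bonferroni-Holm step-down adjustment can be written as below. This is a generic sketch of the standard procedure, not the code used for the talk; the name `holm_adjust` and the example p-values are made up.

```python
from typing import List, Sequence


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical combined p-values of four genes.
adjusted = holm_adjust([0.01, 0.04, 0.03, 0.005])
print([round(p, 4) for p in adjusted])  # [0.03, 0.06, 0.06, 0.02]
```

A gene is declared significant at family-wise level alpha when its adjusted p-value is at most alpha.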

Proportions of Significant Genes: General Overview

[Figure: proportions (axis 0 to 100) of genes classified as up, down, or none on Illumina, Agilent and Affymetrix, for baseline and quantile normalization]

Agreement Between Platforms: Number of Genes

[Figure: pairwise agreement Affy-Agil, Affy-Illu, Agil-Illu (axis 0 to 100), for baseline and quantile normalization]

- Fleiss' κ coefficient: agreement across platforms using combined p-values adjusted for family-wise error
  - Quantile normalization: .52
  - Baseline normalization: .37
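Fleiss' κ compares the observed share of agreeing rater pairs with the share expected by chance. The sketch below treats the three platforms as raters and up/down/none as categories, which is an assumption about how the coefficient was applied; the function name and the toy counts are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[g][c] = number of raters (platforms) assigning gene g to category c."""
    n_items = len(counts)
    n_raters = sum(counts[0])  # every gene must be rated by the same number of raters
    n_cats = len(counts[0])
    # Share of all assignments that fall into each category.
    p_cat = [sum(row[c] for row in counts) / (n_items * n_raters)
             for c in range(n_cats)]
    # Per-gene agreement: fraction of rater pairs that agree.
    p_item = [(sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_bar = sum(p_item) / n_items
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1.0 - p_exp)


# Columns: up, down, none; each row sums to the 3 platforms.
toy = [[3, 0, 0], [0, 3, 0], [1, 1, 1], [2, 1, 0]]
print(round(fleiss_kappa(toy), 4))  # 0.2683
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.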

Agreement Between Normalizations: Number of Significant Genes

[Figure: Venn diagram of significant genes under quantile and baseline normalization; region counts 1070, 520 and 3810]

Fleiss' κ coefficient: .57

- Around 2 times more significant genes are exclusive to baseline-normalized than to quantile-normalized data
- More than 97% of the genes exclusive to baseline-normalized data are upregulated
- Up:down ratio among quantile-exclusive genes is 40:60

Summary

Results

- A substantial number of genes show significant monotonicity
- Across-platform agreement exceeds chance levels
- Agreement on baseline-normalized data is worse
- Baseline-normalized data show more upward trends: incomplete removal of the total/messenger-RNA effect
- Genes exclusively significant in baseline data are mostly upward trends

Methods

- Isotonic regression as a means to detect monotonic trends
- p-Value combination as a means to compare results from different platforms


Thanks

- MSI - Martin Posch
- Statistic - Univie: Cluster

References

[1] Richard E. Barlow. Statistical Inference Under Order Restrictions. John Wiley and Sons Ltd, 1972.

[2] D. Lin, Z. Shkedy, D. Yekutieli, T. Burzykowski, H. Gaehlmann, A. Bondt, T. Perera, T. Geerts, and L. Bijnens. Testing for trends in dose-response microarray experiments: a comparison of several testing procedures, multiplicity and resampling-based inference. Statistical Applications in Genetics and Molecular Biology, 2007.

[3] Tim Robertson, F. T. Wright, and R. L. Dykstra. Order Restricted Statistical Inference. John Wiley & Sons Inc, 1988.

Thank you for your attention
