Analyzing Cross-Plattform Consistency Using Tests Against ...bioinf.boku.ac.at/CAMDA2008/05.12.2008/klinglm_talk.pdfIntroduction Material and Methods Experimental Design Methods Exploratory

Analyzing Cross-Platform Consistency Using Tests Against Ordered Alternatives

CAMDA Emerald Competition

Florian Klinglmueller (1), Thomas Tuechler (2)

(1) Core Unit for Medical Statistics and Informatics, Medical University of Vienna

(2) Chair for Bioinformatics, BOKU

05.12.2008 / CAMDA@Boku University

Introduction

Material and Methods
- Experimental Design
- Methods

Exploratory Data Analysis
- Total-RNA to Messenger-RNA
- Saturation

Results
- Monotone Genes
- Across Platform
- Normalization Effect

Discussion - Outlook
- Summary and Discussion

Titration

Total-RNA mixtures (Liver : Kidney):

- 4:0 ... L
- 3:1 ... M1
- 1:3 ... M2
- 0:4 ... K

Experimental Design: Design Hierarchy

Levels, from top to bottom:

- 3 Platforms: Affymetrix, Illumina, Agilent
- 6 Rats: Rat 1, Rat 2, ...
- 4 Mixtures: L (4:0), M1 (3:1), M2 (1:3), K (0:4)
- 3 Replicates: Rep 1, Rep 2, Rep 3


Main Questions

- Do the measured intensities reflect the titration?
- Agreement across platforms.
- Influence of normalization.

Tests Against Order-Restricted Alternatives

- Dose-response studies
- Literature from the 1970s and 1980s:
  - Barlow [1]
  - Robertson et al. [3]
- Microarray application: Lin et al. [2]
- 5 statistics: Marcus, Williams, $E^2$, M, modified M
- $E^2$ most powerful ⇒ we use $E^2$

Test: Null Hypothesis

We test the null hypothesis of equal means

$$H_{0,g}: \mu_{L,g} = \mu_{M1,g} = \mu_{M2,g} = \mu_{K,g}, \qquad (1)$$

against the ordered alternatives

$$H^{up}_{1,g}: \mu_{L,g} \le \mu_{M1,g} \le \mu_{M2,g} \le \mu_{K,g}, \qquad (2)$$

$$H^{down}_{1,g}: \mu_{L,g} \ge \mu_{M1,g} \ge \mu_{M2,g} \ge \mu_{K,g}, \qquad (3)$$

with at least one strict inequality.

- Main principle: isotonic regression

Isotonic Regression: Fitting Monotone Functions

Isotonic Regression: Formulation

Isotonic function

- Set $T := \{t_1, \dots, t_n\}$ with an order relation
- $m(t_i)$ is called isotonic if $t_i \le t_j \Rightarrow m(t_i) \le m(t_j)$
- $F(T)$: all isotonic functions on $T$
- Direction has to be specified

Isotonic regression

- $y_i = m(t_i) + \varepsilon_i$, $m \in F(T)$
- Least-squares fit:

$$\hat{m} = \arg\min_{m \in F(T)} \sum_{i=1}^{n} (y_i - m(t_i))^2.$$
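The least-squares fit above can be computed with the pool-adjacent-violators algorithm. The following is a minimal sketch with equal weights, not the implementation used for the talk; the function name `pava` and the example values are hypothetical.

```python
from typing import List, Sequence


def pava(y: Sequence[float], increasing: bool = True) -> List[float]:
    """Least-squares isotonic fit to y (equal weights), pool-adjacent-violators."""
    # A decreasing fit is an increasing fit to the negated values.
    values = list(y) if increasing else [-v for v in y]
    blocks: List[List[float]] = []  # each block: [sum of values, number of points]
    for v in values:
        blocks.append([v, 1])
        # Pool neighbouring blocks while their means violate the ordering.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit: List[float] = []
    for s, n in blocks:
        fit.extend([s / n] * int(n))
    return fit if increasing else [-f for f in fit]


group_means = [1.0, 3.0, 2.0, 4.0]  # hypothetical values for L, M1, M2, K
print(pava(group_means))                    # [1.0, 2.5, 2.5, 4.0]
print(pava(group_means, increasing=False))  # [2.5, 2.5, 2.5, 2.5]
```

The direction has to be chosen by the caller, which mirrors the remark that the direction of the order has to be specified.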

Isotonic Regression: Example

- $T = \{L \le M1 \le M2 \le K\}$
- $y_g(t_i) = m^{up}(t_i) + \varepsilon_i$
- Some gene expressions:

[Figure: Expression against Mixtures (L, M1, M2, K); legend: unrestricted, isotonic]

Isotonic Regression: Upwards Trend

- $T = \{L \le M1 \le M2 \le K\}$
- $y_g(t_i) = m^{up}(t_i) + \varepsilon_i$
- Isotonic regression for upwards trend:

[Figure: Expression against Mixtures (L, M1, M2, K) with the isotonic fit for an upwards trend]

Isotonic Regression: Downwards Trend

- $T = \{L \ge M1 \ge M2 \ge K\}$
- $y_g(t_i) = m^{down}(t_i) + \varepsilon_i$
- Isotonic regression for downwards trend:

[Figure: Expression against Mixtures (L, M1, M2, K) with the isotonic fit for a downwards trend]


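Statistic: Definition of the $E^2$ Statistic

$E^2$ (Barlow [1], Robertson et al. [3]):

$$E^{2,up}_{01} = 1 - \frac{\sum_{kj} (y_{kj} - \hat{m}^{up}(t_k))^2}{\sum_{kj} (y_{kj} - \bar{y})^2}, \qquad (4)$$

- Likelihood-ratio statistic: $E^{2,up}_{01} = 1 - \mathrm{ESS}/\mathrm{TSS}$, with ESS the residual sum of squares of the isotonic fit and TSS the total sum of squares around the overall mean $\bar{y}$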
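As an illustration of equation (4), the sketch below computes 1 - ESS/TSS for one gene from replicate values in the four ordered groups. It is a minimal sketch, not the IsoGene code; `weighted_pava`, `e2_statistic`, and the data are hypothetical, and the isotonic fit is taken on the group means weighted by group size.

```python
from typing import List, Sequence


def weighted_pava(means: Sequence[float], weights: Sequence[float],
                  increasing: bool = True) -> List[float]:
    """Weighted isotonic fit to group means (pool-adjacent-violators)."""
    sign = 1.0 if increasing else -1.0
    blocks: List[list] = []  # each block: [weighted sum, total weight, group count]
    for m, w in zip(means, weights):
        blocks.append([sign * m * w, w, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, w_top, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w_top
            blocks[-1][2] += k
    fit: List[float] = []
    for s, w_blk, k in blocks:
        fit.extend([sign * s / w_blk] * k)
    return fit


def e2_statistic(groups: Sequence[Sequence[float]], increasing: bool = True) -> float:
    """E^2 = 1 - ESS/TSS for replicate values in ordered groups (L, M1, M2, K)."""
    means = [sum(g) / len(g) for g in groups]
    fit = weighted_pava(means, [len(g) for g in groups], increasing)
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ess = sum((v - m) ** 2 for g, m in zip(groups, fit) for v in g)
    tss = sum((v - grand_mean) ** 2 for v in values)
    return 1.0 - ess / tss


# Hypothetical gene: 3 replicates in each of L, M1, M2, K.
gene = [[1.0, 1.2, 0.8], [3.1, 2.9, 3.0], [2.0, 2.2, 1.8], [4.0, 4.1, 3.9]]
print(e2_statistic(gene, increasing=True))   # statistic for the upward alternative
print(e2_statistic(gene, increasing=False))  # statistic for the downward alternative
```

Values close to 1 indicate that the monotone fit explains most of the variation; a fit that collapses to the overall mean gives 0.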

p-Value Combination: Capturing the Hierarchical Variance Structure

- Revisit the design hierarchy
- Now we add a new level: Normalization

[Figure: design hierarchy with 3 Platforms (Affymetrix, Illumina, Agilent), 6 Rats, 4 Mixtures (L, M1, M2, K), 3 Replicates, and 2 Normalizations]

Normalizations: Baseline vs. Quantile Normalization

- Both widely used

Baseline Normalization: Align per-array medians

1. From each array, remove the array-wise median
2. To each array, add the overall median

Removes systematic location shifts

Quantile Normalization: Align order statistics

1. Per array: reduce expressions to ranks
2. Per array: reassign ranks to quantiles from the mean distribution (means of order statistics)

Removes any systematic disturbance that keeps the order
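Both procedures can be sketched in a few lines. This is an illustrative sketch only: the function names and the toy data are hypothetical, the "overall median" is assumed to be the median of all values pooled across arrays, and ties in the quantile step are broken by position.

```python
from statistics import median
from typing import List


def baseline_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Subtract each array's median, then add the overall (pooled) median."""
    overall = median([v for a in arrays for v in a])
    result = []
    for a in arrays:
        array_median = median(a)
        result.append([v - array_median + overall for v in a])
    return result


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Replace each value by the mean order statistic belonging to its rank."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the r-th smallest value across arrays.
    mean_quantiles = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # gene indices, smallest first
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = mean_quantiles[rank]
        result.append(normalized)
    return result


# Two hypothetical arrays with three genes each.
arrays = [[1.0, 2.0, 3.0], [4.0, 6.0, 5.0]]
print(baseline_normalize(arrays))  # [[2.5, 3.5, 4.5], [2.5, 4.5, 3.5]]
print(quantile_normalize(arrays))  # [[2.5, 3.5, 4.5], [2.5, 4.5, 3.5]]
```

In this toy case the two arrays differ only by a location shift, so both methods give the same result; they differ once the arrays also differ in scale or shape.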











p-Value Combination: Capturing the Hierarchical Variance Structure

- Revisit the design hierarchy
- We want p

[Figure: design hierarchy with 3 Platforms (Affymetrix, Illumina, Agilent), 6 Rats, 4 Mixtures (L, M1, M2, K), 3 Replicates, and 2 Normalizations]

p-Value Combination: Inverse Normal Method

- Combine one-sided p-values:

$$p^{C,up}_{g} = 1 - \Phi\left(\frac{1}{\sqrt{N}} \sum_{i} \Phi^{-1}(1 - p^{up}_{ig})\right), \qquad (5)$$

- $p^{C,down}_{g}$ analogous
- Uniformly distributed conservative one-sided p-values
- Bonferroni-corrected directional decision: $p^{C}_{g} = 2\min(p^{C,up}_{g}, p^{C,down}_{g})$.
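Equation (5) and the directional decision can be sketched with the Python standard library. The function names and p-values are hypothetical; the sketch assumes equal weights and p-values strictly between 0 and 1, and it caps the doubled p-value at 1, which the formula on the slide does not state.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence

STD_NORMAL = NormalDist()  # standard normal: Phi is .cdf, Phi^{-1} is .inv_cdf


def inverse_normal(p_values: Sequence[float]) -> float:
    """Combine one-sided p-values with the inverse normal method (equal weights)."""
    # By symmetry Phi^{-1}(1 - p) = -Phi^{-1}(p), which avoids computing 1 - p
    # for very small p.
    z = -sum(STD_NORMAL.inv_cdf(p) for p in p_values) / sqrt(len(p_values))
    return STD_NORMAL.cdf(-z)  # equals 1 - Phi(z)


def directional_p(p_up: Sequence[float], p_down: Sequence[float]) -> float:
    """Two-sided decision: twice the smaller combined one-sided p-value."""
    # Capped at 1 (an assumption of this sketch).
    return min(1.0, 2.0 * min(inverse_normal(p_up), inverse_normal(p_down)))


# Hypothetical one-sided p-values of one gene in three animals.
p_up = [0.01, 0.04, 0.03]
p_down = [0.99, 0.96, 0.97]
print(inverse_normal(p_up))         # combined upward p-value
print(directional_p(p_up, p_down))  # Bonferroni-corrected directional p-value
```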

p-Value Combination: Per-Animal p-Values

- 6 animals × 3 platforms × 2 normalizations → 36 times $p^{up}_{Norm,Plat,ig}$, $p^{down}_{Norm,Plat,ig}$, $p_{Norm,Plat,ig}$
- Combine the 6 × 6 $p^{up}_{Norm,Plat,ig}$, $p^{down}_{Norm,Plat,ig}$ over animals to get 6: $p^{C_{Plat},up}_{Norm,g}$, $p^{C_{Plat},down}_{Norm,g}$, and $p^{C_{Plat}}_{Norm,g}$
- Combine the 3 $p^{C_{Plat},up}_{Norm,g}$, $p^{C_{Plat},down}_{Norm,g}$ over platforms to get 2: $p^{C_{Norm},up}_{g}$, $p^{C_{Norm},down}_{g}$
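The two combination stages might look as follows for one gene, one direction, and one normalization. This is a sketch under assumptions: `combine_levels` and all p-values are hypothetical, the inverse normal method is applied with equal weights at both stages, and all p-values are assumed to lie strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, List, Tuple

STD_NORMAL = NormalDist()


def inverse_normal(p_values: List[float]) -> float:
    """Inverse normal combination of one-sided p-values (equal weights)."""
    z = -sum(STD_NORMAL.inv_cdf(p) for p in p_values) / sqrt(len(p_values))
    return STD_NORMAL.cdf(-z)


def combine_levels(per_animal: Dict[str, List[float]]) -> Tuple[Dict[str, float], float]:
    """Stage 1: animals within each platform. Stage 2: across platforms."""
    per_platform = {plat: inverse_normal(ps) for plat, ps in per_animal.items()}
    across_platforms = inverse_normal(list(per_platform.values()))
    return per_platform, across_platforms


# Hypothetical upward p-values of one gene under one normalization.
p_up = {
    "Affymetrix": [0.20, 0.10, 0.30, 0.15, 0.25, 0.10],
    "Agilent": [0.30, 0.20, 0.40, 0.10, 0.35, 0.20],
    "Illumina": [0.25, 0.15, 0.20, 0.30, 0.10],  # one animal fewer
}
per_platform, across = combine_levels(p_up)
print(per_platform)  # one combined p-value per platform
print(across)        # combined over the three platforms
```

The per-platform values allow a comparison between platforms, while the second stage summarizes one normalization.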


p-Value Combination: Summary

- Compute one-sided permutation test p-values for each animal, on each platform separately, with quantile- and baseline-normalized data.
- Combine the per-animal tests from each platform.
- Combine the per-platform tests from each normalization.
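The first step, a one-sided permutation p-value for one gene in one animal, can be sketched generically. The statistic is passed in as a function; the example uses a simple stand-in (mean of K minus mean of L) rather than $E^2$, and all names, data, and the number of permutations are assumptions of this sketch.

```python
import random
from typing import Callable, List, Sequence


def permutation_p_value(values: Sequence[float], labels: Sequence[str],
                        statistic: Callable[[Sequence[float], Sequence[str]], float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """One-sided p-value: share of label permutations with a statistic at least
    as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled: List[str] = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


def last_minus_first(values: Sequence[float], labels: Sequence[str]) -> float:
    """Stand-in trend statistic: mean of group K minus mean of group L."""
    first = [v for v, g in zip(values, labels) if g == "L"]
    last = [v for v, g in zip(values, labels) if g == "K"]
    return sum(last) / len(last) - sum(first) / len(first)


# Hypothetical animal: 3 replicates for each of L, M1, M2, K.
labels = ["L"] * 3 + ["M1"] * 3 + ["M2"] * 3 + ["K"] * 3
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
print(permutation_p_value(values, labels, last_minus_first))
```

Swapping the sign of the statistic gives the p-value for the opposite direction.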

Results

Finally!

Exploratory Analysis: Distribution of Group Means on Raw Data

[Figure: distribution of group means for the mixtures L, M1, M2, K (axis 0 to 20); panels: Affymetrix, Agilent, Illumina]

- Location shift
- Higher messenger-RNA content in kidney?
- Both normalization methods remove any visible trends in location
  - Baseline
  - Quantile: also in scale


[Figure: distribution of group means for the mixtures L, M1, M2, K (axis 0 to 20); panels: Affymetrix, Agilent, Illumina]

- Location shift
- Baseline



[Figure: distribution of group means for the mixtures L, M1, M2, K (axis 0 to 20); panels: Affymetrix, Agilent, Illumina]

- Location shift
- Baseline


Exploration of Trend: Relationship between Increases

- Relationship between the first and last increase (M1−L vs. K−M2)
- Scatterplot, Illumina: trends are not linear; when the first increase is large, the last increase is small, and vice versa
- Scatterplot, Agilent
- Scatterplot, Affymetrix
  - Rightmost point
  - Lowest point
  - Saturation?

[Figure: scatterplots of M1−L (horizontal axis) against K−M2 (vertical axis); panels: Illumina, Agilent, Affymetrix]

[Figure: mean expression (axis 0 to 20) by mixture (L, M1, M2, K) for NM_052802 and NM_022519; label: Maximum Mean Expression]

Test Setup

Settings

- R package IsoGene provided by Lin et al.
- 20000 permutations (1 week on a cluster)
- 2 normalization methods × 3 platforms × 6 animals
- 6111 well-annotated genes available on all platforms
- One animal removed from the Illumina data
- Family-wise error: Bonferroni-Holm
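The Bonferroni-Holm step-down adjustment named above can be written compactly. This is a generic sketch, not the code used for the talk; `holm_adjust` and the example p-values are hypothetical.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical combined p-values for four genes.
print(holm_adjust([0.01, 0.04, 0.03, 0.005]))  # approximately [0.03, 0.06, 0.06, 0.02]
```

A gene is declared significant at level alpha when its adjusted p-value is at most alpha.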

Proportions of Significant Genes: General Overview

[Figure: proportions (axis 0 to 100) of genes classified as up, down, or none; panels: Illumina, Agilent, Affymetrix]

- Baseline
- Quantile

Agreement Between Platforms: Number of Genes

[Figure: agreement (axis 0 to 100) for Affy−Agil, Affy−Illu, Agil−Illu, and All; shown for Baseline and Quantile]

- Fleiss' κ coefficient: agreement across platforms using family-wise-error-adjusted combined p-values
  - Quantile normalization: 0.52
  - Baseline normalization: 0.37
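For reference, Fleiss' κ can be computed from a table that counts, per gene, how many platforms assigned each category. The sketch below assumes the three platforms act as raters and the categories are up, down, and none; the function name and the toy table are hypothetical and do not reproduce the values above.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j.

    Every row must sum to the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all assignments that fall into each category.
    p_cat = [sum(row[j] for row in counts) / (n_subjects * n_raters)
             for j in range(n_categories)]
    # Observed agreement per subject: share of agreeing rater pairs.
    p_subj = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical table: 4 genes, 3 platforms, categories (up, down, none).
table = [[2, 1, 0], [1, 1, 1], [0, 3, 0], [3, 0, 0]]
print(fleiss_kappa(table))
```

A value of 0 corresponds to chance-level agreement and 1 to perfect agreement.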

Agreement Between Normalizations: Number of Significant Genes

Venn diagram counts (Quantile vs. Baseline):

- Significant under both normalizations: 3810
- Baseline only: 1070
- Quantile only: 520
- Neither: 711

Fleiss' κ coefficient: 0.57

- Around 2 times more significant genes are exclusive to baseline-normalized data than to quantile-normalized data
- More than 97% of the genes exclusive to baseline-normalized data are upregulated
- Up:down ratio among quantile-exclusive genes: 40:60

Summary: Results

Data

- A substantial number of genes show significant monotonicity
- Across-platform agreement exceeds chance levels
- Agreement on baseline-normalized data is worse
- Baseline-normalized data show more upward trends: incomplete removal of the total/messenger-RNA effect
- Genes exclusively significant in baseline data mostly show upward trends

Methods

- Isotonic regression as a means to detect monotonic trends
- p-value combination as a means to compare results from different platforms




Thanks

- MSI: Martin Posch
- Statistic, Univie: Cluster

References

[1] Richard E. Barlow. Statistical Inference Under Order Restrictions. John Wiley and Sons Ltd, 1972.

[2] D. Lin, Z. Shkedy, D. Yekutieli, T. Burzykowski, H. Göhlmann, A. De Bondt, T. Perera, T. Geerts, and L. Bijnens. Testing for trends in dose-response microarray experiments: a comparison of several testing procedures, multiplicity and resampling-based inference. Statistical Applications in Genetics and Molecular Biology, 2007.

[3] Tim Robertson, F. T. Wright, and R. L. Dykstra. Order Restricted Statistical Inference. John Wiley & Sons Inc, 1988.

Thank you for your attention
