
A Geometric Note on a Type of Multiple Testing-07-24-2015

Jan 17, 2017


A Geometric Note on a Type of Multiple Testing

Dipak K Dey, Junfeng Liu, Nalini Ravishanker, Edwards Qiang Zhang (07-24-2015)

ABSTRACT. For a collection of subjects, the within-subject replicate measurements are usually modeled as a subject-specific mean (zero and/or non-zero) plus random noise. For the problem of selecting a set of potentially significant subjects (likely with non-zero means) out of all subjects, we study some new aspects of the elegant false discovery rate (FDR) control procedure proposed by Benjamini and Hochberg (1995).

1 Introduction

We model the collected measurements as

\[
y_{i,j} = \mu_i + \epsilon_{i,j}, \quad \epsilon_{i,j} \sim N(0, \sigma_i^2), \quad j = 1, \dots, m, \quad i = 1, \dots, n,
\]

where $n$ is the total number of subjects and $m$ is the sample size (number of replicates) for each subject. A type of confidence interval for each subject mean ($\mu_i$) could be constructed as

\[
\mu_i \in \bar{y}_i \pm C\, t_{m-1,1-\alpha/2}\, \hat{\sigma}_{m-1}/\sqrt{m}, \quad i = 1, \dots, n,
\]

where $1 - \alpha$ is the prescribed confidence level, the subject-specific variance is estimated as $\hat{\sigma}^2_{m-1} = \frac{1}{m-1}\sum_{j=1}^{m}(y_{i,j} - \bar{y}_i)^2$ with the subject-specific mean estimator $\bar{y}_i = \frac{1}{m}\sum_{j=1}^{m} y_{i,j}$, $t_{m-1,1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the central Student's t-distribution with $m - 1$ degrees of freedom, and $C$ is an across-the-board tuning parameter. Simply employing the rule

\[
\begin{aligned}
\left|\sqrt{m}\,\bar{y}_i/\hat{\sigma}_{m-1}\right| > C\, t_{m-1,1-\alpha/2} &\;\rightarrow\; \text{reject } \mu_i = 0, \\
\left|\sqrt{m}\,\bar{y}_i/\hat{\sigma}_{m-1}\right| \le C\, t_{m-1,1-\alpha/2} &\;\rightarrow\; \text{accept } \mu_i = 0
\end{aligned}
\tag{1}
\]

amounts to checking the t-statistic-based p-value $p_i = 1 - F\left(\left|\sqrt{m}\,\bar{y}_i/\hat{\sigma}_{m-1}\right|\right)$, where $F$ is the probability distribution function of a certain random variable. For instance, $F$ could be the distribution function $F_{m-1}$ of $|T_{m-1}|$, where $T_{m-1}$ is the central Student's t-statistic with $m - 1$ degrees of freedom. Rule (1) then becomes

\[
\begin{aligned}
p_i < 1 - F_{m-1}\left(C\, t_{m-1,1-\alpha/2}\right) &\;\rightarrow\; \text{reject } \mu_i = 0, \\
p_i \ge 1 - F_{m-1}\left(C\, t_{m-1,1-\alpha/2}\right) &\;\rightarrow\; \text{accept } \mu_i = 0.
\end{aligned}
\]
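As a minimal sketch of rule (1), the following Python code computes the statistic $\sqrt{m}\,\bar{y}_i/\hat{\sigma}_{m-1}$ for one subject and compares it with $C\,t_{m-1,1-\alpha/2}$. It is an illustration only, not the authors' implementation. The function names, the two data vectors and the hard-coded quantile (about 2.015 for $m = 6$, $\alpha = 0.10$) are assumptions introduced here.

```python
import math
import statistics


def t_statistic(replicates):
    """Return sqrt(m) * ybar / sigma_hat for one subject's replicates."""
    m = len(replicates)
    ybar = statistics.fmean(replicates)
    sigma_hat = statistics.stdev(replicates)  # sample SD with the 1/(m-1) divisor
    return math.sqrt(m) * ybar / sigma_hat


def rule_1(replicates, t_quantile, C=1.0):
    """Return True when rule (1) rejects mu_i = 0."""
    return abs(t_statistic(replicates)) > C * t_quantile


# Assumed constant: the 0.95 quantile of Student's t with 5 degrees of
# freedom is about 2.015 (this corresponds to m = 6 and alpha = 0.10).
T_5_095 = 2.015

# Hypothetical replicate data for two subjects (m = 6 each).
subject_a = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]     # sample mean far from zero
subject_b = [0.5, -0.4, 0.1, -0.2, 0.3, -0.3]  # sample mean near zero

print(rule_1(subject_a, T_5_095))          # rejection decision for subject a
print(rule_1(subject_b, T_5_095))          # rejection decision for subject b
print(rule_1(subject_a, T_5_095, C=1.8))   # a larger C makes the rule stricter
```

Passing the quantile as an argument keeps the sketch free of external dependencies; in practice it would come from a statistics library or a t-table.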

We first look at an example in which three groups (subjects indexed by $i = 1, \dots, n = 100$ within each group) have subject mean profiles $\mu_i = 0$ (Group 1), $\mu_i = 0.01(1 + \sin(10i/n))$ (Group 2) and $\mu_i = 0.10(1 + \sin(10i/n))$ (Group 3), respectively. The within-subject variation is $\sigma = 1$. Under rule (1), the rejection proportion profiles ($\alpha = 0.10$, $m$ varying from 6 to 10, $C = 1 + j/20$ for $j = 1, \dots, 16$) are plotted in Figure 1. Groups 1 and 2 have similar rejection proportion profiles because their subject mean profiles are very close to each other. Consequently, when we combine these two groups ($H_0$ = Group 1 (zero mean), $H_a$ = Group 2 (non-zero mean)), the resulting false discovery rate is roughly 1/2 for any values of $m$ and $C$.

2 New perspectives

Built upon the ordered p-value set $\{p_{(j)}, 1 \le j \le n\}$, with p-values (indexed by rank $j$) arranged from smallest to largest, the elegant false discovery rate (FDR) control procedure (Benjamini and Hochberg, 1995) would

\[
\text{reject all subjects with rank} \le \max\left\{ j : p_{(j)} \le \frac{j}{n}\, q, \; 1 \le j \le n \right\},
\tag{2}
\]

where $n$ is the total number of hypotheses (subjects) with $H_0$ and $H_a$ combined. If there are rejections, the instant FDR is calculated as the proportion of wrong rejections among all rejections ($H_0$ and $H_a$ combined). If there are no rejections, the instant FDR is defined as 0. The FDR, defined as the expectation of the instant FDR, is controlled at $\pi_0 q$, where $\pi_0$ is the proportion of $H_0$ hypotheses (subjects) among all hypotheses. For illustration purposes, we set the subject mean function ($H_a$) as

\[
f(u, x) = 0.08u(1 + |\sin(6x)|u), \quad x \in [0, 1], \quad u = 1, 2, \dots
\tag{3}
\]

The subject means under $H_0$ are implemented by setting $u = 0$ at $x = i/n_0$ ($i = 1, \dots, n_0$), where $n_0$ is the number of subjects (hypotheses) under $H_0$. The subject means under $H_a$ are implemented by setting $x = i/n_1$ ($i = 1, \dots, n_1$), where $n_1$ is the number of subjects (hypotheses) under $H_a$.
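The step-up rule (2) and the instant FDR can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the ten p-values and the set of indices treated as true $H_0$ subjects are all hypothetical and are not taken from the paper's simulations.

```python
def benjamini_hochberg(pvalues, q):
    """Return the set of indices rejected by rule (2) at level q."""
    n = len(pvalues)
    # order[r - 1] is the index of the p-value with rank r (ascending).
    order = sorted(range(n), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / n * q:
            k = rank  # remember the largest rank that passes its threshold
    return set(order[:k])


def instant_fdr(rejected, null_indices):
    """Proportion of wrong rejections; defined as 0 when nothing is rejected."""
    if not rejected:
        return 0.0
    return len(rejected & null_indices) / len(rejected)


# Hypothetical p-values and hypothetical set of true H0 subjects.
pvals = [0.0005, 0.008, 0.039, 0.041, 0.042, 0.055, 0.074, 0.205, 0.212, 0.216]
true_nulls = {3, 5, 7, 8, 9}

rejected = benjamini_hochberg(pvals, q=0.1)
print(sorted(rejected))
print(instant_fdr(rejected, true_nulls))
```

Note that the p-values of rank 3 and 4 exceed their own thresholds but are still rejected, because the rule takes the largest rank that passes.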

Under any numerical simulation configuration (subject group sizes $(n_0, n_1)$, within-subject variation $\sigma$, within-subject replicate/sample size $m$), separating $H_a$ subjects from $H_0$ subjects is expected to become easier as the $H_a$ subject mean profile increases toward $\infty$. We examine the resulting specificity profiles and find that they approach a limit (regulated by $q$) as the $H_a$ mean profile increases. This limit is attained exactly once the $H_a$ mean profile is sufficiently large. We are thus motivated to take a geometric view by juxtaposing the ordered p-value profiles ($H_0$ and $H_a$) with an overriding adaptive hypothesis rejection cut-off route (indexed by subjects, $H_0$ and $H_a$ combined) for the sequential p-value check.

In Figure 2, the ordered p-value profile under $H_0$ roughly resembles a straight line connecting the points $(\pi_1, 0)$ and $(1, 1)$, where $\pi_1 = 1 - \pi_0$. As the ordered p-value profile under $H_a$ approaches the bottom (the mean profile increases), the rejected hypothesis set includes all $H_a$ subjects and those $H_0$ subjects with p-values located from D to B (rule (2)). The limiting specificity is calculated accordingly. Along the cut route (the solid line from $(0, 0)$ to $(x_1, y_1)$) in Figure 2, each check point $j^* \in \{1, \dots, n\}$ ($n = n_0 + n_1$) corresponds to a number $n_0(j^*)$ of p-values under $H_0$ that are $\le \frac{j^*}{n} q$ and a number $n_1(j^*)$ of p-values under $H_a$ that are $\le \frac{j^*}{n} q$ (Figure 3). All hypotheses linked to these $n_0(j^*) + n_1(j^*)$ p-values will be rejected as long as $n_0(j^*) + n_1(j^*) \ge j^*$. However, any check point $j^*$ along the cut route (Figure 2) beyond the one ($j^*_B$) corresponding to point B cannot collect a sufficient number of hypotheses ($H_0$ and $H_a$ combined) such that $n_0(j^*) + n_1(j^*) \ge j^*$.

The set $\{j^* - n_0(j^*) : 1 \le j^* \le j^*_B\}$ roughly forms a no-rejection region boundary for the $H_a$ hypotheses (the bold dashed line in Figure 3); that is, there will be no discovery (rejection) unless the ordered p-value profile under $H_a$ crosses this boundary from the upper portion ("NO REJECTION region", Figure 3) to the lower portion. When there is such a crossing, geometric arguments show that the instant FDR is always around $\pi_0 q$ no matter where the crossing point is located along the no-rejection boundary. Numerical simulation discloses some operating characteristics under different specifications of the experimental factors (e.g., within-subject variation $\sigma$, within-subject sample size $m$, $H_a$ subject mean profile, population size $n_0 + n_1$, $H_0$ proportion $\pi_0 = n_0/(n_0 + n_1)$, etc.). Moreover, we also try a quadratic cut route:

\[
\text{reject all subjects with rank} \le \max\left\{ j : p_{(j)} \le \left(\frac{j}{n}\right)^2 q, \; 1 \le j \le n \right\}.
\tag{4}
\]
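Rules (2) and (4) differ only in the cut route applied to $j/n$, so both can be sketched with one step-up function that takes the route as an argument. The code below is an illustrative sketch, assuming hypothetical names (`step_up_rejections`, `linear_cut`, `quadratic_cut`) and hypothetical p-values.

```python
def linear_cut(x):
    """Cut route of rule (2): threshold q * (j/n)."""
    return x


def quadratic_cut(x):
    """Cut route of rule (4): threshold q * (j/n)**2."""
    return x * x


def step_up_rejections(pvalues, q, cut=linear_cut):
    """Return max{j : p_(j) <= cut(j/n) * q}, or 0 if no rank qualifies."""
    n = len(pvalues)
    k = 0
    for j, p in enumerate(sorted(pvalues), start=1):
        if p <= cut(j / n) * q:
            k = j
    return k


# Hypothetical p-values, for illustration only.
pvals = [0.0005, 0.008, 0.039, 0.041, 0.042, 0.055, 0.074, 0.205, 0.212, 0.216]

print(step_up_rejections(pvals, 0.1, linear_cut))
print(step_up_rejections(pvals, 0.1, quadratic_cut))
```

Because $(j/n)^2 \le j/n$ on $[0, 1]$, the quadratic route can never reject more hypotheses than the linear route at the same $q$.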


We summarize some observations.

• In Figure 2, the intersection (B) between the $H_0$ p-value profile ($y = (x - \pi_1)/\pi_0$) and the linear cut route ($y = qx$) is located at $(x_1, y_1)$ with $x_1 = (1 - \pi_0)/(1 - q\pi_0)$; the intersection (C) between the $H_0$ p-value profile and the quadratic cut route ($y = qx^2$) is located at $(x_2, y_2)$ with $x_2 = \left(1 - \sqrt{1 - 4\pi_0(1 - \pi_0)q}\right)/(2q\pi_0)$.

• From Figure 3, when the probability of discovery equals 1, under the linear cut FDR = pFDR (positive false discovery rate) $= \pi_0 q$ (constant) no matter where the ordered p-value profile ($H_a$) crosses the no-rejection boundary. The no-rejection boundary function is $g(x) = qx/(1 - q\pi_0)$ ($0 \le x \le \pi_1$) under the linear cut and $g(x) = \dfrac{(1 - 2q\pi_0 x) - \sqrt{1 - 4q\pi_0 x}}{2q\pi_0^2}$ ($0 \le x \le \pi_1$) under the quadratic cut. The relationship between the instant FDR (= pFDR) and the no-rejection boundary function $g(x)$ is pFDR $= \pi_0 g(x)/(x + \pi_0 g(x))$ ($0 \le x \le \pi_1$).

• In Figure 4, at each $q$, the instant FDR (= pFDR) under the quadratic cut increases with the location ($x \in (0, \pi_1)$, the x-axis) at which the ordered p-value profile ($H_a$) crosses the no-rejection boundary, whereas it is constant under the linear cut. When $q = 1$, FDR $= \pi_0$ for either cut route (linear, quadratic).

• In Figure 5, under the linear cut, when the probability of discovery is less than one (e.g., when the ordered p-value profiles under $H_0$ and $H_a$ are close), pFDR > FDR and FDR $= \pi_0 q$. pFDR is less sensitive to $q$ than FDR. This is relevant to the observation in Figure 1 (Groups 1 and 2). In Figure 5, under the quadratic cut, the FDR is much lower than under the linear cut. When the $H_a$ mean profiles are close to zero, the pFDR is more volatile than in the linear cut case.

• Under the linear cut, the specificity approaches $(1 - q)/(1 - q\pi_0)$ as $\mu$ increases. Under the quadratic cut, the specificity approaches $\dfrac{1}{\pi_0} - \dfrac{1 - (1 - 4q\pi_0(1 - \pi_0))^{1/2}}{2q\pi_0^2}$ as $\mu$ increases. See Figures 5, 6, 7, 10, 11, 12.

• When the $H_a$ mean profile is close to zero (so that the ordered p-value profile under $H_a$ is high), the number of discoveries is very small. The number increases with the $H_a$ subject mean profile. The expected number of discoveries under the linear cut is higher than under the quadratic cut, and the difference grows as $\pi_0$ gets larger. See Figures 8, 9.


• As $n$ increases, the limiting specificity profile approaches the calculated curve given above more closely. See the left panels in Figures 5, 10.

• As $\pi_0$ decreases, the limiting specificity profile approaches the calculated curve given above more closely. See the left panels in Figures 10, 11.

• The specificity under the linear cut is lower than under the quadratic cut, and the difference lessens as $\pi_0$ decreases. The sensitivity under the linear cut is higher than under the quadratic cut. The probability of discovery under the linear cut is higher than under the quadratic cut.

• We consider an unrealistic case where the $H_0$ ordered p-value profile is not random: $\{i/n_0, \; i = 1, \dots, n_0\}$. The FDR and pFDR are less than $\pi_0 q$ when the $H_a$ mean profile is close to zero (Figure 13).

• When $\sigma$ (homogeneous among subjects) increases, the resulting cluster of profiles (collected from the mean profile set) behaves similarly to a sub-cluster of profiles (collected from the part of the mean profile set with small values) under small $\sigma$ (Figure 14).

• When $\sigma$ is heterogeneous across subjects (roughly independent of the subject mean), the probability of discovery tends to be larger (closer to one) than under homogeneous $\sigma$ when the mean profile is close to zero. The pFDR under heterogeneous $\sigma$ is closer to the FDR than it is under homogeneous $\sigma$. All other profiles (sensitivity, specificity) are similar between the two cases (homogeneous and heterogeneous $\sigma$) (Figure 15).

• If all $H_a$ p-values are $\le p$, we reject all p-values $\le p$. Requiring the false rejection rate to be $\le \pi_0 q$ amounts to $p \le \dfrac{\pi_1 q}{\pi_1 + (1 - q)\pi_0}$ (Figures 16, 17).
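The closed-form quantities in the observations above can be cross-checked numerically. The sketch below codes the intersection points, the no-rejection boundary functions, the pFDR relationship and the limiting specificities, then prints a few consistency checks. The function names and the values $\pi_0 = 0.3$, $q = 0.2$ are assumptions chosen for illustration (the value $\pi_0 = 0.3$ matches Figure 2; $q = 0.2$ is arbitrary).

```python
import math


def x1_linear(pi0, q):
    """x-coordinate of B: H0 profile meets the linear cut y = q*x."""
    return (1 - pi0) / (1 - q * pi0)


def x2_quadratic(pi0, q):
    """x-coordinate of C: H0 profile meets the quadratic cut y = q*x**2."""
    return (1 - math.sqrt(1 - 4 * pi0 * (1 - pi0) * q)) / (2 * q * pi0)


def g_linear(x, pi0, q):
    """No-rejection boundary under the linear cut."""
    return q * x / (1 - q * pi0)


def g_quadratic(x, pi0, q):
    """No-rejection boundary under the quadratic cut."""
    root = math.sqrt(1 - 4 * q * pi0 * x)
    return ((1 - 2 * q * pi0 * x) - root) / (2 * q * pi0 ** 2)


def pfdr(x, g_value, pi0):
    """Instant FDR (= pFDR) when the Ha profile crosses the boundary at x."""
    return pi0 * g_value / (x + pi0 * g_value)


def limiting_specificity_linear(pi0, q):
    return (1 - q) / (1 - q * pi0)


def limiting_specificity_quadratic(pi0, q):
    root = math.sqrt(1 - 4 * q * pi0 * (1 - pi0))
    return 1 / pi0 - (1 - root) / (2 * q * pi0 ** 2)


pi0, q = 0.3, 0.2          # illustrative values (assumed)
pi1 = 1 - pi0
x1, x2 = x1_linear(pi0, q), x2_quadratic(pi0, q)

# B and C lie on the H0 profile y = (x - pi1) / pi0 and on their cut routes.
print((x1 - pi1) / pi0, q * x1)
print((x2 - pi1) / pi0, q * x2 ** 2)
# Under the linear cut the pFDR equals pi0 * q at any crossing location.
print([pfdr(x, g_linear(x, pi0, q), pi0) for x in (0.1, 0.4, 0.7)], pi0 * q)
# Under the quadratic cut the pFDR grows with the crossing location.
print([pfdr(x, g_quadratic(x, pi0, q), pi0) for x in (0.1, 0.4, 0.7)])
```

The limiting specificities can be checked the same way: they equal one minus the fraction of $H_0$ subjects rejected at B or C, that is, $1 - (x_1 - \pi_1)/\pi_0$ and $1 - (x_2 - \pi_1)/\pi_0$.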

3 A note on p-value

We numerically study the ranking of p-values by varying the set size ($n$), the $H_a$ subject mean profile ($\mu$), the noise variance ($\sigma^2$) and other factors. The p-value rankings under both $H_0$ and $H_a$ are stochastic (Figure 18). Although the subject means are clearly ordered across the domain $[0, 1]$ and the within-subject variation is moderate ($= 1$) or minor ($= 1/100$), the rankings of p-values fluctuate substantially around a trend. The degree of shuffling appears to be similar in the two cases ($\sigma = 1$ and $1/100$). The p-values are calculated individually for each subject without considering the overall model structure (e.g., mean profile function, homogeneous variation, etc.). Each p-value is associated with a probability $\Pr(T_{m-1} > \sqrt{m}\,\bar{x}_m/\hat{\sigma}_m)$, where $\bar{x}_m$ and $\hat{\sigma}_m$ are independent of each other. This pair of statistics (sample standard deviation, sample mean) is also used to estimate the population coefficient of variation ($\sigma/\mu$). The stochastic $\hat{\sigma}_m$ has a substantial shuffling impact on the ranking of $\bar{x}_m$. For instance, given another subject $*$, the comparison between $\sqrt{m}\,\bar{x}_m/\hat{\sigma}_m$ and $\sqrt{m}\,\bar{x}^*_m/\hat{\sigma}^*_m$ may be confounded by the stochastic relative magnitude of $\hat{\sigma}_m$ and $\hat{\sigma}^*_m$. The distribution of the estimated coefficient of variation is available (e.g., Hendricks and Robey (1936), Vangel (1996)). Even a pairwise comparison between two subject means is generally complicated under certain circumstances, and numerical investigation is usually needed (e.g., Hsu (1938)).
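The shuffling effect of the estimated standard deviation can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes the setting of Figure 18 (mean profile $i/n$, $\sigma = 1/100$, $m = 6$, $n = 100$), an arbitrary seed, and hypothetical function names. Because the one-sided p-value $\Pr(T_{m-1} > t)$ is decreasing in $t$, ranking subjects by the t-statistic is the reverse of ranking them by p-value, so no t-distribution routine is needed. Agreement with the true ordering of the means is summarized by Spearman's rank correlation.

```python
import math
import random
import statistics


def t_stat(sample):
    """sqrt(m) * sample mean / sample standard deviation."""
    m = len(sample)
    return math.sqrt(m) * statistics.fmean(sample) / statistics.stdev(sample)


def ranks(values):
    """Rank 1 = smallest value (ties are not expected for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rank vectors."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def simulate_rank_agreement(n=100, m=6, sigma=0.01, seed=0):
    """Return (rho for sample means, rho for t-statistics) vs the true order."""
    rng = random.Random(seed)  # arbitrary seed, assumed for reproducibility
    sample_means, t_stats = [], []
    for i in range(1, n + 1):
        sample = [rng.gauss(i / n, sigma) for _ in range(m)]
        sample_means.append(statistics.fmean(sample))
        t_stats.append(t_stat(sample))
    true_rank = list(range(1, n + 1))
    return (spearman(true_rank, ranks(sample_means)),
            spearman(true_rank, ranks(t_stats)))


rho_mean, rho_t = simulate_rank_agreement()
print(rho_mean, rho_t)
```

In this setting the sample means track the true ordering closely, while dividing by the noisy $\hat{\sigma}_m$ visibly loosens the agreement, which is the effect described in the text.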

References

[1] W.A. Hendricks and K.W. Robey (1936). The sampling distribution of the coefficient of variation. The Annals of Mathematical Statistics 7(3): 129-132.

[2] P.L. Hsu (1938). Contribution to the theory of "Student's" t-test as applied to the problem of two samples. Statistical Research Memoirs 2: 1-24.

[3] Y. Benjamini and Y. Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society (B) 57: 289-300.

[4] M.G. Vangel (1996). Confidence intervals for a normal coefficient of variation. The American Statistician 50(1): 21-26.

4 APPENDIX


[Figure 1 plot. Top-left panel: "Subject means" (x-axis: subject population, y-axis: subject mean; n = 100 per group; Groups 1, 2, 3). Remaining panels: "Rejection proportion" for Groups 1, 2 and 3 (x-axis: m (replicates), y-axis: rejection proportion).]

Figure 1: The rejection proportion profiles arising from applying testing rule (1) ($\alpha = 0.10$). Three groups (1, 2, 3) have subject mean profiles (subject index $i = 1, \dots, 100$) $\mu_i = 0$, $\mu_i = 0.01(1 + \sin(10i/n))$ and $\mu_i = 0.10(1 + \sin(10i/n))$, respectively (the top-left panel). The within-subject variation is $\sigma = 1$. The tuning parameter in rule (1) is $C = 1 + j/20$ ($j = 1, \dots, 16$), with the resulting rejection proportion profiles ($m$ ranging from 6 to 10) located from top to bottom in each panel (top-right, bottom-left, bottom-right).

[Figure 2 plot: "Geometry of false discovery rate control". Labeled points A, B $(x_1, y_1)$, C $(x_2, y_2)$ and D; x-axis divided into regions $\pi_1$ and $\pi_0$; y-axis: p-value, with level $q$ marked; profiles $H_0$ and $H_a$. Annotations: Specificity = AB/AD (linear cut), Specificity = AC/AD (quadratic cut).]

Figure 2: The bold dashed line represents the ordered p-values from $H_a$ with large positive means ($\pi_1 = 0.7$). The bold dotted line represents the ordered p-values from $H_0$ ($\pi_0 = 0.3$). The solid lines represent the linear and quadratic cut routes (the x-axis is the ordered p-value index, the y-axis is the threshold for $H_0$ rejection). Under the Benjamini-Hochberg (1995) FDR control procedure, the specificity approaches its limit as the alternative means increase. The intersection points between the linear and quadratic cut routes and the $H_0$ ordered p-value profile are the final p-value cut-off points for rejecting $H_0$; they are labeled B (location $(x_1, y_1)$) and C (location $(x_2, y_2)$), respectively. The specificities are calculated accordingly.

[Figure 3 plot. Left panel: "FDR control (linear cut)"; right panel: "FDR control (quadratic cut)". In each panel the x-axis is divided into regions $\pi_1$ and $\pi_0$, the y-axis is the p-value with level $q$ marked, and the "NO REJECTION region" for $H_a$ is labeled.]

Figure 3: The left panel shows the geometry of the Benjamini-Hochberg (1995) FDR control procedure. The bold solid line represents the linear cut route (the x-axis is the ordered p-value index, the y-axis is the threshold for $H_0$ rejection). The bold dotted line represents the ordered p-value profile under $H_0$ (group size $\propto \pi_0$). The bold dashed line represents the no-rejection region boundary for the ordered p-values from $H_a$ (group size $\propto \pi_1$). In the horizontal direction, the distance between the bold dashed line and the solid line equals the distance between the bold dotted line and the point that separates the two regions labeled "$\pi_1$" and "$\pi_0$". The right panel shows the geometry of the FDR control procedure under the quadratic cut route.

[Figure 4 plot: "Positive false discovery rate (linear and quadratic cut)". x-axis: exceeding point (0 to $\pi_1$) ($H_a$); y-axis: FDR. Legend: Linear, Quadratic; $q$ from 1/10 to 10/10 in steps of 1/10.]

Figure 4: The FDR under the linear and quadratic cut routes. The FDR under the linear cut is constant across exceeding points. The FDR under the quadratic cut is an increasing function of the exceeding position.

[Figure 5 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (900, 100)$. x-axis: $q$; y-axis: proportions. Legend: FDR, pFDR, Sensitivity, Specificity, Pr(discovery).]

Figure 5: The FDR, pFDR, specificity and sensitivity profiles under the linear and quadratic cut routes ($n_0 = 900$ ($H_0$), $n_1 = 100$ ($H_a$)). $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 6 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (500, 500)$. x-axis: $q$; y-axis: proportions. Legend: FDR, pFDR, Sensitivity, Specificity, Pr(discovery).]

Figure 6: The FDR, pFDR, specificity and sensitivity profiles under the linear and quadratic cut routes ($n_0 = 500$ ($H_0$), $n_1 = 500$ ($H_a$)). $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 7 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (100, 900)$. x-axis: $q$; y-axis: proportions. Legend: FDR, pFDR, Sensitivity, Specificity, Pr(discovery).]

Figure 7: The FDR, pFDR, specificity and sensitivity profiles under the linear and quadratic cut routes ($n_0 = 100$ ($H_0$), $n_1 = 900$ ($H_a$)). $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 8 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (90, 10)$. x-axis: $q$; y-axis: number of discoveries.]

Figure 8: The number of discoveries under the linear and quadratic cut routes ($n_0 = 90$ ($H_0$), $n_1 = 10$ ($H_a$)). $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 9 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (10, 90)$. x-axis: $q$; y-axis: number of discoveries.]

Figure 9: The number of discoveries under the linear and quadratic cut routes ($n_0 = 10$ ($H_0$), $n_1 = 90$ ($H_a$)). $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 10 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (90, 10)$. x-axis: $q$; y-axis: proportions. Legend: FDR, pFDR, Sensitivity, Specificity, Pr(discovery).]

Figure 10: The FDR, pFDR, specificity and sensitivity profiles under the linear and quadratic cut routes ($n_0 = 90$ ($H_0$), $n_1 = 10$ ($H_a$)). $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 11 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (50, 50)$. x-axis: $q$; y-axis: proportions. Legend: FDR, pFDR, Sensitivity, Specificity, Pr(discovery).]

Figure 11: The FDR, pFDR, specificity and sensitivity profiles under the linear and quadratic cut routes ($n_0 = 50$ ($H_0$), $n_1 = 50$ ($H_a$)). $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 12 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (10, 90)$. x-axis: $q$; y-axis: proportions. Legend: FDR, pFDR, Sensitivity, Specificity, Pr(discovery).]

Figure 12: The FDR, pFDR, specificity and sensitivity profiles under the linear and quadratic cut routes ($n_0 = 10$ ($H_0$), $n_1 = 90$ ($H_a$)). $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 13 plot. Panels: "Linear cut (p non-random)" and "Quadratic cut (p non-random)", $(n_0, n_1) = (900, 100)$. x-axis: $q$; y-axis: proportions.]

Figure 13: The FDR under the linear and quadratic cut routes with the ordered $H_0$ p-values forming a non-random equal partition of $[0, 1]$. $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 14 plot. Panels: "Linear cut" and "Quadratic cut", $(n_0, n_1) = (90, 10)$, $\sigma$ increased. x-axis: $q$; y-axis: proportions. Legend: FDR, pFDR, Sensitivity, Specificity, Pr(discovery).]

Figure 14: The FDR, pFDR, specificity and sensitivity profiles under the linear and quadratic cut routes ($n_0 = 90$ ($H_0$), $n_1 = 10$ ($H_a$)). $\sigma = 10$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$.

[Figure 15 plot. Panel titles as extracted: "Linear cut", $(n_0, n_1) = (90, 10)$, $\sigma$ diverse; and "Linear cut", $(n_0, n_1) = (50, 50)$, $\sigma$ diverse. x-axis: $q$; y-axis: proportions. Legend: FDR, pFDR, Sensitivity, Specificity, Pr(discovery).]

Figure 15: The FDR, pFDR, specificity and sensitivity profiles under the linear and quadratic cut routes ($n_0 = 90$ ($H_0$), $n_1 = 10$ ($H_a$)). $\sigma$ is heterogeneous among subjects, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, \dots, 35$. Subject variation $= 2|\cos(1000i)|$ ($i = 1, \dots, n_0$) under $H_0$ and subject variation $= 2|\cos(1000i)|$ ($i = 1, \dots, n_1$) under $H_a$.

[Figure 16 plot. Panels A and B: "Histogram (class=5)". x-axis: p-value (0 to 1); y-axis: frequency. Legend: $H_0$, $H_a$; threshold $p$ marked.]

Figure 16: Histogram of p-values. $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 1, 5$ (A, B).

[Figure 17 plot. Panels C and D: "Histogram (class=5)". x-axis: p-value (0 to 1); y-axis: frequency. Legend: $H_0$, $H_a$; threshold $p$ marked.]

Figure 17: Histogram of p-values. $\sigma = 1$, $m = 6$ and $H_a$ subject mean profile $= 0.08u(1 + |\sin(6x)|u)$, $u = 10, 20$ (C, D).

[Figure 18 plot. Two rows of panels, for $n = 100$ and $n = 1000$ ($f(i) = i/n$, $\sigma = 1/100$, $m = 6$): "Autocorrelation (rank residual)" and "Autocorrelation (rank)" (x-axis: lag, y-axis: autocorrelation), and "Rank (p-value)" (x-axis: subject index $i/n$, y-axis: rank of p-value; legend: Rank fit, Rank, Mean).]

Figure 18: Rankings of p-values. The subject mean profile is modeled as $i/n$ ($i = 1, \dots, n$). $\sigma = 1/100$.