Higher criticism for estimating proportion of non-null effect in high-dimensional multiple comparison
Annika Tillander, joint work with Tatjana Pavlenko
Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
LinStat, 26 August 2014
Short title: bHC for estimating proportion of non-null
Classification
We have $n$ observations, where each observation $x = (x_1, \dots, x_p)$ consists of $p$ features. In a supervised classification problem with $C$ classes, observation $j$ carries the class label $y_j = c$, where $j = 1, \dots, n$ and $c \in \{1, \dots, C\}$. The training data are

$$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}.$$

Using the training data, a prediction model is built that enables prediction of new observations whose outcome is unknown.
Linear Discriminant Analysis (LDA)
Assume that the observations in each class are modeled by the Gaussian distribution, i.e. $x_c \sim N(\mu_c, \Sigma_c)$, where $\mu_c$ is the class mean and $\Sigma_c$ is the class-wise covariance matrix. The discriminant score of class $c$ is

$$D_c(x) = x'\Sigma_c^{-1}\mu_c - \frac{1}{2}\mu_c'\Sigma_c^{-1}\mu_c + \log \pi_c,$$

where $\pi_c$ is the prior probability of class $c$ and $\sum_{c=1}^{C} \pi_c = 1$. A new observation is assigned to the class

$$c^* = \arg\max_{c = 1, \dots, C} D_c(x).$$
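The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the inverse covariance matrices $\Sigma_c^{-1}$ are already available as nested lists, and the names `lda_scores` and `classify` are hypothetical.

```python
import math

def lda_scores(x, means, precisions, priors):
    """Return the discriminant score D_c(x) for every class c.

    `precisions` holds the inverse covariance matrices (an assumption of
    this sketch: they are supplied by the caller, not estimated here).
    """
    scores = []
    for mu, prec, pi in zip(means, precisions, priors):
        # prec_mu = Sigma_c^{-1} mu_c
        prec_mu = [sum(p * m for p, m in zip(row, mu)) for row in prec]
        linear = sum(xi * vi for xi, vi in zip(x, prec_mu))   # x' Sigma_c^{-1} mu_c
        quad = sum(mi * vi for mi, vi in zip(mu, prec_mu))    # mu_c' Sigma_c^{-1} mu_c
        scores.append(linear - 0.5 * quad + math.log(pi))
    return scores

def classify(x, means, precisions, priors):
    """Return the class label (numbered from 1) with the largest score."""
    scores = lda_scores(x, means, precisions, priors)
    return max(range(len(scores)), key=scores.__getitem__) + 1

# Toy example with two classes and identity covariance (illustrative values).
identity = [[1.0, 0.0], [0.0, 1.0]]
means = [[0.0, 0.0], [2.0, 2.0]]
priors = [0.5, 0.5]
label_near_origin = classify([0.2, 0.1], means, [identity, identity], priors)
label_near_two = classify([1.9, 2.5], means, [identity, identity], priors)
```

Passing the precision matrices directly keeps the sketch free of a matrix inversion, which is the step that becomes unstable when $p > n$.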
Two classes
Assign $x$ to class 1 if

$$\log\frac{\pi_1}{\pi_2} + \left(x - \frac{1}{2}(\mu_1 + \mu_2)\right)'\Sigma^{-1}(\mu_1 - \mu_2) \ge 0.$$
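The two-class rule can be read as the difference of the two class scores. The short derivation below is a sketch under the assumption that both classes share one covariance matrix $\Sigma$ (the slide writes $\Sigma$ without a class index, so this is an interpretation rather than a stated fact).

```latex
\begin{align*}
D_1(x) - D_2(x)
  &= x'\Sigma^{-1}(\mu_1 - \mu_2)
     - \tfrac{1}{2}\left(\mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2\right)
     + \log\frac{\pi_1}{\pi_2} \\
  &= \log\frac{\pi_1}{\pi_2}
     + \left(x - \tfrac{1}{2}(\mu_1 + \mu_2)\right)'\Sigma^{-1}(\mu_1 - \mu_2)
\end{align*}
```

The second line uses the symmetry of $\Sigma^{-1}$, which gives $\mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2 = (\mu_1 + \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2)$. Class 1 is chosen exactly when this difference is non-negative.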
High dimensionality
The dimensionality of the data is determined by the relation between the number of observations ($n$) and the number of features ($p$). When there are more features than available observations ($p > n$), the problem is said to be "high-dimensional".

Standard asymptotics: $p$ is fixed and $n \to \infty$.
Asymptotics when $p$ is not fixed: $p = p_n$ grows with $n$, with $p_n \gg n$ as $n \to \infty$.
gLasso
Using the correlation matrix $K^{-1}$ instead of the covariance matrix allows for faster convergence in the high-dimensional setting (Rothman et al., 2008).

Let $\Gamma$ denote the diagonal matrix of true standard deviations; then

$$\Sigma^{-1} = \Gamma^{-1} K \Gamma^{-1}.$$

Sparse inverse covariance estimation with the graphical lasso (Friedman et al., 2009):

$$\hat{K}_\lambda = \arg\min_{K \succ 0} \left\{ \operatorname{trace}\left(K \hat{K}^{-1}\right) - \log\det K + \lambda \left\lVert K^{-1} \right\rVert_1 \right\}$$
Cuthill-McKee ordering
Reducing the bandwidth of sparse symmetric matrices (Cuthill & McKee, 1969).

Let $S$ be a $p \times p$ symmetric matrix, where $i$ indexes rows and $j$ indexes columns. The bandwidth of $S$ is the maximum value of $|i - j|$ over the non-zero elements $S_{ij}$.

Determine a permutation matrix $P$ such that the non-zero elements cluster about the main diagonal:

$$S_C = P S P^T$$
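To make the bandwidth definition and the reordering concrete, here is a small pure-Python sketch. It is a simplified illustration under stated assumptions: each connected component is started from a vertex of minimum degree (practical implementations often use a pseudo-peripheral start vertex and reverse the final ordering), and the function names `bandwidth`, `cuthill_mckee` and `permute` are hypothetical.

```python
from collections import deque

def bandwidth(S):
    """Maximum |i - j| over the non-zero entries of S (0 if there are none)."""
    return max((abs(i - j) for i, row in enumerate(S)
                for j, v in enumerate(row) if v != 0), default=0)

def cuthill_mckee(S):
    """Return a vertex ordering from a breadth-first Cuthill-McKee sweep."""
    p = len(S)
    neighbours = [[j for j in range(p) if j != i and S[i][j] != 0]
                  for i in range(p)]
    degree = [len(nb) for nb in neighbours]
    visited = [False] * p
    order = []
    # Start each connected component from an unvisited vertex of minimum degree.
    for start in sorted(range(p), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours in order of increasing degree.
            for w in sorted(neighbours[v], key=lambda u: degree[u]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order

def permute(S, order):
    """Apply the symmetric permutation S_C = P S P^T encoded by `order`."""
    return [[S[i][j] for j in order] for i in order]

# A chain 0-3-1-4-2 with scrambled labels (illustrative data).
S = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
]
order = cuthill_mckee(S)
S_C = permute(S, order)
```

For this toy matrix the bandwidth drops from 3 to 1: the reordering recovers the chain, so $S_C$ is tridiagonal.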
[Figure: sparsity pattern of $S$ and of the reordered matrix $S_C = P S P^T$.]
Algorithm
Combining gLasso and Cuthill-McKee ordering:

1. Draw bootstrap sample $j$, $j = 1, \dots, r$, and calculate $\hat{K}_j^{-1}$.
2. Estimate $\hat{K}_j[\lambda_i]$ with gLasso.
3. Set $S_{ij} = \mathbf{1}\{\hat{K}_j[\lambda_i] > 0\}$.
4. Set $\tilde{S}_{ik} = \mathbf{1}\left\{\frac{1}{r}\sum_{j=1}^{r} S_{ij} > q_k\right\}$.
5. Find the permutation matrix $P_{ik}$ for the skeleton $\tilde{S}_{ik}$ with the Cuthill-McKee ordering algorithm.
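The thresholding step that turns the $r$ bootstrap indicator matrices into one skeleton can be sketched as follows. This covers only that step: the indicator matrices are assumed to come from gLasso fits performed elsewhere, and the function name `skeleton` and the toy data are hypothetical.

```python
def skeleton(indicators, q):
    """Keep entry (a, b) if it is non-zero in more than a fraction q
    of the r bootstrap indicator matrices."""
    r = len(indicators)
    p = len(indicators[0])
    return [[1 if sum(S[a][b] for S in indicators) / r > q else 0
             for b in range(p)]
            for a in range(p)]

# Four bootstrap fits on three variables (illustrative data):
# edge (0, 1) appears in all four fits, edge (1, 2) in two of them.
dense = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
sparse = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
fits = [dense, sparse, dense, sparse]

strict = skeleton(fits, q=0.75)   # keeps only edges seen in more than 75% of fits
lenient = skeleton(fits, q=0.4)   # also keeps the edge seen in 50% of fits
```

A higher threshold $q_k$ gives a sparser skeleton, which is what the subsequent Cuthill-McKee ordering is applied to.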
Identifying block-structure
[Figure: identified block structures, panels (a) to (c). Panel labels in order: λ = 0.63, limit = 0.99 (M1); λ = 0.66, limit = 0.99 (M2); λ = 0.66, limit = 0.99 (M3); λ = 0.65, limit = 0.99 (M4).]
Additive classifier
Block diagonal segmentation:

$$\Sigma^{-1} = \operatorname{diag}\left[\Sigma_1^{-1}, \dots, \Sigma_b^{-1}\right],$$

where $b$ is the number of blocks. Both the class means $\mu_c$ and the observed vector $x$ can be partitioned into $b$ disjoint subsets $\mu_{c,i} = (\mu_{c,i_1}, \dots, \mu_{c,i_{p_i}})$ and $x_i = (x_{i_1}, \dots, x_{i_{p_i}})$, where $p_i$ is the block size and $i = 1, \dots, b$, such that for any $i \neq j$, $x_i$ and $x_j$ are conditionally independent given the class variable $y$.

Two-class linear function with additive structure:

$$D(x) = \sum_{i=1}^{b} \left(x_i - \frac{1}{2}(\mu_{1,i} + \mu_{2,i})\right)'\Sigma_i^{-1}(\mu_{1,i} - \mu_{2,i})$$
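The additive structure means the score is a plain sum of per-block terms, as the sketch below illustrates. It assumes the blocks and their inverse covariance matrices $\Sigma_i^{-1}$ are given; the function name `additive_discriminant` and the numbers are illustrative, not taken from the talk.

```python
def additive_discriminant(x_blocks, mu1_blocks, mu2_blocks, precision_blocks):
    """Sum the block-wise terms
    (x_i - (mu_1i + mu_2i) / 2)' Sigma_i^{-1} (mu_1i - mu_2i)."""
    total = 0.0
    for x, m1, m2, prec in zip(x_blocks, mu1_blocks, mu2_blocks, precision_blocks):
        centred = [xa - 0.5 * (a + b) for xa, a, b in zip(x, m1, m2)]
        shift = [a - b for a, b in zip(m1, m2)]
        # prec_shift = Sigma_i^{-1} (mu_1i - mu_2i)
        prec_shift = [sum(p * s for p, s in zip(row, shift)) for row in prec]
        total += sum(c * v for c, v in zip(centred, prec_shift))
    return total

# Two blocks of sizes 2 and 1 (illustrative values).
score = additive_discriminant(
    x_blocks=[[1.0, 0.0], [0.0]],
    mu1_blocks=[[1.0, 1.0], [0.0]],
    mu2_blocks=[[-1.0, -1.0], [1.0]],
    precision_blocks=[[[1.0, 0.0], [0.0, 1.0]], [[2.0]]],
)
assigned_class = 1 if score >= 0 else 2  # equal priors assumed here
```

Because each block only needs its own $p_i \times p_i$ inverse, no $p \times p$ matrix is ever inverted.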
Mahalanobis distance
Let $\pi_1 = \pi_2 = 1/2$; then the optimal misclassification probability can be expressed as

$$\varepsilon = \Phi\left(-\frac{1}{2}\sqrt{\delta^2}\right),$$

where $\Phi(\cdot)$ is the Gaussian cumulative distribution function and $\delta^2 = \left\lVert \Sigma^{-1/2}\mu \right\rVert^2$ is the squared Mahalanobis norm of the shift vector, where $\mu = \mu_1 - \mu_2$ is the shift vector and $\lVert\cdot\rVert$ denotes the $\ell_2$ norm.
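A quick numerical reading of this formula, using only the Python standard library, may help. The function name `misclassification_probability` is hypothetical, and $\delta^2$ is taken as a given input rather than estimated.

```python
from statistics import NormalDist

def misclassification_probability(delta_sq):
    """Optimal error rate Phi(-sqrt(delta^2) / 2) for equal priors."""
    return NormalDist().cdf(-0.5 * delta_sq ** 0.5)

no_separation = misclassification_probability(0.0)   # classes coincide
moderate = misclassification_probability(4.0)        # Phi(-1)
```

With $\delta^2 = 0$ the classes are indistinguishable and the error is 0.5; it decreases monotonically as $\delta^2$ grows.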
Separation strength
The $i$th block separation strength:

$$\delta_i^2 = \left\lVert \Sigma_i^{-1/2}\mu_i \right\rVert^2$$

Rescaled estimate of the $i$th separation strength:

$$S_i^2 = \eta\, \hat{\mu}_i'\hat{\Sigma}_i^{-1}\hat{\mu}_i,$$

where $\eta = \frac{n_1 n_2}{n}$, $\hat{\mu}_i = \hat{\mu}_{1i} - \hat{\mu}_{2i}$ is the shift vector of the sample class means and $\hat{\Sigma}_i$ is the maximum likelihood estimate of the covariance matrix of the $i$th block. $S^2 \sim \chi^2(p_0, \omega^2)$, where $p_0$ is the degrees of freedom and $\omega^2 = \eta\delta^2$ is the non-centrality parameter.
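The rescaled estimate can be computed per block as in the sketch below. Two assumptions are made here and are not stated on the slide: $n = n_1 + n_2$, with $n_1$ and $n_2$ the class sample sizes, and the inverse of $\hat{\Sigma}_i$ is supplied by the caller. The function name `separation_strength` is hypothetical.

```python
def separation_strength(mean1, mean2, precision, n1, n2):
    """Rescaled block separation strength S_i^2 = eta * mu' Sigma^{-1} mu."""
    eta = n1 * n2 / (n1 + n2)            # assumes n = n1 + n2
    shift = [a - b for a, b in zip(mean1, mean2)]
    # prec_shift = Sigma_i^{-1} mu_i, with mu_i the shift of the class means
    prec_shift = [sum(p * s for p, s in zip(row, shift)) for row in precision]
    return eta * sum(s * v for s, v in zip(shift, prec_shift))

# One block of size 2 with identity covariance (illustrative values).
s_sq = separation_strength(
    mean1=[1.0, 0.0],
    mean2=[0.0, 0.0],
    precision=[[1.0, 0.0], [0.0, 1.0]],
    n1=10,
    n2=10,
)
```

Blocks with identical class means give $S_i^2 = 0$, so larger values point to blocks that carry more class information.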
Misclassification
[Figure: LDA misclassification rate against the number of variables. Panels: data from West et al. (2001); data from Pawitan et al. (2005).]
Sparse and weak setting
[Figure: two histograms of frequency, each distinguishing non-informative from informative features. Panels: traditional mixture; sparse and weak mixture.]
[...] falsely selected blocks (fpr) and the misclassification rate (mc), averaged over 100 runs, presented as mean (m) and standard deviation (sd) for block size $p_0 = 20$.
Real data: Breast cancer data I

Block size | No. selected blocks     | Misclassification rate
           | bHC     Fdr     Lfdr    | bHC     Fdr     Lfdr    All
1          | 657     999     583     | 0.24    0.23    0.24    -
2          | 328     1461    804     | 0.23    0.23    0.22    0.28
5          | 131     1219    929     | 0.22    0.27    0.25    0.26
Thank You
Benjamini, Y. & Hochberg, Y. (1995), ‘Controlling the false discovery rate: A practical and powerful approach to multiple testing’, Journal of the Royal Statistical Society. Series B (Methodological) 57(1), 289–300. URL: http://www.jstor.org/stable/2346101
Cuthill, E. & McKee, J. (1969), Reducing the bandwidth of sparse symmetric matrices, in ‘Proceedings of the 1969 24th national conference’, ACM ’69, ACM, New York, NY, USA, pp. 157–172. URL: http://doi.acm.org/10.1145/800195.805928
Donoho, D. & Jin, J. (2004), ‘Higher criticism for detecting sparse heterogeneous mixtures’, Ann. Statist pp. 962–994.
Donoho, D. & Jin, J. (2008), ‘Higher criticism thresholding: Optimal feature selection when useful features are rare and weak’, Proceedings of the National Academy of Sciences 105(39), 14790–14795. URL: http://www.pnas.org/content/105/39/14790.abstract
Donoho, D. & Jin, J. (2009), ‘Feature selection by higher criticism thresholding achieves the optimal phase diagram’, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367(1906), 4449–4470. URL: http://rsta.royalsocietypublishing.org/content/367/1906/4449.abstract
Efron, B. (2004), Local false discovery rates, Technical report, Department of Statistics, Stanford University.
Efron, B., Storey, J. D. & Tibshirani, R. (2001), ‘Microarrays, empirical Bayes methods, and false discovery rates’, Genet. Epidemiol 23, 70–86.
Friedman, J., Hastie, T. & Tibshirani, R. (2009), Graphical lasso: estimation of Gaussian graphical models. Manual to the R-package glasso.
Ingster, Y. I. (1999), ‘Minimax detection of a signal for $l_n^p$ balls’, Mathematical Methods of Statistics 7, 401–428.
Klaus, B. & Strimmer, K. (2013), ‘Signal identification for rare and weak features: higher criticism or false discovery rates?’, Biostatistics 14, 129.
Meinshausen, N. & Rice, J. (2006), ‘Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses’, Annals of Statistics 34(1), 373–393.
Pawitan, Y., Bjohle, J., Amler, L., Borg, A., Egyhazi, S., Hall, P., Han, X., Holmberg, L., Huang, F., Klaar, S., Liu, E. T., Miller, L., Nordgren, H., Ploner, A., Sandelin, K., Shaw, P. M., Smeds, J., Skoog, L., Wedren, S. & Bergh, J. (2005), ‘Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts’, Breast Cancer Research 7(6), 953–964.
Rothman, A., Bickel, P., Levina, E. & Zhu, J. (2008), ‘Sparse permutation invariant covariance estimation’, Electronic Journal of Statistics 2, 494–515.
Storey, J. D. & Tibshirani, R. (2003), ‘Statistical significance for genomewide studies’, Proceedings of the National Academy of Sciences 100(16), 9440–9445. URL: http://www.pnas.org/content/100/16/9440.abstract
Sun, W. & Cai, T. T. (2007), ‘Oracle and adaptive compound decision rules for false discovery rate control’, Journal of the American Statistical Association 102(479), 901–912. URL: http://amstat.tandfonline.com/doi/abs/10.1198/016214507000000545
West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Olson, J. A., Marks, J. R. & Nevins, J. R. (2001), ‘Predicting the clinical status of human breast cancer by using gene expression profiles’, PNAS 98(20), 11462–11467.