Hypothesis Testing of Matrix Graph Model with Application to Brain Connectivity Analysis

Yin Xia
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, U.S.A. Email: [email protected]

Lexin Li
Division of Biostatistics, University of California at Berkeley, Berkeley, CA 94720, U.S.A. Email: [email protected]

arXiv:1511.00718v1 [stat.ME] 2 Nov 2015

Abstract

Brain connectivity analysis is now at the foreground of neuroscience research. A connectivity network is characterized by a graph, where nodes represent neural elements such as neurons and brain regions, and links represent statistical dependences that are often encoded in terms of partial correlations. Such a graph is inferred from matrix-valued neuroimaging data such as electroencephalography and functional magnetic resonance imaging. There have been a good number of successful proposals for sparse precision matrix estimation under normal or matrix normal distribution; however, this family of solutions does not offer a statistical significance
Given that T_{i,j}, 1 ≤ i < j ≤ p, are heteroscedastic and can possibly have a wide variability, we standardize T_{i,j} by its standard error, which leads to the standardized statistics

W_{i,j} = T_{i,j} / θ_{i,j}^{1/2},  1 ≤ i < j ≤ p.

In the next section, we test hypotheses (1) and (2) based on {W_{i,j}}_{i,j=1}^{p}.
Remark 1. Construction of the test statistics for the data-driven procedure is almost the same as that for the oracle procedure, except that the oracle procedure starts with the transformed sample Y_k = X_k Σ_T^{-1/2} in (3), whereas the data-driven one replaces it with Y_k = X_k Σ̂_T^{-1/2}. Furthermore, the regression coefficients vary slightly across time points in the data-driven scenario, and we shall replace (3) by Y_{k,i,l} = Y_{k,-i,l}^T β_{i,l} + ε_{k,i,l}, for 1 ≤ i ≤ p, 1 ≤ l ≤ q.
Remark 2. When Σ_T is unknown, E(Σ̂_T) = {trace(Σ_L)/p} Σ_T. If trace(Σ_L) = cp, with c ≠ 1, an unbiased estimator of Σ_T becomes Σ̂_T/c. Accordingly, we shall define the transformed data Y_k = √c X_k Σ̂_T^{-1/2}, for k = 1, . . . , n. Then we have the bias-corrected estimators r̃_{i,j} = c r̂_{i,j}, which in turn lead to T̃_{i,j} = T_{i,j}/c and θ̃_{i,j} = θ_{i,j}/c². Thus the standardized statistic W_{i,j} remains the same, as the constant c cancels. Therefore c does not affect our final test statistics, and for notational simplicity we set c = 1 from the beginning, without loss of generality.
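The cancellation of c can be verified term by term; writing T̃_{i,j} and θ̃_{i,j} for the bias-corrected quantities,

```latex
\widetilde W_{i,j}
= \frac{\widetilde T_{i,j}}{\widetilde\theta_{i,j}^{\,1/2}}
= \frac{T_{i,j}/c}{(\theta_{i,j}/c^{2})^{1/2}}
= \frac{T_{i,j}/c}{\theta_{i,j}^{1/2}/c}
= \frac{T_{i,j}}{\theta_{i,j}^{1/2}}
= W_{i,j}.
```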
2.3 Global testing procedure

We propose the following test statistic for testing the global null hypothesis H_0 : Ω_L is diagonal:

M_{nq} = max_{1≤i<j≤p} W_{i,j}².

Furthermore, we define the global test Ψ_α by

Ψ_α = I{M_{nq} ≥ q_α + 4 log p − log log p},

where q_α is the 1−α quantile of the type I extreme value distribution with cumulative distribution function exp{−(8π)^{-1/2} e^{-t/2}}, i.e.,

q_α = −log(8π) − 2 log log(1−α)^{-1}.

The hypothesis H_0 is rejected whenever Ψ_α = 1.
The above test is developed based on the asymptotic properties of M_{nq}, which will be studied in detail in Section 3.2. Intuitively, {W_{i,j}}_{i,j=1}^{p} are approximately standard normal variables under the null, and are only weakly dependent under suitable conditions. Thus M_{nq} is the maximum of the squares of p(p−1)/2 such random variables, and its value should be close to 2 log{p(p−1)/2} ≈ 4 log p under H_0. We will later show that, under certain regularity conditions, M_{nq} − 4 log p + log log p converges to a type I extreme value distribution under H_0.
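As a sketch of how Ψ_α could be computed from the standardized statistics, the following is a minimal implementation; the symmetric matrix W of statistics W_{i,j} is assumed to be already constructed as in Section 2.2, and the function name is ours:

```python
import numpy as np

def global_test(W, alpha=0.05):
    """Global test Psi_alpha of H0: Omega_L diagonal, based on
    M_nq = max_{i<j} W_{i,j}^2 and the Gumbel-type threshold."""
    p = W.shape[0]
    iu = np.triu_indices(p, k=1)               # index pairs with i < j
    M_nq = np.max(W[iu] ** 2)
    # q_alpha: the 1 - alpha quantile of the type I extreme value
    # distribution with cdf exp{-(8*pi)^{-1/2} e^{-t/2}}
    q_alpha = -np.log(8 * np.pi) - 2 * np.log(np.log(1 / (1 - alpha)))
    threshold = q_alpha + 4 * np.log(p) - np.log(np.log(p))
    return M_nq >= threshold, M_nq, threshold
```

The test rejects H_0 exactly when M_{nq} − 4 log p + log log p exceeds q_α, matching the definition of Ψ_α above.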
2.4 Multiple testing procedure

Next we develop a multiple testing procedure for H_{0,i,j} : ω_{L,i,j} = 0, so as to identify spatial locations that are conditionally dependent. The test statistic W_{i,j} defined in Section 2.2 is employed. Since there are (p² − p)/2 simultaneous hypotheses to test, it is important to control the false discovery rate. Let t be the threshold level such that H_{0,i,j} is rejected if |W_{i,j}| ≥ t, and let H_0 = {(i, j) : ω_{L,i,j} = 0, 1 ≤ i < j ≤ p} be the set of true nulls. Denote by R_0(t) = Σ_{(i,j)∈H_0} I(|W_{i,j}| ≥ t) the total number of false positives, and by R(t) = Σ_{1≤i<j≤p} I(|W_{i,j}| ≥ t) the total number of rejections. The false discovery proportion and false discovery rate are then defined as

FDP(t) = R_0(t) / {R(t) ∨ 1},  FDR(t) = E{FDP(t)}.

An ideal choice of t would reject as many true positives as possible while controlling the false discovery rate and false discovery proportion at the pre-specified level α. That is, we select

t_0 = inf{0 ≤ t ≤ 2(log p)^{1/2} : FDP(t) ≤ α}.
We shall estimate Σ_{(i,j)∈H_0} I(|W_{i,j}| ≥ t) by 2{1 − Φ(t)}|H_0|, where Φ(t) is the standard normal cumulative distribution function. Note that |H_0| can be estimated by (p² − p)/2 due to the sparsity of Ω_L. This leads to the following multiple testing procedure.

Step 1. Calculate the test statistics W_{i,j}.

Step 2. For given 0 ≤ α ≤ 1, calculate

t̂ = inf{ 0 ≤ t ≤ 2(log p)^{1/2} : 2{1 − Φ(t)}(p² − p)/2 / {R(t) ∨ 1} ≤ α }.

If t̂ does not exist, set t̂ = 2(log p)^{1/2}.

Step 3. For 1 ≤ i < j ≤ p, reject H_{0,i,j} if and only if |W_{i,j}| ≥ t̂.
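Steps 1-3 can be sketched as follows; a grid search over [0, 2(log p)^{1/2}] stands in for the exact infimum, the matrix W of statistics is assumed given, and the function name is ours:

```python
import numpy as np
from math import erf, sqrt

def multiple_test(W, alpha=0.1, grid_size=2001):
    """Steps 1-3: find the threshold t_hat and the rejected pairs (i, j)."""
    p = W.shape[0]
    iu = np.triu_indices(p, k=1)
    w = np.abs(W[iu])                              # |W_{i,j}| for i < j
    Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
    t_max = 2.0 * np.sqrt(np.log(p))
    t_hat = t_max                                  # Step 2 fallback value
    for t in np.linspace(0.0, t_max, grid_size):   # approximate the infimum
        R = max(int(np.sum(w >= t)), 1)            # R(t) v 1
        # estimated FDP: 2{1 - Phi(t)} (p^2 - p)/2 / {R(t) v 1}
        if 2.0 * (1.0 - Phi(t)) * (p * p - p) / 2.0 / R <= alpha:
            t_hat = t
            break
    keep = w >= t_hat                              # Step 3
    return t_hat, list(zip(iu[0][keep].tolist(), iu[1][keep].tolist()))
```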
3 Theory

In this section, we analyze the theoretical properties of the global and multiple testing procedures for both the oracle and data-driven scenarios. We show that the data-driven procedures perform asymptotically as well as the oracle procedures and enjoy certain optimality under the regularity conditions. To treat the oracle and data-driven procedures separately, we now distinguish the notations of the two, adding the superscript "o" to denote the statistics and tests of the oracle procedures, e.g., β̂_i^o, M_{nq}^o, Ψ_α^o, t̂^o, and the superscript "d" to denote those of the data-driven procedures, e.g., β̂_i^d, M_{nq}^d, Ψ_α^d, and t̂^d.
3.1 Regularity conditions

For the oracle procedure, we require the following set of regularity conditions.

(C1) Assume that max_{1≤i≤p} |β̂_i^o − β_i|_1 = o_p[{log max(p, q, n)}^{-1}], and max_{1≤i≤p} |β̂_i^o − β_i|_2 = o_p{(nq log p)^{-1/4}}.

(C2) Assume that log p = o{(nq)^{1/5}}, and there are constants c_0, c_1 > 0 such that c_0^{-1} ≤ λ_min(Ω_L) ≤ λ_max(Ω_L) ≤ c_0 and c_1^{-1} ≤ λ_min(Ω_T) ≤ λ_max(Ω_T) ≤ c_1.

(C3) Let D_L be the diagonal of Ω_L and let R_L = D_L^{-1/2} Ω_L D_L^{-1/2}, with elements η_{L,i,j}, 1 ≤ i, j ≤ p. Assume that max_{1≤i<j≤p} |η_{L,i,j}| ≤ η_L < 1, for some constant 0 < η_L < 1.
For the data-driven procedure, we replace the above condition (C1) with a slightly different one, (C1′), and introduce a new condition, (C4).

(C1′) Assume that max_{1≤i≤p,1≤l≤q} |β̂_{i,l}^d − β_{i,l}|_1 = o_p[{log max(p, q, n)}^{-1}], and max_{1≤i≤p,1≤l≤q} |β̂_{i,l}^d − β_{i,l}|_2 = o_p{(nq log p)^{-1/4}}.

(C4) Define s_p = max_{1≤l≤q} max_{1≤i≤p} Σ_{j=1}^{p} max{I(ω_{L,i,j} ≠ 0), I(ω_{l,i,j}^d ≠ 0)}, where (ω_{l,i,j}^d)_{p×p} = Ω_l^d = cov^{-1}{(X_k Σ̂_T^{-1/2})_{·,l}}. Assume that ‖Ω_T‖_{L1}² ‖Ω_L‖_{L1}² = o{min(r_{1,n,p,q}, r_{2,n,p,q})}, where r_{1,n,p,q} = [np / {s_p² q³ log q log³ max(p, q, n)}]^{1/2} and r_{2,n,p,q} = (np² / [s_p² q⁷ {log q log max(p, q, n)}² log p])^{1/4}.
A few remarks are in order. An estimator β̂_i^o satisfying (C1) can be easily obtained via standard estimation methods such as the Lasso and the Dantzig selector. For instance, if one uses the Lasso estimator, then (C1) is satisfied under (C2) and the sparsity condition max_{1≤i≤p} |β_i|_0 = o[(nq)^{1/2} / {log max(p, q, n)}^{3/2}]. Similarly, β̂_{i,l}^d satisfying (C1′) can be obtained by the Lasso if (C4) holds and the data-driven regression coefficients β_{i,l} satisfy a similar sparsity condition. Conditions (C2) and (C3) are regularity conditions commonly used in the high-dimensional hypothesis testing setting (Cai et al., 2013; Liu, 2013; Xia et al., 2015). (C4) is a mild technical condition. If Ω_T, Ω_L and Ω_l^d satisfy max_{1≤i≤q} Σ_{j=1}^{q} I(ω_{T,i,j} ≠ 0) ≤ s and s_p ≤ s, for some constant s > 0, then the conditions on the matrix 1-norms can be relaxed to conditions involving only n, p and q, namely, q³ log q log³ max(p, q, n) = o(np) and q⁷ {log q log max(p, q, n)}² log p = o(np²).
3.2 Oracle global testing procedure

We next analyze the limiting null distribution of the oracle global test statistic M_{nq}^o and the power of the corresponding test Ψ_α^o. We are particularly interested in the power of the test under the alternative when Ω_L is sparse, and show that the power is minimax rate optimal.

The following theorem states the asymptotic null distribution of M_{nq}^o, and indicates that, under H_0, M_{nq}^o − 4 log p + log log p converges weakly to a Gumbel random variable with distribution function exp{−(8π)^{-1/2} e^{-t/2}}.

Theorem 1. Assume (C1), (C2) and (C3). Then under H_0, for any t ∈ ℝ,

pr(M_{nq}^o − 4 log p + log log p ≤ t) → exp{−(8π)^{-1/2} e^{-t/2}}, as nq, p → ∞.

Under H_0, the above convergence is uniform for all {X_k}_{k=1}^{n} satisfying (C1)-(C3).
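As a quick numerical illustration of the heuristic behind Theorem 1 (using independent standard normals in place of the weakly dependent statistics W_{i,j}, so this checks only the intuition, not the theorem itself), one can compare the empirical distribution of the centered maximum against the Gumbel limit:

```python
import numpy as np

# Maximum of m = p(p-1)/2 squared independent standard normals,
# centered by 4 log p - log log p, compared with the limiting
# cdf exp{-(8*pi)^{-1/2} e^{-t/2}}.
rng = np.random.default_rng(0)
p = 200
m = p * (p - 1) // 2
reps = 500
stats = np.array([
    np.max(rng.standard_normal(m) ** 2) - 4 * np.log(p) + np.log(np.log(p))
    for _ in range(reps)
])

def gumbel_cdf(t):
    return np.exp(-np.exp(-t / 2) / np.sqrt(8 * np.pi))

for t in (-2.0, 0.0, 2.0):
    print(f"t={t:+.1f}  empirical={np.mean(stats <= t):.3f}  limit={gumbel_cdf(t):.3f}")
```

The empirical and limiting probabilities agree closely even at moderate p, consistent with the 2 log{p(p−1)/2} ≈ 4 log p intuition of Section 2.3.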
We next study the power of the corresponding test Ψ_α^o. We define the following class of precision matrices for spatial locations:

U(c) = { Ω_L : max_{1≤i<j≤p} |ω_{L,i,j}| / θ_{i,j}^{1/2} ≥ c (log p)^{1/2} }.  (5)

This class includes all precision matrices with at least one standardized off-diagonal entry whose magnitude exceeds c (log p)^{1/2}. By the definition in (4), θ_{i,j} is of the order 1/(nq), and thus we only require one of the off-diagonal entries to have size larger than C{log p/(nq)}^{1/2} for some constant C > 0, where C is fully determined by c_0 and c_1 in Condition (C2). If we choose the constant c = 4, that is, if there exists one standardized off-diagonal entry with magnitude greater than or equal to 4(log p)^{1/2}, the next theorem shows that the null parameter set, in which Ω_L is diagonal, is asymptotically distinguishable from U(4) by the test Ψ_α^o. That is, H_0 is rejected by the test Ψ_α^o with overwhelming probability if Ω_L ∈ U(4).
Theorem 2. Assume (C1) and (C2). Then

inf_{Ω_L∈U(4)} pr(Ψ_α^o = 1) → 1, as nq, p → ∞.

The next theorem further shows that this lower bound 4(log p)^{1/2} is rate-optimal. Let T_α be the set of all α-level tests, i.e., pr(T_α = 1) ≤ α under H_0 for all T_α ∈ T_α.

Theorem 3. Suppose that log p = o(nq). Let α, β > 0 and α + β < 1. Then there exists a constant c_2 > 0 such that for all sufficiently large nq and p,

inf_{Ω_L∈U(c_2)} sup_{T_α∈T_α} pr(T_α = 1) ≤ 1 − β.

As Theorem 3 indicates, if c_2 is sufficiently small, then no α-level test can correctly reject the null hypothesis uniformly over Ω_L ∈ U(c_2) with probability tending to one. Hence the order (log p)^{1/2} in the lower bound of max_{1≤i<j≤p} |ω_{L,i,j}| θ_{i,j}^{-1/2} in (5) cannot be further improved.
3.3 Oracle multiple testing procedure

We next investigate the properties of the oracle multiple testing procedure. The following theorem shows that the oracle procedure controls the false discovery proportion and false discovery rate at the pre-specified level α asymptotically.

Theorem 4. Assume (C1) and (C2), and let

S_ρ = { (i, j) : 1 ≤ i < j ≤ p, |ω_{L,i,j}| / θ_{i,j}^{1/2} ≥ (log p)^{1/2+ρ} }.

Suppose that, for some ρ, δ > 0, |S_ρ| ≥ [1/{(8π)^{1/2} α} + δ] (log log p)^{1/2}. Suppose l_0 = |H_0| ≥ c_0 p² for some c_0 > 0, and p ≤ c(nq)^r for some c, r > 0. Letting l = (p² − p)/2, then

lim_{(nq,p)→∞} FDR(t̂^o) / (α l_0/l) = 1, and FDP(t̂^o) / (α l_0/l) → 1

in probability, as (nq, p) → ∞.

We comment that the condition |S_ρ| ≥ [1/{(8π)^{1/2} α} + δ] (log log p)^{1/2} in Theorem 4 is mild, because we have (p² − p)/2 hypotheses in total and this condition only requires a few entries of Ω_L to have magnitude of order (log p)^{1/2+ρ}/(nq)^{1/2}, for some constant ρ > 0.
3.4 Data-driven procedures

We next turn to the data-driven procedures for both the global testing and the multiple testing. We show that they perform as well as the oracle testing procedures asymptotically.

Theorem 5. Assume (C1′) and (C2)-(C4).

(i) Under H_0, for any t ∈ ℝ,

pr(M_{nq}^d − 4 log p + log log p ≤ t) → exp{−(8π)^{-1/2} e^{-t/2}}, as nq, p → ∞.

Under H_0, the above convergence is uniform for all {X_k}_{k=1}^{n} satisfying (C1′) and (C2)-(C4).

(ii) Furthermore, inf_{Ω_L∈U(4)} pr(Ψ_α^d = 1) → 1, as nq, p → ∞.

This theorem shows that M_{nq}^d has the same limiting null distribution as the oracle test statistic M_{nq}^o, and that the corresponding test Ψ_α^d performs as well as the oracle test and is thus minimax rate optimal. The same observation applies to Theorem 6 below, which shows that the data-driven multiple testing procedure also performs as well as the oracle one, controlling the false discovery proportion and false discovery rate at the pre-specified level α asymptotically.

Theorem 6. Assume (C1′) and (C4). Then, under the same conditions as in Theorem 4,

lim_{(nq,p)→∞} FDR(t̂^d) / (α l_0/l) = 1, and FDP(t̂^d) / (α l_0/l) → 1

in probability, as (nq, p) → ∞.
4 Simulations

We study in this section the finite-sample performance of the proposed testing procedures. For the global testing of (1), we measure the size and power of the oracle test Ψ_α^o and the data-driven version Ψ_α^d; for the multiple testing of (2), we measure the empirical FDR and power. We compare the oracle and data-driven testing procedures, as well as a simple alternative developed by Xia et al. (2015) under a normal rather than matrix normal distribution, which ignores the separable spatial-temporal structure. The temporal covariance matrix Σ_T is constructed with elements σ_{T,i,j} = 0.4^{|i−j|}, 1 ≤ i, j ≤ q. The sample size and the number of time points are set at n = 20, q = 20 and n = 50, q = 30, respectively, whereas the spatial dimension p varies among 50, 200, 400 and 800. We have chosen this setting since our primary interest is in inferring spatial connectivity networks with different spatial dimensions. We keep the temporal dimension small, since it is a nuisance in our setup, and choose a relatively small sample size to reflect the fact that the sample size is usually limited in many neuroimaging studies.
For each generated dataset below, we use the Lasso to estimate β_i as

β̂_i = D_i^{-1/2} argmin_u [ {1/(2nq)} | (Y_{·,−i} − Ȳ_{·,−i}) D_i^{-1/2} u − (Y_{·,i} − Ȳ_{·,i}) |_2² + λ_{n,i} |u|_1 ],  (6)

where Y is the nq × p data matrix formed by stacking the transformed samples Y_{k,·,l}, k = 1, . . . , n, l = 1, . . . , q, with Y_k = X_k Σ_T^{-1/2} for the oracle procedure and Y_k = X_k Σ̂_T^{-1/2} for the data-driven procedure; D_i = diag(Σ̂_{L,−i,−i}), where Σ̂_L is the sample covariance matrix of Σ_L based on the nq transformed samples; and λ_{n,i} = κ {Σ̂_{L,i,i} log p/(nq)}^{1/2}.
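A minimal sketch of the nodewise regression in (6), using a plain coordinate-descent Lasso solver in place of a packaged one; the function and variable names are our choices, and the input Y is the nq × p matrix of transformed samples:

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding operator used in the Lasso update."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def lasso_cd(A, b, lam, n_iter=200):
    """Coordinate descent for (1/(2m)) |A u - b|_2^2 + lam |u|_1."""
    m, d = A.shape
    u = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0) / m
    r = b.copy()                            # residual b - A u, with u = 0
    for _ in range(n_iter):
        for j in range(d):
            r += A[:, j] * u[j]             # drop coordinate j from the fit
            rho = A[:, j] @ r / m
            u[j] = soft_threshold(rho, lam) / col_sq[j] if col_sq[j] > 0 else 0.0
            r -= A[:, j] * u[j]
    return u

def nodewise_lasso(Y, i, kappa=2.0):
    """Estimate beta_i as in (6) from the nq x p matrix Y."""
    nq, p = Y.shape
    keep = np.arange(p) != i
    S_L = np.cov(Y, rowvar=False)                 # sample covariance of the columns
    d_half_inv = 1.0 / np.sqrt(np.diag(S_L)[keep])
    Yc = Y - Y.mean(axis=0)                       # column-centered data
    A = Yc[:, keep] * d_half_inv                  # (Y_{.,-i} - Ybar) D_i^{-1/2}
    b = Yc[:, i]
    lam = kappa * np.sqrt(S_L[i, i] * np.log(p) / nq)
    return d_half_inv * lasso_cd(A, b, lam)       # beta_hat_i = D_i^{-1/2} u_hat
```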
4.1 Global testing simulation

For the global testing, the data X_1, . . . , X_n are generated from a matrix normal distribution with mean zero and precision matrix I ⊗ Ω_T under the null. To evaluate the power, let U be a matrix with eight random nonzero entries. The locations of four nonzero entries are selected randomly from the upper triangle of U, each with a magnitude generated randomly and uniformly from the set [−4{log p/(nq)}^{1/2}, −2{log p/(nq)}^{1/2}] ∪ [2{log p/(nq)}^{1/2}, 4{log p/(nq)}^{1/2}]. The other four nonzero entries in the lower triangle are determined by symmetry. We set Ω_L = (I + U + δI)/(1 + δ), with δ = |λ_min(I + U)| + 0.05, and choose the tuning parameter κ = 2 in (6).
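The construction of the alternative Ω_L can be sketched as follows; the function name and the random number generator are our choices:

```python
import numpy as np

def make_alternative_omega_L(p, n, q, rng):
    """Omega_L = (I + U + delta*I)/(1 + delta), where U has four random
    nonzero upper-triangular entries (mirrored by symmetry), each with
    magnitude uniform on [2{log p/(nq)}^{1/2}, 4{log p/(nq)}^{1/2}]
    and a random sign."""
    U = np.zeros((p, p))
    iu = np.triu_indices(p, k=1)
    base = np.sqrt(np.log(p) / (n * q))
    for k in rng.choice(len(iu[0]), size=4, replace=False):
        i, j = iu[0][k], iu[1][k]
        U[i, j] = U[j, i] = rng.uniform(2 * base, 4 * base) * rng.choice([-1.0, 1.0])
    # delta = |lambda_min(I + U)| + 0.05 guarantees positive definiteness
    delta = abs(np.linalg.eigvalsh(np.eye(p) + U).min()) + 0.05
    return (np.eye(p) + U + delta * np.eye(p)) / (1 + delta)
```

By construction the result is symmetric and positive definite, with unit diagonal and exactly eight nonzero off-diagonal entries.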
The size and power of the global testing, in percentage, are reported in Table 1, based on 1000 data replications and the significance level α_1 = 0.05. We see from Table 1 that the empirical sizes of the proposed oracle and data-driven procedures are well controlled under the significance level α_1 = 0.05. By contrast, the vector normal based procedure, which ignores the spatial-temporal dependence structure, shows a serious size distortion across all settings. The empirical sizes of the new procedures are slightly below the nominal level for high dimensions, due to the correlation among the variables; a similar phenomenon has been observed and justified in Cai et al. (2013, Proposition 1).

We also see from the table that the new procedures are powerful in all settings, even though the two spatial precision matrices differ only in eight entries, with the magnitude of the differences of the order {log p/(nq)}^{1/2}. For both the empirical sizes and powers, the data-driven procedure performs similarly to the oracle procedure.
4.2 Multiple testing simulation

For the multiple testing, the data X_1, . . . , X_n are generated from a matrix normal distribution with mean zero and precision matrix Ω_L ⊗ Ω_T. Three choices of Ω_L are considered:

Model 1: Ω_L^{(1)} = (ω_{L,i,j}^{(1)}), where ω_{L,i,i}^{(1)} = 1, ω_{L,i,i+1}^{(1)} = ω_{L,i+1,i}^{(1)} = 0.6, ω_{L,i,i+2}^{(1)} = ω_{L,i+2,i}^{(1)} = 0.3, and ω_{L,i,j}^{(1)} = 0 otherwise.

Model 2: Ω_L^{*(2)} = (ω_{L,i,j}^{*(2)}), where ω_{L,i,j}^{*(2)} = ω_{L,j,i}^{*(2)} = 0.5 for i = 10(k − 1) + 1 and 10(k −