Submitted to the Annals of Statistics
A WELL CONDITIONED AND SPARSE ESTIMATE OF COVARIANCE AND INVERSE COVARIANCE MATRIX USING A JOINT PENALTY

By Ashwini Maurya

Michigan State University
We develop a method for estimating a well conditioned and sparse covariance matrix from a sample of vectors drawn from a sub-gaussian distribution in a high dimensional setting. The proposed estimator is obtained by minimizing the squared loss function plus a joint penalty consisting of the $\ell_1$ norm and the sum of squared deviations of the eigenvalues from a positive constant. The joint penalty plays two important roles: i) the $\ell_1$ penalty on each entry of the covariance matrix reduces the effective number of parameters and consequently the estimate is sparse, and ii) the sum of squared deviations penalty on the eigenvalues controls the over-dispersion in the eigenvalues of the sample covariance matrix. In contrast to some of the existing methods of covariance matrix estimation, where often the interest is to estimate a sparse matrix, the proposed method is flexible in estimating both a sparse and well-conditioned covariance matrix simultaneously. We also extend the method to inverse covariance matrix estimation and establish the consistency of the proposed estimators in both Frobenius and operator norm. The proposed algorithm for covariance and inverse covariance matrix estimation is very fast, efficient and easily scalable to large scale data analysis problems. The simulation studies for varying sample size and number of variables show that the proposed estimator performs better than the graphical lasso and PDSCE estimates for various choices of structured covariance and inverse covariance matrices. We also use our proposed estimator for tumor tissue classification using gene expression data and compare its performance with some other classification methods.
AMS 2000 subject classifications: Primary 62G20, 62G05; secondary 62H12.
Key words and phrases: Sparsity, Eigenvalue Penalty, Matrix Estimation, Penalized Estimation.
Corresponding author: Maurya.

1. Introduction. With the recent surge in data technology and storage capacity, today's statisticians often encounter data sets where the sample size n is small and the number of variables p is very large: often hundreds, thousands, and even a million or more. Examples include gene expression data and web search problems [Clarke et al. (2008), Pass et al. (2006)]. For many of these high dimensional data problems, the choice of classical statistical methods becomes inappropriate for making valid inference. The recent developments
in asymptotic theory deal with increasing p as long as both p and n tend to infinity at some rate depending upon the parameter of interest.

The estimation of covariance and inverse covariance matrices is a problem of primary interest in multivariate statistical analysis. Some of the applications include: (i) Principal component analysis (PCA) [Johnstone et al. (2004), Zou et al. (2006)], where the goal is to project the data onto the "best" k-dimensional subspace, where best means that the projected data explains as much of the variation in the original data as possible without increasing k. (ii) Discriminant analysis [Mardia et al. (1975)], where the goal is to classify observations into different classes; an estimate of the covariance and inverse covariance matrix plays an important role, as the classifier is often a function of these entities. (iii) Regression analysis: if interest focuses on the estimation of regression coefficients with correlated (or longitudinal) data, a sandwich estimator of the covariance matrix may be used to provide standard errors for the estimated coefficients that are robust in the sense that they remain consistent under mis-specification of the covariance structure. (iv) Gaussian graphical modeling [Meinshausen and Buhlmann (2006), Wainwright et al. (2006), Yuan et al. (2007)]: the relationship structure among nodes can be inferred from the inverse covariance matrix; a zero entry in the inverse covariance matrix implies conditional independence between the corresponding nodes.
The estimation of a large dimensional covariance matrix based on few sample observations is a difficult problem, especially when $n \asymp p$ (here $a_n \asymp b_n$ means that there exist positive constants $c$ and $C$ such that $c \le a_n/b_n \le C$). In these situations, the sample covariance matrix becomes unstable, which explodes the estimation error. It is well known that the eigenvalues of the sample covariance matrix are over-dispersed, which means that the eigen-spectrum of the sample covariance matrix is not a good estimator of its population counterpart [Marcenko (1967), Karoui (2008)]. To illustrate this point, consider $\Sigma_p = I_p$, so that all the eigenvalues are 1. A result from [Geman S. (1980)] shows that if the entries of the $X_i$'s are i.i.d. and have a finite fourth moment and if $p/n \to c_0 > 0$, then the largest sample eigenvalue $l_1$ satisfies:
$$l_1 \to (1 + \sqrt{c_0})^2, \quad a.s.$$
This suggests that $l_1$ is not a consistent estimator of the largest eigenvalue $\lambda_1$ of the population covariance matrix. In particular, if $n = p$ then $l_1$ tends to 4 whereas $\lambda_1$ is 1. This is also evident in the eigenvalue plot in Figure 2.1. The distribution of $l_1$ also depends upon the underlying structure of the true covariance matrix. From Figure 2.1, it is evident that the smaller sample eigenvalues tend to underestimate the true eigenvalues for large p and
small n. For more discussion see [Karoui (2008)]. To correct this bias, a natural choice would be to shrink the sample eigenvalues towards some suitable constant to reduce the over-dispersion. For instance, Stein (1975) proposed an estimator of the form $\tilde{\Sigma} = \tilde{U}\Lambda(\tilde{\lambda})\tilde{U}^T$, where $\Lambda(\tilde{\lambda})$ is a diagonal matrix whose diagonal entries are transformed functions of the sample eigenvalues and $\tilde{U}$ is the matrix of sample eigenvectors. In another interesting paper, Ledoit and Wolf (2004) proposed an estimator that shrinks the sample covariance matrix towards the identity matrix. In another paper, Karoui (2008) proposed a non-parametric estimator of the spectrum of eigenvalues and showed that his estimator is consistent in the sense of weak convergence of distributions.

The covariance matrix estimates based on eigen-spectrum shrinkage are well conditioned in the sense that their eigenvalues are well bounded away from zero. These estimates are based on the shrinkage of the eigenvalues and are therefore invariant under an orthogonal group: the shrinkage estimators shrink the eigenvalues but the eigenvectors remain unchanged. In other words, the basis (eigenvectors) in which the data are given is not taken advantage of, and therefore the methods rely on the premise that one will be able to find a good estimate in any basis. In particular, it is reasonable to believe that the basis generating the data is somewhat nice. Often this translates into the assumption that the covariance matrix has a particular structure that one should be able to take advantage of. In these situations, it becomes natural to perform some form of regularization directly on the entries of the sample covariance matrix.

Much of the recent literature focuses on two broad classes of regularized covariance matrix estimation: i) one class relies on a natural ordering among variables, where one often assumes that variables far apart are weakly correlated, and ii) the other class makes no assumption on the natural ordering among variables. The first class includes the estimators based on banding and tapering [Bickel and Levina (2008), Cai et al. (2010)]. These estimators are appropriate for a number of applications with ordered data (time series, spectroscopy, climate data). However, for many applications, including gene expression data, a priori knowledge of any canonical ordering is not available, and searching over all permutations of possible orderings would not be feasible. In these situations, an $\ell_1$ penalized estimator becomes more appropriate, since it yields a permutation-invariant estimate.

To obtain a suitable estimate which is both well conditioned and sparse, we introduce two regularization terms: i) an $\ell_1$ penalty on each of the off-diagonal elements of the matrix and ii) a squared deviation penalty on the eigenvalues from a suitable constant. The $\ell_1$ minimization problems are well studied in the covariance and inverse covariance matrix estimation literature [Friedman
et al. (2007), Banerjee et al. (2008), Bickel and Levina (2008), Ravikumar et al. (2011), Bien and Tibshirani (2011), Maurya (2014), etc.]. Meinshausen and Buhlmann (2006) studied the problem of variable selection using high dimensional regression with the lasso and showed that it is a consistent selection scheme for high dimensional graphs. Rothman et al. (2008) proposed an $\ell_1$ penalized log-likelihood estimator and showed that their estimator is consistent in Frobenius and operator norm at the rate of $O_P\big(\sqrt{\{(p+s)\log p\}/n}\big)$, as both p and n approach infinity. Here s is the number of non-zero off-diagonal elements in the true covariance matrix. Bien and Tibshirani (2011) propose an estimator of the covariance matrix as a penalized maximum likelihood estimator with a weighted lasso type penalty. In these optimization problems, the $\ell_1$ penalty results in a sparse and permutation-invariant estimate, as compared to other $\ell_q$, $q \neq 1$, penalties. Another advantage is that the $\ell_1$ norm is a convex function, which makes it suitable for large scale optimization problems, and a number of fast algorithms exist for covariance and inverse covariance matrix estimation [Friedman et al. (2007), Rothman (2012)]. The squared deviation penalty on the eigenvalues from a suitable constant overcomes the over-dispersion in the sample covariance matrix so that the estimator remains well conditioned.

Ledoit and Wolf (2004) proposed an estimator of the covariance matrix as a linear combination of the sample covariance and the identity matrix. Their estimator of the covariance matrix is well conditioned but it is not sparse. Rothman (2012) proposed an estimator of the covariance matrix based on a squared error loss and an $\ell_1$ penalty with a log-barrier on the determinant of the covariance matrix. The log-determinant barrier is a valid technique to achieve positive definiteness, but it is still unclear whether the iterative procedure proposed in that paper [Rothman (2012)] actually finds the right solution to the corresponding optimization problem. In another interesting paper, Xue et al. (2012) propose an estimator of the covariance matrix as a minimizer of a penalized squared loss function over the set of positive definite cones. In that paper, the authors solve a positive definite constrained optimization problem and establish the consistency of the estimator. The resulting estimator is sparse and positive definite, but whether it overcomes the over-dispersion of the eigen-spectrum of the sample covariance matrix is hard to justify. Maurya (2014) proposed a joint convex penalty as a function of the $\ell_1$ norm and the trace norm (defined as the sum of the singular values of a matrix) for inverse covariance matrix estimation based on a penalized likelihood approach.

In this paper, we derive an explicit rate of convergence of the proposed estimator (2.4) in Frobenius norm and operator norm. This rate depends upon the level of sparsity of the true covariance matrix. In addition, for a slight
modification of the method (Theorem 3.3), we prove the consistency of our estimate in operator norm and show that its rate is similar to that of the banded estimator of Bickel and Levina (2008). One of the major advantages of the proposed estimator is that the derived algorithm is very fast, efficient and easily scalable to large scale data analysis problems.

The rest of the paper is organized as follows. The next section highlights some background and the problem set-up for covariance and inverse covariance matrix estimation. In Section 3, we give the proposed estimator and establish its theoretical consistency. In Section 4, we give an algorithm and compare its computational time with some other existing algorithms. Section 5 highlights the performance of the proposed estimator on simulated data, while an application of the proposed estimator to real life colon tumor data is given in Section 6.

Notation: For a matrix M, let $\|M\|_1$ denote its $\ell_1$ norm, defined as the sum of the absolute values of the entries of M; $\|M\|_F$ denote the Frobenius norm of M, defined as the square root of the sum of the squared elements of M; $\|M\|$ denote the operator norm (also called the spectral norm), defined as the largest absolute eigenvalue of M; $M^-$ denote the matrix M with all diagonal elements set to zero; $M^+$ denote the matrix M with all off-diagonal elements set to zero; $\lambda_i(M)$ denote the $i$th largest eigenvalue of M; $\text{tr}(M)$ denote its trace; and $\det(M)$ denote its determinant.
2. Background and Problem Set-up. Let $X = (X_1, X_2, \ldots, X_p)$ be a zero-mean p-dimensional random vector. The focus of this paper is the estimation of the covariance matrix $\Sigma := E(XX^T)$ and its inverse $\Sigma^{-1}$ from a sample of independently and identically distributed data $\{X^{(k)}\}_{k=1}^n$. In this section we describe the background and the problem setup more precisely.

The choice of loss function is crucial in any optimization problem. An optimal estimator for a particular loss function may not be optimal for another choice of loss function. The recent literature on covariance and inverse covariance matrix estimation mostly focuses on estimation based on the likelihood function or a quadratic loss function [Friedman et al. (2007), Banerjee et al. (2008), Bickel and Levina (2008), Ravikumar et al. (2011), Rothman (2012), Maurya (2014), etc.]. Maximum likelihood estimation requires a tractable probability distribution for the observations, whereas the quadratic loss function has no such requirement and is therefore fully non-parametric. The quadratic loss function is convex and, due to this analytical tractability, it is a widely applicable choice for many data analysis problems.
-
6 ASHWINI MAURYA
2.1. Proposed Estimator. Let S be the sample covariance matrix. Consider the following optimization problem:

(2.1)  $\hat{\Sigma}_{\lambda,\gamma} = \arg\min_{\Sigma=\Sigma^T} \Big[\|\Sigma - S\|_F^2 + \lambda\|\Sigma^-\|_1 + \gamma\sum_{i=1}^p a_i\{\lambda_i(\Sigma) - t\}^2\Big],$

where $\lambda_i(\Sigma)$ is the $i$th largest eigenvalue of the matrix $\Sigma$, and $\lambda$ and $\gamma$ are some positive constants. Note that through the penalty function $\|\Sigma^-\|_1$, we only penalize the off-diagonal elements of $\Sigma$. The $t \in \mathbb{R}^+$ is a suitably chosen constant; a choice of t can be the mean or median of the sample eigenvalues. The weights $a_i$ are shrinkage weights associated with the $i$th eigenvalue $\lambda_i$. For $a_i = 1$ for all $i = 1, 2, \ldots, p$, the optimization problem (2.1) shrinks all the eigenvalues by the same weight towards the same constant t (the mean of the eigenvalues), and consequently (due to the squared deviation penalty on the eigenvalues) this yields maximum shrinkage in the eigen-spectrum. The squared deviation penalty for eigenvalue shrinkage is chosen for the following reasons: i) it is easy to interpret, and ii) this choice of penalty function yields a very fast optimization algorithm. From here onwards we suppress the dependence of $\hat{\Sigma}_{\lambda,\gamma}$ on $\lambda, \gamma$ and denote $\hat{\Sigma}_{\lambda,\gamma}$ by $\hat{\Sigma}$.

For $\gamma = 0$, (2.1) is the standard lasso-penalized quadratic loss problem and its solution is (see §4 for the derivation of this estimator):

(2.2)  $\hat{\sigma}_{ii} = s_{ii}, \qquad \hat{\sigma}_{ij} = \text{sign}(s_{ij})\max\Big(|s_{ij}| - \frac{\lambda}{2},\ 0\Big), \quad i \neq j,$

where sign(x) is the sign of x and |x| is the absolute value of x. It is clear from this expression that a sufficiently large value of $\lambda$ will result in a sparse covariance matrix estimate. But it is hard to assess whether $\hat{\Sigma}$ of (2.2) overcomes the over-dispersion in the sample eigenvalues. The eigenvalue plot in Figure 2.1 illustrates this phenomenon for a neighborhood type covariance matrix (see §5 for a description of the neighborhood type of matrix). We simulated random vectors from the multivariate normal distribution with $n = 50$, $p = 50$.
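The closed form (2.2) amounts to soft-thresholding the off-diagonal entries of S. The following minimal numpy sketch illustrates it; the function name and interface are ours and are only meant as an illustration of the formula above.

```python
import numpy as np

def soft_threshold_covariance(S, lam):
    """Lasso-only solution (2.2) (gamma = 0): soft-threshold the
    off-diagonal entries of S by lam/2; the diagonal is not penalized."""
    Sigma_hat = np.sign(S) * np.maximum(np.abs(S) - lam / 2.0, 0.0)
    np.fill_diagonal(Sigma_hat, np.diag(S))
    return Sigma_hat
```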
Fig 2.1. Comparison of eigenvalues of the sample and JPEN estimates of the covariance matrix.
As is evident from Figure 2.1, the eigenvalues of the sample covariance matrix are over-dispersed: most of them are either too large or close to zero. The eigenvalues of the Joint Penalty (JPEN) estimate (2.4) of the covariance matrix are consistent for the eigenvalues of the true covariance matrix; see §5 for a detailed discussion. Another drawback of the estimator (2.2) is that the estimate can be negative definite [for details see Xue et al. (2012)].

As argued earlier, to overcome the over-dispersion in the sample covariance matrix, we include the eigenvalues squared deviation penalty. To illustrate its advantage, consider $\lambda = 0$. After some algebra, the minimizer $\hat{\Sigma}$ of (2.1) (for $\lambda = 0$) is given by:

(2.3)  $\hat{\Sigma} = \frac{1}{2}\big(\hat{\Sigma}_1 + \hat{\Sigma}_1^T\big) \quad \text{where} \quad \hat{\Sigma}_1 = (S + \gamma t\,UAU^T)(I + \gamma UAU^T)^{-1},$

where $A = \text{diag}(A_{11}, A_{22}, \ldots, A_{pp})$ with $A_{ii} = a_i$, and U is a matrix of eigenvectors (refer to §4 for details on the choice of U). Note that $\hat{\Sigma}_1$ in (2.3) may not be symmetric but $\hat{\Sigma}$ is. To see whether the estimate above is positive
definite, since $\lambda_{\min}(\hat{\Sigma}_1) = \lambda_{\min}(\hat{\Sigma}_1^T)$, after some algebra we have:

$\lambda_{\min}(\hat{\Sigma}) = \lambda_{\min}\big(SU(I + \gamma A)^{-1}U^T + \gamma t\,UA(I + \gamma A)^{-1}U^T\big)$
$\ \ge \lambda_{\min}\big(SU(I + \gamma A)^{-1}U^T\big) + \gamma t\,\lambda_{\min}\big(UA(I + \gamma A)^{-1}U^T\big)$
$\ \ge \frac{\lambda_{\min}(S)}{1 + \gamma\max_{i\le p}A_{ii}} + \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}}$
$\ \ge \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} > 0$

for $\min_{i\le p}A_{ii} > 0$. This means that the eigenvalues squared deviation penalty improves S to a positive definite estimator $\hat{\Sigma}$ provided that $\gamma > 0$, $t > 0$, and $\min_{i\le p}A_{ii} > 0$. Note that the estimator (2.3) is well conditioned but need not be sparse. Sparsity can be achieved by imposing an $\ell_1$ penalty on each entry of the covariance matrix. Simulation experiments have shown that, in general, the minimizer of (2.1) is not positive definite for all values of $\lambda > 0$ and $\gamma > 0$. To achieve a well conditioned and sparse positive definite estimator, we optimize the objective function of (2.1) over a specific region of values of $(\lambda,\gamma)$ which depends upon S, t, and A. The proposed JPEN estimator of the covariance matrix is given by:

(2.4)  $\hat{\Sigma} = \arg\min_{\Sigma=\Sigma^T,\ (\lambda,\gamma)\in\hat{\mathcal{R}}^{S,t,A,1}} \Big[\|\Sigma - S\|_F^2 + \lambda\|\Sigma^-\|_1 + \gamma\sum_{i=1}^p a_i\{\lambda_i(\Sigma) - t\}^2\Big],$

where

$\hat{\mathcal{R}}^{S,t,A,1} = \bigcup_{\epsilon > 0}\Big\{(\lambda,\gamma) : (\lambda,\gamma,\epsilon) \in \mathcal{R}^{S,t,A,1}\Big\},$

and

$\mathcal{R}^{S,t,A,1} = \Big\{(\lambda,\gamma,\epsilon) : \lambda,\gamma > 0,\ \lambda \asymp \gamma \asymp \sqrt{\tfrac{\log p}{n}},\ \frac{\lambda_{\min}(S)}{1 + \gamma\max_{i\le p}A_{ii}} + \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} - \frac{\lambda}{2}\max_{i\le p}(1 + \gamma A_{ii})^{-1} \ge \epsilon\Big\}.$

The minimization in (2.4) over $\Sigma$ is for fixed $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S,t,A,1}$; $\epsilon$ is some positive constant. Note that such a choice of $(\lambda,\gamma)$ guarantees that the minimum eigenvalue of the estimate in (2.4) is at least $\epsilon > 0$. Theorem 3.1 establishes that the set $\hat{\mathcal{R}}^{S,t,A,1}$ is asymptotically nonempty.
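As a small illustration of the constraint defining $\mathcal{R}^{S,t,A,1}$ as reconstructed above, the following sketch evaluates the lower bound on the minimum eigenvalue for a candidate pair $(\lambda,\gamma)$. The function name and argument order are ours.

```python
import numpy as np

def in_feasible_region(S, a, lam, gam, t, eps):
    """Check whether (lam, gam) satisfies the eigenvalue lower bound of the
    region R^{S,t,A,1}: the first two terms come from the eigenvalue penalty,
    the last one from the l1 soft-thresholding."""
    lam_min_S = np.linalg.eigvalsh(S).min()
    bound = (lam_min_S / (1.0 + gam * a.max())
             + gam * t * np.min(a / (1.0 + gam * a))
             - (lam / 2.0) * np.max(1.0 / (1.0 + gam * a)))
    return bound >= eps
```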
2.2. Our Contribution. The main contributions are the following: i) The proposed estimator is both sparse and well conditioned simultaneously. This approach allows one to take advantage of a prior structure, if known,
on the eigenvalues of the true covariance matrix. ii) We establish theoretical consistency of the proposed estimators in both Frobenius and operator norm. iii) The proposed algorithm is very fast, efficient and easily scalable to large scale optimization problems.

We carried out simulations to compare the performance of the proposed estimators of the covariance and inverse covariance matrix to some other existing estimators for a number of structured covariance and inverse covariance matrices, for varying sample sizes and dimensions. See §5 for further details.
3. Analysis of JPEN Method. Definition: A random vector X is said to have a sub-gaussian distribution if for each $t \ge 0$ and every $y \in \mathbb{R}^p \setminus \{0\}$ with $\|y\|_2 = 1$, there exists $0 < \tau < \infty$ such that

(3.1)  $P\{|y^T X| > t\} \le e^{-t^2/2\tau}.$

Theorem 3.1. Let $X := (X_1, X_2, \ldots, X_p)$ be a mean zero sub-gaussian random vector as defined in (3.1). Let $S = (1/n)\mathbf{X}\mathbf{X}^T$ be the sample covariance matrix and $p/n \to c_0 < 1$ as $n = n(p) \to \infty$. Let $\hat{\mathcal{R}}^{S,t,A,1}$ be as defined in (2.4). For $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S,t,A,1}$ we have $\hat{\mathcal{R}}^{S,t,A,1} \triangle \mathcal{R}^{1} \to \phi$ in probability, where

$\mathcal{R}^{1} = \bigcup_{\epsilon > 0}\Big\{g(c_0) > \epsilon\Big\},$

$g(c_0) > 0$ is the limit in probability of the smallest eigenvalue of S, and $\phi$ is the empty set.

Next we give the theoretical results on the consistency of our proposed estimator (2.4) of the covariance matrix.
Next we give the theoretical results about the consistency of
our proposedestimator (2.4) of covariance matrix.
3.1. Covariance Matrix Estimation. We make the following
assumptionsabout the true covariance matrix 0.A0. The X := (X1; X2;
; Xp) be a mean zero vector with covariancematrix 0 such that each
Xi=
p0ii has subgaussian distribution with pa-
rameter as dened in (3.1).A1. With E = f(i; j) : 0ij 6= 0; i 6=
jg; the cardinality(E) s for somepositive integer s.A2. There
exists a nite positive real number k > 0 such that 1=k min(0)
max(0) k, where min(0) and max(0) are the mini-mum and maximum
eigenvalues of matrix 0 respectively.
Assumption A2 guarantees that the true covariance matrix $\Sigma_0$ is well conditioned (i.e. all the eigenvalues are finite and positive). Well conditioned means [Ledoit and Wolf (2004)] that inverting the matrix does not explode the estimation error. Assumption A1 is more of a definition: it says that the number of non-zero off-diagonal elements is bounded by some positive integer. Theorem 3.2 below gives the rate of convergence of the proposed covariance matrix estimator (2.4) in Frobenius norm.

Theorem 3.2. Let $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S,t,A,1}$ and let $\hat{\Sigma}$ be as defined in (2.4). Under Assumptions A0, A1, A2 and for $\lambda_{\min}(\Sigma_0) \le t \le \lambda_{\max}(\Sigma_0)$, we have:

(3.2)  $\|\hat{\Sigma} - \Sigma_0\|_F = O_P\Big(\sqrt{\tfrac{(p+s)\log p}{n}}\Big).$

Here the worst part of the rate of convergence comes from estimating the diagonal entries. For correlation matrix estimation, the rate can be improved to $O_P\big(\sqrt{s\log p/n}\big)$ (Corollary 3.2).
Let $\Sigma_0 = W\Gamma W$ be the variance-correlation decomposition of the true covariance matrix $\Sigma_0$, where $\Gamma$ is the true correlation matrix and W is a diagonal matrix of the true standard deviations. Let $\hat{K}$ be the solution to the following optimization problem:

(3.3)  $\hat{K} = \arg\min_{K=K^T,\ (\lambda,\gamma)\in\hat{\mathcal{R}}^{\hat{\Gamma},t,A,1a}} \Big\{\|K - \hat{\Gamma}\|_F^2 + \lambda\|K^-\|_1 + \gamma\sum_{i=1}^p a_i\{\lambda_i(K) - t\}^2\Big\},$

where $\hat{\mathcal{R}}^{\hat{\Gamma},t,A,1a}$ is given by:

(3.4)  $\hat{\mathcal{R}}^{\hat{\Gamma},t,A,1a} = \bigcup_{\epsilon > 0}\Big\{(\lambda,\gamma) : (\lambda,\gamma,\epsilon) \in \mathcal{R}^{\hat{\Gamma},t,A,1a}\Big\},$

and

$\mathcal{R}^{\hat{\Gamma},t,A,1a} = \Big\{(\lambda,\gamma,\epsilon) : \lambda,\gamma > 0,\ \lambda \asymp \gamma \asymp \sqrt{\tfrac{\log p}{n}},\ \frac{\lambda_{\min}(\hat{\Gamma})}{1 + \gamma\max_{i\le p}A_{ii}} + \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} - \frac{\lambda}{2}\max_{i\le p}(1 + \gamma A_{ii})^{-1} \ge \epsilon\Big\},$

and $\hat{\Gamma}$ is the sample counterpart of $\Gamma$. Similar to Theorem 3.1, the following corollary establishes that the symmetric difference between $\hat{\mathcal{R}}^{\hat{\Gamma},t,A,1a}$ and its asymptotic counterpart $\mathcal{R}^{1a}$ is empty as $n = n(p) \to \infty$.
Corollary 3.1. Let $X := (X_1, X_2, \ldots, X_p)$ be a mean zero random vector where each $\{X_i\}_{i=1,\ldots,p}$ has a sub-gaussian distribution as defined in (3.1). Let $\hat{\Gamma}$ be the sample correlation matrix. Let $p/n \to c_0 < 1$ as $n = n(p) \to \infty$ and let $\hat{\mathcal{R}}^{\hat{\Gamma},t,A,1a}$ be as defined in (3.4). We have $\hat{\mathcal{R}}^{\hat{\Gamma},t,A,1a} \triangle \mathcal{R}^{1a} \to \phi$ in probability, where

$\mathcal{R}^{1a} = \bigcup_{\epsilon > 0}\Big\{(1 - \sqrt{c_0})^2 > \epsilon\Big\}.$

We have the following rate of convergence for the correlation matrix estimate $\hat{K}$ of (3.3).

Corollary 3.2. Under Assumptions A0, A1, A2, for $\lambda_{\min}(\Gamma) \le t \le \lambda_{\max}(\Gamma)$ and for $(\lambda,\gamma) \in \hat{\mathcal{R}}^{\hat{\Gamma},t,A,1a}$,

(3.5)  $\|\hat{K} - \Gamma\|_F = O_P\Big(\sqrt{\tfrac{s\log p}{n}}\Big).$
The improved rate is due to the fact that for a correlation matrix, all the diagonal entries are one. Define $\hat{\Sigma}_c := \hat{W}\hat{K}\hat{W}$, where $\hat{W}$ is a diagonal matrix of the estimates of the true standard deviations based on the observations. The following theorem gives the rate of convergence of the correlation matrix based covariance matrix estimator in operator norm.

Theorem 3.3. Under Assumptions A0, A1, A2 and for $(\lambda,\gamma) \in \hat{\mathcal{R}}^{\hat{\Gamma},t,A,1a}$,

(3.6)  $\|\hat{\Sigma}_c - \Sigma_0\| = O_P\Big(\sqrt{\tfrac{(s+1)\log p}{n}}\Big).$
Note that $\|\hat{\Sigma}_c - \Sigma_0\|_F \le \sqrt{p}\,\|\hat{\Sigma}_c - \Sigma_0\|$. Therefore the rate of convergence in Frobenius norm of the correlation matrix based estimator of the covariance matrix is the same as for the one defined in (2.4).
Remark: This rate of operator norm convergence is the same as the one obtained in Bickel and Levina (2008) for banded covariance matrices. Although the method of proof is very different, the similar rate of convergence in operator norm is due to the similar kind of tail inequality for the sample covariance matrix of Gaussian and sub-gaussian random variables [Ravikumar et al. (2011)]. Rothman (2012) proposed an estimator of the covariance matrix based on a similar loss function, but the choice of a different penalty function yields a very different estimate. This is also exhibited in the simulation analysis of §5. Moreover, our proposed estimator is applicable for estimating any
non-negative covariance matrix, which is not the case for Rothman's (2012) estimator (since Rothman's estimator involves the logarithm of the determinant of the estimator as an additional penalty to keep all the eigenvalues of the estimated matrix away from zero).
3.2. Estimation of Inverse Covariance Matrix. Notation: We shall use $\Omega$ for the inverse covariance matrix.
Assumptions: We make the following assumptions about the true inverse covariance matrix $\Omega_0$. Let $\Omega_0 = \Sigma_0^{-1}$.
B0. The random vector $X := (X_1, X_2, \ldots, X_p)$ is a mean zero vector with covariance matrix $\Sigma_0$ such that each $X_i/\sqrt{\sigma_{0ii}}$ has a sub-gaussian distribution with parameter $\tau$ as in (3.1).
B1. With $H = \{(i,j) : \omega_{0ij} \neq 0,\ i \neq j\}$, the cardinality of H is at most s for some positive integer s.
B2. There exists $0 < \bar{k} < \infty$ large enough such that $(1/\bar{k}) \le \lambda_{\min}(\Omega_0) \le \lambda_{\max}(\Omega_0) \le \bar{k}$ and $\lambda_{\min}(S + \delta I) \ge 1/\bar{k}$ for $\delta \asymp \sqrt{\log p/n}$, where $S = (1/n)\mathbf{X}\mathbf{X}^T$.
Remark: In Assumption B2, we require the minimum eigenvalue of $S_* := S + \delta I$ to be bounded away from zero, equivalently the largest eigenvalue of $S_*^{-1} := (S + \delta I)^{-1}$ to be bounded above by some positive constant. Let $\lim_{n(p)\to\infty} p/n = c_0 < 1$; then by a result from Bai and Yin (1993), $\lim_{n(p)\to\infty}\lambda_{\min}(S) = g(c_0) > 0$. Consequently $\lambda_{\min}(S + \delta I) \ge 1/\bar{k}$ for large enough $\bar{k}$. This condition is required in establishing the rate of convergence of the estimator (3.7) (see Theorem 3.5).

Define the JPEN estimator of the inverse covariance matrix $\Omega_0$ as the solution to the following optimization problem:

(3.7)  $\hat{\Omega} = \arg\min_{\Omega=\Omega^T,\ (\lambda,\gamma)\in\hat{\mathcal{R}}^{S_*,t,A,2}} \Big[\|\Omega - S_*^{-1}\|_F^2 + \lambda\|\Omega^-\|_1 + \gamma\sum_{i=1}^p a_i\{\lambda_i(\Omega) - t\}^2\Big],$

where

(3.8)  $\hat{\mathcal{R}}^{S_*,t,A,2} = \bigcup_{\epsilon > 0}\Big\{(\lambda,\gamma) : (\lambda,\gamma,\epsilon) \in \mathcal{R}^{S_*,t,A,2}\Big\},$

with

$\mathcal{R}^{S_*,t,A,2} = \Big\{(\lambda,\gamma,\epsilon) : \lambda,\gamma > 0,\ \lambda \asymp \gamma \asymp \sqrt{\tfrac{\log p}{n}},\ \frac{\lambda_{\min}(S_*^{-1})}{1 + \gamma\max_{i\le p}A_{ii}} + \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} - \frac{\lambda}{2}\max_{i\le p}(1 + \gamma A_{ii})^{-1} \ge \epsilon\Big\},$

for $A = \text{diag}(A_{11}, A_{22}, \ldots, A_{pp})$ with $A_{ii} = a_i$, the weights in (3.7).
Remark: Note that $S_* = S + \delta I$ is a positive definite matrix and is therefore invertible.
Theorem 3.4. Let $X := (X_1, X_2, \ldots, X_p)$ be a mean zero vector where each $\{X_i\}_{i=1,\ldots,p}$ has a sub-gaussian distribution as defined in (3.1). Let $S = (1/n)\mathbf{X}\mathbf{X}^T$ and $S_* = S + \delta I$ for $\delta \asymp \sqrt{\log p/n}$. Let $p/n \to c_0 < 1$ as $n = n(p) \to \infty$ and let $\hat{\mathcal{R}}^{S_*,t,A,2}$ be as defined in (3.8). We have $\hat{\mathcal{R}}^{S_*,t,A,2} \triangle \mathcal{R}^{2} \to \phi$ in probability, where

$\mathcal{R}^{2} = \Big\{\epsilon : \epsilon > 0,\ g_1(c_0) > \epsilon\Big\},$

$g_1(c_0) = \lim_{n=n(p)\to\infty}\lambda_{\min}(S_*^{-1})$ and $\phi$ is the empty set.

The following theorem gives the consistency of the inverse covariance matrix estimator (3.7) in Frobenius norm.

Theorem 3.5. Let $\hat{\Omega}$ be the minimizer defined in (3.7). Under Assumptions B0, B1, B2, for $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S_*,t,A,2}$ and $\lambda_{\min}(\Omega_0) \le t \le \lambda_{\max}(\Omega_0)$, we have:

(3.9)  $\|\hat{\Omega} - \Omega_0\|_F = O_P\Big(\sqrt{\tfrac{(p+s)\log p}{n}}\Big).$
Note that the rate of convergence here is the same as for the covariance matrix estimation. Let $\hat{L}$ be the solution to the following optimization problem:

(3.10)  $\hat{L} = \arg\min_{L=L^T,\ (\lambda,\gamma)\in\hat{\mathcal{R}}^{\hat{\Gamma},t,A,2a}} \Big\{\|L - \hat{\Gamma}_*^{-1}\|_F^2 + \lambda\|L^-\|_1 + \gamma\sum_{i=1}^p a_i\{\lambda_i(L) - t\}^2\Big\},$

where $\hat{\Gamma}_*^{-1} = \hat{W}S_*^{-1}\hat{W}$ and

(3.11)  $\hat{\mathcal{R}}^{\hat{\Gamma},t,A,2a} = \bigcup_{\epsilon > 0}\Big\{(\lambda,\gamma) : (\lambda,\gamma,\epsilon) \in \mathcal{R}^{\hat{\Gamma},t,A,2a}\Big\},$

with

$\mathcal{R}^{\hat{\Gamma},t,A,2a} = \Big\{(\lambda,\gamma,\epsilon) : \lambda,\gamma > 0,\ \lambda \asymp \gamma \asymp \sqrt{\tfrac{\log p}{n}},\ \frac{\lambda_{\min}(\hat{\Gamma}_*^{-1})}{1 + \gamma\max_{i\le p}A_{ii}} + \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} - \frac{\lambda}{2}\max_{i\le p}(1 + \gamma A_{ii})^{-1} \ge \epsilon\Big\}.$

Corollary 3.3. Let $X := (X_1, X_2, \ldots, X_p)$ be a mean zero vector where each $\{X_i\}_{i=1,\ldots,p}$ has a sub-gaussian distribution as defined in (3.1). Let $p/n \to c_0 < 1$ as $n = n(p) \to \infty$ and let $\hat{\mathcal{R}}^{\hat{\Gamma},t,A,2a}$ be as defined in (3.11). For $(\lambda,\gamma) \in \hat{\mathcal{R}}^{\hat{\Gamma},t,A,2a}$, we have $\hat{\mathcal{R}}^{\hat{\Gamma},t,A,2a} \triangle \mathcal{R}^{2a} \to \phi$ in probability, where

$\mathcal{R}^{2a} = \bigcup_{\epsilon > 0}\Big\{g_2(c_0) \ge \epsilon\Big\},$
$g_2(c_0)$ is the limit of the smallest eigenvalue of $\hat{\Gamma}_*^{-1}$ and $\phi$ is the empty set.

We have the following rate of convergence for the inverse of the correlation matrix estimator given in (3.10).

Corollary 3.4. Let $\hat{L}$ be the minimizer of (3.10). Under Assumptions B0, B1, B2 and for $(\lambda,\gamma) \in \hat{\mathcal{R}}^{\hat{\Gamma},t,A,2a}$,

(3.12)  $\|\hat{L} - \Gamma^{-1}\|_F = O_P\Big(\sqrt{\tfrac{s\log p}{n}}\Big).$

This rate is the same as that of the correlation matrix estimator given in (3.3). Define $\hat{\Omega}_c := \hat{W}^{-1}\hat{L}\hat{W}^{-1}$. We have the following result on the operator norm consistency of the inverse correlation matrix based inverse covariance matrix estimator.

Theorem 3.6. Under Assumptions B0, B1, B2 and for $(\lambda,\gamma) \in \hat{\mathcal{R}}^{\hat{\Gamma},t,A,2a}$,

(3.13)  $\|\hat{\Omega}_c - \Omega_0\| = O_P\Big(\sqrt{\tfrac{(s+1)\log p}{n}}\Big).$

Since $\|\hat{\Omega}_c - \Omega_0\|_F \le \sqrt{p}\,\|\hat{\Omega}_c - \Omega_0\|$, the rate of convergence of the inverse covariance matrix estimator based on the inverse correlation matrix is the same as that of the covariance matrix estimator based on the correlation matrix.
4. An Algorithm.

4.1. Covariance Matrix Estimation. The optimization problem (2.4) can be written as:

(4.1)  $\hat{\Sigma} = \arg\min_{\Sigma=\Sigma^T,\ (\lambda,\gamma)\in\hat{\mathcal{R}}^{S,t,A,1}} f(\Sigma),$

where

$f(\Sigma) = \|\Sigma - S\|_F^2 + \lambda\|\Sigma^-\|_1 + \gamma\sum_{i=1}^p a_i\{\lambda_i(\Sigma) - t\}^2.$

A solution to (4.1) is given by:

(4.2)  $\hat{\sigma}_{ii} = M_{ii}, \qquad \hat{\sigma}_{ij} = \text{sign}(M_{ij})\max\Big(|M_{ij}| - \frac{\lambda}{2(1 + \gamma\max_{i\le p}A_{ii})},\ 0\Big), \quad i \neq j,$

where

$M = \frac{1}{2}\big(M_1 + M_1^T\big) \quad \text{with} \quad M_1 = (S + \gamma t\,UAU^T)(I + \gamma UAU^T)^{-1},$

$A = \text{diag}(A_{11}, A_{22}, \ldots, A_{pp})$ with $A_{ii} = a_i$, and $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S,t,A,1}$.

Choice of U: Note that U is the matrix of eigenvectors of $\Sigma$, which is unknown. One choice of U is the matrix of eigenvectors from the eigenvalue decomposition of $S + \delta I$ for some $\delta > 0$, i.e. let $S + \delta I = U_1D_1U_1^T$, and take $U = U_1$.

Choice of $\lambda$ and $\gamma$: For a given value of $\gamma$, we can find the value of $\lambda$ satisfying:

$\lambda < 2(1 + \gamma\min_{i\le p}A_{ii})\Big\{\frac{\lambda_{\min}(S)}{1 + \gamma\max_{i\le p}A_{ii}} + \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} - \epsilon\Big\},$

and such a choice of $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S,t,A,1}$ guarantees that the minimum eigenvalue of the estimate (4.2) is at least $\epsilon > 0$.
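The closed form (4.2) is straightforward to implement. Below is a minimal numpy sketch of one JPEN update for fixed $(\lambda,\gamma)$, assuming the weights $a_i$ are supplied as a vector; it illustrates the formulas above and is not the author's R implementation.

```python
import numpy as np

def jpen_covariance(S, a, lam, gam, t, delta=1e-3):
    """One-shot JPEN covariance estimate (4.2) for fixed (lam, gam):
    build M from the eigenvectors of S + delta*I, symmetrize, then
    soft-threshold the off-diagonal entries."""
    p = S.shape[0]
    A = np.diag(a)
    _, U = np.linalg.eigh(S + delta * np.eye(p))     # one admissible choice of U
    UAUt = U @ A @ U.T
    M1 = (S + gam * t * UAUt) @ np.linalg.inv(np.eye(p) + gam * UAUt)
    M = 0.5 * (M1 + M1.T)                            # symmetrize M1
    thr = lam / (2.0 * (1.0 + gam * a.max()))        # soft-threshold level
    Sigma_hat = np.sign(M) * np.maximum(np.abs(M) - thr, 0.0)
    np.fill_diagonal(Sigma_hat, np.diag(M))          # diagonal entries are kept
    return Sigma_hat
```

In practice $(\lambda,\gamma)$ would be searched over a grid restricted to the feasible region of §2.1, for example using the bound sketched after (2.4).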
4.2. Inverse Covariance Matrix Estimation. To get an expression for the inverse covariance matrix estimate, we replace S by $S_*^{-1}$ in (4.2). Let A be the weight matrix for the eigenvalues of the inverse covariance matrix in the optimization problem (3.7); then an optimal solution to the optimization problem (3.7) is given by:

(4.3)  $\hat{\omega}_{ii} = M^*_{ii}, \qquad \hat{\omega}_{ij} = \text{sign}(M^*_{ij})\max\Big(|M^*_{ij}| - \frac{\lambda}{2(1 + \gamma\max_{i\le p}A_{ii})},\ 0\Big), \quad i \neq j,$

where $M^* = \frac{1}{2}(M_2 + M_2^T)$, $M_2 = (S_*^{-1} + \gamma t\,U_1AU_1^T)(I + \gamma U_1AU_1^T)^{-1}$, and $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S_*,t,A,2}$. A choice of $U_1$ is the matrix of eigenvectors from the eigen-decomposition $S_*^{-1} = U_1D_1U_1^T$.
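A corresponding sketch for (4.3) simply feeds the inverse of the ridge-regularized sample covariance into the same closed form; it reuses the jpen_covariance sketch given above and is likewise only illustrative.

```python
import numpy as np

def jpen_precision(S, a, lam, gam, t, delta=1e-3):
    """JPEN inverse covariance estimate (4.3) for fixed (lam, gam):
    apply the covariance closed form with S replaced by (S + delta*I)^{-1}."""
    p = S.shape[0]
    S_star_inv = np.linalg.inv(S + delta * np.eye(p))
    return jpen_covariance(S_star_inv, a, lam, gam, t, delta)
```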
4.2.1. Computational Time. We compare the computational timing of our algorithm to some other existing algorithms: glasso [12] (Friedman et al. (2008)) and PDSCE [28] (Rothman (2011)). Note that the exact timing of these algorithms also depends upon the implementation, platform, etc. (we did our computations in R on an AMD 2.8GHz processor). For each estimate, the optimal tuning parameter was obtained by minimizing the empirical loss function

(4.4)  $\|\hat{\Sigma} - S_{robust}\|_F,$

where $\hat{\Sigma}$ is an estimate of the covariance matrix and $S_{robust}$ is the sample covariance matrix based on 20000 sample observations (refer to §5 for a detailed discussion). Figure 4.1 illustrates the total computational time taken to estimate the covariance matrix by the glasso, PDSCE and JPEN algorithms for different values of p for the Toeplitz type of covariance matrix on a log-log scale (see §5 for the Toeplitz type of covariance matrix). Although the proposed method requires optimization over a grid of values of $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S,t,A,1}$, our algorithm is very fast and easily scalable to large scale data analysis problems.
Fig 4.1. Timing comparison of JPEN, Graphical Lasso (Glasso), and PDSCE on a log-log scale.
5. Simulation Results. We compare the performance of the proposed method to other existing methods on simulated data for four types of structured covariance and inverse covariance matrices.

(i) Hub Graph: The rows/columns of $\Sigma_0$ are partitioned into J equally-sized disjoint groups: $\{V_1 \cup V_2 \cup \cdots \cup V_J\} = \{1, 2, \ldots, p\}$; each group is associated with a pivotal row k. Let $|V_1| = s$. We set $\sigma_{0i,j} = \sigma_{0j,i} = \rho$ for $i \in V_k$ and $\sigma_{0i,j} = \sigma_{0j,i} = 0$ otherwise. In our experiment, $J = [p/s]$, $k = 1, s+1, 2s+1, \ldots$, and we always take $\rho = 1/(s+1)$ with $J = 20$.
(ii) Neighborhood Graph: We first uniformly sample $(y_1, y_2, \ldots, y_n)$ from a unit square. We then set $\sigma_{0i,j} = \sigma_{0j,i} = \rho$ with probability $(\sqrt{2\pi})^{-1}\exp(-4\|y_i - y_j\|^2)$. The remaining entries of $\Sigma_0$ are set to zero. The number of nonzero off-diagonal elements of each row or column is restricted to be smaller than $[1/\rho]$, where $\rho$ is set to be 0.245.

(iii) Toeplitz Matrix: We set $\sigma_{0i,j} = 2$ for $i = j$; $\sigma_{0i,j} = |0.75|^{|i-j|}$ for $|i-j| = 1, 2$; and $\sigma_{0i,j} = 0$ otherwise (a construction sketch for this case is given after this list).

(iv) Block Diagonal Matrix: In this setting $\Sigma_0$ is a block diagonal matrix with varying block size. For $p = 500$ the number of blocks is 4 and for $p = 1000$ the number of blocks is 6. Each block of the covariance matrix is taken to be a Toeplitz type matrix as in case (iii).
We chose a similar structure of $\Omega_0$ for the inverse covariance matrix simulations. For all these choices of covariance and inverse covariance matrices, we generate random vectors from the multivariate normal distribution with varying n and p. We chose $n = 50, 100$ and $p = 500, 1000$. Here we report the results for $n = 50$ and $p = 500, 1000$; please refer to Section 8 for the detailed simulation analysis. We compare the performance of the proposed covariance matrix estimator to the graphical lasso, PDSC Estimate [Rothman (2011)] and Ledoit-Wolf estimate of the covariance matrix. The JPEN estimate (4.2) of the covariance matrix was computed using R software (version 3.0.2). The graphical lasso estimate of the covariance matrix was computed using the R package "glasso" (http://statweb.stanford.edu/~tibs/glasso/). The Ledoit-Wolf estimate was obtained using code from (http://www.econ.uzh.ch/faculty/wolf/publications.html#9). The PDSC estimate was obtained using the PDSCE package (http://cran.r-project.org/web/packages/PDSCE/index.html). For the inverse covariance matrix performance comparison we only include glasso and PDSCE. For each covariance and inverse covariance matrix estimate, we calculate the Average Relative Error (ARE) based on 50 iterations using the following formula:

$\text{ARE}(\Sigma, \hat{\Sigma}) = \big|\log(f(S, \hat{\Sigma})) - \log(f(S, \Sigma))\big| \,/\, \big|\log(f(S, \Sigma))\big|,$

where $f(S, \Sigma)$ is the density of the multivariate normal distribution with covariance $\Sigma$, S is the sample covariance matrix, $\Sigma$ is the true covariance and $\hat{\Sigma}$ is the estimate of $\Sigma$. Other choices of performance criteria are the Kullback-Leibler loss used by Yuan and Lin [2007] and Bickel and Levina [2008]. The optimal values of the tuning parameters $\lambda$ and $\gamma$ were obtained by minimizing the empirical loss function given in (4.4). Simulation shows that the optimal choices of the tuning parameters $\lambda$ and $\gamma$
are the same as if we replace $S_{robust}$ by the true covariance matrix $\Sigma$. The average relative errors and their standard deviations are given in Table 5.1; the numbers in brackets are the standard error estimates of the relative error. Table 5.1 gives average relative errors and standard errors of the covariance matrix estimates based on glasso, Ledoit-Wolf, PDSCE and JPEN for $n = 50$ and $p = 500, 1000$. The glasso estimate of the covariance matrix performs very poorly among all the methods. The Ledoit-Wolf estimate performs well but the estimate is generally not sparse; also, the eigenvalue estimates of the Ledoit-Wolf estimator are shrunk towards the center much more heavily than the true eigenvalues. The JPEN estimator outperforms the other estimators for most values of p for all four types of covariance matrices. The PDSCE estimates have low average relative error, close to JPEN. This could be due to the fact that PDSCE and JPEN use a quadratic loss function with a different penalty function. Table 5.2 reports the average relative errors and their standard deviations for inverse covariance matrix estimation. Here we do not include the Ledoit-Wolf estimator and only compare the glasso and PDSCE estimates with the proposed JPEN estimator. The JPEN estimate of the inverse covariance matrix outperforms the other methods for both $p = 500$ and $p = 1000$ for all four types of structured inverse covariance matrices. Figure 5.1 reports the zero recovery plot of the percentage of times each zero element of the covariance matrix was truly recovered, based on 50 realizations. The JPEN estimate recovers the true zeros about 90% of the time for the Hub and Neighborhood types of covariance matrix. Our proposed estimator also reflects the recovery of the true structure of the non-zero entries and any pattern among the rows/columns of the covariance matrix.
Table 5.1
Covariance matrix estimation

              Hub type matrix            Neighborhood type matrix
              p=500         p=1000       p=500         p=1000
Ledoit-Wolf   2.13(0.103)   2.43(0.043)  1.36(0.054)   2.89(0.028)
Glasso        10.8(0.06)    14.7(0.052)  11.9(0.056)   14.3(0.03)
PDSCE         1.22(0.052)   2.23(0.051)  0.912(0.077)  1.85(0.028)
JPEN          1.74(0.051)   1.97(0.037)  0.828(0.052)  1.66(0.028)

              Block type matrix          Toeplitz type matrix
              p=500         p=1000       p=500         p=1000
Ledoit-Wolf   1.54(0.102)   2.96(0.0903) 1.967(0.041)  2.344(0.028)
Glasso        30.8(0.0725)  33.9(0.063)  12.741(0.051) 18.22(0.04)
PDSCE         1.62(0.118)   3.08(0.0906) 0.873(0.042)  1.82(0.028)
JPEN          1.01(0.101)   1.91(0.0909) 0.707(0.042)  1.816(0.028)
Table 5.2
Inverse covariance matrix estimation

              Hub type matrix            Neighborhood type matrix
              p=500         p=1000       p=500         p=1000
Glasso        13.4(0.057)   17.5(0.065)  12.694(0.03)  13.596(0.033)
PDSCE         1.12(0.046)   2.34(0.044)  0.958(0.04)   1.85(0.038)
JPEN          0.613(0.033)  0.282(0.028) 0.392(0.038)  0.525(0.036)

              Block type matrix          Toeplitz type matrix
              p=500         p=1000       p=500         p=1000
Glasso        12.7(0.0406)  13.6(0.0316) 19.4(0.037)   20.7(0.022)
PDSCE         1.02(0.0562)  1.9(0.038)   1.91(0.064)   3.7(0.037)
JPEN          0.372(0.0481) 0.579(0.0328) 0.664(0.068) 2.42(0.045)
To see the implication of the eigenvalue shrinkage penalty as compared to other methods, we plot (Figure 5.2) the eigenvalues of the estimated covariance matrix for $n = 20$, $p = 50$. The JPEN estimates of the eigen-spectrum are far better than those of the other methods, the closest being the PDSC estimates of the eigenvalues.

Fig 5.1. Heatmap of zeros identified in the covariance matrix out of 50 realizations. White color is 50/50 zeros identified, black color is 0/50 zeros identified.
Fig 5.2. Eigenvalue plot for $n = 20$, $p = 50$, based on 50 realizations.
6. Colon Tumor Classification Example. In this section, we compare the performance of our proposed covariance matrix estimator for Linear Discriminant Analysis (LDA) classification of tumors using the gene expression data of Alon et al. (1999). In this experiment, colon adenocarcinoma tissue samples were collected, 40 of which were tumor tissues and 22 non-tumor tissues. Tissue samples were analyzed using an Affymetrix oligonucleotide array. The data were processed, filtered, and reduced to a subset of 2,000 gene expression values with the largest minimal intensity over the 62 tissue samples (source: http://genomics-pubs.princeton.edu/oncology/affydata/index.html). Additional information about the dataset and its pre-processing can be found in Alon et al. (1999). In our analysis, we reduce the number of genes by selecting the p most significant genes based on logistic regression. We obtain estimates of the inverse covariance matrix for $p = 50, 100, 200$ and then use LDA to classify these tissues as either tumorous or non-tumorous (normal). We classify each test observation x to either class $k = 0$ or $k = 1$ using the LDA rule

(6.1)  $\hat{k}(x) = \arg\max_k \Big\{x^T\hat{\Omega}\hat{\mu}_k - \frac{1}{2}\hat{\mu}_k^T\hat{\Omega}\hat{\mu}_k + \log(\hat{\pi}_k)\Big\},$

where $\hat{\pi}_k$ is the proportion of class k observations in the training data, $\hat{\mu}_k$ is the sample mean for class k on the training data, and $\hat{\Omega} := \hat{\Sigma}^{-1}$ is an estimator of the inverse of the common covariance matrix on the training data, computed by one of the methods under consideration. Tuning parameters $\lambda$ and $\gamma$ were chosen using 5-fold cross validation. To create training and test
sets, we randomly split the data into a training set of size 42 and a testing set of size 20; following the approach used by Wang et al. (2007), we require the training set to have 27 tumor samples and 15 non-tumor samples. We repeat the split at random 100 times and measure the average classification error.
Table 6.1
Averages and standard errors of classification errors over 100 replications, in %.

Method                p=50         p=100        p=200
Logistic Regression   21.0(0.84)   19.31(0.89)  21.5(0.85)
SVM                   16.70(0.85)  16.76(0.97)  18.18(0.96)
Naive Bayes           13.3(0.75)   14.33(0.85)  14.63(0.75)
Graphical Lasso       10.9(1.3)    9.4(0.89)    9.8(0.90)
Joint Penalty         9.9(0.98)    8.9(0.93)    8.2(0.81)
Since we do not have a separate validation set, we perform 5-fold cross validation on the training data. At each split, we divide the training data into 5 subsets (folds), where 4 subsets are used to estimate the covariance matrix and 1 subset is used to measure the classifier's performance. For each split, this procedure is repeated 5 times, taking each of the 5 subsets in turn as the validation data. An optimal combination of $\lambda$ and $\gamma$ is obtained by minimizing the average classification error. The tuning parameter for the graphical lasso was obtained by a similar criterion.
The average classification errors with standard errors over the 100 splits are presented in Table 6.1. Since the sample size is less than the number of genes, we omit the inverse sample covariance matrix as it is not well defined, and instead include the naive Bayes and support vector machine classifiers. Naive Bayes has been shown to perform better than the sample covariance matrix in high-dimensional settings (Bickel and Levina (2004)). The Support Vector Machine (SVM) is another popular choice for high dimensional classification (Chih-Wei Hsu et al. (2010)). Among all the methods, the covariance matrix based LDA classifiers perform far better than Naive Bayes, SVM and Logistic Regression. For all the other classifiers, the classification performance deteriorates with increasing p. For larger p, i.e. when more genes are added to the data set, the classification performance of the JPEN estimate based LDA classifier improves; this differs from the Rothman et al. (2008) analysis of the same data set, where the authors pointed out that as more genes are added to the data set, the classifier's performance deteriorates. Note that the classification error of a covariance matrix based classifier initially decreases with increasing p and deteriorates for large p. This is due to the fact that as the dimension of the covariance matrix increases, the estimator does not remain very
informative. In particular, for $p = 2000$, when all the genes are used in the data analysis, the classification error of JPEN and glasso is about 30%, which is much higher than for $p = 50$.

7. Summary. We have proposed and analyzed regularized estimation of large covariance and inverse covariance matrices using a joint penalty. One of its biggest advantages is that the optimization carries little computational burden, unlike many other methods for covariance regularization, and the resulting algorithm is very fast, efficient and easily scalable to large scale data analysis problems. We show that our estimators of the covariance and inverse covariance matrix are consistent in Frobenius and operator norm. The operator norm consistency guarantees consistency for principal components, hence we expect that PCA will be one of the most important applications of the method. Although the estimators in (2.4) and (3.7) do not require any assumption on the structure of the true covariance and inverse covariance matrices respectively, a priori knowledge of any structure of the true covariance matrix might be helpful in choosing a suitable weight matrix and hence improve estimation.

Acknowledgments. I would like to express my deep gratitude to Professor Hira L. Koul for his valuable and constructive suggestions during the planning and development of this research work.
References.

[1] Alon U., Barkai N., Notterman D., Gish K., Ybarra S., Mack D. and Levine A., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA, 96(12):6745-6750, 1999.
[2] Banerjee O., El Ghaoui L. and d'Aspremont A., Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485-516, 2008.
[3] Bickel P. and Levina E., Regularized estimation of large covariance matrices. The Annals of Statistics, 36:199-227, 2008.
[4] Bickel P. and Levina E., Covariance regularization by thresholding. The Annals of Statistics, 36:2577-2604, 2008.
[5] Cai T., Zhang C. and Zhou H., Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics, 38:2118-2144, 2010.
[6] Cai T., Liu W. and Luo X., A constrained $\ell_1$ minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106:594-607, 2011.
[7] Chaudhury S., Drton M. and Richardson T., Estimation of a covariance matrix with zeros. Biometrika, 94(1):199-216, 2007.
[8] Clarke R., Ressom H., Wang A., Xuan J., Liu M., Gehan E. and Wang Y., The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nature Reviews Cancer, 8(1):37-49, 2008.
[9] Dempster A., Covariance selection. Biometrika, 32:95-108, 1972.
[10] Dey D. and Srinivasan C., Estimation of a covariance matrix under Stein's loss. The Annals of Statistics, 13(4):1581-1591, 1985.
[11] Fan J., Fan Y. and Lv J., High-dimensional covariance matrix estimation using a factor model. Journal of Econometrics.
[12] Friedman J., Hastie T. and Tibshirani R., Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.
[13] Geman S., A limit theorem for the norm of random matrices. Annals of Statistics, 8(2):252-261, 1980.
[14] Bien J. and Tibshirani R., Sparse estimation of a covariance matrix. Biometrika, 98(4):807-820, 2011.
[15] Johnstone I. and Lu Y., Sparse principal components analysis. Unpublished manuscript, 2004.
[16] El Karoui N., Spectrum estimation for large dimensional covariance matrices using random matrix theory. Annals of Statistics, 36(6):2757-2790, 2008.
[17] El Karoui N., Operator norm consistent estimation of large dimensional sparse covariance matrices. Annals of Statistics, 36:2717-2756, 2008.
[18] Ledoit O. and Wolf M., A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88:365-411, 2004.
[19] Marcenko V. and Pastur L., Distributions of eigenvalues of some sets of random matrices. Math. USSR-Sb, 1:507-536, 1967.
[20] Mardia K., Kent J. and Bibby J., Multivariate Analysis. Academic Press, New York. MR0560319, 1979.
[21] Maurya A., A joint convex penalty for inverse covariance matrix estimation. Computational Statistics and Data Analysis, 75:15-27, 2014.
[22] Maurya A., A supplement to "A well conditioned and sparse estimate of covariance and inverse covariance matrix using a joint penalty". Submitted to the Annals of Statistics, Nov. 2014.
[23] Meinshausen N. and Buhlmann P., High dimensional graphs and variable selection with the lasso. Annals of Statistics, 34:1436-1462, 2006.
[24] Pass G., Chowdhury A. and Torgeson C., "A Picture of Search". The First International Conference on Scalable Information Systems, Hong Kong, June 2006.
[25] Pourahmadi M., Modeling covariance matrices: The GLM and regularization perspectives. Statistical Science, 26:369-387, 2011.
[26] Pourahmadi M., Cholesky decompositions and estimation of a covariance matrix: orthogonality of variance-correlation parameters. Biometrika, 94(4):1006-1013, 2007.
[27] Ravikumar P., Wainwright M., Raskutti G. and Yu B., High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935-980, 2011.
[28] Rothman A. J., Bickel P. J., Levina E. and Zhu J., Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494-515, 2008.
[29] Rothman A., Positive definite estimators of large covariance matrices. Biometrika, 99(3):733-740, 2012.
[30] Wainwright M., Ravikumar P. and Lafferty J., High-dimensional graphical model selection using $\ell_1$-regularized logistic regression. Proceedings of Advances in Neural Information Processing Systems, 2006.
[31] Stein C., Estimation of a covariance matrix. Rietz Lecture, 39th Annual Meeting IMS, Atlanta, Georgia, 1975.
[32] Wang S., Kuo T. and Hsu C., Trace bounds on the solution of the algebraic matrix Riccati and Lyapunov equation. IEEE Transactions on Automatic Control, AC-31(7), July 1986.
[33] Wang L., Zhu J. and Zou H., Hybrid huberized support vector machines for microarray classification. In ICML '07: Proceedings of the 24th International Conference on Machine Learning, pages 983-990, New York, NY, USA. ACM Press, 2007.
[34] Xue L., Ma S. and Zou H., Positive-definite $\ell_1$-penalized estimation of large covariance matrices. Journal of the American Statistical Association, 107(500), 2012.
[35] Yin Y. and Bai Z., Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability, 21(3):1275-1294, 1993.
[36] Yuan M. and Lin Y., Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19-35, 2007.
[37] Yuan M., Sparse inverse covariance matrix estimation via linear programming. Journal of Machine Learning Research, 11:2261-2286, 2009.
[38] Zhou S., Rutimann P., Xu M. and Buhlmann P., High-dimensional covariance estimation based on Gaussian graphical models. Journal of Machine Learning Research, to appear, 2011.
[39] Zou H., Hastie T. and Tibshirani R., Sparse principal components analysis. Journal of Computational and Graphical Statistics, 15:265-286. MR2252527, 2006.
8. Technical Proofs.
Proof of Theorem 3.1. Let $\Sigma = UDU^T$ be the eigenvalue decomposition of $\Sigma$. Let

$f_1(D) = \|UDU^T - S\|_F^2 + \lambda\|UDU^T\|_1 + \gamma\sum_{1\le i\le p} a_i\{\lambda_i(\Sigma) - t\}^2$
$\quad = \text{tr}(D^2) - 2\,\text{tr}(SUDU^T) + \text{tr}(S^2) + \lambda\|UDU^T\|_1 + \gamma\{\text{tr}(AD^2) - 2t\,\text{tr}(AD) + t^2\,\text{tr}(A)\}$
$\quad = \text{tr}\big(D^2(I + \gamma A)\big) - 2\,\text{tr}\big(D(U^TSU + \gamma t A)\big) + \text{tr}(S^2) + \lambda\|UDU^T\|_1 + \gamma t^2\,\text{tr}(A).$

Note that this is quadratic in D and, since $(I + \gamma A)$ is a positive definite matrix, $f_1(D)$ is convex. Differentiating with respect to D, we obtain

$\frac{\partial f_1(D)}{\partial D} = 2D(I + \gamma A) - 2(U^TSU + \gamma t A) + \lambda\,U^T\text{sign}(UDU^T)U.$

Setting $\partial f_1(D)/\partial D = 0$ gives

$\hat{D} = (U^TSU + \gamma t A)(I + \gamma A)^{-1} - \frac{\lambda}{2}U^T\text{sign}(U\hat{D}U^T)U\,(I + \gamma A)^{-1}.$

Positive definiteness of the eigenvalue matrix $\hat{D}$ implies positive definiteness of $\hat{\Sigma}$. Next we derive a lower bound on the smallest eigenvalue of $\hat{D}$. Note that

$\lambda_{\max}\big\{U^T\text{sign}(U\hat{D}U^T)U(I + \gamma A)^{-1}\big\} = \lambda_{\max}(I + \gamma A)^{-1} = \frac{1}{1 + \gamma\min_{i\le p}A_{ii}}.$

Hence we obtain

$\lambda_{\min}(\hat{D}) \ge \lambda_{\min}\big\{U^TSU(I + \gamma A)^{-1}\big\} + \gamma t\,\lambda_{\min}\big\{A(I + \gamma A)^{-1}\big\} - \frac{\lambda}{2}\cdot\frac{1}{1 + \gamma\min_{i\le p}A_{ii}}$
$\quad \ge \frac{\lambda_{\min}(S)}{1 + \gamma\max_{i\le p}A_{ii}} + \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} - \frac{\lambda}{2}\cdot\frac{1}{1 + \gamma\min_{i\le p}A_{ii}}.$

For $\lambda \asymp \sqrt{\log p/n}$ and $\gamma \asymp \sqrt{\log p/n}$, we have $\lambda_{\min}(S) \to g(c_0) > 0$ in probability by a theorem in [35]. Next we prove that $\hat{\mathcal{R}}^{S,t,A,1} \triangle \mathcal{R}^{1} \to \phi$ in probability. Define

$Y_{\lambda,\gamma,t} = \frac{\lambda_{\min}(S)}{1 + \gamma\max_{i\le p}A_{ii}} + \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} - \frac{\lambda}{2}\cdot\frac{1}{1 + \gamma\min_{i\le p}A_{ii}}.$

Since $\lambda_{\min}(S) \to g(c_0)$, for a given $\epsilon_1 > 0$ there exists a positive integer $N_1$ such that for all $n = n(p) \ge N_1$,

$P\big(|Y_{\lambda,\gamma,t} - g(c_0)| < \epsilon_1\big) \ge 1 - \epsilon_1,$

i.e. $g(c_0) - \epsilon_1 \le Y_{\lambda,\gamma,t} \le g(c_0) + \epsilon_1$ with high probability. Taking $\epsilon_1 \to 0$, we have $\hat{\mathcal{R}}^{S,t,A,1} \triangle \mathcal{R}^{1} = \phi$. Hence the theorem.

Remark: Note that the above result holds in an asymptotic sense under the assumptions of Theorem 3.1. For finite samples when $n < p$, $\lambda_{\min}(S) = 0$ and, because $\min_{i\le p}A_{ii} > 0$,

$\lambda_{\min}(\hat{D}) \ge \gamma t\min_{i\le p}\frac{A_{ii}}{1 + \gamma A_{ii}} - \frac{\lambda}{2}\cdot\frac{1}{1 + \gamma\min_{i\le p}A_{ii}} = \frac{1}{1 + \gamma\min_{i\le p}A_{ii}}\Big\{\gamma t\min_{i\le p}A_{ii} - \frac{\lambda}{2}\Big\} > 0,$

for sufficiently large $\gamma t$. This guarantees the existence of a nonempty set $\hat{\mathcal{R}}^{S,t,A,1}$ for finite samples.
Proof of Theorem 3.2. Let

$f(\Sigma) = \|\Sigma - S\|_F^2 + \lambda\|\Sigma^-\|_1 + \gamma\sum_{i=1}^p a_i\{\lambda_i(\Sigma) - t\}^2,$

where $\Sigma^-$ is the matrix $\Sigma$ with all diagonal elements set to zero. Define the function $Q(\cdot)$ as follows:

$Q(\Sigma) = f(\Sigma) - f(\Sigma_0),$

where $\Sigma_0$ is the true covariance matrix and $\Sigma$ is any other covariance matrix. Let $\Sigma = UDU^T$ be the eigenvalue decomposition of $\Sigma$, where D is the diagonal matrix of eigenvalues and U is the matrix of eigenvectors. We have,

(8.1)  $Q(\Sigma) = \|\Sigma - S\|_F^2 + \lambda\|\Sigma^-\|_1 + \gamma\,\text{tr}(AD^2 - 2t\,AD + t^2A) - \|\Sigma_0 - S\|_F^2 - \lambda\|\Sigma_0^-\|_1 - \gamma\,\text{tr}(AD_0^2 - 2t\,AD_0 + t^2A),$

where $A = \text{diag}(a_1, a_2, \ldots, a_p)$ and $\Sigma_0 = U_0D_0U_0^T$ is the eigenvalue decomposition of $\Sigma_0$. Let $\Theta_n(M) := \{\Delta : \Delta = \Delta^T,\ \|\Delta\|_2 = Mr_n,\ 0 < M < \infty\}$. The estimate $\hat{\Sigma}$ minimizes $Q(\Sigma)$, or equivalently $\hat{\Delta} = \hat{\Sigma} - \Sigma_0$ minimizes $G(\Delta) = Q(\Sigma_0 + \Delta)$. Note that $G(\Delta)$ is convex and, if $\hat{\Delta}$ is its solution, then $G(\hat{\Delta}) \le G(0) = 0$. Therefore, if we can show that $G(\Delta)$ is non-negative for $\Delta \in \Theta_n(M)$, this will imply that $\hat{\Delta}$ lies within the sphere of radius $Mr_n$. We require $r_n = \sqrt{\frac{(p+s)\log p}{n}} \to 0$ as $n = n(p)$ goes to $\infty$. This will give consistency of our estimate in Frobenius norm at rate $O_P(r_n)$. We have

$\|\Sigma - S\|_F^2 - \|\Sigma_0 - S\|_F^2 = \text{tr}(\Sigma'\Sigma - 2\Sigma'S + S'S) - \text{tr}(\Sigma_0'\Sigma_0 - 2\Sigma_0'S + S'S)$
$\quad = \text{tr}(\Sigma'\Sigma - \Sigma_0'\Sigma_0) - 2\,\text{tr}\big((\Sigma - \Sigma_0)'S\big)$
$\quad = \text{tr}\big((\Sigma_0 + \Delta)'(\Sigma_0 + \Delta) - \Sigma_0'\Sigma_0\big) - 2\,\text{tr}(\Delta'S)$
$\quad = \text{tr}(\Delta'\Delta) - 2\,\text{tr}\big(\Delta'(S - \Sigma_0)\big).$

Next, we bound the term involving S in the above expression. We have

$|\text{tr}(\Delta'(\Sigma_0 - S))| \le \sum_{i\ne j}|\Delta_{ij}(\sigma_{0ij} - S_{ij})| + \sum_{i=1}^p|\Delta_{ii}(\sigma_{0ii} - S_{ii})|$
$\quad \le \max_{i\ne j}(|\sigma_{0ij} - S_{ij}|)\,\|\Delta^-\|_1 + \sqrt{p}\,\max_{i}(|\sigma_{0ii} - S_{ii}|)\sqrt{\textstyle\sum_{i=1}^p\Delta_{ii}^2}$
$\quad \le C_0(1+\tau)\max_i(\sigma_{0ii})\Big\{\sqrt{\tfrac{\log p}{n}}\|\Delta^-\|_1 + \sqrt{\tfrac{p\log p}{n}}\|\Delta^+\|_2\Big\}$
$\quad \le C_1\Big\{\sqrt{\tfrac{\log p}{n}}\|\Delta^-\|_1 + \sqrt{\tfrac{p\log p}{n}}\|\Delta^+\|_2\Big\},$

which holds with high probability by a result (Lemma 1) from Ravikumar et al. (2011) on the tail inequality for the sample covariance matrix of sub-gaussian random vectors, where $C_1 = C_0(1+\tau)\max_i(\sigma_{0ii})$, $C_0 > 0$. Next we obtain an upper bound on the terms involving $\gamma$ in (8.1). We have

$\text{tr}(AD^2 - 2t\,AD) - \text{tr}(AD_0^2 - 2t\,AD_0) = \text{tr}\{A(U^T\Sigma^2U - U_0^T\Sigma_0^2U_0)\} - 2t\,\text{tr}\{A(U^T\Sigma U - U_0^T\Sigma_0U_0)\}.$

(i) $\text{tr}\big(A(U^T\Sigma^2U - U_0^T\Sigma_0^2U_0)\big) \le \lambda_1(A)\,\text{tr}(\Sigma^2 - \Sigma_0^2) \le \text{tr}\{(\Delta + \Sigma_0)^2 - \Sigma_0^2\} \le \text{tr}(2\Delta\Sigma_0 + \Delta'\Delta) \le 2\bar{k}\sqrt{p}\,\|\Delta^+\|_F + \text{tr}(\Delta'\Delta).$

(ii) $\text{tr}\big(A(U^T\Sigma U - U_0^T\Sigma_0U_0)\big) \le \lambda_1(A)\,\text{tr}(\Sigma - \Sigma_0) \le \text{tr}\{(\Delta + \Sigma_0) - \Sigma_0\} = \text{tr}(\Delta) \le \sqrt{p}\,\|\Delta^+\|_F.$

To bound the term $\lambda(\|\Delta^- + \Sigma_0^-\|_1 - \|\Sigma_0^-\|_1)$ in (8.1), let E be the index set defined in Assumption A1. Then, using the triangle inequality, we obtain

$\lambda\big(\|\Delta^- + \Sigma_0^-\|_1 - \|\Sigma_0^-\|_1\big) = \lambda\big(\|\Delta_E^- + \Sigma_0^-\|_1 + \|\Delta_{E^c}^-\|_1 - \|\Sigma_0^-\|_1\big)$
$\quad \ge \lambda\big(\|\Sigma_0^-\|_1 - \|\Delta_E^-\|_1 + \|\Delta_{E^c}^-\|_1 - \|\Sigma_0^-\|_1\big) \ge \lambda\big(\|\Delta_{E^c}^-\|_1 - \|\Delta_E^-\|_1\big).$

Let $\lambda = (C_1/\epsilon_0)\sqrt{\log p/n}$ and $\gamma = (C_1/\epsilon_1)\sqrt{\log p/n}$, where $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S,t,A,1}$ and $(1/\bar{k}) \le t \le \bar{k}$. We obtain

$G(\Delta) \ge \text{tr}(\Delta'\Delta) - 2C_1\Big\{\sqrt{\tfrac{\log p}{n}}\|\Delta^-\|_1 + \sqrt{\tfrac{p\log p}{n}}\|\Delta^+\|_F\Big\} - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}}\Big\{2\bar{k}\sqrt{p}\,\|\Delta\|_F + \|\Delta\|_F^2 + 2\sqrt{p}\,\|\Delta^+\|_F\Big\} + \frac{C_1}{\epsilon_0}\sqrt{\tfrac{\log p}{n}}\big(\|\Delta_{E^c}^-\|_1 - \|\Delta_E^-\|_1\big)$
$\quad \ge \|\Delta\|_F^2\Big(1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}}\Big) - 2C_1\sqrt{\tfrac{p\log p}{n}}\|\Delta^+\|_F - 2C_1\sqrt{\tfrac{\log p}{n}}\big(\|\Delta_E^-\|_1 + \|\Delta_{E^c}^-\|_1\big) + \frac{C_1}{\epsilon_0}\sqrt{\tfrac{\log p}{n}}\big(\|\Delta_{E^c}^-\|_1 - \|\Delta_E^-\|_1\big) - \frac{2C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}}(1 + \bar{k})\sqrt{p}\,\|\Delta\|_F.$

Also, because $\|\Delta_E^-\|_1 = \sum_{(i,j)\in E,\,i\ne j}|\Delta_{ij}| \le \sqrt{s}\,\|\Delta\|_F$,

$-2C_1\sqrt{\tfrac{\log p}{n}}\|\Delta_{E^c}^-\|_1 + \frac{C_1}{\epsilon_0}\sqrt{\tfrac{\log p}{n}}\|\Delta_{E^c}^-\|_1 \ge \sqrt{\tfrac{\log p}{n}}\|\Delta_{E^c}^-\|_1\Big(-2C_1 + \frac{C_1}{\epsilon_0}\Big) \ge 0$

for sufficiently small $\epsilon_0$. Also

$2C_1\sqrt{\tfrac{\log p}{n}}\|\Delta_E^-\|_1 \le 2C_1\sqrt{\tfrac{\log p}{n}}\sqrt{s}\,\|\Delta\|_F.$

Therefore,

$G(\Delta) \ge \|\Delta\|_F^2\Big(1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}}\Big) - 2C_1\sqrt{\tfrac{p\log p}{n}}\|\Delta^+\|_F - \frac{2C_1}{\epsilon_1}\sqrt{\tfrac{p\log p}{n}}(1 + \bar{k})\|\Delta^+\|_F - 2C_1\sqrt{\tfrac{s\log p}{n}}\|\Delta\|_F$
$\quad \ge \|\Delta\|_F^2\Big(1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}}\Big) - 2C_1\sqrt{\tfrac{(p+s)\log p}{n}}\|\Delta^+\|_F - 2C_1\sqrt{\tfrac{(p+s)\log p}{n}}\|\Delta^-\|_F - \frac{2C_1(1+\bar{k})}{\epsilon_1}\sqrt{\tfrac{(p+s)\log p}{n}}\|\Delta^+\|_F$
$\quad \ge \|\Delta^+\|_F^2\Big[1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}} - 2\|\Delta^+\|_F^{-1}\sqrt{\tfrac{(p+s)\log p}{n}}\,C_1\Big(1 + \frac{1+\bar{k}}{\epsilon_1}\Big)\Big] + \|\Delta^-\|_F^2\Big[1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}} - 2C_1\|\Delta^-\|_F^{-1}\sqrt{\tfrac{(p+s)\log p}{n}}\Big]$
$\quad \ge \|\Delta^+\|_F^2\Big[1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}} - \frac{2C_1 + 2C_1(1+\bar{k})/\epsilon_1}{M}\Big] + \|\Delta^-\|_F^2\Big[1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}} - \frac{2C_1}{M}\Big] \ge 0,$

for all sufficiently large n and M. Hence the theorem.
Proof of Corollary 3.1. Note that for a correlation matrix, all the variables are standardized to have mean zero and variance 1. Using a result from Bai and Yin (1993), we have $\lim_{n=n(p)\to\infty}\lambda_{\min}(\hat{\Gamma}) = (1 - \sqrt{c_0})^2 > 0$ for $c_0 < 1$. The rest of the proof of this corollary is similar to that of Theorem 3.1 and hence omitted.

Proof of Corollary 3.2. This corollary is a special case of Theorem 3.2 when all of the variables are standardized to have mean zero and variance 1.
Proof of Theorem 3.3. We have

$\|\hat{\Sigma}_c - \Sigma_0\| = \|\hat{W}\hat{K}\hat{W} - W\Gamma W\| \le \|\hat{W} - W\|\,\|\hat{K} - \Gamma\|\,\|\hat{W} - W\| + \|\hat{W} - W\|\big(\|\hat{K}\|\,\|W\| + \|\hat{W}\|\,\|\Gamma\|\big) + \|\hat{K} - \Gamma\|\,\|\hat{W}\|\,\|W\|.$

Since $\|\Gamma\| = O(1)$, it follows from Corollary 3.2 that $\|\hat{K}\| = O(1)$. Also,

$\|\hat{W}^2 - W^2\| = \max_{\|x\|_2=1}\sum_{i=1}^p|\hat{w}_i^2 - w_i^2|\,x_i^2 \le \max_{1\le i\le p}|\hat{w}_i^2 - w_i^2|\sum_{i=1}^p x_i^2 = \max_{1\le i\le p}|\hat{w}_i^2 - w_i^2| = O_P\Big(\sqrt{\tfrac{\log p}{n}}\Big)$

holds with high probability by using a result (Lemma 1) from Ravikumar et al. (2011) on the tail inequality for the entries of the sample covariance matrix of sub-gaussian random vectors. Next we show that $\|\hat{W} - W\| \asymp \|\hat{W}^2 - W^2\|$ (where $A \asymp B$ means $A = O_P(B)$ and $B = O_P(A)$). We have

$\|\hat{W} - W\| = \max_{\|x\|_2=1}\sum_{i=1}^p|\hat{w}_i - w_i|\,x_i^2 = \max_{\|x\|_2=1}\sum_{i=1}^p\Big|\frac{\hat{w}_i^2 - w_i^2}{\hat{w}_i + w_i}\Big|\,x_i^2 \le C_3\max_{\|x\|_2=1}\sum_{i=1}^p|\hat{w}_i^2 - w_i^2|\,x_i^2 = C_3\|\hat{W}^2 - W^2\|,$

where we have used the fact that the true standard deviations are well above zero, i.e. there exists $0 < C_3 < \infty$ such that $1/C_3 \le w_i^{-1} \le C_3$ for all $i = 1, 2, \ldots, p$, and the sample standard deviations are all positive, i.e. $\hat{w}_i > 0$ for all $i = 1, 2, \ldots, p$. Now, since $\|\hat{W}^2 - W^2\| \asymp \|\hat{W} - W\|$, it follows that $\|\hat{W}\| = O(1)$, which implies that $\|\hat{\Sigma}_c - \Sigma_0\|^2 = O_P\big(\tfrac{s\log p}{n} + \tfrac{\log p}{n}\big)$. Hence Theorem 3.3 follows.
Proof of Theorem 3.5. The method of proof for the inverse covariance matrix is similar to that for covariance matrix estimation; we keep the notation similar to that in the proof of Theorem 3.2. Define

(8.2)  $Q(\Omega) = \|\Omega - S_*^{-1}\|_F^2 + \lambda\|\Omega^-\|_1 + \gamma\,\text{tr}(AD^2 - 2t\,AD + t^2A) - \|\Omega_0 - S_*^{-1}\|_F^2 - \lambda\|\Omega_0^-\|_1 - \gamma\,\text{tr}(AD_0^2 - 2t\,AD_0 + t^2A),$

where $\Omega_0$ is the true inverse covariance matrix and $\Omega$ is any other inverse covariance matrix, $A = \text{diag}(A_{11}, A_{22}, \ldots, A_{pp})$, and $\Omega = UDU^T$ and $\Omega_0 = U_0D_0U_0^T$ are the eigenvalue decompositions of $\Omega$ and $\Omega_0$ respectively, where D and $D_0$ are diagonal matrices of eigenvalues and U and $U_0$ are matrices of eigenvectors. Let $\Delta = \Omega - \Omega_0$ (the difference between any estimate $\Omega$ and the true inverse covariance matrix $\Omega_0$). Define the set of symmetric matrices $\Theta_n(M) = \{\Delta : \Delta = \Delta^T,\ \|\Delta\|_F = Mr_n,\ 0 < M < \infty\}$. The estimate $\hat{\Omega}$ minimizes $Q(\Omega)$, or equivalently $\hat{\Delta} = \hat{\Omega} - \Omega_0$ minimizes $G(\Delta) = Q(\Omega_0 + \Delta)$, where $G(\Delta)$ is convex. Note that if $\hat{\Delta}$ is a solution to $G(\Delta)$, then we have $G(\hat{\Delta}) \le G(0) = 0$. As argued in the proof of Theorem 3.2, if we can show that $G(\Delta)$ is non-negative for $\Delta \in \Theta_n(M)$, this will imply that $\hat{\Delta}$ lies within the sphere of radius $Mr_n$. We require $r_n = \sqrt{\frac{(p+s)\log p}{n}} \to 0$ as n goes to $\infty$. This will give consistency of our estimate in Frobenius norm at rate $O_P(r_n)$. Along similar lines as in the proof of Theorem 3.2, for $(\lambda,\gamma) \in \hat{\mathcal{R}}^{S_*,t,A,2}$, we obtain

$G(\Delta) \ge \text{tr}(\Delta'\Delta) - 2\,\text{tr}\big(\Delta(S_*^{-1} - \Omega_0)\big) + \frac{C_1}{\epsilon_0}\sqrt{\tfrac{\log p}{n}}\big(\|\Delta_{H^c}^-\|_1 - \|\Delta_H^-\|_1\big) - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}}\big\{2\bar{k}\sqrt{p}\,\|\Delta\|_F + \|\Delta\|_F^2 + 2\sqrt{p}\,\|\Delta^+\|_F\big\},$

where H is the index set defined in Assumption B1 and $H^c = \{(i,j) : (i,j)\notin H,\ i,j = 1, 2, \ldots, p\}$. Also $\|\Delta_H^-\|_1 \le \sqrt{s}\,\|\Delta\|_F$. Consider the term involving $S_*^{-1}$:

$|\text{tr}(\Delta(\Omega_0 - S_*^{-1}))| = |\text{tr}(\Delta S_*^{-1}(S_* - \Omega_0^{-1})\Omega_0)| \le \lambda_1(S_*^{-1})\,|\text{tr}(\Delta(S_* - \Omega_0^{-1})\Omega_0)| \le \lambda_1(S_*^{-1})\,|\text{tr}(\Delta(S_* - \Omega_0^{-1}))|\,\lambda_1(\Omega_0) \le \bar{k}^2\,|\text{tr}(\Delta(S_* - \Omega_0^{-1}))|,$

by using a result on trace bounds from [32]. Now consider the term $\text{tr}(\Delta(S_* - \Omega_0^{-1}))$:

$\text{tr}\big(\Delta(S_* - \Omega_0^{-1})\big) = \text{tr}\big(\Delta(S + \delta I - \Omega_0^{-1})\big) = \text{tr}\big(\Delta(S - \Omega_0^{-1})\big) + \delta\,\text{tr}(\Delta)$
$\quad \le C_1\sqrt{\tfrac{(p+s)\log p}{n}}\|\Delta^+\|_F + \sqrt{\tfrac{\log p}{n}}\|\Delta^-\|_1 + C_1\sqrt{\tfrac{p\log p}{n}}\|\Delta\|_F,$

which holds with high probability by using a result (Lemma 1) from Ravikumar et al. (2011) on the tail inequality for sub-gaussian random vectors, where $\delta \asymp \sqrt{\log p/n}$ and $C_1$ is defined as in the proof of Theorem 3.2. We have

$G(\Delta) \ge \|\Delta\|_F^2\Big(1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}}\Big) - 2\bar{k}^2C_1\sqrt{\tfrac{(p+s)\log p}{n}}\|\Delta^+\|_F - C_1\sqrt{\tfrac{\log p}{n}}\Big\{\frac{2\sqrt{p}(1+\bar{k})}{\epsilon_1}\|\Delta^+\|_F + \bar{k}^2\sqrt{s}\,\|\Delta\|_F + \frac{2(1+\bar{k}^2)}{\epsilon_1}\|\Delta\|_F\Big\}$
$\quad \ge \|\Delta^+\|_F^2\Big[1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}} - 2\bar{k}^2C_1\sqrt{\tfrac{(p+s)\log p}{n}}\|\Delta^+\|_F^{-1} - \frac{2C_1}{\epsilon_1}\sqrt{\tfrac{p\log p}{n}}(1+\bar{k})\|\Delta^+\|_F^{-1} - 2C_1\sqrt{\tfrac{\log p}{n}}(1+\bar{k}^2)\|\Delta^+\|_F^{-1}\Big]$
$\qquad + \|\Delta^-\|_F^2\Big[1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}} - C_1\bar{k}^2\sqrt{\tfrac{s\log p}{n}}\|\Delta^-\|_F^{-1} - \frac{2C_1}{\epsilon_1}\sqrt{\tfrac{p\log p}{n}}(1+\bar{k}^2)\|\Delta^-\|_F^{-1}\Big]$
$\quad \ge \|\Delta^+\|_F^2\Big[1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}} - \frac{2\bar{k}^2C_1 + 2C_1(1+\bar{k})/\epsilon_1 + 2C_1(1+\bar{k}^2)}{M}\Big] + \|\Delta^-\|_F^2\Big[1 - \frac{C_1}{\epsilon_1}\sqrt{\tfrac{\log p}{n}} - \frac{C_1\bar{k}^2 + 2C_1(1+\bar{k}^2)/\epsilon_1}{M}\Big] \ge 0$

for all sufficiently large n and M. Hence the result.
Proof of Corollary 3.3. The proof of this corollary is similar to that of Theorem 3.1 and is hence omitted.
Proof of Corollary 3.4. The proof of this corollary is similar to that of Corollary 3.2 and is hence omitted.
Proof of Theorem 3.4. The proof of this theorem is similar to that of Theorem 3.1 and is hence omitted.
Proof of Theorem 3.6. The proof of this theorem is similar to that of Theorem 3.3 and is hence omitted.
8.1. Derivation of the Algorithm.
8.1.1. Covariance matrix estimation. The optimization problem (2.4) can be written as
$$\hat\Sigma = \underset{\Sigma = \Sigma^T,\ (\lambda,\gamma)\in\hat R_{S,t,A,1}}{\arg\min}\ f(\Sigma),\tag{8.3}$$
where
$$f(\Sigma) = \|\Sigma - S\|_F^2 + \lambda\|\Sigma\|_1 + \gamma\sum_{i=1}^p a_i\{\sigma_i(\Sigma) - t\}^2.$$
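For concreteness, a minimal sketch of how the objective in (8.3) can be evaluated numerically; the function name, the use of numpy, and the convention that the weights are aligned with the ascending eigenvalue ordering are our own choices, not part of the paper.

```python
import numpy as np

def jpen_objective(Sigma, S, lam, gam, a, t):
    """Evaluate f(Sigma) = ||Sigma - S||_F^2 + lam * ||Sigma||_1
    + gam * sum_i a_i * (sigma_i(Sigma) - t)^2   (a sketch of (8.3))."""
    frob = np.sum((Sigma - S) ** 2)                  # squared Frobenius loss
    l1 = lam * np.sum(np.abs(Sigma))                 # l1 penalty on all entries
    eigvals = np.linalg.eigvalsh(Sigma)              # eigenvalues, ascending order
    eig_pen = gam * np.sum(a * (eigvals - t) ** 2)   # weighted eigenvalue penalty;
    # the weights a are assumed here to be ordered consistently with eigvals
    return frob + l1 + eig_pen
```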
Note that for a non-negative definite square matrix, the singular values coincide with its eigenvalues. We also have the trace identity: the sum of the eigenvalues of a matrix $\Sigma$ equals $\mathrm{tr}(\Sigma)$. Let $\Sigma = UDU^T$, where $D$ is the diagonal matrix of eigenvalues and $U$ is the orthogonal matrix of eigenvectors. We have $\sum_{i=1}^p a_i\sigma_i^2(\Sigma) = \sum_{i=1}^p a_iD_{ii}^2 = \mathrm{tr}(AD^2)$, where $A = \mathrm{diag}(a_1, a_2, \dots, a_p)$. Again, $D = U^T\Sigma U$ implies $D^2 = D^TD = U^T\Sigma^T\Sigma U = U^T\Sigma^2U$. Therefore
$$\mathrm{tr}(AD) = \mathrm{tr}(\Sigma UAU^T) \qquad\text{and}\qquad \mathrm{tr}(AD^2) = \mathrm{tr}(AU^T\Sigma^2U) = \mathrm{tr}(\Sigma^2UAU^T).$$
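A quick numerical check of the identity $\mathrm{tr}(AD^2) = \mathrm{tr}(\Sigma^2UAU^T)$ derived above (the random test matrix and weights are our own construction, used only for verification):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
Z = rng.normal(size=(p, p))
Sigma = Z @ Z.T + p * np.eye(p)              # a positive definite test matrix
evals, U = np.linalg.eigh(Sigma)             # Sigma = U diag(evals) U^T
D = np.diag(evals)
A = np.diag(rng.uniform(size=p))             # arbitrary non-negative weight matrix
lhs = np.trace(A @ D @ D)                    # tr(A D^2)
rhs = np.trace(Sigma @ Sigma @ U @ A @ U.T)  # tr(Sigma^2 U A U^T)
print(np.allclose(lhs, rhs))                 # True
```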
The third term on the right-hand side of (8.3) can be written as
$$\sum_{i=1}^p a_i\{\sigma_i(\Sigma) - t\}^2 = \sum_{i=1}^p\big\{a_i\sigma_i^2(\Sigma) - 2t\,a_i\sigma_i(\Sigma) + a_it^2\big\} = \mathrm{tr}(\Sigma^2UAU^T) - 2t\,\mathrm{tr}(\Sigma UAU^T) + \sum_{i=1}^p a_it^2.$$
Therefore,
\begin{align*}
f(\Sigma) &= \|\Sigma - S\|_F^2 + \lambda\|\Sigma\|_1 + \gamma\,\mathrm{tr}(\Sigma^2UAU^T) - 2\gamma t\,\mathrm{tr}(\Sigma UAU^T) + \gamma\sum_{i=1}^p a_it^2 \\
&= \mathrm{tr}(\Sigma^T\Sigma) - 2\,\mathrm{tr}(\Sigma^TS) + \mathrm{tr}(S^TS) + \lambda\|\Sigma\|_1 + \gamma\,\mathrm{tr}(\Sigma^2UAU^T) - 2\gamma t\,\mathrm{tr}(\Sigma UAU^T) + \gamma t^2\,\mathrm{tr}(A) \\
&= \mathrm{tr}\big(\Sigma^2(I + \gamma UAU^T)\big) - 2\,\mathrm{tr}\big\{\Sigma(S + \gamma t\,UAU^T)\big\} + \mathrm{tr}(S^TS) + \lambda\|\Sigma\|_1 + \gamma t^2\,\mathrm{tr}(A) \\
&= \mathrm{tr}(\Sigma^2C) - 2\,\mathrm{tr}(\Sigma B) + \mathrm{tr}(S^TS) + \lambda\|\Sigma\|_1 + \gamma t^2\,\mathrm{tr}(A) \\
&= \mathrm{tr}\big\{(\Sigma^2 - 2\Sigma BC^{-1})C\big\} + \mathrm{tr}(S^TS) + \lambda\|\Sigma\|_1 + \gamma t^2\,\mathrm{tr}(A),
\end{align*}
where $I$ is the identity matrix, $C = I + \gamma UAU^T$ and $B = S + \gamma t\,UAU^T$. Note that $UAU^T = UA^{1/2}A^{1/2}U^T = (UA^{1/2})(UA^{1/2})^T$ is a positive definite matrix. Since $\gamma$ is non-negative, $C$ is the sum of a positive definite and a positive semi-definite matrix and is therefore positive definite. Also, $C^{-1} = U(I + \gamma A)^{-1}U^T$ and $\sigma_1(C) \le 1 + \gamma\max_{i\le p}A_{ii}$. Consider the part of $f(\Sigma)$ involving only $\Sigma$:
\begin{align*}
f_1(\Sigma) &= \mathrm{tr}\big\{(\Sigma^2 - 2\Sigma BC^{-1})C\big\} + \lambda\|\Sigma\|_1 \\
&\le \mathrm{tr}(\Sigma^2 - 2\Sigma BC^{-1})\,\sigma_1(C) + \lambda\|\Sigma\|_1 \\
&= \|\Sigma - BC^{-1}\|_F^2\,(1 + \gamma\max_{i\le p}A_{ii}) + \lambda\|\Sigma\|_1 \quad\text{(up to an additive term not involving }\Sigma\text{)} \\
&= (1 + \gamma\max_{i\le p}A_{ii})\Big[\|\Sigma - BC^{-1}\|_F^2 + \big\{\lambda/(1 + \gamma\max_{i\le p}A_{ii})\big\}\|\Sigma\|_1\Big] = f_2(\Sigma),
\end{align*}
where
$$f_2(\Sigma) = (1 + \gamma\max_{i\le p}A_{ii})\Big[\|\Sigma - BC^{-1}\|_F^2 + \big\{\lambda/(1 + \gamma\max_{i\le p}A_{ii})\big\}\|\Sigma\|_1\Big].$$
The function $f_2(\Sigma)$ is convex in $\Sigma$, and therefore the minimizer of $f_2(\Sigma)$ is unique. Note that for arbitrary choices of $\lambda$ and $\gamma$, minimization of $f_2(\Sigma)$ can yield a non-positive-definite estimator; however, as argued earlier, values of $(\lambda,\gamma)\in\hat R_{S,t,A,1}$ will yield a sparse, well-conditioned, positive definite estimator. Clearly the minimum of $f_2(\Sigma)$ is attained where
$$\mathrm{sign}(\Sigma_{ij}) = \mathrm{sign}(\Sigma_{ji}) = \mathrm{sign}(BC^{-1})_{ij}.\tag{8.4}$$
Differentiating $f_2(\Sigma)$ with respect to $\Sigma$ yields the optimality condition
$$2\Sigma - 2BC^{-1} + \big\{\lambda/(1 + \gamma\max_{i\le p}A_{ii})\big\}\,\mathrm{sign}(\Sigma) = 0.$$
Using the optimality condition (8.4), we have
$$\hat\Sigma_{ii} = (BC^{-1})_{ii}, \qquad \hat\Sigma_{ij} = (BC^{-1})_{ij} - \frac{\lambda}{2(1 + \gamma\max_{i\le p}A_{ii})}\,\mathrm{sign}(BC^{-1})_{ij} \quad\text{for } i \ne j.\tag{8.5}$$
Note that the estimate $\hat\Sigma$ involves the matrix of eigenvectors $U$. Since, for a given eigenvalue, the eigenvectors are not unique, we can choose a suitable matrix of eigenvectors corresponding to some positive definite covariance matrix. One choice is $U = U_1$, where $S + \varepsilon I = U_1D_1U_1^T$ for some $\varepsilon > 0$. Next, to check whether the solution of $f_2(\Sigma)$ given by (8.5) is feasible, consider:
Case (i): $\Sigma_{ij} \ge 0$. The solution (8.5) satisfies the optimality condition (8.4) if and only if $(BC^{-1})_{ij} \ge \frac{\lambda}{2(1 + \gamma\max_{i\le p}A_{ii})}$.
Case (ii): $\Sigma_{ij} < 0$. As in Case (i), the solution (8.5) satisfies the optimality condition (8.4) if and only if $(BC^{-1})_{ij} < -\frac{\lambda}{2(1 + \gamma\max_{i\le p}A_{ii})}$.
Note that $BC^{-1}$ need not be symmetric. To obtain a symmetric estimate, we symmetrize it as
$$M = \tfrac{1}{2}\big(BC^{-1} + (BC^{-1})^T\big).$$
Combining the two cases, the optimal solution of (8.3) is given by
$$\hat\Sigma_{ii} = M_{ii}, \qquad \hat\Sigma_{ij} = \mathrm{sign}(M_{ij})\,\max\Big\{|M_{ij}| - \frac{\lambda}{2(1 + \gamma\max_{i\le p}A_{ii})},\ 0\Big\}, \quad i \ne j,\tag{8.6}$$
where $\mathrm{sign}(x)$ is the sign of $x$ and $|x|$ is the absolute value of $x$.
Choice of $U$: Note that $U$ is the matrix of eigenvectors of $\Sigma$, which is unknown. In practice, one can choose $U$ as the matrix of eigenvectors from the eigenvalue decomposition of $S + \varepsilon I$ for some $\varepsilon > 0$; i.e., let $S + \varepsilon I = U_1D_1U_1^T$ and take $U = U_1$.
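Putting (8.5)-(8.6) and the above choice of $U$ together suggests the following sketch of the closed-form update; the function and variable names are ours, and this is only an illustration under the stated choices, not the author's reference implementation.

```python
import numpy as np

def jpen_cov(S, lam, gam, a, t, eps=1e-4):
    """Closed-form JPEN-type covariance update following (8.5)-(8.6) (a sketch).

    S   : p x p sample covariance matrix
    lam : l1 tuning parameter (lambda)
    gam : eigenvalue-penalty tuning parameter (gamma)
    a   : length-p array of positive weights (diagonal of A)
    t   : target eigenvalue, e.g. the average sample eigenvalue
    eps : small ridge used only to pick the eigenvector matrix U from S + eps*I
    """
    p = S.shape[0]
    A = np.diag(a)
    _, U = np.linalg.eigh(S + eps * np.eye(p))   # choice of U: eigenvectors of S + eps*I
    UAUt = U @ A @ U.T
    C = np.eye(p) + gam * UAUt                   # C = I + gamma * U A U^T
    B = S + gam * t * UAUt                       # B = S + gamma * t * U A U^T
    BCinv = B @ np.linalg.inv(C)
    M = 0.5 * (BCinv + BCinv.T)                  # symmetrized B C^{-1}
    thr = lam / (2.0 * (1.0 + gam * np.max(a)))  # threshold lambda / (2(1 + gamma * max_i A_ii))
    Sigma_hat = np.sign(M) * np.maximum(np.abs(M) - thr, 0.0)  # soft-threshold the entries
    np.fill_diagonal(Sigma_hat, np.diag(M))      # diagonal entries are not thresholded, cf. (8.6)
    return Sigma_hat
```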
Choice of $\lambda$ and $\gamma$: For a given value of $\gamma$, we can find a value of $\lambda$ satisfying
$$\lambda < 2\,(1 + \gamma\min_{i\le p}A_{ii})\,\frac{\sigma_{\min}(S)}{1 + \gamma\max_{i\le p}A_{ii}} + 2\,\gamma\,t\min_{i\le p}A_{ii} - 2\varepsilon,$$
and such a choice of $(\lambda,\gamma)$ guarantees that the minimum eigenvalue of the estimate is at least $\varepsilon > 0$; moreover, such a choice of $(\lambda,\gamma)$ lies in $\hat R_{S,t,A,1}$. In practice one might choose a larger value of $\lambda$ that still yields a sparse and positive definite covariance matrix.
8.2. Simulation Results.
8.2.1. Choice of weight matrix A. For $p > n$, $(p-n)$ of the sample eigenvalues are identically equal to zero, and many of the non-zero eigenvalues are approximately zero.
The simulation analysis shows that if we shrink every eigenvalue towards a fixed constant (i.e., the same amount of shrinkage for each of the sample eigenvalues), the smaller eigenvalues are shrunk upward heavily, away from the true eigenvalues. Therefore we choose non-uniform weights for the eigenvalues to avoid over-shrinkage. Note that, given a priori knowledge of the dispersion of the eigenvalues, one might be able to find better weights. Here we do not assume knowledge of any structure among the eigenvalues and choose the weights by the following scheme (we assume all the eigenvalues are ordered in decreasing order of magnitude):
i) Let $t$ be the average of the sample eigenvalues, and let $k$ be the index such that the $k$th ordered eigenvalue is less than $t$. Let $r = p/n$ and $b_1 = \max(\mathrm{diag}(S))\,(1 + \sqrt{p/n})^2$.
ii) For $j = 1$ to $p$, set
$$c_j = b_j\,(1 + 0.005\,\log(1 + r))^{|j-k|}, \qquad b_{j+1} = b_j^2/c_j.$$
iii) Set
$$A = \mathrm{diag}(a_1, a_2, \dots, a_p), \qquad\text{where } a_j = c_j\Big/\sum_{j=1}^p c_j,$$
where $|x|$ denotes the absolute value of $x$. Such a choice of weights applies more shrinkage to the extreme sample eigenvalues than to those in the center of the eigen-spectrum. The logarithmic term was chosen to scale the weights; this is an arbitrary choice that has worked well in our simulation settings.
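Steps i)-iii) can be transcribed as the following sketch; the function name is ours, and the readings of $b_1$ as the product $\max(\mathrm{diag}(S))\,(1+\sqrt{p/n})^2$ and of $k$ as the first ordered eigenvalue falling below $t$ are our interpretations, given only for illustration.

```python
import numpy as np

def jpen_weights(S, n):
    """Weights a_1,...,a_p for the eigenvalue penalty, following steps i)-iii) (a sketch)."""
    p = S.shape[0]
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]    # sample eigenvalues, decreasing order
    t = eigvals.mean()                                # step i): t = average sample eigenvalue
    below = np.where(eigvals < t)[0]
    k = below[0] + 1 if below.size else p             # k: first ordered eigenvalue below t (our reading)
    r = p / n
    b = np.diag(S).max() * (1 + np.sqrt(p / n)) ** 2  # b_1 (our reading of step i)
    c = np.empty(p)
    for j in range(1, p + 1):                         # step ii): c_j and the recursion for b_{j+1}
        c[j - 1] = b * (1 + 0.005 * np.log(1 + r)) ** abs(j - k)
        b = b ** 2 / c[j - 1]
    return c / c.sum()                                # step iii): a_j = c_j / sum_j c_j
```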
recovery (sparsity) for block
and Toeplitz type covariance matrices based on 50 realizations
for n=50 andp=50. The JPEN estimate of covariance matrix recovers
the true zeros forabout 80% for Toeplitz and block type of
covariance matrices. Our proposedestimator also reect the recovery
of true structure of non-zero entries andany pattern among the
rows/columns of covariance matrix.
Fig 8.1. Heatmap of zeros identified in the covariance matrix out of 50 realizations. A whitish grid cell indicates 50/50 zeros identified; a blackish grid cell indicates 0/50 zeros identified.
Table 8.1 gives the average relative errors and standard errors of the covariance matrix estimates based on glasso, Ledoit-Wolf, PDSCE and JPEN for $n = 100$ and $p = 500, 1000$. The glasso estimate of the covariance matrix performs very poorly among all the methods. The Ledoit-Wolf estimate performs well, but it is generally not sparse, and its eigenvalues are shrunk more heavily towards the center than the true eigenvalues. The JPEN estimator outperforms the other estimators for most values of $p$ for all four types of covariance matrices. The PDSCE estimates have low average relative error, close to that of JPEN; this could be due to the fact that PDSCE and JPEN both use a quadratic optimization criterion, with different penalty functions. Table 8.2 reports the average relative errors and their standard deviations for inverse covariance matrix estimation. Here we do not include the Ledoit-Wolf estimator and only compare the glasso and PDSCE estimates with the proposed JPEN estimator. The JPEN estimate of the inverse covariance matrix outperforms the other methods for all values of $p = 500$
and p = 1000 for all four types of structured inverse covariance
matrices.
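The entries in Tables 8.1 and 8.2 are average relative errors with standard errors in parentheses. The short sketch below shows a relative Frobenius-norm error, which is our assumed reading of the reported metric:

```python
import numpy as np

def relative_error(Sigma_hat, Sigma0):
    """Relative Frobenius-norm error ||Sigma_hat - Sigma0||_F / ||Sigma0||_F
    (our assumed reading of the 'relative error' reported in the tables)."""
    return np.linalg.norm(Sigma_hat - Sigma0, 'fro') / np.linalg.norm(Sigma0, 'fro')
```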
8.2.2. Covariance Matrix Estimation.

Table 8.1
Covariance matrix estimation: average relative error (standard error), n = 100

              Hub type matrix                Neighborhood type matrix
              p=500          p=1000          p=500          p=1000
Ledoit-Wolf   1.07(0.165)    3.47(0.0477)    1.1(0.0331)    2.32(0.0262)
Glasso        9.07(0.167)    10.2(0.022)     9.61(0.0366)   10.4(0.0238)
PDSCE         1.48(0.0709)   2.03(0.0274)    0.844(0.0331)  1.8(0.0263)
JPEN          0.854(0.0808)  1.82(0.0273)    0.846(0.0332)  1.7(0.0263)

              Block type matrix              Toeplitz type matrix
              p=500          p=1000          p=500          p=1000
Ledoit-Wolf   4.271(0.0394)  2.18(0.11)      1.967(0.041)   2.344(0.028)
Glasso        9.442(0.0438)  30.4(0.0875)    12.741(0.051)  18.221(0.0398)
PDSCE         0.941(0.0418)  1.66(0.11)      0.873(0.0415)  1.82(0.028)
JPEN          0.887(0.0411)  1.66(0.11)      0.707(0.0416)  1.816(0.0282)
8.3. Inverse Covariance Matrix Estimation.

Table 8.2
Inverse covariance matrix estimation: average relative error (standard error), n = 100

              Hub type matrix                Neighborhood type matrix
              p=500          p=1000          p=500           p=1000
Glasso        9.82(0.0212)   10.9(0.0204)    12.365(0.0176)  13.084(0.0178)
PDSCE         1.13(0.0269)   2.07(0.0238)    1.74(0.0549)    3.79(0.0676)
JPEN          0.138(0.0153)  0.856(0.0251)   0.260(0.0234)   1.208(0.0277)

              Block type matrix              Toeplitz type matrix
              p=500          p=1000          p=500           p=1000
Glasso        12.4(0.0266)   13.1(0.0171)    19.3(0.0271)    20.7(0.0227)
PDSCE         0.993(0.0375)  1.83(0.0251)    1.89(0.0465)    3.79(0.0382)
JPEN          0.355(0.0319)  1.18(0.0258)    1.24(0.0437)    3.18(0.0432)
Ashwini Maurya
Department of Statistics and Probability
Michigan State University
East Lansing, MI 48824-1027
U.S.A.
E-mail: [email protected]