Covariance estimation with Cholesky decomposition and ...bchang/gmrg/files/Bo_05222015.pdf · Covariance estimation with Cholesky decomposition and generalized linear model Bo Chang

Covariance estimation with Cholesky decomposition and generalized linear model

Bo Chang

Graphical Models Reading Group

May 22, 2015

Bo Chang (UBC) Cholesky decomposition and GLM May 22, 2015 1 / 21

Modified Cholesky decomposition

Goal: Find a re-parameterization of a covariance matrix that is unconstrained and statistically interpretable.

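Assume $Y = (Y_1, \dots, Y_p)'$ is an ordered (time-ordered) random vector with mean 0 and covariance matrix $\Sigma$.

Regress each $Y_t$ on its predecessors:

$$Y_t = \sum_{j=1}^{t-1} \phi_{t,j} Y_j + \varepsilon_t.$$

Let $\sigma_t^2 = \mathrm{Var}(\varepsilon_t)$ and

$$\mathrm{Cov}(\varepsilon) = \mathrm{diag}(\sigma_1^2, \dots, \sigma_p^2) = D.$$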



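Rearranging

$$Y_t = \sum_{j=1}^{t-1} \phi_{t,j} Y_j + \varepsilon_t,$$

we have $TY = \varepsilon$, where

$$T = \begin{pmatrix}
1 & & & & \\
-\phi_{2,1} & 1 & & & \\
-\phi_{3,1} & -\phi_{3,2} & 1 & & \\
\vdots & \vdots & & \ddots & \\
-\phi_{p,1} & -\phi_{p,2} & \cdots & -\phi_{p,p-1} & 1
\end{pmatrix}.$$

$$\mathrm{Cov}(TY) = \mathrm{Cov}(\varepsilon) = T \Sigma T' = D.$$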



Definition: For a positive-definite covariance matrix $\Sigma$, its modified Cholesky decomposition is

$$T \Sigma T' = D,$$

where $T$ is a unique unit lower-triangular matrix having ones on its diagonal and $D$ is a unique diagonal matrix.

Precision matrix can be written as

$$\Sigma^{-1} = T' D^{-1} T.$$

T is unconstrained and statistically meaningful.

T and D can be fitted by regressing a variable Yt on its predecessors.
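The regression interpretation can be checked numerically. The following is a minimal sketch, assuming a known population covariance matrix (the helper name `modified_cholesky` and the random test matrix are illustrative, not from the slides): each row of $T$ comes from regressing $Y_t$ on $Y_1, \dots, Y_{t-1}$, and each diagonal entry of $D$ is the corresponding residual variance.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, D) with T @ sigma @ T.T == D and T unit lower-triangular."""
    p = sigma.shape[0]
    T = np.eye(p)
    d = np.empty(p)
    d[0] = sigma[0, 0]                           # Y_1 has no predecessors
    for t in range(1, p):
        # Population regression of Y_t on Y_1..Y_{t-1}
        phi = np.linalg.solve(sigma[:t, :t], sigma[:t, t])
        T[t, :t] = -phi                          # row t of T holds -phi_{t,j}
        d[t] = sigma[t, t] - sigma[t, :t] @ phi  # innovation variance sigma_t^2
    return T, np.diag(d)

# Illustrative positive-definite matrix (an assumption for the demo)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
sigma = A @ A.T + 5 * np.eye(5)

T, D = modified_cholesky(sigma)
print(np.allclose(T @ sigma @ T.T, D))
print(np.allclose(np.linalg.inv(sigma), T.T @ np.linalg.inv(D) @ T))
```

With sample data, the same construction applies with the sample covariance matrix in place of `sigma`, which amounts to ordinary least-squares regressions.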


Sparse estimation

k-banding:

AR(k) model.

$$Y_t = \sum_{i=1}^{k} \phi_{t,t-i} Y_{t-i} + \varepsilon_t$$

The resulting estimate of the precision matrix is also k-banded.
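A small sketch of k-banding, assuming mean-zero data in an `(n, p)` array and `k >= 1` (the function name `banded_cholesky_fit` and the toy data are hypothetical): each variable is regressed on at most its $k$ closest predecessors, and the implied precision matrix $T'D^{-1}T$ inherits the band structure.

```python
import numpy as np

def banded_cholesky_fit(Y, k):
    """Least-squares fit of T and the diagonal of D using k-banded regressions.

    Y is an (n, p) data matrix whose columns are assumed to have mean zero;
    k >= 1 is the band width.
    """
    n, p = Y.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Y[:, 0] @ Y[:, 0] / n
    for t in range(1, p):
        lo = max(0, t - k)                 # first of the (at most) k predecessors
        X = Y[:, lo:t]
        phi, *_ = np.linalg.lstsq(X, Y[:, t], rcond=None)
        T[t, lo:t] = -phi
        resid = Y[:, t] - X @ phi
        d[t] = resid @ resid / n           # residual variance estimate
    return T, d

# Toy ordered data (an assumption for the demo)
rng = np.random.default_rng(1)
Y = rng.standard_normal((200, 6)).cumsum(axis=1)
Y -= Y.mean(axis=0)

T, d = banded_cholesky_fit(Y, k=2)
precision = T.T @ np.diag(1.0 / d) @ T
i, j = np.indices(precision.shape)
print(np.allclose(precision[np.abs(i - j) > 2], 0.0))
```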


Sparse estimation

k-banding:

Nonparametric estimation: Wu and Pourahmadi (2003) used local polynomial estimators to smooth the subdiagonals of $T$.

$$\sum_{j=0}^{k} f_{j,p}(t/p)\, Y_{t-j} = \sigma_p(t/p)\, \varepsilon_t,$$

where $f_{0,p}(\cdot) = 1$, and $f_{j,p}(\cdot)$ and $\sigma_p(\cdot)$ are continuous functions on $[0, 1]$. The $\varepsilon_t$ are independent with mean 0 and variance 1.

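Matching this to the regression form gives the subdiagonal entries of $T$: $-\phi_{t,t-j} = f_{j,p}(t/p)$, and $\sigma_t = \sigma_p(t/p)$.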


Sparse estimation

Lasso penalty: Huang et al. (2006)

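Minimize

$$n \log|\Sigma| + n\,\mathrm{tr}(D^{-1} T S T') + \lambda \sum_{t=2}^{p} \sum_{j=1}^{t-1} |\phi_{t,j}|,$$

where $S$ is the sample covariance matrix and $\lambda \ge 0$ is the tuning parameter.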

Zeros are placed in T with no regular patterns.

Sparsity of the precision matrix is not guaranteed.
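The last point can be seen in a 3-variable example. This is a sketch with made-up coefficients (the function `penalized_objective` and all numeric values are illustrative assumptions): a zero at $\phi_{2,1}$ does not produce a zero in the corresponding entry of $\Sigma^{-1} = T'D^{-1}T$. The objective uses $\log|\Sigma| = \log|D|$, which holds because $T$ has unit determinant.

```python
import numpy as np

def penalized_objective(T, d, S, n, lam):
    """n log|Sigma| + n tr(D^{-1} T S T') + lam * sum |phi_{t,j}|."""
    fit = n * np.sum(np.log(d)) + n * np.sum(np.diag(T @ S @ T.T) / d)
    penalty = lam * np.sum(np.abs(np.tril(T, k=-1)))  # |-phi| = |phi|
    return fit + penalty

# Irregular zero pattern: phi_{2,1} = 0 but phi_{3,1} = 0.5 and phi_{3,2} = 0.4
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [-0.5, -0.4, 1.0]])
d = np.ones(3)

precision = T.T @ np.diag(1.0 / d) @ T
print(T[1, 0], precision[1, 0])   # zero in T, nonzero in the precision matrix

# Evaluate the objective at S = Sigma for illustration
S = np.linalg.inv(precision)
value = penalized_objective(T, d, S, n=10, lam=2.0)
```

Here the (2,1) entry of the precision matrix equals $\phi_{3,1}\phi_{3,2}/\sigma_3^2 = 0.2$, because variables 1 and 2 are both predictors of variable 3.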


Sparse estimation

Nested lasso penalty / Adaptive banding: Levina et al. (2008)

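Minimize

$$n \log|\Sigma| + n\,\mathrm{tr}(D^{-1} T S T') + \lambda \sum_{t=2}^{p} P(\phi_t),$$

$$P(\phi_t) = |\phi_{t,t-1}| + \frac{|\phi_{t,t-2}|}{|\phi_{t,t-1}|} + \cdots + \frac{|\phi_{t,1}|}{|\phi_{t,2}|},$$

where 0/0 is defined to be zero.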

Select the best model that regresses the $j$th variable on its $k$ closest predecessors, where $k = k_j$ depends on $j$.
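A minimal sketch of the nested penalty for one row, to show why it favors banded rows. The function name and the treatment of a nonzero numerator over a zero denominator as infinite are assumptions made here for illustration; the slides only define 0/0 = 0.

```python
import numpy as np

def nested_lasso_penalty(phi_row):
    """Penalty P(phi_t) for phi_row = (phi_{t,1}, ..., phi_{t,t-1}).

    Entries are ordered from the farthest to the nearest predecessor.
    """
    a = np.abs(np.asarray(phi_row, dtype=float))[::-1]  # a[0] = |phi_{t,t-1}|
    if a.size == 0:
        return 0.0
    total = a[0]
    for num, den in zip(a[1:], a[:-1]):
        if den == 0.0:
            # 0/0 := 0 as in the slides; nonzero/0 taken as infinite (assumption)
            total += 0.0 if num == 0.0 else np.inf
        else:
            total += num / den
    return float(total)

print(nested_lasso_penalty([0.2, 0.4, 0.5]))  # full row
print(nested_lasso_penalty([0.0, 0.0, 0.5]))  # banded row, k = 1
print(nested_lasso_penalty([0.3, 0.0, 0.5]))  # gap inside the band
```

Under this reading, once a coefficient is zero every coefficient farther from the diagonal must also be zero for the penalty to stay finite, which yields a row-specific band length $k_j$.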


Sparse estimation

Forward adaptive banding: Leng and Li (2011)

Minimize modified BIC:

$$n \log|\Sigma| + n\,\mathrm{tr}(D^{-1} T S T') + C_n \log(n) \sum_{j=1}^{p} k_j, \quad \text{s.t. } k_j \le \min\{n/(\log n)^2,\ j-1\},$$

where $k_j$ is the band length.

Fit AR(kj) to obtain T and D.
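A simplified sketch of band-length selection by this criterion. It is not necessarily the authors' forward procedure: it searches all admissible $k_j$ for each row, uses the fact that with the least-squares residual variance plugged in the trace term is constant so that row $j$ contributes $n \log \hat\sigma_j^2(k) + C_n \log(n)\, k$, and takes $C_n = 1$ as a placeholder. The function name and the simulated AR(1) data are assumptions.

```python
import numpy as np

def select_band_lengths(Y, c_n=1.0):
    """Choose a band length for each column of the (n, p) data matrix Y."""
    n, p = Y.shape
    k_max = int(n / np.log(n) ** 2)
    ks = np.zeros(p, dtype=int)
    for j in range(1, p):                  # 0-based column j has j predecessors
        best = np.inf
        for k in range(0, min(k_max, j) + 1):
            if k == 0:
                resid = Y[:, j]
            else:
                X = Y[:, j - k:j]          # the k closest predecessors
                coef, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)
                resid = Y[:, j] - X @ coef
            crit = n * np.log(resid @ resid / n) + c_n * np.log(n) * k
            if crit < best:
                best, ks[j] = crit, k
    return ks

# Simulated AR(1) data with coefficient 0.8 (an assumption for the demo)
rng = np.random.default_rng(0)
n, p = 400, 6
Y = np.empty((n, p))
Y[:, 0] = rng.standard_normal(n)
for t in range(1, p):
    Y[:, t] = 0.8 * Y[:, t - 1] + rng.standard_normal(n)

ks = select_band_lengths(Y)
print(ks)
```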


Cholesky decomposition: summary

Cholesky decomposition is dependent on the order in which the variables appear in the random vector $Y$.

It works when the variables have a natural ordering.
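The order dependence is easy to see numerically. The sketch below uses an illustrative 3 x 3 covariance matrix (an assumption) and obtains $T$ and $D$ from the standard Cholesky factor $\Sigma = LL'$, using $T^{-1} = L\,\mathrm{diag}(L)^{-1}$.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, d) with T @ sigma @ T.T == diag(d)."""
    L = np.linalg.cholesky(sigma)      # sigma = L L'
    s = np.diag(L)                     # innovation standard deviations
    T = np.linalg.inv(L / s)           # L / s rescales columns to a unit diagonal
    return T, s ** 2

sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
perm = [2, 0, 1]                       # reorder the variables as (Y3, Y1, Y2)
sigma_perm = sigma[np.ix_(perm, perm)]

T, d = modified_cholesky(sigma)
T_perm, d_perm = modified_cholesky(sigma_perm)
print(d, d_perm)
```

The innovation variances are (4, 2, 1.5) in the original order and (2, 4, 1.5) after reordering, and the regression coefficients change as well, so sparsity patterns found in $T$ are tied to the chosen ordering.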


GLM for covariance matrices

Another way to reduce the number of covariance parameters is to use covariates, as in modeling the mean vector.

Path of development: linear → log-linear → GLM.


Linear covariance models

Linear covariance models (LCM):

$$\Sigma^{\pm 1} = \alpha_1 U_1 + \cdots + \alpha_q U_q,$$

where the $U_i$'s are some known symmetric basis matrices (covariates) and the $\alpha_i$'s are unknown parameters.

For $q = p^2$, any covariance matrix can be written as:

$$\Sigma = (\sigma_{ij}) = \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma_{ij} U_{ij},$$

where $U_{ij}$ is the matrix with 1 in the $(i, j)$th position and 0 elsewhere.
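A small sketch of a linear covariance model, using a hypothetical two-matrix basis (identity and off-diagonal ones, i.e. a compound-symmetry structure chosen here only for illustration). It also shows that not every choice of $\alpha$ gives a positive-definite matrix, and it rebuilds a matrix from the $p^2$ single-entry basis matrices.

```python
import numpy as np

p = 4
U1 = np.eye(p)                       # hypothetical basis: common variance
U2 = np.ones((p, p)) - np.eye(p)     # hypothetical basis: common covariance

def linear_cov(alpha, basis):
    """Sigma = alpha_1 U_1 + ... + alpha_q U_q."""
    return sum(a * U for a, U in zip(alpha, basis))

def is_positive_definite(M):
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

ok = linear_cov([1.0, 0.3], [U1, U2])    # eigenvalues 0.7 and 1.9
bad = linear_cov([1.0, 1.5], [U1, U2])   # eigenvalue 1 - 1.5 < 0
print(is_positive_definite(ok), is_positive_definite(bad))

# The q = p^2 expansion with single-entry basis matrices U_ij
rebuilt = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        U_ij = np.zeros((p, p))
        U_ij[i, j] = 1.0
        rebuilt += ok[i, j] * U_ij
print(np.allclose(rebuilt, ok))
```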


Linear covariance models

MLE: the score equation of $\alpha_i$ is

$$\mathrm{tr}(\Sigma^{-1} U_i) - \mathrm{tr}(S \Sigma^{-1} U_i \Sigma^{-1}) = 0,$$

which can be solved by an iterative method.

Constraint: αi ’s are restricted so that the matrix is positive definite.

Lack of interpretation.
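The score expression can be checked against a finite-difference gradient. This sketch assumes a Gaussian likelihood with $\Sigma = \sum_i \alpha_i U_i$, so that $-2\,l/n$ equals $\log|\Sigma| + \mathrm{tr}(S\Sigma^{-1})$ up to a constant; the basis, the matrix `S`, and the function names are illustrative.

```python
import numpy as np

def objective(alpha, basis, S):
    """log|Sigma| + tr(S Sigma^{-1}) for Sigma = sum_i alpha_i U_i."""
    sigma = sum(a * U for a, U in zip(alpha, basis))
    _, logdet = np.linalg.slogdet(sigma)
    return logdet + np.trace(S @ np.linalg.inv(sigma))

def score(alpha, basis, S):
    """Vector of tr(Sigma^{-1} U_i) - tr(S Sigma^{-1} U_i Sigma^{-1})."""
    sigma = sum(a * U for a, U in zip(alpha, basis))
    inv = np.linalg.inv(sigma)
    return np.array([np.trace(inv @ U) - np.trace(S @ inv @ U @ inv)
                     for U in basis])

rng = np.random.default_rng(0)
p = 4
basis = [np.eye(p), np.ones((p, p)) - np.eye(p)]
A = rng.standard_normal((p, p))
S = A @ A.T / p + np.eye(p)              # stand-in for a sample covariance
alpha = np.array([1.0, 0.3])

h = 1e-6                                  # central finite differences
numeric = np.array([
    (objective(alpha + h * e, basis, S) - objective(alpha - h * e, basis, S)) / (2 * h)
    for e in np.eye(len(alpha))
])
analytic = score(alpha, basis, S)
print(np.allclose(numeric, analytic, atol=1e-5))
```

An iterative solver (for example Newton or Fisher scoring) would drive `score` to zero while keeping $\Sigma$ positive definite.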


Log-linear covariance models

Log-linear covariance models:

$$\log \Sigma = \alpha_1 U_1 + \cdots + \alpha_q U_q,$$

αi ’s are now unconstrained.
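A sketch of why the log-linear parameters are unconstrained: the matrix exponential of any symmetric matrix is positive definite. The basis matrices and the values of $\alpha$ below are illustrative assumptions; the exponential is computed from an eigendecomposition.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T      # V diag(exp(w)) V'

p = 4
U1 = np.eye(p)
U2 = np.ones((p, p)) - np.eye(p)
alpha = [-1.0, 2.0]

log_sigma = alpha[0] * U1 + alpha[1] * U2
sigma = sym_expm(log_sigma)

# The same alpha in a linear model would not give a covariance matrix
print(np.linalg.eigvalsh(log_sigma).min() < 0)
# In the log-linear model the result is positive definite
print(np.all(np.linalg.eigvalsh(sigma) > 0))
```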


GLM via Cholesky decomposition

Pourahmadi (1999):

Cholesky decomposition: $\Sigma^{-1} = T' D^{-1} T$.

$T$ and $\log D$ are unconstrained.

Parametric models for $\phi_{t,j}$ and $\log \sigma_t^2$:

$$\log \sigma_t^2 = z_t' \lambda, \qquad \phi_{t,j} = w_{t,j}' \gamma,$$

where $z_t$ and $w_{t,j}$ are $q \times 1$ and $d \times 1$ vectors of covariates, and $\lambda$ and $\gamma$ are parameters.

Common covariates are powers of times and lags:

$$z_t = (1, t, t^2, \dots, t^{q-1})',$$

$$w_{t,j} = (1, t - j, (t - j)^2, \dots, (t - j)^{d-1})'.$$
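A minimal sketch of how $\lambda$ and $\gamma$ determine a full covariance matrix with these polynomial covariates. The function name and the parameter values are hypothetical; times are taken as $t = 1, \dots, p$.

```python
import numpy as np

def covariance_from_glm(lam, gam, p):
    """Build T, D and Sigma from log sigma_t^2 = z_t' lam and phi_{t,j} = w_{t,j}' gam."""
    lam = np.asarray(lam, dtype=float)
    gam = np.asarray(gam, dtype=float)
    q, d = len(lam), len(gam)
    T = np.eye(p)
    log_var = np.empty(p)
    for t in range(1, p + 1):
        z_t = np.array([t ** m for m in range(q)], dtype=float)   # powers of time
        log_var[t - 1] = z_t @ lam
        for j in range(1, t):
            w_tj = np.array([(t - j) ** m for m in range(d)], dtype=float)  # powers of lag
            T[t - 1, j - 1] = -(w_tj @ gam)                        # T holds -phi_{t,j}
    D = np.diag(np.exp(log_var))
    T_inv = np.linalg.inv(T)
    sigma = T_inv @ D @ T_inv.T                                    # Sigma = T^{-1} D T^{-T}
    return T, D, sigma

T, D, sigma = covariance_from_glm(lam=[0.1, 0.05], gam=[0.5, -0.1], p=5)
print(np.all(np.linalg.eigvalsh(sigma) > 0))
```

Any real values of `lam` and `gam` give a positive-definite `sigma`, which is the sense in which the parameterization is unconstrained.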


GLM via Cholesky decomposition

Number of parameters: q + d .

Computing MLE is relatively simple:

$$-2\,l(\lambda, \gamma) = n \log|D| + n\,\mathrm{tr}(D^{-1} T S T').$$

Given $D$, the MLE of $T$ has a closed form. Similarly, given $T$, the MLE of $D$ has a closed form.
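A sketch of one of the two closed-form steps, under the simplifying assumption that $D$ is unstructured (no covariates for $\log \sigma_t^2$), which goes beyond what the slide states. Since the objective separates into terms $n \log \sigma_t^2 + n\,(TST')_{tt}/\sigma_t^2$, the minimizer given $T$ is $\hat\sigma_t^2 = (TST')_{tt}$. Function names and test matrices are illustrative.

```python
import numpy as np

def neg2_loglik(T, d, S, n):
    """n log|D| + n tr(D^{-1} T S T'), with d the diagonal of D."""
    return n * np.sum(np.log(d)) + n * np.sum(np.diag(T @ S @ T.T) / d)

def d_given_T(T, S):
    """Closed-form minimizer over an unstructured diagonal D for fixed T."""
    return np.diag(T @ S @ T.T).copy()

rng = np.random.default_rng(0)
p, n = 4, 50
A = rng.standard_normal((p, p))
S = A @ A.T / p + np.eye(p)                                   # stand-in sample covariance
T = np.eye(p) + np.tril(rng.standard_normal((p, p)), k=-1)    # arbitrary unit lower-triangular T

d_hat = d_given_T(T, S)
base = neg2_loglik(T, d_hat, S, n)
print(base < neg2_loglik(T, 1.1 * d_hat, S, n))
print(base < neg2_loglik(T, 0.9 * d_hat, S, n))
```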


References

Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26(3), 369-387.

Pourahmadi, M. (2013). High-Dimensional Covariance Estimation: With High-Dimensional Data. John Wiley & Sons.

Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86(3), 677-690.

Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1), 85-98.

Leng, C., & Li, B. (2011). Forward adaptive banding for estimating large covariance matrices. Biometrika, 98(4), 821-830.

Levina, E., Rothman, A., & Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. The Annals of Applied Statistics, 2(1), 245-263.

Wu, W. B., & Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90(4), 831-844.


The End

