Transcript of lecture slides dated 2018-11-07.

Page 1:

ELE 538B: Mathematics of High-Dimensional Data

Gaussian Graphical Models and Graphical Lasso

Yuxin Chen

Princeton University, Fall 2018

Page 2:

Multivariate Gaussians

Consider a random vector x ∼ N(0, Σ) with probability density

f(x) = \frac{1}{(2\pi)^{p/2} \det(\Sigma)^{1/2}} \exp\left\{ -\frac{1}{2} x^\top \Sigma^{-1} x \right\} \propto \det(\Theta)^{1/2} \exp\left\{ -\frac{1}{2} x^\top \Theta x \right\}

where Σ = E[xx^⊤] ≻ 0 is the covariance matrix, and Θ = Σ^{-1} is the inverse covariance matrix or precision matrix.

Graphical lasso 10-2
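The two expressions for the density agree because det(Θ) = 1/det(Σ). A minimal numpy sketch (the 3-dimensional Σ below is an assumed toy example) evaluates the density both ways:

```python
import numpy as np

# Assumed toy covariance matrix (positive definite by construction)
p = 3
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
Theta = np.linalg.inv(Sigma)          # precision matrix

x = np.array([0.2, -1.0, 0.7])

# Density written in terms of Sigma
f_sigma = np.exp(-0.5 * x @ np.linalg.inv(Sigma) @ x) / (
    (2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)

# Same density written in terms of Theta, using det(Theta) = 1/det(Sigma)
f_theta = np.linalg.det(Theta) ** 0.5 * np.exp(-0.5 * x @ Theta @ x) / (
    (2 * np.pi) ** (p / 2))
```

Both evaluations return the same number, confirming the proportionality constant.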

Page 3:

Undirected graphical models

x1 ⊥⊥ x4 | {x2, x3, x5, x6, x7, x8}

• Represent a collection of variables x = [x_1, ⋯, x_p]^⊤ by a vertex set V = {1, ⋯, p}
• Encode conditional independence by a set E of edges
  ◦ For any pair of vertices u and v,

    (u, v) ∉ E ⟺ x_u ⊥⊥ x_v | x_{V∖{u,v}}


Page 4:

Gaussian graphical models

Fact 10.1 (Homework)
Consider a Gaussian vector x ∼ N(0, Σ). For any u and v,

x_u ⊥⊥ x_v | x_{V∖{u,v}}   ⟺   Θ_{u,v} = 0,   where Θ = Σ^{-1}

conditional independence ⟺ sparsity

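Fact 10.1 can be checked numerically: the covariance of (x_u, x_v) given the remaining coordinates is the Schur complement Σ_{AA} − Σ_{AB} Σ_{BB}^{-1} Σ_{BA} with A = {u, v}, and its off-diagonal entry vanishes exactly when Θ_{u,v} = 0. A sketch with an assumed tridiagonal precision matrix:

```python
import numpy as np

def cond_cov(Sigma, u, v):
    """Covariance of (x_u, x_v) given all remaining coordinates,
    computed via the Schur complement."""
    p = Sigma.shape[0]
    A = [u, v]
    B = [k for k in range(p) if k not in A]
    SAA = Sigma[np.ix_(A, A)]
    SAB = Sigma[np.ix_(A, B)]
    SBB = Sigma[np.ix_(B, B)]
    return SAA - SAB @ np.linalg.solve(SBB, SAB.T)

# Assumed example: tridiagonal precision matrix (diagonally dominant => PD)
Theta = np.array([[2.0, 0.5, 0.0, 0.0],
                  [0.5, 2.0, 0.5, 0.0],
                  [0.0, 0.5, 2.0, 0.5],
                  [0.0, 0.0, 0.5, 2.0]])
Sigma = np.linalg.inv(Theta)

off_02 = cond_cov(Sigma, 0, 2)[0, 1]   # Theta[0,2] == 0: should vanish
off_01 = cond_cov(Sigma, 0, 1)[0, 1]   # Theta[0,1] != 0: should not
```

Here `off_02` is zero (up to rounding) while `off_01` is not, matching the fact.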

Page 5:

Gaussian graphical models

    ⎡ ∗ ∗ 0 0 ∗ 0 0 0 ⎤
    ⎢ ∗ ∗ 0 0 0 ∗ ∗ 0 ⎥
    ⎢ 0 0 ∗ 0 ∗ 0 0 ∗ ⎥
Θ = ⎢ 0 0 0 ∗ 0 0 ∗ 0 ⎥
    ⎢ ∗ 0 ∗ 0 ∗ 0 0 ∗ ⎥
    ⎢ 0 ∗ 0 0 0 ∗ 0 0 ⎥
    ⎢ 0 ∗ 0 ∗ 0 0 ∗ 0 ⎥
    ⎣ 0 0 ∗ 0 ∗ 0 0 ∗ ⎦

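The graph is read directly off the off-diagonal sparsity pattern of Θ: each nonzero pair (u, v) is an edge. A short sketch listing the edge set of the pattern above (with 1 standing for ∗ and 1-based vertex labels as on the slide):

```python
import numpy as np

# Sparsity pattern of Theta from the slide (1 = "*", 0 = 0)
pattern = np.array([
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 1],
])

# Precision matrices are symmetric, so only the upper triangle is scanned
edges = [(u + 1, v + 1)
         for u in range(8) for v in range(u + 1, 8) if pattern[u, v]]
```

This yields the 8 edges (1,2), (1,5), (2,6), (2,7), (3,5), (3,8), (4,7), (5,8).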

Page 6:

Likelihoods for Gaussian models

Draw n i.i.d. samples x^{(1)}, ⋯, x^{(n)} ∼ N(0, Σ); then the log-likelihood (up to an additive constant) is

\ell(\Theta) = \frac{1}{n} \sum_{i=1}^n \log f(x^{(i)}) = \frac{1}{2} \log\det(\Theta) - \frac{1}{2n} \sum_{i=1}^n x^{(i)\top} \Theta x^{(i)} = \frac{1}{2} \log\det(\Theta) - \frac{1}{2} \langle S, \Theta \rangle,

where S = \frac{1}{n} \sum_{i=1}^n x^{(i)} x^{(i)\top} is the sample covariance, and ⟨S, Θ⟩ = tr(SΘ).

Maximum likelihood estimation

\text{maximize}_{\Theta \succ 0} \quad \log\det(\Theta) - \langle S, \Theta \rangle

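Setting the gradient Θ^{-1} − S of the objective to zero gives the MLE Θ̂ = S^{-1} (when S is invertible). A numpy sketch, using assumed standard-normal data, that checks the objective drops under a symmetric perturbation of S^{-1}:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 500
X = rng.standard_normal((n, p))      # illustrative i.i.d. N(0, I) samples
S = X.T @ X / n                      # sample covariance (full rank: n > p)

def obj(Theta):
    # log det(Theta) - <S, Theta>, the scaled log-likelihood up to constants
    return np.linalg.slogdet(Theta)[1] - np.trace(S @ Theta)

Theta_mle = np.linalg.inv(S)         # stationary point: Theta^{-1} = S

# A random symmetric perturbation, scaled small enough that the
# perturbed matrix stays positive definite, can only lower the objective
D = rng.standard_normal((p, p))
D = (D + D.T) / 2
t = 0.1 / np.abs(np.linalg.eigvalsh(D)).max()
gap = obj(Theta_mle) - obj(Theta_mle + t * D)
```

By strict concavity of log det on the positive definite cone, `gap` is strictly positive for any nonzero perturbation.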

Page 7:

Challenge in high-dimensional regime

Classical theory says the MLE converges to the truth as the sample size n → ∞.

• In practice, we are often in the regime where the sample size is small (n < p)
• In this regime, S is rank-deficient, and the MLE does not even exist (why?)

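The non-existence is easy to see numerically: when n < p, S has a null direction v, and moving Θ along Θ_t = I + t vv^⊤ keeps ⟨S, Θ_t⟩ essentially fixed while log det(Θ_t) = log(1 + t) grows without bound, so the likelihood has no maximizer. A minimal sketch with assumed data:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 10, 5                          # fewer samples than dimensions
X = rng.standard_normal((n, p))
S = X.T @ X / n                       # rank(S) <= n < p: rank-deficient

eigvals, eigvecs = np.linalg.eigh(S)
v = eigvecs[:, 0]                     # direction in the (near-)null space of S

# Objective log det(Theta_t) - <S, Theta_t> along Theta_t = I + t vv^T:
# it keeps increasing as t grows, so the supremum is not attained
vals = [np.linalg.slogdet(np.eye(p) + t * np.outer(v, v))[1]
        - np.trace(S @ (np.eye(p) + t * np.outer(v, v)))
        for t in (1.0, 10.0, 100.0)]
```

The objective values in `vals` increase monotonically with t, confirming the MLE does not exist here.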

Page 8:

Graphical lasso (Friedman, Hastie, & Tibshirani '08)

In practice, many pairs of variables might be conditionally independent ⟺ many missing links in the graphical model (sparsity).

Key idea: use ℓ_1 regularization to promote sparsity:

\text{maximize}_{\Theta \succ 0} \quad \log\det(\Theta) - \langle S, \Theta \rangle - \underbrace{\lambda \|\Theta\|_1}_{\text{lasso penalty}}

• Convex program! (homework)

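Assuming scikit-learn is available, its `GraphicalLasso` estimator solves a variant of this penalized program directly from samples; the sparse precision matrix Θ̂ is exposed as `precision_`. A sketch on data drawn from an assumed tridiagonal precision matrix:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p, n = 5, 2000

# Assumed ground truth: tridiagonal (hence sparse) precision matrix
Theta_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma_true = np.linalg.inv(Theta_true)
X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=n)

# Fit the l1-penalized estimator; alpha plays the role of lambda
model = GraphicalLasso(alpha=0.05).fit(X)
Theta_hat = model.precision_          # estimated sparse precision matrix
```

Larger values of `alpha` drive more off-diagonal entries of `Theta_hat` to exactly zero, i.e. remove more edges from the estimated graph.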

Page 9:

Graphical lasso (Friedman, Hastie, & Tibshirani '08)

\text{maximize}_{\Theta \succ 0} \quad \log\det(\Theta) - \langle S, \Theta \rangle - \underbrace{\lambda \|\Theta\|_1}_{\text{lasso penalty}}

• First-order optimality condition:

  0 \in \Theta^{-1} - S - \lambda \underbrace{\partial \|\Theta\|_1}_{\text{subdifferential}} \qquad (10.1)

• For diagonal entries, one has 1 ∈ ∂|Θ_{i,i}| (since Θ_{i,i} > 0)

  \Longrightarrow \quad (\Theta^{-1})_{i,i} = S_{i,i} + \lambda, \qquad 1 \le i \le p


Page 10:

(Optional) Blockwise coordinate descent

Idea: repeatedly cycle through all columns / rows and, in each step, optimize only a single column / row.

Notation: use W to denote a working version of Θ^{-1}. Partition all matrices into 1 column / row vs. the rest:

\Theta = \begin{bmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^\top & \theta_{22} \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & s_{12} \\ s_{12}^\top & s_{22} \end{bmatrix}, \qquad W = \begin{bmatrix} W_{11} & w_{12} \\ w_{12}^\top & w_{22} \end{bmatrix}

Page 11:

(Optional) Blockwise coordinate descent

Blockwise step: suppose we fix all but the last row / column. It follows from (10.1) that

0 \in W_{11} \beta - s_{12} - \lambda \partial \|\theta_{12}\|_1 = W_{11} \beta - s_{12} + \lambda \partial \|\beta\|_1 \qquad (10.2)

where β = −θ_{12} / \tilde{θ}_{22}, since by the block matrix inverse formula

\begin{bmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^\top & \theta_{22} \end{bmatrix}^{-1} = \begin{bmatrix} * & -\frac{1}{\tilde{\theta}_{22}} \Theta_{11}^{-1} \theta_{12} \\ * & * \end{bmatrix}

with \tilde{\theta}_{22} = \theta_{22} - \theta_{12}^\top \Theta_{11}^{-1} \theta_{12} > 0.

This coincides with the optimality condition for

\text{minimize}_{\beta} \quad \frac{1}{2} \big\| W_{11}^{1/2} \beta - W_{11}^{-1/2} s_{12} \big\|_2^2 + \lambda \|\beta\|_1 \qquad (10.3)


Page 12:

(Optional) Blockwise coordinate descent

Algorithm 10.1 Block coordinate descent for graphical lasso
Initialize W = S + λI and fix its diagonal entries {w_{i,i}}.
Repeat until convergence:
  for j = 1, ⋯, p:
    (i)  Partition W (resp. S) into 4 parts, where the upper-left part consists of all but the jth row / column
    (ii) Solve

         \text{minimize}_{\beta} \quad \frac{1}{2} \big\| W_{11}^{1/2} \beta - W_{11}^{-1/2} s_{12} \big\|_2^2 + \lambda \|\beta\|_1

    (iii) Update w_{12} = W_{11} β
Set \hat{θ}_{12} = −\hat{θ}_{22} β with \hat{θ}_{22} = 1 / (w_{22} − w_{12}^⊤ β)

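Algorithm 10.1 can be sketched in numpy. This is a minimal, assumed implementation (not the authors' reference code): the lasso subproblem (ii) is solved by plain coordinate descent with soft-thresholding, which is the classical approach of Friedman et al. Note that the diagonal of W is fixed at S_{i,i} + λ by initialization, matching the optimality condition (Θ̂^{-1})_{i,i} = S_{i,i} + λ from (10.1).

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(W11, s12, lam, beta, n_sweeps=100):
    """Coordinate descent for 0.5 b'W11 b - b's12 + lam*||b||_1,
    which is subproblem (ii) up to an additive constant."""
    for _ in range(n_sweeps):
        for j in range(len(s12)):
            # partial residual excluding coordinate j
            r = s12[j] - W11[j] @ beta + W11[j, j] * beta[j]
            beta[j] = soft_threshold(r, lam) / W11[j, j]
    return beta

def graphical_lasso(S, lam, n_outer=20):
    p = S.shape[0]
    W = S + lam * np.eye(p)            # initialization; diagonal stays fixed
    betas = np.zeros((p, p - 1))       # warm starts, one beta per column
    for _ in range(n_outer):
        for j in range(p):
            idx = np.arange(p) != j
            W11, s12 = W[np.ix_(idx, idx)], S[idx, j]
            beta = lasso_cd(W11, s12, lam, betas[j])
            W[idx, j] = W11 @ beta     # step (iii): w12 = W11 beta
            W[j, idx] = W[idx, j]      # keep W symmetric
    # recover Theta-hat column by column from the final W and betas
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        theta22 = 1.0 / (W[j, j] - W[idx, j] @ betas[j])
        Theta[j, j] = theta22
        Theta[idx, j] = -theta22 * betas[j]
    return Theta, W
```

In line with Lemma 10.2 below, starting from W = S + λI keeps W positive definite throughout the sweeps.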

Page 13:

(Optional) Blockwise coordinate descent

It remains to ensure W ≻ 0; this is maintained automatically:

Lemma 10.2 (Mazumder & Hastie ’12)

If we start with W � 0 satisfying ‖W − S‖∞ ≤ λ, then everyrow / column update maintains positive definiteness of W .

• If we start with W^{(0)} = S + λI, then ‖W^{(0)} − S‖_∞ = λ, so the lemma applies and every iterate W^{(t)} remains positive definite


Page 14:

Reference

[1] J. Friedman, T. Hastie, and R. Tibshirani, "Sparse inverse covariance estimation with the graphical lasso," Biostatistics, 2008.

[2] R. Mazumder and T. Hastie, "The graphical lasso: new insights and alternatives," Electronic Journal of Statistics, 2012.

[3] T. Hastie, R. Tibshirani, and M. Wainwright, "Statistical Learning with Sparsity: The Lasso and Generalizations," 2015.
