Page 1: Elementary Estimators for High-Dimensional Statistical Models

Elementary Estimators for General Moment Parameters Elementary Estimators for Linear Models Elementary Estimators for Gaussian Graphical Models References

Elementary Estimators for High-Dimensional Statistical Models

Pradeep Ravikumar

Joint work with Eunho Yang and Aurelie C. Lozano

University of Texas, Austin

Jun. 26, 2014

Pradeep Ravikumar (UT Austin) Elementary Estimators for High-Dimensional Statistical Models Jun. 26, 2014 1 / 48

Page 2: Elementary Estimators for High-Dimensional Statistical Models

Background - High-Dimensional Statistics

When the ambient dimension p is larger than the sample size n

Need structural constraints on high-dimensional statistical models:
- Sparsity: only a small number of entries are non-zero
- Group sparsity: only a small number of groups are non-zero
- Low rank: when the parameters are matrix-structured
- ...

ex) Linear models, y = Xθ* + w:
  minimize_θ (1/2n)‖Xθ − y‖2² + λn‖θ‖1

ex) Gaussian graphical models, P(y; Θ*) ∝ exp{ −(1/2)〈〈yy^T, Θ*〉〉 − A(Θ*) }:
  minimize_Θ 〈〈S, Θ〉〉 − log det Θ + λn‖Θ‖1,
  where S := (1/n) ∑_{i=1}^n (x^(i) − x̄)(x^(i) − x̄)^T and x̄ := (1/n) ∑_{i=1}^n x^(i).


Page 5: Elementary Estimators for High-Dimensional Statistical Models

Background - High-Dimensional Statistics

When the ambient dimension p is larger than the sample size n

Surge of recent work:
- (Linear models:) Tibshirani (1996); van de Geer and Buhlmann (2009); Meinshausen and Yu (2009); Candes and Tao (2006); Meinshausen and Buhlmann (2006); Wainwright (2009); Zhao and Yu (2006); Tropp et al. (2006); Zhao et al. (2009); Yuan and Lin (2006); Jacob et al. (2009); Lounici et al. (2009); Baraniuk et al. (2008); Recht et al. (2010); Bach (2008); Negahban et al. (2012); ...
- (Inverse covariance estimation:) Yuan and Lin (2007); Friedman et al. (2007); Banerjee et al. (2008); Ravikumar et al. (2011); Boyd and Vandenberghe (2004); Meinshausen and Buhlmann (2006); Cai et al. (2011); ...

Still expensive for very-large-scale problems!


Page 7: Elementary Estimators for High-Dimensional Statistical Models

Main Question: If we restrict ourselves to closed-form estimators, can we nonetheless obtain consistent estimators with sharp convergence rates?

Page 8: Elementary Estimators for High-Dimensional Statistical Models

Why Closed-Form Estimators?

The current approach to structurally constrained statistical model estimation is two-staged:
- Statistical: devise regularized likelihood-based statistical estimators
- Computational: devise efficient optimization methods, allied with parallel/distributed frameworks, to solve these estimators; increasingly important in modern Big Data settings

"Comptastical" approach: devise statistical estimators with computational constraints in mind
- Closed-form estimators are a particularly stringent class of computational constraints
- As we will show, they can nonetheless enjoy strong statistical guarantees!

Page 9: Elementary Estimators for High-Dimensional Statistical Models


1 Elementary Estimators for General Moment Parameters

2 Elementary Estimators for Linear Models

3 Elementary Estimators for Gaussian Graphical Models


Page 10: Elementary Estimators for High-Dimensional Statistical Models

Moment Parameter Estimation

X ∈ R^p: random vector with distribution P
{Xi}_{i=1}^n: n i.i.d. observations drawn from P

Goal: estimate the moment parameter µ* := E[φ(X)], where φ : R^p → R^m is a vector-valued feature function

Page 11: Elementary Estimators for High-Dimensional Statistical Models

WHY NOT Regularized Likelihood-Based Estimators?

A natural distributional setting: an exponential family with sufficient statistics φ(X):

P(X; θ) = exp{ 〈θ, φ(X)〉 − A(θ) }

A natural estimator is the ℓ1-regularized MLE:

minimize_µ { L(µ) + λn‖µ‖1 },

where L(µ) := −〈θ(µ), µ̂n〉 + A(θ(µ)) is the negative log-likelihood and µ̂n := (1/n) ∑_{i=1}^n φ(Xi) is the sample moment.

Page 12: Elementary Estimators for High-Dimensional Statistical Models

WHY NOT Regularized Likelihood-Based Estimators?

Let us derive a "Dantzig variant" in this general setting. We have:

∇L(µ) = −∇²A*(µ) µ̂n + ∇²A*(µ) ∇A(θ(µ)) = ∇²A*(µ)(µ − µ̂n).

Then the "Dantzig variant" of the structured moment estimator is:

minimize_µ ‖µ‖1   s.t.   ‖∇²A*(µ)(µ − µ̂n)‖∞ ≤ λn.

Proposition: The estimation problems above are both non-convex for general exponential families!

Page 13: Elementary Estimators for High-Dimensional Statistical Models

General Structured Moment Estimation

Our estimator for general structurally constrained moment parameters:

minimize_µ R(µ)   s.t.   R*(µ − µ̂n) ≤ λn,

where R*(a) := sup_{b: R(b)≠0} 〈a, b〉 / R(b) is the dual norm.

The optimal solution µ̂ has a closed form! (Provided R(·) is an atomic norm; Chandrasekaran et al. (2010).)

Page 14: Elementary Estimators for High-Dimensional Statistical Models

Statistical Guarantees for General Structures

Our estimator for general structure:

minimize_µ R(µ)   s.t.   R*(µ − µ̂n) ≤ λn

Theorem: Suppose that the population moment parameter µ* lies in some low-dimensional space M, and that R(·) is decomposable w.r.t. M. Suppose also that we set λn ≥ R*(µ* − µ̂n). Then,

R*(µ̂ − µ*) ≤ 2λn,
‖µ̂ − µ*‖2 ≤ 4λnΨ,
R(µ̂ − µ*) ≤ 8λnΨ²,

where Ψ := sup_{u ∈ M\{0}} R(u)/‖u‖.

Page 15: Elementary Estimators for High-Dimensional Statistical Models

General Moment Estimation - Sparsity Case

Our estimator for sparse moment parameters: given the empirical moment µ̂n = (1/n) ∑_{i=1}^n φ(Xi),

minimize_µ ‖µ‖1   s.t.   ‖µ − µ̂n‖∞ ≤ λn
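Since the ℓ∞ constraint decouples coordinate-wise, the optimum is obtained by soft-thresholding the sample moment element-wise. A minimal numpy sketch (the function and variable names are our own, purely illustrative):

```python
import numpy as np

def soft_threshold(u, lam):
    # Element-wise soft-thresholding operator S_lam
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

rng = np.random.default_rng(0)
mu_star = np.array([3.0, 0.0, 0.0, -2.0, 0.0])     # sparse true moment parameter
# With phi(X) = X, the sample moment is just the empirical mean
X = mu_star + 0.1 * rng.standard_normal((200, 5))  # n = 200 noisy observations
mu_hat_n = X.mean(axis=0)

lam = 0.5
mu_hat = soft_threshold(mu_hat_n, lam)             # closed-form solution

# Feasible by construction, and exactly sparse
assert np.max(np.abs(mu_hat - mu_hat_n)) <= lam
assert np.array_equal(mu_hat != 0, mu_star != 0)
```

Soft-thresholding is the proximal operator of the ℓ1 norm, which is why the ℓ1 objective under the box (ℓ∞) constraint collapses to this one-liner.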


Page 16: Elementary Estimators for High-Dimensional Statistical Models

Statistical Guarantees - Sparsity Case

Our estimator for the sparsity case:

minimize_µ ‖µ‖1   s.t.   ‖µ − µ̂n‖∞ ≤ λn

Theorem: Suppose that µ* has at most s non-zero elements, and that we set λn ≥ ‖µ* − µ̂n‖∞. We then have:

‖µ̂ − µ*‖∞ ≤ 2λn,
‖µ̂ − µ*‖2 ≤ 4√s λn,
‖µ̂ − µ*‖1 ≤ 8 s λn.

Page 17: Elementary Estimators for High-Dimensional Statistical Models

Example: Estimating Covariance

Special case: estimating the covariance matrix:

Σ* = E[ (X − E(X)) (X − E(X))^T ]

Figure: Principal component analysis (source: Wikipedia)

Page 18: Elementary Estimators for High-Dimensional Statistical Models

Special Case: Sparse Covariance Estimation

Our estimator for covariance estimation:

minimize_Σ ‖Σ‖1   s.t.   ‖S − Σ‖∞ ≤ λn   (1)

where S = (1/n) ∑_{i=1}^n (Xi − X̄)(Xi − X̄)^T and X̄ = (1/n) ∑_{i=1}^n Xi.

Page 19: Elementary Estimators for High-Dimensional Statistical Models

Special Case: Sparse Covariance Estimation

Decomposable into element-wise problems:

minimize_{Σst} |Σst|   s.t.   |Sst − Σst| ≤ λn

The optimal solution Σ̂ of (1) is simply Sλn(S), where [Sλ(u)]i = sign(ui) max(|ui| − λ, 0) is element-wise soft-thresholding.

- Covariance estimation by element-wise soft-thresholding: Rothman et al. (2009) and Bickel and Levina (2008) showed it is consistent in operator norm.
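For the covariance special case, the whole estimator is a couple of numpy calls once S is formed; a sketch (the constant in our λn is ad hoc, chosen only for illustration):

```python
import numpy as np

def soft_threshold(A, lam):
    # Element-wise soft-thresholding operator S_lam
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

rng = np.random.default_rng(1)
n, p = 500, 8
Sigma_star = np.eye(p)                      # sparse true covariance:
Sigma_star[0, 1] = Sigma_star[1, 0] = 0.4   # identity plus one off-diagonal pair
X = rng.multivariate_normal(np.zeros(p), Sigma_star, size=n)

S = np.cov(X, rowvar=False, bias=True)      # sample covariance, 1/n normalization
lam = 2.0 * np.sqrt(np.log(p) / n)          # lambda_n ~ sqrt(log p / n)
Sigma_hat = soft_threshold(S, lam)          # closed-form estimate

assert np.allclose(Sigma_hat, Sigma_hat.T)  # symmetry is preserved
```

Per the slide, all entries of S (including the diagonal) are thresholded; variants that threshold only the off-diagonal entries also appear in the literature.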


Page 20: Elementary Estimators for High-Dimensional Statistical Models

Statistical Guarantees

Our estimator for covariance estimation:

minimize_Σ ‖Σ‖1   s.t.   ‖S − Σ‖∞ ≤ λn

Theorem: Suppose that the Gaussian covariance Σ* has at most s non-zero elements, and that λn = c1 √(log p / n). Then, with high probability,

‖Σ̂ − Σ*‖∞ ≤ 2 c1 √(log p / n),
‖Σ̂ − Σ*‖F ≤ 4 c1 √(s log p / n)   (cf. tighter than the previous rate O(√(p s log p / n))),
‖Σ̂ − Σ*‖1 ≤ 8 c1 s √(log p / n).

Page 21: Elementary Estimators for High-Dimensional Statistical Models

Extension to Superposition Structures

µ* = ∑_{α∈I} µ*α, where each µ*α is a "clean" structured parameter.

Ex: Robust PCA, where Σ* is the sum of a low-rank Θ* and a sparse Γ*.

"Elem-Super-Moment" estimators:

minimize_{µ1,…,µ|I|} ∑_{α∈I} λα Rα(µα)   s.t.   R*α( µ̂n − ∑_{α∈I} µα ) ≤ λα for all α ∈ I.

Page 22: Elementary Estimators for High-Dimensional Statistical Models

Statistical Guarantees for General Structures

Elem-Super-Moment estimators:

minimize_{µ1,…,µ|I|} ∑_{α∈I} λα Rα(µα)   s.t.   R*α( µ̂n − ∑_{α∈I} µα ) ≤ λα for all α ∈ I.

Theorem: Suppose that µ* = ∑_{α∈I} µ*α, where each µ*α lies in a low-dimensional subspace Mα, and that each Rα(·) is decomposable w.r.t. the corresponding Mα. Suppose also that we set λα ≥ R*α(µ* − µ̂n). We then have:

R*α(µ̂ − µ*) ≤ 2λα,
Rα(µ̂α − µ*α) ≤ (16|I| / λα) ( max_{α∈I} λα Ψ(Mα) )²,
‖µ̂ − µ*‖F ≤ 4√(2|I|) max_{α∈I} λα Ψ(Mα).

Page 23: Elementary Estimators for High-Dimensional Statistical Models

Experiments - Simulations

Σ* = Σ*1 + Σ*2, where Σ*1 = 0.5(1_p 1_p^T) and Σ*2 = I_{p/5} ⊗ (0.2(1_5 1_5^T) + 0.2 I_5)

Method             Spectral      Frobenius     Nuclear        Matrix 1-norm
n=100, p=200:
Elem-Super-Moment  7.10 (0.15)   8.56 (0.18)   35.87 (0.43)   11.65 (0.12)
Thresholding       8.30 (0.17)   10.43 (0.11)  45.84 (0.39)   19.85 (0.21)
Well-conditioned   12.22 (0.12)  13.19 (0.17)  48.11 (0.45)   23.89 (0.18)
n=100, p=400:
Elem-Super-Moment  25.63 (0.54)  26.67 (0.49)  198.76 (1.31)  50.77 (0.72)
Thresholding       33.55 (0.49)  41.91 (0.60)  331.41 (2.05)  67.64 (0.73)
Well-conditioned   35.71 (0.50)  34.83 (0.46)  207.97 (2.27)  93.60 (0.91)

Page 24: Elementary Estimators for High-Dimensional Statistical Models


1 Elementary Estimators for General Moment Parameters

2 Elementary Estimators for Linear Models

3 Elementary Estimators for Gaussian Graphical Models


Page 25: Elementary Estimators for High-Dimensional Statistical Models

Background - Linear Regression

Consider the linear regression model:

yi = xi^T θ* + wi,  i = 1, …, n,

- θ* ∈ R^p: fixed unknown regression parameter of interest
- yi ∈ R: real-valued response
- xi ∈ R^p: known observation vector
- wi ∼ N(0, σ²): independent zero-mean Gaussian noise
- Collating the n independent observations: y = Xθ* + w

Page 26: Elementary Estimators for High-Dimensional Statistical Models

Background - Linear Regression

Consider the linear regression model:

yi = xi^T θ* + wi,  i = 1, …, n

Used extensively in practical applications:
- Finance: modeling investment risk, spending, demand, etc. (responses) given market conditions (features)
- Epidemiology: linking tobacco smoking (feature) to mortality (response)

Page 27: Elementary Estimators for High-Dimensional Statistical Models

Classical Closed-Form Estimators - OLS

When p < n (and X^T X is full-rank):
- Ordinary least squares (OLS) estimator: (X^T X)^{-1} X^T y

When p > n, X^T X cannot be full-rank:
- The OLS estimator is no longer well-defined.

Page 28: Elementary Estimators for High-Dimensional Statistical Models

Classical Closed-Form Estimators - Ridge

Ridge-regularized least squares estimator:

θ̂ = argmin_θ { ‖y − Xθ‖2² + ε‖θ‖2² },

with the closed form θ̂ = (X^T X + εI)^{-1} X^T y.

Ridge estimators are not consistent in high-dimensional sampling regimes!
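Setting the gradient of the ridge objective to zero gives (X^T X + εI)θ = X^T y, which is solvable for any ε > 0 even when p > n. A quick numerical sanity check (toy sizes of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 100                     # high-dimensional regime: p > n
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
eps = 0.1

# Closed-form ridge solution; X'X + eps*I is invertible even though X'X is rank-deficient
theta = np.linalg.solve(X.T @ X + eps * np.eye(p), X.T @ y)

# Stationarity condition of the ridge objective holds
grad = 2 * X.T @ (X @ theta - y) + 2 * eps * theta
assert np.allclose(grad, 0.0, atol=1e-8)
```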



Page 30: Elementary Estimators for High-Dimensional Statistical Models

Variants of Ridge and OLS Closed-Form Estimators

We derived variants of ridge and OLS closed-form estimators for general structurally constrained linear regression models.

Page 31: Elementary Estimators for High-Dimensional Statistical Models

The Elem-OLS Estimator

Recall ordinary least squares: (X^T X)^{-1} X^T y.

For any matrix A, we define the element-wise operator Tν:

[Tν(A)]ij = Aii + ν if i = j;  sign(Aij) max(|Aij| − ν, 0) otherwise.

⇒ Instead of X^T X, apply Tν to obtain Tν(X^T X / n).

Assumptions:
- Each row of X is sampled i.i.d. from N(0, Σ)
- The design matrix X is column-normalized
- The covariance Σ is strictly diagonally dominant

Proposition: For any ν ≥ 8(max_i Σii)√(10τ log p′ / n), the matrix Tν(X^T X / n) is invertible with probability at least 1 − 4/p′^(τ−2), for p′ := max{n, p} and any constant τ > 2.


Page 34: Elementary Estimators for High-Dimensional Statistical Models

The Elem-OLS Estimator for General Structure

Our Elem-OLS estimator for general structurally constrained linear models:

minimize_θ R(θ)   s.t.   R*( θ − [Tν(X^T X / n)]^{-1} X^T y / n ) ≤ λn.   (2)

Page 35: Elementary Estimators for High-Dimensional Statistical Models

The Elem-OLS Estimator - Sparsity Case

Our Elem-OLS estimator for the sparsity case:

θ̂ = Sλn( [Tν(X^T X / n)]^{-1} X^T y / n ),

where [Sλ(u)]i = sign(ui) max(|ui| − λ, 0) is element-wise soft-thresholding.
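The full sparsity-case pipeline is a handful of matrix operations: threshold X^T X/n with Tν, invert, multiply by X^T y/n, soft-threshold. A sketch with ad hoc constants for ν and λn (not the theorem's exact choices):

```python
import numpy as np

def soft_threshold(A, lam):
    # Element-wise soft-thresholding S_lam
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def T_nu(A, nu):
    # T_nu: add nu to the diagonal, soft-threshold the off-diagonals by nu
    out = soft_threshold(A, nu)
    np.fill_diagonal(out, np.diag(A) + nu)
    return out

def elem_ols(X, y, nu, lam):
    n = X.shape[0]
    B = T_nu(X.T @ X / n, nu)               # invertible surrogate for X'X/n
    theta_init = np.linalg.solve(B, X.T @ y / n)
    return soft_threshold(theta_init, lam)  # final soft-thresholding step

rng = np.random.default_rng(3)
n, p = 400, 800                             # high-dimensional: p > n
theta_star = np.zeros(p)
theta_star[:5] = 2.0                        # k = 5 sparse true parameter
X = rng.standard_normal((n, p))
y = X @ theta_star + 0.1 * rng.standard_normal(n)

nu = 2.0 * np.sqrt(np.log(p) / n)           # ad hoc constants, not the
lam = 0.3                                   # theorem's exact choices
theta_hat = elem_ols(X, y, nu, lam)
```

Total cost is one p x p linear solve plus matrix multiplies; no iterative optimization is run.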


Page 36: Elementary Estimators for High-Dimensional Statistical Models

Statistical Guarantees of Elem-OLS Estimator

Our Elem-OLS estimator for general structurally constrained linear models:

minimize_θ R(θ)   s.t.   R*( θ − [Tν(X^T X / n)]^{-1} X^T y / n ) ≤ λn.

Theorem: Suppose that the true parameter θ* lies in some low-dimensional space M, and that R(·) is decomposable w.r.t. M. Denote Ψ(M) := sup_{u∈M\{0}} R(u)/‖u‖. Suppose also that we set λn ≥ R*( θ* − [Tν(X^T X / n)]^{-1} X^T y / n ). We then have:

R*(θ̂ − θ*) ≤ 2λn,
‖θ̂ − θ*‖2 ≤ 4Ψ(M)λn,
R(θ̂ − θ*) ≤ 8[Ψ(M)]² λn.

Page 37: Elementary Estimators for High-Dimensional Statistical Models

Statistical Guarantees of Elem-OLS Estimator - Sparsity

θ* is sparse with k non-zero entries.

Corollary: Suppose ν := 8(max_i Σii)√(10τ log p / n) and λn := (1/δmin)( 2σ√(log p′ / n) + c√(log p′ / n)‖θ*‖1 ). Then any optimal solution of the Elem-OLS estimator satisfies

‖θ̂ − θ*‖∞ ≤ (2/δmin)( 2σ√(log p / n) + c√(log p / n)‖θ*‖1 ),
‖θ̂ − θ*‖2 ≤ (4/δmin)( 2σ√(k log p / n) + c√(k log p / n)‖θ*‖1 ),
‖θ̂ − θ*‖1 ≤ (8/δmin)( 2σk√(log p / n) + ck√(log p / n)‖θ*‖1 ),

with probability at least 1 − c1 exp(−c2 p).

Cf. similar to the rate of the standard LASSO: ‖θ̂_LASSO − θ*‖2 ≤ O(√(k log p / n)).

Page 38: Elementary Estimators for High-Dimensional Statistical Models

Experiments - Simulated Data

yi = xi^T θ* + wi,  i = 1, …, n:
- X ∼ N(0, Σ), where Σij = 0.5^|i−j|
- w ∼ N(0, 1)
- k := ‖θ*‖0 = 10
- Non-zero elements of θ* chosen independently and uniformly in (1, 3)

Table: Average performance measure (standard deviation in parentheses) for ℓ1-penalized comparison methods on simulated data for sparse linear models.

Method      TP             FP            ℓ2             ℓ∞
n=1000, p=1000:
Elem-OLS    100.00 (0.00)  2.05 (1.15)   0.551 (0.071)  0.255 (0.041)
Elem-Ridge  100.00 (0.00)  2.44 (2.12)   0.741 (0.411)  0.435 (0.064)
LASSO       100.00 (0.00)  9.84 (2.45)   0.563 (0.067)  0.270 (0.039)
Thr-LASSO   100.00 (0.00)  8.33 (1.14)   0.560 (0.066)  0.274 (0.071)
OMP         98.24 (0.64)   3.20 (1.38)   0.559 (0.113)  0.282 (0.055)
n=1000, p=2000:
Elem-OLS    100.00 (0.00)  2.22 (2.02)   0.656 (0.111)  0.314 (0.071)
Elem-Ridge  100.00 (0.00)  11.94 (4.48)  3.883 (0.411)  1.678 (0.349)
LASSO       100.00 (0.00)  18.88 (6.93)  0.657 (0.110)  0.316 (0.075)
Thr-LASSO   99.59 (0.36)   14.35 (2.66)  0.656 (0.099)  0.315 (0.052)
OMP         96.36 (1.00)   10.25 (4.24)  0.735 (0.222)  0.536 (0.136)

Page 39: Elementary Estimators for High-Dimensional Statistical Models


1 Elementary Estimators for General Moment Parameters

2 Elementary Estimators for Linear Models

3 Elementary Estimators for Gaussian Graphical Models


Page 40: Elementary Estimators for High-Dimensional Statistical Models

Background - Gaussian Graphical Models

Consider X = (X1, …, Xp) with Gaussian distribution N(X | µ, Σ):

P(X | θ, Θ) = exp( −(1/2)〈〈Θ, XX^T〉〉 + 〈θ, X〉 − A(Θ, θ) )

The non-zero pattern of Θ = Σ^{-1} corresponds to the set of edges in the Gaussian Markov random field.

ℓ1-regularized maximum likelihood estimator:

minimize_{Θ≻0} 〈〈S, Θ〉〉 − log det Θ + λn‖Θ‖1,off

Page 41: Elementary Estimators for High-Dimensional Statistical Models

The Elementary Estimator for Gaussian Graphical Models

Our Elem-GM estimator for general structurally constrained Gaussian graphical models:

minimize_Θ R(Θ)   s.t.   R*( Θ − [Tν(S)]^{-1} ) ≤ λn

Page 42: Elementary Estimators for High-Dimensional Statistical Models

The Elementary Estimator for Gaussian Graphical Models - Sparsity Case

Our Elem-GM estimator for the sparsity case:

Θ̂ = Sλn( [Tν(S)]^{-1} ),

where [Sλ(u)]i = sign(ui) max(|ui| − λ, 0) is element-wise soft-thresholding.
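The Elem-GM recipe is equally short: form S, apply Tν, invert once, soft-threshold. A sketch on a chain-structured precision matrix (the constants are illustrative, not the theory's):

```python
import numpy as np

def soft_threshold(A, lam):
    # Element-wise soft-thresholding S_lam
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def T_nu(A, nu):
    # Add nu to the diagonal, soft-threshold the off-diagonal entries by nu
    out = soft_threshold(A, nu)
    np.fill_diagonal(out, np.diag(A) + nu)
    return out

def elem_gm(X, nu, lam):
    S = np.cov(X, rowvar=False, bias=True)  # sample covariance
    return soft_threshold(np.linalg.inv(T_nu(S, nu)), lam)

# Sparse true precision matrix: a tridiagonal chain graph
rng = np.random.default_rng(4)
p, n = 10, 2000
Theta_star = np.eye(p)
for i in range(p - 1):
    Theta_star[i, i + 1] = Theta_star[i + 1, i] = 0.25
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta_star), size=n)

nu = lam = 0.1   # illustrative; the theory scales both as sqrt(log p / n)
Theta_hat = elem_gm(X, nu, lam)
```

One p x p inversion replaces the iterative log-determinant optimization of the ℓ1-regularized MLE.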


Page 43: Elementary Estimators for High-Dimensional Statistical Models

Statistical Guarantees of Elem-GM Estimator

Our Elem-GM estimator for general structurally constrained Gaussian graphical models:

minimize_Θ R(Θ)   s.t.   R*( Θ − [Tν(S)]^{-1} ) ≤ λn

Theorem: Suppose that the true parameter Θ* lies in some low-dimensional space M, and that R(·) is decomposable w.r.t. M. Denote Ψ(M) := sup_{u∈M\{0}} R(u)/‖u‖. Suppose also that we set λn ≥ R*( Θ* − [Tν(S)]^{-1} ). We then have:

R*(Θ̂ − Θ*) ≤ 2λn,
‖Θ̂ − Θ*‖2 ≤ 4Ψ(M)λn,
R(Θ̂ − Θ*) ≤ 8[Ψ(M)]² λn.

Page 44: Elementary Estimators for High-Dimensional Statistical Models

Elementary Estimators for General Moment Parameters Elementary Estimators for Linear Models Elementary Estimators for Gaussian Graphical Models References

Statistical Guarantees of Elem-GM Estimator - Sparsity

Θ∗ is sparse with k non-zero entries

Corollary
Suppose ν := 8(max_i Σ_ii)√(10τ log p / n) and λ_n := O(√(log p / n)). Then any optimal solution of the elementary Gaussian estimator satisfies

    ‖Θ̂ − Θ*‖_∞ ≤ O(√(log p / n)) ,
    ‖Θ̂ − Θ*‖_F ≤ O(√(k log p / n)) ,
    ‖Θ̂ − Θ*‖_1 ≤ O(k √(log p / n)) ,

with probability at least 1 − c1 exp(−c2 p).

Cf.) Asymptotically equivalent to the rates of the standard ℓ1-regularized MLE:

    ‖Θ̂_{ℓ1-MLE} − Θ*‖_F ≤ O(√(k log p / n))
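The corollary's rates can be read off from the theorem, assuming R is the entrywise ℓ1 norm and ‖·‖ the Frobenius norm, for which the k-sparse subspace M gives Ψ(M) = √k:

```latex
% \Psi(M) = \sup_{u \in M \setminus \{0\}} \|u\|_1 / \|u\|_F = \sqrt{k}
% for the k-sparse subspace, so with \lambda_n = O(\sqrt{\log p / n}):
\begin{align*}
\|\widehat{\Theta} - \Theta^*\|_F &\le 4\,\Psi(M)\,\lambda_n
    = O\big(\sqrt{k \log p / n}\big), \\
\|\widehat{\Theta} - \Theta^*\|_1 &\le 8\,[\Psi(M)]^2\,\lambda_n
    = O\big(k \sqrt{\log p / n}\big).
\end{align*}
```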


Experiments

Approximately 10p non-zero entries in Θ∗ (random structure)

λ_n := K√(log p / n)

(n, p) = (800, 1600)

Table: Performance comparison of our closed-form estimator against the state-of-the-art QUIC algorithm (Hsieh et al., 2011) solving the ℓ1-regularized MLE.

              K      Time (sec)   ℓ_F (off)   ℓ_∞ (off)   FPR    TPR
    Elem-GM   0.01   < 1          6.36        0.1616      0.48   0.99
              0.02   < 1          6.19        0.1880      0.24   0.99
              0.05   < 1          5.91        0.1655      0.06   0.99
              0.1    < 1          6           0.1703      0.01   0.97
    QUIC      0.5    2575.5       12.74       0.11        0.52   1.00
              1      1009         7.30        0.13        0.35   0.99
              2      272.1        6.33        0.18        0.16   0.99
              3      78.1         6.97        0.21        0.07   0.94
              4      28.7         7.68        0.23        0.02   0.86
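The FPR/TPR columns measure off-diagonal support recovery. A hypothetical helper (not from the paper's code) that computes them from an estimate and the truth might look like:

```python
import numpy as np

def support_metrics(theta_hat, theta_star, tol=0.0):
    """False and true positive rates over off-diagonal entries:
    an entry counts as 'selected' if its magnitude exceeds tol."""
    p = theta_star.shape[0]
    off = ~np.eye(p, dtype=bool)             # mask out the diagonal
    est = np.abs(theta_hat[off]) > tol       # selected entries
    true = np.abs(theta_star[off]) > tol     # true support
    tpr = est[true].mean() if true.any() else 1.0
    fpr = est[~true].mean() if (~true).any() else 0.0
    return fpr, tpr
```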


Experiments

Approximately 10p non-zero entries in Θ* (random structure); λ_n := K√(log p / n); (n, p) = (800, 1600)

Figure: Receiver operating characteristic curves (true positive rate vs. false positive rate) for the support-set recovery task, comparing QUIC(1), QUIC(2), QUIC, and Elem-GM.


Experiments

Approximately 10p non-zero entries in Θ∗ (random structure)

λ_n := K√(log p / n)

(n, p) = (5000, 10000)

Table: Performance comparison of our closed-form estimator against the state-of-the-art QUIC algorithm (Hsieh et al., 2011) solving the ℓ1-regularized MLE.

              K      Time (sec)   ℓ_F (off)   ℓ_∞ (off)   FPR    TPR
    Elem-GM   0.05   47.3         11.73       0.1501      0.13   1.00
              0.1    46.3         8.91        0.1479      0.03   1.00
              0.5    45.8         5.66        0.1308      0.0    1.00
              1      46.2         8.63        0.1111      0.0    0.99
    QUIC      2      *            *           *           *      *
              2.5    *            *           *           *      *
              3      4.8 × 10^4   9.85        0.1083      0.06   1.00
              3.5    2.7 × 10^4   10.51       0.1111      0.04   0.99


Experiments

Approximately 10p non-zero entries in Θ* (random structure); λ_n := K√(log p / n); (n, p) = (5000, 10000)

Figure: Receiver operating characteristic curves (true positive rate vs. false positive rate) for the support-set recovery task, comparing QUIC(1), QUIC(2), QUIC, and Elem-GM.


Conclusion

We propose a class of elementary convex estimators for estimating general statistical models:
  - Available in closed form in many cases
  - Provide a unified statistical analysis for general structures

Future work:
  - Develop this closed-form estimation framework for more general high-dimensional problems
  - Extend the framework to non-convex penalty functions


Thank you!


R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

S. van de Geer and P. Bühlmann. On the conditions used to prove oracle results for the lasso. Electronic Journal of Statistics, 3:1360–1392, 2009.

N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics, 37(1):246–270, 2009.

E. Candes and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 2006.

N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436–1462, 2006.

M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Information Theory, 55:2183–2202, May 2009.

P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541–2567, 2006.

J. A. Tropp, A. C. Gilbert, and M. J. Strauss. Algorithms for simultaneous sparse approximation. Signal Processing, 86:572–602, April 2006. Special issue on "Sparse approximations in signal and image processing".


P. Zhao, G. Rocha, and B. Yu. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468–3497, 2009.

M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B, 1(68):49, 2006.

L. Jacob, G. Obozinski, and J. P. Vert. Group Lasso with Overlap and Graph Lasso. In International Conference on Machine Learning (ICML), pages 433–440, 2009.

K. Lounici, M. Pontil, A. B. Tsybakov, and S. van de Geer. Taking advantage of sparsity in multi-task learning. Technical Report arXiv:0903.1468, ETH Zurich, March 2009.

R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde. Model-based compressive sensing. Technical report, Rice University, 2008. Available at arXiv:0808.3572.

B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.

F. Bach. Consistency of trace norm minimization. Journal of Machine Learning Research, 9:1019–1048, June 2008.

S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4):538–557, 2012.


M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, 2007.

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical Lasso. Biostatistics, 2007.

O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485–516, March 2008.

P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935–980, 2011.

S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.

T. Cai, W. Liu, and X. Luo. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594–607, 2011.

V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. In 48th Annual Allerton Conference on Communication, Control and Computing, 2010.


A. J. Rothman, E. Levina, and J. Zhu. Generalized thresholding of large covariance matrices. Journal of the American Statistical Association (Theory and Methods), 104:177–186, 2009.

P. J. Bickel and E. Levina. Covariance regularization by thresholding. Annals of Statistics, 36(6):2577–2604, 2008.

C.-J. Hsieh, M. Sustik, I. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. In Neural Information Processing Systems (NIPS), 24, 2011.
