Elementary Estimators for High-Dimensional Statistical Models
Pradeep Ravikumar
Joint work with Eunho Yang and Aurélie C. Lozano
University of Texas, Austin
Jun. 26, 2014
Elementary Estimators for General Moment Parameters Elementary Estimators for Linear Models Elementary Estimators for Gaussian Graphical Models References
Background - High-Dimensional Statistics
When the ambient dimension p is larger than sample size n
Need structural constraints on high-dimensional statistical models:
- Sparsity: only a small number of entries are non-zero
- Group sparsity: only a small number of groups are non-zero
- Low rank: when the parameters are matrix-structured
Surge of Recent Work:
- Linear models: Tibshirani (1996); van de Geer and Buhlmann (2009); Meinshausen and Yu (2009); Candes and Tao (2006); Meinshausen and Buhlmann (2006); Wainwright (2009); Zhao and Yu (2006); Tropp et al. (2006); Zhao et al. (2009); Yuan and Lin (2006); Jacob et al. (2009); Lounici et al. (2009); Baraniuk et al. (2008); Recht et al. (2010); Bach (2008); Negahban et al. (2012) ...
- Inverse covariance estimation: Yuan and Lin (2007); Friedman et al. (2007); Bannerjee et al. (2008); Ravikumar et al. (2011); Boyd and Vandenberghe (2004); Meinshausen and Buhlmann (2006); Cai et al. (2011) ...
- ...

Still expensive for very large-scale problems!
Main Question: If we restrict ourselves to closed-form estimators, can we nonetheless obtain consistent estimators with sharp convergence rates?
Why Closed-Form Estimators?
The current approach to structurally constrained statistical model estimation is two-staged:
- Statistical: devise regularized likelihood-based statistical estimators
- Computational: devise efficient optimization methods, allied with parallel/distributed frameworks, to solve these estimators; increasingly important in modern Big Data settings

Comptastical approach: devise statistical estimators with computational constraints in mind
- Closed-form estimators form a particularly stringent class of computational constraints
- As we will show, they can nonetheless enjoy strong statistical guarantees!
1 Elementary Estimators for General Moment Parameters
2 Elementary Estimators for Linear Models
3 Elementary Estimators for Gaussian Graphical Models
Moment Parameter Estimation
X ∈ R^p: random vector with distribution P
{X_i}, i = 1, ..., n: i.i.d. observations drawn from P

Goal: estimate the moment parameter µ* := E[φ(X)], where φ : R^p → R^m is a vector-valued feature function
WHY NOT Regularized Likelihood-Based Estimators?
A natural distributional setting: an exponential family with sufficient statistics φ(X):

P(X; θ) = exp{⟨θ, φ(X)⟩ − A(θ)}

A natural estimator is the ℓ1-regularized MLE:

minimize_µ { −⟨θ(µ), µ_n⟩ + A(θ(µ)) + λ_n ‖µ‖_1 }

where −⟨θ(µ), µ_n⟩ + A(θ(µ)) is the negative log-likelihood L(µ), and µ_n := (1/n) ∑_{i=1}^n φ(X_i) is the sample moment.
WHY NOT Regularized Likelihood-Based Estimators?
Let us derive a "Dantzig variant" in this general setting. We have:

∇L(µ) = −∇²A*(µ) µ_n + ∇²A*(µ) ∇A(θ(µ)) = ∇²A*(µ)(µ − µ_n).

Then the "Dantzig variant" of the structured moment estimator is:

minimize_µ ‖µ‖_1   s.t. ‖∇²A*(µ)(µ − µ_n)‖_∞ ≤ λ_n

Proposition: The estimation problems above are both non-convex for general exponential families!
General Structured Moment Estimation
Our estimator for general structurally constrained moment parameters:

minimize_µ R(µ)   s.t. R*(µ − µ_n) ≤ λ_n

where R*(a) := sup_{b : R(b) ≠ 0} ⟨a, b⟩ / R(b) is the dual norm of R.

The optimal solution µ̂ is available in closed form! (Provided R(·) is an atomic norm (Chandrasekaran et al., 2010))
Statistical Guarantees for General Structures

Our estimator for general structure:

minimize_µ R(µ)   s.t. R*(µ − µ_n) ≤ λ_n

Theorem: Suppose that the population mean parameter µ* lies in some low-dimensional space M, and that R(·) is decomposable w.r.t. M. Also suppose that we set λ_n ≥ R*(µ* − µ_n). Then,

R*(µ̂ − µ*) ≤ 2λ_n
‖µ̂ − µ*‖_2 ≤ 4λ_n Ψ
R(µ̂ − µ*) ≤ 8λ_n Ψ²

where Ψ := sup_{u ∈ M \ {0}} R(u)/‖u‖.
General Moment Estimation - Sparsity Case
Our estimator for arbitrary moment parameters: given the empirical moment µ_n = (1/n) ∑_{i=1}^n φ(X_i),

minimize_µ ‖µ‖_1   s.t. ‖µ − µ_n‖_∞ ≤ λ_n
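Since the ℓ∞ constraint decouples across coordinates, this program is solved coordinate-wise by soft-thresholding the sample moment. A minimal numpy sketch (function names are my own, not from the talk):

```python
import numpy as np

def soft_threshold(u, lam):
    """Element-wise soft-thresholding: sign(u) * max(|u| - lam, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def elem_moment(X, phi, lam):
    """Closed-form sparse moment estimator: soft-threshold the
    sample moment mu_n = (1/n) sum_i phi(X_i)."""
    mu_n = np.mean([phi(x) for x in X], axis=0)
    return soft_threshold(mu_n, lam)
```

With φ the identity map, this is simply a soft-thresholded sample mean.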
Statistical Guarantees - Sparsity Case
Our estimator for the sparsity case:

minimize_µ ‖µ‖_1   s.t. ‖µ − µ_n‖_∞ ≤ λ_n

Theorem: Suppose that µ* has at most s non-zero elements. Also suppose that we set λ_n ≥ ‖µ* − µ_n‖_∞. We then have:

‖µ̂ − µ*‖_∞ ≤ 2λ_n
‖µ̂ − µ*‖_2 ≤ 4√s λ_n
‖µ̂ − µ*‖_1 ≤ 8 s λ_n
Example: Estimating Covariance
Special case: estimating the covariance matrix

Σ* = E[(X − E(X))(X − E(X))ᵀ]

Figure: Principal component analysis (source: Wikipedia)
Special Case: Sparse Covariance Estimation
Our estimator for covariance estimation:

minimize_Σ ‖Σ‖_1   s.t. ‖S − Σ‖_∞ ≤ λ_n   (1)

where S = (1/n) ∑_{i=1}^n (X_i − X̄)(X_i − X̄)ᵀ and X̄ = (1/n) ∑_{i=1}^n X_i.
Special Case: Sparse Covariance Estimation
Decomposable into element-wise problems:

minimize_{Σ_st} |Σ_st|   s.t. |S_st − Σ_st| ≤ λ_n

The optimal solution Σ̂ of (1) is simply S_{λ_n}(S), where [S_λ(u)]_i = sign(u_i) max(|u_i| − λ, 0).
- Covariance estimation by element-wise soft-thresholding: Rothman et al. (2009) and Bickel and Levina (2008) showed that this estimator is consistent in operator norm.
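A minimal sketch of this closed form (assumption: the diagonal is thresholded too, matching (1) as written; some variants leave the diagonal untouched):

```python
import numpy as np

def soft_threshold(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def elem_sparse_cov(X, lam):
    """Sparse covariance in closed form: element-wise
    soft-thresholding of the sample covariance S."""
    Xc = X - X.mean(axis=0)        # center each column
    S = Xc.T @ Xc / X.shape[0]     # sample covariance, 1/n convention
    return soft_threshold(S, lam)
```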
Statistical Guarantees

Our estimator for covariance estimation:

minimize_Σ ‖Σ‖_1   s.t. ‖S − Σ‖_∞ ≤ λ_n

Theorem: Suppose that the Gaussian's Σ* has at most s non-zero elements. Also suppose that λ_n = c_1 √(log p / n). Then, with high probability,

‖Σ̂ − Σ*‖_∞ ≤ 2c_1 √(log p / n)
‖Σ̂ − Σ*‖_F ≤ 4c_1 √(s log p / n)   (cf. tighter than the previous result: O(√(ps log p / n)))
‖Σ̂ − Σ*‖_1 ≤ 8c_1 s √(log p / n)
Extension to Superposition Structures
µ* = ∑_{α ∈ I} µ*_α, where each µ*_α is a "clean" structured parameter.

Ex: Robust PCA, where Σ* is the sum of a low-rank Θ* and a sparse Γ*

"Elem-Super-Moment" estimators:

minimize_{µ_1, ..., µ_|I|} ∑_{α ∈ I} λ_α R_α(µ_α)   s.t. R*_α(µ_n − ∑_{α ∈ I} µ_α) ≤ λ_α for all α ∈ I
Statistical Guarantees for Superposition Structures

Elem-Super-Moment estimators:

minimize_{µ_1, ..., µ_|I|} ∑_{α ∈ I} λ_α R_α(µ_α)   s.t. R*_α(µ_n − ∑_{α ∈ I} µ_α) ≤ λ_α for all α ∈ I

Theorem: Suppose that µ* = ∑_{α ∈ I} µ*_α, where each µ*_α lies in a low-dimensional subspace M_α, and that each R_α(·) is decomposable w.r.t. the corresponding M_α. Also suppose that we set λ_α ≥ R*_α(µ* − µ_n). We then have:

R*_α(µ̂ − µ*) ≤ 2λ_α
R_α(µ̂_α − µ*_α) ≤ (16|I| / λ_α) (max_{α ∈ I} λ_α Ψ(M_α))²
‖µ̂ − µ*‖_F ≤ 4 √(2|I|) max_{α ∈ I} λ_α Ψ(M_α)
1 Elementary Estimators for General Moment Parameters
2 Elementary Estimators for Linear Models
3 Elementary Estimators for Gaussian Graphical Models
Background - Linear Regression

Consider the linear regression model:

y_i = x_iᵀ θ* + w_i,   i = 1, ..., n

- θ* ∈ R^p: fixed unknown regression parameter of interest
- y_i ∈ R: real-valued response
- x_i ∈ R^p: known observation vector
- w_i ~ N(0, σ²): independent zero-mean Gaussian noise
- Collating the n independent observations: y = Xθ* + w
Background - Linear Regression

Consider the linear regression model:

y_i = x_iᵀ θ* + w_i,   i = 1, ..., n

Used extensively in practical applications:
- Finance: modeling investment risk, spending, demand, etc. (responses) given market conditions (features)
- Epidemiology: linking tobacco smoking (feature) to mortality (response)
Classical Closed-Form Estimators - OLS
When p < n (and XᵀX is full-rank):
- Ordinary least squares (OLS) estimator: (XᵀX)⁻¹ Xᵀ y

When p > n, XᵀX cannot be full-rank:
- The OLS estimator is no longer well-defined.
Classical Closed-Form Estimators - Ridge
Ridge-regularized least squares estimator:

θ̂ = argmin_θ { ‖y − Xθ‖_2² + ε‖θ‖_2² }

with closed form θ̂ = (XᵀX + εI)⁻¹ Xᵀ y.

Ridge estimators are not consistent in high-dimensional sampling regimes!
Variants of Ridge and OLS Closed-Form Estimators
We derived variants of the ridge and OLS closed-form estimators for general structurally constrained linear regression models.
The Elem-OLS Estimator

Recall ordinary least squares: (XᵀX)⁻¹ Xᵀ y.

For any matrix A, we define the element-wise operator T_ν:

[T_ν(A)]_ij = A_ii + ν if i = j;   sign(A_ij) max(|A_ij| − ν, 0) if i ≠ j

⇒ Instead of XᵀX, apply T_ν to obtain T_ν(XᵀX / n)

Assume:
- Each row of X is i.i.d. sampled from N(0, Σ)
- The design matrix X is column-normalized
- The covariance Σ is strictly diagonally dominant

Proposition: For any ν ≥ 8 (max_i Σ_ii) √(10τ log p′ / n), the matrix T_ν(XᵀX / n) is invertible with probability at least 1 − 4/p′^(τ−2), for p′ := max{n, p} and any constant τ > 2.
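A sketch of the T_ν operator under this definition (assumption: the off-diagonal entries are soft-thresholded, as in the paper's soft-thresholding operator S_λ):

```python
import numpy as np

def T_nu(A, nu):
    """Soft-threshold the off-diagonal entries by nu; add nu to the diagonal."""
    out = np.sign(A) * np.maximum(np.abs(A) - nu, 0.0)
    np.fill_diagonal(out, np.diag(A) + nu)
    return out
```

Off-diagonal shrinkage plus diagonal inflation pushes T_ν(XᵀX/n) toward diagonal dominance, which is what makes it invertible with high probability.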
The Elem-OLS Estimator for General Structure
Our Elem-OLS estimator for general structurally constrained linear models:

minimize_θ R(θ)   s.t. R*(θ − [T_ν(XᵀX / n)]⁻¹ Xᵀ y / n) ≤ λ_n   (2)
The Elem-OLS Estimator - Sparsity Case
Our Elem-OLS estimator for the sparsity case:

θ̂ = S_{λ_n}( [T_ν(XᵀX / n)]⁻¹ Xᵀ y / n )

where [S_λ(u)]_i = sign(u_i) max(|u_i| − λ, 0) is the soft-thresholding operator.
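Putting the pieces together, a minimal sketch of the sparse Elem-OLS estimator (helper names are my own, not from the talk):

```python
import numpy as np

def soft_threshold(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def T_nu(A, nu):
    out = np.sign(A) * np.maximum(np.abs(A) - nu, 0.0)
    np.fill_diagonal(out, np.diag(A) + nu)
    return out

def elem_ols(X, y, nu, lam):
    """theta_hat = S_lam( [T_nu(X'X/n)]^{-1} X'y/n ): a linear solve
    and a threshold -- no iterative optimization."""
    n = X.shape[0]
    theta = np.linalg.solve(T_nu(X.T @ X / n, nu), X.T @ y / n)
    return soft_threshold(theta, lam)
```

Note the contrast with the Lasso: the entire estimator is two matrix operations plus an element-wise threshold.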
Statistical Guarantees of Elem-OLS Estimator
Our Elem-OLS estimator for general structurally constrained linear models:

minimize_θ R(θ)   s.t. R*(θ − [T_ν(XᵀX / n)]⁻¹ Xᵀ y / n) ≤ λ_n

Theorem: Suppose that the true parameter θ* lies in some low-dimensional space M, and that R(·) is decomposable w.r.t. M. Denote Ψ(M) := sup_{u ∈ M \ {0}} R(u)/‖u‖. Suppose also that we set λ_n ≥ R*(θ* − [T_ν(XᵀX / n)]⁻¹ Xᵀ y / n). We then have:

R*(θ̂ − θ*) ≤ 2λ_n
‖θ̂ − θ*‖_2 ≤ 4 Ψ(M) λ_n
R(θ̂ − θ*) ≤ 8 [Ψ(M)]² λ_n
Statistical Guarantees of Elem-OLS Estimator - Sparsity

θ* is sparse with k non-zero entries

Corollary: Suppose ν := 8 (max_i Σ_ii) √(10τ log p / n) and λ_n := (1/δ_min)(2σ √(log p′ / n) + c √(log p′ / n) ‖θ*‖_1). Then any optimal solution of the Elem-OLS estimator satisfies

‖θ̂ − θ*‖_∞ ≤ (2/δ_min)(2σ √(log p / n) + c √(log p / n) ‖θ*‖_1)
‖θ̂ − θ*‖_2 ≤ (4/δ_min)(2σ √(k log p / n) + c √(k log p / n) ‖θ*‖_1)
‖θ̂ − θ*‖_1 ≤ (8/δ_min)(2σ k √(log p / n) + c k √(log p / n) ‖θ*‖_1)

with probability at least 1 − c_1 exp(−c_2 p).

Cf.) Similar to the rates of the standard LASSO: ‖θ̂_LASSO − θ*‖_2 ≤ O(√(k log p / n))
Experiments - Simulated Data

y_i = x_iᵀ θ* + w_i,   i = 1, ..., n:
- X ~ N(0, Σ) where Σ_ij = 0.5^|i−j|
- w ~ N(0, 1)
- k := ‖θ*‖_0 = 10
- Non-zero elements of θ* chosen independently and uniformly in (1, 3)

Table: Average performance measure and standard deviation (in parentheses) for ℓ1-penalized comparison methods on simulated data for sparse linear models.
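The simulated-data setup above can be reproduced as follows (n, p, and the noise level σ here are my own illustrative choices; the slide does not fix them):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, sigma = 200, 50, 10, 1.0

# AR(1)-style design covariance: Sigma_ij = 0.5 ** |i - j|
idx = np.arange(p)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# k-sparse parameter with non-zero entries uniform in (1, 3)
theta_star = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
theta_star[support] = rng.uniform(1.0, 3.0, size=k)

# observations y = X theta* + w, with w ~ N(0, sigma^2)
y = X @ theta_star + sigma * rng.standard_normal(n)
```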
1 Elementary Estimators for General Moment Parameters
2 Elementary Estimators for Linear Models
3 Elementary Estimators for Gaussian Graphical Models
Background - Gaussian Graphical Models

Consider X = (X_1, ..., X_p) with Gaussian distribution N(X | µ, Σ):

P(X | θ, Θ) = exp( −(1/2) ⟨⟨Θ, XXᵀ⟩⟩ + ⟨θ, X⟩ − A(Θ, θ) )

The sparsity pattern of Θ = Σ⁻¹ corresponds to the set of edges in the Gaussian Markov random field

ℓ1-regularized maximum likelihood estimator:

minimize_{Θ ≻ 0} ⟨⟨S, Θ⟩⟩ − log det Θ + λ_n ‖Θ‖_{1,off}
The Elementary Estimator for Gaussian Graphical Models
Our Elem-GM estimator for general structurally constrained Gaussian graphical models:

minimize_Θ R(Θ)   s.t. R*(Θ − [T_ν(S)]⁻¹) ≤ λ_n
The Elementary Estimator for Gaussian Graphical Models - Sparsity Case

Our Elem-GM estimator for the sparsity case:

Θ̂ = S_{λ_n}( [T_ν(S)]⁻¹ )

where [S_λ(u)]_i = sign(u_i) max(|u_i| − λ, 0) is the soft-thresholding operator.
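A minimal sketch of the sparse Elem-GM estimator (helper names my own; T_ν as defined earlier for Elem-OLS):

```python
import numpy as np

def soft_threshold(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def T_nu(A, nu):
    out = np.sign(A) * np.maximum(np.abs(A) - nu, 0.0)
    np.fill_diagonal(out, np.diag(A) + nu)
    return out

def elem_gm(X, nu, lam):
    """Theta_hat = S_lam( [T_nu(S)]^{-1} ) for the sample covariance S:
    one matrix inversion plus a threshold, instead of an iterative
    log-determinant solve."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]
    return soft_threshold(np.linalg.inv(T_nu(S, nu)), lam)
```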
Statistical Guarantees of Elem-GM Estimator

Our Elem-GM estimator for general structurally constrained Gaussian graphical models:

minimize_Θ R(Θ)   s.t. R*(Θ − [T_ν(S)]⁻¹) ≤ λ_n

Theorem: Suppose that the true parameter Θ* lies in some low-dimensional space M, and that R(·) is decomposable w.r.t. M. Denote Ψ(M) := sup_{u ∈ M \ {0}} R(u)/‖u‖. Suppose also that we set λ_n ≥ R*(Θ* − [T_ν(S)]⁻¹). We then have:

R*(Θ̂ − Θ*) ≤ 2λ_n
‖Θ̂ − Θ*‖_2 ≤ 4 Ψ(M) λ_n
R(Θ̂ − Θ*) ≤ 8 [Ψ(M)]² λ_n
Statistical Guarantees of Elem-GM Estimator - Sparsity

Θ* is sparse with k non-zero entries

Corollary: Suppose ν := 8 (max_i Σ_ii) √(10τ log p / n) and λ_n := O(√(log p / n)). Then any optimal solution of the elementary Gaussian estimator satisfies

‖Θ̂ − Θ*‖_∞ ≤ O(√(log p / n))
‖Θ̂ − Θ*‖_F ≤ O(√(k log p / n))
‖Θ̂ − Θ*‖_1 ≤ O(k √(log p / n))

with probability at least 1 − c_1 exp(−c_2 p).

Cf.) Asymptotically equivalent to the rates of the standard ℓ1-regularized MLE: ‖Θ̂_{ℓ1 MLE} − Θ*‖_F ≤ O(√(k log p / n))
Experiments

- Approximately 10p non-zero entries in Θ* (random structure)
- λ_n := K √(log p / n)
- (n, p) = (800, 1600)

Table: Performance comparison of our closed-form estimators against the state-of-the-art QUIC algorithm (Hsieh et al., 2011) solving the ℓ1 MLE
Experiments

- Approximately 10p non-zero entries in Θ* (random structure)
- λ_n := K √(log p / n)
- (n, p) = (800, 1600)

[ROC plot: True Positive Rate vs. False Positive Rate for QUIC(1), QUIC(2), QUIC, and Elem-GM]

Figure: Receiver operating characteristic curves for the support-set recovery task.
Experiments

- Approximately 10p non-zero entries in Θ* (random structure)
- λ_n := K √(log p / n)
- (n, p) = (5000, 10000)

Table: Performance comparison of our closed-form estimators against the state-of-the-art QUIC algorithm (Hsieh et al., 2011) solving the ℓ1 MLE
Experiments

- Approximately 10p non-zero entries in Θ* (random structure)
- λ_n := K √(log p / n)
- (n, p) = (5000, 10000)

[ROC plot: True Positive Rate vs. False Positive Rate for QUIC(1), QUIC(2), QUIC, and Elem-GM]

Figure: Receiver operating characteristic curves for the support-set recovery task.
Conclusion
We propose a class of elementary convex estimators for estimating general statistical models
- Available in closed form in many cases
- Provide a unified statistical analysis for general structures

Future work
- Develop this closed-form estimation framework for more general high-dimensional problems
- Extend the framework to non-convex penalty functions
Thank you!
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.
S. van de Geer and P. Buhlmann. On the conditions used to prove oracle results for the lasso. Electronic Journal of Statistics, 3:1360–1392, 2009.
N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics, 37(1):246–270, 2009.
E. Candes and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 2006.
N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436–1462, 2006.
M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Information Theory, 55:2183–2202, May 2009.
P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541–2567, 2006.
J. A. Tropp, A. C. Gilbert, and M. J. Strauss. Algorithms for simultaneous sparse approximation. Signal Processing, 86:572–602, April 2006. Special issue on "Sparse approximations in signal and image processing".
P. Zhao, G. Rocha, and B. Yu. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468–3497, 2009.
M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B, 1(68):49, 2006.
L. Jacob, G. Obozinski, and J. P. Vert. Group Lasso with overlap and graph Lasso. In International Conference on Machine Learning (ICML), pages 433–440, 2009.
K. Lounici, M. Pontil, A. B. Tsybakov, and S. van de Geer. Taking advantage of sparsity in multi-task learning. Technical Report arXiv:0903.1468, ETH Zurich, March 2009.
R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde. Model-based compressive sensing. Technical report, Rice University, 2008. Available at arXiv:0808.3572.
B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
F. Bach. Consistency of trace norm minimization. Journal of Machine Learning Research, 9:1019–1048, June 2008.
S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4):538–557, 2012.
M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, 2007.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical Lasso. Biostatistics, 2007.
O. Bannerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485–516, March 2008.
P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935–980, 2011.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
T. Cai, W. Liu, and X. Luo. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594–607, 2011.
V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. In 48th Annual Allerton Conference on Communication, Control and Computing, 2010.
A. J. Rothman, E. Levina, and J. Zhu. Generalized thresholding of large covariance matrices. Journal of the American Statistical Association (Theory and Methods), 104:177–186, 2009.
P. J. Bickel and E. Levina. Covariance regularization by thresholding. Annals of Statistics, 36(6):2577–2604, 2008.
C.-J. Hsieh, M. Sustik, I. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. In Neural Information Processing Systems (NIPS), 24, 2011.