Penalized Maximum Likelihood Inference for Sparse Gaussian Graphical Models with Latent Structure
Christophe Ambroise, Julien Chiquet and Catherine Matias
Laboratoire Statistique et Génome, La génopole - Université d'Évry
Statistique et santé publique seminar, January 13, 2009
Inferring Sparse Networks with Latent Structure
Biological networks
Different kinds of biological interactions
Banerjee et al., Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, JMLR, 2008.
We deal with a more complex penalty term here.
Let us work on the covariance matrix
Proposition. The maximization problem over K is equivalent to the following one, dealing with the covariance matrix Σ:

$$\widehat{\Sigma} = \underset{\|(\Sigma - S_n)\,./\,P\|_\infty \le 1}{\arg\max} \ \log\det(\Sigma),$$

where $./$ is the term-by-term division and

$$P = (p_{ij})_{i,j\in\mathcal{P}}, \qquad p_{ij} = \frac{2}{n}\sum_{q,\ell}\tau_{iq}\,\tau_{j\ell}\,\lambda_{q\ell}.$$

The proof uses standard optimization and primal/dual arguments.
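For illustration, this penalty matrix can be assembled directly from the p × Q matrix of class memberships τ and the Q × Q matrix of penalty levels λ; the sketch below (in R, with function and variable names of our own choosing, not taken from the SIMoNe package) is just the matrix form of the definition above.

## Sketch (not SIMoNe code): build the p x p penalty matrix P from the p x Q membership
## matrix tau and the Q x Q matrix Lambda of penalty levels lambda_ql, following
## p_ij = (2/n) * sum_{q,l} tau_iq * tau_jl * lambda_ql, i.e. a plain matrix product.
build_penalty <- function(tau, Lambda, n) {
  (2 / n) * tau %*% Lambda %*% t(tau)
}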
A Block-wise resolution
Denote
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \sigma_{12} \\ \sigma_{12}^\top & \Sigma_{22} \end{bmatrix},
\quad
S_n = \begin{bmatrix} S_{11} & s_{12} \\ s_{12}^\top & S_{22} \end{bmatrix},
\quad
P = \begin{bmatrix} P_{11} & p_{12} \\ p_{12}^\top & P_{22} \end{bmatrix},
\qquad (2)$$

where $\Sigma_{11}$ is a $(p-1)\times(p-1)$ matrix, $\sigma_{12}$ is a column vector of length $p-1$ and $\Sigma_{22}$ is a scalar.

Each column of Σ satisfies (by the determinant of the Schur complement)

$$\widehat{\sigma}_{12} = \underset{\|(y - s_{12})\,./\,p_{12}\|_\infty \le 1}{\arg\min} \ \ y^\top \Sigma_{11}^{-1}\, y.$$
An ℓ1-norm penalized formulation

Proposition. Solving the block-wise problem is equivalent to solving the following dual problem:

$$\min_{\beta}\ \left\|\tfrac{1}{2}\Sigma_{11}^{1/2}\,\beta - \Sigma_{11}^{-1/2}\,s_{12}\right\|_2^2 + \left\|p_{12}\star\beta\right\|_{\ell_1},$$

where $\star$ is the term-by-term product. The vectors $\sigma_{12}$ and $\beta$ are linked by $\sigma_{12} = \Sigma_{11}\,\beta/2$.

A LASSO-like formulation, for which efficient off-the-shelf algorithms already exist.
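To illustrate how cheaply this dual problem can be solved, here is a small coordinate-descent sketch in R: expanding the square shows that the objective equals $\tfrac14\beta^\top\Sigma_{11}\beta - \tfrac12\beta^\top s_{12}$ up to a constant, so each coordinate update reduces to a soft-thresholding step. Names are ours and this is an illustration, not the SIMoNe implementation.

## Sketch (not SIMoNe code): coordinate descent for
##   min_beta || 1/2 Sigma11^{1/2} beta - Sigma11^{-1/2} s12 ||_2^2 + || p12 * beta ||_1,
## whose expanded form is 1/4 beta' Sigma11 beta - 1/2 beta' s12 + const.
soft <- function(x, t) sign(x) * pmax(abs(x) - t, 0)

lasso_column <- function(Sigma11, s12, p12, beta = rep(0, length(s12)),
                         tol = 1e-6, max_iter = 1000) {
  for (it in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_along(beta)) {
      r_j <- s12[j] - sum(Sigma11[j, -j] * beta[-j])    # partial residual
      beta[j] <- soft(r_j, 2 * p12[j]) / Sigma11[j, j]  # soft-thresholded update
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}
## The corresponding covariance column is then sigma12 = Sigma11 %*% beta / 2.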
The full EM algorithm
while Q̂_τ(K̂^(m)) has not stabilized do
    // E-STEP: LATENT STRUCTURE INFERENCE
    if m = 1 then
        // First pass
        Apply spectral clustering on the empirical covariance S to initialize τ̂
    else
        Compute τ̂ via a fixed-point algorithm, using K̂^(m−1)
    end
    // M-STEP: NETWORK INFERENCE
    Construct the penalty matrix P according to τ̂
    while Σ̂^(m) has not stabilized do
        for each column of Σ̂^(m) do
            Compute σ̂12 by solving the LASSO-like problem with path-wise coordinate optimization
        end
    end
    Compute K̂^(m) by block-wise inversion of Σ̂^(m)
    m ← m + 1
end
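To make the M-step concrete, here is a minimal R sketch of one sweep over the columns of Σ̂, in the spirit of the block-wise graphical-LASSO updates; it reuses the lasso_column() helper sketched earlier and is an illustration rather than the SIMoNe implementation.

## Sketch (not SIMoNe code): one M-step sweep over the columns of Sigma, reusing
## lasso_column() from the previous sketch; the diagonal of Sigma is assumed to have
## been initialized to diag(S) + diag(P) and is left untouched, as in the graphical LASSO.
mstep_sweep <- function(Sigma, S, P) {
  p <- ncol(S)
  for (j in seq_len(p)) {
    Sigma11 <- Sigma[-j, -j, drop = FALSE]
    beta    <- lasso_column(Sigma11, S[-j, j], P[-j, j])
    sigma12 <- as.vector(Sigma11 %*% beta) / 2   # link sigma12 = Sigma11 beta / 2
    Sigma[-j, j] <- sigma12
    Sigma[j, -j] <- sigma12
  }
  Sigma
}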
Outline
- Give the network a model
  - Gaussian graphical models
  - Providing the network with a latent structure
  - The complete likelihood
- Inference strategy by alternate optimization
  - The E-step: estimation of the latent structure
  - The M-step: inferring the connectivity matrix
- Numerical experiments
  - Synthetic data
  - Breast cancer data
Simulation settings

Five inference methods
1. InvCor: edge estimation based on empirical correlation matrix inversion.
2. GeneNet (Strimmer et al.): edge estimation based on partial correlation with shrinkage.
3. GLasso (Friedman et al.): edge estimation uses a uniform penalty matrix.
4. “Perfect” SIMoNe (the best results our method can aspire to): edge estimation uses a penalty matrix constructed according to the theoretical node classification.
5. SIMoNe (Statistical Inference for MOdular NEtworks): edge estimation uses a penalty matrix constructed iteratively, according to the estimated node classification.
Test simulation setup
Simulated Graphs
- Graphs simulated using an affiliation model (two sets of parameters: intra-group and inter-group connections); see the sketch after this slide.
- p = 200 nodes, i.e. p(p − 1)/2 = 19900 possible interactions.
- 50 graphs (repetitions) were simulated per situation.
- Gene expression data (i.e., Gaussian samples) was then simulated using the sampled graph:
  1. Favorable setting (n = 10p),
  2. Middle case (n = 2p),
  3. Unfavorable setting (n = p/2).

Unstructured graph
- When there is no structure, SIMoNe is comparable to GeneNet and GLasso.
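For concreteness, a minimal sketch of such an affiliation model is given below (in R; the number of classes and the connection probabilities are illustrative placeholders, not the values used in the talk).

## Sketch: draw an undirected affiliation graph, with edges appearing with probability
## p_in inside a class and p_out between classes (illustrative values, not the talk's).
simulate_affiliation <- function(p = 200, Q = 3, p_in = 0.1, p_out = 0.01) {
  classes <- sample(Q, p, replace = TRUE)
  same    <- outer(classes, classes, "==")
  prob    <- ifelse(same, p_in, p_out)
  A <- matrix(rbinom(p * p, 1, prob), p, p)
  A[lower.tri(A, diag = TRUE)] <- 0
  A + t(A)   # symmetric 0/1 adjacency matrix, no self-loops
}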
Concentration matrix and structure
Figure: Simulation of the structured sparse concentration matrix. Adjacency matrix with columns in original order (a), with columns reorganized according to the affiliation structure (b), and the corresponding graph (c).
Example of graph recovery
Favorable case

Figure: Theoretical graph and SIMoNe estimation.
Precision/Recall Curves
Definition

$$\text{Precision} = \frac{TP}{TP + FP} = \text{proportion of true positives among all predicted edges},$$

$$\text{Recall} = \frac{TP}{TP + FN} = \text{proportion of true positives among all true edges}.$$
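Both quantities are easy to compute from the true and estimated adjacency matrices; the small R helper below (our own naming, for illustration) does exactly that on the upper triangles.

## Toy illustration of the definitions above: precision and recall from a true and an
## estimated adjacency matrix (upper triangles only, so each edge is counted once).
precision_recall <- function(A_true, A_hat) {
  truth <- A_true[upper.tri(A_true)] != 0
  est   <- A_hat[upper.tri(A_hat)]  != 0
  TP <- sum(est & truth); FP <- sum(est & !truth); FN <- sum(!est & truth)
  c(precision = TP / (TP + FP), recall = TP / (TP + FN))
}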
Precision/Recall Curves
Favorable setting – n = 10p

- With n ≫ p, Perfect SIMoNe and SIMoNe perform equivalently.
- When 3p > n > p the structure is partially recovered and SIMoNe improves the edge selection.
- When n ≤ p all methods perform poorly...
Figure: Precision/recall curves for GeneNet, GLasso, Perfect SIMoNe, SIMoNe and InvCor.
Precision/Recall Curves
Unfavorable case – n = p

- With n ≫ p, Perfect SIMoNe and SIMoNe perform equivalently.
- When 3p > n > p the structure is partially recovered and SIMoNe improves the edge selection.
- When n ≤ p all methods perform poorly...
Figure: Precision/recall curves for GeneNet, GLasso, Perfect SIMoNe and SIMoNe.
Precision/Recall Curves
Unfavorable case – n = p/2

- With n ≫ p, Perfect SIMoNe and SIMoNe perform equivalently.
- When 3p > n > p the structure is partially recovered and SIMoNe improves the edge selection.
- When n ≤ p all methods perform poorly...
Figure: Precision/recall curves for GeneNet, GLasso, Perfect SIMoNe and SIMoNe.
First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Two types of patients: the response can be classified as either
1. pathologic complete response (PCR), or
2. residual disease (not PCR).

Gene expression data
- 133 patients (99 not PCR, 34 PCR)
- 26 identified genes (differential analysis)
First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Figure: Network inferred from the full sample over the 26 selected genes (AMFR, BB_S4, BECN1, BTG3, CA12, CTNND2, E2F3, ERBB4, FGFR1OP, FLJ10916, FLJ12650, GAMT, GFRA1, IGFBP4, JMJD2B, KIAA1467, MAPT, MBTPS1, MELK, METRN, PDGFRA, RAMP1, RRM2, SCUBE2, THRAP2, ZNF552).
First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Figure: Network inferred from the “not PCR” patients (same 26 genes).
First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Figure: Network inferred from the “PCR” patients (same 26 genes).
Conclusions

To sum up
- We proposed an inference strategy based on a penalization scheme driven by an underlying, unknown structure.
- The estimation strategy is based on a variational EM algorithm, in which a LASSO-like procedure is embedded.
- Preprint on arXiv.
- R package SIMoNe.

Perspectives
- Consider alternative priors that are more biologically relevant: hubs, motifs.
- Time segmentation when dealing with temporal data.
Penalty choice (1)
Let $C_i$ denote the connectivity component of node $i$ in the true conditional dependency graph, and $\widehat{C}_i$ the corresponding component resulting from the estimate $\widehat{K}$.

Proposition. Fix some ε > 0 and choose the penalty parameters λ such that, for all q, ℓ ∈ Q,

$$2p^2\, F_{n-2}\!\left(\frac{2}{n\lambda_{q\ell}}\left(\max_{i\neq j} S_{ii}S_{jj} - \frac{1}{\lambda_{q\ell}^{2}}\right)^{-1/2}(n-2)^{1/2}\right) \le \varepsilon,$$

where $1 - F_{n-2}$ is the c.d.f. of a Student's t-distribution with n − 2 degrees of freedom. Then

$$P\bigl(\exists k,\ \widehat{C}_k \not\subseteq C_k\bigr) \le \varepsilon. \qquad (3)$$
Penalty choice (2)
It is enough to choose λ_{qℓ} such that

$$\lambda_{q\ell}(\varepsilon) \ \ge\ \frac{2}{n}\left(n - 2 + t_{n-2}^{2}\!\left(\frac{\varepsilon}{2p^{2}}\right)\right)^{1/2} \times \left(\max_{\substack{i\neq j \\ Z_{iq}Z_{j\ell}=1}} S_{ii}S_{jj}\right)^{-1/2} t_{n-2}\!\left(\frac{\varepsilon}{2p^{2}}\right)^{-1}.$$
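Numerically, this rule only requires the upper quantile of a Student t distribution, available through qt() in R; the helper below is a rough sketch of the bound as displayed above (names are ours, and max_SiiSjj stands for the maximum of S_ii S_jj over the relevant pairs), not code from the paper or the SIMoNe package.

## Sketch: evaluate the bound above, with tstar the upper eps/(2 p^2) quantile of a
## Student t with n - 2 degrees of freedom and max_SiiSjj the maximum of S_ii * S_jj
## over pairs i != j with Z_iq * Z_jl = 1 (hypothetical names, for illustration only).
lambda_bound <- function(eps, n, p, max_SiiSjj) {
  tstar <- qt(1 - eps / (2 * p^2), df = n - 2)
  (2 / n) * sqrt(n - 2 + tstar^2) / (sqrt(max_SiiSjj) * tstar)
}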
Penalty choice (3)
Practically,
- Relax the λ_{qℓ} in the E-step, which amounts to variational inference and turns the procedure into a variational EM.
- Fix the λ_{qℓ} in the M-step, adapting the above rule to the context (see the sketch below). E.g., for an affiliation structure, we fix the ratio λ_in/λ_out = 1.2 and either let the value 1/λ_in vary when considering precision/recall curves for synthetic data, or fix this parameter using the above rule when dealing with real data.
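As a small illustration of this practical rule (our own sketch, not SIMoNe code), the Q × Q matrix of penalty levels for an affiliation structure can be filled from λ_in and the fixed ratio:

## Sketch: Q x Q matrix of penalty levels for an affiliation structure, with lambda_in on
## the diagonal (within-class level) and lambda_out = lambda_in / ratio off the diagonal.
affiliation_lambda <- function(lambda_in, Q, ratio = 1.2) {
  Lambda <- matrix(lambda_in / ratio, Q, Q)
  diag(Lambda) <- lambda_in
  Lambda
}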