Rebuilding Factorized Information Criterion: Asymptotically Accurate Marginal Likelihood

Kohei Hayashi¹,²  Shin-ichi Maeda³  Ryohei Fujimaki⁴

¹National Institute of Informatics
²Kawarabayashi Large Graph Project, ERATO, JST
³Kyoto University
⁴NEC Knowledge Discovery Laboratories

July 10, 2015
Introduction

Factorized asymptotic Bayesian inference (FAB)
• Recently developed approximate Bayesian method
  ✓ Accurate and tractable
  ✗ Limited to binary latent variable models (LVMs)

Our contributions:
• Extend FAB to general LVMs (e.g., PCA)
• Analyze theoretical properties that are unclear in the previous studies
1. Revisiting FAB
2. Generalization of FAB
Bayesian Inference for Binary LVMs

Binary LVM:
\[
p(\underbrace{X}_{\text{data}},\ \underbrace{Z}_{\text{LVs}},\ \underbrace{\Pi}_{\text{params}} \mid \underbrace{K}_{\text{model}})
= \underbrace{p(\Pi)}_{\text{prior}}\ \underbrace{p(X, Z \mid \Pi, K)}_{\text{joint likelihood}}
\]

Assumptions:
• X and Z are jointly i.i.d.:
\[
p(X, Z \mid \Pi, K) = \prod_{n=1}^{N} p(x_n, z_n \mid \Pi, K)
\]
• The prior doesn't depend on N: \(\ln p(\Pi) = O(1)\) ("flat" prior)
Goal: To obtain
• the marginal likelihood:
\[
p(X \mid K) = \int p(X, Z, \Pi \mid K)\, dZ\, d\Pi
\]
• the marginal posteriors:
\[
p(Z \mid X, K) = \int p(X, Z, \Pi \mid K)\, d\Pi \Big/ p(X \mid K)
\]
\[
p(\Pi \mid X, K) = \int p(X, Z, \Pi \mid K)\, dZ \Big/ p(X \mid K)
\]

Problem: The marginalizations are intractable
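To get a feel for why the marginalization over Z is hopeless to compute directly, note that for a K-component mixture the sum over Z ranges over one of K states per data point, i.e. K^N configurations. A tiny illustrative count (the values of K and N here are arbitrary):

```python
# Number of latent assignment configurations the sum over Z ranges over
# in a K-component mixture with N data points: K states per point.
K, N = 3, 100
n_configs = K ** N
print(len(str(n_configs)))  # number of decimal digits: direct summation is hopeless
```

Even for this modest N, the count has 48 decimal digits, which is why FAB replaces the exact marginalization with a variational representation.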
Key idea: Use
• the variational representation for \(\int dZ\)
• Laplace's method for \(\int d\Pi\)

Factorized information criterion (FIC):
\[
\mathrm{FIC}(K) \equiv \max_q\, \mathbb{E}_q\!\Big[\max_\Pi \ln p(X, Z \mid \Pi, K)\Big]
- \underbrace{\mathbb{E}_q\!\Big[\frac{D_\Pi}{2} \sum_k \ln \sum_n z_{nk}\Big]}_{\text{FIC penalty term}}
+ H(q) + O(\ln N)
\]
• q(Z): trial distribution
• H(q): entropy
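A rough sketch of how the two ingredients combine into this expression (regularity conditions and constants omitted; this paraphrases the standard FIC derivation rather than reproducing the authors' exact steps): the variational representation handles the sum over Z exactly, and Laplace's method expands the \(\int d\Pi\) integral, whose Hessian block for component k scales with \(\sum_n z_{nk}\).

```latex
% Variational representation of the log marginal likelihood (exact;
% the optimum is attained at the true posterior q(Z)):
\ln p(X \mid K)
  = \max_q\; \mathbb{E}_q\!\left[ \ln \int p(X, Z \mid \Pi, K)\, p(\Pi)\, d\Pi \right] + H(q)

% Laplace's method on the inner integral; the log-determinant of the
% Hessian contributes (D_\Pi/2) \ln \sum_n z_{nk} per component k:
\ln \int p(X, Z \mid \Pi, K)\, p(\Pi)\, d\Pi
  = \max_\Pi \ln p(X, Z \mid \Pi, K)
    - \frac{D_\Pi}{2} \sum_k \ln \sum_n z_{nk} + O(\ln N)

% Substituting the second display into the first recovers FIC(K).
```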
Accuracy of FIC

✓ Asymptotically equivalent to the marginal likelihood

Theorem 3 of [Fujimaki+ 12a]: In mixture models, under mild conditions,
\[
\mathrm{FIC}(K) = \ln p(X \mid K) + O(1) \approx \ln p(X \mid K)
\]

Similar results are obtained for:
• HMMs [Fujimaki+ 12b]
• Latent feature models [KH+ 13]
• Mixture of experts [Eto+ 14]
• Factorial relational models [Liu+ yesterday]
Optimizing FIC

Computation of FIC is difficult:
\[
\max_q\, \mathbb{E}_q\!\Big[\max_\Pi \ln p(X, Z \mid \Pi, K)\Big] - \frac{D_\Pi}{2} \sum_k \mathbb{E}_q\!\Big[\ln \sum_n z_{nk}\Big] + H(q)
\]
\[
\geq \max_{q \in \mathcal{Q}}\, \mathbb{E}_q\!\Big[\max_\Pi \ln p(X, Z \mid \Pi, K)\Big] - \frac{D_\Pi}{2} \sum_k \mathbb{E}_q\!\Big[\ln \sum_n z_{nk}\Big] + H(q)
\]
Mean-field approximation: \(\mathcal{Q} \equiv \{q(Z) \mid q(Z) = \prod_n q(z_n)\}\)
\[
\geq \max_{q \in \mathcal{Q},\, \Pi}\, \mathbb{E}_q[\ln p(X, Z \mid \Pi, K)] - \frac{D_\Pi}{2} \sum_k \ln \sum_n \mathbb{E}_q[z_{nk}] + H(q)
\]
Jensen's inequality
\[
\equiv \underline{\mathrm{FIC}}(K)
\]
where \(\underline{\mathrm{FIC}}(K)\) denotes the tractable lower bound that is actually maximized.
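The Jensen step, which moves the expectation inside the logarithm, can be checked numerically. This small sketch draws binary \(z_{nk}\) from a product-Bernoulli q and verifies \(\mathbb{E}_q[\ln \sum_n z_{nk}] \le \ln \sum_n \mathbb{E}_q[z_{nk}]\) by Monte Carlo (the sizes and probabilities here are illustrative):

```python
import numpy as np

# Jensen's inequality for the concave ln: E_q[ln S] <= ln E_q[S],
# where S = sum_n z_nk for one component k.
rng = np.random.default_rng(0)
N, draws = 50, 100_000
p = rng.uniform(0.3, 0.9, N)       # q(z_nk = 1) for each n
Z = rng.random((draws, N)) < p     # Monte Carlo samples of (z_1k, ..., z_Nk)
S = Z.sum(axis=1)                  # sum_n z_nk per draw
lhs = np.log(S).mean()             # estimate of E_q[ln sum_n z_nk]
rhs = np.log(p.sum())              # ln sum_n E_q[z_nk]
print(lhs <= rhs)                  # Jensen's inequality holds
```

The gap between the two sides is what the mean-field FAB objective gives up in exchange for tractability.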
Algorithm

Optimization problem:
\[
\max_{q \in \mathcal{Q},\, \Pi}\, \mathbb{E}_q[\ln p(X, Z \mid \Pi, K)] - \frac{D_\Pi}{2} \sum_k \ln \sum_n \mathbb{E}_q[z_{nk}] + H(q)
\]

Can be solved by EM-like alternating updates:
1. Initialize q and Π
2. Update q (fix Π)
3. Update Π (fix q)
4. Repeat steps 2 and 3 until convergence
Model Pruning

The FAB algorithm eliminates irrelevant components automatically:
\[
\mathbb{E}_q[\ln p(X, Z \mid \Pi, K)] - \underbrace{\frac{D_\Pi}{2} \sum_k \ln \sum_n \mathbb{E}_q[z_{nk}]}_{\text{penalty term}} + H(q)
\]
[Figure: plot of \(-\ln x\), which grows steeply as \(x \to 0\)]
• The penalty term introduces group sparsity to Z
[Figure: the columns of Z shrink from K = 6 to K = 3 over successive updates]
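The alternating updates and this pruning behavior can be sketched for a one-dimensional Gaussian mixture. This is a minimal illustration, not the authors' implementation: the additive shrinkage term \(-D_k/(2\sum_n q_{nk})\) in the q-update, the per-component parameter count `D_k = 3`, and the pruning threshold `prune_tol` are all assumptions of this sketch.

```python
import numpy as np

def fab_gmm(X, K=6, n_iter=50, prune_tol=1e-3, seed=0):
    """Sketch of FAB-style alternating updates for a 1-D Gaussian mixture."""
    rng = np.random.default_rng(seed)
    N = len(X)
    D_k = 3.0                        # params per component: mean, variance, weight (assumed)
    mu = rng.choice(X, K)            # initialize means from the data
    var = np.full(K, X.var() + 1e-6)
    pi = np.full(K, 1.0 / K)
    q = np.full((N, K), 1.0 / K)     # responsibilities q(z_n = k)
    for _ in range(n_iter):
        # Step 2 -- update q (Pi fixed): Gaussian log-likelihood plus the
        # FAB shrinkage term -D_k / (2 * sum_n q_nk), which penalizes
        # components with little responsibility mass.
        log_lik = (-0.5 * np.log(2 * np.pi * var)
                   - 0.5 * (X[:, None] - mu) ** 2 / var)
        shrink = -D_k / (2.0 * np.maximum(q.sum(0), 1e-12))
        log_q = np.log(pi) + log_lik + shrink
        log_q -= log_q.max(1, keepdims=True)
        q = np.exp(log_q)
        q /= q.sum(1, keepdims=True)
        # Prune components whose responsibility mass has (almost) vanished.
        keep = q.sum(0) / N > prune_tol
        if not keep.all():
            q, mu, var, pi = q[:, keep], mu[keep], var[keep], pi[keep]
            q /= q.sum(1, keepdims=True)
        # Step 3 -- update Pi (q fixed): weighted maximum likelihood, as in EM.
        Nk = q.sum(0)
        pi = Nk / N
        mu = q.T @ X / Nk
        var = (q * (X[:, None] - mu) ** 2).sum(0) / Nk + 1e-6
    return q, mu, var, pi

# Two well-separated clusters; the model starts with K = 6 components.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
q, mu, var, pi = fab_gmm(X, K=6)
```

Components whose responsibility mass falls below `prune_tol` are removed during the run, so model selection happens inside the same loop as parameter inference.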
Summary of FIC/FAB

✓ Asymptotically equivalent to the marginal likelihood
  • Fits "Big Data" situations
✓ Performs parameter inference and model selection simultaneously
  • EM-like updates of q and Π
  • ARD-like model pruning
✓ Doesn't depend on the choice of p(Π)
  • More frequentist than Bayesian
✓ Works in many binary LVMs
Limitations of FIC/FAB

✗ Limited to binary LVMs
  • In real-valued Z, \(\sum_n z_{nk}\) can be negative
  • \(-\ln \sum_n z_{nk}\) may diverge
✗ Missing relations to EM and VB
  • Similar approaches, but which is better?
✗ Unclear legitimacy of optimizing FIC
  • e.g., tightness