Stochastic Variational Inference
Jesus Fernandez Bes
Machine Learning Group
March 27, 2014
1 Introduction
2 Stochastic Variational Inference
    Models with local and global hidden variables
    Mean-field variational inference
    The natural gradient of the ELBO
    Stochastic Variational Inference
3 Stochastic Variational Inference in Topic Models
    Topic Models
    Latent Dirichlet Allocation
    Hierarchical Dirichlet Process
4 Some Bibliography
Motivation
Challenges of modern data analysis
Massive
Complex
High-dimensional
Probability models (and graphical models) deal with complexity. Scale is the problem.
“Traditional” Variational Inference
1 Inference ⟹ high-dimensional optimization.
2 Solved using coordinate ascent algorithms:
    Analyze ALL the data.
    Re-estimate the hidden structure.
    Analyze ALL the data.
    ...
These algorithms DO NOT SCALE TO BIG DATA.
Main Ideas
How can we build a general variational method that scales?
Use stochastic optimization: follow cheap, noisy estimates of the gradient.
Use the natural gradient: with it, stochastic variational inference takes an attractive form.
Structure of SVI
1 Subsample one or more data points from the data.
2 Analyze the subsample using current variational parameters.
3 Implement a closed-form update of the parameters.
4 Repeat.
Models with local and global hidden variables
$$p(x, z, \beta \mid \alpha) = p(\beta \mid \alpha) \prod_{n=1}^{N} p(x_n, z_n \mid \beta)$$
$N$ observations $x = x_{1:N}$.
Vector of global hidden variables $\beta$.
$N$ local hidden variables $z = z_{1:N}$; each $z_n$ is a collection of $J$ variables, $z_n = z_{n,1:J}$.
Vector of fixed parameters $\alpha$.
Complete Conditional assumption
Complete conditionals are in the exponential family
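The equations of this slide did not survive extraction. For reference, the assumption can be written out as below. This is a sketch following the form used in the main paper (Hoffman et al., 2013), not a recovery of the slide itself: $h(\cdot)$ is a base measure, $t(\cdot)$ are sufficient statistics, and $a_g$, $a_l$ are log-normalizers. The mean-field variational distributions are taken from the same families, with free parameters $\lambda$ and $\phi_{nj}$.

$$
\begin{aligned}
p(\beta \mid x, z, \alpha) &= h(\beta)\exp\{\eta_g(x, z, \alpha)^\top t(\beta) - a_g(\eta_g(x, z, \alpha))\} \\
p(z_{nj} \mid x_n, z_{n,-j}, \beta) &= h(z_{nj})\exp\{\eta_l(x_n, z_{n,-j}, \beta)^\top t(z_{nj}) - a_l(\eta_l(x_n, z_{n,-j}, \beta))\} \\
q(\beta \mid \lambda) &= h(\beta)\exp\{\lambda^\top t(\beta) - a_g(\lambda)\} \\
q(z_{nj} \mid \phi_{nj}) &= h(z_{nj})\exp\{\phi_{nj}^\top t(z_{nj}) - a_l(\phi_{nj})\}
\end{aligned}
$$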
Gradient of the ELBO and Coordinate Ascent Inference
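$$\nabla_\lambda \mathcal{L} = \nabla^2_\lambda a_g(\lambda)\,\big(\mathbb{E}_q[\eta_g(x, z, \alpha)] - \lambda\big)$$
$$\nabla_{\phi_{nj}} \mathcal{L} = \nabla^2_{\phi_{nj}} a_l(\phi_{nj})\,\big(\mathbb{E}_q[\eta_l(x_n, z_{n,-j}, \beta)] - \phi_{nj}\big)$$
Both gradients are zero when we set
$$\lambda = \mathbb{E}_q[\eta_g(x, z, \alpha)]$$
$$\phi_{nj} = \mathbb{E}_q[\eta_l(x_n, z_{n,-j}, \beta)]$$
Coordinate ascent inference alternates between these two updates.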
The gradient, if it exists, points in the direction of steepest ascent,
$$\arg\max_{d\lambda} f(\lambda + d\lambda) \quad \text{subject to } \|d\lambda\|^2 < \epsilon$$
for small $\epsilon$. The gradient therefore depends on the Euclidean distance metric in the parameter space.
For probability distributions, the Euclidean metric can be a bad measure of dissimilarity.
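The natural gradient accounts for the information geometry of its parameter space.
Symmetrized KL divergence
A natural measure of dissimilarity between probability distributions:
$$D^{\mathrm{sym}}_{\mathrm{KL}}(\lambda, \lambda') = \mathbb{E}_{\lambda}\left[\log \frac{q(\beta \mid \lambda)}{q(\beta \mid \lambda')}\right] + \mathbb{E}_{\lambda'}\left[\log \frac{q(\beta \mid \lambda')}{q(\beta \mid \lambda)}\right]$$
Using this distance, the direction of steepest ascent is
$$\arg\max_{d\lambda} f(\lambda + d\lambda) \quad \text{subject to } D^{\mathrm{sym}}_{\mathrm{KL}}(\lambda, \lambda + d\lambda) < \epsilon$$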
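A small numerical illustration of why the Euclidean metric can mislead. It compares two pairs of univariate Gaussians, a choice of example made here for illustration (the main paper uses a similar one). The function name `sym_kl_gaussian` and the specific numbers are assumptions of this sketch.

```python
import numpy as np

def sym_kl_gaussian(mu1, sigma1, mu2, sigma2):
    """Symmetrized KL divergence between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    def kl(m_a, s_a, m_b, s_b):
        # Closed-form KL(N(m_a, s_a^2) || N(m_b, s_b^2)) for univariate Gaussians
        return np.log(s_b / s_a) + (s_a**2 + (m_a - m_b)**2) / (2.0 * s_b**2) - 0.5
    return kl(mu1, sigma1, mu2, sigma2) + kl(mu2, sigma2, mu1, sigma1)

# Pair A: means far apart in Euclidean terms, yet the densities almost coincide
# because the standard deviation (100) is large relative to the shift (10).
euclid_a = abs(0.0 - 10.0)
kl_a = sym_kl_gaussian(0.0, 100.0, 10.0, 100.0)

# Pair B: means close in Euclidean terms, yet the densities differ much more
# because the standard deviation (0.1) is as small as the shift (0.1).
euclid_b = abs(0.0 - 0.1)
kl_b = sym_kl_gaussian(0.0, 0.1, 0.1, 0.1)

print(f"pair A: Euclidean {euclid_a:.2f}, symmetrized KL {kl_a:.4f}")
print(f"pair B: Euclidean {euclid_b:.2f}, symmetrized KL {kl_b:.4f}")
```

The Euclidean distance ranks pair A as the more dissimilar one, while the symmetrized KL divergence ranks pair B as more dissimilar. That disagreement is the motivation for measuring step length with the KL-based constraint.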
Natural Gradient
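The natural gradient points in the direction of steepest ascent in the Riemannian space.
$$\hat{\nabla}_\lambda f(\lambda) = G(\lambda)^{-1} \nabla_\lambda f(\lambda)$$
where
$$G(\lambda) = \mathbb{E}_\lambda\left[(\nabla_\lambda \log q(\beta \mid \lambda))(\nabla_\lambda \log q(\beta \mid \lambda))^\top\right]$$
is the Fisher information matrix of $q(\beta \mid \lambda)$.
For the exponential family: $G(\lambda) = \nabla^2_\lambda a_g(\lambda)$.
For our mean-field model:
$$\hat{\nabla}_\lambda \mathcal{L} = \mathbb{E}_\phi[\eta_g(x, z, \alpha)] - \lambda$$
$$\hat{\nabla}_{\phi_{nj}} \mathcal{L} = \mathbb{E}_{\lambda, \phi_{n,-j}}[\eta_l(x_n, z_{n,-j}, \beta)] - \phi_{nj}$$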
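The exponential-family identity above can be checked numerically on a tiny example. The sketch below uses a Bernoulli distribution in its natural parameterization, $q(x \mid \eta) = \exp\{\eta x - a(\eta)\}$ with $a(\eta) = \log(1 + e^{\eta})$. That distribution is chosen here only for illustration, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def fisher_from_scores(eta):
    """G(eta) = E[(d/d eta log q(x|eta))^2], summed exactly over x in {0, 1}."""
    p = sigmoid(eta)
    probs = np.array([1.0 - p, p])          # q(x=0), q(x=1)
    scores = np.array([0.0 - p, 1.0 - p])   # d/d eta log q(x|eta) = x - sigmoid(eta)
    return float(np.sum(probs * scores**2))

def fisher_from_log_normalizer(eta, h=1e-4):
    """Second derivative of a(eta) = log(1 + exp(eta)) by central differences."""
    a = lambda e: np.log1p(np.exp(e))
    return float((a(eta + h) - 2.0 * a(eta) + a(eta - h)) / h**2)

def natural_gradient(grad, eta):
    """Rescale an ordinary gradient by the inverse Fisher information (scalar case)."""
    return grad / fisher_from_scores(eta)

eta = 2.0
g_scores = fisher_from_scores(eta)
g_hessian = fisher_from_log_normalizer(eta)
print(g_scores, g_hessian)  # the two values agree up to finite-difference error
```

In this one-dimensional case the inverse is a plain division. For a vector parameter, `natural_gradient` would instead solve a linear system with the Fisher matrix.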
Why Natural Gradients?
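Traditional Gradients
$$\nabla_\lambda \mathcal{L} = \nabla^2_\lambda a_g(\lambda)\,\big(\mathbb{E}_q[\eta_g(x, z, \alpha)] - \lambda\big)$$
$$\nabla_{\phi_{nj}} \mathcal{L} = \nabla^2_{\phi_{nj}} a_l(\phi_{nj})\,\big(\mathbb{E}_q[\eta_l(x_n, z_{n,-j}, \beta)] - \phi_{nj}\big)$$
Natural Gradients
$$\hat{\nabla}_\lambda \mathcal{L} = \mathbb{E}_\phi[\eta_g(x, z, \alpha)] - \lambda$$
$$\hat{\nabla}_{\phi_{nj}} \mathcal{L} = \mathbb{E}_{\lambda, \phi_{n,-j}}[\eta_l(x_n, z_{n,-j}, \beta)] - \phi_{nj}$$
Premultiplying by the inverse Fisher information cancels the Hessian factor of the traditional gradient.
A coordinate ascent update is equal to a natural gradient step of length one.
Natural gradients are easier to compute, so we use them to develop scalable variational inference algorithms.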
Stochastic Optimization
We have a random function $B(\lambda)$ with $\mathbb{E}_q[B(\lambda)] = \nabla_\lambda f(\lambda)$. We can optimize $f(\lambda)$ iteratively as
$$\lambda^{(t)} = \lambda^{(t-1)} + \rho_t\, b_t(\lambda^{(t-1)})$$
where $b_t$ is an independent draw from $B$. The step-size sequence $\rho_t$ must satisfy the Robbins-Monro conditions, $\sum_t \rho_t = \infty$ and $\sum_t \rho_t^2 < \infty$.
Follow noisy estimates of the gradient with a decreasing step size.
If the gradient can be written as a sum of terms (one per data point), a fast noisy approximation can be computed by subsampling the data.
$\lambda^{(t)}$ will converge to the optimum $\lambda^*$ (if $f$ is convex) or to a local optimum of $f$ (if it is not).
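A minimal sketch of the iteration above on a toy one-dimensional objective. The objective, the noise model and the step-size schedule $\rho_t = (t + \tau)^{-\kappa}$ are assumptions chosen for illustration. That schedule satisfies the Robbins-Monro conditions for $\kappa \in (0.5, 1]$ and is the one used in the main paper.

```python
import numpy as np

def robbins_monro_ascent(noisy_grad, lam0, n_steps, tau=1.0, kappa=0.7, seed=0):
    """Stochastic gradient ascent with step sizes rho_t = (t + tau)^(-kappa)."""
    rng = np.random.default_rng(seed)
    lam = float(lam0)
    for t in range(1, n_steps + 1):
        rho = (t + tau) ** (-kappa)             # decreasing step size
        lam = lam + rho * noisy_grad(lam, rng)  # follow the noisy gradient
    return lam

def noisy_grad(lam, rng):
    # Toy objective f(lam) = -0.5 * (lam - 3)^2 with gradient -(lam - 3).
    # Unit-variance Gaussian noise makes the estimate noisy but unbiased.
    return -(lam - 3.0) + rng.normal()

lam_final = robbins_monro_ascent(noisy_grad, lam0=-5.0, n_steps=5000)
print(lam_final)  # ends close to the maximizer lam* = 3
```

No single step sees the true gradient, yet the decreasing step size averages the noise out over time.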
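$$\mathcal{L}(\lambda) = \underbrace{\mathbb{E}_q[\log p(\beta)] - \mathbb{E}_q[\log q(\beta)]}_{\text{global}} + \underbrace{\sum_{n=1}^{N} \max_{\phi_n} \big(\mathbb{E}_q[\log p(x_n, z_n \mid \beta)] - \mathbb{E}_q[\log q(z_n)]\big)}_{\text{sum of local}}$$
We choose $I \sim \mathrm{Unif}(1, \dots, N)$ and define $\mathcal{L}_I(\lambda)$ as the random function
$$\mathcal{L}_I(\lambda) = \mathbb{E}_q[\log p(\beta)] - \mathbb{E}_q[\log q(\beta)] + N \max_{\phi_I} \big(\mathbb{E}_q[\log p(x_I, z_I \mid \beta)] - \mathbb{E}_q[\log q(z_I)]\big)$$
The expectation of $\mathcal{L}_I$ is equal to the objective, and consequently $\hat{\nabla}_\lambda \mathcal{L}_I$ is a noisy but unbiased estimate of the natural gradient of the objective.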
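The unbiasedness claim is a one-line calculation, and it can also be checked numerically. In the sketch below the global term and the $N$ local terms are arbitrary stand-in numbers, not values from a real model. The check averages $\mathcal{L}_I$ over every possible value of $I$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
global_term = -1.3                  # stands in for E_q[log p(beta)] - E_q[log q(beta)]
local_terms = rng.normal(size=N)    # stand in for the N optimized local terms

# Full objective: global term plus the sum of all local terms
full_objective = global_term + local_terms.sum()

# L_I for every possible choice of I: global term plus N times the I-th local term
L_I = global_term + N * local_terms

# I is uniform on {1, ..., N}, so E[L_I] is the plain average over all choices
expected_L_I = L_I.mean()
print(full_objective, expected_L_I)  # equal up to floating-point rounding
```

The factor $N$ is what compensates for looking at one data point instead of all $N$.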
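Stochastic Optimization for global parameters
$$\hat{\nabla}_\lambda \mathcal{L}_i = \mathbb{E}_q\big[\eta_g(x_i^{(N)}, z_i^{(N)}, \alpha)\big] - \lambda$$
where $x_i^{(N)}$ and $z_i^{(N)}$ denote a data set made of $N$ replicates of $x_i$ and $z_i$.
$$\eta_g(x_i^{(N)}, z_i^{(N)}, \alpha) = \alpha + N \cdot (t(x_i, z_i), 1)$$
$$\hat{\nabla}_\lambda \mathcal{L}_i = \alpha + N \cdot (\mathbb{E}_q[t(x_i, z_i)], 1) - \lambda$$
Using stochastic optimization
$$\hat{\lambda}_t \triangleq \alpha + N\, \mathbb{E}_{\phi(\lambda)}[(t(x_i, z_i), 1)]$$
$$\lambda^{(t)} = \lambda^{(t-1)} + \rho_t \big(\hat{\lambda}_t - \lambda^{(t-1)}\big) = (1 - \rho_t)\lambda^{(t-1)} + \rho_t \hat{\lambda}_t$$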
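A minimal sketch of this global update on a Beta-Bernoulli model, chosen here because it has no local hidden variables and its exact posterior is known. The model, the function name `svi_beta_bernoulli` and the step-size schedule are assumptions of this sketch. The parameters are written as Beta pseudo-counts, an affine reparameterization of the natural parameters, which leaves the convex-combination update unchanged.

```python
import numpy as np

def svi_beta_bernoulli(x, alpha, n_steps, tau=1.0, kappa=0.7, seed=0):
    """SVI for beta ~ Beta(alpha), x_n ~ Bernoulli(beta); returns Beta pseudo-counts."""
    rng = np.random.default_rng(seed)
    N = len(x)
    lam = np.array(alpha, dtype=float)           # initialize at the prior
    for t in range(1, n_steps + 1):
        i = rng.integers(N)                      # sample one data point uniformly
        stats = np.array([x[i], 1.0 - x[i]])     # sufficient statistics of x_i
        lam_hat = alpha + N * stats              # as if x_i were replicated N times
        rho = (t + tau) ** (-kappa)              # decreasing step size
        lam = (1.0 - rho) * lam + rho * lam_hat  # weighted average of old and new
    return lam

rng = np.random.default_rng(1)
x = (rng.random(1000) < 0.3).astype(float)       # synthetic data, true rate 0.3
alpha = np.array([1.0, 1.0])

lam_svi = svi_beta_bernoulli(x, alpha, n_steps=20000)
lam_exact = alpha + np.array([x.sum(), len(x) - x.sum()])

mean_svi = lam_svi[0] / lam_svi.sum()
mean_exact = lam_exact[0] / lam_exact.sum()
print(mean_svi, mean_exact)  # the SVI estimate fluctuates around the exact value
```

Each iteration touches a single observation, so the cost per step does not grow with the data set size.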
Stochastic Variational Inference
Extensions
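Minibatches
Pick more than one data point each time,
$$\lambda^{(t)} = (1 - \rho_t)\lambda^{(t-1)} + \frac{\rho_t}{S} \sum_{s} \hat{\lambda}_s.$$
Empirical Bayes estimation of hyperparameters
Get a point estimate of the value of the hyperparameters $\alpha$,
$$\alpha^{(t)} = \alpha^{(t-1)} + \rho_t \nabla_\alpha \mathcal{L}_t(\lambda^{(t-1)}, \phi, \alpha^{(t-1)}).$$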
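The minibatch update can be sketched as a small helper. The function name `minibatch_global_update` and the numbers are hypothetical. `lam_hats` is assumed to hold one intermediate parameter vector $\hat{\lambda}_s$ per sampled data point, each already computed as if that point were replicated $N$ times.

```python
import numpy as np

def minibatch_global_update(lam, lam_hats, rho):
    """Average S intermediate parameters, then take the usual weighted step.

    lam:      current global parameter, shape (P,)
    lam_hats: intermediate parameters, shape (S, P), one row per sampled point
    rho:      step size for this iteration
    """
    lam = np.asarray(lam, dtype=float)
    lam_hats = np.asarray(lam_hats, dtype=float)
    return (1.0 - rho) * lam + rho * lam_hats.mean(axis=0)

lam = np.array([1.0, 1.0])
lam_hats = np.array([[11.0, 1.0], [1.0, 11.0], [11.0, 1.0], [11.0, 1.0]])
lam_new = minibatch_global_update(lam, lam_hats, rho=0.5)
print(lam_new)  # [4.75 2.25]
```

Averaging over the minibatch lowers the variance of the noisy estimate without changing its expectation.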
Topic Models
Observations:
    Words: $w_{dn}$ is the $n$th word in the $d$th document, an element of a fixed vocabulary of $V$ terms.
Latent Variables:
    A topic $\beta_k$ is a distribution over the vocabulary, a point in the $(V-1)$-simplex.
    Topic proportions $\theta_d$ are associated with each document; each is a distribution over topics.
    Each word in each document comes from a single topic. Topic assignments $z_{dn}$ are topic indices.
We consider two models: Latent Dirichlet Allocation (LDA) has a fixed number $K$ of topics. The Hierarchical Dirichlet Process (HDP) has an infinite number of topics.
Analyzing the documents
Posterior inference of p(β, θ, z|w)
Generative model
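1 Draw topics $\beta_k \sim \mathrm{Dirichlet}(\eta, \dots, \eta)$ for $k \in \{1, \dots, K\}$.
2 For each document $d \in \{1, \dots, D\}$:
    1 Draw topic proportions $\theta_d \sim \mathrm{Dirichlet}(\alpha, \dots, \alpha)$.
    2 For each word $n \in \{1, \dots, N\}$:
        1 Draw topic assignment $z_{dn} \sim \mathrm{Multinomial}(\theta_d)$.
        2 Draw word $w_{dn} \sim \mathrm{Multinomial}(\beta_{z_{dn}})$.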
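The generative process can be sketched by sampling a small synthetic corpus. The sketch assumes the standard LDA per-word steps (draw a topic assignment from the document's topic proportions, then draw the word from that topic). The function name `sample_lda_corpus` and all sizes are made up for illustration.

```python
import numpy as np

def sample_lda_corpus(D, N, K, V, alpha, eta, seed=0):
    """Sample topics, topic proportions, assignments and words from LDA."""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(V, eta), size=K)      # K topics over V terms
    theta = np.empty((D, K))
    z = np.empty((D, N), dtype=int)
    w = np.empty((D, N), dtype=int)
    for d in range(D):
        theta[d] = rng.dirichlet(np.full(K, alpha))    # topic proportions of doc d
        z[d] = rng.choice(K, size=N, p=theta[d])       # one topic index per word
        for n in range(N):
            w[d, n] = rng.choice(V, p=beta[z[d, n]])   # word drawn from its topic
    return beta, theta, z, w

beta, theta, z, w = sample_lda_corpus(D=5, N=20, K=3, V=50, alpha=0.5, eta=0.5)
print(w[0])  # word indices of the first synthetic document
```

Every document here has the same length $N$, as in the slide. Real corpora have variable lengths, which only changes the inner loop bound.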
Variational Inference in LDA
Mean-field for LDA
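$$q(z_{dn}) = \mathrm{Multinomial}(\phi_{dn})$$
$$q(\theta_d) = \mathrm{Dirichlet}(\gamma_d)$$
$$q(\beta_k) = \mathrm{Dirichlet}(\lambda_k)$$
1 Update the local variational parameters of each document $d$:
$$\phi_{dn}^{k} \propto \exp\Big\{\Psi(\gamma_{dk}) + \Psi(\lambda_{k, w_{dn}}) - \Psi\Big(\sum_{v} \lambda_{kv}\Big)\Big\} \quad \text{for } n \in \{1, \dots, N\}$$
$$\gamma_d = \alpha + \sum_{n=1}^{N} \phi_{dn}$$
2 Update the global parameters:
$$\lambda_k = \eta + \sum_{d=1}^{D} \sum_{n=1}^{N} \phi_{dn}^{k} w_{dn}$$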
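A minimal sketch of these batch updates on a toy corpus, with documents given as arrays of word indices. The names `digamma`, `local_step` and `batch_global_step`, the toy corpus, the initialization and the fixed number of local iterations are all assumptions of this sketch. The small `digamma` routine is a NumPy-only stand-in for $\Psi$; `scipy.special.digamma` would normally be used instead.

```python
import numpy as np

def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic series."""
    x = np.array(x, dtype=float)
    shift = np.zeros_like(x)
    for _ in range(6):                      # move the argument up by 6
        shift -= 1.0 / x
        x = x + 1.0
    inv2 = 1.0 / (x * x)
    series = np.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series

def local_step(words, lam, alpha, n_iter=50):
    """Iterate the phi and gamma updates for one document."""
    K = lam.shape[0]
    # Psi(lambda_{k, w_dn}) - Psi(sum_v lambda_kv), shape (N, K); fixed in this step
    elog_beta = (digamma(lam[:, words]) - digamma(lam.sum(axis=1))[:, None]).T
    gamma = np.ones(K)
    for _ in range(n_iter):
        log_phi = digamma(gamma)[None, :] + elog_beta
        log_phi -= log_phi.max(axis=1, keepdims=True)  # stabilize the exponential
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)          # normalize over topics
        gamma = alpha + phi.sum(axis=0)
    return phi, gamma

def batch_global_step(docs, lam, alpha, eta):
    """One full pass: local step on every document, then the global update."""
    K, V = lam.shape
    new_lam = np.full((K, V), float(eta))
    for words in docs:
        phi, _ = local_step(words, lam, alpha)
        for k in range(K):                             # add phi_dn^k at term w_dn
            new_lam[k] += np.bincount(words, weights=phi[:, k], minlength=V)
    return new_lam

rng = np.random.default_rng(0)
K, V, alpha, eta = 2, 6, 0.5, 0.1
docs = [np.array([0, 0, 1, 2, 1]), np.array([3, 4, 5, 5, 4, 3])]
lam = rng.gamma(100.0, 0.01, size=(K, V))              # random positive start
for _ in range(20):
    lam = batch_global_step(docs, lam, alpha, eta)
print(lam.round(2))
```

Note that every global update needs a local step on all $D$ documents, which is the cost that stochastic variational inference avoids.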
Stochastic Variational Inference in LDA
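The algorithm box of this slide did not survive extraction. What follows is a sketch of how the stochastic update could look for LDA, following the main paper: sample a document, run its local step, form intermediate topics as if the corpus were $D$ copies of that document, and take a weighted step. All function names, the toy corpus, the initialization and the step-size schedule $\rho_t = (t + \tau)^{-\kappa}$ are assumptions. The `digamma` routine is a NumPy-only stand-in for `scipy.special.digamma`.

```python
import numpy as np

def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic series."""
    x = np.array(x, dtype=float)
    shift = np.zeros_like(x)
    for _ in range(6):
        shift -= 1.0 / x
        x = x + 1.0
    inv2 = 1.0 / (x * x)
    series = np.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series

def local_step(words, lam, alpha, n_iter=50):
    """Optimize phi and gamma for one document given the current topics lam."""
    K = lam.shape[0]
    elog_beta = (digamma(lam[:, words]) - digamma(lam.sum(axis=1))[:, None]).T
    gamma = np.ones(K)
    for _ in range(n_iter):
        log_phi = digamma(gamma)[None, :] + elog_beta
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)
    return phi, gamma

def svi_lda(docs, K, V, alpha, eta, n_steps, tau=1.0, kappa=0.7, seed=0):
    """Stochastic variational inference for the LDA topics lam, shape (K, V)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    lam = rng.gamma(100.0, 0.01, size=(K, V))          # random positive start
    for t in range(1, n_steps + 1):
        d = rng.integers(D)                            # 1. sample a document
        phi, _ = local_step(docs[d], lam, alpha)       # 2. local step on it only
        lam_hat = np.full((K, V), float(eta))          # 3. intermediate topics,
        for k in range(K):                             #    as if doc d appeared D times
            lam_hat[k] += D * np.bincount(docs[d], weights=phi[:, k], minlength=V)
        rho = (t + tau) ** (-kappa)                    # 4. decreasing step size
        lam = (1.0 - rho) * lam + rho * lam_hat        # 5. weighted average
    return lam

docs = [np.array([0, 0, 1, 2, 1]), np.array([3, 4, 5, 5, 4, 3]),
        np.array([0, 1, 2, 2]), np.array([3, 3, 4, 5])]
lam = svi_lda(docs, K=2, V=6, alpha=0.5, eta=0.1, n_steps=200)
print((lam / lam.sum(axis=1, keepdims=True)).round(2))  # estimated topic distributions
```

Compared with the batch algorithm, each iteration here runs the local step on one document rather than on the whole corpus.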
Results LDA
DATA
Nature: 350k docs, 58M words, 4200 terms.
New York Times: 1.8M docs, 461M words, 8000 terms.
Wikipedia: 3.8M docs, 482M words, 7700 terms.
* Batch Variational uses a subset of 100k docs.
Results HDP
Some Bibliography
Main Paper
Hoffman, M. D., Blei, D. M., Wang, C., and Paisley, J. (2013). “Stochastic variational inference”. The Journal of Machine Learning Research, 14(1), 1303-1347.
Other References
Blei, D. M. “Variational Inference”. Lecture Notes of COS597C: Advanced Methods in Probabilistic Modeling, Princeton University, fall 2011, www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf.
Blei, D. M. “Exponential Families”. Lecture Notes of COS597C: Advanced Methods in Probabilistic Modeling, Princeton University, fall 2011, www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/exponential-families.pdf.