
Latent variable models for discrete data (ml.cs.tsinghua.edu.cn/~jianfei/static/lvm.pdf)


Latent variable models for discrete data

Jianfei Chen

Department of Computer Science and Technology, Tsinghua University, Beijing 100084

[email protected]

January 13, 2014

Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012. Chapter 27.


Introduction

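We want to model three types of discrete data:

Sequence of tokens: $p(y_{i,1:L_i})$, where document $i$ has $L_i$ tokens $y_{il} \in \{1, \dots, V\}$

Bag of words: $p(\mathbf{n}_i)$, where $n_{iv}$ counts the occurrences of word $v$ in document $i$

Discrete features: $p(y_{i,1:R})$, where $y_{ir}$ is the $r$-th of $R$ categorical features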
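To make the relation between the first two representations concrete, here is a minimal Python sketch (the function name and toy document are hypothetical, and word ids are 0-based rather than the slides' 1..V) that collapses a token sequence $y_{i,1:L_i}$ into its count vector $\mathbf{n}_i$. Word order is lost, but $L_i = \sum_v n_{iv}$ is kept.

```python
from collections import Counter


def bag_of_words(tokens, vocab_size):
    """Map a token sequence (word ids 0..V-1) to its count vector n_i of length V."""
    counts = Counter(tokens)
    return [counts[v] for v in range(vocab_size)]


# Hypothetical toy document: V = 5 word types, L_i = 6 tokens.
doc = [0, 3, 3, 1, 4, 3]
n_i = bag_of_words(doc, vocab_size=5)
print(n_i)       # [1, 1, 0, 3, 1]
print(sum(n_i))  # 6, so L_i = sum_v n_iv is still recoverable
```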

Jianfei Chen (THU) Latent variable models January 13, 2014 2 / 21


Outline

Mixture Models

LSA / PLSI / LDA / GaP / NMF

LDA: evaluation; inference; variants (CTM, DTM, LDA-HMM, SLDA, MedLDA, etc.)

RBM


Mixture models

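$p(\mathbf{y}_i) = \sum_k p(\mathbf{y}_i \mid q_i = k)\, p(q_i = k)$

Sequence of tokens: $p(y_{i,1:L_i} \mid q_i = k) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{b}_k)$

Discrete features: $p(y_{i,1:R} \mid q_i = k) = \prod_{r=1}^{R} \mathrm{Cat}(y_{ir} \mid \mathbf{b}^{(r)}_k)$

Bag of words (known $L_i$): $p(\mathbf{n}_i \mid L_i, q_i = k) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{b}_k)$

Bag of words (unknown $L_i$): $p(\mathbf{n}_i \mid q_i = k) = \prod_{v=1}^{V} \mathrm{Poi}(n_{iv} \mid \lambda_{vk})$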
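A minimal sketch, assuming the token-sequence mixture of categoricals above, of evaluating $p(y_{i,1:L_i})$ and the cluster posterior. It works in log space because the product over $L_i$ tokens underflows quickly. Here `b[k][v]` stores entry $v$ of $\mathbf{b}_k$ (rows are clusters), and all names and numbers are illustrative.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cluster_log_joints(tokens, mix_weights, b):
    """log p(q_i = k) + sum_l log Cat(y_il | b_k), one entry per cluster k.

    mix_weights[k] = p(q_i = k); b[k][v] is entry v of b_k (each row sums to 1).
    """
    return [math.log(w) + sum(math.log(b[k][y]) for y in tokens)
            for k, w in enumerate(mix_weights)]


def mixture_log_lik(tokens, mix_weights, b):
    """log p(y_{i,1:L_i}) = log sum_k p(q_i = k) prod_l Cat(y_il | b_k)."""
    return log_sum_exp(cluster_log_joints(tokens, mix_weights, b))


def responsibilities(tokens, mix_weights, b):
    """Posterior p(q_i = k | y_{i,1:L_i}), e.g. for the E step of EM."""
    joints = cluster_log_joints(tokens, mix_weights, b)
    total = log_sum_exp(joints)
    return [math.exp(j - total) for j in joints]


# Hypothetical 2-cluster model over V = 2 word types.
b = [[0.5, 0.5], [0.9, 0.1]]
print(math.exp(mixture_log_lik([0, 0], [0.5, 0.5], b)))  # approximately 0.53
print(responsibilities([0, 0], [0.5, 0.5], b))
```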


Mixture models

Theorem

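If $X_1, \dots, X_k$ are independent with $X_i \sim \mathrm{Poi}(\lambda_i)$, and $n = \sum_{i=1}^{k} X_i$, then

$p(X_1, \dots, X_k \mid n) = \mathrm{Mu}(\mathbf{X} \mid n, \boldsymbol{\pi})$, where $\pi_i = \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j}$.

Consequently, conditioning the Poisson bag-of-words model on $L_i = \sum_v n_{iv}$ recovers the multinomial model, with $\mathbf{b}_k$ having entries $\lambda_{vk} / \sum_{v'} \lambda_{v'k}$.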
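The theorem can be checked numerically. The sketch below (hypothetical helper names, rates chosen arbitrarily, all $\lambda_i > 0$) divides the joint Poisson probability by the probability of the total, using the standard fact that a sum of independent Poissons is $\mathrm{Poi}(\sum_i \lambda_i)$, and compares the result with the multinomial pmf.

```python
import math


def poisson_pmf(x, lam):
    """Poi(x | lam) for integer x >= 0 and lam > 0."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def multinomial_pmf(xs, n, probs):
    """Mu(xs | n, probs), assuming every entry of probs is > 0."""
    if sum(xs) != n:
        return 0.0
    log_p = math.lgamma(n + 1) + sum(
        x * math.log(p) - math.lgamma(x + 1) for x, p in zip(xs, probs))
    return math.exp(log_p)


def poisson_conditional(xs, lams):
    """p(X_1..X_k = xs | sum_i X_i = n), computed directly from the Poisson model."""
    n = sum(xs)
    joint = math.prod(poisson_pmf(x, lam) for x, lam in zip(xs, lams))
    # The sum of independent Poissons is Poi(sum_i lambda_i).
    return joint / poisson_pmf(n, sum(lams))


lams = [1.0, 2.5, 0.5]                  # hypothetical rates
pi = [lam / sum(lams) for lam in lams]  # pi_i = lambda_i / sum_j lambda_j
xs = [2, 3, 1]
print(poisson_conditional(xs, lams), multinomial_pmf(xs, sum(xs), pi))  # equal up to rounding
```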


Exponential Family PCA

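Latent semantic analysis (LSA) / latent semantic indexing (LSI)

Sequence of tokens: $p(y_{i,1:L_i} \mid \mathbf{z}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathcal{S}(\mathbf{W}\mathbf{z}_i))$

Discrete features: $p(y_{i,1:R} \mid \mathbf{z}_i) = \prod_{r=1}^{R} \mathrm{Cat}(y_{ir} \mid \mathcal{S}(\mathbf{W}_r\mathbf{z}_i))$

Bag of words (known $L_i$): $p(\mathbf{n}_i \mid L_i, \mathbf{z}_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathcal{S}(\mathbf{W}\mathbf{z}_i))$

Bag of words (unknown $L_i$): $p(\mathbf{n}_i \mid \mathbf{z}_i) = \prod_{v=1}^{V} \mathrm{Poi}(n_{iv} \mid \exp(\mathbf{w}_{v,:}^\top \mathbf{z}_i))$

where $\mathcal{S}(\cdot)$ is the softmax transformation, $\mathbf{z}_i \in \mathbb{R}^K$, and $\mathbf{W}, \mathbf{W}_r \in \mathbb{R}^{V \times K}$.

Inference

Coordinate ascent / degenerate EM: alternate point estimates of $\mathbf{z}_i$ and $\mathbf{W}$ (prone to overfitting, since the number of latent variables grows with the number of documents)

Variational EM / MCMC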
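A small sketch of the ePCA likelihoods under assumed toy values (`W` holds the rows $\mathbf{w}_{v,:}$; all names are hypothetical). Unlike the mixture model, $\mathbf{z}_i$ is an unconstrained real vector and the softmax maps $\mathbf{W}\mathbf{z}_i$ onto the simplex. Normalizing the Poisson rates $\exp(\mathbf{w}_{v,:}^\top \mathbf{z}_i)$ gives exactly $\mathcal{S}(\mathbf{W}\mathbf{z}_i)$, which is what the Poisson-multinomial theorem predicts when conditioning on $L_i$.

```python
import math


def softmax(scores):
    """S(x)_v = exp(x_v) / sum_u exp(x_u), shifted by max(x) for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mat_vec(W, z):
    """W z for W given as V rows w_{v,:} of length K and z of length K."""
    return [sum(w * zk for w, zk in zip(row, z)) for row in W]


def epca_token_log_lik(tokens, W, z):
    """log prod_l Cat(y_il | S(W z_i)) for the token-sequence model."""
    probs = softmax(mat_vec(W, z))
    return sum(math.log(probs[y]) for y in tokens)


def epca_poisson_rates(W, z):
    """Rates exp(w_{v,:}^T z_i) of the unknown-L_i bag-of-words model."""
    return [math.exp(s) for s in mat_vec(W, z)]


# Hypothetical V = 3, K = 2 loadings and a real-valued (unconstrained) z_i.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
z = [0.3, -1.2]
rates = epca_poisson_rates(W, z)
print([r / sum(rates) for r in rates])  # matches softmax(mat_vec(W, z))
print(epca_token_log_lik([0, 0, 2], W, z))
```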


LSA / PLSI / LDA

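Unigram: $p(y_{i,1:L_i} \mid q_i = k) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{b}_k)$

LSI: $p(y_{i,1:L_i} \mid \mathbf{z}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathcal{S}(\mathbf{W}\mathbf{z}_i))$

PLSI: $p(y_{i,1:L_i} \mid \boldsymbol{\pi}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{B}\boldsymbol{\pi}_i)$

LDA: $p(y_{i,1:L_i} \mid \boldsymbol{\pi}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{B}\boldsymbol{\pi}_i)$, $\boldsymbol{\pi}_i \sim \mathrm{Dir}(\boldsymbol{\alpha})$

LDA for other data types

Bag of words: $p(\mathbf{n}_i \mid L_i, \boldsymbol{\pi}_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{B}\boldsymbol{\pi}_i)$

Discrete features: $p(y_{i,1:R} \mid \boldsymbol{\pi}_i) = \prod_{r=1}^{R} \mathrm{Cat}(y_{ir} \mid \mathbf{B}^{(r)}\boldsymbol{\pi}_i)$

Question: What is the dual parameter? Why is it convenient?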


Marlin, Benjamin M. "Modeling user rating profiles for collaborative filtering." Advances in neural information processing systems. 2003.
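The difference between the unigram mixture and PLSI/LDA is where the mixing happens: a mixture picks one $\mathbf{b}_k$ per document, while PLSI and LDA use the convex combination $\mathbf{B}\boldsymbol{\pi}_i$. A minimal sketch, assuming `B` is stored as a $V \times K$ list of rows whose columns sum to 1 (toy numbers and names are hypothetical):

```python
import math


def word_distribution(B, pi):
    """B pi for B stored as V rows (B[v][k] = B_vk, columns sum to 1) and pi on the simplex."""
    return [sum(B_vk * p_k for B_vk, p_k in zip(row, pi)) for row in B]


def admixture_log_lik(tokens, B, pi):
    """log prod_l Cat(y_il | B pi_i): the PLSI likelihood, and LDA's given pi_i."""
    probs = word_distribution(B, pi)
    return sum(math.log(probs[y]) for y in tokens)


# Hypothetical V = 3 words, K = 2 topics (the columns of B are b_1 and b_2).
B = [[0.7, 0.1],
     [0.2, 0.1],
     [0.1, 0.8]]
print(word_distribution(B, [0.5, 0.5]))  # a convex combination of b_1 and b_2
print(admixture_log_lik([0, 2], B, [0.5, 0.5]))
```

With a one-hot $\boldsymbol{\pi}_i$ the likelihood reduces to that of a single mixture component; for example, `admixture_log_lik(tokens, B, [1.0, 0.0])` uses only $\mathbf{b}_1$.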


Gamma-Poisson Model

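LDA

Model: $p(\mathbf{n}_i \mid L_i, \boldsymbol{\pi}_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{B}\boldsymbol{\pi}_i)$

Prior: $\boldsymbol{\pi}_i \sim \mathrm{Dir}(\boldsymbol{\alpha})$

Constraints: $0 \le \pi_{ik}$, $\sum_k \pi_{ik} = 1$, $0 \le B_{vk}$, $\sum_v B_{vk} = 1$

GaP

Model: $p(\mathbf{n}_i \mid \mathbf{z}^+_i) = \prod_{v=1}^{V} \mathrm{Poi}(n_{iv} \mid \mathbf{b}_{v,:}^\top \mathbf{z}^+_i)$

Prior: $p(\mathbf{z}^+_i) = \prod_k \mathrm{Ga}(z^+_{ik} \mid \alpha_k, \beta_k)$

Constraints: $0 \le z^+_{ik}$, $0 \le B_{vk}$

Can use a sparsity-inducing prior (Murphy, Eq. 27.17)

GaP has only non-negativity constraints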
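GaP's weaker constraints describe, up to rescaling, the same likelihood as LDA's: given $L_i$, the Poisson-multinomial theorem turns GaP into $\mathrm{Mu}(\mathbf{n}_i \mid L_i, \cdot)$ with normalized rates, and those rates can always be written as $\mathbf{B}\boldsymbol{\pi}_i$ with a column-normalized $\mathbf{B}$. The sketch below (hypothetical names; it assumes every column of $\mathbf{B}$ has a positive sum) performs this rescaling. Only the likelihood side matches; the rescaling does not turn the Gamma prior into a Dirichlet one.

```python
def gap_rates(B, z):
    """Poisson rates b_{v,:}^T z_i^+ for each word v (B is V x K, z has length K)."""
    return [sum(B_vk * z_k for B_vk, z_k in zip(row, z)) for row in B]


def gap_to_simplex(B, z):
    """Rewrite the GaP rates as L * B_norm pi with LDA-style constraints.

    Returns (B_norm, pi, L): the columns of B_norm sum to 1, pi sums to 1, and
    L = sum_v rate_v is the expected document length under GaP.
    """
    V, K = len(B), len(z)
    col_sums = [sum(B[v][k] for v in range(V)) for k in range(K)]
    B_norm = [[B[v][k] / col_sums[k] for k in range(K)] for v in range(V)]
    scaled = [z[k] * col_sums[k] for k in range(K)]  # move column scale into z
    L = sum(scaled)
    pi = [s / L for s in scaled]
    return B_norm, pi, L


B = [[1.0, 0.0], [2.0, 3.0], [1.0, 1.0]]  # hypothetical non-negative B (V = 3, K = 2)
z = [0.5, 2.0]                            # hypothetical z_i^+
B_norm, pi, L = gap_to_simplex(B, z)
print(pi, L)                               # [0.2, 0.8] 10.0
```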


Non-negative matrix factorization

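Given a non-negative matrix $\mathbf{V}$, find non-negative matrix factors $\mathbf{W}, \mathbf{H}$ such that

$\mathbf{V} \approx \mathbf{W}\mathbf{H}$, i.e., $\mathbf{V}_{i,:} \approx \sum_k W_{ik} \mathbf{H}_{k,:}$

NMF can be viewed as GaP with prior $\alpha_k = \beta_k = 0$ (the Poisson likelihood corresponds to Lee and Seung's KL-divergence objective).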


Lee, D. D., and H. S. Seung. "Algorithms for non-negative matrix factorization." Advances in neural information processing systems.
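Because the GaP likelihood is Poisson, the matching NMF objective is the generalized KL divergence $D(\mathbf{V} \,\|\, \mathbf{W}\mathbf{H})$. Below is a minimal plain-Python sketch of Lee and Seung's multiplicative updates for that objective; it assumes a strictly positive $\mathbf{V}$ and a random positive initialization, and the function names are illustrative. The updates keep both factors non-negative, and Lee and Seung show that the divergence is non-increasing under them.

```python
import math
import random


def matmul(A, B):
    """Product of list-of-lists matrices A (n x r) and B (r x m)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kl_divergence(V, X):
    """Generalized KL divergence D(V || X) = sum V log(V / X) - V + X, with 0 log 0 = 0."""
    total = 0.0
    for v_row, x_row in zip(V, X):
        for v, x in zip(v_row, x_row):
            total += (v * math.log(v / x) if v > 0 else 0.0) - v + x
    return total


def nmf_kl(V, rank, iters=200, seed=0, eps=1e-12):
    """Multiplicative updates for min D(V || W H) with W, H >= 0.

    Returns W (n x rank), H (rank x m) and the divergence after each iteration.
    eps only guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    history = []
    for _ in range(iters):
        # H_aj <- H_aj * (sum_i W_ia V_ij / (WH)_ij) / sum_i W_ia
        WH = matmul(W, H)
        ratio = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        w_col = [sum(W[i][a] for i in range(n)) for a in range(rank)]
        H = [[H[a][j] * sum(W[i][a] * ratio[i][j] for i in range(n)) / (w_col[a] + eps)
              for j in range(m)] for a in range(rank)]
        # W_ia <- W_ia * (sum_j H_aj V_ij / (WH)_ij) / sum_j H_aj, using the new H
        WH = matmul(W, H)
        ratio = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        h_row = [sum(H[a][j] for j in range(m)) for a in range(rank)]
        W = [[W[i][a] * sum(H[a][j] * ratio[i][j] for j in range(m)) / (h_row[a] + eps)
              for a in range(rank)] for i in range(n)]
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history


V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [0.5, 1.0, 1.0], [3.0, 1.0, 2.0]]  # hypothetical data
W, H, history = nmf_kl(V, rank=2, iters=100)
print(history[0], history[-1])
```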


Latent Dirichlet Allocation (LDA)

Notation

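$\boldsymbol{\pi}_i \mid \boldsymbol{\alpha} \sim \mathrm{Dir}(\boldsymbol{\alpha})$ (1)

$q_{il} \mid \boldsymbol{\pi}_i \sim \mathrm{Cat}(\boldsymbol{\pi}_i)$ (2)

$\mathbf{b}_k \mid \boldsymbol{\gamma} \sim \mathrm{Dir}(\boldsymbol{\gamma})$ (3)

$y_{il} \mid q_{il} = k, \mathbf{B} \sim \mathrm{Cat}(\mathbf{b}_k)$ (4)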

Geometric interpretation

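Simplex: each document's word distribution $\mathbf{B}\boldsymbol{\pi}_i$ is a point in the convex hull of the topics $\mathbf{b}_1, \dots, \mathbf{b}_K$ inside the vocabulary simplex; since each token has its own topic $q_{il}$, the same word can be explained by different topics in different documents, which helps with ambiguity (polysemy)

Unidentifiable: topics are only determined up to a permutation of their labels; Labeled LDA ties topics to observed labels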


D. Blei et al. "Latent Dirichlet allocation." JMLR.

G. Heinrich. "Parameter estimation for text analysis."

D. Ramage et al. "Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora." EMNLP.

http://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
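Equations (1)-(4) read directly as a sampler. This sketch (hypothetical names; priors passed as lists; a fixed document length for simplicity; topics stored as $K$ rows $\mathbf{b}_k$) draws Dirichlet vectors by normalizing independent Gamma draws, a standard construction, and uses `random.choices` for the categorical draws.

```python
import random


def sample_dirichlet(alpha, rng):
    """Dir(alpha) draw via normalized independent Gamma(alpha_k, 1) variables."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_lda_corpus(num_docs, doc_length, alpha, gamma, seed=0):
    """Sample a toy corpus following Eqs. (1)-(4); ids are 0-based.

    alpha: length-K prior on pi_i; gamma: length-V prior on each b_k.
    Returns (docs, topics, pis, assignments); topics[k] is b_k.
    """
    rng = random.Random(seed)
    K, V = len(alpha), len(gamma)
    topics = [sample_dirichlet(gamma, rng) for _ in range(K)]         # (3) b_k ~ Dir(gamma)
    docs, pis, assignments = [], [], []
    for _ in range(num_docs):
        pi = sample_dirichlet(alpha, rng)                             # (1) pi_i ~ Dir(alpha)
        q = rng.choices(range(K), weights=pi, k=doc_length)            # (2) q_il ~ Cat(pi_i)
        y = [rng.choices(range(V), weights=topics[k])[0] for k in q]  # (4) y_il ~ Cat(b_k)
        docs.append(y)
        pis.append(pi)
        assignments.append(q)
    return docs, topics, pis, assignments


docs, topics, pis, assignments = generate_lda_corpus(
    num_docs=3, doc_length=10, alpha=[0.5, 0.5], gamma=[0.5] * 5, seed=1)
print(docs[0], pis[0])
```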


Evaluation: Perplexity

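The perplexity of a language model $q$ with respect to a language $p$ (both $p$ and $q$ are stochastic processes) is defined as

$\mathrm{perplexity}(p, q) = 2^{H(p, q)}$

where $H(p, q)$ is the cross-entropy

$H(p, q) = \lim_{N \to \infty} -\frac{1}{N} \sum_{y_{1:N}} p(y_{1:N}) \log_2 q(y_{1:N})$

Approximations

$N$ is finite

$p(y_{1:N}) = \delta_{y^*_{1:N}}(y_{1:N})$, i.e., $p$ is replaced by the empirical distribution of the observed test sequence $y^*_{1:N}$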
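Under these two approximations, perplexity is computed from a single observed sequence. A minimal sketch (names are hypothetical; `log2_q` is any function returning $\log_2 q(y_{1:N})$), using base-2 logs so that perplexity is $2^{H}$. A model that is uniform over $V$ words has perplexity exactly $V$.

```python
import math


def sequence_perplexity(observed, log2_q):
    """2 ** H with H = -(1/N) log2 q(y*_{1:N}) for one observed sequence of N tokens."""
    H = -log2_q(observed) / len(observed)
    return 2.0 ** H


def iid_log2_prob(word_probs):
    """log2 q(y_{1:N}) for a hypothetical model emitting tokens i.i.d. from word_probs."""
    def log2_q(seq):
        return sum(math.log2(word_probs[y]) for y in seq)
    return log2_q


print(sequence_perplexity([0, 1, 2, 3, 3], iid_log2_prob([0.25] * 4)))  # 4.0
print(sequence_perplexity([0, 1], iid_log2_prob([0.5, 0.25, 0.25])))     # 2 ** 1.5, about 2.83
```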


Evaluation: Perplexity

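$H(p, q) = -\frac{1}{N} \log_2 q(y^*_{1:N})$

Intuition: perplexity is a weighted average branching factor, i.e., the effective number of equally likely choices for the next token.

For the unigram model, with $N$ test documents:

$H = -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{L_i} \sum_{l=1}^{L_i} \log_2 q(y^*_{il})$

For LDA:

$H = -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{L_i} \log_2 p(y^*_{i,1:L_i})$

where $p(y^*_{i,1:L_i}) = \int \prod_{l=1}^{L_i} (\mathbf{B}\boldsymbol{\pi}_i)_{y^*_{il}} \, \mathrm{Dir}(\boldsymbol{\pi}_i \mid \boldsymbol{\alpha}) \, d\boldsymbol{\pi}_i$ is intractable, so it is approximated:

Use the variational evidence lower bound (ELBO)

Use annealed importance sampling

Use a validation set and a plug-in approximation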


H. Wallach et al. "Evaluation methods for topic models." ICML 2009.
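A sketch of the per-document formula with the plug-in approximation for LDA, where $\log_2 p(y^*_{i,1:L_i})$ is replaced by $\sum_l \log_2 (\mathbf{B}\hat{\boldsymbol{\pi}}_i)_{y^*_{il}}$. How $\mathbf{B}$ and $\hat{\boldsymbol{\pi}}_i$ are estimated is left open and is an assumption here (typically $\mathbf{B}$ from training data and $\hat{\boldsymbol{\pi}}_i$ from separate held-out tokens; estimating $\hat{\boldsymbol{\pi}}_i$ on the very tokens being scored makes the result optimistic). Names are illustrative.

```python
import math


def per_document_perplexity(docs, word_dist):
    """2 ** H with H = -(1/N) sum_i (1/L_i) sum_l log2 p_i(y*_il).

    word_dist(i) returns the predictive word distribution used for document i.
    """
    H = 0.0
    for i, doc in enumerate(docs):
        p = word_dist(i)
        H -= sum(math.log2(p[y]) for y in doc) / len(doc)
    return 2.0 ** (H / len(docs))


def plug_in_lda(B, pi_hats):
    """word_dist for the plug-in approximation: document i gets B pi_hat_i.

    B is V x K (B[v][k] = B_vk); pi_hats[i] is an estimate of pi_i obtained elsewhere.
    """
    return lambda i: [sum(B_vk * p_k for B_vk, p_k in zip(row, pi_hats[i])) for row in B]


docs = [[0, 1, 1], [2, 3]]          # hypothetical held-out documents
unigram = [0.4, 0.3, 0.2, 0.1]      # hypothetical unigram model
print(per_document_perplexity(docs, lambda i: unigram))
one_topic = [[p] for p in unigram]  # K = 1: the LDA plug-in reduces to the unigram model
print(per_document_perplexity(docs, plug_in_lda(one_topic, [[1.0], [1.0]])))
```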


Evaluation: Coherence

TODO


D. Newman et al. "Automatic evaluation of topic coherence." NAACL HLT 2010.


Inference

An exponential number of inference algorithms, obtained by combining choices along several axes:

Variational inference vs. sampling vs. both

Collapsed vs. non-collapsed

Online vs. stochastic vs. offline

Empirical Bayes vs. fully Bayesian

Other algorithms: expectation propagation, etc.
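As one concrete point in this design space (sampling, collapsed, offline, hyperparameters held fixed), here is a sketch of a collapsed Gibbs sampler for LDA in the style of Griffiths and Steyvers: $\boldsymbol{\pi}_i$ and $\mathbf{b}_k$ are integrated out and only the assignments $q_{il}$ are resampled. It assumes symmetric scalar priors `alpha` and `gamma`; all names are hypothetical.

```python
import random


def collapsed_gibbs_lda(docs, K, V, alpha, gamma, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric scalar priors alpha and gamma.

    Each q_il is resampled with probability proportional to
        (n_ik + alpha) * (n_kv + gamma) / (n_k + V * gamma),
    where v = y_il and all counts exclude token (i, l) itself.
    Returns the final assignments q and point estimates pi_hat (one row per
    document) and B_hat (V x K, columns sum to 1) computed from that sample.
    """
    rng = random.Random(seed)
    n_ik = [[0] * K for _ in docs]       # topic counts per document
    n_kv = [[0] * V for _ in range(K)]   # word counts per topic
    n_k = [0] * K                        # total tokens per topic

    def update(i, k, v, delta):
        n_ik[i][k] += delta
        n_kv[k][v] += delta
        n_k[k] += delta

    q = []
    for i, doc in enumerate(docs):       # random initial assignments
        q.append([rng.randrange(K) for _ in doc])
        for v, k in zip(doc, q[i]):
            update(i, k, v, +1)

    for _ in range(sweeps):
        for i, doc in enumerate(docs):
            for l, v in enumerate(doc):
                update(i, q[i][l], v, -1)    # remove the token's current topic
                weights = [(n_ik[i][t] + alpha) * (n_kv[t][v] + gamma) / (n_k[t] + V * gamma)
                           for t in range(K)]
                q[i][l] = rng.choices(range(K), weights=weights)[0]
                update(i, q[i][l], v, +1)    # add it back under the new topic

    pi_hat = [[(n_ik[i][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
              for i, doc in enumerate(docs)]
    B_hat = [[(n_kv[k][v] + gamma) / (n_k[k] + V * gamma) for k in range(K)]
             for v in range(V)]
    return q, pi_hat, B_hat


docs = [[0, 0, 1], [2, 3, 3], [0, 1, 2, 3]]  # hypothetical corpus, V = 4
q, pi_hat, B_hat = collapsed_gibbs_lda(docs, K=2, V=4, alpha=0.5, gamma=0.1, sweeps=20)
print(q, pi_hat)
```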


Inference: towards large scale

Algorithms

Online / stochastic

Sparsity

Spectral methods

Systems

Distributed: Yahoo-LDA, Petuum, Parameter-Server, etc.

GPU: BIDMach, etc.


Model Selection

Compute the evidence with AIS / ELBO

Cross-validation

Bayesian nonparametrics (e.g., hierarchical Dirichlet processes)


Teh et al. "Hierarchical Dirichlet processes." Journal of the American Statistical Association (2006).


Extensions of LDA

Correlation: correlated topic model (CTM)

Time series: dynamic topic model (DTM)

Syntax: LDA-HMM

Supervision: many

1D categorical label: SLDA (generative), DLDA (discriminative), MedLDA (regularized)

nD label: MR-LDA, random effects mixture of experts, conditional topic random field, Dirichlet multinomial regression LDA

K labels per document: labeled LDA

Labels per word: TagLDA

Structural: RTM


Restricted Boltzmann machines

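$p(\mathbf{h}, \mathbf{v} \mid \boldsymbol{\theta}) = \frac{1}{Z(\boldsymbol{\theta})} \prod_{r=1}^{R} \prod_{k=1}^{K} \psi_{rk}(v_r, h_k)$

where $\mathbf{h}$ and $\mathbf{v}$ are binary vectors.

Factorized posterior:

$p(\mathbf{h} \mid \mathbf{v}, \boldsymbol{\theta}) = \prod_k p(h_k \mid \mathbf{v}, \boldsymbol{\theta})$

Advantage: the model is symmetric, so $p(\mathbf{v} \mid \mathbf{h}, \boldsymbol{\theta})$ factorizes as well; both posterior inference (backward) and generating visible units from hidden ones (forward) are easy.

Exponential family harmonium (a harmonium is a two-layer undirected graphical model, UGM)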


Restricted Boltzmann machines

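Binary latent and binary visible units (other models exist, see Table 27.2)

$p(\mathbf{v}, \mathbf{h} \mid \boldsymbol{\theta}) = \frac{1}{Z(\boldsymbol{\theta})} \exp(-E(\mathbf{v}, \mathbf{h}; \boldsymbol{\theta}))$ (5)

$E(\mathbf{v}, \mathbf{h}; \boldsymbol{\theta}) = -\mathbf{v}^\top \mathbf{W} \mathbf{h}$ (bias terms omitted) (6)

$p(\mathbf{h} \mid \mathbf{v}, \boldsymbol{\theta}) = \prod_k \mathrm{Ber}(h_k \mid \mathrm{sigm}(\mathbf{w}_{:,k}^\top \mathbf{v}))$ (7)

$p(\mathbf{v} \mid \mathbf{h}, \boldsymbol{\theta}) = \prod_r \mathrm{Ber}(v_r \mid \mathrm{sigm}(\mathbf{w}_{r,:}^\top \mathbf{h}))$ (8)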
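The sign of the energy matters: with $E = -\mathbf{v}^\top \mathbf{W} \mathbf{h}$, normalizing $\exp(-E)$ over $\mathbf{h}$ gives exactly the product of Bernoullis in (7). The sketch below (bias-free, plain Python, hypothetical names and weights) checks this by brute-force enumeration on a tiny model and shows the block Gibbs step that the factorized conditionals (7) and (8) make cheap.

```python
import itertools
import math
import random


def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W):
    """Bias-free energy E(v, h) = -v^T W h."""
    return -sum(v[r] * W[r][k] * h[k] for r in range(len(v)) for k in range(len(h)))


def p_h_given_v(v, W):
    """p(h_k = 1 | v) = sigm(w_{:,k}^T v) for each hidden unit k."""
    return [sigm(sum(W[r][k] * v[r] for r in range(len(v)))) for k in range(len(W[0]))]


def p_v_given_h(h, W):
    """p(v_r = 1 | h) = sigm(w_{r,:}^T h) for each visible unit r."""
    return [sigm(sum(W[r][k] * h[k] for k in range(len(h)))) for r in range(len(W))]


def brute_force_p_h_given_v(v, W):
    """p(h | v) for all 2^K binary h, by normalizing exp(-E(v, h)) over h."""
    states = list(itertools.product([0, 1], repeat=len(W[0])))
    weights = [math.exp(-energy(v, h, W)) for h in states]
    total = sum(weights)
    return {h: w / total for h, w in zip(states, weights)}


def gibbs_step(v, W, rng):
    """One block Gibbs sweep v -> h -> v' using the factorized conditionals."""
    h = [1 if rng.random() < p else 0 for p in p_h_given_v(v, W)]
    v_new = [1 if rng.random() < p else 0 for p in p_v_given_h(h, W)]
    return h, v_new


W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]  # hypothetical 3 visible x 2 hidden weights
v = (1, 0, 1)
marg = p_h_given_v(v, W)
brute = brute_force_p_h_given_v(v, W)
for h, p in brute.items():
    factorized = math.prod(m if hk else 1.0 - m for m, hk in zip(marg, h))
    print(h, round(p, 6), round(factorized, 6))  # the two columns agree
```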


Restricted Boltzmann machines

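Goal: maximize the log-likelihood $\ell(\boldsymbol{\theta}) = \frac{1}{N} \sum_{i=1}^{N} \log p(\mathbf{v}_i \mid \boldsymbol{\theta})$, whose gradient is

$\nabla_{\mathbf{W}} \ell = \mathbb{E}_{p_{\mathrm{emp}}(\cdot \mid \boldsymbol{\theta})}[\mathbf{v}\mathbf{h}^\top] - \mathbb{E}_{p(\cdot \mid \boldsymbol{\theta})}[\mathbf{v}\mathbf{h}^\top]$

where $p_{\mathrm{emp}}(\mathbf{v}, \mathbf{h} \mid \boldsymbol{\theta}) = p_{\mathrm{emp}}(\mathbf{v})\, p(\mathbf{h} \mid \mathbf{v}, \boldsymbol{\theta})$. The first term is easy thanks to the factorized posterior; the second needs expectations under the model, which are intractable and are usually approximated by Gibbs sampling (e.g., contrastive divergence).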
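For a tiny bias-free RBM both expectations can be computed exactly by enumeration, which makes the gradient formula easy to check against finite differences. A minimal sketch (hypothetical names and data, feasible only for a handful of visible units); it uses $\sum_{\mathbf{h}} \exp(\mathbf{v}^\top \mathbf{W} \mathbf{h}) = \prod_k (1 + \exp(\mathbf{w}_{:,k}^\top \mathbf{v}))$ for the unnormalized marginal of $\mathbf{v}$.

```python
import itertools
import math


def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_inputs(v, W):
    """w_{:,k}^T v for every hidden unit k."""
    return [sum(W[r][k] * v[r] for r in range(len(v))) for k in range(len(W[0]))]


def log_unnorm_marginal(v, W):
    """log sum_h exp(v^T W h) = sum_k log(1 + exp(w_{:,k}^T v)) for a bias-free RBM."""
    return sum(math.log1p(math.exp(a)) for a in hidden_inputs(v, W))


def log_partition(W):
    """log Z by enumerating all 2^R visible states (only feasible for tiny R)."""
    states = itertools.product([0, 1], repeat=len(W))
    return math.log(sum(math.exp(log_unnorm_marginal(v, W)) for v in states))


def avg_log_lik(data, W):
    """(1/N) sum_i log p(v_i | W)."""
    return sum(log_unnorm_marginal(v, W) for v in data) / len(data) - log_partition(W)


def expected_vh(v, W):
    """E[v h^T | v] = v sigm(W^T v)^T, using the factorized posterior."""
    ph = [sigm(a) for a in hidden_inputs(v, W)]
    return [[v[r] * p for p in ph] for r in range(len(v))]


def exact_gradient(data, W):
    """Data term minus model term, the latter by exhaustive enumeration of v."""
    R, K = len(W), len(W[0])
    grad = [[0.0] * K for _ in range(R)]
    for v in data:                                   # E_{p_emp}[v h^T]
        for r, row in enumerate(expected_vh(v, W)):
            for k, x in enumerate(row):
                grad[r][k] += x / len(data)
    log_Z = log_partition(W)
    for v in itertools.product([0, 1], repeat=R):    # E_{p}[v h^T]
        p_v = math.exp(log_unnorm_marginal(v, W) - log_Z)
        for r, row in enumerate(expected_vh(v, W)):
            for k, x in enumerate(row):
                grad[r][k] -= p_v * x
    return grad


W = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]   # hypothetical 3 x 2 weights
data = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]     # hypothetical binary observations
print(exact_gradient(data, W))
```

In practice the model term is replaced by samples from a short Gibbs chain started at the data, as in contrastive divergence, while the data term stays exact.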


Conclusions

Why there is so much left to do:

Exponential number of inference algorithms

Exponential number of models

Exponential × exponential number of solutions

Application, evaluation, theory (e.g., spectral methods), etc.

Information retrieval practitioners and data miners need a way to find correct and fast solutions to their problems.
