Generative Topic Models for Community Analysis


Generative Topic Models for Community Analysis

Pilfered from: Ramesh Nallapati, http://www.cs.cmu.edu/~wcohen/10-802/lda-sep-18.ppt

2 / 57

Objectives

• Cultural literacy for ML:

– Q: What are “topic models”?

– A1: popular indoor sport for machine learning researchers

– A2: a particular way of applying unsupervised learning of Bayes nets to text

• Quick historical survey of some sample papers in the area

3 / 57

Outline

• Part I: Introduction to Topic Models

– Naive Bayes model
– Mixture models

• Expectation Maximization

– PLSA
– LDA

• Variational EM
• Gibbs sampling

• Part II: Topic Models for Community Analysis

– Citation modeling with PLSA
– Citation modeling with LDA
– Author-Topic model
– Author-Topic-Recipient model
– Modeling influence of citations
– Mixed-membership Stochastic Block Model

4 / 57

Introduction to Topic Models

• Multinomial Naïve Bayes

[Plate diagram: class C → words W1 W2 W3 … WN, repeated for each of M documents]

• For each document d = 1, …, M

• Generate c_d ~ Mult(· | π)

• For each position n = 1, …, N_d

• Generate w_n ~ Mult(· | β_{c_d})
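As a concreteness check, here is a minimal NumPy sketch of this sampling process; the sizes, the document-length distribution, and the symbols π and β are illustrative assumptions rather than anything specified on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M = 3, 1000, 100                        # illustrative sizes

pi = rng.dirichlet(np.ones(K))                # class prior pi
beta = rng.dirichlet(np.ones(V), size=K)      # per-class word distributions beta_c

docs = []
for d in range(M):
    N_d = rng.poisson(50) + 1                 # document length (outside the model)
    c_d = rng.choice(K, p=pi)                 # generate c_d ~ Mult(. | pi)
    w = rng.choice(V, size=N_d, p=beta[c_d])  # each w_n ~ Mult(. | beta_{c_d})
    docs.append((c_d, w))
```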

5 / 57

Introduction to Topic Models

• Naïve Bayes Model: Compact representation

[Plate diagrams: the unrolled model (C → W1 W2 W3 … WN, repeated M times) and its compact plate form (C → W inside an N plate, inside an M plate)]

6 / 57

Introduction to Topic Models

• Mixture model: unsupervised naïve Bayes model

[Plate diagram: latent class C → word W, inside N and M plates]

• Joint probability of words and classes:

P(c_d, w_1, …, w_{N_d}) = π_{c_d} ∏_n β_{w_n | c_d}

• But classes are not visible: treat the class as a latent variable Z and maximize the marginal likelihood

P(w_1, …, w_{N_d}) = ∑_z π_z ∏_n β_{w_n | z}
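A small sketch of how this marginal likelihood, and the E-step responsibilities that EM uses, can be computed stably in log space; the array shapes and names are assumptions for illustration:

```python
import numpy as np
from scipy.special import logsumexp

def doc_log_likelihood(words, log_pi, log_beta):
    """Marginal log-likelihood of one document:
    log P(w) = logsumexp_z [ log pi_z + sum_n log beta_{z, w_n} ].
    words: array of word ids; log_pi: (K,); log_beta: (K, V)."""
    per_class = log_pi + log_beta[:, words].sum(axis=1)
    return logsumexp(per_class)

def responsibilities(words, log_pi, log_beta):
    """E-step of EM: posterior P(z | w) over the document's hidden class."""
    per_class = log_pi + log_beta[:, words].sum(axis=1)
    return np.exp(per_class - logsumexp(per_class))
```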

7 / 57

Introduction to Topic Models

8 / 57

Introduction to Topic Models

• Probabilistic Latent Semantic Analysis Model

[Plate diagram: document index d → topic z → word w, for N positions in each of M documents]

• Select document d with probability P(d)

• For each position n = 1, …, N_d

• Generate z_n ~ Mult(· | θ_d)

• Generate w_n ~ Mult(· | β_{z_n})

Topic distribution: θ_d, one per training document
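A minimal sampling sketch of the PLSA story above; the sizes and the randomly initialized parameters (including the distribution p_d over training documents) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M = 5, 1000, 100                         # illustrative sizes

p_d = rng.dirichlet(np.ones(M))                # P(d): a distribution over training docs
theta = rng.dirichlet(np.ones(K), size=M)      # theta_d: topic mixture of each doc
beta = rng.dirichlet(np.ones(V), size=K)       # beta_z: word distribution of each topic

d = rng.choice(M, p=p_d)                       # select document d
for n in range(rng.poisson(50) + 1):           # N_d positions
    z = rng.choice(K, p=theta[d])              # z_n ~ Mult(. | theta_d)
    w = rng.choice(V, p=beta[z])               # w_n ~ Mult(. | beta_{z_n})
```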

9 / 57

Introduction to Topic Models

• Probabilistic Latent Semantic Analysis Model

– Learning using EM

– Not a complete generative model

• Has a distribution only over the training set of documents: no new document can be generated!

– Nevertheless, more realistic than the mixture model

• Documents can discuss multiple topics!

10 / 57

Introduction to Topic Models

• PLSA topics (TDT-1 corpus)

11 / 57

Introduction to Topic Models

12 / 57

Introduction to Topic Models

• Latent Dirichlet Allocation

[Plate diagram: topic z → word w, for N positions in each of M documents, with a per-document θ_d]

• For each document d = 1, …, M

• Generate θ_d ~ Dir(· | α)

• For each position n = 1, …, N_d

• Generate z_n ~ Mult(· | θ_d)

• Generate w_n ~ Mult(· | β_{z_n})
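The same kind of sampling sketch for LDA; all sizes and the hyperparameters α and η are illustrative (drawing β from a Dirichlet corresponds to the smoothed variant of the model):

```python
import numpy as np

rng = np.random.default_rng(1)
K, V, M, alpha, eta = 5, 2000, 50, 0.1, 0.01     # illustrative sizes/hyperparameters

beta = rng.dirichlet(np.full(V, eta), size=K)    # corpus-wide topics beta_k

corpus = []
for d in range(M):
    theta_d = rng.dirichlet(np.full(K, alpha))   # theta_d ~ Dir(alpha)
    N_d = rng.poisson(80) + 1
    z = rng.choice(K, size=N_d, p=theta_d)       # z_n ~ Mult(. | theta_d)
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_n ~ Mult(. | beta_{z_n})
    corpus.append(w)
```

Unlike the PLSA sketch, θ_d here is drawn fresh from the Dirichlet prior, so the loop generates documents that were never in any training set.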

13 / 57

Introduction to Topic Models

• Latent Dirichlet Allocation

– Overcomes the issues with PLSA

• Can generate any random document

– Parameter learning:

• Variational EM

– Numerical approximation using lower bounds

– Results in biased solutions

– Convergence has numerical guarantees

• Gibbs sampling

– Stochastic simulation

– Unbiased solutions

– Stochastic convergence

14 / 57

Introduction to Topic Models

• Variational EM for LDA

– Approximate the true posterior by a simpler, fully factorized distribution:

q(θ, z | γ, φ) = q(θ | γ) ∏_n q(z_n | φ_n)

– Maximize the resulting lower bound on the log-likelihood

• A convex function in each parameter!
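A sketch of the per-document mean-field updates in the style of Blei, Ng, and Jordan's variational inference for LDA, using the γ and φ above; treat it as a schematic, not a tuned implementation:

```python
import numpy as np
from scipy.special import digamma

def lda_e_step(words, alpha, beta, iters=50):
    """Mean-field updates for one document:
    q(theta, z) = q(theta | gamma) * prod_n q(z_n | phi_n).
    words: array of word ids; alpha: scalar Dirichlet parameter; beta: (K, V) topics."""
    K = beta.shape[0]
    N = len(words)
    gamma = alpha + np.full(K, N / K)            # standard initialization
    phi = np.full((N, K), 1.0 / K)
    for _ in range(iters):
        # phi_{n,k} proportional to beta_{k, w_n} * exp(digamma(gamma_k))
        log_phi = np.log(beta[:, words]).T + digamma(gamma)
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)          # gamma_k = alpha + sum_n phi_{n,k}
    return gamma, phi
```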

15 / 57

Introduction to Topic Models

• Gibbs sampling

– Applicable when the joint distribution is hard to evaluate but the conditional distributions are known

– The sequence of samples forms a Markov chain

– The stationary distribution of the chain is the joint distribution
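For LDA specifically, the collapsed Gibbs sampler of Griffiths and Steyvers resamples each token's topic from p(z_n = k | z_-n, w) ∝ (n_dk + α)(n_kw + η)/(n_k + Vη). A sketch of one sweep, with count-table initialization omitted:

```python
import numpy as np

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, eta, rng):
    """One sweep of collapsed Gibbs sampling for LDA.
    docs[d] is a list of word ids, z[d][n] the current topic of token n in doc d;
    n_dk, n_kw, n_k are doc-topic, topic-word, and topic counts and must be
    consistent with z on entry."""
    K, V = n_kw.shape
    for d, words in enumerate(docs):
        for n, w in enumerate(words):
            k = z[d][n]
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1   # remove this token
            # p(z_n = k | rest) ∝ (n_dk + alpha) * (n_kw + eta) / (n_k + V*eta)
            p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
            k = rng.choice(K, p=p / p.sum())
            z[d][n] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1   # add it back
```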

16 / 57

Introduction to Topic Models

• LDA topics

17 / 57

Introduction to Topic Models

• LDA’s view of a document

18 / 57

Introduction to Topic Models

• Perplexity comparison of various models

[Figure: perplexity curves for the Unigram, Mixture model, PLSA, and LDA models; lower is better]
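For reference, perplexity is computed from held-out log-likelihood as below; a uniform unigram model over V words scores exactly V, and better models score lower:

```python
import numpy as np

def perplexity(test_log_likelihood, test_token_count):
    """perplexity = exp(-(held-out log-likelihood) / (number of held-out tokens))."""
    return float(np.exp(-test_log_likelihood / test_token_count))
```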

19 / 57

Outline

• Part I: Introduction to Topic Models

– Naive Bayes model
– Mixture models

• Expectation Maximization

– PLSA
– LDA

• Variational EM
• Gibbs sampling

• Part II: Topic Models for Community Analysis

– Citation modeling with PLSA
– Citation modeling with LDA
– Author-Topic model
– Author-Topic-Recipient model
– Modeling influence of citations
– Mixed-membership Stochastic Block Model

20 / 57

Hyperlink modeling using PLSA

21 / 57

Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]

[Plate diagram: document d → topic z → word w (N positions) and d → topic z → citation c (L citations), in each of M documents]

• Select document d with probability P(d)

• For each position n = 1, …, N_d

• Generate z_n ~ Mult(· | θ_d)

• Generate w_n ~ Mult(· | β_{z_n})

• For each citation j = 1, …, L_d

• Generate z_j ~ Mult(· | θ_d)

• Generate c_j ~ Mult(· | γ_{z_j}), where γ_z is a per-topic distribution over cited documents
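A sketch of the citation block, extending the earlier PLSA sampler; the name gamma for the per-topic citation distribution is my labeling, since the slide's symbol was lost:

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 5, 100                                # illustrative sizes

theta = rng.dirichlet(np.ones(K), size=M)    # theta_d as in the PLSA sketch
gamma = rng.dirichlet(np.ones(M), size=K)    # gamma_z: per-topic citation distribution

d = rng.choice(M)                            # a training document
L_d = rng.poisson(10) + 1                    # number of citations in d
for j in range(L_d):
    z = rng.choice(K, p=theta[d])            # z_j ~ Mult(. | theta_d)
    c = rng.choice(M, p=gamma[z])            # c_j ~ Mult(. | gamma_{z_j})
```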

22 / 57

Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]

[Plate diagram repeated: d → z → w (N positions) and d → z → c (L citations), in M documents]

PLSA likelihood:

ℒ = ∑_d ∑_w n(d, w) log ∑_z P(z | d) P(w | z)

New likelihood, with a second term for citations:

ℒ = ∑_d [ ∑_w n(d, w) log ∑_z P(z | d) P(w | z) + ∑_c n(d, c) log ∑_z P(z | d) P(c | z) ]

Learning using EM

23 / 57

Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]

Heuristic: maximize a convex combination of the two log-likelihoods,

ℒ = α · (content term) + (1 − α) · (citation term)

where 0 ≤ α ≤ 1 determines the relative importance of content and hyperlinks.

24 / 57

Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]

• Classification performance

[Figure: classification performance as the weight α shifts between hyperlink and content information]

25 / 57

Hyperlink modeling using LDA

26 / 57

Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS 2004]

[Plate diagram: topic z → word w (N positions) and topic z → citation c (L citations), with a per-document θ_d, in each of M documents]

• For each document d = 1, …, M

• Generate θ_d ~ Dir(· | α)

• For each position n = 1, …, N_d

• Generate z_n ~ Mult(· | θ_d)

• Generate w_n ~ Mult(· | β_{z_n})

• For each citation j = 1, …, L_d

• Generate z_j ~ Mult(· | θ_d)

• Generate c_j ~ Mult(· | γ_{z_j})


Learning using variational EM
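A compact sketch of how this differs from the Cohn-Hofmann model: θ_d now has a Dirichlet prior, so unseen documents can be generated; gamma is again an assumed name for the per-topic citation distribution, and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M, alpha = 5, 1000, 100, 0.1            # illustrative sizes
beta = rng.dirichlet(np.ones(V), size=K)      # per-topic word distributions
gamma = rng.dirichlet(np.ones(M), size=K)     # per-topic citation distributions

theta_d = rng.dirichlet(np.full(K, alpha))    # theta_d ~ Dir(alpha): a real prior now
words = [rng.choice(V, p=beta[rng.choice(K, p=theta_d)]) for _ in range(80)]
cites = [rng.choice(M, p=gamma[rng.choice(K, p=theta_d)]) for _ in range(12)]
```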

27 / 57

Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS 2004]

28 / 57

Author-Topic Model for Scientific Literature

29 / 57

Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]

[Plate diagram: author choice x → topic z → word w, for N positions in each of M documents]

• For each author a = 1, …, A

• Generate θ_a ~ Dir(· | α)

• For each topic k = 1, …, K

• Generate β_k ~ Dir(· | η)

• For each document d = 1, …, M

• For each position n = 1, …, N_d

• Generate author x ~ Unif(a_d), uniform over the authors a_d of document d

• Generate z_n ~ Mult(· | θ_x)

• Generate w_n ~ Mult(· | β_{z_n})

[Plate diagram: observed author set a_d → sampled author x → topic z → word w, with θ_a over A authors and β_k over K topics]
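A sampling sketch of the Author-Topic story; the author-set sizes, document lengths, and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
A, K, V, M, alpha, eta = 20, 5, 1000, 30, 0.1, 0.01   # illustrative sizes

theta = rng.dirichlet(np.full(K, alpha), size=A)  # theta_a ~ Dir(alpha), one per author
beta = rng.dirichlet(np.full(V, eta), size=K)     # beta_k ~ Dir(eta), one per topic

corpus = []
for d in range(M):
    a_d = rng.choice(A, size=rng.integers(1, 4), replace=False)  # observed author set
    doc = []
    for n in range(rng.poisson(60) + 1):
        x = rng.choice(a_d)                   # x ~ Unif(a_d)
        z = rng.choice(K, p=theta[x])         # z_n ~ Mult(. | theta_x)
        w = rng.choice(V, p=beta[z])          # w_n ~ Mult(. | beta_{z_n})
        doc.append((x, z, w))
    corpus.append((a_d, doc))
```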

30 / 57

Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]

Learning: Gibbs sampling

[Plate diagram repeated from the previous slide]

31 / 57

Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]

• Topic-Author visualization

32 / 57

Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]

33 / 57

Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]

Gibbs sampling

34 / 57

Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]

• Datasets

– Enron email data

• 23,488 messages between 147 users

– McCallum’s personal email

• 23,488(?) messages with 128 authors

35 / 57

Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]

• Topic Visualization: Enron set

36 / 57

Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]

• Topic Visualization: McCallum’s data

37 / 57

Modeling Citation Influences

38 / 57

Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]

• Citation influence model

39 / 57

Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]

• Citation influence graph for the LDA paper

40 / 57

Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]

• Words in the LDA paper assigned to citations

41 / 57

Link-PLSA-LDA: Topic Influence in Blogs (ICWSM 2008)

Ramesh Nallapati, Amr Ahmed, Eric Xing

42 / 57
