Technical Foundations and Inference Topic Model Tutorial - Part 2 Hannover, 2016 Arnim Bleier [email protected]

Technical Foundations and Inference - topicmodels.infotopicmodels.info/ckling/tmt/part2.pdf · Technical Foundations and Inference Topic Model Tutorial - Part 2 Hannover, 2016 ...

May 22, 2020

Why should we care?

● Probabilistic Graphical Models are a general framework for representing assumptions about the (in)dependence between random variables.

● Knowing the inner workings of Topic Models helps us better interpret their results.

Outline

● Generative storylines & Plates

● Gibbs sampling

● Simple Topic Model

● Latent Dirichlet Allocation

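Recap: Conference dinner

[Figure: previous observations labeled $k_1$, $k_2$, $k_1$, $k_3$; the next observation is marked "?"]

Probabilities:

● $5/10$ for $k_1$
● $2/10$ for $k_2$
● $3/10$ for $k_3$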

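Recap: Conference dinner

General case:

$$p(x_{N+1} = k \mid x_1, \ldots, x_N) = \frac{n_k}{Z}$$

where $n_k$ is the number of observations in $k$ and $Z$ is the normalizing constant ($Z = \sum_j n_j = N$, so $Z = 10$ for the dinner).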
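A minimal Python sketch of the general case, using the dinner counts from the recap (5, 2 and 3 of 10 observations for $k_1$, $k_2$, $k_3$). The function names are hypothetical; the normalizing constant is simply the total count.

```python
# Sketch under stated assumptions; function names are hypothetical.
import random

def predictive_probs(counts):
    """Return p(next observation = k) = n_k / Z for every group k.

    Z, the normalizing constant, is the total number of observations.
    """
    z = sum(counts.values())
    return {k: n / z for k, n in counts.items()}

def sample_next(counts, rng):
    """Draw the group of the next observation with probability proportional to n_k."""
    groups = list(counts)
    weights = [counts[k] for k in groups]
    return rng.choices(groups, weights=weights, k=1)[0]

# Conference dinner: 5 observations at k1, 2 at k2, 3 at k3.
dinner = {"k1": 5, "k2": 2, "k3": 3}
probs = predictive_probs(dinner)
print(probs)  # {'k1': 0.5, 'k2': 0.2, 'k3': 0.3}

rng = random.Random(0)
print(sample_next(dinner, rng))
```

Note that a group with no observations gets probability zero under this rule.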

Generative Storyline

[Diagram: generative storyline for observation $N+1$]

Generative Storyline

[Diagram: generative storyline for observation $N+1$, with a prior]
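The exact prior on this slide is not recoverable from the transcript. A common choice, assumed here, is a symmetric Dirichlet prior, which behaves like `alpha` pseudo-observations added to every group, so that groups with no observations yet keep a nonzero probability. The function name and the empty table `k4` are hypothetical.

```python
# Sketch assuming a symmetric Dirichlet prior; names are hypothetical.
def predictive_with_prior(counts, alpha):
    """Predictive probability of the next observation under the assumed prior.

    p(x_{N+1} = k | x_1..x_N) = (n_k + alpha) / (N + K * alpha),
    where N is the total count and K the number of groups.
    """
    n_total = sum(counts.values())
    z = n_total + len(counts) * alpha  # normalizing constant
    return {k: (n + alpha) / z for k, n in counts.items()}

# Dinner counts plus a hypothetical empty table k4.
counts = {"k1": 5, "k2": 2, "k3": 3, "k4": 0}
probs = predictive_with_prior(counts, alpha=1.0)
# With alpha = 1: Z = 10 + 4 = 14, so k1 gets 6/14 and the empty k4 gets 1/14.
```

Setting `alpha=0` recovers the count-only rule $n_k / Z$ from the recap.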

Plate Notation

[Diagram: the storyline in plate notation, with plate index $i$]
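A plate stands for variables that are replicated, so in code a plate becomes a loop. The sketch below forward-samples a storyline of this kind under an assumed setup (a Dirichlet prior over three groups and N categorical observations); since the slide's diagram is not recoverable, the names `theta`, `generate` and `sample_dirichlet` are hypothetical.

```python
# Sketch under an assumed Dirichlet-categorical setup; names are hypothetical.
import random

def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [g / total for g in draws]

def generate(n_obs, alphas, rng):
    """Forward-sample the storyline: theta outside the plate, x_i inside it."""
    # Outside the plate: theta is drawn once and shared by all observations.
    theta = sample_dirichlet(alphas, rng)
    # Plate over i = 1..N: the same draw is repeated N times, and the x_i
    # are conditionally independent given theta.
    xs = [rng.choices(range(len(theta)), weights=theta, k=1)[0] for _ in range(n_obs)]
    return theta, xs

rng = random.Random(42)
theta, xs = generate(n_obs=10, alphas=[1.0, 1.0, 1.0], rng=rng)
print(theta, xs)
```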

Gibbs sampling

Iteratively sample each variable conditioned on all other variables.

Gibbs sampling

[Figure: samples of $X$ over the iterations, starting from the prior and approaching the stationary distribution]
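To illustrate the idea on a target with simple full conditionals, the sketch below swaps in a standard bivariate normal with correlation `rho`; it is not the model from these slides. Each step samples one variable conditioned on the current value of the other, and after the early iterations (which still reflect the deliberately poor starting point) the samples approximate the stationary distribution. The function name, start values and burn-in length are hypothetical choices.

```python
# Illustrative Gibbs sampler on a swapped-in target; names are hypothetical.
import math
import random

def gibbs_bivariate_normal(rho, n_iter, rng, x0=5.0, y0=-5.0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals: x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
    """
    sd = math.sqrt(1.0 - rho ** 2)
    x, y = x0, y0  # deliberately poor starting point
    samples = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)  # sample x conditioned on y
        y = rng.gauss(rho * x, sd)  # sample y conditioned on the new x
        samples.append((x, y))
    return samples

rng = random.Random(1)
samples = gibbs_bivariate_normal(rho=0.8, n_iter=20000, rng=rng)
burned = samples[1000:]  # drop early iterations before the chain settles
mean_x = sum(x for x, _ in burned) / len(burned)
print(mean_x)
```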

Simple Topic Model

Generative Storyline:

● Draw a global distribution over topics.
● For each document $d$, draw a topic $z_d$.

Simple Topic Model*

Generative Storyline:

● For each topic $k$, draw a distribution over the vocabulary.
● For each document $d$, draw the words $\mathbf{w}_d$ from the topic indexed by $z_d$.

* Mixture of Unigrams
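The full storyline of the simple topic model can be forward-sampled in a few lines. The slides' priors are not recoverable, so this sketch assumes symmetric Dirichlet priors with hypothetical hyperparameters `alpha` and `beta`, a fixed document length, and a toy vocabulary; the key property it shows is that each document has one topic and all its words come from that topic.

```python
# Sketch assuming Dirichlet priors and a toy vocabulary; names are hypothetical.
import random

def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [g / total for g in draws]

def generate_corpus(n_docs, doc_len, n_topics, vocab, alpha, beta, rng):
    """Forward-sample a corpus from the simple topic model (mixture of unigrams)."""
    # Draw a global distribution over topics.
    theta = sample_dirichlet([alpha] * n_topics, rng)
    # For each topic k, draw a distribution over the vocabulary.
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    z, docs = [], []
    for _ in range(n_docs):
        # For each document d, draw a single topic z_d ...
        z_d = rng.choices(range(n_topics), weights=theta, k=1)[0]
        # ... and draw all of its words from the topic indexed by z_d.
        docs.append(rng.choices(vocab, weights=phi[z_d], k=doc_len))
        z.append(z_d)
    return theta, phi, z, docs

rng = random.Random(7)
vocab = ["vote", "party", "game", "team", "market", "price"]  # hypothetical toy vocabulary
theta, phi, z, docs = generate_corpus(
    n_docs=5, doc_len=8, n_topics=2, vocab=vocab, alpha=1.0, beta=0.5, rng=rng
)
for z_d, words in zip(z, docs):
    print(z_d, words)
```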

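Simple Topic Model*

Likelihood of document $d$ being generated from topic $k$:

$$p(\mathbf{w}_d \mid z_d = k) \approx \prod_{i=1}^{N_d} p(w_{di} \mid z_d = k)$$

where $w_{di}$ is the $i$-th of the $N_d$ words in document $d$.

* Approximation not considering the dependence of words within documents.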
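In code, the product over words is best computed as a sum of logs to avoid underflow for long documents. The sketch assumes (the slide does not say) that $p(w \mid k)$ is estimated from topic-word counts with a symmetric pseudo-count `beta`, and that the counts are not updated while scanning the document, which is the approximation marked *. The function name and the toy counts are hypothetical.

```python
# Sketch under an assumed count-based estimate; names and data are hypothetical.
import math
from collections import Counter

def doc_log_likelihood(doc, k, topic_word, topic_total, beta, vocab_size):
    """Approximate log p(w_d | z_d = k) as the sum of log p(w_di | z_d = k).

    p(w | k) is estimated as (n_{k,w} + beta) / (n_k + V * beta); the counts
    are held fixed while scanning the document (words treated as independent).
    """
    denom = topic_total[k] + vocab_size * beta
    return sum(math.log((topic_word[k][w] + beta) / denom) for w in doc)

# Hypothetical counts for two topics over a 4-word vocabulary.
topic_word = [Counter({"vote": 4, "party": 3}), Counter({"game": 5, "team": 2})]
topic_total = [7, 7]
doc = ["vote", "party", "vote"]
ll0 = doc_log_likelihood(doc, 0, topic_word, topic_total, beta=0.5, vocab_size=4)
ll1 = doc_log_likelihood(doc, 1, topic_word, topic_total, beta=0.5, vocab_size=4)
print(ll0, ll1)
```

`Counter` returns 0 for words a topic has never generated, so unseen words only get the `beta` pseudo-count.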

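Simple Topic Model*

We need to know from which topic $k$ document $d$ was generated ($z_d = ?$):

$$p(z_d = k \mid \mathbf{w}_d) \propto p(z_d = k) \prod_{i=1}^{N_d} p(w_{di} \mid z_d = k)$$

Here $p(z_d = k)$ comes from the global distribution over topics, and the product is the likelihood of document $d$ under topic $k$.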

We can now sample the membership for document d and update the model.
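Putting the two factors together gives one Gibbs step per document: remove $d$ from the counts, score every topic, sample $z_d$, and add $d$ back. The sketch assumes the same count-based word estimate as before plus a hypothetical pseudo-count `alpha` on the number of documents per topic; it illustrates the idea under those assumptions and is not the tutorial's code.

```python
# Sketch of a collapsed Gibbs sweep for the mixture of unigrams; names are hypothetical.
import math
import random
from collections import Counter

def gibbs_sweep(docs, z, n_topics, vocab_size, alpha, beta, rng):
    """Resample the topic of every document once; z is updated in place.

    p(z_d = k | rest) is proportional to
        (m_k + alpha) * prod_i (n_{k,w_di} + beta) / (n_k + V * beta),
    computed with document d's own counts removed.
    """
    m = [0] * n_topics                           # documents per topic
    n_kw = [Counter() for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                         # total words per topic
    for doc, k in zip(docs, z):
        m[k] += 1
        n_kw[k].update(doc)
        n_k[k] += len(doc)

    for d, doc in enumerate(docs):
        old = z[d]
        # Remove document d from the model.
        m[old] -= 1
        n_kw[old].subtract(doc)
        n_k[old] -= len(doc)
        # Unnormalized log-probability of each topic.
        logp = []
        for k in range(n_topics):
            denom = n_k[k] + vocab_size * beta
            ll = sum(math.log((n_kw[k][w] + beta) / denom) for w in doc)
            logp.append(math.log(m[k] + alpha) + ll)
        # Exponentiate stably and sample the new membership.
        top = max(logp)
        weights = [math.exp(lp - top) for lp in logp]
        new = rng.choices(range(n_topics), weights=weights, k=1)[0]
        # Update the model with document d under its new topic.
        z[d] = new
        m[new] += 1
        n_kw[new].update(doc)
        n_k[new] += len(doc)
    return z

rng = random.Random(3)
docs = [  # hypothetical toy corpus, vocabulary size 4
    ["vote", "party", "vote", "party"],
    ["game", "team", "game", "team"],
    ["party", "vote", "vote"],
    ["team", "game", "team"],
]
z = [rng.randrange(2) for _ in docs]
for _ in range(50):
    gibbs_sweep(docs, z, n_topics=2, vocab_size=4, alpha=1.0, beta=0.1, rng=rng)
print(z)
```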

Latent Dirichlet Allocation

Generative Storyline:

Document-specific distribution over topics.
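The LDA storyline diagram is not recoverable, so this sketch assumes the standard LDA storyline with symmetric Dirichlet priors (hypothetical hyperparameters `alpha`, `beta`). Compared with the simple topic model, each document draws its own distribution over topics, and every word gets its own topic assignment instead of one topic per document.

```python
# Sketch of the standard LDA storyline under assumed priors; names are hypothetical.
import random

def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [g / total for g in draws]

def generate_lda_corpus(n_docs, doc_len, n_topics, vocab, alpha, beta, rng):
    """Forward-sample a corpus from LDA."""
    # For each topic k, draw a distribution over the vocabulary.
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    thetas, z, docs = [], [], []
    for _ in range(n_docs):
        # Document-specific distribution over topics.
        theta_d = sample_dirichlet([alpha] * n_topics, rng)
        # Each word i gets its own topic z_di ...
        z_d = rng.choices(range(n_topics), weights=theta_d, k=doc_len)
        # ... and is drawn from that topic's distribution over the vocabulary.
        words = [rng.choices(vocab, weights=phi[k], k=1)[0] for k in z_d]
        thetas.append(theta_d)
        z.append(z_d)
        docs.append(words)
    return phi, thetas, z, docs

rng = random.Random(11)
vocab = ["vote", "party", "game", "team", "market", "price"]  # hypothetical toy vocabulary
phi, thetas, z, docs = generate_lda_corpus(
    n_docs=4, doc_len=6, n_topics=3, vocab=vocab, alpha=0.5, beta=0.5, rng=rng
)
for z_d, words in zip(z, docs):
    print(z_d, words)
```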

Latent Dirichlet Allocation

Likelihood of word i in document d being generated from topic k.
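The formula on this slide is not recoverable. The sketch below assumes the standard collapsed Gibbs update for LDA, in which the weight of topic $k$ for word $i$ in document $d$ combines how often $d$ already uses $k$ with how often $k$ generates that word (all counts excluding the token itself). Word ids, helper names and hyperparameters are hypothetical.

```python
# Sketch of a standard collapsed Gibbs update for LDA; names are hypothetical.
import random

def sample_token_topic(d, w, n_dk, n_kw, n_k, alpha, beta, vocab_size, rng):
    """Draw a new topic for one token (word id w in document d).

    The token's current assignment must already be removed from the counts.
    p(z_di = k | rest) is proportional to
        (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
    """
    n_topics = len(n_k)
    weights = [
        (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
        for k in range(n_topics)
    ]
    return rng.choices(range(n_topics), weights=weights, k=1)[0]

def lda_gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, vocab_size, rng):
    """One sweep over every token; z and the count tables are updated in place."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove token (d, i) from the counts.
            n_dk[d][k] -= 1
            n_kw[k][w] -= 1
            n_k[k] -= 1
            k = sample_token_topic(d, w, n_dk, n_kw, n_k, alpha, beta, vocab_size, rng)
            # Add it back under its new topic.
            z[d][i] = k
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

rng = random.Random(0)
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]  # hypothetical corpus of word ids
K, V = 2, 4
# Random initial assignments and the matching count tables.
z = [[rng.randrange(K) for _ in doc] for doc in docs]
n_dk = [[0] * K for _ in docs]
n_kw = [[0] * V for _ in range(K)]
n_k = [0] * K
for d, doc in enumerate(docs):
    for w, k in zip(doc, z[d]):
        n_dk[d][k] += 1
        n_kw[k][w] += 1
        n_k[k] += 1
for _ in range(100):
    lda_gibbs_sweep(docs, z, n_dk, n_kw, n_k, 0.5, 0.1, V, rng)
print(z)
```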

Associated Press Topics (Simple Topic Model)

Associated Press Topics (LDA)

Conclusions

● Topic Models can be formulated within the wider framework of Probabilistic Graphical Models.

● Different versions of Topic Models can be formulated.

● More complex models are not necessarily better.

● However, more complex models can help to express assumptions about the dataset.

Thank you!
