Transcript
Page 1:
Page 2:

British Museum Library, London. Picture courtesy: flickr

Page 3:

Courtesy: Wikipedia

Page 4:

Topic Models and the Role of Sampling

Barnan Das

Page 5:

British Museum Library, London. Picture courtesy: flickr

Page 6:

Topic Modeling

• Methods for automatically organizing, understanding, searching, and summarizing large electronic archives.
• Uncover hidden topical patterns in collections.
• Annotate documents according to topics.
• Use the annotations to organize, summarize, and search.

Page 7:

Topic Modeling

NIH Grants Topic Map 2011. NIH Map Viewer (https://app.nihmaps.org)

Page 8:

Topic Modeling Applications

• Information retrieval.

• Content-based image retrieval.

• Bioinformatics.

Page 9:

Overview of this Presentation

• Latent Dirichlet allocation (LDA)

• Approximate posterior inference
  • Gibbs sampling

• Paper
  • Fast collapsed Gibbs sampling for LDA

Page 10:

Latent Dirichlet Allocation

David Blei's Talk, Machine Learning Summer School, Cambridge, 2009

D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.

Page 11:

Probabilistic Model

• Generative probabilistic modeling
  • Treats data as observations.
  • Contains hidden variables.
  • The hidden variables reflect the thematic structure of the collection.

• Infer the hidden structure using posterior inference
  • Discovering the topics in the collection.

• Place new data into the estimated model
  • Situating new documents into the estimated topic structure.
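To make the generative story concrete, here is a minimal Python/NumPy sketch of the process the hidden variables encode; the toy sizes and the symbols θ, φ, z, α, β (which match the notation used later in the talk) are illustrative assumptions, not part of the slides.

```python
import numpy as np

def generate_corpus(D=100, N=50, K=5, W=1000, alpha=0.1, beta=0.01, seed=0):
    """Draw a toy corpus from the LDA generative model."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.full(W, beta), size=K)   # K topics: distributions over W words
    docs = []
    for d in range(D):
        theta = rng.dirichlet(np.full(K, alpha))    # topic proportions for document d
        z = rng.choice(K, size=N, p=theta)          # hidden per-word topic assignments
        words = np.array([rng.choice(W, p=phi[k]) for k in z])
        docs.append(words)                          # only the words are observed
    return docs
```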

Page 12:

Intuition

Page 13:

Generative Model

Page 14:

Posterior Distribution

• Only documents are observable.

• Infer the underlying topic structure:
  • The topics that generated the documents.
  • For each document, the distribution over topics.
  • For each word, the topic that generated it.

• Algorithmic challenge: finding the conditional distribution of all the latent variables given the observations.

Page 15:

LDA as Graphical Model

[Plate diagram of LDA: Dirichlet priors over the per-document topic proportions and the per-topic word distributions; multinomial draws for the topic assignments and the observed words.]

Page 16:

Posterior Distribution

• From a collection of documents W, infer:
  • Per-word topic assignments z_{d,n}
  • Per-document topic proportions θ_d
  • Per-corpus topic distributions φ_k

• Use posterior expectation to perform different tasks.

Page 17:

Posterior Distribution

• Evaluate P(z | W): the posterior distribution over the assignments of words to topics.

• From it, θ and φ can be estimated.

Page 18:

Computing P(z|W)

• Involves evaluating a probability distribution over a large discrete space.

• The contribution of each z_{d,n} depends on:
  • All the other assignments z_{-(d,n)}.
  • N_k^{W_{d,n}}: the number of times word W_{d,n} has been assigned to topic k.
  • N_d^k: the number of times a word from document d has been assigned to topic k.

• Sample from the target distribution using MCMC (count bookkeeping sketched below).
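A minimal Python/NumPy sketch of that bookkeeping (names such as init_counts, N_wk, and N_kd are hypothetical, chosen for illustration): a random initial assignment z and the count statistics the sampler will update.

```python
import numpy as np

def init_counts(docs, K, W, seed=0):
    """Random initial topic assignments and the counts the Gibbs sampler uses:
    N_wk[w, k] = times word w is assigned topic k,
    N_kd[d, k] = times a word in document d is assigned topic k,
    N_k[k]     = total words assigned topic k, N_d[d] = length of document d."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    N_wk = np.zeros((W, K), dtype=np.int64)
    N_kd = np.zeros((D, K), dtype=np.int64)
    N_k = np.zeros(K, dtype=np.int64)
    N_d = np.array([len(doc) for doc in docs], dtype=np.int64)
    z = []
    for d, doc in enumerate(docs):
        zd = rng.integers(0, K, size=len(doc))
        for w, k in zip(doc, zd):
            N_wk[w, k] += 1
            N_kd[d, k] += 1
            N_k[k] += 1
        z.append(zd)
    return z, N_wk, N_kd, N_k, N_d
```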

Page 19:

Approximate Posterior Inference: Gibbs Sampling

C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.

Iain Murray's Talk, Machine Learning Summer School, Cambridge, 2009

Page 20:

Overview

• When exact inference is intractable.

• Standard sampling techniques have limitations:
  • They cannot handle all kinds of distributions.
  • They cannot handle high-dimensional data.

• MCMC techniques do not have these limitations.

• Markov chain: for random variables x^(1), …, x^(M),

  p(x^(m+1) | x^(1), …, x^(m)) = p(x^(m+1) | x^(m)),   m ∈ {1, …, M−1}

Page 21:

Gibbs Sampling

• Target distribution: p(x) = p(x_1, …, x_M).

• Choose the initial state of the Markov chain: {x_i : i = 1, …, M}.

• Replace x_i by a value drawn from the conditional distribution p(x_i | x_{-i}).
  • x_i: the ith component of x.
  • x_{-i}: x_1, …, x_M with x_i omitted.

• This process is repeated for each of the variables in turn.

• Repeat the whole cycle for however many samples are needed (a small example follows below).
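As a small concrete example of this cycle (not from the talk), a Gibbs sampler for a two-dimensional Gaussian with correlation ρ, where both conditionals are exact univariate normals:

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_samples=10_000, seed=0):
    """Gibbs sampling for a 2-D standard normal with correlation rho.
    Each cycle replaces x1 with a draw from p(x1 | x2) and then x2 with a
    draw from p(x2 | x1); both conditionals are N(rho * other, 1 - rho**2)."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0                       # initial state of the Markov chain
    sd = np.sqrt(1.0 - rho ** 2)
    samples = np.empty((n_samples, 2))
    for m in range(n_samples):
        x1 = rng.normal(rho * x2, sd)       # draw x1 ~ p(x1 | x2)
        x2 = rng.normal(rho * x1, sd)       # draw x2 ~ p(x2 | x1)
        samples[m] = x1, x2
    return samples
```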

Page 22:

Why Gibbs Sampling?

• Compared to other MCMC techniques, Gibbs sampling:
  • Is easy to implement.
  • Requires little memory.
  • Is competitive in speed and performance.

Page 23:

Gibbs Sampling for LDA

• The full conditional distribution is:

P(z_{d,n} = k \mid z_{-(d,n)}, W) \;\propto\; \frac{N^{W_{d,n}}_{k,-(d,n)} + \beta}{N_{k,-(d,n)} + W\beta} \;\cdot\; \frac{N^{k}_{d,-(d,n)} + \alpha}{N_{d} + K\alpha}

The first factor is the probability of word W_{d,n} under topic k; the second factor is the probability of topic k in document d.

Written with the normalization constant Z made explicit (Z is the sum of the unnormalized terms over the K topics):

P(z_{d,n} = k \mid z_{-(d,n)}, W) \;=\; \frac{1}{Z} \cdot \frac{\bigl(N^{W_{d,n}}_{k,-(d,n)} + \beta\bigr)\bigl(N^{k}_{d,-(d,n)} + \alpha\bigr)}{\bigl(N_{k,-(d,n)} + W\beta\bigr)\bigl(N_{d} + K\alpha\bigr)},
\qquad
Z \;=\; \sum_{k=1}^{K} \frac{\bigl(N^{W_{d,n}}_{k,-(d,n)} + \beta\bigr)\bigl(N^{k}_{d,-(d,n)} + \alpha\bigr)}{\bigl(N_{k,-(d,n)} + W\beta\bigr)\bigl(N_{d} + K\alpha\bigr)}
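A minimal Python/NumPy sketch of one sweep of this update (the helper name and the count arrays N_wk, N_kd, N_k from the earlier initialization sketch are assumptions for illustration; the (N_d + Kα) factor is dropped because it is constant in k and cancels when the probabilities are normalized):

```python
import numpy as np

def collapsed_gibbs_sweep(docs, z, N_wk, N_kd, N_k, alpha, beta, rng):
    """Resample every z_{d,n} from its full conditional, updating counts in place."""
    W, K = N_wk.shape
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k_old = z[d][n]
            # remove the current assignment: these are the "-(d,n)" counts
            N_wk[w, k_old] -= 1
            N_kd[d, k_old] -= 1
            N_k[k_old] -= 1
            # full conditional P(z_{d,n} = k | z_{-(d,n)}, W), up to normalization
            p = (N_wk[w] + beta) / (N_k + W * beta) * (N_kd[d] + alpha)
            p /= p.sum()
            k_new = rng.choice(K, p=p)
            # add the new assignment back into the counts
            N_wk[w, k_new] += 1
            N_kd[d, k_new] += 1
            N_k[k_new] += 1
            z[d][n] = k_new
    return z
```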

Page 24:

Gibbs Sampling for LDA

• Target distribution: P(z_{d,n} = k | z_{-(d,n)}, W).

• Initial state of the Markov chain: each z_n is given a value in {1, 2, …, K}.

• The chain is run for a number of iterations.

• In each iteration a new state is found by sampling every z_{d,n} from P(z_{d,n} = k | z_{-(d,n)}, W).

Page 25:

Gibbs Sampling for LDA

• Subsequent samples are taken after an appropriate lag to ensure that their autocorrelation is low.

• This is collapsed Gibbs sampling.

• From a single sample z, point estimates of φ and θ are computed from the counts that z induces:

\hat{\phi}^{(w)}_{k} \;=\; \frac{N^{(w)}_{k} + \beta}{N_{k} + W\beta},
\qquad
\hat{\theta}^{(k)}_{d} \;=\; \frac{N^{(k)}_{d} + \alpha}{N_{d} + K\alpha}
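In code, the two point estimates are one line each from the count matrices (same assumed array names as in the earlier sketches):

```python
# Point estimates from a single sample z, via the counts it induces.
# Each column of phi_hat (one per topic) and each row of theta_hat
# (one per document) sums to 1.
phi_hat = (N_wk + beta) / (N_k + W * beta)                  # shape (W, K)
theta_hat = (N_kd + alpha) / (N_d[:, None] + K * alpha)     # shape (D, K)
```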

Page 26:

Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation

Ian Porteous, David Newman, Alexander Ihler, Arthur Asuncion, Padhraic Smyth, Max Welling

University of California, Irvine

Page 27:

FastLDA: Graphical Representation

Page 28:

FastLDA: Segments

• Sequence of bounds on the normalization constant Z: Z_1, …, Z_K.

• Z_1 ≥ Z_2 ≥ … ≥ Z_K = Z.

• Several segments s_k^l, …, s_k^K for each topic k.

• 1st segment: a conservative estimate of the probability of the topic, given the upper bound Z_k on the true normalization factor Z.

• Subsequent segments: corrections for the missing probability mass of a topic, given the improved bound.

Page 29:

FastLDA: Segments

Page 30:

Upper Bounds for Z

• Find a sequence of improving bounds on the normalization constant.

• Z defined in terms of component vectors.

• Hölder's inequality is used to construct the initial upper bound.

• Bound intelligently improved for each topic.
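The slide does not say which Hölder exponents are used, so the following check (an assumption for illustration) uses the p = q = 2 case, i.e. Cauchy-Schwarz: writing Z as an inner product of two component vectors, the product of their norms is a valid initial upper bound.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 50
a = rng.random(K) + 0.1          # e.g. a_k = N_kd + alpha  (document-topic part)
b = rng.random(K) + 0.1          # e.g. b_k = (N_wk + beta) / (N_k + W*beta)
Z = float(a @ b)                                            # true normalization constant
Z_upper = float(np.linalg.norm(a) * np.linalg.norm(b))      # Hoelder (p = q = 2) bound
assert Z <= Z_upper + 1e-12
```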

Page 31:

Fast LDA Algorithm

• Algorithm (sketched in code below):
  • Sort topics in decreasing order of N_d^k.
  • Draw u ~ Uniform[0, 1].
  • For each topic, in sorted order:
    • Calculate the lengths of its segments.
    • With each topic visited, the bound Z_k is improved.
    • When the accumulated segment length exceeds u: return that topic.

• Complexity:
  • No operation costs more than O(K log K).
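A hedged Python sketch of the segment scheme above, not the paper's implementation: topics are visited in sorted order; each visit lays down that topic's conservative segment plus correction segments for the topics already visited; the first segment that covers u identifies the sample. The remaining mass of Z is bounded here with Cauchy-Schwarz (the p = q = 2 case of Hölder's inequality) and the correction loop is kept naive, so this illustrates the idea rather than the paper's optimized bookkeeping.

```python
import numpy as np

def fastlda_style_sample(a, b, rng):
    """Sample topic k with probability proportional to a[k] * b[k] without
    computing the full normalizer up front.  a and b are positive vectors,
    e.g. a[k] = N_kd + alpha and b[k] = (N_wk + beta) / (N_k + W * beta)."""
    K = len(a)
    order = np.argsort(-a)                        # visit large-count topics first
    a, b = a[order], b[order]
    tail_a2 = np.cumsum((a ** 2)[::-1])[::-1]     # sum_{j >= k} a_j^2
    tail_b2 = np.cumsum((b ** 2)[::-1])[::-1]

    u = rng.random()
    consumed = 0.0                                # total segment length laid out so far
    p = np.empty(K)
    prev_Z = np.inf
    for k in range(K):
        p[k] = a[k] * b[k]
        # upper bound Z_k >= Z: exact mass so far + Cauchy-Schwarz bound on the rest
        tail = np.sqrt(tail_a2[k + 1] * tail_b2[k + 1]) if k + 1 < K else 0.0
        Z_k = p[: k + 1].sum() + tail             # Z_1 >= Z_2 >= ... >= Z_K = Z
        # correction segments for topics already visited (bound has improved)
        for j in range(k):
            consumed += p[j] * (1.0 / Z_k - 1.0 / prev_Z)
            if u < consumed:
                return order[j]
        # conservative segment for topic k itself
        consumed += p[k] / Z_k
        if u < consumed:
            return order[k]
        prev_Z = Z_k
    return order[K - 1]                           # numerical safety net
```

With a and b built from the current counts, every topic ends up owning total length p_k / Z, so the draw is exact; the savings come from u usually falling in one of the first, largest-count topics.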

Page 32:

Experiments

• Four large datasets:
  • NIPS full papers
  • Enron emails
  • NY Times news articles
  • PubMed abstracts

• Hyperparameters: β = 0.01 and α = 2/K.

• Computations run on workstations with dual Xeon 3.0 GHz processors.

• Code compiled with gcc version 3.4.

Page 33:

Results

Speedup: 5-8 times.

Page 34:

Results

• Speedup relatively insensitive to number of documents in the corpus.

Page 35:

Results

• A large Dirichlet parameter α smooths the distribution of topics within a document.

• FastLDA then needs to visit and compute segments for more topics before drawing a sample, so the speedup is smaller.

Page 36:

Discussions

Page 37:

Discussions

• Other domains.

• Other sampling techniques.

• Distributions other than the Dirichlet.

• Parallel computation.
  • Newman et al., "Scalable parallel topic models".

• Deciding on the value of K.

• Choices of bounds.

• Reason behind choosing these datasets.

• Are the values mentioned in the paper magic numbers?

• Why were words with count < 10 discarded?

• Assigning weights to words.

Page 38:
Page 39:

Backup Slides

Page 40:

Dirichlet Distribution

• The Dirichlet distribution is an exponential family distribution over the simplex, i.e., positive vectors that sum to one.

• The Dirichlet is conjugate to the multinomial. Given a multinomial observation, the posterior distribution of θ is again a Dirichlet.
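A small numeric illustration of both properties (a sketch, not from the slides): a Dirichlet draw lies on the simplex, and after a multinomial observation the posterior is a Dirichlet whose parameters are the prior parameters plus the observed counts.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])        # Dirichlet parameters over a 3-simplex
theta = rng.dirichlet(alpha)             # positive components that sum to one
assert np.all(theta > 0) and np.isclose(theta.sum(), 1.0)

counts = rng.multinomial(100, theta)     # a multinomial observation
alpha_post = alpha + counts              # conjugacy: posterior is Dirichlet(alpha + counts)
```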