Neural Models for Documents with Metadata
Dallas Card, Chenhao Tan, Noah A. Smith
July 18, 2018
Transcript
Page 1:

Neural Models for Documents with Metadata

Dallas Card, Chenhao Tan, Noah A. Smith

July 18, 2018

Page 2:

Outline

Main points of this talk:

1. Introducing Scholar [1]: a neural model for documents with metadata

Background (LDA, SAGE, SLDA, etc.)
Model and related work
Experiments and results

2. Power of neural variational inference for interactive modeling

[1] Sparse Contextual Hidden and Observed Language Autoencoder

Page 3:

Latent Dirichlet Allocation

Blei, Ng, and Jordan. Latent Dirichlet Allocation. JMLR. 2003.
David Blei. Probabilistic topic models. Comm. ACM. 2012.
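The LDA plate diagram shown on this slide is not reproduced in the transcript; as a refresher, LDA's generative process can be sketched numerically as follows (all sizes and hyperparameter values here are illustrative placeholders, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy sizes: V-word vocabulary, K topics, D documents.
V, K, D, doc_len = 50, 3, 4, 20
alpha, eta = 0.1, 0.01          # Dirichlet hyperparameters

# Topic-word distributions: beta_k ~ Dirichlet(eta)
beta = rng.dirichlet(np.full(V, eta), size=K)          # shape (K, V)

docs = []
for _ in range(D):
    theta = rng.dirichlet(np.full(K, alpha))           # doc-topic proportions
    z = rng.choice(K, size=doc_len, p=theta)           # topic per token
    w = [rng.choice(V, p=beta[k]) for k in z]          # word per token
    docs.append(w)
```

Each document is thus a mixture of topics, and each topic a distribution over the vocabulary.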

Page 4:

Types of metadata

Date or time

Author(s)

Rating

Sentiment

Ideology

etc.


Page 5:

Variations and extensions

Author topic model (Rosen-Zvi et al., 2004)

Supervised LDA (SLDA; McAuliffe and Blei, 2008)

Dirichlet multinomial regression (Mimno and McCallum, 2008)

Sparse additive generative models (SAGE; Eisenstein et al., 2011)

Structural topic model (Roberts et al., 2014)

...

Page 6:

Desired features of model

Fast, scalable inference.

Easy modification by end-users.

Incorporation of metadata:
Covariates: features that influence the text (as in SAGE).
Labels: features to be predicted along with the text (as in SLDA).

Possibility of sparse topics.

Incorporation of additional prior knowledge.

→ Use variational autoencoder (VAE) style inference (Kingma and Welling, 2014)


Page 11:

Desired outcome

Coherent groupings of words (something like topics), with offsets for observed metadata

Encoder to map from documents to latent representations

Classifier to predict labels from the latent representation


Pages 14-26: Model

[Model diagram, built up incrementally across these slides]

Generator network: p(words | θi) = fg(θi)

Encoder network: q(θi | words) = fe(words)

θi = softmax(ri), with ri(s) = µq + ε(s) ⊙ σq and ε(s) ∼ N(0, I)

ELBO = Eq[log p(words | θi)] − DKL[q(θi | words) ‖ p(θi)]

Approximating the expectation with S Monte Carlo samples of ri:

ELBO ≈ (1/S) ∑s=1..S log p(words | ri(s)) − DKL[q(ri | words) ‖ p(ri)]

Labels yi and covariates ci are then added, so the generator and encoder condition on (words, ci, yi).

Srivastava and Sutton, 2017; Miao et al., 2016
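The reparameterized ELBO estimate from these slides can be condensed into a minimal numerical sketch (the shapes, weights, and toy generator below are illustrative assumptions, not the talk's actual network):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative sizes: K topics, V vocabulary words.
K, V = 5, 30
B = rng.normal(size=(K, V))                  # toy topic-word weight matrix
word_counts = rng.integers(0, 3, size=V)     # bag-of-words for one document

# Pretend these came from the encoder f_e(words): q(r_i | words) = N(mu, diag(sigma^2))
mu = rng.normal(size=K)
log_sigma = rng.normal(scale=0.1, size=K)
sigma = np.exp(log_sigma)

# Reparameterization: r^(s) = mu + eps^(s) * sigma, eps ~ N(0, I),
# so gradients could flow through mu and sigma in a real implementation.
S = 8
recon_terms = []
for _ in range(S):
    eps = rng.normal(size=K)
    r = mu + eps * sigma
    theta = softmax(r)                       # theta_i = softmax(r_i)
    p_w = softmax(theta @ B)                 # toy generator: p(word | theta_i)
    recon_terms.append(word_counts @ np.log(p_w))

# Closed-form KL between q = N(mu, diag(sigma^2)) and the N(0, I) prior
kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma)

elbo = np.mean(recon_terms) - kl             # Monte Carlo ELBO estimate
```

The single-sample version of this estimate is what is optimized per mini-batch in VAE-style inference.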

Page 27:

Scholar

Generator network:

p(word | θi, ci) = softmax(d + θiᵀ B(topic) + ciᵀ B(cov))

Optionally include interactions between topics and covariates

p(yi | θi, ci) = fy(θi, ci)

Encoder:

µi = fµ(words, ci, yi)

log σi = fσ(words, ci, yi)

Optional incorporation of word vectors to embed the input
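As a rough illustration of the generator equation above, the output word distribution combines a background term with topic and covariate deviations (the sizes, weight values, and one-hot covariate encoding here are made-up placeholders):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)

# Illustrative sizes: K topics, C covariates, V words.
K, C, V = 5, 2, 30
d = rng.normal(size=V)                # background log word frequencies
B_topic = rng.normal(size=(K, V))     # per-topic deviations from the background
B_cov = rng.normal(size=(C, V))       # per-covariate deviations

theta_i = rng.dirichlet(np.ones(K))   # document-topic proportions
c_i = np.array([1.0, 0.0])            # e.g. a one-hot tone covariate

# p(word | theta_i, c_i) = softmax(d + theta_i^T B_topic + c_i^T B_cov)
p_word = softmax(d + theta_i @ B_topic + c_i @ B_cov)
```

Because the covariate contribution is additive in log space, B(cov) plays the same role as SAGE's sparse deviation terms.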


Page 31:

Optimization

Stochastic optimization using mini-batches of documents

Tricks from Srivastava and Sutton, 2017:
Adam optimizer with a high learning rate to bypass mode collapse
Batch-norm layers to avoid divergence

Annealing away from the batch-norm output to keep results interpretable
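One plausible way to implement the annealing mentioned above is to interpolate between the batch-normed and raw logits over training; the linear schedule here is an assumption for illustration, not the talk's exact recipe:

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    """Normalize each column of a mini-batch to zero mean, unit variance."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def annealed_logits(x, step, total_steps):
    """Interpolate from batch-normed logits (early in training, for
    stable optimization) toward the raw logits (late in training, so
    the learned weights stay interpretable on their natural scale)."""
    w = min(step / total_steps, 1.0)
    return (1.0 - w) * batchnorm(x) + w * x
```

At step 0 the model sees only the batch-normed output; by the end of the schedule the batch-norm contribution has been annealed away entirely.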

Page 32:

Output of Scholar

B(topic), B(cov): coherent groupings of positive and negative deviations from the background (∼ topics)

fµ, fσ: encoder network mapping from words to topics: θi = softmax(fe(words, ci, yi, ε))

fy: classifier mapping from θi to labels: y = fy(θi, ci)


Page 35:

Evaluation

1. Performance as a topic model, without metadata (perplexity, coherence)

2. Performance as a classifier, compared to SLDA

3. Exploratory data analysis
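Topic coherence in this setting is typically an automatic co-occurrence score such as NPMI over each topic's top words; the slide does not specify the exact metric or reference corpus, so the following is a generic sketch with a toy corpus:

```python
import numpy as np
from itertools import combinations

def npmi_coherence(top_words, docs, eps=1e-12):
    """Average NPMI over pairs of a topic's top words, using
    document-level co-occurrence (higher = more coherent)."""
    n = len(docs)
    def p(*ws):
        return sum(all(w in d for w in ws) for d in docs) / n
    scores = []
    for wi, wj in combinations(top_words, 2):
        pij = p(wi, wj)
        if pij == 0.0:
            scores.append(-1.0)                 # words never co-occur
            continue
        pmi = np.log(pij / (p(wi) * p(wj) + eps))
        scores.append(pmi / (-np.log(pij) + eps))
    return float(np.mean(scores))

# Toy corpus: documents represented as sets of word types
docs = [{"border", "patrol", "agents"},
        {"border", "patrol", "visa"},
        {"visa", "students", "citizenship"}]

coherent = npmi_coherence(["border", "patrol"], docs)      # near 1.0
incoherent = npmi_coherence(["agents", "students"], docs)  # -1.0
```

NPMI ranges from -1 (words never co-occur) to 1 (words always co-occur), which is why it is a common stand-in for human judgments of topic quality.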

Pages 36-41: Quantitative results: basic model

[Bar charts, built up incrementally: perplexity (0-2000), coherence (0-0.2), and sparsity (0-0.5) for LDA, SAGE, NVDM, Scholar, Scholar+wv, and Scholar+sparsity]

IMDB dataset (Maas, 2011)

Page 42:

Classification results

[Bar chart: accuracy (0.5-1.0) for LR, SLDA, Scholar (labels), and Scholar (covariates)]

IMDB dataset (Maas, 2011)

Page 43:

Exploratory Data Analysis

Data: Media Frames Corpus (Card et al., 2015)

A collection of thousands of news articles annotated in terms of tone and framing

Relevant metadata: year of publication, newspaper, etc.

Page 44:

Tone as a label

Topics ordered by p(pro-immigration | topic), from 0 to 1:

arrested charged charges agents operation
state gov benefits arizona law bill bills
bush border president bill republicans
labor jobs workers percent study wages
asylum judge appeals deportation court
visas visa applications students citizenship
boat desert died men miles coast haitian
english language city spanish community

Page 45:

Tone as a covariate, with interactions

Base topics | Anti-immigration | Pro-immigration (one topic per line, columns left to right)

ice customs agency criminal customs detainees detention
population born percent jobs million illegals english newcomers
judge case court guilty guilty charges man asylum court judge
patrol border miles patrol border died authorities desert
licenses drivers card foreign sept visas green citizenship card
island story chinese smuggling federal island school ellis
guest worker workers bill border house workers tech skilled
benefits bill welfare republican california law welfare students

Page 46:

Conclusions

Variational autoencoders (VAEs) provide a powerful framework for latent variable modeling

We use the VAE framework to create a customizable model for documents with metadata

We obtain comparable performance with enhanced flexibility and scalability

Code is available: www.github.com/dallascard/scholar