ChronoSAGE: Diversifying Topic Modeling Chronologically
Tomonari MASADA, NAGASAKI University, [email protected]


Jun 26, 2015



Tomonari Masada

Slides for the poster presentation in WAIM 2014
Transcript
Page 1

ChronoSAGE: Diversifying Topic Modeling Chronologically

Tomonari MASADA, NAGASAKI University

[email protected]

Page 2

Problem

• Find research trends

• Present them in a readable manner

Solution

• Extract trending words at each epoch

• Display them chronologically

Page 3

Method

• SAGE (Sparse Additive Generative models) [Eisenstein+ 11]

– Represent each word probability as a product of factors
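For reference, in SAGE a topic's word distribution is a log-linear combination of a corpus-wide background log-frequency vector and a sparse topic-specific deviation vector, normalized over the vocabulary. The symbols $m$ (background) and $\eta_k$ (deviation for topic $k$) below are our notation, not taken from these slides:

```latex
p(w \mid k) = \frac{\exp(m_w + \eta_{k,w})}{\sum_{w'} \exp(m_{w'} + \eta_{k,w'})}
            \propto \exp(m_w)\,\exp(\eta_{k,w})
```

Exponentiating the sum turns it into a product of factors, which is the "product of factors" on this slide; the denominator only normalizes.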

Page 4

ChronoSAGE

• Use SAGE for our chronological analysis of time-stamped documents, e.g., academic papers

• Represent each word probability as a product of four factors
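A minimal sketch of how four factors (corpus-wide background, per-topic background, per-epoch background, per-topic trends) could combine into one word distribution. It assumes, as in SAGE, that each factor is a log-space vector over the vocabulary and that the four are added and then normalized with a softmax. The array names `m`, `eta_topic`, `eta_epoch`, `eta_trend` and their shapes are hypothetical illustrations, not the author's implementation, and parameter inference is not shown.

```python
import numpy as np


def chrono_word_dist(m, eta_topic, eta_epoch, eta_trend, k, t):
    """Return p(w | topic k, epoch t) over the vocabulary.

    Assumed (hypothetical) shapes:
      m         : (V,)       corpus-wide background log-frequencies
      eta_topic : (K, V)     per-topic background deviations
      eta_epoch : (T, V)     per-epoch background deviations
      eta_trend : (K, T, V)  per-topic, per-epoch trend deviations
    """
    # Adding in log space is the same as multiplying the four exponentiated factors.
    logits = m + eta_topic[k] + eta_epoch[t] + eta_trend[k, t]
    # Subtract the max before exponentiating for numerical stability.
    logits = logits - logits.max()
    probs = np.exp(logits)
    return probs / probs.sum()


# Toy example: 2 topics, 3 epochs, 5-word vocabulary.
rng = np.random.default_rng(0)
V, K, T = 5, 2, 3
m = np.log(np.full(V, 1.0 / V))
eta_topic = rng.normal(scale=0.5, size=(K, V))
eta_epoch = rng.normal(scale=0.5, size=(T, V))
eta_trend = np.zeros((K, T, V))
eta_trend[0, 2, 3] = 2.0  # word 3 "trends" in topic 0 at epoch 2

p = chrono_word_dist(m, eta_topic, eta_epoch, eta_trend, k=0, t=2)
print(p.round(3), p.sum())
```

Keeping the factors additive in log space means a positive trend deviation for one word raises its probability only within that topic and epoch, while the other three factors stay shared.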

Page 5

[Figure: corpus-wide background and per-topic background factors]

Page 6

[Figure: per-epoch background and per-topic trends factors]

Page 7

words sorted by per-epoch background probabilities (TDT4)

t=0 edt paralymp lebanon 32nd wild-card u.s china

t=1 kippur 10-13 lebanon china palestinian text join

t=2 10-14 10-16 10-18 10-15 10-19 10-17 10-20

t=3 10-24 10-23 10-22 10-25 10-21 10-26 10-27

t=4 10-29 10-28 10-31 10-30 11-3 leipzig lebanon

t=5 11-10 11-8 11-9 11-6 11-7 11-5 convuls

t=6 11-17 11-16 11-11 11-14 11-15 11-12 11-13

t=7 11-18 11-19 11-24 11-22 11-23 11-20 11-21

t=8 11-25 11-27 11-28 11-26 11-30 11-29 seclus

Page 8

words sorted by per-epoch background probabilities (TDT4)

t=9 12-8 12-6 12-5 12-7 12-3 537-vote 12-4

t=10 12-12 12-15 12-14 12-10 12-13 12-11 12-9

t=11 12-17 12-18 12-21 12-20 12-19 12-22 12-16

t=12 12-24 12-28 12-29 12-23 12-27 12-26 12-25

t=13 309 tabasco 2001 1-5 vy 12-0 free-agent

t=14 presid-elect’s 1-12 1-8 1-11 1-9 1-10 1-7

t=15 1-14 1-13 1-19 1-18 1-17 1-16 1-15

t=16 1-21 1-26 1-25 1-22 1-20 1-23 1-24

t=17 1-28 1-31 1-30 1-27 1-29 dawosi bhuj

Page 9

Evaluation (1)

• SAGE and ChronoSAGE outperform LDA in terms of PMI (pointwise mutual information).

– We used the entire English Wikipedia to compute PMI.

Page 10

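PMI

$$\mathrm{PMI}(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\,p(w_j)},$$

where $p(w_i, w_j)$ is the probability that words $w_i$ and $w_j$ co-occur, and $p(w_i)$, $p(w_j)$ are the probabilities of the individual words, all estimated from English Wikipedia.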
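A minimal sketch of PMI-based topic coherence, using the standard pairwise definition log p(wi, wj) / (p(wi) p(wj)) averaged over all pairs of a topic's top words. It assumes co-occurrence is counted at the document level (two words co-occur if they appear in the same document) and that pairs never seen together are skipped; the slides do not say how co-occurrence was counted on Wikipedia or how pair scores were aggregated. The function name `topic_pmi` and the toy corpus are hypothetical.

```python
import math
from itertools import combinations


def topic_pmi(top_words, documents):
    """Mean PMI over all pairs of top_words, using document-level co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            continue  # log(0) is undefined; skip pairs that never co-occur
        scores.append(math.log(p_ij / (prob(wi) * prob(wj))))
    return sum(scores) / len(scores) if scores else 0.0


# Toy reference corpus standing in for Wikipedia (hypothetical).
docs = [
    ["election", "vote", "senate"],
    ["election", "vote", "recount"],
    ["football", "goal", "league"],
    ["football", "league", "vote"],
]
print(topic_pmi(["election", "vote"], docs))      # log(0.5 / (0.5 * 0.75))
print(topic_pmi(["election", "football"], docs))  # no co-occurrence, so 0.0
```

Higher scores mean the top words co-occur more often than chance in the reference corpus, which is why PMI is used as a proxy for topic readability.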

Page 11
Page 12
Page 13
Page 14

Evaluation (2)

• ChronoSAGE can extract chronological trends for each topic as top-K word lists.

– Plain SAGE cannot do this.
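A minimal sketch of turning trend factors into chronological top-K word lists for one topic. It assumes the per-topic trend deviations are stored as an array of shape (K, T, V) and that words are ranked by their deviation at each epoch; the names `eta_trend`, `vocab`, and `chronological_topk` and the toy values are hypothetical. Ranking the rows of a per-epoch background array of shape (T, V) the same way would give lists of the same form as the TDT4 tables.

```python
import numpy as np


def chronological_topk(eta_trend, vocab, topic, k):
    """For each epoch t, return the k words with the largest trend deviation
    for the given topic, in descending order."""
    lists = []
    for row in eta_trend[topic]:       # row: (V,) deviations at one epoch
        top = np.argsort(-row)[:k]     # indices of the k largest values
        lists.append([vocab[i] for i in top])
    return lists


vocab = ["recount", "ballot", "court", "vote", "senate"]
eta_trend = np.zeros((1, 3, len(vocab)))
eta_trend[0, 0] = [0.1, 0.9, 0.0, 0.5, 0.2]
eta_trend[0, 1] = [1.2, 0.3, 0.4, 0.0, 0.1]
eta_trend[0, 2] = [0.2, 0.1, 1.5, 0.3, 0.0]

for t, words in enumerate(chronological_topk(eta_trend, vocab, topic=0, k=2)):
    print(f"t={t}", " ".join(words))
# t=0 ballot vote
# t=1 recount court
# t=2 court vote
```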

Page 15
Page 16
Page 17
Page 18