Automatic Keyphrase Extraction via Topic Decomposition. Authors: Zhiyuan Liu, Wenyi Huang, Yabin Zheng and Maosong Sun (2010, ACM). Presenter: WU, MIN-CONG, Intelligent Database Systems Lab.

Dec 30, 2015


Branden Hood

Page 1: Automatic Keyphrase Extraction via Topic Decomposition

Intelligent Database Systems Lab

Presenter: WU, MIN-CONG

Authors: Zhiyuan Liu, Wenyi Huang, Yabin Zheng and Maosong Sun

2010, ACM

Page 2

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3

Motivation
• Existing graph-based ranking methods for keyphrase extraction compute only a single importance score for each word, via a single random walk.
• Both documents and words can be represented as mixtures of semantic topics, which motivates measuring word importance separately for each topic.

Page 4

Objectives
• We build a Topical PageRank (TPR) on the word graph to measure word importance with respect to different topics.
• Given the topic distribution of the document, we then calculate the ranking scores of words and extract the top-ranked ones as keyphrases.

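Page 5

Methodology - Building Topic Interpreters
• LDA is trained on a corpus with Dirichlet hyperparameters $\alpha$ and $\beta$; its distributions are estimated with an inference method such as Gibbs sampling.
• LDA output (topic-word): $\phi$, where $\phi^{(z)}$ is the word distribution of topic $z$, i.e. $\Pr(w \mid z) \in \phi^{(z)} \in \phi$.
• LDA output (document-topic): $\theta$, where $\theta^{(d)}$ is the topic distribution of document $d$, i.e. $\Pr(z \mid d) \in \theta^{(d)} \in \theta$.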
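The slide names Gibbs sampling as one way to obtain $\phi$ and $\theta$. The following is a minimal collapsed Gibbs sampler sketch, assuming documents are already mapped to integer word ids; the function name `lda_gibbs` and the default $\alpha$, $\beta$, iteration count and seed are illustrative choices, not values from the paper.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (phi, theta) with phi[z][w] estimating Pr(w|z) and
    theta[d][z] estimating Pr(z|d).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dz = [[0] * K for _ in docs]        # tokens of document d assigned to topic z
    n_zw = [[0] * V for _ in range(K)]    # occurrences of word w assigned to topic z
    n_z = [0] * K                         # tokens assigned to topic z overall
    assign = []                           # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        topics = []
        for w in doc:
            z = rng.randrange(K)
            topics.append(z)
            n_dz[d][z] += 1
            n_zw[z][w] += 1
            n_z[z] += 1
        assign.append(topics)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove this token from the counts before resampling it.
                n_dz[d][z] -= 1
                n_zw[z][w] -= 1
                n_z[z] -= 1
                # Full conditional of the token's topic, up to a constant.
                weights = [
                    (n_dz[d][k] + alpha) * (n_zw[k][w] + beta) / (n_z[k] + V * beta)
                    for k in range(K)
                ]
                z = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = z
                n_dz[d][z] += 1
                n_zw[z][w] += 1
                n_z[z] += 1

    # Smoothed point estimates from the final sample.
    phi = [[(n_zw[k][w] + beta) / (n_z[k] + V * beta) for w in range(V)] for k in range(K)]
    theta = [
        [(n_dz[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta
```

Every row of `phi` and `theta` sums to 1 by construction. For a corpus the size of the Wikipedia snapshot used in the experiments, a tested LDA implementation would normally replace a pure-Python sampler like this one.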

Page 6

Methodology - Topical PageRank for Keyphrase Extraction

Page 7

Methodology - Constructing Word Graph
• The document is regarded as a word sequence.
• A sliding window (size = 3 in the example) links words that co-occur within it.
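The graph construction can be sketched as follows, assuming the document is already tokenized (and, for example, filtered to candidate words), that an edge points from the earlier word to the later one, and that its weight $e(w_i, w_j)$ is the number of times the pair co-occurs in a window. The function name `build_word_graph` and the `directed` flag are hypothetical.

```python
from collections import defaultdict


def build_word_graph(words, window=3, directed=True):
    """Weighted co-occurrence graph from a word sequence (sketch).

    Two words are linked when they appear within `window` consecutive
    positions; the result maps (wi, wj) -> e(wi, wj), a co-occurrence count.
    """
    e = defaultdict(int)
    for i, wi in enumerate(words):
        # Pair wi with each of the next (window - 1) words.
        for wj in words[i + 1 : i + window]:
            if wi == wj:
                continue  # no self-loops
            e[(wi, wj)] += 1
            if not directed:
                e[(wj, wi)] += 1  # undirected: count the link both ways
    return dict(e)
```

For the sequence `["a", "b", "c", "a"]` with window 3, the directed graph has the links a→b, a→c, b→c, b→a and c→a, each with weight 1.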

Page 8

Methodology - Topical PageRank (PageRank)

Definition: the weight of the link $(w_i, w_j)$ is denoted $e(w_i, w_j)$.

Page 9

Methodology - Topical PageRank (PageRank)
• Out-degree of a vertex: the total weight of its outgoing links.
• Random jump: equal probability of jumping to every vertex.
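Slides 8 and 9 describe the plain PageRank that TPR generalizes. As we read them, the score is $R(w_i) = \lambda \sum_{j: w_j \to w_i} \frac{e(w_j, w_i)}{O(w_j)} R(w_j) + \frac{1 - \lambda}{|V|}$, where $O(w_j)$ is the out-degree of $w_j$ and $\lambda$ is the damping factor. A minimal power-iteration sketch follows; the default `damping=0.85`, the convergence tolerance, and the uniform redistribution of rank from vertices without out-links are common conventions we assume, not details taken from the slides.

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Weighted PageRank on a word graph (sketch).

    edges maps (wj, wi) -> e(wj, wi) > 0. Each iteration computes
        R(wi) = damping * sum_j e(wj, wi) / O(wj) * R(wj) + (1 - damping) / |V|
    where O(wj) = sum_k e(wj, wk) is the out-degree of wj.
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    n = len(nodes)
    out_degree = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_degree[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Random jump: equal probability (1 - damping) / |V| for every vertex.
        new = {u: (1.0 - damping) / n for u in nodes}
        # Follow links in proportion to their share of the source's out-degree.
        for (u, v), w in edges.items():
            new[v] += damping * w / out_degree[u] * rank[u]
        # Assumed convention: rank of vertices with no out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_degree[u] == 0)
        for u in nodes:
            new[u] += damping * dangling / n
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Because every vertex receives the same jump probability, this walk yields one topic-independent score per word, which is exactly the limitation stated in the Motivation.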

Page 10

Methodology - Topical PageRank

Preference values from LDA:
• $\Pr(w \mid z) = \Pr(z \mid w)\Pr(w)/\Pr(z)$: focuses on the word.
• $\Pr(z \mid w) = \Pr(w \mid z)\Pr(z)/\Pr(w)$: focuses on the topic.
• Their product $\Pr(w \mid z)\Pr(z \mid w)$ (Cohn and Chang, 2000).
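Topical PageRank replaces the uniform random jump with a topic-specific preference $p_z(w)$. As we understand the TPR formulation, one walk is run per topic: $R_z(w_i) = \lambda \sum_{j: w_j \to w_i} \frac{e(w_j, w_i)}{O(w_j)} R_z(w_j) + (1 - \lambda)\, p_z(w_i)$, with $p_z$ taken from one of the three preference values listed above and normalized over the graph's vertices. The sketch below assumes `phi` is given as one dict per topic (word to $\Pr(w \mid z)$) and `topic_prior[z]` is $\Pr(z)$; the function names, the `kind` strings, the fixed iteration count and the handling of vertices without out-links are hypothetical choices.

```python
def preference_values(nodes, phi, topic_prior, z, kind="w|z"):
    """p_z(w) over graph vertices, normalized to sum to 1 (sketch)."""
    vals = {}
    for w in nodes:
        p_w_given_z = phi[z].get(w, 0.0)
        # Pr(w) = sum_k Pr(w|k) Pr(k); Bayes' rule then gives Pr(z|w).
        p_w = sum(phi[k].get(w, 0.0) * topic_prior[k] for k in range(len(phi)))
        p_z_given_w = p_w_given_z * topic_prior[z] / p_w if p_w > 0 else 0.0
        vals[w] = {"w|z": p_w_given_z, "z|w": p_z_given_w,
                   "prod": p_w_given_z * p_z_given_w}[kind]
    total = sum(vals.values())
    if total == 0:
        return {w: 1.0 / len(nodes) for w in nodes}  # fall back to uniform jumps
    return {w: v / total for w, v in vals.items()}


def topical_pagerank(edges, phi, topic_prior, damping=0.85, iters=100, kind="w|z"):
    """One biased random walk per topic; returns {z: {word: R_z(word)}}."""
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    out_degree = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_degree[u] += w
    scores = {}
    for z in range(len(phi)):
        pref = preference_values(nodes, phi, topic_prior, z, kind)
        rank = dict(pref)
        for _ in range(iters):
            # Biased jump: land on wi with probability (1 - damping) * p_z(wi).
            new = {u: (1.0 - damping) * pref[u] for u in nodes}
            for (u, v), w in edges.items():
                new[v] += damping * w / out_degree[u] * rank[u]
            # Assumption: rank of vertices with no out-links re-enters via p_z.
            dangling = sum(rank[u] for u in nodes if out_degree[u] == 0)
            for u in nodes:
                new[u] += damping * dangling * pref[u]
            rank = new
        scores[z] = rank
    return scores
```

On the chain a-b-c with topic 0 preferring "a" and topic 1 preferring "c", the same graph yields different rankings per topic, which is the point of decomposing the single random walk.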

Page 11

Methodology - Extract Keyphrases Using Ranking Scores

Step 1. Annotate the document with POS tags.

Step 2. Select noun phrases as candidate keyphrases.

Step 3. Compute the ranking scores of candidate keyphrases separately for each topic, using Topical PageRank scores instead of a single PageRank score.

Step 4. Integrate the topic-specific rankings of candidate keyphrases into a final ranking.
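Steps 2 to 4 can be sketched as below. Assumptions, all ours: Step 1 has already produced Penn Treebank tags; a noun phrase is a maximal run of the form (adjective)* (noun)+; a phrase's score for topic $z$ is the sum of its words' Topical PageRank scores; and the final score weights each topic by the document's topic distribution, $R(p) = \sum_z R_z(p)\Pr(z \mid d)$. The function names are hypothetical.

```python
def candidate_phrases(tagged):
    """Step 2 (sketch): maximal runs matching (adjective)* (noun)+.

    tagged: list of (word, Penn Treebank tag) pairs from Step 1.
    """
    phrases, adjs, nouns = set(), [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith("NN"):
            nouns.append(word)
            continue
        # Any non-noun tag closes the current run.
        if nouns:
            phrases.add(" ".join(adjs + nouns))
        if tag.startswith("JJ"):
            # An adjective after nouns starts a new run; otherwise it extends one.
            adjs = [word] if nouns else adjs + [word]
        else:
            adjs = []
        nouns = []
    return phrases


def rank_keyphrases(phrases, topic_scores, doc_topics, top_n=10):
    """Steps 3-4 (sketch).

    topic_scores: {z: {word: R_z(word)}} from Topical PageRank.
    doc_topics: {z: Pr(z|d)} from LDA.
    R_z(p) = sum of R_z(w) over words w in p;  R(p) = sum_z R_z(p) * Pr(z|d).
    """
    final = {}
    for p in phrases:
        final[p] = sum(
            doc_topics.get(z, 0.0) * sum(scores.get(w, 0.0) for w in p.split())
            for z, scores in topic_scores.items()
        )
    return sorted(final, key=final.get, reverse=True)[:top_n]
```

With topic scores favoring "keyphrase" and "extraction" in the document's dominant topic, "automatic keyphrase extraction" ranks above "topic decomposition".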

Page 12

Experiment - Datasets

Dataset     Articles   Keyphrases
NEWS        308        2,488
RESEARCH    2,000      19,254

Topic model: topic interpreters are built with LDA.

Corpus                             Web pages   Words    Topics
Wikipedia snapshot (March 2008)    2,122,618   20,000   50 to 1,500

Page 13

Experiment - Evaluation Metrics
• Precision, recall and F-measure do not take the order of the extracted keyphrases into account.
• For the metrics used, values lie between 0 and 1, and larger values are better.
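The slide does not name the order-aware measures. As a hedged illustration of why order matters, the sketch pairs set-based precision, recall and F-measure with mean reciprocal rank (MRR), one common order-aware measure; whether MRR is among the measures the slide refers to is our assumption. All four values lie in [0, 1], with larger values better.

```python
def precision_recall_f(extracted, gold):
    """Set-based scores: the order of `extracted` is ignored."""
    correct = len(set(extracted) & set(gold))
    p = correct / len(extracted) if extracted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def mean_reciprocal_rank(ranked_lists, gold_sets):
    """Order-aware: mean over documents of 1 / rank of the first correct keyphrase."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for rank, phrase in enumerate(ranked, start=1):
            if phrase in gold:
                total += 1.0 / rank
                break  # only the first correct keyphrase counts
    return total / len(ranked_lists) if ranked_lists else 0.0
```

Reordering `["a", "b"]` to `["b", "a"]` with gold `{"b"}` leaves precision, recall and F-measure unchanged but raises MRR from 0.5 to 1.0.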

Page 14

Experiment - Influences of Parameters on TPR
• Window size W
• Number of topics K

Page 15

Experiment - Influences of Parameters on TPR
• Damping factor λ
• Preference values
  – $\Pr(w \mid z) = \Pr(z \mid w)\Pr(w)/\Pr(z)$: focuses on the word
  – $\Pr(z \mid w) = \Pr(w \mid z)\Pr(z)/\Pr(w)$: focuses on the topic
  – Example: he, she

Page 16

Experiment - Comparing with Baseline Methods
• The baselines TFIDF and PageRank do not use topic information.
• TPR enjoys the advantages of both LDA and TFIDF/PageRank.

Page 17

Experiment - Extracting Example

Page 18

Conclusions
• Experiments on two datasets show that TPR achieves better performance than the baseline methods.

Page 19

Comments
• Advantages
  – TPR incorporates topic information into the random walk for keyphrase extraction.
• Applications
  – Automatic keyphrase extraction.