Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling; Zhengchen Zhang, Shuzhi Sam Ge, Hongsheng He; IPM2012; Hao-Chin Chang, Department of Computer Science & Information Engineering, National Taiwan Normal University


Jan 17, 2016


Transcript
Page 1

Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling

Zhengchen Zhang, Shuzhi Sam Ge, Hongsheng He

IPM2012

Hao-Chin Chang

Department of Computer Science & Information Engineering

National Taiwan Normal University

2011/03/09

Page 2

Outline

• Introduction

• Sentence ranking using embedded graph based sentence clustering

– Document modeling

– Embedded graph based sentence clustering

– Mutual-reinforcement ranking algorithm

• Experiment

• Conclusion and Future work

Page 3

Introduction

• There are three phases in the framework

– Document modeling: the document is modeled by a weighted graph whose vertices represent the sentences of the document

– Sentence clustering: the sentences are clustered into different groups to find the latent topics in the story. To alleviate the influence of unrelated sentences in clustering, an embedding process is employed to optimize the document model

– Sentence ranking

• We propose a framework that considers the mutual effect between clusters, sentences, and terms, instead of the relationship between documents, sentences, and terms, in order to exploit cluster-level information

Page 4

Introduction

• The sentences are then clustered into different groups according to the distance between two sentences

– To alleviate the influence of unrelated sentences in sentence clustering, we employ an embedding process to optimize the graph vertices

• The contributions of this paper are summarized as follows

– An embedded graph based sentence clustering method is proposed for sentence grouping of a document, which is robust with respect to different cluster numbers

– An iterative ranking method is presented which considers the mutual reinforcement between terms, sentences, and sentence clusters

– A document summarization framework considering sentence cluster information is proposed, and the framework is evaluated using DUC data sets

Page 5

Sentence ranking using embedded graph based sentence clustering

• To reduce the influence of low cosine similarity weights and to enhance sentence clustering performance, an embedding algorithm is performed on the graph

• The sentences are ranked according to the mutual effects between sentences, terms, and clusters, based on the following assumptions

– A sentence is assigned a high rank if it is similar to many high-ranking clusters and contains many high-ranking terms

– The rank of a cluster is high if it contains many high-ranking sentences and many high-ranking terms

– The rank of a term is high if it appears in many high-ranking sentences and clusters

Page 6

Document modeling

• The weight $w_{ij}$ of an edge measures how close sentences $s_i$ and $s_j$ are; it is the cosine similarity between the vectors of the two sentences: $w_{ij} = \frac{s_i \cdot s_j}{\|s_i\| \, \|s_j\|}$
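As a minimal sketch of this document model, the following Python builds the weighted sentence graph from term-frequency vectors. The whitespace tokenizer, the raw term-frequency weighting, and the function names are assumptions for illustration; the slides do not say how the sentence vectors are built.

```python
import math
from collections import Counter


def sentence_vector(sentence):
    """Bag-of-words term-frequency vector (assumed tokenizer: lowercase split)."""
    return Counter(sentence.lower().split())


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_sentence_graph(sentences):
    """Return the symmetric weight matrix w, with w[i][j] the edge weight."""
    vectors = [sentence_vector(s) for s in sentences]
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = cosine_similarity(vectors[i], vectors[j])
    return w


sentences = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "stocks fell sharply today",
]
graph = build_sentence_graph(sentences)
```

Sentences that share no terms get a zero-weight edge, which is the kind of weak link the later embedding step is meant to de-emphasize.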

Page 7

Embedded graph based sentence clustering

• To alleviate the influence of unrelated sentences, we embed the original matrix $D$ of a document into a lower-dimensional space, inspired by Locally Linear Embedding (LLE)

• A sentence $d_i$, which is a column vector of $D$, is expressed as a linear combination of its $n_i$ most similar sentences $d_j$: $d_i \approx \sum_{d_j \in \mathcal{N}_i} r_{ij} d_j$

• where $\mathcal{N}_i$ is the set of sentences most similar to $d_i$

Page 8

Embedded graph based sentence clustering

• In graph embedding, we minimize a cost function of the approximation error to determine the optimal weights $r_{ij}$

• $W_i = [r_{i1}, \ldots, r_{in}]$ are the weights connecting $d_i$ to its neighbors

• The partial derivative of the cost is taken with respect to each weight $r_{ij}$

• $W_i$ is found by solving the resulting equations
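The weight-solving step can be sketched as follows, assuming the approximation error for a sentence is the squared norm of the sentence vector minus the weighted sum of its neighbors (the slide's own equations are not preserved). Setting the partial derivative with respect to each weight to zero then gives a small linear system over the neighbors' dot products. The function names, the tiny ridge term, and the toy vectors are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def reconstruction_weights(d_i, neighbors, ridge=1e-9):
    """Weights minimizing |d_i - sum_j r_ij * d_j|^2 over the given neighbors.

    Zeroing the partial derivatives gives G r = b, where G[j][l] = d_j . d_l
    and b[j] = d_j . d_i. The ridge term (an assumption, not from the slides)
    keeps G invertible when neighbors are nearly collinear.
    """
    k = len(neighbors)
    gram = [[dot(neighbors[j], neighbors[l]) for l in range(k)] for j in range(k)]
    for j in range(k):
        gram[j][j] += ridge
    rhs = [dot(neighbors[j], d_i) for j in range(k)]
    return solve_linear_system(gram, rhs)


d_i = [1.0, 2.0, 0.0]
neighbors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weights = reconstruction_weights(d_i, neighbors)
```

In this toy case the two neighbors are orthogonal unit vectors, so the recovered weights are simply the coordinates of the sentence along each neighbor.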

Page 9

Embedded graph based sentence clustering

• The embedded sentence vectors $d_i$, which have enhanced relationships, are obtained by minimizing the cost function

• The embedding operation keeps the relationship between a sentence and its set of neighbors

• The performance of the embedding operation improves if there are more points in the graph
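In standard LLE, the embedding step holds the weights $r_{ij}$ fixed and looks for new vectors that are reconstructed by those same weights. Assuming the slide's missing cost function follows that form, with $\hat{d}_i$ as a hypothetical symbol for the embedded vector of sentence $i$ and $j$ ranging over the neighbors of sentence $i$, it would read:

```latex
\Phi(\hat{D}) = \sum_{i} \Bigl\| \hat{d}_i - \sum_{j} r_{ij}\, \hat{d}_j \Bigr\|^{2}
```

Because the weights are unchanged, each sentence keeps the same relationship to its neighbors after embedding, which is the property the second bullet describes.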

Page 10

Mutual-reinforcement ranking algorithm

• The ranking algorithm employs the cluster-level information and the latent theme information in the clusters for document summarization

• $D$ is the sentence–term matrix of the document

• Cluster $c_j$ is represented by a vector that is the sum of the vectors of all the sentences in the cluster: $c_j = \sum_{s \in C_j} s$

• The weight of the edge connecting term $t_l$ and cluster $c_j$
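A cluster vector as described on this slide, read here as the sum of the vectors of the cluster's sentences, can be sketched in a few lines. The row-per-sentence layout of the matrix, the label list, and the function name are assumptions for illustration; the term-to-cluster edge weight is not shown because its formula is not preserved.

```python
def cluster_vectors(sentence_vectors, labels, num_clusters):
    """Sum the sentence vectors assigned to each cluster.

    sentence_vectors: one row per sentence, one column per term (assumed layout)
    labels: labels[i] is the cluster index of sentence i
    """
    dim = len(sentence_vectors[0])
    clusters = [[0.0] * dim for _ in range(num_clusters)]
    for vec, label in zip(sentence_vectors, labels):
        for t, value in enumerate(vec):
            clusters[label][t] += value
    return clusters


# Toy sentence-term matrix: 3 sentences, 3 terms.
D = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [3.0, 0.0, 0.0],
]
labels = [0, 0, 1]  # sentences 0 and 1 share a cluster
C = cluster_vectors(D, labels, 2)
```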

Page 11

Mutual-reinforcement ranking algorithm

• $r(s_i)$ is the rank of sentence $s_i$, $r(c_j)$ is the rank of cluster $c_j$, and $r(t_l)$ is the rank of term $t_l$

• The sentence with the highest score is selected as part of the summary
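The three ranking assumptions (a sentence ranks high through its clusters and terms, a cluster through its sentences and terms, a term through its sentences and clusters) can be turned into an iterative update, sketched below. The slide's update equations are not preserved, so the unweighted sums, the L2 normalization, the fixed iteration count, and all names and toy matrices here are assumptions rather than the authors' formulation.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def mutual_reinforcement(w_sc, w_st, w_ct, iterations=50):
    """Iteratively rank sentences, clusters, and terms.

    w_sc: sentence-by-cluster weights, w_st: sentence-by-term weights,
    w_ct: cluster-by-term weights (all non-negative).
    """
    n_s, n_c, n_t = len(w_sc), len(w_ct), len(w_ct[0])
    r_s, r_c, r_t = [1.0] * n_s, [1.0] * n_c, [1.0] * n_t
    w_cs, w_ts, w_tc = transpose(w_sc), transpose(w_st), transpose(w_ct)
    for _ in range(iterations):
        # Each rank vector is refreshed from the other two, then rescaled.
        new_s = normalize(add(matvec(w_sc, r_c), matvec(w_st, r_t)))
        new_c = normalize(add(matvec(w_cs, r_s), matvec(w_ct, r_t)))
        new_t = normalize(add(matvec(w_ts, r_s), matvec(w_tc, r_c)))
        r_s, r_c, r_t = new_s, new_c, new_t
    return r_s, r_c, r_t


# Toy weights: 3 sentences, 2 clusters, 3 terms.
w_st = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 0.0, 0.0]]
w_ct = [[1.0, 1.0, 3.0], [3.0, 0.0, 0.0]]
w_sc = [[0.9, 0.1], [0.8, 0.0], [0.1, 1.0]]

r_s, r_c, r_t = mutual_reinforcement(w_sc, w_st, w_ct)
best_sentence = max(range(len(r_s)), key=r_s.__getitem__)
```

The index in `best_sentence` is the sentence that would be picked first for the summary under these assumed updates.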

Page 12

Experiment

Page 13

Experiment


• Clustering methods: k-means (KM) and agglomerative (AGG) clustering

• The MREG algorithm is named KM-TSC-EM

• KM-SC uses mutual reinforcement between sentences and clusters

• TSC is short for Term, Sentence and Cluster

• EM is short for Embedded graph based sentence clustering

Page 14

Experiment

• a = 0.25, b = 0.25, c = 0.50

• a = 0.50, b = 0.50, c = 1.0

• a = 0.750, b = 1.0, c = 1.0: ROUGE-1 score 0.31841

• a = 0.750, b = 1.0, c = 0.50: ROUGE-1 score 0.31769

Page 15

Conclusion

• The sentences were clustered into different groups, and an embedding process was employed to reduce the effect of unrelated sentences and to enhance the sentence clustering performance.

• Performance comparison of different combinations of components illustrated that the algorithm improved system performance and was more robust with respect to different cluster numbers.

• Computer Speech & Language 2012: Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot, "Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation"

– Confidence score

Page 16

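Summarization schematic diagram

[Figure: sentences S1, S2, S3, …, Sn and documents D1, …, Dm; general collected corpus; initial retrieval system KL(S||D); term-frequency features, context features, relevance features; training text corpus; summary generation KL(D||S)]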