1 Sentence Extraction-based Presentation Summarization Techniques and Evaluation Metrics Makoto Hirohata, Yousuke Shinnaka, Koji Iwano and Sadaoki Furui.

1

Sentence Extraction-based Presentation Summarization Techniques and Evaluation Metrics

Makoto Hirohata, Yousuke Shinnaka, Koji Iwano and Sadaoki Furui

ICASSP 2005

Presenter: Yao-Min Huang

Date: 04/07/2005

2

Introduction

• One of the major applications of automatic speech recognition is to transcribe speech documents such as talks, presentations, lectures, and broadcast news.

• Spontaneous speech is ill-formed and very different from written text.

– Redundant information: repetitions, repairs, ...

– Irrelevant or incorrect information: recognition errors

3

Introduction

• Therefore, an approach in which all words are simply transcribed is not an effective one for spontaneous speech.

• Instead, speech summarization which extracts important information and removes redundant and incorrect information is ideal for recognizing spontaneous speech.

4

Introduction (SAP2004)

5

Sentence Extraction Methods

• Review of SAP2004

– Important sentence extraction

• The score for important sentence extraction is calculated for each sentence.

• N: number of words in the sentence W

• L(w_i), I(w_i), and C(w_i) are the linguistic score (trigram), the significance score (nouns, verbs, adjectives, and out-of-vocabulary (OOV) words), and the confidence score of word w_i, respectively.

– The experiment shows that the significance score is more effective than the linguistic score and the confidence score.

• This paper simply uses the significance score.

$$S(W) = \frac{1}{N} \sum_{i=1}^{N} \{ L(w_i) + \lambda_I I(w_i) + \lambda_C C(w_i) \}$$

(\lambda_I and \lambda_C: weighting factors)
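The sentence score on this slide averages three per-word scores over the N words of a sentence. A minimal sketch of that average follows, assuming the per-word scores have already been computed and are passed in as lists. The function name, the argument names, and the default weights of 1.0 are illustrative assumptions, not values from the paper.

```python
from typing import Sequence


def sentence_score(
    l_scores: Sequence[float],
    i_scores: Sequence[float],
    c_scores: Sequence[float],
    lambda_i: float = 1.0,  # assumed default; the slides give no value
    lambda_c: float = 1.0,  # assumed default; the slides give no value
) -> float:
    """Return S(W) = (1/N) * sum_i { L(w_i) + lambda_I*I(w_i) + lambda_C*C(w_i) }."""
    n = len(l_scores)
    if not (n == len(i_scores) == len(c_scores)):
        raise ValueError("all score lists must have one entry per word")
    if n == 0:
        return 0.0  # convention chosen here for an empty sentence
    total = sum(
        l + lambda_i * i + lambda_c * c
        for l, i, c in zip(l_scores, i_scores, c_scores)
    )
    return total / n
```

Dividing by N keeps long sentences from being favoured merely because they contain more words.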

6


• Review of SAP2004

– Significance score, measured by the amount of information

• Computed for content words, including nouns, verbs, adjectives, and out-of-vocabulary (OOV) words, based on word occurrence in a corpus

– $$I(w_i) = f_i \cdot icf_i = f_i \log \frac{F_A}{F_i}$$

f_i: number of occurrences of w_i in the recognized utterances

F_i: number of occurrences of w_i in the large-scale corpus

F_A = \sum_i F_i

– Important keywords are weighted, and words unrelated to the original content, such as recognition errors, are de-weighted by this score.
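The significance score multiplies a word's frequency in the recognized utterances by the log of its inverse relative frequency in a large corpus. A minimal sketch follows; the function name and the dictionary-based corpus counts are illustrative assumptions. The slides do not say how words with a zero corpus count are handled, so this sketch simply skips them, which is an assumption.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Mapping


def significance_scores(
    recognized_words: Iterable[str],
    corpus_counts: Mapping[str, int],
) -> Dict[str, float]:
    """Return I(w_i) = f_i * log(F_A / F_i) for each recognized content word."""
    f = Counter(recognized_words)          # f_i: counts in the recognized utterances
    f_a = sum(corpus_counts.values())      # F_A: total count over the corpus
    scores = {}
    for word, f_i in f.items():
        big_f_i = corpus_counts.get(word, 0)   # F_i: count in the large-scale corpus
        if big_f_i <= 0:
            continue  # assumption: words unseen in the corpus are skipped
        scores[word] = f_i * math.log(f_a / big_f_i)
    return scores
```

A word that is frequent in the talk but rare in the corpus receives a high score, while a word that is common everywhere scores close to zero.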

7

Sentence Extraction Methods

• Extraction using latent semantic analysis

– Same as the method in "Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis" (SIGIR 2001)

– The m × n word-by-sentence matrix A (rows: words W1, ..., Wm; columns: sentences S1, ..., Sn) is decomposed by singular value decomposition:

$$A = U \Sigma V^T$$

where U is m × n, and \Sigma and V^T are n × n.

– Each element of A is $a_{ji} = f_{ji} \cdot icf_j$, where f_{ji} is the number of occurrences of word j in sentence i.

– For each right singular vector (column vector of V), select the sentence which has the largest index value.
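The selection rule on this slide can be sketched with NumPy's SVD. The function name `lsa_select` is hypothetical. Two details are assumptions of this sketch rather than statements from the slides: components are compared by magnitude, because the sign of a singular vector returned by a numerical SVD is arbitrary, and a sentence already chosen for an earlier singular vector is skipped.

```python
from typing import List

import numpy as np


def lsa_select(a: np.ndarray, num_sentences: int) -> List[int]:
    """Pick one sentence per leading right singular vector of A (words x sentences)."""
    # Rows of vt are the right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    chosen: List[int] = []
    for k in range(min(num_sentences, vt.shape[0])):
        # Assumption: rank sentences by absolute component value (sign is arbitrary).
        for idx in np.argsort(-np.abs(vt[k])):
            if int(idx) not in chosen:
                chosen.append(int(idx))
                break
    return chosen
```

For example, with a matrix whose sentence columns are mutually orthogonal, each singular vector points at exactly one sentence, and the sentences come out in order of decreasing column norm.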

8


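• Extraction using dimension reduction based on SVD

– K = 5

$$\mathrm{Score}(i) = \sqrt{\sum_{k=1}^{K} (\sigma_k v_{ik})^2}$$

where \sigma_k is the k-th singular value and v_{ik} is the component of sentence i in the k-th right singular vector.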
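A minimal NumPy sketch of the dimension-reduction score follows, reading the score as the norm of sentence i in the space spanned by the K leading singular vectors, with each axis weighted by its singular value. That reading, the function name `dim_scores`, and the clamping of K to the number of available singular values are assumptions of this sketch.

```python
import numpy as np


def dim_scores(a: np.ndarray, k: int = 5) -> np.ndarray:
    """Return sqrt(sum_{k<=K} (sigma_k * v_ik)^2) for every sentence (column) of A."""
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))                     # cannot use more dimensions than exist
    weighted = s[:k, None] * vt[:k, :]     # shape (K, number of sentences)
    return np.sqrt((weighted ** 2).sum(axis=0))
```

When K equals the full rank of A, this score reduces to the Euclidean norm of each sentence column, so a smaller K is what makes the score depend only on the dominant topics.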

9


• Extraction using sentence location

– Human subjects tend to extract sentences from the first and the last segments under the condition of 10% summarization ratio, whereas there is no such tendency at 50% summarization ratio.

10


• Extraction using sentence location

– The introduction and conclusion segments are estimated based on the Hearst method (1997) using sentence cohesiveness.

• Cohesiveness is measured by a cosine value between content word-frequency vectors consisting of more than a fixed number of content words.
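The cohesiveness measure can be sketched as a cosine between two word-frequency vectors, one built from the sentences before a candidate boundary and one from the sentences after it. The function names, the representation of each sentence as a list of content words, and the way each window is grown sentence by sentence until it holds more than `min_words` content words are all assumptions of this sketch.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cohesiveness(sentences: Sequence[List[str]], boundary: int, min_words: int) -> float:
    """Cosine between the windows ending before and starting at `boundary`."""
    left: Counter = Counter()
    i = boundary - 1
    while i >= 0 and sum(left.values()) <= min_words:
        left.update(sentences[i])   # grow the left window backwards
        i -= 1
    right: Counter = Counter()
    j = boundary
    while j < len(sentences) and sum(right.values()) <= min_words:
        right.update(sentences[j])  # grow the right window forwards
        j += 1
    return cosine(left, right)
```

A low value at a boundary suggests a topic shift there, which is the cue used to separate the introduction and conclusion from the body.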

11


• Each segmentation boundary is the first sentence from the beginning or end of the presentation speech

12

Objective Evaluation Metrics

• Summarization accuracy

– SAP2004

• The word accuracy of an automatic summary, measured against the closest word string extracted from the word network of manual summaries, is used as the summarization accuracy (SumACCY).

• Works reasonably well at relatively high summarization ratios such as 50%, but has problems at low summarization ratios such as 10%

– since the variation between manual summaries is so large that the network accepts inappropriate summaries.

13


• Summarization accuracy

– Therefore, this paper investigated the word accuracy obtained by using the manual summaries individually (SumACCY-E).

– SumACCY-E/max

» the largest score among the manual summaries

» equivalent to the NrstACCY proposed in [C. Hori et al., 2004, ACL]

– SumACCY-E/ave

» the average score over the manual summaries
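SumACCY-E scores an automatic summary against each manual summary separately and then takes the maximum or the average. A minimal sketch follows, assuming the usual definition of word accuracy as one minus the word-level edit distance divided by the reference length. The function names and the tokenized-list inputs are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def word_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(N - S - D - I) / N with N the number of words in the reference."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)


def sumaccy_e(manual: List[Sequence[str]], automatic: Sequence[str]) -> Tuple[float, float]:
    """Return (SumACCY-E/max, SumACCY-E/ave) over the individual manual summaries."""
    scores = [word_accuracy(m, automatic) for m in manual]
    return max(scores), sum(scores) / len(scores)
```

Note that word accuracy can become negative when the automatic summary contains many insertions relative to a short reference.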

14


• Sentence recall / precision

– Problem: sentence boundaries are not explicitly indicated in input speech.

• Solution [T. Kitade et al., 2004]

– Extraction of a sentence in the recognition result is considered as extraction of one or multiple sentences in the manual summary with an overlap of 50% or more words.

– In this metric, sentence recall/precision is measured by the largest score (F-measure/max) or the average score (F-measure/ave) of the F-measures.
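The alignment and scoring described above can be sketched as follows. Each sentence is represented as a half-open span of word positions over the same word sequence, which is an assumption of this sketch. The slide does not state which sentence the 50% is measured against, so measuring overlap relative to the manual sentence's length is also an assumption. Function names are hypothetical.

```python
from typing import Sequence, Set, Tuple

Span = Tuple[int, int]  # half-open word-position interval [start, end)


def aligned_reference_sentences(
    auto_span: Span, ref_spans: Sequence[Span], min_overlap: float = 0.5
) -> Set[int]:
    """Indices of manual sentences covered by at least `min_overlap` of their words."""
    a_start, a_end = auto_span
    hits = set()
    for idx, (r_start, r_end) in enumerate(ref_spans):
        length = r_end - r_start
        overlap = max(0, min(a_end, r_end) - max(a_start, r_start))
        if length > 0 and overlap >= min_overlap * length:
            hits.add(idx)
    return hits


def f_measure(extracted: Set[int], manual: Set[int]) -> float:
    """Harmonic mean of sentence precision and recall."""
    true_positives = len(extracted & manual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(manual)
    return 2 * precision * recall / (precision + recall)
```

F-measure/max and F-measure/ave would then be the maximum and the mean of `f_measure` over the individual manual summaries.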

15


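• ROUGE-N

$$\text{ROUGE-N} = \frac{\sum_{S \in S_H} \sum_{g_n \in S} C_m(g_n)}{\sum_{S \in S_H} \sum_{g_n \in S} C(g_n)}$$

where

S_H is the set of manual summaries

S is an individual manual summary

g_n is an N-gram

C(g_n) is the number of occurrences of g_n in the manual summary

C_m(g_n) is the number of co-occurrences of g_n in the manual summary and the automatic summary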
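ROUGE-N is an N-gram recall computed over all manual summaries. A minimal sketch follows, assuming summaries are given as token lists and that the co-occurrence count of an N-gram is clipped to the smaller of its counts in the manual and automatic summaries. The function names are illustrative.

```python
from collections import Counter
from typing import List, Sequence


def ngrams(words: Sequence[str], n: int) -> Counter:
    """Count all contiguous N-grams in a token sequence."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def rouge_n(manual_summaries: List[Sequence[str]], automatic: Sequence[str], n: int) -> float:
    """Matched N-grams divided by total N-grams, summed over the manual summaries."""
    auto_counts = ngrams(automatic, n)
    matched = 0
    total = 0
    for summary in manual_summaries:
        ref_counts = ngrams(summary, n)
        total += sum(ref_counts.values())
        # Clipped co-occurrence count: an N-gram cannot match more often than it appears.
        matched += sum(min(count, auto_counts[gram]) for gram, count in ref_counts.items())
    return matched / total if total else 0.0
```

Because the denominator counts N-grams in the manual summaries only, the measure rewards coverage of the references and does not penalize extra material in the automatic summary.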

16

Experiments

• Experimental conditions

– 30 presentations by 20 males and 10 females in the CSJ (Corpus of Spontaneous Japanese) were automatically summarized at 10% summarization ratio.

– Mean word recognition accuracy was 69%.

– Sentence boundaries in the recognition results were automatically determined using language models, which achieved 72% recall and 75% precision.

– Sentence extraction techniques

• significance score (SIG)

• latent semantic analysis (LSA)

• dimension reduction based on SVD (DIM)

• SIG combined with IC (sentence location) (SIG+IC)

• LSA combined with IC (LSA+IC)

• DIM combined with IC (DIM+IC)

17

Experiments

• Subjective evaluation

– 180 automatic summaries (30 presentations x 6 summarization methods) were evaluated by 12 human subjects.

– The summaries were evaluated in terms of ease of understanding and appropriateness as summaries in five levels

• 1-very bad; 2-bad; 3-normal; 4-good; 5-very good.

– The subjective evaluation results were converted into factor scores using factor analysis in order to normalize subjective differences.

18

Experiments

• By combining the IC method using sentence location, every summarization method was significantly improved. SIG+IC achieved the best score, but the difference between SIG+IC and DIM+IC was not significant.

19

Experiments

• Correlation between subjective and objective evaluation results

– In order to investigate the relationship between subjective and objective evaluation results, the automatic summaries were evaluated by eight objective evaluation metrics

• SumACCY

• SumACCY-E/max

• SumACCY-E/ave

• F-measure/max

• F-measure/ave

• ROUGE-1 (uni-gram)

• ROUGE-2 (bi-gram)

• ROUGE-3 (tri-gram)
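Comparing a metric with human judgment comes down to a correlation coefficient between two lists of scores. The slides do not name the coefficient, so the Pearson correlation used in this sketch is an assumption, as is the function name.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

Here `x` would hold the subjective scores of the summaries and `y` the scores from one objective metric for the same summaries, in the same order.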

20

Experiments


21

Experiments


22

Experiments

• All the objective metrics yielded correlation with human judgment.

• If the effect of word recognition accuracy for each sentence is removed, all the metrics, except ROUGE-1, yield high correlations.

– ROUGE-1 measures overlapping 1-grams, which probably causes the correlation between ROUGE-1 and the recognition accuracy.

23

Experiments

24

Experiments

• In contrast with the results averaged over all the presentations, no metric has a strong correlation for individual presentations.

– This is due to the large variation of scores over the whole set of presentations.

25

CONCLUSION

• This paper has presented several sentence extraction methods for automatic presentation speech summarization and objective evaluation metrics.

• We have proposed sentence extraction methods using dimension reduction based on SVD and sentence location.

• Under the condition of 10% summarization ratio, it was confirmed that the method using sentence location improves summarization results.

26

CONCLUSION

• Among the objective evaluation metrics, SumACCY, SumACCY-E, F-measure, ROUGE-2, and ROUGE-3 were found to be effective.

• Although the correlation between the subjective and objective scores averaged over presentations is high, the correlation for each individual presentation is not so high

– due to the large variation of scores across presentations.

27

Future

• Future research includes investigation of

– other objective evaluation metrics

– evaluation of summarization methods containing sentence compaction

– producing optimum summarization techniques by employing objective evaluation metrics
