
Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Jun 30, 2015

Maria Eskevich
Transcript
Page 1: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

(gjones, meskevich, agyarmati @computing.dcu.ie)

Centre for Digital Video Processing

Towards Methods for Efficient Access to Spoken Content in the AMI Corpus

Gareth J. F. Jones Maria Eskevich Ágnes Gyarmati

Centre for Digital Video Processing School of Computing

Dublin City University, Ireland

Page 2: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Outline

•  Issues
•  AMI corpus
•  Pre-processing
•  Experiment and Results
•  Future work

Page 4: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Issues: types of Spoken Content

– News broadcast:
  •  Structured
  •  Clearly articulated speech
  -> standard text document retrieval task on ASR transcript

– Other types of speech (meetings, lectures):
  •  Lack of clearly defined document form/structure
  •  Informal style, cross-talk, noisy environment
  -> We have to define:
  •  Search units
  •  Location of relevant items

Page 5: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Issues: Existing Research

•  Speech Search:
  –  TV and radio news: Spoken Document Retrieval (SDR) task at TREC (2000)
  –  Interviews: MALACH collection (2007)
  –  AMI (Augmented Multi-party Interaction) corpus

•  Recognition WER and Retrieval:
  –  Low recognition error level:
    •  little loss in retrieval effectiveness (2000)
    •  documents are retrieved at higher ranks (2003, 2007)
  –  Specific metrics (semantic impact of substitutions):
    •  correlation with retrieval performance (AMI Corpus, 2009)

Page 6: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Issues

•  Goal:
  – Investigate how the difference in accuracy between manual and automatic transcription influences retrieval effectiveness on the AMI Corpus

•  Experiment:
  – Segmentation of spoken content
  – Known-item search task using slides from meetings as queries

Page 7: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Outline

•  Issues
•  AMI corpus
•  Pre-processing
•  Experiment and Results
•  Future work

Page 8: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

AMI Corpus

•  100 hours
•  Each meeting approximately 30 minutes
•  Simulating project meetings
•  4-5 participants
•  Headset and circular microphones
•  Automatic and manual transcripts available
•  Additional data (slides, minutes)

Page 9: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Outline

•  Issues
•  AMI corpus
•  Pre-processing
•  Experiment and Results
•  Future work

Page 10: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Pre-processing: segmentation

•  Linear segmentation (C99 algorithm): algorithm based on cosine similarity between sequential sentences; boundaries are inserted between sentences based on the difference in their (stemmed) lexical inventory

•  Time segmentation (approximately 90 seconds)
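
The two segmentation ideas on this slide can be illustrated with a minimal sketch. It is not the C99 algorithm itself (C99 works on a ranked sentence-similarity matrix with divisive clustering); it only shows cosine similarity between adjacent sentences and fixed-length time windows. The function names, the 0.1 threshold and the (token, start time) input format are assumptions made for illustration, and tokens are assumed to be stemmed already.

```python
import math
from collections import Counter


def time_segments(words, window=90.0):
    """Group (token, start_time_in_seconds) pairs into consecutive time windows."""
    segments = {}
    for token, start in words:
        segments.setdefault(int(start // window), []).append(token)
    return [segments[key] for key in sorted(segments)]


def cosine(a, b):
    """Cosine similarity between two bags of (already stemmed) tokens."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boundaries(sentences, threshold=0.1):
    """Return indices i where a boundary falls between sentence i and i + 1.

    Simplified rule (an assumption, not C99): a boundary is placed wherever
    adjacent sentences share too little vocabulary.
    """
    return [
        i
        for i in range(len(sentences) - 1)
        if cosine(sentences[i], sentences[i + 1]) < threshold
    ]
```

With a 90-second window, a token starting at 89.9 s falls into the first segment and one starting at 90.0 s into the second.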

Page 11: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Pre-processing: segmentation

•  Number of segments

Type of transcript    Linear segmentation (C99)
Manual transcript     2678
ASR transcript        3831

•  Average number of words per segment

Type of transcript    Linear segmentation (C99)
Manual transcript     320
ASR transcript        221

Page 12: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Pre-processing: Word Recognition Rate (WRR)

1.  Alignment between ASR and manual transcripts

2.  Recognition rate calculation: the recognition rate is the number of correctly recognized words in the meeting divided by the total number of words in the transcript

3.  Recognition rate without stop words
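
The three steps above can be sketched as follows. This is a rough illustration under several assumptions: the alignment uses Python's difflib.SequenceMatcher rather than whatever alignment tool was actually used, the denominator is taken to be the manual (reference) transcript, stop words are removed before alignment, and the small stop word set is a placeholder rather than the real list.

```python
from difflib import SequenceMatcher

# Placeholder stop word list (assumption); a real list would be much longer.
STOP_WORDS = {"the", "a", "is", "of", "to", "and"}


def recognition_rate(reference, hypothesis, stop_words=None):
    """Fraction of reference words that the ASR hypothesis got right.

    reference:  tokens of the manual transcript
    hypothesis: tokens of the ASR transcript
    stop_words: optional set of words dropped from both sides first
    """
    if stop_words:
        reference = [w for w in reference if w not in stop_words]
        hypothesis = [w for w in hypothesis if w not in stop_words]
    if not reference:
        return 0.0
    # Step 1: align the two token sequences.
    matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)
    # Step 2: count words that fall inside matching blocks.
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct / len(reference)


manual = "the remote control is a new design".split()
asr = "the remote control as a new sign".split()
rate_all = recognition_rate(manual, asr)
rate_content = recognition_rate(manual, asr, STOP_WORDS)  # step 3
```

SequenceMatcher finds matching blocks greedily, so it approximates but does not guarantee a minimum-edit-distance alignment.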

Page 13: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Relation between segmentation and recognition rate

Page 14: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Pre-processing: cross-segmentation

Page 15: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Outline

•  Issues
•  AMI corpus
•  Pre-processing
•  Experiment and Results
•  Future work

Page 16: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Experiment: slides and relevant segments selection

Page 17: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Experiment: slides and relevant segments selection

Number of relevant segments, by the transcript on which segmentation is based:

Type of queries    Number of queries    ASR transcript    Manual transcript
Min                15                   56                49
Max                24                   68                39
Random             25                   36                42

Page 18: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Experiment: Indexing & Retrieval Setup

•  Indri language model of the open source Lemur Toolkit (http://www.lemurproject.org/):
  –  texts are stemmed using Lemur's built-in Porter stemmer

•  Stopword list provided by Snowball (http://snowball.tartarus.org/)
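
As a rough illustration of what language-model retrieval over segments involves, the sketch below scores a segment by the query log-likelihood with Dirichlet smoothing. This is a generic textbook formulation, not Indri's code; the slides do not state which smoothing method or parameters were used, so the function name and the value of mu are assumptions. Tokens are assumed to be stemmed and stopped already.

```python
import math
from collections import Counter


def query_log_likelihood(query, segment, collection_counts, collection_size, mu=2500.0):
    """Log-probability of the query under a Dirichlet-smoothed segment model.

    query, segment:     lists of tokens
    collection_counts:  mapping token -> count over the whole collection
    collection_size:    total number of tokens in the collection
    mu:                 smoothing parameter (assumed value)
    """
    seg_counts = Counter(segment)
    score = 0.0
    for term in query:
        p_collection = collection_counts.get(term, 0) / collection_size
        if p_collection == 0.0:
            continue  # terms unseen in the collection carry no evidence here
        smoothed = (seg_counts[term] + mu * p_collection) / (len(segment) + mu)
        score += math.log(smoothed)
    return score
```

Segments are then ranked by descending score for each slide used as a query.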

Page 19: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Results at rank 100

•  Recall at rank 100:

•  Mean Reciprocal Rank at rank 100:
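
For reference, the two measures named on this slide can be computed as in the sketch below, using their standard definitions with a cutoff at rank 100. The function names and the segment identifiers in the example are hypothetical; no result values from the experiment are reproduced.

```python
def recall_at_k(ranked, relevant, k=100):
    """Share of relevant segments that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(ranked, relevant, k=100):
    """1 / rank of the first relevant segment in the top k, or 0 if none."""
    for rank, segment_id in enumerate(ranked[:k], start=1):
        if segment_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs, k=100):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical example: two queries with made-up segment identifiers.
runs = [
    (["s3", "s1", "s7"], {"s1", "s9"}),
    (["s2"], {"s2"}),
]
```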

Page 20: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Outline

•  Issues
•  AMI corpus
•  Pre-processing
•  Experiment
•  Results
•  Future work

Page 21: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Problems

•  Errors in the ASR output

•  Common knowledge of the participants of the meeting -> some words are not spoken

•  All parts of the meetings are indexed in the same way

•  Retrieval algorithm favours longer segments

Page 22: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Future work

•  Construct proper segment-based relevance set for the slides

•  Analysis of the influence of ASR errors on segmentation

•  ASR transcript improvement

Page 23: Towards Methods for Efficient Access to Spoken Content in the AMI Corpus (SSCS 2010)

Thank you for your attention!

Questions?