BUT QUESST 2015 System Description Miroslav Skácel, Igor Szöke Speech@FIT Faculty of Information Technology Brno University of Technology MediaEval QUESST 2015 workshop, September 14.-15. 2015, Wurzen
MediaEval 2015 - BUT QUESST 2015 System Description

Jan 20, 2017

Transcript
Page 1: MediaEval 2015 - BUT QUESST 2015 System Description

BUT QUESST 2015 System Description

Miroslav Skácel, Igor Szöke

Speech@FIT, Faculty of Information Technology, Brno University of Technology

MediaEval QUESST 2015 workshop, September 14-15, 2015, Wurzen

Page 2: MediaEval 2015 - BUT QUESST 2015 System Description

System overview

Our internal task was:

● to reuse some of the Atomic systems we already have
● to incorporate bottlenecks
● to calibrate and fuse
● to cope with T2/T3 queries

We ended up with:

● 4 Atomic systems
● 3 QbE subsystems based on DTW
● 4 languages (Czech, Portuguese, Russian and Spanish)

Page 3: MediaEval 2015 - BUT QUESST 2015 System Description

Page 4: MediaEval 2015 - BUT QUESST 2015 System Description

Atomic system

● no adaptation on target data (SMVN, VTLN, …)
● Artificial Neural Networks – to estimate bottlenecks
● bottlenecks – trained on GlobalPhone (GP) database

Page 5: MediaEval 2015 - BUT QUESST 2015 System Description

Subsystem

Neural network based features:

● bottleneck features (30 dimensional)
● No VTLN, No SMN/SVN

Query detector

● based on Dynamic Time Warping (DTW)

Page 6: MediaEval 2015 - BUT QUESST 2015 System Description

DTW QbE subsystem

● segmental DTW (query can start in any frame of utterance)
● Voice Activity Detection (VAD) only on queries
● Pearson product-moment correlation distance (dcorr)
● slope limitation
● online normalizing of the path
● bottlenecks superior to posteriors

features      dcorr in minCnxe (ALL)
SD CZ POST    0.984
SD HU POST    0.972
SD RU POST    0.952
GP CZ BN      0.853
GP PO BN      0.894
GP RU BN      0.893
GP SP BN      0.904
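
The DTW bullets above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the correlation distance is assumed to be 1 - r, the online path normalization is approximated by comparing average cost per step when choosing a predecessor, and slope limitation and VAD are left out because the slides do not give their details.

```python
import math

def corr_distance(x, y):
    """Pearson correlation distance between two frames (assumed form: 1 - r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 1.0  # correlation undefined for a constant frame: treat as uncorrelated
    return 1.0 - cov / math.sqrt(vx * vy)

def segmental_dtw(query, utterance, dist=corr_distance):
    """Return (best average cost per step, end frame); lower cost is better."""
    nq, nu = len(query), len(utterance)
    inf = float("inf")
    cost = [[inf] * nu for _ in range(nq)]   # accumulated distance
    steps = [[0] * nu for _ in range(nq)]    # path length, used for normalization
    for j in range(nu):
        # The query may start in any frame of the utterance.
        cost[0][j] = dist(query[0], utterance[j])
        steps[0][j] = 1
    for i in range(1, nq):
        for j in range(nu):
            d = dist(query[i], utterance[j])
            best = None
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pj < 0 or cost[pi][pj] == inf:
                    continue
                cand = (cost[pi][pj] + d, steps[pi][pj] + 1)
                # Pick the predecessor with the lowest average cost per step.
                if best is None or cand[0] / cand[1] < best[0] / best[1]:
                    best = cand
            if best is not None:
                cost[i][j], steps[i][j] = best
    scores = [cost[nq - 1][j] / steps[nq - 1][j] for j in range(nu)]
    end = min(range(nu), key=scores.__getitem__)
    return scores[end], end

# Toy 3-dimensional "features"; the query is frames 2..3 of the utterance.
utterance = [[1, 2, 3], [3, 1, 2], [2, 3, 1], [1, 3, 2], [3, 2, 1]]
query = [[2, 3, 1], [1, 3, 2]]
score, end = segmental_dtw(query, utterance)
```

In this toy case the best path ends at frame 3 with zero cost, since the query frames are copied from the utterance. Real input would be the 30-dimensional bottleneck features.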

Page 7: MediaEval 2015 - BUT QUESST 2015 System Description

Slope limitation

Page 8: MediaEval 2015 - BUT QUESST 2015 System Description

Dealing with T2

● query split into equal parts
● each part searched in utterance separately
● results averaged together
● query split into 2 (denoted as 2w) and 3 (3w) parts

in late evaluation
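
The split-and-average idea can be illustrated as follows. This is a sketch under assumptions: `split_equal`, `split_query_score` and `toy_search` are hypothetical names, and `toy_search` is only a stand-in for the DTW detector so that the example runs on its own.

```python
def split_equal(query, n_parts):
    """Split the query frames into n_parts contiguous, near-equal parts."""
    bounds = [k * len(query) // n_parts for k in range(n_parts + 1)]
    return [query[bounds[k]:bounds[k + 1]] for k in range(n_parts)]

def split_query_score(query, utterance, search, n_parts):
    """Search each part in the utterance separately and average the scores."""
    scores = [search(part, utterance) for part in split_equal(query, n_parts)]
    return sum(scores) / len(scores)

def toy_search(part, utterance):
    """Stand-in detector: lowest mean absolute difference over all windows."""
    m = len(part)
    return min(
        sum(abs(a - b) for a, b in zip(part, utterance[s:s + m])) / m
        for s in range(len(utterance) - m + 1)
    )

# Toy 1-dimensional data: the two halves of the query occur in swapped order.
query = [1, 2, 3, 4, 5, 6]
utterance = [4, 5, 6, 0, 0, 1, 2, 3]
whole = toy_search(query, utterance)                          # no good match
two_way = split_query_score(query, utterance, toy_search, 2)  # "2w": both halves match
```

In this toy case the whole query finds no close match, while each half matches exactly, so the averaged 2w score is zero.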

Page 9: MediaEval 2015 - BUT QUESST 2015 System Description

Score normalization

● raw detection scores normalized by length
● the best detection per utterance-query pair selected
● mode normalization performed

[Figure panels: original, mode norm.]
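
The three normalization steps can be sketched as below. The slides do not define mode normalization, so this sketch assumes it means subtracting, per query, the mode of that query's score distribution, estimated here from a simple histogram. It also assumes that higher scores are better. The function names and the `n_bins` parameter are hypothetical.

```python
from collections import defaultdict

def histogram_mode(values, n_bins=10):
    """Estimate the mode as the center of the fullest histogram bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    k = counts.index(max(counts))
    return lo + (k + 0.5) * width

def normalize_scores(detections, n_bins=10):
    """detections: (query_id, utterance_id, raw_score, length) tuples."""
    best = {}
    for q, u, raw, length in detections:
        s = raw / length                      # 1) normalize by length
        if (q, u) not in best or s > best[(q, u)]:
            best[(q, u)] = s                  # 2) keep best detection per pair
    per_query = defaultdict(list)
    for (q, _), s in best.items():
        per_query[q].append(s)
    modes = {q: histogram_mode(v, n_bins) for q, v in per_query.items()}
    # 3) shift each query's scores by the (assumed) mode of its distribution
    return {(q, u): s - modes[q] for (q, u), s in best.items()}

detections = [
    ("q1", "u1", 0.0, 2), ("q1", "u2", 0.0, 5), ("q1", "u3", 0.0, 3),
    ("q1", "u4", 8.0, 2), ("q1", "u4", 2.0, 2),  # two detections, same pair
]
normalized = normalize_scores(detections, n_bins=4)
```

The intent of such a shift is to make scores comparable across queries before calibration and fusion.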

Page 10: MediaEval 2015 - BUT QUESST 2015 System Description

Results

● posteriors do not work on this year's dataset
● slope limitation helps to control the path shape
● a feature stack of more than 4 languages does not improve performance
● mode normalization works well for raw score normalization
● next year we will focus on denoising and dereverberation

Page 11: MediaEval 2015 - BUT QUESST 2015 System Description

Thanks for your attention