Page 1: MediaEval 2015 - BUT QUESST 2015 System Description

BUT QUESST 2015 System Description

Miroslav Skácel, Igor Szöke

Speech@FIT, Faculty of Information Technology, Brno University of Technology

MediaEval QUESST 2015 workshop, September 14-15, 2015, Wurzen

System overview

Our internal task was:
● to reuse some of our existing Atomic systems
● to incorporate bottlenecks
● to calibrate and fuse
● to cope with T2/T3 queries

We ended up with:
● 4 Atomic systems
● 3 QbE subsystems based on DTW
● 4 languages (Czech, Portuguese, Russian and Spanish)

Atomic system
● no adaptation on target data (SMVN, VTLN, …)
● Artificial Neural Networks: used to estimate bottlenecks
● bottlenecks: trained on the GlobalPhone (GP) database

Subsystem

Neural network based features:
● bottleneck features (30-dimensional)
● no VTLN, no SMN/SVN

Query detector:
● based on Dynamic Time Warping (DTW)

DTW QbE subsystem
● segmental DTW (the query can start in any frame of the utterance)
● Voice Activity Detection (VAD) applied only to queries
● Pearson product-moment correlation distance (dcorr)
● slope limitation
● online normalization of the path
● bottlenecks superior to posteriors

Features      minCnxe (ALL), dcorr distance
SD CZ POST    0.984
SD HU POST    0.972
SD RU POST    0.952
GP CZ BN      0.853
GP PO BN      0.894
GP RU BN      0.893
GP SP BN      0.904
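
To make the DTW QbE bullets concrete, here is a minimal pure-Python sketch of segmental DTW with a Pearson correlation distance and a running path-length normalization. It is an illustration under assumptions, not the BUT implementation: the names corr_distance and segmental_dtw are hypothetical, the distance is taken as 1 - r, the step pattern is the plain three-neighbour one (no slope limitation), and "online normalization" is read here as comparing candidate paths by accumulated cost divided by path length.

```python
import math

def corr_distance(x, y):
    """Pearson correlation distance 1 - r between two feature vectors (range 0..2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 1.0  # r is undefined for a constant vector; assumption: treat as uncorrelated
    return max(0.0, 1.0 - cov / math.sqrt(vx * vy))

def segmental_dtw(query, utterance):
    """Return (length-normalized cost, end frame) of the best match of query in utterance."""
    nq, nu = len(query), len(utterance)
    inf = float("inf")
    cost = [[inf] * nu for _ in range(nq)]   # accumulated distance along the best path
    length = [[0] * nu for _ in range(nq)]   # number of cells on that path
    for j in range(nu):
        # Segmental start: the query may begin at any utterance frame at no extra cost.
        cost[0][j] = corr_distance(query[0], utterance[j])
        length[0][j] = 1
    for i in range(1, nq):
        for j in range(nu):
            d = corr_distance(query[i], utterance[j])
            best = None
            for pi, pj in ((i - 1, j), (i - 1, j - 1), (i, j - 1)):
                if pj < 0 or cost[pi][pj] == inf:
                    continue
                # Online normalization: rank predecessors by cost per path cell.
                norm = (cost[pi][pj] + d) / (length[pi][pj] + 1)
                if best is None or norm < best[0]:
                    best = (norm, pi, pj)
            if best is not None:
                _, pi, pj = best
                cost[i][j] = cost[pi][pj] + d
                length[i][j] = length[pi][pj] + 1
    last = nq - 1
    end = min(range(nu), key=lambda j: cost[last][j] / length[last][j])
    return cost[last][end] / length[last][end], end
```

Calling segmental_dtw(query, utterance) with two lists of equal-dimension feature vectors returns the best normalized cost and the utterance frame where that match ends; a lower cost means a closer match.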

Slope limitation
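
The slides do not spell out which constraint was used. One common way to limit path slope in DTW, shown here purely as an assumption, is to restrict the allowed steps to (1,1), (1,2) and (2,1), which keeps the slope of every path between 1/2 and 2. The function name slope_limited_costs and the precomputed local distance matrix dist are hypothetical.

```python
def slope_limited_costs(dist):
    """Accumulate costs over dist[i][j] (query frame i, utterance frame j).

    Only the steps (1,1), (1,2) and (2,1) are allowed, so purely horizontal
    or purely vertical runs cannot occur. Returns the accumulated cost of the
    best path ending at each utterance frame (inf where no path can end).
    """
    inf = float("inf")
    nq, nu = len(dist), len(dist[0])
    acc = [[inf] * nu for _ in range(nq)]
    acc[0] = list(dist[0])  # free start: the query may begin at any utterance frame
    for i in range(1, nq):
        for j in range(nu):
            prev = []
            if j >= 1:
                prev.append(acc[i - 1][j - 1])   # diagonal step
            if j >= 2:
                prev.append(acc[i - 1][j - 2])   # skips one utterance frame
            if i >= 2 and j >= 1:
                prev.append(acc[i - 2][j - 1])   # skips one query frame
            best = min(prev, default=inf)
            if best < inf:
                acc[i][j] = best + dist[i][j]
    return acc[-1]
```

With this step pattern a three-frame query can never be matched against a single utterance frame, which is the kind of degenerate path a slope limit is meant to exclude.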

Dealing with T2
● query split into equal parts
● each part searched in the utterance separately
● results averaged together
● query split into 2 (denoted as 2w) and 3 (3w) parts

in late evaluation
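
The T2 handling above can be sketched as follows. This is a minimal illustration under assumptions: split_query and split_score are hypothetical names, and search stands for any per-part detector that returns one score for a (query part, utterance) pair.

```python
def split_query(frames, parts):
    """Split a list of query frames into `parts` contiguous, near-equal chunks."""
    n = len(frames)
    bounds = [k * n // parts for k in range(parts + 1)]
    return [frames[bounds[k]:bounds[k + 1]] for k in range(parts)]

def split_score(query, utterance, parts, search):
    """Search each query part separately and average the per-part scores."""
    scores = [search(chunk, utterance) for chunk in split_query(query, parts)]
    return sum(scores) / len(scores)
```

Here parts=2 would correspond to the 2w variant and parts=3 to the 3w variant.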

Score normalization
● raw detection scores normalized by length
● the best detection per utterance-query pair selected
● mode normalization performed

[Figure panels: original, mode norm.]
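
The three normalization steps can be sketched as below. The slides do not define mode normalization, so this sketch assumes it means shifting each query's scores so that the mode of their histogram sits at zero. It also assumes that a higher score means a better match. The names best_per_pair, mode_normalize and the bin_width parameter are hypothetical.

```python
import math
from collections import Counter

def best_per_pair(detections):
    """detections: iterable of (query_id, utt_id, raw_score, path_length).

    Returns {(query_id, utt_id): best length-normalized score}.
    Assumption: higher score = better match.
    """
    best = {}
    for q, u, raw, length in detections:
        s = raw / length                      # normalize the raw score by length
        if (q, u) not in best or s > best[(q, u)]:
            best[(q, u)] = s                  # keep only the best detection per pair
    return best

def mode_normalize(scores, bin_width=0.05):
    """Shift each query's scores so that their histogram mode sits at zero."""
    by_query = {}
    for (q, _), s in scores.items():
        by_query.setdefault(q, []).append(s)
    modes = {}
    for q, vals in by_query.items():
        bins = Counter(math.floor(v / bin_width) for v in vals)
        # Most populated bin; ties are broken in favour of the lowest bin.
        top = max(bins, key=lambda b: (bins[b], -b))
        modes[q] = (top + 0.5) * bin_width    # centre of the modal bin
    return {(q, u): s - modes[q] for (q, u), s in scores.items()}
```

Since most utterances do not contain a given query, the mode of a query's score histogram mainly reflects its non-matching trials, which is why shifting by it makes scores from different queries easier to compare.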

Results

● posteriors do not work for this year's dataset
● slope limitation helps to control the path shape
● stacking features from more than 4 languages does not improve performance
● mode normalization works well for normalizing raw scores

● next year we will focus on denoising and dereverberation

Thanks for your attention

