Jorge Proença 1,2
Arlindo Veiga 1,2
Fernando Perdigão 1,2
The SPL-IT Query by Example Search on Speech system for MediaEval 2014
The 2014 Query by Example Search on Speech (QUESST)
1 Instituto de Telecomunicações, Coimbra, Portugal
2 Electrical and Computer Eng. Department, University of Coimbra, Portugal
SPL-IT system
MediaEval 2014
| October 16-17 2014, Barcelona, SPAIN
Overview of the system:
Fuses Dynamic Time Warping (DTW) modifications
Fuses results from systems with phonetic recognizers for 3 languages
Phonetic Recognizer
Hard to extract good posteriorgrams with an HMM system (our in-house system).
Used 3 systems/languages (for 8 kHz) based on long temporal context and neural networks from Brno University of Technology (BUT):
Czech
Hungarian
Russian
Output: posteriorgrams (3 states per phoneme).
Leading and trailing silence/noise removed
State Posteriorgram example for one query (axes: Phoneme State, Frame)
Dynamic Time Warping
Local Distance matrix:
Dot Product of Query and Audio posterior probability vectors: $D(\mathbf{q}, \mathbf{x}) = -\log(\mathbf{q} \cdot \mathbf{x})$
Back-off with $\lambda = 10^{-4}$
Distance Matrix of Query vs Audio
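As a rough illustration of the local distance above, the sketch below builds a query-vs-audio distance matrix from posterior vectors. The slide gives only the back-off value; the way it is applied here (interpolating each posterior vector with a uniform distribution) is an assumption, and the names `back_off`, `local_distance` and `distance_matrix` are hypothetical.

```python
import math

LAMBDA = 1e-4  # back-off value taken from the slide


def back_off(posteriors, lam=LAMBDA):
    """Assumed back-off: interpolate the vector with a uniform distribution."""
    n = len(posteriors)
    return [(1.0 - lam) * p + lam / n for p in posteriors]


def local_distance(q, x):
    """Minus log of the dot product of two posterior probability vectors."""
    dot = sum(a * b for a, b in zip(q, x))
    return -math.log(dot)


def distance_matrix(query, audio, lam=LAMBDA):
    """Rows index query frames, columns index audio frames."""
    query = [back_off(frame, lam) for frame in query]
    audio = [back_off(frame, lam) for frame in audio]
    return [[local_distance(q, x) for x in audio] for q in query]
```

With this kind of back-off the dot product can never be exactly zero, so every entry of the matrix stays finite even when the query and audio frames put their probability mass on different states.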
Dynamic Time Warping
Basic DTW strategy (A1):
Smallest distance using identically weighted unitary jumps.
Distance Matrix (top) and accumulated Distance matrix (bottom) of Query vs Audio
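The basic strategy can be sketched as a subsequence DTW over the local distance matrix. This is a minimal sketch under assumptions the slide does not spell out: the unitary jumps are taken to be horizontal, vertical and diagonal, the match may start and end at any audio frame, and no path-length normalization is applied. The function name `subsequence_dtw` is hypothetical.

```python
def subsequence_dtw(dist):
    """Return (best_cost, end_column) for a query-vs-audio distance matrix.

    dist[i][j] is the local distance between query frame i and audio frame j.
    """
    n_q, n_a = len(dist), len(dist[0])
    # First row: the match may start at any audio frame (assumption).
    acc = [list(dist[0])]
    for i in range(1, n_q):
        row = [0.0] * n_a
        for j in range(n_a):
            best = acc[i - 1][j]  # vertical jump
            if j > 0:
                best = min(best,
                           acc[i - 1][j - 1],  # diagonal jump
                           row[j - 1])         # horizontal jump
            # All three jumps carry the same unit weight.
            row[j] = dist[i][j] + best
        acc.append(row)
    last = acc[-1]
    end = min(range(n_a), key=last.__getitem__)
    return last[end], end
```

The raw accumulated cost grows with query length, so comparing scores across queries would need some normalization, which the slide does not specify and which is left out here.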
DTW Modifications
4 additional approaches:
(A2) – Cutting up to 250 ms at the end of the query, keeping the segment above 500 ms
(A3) – Cutting up to 250 ms at the beginning of the query, keeping the segment above 500 ms
Query vs. Audio posterior distance matrix (top) and the best path from A2 (bottom); axes: Query, Audio
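The cutting limits of (A2) and (A3) can be expressed as sets of query frames where a path is allowed to end or start. The sketch below assumes a 10 ms frame shift, which the slides do not state, and reads "above 500 ms" as "at least 500 ms"; all function names are hypothetical.

```python
FRAME_MS = 10  # assumed frame shift; not stated in the slides


def max_cut_frames(n_query_frames, frame_ms=FRAME_MS,
                   max_cut_ms=250, min_keep_ms=500):
    """Largest number of frames that may be cut from one end of the query."""
    max_cut = max_cut_ms // frame_ms
    min_keep = min_keep_ms // frame_ms
    return max(0, min(max_cut, n_query_frames - min_keep))


def allowed_end_rows(n_query_frames):
    """(A2) query rows at which a path may end after cutting the query end."""
    cut = max_cut_frames(n_query_frames)
    return list(range(n_query_frames - 1 - cut, n_query_frames))


def allowed_start_rows(n_query_frames):
    """(A3) query rows at which a path may start after cutting the beginning."""
    cut = max_cut_frames(n_query_frames)
    return list(range(0, cut + 1))
```

For a 600 ms query (60 frames under the assumed frame shift), only 100 ms can be cut, so the path may end at any of the last 11 query rows for (A2) or start at any of the first 11 for (A3).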
DTW Modifications
(A4) – Allowing one jump in the path of up to ½ of the query's length;
it cannot occur in the initial and final 250 ms of the query;
it cannot occur for queries shorter than 800 ms.
Query vs. Audio posterior distance matrix (top) and the best path from A4 (bottom); axes: Query, Audio
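The restrictions on the (A4) jump can be written down as a small helper. This sketch only encodes the three limits listed above; it assumes a 10 ms frame shift (not stated in the slides) and does not decide along which axis the jump length is measured, since the slide leaves that open. The name `jump_constraints` is hypothetical.

```python
FRAME_MS = 10  # assumed frame shift; not stated in the slides


def jump_constraints(n_query_frames, frame_ms=FRAME_MS):
    """Return (max_jump_frames, first_row, last_row), or None if no jump is allowed.

    first_row and last_row bound the query rows where the jump may occur.
    """
    query_ms = n_query_frames * frame_ms
    if query_ms < 800:
        return None  # queries shorter than 800 ms get no jump
    guard = 250 // frame_ms  # frames in the protected 250 ms at each end
    max_jump = n_query_frames // 2  # up to half of the query's length
    return max_jump, guard, n_query_frames - guard - 1
```

For a 1 s query (100 frames under the assumed frame shift) this gives a jump of at most 50 frames, placed somewhere between query rows 25 and 74.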
DTW Modifications
(A5) – Swaps: accounting for re-ordering of words.
Backtrack the best 5 candidates from (A1) from the end,
then find the best path for the beginning of the query, ahead of the end of the first one, with restrictions similar to (A4).
Query vs. Audio posterior distance matrix (top) and the best path from A5 (bottom); axes: Query, Audio
Fusing systems
Different approaches:
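Minimum of the approaches: not the best.
Harmonic mean: found to be a good compromise.
Per-query normalization (standard score): $\hat{s} = (s - \mu_q) / \sigma_q$, with $\mu_q$ and $\sigma_q$ the mean and standard deviation of the scores obtained for query $q$.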
Different languages:
Arithmetic mean of the 3 scores.
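The fusion steps listed on this slide can be sketched with Python's standard `statistics` module. The slide does not say in which order the steps are applied or exactly which values enter each one, so the three functions below are kept separate; their names are hypothetical, and the harmonic mean is assumed to operate on positive DTW distances.

```python
import statistics


def fuse_approaches(distances):
    """Harmonic mean of the (positive) distances from several DTW approaches."""
    return statistics.harmonic_mean(distances)


def normalize_per_query(scores):
    """Standard score computed over all scores obtained for one query."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]  # all scores equal: nothing to rank
    return [(s - mu) / sigma for s in scores]


def fuse_languages(per_language_scores):
    """Arithmetic mean, per audio document, across the language systems."""
    return [statistics.mean(column) for column in zip(*per_language_scores)]
```

The harmonic mean always lies between the minimum and the arithmetic mean of its inputs and leans toward the smaller values, which is consistent with the slide describing it as a compromise between taking the minimum and averaging.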
Submissions and Results
Primary: fusion of (A1) and (A2) (basic DTW and cutting the end of the query)