
1

The NNI QbE-STD System for MediaEval 2014

Peng Yang1, Haihua Xu2, Xiong Xiao2, Lei Xie1, Cheung-Chi Leung3

Hongjie Chen1, Jia Yu1, Hang Lv1, Lei Wang3, Su Jun Leow2

Bin Ma3, Eng Siong Chng1, Haizhou Li2,3

1 Northwestern Polytechnical University, Xi’an, China
2 Nanyang Technological University, Singapore
3 Institute for Infocomm Research, A*STAR, Singapore

Presented by Haihua Xu, Temasek Laboratories@NTU, Singapore

NNI QbE-STD system, MediaEval 2014 Workshop, Barcelona

2

System Diagram

Two groups of subsystems are used:
• Subsequence DTW-based template matching on Gaussian/phone posteriorgram and bottleneck features.
• Symbolic search (SS) using a phone tokenizer and a weighted finite state transducer (WFST).


3

Tokenizers

Tokenizers are used to convert the audio signal into
• posteriorgram or bottleneck features for DTW-based systems
• phone sequences/lattices for SS systems


4

DTW-based Systems

• Full sequence matching1: conventional subsequence DTW. Good for type 1 queries.

• Partial matching is used for type 2 and 3 queries.
  • A partial feature segment of the query is used for matching.
  • Segments are 600 ms long and shifted by 50 ms.
  • Improves performance for type 3 queries.

• 9 DTW systems in total:
  • 5 using full matching
  • 4 using partial matching
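The two matching modes above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine local distance, the 10 ms frame shift, the normalisation by query length, and the function names `subsequence_dtw` and `query_segments` are all assumptions made for the example.

```python
import math

def cosine_distance(a, b):
    """Local distance between two feature frames (an assumed choice)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def subsequence_dtw(query, doc):
    """Find the best match of the whole query anywhere inside doc.

    Returns (cost normalised by query length, index of the last matched doc frame).
    """
    n, m = len(query), len(doc)
    # Free start: the first query frame may align with any doc frame.
    prev = [cosine_distance(query[0], doc[j]) for j in range(m)]
    for i in range(1, n):
        cur = [0.0] * m
        cur[0] = prev[0] + cosine_distance(query[i], doc[0])
        for j in range(1, m):
            d = cosine_distance(query[i], doc[j])
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    # Free end: the match may finish at any doc frame.
    end = min(range(m), key=prev.__getitem__)
    return prev[end] / n, end

def query_segments(query, frame_shift_ms=10, seg_ms=600, shift_ms=50):
    """Cut a query into 600 ms segments shifted by 50 ms (partial matching)."""
    seg = seg_ms // frame_shift_ms    # 60 frames under the assumed 10 ms frame shift
    hop = shift_ms // frame_shift_ms  # 5 frames
    if len(query) <= seg:
        return [query]
    return [query[s:s + seg] for s in range(0, len(query) - seg + 1, hop)]
```

For partial matching, each segment returned by `query_segments` would be passed to `subsequence_dtw` in place of the full query. How the per-segment scores are combined is not stated on the slide, so it is left out here.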

1 Yang P. et al., “Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection”, in Proc. INTERSPEECH, 2014


5

Why Symbolic Search (SS)

• DTW is effective1, but it is
  • computationally expensive and difficult to index,
  • not well suited to inexact matching.

• Symbolic search allows indexing and fast search, e.g. using a weighted finite state transducer (WFST).
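To illustrate why a symbolic representation is easy to index, here is a small sketch that uses a plain inverted index of phone trigrams. It is a stand-in chosen for brevity, not a WFST and not the authors' system; the names `build_index` and `search`, the trigram order and the toy phone strings are assumptions.

```python
from collections import defaultdict

def build_index(utterances, n=3):
    """Map every phone n-gram to the (utterance id, start position) pairs where it occurs."""
    index = defaultdict(list)
    for utt_id, phones in utterances.items():
        for i in range(len(phones) - n + 1):
            index[tuple(phones[i:i + n])].append((utt_id, i))
    return index

def search(index, utterances, query, n=3):
    """Exact search: look up the first n-gram of the query, then verify the rest."""
    query = tuple(query)
    if len(query) < n:
        return []
    hits = []
    for utt_id, start in index.get(query[:n], []):
        candidate = tuple(utterances[utt_id][start:start + len(query)])
        if candidate == query:
            hits.append((utt_id, start))
    return hits

# Toy phone sequences, made up for the example.
utterances = {
    "u1": "s a l a m a".split(),
    "u2": "m a s a l a".split(),
}
index = build_index(utterances)
hits = search(index, utterances, "s a l".split())
```

The index is built once over the search audio, so each query costs a dictionary lookup plus a short verification instead of a full DTW pass over every utterance.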

1 Anguera X., Rodriguez-Fuentes L.J., Szoke I., Buzo A., and Metze F., “Query by example search on speech at MediaEval 2014”, in Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, Oct. 16-17, 2014


6

Symbolic Search System


• Limitations of symbolic search for QbE-STD:
  • Phone recognizers of other languages must be used for tokenization, which gives a poor symbolic representation.
  • The phone representation is inconsistent between query and search audio.

7

Limitation of Conventional Symbolic Search

• Full – Full symbolic search method
• pMiss – Miss rate
• pFA – False alarm rate
• ATWV – Actual Term Weighted Value

As query length increases,

• Miss rate approaches 100%

• False alarm rate approaches 0

• ATWV approaches 0

8

Partial Phone Sequence Matching

Partial Matching Steps

• If a query phone hypothesis is longer than 6 phones, get all partial sequences of the hypothesis.

• Use all the unique partial sequences to search.

• Search results are pooled and all treated as the match of the query.

• Score normalization is applied, and decision is made.

• The high miss rate of long queries can be reduced by simply shortening the query representation.

• Rationale: let the system return something first, and then decide which results are true matches.
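The partial matching steps above can be sketched as follows. The slide does not state how long each partial sequence is, so the window length `part_len=6` is an assumption, as are the function names and the `search_fn` callback, which stands in for whatever symbolic search backend is used. Score normalization and the final decision are not shown.

```python
def partial_sequences(phones, max_len=6, part_len=6):
    """Return the unique partial sequences of a query phone hypothesis.

    Hypotheses of at most max_len phones are searched as they are.
    part_len is an assumed window length; the source does not specify it.
    """
    phones = tuple(phones)
    if len(phones) <= max_len:
        return [phones]
    seen, parts = set(), []
    for i in range(len(phones) - part_len + 1):
        part = phones[i:i + part_len]
        if part not in seen:  # keep only unique partial sequences
            seen.add(part)
            parts.append(part)
    return parts

def pooled_search(phones, search_fn, max_len=6, part_len=6):
    """Search with every partial sequence and pool the hits as matches of the query."""
    hits = set()
    for part in partial_sequences(phones, max_len, part_len):
        hits.update(search_fn(part))
    return sorted(hits)
```

Pooling raises the number of returned candidates, which is consistent with the slide's rationale: return something first, then let score normalization and thresholding decide which hits are true matches.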

9

Effectiveness of Partial Phone Sequence Matching

• Full – Full symbolic search method
• Partial – Partial symbolic search method
• pMiss – Miss rate
• pFA – False alarm rate
• ATWV – Actual Term Weighted Value

For queries longer than 6 phones:

• Miss rate reduced

• False alarm rate increased

• ATWV increased

If beta is not 66.7, the best trade-off point of pMiss and pFA will change.
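The dependence on beta can be made concrete with the usual term-weighted value formula, TWV = 1 − (pMiss + beta · pFA). The sketch below is a simplification that applies the formula to pooled rates rather than averaging per query, and the two operating points are made-up numbers for illustration, not results from the system.

```python
def twv(p_miss, p_fa, beta=66.7):
    """Term-weighted value for one operating point (simplified, pooled rates)."""
    return 1.0 - (p_miss + beta * p_fa)

def best_operating_point(points, beta=66.7):
    """Pick the (p_miss, p_fa) pair with the highest TWV for a given beta."""
    return max(points, key=lambda p: twv(p[0], p[1], beta))

# Hypothetical operating points: (p_miss, p_fa).
points = [(0.6, 0.001), (0.4, 0.003)]

best_default = best_operating_point(points)           # beta = 66.7
best_strict = best_operating_point(points, beta=200)  # false alarms cost more
```

With beta = 66.7 the second point, which trades more false alarms for fewer misses, scores higher; with a larger beta the first point wins. The formula also gives TWV = 0 when pMiss is 1 and pFA is 0, which matches the behaviour reported for long queries under full symbolic search.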

10

Results


• For type 1 queries, the partial SS method is clearly worse than the DTW method.

• For type 2 and 3 queries, the partial SS method is comparable with the DTW method.

• For type 3 queries, the partial SS method is significantly better than the DTW method in terms of MTWV.

• The two methods are highly complementary.

11

Conclusion


We have described the NNI system for the QUESST 2014 task:

• DTW-based subsystem
• Symbolic search subsystem

• Why the conventional SS system does not work, especially for long queries
• A partial phone sequence SS method is proposed

• The NNI system results are reported

Future research will focus on reducing the false alarms introduced by the partial matching method.

12

Thanks!

