Page 1: The NNI Query-by-Example System for MediaEval 2014


The NNI QbE-STD System for MediaEval 2014

Peng Yang¹, Haihua Xu², Xiong Xiao², Lei Xie¹, Cheung-Chi Leung³,
Hongjie Chen¹, Jia Yu¹, Hang Lv¹, Lei Wang³, Su Jun Leow²,
Bin Ma³, Eng Siong Chng¹, Haizhou Li²,³

¹ Northwestern Polytechnical University, Xi’an, China
² Nanyang Technological University, Singapore
³ Institute for Infocomm Research, A*STAR, Singapore

Presented by Haihua Xu
Temasek Laboratories@NTU, Singapore

NNI QbE-STD system, MediaEval 2014 Workshop, Barcelona

Page 2: The NNI Query-by-Example System for MediaEval 2014


System Diagram

Two groups of subsystems are used:

• Subsequence DTW-based template matching on Gaussian/phone posteriorgrams and bottleneck features.
• Symbolic search (SS) using a phone tokenizer and a weighted finite state transducer (WFST).


Page 3: The NNI Query-by-Example System for MediaEval 2014


Tokenizers

Tokenizers are used to convert the audio signal into:

• posteriorgrams or bottleneck features for the DTW-based systems
• phone sequences/lattices for the SS systems


Page 4: The NNI Query-by-Example System for MediaEval 2014


DTW-based Systems

• Full sequence matching¹: conventional subsequence DTW. Good for type 1 queries.

• Partial matching is used for type 2 and 3 queries.
  • A partial feature segment of the query is used for matching.
  • Segments are 600 ms long and shifted by 50 ms.
  • Improves performance for type 3 queries.

• 9 DTW systems
  • 5 using full matching
  • 4 using partial matching
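
The two matching modes above can be sketched roughly as follows. This is a minimal illustration, not the NNI implementation: the cosine local distance, the 10 ms frame shift, the normalisation by query length, and the choice of keeping the best-scoring segment are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def subsequence_dtw(query, utterance):
    """Find the best match of `query` anywhere inside `utterance`.

    Both inputs are (frames, dims) arrays, e.g. posteriorgrams.
    Returns (cost, end_frame). Cosine distance is an assumed local distance.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    u = utterance / np.linalg.norm(utterance, axis=1, keepdims=True)
    dist = 1.0 - q @ u.T                      # (n_query, n_utt) local distances
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]                    # free start: a match may begin at any frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            acc[i, j] = dist[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    end = int(np.argmin(acc[-1]))             # free end: best last-row cell
    return acc[-1, end] / n, end              # normalised by query length (a simplification)

def query_segments(query, frame_shift_ms=10, seg_ms=600, hop_ms=50):
    """Cut a query into 600 ms segments shifted by 50 ms (frame shift is assumed)."""
    seg, hop = seg_ms // frame_shift_ms, hop_ms // frame_shift_ms
    if len(query) <= seg:
        return [query]
    return [query[s:s + seg] for s in range(0, len(query) - seg + 1, hop)]

def partial_match(query, utterance):
    """Score every query segment and keep the best one (an assumed combination rule)."""
    return min((subsequence_dtw(seg, utterance) for seg in query_segments(query)),
               key=lambda result: result[0])
```

With a 10 ms frame shift, a 1 s query (100 frames) yields nine 60-frame segments, so partial matching costs roughly nine DTW passes per utterance in this sketch.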

¹ Yang P. et al., “Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection”, in Proc. INTERSPEECH, 2014


Page 5: The NNI Query-by-Example System for MediaEval 2014


Why Symbolic Search (SS)

• DTW is effective¹, but
  • it is computationally expensive and difficult to index,
  • it does not handle inexact matches easily.

• Symbolic search allows indexing and fast search, e.g. using a weighted finite state transducer (WFST).
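
To illustrate why a symbolic representation is easy to index, here is a small sketch that uses a plain inverted index over phone n-grams. It is a stand-in chosen for brevity, not the WFST index named on the slide, and the function names and the n-gram order are hypothetical.

```python
from collections import defaultdict

def build_index(utterances, n=3):
    """Map each phone n-gram to the (utterance id, position) pairs where it occurs."""
    index = defaultdict(list)
    for utt_id, phones in utterances.items():
        for pos in range(len(phones) - n + 1):
            index[tuple(phones[pos:pos + n])].append((utt_id, pos))
    return index

def search(index, query, n=3):
    """Return sorted (utterance id, start) pairs where the query occurs exactly."""
    if len(query) < n:
        raise ValueError("query must contain at least n phones")
    # Candidates come from the first n-gram; each later n-gram must follow at its offset.
    hits = set(index.get(tuple(query[:n]), []))
    for off in range(1, len(query) - n + 1):
        following = set(index.get(tuple(query[off:off + n]), []))
        hits = {(u, p) for (u, p) in hits if (u, p + off) in following}
    return sorted(hits)

# Toy phone strings, invented for illustration only.
utterances = {"utt1": "s a l a m a".split(), "utt2": "m a s a l".split()}
index = build_index(utterances)
hits = search(index, "s a l a".split())
```

The index is built once over the search audio, and each query then costs a few dictionary lookups rather than a frame-level alignment against every utterance.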

¹ Anguera X., Rodriguez-Fuentes L.J., Szoke I., Buzo A., and Metze F., “Query by example search on speech at MediaEval 2014”, in Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, Oct. 16-17


Page 6: The NNI Query-by-Example System for MediaEval 2014


Symbolic Search System

• Limitations of symbolic search for QbE-STD:
  • Phone recognizers of other languages must be used for tokenization, which gives a poor symbolic representation.
  • The phone representation is inconsistent between the query and the search audio.

Page 7: The NNI Query-by-Example System for MediaEval 2014


Limitation of Conventional Symbolic Search

• Full: full symbolic search method
• pMiss: miss rate
• pFA: false alarm rate
• ATWV: Actual Term Weighted Value

As query length increases:

• Miss rate approaches 100%
• False alarm rate approaches 0
• ATWV approaches 0
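
One way to build intuition for this trend is a toy model, which is an assumption and not taken from the slides: if each phone in the query hypothesis independently agrees with the search audio with probability p, an exact match of the whole sequence has probability p to the power of the query length, which shrinks quickly as queries get longer.

```python
def exact_match_probability(per_phone_agreement, query_length):
    """Toy model: chance that every phone of the query agrees with the search audio.

    Assumes independent per-phone agreement, which real tokenizers do not satisfy.
    """
    return per_phone_agreement ** query_length

# The agreement rate 0.7 is an invented value, used only to show the trend.
for length in (3, 6, 12):
    print(length, round(exact_match_probability(0.7, length), 4))
```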

Page 8: The NNI Query-by-Example System for MediaEval 2014


Partial Phone Sequence Matching

Partial Matching Steps

• If a query phone hypothesis is longer than 6 phones, extract all partial sequences of the hypothesis.

• Use all the unique partial sequences to search.

• Search results are pooled, and every hit is treated as a match of the query.

• Score normalization is applied, and a decision is made.

• The high miss rate of long queries can be reduced by simply shortening the query representation.

• Rationale: let the system return something first, and then decide which results are true matches.
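
The first three steps above might look like the sketch below. The slide does not say how long the partial sequences are, so the fixed length of 6 contiguous phones is an assumption, as are the function names and the `search_fn` callback (any function that maps a phone tuple to a list of hits). Score normalization and the final decision are left out.

```python
def partial_sequences(phones, max_full_len=6, part_len=6):
    """Return the unique contiguous partial sequences of a phone hypothesis.

    Hypotheses of at most `max_full_len` phones are searched whole.
    `part_len` is an assumed value; the slides do not specify it.
    """
    phones = tuple(phones)
    if len(phones) <= max_full_len:
        return [phones]
    parts = (phones[s:s + part_len] for s in range(len(phones) - part_len + 1))
    return list(dict.fromkeys(parts))  # drop duplicates, keep first-seen order

def pooled_search(phones, search_fn):
    """Search with every partial sequence and pool all hits for the query."""
    hits = set()
    for part in partial_sequences(phones):
        hits.update(search_fn(part))
    return sorted(hits)
```

For an 8-phone hypothesis this produces three 6-phone sequences, and a hit on any one of them counts as a candidate match for the original query.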

Page 9: The NNI Query-by-Example System for MediaEval 2014


Effectiveness of Partial Phone Sequence Matching

Full: full symbolic search method
Partial: partial symbolic search method
pMiss: miss rate
pFA: false alarm rate
ATWV: Actual Term Weighted Value

For queries longer than 6 phones:

• Miss rate is reduced.
• False alarm rate is increased.
• ATWV is increased.

If beta is not 66.7, the best trade-off point between pMiss and pFA will change.
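
The role of beta can be seen from the standard term-weighted value formula, TWV = 1 - (pMiss + beta * pFA), evaluated at a given decision threshold. The sketch below applies it to two operating points; their pMiss and pFA values are invented for illustration and are not results from the slides.

```python
def twv(p_miss, p_fa, beta=66.7):
    """Term-weighted value at one operating point: 1 - (pMiss + beta * pFA)."""
    return 1.0 - (p_miss + beta * p_fa)

# Hypothetical operating points (pMiss, pFA), not measured results.
full_point = (0.9, 0.0001)     # few detections: high miss rate, very few false alarms
partial_point = (0.6, 0.002)   # more detections: lower miss rate, more false alarms

for beta in (66.7, 200.0):
    print(beta, twv(*full_point, beta=beta), twv(*partial_point, beta=beta))
```

With these made-up numbers the partial operating point scores higher at beta = 66.7, but the ranking flips at beta = 200, because each false alarm is then penalized about three times as heavily.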

Page 10: The NNI Query-by-Example System for MediaEval 2014


Results


• For type 1 queries, the partial SS method is clearly worse than the DTW method.

• For type 2 and 3 queries, the partial SS method is comparable to the DTW method.

• For type 3 queries, the partial SS method is significantly better than the DTW method in terms of MTWV.

• The two methods are highly complementary.

Page 11: The NNI Query-by-Example System for MediaEval 2014


Conclusion


We have described the NNI system for the QUESST 2014 task:

• DTW-based subsystem
• Symbolic search subsystem
  • Why the conventional SS system does not work, especially for long queries
  • The proposed partial phone sequence SS method
• The NNI system results

Future research will focus on reducing the false alarms introduced by the partial matching method.

Page 12: The NNI Query-by-Example System for MediaEval 2014


Thanks!
