Zero-Cost Speech Recognition task Igor Szoke (BUT, CZ) Xavier Anguera (ElsaNow, PT)

MediaEval 2016 - Zero-Cost Speech Recognition Task

Jan 09, 2017


multimediaeval
Transcript
Page 1: MediaEval 2016 - Zero-Cost Speech Recognition Task

Zero-Cost Speech Recognition task

Igor Szoke (BUT, CZ)
Xavier Anguera (ElsaNow, PT)

Page 2: MediaEval 2016 - Zero-Cost Speech Recognition Task

The Zero-Cost task goal...
● ...not to be a zero/low-resource task
  ○ Lots of work done with data preparation.
  ○ There is “lots” of data outside.
● To be more realistic
  ○ The limitation is in budget, not in data.
  ○ Very noisy data.
● To set up a knowledge base
  ○ How to put things together.
  ○ Not how to compete in data download.
● To be in touch with the community
  ○ IARPA BABEL, MGB Challenge, JHU workshops, Zero Resource Interspeech challenge

Page 3: MediaEval 2016 - Zero-Cost Speech Recognition Task

The Zero-Cost task in detail...
● Language: Vietnamese
● Data provided (~12 hours)
● 2 sub-tasks
  ○ Full-size ASR -> Word Error Rate
  ○ A tokenizer -> Normalized Mutual Information
● Participants can find and share more data (must be free).
● Participants score using an on-line Leader Board - www.Zero-Cost.org
● To help participants we:
  ○ Prepared a baseline system (Triphone-GMM system trained using Kaldi)
  ○ Provided a BUT BABEL baseline system (to frame ZC results)
  ○ Provided local scoring
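The ASR sub-task is scored by Word Error Rate. As a rough illustration only, WER can be sketched as the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions of this sketch; the official scoring on the leader board may normalize or tokenize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the task's official scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution plus one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.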

Page 4: MediaEval 2016 - Zero-Cost Speech Recognition Task

The task in deeper detail..

● Data
  ○ ELSA - read sentences on a cellphone (various public places)
  ○ Forvo.com - single word pronunciation (office)
  ○ Rhinospike.com - read sentences or paragraphs (office)
  ○ YouTube.com - news, presentations, talks (single speaker, reverberant)
● Other data provided by participants
  ○ (NNI) - URLs of Vietnamese web pages - (BUT) downloaded and cleaned - 18MB
  ○ (NNI) - Wiki text - cleaned - 750MB
  ○ (NNI) - Wordlist - 80k
  ○ (BUT) - Subtitles - 93MB
  ○ (BUT) - Some telenovelas (video + subtitles) - not used

Page 5: MediaEval 2016 - Zero-Cost Speech Recognition Task

Participants
● 12 interested
● 6 signed up
● 3 finished
  ○ ASR sub-task
    ■ NNI - fusion of 3 systems based on DNNs and data augmentation. Participant-provided data used.
    ■ ININ - fusion of several systems based on SGMMs, RNN and data augmentation.
    ■ BUT - single system based on BLSTMs.
  ○ Subword sub-task
    ■ BUT - single system based on automatically derived units using an infinite mixture of HMMs.

Page 6: MediaEval 2016 - Zero-Cost Speech Recognition Task

ASR sub-task results (zero-cost.org)

Page 7: MediaEval 2016 - Zero-Cost Speech Recognition Task

ASR sub-task results (zero-cost.org)

Page 8: MediaEval 2016 - Zero-Cost Speech Recognition Task

Subword sub-task results (zero-cost.org)
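The subword sub-task is scored by Normalized Mutual Information. A minimal sketch of NMI between two label sequences follows. It assumes the discovered units and the reference labels are already aligned item by item, and it normalizes by the geometric mean of the two entropies. The slides state neither the alignment nor the normalization used by the official scoring, so both are assumptions, as is the function name.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI of two equally long label sequences (geometric-mean variant).

    Hypothetical helper for illustration; not the task's official scorer.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label counts.
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    denominator = math.sqrt(entropy(count_a) * entropy(count_b))
    # Convention of this sketch: return 0.0 when either labeling is constant.
    return mutual_info / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Same partition under different label names.
    print(normalized_mutual_information([0, 0, 1, 1], ["x", "x", "y", "y"]))
```

NMI depends only on how the items are grouped, not on the label names, which is why it suits automatically derived units that have no fixed correspondence to phoneme symbols.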

Page 9: MediaEval 2016 - Zero-Cost Speech Recognition Task

Conclusion
● Thanks to all!
  ○ Participants shared data.
  ○ Coped with the small amount of data through data augmentation.
  ○ Used state-of-the-art techniques.
● The future?
  ○ Leader board stays open. Anyone can join or continue.
  ○ We would like to continue!
    ■ A new language?
    ■ Tweak the task a bit?
    ■ More active participants...

Page 10: MediaEval 2016 - Zero-Cost Speech Recognition Task

That’s it…

Big thanks to all participants!

Page 11: MediaEval 2016 - Zero-Cost Speech Recognition Task

Zero-cost speech recognition task

Leader-Board

Page 12: MediaEval 2016 - Zero-Cost Speech Recognition Task

Zero-Cost Leader Board

Page 13: MediaEval 2016 - Zero-Cost Speech Recognition Task
Page 14: MediaEval 2016 - Zero-Cost Speech Recognition Task

Details...
● Zero-cost.org
● News, task description, scoring, leader board
● Small dev subset provided to participants, so they can save time by not uploading too often.
● Participants upload dev + test but see just the dev results. When evaluations are finished, they also see the test results.
● Support for late submissions.
● Web interface, token-based backend processes (working independently of the web).