Dec 26, 2015
Word-subword based keyword spotting with
implications in OOV detection
Jan “Honza” Černocký, Igor Szöke, Mirko Hannemann, Stefan Kombrink
Brno University of Technology, BUT Speech@FIT
44th Asilomar Conference on Signals, Systems and Computers, 8.11.2010
ASILOMAR SS & C Černocký, Szöke, Hanneman, Kombrink 8.11.2010 2/34
Agenda
• Word-based STD, OOV problem, subwords
• Experiments
• Sub-word units
• Hybrid word-subword system
• What can we do with OOVs
• Conclusion
Goal of STD and glossary of terms
Goal: detect keywords or key-phrases in input speech; for each detection, output:
• Identity
• Position
• Score
Glossary
• Large Vocabulary Continuous Speech Recognizer – LVCSR – system converting speech into text.
• Out-of-vocabulary – OOV – word which is not in the LVCSR vocabulary.
• Term – textual entry consisting of one or more words in sequence.
• Spoken Term Detection – STD – a way to search for a term in spoken data.
• Subword(s) – unit(s) that are parts of words (phones, syllables, automatically found units, etc.).
Word-based STD
• Thanks to the language model, word-based STD systems reach better accuracy than purely acoustic ones.
Implementation
• The term is searched for in the recognition lattice.
• This allows the posterior probability of the term to be estimated.
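A minimal sketch of how an arc posterior can be read off a lattice with the forward-backward algorithm. The toy lattice, its scores and the function names are assumptions made for illustration and are not taken from the slides. A real system would also handle time stamps, acoustic scaling and multi-word terms.

```python
import math
from collections import defaultdict

def logadd(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def arc_posteriors(arcs, start, end):
    """arcs: list of (src, dst, label, loglik); node ids follow topological order."""
    fwd = defaultdict(lambda: -math.inf)
    bwd = defaultdict(lambda: -math.inf)
    fwd[start] = 0.0
    bwd[end] = 0.0
    # Forward pass: accumulate the score of all partial paths reaching each node.
    for src, dst, _, ll in sorted(arcs, key=lambda a: a[0]):
        fwd[dst] = logadd(fwd[dst], fwd[src] + ll)
    # Backward pass: accumulate the score of all partial paths leaving each node.
    for src, dst, _, ll in sorted(arcs, key=lambda a: a[0], reverse=True):
        bwd[src] = logadd(bwd[src], ll + bwd[dst])
    total = fwd[end]
    # Posterior = score of all paths through the arc / score of all paths.
    return [(label, math.exp(fwd[src] + ll + bwd[dst] - total))
            for src, dst, label, ll in arcs]

# Hypothetical three-arc lattice with unnormalized scores.
LATTICE = [
    (0, 1, "PRIME", math.log(3.0)),
    (0, 1, "CRIME", math.log(1.0)),
    (1, 2, "MINISTER", math.log(2.0)),
]
posteriors = dict(arc_posteriors(LATTICE, start=0, end=2))
```

In this toy lattice the two competing first words share the probability mass in the ratio of their scores, while the only candidate for the second word receives all of it.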
The OOV problem
REF: THIS IS AN EXAMPLE OF RECOGNIZER OUTPUT
REC: THIS IS AMEX APPLE OF RECOGNIZER OUTPUT
• One OOV causes several errors:
  • OOV cannot be found (in the output of LVCSR).
  • OOV impairs recognition of neighboring words.
• OOV usually carries a lot of information (named entity).
We need to handle OOVs!
• Word accuracy.
• Spoken term detection accuracy.
• Practical (memory, CPU, index size, etc.).
Answer to OOV problem – sub-word STD
• Subword recognizer is built (output is subword lattice).
• Term is converted from words to sequence of subwords.
• This sequence is searched in the subword lattice.
*p-r-ay m* *m-ih-n ih-s t-axr*
PRIME MINISTER
Evaluation - TWV
• Defined by NIST for NIST STD 2006 evaluation:
• one number
• higher is better
• depending on normalization
• Requires full STD system
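For reference, the metric can be written out as follows. This is the definition as recalled from the NIST STD 2006 evaluation plan, not something stated on the slide: T is the set of terms, θ the decision threshold, N_true(t) the number of true occurrences of term t, and T_speech the duration of the test speech in seconds.

```latex
\mathrm{TWV}(\theta) = 1 - \frac{1}{|T|} \sum_{t \in T}
  \left[ P_{\mathrm{miss}}(t,\theta) + \beta \, P_{\mathrm{FA}}(t,\theta) \right],
\qquad \beta = 999.9

P_{\mathrm{miss}}(t,\theta) = 1 - \frac{N_{\mathrm{corr}}(t,\theta)}{N_{\mathrm{true}}(t)},
\qquad
P_{\mathrm{FA}}(t,\theta) = \frac{N_{\mathrm{spurious}}(t,\theta)}{T_{\mathrm{speech}} - N_{\mathrm{true}}(t)}
```

A single global threshold θ is applied to all terms, which is why the result depends on how well the scores are normalized across terms.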
Normalization-independent evaluation - UBTWV
• UBTWV - Upper Bound Term Weighted Value
• Finds optimum threshold for each term
• one number
• higher is better
• Independent of normalization
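A minimal sketch of the per-term threshold search, assuming the usual NIST term value 1 - (P_miss + β·P_FA) with β = 999.9. The data layout and the function names are hypothetical, and terms with no true occurrence are assumed to have been excluded beforehand.

```python
BETA = 999.9  # assumed NIST STD 2006 weighting of false alarms

def term_value(dets, n_true, speech_sec, thr):
    """Value of one term when all detections scoring >= thr are accepted."""
    hits = sum(1 for score, is_hit in dets if is_hit and score >= thr)
    false_alarms = sum(1 for score, is_hit in dets if not is_hit and score >= thr)
    p_miss = 1.0 - hits / n_true
    p_fa = false_alarms / (speech_sec - n_true)
    return 1.0 - (p_miss + BETA * p_fa)

def ubtwv(terms, speech_sec):
    """terms: dict term -> (detections, n_true), detections = [(score, is_hit), ...]."""
    values = []
    for dets, n_true in terms.values():
        # Candidate thresholds: every observed score, plus "accept nothing".
        thresholds = [score for score, _ in dets] + [float("inf")]
        values.append(max(term_value(dets, n_true, speech_sec, t) for t in thresholds))
    return sum(values) / len(values)

# Hypothetical example: one term, two true occurrences, one false alarm.
example = {"prime minister": ([(0.9, True), (0.5, False), (0.4, True)], 2)}
score = ubtwv(example, speech_sec=10002.0)
```

Because each term gets its own best threshold, the result does not depend on how scores are normalized across terms, at the price of being an optimistic upper bound.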
Data
• NIST STD 2006 evaluations.
• 3h of English telephone conversations.
• 373 terms, 1-4 words long, occurring 4737/196 times.
Recognizer I.
• LVCSR developed in the AMI/AMIDA project.
• State-of-the-art system including VTLN, MPE, posterior features, SAT, 3 passes.
• Acoustic models trained on 278h of speech.
• Language model trained on 977M word tokens (50k vocabulary).
• Dictionary pruned to generate OOVs -> WRDRED.
• Word accuracy – 69.04%.
Results
• Words
• Words converted to phones
• Phone recognizer
Phones too small => need longer units
Better subwords – phone multigrams
• Statistics of phone n-grams (up to length 6) are collected from training data (phone transcriptions of speech).
• Probabilities of all units are estimated.
• Training data are segmented into the most probable sequence of multigrams.
• Statistics are recomputed and rarely occurring units are deleted. Several iterations.
• N-gram language model is estimated on top of the multigram segmentation of the training data.
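The segmentation step can be sketched as a Viterbi search over all ways of cutting a phone string into known units. This is an illustration under simplifying assumptions: units are treated as independent (unigram probabilities), and the inventory and its probabilities are invented for the example.

```python
import math

def segment(phones, unit_prob, max_len=6):
    """Most probable segmentation of a phone list into multigram units."""
    n = len(phones)
    best = [-math.inf] * (n + 1)  # best log-probability of phones[:i]
    back = [0] * (n + 1)          # start index of the last unit on the best path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unit = "-".join(phones[start:end])
            p = unit_prob.get(unit)
            if p is None or best[start] == -math.inf:
                continue
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = start
    if best[n] == -math.inf:
        return None  # the inventory cannot cover this phone string
    units, end = [], n
    while end > 0:
        start = back[end]
        units.append("-".join(phones[start:end]))
        end = start
    return units[::-1]

# Hypothetical unit inventory with made-up probabilities.
UNIT_PROB = {"p-r-ay": 0.2, "r-ay": 0.1, "p": 0.1, "r": 0.1, "ay": 0.1, "m": 0.3}
units = segment("p r ay m".split(), UNIT_PROB)
```

In training, a segmentation of this kind would alternate with re-estimating the unit probabilities from the segmented data and pruning rare units, as the bullets above describe.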
Constrained multigrams
• nosil – sil is not part of a multigram unit.
• noxwrd – word-boundary information is added to the multigram unit.
Term (word representation): PRIME MINISTER
Term pronunciation: p r ay m m ih n ih s t axr
Term (subword representation): *p-r-ay m* *m-ih-n ih-s t-axr*
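The boundary marking in the example above can be reproduced with a few lines. This is a sketch that assumes each word has already been segmented into multigram units; the function name is hypothetical, and the `*` convention is inferred from the example only.

```python
def mark_word_boundaries(words):
    """words: list of per-word unit lists; returns the boundary-marked subword string."""
    marked = []
    for units in words:
        units = list(units)
        units[0] = "*" + units[0]    # word start marker on the first unit
        units[-1] = units[-1] + "*"  # word end marker on the last unit
        marked.append(" ".join(units))
    return " ".join(marked)

term = mark_word_boundaries([["p-r-ay", "m"], ["m-ih-n", "ih-s", "t-axr"]])
```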
Results
• Subword search can process OOV terms.
• Subword search is not as accurate as word search for in-vocabulary terms.
• Subword search consumes more index space.
=> Need for combination of word and subword searches.
Parallel word-subword
… works, but two systems must be maintained and run.
Implementation by composition of networks
Multigram dictionary for hybrid system
• For the hybrid system, phone multigrams must not be trained on utterances.
• Phone multigrams are trained on a dictionary.
• Experimented with LVCSR vs. big vs. OOV dictionary.
Results – different configurations
• Pruning factors play a role in memory consumption, size of index, RT factor …
• “Reasonable system”:
  • ~2.5x slower than word
  • ~2.5x bigger index than word
  • Matches the accuracy of word system for IV
  • OOVs found.
OOV detection by the hybrid system
Comparison of the subword confidence measure to a threshold => detection of OOVs
OOV recovery
Phoneme-to-grapheme (P2G) conversion is used to derive the word form of a detected OOV.
Alignment error model
• Some detected OOVs could even be converted back to in-vocabulary words!
• But the phone pronunciation in the 1-best output is not ideal…
• … alignment error model
• Parameters (probabilities of deletion, insertion, substitution) trained from data.
• Can process a dictionary and look up detected OOVs.
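The dictionary look-up can be sketched as a probabilistic edit distance between the recognized phone string and each dictionary pronunciation. The sketch below makes strong assumptions for brevity: one constant probability per operation instead of trained, phone-dependent parameters, and a two-entry dictionary whose pronunciations are illustrative only.

```python
import math

def align_cost(hyp, ref, p_match=0.8, p_sub=0.1, p_ins=0.05, p_del=0.05):
    """Negative log-probability of the best alignment of hyp phones to ref phones."""
    c_match, c_sub, c_ins, c_del = (-math.log(p) for p in (p_match, p_sub, p_ins, p_del))
    # prev[j]: cost of aligning the hyp prefix seen so far to ref[:j].
    prev = [j * c_del for j in range(len(ref) + 1)]
    for i in range(1, len(hyp) + 1):
        cur = [i * c_ins]
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (c_match if hyp[i - 1] == ref[j - 1] else c_sub)
            cur.append(min(diag,                # match or substitution
                           prev[j] + c_ins,     # extra phone in the hypothesis
                           cur[j - 1] + c_del)) # dictionary phone missing in the hypothesis
        prev = cur
    return prev[-1]

def lookup(hyp, dictionary):
    """Return the dictionary word whose pronunciation aligns best with hyp."""
    return min(dictionary, key=lambda word: align_cost(hyp, dictionary[word]))

# Hypothetical pronunciations; the hypothesis contains one substituted phone.
DICTIONARY = {
    "INFORMATION": "ih n f er m ey sh en".split(),
    "ALCOHOL": "ae l k ax hh aa l".split(),
}
best_word = lookup("ih n f ax m ey sh en".split(), DICTIONARY)
```

The same cost can serve as a similarity score between two detected OOVs, since it only needs two phone strings as input.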
Going more complex …
A WFST can be constructed that accounts for:
• Sequences of in-vocabulary words
• In-vocabulary words + common pre- and suffixes
• OOVs
• And combinations …
m ey sh en -> INFORMATION
ae l k ax hh aa l ih z em (ALCOHOLISM) -> ALCOHOL / ISM
aa f ax s m ae k s (’Office Max’) -> OFFICE OOV1572
OOV clustering
• Alignment model allows for the evaluation of similarity
• Clustering possible
Conclusion
• Subword system with constrained multigrams: very good STD performance and OOV tolerance.
• Improved hybrid word-subword system tested from the point of view of both STD accuracy and real applications.
  • Hybrid system brings better accuracy/size ratio and is faster than the standalone system.
  • It works well in a real indexing & search engine.
• With a hybrid system, we can
  • Recover OOVs (simple P2G or more elaborate model)
  • Measure similarity of OOVs
  • Cluster them, find re-occurring ones, update vocabulary.
Reading and playing with
• Igor Szöke: Hybrid word-subword spoken term detection, Ph.D. thesis, Brno University of Technology, Oct 2010
• Stefan Kombrink, Mirko Hannemann, Lukáš Burget, and Hynek Heřmanský: Recovery of Rare Words in Lecture Speech, in Proc. Text, Speech and Dialogue (TSD) 2010, Brno, 2010
• Mirko Hannemann, Stefan Kombrink, Martin Karafiát, and Lukáš Burget: Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words, in Proc. Interspeech 2010, Makuhari, Japan, 2010.
• … ‘Publications’ section of http://speech.fit.vutbr.cz/
• http://www.superlectures.com/odyssey/