Brief Overview of Different Versions of Sphinx Arthur Chan

Dec 18, 2015

Page 1: Brief Overview of Different Versions of Sphinx Arthur Chan.

Brief Overview of Different Versions of Sphinx

Arthur Chan

Page 2: Brief Overview of Different Versions of Sphinx Arthur Chan.

Introduction

The software aspect of the recognizer is very important
  Research always requires correct use of the software
Sphinx II + III + IV + SphinxTrain ~= 100k lines of code
  Each of them is fairly complex

Page 3: Brief Overview of Different Versions of Sphinx Arthur Chan.

This presentation (30 pages)
  Introduction (3 pages)
  History of Sphinx (13 pages)
    Sphinx I (2 pages)
    Sphinx II (2 pages)
    Sphinx III (3 pages)
    SphinxTrain (3 pages)
    Sphinx IV (3 pages)
  How do I get the source code? (4 pages)
    Versioning
    Three rules for not getting lost in different recognizers
  Where can I get "official" information? (2 pages)
  Outlook for each recognizer (3 pages)
  Conclusion

Page 4: Brief Overview of Different Versions of Sphinx Arthur Chan.

Brief history of Sphinx

Largely adapted from
  Rita's "The Sphinx Speech Recognition Systems"
    www.cs.cmu.edu/~rsingh/
  Kevin et al.'s "Speech Recognition: Past, Present and Future"
    www.cs.cmu.edu/~msiegler/ASR/futureofcmu-final.html

Page 5: Brief Overview of Different Versions of Sphinx Arthur Chan.

Before Sphinx

Dragon
  One of the first uses of HMMs in speech recognition
  One of the first uses of a "purely statistical model" in speech
  Expresses the knowledge using an HMM network
Harpy
  One of the first uses of beam search
  Uses phonemes to represent words

Page 6: Brief Overview of Different Versions of Sphinx Arthur Chan.

Sphinx I

Before Sphinx ...
  From AT&T's literature, the concept of speaker independence was proposed in 1979
  In 1979-1987, most systems were either
    Speaker dependent
    Speaker independent but in a very small domain (<100 words)
Sphinx I is therefore outstanding
  Accuracy is 90% on Resource Management

Page 7: Brief Overview of Different Versions of Sphinx Arthur Chan.

Sphinx I (1987)

By Kai-Fu Lee and Roberto Bisiani
Key developers included Hsiao-Wuen Hon and Fil Alleva
Written in C
Continuous speech recognizer using discrete HMMs with 3 codebooks of size 256
Uses a simple word-pair grammar
Generalized triphones
Real-time on Sun3 or DEC 3000
Where is the source code? Good antique!
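The discrete-HMM front end mentioned above can be pictured as vector quantization: each feature stream of a frame is replaced by the index of its nearest codeword, and the HMM then models sequences of those indices. The following is only a rough sketch under stated assumptions: the codebooks are random toys rather than trained ones, the stream dimension (12) is hypothetical, and Euclidean distance is assumed. None of these details come from the slides.

```python
import random

NUM_CODEBOOKS = 3      # from the slide: 3 codebooks
CODEBOOK_SIZE = 256    # from the slide: 256 codewords each
DIM = 12               # assumption: dimensionality of each feature stream

def make_codebook(rng, size=CODEBOOK_SIZE, dim=DIM):
    """Toy codebook of random codewords (a real one would be trained, e.g. by k-means)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(size)]

def quantize(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def quantize_frame(streams, codebooks):
    """Map one frame (one vector per stream) to a tuple of discrete symbols."""
    return tuple(quantize(vec, cb) for vec, cb in zip(streams, codebooks))

rng = random.Random(0)
codebooks = [make_codebook(rng) for _ in range(NUM_CODEBOOKS)]
frame = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(NUM_CODEBOOKS)]
symbols = quantize_frame(frame, codebooks)
print(symbols)  # three integers, each in the range 0..255
```

With three codebooks, every frame becomes three small integers, so the output distributions of a discrete HMM reduce to table lookups.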

Page 8: Brief Overview of Different Versions of Sphinx Arthur Chan.

Sphinx II (1992)

By Xuedong Huang
Hardwired to a 5-state Bakis topology
3-gram language models
Decision-tree tying of HMMs (by Mei-Yuh Hwang)
90% in WSJ task (0 or 1?)
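A Bakis topology is a left-to-right HMM: a state may loop on itself or move forward, but never backward. The sketch below builds such a transition matrix for 5 emitting states. It assumes self-loops plus jumps of one or two states and uses uniform placeholder probabilities; the exact arcs and trained values used in Sphinx II are not given in these slides.

```python
def bakis_transitions(num_states=5, max_jump=2):
    """Build a left-to-right transition matrix.

    Row i holds the probabilities of moving from emitting state i to
    states 0..num_states, where index num_states is a non-emitting exit state.
    Allowed moves: stay (jump 0) or advance by 1..max_jump states.
    Probabilities are uniform placeholders, not trained values.
    """
    matrix = []
    for i in range(num_states):
        targets = [j for j in range(i, i + max_jump + 1) if j <= num_states]
        row = [0.0] * (num_states + 1)
        for j in targets:
            row[j] = 1.0 / len(targets)
        matrix.append(row)
    return matrix

transitions = bakis_transitions()
for row in transitions:
    print(" ".join(f"{p:.2f}" for p in row))
```

Every entry below the diagonal is zero, which is what makes the model left-to-right. "Hardwired" on the slide means this shape could not be changed without touching the decoder code.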

Page 9: Brief Overview of Different Versions of Sphinx Arthur Chan.

Fast Beam Search v. X
  FBS-6: flat lexicon decoder
  FBS-7: lexicon tree-based decoder
  FBS-8 decoder (written by Ravi Mosur, see thesis in 96)
    Supports multiple types of beam pruning
    Lexical tree
    Tricks in GMM computation
    Machine optimization: loop unrolling
    Predictive codebook computation
    Phoneme lookahead
    Best path search
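Beam pruning, in its simplest form, keeps only the hypotheses whose score lies within a fixed margin of the best score at the current frame. The sketch below shows that single idea with log scores. The state names and numbers are hypothetical, and this is not the FBS-8 code, which the slide says supports several kinds of beams.

```python
def beam_prune(hypotheses, beam_width):
    """Keep hypotheses whose log score is within `beam_width` of the best.

    `hypotheses` maps a hypothesis id (any hashable) to its log score.
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    threshold = best - beam_width
    return {h: s for h, s in hypotheses.items() if s >= threshold}

# Hypothetical active states after one frame, with log-probability scores.
active = {"AH_1": -12.0, "AH_2": -15.5, "B_1": -40.0, "K_3": -21.9}
survivors = beam_prune(active, beam_width=10.0)
print(sorted(survivors))  # ['AH_1', 'AH_2', 'K_3']
```

A narrower beam makes the search faster but raises the risk of pruning the path that would have won in the end.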

Page 10: Brief Overview of Different Versions of Sphinx Arthur Chan.

Other facts about Sphinx II

We licensed it at the beginning (this seems to go back to around 95)
In 2000, it was open-sourced on SourceForge under a Berkeley-style license
  You can incorporate Sphinx's source code
  You don't need to open your source code (no recursive legal binding)
  Similar to LGPL
In 2001, a major alpha release by Kevin ensured portability across several platforms

Page 11: Brief Overview of Different Versions of Sphinx Arthur Chan.

Sphinx III flat lexicon decoder ("s3", "s3flat", "s3slow")

Sphinx III (by Ravi Mosur)
  Flat lexicon
  Supports both CHMM and SCHMM
  "Poor-man's" trigram
    Uses only the most likely first word; this avoids D^2 expansion of the word lattice
  Arbitrary topology
  Very accurate, used in evaluation of BN and others
Derivatives of the search include
  N-best generator
  Aligner
  Phone recognizer

Page 12: Brief Overview of Different Versions of Sphinx Arthur Chan.

Sphinx III tree lexicon decoder ("s3.x", "s3fast", "s3inaccurate")

What is s3.x actually?
  A "spin-off" of the Sphinx III flat lexicon decoder's source code
  First use was in the BN 10x RT evaluation in 1999
From s3.0 -> s3.2
  Uses a tree lexicon with unigram lookahead
  Lexical tree with approximation to avoid memory problems
  One of the first in the world to use sub-vector quantization to speed up GMM computation
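The idea behind sub-vector quantization for GMM speed-up can be sketched as follows: a diagonal Gaussian log-likelihood is a sum over dimensions, so it can be split into per-sub-vector parts, and each part can be precomputed for a small set of codewords. At run time the feature vector is quantized per sub-vector and the score becomes a handful of table lookups. Everything below (dimensions, splits, codebooks, Gaussians) is a made-up toy, and this is not the s3.2 implementation.

```python
import math

def partial_loglik(x, mean, var):
    """Log-likelihood of a diagonal Gaussian restricted to the given dimensions."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def nearest(x, codebook):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))

# Toy setup (all values are made up): 4-dim features split into two 2-dim sub-vectors.
SPLITS = [(0, 2), (2, 4)]
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],      # codewords for dimensions 0-1
    [[0.0, 0.0], [2.0, 2.0]],      # codewords for dimensions 2-3
]
gaussians = [
    {"mean": [0.0, 0.0, 0.0, 0.0], "var": [1.0, 1.0, 1.0, 1.0]},
    {"mean": [1.0, 1.0, 2.0, 2.0], "var": [1.0, 1.0, 1.0, 1.0]},
]

# Precompute, once, the partial score of every codeword under every Gaussian.
table = [
    [[partial_loglik(cw, g["mean"][lo:hi], g["var"][lo:hi]) for g in gaussians]
     for cw in codebooks[s]]
    for s, (lo, hi) in enumerate(SPLITS)
]

def approx_scores(x):
    """Approximate all Gaussian log-likelihoods with one table lookup per sub-vector."""
    indices = [nearest(x[lo:hi], codebooks[s]) for s, (lo, hi) in enumerate(SPLITS)]
    return [sum(table[s][k][g] for s, k in enumerate(indices))
            for g in range(len(gaussians))]

frame = [0.9, 1.1, 1.8, 2.2]
scores = approx_scores(frame)
print(scores.index(max(scores)))  # 1: the second Gaussian scores higher
```

Such approximate scores can then be used, for example, to shortlist the Gaussians that deserve an exact evaluation, which is where the speed-up would come from.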

Page 13: Brief Overview of Different Versions of Sphinx Arthur Chan.

(cont.)

From s3.2 -> s3.3 (Rita, Ricky)
  Live mode recognizer (livedecode) and simulator (livepretend)
From s3.3 -> s3.4 (Evandro, Arthur C, Jahanzeb)
  4 levels of speed-up of GMM computation, phoneme lookahead
  Bug fixes in live mode
From s3.4 -> s3.5 (Evandro, Arthur C, Yitao)
  (Tentative) Speaker adaptation + documentation

Page 14: Brief Overview of Different Versions of Sphinx Arthur Chan.

Facts about S3

A Java version exists -> sphinx3j
Open-sourced around 2002
Maintained continuously by Evandro from 2001 to now
s3.5 is the current active branch in S3 development

Page 15: Brief Overview of Different Versions of Sphinx Arthur Chan.

SphinxTrain

Equally important and very complex, but not well understood
What is SphinxTrain?
  A collection of ~40 tools for Sphinx 2, 3 and 4 acoustic model training
  A set of Perl scripts to do training
Sphinx 2 and 3 have slightly different model formats

Page 16: Brief Overview of Different Versions of Sphinx Arthur Chan.

Mini-history

The Baum-Welch trainer and the Viterbi trainer have existed for a very long time
  Training tools in general were neither systematic nor structured
From the chaos, Eric Thayer first pulled everything together to create the package SphinxTrain
Rita did numerous bug fixes and modifications of the current trainer
  Innovated the use of automatic question generation (make_quest)
  Built a set of training scripts for RM (the 0*/ scripts)
  Wrote the first set of systematic tutorials on training
Ricky refined the code and wrote the first set of Perl scripts for training
  He made a PHD out of it too. (PHD = Push Here Dummy!)
Alan and Kevin
  Put the code on SourceForge
  Alan built a set of training scripts that can "run through"

Page 17: Brief Overview of Different Versions of Sphinx Arthur Chan.

Sphinx IV

Why Sphinx IV?
  Too many limitations in SphinxTrain and Sphinx III
    Only N-gram
    Approximation of triphones
    Fast GMM computation can be very troublesome to understand
    bw doesn't skip silence; we rely heavily on forced alignment in training

Page 18: Brief Overview of Different Versions of Sphinx Arthur Chan.

Sphinx IV (cont.)

(By no means complete ...)
Lead design: Bhiksha (MERL)
Lead team developer: Willie Walker (Sun)
Key developers: Evandro, Rita, Phillip Kwok and Paul Lamere
Many heavyweight speech advisors: Evandro, Rita, Ravi, Bhiksha, Pedro Moreno ...

Page 19: Brief Overview of Different Versions of Sphinx Arthur Chan.

Is Sphinx IV good?

Very accurate, very fast, very versatile and very nicely packaged Java-based speech recognizer
  Internal benchmarks on RM and WSJ 5k show it to be faster and more accurate than s3.3 (under 1xRT and 10% better)
Supports N-gram, FSM and FSG
Will provide facilities like confidence scoring
Still under development (the first alpha has just been released)
  Trainer is not stable
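For readers unfamiliar with the "xRT" notation: it is the real-time factor, decoding time divided by the duration of the audio, so "under 1xRT" means the recognizer finishes faster than the speech was spoken. A tiny sketch with made-up numbers (these are not the benchmark figures mentioned above):

```python
def real_time_factor(decode_seconds, audio_seconds):
    """Real-time factor: time spent decoding divided by duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds

# Hypothetical run: 45 s of CPU time to decode 60 s of speech.
rtf = real_time_factor(45.0, 60.0)
print(f"{rtf:.2f}xRT")  # 0.75xRT
```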

Page 20: Brief Overview of Different Versions of Sphinx Arthur Chan.

Summary of the recognizers and trainers

Sphinx I -> obsolete
Sphinx II -> we are using the fast recognizer now
Sphinx III, the following coexist
  S3 flat
  S3 fast (s3.4 stable, s3.5 devel)
SphinxTrain (0.92 in the CVS)
Sphinx IV
  Recognizer is alpha released
  Trainer not yet stable

Page 21: Brief Overview of Different Versions of Sphinx Arthur Chan.

How can I get version X of Sphinx?

Official web page of Sphinx: http://cmusphinx.sourceforge.net
  Gives announcements and news of development
  Some documentation is there
For the tarballs: http://sourceforge.net/projects/cmusphinx
  Releases:
    sphinx2-0.4.tgz (s2)
    sphinx3-0.1.tgz (s3.3)
    sphinx3-0.4-rc2.tgz (s3.4 release candidate II)
    sphinx4-0.1alpha-src.zip (s4)

Page 22: Brief Overview of Different Versions of Sphinx Arthur Chan.

Rule 2: If it doesn’t exist in CVS, officially it doesn’t exist

Simply speaking, no one actually supports or maintains it.
Software that falls into this category:
  CMU LM Toolkit (we haven't touched it for a while)
    We may do it in the future
  Phoenix (distributed somewhere else)
  Training scripts in csh
    Rita always actively supports them

Page 23: Brief Overview of Different Versions of Sphinx Arthur Chan.

Rule 1: If there are no tarballs, they are in CVS

ANYONE can get the following modules through CVS by using the following command:
  cvs -z3 -d:pserver:[email protected]:/cvsroot/cmusphinx co modulename
modulename =
  SphinxTrain -> SphinxTrain
  archive_s3 -> s3 + s3.0 + s3.2 + s3.3
  sphinx2 -> devel ver. of sphinx2
  sphinx3 =~ s3.4 -> we will build on this to develop s3.5
  share =~ cepview + lm3g2dmp
  sphinx3j = the Java version of sphinx3
  Sphinx4 = development version of sphinx4
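If several modules are needed, the checkout command can be assembled programmatically. The sketch below only builds the argument list and does not run anything. The function name and the USER@HOST placeholder are illustrative assumptions: the login and host part of the CVS root is not legible in this transcript, so it is left for the caller to supply.

```python
import shlex

# Module names listed on the slide.
MODULES = ["SphinxTrain", "archive_s3", "sphinx2", "sphinx3", "share", "sphinx3j", "Sphinx4"]

def checkout_command(cvsroot, module):
    """Return the argument list for a CVS checkout of `module`.

    `cvsroot` is the full :pserver: root string; it is passed in by the caller
    because the login and host part is not legible in this transcript.
    """
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    return ["cvs", "-z3", f"-d{cvsroot}", "co", module]

# Placeholder root: replace USER@HOST with the real anonymous login and server.
example_root = ":pserver:USER@HOST:/cvsroot/cmusphinx"
command = checkout_command(example_root, "sphinx3")
print(shlex.join(command))
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting issues.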

Page 24: Brief Overview of Different Versions of Sphinx Arthur Chan.

Rule 3: You may need other modules to complete your task

SphinxTrain relies heavily on forced alignment, so you also need s3-align

Using any s3 recognizer requires the LM in DMP format, so you need the tool lm3g2dmp, which can be found in sphinx2 or share.

Page 25: Brief Overview of Different Versions of Sphinx Arthur Chan.

Where can I get more information for the recognizer?

People to ask
  s2: Evandro, Ravi
  S3 flat: Evandro, Ravi, ArthurC
  S3 tree: Evandro, Ravi, ArthurC
  SphinxTrain: Rita, Evandro, Ravi, ArthurC, Rong, Ziad, Murali
  S4: S4's developers on SourceForge
    Willie, Paul, Phillip, Bhiksha, Rita, Evandro

Page 26: Brief Overview of Different Versions of Sphinx Arthur Chan.

Web page to look up

Rita's web page: www.cs.cmu.edu/~rsingh
  Contains the training manual
Twiki web page for Sphinx 4 design: www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/WebHome/
ArthurC's web page
  Risked his life to write a manual for Sphinx 3.4
  Also collects some information for each Sphinx

Page 27: Brief Overview of Different Versions of Sphinx Arthur Chan.

Outlook of all recognizers

Sphinx II
  Sorry, we won't support it too much
  Reason: s3.4 and s4 have been shown to have very nice speed and accuracy
Sphinx III
  Only active branch is s3.5
  Moderate change in s3flat
  Motivated by project CALO
  This quarter: make adaptation work
SphinxTrain
  Write a set of scripts for continuous HMM training
  Silence deletion problem will be fixed

Page 28: Brief Overview of Different Versions of Sphinx Arthur Chan.

(cont.)

sphinxDoc
  Chapters 1 and 2 completed (*sigh*, still 7 left)
  Only being written when Arthur C is procrastinating and doesn't want to read or play video games
  Will be there around Sep or Oct
Sphinx IV
  Alpha release
  Trainer will be fixed
Argus
  Incorporates the advantages of many speech recognizers together
  Not yet started

Page 29: Brief Overview of Different Versions of Sphinx Arthur Chan.

Conclusion

This presentation
  Summarizes the current code status of Sphinx and SphinxTrain
  We still have a lot of work to do ...
Next presentation
  s3 or s3.4, from main to the search