... NOT JUST ANOTHER PUBLIC DOMAIN SOFTWARE PROJECT ... UAB – CIS
Joseph Picone
Institute for Signal and Information Processing
Department of Electrical and Computer Engineering
Mississippi State University
• Contact Information: Box 9571, Mississippi State University, Mississippi State, Mississippi 39762; Tel: 662-325-3149; Fax: 662-325-2298; Email: [email protected]
• URL: www.isip.msstate.edu/publications/seminars/external/2003/uab
• Acknowledgement: Supported by several NSF grants (e.g., EIA-9809300).
... NOT JUST ANOTHER PUBLIC DOMAIN SOFTWARE PROJECT ...
• Origins date to work at Texas Instruments in 1985.
• The Institute for Signal and Information Processing (ISIP) was created in 1994 at Mississippi State University with a simple vision to develop public domain software.
• Key differentiating characteristics of this project are:
Public Domain: unrestricted software (including commercial use); no copyrights, licenses, or research-only restrictions.
Increase Participation: competitive technology plus application-specific toolkits reduce start-up costs.
Lasting Infrastructure: Support, training, education, dissemination of information are priorities.
APPROACH: FLEXIBLE YET EFFICIENT
Research:
Rapid Prototyping
“Fair” Evaluations
Ease of Use
Lightweight Programming
Efficiency:
Memory
Hyper-real time training
Parallel processing
Data intensive
Research:
• Matlab
• Octave
• Python
ASR:
• HTK
• SPHINX
• CSLU
ISIP:
• IFCs
• Java Apps
• Toolkits
APPROACH: PLATFORMS AND COMPILERS
Supported platforms:
• Linux (Red Hat 6.1 or greater)
• Sun x86 Solaris 7 or greater
• Windows (Cygwin tools)
• (Recently phased out Sun SPARC)
Languages and Compilers:
• Remember Lisp? Java? Tk/Tcl?
• Avoid a reliance on Perl!
• C++ was the obvious choice as a tradeoff between stability, standardization, and efficiency.
DOCUMENTATION AND WORKSHOPS
• Extensive online software documentation, tutorials, and training materials
• Self-documenting software
• Over 100 students and professionals representing 25 countries and 75 institutions have attended our workshops
• Over a dozen companies have trained in our lab
APPROACH
• Metadata extraction from conversational speech
• Automatic gisting and intelligence gathering
• Speech to text is the core technology challenge
• Machines vs. humans
• Real-time audio indexing
• Time-varying channel
• Dynamic language model
• Multilingual and cross-lingual
APPLICATIONS: REAL-TIME INFORMATION EXTRACTION
• In-vehicle dialog systems improve information access.
• Advanced user interfaces enhance workforce training and increase manufacturing efficiency.
• Noise robustness in both environments to improve recognition performance
• Advanced statistical models and machine learning technology
APPLICATIONS: DIALOG SYSTEMS FOR THE CAR
APPLICATIONS: SPEAKER RECOGNITION
• Voice verification for calling card security
• First widespread deployment of recognition technology in the telephone network
• Extension of same statistical modeling technology used in speech recognition
APPLICATIONS: SPEAKER STRESS AND FATIGUE
• Recognition of emotion, stress, fatigue, and other voice qualities is possible from enhanced descriptions of the speech signal
• Fundamentally the same statistical modeling problem as other speech applications
• Fatigue analysis from voice is under development through an SBIR grant
• RVMs yield a large reduction in the parameter count while attaining superior performance
• Computational cost for RVMs lies mainly in training, but it is still prohibitive for larger training sets
EXPERIMENTAL RESULTS: SVM/RVM ALPHADIGIT COMPARISON

Approach   Error Rate   Avg. # Parameters   Training Time   Testing Time
SVM        16.4%        257                 0.5 hours       30 mins
RVM        16.2%        12                  30 days         1 min
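The sparsity behind the RVM's small parameter count can be sketched with Tipping-style sparse Bayesian regression. The toy NumPy example below (synthetic sinc data, a fixed noise precision, and illustrative kernel width and pruning threshold; none of it from the ISIP toolkit) shows most basis-function weights being driven to zero by the evidence re-estimation loop.

```python
import numpy as np

# Toy sketch of sparse Bayesian (RVM-style) regression after Tipping (2001).
# Data, kernel width, and thresholds are illustrative assumptions; this is
# not the ISIP implementation.
rng = np.random.default_rng(0)

N = 100
X = rng.uniform(-10.0, 10.0, N)
t = np.sinc(X / np.pi) + 0.05 * rng.standard_normal(N)  # sin(x)/x plus noise

def design(X, centers, width=2.0):
    """RBF design matrix with a bias column: one basis per training point."""
    Phi = np.exp(-((X[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))
    return np.hstack([np.ones((len(X), 1)), Phi])

Phi = design(X, X)
M = Phi.shape[1]

alpha = np.ones(M)        # per-weight prior precisions (these drive sparsity)
beta = 1.0 / 0.05 ** 2    # noise precision, held fixed here for simplicity
keep = np.arange(M)       # indices of surviving basis functions

for _ in range(500):
    P = Phi[:, keep]
    Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
    mu = beta * Sigma @ P.T @ t
    gamma = 1.0 - alpha[keep] * np.diag(Sigma)   # "well-determinedness" of each weight
    alpha[keep] = gamma / (mu ** 2 + 1e-12)      # evidence re-estimation
    keep = keep[alpha[keep] < 1e6]               # prune weights driven to zero

print(f"relevance vectors kept: {len(keep)} of {M} basis functions")
```

The weights whose precisions diverge are the irrelevant ones; the handful of basis functions that survive play the same role as the small parameter counts the RVM rows report above.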
EXPERIMENTAL RESULTS: PRACTICAL RISK MINIMIZATION?
• Reduction of complexity at the same level of performance is interesting:
• Results hold across tasks
• RVMs have been trained on 100,000 vectors
• Results suggest integrated training is critical
• Risk minimization provides a family of solutions:
• Is there a better solution than minimum risk?
• What is the impact on complexity and robustness?
• Applications to other problems?
• Speech/Non-speech classification?
• Speaker adaptation?
• Language modeling?
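One concrete way to see the complexity question raised above: for a kernel SVM, the effective parameter count is the number of support vectors, and risk minimization alone does not keep that number small. A scikit-learn sketch on synthetic two-class data (an illustrative toy problem, not the Alphadigits task) makes the count easy to inspect:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class problem; sizes and SVM settings are illustrative.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# An SVM's model size is its support-vector count.
n_sv = clf.support_vectors_.shape[0]
print(f"support vectors: {n_sv} of {len(X)} training examples")
```

Comparing this count against an RVM's relevance-vector count on the same data is exactly the complexity-versus-performance tradeoff the bullets above ask about.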
EXPERIMENTAL RESULTS: PRELIMINARY RESULTS
Approach            Error Rate   Avg. # Parameters   Training Time   Testing Time
SVM                 15.5%        994                 3 hours         1.5 hours
RVM (Constructive)  14.8%        72                  5 days          5 mins
RVM (Reduction)     14.8%        74                  6 days          5 mins
• Data increased to 10,000 training vectors
• The Reduction method has been trained on up to 100k vectors (on a toy task); this is not possible for the Constructive method
SUMMARY: RELEVANT SOFTWARE RESOURCES
• Pattern Recognition Applet: compare popular algorithms on standard or custom data sets
• Speech Processing Toolkits: speech recognition, speaker recognition and verification, statistical modeling, machine learning, state of the art toolkits
• Fun Stuff: have you seen our commercial on the Home Shopping Channel?
• Foundation Classes: generic C++ implementations of many popular statistical modeling approaches
SUMMARY: BRIEF BIBLIOGRAPHY
Applications to Speech Recognition:
1. J. Hamaker and J. Picone, “Advances in Speech Recognition Using Sparse Bayesian Methods,” submitted to the IEEE Transactions on Speech and Audio Processing, January 2003.
2. A. Ganapathiraju, J. Hamaker and J. Picone, “Applications of Risk Minimization to Speech Recognition,” submitted to the IEEE Transactions on Signal Processing, July 2003.
3. J. Hamaker, J. Picone, and A. Ganapathiraju, “A Sparse Modeling Approach to Speech Recognition Based on Relevance Vector Machines,” Proceedings of the International Conference of Spoken Language Processing, vol. 2, pp. 1001-1004, Denver, Colorado, USA, September 2002.
4. J. Hamaker, Sparse Bayesian Methods for Continuous Speech Recognition, Ph.D. Dissertation, Department of Electrical and Computer Engineering, Mississippi State University, December 2003.
5. A. Ganapathiraju, Support Vector Machines for Speech Recognition, Ph.D. Dissertation, Department of Electrical and Computer Engineering, Mississippi State University, January 2002.
Influential work:
6. M. Tipping, “Sparse Bayesian Learning and the Relevance Vector Machine,” Journal of Machine Learning Research, vol. 1, pp. 211-244, June 2001.
7. D. J. C. MacKay, “Probable networks and plausible predictions --- a review of practical Bayesian methods for supervised neural networks,” Network: Computation in Neural Systems, 6, pp. 469-505, 1995.
8. D. J. C. MacKay, Bayesian Methods for Adaptive Models, Ph. D. thesis, California Institute of Technology, Pasadena, California, USA, 1991.
9. E. T. Jaynes, “Bayesian Methods: General Background,” Maximum Entropy and Bayesian Methods in Applied Statistics, J. H. Justice, ed., pp. 1-25, Cambridge Univ. Press, Cambridge, UK, 1986.
10. V.N. Vapnik, Statistical Learning Theory, John Wiley, New York, NY, USA, 1998.
11. V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, NY, USA, 1995.
12. C.J.C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.