Speech Recognition Kimberlee A. Kemble Program Manager, Voice Systems Middleware Education IBM Corporation Presenter: Sajana.A
Transcript
Page 1: Speech recognition An overview

Speech Recognition Kimberlee A. Kemble

Program Manager, Voice Systems Middleware Education

IBM Corporation

Presenter: Sajana.A S2-ELT

Page 2: Speech recognition An overview

Agenda

• What is speech recognition?
• A closer look
• Terms & concepts
• Components
• How it works
• Pros & cons
• Applications

Page 3: Speech recognition An overview

What is speech recognition?

Speech recognition (SR) is the ability to translate dictation or spoken words into text.

Also known as “automatic speech recognition” (ASR), “computer speech recognition”, or “speech to text” (STT)

Page 4: Speech recognition An overview

A closer look!

• Speech recognition engine

1. Command and control application: the application can interpret the result of the recognition as a command.

2. Dictation application: the application handles the recognized text simply as text.
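The difference between the two application types can be sketched in a few lines of Python. This is only an illustration: the command phrases, the handler functions and the `COMMANDS` table are invented for the example and do not come from any particular speech engine.

```python
from typing import Callable, Dict, List

# Hypothetical actions; a real application would do actual work here.
def open_file() -> str:
    return "opening file"

def close_window() -> str:
    return "closing window"

# Hypothetical command table mapping recognized phrases to actions.
COMMANDS: Dict[str, Callable[[], str]] = {
    "open file": open_file,
    "close window": close_window,
}

def handle_command(recognized_text: str) -> str:
    """Command and control: interpret the recognition result as a command."""
    action = COMMANDS.get(recognized_text.strip().lower())
    if action is None:
        return "unknown command"
    return action()

def handle_dictation(recognized_text: str, document: List[str]) -> None:
    """Dictation: handle the recognized text simply as text."""
    document.append(recognized_text)

document: List[str] = []
handle_dictation("open file", document)   # stored as text, nothing is opened
result = handle_command("Open File")      # interpreted as a command
```

The same recognized text ("open file") is executed in one case and merely stored in the other; the engine is identical, only the application differs.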

Page 5: Speech recognition An overview

Terms & Concepts

• Utterances

1. An utterance is any stream of speech between two periods of silence.

2. Silence delineates the start and end of an utterance.

3. An utterance can be a single word, or it can contain multiple words (a phrase or a sentence).
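As a rough sketch of how silence can delineate utterances, the Python below splits a sequence of per-frame energy values into utterances. The energy threshold, the minimum silence length and the function name `find_utterances` are assumptions made for illustration; real engines use more elaborate endpoint detection.

```python
from typing import List, Tuple

def find_utterances(
    energies: List[float],
    silence_threshold: float = 0.1,   # assumed: frames at or below this are silence
    min_silence_frames: int = 2,      # assumed: this many silent frames end an utterance
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of each utterance, end exclusive."""
    utterances: List[Tuple[int, int]] = []
    start = None
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy > silence_threshold:
            if start is None:
                start = i              # speech begins after silence
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # The utterance ended where the silent run began.
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended while still inside an utterance.
        utterances.append((start, len(energies) - silent_run))
    return utterances

frames = [0.0, 0.0, 0.5, 0.6, 0.0, 0.7, 0.0, 0.0, 0.0, 0.9, 0.8, 0.0]
segments = find_utterances(frames)
```

Note that the single quiet frame at index 4 does not split the first utterance, because a pause shorter than `min_silence_frames` is treated as part of the speech.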

Page 6: Speech recognition An overview

Continued..

• Pronunciations
– Represent what the speech engine thinks a word should sound like.

• Grammars
– Use a particular syntax, or set of rules, to define the words and phrases that can be recognized by the engine.
– Define the domain, or context, within which the recognition engine works.
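To make the idea of a grammar concrete, here is a minimal Python sketch in which each rule is a sequence of slots and each slot lists alternative words. The rule format and the example vocabulary are hypothetical; actual engines use their own grammar syntax.

```python
from itertools import product
from typing import List, Set

# Hypothetical grammar: a list of rules, each rule a list of slots,
# each slot a list of alternative words.
GRAMMAR: List[List[List[str]]] = [
    [["call"], ["home", "office"]],
    [["open", "close"], ["file", "window"]],
]

def allowed_phrases(grammar: List[List[List[str]]]) -> Set[str]:
    """Expand every rule into the full set of phrases it permits."""
    phrases: Set[str] = set()
    for rule in grammar:
        for words in product(*rule):
            phrases.add(" ".join(words))
    return phrases

def in_grammar(phrase: str, grammar: List[List[List[str]]] = GRAMMAR) -> bool:
    """True if the phrase is one the grammar allows the engine to recognize."""
    return phrase.strip().lower() in allowed_phrases(grammar)

phrases = allowed_phrases(GRAMMAR)
```

Here the grammar defines a small domain of six phrases: "call home" is inside it, while "call window" is not, even though both words appear in the vocabulary.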

Page 7: Speech recognition An overview

Continued..

• Speaker-dependent systems
– Require “training” to “teach” the system an individual speaker’s voice
– More robust
– But less convenient
– And obviously less portable

• Speaker-independent systems
– Language coverage is reduced to compensate for the need to be flexible in phoneme identification
– A clever compromise is to learn on the fly

Page 8: Speech recognition An overview

Components
• Audio input
• Grammar
• Speech recognition engine
• Acoustic model
• Recognized text

TheMicrophoneStore.com, KnowBrainer.com

Page 9: Speech recognition An overview

How it works

[Diagram: audio input → speech recognition engine (which uses the grammar and the acoustic model) → recognized text]

Page 10: Speech recognition An overview

Process

Here’s another look at how SRS works...

Source: Automatic Speech Recognition: A Review, Preeti Saini and Parneet Kaur

Page 11: Speech recognition An overview

Acceptance and Rejection

• An accepted utterance is one in which the engine returns recognized text.

• The engine returns a confidence score along with the text to indicate the likelihood that the returned text is correct.

• Not all utterances that are processed by the speech engine are accepted.
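One way an application might use the confidence score is sketched below in Python. The `RecognitionResult` type, the convention that a rejected utterance has no text, and the 0.5 threshold are all assumptions for illustration, not part of any specific engine's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecognitionResult:
    # Hypothetical result type: text is None when the engine rejects the utterance.
    text: Optional[str]
    confidence: float  # assumed to lie between 0.0 and 1.0

def should_use(result: RecognitionResult, threshold: float = 0.5) -> bool:
    """Use the text only if the engine accepted the utterance and the
    confidence score reaches an application-chosen threshold (assumed here)."""
    return result.text is not None and result.confidence >= threshold

confident = RecognitionResult(text="call home", confidence=0.92)
doubtful = RecognitionResult(text="call Rome", confidence=0.31)
rejected = RecognitionResult(text=None, confidence=0.0)
```

An application could, for example, act immediately on `confident`, ask the user to repeat for `doubtful`, and ignore `rejected`.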

Page 12: Speech recognition An overview

What’s hard about that?

• Digitization
– Converting the analogue signal into a digital representation.

• Signal processing
– Separating speech from background noise.

• Phonetics
– Variability in human speech.

• Phonology
– Recognizing individual sound distinctions (similar phonemes).

• Lexicology and syntax
– Disambiguating homophones.
– Features of continuous speech.

• Syntax and pragmatics
– Interpreting features.
– Filtering of performance errors (disfluencies).
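The first item, digitization, can be illustrated with a short Python sketch that samples a continuous signal at regular intervals and quantizes each sample to a signed integer. The 440 Hz test tone, the 8000 Hz sample rate and the 16-bit depth are example values chosen for illustration, not requirements of any system described here.

```python
import math
from typing import Callable, List

def digitize(
    signal: Callable[[float], float],  # analogue signal: time in seconds -> amplitude in [-1, 1]
    duration_s: float,
    sample_rate_hz: int,
    bits: int,
) -> List[int]:
    """Sample the signal at a fixed rate and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1            # e.g. 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples: List[int] = []
    for i in range(n_samples):
        t = i / sample_rate_hz                 # sampling: discrete points in time
        x = max(-1.0, min(1.0, signal(t)))     # clip to the representable range
        samples.append(round(x * max_level))   # quantization: discrete amplitude levels
    return samples

def tone(t: float) -> float:
    """A 440 Hz sine wave standing in for the microphone signal."""
    return math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01, sample_rate_hz=8000, bits=16)
```

Ten milliseconds of audio at 8000 samples per second yields 80 integers, which is the kind of digital representation later stages work with.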

Page 13: Speech recognition An overview

The Uses

• Individuals With Disabilities – Assists those who have visual impairment, hand immobility, dyslexia, etc.

• Medical Transcription – Reduces delays in writing out medical transcriptions.

• Dictation – Converts words to text in emails or other word documents (also helpful for English Language Learners).

• Access Menu Commands – Opens files using voice commands.

Page 14: Speech recognition An overview

Applications of Speech Recognition

• Speech recognition applications include:
– Voice dialling (e.g., "Call home"),
– Call routing (e.g., "I would like to make a collect call"),
– Simple data entry (e.g., entering a credit card number),
– Preparation of structured documents (e.g., a radiology report),
– Speech-to-text processing (e.g., word processors or emails), and
– In aircraft cockpits (usually termed Direct Voice Input).

Page 15: Speech recognition An overview

Applications
• Medical Transcription
• Military
• Telephony and other domains
• Serving the disabled

Further Applications
• Home automation
• Automobile audio systems
• Telematics

TheMicrophoneStore.com, KnowBrainer.com

Page 16: Speech recognition An overview

Pros of Speech Recognition
• Faster than “hand-writing”.
• Allows for better spelling, whether it be in text or documents.
• Helpful for people with a mental or physical disability.
• Hands-free capability.

Page 17: Speech recognition An overview

Cons of Speech Recognition

• No program is 100% perfect

• Factors that affect the accuracy of speech recognition are: slang, homonyms, signal-to-noise ratio, and overlapping speech

• Can be expensive depending on the program
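Of the accuracy factors listed, signal-to-noise ratio is the one that is easy to quantify. It is commonly expressed in decibels as 10·log10(signal power / noise power). The Python sketch below computes it from two sample lists; the sample values are made up for the example, and estimating power as the mean of squared samples is one common choice among several.

```python
import math
from typing import Sequence

def mean_power(samples: Sequence[float]) -> float:
    """Average power, estimated as the mean of the squared samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

# Made-up samples: speech with amplitude 10, background noise with amplitude 1.
speech = [10.0, -10.0, 10.0, -10.0]
noise = [1.0, -1.0, 1.0, -1.0]
ratio = snr_db(speech, noise)  # power ratio of 100, i.e. 20 dB
```

A higher value means the speech stands out more clearly from the background, which generally makes recognition easier.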

Page 18: Speech recognition An overview

Programs

Now let’s take a look at some of the many SRS programs...
• Dragon
• Siri
• Indigo

KnowBrainer.com

Page 19: Speech recognition An overview

Using Dragon Mobile

ftp://public.dhe.ibm.com/software/pervasive/info/products/Introduction_to_Speech_Recognition.pdf

Page 20: Speech recognition An overview

Different Home Appliances Control Scenarios

http://en.wikipedia.org/wiki/VoiceXML

Page 21: Speech recognition An overview

The Future of Assistive Technology in Schools

• Students who need assistance in their writing skills because they have stronger oral skills.

• Students who were absent for a class, have poor memory, or need assistance hearing the lesson.

• Students who need assistance during Guided Reading.

• Students who are English Language Learners.

• Students with visual/hearing impairments and learning disabilities regarding reading/spelling/writing.

Page 22: Speech recognition An overview

Conclusion

• Revolutionize the way people conduct business over the Web and differentiate world-class e-businesses.

• VoiceXML ties speech recognition and telephony together.

• Voice-enabled Web solutions TODAY!

Page 24: Speech recognition An overview

Thank you!