Speech Recognition Created By : Kanjariya Hardik G. Roll No : 17
Transcript
Page 1: Speech Recognition

Speech Recognition

Created By : Kanjariya Hardik G. Roll No : 17

Page 2: Speech Recognition

Introduction

Speech recognition technology has recently reached a higher level of performance and robustness, allowing a user to communicate with a system by talking.

Speech recognition is the process of decoding an acoustic speech signal, captured by a microphone or telephone, into a set of words.

With the help of these words, the whole speech is recognized word by word.

Page 3: Speech Recognition

Types of SR

There are two main types of speaker models: speaker independent and speaker dependent.

Speaker independent models recognize the speech patterns of a large group of people.

Speaker dependent models recognize speech patterns from only one person. Both models use mathematical and statistical formulas to yield the best word match for speech. A third variation of speaker models is now emerging, called speaker adaptive.

Speaker adaptive systems usually begin with a speaker independent model and adjust it more closely to each individual during a brief training period.
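The speaker adaptive idea can be illustrated with a very small sketch: start from a speaker independent average for one feature and pull it toward a new speaker's training samples. The function name `adapt_mean`, the `weight` parameter, and all numbers are assumptions made up for illustration; real engines adapt full statistical models, not a single average.

```python
def adapt_mean(independent_mean, speaker_samples, weight=10.0):
    """Blend a speaker independent mean with one speaker's samples.

    `weight` (hypothetical) says how many samples the independent model
    is "worth": a few training samples move the mean only slightly.
    """
    n = len(speaker_samples)
    if n == 0:
        # No training data yet: keep the speaker independent value.
        return independent_mean
    speaker_mean = sum(speaker_samples) / n
    return (weight * independent_mean + n * speaker_mean) / (weight + n)


if __name__ == "__main__":
    # Made-up feature values, for illustration only.
    general = 200.0
    training = [180.0, 182.0, 178.0, 181.0, 179.0]
    print(adapt_mean(general, training))
```

With more training samples the result moves closer to the individual speaker, which mirrors the brief training period described above.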

Page 4: Speech Recognition

How does it work?

Speech produces a sound pressure wave which forms an acoustic signal.

The microphone receives the acoustic signal and converts it to an analogue signal.

To store the analogue signal, it must be converted to a digital signal.

A speech recognizer tries to transform a digitally encoded acoustic signal of natural-language speech into text in that language.
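The analogue-to-digital step on this slide can be sketched as sampling followed by quantization. The sample rate, bit depth, the `digitize` and `tone` names, and the 440 Hz sine wave standing in for the microphone output are all assumptions for illustration.

```python
import math


def digitize(analog, duration_s, sample_rate_hz=8000, bits=8):
    """Sample `analog(t)` (values in [-1, 1]) and quantize to signed integers."""
    max_level = 2 ** (bits - 1) - 1              # 127 for 8 bits
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz                   # sampling: discrete time steps
        value = max(-1.0, min(1.0, analog(t)))   # clip to the valid range
        samples.append(round(value * max_level)) # quantization: nearest level
    return samples


def tone(t):
    """A 440 Hz sine wave standing in for the microphone's analogue output."""
    return math.sin(2 * math.pi * 440 * t)


if __name__ == "__main__":
    print(digitize(tone, duration_s=0.001))
```

A higher sample rate captures faster changes in the signal, and more bits give finer amplitude steps; both increase the amount of data to store.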

Page 5: Speech Recognition

Speech Waveform/Spectrogram

The spectrogram is an alternative way to characterize speech.

In the waveform, the louder the sound, the greater the amplitude on the y-axis.

[Figure: waveform and spectrogram of the words "speech lab"; axis labels: Hz (frequency), s (time)]
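To make the spectrogram idea concrete, the sketch below cuts a digital signal into short frames and computes the strength of each frequency in every frame. It uses a plain discrete Fourier transform from the standard library for clarity; the frame size, hop size, and test tone are assumptions, and practical systems would use a fast Fourier transform with a window function.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each frequency bin (0 .. N/2) of one frame, via a plain DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags


def spectrogram(samples, frame_size=64, hop=32):
    """One list of bin magnitudes per (overlapping) frame of the signal."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frames.append(dft_magnitudes(samples[start:start + frame_size]))
    return frames


if __name__ == "__main__":
    rate = 8000  # assumed sample rate in Hz
    samples = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(256)]
    spec = spectrogram(samples)
    peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print("strongest frequency in first frame (Hz):", peak_bin * rate / 64)
```

Each inner list corresponds to one vertical slice of a spectrogram image: the position in the list is the frequency and the value is how strong that frequency is at that moment.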

Page 6: Speech Recognition

Speech Recognition Process Flow

Page 7: Speech Recognition

The major components

Audio input

Grammar

Acoustic Model

Recognized text

Page 8: Speech Recognition

Audio I/O

It is important to understand that this audio stream is rarely pristine.

It contains not only the speech data (what was said) but also background noise.

This noise can interfere with the recognition process, and the speech engine must handle (and possibly even adapt to) the environment in which the speech is captured.
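One simple way to picture how an engine might separate speech from background noise is to compare the energy of each short frame against an estimated noise floor. This is only a sketch under stated assumptions: it assumes the first frames contain noise only, and the names `frame_energies` and `detect_speech`, the threshold `factor`, and the sample values are invented for illustration. The slides do not say which method a real engine uses.

```python
def frame_energies(samples, frame_size):
    """Mean squared amplitude of each consecutive, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(x * x for x in frame) / frame_size)
    return energies


def detect_speech(samples, frame_size=4, noise_frames=2, factor=4.0):
    """Mark frames whose energy clearly exceeds the estimated noise floor.

    Assumes (hypothetically) that the first `noise_frames` frames hold
    background noise only and uses their average energy as the floor.
    """
    energies = frame_energies(samples, frame_size)
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    return [e > factor * noise_floor for e in energies]


if __name__ == "__main__":
    quiet = [1, -1, 1, -1]        # made-up low-level background noise
    loud = [20, -18, 22, -19]     # made-up speech-like burst
    print(detect_speech(quiet + quiet + loud + quiet))
```

Estimating the floor from the signal itself is a crude form of adapting to the environment: a noisier room raises the floor and therefore the threshold.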

Page 9: Speech Recognition

Acoustic + Grammar

Once the speech data is in the proper format, the engine searches for the best match.

It does this by taking into consideration the words and phrases it knows about (the active grammars), along with its knowledge of the environment in which it is operating.

The knowledge of the environment is provided in the form of an acoustic model.

Once it identifies the most likely match for what was said, it returns what it recognized as a text string.
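The search for the best match can be pictured as follows: the acoustic model scores candidate phrases, and the active grammar decides which candidates are allowed at all. In this sketch the scores are hard-coded stand-ins for what an acoustic model would produce, and the function name `best_match` and the phrases are assumptions for illustration.

```python
def best_match(acoustic_scores, active_grammar):
    """Return the highest-scoring candidate that the active grammar allows.

    acoustic_scores: dict mapping a candidate phrase to a score in [0, 1]
                     (a stand-in for the output of an acoustic model).
    active_grammar:  set of phrases the application currently accepts.
    """
    allowed = {phrase: score
               for phrase, score in acoustic_scores.items()
               if phrase in active_grammar}
    if not allowed:
        return None  # nothing the grammar accepts was heard
    return max(allowed, key=allowed.get)


if __name__ == "__main__":
    scores = {
        "wreck a nice beach": 0.48,
        "recognize speech": 0.45,
        "open file": 0.05,
    }
    grammar = {"recognize speech", "open file", "close file"}
    print(best_match(scores, grammar))
```

Here the phrase with the highest raw score is rejected because the grammar does not contain it, so the engine returns the best phrase it actually knows about as a text string.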

Page 10: Speech Recognition

About SR Engine

SR requires a software application "engine" with logic built in to decipher and act on the spoken word.

Sound Card – Converts the analog signal from the microphone to a digital signal.

Function of SR Engine – The SR engine converts this digital signal to phonemes, and the phonemes to words.

Page 11: Speech Recognition

Different SR engines

CMU Sphinx

Microsoft SAPI

IBM ViaVoice

Page 12: Speech Recognition

Decoding process.

Page 13: Speech Recognition

Recognition Process Flow Summary

Step 1: User Input. The system captures the user's voice in the form of an analog acoustic signal.

Step 2: Digitization. The analog acoustic signal is digitized.

Step 3: Phonetic Breakdown. The signal is broken into phonemes.

Page 14: Speech Recognition

Recognition Process Flow Summary

Step 4: Statistical Modeling. Phonemes are mapped to their phonetic representation using a statistical model.

Step 5: Matching. According to the grammar, the phonetic representation and the dictionary, the system returns an n-best list (i.e., a list of candidate words, each with a confidence score).

Grammar: the set of words or phrases that constrains the range of input or output in the voice application.

Dictionary: the mapping table between phonetic representations and words (e.g., thu, thee → the).
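The dictionary lookup and the n-best list from Step 5 can be sketched as below. The dictionary entries, the phoneme symbols, and the way the confidence is computed (one minus a normalized edit distance) are all assumptions for illustration; a real engine would derive confidence scores from its statistical model.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


# Hypothetical dictionary: phonetic representation -> word.
DICTIONARY = {
    ("th", "u"): "the",
    ("th", "ee"): "the",
    ("th", "e", "n"): "then",
    ("t", "ee"): "tea",
}


def n_best(phonemes, n=3):
    """Rank dictionary words by similarity to the observed phonemes."""
    best = {}
    for pron, word in DICTIONARY.items():
        dist = edit_distance(phonemes, pron)
        confidence = 1.0 - dist / max(len(phonemes), len(pron))
        # A word with several pronunciations keeps its best score.
        best[word] = max(confidence, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    for word, confidence in n_best(("th", "ee")):
        print(word, round(confidence, 2))
```

Both "thu" and "thee" map to the same word, so the list contains each word once, paired with the score of its closest pronunciation.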

Page 15: Speech Recognition

REPRESENTATION OF SOFTWARE


Page 16: Speech Recognition

Challenges and Difficulties of SR

Speech recognition is still a very difficult problem. The main problems are the following.

Speaker Variability: Two speakers, or even the same speaker, will pronounce the same word differently.

Channel Variability: The quality and position of the microphone and the background environment will affect the output.

Page 17: Speech Recognition

Current Software Options for PC

Dragon Systems – Naturally Speaking

Philips – FreeSpeech

IBM – ViaVoice

Lernout & Hauspie – Voice Xpress

Page 18: Speech Recognition
Page 19: Speech Recognition