Speech Recognition By, Chhatbar Jay(14mecc03) Lokender Sekhawat(14mecc08)

Speech Recognition

Dec 23, 2015

Page 1: Speech Recognition

Speech Recognition

By,Chhatbar Jay(14mecc03)

Lokender Sekhawat(14mecc08)

Page 2: Speech Recognition

What is Speech Recognition?

Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.

Also known as automatic speech recognition (ASR) or computer speech recognition, it lets a computer understand a voice and carry out the requested task.

Speech Recognition (SR) is the ability to translate dictation or spoken words into text.

Page 3: Speech Recognition

Where can it be used?

Dictation

System control/navigation

Commercial/Industrial applications

Voice dialing

Page 4: Speech Recognition

Block diagram of speech recognition

Page 5: Speech Recognition

Speech Modeling

• Acoustic Model
An acoustic model is created by taking audio recordings of speech and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. A speech recognition engine uses it to recognize speech.

• Language Model
Language modeling is used in many natural language processing applications, such as speech recognition. It tries to capture the properties of a language and to predict the next word in a speech sequence.
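The idea of predicting the next word can be illustrated with a very small bigram model. This is only a sketch: the corpus and the function names `train_bigram_model` and `predict_next_word` are invented for illustration, and real language models are trained on far larger text collections.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this example only.
corpus = [
    "switch on the tv",
    "switch on the lamp",
    "switch on the tv",
    "switch to channel nine",
]
model = train_bigram_model(corpus)
print(predict_next_word(model, "the"))
```

A recognizer could use such counts to prefer word sequences that are likely in the language when the acoustic evidence alone is ambiguous.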

Page 6: Speech Recognition

TYPES OF VOICE RECOGNITION

There are two types of speech recognition: speaker-dependent and speaker-independent. Speaker-dependent software is commonly used for dictation, while speaker-independent software is more commonly found in telephone applications.

Speaker-dependent software works by learning the unique characteristics of a single person’s voice, in a way similar to voice recognition. New users must first “train” the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

Page 7: Speech Recognition

TYPES OF VOICE RECOGNITION (CONTD.)

Speaker-independent software is designed to recognize anyone’s voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses can’t ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.

Speech recognition engines that are speaker independent generally deal with this fact by limiting the grammars they use. By using a smaller list of recognized words, the speech engine is more likely to correctly recognize what a speaker said.
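The effect of a smaller grammar can be sketched as follows. This is a rough illustration, not how a real engine scores audio: string similarity from Python's `difflib` stands in for acoustic similarity, and the word lists and the `recognize` helper are assumptions made up for the example.

```python
import difflib


def recognize(heard, grammar, cutoff=0.6):
    """Return the grammar words that resemble `heard`, best match first."""
    return difflib.get_close_matches(heard, grammar, n=5, cutoff=cutoff)


# Hypothetical word lists, invented for this example.
small_grammar = ["on", "off", "channel", "lamp", "door"]
large_vocabulary = small_grammar + ["chapel", "panel"]

heard = "chanel"  # an imperfect hypothesis for the spoken word "channel"

print(recognize(heard, small_grammar))
print(recognize(heard, large_vocabulary))
```

With the small grammar only one candidate survives the cutoff, while the larger vocabulary leaves several similar-sounding candidates competing for the same input.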

Page 8: Speech Recognition

How do humans do it?

• Articulation produces
• sound waves which
• the ear conveys to the brain
• for processing

Page 9: Speech Recognition

How might computers do it?

• Digitization
• Acoustic analysis of the speech signal
• Language interpretation

Diagram labels: Acoustic waveform, Acoustic signal, Speech recognition

Page 10: Speech Recognition
Page 11: Speech Recognition

DIFFERENT PROCESSES INVOLVED

• Digitization
  – Converting analogue signal into digital representation

• Signal processing
  – Separating speech from background noise

• Phonetics
  – Variability in human speech

• Phonology
  – Recognizing individual sound distinctions (similar phonemes)
  – Is the systematic use of sound to encode meaning in any spoken human language

Page 12: Speech Recognition

DIFFERENT PROCESSES INVOLVED (CONTD.)

• Lexicology and syntax
  • Lexicology is that part of linguistics which studies words, their nature and meaning, words' elements, relations between words, word groups and the whole lexicon.

• Semantics and pragmatics
  • Semantics tells about the meaning
  • Pragmatics is concerned with bridging the explanatory gap between sentence meaning and speaker's meaning

Page 13: Speech Recognition

Digitization

• Analogue to digital conversion
• Sampling and quantizing
  • Sampling is converting a continuous signal into a discrete signal
  • Quantizing is the process of approximating a continuous range of values with a finite set of discrete values

• Use filters to measure energy levels for various points on the frequency spectrum

• Knowing the relative importance of different frequency bands (for speech) makes this process more efficient

• E.g. high frequency sounds are less informative, so can be sampled using a broader bandwidth (log scale)
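Sampling and quantizing can be sketched in a few lines. The numbers here (a 440 Hz tone, an 8000 Hz sample rate, 8 bits) and all function names are assumptions chosen for illustration; the slides do not specify any of them.

```python
import math


def sample(signal, duration_s, sample_rate_hz):
    """Sampling: evaluate a continuous-time signal at evenly spaced instants."""
    count = round(duration_s * sample_rate_hz)
    return [signal(n / sample_rate_hz) for n in range(count)]


def quantize(samples, bits):
    """Quantizing: map each value in [-1.0, 1.0] to the nearest of 2**bits levels."""
    step = 2.0 / (2 ** bits - 1)
    codes = []
    for value in samples:
        clipped = max(-1.0, min(1.0, value))
        codes.append(round((clipped + 1.0) / step))
    return codes


def dequantize(codes, bits):
    """Turn integer codes back into approximate amplitudes."""
    step = 2.0 / (2 ** bits - 1)
    return [code * step - 1.0 for code in codes]


def tone(t):
    """Stand-in for an analogue input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


samples = sample(tone, duration_s=0.01, sample_rate_hz=8000)
codes = quantize(samples, bits=8)
restored = dequantize(codes, bits=8)
max_error = max(abs(a - b) for a, b in zip(samples, restored))
print(len(samples), min(codes), max(codes), max_error)
```

The quantization error never exceeds half of one step, which is why adding bits makes the digital representation closer to the original signal.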

Page 14: Speech Recognition

Separating speech from background noise

• Noise cancelling microphones
  • Two mics, one facing speaker, the other facing away
  • Ambient noise is roughly same for both mics

• Knowing which bits of the signal relate to speech
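The two-microphone idea can be sketched by subtracting the away-facing signal from the speaker-facing one. This is a simplified illustration under stated assumptions: the signals are synthetic, the away-facing mic is assumed to pick up no speech, and the 5% mismatch between the two noise recordings is invented.

```python
import math
import random


def cancel_noise(primary, reference):
    """Subtract the away-facing mic's signal from the speaker-facing one."""
    return [p - r for p, r in zip(primary, reference)]


def rms_error(estimate, target):
    """Root-mean-square difference between two equal-length signals."""
    squared = [(e - t) ** 2 for e, t in zip(estimate, target)]
    return math.sqrt(sum(squared) / len(squared))


random.seed(0)
n = 200
speech = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
ambient = [random.uniform(-0.5, 0.5) for _ in range(n)]

# Assumption: the speaker-facing mic hears speech plus ambient noise,
# the away-facing mic hears almost the same ambient noise and no speech.
primary = [s + a for s, a in zip(speech, ambient)]
reference = [0.95 * a for a in ambient]

cleaned = cancel_noise(primary, reference)
error_before = rms_error(primary, speech)
error_after = rms_error(cleaned, speech)
print(error_before, error_after)
```

Because the ambient noise is roughly the same at both mics, most of it cancels and the remaining signal is much closer to the speech alone.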

Page 15: Speech Recognition

Process of speech recognition

Diagram: Speaker Recognition, Speech Recognition, and parsing and arbitration blocks; signals S1, S2, ..., SK, ..., SN

Page 16: Speech Recognition

Diagram: Speaker Recognition, Speech Recognition, and parsing and arbitration blocks; signals S1, S2, ..., SK, ..., SN

Example command: Switch on Channel 9

Page 17: Speech Recognition

Diagram: Speaker Recognition, Speech Recognition, and parsing and arbitration blocks; signals S1, S2, ..., SK, ..., SN

Who is speaking? Annie, David, Cathy

“Authentication”

Page 18: Speech Recognition

Diagram: Speaker Recognition, Speech Recognition, and parsing and arbitration blocks; signals S1, S2, ..., SK, ..., SN

What is he saying? On, Off, TV, Fridge, Door

“Understanding”

Page 19: Speech Recognition

Diagram: Speaker Recognition, Speech Recognition, and parsing and arbitration blocks; signals S1, S2, ..., SK, ..., SN

What is he talking about? Channel->TV, Dim->Lamp, On->TV,Lamp

“Switch”, “to”, “channel”, “nine”

“Inferring and execution”
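The keyword-to-device mappings on this slide (Channel->TV, Dim->Lamp, On->TV,Lamp) suggest how recognized words might be resolved to a target device. The sketch below is one possible reading, not the authors' method: intersecting the candidate sets is an assumption about what "arbitration" does, and the names `KEYWORD_TO_DEVICES` and `infer_devices` are invented.

```python
# Mappings taken from the slide; the data structure itself is an assumption.
KEYWORD_TO_DEVICES = {
    "channel": {"TV"},
    "dim": {"Lamp"},
    "on": {"TV", "Lamp"},
}


def infer_devices(words):
    """Intersect the candidate devices suggested by each known keyword."""
    candidates = None
    for word in words:
        devices = KEYWORD_TO_DEVICES.get(word.lower())
        if devices is None:
            continue  # word carries no device hint, e.g. "switch" or "nine"
        candidates = set(devices) if candidates is None else candidates & devices
    return candidates or set()


print(infer_devices(["Switch", "to", "channel", "nine"]))
print(infer_devices(["Switch", "on", "Channel", "9"]))
print(infer_devices(["Switch", "on"]))
```

"On" alone is ambiguous between the TV and the lamp, but combined with "channel" only the TV remains, so the command can be executed on the right device.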

Page 20: Speech Recognition

Framework of Voice Recognition

Diagram: Face Recognition, Gesture Recognition, and parsing and arbitration blocks; signals S1, S2, ..., SK, ..., SN

“Authentication”, “Understanding”, “Inferring and execution”

Page 21: Speech Recognition

Speaker Recognition

• Definition
  • It is the method of recognizing a person based on his or her voice
  • It is one of the forms of biometric identification

• Depends on speaker-specific characteristics.

Page 22: Speech Recognition

Generic Speaker Recognition System

Diagram, first path: Preprocessing -> Feature Extraction -> Pattern Matching
Diagram, second path: Preprocessing -> Feature Extraction -> Speaker Model
Signal labels: Speech signal, Analysis Frames, Feature Vector, Score
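The blocks named in this diagram (preprocessing into analysis frames, feature extraction into a feature vector, a stored speaker model, and pattern matching that yields a score) can be sketched end to end. Everything concrete here is an assumption for illustration: the two features (average energy and zero-crossing rate), the distance-based score, the synthetic sine-wave "voices", and all function names. Practical systems use much richer features and statistical models.

```python
import math


def split_into_frames(signal, frame_length):
    """Preprocessing: cut the signal into non-overlapping analysis frames."""
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length] for i in range(0, last_start + 1, frame_length)]


def extract_features(frames):
    """Feature extraction: mean energy and mean zero-crossing rate over all frames."""
    energies, crossing_rates = [], []
    for frame in frames:
        energies.append(sum(x * x for x in frame) / len(frame))
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        crossing_rates.append(crossings / (len(frame) - 1))
    return [sum(energies) / len(energies), sum(crossing_rates) / len(crossing_rates)]


def match_score(feature_vector, speaker_model):
    """Pattern matching: negative Euclidean distance, so a higher score is closer."""
    return -math.dist(feature_vector, speaker_model)


def identify(signal, speaker_models, frame_length=80):
    """Return the enrolled speaker whose model scores highest for this signal."""
    features = extract_features(split_into_frames(signal, frame_length))
    return max(speaker_models, key=lambda name: match_score(features, speaker_models[name]))


def synthetic_voice(frequency_hz, amplitude, n=800, sample_rate_hz=8000):
    """Stand-in for recorded speech: a plain sine wave."""
    return [amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
            for i in range(n)]


# Enrollment: build one speaker model (a feature vector) per known speaker.
speaker_models = {
    "Annie": extract_features(split_into_frames(synthetic_voice(220, 0.8), 80)),
    "David": extract_features(split_into_frames(synthetic_voice(110, 0.5), 80)),
}

# Recognition: an unseen signal that resembles Annie's enrollment signal.
test_signal = synthetic_voice(215, 0.75)
print(identify(test_signal, speaker_models))
```

The same preprocessing and feature extraction run in both paths: once to build each speaker model, and again on the incoming signal before it is compared against those models.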

Page 23: Speech Recognition

ADVANTAGES

• Benefits people with disabilities
• Organizations - Increases productivity, reduces costs and errors.
• Lower operational costs
• Advances in technology will allow consumers and businesses to implement speech recognition systems at a relatively low cost.
• Cell-phone users can dial pre-programmed numbers by voice command.
• Users can trade stocks through a voice-activated trading system.
• Speech recognition technology can also replace touch-tone dialing, resulting in the ability to target customers that speak different languages

Page 24: Speech Recognition

DISADVANTAGES

• Difficult to build a perfect system.
• Conversations
  • Involve more than just words (non-verbal communication, stutters, etc.).
  • Every human being has differences such as their voice, mouth, and speaking style.

• Filtering background noise is a task that can be difficult even for humans to accomplish.

Page 25: Speech Recognition

Future of Speech Recognition

• Accuracy will become better and better.

• Dictation speech recognition will gradually become accepted.

• Small hand-held writing tablets for computer speech recognition dictation and data entry will be developed, as faster processors and more memory become available.

• Greater use will be made of "intelligent systems" which will attempt to guess what the speaker intended to say, rather than what was actually said, as people often misspeak and make unintentional mistakes.

• Microphone and sound systems will be designed to adapt more quickly to changing background noise levels, different environments, with better recognition of extraneous material to be discarded.


Page 27: Speech Recognition

THANK YOU