SPEECH RECOGNITION SYSTEMS TWINKLE SAHU CSE 6TH SEM
Transcript
Page 1: Speech recognition system seminar

SPEECH RECOGNITION SYSTEMS

TWINKLE SAHU CSE 6TH SEM

Page 2: Speech recognition system seminar

INTRODUCTION

• Speech recognition is the process by which a computer takes a speech signal (recorded using a microphone) and converts it into words, often in real time. It is achieved by following a series of steps, and the software responsible for it is known as a ‘Speech Recognition System’ (SR system).

• SR systems are usually implemented in the form of dictation software and intelligent assistants in personal computers, smartphones, web browsers and many other devices.

Page 3: Speech recognition system seminar

CHALLENGES IN THE DESIGN OF AN SR SYSTEM

SR systems have to deal with a large number of challenges:

• The speaker’s voice is often accompanied by surrounding noise, which makes accurate recognition difficult.

• A speaker may speak a number of different words and all of these words have to be accurately recognized.

• Accents vary from person to person, so the same word can sound quite different from one speaker to the next. This is a major challenge.

• A speaker may speak something very quickly and all of the words spoken have to be individually recognized accurately.

Page 4: Speech recognition system seminar

TYPES OF SR SYSTEMS

• Speaker Dependent SR systems : Work by learning the unique characteristics of a single person’s voice and depend on the speaker for training.

• Speaker Independent SR systems : Designed to recognize anyone’s voice, so no training by the individual user is required.

Page 5: Speech recognition system seminar

BASIC PRINCIPLES OF SPEECH RECOGNITION

• The smallest unit of spoken language is known as a Phoneme.

• The English language contains approximately 44 phonemes representing all the vowels and consonants that we use for speech.

• We can take the example of a typical word such as moon, which can be broken down into three phonemes: m, ue, n.

Page 6: Speech recognition system seminar

• To interpret speech, we need a way of identifying the components of spoken words. Phonemes act as these identifying markers within speech.

• An algorithm is then needed to interpret the speech further. The Hidden Markov Model is a mathematical model commonly used for this.

• To create a speech recognition engine, a large database of models is created to match each phoneme.

• When a comparison is performed, the most likely match is determined between the spoken phoneme and the stored one, and further computations are performed.
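The matching step described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: each stored phoneme model is reduced to a single Gaussian over one made-up acoustic feature, and the names `PHONEME_MODELS` and `most_likely_phoneme` are hypothetical. A real engine would score whole feature vectors against Hidden Markov Models instead.

```python
import math

# Hypothetical stored models: phoneme -> (mean, standard deviation)
# of a single made-up acoustic feature. The values are illustrative only.
PHONEME_MODELS = {
    "m": (1.0, 0.5),
    "ue": (3.0, 0.7),
    "n": (1.8, 0.4),
}

def log_likelihood(x, mean, std):
    """Log of the Gaussian density N(x; mean, std^2)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def most_likely_phoneme(x, models=PHONEME_MODELS):
    """Return the stored phoneme whose model gives x the highest likelihood."""
    return max(models, key=lambda p: log_likelihood(x, *models[p]))

if __name__ == "__main__":
    for feature in (0.9, 3.1):
        print(feature, "->", most_likely_phoneme(feature))
```

Working with log-likelihoods rather than raw probabilities keeps the comparison numerically stable when many small values are involved.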

Page 7: Speech recognition system seminar

COMPONENTS OF SPEECH RECOGNITION

• Corpus Collection : A database of speech data built from multiple speech samples.

Page 8: Speech recognition system seminar

• Corpus collection construction for a speaker-dependent SR system :-

Page 9: Speech recognition system seminar

• Corpus collection construction for a speaker-independent SR system.

Page 10: Speech recognition system seminar

• Signal Analyzer : Analyses the speech signal and removes the background noise, thus focusing only on the speaker’s speech.

• Acoustic Model : Identifies phonemes from the speech sample using a probability based mathematical model.

[Figure: Acoustic Model]

Page 11: Speech recognition system seminar

• Language Model : Identifies words and thus sentences uttered by the speaker from the phonemes by making use of a dictionary file and grammar file.

[Figures: Dictionary File; Grammar File]

Page 12: Speech recognition system seminar

PROCESS OF SPEECH RECOGNITION

[Diagram: the spoken word "PAIN" enters the Speech Analyzer.]

Page 13: Speech recognition system seminar

[Diagram: Speech Analyzer, shown with the phoneme sequence /p/--/ae/--/n/.]

Page 14: Speech recognition system seminar

[Diagram: Acoustic Model. The phoneme sequence /p/--/ae/--/n/ is compared against a trained Hidden Markov Model and marked correct.]

Page 15: Speech recognition system seminar

[Diagram: Language Model. Using the dictionary file and the grammar file, the phoneme sequence /p/--/ae/--/n/ is mapped to the word "pain", which is produced as the text output.]
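The dictionary-file step on this slide can be pictured as a lookup from a phoneme sequence to a word. The sketch below is an assumption about how such a file might be represented: `DICTIONARY`, `parse_phoneme_string` and `phonemes_to_word` are hypothetical names, and the two entries simply reuse the phoneme spellings given in these slides.

```python
# Hypothetical dictionary file contents: phoneme sequence -> word.
# The entries reuse the phoneme spellings from the slides.
DICTIONARY = {
    ("p", "ae", "n"): "pain",
    ("m", "ue", "n"): "moon",
}

def parse_phoneme_string(text):
    """Split a string such as '/p/--/ae/--/n/' into ['p', 'ae', 'n']."""
    return [part.strip("/") for part in text.split("--")]

def phonemes_to_word(phonemes, dictionary=DICTIONARY):
    """Return the word for a phoneme sequence, or None if it is not listed."""
    return dictionary.get(tuple(phonemes))

if __name__ == "__main__":
    print(phonemes_to_word(parse_phoneme_string("/p/--/ae/--/n/")))
```

An exact-match lookup like this ignores the grammar file entirely. In a fuller system the grammar would help choose between several candidate words.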

Page 16: Speech recognition system seminar

The Grammar File

Page 17: Speech recognition system seminar

HIDDEN MARKOV MODEL

• Markov models are an excellent way of abstracting simple concepts into a form that is relatively easy to compute.

• They are used in applications ranging from data compression to sound recognition.

From this graph we can create sequences such as:

N1 N2 N3

N1 N2 N2 N2 N3 N3 N3 N3 N3

N1 N1 N2 N2 N3

Page 18: Speech recognition system seminar

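The probability of each sequence is the product of the transition probabilities along its path, including the final step out of N3:

N1 N2 N3 = 0.4 x 0.8 x 0.5 = 0.16

N1 N2 N2 N2 N3 N3 N3 N3 N3 = 0.4 x 0.2 x 0.2 x 0.8 x 0.5 x 0.5 x 0.5 x 0.5 x 0.5 = 0.0004

N1 N1 N2 N2 N3 = 0.6 x 0.4 x 0.2 x 0.8 x 0.5 = 0.0192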
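The same calculation can be written as a short Python sketch. The graph itself is not reproduced in this transcript, so the transition table below is an assumption inferred from the factors in the products above (for example 0.4 for N1 to N2, 0.5 for leaving N3). `TRANSITIONS`, `END` and `sequence_probability` are hypothetical names.

```python
# Transition probabilities inferred from the worked products on this slide.
# The original graph is not available here, so treat this table as an assumption.
TRANSITIONS = {
    "N1": {"N1": 0.6, "N2": 0.4},
    "N2": {"N2": 0.2, "N3": 0.8},
    "N3": {"N3": 0.5, "END": 0.5},
}

def sequence_probability(states, transitions=TRANSITIONS, end="END"):
    """Multiply the transition probabilities along a path, including the exit step."""
    path = list(states) + [end]
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        # A transition that is not in the table has probability 0.
        prob *= transitions.get(current, {}).get(nxt, 0.0)
    return prob

if __name__ == "__main__":
    for seq in (["N1", "N2", "N3"], ["N1", "N1", "N2", "N2", "N3"]):
        print(" ".join(seq), "=", round(sequence_probability(seq), 6))
```

For N1 N2 N3 this gives 0.4 x 0.8 x 0.5 = 0.16, matching the first line of the slide.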

Page 19: Speech recognition system seminar

This accommodates different pronunciations of the same word (here, "tomato"), such as:

t ow m aa t ow - British English

t ah m ey t ow - American English

t ah m ey t a - Possible pronunciation when speaking quickly

Page 20: Speech recognition system seminar

With sentences such as:

I like apple juice - Very probable

I like tomato juice - Very improbable!

I hate apple juice - Relatively improbable

I hate tomato juice - Relatively probable
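One way to picture this ranking is a word-level Markov model in which each word's probability depends only on the word before it. The sketch below is hedged accordingly: the numbers in `BIGRAMS` are invented purely to reproduce the ordering on the slide, the probability of the first word is ignored, and `sentence_probability` is a hypothetical helper.

```python
# Made-up word-to-word transition probabilities, chosen only so that the
# four example sentences come out in the order shown on the slide.
BIGRAMS = {
    ("I", "like"): 0.7,
    ("I", "hate"): 0.3,
    ("like", "apple"): 0.9,
    ("like", "tomato"): 0.1,
    ("hate", "apple"): 0.3,
    ("hate", "tomato"): 0.7,
    ("apple", "juice"): 1.0,
    ("tomato", "juice"): 1.0,
}

def sentence_probability(sentence, bigrams=BIGRAMS):
    """Product of the word-to-word transition probabilities in the sentence."""
    words = sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigrams.get((prev, word), 0.0)  # unseen pairs get probability 0
    return prob

if __name__ == "__main__":
    sentences = [
        "I like apple juice",
        "I like tomato juice",
        "I hate apple juice",
        "I hate tomato juice",
    ]
    for s in sorted(sentences, key=sentence_probability, reverse=True):
        print(f"{sentence_probability(s):.2f}  {s}")
```

With these invented values, "I like apple juice" scores highest and "I like tomato juice" lowest, with the two "hate" sentences in between.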

Page 21: Speech recognition system seminar

• The Markov model makes speech recognition systems more accurate: it lets them differentiate between similar-sounding phrases, as in the case:

James's school... James is cool

• In simpler Markov models, the state is directly visible to the observer.

• In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible.
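Recovering the hidden states from the visible outputs is commonly done with the Viterbi algorithm, which finds the most probable state sequence. The sketch below is a toy illustration, not the method of any particular SR system: the states reuse the phonemes /p/, /ae/, /n/ from the earlier slides, while the observation symbols ("burst", "vowel", "nasal") and all probabilities are invented.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state path, probability of that path)."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Hypothetical left-to-right model; every number here is illustrative.
STATES = ("p", "ae", "n")
START_P = {"p": 1.0, "ae": 0.0, "n": 0.0}
TRANS_P = {
    "p": {"p": 0.5, "ae": 0.5, "n": 0.0},
    "ae": {"p": 0.0, "ae": 0.5, "n": 0.5},
    "n": {"p": 0.0, "ae": 0.0, "n": 1.0},
}
EMIT_P = {
    "p": {"burst": 0.8, "vowel": 0.1, "nasal": 0.1},
    "ae": {"burst": 0.1, "vowel": 0.8, "nasal": 0.1},
    "n": {"burst": 0.1, "vowel": 0.1, "nasal": 0.8},
}

if __name__ == "__main__":
    observed = ["burst", "vowel", "vowel", "nasal"]
    print(viterbi(observed, STATES, START_P, TRANS_P, EMIT_P))
```

The observer only sees the output symbols. The phoneme labels are inferred, which is exactly the distinction the two bullets above draw between visible and hidden states.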

Page 22: Speech recognition system seminar

PERFORMANCE OF AN SR SYSTEM

• Accuracy is usually rated with word error rate (WER), whereas speed is measured with the real time factor.

• Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).
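Word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit distance. The real time factor is the processing time divided by the duration of the audio. A minimal sketch of both follows; the function names are hypothetical and whitespace tokenization is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                 # deletion
                d[i][j - 1] + 1,                 # insertion
                d[i - 1][j - 1] + substitution,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1.0 means recognition runs faster than the audio plays."""
    return processing_seconds / audio_seconds

if __name__ == "__main__":
    print(word_error_rate("i like apple juice", "i like tomato juice"))
    print(real_time_factor(2.0, 4.0))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.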

Page 23: Speech recognition system seminar

Factors affecting the accuracy of an SR system:

• Vocabulary size and confusability

• Speaker dependence vs. independence

• Isolated, discontinuous, or continuous speech

• Task and language constraints

• Read vs. spontaneous speech

• Adverse conditions

Page 24: Speech recognition system seminar

APPLICATIONS

• Health Care

• Military
  - High-Performance Aircraft
  - Air Traffic Control Systems

• Telephony
  - Smartphones
  - Customer Helpline Services

• Personal Computers

Page 25: Speech recognition system seminar

SIRI AND GOOGLE NOW

Siri is an intelligent personal assistant developed by Apple.

Google Now is an intelligent personal assistant developed by Google.

Both use a combination of speaker-dependent and speaker-independent SR systems.

Page 26: Speech recognition system seminar

CONCLUSION

• Speech Recognition systems are an indispensable part of the ever-advancing field of human-computer interaction.

• More research is needed to tackle challenges such as background noise, varied accents and fast speech.

Thank You!