8 - Speech Recognition: Speech Recognition Concepts, Speech Recognition Approaches, Recognition Theories, Bayes Rule, Simple Language Model, P(A|W) Network Types

Dec 28, 2015


Lorraine Powell
Transcript
Page 1: 8-Speech Recognition  Speech Recognition Concepts  Speech Recognition Approaches  Recognition Theories  Bayes Rule  Simple Language Model  P(A|W)

8-Speech Recognition

- Speech Recognition Concepts
- Speech Recognition Approaches
- Recognition Theories
- Bayes Rule
- Simple Language Model
- P(A|W) Network Types

Page 2

8-Speech Recognition (Cont’d)

- HMM Calculating Approaches
- Neural Components
- Three Basic HMM Problems
- Viterbi Algorithm
- State Duration Modeling
- Training In HMM

Page 3

Recognition Tasks

- Isolated Word Recognition (IWR), Connected Word (CW), and Continuous Speech Recognition (CSR)
- Speaker Dependent, Multiple Speaker, and Speaker Independent
- Vocabulary Size
  - Small: < 20
  - Medium: > 100, < 1000
  - Large: > 1000, < 10000
  - Very Large: > 10000

Page 4

Speech Recognition Concepts

[Figure: block diagram linking Text and Speech through NLP and Speech Processing blocks, with a Phone Sequence between them; the paths are labeled Speech Synthesis, Speech Recognition, and Speech Understanding.]

Speech recognition is the inverse of speech synthesis.

Page 5

Speech Recognition Approaches

Bottom-Up Approach

Top-Down Approach

Blackboard Approach

Page 6

Bottom-Up Approach

[Figure: bottom-up recognition block diagram. Labels: Signal Processing; Feature Extraction; Segmentation; Voiced/Unvoiced/Silence; Sound Classification Rules; Phonotactic Rules; Lexical Access; Language Model; Knowledge Sources; Recognized Utterance.]

Page 7

Top-Down Approach

[Figure: top-down recognition block diagram. Labels: Feature Analysis; Unit Matching System; Lexical Hypothesis; Syntactic Hypothesis; Semantic Hypothesis; Utterance Verifier/Matcher; Inventory of speech recognition units; Word Dictionary; Grammar; Task Model; Recognized Utterance.]

Page 8

Blackboard Approach

[Figure: blackboard architecture. Labels: Environmental Processes; Acoustic Processes; Lexical Processes; Syntactic Processes; Semantic Processes; Blackboard.]

Page 9

Recognition Theories

- Articulatory-Based Recognition
  - Uses the articulatory system for recognition
  - This theory has been the most successful so far
- Auditory-Based Recognition
  - Uses the auditory system for recognition
- Hybrid-Based Recognition
  - A hybrid of the above theories
- Motor Theory
  - Models the intended gesture of the speaker

Page 10

Recognition Problem

We have a sequence of acoustic symbols and want to find the words expressed by the speaker.

Solution: find the most probable word sequence given the acoustic symbols.

Page 11

Recognition Problem

A: acoustic symbols
W: word sequence

We should find $\hat{W}$ such that

$$P(\hat{W} \mid A) = \max_{W} P(W \mid A)$$

Page 12

Bayes Rule

$$P(x \mid y)\,P(y) = P(x, y)$$

$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$

$$P(W \mid A) = \frac{P(A \mid W)\,P(W)}{P(A)}$$

Page 13

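Bayes Rule (Cont’d)

$$P(\hat{W} \mid A) = \max_{W} P(W \mid A) = \max_{W} \frac{P(A \mid W)\,P(W)}{P(A)}$$

Since $P(A)$ does not depend on $W$, it can be dropped from the maximization:

$$\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\,P(W)$$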
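
The decision rule on this slide can be illustrated with a minimal sketch: given a few candidate word sequences, pick the one that maximizes P(A|W) P(W). The function name `decode`, the candidate sentences, and all probability values are hypothetical and chosen only for illustration; scores are compared in the log domain, which is a common choice for numerical stability rather than something the slides specify.

```python
import math

def decode(acoustic_likelihood, language_prior):
    """Return the candidate W that maximizes P(A|W) * P(W).

    acoustic_likelihood: dict mapping candidate W -> P(A|W)
    language_prior:      dict mapping candidate W -> P(W)
    """
    best_w, best_score = None, -math.inf
    for w, p_a_given_w in acoustic_likelihood.items():
        p_w = language_prior.get(w, 0.0)
        if p_a_given_w <= 0.0 or p_w <= 0.0:
            continue  # a zero-probability candidate can never win
        # Sum of logs is equivalent to the product P(A|W) P(W).
        score = math.log(p_a_given_w) + math.log(p_w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w

# Hypothetical values: the acoustically better candidate loses
# because the language model considers it far less likely.
acoustic = {"recognize speech": 0.30, "wreck a nice beach": 0.35}
prior = {"recognize speech": 0.020, "wreck a nice beach": 0.001}
best = decode(acoustic, prior)
```

With a uniform prior the acoustic score alone decides, which shows why the language model term P(W) matters.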

Page 14

Simple Language Model

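$$w = w_1 w_2 w_3 \ldots w_n$$

$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2} \ldots w_1)$$

$$= P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_2, w_1)\,P(w_4 \mid w_3, w_2, w_1) \cdots P(w_n \mid w_{n-1}, \ldots, w_2, w_1)$$

$$= P(w_1, w_2, \ldots, w_{n-1}, w_n)$$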

Computing this probability directly is very difficult and requires a very large database, so we use trigram and bigram models instead.

Page 15

Simple Language Model (Cont’d)

Trigram:

$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2})$$

Bigram:

$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$

Monogram:

$$P(w) = \prod_{i=1}^{n} P(w_i)$$
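
As a rough sketch of how these products are evaluated, the function below multiplies conditional probabilities over a sentence, with the history length set by the model order (1 for monogram, 2 for bigram, 3 for trigram). The table layout, the name `sentence_log_prob`, and the probability values are assumptions made for this example; histories at the start of the sentence are simply shorter, which is one of several possible conventions.

```python
import math

def sentence_log_prob(words, cond_prob, order):
    """Return log P(w) = sum_i log P(w_i | previous order-1 words).

    cond_prob maps (history_tuple, word) -> probability, where the
    history holds at most order-1 preceding words in sentence order.
    """
    total = 0.0
    for i, w in enumerate(words):
        history = tuple(words[max(0, i - (order - 1)):i])
        total += math.log(cond_prob[(history, w)])
    return total

# Hypothetical bigram table (order = 2).
bigram = {
    ((), "the"): 0.5,
    (("the",), "cat"): 0.2,
    (("cat",), "sleeps"): 0.1,
}
log_p = sentence_log_prob(["the", "cat", "sleeps"], bigram, order=2)
```

Working in logs avoids underflow when the sentence is long; `math.exp(log_p)` recovers the plain product.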

Page 16

Simple Language Model (Cont’d)

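Computing method:

$$P(w_3 \mid w_2 w_1) = \frac{\text{number of occurrences of } w_3 \text{ after } w_1 w_2}{\text{total number of occurrences of } w_1 w_2}$$

Ad hoc method:

$$P(w_3 \mid w_2 w_1) = \alpha_1 f(w_3 \mid w_2 w_1) + \alpha_2 f(w_3 \mid w_2) + \alpha_3 f(w_3)$$

where $f(\cdot)$ denotes a relative frequency and $\alpha_1, \alpha_2, \alpha_3$ are weights.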
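
Both methods on this slide can be sketched from raw counts. The example below estimates the trigram relative frequency by counting, then blends it with bigram and unigram relative frequencies. The toy corpus, the function names, and the weights (0.6, 0.3, 0.1) are hypothetical; the slides do not say how the weights are chosen.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def relative_freq(num, den):
    """Ratio of counts, defined as 0 when the history was never seen."""
    return num / den if den else 0.0

def interpolated_trigram(tokens, w1, w2, w3, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of trigram, bigram, and unigram relative frequencies
    for w3 following the word pair w1 w2."""
    uni, bi, tri = (ngram_counts(tokens, n) for n in (1, 2, 3))
    a1, a2, a3 = weights
    f3 = relative_freq(tri[(w1, w2, w3)], bi[(w1, w2)])  # w3 after w1 w2
    f2 = relative_freq(bi[(w2, w3)], uni[(w2,)])         # w3 after w2
    f1 = relative_freq(uni[(w3,)], len(tokens))          # w3 on its own
    return a1 * f3 + a2 * f2 + a3 * f1

corpus = "the cat sat on the mat the cat ran".split()
p = interpolated_trigram(corpus, "the", "cat", "sat")
```

Setting the weights to (1, 0, 0) reduces the estimate to the pure counting method; the lower-order terms keep the estimate non-zero for word triples that never occurred in the corpus.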

Page 17

Error-Producing Factors

- Prosody (recognition should be prosody independent)
- Noise (noise should be suppressed)
- Spontaneous speech

Page 18

P(A|W) Computing Approaches

Dynamic Time Warping (DTW)

Hidden Markov Model (HMM)

Artificial Neural Network (ANN)

Hybrid Systems

Page 19

Dynamic Time Warping

Page 20

Dynamic Time Warping

Page 21

Dynamic Time Warping

Page 22

Dynamic Time Warping

Page 23

Dynamic Time Warping

Search Limitations:

- Start and End Points

- Global Limitation

- Local Limitation
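
A minimal DTW sketch shows how the three kinds of limitation can enter the search. It assumes one-dimensional features and an absolute-difference frame distance, both chosen for brevity. The path is forced to start at the first frame pair and end at the last one, a band of width `band` around the diagonal acts as a global limitation, and the three allowed predecessor cells act as a local limitation. These particular constraint choices are assumptions; the slides do not state which variants are intended.

```python
import math

def dtw_distance(ref, test, band=None):
    """Accumulated DTW distance between two 1-D sequences.

    band: optional maximum |i - j| allowed on the warping path
          (None disables the global limitation).
    """
    n, m = len(ref), len(test)
    inf = math.inf
    # cost[i][j] is the best accumulated distance aligning the first
    # i frames of ref with the first j frames of test.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # the path must start at the first frame pair
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if band is not None and abs(i - j) > band:
                continue  # cell lies outside the global limitation
            d = abs(ref[i - 1] - test[j - 1])
            # Local limitation: step from the left, below, or diagonal.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]  # the path must end at the last frame pair

stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

A result of `math.inf` means no path satisfies the limitations, for example when the band is narrower than the length difference of the two sequences.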

Page 24

Dynamic Time Warping

Global Limitation:

Page 25

Dynamic Time Warping

Local Limitation:

Page 26

Artificial Neural Network

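[Figure: a single computation element with inputs x_0, x_1, ..., x_{N-1}, weights w_0, w_1, ..., w_{N-1}, and output y]

$$y = \varphi\left(\sum_{i=0}^{N-1} w_i x_i\right)$$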

Simple Computation Element of a Neural Network
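
The computation element can be sketched in a few lines: a weighted sum of the inputs passed through a nonlinearity. The choice of `tanh` as the default and the hard-threshold `step` function are assumptions for illustration; the slide does not say which nonlinearity is used.

```python
import math

def neuron(x, w, activation=math.tanh):
    """Output of one computation element: activation(sum_i w_i * x_i)."""
    if len(x) != len(w):
        raise ValueError("inputs and weights must have the same length")
    return activation(sum(wi * xi for wi, xi in zip(w, x)))

def step(v, threshold=0.0):
    """Hard-limiting nonlinearity, as used in a classic perceptron."""
    return 1.0 if v >= threshold else 0.0

# Hypothetical inputs and weights.
y = neuron([1.0, 1.0], [1.0, 1.0], activation=step)
```

A single-layer perceptron with M outputs is then M such elements sharing the same inputs, each with its own weight vector.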

Page 27

Artificial Neural Network (Cont’d)

Neural Network Types

- Perceptron
- Time Delay
- Time Delay Neural Network Computational Element (TDNN)

Page 28

Artificial Neural Network (Cont’d)

[Figure: network with inputs x_0, ..., x_{N-1} and outputs y_0, ..., y_{M-1}]

Single Layer Perceptron

Page 29

Artificial Neural Network (Cont’d)

[Figure: network diagram]

Three Layer Perceptron

Page 30

2.5.4.2 Neural Network Topologies

Page 31

TDNN

Page 32

2.5.4.6 Neural Network Structures for Speech Recognition

Page 33

2.5.4.6 Neural Network Structures for Speech Recognition

Page 34

Hybrid Methods

Hybrid Neural Network and Matched Filter For Recognition

[Figure: hybrid system block diagram. Labels: Speech; Acoustic Features; Delays; Pattern Classifier; Output Units.]

Page 35

Neural Network Properties

- The system is simple, but too many iterations are needed for training
- It does not determine a specific structure
- Despite its simplicity, the results are good
- The training set is large, so training should be done offline
- Accuracy is relatively good

Page 36

Pre-processing

Different preprocessing techniques are employed as the front end for speech recognition systems

The choice of preprocessing method is based on the task, the noise level, the modeling tool, etc.

Page 37

Page 38

Page 39

Page 40

Page 41

MFCC Method

- The MFCC method is based on how the human ear perceives sounds.
- MFCC features perform better in noisy environments than other features.
- MFCC was originally proposed for speech recognition applications, but it also performs well in speaker identification.
- The Mel is the unit of human auditory perception; it is obtained from the following relation:

[Equation not preserved in the transcript]
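
The slide's own Mel relation is not available here, so the sketch below assumes one widely used form, mel = 2595 log10(1 + f/700), which may differ from the one the author showed. The function names and the filter-placement helper are hypothetical. The helper spaces filter centers evenly on the Mel axis, which makes them progressively wider apart in Hz.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to Mel (assumed 2595/700 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(num_filters, f_low, f_high):
    """Center frequencies (Hz) of filters spaced evenly on the Mel axis."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(num_filters)]

centers = mel_center_frequencies(10, 0.0, 4000.0)
```

Under this form 1000 Hz maps to roughly 1000 Mel, and the scale is close to linear below that point and close to logarithmic above it.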

Page 42

Steps of the MFCC Method

Step 1: Map the signal from the time domain to the frequency domain using the short-time FFT.

- Z(n): speech signal
- W(n): window function, such as the Hamming window
- $W_F = e^{-j 2\pi / F}$
- m = 0, ..., F - 1
- F: speech frame length

Page 43

Steps of the MFCC Method

Step 2: Find the energy of each filter-bank channel.

- M is the number of Mel-scale filter banks.
- $W_k(j)$, $k = 0, 1, \ldots, M-1$, is the function of the filters in the filter bank.

Page 44

Mel-Scale-Based Filter Distribution

Page 45

Steps of the MFCC Method

Step 4: Compress the spectrum and apply the DCT to obtain the MFCC coefficients.

[Equation not preserved in the transcript]

In the relation above, n = 0, ..., L is the order of the MFCC coefficients.

Page 46

Mel-Cepstrum Method

[Figure: Mel-cepstrum block diagram. Stages: time-domain signal, framing, |FFT|^2, Mel-scaling, Logarithm, IDCT, low-order coefficients (Cepstra), Differentiator (Delta & Delta-Delta Cepstra).]
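
The last stages of this pipeline (logarithm, cosine transform, keeping the low-order coefficients) can be sketched as follows. The code assumes an unnormalized DCT-II, one common convention, and takes the filter-bank energies as given; the exact transform and scaling used on the slides are not recoverable, and the name `cepstra_from_energies` is hypothetical.

```python
import math

def cepstra_from_energies(energies, num_coeffs):
    """Low-order cepstral coefficients from filter-bank energies.

    Applies a logarithm to each channel energy, then an unnormalized
    DCT-II, and keeps only the first num_coeffs coefficients.
    """
    m = len(energies)
    log_e = [math.log(e) for e in energies]  # energies must be > 0
    return [
        sum(log_e[k] * math.cos(math.pi * n * (k + 0.5) / m) for k in range(m))
        for n in range(num_coeffs)
    ]

# Hypothetical flat spectrum: every channel carries the same energy.
flat = cepstra_from_energies([2.0] * 8, 4)
```

For a flat spectrum only the zeroth coefficient is non-zero, which illustrates how the cosine transform packs the overall level into one term and the spectral shape into the others.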

Page 47

Mel-Cepstrum Coefficients (MFCC)

Page 48

Properties of Mel-Cepstrum (MFCC)

- Maps the Mel filter-bank energies onto the directions in which their variance is maximal (using the DCT)
- Makes the speech features approximately, though not completely, independent of one another (an effect of the DCT)
- Performs well in clean environments
- Performance degrades in noisy environments

Page 49

Time-Frequency analysis

Short-term Fourier Transform

- Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
- W(n): windowing function
- N: frame length
- p: step size
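
A small sketch of the short-term Fourier transform using the three quantities named on this slide: a window W(n), a frame length N, and a step size p. It uses a direct DFT in pure Python for clarity rather than an FFT, and assumes a Hamming window; the function names and the test signal are hypothetical.

```python
import cmath
import math

def hamming(n_len):
    """Hamming window of length n_len."""
    if n_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def stft(signal, frame_len, step, window=None):
    """Return one DFT spectrum per frame.

    frame_len is N, step is p, window is W(n) (Hamming if omitted).
    """
    w = window if window is not None else hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, step):
        seg = [signal[start + n] * w[n] for n in range(frame_len)]
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * m * n / frame_len)
                for n in range(frame_len))
            for m in range(frame_len)
        ]
        frames.append(spectrum)
    return frames

# Hypothetical test tone: exactly 4 cycles per 32-sample frame.
signal = [math.cos(2 * math.pi * 4 * n / 32) for n in range(96)]
frames = stft(signal, frame_len=32, step=16)
```

A step size smaller than the frame length gives overlapping frames, trading extra computation for smoother tracking of spectral change over time.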

Page 50

Critical band integration

Related to masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise

Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole.

Page 51

Bark scale

Page 52

Feature orthogonalization

Spectral values in adjacent frequency channels are highly correlated

The correlation results in a Gaussian model with many parameters: all the elements of the covariance matrix have to be estimated.

Decorrelation is useful to improve the parameter estimation.
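
To make the parameter argument concrete, the sketch below counts the free parameters of a single Gaussian with a full covariance matrix versus a diagonal one, which is what decorrelated features permit. The dimension 39 is only an illustrative value and is not taken from the slides.

```python
def gaussian_param_count(dim, diagonal):
    """Free parameters of one Gaussian: mean plus covariance terms."""
    mean_params = dim
    # A full symmetric covariance has dim * (dim + 1) / 2 distinct entries.
    cov_params = dim if diagonal else dim * (dim + 1) // 2
    return mean_params + cov_params

full = gaussian_param_count(39, diagonal=False)
diag = gaussian_param_count(39, diagonal=True)
```

The full-covariance count grows quadratically with the feature dimension while the diagonal count grows linearly, so the same amount of training data supports much more reliable estimates after decorrelation.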

Page 53

Language Models for LVCSR

$$W = w_1 w_2 \ldots w_Q$$

$$P(W) = P(w_1 w_2 \ldots w_Q) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_Q \mid w_1 w_2 \ldots w_{Q-1})$$

$$P(w_j \mid w_1 w_2 \ldots w_{j-1}) \approx P(w_j \mid w_{j-N+1}, \ldots, w_{j-1})$$

Word Pair Model: Specify which word pairs are valid

$$P(w_j \mid w_k) = \begin{cases} 1 & \text{if } w_k w_j \text{ is valid} \\ 0 & \text{otherwise} \end{cases}$$
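
The word pair model amounts to a lookup in a set of allowed pairs. The sketch below assumes the pairs are stored as (previous word, next word) tuples; the tiny grammar and the function names are hypothetical.

```python
def word_pair_prob(w_prev, w_next, valid_pairs):
    """Return 1 if w_prev followed by w_next is a valid pair, else 0."""
    return 1 if (w_prev, w_next) in valid_pairs else 0

def sentence_allowed(words, valid_pairs):
    """A sentence is allowed only if every adjacent pair is valid."""
    return all(word_pair_prob(a, b, valid_pairs) for a, b in zip(words, words[1:]))

# Hypothetical grammar for a small command task.
valid = {("show", "all"), ("all", "ships"), ("show", "ships")}
ok = sentence_allowed(["show", "all", "ships"], valid)
bad = sentence_allowed(["ships", "show", "all"], valid)
```

Unlike a statistical bigram, this model does not rank the allowed sentences against each other; it only prunes word sequences that contain an invalid pair.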

Page 54

Statistical Language Modeling

$$P_N(W) = \prod_{i=1}^{Q} P(w_i \mid w_{i-1}, w_{i-2}, \ldots, w_{i-N+1})$$

$$\hat{P}(w_i \mid w_{i-1}, \ldots, w_{i-N+1}) = \frac{F(w_i, w_{i-1}, \ldots, w_{i-N+1})}{F(w_{i-1}, \ldots, w_{i-N+1})}$$

$$\hat{P}(w_3 \mid w_1, w_2) = p_1 \frac{F(w_1, w_2, w_3)}{F(w_1, w_2)} + p_2 \frac{F(w_1, w_2)}{F(w_1)} + p_3 \frac{F(w_1)}{\sum_i F(w_i)}$$

Page 55

Perplexity of the Language Model

Entropy of the Source:

$$H = -\lim_{Q \to \infty} \frac{1}{Q} \sum P(w_1, w_2, \ldots, w_Q) \log P(w_1, w_2, \ldots, w_Q)$$

First order entropy of the source:

$$P(w_1, w_2, \ldots, w_Q) = P(w_1)\,P(w_2) \cdots P(w_Q)$$

$$H = -\sum_{w \in V} P(w) \log P(w)$$

If the source is ergodic, meaning its statistical properties can be completely characterized in a sufficiently long sequence that the source puts out,

$$H = -\lim_{Q \to \infty} \frac{1}{Q} \log P(w_1, w_2, \ldots, w_Q)$$

Page 56

We often compute H based on a finite but sufficiently large Q:

$$H = -\frac{1}{Q} \log P(w_1, w_2, \ldots, w_Q)$$

H is the degree of difficulty that the recognizer encounters, on average, when it has to determine a word from the same source.

If the N-gram language model $P_N(W)$ is used, an estimate of H is:

$$H_p = -\frac{1}{Q} \sum_{i=1}^{Q} \log P(w_i \mid w_{i-1}, w_{i-2}, \ldots, w_{i-N+1})$$

In general:

$$H_p = -\frac{1}{Q} \log \hat{P}(w_1, w_2, \ldots, w_Q)$$

Perplexity is defined as:

$$B = 2^{H_p} = \hat{P}(w_1, w_2, \ldots, w_Q)^{-1/Q}$$
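
A short sketch of the perplexity computation, assuming base-2 logarithms (which is what the 2 in the definition of B implies) and a language model supplied as a function of the history and the next word. The function name, the callback interface, and the uniform toy model are assumptions for this example.

```python
import math

def perplexity(words, cond_prob, order):
    """Perplexity B = 2 ** H_p of an N-gram model on a word sequence.

    cond_prob(history, word) returns P(word | history), where history
    is a tuple of at most order-1 preceding words.
    """
    q = len(words)
    log2_sum = 0.0
    for i, w in enumerate(words):
        history = tuple(words[max(0, i - (order - 1)):i])
        log2_sum += math.log2(cond_prob(history, w))
    h_p = -log2_sum / q  # estimated entropy in bits per word
    return 2.0 ** h_p

# Hypothetical model: four words, all equally likely in any context.
uniform = lambda history, word: 0.25
b = perplexity(["a", "b", "c", "d", "a"], uniform, order=2)
```

For the uniform four-word model the perplexity is 4, matching the reading of B as the average number of equally likely words the recognizer has to choose among.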

Page 57

Overall recognition system based on subword units