Recognition Tasks
- Isolated Word Recognition (IWR)
- Connected Word (CW) and Continuous Speech Recognition (CSR)
- Speaker Dependent, Multiple Speaker, and Speaker Independent
- Vocabulary Size
  - Small: < 20 words
  - Medium: > 100 and < 1000 words
  - Large: > 1000 and < 10000 words
  - Very Large: > 10000 words
Articulatory Based Recognition
- Uses the articulatory (speech production) system for recognition
- This theory has been the most successful so far
Auditory Based Recognition
- Uses the auditory system for recognition
Hybrid Based Recognition
- A hybrid of the above theories
Motor Theory
- Models the intended gestures of the speaker
Recognition Problem
We have a sequence of acoustic symbols and want to find the words that were uttered by the speaker.
Solution: find the most probable word sequence given the acoustic symbols.
Recognition Problem
A: acoustic symbols
W: word sequence
We should find $\hat{W}$ such that
$$P(\hat{W} \mid A) = \max_{W} P(W \mid A)$$
Bayes Rule
$$P(x, y) = P(x \mid y)\,P(y)$$
$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$
$$P(W \mid A) = \frac{P(A \mid W)\,P(W)}{P(A)}$$
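As a quick numerical check of Bayes' rule, the sketch below builds a small joint distribution $P(W, A)$ and confirms that computing $P(W \mid A)$ directly and via $P(A \mid W)P(W)/P(A)$ gives the same answer. The word labels, the acoustic labels `a1`/`a2`, and all probabilities are hypothetical, chosen only for illustration; exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical joint distribution P(W, A): two word sequences, two acoustic observations.
joint = {
    ("yes", "a1"): Fraction(3, 10),
    ("yes", "a2"): Fraction(1, 10),
    ("no", "a1"): Fraction(1, 10),
    ("no", "a2"): Fraction(5, 10),
}


def p_w(w):
    """Marginal P(W = w): sum the joint over all acoustic observations."""
    return sum(p for (wi, _), p in joint.items() if wi == w)


def p_a(a):
    """Marginal P(A = a): sum the joint over all word sequences."""
    return sum(p for (_, ai), p in joint.items() if ai == a)


def posterior_direct(w, a):
    # Definition of conditional probability: P(W | A) = P(W, A) / P(A)
    return joint[(w, a)] / p_a(a)


def posterior_bayes(w, a):
    # Bayes' rule: P(W | A) = P(A | W) P(W) / P(A), with P(A | W) = P(W, A) / P(W)
    likelihood = joint[(w, a)] / p_w(w)
    return likelihood * p_w(w) / p_a(a)


for w in ("yes", "no"):
    for a in ("a1", "a2"):
        assert posterior_direct(w, a) == posterior_bayes(w, a)

print(posterior_bayes("yes", "a1"))  # 3/4
```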
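Bayes Rule (Cont'd)
$$P(\hat{W} \mid A) = \max_{W} P(W \mid A) = \max_{W} \frac{P(A \mid W)\,P(W)}{P(A)}$$
Since $P(A)$ does not depend on $W$, it can be dropped from the maximization:
$$\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\,P(W)$$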
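The decision rule can be sketched as a search over candidate word sequences that multiplies an acoustic score $P(A \mid W)$ by a language-model prior $P(W)$ and keeps the best. The candidate sentences and every number below are hypothetical; in a real recognizer these scores would come from the acoustic and language models, and the candidate set would come from a search, not a fixed list.

```python
import math

# Hypothetical model scores for three candidate word sequences W, all for the
# same observed acoustic sequence A (illustrative values only).
acoustic_likelihood = {  # P(A | W), as an acoustic model might supply it
    "recognize speech": 0.020,
    "wreck a nice beach": 0.030,
    "recognize peach": 0.001,
}
language_prior = {  # P(W), as a language model might supply it
    "recognize speech": 0.0100,
    "wreck a nice beach": 0.0001,
    "recognize peach": 0.0010,
}


def decode(candidates):
    """Return argmax_W P(A | W) P(W).

    P(A) is identical for every candidate, so it is left out. Scores are added in
    the log domain, which keeps long products of small probabilities from underflowing.
    """
    return max(
        candidates,
        key=lambda w: math.log(acoustic_likelihood[w]) + math.log(language_prior[w]),
    )


best = decode(acoustic_likelihood)
print(best)  # recognize speech
```

Note that the acoustic score alone would pick "wreck a nice beach"; the prior $P(W)$ is what tips the decision.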
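Simple Language Model
For a word sequence $w = w_1 w_2 w_3 \ldots w_n$:
$$P(w) = P(w_1, w_2, \ldots, w_n) = P(w_n \mid w_1, \ldots, w_{n-1})\,P(w_1, \ldots, w_{n-1})$$
$$P(w) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, \ldots, w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})$$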
Computing this probability directly is very difficult and requires a very large database. Therefore, trigram and bigram models are used instead.
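To see why the full-history form needs so much data, the sketch below estimates each factor $P(w_i \mid w_1, \ldots, w_{i-1})$ by relative frequency over a hypothetical four-sentence corpus. With full histories, any word sequence that never occurs in training gets probability zero. No end-of-sentence symbol is used (an assumption), so the result is the probability that a sentence begins with the given words.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy training corpus, one sentence per entry.
corpus = ["the cat sat", "the cat ran", "the dog sat", "a dog ran"]

# Count every sentence prefix (w_1 .. w_k) for k = 0..n; the empty prefix counts sentences.
prefix_counts = Counter()
for sentence in corpus:
    words = tuple(sentence.split())
    for k in range(len(words) + 1):
        prefix_counts[words[:k]] += 1


def chain_rule_prob(sentence):
    """P(w_1..w_n) = prod_i P(w_i | w_1..w_{i-1}),
    each factor estimated as count(w_1..w_i) / count(w_1..w_{i-1})."""
    words = tuple(sentence.split())
    prob = Fraction(1)
    for i in range(len(words)):
        history = prefix_counts[words[:i]]
        if history == 0:
            return Fraction(0)  # unseen history: no data to estimate from
        prob *= Fraction(prefix_counts[words[: i + 1]], history)
    return prob


print(chain_rule_prob("the cat sat"))  # 1/4
print(chain_rule_prob("a cat sat"))    # 0
```

"a cat sat" is a perfectly plausible sentence, but its full history never appears in the corpus, so the estimate collapses to zero.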
Simple Language Model (Cont'd)
Trigram:
$$P(w) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})$$
Bigram:
$$P(w) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$
Monogram (unigram):
$$P(w) \approx \prod_{i=1}^{n} P(w_i)$$
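The three approximations differ only in how many previous words each factor conditions on, so one function can cover all of them. This is a minimal sketch on the same kind of hypothetical toy corpus, using plain relative-frequency counts with no smoothing; the start symbol `<s>` used to pad the first words is an assumption, since the formulas above leave $w_{-1}$ and $w_0$ undefined.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy training corpus, one sentence per entry.
corpus = ["the cat sat", "the cat ran", "the dog sat", "a dog ran"]


def ngram_counts(n):
    """Count n-grams and their (n-1)-word histories over the corpus.

    Each sentence is padded with n-1 start symbols "<s>" (an assumption).
    """
    grams, histories = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] * (n - 1) + sentence.split()
        for i in range(n - 1, len(words)):
            gram = tuple(words[i - n + 1 : i + 1])
            grams[gram] += 1
            histories[gram[:-1]] += 1
    return grams, histories


def ngram_prob(sentence, n):
    """P(w) ~= prod_i P(w_i | previous n-1 words), each factor by relative frequency.

    n = 1 gives the monogram (unigram), n = 2 the bigram, n = 3 the trigram model.
    """
    grams, histories = ngram_counts(n)
    words = ["<s>"] * (n - 1) + sentence.split()
    prob = Fraction(1)
    for i in range(n - 1, len(words)):
        gram = tuple(words[i - n + 1 : i + 1])
        if histories[gram[:-1]] == 0:
            return Fraction(0)  # unseen history: no estimate available
        prob *= Fraction(grams[gram], histories[gram[:-1]])
    return prob


for n in (1, 2, 3):
    print(n, ngram_prob("the dog ran", n))
```

For the unseen sentence "the dog ran", the monogram model gives 1/144 and the bigram 1/8, while the trigram gives 0 because the trigram "the dog ran" never occurs. Shorter histories trade detail for robustness to sparse data.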
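Simple Language Model (Cont'd)
$$P(w_3 \mid w_1, w_2)$$
Computing method: count the number of times $w_3$ occurs after $w_1 w_2$ and divide by the number of occurrences of $w_1 w_2$, where $C(\cdot)$ denotes a count in the training text:
$$P(w_3 \mid w_1, w_2) \approx \frac{C(w_1 w_2 w_3)}{C(w_1 w_2)}$$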
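The counting method can be sketched directly over a running token stream. The training text below is hypothetical, and returning `None` for an unseen history is a design choice of this sketch (a real model would apply smoothing or back-off instead).

```python
from collections import Counter
from fractions import Fraction


def trigram_estimate(tokens, w1, w2, w3):
    """Relative-frequency estimate of P(w3 | w1, w2) from a list of tokens.

    Numerator: how often w3 occurs right after the pair w1 w2.
    Denominator: how often the pair w1 w2 occurs with some word after it, so the
    estimates for a fixed history sum to 1 over all w3.
    """
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    histories = Counter(zip(tokens, tokens[1:-1]))  # pairs that have a following word
    if histories[(w1, w2)] == 0:
        return None  # history never seen: the estimate is undefined
    return Fraction(trigrams[(w1, w2, w3)], histories[(w1, w2)])


tokens = "the cat sat on the mat the cat ran".split()  # hypothetical training text
print(trigram_estimate(tokens, "the", "cat", "sat"))  # 1/2
```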
Search Limitations:
- First and end interval
- Global limitation
- Local limitation
- The system is simple, but many iterations are needed for training
- It does not determine a specific structure
- Despite its simplicity, the results are good
- The training set is large, so training should be done offline
- Accuracy is relatively good
Pre-processing
Different preprocessing techniques are employed as the front end of speech recognition systems.
The choice of preprocessing method depends on the task, the noise level, the modeling tool, etc.
MFCC Method
The MFCC method is based on how the human ear perceives sounds.
The MFCC method performs better than other features in noisy environments.
The MFCC method was originally proposed for speech recognition applications, but it also performs well in speaker recognition.
The unit of hearing (perceived pitch) of the human ear is the Mel, which is obtained from the following relation:
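The Mel relation itself is not reproduced here. A widely used form is $\text{mel}(f) = 2595 \log_{10}(1 + f/700)$, with $f$ in Hz; the sketch below assumes that form (the original slide may have used an equivalent variant) together with its inverse.

```python
import math


def hz_to_mel(f_hz):
    """Assumed common form of the Mel scale: mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


for f in (250, 500, 1000, 2000, 4000):
    print(f, round(hz_to_mel(f)))
```

With this form, 1000 Hz maps to about 1000 mel; above that, equal steps in Hz produce shrinking steps in mel, mirroring the ear's coarser frequency resolution at high frequencies.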
Steps of the MFCC Method
Step 1: Map the signal from the time domain to the frequency domain using the short-time FFT:
$$X(m) = \sum_{n=0}^{F-1} Z(n)\,W(n)\,W_F^{mn}$$
where
Z(n): speech signal
W(n): window function, such as the Hamming window
$W_F = e^{-j2\pi/F}$
$m = 0, \ldots, F-1$
F: length of the speech frame
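Step 1 for a single frame can be sketched with NumPy: window the frame with a Hamming window and take its FFT, which evaluates the sum $X(m)$ for all $m$ at once. The direct term-by-term sum is included only to show that the two agree. The sampling rate, frame length, and 1 kHz test tone are hypothetical choices for illustration.

```python
import numpy as np


def frame_spectrum(z, start, F):
    """Short-time spectrum of one frame: X(m) = sum_n Z(n) W(n) W_F^{mn}, m = 0..F-1."""
    frame = np.asarray(z[start : start + F], dtype=float)  # Z(n), n = 0..F-1
    window = np.hamming(F)                                 # W(n): Hamming window
    return np.fft.fft(frame * window)                      # FFT evaluates the sum above


def frame_spectrum_direct(z, start, F):
    """Same sum written out term by term, with W_F^{mn} = exp(-j 2 pi m n / F)."""
    frame = np.asarray(z[start : start + F], dtype=float) * np.hamming(F)
    n = np.arange(F)
    return np.array([np.sum(frame * np.exp(-2j * np.pi * m * n / F)) for m in range(F)])


fs, F = 8000, 256                  # hypothetical sampling rate (Hz) and frame length
t = np.arange(1024) / fs
z = np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone
X = frame_spectrum(z, start=0, F=F)
peak_bin = int(np.argmax(np.abs(X[: F // 2])))
print(peak_bin, peak_bin * fs / F)  # bin index and its frequency in Hz
```

In practice the frame start is advanced by a fixed hop and the function is applied to each frame in turn.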
Steps of the MFCC Method
Step 2: Find the energy of each channel of the filter bank.
The number of filters in the Mel-scale filter bank is M.
Figure: distribution of the filters on the Mel scale
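A common way to realize Step 2 is a bank of M triangular filters whose centers are equally spaced on the Mel scale, each weighting the power spectrum $|X(m)|^2$. The triangular shape, the assumed Mel formula, and the values M = 26, F = 256, fs = 8000 Hz are assumptions of this sketch, not details taken from the figure.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # assumed Mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(M, F, fs):
    """M triangular filters with centers spaced uniformly on the Mel scale from 0 to fs/2.

    Returns an (M, F//2 + 1) matrix acting on the non-negative-frequency FFT bins.
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), M + 2)
    bins = np.floor(F * mel_to_hz(mel_points) / fs).astype(int)
    fb = np.zeros((M, F // 2 + 1))
    for k in range(1, M + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        for b in range(left, center):           # rising edge
            fb[k - 1, b] = (b - left) / (center - left)
        for b in range(center, right + 1):      # peak (value 1) and falling edge
            fb[k - 1, b] = (right - b) / (right - center) if right > center else 1.0
    return fb


def filterbank_energies(X, fb):
    """Step 2: energy in each filter-bank channel, E_k = sum_b H_k(b) |X(b)|^2."""
    power = np.abs(X[: fb.shape[1]]) ** 2
    return fb @ power


M, F, fs = 26, 256, 8000  # hypothetical filter count, frame length, sampling rate
fb = mel_filterbank(M, F, fs)
rng = np.random.default_rng(0)
X = np.fft.fft(rng.standard_normal(F) * np.hamming(F))  # spectrum of a random test frame
E = filterbank_energies(X, fb)
```

Because the centers are uniform in mel rather than in Hz, the low-frequency filters come out narrow and the high-frequency ones wide, matching the distribution described above.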
Steps of the MFCC Method
Step 4: Compress the spectrum (logarithm) and apply the DCT to obtain the MFCC coefficients.
In the DCT relation, n = 0, …, L is the order of the MFCC coefficients.
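Step 4 can be sketched as a logarithm of the M filter-bank energies followed by a DCT. The exact DCT relation is not shown here, so this assumes the unnormalized DCT-II form $c(n) = \sum_{k=1}^{M} \log(E_k)\cos\big(\pi n (k - \tfrac{1}{2}) / M\big)$ for $n = 0, \ldots, L$; the small floor inside the log is also an assumption, added to avoid $\log 0$.

```python
import numpy as np


def mfcc_from_energies(E, L):
    """Step 4 sketch: log-compress the M filter-bank energies, then apply a DCT-II.

        c(n) = sum_{k=1}^{M} log(E_k) * cos(pi * n * (k - 1/2) / M),   n = 0..L
    """
    E = np.asarray(E, dtype=float)
    M = len(E)
    log_e = np.log(np.maximum(E, 1e-10))  # spectral compression; the floor avoids log(0)
    k = np.arange(1, M + 1)
    return np.array([np.sum(log_e * np.cos(np.pi * n * (k - 0.5) / M)) for n in range(L + 1)])


c = mfcc_from_energies(np.full(26, np.e), L=12)  # flat spectrum: log(E_k) = 1 for every k
print(np.round(c, 6))
```

A flat spectrum puts everything into c(0) and leaves the higher coefficients at zero; the higher-order coefficients describe the shape of the log spectral envelope.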
The Mel-Cepstrum Method
Figure (block diagram): time-domain signal → framing → |FFT|² → Mel-scaling → logarithm → IDCT → low-order coefficients (cepstra) → differentiator → delta and delta-delta cepstra
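The differentiator stage can be sketched with the common regression formula $d_t = \sum_{k=1}^{N} k\,(c_{t+k} - c_{t-k}) \big/ \big(2\sum_{k=1}^{N} k^2\big)$, applied once for delta and again for delta-delta cepstra. The regression form, the window N = 2, and repeating the edge frames at the boundaries are assumptions of this sketch.

```python
import numpy as np


def deltas(C, N=2):
    """Delta coefficients for a (T, D) array of per-frame cepstra.

    d_t = sum_{k=1}^{N} k (c_{t+k} - c_{t-k}) / (2 sum_{k=1}^{N} k^2),
    with the first and last frames repeated at the boundaries.
    """
    T = C.shape[0]
    padded = np.pad(C, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, N + 1))
    d = np.zeros_like(C, dtype=float)
    for k in range(1, N + 1):
        # rows of padded shifted by +k and -k line up with c_{t+k} and c_{t-k}
        d += k * (padded[N + k : N + k + T] - padded[N - k : N - k + T])
    return d / denom


cepstra = np.arange(10, dtype=float).reshape(-1, 1)  # hypothetical track rising by 1 per frame
delta = deltas(cepstra)
delta_delta = deltas(delta)                          # the differentiator applied twice
```

For a track that rises by one unit per frame, the interior deltas equal 1 and the interior delta-deltas equal 0; only the frames near the edges are affected by the padding.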
Mel-Cepstral Coefficients (MFCC)
Properties of Mel-Cepstral (MFCC) Features
- Maps the Mel filter-bank energies onto directions in which their variance is maximized (using the DCT)
- Makes the speech features partially (not completely) independent of one another (effect of the DCT)
- Good response in clean environments
- Reduced performance in noisy environments
Time-Frequency Analysis
Short-term Fourier Transform
- Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
Critical band integration
- Related to the masking phenomenon: the detection threshold of a sinusoid is raised when its frequency is close to the center frequency of a narrow-band noise
- Frequency components within a critical band are not resolved; the auditory system interprets the signals within a critical band as a whole
Spectral values in adjacent frequency channels are highly correlated
- The correlation results in a Gaussian model with many parameters: all elements of the full covariance matrix have to be estimated
- Decorrelation is useful to improve the parameter estimation
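The effect of decorrelation can be sketched by modeling the correlated channels with a hypothetical first-order (AR(1)) covariance $\Sigma_{ij} = \rho^{|i-j|}$ and rotating it with an orthonormal DCT-II matrix $D$, as in $D \Sigma D^{T}$. The AR(1) model, $\rho = 0.9$, and 20 channels are assumptions chosen for illustration; the sketch measures how much of the covariance lies off the diagonal before and after.

```python
import numpy as np


def dct_matrix(M):
    """Orthonormal DCT-II matrix D: D @ x is the DCT of x, and D @ D.T = I."""
    n = np.arange(M).reshape(-1, 1)
    m = np.arange(M).reshape(1, -1)
    D = np.sqrt(2.0 / M) * np.cos(np.pi * n * (m + 0.5) / M)
    D[0, :] /= np.sqrt(2.0)  # scale the constant (n = 0) row to unit norm
    return D


def off_diagonal_fraction(S):
    """Share of the covariance matrix's squared magnitude that lies off the diagonal."""
    total = np.sum(S ** 2)
    return (total - np.sum(np.diag(S) ** 2)) / total


M, rho = 20, 0.9                                     # hypothetical channel count and correlation
idx = np.arange(M)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # adjacent channels strongly correlated
D = dct_matrix(M)
Sigma_dct = D @ Sigma @ D.T                          # covariance of the DCT coefficients

print(off_diagonal_fraction(Sigma), off_diagonal_fraction(Sigma_dct))
```

Because the rotated covariance is close to diagonal, a diagonal-covariance Gaussian becomes a reasonable approximation, which cuts the number of parameters to estimate from about $M^2/2$ to $M$.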