- Ajay Iyer

Speech Recognition

Jan 11, 2016


Transcript
Page 1: Speech Recognition

- Ajay Iyer

Page 2: Speech Recognition

Outline

What is a Spectrogram?
Types of Spectrogram
Linguistic and Acoustic Category
Prosodic Analysis
Pitch Estimation

Page 3: Speech Recognition

What is a Spectrogram?

A spectrogram is a visual representation of an acoustic signal.

It displays how the signal's amplitude is distributed over frequency and time.

Depending on the size of the Fourier analysis window, different resolutions in frequency and time are achieved.

A long analysis window resolves frequency at the expense of time, giving a “narrowband spectrogram”.

A short analysis window, on the other hand, resolves time at the expense of frequency, giving a “wideband spectrogram”.
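As a rough illustration of this trade-off, the sketch below computes the time span and the DFT bin spacing of one analysis window. The function name `window_resolution`, the 16 kHz sampling rate, and the 4 ms and 32 ms window lengths are assumptions chosen for illustration; none of them come from the slides.

```python
def window_resolution(fs_hz, window_ms):
    """Return (samples, time span in seconds, DFT bin spacing in Hz) for one window."""
    n = int(round(fs_hz * window_ms / 1000.0))  # window length in samples
    time_span_s = n / fs_hz                     # how much signal one window covers
    bin_spacing_hz = fs_hz / n                  # spacing between adjacent DFT bins
    return n, time_span_s, bin_spacing_hz


# Assumed example values: 16 kHz audio, one short and one long window.
for label, ms in (("short window (wideband)", 4.0), ("long window (narrowband)", 32.0)):
    n, span, spacing = window_resolution(16000, ms)
    print(f"{label}: {n} samples, {span * 1000:.0f} ms span, {spacing:.2f} Hz bin spacing")
```

The bin spacing is only a proxy for frequency resolution, since the effective resolution also depends on the shape of the window function. The inverse relationship between the two quantities is what the narrowband and wideband labels describe.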

Page 4: Speech Recognition

Types of Spectrograms

[Figure panels: Narrowband Spectrogram; Wideband Spectrogram]

Page 5: Speech Recognition

Spectrograms

Page 6: Speech Recognition

Linguistic/Acoustic Categories

Labeling the linguistic and/or acoustic categories speeds up the search and decoding algorithms by discarding impossible and highly unlikely phoneme combinations.

Implementation: The given phoneme is compared against the categories defined in the TIMIT lexicon.

The category thus obtained is displayed along with the phoneme as shown in the following slide.
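The lookup described on this slide can be sketched as a simple table search. The table below is a small illustrative subset of TIMIT-style phone symbols grouped into broad classes. The names `CATEGORIES` and `categorize` are hypothetical, and the grouping is an assumption, not the table used in the original implementation.

```python
# Illustrative subset only; not the full TIMIT phone inventory.
CATEGORIES = {
    "stop": {"b", "d", "g", "p", "t", "k"},
    "affricate": {"jh", "ch"},
    "fricative": {"s", "sh", "z", "zh", "f", "th", "v", "dh"},
    "nasal": {"m", "n", "ng"},
    "semivowel/glide": {"l", "r", "w", "y", "hh"},
    "vowel": {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw", "er"},
}


def categorize(phoneme):
    """Return the broad category of a phoneme label, or "unknown" if it is not listed."""
    label = phoneme.lower()
    for category, members in CATEGORIES.items():
        if label in members:
            return category
    return "unknown"


# Display each phoneme together with its category.
for ph in ("sh", "iy", "t"):
    print(ph, "->", categorize(ph))
```

A recognizer could use such labels to prune candidate sequences early, for example by rejecting transitions between categories that it has been told cannot occur together.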

Page 7: Speech Recognition

Linguistic/Acoustic Categories

Page 8: Speech Recognition

Prosodic Analysis

Acoustically speaking, prosody refers to variation in syllable duration, loudness, pitch, and the formant frequencies of the speech signal.

Prosodic features are suprasegmental, i.e., they are not restricted to any one segment of speech. They occur at some higher level of an utterance.

For example: “No!”, “Don’t!”

Page 9: Speech Recognition

Pitch

Of the various prosodic features, the most important is pitch.

Knowing the pitch makes it possible to differentiate between the contexts in which a word is spoken, viz. alerting or referential contexts.

Thus, incorporating pitch information increases the accuracy of the recognizer.

Page 10: Speech Recognition

Implementation

The pitch.m file uses cepstral analysis to extract pitch information.

pitch.m performs the analysis on a single analysis frame.

Frame-based analysis has been coded to estimate pitch over the entire speech signal.

The estimated fundamental frequency (pitch) corresponds to the time instant

tpitch = tinterval(frameNum - 1) + fo/Fs;
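The single-frame cepstral step can be sketched as follows. This is not the contents of pitch.m, which the slides do not show. It is a minimal standard-library Python sketch of the general technique: take the log-magnitude spectrum of a frame, transform it back to the quefrency domain, and read the pitch period off the largest cepstral peak. The function name `cepstral_pitch`, the 60 to 400 Hz search range, and the synthetic test frame are all assumptions.

```python
import cmath
import math


def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame from its real cepstrum."""
    n = len(frame)
    # Twiddle factors exp(-2j*pi*i/n), reused for every DFT bin.
    tw = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    # Log-magnitude spectrum via a direct O(n^2) DFT; the epsilon avoids log(0).
    log_mag = []
    for k in range(n):
        xk = sum(frame[i] * tw[(k * i) % n] for i in range(n))
        log_mag.append(math.log(abs(xk) + 1e-12))
    # Only search quefrencies whose period maps into [fmin, fmax].
    q_lo = max(1, int(fs / fmax))
    q_hi = min(n // 2, int(fs / fmin))
    best_q, best_val = q_lo, -math.inf
    for q in range(q_lo, q_hi + 1):
        # Real cepstrum at quefrency q: inverse DFT of the real, even log spectrum.
        c = sum(log_mag[k] * tw[(k * q) % n].real for k in range(n)) / n
        if c > best_val:
            best_q, best_val = q, c
    return fs / best_q


# Assumed test frame: a decaying pulse train with a 64-sample period at 8 kHz.
fs = 8000
period = 64
frame = [0.8 ** (i // period) if i % period == 0 else 0.0 for i in range(512)]
print(cepstral_pitch(frame, fs))  # pitch period of 64 samples corresponds to 125 Hz
```

In practice a tapered window is usually applied to each frame before the transform, and an FFT replaces the direct DFT. Both are omitted here to keep the sketch short and dependency-free. Running the function over successive frames would give a pitch track for the whole signal.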

Page 11: Speech Recognition

Pitch Estimation

Page 12: Speech Recognition

Pitch Estimation

Page 13: Speech Recognition

References

Prosodic_Modeling_for_Improved_Speech_Recogntion_and_Understanding_Wang_phd_thesis.pdf

Prosodic Analysis of Alerting and Referential Context of Sentinel Words_final_draft.pdf

Discrimination_of_Sentinel_Word_Contexts_using_Prosodic_Features_Journal_v1.pdf

http://home.cc.umanitoba.ca/~robh/howto.html

http://en.wikipedia.org/wiki/Prosody_(linguistics)

Page 14: Speech Recognition

Thank You