A STUDY ON SPEECH RECOGNITION USING DYNAMIC TIME WARPING CS 525 : Project Presentation PALDEN LAMA and MOUNIKA NAMBURU

Dec 19, 2015

Page 1

A STUDY ON SPEECH RECOGNITION USING DYNAMIC TIME WARPING

CS 525 : Project Presentation

PALDEN LAMA and MOUNIKA NAMBURU

Page 2

GOALS

- Learn how it works! Focus:
    - Pre-processing
    - Dynamic Time Warping / Dynamic Programming
- Verify using MATLAB
- Build a simple voice-to-text converter application.

Page 3

HOW DOES IT WORK?

1. Record a voice
2. Digitized speech signal (.wav file)
3. Acoustic preprocessing (DFT + MFCC)
4. Extract feature vectors
5. Speech recognizer (Dynamic Time Warping)

Page 4

Page 5

SPEECH SIGNAL

- Voiced excitation -> fundamental frequency (speaker dependent)
- Loudness -> signal amplitude
- Vocal tract shape -> spectral shaping (most important to recognize words)

[Figure: a time signal of the vowel /a:/ (fs = 11 kHz, length = 100 ms); horizontal axis: time]

Page 6

ACOUSTIC PRE-PROCESSING

- DFT (Discrete Fourier Transform) -> spectral coefficients
- Inverse DFT on the log power spectrum -> cepstral coefficients
- This makes it easier to extract the spectral shaping of the speech signal.

[Figure: log power spectrum of the vowel /a:/ (fs = 11 kHz, N = 512); horizontal axis: frequency]

[Figure: power spectrum of the vowel /a:/ after cepstral smoothing]
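The two steps on this slide (DFT, then inverse DFT of the log power spectrum) can be sketched in a few lines. This is an illustrative Python sketch, not the project's MATLAB code: the function names, the naive O(N^2) transform, and the `keep` parameter for the smoothing step are all assumptions made for the example.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT, scaled by 1/N."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def real_cepstrum(frame, eps=1e-12):
    """Cepstral coefficients: inverse DFT of the log power spectrum."""
    # eps (an assumption of this sketch) avoids log(0) for empty bins.
    log_power = [math.log(abs(c) ** 2 + eps) for c in dft(frame)]
    return [c.real for c in idft(log_power)]

def smoothed_log_spectrum(frame, keep):
    """Cepstral smoothing: keep only the lowest `keep` cepstral coefficients
    (and their mirror images), then transform back to the frequency domain."""
    n = len(frame)
    cep = real_cepstrum(frame)
    liftered = [c if (i < keep or i > n - keep) else 0.0
                for i, c in enumerate(cep)]
    return [c.real for c in dft(liftered)]
```

A small `keep` retains only the slowly varying envelope of the log spectrum, which corresponds to the spectral shaping by the vocal tract; setting `keep` to the frame length returns the unsmoothed log power spectrum.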

Page 7

MFCC (MEL FREQUENCY CEPSTRAL COEFFICIENTS)

- The mel frequency scale reflects the frequency resolution of the human ear.
- Coefficients of the power spectrum -> mel spectral coefficients (FEATURE VECTOR)
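As a rough illustration of the mapping from power spectrum coefficients to mel spectral coefficients, the sketch below applies triangular filters spaced evenly on the mel scale. The Hz-to-mel formula is one commonly used variant, and the filter shape, function names, and parameters are assumptions of this example; the slides do not say how the project's Melfiltermatrix is built.

```python
import math

def hz_to_mel(f_hz):
    """Map frequency in Hz to mel (one common formula; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_edges(num_filters, fs_hz):
    """Band edges in Hz, evenly spaced on the mel scale from 0 to fs/2."""
    top = hz_to_mel(fs_hz / 2.0)
    return [mel_to_hz(top * k / (num_filters + 1))
            for k in range(num_filters + 2)]

def mel_spectral_coefficients(power, fs_hz, num_filters):
    """Weight a one-sided power spectrum (bins from 0 to fs/2 inclusive)
    with triangular mel filters; returns one coefficient per filter."""
    edges = mel_filter_edges(num_filters, fs_hz)
    bin_hz = (fs_hz / 2.0) / (len(power) - 1)
    coeffs = []
    for j in range(1, num_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for k, p in enumerate(power):
            f = k * bin_hz
            if left < f <= center:        # rising slope of the triangle
                total += p * (f - left) / (center - left)
            elif center < f < right:      # falling slope of the triangle
                total += p * (right - f) / (right - center)
        coeffs.append(total)
    return coeffs
```

Because the edges are equidistant in mel, the filters are narrow at low frequencies and wide at high frequencies, which mirrors the ear's decreasing frequency resolution.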

Page 8

RECOGNIZER

- One spoken word contains dozens of feature vectors (preprocessing every 10 ms of signal).
- Compute a "distance" between this unknown sequence of vectors (unknown word) and a known sequence of vectors (prototypes of words to recognize).
- PROBLEM: the vector sequences have unequal lengths.
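To see where the unequal lengths come from, the sketch below cuts a signal into frames with a 10 ms step, as stated on the slide. The 20 ms frame length and the function name are assumptions of this example; the slides give only the step size.

```python
def split_into_frames(samples, fs_hz, frame_ms=20.0, hop_ms=10.0):
    """Cut a signal into overlapping frames, one new frame every hop_ms.
    frame_ms = 20 is an assumed value; the slides only state the 10 ms step."""
    frame_len = int(round(fs_hz * frame_ms / 1000.0))
    hop = int(round(fs_hz * hop_ms / 1000.0))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames

# Two utterances of different duration at fs = 11 kHz.
short_word = [0.0] * 5500   # 0.5 s
long_word = [0.0] * 7700    # 0.7 s
print(len(split_into_frames(short_word, 11000)))  # 49 frames
print(len(split_into_frames(long_word, 11000)))   # 69 frames
```

Each frame yields one feature vector, so the same word spoken at different speeds produces sequences of different lengths, and a frame-by-frame comparison is not directly possible.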

Page 9

DYNAMIC TIME WARPING : FIND OPTIMAL ASSIGNMENT PATH

Page 10

DYNAMIC TIME WARPING : FIND OPTIMAL ASSIGNMENT PATH

Page 11

DYNAMIC TIME WARPING : FIND OPTIMAL ASSIGNMENT PATH
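The optimal assignment path can be found with dynamic programming. The sketch below is a textbook DTW with the symmetric step pattern (horizontal, vertical, diagonal) and a Euclidean local distance; it is an assumption that this resembles the project's dp_asym function, whose name suggests a different (asymmetric) step pattern. The `recognize` helper and the toy prototypes are hypothetical.

```python
import math

def euclidean(u, v):
    """Local distance between two feature vectors of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Accumulated cost of the cheapest warping path between two sequences
    of feature vectors, which may have different lengths."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: cheapest way to align the first i vectors of seq_a
    # with the first j vectors of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat seq_b[j-1]
                                 cost[i][j - 1],      # repeat seq_a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(unknown, prototypes):
    """Return the label of the prototype with the smallest DTW distance."""
    return min(prototypes,
               key=lambda label: dtw_distance(unknown, prototypes[label]))

# Toy one-dimensional "feature vectors" (hypothetical data).
prototypes = {"up": [[0], [1], [2], [3]], "down": [[3], [2], [1], [0]]}
unknown = [[0], [0], [1], [2], [2], [3]]   # "up", spoken more slowly
print(recognize(unknown, prototypes))       # up
```

The stretched utterance aligns with the "up" prototype at zero cost because DTW lets one prototype vector match several consecutive input vectors.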

Page 12

DTW : RECOGNIZING CONNECTED WORDS

Page 13

MATLAB FUNCTIONS

PRE-PROCESSING

    recordMelMatrix(3)
    S = wavread('speech.wav')
    C = Melfiltermatrix(S, N, K)
    computeMelSpectrum(C, S);

DISPLAY FEATURES

    Featuredisp.m

WORD RECOGNITION

    dp_asym(vector1, vector2)

Page 14

RESULTS

hello

hello1

Page 15

library

hello

Page 16

computer

hello

Page 17

3.0304e+003

3.5820e+003

3.4499e+003

Page 18

Welcome home (male)

Welcome home (female)

Page 19

Welcome home Welcome back

Page 20

Welcome home Computer Science

Page 21

Welcome back Computer Science

Page 22

2.6418e+003

2.9468e+003

3.8109e+003

4.6701e+003

Page 23

THANKS! ANY QUESTIONS?