EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003


No, not MS Windows®…


…not those either!


Speech windows

Speech is NONSTATIONARY


Assume speech is stationary over a ‘short’ window of time.

[Figure: speech windows over the waveform of the word ‘SEVEN’]


What is a ‘short’ window of time?
• 10 μs: smallest time difference detectable by the auditory system (localization),
• 3 ms: shortest phoneme (plosive burst),
• 10 ms: glottal pulse period,
• 100 ms: average phoneme duration,
• 4 s: exhale period during speech.

What counts as ‘short’ depends on the application.


Applications using windows

• Automatic speech recognition,
• Speech coding/decoding,
• Speaker identification,
• Text-to-speech synthesis,
• Noise reduction.

Typical window (frame) length: 20-30 ms
Typical frame rate: 100 frames/sec
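A minimal framing sketch in Python/NumPy, assuming a 25 ms window (inside the 20-30 ms range above) and a 10 ms hop, which gives 100 frames/sec. The function name `frame_signal`, the 16 kHz sampling rate, and the random placeholder signal are hypothetical choices for illustration.

```python
import numpy as np

def frame_signal(s, fs, frame_ms=25.0, rate_hz=100.0):
    """Split signal s into overlapping frames, one frame per row.

    frame_ms: window length in ms (25 ms, inside the typical 20-30 ms range)
    rate_hz:  frames per second (100 frames/sec -> 10 ms hop)
    """
    frame_len = int(round(fs * frame_ms / 1000.0))  # samples per frame
    hop = int(round(fs / rate_hz))                  # samples between frame starts
    if len(s) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(s) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    idx = starts[:, None] + np.arange(frame_len)[None, :]  # (n_frames, frame_len)
    return s[idx]

fs = 16_000                                        # hypothetical sampling rate (Hz)
s = np.random.default_rng(0).standard_normal(fs)   # 1 s placeholder signal
frames = frame_signal(s, fs)
print(frames.shape)                                # (98, 400)
```

At 16 kHz this yields 400-sample frames that start every 160 samples, so consecutive frames overlap by 240 samples.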


Short-time analysis

$x(n) = w(n)\,s(n)$

s(n): entire speech utterance

w(n): window function

x(n): frame of speech

Window function is non-zero for N samples, n=0,…,N-1
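A minimal sketch of x(n) = w(n) s(n) for one frame, comparing a rectangular window with NumPy's Hamming window. The sampling rate, frame length, frame start `n0`, and the 200 Hz placeholder "utterance" are assumptions; sliding the window to start at `n0` is shown only to pick a frame out of a longer signal.

```python
import numpy as np

fs = 16_000                      # hypothetical sampling rate (Hz)
N = 400                          # frame length: 25 ms at fs
n0 = 1600                        # hypothetical first sample of this frame
s = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)   # placeholder utterance s(n)

w_rect = np.ones(N)              # rectangular: 1 for n = 0, ..., N-1
w_hamm = np.hamming(N)           # Hamming: 0.54 - 0.46 cos(2*pi*n / (N-1))

# x(n) = w(n) s(n0 + n): the window is non-zero only over the N samples of the frame
x_rect = w_rect * s[n0:n0 + N]
x_hamm = w_hamm * s[n0:n0 + N]
```

The rectangular frame is just a copy of the samples; the Hamming frame tapers them toward 0.08 of their value at both ends.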


Short-term Fourier Transform

$X(n,\omega) = \sum_{m=-\infty}^{\infty} s(m)\, w(n-m)\, e^{-j\omega m}$

s(m): entire speech utterance

w(m): window function

X(n,ω): STFT of speech at time n

The STFT is a smoothed version of the original spectrum: multiplying by the window in time convolves the spectrum with the window's transform.
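For one time index n, the defining sum can be evaluated with a single FFT. A minimal NumPy sketch, assuming a window that is non-zero only for 0, …, N−1 (as in short-time analysis) and DFT frequencies ω_k = 2πk/n_fft; the helper name `stft_at` and the random test signal are hypothetical. The last lines check one bin against the sum written out term by term.

```python
import numpy as np

def stft_at(s, w, n, n_fft):
    """X(n, w_k) = sum_m s(m) w(n - m) exp(-j w_k m), w_k = 2*pi*k/n_fft.

    Assumes w is non-zero only for 0..N-1, so only m = n-N+1..n contribute,
    and that those indices lie inside s. Requires n_fft >= len(w).
    """
    N = len(w)
    m0 = n - N + 1                                  # first contributing sample
    seg = s[m0:n + 1] * w[::-1]                     # s(m) w(n - m) for m = m0..n
    k = np.arange(n_fft // 2 + 1)
    phase = np.exp(-2j * np.pi * k * m0 / n_fft)    # FFT indexes seg from m0, not 0
    return phase * np.fft.rfft(seg, n_fft)

rng = np.random.default_rng(1)
s = rng.standard_normal(2000)                       # placeholder real signal
w = np.hamming(256)
X = stft_at(s, w, n=1000, n_fft=512)

# brute-force evaluation of the defining sum for one frequency bin
k, m = 5, np.arange(1000 - 255, 1001)
direct = np.sum(s[m] * w[1000 - m] * np.exp(-2j * np.pi * k * m / 512))
print(np.allclose(X[k], direct))                    # True
```

Only the n_fft//2 + 1 non-negative frequencies are returned, which is enough for a real s. The phase factor affects only the phase of X(n,ω); magnitude spectra are the same with or without it.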


STFT example

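$x(n) = w(n)\,s(n) \quad\Longleftrightarrow\quad X(\omega) = \frac{1}{2\pi}\, W(\omega) * S(\omega)$, where * denotes convolution over one 2π period.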

s(n): pure sinewave of infinite length

w(n): rectangular window,

$w(n) = \begin{cases} 1, & n = 0, \dots, N-1 \\ 0, & \text{otherwise} \end{cases}$
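This example can be checked numerically. A sketch assuming a hypothetical 64-sample rectangular window and a sinewave at ω0 = 2π·10.3/64 rad/sample, chosen to fall between DFT bins. The sinewave is evaluated only where w(n) is non-zero, since x(n) is zero elsewhere, and heavy zero-padding is used to trace the continuous |X(ω)|.

```python
import numpy as np

N = 64                              # hypothetical window length
omega0 = 2 * np.pi * 10.3 / N       # hypothetical sinewave frequency, between DFT bins
n = np.arange(N)
w = np.ones(N)                      # rectangular window
x = w * np.cos(omega0 * n)          # x(n) = w(n) s(n) over the window's support

n_fft = 4096                        # zero-padding: dense samples of |X(omega)|
X = np.abs(np.fft.rfft(x, n_fft))
omega = 2 * np.pi * np.arange(len(X)) / n_fft

peak = omega[np.argmax(X)]
far = X[n_fft // 4] / X.max()       # relative level at omega = pi/2, far from omega0
print(f"peak {peak:.4f} rad/sample (omega0 = {omega0:.4f}), "
      f"level at pi/2: {20 * np.log10(far):.1f} dB")
```

The peak sits at ω0, but |X(ω)| is spread out: the single spectral line of the infinite sinewave is replaced by a shifted copy of |W(ω)|, so energy also appears far from ω0.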


STFT example

[Figure: |W(ω)| * |S(ω)| = |X(ω)|, with frequency ω0 marked on |S(ω)| and |X(ω)|]


Window types

• Rectangular
• Hann (cosine)
• Hamming (raised cosine)
• Blackman
• Kaiser-Bessel

Tradeoff between leakage and blurring


Window tradeoff
• Blurring: main-lobe width (A)
• Leakage: sidelobe suppression (B)

[Figure: window magnitude spectrum with main-lobe width A and sidelobe suppression B marked]
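Both quantities can be read off a heavily zero-padded FFT of the window. A minimal sketch, assuming symmetric NumPy windows of length 256; it measures A as the distance from the peak to the first null (in DFT bins of the unpadded window) and B as the highest sidelobe relative to the peak. The helper `lobe_stats` is hypothetical.

```python
import numpy as np

def lobe_stats(w, pad=32):
    """(first-null offset in DFT bins of len(w), peak sidelobe in dB re the main lobe)."""
    mag = np.abs(np.fft.rfft(w, pad * len(w)))
    mag = mag / mag[0]                  # a positive symmetric window peaks at DC
    k = 1
    while k < len(mag) - 1 and mag[k + 1] < mag[k]:
        k += 1                          # descend the main lobe to its first null
    return k / pad, 20 * np.log10(mag[k:].max())

N = 256
stats = {
    "rectangle": lobe_stats(np.ones(N)),
    "Hann": lobe_stats(np.hanning(N)),
    "Hamming": lobe_stats(np.hamming(N)),
}
for name, (a, b) in stats.items():
    print(f"{name:9s}  A: first null at {a:.2f} bins   B: peak sidelobe {b:.1f} dB")
```

With these settings the printed values should land near the rectangle, Hann, and Hamming rows of the popular-windows table: the tapered windows roughly double the main-lobe width in exchange for much lower sidelobes.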


Popular windows

Window          Main-lobe width (rectangle = 1)   Peak sidelobe level
Rectangle       1                                 -13 dB
Hann            2                                 -31 dB
Hamming         2                                 -43 dB
Blackman        3                                 -68 dB
Kaiser-Bessel   4                                 -91 dB
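The Kaiser-Bessel window has a shape parameter β that moves it along the blurring/leakage tradeoff. A sketch using NumPy's `np.kaiser(M, beta)`, with the same hypothetical `lobe_stats` measurement as above; the β values are arbitrary illustrations, and the slide does not say which β produced its 4-bin, -91 dB entry.

```python
import numpy as np

def lobe_stats(w, pad=32):
    """(first-null offset in DFT bins of len(w), peak sidelobe in dB re the main lobe)."""
    mag = np.abs(np.fft.rfft(w, pad * len(w)))
    mag = mag / mag[0]                  # a positive symmetric window peaks at DC
    k = 1
    while k < len(mag) - 1 and mag[k + 1] < mag[k]:
        k += 1                          # descend the main lobe to its first null
    return k / pad, 20 * np.log10(mag[k:].max())

N = 256
results = {}
for beta in (2.0, 5.0, 9.0):            # illustrative shape parameters
    results[beta] = lobe_stats(np.kaiser(N, beta))
    width, sidelobe = results[beta]
    print(f"beta={beta:3.1f}: first null {width:.2f} bins, peak sidelobe {sidelobe:.1f} dB")
```

Raising β widens the main lobe (more blurring) while pushing the sidelobes down (less leakage), so one window family covers the range that the fixed windows in the table sample at discrete points.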


Practical issues

• Rule of thumb:
  – Time domain: use a rectangular window.
  – Frequency domain: use a Hamming window.

• Why?


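Time domain issues
• Tapered windows interfere with correlation in the time domain.

20 ms /eh/, male utterance, pitch measurement (normalized autocorrelation).

The first side peak is lower with the Hamming window: the taper shrinks the frame's effective length, so less of the signal overlaps itself at the pitch-period lag.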
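The effect can be reproduced on a synthetic frame. A sketch assuming a hypothetical 16 kHz sampling rate, a 20 ms frame, and a voiced-like signal built from five harmonics of a 125 Hz fundamental (pitch period P = 128 samples) in place of the recorded /eh/; `normalized_autocorr` is a hypothetical helper.

```python
import numpy as np

fs = 16_000                      # hypothetical sampling rate (Hz)
N = int(0.020 * fs)              # 20 ms frame -> 320 samples
P = 128                          # hypothetical pitch period: 8 ms (125 Hz)
n = np.arange(N)
# hypothetical voiced-like frame: harmonics 1..5 of 125 Hz with 1/h amplitudes
s = sum(np.cos(2 * np.pi * h * n / P) / h for h in range(1, 6))

def normalized_autocorr(x, lag):
    """r(lag) / r(0), with r(k) = sum_n x(n) x(n + k) over the frame."""
    return np.dot(x[:len(x) - lag], x[lag:]) / np.dot(x, x)

r_rect = normalized_autocorr(np.ones(N) * s, P)
r_hamm = normalized_autocorr(np.hamming(N) * s, P)
print(f"side peak at lag {P}: rectangle {r_rect:.2f}, Hamming {r_hamm:.2f}")
```

With the rectangular window the value at lag P reflects how much of the frame still overlaps itself after a shift of one period; the Hamming taper down-weights the frame edges, so that overlap contributes less and the side peak comes out lower, which makes the pitch peak harder to pick out.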


Frequency domain issues

fs = 12.5 kHz, /eh/, 800 samples, male speaker.

Blurring/leakage tradeoff evidence: