EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003.


No, not MS Windows®…

…not those either!

Speech windows

Speech is NONSTATIONARY

Assume speech is stationary over a ‘short’ window of time.

[Figure: the utterance ‘SEVEN’]

Speech windows

What is a ‘short’ window of time?
• 10 μs: smallest difference detectable by auditory system (localization),
• 3 ms: shortest phoneme (plosive burst),
• 10 ms: glottal pulse period,
• 100 ms: average phoneme duration,
• 4 s: exhale period during speech.

‘Short’ depends on application.

Applications using windows

• Automatic speech recognition,
• Speech coding/decoding,
• Speaker identification,
• Text-to-speech synthesis,
• Noise reduction

Typical window (frame) length: 20-30 ms
Typical frame rate: 100 frames/sec
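As a rough worked example of what these typical values mean in samples, the sketch below assumes a 20 ms frame (the low end of the typical range) and a 12.5 kHz sampling rate (the rate quoted in the frequency-domain example of this lecture). The variable names are illustrative only.

```python
fs = 12500            # sampling rate in Hz (assumed)
frame_ms = 20         # frame length in ms (assumed, low end of 20-30 ms)
frame_rate = 100      # frames per second (typical value from the slide)

hop = fs // frame_rate               # samples between successive frame starts
frame_len = fs * frame_ms // 1000    # samples per frame
overlap = frame_len - hop            # samples shared by neighbouring frames

print(hop, frame_len, overlap)       # 125 250 125
```

Under these assumptions, neighbouring frames overlap by half a frame.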

Short-time analysis

$x(n) = w(n)\,s(n)$

s(n): entire speech utterance

w(n): window function

x(n): frame of speech

Window function is non-zero for N samples, n=0,…,N-1
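A minimal sketch of short-time analysis in NumPy follows. The slide defines a single frame; sliding the window along the utterance in steps of `hop` samples is an assumption added here, and `frame_signal` is a hypothetical helper name, not something from the lecture.

```python
import numpy as np

def frame_signal(s, window, hop):
    """Return a 2-D array whose rows are windowed frames w(n) * s(n + start)."""
    s = np.asarray(s, dtype=float)
    N = len(window)
    if len(s) < N:
        raise ValueError("signal is shorter than one window")
    n_frames = 1 + (len(s) - N) // hop
    starts = hop * np.arange(n_frames)
    # Row i holds the N consecutive samples of s that start at starts[i].
    idx = starts[:, None] + np.arange(N)[None, :]
    return s[idx] * window

s = np.arange(1000.0)                        # stand-in for a speech utterance
frames = frame_signal(s, np.ones(250), hop=125)
print(frames.shape)                          # (7, 250)
```

Samples left over after the last complete frame are dropped in this sketch; padding the signal is another common choice.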

Short-term Fourier Transform

$X(n, \omega) = \sum_{m=-\infty}^{\infty} s(m)\, w(n - m)\, e^{-j\omega m}$

s(m): entire speech utterance

w(m): window function

X(n,ω): STFT of speech at time n

STFT is a smoothed version of original spectrum.
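The defining sum can be evaluated literally for one time index n and one frequency ω, as in the sketch below. It assumes the window array `w` holds w(0), …, w(N-1) and is zero elsewhere; `stft_point` and the cosine test signal are hypothetical, chosen only for illustration.

```python
import numpy as np

def stft_point(s, w, n, omega):
    """Directly evaluate X(n, omega) = sum_m s(m) w(n - m) exp(-j omega m)."""
    N = len(w)
    total = 0j
    # w(n - m) is non-zero only for 0 <= n - m <= N - 1, i.e. m = n-N+1, ..., n.
    for m in range(max(0, n - N + 1), min(len(s), n + 1)):
        total += s[m] * w[n - m] * np.exp(-1j * omega * m)
    return total

# Hypothetical test signal: a cosine at omega0 = 0.2*pi rad/sample.
omega0 = 0.2 * np.pi
s = np.cos(omega0 * np.arange(400))
w = np.hamming(50)

on_tone = abs(stft_point(s, w, n=200, omega=omega0))
off_tone = abs(stft_point(s, w, n=200, omega=3 * omega0))
print(on_tone > off_tone)   # True
```

In practice the same magnitudes are usually obtained by taking an FFT of each windowed frame; the direct sum is shown only because it mirrors the definition term by term.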

STFT example

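$x(n) = w(n)\,s(n) \;\Leftrightarrow\; X(\omega) = W(\omega) * S(\omega)$

(* denotes convolution in frequency, up to a constant scale factor.)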

s(n): pure sinewave of infinite length

w(n): rectangular window:

$w(n) = \begin{cases} 1, & n = 0, \ldots, N-1 \\ 0, & \text{otherwise} \end{cases}$

STFT example

|W(ω)| * |S(ω)| = |X(ω)|

[Figure: magnitude spectra |W(ω)|, |S(ω)|, and |X(ω)|, with ω0 marked on the frequency axes.]
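The example can be checked numerically. The sketch below makes one simplifying assumption: the tone is a complex exponential rather than a real sinewave, so that S(ω) is a single line at ω0. The window length, FFT size, and tone bin are arbitrary choices.

```python
import numpy as np

N, nfft, k0 = 64, 1024, 200            # window length, FFT size, tone bin (all assumed)
omega0 = 2 * np.pi * k0 / nfft

w = np.ones(N)                          # rectangular window, n = 0, ..., N-1
s = np.exp(1j * omega0 * np.arange(N))  # the N samples of the tone that the window keeps

W = np.fft.fft(w, nfft)                 # window spectrum W(omega), zero-padded
X = np.fft.fft(w * s, nfft)             # spectrum of the windowed tone

print(np.argmax(np.abs(X)) == k0)       # True: the peak sits at omega0
print(np.allclose(X, np.roll(W, k0)))   # True: X is W shifted to omega0
```

Convolving W(ω) with a single spectral line simply moves a copy of the window spectrum to ω0, so the shape of |X(ω)| is set entirely by the window.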

Window types

• Rectangular
• Hann (cosine)
• Hamming (raised cosine)
• Blackman
• Kaiser-Bessel

Tradeoff between leakage and blurring
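For reference, the Hann and Hamming windows can be written out explicitly. The sketch below uses the symmetric definitions (the same ones NumPy implements as `np.hanning` and `np.hamming`); the lecture does not say which variant it uses, so that choice is an assumption. `np.blackman` and `np.kaiser` provide the Blackman and Kaiser-Bessel windows.

```python
import numpy as np

def hann(N):
    """Symmetric Hann window: a cosine taper that reaches zero at both ends."""
    n = np.arange(N)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / (N - 1))

def hamming(N):
    """Symmetric Hamming window: a raised cosine that stops at 0.08 at the ends."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

print(np.round(hann(5), 2))      # [0.  0.5 1.  0.5 0. ]
print(np.round(hamming(5), 2))   # [0.08 0.54 1.   0.54 0.08]
```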

Window tradeoff

• Blurring: main lobe width A
• Leakage: side lobe suppression B

[Figure: window spectrum with main lobe width A and side lobe suppression B marked.]

Popular windows

Window          Unit BW   Sidelobe
Rectangle       1         -13 dB
Hann            2         -31 dB
Hamming         2         -43 dB
Blackman        3         -68 dB
Kaiser-Bessel   4         -91 dB
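The first three rows can be roughly reproduced numerically. The sketch below reads "Unit BW" as main lobe width relative to the rectangular window, which is an assumption, and uses a 64-sample window and NumPy's symmetric window definitions, also assumptions. Measured values depend on both choices, so they will only approximately match the table.

```python
import numpy as np

def lobe_stats(window, nfft=1 << 14):
    """Return (index of first spectral null, peak sidelobe level in dB)."""
    mag = np.abs(np.fft.rfft(window, nfft))
    mag /= mag[0]                      # normalise so the main-lobe peak is 0 dB
    k = 0
    # Walk down the main lobe until the magnitude stops decreasing.
    while k + 1 < len(mag) and mag[k + 1] < mag[k]:
        k += 1
    return k, 20 * np.log10(mag[k:].max())

N = 64
for name, w in [("Rectangle", np.ones(N)),
                ("Hann", np.hanning(N)),
                ("Hamming", np.hamming(N))]:
    null, sidelobe = lobe_stats(w)
    print(f"{name:10s} first null at bin {null}, peak sidelobe {sidelobe:.1f} dB")
```

Dividing each first-null index by the rectangular window's gives the relative main lobe width.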

Practical issues

• Rule of thumb:
  – Time domain, use Rectangle window
  – Freq domain, use Hamming window

• Why?

Time domain issues

• Correlation in the time domain is disturbed by tapered windows

20 ms /eh/, male utterance, pitch measurement (normalized autocorrelation).

First side peak lower using Hamming window
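The effect can be illustrated on synthetic data. The sketch below does not use the lecture's /eh/ recording: it assumes a stand-in signal made of three harmonics of a 125 Hz pitch, sampled at 12.5 kHz, and compares the normalized autocorrelation at the pitch-period lag for the two windows.

```python
import numpy as np

def normalized_autocorr(x):
    """Autocorrelation of x for lags 0..len(x)-1, scaled so that lag 0 equals 1."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

fs = 12500                      # sampling rate in Hz (assumed)
N = fs * 20 // 1000             # 20 ms frame -> 250 samples
period = 100                    # pitch period in samples, i.e. 125 Hz (assumed)
n = np.arange(N)
# Synthetic stand-in for voiced speech: three harmonics of the pitch frequency.
s = sum(a * np.sin(2 * np.pi * k * n / period)
        for k, a in [(1, 1.0), (2, 0.5), (3, 0.25)])

r_rect = normalized_autocorr(s)                   # rectangular window
r_hamm = normalized_autocorr(s * np.hamming(N))   # Hamming window

print(f"rectangular: {r_rect[period]:.2f}, Hamming: {r_hamm[period]:.2f}")
```

The taper down-weights the samples at the frame edges, which are exactly the ones that pair up at long lags, so the peak at the pitch period shrinks and becomes harder to detect.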

Frequency domain issues

fs = 12.5 kHz, /eh/, 800 samples, male speaker.

Blurring/leakage tradeoff evidence:
