SPOKEN LANGUAGE SYSTEMS MIT Laboratory for Computer Science Gammachirp Auditory Filter Alex Park May 7th, 2003

Jan 04, 2016


Lorin Carr
Transcript
Page 1

SPOKEN LANGUAGE SYSTEMS

MIT Laboratory for Computer Science

Gammachirp Auditory Filter

Alex Park

May 7th, 2003

Page 2: Project Overview

• Goal:

– Investigate use of (non-linear) auditory filters for speech analysis

• Background:

– Sound analysis in auditory periphery similar to wavelet transform

• Comparison:

– Traditional Short-Time Fourier analysis

– Gammatone wavelet based analysis (auditory filter)

• Extension:

– Gammachirp filter has level-dependent parameters which can model non-linear characteristics of auditory periphery

• Implementation:

– Specifics of Gammachirp implementation

– How to incorporate level dependency

Page 3: Auditory Physiology

• Sound pressure variation in the air is transmitted through the outer and middle ear onto the basal end of the cochlea

• The basilar membrane, which runs along the length of the cochlea, maps frequency to the place of maximal displacement

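[Figure: auditory pathway through the outer ear, middle ear, and cochlea (basilar membrane) to the auditory nerve and cortex; the basilar membrane is labeled with low freq (200 Hz) and high freq (20 kHz) ends]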

Page 4: Motivation – Why better auditory models?

• Automatic Speech Recognition (ASR)

– ASR systems perform adequately in ‘clean’ conditions

– Robustness is a major problem; degradation in low SNR conditions is much worse than for humans

• Hearing research

– Build better hearing aids and cochlear implants

– Hearing impaired subjects with a damaged cochlea have trouble understanding speech in noisy environments

– Current hearing aids perform linear amplification, which amplifies the noise as well as the signal

• Is the lack of compressive non-linearity in the front-end a common link?

Page 5: Non-stationary Nature of Speech

• Why is speech a good candidate for local frequency analysis?

[Figure: waveform of the word “tapestry” (time axis 0 to 0.8), with segments labeled /t/ (transient), /ae/ (tone), and /s/ (noise)]

Page 6: Time-Frequency Representation

• The most common way of representing changing spectral content is the Short Time Fourier Transform (STFT)

[Figure: a short segment of the waveform (time axis) is passed through an FFT to give a power spectrum over frequency (0 to 8000 Hz)]
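The STFT described above can be sketched in a few lines. This is a minimal illustration, not the code used for the slides: the 16 kHz sampling rate is inferred from the 0 to 8000 Hz frequency axis, while the 25 ms Hamming window, the 10 ms hop, and the function name `stft_power` are assumptions.

```python
import numpy as np

def stft_power(x, fs, frame_len=0.025, hop=0.010):
    """Return (times, freqs, power) for a Hamming-windowed STFT.

    frame_len and hop are in seconds; both values are assumed defaults.
    """
    n = int(round(frame_len * fs))           # samples per frame
    h = int(round(hop * fs))                 # samples between frame starts
    window = np.hamming(n)
    starts = np.arange(0, len(x) - n + 1, h)
    # Cut the signal into overlapping frames and taper each one.
    frames = np.stack([x[s:s + n] * window for s in starts])
    # One FFT per frame; rfft keeps only the non-negative frequencies.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    times = (starts + n / 2) / fs            # frame centres in seconds
    return times, freqs, power

# Usage: a 1 kHz tone should concentrate its power in the 1 kHz bin.
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
x = np.sin(2 * np.pi * 1000 * t)
times, freqs, power = stft_power(x, fs)
peak_hz = freqs[np.argmax(power.mean(axis=0))]
```

Plotting `10 * np.log10(power.T)` against `times` and `freqs` gives a spectrogram of the kind shown on the next slide.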

Page 7: Spectrogram from STFT

[Figure: STFT spectrogram of “tapestry”]

Page 8: STFT Characteristics

• We can think of the STFT as filtering using the following basis

[Figure: STFT basis functions]

• In the frequency domain, we are using a filterbank consisting of linearly spaced, constant bandwidth filters

[Figure: STFT filterbank frequency responses, gain (dB, -40 to 0) vs. Freq (Hz, 0 to 2000)]
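The filtering view of the STFT can be checked numerically. The sketch below (frame length, window, and bin index are arbitrary assumptions) shows that one STFT bin is the inner product of the frame with a windowed complex exponential, and that the bin centre frequencies are evenly spaced.

```python
import numpy as np

fs = 16000                      # assumed sampling rate
n = 400                         # assumed frame length in samples
window = np.hamming(n)
rng = np.random.default_rng(0)
frame = rng.standard_normal(n)  # any frame of signal will do

# Basis function for bin k: the window times a complex exponential.
k = 25
basis = window * np.exp(2j * np.pi * k * np.arange(n) / n)

# Projecting the frame onto the basis gives the same value as the FFT bin.
projection = np.sum(frame * np.conj(basis))
fft_bin = np.fft.fft(frame * window)[k]

# Every bin uses the same window, so the bandwidth is constant,
# and the centre frequencies are linearly spaced by fs / n.
centers = np.fft.rfftfreq(n, d=1.0 / fs)
spacing = np.diff(centers)
```

Because the same window shapes every basis function, the only thing that changes from bin to bin is the carrier frequency, which is why the filters have identical bandwidths.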

Page 9: Auditory Filterbanks

• In contrast to the STFT filterbank, physiological data indicate that auditory filters:

– are spaced more closely at lower freq than at high freq

– have narrower bandwidths at lower frequencies (constant-Q)

• The Gammatone filter bank proposed by Patterson models these characteristics using a wavelet transform.

• The mother wavelet, or kernel function, is

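$$g_t(t) = a\,t^{n-1}\exp\bigl(-2\pi b\,\mathrm{ERB}(f_c)\,t\bigr)\,\exp\bigl(j 2\pi f_c t\bigr)$$

Gamma Envelope: $a\,t^{n-1}\exp(-2\pi b\,\mathrm{ERB}(f_c)\,t)$. Tone carrier: $\exp(j 2\pi f_c t)$.

[Figure: Gammatone impulse response, time axis 0 to 0.5]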
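A minimal sketch of the Gammatone kernel, evaluated on a sampled time axis. The slides do not give parameter values: the ERB formula (Glasberg and Moore form), the order n = 4, and b = 1.019 are commonly used choices assumed here, and the function names are hypothetical.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg & Moore form)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, n=4, b=1.019, a=1.0):
    """Complex Gammatone impulse response: Gamma envelope times tone carrier."""
    t = np.arange(1, int(duration * fs) + 1) / fs
    envelope = a * t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    carrier = np.exp(2j * np.pi * fc * t)
    return t, envelope * carrier

# Usage: a 1 kHz channel sampled at 16 kHz.
t, g = gammatone_ir(1000.0, 16000)
# The envelope peaks at t = (n - 1) / (2 * pi * b * ERB(fc)).
t_peak = t[np.argmax(np.abs(g))]
```

Taking `g.real` gives the real-valued impulse response usually plotted; the complex form is kept here to match the kernel as written on the slide.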

Page 10: Gammatone Characteristics

• Unlike the STFT, the Gammatone filterbank uses the following basis

• The corresponding frequency responses are

[Figure: Gammatone filterbank frequency responses, gain (dB, -80 to 0) vs. Freq (Hz, 0 to 1500)]
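The closer spacing of filters at low frequencies can be illustrated by placing centre frequencies uniformly on an ERB-rate scale. This is one common choice, not something stated on the slides; the ERB-rate formula, the 100 Hz to 6000 Hz range, and the 24 channels are assumptions.

```python
import numpy as np

def hz_to_erb_rate(f):
    """Map frequency in Hz to ERB-rate (assumed Glasberg & Moore form)."""
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_spaced_centers(f_low, f_high, num_channels):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    e = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), num_channels)
    return erb_rate_to_hz(e)

centers = erb_spaced_centers(100.0, 6000.0, 24)
gaps = np.diff(centers)   # grows with frequency: dense at low freq, sparse at high
```

Compare this with the STFT, where `gaps` would be a constant `fs / n` across the whole frequency range.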

Page 11: What are we missing?

• The Gammatone filterbank has constant-Q bandwidths and logarithmic spacing of center frequencies

• Also, the Gamma envelope decays exponentially, so each filter is well localized in time (effectively compact support)

• But, the filters are 1) symmetric and 2) linear

• Psychophysical experiments indicate that auditory filter shapes are:

1) Asymmetric

* Sharper drop-off on high frequency side

2) Non-linear

* Filter shape and gain change depending on input level

* Compressive non-linearity of the cochlea

* Important for hearing in noise and for dynamic range

Page 12: Gammachirp Characteristics

• The Gammachirp filter developed by Irino & Patterson uses a modified version of the Gammatone kernel

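$$g_c(t) = a\,t^{n-1}\exp\bigl(-2\pi b\,\mathrm{ERB}(f_c)\,t\bigr)\,\exp\bigl(j 2\pi f_c t + j c \ln t\bigr)$$

Gamma Envelope: $a\,t^{n-1}\exp(-2\pi b\,\mathrm{ERB}(f_c)\,t)$. Tone carrier: $j 2\pi f_c t$ in the exponent. Chirp term: $j c \ln t$ in the exponent.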

• Frequency response is asymmetric and can fit the passive filter shape

• Level-dependent parameters can fit changes due to stimulus level

[Figure: Gammachirp impulse response (time axis 0 to 0.5) and frequency response, gain (dB, -80 to 0) vs. Freq (Hz, 0 to 1500)]
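A minimal sketch of the Gammachirp kernel: the Gammatone kernel with an extra `c * ln(t)` phase term. The value c = -2, the ERB formula, n = 4, and b = 1.019 are illustrative assumptions rather than values from the slides, and the function names are hypothetical.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg & Moore form)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammachirp_ir(fc, fs, c=-2.0, duration=0.05, n=4, b=1.019, a=1.0):
    """Complex Gammachirp impulse response."""
    # Start at t = 1/fs so that ln(t) is finite.
    t = np.arange(1, int(duration * fs) + 1) / fs
    envelope = a * t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    phase = 2 * np.pi * fc * t + c * np.log(t)   # tone carrier + chirp term
    return t, envelope * np.exp(1j * phase)

def instantaneous_freq(fc, c, t):
    """Derivative of the phase divided by 2*pi: fc + c / (2*pi*t)."""
    return fc + c / (2 * np.pi * t)

t, g = gammachirp_ir(1000.0, 16000, c=-2.0)
f_inst = instantaneous_freq(1000.0, -2.0, t)   # glides up towards fc when c < 0
```

Setting `c=0.0` recovers the Gammatone kernel exactly, so the chirp parameter is the only difference between the two filters. The chirp term changes the phase only; the envelope is unchanged.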

Page 13: Implementation

• Looking in the frequency domain, the Gammachirp can be obtained by cascading a fixed Gammatone filter with an asymmetric filter

• To fit psychophysical data, a fixed Gammachirp is cascaded with level-dependent asymmetric IIR filters

[Figure: Filter Gain (dB) vs. Frequency (Hz, 0 to 2500), showing the Gammatone, the Asymmetric Compensation Filter, and the resulting Gammachirp]

[Figure: Filter Gain (dB) vs. Frequency (Hz, 200 to 2000), showing the Passive Gammachirp, the level dependent asymmetries, and the resulting level dependent chirps]
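The cascade in the first bullet can be written out for the complex Gammachirp kernel, whose magnitude response factors into a symmetric Gammatone term and an asymmetric term `exp(c * theta(f))` with `theta(f) = arctan((f - fc) / (b * ERB(fc)))`. The sketch below uses that closed form; the parameter values are assumptions, and the level-dependent IIR filters from the second bullet are not modelled.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg & Moore form)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_mag(f, fc, n=4, b=1.019):
    """Gammatone magnitude response, normalized to 1 at fc (symmetric about fc)."""
    bw = b * erb(fc)
    return (1.0 + ((f - fc) / bw) ** 2) ** (-n / 2.0)

def asymmetric_mag(f, fc, c, b=1.019):
    """Asymmetric term exp(c * theta); attenuates f > fc when c < 0."""
    theta = np.arctan((f - fc) / (b * erb(fc)))
    return np.exp(c * theta)

def gammachirp_mag(f, fc, c, n=4, b=1.019):
    """Gammachirp magnitude as the cascade of the two terms above."""
    return gammatone_mag(f, fc, n, b) * asymmetric_mag(f, fc, c, b)

# Usage: compare gain 200 Hz above and below a 1 kHz centre frequency.
f = np.linspace(0.0, 2500.0, 501)
gain_db = 20 * np.log10(gammachirp_mag(f, 1000.0, c=-2.0))
above = gammachirp_mag(1200.0, 1000.0, c=-2.0)
below = gammachirp_mag(800.0, 1000.0, c=-2.0)   # larger than `above` when c < 0
```

With a negative `c` the response falls off faster on the high frequency side, matching the asymmetry described on Page 11.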

Page 14: Comparison: Tone vs. Passive Chirp outputs

[Figure: Passive Gammachirp Spectrogram and Gammatone Spectrogram, Time (s) axis 0.1 to 0.8]

• Gammatone output seems to have better frequency resolution

• Passive Gammachirp output seems to have better time resolution

Page 15: Comparison: Tone vs. Active Chirp Outputs

[Figure: Active Gammachirp output and Gammatone output]

Page 16: Incorporating level dependency

• As illustrated in the previous slides, passive Gammachirp output offers little advantage on clean speech when fixed stimulus levels are used

• We can incorporate parameter control via feedback

[Diagram: Compute Passive GC Spectrogram → Segment into frames → for each time frame: Get stimulus level/channel → Filter w/ level specific filter (S1, S2, ..., SN-1, SN) → Reconstruct Frames]
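The frame-by-frame loop in the diagram might be sketched as follows. This is a structural illustration only: the level-specific filters S1..SN are represented by hypothetical callables (simple compressive gains here), and the frame length, level thresholds, and overlap-add reconstruction are all assumptions rather than the implementation used for the slides.

```python
import numpy as np

def level_dependent_pass(passive, level_edges_db, level_filters, frame_len=256):
    """Apply a level-specific filter to each frame of each channel.

    passive:        (channels, samples) passive filterbank outputs
    level_edges_db: increasing thresholds that separate the level bins
    level_filters:  len(level_edges_db) + 1 callables (hypothetical S1..SN)
    """
    hop = frame_len // 2
    # Periodic Hann window: overlapping copies sum to 1 at 50% overlap.
    window = np.hanning(frame_len + 1)[:-1]
    n_ch, n_samp = passive.shape
    out = np.zeros((n_ch, n_samp))
    for start in range(0, n_samp - frame_len + 1, hop):   # for each time frame
        for ch in range(n_ch):
            frame = passive[ch, start:start + frame_len]
            # Stimulus level for this channel and frame, in dB.
            level_db = 10 * np.log10(np.mean(frame ** 2) + 1e-12)
            idx = int(np.digitize(level_db, level_edges_db))
            # Filter with the level-specific filter, then overlap-add.
            out[ch, start:start + frame_len] += window * level_filters[idx](frame)
    return out

# Usage: a loud channel and a quiet channel with three stand-in "filters".
fs = 16000
t = np.arange(2048) / fs
passive = np.stack([1.0 * np.sin(2 * np.pi * 440 * t),
                    0.01 * np.sin(2 * np.pi * 440 * t)])
level_edges_db = [-40.0, -20.0]
level_filters = [lambda f: 1.0 * f, lambda f: 0.5 * f, lambda f: 0.25 * f]
out = level_dependent_pass(passive, level_edges_db, level_filters)
```

In a fuller version each callable would apply a level-specific asymmetric filter rather than a gain; the gains are used here only to keep the control flow visible. The first and last half-frames are covered by a single window and are therefore tapered.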

Page 17: Sample outputs

[Figure: sample outputs for clean speech and at 40 dB, 30 dB, and 20 dB SNR]
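Test signals at a given SNR can be produced by scaling a noise signal relative to the speech power. The slides do not say what kind of noise was used; the sketch below assumes additive white Gaussian noise, and the function name and stand-in signal are hypothetical.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(len(signal))
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Rescale using the measured noise power so the SNR is met exactly.
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise

# Usage: the three noisy conditions shown on the slide.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)          # stand-in for a speech waveform
noisy = {snr: add_noise_at_snr(clean, snr, rng) for snr in (40, 30, 20)}
```

Each entry of `noisy` can then be passed through the same filterbank as the clean signal to compare outputs across conditions.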

Page 18: References

• Bleeck, S., Patterson, R.D., and Ives, T. (2003) Auditory Image Model for Matlab. Centre for the Neural Basis of Hearing. http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/

• Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory filter for both physiological and psychophysical data,” J. Acoust. Soc. Am. 109, 2008-2022.

• Pickles, J.O. (1988). An Introduction to the Physiology of Hearing (Academic, London).

• Slaney, M. (1993). “An efficient implementation of the Patterson-Holdsworth auditory filterbank,” Apple Computer Technical Report #35.

• Slaney, M. (1998). “Auditory Toolbox for Matlab,” Interval Research Technical Report #1998-010. http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/

Page 19: Sidenote

[Figure: outputs for clean speech, 40 dB SNR, and 30 dB SNR]