SPOKEN LANGUAGE SYSTEMS MIT Laboratory for Computer Science Gammachirp Auditory Filter Alex Park May 7th, 2003.

SPOKEN LANGUAGE SYSTEMS

MIT Laboratory for Computer Science

Gammachirp Auditory Filter

Alex Park

May 7th, 2003


Project Overview

• Goal:
  – Investigate the use of (non-linear) auditory filters for speech analysis

• Background:
  – Sound analysis in the auditory periphery is similar to a wavelet transform

• Comparison:
  – Traditional Short-Time Fourier analysis
  – Gammatone wavelet-based analysis (auditory filter)

• Extension:
  – The Gammachirp filter has level-dependent parameters that can model non-linear characteristics of the auditory periphery

• Implementation:
  – Specifics of the Gammachirp implementation
  – How to incorporate level dependency


Auditory Physiology

• Sound pressure variations in the air are conducted through the outer and middle ears to the base of the cochlea

• The basilar membrane, which runs the length of the cochlea, is maximally displaced at a different place for each frequency, so place of maximal displacement encodes frequency

[Figure: pathway from the outer ear, middle ear, and cochlea (basilar membrane, with places tuned from high freq (20 kHz) to low freq (200 Hz)) to the auditory nerve and cortex]
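To make the place-to-frequency mapping concrete, here is a minimal sketch using Greenwood's commonly cited place-frequency function for the human cochlea. The function and its constants (A = 165.4 Hz, a = 2.1, k = 0.88, with position measured as a fraction of basilar-membrane length from the apex) are assumptions brought in for illustration; they are not taken from these slides.

```python
import numpy as np

# Assumed constants for Greenwood's human place-frequency function
# (not from the slides).
A_HZ = 165.4   # scaling constant in Hz
ALPHA = 2.1    # slope, with position normalized to [0, 1]
K = 0.88       # integration constant

def place_to_frequency(x):
    """Characteristic frequency (Hz) at relative position x along the
    basilar membrane, where x = 0 is the apex and x = 1 is the base."""
    x = np.asarray(x, dtype=float)
    return A_HZ * (10.0 ** (ALPHA * x) - K)

def frequency_to_place(f_hz):
    """Inverse map: relative position (0 = apex, 1 = base) tuned to f_hz."""
    f_hz = np.asarray(f_hz, dtype=float)
    return np.log10(f_hz / A_HZ + K) / ALPHA
```

With these constants the apex maps to about 20 Hz and the base to about 20.7 kHz, and the map is roughly exponential in place, which is one physiological motivation for log-like frequency spacing in auditory filterbanks.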


Motivation: Why better auditory models?

• Automatic Speech Recognition (ASR):
  – ASR systems perform adequately in ‘clean’ conditions
  – Robustness is a major problem; performance degrades at low SNR much more than human recognition does

• Hearing research:
  – Build better hearing aids and cochlear implants
  – Hearing-impaired listeners with cochlear damage have trouble understanding speech in noisy environments
  – Current hearing aids perform linear amplification, so they amplify the noise as well as the signal

• Is the lack of compressive non-linearity in the front-end a common link?


Non-stationary Nature of Speech

• Why is speech a good candidate for local frequency analysis?

[Figure: waveform of the word “tapestry”, with segments labeled /t/ (transient), /ae/ (tone), and /s/ (noise)]


Time-Frequency Representation

• The most common way of representing changing spectral content is the Short-Time Fourier Transform (STFT)

[Figure: a short segment of the waveform is passed through an FFT to give its power spectrum (frequency axis 0–8000 Hz)]
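The window-then-FFT step in the figure, repeated frame by frame, can be sketched in a few lines of NumPy. The 25 ms frame (400 samples at 16 kHz), 10 ms hop, and Hann window are illustrative assumptions, not the settings used for these slides.

```python
import numpy as np

def stft_power(x, fs, frame_len=400, hop=160):
    """Power spectrogram of x via the short-time Fourier transform.

    Each frame of frame_len samples (hop samples apart) is multiplied by a
    Hann window and transformed with a real FFT. Returns (times, freqs, power)
    where power has shape (n_frames, frame_len // 2 + 1).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)                      # taper each frame
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)      # bin k is at k * fs / frame_len
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centers (s)
    return times, freqs, power
```

Plotting `10 * np.log10(power.T)` against `times` and `freqs` gives a spectrogram like the one on the next slide.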


Spectrogram from STFT

[Figure: STFT spectrogram of “tapestry”]


STFT Characteristics

• We can think of the STFT as filtering with a basis of identically windowed sinusoids, all of the same duration

• In the frequency domain, this amounts to a filterbank of linearly spaced, constant-bandwidth filters

[Figure: STFT filter magnitude responses, gain (dB) vs. frequency (0–2000 Hz)]
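Viewed as a filterbank, an N-point STFT has one channel centered at each k·fs/N, so channels are uniformly spaced, and every channel is the same window shifted in frequency, so every channel has the same bandwidth. A short sketch, assuming a periodic Hann window (an illustrative choice), computes the center frequencies and the equivalent noise bandwidth (ENBW) shared by all channels:

```python
import numpy as np

def stft_filterbank_summary(frame_len, fs):
    """Center frequencies and bandwidth of the STFT viewed as a filterbank.

    Channel k is the window modulated to k * fs / frame_len, so all channels
    share one bandwidth. Bandwidth is reported as the equivalent noise
    bandwidth (ENBW) of a periodic Hann window, in Hz.
    """
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)  # periodic Hann
    centers = np.arange(frame_len // 2 + 1) * fs / frame_len
    enbw_bins = frame_len * np.sum(window ** 2) / np.sum(window) ** 2
    enbw_hz = enbw_bins * fs / frame_len
    return centers, enbw_hz
```

For a 400-sample frame at 16 kHz, the channels sit every 40 Hz and each has an ENBW of 1.5 bins (60 Hz), whether it is centered at 100 Hz or at 7 kHz.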


Auditory Filterbanks

• Unlike the STFT filters, physiological data indicate that auditory filters:
  – are spaced more closely at low frequencies than at high frequencies
  – have narrower bandwidths at low frequencies (roughly constant-Q)
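Both properties can be quantified with the equivalent rectangular bandwidth (ERB). The sketch below assumes the Glasberg and Moore (1990) formulas, ERB(f) = 24.7(4.37f/1000 + 1) Hz and the ERB-rate scale 21.4·log10(4.37f/1000 + 1), which are widely used with Gammatone filterbanks but are not given on these slides. Spacing center frequencies uniformly on the ERB-rate scale reproduces both properties.

```python
import numpy as np

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz) at center frequency f_hz."""
    return 24.7 * (4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """ERB-rate: number of ERBs below f_hz."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) * 1000.0 / 4.37

def erb_spaced_centers(f_low, f_high, n_channels):
    """Center frequencies equally spaced on the ERB-rate scale."""
    e = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_channels)
    return erb_rate_to_hz(e)

if __name__ == "__main__":
    fc = erb_spaced_centers(100.0, 8000.0, 8)
    for f, bw in zip(fc, erb_hz(fc)):
        print(f"fc = {f:7.1f} Hz   ERB = {bw:6.1f} Hz   Q = {f / bw:4.2f}")
```

The spacing in Hz and the bandwidth both grow with frequency, while Q = fc/ERB rises from under 3 at 100 Hz toward about 9 at high frequencies, so "constant-Q" holds only approximately, and best above about 1 kHz.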

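• The Gammatone filterbank, proposed by Patterson, models these characteristics using a wavelet transform.

• The mother wavelet, or kernel function, is

$$ g_{\mathrm{GT}}(t) = a\,t^{n-1}\exp\left(-2\pi b\,\mathrm{ERB}(f_c)\,t\right)\exp\left(j2\pi f_c t\right) $$

where $a\,t^{n-1}\exp(-2\pi b\,\mathrm{ERB}(f_c)\,t)$ is the gamma envelope and $\exp(j2\pi f_c t)$ is the tone carrier, with amplitude $a$, filter order $n$, bandwidth parameter $b$, and center frequency $f_c$.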
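The kernel can be sampled directly. In this sketch the order n = 4 and b = 1.019 are values commonly used for Gammatone filterbanks (assumptions here, not stated on the slides), ERB(f_c) uses the Glasberg and Moore formula 24.7(4.37f_c/1000 + 1), and a is chosen so that the sampled filter has unit gain at f_c.

```python
import numpy as np

def erb_hz(f_hz):
    """Glasberg & Moore ERB (Hz), assumed here."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammatone_kernel(fc, fs, n=4, b=1.019, duration=0.05):
    """Complex Gammatone impulse response
    g(t) = a t^(n-1) exp(-2 pi b ERB(fc) t) exp(j 2 pi fc t),
    sampled at fs for `duration` seconds, with a set for unit gain at fc."""
    t = np.arange(int(round(duration * fs))) / fs
    envelope = t ** (n - 1) * np.exp(-2 * np.pi * b * erb_hz(fc) * t)
    envelope /= envelope.sum()   # at f = fc the carrier cancels, so gain = sum(envelope)
    return t, envelope * np.exp(2j * np.pi * fc * t)
```

The envelope peaks at t = (n − 1)/(2πb·ERB(f_c)). Because ERB(f_c) grows with f_c, high-frequency kernels are shorter and wider-band, which is the wavelet-like trade-off described above.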


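Gammatone Characteristics

• Unlike the STFT, the Gammatone filterbank uses a basis of gamma-enveloped tones whose duration shrinks as center frequency rises

[Figure: Gammatone basis functions]

• The corresponding frequency responses:

[Figure: Gammatone filter magnitude responses, gain (dB) vs. frequency (0–1500 Hz)]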


What are we missing?

• The Gammatone filterbank has approximately constant-Q bandwidths and near-logarithmic spacing of center frequencies

• Also, the gamma envelope decays quickly, so each kernel is well localized in time

• However, the filters are 1) symmetric and 2) linear

• Psychophysical experiments indicate that auditory filter shapes are:
  1) Asymmetric
     * Sharper drop-off on the high-frequency side
  2) Non-linear
     * Filter shape and gain change depending on input level
     * Compressive non-linearity of the cochlea
     * Important for hearing in noise and for dynamic range


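Gammachirp Characteristics

• The Gammachirp filter developed by Irino & Patterson uses a modified version of the Gammatone kernel

$$ g_{\mathrm{GC}}(t) = a\,t^{n-1}\exp\left(-2\pi b\,\mathrm{ERB}(f_c)\,t\right)\exp\left(j2\pi f_c t + jc\ln t\right) $$

where $a\,t^{n-1}\exp(-2\pi b\,\mathrm{ERB}(f_c)\,t)$ is the gamma envelope, $\exp(j2\pi f_c t)$ the tone carrier, and $\exp(jc\ln t)$ the chirp term; setting $c = 0$ recovers the Gammatone kernel.

• The frequency response is asymmetric, so it can fit the passive filter shape

• Level-dependent parameters can fit the changes in filter shape caused by stimulus level

[Figure: Gammachirp impulse response and frequency response, gain (dB) vs. frequency (0–1500 Hz)]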
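The chirp term changes only the phase: the carrier's instantaneous frequency is f_c + c/(2πt), which approaches f_c as t grows. The sketch below samples the Gammachirp kernel (starting at t = 1/fs to avoid ln 0) and measures its instantaneous frequency from the unwrapped phase. The values n = 4, b = 1.019, and c = −2, and the Glasberg and Moore ERB formula, are illustrative assumptions, not fitted parameters from the slides.

```python
import numpy as np

def erb_hz(f_hz):
    """Glasberg & Moore ERB (Hz), assumed here."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammachirp_kernel(fc, fs, n=4, b=1.019, c=-2.0, duration=0.05):
    """Complex Gammachirp impulse response
    g(t) = a t^(n-1) exp(-2 pi b ERB(fc) t) exp(j 2 pi fc t + j c ln t).
    c = 0 gives the Gammatone kernel. Sampling starts at t = 1/fs so ln t
    is finite; a is chosen so the envelope peaks at 1."""
    t = np.arange(1, int(round(duration * fs)) + 1) / fs
    envelope = t ** (n - 1) * np.exp(-2 * np.pi * b * erb_hz(fc) * t)
    envelope /= envelope.max()
    phase = 2 * np.pi * fc * t + c * np.log(t)
    return t, envelope * np.exp(1j * phase)

def instantaneous_frequency(t, g):
    """Frequency (Hz) between consecutive samples, from the unwrapped phase."""
    dphi = np.diff(np.unwrap(np.angle(g)))
    return dphi / (2 * np.pi * np.diff(t))
```

For c < 0 the carrier starts below f_c and glides up toward it; with c = 0 the instantaneous frequency is constant at f_c.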


Implementation

• In the frequency domain, the Gammachirp can be obtained by cascading a fixed Gammatone filter with an asymmetric compensation filter

• To fit psychophysical data, a fixed (passive) Gammachirp is cascaded with level-dependent asymmetric IIR filters

[Figure, left: Gammatone cascaded with the asymmetric compensation filter gives the Gammachirp; filter gain (dB) vs. frequency (0–2500 Hz)]

[Figure, right: passive Gammachirp with level-dependent asymmetries, giving level-dependent chirps; filter gain (dB) vs. frequency (200–2000 Hz)]

Comparison: Gammatone vs. Passive Gammachirp Outputs

[Figure: passive Gammachirp spectrogram and Gammatone spectrogram, time axis 0.1–0.8 s]

• The Gammatone output appears to have better frequency resolution

• The passive Gammachirp output appears to have better time resolution


Comparison: Gammatone vs. Active Gammachirp Outputs

[Figure: active Gammachirp and Gammatone spectrograms]


Incorporating level dependency

• As the previous comparisons illustrate, with a fixed stimulus level the passive Gammachirp output offers little advantage on clean speech

• We can incorporate parameter control via feedback:
  – Compute the passive Gammachirp (GC) spectrogram
  – Segment it into frames S1, S2, ..., SN
  – For each time frame:
    * Get the stimulus level in each channel
    * Filter with the level-specific filter
  – Reconstruct the output from the filtered frames
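The feedback loop above can be sketched for a single channel as frame-by-frame processing with overlap-add reconstruction. Everything specific here is a hypothetical stand-in: the level is the RMS (dB) of a control signal such as that channel's passive GC output, the level-specific filters are a small precomputed bank of FIR responses chosen by nearest level, and frames use a periodic Hann window at 50% overlap so that unfiltered frames add back to the original. In the full system this would run once per channel.

```python
import numpy as np

def periodic_hann(n):
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def level_dependent_filter(x, control, filters, level_grid_db, frame_len=256):
    """Hypothetical level-dependent filtering of one channel.

    x             : signal to filter (one channel)
    control       : signal used to measure the stimulus level (same length)
    filters       : list of FIR impulse responses, one per entry of level_grid_db
    level_grid_db : levels (dB re RMS of 1.0) the filters were designed for
    frame_len     : even frame length; hop is frame_len // 2 (50% overlap)
    """
    hop = frame_len // 2
    window = periodic_hann(frame_len)   # shifted copies at 50% overlap sum to 1
    grid = np.asarray(level_grid_db, dtype=float)
    max_taps = max(len(h) for h in filters)
    y = np.zeros(len(x) + max_taps - 1)
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = control[start:start + frame_len]
        level_db = 20 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-12)
        h = filters[int(np.argmin(np.abs(grid - level_db)))]  # nearest level
        frame = x[start:start + frame_len] * window
        out = np.convolve(frame, h)                            # filter this frame
        y[start:start + len(out)] += out                       # overlap-add
    return y[:len(x)]
```

In the toy check below, the bank gives loud frames less gain than quiet ones, a crude stand-in for compression; the real level-specific filters would instead change the asymmetry and chirp of each Gammachirp channel.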


Sample outputs

[Figure: sample outputs for clean speech and at 40 dB, 30 dB, and 20 dB SNR]


References

• Bleeck, S., Patterson, R.D., and Ives, T. (2003) Auditory Image Model for Matlab. Centre for the Neural Basis of Hearing. http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/

• Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory filter for both physiological and psychophysical data,” J. Acoust. Soc. Am. 109, 2008-2022.

• Pickles, J.O. (1988). An Introduction to the Physiology of Hearing (Academic, London).

• Slaney, M. (1993). “An efficient implementation of the Patterson-Holdsworth auditory filterbank,” Apple Computer Technical Report #35.

• Slaney, M. (1998). “Auditory Toolbox for Matlab,” Interval Research Technical Report #1998-010. http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/


Sidenote

[Figure: panels labeled Clean, 40 dB SNR, and 30 dB SNR]
